TOXICOLOGICAL SCIENCES 97(1), 55–64 (2007)
doi:10.1093/toxsci/kfm023
Advance Access publication February 20, 2007

Application of Genomic Biomarkers to Predict Increased Lung Tumor Incidence in 2-Year Rodent Cancer Bioassays

Russell S. Thomas,1 Linda Pluta, Longlong Yang, and Thomas A. Halsey2

The Hamner Institutes for Health Sciences, 6 Davis Drive, Research Triangle Park, NC 27709-2137

Received December 23, 2006; accepted February 14, 2007

Rodent cancer bioassays are part of a legacy of safety testing that has not changed significantly over the past 30 years. The bioassays are expensive, time consuming, and use hundreds of animals. Fewer than 1500 chemicals have been tested in a rodent cancer bioassay compared to the thousands of environmental and industrial chemicals that remain untested for carcinogenic activity. In this study, we used existing data generated by the National Toxicology Program (NTP) to identify gene expression biomarkers that can predict results from a rodent cancer bioassay. A set of 13 diverse chemicals was selected from those tested by the NTP. Seven chemicals were positive for increased lung tumor incidence in female B6C3F1 mice and six were negative. Female mice were exposed subchronically to each of the 13 chemicals, and microarray analysis was performed on the lung. Statistical classification analysis using the gene expression profiles identified a set of eight probe sets corresponding to six genes whose expression correctly predicted the increase in lung tumor incidence with 93.9% accuracy. The sensitivity and specificity were 95.2 and 91.8%, respectively. Among the six genes in the predictive signature, most were enzymes involved in endogenous and xenobiotic metabolism, and one gene was a growth factor receptor involved in lung development.
The results demonstrate that increases in chemically induced lung tumor incidence in female mice can be predicted using gene biomarkers from a subchronic exposure and may form the basis of a more efficient and economical approach for evaluating the carcinogenic activity of chemicals.

Key Words: genomics; biomarkers; rodent cancer bioassays.

1 To whom correspondence should be addressed. Fax: (919) 558-1300. E-mail: [email protected].
2 Present address: Almac Diagnostics, 801-1 Capitola Drive, Durham, NC 27713.

INTRODUCTION

The primary goal of toxicology and safety testing is to identify agents that have the potential to cause adverse effects in humans. Unfortunately, many of these tests have not changed significantly in the past 30 years, and most are inefficient, costly, and rely heavily on the use of animals. The rodent cancer bioassay is one of these safety tests and was originally established as a screen to identify potential carcinogens that would be further analyzed in human epidemiological studies (Bucher and Portier, 2004). Today, the rodent cancer bioassay has evolved into the primary means to determine the carcinogenic potential of a chemical and generate quantitative information on the dose-response behavior for chemical risk assessments. The experimental design for rodent cancer bioassays involves exposing mice and rats of both sexes for a period of 2 years. Several dosage levels are chosen with the high dose corresponding to the maximum tolerated dose. Approximately 50 animals per sex per dose level are used in each study. Due to the resource-intensive nature of these studies, each bioassay costs $2–4 million and takes over 3 years to complete (National Toxicology Program [NTP], 1996). Over the past 30 years, only 1468 chemicals have been tested in a rodent cancer bioassay (Gold et al., 1999).
By comparison, approximately 9000 chemicals are used by industry in quantities greater than 10,000 lbs, and nearly 90,000 chemicals have been inventoried by the U.S. Environmental Protection Agency as part of the Toxic Substances Control Act. Given the disparity between the number of chemicals tested in a rodent cancer bioassay and the number of chemicals used by industry, a more efficient and economical system of identifying chemical carcinogens needs to be developed.

Despite considerable advances in the biomedical sciences over the past decade, few viable alternatives to the rodent cancer bioassay have been identified. An effort by the NTP to replace the 2-year rodent bioassay with shorter term assessments in transgenic mouse models has yielded limited success (Pritchard et al., 2003). The predictive accuracy of the individual transgenic mouse models ranged from 74 to 81% and was as high as 83% when used in combination (Pritchard et al., 2003). Using a transcriptomic approach, other investigators have shown that gene expression biomarkers following a 24-h exposure could predict tumor formation for nongenotoxic hepatocarcinogens with 84% accuracy (Nie et al., 2006). In our laboratory, we have compared transcriptomic and metabonomic technologies for their ability to identify biomarkers that could predict increased lung and liver tumor incidence following a longer, 90-day exposure (Thomas et al., 2006). The results of the study showed that tissue-specific gene expression biomarkers were generally more accurate than the metabonomic biomarkers and that increased tumor incidence was predicted with relatively high accuracy.

© The Author 2007. Published by Oxford University Press on behalf of the Society of Toxicology. All rights reserved. For Permissions, please email: [email protected]
In the present study, we focused our efforts on lung tumors and expanded the number of carcinogenic and noncarcinogenic chemicals to assess the ability of gene expression biomarkers to predict increased tumor incidence across a more diverse set of chemicals.

MATERIALS AND METHODS

Chemicals. 1,5-Naphthalenediamine (NAPD; CAS no. 2243-62-1; purity: 97%), 2,3-benzofuran (BFUR; CAS no. 271-89-6; purity: 99%), N-(1-naphthyl) ethylenediamine dihydrochloride (NEDD; CAS no. 1465-25-4; purity: 98%), pentachloronitrobenzene (PCNB; CAS no. 82-68-8; purity: 99%), 2,2-bis(bromomethyl)-1,3-propanediol (BBMP; CAS no. 3296-90-0; purity: 98%), 1,2-dibromoethane (DBET; CAS no. 106-93-4; purity: 99%), coumarin (COUM; CAS no. 91-64-5; purity: 99%), benzene (BENZ; CAS no. 71-43-2; purity: 99%), and 2-chloromethylpyridine hydrochloride (CMPH; CAS no. 6959-47-3; purity: 98%) were purchased from Sigma-Aldrich (St Louis, MO). N-methylolacrylamide (MACR; CAS no. 924-42-5; purity: 98%) was purchased from Pfaltz & Bauer (Waterbury, CT). 4-Nitroanthranilic acid (NAAC; CAS no. 619-17-0; purity: 98.5%), diazinon (DIAZ; CAS no. 333-41-5; purity: 98%), and malathion (MALA; CAS no. 121-75-5; purity: 95%) were purchased from Advanced Technology and Industry (Hong Kong, China).

Animals and treatment. One hundred and fifty female B6C3F1 mice were obtained from Charles River Laboratories (Raleigh, NC). Female B6C3F1 mice were chosen since they represent the most sensitive model for chemically induced lung tumor formation in the NTP rodent bioassay. Upon receipt, the mice were randomized by weight and divided into treatment groups (Table 1). The 13 chemicals used in this study have been previously tested by the NTP. Seven of the chemicals were positive for an increased incidence of primary alveolar/bronchiolar adenomas or carcinomas and six were negative. Study results from the NTP for each of the 13 chemicals are summarized in Table 2. Animal treatment was initiated at 5 weeks of age. Mice were housed five per cage in polycarbonate cages in a temperature-controlled (17.8°C–26.1°C) and humidity-controlled (30–70%) environment with a standard 12-h light/dark cycle. All animals were given access to food (Harlan Teklad; NIH-07 ground meal; Madison, WI) and water ad libitum. Animal exposures for each chemical were performed via the route and dose listed in Table 1. Gavage exposures were administered 5 days/week, and feed exposures were provided 7 days/week. The oral route of exposure was chosen for this study since the majority of chemicals producing lung tumors in the NTP rodent bioassay were delivered through the oral route. Animal use in this study was approved by the Institutional Animal Use and Care Committee of The Hamner Institutes for Health Sciences and was conducted in accordance with the National Institutes of Health (NIH) guidelines for the care and use of laboratory animals. Animals were housed in fully accredited American Association for Accreditation of Laboratory Animal Care facilities. Following 13 weeks of exposure, the mice were euthanized with a lethal ip dose of sodium pentobarbital (Abbott Laboratories, Chicago, IL). The four right lung lobes were isolated by suturing, removed, and minced together in RNAlater (Ambion, Austin, TX). The left lung lobe was inflated with 10% neutral-buffered formalin and stored in 10% formalin for further processing. Following a standard fixation period, the lung tissues were embedded into paraffin blocks, sectioned at 5 μm, and stained with hematoxylin and eosin. Histological changes were assessed by an accredited pathologist.

TABLE 1
Treatment Groups and Abbreviations Used in the 90-Day Exposure to the 13 Chemicals Used in This Study

Chemical                                         Abbreviation  NTP no.  Route      Dose in study
1,5-Naphthalenediamine                           NAPD          143      Feed       2000 ppm
2,3-Benzofuran                                   BFUR          370      Gavage^a   240 mg/kg
2,2-Bis(bromomethyl)-1,3-propanediol             BBMP          452      Feed       1250 ppm
N-Methylolacrylamide                             MACR          352      Gavage^b   50 mg/kg
1,2-Dibromoethane                                DBET          86       Gavage^a   62 mg/kg
Coumarin                                         COUM          422      Gavage^a   200 mg/kg
Benzene                                          BENZ          289      Gavage^a   100 mg/kg
N-(1-naphthyl) ethylenediamine dihydrochloride   NEDD          168      Feed       3000 ppm (2000 ppm)^c
Pentachloronitrobenzene                          PCNB          61       Feed       8187 ppm
4-Nitroanthranilic acid                          NAAC          109      Feed       10,000 ppm
2-Chloromethylpyridine hydrochloride             CMPH          178      Gavage^b   250 mg/kg
Diazinon                                         DIAZ          137      Feed       200 ppm
Malathion                                        MALA          24       Feed       16,000 ppm^d (14,800 ppm)
Water                                            WCON                   Gavage^b
Corn oil                                         CCON                   Gavage^a
Feed                                             FCON

a Gavage exposure with a corn oil vehicle (5 ml/kg).
b Gavage exposure with a deionized water vehicle (5 ml/kg).
c The initial dose of 3000 ppm was reduced to 2000 ppm in week 2 of the study due to taste aversion and weight loss. The 2000 ppm dose is the same as the low dose in the original bioassay.
d Due to signs of toxicity, the 16,000 ppm dose was reduced to 0 ppm on day 9 for a period of 2 days. The dose was raised to 8000 ppm for a period of 9 days and returned to 16,000 ppm for the remainder of the study. The time-weighted average dose was 14,800 ppm.

TABLE 2
Detailed Testing Results by the NTP among the 13 Chemicals Used in the Study

Genotoxicity results by the NTP Mouse CHO cell CHO cell Chemical Salmonella lymphoma CA^a SCE^b NAPD BFUR BBMP MACR DBET COUM BENZ + +, , + + + NEDD PCNB NAAC CMPH DIAZ MALA + +, + + + + +, + + + + E +
Incidence of alveolar/bronchiolar adenomas or carcinomas in female B6C3F1 mice Control Low + E + + W+ + + +, , + + + + + 0/49 2/50 5/52 6/50 0/20 2/51 4/49 10/48 9/48 5/50 8/50 11/43 5/49 5/42 + +, + + + + +, + + + 0/49 0/20 1/45 1/19 1/23 0/10 2/48 0/23 5/41 1/49 1/46 0/49 Mid 15/51 7/49 10/50 High 5/46 14/47 19/50 13/49 6/46 27/51 13/49 1/31 1/20 1/48 3/48 2/49 0/47
Other tumor sites in female mice^c Relative dose in present study NTP classification^d Thy, Liv Liv, For Har, Ski, For Har, Liv, Ovr Lym, Sto Liv Zym, Ovr, Mam, Har, Lym, Liv None None None None None None High High High High Low High High Lung Lung Lung Lung Lung Lung Lung carc carc carc carc carc carc carc Low^e High High High High^f High Noncarc Noncarc Noncarc Noncarc Noncarc Noncarc

Note. +, positive; −, negative; E, equivocal; W+, weakly positive.
a Testing results for chromosomal aberrations (CA) in the Chinese hamster ovary (CHO) cells.
b Testing results for sister chromatid exchange (SCE) in the Chinese hamster ovary cells.
c Tumor site abbreviations: Thy, thyroid; Liv, liver; For, forestomach; Har, harderian gland; Zym, zymbal gland; Mam, mammary gland; Ski, skin; Ovr, ovary; Lym, lymphoma; Sto, stomach.
d Lung carc = lung carcinogen; Noncarc = noncarcinogen. This definition is based on a statistically significant increase in primary alveolar/bronchiolar adenomas or carcinomas in female B6C3F1 mice.
e The initial dose of 3000 ppm was reduced to 2000 ppm in week 2 of the study due to taste aversion and weight loss. The 2000 ppm dose is the same as the low dose in the original bioassay.
f Due to signs of toxicity, the 16,000 ppm dose was reduced to 0 ppm on day 9 for a period of 2 days. The dose was raised to 8000 ppm for a period of 9 days and returned to 16,000 ppm for the remainder of the study. The time-weighted average dose was 14,800 ppm.

Gene expression microarray analysis. Microarray analysis was performed on three to four animals per treatment group except for the corn oil and feed control groups (CCON and FCON) where additional animals were analyzed due to staggered exposures with parallel control groups. A total of 70 animals were analyzed. Total RNA was isolated from the lung tissue using Trizol reagent (Invitrogen, Carlsbad, CA). The isolated RNA was further purified using RNeasy columns (Qiagen, Valencia, CA), and the integrity of the RNA was verified spectrophotometrically and with the Agilent 2100 Bioanalyzer (Palo Alto, CA). Double-stranded cDNA was synthesized from 5 μg of total RNA using the One-Cycle cDNA synthesis kit (Affymetrix, Santa Clara, CA). Biotin-labeled cRNA was transcribed from the cDNA using the GeneChip IVT labeling kit (Affymetrix). Fifteen micrograms of labeled cRNA was fragmented and hybridized to Affymetrix Mouse Genome 430 2.0 arrays for 16 h at 45°C. The hybridized arrays were washed using the GeneChip Fluidics Station 450 and scanned using a GeneChip 3000 scanner. Microarray data were processed using Robust Multi-array Average with a log2 transformation (Irizarry et al., 2003). The gene expression results have been deposited in the National Center for Biotechnology Information Gene Expression Omnibus (accession no. GSE6116).

Analysis of chemical diversity. Molecular descriptors representing the two-dimensional structure of each of the 13 chemical treatments were downloaded from PubChem in the simplified molecular input line entry specification (SMILES) format. SMILES codes for all single chemicals tested in the NTP rodent cancer bioassay were downloaded from DSSTox (Richard et al., 2006). The SMILES code for each chemical was converted into a chemical fingerprint using the GenerateMD software application (Version 3.1.7.1, JChem, ChemAxon, Budapest, Hungary).
The chemical fingerprints were then compared for structural similarity using the Tanimoto coefficient and the Compr software application (Version 3.1.7.1, JChem, ChemAxon).

Basic statistical and annotation analysis of microarray data. Basic statistical differences were analyzed using both a one-way analysis of variance for differences across the 13 chemical treatments and a linear model (Smyth, 2005) with a contrast between the lung carcinogens and the noncarcinogenic treatments. In both analyses, probability values were adjusted for multiple comparisons using a false discovery rate (Reiner et al., 2003). Analysis of enrichment within gene ontology (GO) categories was performed using NIH DAVID (Dennis et al., 2003). Briefly, Affymetrix probe set identifiers for the genes of interest were uploaded to the DAVID Web site and analyzed based on the Affymetrix 430_2 reference list. A hypergeometric test was performed to identify GO categories with significantly enriched gene numbers. The resulting list of GO categories was refined by selecting categories containing two or more genes.

Statistical classification analysis. Classification analysis was performed using a combination of the Golub algorithm (Golub et al., 1999) for feature selection and a support vector machine model for classification (radial basis function kernel, C = 17,150, γ = 0.0022). The predicted end point was increased lung tumor incidence in female B6C3F1 mice according to the NTP rodent cancer bioassay (Table 2). To assess the predictive accuracy of the model on the current data set, 10-fold cross-validation was performed. The cross-validation process consisted of first randomly dividing all 70 animals into 10 equally sized groups (i.e., seven animals per group). Nine of the groups were then combined to form a training set (63 animals), and the remaining group was used as the test set (seven animals). The data for the animals in the test set were set aside as if we had never observed them. Feature selection was then performed on the training set using the Golub algorithm (Golub et al., 1999), and the genes with the largest Golub statistic were used to build a support vector machine classification model. The model was then used to predict the classes for the seven animals in the test set that were held out at the beginning of the process. The cross-validation process was repeated at least 100 times to obtain a good estimate of the predictive accuracy. Accuracy was calculated by dividing the number of correct predictions in the test set by the total number of predictions. Different numbers of genes were evaluated in the feature selection process to assess the change in predictive accuracy with gene number. The classification analysis was performed using the PCP software program (Buturovic, 2006).

RESULTS

Structural and Mechanistic Diversity among Chemical Treatments

The 13 chemical treatments in the study were intentionally chosen to be diverse in terms of chemical structure, genotoxicity, and potential modes of action. The structural diversity among the chemicals was analyzed using a Tanimoto similarity coefficient, where a coefficient of 1.0 indicates identical molecules and 0.0 indicates no structural similarity. The average similarity among all 13 chemicals in the study was 0.141 with a maximum similarity of 0.508 between NEDD and NAPD (Table 3). Among the lung carcinogens, the average similarity dropped to 0.123 with a maximum similarity of 0.327 between DBET and BBMP. By comparison, the average similarity for all single chemicals tested by the NTP in a rodent cancer bioassay was 0.155.

To assess potential differences in mode of action, gene expression changes across the 13 chemical treatments were used as a surrogate of mechanistic diversity. A total of 28,780 probe sets corresponding to 25,375 unique transcripts were significantly altered among the chemical treatments based on a one-way ANOVA.
Given that there are an estimated 39,015 unique transcripts on the microarray, the number of transcripts altered by the 13 chemicals corresponds to approximately 65% of the transcriptome.

Histological Changes

Gross histological examination of the lung tissue identified treatment-related lesions in only NAPD-treated animals. Morphological changes were found in all five animals examined and were limited to the bronchiolar epithelial cells that exhibited karyomegaly and karyorrhexis. There was occasional peribronchiolar infiltration by neutrophils and mononuclear cells. Bronchiolar epithelial cell morphology was suggestive of regenerative hyperplasia. In the 2-year NTP study, the primary nonneoplastic lesion was adenomatous hyperplasia occurring in 30% of the animals (NTP, 1978). Histopathological changes in the 13-week subchronic study were not provided for NAPD in the original NTP report. Given the absence of lung lesions among the remaining tumorigenic treatment groups, histological changes alone following a 90-day exposure were not predictive of increased lung tumor incidence in a 2-year bioassay. This result is consistent with a previous study that reported the poor predictive properties of histological lesions (Allen et al., 2004).

Gross Gene Expression Differences between Lung Carcinogens and Noncarcinogenic Treatments

To obtain an overall sense of key differences between the lung carcinogens and noncarcinogenic treatments, a two-sample statistical comparison was performed between animals treated with chemicals showing increased lung tumor incidence in the 2-year bioassay and animals treated with the negative chemicals plus the vehicle controls. A total of 82 probe sets corresponding to 75 unique transcripts were significantly altered.
Sixty-five transcripts were significantly upregulated in animals treated with the lung carcinogens, and 10 transcripts were significantly downregulated. The gene expression differences between the lung carcinogens and the noncarcinogenic chemicals are depicted in Figure 1, and a complete list is provided as supplemental material (Supplemental Table 1). A subset of the significant gene expression changes was also verified using quantitative RT-PCR (Supplemental Fig. 1). Notably, there were a number of highly discriminating gene expression changes that were common among the lung carcinogens despite the diversity in chemical structures, genotoxicity categories, and potential mechanisms. A GO analysis of the significant gene expression changes showed enrichment in multiple categories with the majority related to endogenous and xenobiotic metabolic processes (Table 4).

FIG. 1. Heat map of genes differentially expressed in the lung following a 90-day exposure to lung carcinogenic and noncarcinogenic treatments. Chemical details, abbreviations, and NTP study results are provided in Tables 1 and 2. Red represents high gene expression and blue is low expression.

TABLE 3
Tanimoto Similarity Coefficients among the Lung Carcinogenic and Noncarcinogenic Treatments

       NEDD   PCNB   MALA   NAAC   DIAZ   CMPH   BFUR   NAPD   BBMP   BENZ   COUM   DBET   MACR
NEDD   1.000
PCNB   0.268  1.000
MALA   0.104  0.150  1.000
NAAC   0.247  0.394  0.202  1.000
DIAZ   0.144  0.170  0.241  0.209  1.000
CMPH   0.187  0.216  0.127  0.203  0.214  1.000
BFUR   0.258  0.157  0.083  0.158  0.116  0.130  1.000
NAPD   0.508  0.267  0.058  0.233  0.104  0.192  0.317  1.000
BBMP   0.055  0.057  0.094  0.099  0.060  0.060  0.074  0.055  1.000
BENZ   0.175  0.112  0.034  0.095  0.037  0.105  0.195  0.314  0.046  1.000
COUM   0.148  0.151  0.238  0.276  0.234  0.162  0.175  0.102  0.095  0.079  1.000
DBET   0.051  0.024  0.031  0.042  0.023  0.051  0.031  0.047  0.327  0.079  0.031  1.000
MACR   0.157  0.104  0.108  0.147  0.099  0.071  0.072  0.103  0.178  0.041  0.130  0.100  1.000
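As a reading aid for the similarity coefficients in Table 3, the following is a minimal sketch of how a Tanimoto coefficient and an average pairwise similarity can be computed. It assumes fingerprints are represented as sets of "on" bit positions; the three fingerprints are hypothetical and were not produced by the GenerateMD or Compr software used in the study, and the function names are illustrative only.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient for fingerprints given as sets of 'on' bit positions."""
    shared = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - shared
    # Assumed convention: two empty fingerprints count as identical.
    return 1.0 if union == 0 else shared / union


def mean_pairwise_similarity(fingerprints):
    """Average Tanimoto coefficient over all distinct pairs (off-diagonal entries)."""
    pairs = list(combinations(fingerprints, 2))
    return sum(tanimoto(a, b) for a, b in pairs) / len(pairs)


# Hypothetical fingerprints for three made-up molecules (not from the study).
fp_x = {1, 4, 7, 9}
fp_y = {1, 4, 8}
fp_z = {2, 3}

print(tanimoto(fp_x, fp_y))                          # 2 shared bits / 5 distinct bits = 0.4
print(mean_pairwise_similarity([fp_x, fp_y, fp_z]))  # (0.4 + 0.0 + 0.0) / 3
```

Applied to the 78 off-diagonal entries of Table 3, this kind of average comes out at about 0.141, in line with the mean similarity reported in the text.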
Changes in glutathione-related processes were consistent with a variety of known biomarkers in both rodent and human tumorigenesis (Balendiran et al., 2004; Hayes and Pulford, 1995; Kwak et al., 2004), and a previous study has also identified gene expression changes related to fatty acid metabolism in human colorectal cancers (Yeh et al., 2006). In addition, changes in aldehyde dehydrogenase activity have been associated with experimental and human tumors in a variety of tissues (Lindahl, 1992).

Statistical Classification Analysis to Predict Increased Lung Tumor Incidence

To evaluate the ability of the gene expression changes to predict increased lung tumor incidence in a rodent cancer bioassay, statistical classification analysis was performed using a combination of the Golub feature selection algorithm (Golub et al., 1999) and a support vector machine model as the classifier. Ten-fold cross-validation was used to estimate the predictive accuracy. Using this approach, tissue gene expression profiles were capable of predicting a chemically induced increase in lung tumor incidence with 93.9% accuracy using only eight probe sets that correspond to six different genes (Fig. 2). The sensitivity and specificity of the model with the eight biomarkers were 95.2 and 91.8%, respectively. The predictive accuracy of the model declined as more genes were added. The top gene expression biomarkers were changes in the UDP-glucuronosyltransferase 1a (Ugt1a) family, carboxylesterase 1 (Ces1), fibroblast growth factor receptor 2 (Fgfr2), epoxide hydrolase 1 (Ephx1), glutathione S-transferase mu 1 (Gstm1), and an unannotated gene (Table 5). A complete ranking is provided as supplemental material (Supplemental Table 2). For changes in Ugt1a expression, the three corresponding probe sets were not specific for a particular isoform. The Ugt1a isoforms are produced through the alternative splicing of variable exons connected to four constant exons at the 3′-end of the gene (Zhang et al., 2004).

TABLE 4
Ontology Analysis of Significant Gene Expression Changes between Lung Carcinogenic and Noncarcinogenic Treatments^a

Biological process GO category                                              P-value
Response to chemical stimulus                                               0.0000
Response to abiotic stimulus                                                0.0002
Glutathione metabolism                                                      0.0026
Fatty acid metabolism                                                       0.0029
Coenzyme metabolism                                                         0.0062
Cofactor metabolism                                                         0.0108
Lipid metabolism                                                            0.0226
Sulfur metabolism                                                           0.0315
Cellular lipid metabolism                                                   0.0352
Physiological process                                                       0.0374
Carboxylic acid metabolism                                                  0.0390
Organic acid metabolism                                                     0.0390
Response to toxin                                                           0.0461
Aldehyde metabolism                                                         0.0461

Molecular function GO category                                              P-value
Glutathione transferase activity                                            0.0000
Transferase activity, transferring alkyl or aryl (other than methyl) groups 0.0000
Catalytic activity                                                          0.0000
Transferase activity, transferring hexosyl groups                           0.0001
Transferase activity, transferring glycosyl groups                          0.0001
Transferase activity                                                        0.0008
Oxidoreductase activity                                                     0.0009
Glucuronosyltransferase activity                                            0.0022
Serine esterase activity                                                    0.0110
Carboxylesterase activity                                                   0.0110
Oxidoreductase activity, acting on the aldehyde or oxo group of donors      0.0148
Carboxylic ester hydrolase activity                                         0.0168
Carbonyl reductase (NADPH) activity                                         0.0176
Aldehyde dehydrogenase [NAD(P)+] activity                                   0.0220

a GO analysis was performed using NIH DAVID (http://david.abcc.ncifcrf.gov/).

DISCUSSION
The rodent cancer bioassay is part of an established legacy of safety testing to identify chemical, biological, and physical agents that have the potential for adverse human effects. The tests are inefficient, expensive, and involve large-scale animal exposures. For industry, the inefficiencies of the current safety testing paradigm lead to increased financial costs to develop a product and fewer resources to devote to understanding the underlying mode of action for the toxic effect. For the public, the inefficiencies result in access to fewer chemicals that may benefit society and fewer resources to assess the safety of untested chemicals. Therefore, it is in the interest of society as a whole to identify more efficient and economical alternatives to the standard rodent cancer bioassay. In the search for an alternative to the bioassay, we used existing data generated by the NTP to assemble a training set of 13 chemicals. Seven of the chemicals were positive for an increased incidence of primary alveolar/bronchiolar adenomas or carcinomas and six were negative (Table 2). Animals were exposed subchronically to each of the 13 chemicals plus corresponding vehicle controls, and microarray analysis was performed on the lungs of three to four animals per treatment group. 
TABLE 5
Top Gene Expression Biomarkers That Discriminate between Lung Carcinogenic and Noncarcinogenic Treatments Based on the Golub Algorithm

Affymetrix ID  Transcript ID  Gene symbol  Gene description                                       Golub score  Expression in carcinogens
1424783_a_at   Mm.42472.1     Ugt1a        UDP-glucuronosyltransferase 1 family, polypeptide A^a  0.9150       Increased
1449486_at     Mm.22720.1     Ces1         Carboxylesterase 1                                     0.8877       Increased
1443996_at     Mm.25061.1     Fgfr2        Fibroblast growth factor receptor 2                    0.8550       Decreased
1426260_a_at   Mm.42472.3     Ugt1a        UDP-glucuronosyltransferase 1 family, polypeptide A^a  0.8393       Increased
1422438_at     Mm.9075.1      Ephx1        Epoxide hydrolase 1, microsomal                        0.8385       Increased
1426261_s_at   Mm.42472.3     Ugt1a        UDP-glucuronosyltransferase 1 family, polypeptide A^a  0.8220       Increased
1424266_s_at   Mm.29110.1     AU018778     Expressed sequence AU018778                            0.7514       Increased
1448330_at     Mm.2011.1      Gstm1        Glutathione S-transferase, mu 1                        0.7419       Increased

a Probe set interrogates multiple isoforms of the Ugt1a family.

FIG. 2. Results from the statistical classification analysis for predicting chemically induced increases in lung tumor incidence using subchronic gene expression biomarkers. Accuracy was estimated based on 10-fold cross-validation and calculated by dividing the number of correct predictions by the total number of predictions.

Gene expression changes were then examined for potential biomarkers that could discriminate between the lung carcinogenic and noncarcinogenic treatments. Based on the analysis of the chemical structures using the Tanimoto similarity coefficient, the average diversity of the 13 chemicals used in the investigation was similar to the diversity among all chemicals tested by the NTP in the rodent cancer bioassay. In addition, the overall changes in gene expression among the 13 chemical treatments included approximately 65% of the transcriptome, suggesting that the set of chemicals was also mechanistically diverse.
Therefore, the ability to identify biomarkers and predict increased lung tumor incidence in a rodent cancer bioassay using these 13 chemicals should reflect the ability to predict lung tumor incidence across all the chemicals tested by the NTP. A formal statistical classification analysis demonstrated that the tissue gene expression profiles were capable of predicting chemically induced increases in lung tumor incidence with 93.9% accuracy using eight probe sets that correspond to six different genes. The ability to predict a complex biological response such as an increase in tumors with relatively few genes is not uncommon. A previous study was able to predict chemically induced rat liver tumors following a 24-h exposure with 84% accuracy using only six genes (Nie et al., 2006), and an 11-gene signature was able to predict recurrence, metastasis, and death in a variety of cancers (Glinsky et al., 2005). The predictive accuracy of the model in our study also declined as more genes were added. The decline in the predictive accuracy with increasing gene numbers has been reported previously and is due to the addition of genes that are treatment specific and not related to the predicted toxicological end point (Thomas et al., 2001). The identification of biomarkers that predict an increase in tumor incidence is fundamentally different from the identification of biomarkers that predict tumor formation in an individual animal. The biomarkers that were identified in this study were likely to be genes that created a favorable cellular environment for chemically induced lung tumor formation and not those that determined whether a specific animal gets tumors. Among the six genes in the predictive signature, most were enzymes involved in endogenous and xenobiotic metabolic processes and one was a growth factor receptor involved in lung development.
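The cross-validated accuracy discussed above depends on repeating feature selection inside every training fold, so that held-out animals never influence which genes are chosen. The sketch below illustrates that structure under several assumptions: the Golub statistic is taken in its commonly cited signal-to-noise form, (mean_1 − mean_0) / (sd_1 + sd_0), ranked by absolute value; a nearest-centroid rule stands in for the support vector machine; the data are synthetic; and a single round of cross-validation is run rather than the 100 or more repeats described in the methods. All function names are hypothetical and unrelated to the PCP software.

```python
import random
from statistics import mean, stdev


def golub_score(values, labels):
    """Signal-to-noise statistic: (mean_1 - mean_0) / (sd_1 + sd_0)."""
    pos = [v for v, lab in zip(values, labels) if lab == 1]
    neg = [v for v, lab in zip(values, labels) if lab == 0]
    return (mean(pos) - mean(neg)) / (stdev(pos) + stdev(neg))


def select_features(X, y, n_features):
    """Return the column indices with the largest absolute Golub score."""
    columns = range(len(X[0]))
    scores = {j: abs(golub_score([row[j] for row in X], y)) for j in columns}
    return sorted(columns, key=scores.get, reverse=True)[:n_features]


def nearest_centroid(X_train, y_train, sample, features):
    """Stand-in classifier (assumption): nearest class centroid, not an SVM."""
    def squared_distance(label):
        rows = [r for r, lab in zip(X_train, y_train) if lab == label]
        return sum((sample[j] - mean(r[j] for r in rows)) ** 2 for j in features)
    return 1 if squared_distance(1) < squared_distance(0) else 0


def cross_validate(X, y, n_features, n_folds=10, seed=0):
    """One round of k-fold cross-validation with per-fold feature selection."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    folds = [order[i::n_folds] for i in range(n_folds)]
    tp = tn = fp = fn = 0
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        # Feature selection sees only the training fold, never the test samples.
        features = select_features(X_train, y_train, n_features)
        for i in test_idx:
            predicted = nearest_centroid(X_train, y_train, X[i], features)
            if predicted == 1 and y[i] == 1:
                tp += 1
            elif predicted == 0 and y[i] == 0:
                tn += 1
            elif predicted == 1:
                fp += 1
            else:
                fn += 1
    return {
        "accuracy": (tp + tn) / len(X),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Synthetic data: 40 samples, 20 features, only feature 0 separates the classes.
rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[rng.gauss(4.0 * lab if j == 0 else 0.0, 1.0) for j in range(20)] for lab in y]

result = cross_validate(X, y, n_features=2)
print(result)
```

Calling `cross_validate` with increasing `n_features` is one way to examine how accuracy changes as more genes are added; selecting features once on the full data set before splitting would leak information from the test animals and inflate the estimate.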
The functional breakdown of these predictive biomarkers was consistent with the established role of metabolism and growth factor signaling in tumorigenesis. Dysregulation of metabolic processes in preneoplasia and neoplasia is a relatively well-established phenomenon (Costello and Franklin, 2005; Mazurek et al., 1997), and many endogenous and xenobiotic metabolizing enzymes have been used as histochemical markers in early initiation-promotion studies (Hasegawa and Ito, 1994). Similarly, the disruption of growth factors and developmental processes related to proliferation and differentiation is believed to be a critical step in carcinogenesis (Breuhahn et al., 2006; Datta and Datta, 2006; Srinivasan et al., 2005). Among the most predictive metabolic enzymes was Ugt1a. Ugt1a is one of a family of enzymes that catalyze the glucuronidation of endogenous and xenobiotic molecules (Tukey and Strassburg, 2000). The mouse Ugt1 locus produces nine different genes through the alternative splicing of 14 variable exons to four constant exons (Zhang et al., 2004). Genomewide scans have identified the Ugt1a locus as playing an important role in chemical carcinogenesis (Tukey and Strassburg, 2000), and various isoforms have been shown to be differentially expressed in human liver cancer (Strassburg et al., 1997). Another predictive metabolic enzyme was Ces1. Ces1 is part of a large multigene family of enzymes that hydrolyze ester and amide bonds and play a role in cellular cholesterol esterification (Ghosh, 2000; Uphoff and Drexler, 2000). Previous studies have suggested that Ces1 may play a role in detoxifying ester- or amide-containing xenobiotics in the lung (Munger et al., 1991; Uphoff and Drexler, 2000). Notably, human CES1 was part of an 11-gene transcriptional signature that was used to predict therapy outcome and malignancy for multiple types of human cancer including lung cancer (Glinsky et al., 2005).
In contrast to our studies, the downregulation of human CES1 was considered prognostic (Glinsky et al., 2005). However, the transcriptional signature in their study was applied to relatively late-stage tumors and not as an early classifier of carcinogenic potential. The next metabolic enzyme in the predictive set was Ephx1. Ephx1 has been shown to play a role in the activation and detoxification of many polyaromatic hydrocarbons (Arand et al., 2005). In human cancer, one study has noted an increased expression of human EPHX1 in hepatocellular carcinomas and variable expression in lung tumors (Coller et al., 2001). A separate study has identified increased expression of EPHX1 in human glioblastomas (Kessler et al., 2000). The increased expression in human liver cancer is supported by rodent studies where expression of Ephx1 was increased in preneoplastic nodules (Griffin and Gengozian, 1984; Novikoff et al., 1979). The fourth most predictive metabolic enzyme was the relatively uncharacterized AU018778 gene. The amino acid sequence of the AU018778 gene showed significant similarity to carboxylesterases with approximately 65% identity with mouse Ces1. On the genomic level, the gene is found in a cluster of esterases downstream of Ces1 and upstream of Es22 and Ces3. In normal tissue, AU018778 is predominantly expressed in kidney, liver, intestine, and adipose tissue (Su et al., 2004). No reports were found that showed an altered expression in cancer. The last metabolic enzyme in the predictive set was Gstm1. Gstm1 is part of a family of glutathione transferases that are involved in the metabolism of endogenous and xenobiotic molecules and can modulate cell signaling through a variety of mechanisms (Hayes et al., 2005).
Although the majority of work on Gstm1 in cancer has focused on associating human polymorphic differences with susceptibility, increased expression of GSTM1 has been identified as a potential biomarker in human head and neck tumors (Bongers et al., 1995). In the lung, a previous study has reported that human GSTM1 was infrequently expressed in normal tissue and its expression was not increased in lung tumors (Spivack et al., 2003). In rodent studies, increased expression of mu-class glutathione transferases has been observed in preneoplastic nodules in the rat liver (Hayes and Pulford, 1995), but not in the mouse liver (Hatayama et al., 1993). The only nonmetabolic gene in the predictive set was Fgfr2. Fgfr2 is part of a family of receptor tyrosine kinases that bind fibroblast growth factors and initiate cellular signals that affect proliferation and differentiation (Eswarakumar et al., 2005). Alternative splicing of Fgfr2 results in two different isoforms, Fgfr2b and Fgfr2c, that have different ligand binding affinities (Eswarakumar et al., 2005). The targeted disruption of the Fgfr2b isoform in mice results in abnormal development of the lung, pituitary, thyroid, teeth, and limbs (De Moerlooze et al., 2000), while disruption of the Fgfr2c isoform results in skeletal abnormalities (Eswarakumar et al., 2002). Additional research has shown that Fgfr2b plays a significant role in lung development (De Langhe et al., 2006; del Moral et al., 2006). One study has reported that binding of Fgf9 to Fgfr2b cooperates with Shh signaling to regulate mesenchymal proliferation in lung development (White et al., 2006). Notably, expression of Shh was also found to be one of the top 20 predictive biomarkers in our study (Supplemental Table 2). In cancer, expression of Fgfr2 has shown different behaviors depending on tissue and cell type.
In human lung and colorectal cancer, increased expression of Fgfr2b was observed in cancer tissue (Watanabe et al., 2000; Yamayoshi et al., 2004), while in human gastric and bladder cancer, decreased expression of Fgfr2b was observed in cancer cells and was associated with poor patient prognosis (Diez de Medina et al., 1997; Matsunobu et al., 2006). In our study, decreased expression was predictive of lung tumor formation. In summary, our results demonstrate that an increase in lung tumor incidence can be predicted based on gene expression changes following only a subchronic exposure. Although the present study was limited to 13 chemicals delivered through an oral route and the female mouse lung, the results suggest that this approach has the potential to be more broadly applied to other organ systems and animal models. To adequately assess the full potential of the approach, additional studies would need to be performed and many are currently underway. These studies include additional chemicals, other routes of exposure such as inhalation, other organ systems, and equivalent studies in both sexes and other rodent species. If the approach proves to be more broadly applicable, it has the potential to be an efficient and economical alternative to the rodent cancer bioassay and opens the door to a fundamental shift in chemical safety testing. Based on NTP records, five organ sites (liver, lung, kidney, mammary, and hematopoietic) account for approximately 50% of the positive chemical responses, and 24 organ sites have at least five positive chemicals in at least one species and sex. In the short term, developing gene expression biomarkers for each of the top five tumor sites in both mice and rats would provide an efficient means to prioritize chemicals. 
Within industry, prioritizing chemicals based on short-term gene expression biomarkers would provide an assessment of product safety earlier in the development pipeline, leading to substantial monetary savings and reduced time to market. For the public, the majority of chemicals in the United States are not required to be tested for carcinogenic activity unless evidence for adverse health effects is obtained. Therefore, identifying biomarkers for each of the top five organ sites would allow regulatory agencies to more effectively identify chemicals that need additional safety testing. For long-term goals, developing biomarkers for each of the 24 tissues may allow for the eventual replacement of the rodent cancer bioassay, and performing studies on chemicals that are both rodent and human carcinogens could identify biomarkers with more direct relevance to human health.

SUPPLEMENTARY DATA

Supplementary data are available online at http://toxsci.oxfordjournals.org/ and include (1) a complete list of the genes identified as being significantly different between animals treated with chemicals showing increased incidence of lung tumors in the rodent cancer bioassay and animals treated with negative chemicals plus the vehicle controls (Supplemental Table 1); (2) a complete ranking of the most predictive gene expression biomarkers based on the Golub feature selection algorithm (Supplemental Table 2); and (3) qRT-PCR validation of a subset of the differentially expressed genes (Supplemental Fig. 1).

ACKNOWLEDGMENTS

Research was supported by the American Chemistry Council’s Long Range Research Initiative under the Improved Methods Focus Area.

REFERENCES

Allen, D. G., Pearse, G., Haseman, J. K., and Maronpot, R. R. (2004). Prediction of rodent carcinogenesis: An evaluation of prechronic liver lesions as forecasters of liver tumors in NTP carcinogenicity studies. Toxicol. Pathol. 32, 393–401.
Arand, M., Cronin, A., Adamska, M., and Oesch, F. (2005). Epoxide hydrolases: Structure, function, mechanism, and assay. Methods Enzymol. 400, 569–588. Balendiran, G. K., Dabur, R., and Fraser, D. (2004). The role of glutathione in cancer. Cell Biochem. Funct. 22, 343–352. Bongers, V., Snow, G. B., de Vries, N., Cattan, A. R., Hall, A. G., van der Waal, I., and Braakhuis, B. J. (1995). Second primary head and neck squamous cell carcinoma predicted by the glutathione S-transferase expression in healthy tissue in the direct vicinity of the first tumor. Lab. Invest. 73, 503–510. Breuhahn, K., Longerich, T., and Schirmacher, P. (2006). Dysregulation of growth factor signaling in human hepatocellular carcinoma. Oncogene 25, 3787–3800. Bucher, J. R., and Portier, C. (2004). Human carcinogenic risk evaluation, Part V: The national toxicology program vision for assessing the human carcinogenic hazard of chemicals. Toxicol. Sci. 82, 363–366. Buturovic, L. J. (2006). PCP: A program for supervised classification of gene expression profiles. Bioinformatics 22, 245–247. Coller, J. K., Fritz, P., Zanger, U. M., Siegle, I., Eichelbaum, M., Kroemer, H. K., and Murdter, T. E. (2001). Distribution of microsomal epoxide hydrolase in humans: An immunohistochemical study in normal tissues, and benign and malignant tumours. Histochem. J. 33, 329–336. Costello, L. C., and Franklin, R. B. (2005). ‘Why do tumour cells glycolyse?’: From glycolysis through citrate to lipogenesis. Mol. Cell. Biochem. 280, 1–8. Datta, S., and Datta, M. W. (2006). Sonic Hedgehog signaling in advanced prostate cancer. Cell. Mol. Life Sci. 63, 435–448. De Langhe, S. P., Carraro, G., Warburton, D., Hajihosseini, M. K., and Bellusci, S. (2006). Levels of mesenchymal FGFR2 signaling modulate smooth muscle progenitor cell commitment in the lung. Dev. Biol. 299, 52–62. De Moerlooze, L., Spencer-Dene, B., Revest, J., Hajihosseini, M., Rosewell, I., and Dickson, C. (2000). An important role for the IIIb isoform of fibroblast growth factor receptor 2 (FGFR2) in mesenchymal-epithelial signalling during mouse organogenesis. Development 127, 483–492. del Moral, P. M., De Langhe, S. P., Sala, F. G., Veltmaat, J. M., Tefft, D., Wang, K., Warburton, D., and Bellusci, S. (2006). Differential role of FGF9 on epithelium and mesenchyme in mouse embryonic lung. Dev. Biol. 293, 77–89. Dennis, G., Jr., Sherman, B. T., Hosack, D. A., Yang, J., Gao, W., Lane, H. C., and Lempicki, R. A. (2003). DAVID: Database for annotation, visualization, and integrated discovery. Genome Biol. 4, P3. Diez de Medina, S. G., Chopin, D., El Marjou, A., Delouvee, A., LaRochelle, W. J., Hoznek, A., Abbou, C., Aaronson, S. A., Thiery, J. P., and Radvanyi, F. (1997). Decreased expression of keratinocyte growth factor receptor in a subset of human transitional cell bladder carcinomas. Oncogene 14, 323–330. Eswarakumar, V. P., Lax, I., and Schlessinger, J. (2005). Cellular signaling by fibroblast growth factor receptors. Cytokine Growth Factor Rev. 16, 139–149. Eswarakumar, V. P., Monsonego-Ornan, E., Pines, M., Antonopoulou, I., Morriss-Kay, G. M., and Lonai, P. (2002). The IIIc alternative of Fgfr2 is a positive regulator of bone formation. Development 129, 3783–3793. Ghosh, S. (2000). Cholesteryl ester hydrolase in human monocyte/macrophage: Cloning, sequencing, and expression of full-length cDNA. Physiol. Genomics 2, 1–8. Glinsky, G. V., Berezovska, O., and Glinskii, A. B. (2005). Microarray analysis identifies a death-from-cancer signature predicting therapy failure in patients with multiple types of cancer. J. Clin. Invest. 115, 1503–1521. Gold, L. S., Manley, N. B., Slone, T. H., and Rohrbach, L. (1999). Supplement to the Carcinogenic Potency Database (CPDB): Results of animal bioassays published in the general literature in 1993 to 1994 and by the National Toxicology Program in 1995 to 1996. Environ. Health Perspect. 107(Suppl. 4), 527–600. Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., et al. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286, 531–537. Griffin, M. J., and Gengozian, N. (1984). Epoxide hydrolase: A marker for experimental hepatocarcinogenesis. Ann. Clin. Lab. Sci. 14, 27–31. Hasegawa, R., and Ito, N. (1994). Hepatocarcinogenesis in the rat. In Carcinogenesis (M. P. Waalkes and J. M. Ward, Eds.), pp. 39–65. Raven Press, New York. Hatayama, I., Nishimura, S., Narita, T., and Sato, K. (1993). Sex-dependent expression of class pi glutathione S-transferase during chemical hepatocarcinogenesis in B6C3F1 mice. Carcinogenesis 14, 537–538. Hayes, J. D., Flanagan, J. U., and Jowsey, I. R. (2005). Glutathione transferases. Annu. Rev. Pharmacol. Toxicol. 45, 51–88. Hayes, J. D., and Pulford, D. J. (1995). The glutathione S-transferase supergene family: Regulation of GST and the contribution of the isoenzymes to cancer chemoprotection and drug resistance. Crit. Rev. Biochem. Mol. Biol. 30, 445–600. Irizarry, R. A., Bolstad, B. M., Collin, F., Cope, L. M., Hobbs, B., and Speed, T. P. (2003). Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 31, e15. Kessler, R., Hamou, M. F., Albertoni, M., de Tribolet, N., Arand, M., and Van Meir, E. G. (2000). Identification of the putative brain tumor antigen BF7/GE2 as the (de)toxifying enzyme microsomal epoxide hydrolase. Cancer Res. 60, 1403–1409. Kwak, M. K., Wakabayashi, N., and Kensler, T. W. (2004). Chemoprevention through the Keap1-Nrf2 signaling pathway by phase 2 enzyme inducers. Mutat. Res. 555, 133–148. Lindahl, R. (1992). Aldehyde dehydrogenases and their role in carcinogenesis. Crit. Rev. Biochem. Mol. Biol. 27, 283–335. Matsunobu, T., Ishiwata, T., Yoshino, M., Watanabe, M., Kudo, M., Matsumoto, K., Tokunaga, A., Tajiri, T., and Naito, Z. (2006).
Expression of keratinocyte growth factor receptor correlates with expansive growth and early stage of gastric cancer. Int. J. Oncol. 28, 307–314. Mazurek, S., Boschek, C. B., and Eigenbrodt, E. (1997). The role of phosphometabolites in cell proliferation, energy metabolism, and tumor therapy. J. Bioenerg. Biomembr. 29, 315–330. Munger, J. S., Shi, G. P., Mark, E. A., Chin, D. T., Gerard, C., and Chapman, H. A. (1991). A serine esterase released by human alveolar macrophages is closely related to liver microsomal carboxylesterases. J. Biol. Chem. 266, 18832–18838. Nie, A. Y., McMillian, M., Brandon Parker, J., Leone, A., Bryant, S., Yieh, L., Bittner, A., Nelson, J., Carmen, A., Wan, J., et al. (2006). Predictive toxicogenomics approaches reveal underlying molecular mechanisms of nongenotoxic carcinogenicity. Mol. Carcinog. 45, 914–933. Novikoff, A. B., Novikoff, P. M., Stockert, R. J., Becker, F. F., Yam, A., Poruchynsky, M. S., Levin, W., and Thomas, P. E. (1979). Immunocytochemical localization of epoxide hydrase in hyperplastic nodules induced in rat liver by 2-acetylaminofluorene. Proc. Natl. Acad. Sci. USA 76, 5207–5211. NTP (1978). Bioassay of 1,5-Naphthalenediamine for Possible Carcinogenicity. U.S. Department of Health and Human Services National Toxicology Program, Washington, DC. NTP (1996). Annual Plan for Fiscal Year 1996. National Toxicology Program, Washington, DC. Pritchard, J. B., French, J. E., Davis, B. J., and Haseman, J. K. (2003). The role of transgenic mouse models in carcinogen identification. Environ. Health Perspect. 111, 444–454. Reiner, A., Yekutieli, D., and Benjamini, Y. (2003). Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics 19, 368–375. Richard, A. M., Gold, L. S., and Nicklaus, M. C. (2006). Chemical structure indexing of toxicity data on the internet: Moving toward a flat world. Curr. Opin. Drug. Discov. Devel. 9, 314–325. Smyth, G. K. (2005).
Limma: Linear models for microarray data. In Bioinformatics and Computational Biology Solutions Using R and Bioconductor (R. Gentleman, V. Carey, S. Dudoit, R. A. Irizarry, and W. Huber, Eds.), pp. 397–420. Springer, New York. Spivack, S. D., Hurteau, G. J., Fasco, M. J., and Kaminsky, L. S. (2003). Phase I and II carcinogen metabolism gene expression in human lung tissue and tumors. Clin. Cancer Res. 9, 6002–6011. Srinivasan, D. M., Kapoor, M., Kojima, F., and Crofford, L. J. (2005). Growth factor receptors: Implications in tumor biology. Curr. Opin. Investig. Drugs 6, 1246–1249. Strassburg, C. P., Manns, M. P., and Tukey, R. H. (1997). Differential downregulation of the UDP-glucuronosyltransferase 1A locus is an early event in human liver and biliary cancer. Cancer Res. 57, 2979–2985. Su, A. I., Wiltshire, T., Batalov, S., Lapp, H., Ching, K. A., Block, D., Zhang, J., Soden, R., Hayakawa, M., Kreiman, G., et al. (2004). A gene atlas of the mouse and human protein-encoding transcriptomes. Proc. Natl. Acad. Sci. USA 101, 6062–6067. Thomas, R. S., O’Connell, T. M., Pluta, L., Wolfinger, R. D., Yang, L., and Page, T. J. (2006). A comparison of transcriptomic and metabonomic technologies for identifying biomarkers predictive of two-year rodent cancer bioassays. Toxicol. Sci. 96, 40–46. Thomas, R. S., Rank, D. R., Penn, S. G., Zastrow, G. M., Hayes, K. R., Pande, K., Glover, E., Silander, T., Craven, M. W., Reddy, J. K., et al. (2001). Identification of toxicologically predictive gene sets using cDNA microarrays. Mol. Pharmacol. 60, 1189–1194. Tukey, R. H., and Strassburg, C. P. (2000). Human UDP-glucuronosyltransferases: Metabolism, expression, and disease. Annu. Rev. Pharmacol. Toxicol. 40, 581–616. Uphoff, C. C., and Drexler, H. G. (2000). Biology of monocyte-specific esterase. Leuk. Lymphoma 39, 257–270. Watanabe, M., Ishiwata, T., Nishigai, K., Moriyama, Y., and Asano, G. (2000). 
Overexpression of keratinocyte growth factor in cancer cells and enterochromaffin cells in human colorectal cancer. Pathol. Int. 50, 363–372. White, A. C., Xu, J., Yin, Y., Smith, C., Schmid, G., and Ornitz, D. M. (2006). FGF9 and SHH signaling coordinate lung growth and development through regulation of distinct mesenchymal domains. Development 133, 1507–1517. Yamayoshi, T., Nagayasu, T., Matsumoto, K., Abo, T., Hishikawa, Y., and Koji, T. (2004). Expression of keratinocyte growth factor/fibroblast growth factor-7 and its receptor in human lung cancer: Correlation with tumour proliferative activity and patient prognosis. J. Pathol. 204, 110–118. Yeh, C. S., Wang, J. Y., Cheng, T. L., Juan, C. H., Wu, C. H., and Lin, S. R. (2006). Fatty acid metabolism pathway play an important role in carcinogenesis of human colorectal cancers by Microarray-Bioinformatics analysis. Cancer Lett. 233, 297–308. Zhang, T., Haws, P., and Wu, Q. (2004). Multiple variable first exons: A mechanism for cell- and tissue-specific gene regulation. Genome Res. 14, 79–89.