Meta-analysis of gene expression profiles related to relapse-free survival in 1,079 breast cancer patients

Title: Meta-analysis of gene expression profiles related to relapse-free survival in 1,079 breast cancer patients
Publication Type: Journal Article
Year of Publication: 2009
Authors: Gyorffy B., Schafer R.
Journal: Breast Cancer Res Treat
Date Published: Dec
ISBN Number: 1573-7217 (Electronic), 0167-6806 (Linking)
Accession Number: 19052860
Keywords: Breast Neoplasms/*genetics/*mortality, Databases, Genetic, Disease-Free Survival, Female, Gene Expression, *Gene Expression Profiling, Humans, Kaplan-Meier Estimate, Oligonucleotide Array Sequence Analysis, Principal Component Analysis, Prognosis, Tumor Markers, Biological/*genetics

The transcriptome of breast cancers has been extensively screened with microarrays, and large sets of genes associated with clinical features have been established. The aim of this study was to validate the original gene sets on a large cohort of raw breast cancer microarray data with known clinical follow-up. We recovered gene sets from 20 publications and matched them to Affymetrix HGU133A annotations. Raw Affymetrix HGU133A microarray data were extracted from GEO and MAS5-normalized. To classify patients using the selected gene sets, we applied prediction analysis of microarrays and constructed Kaplan-Meier plots. A new classification including all patients was generated using supervised principal components analysis. Seven studies including 1,470 patients were downloaded from GEO. Notably, we uncovered 641 microarrays among them representing 251 individual tumor specimens, which had been repeatedly described under independent GEO identifiers. We excluded all redundant data and used the remaining 1,079 samples. Eight of the 20 gene sets were able to predict response at a significance of P < 0.05. Discriminating good and poor prognosis groups on the basis of gene expression data alone resulted in high significance (P = 1.8E-12). A model including genes fitted by both gene expression and clinical covariates (lymph node status and grade) contains 44 genes and can predict response at P = 9.5E-7. The outcome provides a ranking of the gene lists with regard to their applicability to an independent dataset. We established a consensus predictor combining the available clinical and gene expression data. The database comprising expression profiles of 1,079 breast cancers can be used to classify individual patients.
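
The Kaplan-Meier estimate named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `kaplan_meier` and the toy follow-up data are assumptions made up for illustration, and the study itself would have used dedicated statistical software on the real cohort.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed relapse.

    times  : follow-up time of each patient
    events : 1 if relapse was observed at that time, 0 if censored
    (hypothetical helper, for illustration only)
    """
    relapses = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)      # everyone leaving the risk set at time t
    at_risk = len(times)        # patients still under observation
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = relapses.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Toy data (invented, not from the study): times in years, 0 = censored.
toy_times = [2, 3, 3, 5, 8, 8, 10]
toy_events = [1, 1, 0, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
for t, s in toy_curve:
    print(f"t = {t}: relapse-free fraction = {s:.3f}")
```

In a setting like the one described, one such curve would be computed per predicted prognosis group, and the groups would then be compared with a significance test such as the log-rank test, which this sketch does not implement.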