Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method.
Abstract:
MOTIVATION: We recently introduced a multivariate approach that selects a subset of predictive genes jointly for sample classification based on expression data. We tested the algorithm on colon and leukemia data sets. As an extension to our earlier work, we systematically examine the sensitivity, reproducibility and stability of gene selection/sample classification to the choice of parameters of the algorithm.
METHODS: Our approach combines a Genetic Algorithm (GA) and the k-Nearest Neighbor (KNN) method to identify genes that can jointly discriminate between different classes of samples (e.g. normal versus tumor). The GA/KNN method is a stochastic supervised pattern recognition method. The genes identified are subsequently used to classify independent test set samples.
RESULTS: The GA/KNN method is capable of selecting a subset of predictive genes from a large noisy data set for sample classification. It is a multivariate approach that can capture the correlated structure in the data. We find that for a given data set gene selection is highly repeatable in independent runs using the GA/KNN method. In general, however, gene selection may be less robust than classification.
AVAILABILITY: The method is available at http://dir.niehs.nih.gov/microarray/datamining
CONTACT: LI3
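The METHODS paragraph describes a search over gene subsets (GA) scored by how well a KNN classifier separates the sample classes. The following is a minimal illustrative sketch of that idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: the leave-one-out fitness, the majority vote, the truncation selection with single-gene mutation, all parameter values, and the helper names `make_toy_data`, `knn_loo_accuracy`, `mutate` and `ga_knn`. The toy data are synthetic, with genes 0 and 1 made informative by construction.

```python
import random
from collections import Counter


def make_toy_data(n_per_class=10, n_genes=10, seed=1):
    """Synthetic expression matrix (hypothetical): genes 0 and 1 separate the classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):  # e.g. 0 = normal, 1 = tumor
        for _ in range(n_per_class):
            row = [rng.random() for _ in range(n_genes)]  # uninformative noise
            row[0] = 5.0 * label + rng.uniform(-0.1, 0.1)
            row[1] = -5.0 * label + rng.uniform(-0.1, 0.1)
            X.append(row)
            y.append(label)
    return X, y


def knn_loo_accuracy(X, y, genes, k=3):
    """Leave-one-out KNN accuracy using only the gene columns in `genes`."""
    correct = 0
    for i in range(len(X)):
        # Squared Euclidean distance from sample i to every other sample.
        dists = sorted(
            (sum((X[i][g] - X[j][g]) ** 2 for g in genes), j)
            for j in range(len(X))
            if j != i
        )
        votes = Counter(y[j] for _, j in dists[:k])
        if votes.most_common(1)[0][0] == y[i]:
            correct += 1
    return correct / len(X)


def mutate(genes, n_genes, rng):
    """Replace one gene in the subset with a gene not currently in it."""
    child = list(genes)
    candidates = [g for g in range(n_genes) if g not in child]
    child[rng.randrange(len(child))] = rng.choice(candidates)
    return tuple(sorted(child))


def ga_knn(X, y, subset_size=2, pop_size=20, generations=30, k=3, seed=0):
    """Evolve gene subsets; fitness is leave-one-out KNN accuracy."""
    rng = random.Random(seed)
    n_genes = len(X[0])

    def fitness(chromosome):
        return knn_loo_accuracy(X, y, chromosome, k)

    pop = [
        tuple(sorted(rng.sample(range(n_genes), subset_size)))
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Keep the better half unchanged, refill with mutated copies of it.
        elite = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = [
            mutate(rng.choice(elite), n_genes, rng)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    best = max(pop, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    X, y = make_toy_data()
    best_genes, accuracy = ga_knn(X, y)
    print("selected genes:", best_genes, "LOO accuracy:", accuracy)
```

Because the search is stochastic, different seeds can return different gene subsets that classify equally well, which is one way to picture the abstract's remark that gene selection may be less robust than classification.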
Keywords:
Algorithms; Colonic Neoplasms; Databases, Factual; Gene Expression; Lymphoma, B-Cell
DOI:
10.1093/bioinformatics/17.12.1131
Year:
2001