Peng, H.: Minimum Redundancy Feature Selection from Microarray Gene Expression Data. Journal of Bioinformatics and Computational Biology 3(2), 185-205
Abstract:
Selecting a small subset out of the thousands of genes in microarray data is important for accurate classification of phenotypes. Widely used methods typically rank genes according to their differential expression among phenotypes and pick the top-ranked genes. We observe that feature sets so obtained have certain redundancy and study methods to minimize it. We propose a minimum redundancy - maximum relevance (MRMR) feature selection framework. Genes selected via MRMR provide a more balanced coverage of the space and capture broader characteristics of phenotypes. They lead to significantly improved class predictions in extensive experiments on 6 gene expression data sets: NCI, Lymphoma, Lung, Child Leukemia, Leukemia, and Colon. Improvements are observed consistently among 4 classification methods: naive Bayes, linear discriminant analysis, logistic regression, and support vector machines. SUPPLEMENTARY: The top 60 MRMR genes for each of the datasets are listed in http://crd.lbl.gov/~cding/MRMR/. More information related to MRMR methods can be found at http://www.hpeng.net/.
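The contrast between ranking genes by relevance alone and the MRMR idea can be illustrated with a minimal sketch. It assumes discretized expression values, mutual information as the measure of both relevance and redundancy, and a greedy "relevance minus mean redundancy" score, which is one common way to state the criterion and not necessarily the exact variant used in the paper's experiments. The function names `mutual_information` and `mrmr_select` and the toy genes `g1`, `g2`, `g3` are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def mrmr_select(features, labels, k):
    """Greedily pick k feature names (assumed difference-form criterion).

    features: dict mapping a feature name to a list of discrete values.
    labels:   list of class labels, same length as each feature column.
    """
    relevance = {
        name: mutual_information(col, labels) for name, col in features.items()
    }
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:

        def score(name):
            if not selected:
                return relevance[name]
            # Mean mutual information with the features already chosen.
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (made up for illustration): four classes, three candidate "genes".
labels = [0, 0, 1, 1, 2, 2, 3, 3]
features = {
    "g1": [0, 0, 0, 0, 1, 1, 1, 1],  # separates classes {0, 1} from {2, 3}
    "g2": [0, 0, 0, 0, 1, 1, 1, 1],  # exact duplicate of g1 (fully redundant)
    "g3": [0, 0, 1, 1, 0, 0, 1, 1],  # separates classes {0, 2} from {1, 3}
}
selected = mrmr_select(features, labels, k=2)
print(selected)
```

In this toy setting all three genes are equally relevant to the labels, so a relevance-only ranking has no reason to prefer `g3` over the duplicate `g2`. The redundancy term penalizes the duplicate, so the selected pair combines `g3` with one of the two identical genes and together distinguishes all four classes.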
Keywords:
arrays; feature extraction; genetics; redundancy; microarray gene expression data; minimum redundancy - maximum relevance framework; minimum redundancy feature selection; phenotypes; spectrum characteristic; bioinformatics
DOI:
10.1142/S0219720005001004
Year:
2005