Abstract: Many classification algorithms are available for gene expression profile data. Each has its own characteristics, and the performance of a given algorithm varies from one gene expression dataset to another. This paper summarizes the characteristics and research progress of several widely used algorithms (discriminant analysis, decision trees, support vector machines, and ensemble algorithms) to provide guidance for related research and applications.