Abstract: Objective To compare the variable-selection performance of three penalized logistic regression methods (L1 regularization, L2 regularization and elastic net) on single nucleotide polymorphism (SNP) data. Methods We simulated SNP data under different conditions according to preset parameters, and then assessed the variable-selection performance of the three methods in terms of accuracy rate, error rate and correct index. Results For both the accuracy rate and the error rate, the methods ranked L2 regularization > elastic net > L1 regularization. For the correct index, they ranked elastic net > L1 regularization > L2 regularization. Conclusions Elastic net, which combines the ideas of L1 and L2 regularization, performed best of the three methods in variable selection. In high-dimensional data analysis, it not only keeps the model sparse, which makes the results easier to interpret, but also overcomes the problem that correlated predictor variables cannot enter the model simultaneously.
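The three evaluation criteria named in the abstract can be illustrated with a minimal Python sketch. The abstract does not define them, so the definitions here are assumptions: the accuracy rate is taken as the share of truly associated SNPs that a method selects, the error rate as the share of non-associated SNPs that it selects, and the correct index as their difference (a Youden-style index). The function name `selection_metrics` and the example SNP indices are hypothetical.

```python
def selection_metrics(selected, causal, n_snps):
    """Return (accuracy_rate, error_rate, correct_index) for one selection result.

    Assumed definitions (not stated in the abstract):
      accuracy_rate = selected causal SNPs / all causal SNPs
      error_rate    = selected non-causal SNPs / all non-causal SNPs
      correct_index = accuracy_rate - error_rate (Youden-style index)
    """
    selected, causal = set(selected), set(causal)
    n_null = n_snps - len(causal)
    if not causal or n_null <= 0:
        raise ValueError("need at least one causal and one non-causal SNP")
    true_pos = len(selected & causal)    # causal SNPs that were selected
    false_pos = len(selected - causal)   # non-causal SNPs that were selected
    accuracy_rate = true_pos / len(causal)
    error_rate = false_pos / n_null
    return accuracy_rate, error_rate, accuracy_rate - error_rate


# Hypothetical example: 100 simulated SNPs, indices 0-4 truly associated.
causal = range(5)
dense = selection_metrics(range(100), causal, 100)     # keeps every SNP
sparse = selection_metrics([0, 1, 2, 7], causal, 100)  # keeps four SNPs
print(dense)
print(sparse)
```

Under these assumed definitions, a selection that keeps every SNP reaches the highest accuracy rate but also the highest error rate, so its correct index is zero, while a sparse selection with few false positives scores a higher index. This is one way a method can rank first on accuracy rate yet last on correct index.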
LIU Cong-ti, LI Ang, MEN Zhi-hong, JIANG Bo, XIAO Chun, LIU Yan, LI Zhen-zi. Application of penalized logistic regression methods to the variable selection of SNPs data[J]. Practical Preventive Medicine (实用预防医学), 2016, 23(11): 1395-1399.