Abstract: Variable selection and model estimation are active research topics for high-dimensional data, where the problem of dimensionality is increasingly prominent. Traditional statistical analysis methods are no longer applicable because the resulting models are unstable. This paper reviews variable selection methods based on regularized regression for high-dimensional data, covering their underlying principles, applicable data types, advantages and disadvantages, and the selection of tuning parameters.
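The regularized-regression methods surveyed here (lasso and its relatives) are typically fitted by cyclic coordinate descent, where each coefficient is updated in turn by a soft-thresholding step and the tuning parameter λ controls how many coefficients are shrunk exactly to zero. The following numpy-only sketch is not from the paper itself; it illustrates the technique on simulated data, with all function names and the choice of λ being illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, gamma):
    # Soft-thresholding operator: the closed-form solution of the
    # one-dimensional lasso subproblem, and the source of exact zeros.
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    # Cyclic coordinate descent for the lasso objective
    #   (1/2)||y - X beta||^2 + n * lam * ||beta||_1.
    # Each pass updates one coefficient at a time holding the rest fixed.
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove every predictor's contribution
            # except the j-th one.
            r_j = y - X @ beta + X[:, j] * beta[j]
            beta[j] = soft_threshold(X[:, j] @ r_j, n * lam) / col_sq[j]
    return beta

# Simulated sparse regression: only the first 3 of 10 predictors matter.
rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# lam = 0.1 is an illustrative value; in practice the tuning parameter
# is chosen by K-fold cross-validation over a grid of candidates.
beta_hat = lasso_cd(X, y, lam=0.1)
selected = np.flatnonzero(np.abs(beta_hat) > 0.5)
```

Because the penalty zeroes out coefficients whose partial correlation with the residual falls below the threshold, `selected` recovers the truly relevant predictors while discarding the noise variables, which is exactly the variable-selection behavior the review discusses.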
RONG Wen-wen, ZHANG Qi, LIU Yan. Application of variable selection method based on regularized regression to high dimensional data[J]. Practical Preventive Medicine, 2018, 25(6): 645-648.