Construction of risk prediction model for lung cancer based on data mining technology

doi:10.3969/j.issn.1006-3110.2022.11.028


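Abstract
Objective: To build lung cancer risk prediction models from epidemiological characteristics and clinical symptom data using data mining techniques, to evaluate how well each model predicts lung cancer risk, and to select the best-performing model.
Methods: A total of 460 patients with lung cancer and 560 patients with benign lung disease were enrolled, and 16 independent variables covering epidemiological characteristics and clinical symptoms were collected. The subjects were randomly divided into a training set and a test set at a ratio of 3:1. Support vector machine (SVM), C5.0 decision tree and artificial neural network (ANN) models were each built to predict lung cancer risk, and their predictive performance was compared.
Results: After feature extraction, 9 variables, including blood-streaked sputum, fever with sweating, and smoking history, were retained as effective variables for model construction. In the test set, the sensitivities of the SVM, C5.0 decision tree and ANN models were 74.1%, 62.5% and 92.9%; the specificities were 76.2%, 80.4% and 64.3%; the positive predictive values were 70.9%, 71.4% and 67.1%; the negative predictive values were 79.0%, 73.2% and 92.0%; the accuracies were 75.3%, 72.5% and 76.9%; and the areas under the curve were 0.752 (95%CI: 0.694-0.803), 0.715 (95%CI: 0.655-0.769) and 0.786 (95%CI: 0.730-0.835), respectively.
Conclusion: The overall performance of the ANN prediction model was better than that of the SVM and C5.0 decision tree models, and it has potential application value in screening populations at high risk of lung cancer.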
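The 3:1 random division described in the Methods can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say whether the split was stratified by diagnosis or which software performed it, so the sketch uses a plain, unstratified shuffle. The function name `split_3_to_1`, the seed value, and the placeholder subject IDs are hypothetical.

```python
import random


def split_3_to_1(records, seed=0):
    """Randomly split records into a training set (3/4) and a test set (1/4).

    Assumption: a simple unstratified shuffle; the study may have used a
    different procedure.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed only for repeatability
    cut = (3 * len(shuffled)) // 4
    return shuffled[:cut], shuffled[cut:]


# Placeholder subjects as (id, label): 460 lung cancer (1) and 560 benign (0),
# matching the group sizes reported in the abstract. IDs are made up.
subjects = [(i, 1) for i in range(460)] + [(460 + i, 0) for i in range(560)]
train, test = split_3_to_1(subjects)
print(len(train), len(test))  # prints: 765 255
```

With 1020 subjects, a 3:1 division gives 765 training and 255 test records; the class balance inside each part varies with the shuffle unless the split is stratified.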


Keywords: lung cancer, support vector machine, decision tree, artificial neural network, risk prediction

Abstract: Objective To establish risk prediction models for lung cancer from epidemiological characteristics and clinical symptom data using data mining techniques, and to evaluate the performance of each model in order to select the optimal predictive model. Methods A total of 460 patients with lung cancer and 560 patients with benign lung disease were selected as subjects, and 16 independent variables comprising epidemiological characteristics and clinical symptoms were collected. All subjects were randomly divided into a training set and a test set at a ratio of 3:1. Using support vector machine (SVM), C5.0 decision tree and artificial neural network (ANN), 3 risk prediction models for lung cancer were established, and their predictive performance was compared. Results After feature extraction, 9 variables, including blood-streaked sputum, fever with sweating, and smoking history, were selected as effective variables for establishing the risk prediction models. In the test set, the sensitivities of the SVM, C5.0 decision tree and ANN models were 74.1%, 62.5% and 92.9%, respectively. The specificities were 76.2%, 80.4% and 64.3%, respectively. The positive predictive values were 70.9%, 71.4% and 67.1%, respectively. The negative predictive values were 79.0%, 73.2% and 92.0%, respectively. The accuracies were 75.3%, 72.5% and 76.9%, respectively. The areas under the curve were 0.752 (95%CI: 0.694-0.803), 0.715 (95%CI: 0.655-0.769) and 0.786 (95%CI: 0.730-0.835), respectively. Conclusion The ANN prediction model had better overall performance than the SVM and C5.0 decision tree models, with the highest sensitivity, negative predictive value, accuracy and area under the curve but the lowest specificity and positive predictive value, and it has potential application value in screening populations at high risk of lung cancer.
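For readers less familiar with the measures reported above, the sketch below shows how sensitivity, specificity, predictive values and accuracy follow from a 2x2 confusion matrix, and how an area under the curve can be computed as the probability that a randomly chosen case scores higher than a randomly chosen control. The counts and scores are hypothetical and are not taken from the study; the function names are illustrative, and the abstract does not state how the authors computed these values.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return standard diagnostic measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all cases
        "specificity": tn / (tn + fp),  # true negatives among all controls
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def auc(scores_pos, scores_neg):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    """
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in scores_pos
        for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical counts and scores, NOT data from the study.
metrics = diagnostic_metrics(tp=80, fp=30, tn=110, fn=20)
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"auc: {example_auc:.3f}")
```

The definitions make the trade-off in the Results visible: a model that labels more subjects as positive raises sensitivity and negative predictive value while lowering specificity, which is the pattern reported for the ANN model.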

Key words: lung cancer; support vector machine; decision tree; artificial neural network; risk prediction

Received: 2021-12-17

CLC number: R734.2

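Corresponding author: ZHANG Gui-fang, E-mail: xxchzhangguifang@126.com.

About the author: HUANG Pu-chao (born 1988), male, from Shangqiu, Henan; master's degree; attending physician; mainly engaged in the diagnosis and treatment of tumors.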

Cite this article:

HUANG Pu-chao, YUAN Hui-jie, ZHANG Gui-fang. Construction of risk prediction model for lung cancer based on data mining technology[J]. Practical Preventive Medicine (实用预防医学), 2022, 29(11): 1390-1394.

Link to this article:

https://www.syyfyx.com/CN/10.3969/j.issn.1006-3110.2022.11.028 or https://www.syyfyx.com/CN/Y2022/V29/I11/1390
