Abstract: Objective The modified significance analysis of microarray-1 (MSAM1) method and the modified significance analysis of microarray-2 (MSAM2) method were obtained by improving the significance analysis of microarray (SAM) method with the Gaussian kernel function and the Euclidean distance function, respectively. The two methods were compared with the original SAM method, the support vector machine recursive feature elimination (SVM-RFE) method, and the Relief method to evaluate their gene selection and classification prediction ability in gene expression data. Methods The leukemia data set was obtained from the golubEsets package in Bioconductor (Golub et al. identified 50 differentially expressed genes in this data set). The five gene selection methods were implemented in R. Gene selection ability and classification prediction ability were evaluated by the correct rate and the area under the ROC curve (AUC). The Kruskal-Wallis H test was used to compare the between-group differences in the correct rate and the AUC among the five methods, and the SNK-q test was used for further pairwise comparisons. Results MSAM1 and MSAM2 achieved the best correct rate and AUC, followed by the SAM and SVM-RFE methods, and the Relief method ranked last. The between-group differences among the five methods were statistically significant (H=150.333, P<0.0001; H=293.2579, P<0.0001). The pairwise comparisons showed no statistically significant difference between MSAM1 and MSAM2 (P>0.05), but the differences between these two methods and each of the other three methods were statistically significant (P<0.05). Conclusions The weighted SAM methods, modified with the Gaussian kernel function and the Euclidean distance function, improve the gene selection and classification prediction ability of the SAM method and yield more stable results when applied to real gene expression data.
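As a rough illustration of the quantities named in the abstract, the sketch below computes a two-class SAM-style relative difference for a single gene and a rank-based AUC. It is only a minimal sketch, assuming the standard statistic d = (mean_a - mean_b) / (s + s0) described by Tusher et al. [5]. The abstract does not specify the Gaussian kernel and Euclidean distance weightings of MSAM1 and MSAM2, so they are not reproduced here. The function names, the `s0` default, and the sample values are hypothetical, and Python is used purely for illustration (the study itself used R).

```python
from math import sqrt
from statistics import mean


def sam_statistic(group_a, group_b, s0=0.0):
    """SAM-style relative difference for one gene across two classes.

    s0 is the small positive "fudge" constant added to the denominator;
    its value here is an assumption, not taken from the paper.
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = mean(group_a), mean(group_b)
    # Pooled sum of squared deviations over both classes.
    ss = sum((x - mean_a) ** 2 for x in group_a) + sum(
        (x - mean_b) ** 2 for x in group_b
    )
    # Gene-specific scatter (pooled standard error of the mean difference).
    s = sqrt((1 / n_a + 1 / n_b) * ss / (n_a + n_b - 2))
    return (mean_a - mean_b) / (s + s0)


def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical expression values for one gene in two classes.
d = sam_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
# Hypothetical classifier scores for positive and negative samples.
area = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

Genes would then be ranked by the absolute value of `d`. A larger `s0` shrinks the statistic for genes with very small scatter, which is the usual motivation for including it.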
任雨冬, 陆震, 李婧惟, 刘艳. 基因表达数据中加权SAM法的基因选择和分类预测研究[J]. 实用预防医学, 2020, 27(12): 1537-1539.
REN Yu-dong, LU Zhen, LI Jing-wei, LIU Yan. Gene selection and classification prediction of weighted SAM method in gene expression data[J]. Practical Preventive Medicine, 2020, 27(12): 1537-1539.
[1] 张丽娟, 李舟军. 微阵列数据癌症分类问题中的基因选择[J]. 计算机研究与发展, 2009, 46(5):794-802.
[2] Saeys Y, Inza I, Larrañaga P. A review of feature selection techniques in bioinformatics[J]. Bioinformatics, 2007, 23(19):2507-2517.
[3] Kang S, Song J. Robust gene selection methods using weighting schemes for microarray data analysis[J]. BMC Bioinformatics, 2017, 18(1):389.
[4] Golub TR, Slonim DK, Tamayo P, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring[J]. Science, 1999, 286(5439):531-537.
[5] Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response[J]. Proc Natl Acad Sci USA, 2001, 98(9):5116-5121.
[6] Grace C, Nacheva EP. Significance analysis of microarrays (SAM) offers clues to differences between the genomes of adult Philadelphia positive ALL and the lymphoid blast transformation of CML[J]. Cancer Inform, 2012, 11(11):173-183.
[7] Shahjaman M, Kumar N, Mollah MMH, et al. Robust significance analysis of microarrays by minimum β-divergence method[J]. Biomed Res Int, 2017, 2017:1-18.
[8] Hossain MR, Bassel GW, Pritchard J, et al. Trait specific expression profiling of salt stress responsive genes in diverse rice genotypes as determined by modified significance analysis of microarrays[J]. Front Plant Sci, 2016, 7:567.
[9] 侯艳, 谢宏宇, 张晓凤, 等. 高维组学数据的变量筛选方法及其应用[J]. 中国卫生统计, 2016, 33(3):521-526.
[10] Vapnik V. The nature of statistical learning theory[M]. Springer Science & Business Media, 2013:355.
[11] 丁世飞, 齐丙娟, 谭红艳. 支持向量机理论与算法研究综述[J]. 电子科技大学学报, 2011, 40(1):2-10.
[12] 张学工. 关于统计学习理论与支持向量机[J]. 自动化学报, 2000, 26(1):32-42.
[13] Guyon I, Weston J, Barnhill S, et al. Gene selection for cancer classification using support vector machines[J]. Mach Learn, 2002, 46(1-3):389-422.
[14] Kira K, Rendell LA. The feature selection problem: traditional methods and a new algorithm[C]//Proceedings of the Ninth National Conference on Artificial Intelligence. New Orleans: AAAI Press, 1992, 2:129-134.