Abstract: Objective To compare the performance of HAPGEN2, gs2.0 and GWAsimulator2 in simulating single nucleotide polymorphism (SNP) data, and to provide guidance for the future choice of SNP data simulation methods. Methods Simulated data were generated by each of the three methods, using real population SNP data as the source data. Simulation performance was evaluated by how well the linkage disequilibrium pattern and the minor allele frequency were reproduced, and the effectiveness of setting the disease locus was evaluated by the χ² difference. Results HAPGEN2 reproduced the linkage disequilibrium pattern better than gs2.0 and GWAsimulator2. gs2.0 and GWAsimulator2 reproduced the minor allele frequency about equally well, and both did so better than HAPGEN2. All three methods could be used to set a single disease locus. Conclusions Each of the three SNP data simulation methods has advantages and disadvantages, and users can choose the appropriate method according to their actual needs.
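The evaluation criteria named in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes genotypes are coded as 0/1/2 copies of one allele, takes the squared genotype correlation as the linkage disequilibrium measure, and uses a 2x2 allelic test for the χ² statistic. The abstract specifies none of these choices, and all function names are hypothetical.

```python
import numpy as np

def minor_allele_freq(genotypes):
    """Minor allele frequency per SNP.

    genotypes: array of shape (n_individuals, n_snps), coded 0/1/2
    (an assumed coding, not taken from the paper).
    """
    p = np.asarray(genotypes, dtype=float).mean(axis=0) / 2.0
    return np.minimum(p, 1.0 - p)

def ld_r2(genotypes):
    """Pairwise squared Pearson correlation between SNP genotype columns,
    used here as an assumed stand-in for the LD measure r^2.
    Monomorphic SNPs yield NaN entries."""
    r = np.corrcoef(np.asarray(genotypes, dtype=float), rowvar=False)
    return r ** 2

def allelic_chi2(case_geno, control_geno):
    """Chi-square statistic of the 2x2 allele-count table for one SNP
    (cases vs. controls, counted allele vs. other allele)."""
    case_geno = np.asarray(case_geno)
    control_geno = np.asarray(control_geno)
    a = case_geno.sum()               # counted alleles in cases
    b = 2 * case_geno.size - a        # other alleles in cases
    c = control_geno.sum()            # counted alleles in controls
    d = 2 * control_geno.size - c     # other alleles in controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return float(n * (a * d - b * c) ** 2 / denom) if denom else 0.0

# Toy data standing in for a real panel and a simulated panel.
real = np.array([[0, 0], [1, 1], [2, 2], [0, 0]])
simulated = np.array([[0, 1], [1, 1], [2, 2], [1, 0]])

# Mean absolute MAF difference: smaller means the simulation
# reproduces the allele frequencies of the source data more closely.
maf_gap = np.abs(minor_allele_freq(real) - minor_allele_freq(simulated)).mean()
# The same idea applied to the LD matrices (NaN entries ignored).
ld_gap = np.nanmean(np.abs(ld_r2(real) - ld_r2(simulated)))
```

Under these assumptions, a simulator that preserves the source data well gives small values of `maf_gap` and `ld_gap`, while a correctly set disease locus shows a larger `allelic_chi2` value than the surrounding null SNPs.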
LIU Yun-liang, XIAO Chun, SHI Xiao-wen, LIU Yan. Comparison of the efficacy of three SNPs data simulation methods[J]. Practical Preventive Medicine (实用预防医学), 2018, 25(2): 152-154.