"景先生毕设|www.jxszl.com

A Multi-Trait Association Analysis Method Based on Clustering Algorithms and Principal Component Analysis

2024-11-03 10:15, edited by www.jxszl.com (景先生毕设)

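Contents
Abstract (Chinese)
Keywords
Abstract (English)
1 Introduction
2 Literature Review
3 Materials and Methods
3.1 Genetic Model
3.2 K-means Clustering Algorithm
3.3 Choosing the Number of Clusters K
3.4 Selecting Representative Phenotypes
3.5 Experimental Materials
3.5.1 Simulated Data
3.5.2 Real Plant Data
4 Results and Analysis
4.1 Simulation Results
4.2 Analysis of Real Plant Data
5 Discussion
5.1 Summary of the Work
5.2 Future Research
Acknowledgments
References
Appendix
Academic Achievements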
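Abstract
Genome-wide association studies have identified thousands of genetic variants associated with complex traits and diseases, and have been applied successfully in human, plant, and animal genetics. Compared with analyzing one phenotype at a time, jointly analyzing multiple phenotypes exploits the correlation among phenotypes, which increases the statistical power of the test and helps uncover the biological basis of genetic pleiotropy. This thesis proposes a multi-trait association analysis method based on the K-means clustering algorithm and principal component analysis. The method reduces the dimensionality of multi-phenotype data and speeds up the analysis while also improving detection power. We carried out twelve groups of simulation studies (four scenarios under each of three genetic structures) and analyzed 19 phenotypes from a real plant data set, validating the method in terms of statistical power, computation time, and the number of SNPs and genes detected. The results show that the new method has advantages over traditional multivariate analysis of variance and univariate analysis of variance. The choice of representative phenotypes also matters greatly: even with the same clustering information, different choices can lead to very different results, and the results across scenarios indicate that the proposed method is more robust. Finally, we discuss possible improvements to the method and aspects that require further study.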
JOINT ANALYSIS METHOD OF MULTIPLE PHENOTYPES BASED ON CLUSTERING ALGORITHM AND PRINCIPAL COMPONENT ANALYSIS
ABSTRACT
To date, genome-wide association studies have identified thousands of genetic variants associated with complex traits and diseases, and the approach has been applied successfully in human, plant, and animal genetics. Compared with analyzing only one phenotype at a time, joint analysis of multiple phenotypes can use the correlation between phenotypes to improve the statistical power of the test and to uncover the biological basis of genetic pleiotropy. This paper proposes a multi-trait association analysis method based on the K-means clustering algorithm and principal component analysis, which reduces the dimension of multi-phenotype data and speeds up the analysis while also improving detection power. Twelve sets of simulation studies, covering four scenarios under each of three genetic structures, are carried out, and 19 phenotypes in a real plant data set are analyzed. The method is evaluated in terms of statistical power, computing time, and the number of detected SNPs and genes. The results show that the new method has advantages over traditional multivariate analysis of variance and univariate analysis of variance. The choice of representative phenotypes is also very important: even with the same clustering information, different choices may lead to very different results. The results across scenarios indicate that the proposed method is more robust. Finally, we discuss possible improvements to the method and the areas that need further study.
KEY WORDS: Genome-wide Association Studies; Clustering Algorithms; Principal Component Analysis; Statistical Power; Genetic Structure

1 Introduction
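As a rough illustration of the dimension-reduction idea summarized in the abstract, the sketch below clusters correlated phenotypes with K-means and then condenses each cluster into a single representative phenotype. Several details are assumptions made only for this example and are not taken from the thesis: the representative is taken to be the first principal component of each cluster, K-means uses a simple deterministic farthest-first initialization, and the function names `kmeans_labels` and `representative_phenotypes` as well as the simulated data are hypothetical. The downstream association test is not shown.

```python
import numpy as np


def kmeans_labels(points, k, n_iter=100):
    """Lloyd's K-means with farthest-first initialization (an assumption
    chosen for determinism; not necessarily the thesis's initialization)."""
    centers = [points[0]]
    for _ in range(1, k):
        # Squared distance of every point to its nearest chosen center.
        nearest = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[nearest.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assign each point to its closest center.
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def representative_phenotypes(Y, k):
    """Y has shape (n_individuals, n_traits).

    Traits are standardized and clustered (each trait is one point), then
    each cluster is replaced by its first principal component scores.
    """
    Z = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    labels = kmeans_labels(Z.T, k)
    reps = []
    for j in np.unique(labels):
        block = Z[:, labels == j]  # columns are already centered
        U, S, _ = np.linalg.svd(block, full_matrices=False)
        reps.append(U[:, 0] * S[0])  # PC1 scores for this cluster
    return np.column_stack(reps), labels


# Hypothetical simulated data: six traits driven by two latent factors.
rng = np.random.default_rng(0)
n = 400
factors = rng.standard_normal((n, 2))
Y = factors[:, [0, 0, 0, 1, 1, 1]] + 0.3 * rng.standard_normal((n, 6))

reps, labels = representative_phenotypes(Y, k=2)
print(labels)      # cluster label of each trait
print(reps.shape)  # one representative phenotype per cluster
```

Because the traits are standardized before clustering, the squared Euclidean distance between two traits equals 2n(1 - r), where r is their sample correlation, so the clusters group strongly positively correlated phenotypes. In practice a library implementation such as scikit-learn's `KMeans` would likely replace the hand-written loop.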

Original link: http://www.jxszl.com/jsj/sxtj/606743.html