Transfer learning for improving the genetic effect size estimation with accommodating heterogeneous GWAS summary data

Posted by: Civilization Office    Posted on: 2024-12-23


Speaker: Li Qizhai (李启寨), Researcher, Academy of Mathematics and Systems Science, Chinese Academy of Sciences


Time: 9:00, December 28, 2024


Venue: Room 332, Building 3


Organizer: School of Mathematics and Physics


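About the speaker: Li Qizhai is a Researcher at the Academy of Mathematics and Systems Science, Chinese Academy of Sciences, and Deputy Director of its Institute of Systems Science. Li received a bachelor's degree from the University of Science and Technology of China in 2001 and a Ph.D. from the Academy of Mathematics and Systems Science, Chinese Academy of Sciences, in 2006, then worked as a postdoctoral researcher at the National Cancer Institute (NCI) of the U.S. National Institutes of Health (NIH) from 2006 to 2009. Li became an Elected Member of the International Statistical Institute (ISI) in 2016, received the National Science Fund for Excellent Young Scholars in 2017, was named a Fellow of the American Statistical Association (ASA) in 2020, and received the National Science Fund for Distinguished Young Scholars in 2023. Research interests: biomedical statistics, statistical genetics, and statistical inference for complex data. Li has published more than 110 SCI-indexed papers in journals including Nature Genetics, Science Advances, Angewandte Chemie-International Edition, Cancer Research, American Journal of Human Genetics, Bioinformatics, IEEE Transactions on Pattern Analysis and Machine Intelligence, Journal of the American Statistical Association, Journal of the Royal Statistical Society Series B, and Biometrics. Li currently serves as an executive council member of the Chinese Mathematical Society and of the Chinese Association for Applied Statistics, among other roles.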


Abstract: In genome-wide association studies (GWAS), summary statistics have become one of the most popular formats for data sharing and analysis. This paper focuses on using GWAS summary statistics as auxiliary data to improve the estimation efficiency of polygenic risk score (PRS) models. Existing methods rely heavily on a complete-homogeneity assumption, namely that all studies follow the same parametric model, which is unrealistic given the diverse populations studied in different GWAS. Biological evidence suggests that risk variants can have different effect sizes in different populations. To address this limitation, we introduce SS-trans, a novel framework that leverages heterogeneous summary data from external studies to enhance statistical analysis in the internal study of interest. Unlike existing approaches, our framework relaxes the requirement of complete homogeneity and requires only partial parameter similarity across studies. Our theoretical analysis demonstrates significant improvements in estimation accuracy within the internal study, even when external studies exhibit only local similarity. The advantage of the proposed framework is also supported by extensive numerical experiments on both synthetic data and real data from the Gene Environment Association Studies type 2 diabetes dataset.