Seminar Information
Automated feature extraction and selection for high-throughput phenotyping
Published: 2016-11-11
Speaker: Yu Sheng
Location: Room 311, Wang Ke-Zhen Building, Peking University
Date: 2016-11-18
Time: 14:00-15:00
 

CBI seminar
Title: Automated feature extraction and selection for high-throughput phenotyping
Speaker: Dr. Yu Sheng, Assistant Professor of Statistics in the Center for Statistical Science of Tsinghua University
Time: 14:00-15:00, Friday, November 18, 2016
Location: Room 311, Wang Ke-Zhen Building, Peking University
Abstract:
With the rapid adoption of electronic medical records (EMR), medicine and healthcare have become one of the most important fields for big data applications. One important application in medical research is EMR-based phenotyping, which uses machine learning algorithms to identify patients with certain phenotypes. The conventional procedure for designing a phenotyping algorithm requires medical experts to work with statisticians and medical informaticians to decide which variables to use and which medical terms to search for, and a single algorithm typically takes months to finalize. We propose a data-driven method that automates the algorithm design process and can achieve higher accuracy than even expert-designed algorithms. We use publicly available knowledge sources, such as Wikipedia, to collect an initial set of candidate features. Billing codes and the natural language variable of the target phenotype are used to create surrogates of the gold-standard labels, and penalized logistic regression models are trained repeatedly with bootstrap resampling to predict the surrogates in order to evaluate the informativeness of the candidate features. Only a succinct set of highly informative features passes the data-driven screening and enters the final model, which predicts the true gold-standard labels. This method has been implemented in the development of large-scale biobanks at top-ranked hospitals in the U.S.
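The screening step described in the abstract (repeatedly fitting a penalized logistic regression on bootstrap samples to predict surrogate labels, then keeping the features that are selected consistently) can be illustrated with a minimal sketch. This is not the speaker's implementation: the toy data, the L1 penalty solved by proximal gradient descent, and the names `fit_l1_logistic`, `selection_frequency`, `lam`, and the 0.8 cutoff are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(x, t):
    # Proximal operator of the L1 penalty: shrink x toward zero by t.
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def fit_l1_logistic(X, y, lam=0.1, lr=1.0, n_iter=100):
    """L1-penalized logistic regression via proximal gradient descent.

    The intercept b is not penalized. lam, lr and n_iter are
    illustrative defaults, not values taken from the talk.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [soft_threshold(wj - lr * gj / n, lr * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b


def selection_frequency(X, y, n_boot=20, lam=0.1, seed=0):
    """Fraction of bootstrap fits in which each feature gets a nonzero weight."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        w, _ = fit_l1_logistic([X[i] for i in idx], [y[i] for i in idx], lam=lam)
        for j in range(p):
            if w[j] != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


# Toy data (hypothetical): 5 candidate features, of which only features
# 0 and 1 drive the surrogate label; features 2-4 are pure noise.
rng = random.Random(42)
X = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(200)]
y = [1 if rng.random() < sigmoid(2.0 * x[0] - 2.0 * x[1]) else 0 for x in X]

freq = selection_frequency(X, y)
# Keep features selected in at least 80% of bootstrap fits (arbitrary cutoff).
selected = [j for j, f in enumerate(freq) if f >= 0.8]
print("selection frequency:", freq)
print("features passing the screen:", selected)
```

In the workflow the abstract describes, only the features that pass this screen would then enter a final model trained on the gold-standard labels; that last step is omitted here.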
Speaker Bio:
Dr. Yu Sheng is an Assistant Professor of Statistics in the Center for Statistical Science of Tsinghua University. Dr. Yu received his BS and MA degrees in statistics from Nankai University and the University of Michigan, and his PhD degree in systems engineering (operations research) from the George Washington University. He began his research in medical informatics during his research work at Harvard University. His current research interests include deep understanding of medical language with machine learning methods, internet and data-driven knowledge extraction, and supervised and unsupervised EMR analysis.
All are welcome!
