Contact Information
BG 38A RM 10N1003E
8600 Rockville Pike
Bethesda, MD 20894-3825
Tel: 301-594-3218
Fax: 301-480-0814
[email protected]
Yifan Peng
Research Fellow
Biomedical Text Mining Group (with Zhiyong Lu, Ph.D.)
Computational Biology Branch
NCBI, NLM, NIH
Research Interests
- Bio Text Mining/Bio Natural Language Processing
- Clinical Natural Language Processing
- Deep Learning for Healthcare
- Biomedical Image Processing
Education
- Ph.D., Computer Science, University of Delaware, 2016 (supervisors: Dr. Cathy H. Wu and Dr. Vijay K. Shanker)
- M.S., Computer Science, University of Delaware, 2013
- M.Eng., Signal and Information Processing, Peking University, China, 2010
- B.S.E., Computer Science and Technology, Beijing University of Technology, China, 2007
Professional Activities
- Academic Editor: PLoS ONE
- Program Committee: The BioNLP Workshop, 2019; The ICCV Visual Recognition for Medical Image Workshop, 2019; The IEEE International Conference on Healthcare Informatics (ICHI), 2019; The IEEE International Conference on Biomedical and Health Informatics (ICBHI), 2016-Present; ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB), 2015
- Reviewer: Bioinformatics, BMC Bioinformatics, IEEE/ACM Transactions on Computational Biology and Bioinformatics, Journal of Biomedical Semantics, Database (Oxford), PLoS One, IEEE Transactions on Knowledge and Data Engineering, Journal of Digital Imaging, AMIA Annual Symposium, AMIA Informatics Summit, Computer Methods and Programs in Biomedicine, Knowledge-Based Systems
- Poster judge: Annual Graduate Student Research Symposium, NIH, 2017-Present
Work Experience
- Research Fellow, NCBI, NIH, Bethesda, MD, 2017-Present
- Intern/Visiting scholar, NCBI, NIH, Bethesda, MD, 2015
- Intern, Google Inc., Mountain View, CA, 2014
- Intern, Natural Language Processing Research Group, Sogou Corp., Beijing, China, 2009
- Intern, China Research Lab, IBM Corp., Beijing, China, 2009
- Editor and Reporter, Programmer Magazine and CSDN, Beijing, China, 2008
Grant Awards
- 1 K99 LM013001-01, NIH/NLM, A framework to enhance radiology structured report by invoking NLP and DL: Models and Applications. Role: PI (2019-2024)
Selected Recent Publications
* co-first author. Google Scholar
Journal articles
- DeepSeeNet: A deep learning model for automated classification of patient-based age-related macular degeneration severity from color fundus photographs. Peng Y*, Dharssi S*, Chen Q, Keenan T, Agrón E, Wong W, Chew E, Lu Z. Ophthalmology. 2018. arXiv: 1811.07492
- ML-Net: multi-label classification of biomedical texts with deep neural networks. Du J, Chen Q, Peng Y, Xiang Y, Tao C, Lu Z. Journal of the American Medical Informatics Association (JAMIA). 2019, 1-7. ocz085. arXiv: 1811.05475
- A deep learning approach for automated detection of geographic atrophy from color fundus photographs. Keenan T*, Dharssi S*, Peng Y*, Chen Q, Agrón E, Wong W, Lu Z, Chew E. Ophthalmology. 2019. arXiv: 1906.03153
- Extracting chemical-protein relations with ensembles of SVM and deep learning models. Peng Y, Rios A, Kavuluru R, Lu Z. Database. 2018, 1-9. bay073.
- LitVar: a semantic search engine for linking genomic variant data in PubMed and PMC. Allot A*, Peng Y*, Wei CH, Lee K, Phan L, Lu Z. Nucleic Acids Research. 2018. gky355.
- Opportunities and obstacles for deep learning in biology and medicine. Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, Ferrero E, Agapow P-M, Zietz M, Hoffman MM, Xie W, Rosen GL, Lengerich BJ, Israeli J, Lanchantin J, Woloszynek S, Carpenter AE, Shrikumar A, Xu J, Cofer EM, Lavender CA, Turaga SC, Alexandari AM, Lu Z, Harris DJ, DeCaprio D, Qi Y, Kundaje A, Peng Y, Wiley LK, Segler MHS, Boca SM, Swamidass SJ, Huang A, Gitter A, Greene CS. Journal of The Royal Society Interface. 2018, 15(141), 20170387.
- Improving chemical disease relation extraction with rich features and weakly labeled data. Peng Y, Chi CH, Lu Z. Journal of Cheminformatics. 2016, 8(53), 1-12.
- BioC-compatible full-text passage detection for protein-protein interactions using Extended Dependency Graph. Peng Y, Arighi C, Wu CH, Vijay-Shanker K. Database. 2016, 1-8. baw072.
- Assessing the state of the art in biomedical relation extraction: overview of the BioCreative V chemical-disease relation (CDR) task. Wei C*, Peng Y*, Leaman R, Davis AP, Mattingly CJ, Li J, Wiegers TC, Lu Z. Database. 2016, 1-8. baw032.
- BioCreative V BioC track overview: collaborative biocurator assistant task for BioGRID. Dogan R, Chatr-aryamontri A, Chang C, Oughtred R, Rust J, Batista-Navarro R, Carter J, Ananiadou S, Matos S, Santos A, Campos D, Oliveira J, Singh O, Jonnagaddala J, Dai HJ, Su ECY, Chang YC, Su YC, Chu CH, Chen CC, Hsu WL, Peng Y, Arighi C, Wu CH, Vijay-Shanker K, Aydın F, Hüsünbeyi ZM, Özgür A, Shin SY, Kwon D, Tyers M, Dolinski K, Wilbur J, Comeau D. Database. 2016, 1-8. baw121.
- miRTex: A text mining system for miRNA-Gene relation extraction. Li G, Ross K, Arighi C, Peng Y, Wu CH, Vijay-Shanker K. PLOS Computational Biology. 2015, 11(9), 1-24.
- A generalizable NLP framework for fast development of pattern-based biomedical relation extraction systems. Peng Y, Torii M, Wu CH, Vijay-Shanker K. BMC Bioinformatics. 2014, 15(285), 1-18.
- iSimp in BioC standard format: enhancing the interoperability of a sentence simplification system. Peng Y, Tudor CO, Torii M, Wu CH, Vijay-Shanker K. Database. 2014, 1-8. bau038.
- BioC interoperability track overview. Comeau DC, Batista-Navarro R, Dai HJ, Doğan RI, Jimeno Yepes A, Khare R, Lu Z, Marques H, Mattingly C, Neves M, Peng Y, Rak R, Rinaldi F, Tsai TH, Verspoor K, Wiegers T, Wu CH, Wilbur WJ. Database. 2014, 1-12. bau053.
- BioC: a minimalist approach to interoperability for biomedical text processing. Comeau DC, Doğan RI, Ciccarese P, Cohen KB, Krallinger M, Leitner F, Lu Z, Peng Y, Rinaldi F, Torii M, Valencia V, Verspoor K, Wiegers TC, Wu CH, Wilbur WJ. Database. 2013, 1-15. bat064.
Conference proceedings
- Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets. Peng Y, Yan S, Lu Z. In Proceedings of the Workshop on Biomedical Natural Language Processing (BioNLP). 2019.
- Holistic and Comprehensive Annotation of Clinically Significant Findings on Diverse CT Images: Learning from Radiology Reports and Label Ontology. Yan K, Peng Y, Standfort V, Bagheri M, Lu Z, Summers RM. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2019, 8523-8532.
- MULAN: Multitask Universal Lesion Analysis Network for Joint Lesion Detection, Tagging, and Segmentation. Yan K, Tang Y, Peng Y, Standfort V, Bagheri M, Lu Z, Summers RM. International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI). 2019.
- A self-attention based deep learning method for lesion attribute detection from CT reports. Peng Y*, Yan K, Standfort V, Summers RM, Lu Z. IEEE International Conference on Healthcare Informatics (ICHI). 2019.
- BioSentVec: creating sentence embeddings for biomedical texts. Chen Q*, Peng Y*, Lu Z. IEEE International Conference on Healthcare Informatics (ICHI). 2019.
- A multi-task deep learning framework for the classification of Age-related Macular Degeneration. Chen Q*, Peng Y*, Keenan T, Dharssi S, Agrón E, Wong WT, Chew EY, Lu Z. In AMIA 2019 Informatics Summit. 2019, 505-514.
- Fine-grained lesion annotation in CT images with knowledge mined from radiology reports. Yan K, Peng Y, Lu Z, Summers R. IEEE International Symposium on Biomedical Imaging (ISBI). 2019, 285-288.
- TieNet: Text-Image Embedding Network for Common Thorax Disease Classification and Reporting in Chest X-rays. Wang X*, Peng Y*, Lu L, Lu Z, Summers R. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2018, 9049-9058.
- NegBio: a high-performance tool for negation and uncertainty detection in radiology reports. Peng Y, Wang X, Lu L, Bagheri M, Summers R, Lu Z. In AMIA 2018 Informatics Summit. 2018, 188-196.
- Chemical-protein relation extraction with ensembles of SVM, CNN, and RNN models. Peng Y, Rios A, Kavuluru R, Lu Z. In Proceedings of the BioCreative VI Workshop. 2017, 148-151.
- Deep learning for extracting protein-protein interactions from biomedical literature. Peng Y, Lu Z. In Proceedings of BioNLP workshop. 2017, 29-38.
- BioCreative VI Precision Medicine Track: creating a training corpus for mining protein-protein interactions affected by mutations. Doğan RI, Chatr-aryamontri A, Kim S, Wei C, Peng Y, Comeau D, Lu Z. In Proceedings of BioNLP workshop. 2017, 171-175.
- ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases. Wang X, Peng Y, Lu L, Bagheri M, Lu Z, Summers R. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017, 2097-2106.
- An extended dependency graph for relation extraction in biomedical texts. Peng Y, Gupta S, Wu CH, Vijay-Shanker K. In Proceedings of BioNLP workshop. 2015, 21-30.
- Extended dependency graph for BioC-compatible protein-protein interaction (PPI) passage detection in full-text articles. Peng Y, Arighi C, Wu CH, Vijay-Shanker K. In Proceedings of the BioCreative V Workshop. 2015, 30-35.
- Overview of the BioCreative V chemical disease relation (CDR) task. Wei C, Peng Y, Leaman R, Davis A, Mattingly C, Li J, Wiegers T, Lu Z. In Proceedings of the BioCreative V Workshop. 2015, 154-166.
- Enhancing the interoperability of iSimp by using the BioC format. Peng Y, Tudor CO, Torii M, Wu CH, Vijay-Shanker K. In Proceedings of the BioCreative IV Workshop. 2013, 5-9.
- iSimp: a sentence simplification system for biomedical text. Peng Y, Tudor CO, Torii M, Wu CH, Vijay-Shanker K. IEEE International Conference on Bioinformatics and Biomedicine (BIBM2012). 2012, 211-216.
Abstracts
- Comprehensive Lesion Tagging on Diverse CT Images: Learning from Radiology Reports and Label Ontology. Yan K, Peng Y, Standfort V, Bagheri M, Lu Z, Summers RM. In RSNA Annual Meeting. 2019. (talk)
- MULAN: Multitask Universal Lesion Analysis Network for Joint Lesion Detection, Tagging, and Segmentation in CT Images. Yan K, Tang Y, Peng Y, Standfort V, Bagheri M, Lu Z, Summers RM. In RSNA Annual Meeting. 2019. (talk)
- Deep learning prediction of progression to late age-related macular degeneration in the Age-Related Eye Disease Study (AREDS) using deep feature extraction and survival analysis. Keenan TD, Peng Y, Chen Q, Agrón E, Wong WT, Lu Z, Chew EY. In AAO 2019. 2019. (poster)
- Integrative Analyses of dbSNP For Variant Prioritization and Interpretation. Phan L, Ward M, Allot A, Peng Y, Wei CH, Lee K, Lu Z, Wang J, Youkharibache P, Zhang D, Lanczycki C, Geer L, Geer R, Marchler-Bauer A, Madej T, Lu S, Marchler G, Wang Y, Bryant S, Rose P, Holmes JB, Kattman BL. In Biological Data Science. 2018.
- Literature Mining to Improve the Prioritization, Curation, and Integration of Knowledge for Clinically Relevant Variants. Wei CH, Phan L, Allot A, Peng Y, Lee K, Maiti R, Hefferon T, Feltz J, Lu Z. In ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB). 2018. (talk)
- Discovering drug and retinal disease association patterns from electronic medical records: a text mining approach. Dharssi S, Peng Y, Leaman R, Chew EY, Lu Z. Investigative Ophthalmology & Visual Science. 2018, 59(9), 4149. (ARVO E-Abstract 2924709)
- Automatic Classification and Reporting of Multiple Common Thorax Diseases Using Chest Radiograph. Wang X, Peng Y, Lu L, Bagheri M, Lu Z, Summers R. In RSNA Annual Meeting. 2018. (talk)
- A Deep Learning Approach for Automated Detection and Quantification of Geographic Atrophy from Color Fundus Photographs. Dharssi S*, Peng Y*, Chen Q, Agrón E, Wong W, Chew E, Lu Z. In AAO 2018. 2018. (talk)
- Discovering drug and retinal disease association patterns from electronic medical records: a text mining approach. Dharssi S, Peng Y, Leaman R, Chew EY, Lu Z. In ARVO Annual Meeting. 2018:59. ARVO E-Abstract 2924709. (poster)
- Text Mining Radiology Reports for Deep Learning Radiology Images. Peng Y, Wang X, Lu L, Bagheri M, Summers R, Lu Z. In AMIA 2017 Annual Symposium. 2017, 157-158. (talk)
- A semantic search engine for linking genomic variant data in PubMed and PMC. Allot A, Peng Y, Wei CH, Lee K, Phan L, Lu Z. In International Society for Computational Biology. 2018. (poster)
- Text mining tools for biocuration in iProLINK web portal. Arighi C, Ding R, Gupta S, Li G, Mahmood AA, Peng Y, Ren J, Ross K, Tudor CO, Huang H, Schimidt C, Wu CH, Vijay-Shanker K. In Biocuration. 2015. (poster)
Patents
- WO2018176035A1, US Application No. 62/476,029, filed 03/24/2017. Method and System of Building Hospital-Scale Medical Image Database. Wang X, Peng Y, Lu L, Bagheri M, Lu Z, Summers R.
Talks
- Overview of natural language processing in biomedical and cancer research. Invited talk at NCI Center for Biomedical Informatics and Information Technology (CBIIT) Cancer Data Science workshop. 01/24/2019.
- Clinical NLP and deep learning for disease classification and reporting in Chest X-rays. Invited talk at ICSA 2018 Applied Statistics Symposium. 06/17/2018.
- Clinical NLP and deep learning for disease classification and reporting in Chest X-rays. Invited talk at Computer and Information Science Department, University of Delaware. 06/25/2018.
- Deep learning for extracting protein-protein interactions from biomedical literature. NIH-CSSA Symposium. 06/02/2017.
- Text Mining Radiology Reports for Deep Learning Radiology Images. Invited talk at ABCC-Imaging and Visualization Group, Frederick National Laboratory for Cancer Research. 06/23/2017.
Honors and Awards
- NIH Summer Research Mentor Award, NIH, 2019
- NIH National Library of Medicine Honor Awards, NIH, 2018
- NIH Clinical Center CEO Awards, NIH, 2017
- NIH Employee Awards, NIH, 2016-2017
- CHEMPROT-Elsevier Prize, BioCreative VI, 2017
- BioCreative Workshop Travel Award, 2015
- Professional Development Award, University of Delaware, 2015
Tools
- BLUE benchmark
- NCBI_BERT
- BioWordVec & BioSentVec
- DeepSeeNet
- ChestXray-NIHCC datasets (News release)
- LitVar
- NegBio (News release)