Text mining for precision medicine: automating disease-mutation relationship extraction from biomedical literature
Journal Title: | Journal of the American Medical Informatics Association |
---|---|
In: | Journal of the American Medical Informatics Association, 23, 2016, 4, p. 766-772 |
Type of Resource: | E-Article |
Language: | English |
Published: | Oxford University Press (OUP) |
Summary: | Objective: Identifying disease-mutation relationships is a significant challenge in the advancement of precision medicine. The aim of this work is to design a tool that automates the extraction of disease-related mutations from biomedical text to advance database curation for the support of precision medicine. Materials and Methods: We developed a machine-learning (ML) based method to automatically identify the mutations mentioned in the biomedical literature related to a particular disease. In order to predict a relationship between the mutation and the target disease, several features, such as statistical features, distance features, and sentiment features, were constructed. Our ML model was trained with a pre-labeled dataset consisting of manually curated information about mutation-disease associations. The model was subsequently used to extract disease-related mutations from larger biomedical literature corpora. Results: The performance of the proposed approach was assessed using a benchmarking dataset. Results show that our proposed approach gains significant improvement over the previous state of the art and obtains F-measures of 0.880 and 0.845 for prostate and breast cancer mutations, respectively. Discussion: To demonstrate its utility, we applied our approach to all abstracts in PubMed for 3 diseases (including a non-cancer disease). The mutations extracted were then manually validated against human-curated databases. The validation results show that the proposed approach is useful in a real-world setting to extract uncurated disease mutations from the biomedical literature. Conclusions: The proposed approach improves the state of the art for mutation-disease extraction from text. It is scalable and generalizable to identify mutations for any disease at a PubMed scale. |
---|---|
Physical Description: | 766-772 |
ISSN: | 1067-5027, 1527-974X |
DOI: | 10.1093/jamia/ocw041 |