Project Spotlight: Clinical Abbreviation Resolution Engine
February 27, 2019
Center for Commercial Applications of Healthcare Data’s Daqing He, PhD, explains his team’s deep learning algorithm to reduce abbreviation misinterpretation within clinical datasets and the ways the Alliance has positioned his project to succeed.
Please share a little about your background and your research experiences.
I am a full professor at the School of Computing and Information at the University of Pittsburgh. My research interests cover information retrieval, natural language processing (NLP), and intelligent system design. Over the years, I’ve been fortunate to publish over 200 papers in top journals and conferences in these areas.
What led you to the Pittsburgh Health Data Alliance?
Dr. Rebecca Jacobson, who is currently the VP of Analytics at UPMC Enterprises, and I started to talk about research collaboration when she was a Professor in the Department of Biomedical Informatics. We identified resolving abbreviation/acronym ambiguities in clinical text as an important yet underdeveloped task in the clinical domain. We decided that, with our strong combined NLP skills and her additional medical/clinical background, we’d be a perfect team for taking on this problem. Around the same time, we noticed the Alliance as a great platform for supporting transferable technology in the health and medical domain, so I applied and was given funding!
Walk us through your project.
Unfortunately, in the clinical world, abbreviations aren’t unique. “TX,” for example, can mean either “Treatment” or “Transplant.” Currently, 71% of identified abbreviations in clinical text can be ambiguous in meaning. For potential applications of NLP in particular, high-accuracy acronym and abbreviation disambiguation is crucial for completing tasks correctly. Our Clinical Abbreviation Resolution Engine (CARE) project uses the latest deep learning models and trains them on large-scale collections of clinical reports to improve abbreviation resolution. The project also develops a clustering-based annotation strategy and interface to quickly generate annotated data for model development. We work closely with our collaborators at UPMC on the third-generation models and on developing pipelines for integrating the models into UPMC clinical NLP systems.
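To make the disambiguation task concrete, here is a minimal illustrative sketch in Python. It is not the CARE model, which the interview describes only as deep learning trained on clinical reports. This toy instead picks the expansion whose cue words overlap most with the surrounding sentence. The sense inventory, cue words, and function names are all hypothetical and exist only for illustration.

```python
import re

# Hypothetical sense inventory: each candidate expansion of "TX" is paired
# with a few made-up cue words. A real system would learn context from
# annotated clinical reports rather than use a hand-written list.
SENSE_CUES = {
    "TX": {
        "treatment": {"antibiotics", "therapy", "started", "course", "responded"},
        "transplant": {"kidney", "donor", "graft", "rejection", "liver"},
    }
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def disambiguate(abbrev, sentence):
    """Return the expansion whose cue words overlap most with the sentence.

    Returns None when the abbreviation is unknown or no cue word matches.
    """
    senses = SENSE_CUES.get(abbrev.upper())
    if not senses:
        return None
    tokens = set(tokenize(sentence))
    scores = {sense: len(tokens & cues) for sense, cues in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(disambiguate("TX", "Pt s/p kidney TX, no signs of graft rejection"))
print(disambiguate("TX", "Started antibiotics TX, responded well"))
```

The same abbreviation resolves differently depending on its neighbors, which is the behavior a learned model would need to capture at much larger scale and with far richer context than a keyword list.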
How do your strengths and your project partners’ strengths complement each other?
Our team has strong deep learning model development skills, NLP/text mining experience, and extensive medical knowledge. Having the skills, knowledge, and data gives us a unique advantage: we can not only analyze the problem but also create solutions and test them.
In what ways has UPMC played a role in lending clinical expertise and sharing data?
Our UPMC partners provide us with the ability to train models on over 7 million clinical reports. I know I don’t have to say it, but I will anyway: that’s a lot. Our UPMC partners also work closely with us on the development of the clustering-based annotation strategy and interface, which is very helpful.
When you look at Pittsburgh as a region, what role do you see the Pittsburgh Health Data Alliance playing? What do you foresee the future of innovation looking like here?
Technology transfer projects require three critical things: partners, domain knowledge and datasets, and funding to carry out the transfer tasks. The Pittsburgh Health Data Alliance helps to meet each of these needs. Because of this, I feel that the Alliance will continue to provide projects and researchers with the assets necessary to succeed. As this continues to happen, projects, and the Pittsburgh area, will flourish. Pittsburgh is filled with talented people in the health and medical fields, and I’m really looking forward to seeing their great ideas come to fruition and the city becoming one of the nation’s leading innovation hubs.