Potential of Natural Language Processing (NLP) in Healthcare

By Vinitha Ganesan, PhD

Health Catalyst published the report “Healthcare NLP: The Secret to Unstructured Data’s Full Potential,” based on a webinar given by Wendy Chapman, PhD, Chair of the Department of Biomedical Informatics, University of Utah School of Medicine, and Mike Dow, Technical Director at Health Catalyst. Vinitha Ganesan, PhD, Commercial Translation Architect at the Center for Commercial Applications of Healthcare Data, shared her thoughts on the report and on how the Pittsburgh Health Data Alliance is approaching unstructured data with projects like the Clinical Abbreviation Resolution Engine.

The Health Catalyst report describes how Electronic Health Records (EHRs) currently frustrate clinicians by taking time away from patient engagement and care, and how Natural Language Processing (NLP) tools have the potential to reduce this frustration and enhance EHR use by extracting useful information. For instance, NLP can enable an EHR interface that makes it easier for clinicians to find buried data and to make diagnoses or choose treatments they might otherwise have overlooked.

However, implementing NLP for complex healthcare applications comes with significant challenges due to the limited usability of EHR data. EHR data is not readily usable because of the way it is entered: clinicians and other caregivers at hospitals commonly use abbreviations when typing information into the system. This creates ambiguity in clinical text (such as physician notes), because these abbreviations can have different meanings depending on the context. This unprocessed data is usually referred to as “unstructured data.” The report discusses how healthcare organizations can apply NLP tools to convert unstructured EHR data into machine-interpretable data. This process can in turn increase the usability of EHR data and make it easier to infer insights from EHRs that can ultimately improve healthcare outcomes. The report also recommends that, to maximize the potential of NLP, organizations look beyond readily available tools for NLP tools that can be applied to healthcare-specific vendor systems.

One of the Pittsburgh Health Data Alliance (PHDA) projects, the Clinical Abbreviation Resolution Engine (CARE), addresses ambiguity in unstructured clinical data. CARE is an NLP platform technology that uses deep learning to resolve ambiguous abbreviations in clinical texts, significantly improving the extraction and interpretation of text information from EHRs. An example of this ambiguity is the abbreviation “CP” in discharge summaries. “CP” could mean “clinical pathology,” “cerebral palsy,” or “chest pain,” depending on the context. Such occurrences reduce the quality of unstructured EHR data and require manual input to prevent incorrect prompts or incorrect reimbursement codes downstream.

CARE is a high-performance engine that is being developed specifically for healthcare applications. It can disambiguate large volumes of clinical abbreviations in EHRs quickly. CARE identifies a clinical abbreviation, learns the context in which the abbreviation is embedded, and then classifies the abbreviation correctly without any manual intervention. Because the engine has been modeled using millions of clinical text documents, it is expected to scale well for improving the usability of EHR data.
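To make the identify, contextualize, and classify steps concrete, here is a deliberately simple sketch in Python. It is not how CARE works: CARE uses deep learning trained on millions of documents, whereas this toy example swaps in plain keyword overlap so that it stays short and runnable. The sense inventory `SENSE_CUES`, its cue words, the `disambiguate` function, and the `window` parameter are all hypothetical names and values chosen for illustration.

```python
import re

# Hypothetical sense inventory. The cue words are hand-picked for
# illustration only; a real system would learn context from data.
SENSE_CUES = {
    "CP": {
        "chest pain": {"chest", "cardiac", "ecg", "troponin", "radiating", "exertion"},
        "cerebral palsy": {"spastic", "motor", "developmental", "gait", "pediatric"},
        "clinical pathology": {"lab", "laboratory", "specimen", "report", "sample"},
    }
}


def tokenize(text):
    """Split text into alphabetic tokens, preserving case."""
    return re.findall(r"[A-Za-z]+", text)


def disambiguate(text, abbreviation, window=8):
    """Return one guessed sense (or None) per occurrence of `abbreviation`.

    Step 1: identify each occurrence of the abbreviation.
    Step 2: collect the words within `window` tokens on either side.
    Step 3: pick the sense whose cue words overlap most with that context.
    """
    senses = SENSE_CUES.get(abbreviation)
    if senses is None:
        return []

    tokens = tokenize(text)
    results = []
    for i, token in enumerate(tokens):
        if token != abbreviation:
            continue
        start = max(0, i - window)
        context = {t.lower() for t in tokens[start:i + window + 1]}
        scores = {sense: len(cues & context) for sense, cues in senses.items()}
        best = max(scores, key=scores.get)
        # With no supporting cue words, decline to guess.
        results.append(best if scores[best] > 0 else None)
    return results


note = "Patient presents with CP radiating to the left arm; ECG and troponin ordered."
print(disambiguate(note, "CP"))
```

For the sample note, the surrounding words "radiating," "ECG," and "troponin" match only the cue words for "chest pain," so that sense is selected. A note with no cue words near the abbreviation yields `None`, which corresponds to the cases that would otherwise need manual review.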

The Health Catalyst report describes NLP models evolving toward the high-hanging fruit, i.e., complex applications in healthcare, which is precisely what CARE is aiming for. This kind of technology will be highly valuable for the existing NLP pipeline at UPMC and beyond.