Project Spotlight: Diagnosis Coding Engine

Medical diagnostic errors impact 12 million adults each year in the US. A key reason why diagnostic errors are made – even by the best clinicians in highly reliable organizations – is the increasing complexity of the diagnostic process, with over 10,000 diseases and 5,000 laboratory tests to choose from.

This project focuses specifically on preventing coding and billing errors. To address this cognitively complex problem, the team is developing an engine that will predict likely diagnosis codes based on information available in a patient’s electronic health record. Specifically, the solution will review both structured and unstructured data, such as clinical notes, and apply a machine learning-based mapping from these data to specific diagnosis codes.

We connected with researchers Dr. Pradeep Ravikumar and Dr. Jeremy Weiss of Carnegie Mellon University to learn more about the project. Their responses follow.

Please share a little about your background and your research experiences.

Our team consists of Dr. Pradeep Ravikumar, an expert in machine learning (ML), and Dr. Jeremy Weiss, an expert in medical informatics. Dr. Ravikumar’s research group at Carnegie Mellon University works on next-generation statistical machine learning under two main verticals: “graceful AI” and “scrappy AI.” In graceful AI, we aim to learn ML models that are explainable, robust to train-time and test-time corruptions, and resilient to distribution shifts. In scrappy AI, we aim to learn ML models under resource constraints by discovering or leveraging various notions of “structure” and domain knowledge.

Dr. Weiss’ Care Health and Reasoning Machines (CHARM) lab develops longitudinal methods for health records data to uncover patterns in diseases of internal medicine. To derive insights from health records data and make them useful to clinicians and stakeholders, these methods have to address the upstream challenges of modeling real-world, incomplete, and censored data. We develop these methods to provide tailored solutions for sepsis progression, opioid abuse, and the COVID-19 pandemic.

What led you to the PHDA?

We first met at an event that, in part, discussed the vast mismatch between the problems ML researchers were tackling in the clinical domain, broadly around difficult prediction tasks, and the problems that clinical practitioners found most pressing. Our discussions then led to the broad vision of our project: to help make electronic health records useful to doctors, rather than bog them down with extra work that detracts from their clinical practice.

From there, thanks to the outreach of the Pittsburgh Health Data Alliance (PHDA) and the Center for Machine Learning and Health (CMLH) to researchers at CMU, we were introduced to PHDA, with a focus on one aspect of our vision: simplifying the task of populating problem lists in electronic health records.

Walk us through your project.

Our broader vision is to help doctors with their workflows around electronic health records. One specific problem we aim to solve is populating problem lists in electronic health records. This entails predicting ICD diagnosis codes after the patient has finished their visit. Currently, initial error rates in such coding can run as high as 80% in some cases. Reducing these coding errors can lower costs, lead to fewer penalties, and improve reimbursements. Explainable coding predictions also lower the burden of audits.
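To make the idea of an explainable code suggestion concrete, here is a minimal sketch in Python. It is not the team's engine: the term weights are hand-set and hypothetical, the function names are invented for illustration, and a real system would presumably learn its mapping from labeled records and draw on structured data as well. The sketch scores a few ICD-10 codes against the words in a clinical note and returns, alongside each suggested code, the terms that contributed to its score.

```python
import re
from collections import Counter

# Hypothetical, hand-set term weights per ICD-10 code, for illustration only.
# A real system would learn such a mapping from labeled health records.
CODE_TERM_WEIGHTS = {
    "E11.9": {"diabetes": 2.0, "metformin": 1.5, "glucose": 1.0},
    "I10": {"hypertension": 2.0, "lisinopril": 1.5, "pressure": 0.5},
    "J18.9": {"pneumonia": 2.0, "infiltrate": 1.5, "cough": 0.5},
}


def tokenize(note):
    """Lowercase the note and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", note.lower())


def suggest_codes(note, threshold=1.5):
    """Return (code, score, evidence) tuples for codes scoring at or above threshold.

    `evidence` maps each matched term to its contribution to the score, so a
    reviewer can see why a code was suggested.
    """
    counts = Counter(tokenize(note))
    suggestions = []
    for code, weights in CODE_TERM_WEIGHTS.items():
        evidence = {
            term: weight * counts[term]
            for term, weight in weights.items()
            if counts[term] > 0
        }
        score = sum(evidence.values())
        if score >= threshold:
            suggestions.append((code, score, evidence))
    # Highest-scoring suggestions first.
    suggestions.sort(key=lambda item: item[1], reverse=True)
    return suggestions


if __name__ == "__main__":
    note = "Patient with type 2 diabetes on metformin. Blood pressure stable. No cough."
    for code, score, evidence in suggest_codes(note):
        print(code, score, evidence)
```

A bag-of-words scorer like this ignores negation ("No cough" still counts as a mention of cough) and clinical context, which is part of why unstructured notes are hard to code reliably. The point of the sketch is only the shape of the output: each suggested code carries the evidence behind it, which is the kind of information an auditor would want to see.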

How do you and your project partners’ strengths complement each other?

Pradeep’s research expertise is in statistical ML, specifically in learning complex models that can leverage domain knowledge (of which there is a lot in the clinical domain). This domain knowledge comes with certifiable guarantees that are robust and explainable. This nicely complements Jeremy’s research expertise in medical informatics, where his medical training aligns the algorithmic approaches with clinical scenarios.

How is the PHDA uniquely positioned to assist your team and grow your project to commercialization?

One of the most crucial inputs to modern ML systems is data. The larger the dataset and the higher its quality, the better the resulting machine-learned systems. Another crucial input is domain knowledge about workflows, processes, and connections.

A project such as ours has a fantastic synergy with PHDA. By leveraging its data warehouses and its domain expertise, we can create a lot of value. We can pursue this not only as a commercial enterprise but also as a way to realize the original vision of why we have electronic health records in the first place: to move towards better clinical practice for doctors and patients.
