Project Spotlight: Clinical Genomics Modeling Platform
September 25, 2019
Carl Kingsford, PhD, and Christopher Langmead, PhD, both hold doctorates in computer science and specialize in developing and applying efficient algorithms to problems in computational biology. Dr. Kingsford’s group focuses on algorithms for extracting knowledge from large biological data sets, particularly high-throughput DNA and RNA sequencing data, while Dr. Langmead’s group focuses on using machine learning to build models for clinical applications. Together, they lead the Clinical Genomics Modeling Platform project.
What led you to the Pittsburgh Health Data Alliance?
Dr. Kingsford: We were drawn to the Alliance because it provided us with a unique opportunity to assist physicians by creating new technologies that turn healthcare data into clinical decision support tools.
Walk us through your funded project.
Dr. Langmead: Our project is a framework for automatically learning predictive models from high-throughput DNA sequencing data. Genomic data poses several challenges for standard machine learning techniques, including its size, its complexity, and the fact that most of the data is irrelevant to the predictive task (e.g., predicting patient survival time). Our software suite addresses these challenges with scalable algorithms that not only identify which parts of the genome are relevant to the predictive task but also explain, in biologically intuitive terms, why those parts matter. This is an improvement over existing methods, which tend to produce inscrutable models. Our software has been shown to scale to thousands of genomes and has so far been used to build models that predict survival in breast cancer patients. Ultimately, our project will make predictive models easier to build, apply, and understand for a variety of end users, including clinicians and biologists without computational training.
What are your project’s next steps?
Dr. Kingsford: Our next steps are to demonstrate the system at an even larger scale (tens of thousands of genomes) and on a broader set of clinical outcomes. The PHDA is uniquely positioned to help us, both by facilitating access to larger, anonymized data sets from UPMC and the University of Pittsburgh and by identifying clinical collaborators.
What do you think the future of innovation will look like here in Pittsburgh?
Dr. Langmead: Pittsburgh is the ideal environment to foster innovation in healthcare because of its world-class expertise in medicine, computer science, and machine learning/artificial intelligence, and its motivated, skilled workforce. The PHDA is the nexus between the key institutions and industry partners. It helps establish the connections and collaborations that will produce new technologies to improve human health.