New Tools to Mine Health Datasets May Uncover the Genetic Causes of Heart Defects

Aug 07, 2017

Earlier this summer, comedian Jimmy Kimmel delivered a moving monologue detailing the dramatic events that unfolded following the birth of his young son, Billy, who was diagnosed with the birth defect tetralogy of Fallot with pulmonary atresia. This rare disorder falls under the umbrella of congenital heart defects (CHD), which are characterized by structural defects in the heart or major blood vessels. While only one percent of babies are born with a form of CHD, they may require open heart surgery and special heart care throughout their lives.

Medical practitioners have long believed that CHDs have a genetic component, yet they have had little success identifying the responsible culprits. In July, University of Utah Health was one of four institutions across the United States that received a $3.7 million grant from the American Heart Association to prevent and treat CHD.

The U’s project relies on two new computational tools, CAE and WARP, to mine two Utah-specific databases, the Enterprise Data Warehouse (EDW) and the Utah Population Database (UPDB), to uncover genetic and environmental factors that cause CHD.

These tools are the brainchild of Mark Yandell, PhD, a professor of human genetics and director of the USTAR Center for Genetic Discovery, who understood the power locked within these databases.

The Cohort Assembly Engine (CAE), simply pronounced “Kay,” provides researchers the ability to enter simple text terms, like congenital heart defect and fatigue, to conduct Google-like searches of the EDW, which contains the medical information of 3.5 million University of Utah and Intermountain Health patients.

The query produces a list of patients who match the search terms, ranked by statistical significance. In addition, it provides the secondary conditions, like a heart murmur, associated with these patients. Clinicians can use this output to identify a group of patients to enroll in a clinical study to explore the genetic component of a disease.

Researchers also can feed CAE output directly into the UPDB, which combines the genealogy, demographic, and health records of almost 14 million individuals across the state, to uncover genetic commonality of the disease.

While it is impossible to sequence every family member in large, interconnected families, Yandell and his team developed another computational tool, WARP, to prioritize individuals to sequence, which maximizes research dollars to discover gene variations important in understanding disease.

“This work is really a cutting-edge application of statistics and computational approaches to mine big datasets,” said Yandell. “These tools assemble maximally informative cohorts of patients, which helps us to understand clinical outcomes, improve patient care, and maximize the cost-effectiveness of genome sequencing.”

When asked why this approach had not been tried before, Yandell said that these databases are incredibly complex.

“The relationship between the health data in the EDW is incredibly interconnected, and the UPDB is a complicated spider web of marriage and childbirth for millions of people over a 200-year period down through time,” he said. “It is a super interesting and complex dataset to compute over that is also a hideously complicated undertaking.”

Yandell believes CAE and WARP will be available to outside researchers with proper Institutional Review Board (IRB) permissions to mine the EDW and UPDB as early as 2018.

“We hope these tools will empower the whole CCTS network to explore the cause of disease,” Yandell said.

In developing these tools, Yandell worked alongside Gordon Lemmon, PhD, senior software developer at the USTAR Center for Genetic Discovery; Alex Henrie, software engineer; Karen Eilbeck, PhD, professor of Biomedical Informatics; Vikant Deshmukh, PhD, adjunct assistant professor in Population Health Sciences; Martin Tristani-Firouzi, MD, professor in Pediatrics; and Bruce Bray, MD, professor of Biomedical Informatics at U of U Health.

The Program in Personalized Health through the U of U Health Center for Clinical and Translational Science provided early funding to develop CAE and WARP. The Program in Personalized Health leads research, clinical, and educational initiatives to bring “the right care, to the right person, at the right time, for the right cost.”