Project Team: Audrey Fu, Md. Bahadur Badsha, Evan Martin

Complex diseases often involve changes in DNA sequence as well as in gene transcription and DNA methylation, an epigenetic process that can both regulate and be regulated by gene expression. These changes can give rise to a wide range of symptoms or to multiple subtypes of the same disease. In breast cancer, for example, distinct patterns of gene expression and DNA methylation characterize subtypes that differ in tumor progression and in treatment. To develop more effective treatments for different subtypes, it is necessary to understand the genes and processes (i.e., transcription and methylation) that drive the differences between them. It is therefore of great interest to understand how genetic variation influences disease through gene regulatory networks. Unfortunately, the identification of genes and processes that are key to disease is often compromised because inference relies on correlation rather than causation.

Our long-term goal is to develop computational methods that infer gene regulatory networks potentially causal for multiple clinical phenotypes, using genomic and clinical data from complex diseases. In this project, we will develop new statistical approaches based on the principle of Mendelian randomization to systematically identify regulatory networks, involving both transcription and methylation, that are potentially causal for disease subtype. We will use breast cancer as the disease model and apply our methods to breast cancer genomic data. The principle of Mendelian randomization assumes that the alleles of a genetic variant are randomly assigned to individuals in a population, analogous to a natural randomized experiment. Because genotype is assigned before, and independently of, the factors that confound observational associations, an effect of a variant on an outcome that is transmitted through an intermediate trait is evidence that the intermediate trait causally affects the outcome. This principle has gained increasing attention in genomics because of its ability to distinguish correlation that arises from causation from correlation that does not.
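To make the randomization argument concrete, the Python sketch below simulates a simple scenario. It is a generic illustration of Mendelian randomization using the standard Wald ratio (instrumental-variable) estimator, not the method developed in this project; the variable names, effect sizes, and the single-variant linear model are hypothetical assumptions. An unobserved confounder U drives both a methylation-like trait M and an expression-like trait E, so the naive regression slope of E on M is biased. Because the genotype G is randomly assigned and is assumed to affect E only through M, the ratio cov(G, E) / cov(G, M) recovers the true causal effect of M on E.

```python
import random


def simulate(n=20000, beta=0.5, seed=1):
    """Simulate hypothetical genotype G, confounder U, trait M, and trait E.

    Assumed model (illustrative only):
      G ~ allele count in {0, 1, 2}, allele frequency 0.5 (randomly assigned)
      M = 0.8*G + U + noise          (G affects M)
      E = beta*M + 1.5*U + noise     (G affects E only through M)
    """
    rng = random.Random(seed)
    g, m, e = [], [], []
    for _ in range(n):
        gi = rng.randint(0, 1) + rng.randint(0, 1)  # two alleles, each inherited at random
        ui = rng.gauss(0, 1)                          # unobserved confounder
        mi = 0.8 * gi + ui + rng.gauss(0, 1)
        ei = beta * mi + 1.5 * ui + rng.gauss(0, 1)
        g.append(gi)
        m.append(mi)
        e.append(ei)
    return g, m, e


def cov(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def naive_slope(m, e):
    """Ordinary least-squares slope of E on M; biased by the confounder U."""
    return cov(m, e) / cov(m, m)


def wald_ratio(g, m, e):
    """Mendelian randomization (Wald ratio) estimate of the effect of M on E."""
    return cov(g, e) / cov(g, m)


if __name__ == "__main__":
    g, m, e = simulate()
    print("naive slope:", round(naive_slope(m, e), 3))
    print("Wald ratio: ", round(wald_ratio(g, m, e), 3))
```

Under these assumed parameters, Var(M) = 0.64 × 0.5 + 1 + 1 = 2.32, so the naive slope should converge to about 0.5 + 1.5 / 2.32 ≈ 1.15, while the Wald ratio should land near the true effect of 0.5. The gap between the two is exactly the confounding that randomization of alleles lets the analysis sidestep.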

The models and algorithms developed here will allow us to make causal statements about transcription and methylation at the level of individual genes while accounting for confounding variables, which similar studies have not examined. These methods will help identify key genes for specific breast cancer subtypes and elucidate the roles of transcription and methylation when many genes are involved, offering insights into genes and processes that could better inform subtype classification, cancer diagnosis, and the development of novel drug targets. These methods are not limited to breast cancer; they are applicable to complex diseases in general.