Research on the mechanisms of gene regulation and their impact on disease has proceeded along two largely separate lines in recent years. On the one hand, gene regulatory networks and protein interaction networks have been studied extensively, especially in systems biology, where genetic variation is usually ignored. On the other hand, genome-wide association studies have linked mutations, indels (insertions and deletions), and copy number variants to many diseases. It is therefore of great interest to understand how genetic variation influences disease through gene regulatory networks.
Constructing these networks requires at least three key types of information: gene expression, transcription factor (TF) binding, and genotypes (especially at expression quantitative trait loci, or eQTLs). In particular, the latter two enable causal inference in network construction, although how to use them in a probabilistic and rigorous way has not been systematically explored. This project aims to develop statistical models and efficient computational strategies, drawing on recent advances in graphical models and causal inference, to construct causal regulatory networks that incorporate genetic variation and TF binding. The project will use breast cancer as a disease model and apply the proposed methodologies to its different subtypes. Topological features of the inferred regulatory networks may suggest different mechanisms across breast cancer subtypes.
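As a rough illustration of why genotype data can support causal inference, the sketch below simulates a regulator gene and a target gene that share an unobserved confounder, with an eQTL genotype that affects only the regulator. It is not the project's proposed methodology: it uses a standard instrumental-variable (Wald ratio) estimator as a stand-in, and every name, effect size, and allele frequency in it (`simulate`, `true_effect`, the 0.3 frequency) is a hypothetical chosen for the example. Under these assumptions, a naive regression of target on regulator expression is biased by the confounder, while the genotype-based estimate should land close to the simulated effect.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def simulate(n=20000, seed=0, true_effect=0.5):
    """Simulate genotype -> regulator -> target with a hidden confounder.

    Returns (naive_estimate, iv_estimate) of the regulator's effect on
    the target. All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    genotype, regulator, target = [], [], []
    for _ in range(n):
        # eQTL genotype: 0, 1, or 2 copies of an allele with frequency 0.3.
        g = (rng.random() < 0.3) + (rng.random() < 0.3)
        # Unobserved confounder that influences both genes.
        u = rng.gauss(0, 1)
        # Regulator expression depends on the genotype and the confounder.
        x = 1.0 * g + u + rng.gauss(0, 1)
        # Target expression depends on the regulator and the confounder,
        # but not directly on the genotype.
        y = true_effect * x + u + rng.gauss(0, 1)
        genotype.append(g)
        regulator.append(x)
        target.append(y)

    # Naive regression slope of target on regulator (confounded).
    naive = covariance(regulator, target) / covariance(regulator, regulator)
    # Wald ratio: the genotype acts as an instrument for the regulator.
    iv = covariance(genotype, target) / covariance(genotype, regulator)
    return naive, iv


if __name__ == "__main__":
    naive, iv = simulate()
    print(f"naive slope: {naive:.3f}")
    print(f"genotype-based (IV) estimate: {iv:.3f}")
```

The sketch relies on the genotype affecting the target only through the regulator and being independent of the confounder. Both are assumptions built into the simulation, and real eQTL data may violate them.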