Jian Zhou is an Assistant Professor in the Lyda Hill Department of Bioinformatics, a Lupe Murchison Foundation Scholar in Medical Research, and a Scholar of the Cancer Prevention Research Institute of Texas (CPRIT). His research focuses on developing computational approaches, especially machine learning and statistical methods, to tackle the challenges and explore the opportunities that emerge from big data in biomedical research. He is particularly passionate about building comprehensive, quantitative deep learning models for studying the regulatory functions of genomic sequences and their link to human diseases, leveraging the models' predictive power even on sequences unobserved in existing data.

He entered the field of computational biology and machine learning as a PhD student in Dr. Olga Troyanskaya’s group at Princeton University. He was first drawn to the challenge of understanding the interactions underlying chromatin organization and developed a method to resolve direct interactions among chromatin proteins from complex co-binding patterns. The interaction model led to an interaction-driven chromatin state discovery algorithm for surveying the epigenetic state landscape and to the identification of enhancer-like states with distinct regulatory properties.

In developing a sequence-dependency component for the chromatin protein interaction models, he discovered that deep learning can substantially outperform previous techniques, such as motif-based methods, in predicting chromatin properties and transcription factor binding from sequence. More importantly, the models made it possible, for the first time, to predict the effects of any known or unobserved genomic variant genome-wide, an approach he calls “in silico mutagenesis”. The system he created, DeepSEA, predicts the chromatin biochemical effects of any genomic variant with high accuracy and single-nucleotide resolution using only genomic sequences as input. This direction of deep learning sequence modeling and in silico mutagenesis has since been followed by a wave of deep learning-driven research efforts.

This work on sequence models opened up opportunities to study the roles of noncoding mutations in complex human diseases, roles that had previously eluded discovery. During his postdoc, he and colleagues showed that noncoding de novo mutations contribute to autism spectrum disorder (ASD) (with Robert Darnell’s group) and studied their roles in congenital heart disease (with Bruce Gelb’s group). He is optimistic that these studies mark the beginning of unveiling the role of noncoding mutations and their cell type-specific impacts on complex human diseases.

The sequence models of chromatin also laid the foundation for predicting gene expression levels from sequence, a long-standing challenge. His recent work, ExPecto, enabled ab initio prediction of human cell type-specific expression from sequence, as well as prediction of the effects of mutations on expression. Moreover, he demonstrated that sequence-based expression models can be used to improve understanding of human diseases, from predicting causal variants in GWAS and QTL studies to deciphering tissue-specific evolutionary constraints on gene expression.

More recently, he developed a "quasilinear" methodological framework that supports interpretable exploratory analysis of single-cell data and enables direct comparison and integration of single-cell datasets. These methods also provide a statistical framework for inferring confidence sets of single-cell cluster, trajectory, and surface structures, addressing an important missing link in single-cell data analysis. He has applied some of these methods to explain gene expression variation in developmental kidney and organoid single-cell data, in collaboration with Matthias Kretzler’s group.

Research Interests

  • Machine Learning and Statistical Methods for Genomics
  • Sequence Basis of Genome Regulation
  • Statistical Methods for Single-cell Data Analysis
  • Evolution of the Noncoding Genome


Featured Publications

Single-cell analysis of progenitor cell dynamics and lineage specification in the human fetal kidney.
Menon R, Otto EA, Kokoruda A, Zhou J, Zhang Z, Yoon E, Chen YC, Troyanskaya O, Spence JR, Kretzler M, Cebrián C. Development. 2018 Aug;145(16).
A community computational challenge to predict the activity of pairs of compounds.
Bansal M, Yang J, Karan C, Menden MP, Costello JC, Tang H, Xiao G, Li Y, Allen J, Zhong R, Chen B, Kim M, Wang T, Heiser LM, Realubit R, Mattioli M, Alvarez MJ, Shen Y, Gallahan D, Singer D, Saez-Rodriguez J, Xie Y, Stolovitzky G, Califano A. Nat. Biotechnol. 2014 Dec;32(12):1213-22.
Histone H3K9 trimethylase Eggless controls germline stem cell maintenance and differentiation.
Wang X, Pan L, Wang S, Zhou J, McDowell W, Park J, Haug J, Staehling K, Tang H, Xie T. PLoS Genet. 2011 Dec;7(12):e1002426.

Honors & Awards

  • CPRIT Scholar in Cancer Research
  • Endowed Scholar in Medical Research