Jian Zhou is an Assistant Professor in the Lyda Hill Department of Bioinformatics, a Lupe Murchison Foundation Scholar in Medical Research, and a Scholar of the Cancer Prevention Research Institute of Texas (CPRIT). His research focuses on developing computational approaches, especially machine learning and statistical methods, to tackle the challenges and explore the opportunities that emerge from big data in biomedical research. He is particularly passionate about building comprehensive, quantitative deep learning models for studying the regulatory functions of genomic sequences and their link to human diseases, leveraging the models' predictive power even on sequences unobserved in existing data. His lab site is zhoulab.io.
He entered the field of computational biology and machine learning as a PhD student in Dr. Olga Troyanskaya’s group at Princeton University. He was first drawn to the challenge of understanding the interactions underlying chromatin organization and developed a method to resolve direct interactions among chromatin proteins from complex co-binding patterns. The interaction model led to an interaction-driven chromatin state discovery algorithm for surveying the epigenetic state landscape and to the identification of enhancer-like states with distinct regulatory properties.
While developing a sequence-dependency component for the chromatin protein interaction models, he discovered that deep learning can achieve large performance gains over previous techniques, such as motif-based methods, in predicting chromatin properties and transcription factor binding from sequence. More importantly, the models made it possible for the first time to predict the effects of any known or unobserved genomic variation genome-wide, an approach he calls “in silico mutagenesis”. The system he created, DeepSEA, predicts the chromatin biochemical effects of any genomic variant with high accuracy and single-nucleotide resolution using only genomic sequence as input. This direction of deep learning sequence modeling and in silico mutagenesis has since inspired a wave of deep learning-driven research.
This work on sequence models opened up opportunities to study the roles of noncoding mutations in complex human diseases, roles that had previously eluded discovery. During his postdoc, he and colleagues showed that noncoding de novo mutations contribute to autism spectrum disorder (with Robert Darnell’s group) and studied their roles in congenital heart disease (with Bruce Gelb’s group). He is optimistic that this work marks the beginning of unveiling the role of noncoding mutations and their cell type-specific impacts on complex human diseases.
The sequence models of chromatin also laid the foundation for predicting gene expression levels from sequence, a long-standing challenge. His recent work, ExPecto, enabled ab initio prediction of human cell type-specific expression from sequence, as well as prediction of the effects of mutations on expression. Moreover, he demonstrated that sequence-based expression models can improve the understanding of human diseases, from predicting causal variants in GWAS and QTL studies to deciphering tissue-specific evolutionary constraints on gene expression.
More recently, he developed a "quasilinear" methodological framework that supports interpretable exploratory analysis of single-cell data and enables direct comparison and integration of single-cell datasets. These methods also provide a statistical framework for inferring confidence sets for single-cell cluster, trajectory, and surface structures, addressing an important missing link in single-cell data analysis. He has used some of these methods to explain gene expression variation in developmental kidney and organoid single-cell data, in collaboration with Matthias Kretzler’s group.
Research Interests
- Machine Learning and Statistical Methods for Genomics
- Sequence Basis of Genome Regulation
- Statistical Methods for Single-cell Data Analysis
- The Evolution of the Noncoding Genome
Publications
- Accurate genome-wide predictions of spatio-temporal gene expression during embryonic development.
- Zhou J, Schor IE, Yao V, Theesfeld CL, Marco-Ferreres R, Tadych A, Furlong EEM, Troyanskaya OG, PLoS Genet. 2019 Sep;15(9):e1008382
- Whole-genome deep-learning analysis identifies contribution of noncoding mutations to autism risk.
- Zhou J, Park CY, Theesfeld CL, Wong AK, Yuan Y, Scheckel C, Fak JJ, Funk J, Yao K, Tajima Y, Packer A, Darnell RB, Troyanskaya OG, Nat. Genet. 2019 Jun;51(6):973-980
- Selene: a PyTorch-based deep learning library for sequence data.
- Chen KM, Cofer EM, Zhou J, Troyanskaya OG, Nat. Methods 2019 Apr;16(4):315-318
- Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk.
- Zhou J, Theesfeld CL, Yao K, Chen KM, Wong AK, Troyanskaya OG, Nat. Genet. 2018 Aug;50(8):1171-1179
- Probabilistic modelling of chromatin code landscape reveals functional diversity of enhancer-like chromatin states.
- Zhou J, Troyanskaya OG, Nat Commun 2016 Feb;7:10528
- Predicting effects of noncoding variants with deep learning-based sequence model.
- Zhou J, Troyanskaya OG, Nat. Methods 2015 Oct;12(10):931-4
- Global quantitative modeling of chromatin factor interactions.
- Zhou J, Troyanskaya OG, PLoS Comput. Biol. 2014 Mar;10(3):e1003525
- Deep supervised and convolutional generative stochastic network for protein secondary structure prediction.
- Zhou J, Troyanskaya OG, ICML 2014
- New genetic signals for lung function highlight pathways and chronic obstructive pulmonary disease associations across multiple ancestries.
- Shrine N, Guyatt AL, Erzurumluoglu AM, Jackson VE, Hobbs BD, Melbourne CA, Batini C, Fawcett KA, Song K, Sakornsakolpat P, Li X, Boxall R, Reeve NF, Obeidat M, Zhao JH, Wielscher M, Weiss S, Kentistou KA, Cook JP, Sun BB, Zhou J, Hui J, Karrasch S, Imboden M, Harris SE, Marten J, Enroth S, Kerr SM, Surakka I, Vitart V, Lehtimäki T, Allen RJ, Bakke PS, Beaty TH, Bleecker ER, Bossé Y, Brandsma CA, Chen Z, Crapo JD, Danesh J, DeMeo DL, Dudbridge F, Ewert R, Gieger C, Gulsvik A, Hansell AL, Hao K, Hoffman JD, Hokanson JE, Homuth G, Joshi PK, Joubert P, Langenberg C, Li X, Li L, Lin K, Lind L, Locantore N, Luan J, Mahajan A, Maranville JC, Murray A, Nickle DC, Packer R, Parker MM, Paynton ML, Porteous DJ, Prokopenko D, Qiao D, Rawal R, Runz H, Sayers I, Sin DD, Smith BH, Soler Artigas M, Sparrow D, Tal-Singer R, Timmers PRHJ, Van den Berge M, Whittaker JC, Woodruff PG, Yerges-Armstrong LM, Troyanskaya OG, Raitakari OT, Kähönen M, Polašek O, Gyllensten U, Rudan I, Deary IJ, Probst-Hensch NM, Schulz H, James AL, Wilson JF, Stubbe B, Zeggini E, Jarvelin MR, Wareham N, Silverman EK, Hayward C, Morris AP, Butterworth AS, Scott RA, Walters RG, Meyers DA, Cho MH, Strachan DP, Hall IP, Tobin MD, Wain LV, Nat. Genet. 2019 Mar;51(3):481-493
- Organoid single cell profiling identifies a transcriptional signature of glomerular disease.
- Harder JL, Menon R, Otto EA, Zhou J, Eddy S, Wys NL, O'Connor C, Luo J, Nair V, Cebrian C, Spence JR, Bitzer M, Troyanskaya OG, Hodgin JB, Wiggins RC, Freedman BS, Kretzler M, JCI Insight 2019 Jan;4(1)
Honors & Awards
- Cancer Prevention Research Institute of Texas First Time Faculty Recruitment Award
- Endowed Scholar in Medical Research