
Daehwan Kim is Michael L. Rosenberg Assistant Professor in the Lyda Hill Department of Bioinformatics and a Scholar of the Cancer Prevention Research Institute of Texas (CPRIT). Dr. Kim’s research is focused on developing computer algorithms and statistical methods that enable accurate and rapid analysis of biological data, in particular sequencing data. Among his several first-author papers, his paper on the TopHat2 software program (published in 2013 in Genome Biology) and another on HISAT (published in 2015 in Nature Methods) have been cited over 10,000 and 7,000 times, respectively.

Currently, one important obstacle facing analyses of sequencing data is their reliance on the human reference genome to align sequencing reads. The human reference genome was assembled using only a few samples and thus does not reflect genetic diversity across individuals and populations. This reliance on a single reference genome can introduce significant biases in downstream analyses, and it can miss important disease-related genetic variants if they occur in regions not present in the reference genome.

To address these challenges, Dr. Kim recently developed a novel indexing scheme using a graph approach that captures a wide representation of genetic variants and has low memory requirements. He has built a new alignment system, HISAT2, that enables fast search through the index. HISAT2 is the first and only practical method available for aligning sequencing reads to a graph at the human genome scale while only requiring a small amount of memory typically available on a conventional desktop. The graph-based alignment approach enables much higher alignment sensitivity and accuracy than linear reference-based alignment approaches, especially for highly polymorphic genomic regions such as HLA genes, DNA fingerprinting loci, and LINEs. The system also has the potential to perform unbiased alignment irrespective of which individual genome is sequenced.

Building on HISAT2, Dr. Kim plans to develop a practical software solution that can accurately analyze an individual’s genome and its >20,000 genes within a few hours on a desktop computer. The availability of an individual’s genetic information made possible by this proposed work is essential to promoting personalized medicine. The software will enable researchers to more efficiently perform unbiased analyses for next-generation sequencing experiments, further improving our understanding of tumorigenesis and finding personalized treatments for cancer patients. Anyone who has access to sequencing data will be able to easily perform these functions using just one software package.

We have launched a challenging, ambitious project to develop a platform, tentatively called Life Design Platform, that, if successfully implemented, would enable us to create new organisms. The output of the platform will be whole genome DNA sequences, which can be transfected into a living organism such as E. coli. This platform will consist of four main components: a visual programming language, visualization, simulation, and compiler translation of biochemical pathways into DNA. Each component will have a broad impact on, and several applications in, biomedical and cancer research.


(2017), Computer Sciences
Graduate School
(2017), Computer Sciences

Research Interest

  • DNA, RNA, and bisulfite sequence alignment
  • Graph alignment to population of genomes and genotyping
  • Personalized medicine with a focus on cancer diagnosis


Featured Publications

Centrifuge: rapid and sensitive classification of metagenomic sequences.
Kim D, Song L, Breitwieser FP, Salzberg SL. Genome Res. 2016 Dec;26(12):1721-1729
Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown.
Pertea M, Kim D, Pertea GM, Leek JT, Salzberg SL. Nat Protoc. 2016 Sep;11(9):1650-67
Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks.
Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L. Nat Protoc. 2012 Mar;7(3):562-78
TopHat-Fusion: an algorithm for discovery of novel fusion transcripts.
Kim D, Salzberg SL. Genome Biol. 2011 Aug;12(8):R72
* See Google Scholar webpage for more publications

Professional Associations/Affiliations

  • Department of Bioinformatics (2017)