Daehwan Kim is the Michael L. Rosenberg Assistant Professor in the Lyda Hill Department of Bioinformatics and a Scholar of the Cancer Prevention Research Institute of Texas (CPRIT). Dr. Kim’s research focuses on developing computer algorithms and statistical methods that enable accurate and rapid analysis of biological data, in particular sequencing data. Among his first-author papers, the one on the TopHat2 software program (published in 2013 in Genome Biology) and the one on HISAT (published in 2015 in Nature Methods) have been cited over 10,000 and 6,500 times, respectively.
Currently, one important obstacle facing analyses of sequencing data is their reliance on the human reference genome to align sequencing reads. The human reference genome was assembled from only a few samples and thus does not reflect genetic diversity across individuals and populations. Relying on a single reference genome can introduce significant biases in downstream analyses and can cause important disease-related genetic variants to be missed if they occur in regions not present in the reference genome.
To address these challenges, Dr. Kim recently developed a novel graph-based indexing scheme that represents a wide range of genetic variants and has low memory requirements. He has built a new alignment system, HISAT2 (ccb.jhu.edu/software/hisat2), that enables fast search through the index. HISAT2 is the first and only practical method available for aligning sequencing reads to a graph at the human genome scale while requiring only the small amount of memory typically available on a conventional desktop. The graph-based alignment approach enables much higher alignment sensitivity and accuracy than linear reference-based alignment approaches, especially for highly polymorphic genomic regions such as HLA genes, DNA fingerprinting loci, and LINEs. The system also has the potential to perform unbiased alignment irrespective of which individual genome is sequenced.
Building on HISAT2, Dr. Kim plans to develop a practical software solution that can accurately analyze an individual’s genome and its more than 20,000 genes within a few hours on a desktop computer. The availability of an individual’s genetic information made possible by this proposed work is essential to promoting personalized medicine. The software will enable researchers to perform unbiased analyses of next-generation sequencing experiments more efficiently, further improving our understanding of tumorigenesis and helping to find personalized treatments for cancer patients. Anyone who has access to sequencing data will be able to perform these analyses easily using a single software package.
We have launched a challenging and ambitious project to develop a platform, tentatively called Life Design Platform, that, if successfully implemented, would enable us to create new organisms. The output of the platform will be whole genome DNA sequences, which can be transfected into a living organism such as E. coli. The platform will consist of four main components: a visual programming language, visualization, simulation, and compiler translation of biochemical pathways into DNA. Each component will have a broad impact on, and several applications in, biomedical and cancer research.
- (2017), Computer Sciences
- Graduate School
- (2017), Computer Sciences
- DNA, RNA, and bisulfite sequence alignment
- Graph alignment to population of genomes and genotyping
- Personalized medicine with a focus on cancer diagnosis
- Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype.
- Kim D, Paggi JM, Park C, Bennett C, Salzberg SL Nat. Biotechnol. 2019 37 8 907-915
- HISAT: a fast spliced aligner with low memory requirements.
- Kim D, Langmead B, Salzberg SL Nat. Methods 2015 Apr 12 4 357-60
- TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions.
- Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL Genome Biol. 2013 Apr 14 4 R36
- Centrifuge: rapid and sensitive classification of metagenomic sequences.
- Kim D, Song L, Breitwieser FP, Salzberg SL Genome Res. 2016 Dec 26 12 1721-1729
- Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown.
- Pertea M, Kim D, Pertea GM, Leek JT, Salzberg SL Nat Protoc 2016 Sep 11 9 1650-67
- Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks.
- Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, Pimentel H, Salzberg SL, Rinn JL, Pachter L Nat Protoc 2012 Mar 7 3 562-78
- TopHat-Fusion: an algorithm for discovery of novel fusion transcripts.
- Kim D, Salzberg SL Genome Biol. 2011 Aug 12 8 R72
- See Google Scholar webpage for more publications
- Since 2011
- Department of Bioinformatics (2017)