Comparison of the Two Major Classes of Assembly Algorithms: Overlap–Layout–Consensus and de-Bruijn-Graph

BRIEFINGS IN FUNCTIONAL GENOMICS. VOL 11. NO 1. 25–37

doi:10.1093/bfgp/elr035

Zhenyu Li*, Yanxiang Chen*, Desheng Mu*, Jianying Yuan, Yujian Shi, Hao Zhang, Jun Gan, Nan Li, Xuesong Hu, Binghang Liu, Bicheng Yang and Wei Fan
Advance Access publication date 19 December 2011

Abstract
Since the completion of the cucumber and panda genome projects using Illumina sequencing in 2009, the global scientific community has paid much more attention to this new cost-effective approach to generating the draft sequence of large genomes. To help new users understand the assembly algorithms and choose the optimum software packages for their projects, we make a detailed comparison of the two major classes of assembly algorithms, overlap–layout–consensus (OLC) and de-bruijn-graph (DBG), from how they match the Lander–Waterman model to the required sequencing depth and read length. We also discuss the computational efficiency of each class of algorithm, the influence of repeats and heterozygosity, and points of note in the subsequent scaffold linkage and gap closure steps. We hope this review can help further promote the application of second-generation de novo sequencing, as well as aid the future development of assembly algorithms.

Keywords: OLC; DBG; de novo assembly; second-generation
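
As a rough illustration of how the Lander–Waterman model ties sequencing depth and read length to assembly contiguity, the following minimal sketch evaluates the textbook Lander–Waterman expressions. The formulas are the standard ones and are not quoted from this excerpt; the function name, parameter names and example values are hypothetical and chosen only for illustration.

```python
import math


def lander_waterman(genome_size, read_length, num_reads, min_overlap):
    """Evaluate the standard Lander-Waterman estimates (illustrative sketch).

    All parameter names are assumptions made for this example:
      genome_size  G: genome length in bases
      read_length  L: read length in bases
      num_reads    N: number of reads
      min_overlap  T: minimum overlap (in bases) needed to join two reads
    """
    # Sequencing depth (coverage): c = N * L / G
    coverage = num_reads * read_length / genome_size
    # sigma = 1 - T / L: fraction of a read that lies outside the required overlap
    sigma = 1.0 - min_overlap / read_length
    # Expected number of contigs ("islands"): N * exp(-c * sigma)
    expected_contigs = num_reads * math.exp(-coverage * sigma)
    # Expected fraction of the genome covered by no read: exp(-c)
    uncovered_fraction = math.exp(-coverage)
    return coverage, expected_contigs, uncovered_fraction


if __name__ == "__main__":
    # Hypothetical example: a 1 Mb genome, 100 bp reads, 25 bp minimum overlap.
    for n in (50_000, 100_000, 300_000):
        c, contigs, gap = lander_waterman(1_000_000, 100, n, 25)
        print(f"depth={c:.0f}x  expected contigs={contigs:.3g}  uncovered={gap:.3g}")
```

In this idealized model, raising the depth or shortening the required overlap relative to the read length lowers the expected number of contigs. Real data deviate from it because of repeats, sequencing errors and uneven coverage.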

INTRODUCTION
One of the most important tasks in genome biology is to obtain a complete genome sequence, which is achieved through a combination of sequencing technology and assembly software [1–3]. The high cost of Sanger sequencing technology has long been a limiting factor for genome projects, as shown by the limited number of large genomes published before 2010. Fortunately, the second-generation sequencing technologies Roche/454 (www.454.com), Illumina/Solexa
