Scaffolding algorithm using second- and third-generation reads

Wiktor Franus, Wiktor Kuśmirek, Robert Marek Nowak

Abstract

Second-generation sequencing methods produce high-quality short reads, which DNA assemblers assemble into contigs. Because the length of a single read is limited to 500 bp, it is very hard to assemble full genomes or full chromosomes. Generating longer contigs at a low sequencing cost is a main goal of computer scientists in this area. We propose to link contigs created from second-generation reads using reads from third-generation sequencers. Such reads are 10-20 kbp long. An existing implementation of this approach appears to be time- and memory-demanding for larger genomes. We developed an algorithm based on a Bloom filter and an extremely memory-efficient associative array. Our implementation substantially outperforms the previous one in terms of time and memory consumption. The presented algorithm, provided as a shared library, is part of the dnaasm de-novo assembler. The library was written in C++ using the Boost and Google Sparse Hash libraries. Both a web browser-based graphical user interface and a command-line interface are provided. The source code, a demo web application, and a Docker image are available at the dnaasm project web page: http://dnaasm.sourceforge.net. Our application has been tested on real data from bacterial, yeast, and plant genomes.
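The combination of a Bloom filter with an associative array can be illustrated with a minimal C++ sketch. It is not the dnaasm implementation: the names `BloomFilter`, `ContigIndex`, `buildIndex`, and `findLinks` are hypothetical, `std::unordered_map` stands in for a Google Sparse Hash container, and exact k-mer matching is a simplifying assumption that ignores sequencing errors in long reads. The sketch indexes contig k-mers, then scans a long read and reports the pairs of distinct contigs that the read hits one after another.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical minimal Bloom filter over strings (double hashing, 3 probes).
class BloomFilter {
public:
    explicit BloomFilter(std::size_t bits) : bits_(bits, false) { assert(bits > 0); }

    void insert(const std::string& s) {
        for (std::uint64_t i = 0; i < kProbes; ++i) bits_[index(s, i)] = true;
    }

    // May report a false positive, never a false negative.
    bool maybeContains(const std::string& s) const {
        for (std::uint64_t i = 0; i < kProbes; ++i)
            if (!bits_[index(s, i)]) return false;
        return true;
    }

private:
    static constexpr std::uint64_t kProbes = 3;

    // FNV-1a hash, used as the second hash function.
    static std::uint64_t fnv1a(const std::string& s) {
        std::uint64_t h = 14695981039346656037ULL;
        for (unsigned char c : s) { h ^= c; h *= 1099511628211ULL; }
        return h;
    }

    std::size_t index(const std::string& s, std::uint64_t i) const {
        const std::uint64_t h1 = std::hash<std::string>{}(s);
        const std::uint64_t h2 = fnv1a(s) | 1ULL;
        return static_cast<std::size_t>((h1 + i * h2) % bits_.size());
    }

    std::vector<bool> bits_;
};

// Assumed index layout: Bloom filter as a cheap pre-check, map for exact lookup.
struct ContigIndex {
    std::size_t k;
    BloomFilter filter;
    std::unordered_map<std::string, int> kmerToContig;  // stand-in for a sparse hash map
};

ContigIndex buildIndex(const std::vector<std::string>& contigs,
                       std::size_t k, std::size_t bloomBits) {
    assert(k > 0);
    ContigIndex idx{k, BloomFilter(bloomBits), {}};
    for (std::size_t c = 0; c < contigs.size(); ++c) {
        const std::string& seq = contigs[c];
        for (std::size_t i = 0; i + k <= seq.size(); ++i) {
            const std::string kmer = seq.substr(i, k);
            idx.filter.insert(kmer);
            // Simplification: a k-mer shared by several contigs keeps its first owner.
            idx.kmerToContig.emplace(kmer, static_cast<int>(c));
        }
    }
    return idx;
}

// Returns (previous contig, next contig) pairs in the order the read hits them.
std::vector<std::pair<int, int>> findLinks(const ContigIndex& idx,
                                           const std::string& read) {
    std::vector<std::pair<int, int>> links;
    int last = -1;  // contig of the most recent k-mer hit, -1 if none yet
    for (std::size_t i = 0; i + idx.k <= read.size(); ++i) {
        const std::string kmer = read.substr(i, idx.k);
        if (!idx.filter.maybeContains(kmer)) continue;  // definitely not indexed
        const auto it = idx.kmerToContig.find(kmer);
        if (it == idx.kmerToContig.end()) continue;     // Bloom false positive
        if (last != -1 && it->second != last) links.emplace_back(last, it->second);
        last = it->second;
    }
    return links;
}
```

In this sketch the Bloom filter only saves map lookups for k-mers that are certainly absent; the map remains the source of truth, so false positives cost time but do not create spurious links.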
Authors
Wiktor Franus (FEIT / ICS) - The Institute of Computer Science
Wiktor Kuśmirek (FEIT / IN) - The Institute of Computer Science
Robert Marek Nowak (FEIT / IN) - The Institute of Computer Science
Pages 108083A-1-108083A-10
Publication size in sheets 0.5
Book Romaniuk Ryszard, Linczuk Maciej Grzegorz (eds.): Proceedings of SPIE: Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2018, vol. 10808, 2018, SPIE - the International Society for Optics and Photonics, ISBN 9781510622036, 2086 p.
Keywords in English scaffold, next generation sequencing, de-novo, DNA assemblers, hybrid DNA assembly, third-generation sequencing
DOI 10.1117/12.2501505
URL https://www.spiedigitallibrary.org/conference-proceedings-of-spie/10808/108083A/Scaffolding-algorithm-using-second--and-third-generation-reads/10.1117/12.2501505.full?SSO=1
Language en (English)
File 108083A_franus.pdf 479.55 KB
Score (nominal) 15
Score Ministerial score = 15.0, 16-10-2018, BookChapterMatConf
Ministerial score (2013-2016) = 15.0, 16-10-2018, BookChapterMatConf