De Novo genome assembly for third generation sequencing data

Mateusz Forc, Wiktor Kuśmirek, Robert Marek Nowak

Abstract

Second-generation sequencing techniques opened the door to research on a global scale by significantly reducing the cost of DNA sequencing. However, second-generation sequencing technology has drawbacks, chiefly short read length. In 2017, new devices that use real-time sequencing became available. This approach, called "third-generation sequencing", achieves read lengths of 20 kbp with an error rate of about 15%. As a consequence, new DNA assemblers were developed. In this article we propose an implementation of an Overlap Graph-based de novo assembly algorithm for third-generation sequencing data. The proposed method involves graph algorithms and dynamic programming, optimized using a MinHash filter. The solution has been tested on both simulated and real bacterial data obtained from an Oxford Nanopore MinION sequencer. The algorithm is included in the "OLC" module of the dnaasm de novo assembler. The dnaasm application provides a command-line interface as well as a web browser-based client. Source code, a demo web application, and a Docker image are available at the dnaasm project web page: http://dnaasm.sourceforge.net.
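The abstract does not describe how the MinHash filter is built, so the following is only a minimal sketch of the general technique, not the dnaasm implementation. It assumes the filter is used to skip read pairs that share too few k-mers before the more expensive dynamic-programming alignment. All function names, the k-mer length, the number of hash functions, and the threshold are hypothetical choices made for illustration.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def seeded_hash(kmer, seed):
    """Deterministic 64-bit hash of a k-mer; each seed acts as one hash function."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(seq, k=5, num_hashes=64):
    """For each seeded hash function, keep the minimum value over the k-mer set."""
    kmer_set = kmers(seq, k)
    return [min(seeded_hash(km, seed) for km in kmer_set)
            for seed in range(num_hashes)]


def estimated_jaccard(sketch_a, sketch_b):
    """Fraction of agreeing sketch positions, an estimate of Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


def candidate_pairs(reads, threshold=0.1, k=5, num_hashes=64):
    """Yield index pairs (i, j), i < j, whose estimated similarity passes the filter.

    Assumption: only these pairs would be handed on to overlap alignment.
    """
    sketches = [minhash_sketch(read, k, num_hashes) for read in reads]
    for i in range(len(reads)):
        for j in range(i + 1, len(reads)):
            if estimated_jaccard(sketches[i], sketches[j]) >= threshold:
                yield i, j
```

For example, `list(candidate_pairs(reads))` on a small list of read strings returns the index pairs worth aligning. With long, noisy reads the threshold would likely need to be low, since a 15% error rate destroys many shared k-mers; the value used here is an arbitrary placeholder.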
Authors
- Mateusz Forc (FEIT / ICS), The Institute of Computer Science
- Wiktor Kuśmirek (FEIT / IN), The Institute of Computer Science
- Robert Marek Nowak (FEIT / IN), The Institute of Computer Science
Pages 108083D-1-108083D-8
Publication size in sheets 0.5
Book Romaniuk Ryszard, Linczuk Maciej Grzegorz (eds.): Proceedings of SPIE: Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2018, vol. 10808, 2018, SPIE - the International Society for Optics and Photonics, ISBN 9781510622036, 2086 p.
Keywords in English MinHash, Overlap Graph-based assembly, third-generation sequencing, DNA assemblers
DOI 10.1117/12.2501543
URL https://www.spiedigitallibrary.org/conference-proceedings-of-spie/10808/2501543/De-Novo-genome-assembly-for-third-generation-sequencing-data/10.1117/12.2501543.full
Language en (English)
File
108083D_Forc.pdf 753.97 KB
Score (nominal) 15
Score Ministerial score = 15.0, 16-10-2018, BookChapterMatConf
Ministerial score (2013-2016) = 15.0, 16-10-2018, BookChapterMatConf