De Novo genome assembly for third generation sequencing data
Mateusz Forc, Wiktor Kuśmirek, Robert Marek Nowak
Abstract: Second-generation sequencing techniques opened the door to research on a worldwide scale because the cost of DNA sequencing dropped significantly. However, second-generation sequencing technology has some drawbacks, chiefly short read length. In 2017, new devices that use real-time sequencing became available. This approach, called "third-generation sequencing", achieves read lengths of 20 kbp with an error rate of about 15%. As a consequence, new DNA assemblers were developed. In this article we propose an implementation of an overlap graph-based de novo assembly algorithm for third-generation sequencing data. The proposed method combines graph algorithms and dynamic programming, optimized using a MinHash filter. The solution has been tested on both simulated and real bacterial data obtained from an Oxford Nanopore MinION sequencer. The algorithm is included in the "OLC" module of the dnaasm de novo assembler. The dnaasm application provides a command-line interface as well as a web browser-based client. The source code, a demo web application, and a Docker image are available at the dnaasm project web page: http://dnaasm.sourceforge.net.
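To illustrate the idea behind a MinHash filter, the following is a minimal Python sketch of how such a filter might select candidate read pairs before the more expensive dynamic-programming alignment. It is not taken from dnaasm: the function names (`kmers`, `minhash_sketch`, `estimated_jaccard`, `candidate_pairs`), the k-mer length, the number of hash functions, and the threshold are all assumptions chosen for illustration, and the BLAKE2b-based hashing is just one convenient choice.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_sketch(seq, k=5, num_hashes=64):
    """Build a sketch: the minimum hash of the k-mer set under each seeded hash."""
    kmer_set = kmers(seq, k)
    if not kmer_set:
        raise ValueError("sequence is shorter than k")
    sketch = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # a distinct salt gives a distinct hash function
        sketch.append(min(
            int.from_bytes(
                hashlib.blake2b(km.encode(), digest_size=8, salt=salt).digest(),
                "little",
            )
            for km in kmer_set
        ))
    return sketch


def estimated_jaccard(sketch_a, sketch_b):
    """Estimate Jaccard similarity as the fraction of positions with equal minima."""
    matches = sum(1 for x, y in zip(sketch_a, sketch_b) if x == y)
    return matches / len(sketch_a)


def candidate_pairs(reads, threshold=0.1, k=5, num_hashes=64):
    """Return index pairs (i, j), i < j, whose estimated similarity reaches threshold."""
    sketches = [minhash_sketch(r, k, num_hashes) for r in reads]
    pairs = []
    for i in range(len(reads)):
        for j in range(i + 1, len(reads)):
            if estimated_jaccard(sketches[i], sketches[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```

Only the pairs returned by `candidate_pairs` would then be passed to a full overlap alignment. The sketch compares all pairs of sketches for simplicity; a practical filter would likely index the sketches to avoid the quadratic loop, and would tune k and the threshold to the high error rate of long reads.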