Scaffolding algorithm using second- and third-generation reads

Wiktor Franus, Wiktor Kuśmirek, Robert Marek Nowak


Second-generation sequencing methods produce high-quality short reads, which DNA assemblers combine into contigs. Because the length of a single read is limited to 500 bp, it is very hard to assemble full genomes or full chromosomes. Generating longer contigs at a low sequencing cost is a main goal of computer scientists in this area. We propose to link contigs created from second-generation reads using reads from third-generation sequencers, which are 10-20 kbp long. An existing implementation of this approach appears to be time- and memory-demanding for larger genomes. We developed an algorithm based on a Bloom filter and an extremely memory-efficient associative array. Our implementation substantially outperforms the previous one in terms of time and memory consumption. The presented algorithm, provided as a shared library, is part of the dnaasm de-novo assembler. The library was written in C++ using the Boost and Google Sparse Hash libraries. Both a web browser-based graphical user interface and a command line interface are provided. The source code, a demo web application and a Docker image are available at the dnaasm project web page. Our application has been tested on real data from bacterial, yeast and plant genomes.
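To illustrate the data structure named in the abstract, the following is a minimal sketch of a Bloom filter over DNA k-mers. It is not the dnaasm implementation: the class `KmerBloomFilter`, the helpers `insertKmers` and `countSharedKmers`, the double-hashing scheme built on `std::hash`, and the idea of storing contig k-mers and then probing them with a long read are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical minimal Bloom filter over DNA k-mers (illustration only,
// not the dnaasm code). It may report false positives, never false negatives.
class KmerBloomFilter {
public:
    KmerBloomFilter(std::size_t bits, unsigned hashes)
        : bits_(bits, false), hashes_(hashes) {
        assert(bits > 0 && hashes > 0);
    }

    // Set the bit selected by each of the `hashes_` hash functions.
    void insert(const std::string& kmer) {
        for (unsigned i = 0; i < hashes_; ++i) bits_[index(kmer, i)] = true;
    }

    // True if the k-mer may have been inserted; false if it definitely was not.
    bool possiblyContains(const std::string& kmer) const {
        for (unsigned i = 0; i < hashes_; ++i)
            if (!bits_[index(kmer, i)]) return false;
        return true;
    }

private:
    // Double hashing: the i-th index is derived from two base hashes.
    std::size_t index(const std::string& kmer, unsigned i) const {
        const std::size_t h1 = std::hash<std::string>{}(kmer);
        const std::size_t h2 = std::hash<std::string>{}(kmer + '#') | 1;
        return (h1 + i * h2) % bits_.size();
    }

    std::vector<bool> bits_;
    unsigned hashes_;
};

// Insert every k-mer of a sequence (for example a contig) into the filter.
void insertKmers(KmerBloomFilter& filter, const std::string& seq, std::size_t k) {
    if (k == 0 || seq.size() < k) return;
    for (std::size_t pos = 0; pos + k <= seq.size(); ++pos)
        filter.insert(seq.substr(pos, k));
}

// Count the k-mers of a read that the filter reports as possibly present.
std::size_t countSharedKmers(const KmerBloomFilter& filter,
                             const std::string& read, std::size_t k) {
    std::size_t shared = 0;
    if (k == 0 || read.size() < k) return shared;
    for (std::size_t pos = 0; pos + k <= read.size(); ++pos)
        if (filter.possiblyContains(read.substr(pos, k))) ++shared;
    return shared;
}
```

In this sketch, a filter would be filled once from the contigs with `insertKmers` and then queried for each long read with `countSharedKmers`. The filter uses a fixed number of bits regardless of k-mer length, which is the usual reason for choosing a Bloom filter when memory is the bottleneck.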
Authors:
- Wiktor Franus (FEIT / ICS), The Institute of Computer Science
- Wiktor Kuśmirek (FEIT / IN), The Institute of Computer Science
- Robert Marek Nowak (FEIT / IN), The Institute of Computer Science
Publication size in sheets: 0.5
Book: Romaniuk Ryszard, Linczuk Maciej Grzegorz (eds.): Proceedings of SPIE: Photonics Applications in Astronomy, Communications, Industry, and High-Energy Physics Experiments 2018, Proceedings of SPIE: The International Society for Optical Engineering, vol. 10808, 2018, SPIE - The International Society for Optics and Photonics, ISBN 9781510622036, 2048 p., DOI:10.1117/12.2504983
Keywords in English: scaffold, next generation sequencing, de-novo, DNA assemblers, hybrid DNA assembly, third-generation sequencing
Project: Development of new algorithms in the areas of software and computer architecture, artificial intelligence and information systems and computer graphics. Project leader: Arabas Jarosław, Phone: +48 22 234 7432, start date 01-08-2018, end date 30-09-2019, II/2018/DS/1, Completed
WEiTI statutory activity
Language: en (English)
108083A_franus.pdf 479.55 KB
Score (nominal): 15
Score source: conferenceIndex
Score: Ministerial score = 15.0, 01-02-2020, ChapterFromConference
Publication indicators: WoS Citations = 0; Scopus Citations = 0; Scopus SNIP (Source Normalised Impact per Paper) [Not active]: 2018 = 0.394