KOMB is a tool for fast identification of unitigs of interest in metagenomes. KOMB introduces the concept of a Hybrid Unitig Graph (an extension of compacted de Bruijn graphs) and relies on k-core and k-truss decomposition algorithms.
Currently, KOMB relies on igraph for k-core decomposition. Hence, although all the adjacency information needed for graph construction is already collected in the functions Kgraph::readSAM, Kgraph::getEdgeInfo, and Kgraph::generateGraph, we still need to create and initialize an igraph graph object. This step takes a significant amount of time, and the resulting graph data structure requires a large amount of RAM. This is limiting in the following ways:
Large RAM usage prevents KOMB from being run on extremely large datasets (e.g. whole human genome data, such as the HG002 300x Illumina data from the GIAB consortium).
Converting already stored information into a different object is redundant and comes at a large runtime cost. A current KOMB run on HG002 chr11 spends half of its processing time initializing the igraph object.
Currently, igraph implements only serial graph construction and serial k-core decomposition. Recent work has shown promising parallelization options for k-core decomposition, so it might be useful to implement these algorithms as an alternative for very large graphs.
Thus, it would be worthwhile to implement an alternative graph data structure that is optimized for KOMB and supports both parallel construction and parallel k-core decomposition.
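As a rough illustration of what such a structure could look like, the sketch below stores an undirected graph in compressed sparse row (CSR) form and computes core numbers by bucket-based peeling (in the style of the Batagelj-Zaversnik algorithm). It is a minimal sketch under stated assumptions, not KOMB's or igraph's implementation: `CsrGraph`, `buildCsr`, and `coreNumbers` are hypothetical names, vertices are assumed to be numbered 0 to n-1, and the edge list is assumed to contain no self-loops or duplicate edges. Everything shown here is serial; the CSR layout is used only because its count, prefix-sum, and fill steps are the kind of loops that could later be parallelized.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical CSR graph for illustration; not part of KOMB or igraph.
struct CsrGraph {
    std::vector<std::size_t> offsets;      // size n + 1; neighbors of v lie in [offsets[v], offsets[v + 1])
    std::vector<std::uint32_t> neighbors;  // size 2 * m for an undirected graph with m edges

    std::size_t numVertices() const { return offsets.empty() ? 0 : offsets.size() - 1; }
    std::size_t degree(std::size_t v) const { return offsets[v + 1] - offsets[v]; }
};

// Build a CSR graph from an undirected edge list.
// Assumes vertex ids in [0, n), no self-loops, and no duplicate edges.
CsrGraph buildCsr(std::size_t n,
                  const std::vector<std::pair<std::uint32_t, std::uint32_t>>& edges) {
    CsrGraph g;
    g.offsets.assign(n + 1, 0);
    for (const auto& e : edges) {  // pass 1: count degrees
        assert(e.first < n && e.second < n);
        ++g.offsets[e.first + 1];
        ++g.offsets[e.second + 1];
    }
    for (std::size_t i = 0; i < n; ++i) g.offsets[i + 1] += g.offsets[i];  // prefix sum
    g.neighbors.resize(2 * edges.size());
    std::vector<std::size_t> cursor(g.offsets.begin(), g.offsets.end() - 1);
    for (const auto& e : edges) {  // pass 2: fill adjacency in both directions
        g.neighbors[cursor[e.first]++] = e.second;
        g.neighbors[cursor[e.second]++] = e.first;
    }
    return g;
}

// Serial k-core decomposition by bucket peeling, O(n + m).
// Returns the core number of every vertex.
std::vector<std::uint32_t> coreNumbers(const CsrGraph& g) {
    const std::size_t n = g.numVertices();
    std::vector<std::uint32_t> deg(n);
    std::size_t maxDeg = 0;
    for (std::size_t v = 0; v < n; ++v) {
        deg[v] = static_cast<std::uint32_t>(g.degree(v));
        maxDeg = std::max(maxDeg, g.degree(v));
    }
    // After the prefix sum, bin[d] is the first position in vert holding a vertex of degree d.
    std::vector<std::size_t> bin(maxDeg + 2, 0);
    for (std::size_t v = 0; v < n; ++v) ++bin[deg[v] + 1];
    for (std::size_t d = 0; d <= maxDeg; ++d) bin[d + 1] += bin[d];
    std::vector<std::size_t> fill(bin), pos(n);
    std::vector<std::uint32_t> vert(n);  // vertices sorted by current degree
    for (std::size_t v = 0; v < n; ++v) {
        pos[v] = fill[deg[v]]++;
        vert[pos[v]] = static_cast<std::uint32_t>(v);
    }
    for (std::size_t i = 0; i < n; ++i) {
        const std::uint32_t v = vert[i];  // deg[v] is now final: it is v's core number
        for (std::size_t j = g.offsets[v]; j < g.offsets[v + 1]; ++j) {
            const std::uint32_t u = g.neighbors[j];
            if (deg[u] <= deg[v]) continue;
            // Move u to the front of its bin, then shrink the bin so u drops one degree.
            const std::uint32_t w = vert[bin[deg[u]]];
            if (u != w) {
                std::swap(vert[pos[u]], vert[pos[w]]);
                std::swap(pos[u], pos[w]);
            }
            ++bin[deg[u]];
            --deg[u];
        }
    }
    return deg;
}
```

For example, calling `buildCsr(5, edges)` on a triangle 0-1-2 with a pendant vertex 3 attached to vertex 2 and an isolated vertex 4, and passing the result to `coreNumbers`, assigns the triangle vertices to the 2-core, the pendant vertex to the 1-core, and the isolated vertex to the 0-core. A structure along these lines could in principle be filled directly from the adjacency information KOMB already collects, which would avoid the separate conversion step; whether that is practical depends on KOMB internals not covered here.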