The gatb-core from gatb

Convenience for programmers using lambda expressions

Hello!

I'd like to know if it is possible to change the signature of the iterate() function in the gatb::core::tools::dp::Iterator class, or to add another overload of it. I guess this class is used a lot when programming with GATB, but we can't use it with lambda expressions. Right now, this function is defined as:

    template <typename Functor>
    void iterate (/* R: removed const for convenience */ /*const*/ Functor& f)
    {
        for (first(); !isDone(); next()) { f (item()); }
    }

Because the parameter is no longer const, the function cannot be called with a temporary lambda expression, which would be convenient for those using C++11. For those using versions before C++11, however, the function seems more convenient as it is now...

This is not a big deal though...
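
To make the request concrete, here is a minimal sketch of one possible signature, assuming C++11 is available. ToyIterator is a hypothetical stand-in that only mirrors the first()/isDone()/next()/item() protocol; it is not the GATB class. A forwarding reference (Functor&&) accepts both named functors, which can keep mutable state, and temporary lambdas.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for gatb::core::tools::dp::Iterator<Item>;
// only the first()/isDone()/next()/item() protocol is mirrored here.
class ToyIterator {
public:
    explicit ToyIterator(std::vector<int> data) : data_(std::move(data)), pos_(0) {}

    void first()        { pos_ = 0; }
    void next()         { ++pos_; }
    bool isDone() const { return pos_ >= data_.size(); }
    int& item()         { assert(!isDone()); return data_[pos_]; }

    // Forwarding reference (C++11): binds to named functors (lvalues)
    // and to temporary lambdas (rvalues) alike.
    template <typename Functor>
    void iterate(Functor&& f) {
        for (first(); !isDone(); next()) { f(item()); }
    }

private:
    std::vector<int> data_;
    std::size_t pos_;
};

// A stateful functor, passed as an lvalue so that its state survives the call.
struct Counter {
    int n = 0;
    void operator()(int) { ++n; }
};

// Sums the items by handing a temporary lambda directly to iterate().
inline int sumWithLambda(ToyIterator& it) {
    int sum = 0;
    it.iterate([&sum](int x) { sum += x; });
    return sum;
}
```

With the current Functor& signature, a workaround that should already compile is to give the lambda a name first (auto f = [...](...) { ... }; it.iterate(f);), since a named lambda is an lvalue.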

Segmentation fault when creating de Bruijn Graph

Hi.
I'm using Graph::create to test some sequences, and I found that the library crashes on certain inputs. One example is ["TCAG", "TCCA"] with kmer-size = 4. The code is below:

size_t kmerSize = 4;
IBank *bank = new BankStrings("TCAG", "TCCA", NULL);
Graph graph = Graph::create(bank, "-kmer-size %d -abundance-min 1 -verbose 0", kmerSize);

Other inputs also trigger the problem, such as ['TCAGTCCA'] (the two sequences concatenated) and ['TCCTCAG'], with kmer-size = 4.

Using GDB, I traced the crash to line 452 of 'Bloom.hpp', '__sync_fetch_and_or (this->blooma + (h0 >> 3), bit_mask[h0 & 7])', in the function void BloomCacheCoherent::insert (const Item& item).
In the first case, h0 = 14049774491787554146

My environment:
OS: Ubuntu 18.04
CMake version: 3.10.3
gcc/g++ version: 7.3.0
Make version: 4.1
GATB-CORE version: 1.4.1

I also tried clang 6.0.0 and the problem is still there.

What is the reason? Could this be avoided?
Thanks.
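
A note on the reported h0: in the faulting expression, h0 >> 3 is used as a byte offset, and for h0 = 14049774491787554146 that offset is far beyond any buffer that could be allocated. So the hash was apparently not reduced to the filter size before the access. Why that happens inside GATB is not established here. The sketch below is a toy bit array (ToyBitArray is a hypothetical name, not GATB's Bloom class) that only illustrates reducing a 64-bit hash to a valid bit index before touching memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy bit array, not GATB's Bloom implementation.
class ToyBitArray {
public:
    explicit ToyBitArray(std::uint64_t nbBits)
        : nbBits_(nbBits), bytes_((nbBits + 7) / 8, 0) {
        assert(nbBits > 0);
    }

    // Sets the bit selected by the hash, after reducing it modulo the size.
    void insert(std::uint64_t hash) {
        const std::uint64_t bit = hash % nbBits_;   // always < nbBits_
        bytes_[bit >> 3] |= static_cast<std::uint8_t>(1u << (bit & 7));
    }

    // Tests the bit selected by the hash, with the same reduction.
    bool contains(std::uint64_t hash) const {
        const std::uint64_t bit = hash % nbBits_;
        return ((bytes_[bit >> 3] >> (bit & 7)) & 1u) != 0;
    }

    std::size_t sizeInBytes() const { return bytes_.size(); }

private:
    std::uint64_t nbBits_;
    std::vector<std::uint8_t> bytes_;
};
```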

Python bindings ?

Is there someone working on this ?

The only reference I found is on the GATB Global Architecture page :

Wrappers for other languages will be available in the future

I needed some basic scripting capabilities for exploring GATB's graphs so I wrote a simple Cython wrapper around gatb::core::debruijn::impl::Graph. Here is a demo. Code will be available soon.

Suggestions or questions are welcomed.

BCALM 2 wtf: repart(max_minimizer) < repart(min_minimizer)

I was trying BCALM 2 (commit d47cf58) on some MiSeq data, and it failed with this error:
wtf? traveller kmer = AAAAAATTCGT, min_minimizer=167 max_minimizer=1048575, repart(min_minimizer)=1, repart(max_minimizer)=0
I found the place in the source the error came from, but I can't tell why it happened. I'm using a single, interleaved FASTQ file with 2.6 million reads. They're variable-length reads; do they have to be fixed-length, maybe?
Here's the full command line I used:
$ bcalm -kmer-size 21 -abundance-min 10 -in SC8C1.fq -out k21a10/SC8C1.bcalm
(It also failed with a -kmer-size of 11.)
I can't tell from the error message whether I'm doing something basic wrong, or if there's a problem with the input data. I read all the documentation I could find, and I couldn't see any obvious mistake on my side.

be clear in the documentation that Graph::contains() has false positives on arbitrary nodes

example error in mac

The following error occurs when I compile an example on Mac (version 10.11.6):

In file included from examples/debruijn/debruijn1.cpp:4:
In file included from include/gatb/gatb_core.hpp:78:
include/gatb/debruijn/impl/Graph.hpp:37:10: fatal error: 'tr1/unordered_map' file not found
#include <tr1/unordered_map>

command: "clang++ examples/debruijn/debruijn1.cpp -Iinclude -Llib -lgatbcore -lhdf5 -ldl -lz -lpthread -std=c++0x -O3 -DBOOST_NO_CXX11_RVALUE_REFERENCES=1 -o debruijn1"
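
For context, the command above compiles with -std=c++0x, and in C++11 the container lives in <unordered_map> as std::unordered_map. Some standard libraries do not ship the tr1/ headers at all, which would explain the 'file not found'. Below is a minimal sketch of a guarded include. The compat namespace and countDistinct() are hypothetical names used for illustration, and this is not necessarily how GATB resolved the issue.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Pick the C++11 header when available, fall back to TR1 otherwise.
#if __cplusplus >= 201103L
  #include <unordered_map>
  namespace compat { using std::unordered_map; }
#else
  #include <tr1/unordered_map>
  namespace compat { using std::tr1::unordered_map; }
#endif

// Counts distinct words using whichever unordered_map was selected above.
inline std::size_t countDistinct(const std::vector<std::string>& words) {
    compat::unordered_map<std::string, int> counts;
    for (std::size_t i = 0; i < words.size(); ++i) {
        counts[words[i]] += 1;
    }
    return counts.size();
}
```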

Depth-first search on a graph

Hi guys,

I was wondering if you could point me to an example of how to implement depth-first search on a GATB de Bruijn graph? I see in the snippets there is an implementation for BFS but I need DFS.

In some of the slides floating around from presentations you have done it says the API provides ways of doing DFS on a graph but I am struggling to piece it together from the documentation.

Effectively what I am trying to do is find all paths from a starting kmer to some given "end kmer". What I had thought of doing was running DFS from the start kmer and then taking all paths from that node which include my "end kmer".

I have tried implementing DFS but have been running into issues that most likely come from my not fully understanding the API, i.e., when trying to generate paths I get stuck in unexpected (to me) cycles related to reverse complement kmers.

Any help is most appreciated.
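
To illustrate the reverse complement problem described above, here is a minimal sketch of DFS path enumeration on a toy graph. It deliberately does not use the GATB API: ToyGraph, canonical() and allPaths() are hypothetical names, and the graph is just a set of canonical k-mers held in memory. The one idea it shows is to mark nodes as "on the current path" by their canonical form, so that a k-mer and its reverse complement count as the same node and the search cannot loop through the opposite strand.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

typedef std::vector<std::string> Path;

// Reverse complement of a DNA string over {A, C, G, T}.
std::string revcomp(const std::string& s) {
    std::string r(s.rbegin(), s.rend());
    for (std::size_t i = 0; i < r.size(); ++i) {
        switch (r[i]) {
            case 'A': r[i] = 'T'; break;
            case 'C': r[i] = 'G'; break;
            case 'G': r[i] = 'C'; break;
            case 'T': r[i] = 'A'; break;
            default:  assert(false && "unexpected nucleotide");
        }
    }
    return r;
}

// Canonical form: the lexicographically smaller of a k-mer and its reverse complement.
std::string canonical(const std::string& kmer) {
    const std::string rc = revcomp(kmer);
    return kmer < rc ? kmer : rc;
}

// Toy graph: an oriented k-mer is a node if its canonical form is stored.
struct ToyGraph {
    std::set<std::string> canon;

    void add(const std::string& kmer) { canon.insert(canonical(kmer)); }
    bool contains(const std::string& kmer) const { return canon.count(canonical(kmer)) > 0; }

    // Oriented successors: drop the first base, append A, C, G or T.
    Path successors(const std::string& kmer) const {
        Path out;
        const std::string bases = "ACGT";
        for (std::size_t i = 0; i < bases.size(); ++i) {
            const std::string next = kmer.substr(1) + bases[i];
            if (contains(next)) out.push_back(next);
        }
        return out;
    }
};

// Recursive DFS that records every simple path from the current node to 'end'.
void dfs(const ToyGraph& g, const std::string& node, const std::string& end,
         std::set<std::string>& onPath, Path& path, std::vector<Path>& found) {
    const std::string key = canonical(node);
    if (onPath.count(key) > 0) return;   // already on the path, in either orientation
    onPath.insert(key);
    path.push_back(node);
    if (node == end) {
        found.push_back(path);
    } else {
        const Path next = g.successors(node);
        for (std::size_t i = 0; i < next.size(); ++i) {
            dfs(g, next[i], end, onPath, path, found);
        }
    }
    path.pop_back();                     // backtrack
    onPath.erase(key);
}

std::vector<Path> allPaths(const ToyGraph& g, const std::string& start, const std::string& end) {
    std::vector<Path> found;
    std::set<std::string> onPath;
    Path path;
    dfs(g, start, end, onPath, path, found);
    return found;
}
```

Enumerating all simple paths can take exponential time on real data, so in practice a depth limit or a bound on the number of paths would probably be needed; neither is shown here.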

-DNONCANONICAL doesn't work on osx?

Teo reports it freezes in RepartitorAlgorithm::execute()

compile error

hello,
I get the following error when I run cmake .. to compile gatb-core:

-- CppUnit FOUND (/public/home/bin/cppunit/include)
CMake Error at CMakeLists.txt:68 (message):
  Insufficient gcc version (gcc>=4.7 needed)


-- Configuring incomplete, errors occurred!
See also "/public/home//software/gatb-core/gatb-core/build/CMakeFiles/CMakeOutput.log".
See also "/public/home//software/gatb-core/gatb-core/build/CMakeFiles/CMakeError.log".

But my gcc version is 4.8.5, so why do I get this error message?

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/public/home/zpxu/bin/gcc-4.8.5/libexec/gcc/x86_64-unknown-linux-gnu/4.8.5/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ./configure --prefix=/public/home/zpxu/bin/gcc-4.8.5 --enable-threads=posix --disable-checking --disable-multilib --with-mpc=/public/home/zpxu/bin/mpc-0.8.1/ --with-gmp=/public/home/zpxu/bin/gmp-4.3.2/ --with-mpfr=/public/home/zpxu/bin/mpfr-2.4.2/
Thread model: posix
gcc version 4.8.5 (GCC)

printf messes with type of size_t

Hello,
You may want to address the warnings shown below.
Cheers,
Steffen

/home/moeller/git/med-team/gatb-core/gatb-core/thirdparty/BooPHF/BooPHF.h: In member function ‘void boomphf::bitVector::print() const’:
/home/moeller/git/med-team/gatb-core/gatb-core/thirdparty/BooPHF/BooPHF.h:535:33: warning: format ‘%lli’ expects argument of type ‘long long int’, but argument 2 has type ‘uint64_t’ {aka ‘long unsigned int’} [-Wformat=]
  535 |    printf("bit array of size %lli: \n",_size);
      |                              ~~~^      ~~~~~
      |                                 |      |
      |                                 |      uint64_t {aka long unsigned int}
      |                                 long long int
      |                              %li
/home/moeller/git/med-team/gatb-core/gatb-core/thirdparty/BooPHF/BooPHF.h:539:19: warning: format ‘%llu’ expects argument of type ‘long long unsigned int’, but argument 2 has type ‘uint64_t’ {aka ‘long unsigned int’} [-Wformat=]
  539 |      printf(" (%llu) ",ii);
      |                ~~~^    ~~
      |                   |    |
      |                   |    uint64_t {aka long unsigned int}
      |                   long long unsigned int
      |                %lu
/home/moeller/git/med-team/gatb-core/gatb-core/thirdparty/BooPHF/BooPHF.h:548:16: warning: format ‘%llu’ expects argument of type ‘long long unsigned int’, but argument 2 has type ‘uint64_t’ {aka ‘long unsigned int’} [-Wformat=]
  548 |     printf("%llu :  %lli,  ",ii,_ranks[ii]);
      |             ~~~^             ~~
      |                |             |
      |                |             uint64_t {aka long unsigned int}
      |                long long unsigned int
      |             %lu
/home/moeller/git/med-team/gatb-core/gatb-core/thirdparty/BooPHF/BooPHF.h:548:24: warning: format ‘%lli’ expects argument of type ‘long long int’, but argument 3 has type ‘__gnu_cxx::__alloc_traits<std::allocator<long unsigned int>, long unsigned int>::value_type’ {aka ‘long unsigned int’} [-Wformat=]
  548 |     printf("%llu :  %lli,  ",ii,_ranks[ii]);
      |                     ~~~^
      |                        |
      |                        long long int
      |                     %li
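
For reference, a portable way to print a uint64_t with printf-style functions is the PRIu64 macro from <cinttypes>, which expands to the right length modifier on each platform; casting to unsigned long long and using %llu is an equivalent alternative. The sketch below uses a hypothetical helper, describeSize(), and is not a patch against BooPHF.h.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a 64-bit size without tripping -Wformat on LP64 or LLP64 platforms.
std::string describeSize(std::uint64_t size) {
    char buf[64];
    // PRIu64 is "lu" where uint64_t is unsigned long, "llu" where it is unsigned long long.
    const int n = std::snprintf(buf, sizeof buf, "bit array of size %" PRIu64, size);
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
    (void)n;  // silence the unused-variable warning when NDEBUG is defined
    return std::string(buf);
}

// Same result via an explicit cast, for code bases that prefer %llu.
std::string describeSizeWithCast(std::uint64_t size) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "bit array of size %llu",
                  static_cast<unsigned long long>(size));
    return std::string(buf);
}
```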

add time complexities of Graph query operations in the documentation

Having some troubles with File of files

Hello,

When giving a file of files (fof) to create a graph with gatb::core::debruijn::impl::Graph::create() (-in parameter), I ran into several problems regarding relative paths. I am using the latest GATB release, v1.3.0, but it might be that you already fixed this problem in recent commits (apologies for that, I tried to find any commit that could deal with this, but I couldn't. However, I did not try the version on master...).

One example is if the directory structure is like this:

|--gatb_exec (the executable that creates the graph and does stuff)
|--input
    |--reads1.fa
    |--reads2.fa
    |--...
    |--fof

and fof contains:

input/reads1.fa
input/reads2.fa
...

An attempt to create the graph results in:
EXCEPTION: Unable to open bank 'output/tmp/readsFile' (if it is a list of files, perhaps some of the files inside don't exist)
However, all the files do exist, and the paths are correct.

Weirdly enough, if the structure is a little bit different, and fof is:

../input/reads1.fa
../input/reads1.fa
...

it works (maybe in the first example, I should precede all files with ./ ? I did not test this...).

The temporary workaround I found is to read the fof and create another fof with the absolute path of each file (using boost::filesystem::canonical()), then give this absolute fof to gatb::core::debruijn::impl::Graph::create(). However, when this absolute fof is written, the paths returned by boost::filesystem::canonical() come out wrapped in quotes, since paths can contain spaces, e.g.:
"/Users/ishi/Desktop/kissplicesvn/kissplice/branches/CDBG_GWAS_GATB_1_3_0/cdbggwas/build/cdbggwas-1.5.5-Darwin/sample_example/GCA_001449145.1_WH-SGI-V-07060_genomic.fna"
And the graph is still not built:
EXCEPTION: Unable to open bank 'output/tmp/readsFile' (if it is a list of files, perhaps some of the files inside don't exist)

If the quotes are removed from the path, then the graph is successfully built, but surely this will cause problems to users having spaces in their paths.

I guess this is probably a minor bug, but since in several tools (including this one), usually it is the user that will build the fof, it could give them some headaches...

Thanks!!
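
On the quotes mentioned above: they most likely come from streaming a path object with operator<<, which writes the path as a quoted string, rather than from canonical() itself. Streaming path.string() writes the plain text. The sketch below shows this with std::filesystem (C++17), used here as a stand-in for boost::filesystem; that substitution is an assumption made to keep the example self-contained. Whether GATB accepts unquoted paths containing spaces inside a fof is not something this sketch settles.

```cpp
#include <cassert>
#include <filesystem>
#include <sstream>
#include <string>

// Text produced by streaming the path object itself: wrapped in double quotes.
std::string streamedPath(const std::filesystem::path& p) {
    assert(!p.empty());
    std::ostringstream os;
    os << p;
    return os.str();
}

// Text produced by streaming the path's string form: no quotes added.
std::string streamedString(const std::filesystem::path& p) {
    assert(!p.empty());
    std::ostringstream os;
    os << p.string();
    return os.str();
}
```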

A different parallelisation granularity in GATB

Dear GATB team,

In pandora, we use GATB for local assembly of de novo variants. We are facing a performance issue because we have to perform several thousand local assemblies using GATB. Apart from some outliers that we are currently dealing with, these local assemblies are made on small graphs, and we can process each one very fast (usually in less than 1 second, occasionally more, with 5 seconds as the upper limit once outliers are excluded). This all uses 1 thread (we pass -nb-cores 1 when building the graph). Our performance problem arises from having to perform this small assembly several thousand times, which adds up to a considerable runtime.

A natural way to speed this up is to process these local assemblies in multiple threads, but we are facing issues with GATB in this case. I think it is reasonable that GATB was not designed to have several graphs built by different threads simultaneously in memory, as the general use case is, I guess, a huge graph built from NGS reads rather than thousands of small graphs as in our particular use case. Anyway, a minimum working example that reproduces the issues we are having can be found here, where we simply start 8 threads, and each one tries to build the same graph, but we get several types of runtime errors. If we comment out the #pragma omp line, everything runs single-threaded and works well.

I think we already identified and solved one issue with this type of multithreading. Running strace -y -t -e trace=open,close, we could identify that GATB created several temporary (trashme*) files using the process name (see

gatb-core/gatb-core/src/gatb/system/impl/FileSystemCommon.cpp

Line 185 in 7cb8a48

ss << tmp_prefix() << "_" << System::thread().getProcess();

). This means that all threads would read/write to the same files simultaneously, which would certainly cause crashes. We tried to fix this with these changes, and this part seems to have worked, as strace shows that temporary files are now created with different names.
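
For illustration only, one way to make such names unique within a process is to append a process-wide atomic counter to the prefix. uniqueTmpPrefix() below is a hypothetical helper and not the change referred to above; it takes the base name and the process id as plain parameters instead of calling into GATB.

```cpp
#include <atomic>
#include <cassert>
#include <sstream>
#include <string>

// Builds "<base>_<pid>_<n>", where n is different for every call in this process,
// so two threads never receive the same prefix.
std::string uniqueTmpPrefix(const std::string& base, unsigned long processId) {
    assert(!base.empty());
    static std::atomic<unsigned long> counter{0};
    std::ostringstream ss;
    ss << base << "_" << processId << "_" << counter.fetch_add(1);
    return ss.str();
}
```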

However, even with these changes, we still get several errors. We looked at these errors by running our multithreading example in debug mode and inspecting the stack frames when a segmentation fault happened. In several cases, the errors were within the HDF5 library, which I think was not compiled with the threadsafe parameter turned on, but this answer on SO seems to show that it is hard to have multiple threads accessing the HDF5 library in C++. Other errors were memory corruption errors (e.g. double frees, destructors called twice, etc.), but it seems that all these errors are related to some GATB singleton objects, like

gatb-core/gatb-core/src/gatb/tools/misc/impl/HostInfo.hpp

Line 54 in 7cb8a48

static system::SmartObject singleton;

which makes sense.

This short investigation made us realise that enabling this type of parallelisation in GATB might be more complicated than we thought, and might need a considerable time investment, which we can't afford. We would like to ask whether you think this is indeed a complicated issue that needs considerable development, or whether it only takes a few changes here and there.

Thanks for reading this far! I owe you a cookie when we meet again :p !

Changing the path where the temp files trashme_* are created

Hello,

I am using GATB and creating the graph as:

gatb::core::debruijn::impl::Graph::create("-in %s -kmer-size %d -abundance-min 0 -out %s/graph -nb-cores %d", readsFile.c_str(), kmerSize, outputFolder.c_str(), nbCores);

However, when doing as such, some temporary files are created in the working directory:

            trashme_17941_cfp
            trashme_17941_debloom_partitions.h5
            trashme_17941_dsk_partitions_gatb
            trashme_17941_t2_kmers

I would like to create these files in a temporary folder. To do so, I looked at some parameters (especially SortingCountAlgorithm<span>::getOptionsParser()) and I am creating now the graph as:

gatb::core::debruijn::impl::Graph::create("-in %s -kmer-size %d -abundance-min 0 -out %s/graph -out-tmp %s -out-dir %s -nb-cores %d", readsFile.c_str(), kmerSize, outputFolder.c_str(), tmpFolder.c_str(), tmpFolder.c_str(), nbCores);

But this just moved the creation of trashme_*_dsk_partitions_gatb to the temp folder. The other files are still created in the working directory.

Is there a way of specifying a temp folder for all the trashme files?

PS:
I am still using GATB v1.2.2 in this project. If this is doable in newer version, then it is fine, I will update the GATB core version...

Thanks!

LoRDEC with GATB-core-1.2.xx

LoRDEC needs to be updated to match versions 1.2.0 and later of GATB-core

#################################################################
make all
g++ lordec-correct.cpp -Igatb-core-1.2.0-bin-Linux/include/ -std=c++0x -O2 -c
lordec-correct.cpp: In function ‘int extend(gatb::core::debruijn::impl::Graph, int, gatb::core::debruijn::impl::Node, char*, int*, char*, int, int*, int*, int*)’:
lordec-correct.cpp:163:56: error: expected primary-expression before ‘>’ token
Graph::Vector<Node> neighbors = graph.successors<Node>(begin);
^
lordec-correct.cpp:216:60: error: expected primary-expression before ‘>’ token
Graph::Vector<Node> neighbors = graph.successors<Node>(cnode);
^
lordec-correct.cpp: In function ‘int best_path(gatb::core::debruijn::impl::Graph, gatb::core::debruijn::impl::Node, gatb::core::debruijn::impl::Node, char*, int*, char*, int, int*, int*)’:
lordec-correct.cpp:312:56: error: expected primary-expression before ‘>’ token
Graph::Vector<Node> neighbors = graph.successors<Node>(begin);
^
lordec-correct.cpp:372:55: error: expected primary-expression before ‘>’ token
Graph::Vector<Node> neighbors = graph.successors<Node>(cnode);
^
lordec-correct.cpp: In function ‘int traverseForwardUniSimplePath(gatb::core::debruijn::impl::Graph, gatb::core::debruijn::impl::Node&, int, char*, int*)’:
lordec-correct.cpp:418:58: error: expected primary-expression before ‘>’ token
Graph::Vector<Node> neighbors = graph.successors<Node>(node);
^
lordec-correct.cpp: In function ‘int correct_one_read(gatb::core::bank::Sequence, char*, gatb::core::debruijn::impl::Graph, gatb::core::system::IFile*, gatb::core::system::ISynchronizer*, int)’:
lordec-correct.cpp:470:10: error: ‘KSIZE_4’ was not declared in this scope
Kmer<KSIZE_4>::ModelDirect model(kmer_len);
^
lordec-correct.cpp:470:17: error: template argument 1 is invalid
Kmer<KSIZE_4>::ModelDirect model(kmer_len);
^
lordec-correct.cpp:470:32: error: expected initializer before ‘model’
Kmer<KSIZE_4>::ModelDirect model(kmer_len);
^
lordec-correct.cpp:471:10: error: the value of ‘KSIZE_4’ is not usable in a constant expression
Kmer<KSIZE_4>::ModelDirect::Iterator itKmer(model);
^
lordec-correct.cpp:470:10: note: ‘KSIZE_4’ was not declared ‘constexpr’
Kmer<KSIZE_4>::ModelDirect model(kmer_len);
^
lordec-correct.cpp:471:17: error: could not convert template argument ‘KSIZE_4’ to ‘long unsigned int’
Kmer<KSIZE_4>::ModelDirect::Iterator itKmer(model);
^
lordec-correct.cpp:471:20: error: ‘ModelDirect’ has not been declared
Kmer<KSIZE_4>::ModelDirect::Iterator itKmer(model);
^
lordec-correct.cpp:471:42: error: qualified-id in declaration before ‘itKmer’
Kmer<KSIZE_4>::ModelDirect::Iterator itKmer(model);
^
lordec-correct.cpp:483:5: error: ‘itKmer’ was not declared in this scope
itKmer.setData (seq.getData());
^
lordec-correct.cpp:495:49: error: ‘model’ was not declared in this scope
Node node = graph.buildNode(Data((char *)(model.toString(itKmer->value()).c_str())));
^
Makefile:89: recipe for target 'lordec-correct.o' failed
make: *** [lordec-correct.o] Error 1

segmentation fault with the test code

#include <gatb/gatb_core.hpp>
#include <iostream>

using namespace std;

int main(int argc, char* argv[]) {
    string inqfile = "/prednet/data03/OutputByRun03/pipeline_test/180420_MN00392_0030_A000H2F5H2/polishedfastq/H001271_200_R1_clean.fastq.gz";
    cerr << inqfile << endl;
    try {
        IBank* fqf = Bank::open(inqfile);
        Graph graph = Graph::create(fqf, "-abundance-min %d", 5);
        cout << "Graph info " << graph.getInfo() << endl;
        delete fqf;
    }
    catch (Exception& e) {
        cerr << e.getMessage() << endl;
    }
    return 0;
}
[Graph: nb branching found : 158282 ] 100 % elapsed: 0 min 0 sec remaining: 0 min 0 sec cpu: 1723.5 % mem: [ 525, 525, 993] MB
Graph info graph
gatb-core-library
version : 1.4.2
git_sha1 : notset
build_date : 2020-05-01/12:51:23
build_system : Linux-4.15.0-88-generic
build_compiler : /usr/bin/cc (7.4.0)
build_kmer_size : 32 64 96 128
host
name : pus2
nb_cores : 20
memory : 251.8
disk_current_dir : 211.3
max_file_nb : 50000
pid : 5229
configuration
config
kmer_size : 31
mini_size : 10
solidity_kind : sum
abundance_min : 5
abundance_max : 2147483647
available_space : 216367
estimated_sequence_number : 1109914
estimated_sequence_volume : 146
estimated_kmers_number : 118760798
estimated_kmers_volume : 906
max_disk_space : 214367
max_memory : 5000
nb_passes : 1
nb_partitions : 20
nb_bits_per_kmer : 64
nb_cores : 20
minimizer_type : lexicographic (kmc2 heuristic)
repartition_type : unordered
nb_cores_per_partition : 1
nb_partitions_in_parallel : 20
nb_cached_items_per_core_per_part : 262144
nb_banks : 1
system
cpu : 100.0
repartition
system
cpu : 100.0
dsk
bank
bank_uri : /prednet/data03/OutputByRun03/pipeline_test/180420_MN00392_0030_A000H2F5H2/polishedfastq/H001271_200_R1_clean.fastq.gz
bank_size : 354960808
bank_total_nt : 168348045
sequences
seq_number : 1219937
seq_size_min : 20
seq_size_max : 151
seq_size_mean : 138.0
seq_size_deviation : 24.2
kmers
kmers_nb_valid : 131647981
kmers_nb_invalid : 102694
stats
temp_files
nb_superkmers : 12361178
avg_superk_length : 10.65
minimizer_density : 2.16
total_size_(MB) : 134
tmp_file_biggest_(MB) : 9
tmp_file_smallest_(MB) : 5
tmp_file_mean_(MB) : 6.7
histogram
cutoff : 3
nb_ge_cutoff : 3304951
ratio_weak_volume : 0.29
first_peak : 60
kmers
solidity_kind : sum
thresholds : 5
kmers_nb_distinct : 37851419
kmers_nb_solid : 1841041
kmers_nb_weak : 36010378
kmers_percent_weak : 95.1
partitions
nb_partitions : 20
nb_items : 1841041
part_biggest : 134129
part_smallest : 72389
part_mean : 92052.1
kind
vector : 20
fillsolid_time : 0.591
1.read : 0.148
2.sort : 0.141
3.dump : 0.302
time : 4.163
fill_partitions : 3.373
fill_solid_kmers : 0.790
system
cpu : 625.2
mphf
stats
nb_keys : 1841041
data_size : 1085092
bits_per_key : 4.715
prec : 255
nb_abund_above_prec : 3
time : 0.651
build : 0.646
save : 0.005
system
cpu : 215.2
bloom
stats
kind : neighbor
bitsize : 11109522
nb_hash : 4
nbits_per_kmer : 6.034370
time : 0.212
build_from_kmers : 0.212
system
cpu : 295.2
debloom
stats
kind : cascading
impl : DebloomMinimizerAlgorithm
bitsize : 16542714
nbits_per_kmer : 8.985521
cfp : 5433192
bloom2 : 4149975
bloom3 : 620743
bloom4 : 231882
set : 430592
nb : 687723
time : 0.400
cascading : 0.178
fill_debloom_file : 0.151
finalize_debloom_file : 0.071
system
cpu : 550.0
branching
stats
nb_branching : 158282
percentage : 8.6
checksum_branching : c86a3a255523805b
time
build : 0.162
system
cpu : 1723.5

Segmentation fault (core dumped)

missing parts of HDF5

I'm happily using GATB v1.4.1 for LoRDEC, but I'm now annoyed because I can't find the H5File::isHdf5 function in the bundled hdf5 lib.

I fought hard to include the libhdf5 from my system (Ubuntu 18.04), but then there's a warning at execution:

Warning! ***HDF5 library version mismatched error***                                                                                                                                                           
The HDF5 header files used to compile this application do not match                                                                                                                                            
the version used by the HDF5 library to which this application is linked.

and I get a (core dumped).

Would it be possible to include the whole libhdf5 in GATB to provide H5File::isHdf5 and avoid having to include another libhdf5 ? (simplest solution IMHO)
If not, is there an easy way to tell GATB to use my system's libhdf5 ?
If so, will GATB compile with libhdf5 1.10.0 instead of 1.8.18 ?

getDefaultProperties() bug

There is a bug in the getDefaultProperties() function

gatb-core/gatb-core/src/gatb/tools/misc/impl/OptionsParser.cpp

Line 402 in e0c9525

IProperties* OptionsParser::getDefaultProperties ()

If we use an "OptionNoParam" to build an option that takes no parameters, it is always returned by getDefaultProperties(). I do not see any way to tell whether this option should be set by default or not.

Gatb-core v1.2.1 does not compile on old clang

Hi,

Compilation problem with the new version of gatb-core, v1.2.1: something related to struct hash in LargeInt.hpp (commit 6d9525e)
Compiler : clang 4.1 on mac

LargeInt.hpp:870:10: error: no template named 'hash'; did you mean 'boost::hash'? struct hash< gatb::core::tools::math::LargeInt<precision> >

Simplifications.hpp:76:96: error: use of undeclared identifier 'nullptr'
                                unsigned int backtrackingLimit = 0, Node *avoidFirstNode = nullptr,

Reading BAM files

Hi!

Is there any possibility to read a BAM file as a bank of sequences? Do you know of any software relying on GATB which would have done that?

Thanks!

mac compilation problem tr1

Compilation of MindTheGap with clang v7.3.0 has not worked since commit a4438e3 (May 19), probably because this code was removed:
if (CMAKE_CXX_COMPILER_VERSION VERSION_LESS 4.3)
    set (dont_use_tr1 0)
else()
    set (dont_use_tr1 1)
endif()

Should we patch gatb-core v1.2.0 for this ?

Could not find ZLIB

Hi, I am trying to install GATB-Core on my computer and it keeps failing due to an error I don't understand.
It seems that CMake couldn't find the ZLIB package in the hdf5 thirdparty.
I do have zlib installed here:
/home/$1/anaconda2/include
/home/$1/anaconda2/lib/libz.so
But the hdf5 thirdparty doesn't seem to look there at all.
I tried setting CMAKE_PREFIX_PATH manually like this, but it doesn't help: cmake -D CMAKE_PREFIX_PATH="/home/$1/anaconda2/lib/" ..

CMake Warning at thirdparty/hdf5/CMakeFilters.cmake:36 (find_package):
Could not find a package configuration file provided by "ZLIB" with any of
the following names:

ZLIBConfig.cmake
zlib-config.cmake

Add the installation prefix of "ZLIB" to CMAKE_PREFIX_PATH or set
"ZLIB_DIR" to a directory containing one of the above files. If "ZLIB"
provides a separate development package or SDK, be sure it has been
installed.
Call Stack (most recent call first):
thirdparty/hdf5/CMakeLists.txt:574 (include)

-- Found ZLIB: /home/$1/anaconda2/lib/libz.so (found version "1.2.11")
-- Filter ZLIB is ON
DYNAMIC BINARIES for gatb-h5dump
-- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE)
-- Configuring done
-- Generating done
-- Build files have been written to: /home/$1/PycharmProjects/kover/core/kover/dataset/tools/kmer_tools/thirdparty/gatb-core/gatb-core/build

Kmer size clarification

I was wondering if you could clear something up for me. In the README documentation, there is a section on default kmer sizes which I am slightly confused about.

Say I build GATB with -DKSIZE_LIST=32 does this mean that the maximum kmer size I can use in GATB-related code is 15 or 31? i.e. does the KSIZE_LIST refer to the number of bits used to represent a kmer? This line in the CMake file seems to suggest it is bits

gatb-core/gatb-core/src/CMakeLists.txt

Line 31 in 7cb8a48

list (APPEND KSIZE_STRING_TYPE_TMP "boost::mpl::int_<${ksize}>")

But the name of the variable/flag seems to indicate it is kmer size - a little confusing....
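
For the arithmetic behind the question, a minimal sketch assuming the usual 2 bits per nucleotide: a k-mer of length k then needs 2k bits, so a single 64-bit word can hold up to 32 nucleotides. The A=0, C=1, G=2, T=3 code and the function names are illustrative choices and not necessarily GATB's encoding, and the sketch does not settle whether the limit for -DKSIZE_LIST=32 is 31 or 32.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Number of bits needed for a k-mer at 2 bits per nucleotide.
constexpr unsigned bitsNeeded(unsigned k) { return 2 * k; }

// Packs a k-mer (k <= 32) into one 64-bit word, first base in the highest used bits.
std::uint64_t pack2bit(const std::string& kmer) {
    assert(kmer.size() <= 32);
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < kmer.size(); ++i) {
        std::uint64_t code = 0;
        switch (kmer[i]) {
            case 'A': code = 0; break;
            case 'C': code = 1; break;
            case 'G': code = 2; break;
            case 'T': code = 3; break;
            default:  assert(false && "unexpected nucleotide");
        }
        value = (value << 2) | code;
    }
    return value;
}
```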

be clear in the doc that Graph::buildNode() may return a real node, it's not necessarily fake.

Missing parentheses in several expressions

Hello,

when I compile gatb-core with clang, several potential errors are reported. I think that they are real errors. See this log:

In file included from /Users/karel/github/metang/proc/main.cpp:1:
In file included from /Users/karel/github/metang/proc/contrib/gatbcore-install/include/gatb/gatb_core.hpp:40:
In file included from /Users/karel/github/metang/proc/contrib/gatbcore-install/include/gatb/tools/collections/impl/Bloom.hpp:33:
/Users/karel/github/metang/proc/contrib/gatbcore-install/include/gatb/tools/math/LargeInt.hpp:842:21: warning: & has lower
      precedence than ==; == will be evaluated first [-Wparentheses]
            if (val & 15 == 0) // val starts with AA
                    ^~~~~~~~~
/Users/karel/github/metang/proc/contrib/gatbcore-install/include/gatb/tools/math/LargeInt.hpp:842:21: note: place parentheses
      around the '==' expression to silence this warning
            if (val & 15 == 0) // val starts with AA
                    ^
                      (      )
/Users/karel/github/metang/proc/contrib/gatbcore-install/include/gatb/tools/math/LargeInt.hpp:842:21: note: place parentheses
      around the & expression to evaluate it first
            if (val & 15 == 0) // val starts with AA
                    ^
                (       )
In file included from /Users/karel/github/metang/proc/main.cpp:1:
In file included from /Users/karel/github/metang/proc/contrib/gatbcore-install/include/gatb/gatb_core.hpp:40:
In file included from /Users/karel/github/metang/proc/contrib/gatbcore-install/include/gatb/tools/collections/impl/Bloom.hpp:33:
In file included from /Users/karel/github/metang/proc/contrib/gatbcore-install/include/gatb/tools/math/LargeInt.hpp:855:
/Users/karel/github/metang/proc/contrib/gatbcore-install/include/gatb/tools/math/LargeInt1.pri:300:17: warning: & has lower
      precedence than ==; == will be evaluated first [-Wparentheses]
        if (val & 15 == 0) // val starts with AA
                ^~~~~~~~~
/Users/karel/github/metang/proc/contrib/gatbcore-install/include/gatb/tools/math/LargeInt1.pri:300:17: note: place parentheses
      around the '==' expression to silence this warning
        if (val & 15 == 0) // val starts with AA
                ^
                  (      )
/Users/karel/github/metang/proc/contrib/gatbcore-install/include/gatb/tools/math/LargeInt1.pri:300:17: note: place parentheses
      around the & expression to evaluate it first
        if (val & 15 == 0) // val starts with AA
                ^
            (       )
In file included from /Users/karel/github/metang/proc/main.cpp:1:
In file included from /Users/karel/github/metang/proc/contrib/gatbcore-install/include/gatb/gatb_core.hpp:40:
In file included from /Users/karel/github/metang/proc/contrib/gatbcore-install/include/gatb/tools/collections/impl/Bloom.hpp:33:
In file included from /Users/karel/github/metang/proc/contrib/gatbcore-install/include/gatb/tools/math/LargeInt.hpp:860:
/Users/karel/github/metang/proc/contrib/gatbcore-install/include/gatb/tools/math/LargeInt2.pri:272:21: warning: & has lower
      precedence than ==; == will be evaluated first [-Wparentheses]
            if (val & 15 == 0) // val starts with AA
                    ^~~~~~~~~
/Users/karel/github/metang/proc/contrib/gatbcore-install/include/gatb/tools/math/LargeInt2.pri:272:21: note: place parentheses
      around the '==' expression to silence this warning
            if (val & 15 == 0) // val starts with AA
                    ^
                      (      )
/Users/karel/github/metang/proc/contrib/gatbcore-install/include/gatb/tools/math/LargeInt2.pri:272:21: note: place parentheses
      around the & expression to evaluate it first
            if (val & 15 == 0) // val starts with AA
                    ^
                (       )
In file included from /Users/karel/github/metang/proc/main.cpp:1:
In file included from /Users/karel/github/metang/proc/contrib/gatbcore-install/include/gatb/gatb_core.hpp:78:
/Users/karel/github/metang/proc/contrib/gatbcore-install/include/gatb/debruijn/impl/Graph.hpp:1183:30: warning: & has lower
      precedence than ==; == will be evaluated first [-Wparentheses]
            if ((value >> 1) & 1 == 1) 
                             ^~~~~~~~
/Users/karel/github/metang/proc/contrib/gatbcore-install/include/gatb/debruijn/impl/Graph.hpp:1183:30: note: place parentheses
      around the '==' expression to silence this warning
            if ((value >> 1) & 1 == 1) 
                             ^
                               (     )
/Users/karel/github/metang/proc/contrib/gatbcore-install/include/gatb/debruijn/impl/Graph.hpp:1183:30: note: place parentheses
      around the & expression to evaluate it first
            if ((value >> 1) & 1 == 1) 
                             ^
                (               )
4 warnings generated.
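
To spell out what the compiler is saying: in C++, == binds tighter than &, so val & 15 == 0 is parsed as val & (15 == 0), that is val & 0, which is always false. The (value >> 1) & 1 == 1 case is parsed as (value >> 1) & (1 == 1), which happens to have the same truth value as the intended test. The sketch below writes both parses out explicitly; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// How "val & 15 == 0" is actually parsed: val & (15 == 0) == val & 0, always false.
inline bool lowNibbleZeroAsWritten(std::uint64_t val) { return (val & (15 == 0)) != 0; }

// What the comment "val starts with AA" suggests was intended.
inline bool lowNibbleZero(std::uint64_t val) { return (val & 15) == 0; }

// How "(value >> 1) & 1 == 1" is parsed: (value >> 1) & (1 == 1) == (value >> 1) & 1.
inline bool bit1AsWritten(std::uint64_t value) { return ((value >> 1) & (1 == 1)) != 0; }

// Fully parenthesised form; same result as the line above for every input.
inline bool bit1Set(std::uint64_t value) { return ((value >> 1) & 1) == 1; }
```

So the three LargeInt warnings point at a condition that can never be true, while the Graph.hpp one is only a readability problem.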

Support for aarch64 / M1 macs?

Hi. I'm trying to build filt3r which depends on, and compiles, gatb-core. I'm able to build successfully on x86 linux, x86 mac os, but the build fails on aarch64/macos. I don't know my way around cmake and C++ code well enough to really diagnose much more than that.

It does appear that the error is occurring during the configuration phase though. Here's the console output of the error, after having tried more than once to build already:

tfenne@TimsM1x /tmp/gatb-core/gatb-core/build $ cmake ..
-- CppUnit NOT FOUND
[uint128 check] __uint128 not found.
Found: Enabling support for 128 bit integers using __uint128_t.
CMake Error at CMakeLists.txt:151 (STRING):
  STRING sub-command REGEX, mode REPLACE needs at least 6 arguments total to
  command.


CMake Error at CMakeLists.txt:153 (STRING):
  STRING sub-command REGEX, mode REPLACE needs at least 6 arguments total to
  command.


-- Options: -std=c++11 -O3 -DNDEBUG -Wall -Wno-unused-function -Wno-format -Wno-unknown-pragmas  -DINT128_FOUND -Wno-invalid-offsetof
CMake Warning (dev) at CMakeLists.txt:228 (set):
  Cannot set "gatb-core-flags": current scope has no parent.
This warning is for project developers.  Use -Wno-dev to suppress it.

CMake Warning (dev) at CMakeLists.txt:229 (set):
  Cannot set "gatb-core-includes": current scope has no parent.
This warning is for project developers.  Use -Wno-dev to suppress it.

CMake Warning (dev) at CMakeLists.txt:230 (set):
  Cannot set "gatb-core-libraries": current scope has no parent.
This warning is for project developers.  Use -Wno-dev to suppress it.

CMake Warning (dev) at CMakeLists.txt:231 (set):
  Cannot set "gatb-core-cmake": current scope has no parent.
This warning is for project developers.  Use -Wno-dev to suppress it.

-- OPTIMIZED KMER SIZES INTERVALS ARE 32 64 96 128 <-- max supported kmer size without recompilation
CMake Warning (dev) at CMakeLists.txt:239 (set):
  Cannot set "gatb-core-klist": current scope has no parent.
This warning is for project developers.  Use -Wno-dev to suppress it.

--  ---------- GATB TOOLS ----------
DYNAMIC BINARIES for dbgh5
DYNAMIC BINARIES for dbginfo
DYNAMIC BINARIES for leon
-- SOVERSION: 103.1.0
-- SOVERSION_TOOLS: 100.1.2
-- SOVERSION_CXX: 103.1.0
-- SOVERSION_F: 102.0.0
-- SOVERSION_HL: 100.1.2
-- SOVERSION_HL_CXX: 100.1.3
-- SOVERSION_HL_F: 100.0.4
-- SOVERSION_JAVA: 100.4.0
-- Warnings Configuration:
-- Could NOT find ZLIB (missing: ZLIB_DIR)
-- Filter ZLIB is ON
-- Generating 'H5Epubgen.h'
Generating 'H5Einit.h'
Generating 'H5Eterm.h'
Generating 'H5Edefin.h'

-- Generating '/tmp/gatb-core/gatb-core/thirdparty/hdf5/src/H5version.h'

-- Generating 'H5overflow.h'

-- Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE)
-- Configuring incomplete, errors occurred!
See also "/tmp/gatb-core/gatb-core/build/CMakeFiles/CMakeOutput.log".
See also "/tmp/gatb-core/gatb-core/build/CMakeFiles/CMakeError.log".

I can also provide the output and error logs if someone is interested in looking into this.

please create new release

gatb-core has not made a new release for a while.
However, minia and other tools use gatb code that is more recent than the available releases, and they do not build against the latest gatb release. Since that release, the ABI has changed and the code is no longer compatible (functions with a different number of parameters, etc.).

Expecting a frozen copy of gatb in a tool's subdirectory (as the gatbcore cmake module file does) is not good practice.

Tools should be able to compile and link against gatb as an external library, with its headers installed on the system.

Anyway, could you make a new stable release available?

Question regarding queryAbundance(node) function.

Hello all,
I am using the GATB library for my work. I have a naive question to which I have not been able to find an answer. When I use queryAbundance(node) to find the abundance of a node in the initial dataset, I always get 255 for any kmer whose abundance is greater than 255. Looking at the gatb-core source code, I found the following template in Graph.cpp:

template<typename Node, typename Edge, typename GraphDataVariant>
struct queryAbundance_visitor : public boost::static_visitor<int> {
    Node& node;
    queryAbundance_visitor (Node& node) : node(node) {}
    template<size_t span> int operator() (const GraphData<span>& data) const
    {
        unsigned long hashIndex = getNodeIndex(data, node);
        if (hashIndex == ULLONG_MAX) return 0; // node was not found in the mphf
        unsigned char value = (*(data._abundance)).at(hashIndex);
        return value;
    }
};

The template returns an unsigned char value, which has a range of 0 to 255. This might be why the value is always 255. But even if I change it to "unsigned int", I still get the same result. Following the snippet provided with the GATB library (deBruijn26.cpp), I am using the queryAbundance function as follows:

std::string s = model.toString (itKmer->value()); //itkmer->value is the kmer of length 21 from a read
const char* sq = s.c_str();
Node node = graph.buildNode(sq);
auto abund = graph.queryAbundance(node);

Am I making a mistake in the code above?

regards
Dilip
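
One possible reading of the quoted visitor, offered as an assumption rather than a confirmed description of gatb-core: if each entry of `_abundance` is a single byte, a count above 255 is already reduced when it is written, so widening the local variable `value` to `unsigned int` cannot bring the original count back. The sketch below uses a hypothetical `store_abundance` helper and a plain `std::vector<unsigned char>` as stand-ins; neither is a GATB name, and the clamping rule is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an 8-bit abundance table (not a GATB type).
using AbundanceTable = std::vector<unsigned char>;

// Assumed behaviour: counts are clamped to 255 at the moment they are stored.
inline void store_abundance(AbundanceTable& table, std::size_t index, unsigned long count) {
    table.at(index) = static_cast<unsigned char>(std::min<unsigned long>(count, 255UL));
}

// Reading into a wider type changes the type of the result, not the stored byte.
inline unsigned int query_abundance(const AbundanceTable& table, std::size_t index) {
    return table.at(index);
}

// Small self-check; call it from main() to see the saturation.
inline void check_saturation() {
    AbundanceTable table(2);
    store_abundance(table, 0, 42);     // fits in one byte
    store_abundance(table, 1, 1000);   // exceeds one byte
    assert(query_abundance(table, 0) == 42u);
    assert(query_abundance(table, 1) == 255u);  // information above 255 is gone
}
```

Under that assumption the calling code in the question is fine; recovering exact counts above 255 would need a wider storage type or a separate source of counts.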

mphfKind is not stored in hdf5 file

Hi.

Opening a graph generated with an emphf table (for example, a graph generated with Minia) triggers a segfault.

The problem is that GraphTemplate (const std::string& uri) initializes _mphfKind to MPHF_BOOPHF, so the map is loaded with the BooPHF implementation, which triggers the crash.

I tried to implement the detection but found that mphfKind is stored neither in /configuration/xml nor in /mphf/xml. grep -i emphf confirmed that the string is not present.

GATB-Core GitHub repo cannot be used in a CMake external project

Hello,

Since the root directory of this project does not contain a CMakeLists.txt, it cannot be used as an external project (or only with great difficulty, see http://stackoverflow.com/questions/30028117/cmake-externalproject-how-to-specify-relative-path-to-the-root-cmakelists-txt).

Solution: https://github.com/GATB/gatb-core/tree/master/gatb-core should be the root directory.

Official GATB-Core releases are laid out correctly, so I can define a URL instead of a GitHub repository (see, e.g., https://github.com/karel-brinda/cmake-ext/blob/master/gatbcore.cmake and a full working example at https://github.com/karel-brinda/cmake-ext-test). Nevertheless, I prefer to use GitHub repositories whenever possible.

Thanks.

problem with <boost/graph/adjacency_list.hpp>

I only inserted "#include <boost/graph/adjacency_list.hpp>" at the top of the gatb-core example file debruijn1.cpp and ran the following commands:
> cmake -DGATB_CORE_INCLUDE_EXAMPLES=True -DCMAKE_BUILD_TYPE=Debug ..
> make -j4 debruijn1

This produces a compile error:
[100%] Building CXX object examples/CMakeFiles/debruijn1.dir/debruijn/debruijn1.cpp.o
In file included from /usr/local/include/boost/pending/container_traits.hpp:15:0,
from /usr/local/include/boost/graph/named_graph.hpp:23,
from /usr/local/include/boost/graph/adjacency_list.hpp:37,
from /home/xieminzhu/Downloads/GATB/gatb-core/gatb-core/examples/debruij/debruijn1.cpp:3:
/usr/local/include/boost/next_prior.hpp:64:1: error: template argument 2 is invalid

^
The g++ version is gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1).
The Boost version installed on this Ubuntu system is boost_1_71_0.

Tutorial links for videos and pdf are not responding

Hello, I wanted to read the beginner's documentation, but it is unavailable because gatb-core.gforge.inria.fr does not respond (yes, I tried several browsers). Is there any way to work around this?

Bugfixes for kmer counting

Rare cases, but need to be fixed:

multi-bank counting activated when it shouldn't
higher memory usage than requested (not sure if this bug occurs)
when k = 2, results are off

Please tag release to allow compilation of minia

Hello, I found commit 7cb8a48, which suggests that this is synced with minia development; indeed, compiling minia with the latest release gives

/home/moeller/git/med-team/minia/minia/src/Minia.cpp:170:65:   required from here
/home/moeller/git/med-team/minia/minia/src/Minia.cpp:312:62: error: ‘class gatb::core::debruijn::impl::GraphUnitigsTemplate<128>’ has no member named ‘_nbSolidKmers’

It would be great if you would tag a new point release with this change (and maybe also mention this in the minia source tree).

Many thanks!
Steffen

Memory reservation in dbgh5

Hello all,

I encountered what might be a bug during a simple test of dbgh5 on my computer (Ubuntu 18.04, 8 GB of RAM, latest commit of gatb-core).

When working on a small test file (1000 reads), dbgh5 crashes with this message:

[DSK: Collecting stats on reads_r1       ]  100  %   elapsed:   0 min 0  sec   remaining:   0 min 0  sec   cpu:  50.0 %   mem: [  20,   20,   20] MB 
[DSK: Pass 1/1, Step 1: partitioning     ]  0    %   elapsed:   0 min 0  sec   remaining:   0 min 0  sec   cpu:  -1.0 %   mem: [  46,   46,   75] MB 
EXCEPTION: Pool allocation failed for 80 bytes (kmers alloc), mainbuffer is null?. Current usage is 96 and capacity is 5242881152
Pool allocation failed for 8 bytes (kmers alloc), mainbuffer is null?. Current usage is 120 and capacity is 5242881152
Pool allocation failed for 72 bytes (kmers alloc), mainbuffer is null?. Current usage is 200 and capacity is 5242881152
Pool allocation failed for 0 bytes (kmers alloc), mainbuffer is null?. Current usage is 208 and capacity is 5242881152
Pool allocation failed for 8 bytes (kmers alloc), mainbuffer is null?. Current usage is 232 and capacity is 5242881152
Pool allocation failed for 72 bytes (kmers alloc), mainbuffer is null?. Current usage is 312 and capacity is 5242881152
Pool allocation failed for 8 bytes (kmers alloc), mainbuffer is null?. Current usage is 328 and capacity is 5242881152
Pool allocation failed for 16 bytes (kmers alloc), mainbuffer is null?. Current usage is 352 and capacity is 5242881152

The problem can be solved by:

Freeing some memory on my computer
Lowering the -max-memory parameter (5GB by default)

It seems to me that having 5GB of memory available should not be necessary for such a tiny example.
Is this a bug or the expected behavior of gatb?

Regards,

Cervin
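
As a quick sanity check on the numbers in the log: the reported capacity of 5242881152 bytes is exactly 5000 MiB plus 1152 bytes, which is consistent with the pool being sized from a memory budget of about 5 GB rather than from the size of the input. That interpretation is an assumption; only the arithmetic is certain. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Number of bytes in one MiB.
constexpr std::uint64_t kBytesPerMiB = 1024ULL * 1024ULL;

// Whole MiB contained in a byte count.
constexpr std::uint64_t whole_mib(std::uint64_t bytes) { return bytes / kBytesPerMiB; }

// Bytes left over after removing the whole MiB.
constexpr std::uint64_t leftover_bytes(std::uint64_t bytes) { return bytes % kBytesPerMiB; }

// Small self-check using the capacity printed in the log above.
inline void check_log_capacity() {
    const std::uint64_t capacity = 5242881152ULL;
    assert(whole_mib(capacity) == 5000ULL);
    assert(leftover_bytes(capacity) == 1152ULL);
}
```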

close some output information

Hi,

Is there a way to turn off some of the output? For example, when I build a graph, a lot of output is printed ...

[DSK: Collecting stats on ill-test-5K-...] 0 % elapsed: 0 min 0 sec remaining:...
[DSK: Pass 1/1, Step 2: counting kmers ] 51.5 % elapsed: 0 min 2 sec remaining:...
[DSK: nb solid kmers found : 496 ] 51.5 % elapsed: 0 min 2 sec remaining:...
[MPHF: populate ] 100 % elapsed: 0 min 0 sec remaining:...
[Bloom: read solid kmers ] 0 % elapsed: 0 min 0 sec remaining:...
[Debloom: build extension ] 100 % elapsed: 0 min 0 sec remaining:...
[Debloom: finalization ] 0 % elapsed: 0 min 0 sec remaining:...

I have not managed to turn off the listener, so I am asking for help. What should I do to turn off this output?

Thank you very much.

Help Wanted

Hi,

I am interested in using GATB for long-read analysis, which seems more appealing now with HiFi reads and ultra-long ONT reads. Is GATB specifically for NGS short reads? Will there be future support and updates for long-read data? I want to make sure I am using the right technology for the right purpose.

Comments will be highly appreciated.

Thank you!

compilation error with new version of AppleClang

Hi,

gatb-core does not compile any more with AppleClang 10.0.1.1001004 (it worked with AppleClang 10.0.0.10001145)
Here are the error messages:

gatb-core/src/gatb/debruijn/impl/BranchingAlgorithm.cpp:155:58: error: 'predecessors' following the 'template' keyword does not refer to a template
GraphVector<Node> predecessors = graph->template predecessors (node);
gatb-core/src/gatb/debruijn/impl/BranchingAlgorithm.cpp:154:58: error: 'successors' following the 'template' keyword does not refer to a template
GraphVector<Node> successors   = graph->template successors   (node);

I have tested with two different commits of gatb-core f25f57b (current master) and an older one 5642ef0 (March 2018), and obtained the same errors.

Claire
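
For context, the `template` disambiguator is only valid in front of a member that really is a template. The two errors above would fit a situation where `successors` and `predecessors` are ordinary (non-template) member functions at those commits, which stricter compilers then reject; that is an assumption drawn from the error text, not checked against the gatb-core source. A minimal sketch with a hypothetical `ToyGraph` type:

```cpp
#include <cassert>
#include <vector>

struct ToyGraph {
    // Ordinary member function: call it without the `template` keyword.
    std::vector<int> successors(int node) const { return {node + 1}; }

    // Member function template: in a dependent context, a call with explicit
    // template arguments needs the `template` keyword.
    template <typename T>
    std::vector<T> neighbors(int node) const {
        return {static_cast<T>(node - 1), static_cast<T>(node + 1)};
    }
};

template <typename Graph>
int count_links(const Graph* graph, int node) {
    // graph->template successors(node);  // ill-formed: successors is not a template
    std::vector<int> succ = graph->successors(node);
    std::vector<long> both = graph->template neighbors<long>(node);
    return static_cast<int>(succ.size() + both.size());
}

// Small self-check; call it from main().
inline void check_count_links() {
    ToyGraph g;
    assert(count_links(&g, 5) == 3);  // one successor plus two neighbors
}
```

In this sketch, dropping the `template` keyword from the call to the non-template member is what makes it compile.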

Setting install prefix

I would like to install GATB and use it as a library.
How can I specify the install prefix please?

The prefix is ignored when specified as follows:

cmake --DCMAKE_INSTALL_PREFIX="/custom/prefix/" ..
make install

Executing cmake DESTDIR=/home/rffrancon/Documents/pandora/cmake-build-debug install installs to the following path: /custom/prefix/usr/local/HDF_Group/HDF5/1.8.18.

Thank you for your help.

How to use Model Direct?

Hello,

How can I use the Model Direct kmer type in the de Bruijn graph? gatb-core appears to use only ModelCanonical.

Thanks,
Lucas B. Rocha
