
gap2seq's Introduction

Gap2Seq

Gap2Seq is a program for filling gaps in scaffolds produced by genome assembly tools using short read data such as reads produced by Illumina sequencing.

Since version 3.0, it can also genotype insertions from variant calls produced by variant calling tools.

Reference

L. Salmela, K. Sahlin, V. Mäkinen, and A.I. Tomescu: Gap filling as exact path length problem. In Proc. RECOMB 2015, LNBI 9029, Springer 2015, pp. 281-292.

L. Salmela and A.I. Tomescu: Safely filling gaps with partial solutions common to all solutions. In TCBB.

R. Walve, L. Salmela, V. Mäkinen: Variant genotyping with gap filling. In PLoS ONE 12(9): e0184608.

Requirements

Gap2Seq has been tested on systems running Linux on an x86_64 architecture. Gap2Seq uses the GATB library for the de Bruijn graph implementation and htslib for reading alignments for read filtering. Both libraries are included in the Gap2Seq package.

Compiling Gap2Seq requires a C++11 compatible compiler (GCC 4.7+ or Clang 3.5+) and CMake version 3.1 or newer.

Installation

git clone --recursive https://github.com/rikuu/Gap2Seq
cd Gap2Seq
cd thirdparty/htslib; make; cd ../..
mkdir build;  cd build;  cmake ..;  make

The main script Gap2Seq and the binaries Gap2Seq-core, GapMerger, GapCutter and ReadFilter can then be found in the build directory.

Usage

Gap2Seq [parameters]

Required parameters:
-f, --filled <FASTA file>   output file for filled scaffolds

-r, --reads <FASTA/Q files> short reads, several files can be specified as a list separated by ','
or
-l, --libraries <lib conf>  list of aligned read libraries

-s, --scaffolds <FASTA/Q file> scaffolds with gaps
or
-g, --gaps <FASTA/Q file>   pre-cut gaps
-b, --bed <BED file>
or
-v, --vcf <VCF file>        variants in the reads against reference
-R, --reference <FASTA file>

Optional parameters:
-t, --threads <int>         number of threads to use  [default 1]
-k <int>                    k-mer length for DBG  [default 31]
--max-mem <float>           maximum memory usage of DP table computation in gigabytes (excluding DBG) [default 20]
--fuz <int>                 number of nucleotides to ignore on gap fringes  [default 10]
--dist-error <int>          maximum error in gap estimates  [default 500]
--solid <int>               threshold for solid k-mers for building the DBG [default 2]
--randseed <int>            random seed (0 to use current time)  [default 0]
--all-upper                 fill all bases in upper case.
--unique                    fill only gaps with a unique path of best length
--best-only                 consider only paths that have optimal length
-h, --help                  show this help message and exit
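As a rough illustration of how the required parameters combine, the following Python sketch assembles a command line from the flags listed above. The helper name build_gap2seq_command and its keyword arguments are hypothetical and not part of Gap2Seq; only the flag names come from the usage text. It covers just the --scaffolds input mode and assumes exactly one read source (--reads or --libraries) is given.

```python
def build_gap2seq_command(filled, scaffolds, reads=None, libraries=None,
                          threads=1, k=31):
    """Return an argument list for the Gap2Seq script (hypothetical helper)."""
    # The usage text lists --reads and --libraries as alternatives.
    if (reads is None) == (libraries is None):
        raise ValueError("give exactly one of reads or libraries")

    cmd = ["Gap2Seq", "--scaffolds", scaffolds, "--filled", filled]
    if reads is not None:
        # Several read files are passed as a single comma-separated value.
        cmd += ["--reads", ",".join(reads)]
    else:
        cmd += ["--libraries", libraries]
    cmd += ["--threads", str(threads), "-k", str(k)]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_gap2seq_command(
        filled="genome.scf.fill.fasta",
        scaffolds="genome.scf.fasta",
        reads=["frag_1.fastq", "frag_2.fastq"],
    )))
```

The returned list can be handed to subprocess.run(cmd, check=True) if the Gap2Seq script is on the PATH.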

Examples

This example shows how to run Gap2Seq on the GAGE S. aureus data.

Download the GAGE data sets from

http://gage.cbcb.umd.edu/data/Staphylococcus_aureus/Data.original.tgz http://gage.cbcb.umd.edu/data/Staphylococcus_aureus/Assembly.tgz

Unpack the data files.

Without read filtering

Run Gap2Seq (here we run it for the SGA scaffolds)

Gap2Seq --scaffolds Assembly/SGA/genome.scf.fasta --filled Assembly/SGA/genome.scf.fill.fasta --reads Data/original/frag_1.fastq,Data/original/frag_2.fastq,Data/original/shortjump_1.fastq,Data/original/shortjump_2.fastq

The filled scaffolds are then in the file Assembly/SGA/genome.scf.fill.fasta.

With read filtering

Align the read libraries to the scaffolds (e.g. with BWA-MEM), then sort and index the alignments.

bwa index Assembly/SGA/genome.scf.fasta

bwa mem Assembly/SGA/genome.scf.fasta Data/original/frag_1.fastq Data/original/frag_2.fastq | samtools sort -O bam - > Data/original/frag.bam
samtools index Data/original/frag.bam

bwa mem Assembly/SGA/genome.scf.fasta Data/original/shortjump_1.fastq Data/original/shortjump_2.fastq | samtools sort -O bam - > Data/original/shortjump.bam
samtools index Data/original/shortjump.bam

Create a read library configuration file: a tab-separated file with one read library per line, giving the BAM file, mean insert size, standard deviation, and threshold.

echo -e "Data/original/frag.bam\t180\t18\t40" > libraries.txt
echo -e "Data/original/shortjump.bam\t3500\t350\t40" >> libraries.txt
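The same configuration can be generated from Python, which may be convenient when there are many libraries. This is a minimal sketch: the function name format_library_config is hypothetical, and the four-column layout is taken from the description above.

```python
def format_library_config(libraries):
    """Format one tab-separated line per library.

    Each library is a tuple: (bam path, mean insert size, std dev, threshold).
    """
    lines = []
    for bam, mean_insert, std_dev, threshold in libraries:
        lines.append("\t".join([bam, str(mean_insert), str(std_dev), str(threshold)]))
    # End with a newline so further libraries can be appended later.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    config = format_library_config([
        ("Data/original/frag.bam", 180, 18, 40),
        ("Data/original/shortjump.bam", 3500, 350, 40),
    ])
    print(config, end="")
```

Writing the returned string to libraries.txt gives the same file as the two echo commands.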

Run Gap2Seq.

Gap2Seq --scaffolds Assembly/SGA/genome.scf.fasta --filled Assembly/SGA/genome.scf.fill.fasta --libraries libraries.txt

Insertion genotyping

First, using any insertion/variant calling pipeline, construct a VCF file of the variants in the reads against a reference genome. Then run Gap2Seq supplying it with the VCF, reference, and the reads.

Gap2Seq --vcf Assembly/SGA/variants.vcf --reference Assembly/SGA/genome.scf.fasta --filled Assembly/SGA/genome.scf.fill.fasta --reads Data/original/frag_1.fastq,Data/original/frag_2.fastq,Data/original/shortjump_1.fastq,Data/original/shortjump_2.fastq

Insertion genotyping can also be combined with read filtering.
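Before genotyping, it can help to check which records in the VCF actually describe insertions. The sketch below is a simplified stand-alone check, not how Gap2Seq itself reads the file: it treats any ALT allele with an explicit sequence longer than REF as an insertion and skips symbolic alleles and breakends. The function name iter_insertions is hypothetical.

```python
def iter_insertions(vcf_lines):
    """Yield (chrom, pos, inserted_length) for ALT alleles longer than REF."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, meta-information and the header line
        fields = line.rstrip("\n").split("\t")
        # VCF columns: CHROM, POS, ID, REF, ALT, ...
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):
            # Symbolic alleles (<INS>), missing alleles and breakends carry
            # no plain inserted sequence, so they are ignored here.
            if alt in (".", "*") or alt.startswith("<") or "[" in alt or "]" in alt:
                continue
            if len(alt) > len(ref):
                yield chrom, pos, len(alt) - len(ref)


if __name__ == "__main__":
    example = [
        "##fileformat=VCFv4.2\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "chr1\t100\t.\tA\tACGT\t.\tPASS\t.\n",
    ]
    for record in iter_insertions(example):
        print(record)
```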

Changelog

Version 3.1

When classifying filled bases into safe and unsafe bases, all paths within the allowed interval are now considered. In the previous version, only paths with optimal length were considered. The old behaviour can still be invoked using the parameter --best-only.

ReadFilter now infers the read length. This breaks compatibility with old read library configurations.

Read filtering is slightly improved for gap filling. It is unchanged for insertion genotyping.

Unmapped reads are now filtered from alignments only once per read library. This speeds up gap filling by around 30% for large scaffolds.

Using the wrapper script without alignments now correctly delegates all processing to Gap2Seq-core.

Updated to GATB-core 1.4.1 and HTSlib to 1.8.

Version 3.0

Gap2Seq.sh is replaced with a Python script, which accepts gaps/scaffolds in FASTA/FASTQ and VCF formats and reads in FASTA/FASTQ and SAM/BAM formats.

Per-gap read filtering is optionally available when run with the new script.

The Gap2Seq binary is renamed to Gap2Seq-core and can still be used instead of the new script.

Flanks of length between k and k+fuz are now used rather than the hard limit of k+fuz.

Switched to GATB-core 1.2.2.

Version 2.0

Parallelization is now on gap level when run with the Gap2Seq.sh script.

Safe bases inserted into gaps are output in upper case and unsafe bases are output in lower case.
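Given the sequence that was inserted into a single gap, the case convention makes it easy to summarise how much of the fill is safe. The following is a small sketch with a hypothetical helper name, count_safe_unsafe. It only makes sense for output produced without --all-upper, which writes every filled base in upper case.

```python
def count_safe_unsafe(filled_gap):
    """Return (safe, unsafe) base counts for one filled gap sequence.

    Upper-case characters are counted as safe, lower-case as unsafe.
    """
    safe = sum(1 for base in filled_gap if base.isupper())
    unsafe = sum(1 for base in filled_gap if base.islower())
    return safe, unsafe


if __name__ == "__main__":
    safe, unsafe = count_safe_unsafe("ACGTacgtAC")
    print(f"safe={safe} unsafe={unsafe}")
```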

Version 1.0

Optimized version of the algorithm.

Output now indicates on which gaps search was aborted because of the memory limit.

Version 0.3

Reorganized parallel execution.

Version 0.2

Proper synchronization for access to the memuse hash table.

Switched to GATB 1.0.5.

The maximum memory limit option now applies to the total for all threads. This total is divided evenly among the threads.
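The per-thread budget that results from this even split can be worked out as follows. This is only an arithmetic sketch: the function name is hypothetical, and it assumes that a gigabyte means 2**30 bytes and that the split uses integer division, neither of which is stated in this changelog.

```python
def per_thread_memory_bytes(max_mem_gb, threads):
    """Split a total memory limit (in gigabytes) evenly across threads."""
    total_bytes = int(max_mem_gb * 2**30)  # assumption: 1 gigabyte = 2**30 bytes
    return total_bytes // threads          # assumption: integer division


if __name__ == "__main__":
    # Default limit of 20 gigabytes shared by 8 threads.
    print(per_thread_memory_bytes(20, 8))  # 2684354560
```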

Memory usage tracking now includes all major data structures excluding the DBG.

gap2seq's Issues

test suite?

Would it be possible to add a test suite, at least a simple one?

error running

Cut 71 scaffolds into 71 contigs and 0 gaps
Traceback (most recent call last):
  File "./Gap2Seq", line 445, in <module>
    args['bed'], args['gaps'] = cut_gaps(args['k'], args['fuz'], args['mask'], args['scaffolds'])
  File "./Gap2Seq", line 311, in cut_gaps
    return open(bed_file, 'r'), open(gap_file, 'r')
FileNotFoundError: [Errno 2] No such file or directory: 'tmp.gaps'

What should I do?
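The first log line reports 0 gaps, so it may be that the scaffolds contain nothing for Gap2Seq to fill. One way to check this independently is to count runs of N in the scaffold FASTA file, as in the sketch below. The function name count_gaps and its min_length parameter are hypothetical; how Gap2Seq itself decides what counts as a gap is not described here, and FASTQ input is not handled.

```python
import re


def count_gaps(fasta_lines, min_length=1):
    """Count runs of N/n of at least min_length bases over all FASTA records."""
    gaps = 0
    sequence = []

    def flush():
        # Count the N runs of the record collected so far, then reset it.
        nonlocal gaps
        joined = "".join(sequence)
        gaps += sum(1 for match in re.finditer(r"[Nn]+", joined)
                    if len(match.group()) >= min_length)
        sequence.clear()

    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            flush()  # a header starts a new record
        else:
            sequence.append(line)
    flush()
    return gaps


if __name__ == "__main__":
    example = [">scaffold1", "ACGTNNNN", "NNACGT", ">scaffold2", "ACGT"]
    print(count_gaps(example))
```

Calling count_gaps(open("scaffolds.fasta")) on the real input (file name illustrative) and getting 0 would suggest that the input has no gaps to begin with.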

tag a release, add source tarball to the release

@rikuu would it be possible to tag a release, that includes the bugfixes since 3.1? also, to upload to github, as part of the release, a tarball that includes the thirdparty libraries with the right source versions? (this is for packaging with bioconda) thanks!

No output

Hi!

I used Gap2Seq to fill gaps in my samples, but when running Gap2Seq I don't get any output. I then tried the GAGE example and also didn't get any output.

I got the following details:
Max mem: 894784853
Filled 0 gaps out of 0

Any help would be greatly appreciated.

tag a release

Can you tag the current release (3.0?), if it's stable? This would help with creating a bioconda recipe for Gap2Seq.

format genotyping

Dear,

I would just like to know what output format the program returns. I ran several programs to detect de novo insertions, and I want to know whether the VCF I supply is overwritten with GT information.

Thanks

Jordi

bugs in Gap2Seq

Hi everyone,

I installed Gap2Seq and found several bugs (or so I believe) in the Gap2Seq binary.

I ran Gap2Seq like this:
Gap2Seq --vcf insertions_filtered.vcf -t 48 --reference genome.fa --filled genome.fill.fa --reads insilico3_1.fq.gz,insilico3_2.fq.gz

The aim was to genotype a VCF containing insertions already detected by Pamir. You may encounter the following error:

line 329 "f" is not a variable (or something like that)...
If you go to this line you will find the variable f; I believe the correct variable is reference_file. The same happens in line 340.

Furthermore, in line 339 you should change "r+" to "w+", because some files need to be created. If you make these changes and install HDF5 version 1.8.18, the program works.

My only question is how I get the output: does the program overwrite the VCF or generate a new one?
I hope this helps

Jordi

htslib library linking problems

Hi,

I ran into some problems when trying to build Gap2Seq 3.1. During make I got the following error:

../thirdparty/htslib/libhts.a(cram_io.o): In function lzma_mem_deflate': /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/cram/cram_io.c:678: undefined reference to lzma_stream_buffer_bound'
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/cram/cram_io.c:684: undefined reference to lzma_easy_buffer_encode' ../thirdparty/htslib/libhts.a(cram_io.o): In function cram_compress_by_method':
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/cram/cram_io.c:1040: undefined reference to BZ2_bzBuffToBuffCompress' ../thirdparty/htslib/libhts.a(cram_io.o): In function cram_uncompress_block':
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/cram/cram_io.c:966: undefined reference to BZ2_bzBuffToBuffDecompress' ../thirdparty/htslib/libhts.a(cram_io.o): In function lzma_mem_inflate':
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/cram/cram_io.c:700: undefined reference to lzma_easy_decoder_memusage' /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/cram/cram_io.c:700: undefined reference to lzma_stream_decoder'
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/cram/cram_io.c:715: undefined reference to lzma_code' /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/cram/cram_io.c:728: undefined reference to lzma_code'
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/cram/cram_io.c:737: undefined reference to lzma_end' ../thirdparty/htslib/libhts.a(hfile_libcurl.o): In function easy_errno':
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:164: undefined reference to curl_easy_getinfo' /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:178: undefined reference to curl_easy_getinfo'
../thirdparty/htslib/libhts.a(hfile_libcurl.o): In function wait_perform': /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:686: undefined reference to curl_multi_fdset'
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:707: undefined reference to curl_multi_perform' ../thirdparty/htslib/libhts.a(hfile_libcurl.o): In function process_messages':
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:662: undefined reference to curl_multi_info_read' ../thirdparty/htslib/libhts.a(hfile_libcurl.o): In function wait_perform':
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:689: undefined reference to curl_multi_timeout' ../thirdparty/htslib/libhts.a(hfile_libcurl.o): In function libcurl_close':
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1049: undefined reference to curl_easy_pause' /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1058: undefined reference to curl_multi_remove_handle'
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1062: undefined reference to curl_easy_cleanup' /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1063: undefined reference to curl_multi_cleanup'
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1058: undefined reference to curl_multi_remove_handle' ../thirdparty/htslib/libhts.a(hfile_libcurl.o): In function libcurl_write':
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:821: undefined reference to curl_easy_pause' ../thirdparty/htslib/libhts.a(hfile_libcurl.o): In function libcurl_exit':
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:267: undefined reference to curl_share_cleanup' ../thirdparty/htslib/libhts.a(hfile_libcurl.o): In function libcurl_open':
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1121: undefined reference to curl_multi_init' /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1124: undefined reference to curl_easy_init'
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1128: undefined reference to curl_easy_setopt' /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1132: undefined reference to curl_easy_setopt'
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1140: undefined reference to curl_easy_setopt' /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1141: undefined reference to curl_easy_setopt'
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1142: undefined reference to curl_easy_setopt' ../thirdparty/htslib/libhts.a(hfile_libcurl.o):/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1149: more undefined references to curl_easy_setopt' follow
../thirdparty/htslib/libhts.a(hfile_libcurl.o): In function libcurl_open': /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1173: undefined reference to curl_multi_add_handle'
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1203: undefined reference to curl_easy_cleanup' /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1204: undefined reference to curl_multi_cleanup'
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1197: undefined reference to curl_multi_remove_handle' /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1135: undefined reference to curl_easy_setopt'
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1136: undefined reference to curl_easy_setopt' /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1167: undefined reference to curl_easy_setopt'
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1187: undefined reference to curl_easy_getinfo' ../thirdparty/htslib/libhts.a(hfile_libcurl.o): In function restart_from_position':
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:926: undefined reference to curl_easy_setopt' /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:945: undefined reference to curl_easy_duphandle'
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:949: undefined reference to curl_easy_setopt' /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:950: undefined reference to curl_easy_setopt'
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:951: undefined reference to curl_easy_setopt' /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1028: undefined reference to curl_easy_cleanup'
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:961: undefined reference to curl_multi_add_handle' /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1028: undefined reference to curl_easy_cleanup'
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1028: undefined reference to curl_easy_cleanup' /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:968: undefined reference to curl_easy_pause'
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:988: undefined reference to curl_multi_remove_handle' /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:991: undefined reference to curl_easy_reset'
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:992: undefined reference to curl_multi_remove_handle' /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1020: undefined reference to curl_easy_reset'
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1021: undefined reference to curl_multi_remove_handle' /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1001: undefined reference to curl_easy_cleanup'
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1003: undefined reference to curl_easy_setopt' /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1004: undefined reference to curl_easy_setopt'
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:994: undefined reference to curl_easy_cleanup' /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1007: undefined reference to curl_easy_reset'
../thirdparty/htslib/libhts.a(hfile_libcurl.o): In function libcurl_read': /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:763: undefined reference to curl_easy_pause'
../thirdparty/htslib/libhts.a(hfile_libcurl.o): In function hfile_plugin_init_libcurl': /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1351: undefined reference to curl_global_init'
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1354: undefined reference to curl_share_init' /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1356: undefined reference to curl_share_setopt'
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1357: undefined reference to curl_share_setopt' /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1358: undefined reference to curl_share_setopt'
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1384: undefined reference to curl_version_info' /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1360: undefined reference to curl_share_cleanup'
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1361: undefined reference to curl_global_cleanup' /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1373: undefined reference to curl_share_cleanup'
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:1374: undefined reference to curl_global_cleanup' ../thirdparty/htslib/libhts.a(hfile_libcurl.o): In function libcurl_exit':
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_libcurl.c:289: undefined reference to curl_global_cleanup' ../thirdparty/htslib/libhts.a(hfile_s3.o): In function s3_sign':
/home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_s3.c:77: undefined reference to EVP_sha1' /home/schusterbauer/Programs/Gap2Seq/thirdparty/htslib/hfile_s3.c:77: undefined reference to HMAC'
collect2: error: ld returned 1 exit status
make[2]: *** [ReadFilter] Error 1
make[1]: *** [CMakeFiles/ReadFilter.dir/all] Error 2
make: *** [all] Error 2

With some help from my supervisor I finally managed to build it by adding

'-lpthread -lbz2 -llzma -lcurl -lcrypto'
to CMakeLists.txt at line 65, so it now looks like

target_link_libraries("ReadFilter" ${gatb-core-libraries} ${htslib-library} -lpthread -lbz2 -llzma -lcurl -lcrypto)

Anyway, it seems like this shouldn't be necessary, and I don't know why these libraries are not linked automatically. Any ideas?

Also I still have troubles running it, but I'll open another issue for that.

Greetings,
Veronika

Error running Gage Example

Because I'm very new to this, I tried to run the GAGE example with the example data.

When running

python3.4 /[mybuildFolderPath]/Gap2Seq --scaffolds Assembly/SGA/genome.scf.fasta --filled Assembly/SGA/genome.scf.fill.fasta --reads Data/original/frag_1.fastq,Data/original/frag_2.fastq,Data/original/shortjump_1.fastq,Data/original/shortjump_2.fastq

I get the following output:

Cutting gaps
Merging filled gaps and contigs
Traceback (most recent call last):
  File "/home/schusterbauer/Programs/Gap2Seq-3.1/build/Gap2Seq", line 505, in <module>
    merge_gaps(args['filled'], args['final_out'])
  File "/home/schusterbauer/Programs/Gap2Seq-3.1/build/Gap2Seq", line 319, in merge_gaps
    stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
  File "/usr/lib/python3.4/subprocess.py", line 561, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/home/schusterbauer/Programs/Gap2Seq-3.1/build/GapMerger', '-scaffolds', 'Assembly/SGA/genome.scf.fill.fasta', '-gaps', 'tmp.filled', '-contigs', 'tmp.contigs']' returned non-zero exit status 1

This didn't mean a lot to me, but I think I tracked it down to Gap2Seq-core failing to fill any gaps and therefore creating an empty 'Assembly/SGA/genome.scf.fill.fasta' and no 'tmp.filled' file.

But the real question now is: why doesn't Gap2Seq-core work properly?

I ran it directly with:

/[mybuildFolderPath]/Gap2Seq-core -scaffolds Assembly/SGA/genome.scf.fasta -filled Assembly/SGA/genome.scf.fill.fasta -reads Data/original/frag_1.fastq,Data/original/frag_2.fastq,Data/original/shortjump_1.fastq,Data/original/shortjump_2.fastq

and it seems to do stuff but then in the end just outputs:

etc...
[Graph: build branching nodes ] 97 % elapsed: 0 min 1 sec remaining: 0 min 0 s
[Graph: build branching nodes ] 98 % elapsed: 0 min 1 sec remaining: 0 min 0 s
[Graph: build branching nodes ] 99 % elapsed: 0 min 1 sec remaining: 0 min 0 s
[Graph: build branching nodes ] 100 % elapsed: 0 min 1 sec remaining: 0 min 0 s
[Graph: nb branching found : 406997 ] 100 % elapsed: 0 min 1 sec remaining: 0 min 0 s
[Graph: nb branching found : 406997 ] 100 % elapsed: 0 min 1 sec remaining: 0 min 0 sec cpu: 719.6 % mem: [ 700, 703, 770] MB
Max mem: 2684354560
Filled 0 gaps out of 0

I would appreciate any help

Sidenote: Gap2Seq 2.1 seems to run smoothly, but I was mainly excited to try the Insertion Genotyping :/

Greetings,
Veronika

k-mer counts were clipped to 255

Hi,

Gap2Seq looks like a great tool and mostly performed well when I tested it (closing most smaller [<1000 bp] gaps in my test dataset). However, I always get a warning while the program runs, and I am worried that this might affect the ability of the program to close larger gaps. Here is the message (plus adjacent lines from the log):

2018-09-07 17:41:37: Round 62, 0% nodes remaining
2018-09-07 17:41:37: Assigning values
2018-09-07 17:41:39: Setting abundances of 42410107 kmers.
2018-09-07 17:41:52: WARNING: 100379 k-mer counts were clipped to 255
2018-09-07 17:42:11: Saving mphf to disk

I assume it means that the program was not able to correctly store kmer counts due to some memory limitation. Increasing the memory available to the program (to 200 GB) does not seem to make a difference though.

Any ideas?
All the best
Andreas
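The value 255 is the largest number an unsigned 8-bit integer can hold, so the warning is consistent with each k-mer abundance being stored in a single byte and capped at that maximum, rather than with the program running out of memory. That reading is an inference from the message itself and is not confirmed anywhere in this thread. The sketch below illustrates such saturating counts; the function name and the plain-dictionary storage are hypothetical and unrelated to how GATB stores abundances.

```python
from collections import Counter

MAX_COUNT = 255  # largest value of an unsigned 8-bit integer


def clipped_kmer_counts(sequence, k):
    """Count k-mers, capping each count at MAX_COUNT.

    Returns (counts, number_of_clipped_kmers).
    """
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    clipped = sum(1 for count in counts.values() if count > MAX_COUNT)
    capped = {kmer: min(count, MAX_COUNT) for kmer, count in counts.items()}
    return capped, clipped


if __name__ == "__main__":
    counts, clipped = clipped_kmer_counts("A" * 300, 3)
    print(counts, clipped)
```

If counts are only compared against a small threshold, such as the default --solid value of 2, capping at 255 would not change which k-mers pass it.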

bzlib.h fatal error

Upon running this line of code while installing Gap2Seq:
cd thirdparty/htslib; make; cd ..

I get the 'fatal error' below:

gcc -g -Wall -O2 -I. -c -o vcf.o vcf.c
gcc -g -Wall -O2 -I. -c -o vcfutils.o vcfutils.c
gcc -g -Wall -O2 -I. -c -o cram/cram_codecs.o cram/cram_codecs.c
gcc -g -Wall -O2 -I. -c -o cram/cram_decode.o cram/cram_decode.c
gcc -g -Wall -O2 -I. -c -o cram/cram_encode.o cram/cram_encode.c
gcc -g -Wall -O2 -I. -c -o cram/cram_external.o cram/cram_external.c
gcc -g -Wall -O2 -I. -c -o cram/cram_index.o cram/cram_index.c
gcc -g -Wall -O2 -I. -c -o cram/cram_io.o cram/cram_io.c
cram/cram_io.c:57:10: fatal error: bzlib.h: No such file or directory
#include <bzlib.h>
^~~~~~~~~
compilation terminated.
make: *** [Makefile:121: cram/cram_io.o] Error 1

Any ideas as to addressing this issue?

Thanks,
Chris

Gap2Seq 3.1 inserting new deletions

Hey,

I've been testing Gap2Seq's insertion genotyping ability, by randomly inserting "gaps" (replacing nucleotides in the fasta reference).

  1. The results were rather impressive, as it can even fill gaps up to 1000 bp correctly.

  2. I realized something that seems to be a rather serious bug. Gap2Seq is inserting a 10bp deletion after each gap it has filled in. The deletion happens about 1 kmer after the end of the gap (I've tested it with 2 different k-mer sizes). Therefore my guess would be the bug is somewhere in recombining the assembled sequence with the rest of the reference (although I must admit I have not looked at your code). Also this only seems to happen using the --library option.

Could you please recheck that behavior? As mentioned in the header I used Gap2Seq v. 3.1.

Used Command:

python3 Gap2Seq --scaffold MyGappedReference.fasta --filled MyFilledReference.fasta --library MyLibraryFile.txt -k 73 -t 8 --max-mem 10

MyLibraryFile.txt:

MyBamFile.bam\t450\t70\t150

Best regards,
Veronika

Installation problem

Hi

Should probably be

git clone --recursive https://github.com/rikuu/Gap2Seq
cd Gap2Seq
cd thirdparty/htslib
make
cd ..
mkdir build
cd build
cmake ../..
make

Since CMakeLists.txt is in the main folder, and not in Gap2Seq/thirdparty

Cheers
Erik

ZLIB error

Hi,

I'm trying to compile Gap2Seq, and I'm stuck at the cmake stage. Apparently it doesn't find ZLIBConfig.cmake, even though I think I managed to compile ZLIB correctly. In fact, ZLIBConfig.cmake is not present in the ZLIB that I have. This is the error:

-- Checking IF your system converts long double to (unsigned) long values with special algorithm... no
-- Checking IF your system can convert (unsigned) long to long double values with special algorithm... no
-- Checking IF correctly converting long double to (unsigned) long long values... yes
-- Checking IF correctly converting (unsigned) long long to long double values... yes
-- Checking IF alignment restrictions are strictly enforced... yes
CMake Warning at thirdparty/gatb-core/gatb-core/thirdparty/hdf5/CMakeFilters.cmake:36 (find_package):
  Could not find a package configuration file provided by "ZLIB" with any of
  the following names:

    ZLIBConfig.cmake
    zlib-config.cmake

  Add the installation prefix of "ZLIB" to CMAKE_PREFIX_PATH or set
  "ZLIB_DIR" to a directory containing one of the above files.  If "ZLIB"
  provides a separate development package or SDK, be sure it has been
  installed.
Call Stack (most recent call first):
  thirdparty/gatb-core/gatb-core/thirdparty/hdf5/CMakeLists.txt:574 (include)


-- Found ZLIB: /usr/lib64/libz.so  
-- Filter ZLIB is ON
DYNAMIC BINARIES for gatb-h5dump
CMake Error at CMakeLists.txt:43 (MESSAGE):
  Lambda expressions not available.  Use a newer C++ compiler (e.g.  GCC
  Version 4.5 or greater)


-- Configuring incomplete, errors occurred!
See also "/home/lv70640/c7701100/software/Gap2Seq/build/CMakeFiles/CMakeOutput.log".
See also "/home/lv70640/c7701100/software/Gap2Seq/build/CMakeFiles/CMakeError.log".

Any help?
F
