cibiv / nextgenmap

NextGenMap is a flexible highly sensitive short read mapping tool that handles much higher mismatch rates than comparable algorithms while still outperforming them in terms of runtime. This allows analysing large scale datasets even with increased SNP rates or higher error rates (e.g. caused by specialized experimental protocols) and avoids biases caused by highly variable regions in the genome.

License: Other

C 12.85% C++ 83.44% Objective-C 0.18% SAS 0.01% CLIPS 0.04% Ada 0.62% Assembly 0.99% Pascal 0.49% C# 0.39% CMake 0.25% Makefile 0.26% HTML 0.21% Batchfile 0.01% M4 0.01% DIGITAL Command Language 0.19% Module Management System 0.01% Perl 0.03% Roff 0.03% Shell 0.01% Dockerfile 0.01%

short-read-mapper next-generation-sequencing c-plus-plus-11 opencl

nextgenmap's Introduction

Please see our github wiki for more information (https://github.com/Cibiv/NextGenMap/wiki)

nextgenmap's Issues

Build error using Clang 5.0.0

I encountered this CMake build error using the following commands:
cmake -G"Ninja" -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Release ../..

../../src/ReadProvider.cpp:480:34: error: ordered comparison between pointer and zero ('char *' and 'int')
        while (gzgets(fp, buffer, 1000) > 0 && buffer[0] == '@') {
....
../../utils/paired/interleave-pairs.cpp:62:34: error: ordered comparison between pointer and zero ('char *' and 'int')
        while (gzgets(fp, buffer, 1000) > 0 && buffer[0] == '@') {

I just needed to remove the > 0 comparison to get it to build and compile.
I checked GNU g++ compiler and it works without the change.
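For context, the error arises because gzgets returns a char pointer (the buffer on success, a null pointer on end-of-file or error), and Clang rejects an ordered comparison between a pointer and zero. Removing the comparison works; comparing explicitly against nullptr expresses the same intent. The sketch below is only an illustration: it uses std::fgets as a stand-in with the same return convention so that it compiles without zlib, and the function name countHeaderLines is hypothetical, not taken from the NGM sources.

```cpp
#include <cstdio>

// Counts the leading lines of a file that start with '@'.
// std::fgets stands in for gzgets here: both return the buffer on
// success and a null pointer on end-of-file or error.
int countHeaderLines(std::FILE* fp) {
    char buffer[1000];
    int count = 0;
    // Compare the returned pointer against nullptr instead of using "> 0".
    while (std::fgets(buffer, sizeof buffer, fp) != nullptr && buffer[0] == '@') {
        ++count;
    }
    return count;
}
```

With zlib, the same loop condition would presumably read gzgets(fp, buffer, 1000) != nullptr && buffer[0] == '@'.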

NextGenMap for bisulfite data and methylation calling

Dear @fritzsedlazeck ,

My lab uses NextGenMap extensively, as it performs well with our highly polymorphic datasets (transcriptomic and epigenomic data from pooled wild populations). However, we are encountering several issues when using NGM with bisulfite data, in particular with the format of the generated SAM/BAM file and its compatibility with mainstream downstream methylation callers (e.g. CGmaptools). This is how we call NGM to map our reads:

ngm --paired --bam -r <<genome>> -1 <<read1>>.fastq -2 <<read2>>.fastq -o <<output>>.bam -t 4 --no-unal --bs-mapping

I have a couple of questions in this regard:

How do the headers and flags of the BAM generated by NGM relate to those generated by other aligners that are perhaps more standard in methylation pipelines, such as bowtie2 or Bismark? This is not completely clear from the documentation, and while I see that NGM offers the opportunity to add/customise the format of the SAM/BAM output, it is something that I'm not very familiar with.
Is there any methylation caller that works directly with the NGM output? I see one is BiSS or BiS-SNP, but the BiSS code is apparently not available.

Thank you for your help and for generating such a great piece of code!

Best,

Chema

OPENCL Couldn't create sub-devices. Error

Hello,

I was trying NGM at the Texas Advanced Computing Center (https://portal.tacc.utexas.edu/user-guides/stampede2). However, an error occurs every time. I compiled NGM with CMake. I wonder if anyone has insight on how to solve this issue. Thanks. Any suggestion is greatly appreciated.

[OPENCL] Available platforms: 1
[OPENCL] AMD Accelerated Parallel Processing
[OPENCL] Selecting OpenCl platform: AMD Accelerated Parallel Processing
[OPENCL] Platform: OpenCL 1.2 AMD-APP (1214.3)
[OPENCL] 1 CPU device found.
[OPENCL] Device 0: Intel(R) Xeon Phi(TM) CPU 7250 @ 1.40GHz (Driver: 1214.3 (sse2,avx))
[OPENCL] Couldn't create sub-devices. Error:
Error: Invalid value (-30)
terminate called without an active exception

Best,
Mao-Lun

[PREPROCESS] Max. k-mer frequency set so

In the logs about preprocessing, I get the following warning (?) when the index is generated from the reference.

[PREPROCESS] Max. k-mer frequency set so 557!

Is this bad?

[MAIN] NextGenMap 0.5.5
2019-10-21 20:39:18.727 2019/10/22 03:39:18 CMD Pid:401    [MAIN] Startup : x64 (build Sep 11 2019 22:37:01)
2019-10-21 20:39:18.728 2019/10/22 03:39:18 CMD Pid:401    [MAIN] Starting time: 2019-10-22.03:39:18
2019-10-21 20:39:18.728 2019/10/22 03:39:18 CMD Pid:401    [CONFIG] Parameter:  --affine 0 --argos_min_score 0 --bin_size 2 --block_multiplier 2 --broken_pairs 0 --bs_cutoff 6 --bs_mapping 0 --cpu_threads 8 --dualstrand 1 --fast 0 --fast_pairing 0 --force_rlength_check 0 --format 1 --gap_extend_penalty 5 --gap_read_penalty 20 --gap_ref_penalty 20 --hard_clip 0 --keep_tags 0 --kmer 13 --kmer_min 0 --kmer_skip 2 --match_bonus 10 --match_bonus_tc 2 --match_bonus_tt 10 --max_cmrs 2147483647 --max_equal 1 --max_insert_size 1000 --max_polya -1 --max_read_length 0 --min_identity 0.650000 --min_insert_size 0 --min_mq 0 --min_residues 0.500000 --min_score 0.000000 --mismatch_penalty 15 --mode 0 --no_progress 1 --no_unal 0 --ocl_threads 1 --output /scratch/align-sha1_c7d15ec369be08560eb09d0d0eda99d87d0d7081-Ql7pv/.align-083497078/664647_GIG.aligned.sam --overwrite 1 --pair_score_cutoff 0.900000 --paired 1 --parse_all 1 --pe_delimiter / --qry /scratch/align-sha1_c7d15ec369be08560eb09d0d0eda99d87d0d7081-Ql7pv/.align-083497078/664647_GIG.fastq --qry_count -1 --qry_start 0 --ref /scratch/align-sha1_c7d15ec369be08560eb09d0d0eda99d87d0d7081-Ql7pv/.align-083497078/Ha412HOv2.0-20181130.fasta --ref_mode -1 --rg_id HV3WMBBXX.2.NTGTCA --rg_lb NTGTCA --rg_pl ILLUMINA --rg_pu HV3WMBBXX.2.NTGTCA --rg_sm 664647_GIG --sensitive 0 --silent_clip 0 --skip_mate_check 0 --skip_save 0 --slam_seq 0 --step_count 4 --strata 0 --topn 1 --trim5 0 --update_check 0 --very_fast 0 --very_sensitive 0
2019-10-21 20:39:18.728 2019/10/22 03:39:18 CMD Pid:401    [NGM] Opening for output (SAM): /scratch/align-sha1_c7d15ec369be08560eb09d0d0eda99d87d0d7081-Ql7pv/.align-083497078/664647_GIG.aligned.sam
2019-10-21 20:39:18.730 2019/10/22 03:39:18 CMD Pid:401    [SEQPROV] Encoding reference sequence.
2019-10-21 20:39:26.993 2019/10/22 03:39:26 CMD Pid:401    [SEQPROV] Size of reference genome 3251 Mbp (max. 17179 Mbp)
2019-10-21 20:39:26.994 2019/10/22 03:39:26 CMD Pid:401    [SEQPROV] Allocating 1625747302 (3280859081) bytes for the reference.
2019-10-21 20:39:42.990 2019/10/22 03:39:42 CMD Pid:401    [SEQPROV] BinRef length: 1625734859ll (elapsed 15.996025)
2019-10-21 20:39:42.990 2019/10/22 03:39:42 CMD Pid:401    [SEQPROV] 0 reference sequences were skipped (length < 10).
2019-10-21 20:39:43.007 2019/10/22 03:39:43 CMD Pid:401    [SEQPROV] Writing encoded reference to /scratch/align-sha1_c7d15ec369be08560eb09d0d0eda99d87d0d7081-Ql7pv/.align-083497078/Ha412HOv2.0-20181130.fasta-enc.2.ngm
2019-10-21 20:39:46.842 2019/10/22 03:39:46 CMD Pid:401    [SEQPROV] Writing to disk took 3.84s
2019-10-21 20:39:46.859 2019/10/22 03:39:46 CMD Pid:401    [PREPROCESS] Building reference table
2019-10-21 20:39:46.859 2019/10/22 03:39:46 CMD Pid:401    [PREPROCESS] Allocated 1 hashtable units (tableLocMax=2^32.000000, genomeSize=2^31.598445)
2019-10-21 20:39:46.859 2019/10/22 03:39:46 CMD Pid:401    [PREPROCESS] Building RefTable #0 (kmer length: 13, reference skip: 2)
2019-10-21 20:39:46.859 2019/10/22 03:39:46 CMD Pid:401    [PREPROCESS] 	Number of k-mers: 67108865
2019-10-21 20:41:04.960 2019/10/22 03:41:04 CMD Pid:401    [PREPROCESS] 	Counting kmers took 78.10s
2019-10-21 20:41:08.313 2019/10/22 03:41:08 CMD Pid:401    [PREPROCESS] 	Average number of positions per prefix: 18.880308
2019-10-21 20:41:08.313 2019/10/22 03:41:08 CMD Pid:401    [PREPROCESS] 	Index size: 335544325 byte (67108865 x 5)
2019-10-21 20:41:08.313 2019/10/22 03:41:08 CMD Pid:401    [PREPROCESS] 	Generating index took 3.35s
2019-10-21 20:41:12.165 2019/10/22 03:41:12 CMD Pid:401    [PREPROCESS] 	Allocating and initializing prefix Table took 3.85s
2019-10-21 20:41:12.165 2019/10/22 03:41:12 CMD Pid:401    [PREPROCESS] 	Number of prefix positions is 1064959800 (4)
2019-10-21 20:41:12.165 2019/10/22 03:41:12 CMD Pid:401    [PREPROCESS] 	Size of RefTable is 4259839200
2019-10-21 20:50:24.598 2019/10/22 03:50:24 CMD Pid:401    [PREPROCESS] 	Number of repetitive k-mers ignored: 98193
2019-10-21 20:50:24.598 2019/10/22 03:50:24 CMD Pid:401    [PREPROCESS] 	Overall time for creating RefTable: 637.74s
2019-10-21 20:50:24.598 2019/10/22 03:50:24 CMD Pid:401    [PREPROCESS] Writing RefTable to /scratch/align-sha1_c7d15ec369be08560eb09d0d0eda99d87d0d7081-Ql7pv/.align-083497078/Ha412HOv2.0-20181130.fasta-ht-13-2.3.ngm
2019-10-21 20:50:33.287 2019/10/22 03:50:33 CMD Pid:401    [PREPROCESS] Writing to disk took 8.69s
2019-10-21 20:50:33.449 2019/10/22 03:50:33 CMD Pid:401    [PREPROCESS] Max. k-mer frequency set so 557!
2019-10-21 20:50:33.449 2019/10/22 03:50:33 CMD Pid:401    [INPUT] Input is paired end data.
2019-10-21 20:50:33.449 2019/10/22 03:50:33 CMD Pid:401    [INPUT] Opening file /scratch/align-sha1_c7d15ec369be08560eb09d0d0eda99d87d0d7081-Ql7pv/.align-083497078/664647_GIG.fastq for reading
2019-10-21 20:50:33.451 2019/10/22 03:50:33 CMD Pid:401    [INPUT] Input is Fastq

OPENCL error: couldn't get number of OpenCl devices. Device not found

Hi,

I installed NGM using (mini)conda on Ubuntu for Windows. I have an i5-5300U (2.3 GHz). Everything goes well until [OPENCL] Build status failed: Program build failure is reported (-11). The CPU has 4 cores and -t is set to 4 (I also get the error when setting it lower). What can I do to solve this problem?

Kind regards

Header Problem with Picard

Dear NGM developers and users,

I just tried NGM on my BAM files and it worked fine.
But a problem occurred when I tried to use Picard AddOrReplaceReadgroups on these BAM files. I got the following error message:

Exception in thread "main" net.sf.samtools.SAMFormatException: Error parsing SAM header. Problem parsing @PG key:value pair. Line:
@PG ID: PN:ngm CL: --affine 0 --bam 1 --block_multiplier 2 --bs_cutoff 8 --bs_mapping 0 --cpu_threads 4 --dualstrand 1 --fast_pairing 0 --format 2 --gap_extend_penalty 5 --gap_read_penalty 20 --gap_ref_penalty 20 --gpu 1 { 0 } --hard_clip 0 --kmer 13

The problems seem to arise from the "--" in all the header lines.

Best regards
Thomas
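One way to narrow this down is to check each field of the header line against the shape the SAM specification requires for header fields: a two-character tag, a colon, and a non-empty value, separated by tabs. In the line quoted above, the ID field appears to carry no value; whether that, rather than the "--", is what Picard rejects is only a guess. The sketch below is a minimal checker; the function name malformedHeaderFields is hypothetical, and it assumes the header line is tab-separated as in a real SAM file.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Returns the fields of a tab-separated SAM header line that are not of
// the form TAG:VALUE with a two-character tag and a non-empty value.
// (@CO comment lines are free text and should not be passed to this check.)
std::vector<std::string> malformedHeaderFields(const std::string& line) {
    std::vector<std::string> bad;
    std::istringstream in(line);
    std::string field;
    std::getline(in, field, '\t');  // skip the record type, e.g. "@PG"
    while (std::getline(in, field, '\t')) {
        const bool ok = field.size() >= 4
                     && std::isalpha(static_cast<unsigned char>(field[0]))
                     && std::isalnum(static_cast<unsigned char>(field[1]))
                     && field[2] == ':';
        if (!ok) bad.push_back(field);
    }
    return bad;
}
```

Called on a line such as "@PG<TAB>ID:<TAB>PN:ngm", it would report the empty "ID:" field and accept "PN:ngm".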

Now packaged in Homebrew Science

brew install brewsci/bio/nextgenmap

Error: X connection to localhost:17.0 broken (explicit kill or server shutdown).

I'm trying to run the latest GitHub version of NGM on a server I connect to via ssh. However, it fails pretty quickly with an X error. See the log below:

~/src/NextGenMap/bin/ngm-0.5.0/ngm -r viruses.fa -1 R1.nohuman.fastq -2 R2.nohuman.fastq
[MAIN] NextGenMap 0.5.0
[MAIN] Startup : x64 (build May 26 2016 10:47:40)
[MAIN] Starting time: 2016-05-26.13:44:33
[CONFIG] Parameter:  --affine 0 --argos_min_score 0 --bin_size 2 --block_multiplier 2 --bs_cutoff 6 --bs_mapping 0 --cpu_threads 1 --dualstrand 1 --fast 0 --fast_pairing 0 --force_rlength_check 0 --format 1 --gap_extend_penalty 5 --gap_read_penalty 20 --gap_ref_penalty 20 --hard_clip 0 --keep_tags 0 --kmer 13 --kmer_min 0 --kmer_skip 2 --match_bonus 10 --match_bonus_tc 2 --match_bonus_tt 10 --max_cmrs 2147483647 --max_equal 1 --max_insert_size 1000 --max_read_length 0 --min_identity 0.650000 --min_insert_size 0 --min_mq 0 --min_residues 0.500000 --min_score 0.000000 --mismatch_penalty 15 --mode 0 --no_progress 0 --no_unal 0 --ocl_threads 1 --overwrite 1 --pair_score_cutoff 0.900000 --paired 1 --parse_all 1 --pe_delimiter / --qry1 R1.nohuman.fastq --qry2 R2.nohuman.fastq --qry_count -1 --qry_start 0 --ref viruses.fa --ref_mode -1 --sensitive 0 --silent_clip 0 --skip_mate_check 0 --skip_save 0 --slam_seq 0 --step_count 4 --strata 0 --topn 1 --trim5 0 --update_check 0 --very_fast 0 --very_sensitive 0
[NGM] Wrinting output (SAM) to stdout
[SEQPROV] Reading encoded reference from viruses.fa-enc.2.ngm
[SEQPROV] Reading 0 Mbp from disk took 0.00s
[PREPROCESS] Reading RefTable from viruses.fa-ht-13-2.3.ngm
[PREPROCESS] Reading from disk took 0.67s
[PREPROCESS] Max. k-mer frequency set so 100!
[INPUT] Input is single end data.
[INPUT] Opening file R1.nohuman.fastq for reading
[INPUT] Opening file R2.nohuman.fastq for reading
[INPUT] Input is Fastq
[INPUT] Estimating parameter from data
[INPUT] Reads found in files: 305562
[INPUT] Average read length: 73 (min: 35, max: 78)
[INPUT] Corridor width: 15
[INPUT] Average kmer hits pro read: 0.075254
[INPUT] Max possible kmer hit: 20
[INPUT] Estimated sensitivity: 0.300000
[INPUT] Estimating parameter took 4.723s
[INPUT] Input is Fastq
[INPUT] Input is Fastq
X connection to localhost:17.0 broken (explicit kill or server shutdown).

-h should return 0, not 1

I asked for help and got it. That's not an error.

(I'm packaging for brew.)

-t/--threads cmd line parsing error

I am using v0.5.5 and originally submitted the following command:

ngm \
      -r KN99_genome_fungidb.fasta \
      -q run_1272_s_3_withindex_sequence_AAATGCA.fastq.gz \
      --threads 2 \
      --bam

but, I received this error:

/usr/local/bin/ngm-core: unrecognized option '--threads'

The help text says --threads is a valid argument:

...
General:

 -t/--threads <int>            Number of candidate search threads
...

I resubmitted, replacing --threads with -t and eliminated that error:

  ngm \
     -r KN99_genome_fungidb.fasta \
     -q run_1272_s_3_withindex_sequence_AAATGCA.fastq.gz \
     -t 2 \
     --bam

When I run make following the installation instructions, two errors occur: make[2]: * [src/CMakeFiles/ngm-core.dir/ReadProvider.cpp.o] Error 1 137 make[1]: * [src/CMakeFiles/ngm-core.dir/all] Error 2. How can I solve this? My gcc version is 4.1.2.

Error: NVIDIA Platform not found

Hello,

I compiled ngm myself, apparently without errors. It works well with the CPU computation.
However, when I use the -g option I get:

[OPENCL] Available platforms: 1
[OPENCL] AMD Accelerated Parallel Processing
[OPENCL] NVIDIA Platform not found. Falling back to AMD.
[OPENCL] Available platforms: 1
[OPENCL] AMD Accelerated Parallel Processing
[OPENCL] Selecting OpenCl platform: AMD Accelerated Parallel Processing
[OPENCL] Platform: OpenCL 1.2 AMD-APP (1214.3)
[OPENCL] Couldn't get number of OpenCl devices. Error: 
Error: Device not found. (-1)
terminate called without an active exception

I am using Manjaro linux on a Lenovo thinkpad P70. The GPU is the Quadro M600M, driver installed as per the Manjaro documentation
sudo mhwd -a pci nonfree 0300

Thank you

Nvidia platform not found, under windows10 Bash.

Hi,
I have installed the latest version (0.5.5) under Windows Bash. Everything went flawlessly and the test run outputs the following:
[MAIN] Done (2000 reads mapped (100.00%), 0 reads not mapped, 2000 lines written)(elapsed: 2.439756s)
But when I add the "-g" option, the output is as follows:

alp@Windows8:/mnt/c/NextGenMap-0.5.5/bin/ngm-0.5.5$ ./ngm -r dh10b_ecoli.fasta -1 dh10b_ecoli.fasta_1.fastq -2 dh10b_ecoli.fasta_2.fastq -g -o test.sam
[MAIN] NextGenMap 0.5.5
[MAIN] Startup : x64 (build Feb 23 2018 16:21:14)
[MAIN] Starting time: 2018-02-23.16:35:52
[CONFIG] Parameter:  --affine 0 --argos_min_score 0 --bin_size 2 --block_multiplier 2 --broken_pairs 0 --bs_cutoff 6 --bs_mapping 0 --cpu_threads 1 --dualstrand 1 --fast 0 --fast_pairing 0 --force_rlength_check 0 --format 1 --gap_extend_penalty 5 --gap_read_penalty 20 --gap_ref_penalty 20 --gpu 1 { 0 } --hard_clip 0 --keep_tags 0 --kmer 13 --kmer_min 0 --kmer_skip 2 --match_bonus 10 --match_bonus_tc 2 --match_bonus_tt 10 --max_cmrs 2147483647 --max_equal 1 --max_insert_size 1000 --max_polya -1 --max_read_length 0 --min_identity 0.650000 --min_insert_size 0 --min_mq 0 --min_residues 0.500000 --min_score 0.000000 --mismatch_penalty 15 --mode 0 --no_progress 0 --no_unal 0 --ocl_threads 1 --output test.sam --overwrite 1 --pair_score_cutoff 0.900000 --paired 1 --parse_all 1 --pe_delimiter / --qry1 dh10b_ecoli.fasta_1.fastq --qry2 dh10b_ecoli.fasta_2.fastq --qry_count -1 --qry_start 0 --ref dh10b_ecoli.fasta --ref_mode -1 --sensitive 0 --silent_clip 0 --skip_mate_check 0 --skip_save 0 --slam_seq 0 --step_count 4 --strata 0 --topn 1 --trim5 0 --update_check 0 --very_fast 0 --very_sensitive 0
[NGM] Opening for output (SAM): test.sam
[SEQPROV] Reading encoded reference from dh10b_ecoli.fasta-enc.2.ngm
[SEQPROV] Reading 4 Mbp from disk took 0.00s
[PREPROCESS] Reading RefTable from dh10b_ecoli.fasta-ht-13-2.3.ngm
[PREPROCESS] Reading from disk took 0.39s
[PREPROCESS] Max. k-mer frequency set so 100!
[INPUT] Input is paired end data.
[INPUT] Opening file dh10b_ecoli.fasta_1.fastq for reading
[INPUT] Opening file dh10b_ecoli.fasta_2.fastq for reading
[INPUT] Input is Fastq
[INPUT] Estimating parameter from data
[INPUT] Reads found in files: 1000
[INPUT] Average read length: 100 (min: 100, max: 102)
[INPUT] Corridor width: 20
[INPUT] Average kmer hits pro read: 29.000000
[INPUT] Max possible kmer hit: 29
[INPUT] Estimated sensitivity: 0.900000
[INPUT] Estimating parameter took 0.003s
[INPUT] Input is Fastq
[INPUT] Input is Fastq
[OPENCL] Available platforms: 1
[OPENCL] AMD Accelerated Parallel Processing
[OPENCL] NVIDIA Platform not found. Falling back to AMD.
[OPENCL] Available platforms: 1
[OPENCL] AMD Accelerated Parallel Processing
[OPENCL] Selecting OpenCl platform: AMD Accelerated Parallel Processing
[OPENCL] Platform: OpenCL 1.2 AMD-APP (1214.3)
[OPENCL] Couldn't get number of OpenCl devices. Error:
Error: Device not found. (-1)
terminate called without an active exception

Is this due to the Windows environment, or are other tweaks needed to make it run? The system has two NVIDIA boards: a GTX 1060 3GB (primary) and a GTX 1080 Ti. The driver is v388.13.

Read discarded

Hi,

I mapped my paired-end reads to the genome using:
ngm -t $threads -p -b -r $ref -1 $in1 -2 $in2 -o $outputFile

and I found the log said:
[MAIN] Done (933226 reads mapped (83.79%), 180582 reads not mapped (30 discarded), 1113778 lines written)(elapsed: 55.646282s)

I am not sure why some reads were discarded.

In addition, I mapped the same dataset to another reference, and the number of discarded reads is different.
This confuses me.

[MAIN] Done (931410 reads mapped (83.62%), 182398 reads not mapped (16 discarded), 1113792 lines written)(elapsed: 61.987221s)

Old use of PERL license 1.0

The old Artistic License 1.0 is considered non-free (i.e. not open source) and shouldn't be used any more. Instead, the Artistic License 2.0 or a dual AL/GPL license can be used. These provide the greatest compatibility with other open source projects.

More details available here: https://en.wikipedia.org/wiki/Artistic_License

Also of concern is the inclusion of 'All rights reserved' in the Templatized C++ Command Line Parser Library. The original code on SourceForge is under a permissive license, but perhaps Daniel Aarno's modifications made it proprietary. Either way, its inclusion is either copyright infringement or a mistake in the license file (I hope it's the latter).

I'm not a lawyer, and this project uses several code sources with their own licenses, so I recommend contacting the Software Freedom Law Center (https://www.softwarefreedom.org/about/contact/). They offer open source projects legal advice in cases such as this.

Tag handling with --keep-tags creates invalid SAM output

Hi,

I was attempting to re-align reads that were aligned with another aligner.
I was using the --keep-tags option, primarily because I have RG tags and OQ tags that I care about on my reads. However, with --keep-tags, the other tags including MD, NM, MC, and AS are also copied. Since NGM also sets these tags, they are appended to the end of the read so that these tags all appear twice in the read. This is a violation of the SAM specification and consequently causes SAMtools to crash when it tries to parse the read.

As an example, the malformed read looks like this: 2 151M = 129799794 343 CCCTTGCTGCATGAGCCAGTAGCTGGGTGGGCATGGTAGCCTCTTGTCTTCCTAGCTTGCCCCTCCAGACATGGAACCTCCACACTGTGAGCGACTTGGTGTGGGGCAATCCAGGCAGATGTGCTCAGTCTGCCACACCTAGGATGGGGCT :862939:9=:=<<=9===<>4=>==<,;054=6;':=>8;/1/5;==?-<>??;<>>>9<<9?=&><7;;>28=.<<0:9-7>>@97<+<'+;3?>3)<:>[email protected]=@2:1-)>><?4?A).=??<)3=.;@>?A,*4@A5;#### MD:Z:10C48A91 PG:Z:MarkDuplicates.1E.5J RG:Z:HK2WY.5 NM:C:2 OQ:Z:####A7AA7,,FFFAA,A7,7FFA,,AF7AA<<,,7A7KF<,FFKKKFFAA<,7FAA<7,F,F7AKKFA,AA,FF,FA7FAF7FF(FKAFFAKKFFFKKKF,KKFF7,7,FAKF<,F7F<<<F,FKKKF<KAKKKAKKFKAFA<A<<,<<< UQ:C:22 AS:C:141 MQ:i:60 MC:Z:151M AS:i:1460 NM:i:2 NH:i:0 XI:f:0.9868 X0:i:0 XE:i:39 XR:i:151 MD:Z:91T48G10

For now, I'll get around this by getting the reads and aligning them as FASTQ, but if NGM is still being developed I think a good option would be to allow the user to specify which tags to keep when using --keep-tags, have NGM overwrite tags it outputs, or allow more user control over which tags are output by NGM.
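As a stopgap on the output side, repeated tags can be stripped from each SAM record after mapping. The sketch below keeps the last occurrence of each tag, on the assumption (based on the example above) that the tags NGM writes are appended after the copied ones. The function name dropDuplicateTags is hypothetical, and the input is assumed to be one tab-separated SAM text record.

```cpp
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Removes repeated optional tags from one tab-separated SAM record,
// keeping the last occurrence of each tag. The 11 mandatory columns
// are passed through unchanged.
std::string dropDuplicateTags(const std::string& record) {
    std::vector<std::string> fields;
    std::istringstream in(record);
    std::string field;
    while (std::getline(in, field, '\t')) fields.push_back(field);

    const std::size_t firstTag = 11;  // optional fields start at column 12
    std::unordered_map<std::string, std::size_t> lastIndex;
    for (std::size_t i = firstTag; i < fields.size(); ++i) {
        lastIndex[fields[i].substr(0, 2)] = i;  // tag name is the first two characters
    }

    std::string out;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        // Skip an optional field if a later field carries the same tag.
        if (i >= firstTag && lastIndex[fields[i].substr(0, 2)] != i) continue;
        if (i != 0) out += '\t';
        out += fields[i];
    }
    return out;
}
```

Keeping the first occurrence instead would only require recording the first index per tag; which copy is the right one to keep depends on whether the original or the re-aligned values are wanted.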

RNA-seq support

Dear Developers,

Is there a possibility to support mapping RNA-seq reads to the genome? I can provide an input annotation file with a list of exons to help the mapper.

Thanks, Minh

Support / checking for long lines in reference fasta

Love the software. But it took me noticing a few odd things in my results before I went and read all the docs carefully, and then I found this listed (very clearly) in open issues on the github README:

The length of a line in a input FASTA file must not exceed 4096 bp.

It would be great if you could fix this, so that it would read any length of input lines from a reference.fa file. Failing that, checking whether the reference will be truncated and spitting an error should presumably be just a couple of lines of code.

Right now, I find the behaviour a little troubling: NextGenMap ran perfectly well on my data, and it wasn't until I was looking at the output that I realised something must be up (a lot of the genome had 0 mapping quality). To me, this has the potential to cause inferential issues to users (admittedly, users who don't look carefully at their output... but we know they exist) who aren't aware the issue exists. A simple error & quit, or just fixing the issue (even by reformatting the reference.fa to the format you need) should both be pretty simple, and might help avoid issues for users.
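Until this is handled by the mapper itself, the reference can be re-wrapped before indexing so that no line comes near the 4096 bp limit. The following is a minimal sketch of that reformatting step; the function name rewrapFasta and the default width of 60 are my own choices, not anything NGM prescribes. It works on an in-memory string for brevity, so a whole genome would be better processed as a stream.

```cpp
#include <sstream>
#include <string>

// Returns a copy of the FASTA text with sequence lines re-wrapped to at
// most `width` characters. Header lines (starting with '>') are kept as-is.
std::string rewrapFasta(const std::string& fasta, std::size_t width = 60) {
    std::istringstream in(fasta);
    std::string out, line;
    std::size_t column = 0;  // characters already written on the current sequence line
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF input
        if (!line.empty() && line[0] == '>') {
            if (column > 0) { out += '\n'; column = 0; }  // finish the previous sequence
            out += line;
            out += '\n';
            continue;
        }
        for (char base : line) {
            out += base;
            if (++column == width) { out += '\n'; column = 0; }
        }
    }
    if (column > 0) out += '\n';
    return out;
}
```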

-i (--min-identity) parameter may not function

Dear NGM developers and users,

I explored the -i (--min-identity) parameter, using the default (-i 0.65) and +/- 0.1 values (-i 0.55 and -i 0.75). I obtained strictly identical results with those three values for -i.

As I am working with highly polymorphic genomes, I was expecting the -i parameter to have a significant impact on the output, which leads me to suspect that changes to this value are not taken into account by ngm. You may want to investigate this in more detail.

For your information,
Mathieu

NextGenMap not finding NVIDIA GPU

Hi, I tried to run NextGenMap on an NVIDIA GPU under Ubuntu 13.04 with cuda-5.5 installed from the NVIDIA repositories. Unfortunately, it did not detect the GPU:

[MAIN] NextGenMap 0.4.10
[MAIN] Startup : x64 (build Feb  4 2014 16:51:09)
[MAIN] Starting time: 2014-02-05.10:34:05
[CONFIG] Parameter:  --affine 0 --block_multiplier 2 --bs_cutoff 8 --bs_mapping 0 --cpu_threads 8 --dualstrand 1 --fast_pairing 0 --format 1 --gap_extend_penalty 5 --gap_read_penalty 20 --gap_ref_penalty 20 --gpu 1 { 0 } --hard_clip 0 --kmer 13 --kmer_min 0 --kmer_skip 2 --match_bonus 10 --match_bonus_tc 4 --match_bonus_tt 4 --max_cmrs 2147483647 --max_equal 1 --max_insert_size 1000 --min_identity 0.650000 --min_insert_size 0 --min_residues 0.500000 --min_score 0.000000 --mismatch_penalty 15 --mode 0 --no_progress 0 --no_unal 0 --ocl_threads 1 --output performance/ERS251102_1.5000000.ngm.sam --overwrite 1 --pair_score_cutoff 0.900000 --paired 0 --parse_all 1 --pe_delimiter 47 --qry sampled/ERS251102_1.5000000.fastq.gz --qry_count -1 --qry_start 0 --ref ref/genome.fa --ref_mode -1 --silent_clip 0 --skip_save 0 --step_count 4 --strata 0 --topn 1
[STATS] Exporting stats at 0x191b000 (Key 0x731a0243, ID 0x0)
[NGM] Opening for output (SAM): performance/ERS251102_1.5000000.ngm.sam
[NGM] NGM Core initialization
[SEQPROV] Init sequence provider.
[SEQPROV] Reading encoded reference from ref/genome.fa-enc.ngm
[SEQPROV] Reading from disk took 0.43s
[PREPROCESS] Reading RefTable from ref/genome.fa-ht-13-2.ngm
[PREPROCESS] Reading from disk took 1.07s
[INPUT] Initializing ReadProvider
[INPUT] Input is Fastq
[INPUT] Estimating parameter from data
[INPUT] Reads found in files: 5000000
[INPUT] Average read length: 100 (min: 100, max: 102)
[INPUT] Corridor width: 20
[INPUT] Average kmer hits pro read: 23.857899
[INPUT] Max possible kmer hit: 29
[INPUT] Estimated sensitivity: 0.822686
[INPUT] Initializing took 12.771s
[INPUT] Input is Fastq
[MAIN] Core initialization complete
[OPENCL] Couldn't get OpenCl platform ids. Error: 
[OPENCL] NVIDIA Platform not found. Falling back to AMD.
[OPENCL] Couldn't get OpenCl platform ids. Error: 
[OPENCL] No OpenCl platform found.

Any idea what happens here? My own Opencl stuff using pyopencl works fine, and it is really a default setup.

BAM validation problems (also re: Picard)

On BAM output files from ngm (which is great so far, btw!), Picard reports (when adding RG to or validating the BAM):

Mapped mate should have mate reference name

For the moment, this error can be worked around using:

java -Xmx30g -jar \
~/bin/FixMateInformation.jar \
I=<input>.bam \
O=<output>.bam \
VALIDATION_STRINGENCY=LENIENT

But it would be excellent to get valid BAM straight away on output from ngm.

@RG headers not being passed on to BAM

Hi,

I have sequencing data from different lanes/batches that I've merged into a single unaligned BAM. The headers in the merged unaligned bams look something like this:

@RG	ID:L001	SM:SAMPLEXXX	LB:L001	PL:Illumina
@RG	ID:L002	SM:SAMPLEXXX	LB:L002	PL:Illumina

But then when I run ngm-core using the --keep-tags flag these get ignored in the output file. However, the read group info in the reads is present (i.e., there are reads with their respective RG:Z:L001 and RG:Z:L002 tags).

I'm wondering if there is a bug that is preventing the @RG headers from being passed on to the output file.

Thanks,
Santiago
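To confirm which read groups are affected, the output can be scanned for RG:Z: tags that have no matching @RG header line. The sketch below does this on SAM text (for example, the BAM converted to SAM with the header included). The function name readGroupsMissingFromHeader is hypothetical, and the input is assumed to be tab-separated SAM held in memory.

```cpp
#include <set>
#include <sstream>
#include <string>

// Scans SAM text and returns the read-group IDs that appear in RG:Z: tags
// of alignment records but have no matching @RG header line.
std::set<std::string> readGroupsMissingFromHeader(const std::string& sam) {
    std::set<std::string> declared, used;
    std::istringstream in(sam);
    std::string line;
    while (std::getline(in, line)) {
        const bool isRgHeader = line.compare(0, 4, "@RG\t") == 0;
        if (!line.empty() && line[0] == '@' && !isRgHeader) continue;  // other header lines
        std::istringstream fields(line);
        std::string field;
        while (std::getline(fields, field, '\t')) {
            if (isRgHeader && field.compare(0, 3, "ID:") == 0) {
                declared.insert(field.substr(3));  // ID declared in the header
            } else if (!isRgHeader && field.compare(0, 5, "RG:Z:") == 0) {
                used.insert(field.substr(5));      // ID referenced by a read
            }
        }
    }
    std::set<std::string> missing;
    for (const std::string& id : used) {
        if (declared.count(id) == 0) missing.insert(id);
    }
    return missing;
}
```

If IDs are reported as missing, the corresponding @RG lines could then be restored in the header, for instance with samtools reheader.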

Massive discrepancy in proper pair counts between NGM and bwa

I was wondering if you could provide some insight into why there would be such a massive discrepancy in how many proper pairs ngm finds when compared to bwa. I'm not insinuating that one is wrong and the other is right, just wanting to get an idea of why they would be so different.
Here is the samtools flagstat output for the same reads mapped to the same reference using:

ngm

3382756 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
3211477 + 0 mapped (94.94% : N/A)
3382756 + 0 paired in sequencing
1691378 + 0 read1
1691378 + 0 read2
######################################
456738 + 0 properly paired (13.50% : N/A)
######################################
3201086 + 0 with itself and mate mapped
10391 + 0 singletons (0.31% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

bwa mem

3383677 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
921 + 0 supplementary
0 + 0 duplicates
3213327 + 0 mapped (94.97% : N/A)
3382756 + 0 paired in sequencing
1691378 + 0 read1
1691378 + 0 read2
######################################
3148218 + 0 properly paired (93.07% : N/A)
######################################
3202742 + 0 with itself and mate mapped
9664 + 0 singletons (0.29% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Error: Tried to insert kmer 9318400 starting at position 143779831, number of slots 65. Position: 65

When trying to map against mm10, while building the "ref table", NGM version 0.5.5 (installed via BioConda) aborts with the error:

Tried to insert kmer 9318400 starting at position 143779831, number of slots 65. Position: 65

Weirdly enough, retrying multiple times always produced similar errors, but for a different position and number of slots. The error does not occur with NGM 0.5.3 (also installed with BioConda). 0.5.4 was not tested, since it's not available through BioConda.

The only commit between 0.5.3 and 0.5.5 that touches code in src/PrefixTable.cpp seems to be 54ee69fd5051, but I can't see an obvious problem in that commit...

Error message received while installing

CMake Error at CMakeLists.txt:28 (message):
The compiler /usr/bin/c++ has no C++11 support. Please use a different C++
compiler.

-- Configuring incomplete, errors occurred!
See also "/data/results/Dr_Ashwin/SVfinder_SV_analysis/ngmlr-master/ngmlr-master/build/CMakeFiles/CMakeOutput.log".
See also "/data/results/Dr_Ashwin/SVfinder_SV_analysis/ngmlr-master/ngmlr-master/build/CMakeFiles/CMakeError.log".

Please help me to get it installed correctly

Is there any way to recreate the semantics of bwa's X0 auxiliary SAM tag? -- finding reads that map to a single location well

BWA produces in its output an auxiliary SAM tag called X0. X0's value is an integer N, where N is the number of "distinct best" alignments for a given read. (e.g. if two different deletions on a given read make that read map best in two distinct sites, and map equally well there, then that read would be tagged with X0==2)

Is there any way to express a filtering step equivalent to reads where X0==1 on a bam aligned with NGM? i.e. to select reads that map uniquely in one place.

A tool further down my pipeline, angsd, currently relies on that tag to perform filtering, and I'd like to stick with NGM for alignment. I'm hoping there's a post-processing step I can do, or a patch I could apply.
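The filtering step itself is easy to express once a suitable tag is present in the record. The sketch below tests whether a SAM record carries a given integer tag with a given value, which is what an X0==1 filter amounts to. Which tag in NGM's output, if any, carries the same meaning as BWA's X0 is not established here and is left as an assumption; the function name hasIntTag is hypothetical, and only tags written with type "i" in SAM text are handled.

```cpp
#include <exception>
#include <string>

// Returns true if the tab-separated SAM record carries the integer tag
// `tag` (e.g. "X0") with exactly the value `wanted`.
bool hasIntTag(const std::string& record, const std::string& tag, long wanted) {
    const std::string prefix = "\t" + tag + ":i:";
    const std::size_t pos = record.find(prefix);
    if (pos == std::string::npos) return false;  // tag not present
    const std::size_t start = pos + prefix.size();
    const std::size_t end = record.find('\t', start);
    const std::string value =
        record.substr(start, end == std::string::npos ? end : end - start);
    try {
        std::size_t used = 0;
        // Require the whole value to be numeric, not just a numeric prefix.
        return std::stol(value, &used) == wanted && used == value.size();
    } catch (const std::exception&) {
        return false;  // empty or non-numeric value
    }
}
```

A post-processing pass would keep header lines unchanged and write out only those records for which hasIntTag(record, "X0", 1) returns true.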

Error during testing: [SEQPROV] RefBase file not found ()

I compiled with no errors, following instructions, and received this error when testing. Any help would be greatly appreciated!

./ngm -r dh10b_ecoli.fasta -1 dh10b_ecoli.fasta_1.fastq -2 dh10b_ecoli.fasta_2.fastq -o test.sam

[MAIN] NextGenMap 0.4.12
[MAIN] Startup : x64 (build Mar 15 2016 09:42:53)
[MAIN] Starting time: 2016-03-15.09:58:17
[CONFIG] Parameter: --affine 0 --argos_min_score 0 --block_multiplier 2 --bs_cutoff 6 --bs_mapping 0 --cpu_threads 1 --dualstrand 1 --fast_pairing 0 --format 1 --gap_extend_penalty 5 --gap_read_penalty 20 --gap_ref_penalty 20 --hard_clip 0 --keep_tags 0 --kmer 13 --kmer_min 0 --kmer_skip 2 --match_bonus 10 --match_bonus_tc 4 --match_bonus_tt 4 --max_cmrs 2147483647 --max_equal 1 --max_insert_size 1000 --min_identity 0.650000 --min_insert_size 0 --min_mq 0 --min_residues 0.500000 --min_score 0.000000 --mismatch_penalty 15 --mode 0 --no_progress 0 --no_unal 0 --ocl_threads 1 --output test.sam --overwrite 1 --pair_score_cutoff 0.900000 --paired 1 --parse_all 1 --pe_delimiter / --qry1 dh10b_ecoli.fasta_1.fastq --qry2 dh10b_ecoli.fasta_2.fastq --qry_count -1 --qry_start 0 --ref dh10b_ecoli.fasta --ref_mode -1 --silent_clip 0 --skip_mate_check 0 --skip_save 0 --step_count 4 --strata 0 --topn 1
[STATS] Exporting stats at 0x8f60b000 (Key 0x730110db, ID 0x18001)
[NGM] Opening for output (SAM):
[NGM] NGM Core initialization
[SEQPROV] Init sequence provider.
[SEQPROV] RefBase file not found ()
This error is fatal. Quitting...

GCC

gcc (Ubuntu 5.2.1-22ubuntu2) 5.2.1 20151010

Ubuntu

Distributor ID: Ubuntu
Description: Ubuntu 15.10
Release: 15.10
Codename:

GPU

*-display:1
description: VGA compatible controller
product: GK104GL [GRID K520]
vendor: NVIDIA Corporation
physical id: 3
bus info: pci@0000:00:03.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
configuration: driver=nvidia latency=0
resources: irq:28 memory:ea000000-eaffffff memory:c0000000-c7ffffff memory:e2000000-e3ffffff ioport:c100(size=128) memory:ef000000-ef07ffff
*-display:2
description: VGA compatible controller
product: GK104GL [GRID K520]
vendor: NVIDIA Corporation
physical id: 4
bus info: pci@0000:00:04.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
configuration: driver=nvidia latency=0
resources: irq:32 memory:eb000000-ebffffff memory:c8000000-cfffffff memory:e4000000-e5ffffff ioport:c180(size=128) memory:ef080000-ef0fffff
*-display:3
description: VGA compatible controller
product: GK104GL [GRID K520]
vendor: NVIDIA Corporation
physical id: 5
bus info: pci@0000:00:05.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
configuration: driver=nvidia latency=0
resources: irq:36 memory:ec000000-ecffffff memory:d0000000-d7ffffff memory:e6000000-e7ffffff ioport:c200(size=128) memory:ef100000-ef17ffff
*-display:4
description: VGA compatible controller
product: GK104GL [GRID K520]
vendor: NVIDIA Corporation
physical id: 6
bus info: pci@0000:00:06.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress vga_controller bus_master cap_list rom
configuration: driver=nvidia latency=0
resources: irq:40 memory:ed000000-edffffff memory:d8000000-dfffffff memory:e8000000-e9ffffff ioport:c280(size=128) memory:ef180000-ef1fffff

Odd MapQ result compared to BWA

Hi all,

First of all, many apologies. I thought I'd posted an issue here on this topic some time ago, but it appears that is not the case.

We have encountered a significant discrepancy between the mapping qualities reported by BWA, and by NGM. Specifically, a reasonable proportion (approx 1/3) of reads have quality scores less than 10 when using NGM in default sensitivity mode. The actual alignment of a read was identical in most cases, and there was no difference between aligners in the number of alignments for a read (at least on a few screenfuls of alignments, and on average). This caused us to filter out a significant proportion of our data. Note that this happened relatively equally across all ~1500 sequencing runs (of ~500 distinct samples). There are about 200 samples with approx 10x coverage, and the remainder have approx 5x.
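To put a number on the effect described above, the share of alignments that a MAPQ filter would discard can be measured directly from the SAM output. The following is a minimal sketch, not part of NGM or BWA: the function name `fractionBelowMapq` is hypothetical, and the only assumption about the format is that MAPQ is the fifth column of a SAM record.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: fraction of SAM alignment records whose MAPQ
// (column 5) is below `threshold`. Header lines (starting with '@')
// and lines with fewer than five parsable fields are skipped.
double fractionBelowMapq(const std::vector<std::string>& samLines, int threshold) {
    std::size_t total = 0;
    std::size_t low = 0;
    for (const std::string& line : samLines) {
        if (line.empty() || line[0] == '@') {
            continue;
        }
        std::istringstream in(line);
        std::string qname, flag, rname, pos;
        int mapq = 0;
        // The first five SAM fields never contain whitespace,
        // so whitespace splitting is sufficient here.
        if (!(in >> qname >> flag >> rname >> pos >> mapq)) {
            continue;
        }
        ++total;
        if (mapq < threshold) {
            ++low;
        }
    }
    return total == 0 ? 0.0 : static_cast<double>(low) / static_cast<double>(total);
}
```

Running this with a threshold of 10 on the output of both aligners for the same reads would show whether the roughly one third reported here holds across samples.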

We have a blog post on my blog describing this, and the effects of this issue, in further detail.

A couple of questions:

Does this behaviour sound deliberate on NGM's part, or more like a bug? If this is deliberate, what could we have mis-configured to cause such low mapping qualities?
If this sounds like a bug, how should we help debug it? I can post logs, however the data is not yet public (though I can send a subset).

Thanks,
Kevin

Incorrect permissions after install

After successful compilation, the ngm script is installed with incorrect permissions under /usr/bin:

$ ls -l /usr/bin | grep ngm
-rwxr-x---. 1 root root        466 May 23 14:16 ngm
-rwxrwxr-x. 1 root root   15618416 May 23 14:16 ngm-core
-rwxrwxr-x. 1 root root   14665096 May 23 14:16 ngm-core-debug
-rwxr-x---. 1 root root        483 May 23 14:16 ngm-debug
-rwxr-x---. 1 root root        472 May 23 14:16 ngm-log
-rwxrwxr-x. 1 root root   15054712 May 23 14:16 ngm-utils
-rwxrwxr-x. 1 root root   11280904 May 23 14:16 ngm-utils-debug

These should all have 755 permissions, so that all users on the system can run them.
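To make the difference concrete, the mode strings from the listing can be converted to octal. This is a minimal sketch with hypothetical helper names (`modeFromLsString`, `othersCanExecute`), not code from the NextGenMap install scripts; it ignores special bits such as setuid and the trailing SELinux dot.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert the nine permission characters of an
// `ls -l` mode string (e.g. "-rwxr-x---") into numeric mode bits.
// Any character other than '-' counts as a set bit.
unsigned modeFromLsString(const std::string& ls) {
    if (ls.size() < 10) {
        throw std::invalid_argument("expected at least 10 characters");
    }
    unsigned mode = 0;
    for (std::size_t i = 1; i <= 9; ++i) {
        mode <<= 1;
        if (ls[i] != '-') {
            mode |= 1u;
        }
    }
    return mode;
}

// True if users outside the owner and group may execute the file.
bool othersCanExecute(unsigned mode) {
    return (mode & 0001u) != 0;
}
```

Under these assumptions `-rwxr-x---` maps to octal 750, which has no execute bit for other users, while `-rwxrwxr-x` maps to 775, which does.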

It also affects the Docker container when run as a non-privileged user with docker run -u 1000:1000 -v etc. Singularity containers using the same set of installation instructions fail for the same reason.

"Broken paired" interleaved FASTQ input

Hi,

Thanks for releasing nextgenmap, it's a great tool.

We're in the process of switching from using BWA to nextgenmap for read alignment. One sticking point at the moment is the support for "broken paired" read input. This is like interleaved format, except that reads whose pairs have been lost during trimming do not appear, e.g.:

@frag1/1
ACGT
+
IIII
@frag1/2
CAGT
+
IIII
@frag2/1
ACGT
+
IIII
@frag3/1
ACGT
+
IIII
@frag3/2
CAGT
+
IIII

BWA mem is able to handle this format when given the -p flag. It does so by checking the read ids and treating reads that come with identical (before the /{1,2}) read ids as a pair, and the rest as single end.

I notice that when using this format with nextgenmap I get an error saying that the read ids do not match. How difficult would it be to add a command line flag that allows ngm to parse this format? Perhaps --paired with --qry could trigger this behaviour? Our current work-around is to add fake reads consisting of a single N to ensure every single read is correctly paired. I guess it could also be possible to implement this behaviour by filtering out such reads when writing the SAM?
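The pairing rule described above (adjacent reads whose ids match before the /1 or /2 suffix form a pair, everything else is single end) can be sketched as follows. This is only an illustration of that rule; it is not BWA's or NGM's implementation, and the names `baseName`, `Fragment` and `groupBrokenPairs` are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Strip a trailing "/1" or "/2" from a read id; other ids are returned unchanged.
std::string baseName(const std::string& id) {
    const std::size_t n = id.size();
    if (n >= 2 && id[n - 2] == '/' && (id[n - 1] == '1' || id[n - 1] == '2')) {
        return id.substr(0, n - 2);
    }
    return id;
}

// One entry per fragment, holding indices into the input list of read ids.
struct Fragment {
    std::size_t first;   // index of the first (or only) read
    std::size_t second;  // index of the mate; equals `first` for single reads
    bool paired;         // true if both mates are present
};

// Walk the reads in file order and pair neighbours that share a base name.
std::vector<Fragment> groupBrokenPairs(const std::vector<std::string>& ids) {
    std::vector<Fragment> out;
    std::size_t i = 0;
    while (i < ids.size()) {
        if (i + 1 < ids.size() && baseName(ids[i]) == baseName(ids[i + 1])) {
            out.push_back({i, i + 1, true});
            i += 2;
        } else {
            out.push_back({i, i, false});
            i += 1;
        }
    }
    return out;
}
```

Applied to the five ids in the example file, this yields three fragments: frag1 as a pair, frag2 as a single read, and frag3 as a pair. The sketch only compares names, so it would also pair two adjacent reads that both end in /1; a real parser would probably want to check the suffixes as well.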

Thanks,
Kevin

Question about indexing the reference genome

Hi, I see that NGM has its own method for indexing the reference genome, but I'm wondering whether the reference could also be indexed ahead of time using other tools (samtools faidx). What are the consequences of indexing the reference with samtools instead of NGM? Thanks!

OpenCL 1.1, 1.2 and Mac issues

Let me preface this with the fact that the last time I did any coding in C/CPP (and that was only little, too) was >10 years ago, so bear with me.

I've been trying to compile on an old Mac Pro with an ATI gfx card and have been running into some issues.

I'd like to submit a pull request, but I haven't solved all of them yet, and my 'fixes' so far do not involve any checking.

In Yosemite (10.10.4) with XCode 6, OpenMP is not supported out of the box (see this question on SO), so I installed gcc (importantly, perhaps: gcc-5) through homebrew.
Yosemite 10.10/XCode 6 ships with OpenCL 1.2
Some stuff down the road doesn't work fully (I'll get to that in a minute)

1 Fixing issues with OpenCL

In root CMakeLists.txt:
set(OPENCL_LIBRARIES, " -lOpenCL") will not include the OpenCL framework properly, change to:

if(APPLE)
    set(OPENCL_LIBRARIES " -framework OpenCL")
else(APPLE)
    set(OPENCL_LIBRARIES " -lOpenCL")
endif(APPLE)

1.1 OpenCL 1.1 vs 1.2 properties

In lib/mason/opencl/OclHost.cpp, the OpenCL property CL_DEVICE_PARTITION_EQUALLY_EXT only applies to OpenCL 1.1; CL_DEVICE_PARTITION_EQUALLY needs to be used for 1.2. Incidentally, the latter is used on line 102, but not on line 206. See here for a quick comparison. Similarly, CL_PROPERTIES_LIST_END_EXT on line 206 should terminate the list, but strangely does not work, even though it is listed above as functional in OpenCL 1.2. CL_PROPERTIES_LIST_END_EXT is defined as 0, so setting partitionPrty[2] = 0 on line 208 should fix that.

1.2 NVidia vs ATI CL properties

Again in lib/mason/opencl/OclHost.cpp, this time in int OclHost::getThreadPerMulti()
CL_DEVICE_COMPUTE_CAPABILITY_MAJOR_NV only gets major revision from NVIDIA cards, to achieve the same from an ATI card you need to use CL_DEVICE_GFXIP_MAJOR_AMD and MINOR_AMD respectively. Strangely, this is not caught in the OSX OpenCL framework, but defining it in include/CL/opencl.h helped. I added the following lines after the #ifdef __APPLE__ ... #endif, as suggested in this amd community post:

/* Two properties required for AMD in OpenCL 1.2 */
#define CL_DEVICE_GFXIP_MAJOR_AMD                   0x404A
#define CL_DEVICE_GFXIP_MINOR_AMD                   0x404B

and changed the COMPUTE_CAPABILITY_MAJOR_NV and COMPUTE_CAPABILITY_MINOR_NV reference to GFXIP_MAJOR_AMD and GFXIP_MINOR_AMD, respectively.

2 Stuff down the road

So far so good, but other errors are cropping up.

2.1 Function not defined on OSX:

src/writer/PlainFileWriter.h:46:75: error: 'fwrite_unlocked' was not declared in this scope

Turns out, fwrite_unlocked is not defined on OS X.
As per this issue, adding the following in PlainFileWriter.h fixed it:

#ifdef __APPLE__
  #define fwrite_unlocked fwrite
  #define fflush_unlocked fflush
#endif

2.2 Multithreading on mac

cpu_set_t, mask, CPU_SET, and pthread_setaffinity_np are not defined on osx, it is all handled by a mac-specific thread-affinity API. As this isn't easy to fix, devs usually resort to disabling multithreading on Macs. However, Facebook has addressed some of these issues in a virtual machine they wrote (hhvm), which can be seen in the following header file:
https://github.com/facebook/hhvm/blob/master/hphp/runtime/ext/hotprofiler/ext_hotprofiler.h
Furthermore, BRL-CAD (through libbu++) has apparently also solved this problem, slightly differently:
https://github.com/kanzure/brlcad/blob/master/src/libbu/affinity.c

So I fiddled around a bit and modified src/core/unix_threads.cpp somewhat.

on the top add:

#if defined(__APPLE__)
#  include <mach/thread_policy.h>
#  include <mach/mach.h>
#  include <stdio.h>
#  include <stdlib.h>
#  include <sys/sysctl.h>
#endif

Change NGMSetThreadAffinity to:

void NGMSetThreadAffinity(NGMThread * thread, int cpu)
{
    if (cpu == -1)
        return;

    pthread_t self = 0;

    #if defined(__APPLE__)
    /* Mac OS X mach thread affinity hinting.  Mach implements a CPU
     * affinity policy by default so this just sets up an additional
     * hint on how threads can be grouped/ungrouped.  Here we set all
     * threads up into their own group so threads will get their own
     * cpu and hopefully be kept in place by Mach from there.
     */
    thread_extended_policy_data_t epolicy;
    thread_affinity_policy_data_t apolicy;
    // This should work
    thread_t curr_thread = mach_thread_self();
    kern_return_t ret;

    /* discourage interrupting this thread */
    epolicy.timeshare = FALSE;

    ret = thread_policy_set(curr_thread, THREAD_EXTENDED_POLICY, (thread_policy_t) &epolicy, THREAD_EXTENDED_POLICY_COUNT);
    if (ret != KERN_SUCCESS) {
        /* I don't want to bother with error handling and void won't return int, so we'll just print and exit.
         * The braces matter: without them exit(1) would run unconditionally. */
        printf("thread_policy_set(1) returned %d\n", ret);
        exit(1);
    }

    /* Get number of CPUs from brlcad/src/libbu/parallel.c */
    int ncpu = 1; /* fallback so the modulo below is defined if sysctl fails */

    size_t len;
    int maxproc;
    int mib[] = {CTL_HW, HW_AVAILCPU};

    len = sizeof(maxproc);
    if (sysctl(mib, 2, &maxproc, &len, NULL, 0) == -1) {
        perror("sysctl");
    } else if (maxproc > 0) {
        ncpu = maxproc; /* should be able to get sysctl to return maxproc */
    }

    /* put each thread into a separate group; this call needs the
     * affinity policy flavor and count, not the extended policy ones */
    apolicy.affinity_tag = cpu % ncpu;
    ret = thread_policy_set(curr_thread, THREAD_AFFINITY_POLICY, (thread_policy_t) &apolicy, THREAD_AFFINITY_POLICY_COUNT);
    if (ret != KERN_SUCCESS) {
        /* more errors */
        printf("thread_policy_set(2) returned %d\n", ret);
        exit(1);
    }

    #else
    /* This works on linux */
    if (thread == 0)
    {
        self = pthread_self();
        thread = &self;
    }
    /* stack-allocated mask avoids leaking a cpu_set_t on every call */
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    pthread_setaffinity_np(*thread, sizeof(cpu_set_t), &mask);

    #endif
}

2.3 Linking on osx

Almost done now. The Mac linker (strangely not GNU ld) does not accept the keyword -Bdynamic, so I tracked it down in lib/mason/opencl/CMakeLists.txt and changed target_link_libraries(MASonOpenCl "-Wl,-Bdynamic ${OPENCL_LIBRARIES}") to:

if(APPLE)
  target_link_libraries(MASonOpenCl "${OPENCL_LIBRARIES}") # Apple doesn't like -Bdynamic
else(APPLE)
  target_link_libraries(MASonOpenCl "-Wl,-Bdynamic ${OPENCL_LIBRARIES}")
endif(APPLE)

and did the same for MASonOpenCl-debug in the same file.

Now it looks like it's all dandy and it compiles, but when running ngm it segfaults after.

gdb offers the following unhelpful insight:

[SEQPROV] 0 reference sequences were skipped (length < 10).
[SEQPROV] Writing encoded reference to

Program received signal SIGSEGV, Segmentation fault.
0x00007fff852f3c67 in ?? () from /usr/lib/system/libsystem_c.dylib
(gdb) bt
#0  0x00007fff852f3c67 in ?? () from /usr/lib/system/libsystem_c.dylib
#1  0x00007fff5fbff140 in ?? ()
#2  0x00007fff852f6ec5 in ?? () from /usr/lib/system/libsystem_c.dylib
#3  0x0000000000000001 in ?? ()
#4  0x0000206000711c70 in ?? ()
#5  0x0000000000000004 in ?? ()
#6  0x00007fff5fbff0f0 in ?? ()
#7  0x0000000000000000 in ?? ()

It has loads of "can't open to read symbols: No such file or directory." messages for a variety of files from libgomp, libstdc++-v3 and libgcc. Very weird, and I'm still digging into it. I just hope the SIGSEGV didn't come from the changes I made above…

Error: program build failure (-11) occurs when running NextGenMap with Docker

Hi there~
I'd like to report an error that occurs when running NGM with Docker. The same command runs smoothly with NextGenMap installed directly on the PC.
Thank you so much!

(base) xzhen@L190514 ~/Documents/SLAM $ docker run -m 8g -v $(pwd):/data -ti philres/nextgenmap ngm -r /data/dh10b_ecoli.fasta -1 /data/dh10b_ecoli.fasta_1.fastq -2 /data/dh10b_ecoli.fasta_2.fastq -o /data/dh10b_ecoli.fasta_mapped_ngm.sam

[MAIN] NextGenMap 0.5.5
[MAIN] Startup : x64 (build Jul 22 2018 20:40:59)
[MAIN] Starting time: 2020-11-04.23:22:27
[CONFIG] Parameter: --affine 0 --argos_min_score 0 --bin_size 2 --block_multiplier 2 --broken_pairs 0 --bs_cutoff 6 --bs_mapping 0 --cpu_threads 1 --dualstrand 1 --fast 0 --fast_pairing 0 --force_rlength_check 0 --format 1 --gap_extend_penalty 5 --gap_read_penalty 20 --gap_ref_penalty 20 --hard_clip 0 --keep_tags 0 --kmer 13 --kmer_min 0 --kmer_skip 2 --match_bonus 10 --match_bonus_tc 2 --match_bonus_tt 10 --max_cmrs 2147483647 --max_equal 1 --max_insert_size 1000 --max_polya -1 --max_read_length 0 --min_identity 0.650000 --min_insert_size 0 --min_mq 0 --min_residues 0.500000 --min_score 0.000000 --mismatch_penalty 15 --mode 0 --no_progress 0 --no_unal 0 --ocl_threads 1 --output /data/dh10b_ecoli.fasta_mapped_ngm.sam --overwrite 1 --pair_score_cutoff 0.900000 --paired 1 --parse_all 1 --pe_delimiter / --qry1 /data/dh10b_ecoli.fasta_1.fastq --qry2 /data/dh10b_ecoli.fasta_2.fastq --qry_count -1 --qry_start 0 --ref /data/dh10b_ecoli.fasta --ref_mode -1 --sensitive 0 --silent_clip 0 --skip_mate_check 0 --skip_save 0 --slam_seq 0 --step_count 4 --strata 0 --topn 1 --trim5 0 --update_check 0 --very_fast 0 --very_sensitive 0
[NGM] Opening for output (SAM): /data/dh10b_ecoli.fasta_mapped_ngm.sam
[SEQPROV] Reading encoded reference from /data/dh10b_ecoli.fasta-enc.2.ngm
[SEQPROV] Reading 4 Mbp from disk took 0.02s
[PREPROCESS] Reading RefTable from /data/dh10b_ecoli.fasta-ht-13-2.3.ngm
[PREPROCESS] Reading from disk took 2.58s
[PREPROCESS] Max. k-mer frequency set so 100!
[INPUT] Input is paired end data.
[INPUT] Opening file /data/dh10b_ecoli.fasta_1.fastq for reading
[INPUT] Opening file /data/dh10b_ecoli.fasta_2.fastq for reading
[INPUT] Input is Fastq
[INPUT] Estimating parameter from data
[INPUT] Reads found in files: 1000
[INPUT] Average read length: 100 (min: 100, max: 102)
[INPUT] Corridor width: 20
[INPUT] Average kmer hits pro read: 29.000000
[INPUT] Max possible kmer hit: 29
[INPUT] Estimated sensitivity: 0.900000
[INPUT] Estimating parameter took 0.011s
[INPUT] Input is Fastq
[INPUT] Input is Fastq
[OPENCL] Available platforms: 1
[OPENCL] AMD Accelerated Parallel Processing
[OPENCL] Selecting OpenCl platform: AMD Accelerated Parallel Processing
[OPENCL] Platform: OpenCL 1.2 AMD-APP (1214.3)
[OPENCL] 1 CPU device found.
[OPENCL] Device 0: Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz (Driver: 1214.3 (sse2,avx))
[OPENCL] 6 CPU cores available.
[Progress] Mapped: 1, CMR/R: 0, CS: 0 (0), R/S: 0, Time: 0.00 0.00 0.00, Pa
[Progress] Mapped: 1, CMR/R: 0, CS: 0 (0), R/S: 0, Time: 0.00 0.00 0.00, Pa
[Progress] Mapped: 1, CMR/R: 0, CS: 0 (0), R/S: 0, Time: 0.00 0.00 0.00, Pa
[OPENCL] Build failed: Program build failure
[OPENCL] Build status: build failed
[OPENCL] Build log:
[OPENCL] Internal Error: as failed
[OPENCL] Codegen phase failed compilation.
[OPENCL] Unable to build program end.
Error: Program build failure (-11)
terminate called without an active exception

Add --version switch

Running ngm --version should print ngm 0.5.5 to stdout and exit with status 0.
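One way the requested behaviour could look is an early check before the regular argument parsing. This is only a sketch under stated assumptions: `wantsVersion` and `versionLine` are hypothetical names and do not exist in the NextGenMap source, and the version string would have to come from wherever the build already stores it.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: true if any argument after the program name
// is exactly "--version".
bool wantsVersion(int argc, const char* const* argv) {
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "--version") == 0) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: the single line the request asks for, e.g. "ngm 0.5.5".
std::string versionLine(const std::string& version) {
    return "ngm " + version;
}
```

At the top of `main`, a call such as `if (wantsVersion(argc, argv)) { std::puts(versionLine("0.5.5").c_str()); return 0; }` would then print the line to stdout and exit with status 0 before any reference or read files are touched.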

Compiling NextGenMap from source throws error about not finding opencl on a non-gpu system

Hi,

I tried to install NextGenMap 0.5.0 on a pure CPU cluster. I was following the installation instructions provided at:

https://github.com/Cibiv/NextGenMap/wiki/Installation

The installation instructions explicitly state:

In order to build NextGenMap only cMake (>=2.8) and g++ are required.

But when following the instructions, the make step crashes with an error message that libOpenCL could not be found:

cd /scratch/54486642.tmpdir/NextGenMap-0.5.0/build/src && /cluster/apps/cmake/2.8.12/x86_64/bin/cmake -E cmake_link_script CMakeFiles/ngm-core.dir/link.txt --verbose=1
/cluster/apps/gcc/4.8.2/bin/g++   -O2 -g -DNDEBUG -ftree-vectorize -march=corei7-avx -mavx    CMakeFiles/ngm-core.dir/parser/BamParser.cpp.o CMakeFiles/ngm-core.dir/writer/BAMWriter.cpp.o CMakeFiles/ngm-core.dir/parser/VcfParser.cpp.o CMakeFiles/ngm-core.dir/config/Config.cpp.o CMakeFiles/ngm-core.dir/CS.cpp.o CMakeFiles/ngm-core.dir/CSstatic.cpp.o CMakeFiles/ngm-core.dir/misc/Debug.cpp.o CMakeFiles/ngm-core.dir/log/Logging.cpp.o CMakeFiles/ngm-core.dir/MappedRead.cpp.o CMakeFiles/ngm-core.dir/NGM_main.cpp.o CMakeFiles/ngm-core.dir/NGM.cpp.o CMakeFiles/ngm-core.dir/UpdateCheck.cpp.o CMakeFiles/ngm-core.dir/core/NGMTask.cpp.o CMakeFiles/ngm-core.dir/AlignmentBuffer.cpp.o CMakeFiles/ngm-core.dir/PrefixTable.cpp.o CMakeFiles/ngm-core.dir/ReadProvider.cpp.o CMakeFiles/ngm-core.dir/parser/SamParser.cpp.o CMakeFiles/ngm-core.dir/writer/SAMWriter.cpp.o CMakeFiles/ngm-core.dir/writer/ScoreWriter.cpp.o CMakeFiles/ngm-core.dir/seqan/EndToEndAffine.cpp.o CMakeFiles/ngm-core.dir/SequenceProvider.cpp.o CMakeFiles/ngm-core.dir/OutputReadBuffer.cpp.o CMakeFiles/ngm-core.dir/ScoreBuffer.cpp.o CMakeFiles/ngm-core.dir/core/unix.cpp.o CMakeFiles/ngm-core.dir/core/unix_threads.cpp.o CMakeFiles/ngm-core.dir/core/windows_threads.cpp.o CMakeFiles/ngm-core.dir/core/windows.cpp.o  -o ../../bin/ngm-0.5.0/ngm-core -rdynamic -lpthread ../lib/mason/opencl/libMASonOpenCl.a ../lib/bamtools-2.3.0/src/api/libbamtools.a ../lib/zlib-1.2.7/libz.a -Wl,-Bdynamic -lOpenCL 
/usr/bin/ld: cannot find -lOpenCL
collect2: error: ld returned 1 exit status
make[2]: *** [../bin/ngm-0.5.0/ngm-core] Error 1
make[2]: Leaving directory `/scratch/54486642.tmpdir/NextGenMap-0.5.0/build'
make[1]: *** [src/CMakeFiles/ngm-core.dir/all] Error 2
make[1]: Leaving directory `/scratch/54486642.tmpdir/NextGenMap-0.5.0/build'
make: *** [all] Error 2
[sfux@eu-c7-001-01 build]$

If only cmake and g++ are required, why is the build trying to link against OpenCL, which is not mentioned as a dependency for building NextGenMap?

0.4.13 on OSX

Compiled with XCode 7.1 (AppleClang 7.0.0.7000176) which initially works fine. There are some minor compiler warnings in bamtools and seqan, but especially the former should be fixed with v2.4, according to their repo. In fact, I just tried it and bamtools-v2.4.0 does fix compiler warnings and compiles fine.

Running ngm, however, crashes fairly early on.
When using the --output parameter, it crashes with the message:

[NGM] Opening for output (SAM): �]��
[FILTER] Unable to open output file �]��
This error is fatal. Quitting...

When redirecting to stdout it crashes later on, as it thinks there are 0 reads in the input file.
I think the error arises somewhere entirely different: Config.GetString or InternalGet seems to be the culprit. When debugging I do get a proper return value, but I couldn't track down what the actual problem is. I did change the input and output names on the command line, but that only made it worse:

[MAIN] Query file (�_��) does not exist.

Maybe a problem with iostream/sstream?

Provide an entrypoint in the docker container

The Docker container's entrypoint is not configured, which means the user has to manually specify that they want to run ngm at container start.

The entrypoint should be set to /usr/bin/ngm.

glibc detected, double free or corruption

The output:

[MAIN] NextGenMap 0.5.3
[MAIN] Startup : x64 (build Jan 15 2017 18:21:53)
[MAIN] Starting time: 2017-06-27.18:23:51
[CONFIG] Parameter:  --affine 0 --argos_min_score 0 --bam 1 --bin_size 2 --block_multiplier 2 --bs_cutoff 6 --bs_mapping 0 --cpu_threads 1 --dualstrand 1 --fast 0 --fast_pairing 0 --force_rlength_check 0 --format 2 --gap_extend_penalty 5 --gap_read_penalty 20 --gap_ref_penalty 20 --hard_clip 0 --keep_tags 0 --kmer 13 --kmer_min 0 --kmer_skip 2 --match_bonus 10 --match_bonus_tc 2 --match_bonus_tt 10 --max_cmrs 2147483647 --max_equal 1 --max_insert_size 1000 --max_polya -1 --max_read_length 0 --min_identity 0.650000 --min_insert_size 0 --min_mq 0 --min_residues 0.500000 --min_score 0.000000 --mismatch_penalty 15 --mode 0 --no_progress 0 --no_unal 0 --ocl_threads 1 --output /data/user1/tools/bioprocs/tests/workdir/PyPPL.pAlignPEByNGM.notag.1yEFAyWF/0/output/test.bam --overwrite 1 --pair_score_cutoff 0.900000 --paired 1 --parse_all 1 --pe_delimiter / --qry1 /data/user1/tools/bioprocs/tests/workdir/PyPPL.pAlignPEByNGM.notag.1yEFAyWF/0/input/test_1.clean.fq.gz --qry2 /data/user1/tools/bioprocs/tests/workdir/PyPPL.pAlignPEByNGM.notag.1yEFAyWF/0/input/test_2.clean.fq.gz --qry_count -1 --qry_start 0 --ref /data/user1/tools/bioprocs/tests/workdir/PyPPL.pAlignPEByNGM.notag.1yEFAyWF/0/input/lambda_virus.fa --ref_mode -1 --rg_id ngm --rg_sm sample --sensitive 0 --silent_clip 0 --skip_mate_check 0 --skip_save 0 --slam_seq 0 --step_count 4 --strata 0 --topn 1 --trim5 0 --update_check 0 --very_fast 0 --very_sensitive 0
[NGM] Opening for output (BAM): /data/user1/tools/bioprocs/tests/workdir/PyPPL.pAlignPEByNGM.notag.1yEFAyWF/0/output/test.bam
[SEQPROV] Reading encoded reference from /data/user1/tools/bioprocs/tests/workdir/PyPPL.pAlignPEByNGM.notag.1yEFAyWF/0/input/lambda_virus.fa-enc.2.ngm
[SEQPROV] Reading 0 Mbp from disk took 0.00s
[PREPROCESS] Reading RefTable from /data/user1/tools/bioprocs/tests/workdir/PyPPL.pAlignPEByNGM.notag.1yEFAyWF/0/input/lambda_virus.fa-ht-13-2.3.ngm
[PREPROCESS] Reading from disk took 0.27s
[PREPROCESS] Max. k-mer frequency set so 100!
[INPUT] Input is paired end data.
[INPUT] Opening file /data/user1/tools/bioprocs/tests/workdir/PyPPL.pAlignPEByNGM.notag.1yEFAyWF/0/input/test_1.clean.fq.gz for reading
[INPUT] Opening file /data/user1/tools/bioprocs/tests/workdir/PyPPL.pAlignPEByNGM.notag.1yEFAyWF/0/input/test_2.clean.fq.gz for reading
[INPUT] Input is Fastq
[INPUT] Estimating parameter from data
[INPUT] Reads found in files: 10000
[INPUT] Average read length: 114 (min: 11, max: 684)
[INPUT] Corridor width: 22
[INPUT] Average kmer hits pro read: 27.263716
[INPUT] Max possible kmer hit: 34
[INPUT] Estimated sensitivity: 0.801874
[INPUT] Estimating parameter took 0.040s
[INPUT] Input is Fastq
[INPUT] Input is Fastq
[OPENCL] Available platforms: 1
[OPENCL] AMD Accelerated Parallel Processing
[OPENCL] Selecting OpenCl platform: AMD Accelerated Parallel Processing
[OPENCL] Platform: OpenCL 1.2 AMD-APP (1214.3)
[OPENCL] 1 CPU device found.
[OPENCL] Device 0: Intel(R) Xeon(R) CPU E5-4650 v2 @ 2.40GHz (Driver: 1214.3 (sse2,avx))
[OPENCL] 80 CPU cores available.
[Progress] Mapped: 2090, CMR/R: 0, CS: 0 (0), R/S: 0, Time: 0.00 0.00 0.00, Pairs: 98.60 285.18
*** glibc detected *** /data/shared/tools/miniconda2/bin/ngm-core: double free or corruption (out): 0x00007f1ef0120440 ***
======= Backtrace: =========
/lib64/libc.so.6[0x337b075dee]
/lib64/libc.so.6[0x337b078c80]
/data/shared/tools/miniconda2/bin/ngm-core(_ZN10MappedRead11clearScoresEi+0x43)[0x44ad93]
/data/shared/tools/miniconda2/bin/ngm-core(_ZN11ScoreBuffer6top1SEEP10MappedRead+0x121)[0x478c31]
/data/shared/tools/miniconda2/bin/ngm-core(_ZN11ScoreBuffer6top1PEEP10MappedRead+0x80a)[0x4798aa]
/data/shared/tools/miniconda2/bin/ngm-core(_ZN11ScoreBuffer5DoRunEv+0x3c6)[0x479c96]
/data/shared/tools/miniconda2/bin/ngm-core(_ZN11ScoreBuffer7addReadEP10MappedReadi+0x5a)[0x47a3fa]
/data/shared/tools/miniconda2/bin/ngm-core(_ZN2CS8RunBatchEP11ScoreBufferP15AlignmentBuffer+0x1df)[0x44716f]
/data/shared/tools/miniconda2/bin/ngm-core(_ZN2CS5DoRunEv+0x6b0)[0x448c90]
/data/shared/tools/miniconda2/bin/ngm-core(_ZN7NGMTask3RunEv+0x12)[0x450582]
/data/shared/tools/miniconda2/bin/ngm-core(_ZN4_NGM10ThreadFuncEPv+0x19)[0x44b1d9]
/lib64/libpthread.so.0[0x337b807aa1]
/lib64/libc.so.6(clone+0x6d)[0x337b0e8bcd]
======= Memory map: ========
00400000-00515000 r-xp 00000000 00:15 21361100537                        /data/shared/tools/miniconda2/bin/ngm-core
00715000-00717000 rw-p 00115000 00:15 21361100537                        /data/shared/tools/miniconda2/bin/ngm-core
00717000-00719000 rw-p 00000000 00:00 0 
02696000-0270c000 rw-p 00000000 00:00 0                                  [heap]
337ac00000-337ac20000 r-xp 00000000 08:05 237                            /lib64/ld-2.12.so
337ae1f000-337ae21000 r--p 0001f000 08:05 237                            /lib64/ld-2.12.so
337ae21000-337ae22000 rw-p 00021000 08:05 237                            /lib64/ld-2.12.so
337ae22000-337ae23000 rw-p 00000000 00:00 0 
337b000000-337b18a000 r-xp 00000000 08:05 274                            /lib64/libc-2.12.so
337b18a000-337b38a000 ---p 0018a000 08:05 274                            /lib64/libc-2.12.so
337b38a000-337b38e000 r--p 0018a000 08:05 274                            /lib64/libc-2.12.so
337b38e000-337b390000 rw-p 0018e000 08:05 274                            /lib64/libc-2.12.so
337b390000-337b394000 rw-p 00000000 00:00 0 
337b400000-337b483000 r-xp 00000000 08:05 319                            /lib64/libm-2.12.so
337b483000-337b682000 ---p 00083000 08:05 319                            /lib64/libm-2.12.so
337b682000-337b683000 r--p 00082000 08:05 319                            /lib64/libm-2.12.so
337b683000-337b684000 rw-p 00083000 08:05 319                            /lib64/libm-2.12.so
337b800000-337b817000 r-xp 00000000 08:05 450                            /lib64/libpthread-2.12.so
337b817000-337ba17000 ---p 00017000 08:05 450                            /lib64/libpthread-2.12.so
337ba17000-337ba18000 r--p 00017000 08:05 450                            /lib64/libpthread-2.12.so
337ba18000-337ba19000 rw-p 00018000 08:05 450                            /lib64/libpthread-2.12.so
337ba19000-337ba1d000 rw-p 00000000 00:00 0 
337bc00000-337bc02000 r-xp 00000000 08:05 452                            /lib64/libdl-2.12.so
337bc02000-337be02000 ---p 00002000 08:05 452                            /lib64/libdl-2.12.so
337be02000-337be03000 r--p 00002000 08:05 452                            /lib64/libdl-2.12.so
337be03000-337be04000 rw-p 00003000 08:05 452                            /lib64/libdl-2.12.so
337c400000-337c407000 r-xp 00000000 08:05 670                            /lib64/librt-2.12.so
337c407000-337c606000 ---p 00007000 08:05 670                            /lib64/librt-2.12.so
337c606000-337c607000 r--p 00006000 08:05 670                            /lib64/librt-2.12.so
337c607000-337c608000 rw-p 00007000 08:05 670                            /lib64/librt-2.12.so
337ec00000-337ec11000 r-xp 00000000 08:07 608                            /usr/lib64/libXext.so.6.4.0
337ec11000-337ee11000 ---p 00011000 08:07 608                            /usr/lib64/libXext.so.6.4.0
337ee11000-337ee12000 rw-p 00011000 08:07 608                            /usr/lib64/libXext.so.6.4.0
7f1ecbffe000-7f1ee0000000 rw-p 00000000 00:00 0 
7f1ee0000000-7f1ee0021000 rw-p 00000000 00:00 0 
7f1ee0021000-7f1ee4000000 ---p 00000000 00:00 0 
7f1ee8000000-7f1ee8021000 rw-p 00000000 00:00 0 
7f1ee8021000-7f1eec000000 ---p 00000000 00:00 0 
7f1eee122000-7f1eef5a7000 rw-p 00000000 00:00 0 
7f1eef5a7000-7f1eef5ab000 r-xp 00000000 08:09 171                        /tmp/OCLU7fclh.so
7f1eef5ab000-7f1eef7aa000 ---p 00004000 08:09 171                        /tmp/OCLU7fclh.so
7f1eef7aa000-7f1eef7ab000 r--p 00003000 08:09 171                        /tmp/OCLU7fclh.so
7f1eef7ab000-7f1eef7ac000 rw-p 00004000 08:09 171                        /tmp/OCLU7fclh.so
7f1eef7ac000-7f1eef7ad000 ---p 00000000 00:00 0 
7f1eef7ad000-7f1eeffbf000 rw-p 00000000 00:00 0 
7f1eeffbf000-7f1eeffc0000 ---p 00000000 00:00 0 
7f1eeffc0000-7f1ef0000000 rw-p 00000000 00:00 0 
7f1ef0000000-7f1ef28e2000 rw-p 00000000 00:00 0 
7f1ef28e2000-7f1ef4000000 ---p 00000000 00:00 0 
7f1ef4030000-7f1ef403d000 r-xp 00000000 08:05 4589                       /lib64/libnss_files-2.12.so
7f1ef403d000-7f1ef423c000 ---p 0000d000 08:05 4589                       /lib64/libnss_files-2.12.so
7f1ef423c000-7f1ef423d000 r--p 0000c000 08:05 4589                       /lib64/libnss_files-2.12.so
7f1ef423d000-7f1ef423e000 rw-p 0000d000 08:05 4589                       /lib64/libnss_files-2.12.so
7f1ef423e000-7f1ef4244000 r-xp 00000000 00:15 21022841758                /data/shared/tools/miniconda2/lib/libXdmcp.so.6.0.0
7f1ef4244000-7f1ef4443000 ---p 00006000 00:15 21022841758                /data/shared/tools/miniconda2/lib/libXdmcp.so.6.0.0
7f1ef4443000-7f1ef4444000 rw-p 00005000 00:15 21022841758                /data/shared/tools/miniconda2/lib/libXdmcp.so.6.0.0
7f1ef4444000-7f1ef4447000 r-xp 00000000 00:15 20987757570                /data/shared/tools/miniconda2/lib/libXau.so.6.0.0
7f1ef4447000-7f1ef4646000 ---p 00003000 00:15 20987757570                /data/shared/tools/miniconda2/lib/libXau.so.6.0.0
7f1ef4646000-7f1ef4647000 rw-p 00002000 00:15 20987757570                /data/shared/tools/miniconda2/lib/libXau.so.6.0.0
7f1ef4647000-7f1ef4677000 r-xp 00000000 00:15 20799082210                /data/shared/tools/miniconda2/lib/libxcb.so.1.1.0
7f1ef4677000-7f1ef4876000 ---p 00030000 00:15 20799082210                /data/shared/tools/miniconda2/lib/libxcb.so.1.1.0
7f1ef4876000-7f1ef4877000 rw-p 0002f000 00:15 20799082210                /data/shared/tools/miniconda2/lib/libxcb.so.1.1.0
7f1ef4877000-7f1ef49d8000 r-xp 00000000 00:15 21068298609                /data/shared/tools/miniconda2/lib/libX11.so.6.3.0
7f1ef49d8000-7f1ef4bd7000 ---p 00161000 00:15 21068298609                /data/shared/tools/miniconda2/lib/libX11.so.6.3.0
7f1ef4bd7000-7f1ef4bde000 rw-p 00160000 00:15 21068298609                /data/shared/tools/miniconda2/lib/libX11.so.6.3.0
7f1ef4bde000-7f1ef6d04000 r-xp 00000000 00:15 21378309287                /data/shared/tools/miniconda2/bin/opencl/lib/libamdocl64.so
7f1ef6d04000-7f1ef6f03000 ---p 02126000 00:15 21378309287                /data/shared/tools/miniconda2/bin/opencl/lib/libamdocl64.so
7f1ef6f03000-7f1ef716f000 rw-p 02125000 00:15 21378309287                /data/shared/tools/miniconda2/bin/opencl/lib/libamdocl64.so
7f1ef716f000-7f1ef7284000 rw-p 00000000 00:00 0 
7f1ef7284000-7f1ef7286000 rw-p 02392000 00:15 21378309287                /data/shared/tools/miniconda2/bin/opencl/lib/libamdocl64.so
7f1ef7296000-7f1ef7297000 ---p 00000000 00:00 0 
7f1ef7297000-7f1f0bcd7000 rw-p 00000000 00:00 0 
7f1f0bcd7000-7f1f0bced000 r-xp 00000000 00:15 19827413760                /data/shared/tools/miniconda2/lib/libgcc_s.so.1
7f1f0bced000-7f1f0beec000 ---p 00016000 00:15 19827413760                /data/shared/tools/miniconda2/lib/libgcc_s.so.1
7f1f0beec000-7f1f0beed000 rw-p 00015000 00:15 19827413760                /data/shared/tools/miniconda2/lib/libgcc_s.so.1
7f1f0beed000-7f1f0beee000 rw-p 00074000 00:15 19827413760                /data/shared/tools/miniconda2/lib/libgcc_s.so.1
7f1f0beee000-7f1f0beef000 rw-p 00000000 00:00 0 
7f1f0beef000-7f1f0c05a000 r-xp 00000000 00:15 19829243798                /data/shared/tools/miniconda2/lib/libstdc++.so.6.0.21
7f1f0c05a000-7f1f0c25a000 ---p 0016b000 00:15 19829243798                /data/shared/tools/miniconda2/lib/libstdc++.so.6.0.21
7f1f0c25a000-7f1f0c264000 r--p 0016b000 00:15 19829243798                /data/shared/tools/miniconda2/lib/libstdc++.so.6.0.21
7f1f0c264000-7f1f0c266000 rw-p 00175000 00:15 19829243798                /data/shared/tools/miniconda2/lib/libstdc++.so.6.0.21
7f1f0c266000-7f1f0c26a000 rw-p 00000000 00:00 0 
7f1f0c26a000-7f1f0c2ab000 rw-p 00178000 00:15 19829243798                /data/shared/tools/miniconda2/lib/libstdc++.so.6.0.21
7f1f0c2ab000-7f1f0c2b1000 r-xp 00000000 00:15 21369988987                /data/shared/tools/miniconda2/bin/opencl/lib/libOpenCL.so.1
7f1f0c2b1000-7f1f0c4b0000 ---p 00006000 00:15 21369988987                /data/shared/tools/miniconda2/bin/opencl/lib/libOpenCL.so.1
7f1f0c4b0000-7f1f0c4b2000 rw-p 00005000 00:15 21369988987                /data/shared/tools/miniconda2/bin/opencl/lib/libOpenCL.so.1
7f1f0c4b2000-7f1f0c4b3000 rw-p 00000000 00:00 0 
7f1f0c4d7000-7f1f0c4d9000 rw-p 00000000 00:00 0 
7ffc70878000-7ffc7088e000 rw-p 00000000 00:00 0                          [stack]
7ffc7095e000-7ffc7095f000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
/data/user1/tools/bioprocs/tests/workdir/PyPPL.pAlignPEByNGM.notag.1yEFAyWF/0/job.script: line 7:  5575 Aborted

The command:

ngm --rg-id ngm --rg-sm sample -r "/data/shared/tools/bioprocs/tests/workdir/PyPPL.pAlignPEByNGM.notag.1yEFAyWF/0/input/lambda_virus.fa.gz" -t 1 -1 "/data/shared/tools/bioprocs/tests/workdir/PyPPL.pAlignPEByNGM.notag.1yEFAyWF/0/input/test_1.clean.fq.gz" -2 "/data/shared/tools/bioprocs/tests/workdir/PyPPL.pAlignPEByNGM.notag.1yEFAyWF/0/input/test_2.clean.fq" -o "/data/shared/tools/bioprocs/tests/workdir/PyPPL.pAlignPEByNGM.notag.1yEFAyWF/0/output/test.bam" --bam

The input files:

The fasta file can be downloaded from lambda_virus.fa, which is actually a virus genome offered by bowtie2 for testing.

Bowtie2 also provides a script, simulate.pl, to generate artificial reads. The script is available at: simulate.pl

The two files of reads used in the above test, generated by simulate.pl (then trimmed by trimmomatic), can be downloaded at:
reads1, reads2

uname -a:

Linux xxx 2.6.32-696.1.1.el6.x86_64 #1 SMP Tue Apr 11 17:13:24 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

lsb_release:

LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch

paired-end mapping FF orientation

Hello,
Is there any option to set the orientation of the mate reads?
Thanks
Amit

ngm quits on OpenCL error

I installed the program on a MacBook Pro running OS X 10.8.5, with an Intel Core i7 and Intel HD Graphics 4000 (so OpenCL 1.2). After successfully building ngm with the MacPorts version of gcc 4.9, the program runs through a few of the initial steps before terminating with the following error: "[OPENCL] Couldn't get number of OpenCl devices. Error:
Error: Device not found. (-1)".

Any ideas on what is causing this problem and how to fix it would be greatly appreciated.

Thanks,
Lucy

MultiQC module for NGM

Hi developers,

Is there a possibility to write a module for NGM log/output files, so that the mapping summary can be displayed in a MultiQC (http://multiqc.info) report?

Thanks, Minh

Documentation for heuristic gap model

Apologies if this already exists somewhere, but I was not able to find it after scanning the original publication, the supplements and associated references.

I was curious if there was any documentation somewhere for the "heuristic non-affine gap model" described in this poster about the mapper:

http://schatzlab.cshl.edu/publications/posters/2015/2015.GI.NextGenMap-LR.pdf

Many of us have heard that non-affine gap penalties increase the complexity of the algorithm (e.g. https://en.wikipedia.org/wiki/Gap_penalty#Comparing_time_complexities), so I was curious what the heuristic is and how it is employed. Are there any good sources? This might also be a good candidate question for the FAQ (https://github.com/Cibiv/NextGenMap/wiki/FAQ).
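For context, the gap models being compared can be sketched as cost functions of the gap length. This is only an illustration with made-up penalty values (the parameter names and defaults are assumptions); it is not the heuristic used by NextGenMap, which the poster does not spell out. With an affine model the dynamic programme only needs to know whether a gap is being opened or extended, whereas an arbitrary gap function requires considering every possible gap length ending at each cell.

```python
import math


def linear_gap_cost(length, extend=2.0):
    """Linear model: every gap position costs the same."""
    return extend * length


def affine_gap_cost(length, gap_open=5.0, extend=1.0):
    """Affine model: a one-off opening cost plus a constant cost per extension."""
    if length == 0:
        return 0.0
    return gap_open + extend * (length - 1)


def log_gap_cost(length, gap_open=5.0, scale=2.0):
    """Logarithmic (non-affine) model: each extra position costs less than the last."""
    if length == 0:
        return 0.0
    return gap_open + scale * math.log(length)


# Illustrative values only: compare how the three models grow with gap length.
for k in (1, 10, 100):
    print(k, linear_gap_cost(k), affine_gap_cost(k), round(log_gap_cost(k), 2))
```

Long gaps stay comparatively cheap under the logarithmic model, which is the usual motivation for non-affine penalties on long reads.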

Thanks for making this publicly available and for the great work!

Warm wishes,
Nigel

Run issue

Dear developers,

when I try to run NGM,
$./ngm
it showed that
/parastor/users/lnszyd/longyan/NGP/N_test/NextGenMap-0.5.0/bin/ngm-0.5.0/ngm-core: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.11' not found (required by /parastor/users/lnszyd/longyan/NGP/N_test/NextGenMap-0.5.0/bin/ngm-0.5.0/ngm-core)
/parastor/users/lnszyd/longyan/NGP/N_test/NextGenMap-0.5.0/bin/ngm-0.5.0/ngm-core: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.9' not found (required by /parastor/users/lnszyd/longyan/NGP/N_test/NextGenMap-0.5.0/bin/ngm-0.5.0/ngm-core)
when I type:
$strings /usr/lib64/libstdc++.so.6 | grep GLIBCXX
the output doesn't contain GLIBCXX_3.4.9 or GLIBCXX_3.4.11, because /usr/lib64/libstdc++.so.6 is a very old version of the library.
I want to use NGM on my school's cluster (a GPU platform), where I don't have root access to update libstdc++.so.6.

And when I type:
$ldd ./ngm-core
it showed (note the fifth line):
./ngm-core: /usr/lib64/libOpenCL.so.1: no version information available (required by ./ngm-core)
./ngm-core: /usr/lib64/libOpenCL.so.1: no version information available (required by ./ngm-core)
libpthread.so.0 => /lib64/libpthread.so.0 (0x000000357d800000)
libOpenCL.so.1 => /usr/lib64/libOpenCL.so.1 (0x00002b5c4f10e000)
libstdc++.so.6 => /parastor/users/lnszyd/cy_test/softwares/gcc-4.8.2/lib64/libstdc++.so.6 (0x00002b5c4f314000)
libm.so.6 => /lib64/libm.so.6 (0x000000357d000000)
libgcc_s.so.1 => /parastor/users/lnszyd/cy_test/softwares/gcc-4.8.2/lib64/libgcc_s.so.1 (0x00002b5c4f629000)
libc.so.6 => /lib64/libc.so.6 (0x000000357cc00000)
/lib64/ld-linux-x86-64.so.2 (0x000000357c800000)
libdl.so.2 => /lib64/libdl.so.2 (0x000000357d400000)

Setting LD_LIBRARY_PATH does not help either:
$export LD_LIBRARY_PATH=/the_path_to_new_version_lib/
How can I solve this problem?
Thx!
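
The strings/grep check above can also be scripted, for example to compare the system library with the copy shipped in a newer GCC installation. The following is a minimal sketch; the function names are hypothetical and the sample bytes are invented for the demonstration rather than taken from a real library.

```python
import re


def glibcxx_versions(data):
    """Return the GLIBCXX_x.y.z version tags found in raw library bytes, sorted numerically."""
    tags = {m.decode("ascii") for m in re.findall(rb"GLIBCXX_(\d+(?:\.\d+)*)", data)}
    return sorted(tags, key=lambda t: tuple(int(p) for p in t.split(".")))


def missing_versions(data, required):
    """Return the required version tags that the library bytes do not provide."""
    available = set(glibcxx_versions(data))
    return [v for v in required if v not in available]


# Invented sample standing in for the contents of an old libstdc++.so.6.
sample = b"\x00GLIBCXX_3.4\x00GLIBCXX_3.4.8\x00GLIBCXX_3.4.10\x00"
print(glibcxx_versions(sample))
print(missing_versions(sample, ["3.4.9", "3.4.11"]))
```

To inspect a real file, read it in binary mode (open(path, "rb").read()) and pass the bytes to the same functions.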

unrecognized option -h/--help

Executing ngm with the options -h or --help results in ngm-core: unrecognized option '-h' (or '--help' respectively) and results in exit code 1. The option should be recognised as valid and result in exit code 0.

NGM version: 0.5.5

laod: unrecognized subcommand

Hi - I ran ngm (0.4.12) in paired-end mode and got the errors below. It still runs, but this might be something to look at:

cmdTrace.c(713):ERROR:104: 'laod' is an unrecognized subcommand
cmdModule.c(411):ERROR:104: 'laod' is an unrecognized subcommand

The command used was:
ngm -r ref.fa -1 lane6_read1.fastq -2 lane6_read2.fastq -o out.sam -p -t 32

Thx

Provide example data to test that config is OK ?

Hi,

Thanks for your work on NGM, it seems like a great tool to use!
I tried it on my amplicon sequencing data but could not get it to work. At this point I can't tell whether the problem comes from a peculiarity of my data, from something that went wrong during installation, or from something else.

Would it be possible for you to provide a sample dataset (raw fastq reads and the corresponding reference genome), for example the one you use to test ngm?

(A bacterial genome sequencing project would be great: it is small enough that the alignment can be run on a single computer in less than a minute (indexing a 3.6Mb genome with NGM took less than 10s on my setup).)

Thanks !

NGM: Header does not match the data

Hello,
I really need help!

I mapped paired-end short reads to a fasta file using ngm and piped the output to samtools.
I used samtools sort to generate a sorted bamfile which I want to index with samtools index.

This is the command I used:
ngm -r HK1_racon1_nanopolished_genome.fa -1 cHK1_1.fq -2 cHK1_2.fq -t64 --max-read-length 150 --affine | samtools sort -l 9 -@ 10 -O BAM -o HK1.paired.sort.bam && samtools index HK1.paired.sort.bam

I got the following error message from samtools:
[bam_sort_core] merging from 310 files and 10 in-memory blocks...
[E::hts_idx_push] Region 589897413..589897563 cannot be stored in a bai index. Try using a csi index with min_shift = 14, n_lvls >= 6
samtools index: failed to create index for "HK1.paired.sort.bam": Numerical result out of range

It seems that one or more of the reads fall outside of the header reference. The maximum length of any reference sequence is 35803611, while the read that triggered the error starts at 589897413. It looks like the longest chromosome in the SQ headers is ~36Mb while the alignments are out to ~590Mb. This would appear to indicate the header does not match the data and therefore the indexing is not working. So, the issue might be with the ngm alignment.
Do you have any tips?
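
Both observations above can be checked with a short script. The sketch below assumes uncompressed SAM text; the function names and the reference name contig1 are hypothetical, while the lengths and positions are the ones quoted in the error message. A BAI index uses min_shift = 14 with 5 levels, so it can address positions up to 2^29; the CSI settings suggested by samtools (6 levels) raise that limit to 2^32.

```python
def index_limit(min_shift=14, n_lvls=5):
    """Largest coordinate a binning index with these parameters can address."""
    return 1 << (min_shift + 3 * n_lvls)


def reference_lengths(sam_lines):
    """Collect {reference name: length} from the @SQ header lines."""
    lengths = {}
    for line in sam_lines:
        if not line.startswith("@SQ"):
            continue
        fields = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
        lengths[fields["SN"]] = int(fields["LN"])
    return lengths


def out_of_range_records(sam_lines):
    """Return (read, reference, position) for alignments that start past their reference's LN."""
    sam_lines = list(sam_lines)
    lengths = reference_lengths(sam_lines)
    bad = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        cols = line.rstrip("\n").split("\t")
        qname, rname, pos = cols[0], cols[2], int(cols[3])
        if rname != "*" and pos > lengths.get(rname, 0):
            bad.append((qname, rname, pos))
    return bad


# Hypothetical records built from the numbers in the error message.
sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "@SQ\tSN:contig1\tLN:35803611",
    "r1\t99\tcontig1\t1000\t60\t150M\t=\t1200\t350\t*\t*",
    "r2\t99\tcontig1\t589897413\t60\t150M\t=\t589897600\t337\t*\t*",
]
print(out_of_range_records(sam))
print(589897563 > index_limit(14, 5), 589897563 <= index_limit(14, 6))
```

If the first check reports records on real output, the header and the alignments disagree. If it reports nothing, the positions are valid and only the BAI coordinate limit is being hit.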

Not able to view sam file ?

Hi,
After performing the alignment, I am not able to view the SAM file. Is the header missing from the SAM file? How can I sort this out?
Aligner - NextGenMap
Illumina human dataset- SRR3440456 (14.14 GB)
Readlength - 126bp
ref -hg38.fa (3.1Gb)

CMD#

./ngm -r Ref/hg38.fa -1 /data/R1q20.fastq -2 /data/R2q20.fastq -t 10 -o /data/NextGenMap-0.5.0/SRR2962693.ngm.sam

Can't view the SAM file:

samtools view -H SRR2962693.ngm.sam
[E::hts_hopen] Failed to open file SRR2962693.ngm.sam
[E::hts_open_format] Failed to open file SRR2962693.ngm.sam
samtools view: failed to open "SRR2962693.ngm.sam" for reading: Exec format error

thanks
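
One way to narrow this down is to look at the first bytes of the output file and see what kind of content it actually holds. The sketch below is a rough classifier with hypothetical names and labels; it is not the detection logic samtools itself uses.

```python
SAM_HEADER_TAGS = (b"@HD\t", b"@SQ\t", b"@RG\t", b"@PG\t", b"@CO\t")


def sniff_alignment_format(head):
    """Guess the file type from its leading bytes (a rough heuristic)."""
    if not head:
        return "empty"
    if head.startswith(b"\x1f\x8b"):
        return "gzip"  # BAM and bgzip-compressed SAM both start with the gzip magic
    if head.startswith(b"CRAM"):
        return "cram"
    if head.startswith(SAM_HEADER_TAGS):
        return "sam-with-header"
    # A headerless SAM record has at least 11 tab-separated columns with a numeric FLAG.
    cols = head.split(b"\n", 1)[0].split(b"\t")
    if len(cols) >= 11 and cols[1].isdigit():
        return "sam-no-header"
    return "unknown"


# Usage on a real file (path taken from the command above):
#   with open("SRR2962693.ngm.sam", "rb") as fh:
#       print(sniff_alignment_format(fh.read(4096)))
print(sniff_alignment_format(b"@SQ\tSN:chr1\tLN:248956422\n"))
print(sniff_alignment_format(b""))
```

An "empty" or "unknown" result would suggest that the mapping run did not finish writing the file, rather than that only the header is missing.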

The docker container does not specify which version of ubuntu is used

It is best practice to specify the versions used during build, to facilitate reproducible research.

The Dockerfile simply uses FROM ubuntu, where it should specifically choose a known working release.

I've tested building FROM ubuntu:artful, so you could use that if you like.
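
A check for this could be automated. The following is a minimal sketch with a hypothetical function name that flags FROM lines without a tag or digest; it does not know about multi-stage builds, so a reference to an earlier build stage would also be flagged.

```python
def unpinned_base_images(dockerfile_text):
    """Return the FROM images that have no tag or digest, or that use 'latest'."""
    unpinned = []
    for raw in dockerfile_text.splitlines():
        parts = raw.strip().split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Skip options such as --platform=... and take the image reference.
        image = next((p for p in parts[1:] if not p.startswith("--")), None)
        if image is None or image.lower() == "scratch":
            continue
        if "@" in image:
            continue  # pinned by digest
        # Only look for a tag after the last "/", so registry ports are not mistaken for tags.
        tail = image.rsplit("/", 1)[-1]
        tag = tail.split(":", 1)[1] if ":" in tail else None
        if tag is None or tag == "latest":
            unpinned.append(image)
    return unpinned


print(unpinned_base_images("FROM ubuntu\nRUN apt-get update\n"))
print(unpinned_base_images("FROM ubuntu:artful\nRUN apt-get update\n"))
```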

Supplementary Data Missing

The links to the supplementary data are not working on the Bioinformatics website.

https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btt468

Is the data accessible from anywhere else?