bcgsc / ntcard Goto Github PK

View Code? Open in Web Editor NEW

76.0 19.0 9.0 1.27 MB

Estimating k-mer coverage histogram of genomics data

Home Page: http://www.bcgsc.ca/platform/bioinfo/software/ntcard

License: MIT License

C++ 86.62% Makefile 6.91% C 1.08% Shell 0.15% M4 5.25%

cardinality-estimation k-mer-counting k-mer-frequency hyperloglog streaming-algorithms

ntcard's Introduction

ntCard

ntCard is a streaming algorithm for cardinality estimation in genomics datasets. As input it takes file(s) in fasta, fastq, sam, or bam formats and computes the total number of distinct k-mers, F₀, and also the k-mer coverage frequency histogram, f_i, i>=1.

Install ntCard on macOS

Install Homebrew, and run the command

brew install brewsci/bio/ntcard

Install ntCard on Linux

Install Linuxbrew, and run the command

brew install brewsci/bio/ntcard

Compiling ntCard from GitHub

When installing ntCard from GitHub source the following tools are required:

To generate the configure script and make files:

./autogen.sh

Compiling ntCard from source

To compile and install ntCard in /usr/local:

$ ./configure
$ make 
$ sudo make install

To install ntCard in a specified directory:

$ ./configure --prefix=/opt/ntCard
$ make 
$ make install

ntCard uses OpenMP for parallelization, which requires a modern compiler such as GCC 4.2 or greater. If you have an older compiler, it is best to upgrade your compiler if possible. If you have multiple versions of GCC installed, you can specify a different compiler:

$ ./configure CC=gcc-xx CXX=g++-xx

For the best performance of ntCard, pass -O3 flag:

$ ./configure CFLAGS='-g -O3' CXXFLAGS='-g -O3'

To run ntCard, its executables should be found in your PATH. If you installed ntCard in /opt/ntCard, add /opt/ntCard/bin to your PATH:

$ PATH=/opt/ntCard/bin:$PATH

Run ntCard

ntcard [OPTIONS] ... FILE(S) ...

Parameters:

-t, --threads=N: use N parallel threads [1] (N>=2 should be used when input files are >=2)
-k, --kmer=N: the length of k-mer
-c, --cov=N: the maximum coverage of k-mer in output [1000]
-p, --pref=STRING: the prefix for output file names
-o, --output=STRING: the name for single output file name (can be used only for single compact output file)
FILE(S): input file or set of files seperated by space, in fasta, fastq, sam, and bam formats. The files can also be in compressed (.gz, .bz2, .xz) formats . A list of files containing file names in each row can be passed with @ prefix.

For example to run ntcard on a test file reads.fastq with k=50 and output the histogram in a file with prefix freq:

$ ntcard -k50 -p freq reads.fastq

To run ntcard on a test file reads.fastq with multiple k's k=32,64,96,128 and output the histograms in files with prefix freq use:

$ ntcard -k32,64,96,128 -p freq reads.fastq

As another example, to run ntcard on 5 input files file_1.fq.gz, file_2.fa, file_3.sam, file_4.bam, file_5.fq with k=64 and 6 threads and maximum frequency of c=100 on output file with prefix freq:

$ ntcard -k64 -c100 -t6 -p freq file_1.fq.gz file_2.fa file_3.sam file_4.bam file_5.fq

If we have a list of input files lib.in, to run ntCard with k=144 and 12 threads and output file with prefix freq:

$ ntcard -k144 -t12 -p freq @lib.in

Output:

The numbers Fk provide useful statistics on the input sequences. By default, F0 and F1 are output to stdout along with runtime.
- F0 denotes the number of distinct k-mers appearing in the stream sequences
- F1 is the total number of k-mers in the input datasets
- F2 is the Gini index of variation that can be used to show the diversity of k-mers and
- F∞ results in the most frequent k-mer in the input reads.
A tab separated output file with columns k, f, and n.
- k k-mer size
- f the frequency of a k-mer
- n the number of k-mers with frequency f

Publications

ntCard

Hamid Mohamadi, Hamza Khan, and Inanc Birol. ntCard: a streaming algorithm for cardinality estimation in genomics data. Bioinformatics (2017) 33 (9): 1324-1330. 10.1093/bioinformatics/btw832

ntcard's People

Contributors

Stargazers

Watchers

Forkers

altingia sjackman microfred nazaninsh tw7649116 pierrepeterlongo emollier sohnjangil

ntcard's Issues

Do you plan to upgrade to latest nthash

Hi,
I'm a member of the Debian Med team who is maintaining ntCard as well as ntHash for Debian. When ntHash version was bumped to version 2.3.0 I assumed ntCard might follow soonish and made the mistake to simply upload it to the Debian mirror. Now we have the unfortunate situation that ntCard 1.2.2 does not build against ntHash version 2.3.0 as it was reported in a bug report which boils down to the error:

g++ -DHAVE_CONFIG_H -I.  -I./Common -I/usr/include -I/<<PKGBUILDDIR>> -fopenmp -Wdate-time -D_FORTIFY_SOURCE=2 -Wall -Wextra -Werror -g -O2 -ffile-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -c -o Common/nthll-Fcontrol.o `test -f 'Common/Fcontrol.cpp' || echo './'`Common/Fcontrol.cpp
	ntcard.cpp:2:10: fatal error: nthash/ntHashIterator.hpp: No such file or directory
	    2 | #include "nthash/ntHashIterator.hpp"
	      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
	compilation terminated.
	make[3]: *** [Makefile:632: ntcard-ntcard.o] Error 1
	make[3]: *** Waiting for unfinished jobs....
	nthll.cpp:11:10: fatal error: nthash/ntHashIterator.hpp: No such file or directory
	   11 | #include "nthash/ntHashIterator.hpp"
	      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
	compilation terminated.

I wonder whether we can expect some ntCard that is compatible with ntHash soon (or may be something is just missing in ntHash)?

Kind regards, Andreas.

Errant backtick

ame (can be used only for single compact output file)`

last char ` not meant to be there

ntCard report interpretation

I tried today the ntCard with this parameters: ntcard -k32,64,96 -p freq -t2 R1.fastq R2.fastq
I get freq_k32.hist, freq_k64.hist and freq_k96.hist files which is tab text files. How can I interpret the output files? How can I choose the optimal k-mer based on this *.hist files?

freq_k32.hist.txt
freq_k64.hist.txt
freq_k96.hist.txt

v1.2.0 does not work with FASTA files

ntCard version 1.2.0 histograms for FASTA and FASTQ files are different.
I think there is a bug in parsing FASTA files.
Here is an example:

test.fasta

>1
CACACACACAAAATCAGTACGTAGCTGATCGTACGATCGTACGATCGTAGCTAGCTAGCTGATGCTAGCTGACTGATCGTAGCTATGTAGCTGATCGATCGTGATCGATCGTACGTAGCTGATGATCGTACGTAGCTAGCTAGCTGATCGATCGATCGTACGTACGTACGTAGTCGATCGTA

histogram:

F1      0
F0      0
1       9223372036854775808
2       9223372036854775808
3       9223372036854775808
4       9223372036854775808
...

test.fastq

@1
CACACACACAAAATCAGTACGTAGCTGATCGTACGATCGTACGATCGTAGCTAGCTAGCTGATGCTAGCTGACTGATCGTAGCTATGTAGCTGATCGATCGTGATCGATCGTACGTAGCTGATGATCGTACGTAGCTAGCTAGCTGATCGATCGATCGTACGTACGTACGTAGTCGATCGTA
+1
11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111

histogram:

F1      158
F0      128
1       128
2       0
3       0
4       0
...

Speed doesn't change when --threads increases?

This is on a FASTQ file with Reads 1859724 Yield 274758053

for N in $(seq 1 18); do echo threads=$N && ntcard -t $N -k 25 -p ntcard$N R1.fq.gz ; done
threads=1
Runtime(sec): 6.8182
threads=2
Runtime(sec): 6.7579
threads=3
Runtime(sec): 6.3849
threads=4
Runtime(sec): 6.4639
threads=5
Runtime(sec): 6.6152
threads=6
Runtime(sec): 6.5880
threads=7
Runtime(sec): 6.7932
threads=8
Runtime(sec): 6.6067
threads=9
Runtime(sec): 6.6375
threads=10
Runtime(sec): 6.5650
threads=11
Runtime(sec): 6.6318
threads=12
Runtime(sec): 6.3923
threads=13
Runtime(sec): 6.5634
threads=14
Runtime(sec): 6.6389
threads=15
Runtime(sec): 6.5851
threads=16
Runtime(sec): 6.5536
threads=17
Runtime(sec): 6.6643
threads=18
Runtime(sec): 6.7836

Reg: kmer dump generation

Hi!

I scoured through the readme and I might have probably missed it, but do we have a "dump" option of sorts like from either kmc tools or jellyfish?

For example consider the output from Jellyfish dump option (and it can also output in a fasta format with the count as a header)

TATTGTCATTATTCATTTTTT 1
GTAATTTTTGTGCGACACGAA 1
GCGGGCGGTATCCAGTTCGTT 1
ATGTAGGCCCGGATGTTTGCC 1
CATACAACAAGCGACGAGCAT 1

Here 1 is the count of the kmers encountered. Would it be possible to generate an output like this as well?

The current output can then be retrieved from this dump as well.

Support Uracil nucleotides in input files

Currently, uracil nucleotides are ignored because only A,C,G,T's are supported.
'U's can simply be treated as 'T's in RNA-seq data.

Show help when no parameters?

% ntcard

ntcard: missing arguments
ntcard: missing argument -k ...
Try `ntcard --help' for more information.

It would be nicer to just show the --help if there are ZERO arguments?
it usually means someone is trying to see how to run it.

Tag a new stable release

Could you please tag a new stable release that includes the feature Added option for output to a single file?

Support for U bases not working properly

I noticed that the histograms for U's are still different from those of T's for committ 2a48882.

I noticed that seedTab was updated, but msTab33r and msTab31l haven't been updated.

Add -h alias for --help ?

I notice all the other parameters have short and long options but not help.

Request for ntCard option to specify directory for temporary files

Hi,

Could I request an option to write temporary files to a user specified location please?
I was just testing ntCard with my data and the job gave no output. The cluster reported that the hard disk quota had been reached where my input files were, although the output was supposed to be written to a folder in my home directory. I guess this implies that temporary files were being written to the project folder which is at max quota.

My script:

#! /bin/bash

#SBATCH -J "ntCard test"
#SBATCH -A b2010042
#SBATCH -t 7-00:00:00
#SBATCH -n 16
#SBATCH -p node
#SBATCH -e log.ntcard_Spruce.2017-08-08_13.00-%j.out
#SBATCH -o log.ntcard_Spruce.2017-08-08_13.00-%j.out

export PATH="~/bin/ntCard/bin:$PATH"
FQDIR=/proj/b2010042/nobackup/douglas/fosmid-pool-data/raw-data/
time ntcard -t $SLURM_NPROCS -p spruce_freq $FQDIR/*.fq.gz

Thank you for telling me about ntCard. It was nice to meet you at ISMB.
Regards,
Mahesh.

brew repo has migrated

OLD
brew install homebrew/science/ntcard

NEW
brew install brewsci/bio/ntcard

Problems w.r.t speed of ntCard and accuracy in genome size estimation

I am trying to use ntCard to estimate the genome size on an Illumina paired ended 50x sequencing data distributed among 4 FASTQ files (each of size around 14 GB, making the total data approximately 56 GB). I have observed two problems while trying to do so:

I used the command ntCard -t4 -c3000000 -k21 -p freq @files.in for my experiment. It took around 40 minutes for ntCard to finish the simulation. This means in the ideal case it would have taken 40 * 4 / 16 = 10 minutes, had my data been divided among 16 files and used 16 threads. That doesn't match with the claim in the ntCard paper where it finished simulations of 500 GB data in 6 minutes. Am I doing anything wrong? Do I need to tune the parameters in some different way? I also used the default -c1000 but it still took the same 40 minutes to finish the experiment.
I ran genomescope (after the necessary adjustments in the output file) on both the -c1000 and -c3000000 experiments to estimate the genome size. I got 2.5 Gbp and 2.8 Gbp as results (which is quite off from assembly distance 3.055 Gbp and exact histogram results 3.09 Gbp). I just wanted to confirm if I am supposed to expect the ntCard output to be slightly off like what I saw in my experiments. Or am I doing something wrong with the parameters during the ntCard simulations?

--version is a bit non-standard

%  ntcard --version
ntcard Version 1.0.0
Written by Hamid Mohamadi.
Copyright 2016 Canada's Michael Smith Genome Science Centre

prefer the first line output to be more like the GNU standard:

ntcard 1.0.0

for parsing purposes.

Bump version number to 1.0.1

With the release of version 1.0.1, the version number in configure.ac needs to be changed
See https://github.com/bcgsc/ntCard/blob/master/configure.ac#L2

Use of read names to determine input file type

getftype uses the read names to determine the file type ( 0 == fastq, 1 == fasta and 2 == sam). In practice read names should be considered free text and I think you should prabably get the file type from the extension or ask the user as a parameter. In our case since we do not use this read naming scheme it was detecting our input as fasta even though it was fastq and this lead to interger overruns.

MACOS problem

Hi,
when I run makefile in Kmergenie , I encounter a problem:

.....
checking for g++ option to support OpenMP... unsupported
configure: error: NTCARD must be compiled with a C++ compiler that supports OpenMP threading.
make: *** [ntCard/ntcard] Error 1

DO YOU HAVE ANY IDEA TO RESOLVE IT?
THANKS for your help!

Report bugs to email or github ?

--help says Report bugs to [email protected].

Do you not want them on github?

Maybe replace with https://github.com/bcgsc/ntCard/issues

Setting the verbosity arguments

Would it be possible to add verbosity argument to the CLI to repress writing to stdout? I am running this over many gene clusters so the outputs start to add up quickly.

typo: Accepatble

Current spelling is unacceptable :)

Stop automatic outputting to "freq_kNN.hist"

I was shocked that ntcard ran with only a -k parameter and wrote a file to my folder. It took me a while to realise becuase i was in a folder with lots of sequence files.

Maybe the -p option should be mandatory and not have a default

The principle of "least surprise" is useful for command line tools.

P.S. I am a hypocrite, because some of my tools have default output folders :-P

feature request: specify output prefix or name

Thanks for making ntCard, I think it's a great tool and will be very useful!

One small tweak I was hoping for is that outputs keep getting the same name, 'freq_kXX.hist'. Can you add a flag to allow this to specified by the user, so that one can keep track of histograms of several different files? Otherwise, I noticed a second binary is created in /bin - does it do something different?

Thanks again, and keep up the great work!
Cheers, R

Integer overflow on calculating kmer distribution

Running ntcard sometimes runs into integer overflow. I am not able to figure out why it only breaks sometimes, but it broke on a simple fasta run which I hope can help get to the bottom of it.

Here is the fasta file (test.fna):

>CC8_C10A0
AATGCATACATACAT

>CC8_C100A0
ATGATGATGATGATG

>CC8_C100A1
TGATGATGATGATGA

and the command I use is:
ntcard -k 5 -o res.txt test.fna

The output file all contains 9223372036854775808 on the third column.

Let me know if you need any other information.

ntcard file.gbk fails, but file.gff works?

Does ntcard support GFF files with embedded FASTA at the end?

% ntcard -k25 NC_000913.gbk
Error in reading file: NC_000913.gbk

% ntcard -k25 NC_000913.gff3
Runtime(sec): 3.1387

headerless fasta sequences seem to be ignored

(Using the tagged release 1.0.1)

FYI, it seems that if given a fasta file that has no ">" header, ntcard simply ignores the whole file.

In my case, it was obvious there was some problem, because I only had one file, and since it had no header, I got what were obviously meaningless results (a few zeros followed by a bunch of 2^63s). But it did take me some time to figure out what I was doing wrong.

Seems like the user ought to made aware of this, either a note in the readme, or a runtime warning if a headerless sequence is encountered.

It seems possible that the underlying problem might ignore some of the input but not all, in which case the user would probably be unaware of it.

In case it makes any difference, I was piping my fasta in on stdin, and using /dev/stdin as the filename.

error while running make

Hi, I've been stuck while trying to run make. This is what's happening:

make  all-am
make[1]: Entering directory `/home/myuser/ntCard-master'
g++ -DHAVE_CONFIG_H -I.  -I./Common -I/home/myuser/ntCard-master -fopenmp  -Wall -Wextra -Werror -O3 -MT ntcard-ntcard.o -MD -MP -MF .deps/ntcard-ntcard.Tpo -c -o ntcard-ntcard.o `test -f 'ntcard.cpp' || echo './'`ntcard.cpp
cc1plus: warnings being treated as errors
ntcard.cpp: In function 'int main(int, char**)':
ntcard.cpp:289: error: iteration variable 'file_i' is unsigned
ntcard.cpp:317: error: iteration variable 'k' is unsigned
make[1]: *** [ntcard-ntcard.o] Error 1
make[1]: Leaving directory `/home/myuser/ntCard-master'
make: *** [all] Error 2

Am I missing some requirement or something? Thanks for any help! :)

Jagged kmer coverage profiles with gzipped FASTA

We discovered inconsistencies in kmer histograms on two experimental ONT datasets between uncompressed and compressed FASTA input files*. In independent runs and testing different k values (16,18,20,22,25), two gzipped FASTA ONT (NA19240 [PRJEB29523] and NA12878 [SRR10965087]) read files yielded jagged and uninterpretable kmer profiles. Problem exacerbated at higher k vals. Issue observed with ntcard v1.1.1, v1.2.1 and v1.2.2.

NA12878 ONT FASTA

NA12878 ONT FASTA GZIPPED

====

NA19240 ONT FASTA

NA19240 ONT FASTA GZIPPED

*We have only observed this with FASTA files, not FASTQ files and only when using experimental nanopore data

dyld: cannot load Abort trap: 6

dyld: cannot load 'rbxfpsunlocker_osx' (load command 0x80000034 is unknown)
Abort trap: 6

This is the error code im getting when trying to run the program. I have the program in my downloads folder and followed each step closely please help.

Report f1, F0 and F1

f_c is the count of k-mers with coverage c
f₁ is the count of singleton k-mers (k-mers with coverage 1)
F₀ is the zeroth moment of f, which is Σ_i f_i, the number of distinct k-mers
F₁ is the first moment of f, which is Σ_i i f_i, the total number of k-mers in the reads (number of reads times number of k-mers per read)

If these values are reported in a tab-separated-values (TSV) table, we can use estimate.py from KmerStream to estimate the following useful parameters:

G the genome size
λ the mean k-mer coverage of the Poisson distribution
ε_k the probability that a k-mer contains an error

See the KmerStream paper for the formulae.

The TSV table should look like so, where Q is the quality trimming parameter:

k	F0	f1	F1
32	15686875710	12361159570	188878769625
64	145132117825	134005267461	161086206625
96	120458305249	111221687401	133300308154
128	21617343636	18233322390	105522711154
160	19369615797	15914061304	77772705236
192	46485745358	43797611019	50099054358
224	21147873853	20060404949	22586521835

K-mers with odd frequency having 0 counts

Hi!

I've been using ntCard v1.1.0 with success until recently. The command I have used is the following:
ntcard -p $outputbasename -t100 -k12,24,36,48,60,72,84,96,108,120,132 @$inputfile > $log
where $outputbasename is a name generated based on the input file, stored in $inputfile, which has the list of files to be processed.

The result I have got in the last files I have processed using ntCard have odd frequencies with a 0, like this (first lines of a file, I have omitted F0 and F1):

1       0
2       1839217003
3       0
4       779118217
5       0
6       640700808
7       0
8       458220499

and this (last lines of the same file):

991     0
992     125
993     0
994     39
995     0
996     31
997     0
998     65
999     0
1000    19

The only difference between the two ntCard runs is that in the first one, I used uncompressed fastq files, and in the second one, every file is compressed in gz format.

ntCard does not create output file unless -k is specified

Running

➤ ntcard  -t 32 paired_dat1.fq
Runtime(sec): 0.1228

does not output any files. However if I specify a k value:

➤ ntcard  -t 32 paired_dat1.fq -k64
Runtime(sec): 13.2829

I get the histogram output file. I believe this is a bug, since the kmer size should 64 by default, according to the help message.

integer underflow bug?

A user of RNA-Bloom had discovered a bug in ntCard version 1.2.2:
bcgsc/RNA-Bloom#43

Here are the first 10 lines from the output histogram file.

>head rnabloom_out/rnabloom_k17.hist 
F1      270
F0      0
1       9223372036854775808
2       9223372036854775808
3       9223372036854775808
4       9223372036854775808
5       9223372036854775808
6       9223372036854775808
7       9223372036854775808
8       9223372036854775808

The output is bogus.

Shouldn't F0 be >= 1 if F1 is > 1?
The values for the histogram are all 9223372036854775808. I wonder if this is an integer underflow?

Here is the exact command that was used:

ntcard -t 8 -k 17 -c 65535 -p ntcard_test filtered.fastq

I was able to replicate the exact same output as well.

problem with make

Trying to make and receive the following error:

g++ -Wall -O3 -fopenmp -Ilib -ldl -o ntcard ntcard.cpp lib/Uncompress.cpp lib/SignalHandler.cpp lib/Fcontrol.cpp
/usr/bin/ld: /tmp/cczwFT7G.o: undefined reference to symbol 'dlsym@@GLIBC_2.2.5'
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/libdl.so: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
Makefile:12: recipe for target 'ntcard' failed
make: *** [ntcard] Error 1

Any help would be appreciated. Thank you!

Can't build from source

Get the following compilation error when compiling ntHashIterator.hpp:
Building on Fedora 36
gcc (GCC) 12.2.1 20221121

g++ -DHAVE_CONFIG_H -I.  -I./Common -I/home/jregalado/Software/ntCard -fopenmp  -Wall -Wextra -Werror -O3 -MT ntcard-ntcard.o -MD -MP -MF .deps/ntcard-ntcard.Tpo -c -o ntcard-ntcard.o `test -f 'ntcard.cpp' || echo './'`ntcard.cpp
In file included from vendor/ntHash/ntHashIterator.hpp:6,
                 from ntcard.cpp:2:
In function ‘bool NTMC64(const char*, unsigned int, unsigned int, uint64_t&, uint64_t&, unsigned int&, uint64_t*)’,
    inlined from ‘bool NTMC64(const char*, unsigned int, unsigned int, uint64_t&, uint64_t&, unsigned int&, uint64_t*)’ at vendor/ntHash/nthash.hpp:467:13,
    inlined from ‘void ntHashIterator::init()’ at vendor/ntHash/ntHashIterator.hpp:66:53,
    inlined from ‘ntHashIterator::ntHashIterator(const std::string&, unsigned int, unsigned int)’ at vendor/ntHash/ntHashIterator.hpp:41:13,
    inlined from ‘void ntRead(const std::string&, const std::vector<unsigned int>&, uint16_t*, size_t*)’ at ntcard.cpp:151:38:
vendor/ntHash/nthash.hpp:489:17: error: array subscript 1 is outside array bounds of ‘void [8]’ [-Werror=array-bounds]
  489 |         hVal[i] =  tVal;
      |         ~~~~~~~~^~~~~~~
In constructor ‘ntHashIterator::ntHashIterator(const std::string&, unsigned int, unsigned int)’,
    inlined from ‘void ntRead(const std::string&, const std::vector<unsigned int>&, uint16_t*, size_t*)’ at ntcard.cpp:151:38:
vendor/ntHash/ntHashIterator.hpp:39:58: note: at offset 8 into object of size 8 allocated by ‘operator new []’
   39 |      m_seq(seq), m_h(h), m_k(k), m_hVec(new uint64_t[h]), m_pos(0)

Compilation blocked because of Werror

Using gcc version 9.3.0 on Ubuntu
I got a compilation problem due to a warning that blocks the compilation because of -Werror

ntcard.cpp: In function ‘void stRead(const string&, const std::vector&, uint16_t*, size_t*)’:
ntcard.cpp:166:35: error: type qualifiers ignored on cast result type [-Werror=ignored-qualifiers]
166 | ntComp((const uint64_t)(*itr)[0], t_Counter + k * opt::nSamp * opt::rBuck);

Without the -Werror flag I got
ntcard.cpp: In function ‘void stRead(const string&, const std::vector&, uint16_t*, size_t*)’:
ntcard.cpp:166:35: warning: type qualifiers ignored on cast result type [-Wignored-qualifiers]
166 | ntComp((const uint64_t)(*itr)[0], t_Counter + k * opt::nSamp * opt::rBuck);

GenomeScope does not accept hist files from ntCard

Hi,
@sjackman advised me to run ntCard on my raw data in bcgsc/abyss#178 (comment) . Are you sure the output files can be parsed by GenomeScope http://qb.cshl.edu/genomescope/ ? This is a fish genome, probably 1.7Gbp.

It complains with: File was uploaded but it has 1 column(s). The file must have 2 columns separated by a single space, which is the default in Jellyfish. I admit I used current 'git master' version.

The command was: ntcard -t "$threads" -k32,64,96,112,128,144,156 -c $num .... I used num=512 for the PE+MP files and num=1000 for the PE-only data. Probably is a good idea to omit MP-datasets as the library diversity may be low, right?

A list of files containing file names in each row can be passed with @ prefix.

Typo in --help

Accepatble file formats: fastq, fasta, sam, bam, gz, bz, zip.

=> Acceptable

bcgsc / ntcard Goto Github PK

ntcard's Introduction

ntCard

Install ntCard on macOS

Install ntCard on Linux

Compiling ntCard from GitHub

Compiling ntCard from source

Run ntCard

Publications

ntcard's People

Contributors

Stargazers

Watchers

Forkers

ntcard's Issues

Recommend Projects

Recommend Topics

Recommend Org