refresh-bio / kmc
Fast and frugal disk based k-mer counter
Hello, I am trying to compile KMC-3.0.1 and I get some AVX2-related errors,
the same kind of error as mentioned in #17.
A quick dig through the code shows that AVX2 is enabled on Xeon processors,
but some Xeons do not provide AVX2 support.
See, for example, my /proc/cpuinfo
(running Docker on a Mac):
processor : 5
vendor_id : GenuineIntel
cpu family : 6
model : 62
model name : Intel(R) Xeon(R) CPU E5-1620 v2 @ 3.70GHz
stepping : 4
cpu MHz : 3699.593
cache size : 10240 KB
physical id : 5
siblings : 1
core id : 0
cpu cores : 1
apicid : 5
initial apicid : 5
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht pbe syscall nx pdpe1gb lm constant_tsc rep_good nopl xtopology nonstop_tsc eagerfpu pni pclmulqdq dtes64 ds_cpl ssse3 cx16 xtpr pcid dca sse4_1 sse4_2 popcnt aes xsave avx f16c rdrand hypervisor lahf_lm fsgsbase erms xsaveopt arat
bugs :
bogomips : 7570.22
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
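A quick way to confirm the point above is to look for avx2 in the flags line. The following is a minimal sketch in Python, not part of KMC; the function names cpu_flags and supports_avx2 are made up for illustration, and the sample text is a shortened copy of the listing above.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def supports_avx2(cpuinfo_text):
    """True if the 'avx2' flag is listed (hypothetical helper, not a KMC function)."""
    return "avx2" in cpu_flags(cpuinfo_text)


# Shortened copy of the report above: 'avx' is listed, 'avx2' is not.
sample = (
    "model name : Intel(R) Xeon(R) CPU E5-1620 v2 @ 3.70GHz\n"
    "flags : fpu sse sse2 ssse3 sse4_1 sse4_2 popcnt aes xsave avx f16c rdrand\n"
)
print(supports_avx2(sample))
```

On a real Linux machine the text would come from open("/proc/cpuinfo").read().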
Refers to #20
I am building on linux with g++ 5.4.0 and as/binutils 2.2.8.
My build fails with:
In file included from kmc_dump/nc_utils.cpp:15:0:
kmc_dump/nc_utils.h: In static member function 'static int CNumericConversions::Double2PChar(double, int, uchar*)':
kmc_dump/nc_utils.h:124:22: error: 'modf' is not a member of 'std'
double fractPart = std::modf(val, &ipart);
^
kmc_dump/nc_utils.h:124:22: note: suggested alternative:
In file included from /usr/include/features.h:346:0,
from /global/common/genepool/usg/languages/gcc/5.4.0/include/c++/5.4.0/x86_64-unknown-linux-gnu/bits/os_defines.h:39,
from /global/common/genepool/usg/languages/gcc/5.4.0/include/c++/5.4.0/x86_64-unknown-linux-gnu/bits/c++config.h:482,
from /global/common/genepool/usg/languages/gcc/5.4.0/include/c++/5.4.0/string:38,
from kmc_dump/nc_utils.h:14,
from kmc_dump/nc_utils.cpp:15:
/usr/include/bits/mathcalls.h:116:1: note: 'modf'
__MATHCALL (modf,, (_Mdouble_ __x, _Mdouble_ *__iptr));
Any idea what is happening? Thx...
Hi,
I want to install KMC as one of the required dependencies for IVA, and I am struggling with the installation of KMC on my Mac.
#1. I set the new path of gcc in makefile_mac.
#2. Considering the following error message:
I removed the -fopenmp option from makefile_mac.
#3. I reran make -f makefile_mac but I finally got this error, which I cannot fix (sorry for that...).
Any advice would be much appreciated.
Thanks ++
a
Hi, I need to count k-mers for each read in a metagenomic FASTQ or FASTA file.
Is it possible to get counts for each single read of the file instead of the whole file?
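For illustration only, per-read counting can be sketched outside KMC in a few lines of Python. This is not a KMC feature; read_fasta and per_read_kmer_counts are hypothetical names, the input is assumed to be FASTA, and k-mers are counted as they appear (no canonical form).

```python
from collections import Counter


def read_fasta(lines):
    """Yield (name, sequence) pairs; sequences may span several lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def per_read_kmer_counts(lines, k):
    """Yield (read name, Counter of k-mers) for every record."""
    for name, seq in read_fasta(lines):
        yield name, Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


for name, counts in per_read_kmer_counts([">r1", "ACGT", "AC", ">r2", "AAAA"], 3):
    print(name, dict(counts))
```

This keeps one counter in memory per read, so it only suits short reads and small k.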
@marekkokot, at
https://github.com/refresh-bio/KMC/blob/master/kmc_tools/dump_writer.h#L142
should it be counter = counter_max?
I'm trying to use the C++ API but when I include kmc_file.h and compile, I get the following compilation error.
Compile command
g++ read_db.cpp -o read_db
#include <iostream>
#include "../KMC-3.0.1/kmc_api/kmc_file.h"
int main(){
CKMCFile kmer_database;
return 0;
}
Error,
Compilation started at Thu Apr 13 13:59:49
g++ read_kmc.cpp -o read_kmc
In file included from read_kmc.cpp:2:
In file included from ./../KMC-3.0.1/kmc_api/kmc_file.h:14:
./../KMC-3.0.1/kmc_api/kmer_defs.h:36:11: fatal error: 'ext/algorithm' file not found
#include <ext/algorithm>
^
1 error generated.
Compilation exited abnormally with code 1 at Thu Apr 13 13:59:49
gcc version 6.3.0
OSX El Capitan 10.11.6
% /home/sw/kmc/3.0.1/bin/kmc
K-Mer Counter (KMC) ver. 3.0.0 (2017-01-28)
Usage:
I've been testing out KMC with the following dummy sequence
>dummy
AATGGGTCCCTGTTTCGCGATAAAATGCCAATCGCTCTAAATATCGCGCTAGC
with the command kmc -ci0 -fm -k3 -cs300 dummy_genome.fa dummy_kmc kmc_temp
The result is
Stage 1: 100%
1st stage: 0.002018s
2nd stage: 0.001863s
Total : 0.003881s
Tmp size : 0MB
Stats:
No. of k-mers below min. threshold : 0
No. of k-mers above max. threshold : 0
No. of unique k-mers : 25
No. of unique counted k-mers : 25
Total no. of k-mers : 51
Total no. of sequences : 1
Total no. of super-k-mers : 0
while the sequence actually has 33 unique 3-mers.
I'm compiling with gcc 6.3.0. Maybe I'm doing something wrong...
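One possible explanation, offered here as an assumption rather than a confirmed diagnosis, is canonical counting: a k-mer and its reverse complement are treated as the same k-mer, which lowers the number of unique k-mers. The Python sketch below counts the 3-mers of the dummy sequence both ways so the two figures can be compared with the Stats above. As far as I recall, KMC's help lists a -b switch that turns off the canonical form; please check this against your version.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    """Reverse complement of a DNA string over A, C, G, T."""
    return kmer.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


seq = "AATGGGTCCCTGTTTCGCGATAAAATGCCAATCGCTCTAAATATCGCGCTAGC"
k = 3
all_kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
distinct = set(all_kmers)                                # strand-specific
distinct_canonical = {canonical(km) for km in all_kmers}  # strands merged

print("total k-mers:      ", len(all_kmers))
print("distinct as written:", len(distinct))
print("distinct canonical: ", len(distinct_canonical))
```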
Add option to ignore untrustworthy kmers in reads, with "untrustworthy" defined as "having more than this many bases below a given quality".
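To make the request concrete, here is a minimal Python sketch of the proposed rule. It is not existing KMC behaviour; the function name, the parameter names and the Phred+33 encoding are assumptions for illustration. A sliding window keeps track of how many low-quality bases each k-mer contains.

```python
def trusted_kmers(seq, qual, k, min_quality, max_low_bases, phred_offset=33):
    """Yield k-mers containing at most max_low_bases bases below min_quality."""
    low = [1 if ord(c) - phred_offset < min_quality else 0 for c in qual]
    window = sum(low[:k])  # low-quality bases in the first k-mer
    for i in range(len(seq) - k + 1):
        if i > 0:
            # Slide the window: add the entering base, drop the leaving one.
            window += low[i + k - 1] - low[i - 1]
        if window <= max_low_bases:
            yield seq[i:i + k]


# 'I' encodes quality 40 and '#' encodes quality 2 under Phred+33.
print(list(trusted_kmers("ACGTA", "II#II", 2, min_quality=20, max_low_bases=0)))
```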
This is not an issue, but do you have any interest in developing a Python binding?
Thank you
Hello,
I'm trying to use KMC v3 with reads previously corrected with BFC. However, KMC stops during stage 2, there is no warning or error message, and the stats table shows only 0s. I ran Jellyfish v2 with the same corrected reads without a problem. Below are the commands that I'm using.
Correct reads
bash -c "bfc -s 200m -k33 -t 16 <(seqtk mergepe reads_1.fastq.gz reads_2.fastq.gz) <(seqtk mergepe reads_1.fastq.gz reads_2.fastq.gz) | gzip -1 > bfc-corrected.fastq.gz"
Count k-mers
kmc -k21 -ci2 -m100 -t12 -v bfc-corrected.fastq.gz bfc-corrected_kmc3 ./tmp
This is an example of a read pair after BFC correction:
@E00476:214:HHLTNALXX:8:1101:21217:1186 ec:Z:0_0:104_0_3:0_0
aTAACATATAATGTTTTTAAATAAATTTTAATTTAATTGGAATACTTATTTATTCAATAAAATTATTAACAATAATTTACCTCTATTTTGGTTTCAATTAAATAAATTTATAgAGAAATAaTAAATAAATAAAGCTTCTAACTTTATAATA
+
&???????????????????????????????????????+??????+??????++??+++???+???????????????+???+?????+????++??+++?+++??????%++++???%++?????+?+???+????+?++????++??
@E00476:214:HHLTNALXX:8:1101:21217:1186 ec:Z:0_0:103_0_3:0_0
aTATATTTTTGTTTATTATTTTAAGTATAGGTTAATTGAAGAATTATTTAATTTATTAAAATTAGATTATTTTGTTTATTATAAAATATTTTATTTTTTTTTTATAATTATAATTTTTTATTATTTTTTATTTgATTAAAATaTATGAATA
+
&?????????????????????????????????????????????????????++????????????????????+??????????+?????????????????++?++++++?????++????????++??%+??++++?#???+++?+
I would really appreciate any help.
Hi,
Here are some examples that should work but produce errors
operations_definition_file_1 (with dot)
INPUT:
sample0 = sample0.31mers
sample1 = sample1.31mers
OUTPUT:
samples_union.31mers=sample0+sample1
kmc_tools complex
Error: wrong line format, line: 5
operations_definition_file_2 (with slash)
INPUT:
sample0 = sample0.31mers
sample1 = sample1.31mers
OUTPUT:
samples_union/samples_union=sample0+sample1
kmc_tools complex
Error: wrong line format, line: 5
I was expecting it would write gzipped output when it took gzipped input, but that doesn't seem to be the case. Is there a switch to turn on gzipped output?
-- that was not a bug. sorry for the issue! --
Setting Max. counter value equal to my UINT_MAX, AKA -cs4294967295, shows:
********** Used parameters: **********
[...]
Max. counter value : 4294967295
[...]
However, the max counter remains at the default of 255:
$ cat my_reads.32mers | cut -f2 | sort -n | tail -n1
255
Setting this parameter works up to at least 1000000, which exceeds my USHRT_MAX, so it's not clear what the actual limit is.
I am using kmc to check for rare k-mers before genome assembly. I was wondering if there's a way to mask rare k-mers (replace with N) instead of filtering them out or trimming them. Filtering leads to losing more data than needed. Trimming leads to reads of unequal lengths, which makes it difficult to detect positional biases in the reads, if any, after removing rare k-mers.
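One reading of this request, sketched in Python under stated assumptions: every base covered by a rare k-mer is replaced with N, so read lengths stay the same. The function name mask_rare_kmers is hypothetical, the counts are assumed to be available as a dictionary keyed by k-mers exactly as they appear in the read, and this is not a description of any kmc_tools option.

```python
def mask_rare_kmers(read, k, counts, min_count):
    """Replace each base covered by a k-mer with count < min_count by 'N'."""
    masked = list(read)
    for i in range(len(read) - k + 1):
        # Look the k-mer up in the original read, not in the masked copy.
        if counts.get(read[i:i + k], 0) < min_count:
            masked[i:i + k] = "N" * k
    return "".join(masked)


counts = {"ACG": 5, "CGT": 5, "GTA": 1, "TAC": 5}
print(mask_rare_kmers("ACGTAC", 3, counts, min_count=2))
```

A stricter variant could mask only bases that are covered by no frequent k-mer at all; which rule is preferable depends on the downstream analysis.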
There appear to be issues with OSX 10.11.5. As an example, the precompiled version 2.3 on OSX 10.11.5 fails on this input file:
https://www.dropbox.com/s/dipykcat21aepi8/reads.fa
using the command:
kmc -fa -k60 -sf1 -ci10 -cs100000000 -cx100000000 reads.fa foo .
gives the error "Error: Cannot open temporary file ... ".
But it works fine on OSX 10.8.5. (See also sanger-pathogens/iva/issues/63; it looks like the Homebrew version and compiling from source also fail.)
Hi
We are analysing bioinformatics tools for k-mer counting, and KMC is one of them. We have gone through the readme file and it is very helpful. Because we want to be sure about the details, it would be great if you could help us validate the following details of KMC.
Data structure and Sorting Algo:
Array, Priority queue, Radix sort, Counting sort
Approach:
Two disk based, Modified minimum sub-string partitioning (signature).
The limit of k-size : less than 257
Supports online k-mer frequency retrieval : No
Supports compressed file processing : Yes
Thanks
Tarang
Hello, I ran into this problem with make today:
g++ -Wall -O3 -m64 -static -Wl,--whole-archive -lpthread -Wl,--no-whole-archive -std=c++11 -c kmer_counter/kmer_counter.cpp -o kmer_counter/kmer_counter.o
In file included from /usr/local/include/assert.h:5:0,
from /usr/include/c++/5/cassert:43,
from kmer_counter/radix.h:13,
from kmer_counter/kb_collector.h:18,
from kmer_counter/kmc.h:26,
from kmer_counter/kmer_counter.cpp:18:
/usr/local/include/except.h:15:32: error: conflicting declaration ‘typedef struct Except_Frame_T* Except_Frame_T’
typedef struct Except_Frame_T *Except_Frame_T;
^
/usr/local/include/except.h:15:16: note: previous declaration as ‘struct Except_Frame_T’
typedef struct Except_Frame_T *Except_Frame_T;
^
/usr/local/include/except.h:17:18: error: field ‘prev’ has incomplete type ‘Except_Frame_T’
Except_Frame_T prev;
^
/usr/local/include/except.h:16:8: note: definition of ‘struct Except_Frame_T’ is not complete until the closing brace
struct Except_Frame_T {
^
makefile:79: recipe for target 'kmer_counter/kmer_counter.o' failed
make: *** [kmer_counter/kmer_counter.o] Error 1
kmc --help
kmc -k27 -m24 NA19238.fastq NA.res \data\kmc_tmp_dir\
kmc -k27 -m24 @files.lst NA.res \data\kmc_tmp_dir\
Should the \ be / ?
@marekkokot The filter command does not seem to correctly handle FASTA files where the sequence is split across multiple lines. Also, if the file is too large, it fails with a "Wrong file" message.
Any suggestions?
g++-5 -Wall -O3 -m64 -static -Wl,--whole-archive -lpthread -Wl,--no-whole-archive -std=c++14 -c kmc_tools/percent_progress.cpp -o kmc_tools/percent_progress.o
In file included from kmc_dump/nc_utils.cpp:15:0:
kmc_dump/nc_utils.h: In static member function 'static int CNumericConversions::Double2PChar(double, int, uchar*)':
kmc_dump/nc_utils.h:124:22: error: 'modf' is not a member of 'std'
double fractPart = std::modf(val, &ipart);
^
kmc_dump/nc_utils.h:124:22: note: suggested alternative:
In file included from /home/linuxbrew/.linuxbrew/include/features.h:368:0,
from /home/linuxbrew/.linuxbrew/Cellar/gcc/5.5.0_1/include/c++/5.5.0/x86_64-unknown-linux-gnu/bits/os_defines.h:39,
from /home/linuxbrew/.linuxbrew/Cellar/gcc/5.5.0_1/include/c++/5.5.0/x86_64-unknown-linux-gnu/bits/c++config.h:489,
from /home/linuxbrew/.linuxbrew/Cellar/gcc/5.5.0_1/include/c++/5.5.0/string:38,
from kmc_dump/nc_utils.h:14,
from kmc_dump/nc_utils.cpp:15:
/home/linuxbrew/.linuxbrew/include/bits/mathcalls.h:115:1: note: 'modf'
__MATHCALL (modf,, (_Mdouble_ __x, _Mdouble_ *__iptr)) __nonnull ((2));
^
make: *** [kmc_dump/nc_utils.o] Error 1
make: *** Waiting for unfinished jobs....
In file included from kmc_dump/kmc_dump.cpp:17:0:
kmc_dump/nc_utils.h: In static member function 'static int CNumericConversions::Double2PChar(double, int, uchar*)':
kmc_dump/nc_utils.h:124:22: error: 'modf' is not a member of 'std'
double fractPart = std::modf(val, &ipart);
^
kmc_dump/nc_utils.h:124:22: note: suggested alternative:
kmc_api/kmc_file.cpp: In member function 'bool CKMCFile::BinarySearch(int64, int64, const CKmerAPI&, uint64&, uint32)':
kmc_api/kmc_file.cpp:1360:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
if (index_start >= total_kmers)
^
In file included from /home/linuxbrew/.linuxbrew/include/features.h:368:0,
from /home/linuxbrew/.linuxbrew/Cellar/gcc/5.5.0_1/include/c++/5.5.0/x86_64-unknown-linux-gnu/bits/os_defines.h:39,
from /home/linuxbrew/.linuxbrew/Cellar/gcc/5.5.0_1/include/c++/5.5.0/x86_64-unknown-linux-gnu/bits/c++config.h:489,
from /home/linuxbrew/.linuxbrew/Cellar/gcc/5.5.0_1/include/c++/5.5.0/iostream:38,
from kmc_dump/kmc_dump.cpp:15:
/home/linuxbrew/.linuxbrew/include/bits/mathcalls.h:115:1: note: 'modf'
__MATHCALL (modf,, (_Mdouble_ __x, _Mdouble_ *__iptr)) __nonnull ((2));
^
make: *** [kmc_dump/kmc_dump.o] Error 1
K-Mer Counter (KMC) ver. 3.0.0 (2017-01-28)
https://github.com/marekkokot/KMC/archive/v3.0.1.tar.gz
I think you have fixed it in github (KMC_VER) but haven't made a new release.
Which will be 3.0.2 then :-)
I have use-cases where I have a very large FASTQ file and wish to run kmc on it, but I don't want it to read the whole file, as I only need the results for some estimations.
Would you be able to add an option that stops processing after -nr <value> reads (or -nk <value> k-mers)?
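Until such an option exists, one workaround is to cut the input down before counting. A minimal Python sketch follows, assuming plain four-line FASTQ records (no wrapped sequence lines); head_fastq is a hypothetical helper and the proposed -nr/-nk switches are not existing KMC options.

```python
import io
from itertools import islice


def head_fastq(src, dst, n_reads):
    """Copy the first n_reads four-line FASTQ records from src to dst.

    Returns the number of complete records copied.
    """
    copied_lines = 0
    for line in islice(src, 4 * n_reads):
        dst.write(line)
        copied_lines += 1
    return copied_lines // 4


src = io.StringIO("@r1\nAC\n+\nII\n@r2\nGT\n+\nII\n@r3\nAA\n+\nII\n")
dst = io.StringIO()
n_copied = head_fastq(src, dst, 2)
print(n_copied)
```

With real files, src and dst would be open file handles (or gzip.open handles for compressed input), and the smaller file is then passed to kmc.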
When a kmer in a read includes IUPAC ambiguity codes, add all possible concretizations of that kmer.
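The expansion being asked for can be sketched as follows in Python, using the standard IUPAC nucleotide codes. The function name concretize is made up for illustration; this is not part of KMC. Note that the number of expansions grows quickly (a k-mer with three N bases already yields 64), so a real implementation would probably need a cap.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def concretize(kmer):
    """Return every unambiguous k-mer matching an IUPAC-coded k-mer."""
    choices = (IUPAC[base] for base in kmer.upper())
    return ["".join(bases) for bases in product(*choices)]


print(concretize("ACR"))
print(len(concretize("ARN")))
```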
Hi, I was wondering if we have a point release incorporating the recent updates. The version on our uni's compute cluster won't be updated otherwise (policy) and several of us here are dependent on the new masking feature in kmc_tools.
Dear marekkokot,
I am new to Linux, so this problem may look silly. I downloaded the KMC3 source and ran make. In the bin folder, I can see the three files: kmc, kmc_dump and kmc_tools. But when I run the command line /home/niu/KMC-3.0.1/bin/kmc -k20 reads.fq kmers1 tmp, I get the error: Error: Cannot open temporary file tmp/kmc_00000.bin. I tried several times and got the same error. I also checked my tmp; it still has 50G of space. Could you help me figure out this problem? Thank you very much.
Best,
Tim
[DISCLAIMER: I am not a lawyer and the following are only my interpretations of the licensing terms -- hence no legal advice but only well-intended suggestions/remarks.]
The README.md states that
KMC software distributed under GNU GPL 2 licence.
yet it uses asmlib (optionally) and vectorclass, which are both GPL-3.0+ licensed.
Sadly GPL-2.0 and GPL-3.0 are not compatible, see
https://www.gnu.org/licenses/gpl-faq.html#AllCompatibility.
Hence, to use vectorclass, KMC would have to be made available via GPLv3, i.e., licensed under one of GPL-2.0+, GPL-3.0, or GPL-3.0+.
Furthermore I also find the following from KMC's readme problematic/misleading:
Note: asmlib is free only for non commercial purposes. If needed, you can contact the author of asmlib or compile KMC without asmlib.
Note: for commercial usage of asmlib follow the instructions in 'License conditions' (http://www.agner.org/optimize/asmlib-instructions.pdf) or compile KMC without asmlib. In case of doubt, please consult the original documentations.
vcl is under the licence GNU GPL 3 or higher Node: for commercial usage of vcl follow the instructions in 'License' section (http://www.agner.org/optimize/vectorclass.pdf)
But as asmlib/vectorclass can be used in terms of the GPL-3.0, no restrictions concerning commercial/non-commercial usage should be applicable, see
https://www.gnu.org/licenses/gpl-faq.html#NoMilitary
and
https://www.gnu.org/licenses/gpl.html#section7.
(IMO, the use of the sole term "Commercial licenses" from the asmlib and vectorclass license texts is also misleading, as they (to me) kind of suggest the interpretation GPL=free=non-commercial which is wrong. "Alternative custom/proprietary/??? license" might have been a better choice...)
Just for reference/context the license information for asmlib and vectorclass:
From http://www.agner.org/optimize/asmlib-instructions.pdf:
10 License conditions
These software libraries are free: you can redistribute the software and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the license, or any later version.
Commercial licenses are available on request to www.agner.org/contact.
This software is distributed in the hope that it will be useful, but without any warranty. See
the file license.txt or www.gnu.org/licenses for the license text.
From http://www.agner.org/optimize/vectorclass.pdf:
License
The VCL vector class library has a dual license system. You can use it for free in
open source software, or pay for using it in proprietary software.
You are free to copy, use, redistribute and modify this software under the terms of
the GNU General Public License as published by the Free Software Foundation,
version 3 or any later version. See the file license.txt.
Commercial licenses are available on request.
Hi,
I was using KMC (3.0.0) for producing kmers of the unitig files (generated by BCALM). The union of the databases produced a smaller resulting data-base.
Is it an anomaly?
Datasets:
The unitigs for paired-end (_1 and _2) files were generated using BCALM.
Command to produce individual kmer data-bases for the two files:
./KMC/bin/kmc -k63 -r -ci1 -fa SRR1291024.unitigs.fa SRR1291024.kmers .
./KMC/bin/kmc -k63 -r -ci1 -fa SRR1291024.unitigs.fa SRR1291070.kmers .
Command to produce union:
./KMC/bin/kmc_tools simple SRR1291024.kmers -ci1 SRR1291070.kmers -ci1 union kmers_superset -ci1
Size of the resultant individual data-bases:
SRR1291024.kmers.kmc_pre (66M)
SRR1291024.kmers.kmc_suf (40G)
SRR1291070.kmers.kmc_pre (66M)
SRR1291070.kmers.kmc_suf (40G)
Size of the resultant union data-base:
kmers_superset.kmc_pre (33M)
kmers_superset.kmc_suf (39G)
Are the results correct?
@marekkokot Is there a way to set all kmer counts in a database to a specific value? If not, would that be hard to add? Use case: I have a set of samples, and a kmc kmer database for each sample. I want to make a database that, for each kmer, records which samples have it (basically a colored de Bruijn graph). If there are <=64 samples, I can assign to kmers from sample i the count 2^i; then the sum of counts gives the set of samples. If 64<n_samples<=128, I can represent this with two kmc kmer databases per sample, etc.
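The encoding described above is an ordinary bitmask. A minimal Python sketch of the arithmetic follows; the function names are hypothetical, and it assumes each sample contributes a given kmer exactly once with count 2^i. Whether KMC's counter field can hold values this large is a separate question that this sketch does not settle.

```python
def sample_count(i):
    """Count assigned to every kmer of sample i (bit i set)."""
    return 1 << i


def samples_from_sum(total):
    """Recover the set of sample indices from a summed count."""
    return {i for i in range(total.bit_length()) if (total >> i) & 1}


# A kmer present in samples 0, 3 and 63 gets the sum of their counts.
total = sample_count(0) + sample_count(3) + sample_count(63)
print(sorted(samples_from_sum(total)))
```

The sum equals the bitwise OR only because no sample index is added twice; summing the same database in twice would corrupt the mask.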
Hi, I'm not sure I understand how you split a super k-mer into (k,x)-mers (I guess x=1 is too simple and not explanatory enough) and why the subsets are non-overlapping.
Could you kindly provide a practical example with the real x value you use in the program? (Is it 3?)
Best regards
Could KMC progress/warnings/errors be sent to STDERR instead of STDOUT? This would mean that KMC follows Linux conventions, and it would make pipelining KMC commands easier and more intuitive.
kmc_tools sends progress to STDOUT:
$ kmc_tools transform my_kmer_db -ci4 dump /dev/stdout | head
in1: 0% AAAAAAAAAAATGATGGGCATTTTAGAAGGGCATTTCAGGTTCATTGAAAAATTATTTTAGTAACCCTAGT 10
AAAAAAAAAAATGATGGGCATTTTAGAAGGGCATTTCAGGTTCATTGAAAATTATTTTAGTAAACCCTAGT 12
AAAAAAAAAATGATGGGCATTTTAGAAGGGCATTTCAGGTTCATTGAAAAATTATTTTAGTAACCCTAGTT 10
AAAAAAAAAATGATGGGCATTTTAGAAGGGCATTTCAGGTTCATTGAAAATTATTTTAGTAAACCCTAGTT 11
AAAAAAAAACCCTAGTCATTTTATCCTAACCTAACGCAGTCGTTAGCTTCGATCCAAAATCCCCTATTGTT 15
AAAAAAAAACGTCCATGACCATTGGTCGTCTAACAGCCACACTGGTAGCTAGTCTTGTACTCCATGCAAAT 16
AAAAAAAAACTAGGAAAAAAATAGACCACAAACAGAGTGGACATCAACTTAGATGTGACATAACTATGTCA 11
AAAAAAAAACTAGGACAAAAAAATAGACCACAAACAGAGTGGACATCAACTTAGATGTGACATAACTATGT 11
AAAAAAAAACTAGGGTTTCGTAGTAGCAATCTTCGCACTCCGGAAATTCTACCGAGGCAAACAATAACTAT 12
AAAAAAAAAGAAAAGAAAAGGTTAGCTACAGACGTGTGATGAATCAAGTGCTTGAGCTAGTTAGCTTTGTT 12
This means that to put kmc_tools into a pipeline, you need to either dump to STDERR and discard STDOUT:
$ kmc_tools transform my_kmer_db -ci4 dump /dev/stderr 2>&1 > /dev/null | head
or tell kmc_tools to not report progress:
$ kmc_tools -hp transform my_kmer_db -ci4 dump /dev/stdout | head
In Linux it is convention to send progress/warnings/errors to STDERR and have results etc. sent to STDOUT. This is so that the expected output of a command can be easily piped into another command (assuming no seeking is required). This is very powerful and can be used to avoid disk IO.
Therefore a change to sending progress/errors/warnings to STDERR would allow a simpler approach to pipelining KMC commands:
kmc_tools transform my_kmer_db -ci4 dump /dev/stdout 2>progress.log | head
Issue #23 was where this was originally raised.
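As a generic illustration of the convention being requested (not KMC code), here is a minimal Python sketch in which results go to one stream and progress to another. The function dump_kmers and its parameters are hypothetical, and the in-memory streams only stand in for STDOUT and STDERR.

```python
import io
import sys


def dump_kmers(records, out=sys.stdout, log=sys.stderr):
    """Write 'kmer<TAB>count' lines to out and progress messages to log."""
    total = len(records)
    for n, (kmer, count) in enumerate(records, 1):
        print(f"{kmer}\t{count}", file=out)
        print(f"progress: {n}/{total}", file=log)


# Capture both streams to show that they stay separate.
out, log = io.StringIO(), io.StringIO()
dump_kmers([("AAA", 10), ("AAC", 12)], out=out, log=log)
print(out.getvalue(), end="")
```

With the defaults, a consumer such as head only ever sees the result lines, while the progress lines can be redirected with 2>progress.log.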
Please could you implement a check for the existence (or creation) of the directories specified on the command line for <output_file_name> and <working_directory>.
As a user it is frustrating for KMC to spend many minutes or hours doing computation only for it to fail because the directory I specified for the working directory did not exist. Similarly for the parent directory I specify for the output file name.
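Until KMC checks this itself, a wrapper script can do the check up front. A minimal sketch in Python, assuming the output name is a path prefix whose parent directory must exist; check_kmc_paths is a hypothetical helper, not part of KMC.

```python
import os


def check_kmc_paths(output_prefix, working_dir, create=False):
    """Return a list of problems with the output parent dir and the working dir.

    With create=True, missing directories are created instead of reported.
    """
    problems = []
    out_dir = os.path.dirname(os.path.abspath(output_prefix))
    for label, path in (("output directory", out_dir),
                        ("working directory", working_dir)):
        if not os.path.isdir(path):
            if create:
                os.makedirs(path, exist_ok=True)
            else:
                problems.append(f"{label} does not exist: {path}")
        elif not os.access(path, os.W_OK):
            problems.append(f"{label} is not writable: {path}")
    return problems
```

A wrapper would call check_kmc_paths("results/NA.res", "kmc_tmp") and refuse to start kmc if the returned list is not empty.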
The help for kmc_dump states that -ci excludes k-mers occurring less than the specified number of times and -cx excludes k-mers occurring more than the specified number of times. So to get the k-mers which occur exactly 10 times I should be able to specify -ci10 -cx10. However, this returns nothing.
If I specify -ci10 -cx11, as expected I get a list of k-mers occurring 10 or 11 times.
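Read literally, the help text describes a range that is inclusive at both ends. A minimal Python sketch of that reading follows; it illustrates the expected semantics only and says nothing about how kmc_dump implements the check. The k-mers and counts are made up.

```python
def filter_by_count(records, ci, cx):
    """Keep (kmer, count) pairs with ci <= count <= cx, both bounds inclusive."""
    return [(kmer, count) for kmer, count in records if ci <= count <= cx]


records = [("AAAC", 9), ("AAAG", 10), ("AAAT", 11)]
exactly_ten = filter_by_count(records, 10, 10)
ten_or_eleven = filter_by_count(records, 10, 11)
print(exactly_ten)
print(ten_or_eleven)
```

Under this reading -ci10 -cx10 should return the k-mers seen exactly 10 times; an empty result would be consistent with one of the two bounds being treated as exclusive.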
@marekkokot Is there a test suite you use to verify correctness of kmc and kmc_tools? If there is, could it be checked into github?
Could you please tag the 2.2 release in GitHub? 2.1.1 is the latest release tagged here, but the KMC web site says that the current release is 2.2. Thanks.
Hi there.
I am currently trying to use KMC to count 36-mers in a bunch of files I have downloaded from the SRA, and for a lot of them KMC just returns the following error:
********Error: Wrong input file!
An example is the file SRR1047856 from the SRA. On one computer with Ubuntu 15.04, I have downloaded its corresponding SRA file and extracted the FASTA file out of it. Then, I ran the command:
/home/gholley/KMC/bin/kmc -k36 -ci3 -fa SRR1047856.fasta SRR1047856_comp .
and obtained in return:
********Error: Wrong input file!
I tried different parameters for k and ci. I tried limiting the number of threads and the RAM-only mode as well as extracting the FASTQ file from the SRA file instead of the FASTA file. Same error.
I thought that my SRA file might have been corrupted during the download, so I re-downloaded the FASTA file directly from the SRA on a different computer (with Ubuntu 16.10). My local KMC branch was up to date with this git repository. I tried the same command and I obtained the same error.
Any help with this would be welcome :)
Thank you!
Best, Guillaume.
OK, so I have a little bit of an issue with KMC 3.0.1 on a Linux system. I have multiple fasta files (let's call them F1.fasta, F2.fasta, ..., Fn.fasta) which contain multiple genes each and I ran
kmc -k15 -fm -ci1 -cs1677215 F1.fasta F1.fasta temp/
kmc_dump F1.fasta F1.fasta.15.kmrs
This counts the 15mers within each fasta file. I then ran
cat F*.fasta > all.fasta
kmc -k15 -fm -ci1 -cs1677215 all.fasta all.fasta temp/
kmc_dump all.fasta all.fasta.15.kmrs
This concatenates all the fasta files together and counts the 15mers in there. Now, there is a set of 15mers that are found in the individual fasta files (let's call one of these k-mers X) that are not found in the all.fasta file. This is baffling me, as it shouldn't be possible for that to happen. How can a k-mer be found in an individual fasta file, but not when we concatenate the fasta files together?
I have a total of about 5500 fasta files and X appears in them <1 time (typically).
To dig even further, I ran KMC 2.3.0 on the same all.fasta file and got different results. Those results were more in line with those of the individual KMC 3.0.1 runs (X was found in the KMC 2.3.0 run). Additionally, I should note that both KMC 2.3.0 and KMC 3.0.1 find the same number of unique 15mers; however, the 15mers that are flip-flopped (a total of 5 15mers) do not have the same counts. This makes me think there may be an issue with the way a k-mer is encoded inside the database and then decoded in the dump. I.e., if I decoded the DB to produce 15mer X, it wasn't X that was encoded there to begin with; rather, it was some other 15mer Y (or, stated differently, encode(Y) = E, decode(E) = X). In any case, something changed between 2.3.0 and 3.0.1 (possibly 3.0.0) to produce this result.
I have the all.fasta file that was used to produce the above results. It's 550MB in size (165MB compressed). Github won't let me attach it here, so if you need it, please do ask (maybe I could email it to you?).
One final note: I did test this on the executable that you offer on your website (http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&project=kmc&subpage=download), which is stated as 3.0. I got the odd results, which made me go and compile 3.0.1 from scratch (I didn't see an available executable on your GitHub), and I got the same results. Because of the bug that has the 3.0.1 kmc executable still printing 3.0.0 as its version, I'm not sure which version you have on your website. But if it is 3.0.0, I did test that as well. If not, then I did not test 3.0.0. That said, please fix that little bug with the 3.0.0 version string in your next release (I'm sure you have already, as you closed the request for this); from a user perspective, it's really annoying not knowing what version we're actually on.
Any insight on this issue would be greatly appreciated!
I am looking to use KMC to filter rare k-mers pre-assembly and was wondering if there's a way to tell it to ignore Ns in the reads (which could be uncalled bases or masked low-quality bases). Maybe KMC automatically does that?
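For reference, the usual convention in k-mer counting is to skip every k-mer that contains an N, which amounts to splitting the read at N and counting each piece separately. A minimal Python sketch of that convention follows; the function name is made up, and whether KMC behaves exactly this way should be confirmed against its documentation.

```python
def kmers_without_n(read, k):
    """Yield the k-mers of a read that contain no N."""
    for segment in read.upper().split("N"):
        # Segments shorter than k produce an empty range and are skipped.
        for i in range(len(segment) - k + 1):
            yield segment[i:i + k]


print(list(kmers_without_n("ACGNNTTGA", 3)))
```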
Bug reported by mail:
Is there a limit to the length of sequences in a fasta file for the 'kmc' command?
I run this command
kmc -k25 -ci1 -fa input/test.fasta output/test.res work
When the fasta file contains one sequence of 50,000 'A's, the program completes.
When the fasta file contains one sequence of 60,000 'A's, the program halts with the message 'Error: Wrong input file!'. So, I conclude that sequence lengths have a limit between 50,000 and 60,000 characters. Is that correct?
The limit is not strict (it depends on a couple of factors, yet it is enough for short reads).
The workaround in the reported case is to use -fm (multifasta format), but in general long reads should also be supported in fasta and fastq format.
The example in the manual gives "terminate called after throwing an instance of std::regex_error"
Same error for other examples I've tried. What would be the simplest example of the operations file that should work? @marekkokot
Running KMC produces two output files: XXX.kmc_pre and XXX.kmc_suf.
Could you please add an option to not create/write or keep these files?
For example, often we only want the Stats: summary.
According to help:
-ci<value> - print k-mers occurring less than <value> times
-cx<value> - print k-mers occurring more of than <value> times
However, it seems like -ci<value> prints k-mers occurring greater than <value> times. Same for -cx.
0635dae seems to be matching the release 2.3 which is announced on the homepage but is not available in the release section of the repository.
What is the best way to obtain a count of specified collection of kmers. Currently, I do a 'dump' and then extract the ones I want. Is there a better way?
Thanks!
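If the dump route is kept, the extraction step can at least be streamed so the full dump never has to be stored. A minimal Python sketch follows, assuming dump lines of the form "kmer count" separated by whitespace; lookup_counts is a hypothetical helper, and k-mers absent from the dump are reported with count 0.

```python
def lookup_counts(dump_lines, wanted):
    """Return {kmer: count} for the wanted k-mers, reading the dump once."""
    wanted = set(wanted)
    found = dict.fromkeys(wanted, 0)
    for line in dump_lines:
        parts = line.split()
        if len(parts) == 2 and parts[0] in wanted:
            found[parts[0]] = int(parts[1])
    return found


dump = ["AAAC\t9\n", "AAAG\t10\n", "AAAT\t11\n"]
print(lookup_counts(dump, ["AAAG", "CCCC"]))
```

In practice dump_lines could be sys.stdin, with the dump piped straight into the script. If the database stores canonical k-mers, the wanted k-mers need to be converted to canonical form first.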
Hi,
I tried to install KMC, but some problems occurred when I used the command "make DISABLE_ASMLIB=true" and I don't know how to solve them. Could you give me some advice?
Best,
############################################################################
make DISABLE_ASMLIB=true
g++ -Wall -O3 -m64 -static -Wl,--whole-archive -lpthread -Wl,--no-whole-archive -std=c++11 -DDISABLE_ASMLIB -mavx2 -mfma -fabi-version=0 -c kmer_counter/raduls_avx2.cpp -o kmer_counter/raduls_avx2.o
/tmp/cckJQjTJ.s: Assembler messages:
/tmp/cckJQjTJ.s:27711: Error: no such instruction:vinserti128 $0x1,%xmm0,%ymm1,%ymm0' /tmp/cckJQjTJ.s:27714: Error: no such instruction:
vextracti128 $0x1,%ymm0,16(%rcx)'
/tmp/cckJQjTJ.s:27716: Error: no such instruction:vinserti128 $0x1,%xmm0,%ymm1,%ymm0' /tmp/cckJQjTJ.s:27718: Error: no such instruction:
vextracti128 $0x1,%ymm0,48(%rcx)'
/tmp/cckJQjTJ.s:27823: Error: no such instruction:vinserti128 $0x1,%xmm0,%ymm1,%ymm0' /tmp/cckJQjTJ.s:27829: Error: no such instruction:
vextracti128 $0x1,%ymm0,16(%rax)'
/tmp/cckJQjTJ.s:27831: Error: no such instruction:vinserti128 $0x1,%xmm0,%ymm1,%ymm0' /tmp/cckJQjTJ.s:27833: Error: no such instruction:
vextracti128 $0x1,%ymm0,48(%rax)'
/tmp/cckJQjTJ.s:36827: Error: no such instruction:vinserti128 $0x1,%xmm1,%ymm0,%ymm0' /tmp/cckJQjTJ.s:36829: Error: no such instruction:
vextracti128 $0x1,%ymm0,16(%rdx)'
/tmp/cckJQjTJ.s:36900: Error: no such instruction:vinserti128 $0x1,%xmm1,%ymm0,%ymm0' /tmp/cckJQjTJ.s:36902: Error: no such instruction:
vextracti128 $0x1,%ymm0,16(%rax)'
/tmp/cckJQjTJ.s:41669: Error: no such instruction:vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0' /tmp/cckJQjTJ.s:41670: Error: suffix or operands invalid for
vpaddq'
/tmp/cckJQjTJ.s:41672: Error: no such instruction:vextracti128 $0x1,%ymm0,16(%rdx,%rax)' /tmp/cckJQjTJ.s:42925: Error: no such instruction:
vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0'
/tmp/cckJQjTJ.s:42926: Error: suffix or operands invalid forvpaddd' /tmp/cckJQjTJ.s:42928: Error: no such instruction:
vextracti128 $0x1,%ymm0,16(%rdx,%rax)'
/tmp/cckJQjTJ.s:46028: Error: no such instruction: `vinserti128 $0x1,%xmm1,%ymm0,%ymm0'
/tmp/cckJQjTJ.s:46031: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rax)'
/tmp/cckJQjTJ.s:46033: Error: no such instruction: `vinserti128 $0x1,%xmm1,%ymm0,%ymm0'
/tmp/cckJQjTJ.s:46035: Error: no such instruction: `vextracti128 $0x1,%ymm0,48(%rax)'
/tmp/cckJQjTJ.s:46852: Error: no such instruction: `vinserti128 $0x1,%xmm2,%ymm3,%ymm2'
/tmp/cckJQjTJ.s:46853: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:46855: Error: no such instruction: `vextracti128 $0x1,%ymm2,16(%rax)'
/tmp/cckJQjTJ.s:46857: Error: no such instruction: `vextracti128 $0x1,%ymm0,48(%rax)'
/tmp/cckJQjTJ.s:46913: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:46919: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rax)'
/tmp/cckJQjTJ.s:46921: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:46923: Error: no such instruction: `vextracti128 $0x1,%ymm0,48(%rax)'
/tmp/cckJQjTJ.s:47314: Error: no such instruction: `vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0'
/tmp/cckJQjTJ.s:47315: Error: suffix or operands invalid for `vpaddq'
/tmp/cckJQjTJ.s:47317: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rdx,%rax)'
/tmp/cckJQjTJ.s:48587: Error: no such instruction: `vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0'
/tmp/cckJQjTJ.s:48588: Error: suffix or operands invalid for `vpaddd'
/tmp/cckJQjTJ.s:48590: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rdx,%rax)'
/tmp/cckJQjTJ.s:50550: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:52688: Error: no such instruction: `vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0'
/tmp/cckJQjTJ.s:52689: Error: suffix or operands invalid for `vpaddq'
/tmp/cckJQjTJ.s:52691: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rdx,%rax)'
/tmp/cckJQjTJ.s:53960: Error: no such instruction: `vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0'
/tmp/cckJQjTJ.s:53961: Error: suffix or operands invalid for `vpaddd'
/tmp/cckJQjTJ.s:53963: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rdx,%rax)'
/tmp/cckJQjTJ.s:57720: Error: no such instruction: `vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0'
/tmp/cckJQjTJ.s:57721: Error: suffix or operands invalid for `vpaddq'
/tmp/cckJQjTJ.s:57723: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rdx,%rax)'
/tmp/cckJQjTJ.s:58988: Error: no such instruction: `vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0'
/tmp/cckJQjTJ.s:58989: Error: suffix or operands invalid for `vpaddd'
/tmp/cckJQjTJ.s:58991: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rdx,%rax)'
/tmp/cckJQjTJ.s:62058: Error: no such instruction: `vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0'
/tmp/cckJQjTJ.s:62059: Error: suffix or operands invalid for `vpaddq'
/tmp/cckJQjTJ.s:62061: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rdx,%rax)'
/tmp/cckJQjTJ.s:63323: Error: no such instruction: `vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0'
/tmp/cckJQjTJ.s:63324: Error: suffix or operands invalid for `vpaddd'
/tmp/cckJQjTJ.s:63326: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rdx,%rax)'
/tmp/cckJQjTJ.s:65465: Error: no such instruction: `vinserti128 $0x1,%xmm1,%ymm0,%ymm0'
/tmp/cckJQjTJ.s:65467: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rax)'
/tmp/cckJQjTJ.s:66050: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:66052: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rax)'
/tmp/cckJQjTJ.s:66091: Error: no such instruction: `vinserti128 $0x1,%xmm1,%ymm0,%ymm0'
/tmp/cckJQjTJ.s:66093: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rax)'
/tmp/cckJQjTJ.s:66451: Error: no such instruction: `vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0'
/tmp/cckJQjTJ.s:66452: Error: suffix or operands invalid for `vpaddq'
/tmp/cckJQjTJ.s:66454: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rdx,%rax)'
/tmp/cckJQjTJ.s:67714: Error: no such instruction: `vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0'
/tmp/cckJQjTJ.s:67715: Error: suffix or operands invalid for `vpaddd'
/tmp/cckJQjTJ.s:67717: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rdx,%rax)'
/tmp/cckJQjTJ.s:70575: Error: no such instruction: `vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0'
/tmp/cckJQjTJ.s:70576: Error: suffix or operands invalid for `vpaddq'
/tmp/cckJQjTJ.s:70578: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rdx,%rax)'
/tmp/cckJQjTJ.s:71836: Error: no such instruction: `vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0'
/tmp/cckJQjTJ.s:71837: Error: suffix or operands invalid for `vpaddd'
/tmp/cckJQjTJ.s:71839: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rdx,%rax)'
/tmp/cckJQjTJ.s:73822: Error: no such instruction: `vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0'
/tmp/cckJQjTJ.s:73823: Error: suffix or operands invalid for `vpaddq'
/tmp/cckJQjTJ.s:73825: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rdx,%rax)'
/tmp/cckJQjTJ.s:75091: Error: no such instruction: `vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0'
/tmp/cckJQjTJ.s:75092: Error: suffix or operands invalid for `vpaddd'
/tmp/cckJQjTJ.s:75094: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rdx,%rax)'
/tmp/cckJQjTJ.s:81696: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:81698: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rax)'
/tmp/cckJQjTJ.s:81760: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:81762: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rax)'
/tmp/cckJQjTJ.s:81832: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:81834: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rdx)'
/tmp/cckJQjTJ.s:81909: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:81911: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rax)'
/tmp/cckJQjTJ.s:83185: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:83187: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rax)'
/tmp/cckJQjTJ.s:83249: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:83251: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rax)'
/tmp/cckJQjTJ.s:83323: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:83325: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rsi)'
/tmp/cckJQjTJ.s:83399: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:83401: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rax)'
/tmp/cckJQjTJ.s:93800: Error: no such instruction: `vinserti128 $0x1,%xmm2,%ymm3,%ymm2'
/tmp/cckJQjTJ.s:93801: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:93803: Error: no such instruction: `vextracti128 $0x1,%ymm2,16(%rax)'
/tmp/cckJQjTJ.s:93805: Error: no such instruction: `vextracti128 $0x1,%ymm0,48(%rax)'
/tmp/cckJQjTJ.s:93896: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:93902: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rax)'
/tmp/cckJQjTJ.s:93904: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:93906: Error: no such instruction: `vextracti128 $0x1,%ymm0,48(%rax)'
/tmp/cckJQjTJ.s:94006: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:94009: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rcx)'
/tmp/cckJQjTJ.s:94011: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:94013: Error: no such instruction: `vextracti128 $0x1,%ymm0,48(%rcx)'
/tmp/cckJQjTJ.s:94117: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:94123: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rax)'
/tmp/cckJQjTJ.s:94125: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:94127: Error: no such instruction: `vextracti128 $0x1,%ymm0,48(%rax)'
/tmp/cckJQjTJ.s:95555: Error: no such instruction: `vinserti128 $0x1,%xmm2,%ymm3,%ymm2'
/tmp/cckJQjTJ.s:95556: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:95558: Error: no such instruction: `vextracti128 $0x1,%ymm2,16(%rax)'
/tmp/cckJQjTJ.s:95560: Error: no such instruction: `vextracti128 $0x1,%ymm0,48(%rax)'
/tmp/cckJQjTJ.s:95650: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:95656: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rax)'
/tmp/cckJQjTJ.s:95658: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:95660: Error: no such instruction: `vextracti128 $0x1,%ymm0,48(%rax)'
/tmp/cckJQjTJ.s:95763: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:95766: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rcx)'
/tmp/cckJQjTJ.s:95768: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:95770: Error: no such instruction: `vextracti128 $0x1,%ymm0,48(%rcx)'
/tmp/cckJQjTJ.s:95872: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:95878: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rax)'
/tmp/cckJQjTJ.s:95880: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:95882: Error: no such instruction: `vextracti128 $0x1,%ymm0,48(%rax)'
make: *** [kmer_counter/raduls_avx2.o] Error 1
Stats:
No. of k-mers below min. threshold : 12041315
No. of k-mers above max. threshold : 0
No. of unique k-mers : 15114589
No. of unique counted k-mers : 3073274
Total no. of k-mers : 134782293
Total no. of reads : 1091283
Total no. of super-k-mers : 15598454
It would be great if there were a -j <stats.json> option to write the stdout table above in JSON format to a specified file.
This would make the statistics machine-readable for pipelines and similar tooling.
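
In the meantime, a pipeline could scrape the table from stdout itself. The following is only a minimal sketch in Python, not part of KMC: the names parse_kmc_stats and stats_to_json are made up for illustration, and it assumes the summary keeps the "label : integer" layout shown above.

```python
import json
import re

# Matches summary lines such as "No. of unique k-mers : 15114589".
# Lines without an integer after the colon (e.g. "Stats:") are skipped.
STAT_LINE = re.compile(r"^\s*(?P<label>[^:]+?)\s*:\s*(?P<value>\d+)\s*$")


def parse_kmc_stats(text):
    """Return {label: count} for every 'label : integer' line in text."""
    stats = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line)
        if match:
            stats[match.group("label")] = int(match.group("value"))
    return stats


def stats_to_json(text):
    """Serialize the parsed statistics as a JSON object string."""
    return json.dumps(parse_kmc_stats(text), indent=2)


# Sample input copied from the table in this issue.
SAMPLE = """Stats:
No. of k-mers below min. threshold : 12041315
No. of k-mers above max. threshold : 0
No. of unique k-mers : 15114589
No. of unique counted k-mers : 3073274
Total no. of k-mers : 134782293
Total no. of reads : 1091283
Total no. of super-k-mers : 15598454
"""
```

Calling stats_to_json on the captured stdout of a run and writing the result to a file would give roughly what the requested option would produce. The JSON keys are simply the original labels, so they would change if the wording of the table changes.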