
kmc's Issues

avx2 detection broken.

Hello, while trying to compile KMC-3.0.1 I get some AVX2-related errors,
the same kind of error as mentioned in #17.

A quick dig through the code shows that AVX2 is enabled on Xeon processors,
but some Xeons do not provide AVX2 support.

See, for example, my /proc/cpuinfo
(running in Docker on a Mac):

processor       : 5
vendor_id       : GenuineIntel
cpu family      : 6
model           : 62
model name      : Intel(R) Xeon(R) CPU E5-1620 v2 @ 3.70GHz
stepping        : 4
cpu MHz         : 3699.593
cache size      : 10240 KB
physical id     : 5
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 5
initial apicid  : 5
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht pbe syscall nx pdpe1gb lm constant_tsc rep_good nopl xtopology nonstop_tsc eagerfpu pni pclmulqdq dtes64 ds_cpl ssse3 cx16 xtpr pcid dca sse4_1 sse4_2 popcnt aes xsave avx f16c rdrand hypervisor lahf_lm fsgsbase erms xsaveopt arat
bugs            :
bogomips        : 7570.22
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:
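The flags line above lists avx but not avx2, so enabling AVX2 based on the processor family alone will misfire on this machine. Checking the feature itself is more robust. Below is a minimal C++ sketch, not taken from KMC's build scripts: `cpuinfo_has_flag` and `running_cpu_has_avx2` are hypothetical helpers. The first looks for `avx2` as a whole token in a /proc/cpuinfo flags line; the second asks the running CPU through the GCC/Clang builtin `__builtin_cpu_supports`.

```cpp
#include <sstream>
#include <string>

// True if `flag` appears as a whole, space-separated token in a
// /proc/cpuinfo "flags" line. A plain substring search would be wrong,
// because "avx" is a prefix of "avx2".
bool cpuinfo_has_flag(const std::string& flags_line, const std::string& flag) {
    std::istringstream tokens(flags_line);
    std::string token;
    while (tokens >> token) {
        if (token == flag) return true;
    }
    return false;
}

// Runtime check of the CPU the program is actually running on.
// __builtin_cpu_supports is a GCC/Clang builtin available on x86 targets.
bool running_cpu_has_avx2() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;  // unknown compiler or architecture: assume no AVX2
#endif
}
```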

build error: 'modf' is not a member of 'std'

I am building on linux with g++ 5.4.0 and as/binutils 2.2.8.
My build fails with:
In file included from kmc_dump/nc_utils.cpp:15:0:
kmc_dump/nc_utils.h: In static member function 'static int CNumericConversions::Double2PChar(double, int, uchar*)':
kmc_dump/nc_utils.h:124:22: error: 'modf' is not a member of 'std'
   double fractPart = std::modf(val, &ipart);
                      ^
kmc_dump/nc_utils.h:124:22: note: suggested alternative:
In file included from /usr/include/features.h:346:0,
                 from /global/common/genepool/usg/languages/gcc/5.4.0/include/c++/5.4.0/x86_64-unknown-linux-gnu/bits/os_defines.h:39,
                 from /global/common/genepool/usg/languages/gcc/5.4.0/include/c++/5.4.0/x86_64-unknown-linux-gnu/bits/c++config.h:482,
                 from /global/common/genepool/usg/languages/gcc/5.4.0/include/c++/5.4.0/string:38,
                 from kmc_dump/nc_utils.h:14,
                 from kmc_dump/nc_utils.cpp:15:
/usr/include/bits/mathcalls.h:116:1: note:   'modf'
 __MATHCALL (modf,, (_Mdouble_ __x, _Mdouble_ *__iptr));

Any idea what is happening? Thx...
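The include chain shows `modf` reaching nc_utils.h only indirectly, through <string> and the C library's bits/mathcalls.h. That header declares the global `::modf` but not `std::modf`. A common fix for this kind of error, assumed here rather than confirmed by the report, is to include <cmath> explicitly in kmc_dump/nc_utils.h. The sketch below shows the same call compiling once <cmath> is included; `fractional_part` is a hypothetical helper.

```cpp
#include <cmath>  // declares std::modf; <string> is not guaranteed to pull it in

// Split a double into integer and fractional parts, the same call that
// nc_utils.h makes at line 124. Both parts keep the sign of `val`.
double fractional_part(double val, double* int_part) {
    return std::modf(val, int_part);
}
```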

makefile_mac

Hi
I want to install KMC because it is one of the required dependencies of IVA, and I am struggling with the installation of KMC on my Mac.
#1. I set the new path to gcc in makefile_mac.
#2. Because of the following error message:

[Screenshot: error message, 2017-03-12 9:29:26 am]

I removed the -fopenmp option from makefile_mac.

#3. I reran make -f makefile_mac, but finally got this error, which I cannot fix (sorry about that...):
[Screenshot: error message, 2017-03-12 9:28:49 am]

Any advice would be much appreciated.

thanks ++
a

Count K-mers read by read

Hi, I need to count k-mers for each read in a metagenomic FASTQ or FASTA file.
Is it possible to get counts for each single read of the file instead of the whole file?
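KMC reports counts over the whole input. For comparison, per-read counting amounts to running a small counter once per FASTA/FASTQ record. Here is a minimal C++ sketch; `count_kmers_in_read` is a hypothetical helper, not part of KMC or its API, and it counts k-mers as written without merging reverse complements.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Count every k-mer occurring in a single read (as written, not canonical).
std::map<std::string, int> count_kmers_in_read(const std::string& read, std::size_t k) {
    std::map<std::string, int> counts;
    if (k == 0 || read.size() < k) return counts;
    for (std::size_t i = 0; i + k <= read.size(); ++i) {
        ++counts[read.substr(i, k)];  // slide a window of length k
    }
    return counts;
}
```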

Using kmc_file.h from v3.0.1

I'm trying to use the C++ API but when I include kmc_file.h and compile, I get the following compilation error.

Compile command

g++ read_db.cpp -o read_db

#include <iostream>
#include "../KMC-3.0.1/kmc_api/kmc_file.h"

int main(){
  CKMCFile kmer_database;
  return 0;
}

Error:

Compilation started at Thu Apr 13 13:59:49

g++ read_kmc.cpp -o read_kmc
In file included from read_kmc.cpp:2:
In file included from ./../KMC-3.0.1/kmc_api/kmc_file.h:14:
./../KMC-3.0.1/kmc_api/kmer_defs.h:36:11: fatal error: 'ext/algorithm' file not found
        #include <ext/algorithm>
                 ^
1 error generated.

Compilation exited abnormally with code 1 at Thu Apr 13 13:59:49

gcc version 6.3.0
OSX El Capitan 10.11.6
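The report mentions gcc 6.3.0, but the "1 error generated." summary line is the format Clang prints. On OS X, a plain `g++` is often Apple's Clang, which uses libc++, and libc++ has no <ext/algorithm> because that header is a libstdc++ extension. This is an inference from the log, not something the report confirms. One way to check which compiler `g++` actually runs (besides `g++ --version`) is to compile a hypothetical probe like this:

```cpp
#include <string>

// Reports which compiler built this translation unit. Clang also defines
// __GNUC__ for compatibility, so __clang__ must be tested first.
std::string compiler_family() {
#if defined(__clang__)
    return "clang";
#elif defined(__GNUC__)
    return "gcc";
#else
    return "other";
#endif
}
```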

kmer counts incorrect?

I've been testing out KMC with the following dummy sequence

>dummy
AATGGGTCCCTGTTTCGCGATAAAATGCCAATCGCTCTAAATATCGCGCTAGC

with the command kmc -ci0 -fm -k3 -cs300 dummy_genome.fa dummy_kmc kmc_temp

The result is

Stage 1: 100%
1st stage: 0.002018s
2nd stage: 0.001863s
Total    : 0.003881s
Tmp size : 0MB

Stats:
   No. of k-mers below min. threshold :            0
   No. of k-mers above max. threshold :            0
   No. of unique k-mers               :           25
   No. of unique counted k-mers       :           25
   Total no. of k-mers                :           51
   Total no. of sequences             :            1
   Total no. of super-k-mers          :            0

while the sequence actually has 33 unique 3-mers.

I'm compiling with gcc 6.3.0. Maybe I'm doing something wrong...
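By default, KMC counts canonical k-mers: a k-mer and its reverse complement are treated as the same k-mer. The minimal C++ sketch below counts distinct 3-mers both ways; `reverse_complement` and `distinct_kmers` are hypothetical helpers. For the dummy sequence the canonical count is 25, which matches KMC's "No. of unique k-mers", while the count of k-mers as written is higher.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <string>

// Reverse complement of a DNA string over {A,C,G,T}.
std::string reverse_complement(const std::string& s) {
    std::string rc(s.rbegin(), s.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
        }
    }
    return rc;
}

// Number of distinct k-mers in `seq`. If `canonical` is true, a k-mer and its
// reverse complement count once (the lexicographically smaller one is kept).
std::size_t distinct_kmers(const std::string& seq, std::size_t k, bool canonical) {
    std::set<std::string> seen;
    for (std::size_t i = 0; i + k <= seq.size(); ++i) {
        std::string kmer = seq.substr(i, k);
        if (canonical) kmer = std::min(kmer, reverse_complement(kmer));
        seen.insert(kmer);
    }
    return seen.size();
}
```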

python binding

This is not an issue, but are you interested in developing a Python binding?
Thank you

KMC 3 stops during stage 2 when using BFC-corrected reads

Hello,

I'm trying to use KMC v3 with reads previously corrected with BFC. However, KMC stops during stage 2, there is no warning or error message, and the stats table shows only 0s. I ran Jellyfish v2 with the same corrected reads without a problem. Below are the commands that I'm using.

Correct reads
bash -c "bfc -s 200m -k33 -t 16 <(seqtk mergepe reads_1.fastq.gz reads_2.fastq.gz) <(seqtk mergepe reads_1.fastq.gz reads_2.fastq.gz) | gzip -1 > bfc-corrected.fastq.gz"

Count k-mers
kmc -k21 -ci2 -m100 -t12 -v bfc-corrected.fastq.gz bfc-corrected_kmc3 ./tmp

This is an example of a read pair after BFC correction:
@E00476:214:HHLTNALXX:8:1101:21217:1186 ec:Z:0_0:104_0_3:0_0
aTAACATATAATGTTTTTAAATAAATTTTAATTTAATTGGAATACTTATTTATTCAATAAAATTATTAACAATAATTTACCTCTATTTTGGTTTCAATTAAATAAATTTATAgAGAAATAaTAAATAAATAAAGCTTCTAACTTTATAATA
+
&???????????????????????????????????????+??????+??????++??+++???+???????????????+???+?????+????++??+++?+++??????%++++???%++?????+?+???+????+?++????++??
@E00476:214:HHLTNALXX:8:1101:21217:1186 ec:Z:0_0:103_0_3:0_0
aTATATTTTTGTTTATTATTTTAAGTATAGGTTAATTGAAGAATTATTTAATTTATTAAAATTAGATTATTTTGTTTATTATAAAATATTTTATTTTTTTTTTATAATTATAATTTTTTATTATTTTTTATTTgATTAAAATaTATGAATA
+
&?????????????????????????????????????????????????????++????????????????????+??????????+?????????????????++?++++++?????++????????++??%+??++++?#???+++?+

I would really appreciate any help.

kmc_tools complex does not accept non-alphanumeric characters in out_db_path

Hi,
Here are some examples that should work but produce errors:

operations_definition_file_1 (with dot)

INPUT:
sample0 = sample0.31mers 
sample1 = sample1.31mers 
OUTPUT:
samples_union.31mers=sample0+sample1
kmc_tools complex
Error: wrong line format, line: 5

operations_definition_file_2 (with slash)

INPUT:
sample0 = sample0.31mers 
sample1 = sample1.31mers 
OUTPUT:
samples_union/samples_union=sample0+sample1
kmc_tools complex
Error: wrong line format, line: 5
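In both examples, the only difference from a working line is the '.' or '/' in the output path. That suggests the line parser restricts which characters it accepts before the '='. To show the behaviour the report expects, here is a minimal C++ sketch that splits at the first '=' and trims surrounding spaces. `parse_output_line` is hypothetical and is not kmc_tools' own parser.

```cpp
#include <cstddef>
#include <string>

// Strip leading/trailing spaces, tabs and carriage returns.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r";
    const std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Parse "<out_db_path>=<expression>". Splitting at the first '=' lets the
// path contain '.', '/' or any other character except '='.
bool parse_output_line(const std::string& line, std::string& path, std::string& expr) {
    const std::size_t eq = line.find('=');
    if (eq == std::string::npos) return false;
    path = trim(line.substr(0, eq));
    expr = trim(line.substr(eq + 1));
    return !path.empty() && !expr.empty();
}
```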

Max. counter value parameters not always respected

Setting the max. counter value to my UINT_MAX, i.e. -cs4294967295, shows:

********** Used parameters: **********
[...]
Max. counter value           : 4294967295
[...]

However, the max counter remains at the default of 255:

$ cat my_reads.32mers | cut -f2 | sort -n | tail -n1
255

Setting this parameter works up to at least 1000000, which exceeds my USHRT_MAX, so it's not clear what the actual limit is.
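For reference, a 32-bit UINT_MAX is 4294967295 and a 16-bit USHRT_MAX is 65535. Below is a minimal sketch of the behaviour the report expects, assuming -cs works as a saturating upper bound on each counter. `clamp_count` is a hypothetical helper, not KMC code.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Saturating counter: the stored value never exceeds max_counter (the -cs value).
std::uint32_t clamp_count(std::uint64_t raw_count, std::uint32_t max_counter) {
    return static_cast<std::uint32_t>(std::min<std::uint64_t>(raw_count, max_counter));
}

constexpr std::uint32_t kUintMax = std::numeric_limits<std::uint32_t>::max();   // 4294967295
constexpr std::uint16_t kUshrtMax = std::numeric_limits<std::uint16_t>::max();  // 65535
```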

Mask rare kmers, instead of filtering.

I am using kmc to check for rare k-mers before genome assembly. I was wondering if there's a way to mask rare k-mers (replace with N) instead of filtering them out or trimming them. Filtering leads to losing more data than needed. Trimming leads to reads of unequal lengths, which makes it difficult to detect positional biases in the reads, if any, after removing rare k-mers.
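One way to get masking that keeps read length unchanged: replace with 'N' every base covered by at least one k-mer whose count falls below the threshold. This is a minimal C++ sketch under stated assumptions. `mask_rare_kmers` is hypothetical, counts are looked up as written (not canonical), and k-mers missing from the table are treated as count 0.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Mask with 'N' every base covered by a k-mer whose count is below min_count.
std::string mask_rare_kmers(const std::string& read, std::size_t k,
                            const std::map<std::string, int>& counts, int min_count) {
    std::string masked = read;  // same length as the input read
    if (k == 0 || read.size() < k) return masked;
    for (std::size_t i = 0; i + k <= read.size(); ++i) {
        auto it = counts.find(read.substr(i, k));
        const int c = (it == counts.end()) ? 0 : it->second;
        if (c < min_count) {
            for (std::size_t j = i; j < i + k; ++j) masked[j] = 'N';
        }
    }
    return masked;
}
```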

Help: KMC configuration

Hi

We are analysing bioinformatics tools for k-mer counting, and KMC is one of them. We have gone through the README file and it is very helpful. Because we are doing an analysis, we want to be very sure about the details, so it would be great if you could help us validate the following details of KMC.

Data structure and Sorting Algo:
Array, Priority queue, Radix sort, Counting sort

Approach:
Two disk based, Modified minimum sub-string partitioning (signature).

The limit of k-size : less than 257
Supports online k-mer frequency retrieval : No
Supports compressed file processing : Yes
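The "minimum substring partitioning (signature)" item can be illustrated with a short sketch. Each k-mer gets a signature, here its lexicographically smallest m-mer, and maximal runs of consecutive k-mers that share a signature form one super k-mer. This minimal C++ sketch uses plain lexicographic order; KMC's "modified" variant applies additional rules when choosing signatures, which are not reproduced here. `signature` and `super_kmers` are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Signature of a k-mer: its lexicographically smallest m-mer (assumes m <= k).
std::string signature(const std::string& kmer, std::size_t m) {
    std::string best = kmer.substr(0, m);
    for (std::size_t i = 1; i + m <= kmer.size(); ++i) {
        std::string cand = kmer.substr(i, m);
        if (cand < best) best = cand;
    }
    return best;
}

// Split a sequence into super k-mers: maximal runs of consecutive k-mers
// sharing one signature. Neighbouring super k-mers overlap by k-1 bases.
std::vector<std::string> super_kmers(const std::string& seq, std::size_t k, std::size_t m) {
    std::vector<std::string> out;
    if (seq.size() < k) return out;
    std::size_t start = 0;  // index of the first k-mer in the current run
    std::string sig = signature(seq.substr(0, k), m);
    for (std::size_t i = 1; i + k <= seq.size(); ++i) {
        std::string s = signature(seq.substr(i, k), m);
        if (s != sig) {
            out.push_back(seq.substr(start, (i - 1) - start + k));  // close the run
            start = i;
            sig = s;
        }
    }
    out.push_back(seq.substr(start));  // last run extends to the end
    return out;
}
```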

Thanks
Tarang

compile issue

Hello, ran into this problem with make today:

g++ -Wall -O3 -m64 -static -Wl,--whole-archive -lpthread -Wl,--no-whole-archive -std=c++11  -c kmer_counter/kmer_counter.cpp -o kmer_counter/kmer_counter.o
In file included from /usr/local/include/assert.h:5:0,
                 from /usr/include/c++/5/cassert:43,
                 from kmer_counter/radix.h:13,
                 from kmer_counter/kb_collector.h:18,
                 from kmer_counter/kmc.h:26,
                 from kmer_counter/kmer_counter.cpp:18:
/usr/local/include/except.h:15:32: error: conflicting declaration ‘typedef struct Except_Frame_T* Except_Frame_T’
 typedef struct Except_Frame_T *Except_Frame_T;
                                ^
/usr/local/include/except.h:15:16: note: previous declaration as ‘struct Except_Frame_T’
 typedef struct Except_Frame_T *Except_Frame_T;
                ^
/usr/local/include/except.h:17:18: error: field ‘prev’ has incomplete type ‘Except_Frame_T’
   Except_Frame_T prev;
                  ^
/usr/local/include/except.h:16:8: note: definition of ‘struct Except_Frame_T’ is not complete until the closing brace
 struct Except_Frame_T {
        ^
makefile:79: recipe for target 'kmer_counter/kmer_counter.o' failed
make: *** [kmer_counter/kmer_counter.o] Error 1

g++ (c++14) error: 'modf' is not a member of 'std'

Any suggestions?

g++-5 -Wall -O3 -m64 -static -Wl,--whole-archive -lpthread -Wl,--no-whole-archive -std=c++14 -c kmc_tools/percent_progress.cpp -o kmc_tools/percent_progress.o
In file included from kmc_dump/nc_utils.cpp:15:0:
kmc_dump/nc_utils.h: In static member function 'static int CNumericConversions::Double2PChar(double, int, uchar*)':
kmc_dump/nc_utils.h:124:22: error: 'modf' is not a member of 'std'
   double fractPart = std::modf(val, &ipart);
                      ^
kmc_dump/nc_utils.h:124:22: note: suggested alternative:
In file included from /home/linuxbrew/.linuxbrew/include/features.h:368:0,
                 from /home/linuxbrew/.linuxbrew/Cellar/gcc/5.5.0_1/include/c++/5.5.0/x86_64-unknown-linux-gnu/bits/os_defines.h:39,
                 from /home/linuxbrew/.linuxbrew/Cellar/gcc/5.5.0_1/include/c++/5.5.0/x86_64-unknown-linux-gnu/bits/c++config.h:489,
                 from /home/linuxbrew/.linuxbrew/Cellar/gcc/5.5.0_1/include/c++/5.5.0/string:38,
                 from kmc_dump/nc_utils.h:14,
                 from kmc_dump/nc_utils.cpp:15:
/home/linuxbrew/.linuxbrew/include/bits/mathcalls.h:115:1: note:   'modf'
 __MATHCALL (modf,, (_Mdouble_ __x, _Mdouble_ *__iptr)) __nonnull ((2));
 ^
make: *** [kmc_dump/nc_utils.o] Error 1
make: *** Waiting for unfinished jobs....
In file included from kmc_dump/kmc_dump.cpp:17:0:
kmc_dump/nc_utils.h: In static member function 'static int CNumericConversions::Double2PChar(double, int, uchar*)':
kmc_dump/nc_utils.h:124:22: error: 'modf' is not a member of 'std'
   double fractPart = std::modf(val, &ipart);
                      ^
kmc_dump/nc_utils.h:124:22: note: suggested alternative:
kmc_api/kmc_file.cpp: In member function 'bool CKMCFile::BinarySearch(int64, int64, const CKmerAPI&, uint64&, uint32)':
kmc_api/kmc_file.cpp:1360:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
  if (index_start >= total_kmers)
                  ^
In file included from /home/linuxbrew/.linuxbrew/include/features.h:368:0,
                 from /home/linuxbrew/.linuxbrew/Cellar/gcc/5.5.0_1/include/c++/5.5.0/x86_64-unknown-linux-gnu/bits/os_defines.h:39,
                 from /home/linuxbrew/.linuxbrew/Cellar/gcc/5.5.0_1/include/c++/5.5.0/x86_64-unknown-linux-gnu/bits/c++config.h:489,
                 from /home/linuxbrew/.linuxbrew/Cellar/gcc/5.5.0_1/include/c++/5.5.0/iostream:38,
                 from kmc_dump/kmc_dump.cpp:15:
/home/linuxbrew/.linuxbrew/include/bits/mathcalls.h:115:1: note:   'modf'
 __MATHCALL (modf,, (_Mdouble_ __x, _Mdouble_ *__iptr)) __nonnull ((2));
 ^
make: *** [kmc_dump/kmc_dump.o] Error 1

Option to limit number of reads/kmers processed?

I have use cases where I want to run kmc on a very large FASTQ file without it reading the whole file, as I only need the results for some estimations.

Would you be able to add an option that stops processing after -nr <value> reads (or -nk <value> k-mers)?
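Until such an option exists, a workaround is to run KMC on a truncated copy of the input. For plain 4-line FASTQ this is equivalent to `head -n $((4*N))`. The -nr/-nk names are the reporter's proposal, not existing KMC flags. Here is a minimal C++ sketch; `head_fastq` is hypothetical and assumes sequence and quality are not wrapped over several lines.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>  // std::istringstream/ostringstream, handy for trying it in memory
#include <string>

// Copy at most max_reads complete FASTQ records (4 lines each) from in to out.
// Returns the number of records copied; a trailing partial record is dropped.
std::size_t head_fastq(std::istream& in, std::ostream& out, std::size_t max_reads) {
    std::size_t reads = 0;
    std::string line;
    while (reads < max_reads) {
        std::string record;
        bool complete = true;
        for (int i = 0; i < 4; ++i) {
            if (!std::getline(in, line)) { complete = false; break; }
            record += line;
            record += '\n';
        }
        if (!complete) break;
        out << record;
        ++reads;
    }
    return reads;
}
```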

New point release (3.0.2).

Hi, I was wondering if there could be a point release incorporating the recent updates. The version on our university's compute cluster won't be updated otherwise (policy), and several of us here depend on the new masking feature in kmc_tools.

Error message: Error: Cannot open temporary file tmp/kmc_00000.bin

Dear marekkokot,

I am new to Linux, so this problem may look silly. I downloaded KMC3 and built it with make. In the bin folder I can see three files: kmc, kmc_dump and kmc_tools. But when I run the command /home/niu/KMC-3.0.1/bin/kmc -k20 reads.fq kmers1 tmp, I get the error: Error: Cannot open temporary file tmp/kmc_00000.bin. I tried several times and got the same error. I also checked my tmp; it still has 50G of space. Could you help me figure out this problem? Thank you very much.

Best,
Tim

asmlib/vectorclass: Licensing issues and clarification

[DISCLAIMER: I am not a lawyer and the following are only my interpretations of the licensing terms -- hence no legal advice but only well-intended suggestions/remarks.]

The README.md states that

KMC software distributed under GNU GPL 2 licence.

yet it uses asmlib (optionally) and vectorclass which are both GPL-3.0+ licensed.
Sadly GPL-2.0 and GPL-3.0 are not compatible, see
https://www.gnu.org/licenses/gpl-faq.html#AllCompatibility.
Hence, to use vectorclass, KMC would have to be made available via GPLv3, i.e., licensed under one of GPL-2.0+, GPL-3.0, or GPL-3.0+.

Furthermore I also find the following from KMC's readme problematic/misleading:

Note: asmlib is free only for non commercial purposes. If needed, you can contact the author of asmlib or compile KMC without asmlib.

Note: for commercial usage of asmlib follow the instructions in 'License conditions' (http://www.agner.org/optimize/asmlib-instructions.pdf) or compile KMC without asmlib. In case of doubt, please consult the original documentations.

vcl is under the licence GNU GPL 3 or higher Node: for commercial usage of vcl follow the instructions in 'License' section (http://www.agner.org/optimize/vectorclass.pdf)

But as asmlib/vectorclass can be used in terms of the GPL-3.0, no restrictions concerning commercial/non-commercial usage should be applicable, see
https://www.gnu.org/licenses/gpl-faq.html#NoMilitary
and
https://www.gnu.org/licenses/gpl.html#section7.
(IMO, the use of the sole term "Commercial licenses" from the asmlib and vectorclass license texts is also misleading, as they (to me) kind of suggest the interpretation GPL=free=non-commercial which is wrong. "Alternative custom/proprietary/??? license" might have been a better choice...)


Just for reference/context the license information for asmlib and vectorclass:

From http://www.agner.org/optimize/asmlib-instructions.pdf:

10 License conditions

These software libraries are free: you can redistribute the software and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, either version 3 of the license, or any later version.

Commercial licenses are available on request to www.agner.org/contact.

This software is distributed in the hope that it will be useful, but without any warranty. See
the file license.txt or www.gnu.org/licenses for the license text.

From http://www.agner.org/optimize/vectorclass.pdf:

License

The VCL vector class library has a dual license system. You can use it for free in
open source software, or pay for using it in proprietary software.

You are free to copy, use, redistribute and modify this software under the terms of
the GNU General Public License as published by the Free Software Foundation,
version 3 or any later version. See the file license.txt.

Commercial licenses are available on request.

Union resulting in smaller database than the individual file

Hi,

I was using KMC (3.0.0) to produce k-mers from unitig files (generated by BCALM). The union of the two databases turned out smaller than each individual database.
Is it an anomaly?

Datasets:

  1. ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR129/004/SRR1291024/SRR1291024_1.fastq.gz
  2. ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR129/000/SRR1291070/SRR1291070_1.fastq.gz

The unitigs for paired-end (_1 and _2) files were generated using BCALM.

Command to produce individual kmer data-bases for the two files:
./KMC/bin/kmc -k63 -r -ci1 -fa SRR1291024.unitigs.fa SRR1291024.kmers .
./KMC/bin/kmc -k63 -r -ci1 -fa SRR1291024.unitigs.fa SRR1291070.kmers .

Command to produce union:
./KMC/bin/kmc_tools simple SRR1291024.kmers -ci1 SRR1291070.kmers -ci1 union kmers_superset -ci1

Size of the resultant individual data-bases:
SRR1291024.kmers.kmc_pre (66M)
SRR1291024.kmers.kmc_suf (40G)
SRR1291070.kmers.kmc_pre (66M)
SRR1291070.kmers.kmc_suf (40G)

Size of the resultant union data-base:
kmers_superset.kmc_pre (33M)
kmers_superset.kmc_suf (39G)

Are the results correct?

set all counts to a specified value

@marekkokot Is there a way to set all k-mer counts in a database to a specific value? If not, would that be hard to add? Use case: I have a set of samples and a KMC k-mer database for each sample. I want to make a database that records, for each k-mer, which samples contain it (basically a colored de Bruijn graph). If there are <=64 samples, I can assign the count 2^i to k-mers from sample i; the sum of counts then gives the set of samples. If 64<n_samples<=128, I can represent this with two KMC k-mer databases per sample, etc.
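The 2^i scheme works because each sample contributes a distinct bit, and it contributes at most once per k-mer. So the sum is a bitmask that decodes back to the exact set of samples. Here is a minimal C++ sketch with a 64-bit value, which covers sample indices 0 to 63. `sample_value` and `samples_from_sum` are hypothetical helpers, and whether a KMC counter can hold such values is a separate question.

```cpp
#include <cstdint>
#include <vector>

// Value assigned to every k-mer of sample i (0 <= i < 64).
std::uint64_t sample_value(unsigned sample_index) {
    return std::uint64_t{1} << sample_index;
}

// Decode a summed count back into the list of sample indices it encodes.
std::vector<unsigned> samples_from_sum(std::uint64_t sum) {
    std::vector<unsigned> samples;
    for (unsigned i = 0; i < 64; ++i) {
        if (sum & (std::uint64_t{1} << i)) samples.push_back(i);
    }
    return samples;
}
```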

(k,x)-mers explanation

Hi, I'm not sure I understand how you split a super k-mer into (k,x)-mers (I guess x=1 is too simple to be explanatory) and why the subsets are non-overlapping.
Could you kindly provide a practical example with the actual x value you use in the program? (Is it 3?)

Best regards
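One plausible reading of the split, sketched in C++ under stated assumptions: this is not taken from KMC's source, and x is left as a parameter because its actual value is exactly what the question asks. Each (k,x)-mer is a substring of length up to k+x holding up to x+1 consecutive k-mers. Chunk j starts at k-mer j*(x+1), so every k-mer lands in exactly one chunk, even though neighbouring chunks share k-1 characters. `split_into_kx_mers` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Split a super k-mer into (k,x)-mers so that the sets of k-mers they
// contain are disjoint and together cover every k-mer of the super k-mer.
std::vector<std::string> split_into_kx_mers(const std::string& super_kmer,
                                            std::size_t k, std::size_t x) {
    std::vector<std::string> out;
    if (k == 0 || super_kmer.size() < k) return out;
    const std::size_t n_kmers = super_kmer.size() - k + 1;
    for (std::size_t first = 0; first < n_kmers; first += x + 1) {
        const std::size_t kmers_here = std::min(x + 1, n_kmers - first);
        out.push_back(super_kmer.substr(first, k + kmers_here - 1));
    }
    return out;
}
```

For example, with k=4 and x=1 the super k-mer ACGTACGT (5 k-mers) splits into ACGTA, GTACG and ACGT, holding 2, 2 and 1 k-mers respectively.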

Progress/warnings to STDERR

Could KMC's progress/warnings/errors be sent to STDERR instead of STDOUT? KMC would then follow Linux conventions, and pipelining KMC commands would be easier and more intuitive.

Current Behaviour

kmc_tools sends progress to STDOUT:

$ kmc_tools transform my_kmer_db -ci4 dump /dev/stdout | head
in1: 0% AAAAAAAAAAATGATGGGCATTTTAGAAGGGCATTTCAGGTTCATTGAAAAATTATTTTAGTAACCCTAGT 10
AAAAAAAAAAATGATGGGCATTTTAGAAGGGCATTTCAGGTTCATTGAAAATTATTTTAGTAAACCCTAGT 12
AAAAAAAAAATGATGGGCATTTTAGAAGGGCATTTCAGGTTCATTGAAAAATTATTTTAGTAACCCTAGTT 10
AAAAAAAAAATGATGGGCATTTTAGAAGGGCATTTCAGGTTCATTGAAAATTATTTTAGTAAACCCTAGTT 11
AAAAAAAAACCCTAGTCATTTTATCCTAACCTAACGCAGTCGTTAGCTTCGATCCAAAATCCCCTATTGTT 15
AAAAAAAAACGTCCATGACCATTGGTCGTCTAACAGCCACACTGGTAGCTAGTCTTGTACTCCATGCAAAT 16
AAAAAAAAACTAGGAAAAAAATAGACCACAAACAGAGTGGACATCAACTTAGATGTGACATAACTATGTCA 11
AAAAAAAAACTAGGACAAAAAAATAGACCACAAACAGAGTGGACATCAACTTAGATGTGACATAACTATGT 11
AAAAAAAAACTAGGGTTTCGTAGTAGCAATCTTCGCACTCCGGAAATTCTACCGAGGCAAACAATAACTAT 12
AAAAAAAAAGAAAAGAAAAGGTTAGCTACAGACGTGTGATGAATCAAGTGCTTGAGCTAGTTAGCTTTGTT 12

This means that to put kmc_tools into a pipeline, you need to either

  • Do some funky redirection trickery:
$ kmc_tools transform my_kmer_db -ci4 dump /dev/stderr 2>&1 > /dev/null | head
  • Ask kmc_tools to not report progress:
$ kmc_tools -hp transform my_kmer_db -ci4 dump /dev/stdout | head

Desired/Conventional Behaviour

In Linux it is convention to send progress/warnings/errors to STDERR and have results etc sent to STDOUT. This is so that the expected output of a command can be easily piped into another command (assuming no seeking is required). This is very powerful and can be used to avoid disk IO.

Therefore a change to sending progress/errors/warnings to STDERR would allow a more simplified approach to pipelining KMC commands:

kmc_tools transform my_kmer_db -ci4 dump /dev/stdout  2>progress.log | head

Issue #23 was where this was originally raised.
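The convention can be shown in a few lines of C++: results go to one stream and progress to another. In a real tool these would be std::cout and std::cerr, so `tool | head` only sees the results. `dump_kmers` is a hypothetical function, not kmc_tools code, and the tab-separated output format is assumed for illustration.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>  // std::ostringstream, handy for checking output without a terminal
#include <string>
#include <utility>
#include <vector>

// Write k-mer/count results to `out` and progress messages to `progress`.
void dump_kmers(const std::vector<std::pair<std::string, unsigned>>& kmers,
                std::ostream& out, std::ostream& progress) {
    const std::size_t n = kmers.size();
    for (std::size_t i = 0; i < n; ++i) {
        out << kmers[i].first << '\t' << kmers[i].second << '\n';
        progress << "\rprogress: " << (100 * (i + 1) / n) << '%';
    }
    progress << '\n';
}
```

Called as `dump_kmers(kmers, std::cout, std::cerr)`, the shell redirection `2>progress.log` then separates the two cleanly.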

Fail quickly on missing directories

Could you please implement a check for the existence of the directories specified on the command line for <output_file_name> and <working_directory> (or create them)?

As a user, it is frustrating for KMC to spend many minutes or hours on computation only to fail because the working directory I specified did not exist. The same goes for the parent directory of the output file name.
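Here is a minimal C++ sketch of such an up-front check, using POSIX stat(), so it assumes a Unix-like system. `is_directory` and `output_paths_ok` are hypothetical names, not part of KMC.

```cpp
#include <cstddef>
#include <string>
#include <sys/stat.h>

// True if `path` exists and is a directory.
bool is_directory(const std::string& path) {
    struct stat info;
    return stat(path.c_str(), &info) == 0 && S_ISDIR(info.st_mode);
}

// Check before any computation starts: the working directory must exist, and
// so must the parent directory of the output file name ("." if it has no '/').
bool output_paths_ok(const std::string& output_file_name,
                     const std::string& working_directory, std::string& error) {
    if (!is_directory(working_directory)) {
        error = "working directory does not exist: " + working_directory;
        return false;
    }
    const std::size_t slash = output_file_name.rfind('/');
    const std::string parent = (slash == std::string::npos) ? "."
                             : (slash == 0 ? "/" : output_file_name.substr(0, slash));
    if (!is_directory(parent)) {
        error = "output directory does not exist: " + parent;
        return false;
    }
    return true;
}
```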

bug: kmc_dump -ci and -cx inclusive/exclusive

The help for kmc_dump states that -ci excludes k-mers occurring fewer than the specified number of times and -cx excludes k-mers occurring more than the specified number of times. So to get the k-mers that occur exactly 10 times, I should be able to specify -ci10 -cx10. However, this returns nothing.

If I specify -ci10 -cx11, as expected I get a list of kmers occurring 10 or 11 times.
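The reporter's reading of the help text corresponds to an inclusive range test on both ends. Here is a minimal sketch with a hypothetical helper; under that reading, -ci10 -cx10 keeps exactly the k-mers with count 10.

```cpp
// Keep a k-mer if min_count <= count <= max_count (both bounds inclusive).
bool in_count_range(unsigned count, unsigned min_count, unsigned max_count) {
    return min_count <= count && count <= max_count;
}
```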

test suite?

@marekkokot Is there a test suite you use to verify correctness of kmc and kmc_tools? If there is, could it be checked into github?

tag the 2.2 release in GitHub

Could you please tag the 2.2 release in GitHub? 2.1.1 is the latest release tagged here, but the KMC web site says that the current release is 2.2. Thanks.

Error: Wrong input file!

Hi there.

I am currently trying to use KMC to count 36-mers in a bunch of files I downloaded from the SRA, and for many of them KMC just returns the following error:
********Error: Wrong input file!

An example is the file SRR1047856 from the SRA. On one computer with Ubuntu 15.04, I have downloaded its corresponding SRA file and extracted the FASTA file out of it. Then, I ran the command:
/home/gholley/KMC/bin/kmc -k36 -ci3 -fa SRR1047856.fasta SRR1047856_comp .
and obtained in return:
********Error: Wrong input file!
I tried different parameters for k and ci. I tried limiting the number of threads and using the RAM-only mode, and I also extracted the FASTQ file from the SRA file instead of the FASTA file. Same error.

I thought that my SRA file might have been corrupted during the download, so I re-downloaded the FASTA file directly from the SRA on a different computer (with Ubuntu 16.10). My local KMC branch was up to date with this git repository. I tried the same command and got the same error.

Any help with this would be welcome :)
Thank you!

Best, Guillaume.

Weird kmer counting

OK, so I have a little bit of an issue with KMC 3.0.1 on a Linux system. I have multiple fasta files (let's call them F1.fasta, F2.fasta, ..., Fn.fasta) which contain multiple genes each and I ran

kmc -k15 -fm -ci1 -cs1677215 F1.fasta F1.fasta temp/
kmc_dump F1.fasta F1.fasta.15.kmrs

This counts the 15mers within each fasta file. I then ran

cat F*.fasta > all.fasta
kmc -k15 -fm -ci1 -cs1677215 all.fasta all.fasta temp/
kmc_dump all.fasta all.fasta.15.kmrs

This concatenates all the fasta files together and counts the 15mers in the result. Now, there is a set of 15mers that are found in the individual fasta files (let's call one of them kmer X) but not in the all.fasta file. This baffles me, as it shouldn't be possible. How can a kmer be found in an individual fasta file but not when we concatenate the fasta files together?

I have a total of about 5500 fasta files and X appears in them <1 time (typically).

To dig even further, I ran KMC 2.3.0 on the same all.fasta file and got different results. Those results were more in line with those of the individual KMC 3.0.1 runs (X was found in the KMC 2.3.0 run). I should also note that KMC 2.3.0 and KMC 3.0.1 find the same number of unique 15mers; however, the 15mers that are flip-flopped (5 15mers in total) do not have the same counts. This makes me think there may be an issue with the way a kmer is encoded in the database and then decoded in the dump, i.e., when I decode the DB to produce 15mer X, it wasn't X that was encoded there to begin with but some other 15mer Y (stated differently, encode(Y) = E, decode(E) = X). In any case, something changed between 2.3.0 and 3.0.1 (possibly 3.0.0) to produce this result.
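The encode/decode hypothesis can be made testable as a round trip: if encoding and decoding are consistent, decode(encode(X)) == X for every 15-mer. This is a minimal C++ sketch using a generic 2-bit packing (A=0, C=1, G=2, T=3), assumed for illustration only. KMC's actual layout, split across .kmc_pre and .kmc_suf, differs and is not reproduced. `encode_kmer` and `decode_kmer` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Pack a k-mer (k <= 16) into 2 bits per base, first base in the high bits.
std::uint32_t encode_kmer(const std::string& kmer) {
    if (kmer.size() > 16) throw std::invalid_argument("k > 16");
    std::uint32_t code = 0;
    for (char c : kmer) {
        std::uint32_t v;
        switch (c) {
            case 'A': v = 0; break;
            case 'C': v = 1; break;
            case 'G': v = 2; break;
            case 'T': v = 3; break;
            default: throw std::invalid_argument("non-ACGT base");
        }
        code = (code << 2) | v;
    }
    return code;
}

// Inverse of encode_kmer for a known k.
std::string decode_kmer(std::uint32_t code, std::size_t k) {
    static const char bases[] = "ACGT";
    std::string kmer(k, 'A');
    for (std::size_t i = k; i-- > 0;) {
        kmer[i] = bases[code & 3u];  // lowest 2 bits hold the last base
        code >>= 2;
    }
    return kmer;
}
```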

I have the all.fasta file that was used to produce the above results. It's 550MB in size (165MB compressed). Github won't let me attach it here, so if you need it, please do ask (maybe I could email it to you?).

One final note: I also tested this with the executable offered on your website (http://sun.aei.polsl.pl/REFRESH/index.php?page=projects&project=kmc&subpage=download), which is labelled 3.0. I got the same odd results, which made me compile 3.0.1 from scratch (I didn't see an executable available on your GitHub), and that gave the same results. Because of the bug where the 3.0.1 kmc executable still prints 3.0.0 as its version, I'm not sure which version is on your website. If it is 3.0.0, then I tested that as well; if not, I did not test 3.0.0. That said, from a user perspective, please fix that little version-string bug in your next release (I'm sure you already have, as you closed the request for this); it's really annoying not knowing which version we're actually on.

Any insight on this issue would be greatly appreciated!

Ignore N in the reads.

I am looking to use KMC to filter rare k-mers pre-assembly and was wondering if there's a way to tell it to ignore Ns in the reads (which could be uncalled bases or masked low-quality bases). Maybe KMC automatically does that?
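Whether KMC already does this is the question being asked. A common approach in k-mer tools is to skip every k-mer window that contains an N or any other non-ACGT symbol. Below is a minimal C++ sketch of that approach in linear time; `acgt_kmers` is a hypothetical helper.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Return the k-mers of `read` made only of A, C, G and T. `valid_run` holds
// the number of consecutive ACGT bases ending at position i, so any window
// overlapping an N is skipped without rescanning it.
std::vector<std::string> acgt_kmers(const std::string& read, std::size_t k) {
    std::vector<std::string> out;
    std::size_t valid_run = 0;
    for (std::size_t i = 0; i < read.size(); ++i) {
        const char c = read[i];
        const bool ok = (c == 'A' || c == 'C' || c == 'G' || c == 'T');
        valid_run = ok ? valid_run + 1 : 0;
        if (k > 0 && valid_run >= k) out.push_back(read.substr(i + 1 - k, k));
    }
    return out;
}
```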

Extend the length of supported reads in fastq and fasta format

Bug reported by mail:

Is there a limit to the length of sequences in a fasta file for the 'kmc' command?

I run this command
kmc -k25 -ci1 -fa input/test.fasta output/test.res work

When the fasta file contains one sequence of 50,000 'A's, the program completes.
When the fasta file contains one sequence of 60,000 'A's, the programs halts with the message 'Error: Wrong input file!'.

So, I conclude that sequence lengths have a limit between 50,000 and 60,000 characters. Is that correct?

The limit is not strict (it depends on a couple of factors, but it is sufficient for short reads).

The workaround in the reported case is to use -fm (multi-FASTA format), but in general long reads should also be supported in FASTA and FASTQ formats.
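Because the failure depends on sequence length, it can help to know the longest record in an input before choosing between -fa and -fm. Here is a minimal C++ sketch; `longest_fasta_record` is a hypothetical helper that handles multi-line records and CRLF line endings.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, handy for trying it on an in-memory string
#include <string>

// Length of the longest sequence in a (multi-line) FASTA stream.
std::size_t longest_fasta_record(std::istream& in) {
    std::size_t longest = 0, current = 0;
    bool in_record = false;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (!line.empty() && line[0] == '>') {
            if (in_record) longest = std::max(longest, current);
            current = 0;
            in_record = true;
        } else {
            current += line.size();
        }
    }
    if (in_record) longest = std::max(longest, current);
    return longest;
}
```

Pass an std::ifstream opened on the FASTA file to check a real input.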

Option to NOT keep .kmc_pre and .kmc_suf outputs

Running KMC produces two output files: XXX.kmc_pre and XXX.kmc_suf.

Could you please add an option to not create/write or keep these files?

For example, we often only want the summary Stats.

kmc_dump option ci and cx are reversed

According to help:

-ci<value> - print k-mers occurring less than <value> times
-cx<value> - print k-mers occurring more of than <value> times

However, -ci<value> seems to print k-mers occurring more than <value> times. The same applies to -cx.

how to get counts of selected kmers

What is the best way to obtain counts for a specified collection of k-mers? Currently, I do a 'dump' and then extract the ones I want. Is there a better way?
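If the dump route is kept, the extraction can be done in a single streaming pass without storing the full dump. This is a minimal C++ sketch. `counts_for` is hypothetical, and the dump format is assumed to be one "<kmer><whitespace><count>" pair per line. Since KMC stores canonical k-mers by default, the queried k-mers would also need to be in canonical form.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Scan dump-style text once and keep only the k-mers listed in `wanted`.
std::unordered_map<std::string, unsigned long>
counts_for(std::istream& dump, const std::unordered_set<std::string>& wanted) {
    std::unordered_map<std::string, unsigned long> found;
    std::string line;
    while (std::getline(dump, line)) {
        std::istringstream fields(line);
        std::string kmer;
        unsigned long count = 0;
        if ((fields >> kmer >> count) && wanted.count(kmer) != 0) found[kmer] = count;
    }
    return found;
}
```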

Install error

Hi,
I tried to install KMC, but some problems occurred when I used the command "make DISABLE_ASMLIB=true", and I don't know how to solve them. Could you give me some advice?
Best,
############################################################################

make DISABLE_ASMLIB=true
g++ -Wall -O3 -m64 -static -Wl,--whole-archive -lpthread -Wl,--no-whole-archive -std=c++11 -DDISABLE_ASMLIB -mavx2 -mfma -fabi-version=0 -c kmer_counter/raduls_avx2.cpp -o kmer_counter/raduls_avx2.o
/tmp/cckJQjTJ.s: Assembler messages:
/tmp/cckJQjTJ.s:27711: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:27714: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rcx)'
/tmp/cckJQjTJ.s:27716: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:27718: Error: no such instruction: `vextracti128 $0x1,%ymm0,48(%rcx)'
/tmp/cckJQjTJ.s:27823: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:27829: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rax)'
/tmp/cckJQjTJ.s:27831: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:27833: Error: no such instruction: `vextracti128 $0x1,%ymm0,48(%rax)'
/tmp/cckJQjTJ.s:36827: Error: no such instruction: `vinserti128 $0x1,%xmm1,%ymm0,%ymm0'
/tmp/cckJQjTJ.s:36829: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rdx)'
/tmp/cckJQjTJ.s:36900: Error: no such instruction: `vinserti128 $0x1,%xmm1,%ymm0,%ymm0'
/tmp/cckJQjTJ.s:36902: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rax)'
/tmp/cckJQjTJ.s:41669: Error: no such instruction: `vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0'
/tmp/cckJQjTJ.s:41670: Error: suffix or operands invalid for `vpaddq'
/tmp/cckJQjTJ.s:41672: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rdx,%rax)'
/tmp/cckJQjTJ.s:42925: Error: no such instruction: `vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0'
/tmp/cckJQjTJ.s:42926: Error: suffix or operands invalid for `vpaddd'
/tmp/cckJQjTJ.s:42928: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rdx,%rax)'
/tmp/cckJQjTJ.s:46028: Error: no such instruction: `vinserti128 $0x1,%xmm1,%ymm0,%ymm0'
/tmp/cckJQjTJ.s:46031: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rax)'
/tmp/cckJQjTJ.s:46033: Error: no such instruction: `vinserti128 $0x1,%xmm1,%ymm0,%ymm0'
/tmp/cckJQjTJ.s:46035: Error: no such instruction: `vextracti128 $0x1,%ymm0,48(%rax)'
/tmp/cckJQjTJ.s:46852: Error: no such instruction: `vinserti128 $0x1,%xmm2,%ymm3,%ymm2'
/tmp/cckJQjTJ.s:46853: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:46855: Error: no such instruction: `vextracti128 $0x1,%ymm2,16(%rax)'
/tmp/cckJQjTJ.s:46857: Error: no such instruction: `vextracti128 $0x1,%ymm0,48(%rax)'
/tmp/cckJQjTJ.s:46913: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:46919: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rax)'
/tmp/cckJQjTJ.s:46921: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:46923: Error: no such instruction: `vextracti128 $0x1,%ymm0,48(%rax)'
/tmp/cckJQjTJ.s:47314: Error: no such instruction: `vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0'
/tmp/cckJQjTJ.s:47315: Error: suffix or operands invalid for `vpaddq'
/tmp/cckJQjTJ.s:47317: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rdx,%rax)'
/tmp/cckJQjTJ.s:48587: Error: no such instruction: `vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0'
/tmp/cckJQjTJ.s:48588: Error: suffix or operands invalid for `vpaddd'
/tmp/cckJQjTJ.s:48590: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rdx,%rax)'
/tmp/cckJQjTJ.s:50550: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:52688: Error: no such instruction: `vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0'
/tmp/cckJQjTJ.s:52689: Error: suffix or operands invalid for `vpaddq'
/tmp/cckJQjTJ.s:52691: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rdx,%rax)'
/tmp/cckJQjTJ.s:53960: Error: no such instruction: `vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0'
/tmp/cckJQjTJ.s:53961: Error: suffix or operands invalid for `vpaddd'
/tmp/cckJQjTJ.s:53963: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rdx,%rax)'
/tmp/cckJQjTJ.s:57720: Error: no such instruction: `vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0'
/tmp/cckJQjTJ.s:57721: Error: suffix or operands invalid for `vpaddq'
/tmp/cckJQjTJ.s:57723: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rdx,%rax)'
/tmp/cckJQjTJ.s:58988: Error: no such instruction: `vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0'
/tmp/cckJQjTJ.s:58989: Error: suffix or operands invalid for `vpaddd'
/tmp/cckJQjTJ.s:58991: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rdx,%rax)'
/tmp/cckJQjTJ.s:62058: Error: no such instruction: `vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0'
/tmp/cckJQjTJ.s:62059: Error: suffix or operands invalid for `vpaddq'
/tmp/cckJQjTJ.s:62061: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rdx,%rax)'
/tmp/cckJQjTJ.s:63323: Error: no such instruction: `vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0'
/tmp/cckJQjTJ.s:63324: Error: suffix or operands invalid for `vpaddd'
/tmp/cckJQjTJ.s:63326: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rdx,%rax)'
/tmp/cckJQjTJ.s:65465: Error: no such instruction: `vinserti128 $0x1,%xmm1,%ymm0,%ymm0'
/tmp/cckJQjTJ.s:65467: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rax)'
/tmp/cckJQjTJ.s:66050: Error: no such instruction: `vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:66052: Error: no such instruction: `vextracti128 $0x1,%ymm0,16(%rax)'
/tmp/cckJQjTJ.s:66091: Error: no such instruction: `vinserti128 $0x1,%xmm1,%ymm0,%ymm0'
/tmp/cckJQjTJ.s:66093: Error: no such instruction: vextracti128 $0x1,%ymm0,16(%rax)' /tmp/cckJQjTJ.s:66451: Error: no such instruction: vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0'
/tmp/cckJQjTJ.s:66452: Error: suffix or operands invalid for vpaddq' /tmp/cckJQjTJ.s:66454: Error: no such instruction: vextracti128 $0x1,%ymm0,16(%rdx,%rax)'
/tmp/cckJQjTJ.s:67714: Error: no such instruction: vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0' /tmp/cckJQjTJ.s:67715: Error: suffix or operands invalid for vpaddd'
/tmp/cckJQjTJ.s:67717: Error: no such instruction: vextracti128 $0x1,%ymm0,16(%rdx,%rax)' /tmp/cckJQjTJ.s:70575: Error: no such instruction: vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0'
/tmp/cckJQjTJ.s:70576: Error: suffix or operands invalid for vpaddq' /tmp/cckJQjTJ.s:70578: Error: no such instruction: vextracti128 $0x1,%ymm0,16(%rdx,%rax)'
/tmp/cckJQjTJ.s:71836: Error: no such instruction: vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0' /tmp/cckJQjTJ.s:71837: Error: suffix or operands invalid for vpaddd'
/tmp/cckJQjTJ.s:71839: Error: no such instruction: vextracti128 $0x1,%ymm0,16(%rdx,%rax)' /tmp/cckJQjTJ.s:73822: Error: no such instruction: vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0'
/tmp/cckJQjTJ.s:73823: Error: suffix or operands invalid for vpaddq' /tmp/cckJQjTJ.s:73825: Error: no such instruction: vextracti128 $0x1,%ymm0,16(%rdx,%rax)'
/tmp/cckJQjTJ.s:75091: Error: no such instruction: vinserti128 $0x1,16(%rdx,%rax),%ymm0,%ymm0' /tmp/cckJQjTJ.s:75092: Error: suffix or operands invalid for vpaddd'
/tmp/cckJQjTJ.s:75094: Error: no such instruction: vextracti128 $0x1,%ymm0,16(%rdx,%rax)' /tmp/cckJQjTJ.s:81696: Error: no such instruction: vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:81698: Error: no such instruction: vextracti128 $0x1,%ymm0,16(%rax)' /tmp/cckJQjTJ.s:81760: Error: no such instruction: vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:81762: Error: no such instruction: vextracti128 $0x1,%ymm0,16(%rax)' /tmp/cckJQjTJ.s:81832: Error: no such instruction: vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:81834: Error: no such instruction: vextracti128 $0x1,%ymm0,16(%rdx)' /tmp/cckJQjTJ.s:81909: Error: no such instruction: vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:81911: Error: no such instruction: vextracti128 $0x1,%ymm0,16(%rax)' /tmp/cckJQjTJ.s:83185: Error: no such instruction: vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:83187: Error: no such instruction: vextracti128 $0x1,%ymm0,16(%rax)' /tmp/cckJQjTJ.s:83249: Error: no such instruction: vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:83251: Error: no such instruction: vextracti128 $0x1,%ymm0,16(%rax)' /tmp/cckJQjTJ.s:83323: Error: no such instruction: vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:83325: Error: no such instruction: vextracti128 $0x1,%ymm0,16(%rsi)' /tmp/cckJQjTJ.s:83399: Error: no such instruction: vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:83401: Error: no such instruction: vextracti128 $0x1,%ymm0,16(%rax)' /tmp/cckJQjTJ.s:93800: Error: no such instruction: vinserti128 $0x1,%xmm2,%ymm3,%ymm2'
/tmp/cckJQjTJ.s:93801: Error: no such instruction: vinserti128 $0x1,%xmm0,%ymm1,%ymm0' /tmp/cckJQjTJ.s:93803: Error: no such instruction: vextracti128 $0x1,%ymm2,16(%rax)'
/tmp/cckJQjTJ.s:93805: Error: no such instruction: vextracti128 $0x1,%ymm0,48(%rax)' /tmp/cckJQjTJ.s:93896: Error: no such instruction: vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:93902: Error: no such instruction: vextracti128 $0x1,%ymm0,16(%rax)' /tmp/cckJQjTJ.s:93904: Error: no such instruction: vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:93906: Error: no such instruction: vextracti128 $0x1,%ymm0,48(%rax)' /tmp/cckJQjTJ.s:94006: Error: no such instruction: vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:94009: Error: no such instruction: vextracti128 $0x1,%ymm0,16(%rcx)' /tmp/cckJQjTJ.s:94011: Error: no such instruction: vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:94013: Error: no such instruction: vextracti128 $0x1,%ymm0,48(%rcx)' /tmp/cckJQjTJ.s:94117: Error: no such instruction: vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:94123: Error: no such instruction: vextracti128 $0x1,%ymm0,16(%rax)' /tmp/cckJQjTJ.s:94125: Error: no such instruction: vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:94127: Error: no such instruction: vextracti128 $0x1,%ymm0,48(%rax)' /tmp/cckJQjTJ.s:95555: Error: no such instruction: vinserti128 $0x1,%xmm2,%ymm3,%ymm2'
/tmp/cckJQjTJ.s:95556: Error: no such instruction: vinserti128 $0x1,%xmm0,%ymm1,%ymm0' /tmp/cckJQjTJ.s:95558: Error: no such instruction: vextracti128 $0x1,%ymm2,16(%rax)'
/tmp/cckJQjTJ.s:95560: Error: no such instruction: vextracti128 $0x1,%ymm0,48(%rax)' /tmp/cckJQjTJ.s:95650: Error: no such instruction: vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:95656: Error: no such instruction: vextracti128 $0x1,%ymm0,16(%rax)' /tmp/cckJQjTJ.s:95658: Error: no such instruction: vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:95660: Error: no such instruction: vextracti128 $0x1,%ymm0,48(%rax)' /tmp/cckJQjTJ.s:95763: Error: no such instruction: vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:95766: Error: no such instruction: vextracti128 $0x1,%ymm0,16(%rcx)' /tmp/cckJQjTJ.s:95768: Error: no such instruction: vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:95770: Error: no such instruction: vextracti128 $0x1,%ymm0,48(%rcx)' /tmp/cckJQjTJ.s:95872: Error: no such instruction: vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:95878: Error: no such instruction: vextracti128 $0x1,%ymm0,16(%rax)' /tmp/cckJQjTJ.s:95880: Error: no such instruction: vinserti128 $0x1,%xmm0,%ymm1,%ymm0'
/tmp/cckJQjTJ.s:95882: Error: no such instruction: `vextracti128 $0x1,%ymm0,48(%rax)'
make: *** [kmer_counter/raduls_avx2.o] Error 1

Option to write the stdout "Stats:" summary to a JSON file?

Stats:
   No. of k-mers below min. threshold :     12041315
   No. of k-mers above max. threshold :            0
   No. of unique k-mers               :     15114589
   No. of unique counted k-mers       :      3073274
   Total no. of k-mers                :    134782293
   Total no. of reads                 :      1091283
   Total no. of super-k-mers          :     15598454

It would be great if there were a -j <stats.json> option that writes the stdout table above to a specified file in JSON format.

This would make the statistics machine-readable for pipelines, etc.
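Until such an option exists, a pipeline could recover the same numbers by parsing KMC's stdout. The following is a minimal sketch of that workaround, not part of KMC: `parse_kmc_stats` and `write_stats_json` are hypothetical names, and the sketch assumes every row of the block keeps the `label : integer` layout shown above, with the printed labels reused verbatim as JSON keys.

```python
import json
import re

# Hypothetical workaround, not part of KMC. Assumes each row of the
# "Stats:" block looks like "   <label> : <integer>", as printed by KMC.
STAT_ROW = re.compile(r"^\s*(?P<label>[^:]+?)\s*:\s*(?P<value>\d+)\s*$")


def parse_kmc_stats(stdout_text):
    """Return {label: int} for the rows following a 'Stats:' header line."""
    stats = {}
    in_block = False
    for line in stdout_text.splitlines():
        if line.strip() == "Stats:":
            in_block = True
            continue
        if not in_block:
            continue
        match = STAT_ROW.match(line)
        if match is None:
            if stats:
                break  # first non-matching line after the rows ends the block
            continue  # tolerate blank lines directly after the header
        stats[match.group("label")] = int(match.group("value"))
    return stats


def write_stats_json(stdout_text, path):
    """Write the parsed stats to `path` as indented JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(parse_kmc_stats(stdout_text), handle, indent=2)


if __name__ == "__main__":
    sample = (
        "Stats:\n"
        "   No. of unique k-mers               :     15114589\n"
        "   Total no. of reads                 :      1091283\n"
    )
    print(json.dumps(parse_kmc_stats(sample)))
    # prints {"No. of unique k-mers": 15114589, "Total no. of reads": 1091283}
```

In a pipeline, KMC's stdout could be captured with `subprocess.run([...], capture_output=True, text=True)` and its `.stdout` passed to `write_stats_json`. Parsing text output is brittle if the table layout changes between releases, which is a good argument for a native JSON option.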
