Comments (6)

marekkokot commented on July 22, 2024

Hello,

Thanks for reporting that issue.
Are your input files publicly available? If so, could you point me to where I can get them? If not, could you at least specify the size of the input files?
On your short example, KMC works fine on my machine.
By "stats table", do you mean something like this:

1st stage: 0.390857s
2nd stage: 1.96024s
Total    : 2.3511s
Tmp size : 0MB

Stats:
   No. of k-mers below min. threshold :          262
   No. of k-mers above max. threshold :            0
   No. of unique k-mers               :          262
   No. of unique counted k-mers       :            0
   Total no. of k-mers                :          262
   Total no. of reads                 :            2
   Total no. of super-k-mers          :           37

Or do you mean some other "stats table" (which one)? I am asking because this table should be printed after stage 2 finishes.
By the way, KMC usually does not need that much memory unless you have really big files.

from kmc.

flopezo commented on July 22, 2024

Hi Marek,

The input files are not publicly available. The input includes 211,646,643 interleaved read pairs, and the size of the gzipped FASTQ file is approximately 36GB.

Yes, I meant the table printed after finishing stage 2. I used this command kmc3 -k21 -ci2 -t12 -v bfc-corrected.fastq.gz bfc-corrected_kmc3 ./tmp, and the output is below:

******* Stage 1 configuration: *******

No. of bins                  : 512
Bin part size                : 65536
Input buffer size            : 16777216

No. of readers               : 1
No. of splitters             : 11

Max. mem. size               : 12000MB
Max. mem. per storer         :  6088MB
Max. mem. for single package :    23MB

Max. mem. for PMM (bin parts):  9367MB
Max. mem. for PMM (FASTQ)    :  1819MB
Max. mem. for PMM (reads)    :     2MB
Max. mem. for PMM (b. reader):   805MB

Stage 1: 100%

******* Stage 2 configuration: *******
No. of threads               : 12

Max. mem. for 2nd stage      :    16MB

Stage 2: 100%
1st stage: 586.464s
2nd stage: 1.68055s
Total    : 588.144s
Tmp size : 0MB

Stats:
   No. of k-mers below min. threshold :            0
   No. of k-mers above max. threshold :            0
   No. of unique k-mers               :            0
   No. of unique counted k-mers       :            0
   Total no. of k-mers                :            0
   Total no. of reads                 :            0
   Total no. of super-k-mers          :            0

I usually use the default setting for memory (-m12), but I was trying different settings and forgot to change that one. Thank you.

marekkokot commented on July 22, 2024

Oh, OK, you mean 0s as zeros, not as zero seconds :)
Hmm, for now, I don't know the reason for this behavior :(
Do you compile KMC yourself from the latest commit on GitHub, or do you use our precompiled version?
I will try to generate some files to reproduce this behavior, but if you notice it again on public files, or on a smaller example that you could send me, it would be really helpful.

flopezo commented on July 22, 2024

I tried with both KMC v2 installed with Conda and your precompiled KMC v3. I don't understand what the problem might be. KMC works fine when I use the raw FASTQ files, that is, without prior error correction with BFC. I will send you an e-mail and attach a few thousand reads.

marekkokot commented on July 22, 2024

Hi,
OK, I know the reason. There are tabs ('\t') in the reads' headers. I am not sure if the FASTQ file format allows tabs in headers; you are the first person to report a problem related to tabs in headers (even the first published version of KMC assumes no tabs in headers).

I am not sure if we should allow tabs. What do you think? On the other hand, at first look it seems that only a small change is required in the KMC code, so maybe I will do it in the next couple of days.
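
For anyone who wants to check their own input for the same problem, here is a minimal sketch that counts header lines containing a tab. It assumes plain four-line FASTQ records (header, sequence, '+', qualities) with no wrapped sequences. The function names are made up for illustration and are not part of KMC or BFC.

```python
import gzip
from typing import Iterable


def count_tab_headers(lines: Iterable[str]) -> int:
    """Count FASTQ header lines that contain a tab character.

    Assumes four-line records, so headers sit at line indexes 0, 4, 8, ...
    """
    return sum(1 for i, line in enumerate(lines) if i % 4 == 0 and "\t" in line)


def count_tab_headers_in_file(path: str) -> int:
    """Same check for a gzipped FASTQ file (hypothetical helper)."""
    # Text mode ("rt") makes gzip.open yield decoded str lines.
    with gzip.open(path, "rt") as handle:
        return count_tab_headers(handle)
```

A non-zero count on a file such as bfc-corrected.fastq.gz would point to the situation described above.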

Anyway thanks for reporting that bug and using KMC.

flopezo commented on July 22, 2024

Thank you for your help! I also think that FASTQ headers are not supposed to contain tabs. At least in Illumina reads, a space should precede the read number element. I have read that the format of BFC-corrected reads might cause parsing problems in other tools, such as SGA and khmer.

However, BFC-corrected reads have been used with various genome assemblers, and I have used them without any prior reformatting in assemblies with SPAdes.

Anyway, I will replace tabs with spaces and try again.
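
One possible way to do that replacement is sketched below. It assumes four-line FASTQ records and rewrites only the header lines, leaving sequence and quality lines untouched. The function names and file paths are hypothetical.

```python
import gzip
from typing import Iterable, Iterator


def replace_header_tabs(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ lines with tabs in header lines replaced by spaces.

    Assumes four-line records, so headers sit at line indexes 0, 4, 8, ...
    """
    for i, line in enumerate(lines):
        yield line.replace("\t", " ") if i % 4 == 0 else line


def rewrite_fastq(src_path: str, dst_path: str) -> None:
    """Copy a gzipped FASTQ file, fixing its headers (hypothetical helper)."""
    with gzip.open(src_path, "rt") as src, gzip.open(dst_path, "wt") as dst:
        dst.writelines(replace_header_tabs(src))
```

Calling rewrite_fastq("bfc-corrected.fastq.gz", "bfc-corrected.notabs.fastq.gz") would produce a copy that can then be passed to KMC; the output file name here is only an example.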
