Comments (6)
Hello,
Thanks for reporting that issue.
Are your input files publicly available? If yes, could you point me to where I can get them? If not, could you at least specify the size of the input files?
On your short example KMC works fine on my machine.
By "stats table", do you mean something like this:
1st stage: 0.390857s
2nd stage: 1.96024s
Total : 2.3511s
Tmp size : 0MB
Stats:
No. of k-mers below min. threshold : 262
No. of k-mers above max. threshold : 0
No. of unique k-mers : 262
No. of unique counted k-mers : 0
Total no. of k-mers : 262
Total no. of reads : 2
Total no. of super-k-mers : 37
Or do you mean some other "stats table" (which one)? I am asking because this table should be printed after stage 2 finishes.
BTW, KMC usually does not need that much memory, unless you have really big files.
from kmc.
Hi Marek,
The input files are not publicly available. The input includes 211,646,643 interleaved read pairs, and the size of the gzipped FASTQ file is approximately 36GB.
Yes, I meant the table printed after finishing stage 2. I used this command:
kmc3 -k21 -ci2 -t12 -v bfc-corrected.fastq.gz bfc-corrected_kmc3 ./tmp
The output is below:
******* Stage 1 configuration: *******
No. of bins : 512
Bin part size : 65536
Input buffer size : 16777216
No. of readers : 1
No. of splitters : 11
Max. mem. size : 12000MB
Max. mem. per storer : 6088MB
Max. mem. for single package : 23MB
Max. mem. for PMM (bin parts): 9367MB
Max. mem. for PMM (FASTQ) : 1819MB
Max. mem. for PMM (reads) : 2MB
Max. mem. for PMM (b. reader): 805MB
Stage 1: 100%
******* Stage 2 configuration: *******
No. of threads : 12
Max. mem. for 2nd stage : 16MB
Stage 2: 100%
1st stage: 586.464s
2nd stage: 1.68055s
Total : 588.144s
Tmp size : 0MB
Stats:
No. of k-mers below min. threshold : 0
No. of k-mers above max. threshold : 0
No. of unique k-mers : 0
No. of unique counted k-mers : 0
Total no. of k-mers : 0
Total no. of reads : 0
Total no. of super-k-mers : 0
I usually use the default setting for memory (-m12), but I was trying different settings and forgot to change that one. Thank you.
from kmc.
Oh, OK, you mean 0s as zeros, not as zero seconds :)
Hmm, for now, I don't know the reason for this behavior :(
Did you compile KMC yourself from the latest commit on GitHub, or are you using our precompiled version?
I will try to generate some files to reproduce this behavior, but if you notice it again on public files, or on a smaller example that you could send me, it would be really helpful.
from kmc.
I tried with both KMC v2 installed with Conda and your KMC v3 pre-compiled version. I don't understand what might be the problem. KMC works fine when I use the raw FASTQ files; that is, without prior error correction with BFC. I will send you an e-mail and attach a few thousand reads.
from kmc.
Hi,
OK, I know the reason. There are tabs ('\t') in the read headers. I am not sure whether the FASTQ format allows tabs in headers; you are the first person to report a problem related to tabs in headers (KMC has assumed no tabs in headers since its first published version).
I am not sure if we should allow tabs, what do you think? On the other hand, at first look it seems that only a small change is required in the KMC code, so maybe I will do it in the next couple of days.
Anyway thanks for reporting that bug and using KMC.
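For anyone who wants to check whether their own file is affected, here is a minimal Python sketch that counts header lines containing a tab. It assumes strict four-line FASTQ records (no wrapped sequences); the function name count_tabbed_headers and the sample reads are made up for illustration and are not part of KMC or BFC.

```python
import io

def count_tabbed_headers(lines):
    """Count FASTQ header lines (every 4th line, starting with the first) that contain a tab."""
    return sum(1 for i, line in enumerate(lines) if i % 4 == 0 and "\t" in line)

# Two made-up records: the first header contains a tab, the second only a space.
sample = io.StringIO(
    "@read1\tsome_tag\nACGT\n+\nIIII\n"
    "@read2 1:N:0\nACGT\n+\nIIII\n"
)
tabbed = count_tabbed_headers(sample)
print(tabbed)
```

For a gzipped file, the same function can be given a handle opened with gzip.open(path, "rt") from the standard library.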
from kmc.
Thank you for your help! I also think that FASTQ headers are not supposed to have tabs. At least in Illumina reads, a space should precede the read number element. I have read that the format of reads corrected with BFC might cause parsing problems in other tools, such as SGA and khmer.
However, BFC-corrected reads have been used with various genome assemblers, and I have used them without any prior reformatting in assemblies with SPAdes.
Anyway, I will replace tabs with spaces and try again.
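One possible way to do that replacement is sketched below in Python. It assumes four-line FASTQ records and only touches header lines; the function name detab_headers and the sample read are hypothetical, and this is not necessarily how the reads in this thread were reformatted.

```python
import io

def detab_headers(src, dst):
    """Copy FASTQ lines from src to dst, replacing tabs with spaces in header lines only."""
    for i, line in enumerate(src):
        if i % 4 == 0:  # header line of a four-line record
            line = line.replace("\t", " ")
        dst.write(line)

# Made-up record with a tab in the header.
src = io.StringIO("@read1\tsome_tag\nACGT\n+\nIIII\n")
dst = io.StringIO()
detab_headers(src, dst)
result = dst.getvalue()
print(repr(result))
```

For gzipped input and output, src and dst could be handles from gzip.open(in_path, "rt") and gzip.open(out_path, "wt"), where in_path and out_path are placeholders for the real file names.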
from kmc.