bbmap's Issues

Can bbmap be used for mapping short contigs to reference genome?

Hi,

I am trying to extract viral information from a metagenome. However, 95% of the assembled viral contigs are shorter than 10 kb, so I am thinking of mapping the short contigs (<10 kb) to the reference viral genome to obtain longer viral contigs. Can I do this with bbmap.sh?

Thank you!

Ning

Ubuntu 14.04 install

Hi, is this the best repo/docs to try to install BBMap on an Ubuntu 14.04 machine? Thx

Tile comparison in clumpify

Hi,

I have some questions regarding this code to make sure I understood it correctly.

For a given read with QNAME xxx:yyy:zz:6:1205:5221:72504, following the naming conventions for NovaSeq 6000: lane=6, surface=1, swath=2, tile=05, x=5221, y=72504.
I think you use 1205 as 'tile' in your code. Is this correct?

If so, for another read with QNAME ..:6:1305:5000:5000, you would then define the tile as 1305. When you compute

double d=Tools.absdif((double)a[i], (double)b[i]);

you don't use absDif from ClusterTools,

static final float absDif(float[] a, float[] b){

but absdif from the shared Tools class,

public static int absdif(int a, int b) {

so it would return 100. So if two reads are on the same tile (05) but on different swaths (2, 3), it would not continue. Thus it will only treat duplicates from adjacent tiles as cluster duplicates, not duplicates from adjacent swaths. Is this correct?
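To make the arithmetic in the question concrete, here is a minimal Python sketch. It assumes the surface/swath/tile decomposition described above (first digit, second digit, last two digits of the tile field); the function names are hypothetical and this is not the BBTools code, only an illustration of why the two example reads differ by 100.

```python
def parse_qname(qname: str) -> dict:
    """Take lane, tile field, x and y from the last four colon-separated fields."""
    lane, tile_field, x, y = (int(f) for f in qname.split(":")[-4:])
    return {"lane": lane, "tile_field": tile_field, "x": x, "y": y}


def split_tile_field(tile_field: int) -> dict:
    """Split a 4-digit tile field (assumed layout: surface, swath, 2-digit tile)."""
    return {
        "surface": tile_field // 1000,      # 1205 -> 1
        "swath": (tile_field // 100) % 10,  # 1205 -> 2
        "tile": tile_field % 100,           # 1205 -> 5
    }


def absdif(a: int, b: int) -> int:
    """Absolute difference of two integers."""
    return abs(a - b)


a = parse_qname("xxx:yyy:zz:6:1205:5221:72504")
b = parse_qname("..:6:1305:5000:5000")

# Comparing the raw 4-digit fields gives 100 for neighbouring swaths.
print(absdif(a["tile_field"], b["tile_field"]))
# Comparing the decomposed fields shows same tile (5), swaths 2 and 3.
print(split_tile_field(a["tile_field"]), split_tile_field(b["tile_field"]))
```

With the raw field, neighbouring swaths on the same tile number are 100 apart, while neighbouring tiles in the same swath are 1 apart, which is the behaviour the question asks about.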

Thank you!

Cannot run reformat.sh for subsample WGS

Hi, I'm having a problem with the reformat.sh

OS=Windows 10
Java=java version "1.8.0_221"
Running= Windows Power Shell ISE

This is the command I run for assemblystats and the results:

PS D:> java -cp /Tesis/bbmap/current jgi.AssemblyStats2 in=/Tesis/bbmap/Anomolachesilla/Anpal_R1.fastq.gz
A C G T N IUPAC Other GC GC_stdev
0.3265 0.1718 0.1740 0.3277 0.0000 0.0000 0.0000 0.3458 0.1026

Main genome scaffold total: 118978742
Main genome contig total: 118978742
Main genome scaffold sequence total: 17582.156 MB
Main genome contig sequence total: 17582.006 MB 0.001% gap
Main genome scaffold N/L50: 70607581/150
Main genome contig N/L50: 70545810/150
Main genome scaffold N/L90: 70607581/150
Main genome contig N/L90: 70545810/150
Max scaffold length: 150
Max contig length: 150
Number of scaffolds > 50 KB: 0
% main genome in scaffolds > 50 KB: 0.00%

Minimum     Number         Number         Total            Total            Scaffold
Scaffold    of             of             Scaffold         Contig           Contig
Length      Scaffolds      Contigs        Length           Length           Coverage

All         118,978,742    118,978,742    17,582,154,806   17,582,005,997   100.00%
50          118,856,685    118,856,685    17,576,837,747   17,576,726,133   100.00%
100         117,054,982    117,054,982    17,437,127,756   17,437,017,664   100.00%

But when I try to run reformat.sh with my sequences, I get this:

PS D:> java -cp /Tesis/bbmap/reformat.sh in1=/Tesis/bbmap/Anomolachesilla/Anpal_R1.fastq.gz in2=/Tesis/bbmap/Anomolachesilla/Anpal_R2.fastq.gz out1=/Tesis/bbmap/Anomolachesilla/Anpal_1.fastq.gz out2=/Tesis/bbmap/Anomolachesilla/Anpal_2.fastq.gz samplerate=0.1 int=t
java : Error: Cannot find or load the principal class in1=.Tesis.bbmap.Anomolachesilla.Anpal_R1.fastq.gz
Line: 1 Character: 1

+ java -cp /Tesis/bbmap/reformat.sh in1=/Tesis/bbmap/Anomolachesilla/An ...
    + CategoryInfo          : NotSpecified: (Error: no se ha...pal_R1.fastq.gz:String) [], RemoteException
    + FullyQualifiedErrorId : NativeCommandError

I have already tried:

java -cp ./Tesis/bbmap/reformat.sh in1=./Tesis/bbmap/Anomolachesilla/Anpal_R1.fastq.gz in2=./Tesis/bbmap/Anomolachesilla/Anpal_R2.fastq.gz out1=./Tesis/bbmap/Anomolachesilla/Anpal_1.fastq.gz out2=./Tesis/bbmap/Anomolachesilla/Anpal_2.fastq.gz samplerate=0.1 int=t

And other variants, such as .fq.gz file names.

Thanks for the help
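One difference between the two commands above is visible in the logs: the working AssemblyStats2 call passes the `current` directory to `-cp` and then a class name, while the failing call passes the path of reformat.sh to `-cp`, so Java treats the next argument (`in1=...`) as the class to load, which matches the error message. The Python sketch below only builds the argument list in the same shape as the working call. The class name `jgi.ReformatReads` is an assumption about what reformat.sh launches (check the last lines of the script to confirm), and the helper name is hypothetical.

```python
def bbtools_java_command(classpath, main_class, **params):
    """Build an argv list shaped like: java -cp <classpath> <main class> key=value ..."""
    argv = ["java", "-cp", classpath, main_class]
    argv += [f"{key}={value}" for key, value in params.items()]
    return argv


cmd = bbtools_java_command(
    "/Tesis/bbmap/current",  # same classpath as the working AssemblyStats2 call
    "jgi.ReformatReads",     # ASSUMPTION: class started by reformat.sh
    in1="/Tesis/bbmap/Anomolachesilla/Anpal_R1.fastq.gz",
    in2="/Tesis/bbmap/Anomolachesilla/Anpal_R2.fastq.gz",
    out1="/Tesis/bbmap/Anomolachesilla/Anpal_1.fastq.gz",
    out2="/Tesis/bbmap/Anomolachesilla/Anpal_2.fastq.gz",
    samplerate="0.1",
)
print(" ".join(cmd))
```

The list can be handed to `subprocess.run(cmd)` once the class name has been confirmed.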

randomreads.sh simulated mates are far away

I am simulating test reads with randomreads.sh from BBMap package.

My command is

~/users/tg/hhovhannisyan/Software/bbmap/randomreads.sh \
  ref=C_glabrata_CBS138_current_chromosomes.fasta \
  out1=read1.fastq out2=read2.fastq \
  build=1 \
  length=10 \
  reads=2 \
  coverage=-1 \
  replacenoref=t \
  simplenames=t \
  seed=-1 \
  paired=t \
  metagenome=t \
  addpairnum=t

The output is:
for read1

@0_+106297_106306_ChrM_C_glabrata_CBS138 (1402899 nucleotides) 1:
AGGTTTTAAT
+
989945?446
@1
-_438159_438168_ChrJ_C_glabrata_CBS138 (1195129 nucleotides) 1:
ATATCTTCCT
+
AA?CB?@bcb

for read2 :
@0_-761178_761187_ChrM_C_glabrata_CBS138 (1402899 nucleotides) 2:
AGCAGAGAGA
+
?>64844:64
@1
+_646646_646655_ChrJ_C_glabrata_CBS138 (1195129 nucleotides) 2:
TGCCAGTTTC
+
??B@A?><@>

As far as I understand, read @0 from read1 and read @0 from read2 form a read pair. However, they are very far away from each other. Is this normal behaviour for the software?
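To put a number on "far away", the coordinates embedded in the two @0 headers can be compared. This is a rough Python sketch: the regular expression is an assumption based only on the headers as pasted here (strand sign, start, stop, contig name), and the pasted headers may have lost characters, so the pattern tolerates a missing underscore after the strand sign.

```python
import re

# ASSUMED layout, inferred from the pasted headers: _<strand>[_]<start>_<stop>_<contig>
COORDS = re.compile(r"_([+-])_?(\d+)_(\d+)_(\S+)")


def parse_header(header):
    """Return (strand, start, stop, contig) from a simulated read header."""
    match = COORDS.search(header)
    if match is None:
        raise ValueError(f"no coordinates found in {header!r}")
    strand, start, stop, contig = match.groups()
    return strand, int(start), int(stop), contig


r1 = parse_header("@0_+106297_106306_ChrM_C_glabrata_CBS138 (1402899 nucleotides) 1:")
r2 = parse_header("@0_-761178_761187_ChrM_C_glabrata_CBS138 (1402899 nucleotides) 2:")

same_contig = r1[3] == r2[3]
start_distance = abs(r2[1] - r1[1])
print(same_contig, start_distance)
```

For the pair shown, both mates are on the same contig and their start positions are 654,881 bases apart, far more than the 10-base read length.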

Thanks

Can this software create an abundance table?

Hi Brian, I have used your software for many years and was wondering whether it has this capability.

I have a directory of files (48 files, 4 groups each containing 12 samples) containing the output from diamond. I used diamond to perform a blastx search of reads against a protein database. All I want to be able to do is generate counts for each occurrence of each gene identified per sample and create an abundance table containing these counts for all samples.

3TFCDRXX:2:1101:19180:1172	BAC0211|mdtB/yegN|sp|P76398|MDTB_ECOLI	52.0	50	24	0	1	150	741	790	1.2e-08	50.1
A00484:57:H3TFCDRXX:2:1101:9598:1204	BAC0316|pstB|sp|P0AAH0|PSTB_ECOLI	72.9	48	13	0	150	7	155	202	9.2e-17	77.0
A00484:57:H3TFCDRXX:2:1101:29939:1611	BAC0619|copA|tr|Q7WYH1|Q7WYH1_PSEPU	65.2	46	16	0	6	143	132	177	3.3e-16	75.1
A00484:57:H3TFCDRXX:2:1101:29595:1642	BAC0211|mdtB/yegN|sp|P76398|MDTB_ECOLI	85.2	27	4	0	3	83	889	915	4.5e-08	48.1
A00484:57:H3TFCDRXX:2:1101:16242:1689	BAC0467|zraR/hydH|sp|P14375|ZRAR_ECOLI	58.3	48	20	0	146	3	284	331	1.7e-10	56.2
A00484:57:H3TFCDRXX:2:1101:9516:1752	BAC0646|mdtB|tr|D0ZND9|D0ZND9_SALT1	64.5	31	11	0	58	150	42	72	5.1e-07	44.7

Above is what the file looks like. It basically has the sequence ID extracted from a .fasta file and its corresponding blastx hit.

A buddy created a simple script to do this:

cut -f 2 diamond_output.tab > diamond_output_ids
Then:
sort diamond_output_ids | uniq -c | sort -n > ids_counts

The output of ids_counts looks like this (the numbers represent the number of times the gene was observed):

86 BAC0269|nia|tr|Q92Z60|Q92Z60_RHIME
  87 BAC0504|farB|tr|Q9RQ29|Q9RQ29_NEIGO
  88 BAC0078|copA|sp|O32220|COPA_BACSU
  89 BAC0487|pmrA|sp|Q70FH0|PMRA_PECSS

But I feel like BBMap could do this far more efficiently than doing each sample serially. Can BBMap help me?
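For reference, the cut/sort/uniq approach above can be extended to all samples at once with a short script. The sketch below is plain Python, not a BBTools feature: it counts the second tab-separated column of each DIAMOND output file and merges the counts into one gene-by-sample table. The function names and the use of the file name stem as the sample name are assumptions made for illustration.

```python
import csv
from collections import Counter
from pathlib import Path


def count_hits(path):
    """Count occurrences of each subject ID (column 2) in one tabular DIAMOND file."""
    counts = Counter()
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) > 1:
                counts[row[1]] += 1
    return counts


def abundance_table(paths):
    """Return rows: a header (gene + sample names), then one row of counts per gene."""
    per_sample = {Path(p).stem: count_hits(p) for p in paths}  # sample = file stem
    samples = sorted(per_sample)
    genes = sorted(set().union(*per_sample.values()))
    rows = [["gene"] + samples]
    for gene in genes:
        # Counter returns 0 for genes that were not seen in a sample.
        rows.append([gene] + [per_sample[s][gene] for s in samples])
    return rows


def write_table(rows, out_path):
    """Write the table as tab-separated text."""
    with open(out_path, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t").writerows(rows)
```

A typical call would be `write_table(abundance_table(sorted(Path("diamond_out").glob("*.tab"))), "abundance.tsv")`, where the directory name is a placeholder. Like the shell version, this counts every hit line, so a read with several hits contributes more than once.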

bbmap repair reads renaming error

Hi,
I noticed that when bbmap repair renames reads by removing /1 or /2 from the end of the read names, it also removes /1 or /2 from the quality score line if that line ends with /1 or /2, which is really annoying. Is it possible to control this behaviour, or could a fix be provided?
Regards,
Laure

How to avoid interleaved output with ecco=t?

I am doing overlap-based error correction with bbmerge and I want only the error-corrected Read 1 reads. I have set ecco=t, mix=f, merge=f because I want to keep reads that are mergeable (have extensive overlap), but I don't want to merge them. If I provide only one output file ('out'), I get an interleaved file. If I also provide 'out2' to try to separate Read 1 and Read 2, the output files aren't even created (neither read files nor histogram) even though the insert size/stats are all reported. Removing 'merge=f' doesn't affect the outcome. How can I get the mergeable (but not merged), error-corrected Read 1 reads in a non-interleaved file?
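As a possible workaround outside BBMerge (not an answer about its options), the interleaved output can be post-processed to keep only Read 1. The sketch below assumes a strictly interleaved FASTQ with four lines per record and no line wrapping; the function name is hypothetical.

```python
from itertools import islice


def read1_records(lines):
    """Yield the 4-line Read 1 record from each 8-line pair of an interleaved FASTQ."""
    it = iter(lines)
    while True:
        pair = list(islice(it, 8))  # Read 1 (4 lines) followed by Read 2 (4 lines)
        if not pair:
            return
        if len(pair) != 8:
            raise ValueError("input does not end on a complete read pair")
        yield pair[:4]


# Usage sketch (file names are placeholders):
# with open("ecco_interleaved.fastq") as src, open("ecco_R1.fastq", "w") as dst:
#     for record in read1_records(src):
#         dst.writelines(record)
```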

Feature request

FilterReadsByName names=headers.txt should work with "stdin" like in= and out= do, as well. Please and thank you!

My reason is selfish. I want an elegant pipe to split a fastq into two mutually-exclusive halves:
reformat.sh interleaved=FALSE in=whole_file.fastq out=stdout overwrite=true samplerate=0.5 \
| tee half1.fastq \
| grep ${HEADER} \
| cut -d "@" -f 2 \
| filterbyname.sh in=whole_file.fastq out=half2.fastq overwrite=true names=stdin include=false

thanks in advance. Feel free to ignore ;)
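Until something like that exists, the same split can be done in one pass without a names file. This is a minimal Python sketch, not part of BBTools: each 4-line record goes to the first file with probability `rate` and to the second file otherwise, so the two outputs are mutually exclusive and together contain every read. As with samplerate, the split is random, so the halves are only approximately equal in size. The function and parameter names are hypothetical.

```python
import random


def split_fastq(in_path, out1_path, out2_path, rate=0.5, seed=0):
    """Randomly route each 4-line FASTQ record to one of two output files.

    Returns the number of records written to each output.
    """
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    n1 = n2 = 0
    with open(in_path) as src, open(out1_path, "w") as out1, open(out2_path, "w") as out2:
        while True:
            record = [src.readline() for _ in range(4)]
            if not record[0]:  # end of file
                break
            if rng.random() < rate:
                out1.writelines(record)
                n1 += 1
            else:
                out2.writelines(record)
                n2 += 1
    return n1, n2
```

For example, `split_fastq("whole_file.fastq", "half1.fastq", "half2.fastq")` writes the two halves named in the pipe above.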

threads crashing

Hi,

repeatedly getting this error when using:

mapPacBio.sh ref=in.fa nodisk=t ambig=all local=t maxindel=50 nzo=t outm=mapped.fa in=reads.fa nodisk=t

I'm mapping an assembled ~4Mbp chromosome onto another draft one of similar size.

Thanks,

Theo

java -Djava.library.path=/home/theoa/bin/bbmap/jni/ -ea -Xmx276730m -cp /home/theoa/bin/bbmap/current/ align2.BBMapPacBio build=1 overwrite=true minratio=0.40 fastareadlen=6000 ambiguous=best minscaf=100 startpad=10000 stoppad=10000 midpad=6000 ref=../chr20/chr20scafs.fa nodisk=t ambig=all local=t maxindel=50 nzo=t outm=.//drom_chr20scafs.outm.fa in=genomes/drom.fa nodisk=t
Executing align2.BBMapPacBio [build=1, overwrite=true, minratio=0.40, fastareadlen=6000, ambiguous=best, minscaf=100, startpad=10000, stoppad=10000, midpad=6000, ref=../chr20/chr20scafs.fa, nodisk=t, ambig=all, local=t, maxindel=50, nzo=t, outm=.//drom_chr20scafs.outm.fa, in=genomes/drom.fa, nodisk=t]

BBMap version 36.11
Set MINIMUM_ALIGNMENT_SCORE_RATIO to 0.400
Retaining all best sites for ambiguous mappings.
Executing dna.FastaToChromArrays2 [../chr20/chr20scafs.fa, 1, writeinthread=false, genscaffoldinfo=true, retain, waitforwriting=false, gz=true, maxlen=536670912, writechroms=false, minscaf=100, midpad=6000, startpad=10000, stoppad=10000, nodisk=true]

Set genScaffoldInfo=true
Set genome to 1

Loaded Reference: 0.001 seconds.
Loading index for chunk 1-1, build 1
Indexing threads started for block 0-1
Indexing threads finished for block 0-1
Generated Index: 2.883 seconds.
Analyzed Index: 0.798 seconds.
Started output stream: 0.024 seconds.
Cleared Memory: 0.384 seconds.
Processing reads in single-ended mode.
Started read stream.
Started 46 mapping threads.
Exception in thread "Thread-48" java.lang.AssertionError: -22, -22
259420, 1,0,17646635,17654197,0,00,293,97148,404340,0,365546,,mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmSmSmSSSSmmSSSSmSSSmSSSSSSSmSSmmSSSSmSSSmmSmSSSmSSSSmmSSSSSmSSSSSSSSSSSmSSSmSSSSSSmmSSSSmSmSmSSSmmSSmSSSmmSSSSSSSSmSSmmSSSSSSSmmSmSmSmSSSmmSSSSmSSmmSSSmSSSSmSSmmSmSSSmSSSSSSSmmSSmSSSSmSSSSmmSSSSSSSSSmSmSSSSSSSSSSmmmmmmSSSmSSSSSSSmSSSSSSSSmmSSmSSSSSmmSSSmSSSSmSmSSmSSSSSSSmSSSSSSSSSSSSmSSmSSSSmSSmmmSmmmSSSSmSSSSSSmmSSSSSSSSSSSSSSmSSSmmSSSSSSSmmSSSmSSmSSSSmSSmSSSSSmSmSmSSSmmSSSmSSSSSmSmSSmSmSSmSmSSSSmSSmSSSSmSmSSSSSmmSSSSSmSmSSmSSSSmSSmSmSmSmmmSmSSSSSmSSSmSSSmSmSSSSSSSSmmSSmSSSSSmSSSSSSSSmSSSSSmSSSSSSSmSSSmSmSSmSSSSSmSSSmSSSSmSSSSSSmSSSSSSSSSSSSSSSmSSSmSSSSSSmmSSSSSSSSSSSSSmSSSSSSSmmSSSSSmSmSmSSmmSmSSSSSmSSSSSmmSSmSSSSSmSSSSSmSSSSSSSSSSSmSSSSSSSSSmSmSSSSSSSSSSSSSmSSSmSSmSSSSSSSSSSSSSmSSSmSmmSSSSmmSSmmSSSSSmSSmmSSSSSSSSmSSSSSSSmSSSSSSSmmSSmmmmSmSSSmSSSSmmSSmSmSSSSmSSSSSSmmmSmmmmSSmmSmSmSSmSSSSSSSmSSSSSSSSSSmSSSSSSmSSmSSmSmSSSSSSmmmSSSSSSSSmSmmSSSSSSSSSSSSmSSSSSSSmmSSSSSSSSmmSSSSSSSSSmSSmSSSSSmSSmSSSSSSSSSmmSSmSSSSmmSSSmSSmSSSSmSSmSSSmmSSSmmSmSSSSSSSSmmSSSmSSSSmSSSmSSSmSSmmmSSSmmSSSmmSSSmmSSSSSSSSSSmmSSSSSmSmSSSSSSSmSmSmSmSmSSSmSSmSmmSmSmSSmSmSSmSSmSSmSSSSmSSSSmSSSSSSSSSmSSSSSSSSSSmSmmSmSmmSSSSSmSSSSSmSmSSSSSSmSmSSSSmmSSmSSSSmSSSSmmSSSSSmmSSSSmSSSSmSSSSSSmSSSSSSmSSmSSmmSmSSmSSSSSSSSSSSSSmSmSSmmSSSSmSSSmSmmSSSmmSSSSmSmSSmmmSSSSmmmSmmSSSSSSSSSSSSmSmSSmSSSSSSSSSmmSSSSSmSSSmmmSmmSmmmSmSmSSmSSSmmSSSSSSSmSSSSSSSmSSmmmSSSSmSmSSSSmmSSSSSmSSSSSmSSSSmSSSmSSSSmmSSSSmSSmSmSSSSSSmSSmSSSSSmSSmSSSmmmmSmSSSSmSSSSSSSSSmSSmSSSSSSSSSSSSSmSSSSSmSSSSmSmSSSSSSSSSSSSSSSSSSSSSmSSSSSSSSSSSSSSSSSSmSmSSSmSSSSmSSSmSmSmSSSSmSSSSSSSSmSmSSSSmSmSSSmmmSSSmSSmSSmSSSSmmSSSSSmSSSmmSSmSSSSSSmSSSSSSSSSSSSSSSSSSSSmSSSSSmmmSSmSSSmmmSSmSSSmmSSmSmSSmSSmmSSmSmSSSSSSSSmSSSSSSSSNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNmSSSSSSSSSmSSSSSSSmmSSmSmSSSmSSSSmmmmSSSSSSSSSSSSmSSSSSSmmSSmSSSSSSSSSSSSmSSSmmSSSSmmmSSmSSSSmSSSmSmSSmmmSmSSSmmmSSSSSSmmSSSSSSSmSSSSSSSSSSSSSSSSSSSSSSmSmSSSmmSmSSSSmSmmSSSSSSSSSSmSSSSmSSSmSSSSSSSmSSSmSSSSSSSSSSSmSSSSmmSSSSSSSmSSSSSSSSSSSSmmSSmSSSSmSSSmSSmSmmmSSmSSmSmmSSSmSSSSmmSSSSSSmmSSmSmSSSSmmSmSmSSmmSSSSSSSmSmmSSSmSmSSSSSmmSSSSmmSSmSSSSmSSSSmSSmmSSSmmSmSSSSmmSSSSSSSmSSmmSSmmSSSSSSSSSSSSSSSSSSSSSSSSSSSSmSmmSmSmmmSSSmSmSmSmSSSSSmSSSSmSSSSSmSmmSSSSSmmSmmSmSSSmSSmSmSmSSSmSmSSmmmSSSSSSmmmSSSSSSSSmmSmSSSSSSSmSmmSSSSSSmmSSSSSSmSmSSSSSSSSSSmSmSSSmSSSmSSmmSmSSSmSmSSSSSSSmmmSSSmSSSmSSSmSSmSSSSSSSSSSSSSSSmmSSmSSmSSmSmSSSSSSSSSSSSmSmSSmmSSSSSmmmmSSSSSSSSSmSmSSSmSSmSSSSSmSSSSSSmSmmmmSSSSmSmSSSSSSmmSSSSSSmSSSSSSmSSSSSmSmSSSSmmSSSSSSmSmSmSmSSSSmSSSSSSmSmSSSSSmmmSSmSmSSSmmSSSSSSSSSSSSmSSSSSmSmSmmSmmSSSSSSSSSSSmSSSSSSSSSSSSSSmSmSSSSmSSSSmSSSSSSmSmSmSmSSSmmSSSmmSSSmSSSmSmSSmmmmSmSmSmSSSmSmSmSSSSmSSSSSmmSSSmmSSSmSmSSmSmSmSmmSSSSmmSSSSSSSSmSSSSSSSmmSmmmSSmmSSSmSmSSSSSmSSmSSSmSmSSmSSSSSSmSmSSmSSSSSmmSmSSSSSmSmSSSmmSSSSSmSSmSSmmSSmmSmmSSSSSmSSmSmmmSSSSSSSmSSmSmSSSSSSSSSmSmSSmSSSSSSmSSSSSSSmSSSmSmmSSSSSSSSSSSSmmSmSmmmmSmmSSmSmmSSSmmSmSSSSSSmmmSSmSmSSSSSSSSSmSSmSSSSSmSmSSmmmSSSmmmSSSmSmmSmSSSSmSSSSSSSmmSSSSmSmmSSSSmSSSmSmSSSSSSmSSSSmSSSSmSSmSSSSSSSmSSSSmSmSSmmSSSSmSSSmSSSmSSSmSSmSSmmSSSSmSSSSmSSSSSSmSSSmSSSSSSmmmSSSSSmSSSSSSSSSSmmSSSSSSSSSSSSSSSSSSSSSmSSSmSSSmmSSmSSSSSmSSSmmSSSSSSmmSSSSSSmSSSSSSmSSmmSmmSSSSSSSSmmSSSSmmmmSSSSSSmSSSSSmmSSSmmmSmSSSSSmSmSSSSSmSSmSSmSmmSmmSSSSSmSSSSSSSSSSSSmSSmSSSSSSSSSSSSSSSSSSSmSSSSSSSSSSmmSSSSSmSSSSSSmmmSSSSSSSSmSmSSSSmSmmSSSSSSSSmSSSSSSSmmSSmSSmSSSSSSSmmSSmSmmSSSSSSSmSSmSmmSSSSSmSSmSSSSSSmmmmSSSSSSSSSSSmmSSSSSSmSSSmSSSSSSSSSSmSSmmSSSSSmSSSSSSSmSSSmSSSSmSSSSSSSSSSSmSmmSSSSSSmSSSSmSSSSSmSSSSSSSSSSmSmSSSSSmmSSSSmSSmmmmSmSSmmmSSSSmSSSSmmmSSmmmSSmmSSmSSSSmSSSSSSSSSSmSmSmmSSSSSSmSSSm
SmSmmSSSSSSmSSmSSSSSSmSSSSmSSmSSmSSSSSSSSSmmSSmSSSSSSSSSSmSSSmSSSSSSSSSSSSmSmSmSSSSSSSSSmSSSSSmSSSSSmSSSSSSSmSSSmSSmSSSSmSSSmmSSSmSSmSSmSmSSmSSSSSSmSSSSmSSSSSSSSSSmmmSSSmSSSSSSSSSSSmSSSmSmSSSmSmSSmSmmmmSSSSmSSmmSSSmSSSmmSmSSSSSSSSSSmmSSmSSmmSmSSmmmSSSSSmSSSSSSmmSSmSSSSmSSSSSSSSSSSSSSmSSSSSSmSSSmSSSSSSmSSSmSSSSSSSmSmSSmSSSSSSSmmmSSSmSSmSSSSSSSSSSSSmmSSSSSSmSmSSSmSSSSSSSSmSSSSSSmmSSSSSSSSSSSSSSSSmmSSSmSmmSSSSmSSSSSmSSSmSSSmSSSSmSSmSmSSmSSSSSSSmSSSSSSSSmmSSSSSSSSmSSSSSSSSSSSSSSSSmSSSSSSSSSSSSSSSSSSSSSmSSmSSSSmSmSSmSSSSSSSmSSmSSSSSSmmSSSmSSmSSSSSmmSSSSSSSSSSSSSmSSmmmSmSSmSSmmSSSSSSSSmSSSSmSmSmSSSmSSSSSSSSSSSmSSSSSSSmSSSSSSSmmSmSmSSmmSSSSSmSSSSSmSSSSmSSSSSSSSSSmSmSSSSSSSSmSmSSSSmmmSmSSSmmSSSmSSSSSmSmSmmSSSmSSmSmmSSSmmmmmmSSmSSmSSSSmSSSSmmSSSSSSSSSmmSmSSSSmmSmSSmmSSSSSmSSSSSSmSSmSSmSmmSmSSSSmmSSmSSSSSSSSmSSSSmmSmmmSSSSSSSmmmSmSmmmSSSSmSmmmmSmmSSSmSmmmSmSSSmmSSSSmSSSSSSSSSmSSSSSmSmmSSSSSSSSSSSSmmSmSSSSSSSSSSmSSSmSSSSSmmSSSSSSSSSSSmSSSSSSSSSSSmSSmmSmSSmSSSSmSSSSSSSSSSmmmSSSSSmSmSSSSmSSSSSSSSSSmSSSSSSSSSmmmmSSSmSmmSmSSmSmSSmSSSSmmSmmmSSSSSSSSmSSmSSSSSmmSSSSSmSSmSSmSmmSSmmSSSSmmSmSSSSSSSSSSSSSSSmSSSmSSSmSSSSSSSmmSSSSmmSSmSSSmSmSSSSSmSSSmSmmSSSSSSmmSSmmmSmSSSSSSSSSSSSSSSmSSSSSmSSSSSmSSmmSSmSSmmSmmSSSSSmSSSSSmSSmmSSSSSmSSSmSSSmSSSmmSSSSmSSSSSmSmmmmSmSSSmSSSSmSSSmSmSSSmSmmSSSSSSmSSSmSSSmmSmSSSSSSSmmmSSSmSSSSSmmSSmSSSSSmSSSmmSSSSmSSmSSSSSmSSSSSmSmmSmSSmSSSmSSmSSSmSSSmSmSSSmSSSSmSSSSSSSmSmSSSSSSSmSmSSSSmSmmSmSSSmmmSSSSSSSSSSmSSmmSSSSSSSSSSSmSSmmSSSSmSSSSSSmSSSSSSSmmSSSSSmSSmSSSSSSmSSSSSSSSSSmmSSSSSmSSSSSSSSSSSSSmSmSSSmSmmmmmmmmmmSSSSmmmmSSSSSmSSSSmSSSSSSSSmSSSmSmSSSSmSSmmSmSmSSmmSmmSmSSmSmmSmmmSSmmSmSmmSmSSSSmSSSmSSSmSmSSSSSmSmSSmmSmSSmmSSmSSmmmmSSSSSSSSmSSSmSmmSSmSmSSSSmSmSmmSSSSSSSmSSSmSmSSSSSSSSSSSSSSSSSSSSSSSSmSSmSmSmSSmmSSSSmSSmmmSmSSSSSSSSmSSSmSmSmSSSSSSmSmSSSSSSSSSSSSSmmSSmSSSSSSSmmSSmmmSmSSSSSSSSmSSmSSSSmSSSSmmmSSSSSSSSSSSSSSSSSSSSSSSSmSmmSSmSSSSmSSSSSmSSSSmmSSSmSmSSSSSSmSSmSSSSmSSSSSmmSmSSmSSmmSSSSSmSSSmSmmSmSSSmSSSSmSSSSmSmSmmSSmSmSSmSSSmSSSSSSSmSmSSSSmmmmSSSSmSSSmmSSSmS
mmSSmSSSSSmSmSSSmSSSmSSmmSSSSmSSSmSSmSmSmSSSmSSmSSSSSSSSSSSSmS, AGCTACAAGGTACAAAATCAACACACAAAAATCAGTTGTCTTTCTACACATTAATAATGAAACATCAGAAAGAGACATTTTTTTAAAATCCCATTTACCATTGCACCCAAAAAAAATACCTAGGAATATATTTAATCAAGGGTGTGAAAGACCTGCACACTGAAAATCCTAAGACACTGAGGAAAGAGACTGAACAGACACAAATAAATGGAAGCTTACCCTCTGCTCACAGACAGAACGAATTAATGTTGTTAAAATGTCCATTCTACCCAAAGCAGTGCTCAAATTCAAAGCAAGCCTCGTCAAATCACTGTGCCCTCCACCAGAAAATAACAGAACATCGTGAATTGACTATACTTCAATAAAAAAAAAAAAAAAAACTTCAGTGGCATTTTTCACAGAAACAGAACAAACATTCCTAACATTCCTGTGGAAACACAAAGGACCCTGAACAGCCAAAGAGATTCTACAAAAGGGAAACAGGCTGGAGGCATCTCCCTCCCTGACTTCACACAGCATTACAAAAGCACAGTCACCAAAAAGGTGTGGAATTGGCACAACCACAGGGACATGGACCAGAGAGAGTCCAGTCTACAAACAAACCTACGGGTGTGTGGTCAGTCAATTCACAAGAAAGGAGCCAGGAACCTGCACCGGGGAAAGGACAGTCCCCTCGATGAATGGTGCTGGGAAAACTGGACACCACACGCCAAAGAAGGACGCGGCACACCGTCTGCACCACACACAGAACCTGACTAAAAACCCACAGCTTCCCACATCCGGCTATTCCTCTCCCGGGGCCTCTGAGGAGGGCCTCCCCTCAGCAGTCAGCCCCTCACGGGAAGCTTCTCCTCACTTCCCCTCACTTTCCCTTCCTCCTCACCTTCCCTCAGGGCAGCACCTCCTGTTGCGTCCTGAGGAGACTGGGCTCCTGTCAGCGTGGACCTAACCCCCTGTTGTCCCAAAGCCCCGGCAGGTCCAGAGGAGACGCGGCTCCAGTCAGGTGCGCAGGACGCCAAGTCCTCGGGCGGAACGCACGCACGGTTCCCGCTGCAGGTGGGGGAGGCGGGGCCCCGGGCCACCCCGCCCAAAAAAACGCGGACCCAGTCGCCCTCACTGCCAGTGGAGGCCCCGCCCTGACAGCGGTTCTGCTCAGTCCTTTAAACTGGAAATTCAAACTGAAATGGAGCCAAGTATCTGTAACTTCTCCAAATGCTTAGAAACAATCGACTGGAAGCCCTACGGAATGAAGTGAACGGAGACACAAACTTTAAACCTTACACCCTTGTCTACTGAGCTTTCATGGTAACCTGCCATAAATGACTTCCTCCCTCCCCGAGGCCTCGCTTTTGTTTTTAGCTGAAGGTGATATTCAAGGTGATGCTTCTGGCCATTTCAGGGAGGGAGTCAGTGTTCCTGGGTCTCTCCCCTGTATACAGGAGGTATGCAAGTTATTAAACCTGTTTGTTCTTCTCCCAGTAATCTGTCTTATTAGGGGCTGGGAGGTCTCTTCCCAGAACCTAGAAGGACAGAGGGAAAGCTGGTTTTCCTCCCACACACACGCAATGGAATTCTTCTCAGCAATAGACATTACACCATCAGCCCAGGGAAAGGCACGGGTAAGTCTTACTTGTGATGCAGGAAACCAGATGCGGGGCATGAGGGCCGTGTCCCAGGTCTCACTTGAGCCCCTGGGACGTGCCCCGCCAAGTGTGGGTCCTTGGCTTCACGCAGGAAAGAATTCAAGGGCGAGCCACAGTTGAGTGAAGGTAGATTTATTCAGAGAGATACATTGAAAGGCAAGAGAAAGGCCACGAGGGGTCGGGGTTGGGTGCTCAGATTAAAATAGGTACACATTCCATAGACAGAATGCGGGCCATCTCTGAAGAGGGAGAGAGAGAGGGGTGGCCGCGAGGTGCCATGTTGCTGGCTTTTA
TGGGCTTGGTGGCTTCATACGCTAGTAAGTGGAAGGACCAGTCTAAGGGGAAGGGGCTGGGATTCCCAGGAAGTTGGCCATTTCCCACCCTTTGACCTTTTGTGGCTAGCCTTGGGACTGCCACGGTGCCTGTGGGTGTGTTAGTCACCATGTTAATATATTACAATGGGTGTATAATGAAGCTCAAGATCTACTAGAAGTTAAATCTCCCATCATCCTGAGCCTCAAGGCCTACAGGAGGTTGAATCTTTCACCATTTTGATGTTAATTGCTGTGGCATTCCTTGAGTGGCTGTGCCCTGCCCCCTTCCTGTCTCACTTGCAAATTACTAAGTGAAAGAATCCAGTCCGAGGAGGCTGCACACAGTATGACCCTATTTACGACAATTATGGAAAAGGCAAAACTACGCACATGAGCAGATCATTGGGTGGGAGTGGGGGGGTGAAGTGTTCCATGTGAAACTTCAGCTGGTGTCACTATAAGTATGTCTAAGCCCACACCTTTGCACAGCCCAAAGGTCAGTCATAATCTATTCAAATTTAGTTGTTTAGGGTGTAGCAGCATCACAGGATGGAACATACAGGGTGACAGAATCTAACTGTATTAGAAATAGGTGGAAAAACTCTCAGCCGGTGGTGGGGCAAGAGGTGCTGGCCTAAGTGAACTTTGGAAATGCACAGAATCATTCTGCAAAGGCAGAATGGACCACTGTTAACCTGTGGGCTTTGTATTATAGAGTTCTTTCTGGGGTGTAGTGGCTAACAATTCTGAAGCCACTGTGAATGGAACCTGGAAAAGAGAAATGAGTTGATAAAGGGCAGGTGGTGGGAGCCAGGCTCCTCGCTGTGGGAATGGCAAGTCAGAGATGAGCAAGGGGAGGGCCTTGGCCCTCATACTCCTGGATCTGAGTTTCCTAGGTGGATGGCGTGGGGAGTGGACTGGCCAGGGGAAGGCACAGCTCTTTACCCTGGGAAGGCCCTGCAGATGTAGGAACCTCACACCCCTTTCTTCCCCCCATACCTGCTGGAAACACCCAGACCTTTGAAATCAGCCAATGCAGCTTTTCAGACATTCTCACCCCGAGGCAGATAAAGGAGCGTGAGGGTCTTGCACCACCCCCAAGCTGTCCCTGGAGGGTGGAGCCAGGATGGGGTAACTGGATCTCTGACCAGCAGGCCAAAGCAAGGAAGAAGACCAGTCCTGAGGGCAGGGGTCTCCTGAAGTCTGTGAGCCCGGTGCCCTGATGGGATTCATTCCCTGTGGGGACATGGGCGGGGCCTGCAGTGGTTCTGTTTTCCACCTGTGCCCCTTGAATGAGGTCAGGGCAGAACTGGACTCAGGGATCCCCTGGCAGTGTTTCAGTCTCACACAGTGGCTTTATAAAGGAACAGGAGAGTGAAGCTGAGCACCTGACAGCCACTCCTCTCTGGCTCTGAGCTGCTTCCATCAGGACCACAGTCCCCTGCATGGAGTCTGCTGTAACCCACCTGCCCCAACCTGGGCTCTCCTCTTCCTGGATCCCACTCCCAGGCCAGATGCATCTGTCTAACCACTGCATCCTCCTCTCACCTTGAGGCTGACAGGCTCCACACAGGACTGTTGTCCTGCCACTCCCCACACCCCAGGAGGCAGCCCCACCTTCCCTGGTTGCTGAGCTGAGACGCCTGGGGCTTCCCAATCCCCCACTTTCCTACTGCCCTGGTGACAGTCAGCTCTTCCTCCCAAAATCCCAGGTGGGGTTTTGCAGAAATTGACAACCAATCTAAAATTCATGTGGAAAAGGCAAGGAGCTCAGAATAGACCAAGGAATCCTGAAAAAGAAGAACAAATTTGGAGGTATGGAAAACCAGGCAGAATCCCATAGTTGGTATAAAGCCACACTAGCCTTAGTCTAGCCCTGACTCATACCCATACCTTACCCTAACCCTAAATTTAACCCATACCCCTTAACCATAACAGGTTTAAAAAAAAAAAAAAAAAAAAAGATCGTGTACTACAAG
CGTCTCTACACCAGAAACCCAGGTTTTTCAACATATTAAAACCAAGCAATGTAATATACCATAGTAACTATGTGACATAAAAACACAAGGTTATGTCAATTGATACAGAAAAAACTTTACACAAAATCCAGTGCTGTTTCATATTCATAAACACGCAAGACAGTAAAAGGAAATTTGTTCAACATGATAAAATGCATTATGCAAAACCCGTAGCAAACTTCATGCTCAATGATGAAGAACTAAAAGGTTTACTCCTGAAATCAAGAACAAGGCCATGATGCTGGCTTTCCCCACTGTTACTCAGTATTGCACTGGAATAACTGCACTGAGCAATGAGGAAAGAAGGAAGATGCAACCACATGTGAAAGGAAAAAGTAAAACGGCTTCTATTCAAAGACACAATTCACATCATCTTATATATAGACCCTCCTAAAGAATCAACTAAAGACTATTTGAGCTTCTAAACTAATTCAGCAAAAGAGTAGAACAAAACATCAATCAACAAAAATCAACTTTATTTGTAGACACTAGCAATGAATAATCTGTAAAAGATTTTAACAAAACAATTCCCTTTTAAATAATTTCAACATTAATGCAATACTTTGGGAGGAGGTTTAATAAGGATGCAAGACTGGTACACTGAAAGTTATAAAGCACTGGTGAAAGAAATACTAAGTAAATATGCAGACATCCCATGGTGATGGAAGCACTTAATCTTGTTAAGATGGCAAAGATCGCAACAGTACATGATGCAACCTACAGATTCAATGTGATTCCTGTGAAAAGTCCCAAGGCCTAAGGGAAGAAGTACACAAGCCTATCCTACCTTACATATGGAATCACAGGGGACTCCAAAGTCCCAAAACAATCTTGAAAAAGAAAATCAAATTTGGAGGCTCACATTTTCCAATTTCAAATCTTAGCACAAAGCAACAGGAAATGAAACAGTGTGATACTGGCACAAATATATACTTATAGATCTGTGGGACTGAATGGTGAGCCCAGAAATAAACCCATCTGTCTAGGATCACCTGATTTTTGACAACCGTGGGCAATATTAAACTGGGAAAAAATAATGTTTTCAACAAATGGTGCTGGGCAATTGGGTACCCAAGTGTAGAGGAATAAAACTGGGTCCTTACTTCACACCACACACAAAACTAACTCAAGATACATTAAAAAGCTTAAATTTAAAAATAAAAAGAAAAATTACAGTCAACCCTCCTTACCTGCAAGTTCCACATCCACAGAATCAACCAACTTCAAATAGGGAAAAAAACATATTTCGAGAAAGTTCCATTCCGGAAAACTTGAATTTGCTGCATGTCAGCAACTATTTCACATTACTATTTTTCAGAAATATTTACATATCATTTACATTGTATTAGGTGCTATAACTGATCTAGAAATGATTTAAAGTAAACAGGAATTTGTGCTTAGATTATTTGCAAATACTACACCATTTTATATAAAGGACTTGAGCACTGGAACCAATACTTCACAGATAAGGAAGGACAGCTGTATAAACTGTCAAAAAAAAAAATGGGGATAAATCATCTTACCTTGGATTTAGCAATTTTCTTAGATCTGACACCAAAACACAAGCAATGAAAGAAAAAAGATAATTTGCATTTCCTCAAAAGTAAAATTTGTGTGCGTCAAAGATGCTATCCAGAAAGTGAAAAGACAGCCCACAAAATCAAAGAAAATATCTGCACATTATATACCTGAAAAGAATCTTTATCAAGAATATACCAAGAACTTTTACAACTCATAAACAAAAAGACAAACAAGCCAATTTTTAAAACAGTTAAAGAACTTGAAGAGACCCTTCTCCAGACAAGACATACAAAGTCAACAAGCACATGACAATATCATTAGTCATGATGTAAATGCCAATTAGTACAATGCCATCCCCCTTCAAACACACTTAGAATGCTTACAATCAAAAACTGAATAAAAATAACAAATATTCGCAAACCTTAGAGAAACTGGAACCC
TTAAACGTTACTTGTGGAATTGGAAAATGGCACAGACTATATGGACAACAGTTTGTCATTTCCT
at align2.TranslateColorspaceRead.realign_new(TranslateColorspaceRead.java:407)
at align2.TranslateColorspaceRead.realign_new(TranslateColorspaceRead.java:638)
at align2.AbstractMapThread.genMatchStringForSite(AbstractMapThread.java:999)
at align2.AbstractMapThread.genMatchString(AbstractMapThread.java:895)
at align2.BBMapThreadPacBio.processRead(BBMapThreadPacBio.java:562)
at align2.AbstractMapThread.run(AbstractMapThread.java:495)
Detecting finished threads: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45


Warning! 1 mapping thread did not terminate normally.
Check the error log; the output may be corrupt or incomplete.
Please submit the full stderr output as a bug report, not just this message.


------------------ Results ------------------

Genome: 1
Key Length: 12
Max Indel: 50
Minimum Score Ratio: 0.4
Mapping Mode: normal
Reads Used: 362346 (2003953468 bases)

Mapping: 1904.373 seconds.
Reads/sec: 190.27
kBases/sec: 1052.29

Read 1 data: pct reads num reads pct bases num bases

mapped: 2.3202% 8407 1.9465% 39006630
unambiguous: 2.2070% 7997 1.9356% 38787559
ambiguous: 0.1132% 410 0.0109% 219071
low-Q discards: 0.4460% 1616 0.3609% 7231533

perfect best site: 0.0003% 1 0.0000% 17
semiperfect site: 0.0003% 1 0.0000% 17

Match Rate: NA NA 90.8816% 36276111
Error Rate: 62.2730% 8400 6.6342% 2648107
Sub Rate: 62.2655% 8399 2.9901% 1193520
Del Rate: 58.6700% 7914 2.2777% 909169
Ins Rate: 58.8035% 7932 1.3664% 545418
N Rate: 28.0080% 3778 2.4842% 991581
Exception in thread "main" java.lang.AssertionError:
The number of reads out does not add up to the number of reads in.
This may indicate that a mapping thread crashed.
If you submit a bug report, include the entire console output, not just this error message.
5768+2639+0+352322+1616 = 362345 != 362346
at align2.AbstractMapper.printOutput(AbstractMapper.java:1867)
at align2.BBMapPacBio.testSpeed(BBMapPacBio.java:458)
at align2.BBMapPacBio.main(BBMapPacBio.java:35)
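
The final assertion in the log above is simple bookkeeping and can be re-checked by hand. The short Python sketch below only repeats the sum printed in the message; the log does not label the individual terms, so treating them as output categories is an assumption. Two observations from the same log: 5768 + 2639 equals the 8407 mapped reads, and 1616 equals the low-Q discards.

```python
# Terms copied from the assertion message: 5768+2639+0+352322+1616 = 362345 != 362346
reads_out_terms = [5768, 2639, 0, 352322, 1616]
reads_in = 362346  # "Reads Used" in the results section

total_out = sum(reads_out_terms)
missing = reads_in - total_out

print(total_out, missing)
print(reads_out_terms[0] + reads_out_terms[1])  # compare with "mapped" in the results
```

Exactly one read is unaccounted for, which is consistent with the earlier warning that one mapping thread did not terminate normally.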

filterbytile.sh only using Xmx1876m on a large 7.4G instance

Hi, I discovered filterbytile.sh recently and I am trying it to analyse some of our data.

I wrote a DNAnexus app, and when run with default parameters, it runs with -Xmx1876m, even though the instance has 7.4G of memory. On an instance with half that memory, Xmx is set to only about 300m.

Should I specify the Xmx flag manually to something closer to the total mem amount available?
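
If setting the flag manually turns out to be the way to go, the value can be derived from the instance size rather than hard-coded. This is a minimal Python sketch under stated assumptions: the 85% share is an arbitrary choice made for illustration, not a BBTools recommendation, `physical_memory_bytes` relies on `os.sysconf` and therefore only works on Linux-like systems, and both function names are hypothetical.

```python
import os

MIB = 1024 * 1024


def physical_memory_bytes():
    """Total physical memory, via sysconf (available on Linux, not on Windows)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def suggest_xmx(total_bytes, percent=85):
    """Return a -Xmx flag using `percent` of the given memory, in whole megabytes."""
    megabytes = total_bytes * percent // 100 // MIB
    return f"-Xmx{megabytes}m"


# Example with a fixed size so the result does not depend on the machine:
print(suggest_xmx(8 * 1024 ** 3))
```

On the instance itself, `suggest_xmx(physical_memory_bytes())` gives a flag that can be appended to the filterbytile.sh command line.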

rRNA filtering for low RAM.

Hello,

I have files containing rRNAs, and I have built my own custom SILVA database to eliminate them. However, the memory requirements appear to be quite high.

I have also noticed rqcfilter.sh, and I found the riboKmers.fa.gz file while testing bbduk, but the memory needed to use it is still quite high.

What do you think is the best way to filter rRNAs from metatranscriptomes with the lowest memory use and the highest speed?

Is it better to use bbduk or rqcfilter?

Many thanks
Rick

clumpify hanging

It appears that running clumpify in an SGE job without enough memory causes an "Exception in thread" error, but clumpify doesn't die. The process just hangs and waits indefinitely for all threads. Here's the full log from one run:

java -ea -Xmx60g -Xms60g -cp /ebio/abt3_projects/software/dev/llmgqc/.snakemake/conda/72fe9c49/opt/bbmap-37.78/current/ clump.Clumpify -Xmx60g in=/tmp/global2/nyoungblut/LLMGQC_27982106400/NO08/0/1/R1.fq.gz in2=/tmp/global2/nyoungblut/LLMGQC_27982106400/NO08/0/1/R2.fq.gz out=/tmp/global2/nyoungblut/LLMGQC_27982106400/NO08/0/1/R1_dedup.fq.gz out2=/tmp/global2/nyoungblut/LLMGQC_27982106400/NO08/0/1/R2_dedup.fq.gz overwrite=t usetmpdir=t tmpdir=/tmp/global2/nyoungblut/LLMGQC_27982106400/NO08/0/1/ dedupe=t dupedist=2500 optical=t
Executing clump.Clumpify [-Xmx60g, in=/tmp/global2/nyoungblut/LLMGQC_27982106400/NO08/0/1/R1.fq.gz, in2=/tmp/global2/nyoungblut/LLMGQC_27982106400/NO08/0/1/R2.fq.gz, out=/tmp/global2/nyoungblut/LLMGQC_27982106400/NO08/0/1/R1_dedup.fq.gz, out2=/tmp/global2/nyoungblut/LLMGQC_27982106400/NO08/0/1/R2_dedup.fq.gz, overwrite=t, usetmpdir=t, tmpdir=/tmp/global2/nyoungblut/LLMGQC_27982106400/NO08/0/1/, dedupe=t, dupedist=2500, optical=t]
Version 37.78 [-Xmx60g, in=/tmp/global2/nyoungblut/LLMGQC_27982106400/NO08/0/1/R1.fq.gz, in2=/tmp/global2/nyoungblut/LLMGQC_27982106400/NO08/0/1/R2.fq.gz, out=/tmp/global2/nyoungblut/LLMGQC_27982106400/NO08/0/1/R1_dedup.fq.gz, out2=/tmp/global2/nyoungblut/LLMGQC_27982106400/NO08/0/1/R2_dedup.fq.gz, overwrite=t, usetmpdir=t, tmpdir=/tmp/global2/nyoungblut/LLMGQC_27982106400/NO08/0/1/, dedupe=t, dupedist=2500, optical=t]

Read Estimate:          10080760
Memory Estimate:        7691 MB
Memory Available:       48242 MB
Set groups to 1
Executing clump.KmerSort [in1=/tmp/global2/nyoungblut/LLMGQC_27982106400/NO08/0/1/R1.fq.gz, in2=/tmp/global2/nyoungblut/LLMGQC_27982106400/NO08/0/1/R2.fq.gz, out1=/tmp/global2/nyoungblut/LLMGQC_27982106400/NO08/0/1/R1_dedup.fq.gz, out2=/tmp/global2/nyoungblut/LLMGQC_27982106400/NO08/0/1/R2_dedup.fq.gz, groups=1, ecco=false, rename=false, shortname=f, unpair=false, repair=false, namesort=false, ow=true, dedupe=t]

Making comparator.
Made a comparator with k=31, seed=1, border=1, hashes=4
Starting cris 0.
Fetching reads.
Making fetch threads.
Starting threads.
Waiting for threads.
Exception in thread "Thread-9" Exception in thread "Thread-15" Exception in thread "Thread-12" Exception in thread "Thread-11" java.lang.AssertionError: SRR1761740.1 1 length=100
	at hiseq.FlowcellCoordinate.setFrom(FlowcellCoordinate.java:51)
	at clump.ReadKey.<init>(ReadKey.java:46)
	at clump.ReadKey.<init>(ReadKey.java:33)
	at clump.ReadKey.makeKey(ReadKey.java:23)
	at clump.KmerComparator.hash_inner(KmerComparator.java:79)
	at clump.KmerComparator.hash(KmerComparator.java:70)
	at clump.KmerComparator.hash(KmerComparator.java:66)
	at clump.KmerSort$FetchThread.run(KmerSort.java:815)
Exception in thread "Thread-10" Exception in thread "Thread-13" java.lang.AssertionError: SRR1761740.1201 1201 length=100
	at hiseq.FlowcellCoordinate.setFrom(FlowcellCoordinate.java:51)
	at clump.ReadKey.<init>(ReadKey.java:46)
	at clump.ReadKey.<init>(ReadKey.java:33)
	at clump.ReadKey.makeKey(ReadKey.java:23)
	at clump.KmerComparator.hash_inner(KmerComparator.java:79)
	at clump.KmerComparator.hash(KmerComparator.java:70)
	at clump.KmerComparator.hash(KmerComparator.java:66)
	at clump.KmerSort$FetchThread.run(KmerSort.java:815)
Exception in thread "Thread-16" java.lang.AssertionError: SRR1761740.601 601 length=31
	at hiseq.FlowcellCoordinate.setFrom(FlowcellCoordinate.java:51)
	at clump.ReadKey.<init>(ReadKey.java:46)
	at clump.ReadKey.<init>(ReadKey.java:33)
	at clump.ReadKey.makeKey(ReadKey.java:23)
	at clump.KmerComparator.hash_inner(KmerComparator.java:79)
	at clump.KmerComparator.hash(KmerComparator.java:70)
	at clump.KmerComparator.hash(KmerComparator.java:66)
	at clump.KmerSort$FetchThread.run(KmerSort.java:815)
java.lang.AssertionError: SRR1761740.1001 1001 length=29
	at hiseq.FlowcellCoordinate.setFrom(FlowcellCoordinate.java:51)
	at clump.ReadKey.<init>(ReadKey.java:46)
	at clump.ReadKey.<init>(ReadKey.java:33)
	at clump.ReadKey.makeKey(ReadKey.java:23)
	at clump.KmerComparator.hash_inner(KmerComparator.java:79)
	at clump.KmerComparator.hash(KmerComparator.java:70)
	at clump.KmerComparator.hash(KmerComparator.java:66)
	at clump.KmerSort$FetchThread.run(KmerSort.java:815)
java.lang.AssertionError: SRR1761740.801 801 length=31
	at hiseq.FlowcellCoordinate.setFrom(FlowcellCoordinate.java:51)
	at clump.ReadKey.<init>(ReadKey.java:46)
	at clump.ReadKey.<init>(ReadKey.java:33)
	at clump.ReadKey.makeKey(ReadKey.java:23)
	at clump.KmerComparator.hash_inner(KmerComparator.java:79)
	at clump.KmerComparator.hash(KmerComparator.java:70)
	at clump.KmerComparator.hash(KmerComparator.java:66)
	at clump.KmerSort$FetchThread.run(KmerSort.java:815)
Exception in thread "Thread-14" java.lang.AssertionError: SRR1761740.401 401 length=39
	at hiseq.FlowcellCoordinate.setFrom(FlowcellCoordinate.java:51)
	at clump.ReadKey.<init>(ReadKey.java:46)
	at clump.ReadKey.<init>(ReadKey.java:33)
	at clump.ReadKey.makeKey(ReadKey.java:23)
	at clump.KmerComparator.hash_inner(KmerComparator.java:79)
	at clump.KmerComparator.hash(KmerComparator.java:70)
	at clump.KmerComparator.hash(KmerComparator.java:66)
	at clump.KmerSort$FetchThread.run(KmerSort.java:815)
java.lang.AssertionError: SRR1761740.1401 1401 length=100
	at hiseq.FlowcellCoordinate.setFrom(FlowcellCoordinate.java:51)
	at clump.ReadKey.<init>(ReadKey.java:46)
	at clump.ReadKey.<init>(ReadKey.java:33)
	at clump.ReadKey.makeKey(ReadKey.java:23)
	at clump.KmerComparator.hash_inner(KmerComparator.java:79)
	at clump.KmerComparator.hash(KmerComparator.java:70)
	at clump.KmerComparator.hash(KmerComparator.java:66)
	at clump.KmerSort$FetchThread.run(KmerSort.java:815)
java.lang.AssertionError: SRR1761740.201 201 length=71
	at hiseq.FlowcellCoordinate.setFrom(FlowcellCoordinate.java:51)
	at clump.ReadKey.<init>(ReadKey.java:46)
	at clump.ReadKey.<init>(ReadKey.java:33)
	at clump.ReadKey.makeKey(ReadKey.java:23)
	at clump.KmerComparator.hash_inner(KmerComparator.java:79)
	at clump.KmerComparator.hash(KmerComparator.java:70)
	at clump.KmerComparator.hash(KmerComparator.java:66)
	at clump.KmerSort$FetchThread.run(KmerSort.java:815)
Fetch time: 	0.080 seconds.
Closing input stream.
Combining thread output.
Combine time: 	0.000 seconds.
Exception in thread "main" java.lang.AssertionError: 0, 3200, true
	at clump.KmerSort.fetchReads(KmerSort.java:720)
	at clump.KmerSort.processInner(KmerSort.java:398)
	at clump.KmerSort.process(KmerSort.java:310)
	at clump.KmerSort.main(KmerSort.java:51)
	at clump.Clumpify.process(Clumpify.java:243)
	at clump.Clumpify.main(Clumpify.java:37)

bbduk memory error

Hello everyone,

I've installed bbmap through sudo apt-get install bbmap and also by downloading the package, untarring it, and adding the path to the directory in the .bashrc file. However, when running bbduk.sh --version or a script with bbduk.sh -Xmx... (I've tried -Xmx5g, -Xmx4g, -Xmx1g, -Xmx800m, -Xmx100m, and no -Xmx at all, on a machine with 8 GB of RAM) I get the following error:

/usr/bin/bbduk.sh: line 344: /usr/share/bbmap/calcmem: No such file or directory
/usr/bin/bbduk.sh: line 345: setEnvironment: command not found
/usr/bin/bbduk.sh: line 346: parseXmx: command not found
/usr/bin/bbduk.sh: line 350: freeRam: command not found
java -Xmxm -Xmsm -cp /usr/share/java/bbmap.jar jgi.BBDuk --version
Invalid maximum heap size: -Xmxm
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

Also, the folder /calcmem indeed does not exist. Any ideas on how to solve this problem?

Thank you!

#bbduk #memory #java #bbtools

This is NOT the official BBTools source, please don't use it

Hi all,

I'm not sure who created this or why it was created, but the official source of BBTools is here:

https://sourceforge.net/projects/bbmap/

That's also what pops up as the top hit if you google "BBMap". It's the second hit for "BBTools", but that's OK, since the top hit links to Sourceforge.

I'm not sure how people find this site, but please don't use it; it's causing major problems because the file structure was rearranged. Not to mention that it's an obsolete version.

If you encounter issues, just email me directly, don't post them here.

[suggestion] rename.sh - offset option for numbering

Hi,
I was wondering if you could add an offset option to the rename.sh script. Currently it starts numbering at '0'.

I mostly use it to rename ordered genome assemblies and would therefore like to start counting at '1', as in 'chromosome 1'.

Or is this already possible and I just can't see it?
cheers
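Until an offset option is confirmed, one workaround is to renumber the headers outside rename.sh. The snippet below is only a sketch of that idea: the function name `renumber_fasta` and the prefix `chromosome_` are made up for illustration, and it does not reproduce rename.sh's behaviour beyond replacing each header with a prefix plus a counter.

```python
from typing import Iterable, List


def renumber_fasta(lines: Iterable[str], prefix: str, start: int = 1) -> List[str]:
    """Replace every FASTA header with '>' + prefix + running number.

    `start` is the first number used, so start=1 gives prefix1, prefix2, ...
    Sequence lines are passed through unchanged.
    """
    out = []
    number = start
    for line in lines:
        if line.startswith(">"):
            out.append(f">{prefix}{number}")
            number += 1
        else:
            out.append(line)
    return out


if __name__ == "__main__":
    # Toy assembly with two records (hypothetical names and sequences).
    assembly = [">scaffold_x", "ACGT", ">scaffold_y", "GGCC"]
    for line in renumber_fasta(assembly, "chromosome_", start=1):
        print(line)
```

Because the counter is an ordinary argument, the same function also covers other offsets (for example continuing the numbering of a second file).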

Genome size estimation - Reg.

Hi,
I would like to estimate the size of a plant mitochondrial genome from Illumina paired-end reads (each file is 8 GB of raw reads).
I used the following command for the estimation:

./../bin/bin/bbmap/kmercountexact.sh in=trim_CKEMJ1.fastq in2=trim_CKEMJ2.fastq khist=hist.txt peaks=peaks.txt -Xmx188G

Output:

Input: 190851278 reads 28764660740 bases.

For K=31
Unique Kmers: 8205722019
Average Kmer Count: 2.807
Estimated Kmer Depth: 2.605
Estimated Read Depth: 3.253

I also got the hist.txt file, but I do not understand how to calculate the genome size from it. Any suggestions would be appreciated.
hist.txt
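A common back-of-the-envelope approach is to divide the number of k-mers in the non-error part of the histogram by the depth of the main peak. The sketch below assumes that hist.txt is a whitespace-separated table with depth in the first column and the number of distinct k-mers at that depth in the second, with optional `#` header lines; check this against your actual file. The function names and the `min_depth` cutoff are illustrative choices, not part of kmercountexact.sh. Note that organelle k-mers usually sit at a much higher depth than nuclear ones, so for a mitochondrial estimate the cutoff has to isolate that peak.

```python
from typing import Iterable, List, Tuple


def parse_khist(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Parse (depth, count) pairs, skipping blank lines and '#' headers."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        pairs.append((int(fields[0]), int(fields[1])))
    return pairs


def estimate_genome_size(hist: List[Tuple[int, int]], min_depth: int) -> Tuple[int, int]:
    """Return (estimated size, peak depth).

    Rows below `min_depth` are treated as sequencing errors (or, for an
    organelle estimate, as nuclear background) and ignored. The peak is the
    depth with the most distinct k-mers among the remaining rows.
    """
    kept = [(d, c) for d, c in hist if d >= min_depth]
    if not kept:
        raise ValueError("no histogram rows at or above min_depth")
    peak_depth = max(kept, key=lambda dc: dc[1])[0]
    total_kmers = sum(d * c for d, c in kept)
    return total_kmers // peak_depth, peak_depth


if __name__ == "__main__":
    # Toy histogram: an error spike at depth 1 and a peak at depth 10.
    toy = ["#Depth\tCount", "1\t1000", "9\t50", "10\t100", "11\t50"]
    size, peak = estimate_genome_size(parse_khist(toy), min_depth=5)
    print(size, peak)
```

The estimate is only as good as the choice of `min_depth`, so it is worth plotting the histogram before trusting the number.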

How to download the latest version

Hello,

I'm trying to use the latest version of BBMap, which is tagged v38.90. But the latest release shown here is 35.85, released on March 8, 2016. I'm trying to use the dedupe feature of clumpify.sh; in 35.85, dedupe is reported as an unknown parameter. Could you please let me know how to get the latest version with dedupe enabled?

I've downloaded v38.90 from https://sourceforge.net/projects/bbmap/, is it the latest release?

Thanks!

How to allow insert base in bbduk.sh?

Hi,

When I trim Illumina reads with bbduk.sh, a read that contains an inserted base is not trimmed: the mismatch setting counts every base shifted by the insertion as a mismatch. How can I allow for inserted bases?

Thanks in advance
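The effect described here can be reproduced outside BBDuk. The sketch below compares a position-by-position mismatch count (Hamming distance) with an edit distance that also allows insertions and deletions. The adapter and read fragments are toy strings chosen for illustration, and the code says nothing about how BBDuk matches k-mers internally. The bbduk.sh usage text describes both an `hdist` (Hamming distance) and an `edist` (edit distance) option; check the help of your installed version before relying on either.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: substitutions, insertions and deletions cost 1."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,             # delete x from a
                cur[j - 1] + 1,          # insert y into a
                prev[j - 1] + (x != y),  # match or substitute
            ))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"          # toy adapter fragment
    read_with_insert = "AGATCGTGAAGAGC"  # same fragment with one extra T
    window = read_with_insert[:len(adapter)]
    print("Hamming:", hamming(adapter, window))
    print("Edit:   ", edit_distance(adapter, read_with_insert))
```

With these toy strings, a single inserted base shifts everything after it, so the mismatch count is large while the edit distance stays at one.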

[reformat.sh] sam output instead of bam with sambamba

Hi,

I added sambamba to the conda bbmap recipe in order to use reformat.sh, but I realised that it creates SAM output instead of BAM, whereas with samtools instead of sambamba I get the expected BAM output. I will modify the conda recipe to use samtools instead of sambamba, but in theory it should not make any difference, should it?
If my assumption is correct, either something is wrong in reformat.sh or the problem comes from the conda version of sambamba.
I hope you can help me track down the problem.

Best regards,

Jacques

poke @bbushnell

error while running translate6frames.sh

I am trying to map transcript reads to a database of amino acid sequences. To do this I am converting the database into nt with this tool, and converting the reads to amino acids, and then to nt.

While converting the reads back to nt, I get the following error:

"java.lang.Exception:
An input file appears to be misformatted:
The character with ASCII code 76 appeared where a base was expected: 'L'
Sequence #0
Sequence ID: 'HISEQ13:228:C8GY6ANXX:6:1101:6860:3216 1:N:0:CAGATC fr1'
Sequence: '[71, 86, 83, 82, 76, 67, 84, 71, 84, 42, 71, 83, 80, 71, 84, 83, 42, 71, 81, 87, 68, 83, 72, 42, 83, 73, 72, 65, 83, 82, 81, 76, 83, 71, 75, 86, 76, 82, 89, 76, 75, 82, 86, 73]
GVSRLCTGTGSPGTSGQWDSH*SIHASRQLSGKVLRYLKRVI'
This can be bypassed with the flag 'tossjunk', 'fixjunk', or 'ignorejunk'"


When I pass any of the recommendations (tossjunk, ignorejunk, fixjunk), I get another error.

"Exception in thread "main" java.lang.AssertionError: This read is not flagged as an amino acid sequence."

Any idea on what I am doing wrong? Command copied below. Thank you!

translate6frames.sh -Xmx30g in=R1_no_Ribo_trimmed_Plant_9_15_B_aa.fastq out=R1_no_Ribo_trimmed_Plant_9_15_B_aa_to_nt.fastq aaout=f aain=t overwrite=t

Trimming paired-end Illumina data - reg.

Hi,
If I want to trim adapters from paired-end Illumina reads, should I use ktrim=r for the forward read and ktrim=l for the reverse read? Or is ktrim=r enough to trim both reads?

Thank you.

bbmapskimmer.sh returns inconsistent results

Hi developers. I have tried to use bbmapskimmer.sh to map some primer sequences onto my PacBio reads. It seems that a read gets different results depending on whether it is in a single-sequence FASTA or in a multi-FASTA.

The command I run:

bbmapskimmer.sh in=primer.fasta out=samout.sam ref=$STR idfilter=0.1 k=8 noheader=t threads=4 ambiguous=all nodisk

primer.fasta:

>ssu_1
AGAGTTTGATCATGGCTCAG
>ssu_2
AGAGTTTGATCCTGGCTCAG
>lsu_1
GGGTTCCCCCATTCGG
>lsu_2
GGGTTCCCCCATTCAG
>lsu_3
GGGTTTCCCCATTCGG
>lsu_4
GGGTTTCCCCATTCAG
>lsu_5
GGGTTGCCCCATTCGG
>lsu_6
GGGTTGCCCCATTCAG

The sequence in question

>problematic_seq
AGAGTTGATCCTGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGGATGAAGGGAGCTTGCTCCTGGATTCAGCGGCGGACGGGTGAGTAATGCCTAGGAATCTGCCTGGTAGTGGGGGATAACGTCCGGAAACGGGCGCTAATACCGCATACGTCCTGAGGGAGAAAGTGGGGGATCTTCGGACCTCACGCTATCAGATGAGCCTAGGTCGGATTAGCTAGTTGGTGGGGTAAAGGCCTACCAAGGCGACGATCCGTAACTGGTCTGAGAGGATGATCAGTCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCGTGTGTGAAGAAGGTCTTCGGATTGTAAAGCACTTTAAGTTGGGAGGAAGGGCAGTAAGTTAATACCTTGCTGTTTTGACGTTACCAACAGAATAAGCACCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGTGGTTCAGCAAGTTGGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCCAAAACTACTGAGCTAGAGTACGGTAGAGGGTGGTGGAATTTCCTGTGTAGCGGTGAAATGCGTAGATATAGGAAGGAACACCAGTGGCGAAGGCGACCACCTGGACTGATACTGACACTGAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCGACTAGCCGTTGGGATCCTTGAGATCTTAGTGGCGCAGCTAACGCGATAAGTCGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCTGGCCTTGACATGCTGAGAACTTTCCAGAGATGGATTGGTGCCTTCGGGAACTCAGACACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGTAACGAGCGCAACCCTTGTCCTTAGTTACCAGCACCTCGGGTGGGCACTCTAAGGAGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGGCCAGGGCTACACACGTGCTACAATGGTCGGTACAAAGGGTTGCCAAGCCGCGAGGTGGAGCTAATCCCATAAAACCGATCGTAGTCCGGATCGCAGTCTGCAACTCGACTGCGTGAAGTCGGAATCGCTAGTAATCGTGAATCAGAATGTCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTGGGTTGCTCCAGAAGTAGCTAGTCTAACCGCAAGGGGGACGGTTACCACGGAGTGATTCATGACTGGGGTGAAGTCGTAACAAGGTAGCCGTAGGGGAACCTGCGGCTGGATCACCTCCTTAATCGAAGATCTCAGCTTCTTCATAAGCTCCCACACGAATTGCTTGATTCACTGGTTAGACGATTGGGTCTGTAGCTCAGTTGGTTAGAGCGCACCCCTGATAAGGGTGAGGTCGGCAGTTCGAATCTGCCCAGACCCACCAATTGTTGGTGTGCTGCGTGATCCGATACGGGGCCATAGCTCAGCTGGGAGAGCGCCTGCTTTGCACGCAGGAGGTCAGGAGTTCGATCCTCCTTGGCTCCACCATCTAAAACAATCGTCGAAAGCTCAGAAATGAATGTTCGTAGATGAACATTGATTTCTGGTCTTTGCACCAGAACTGTTCTTTAAAAATTCGGGTATGTGATAGAAGTAAGACTGAATGATCTCTTTCACTGGTGATCATTCAAGTCAAGGTAAAATTTGCGAGTTCAAGCGCGAATTTTCGGCGAATGTCGTCTTCACAGTATAACCAGATTGCTTGGGGTTATATG
GTCAAGTGAAGAAGCGCATACGGTGGATGCCTTGGCAGTCAGAGGCGATGAAAGACGTGGTAGCCTGCGAAAAGCTTCGGGGAGTCGGCAAACAGACTTTGATCCGGAGATCTCTGAATGGGGAACCC

Run the command on this sequence alone, I get:

ssu_1	4	*	0	0	*	*	0	0	AGAGTTTGATCATGGCTCAG	*
ssu_2	0	problematic_seq	1	28	4=1I15=	*	0	0	AGAGTTTGATCCTGGCTCAG	*	NM:i:1	AM:i:28	NH:i:1
lsu_1	16	problematic_seq	2114	13	1=1X4=1I9=	*	0	0	CCGAATGGGGGAACCC	*	NM:i:2	AM:i:13	NH:i:1
lsu_2	16	problematic_seq	2114	26	6=1I9=	*	0	0	CTGAATGGGGGAACCC	*	NM:i:1	AM:i:26	NH:i:1
lsu_3	16	problematic_seq	2114	13	1=1X8=1I5=	*	0	0	CCGAATGGGGAAACCC	*	NM:i:2	AM:i:13	NH:i:1
lsu_4	16	problematic_seq	2114	25	10=1I5=	*	0	0	CTGAATGGGGAAACCC	*	NM:i:1	AM:i:25	NH:i:1
lsu_5	4	*	0	0	*	*	0	0	GGGTTGCCCCATTCGG	*
lsu_6	16	problematic_seq	2114	25	10=1I5=	*	0	0	CTGAATGGGGCAACCC	*	NM:i:1	AM:i:25	NH:i:1

Notice that ssu_2 has flag 0, and I am certain that, apart from a one-base insertion (T), ssu_2 matches problematic_seq.

However, if I add two other sequences to the reference file:

>m54122_180320_131917/45548132/ccs,2130
AGAGTTTGATCCTGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGGATGAAGGGAGCTTGCTCCTGGATTCAGCGGCGGACGGGTGAGTAATGCCTAGGAATCTGCCTGGTAGTGGGGGATAACGTCCGGAAACGGGCGCTAATACCGCATACGTCCTGAGGGAGAAAGTGGGGGATCTTCGGACCTCACGCTATCAGATGAGCCTAGGTCGGATTAGCTAGTTGGTGGGGTAAAGGCCTACCAAGGCGACGATCCGTAACTGGTCTGAGAGGATGATCAGTCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCGTGTGTGAAGAAGGTCTTCGGATTGTAAAGCACTTTAAGTTGGGAGGAAGGGCAGTAAGTTAATACCTTGCTGTTTTGACGTTACCAACAGAATAAGCACCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGTGGTTCAGCAAGTTGGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCCAAAACTACTGAGCTAGAGTACGGTAGAGGGTGGTGGAATTTCCTGTGTAGCGGTGAAATGCGTAGATATAGGAAGGAACACCAGTGGCGAAGGCGACCACCTGGACTGATACTGACACTGAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCGACTAGCCGTTGGGATCCTTGAGATCTTAGTGGCGCAGCTAACGCGATAAGTCGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCTGGCCTTGACATGCTGAGAACTTTCCAGAGATGGATTGGTGCCTTCGGGAACTCAGACACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGTAACGAGCGCAACCCTTGTCCTTAGTTACCAGCACCTCGGGTGGGCACTCTAAGGAGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGGCCAGGGCTACACACGTGCTACAATGGTCGGTACAAAGGGTTGCCAAGCCGCGAGGTGGAGCTAATCCCATAAAACCGATCGTAGTCCGGATCGCAGTCTGCAACTCGACTGCGTGAAGTCGGAATCGCTAGTAATCGTGAATCAGAATGTCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTGGGTTGCTCCAGAAGTAGCTAGTCTAACCGCAAGGGGGACGGTTACCACGGAGTGATTCATGACTGGGGTGAAGTCGTAACAAGGTAGCCGTAGGGGAACCTGCGGCTGGATCACCTCCTTAATCGAAGATCTCAGCTTCTTCATAAGCTCCCACACGAATTGCTTGATTCACTGGTTAGACGATTGGGTCTGTAGCTCAGTTGGTTAGAGCGCACCCCTGATAAGGGTGAGGTCGGCAGTTCGAATCTGCCCAGACCCACCAATTGTTGGTGTGCTGCGTGATCCGATACGGGGCCATAGCTCAGCTGGGAGAGCGCCTGCTTTGCACGCAGGAGGTCAGGAGTTCGATCCTCCTTGGCTCCACCATCTAAAACAATCGTCGAAAGCTCAGAAATGAATGTTCGTAGATGAACATTGATTTCTGGTCTTTGCACCAGAACTGTTCTTTAAAATTCGGGTATGTGATAGAAGTAAGACTGAATGATCTCTTTCACTGGTGATCATTCAAGTCAAGGTAAAATTTGCGAGTTCAAGCGCGAATTTTCGGCGAATGTCGTCTTCACAGTATAACCAGATTGCTTGGGGTTATATG
GTCAAGTGAAGAAGCGCATACGGTGGATGCCTTGGCAGTCAGAGGCGATGAAAGACGTGGTAGCCTGCGAAAAGCTTCGGGGAGTCGGCAAACAGACTTTGATCCGGAGATCTCCTGAATGGGGCAACCC
>m54122_180320_131917/44957937/ccs,2130
AGAGTTTGATCCTGGCTCAGATTGAACGCTGGCGGCAGGCCTAACACATGCAAGTCGAGCGGATGAAGGGAGCTTGCTCCTGGATTCAGCGGCGGACGGGTGAGTAATGCCTAGGAATCTGCCTGGTAGTGGGGGATAACGTCCGGAAACGGGCGCTAATACCGCATACGTCCTGAGGGAGAAAGTGGGGGATCTTCGGACCTCACGCTATCAGATGAGCCTAGGTCGGATTAGCTAGTTGGTGGGGTAAAGGCCTACCAAGGCGACGATCCGTAACTGGTCTGAGAGGATGATCAGTCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATATTGGACAATGGGCGAAAGCCTGATCCAGCCATGCCGCGTGTGTGAAGAAGGTCTTCGGATTGTAAAGCACTTTAAGTTGGGAGGAAGGGCAGTAAGTTAATACCTTGCTGTTTTGACGTTACCAACAGAATAAGCACCGGCTAACTTCGTGCCAGCAGCCGCGGTAATACGAAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCGCGTAGGTGGTTCAGCAAGTTGGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCCAAAACTACTGAGCTAGAGTACGGTAGAGGGTGGTGGAATTTCCTGTGTAGCGGTGAAATGCGTAGATATAGGAAGGAACACCAGTGGCGAAGGCGACCACCTGGACTGATACTGACACTGAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCGACTAGCCGTTGGGATCCTTGAGATCTTAGTGGCGCAGCTAACGCGATAAGTCGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCTGGCCTTGACATGCTGAGAACTTTCCAGAGATGGATTGGTGCCTTCGGGAACTCAGACACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGTAACGAGCGCAACCCTTGTCCTTAGTTACCAGCACCTCGGGTGGGCACTCTAAGGAGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGGCCAGGGCTACACACGTGCTACAATGGTCGGTACAAAGGGTTGCCAAGCCGCGAGGTGGAGCTAATCCCATAAAACCGATCGTAGTCCGGATCGCAGTCTGCAACTCGACTGCGTGAAGTCGGAATCGCTAGTAATCGTGAATCAGAATGTCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGTGGGTTGCTCCAGAAGTAGCTAGTCTAACCGCAAGGGGGACGGTTACCACGGAGTGATTCATGACTGGGGTGAAGTCGTAACAAGGTAGCCGTAGGGGAACCTGCGGCTGGATCACCTCCTTAATCGAAGATCTCAGCTTCTTCATAAGCTCCCACACGAATTGCTTGATTCACTGGTTAGACGATTGGGTCTGTAGCTCAGTTGGTTAGAGCGCACCCCTGATAAGGGTGAGGTCGGCAGTTCGAATCTGCCCAGACCCACCAATTGTTGGTGTGCTGCGTGATCCGATACGGGGCCATAGCTCAGCTGGGAGAGCGCCTGCTTTGCACGCAGGAGGTCAGGAGTTCGATCCTCCTTGGCTCCACCATCTAAAACAATCGTCGAAAGCTCAGAAATGAATGTTCGTAGATGAACATTGATTTCTGGTCTTTGCACCAGAACTGTTCTTTAAAAATTCGGGTATGTGATAGAAGTAAGACTGAATGATCTCTTTCACTGGTGATCATTCAAGTCAAGGTAAAATTTGCGAGTTCAAGCGCGAATTTTCGGCGAATGTCGTCTTCACAGTATAACCAGATTGCTTGGGGTTATAT
GGTCAAGTGAAGAAGCGCATACGGTGGATGCCTTGGCAGTCAGAGGCGATGAAAGACGTGGTAGCCTGCGAAAAGCTTCGGGGAGTCGGCAAACAGACTTTGATCCGGAGATCTCCGAATGGGGCAACCC

I get:

ssu_1	0	m54122_180320_131917/45548132/ccs,2130	1	3	11=1X8=	*	0	0	AGAGTTTGATCATGGCTCAG	*	XT:A:R	NM:i:1	AM:i:3	NH:i:2
ssu_1	256	m54122_180320_131917/44957937/ccs,2130	1	3	11=1X8=	*	0	0	*	*	NM:i:1	AM:i:3	NH:i:2
ssu_2	0	m54122_180320_131917/45548132/ccs,2130	1	3	20=	*	0	0	AGAGTTTGATCCTGGCTCAG	*	XT:A:R	NM:i:0	AM:i:3	NH:i:2
ssu_2	256	m54122_180320_131917/44957937/ccs,2130	1	3	20=	*	0	0	*	*	NM:i:0	AM:i:3	NH:i:2
lsu_1	16	m54122_180320_131917/44957937/ccs,2130	2115	28	10=1X5=	*	0	0	CCGAATGGGGGAACCC	*	NM:i:1	AM:i:28	NH:i:2
lsu_1	272	problematic_seq	2114	14	1=1X4=1I9=	*	0	0	*	*	NM:i:2	AM:i:14	NH:i:2
lsu_2	16	m54122_180320_131917/45548132/ccs,2130	2115	2	10=1X5=	*	0	0	CTGAATGGGGGAACCC	*	XT:A:R	NM:i:1	AM:i:2	NH:i:2
lsu_2	272	problematic_seq	2114	2	6=1I9=	*	0	0	*	*	NM:i:1	AM:i:2	NH:i:2
lsu_3	16	m54122_180320_131917/44957937/ccs,2130	2115	28	10=1X5=	*	0	0	CCGAATGGGGAAACCC	*	NM:i:1	AM:i:28	NH:i:2
lsu_3	272	problematic_seq	2114	14	1=1X8=1I5=	*	0	0	*	*	NM:i:2	AM:i:14	NH:i:2
lsu_4	16	m54122_180320_131917/45548132/ccs,2130	2115	2	10=1X5=	*	0	0	CTGAATGGGGAAACCC	*	XT:A:R	NM:i:1	AM:i:2	NH:i:2
lsu_4	272	problematic_seq	2114	2	10=1I5=	*	0	0	*	*	NM:i:1	AM:i:2	NH:i:2
lsu_5	16	m54122_180320_131917/44957937/ccs,2130	2115	40	16=	*	0	0	CCGAATGGGGCAACCC	*	NM:i:0	AM:i:40	NH:i:2
lsu_5	272	m54122_180320_131917/45548132/ccs,2130	2115	29	1=1X14=	*	0	0	*	*	NM:i:1	AM:i:29	NH:i:2
lsu_6	16	m54122_180320_131917/45548132/ccs,2130	2115	40	16=	*	0	0	CTGAATGGGGCAACCC	*	NM:i:0	AM:i:40	NH:i:2
lsu_6	272	m54122_180320_131917/44957937/ccs,2130	2115	29	1=1X14=	*	0	0	*	*	NM:i:1	AM:i:29	NH:i:2

So problematic_seq no longer gets the ssu_2 hit that it had in my first attempt.

Please help! Thank you!
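When comparing the two outputs it helps to decode the second SAM column. The FLAG field is a bit field defined by the SAM specification; the sketch below decodes the values that appear in this report (0, 4, 16, 256, 272). The short label strings are my own wording, not official names. In the second output every record carries NH:i:2, and problematic_seq appears only with flag 272, that is, as a secondary reverse-strand alignment.

```python
from typing import List

# Bit values as defined in the SAM specification; labels are informal.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> List[str]:
    """Return the labels of all bits set in a SAM FLAG value.

    An empty list (flag 0) means: mapped, forward strand, primary alignment.
    """
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


if __name__ == "__main__":
    for flag in (0, 4, 16, 256, 272):
        print(flag, decode_flag(flag))
```

Read this way, the question becomes why the ssu_2 alignment to problematic_seq is not among the alignments reported once closer matches are present in the reference.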

filterbyname driver not present

When attempting to run filterbyname.sh it outputs the error:

Error: Could not find or load main class driver.FilterReadsByName

When searching through the current/jgi/ directory for the FilterReadsByName class, there is no such class available.

Error: Could not find or load main class jgi.BBMerge

Hello,
I'm using BBMerge to determine which adapter sequences are on my short reads (as suggested at http://seqanswers.com/forums/showthread.php?t=63537). I was wondering if you could help with an error I'm getting. When I run these lines from my SLURM script:

scripts="/gpfs/scratch/sjfleck/modulefiles/BBMap/sh"
fastq="/gpfs/scratch/sjfleck/MaSuRCA/my_species"
$scripts/bbmerge.sh $fastq/in1=forward_reads_1.fq.gz $fastq/in2=reverse_reads_2.fq.gz outa=my_species_adapters.fa

Here is the error file I get (I'm separating some lines for clarity):

java -Djava.library.path=/gpfs/scratch/sjfleck/modulefiles/BBMap/sh/jni/ -ea -Xmx1000m -cp /gpfs/scratch/sjfleck/modulefiles/BBMap/sh/current/
in1=forward_reads_1.fq.gz
in2=reverse_reads_2.fq.gz
outa=Ping_moct_adapters.fa

Error: Could not find or load main class jgi.BBMerge

There are a few things I'm immediately noticing:

  1. There is no file named "jgi.BBMerge" in "/gpfs/scratch/sjfleck/modulefiles/BBMap/sh/current/".
  • There is a folder called "jgi" in this directory, but no files.
  • Within the "jgi" directory, the closest thing to "jgi.BBMerge" is "BBMerge.java" and "BBMergeOverlapper.java".
  2. " jgi.BBMerge" starts with a space in the script. Is this by design?

Any help you could offer would be greatly appreciated. Thank you,
Steve

error when running calctruequality

I'm trying to recalibrate the Q scores of a NextSeq run using MiSeq contigs assembled with Tadpole.

#mapping reads to reference
bbmap.sh in=concatABC.fastq.gz outm=mapped.sam ref=./Lpe09_06TdpAssemblies/contigs09_06.fa ignorequality maxindel=100 minratio=0.4 ambig=toss qahist=qahist_raw.txt qhist=qhist_raw.txt mhist=mhist_raw.txt
#generating calibration matrices
calctruequality.sh in=mapped.sam

I get the following output
java -ea -Xmx57992m -Xms57992m -cp /home/me/bbmap/current/ jgi.CalcTrueQuality in=mapped.sam
Executing jgi.CalcTrueQuality [in=mapped.sam]

Exception in thread "Thread-2" Exception in thread "Thread-1" java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag and no ScafMap loaded.
at stream.SamLine.toShortMatch(SamLine.java:1212)
at stream.SamLine.toRead (SamLine.java:2015)
at stream.SamLine.toRead (SamLine.java:1875)
at stream.SamReadStreamer$ProcessThread.makeReads (SamReadStreamer.java:206)
at stream.SamReadStreamer$ProcessThread.run (SamReadStreamer.java:135)
java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag and no ScafMap loaded.
at stream.SamLine.toShortMatch(SamLine.java:1212)
at stream.SamLine.toRead (SamLine.java:2015)
at stream.SamLine.toRead (SamLine.java:1875)
at stream.SamReadStreamer$ProcessThread.makeReads (SamReadStreamer.java:206)
at stream.SamReadStreamer$ProcessThread.run (SamReadStreamer.java:135)
Exception in thread "Thread-3" java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag and no ScafMap loaded.
at stream.SamLine.toShortMatch(SamLine.java:1212)
at stream.SamLine.toRead (SamLine.java:2015)
at stream.SamLine.toRead (SamLine.java:1875)
at stream.SamReadStreamer$ProcessThread.makeReads (SamReadStreamer.java:206)
at stream.SamReadStreamer$ProcessThread.run (SamReadStreamer.java:135)

The program didn't quit so I don't know if it's still doing something or not. I don't see /ref/qual/ files being generated. It's been hanging for over an hour. I'm not sure how to proceed. Since the mapped sam file was generated with bbmap using contigs from Tadpole, I'm not sure why there are Ms in the cigar strings. The infile I used for bbmap was made by first concatenating all Read1 from 3 libraries then all Read2 from 3 libraries then those two files concatenated.

Error while running kmercountmulti.sh

I am trying to use kmercountmulti.sh to count k-mers in an E. coli long-read FASTA file on my Linux server, but I am getting this error:

bbmap//calcmem.sh: line 75: [: -v: unary operator expected
java -ea -Xmx500m -cp /home/bbmap/current/ jgi.KmerCountMulti in=/home/eColi_LR_SL.fasta sweep=16,32,64 out=hist1_eColi_LR
Executing jgi.KmerCountMulti [in=/home/eColi_LR_SL.fasta, sweep=16,32,64, out=hist1_eColi_LR]

Exception in thread "main" java.lang.ArrayStoreException: cardinality.LogLog2
at cardinality.MultiLogLog.<init>(MultiLogLog.java:30)
at cardinality.MultiLogLog.<init>(MultiLogLog.java:11)
at jgi.KmerCountMulti.<init>(KmerCountMulti.java:149)
at jgi.KmerCountMulti.main(KmerCountMulti.java:49)

My java version is:

openjdk version "1.8.0_272"
OpenJDK Runtime Environment (build 1.8.0_272-b10)
OpenJDK 64-Bit Server VM (build 25.272-b10, mixed mode)

Kindly help.

Dedupe and nanopore amplicon reads

I'd like to use dedupe.sh to remove contained Nanopore amplicon reads and haven't had much success. My reads are about 4-5 kb, and I have tried the following parameter combinations:

minidentity=90 e=10
minidentity=80 e=10
minidentity=70 e=10
minidentity=70 e=20

I observe 0 containments and only 2 overlaps. Am I missing a parameter?

Many Thanks,
Azita

bbduk.sh AssertionError

Dear Developers,

I'm trying to use bbduk.sh to remove adapters and quality-trim a pair of paired-end Illumina read files (in fastq.gz format). I got the following error:

Changed from ASCII-33 to ASCII-64 on input quality @ (Q31) for base N at lines 1 and 3, position 96 while prescanning.
Changed from ASCII-64 to ASCII-33 on input quality 8 (Q-8) for base G at lines 5 and 7, position 9 while prescanning.
Exception in thread "main" java.lang.AssertionError: ASCII encoding for quality (currently ASCII-33) appears to be wrong 
for input quality 25 for base G at lines 5 and 7, position 9.
GTCGTAACTATGGTCAACGTTCAAGAACTAATCAACTCCGATGATGTAGTCGTCTTCAGCAAGTCCTACTGCCCTTTCTGTGTCCGCGCAAAGACTNNNN
@SRR3734914.2 HWI-ST942:114:C11YJACXX:3:1101:1963:2179 length=100
[64, 83, 82, 82, 51, 55, 51, 52, 57, 49, 52, 46, 50, 32, 72, 87, 73, 45, 83, 84, 57, 52, 50, 58, 49, 49, 52, 58, 67, 49, 49, 89, 74, 65, 67, 88, 88, 58, 51, 58, 49, 49, 48, 49, 58, 49, 57, 54, 51, 58, 50, 49, 55, 57, 32, 108, 101, 110, 103, 116, 104, 61, 49, 48, 48]
	at stream.FASTQ.testQuality(FASTQ.java:231)
	at stream.FASTQ.isInterleaved(FASTQ.java:131)
	at stream.FastqReadInputStream.<init>(FastqReadInputStream.java:58)
	at stream.ConcurrentReadInputStream.getReadInputStream(ConcurrentReadInputStream.java:121)
	at stream.ConcurrentReadInputStream.getReadInputStream(ConcurrentReadInputStream.java:55)
	at jgi.BBDukF.spawnProcessThreads(BBDukF.java:1674)
	at jgi.BBDukF.process2(BBDukF.java:1058)
	at jgi.BBDukF.process(BBDukF.java:967)
	at jgi.BBDukF.main(BBDukF.java:71)

Could you please help me understand what this means and, if possible, how to solve it?

Thank you!
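Decoding the byte array printed in the assertion gives the text of a header line ("@SRR3734914.2 HWI-ST942:..."), which suggests the parser was reading a header where it expected a quality string. A quick way to test that idea is to check that the file really consists of 4-line records. The sketch below does that and also makes a rough guess at the quality offset. The function names are hypothetical, the offset rule (any character below ASCII 59 implies offset 33) is only a heuristic, and none of this mirrors BBDuk's own checks.

```python
from typing import Iterable, List, Tuple


def check_fastq_records(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (record_index, problem) pairs for a 4-line-per-record FASTQ."""
    rows = [ln.rstrip("\r\n") for ln in lines]
    problems = []
    leftover = len(rows) % 4
    if leftover:
        problems.append((len(rows) // 4, "file does not end on a 4-line boundary"))
    for i in range(0, len(rows) - leftover, 4):
        header, seq, plus, qual = rows[i:i + 4]
        rec = i // 4
        if not header.startswith("@"):
            problems.append((rec, "header does not start with '@'"))
        if not plus.startswith("+"):
            problems.append((rec, "third line does not start with '+'"))
        if len(seq) != len(qual):
            problems.append((rec, "sequence and quality lengths differ"))
    return problems


def guess_offset(qualities: Iterable[str]) -> int:
    """Heuristic: characters below ';' (ASCII 59) only occur with offset 33."""
    lowest = min(ord(c) for q in qualities for c in q)
    return 33 if lowest < 59 else 64


if __name__ == "__main__":
    # The first bytes of the array from the error message decode to a header.
    print(bytes([64, 83, 82, 82, 51, 55, 51, 52, 57, 49, 52]).decode("ascii"))
    shifted = ["@r1", "ACGT", "+", "IIII", "ACGT", "@r2 x", "+", "II"]
    print(check_fastq_records(shifted))
```

If the record check reports problems, the quality-offset message is probably a symptom of misaligned records and not of the encoding itself.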

Problem in reading a FASTQ file with CRLF line breaks

I'm having problems using bbduk on a FASTQ file generated under Windows, which has CRLF line endings (\r\n). I'm using BBMap version 35.85.

The analysis stops with the following error:

java.lang.AssertionError: 
Mismatch between length of bases and qualities for read 3057 (id=15201233A:129:AC51NANNX:7:1340:17128:7008#GTTTCG).
# qualities=0, # bases=50
CAGGAATATTTGCCTGTTGTCCATCGACTACGCCTTTCGGCCTGATCTTA
    at stream.Read.validate(Read.java:103)
    at stream.Read.<init>(Read.java:78)
    at stream.Read.<init>(Read.java:61)
    at stream.FASTQ.quadToRead(FASTQ.java:806)
    at stream.FASTQ.toReadList(FASTQ.java:653)
    at stream.FastqReadInputStream.fillBuffer(FastqReadInputStream.java:111)
    at stream.FastqReadInputStream.nextList(FastqReadInputStream.java:96)
    at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:656)
    at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:635)
Exception in thread "Thread-17" java.lang.NullPointerException
    at jgi.BBDukF$ProcessThread.run(BBDukF.java:2399)

Since I'm a developer, I dug into the BBMap source code to find the problem and ended up at the class FastqReadInputStream, which, as far as I understand, is used for parsing the file. I then ran the following test (using the Java Nashorn JS engine) with my FASTQ file as input (unzipped for simplicity):

fris = new Packages.stream.FastqReadInputStream("sample.fastq", true);		
for (r = fris.next(); r !== null; r = fris.next()) { 
	r.validate(true); 
}

Validating all the reads in the file led to this error (which I suppose is the core of the issue):

The ASCII quality encoding offset (64) is not set correctly, or the reads are corrupt; quality value below -5.
Please re-run with the flag 'qin=33' or 'ignorebadquality'.
Problematic read number 0:

@7001253F:489:DQ66NBXWK:1:6110:14718:2000#GTTTCG
AATGGGGTCATTGCAGCCCTTCTCGGTGGACTGGCGGAAGCCAAATCG
+
!<<<BFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFF`

Of course, I tried changing qin from 33 to 64, but that didn't solve the problem. In addition, I was able to locate the bytes in the input file that trigger the error: they are in the range (899674; 900193). However, if I isolate these lines and run my test on them alone, the error is not raised. So I believe the bug also depends on the preceding lines, and the lines in that range merely trigger it.

I can only say that if I remove the CR character at the end of each line, the test passes and BBDuk works on my original input file. If you need it, I can email you the test I made with a shortened version of my input FASTQ file (~1 MB out of 1 GB). I would rather not attach the files here, to keep the input FASTQ file as private as possible.

Thank you.
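The workaround described above (removing the CR at the end of each line) can be scripted. This is a minimal sketch that works on raw bytes so that nothing else in the file is reinterpreted; the function names and the file paths in the usage example are placeholders.

```python
from typing import Iterable, Iterator


def strip_cr(lines: Iterable[bytes]) -> Iterator[bytes]:
    """Convert CRLF line endings to LF, leaving all other bytes untouched."""
    for line in lines:
        if line.endswith(b"\r\n"):
            yield line[:-2] + b"\n"
        elif line.endswith(b"\r"):
            # Last line of a file that ends with a bare CR and no LF.
            yield line[:-1]
        else:
            yield line


def convert_file(src_path: str, dst_path: str) -> None:
    """Stream src_path to dst_path with Unix line endings."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.writelines(strip_cr(src))


if __name__ == "__main__":
    record = [b"@r1\r\n", b"ACGT\r\n", b"+\r\n", b"IIII\r\n"]
    print(list(strip_cr(record)))
    # convert_file("sample.fastq", "sample.unix.fastq")  # placeholder paths
```

Iterating over a file opened in binary mode splits on LF only, so the CR stays attached to each line and can be removed safely.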

filterbyname.sh

I am trying to extract a list of contigs with filterbyname.sh. It works well if I use include=f, but if I use include=t with the same input file and names file, I get zero hits. I don't want to exclude those reads. What could be the issue?
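One pattern that produces exactly this symptom is a names list that never matches any header: excluding then removes nothing (so the output looks fine), while including keeps nothing. The sketch below is a generic name filter written to illustrate that, using exact matching against either the full header or its first whitespace-separated token. It is an assumption about a possible cause, not a description of how filterbyname.sh compares names, and the function and parameter names are invented for the example.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def filter_by_name(records: Iterable[Record], names: Iterable[str],
                   include: bool = True,
                   first_token_only: bool = False) -> List[Record]:
    """Keep records whose name is (include=True) or is not (include=False)
    in `names`. With first_token_only, compare only the text before the
    first whitespace in the header."""
    wanted = set(names)
    kept = []
    for header, seq in records:
        if first_token_only and header.strip():
            key = header.split(None, 1)[0]
        else:
            key = header
        if (key in wanted) == include:
            kept.append((header, seq))
    return kept


if __name__ == "__main__":
    contigs = [("contig_1 len=500", "ACGT"), ("contig_2 len=90", "GG")]
    names = ["contig_1"]
    print(filter_by_name(contigs, names, include=True))    # full-header match
    print(filter_by_name(contigs, names, include=False))
    print(filter_by_name(contigs, names, include=True, first_token_only=True))
```

Comparing a few header lines from the input with the entries of the names file, character by character, is a quick way to rule this out.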
