ea-utils's Issues

mapped bases didn't output

What steps will reproduce the problem?
1. launch sam-stats -B bamfile > stat_file

What is the expected output? What do you see instead?
I expect to see the mapped bases, as described in the help: "mapped bases      : total 
of the lengths of the aligned reads".
But they are not output.
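
For cross-checking, the quantity the help text describes can be computed independently of sam-stats. Below is a minimal Python sketch, assuming that "aligned reads" means SAM records without the 0x4 (unmapped) flag and that a read's length is the length of its SEQ field. The function name total_mapped_bases is made up for illustration, and this is not how sam-stats itself is implemented.

```python
def total_mapped_bases(sam_lines):
    """Sum the SEQ lengths of all mapped records in SAM-formatted text lines."""
    total = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # column 2: bitwise FLAG
        seq = fields[9]         # column 10: read sequence, or '*' if absent
        if flag & 0x4:
            continue            # 0x4 set: the read is unmapped
        if seq != "*":
            total += len(seq)
    return total


if __name__ == "__main__":
    example = [
        "@HD\tVN:1.0",
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTAC\tIIIIII",
    ]
    print(total_mapped_bases(example))
```

For a BAM file, the same function could be fed the text output of samtools view.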


What version of the product are you using? On what operating system?
svn version 1.31




Original issue reported on code.google.com by [email protected] on 29 May 2012 at 3:48

sam-stats doesn't support the -o and -O options

What steps will reproduce the problem?

$ /software/ea-utils/ea-utils-svn/clipper/sam-stats -O test.sam-stats test.bam
/software/ea-utils/ea-utils-svn/clipper/sam-stats: invalid option -- 'O'
Unknown option `-O'.
Usage: sam-stats [options] [file1] [file2...filen]
Version: 1.37.657
...


What is the expected output? What do you see instead?


What version of the product are you using? On what operating system?

svn version of ea-utils


Here is a patch (options also sorted alphabetically):

$ svn diff
Index: sam-stats.cpp
===================================================================
--- sam-stats.cpp       (revision 662)
+++ sam-stats.cpp       (working copy)
@@ -222,7 +222,7 @@
     int long_index=0;
     const char *prefix;

-    while ( (c = getopt_long(argc, argv, "?BArR:Ddx:MhS:", long_options, &long_index)) != -1) {
+    while ( (c = getopt_long(argc, argv, "?ABDdhMoO:R:rS:x:", long_options, &long_index)) != -1) {
                 switch (c) {
                 case 'd': ++debug; break;                                       // increment debug level
                 case 'D': ++trackdup; break;


After this patch, this option is accepted, but it doesn't do anything as far as 
I can tell because the prefix variable is never used afterwards:

$ grep prefix sam-stats.cpp 
    const char *prefix;
                case 'O': prefix=optarg; break;
"-O PREFIX      Output prefix enabling extended output (see below)\n"




Here is some bash code to extract the info:

$ echo "                case 'd': ++debug; break;                                       // increment debug level
                case 'D': ++trackdup; break;
                case 'B': inbam=1; break;
                case 'A': max_chr=1000000; break;                               // max chrom
                case 'R': rnafile=optarg;                                       // pass through
                case 'r': max_chr=1000000; rnamode=1; if (histnum < 60) histnum=60; break;                                                                                         
                case 'O': prefix=optarg; break;
                case 'S': histnum=atoi(optarg); break;
                case 'x': ext=optarg; break;
                case 'M': newonly=1; break;                                                                      
                case 'o': fq_out=1; trackdup=1; break;                     // output suff
                case 'h': usage(stdout); return 0;
                case '?':
"| grep -E -o "'.?'" |cut -f2 -d "'" | cut -f2 -d "'"| sort | tr -d '\n'


?ABdDhMoOrRSx
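
The same check can be automated so that it reports the mismatch directly instead of only listing the handled letters. This is a small Python sketch, not part of ea-utils: the helper names are hypothetical, and it assumes every option is handled by a literal `case 'x':` label in the switch.

```python
import re


def declared_options(optstring):
    """Option letters declared in a getopt optstring.

    ':' (argument marker), a leading '-' or '+' (GNU mode flags) and '?'
    (getopt's error return) are not counted as options.
    """
    return {ch for ch in optstring if ch not in ":-+?"}


def handled_options(c_source):
    """Single-character labels used in `case 'x':` statements, except '?'."""
    labels = re.findall(r"case\s+'(.)'\s*:", c_source)
    return {ch for ch in labels if ch != "?"}


def compare(optstring, c_source):
    """Return (handled but not declared, declared but not handled)."""
    declared = declared_options(optstring)
    handled = handled_options(c_source)
    return sorted(handled - declared), sorted(declared - handled)


if __name__ == "__main__":
    switch_body = """
        case 'd': ++debug; break;
        case 'O': prefix=optarg; break;
        case 'o': fq_out=1; trackdup=1; break;
        case 'h': usage(stdout); return 0;
        case '?':
    """
    missing, unused = compare("?BArR:Ddx:MhS:", switch_body)
    print("handled but not in optstring:", missing)
```

The first list catches options like -O and -o that the switch handles but getopt rejects. The second catches letters that are accepted but silently ignored.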


Similar problems exist in other tools (I didn't check all of them):

For example: fastq-join.c:

The optstring is "-dRnbeo:t:v:m:p:r:", but the options "-n", "-b" and "-e" are 
not handled in the switch.

Also, "-t" is not mentioned in the help (neither is "-d", which is OK as it is 
a debug option).

        while ( (c = getopt (argc, argv, "-dRnbeo:t:v:m:p:r:")) != -1) {
                switch (c) {
                case '\1':
                        if (!in[0]) 
                                in[0]=optarg;
                        else if (!in[1])                
                                in[1]=optarg;
                        else if (!in[2])                
                                in[2]=optarg;
                        else {
                                usage(stderr); return 1;
                        }
                        ++in_n;
                        break;
                case 'o': if (out_n == 3) {
                                usage(stderr); return 1;
                          }
                          out[out_n++] = optarg; 
                          break;
                case 'r': orep = optarg; break;
                case 't': threads = atoi(optarg); break;
                case 'm': mino = atoi(optarg); break;
                case 'p': pctdiff = atoi(optarg); break;
                case 'R': norevcomp = true; break;
                case 'd': debug = 1; break;
                case 'v':
                        if (strlen(optarg)>1) {
                                fprintf(stderr, "Option -v requires a single character argument");
                                exit(1);
                        }
                        verify = *optarg; break;
                case '?': 
                     if (strchr("otvmpr", optopt))
                       fprintf (stderr, "Option -%c requires an argument.\n", optopt);
                     else if (isprint(optopt))
                       fprintf (stderr, "Unknown option `-%c'.\n", optopt);
                     else
                       fprintf (stderr,
                                "Unknown option character `\\x%x'.\n",
                                optopt);
                     usage(stderr);
                     return 1;
                }
        }



Original issue reported on code.google.com by [email protected] on 9 Oct 2013 at 2:35

fastq-join does not work on a set of (fastq-mcf) trimmed seqs, although it works perfectly for each seq separately

What steps will reproduce the problem?
1. copy fwd.fastq in one dir
2. copy rvs.fastq in the same dir
3. fastq-join fwd.fastq rvs.fastq -o c1

What is the expected output? What do you see instead?
I expect three joined reads. Everything goes right if I process each of them 
separately!

But I see just an error:
*** glibc detected *** fastq-join: realloc(): invalid next size: 
0x0000000000ad4e30 ***
======= Backtrace: =========
/lib/libc.so.6(+0x774b6)[0x7f6e8e4ff4b6]
/lib/libc.so.6(+0x7db96)[0x7f6e8e505b96]
/lib/libc.so.6(realloc+0xf0)[0x7f6e8e505eb0]
fastq-join[0x401065]
fastq-join[0x401ce5]
/lib/libc.so.6(__libc_start_main+0xfe)[0x7f6e8e4a6d8e]
fastq-join[0x400c79]
======= Memory map: ========
00400000-00403000 r-xp 00000000 08:05 9579591                            
/usr/local/bin/fastq-join
00602000-00603000 r--p 00002000 08:05 9579591                            
/usr/local/bin/fastq-join
00603000-00604000 rw-p 00003000 08:05 9579591                            
/usr/local/bin/fastq-join
00ad4000-00af5000 rw-p 00000000 00:00 0                                  [heap]
7f6e88000000-7f6e88021000 rw-p 00000000 00:00 0 
7f6e88021000-7f6e8c000000 ---p 00000000 00:00 0 
7f6e8e488000-7f6e8e602000 r-xp 00000000 08:05 4458098                    
/lib/libc-2.12.1.so
7f6e8e602000-7f6e8e801000 ---p 0017a000 08:05 4458098                    
/lib/libc-2.12.1.so
7f6e8e801000-7f6e8e805000 r--p 00179000 08:05 4458098                    
/lib/libc-2.12.1.so
7f6e8e805000-7f6e8e806000 rw-p 0017d000 08:05 4458098                    
/lib/libc-2.12.1.so
7f6e8e806000-7f6e8e80b000 rw-p 00000000 00:00 0 
7f6e8e80b000-7f6e8e820000 r-xp 00000000 08:05 4456504                    
/lib/libgcc_s.so.1
7f6e8e820000-7f6e8ea1f000 ---p 00015000 08:05 4456504                    
/lib/libgcc_s.so.1
7f6e8ea1f000-7f6e8ea20000 r--p 00014000 08:05 4456504                    
/lib/libgcc_s.so.1
7f6e8ea20000-7f6e8ea21000 rw-p 00015000 08:05 4456504                    
/lib/libgcc_s.so.1
7f6e8ea21000-7f6e8eaa3000 r-xp 00000000 08:05 4461329                    
/lib/libm-2.12.1.so
7f6e8eaa3000-7f6e8eca2000 ---p 00082000 08:05 4461329                    
/lib/libm-2.12.1.so
7f6e8eca2000-7f6e8eca3000 r--p 00081000 08:05 4461329                    
/lib/libm-2.12.1.so
7f6e8eca3000-7f6e8eca4000 rw-p 00082000 08:05 4461329                    
/lib/libm-2.12.1.so
7f6e8eca4000-7f6e8ed8c000 r-xp 00000000 08:05 1313209                    
/usr/lib/libstdc++.so.6.0.14
7f6e8ed8c000-7f6e8ef8b000 ---p 000e8000 08:05 1313209                    
/usr/lib/libstdc++.so.6.0.14
7f6e8ef8b000-7f6e8ef93000 r--p 000e7000 08:05 1313209                    
/usr/lib/libstdc++.so.6.0.14
7f6e8ef93000-7f6e8ef95000 rw-p 000ef000 08:05 1313209                    
/usr/lib/libstdc++.so.6.0.14
7f6e8ef95000-7f6e8efaa000 rw-p 00000000 00:00 0 
7f6e8efaa000-7f6e8efca000 r-xp 00000000 08:05 4461330                    
/lib/ld-2.12.1.so
7f6e8f1a6000-7f6e8f1ab000 rw-p 00000000 00:00 0 
7f6e8f1c5000-7f6e8f1ca000 rw-p 00000000 00:00 0 
7f6e8f1ca000-7f6e8f1cb000 r--p 00020000 08:05 4461330                    
/lib/ld-2.12.1.so
7f6e8f1cb000-7f6e8f1cc000 rw-p 00021000 08:05 4461330                    
/lib/ld-2.12.1.so
7f6e8f1cc000-7f6e8f1cd000 rw-p 00000000 00:00 0 
7fff309b5000-7fff309d6000 rw-p 00000000 00:00 0                          [stack]
7fff309ff000-7fff30a00000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  
[vsyscall]


What version of the product are you using? On what operating system?
ea-utils.1.1.1-237.tar.gz, but I also tried older versions and those precompiled 
from the CentOS rpm (same problem).

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 30 Nov 2011 at 12:34

Attachments:

issue with gtf2bed

Hi Erik,

Thank you so much for writing the handy gtf2bed script; it saved me a few 
hours today (well, kinda!). I think I might have stumbled across an error in 
your script. Using the attached .gtf as input (from gencodeV16), your script 
outputs:

chr1    23337326        23342343        ENST00000566855.1       0       -       23337393        23340540        0       3       268,163,79,     0,3049,4938,

whereas the proper output should be

chr1    23337326        23342343        ENST00000566855.1       0       -       23337393        23342266        0       3       268,163,79,     0,3049,4938,

I suspect this is an issue with the start codon annotation spanning a 
splice site. Unfortunately, I am not a perl aficionado and could not debug it 
myself. I'm also way too lazy to set up an account to comment on the ea-utils 
wiki (sorry!).
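
The numbers in the two BED lines are consistent with that suspicion. Decoding the blocks puts the last base of the second exon at 23340538 and the first base of the third exon at 23342265 (1-based). A minus-strand start codon split 2+1 across that junction would therefore have the segments 23342265-23342266 and 23340538-23340538. These segment coordinates are inferred from the BED line, not read from the attached GTF, so treat them as an assumption. The Python sketch below (helper names are hypothetical) shows that taking the largest segment end gives the expected thickEnd, while start + 2 of the lower segment happens to reproduce the reported one. It is not a statement about what gtf2bed actually does.

```python
def bed_blocks(chrom_start, block_sizes, block_starts):
    """Return (start, end) pairs in BED coordinates (0-based start, exclusive end)."""
    return [(chrom_start + off, chrom_start + off + size)
            for size, off in zip(block_sizes, block_starts)]


def thick_end_minus_strand(start_codon_segments):
    """thickEnd for a '-' strand transcript: the largest end over all
    start_codon segments (GTF coordinates, 1-based inclusive)."""
    return max(end for _start, end in start_codon_segments)


if __name__ == "__main__":
    blocks = bed_blocks(23337326, [268, 163, 79], [0, 3049, 4938])
    exon2_last_base = blocks[1][1]        # BED end equals the last 1-based base
    exon3_first_base = blocks[2][0] + 1   # BED start + 1 is the first 1-based base
    # Assumed split of the 3-base start codon: 2 bases in exon 3, 1 in exon 2.
    segments = [(exon3_first_base, exon3_first_base + 1),
                (exon2_last_base, exon2_last_base)]
    print("thickEnd from all segments:", thick_end_minus_strand(segments))
    print("start + 2 of the lower segment:", segments[-1][0] + 2)
```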

In any case, I thought I should toss you an email as a heads up.

Kind regards,

Martin Smith

Original issue reported on code.google.com by [email protected] on 1 May 2013 at 6:06

Attachments:

--max-ns 0 does not seem to work, still getting reads with Ns

What steps will reproduce the problem?
1. fastq-mcf --max-ns 0 -o test1 -o test2 -R adapters.fa test1.fq.gz test2.fq.gz

What is the expected output? What do you see instead?

Expected to see reads with Ns discarded. Saw reads with Ns.

    cat test1 test2 | awk '(NR+2)%4==0' | grep -c N
    73
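
As a workaround or cross-check, the expected behaviour is easy to state in a few lines. This is a Python sketch of the filter as I would expect it to work (discard a read when it has more than max_ns N bases), not fastq-mcf's implementation. The function names are made up, and for paired files both mates would need to be dropped together.

```python
def read_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples from an iterable of lines."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, "").rstrip("\n")
        plus = next(it, "").rstrip("\n")
        qual = next(it, "").rstrip("\n")
        yield header, seq, plus, qual


def filter_max_ns(records, max_ns=0):
    """Keep only records whose sequence has at most max_ns 'N' bases."""
    return [rec for rec in records if rec[1].upper().count("N") <= max_ns]


if __name__ == "__main__":
    demo = ["@r1\n", "ACGT\n", "+\n", "IIII\n",
            "@r2\n", "ANGT\n", "+\n", "IIII\n"]
    kept = filter_max_ns(read_fastq(demo), max_ns=0)
    print([rec[0] for rec in kept])
```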

What version of the product are you using? On what operating system?

ea-utils.1.1.2-537.tar.gz on linux x86_64

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 2 Oct 2013 at 3:32

Attachments:

error message: "fastq-mcf has stopped working"

What steps will reproduce the problem?
1. I ran fastq-mcf 4 times without any problems
2. On the 5th time, a Windows error message pops up saying "fastq-mcf has 
stopped working" and that Windows is searching for a solution to the problem
3. No solution is found and there is nothing I can do

What is the expected output? What do you see instead?
I used fastq-mcf to produce cleaned-up fastq files without any problems 4 times.

What version of the product are you using? On what operating system?
ea-utils-1.1.2-621-win64

Please provide any additional information below.
I am running the .exe in a dos command prompt, on a pc running windows 7.

Thank you for any help.


Original issue reported on code.google.com by [email protected] on 8 Sep 2014 at 10:46

fastq-multx supporting sequence in header

PASTED from forums.  


This is an important feature that fastq-mcf should handle, but currently does 
not.   Also, I noticed that Illumina outputs GAGATTCC+GGCTCTGA for dual-indexed 
files.   It's not hard to do in the code, but it is a feature that I intend to 
add.

On Saturday, June 14, 2014 6:20:32 PM UTC-4, Christopher Laumer wrote:
Can fastq-multx (or any other tool that people know of) demultiplex PE fastq 
files based on the index sequence given in the sequence *headers*, not in the 
sequence itself?

For instance consider a 100 bp fastq looking like this (with a mate in a 
different file):

@ILLUMINA-D00365:240:H9N3RADXX:2:1101:2110:2045 1:N:0:GAGATTCCGGCTCTGA
AAGCCGGTATTTAAATATCTTATTGAAAAAATAATTTTATGGTTTGTTTTATTCTTTTAAATAAAATCTTTTAAATCAAC
TCTTTTTTATTCGGCTATTT
+
CCCFFFFFHHHHHJJJJJJJJJJJJJJIJJJJJJJJJJJJJJIJJJHJJJJJJJJJJJJJJJJJJJJJJHHHHHHFFFFF
FEEEEEEDDDDEDDDDDDDE

The index (here, two 8bp dual indices concatenated) is in the sequence name at 
the end ("1:N:0:GAGATTCCGGCTCTGA").

From all I can gather the normal behavior of fastq-multx is to look for the 
index within the sequence itself - but these are reads that have already been 
"demultiplexed" by CASAVA but using the wrong indices (so they made it into the 
"UndeterminedIndices" file... long story). 

Does anyone have any ideas how to handle this (or if fastq-multx can?). I 
really appreciate the input!
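
Until a tool supports this, splitting on the header index is simple to script. Here is a minimal Python sketch, assuming CASAVA 1.8-style headers where the index is the last ':'-separated field of the comment, exact barcode matches only, and records already parsed into (header, seq, plus, qual) tuples. The function names and the sample names are hypothetical, and this is not fastq-multx code.

```python
from collections import defaultdict


def header_index(header):
    """Return the index from a header such as '... 1:N:0:GAGATTCCGGCTCTGA'.

    A '+' between dual indices (GAGATTCC+GGCTCTGA) is removed so both
    spellings compare equal.
    """
    comment = header.split()[-1]
    return comment.split(":")[-1].replace("+", "")


def demultiplex(records, barcodes):
    """Group records by sample; `barcodes` maps index sequence -> sample name.

    Records whose index is not in `barcodes` go to 'unmatched'.
    """
    bins = defaultdict(list)
    for rec in records:
        sample = barcodes.get(header_index(rec[0]), "unmatched")
        bins[sample].append(rec)
    return dict(bins)


if __name__ == "__main__":
    rec = ("@ILLUMINA-D00365:240:H9N3RADXX:2:1101:2110:2045 1:N:0:GAGATTCCGGCTCTGA",
           "AAGC", "+", "CCCF")
    print(demultiplex([rec], {"GAGATTCCGGCTCTGA": "sampleA"}).keys())
```

For paired-end data the mate file can be read in lockstep and each mate written to the bin chosen from the first read's header.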

Original issue reported on code.google.com by [email protected] on 9 Jul 2014 at 2:12

  • Merged into: #30

Compilation on Mac

The Google sparsehash library consistently fails to compile on Mac... some 
effort should be made to integrate/configure the sparsehash library so that it 
works on all platforms.  Probably just re-syncing with the latest version is 
good enough.

Original issue reported on code.google.com by [email protected] on 9 Jul 2014 at 2:13

Patches for porting ea-utils to other POSIX platforms + warnings clean-up

What steps will reproduce the problem?

1. Build on a FreeBSD, OS X, or other non-Linux system using clang.

What is the expected output? What do you see instead?

A few bugs, many style warnings, some run-time errors due to the use of 
getopt() GNU extensions.

What version of the product are you using? On what operating system?

1.1.2 on FreeBSD and OS X.

Please provide any additional information below.

First, thanks for making this software readily available.  Some of our 
researchers expect to find it very useful.

I have attached a set of patches necessary to build on FreeBSD with clang, 
which is highly compatible with gcc, but a bit more verbose by default.

I found a few genuine bugs like "if (!o_n < MAX_FILES)" and patched a bunch of 
style warnings to silence the compiler.  There are more style warnings, mostly 
related to missing return values, but I left them for now.  I would suggest 
compiling with gcc -Wall to uncover potential problems like these.  I've found 
it very helpful in identifying real bugs, even though most of the warnings it 
produces are innocuous.

I also altered some of the getopt() loops to make them conform to POSIX 
behavior.  They were using a leading '-' in optstring, which is a GNU extension 
and probably only works on Linux.
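
For readers unfamiliar with the difference, Python's standard getopt module exposes both scanning conventions, which makes it easy to see without compiling anything. The sketch below only illustrates POSIX-style versus GNU-style scanning of a command line like the fastq-join one reported in another issue. It does not reproduce the in-order return of non-option arguments that a leading '-' requests from GNU getopt.

```python
import getopt

# Positional arguments first, option last, as in:
#   fastq-join fwd.fastq rvs.fastq -o c1
ARGV = ["fwd.fastq", "rvs.fastq", "-o", "c1"]


def parse_posix(args):
    """POSIX-style scanning: stops at the first non-option argument."""
    return getopt.getopt(args, "o:")


def parse_gnu(args):
    """GNU-style scanning: options and non-options may be intermixed
    (unless POSIXLY_CORRECT is set in the environment)."""
    return getopt.gnu_getopt(args, "o:")


if __name__ == "__main__":
    print("POSIX:", parse_posix(ARGV))
    print("GNU:  ", parse_gnu(ARGV))
```

With POSIX-style scanning the trailing "-o c1" is left among the positional arguments, so a portable loop has to either require options first or collect the remaining arguments after getopt returns -1.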

I've developed a FreeBSD port (very much like an RPM) and will be developing a 
pkgsrc package for use on CentOS, NetBSD, OS X, and possibly other platforms.

Note that we have not extensively tested the attached patches yet.  That's 
going to take some time and I wanted to make you aware of them ASAP.  I have 
confirmed that they eliminate all critical errors in "make check".  A few 
errors are reported, but they're due to minor differences in the output such as 
-Nan vs NaN.

Regards,

    Jason
    [email protected]


Original issue reported on code.google.com by [email protected] on 15 Feb 2015 at 5:28

Attachments:

Add "make clean" target to Makefile

It would be nice to have a "make clean" target to remove all compiled files,
so you can really start compiling from scratch.

$ svn diff
Index: Makefile
===================================================================
--- Makefile    (revision 662)
+++ Makefile    (working copy)
@@ -100,3 +100,6 @@
 bam-filter:  bam-filter.cpp 
        $(CC) $(CFLAGS) fastq-lib.cpp -o $@  $< -lbamtools 

+clean:
+       rm -f fastq-clipper fastq-join fastq-mcf fastq-multx fastq-stats sam-stats varcall
+       cd samtools && make clean



I first compiled ea-utils with the default settings:
  $ make

Later I decided to compile it with -march=native but make complained:
  $ CFLAGS='-march=native -I.' CPPFLAGS='-march=native -I.' make
  grep: ea-utils.spec: No such file or directory
  make: Nothing to be done for `all'.

Adding a make clean target allows me to recompile it:
  $ make clean
  grep: ea-utils.spec: No such file or directory
  rm -f fastq-clipper fastq-join fastq-mcf fastq-multx fastq-stats sam-stats varcall
  cd samtools && make clean
  make[1]: Entering directory `/software/ea-utils/ea-utils-svn/clipper-test/samtools'
  make[2]: Entering directory `/software/ea-utils/ea-utils-svn/clipper-test/samtools'
  rm -fr gmon.out *.o a.out *.exe *.dSYM razip bgzip samtools *~ *.a *.so.* *.so *.dylib
  make[2]: Leaving directory `/software/ea-utils/ea-utils-svn/clipper-test/samtools'
  make[1]: Leaving directory `/software/ea-utils/ea-utils-svn/clipper-test/samtools'
  $ CFLAGS='-march=native -I.' CPPFLAGS='-march=native -I.' make

Original issue reported on code.google.com by [email protected] on 9 Oct 2013 at 12:47

no ea-utils.spex file

Makefile has typo in 1.1.2-353 and 1.1.2-318 where perl looks for 
'ea-utils.spex' where 'ea-utils.spec' is intended.



Original issue reported on code.google.com by [email protected] on 27 Apr 2012 at 1:06

Wrong number (/ counter / calculation) in summary statistics of fastq-mcf

What steps will reproduce the problem?

1. Create a mock fastq file with the following content:
@readname
ATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCG
+
############################################################

2. Run fastq-mcf with default parameters:
# ea-utils.1.1.2-686/fastq-mcf n/a test.fastq 
Command Line: n/a test.fastq
Scale used: 2.2
Phred: 33
Threshold used: 1 out of 1
Files: 1
Total reads: 1
Too short after clip: 1
Trimmed 1 reads by an average of 120.00 bases on quality < 7

What is the expected output? What do you see instead?
Observe that fastq-mcf reports that on average 120 bases are clipped, though 
there are only 60 bases in the fastq file.
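
To make the expectation concrete: with Phred+33, '#' encodes quality 2 (ASCII 35 - 33), which is below the threshold of 7, so the whole 60-base read can be trimmed, but never more than 60 bases. The Python sketch below uses a simplified end-trimming rule of my own (strip bases below the threshold from both ends); it is not fastq-mcf's algorithm. The reported 120 is exactly twice the read length, though the cause of that is not established here.

```python
def quality_trim(seq, qual, min_q=7, phred=33):
    """Strip bases with quality < min_q from both ends.

    Returns (trimmed sequence, number of bases removed). The removed count
    can never exceed len(seq).
    """
    scores = [ord(c) - phred for c in qual]
    start, end = 0, len(seq)
    while end > start and scores[end - 1] < min_q:
        end -= 1
    while start < end and scores[start] < min_q:
        start += 1
    return seq[start:end], len(seq) - (end - start)


def mean_trimmed(reads, min_q=7, phred=33):
    """Average number of bases removed, over the reads that were trimmed."""
    removed = [quality_trim(s, q, min_q, phred)[1] for s, q in reads]
    trimmed = [n for n in removed if n > 0]
    return sum(trimmed) / len(trimmed) if trimmed else 0.0


if __name__ == "__main__":
    read = ("ATCG" * 15, "#" * 60)   # the mock record from step 1
    print(mean_trimmed([read]))
```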

What version of the product are you using? On what operating system?
This happened with some older versions, as well as the most recent 
ea-utils.1.1.2-686 on a Linux machine.

This error occurs regardless of whether or not I am using an adapter file.


Original issue reported on code.google.com by delasmurf on 25 Aug 2014 at 12:29

Has anyone been able to compile EA utils on Mountain Lion?

What steps will reproduce the problem?
1. When I run make install I get the error shown below.

What is the expected output? What do you see instead?

++ -O3 -I.  fastq-lib.cpp tidx/tidx-lib.cpp -o varcall varcall.cpp -lgsl 
-lgslcblas
/usr/include/c++/4.2.1/bits/stl_pair.h: In instantiation of ‘std::pair<const 
std::string&, const std::vector<annot, std::allocator<annot> > >’:
tidx/tidx-lib.cpp:34:   instantiated from here
/usr/include/c++/4.2.1/bits/stl_pair.h:84: error: forming reference to 
reference type ‘const std::string&’
varcall.cpp:33:29: error: gsl/gsl_randist.h: No such file or directory
varcall.cpp: In member function ‘void 
VarCallVisitor::VisitX(PileupSummary&)’:
varcall.cpp:1125: error: ‘gsl_ran_poisson_pdf’ was not declared in this 
scope
varcall.cpp:1167: error: ‘gsl_ran_poisson_pdf’ was not declared in this 
scope
make: *** [varcall] Error 1


What version of the product are you using? On what operating system?
The newer one 1.1.2.537

Please provide any additional information below.

Thanks!


Original issue reported on code.google.com by [email protected] on 4 Mar 2013 at 2:00

The N-percent maximum difference

In the current version, its default is 8. Does this mean that at most 8% 
difference is allowed? In the wiki document, however, the default is .20.

In addition, if only perfect matches are wanted, should I set it to '0' in the 
current version?
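
One data point from the source: the fastq-join.c excerpt quoted in another issue on this page reads the option with `pctdiff = atoi(optarg)`, i.e. as an integer, at least in that revision. That suggests 8 is meant as 8 percent, and that a wiki-style value of ".20" would be parsed as 0. The Python sketch below mimics atoi and shows one way such a threshold could be applied. The within_pctdiff rule is my assumption, not taken from the fastq-join source.

```python
import re


def c_atoi(text):
    """Mimic C atoi(): parse a leading, optionally signed integer, else return 0."""
    match = re.match(r"\s*([+-]?\d+)", text)
    return int(match.group(1)) if match else 0


def within_pctdiff(mismatches, overlap_len, pctdiff):
    """True if mismatches are at most pctdiff percent of the overlap (assumed rule)."""
    return mismatches * 100 <= pctdiff * overlap_len


if __name__ == "__main__":
    for value in ("8", ".20", "0"):
        print(repr(value), "->", c_atoi(value))
    # With a threshold of 0, only overlaps without any mismatch pass.
    print(within_pctdiff(0, 50, 0), within_pctdiff(1, 50, 0))
```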

Original issue reported on code.google.com by [email protected] on 24 Apr 2013 at 6:51

fastq-multx

What steps will reproduce the problem?
1. I want to use fastq-multx to demultiplex the reads files with a barcode 
file. I am using this command:
fastq-multx -b barcode.txt s1.fq s2.fq -o 1%.fq -o 2%.fq

What is the expected output? What do you see instead?
I need to create two output files with the barcodes in them.

What version of the product are you using? On what operating system?
ea-utils.1.1.2-484, which I am using on a Mac

Please provide any additional information below.
Can I get the steps for running this command on the command line?

I just downloaded it and am trying to install it.

Original issue reported on code.google.com by [email protected] on 10 Nov 2012 at 12:09

fastq-stats make error: google/sparse_hash_map: No such file or directory

What is the expected output? What do you see instead?
CC=g++ PREFIX=/custom/path/ make install
g++ -O3 fastq-lib.cpp -o fastq-mcf fastq-mcf.c
g++ -O3 fastq-lib.cpp -o fastq-multx fastq-multx.c
g++ -O3 fastq-lib.cpp -o fastq-join fastq-join.c
g++ -O3 fastq-lib.cpp -o fastq-stats fastq-stats.cpp
fastq-stats.cpp:34:77: error: google/sparse_hash_map: No such file or directory
fastq-stats.cpp:99: error: expected unqualified-id before ‘<’ token
fastq-stats.cpp:99: error: expected ‘)’ before ‘<’ token
fastq-stats.cpp:99: error: expected initializer before ‘<’ token
fastq-stats.cpp:105: error: ‘google’ has not been declared
fastq-stats.cpp:105: error: expected constructor, destructor, or type 
conversion before ‘<’ token
fastq-stats.cpp:107: error: expected constructor, destructor, or type 
conversion before ‘<’ token
fastq-stats.cpp:118: error: expected constructor, destructor, or type 
conversion before ‘<’ token
fastq-stats.cpp: In function ‘int main(int, char**)’:
fastq-stats.cpp:170: error: ‘vector’ was not declared in this scope
fastq-stats.cpp:170: error: expected primary-expression before ‘>’ token
fastq-stats.cpp:170: error: ‘qcStats’ was not declared in this scope
fastq-stats.cpp:171: error: expected primary-expression before ‘>’ token
fastq-stats.cpp:171: error: ‘qcStats_by_qual’ was not declared in this scope
fastq-stats.cpp:179: error: ‘dups’ was not declared in this scope
fastq-stats.cpp:205: error: ‘vlen’ was not declared in this scope
fastq-stats.cpp:207: error: ‘vlen’ was not declared in this scope
fastq-stats.cpp:277: error: ‘google’ has not been declared
fastq-stats.cpp:277: error: expected primary-expression before ‘,’ token
fastq-stats.cpp:277: error: expected primary-expression before ‘int’
fastq-stats.cpp:277: error: expected ‘;’ before ‘int’
fastq-stats.cpp:278: error: ‘it’ was not declared in this scope
fastq-stats.cpp:293: error: ‘vector’ is not a member of ‘std’
fastq-stats.cpp:293: error: expected primary-expression before ‘>’ token
fastq-stats.cpp:293: error: ‘dup_sort’ was not declared in this scope
fastq-stats.cpp:294: error: ‘google’ has not been declared
fastq-stats.cpp:294: error: expected primary-expression before ‘,’ token
fastq-stats.cpp:294: error: expected primary-expression before ‘int’
fastq-stats.cpp:294: error: expected ‘;’ before ‘int’
fastq-stats.cpp:295: error: ‘it’ was not declared in this scope
fastq-stats.cpp:307: error: ‘sort’ is not a member of ‘std’
fastq-stats.cpp:439: error: ‘vlen’ was not declared in this scope
fastq-stats.cpp: At global scope:
fastq-stats.cpp:521: error: expected unqualified-id before ‘<’ token
fastq-stats.cpp:521: error: expected ‘)’ before ‘<’ token
fastq-stats.cpp:521: error: expected initializer before ‘<’ token
make: *** [fastq-stats] Error 1


What version of the product are you using? On what operating system?
ea-utils.1.1.2-358


Thanks for any help! :)

Original issue reported on code.google.com by [email protected] on 9 Jul 2012 at 2:39

make install failing because it can't find gsl-randist.h

What steps will reproduce the problem?
1.make install

What is the expected output? What do you see instead?
I expect the various binaries to compile and get put in the right place. 
Instead I get the following fatal error:
 varcall.cpp:33:29: fatal error: gsl/gsl_randist.h: No such file or directory

What version of the product are you using? On what operating system?
1.1.2-537 on Ubuntu 12.04 or 12.10 


Please provide any additional information below.
I installed gsl-bin, which includes gsl-randist but that didn't help because I 
guess I just had the binaries, not the source. I considered manually creating a 
gsl directory, downloading gsl_randist.h from the gsl-bin github repository, 
and putting it there, but that seemed likely to end in disaster.

Original issue reported on code.google.com by [email protected] on 28 May 2013 at 4:03

fastq-mcf doesn't work on Lion

Apparently it cannot work on OS X 10.7 (nor on 10.6) because the getopt function 
always returns -1.

I suggest applying this patch to use getopt_long:


$ diff -u fastq-mcf.c fastq-mcf.c.new 
--- fastq-mcf.c 2011-11-09 21:33:42.000000000 +0100
+++ fastq-mcf.c.new 2011-12-20 14:30:53.000000000 +0100
@@ -29,6 +29,7 @@
 #include <ctype.h>
 #include <stdio.h>
 #include <stdlib.h>
+#include <getopt.h>
 #include <unistd.h>
 #include <string.h>
 #include <errno.h>
@@ -87,7 +88,7 @@
 int debug=0;
 int warncount = 0;
 int main (int argc, char **argv) {
-   char c;
+   int c;
    bool eol;
    int nmin = 1, nkeep = 15, nmax=0;
    float minpct = 0.25;
@@ -116,7 +117,7 @@
    int e_n = 0;
    bool skipb = 0;

-   while ( (c = getopt (argc, argv, "-nf0uUVSRdbehp:o:l:s:m:t:k:x:P:q:L:C:w:")) != -1) {
+   while ( (c = getopt_long (argc, argv, "-nf0uUVSRdbehp:o:l:s:m:t:k:x:P:q:L:C:w:", NULL, NULL)) != -1) {
        switch (c) {
        case '\1': 
            if (!afil) 

Original issue reported on code.google.com by daweonline on 20 Dec 2011 at 1:35

Inconsistent fastq-mcf results between servers

Hi,

If I run the following command on two different servers (m1 and m2):

fastq-mcf -q 30 --qual-mean 25 /ngs/transcript/adap_mint_Illum_3.fas 
PRI_ALZH_CrIT_ACAGTC_L002_R1_005.fastq.gz 
PRI_ALZH_CrIT_ACAGTC_L002_R2_005.fastq.gz -o 
PRI_ALZH_CrIT_ACAGTC_L002_R1_005_trim.fastq -o 
PRI_ALZH_CrIT_ACAGTC_L002_R2_005_trim.fastq

I am getting two different results: one is the expected one (on m1), the other 
totally missed some adapters (m2)... On both machines I have compiled the same 
ea-utils version (ea-utils.1.1.2-537) and I am using the exact same files.

One example for the following adapter:

>MINTadapterThreeprimeA
AAGCAGTGGTATCAACGCAGAGTACTTTTT

On m1:

[tristan@cabecou test_trim] zgrep -c AAGCAGTGGTATCAACGCAGAGTACTTTTT 
PRI_ALZH_CrIT_ACAGTC_L002_R1_005_trim.fastq.gz   
0 

On m2:

[tristan@umr5023-proasellus Sample_PRI_ALZH_CrIT] grep -c 
AAGCAGTGGTATCAACGCAGAGTACTTTTT PRI_ALZH_CrIT_ACAGTC_L002_R2_005_trim.fastq
79601


I guess it must be linked to some software version differences between the two 
machines. Regarding GCC:

On m1:

[tristan@cabecou test_trim] gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.7/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 
4.7.2-2ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-4.7/README.Bugs 
--enable-languages=c,c++,go,fortran,objc,obj-c++ --prefix=/usr 
--program-suffix=-4.7 --enable-shared --enable-linker-build-id 
--with-system-zlib --libexecdir=/usr/lib --without-included-gettext 
--enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.7 
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu 
--enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object 
--enable-plugin --enable-objc-gc --disable-werror --with-arch-32=i686 
--with-tune=generic --enable-checking=release --build=x86_64-linux-gnu 
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.7.2 (Ubuntu/Linaro 4.7.2-2ubuntu1) 

On m2:

[tristan@umr5023-proasellus Sample_PRI_ALZH_CrIT] gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.6/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 
4.6.3-1ubuntu5' --with-bugurl=file:///usr/share/doc/gcc-4.6/README.Bugs 
--enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr 
--program-suffix=-4.6 --enable-shared --enable-linker-build-id 
--with-system-zlib --libexecdir=/usr/lib --without-included-gettext 
--enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.6 
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu 
--enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object 
--enable-plugin --enable-objc-gc --disable-werror --with-arch-32=i686 
--with-tune=generic --enable-checking=release --build=x86_64-linux-gnu 
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) 


Any help is much appreciated; m2 is our production machine, so our pipeline is 
currently stopped...

Thanks,
--
Tristan Lefebure


Original issue reported on code.google.com by [email protected] on 25 Mar 2013 at 9:23

Incorrect total reads when using FIFO

What steps will reproduce the problem?
1. mkfifo test1.fastq && mkfifo test2.fastq
2. cat source1.fastq > test1.fastq &
3. cat source2.fastq > test2.fastq &
4. fastq-mcf -o source1_filtered.fastq -o source2_filtered.fastq -C 100000 -t 0.01 -l 50 --qual-mean 25 --max-ns 10 -q 15 adapters.fa test1.fastq test2.fastq > filter.stats

What is the expected output? What do you see instead?
The "Total reads: xxxxxx" should match the number of reads in source1.fastq and 
source2.fastq.

However, the reported total is 100000 less than the correct value, which 
matches the value specified with "-C".
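
One plausible explanation, which is my assumption and not confirmed from the source: the reads consumed while sampling (the -C count) cannot be read a second time, because a FIFO cannot be rewound the way a regular file can. The Python sketch below shows the effect on a forward-only stream and the usual remedy of keeping the sampled records and replaying them. The function names are made up for illustration.

```python
import io
import itertools


def read_records(stream):
    """Yield 4-line FASTQ records from a text stream, reading strictly forward."""
    while True:
        record = [stream.readline() for _ in range(4)]
        if not record[0]:
            return  # end of stream
        yield record


def count_after_sampling(stream, sample_size, replay=True):
    """Sample the first records, then count all records that get processed.

    With replay=False the sampled records are lost, as they would be on a
    pipe that cannot be rewound.
    """
    records = read_records(stream)
    sampled = list(itertools.islice(records, sample_size))
    source = itertools.chain(sampled, records) if replay else records
    return sum(1 for _ in source)


if __name__ == "__main__":
    data = "".join(f"@r{i}\nACGT\n+\nIIII\n" for i in range(5))
    print(count_after_sampling(io.StringIO(data), 2, replay=False))
    print(count_after_sampling(io.StringIO(data), 2, replay=True))
```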

What version of the product are you using? On what operating system?
ea-utils.1.1.2-537.tar.gz, Ubuntu 12.04

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 18 Oct 2013 at 8:09

fastq-mcf does not process all fasta entries

What steps will reproduce the problem?
1. make a fasta file with 14 entries called test.fa
2. run fastq-mcf as: fastq-mcf test.fa <fastq_set_of_choice>
3. Watch the log: only the first 12 entries are reported.

What is the expected output? What do you see instead?
Only the first 12 entries in the fasta file are processed, instead of all 
entries.

The input (fasta entries named from A to N)
==========
$ cat contlist.fa 
>contaA
CAACCATTCATTCCAGCCTTCAATTAAAAGACTAATGATTATGCTACCTT
>contaB
GATCGGAAGAGCACACGTCTGAACTCCAGTCACCAGATCATCTCGTATGC
>contaC
AAAAAATTAAGTTACTTTAGGGATAACAGCGTAATTTTTTTGGAGAGTTC
>contaD
GTCCTTTCGTACTAAAATATCACAATTTTTTAAAGATAGAAACCAACCTG
>contaE
CTCGTCTTTTAAATAAATTTTAGCTTTTTGACTAAAAAATAAAATTCTAT
>contaF
GATCGGAAGAGCACACGTCTGAACTCCAGTCACACAGTGATCTCGTATGC
>contaG
CAAAAACATGTCTTTTTGAATTATATATAAAGTCTAACCTGCCCACTGAA
>contaH
CTAAAATATCACAATTTTTTAAAGATAGAAACCAACCTGGCTTACACCGG
>contaI
ATTTTTTTGGAGAGTTCATATCGATAAAAAAGATTGCGACCTCGATGTTG
>contaJ
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCCGTCCCGATCTCGTA
>contaK
GTCCTTTCGTACTAAAATATCATAATTTTTTAAAGATAGAAACCAACCTG
>contaL
CTGGCTTACACCGGTTTGAACTCAGATCATGTAAGAATTTAAAAGTCGAA
>contaM
GATCGGAAGAGCACACGTCTGAACTCCAGTCACGCCAATATCTCGTATGC
>contaN
GATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATCTCGTATGC


The output (entries A to L processed, not M and N)
==========
$ cat fastqmcf.log 
Scale used: 2.2
Phred: 33
Threshold used: 751 out of 300000
Adapter contaA (CAACCATTCATTCCAGCCTTCAATTAAAAGACTAATGATTATGCTACCTT): counted 
1101 at the 'start' of 'readsample.fastq', clip set to 7
Adapter contaC (AAAAAATTAAGTTACTTTAGGGATAACAGCGTAATTTTTTTGGAGAGTTC): counted 
1171 at the 'start' of 'readsample.fastq', clip set to 7
Adapter contaD (GTCCTTTCGTACTAAAATATCACAATTTTTTAAAGATAGAAACCAACCTG): counted 
1398 at the 'start' of 'readsample.fastq', clip set to 6
Adapter contaE (CTCGTCTTTTAAATAAATTTTAGCTTTTTGACTAAAAAATAAAATTCTAT): counted 
1051 at the 'start' of 'readsample.fastq', clip set to 7
Adapter contaG (CAAAAACATGTCTTTTTGAATTATATATAAAGTCTAACCTGCCCACTGAA): counted 
1044 at the 'end' of 'readsample.fastq', clip set to 7
Adapter contaH (CTAAAATATCACAATTTTTTAAAGATAGAAACCAACCTGGCTTACACCGG): counted 
640 at the 'start' of 'readsample.fastq', clip set to 7
Adapter contaI (ATTTTTTTGGAGAGTTCATATCGATAAAAAAGATTGCGACCTCGATGTTG): counted 
1014 at the 'start' of 'readsample.fastq', clip set to 7, warning end was not 
reliable
Adapter contaK (GTCCTTTCGTACTAAAATATCATAATTTTTTAAAGATAGAAACCAACCTG): counted 
933 at the 'start' of 'readsample.fastq', clip set to 7
Adapter contaL (CTGGCTTACACCGGTTTGAACTCAGATCATGTAAGAATTTAAAAGTCGAA): counted 
1164 at the 'start' of 'readsample.fastq', clip set to 7
Files: 1
Total reads: 300000
Too short after clip: 6318
Clipped 'start' reads: Count: 2215, Mean: 19.34, Sd: 7.81
Clipped 'end' reads: Count: 899, Mean: 20.02, Sd: 8.77
Trimmed 9816 reads by an average of 8.33 bases on quality < 7


What version of the product are you using? On what operating system?
Latest version, on CentOS 6.3 64 bit.

Please provide any additional information below.
I am using fastq-mcf as a Galaxy plugin. The behaviour does not appear to come
from the Galaxy plugin code, but from fastq-mcf itself. Thanks for your
consideration!

See http://toolshed.g2.bx.psu.edu/

Original issue reported on code.google.com by [email protected] on 11 Feb 2013 at 9:44

make: *** [fastq-mcf] Error 1



What is the expected output? What do you see instead?
Attached a word file...problem during make

What version of the product are you using? On what operating system?
ea-utils.1.1.2-537. installed on windows7

Please provide any additional information

Original issue reported on code.google.com by [email protected] on 3 Jun 2013 at 12:49

compilation error because sparsehash/sparse_hash_map is not found

What steps will reproduce the problem?
1. tar -xzvf ea-utils.1.1.2-806.tar.gz
2. cd ea-utils.1.1.2-806
3. PREFIX=$HOME make install

What is the expected output? What do you see instead?
...
g++  -I/home/flutre/include fastq-mcf.cpp fastq-lib.cpp -o fastq-mcf
fastq-mcf.cpp:29:81: fatal error: sparsehash/sparse_hash_map: No such file or 
directory
 #include <sparsehash/sparse_hash_map> // or sparse_hash_set, dense_hash_map, ...
compilation terminated.
make: *** [fastq-mcf] Error 1

What version of the product are you using?
the latest

On what operating system?
Linux 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 
GNU/Linux

Please provide any additional information below.
The file does exist:
$ file sparsehash/sparse_hash_map
sparsehash/sparse_hash_map: ASCII C++ program text
Maybe the Makefile should be edited so that this file is found?

Original issue reported on code.google.com by [email protected] on 6 Nov 2014 at 8:03

Limit on the number of files it can handle

What steps will reproduce the problem?
1. Call fastq-multx with 6000 different barcodes

What is the expected output? What do you see instead?
fastq files

What version of the product are you using? On what operating system?
ea-utils.1.1.2-537
on "Ubuntu 12.04.3 LTS"

Please provide any additional information below.

I have about 6000 cells and I need to run fastq-multx on it in order to 
demultiplex them.
However when I do, I get this error:

End used: start
Error opening file '1211.barcoded.fastq': Too many open files

Is there a way to fix this?
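
The "Too many open files" message usually means the process reached its
per-process limit on open file descriptors; the error suggests fastq-multx
keeps one output file open per barcode, though that is an inference from the
message and not confirmed here. A minimal Python sketch (not part of ea-utils;
the function names and the reserve of 16 descriptors are assumptions for
illustration) that checks whether a given number of output files fits under the
current soft limit:

```python
import resource


def current_soft_limit():
    """Return the soft limit on open file descriptors for this process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


def fits_in_limit(n_output_files, soft_limit, reserve=16):
    """Return True if n_output_files plus a small reserve fits in soft_limit.

    The reserve (stdin, stdout, stderr, input files) is an arbitrary
    assumption, not a value taken from fastq-multx.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return n_output_files + reserve <= soft_limit


if __name__ == "__main__":
    limit = current_soft_limit()
    print("soft limit:", limit)
    print("enough for 6000 barcodes:", fits_in_limit(6000, limit))
```

If the check fails, raising the soft limit in the shell with `ulimit -n` (up to
the hard limit) before running the tool is the usual workaround; whether that
is sufficient for 6000 barcodes has not been verified here.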

Original issue reported on code.google.com by [email protected] on 17 Mar 2014 at 7:01

gtf2bed wrong output.

Hi,

I have been using the gtf2bed Perl script for conversion and recently came
across this problem.

I downloaded a GTF file (refFlat, hg19) from the UCSC Table Browser. When I
convert this GTF to BED, some of the transcripts have wrong information.

For example, the GTF file has 2 entries for transcript 'LOC100133331', one on
chromosome 1 (4 exons) and one on chromosome 5 (4 exons). But the script
produces a single entry for both transcripts, as a transcript on chromosome 1
with 8 exons. I guess the chr5 entry was concatenated with the chr1 entry since
the transcript ID was the same.

Also, the BED file produced by the script has far fewer transcripts than the
GTF file.
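
One way to avoid merging same-named transcripts from different chromosomes is
to key the grouping on more than the transcript ID. The following Python sketch
is not the gtf2bed script's code; the function name `group_exons` and the choice
of (chromosome, strand, transcript_id) as the key are assumptions made for
illustration:

```python
import re
from collections import defaultdict


def group_exons(gtf_lines):
    """Group exon intervals by (chromosome, strand, transcript_id).

    Keying on the transcript ID alone would merge transcripts that share an
    ID but sit on different chromosomes, as described in this report.
    """
    groups = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue  # only exon features contribute BED blocks
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue  # no transcript_id attribute on this line
        key = (fields[0], fields[6], match.group(1))
        groups[key].append((int(fields[3]), int(fields[4])))
    return dict(groups)
```

With this key, the chr1 and chr5 copies of 'LOC100133331' would become two
separate groups of 4 exons each instead of one group of 8.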



Original issue reported on code.google.com by [email protected] on 30 Oct 2014 at 10:11

Odd adapter parsing

What steps will reproduce the problem?
1. Use GATCGGAAGAGCGGTTCAGCAGGAATGCCGAGATCGGAAG as adapter1
2. Use GATCGGAAGAGCTCGTCAGCAGGAATGCCGAGATCGGAAG as adapter2
3. Issue fastq-mcf -n <adapter>.fa <fastq>

What is the expected output? What do you see instead?
The output should show two adapters as input. Instead, fastq-mcf reads the
first adapter and discards the second.

What version of the product are you using? On what operating system?
ea-utils.1.0.3-145.tar.gz on Fedora 14 / 2.6.35.13-92.fc14.x86_64

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 16 Jun 2011 at 4:02

"fastq-mcf has stopped working"

What steps will reproduce the problem?
1. I just run fastq-mcf on sets of paired end read files.  It works fine for 
all of them except for one set of files.  I don't see any difference between 
that set and the others that would cause this. 
2. I use the following command line: fastq-mcf -t 0 -D 50 -q 30 
Nextera_adapters.fas H1Pplus_S20_L001_R1_001.fastq 
H1Pplus_S20_L001_R2_001.fastq -o H1Pplus_cleaned_1.fastq -o 
H1Pplus_cleaned_2.fastq
3. I try different option values (k, t, D), but the result is always the same.

What is the expected output? What do you see instead?
I expect two cleaned up read files, H1Pplus_cleaned_1.fastq and
H1Pplus_cleaned_2.fastq. At first it seems to work: "Scale used: 2.2" appears
on the command prompt screen. Immediately after, a Windows message appears that
says "fastq-mcf has stopped working" and that Windows is searching for a
solution to the problem.

What version of the product are you using? On what operating system?
ea-utils-1.1.2-621-win64 on a pc windows 7


Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 5 Feb 2015 at 8:18

GNU 'getline' function causes g++ to fail on OS X

I just downloaded ea-utils.1.1.0-190.tar.gz onto my Mac (OS 10.6) and tried to 
compile it, but got this error.

$ sudo make install
grep: ea-utils.spex: No such file or directory
g++ -O3 -o fastq-clipper fastq-clipper.c
fastq-clipper.c: In function 'int main(int, char**)':
fastq-clipper.c:138: error: 'getline' was not declared in this scope
make: *** [fastq-clipper] Error 1

I'm pretty sure the getline function is GNU specific, which restricts the use 
of ea-utils to Linux. I wouldn't imagine it to be too difficult to use 
something more portable (fgets perhaps) that would also allow the code to run 
on other Unix flavors (OS X, Solaris, BSD, etc).

...unless you're choosing only to support Linux...

Original issue reported on code.google.com by [email protected] on 29 Sep 2011 at 12:35

fastq-mcf: Allow merging multiple FASTQ input files to one (or two in case of paired-end) cleaned fastq file

It would be nice if fastq-mcf could write multiple input FASTQ files to one
cleaned file (single-end sequencing) or to only two output FASTQ files (forward
and reverse, for paired-end sequencing). That would make it possible to clean
FASTQ files from the same sample spread over multiple lanes in one run, without
having to merge the cleaned FASTQ files yourself afterwards.


Also is this TODO still valid? http://code.google.com/p/ea-utils/wiki/FastqMcf

====================
When discarding one read for being "too short", it has to discard both pairs. 
For a sequencing run of normal quality this is not an issue. It should, though, 
write "un-mated" reads (whose mate was skipped) to a separate file. Typically, 
since these read mates were poor quality, it's not really useful... but it can 
be for diagnostics. I've seen runs where these provide valuable data.
====================

When cleaning paired-end reads, the cleaned forward and reverse FASTQ files 
seem to have the same number of reads.


It would also be nice to have the number of reads in the cleaned files
themselves reported in the statistics info:
 Kept reads = ((Total reads) - (Too short after clip))

Files: 2
Total reads: 46911033
Too short after clip: 1420553
Clipped 'end' reads (RNAseq.R1.fastq.gz): Count 1369131, Mean: 23.85, Sd: 15.35
Trimmed 5391478 reads (RNAseq.R1.fastq.gz) by an average of 17.01 bases on 
quality < 15
Clipped 'end' reads (RNAseq.R2.fastq.gz): Count 1113190, Mean: 16.53, Sd: 8.83
Trimmed 9609085 reads (RNAseq.R2.fastq.gz) by an average of 26.38 bases on 
quality < 15
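
Until such a line exists, the value can be derived from the existing log. A
small Python sketch, assuming the log layout shown above; the function name
`kept_reads` is hypothetical, and the formula only subtracts the "Too short
after clip" count as proposed in this request (it ignores other filters such as
quality filtering):

```python
import re


def kept_reads(log_text):
    """Compute Total reads minus Too short after clip from a fastq-mcf log."""
    total = int(re.search(r"^Total reads: (\d+)", log_text, re.M).group(1))
    short = re.search(r"^Too short after clip: (\d+)", log_text, re.M)
    too_short = int(short.group(1)) if short else 0
    return total - too_short
```

For the log above this gives 46911033 - 1420553 = 45490480 kept reads.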


Original issue reported on code.google.com by [email protected] on 9 Oct 2013 at 1:21

fastq-multx when barcode is in header between # and / symbols?

What steps will reproduce the problem?
1. I have fastq files where the barcode was written in the header row between
the symbols # (on the left) and / (on the right).
2. To complicate things further, we used 6-base barcodes, and the machine was
set to sequence 8-base barcodes. We know for a fact that our barcode is the
first 6 bases after the # symbol. The last two bases should always be "AT" in
our case.
3. I used the -B option with a file giving the 6-base barcodes.

What is the expected output? What do you see instead?
I observed 88% "unmatched" reads (150,174,690 out of a total of 170,571,259).
The log mentioned "End used: end", meaning it used the 3' end of the read. Is
there a way to make it look into the header?
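
As a possible pre-processing step, the barcode can be pulled out of the header
before demultiplexing. This Python sketch is not a fastq-multx feature; the
function name `header_barcode` and the example header are made up, and it
assumes the barcode sits between the first '#' and the following '/':

```python
def header_barcode(header_line, length=6):
    """Return the first `length` bases found between '#' and '/' in a header.

    Returns None when the header does not contain both delimiters.
    """
    start = header_line.find("#")
    if start == -1:
        return None
    end = header_line.find("/", start + 1)
    if end == -1:
        return None
    # Keep only the leading bases; the trailing bases (e.g. "AT") are dropped.
    return header_line[start + 1:end][:length]
```

The extracted 6-base string could then be matched against the barcode list, or
written out as a separate index read for a tool that expects one.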

What version of the product are you using? On what operating system?
Working on Ubuntu Linux, ea-utils.1.1.2-537.tar.gz

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 20 Mar 2014 at 12:42

Error opening file 'GFI-410.fastq': Value too large for defined data type

I just ran the fastq-multx tool to demultiplex barcodes from a fastq file and
got this error:

Error opening file 'GFI-410.fastq': Value too large for defined data type

I am using ea-utilsv1.1.258 on Ubuntu 8.04.4 LTS/Linux 2.6.24-25-generic i686

This is the command I used:
fastq-multx -B barcode-list.txt GFI-410.fastq -o barcode.%.fq -b

My fastq file contains about 170 million reads and is 41 GB in size. My
barcode list consists of 56 lines of different barcodes.

Is there a size limit for the fastq file, or should I use a different
version? Any help would be much appreciated.

Thank you 

Original issue reported on code.google.com by [email protected] on 11 Jan 2012 at 9:12

fastq-mcf not working correctly on ubuntu 14.04

What steps will reproduce the problem?
1.fastq-mcf trim on Ubuntu 12.04:
kathryn@sol:~$ fastq-mcf -o SRR652093.fastq.clip -x 0 -k 0 -l 20 -L 40 -q 2 
--qual-mean 20 adaptors.fa SRR652093.fastq
Command Line: -o SRR652093.fastq.clip -x 0 -k 0 -l 20 -L 40 -q 2 --qual-mean 20 
adaptors.fa SRR652093.fastq
Scale used: 2.2
Phred: 33
Threshold used: 107 out of 42564
Adapter 5'_rev (TTTCAGGTGCCTACGATCATGCTGATGGCGCGAGGGAGGC): counted 38101 at the 
'end' of 'SRR652093.fastq', clip set to 1
Adapter 3'_for (CATGATTGATGGTGCCTACAG): counted 41709 at the 'start' of 
'SRR652093.fastq', clip set to 1
Files: 1
Total reads: 42564
Too short after clip: 6070
Filtered on quality: 980
Clipped 'start' reads: Count: 35988, Mean: 20.97, Sd: 0.86
Clipped 'end' reads: Count: 27073, Mean: 36.90, Sd: 7.98

2.Output from the same command, same adaptor file and same fastq file run on 
Ubuntu 14.04:
kathryn@kathryn-linux:/local/work$ fastq-mcf -o SRR652093.fastq.clip -x 0 -k 0 
-l 20 -L 40 -q 2 --qual-mean 20 adaptors.fa SRR652093.fastq
Command Line: -o SRR652093.fastq.clip -x 0 -k 0 -l 20 -L 40 -q 2 --qual-mean 20 
adaptors.fa SRR652093.fastq
Scale used: 2.2
Phred: 33
Threshold used: 107 out of 42564
Adapter 5'_rev (TTTCAGGTGCCTACGATCATGCTGATGGCGCGAGGGAGGC): counted 38101 at the 
'end' of 'SRR652093.fastq', clip set to 1
Files: 1
Total reads: 42564
Too short after clip: 1
Filtered on quality: 38
Clipped 'end' reads: Count: 32928, Mean: 36.75, Sd: 8.03

3.Check that the missing adaptor is in the fastq file:
kathryn@kathryn-linux:/local/work$ grep -c '^CATGATTGATGGTGCCTACAG' 
SRR652093.fastq 
41068

What is the expected output? What do you see instead?
I would expect the outputs on the two systems to be the same. However, on the
more recent version of Ubuntu the adaptor at the 'start' of the reads is not
found or trimmed.

What version of the product are you using? On what operating system?
I have tested 1.1.2-686 and 1.1.2-780.  Both work as expected on Ubuntu 10.04 
and 12.04 but not 14.04.


Please provide any additional information below.
I have tried this with other data sets and it is consistently the adaptor at 
the 'start' of the sequence that is not recognised or clipped correctly on 
14.04.

I compiled these packages on all 3 versions of Ubuntu myself as follows:
download tarball
tar xzvf <package>
cd <package dir>
make all

For 1.1.2-780 only before the make all:
cd sparsehash-2.0.2
./configure
sudo make install
cd <package dir>

If there are any other instructions for compiling on 14.04 that I missed, I 
apologise!


Original issue reported on code.google.com by [email protected] on 3 Sep 2014 at 4:19

fastq-mcf doesn't clip... even if it finds things to clip

What steps will reproduce the problem?

1. any attempt to clip adapters

What is the expected output? What do you see instead?

expect them to be clipped

Please use labels and text to provide additional information.

typo in barcode-skip code caused all reads to be skipped... fixed

Original issue reported on code.google.com by [email protected] on 10 May 2011 at 7:28

Giant FASTQ support in stats

Some stats programs have features such as k-mer reports (with -K) and probe-id
counting (with -D).

These programs can consume a lot of RAM (>10GB) on very large files (> 200 mil
reads), even with the highly efficient sparsehash library.

A disk-backed key-value store, like levelDB, could deliver decent, hash-like
performance while also allowing growth past available RAM. I'm thinking that
the code should switch to a DB-backed store at the 200 mil record level. This
would slow things down by about 3x (from 1 mil writes/sec to 300k writes/sec),
but would also allow unbounded growth. Enabling a large LRU cache could make it
perform so similarly that the sparse hash can be abandoned, especially if the
db remains an insignificant fraction of the stats collection process.
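
The idea can be illustrated with a small Python sketch. It is only a sketch
under stated assumptions: it swaps in the standard-library sqlite3 module in
place of levelDB, the class name `SpillingCounter` is hypothetical, and the
in-memory dict stands in for the sparse hash. Counts accumulate in memory and
are merged into the on-disk table whenever the number of distinct keys exceeds
a threshold:

```python
import sqlite3


class SpillingCounter:
    """Count keys in memory and spill to a SQLite table past a threshold."""

    def __init__(self, db_path, max_in_memory):
        self.mem = {}
        self.max_in_memory = max_in_memory
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS counts "
            "(k TEXT PRIMARY KEY, n INTEGER NOT NULL)"
        )

    def add(self, key):
        """Increment the count for key, spilling if memory use grows."""
        self.mem[key] = self.mem.get(key, 0) + 1
        if len(self.mem) > self.max_in_memory:
            self.flush()

    def flush(self):
        """Merge all in-memory counts into the on-disk table."""
        items = list(self.mem.items())
        self.db.executemany(
            "INSERT OR IGNORE INTO counts (k, n) VALUES (?, 0)",
            [(k,) for k, _n in items],
        )
        self.db.executemany(
            "UPDATE counts SET n = n + ? WHERE k = ?",
            [(n, k) for k, n in items],
        )
        self.db.commit()
        self.mem.clear()

    def get(self, key):
        """Return the total count for key across disk and memory."""
        row = self.db.execute(
            "SELECT n FROM counts WHERE k = ?", (key,)
        ).fetchone()
        return (row[0] if row else 0) + self.mem.get(key, 0)
```

Batching the writes in `flush` plays a role similar to the cache mentioned
above; the actual throughput of this sketch has not been measured.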

Original issue reported on code.google.com by [email protected] on 9 Jul 2014 at 2:26

fastq-mcf: invalid adapter file ==> Floating point exception

What steps will reproduce the problem?

$ cat test_empty_seq_line.fa 
>test1

>test2
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>test3
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT


$ /software/ea-utils/ea-utils-svn/bin/fastq-mcf test_noseq_line.fa test.fq.gz
Scale used: 2.2
Phred: 33
Threshold used: 751 out of 300000
Adapter test1 (): counted 300000 at the 'start' of 
'DMPABP_JO15_Gal4.clean.fq.gz', clip set to 0, warning end was not reliable
Adapter test3 (TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT): counted 
11363 at the 'end' of 'DMPABP_JO15_Gal4.clean.fq.gz', clip set to 4
Files: 1
Floating point exception (core dumped)






$ cat test_no_seq_line.fa 
>test1
>test2
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>test3
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT


$ /software/ea-utils/ea-utils-svn/bin/fastq-mcf test_no_seq_line.fa test.fq.gz

Malformed adapter fasta record at line 3
Scale used: 2.2
Phred: 33
Threshold used: 751 out of 300000
No adapters found, no skewing detected, and no trimming needed.
Files: 1
@HWI-ST571:356:C2C51ACXX:4:1201:1288:2123 1:N:0:TTTGGC
GACTCCTGAGTAGCTGGGATTACGGGCGCAGGCCACCACACCCAGCTAATT
+
C@@FFFFDHGHBFGDIGIBGHIIIJIH):@GGBH@FHIJDHJIEAAEEHAE
@HWI-ST571:356:C2C51ACXX:4:1201:1326:2156 1:N:0:TTAGGC
CAGGGGAAACTTGGCCTCGATGGGCACCAGGGTGGTGTAGGTCTGTTTCAC
+

...





$ cat test_no_seq_line_with2seq.fa 
>test1
>test2
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA


$ /software/ea-utils/ea-utils-svn/bin/fastq-mcf test_no_seq_line_with2seqs.fa 
test.fq.gz | head -n50
Scale used: 2.2
Phred: 33
Threshold used: 751 out of 300000
No adapters found, no skewing detected, and no trimming needed.
Files: 1
@HWI-ST571:356:C2C51ACXX:4:1201:1288:2123 1:N:0:TTTGGC
GACTCCTGAGTAGCTGGGATTACGGGCGCAGGCCACCACACCCAGCTAATT
+
C@@FFFFDHGHBFGDIGIBGHIIIJIH):@GGBH@FHIJDHJIEAAEEHAE
@HWI-ST571:356:C2C51ACXX:4:1201:1326:2156 1:N:0:TTAGGC
CAGGGGAAACTTGGCCTCGATGGGCACCAGGGTGGTGTAGGTCTGTTTCAC
+
CCCFFFFFHHHHHJJJJJJHIJJJJJJJJJJJ?DH@FFGIJFHGGHGIJJJ
...


What is the expected output? What do you see instead?

When an empty sequence is given in the adapter.fa file, fastq-mcf crashes.

When a corrupt adapter.fa file is given (two consecutive lines both start with
">"), only a warning is printed when the file holds two complete records plus
one header without a sequence.

When a corrupt adapter.fa file is given (two consecutive lines both start with
">"), no warning is printed when the file holds one complete record plus one
header without a sequence.


It would be nice if those problems could be fixed.
Rather than only displaying a warning when something is wrong in the adapter.fa
file, it might be a good idea to error out by default (optionally with an
option to override this behaviour).
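
As a stopgap, the adapter file can be checked before it is passed to
fastq-mcf. This Python sketch is independent of fastq-mcf's own parser; the
function name `check_adapter_fasta` and the message texts are assumptions for
illustration. It flags records whose header is not followed by any sequence,
which covers both the empty-line case and the consecutive-header case:

```python
def check_adapter_fasta(lines):
    """Return a list of (line_number, message) problems in a FASTA file."""
    problems = []
    name = None       # name of the record currently being read
    name_line = 0     # line number of that record's header
    seq_len = 0       # bases seen so far for that record
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if line.startswith(">"):
            if name is not None and seq_len == 0:
                problems.append((name_line, "record %s has no sequence" % name))
            name, name_line, seq_len = line[1:], number, 0
        elif line:
            if name is None:
                problems.append((number, "sequence before first header"))
            seq_len += len(line)
    if name is not None and seq_len == 0:
        problems.append((name_line, "record %s has no sequence" % name))
    return problems
```

A wrapper script could refuse to start fastq-mcf whenever the returned list is
not empty, which gives the error-by-default behaviour requested here.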


What version of the product are you using? On what operating system?
svn version



Original issue reported on code.google.com by [email protected] on 4 Dec 2013 at 7:09

sam-stats fails to properly deduce unmapped reads and snp (ins/del) rate

Probably an entirely bwa specific issue.

What steps will reproduce the problem?
0. Generated a bam using BWA 0.6.2
1. Filtered for unmapped reads using samtools view -f4 <your.bam>
2. Ran sam-stats on the above generated sam
3. Noticed that there was 1 read which had a non-"*" CIGAR string, had an NM
value of 3 and had a corresponding MD string.

What is the expected output? What do you see instead?
I would expect sam-stats to determine mapped or unmapped based on the bit flag. 
After filtering the unmapped reads, it could then figure out other stats like 
snp, ins and del rates.
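
The flag-based test can be sketched in a few lines of Python. This is not the
sam-stats code; the function name `is_unmapped` is hypothetical. In the SAM
format, bit 0x4 of the FLAG field marks a read as unmapped, and a FLAG of 103
(1 + 2 + 4 + 32 + 64), as in the record quoted in this report, has that bit set
even though the CIGAR string is 101M:

```python
FLAG_UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped


def is_unmapped(sam_line):
    """Return True if the FLAG field (column 2) has the unmapped bit set."""
    fields = sam_line.split("\t")
    return bool(int(fields[1]) & FLAG_UNMAPPED)
```

Deciding on the flag first, and only then reading CIGAR, NM and MD for the
mapped reads, would keep such records out of the snp, ins and del rates.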

What version of the product are you using? On what operating system?
Version: 1.34.488

Please provide any additional information below.

$ grep "NM:" sample.unmapped.sam

HWI-ST1134:50:C0EBHACXX:8:1205:20100:107686     103     GL000195.1     111     
60     101M     =     111     0     
AAGAATTCCTCGTTCACACAGTTTCTTAAGCTTCCTGGGATGCGACCTGTGATGGCTCGGCGGAGCTCGGTGGCAGTTGT
CTCCCTCATCTCCAGTGACAC     
=>;?@BB@>BA9@BAC@B@CAABBABBAAA>BAA>BB>>ABB>9A@>BCABACB>>BA:?>9>???CB<<=@<?D?<>?=
>A?>?BB@??B@>A>@B>>A?     X0:i:1     X1:i:0     
BD:Z:KKLMMMLLMLLLLLLMLLLLNMLCLLKKKLNMKLMLNMKMMNMLLMLLNLLMMNMMMLLLMLLMMNMLLLLLMMM
NMLLLMLLMJLLMMMLMNNONMMMLL     MD:Z:0T0C74C24     RG:Z:1     XG:i:0     
BI:Z:PPPPQQPPPQPPPPPQOPOPQQPKPPPPPPQPPPPQQPNPPQQPPQQQQPPQPQPPPPPPPPPPPQPPPPPPPPP
QQPQPQPPPNQPQQOOOOOQQPQQPO     AM:i:37     NM:i:3     SM:i:37     XM:i:3     
XO:i:0     
OQ:Z:CCCFFFFFHHHHHIIJJGIJJHIIIJIJJIIIJJJIIJJIIJJJJJJJJIJJJJJJJIIHGFD?BBDDD><?@BD
?9>C:@CCDDCDBCCCCCCCDDDDDC     XT:A:U

Original issue reported on code.google.com by [email protected] on 9 Sep 2013 at 5:55

ea-utils621-win fastq-stats and fastq-mcf not working



What is the expected output? What do you see instead?
fastq-stats and fastq-mcf (Windows 621 exe) open a new window saying
"fastq-stats.exe has stopped working. Windows is checking for a problem. A
problem caused the program to stop working correctly."
The same happens for fastq-mcf.
(I used fastq-join and it ran without any errors.)



What version of the product are you using? On what operating system?
ea-utils 621-win




Original issue reported on code.google.com by [email protected] on 12 Jun 2013 at 4:25
