Code Monkey home page Code Monkey logo

annotater's Issues

getting multiple "print() on closed filehandle TMPOUT" messages

i'm seeing multiple messages like this during Taxonomy step

print() on closed filehandle TMPOUT at /path/to/annotater/Reann.pm line 275, <IN> line 17

which is followed by this exception:

------------- EXCEPTION -------------
MSG: Could not read file '~/path/to/viral.1.1.genomic.fna.gis.txt.fa': No such file or directory
STACK Bio::Root::IO::_initialize_io /path/to/bioperl/bioperl-live/Bio/Root/IO.pm:268
STACK Bio::SeqIO::_initialize /path/to/bioperl/bioperl-live/Bio/SeqIO.pm:513
STACK Bio::SeqIO::fasta::_initialize /path/to/bioperl/bioperl-live/Bio/SeqIO/fasta.pm:87
STACK Bio::SeqIO::new /path/to/bioperl/bioperl-live/Bio/SeqIO.pm:389
STACK Bio::SeqIO::new /path/to/bioperl/bioperl-live/Bio/SeqIO.pm:435
STACK Reann::Taxonomy /path/to/annotater/Reann.pm:282
STACK toplevel /path/to/annotater/Reann.pl:17
-------------------------------------

BLAST database error during Taxonomy step

I'm getting a 'No alias or index file found' error during Taxonomy step. I'm only getting this with Rapsearch databases that are not in $BLASTDB.

Taxonomy
        Getting fasta seqs for protein_Mononeg_Aedes
BLAST Database error: No alias or index file found for nucleotide database [protein_Mononeg_Aedes] in search path

BLAST error: No alias or index file found for database

When running annotater with Rapsearch, you need a parallel BLAST database with the same name as the Rapsearch database in the same directory. This is because of the hack to use blastdbcmd in the Version method to get database information. There is no parallel command like blastdbcmd in the rapsearch distribution.

getting integer values for the Description column in Report.txt

When using BLAST+ => 2.5.0, the Description values in the Report are integer values. When I use BLAST+ <= 2.3.0 (didn't try 2.4), the Descriptions are as I expect (i.e. Tobacco mosaic virus, complete genome)

This stems from a change in how blastdbcmd extracts fasta sequences from the blast database. When this fullgi gi|9626125|ref|NC_001367.1| is used with blastdbcmd, it extracts the following

for BLAST >= 2.5.0

blastdbcmd -db viral.1.1.genomic -entry "gi|9626125|ref|NC_001367.1|" | head -n1
>NC_001367.1 Tobacco mosaic virus, complete genome

for BLAST <= 2.3.0

$ blastdbcmd -db viral.1.1.genomic -entry "gi|9626125|ref|NC_001367.1|" | head -n1
>gi|9626125|ref|NC_001367.1| Tobacco mosaic virus, complete genome

Then when Reann.pm reads the fasta file it uses the fasta identifier, which is an ACC.VER for BLAST+ >= 2.5, to update the %acc hash. With ACC.VER, it is creating a new hash key with the description. This new key (the ACC.VER) is never accessed again because later in the code, the fullgi from the Report file (the Accession field) is used to look up what is expected to be the Description. However, the hash value is the number of times the fullgi was found in the Report file (due to this line)

To test, use sewageseqs16.fa with

BLAST+ 2.3.0 (Descriptions are OK)

$ export PATH=/home/paul/local/usr/local/ncbi-blast-2.3.0+/bin/:$PATH
$ rm -rf annotator
$ Reann.pl -file ../sewageseqs16.fa -config sewageseqs.conf -num_threads 8 -evalue 1e-5
sewageseqs.conf
ann.0.fasta
tblastx -show_gis -num_threads 8 -outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen qcovs qcovhsp" -db viral.1.1.genomic -evalue 1e-5 -query ann.0.fasta -out ann.0.0.tblastx
Report
Taxonomy
        Getting fasta seqs for viral.1.1.genomic
        Skipping LocalTaxonomy
temp file is /tmp/3yjYOMnh2E
$ cut -f 1,7,8 annotator/ann.wTax.report.txt
seqID   accession       desc
All.viralseqs.fa.Contig1000     gi|9626125|ref|NC_001367.1|     Tobacco mosaic virus, complete genome
All.viralseqs.fa.Contig1002     gi|764159063|ref|NC_026582.1|   Uncultured phage WW-nAnB, complete genome
All.viralseqs.fa.Contig1006     gi|20177424|ref|NC_003630.1|    Pepper mild mottle virus, complete genome
All.viralseqs.fa.Contig1007     gi|9630643|ref|NC_001918.1|     Aichi virus, complete genome
All.viralseqs.fa.Contig1008     gi|20177424|ref|NC_003630.1|    Pepper mild mottle virus, complete genome
All.viralseqs.fa.Contig1        gi|1043372815|ref|NC_030457.1|  Circovirus-like genome DCCV-8, complete genome
All.viralseqs.fa.Contig10       gi|20260796|ref|NC_003692.1|    Pariacato virus chromosome RNA2, complete genome
All.viralseqs.fa.Contig100      gi|45476493|ref|NC_005817.1|    Sclerophthora macrospora virus A RNA 1, complete sequence
All.viralseqs.fa.Contig1001     gi|1127298492|ref|NC_032551.1|  Beihai picorna-like virus 77 strain BHBei75652 hypothetical protein 1 and hypothetical protein 2 genes, complete cds
All.viralseqs.fa.Contig1003     gi|20177424|ref|NC_003630.1|    Pepper mild mottle virus, complete genome
All.viralseqs.fa.Contig1004     gi|1127298656|ref|NC_032614.1|  Beihai picorna-like virus 74 strain BHHK130969 hypothetical protein 1 and hypothetical protein 2 genes, complete cds
FTWLCJP02IU3XZ  gi|593779948|ref|NC_023737.1|   Mycobacterium phage Pleione, complete genome
FTWLCJP02IU4PY  gi|203454602|ref|NC_011273.1|   Mycobacterium phage Myrna, complete genome
FTWLCJP02IUB6X  gi|197085614|ref|NC_011142.1|   Iodobacteriophage phiPLPE, complete genome
FTWLCJP02IUGFL
FTWLCJP02IUNP9

BLAST+ 2.6.0 (Descriptions are not ok; integer values)

$ export PATH=/home/paul/local/usr/local/ncbi-blast-2.6.0+/bin/:$PATH
$ rm -rf annotator
$ Reann.pl -file ../sewageseqs16.fa -config sewageseqs.conf -num_threads 8 -evalue 1e-5
sewageseqs.conf
ann.0.fasta
tblastx -show_gis -num_threads 8 -outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen qcovs qcovhsp" -db viral.1.1.genomic -evalue 1e-5 -query ann.0.fasta -out ann.0.0.tblastx
Report
Taxonomy
        Getting fasta seqs for viral.1.1.genomic
        Skipping LocalTaxonomy
temp file is /tmp/JkhItzr1AK
$ cut -f 1,7,8 annotator/ann.wTax.report.txt
seqID   accession       desc
All.viralseqs.fa.Contig1000     gi|9626125|ref|NC_001367.1|     1
All.viralseqs.fa.Contig1002     gi|764159063|ref|NC_026582.1|   1
All.viralseqs.fa.Contig1006     gi|20177424|ref|NC_003630.1|    3
All.viralseqs.fa.Contig1007     gi|9630643|ref|NC_001918.1|     1
All.viralseqs.fa.Contig1008     gi|20177424|ref|NC_003630.1|    3
All.viralseqs.fa.Contig1        gi|1043372815|ref|NC_030457.1|  1
All.viralseqs.fa.Contig10       gi|20260796|ref|NC_003692.1|    1
All.viralseqs.fa.Contig100      gi|45476493|ref|NC_005817.1|    1
All.viralseqs.fa.Contig1001     gi|1127298492|ref|NC_032551.1|  1
All.viralseqs.fa.Contig1003     gi|20177424|ref|NC_003630.1|    3
All.viralseqs.fa.Contig1004     gi|1127298656|ref|NC_032614.1|  1
FTWLCJP02IU3XZ  gi|593779948|ref|NC_023737.1|   1
FTWLCJP02IU4PY  gi|203454602|ref|NC_011273.1|   1
FTWLCJP02IUB6X  gi|197085614|ref|NC_011142.1|   1
FTWLCJP02IUGFL
FTWLCJP02IUNP9

Exception error when testing Taxonomy.t

On a OSX 10.12.6, I ran prove -v Taxonomy.t and got this error:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Response Error
Can't verify SSL peers without knowing which Certificate Authorities to trust
STACK: Error::throw
STACK: Bio::Root::Root::throw /Users/pgc92/local/lib/bioperl/bioperl-live/Bio/Root/Root.pm:447
STACK: Bio::DB::GenericWebAgent::get_Response /Users/pgc92/local/lib/bioperl/bioperl-live/Bio/DB/GenericWebAgent.pm:214
STACK: Bio::DB::EUtilities::get_Response /Users/pgc92/perl5/lib/perl5/Bio/DB/EUtilities.pm:35
STACK: Taxonomy::accession2gi /Users/pgc92/local/lib/annotater/Taxonomy.pm:169
STACK: Taxonomy.t:19
-----------------------------------------------------------

blastdbcmd errors during Taxonomy step

I'm getting blastdbcmd errors after moving BLAST database files to a new directory mount (/zfs2) on the HTC cluster. Never had these problems before on mobydisk

Taxonomy
        Getting fasta seqs for /zfs2/bar/refs/blast/nr
Error: [blastdbcmd] Entry not found: YP7.1
Error: [blastdbcmd] Skipped YP7.1
Error: [blastdbcmd] Entry not found: XP_00643
Error: [blastdbcmd] Skipped XP_00643

Confirm that use of BLAST's `-max_target_seqs` is intentional

Hi there,

This is a semi-automated message from a fellow bioinformatician. Through a GitHub search, I found that the following source files make use of BLAST's -max_target_seqs parameter:

Based on the recently published report, Misunderstood parameter of NCBI BLAST impacts the correctness of bioinformatics workflows, there is a strong chance that this parameter is misused in your repository.

If the use of this parameter was intentional, please feel free to ignore and close this issue but I would highly recommend to add a comment to your source code to notify others about this use case. If this is a duplicate issue, please accept my apologies for the redundancy as this simple automation is not smart enough to identify such issues.

Thank you!
-- Arman (armish/blast-patrol)

Rapsearch outputs illegal evalues

Rapsearch 2.24 is outputting illegal evalues (like below) and Rapsearch.pm does not parse them properly

  • 0x0p+0
  • 0x1.3fa9b55ddbf92p-178
  • 0x1.e49bd1ea65088p-134

EXCEPTION: Bio::Root::Exception MSG: Response Error Too Many Requests

When testing the Taxonomy.pm module using Taxonomy.t, recently been getting alot of errors like below (stemming from gi2taxid and taxid2lineage methods). I thought the Bioperl module has a built in delay so that it doesn't send too many requests.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Response Error
Too Many Requests
STACK: Error::throw
STACK: Bio::Root::Root::throw /ihome/uchandran/pgc92/local/lib/bioperl/bioperl-live/lib/Bio/Root/Root.pm:449
STACK: Bio::DB::GenericWebAgent::get_Response /ihome/uchandran/pgc92/local/lib/bioperl/bioperl-live/lib/Bio/DB/GenericWebAgent.pm:215
STACK: Bio::DB::EUtilities::get_Response /ihome/uchandran/pgc92/local/lib/bioperl/Bio-EUtilities/lib/Bio/DB/EUtilities.pm:204
STACK: Bio::DB::EUtilities::get_Parser /ihome/uchandran/pgc92/local/lib/bioperl/Bio-EUtilities/lib/Bio/DB/EUtilities.pm:287
STACK: Bio::DB::EUtilities::next_DocSum /ihome/uchandran/pgc92/local/lib/bioperl/Bio-EUtilities/lib/Bio/DB/EUtilities.pm:856
STACK: Taxonomy::gi2taxid /ihome/uchandran/pgc92/local/lib/annotater/Taxonomy.pm:232

and

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Response Error
Too Many Requests
STACK: Error::throw
STACK: Bio::Root::Root::throw /ihome/uchandran/pgc92/local/lib/bioperl/bioperl-live/lib/Bio/Root/Root.pm:449
STACK: Bio::DB::GenericWebAgent::get_Response /ihome/uchandran/pgc92/local/lib/bioperl/bioperl-live/lib/Bio/DB/GenericWebAgent.pm:215
STACK: Bio::DB::EUtilities::get_Response /ihome/uchandran/pgc92/local/lib/bioperl/Bio-EUtilities/lib/Bio/DB/EUtilities.pm:204
STACK: Taxonomy::taxid2lineage /ihome/uchandran/pgc92/local/lib/annotater/Taxonomy.pm:276

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.