burntsushi / cablastp
Performs BLAST on compressed proteomic data.
License: GNU General Public License v2.0
The database is a UniRef90 FASTA file compressed using the compress command. The folder contains the following files: coarse.fasta, coarse.fasta.index, coarse.links, coarse.links.index, coarse.seeds, compressed, compressed.index, params.
The database is in another user's account (though the complete path to it is given to the program).
I tried running cablastp using the following command:
cablastp-search /path/capblastp-UNIREF90 /path/stn1_ALL_prodigal.prot --blast-args -max_target_seqs 10 -out stn1_ALL_prodigal.blast -num_threads 6 -outfmt '6 qseqid sseqid pident qcovs length mismatch qstart qend sstart send evalue bitscore stitle'
gs -max_target_seqs 10 -num_threads 6 -out stn1_ALL_prodigal.blast -outfmt '6 qseqid sseqid pident qcovs length mismatch qstart qend sstart send evalue bitscore stitle'
Opening database in path/capblastp-UNIREF90...
Opening compressed database...
Done opening compressed database.
Opening coarse database...
Done opening coarse database.
Done opening database in /path/capblastp-UNIREF90
Blasting query on coarse database...
blastp -db path/capblastp-UNIREF90/blastdb-coarse -num_threads 24 -outfmt 5 -dbsize 4923603367
Error blasting coarse database: Error running 'blastp -db path/blastdb-coarse -num_threads 24 -outfmt 5 -dbsize 4923603367': 'exit status 2'.
stderr:
Is this an issue with the compression of the database? It seems to open the compressed database just fine. The path given in stderr after 'in search path' is different from the one I gave for the database.
I also tried blastp against the downloaded compressed nr database, but got an error during the run: something about not being able to find a file. The only file I downloaded was the Compressed NCBI NR dated Jun 30, 2013 (http://groups.csail.mit.edu/cb/cablastp/cablastp-nr20130630.tar.gz). Is there any way I can check whether this is an issue with the downloaded compressed database?
Opening database in /nv/hp10/nsarode3/blast_databases/cablastp_database/cablastp-nr20130630...
Opening compressed database...
Done opening compressed database.
Opening coarse database...
Done opening coarse database.
Done opening database in /nv/hp10/nsarode3/blast_databases/cablastp_database/cablastp-nr20130630.
Blasting query on coarse database...
blastp -db /nv/hp10/nsarode3/blast_databases/cablastp_database/cablastp-nr20130630/blastdb-coarse -num_threads 10 -outfmt 5 -dbsize 9281362451
Error blasting coarse database: Error running 'blastp -db /nv/hp10/nsarode3/blast_databases/cablastp_database/cablastp-nr20130630/blastdb-coarse -num_threads 10 -outfmt 5 -dbsize 9281362451': 'exit status 2'.
BLAST Database error: Could not find volume or alias file (cablastp-nr20130630/blastdb-coarse.00) referenced in alias file (/nv/pb4/bio-stewart/cablastp_database/cablastp-nr20130630/blastdb-coarse).
Thanks,
Neha
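One way to narrow this down is to look at what the alias file actually lists. The sketch below is only a diagnostic aid, not part of cablastp. It assumes the coarse database has a BLAST protein alias file (conventionally ending in .pal) with a DBLIST line, and that relative entries are resolved against the alias file's own directory. That resolution rule is an assumption made for illustration; BLAST may also consult other search paths. The function names are hypothetical.

```python
import os
import shlex

def dblist_entries(alias_path):
    """Return the volume names listed on DBLIST lines of a BLAST alias file."""
    entries = []
    with open(alias_path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith("DBLIST"):
                # Entries are whitespace-separated and may be quoted.
                entries.extend(shlex.split(line[len("DBLIST"):]))
    return entries

def missing_volumes(alias_path, index_exts=(".pin", ".pal")):
    """List DBLIST entries with no matching index or alias file on disk.

    Assumption: relative entries are resolved against the directory that
    holds the alias file.
    """
    base_dir = os.path.dirname(os.path.abspath(alias_path))
    missing = []
    for entry in dblist_entries(alias_path):
        candidate = entry if os.path.isabs(entry) else os.path.join(base_dir, entry)
        if not any(os.path.exists(candidate + ext) for ext in index_exts):
            missing.append(entry)
    return missing
```

If an entry such as cablastp-nr20130630/blastdb-coarse.00 shows up as missing, the alias file was probably written relative to a different directory than the one the database now lives in.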
Return an intelligent non-error, rather than an internal BLAST error, when the coarse search finds no results.
Hi,
I just realized that cablastp-extract appends * characters to my protein sequences:
Input
>tr|R4XQ05|R4XQ05_ALCXX
MTMDSTIYLTLWAVLAFVSWLIVAGGAVLAVFSRAIKDTTFERIGLAAVSLTATGAACRIFMAGWASAGDAALAA
SAAFYVAAVTAKHIRKPTL
...
Output
>tr|R4XQ05|R4XQ05_ALCXX
MTMDSTIYLTLWAVLAFVSWLIVAGGAVLAVFSRAIKDTTFERIGLAAVSLTATGAACRI
FMAGWASAGDAALAASAAFYVAAVTAKHIRKPTL*
...
Is there a reason for that, or is that a bug?
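In protein FASTA, a trailing * conventionally marks a translation stop, so downstream tools may or may not accept it. As a workaround while the question is open, the extracted output could be post-processed. This is a minimal sketch, not cablastp-extract's behavior: the function name is hypothetical, and the 60-column wrap is only chosen to match the output shown above.

```python
def strip_terminal_stop(fasta_text, width=60):
    """Return FASTA text with any trailing '*' removed from each sequence."""
    records = []  # each item is [header, [sequence chunks]]
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line, []])
        elif records:
            records[-1][1].append(line)

    out = []
    for header, chunks in records:
        # Join wrapped lines first so only the true sequence end is trimmed.
        seq = "".join(chunks).rstrip("*")
        out.append(header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```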
It should be set to the value of the -p switch.
We could parse the arguments given after --blast-args, but I'd rather stick to our simple solution: don't touch them.
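To illustrate the "don't touch them" approach, here is a rough sketch of how a fine-search command line could be assembled. It assumes the value under discussion is -num_threads and that the user's arguments are simply appended unparsed, which matches the order seen in the logs in this thread. The function and parameter names are hypothetical; this is not cablastp's actual code.

```python
def build_fine_blast_args(subject_fasta, dbsize, threads, user_blast_args):
    """Assemble a blastp argument list without inspecting user arguments.

    threads stands in for the value of the -p switch (an assumption), and
    user_blast_args is whatever followed --blast-args, passed through as-is.
    """
    args = [
        "blastp",
        "-subject", subject_fasta,
        "-dbsize", str(dbsize),
        "-num_threads", str(threads),
    ]
    # Appended verbatim: no parsing, no de-duplication of flags.
    args.extend(user_blast_args)
    return args
```

The cost of this simplicity is that a flag given both by the tool and by the user appears twice on the command line, and it is then up to blastp how to handle it.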
Here is another one for ya.
Not sure if the real error is "signal: bus error (core dumped)" or "stderr: 'num_threads' is currently ignored when 'subject' is specified."
Blasting query on coarse database...
blastp -db /blast_databases/cablastp_database/capblastp-cogs_db_aa/blastdb-coarse -num_threads 9 -outfmt 5 -dbsize 47558392
Decompressing blast hits...
Blasting query on fine database...
blastp -subject /tmp/cablastp-fine-fasta604138705 -dbsize 47558392 -num_threads 9 -max_target_seqs 10 -out /blastoutput/stn_ALL_prodigal.prot_2.blast -outfmt 6 qseqid sseqid pident qcovs length mismatch qstart qend sstart send evalue bitscore
Error blasting fine database: Error running 'blastp -subject /tmp/cablastp-fine-fasta604138705 -dbsize 47558392 -num_threads 9 -max_target_seqs 10 -out /blastoutput/stn_ALL_prodigal.prot_2.blast -outfmt 6 qseqid sseqid pident qcovs length mismatch qstart qend sstart send evalue bitscore': 'signal: bus error (core dumped)'.
stderr:
'num_threads' is currently ignored when 'subject' is specified.
Have cablastp-compress write the number of residues to the params file, and have the search tools read it and pass it along to the fine query. Right now the value is hardcoded, and that hardcoded value is what gets passed along to the fine query.
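For reference, counting residues in a FASTA file is straightforward. The sketch below only covers the counting step and the -dbsize flag seen in the logs above. The layout of the params file is not shown in this thread, so writing and reading it is left out, and the function name is hypothetical.

```python
def count_residues(fasta_lines):
    """Count sequence characters in FASTA input, skipping headers and blanks."""
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        total += len(line)
    return total

# Example: turn the count into the flag that would be passed to blastp.
example = [">seq1", "MTMDS", "TIY", "", ">seq2", "LTLW"]
dbsize_args = ["-dbsize", str(count_residues(example))]
```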
While I was going over the NW code in biogo, I noticed that the lookup for backtracking is incorrect. The same error is in the cablastp backtracking code.
The lookup into the matrix is transposed between left and up. This is unlikely to make a difference with any normal matrix.
I also corrected the approach to deciding which path to take on the backtrack; the inequality tests were a kludge. Please take a look at the current state of the NW backtracking code in the align package:
http://code.google.com/p/biogo/source/browse/align/nw_letters.go#130
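For readers without the biogo source at hand, here is a small self-contained illustration of the general idea: during backtracking, pick the predecessor whose score plus the step cost equals the current cell, instead of comparing neighbours with inequalities. It uses a simple match/mismatch score and a linear gap penalty, both assumptions made for brevity. It is not the biogo or cablastp implementation.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # diagonal
                              score[i - 1][j] + gap,      # up: gap in b
                              score[i][j - 1] + gap)      # left: gap in a

    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1]); out_b.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            # Up consumes a residue of a (row index), not of b.
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            # Left consumes a residue of b (column index).
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Swapping the row and column roles in the up and left branches is the kind of transposition described above: the score would still be right, but the gaps would land in the wrong sequence.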