
snowball's Issues

Parallel processing issue?

I am trying to run snowball with the command:
python algbioi/ga/run.py -f /local-homes/bioinformatics/erica/metagenome_stool/both_allseqs_R1.fastq.gz -s /local-homes/bioinformatics/erica/metagenome_stool/both_allseqs_R2.fastq.gz -m /local-homes/bioinformatics/erica/metagenome_stool/pfam-snowball.hmm -o both_contigs.fna.gz -i 338 -r 250

but it looks like there is an issue with the multiprocessing module. I receive this on stdout:
This hmmsearch binary will be used:
/usr/bin/hmmsearch
Using temporary directory:
/tmp/snowball_vxNG2N
Running on: Ubuntu 12.04 precise (linux2)
Using 32 processors
Settings:
Read length: 250
Insert size: 338
Min. overlap probability: 0.8
Min. overlap length: 0.5
Min. HMM score: 40
Joining paired-end reads into consensus reads, loading reads from:
/local-homes/bioinformatics/erica/metagenome_stool/both_allseqs_R1.fastq.gz
/local-homes/bioinformatics/erica/metagenome_stool/both_allseqs_R2.fastq.gz
Traceback (most recent call last):
  File "algbioi/ga/run.py", line 493, in <module>
    _main()
  File "algbioi/ga/run.py", line 468, in _main
    outAnnot=outAnnot, cleanUp=cleanUp, processors=processors)
  File "algbioi/ga/run.py", line 175, in mainSnowball
    maxCpu=comh.MAX_PROC)
  File "/local-homes/bioinformatics/erica/metagenome_stool/snowball_1_2/algbioi/com/fq.py", line 150, in joinPairEnd
    retList = parallel.runThreadParallel(taskList, maxCpu)
  File "/local-homes/bioinformatics/erica/metagenome_stool/snowball_1_2/algbioi/com/parallel.py", line 101, in runThreadParallel
    retValList.append(taskHandler.get())
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 528, in get
    raise self._value
IndexError: string index out of range

Any ideas on how to resolve this problem?
Thank you in advance for your help,
Erica
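
Note that multiprocessing.Pool re-raises only the exception value from the worker (the raise self._value frame above), so the traceback hides where inside joinPairEnd the IndexError was actually raised. One plausible cause, and only a guess, is a malformed record in one of the input FASTQ files, e.g. an empty sequence line. The standalone Python 3 sketch below (not part of snowball) scans a FASTQ file for such records:

    import gzip
    import sys

    def scan_fastq(path):
        """Yield (record_number, reason) for malformed FASTQ records."""
        opener = gzip.open if path.endswith('.gz') else open
        with opener(path, 'rt') as fh:
            n = 0
            while True:
                header = fh.readline()
                if not header:
                    break  # clean end of file
                seq = fh.readline().rstrip('\r\n')
                plus = fh.readline()
                qual = fh.readline().rstrip('\r\n')
                n += 1
                if not header.startswith('@') or not plus.startswith('+'):
                    yield n, 'broken record structure'
                elif not seq:
                    yield n, 'empty sequence line'  # a likely IndexError trigger
                elif len(seq) != len(qual):
                    yield n, 'sequence/quality length mismatch'

    if __name__ == '__main__':
        for path in sys.argv[1:]:
            for rec, reason in scan_fastq(path):
                print('%s: record %d: %s' % (path, rec, reason))

Running it over both the R1 and R2 files would at least rule the inputs in or out as the source of the crash.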

Unjoined reads and read partitioning issues

I am trying to apply snowball to metatranscriptomic datasets. My current issue is that 99% of the reads are filtered out before the HMM search runs, so I only get about 1,000 reads joined into consensus reads. Is there any way to input paired-end reads without requiring them to be joined into consensus reads? Also, even with the roughly 1,000 reads that pass the filtering cut-off, the program breaks at the point of partitioning the reads into Pfam-A gene families. I ran snowball on the test datasets you provided, and it worked fine. Are there any suggestions I could try? I'd appreciate any advice you can give. Thanks a lot!
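
For context, snowball only keeps read pairs it can join into a consensus read, which requires the two mates to overlap. Reading the "Min. overlap length: 0.5" setting as a fraction of the read length (an assumption on my part), a pair can join only when the insert size is at most 2*read_len - 0.5*read_len, so a long or broad insert-size distribution, common in RNA-seq libraries, could by itself explain a 99% filter rate. A back-of-the-envelope sketch, assuming a normal insert-size distribution with placeholder parameters:

    from math import erf, sqrt

    def joinable_fraction(read_len, mean_insert, sd_insert, min_overlap_frac=0.5):
        """Estimate the fraction of pairs whose mates overlap by at least
        min_overlap_frac * read_len, assuming insert size ~ N(mean, sd)."""
        # Mates overlap by at least k bases when insert <= 2 * read_len - k.
        k = min_overlap_frac * read_len
        cutoff = 2 * read_len - k
        z = (cutoff - mean_insert) / (sd_insert * sqrt(2.0))
        return 0.5 * (1.0 + erf(z))  # normal CDF at the cutoff

    # 250 bp reads, inserts ~ N(338, 60): most pairs still join (~0.73).
    print(joinable_fraction(250, 338, 60))
    # 140 bp reads, inserts ~ N(225, 60): far fewer pairs overlap enough (~0.40).
    print(joinable_fraction(140, 225, 60))

If the estimated fraction for your library is far above the ~1% that snowball actually kept, insert size alone probably does not explain the filtering, and something else (quality trimming, adapter content, mate ordering) is worth checking.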

Program crash

snowball crashed:

COMMAND LINE:
snowball -f R1-trimmed.fq -s R2-trimmed.fq -m Pfam-A.hmm -i 225 -r 140 -p 15

STDOUT:
This hmmsearch binary will be used:
"path to hmmsearch"
Using temporary directory:
/tmp/snowball_FRFjf1
Running on: Ubuntu 14.04 trusty (linux2)
Using 15 processors
Settings:
Read length: 140
Insert size: 225
Min. overlap probability: 0.8
Min. overlap length: 0.5
Min. HMM score: 40
Joining paired-end reads into consensus reads, loading reads from:
"path"R1-trimmed.fq
"path"R2-trimmed.fq
Filtered out: 99.259 % reads
Translating reads to protein sequences
Running HMMER (hmmsearch)
Assigning consensus reads to gene domains
Exception in partitionReads:
/tmp/snowball_FRFjf1 40 0.6 1 sample_partitioned True
Not a gzipped file
<type 'exceptions.IOError'>
('Not a gzipped file',)
Traceback (most recent call last):
  File "/ebio/abt6_projects9/microbiome_analysis/data/software/mypython/envs/snowball/bin/snowball", line 493, in <module>
    _main()
  File "/ebio/abt6_projects9/microbiome_analysis/data/software/mypython/envs/snowball/bin/snowball", line 468, in _main
    outAnnot=outAnnot, cleanUp=cleanUp, processors=processors)
  File "/ebio/abt6_projects9/microbiome_analysis/data/software/mypython/envs/snowball/bin/snowball", line 199, in mainSnowball
    comh.SAMPLES_SHUFFLE_RAND_SEED, comh.SAMPLES_PFAM_PARTITIONED_DIR, True, False)
  File "/ebio/abt6_projects9/microbiome_analysis/data/software/mypython/envs/snowball/bin/algbioi/haplo/hio.py", line 423, in partitionReads
    raise e
IOError: Not a gzipped file

Did I provide a file that was expected to be gzipped but was uncompressed?
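
Whichever file the gzip module choked on, whether an input or an intermediate file under /tmp/snowball_FRFjf1, the quickest check is the two-byte gzip magic number (1f 8b). A small Python 3 sketch; the helper names are illustrative, not snowball's API:

    import gzip

    GZIP_MAGIC = b'\x1f\x8b'

    def is_gzipped(path):
        """True if the file starts with the two-byte gzip magic number."""
        with open(path, 'rb') as fh:
            return fh.read(2) == GZIP_MAGIC

    def open_maybe_gzip(path, mode='rt'):
        """Open a file transparently whether or not it is gzip-compressed."""
        return gzip.open(path, mode) if is_gzipped(path) else open(path, mode)

    print(is_gzipped('R1-trimmed.fq'))  # False here would match this crash

If is_gzipped returns False for the file named in the traceback, recompressing it with gzip (or handing snowball a .gz input) may get past this error.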

Implementation of single-end reads

Are there any plans to add support for single-end reads? I have several paired-end libraries and additional single-end libraries from the same metagenomes. It would be nice to see how your algorithm performs with this additional data.

It is also often the case that paired-end libraries are generated on different machines, e.g. HiSeq (read length 2x150 bp) and MiSeq (2x250 bp). So far, all reads provided to snowball have to be the same length, which is quite inconvenient.
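
Until mixed read lengths are supported, one possible workaround (a suggestion of mine, not a documented snowball feature) is to trim every library down to the shortest read length, e.g. 150 bp for a HiSeq 2x150 plus MiSeq 2x250 mix, before running snowball. A minimal Python 3 sketch:

    import gzip

    def trim_fastq(src, dst, target_len=150):
        """Trim every read in a gzipped FASTQ file to target_len bases,
        dropping reads that are already shorter."""
        with gzip.open(src, 'rt') as fin, gzip.open(dst, 'wt') as fout:
            while True:
                rec = [fin.readline() for _ in range(4)]
                if not rec[0]:
                    break  # end of file
                seq = rec[1].rstrip('\r\n')
                qual = rec[3].rstrip('\r\n')
                if len(seq) < target_len:
                    continue  # too short to keep at this length
                fout.write(rec[0])
                fout.write(seq[:target_len] + '\n')
                fout.write(rec[2])
                fout.write(qual[:target_len] + '\n')

    # e.g. trim_fastq('miseq_R1.fastq.gz', 'miseq_R1.150bp.fastq.gz', 150)

One caveat: the sketch drops a too-short read without touching its mate, which would desynchronize R1 and R2; on real paired files the two ends must be filtered in sync.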
