
snowball's Issues

Parallel processing issue?

I am trying to run snowball with the command:
python algbioi/ga/run.py -f /local-homes/bioinformatics/erica/metagenome_stool/both_allseqs_R1.fastq.gz -s /local-homes/bioinformatics/erica/metagenome_stool/both_allseqs_R2.fastq.gz -m /local-homes/bioinformatics/erica/metagenome_stool/pfam-snowball.hmm -o both_contigs.fna.gz -i 338 -r 250

but it looks like there is an issue with the multiprocessing module. I receive this on stdout:
This hmmsearch binary will be used:
/usr/bin/hmmsearch
Using temporary directory:
/tmp/snowball_vxNG2N
Running on: Ubuntu 12.04 precise (linux2)
Using 32 processors
Settings:
Read length: 250
Insert size: 338
Min. overlap probability: 0.8
Min. overlap length: 0.5
Min. HMM score: 40
Joining paired-end reads into consensus reads, loading reads from:
/local-homes/bioinformatics/erica/metagenome_stool/both_allseqs_R1.fastq.gz
/local-homes/bioinformatics/erica/metagenome_stool/both_allseqs_R2.fastq.gz
Traceback (most recent call last):
  File "algbioi/ga/run.py", line 493, in <module>
    _main()
  File "algbioi/ga/run.py", line 468, in _main
    outAnnot=outAnnot, cleanUp=cleanUp, processors=processors)
  File "algbioi/ga/run.py", line 175, in mainSnowball
    maxCpu=comh.MAX_PROC)
  File "/local-homes/bioinformatics/erica/metagenome_stool/snowball_1_2/algbioi/com/fq.py", line 150, in joinPairEnd
    retList = parallel.runThreadParallel(taskList, maxCpu)
  File "/local-homes/bioinformatics/erica/metagenome_stool/snowball_1_2/algbioi/com/parallel.py", line 101, in runThreadParallel
    retValList.append(taskHandler.get())
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 528, in get
    raise self._value
IndexError: string index out of range

Any ideas on how to resolve this problem?
Thank you in advance for your help,
Erica
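
Note that multiprocessing.Pool re-raises only the exception value from the worker (the raise self._value frame above), so the traceback hides where inside joinPairEnd the IndexError was actually raised. One plausible cause, and only a guess, is a malformed record in one of the input FASTQ files, e.g. an empty sequence line. The standalone Python 3 sketch below (not part of snowball) scans a FASTQ file for such records:

    import gzip
    import sys

    def scan_fastq(path):
        """Yield (record_number, reason) for malformed FASTQ records."""
        opener = gzip.open if path.endswith('.gz') else open
        with opener(path, 'rt') as fh:
            n = 0
            while True:
                header = fh.readline()
                if not header:
                    break  # clean end of file
                seq = fh.readline().rstrip('\r\n')
                plus = fh.readline()
                qual = fh.readline().rstrip('\r\n')
                n += 1
                if not header.startswith('@') or not plus.startswith('+'):
                    yield n, 'broken record structure'
                elif not seq:
                    yield n, 'empty sequence line'  # a likely IndexError trigger
                elif len(seq) != len(qual):
                    yield n, 'sequence/quality length mismatch'

    if __name__ == '__main__':
        for path in sys.argv[1:]:
            for rec, reason in scan_fastq(path):
                print('%s: record %d: %s' % (path, rec, reason))

Running it over both the R1 and R2 files would at least rule the inputs in or out as the source of the crash.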

Unjoined reads and read partitioning issues

I am trying to apply snowball to metatranscriptomic datasets. My current issue is that 99% of the reads are filtered out before the HMM search runs, so I only get about 1,000 reads joined into consensus reads. Is there any way to input paired-end reads without requiring them to be joined into consensus reads? Also, even with the roughly 1,000 reads that pass the filtering cut-off, the program breaks at the point of partitioning the reads into Pfam-A gene families. I ran snowball on the test datasets you provided, and it worked fine. Are there any suggestions I could try? I'd appreciate any advice you can give. Thanks a lot!
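
For context, snowball only keeps read pairs it can join into a consensus read, which requires the two mates to overlap. Reading the "Min. overlap length: 0.5" setting as a fraction of the read length (an assumption on my part), a pair can join only when the insert size is at most 2*read_len - 0.5*read_len, so a long or broad insert-size distribution, common in RNA-seq libraries, could by itself explain a 99% filter rate. A back-of-the-envelope sketch, assuming a normal insert-size distribution with placeholder parameters:

    from math import erf, sqrt

    def joinable_fraction(read_len, mean_insert, sd_insert, min_overlap_frac=0.5):
        """Estimate the fraction of pairs whose mates overlap by at least
        min_overlap_frac * read_len, assuming insert size ~ N(mean, sd)."""
        # Mates overlap by at least k bases when insert <= 2 * read_len - k.
        k = min_overlap_frac * read_len
        cutoff = 2 * read_len - k
        z = (cutoff - mean_insert) / (sd_insert * sqrt(2.0))
        return 0.5 * (1.0 + erf(z))  # normal CDF at the cutoff

    # 250 bp reads, inserts ~ N(338, 60): most pairs still join (~0.73).
    print(joinable_fraction(250, 338, 60))
    # 140 bp reads, inserts ~ N(225, 60): far fewer pairs overlap enough (~0.40).
    print(joinable_fraction(140, 225, 60))

If the estimated fraction for your library is far above the ~1% that snowball actually kept, insert size alone probably does not explain the filtering, and something else (quality trimming, adapter content, mate ordering) is worth checking.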

Program crash

snowball crashed:

COMMAND LINE:
snowball -f R1-trimmed.fq -s R2-trimmed.fq -m Pfam-A.hmm -i 225 -r 140 -p 15

STDOUT:
This hmmsearch binary will be used:
"path to hmmsearch"
Using temporary directory:
/tmp/snowball_FRFjf1
Running on: Ubuntu 14.04 trusty (linux2)
Using 15 processors
Settings:
Read length: 140
Insert size: 225
Min. overlap probability: 0.8
Min. overlap length: 0.5
Min. HMM score: 40
Joining paired-end reads into consensus reads, loading reads from:
"path"R1-trimmed.fq
"path"R2-trimmed.fq
Filtered out: 99.259 % reads
Translating reads to protein sequences
Running HMMER (hmmsearch)
Assigning consensus reads to gene domains
Exception in partitionReads:
/tmp/snowball_FRFjf1 40 0.6 1 sample_partitioned True
Not a gzipped file
<type 'exceptions.IOError'>
('Not a gzipped file',)
Traceback (most recent call last):
  File "/ebio/abt6_projects9/microbiome_analysis/data/software/mypython/envs/snowball/bin/snowball", line 493, in <module>
    _main()
  File "/ebio/abt6_projects9/microbiome_analysis/data/software/mypython/envs/snowball/bin/snowball", line 468, in _main
    outAnnot=outAnnot, cleanUp=cleanUp, processors=processors)
  File "/ebio/abt6_projects9/microbiome_analysis/data/software/mypython/envs/snowball/bin/snowball", line 199, in mainSnowball
    comh.SAMPLES_SHUFFLE_RAND_SEED, comh.SAMPLES_PFAM_PARTITIONED_DIR, True, False)
  File "/ebio/abt6_projects9/microbiome_analysis/data/software/mypython/envs/snowball/bin/algbioi/haplo/hio.py", line 423, in partitionReads
    raise e
IOError: Not a gzipped file

Did I provide a file that was expected to be gzipped but was uncompressed?
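
Whichever file the gzip module choked on, whether an input or an intermediate file under /tmp/snowball_FRFjf1, the quickest check is the two-byte gzip magic number (1f 8b). A small Python 3 sketch; the helper names are illustrative, not snowball's API:

    import gzip

    GZIP_MAGIC = b'\x1f\x8b'

    def is_gzipped(path):
        """True if the file starts with the two-byte gzip magic number."""
        with open(path, 'rb') as fh:
            return fh.read(2) == GZIP_MAGIC

    def open_maybe_gzip(path, mode='rt'):
        """Open a file transparently whether or not it is gzip-compressed."""
        return gzip.open(path, mode) if is_gzipped(path) else open(path, mode)

    print(is_gzipped('R1-trimmed.fq'))  # False here would match this crash

If is_gzipped returns False for the file named in the traceback, recompressing it with gzip (or handing snowball a .gz input) may get past this error.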

Implementation of single-end reads

Are there any plans to add support for single-end reads? I have several paired-end libraries and additional single-end libraries from the same metagenomes. It would be nice to see how your algorithm performs with this additional data.

It is also often the case that paired-end libraries are generated on different machines, e.g. HiSeq (read length 2x150 bp) and MiSeq (2x250 bp). So far, all reads provided to snowball have to be the same length, which is quite inconvenient.
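
Until mixed read lengths are supported, one possible workaround (a suggestion of mine, not a documented snowball feature) is to trim every library down to the shortest read length, e.g. 150 bp for a HiSeq 2x150 plus MiSeq 2x250 mix, before running snowball. A minimal Python 3 sketch:

    import gzip

    def trim_fastq(src, dst, target_len=150):
        """Trim every read in a gzipped FASTQ file to target_len bases,
        dropping reads that are already shorter."""
        with gzip.open(src, 'rt') as fin, gzip.open(dst, 'wt') as fout:
            while True:
                rec = [fin.readline() for _ in range(4)]
                if not rec[0]:
                    break  # end of file
                seq = rec[1].rstrip('\r\n')
                qual = rec[3].rstrip('\r\n')
                if len(seq) < target_len:
                    continue  # too short to keep at this length
                fout.write(rec[0])
                fout.write(seq[:target_len] + '\n')
                fout.write(rec[2])
                fout.write(qual[:target_len] + '\n')

    # e.g. trim_fastq('miseq_R1.fastq.gz', 'miseq_R1.150bp.fastq.gz', 150)

One caveat: the sketch drops a too-short read without touching its mate, which would desynchronize R1 and R2; on real paired files the two ends must be filtered in sync.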
