diriano / ploidyngs Goto Github PK

Explore ploidy levels from NGS data alone

License: GNU General Public License v3.0

Python 0.13% HTML 99.84% R 0.04%

ploidyngs's People

tiramisutes altingia said3427 tw7649116 andypetes xuelei-dai nalruf wesharrell santosrac novigit aldoiavinar abshah kjrom-sol shankarkshakya

ploidyngs's Issues

Error in ploidyNGS.py

When I test ploidyNGS using the file you provided, I get the following error:

Traceback (most recent call last):
  File "./ploidyNGS.py", line 90, in <module>
    for l in pysam.idxstats(args.bam).split('\n'):
AttributeError: 'list' object has no attribute 'split'

Although I do get these two files, the PDF file contains no histogram:
diploidTest_depth100.tab
diploidTest_depth100.tab.PloidyNGS.pdf
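
The traceback suggests that pysam.idxstats returned a list of lines here, while the script expects one newline-separated string. A minimal sketch of a workaround, assuming the return type differs between pysam versions; the helper name idxstats_lines is hypothetical and not part of ploidyNGS, and the sample rows are made up:

```python
def idxstats_lines(raw):
    """Return idxstats output as a list of non-empty lines.

    Accepts either one newline-separated string (or bytes) or a list of lines.
    """
    if isinstance(raw, bytes):
        raw = raw.decode()
    if isinstance(raw, str):
        raw = raw.split("\n")
    # Drop trailing newlines and skip blank entries.
    return [line.rstrip("\n") for line in raw if line.strip()]


# Made-up idxstats rows: reference name, length, mapped reads, unmapped reads.
as_string = "chr1\t1000\t50\t0\n*\t0\t0\t3\n"
as_list = ["chr1\t1000\t50\t0\n", "*\t0\t0\t3\n"]
assert idxstats_lines(as_string) == idxstats_lines(as_list)
```

With such a helper, the loop at line 90 could iterate over idxstats_lines(pysam.idxstats(args.bam)) instead of calling .split('\n') directly.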

Empty output

Hello,

The output .tab file is generated, but it remains empty. No errors or warnings are printed.
I am trying to figure out the reason; both the installation and the test went well.

Thank you for your time.

About simulatePloidyData.py

Dear developers of ploidyNGS,

I have been trying to run the script simulatePloidyData.py to generate a chromosome with heteromorphic loci at the given heterozygosity.

After working through some trivial errors related to Python versions (2 vs. 3) and some straightforward uses of deprecated functions, I ran into more complex errors. First, the script cannot handle ploidy levels below 2 (e.g. --ploidy 1). Second, I get a ValueError when I run the script with ploidy 2 (see below).

Please advise on how to fix these errors, or on possible workarounds.

Thank you

Best regards,
Abhijeet Shah

ploidy error:

python ~/Downloads/ploidyNGS/simulation/simulatePloidyData.py --genome MyChromosome.fasta --heterozygosity 0.01 --ploidy 1

Traceback (most recent call last):
  File "/Downloads/ploidyNGS/simulation/simulatePloidyData.py", line 98, in <module>
   randomDosageTable = random.randint(0,lenDosageTable-1)
  File "/miniconda3/envs/ploidy/lib/python3.10/random.py", line 370, in randint
   return self.randrange(a, b+1)
  File "/miniconda3/envs/ploidy/lib/python3.10/random.py", line 353, in randrange
   raise ValueError("empty range for randrange() (%d, %d, %d)" % (istart, istop, width))
ValueError: empty range for randrange() (0, 0, 0)

ValueError:

python ~/Downloads/ploidyNGS/simulation/simulatePloidyData.py --genome MyChromosome.fasta --heterozygosity 0.01 --ploidy 2

Traceback (most recent call last):
  File "/Downloads/ploidyNGS/simulation/simulatePloidyData.py", line 102, in <module>
   GenomeAlph.remove(base)
ValueError: list.remove(x): x not in list
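
Both tracebacks can be read without the full script. The message "empty range for randrange() (0, 0, 0)" means random.randint was called as randint(0, -1), so lenDosageTable was 0 for --ploidy 1. The second error means base was not in GenomeAlph when remove was called; one possible cause, which is an assumption here, is a genome character outside the expected alphabet (for example N or a lowercase base). A minimal sketch of guards for both cases; the function names and the ALPHABET constant are hypothetical and not taken from simulatePloidyData.py:

```python
import random

# Assumed alphabet; the script's GenomeAlph may be defined differently.
ALPHABET = ("A", "C", "G", "T")


def pick_dosage_index(dosage_table):
    """Pick a random index, failing with a clear message on an empty table."""
    if not dosage_table:
        raise ValueError(
            "no dosage combinations available; ploidy must be at least 2"
        )
    return random.randrange(len(dosage_table))


def alternative_bases(base):
    """Return the bases that differ from `base`.

    Returns an empty list for characters outside the alphabet (e.g. 'N'),
    so the caller can skip that position instead of crashing.
    """
    base = base.upper()
    if base not in ALPHABET:
        return []
    return [b for b in ALPHABET if b != base]
```

A caller would skip any position where alternative_bases returns an empty list, rather than calling list.remove on a base that may be absent.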

End of life of Python 2

Your code doesn't run under Python 3, and Python 2 has already reached end of life.

Did you check the haploid example?

When I run the haploid test data with the -g option, it reports a ploidy of 6, but the histogram clearly shows a haploid pattern. Can you check that?

Where is explorePloidyNGS.py?

The simulation page mentions a useful-sounding script called explorePloidyNGS.py, but I can't seem to find it in the repository.

Does it need a long time?

My run has been stuck for a long time at the XXXsorted.bam.bai file, with no error message and no increase in the size of that file.

How long does ploidyNGS take to finish a job?

Hi,

I have been running ploidyNGS for a whole day and would like to know how long it takes to finish a job. Can it be multi-threaded? ploidyNGS just keeps running and doesn't produce any output or messages.

Best,
Quan

Test data returns incorrect ploidy.

Hello,

I get an unexpected result for the test data and was hoping for your input. When I run the test data and guess the ploidy, it guesses 6. This has been reproduced on two independent machines, following your instructions. Both return a ploidy of 6.

When running ./ploidyNGS.py --guess_ploidy --out myTest/DataTestPloidy1_guessPloidy --bam test_data/HaploidGenome/Ploidy1.bowtie2.sorted.bam

I get the output:

This is ploidyNGS version v3.1.2
Current date and time: Mon Oct 31 11:44:30 2022

BAM index present... OK!
Number of mapped reads from BAM: 206062
Observed average coverage: 51.44
Number of heteromorphic positions in  NC_001133.9 :  5936
Total number of heteromorphic positions:  5936

Coverage used for guessing ploidy: 50

  After comparing your data with our simulated dataset
  and computing the Kolmogorov-Smirnov distance, 
  the closest ploidy to yours is 6

Do you know what is happening? Thank you for your time.
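
For context, the log says the guess comes from comparing the observed data with a simulated dataset using the Kolmogorov-Smirnov distance. A minimal sketch of that idea in plain Python follows. It is not ploidyNGS's actual code: the function names, the idea of comparing lists of allele frequencies, and the reference values are all assumptions made for illustration.

```python
from bisect import bisect_right


def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov distance.

    This is the largest absolute difference between the two empirical
    cumulative distribution functions, checked at every data point.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    distance = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance


def closest_ploidy(observed, references):
    """Return the ploidy whose reference sample is nearest to `observed`.

    `references` maps a ploidy level to a list of allele frequencies
    (a hypothetical stand-in for the simulated dataset).
    """
    return min(references, key=lambda p: ks_distance(observed, references[p]))


# Made-up allele frequencies, for illustration only.
references = {
    2: [0.5, 0.5, 0.5, 0.5],
    3: [0.33, 0.33, 0.67, 0.67],
}
observed = [0.31, 0.35, 0.66, 0.69]
print(closest_ploidy(observed, references))  # prints 3 for these values
```

Because such a comparison always returns the nearest reference, it yields a ploidy even when none of the references fits the data well.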

Mistake in ploidyNGS.py

When I'm trying out either the test or one of my own files, I'm getting the following error:

Traceback (most recent call last):
  File "./ploidyNGS.py", line 98, in <module>
...
TypeError: expected string or Unicode object, file found

I pinpointed the mistake to line 59 of ploidyNGS.py.
There the BAM file is opened, while at line 85 the file handle is passed to pysam. This should be the file name as a string only. To solve it I simply changed line 59 from
bamOBJ = open(args.bam,"r") to bamOBJ = args.bam

The solution seems trivial, but the cause is not reflected in the traceback.
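
The fix above amounts to handing pysam a file name rather than an open file object. A small sketch of a helper that tolerates both kinds of input; bam_path is a hypothetical name and not part of ploidyNGS:

```python
import os
import tempfile


def bam_path(bam):
    """Return a file name for `bam`, whether it is a path or an open file."""
    # Open file objects carry their path in the `name` attribute.
    name = getattr(bam, "name", bam)
    return os.fspath(name)


# Demonstration with a temporary file standing in for a BAM file.
with tempfile.NamedTemporaryFile(suffix=".bam") as handle:
    assert bam_path(handle) == handle.name
assert bam_path("reads.sorted.bam") == "reads.sorted.bam"
```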

Job Killed. Running time and memory consumption

Dear ploidyNGS creator,

Thank you for your work and for maintaining this GitHub repository. Your test dataset ran fine on my installation.

I launched this command:

[userlocal@NTLT101 ploidyNGS]$ cd ~/ploidyNGS
[userlocal@NTLT101 ploidyNGS]$ source .venv/bin/activate
(.venv) [userlocal@NTLT101 ploidyNGS]$ ./ploidyNGS.py -o /PATH_OUTPUT/diploidTest -b /PATH_OUTPUT/all_sort.bam -d 50
###############################################################

This is ploidyNGS version v3.1.2

Current date and time: Mon Oct 16 18:16:10 2017

###############################################################
No index available for pileup. Creating an index...
Number of mapped reads from BAM: 14590766
Killed

I suppose that my computer (not very powerful) ran out of memory. How can such problems be prevented?

1- It would be interesting to have a rough idea of the memory consumption and/or computation time for a given computer architecture.

2- Another option would be to print some warnings before launching the computation and/or to have an option to process the dataset in chunks.

I am thinking of writing a bash script based on samtools 1.4.x.
The planned steps are:
1- Split the BAM by contigs.
2- If these contigs are too large (compared to your test dataset), split them into smaller BAM files.
3- Launch ./ploidyNGS.py in parallel from bash. I do not know whether this is possible given your special environment "(.venv)". Do you know if it is?
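
The third step can also be sketched in Python rather than bash. This sketch assumes the per-contig BAM files already exist; the file names, the output directory and the helper names are hypothetical. Only the -o, -b and -d options shown in the command above are used. Calling the virtual environment's interpreter directly (.venv/bin/python) uses its installed packages without sourcing the activate script first.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_commands(bam_files, out_dir, max_depth=50,
                   python=".venv/bin/python", script="./ploidyNGS.py"):
    """Build one ploidyNGS command line per BAM file."""
    commands = []
    for bam in bam_files:
        # Use the BAM file name (without extension) as the output prefix.
        prefix = Path(out_dir) / Path(bam).stem
        commands.append([python, script, "-o", str(prefix),
                         "-b", str(bam), "-d", str(max_depth)])
    return commands


def run_all(commands, workers=2):
    """Run the commands with at most `workers` jobs at a time.

    Returns the exit code of each job, in the same order as `commands`.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode,
                             commands))


# Hypothetical per-contig BAM files produced by the splitting step.
commands = build_commands(["contig_1.bam", "contig_2.bam"], "out")
for command in commands:
    print(" ".join(command))
# run_all(commands)  # uncomment to actually launch the jobs
```

Each parallel job needs its own memory, so on a machine that already ran out of memory it is safer to keep workers small.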

Cheers,

#Not an issue, but I need help commenting on the ploidy of an organism.

Dear Developer,
#Sorry to raise an issue unnecessarily.

Below is what I got when I ran the tool:
(.venv) mml@MML:~/softwares/ploidyNGS$ ./ploidyNGS.py --guess_ploidy -o guess_test -b dedup_SS_BWA_reads.bam
###############################################################

This is ploidyNGS version v3.1.2

Current date and time: Tue Nov 14 22:13:57 2017

###############################################################
BAM index present... OK!
Number of mapped reads from BAM: 3483177
Observed average coverage: 54.00
Coverage used for guessing ploidy: 50

After comparing your data with our simulated dataset
and computing the Kolmogorov-Smirnov distance,
the closest ploidy to yours is 3

And the image generated is attached below.

In the literature there is confusion about whether this organism is haploid or diploid.

Some information about the organism:
Fungus, 12.5 Mb genome, Illumina MiSeq paired-end reads.

Please go through the image and let me know what you think.

Thanks
Bhagya C T

guessnoidea_test_depth100.tab.PloidyNGS.pdf

Biopython Python3 <> Python2

Hey there,

I want to run your software but can't, because it says: "This script requires Python version 2.7.8 or higher within major version 2", whereas Biopython (which your software needs: "from Bio.SeqRecord import SeqRecord") requires Python 3.6 or later: "Biopython requires Python 3.6 or later. Python 2.7 detected."

Is there any solution?

Can it be used for allopolyploids?

Hi,
I would like to know whether this software can be used for allopolyploid species.

Any help is much appreciated.
Thanks.

Best regards,

About ploidyNGS.py

Dear all,
I recently tested ploidyNGS and got the error message below.

Command line
./ploidyNGS.py -o diploidTest -b test_data/simulatedDiploidGenome/Ploidy2.bowtie2.sorted.bam
Error message
sh: 1: ploidyNGS_generateHistogram.R: not found

So I suggest fixing ploidyNGS.py (line 229) as shown below.

original : cmdPloidyGraphRscript="ploidyNGS_generateHistogram.R "
fixed : cmdPloidyGraphRscript="Rscript ploidyNGS_generateHistogram.R "
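
The "not found" message comes from the shell, which only looks up bare command names on PATH and does not search the current directory. A sketch of a more explicit way to build the call, under two assumptions: the R script sits in a known directory, and it takes the .tab file as its only argument. Both are hypothetical, as is the function name, and the real script may expect other arguments.

```python
import os
import shutil


def histogram_command(tab_file, script_dir):
    """Build an argument list that runs the R script through Rscript."""
    # Use the full path to Rscript when it can be found on PATH.
    rscript = shutil.which("Rscript") or "Rscript"
    script = os.path.join(script_dir, "ploidyNGS_generateHistogram.R")
    # Passing only the .tab file is an assumption made for this sketch.
    return [rscript, script, tab_file]


print(histogram_command("diploidTest.tab", os.getcwd()))
```

An argument list like this can be passed to subprocess.run without going through a shell, so the result does not depend on the script being on PATH.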

Best,

About test data

Dear all,

I installed ploidyNGS and tested ./ploidyNGS.py -o diploidTest -b test_data/simulatedDiploidGenome/Ploidy2.bowtie2.sorted.bam

After running, I got the output attached
diploidTest_MaxDepth100_MinCov0.tab.PloidyNGS.pdf
which is somewhat different from https://github.com/diriano/ploidyNGS/tree/master/images/diploidTest_depth100.tab.PloidyNGS.png

Are there any problems with the installation process?

Error Message - ZeroDivisionError: integer division or modulo by zero

Greetings,

I'm trying to use ploidyNGS to make predictions about putatively haploid/diploid datasets. I followed the documentation and received the following error:

./ploidyNGS.py --guess_ploidy -o DF_genome -d 100000 -b DF_I_DNA_genome_SORTED.bam
###############################################################

This is ploidyNGS version v3.1.2

Current date and time: Thu Feb 22 16:20:01 2018

###############################################################
BAM index present... OK!
Number of mapped reads from BAM: 31817941
Traceback (most recent call last):
  File "./ploidyNGS.py", line 130, in <module>
    averageCoverage=countTotalReads/countTotalPositions
ZeroDivisionError: integer division or modulo by zero

I'm guessing by the error output that the denominator in averageCoverage is zero, but that shouldn't be the case. My bam file contains a large eukaryotic genome with lots of scaffolds, but I wanted to test using the -d flag as a first pass before splitting bam into scaffolds. Any help is appreciated!
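
Whatever the reason countTotalPositions ends up as zero, the division could be guarded so that the run stops with a clearer message. A minimal sketch; the function name is hypothetical, and the wording of the message is an assumption about what would help the user:

```python
def average_coverage(total_reads, total_positions):
    """Return reads per position, with a clear error when nothing was counted."""
    if total_positions == 0:
        raise ValueError(
            "no positions were counted, so average coverage is undefined; "
            "check the BAM file and the depth settings"
        )
    # float() keeps the result fractional under Python 2 as well.
    return total_reads / float(total_positions)


print(average_coverage(10, 4))  # prints 2.5
```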
