ploidyngs's People

Contributors

diriano, nalruf, novigit, santosrac

ploidyngs's Issues

Error in ploidyNGS.py

When I test ploidyNGS using the file you provided, I get the following error:

Traceback (most recent call last):
  File "./ploidyNGS.py", line 90, in <module>
    for l in pysam.idxstats(args.bam).split('\n'):
AttributeError: 'list' object has no attribute 'split'

Although I get these two files, there is no histogram in the PDF file:
diploidTest_depth100.tab
diploidTest_depth100.tab.PloidyNGS.pdf
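
The traceback suggests that, in this environment, `pysam.idxstats` returned a list of lines rather than a single string. A minimal sketch of a helper that accepts either form follows. `idxstats_rows` is a hypothetical name (it is not part of ploidyNGS or pysam), and the four tab-separated columns (reference name, length, mapped reads, unmapped reads) are assumed to follow the usual `samtools idxstats` layout.

```python
def idxstats_rows(raw):
    """Parse idxstats output given either as one string or as a list of lines.

    Assumption: `raw` is whatever pysam.idxstats(...) returned in the
    installed pysam version.
    """
    if isinstance(raw, bytes):
        raw = raw.decode()
    lines = raw.splitlines() if isinstance(raw, str) else list(raw)

    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            # Skip blank or malformed lines instead of crashing.
            continue
        name, length, mapped, unmapped = fields
        rows.append((name, int(length), int(mapped), int(unmapped)))
    return rows
```

With such a helper, the loop at line 90 could iterate over `idxstats_rows(pysam.idxstats(args.bam))` without calling `.split('\n')` on the result.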

Empty output

Hello,

The .tab output file is generated, but it remains empty, and no errors or warnings are printed.
I am trying to figure out the reason. Both the installation and the test went well.

thank you for your time

about simulatePloidyData.py

Dear developers of ploidyNGS,

I have been trying to run the script simulatePloidyData.py to generate a chromosome with heteromorphic loci at the given heterozygosity.

After getting past some trivial errors related to Python versions (version 2 vs. version 3) and some straightforward uses of deprecated functions, I ran into more complex errors. Firstly, the script is unable to handle ploidy levels below 2 (e.g. --ploidy 1). Secondly, I get a ValueError when I try to run the script with ploidy 2 (see below).

Please advise on how to fix these errors or possible work-arounds.

Thank you

Best regards,
Abhijeet Shah

ploidy error:

python ~/Downloads/ploidyNGS/simulation/simulatePloidyData.py --genome MyChromosome.fasta --heterozygosity 0.01 --ploidy 1

Traceback (most recent call last):
  File "/Downloads/ploidyNGS/simulation/simulatePloidyData.py", line 98, in <module>
   randomDosageTable = random.randint(0,lenDosageTable-1)
  File "/miniconda3/envs/ploidy/lib/python3.10/random.py", line 370, in randint
   return self.randrange(a, b+1)
  File "/miniconda3/envs/ploidy/lib/python3.10/random.py", line 353, in randrange
   raise ValueError("empty range for randrange() (%d, %d, %d)" % (istart, istop, width))
ValueError: empty range for randrange() (0, 0, 0)

ValueError:

python ~/Downloads/ploidyNGS/simulation/simulatePloidyData.py --genome MyChromosome.fasta --heterozygosity 0.01 --ploidy 2

Traceback (most recent call last):
  File "/Downloads/ploidyNGS/simulation/simulatePloidyData.py", line 102, in <module>
   GenomeAlph.remove(base)
ValueError: list.remove(x): x not in list
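
Both tracebacks above point at unguarded calls. The message `empty range for randrange() (0, 0, 0)` means `random.randint(0, lenDosageTable-1)` was called with `lenDosageTable` equal to 0, and `list.remove(x): x not in list` means `base` was not present in `GenomeAlph`. The sketch below shows one way to guard each call. The function names are hypothetical, and the assumed cause of the second error (a lowercase or non-ACGT character such as `N` in the FASTA) is a guess that has not been checked against the script.

```python
import random


def pick_dosage_index(len_dosage_table, rng=random):
    """Pick a random index into the dosage table, failing with a clear message."""
    if len_dosage_table < 1:
        raise ValueError(
            "dosage table is empty; check whether the requested ploidy "
            "allows heteromorphic positions at all"
        )
    return rng.randrange(len_dosage_table)


def alternative_bases(base, alphabet=("A", "C", "G", "T")):
    """Return the bases that differ from `base`, or [] if `base` is unusable.

    Assumption: soft-masked (lowercase) bases should be treated like their
    uppercase form, and ambiguous bases such as N should be skipped.
    """
    base = base.upper()
    if base not in alphabet:
        return []
    return [b for b in alphabet if b != base]
```

An empty list from `alternative_bases` would signal the caller to leave that position unmodified rather than raise.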

Did you check the haploid example?

When I run the haploid test data, it reports a ploidy of 6 when using the -g option, but the histogram clearly shows a haploid profile. Can you check that?

Does it need a long time?

My run has been hanging for a long time at the XXXsorted.bam.bai file, with no error and no increase in the size of that XXXsorted.bam.bai file.

How long does ploidyNGS take to finish a job?

Hi,

I have been running ploidyNGS for a whole day and would like to know how long it takes to finish a job. Can it be multi-threaded? ploidyNGS just keeps running and doesn't produce any output or messages.

Best,
Quan

Test data returns incorrect ploidy.

Hello,

I receive a different result for the test data and was hoping for your input. When running the test data and guessing ploidy, it guesses 6. This has been recreated on two independent machines, following your instructions. Both cases return ploidy of 6.

When running ./ploidyNGS.py --guess_ploidy --out myTest/DataTestPloidy1_guessPloidy --bam test_data/HaploidGenome/Ploidy1.bowtie2.sorted.bam

I get the output:

This is ploidyNGS version v3.1.2
Current date and time: Mon Oct 31 11:44:30 2022

BAM index present... OK!
Number of mapped reads from BAM: 206062
Observed average coverage: 51.44
Number of heteromorphic positions in  NC_001133.9 :  5936
Total number of heteromorphic positions:  5936

Coverage used for guessing ploidy: 50

  After comparing your data with our simulated dataset
  and computing the Kolmogorov-Smirnov distance, 
  the closest ploidy to yours is 6

Do you know what is happening? Thank you for your time.
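
For readers unfamiliar with the comparison mentioned in the output, here is a minimal pure-Python sketch of a two-sample Kolmogorov-Smirnov distance and of choosing the closest reference set. It is not ploidyNGS's actual implementation: `ks_distance`, `closest_ploidy`, and the toy allele-frequency values are all hypothetical and only illustrate the general idea.

```python
from bisect import bisect_right


def ks_distance(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


def closest_ploidy(observed, simulated_by_ploidy):
    """Return the ploidy whose simulated sample has the smallest KS distance."""
    return min(
        simulated_by_ploidy,
        key=lambda ploidy: ks_distance(observed, simulated_by_ploidy[ploidy]),
    )


# Toy allele frequencies, invented purely for illustration.
observed = [0.49, 0.50, 0.50, 0.51]
simulated = {
    2: [0.48, 0.50, 0.50, 0.52],
    3: [0.33, 0.34, 0.66, 0.67],
}
print(closest_ploidy(observed, simulated))  # prints 2
```

A distance-based choice like this always returns some ploidy, even when no reference set fits well, so inspecting the histogram alongside the guess remains useful.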

Mistake in ploidyNGS.py

When I'm trying out either the test or one of my own files, I'm getting the following error:

Traceback (most recent call last):
  File "./ploidyNGS.py", line 98, in <module>
...
TypeError: expected string or Unicode object, file found

I pinpointed the mistake to ploidyNGS.py, line 59.
There the BAM file is opened, while at line 85 the file handle is passed to pysam. This should be the file name as a string only. To solve it, I simply changed line 59 from
bamOBJ = open(args.bam,"r") to bamOBJ = args.bam

The solution seems trivial, but it's not reflected in the traceback
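
The same idea can be written as a small helper that always yields a path string, whether it is given a path or an already opened file. This is only a sketch: `bam_path` is a hypothetical name and is not part of ploidyNGS.

```python
import os


def bam_path(bam_arg):
    """Return the BAM location as a string, given a path or an open file."""
    if isinstance(bam_arg, (str, os.PathLike)):
        path = os.fspath(bam_arg)
    else:
        # Open file objects created with open(path) expose the path as .name.
        path = os.fspath(bam_arg.name)
    if not os.path.isfile(path):
        raise FileNotFoundError(path)
    return path
```

The returned string is what would then be handed to pysam instead of the file handle.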

Job Killed. Running time and memory consumption

Dear ploidyNGS creator,

Thank you for your work and for maintaining this GitHub repository. Your test dataset ran fine on my installation.

I launched this command:

[userlocal@NTLT101 ploidyNGS]$ cd ~/ploidyNGS
[userlocal@NTLT101 ploidyNGS]$ source .venv/bin/activate
(.venv) [userlocal@NTLT101 ploidyNGS]$ ./ploidyNGS.py -o /PATH_OUTPUT/diploidTest -b /PATH_OUTPUT/all_sort.bam -d 50
###############################################################

This is ploidyNGS version v3.1.2

nCurrent date and time: Mon Oct 16 18:16:10 2017

###############################################################
No index available for pileup. Creating an index...
Number of mapped reads from BAM: 14590766
Killed

I suppose that my computer (not very powerful) ran out of memory. How can such problems be prevented?

1- It would be interesting to have a rough idea of the memory consumption and/or computation time for a given computer architecture.

2- Another option would be to have some warnings before launching the computation and/or an option to process the dataset in chunks.

I am thinking of writing a bash script based on samtools 1.4.x.
The planned steps are:
1- Split the BAM by contig.
2- If these contigs are too large (compared to your test dataset), split them into smaller BAM files.
3- Launch ./ploidyNGS.py in parallel in bash. I do not know whether this is possible given your special environment "(.venv)". Do you know whether it is possible?

Cheers,

JB
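
The plan above (split by contig, then run in parallel) can also be sketched in Python rather than bash. Everything here is an assumption for illustration: the function names, the output layout, and the idea of calling the virtual environment's interpreter directly (`.venv/bin/python`), which normally uses the environment's packages without `source .venv/bin/activate`. Only the `-o`, `-b` and `-d` options come from the command shown above. Whether per-contig runs give results comparable to a whole-genome run has not been verified.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_jobs(bam, contigs, out_dir, venv_python=".venv/bin/python", depth=50):
    """Build, per contig, the commands to extract, index and analyse it."""
    jobs = []
    for contig in contigs:
        sub_bam = f"{out_dir}/{contig}.bam"
        jobs.append([
            # Extract one contig (the input BAM must be sorted and indexed).
            ["samtools", "view", "-b", "-o", sub_bam, bam, contig],
            ["samtools", "index", sub_bam],
            [venv_python, "ploidyNGS.py", "-o", f"{out_dir}/{contig}",
             "-b", sub_bam, "-d", str(depth)],
        ])
    return jobs


def run_job(commands):
    """Run the commands of one job in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


def run_all(jobs, workers=2):
    """Run jobs concurrently; keep `workers` low on a memory-limited machine."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(run_job, jobs))
```

A call such as `run_all(build_jobs("all_sort.bam", ["contig1", "contig2"], "out"))` would process two contigs at a time; the contig names here are placeholders.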

Not an issue, but I need help commenting on the ploidy of an organism.

Dear Developer,
Sorry to raise an issue unnecessarily.

Below is what I got when I ran the tool:
(.venv) mml@MML:~/softwares/ploidyNGS$ ./ploidyNGS.py --guess_ploidy -o guess_test -b dedup_SS_BWA_reads.bam
###############################################################

This is ploidyNGS version v3.1.2

Current date and time: Tue Nov 14 22:13:57 2017

###############################################################
BAM index present... OK!
Number of mapped reads from BAM: 3483177
Observed average coverage: 54.00
Coverage used for guessing ploidy: 50

After comparing your data with our simulated dataset
and computing the Kolmogorov-Smirnov distance,
the closest ploidy to yours is 3

The generated image is attached below.

In the scientific field, there is confusion over whether it is haploid or diploid.

Some information about the organism:
Fungus, 12.5 Mb genome, Illumina MiSeq paired-end reads.

Please look at the image and let me know what you think.

Thanks
Bhagya C T

guessnoidea_test_depth100.tab.PloidyNGS.pdf

Biopython Python3 <> Python2

Hey there,

I want to run your software but cannot, because it says: "This script requires Python version 2.7.8 or higher within major version 2", whereas Biopython (which your software needs: "from Bio.SeqRecord import SeqRecord") requires Python 3.6 or later: "Biopython requires Python 3.6 or later. Python 2.7 detected."

Is there any solution?

Can it be used for allopolyploids?

Hi,
I would like to know whether this software can be used for allopolyploid species.

Any help is much appreciated.
Thanks.

Best regards,

About ploidyNGS.py

Dear all
I recently tested ploidyNGS and got the error message below.

  1. command line
    ./ploidyNGS.py -o diploidTest -b test_data/simulatedDiploidGenome/Ploidy2.bowtie2.sorted.bam
  2. error message
    sh: 1: ploidyNGS_generateHistogram.R: not found

So I suggest fixing ploidyNGS.py (line 229) as shown below.

original : cmdPloidyGraphRscript="ploidyNGS_generateHistogram.R "
fixed : cmdPloidyGraphRscript="Rscript ploidyNGS_generateHistogram.R "

Best,
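
The `sh: 1: ... not found` message indicates that the shell could not find the R script as an executable on the PATH. Besides prefixing `Rscript`, the command can be built from an explicit script location. The sketch below assumes the R script sits in a known directory; `histogram_command` and its arguments are hypothetical, and the arguments the R script actually expects are not shown in the report.

```python
import os
import shutil


def histogram_command(script_dir, script_args, rscript=None):
    """Build the argument list that runs the histogram script through Rscript."""
    if rscript is None:
        rscript = shutil.which("Rscript")
    if rscript is None:
        raise RuntimeError("Rscript was not found on PATH; is R installed?")
    script = os.path.join(script_dir, "ploidyNGS_generateHistogram.R")
    return [rscript, script] + list(script_args)
```

Passing the resulting list to `subprocess.run` avoids depending on the current working directory or on the script's executable bit.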

Error Message - ZeroDivisionError: integer division or modulo by zero

Greetings,

I'm trying to use ploidyNGS to make predictions about putatively haploid/diploid datasets. I followed the documentation and received the following error:

./ploidyNGS.py --guess_ploidy -o DF_genome -d 100000 -b DF_I_DNA_genome_SORTED.bam
###############################################################

This is ploidyNGS version v3.1.2

Current date and time: Thu Feb 22 16:20:01 2018

###############################################################
BAM index present... OK!
Number of mapped reads from BAM: 31817941
Traceback (most recent call last):
  File "./ploidyNGS.py", line 130, in <module>
    averageCoverage=countTotalReads/countTotalPositions
ZeroDivisionError: integer division or modulo by zero

I'm guessing by the error output that the denominator in averageCoverage is zero, but that shouldn't be the case. My bam file contains a large eukaryotic genome with lots of scaffolds, but I wanted to test using the -d flag as a first pass before splitting bam into scaffolds. Any help is appreciated!
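
The traceback shows that `countTotalPositions` was zero when the average was computed, which means no positions had been counted by that point. The report does not say why. A guard like the one sketched below would turn the crash into a clearer message. `average_coverage` is a hypothetical helper, not ploidyNGS code; the explicit float conversion also avoids the integer truncation of `/` under Python 2, whose wording the error message matches.

```python
def average_coverage(total_reads, total_positions):
    """Return mean coverage as a float, with a clear error for empty input."""
    if total_positions == 0:
        raise ValueError(
            "no positions were counted; check the BAM contents and the "
            "depth/filter options before computing average coverage"
        )
    return float(total_reads) / total_positions
```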
