
tin-score-calculation's People

Contributors

angrymaciek, fgypas

Forkers

meric466

tin-score-calculation's Issues

calculate-tin.py raises an error for a BED file generated from the S. cerevisiae GTF file

Describe the bug
calculate-tin.py raises an error when we run it with a BED file generated from the S. cerevisiae GTF file. This is the error we get:

@ 2021-07-13 16:36:37: Get BAM file(s) ...
Total 1 BAM file(s):
	results/samples/RPL7_3/map_genome/RPL7_3.se.Aligned.sortedByCoord.out.bam
@ 2021-07-13 16:36:37: Processing results/samples/RPL7_3/map_genome/RPL7_3.se.Aligned.sortedByCoord.out.bam
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/tavymi75/zarp/config/my_run/.snakemake/conda/c6ae2829/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/tavymi75/zarp/config/my_run/.snakemake/conda/c6ae2829/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/tavymi75/zarp/config/my_run/.snakemake/conda/c6ae2829/bin/calculate-tin.py", line 361, in gf
    coverage = genebody_coverage(
  File "/home/tavymi75/zarp/config/my_run/.snakemake/conda/c6ae2829/bin/calculate-tin.py", line 260, in genebody_coverage
    start = positions[0] - 1
IndexError: list index out of range
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/tavymi75/zarp/config/my_run/.snakemake/conda/c6ae2829/bin/calculate-tin.py", line 561, in <module>
    main()
  File "/home/tavymi75/zarp/config/my_run/.snakemake/conda/c6ae2829/bin/calculate-tin.py", line 541, in main
    pool.map(gf, conditions)
  File "/home/tavymi75/zarp/config/my_run/.snakemake/conda/c6ae2829/lib/python3.9/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/tavymi75/zarp/config/my_run/.snakemake/conda/c6ae2829/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
IndexError: list index out of range

See also discussion here:
https://git.scicore.unibas.ch/zavolan_group/pipelines/zarp/-/issues/179

Reporter: @meric466
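The traceback shows genebody_coverage reading positions[0] from an empty list. Why the list is empty for this annotation has not been established; assuming an empty list simply means a transcript has no usable positions, a guard of the following shape would let the caller skip such transcripts instead of crashing the whole pool. The helper name first_position is hypothetical and not taken from calculate-tin.py:

```python
from typing import List, Optional


def first_position(positions: List[int]) -> Optional[int]:
    """Hypothetical helper mirroring `start = positions[0] - 1` from the traceback.

    Returns None instead of raising IndexError when the list is empty,
    so the caller can skip the transcript.
    """
    if not positions:
        # No positions for this transcript: nothing to compute coverage over.
        return None
    return positions[0] - 1
```

Whether such transcripts should be skipped or reported with an explicit placeholder score is a separate design decision.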

ci: comparing against a ground truth output

Is your feature request related to a problem? Please describe.
Currently the CI only tests that the scripts do not crash, nothing more. We would not notice if someone introduced a bug in the calculations.

Describe the solution you'd like
Include expected output files for the included test input and compare against them in the CI.
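A minimal sketch of such a comparison, assuming the script writes a tab-separated table; the function names and the default tolerance are hypothetical. Numeric cells are compared with a relative tolerance so that harmless floating-point differences between platforms do not fail the build, while any other mismatch does:

```python
import csv
import math
from pathlib import Path
from typing import List


def rows_match(observed: List[List[str]], expected: List[List[str]], rel_tol: float = 1e-6) -> bool:
    """Compare two tables cell by cell; numeric cells may differ by rel_tol."""
    if len(observed) != len(expected):
        return False
    for obs_row, exp_row in zip(observed, expected):
        if len(obs_row) != len(exp_row):
            return False
        for obs, exp in zip(obs_row, exp_row):
            if obs == exp:
                continue
            try:
                if not math.isclose(float(obs), float(exp), rel_tol=rel_tol):
                    return False
            except ValueError:
                # Non-numeric cells must match exactly.
                return False
    return True


def tables_match(observed: Path, expected: Path, rel_tol: float = 1e-6) -> bool:
    """Read two TSV files and compare them with rows_match."""
    with observed.open(newline="") as fo, expected.open(newline="") as fe:
        return rows_match(
            list(csv.reader(fo, delimiter="\t")),
            list(csv.reader(fe, delimiter="\t")),
            rel_tol,
        )
```

In CI, a failing tables_match(output_path, expected_path) would then mark the job as failed.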

TIN score calculation does not produce proper output when more than one BAM file is provided

Describe the bug
An issue we discovered with @mkatsanto today.
If you specify more than one input BAM file, the output table still contains scores only for the first sample.
This requires a closer look at the parallelization to find where the scores are lost or overwritten.

To Reproduce
Run the CI test, but adjust the command so that the same input BAM file is specified twice.

Expected behavior
The output table should contain quantified TIN scores for two samples.

Additional context
We usually run this for a single input BAM file, but the original script supported providing multiple samples (see the script's parameter specification for details). This functionality now seems to be broken.
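One common way to keep per-sample results from being lost is for each worker to return its scores and for the parent process to merge the returned values, rather than having workers write into shared state. The sketch below assumes that design; score_sample is a hypothetical stand-in for the real per-sample computation. Because the reproduction case passes the same BAM file twice, the merge step also gives duplicate inputs distinct labels so that neither result overwrites the other:

```python
from typing import Dict, Iterable, Tuple

Scores = Dict[str, float]


def score_sample(bam_path: str) -> Tuple[str, Scores]:
    """Hypothetical stand-in for the per-sample TIN computation."""
    return bam_path, {"transcript_1": 0.0}


def merge_scores(results: Iterable[Tuple[str, Scores]]) -> Dict[str, Scores]:
    """Collect worker results into one table keyed by a unique sample label."""
    table: Dict[str, Scores] = {}
    for sample, scores in results:
        label, n = sample, 1
        # Suffix repeated inputs (e.g. the same BAM given twice) instead of overwriting.
        while label in table:
            n += 1
            label = f"{sample}_{n}"
        table[label] = scores
    return table
```

In the script, the results would come from the pool, e.g. merge_scores(pool.map(score_sample, bam_files)). Pool.map returns results in input order, so the sample columns keep the order given on the command line.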

Error running calculate-tin.py

Hi!
I was trying to run:
calculate-tin.py -i Sample_A.bam -r hg19_Ensembl_gene.bed
But I get an error:

@ 2024-03-15 16:00:51: Get BAM file(s) ...
Total 1 BAM file(s):
	  Sample_A.bam
Traceback (most recent call last):
  File "/home/paulyr2/.local/bin/calculate-tin.py", line 4, in <module>
    __import__('pkg_resources').run_script('tin-score-calculation==0.6.3', 'calculate-tin.py')
  File "/mnt/nasapps/production/miniconda/23.1.0/lib/python3.10/site-packages/pkg_resources/__init__.py", line 672, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/mnt/nasapps/production/miniconda/23.1.0/lib/python3.10/site-packages/pkg_resources/__init__.py", line 1479, in run_script
    exec(script_code, namespace, namespace)
  File "/home/paulyr2/.local/lib/python3.10/site-packages/tin_score_calculation-0.6.3-py3.10.egg/EGG-INFO/scripts/calculate-tin.py", line 554, in <module>
  File "/home/paulyr2/.local/lib/python3.10/site-packages/tin_score_calculation-0.6.3-py3.10.egg/EGG-INFO/scripts/calculate-tin.py", line 490, in main
AttributeError: 'NoneType' object has no attribute 'split'

head hg19_Ensembl_gene.bed
chr1 66999065 67210057 ENST00000237247 0 + 67000041 67208778 0 27 25,123,64,25,84,57,55,176,12,12,25,52,86,93,75,501,81,128,127,60,112,156,133,203,65,165,1302, 0,863,92464,99687,100697,106394,109427,110161,127130,134147,137612,138561,139898,143621,146295,148486,150724,155765,156807,162051,185911,195881,200365,205952,207275,207889,209690,
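The report does not show what line 490 of the installed script reads, so the cause of the AttributeError is not clear from it. One cheap first check, offered as an assumption rather than a diagnosis, is that the annotation is well-formed, tab-delimited BED12 (the head output above shows spaces, which may only be an artifact of pasting). The function name is hypothetical; the field rules follow the UCSC BED format description:

```python
from typing import List


def bed12_problems(line: str) -> List[str]:
    """Return structural problems found in one BED12 line (empty list if none)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        return [f"expected 12 tab-separated fields, found {len(fields)}"]
    try:
        chrom_start, chrom_end = int(fields[1]), int(fields[2])
        block_count = int(fields[9])
        # blockSizes/blockStarts are comma-separated, usually with a trailing comma.
        sizes = [int(s) for s in fields[10].split(",") if s]
        starts = [int(s) for s in fields[11].split(",") if s]
    except ValueError:
        return ["coordinate, blockCount, blockSizes or blockStarts field is not an integer"]
    problems = []
    if chrom_start >= chrom_end:
        problems.append("chromStart must be smaller than chromEnd")
    if len(sizes) != block_count or len(starts) != block_count:
        problems.append("blockSizes/blockStarts length differs from blockCount")
    elif block_count > 0 and starts[-1] + sizes[-1] != chrom_end - chrom_start:
        # The last block must end exactly at chromEnd.
        problems.append("last block does not end at chromEnd")
    return problems
```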

Documentation cleaning

Is your feature request related to a problem? Please describe.
The README file is outdated and contains information that is no longer needed.

Describe the solution you'd like
Clean the documentation.

clean docker-related files

Is your feature request related to a problem? Please describe.
There are several Dockerfiles lying around in the repository. It would be best to decide on the main one and settle how we proceed with deploying the container.

Mac version of tin-score-calculation is slow

Is your feature request related to a problem? Please describe.
Both the macOS CI job and a local pip installation on OS X 10.14.6 are very slow, even for the small test files. The same test runs in 10 s under Ubuntu but takes more than 20 minutes on OS X.

Describe the solution you'd like
Speed up the process.

Describe alternatives you've considered
Or at least understand why this happens.
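One factor that may be worth ruling out (an assumption, not a confirmed cause): since Python 3.8, multiprocessing on macOS defaults to the spawn start method, which starts a fresh interpreter for every worker, whereas Linux has historically used fork. Combined with the hardcoded cpu_count() * 10 worker count, this could add noticeable start-up cost. A tiny diagnostic (hypothetical helper name) that prints the relevant settings on each platform:

```python
import multiprocessing
import platform


def pool_diagnostics() -> dict:
    """Report settings that influence how expensive starting the worker pool is."""
    cpus = multiprocessing.cpu_count()
    return {
        "platform": platform.system(),  # e.g. "Darwin" on macOS, "Linux" on Ubuntu
        "start_method": multiprocessing.get_start_method(),
        "cpu_count": cpus,
        "planned_processes": cpus * 10,  # mirrors nrProcesses = cpu_count() * 10
    }


if __name__ == "__main__":
    print(pool_diagnostics())
```

Running this on both CI runners would show whether the start method and the planned number of processes differ between them.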

Python 3

Is your feature request related to a problem? Please describe.
Python 2 is no longer supported, which makes it harder to run the script outside a specific container.

Describe the solution you'd like
Refactor the scripts so that they work with the Python 3 interpreter.

TIN score calculation uses too many threads

Describe the bug
The number of threads (more precisely, processes) is hardcoded and based on the CPU count of the machine (see additional context).
This can produce a lot of scheduling overhead.

During an execution of zarp, I noticed that a rule running this script (via a Singularity image) spawned lots of processes (see screenshot).

Screenshot
Taken on a machine with 12 cores, as reported by Python's cpu_count().

[Screenshot from 2021-05-19, 11:12:24, not reproduced]

Expected behavior
The number of threads should be configurable through an additional parameter.

Desktop:

  • OS: CentOS 7

Additional context
The code to adjust:

nrProcesses = cpu_count() * 10
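A minimal sketch of exposing this as a command-line option with argparse; the flag name --processes, the default of 1 and the cap at cpu_count() are assumptions, not the script's current interface:

```python
import argparse
from multiprocessing import cpu_count


def positive_int(value: str) -> int:
    """argparse type that only accepts integers >= 1."""
    number = int(value)
    if number < 1:
        raise argparse.ArgumentTypeError(f"expected a positive integer, got {value}")
    return number


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="calculate-tin.py")
    # Hypothetical option replacing the hardcoded `nrProcesses = cpu_count() * 10`.
    parser.add_argument(
        "--processes",
        type=positive_int,
        default=1,
        help="number of worker processes (default: 1)",
    )
    return parser


def resolve_processes(requested: int) -> int:
    """Never start more workers than there are CPUs."""
    return min(requested, cpu_count())
```

The script would then use nrProcesses = resolve_processes(args.processes). An explicit parameter matters on shared cluster nodes, where the cores allocated by the scheduler are often fewer than cpu_count() reports.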

Eliminate dependencies

Is your feature request related to a problem? Please describe.
I was looking at the dependencies, and it seems that we could eliminate some of them. More specifically, do we really need the guppy package?

from guppy import hpy

Describe the solution you'd like
Maintaining the package will be easier if we do not depend on guppy.

@mzavolan @meric466 @AngryMaciek Do you think it can be eliminated?
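If hpy is only used to report memory usage (how the script actually uses it has not been checked here), the standard library's tracemalloc may cover the same need without an extra dependency. Note that tracemalloc tracks allocations made through Python's memory allocator, not the total resident memory of the process. A minimal sketch with a hypothetical helper name:

```python
import tracemalloc
from typing import Any, Callable, Tuple


def run_with_peak_memory(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, int]:
    """Call func and return (result, peak traced Python memory in bytes)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak
```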

BED file / possible to use GTF directly?

Great work, two questions:

  • Is there a tool to create a suitable BED file for tin-score-calculation from a GTF file, or do you have BED files for human and mouse available for download somewhere (ideally with the latest assemblies and annotations from Ensembl)?
  • Easiest would be: is it possible to use a GTF file directly?

Thanks,
Gregor
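For the first question, established converters exist, for example UCSC's gtfToGenePred followed by genePredToBed. For illustration only, the core of such a conversion is sketched below: exon lines are grouped by transcript_id and written as one BED12 line per transcript, converting GTF's 1-based, closed coordinates to BED's 0-based, half-open ones. The function name is hypothetical, CDS (thick) regions are not tracked, and real annotations may need more care, e.g. with differently formatted attribute columns:

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def gtf_to_bed12(lines: Iterable[str]) -> List[str]:
    """Build one BED12 line per transcript from the GTF exon records."""
    exons: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    meta: Dict[str, Tuple[str, str]] = {}  # transcript -> (chrom, strand)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(f[8])
        if match is None:
            continue
        tx = match.group(1)
        # GTF is 1-based and closed; BED is 0-based and half-open.
        exons[tx].append((int(f[3]) - 1, int(f[4])))
        meta[tx] = (f[0], f[6])
    bed = []
    for tx, blocks in exons.items():
        blocks.sort()
        chrom, strand = meta[tx]
        start, end = blocks[0][0], max(e for _, e in blocks)
        sizes = ",".join(str(e - s) for s, e in blocks) + ","
        rel_starts = ",".join(str(s - start) for s, _ in blocks) + ","
        # thickStart == thickEnd == chromStart: no coding region recorded.
        bed.append("\t".join([chrom, str(start), str(end), tx, "0", strand,
                              str(start), str(start), "0", str(len(blocks)),
                              sizes, rel_starts]))
    return bed
```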

License

Make sure that the LICENSE here is the same as the original LICENSE for RSeQC.
