
open_pipelines's People

Contributors

afrendeiro, berguner, fwzhao, mfarlik, nsheff, pdatlinger, vreuter


open_pipelines's Issues

bugs in chipseq.py & ngstk.py

Peak calling in open_pipelines/chipseq.py was not working. It turns out there are a few issues to correct; here are my solutions.

1. In open_pipelines/chipseq.py (line 535), an underscore is missing, i.e. pipe_manager.wait_for_file... -> pipe_manager._wait_for_file...
2. macs2CallPeaks in pypiper/ngstk.py (line 1140) was not functioning properly when called from open_pipelines/chipseq.py (line 545). I think this is due to "self" not being included in macs2CallPeaks's definition (this subtle problem is explained here: http://blog.rtwilson.com/got-multiple-values-for-argument-error-with-keyword-arguments-in-python-classes/). I added "self" in my local copy.
3. Also, to be consistent with open_pipelines/chipseq.py, I changed treatmentBams -> treatmentBam and controlBams -> controlBam in macs2CallPeaks in pypiper/ngstk.py (line 1140).

With these three changes, open_pipelines/chipseq.py now calls peaks properly.
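
For illustration, a minimal sketch of the corrected macs2CallPeaks definition with "self" added and the singular parameter names; the remaining parameter names and the command template are assumptions, and the real function in pypiper/ngstk.py has more parameters, so this is not a drop-in replacement.

```python
# Sketch only: trimmed to the parameters discussed in this issue.
class NGSTk(object):
    def macs2CallPeaks(self, treatmentBam, controlBam, outputDir, sampleName, genome):
        # Without "self" as the first parameter, calling the method on an NGSTk
        # instance shifts every positional argument by one, producing the
        # "got multiple values for argument" error described above.
        cmd = "macs2 callpeak -t {} -c {} --outdir {} -n {} -g {}".format(
            treatmentBam, controlBam, outputDir, sampleName, genome)
        return cmd
```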

Fastqc parser bug

Hello,

FastQC parsing fails due to "IndexError: list index out of range". Changing the line:

line = [i for i in range(len(content)) if "Sequence length " in content[i]][0]

to:

line = [i for i in range(len(content)) if "Sequence length" in content[i]][0]

fixes the issue. It looks like the trailing space characters in the search pattern were causing the problem.

Best,
Bekir
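
As a self-contained illustration of the corrected lookup (the file name and the tab-splitting below are assumptions about how the report is read, not the pipeline's exact code):

```python
# Read a FastQC fastqc_data.txt report and locate the "Sequence length" entry
# without requiring trailing whitespace after the label.
with open("fastqc_data.txt") as handle:
    content = handle.read().splitlines()

line = [i for i in range(len(content)) if "Sequence length" in content[i]][0]
sequence_length = content[line].split("\t")[1]  # FastQC separates label and value with a tab
print(sequence_length)
```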

Add all required arguments as pipelines inputs

This should ease the requirement of using the pipelines with looper (through the sample YAML).
Arguments passed on the command line should override any value provided in the sample YAML, if one is also passed.
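
A rough sketch of that precedence rule, assuming argparse-based pipelines and a sample loaded from YAML; the option names and keys below are illustrative, not the pipelines' actual interface.

```python
# Sketch: command-line values win over sample-YAML values when both are given.
import argparse
import yaml

parser = argparse.ArgumentParser(description="pipeline")
parser.add_argument("--sample-yaml", default=None)
parser.add_argument("--genome", default=None)
parser.add_argument("--input-bam", default=None)
args = parser.parse_args()

sample = {}
if args.sample_yaml:
    with open(args.sample_yaml) as handle:
        sample = yaml.safe_load(handle)

# Only fall back to the YAML when the option was not given on the command line.
for key in ("genome", "input_bam"):
    cli_value = getattr(args, key)
    if cli_value is not None:
        sample[key] = cli_value
```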

set_file_path error

Reported by @mtugrul

I was not able to run chipseq.py and atacseq.py in looper. The following error comes up; I looked into the corresponding lines but did not understand why. Any help?

chipseq:

Traceback (most recent call last):
  File "/home/mtugrul/software/pipelines/pipelines/chipseq.py", line 755, in <module>
    sys.exit(main())
  File "/home/mtugrul/software/pipelines/pipelines/chipseq.py", line 473, in main
    sample.set_file_paths()
  File "/home/mtugrul/software/pipelines/pipelines/chipseq.py", line 78, in set_file_paths
    super(ChIPseqSample, self).set_file_paths()
TypeError: set_file_paths() takes exactly 2 arguments (1 given)

atacseq:

Traceback (most recent call last):
  File "/home/mtugrul/software/pipelines/pipelines/atacseq.py", line 723, in <module>
    sys.exit(main())
  File "/home/mtugrul/software/pipelines/pipelines/atacseq.py", line 456, in main
    sample.set_file_paths()
  File "/home/mtugrul/software/pipelines/pipelines/atacseq.py", line 54, in set_file_paths
    super(ATACseqSample, self).set_file_paths()
TypeError: set_file_paths() takes exactly 2 arguments (1 given)

OK, it looks like this problem is due to newer looper versions. When I install looper v0.5, it runs; I tried the dev versions of looper v0.6 with no success. I will stick to v0.5 for now, but this should be solved in the long term.
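
For anyone hitting the same error on newer looper versions, here is a hypothetical sketch of a forwarding override. It assumes the parent class gained an extra required argument (which is what "takes exactly 2 arguments (1 given)" suggests), and the import path for Sample is an assumption about that looper era, so treat this as an illustration rather than the actual fix.

```python
# Hypothetical sketch, not the pipelines' actual code.
from looper.models import Sample  # assumed location of Sample in looper 0.5/0.6


class ChIPseqSample(Sample):
    def set_file_paths(self, *args, **kwargs):
        # Forward whatever the caller provides so the override does not hide the
        # parent signature; the call site would then also need to pass whatever
        # extra argument newer looper versions require.
        super(ChIPseqSample, self).set_file_paths(*args, **kwargs)
        # ...ChIP-seq-specific paths are defined after the parent call...
```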

Update to PEP stack 2.0

Hi,

I'm no longer using this code, but I'm still collaborating with @sreichl on projects that use this.

I've heard there's some trouble upgrading this to work with the PEP stack >= 2.0.

@fwzhao I believe you did some work on this on the project side to upgrade project configs, etc.
Do you want to share your progress, and any issues you might have so we can start upgrading the pipelines?

Anyone else interested, please pitch in.

Missing run_spp.R for chipseq.py

I tried substituting the script from pipelines/tools that sounded like it could stand in for the spp tool referenced in the pipeline config file's tools section. However, looking at the sample's log file, the command being run appears to be meant for a different script. So it seems run_spp.R and spp_peak_calling.R are for different things?

Question: program naming for pipeline scripts

It looks like some pipelines, e.g. chipseq.py, are defined as command-line programs in setup.py. There, the program has an underscored name, while in the description passed to the argument parser the name is hyphenated. Is this due to a Python-related hyphens-to-underscores conversion, or should the two match?
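
A hypothetical illustration of the mismatch (the names below are made up for the example, not the repository's actual entries): setup.py can register an underscored console script while the parser advertises a hyphenated name, and console-script names may themselves contain hyphens, so the two could in principle match.

```python
# setup.py sketch with illustrative names only.
from setuptools import setup

setup(
    name="open_pipelines",
    entry_points={
        # Underscored program name registered on the command line...
        "console_scripts": ["chipseq_pipeline=pipelines.chipseq:main"],
    },
)
# ...while the pipeline module might build its parser with a hyphenated name, e.g.
# argparse.ArgumentParser(prog="chipseq-pipeline", description="chipseq-pipeline").
```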

Attribute reference to 'bigwig' on Sample in chipseq.py

In the chipseq.py pipeline, there are three usages of sample.bigwig, but the Sample instance being used does not have a bigwig attribute.

Target exists: `/sfs/lustre/allocations/shefflab/processed//kipnis_chip/micro/results_pipeline/input_12k/mapped/input_12k.trimmed.bowtie2.filtered.bam.bai`
Removed existing flag: /sfs/lustre/allocations/shefflab/processed//kipnis_chip/micro/results_pipeline/input_12k/chipseq_failed.flag
Traceback (most recent call last):
  File "/home/vpr9v/code/open_pipelines/pipelines/chipseq.py", line 757, in <module>
    sys.exit(main())
  File "/home/vpr9v/code/open_pipelines/pipelines/chipseq.py", line 484, in main
    process(sample, pipe_manager, args)
  File "/home/vpr9v/code/open_pipelines/pipelines/chipseq.py", line 642, in process
    track_dir = os.path.dirname(sample.bigwig)
AttributeError: 'ChIPseqSample' object has no attribute 'bigwig'

Pypiper terminating spawned child process 190059

Change status from running to failed
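
A minimal, self-contained sketch of one possible fix: define the attribute before process() reads it. The stand-in class and the "coverage" directory layout are assumptions for illustration, not the pipeline's actual structure.

```python
import os


class DemoSample(object):
    """Stand-in for the pipeline's ChIPseqSample, just to show the idea."""

    def __init__(self, name, sample_root):
        self.name = name
        self.sample_root = sample_root

    def set_file_paths(self):
        # Define the attribute that process() later reads; the subdirectory name
        # is an assumption about the intended layout.
        self.bigwig = os.path.join(self.sample_root, "coverage", self.name + ".bigWig")


sample = DemoSample("input_12k", "/tmp/input_12k")
sample.set_file_paths()
track_dir = os.path.dirname(sample.bigwig)  # the call that previously raised AttributeError
print(track_dir)
```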

Peak count stat for SPP

I can read through it if this is unknown, but does anyone happen to know whether the SPP peak-calling R script handles the task of reporting the peak count? When MACS2 is the caller, this is done post hoc with report_dict, but not when it's SPP.
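
If it turns out the R script does not report it, a small post-hoc count like the sketch below would mirror what the MACS2 branch does; the one-peak-per-line assumption and the reporting call shown in the comment are illustrative, not the pipeline's existing code.

```python
# Count peaks in an SPP output file, assuming one peak per non-empty line
# (e.g. a narrowPeak-style file).
def count_peaks(peaks_file):
    with open(peaks_file) as handle:
        return sum(1 for line in handle if line.strip())


# In the pipeline this could then be reported the same way the MACS2 branch is,
# for example via pypiper's pipe_manager.report_result("peaks", count_peaks(path)).
```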

Leveraging pypiper ngstk

atacseq.py and chipseq.py should use functions from pypiper.ngstk. Specifically, this is in regard to bam_to_bigwig / bamToBigWig (though it may also apply to other pipeline-defined functions). At least from a quick look over the version in chipseq.py, it seems like the only real difference is a hook for a normalization factor. These functions should use a central version from ngstk once it parameterizes that.
databio/pypiper#52
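
A hypothetical sketch of what the centralized, parameterized helper could look like; this is not pypiper's current API, and the normalization hook is exactly the parameter that would need to be added per the linked issue.

```python
# Sketch of a shared helper both pipelines could call (sorting of the intermediate
# bedGraph and temporary-file handling are omitted for brevity).
def bam_to_bigwig(input_bam, output_bigwig, genome_sizes, normalization_factor=None):
    """Build the shell command to convert a BAM to an (optionally scaled) bigWig."""
    cmd = "bedtools genomecov -bg -ibam {}".format(input_bam)
    if normalization_factor is not None:
        # The ChIP-seq pipeline's local copy differs mainly in applying a scaling
        # factor; exposing it as a parameter lets both pipelines share one function.
        cmd += " -scale {}".format(normalization_factor)
    cmd += " > tmp.bedgraph && bedGraphToBigWig tmp.bedgraph {} {}".format(
        genome_sizes, output_bigwig)
    return cmd
```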

Help needed with CROP-seq / open_pipelines issues

Hello epigen,

I am a wet-lab cell biologist with beginner-to-intermediate coding skills, and I am trying to set up the CROP-seq pipeline in our laboratory.
The CROP-seq pipeline uses looper and also the open_pipelines repository.
Sadly, I cannot get past a certain step.
I'm not sure whether this problem has anything to do with the open_pipelines scripts, but I do not know where else to ask; https://github.com/epigen/crop-seq.git is archived and I can't create issues there.
The step is called makeref.py; it builds a GTF, a STAR index and a refFlat of the genome I'm using, plus the viral genome/gRNAs that I want.
During this script, a looper config.yaml is read once.
However, I get this error:
(CROPenv) lucask@kolossus:~/crop-seq$ make makeref
python src/guides_to_ref.py
Traceback (most recent call last):
  File "src/guides_to_ref.py", line 58, in <module>
    prj = Project(os.path.join("metadata", "config.yaml"))
  File "/home/lucask/crop-seq/src/looper/looper/models.py", line 772, in __init__
    process_pipeline_interfaces(self.metadata.pipelines_dir)
  File "/home/lucask/crop-seq/src/looper/looper/models.py", line 376, in process_pipeline_interfaces
    proto_iface = ProtocolInterface(pipe_iface_location)
  File "/home/lucask/crop-seq/src/looper/looper/models.py", line 2845, in __init__
    self.pipe_iface = PipelineInterface(self.pipe_iface_path)
  File "/home/lucask/crop-seq/src/looper/looper/models.py", line 2471, in __init__
    with open(config, 'r') as f:
IOError: [Errno 2] No such file or directory: '/media/draco/lucask/open_pipelines/config/pipeline_interface.yaml'
make: *** [makeref] Error 1
The config.yaml (https://github.com/epigen/crop-seq/blob/master/metadata/config.yaml) is a copy of the original, except with my directories. I don't know enough (if anything at all) about looper to understand exactly what I am missing.
I'm using an environment that should have all the dependencies installed.
I do see an atacseq.interface.yaml in open_pipelines, but not one for Drop-seq (the one I will eventually need).
I am not sure whether this issue stems from the fact that you are updating this repository, or from mistakes in how I set up the looper config.yaml.
Could you help me or give some advice?
Thank you in advance.

Kind regards,
Lucas Kuijpers

P.S. Let me know if I need to send more info, or if there is another GitHub page or website where I should look.
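
One quick sanity check that might narrow this down, assuming the config.yaml follows the crop-seq layout with a metadata.pipelines_dir entry and that this looper version looks for config/pipeline_interface.yaml inside that directory (as the traceback above suggests):

```python
# Run from the crop-seq project root: print where looper will look for the
# pipeline interface and whether that file actually exists.
import os
import yaml

with open(os.path.join("metadata", "config.yaml")) as handle:
    config = yaml.safe_load(handle)

pipelines_dir = config["metadata"]["pipelines_dir"]
expected_iface = os.path.join(pipelines_dir, "config", "pipeline_interface.yaml")
print(expected_iface, "exists:", os.path.exists(expected_iface))
```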

ATAC-seq pipeline exits if no mitochondrial reads are duplicated

The ATAC-seq pipeline exits if no mitochondrial reads are duplicated, due to a zero-division error; see the logfile:

  File "/Users/christianschmidl/src/open_pipelines/pipelines/atacseq.py", line 359, in parse_duplicate_stats
    prefix + "duplicate_percentage": (float(duplicates) / (single_ends + paired_ends * 2)) * 100}
ZeroDivisionError: float division by zero
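
A minimal sketch of one possible guard, using the variable names from the traceback; the surrounding parse_duplicate_stats code is not reproduced here.

```python
# Report 0% duplicates instead of dividing by zero when no reads were counted.
def duplicate_percentage(duplicates, single_ends, paired_ends):
    total = single_ends + paired_ends * 2
    if total == 0:
        return 0.0
    return (float(duplicates) / total) * 100


print(duplicate_percentage(0, 0, 0))     # 0.0 instead of ZeroDivisionError
print(duplicate_percentage(50, 0, 100))  # 25.0
```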
