epigen / open_pipelines
Pipelines for NGS data preprocessing by the Bock lab and friends
Peak calling in open_pipelines/chipseq.py was not working. It turns out that there are a couple of issues to be corrected. Here are my solutions.
1-) In open_pipelines/chipseq.py (line 535), an underscore is missing, i.e. pipe_manager.wait_for_file... -> pipe_manager._wait_for_file...
2-) macs2CallPeaks in pypiper/ngstk.py (line 1140) was not functioning properly when called from open_pipelines/chipseq.py (line 545). I think this is because "self" is missing from the definition of macs2CallPeaks (this subtle problem is explained here: http://blog.rtwilson.com/got-multiple-values-for-argument-error-with-keyword-arguments-in-python-classes/). I added a "self" in my local copy.
3-) To be consistent with open_pipelines/chipseq.py, I also changed treatmentBams -> treatmentBam and controlBams -> controlBam in macs2CallPeaks in pypiper/ngstk.py (line 1140).
With these 3 changes, open_pipelines/chipseq.py now calls peaks properly.
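For readers unfamiliar with point 2, here is a minimal sketch of how a missing "self" produces the "got multiple values for argument" error. The class and method names are hypothetical stand-ins, not the actual pypiper code.

```python
class ToolkitWithoutSelf(object):
    # Hypothetical stand-in: the method definition forgets `self`,
    # so the instance is bound to the first parameter (treatment_bam).
    def call_peaks(treatment_bam, control_bam=None):
        return (treatment_bam, control_bam)


class ToolkitWithSelf(object):
    # Same method with `self` added, as in the local fix described above.
    def call_peaks(self, treatment_bam, control_bam=None):
        return (treatment_bam, control_bam)


def try_call(toolkit):
    """Call the method with keyword arguments and report what happens."""
    try:
        return toolkit.call_peaks(treatment_bam="chip.bam", control_bam="input.bam")
    except TypeError as err:
        return "TypeError: {}".format(err)


broken_result = try_call(ToolkitWithoutSelf())
fixed_result = try_call(ToolkitWithSelf())
print(broken_result)  # a TypeError message about multiple values for 'treatment_bam'
print(fixed_result)
```

The first call fails because the instance already fills treatment_bam positionally, and the keyword argument then supplies it a second time.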
open_pipelines/pipelines/atacseq.py
Line 113 in 4d42c1d
Should be self.coverage_dir for the bigwig file, not self.coverage (a file).
Hello,
FastQC parsing fails due to "IndexError: list index out of range". Changing the line:
open_pipelines/pipelines/atacseq.py
Line 510 in d40d700
line = [i for i in range(len(content)) if "Sequence length" in content[i]][0]
Best,
Bekir
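Indexing with [0] raises IndexError whenever no line contains "Sequence length". A minimal sketch of a more defensive lookup follows. The function name is hypothetical, and it assumes the FastQC report separates key and value with a tab. It is not the fix proposed in the issue.

```python
def find_sequence_length(content):
    """Return the 'Sequence length' value from FastQC report lines, or None.

    `content` is a list of lines. A missing entry yields None
    instead of raising IndexError.
    """
    for line in content:
        if "Sequence length" in line:
            # Assumption: key and value are tab-separated
            return line.strip().split("\t")[-1]
    return None


report_lines = ["Filename\tsample.fastq", "Sequence length\t50"]
print(find_sequence_length(report_lines))
print(find_sequence_length([]))
```

The caller can then decide how to handle None, for example by reporting the statistic as missing rather than failing the whole pipeline.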
This should ease the requirement to use the pipelines with looper (through the sample yaml).
Arguments passed on the command line should override any passed in the sample yaml, if one is also passed.
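One way this precedence could work is sketched below, assuming the sample yaml has already been loaded into a dict. The option names (--genome, --cores) and the merge_settings helper are hypothetical and not part of the pipelines.

```python
import argparse


def merge_settings(yaml_settings, cli_args):
    """Overlay command-line values on top of values from the sample yaml.

    Only options the user actually passed (not None) override the yaml.
    """
    merged = dict(yaml_settings)
    for key, value in vars(cli_args).items():
        if value is not None:
            merged[key] = value
    return merged


parser = argparse.ArgumentParser()
# default=None lets us tell "not passed" apart from a real value
parser.add_argument("--genome", default=None)
parser.add_argument("--cores", type=int, default=None)

sample_yaml = {"genome": "hg19", "cores": 4}  # stand-in for a parsed sample yaml
args = parser.parse_args(["--genome", "hg38"])
settings = merge_settings(sample_yaml, args)
print(settings)
```

Here the genome comes from the command line and the core count falls back to the yaml value.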
After macs2, sambamba is called as defined in atacseq.yaml, but the next sambamba view command simply calls "sambamba":
macs2 callpeak -t ....
sambamba-0.7.1 depth region -t ....
sambamba view -t ....
Reported by @mtugrul
I was not able to run chipseq.py and atacseq.py in looper. I get the following errors. I looked into the corresponding lines but did not understand why. Any help?
chipseq:
Traceback (most recent call last):
File "/home/mtugrul/software/pipelines/pipelines/chipseq.py", line 755, in <module>
sys.exit(main())
File "/home/mtugrul/software/pipelines/pipelines/chipseq.py", line 473, in main
sample.set_file_paths()
File "/home/mtugrul/software/pipelines/pipelines/chipseq.py", line 78, in set_file_paths
super(ChIPseqSample, self).set_file_paths()
TypeError: set_file_paths() takes exactly 2 arguments (1 given)
atacseq:
Traceback (most recent call last):
File "/home/mtugrul/software/pipelines/pipelines/atacseq.py", line 723, in <module>
sys.exit(main())
File "/home/mtugrul/software/pipelines/pipelines/atacseq.py", line 456, in main
sample.set_file_paths()
File "/home/mtugrul/software/pipelines/pipelines/atacseq.py", line 54, in set_file_paths
super(ATACseqSample, self).set_file_paths()
TypeError: set_file_paths() takes exactly 2 arguments (1 given)
OK, it looks like this problem is due to newer looper versions. With looper v0.5 installed, the pipelines seem to run. I tried all the dev versions of looper v0.6, with no success! I will stick to v0.5 for now, but this should be solved in the long term.
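The traceback suggests that the base class method now expects an argument that the pipeline's override does not forward. The sketch below reproduces that mismatch and shows one possible compatible override. All class names, the `project` parameter, and the `prj` attribute are hypothetical, and the actual looper v0.6 signature is not confirmed here.

```python
class Sample(object):
    # Hypothetical base class whose method requires an extra argument
    def set_file_paths(self, project):
        self.results_dir = project["results_dir"]


class BrokenSample(Sample):
    def set_file_paths(self):
        # Calls the parent without the argument it now requires
        super(BrokenSample, self).set_file_paths()


class CompatibleSample(Sample):
    def set_file_paths(self, project=None):
        # Accept the argument, fall back to an attribute if present,
        # and forward it to the parent
        if project is None:
            project = getattr(self, "prj", None)
        super(CompatibleSample, self).set_file_paths(project)
        self.mapped = self.results_dir + "/mapped"


try:
    BrokenSample().set_file_paths()
    broken_raises = False
except TypeError:
    broken_raises = True

sample = CompatibleSample()
sample.set_file_paths({"results_dir": "results_pipeline/sample1"})
print(broken_raises, sample.mapped)
```

Pinning looper to v0.5, as described above, avoids the mismatch without changing the pipelines.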
Hi,
I'm no longer using this code, but I'm still collaborating with @sreichl on projects that use this.
I've heard there's some trouble upgrading this to work with the PEP stack>=2.0.
@fwzhao I believe you did some work on this on the project side to upgrade project configs, etc.
Do you want to share your progress, and any issues you might have so we can start upgrading the pipelines?
Anyone else interested, please pitch in.
I tried subbing in the script from pipelines/tools that sounds like it could've been a substitute for the spp tool referenced in the pipeline config file's tools section, but when I look at the sample's log file, it seems like the command being run would be appropriate for a different script, so it seems like run_spp.R and spp_peak_calling.R are for different things?
It looks like some pipelines, e.g. chipseq.py, are defined as command-line programs in setup.py. There it has an underscored name while in the description to the argument parser, the name is hyphenated. Is this due to a Python-related hyphens-to-underscores conversion, or should those match?
In the chipseq.py pipeline, there are three usages of sample.bigwig, but the Sample instance being used does not have a bigwig attribute.
Target exists: `/sfs/lustre/allocations/shefflab/processed//kipnis_chip/micro/results_pipeline/input_12k/mapped/input_12k.trimmed.bowtie2.filtered.bam.bai`
Removed existing flag: /sfs/lustre/allocations/shefflab/processed//kipnis_chip/micro/results_pipeline/input_12k/chipseq_failed.flag
Traceback (most recent call last):
File "/home/vpr9v/code/open_pipelines/pipelines/chipseq.py", line 757, in <module>
sys.exit(main())
File "/home/vpr9v/code/open_pipelines/pipelines/chipseq.py", line 484, in main
process(sample, pipe_manager, args)
File "/home/vpr9v/code/open_pipelines/pipelines/chipseq.py", line 642, in process
track_dir = os.path.dirname(sample.bigwig)
AttributeError: 'ChIPseqSample' object has no attribute 'bigwig'
Pypiper terminating spawned child process 190059
Change status from running to failed
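A minimal sketch of one way the attribute could be provided: define bigwig alongside the other paths when file paths are set. The class name, the coverage directory, and the file naming are all assumptions for illustration, not the repository's actual layout.

```python
import os


class ChIPseqSampleSketch(object):
    # Hypothetical stand-in for the pipeline's sample class
    def __init__(self, name, results_dir):
        self.name = name
        self.results_dir = results_dir

    def set_file_paths(self):
        # Assumed layout: bigWig tracks live in a "coverage" subdirectory
        self.coverage_dir = os.path.join(self.results_dir, "coverage")
        self.bigwig = os.path.join(self.coverage_dir, self.name + ".bigWig")


sample = ChIPseqSampleSketch("input_12k", "results_pipeline/input_12k")
missing_before = not hasattr(sample, "bigwig")  # True until paths are set
sample.set_file_paths()
track_dir = os.path.dirname(sample.bigwig)
print(missing_before, track_dir)
```

With the attribute set, os.path.dirname(sample.bigwig) resolves to the coverage directory instead of raising AttributeError.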
I can read through it if this is unknown, but does anyone happen to be aware if the SPP peak calling RScript handles responsibility for the task of reporting peak count? When MACS2 is the caller, this is done post-hoc with report_dict, but not when it's SPP.
atacseq.py and chipseq.py should use functions from pypiper.ngstk. Specifically, this is in regard to bam_to_bigwig / bamToBigWig (though it may also apply to other pipeline-defined functions). At least from a quick look over the version in chipseq.py, it seems like the only real difference is a hook for normalization factor. These functions should use a central version from ngstk once it parameterizes that.
databio/pypiper#52
Hello epigen,
I am a wetlab cell biologist with beginner to intermediate coding skills and I am trying to set up the CROP-seq pipeline in our laboratory.
It uses looper and also the open_pipelines repository.
Sadly, I cannot get past a certain step.
Now I'm not sure if this problem has anything to do with the open_pipelines scripts, but I do not know where else to ask. The https://github.com/epigen/crop-seq.git repository is archived, so I can't create issues there.
The step is called makeref.py and makes a GTF, a STAR index and a refFlat of the genome I'm using plus the viral genome/gRNA that I want.
During this script it calls a looper config.yaml once.
However, I get this error:
(CROPenv) lucask@kolossus:~/crop-seq$ make makeref
python src/guides_to_ref.py
Traceback (most recent call last):
File "src/guides_to_ref.py", line 58, in <module>
prj = Project(os.path.join("metadata", "config.yaml"))
File "/home/lucask/crop-seq/src/looper/looper/models.py", line 772, in __init__
process_pipeline_interfaces(self.metadata.pipelines_dir)
File "/home/lucask/crop-seq/src/looper/looper/models.py", line 376, in process_pipeline_interfaces
proto_iface = ProtocolInterface(pipe_iface_location)
File "/home/lucask/crop-seq/src/looper/looper/models.py", line 2845, in __init__
self.pipe_iface = PipelineInterface(self.pipe_iface_path)
File "/home/lucask/crop-seq/src/looper/looper/models.py", line 2471, in __init__
with open(config, 'r') as f:
IOError: [Errno 2] No such file or directory: '/media/draco/lucask/open_pipelines/config/pipeline_interface.yaml'
make: *** [makeref] Error 1
The config.yaml (https://github.com/epigen/crop-seq/blob/master/metadata/config.yaml) is a copy of the original except with my directories. I don't know enough (if anything at all) about looper to understand exactly what I am missing.
I'm using an environment that should have all the dependencies installed.
I do see an atacseq.interface.yaml in open_pipelines but not one for drop-seq (the one I will eventually need).
I was not sure if this issue stems from the fact that you are updating this repository or from a mistake in how I set up the looper config.yaml.
Could you help me or give some advice?
Thank you in advance.
Kind regards,
Lucas Kuijpers
P.S. Let me know if I need to send more info, or if there is another GitHub/web page where I should look.
The ATAC-seq pipeline exits with a zero division error if no mitochondrial reads are duplicated; see the logfile:
File "/Users/christianschmidl/src/open_pipelines/pipelines/atacseq.py", line 359, in parse_duplicate_stats
prefix + "duplicate_percentage": (float(duplicates) / (single_ends + paired_ends * 2)) * 100}
ZeroDivisionError: float division by zero
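A minimal sketch of guarding this calculation follows. The helper name is hypothetical, and returning 0.0 for an empty denominator is an assumption; reporting NaN or skipping the statistic would be equally defensible.

```python
def duplicate_percentage(duplicates, single_ends, paired_ends):
    """Percentage of duplicate reads, guarded against an empty denominator."""
    total = single_ends + paired_ends * 2
    if total == 0:
        # Assumption: no reads at all is reported as 0% duplication
        return 0.0
    return (float(duplicates) / total) * 100


print(duplicate_percentage(0, 0, 0))
print(duplicate_percentage(50, 20, 40))
```

The second call uses 20 single-end reads and 40 pairs, so the denominator is 100 reads.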