Comments (3)
Idea 1: pre-generate all feature list names
- Make the feature list generating rule a checkpoint, followed by an aggregation rule that creates, for each analysis, a CSV similar to the input annotation of the enrichment analysis module (name, path, background,…). This CSV, instead of the feature_list folder, is then required in the target rule.
- This solves the missing-input problem without using the internal data, and the annotation for the enrichment analysis module becomes less cumbersome.
- This would enable a run from A to Z.
- The exact file names need to be determined explicitly before execution and then passed to the rules. Is this actually possible?! In genome_track I did not manage to make outputs conditional, only inputs (using input functions).
- This requires the function dmatrix from the library patsy, which in turn requires the global workflow dependency functionality of Snakemake 8.
- Empty files need to be created for groups without DEGs.
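A minimal plain-Python sketch of the aggregation step in Idea 1, assuming the feature lists are held in memory as a dict and that the annotation CSV only needs the three columns named above (name, path, background); any further columns of the real enrichment annotation are not covered. The helper names `write_feature_lists` and `write_enrichment_annotation` are hypothetical. Groups without DEGs still get an (empty) file, so every expected path exists.

```python
import csv
from pathlib import Path


def write_feature_lists(feature_lists, out_dir):
    """Write one '<name>_features.txt' per list and return {name: path}.

    Lists without DEGs still produce an (empty) file, so every expected
    file exists for downstream rules.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for name, features in feature_lists.items():
        path = out_dir / f"{name}_features.txt"
        path.write_text("".join(f"{feature}\n" for feature in features))
        paths[name] = path
    return paths


def write_enrichment_annotation(paths, background, annot_csv):
    """Write one row per feature list with the columns name, path, background."""
    with open(annot_csv, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["name", "path", "background"])
        for name, path in sorted(paths.items()):
            writer.writerow([name, str(path), background])
```

With `{"treat_up": [...], "treat_down": []}` this yields `treat_up_features.txt` plus an empty `treat_down_features.txt`, i.e. the `{group}_up_features.txt` naming used for the feature lists, and one CSV row per list.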
Idea 2: use checkpoints
- Should work directly, without a rule in between: https://edwards.flinders.edu.au/how-to-use-snakemake-checkpoints/
- Checkpoints alone did not work.
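In a Snakefile, an aggregation input function downstream of a checkpoint usually gets the checkpoint's directory via `checkpoints.<rule>.get(**wildcards).output[0]` and then lists the groups with `glob_wildcards()`. Because that only runs inside a workflow, the sketch below reproduces just the discovery step in plain Python. The regex and the helper names `discover_groups` and `aggregate_inputs` are hypothetical; the file naming follows the `{group}_up_features.txt` pattern used in this thread.

```python
import re
from pathlib import Path

# File naming taken from this issue: {group}_up_features.txt / {group}_down_features.txt
LIST_PATTERN = re.compile(r"^(?P<group>.+)_(?:up|down)_features\.txt$")


def discover_groups(feature_list_dir):
    """Return sorted group names that have at least one up/down feature list.

    Stands in for glob_wildcards(): the groups are only known once the
    checkpoint has populated the directory.
    """
    groups = set()
    for path in Path(feature_list_dir).iterdir():
        match = LIST_PATTERN.match(path.name)
        if match:
            groups.add(match.group("group"))
    return sorted(groups)


def aggregate_inputs(feature_list_dir):
    """Expand the discovered groups into the concrete paths a downstream rule needs."""
    base = Path(feature_list_dir)
    return [
        str(base / f"{group}_{direction}_features.txt")
        for group in discover_groups(feature_list_dir)
        for direction in ("up", "down")
    ]
```

Files such as `*_up_features_annot.txt` or `featureScores.csv` are ignored by the pattern, so only the plain lists drive the expansion.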
Idea 3: use for loops around the rule
- Check whether for loops around rules are supported. If so, use one rule per analysis with the respective expand() for its result files.
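Whatever rule construct is used, each per-analysis rule would need the concrete result files of its analysis. This plain-Python sketch computes them, assuming the groups of each analysis are already known (the `groups_per_analysis` dict and the function name are hypothetical); `itertools.product` plays the role of Snakemake's `expand()`.

```python
import os
from itertools import product


def expand_feature_lists(result_path, groups_per_analysis, directions=("up", "down")):
    """Map each analysis to the feature list files a per-analysis rule would declare."""
    targets = {}
    for analysis, groups in groups_per_analysis.items():
        # product() enumerates every group/direction combination, like expand() does
        targets[analysis] = [
            os.path.join(result_path, analysis, "feature_lists", f"{group}_{direction}_features.txt")
            for group, direction in product(groups, directions)
        ]
    return targets
```

In a loop over `targets.items()`, each iteration would then define one rule whose output is exactly that analysis's list.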
Idea 4: input = output?
- Can I have a rule that has its input as output?!
Idea 5: adapt enrichment_analysis input
- Change the enrichment analysis input to a pattern matching the output directory of the differential analysis. Think it through before testing and implementing.
Idea 6: Split up the feature list generation per group
- Con: wastes resources, as the result is loaded over and over.
- Pro: specific outputs supported by Snakemake.
- Request all predetermined feature lists in the final target rule and use wildcards for each group within each analysis.
- Solves the problem without checkpoints or other workarounds (but requires Snakemake 8).
- To save resources, the explicit rule could take the checkpoint output as input, select only the lists of its analysis, and then copy or touch them?
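The resource-saving variant of Idea 6 can be sketched as a small script the per-group rule would run: it copies one group's lists out of the checkpoint directory into the explicitly declared outputs and writes an empty file where no list exists (e.g. no DEGs). The function name and directory layout are hypothetical; only the `{group}_up_features.txt` naming comes from this thread.

```python
import shutil
from pathlib import Path


def export_group_lists(checkpoint_dir, out_dir, group):
    """Copy one group's up/down lists from the checkpoint output to explicit outputs.

    A missing list (e.g. no DEGs for that direction) becomes an empty file,
    so the declared outputs always exist.
    """
    src_dir, dst_dir = Path(checkpoint_dir), Path(out_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for direction in ("up", "down"):
        name = f"{group}_{direction}_features.txt"
        src, dst = src_dir / name, dst_dir / name
        if src.exists():
            shutil.copyfile(src, dst)
        else:
            dst.write_text("")
        written.append(dst)
    return written
```

Copying rather than touching keeps the explicit outputs independent of the checkpoint files, at the cost of duplicating small text files.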
from dea_limma.
Goal: Run analyses from rAw/reAds to pathwayZ/enrichmentZ, i.e., close the gap between the dea_limma/_seurat and enrichment_analysis modules.
If file names are pre-generated explicitly, Snakemake 8 is required:
- install Snakemake 8
- set up & document the SLURM executor for the CeMM HPC
- change the module to work with Snakemake 8 and the SLURM executor (e.g., move partition from params to resources)
- change & test all other modules, then switch min_version to 8.X.X
- add a global workflow dependency, i.e., envs/global.yaml with the library patsy for the function dmatrix
- develop function that generates file names using patsy
- add it to target rule all as final outcome
- add a rule that touches (or copies?) the respective files per group from the checkpoint, or call a new rule/script for feature list generation per group, e.g.:
    input: get_feature_lists
    output:
        up = os.path.join(result_path, '{analysis}', 'feature_lists', '{group}_up_features.txt'),
        up_annot = os.path.join(result_path, '{analysis}', 'feature_lists', '{group}_up_features_annot.txt') if config["feature_annotation"]["path"] != "" else [],
        # same for down and featureScores.csv
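The file name generation step above could look roughly like this. As a dependency-free stand-in (this is not patsy, and not necessarily how dea_limma names its groups), the sketch assumes group names follow R `model.matrix()` treatment coding, i.e. `<variable><level>` for every non-reference level, and derives the expected feature list paths from the sample metadata. `annotate` corresponds to `config["feature_annotation"]["path"] != ""`; the featureScores.csv outputs are left out because their naming is not given here. Both function names are hypothetical.

```python
import os


def treatment_coded_groups(metadata, variables):
    """Approximate R model.matrix() column names for categorical variables.

    Treatment coding: the first level (here: Python's default sort order) is the
    reference; every other level becomes a column named <variable><level>.
    """
    groups = []
    for variable in variables:
        levels = sorted({row[variable] for row in metadata})
        groups.extend(f"{variable}{level}" for level in levels[1:])
    return groups


def feature_list_targets(result_path, analysis, groups, annotate):
    """List the expected feature list files of one analysis.

    The *_annot.txt files are only included when a feature annotation is configured.
    """
    targets = []
    for group in groups:
        for direction in ("up", "down"):
            stem = os.path.join(result_path, analysis, "feature_lists", f"{group}_{direction}_features")
            targets.append(stem + ".txt")
            if annotate:
                targets.append(stem + "_annot.txt")
    return targets
```

Python sorts by code point, so mixed-case levels may order differently than R's locale-aware sort. Note also that this needs the sample metadata when the DAG is built.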
Potential problem with predetermining result names
It requires looking into annotation/metadata that is generated upstream, e.g., by spilterlize or scRNAseq processing… hence it can't be used for a real A to Z run… But isn't that a general problem then? Think it through thoroughly before testing, then test simply without heavy development.
Which brings me back to checkpoints between modules being the solution?!?!
Related Issues (12)
- configurable gene/feature lists to be plotted as LFC clustered heatmaps and highlighted volcanos (if present)
- Snakemake v 7.21 requires output file in rule all
- add example config, annotation and data
- volcano plots legend has duplicate entries
- DEA statistics barplot split of up and down by sign
- volcano plot: below indicate absolute LFC being used
- consider saving $E expression matrix that is used in the lmFit linear model
- support contrasts
- logFC heatmap job does not complete if no DEGs detected
- consider including edgeR 4.0 workflow
- consider capping heatmaps