consider determining the exact output files from the start for complete compatibility (dea_limma, open, 3 comments)

sreichl commented on June 22, 2024

consider determining the exact output files from the start for complete compatibility

from dea_limma.

Comments (3)

sreichl commented on June 22, 2024

Idea 1: pre-generate all feature list names

Make the feature-list-generating rule a checkpoint, followed by an aggregation rule that creates a CSV similar to the input annotation of the enrichment analysis (name, path, background, …) for each analysis. -> The target rule then requires this CSV instead of the feature_list folder.
This solves the missing-input problem without using the internal data, and annotating the enrichment analysis module becomes less cumbersome.
-> enables a run from A to Z
Requires determining the exact filenames before execution and then instructing the rules accordingly -> Is this actually possible?! In genome_track I did not manage to make outputs conditional, only inputs using input functions.
This requires the function dmatrix from the library patsy, which in turn requires the Global Workflow Dependency functionality of Snakemake 8.
Need to create empty files for groups without DEGs.
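A minimal sketch of what the aggregation step could write, assuming the feature lists of one analysis sit in a single folder and follow the `{group}_up_features.txt` naming used in this module. The function name `write_enrichment_annotation` and the column names `name`, `features_path`, and `background_path` are hypothetical; the real annotation of the enrichment analysis module may expect different columns.

```python
import csv
from pathlib import Path


def write_enrichment_annotation(feature_dir, annotation_csv, background):
    """Write one row (name, path, background) per feature list found in feature_dir.

    Hypothetical helper: the column names are assumptions, not the module's schema.
    """
    feature_dir = Path(feature_dir)
    rows = []
    # "*_features.txt" matches the plain lists and skips "*_features_annot.txt".
    for path in sorted(feature_dir.glob("*_features.txt")):
        rows.append({
            "name": path.stem,  # e.g. "treated_up_features"
            "features_path": str(path),
            "background_path": str(background),
        })
    fieldnames = ["name", "features_path", "background_path"]
    with open(annotation_csv, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Because the rows are built from whatever files exist after the checkpoint has run, the target rule would only need to request the CSV, not the individual lists.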

Idea 2: use checkpoints

Should work directly without a rule in the middle: https://edwards.flinders.edu.au/how-to-use-snakemake-checkpoints/
Checkpoints alone did not work.

Idea 3: use for loops around the rule

Check whether for loops around rules are supported. If so, define one rule per analysis with the respective expand for the result files.

Idea 4: input = output?

Can I have a rule that has its input as output?!

Idea 5: adapt enrichment_analysis input

Change the enrichment analysis input to a pattern of the output directory of the differential analysis. Think it through before testing and implementing.

Idea 6: Split up the feature list generation per group

Con: waste of resources, as the result is loaded over and over
Pro: specific outputs supported by Snakemake
Request all predetermined feature lists in the final target rule and use wildcards for each group within each analysis.
Solves the problem without checkpoints or other problems (but requires Snakemake 8)
To save resources, the explicit rule could take its input from the checkpoint, select only the lists per analysis, and then copy or touch them?
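A minimal sketch of how the predetermined feature lists could be enumerated for the target rule, assuming the group names per analysis are already known before execution (which is the open question in this thread). The function name `expected_feature_lists` and the `groups_per_analysis` mapping are hypothetical; the path layout follows the `{analysis}/feature_lists/{group}_up_features.txt` pattern used in this module.

```python
import os


def expected_feature_lists(result_path, groups_per_analysis, with_annotation=False):
    """Return every feature list path the target rule would request.

    groups_per_analysis is an assumed mapping: analysis name -> list of group names.
    """
    paths = []
    for analysis, groups in groups_per_analysis.items():
        for group in groups:
            for direction in ("up", "down"):
                base = os.path.join(
                    result_path, analysis, "feature_lists",
                    f"{group}_{direction}_features",
                )
                paths.append(base + ".txt")
                # Annotated lists are only expected when an annotation is configured.
                if with_annotation:
                    paths.append(base + "_annot.txt")
    return paths
```

The returned list could be passed as input to the final target rule, with `{analysis}` and `{group}` resolved as wildcards by the per-group rule.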

sreichl commented on June 22, 2024

Goal: Run analyses from rAw/reAds to pathwayZ/enrichmentZ, i.e., close the gap between dea_limma/_seurat and the enrichment_analysis module

If file names are explicitly pre-generated, Snakemake 8 is required:

install Snakemake 8
set up & document the SLURM executor for the CeMM HPC
change the module to work with Snakemake 8 and the SLURM executor (e.g., move partition from param to resource)
change & test all other modules, then switch min_version to 8.X.X
add a global workflow dependency, i.e., envs/global.yaml with the library patsy for the function dmatrix
develop a function that generates the file names using patsy
add it to the target rule all as the final outcome

add a rule that touches (or copies?) the respective files per group from the checkpoint, or call a new rule/script for feature list generation per group

input:
    get_feature_lists,
output:
    up = os.path.join(result_path,'{analysis}','feature_lists','{group}_up_features.txt'),
    up_annot = os.path.join(result_path,'{analysis}','feature_lists','{group}_up_features_annot.txt') if config["feature_annotation"]["path"]!="" else [],
    # same for down and featureScores.csv
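A minimal sketch of the touch-or-copy step such a rule could call, assuming the checkpoint has written its lists into one folder and that a group without DEGs simply has no file there. The function name `provide_feature_list` and its arguments are hypothetical.

```python
import shutil
from pathlib import Path


def provide_feature_list(checkpoint_dir, target):
    """Copy the list from the checkpoint output, or create an empty file if it is missing."""
    target = Path(target)
    # Assumption: the checkpoint uses the same file name as the requested target.
    source = Path(checkpoint_dir) / target.name
    target.parent.mkdir(parents=True, exist_ok=True)
    if source.is_file():
        shutil.copyfile(source, target)
        return "copied"
    # No list was produced for this group: create an empty placeholder instead.
    target.touch()
    return "touched"
```

Creating an empty placeholder keeps every predetermined output present, so downstream rules would not fail on groups without DEGs.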

sreichl commented on June 22, 2024

Potential problem with predetermining result names
It requires looking into annotation/metadata that is generated upstream by, e.g., spilterlize or scRNAseq processing… hence it can't be used for a real A to Z run… But isn't that then a general problem? Think about it thoroughly before testing, then test easily without heavy developing.
Which brings me back to checkpoints between modules being the solution?!?!
