snakemake / snakemake

This is the development home of the workflow management system Snakemake. For general information, see the home page linked below.

Home Page: https://snakemake.github.io

License: MIT License

Shell 0.11% Dockerfile 0.03% Python 27.56% CSS 0.06% Roff 0.01% Rebol 0.01% R 0.02% Makefile 0.22% Julia 0.01% HTML 70.68% Jupyter Notebook 0.09% JavaScript 0.89% Jinja 0.14% Rust 0.07% Vim Script 0.11% C 0.03%
snakemake reproducibility workflow-management

snakemake's Introduction

Snakemake

The Snakemake workflow management system is a tool to create reproducible and scalable data analyses. Snakemake is highly popular, with on average more than 7 new citations per week in 2021 and almost 400k downloads. Workflows are described via a human-readable, Python-based language. They can be seamlessly scaled to server, cluster, grid and cloud environments without the need to modify the workflow definition. Finally, Snakemake workflows can entail a description of required software, which will be automatically deployed to any execution environment.

Homepage: https://snakemake.github.io

Copyright (c) 2012-2022 Johannes Köster [email protected] (see LICENSE)

snakemake's People

Contributors

boulund, bow, chrarnold, chrisburr, daler, darwinawardwinner, dcroote, dependabot[bot], dlaehnemann, epruesse, felixmoelder, fgvieira, github-actions[bot], henningtimm, holtgrewe, jfear, johanneskoester, kemaleren, marcelm, mbhall88, mdehollander, melund, percyfal, pvandyken, rasmusagren, stolpeo, tbooth, tomkinsc, vsoch, yesimon

snakemake's Issues

maximum number of jobs option disregarded

Snakemake version
snakemake=5.4.5

Describe the bug
Hi,
I call snakemake in a drmaa environment with the following cluster settings:

local-cores: 2
jobs: 200
latency-wait: 5
keep-going: True
rerun-incomplete: True
restart-times: 2

Though I have set the jobs option to 200, I still manage to exceed the maximum allowed job count of 300:
drmaa.errors.TryLaterException: code 16: job rejected: only 300 jobs are allowed per user (current job count: 300)

Is it possible that restarted jobs are submitted in addition to the 200 jobs that are already running?

Checkpoint Aggregation Pattern - Snakemake running one sample at a time

I often make rules that might produce an empty output file. I then have some aggregate step that wants to take all non-empty samples and do something to them.

It appears the current checkpoint functionality doesn't support this fully - snakemake runs one sample at a time and then reevaluates the DAG each time rather than running all samples simultaneously through the checkpoint rule.

Here is a minimal reproducible example that shows the behavior:

import os
SAMPLES = ["a","b","c","d"]

rule target:
    input: "output/aggregate_output"

checkpoint maybe_empty:
    output: "output/{sample}.txt"
    run:
        with open(output[0], "w") as f:
            if wildcards.sample=="a":
                pass
            else:
                f.write("something\n")

def get_nonempty_files(wildcards):
    return ["output/{}.txt".format(s) for s in SAMPLES if os.path.getsize(checkpoints.maybe_empty.get(sample=s).output[0])>0]

rule aggregate:
    input:
        get_nonempty_files
    output: "output/aggregate_output"
    run:
        with open(output[0], "w") as f:
            f.write("\n".join(input))

It appears that checkpoints.maybe_empty.get(sample=s) raises the exception for the first sample it tries, so Snakemake goes ahead and runs the necessary steps for that one sample only. It then tries again, the exception is raised for the second sample, and so on.

The behavior I’d want with a checkpoint like this is (one possible workaround is sketched after this list):

  • the first time the input function runs, figure out all the checkpoint outputs it depends on and run those;

  • then rerun the input function after all checkpoints have been completed.
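
One workaround that is sometimes suggested for this pattern, shown here as a hedged sketch building on the example above (untested; the barrier rule and flag file are invented names): add a rule whose input is the full expand() over all samples, so every checkpoint instance enters the DAG immediately and can run in parallel, while the size-filtering input function is only re-evaluated once everything exists.

rule all_samples_done:
    input: expand("output/{sample}.txt", sample=SAMPLES)
    output: touch("output/.all_samples_done")

rule aggregate:
    input:
        flag="output/.all_samples_done",   # forces all checkpoint jobs to be scheduled up front
        files=get_nonempty_files           # re-evaluated once the checkpoints have completed
    output: "output/aggregate_output"
    run:
        with open(output[0], "w") as f:
            f.write("\n".join(input.files))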

ProtectedOutputException is thrown for unprotected file

Snakemake version
5.7.0

Describe the bug
When trying to re-execute a rule using snakemake -R <rulename>, snakemake throws a ProtectedOutputException even though the output file is not protected.
Downgrading to snakemake 5.6.0 solves the problem.

Minimal example
Consider the following Snakefile:

rule all:
    input:
        'out.txt'

rule dummy:
    output:
        'out.txt'
    shell:
        """
        touch {output}
        """

Executing it for the first time works as expected:

$ snakemake
Building DAG of jobs...
Using shell: /usr/local/bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	all
	1	dummy
	2

[Tue Oct  8 13:57:07 2019]
rule dummy:
    output: out.txt
    jobid: 1

[Tue Oct  8 13:57:07 2019]
Finished job 1.
1 of 2 steps (50%) done

[Tue Oct  8 13:57:07 2019]
localrule all:
    input: out.txt
    jobid: 0

[Tue Oct  8 13:57:07 2019]
Finished job 0.
2 of 2 steps (100%) done

Re-executing dummy then fails:

$ snakemake -R dummy
Building DAG of jobs...
Using shell: /usr/local/bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	all
	1	dummy
	2
ProtectedOutputException in line 5 of /private/tmp/example/Snakefile:
Write-protected output files for rule dummy:
out.txt

GNU Parallel execution backend

If a group of machines does not run a proper cluster backend (e.g., Slurm) and users do not have root access, GNU Parallel is a good option.

It seems like an execution backend could be written to use GNU Parallel as a batch manager. It could use a temp file writable only by the user running snakemake as a job queue.

I don't see an alternative, unless there is a batch scheduler that can run as an unprivileged user (I'm not experienced with batch schedulers).

Allow environment variables to be specified in a workflow

For remote executors, @johanneskoester had the idea to (along with a command line flag) allow the user to specify environment variables in a workflow.

envvars:
    "GITHUB_TOKEN",
    "SOME_OTHER_VAR"

I like the idea because it looks very similar to GitHub Actions! The idea would be that the workflow finds the value in the host environment and then securely sends it to a remote worker.
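
For illustration, a minimal sketch of how a declared variable might then be consumed by a rule; the envvars directive is the proposal above, while reading $GITHUB_TOKEN from the shell environment already works today (rule name, output path and API endpoint are placeholders):

envvars:
    "GITHUB_TOKEN"

rule query_github_api:
    output: "data/latest_release.json"
    shell:
        "curl -s -H \"Authorization: token $GITHUB_TOKEN\" "
        "https://api.github.com/repos/snakemake/snakemake/releases/latest > {output}"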

--touch touches all files, even missing ones

Snakemake version
5.7.0

Describe the bug
My workflow stopped in the middle and I want to rerun only the latter part. Since some intermediate files have been deleted, I used --touch to touch all existing files to prevent them from being rerun. But snakemake simply touches all files, even the missing ones, without any error.

Document configfiles argument in Snakemake API and add type check

Is your feature request related to a problem? Please describe.
In the Snakemake API the configfiles argument has recently been renamed from configfile to configfiles. Updating the name of the argument without casting it to a list results in iterating over the characters of the provided string, leading to
IsADirectoryError: [Errno 21] Is a directory: '/' on *nix systems.

Describe the solution you'd like
Document the argument and check if the argument is a string: if so, cast as list.

if configfiles is None:
    configfiles = []
elif isinstance(configfiles, str):
    configfiles = [configfiles]

This is a small change, but it saves users from combing through their code only to find that the bug was caused by supplying a scalar instead of a list.

I could make a PR if you like.

Cheers,
Simon

Specify configuration directly in subworkflow definition

Is your feature request related to a problem? Please describe.
I am trying to modularize my workflows a bit using subworkflows. Subworkflows can have slightly different behaviours depending on the configuration (for example, using different parameters, different biological databases, etc.). In more complex workflows, however, the number of configuration files I have to manage becomes quite large. Often there is also a set of configuration keys that are useful for multiple subworkflows (the directory where I keep my samples, ...), so it would be useful to define some configuration in a single central location.

Describe the solution you'd like
I would propose another directive under subworkflow, named config, where you can explicitly define configuration in a key=value format. This configuration is then passed on the same way as the -C command line argument of the snakemake CLI. This would allow me to pass configuration defined in the master Snakefile to subworkflows (and maybe allow functions too, for on-the-fly configuration generation or for modifying some value).
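
A purely hypothetical sketch of what the proposed directive could look like (the config block is not existing Snakemake syntax, and the keys shown are invented; snakefile, workdir and configfile are the directives currently available under subworkflow):

subworkflow mapping:
    snakefile: "mapping/Snakefile"
    workdir: "mapping"
    configfile: "mapping/config.yaml"
    config:
        samples_dir=config["samples_dir"],   # forwarded from the master config
        database="refseq"                    # subworkflow-specific override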

Describe alternatives you've considered
I've noticed a slight inconsistency in configuration for subworkflows:

  1. If you invoke the master Snakefile by running snakemake on the command line, and add extra config using the -C argument, then these config definitions will also be defined in any subworkflows, overwriting existing config keys defined in the configfile of a subworkflow.
  2. If you invoke the master Snakefile by running snakemake on the command line, and add extra config with --configfile argument, then this configfile is not applied to subworkflows.

So while specifying configuration using -C could be used for defining configuration keys that should be available in every subworkflow too, I'd prefer to define them in a central config file.

--use-singularity breaks shadow usage in non-singularity rules

Snakemake version
5.7.4

Describe the bug
In a workflow with some rules using singularity and some not, the ones without singularity container will not switch into a shadow directory if the workflow is executed with --use-singularity.

Logs
Running the minimal example with snakemake all produces the correct
test.out, test_a.out, test_b.out and test_c.out.

Running the minimal example with snakemake --use-singularity -k all fails in rules b and c and produces:
test_a.out, junk_b.out, junk_c.out

Minimal example
Consider the following example

rule all:
    input: "test.out"

rule singularity_ok:
    input:
        "test_a.out",
        "test_b.out",
        "test_c.out"
    output:
        touch("test.out")
    shell:
        """
        test ! -f junk_a.out
        test ! -f junk_b.out
        test ! -f junk_c.out
        """

rule a:
    output:
        "test_a.out"
    singularity:
        "docker://bash"
    shadow:
        "minimal"
    shell:
        'echo 1 > junk_a.out; echo "test" > {output}'

rule b:
    output:
        "test_b.out"
    shadow:
        "minimal"
    shell:
        'echo 1 > junk_b.out; echo "test" > {output}'

rule c:
    output:
        "test_c.out"
    shadow:
        "minimal"
    run:
        with open(output[0], 'w') as fp:
            print("test", file=fp)
        with open('junk_c.out', 'w') as fp:
            print("junk", file=fp)

Additional context
I played around a bit; this seems to be related to the following code in executor.py:

# Change workdir if shadow defined and not using singularity.
# Otherwise, we do the change from inside the container.
passed_shadow_dir = None
if use_singularity and job_rule.singularity_img:
    passed_shadow_dir = shadow_dir
    shadow_dir = None

This is a duplicate of issue 1280 in the bitbucket repo.


Using file:// url for wrappers triggers warning

Snakemake version
5.7.1

Describe the bug
I'm using file:// URLs for paths to wrappers.
This causes warnings.

Logs

File path file:///fast/basecalls/seqmux/home/digestiflow-demux/digestiflow_demux/wrappers/fastqc/environment.yaml contains double '/'. This is likely unintended. It can also lead to inconsistent results of the file-matching approach used by Snakemake.

Special treatment of `rule all`

Is your feature request related to a problem? Please describe.

By default, Snakemake executes the first rule in a workflow. The current rule dependency syntax can only refer to rules that are already defined, so if one wants to give rule all inputs that depend on other rules, all must be placed at the end and explicitly requested when running snakemake (there are other hacks around this).

Describe the solution you'd like

It would be great if rule all is treated as an exception, so that if rule all is defined (anywhere in the workflow), then it is treated as the default target rule.
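
For context, a minimal sketch of the convention the request wants to relax: the default target is simply whichever rule comes first, so all is usually declared at the top and sidesteps the ordering problem by listing plain file patterns instead of references to later rules (sample names and paths are placeholders).

SAMPLES = ["a", "b"]

rule all:
    input: expand("results/{sample}.done", sample=SAMPLES)   # plain paths, no reference to later rules

rule work:
    output: touch("results/{sample}.done")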

AttributeError: 'Logger' object has no attribute 'logfile_handler'

Snakemake version
5.5.1, 5.7.4

Describe the bug
Trying to run snakemake with cluster-mode on slurm system. Job fails with exception
AttributeError: 'Logger' object has no attribute 'logfile_handler'

Minimal example

Snakefile:

experiment_location = config["experiment_location"]
results = config["results"]

input_files, = glob_wildcards(experiment_location + "{IDS}.fastq.gz")

print(experiment_location)
print(results)

rule all:
    input: expand(results+"{IDS}.bam", IDS=input_files)

rule transform:
    input: experiment_location + "{IDS}.fastq.gz"
    output: results + "{IDS}.bam"
    shell: 'echo {input}; echo "test123" > {output}'

Snakemake command:

cluster_base_path=/path/to/dir/

experiment_location=path/to/experiment

results=results/

result_dir=Condition3_N_A_N_A/

snakemake -s ${cluster_base_path}Snakefile --config experiment_location="${cluster_base_path}${experiment_location}" results="${cluster_base_path}${results}${result_dir}" --cluster-config cluster.json --cluster "sbatch -n {cluster.n} -t {cluster.time}" --jobs 1

cluster.json:

{
    "__default__" :
    {
        "account" : "openbisrun",
        "time" : "00:15:00",
        "n" : 1,
        "partition" : "core"
    }
}

Full exception:

Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 transform
1
Resources before job selection: {'_cores': 1, '_nodes': 9223372036854775807}
Ready jobs (1):
transform
Selected jobs (1):
transform
Resources after job selection: {'_cores': 0, '_nodes': 9223372036854775806}

[Mon Oct 28 10:48:42 2019]
rule transform:
input: /data5/openbis/data_volume/bbb-hub-data/snakemake_input/YVESN/PROJECT_TEST1/TEST1/test1_14_R2.fastq.gz
output: /data5/openbis/data_volume/bbb-hub-data/results/Condition3_N_A_N_Atest1_14_R2.bam
jobid: 0
wildcards: IDS=test1_14_R2

echo /data5/openbis/data_volume/bbb-hub-data/snakemake_input/YVESN/PROJECT_TEST1/TEST1/test1_14_R2.fastq.gz; echo "test123" > /data5/openbis/data_volume/bbb-hub-data/results/Condition3_N_A_N_Atest1_14_R2.bam
/data5/openbis/data_volume/bbb-hub-data/snakemake_input/YVESN/PROJECT_TEST1/TEST1/test1_14_R2.fastq.gz
[Mon Oct 28 10:48:42 2019]
Finished job 0.
1 of 1 steps (100%) done
Complete log: None
Traceback (most recent call last):
  File "/home/openbisrun/.conda/envs/sm/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/openbisrun/.conda/envs/sm/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/openbisrun/.conda/envs/sm/lib/python3.7/site-packages/snakemake/__main__.py", line 4, in <module>
    main()
  File "/home/openbisrun/.conda/envs/sm/lib/python3.7/site-packages/snakemake/__init__.py", line 1989, in main
    show_failed_logs=args.show_failed_logs,
  File "/home/openbisrun/.conda/envs/sm/lib/python3.7/site-packages/snakemake/__init__.py", line 637, in snakemake
    logger.cleanup()
  File "/home/openbisrun/.conda/envs/sm/lib/python3.7/site-packages/snakemake/logging.py", line 113, in cleanup
    self.logger.removeHandler(self.logfile_handler)
AttributeError: 'Logger' object has no attribute 'logfile_handler'

Subworkflow without a configfile gives error

Snakemake version
5.7.0

Describe the bug
If a subworkflow does not have a configfile entry, an error is shown.

Logs

Traceback (most recent call last):
  File "/mypath/miniconda/envs/mydef/lib/python3.7/site-packages/snakemake/__init__.py", line 611, in snakemake
    export_cwl=export_cwl,
  File "/mypath/miniconda/envs/mydef/lib/python3.7/site-packages/snakemake/workflow.py", line 558, in execute
    updated_files=updated,
  File "/mypath/miniconda/envs/mydef/lib/python3.7/site-packages/snakemake/__init__.py", line 389, in snakemake
    overwrite_config.update(load_configfile(f))
  File "/mypath/miniconda/envs/mydef/lib/python3.7/site-packages/snakemake/io.py", line 1303, in load_configfile
    config = _load_configfile(configpath)
  File "/mypath/miniconda/envs/mydef/lib/python3.7/site-packages/snakemake/io.py", line 1265, in _load_configfile
    with open(configpath) as f:
TypeError: expected str, bytes or os.PathLike object, not NoneType

Minimal example

Running this code gives an error (if the configfile entry is also given for the subworkflow, the error disappears):

configfile: "config.yml"

subworkflow test:
    snakefile: "sub.snakefile"

rule all:
    input: test('test.txt')

The sub.snakefile can be:

rule all:
    output: touch('test.txt')

Snakemake version 5.7.0 throws error for symbolic links

Snakemake version
5.7.0

Describe the bug
When I updated to the latest Snakemake version (5.7.0) and ran snakemake with my workflow rules I got an error referring to symlinks not being available on my platform (which is Windows). My data folder (which is used for both input and output in my rules) is symlinked to a remote drive, which I guess trips up this version of Snakemake.

Previous versions of Snakemake still work, so I downgraded to 5.6.0 for now.

I couldn't find this documented anywhere.

Logs

NotImplementedError: access: follow_symlinks unavailable on this platform

Additional context
Snakemake installed via pip within an Anaconda environment on Windows.

Checkpoint aggregate returns checkpoint output dir instead of files

Hi,
the following has been observed in 5.7.1 (edit: also in v5.7.4 and v5.6.0)

For jobs waiting for checkpoint output, I get failed jobs with the following irritating output (simplified):

rule merge_mono_dinucleotide_fraction:
    input: <TBD>
    output: <OMITTED path to output file>
    log: <OMITTED path to log file>
    jobid: 0
   <OMITTED wildcards, resources etc...>

Error in rule merge_mono_dinucleotide_fraction:
    jobid: 0
    output: <OMITTED>
    log: <OMITTED>
    shell:
        samtools merge -@ 6 -O BAM <OMITTED: correct path to output file> input/fastq/strand-seq/HG00733_PRJEB12849/requests &> <OMITTED: path to log file>
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

The <TBD> (to be determined?) probably tells me that Snakemake needs to evaluate the checkpoint in the input function - ok. The checkpoint is long-running (downloading data), and adding a pdb.set_trace() inside my input function shows that there is an IncompleteCheckpoint exception raised (as expected, I presume). Now the problematic part: the path input/fastq/strand-seq/HG00733_PRJEB12849/requests is the directory() output of the checkpoint. Checking the log file of samtools for the above failed job shows the following:

[E::hts_hopen] Failed to open file input/fastq/strand-seq/HG00733_PRJEB12849/requests
[E::hts_open_format] Failed to open file input/fastq/strand-seq/HG00733_PRJEB12849/requests
samtools merge: fail to open "input/fastq/strand-seq/HG00733_PRJEB12849/requests": Is a directory

Apparently, Snakemake detects the unfinished checkpoint, but returns the directory() of the checkpoint as input to the rule (in this case merge_mono_dinucleotide_fraction). If I wait for all jobs to fail, and for the checkpoint to finish, and restart the pipeline, the workflow continues as expected (= showing that the aggregate input function works as intended).

I have trouble coming up with a minimal reproducible example for this, maybe because it's about timing, or the reason is actually something else - nevertheless, the log output of samtools clearly shows that Snakemake executes the rule with the checkpoint output, instead of the output collected by the aggregate input function.
Thanks for looking into this.

Best,
Peter

Below the code of my aggregate input function - as stated above, this works as intended after waiting for the checkpoint to complete (see my comment below):

def collect_merge_files(wildcards):
    """
    """
    individual = wildcards.individual
    bioproject = wildcards.bioproject
    platform = wildcards.platform
    project = wildcards.project
    lib_id = wildcards.lib_id

    requests_dir = checkpoints.create_bioproject_download_requests.get(individual=individual, bioproject=bioproject).output[0]

    search_pattern = '_'.join([individual, project, '{spec}', lib_id, '{run_id}', '1'])

    search_path = os.path.join(requests_dir, search_pattern + '.request')

    checkpoint_wildcards = glob_wildcards(search_path)

    bam_files = expand(
        'output/alignments/strandseq_to_reference/{reference}.{individual}.{bioproject}/{individual}_{project}_{spec}_{lib_id}_{run_id}.filt.sam.bam',
        zip,
        reference=[wildcards.reference, wildcards.reference],
        individual=[individual, individual],
        bioproject=[bioproject, bioproject],
        project=[project, project],
        spec=checkpoint_wildcards.spec,
        lib_id=[lib_id, lib_id],
        run_id=checkpoint_wildcards.run_id)

    assert len(bam_files) == 2, 'Missing merge partner: {}'.format(bam_files)

    return sorted(bam_files)

Rules with outputfiles in subfolders on Windows (possible path separator issue)

Snakemake version
5.7.1
Describe the bug
Unable to specify output files in subfolders on Windows. I have tried both PowerShell and Git-bash (see logs below).
This works on Linux and WSL.

Logs
Powershell

PS > snakemake -np tmp/entries.feather
Building DAG of jobs...
MissingRuleException:
No rule to produce tmp\entries.feather (if you use input functions make sure that they don't raise unexpected exceptions).

Git-bash (MINGW)

$ snakemake -np tmp/entries.feather
Building DAG of jobs...
MissingRuleException:
No rule to produce tmp\entries.feather (if you use input functions make sure that they don't raise unexpected exceptions).

Minimal example
Snakefile

rule convert_xlsx:
    input:
        "input/pi/{filename}.xlsx"
    output:
        "tmp/{filename}.feather"
    shell:
        "echo {input} > {output}"

Additional context
I believe this is caused by the path separator. Are there any recommended best practices for writing Windows- and Linux-compatible workflows?

Python3.5 support broken

Snakemake version
Version 5.7.4

Describe the bug
Current versions of snakemake are no longer Python 3.5 compatible due to f-string formatting (PEP 498).

Logs
SyntaxError: invalid syntax
(snakemake_35) user@server:temp> snakemake --version
...
from snakemake.dag import DAG
File "~/virtual/snakemake_35/lib/python3.5/site-packages/snakemake/dag.py", line 1665
f'{in_file}'

Minimal example
snakemake --version

Additional context
none

jobs still running though snakemake terminated with an error

Snakemake version
snakemake=5.4.5=0

Describe the bug
I have encountered several times in a cluster environment that, although snakemake terminated with an error (either from snakemake itself or from the cluster environment), not all running jobs were deleted. Shouldn't all jobs be cancelled as soon as any error is encountered and snakemake is terminating?

Easy way to benchmark all rules

Is your feature request related to a problem? Please describe.
A common scenario for me is needing to know how much RAM, how much time (CPU/wall clock), and what %CPU each process of my workflow used.

I'm excited about the benchmark directive that you can give to rules in order to get this.

But is there a way to get this information for all rules, in a single tsv file?

Describe the solution you'd like
In nextflow there is a -with-trace flag at command-line which gives exactly this: https://www.nextflow.io/docs/latest/tracing.html?highlight=trace

Maybe snakemake could have such a flag too?
Or a directive in the Snakefile not tied to specific rules, e.g. benchmarkfile:, much like configfile:.

Describe alternatives you've considered
At the command line, --stats produces a JSON with run time but not RAM use.
The same goes for --report, but that is HTML, not TSV or JSON.
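
For reference, a sketch of the per-rule boilerplate this request would generalize (rule, paths and command are placeholders): every rule repeats a benchmark directive, and the resulting per-rule TSVs (wall-clock time, max RSS, etc.) still have to be collected and concatenated manually.

rule map_reads:
    input: "reads/{sample}.fastq"
    output: "mapped/{sample}.bam"
    benchmark:
        "benchmarks/map_reads/{sample}.tsv"   # one TSV per rule and wildcard combination
    shell:
        "map-reads-tool ref.fa {input} > {output}"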

self.workflow.default_remote_provider reference not relevant for single workflow

Hey @johanneskoester this is a question for you since the git blame references you for doing this change yesterday! In this section:

def apply_default_remote(self, path):
    """Apply the defined default remote provider to the given path and return the updated _IOFile.
    Asserts that default remote provider is defined.
    """
    assert (
        self.default_remote_provider is not None
    ), "No default remote provider is defined, calling this anyway is a bug"
    path = "{}/{}".format(self.workflow.default_remote_prefix, path)
    path = os.path.normpath(path)
    return self.workflow.default_remote_provider.remote(path)

the return statement uses self.workflow.default_remote_provider.remote(path), and this doesn't make sense given that the default_remote_provider parameter is part of the workflow instance itself (i.e., it should be self.default_remote_provider.remote(path)). I think this might work, however, if there is a subworkflow that holds a workflow. I'm testing now for the addition I'm working on, and it seems to work correctly in the case of no subworkflow, so I'm wondering whether this should:

  1. use self.default_remote_provider globally,
  2. check for the workflow attribute (a subworkflow) and if it exists, use it instead
  3. Some other logic?

Is there any case of further nesting (e.g., can subworkflows have subworkflows)? For now I'll go with option 1 until further discussion.

Snakemake not able to see output files until I "ls" the output directory

I am running snakemake from a conda environment from a screen on an LSF cluster. 'pip freeze' tells me I'm running snakemake==5.6.0 (python 3.6.8), and I've seen the same behaviour with snakemake==5.4.0 (python 3.6.0).

I have a rule that 'touches' an output file when it completes.

The log records that these jobs have been completed. However, for some reason snakemake itself hangs a short time after the jobs start, so they are completed, but snakemake doesn't run all the way to completion - it just stops outputting anything to console until I ctrl-C it (not sure if this is related to the bug described below).

If I then do a dry run with "-np", snakemake says it still has to run all of the jobs again.

I then "ls" the output directory that is supposed to contain my output file. I see the file is there. I do another dry run, and now snakemake complains about the metadata for the file I've just looked at with "ls":


IncompleteFilesException:
The files below seem to be incomplete. If you are sure that certain files are not incomplete, mark them as complete with

    snakemake --cleanup-metadata <filenames>

To re-generate the files rerun your command with the --rerun-incomplete flag.
Incomplete files:
/nfs/leia/research/stegle/dseaton/hipsci/singlecell_neuroseq/data/data_processed/pool1_13_noddd_D52/qtl_analysis/eqtl_discovery/celltype_DA/qtl_results_all.parquet.finished

I run --cleanup-metadata, and then do another dry run. Now snakemake says all of the jobs need to be run, except the job for the file I've just looked at and cleaned the metadata for. This happens reliably, so I've just gone through looking in specific directories and checking that those files are the ones dropped from the dry run.

Running with "--touch" doesn't get around this.

One other thing I've noticed is that Snakemake has recently often been complaining about missing output files, which has only partly been helped by increasing the latency wait quite a bit (up to 300 seconds).

This wasn't happening before with the same Conda environments, so I suspect something has changed with the LSF cluster configuration that is doing this, but I have no idea where the issue could be coming from so I'm not sure what to ask my sysadmin.

Thanks for any help or suggestions, and for the software, it's fantastic.

Deleted log file in directory() output after encountering error

Snakemake version
5.7.1

Description
Using directory() as output and storing the log inside that directory causes the log to be deleted when the rule encounters an error.

This is an example rule:

rule example:
    input:
        "some.input"
    output:
        directory("output/directory")
    log:
        "output/directory/some.log"
    shell:
        "somecomand --input {input} --output_dir {output} 2> {log}" 

which failed with:

Error in rule example:
    jobid: 2
    output: output/directory
    log: output/directory/some.log (check log file(s) for error message)
    shell: somecomand --input some.input --output_dir output/directory 2> output/directory/some.log
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job example since they might be corrupted:
output/directory
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /path/to/.snakemake/log/2019-10-21T192621.555072.snakemake.log

and therefore deleted my log file inside the output directory.

I know it might be intended to save the log files inside a separate local directory like

    log:
        "log/example_rule_some.log"

but I prefer to keep my logs with the created files, if possible.

Possible Solution
Check whether {log} lies inside {output} when {output} is a directory(), and delete everything except {log}.

Thanks!
Keep up the good work =)

Edit:
Linux examplePC 3.16.0-7-amd64 #1 SMP Debian 3.16.59-1 (2018-10-03) x86_64

No such file or directory: '/proc/24076/stat'

I'm running snakemake 5.5.1 on a SGE cluster, and I'm getting a lot of the following errors:

Exception in thread Thread-13461173:
Traceback (most recent call last):
  File "/ebio/abt3_projects/software/miniconda3_gt4.4/envs/snakemake_dev/lib/python3.6/site-packages/psutil/_common.py", line 342, in wrapper
    ret = self._cache[fun]
AttributeError: _cache

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/ebio/abt3_projects/software/miniconda3_gt4.4/envs/snakemake_dev/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/ebio/abt3_projects/software/miniconda3_gt4.4/envs/snakemake_dev/lib/python3.6/site-packages/snakemake/benchmark.py", line 108, in run
    self.function(*self.args, **self.kwargs)
  File "/ebio/abt3_projects/software/miniconda3_gt4.4/envs/snakemake_dev/lib/python3.6/site-packages/snakemake/benchmark.py", line 138, in _action
    self.work()
  File "/ebio/abt3_projects/software/miniconda3_gt4.4/envs/snakemake_dev/lib/python3.6/site-packages/snakemake/benchmark.py", line 169, in work
    self._update_record()
  File "/ebio/abt3_projects/software/miniconda3_gt4.4/envs/snakemake_dev/lib/python3.6/site-packages/snakemake/benchmark.py", line 186, in _update_record
    main = psutil.Process(self.pid)
  File "/ebio/abt3_projects/software/miniconda3_gt4.4/envs/snakemake_dev/lib/python3.6/site-packages/psutil/__init__.py", line 446, in __init__
    self._init(pid)
  File "/ebio/abt3_projects/software/miniconda3_gt4.4/envs/snakemake_dev/lib/python3.6/site-packages/psutil/__init__.py", line 473, in _init
    self.create_time()
  File "/ebio/abt3_projects/software/miniconda3_gt4.4/envs/snakemake_dev/lib/python3.6/site-packages/psutil/__init__.py", line 823, in create_time
    self._create_time = self._proc.create_time()
  File "/ebio/abt3_projects/software/miniconda3_gt4.4/envs/snakemake_dev/lib/python3.6/site-packages/psutil/_pslinux.py", line 1507, in wrapper
    return fun(self, *args, **kwargs)
  File "/ebio/abt3_projects/software/miniconda3_gt4.4/envs/snakemake_dev/lib/python3.6/site-packages/psutil/_pslinux.py", line 1717, in create_time
    ctime = float(self._parse_stat_file()['create_time'])
  File "/ebio/abt3_projects/software/miniconda3_gt4.4/envs/snakemake_dev/lib/python3.6/site-packages/psutil/_pslinux.py", line 1507, in wrapper
    return fun(self, *args, **kwargs)
  File "/ebio/abt3_projects/software/miniconda3_gt4.4/envs/snakemake_dev/lib/python3.6/site-packages/psutil/_common.py", line 345, in wrapper
    return fun(self)
  File "/ebio/abt3_projects/software/miniconda3_gt4.4/envs/snakemake_dev/lib/python3.6/site-packages/psutil/_pslinux.py", line 1553, in _parse_stat_file
    with open_binary("%s/%s/stat" % (self._procfs_path, self.pid)) as f:
  File "/ebio/abt3_projects/software/miniconda3_gt4.4/envs/snakemake_dev/lib/python3.6/site-packages/psutil/_common.py", line 587, in open_binary
    return open(fname, "rb", **kwargs)
FileNotFoundError: [Errno 2] No such file or directory: '/proc/24076/stat'

The pipeline is still running even though these errors have been occurring regularly during the ~7 day run, so the errors are not causing the snakemake job to die.

It appears that _parse_stat_file() can't open the stat file. Any idea on what may be causing this? I am running 200 parallel jobs, so maybe it's some sort of latency issue?

ChildIOException inconsistency

Snakemake version
I could reproduce this behavior in versions 5.6.0, 5.4.4 and 5.4.2

Describe the bug
I have found some inconsistencies related to the ChildIOException error. As I understand it, a ChildIOException occurs when a rule has multiple outputs, one of which is a directory and the others are files inside that directory (correct me if I'm wrong).

Minimal example
So, the following example gives a ChildIOException:

rule All: input: "C"

rule A:
    output: "outputA/A1", "outputA/A2", directory("outputA")
    shell: "mkdir -p outputA; touch outputA/A1; touch outputA/A2"

rule C:
    input: test="outputA/A1", testb="B"
    output: "C"
    shell: "cp {input.test} C"
ChildIOException:
File/directory is a child to another output:
/home/elsa/test/outputA
/home/elsa/test/outputA/A1

However, I have found that it is possible to work around that error by adding another rule which has the directory (“outputA”) as an input. The following code gives no error and executes correctly:

rule All: input: "C"

rule A:
    output: "outputA/A1", "outputA/A2", directory("outputA")
    shell: "mkdir -p outputA; touch outputA/A1; touch outputA/A2"

rule B:
    input: "outputA"
    output: "B"
    shell: "touch B"

rule C:
    input: test="outputA/A1", testb="B"
    output: "C"
    shell: "cp {input.test} C"

So, which behavior is correct?

Input file function seems not to be fully executing

Version 5.7.4

Hi I'm really hoping there's a resolution for this problem. I'm pretty excited about the checkpoint feature but by now am wondering whether it will work for me.

The parallel portion of my fan out / execute / aggregate process runs a number of jobs that is not known until the fan-out rule executes and generates its output files. To accomplish this, I marked the fan-out rule as a checkpoint, and its output is a directory. After initial attempts failed, I wrote the input function for my aggregate rule, just the way it is in the example.

However, when this function is called, it appears to skip over my return statement, returning instead just the directory name, as if in this function I had written return checkpoint_output.

So what happens is that instead of triggering the intermediate step (which would happen if merge_potential_results had been given the list of files produced by that step), it only triggers the checkpoint (fan-out) step. In this case, when I run this part of the workflow, I get the error that potential/molpro is a directory. This is because, for some reason, the input function is returning potential/molpro (the output from the checkpoint) rather than the list of files that should be produced by the second statement of the molpro_pot_outs() function.

If there is any other info I can provide, please let me know.

checkpoint potential_comp_inputs:
    """ Provides most of the inputs for the potential calculation.
        Requires the *optimized hin" file generated in optimization.
        Creates xyz file and a dynamic number of molpro.in files to be
        run in parallel, along with files provided by `generate_densities`
    """
    output:
        molpro_in = temp(directory("potential/molpro")),
        lat = temp("potential/molpro.lat"),
        pot = temp("potential/molpro.pot"),
        xyz = temp("potential/geometry.xyz")
    input: "optimized/molecule.hin"
    run:
        call_command(opt_hin_to_pot_lat, "optimized/molecule.hin")


rule potential_chunk:
    input:
        molpro = "potential/molpro/molpro-{idx}.in",
        molpro_int = "potential/molpro.int",
        wfu = "potential/wavefile.wfu",
        aux = "potential/molpro.aux"
    output:
        out = "potential/molpro_out/molpro-{idx}.out",
        potout = "potential/molpro_out/molpro-{idx}.potout"
    params:
        basis_set = config["basis_set"],
        charge = config["charge"]
    threads: 4
    resources:
        mem_mb = 13_600
    run:
        call_command(potential_chunk, f"-m {resources.mem_mb} -n {threads} "
                     f"{input.molpro} {params.charge} {params.basis_set}")

def molpro_pot_outs(wildcards):
    checkpoint_output = checkpoints.potential_comp_inputs.get().output[0]
    return expand("potential/molpro_out/molpro-{idx}.potout",
                  idx=glob_wildcards(
                      path.join(checkpoint_output, 'molpro-{idx}.in')).idx)

rule merge_potential_results:
    input:
        molpro_pot_outs
    output:
        "potential/molpro.potout"
    run:
        print('merge_potential_results'+str(input))
        call_command(merge, seq(input).make_string(' ') + f' {output}')

Direct support for parameter space exploration

Is your feature request related to a problem? Please describe.
When exploring large parameter spaces, it becomes tedious to manually encode them as wildcards.

Describe the solution you'd like
A reasonable solution is to generate hashes of parameter combinations and use those as wildcard values. While this is possible now, it requires a fair amount of boilerplate code. It would be nice to have ergonomic support for this baked into Snakemake.

Describe alternatives you've considered
Manual boilerplate code for performing hashing and lookup.
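
A rough sketch of the kind of boilerplate being described, with invented parameter names (a real setup would also want to persist the hash-to-parameters mapping alongside the results):

import hashlib
import json
from itertools import product

PARAM_SPACE = {"alpha": [0.1, 0.5], "beta": [1, 10]}

def param_hash(params):
    # Stable short hash for one parameter combination
    return hashlib.sha1(json.dumps(params, sort_keys=True).encode()).hexdigest()[:8]

# Map each hash to its parameter combination so rules can look it up again
COMBOS = {
    param_hash(dict(zip(PARAM_SPACE, values))): dict(zip(PARAM_SPACE, values))
    for values in product(*PARAM_SPACE.values())
}

rule all_combinations:
    input: expand("results/{phash}.json", phash=COMBOS)

rule run_combination:
    output: "results/{phash}.json"
    params: combo=lambda wildcards: COMBOS[wildcards.phash]
    run:
        with open(output[0], "w") as f:
            json.dump(params.combo, f)   # stand-in for the actual analysis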

TypeError: resource() got an unexpected keyword argument 'keep_local'

Snakemake version
5.7.0

Describe the bug
If I run snakemake via tibanna, I get the error:

TypeError: resource() got an unexpected keyword argument 'keep_local'

Logs

snakemake --tibanna --default-remote-prefix=tibanna/run_1                      
Traceback (most recent call last):
  File "/tmp/tmp.1BY3tqvG9v/venv/lib/python3.6/site-packages/snakemake/__init__.py", line 421, in snakemake
    keep_local=True, is_default=True
  File "/tmp/tmp.1BY3tqvG9v/venv/lib/python3.6/site-packages/snakemake/remote/S3.py", line 41, in __init__
    self._s3c = S3Helper(*args, **kwargs)  # _private variable by convention
  File "/tmp/tmp.1BY3tqvG9v/venv/lib/python3.6/site-packages/snakemake/remote/S3.py", line 167, in __init__
    self.s3 = boto3.resource("s3", **kwargs)
  File "/tmp/tmp.1BY3tqvG9v/venv/lib/python3.6/site-packages/boto3/__init__.py", line 100, in resource
    return _get_default_session().resource(*args, **kwargs)
TypeError: resource() got an unexpected keyword argument 'keep_local'

Minimal example
Simply cloning https://github.com/snakemake-workflows/dna-seq-gatk-variant-calling, copying the .test directory into the root, and then running snakemake --tibanna --default-remote-prefix=some_bucket/key will cause this issue

Some python package versions

appdirs==1.4.3
attrs==19.2.0
Benchmark-4dn==0.5.6
boto3==1.9.248
botocore==1.12.248
certifi==2019.9.11
chardet==3.0.4
ConfigArgParse==0.15.1
datrie==0.8
docutils==0.15.2
gitdb2==2.0.6
GitPython==3.0.3
idna==2.8
importlib-metadata==0.23
jmespath==0.9.4
jsonschema==3.1.1
more-itertools==7.2.0
pkg-resources==0.0.0
psutil==5.6.3
pyrsistent==0.15.4
python-dateutil==2.8.0
python-lambda-4dn==0.12.2
PyYAML==5.1.2
ratelimiter==1.2.0.post0
requests==2.22.0
s3transfer==0.2.1
six==1.12.0
smmap2==2.0.5
snakemake==5.7.0
tibanna==0.10.0
urllib3==1.25.6
wrapt==1.11.2
zipp==0.6.0

Support for charliecloud containers

Hi,

Seeing that snakemake already supports docker/singularity, I was wondering whether charliecloud (https://github.com/hpc/charliecloud) could also be supported? This would be a nice additional option for creating fully reproducible workflows which can easily be distributed. In addition, not all HPC environments support/run a docker daemon (as is the case for me), and charliecloud is a very nice and lightweight alternative which doesn't need a background service running.

I realize there are pros/cons to using e.g. docker vs charliecloud, but I believe charliecloud-support would be a worthwhile feature.

Intro

A typical charliecloud usage involves:

  1. building an image-tarball from a docker image
  2. extracting these images to the system you want to run your job on
  3. run your job from within the container

Steps 2 and 3 are simple command-line calls (ch-tar2dir [...] and ch-run [...]), which it might be relatively easy to integrate into snakemake. Step 1 would remain the user's responsibility.

Implementation

There could be two possible strategies for an implementation in snakemake (a hypothetical sketch of the first option follows after this list):

  1. Specify the image tarball when calling snakemake or within the rule definitions (similar to conda envs). Whenever rules are invoked which should run within a charliecloud environment, the image gets extracted (ch-tar2dir [...]) and the call to e.g. Rscript etc. gets wrapped in a ch-run [...] call.
  2. Specify the flattened/extracted root directory of the charliecloud image. In that case it would suffice for snakemake to wrap the command of charliecloud-enabled rules in a ch-run call.
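
A purely hypothetical illustration of what strategy 1 could look like from the user's perspective (the charliecloud directive, image path and command are invented and do not exist in Snakemake):

rule call_variants:
    input: "mapped/{sample}.bam"
    output: "calls/{sample}.vcf"
    charliecloud: "images/variant-caller.tar.gz"   # image tarball, extracted once via ch-tar2dir
    shell:
        "call-variants {input} > {output}"         # would be executed inside a ch-run call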

Additional notes

There are some options to ch-run which might be relevant for users, such as bind-mounting certain directories (the -b option) so that they are available from within the container (compare e.g. the command reference for ch-run).


While I'm no expert on snakemake internals, I'd be happy to provide more information on charliecloud and serve as a tester of this functionality.

What do you think?

Best and thanks!
Johann

Customizing snakemake console output

Hello!

Over at anvi'o we are using snakemake extensively to automate our workflows and it works gorgeously. Imo anvi'o's ease of use can really be demarcated by "before snakemake" and "after snakemake". So thank you so much for all of your commitment to this project!

One thing I am interested in doing is capturing all of the logging info and processing it in a customized manner, so that I can tailor exactly what is sent to the console. I'm probably thinking about this the wrong way, but I was imagining handling this by supplying a custom version of the Logger.text_handler method found in logging.py. Anyway, I'm very interested in your perspective on whether snakemake is set up to handle client-side customization of the console output.

Thanks so much for your time.

Cheers,
Evan
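
For anyone exploring this, a minimal sketch under the assumption that the Python API's log_handler argument (a callable receiving log-message dictionaries) is available in the installed version; the dictionary keys used below should be double-checked against snakemake/logging.py:

from snakemake import snakemake

def my_handler(msg):
    # msg is a dict; the "level" entry distinguishes message types such as
    # "info", "job_info", "progress", "error", ...
    if msg.get("level") == "progress":
        print("{done}/{total} jobs done".format(**msg))

snakemake("Snakefile", cores=4, log_handler=my_handler)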

Default remote provider for Azure Storage

Given that Snakemake now supports Azure Storage, would it be possible to add Azure to the command line --default-remote-provider option, in addition to being able to access it within a Snakefile?

"SyntaxError" for the Snakefile located in the directory which name cntains an apostrophe symbol

Environment: Snakemake 5.4.5, Python 3.6.5, Windows 10, MinGW.

Way to reproduce: take a fully valid project, put it into a directory that has an apostrophe in its name, e.g. “I’ll not work”, and try to run it. Snakemake raises a SyntaxError (“invalid syntax”) on the first rule found. Rename the directory to remove the apostrophe, e.g. “I will work”, and the project works again.

Getting the input and output files of a rule in Python API

Is your feature request related to a problem? Please describe.
I want to use the Python API to query the input and output files of each rule.

Describe the solution you'd like
At the very least, allow a way to return the workflow object from the snakemake function. The ideal case would be to return a list of dictionaries, each detailing the input and output of one rule.

Describe alternatives you've considered
The printfilegraph option offers something similar, but its output is in HTML and requires further parsing.

Additional context
Implementing such a function would be very useful during development, where the user may want to run each script independently while referring to the original Snakefile for input and output locations. The script in development could check whether it is being run by snakemake and, if not, set up a debug target by querying the input and output files of its rule.
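
As a side note on the last point, one pattern that already works for scripts used via the script directive is to detect whether the script is being driven by Snakemake and fall back to hard-coded debug locations otherwise (the debug paths below are placeholders):

try:
    snakemake  # object injected by Snakemake's script directive
    in_path, out_path = snakemake.input[0], snakemake.output[0]
except NameError:
    # Running the script standalone during development
    in_path, out_path = "debug/input.csv", "debug/output.csv"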

Support execution of Jupyter Notebooks using script directive

When writing pipelines which involve many visualizations and light data processing, I tend to implement the individual steps using Jupyter Notebooks and glue them together using Snakemake.
At the moment, I execute them manually using nbconvert. This is inconvenient and makes passing parameters difficult/impossible.

An elegant solution to this situation would be to allow Jupyter notebooks (*.ipynb) to be used via the script directive:

rule foo:
  input: 'data.csv'
  output: 'figure.pdf'
  script: 'CreateFigure.ipynb'

This was already discussed in a now stale bitbucket issue, but no solution was proposed.
Other efforts exist, but seem outdated and not well integrated into Snakemake itself.
Taking a quick look at the current Snakemake source suggests that no implementation efforts have been started.

If this is a feature others would be interested in as well, I'd be very happy to contribute a PR.

edit: I started an implementation here.

Create classes for Run and Wrapper Parameters

As discussed in vsoch#1 (comment), it might be good to refactor the Wrapper and Run parameters to each have a class instead of appending to a long list of arguments. As an example:

class RunParams:
    def __init__(self):
        self.rulename = None
        self.input = None
        ...

class WrapperParams(RunParams):
    ...

.Rprofile ignored when running an R script

Snakemake version
5.7.1

Describe the bug
A rule executing an R script using the script directive does not execute the content of .Rprofile. I found out this was the case because my packrat environment is not being loaded (the line source("packrat/init.R") is inside .Rprofile).

Minimal example
Configure a packrat folder

  • R -e 'install.packages(\"packrat\", repos=\"http://cran.us.r-project.org\")'
  • R -e 'packrat::init()'

Install RMySQL

  • R -e 'install.packages(c("RMySQL"))'

Create a rule executing an R file with only one line:

  • library(RMySQL)

Additional context
.Rprofile, packrat folder and the snakefile are in the root directory

Remove some deprecated functionality

  • remove dynamic as this is now supported via checkpoints, keep a dummy that fails with an explanatory error message.
  • remove version keyword, as this is now much better handled via conda and singularity integration
  • remove R() helper function, as this is now handled via the script keyword. Keep a dummy that fails with an explanatory error message.

SyntaxError in pickle process

Snakemake version
5.7
no bug in 5.4.1

Describe the bug
We encountered a bug in the metagenome-atlas snakemake pipeline when running an external Python script which uses multiple threads and uses snakemake.shell.shell to run command-line applications.
The rule works on most systems, but multiple users have encountered the problem.

The error message complains about a SyntaxError in a pickle process done by snakemake.
From the error message, it is difficult to go back and see where there might be an error in my snakemake or Python script.

Logs

Activating conda environment: /stor/work/Ochman/sean/projects/bee_metagenomes/databases/conda_envs/f6f0da5b
Job counts:
count jobs
1 predict_genes_genomes
1
Traceback (most recent call last):
File "/stor/work/Ochman/sean/projects/bee_metagenomes/full_run_1/.snakemake/shadow/tmp8mzrmw3k/.snakemake/scripts/tmp7l63qsgu.predict_genes_of_genomes.py", line 3, in
import sys; sys.path.extend(["/stor/home/spl552/miniconda3/envs/atlas/lib/python3.6/site-packages", "/stor/work/Ochman/sean/projects/atlas-tutorial/atlas/atlas/rules"]); import pickle; snakemake = pickle.loads(b'\x80\x03csnakemake.script\nSnakemake\nq\x00)\x81q\x01}q\x02(X\x05\x00\x0
genes_genomesq\xcbX\x0f\x00\x00\x00bench_iterationq\xccNX\t\x00\x00\x00scriptdirq\xcdX@\x00\x00\x00/stor/work/Ochman/sean/projects/atlas-tutorial/atlas/atlas/rulesq\xceub.'); from snakemake.logging import logger; logger.printshellcmds = False; real_file = file; file = '/stor/work/Ochman/sean/projects/atlas-tutorial/atlas/atlas/rules/predict_genes_of_genomes.py';
File "/stor/home/spl552/miniconda3/envs/atlas/lib/python3.6/site-packages/snakemake/init.py", line 21, in
from snakemake.workflow import Workflow
File "/stor/home/spl552/miniconda3/envs/atlas/lib/python3.6/site-packages/snakemake/workflow.py", line 30, in
from snakemake.dag import DAG
File "/stor/home/spl552/miniconda3/envs/atlas/lib/python3.6/site-packages/snakemake/dag.py", line 1618
return f"#{hex_r:0>2X}{hex_g:0>2X}{hex_b:0>2X}"
^
SyntaxError: invalid syntax
[Tue Oct 22 11:34:23 2019]
Error in rule predict_genes_genomes:
jobid: 0
output: genomes/annotations/genes
log: logs/genomes/prodigal.log (check log file(s) for error message)
conda-env: /stor/work/Ochman/sean/projects/bee_metagenomes/databases/conda_envs/f6f0da5b

RuleException:
CalledProcessError in line 451 of /stor/work/Ochman/sean/projects/atlas-tutorial/atlas/atlas/rules/genomes.smk:
Command 'source /stor/home/spl552/miniconda3/bin/activate '/stor/work/Ochman/sean/projects/bee_metagenomes/databases/conda_envs/f6f0da5b'; set -euo pipefail; python /stor/work/Ochman/sean/projects/bee_metagenomes/full_run_1/.snakemake/shadow/tmp8mzrmw3k/.snakemake/scripts/tmp7l63qsgu.predict_genes_of_genomes.py' returned non-zero exit status 1.
File "/stor/work/Ochman/sean/projects/atlas-tutorial/atlas/atlas/rules/genomes.smk", line 451, in __rule_predict_genes_genomes

Don't delete temporary wrapper scripts

This is a copied issue from snakemake-workflows/docs#15, moved to the correct location!

I'm trying to get Snakemake working with Singularity, and I need to debug the Singularity command, but the script wrapper that does the execution doesn't exist after the failure, e.g., I need to test:

 singularity exec --home /home/vanessa/Documents/Dropbox/Code/labs/cherry/snakemake/encode-demo-workflow  --bind /home/vanessa/anaconda3/lib/python3.7/site-packages:/mnt/snakemake /home/vanessa/Documents/Dropbox/Code/labs/cherry/snakemake/encode-demo-workflow/.snakemake/singularity/328e6123b3d8f239ce917fa97ccbbd80.simg bash -c 'set -euo pipefail;  python /home/vanessa/Documents/Dropbox/Code/labs/cherry/snakemake/encode-demo-workflow/.snakemake/scripts/tmp2navashs.wrapper.py'

And why is home being force bound to be the present working directory? That seems strange for Singularity. Should it be --pwd?

The --forceall flag is not propagated to subworkflows

I'm trying to run a workflow with subworkflows with the --forceall flag specified.

Expected behavior: subworkflows rebuild all rules.

Actual behavior: subworkflows don’t rebuild any rule if the files are present.

As far as I can see, the problem is in the snakemake function in the __init__ file, line 414. The subsnakemake variable is created as a partial function in which the forceall flag is not specified (so the default value False is used later). Actually, the whole list of frozen parameters should be revised, as there are likely other parameters that need to be propagated to subworkflows.
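
For readers unfamiliar with the pattern, a simplified, self-contained illustration of how freezing arguments with functools.partial leads to this behavior (the names are invented; this is not the actual Snakemake code):

from functools import partial

def run_workflow(snakefile, forceall=False, keepgoing=False):
    print(snakefile, "forceall =", forceall)

# The top level was invoked with forceall=True, but only keepgoing is frozen here:
subsnakemake = partial(run_workflow, keepgoing=True)

# The later subworkflow call therefore silently falls back to the default:
subsnakemake("sub/Snakefile")   # prints: sub/Snakefile forceall = False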

A formatter such as black

Coming from a world where there's a linter and formatter for everything, it's odd that there isn't one for Snakemake.

Integration into black or another formatter for Snakemake files.

We could just suffer through, but I noticed that you forked black, so there may be work going on already for this.

Setting jobs in profiles to use all available cores

(Manually migrated from the original Bitbucket issue #1305)

I am testing out Snakemake profiles and would like to specify the number of jobs to be the same as the number of available cores. When using command-line arguments, I would be able to do this:

snakemake -j

and Snakemake will determine the right number of cores to allocate. If I instead use a profile/config.yaml file that looks like

jobs:

I get this error:

snakemake: error: argument --cores/--jobs/-j: invalid int value: 'None'

Is there a way to get the behavior of simply passing -j while using profiles?
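
One hedged workaround, assuming only the standard library and PyYAML (the profile path is a placeholder): generate the profile's config.yaml at setup time with the machine's core count filled in, since the profile format itself has no "use all cores" value.

import os
import yaml

# Write the number of detected CPUs into the profile before invoking snakemake.
with open("myprofile/config.yaml", "w") as f:
    yaml.safe_dump({"jobs": os.cpu_count()}, f)

Running snakemake --profile myprofile would then pick up the generated value.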
