Hi, any update on this bug? Thanks.
from drop.
Hi,
That is indeed a bug, thanks for pointing it out; we will be fixing it soon. The issue is that the period in the filename causes the output filename to be truncated early.
For the moment, you can create a symlink to your fasta file (so you don't have to copy the file) whose name does not contain any periods before the `.fa` ending. In your case it should work with
ln -s /gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38.primary_assembly.genome.fa /gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38_primary_assembly_genome.fa
and replacing the corresponding entry in the config with:
mae:
  genome: /gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38_primary_assembly_genome.fa
Hope this solves the problem for now.
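The truncation comes from the `fasta_dict` helper in the MAE Snakefile posted further down this thread, which builds the `.dict` path with `fasta_file.split('.')[0]`. Below is a minimal Python sketch contrasting that with `os.path.splitext`, which strips only the final extension. `period_free_name` is a hypothetical helper that produces the symlink name suggested above. My understanding is that GATK's CreateSequenceDictionary by default writes the input's base name with a `.dict` extension, which would match the fixed version, but this thread does not confirm it.

```python
import os


def fasta_dict_buggy(fasta_file: str) -> str:
    # Mirrors the Snakefile helper: everything after the FIRST period in the
    # whole path is dropped, including periods in directory names.
    return fasta_file.split('.')[0] + ".dict"


def fasta_dict_fixed(fasta_file: str) -> str:
    # Strip only the last extension of the file name (e.g. ".fa").
    root, _ext = os.path.splitext(fasta_file)
    return root + ".dict"


def period_free_name(fasta_file: str) -> str:
    # Hypothetical helper for the symlink workaround: replace the periods in
    # the file stem with underscores and keep the original extension.
    directory, name = os.path.split(fasta_file)
    stem, ext = os.path.splitext(name)
    return os.path.join(directory, stem.replace('.', '_') + ext)


if __name__ == "__main__":
    genome = ("/gpfs/data/reference-files/GRCh38_gencode-STAR/"
              "GRCh38.primary_assembly.genome.fa")
    print(fasta_dict_buggy(genome))   # -> .../GRCh38_gencode-STAR/GRCh38.dict
    print(fasta_dict_fixed(genome))   # -> .../GRCh38.primary_assembly.genome.dict
    print(period_free_name(genome))   # -> .../GRCh38_primary_assembly_genome.fa
```

The buggy variant also breaks on any period in a directory name (for example a user directory like `user.name`), which `splitext` avoids.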
from drop.
I tried, but still getting errors...
MissingInputException in line 38 of /gpfs/home/evrong01/.local/lib/python3.6/site-packages/wbuild/wBuild.snakefile:
Missing input files for rule markdown:
MAE/UDP--v32_results.md
[Thu Aug 13 00:55:45 2020]
Error in rule unlock:
jobid: 0
output: /gpfs/scratch/evrong01/droptest2/.drop/tmp/MAE/unlock
shell:
snakemake --unlock --configfile /gpfs/scratch/evrong01/droptest2/.drop/tmp/config.yaml
(exited with non-zero exit code)
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.snakemake/log/2020-08-13T005543.671615.snakemake.log
check for missing R packages
Structuring dependencies...
Dependencies file generated.
Building DAG of jobs...
Subworkflow AE: Nothing to be done.
Subworkflow AS: Nothing to be done.
Executing subworkflow MAE.
Structuring dependencies...
Dependencies file generated.
Building DAG of jobs...
MissingInputException in line 56 of /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile:
Missing input files for rule create_dict:
/gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38_primary_assembly.genome.fa
from drop.
Make sure that you specify the filename correctly; the fasta file doesn't seem to exist. Also, replace the last dot before the file ending with an underscore, changing
/gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38_primary_assembly.genome.fa
to
/gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38_primary_assembly_genome.fa
otherwise the error will persist.
If the unlock is giving you problems, try deleting the `.snakemake` directory in the project directory.
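A small Python sketch of that cleanup. It defaults to a dry run because `.snakemake` also holds the run logs (the `Complete log:` paths above point into it). The log paths also show the MAE subworkflow keeping its own `.snakemake` under `.drop/modules/mae-pipeline/`, so the hypothetical `remove_snakemake_state` helper takes a list of directories. Whether removing the module's copy is needed as well is an assumption, not something confirmed here.

```python
import pathlib
import shutil
from typing import Iterable, List


def remove_snakemake_state(dirs: Iterable[str],
                           dry_run: bool = True) -> List[pathlib.Path]:
    """Return the .snakemake directories found in `dirs`; delete them
    only when dry_run is False."""
    found = []
    for d in dirs:
        state = pathlib.Path(d) / ".snakemake"
        if state.is_dir():
            if not dry_run:
                shutil.rmtree(state)  # also removes logs and metadata
            found.append(state)
    return found
```

Calling it first with the default `dry_run=True` shows what would be deleted.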
from drop.
Next bug I'm getting. This is with the conda environment...
Error: package or namespace load failed for ‘tMAE’:
package ‘tMAE’ was installed before R 4.0.0: please re-install it
Execution halted
from drop.
Which R version are you using?
which R
It should be the one in your conda environment.
from drop.
I'm using the drop conda environment. So if there is an issue, it is because of an issue in the drop conda environment configuration.
R version 4.0.2 (2020-06-22) -- "Taking Off Again"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
from drop.
Hm, it seems as though R might be accessing different library paths. Which paths do you get when you call
.libPaths()
in R in the environment? You should only have a single path that links to the conda environment.
If it has more than one path, we could have some issues reinstalling tMAE.
Also, which Bioconductor version do you have?
BiocManager::version()
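For a non-interactive version of the `.libPaths()` check, here is a Python sketch that asks `Rscript` for its library paths and reports every path outside the active conda environment. It assumes `CONDA_PREFIX` is set by `conda activate`. `r_lib_paths` and `foreign_lib_paths` are hypothetical helper names. It uses `stdout=subprocess.PIPE` rather than `capture_output` so it also runs on the Python 3.6 seen in the logs.

```python
import os
import subprocess
from typing import List, Optional


def r_lib_paths(rscript: str = "Rscript") -> List[str]:
    # Ask R itself which library paths it searches, one per line.
    result = subprocess.run(
        [rscript, "-e", 'cat(.libPaths(), sep = "\\n")'],
        stdout=subprocess.PIPE, universal_newlines=True, check=True)
    return [line for line in result.stdout.splitlines() if line]


def foreign_lib_paths(paths: List[str], prefix: Optional[str]) -> List[str]:
    # Every library path that does NOT live inside the conda prefix.
    # commonpath avoids false matches such as ".../envs/drop2" vs ".../envs/drop".
    if not prefix:
        return list(paths)
    root = os.path.realpath(prefix)
    return [p for p in paths
            if os.path.commonpath([os.path.realpath(p), root]) != root]
```

With the environment active, `foreign_lib_paths(r_lib_paths(), os.environ.get("CONDA_PREFIX"))` should return an empty list when R only sees the conda library.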
from drop.
Ok, the libPaths was sourcing my regular R first instead of the conda R. This is a bug in conda. The point of conda is to have a separate environment. However, conda still sources the original R library paths from the root user. See here for details:
https://waoverholt.com/conda-and-R/
The solution is:
In your conda environment, add this line to drop_conda/etc/conda/activate.d/activate-r-base.sh:
export R_LIBS=DROPCONDA_ENVIRONMENT/lib/R/library
where DROPCONDA_ENVIRONMENT is the path to the conda environment.
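The same step as an idempotent Python sketch, following the workaround above literally. It appends `export R_LIBS=<env>/lib/R/library` once to `etc/conda/activate.d/activate-r-base.sh` inside the given environment. `add_r_libs_hook` is a hypothetical name, and the new variable only takes effect after the environment is re-activated.

```python
import os


def add_r_libs_hook(env_path: str) -> str:
    """Append the R_LIBS export to the env's activation hook, once."""
    hook = os.path.join(env_path, "etc", "conda", "activate.d",
                        "activate-r-base.sh")
    line = "export R_LIBS=" + os.path.join(env_path, "lib", "R", "library")
    os.makedirs(os.path.dirname(hook), exist_ok=True)

    existing = ""
    if os.path.exists(hook):
        with open(hook) as fh:
            existing = fh.read()

    # Only append if the exact line is not already present (idempotent).
    if line not in existing.splitlines():
        with open(hook, "a") as fh:
            if existing and not existing.endswith("\n"):
                fh.write("\n")
            fh.write(line + "\n")
    return hook
```

Typical use would be `add_r_libs_hook(os.environ["CONDA_PREFIX"])` from inside the activated environment.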
from drop.
Next bug in mae conda. How do I fix this?
Started with deseq
[1] "Running DESeq..."
Error in DESeqDataSet(se, design = design, ignoreRank) :
counts matrix should be numeric, currently it has mode: logical
Calls: DESeq4MAE ... deseq_for_allele_specific_expression -> DESeqDataSetFromMatrix -> DESeqDataSet
Execution halted
[Fri Aug 14 12:31:02 2020]
Error in rule Scripts_MAE_deseq_mae_R:
jobid: 9
output: /gpfs/scratch/evrong01/droptest2/root/processed_results/mae/samples/1841982--UDP-1003_RNA_res.Rds
from drop.
Thanks for finding the issue. For some reason, I don't get this behaviour on my machine despite having a system installation of R, so I will need some time to figure out what is going on.
from drop.
This error is most likely due to a failed GATK run. Try removing all the MAE output in `$ROOT/processed_data` (where `$ROOT` is the project root path specified in `config.yaml`) before rerunning the pipeline.
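Because a failed GATK call is not reported to Snakemake, it may help to scan the MAE allelic-count outputs for obviously broken files before rerunning. The Python sketch below assumes a broken output shows up as an empty, header-only or unreadable gzip file; the thread does not say what the broken files actually look like. The `*.csv.gz` pattern follows the `allelic_counts` rule in the Snakefile posted below, and `suspicious_outputs` is a hypothetical helper. It only reads the first two lines of each file, so a file truncated further in would not be caught.

```python
import gzip
import pathlib
from typing import List


def suspicious_outputs(root: str, pattern: str = "*.csv.gz") -> List[pathlib.Path]:
    """Gzipped outputs below `root` that are unreadable or have no data row."""
    bad = []
    for path in sorted(pathlib.Path(root).rglob(pattern)):
        try:
            with gzip.open(path, "rt") as fh:
                header = fh.readline()
                first_row = fh.readline()
        except (OSError, EOFError):
            bad.append(path)  # not gzip, or truncated gzip stream
            continue
        if not header or not first_row:
            bad.append(path)  # empty or header-only file
    return bad
```

Pointing it at the MAE folder under `processed_data` (with the real project root in place of `$ROOT`) lists candidate files to remove before rerunning.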
from drop.
Ok I will try removing failed MAE output. Why doesn't snakemake or drop detect this automatically and rerun? This should be fixed.
from drop.
Deleting the mae directory in the root/processed_results folder gives the prior bug/error:
MissingInputException in line 38 of /gpfs/home/evrong01/.local/lib/python3.6/site-packages/wbuild/wBuild.snakefile:
Missing input files for rule markdown:
MAE/UDP--v32_results.md
[Fri Aug 14 12:53:06 2020]
Error in rule unlock:
jobid: 0
output: /gpfs/scratch/evrong01/droptest2/.drop/tmp/MAE/unlock
shell:
snakemake --unlock --configfile /gpfs/scratch/evrong01/droptest2/.drop/tmp/config.yaml
(exited with non-zero exit code)
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.snakemake/log/2020-08-14T125304.030683.snakemake.log
from drop.
It also gives this error:
Building DAG of jobs...
MissingInputException in line 62 of /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile:
Missing input files for rule create_SNVs:
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/resource/chr_UCSC_NCBI.txt
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/resource/chr_NCBI_UCSC.txt
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/Scripts/MAE/filterSNVs.sh
[Fri Aug 14 12:51:59 2020]
Error in rule unlock:
jobid: 0
output: /gpfs/scratch/evrong01/droptest2/.drop/tmp/MAE/unlock
shell:
snakemake --unlock --configfile /gpfs/scratch/evrong01/droptest2/.drop/tmp/config.yaml
(exited with non-zero exit code)
from drop.
In general, mae is a lot more buggy than the other pipelines. There must be something in its foundation that causes it to have so many problems compared to the other drop pipelines.
from drop.
This directory that it complains about doesn't even exist...
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/
I tried 'drop update' and it still doesn't work.
So the suggestion of deleting $ROOT/processed_data/mae broke the pipeline. I'm not sure how to fix it now.
from drop.
Sorry, the mae-pipeline folder exists, but the mae-pipeline/.drop folder does not exist:
[evrong01@bigpurple-ln1 droptest2]$ ls -a /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/
. .. mae_readme.md resource Scripts Snakefile .snakemake
So this error points to files that do not exist. This is definitely a bug in the underlying code. The pipeline has path names that don't exist. This is what causes the unlock commands to have errors every time. I suspect the unlock command has several bugs.
MissingInputException in line 62 of /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile:
Missing input files for rule create_SNVs:
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/Scripts/MAE/filterSNVs.sh
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/resource/chr_UCSC_NCBI.txt
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/resource/chr_NCBI_UCSC.txt
from drop.
Can you print the contents of /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile? There might be a command that doesn't always give the same result for some reason. I'd just fix it manually for now.
Yes, MAE is a bit buggy, as GATK commands don't fail in the same way as other commands, and they require unlocking, which doesn't work as it should for submodules. We are still working on these issues for the new release. It will take some time until the main features are tested more systematically, so that we don't run into the same reproducibility issues as we do now.
from drop.
I think the unlock command or drop update has a bug in how it sets the base folder location, because the same folder path appears twice:
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/Scripts/MAE/filterSNVs.sh
= /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline + .drop/modules/mae-pipeline + Scripts/MAE/filterSNVs.sh
But that is the wrong path. The correct path is: /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Scripts/MAE/filterSNVs.sh
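One way to get exactly that doubled path is sketched below in Python, under a hypothesis the thread does not confirm. The hypothesis is that `drop.getMethodPath(METHOD, type_='workdir', str_=False)` returns the relative path `.drop/modules/mae-pipeline`, and that the MAE subworkflow runs with `.drop/modules/mae-pipeline` as its working directory (its log lands in `.drop/modules/mae-pipeline/.snakemake/log`). A relative `SCRIPT_ROOT` would then be interpreted a second time from inside the module directory. `resolve_in` and `anchor_root` are hypothetical helpers; anchoring `SCRIPT_ROOT` to the absolute project directory makes the result independent of the working directory.

```python
import pathlib


def resolve_in(workdir: pathlib.Path, script_root: pathlib.Path,
               *parts: str) -> pathlib.Path:
    """Where script_root/parts ends up when it is interpreted relative to
    `workdir` (absolute paths are left unchanged)."""
    path = script_root.joinpath(*parts)
    return path if path.is_absolute() else workdir / path


def anchor_root(project_dir: pathlib.Path,
                script_root: pathlib.Path) -> pathlib.Path:
    """Turn a project-relative SCRIPT_ROOT into an absolute path."""
    return script_root if script_root.is_absolute() else project_dir / script_root


if __name__ == "__main__":
    project = pathlib.Path("/gpfs/scratch/evrong01/droptest2")
    module = project / ".drop" / "modules" / "mae-pipeline"
    relative_root = pathlib.Path(".drop/modules/mae-pipeline")
    # Reproduces the doubled path from the error message:
    print(resolve_in(module, relative_root, "Scripts", "MAE", "filterSNVs.sh"))
    # Anchored root gives the path that actually exists:
    print(resolve_in(module, anchor_root(project, relative_root),
                     "Scripts", "MAE", "filterSNVs.sh"))
```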
from drop.
Here is the mae Snakefile:
```python
### SNAKEFILE MONOALLELIC EXPRESSION
import os
import drop
import pathlib

METHOD = 'MAE'
SCRIPT_ROOT = drop.getMethodPath(METHOD, type_='workdir', str_=False)
CONF_FILE = drop.getConfFile()

parser = drop.config(config, METHOD)
config = parser.parse()
include: config['wBuildPath'] + "/wBuild.snakefile"

# FUNCTIONS
def fasta_dict(fasta_file):
    return fasta_file.split('.')[0] + ".dict"

def getVcf(rna_id, vcf_id="qc"):
    if vcf_id == "qc":
        return config["mae"]["qcVcf"]
    else:
        return parser.getProcDataDir() + f"/mae/snvs/{vcf_id}--{rna_id}.vcf.gz"

def getQC(format):
    if format == "UCSC":
        return config["mae"]["qcVcf"]
    elif format == "NCBI":
        return parser.getProcDataDir() + "/mae/qc_vcf_ncbi.vcf.gz"
    else:
        raise ValueError(f"getQC: {format} is an invalid chromosome format")

def getChrMap(SCRIPT_ROOT, conversion):
    if conversion == 'ncbi2ucsc':
        return SCRIPT_ROOT/"resource"/"chr_NCBI_UCSC.txt"
    elif conversion == 'ucsc2ncbi':
        return SCRIPT_ROOT/"resource"/"chr_UCSC_NCBI.txt"
    else:
        raise ValueError(f"getChrMap: {conversion} is an invalid conversion option")

def getScript(type, name):
    return SCRIPT_ROOT/"Scripts"/type/name

rule all:
    input:
        rules.Index.output,
        config["htmlOutputPath"] + "/mae_readme.html",
        rules.Scripts_MAE_Datasets_R.output,
        rules.Scripts_QC_Datasets_R.output
    output: touch(drop.getMethodPath(METHOD, type_='final_file'))

rule sampleQC:
    input: rules.Scripts_QC_Datasets_R.output
    output: touch(drop.getTmpDir() + "/sampleQC.done")

rule create_dict:
    input: config['mae']['genome']
    output: fasta_dict(config['mae']['genome'])
    shell: "gatk CreateSequenceDictionary --REFERENCE {input[0]}"

# MAE
rule create_SNVs:
    input:
        ncbi2ucsc = getChrMap(SCRIPT_ROOT, "ncbi2ucsc"),
        ucsc2ncbi = getChrMap(SCRIPT_ROOT, "ucsc2ncbi"),
        vcf_file = lambda wildcards: parser.getFilePath(sampleId=wildcards.vcf,
                                                        file_type='DNA_VCF_FILE'),
        bam_file = lambda wildcards: parser.getFilePath(sampleId=wildcards.rna,
                                                        file_type='RNA_BAM_FILE'),
        script = getScript("MAE", "filterSNVs.sh")
    output:
        snvs_filename=parser.getProcDataDir() + "/mae/snvs/{vcf}--{rna}.vcf.gz",
        snvs_index=parser.getProcDataDir() + "/mae/snvs/{vcf}--{rna}.vcf.gz.tbi"
    shell:
        """
        {input.script} {input.ncbi2ucsc} {input.ucsc2ncbi} {input.vcf_file} \
        {wildcards.vcf} {input.bam_file} {output.snvs_filename} \
        {config[tools][bcftoolsCmd]} {config[tools][samtoolsCmd]}
        """

rule allelic_counts:
    input:
        ncbi2ucsc = getChrMap(SCRIPT_ROOT, "ncbi2ucsc"),
        ucsc2ncbi = getChrMap(SCRIPT_ROOT, "ucsc2ncbi"),
        vcf_file = lambda wildcards: getVcf(wildcards.rna, wildcards.vcf),
        bam_file = lambda wildcards: parser.getFilePath(sampleId=wildcards.rna,
                                                        file_type='RNA_BAM_FILE'),
        fasta = config['mae']['genome'],
        dict = fasta_dict(config['mae']['genome']),
        script = getScript("MAE", "ASEReadCounter.sh")
    output:
        counted = parser.getProcDataDir() + "/mae/allelic_counts/{vcf}--{rna}.csv.gz"
    shell:
        """
        {input.script} {input.ncbi2ucsc} {input.ucsc2ncbi} \
        {input.vcf_file} {input.bam_file} {wildcards.vcf}--{wildcards.rna} \
        {input.fasta} {config[mae][gatkIgnoreHeaderCheck]} {output.counted} \
        {config[tools][bcftoolsCmd]}
        """

# QC
rule renameChrQC:
    input:
        ucsc2ncbi = getChrMap(SCRIPT_ROOT, "ucsc2ncbi"),
        ncbi_vcf = getQC(format="UCSC")
    output:
        ncbi_vcf = getQC(format="NCBI")
    shell:
        """
        bcftools={config[tools][bcftoolsCmd]}
        echo 'converting from UCSC to NCBI format'
        $bcftools annotate --rename-chrs {input.ucsc2ncbi} {input.ncbi_vcf} \
            | bgzip > {output.ncbi_vcf}
        $bcftools index -t {output.ncbi_vcf}
        """

rule allelic_counts_qc:
    input:
        ncbi2ucsc = getChrMap(SCRIPT_ROOT, "ncbi2ucsc"),
        ucsc2ncbi = getChrMap(SCRIPT_ROOT, "ucsc2ncbi"),
        vcf_file_ucsc = getQC(format="UCSC"),
        vcf_file_ncbi = getQC(format="NCBI"),
        bam_file = lambda wildcards: parser.getFilePath(sampleId=wildcards.rna,
                                                        file_type='RNA_BAM_FILE'),
        fasta = config['mae']['genome'],
        dict = fasta_dict(config['mae']['genome']),
        script_qc = getScript("QC", "ASEReadCounter.sh"),
        script_mae = getScript("MAE", "ASEReadCounter.sh")
    output:
        counted = parser.getProcDataDir() + "/mae/allelic_counts/qc_{rna}.csv.gz"
    shell:
        """
        {input.script_qc} {input.ncbi2ucsc} {input.ucsc2ncbi} \
        {input.vcf_file_ucsc} {input.vcf_file_ncbi} {input.bam_file} \
        {wildcards.rna} {input.fasta} {config[mae][gatkIgnoreHeaderCheck]} \
        {output.counted} {config[tools][bcftoolsCmd]} \
        {config[tools][samtoolsCmd]} {input.script_mae}
        """

rulegraph_filename = f'{config["htmlOutputPath"]}/{METHOD}_rulegraph'

rule produce_rulegraph:
    input:
        expand(rulegraph_filename + ".{fmt}", fmt=["svg", "png"])

rule create_graph:
    output:
        svg = f"{rulegraph_filename}.svg",
        png = f"{rulegraph_filename}.png"
    shell:
        """
        snakemake --configfile {CONF_FILE} --rulegraph | dot -Tsvg > {output.svg}
        snakemake --configfile {CONF_FILE} --rulegraph | dot -Tpng > {output.png}
        """

rule unlock:
    output: touch(drop.getMethodPath(METHOD, type_="unlock"))
    shell: "snakemake --unlock --configfile {CONF_FILE}"
```
from drop.
The bugs are not from gatk. The bugs are from the mae drop scripts. There is an issue in some path setting and the unlock scripts. There is no feasible way to run mae.
There's a bunch of unlock errors that happen due to bugs in the mae scripts themselves. This is before any gatk is ever run:
Building DAG of jobs...
MissingInputException in line 116 of /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile:
Missing input files for rule allelic_counts_qc:
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/Scripts/QC/ASEReadCounter.sh
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/resource/chr_NCBI_UCSC.txt
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/resource/chr_UCSC_NCBI.txt
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/Scripts/MAE/ASEReadCounter.sh
[Fri Aug 14 13:11:34 2020]
Error in rule unlock:
jobid: 0
output: /gpfs/scratch/evrong01/droptest2/.drop/tmp/MAE/unlock
shell:
snakemake --unlock --configfile /gpfs/scratch/evrong01/droptest2/.drop/tmp/config.yaml
(exited with non-zero exit code)
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.snakemake/log/2020-08-14T131129.648275.snakemake.log
from drop.
Yes, I know that error, but it's difficult to find out why it sometimes works and sometimes breaks.
Try creating the empty `processed_data/mae` directory if it doesn't exist. That may solve the issue. Otherwise, try replacing the `SCRIPT_ROOT` variable in `.drop/modules/mae-pipeline/Snakefile` with `SCRIPT_ROOT = os.getcwd()`. Note that this modification will be overwritten every time you call `drop update`.
from drop.
The fact that Scripts_MAE_deseq_mae_R was called before it failed means that GATK was run. However, errors in the GATK command aren't recognised by Snakemake, so the failure only surfaces when the broken output is read in R. That's why I was trying to get you to rerun the mae pipeline from scratch.
from drop.
Seeing as you are working with the demo and that we have resolved the dependency issues, it might be easier to just redo the setup in a new empty directory instead of trying to isolate the mae pipeline for now.
from drop.
Neither of these things fixes the problem. Changing the SCRIPT_ROOT = os.getcwd() now gives this error when running snakemake unlock:
TypeError in line 34 of /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile:
unsupported operand type(s) for /: 'str' and 'str'
File "/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile", line 64, in
File "/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile", line 34, in getChrMap
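The `TypeError` is a type mismatch rather than a path problem. `os.getcwd()` returns a plain `str`, and the `/` operator used in `getChrMap` (and `getScript`) is only defined when an operand is a `pathlib.Path`. Below is a minimal sketch of the substitution that keeps the original Path-based helpers working. Whether the current working directory really is the module directory when the Snakefile is parsed remains the assumption behind the suggestion above.

```python
import os
import pathlib

# Wrap the str from os.getcwd() in a Path so that SCRIPT_ROOT / "resource"
# works again (pathlib.Path.cwd() would be equivalent).
SCRIPT_ROOT = pathlib.Path(os.getcwd())


def getChrMap(SCRIPT_ROOT, conversion):
    # Same logic as in the Snakefile; it only requires a Path argument.
    if conversion == 'ncbi2ucsc':
        return SCRIPT_ROOT / "resource" / "chr_NCBI_UCSC.txt"
    elif conversion == 'ucsc2ncbi':
        return SCRIPT_ROOT / "resource" / "chr_UCSC_NCBI.txt"
    else:
        raise ValueError(f"getChrMap: {conversion} is an invalid conversion option")
```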
from drop.
How do I redo the setup in a new empty directory without having rerun all the other pipelines, which takes 3 days of very significant compute resources?
from drop.
It takes 3 days to execute the demo? It should only take about 20 min. And the total input is just about 600 MB, so total memory consumption shouldn't be high. What are your resources, and what system are you working on?
from drop.
I ran drop for 110 samples, not the demo. Is there any way to transfer all the results of the other pipelines (except for mae) to the new directory?
from drop.
First, try running drop on a new instance of the drop demo to see whether the problem still occurs. Right now I don't have a good explanation for why the code suddenly breaks, as it worked before.
from drop.
I tried it now in a new directory, and the drop demo works. Deleting the mae directory in root for some reason causes the directory to no longer work, and drop update doesn't fix it. Unlock commands also stop working for mae.
But because the mae pipeline sometimes crashes and then needs to be reset, there should be a way to reset the mae pipeline to the beginning.
In the meantime, let me know if there is a way to move results from a prior directory to a new directory, except for mae.
from drop.
OK, good to know that that's a way we can fix the issue. You should definitely keep a copy of the output you have at the moment until you have a complete pipeline run. Also be careful if you save in a `scratch` directory, as these are often used for temporary data.
You can simply copy the output to the new directory, run `drop init` and adapt the `config.yaml`. Always double-check with a dry run (`snakemake -n`) before running the pipeline fully. To prevent the pipeline from overwriting your output, call `snakemake --touch`. This should update the timestamps of all the necessary and existing output files. If `snakemake -n` still tells you to redo the steps, some upstream input might be missing from the output.
from drop.
Thanks. Which specific directories do I need to copy from the old directory to the new directory?
from drop.
Create the new project first and call `snakemake -n`, then copy the files. The computationally expensive files are in the `processed_data/` and `processed_results/` directories. These should be auto-created as empty directories in your new project directory, so make sure they exist beforehand. Copy the directories of each submodule separately:
processed_data/aberrant_expression/
processed_data/aberrant_splicing
processed_results/...
etc.
and don't touch the `mae` directory.
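The copy step as a Python sketch, assuming `old_root` and `new_root` are the `root` directories (the project root set in `config.yaml`) of the old and new project. `copy_results` is a hypothetical helper. It copies every submodule directory under `processed_data/` and `processed_results/` except `mae`, replaces empty placeholder directories, and refuses to overwrite non-empty ones. `shutil.copytree` copies with `shutil.copy2` by default, which keeps modification times; running `snakemake -n` afterwards is still the real check.

```python
import pathlib
import shutil
from typing import Iterable, List


def copy_results(old_root: str, new_root: str,
                 skip: Iterable[str] = ("mae",)) -> List[pathlib.Path]:
    """Copy submodule directories of processed_data/ and processed_results/
    from old_root to new_root, leaving out the names in `skip`."""
    skip = set(skip)
    copied = []
    for top in ("processed_data", "processed_results"):
        src_top = pathlib.Path(old_root) / top
        dst_top = pathlib.Path(new_root) / top
        if not src_top.is_dir():
            continue
        dst_top.mkdir(parents=True, exist_ok=True)
        for sub in sorted(src_top.iterdir()):
            if not sub.is_dir() or sub.name in skip:
                continue
            dst = dst_top / sub.name
            if dst.exists():
                if any(dst.iterdir()):
                    raise FileExistsError(f"{dst} is not empty; refusing to overwrite it")
                dst.rmdir()  # empty placeholder created by the new project
            shutil.copytree(str(sub), str(dst))
            copied.append(dst)
    return copied
```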
from drop.
I made a new directory, then I did drop init, then drop update, then I copied the config.yaml and samples.tsv file, then snakemake -n. It ran fine. However, there is no root directory. I only have these files in the directory...
-rw-rw---- 1 evrong01 evrong01 1.8K Aug 15 15:54 config.yaml
-rw-rw---- 1 evrong01 evrong01 0 Aug 15 15:54 readme.md
-rwxrwx--- 1 evrong01 evrong01 952 Aug 15 15:55 runpipeline.sh
-rw-rw---- 1 evrong01 evrong01 179K Aug 15 15:54 samples.tsv
drwxrwx--- 6 evrong01 evrong01 4.0K Aug 15 15:54 Scripts
-rw-rw---- 1 evrong01 evrong01 2.9K Aug 15 15:54 Snakefile
from drop.
Now I created the root directory manually. Then I copied all the processed_data and processed_results except for mae. Then I ran snakemake --touch and it ran fine. Then I ran snakemake -n and it ran fine.
But now I am running snakemake unlock again, and again I'm getting the same error. There must be some bug in the mae code.
Building DAG of jobs...
MissingInputException in line 38 of /gpfs/home/evrong01/.local/lib/python3.6/site-packages/wbuild/wBuild.snakefile:
Missing input files for rule markdown:
MAE/UDP--v32_results.md
[Sat Aug 15 16:02:50 2020]
Error in rule unlock:
jobid: 0
output: /gpfs/scratch/evrong01/droptest3/.drop/tmp/MAE/unlock
shell:
snakemake --unlock --configfile /gpfs/scratch/evrong01/droptest3/.drop/tmp/config.yaml
(exited with non-zero exit code)
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/scratch/evrong01/droptest3/.drop/modules/mae-pipeline/.snakemake/log/2020-08-15T160249.624667.snakemake.log
from drop.
Also, running 'snakemake aberrantExpression' finishes and says "Nothing to be done.", but it didn't recreate the html_output directory.
So just moving the processed_data and processed_results files to the new directory doesn't help, because then it doesn't recreate the html_output directory. I guess I can just copy those over too.
But still the problem is I can't get mae to run. Unlock doesn't work, and there is no way around it.
from drop.
Also, if I run snakemake mae without doing unlock in the new directory, it also says "Nothing to be done", even though there are no mae results in the folder. I suspect that snakemake --touch makes it think that mae is already finished.
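If `--touch` did mark MAE as finished, the marker files are what Snakemake sees. The `all` rule in the MAE Snakefile touches a final-file marker (`drop.getMethodPath(METHOD, type_='final_file')`), and the thread later reports that deleting two `MAE.done` files made MAE run again. Below is a Python sketch for finding, and optionally deleting, such markers. The file name comes from that report; their locations are not shown in the thread, so the hypothetical `find_done_markers` searches the whole project, including hidden directories such as `.drop/`.

```python
import pathlib
from typing import List


def find_done_markers(project_dir: str, name: str = "MAE.done",
                      delete: bool = False) -> List[pathlib.Path]:
    """List marker files called `name` anywhere below project_dir
    (pathlib's rglob also descends into hidden directories);
    delete them only when delete=True."""
    markers = sorted(p for p in pathlib.Path(project_dir).rglob(name)
                     if p.is_file())
    if delete:
        for marker in markers:
            marker.unlink()
    return markers
```

Listing first and deleting in a second call makes it easy to check that only MAE markers are affected.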
from drop.
And if I run snakemake mae without doing unlock in the old directory, I get this error. Maybe this is where the bug is.
Subworkflow AE: Nothing to be done.
Subworkflow AS: Nothing to be done.
Executing subworkflow MAE.
Structuring dependencies...
Dependencies file generated.
TypeError in line 34 of /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile:
unsupported operand type(s) for /: 'str' and 'str'
File "/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile", line 64, in
File "/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile", line 34, in getChrMap
from drop.
In the new directory, I manually deleted the two MAE.done files and now mae seems to run. However, it hit another new error:
Error in eval(jsub, SDenv, parent.frame()) :
object 'gene_status' not found
Calls: [ -> [.data.table -> eval -> eval
Execution halted
[Sat Aug 15 16:19:32 2020]
Error in rule Scripts_MAE_gene_name_mapping_R:
jobid: 14
output: /gpfs/scratch/evrong01/droptest3/root/processed_data/mae/gene_name_mapping_v32.tsv
from drop.
I think things will just get too messy from here. I'd suggest that you try to use the version of drop that I'm still developing on, as it will make reruns so much easier. The output structure of the HTML will be a bit different and it will still miss some features, but the pipeline core should be fully functional.
If you already have a local clone of the drop repo, add a new remote pointing to git@github.com:mumichae/drop.git:
git remote add -f mumichae git@github.com:mumichae/drop.git
Otherwise just clone the above URL.
Then check out the import_counts branch:
git checkout import_counts
Next, you need to install drop to your current drop conda environment (as you already have all the dependencies).
Also, you need to use a different version of wbuild, as I am developing there as well, so you need to remove the current version of wbuild first.
conda remove wbuild
pip install -e <root_of_drop_repo> # should install the correct version of wbuild
Let me know once you have installed the new version and whether you can get the drop demo running, both as a dry run and as a full execution. Then we can proceed.
from drop.
I'm getting an installation error:
(/gpfs/home/evrong01/bin/drop_conda) [evrong01@bigpurple-ln2 drop_dev]$ pip install -e .
Traceback (most recent call last):
File "/gpfs/home/evrong01/bin/drop_conda/bin/pip", line 6, in
from pip._internal.cli.main import main
ModuleNotFoundError: No module named 'pip._internal.cli.main'
from drop.
It seems as though your pip is not working. It could be that a wrong version is referenced (seeing as it has worked before). Make sure you are using the pip provided by Anaconda: `conda list` should definitely contain pip, and its version should match the output of `pip --version`.
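`ModuleNotFoundError: No module named 'pip._internal.cli.main'` typically means the `pip` launcher script and the installed pip package come from different pip versions. One way to sidestep the launcher is `python -m pip`, which uses whatever pip the given interpreter imports. The Python sketch below runs it and parses the `pip X from LOCATION (python Y)` line so the version and location can be compared with `conda list`. `pip_for_this_python` and `parse_pip_version` are hypothetical helpers.

```python
import re
import subprocess
import sys
from typing import Dict

_PIP_VERSION = re.compile(r"pip (\S+) from (.+) \(python ([\d.]+)\)$")


def parse_pip_version(output: str) -> Dict[str, str]:
    # Expected shape: "pip 20.2.2 from /env/lib/python3.8/site-packages/pip (python 3.8)"
    match = _PIP_VERSION.match(output.strip())
    if not match:
        raise ValueError(f"unexpected pip --version output: {output!r}")
    return {"version": match.group(1), "location": match.group(2),
            "python": match.group(3)}


def pip_for_this_python() -> Dict[str, str]:
    # Bypass the bin/pip launcher and ask this interpreter's pip directly.
    result = subprocess.run([sys.executable, "-m", "pip", "--version"],
                            stdout=subprocess.PIPE, universal_newlines=True,
                            check=True)
    return parse_pip_version(result.stdout)
```

If the reported location is not inside the conda environment, or the version differs from `conda list`, pip in that environment is the one to repair.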
from drop.
conda remove wbuild broke pip. I reinstalled pip and then managed to install the alternate drop version.
snakemake -n gives this next error:
The downloaded source packages are in
‘/tmp/RtmpkQB0PU/downloaded_packages’
Installation path not writeable, unable to update packages: AnnotationDbi,
AnnotationHub, BiocFileCache, bit, bit64, broom, data.table, DelayedArray,
dplyr, DT, ExperimentHub, fs, GenomicFeatures, Hmisc, httr, loo, mice,
pillar, pkgbuild, ps, quantreg, RcppArmadillo, Rhdf5lib, rlang, rstan,
StanHeaders, SummarizedExperiment, sys, tibble, tidyr, vctrs, xfun, XML
installed OUTRIDER
install c-mertes/FRASER
Bioconductor version 3.11 (BiocManager 1.30.10), R 4.0.0 (2020-04-24)
Installing github package(s) 'c-mertes/FRASER'
Error: package 'remotes' not installed in library path(s)
/gpfs/data/bin/Gilad/R/x86_64-pc-linux-gnu-library/4.0
/gpfs/share/apps/R/4.0.0/lib64/R/library
install with 'install("remotes")'
Execution halted
CalledProcessError in line 5 of /gpfs/scratch/evrong01/dropdemo/Snakefile:
Command '['Rscript', PosixPath('/gpfs/data/bin/drop_dev/drop/installRPackages.R'), PosixPath('/gpfs/data/bin/drop_dev/drop/requirementsR.txt')]' returned non-zero exit status 1.
File "/gpfs/scratch/evrong01/dropdemo/Snakefile", line 5, in
File "/gpfs/data/bin/drop_dev/drop/setupDrop.py", line 33, in installRPackages
File "/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/subprocess.py", line 369, in check_returncode
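This log reports R 4.0.0 with library paths under `/gpfs/share/apps/R/4.0.0` and `/gpfs/data/bin/Gilad/R`, and a Python from `/gpfs/share/apps/python/cpu/3.6.5`, whereas the conda environment earlier reported R 4.0.2. That suggests, though it does not prove, that `Rscript` and `python` are being resolved outside the conda environment. Below is a Python sketch that reports where each tool resolves on `PATH` and whether it lies inside `CONDA_PREFIX`; `resolve_tools` is a hypothetical helper.

```python
import os
import shutil
from typing import Dict, Iterable, Optional, Tuple


def _inside(path: str, prefix: str) -> bool:
    path, prefix = os.path.realpath(path), os.path.realpath(prefix)
    return os.path.commonpath([path, prefix]) == prefix


def resolve_tools(tools: Iterable[str], prefix: Optional[str] = None,
                  path: Optional[str] = None) -> Dict[str, Tuple[Optional[str], bool]]:
    """Map each tool to (resolved executable or None, inside the conda prefix?)."""
    prefix = prefix or os.environ.get("CONDA_PREFIX", "")
    report = {}
    for tool in tools:
        exe = shutil.which(tool, path=path)  # path=None searches $PATH
        report[tool] = (exe, bool(exe and prefix) and _inside(exe, prefix))
    return report
```

With the environment active, `resolve_tools(["R", "Rscript", "python", "pip", "snakemake"])` should show `True` for every entry if everything comes from the conda environment.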
I went into R and manually installed remotes. Then I get this next error:
* installing *source* package ‘Rhdf5lib’ ...
** using staged installation
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether we are using the GNU C++ compiler... yes
checking whether g++ -std=gnu++11 accepts -g... yes
checking whether C++ compiler accepts -w... yes
checking how to run the C++ preprocessor... g++ -std=gnu++11 -E
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for zlib.h... yes
checking curl/curl.h usability... yes
checking curl/curl.h presence... yes
checking for curl/curl.h... yes
checking openssl/evp.h usability... yes
checking openssl/evp.h presence... yes
checking for openssl/evp.h... yes
checking openssl/hmac.h usability... yes
checking openssl/hmac.h presence... yes
checking for openssl/hmac.h... yes
checking openssl/sha.h usability... yes
checking openssl/sha.h presence... yes
checking for openssl/sha.h... yes
checking for curl_global_init in -lcurl... no
checking for EVP_sha256 in -lcrypto... yes
untarring hdf5small_cxx_hl_1.10.6.tar.gz ...
building the szip library...
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether to enable maintainer-specific portions of Makefiles... no
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking for config x86_64-unknown-linux-gnu... no
checking for config x86_64-unknown-linux-gnu... no
checking for config unknown-linux-gnu... no
checking for config unknown-linux-gnu... no
checking for config x86_64-linux-gnu... no
checking for config x86_64-linux-gnu... no
checking for config x86_64-unknown... no
checking for config linux-gnu... found
compiler 'gcc' is GNU gcc-6.3.0
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether gcc understands -c and -o together... yes
checking for style of include used by make... GNU
checking dependency style of gcc... gcc3
checking how to run the C preprocessor... /gpfs/home/evrong01/bin/drop_conda/bin/x86_64-conda_cos6-linux-gnu-cpp
configure: error: in `/tmp/RtmpxYUhev/R.INSTALL7409582d709f/Rhdf5lib/src/hdf5/szip':
configure: error: C preprocessor "/gpfs/home/evrong01/bin/drop_conda/bin/x86_64-conda_cos6-linux-gnu-cpp" fails sanity check
See `config.log' for more details
make: *** No targets specified and no makefile found. Stop.
make: *** No rule to make target `install'. Stop.
building the hdf5 library...
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether make supports nested variables... (cached) yes
checking whether to enable maintainer-specific portions of Makefiles... no
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking shell variables initial values... done
checking if basename works... yes
checking if xargs works... yes
checking for cached host... none
checking for config x86_64-unknown-linux-gnu... no
checking for config x86_64-unknown-linux-gnu... no
checking for config unknown-linux-gnu... no
checking for config unknown-linux-gnu... no
checking for config x86_64-linux-gnu... no
checking for config x86_64-linux-gnu... no
checking for config x86_64-unknown... no
checking for config linux-gnu... found
compiler 'gcc' is GNU gcc-6.3.0
compiler 'g++ -std=gnu++11' is GNU g++-6.3.0
checking for config ./config/site-specific/host-bigpurple-ln2... no
checking build mode... production
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether gcc understands -c and -o together... yes
checking for style of include used by make... GNU
checking dependency style of gcc... gcc3
checking if unsupported combinations of configure options are allowed... no
checking how to run the C preprocessor... /gpfs/home/evrong01/bin/drop_conda/bin/x86_64-conda_cos6-linux-gnu-cpp
configure: error: in `/tmp/RtmpxYUhev/R.INSTALL7409582d709f/Rhdf5lib/src/hdf5':
configure: error: C preprocessor "/gpfs/home/evrong01/bin/drop_conda/bin/x86_64-conda_cos6-linux-gnu-cpp" fails sanity check
See `config.log' for more details
sh configure
configure: error: cannot find sources (src/H5.c) in /gpfs/share/apps/samtools/1.9/bcftools-1.9 or ..
make: *** [_config] Error 1
configure: HDF5_INCLUDE=hdf5/src
configure: HDF5_CXX_INCLUDE=hdf5/c++/src
configure: HDF5_HL_INCLUDE=hdf5/hl/src
configure: HDF5_HL_CXX_INCLUDE=hdf5/hl/c++/src
configure: HDF5_LIB=hdf5/src/.libs/libhdf5.a
configure: HDF5_CXX_LIB=hdf5/c++/src/.libs/libhdf5_cpp.a
configure: HDF5_HL_LIB=hdf5/hl/src/.libs/libhdf5_hl.a
configure: HDF5_HL_CXX_LIB=hdf5/hl/c++/src/.libs/libhdf5_hl_cpp.a
configure: SZIP_LIB=hdf5/szip/src/.libs/libsz.a
configure: creating ./config.status
config.status: creating src/Makevars
** libs
mkdir -p "/gpfs/data/bin/Gilad/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-Rhdf5lib/00new/Rhdf5lib/include"
cp "hdf5/src/"*.h "/gpfs/data/bin/Gilad/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-Rhdf5lib/00new/Rhdf5lib/include"
cp "hdf5/c++/src/"*.h "/gpfs/data/bin/Gilad/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-Rhdf5lib/00new/Rhdf5lib/include"
cp "hdf5/hl/src/"*.h "/gpfs/data/bin/Gilad/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-Rhdf5lib/00new/Rhdf5lib/include"
cp "hdf5/hl/c++/src/"*.h "/gpfs/data/bin/Gilad/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-Rhdf5lib/00new/Rhdf5lib/include"
mkdir -p "/gpfs/data/bin/Gilad/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-Rhdf5lib/00new/Rhdf5lib/lib/"
cp "hdf5/src/.libs/libhdf5.a" "/gpfs/data/bin/Gilad/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-Rhdf5lib/00new/Rhdf5lib/lib/"
cp: cannot stat ‘hdf5/src/.libs/libhdf5.a’: No such file or directory
make: *** [copying] Error 1
ERROR: compilation failed for package ‘Rhdf5lib’ - removing ‘/gpfs/data/bin/Gilad/R/x86_64-pc-linux-gnu-library/4.0/Rhdf5lib’
Error: Failed to install 'FRASER' from GitHub:
(converted from warning) installation of package ‘Rhdf5lib’ had non-zero exit status
Execution halted
CalledProcessError in line 5 of /gpfs/scratch/evrong01/dropdemo/Snakefile:
Command '['Rscript', PosixPath('/gpfs/data/bin/drop_dev/drop/installRPackages.R'), PosixPath('/gpfs/data/bin/drop_dev/drop/requirementsR.txt')]' returned non-zero exit status 1.
File "/gpfs/scratch/evrong01/dropdemo/Snakefile", line 5, in <module>
File "/gpfs/data/bin/drop_dev/drop/setupDrop.py", line 33, in installRPackages
File "/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/subprocess.py", line 369, in check_returncode
Have you considered distributing drop as a Docker image? It might solve all of these installation and configuration issues.
Yes, we have a Docker image (https://github.com/c-mertes/docker_drop), but it uses an old version of drop, and updating it would require the same steps. You can try it instead, but you'll still need to update drop inside it, because the bugs you are experiencing are present in that version too. Note that it needs a considerable amount of storage (about 15 GB compressed). I haven't managed to test it on my machine yet due to limited storage, so I can only help you with issues that are not related to Docker.
As for your error message, it seems you aren't accessing the conda environment properly, because other R installations are still being picked up. Inside conda you shouldn't need to reinstall the R packages, as they should already be present. Check your PATH variable with
echo $PATH
since it may list other R and Python locations ahead of conda's. Make sure you are using the conda R version, not another local one.
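The PATH check suggested above can also be scripted, which makes it easy to see whether a local R or Python is shadowing the conda one. This is a minimal sketch, not part of drop or conda: resolve_tools is a hypothetical helper name, and it assumes the environment was activated with conda activate (which sets CONDA_PREFIX) and that the tools are called R, Rscript and python.

```python
import os
import shutil


def resolve_tools(tools, conda_prefix=None, path=None):
    """Map each tool to (first match on PATH, whether it lives inside conda_prefix).

    Hypothetical helper for illustration; it is not part of drop or conda.
    """
    if conda_prefix is None:
        # conda activate exports CONDA_PREFIX for the active environment
        conda_prefix = os.environ.get("CONDA_PREFIX")
    real_prefix = os.path.realpath(conda_prefix) if conda_prefix else None
    report = {}
    for tool in tools:
        # shutil.which walks PATH (or the given path string) in order, like the shell
        found = shutil.which(tool, path=path)
        in_conda = False
        if found and real_prefix:
            real_found = os.path.realpath(found)
            # the executable belongs to the env if its real path sits under the prefix
            in_conda = os.path.commonpath([real_found, real_prefix]) == real_prefix
        report[tool] = (found, in_conda)
    return report


if __name__ == "__main__":
    for tool, (location, in_conda) in resolve_tools(["R", "Rscript", "python"]).items():
        if location is None:
            print(f"{tool}: not found on PATH")
        elif in_conda:
            print(f"{tool}: {location} (active conda env)")
        else:
            print(f"{tool}: {location} (NOT from the active conda env)")
```

Run it from the same shell you use for snakemake; any tool reported as not coming from the active conda env is being found through an earlier PATH entry.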
It's up to you which environment to use. For Docker, note that you'll have to mount your data using the specific folder structure described in the README (https://github.com/c-mertes/docker_drop). I'm not sure whether your old output data will work, but you can definitely give it a try.
Are you updating docker_drop regularly at https://github.com/c-mertes/docker_drop? Does it have the most up-to-date version of drop?
When I uninstalled wbuild, it messed up the conda environment: it removed or switched many dependencies and, for some reason, caused R to be uninstalled. I'm not sure why.
This is why, in my opinion, Docker is better. Conda is in general a buggy system, because it is not completely isolated from the local environment and therefore is not good at handling complicated dependencies. Even though Docker requires more disk space, it will save users the headache of trying to get the pipeline working.
I fixed the R PATH issue in the conda environment, but snakemake -n on the demo still gave an error:
Warning messages:
1: In install.packages(...) :
installation of package ‘BiocFileCache’ had non-zero exit status
2: In install.packages(...) :
installation of package ‘biomaRt’ had non-zero exit status
3: In install.packages(...) :
installation of package ‘GenomicFeatures’ had non-zero exit status
4: In install.packages(...) :
installation of package ‘VariantAnnotation’ had non-zero exit status
5: In INSTALL(packages[i, 1]) :
installation of package ‘ggplot2’ had non-zero exit status
AttributeError in line 12 of /gpfs/scratch/evrong01/droptest3/Snakefile:
'Config' object has no attribute 'getConfig'
File "/gpfs/scratch/evrong01/droptest3/Snakefile", line 12, in <module>
File "/gpfs/data/bin/drop_dev/drop/config/DropConfig.py", line 26, in __init__
I tried to install these manually, and it looks like the error occurs because dplyr and associated packages were not installed. The installer offers to install them, so presumably when DROP runs the demo it doesn't answer 'yes' to installing dependencies such as dplyr.
I succeeded in getting all the packages installed. Now I'm getting this error for the demo:
droptest3]$ snakemake -n
check for missing R packages
AttributeError in line 12 of /gpfs/scratch/evrong01/droptest3/Snakefile:
'Config' object has no attribute 'getConfig'
File "/gpfs/scratch/evrong01/droptest3/Snakefile", line 12, in <module>
File "/gpfs/data/evronylab/bin/drop_dev/drop/config/DropConfig.py", line 26, in __init__
That's a bit surprising: you had all the dependencies before, and wbuild doesn't depend on any R packages, so removing wbuild shouldn't remove any R dependencies. Are you still using the correct R library path? Check it in R with
.libPaths()
You need to make sure that the standard packages load properly in R (they shouldn't have to be reinstalled, as they should already be present in conda). Otherwise you will get errors later on.
If you want to use Docker instead, you only need to download the developer version of drop (as you did before on your local machine) and remove wbuild using pip, as described below. Then you can reinstall the newest drop version from the local repo. Don't forget to prepare and mount your input data as described in the README.
Updating DROP and wbuild
The error you got is what we want, as it means you are now using the new drop version. Unfortunately, wbuild doesn't seem to have been uninstalled properly, so try
pip uninstall wbuild
Afterwards, the output of
conda list
should no longer contain wbuild.
Then reinstall drop with
pip install -e <path-to-drop>
and make sure wbuild is installed again. If that doesn't work, try
pip uninstall drop
before reinstalling it.
Then you should have the correct dependencies for a successful dry run.
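To double-check these uninstall and reinstall steps from the Python side, here is a minimal sketch. It assumes Python 3.8 or newer for importlib.metadata (on the Python 3.6 used earlier in this thread, pip show wbuild gives the same information), and installed_version is a hypothetical helper name, not part of drop. The distribution names drop and wbuild are the ones used in the pip commands above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # After `pip uninstall wbuild`, wbuild should show as not installed;
    # after reinstalling drop, check that both report a version.
    for name in ("drop", "wbuild"):
        version = installed_version(name)
        print(f"{name}: {version if version else 'not installed'}")
```

Run it with the same python that runs snakemake; checking a different interpreter would not tell you anything about the pipeline's environment.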
For some reason, gatk is now not found in my conda environment, although it was there before. I think this is too complicated, and I'm not sure what to try next. If there is a simpler way to set up the drop environment, I'm happy to try it, but conda seems complicated. If you have a Docker image that is built and ready to go, I'm happy to try that.
It seems you aren't properly accessing your conda environment. You can verify this with
conda env list
which lists all environments and marks the active one with an asterisk.
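The active environment can also be checked from inside Python, which helps when the interpreter that runs snakemake is not the one you expect. A minimal sketch, assuming the environment was activated with conda activate (which sets CONDA_DEFAULT_ENV and CONDA_PREFIX); the helper names active_conda_env and interpreter_matches_env are hypothetical and not part of drop or conda.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return (env_name, env_prefix) from conda's activation variables, or (None, None)."""
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV"), environ.get("CONDA_PREFIX")


def interpreter_matches_env(prefix, interpreter_prefix=sys.prefix):
    """True if the running Python interpreter lives in the given conda prefix."""
    if not prefix:
        return False
    return os.path.realpath(prefix) == os.path.realpath(interpreter_prefix)


if __name__ == "__main__":
    name, prefix = active_conda_env()
    print(f"active conda env: {name or 'none'} ({prefix or 'CONDA_PREFIX not set'})")
    if not interpreter_matches_env(prefix):
        # A mismatch means python resolves to an install outside the active env
        print(f"warning: this Python runs from {sys.prefix}, not the active conda env")
```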
If you are working with Docker, just follow the instructions in https://github.com/c-mertes/docker_drop and then follow my previous instructions on updating to the developer versions of drop (git@github.com:mumichae/drop.git, branch import_counts) and wbuild (git@github.com:mumichae/wBuild.git, branch subindex). As they are still work in progress, there is no dedicated Docker container for that version.
Thanks. To make it simpler for users, do you have a Docker image on Docker Hub that is already built and confirmed to work?
Yes, we have a prebuilt Docker container that is about 15 GB compressed. You don't need to rebuild it; just run it as described in the README and it will be downloaded from the mertes/drop repository (which will take some time):
docker run --rm --network=host -ti mertes/drop
For writing the demo, or if you are mounting your own data:
docker run --rm -ti -v ... etc.
It contains the old version of drop, so you'll need to update it manually to use any GitHub developer version (as I explained before).
Does that answer your question?
OK, to keep things simple, I will wait until a more stable version of drop is released with the above issues resolved. I would appreciate it if you let me know once it is released. Thanks.
Hi, we have a new RNA-seq sample from a syndromic family that we would like to run through DROP. Are any of the new versions of DROP available yet? Thanks.