
Comments (56)

gevro commented on July 17, 2024

Hi, Any update about this bug? Thanks.

from drop.

mumichae commented on July 17, 2024

Hi,

that is indeed a bug, thanks for pointing that out, we will be fixing it soon. The issue is that the point/period in the filename causes the output filename to be truncated early.

For the moment, you can create a symlink to your fasta file (so you don't have to copy the file) that does not contain any point characters before the .fa ending. In your case it should work with

ln -s /gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38.primary_assembly.genome.fa /gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38_primary_assembly_genome.fa

and replacing the corresponding entry in the config to

mae:
  genome: /gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38_primary_assembly_genome.fa

Hope this solves the problem for now.
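The truncation described above can be reproduced in a few lines. This is a minimal sketch, assuming the dictionary filename is derived by cutting the FASTA path at its first period; both helper names and the `/ref/` paths are hypothetical and only illustrate the effect.

```python
import os


def dict_name_first_period(fasta_file):
    """Hypothetical helper: keep only the text before the FIRST period."""
    return fasta_file.split(".")[0] + ".dict"


def dict_name_last_extension(fasta_file):
    """Hypothetical alternative: strip only the final extension."""
    return os.path.splitext(fasta_file)[0] + ".dict"


dotted = "/ref/GRCh38.primary_assembly.genome.fa"
symlink = "/ref/GRCh38_primary_assembly_genome.fa"

# With extra periods in the name, the two strategies disagree.
print(dict_name_first_period(dotted))    # /ref/GRCh38.dict
print(dict_name_last_extension(dotted))  # /ref/GRCh38.primary_assembly.genome.dict

# Without extra periods, both agree, which is why the symlink helps.
print(dict_name_first_period(symlink) == dict_name_last_extension(symlink))  # True
```

Note that cutting at the first period would also misbehave if any directory in the path contained a period, so the symlink should live in a period-free location as well.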

gevro commented on July 17, 2024

I tried, but still getting errors...

MissingInputException in line 38 of /gpfs/home/evrong01/.local/lib/python3.6/site-packages/wbuild/wBuild.snakefile:
Missing input files for rule markdown:
MAE/UDP--v32_results.md
[Thu Aug 13 00:55:45 2020]
Error in rule unlock:
jobid: 0
output: /gpfs/scratch/evrong01/droptest2/.drop/tmp/MAE/unlock
shell:
snakemake --unlock --configfile /gpfs/scratch/evrong01/droptest2/.drop/tmp/config.yaml
(exited with non-zero exit code)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.snakemake/log/2020-08-13T005543.671615.snakemake.log
check for missing R packages
Structuring dependencies...
Dependencies file generated.

Building DAG of jobs...
Subworkflow AE: Nothing to be done.
Subworkflow AS: Nothing to be done.
Executing subworkflow MAE.
Structuring dependencies...
Dependencies file generated.

Building DAG of jobs...
MissingInputException in line 56 of /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile:
Missing input files for rule create_dict:
/gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38_primary_assembly.genome.fa

mumichae commented on July 17, 2024

Make sure that you specify the filename correctly, the fasta file doesn't seem to exist. Also, remove the last dot before the file ending from

/gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38_primary_assembly.genome.fa

to

/gpfs/data/reference-files/GRCh38_gencode-STAR/GRCh38_primary_assembly_genome.fa

otherwise the error will persist.

If the unlock is giving you problems, try deleting the .snakemake directory in the project directory.

gevro commented on July 17, 2024

Next bug I'm getting. This is with the conda environment...

Error: package or namespace load failed for ‘tMAE’:
package ‘tMAE’ was installed before R 4.0.0: please re-install it
Execution halted

mumichae commented on July 17, 2024

Which R version are you using?

which R

It should be the one in your conda environment

gevro commented on July 17, 2024

I'm using the drop conda environment. So if there is an issue, it is because of an issue in the drop conda environment configuration.

R version 4.0.2 (2020-06-22) -- "Taking Off Again"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-conda_cos6-linux-gnu (64-bit)

mumichae commented on July 17, 2024

Hm, seems as though R might access different library paths. Which paths do you get when you call

.libPaths()

in R in the environment? You should only have a single path that links to the conda environment.
If it has more than one path, we could have some issues reinstalling tMAE.

Also, which Bioconductor version do you have?

BiocManager::version()

gevro commented on July 17, 2024

Ok, the libPaths was sourcing my regular R first instead of the conda R. This is a bug in conda. The point of conda is to have a separate environment. However, conda still sources the original R library paths from the root user. See here for details:
https://waoverholt.com/conda-and-R/

The solution is to add this line to drop_conda/etc/conda/activate.d/activate-r-base.sh in the conda environment:

export R_LIBS=DROPCONDA_ENVIRONMENT/lib/R/library

where DROPCONDA_ENVIRONMENT is the path to the conda environment.
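To check whether R still picks up libraries from outside the environment, the paths reported by `.libPaths()` can be compared against the environment prefix. The following is only a sketch: the function name and the example paths are hypothetical, and the real values should come from your own R session and conda installation.

```python
from pathlib import PurePosixPath


def paths_outside_env(lib_paths, env_prefix):
    """Return the library paths that do not live under env_prefix."""
    prefix_parts = PurePosixPath(env_prefix).parts
    outside = []
    for path in lib_paths:
        parts = PurePosixPath(path).parts
        # A path is inside the environment if it starts with the prefix parts.
        if parts[:len(prefix_parts)] != prefix_parts:
            outside.append(path)
    return outside


# Hypothetical example values; substitute the output of .libPaths()
# and the real location of the conda environment.
env = "/home/user/drop_conda"
lib_paths = [
    "/home/user/R/x86_64-pc-linux-gnu-library/4.0",
    "/home/user/drop_conda/lib/R/library",
]
print(paths_outside_env(lib_paths, env))
```

An empty result would mean that every library path belongs to the environment; any path that is printed is a candidate for the kind of version conflict reported above.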

gevro commented on July 17, 2024

Next bug in mae conda. How do I fix this?

Started with deseq
[1] "Running DESeq..."
Error in DESeqDataSet(se, design = design, ignoreRank) :
counts matrix should be numeric, currently it has mode: logical
Calls: DESeq4MAE ... deseq_for_allele_specific_expression -> DESeqDataSetFromMatrix -> DESeqDataSet
Execution halted
[Fri Aug 14 12:31:02 2020]
Error in rule Scripts_MAE_deseq_mae_R:
jobid: 9
output: /gpfs/scratch/evrong01/droptest2/root/processed_results/mae/samples/1841982--UDP-1003_RNA_res.Rds

mumichae commented on July 17, 2024

Thanks for finding the issue. For some reason, I don't get this behaviour on my machine, despite having a system installation of R. I will need some time to figure out what is going on.

mumichae commented on July 17, 2024

> Next bug in mae conda. How do I fix this?
>
> Started with deseq
> [1] "Running DESeq..."
> Error in DESeqDataSet(se, design = design, ignoreRank) :
> counts matrix should be numeric, currently it has mode: logical
> Calls: DESeq4MAE ... deseq_for_allele_specific_expression -> DESeqDataSetFromMatrix -> DESeqDataSet
> Execution halted
> [Fri Aug 14 12:31:02 2020]
> Error in rule Scripts_MAE_deseq_mae_R:
> jobid: 9
> output: /gpfs/scratch/evrong01/droptest2/root/processed_results/mae/samples/1841982--UDP-1003_RNA_res.Rds

This error is most likely due to a failed GATK run. Try removing all the MAE output ($ROOT/processed_data where $ROOT is the project root path specified in config.yaml) before rerunning the pipeline.

gevro commented on July 17, 2024

Ok I will try removing failed MAE output. Why doesn't snakemake or drop detect this automatically and rerun? This should be fixed.

gevro commented on July 17, 2024

Deleting the mae directory in the root/processed_results folder gives the prior bug/error:
MissingInputException in line 38 of /gpfs/home/evrong01/.local/lib/python3.6/site-packages/wbuild/wBuild.snakefile:
Missing input files for rule markdown:
MAE/UDP--v32_results.md
[Fri Aug 14 12:53:06 2020]
Error in rule unlock:
jobid: 0
output: /gpfs/scratch/evrong01/droptest2/.drop/tmp/MAE/unlock
shell:
snakemake --unlock --configfile /gpfs/scratch/evrong01/droptest2/.drop/tmp/config.yaml
(exited with non-zero exit code)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.snakemake/log/2020-08-14T125304.030683.snakemake.log

gevro commented on July 17, 2024

It also gives this error:

Building DAG of jobs...
MissingInputException in line 62 of /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile:
Missing input files for rule create_SNVs:
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/resource/chr_UCSC_NCBI.txt
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/resource/chr_NCBI_UCSC.txt
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/Scripts/MAE/filterSNVs.sh
[Fri Aug 14 12:51:59 2020]
Error in rule unlock:
jobid: 0
output: /gpfs/scratch/evrong01/droptest2/.drop/tmp/MAE/unlock
shell:
snakemake --unlock --configfile /gpfs/scratch/evrong01/droptest2/.drop/tmp/config.yaml
(exited with non-zero exit code)

gevro commented on July 17, 2024

In general, mae is a lot more buggy than the other pipelines. There must be something in its foundation that causes it to have so many problems compared to the other drop pipelines.

gevro commented on July 17, 2024

This directory that it complains about doesn't even exist...
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/

I tried 'drop update' and it still doesn't work.

So the suggestion of deleting $ROOT/processed_data/mae broke the pipeline. I'm not sure how to fix it now.

gevro commented on July 17, 2024

Sorry, the mae-pipeline folder exists, but the mae-pipeline/.drop folder does not exist:

[evrong01@bigpurple-ln1 droptest2]$ ls -a /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/
. .. mae_readme.md resource Scripts Snakefile .snakemake

So this error points to files that do not exist. This is definitely a bug in the underlying code. The pipeline has path names that don't exist. This is what causes the unlock commands to have errors every time. I suspect the unlock command has several bugs.

MissingInputException in line 62 of /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile:
Missing input files for rule create_SNVs:
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/Scripts/MAE/filterSNVs.sh
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/resource/chr_UCSC_NCBI.txt
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/resource/chr_NCBI_UCSC.txt

mumichae commented on July 17, 2024

Can you print the contents of /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile ? There might be a command that doesn't always give the same result for some reason. I'd just fix it manually for now.

Yes MAE is a bit buggy, as GATK commands don't error in the same way as other commands, and require unlocking, which doesn't work as it should for submodules. We are still working on these issues on the new release. It still takes time before we establish that main features are tested more systematically, so we don't run into the same reproducibility issues as we do now.

gevro commented on July 17, 2024

I think the unlock command or drop update has a bug in how it sets the base folder location, because the same folder path appears twice:
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/Scripts/MAE/filterSNVs.sh

= /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline + .drop/modules/mae-pipeline + Scripts/MAE/filterSNVs.sh

But that is the wrong path. The correct path is: /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Scripts/MAE/filterSNVs.sh
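One way such a doubled path could arise is sketched below. It assumes (this is not confirmed anywhere in the thread) that the module directory is stored as a relative path and then resolved against whatever the current working directory happens to be; the function name is hypothetical.

```python
from pathlib import PurePosixPath

PROJECT = PurePosixPath("/gpfs/scratch/evrong01/droptest2")
MODULE_REL = PurePosixPath(".drop/modules/mae-pipeline")
SCRIPT = PurePosixPath("Scripts/MAE/filterSNVs.sh")


def script_path(working_dir):
    """Resolve the relative module path against working_dir (assumed behaviour)."""
    return working_dir / MODULE_REL / SCRIPT


# Resolved from the project root: the expected path.
print(script_path(PROJECT))

# Resolved from inside the module directory: the module path appears twice.
print(script_path(PROJECT / MODULE_REL))
```

If this is what happens, the result would depend on the directory from which the subworkflow is started, which would fit an error that only shows up in some runs.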

gevro commented on July 17, 2024

Here is the mae Snakefile
```
### SNAKEFILE MONOALLELIC EXPRESSION
import os
import drop
import pathlib

METHOD = 'MAE'
SCRIPT_ROOT = drop.getMethodPath(METHOD, type_='workdir', str_=False)
CONF_FILE = drop.getConfFile()

parser = drop.config(config, METHOD)
config = parser.parse()
include: config['wBuildPath'] + "/wBuild.snakefile"

### FUNCTIONS

def fasta_dict(fasta_file):
    return fasta_file.split('.')[0] + ".dict"

def getVcf(rna_id, vcf_id="qc"):
    if vcf_id == "qc":
        return config["mae"]["qcVcf"]
    else:
        return parser.getProcDataDir() + f"/mae/snvs/{vcf_id}--{rna_id}.vcf.gz"

def getQC(format):
    if format == "UCSC":
        return config["mae"]["qcVcf"]
    elif format == "NCBI":
        return parser.getProcDataDir() + "/mae/qc_vcf_ncbi.vcf.gz"
    else:
        raise ValueError(f"getQC: {format} is an invalid chromosome format")

def getChrMap(SCRIPT_ROOT, conversion):
    if conversion == 'ncbi2ucsc':
        return SCRIPT_ROOT/"resource"/"chr_NCBI_UCSC.txt"
    elif conversion == 'ucsc2ncbi':
        return SCRIPT_ROOT/"resource"/"chr_UCSC_NCBI.txt"
    else:
        raise ValueError(f"getChrMap: {conversion} is an invalid conversion option")

def getScript(type, name):
    return SCRIPT_ROOT/"Scripts"/type/name

rule all:
    input:
        rules.Index.output,
        config["htmlOutputPath"] + "/mae_readme.html",
        rules.Scripts_MAE_Datasets_R.output,
        rules.Scripts_QC_Datasets_R.output
    output: touch(drop.getMethodPath(METHOD, type_='final_file'))

rule sampleQC:
    input: rules.Scripts_QC_Datasets_R.output
    output: touch(drop.getTmpDir() + "/sampleQC.done")

rule create_dict:
    input: config['mae']['genome']
    output: fasta_dict(config['mae']['genome'])
    shell: "gatk CreateSequenceDictionary --REFERENCE {input[0]}"

### MAE

rule create_SNVs:
    input:
        ncbi2ucsc = getChrMap(SCRIPT_ROOT, "ncbi2ucsc"),
        ucsc2ncbi = getChrMap(SCRIPT_ROOT, "ucsc2ncbi"),
        vcf_file = lambda wildcards: parser.getFilePath(sampleId=wildcards.vcf,
                                                        file_type='DNA_VCF_FILE'),
        bam_file = lambda wildcards: parser.getFilePath(sampleId=wildcards.rna,
                                                        file_type='RNA_BAM_FILE'),
        script = getScript("MAE", "filterSNVs.sh")
    output:
        snvs_filename=parser.getProcDataDir() + "/mae/snvs/{vcf}--{rna}.vcf.gz",
        snvs_index=parser.getProcDataDir() + "/mae/snvs/{vcf}--{rna}.vcf.gz.tbi"
    shell:
        """
        {input.script} {input.ncbi2ucsc} {input.ucsc2ncbi} {input.vcf_file} \
        {wildcards.vcf} {input.bam_file} {output.snvs_filename} \
        {config[tools][bcftoolsCmd]} {config[tools][samtoolsCmd]}
        """

rule allelic_counts:
    input:
        ncbi2ucsc = getChrMap(SCRIPT_ROOT, "ncbi2ucsc"),
        ucsc2ncbi = getChrMap(SCRIPT_ROOT, "ucsc2ncbi"),
        vcf_file = lambda wildcards: getVcf(wildcards.rna, wildcards.vcf),
        bam_file = lambda wildcards: parser.getFilePath(sampleId=wildcards.rna,
                                                        file_type='RNA_BAM_FILE'),
        fasta = config['mae']['genome'],
        dict = fasta_dict(config['mae']['genome']),
        script = getScript("MAE", "ASEReadCounter.sh")
    output:
        counted = parser.getProcDataDir() + "/mae/allelic_counts/{vcf}--{rna}.csv.gz"
    shell:
        """
        {input.script} {input.ncbi2ucsc} {input.ucsc2ncbi} \
        {input.vcf_file} {input.bam_file} {wildcards.vcf}--{wildcards.rna} \
        {input.fasta} {config[mae][gatkIgnoreHeaderCheck]} {output.counted} \
        {config[tools][bcftoolsCmd]}
        """

### QC

rule renameChrQC:
    input:
        ucsc2ncbi = getChrMap(SCRIPT_ROOT, "ucsc2ncbi"),
        ncbi_vcf = getQC(format="UCSC")
    output:
        ncbi_vcf = getQC(format="NCBI")
    shell:
        """
        bcftools={config[tools][bcftoolsCmd]}
        echo 'converting from UCSC to NCBI format'
        $bcftools annotate --rename-chrs {input.ucsc2ncbi} {input.ncbi_vcf} \
            | bgzip > {output.ncbi_vcf}
        $bcftools index -t {output.ncbi_vcf}
        """

rule allelic_counts_qc:
    input:
        ncbi2ucsc = getChrMap(SCRIPT_ROOT, "ncbi2ucsc"),
        ucsc2ncbi = getChrMap(SCRIPT_ROOT, "ucsc2ncbi"),
        vcf_file_ucsc = getQC(format="UCSC"),
        vcf_file_ncbi = getQC(format="NCBI"),
        bam_file = lambda wildcards: parser.getFilePath(sampleId=wildcards.rna,
                                                        file_type='RNA_BAM_FILE'),
        fasta = config['mae']['genome'],
        dict = fasta_dict(config['mae']['genome']),
        script_qc = getScript("QC", "ASEReadCounter.sh"),
        script_mae = getScript("MAE", "ASEReadCounter.sh")
    output:
        counted = parser.getProcDataDir() + "/mae/allelic_counts/qc_{rna}.csv.gz"
    shell:
        """
        {input.script_qc} {input.ncbi2ucsc} {input.ucsc2ncbi} \
        {input.vcf_file_ucsc} {input.vcf_file_ncbi} {input.bam_file} \
        {wildcards.rna} {input.fasta} {config[mae][gatkIgnoreHeaderCheck]} \
        {output.counted} {config[tools][bcftoolsCmd]} \
        {config[tools][samtoolsCmd]} {input.script_mae}
        """

rulegraph_filename = f'{config["htmlOutputPath"]}/{METHOD}_rulegraph'
rule produce_rulegraph:
    input:
        expand(rulegraph_filename + ".{fmt}", fmt=["svg", "png"])

rule create_graph:
    output:
        svg = f"{rulegraph_filename}.svg",
        png = f"{rulegraph_filename}.png"
    shell:
        """
        snakemake --configfile {CONF_FILE} --rulegraph | dot -Tsvg > {output.svg}
        snakemake --configfile {CONF_FILE} --rulegraph | dot -Tpng > {output.png}
        """

rule unlock:
    output: touch(drop.getMethodPath(METHOD, type_="unlock"))
    shell: "snakemake --unlock --configfile {CONF_FILE}"
```

gevro commented on July 17, 2024

The bugs are not from gatk. The bugs are from the mae drop scripts. There is an issue in some path setting and the unlock scripts. There is no feasible way to run mae.

There's a bunch of unlock errors that happen due to bugs in the mae scripts themselves. This is before any gatk is ever run:

Building DAG of jobs...
MissingInputException in line 116 of /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile:
Missing input files for rule allelic_counts_qc:
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/Scripts/QC/ASEReadCounter.sh
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/resource/chr_NCBI_UCSC.txt
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/resource/chr_UCSC_NCBI.txt
/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.drop/modules/mae-pipeline/Scripts/MAE/ASEReadCounter.sh
[Fri Aug 14 13:11:34 2020]
Error in rule unlock:
jobid: 0
output: /gpfs/scratch/evrong01/droptest2/.drop/tmp/MAE/unlock
shell:
snakemake --unlock --configfile /gpfs/scratch/evrong01/droptest2/.drop/tmp/config.yaml
(exited with non-zero exit code)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/.snakemake/log/2020-08-14T131129.648275.snakemake.log

mumichae commented on July 17, 2024

Yes, I know that error, but it's difficult to find out why it works sometimes and breaks at other times.

Try creating the empty processed_data/mae directory if it doesn't exist. That may solve the issue. Otherwise, try replacing the SCRIPT_ROOT definition in .drop/modules/mae-pipeline/Snakefile with SCRIPT_ROOT = os.getcwd(). Note that this modification will be overwritten every time you call drop update.

mumichae commented on July 17, 2024

> The bugs are not from gatk. The bugs are from the mae drop scripts. There is an issue in some path setting and the unlock scripts. There is no feasible way to run mae.
>
> There's a bunch of unlock errors that happen due to bugs in the mae scripts themselves. This is before any gatk is ever run

The fact that Scripts_MAE_deseq_mae_R was called before it failed means that gatk was run. However errors in the gatk command don't get recognised by snakemake, so it only occurs when the broken output is read in R. That's why I was trying to get you to rerun the mae pipeline from scratch.

mumichae commented on July 17, 2024

Seeing as you are working with the demo and that we have resolved the dependency issues, it might be easier to just redo the setup in a new empty directory, instead of trying to isolate the mae pipeline for now.

gevro commented on July 17, 2024

Neither of these things fixes the problem. Changing the SCRIPT_ROOT = os.getcwd() now gives this error when running snakemake unlock:
TypeError in line 34 of /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile:
unsupported operand type(s) for /: 'str' and 'str'
File "/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile", line 64, in <module>
File "/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile", line 34, in getChrMap
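
This TypeError is what Python raises when the `/` operator is applied to two plain strings: `os.getcwd()` returns a `str`, while the helpers in the Snakefile join paths with `/`, which needs a `pathlib.Path`. The sketch below uses a simplified stand-in for `getChrMap` (the name `get_chr_map` is hypothetical) to show the failure and one possible workaround, wrapping the working directory in `Path`. Whether the working directory is the right value for SCRIPT_ROOT at all is a separate question that this sketch does not settle.

```python
import os
from pathlib import Path


def get_chr_map(script_root, conversion):
    """Simplified stand-in for getChrMap in the Snakefile."""
    if conversion == "ncbi2ucsc":
        return script_root / "resource" / "chr_NCBI_UCSC.txt"
    elif conversion == "ucsc2ncbi":
        return script_root / "resource" / "chr_UCSC_NCBI.txt"
    raise ValueError(f"{conversion} is an invalid conversion option")


# os.getcwd() returns a plain string, and str does not support "/".
try:
    get_chr_map(os.getcwd(), "ncbi2ucsc")
except TypeError as err:
    print("str fails:", err)

# Wrapping the string in Path restores the "/" joining used by the helpers.
print(get_chr_map(Path(os.getcwd()), "ncbi2ucsc").name)  # chr_NCBI_UCSC.txt
```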

gevro commented on July 17, 2024

How do I redo the setup in a new empty directory without having rerun all the other pipelines, which takes 3 days of very significant compute resources?

mumichae commented on July 17, 2024

It takes 3 days to execute the demo? It should only take about 20 min. And the total input is just about 600 MB, so total memory consumption shouldn't be high. What are your resources, and what system are you working on?

gevro commented on July 17, 2024

I ran drop for 110 samples, not the demo. Is there any way to transfer all the results of the other pipelines (except for mae) to the new directory?

mumichae commented on July 17, 2024

First, try to run drop on a new instance of the drop demo to see if the problem still occurs. Right now I don't have a good explanation for why the code suddenly breaks, as it worked before.

gevro commented on July 17, 2024

I tried now in a new directory-- drop demo works. Deleting the mae directory in root for some reason causes the directory to no longer work, and drop update doesn't fix it. And unlock commands also stop working for mae.

But because the mae pipeline sometimes crashes and then needs to be reset, there should be a way to reset the mae pipeline to the beginning.

In the meantime, let me know if there is a way to move results from a prior directory to a new directory, except for mae.

mumichae commented on July 17, 2024

OK, good to know that that's a way we can fix the issue. You should definitely keep a copy of the output you have at the moment, before you have a full pipeline run. Also be careful if you save in a scratch directory, as these are often used for temporary data.
You can simply copy the output to the new directory, run drop init and adapt the config.yaml. Always double-check with a dry run (snakemake -n) before running the pipeline fully. In order to prevent the pipeline from overwriting your output, call snakemake --touch. This should update the timestamps of all the necessary and existing output files. If snakemake -n still tells you to redo the steps, there might be some upstream input missing in the output.

gevro commented on July 17, 2024

Thanks. Which specific directories do I need to copy from the old directory to the new directory?

mumichae commented on July 17, 2024

Create the new project first, call snakemake -n, then copy the files. The computationally expensive files are in the processed_data/ and processed_results/ directories. These should be autocreated as empty directories in your new project directory, so make sure they exist before copying. Copy the directories of each submodule separately

processed_data/aberrant_expression/
processed_data/aberrant_splicing
processed_results/...
etc

and don't touch the mae directory.
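
The copy step could be scripted roughly as follows. This is a sketch under stated assumptions, not part of drop: the function name `copy_results` is hypothetical, the two root arguments stand for the old and new `root` directories configured in config.yaml, and it assumes the mae output lives in a subdirectory literally named `mae`.

```python
import shutil
from pathlib import Path


def copy_results(old_root, new_root, skip=("mae",)):
    """Copy each submodule directory from old_root to new_root, except those in skip.

    Only directories directly below processed_data/ and processed_results/
    are copied; loose files at that level are ignored.
    Requires Python 3.8+ for dirs_exist_ok.
    """
    copied = []
    for top in ("processed_data", "processed_results"):
        src_top = Path(old_root) / top
        if not src_top.is_dir():
            continue
        for sub in sorted(src_top.iterdir()):
            if not sub.is_dir() or sub.name in skip:
                continue
            dest = Path(new_root) / top / sub.name
            # copytree uses shutil.copy2 by default, which keeps file timestamps.
            shutil.copytree(sub, dest, dirs_exist_ok=True)
            copied.append(f"{top}/{sub.name}")
    return copied


# Example call (paths are placeholders):
# copy_results("/path/to/old/root", "/path/to/new/root")
```

The returned list names the submodule directories that were copied, which makes it easy to confirm that nothing from mae was carried over before running `snakemake -n`.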

gevro commented on July 17, 2024

I made a new directory, then I did drop init, then drop update, then I copied the config.yaml and samples.tsv file, then snakemake -n. It ran fine. However, there is no root directory. I only have these files in the directory...

-rw-rw---- 1 evrong01 evrong01 1.8K Aug 15 15:54 config.yaml
-rw-rw---- 1 evrong01 evrong01 0 Aug 15 15:54 readme.md
-rwxrwx--- 1 evrong01 evrong01 952 Aug 15 15:55 runpipeline.sh
-rw-rw---- 1 evrong01 evrong01 179K Aug 15 15:54 samples.tsv
drwxrwx--- 6 evrong01 evrong01 4.0K Aug 15 15:54 Scripts
-rw-rw---- 1 evrong01 evrong01 2.9K Aug 15 15:54 Snakefile

gevro commented on July 17, 2024

Now I created the root directory manually. Then I copied all the processed_data and processed_results except for mae. Then I ran snakemake --touch and it ran fine. Then I ran snakemake -n and it ran fine.

But now I am running snakemake unlock again, and again I'm getting the same error. There must be some bug in the mae code.

Building DAG of jobs...
MissingInputException in line 38 of /gpfs/home/evrong01/.local/lib/python3.6/site-packages/wbuild/wBuild.snakefile:
Missing input files for rule markdown:
MAE/UDP--v32_results.md
[Sat Aug 15 16:02:50 2020]
Error in rule unlock:
jobid: 0
output: /gpfs/scratch/evrong01/droptest3/.drop/tmp/MAE/unlock
shell:
snakemake --unlock --configfile /gpfs/scratch/evrong01/droptest3/.drop/tmp/config.yaml
(exited with non-zero exit code)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /gpfs/scratch/evrong01/droptest3/.drop/modules/mae-pipeline/.snakemake/log/2020-08-15T160249.624667.snakemake.log

gevro commented on July 17, 2024

Also, running 'snakemake aberrantExpression' finishes and says "Nothing to be done.", but it didn't recreate the html_output directory.

So just moving the processed_data and processed_results files to the new directory doesn't help, because then it doesn't recreate the html_output directory. I guess I can just copy those over too.

But still the problem is I can't get mae to run. Unlock doesn't work, and there is no way around it.

gevro commented on July 17, 2024

Also, if I run snakemake mae without doing unlock in the new directory, it also says "Nothing to be done", even though there are no mae results in the folder. I suspect that snakemake --touch makes it think that mae is already finished.

gevro commented on July 17, 2024

And if I run snakemake mae without doing unlock in the old directory, I get this error. Maybe this is where the bug is.

Subworkflow AE: Nothing to be done.
Subworkflow AS: Nothing to be done.
Executing subworkflow MAE.
Structuring dependencies...
Dependencies file generated.

TypeError in line 34 of /gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile:
unsupported operand type(s) for /: 'str' and 'str'
File "/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile", line 64, in <module>
File "/gpfs/scratch/evrong01/droptest2/.drop/modules/mae-pipeline/Snakefile", line 34, in getChrMap

gevro commented on July 17, 2024

In the new directory, I manually deleted the two MAE.done files and now mae seems to run. However, it hit another new error:

Error in eval(jsub, SDenv, parent.frame()) :
object 'gene_status' not found

Calls: [ -> [.data.table -> eval -> eval
Execution halted
[Sat Aug 15 16:19:32 2020]
Error in rule Scripts_MAE_gene_name_mapping_R:
jobid: 14
output: /gpfs/scratch/evrong01/droptest3/root/processed_data/mae/gene_name_mapping_v32.tsv

mumichae commented on July 17, 2024

I think things will just get too messy from here. I'd suggest that you try to use the version of drop that I'm still developing on, as it will make reruns so much easier. The output structure of the HTML will be a bit different and it will still miss some features, but the pipeline core should be fully functional.

If you already have a local clone of the drop repo, add a new remote pointing to git@github.com:mumichae/drop.git and fetch it:

git remote add mumichae git@github.com:mumichae/drop.git
git fetch mumichae

Otherwise just clone the above URL.

Then checkout import_counts

git checkout import_counts

Next, you need to install drop into your current drop conda environment (as you already have all the dependencies).
Also, you need to use a different version of wbuild, as I am developing there as well. So you need to remove the current version of wbuild first.

conda remove wbuild
pip install -e <root_of_drop_repo> # should install the correct version of wbuild

Let me know once you have installed the new version and whether you can get the drop demo running with dryrun and proper execution. Then we can proceed.

gevro commented on July 17, 2024

I'm getting an installation error:
(/gpfs/home/evrong01/bin/drop_conda) [evrong01@bigpurple-ln2 drop_dev]$ pip install -e .
Traceback (most recent call last):
File "/gpfs/home/evrong01/bin/drop_conda/bin/pip", line 6, in <module>
from pip._internal.cli.main import main
ModuleNotFoundError: No module named 'pip._internal.cli.main'

mumichae commented on July 17, 2024

Seems as though your pip is not working. It could be that a wrong version is referenced (seeing as it has worked before). Make sure you are using the pip provided by anaconda. conda list should definitely contain pip. Its version should match with pip --version

gevro commented on July 17, 2024

conda remove wbuild broke pip. I reinstalled pip and then managed to install the alternate drop version.

snakemake -n gives this next error:

The downloaded source packages are in
‘/tmp/RtmpkQB0PU/downloaded_packages’
Installation path not writeable, unable to update packages: AnnotationDbi,
AnnotationHub, BiocFileCache, bit, bit64, broom, data.table, DelayedArray,
dplyr, DT, ExperimentHub, fs, GenomicFeatures, Hmisc, httr, loo, mice,
pillar, pkgbuild, ps, quantreg, RcppArmadillo, Rhdf5lib, rlang, rstan,
StanHeaders, SummarizedExperiment, sys, tibble, tidyr, vctrs, xfun, XML
installed OUTRIDER
install c-mertes/FRASER
Bioconductor version 3.11 (BiocManager 1.30.10), R 4.0.0 (2020-04-24)
Installing github package(s) 'c-mertes/FRASER'
Error: package 'remotes' not installed in library path(s)
/gpfs/data/bin/Gilad/R/x86_64-pc-linux-gnu-library/4.0
/gpfs/share/apps/R/4.0.0/lib64/R/library
install with 'install("remotes")'
Execution halted
CalledProcessError in line 5 of /gpfs/scratch/evrong01/dropdemo/Snakefile:
Command '['Rscript', PosixPath('/gpfs/data/bin/drop_dev/drop/installRPackages.R'), PosixPath('/gpfs/data/bin/drop_dev/drop/requirementsR.txt')]' returned non-zero exit status 1.
File "/gpfs/scratch/evrong01/dropdemo/Snakefile", line 5, in <module>
File "/gpfs/data/bin/drop_dev/drop/setupDrop.py", line 33, in installRPackages
File "/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/subprocess.py", line 369, in check_returncode

I went into R and manually installed remotes. Then I get this next error:

  • installing source package ‘Rhdf5lib’ ...
    ** using staged installation
    checking for gcc... gcc
    checking whether the C compiler works... yes
    checking for C compiler default output file name... a.out
    checking for suffix of executables...
    checking whether we are cross compiling... no
    checking for suffix of object files... o
    checking whether we are using the GNU C compiler... yes
    checking whether gcc accepts -g... yes
    checking for gcc option to accept ISO C89... none needed
    checking whether we are using the GNU C++ compiler... yes
    checking whether g++ -std=gnu++11 accepts -g... yes
    checking whether C++ compiler accepts -w... yes
    checking how to run the C++ preprocessor... g++ -std=gnu++11 -E
    checking for grep that handles long lines and -e... /usr/bin/grep
    checking for egrep... /usr/bin/grep -E
    checking for ANSI C header files... yes
    checking for sys/types.h... yes
    checking for sys/stat.h... yes
    checking for stdlib.h... yes
    checking for string.h... yes
    checking for memory.h... yes
    checking for strings.h... yes
    checking for inttypes.h... yes
    checking for stdint.h... yes
    checking for unistd.h... yes
    checking for zlib.h... yes
    checking curl/curl.h usability... yes
    checking curl/curl.h presence... yes
    checking for curl/curl.h... yes
    checking openssl/evp.h usability... yes
    checking openssl/evp.h presence... yes
    checking for openssl/evp.h... yes
    checking openssl/hmac.h usability... yes
    checking openssl/hmac.h presence... yes
    checking for openssl/hmac.h... yes
    checking openssl/sha.h usability... yes
    checking openssl/sha.h presence... yes
    checking for openssl/sha.h... yes
    checking for curl_global_init in -lcurl... no
    checking for EVP_sha256 in -lcrypto... yes
    untarring hdf5small_cxx_hl_1.10.6.tar.gz ...
    building the szip library...
    checking for a BSD-compatible install... /usr/bin/install -c
    checking whether build environment is sane... yes
    checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
    checking for gawk... gawk
    checking whether make sets $(MAKE)... yes
    checking whether make supports nested variables... yes
    checking whether to enable maintainer-specific portions of Makefiles... no
    checking build system type... x86_64-unknown-linux-gnu
    checking host system type... x86_64-unknown-linux-gnu
    checking for config x86_64-unknown-linux-gnu... no
    checking for config x86_64-unknown-linux-gnu... no
    checking for config unknown-linux-gnu... no
    checking for config unknown-linux-gnu... no
    checking for config x86_64-linux-gnu... no
    checking for config x86_64-linux-gnu... no
    checking for config x86_64-unknown... no
    checking for config linux-gnu... found
    compiler 'gcc' is GNU gcc-6.3.0
    checking for gcc... gcc
    checking whether the C compiler works... yes
    checking for C compiler default output file name... a.out
    checking for suffix of executables...
    checking whether we are cross compiling... no
    checking for suffix of object files... o
    checking whether we are using the GNU C compiler... yes
    checking whether gcc accepts -g... yes
    checking for gcc option to accept ISO C89... none needed
    checking whether gcc understands -c and -o together... yes
    checking for style of include used by make... GNU
    checking dependency style of gcc... gcc3
    checking how to run the C preprocessor... /gpfs/home/evrong01/bin/drop_conda/bin/x86_64-conda_cos6-linux-gnu-cpp
    configure: error: in `/tmp/RtmpxYUhev/R.INSTALL7409582d709f/Rhdf5lib/src/hdf5/szip':
    configure: error: C preprocessor "/gpfs/home/evrong01/bin/drop_conda/bin/x86_64-conda_cos6-linux-gnu-cpp" fails sanity check
    See `config.log' for more details
    make: *** No targets specified and no makefile found. Stop.
    make: *** No rule to make target `install'. Stop.
    building the hdf5 library...
    checking for a BSD-compatible install... /usr/bin/install -c
    checking whether build environment is sane... yes
    checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
    checking for gawk... gawk
    checking whether make sets $(MAKE)... yes
    checking whether make supports nested variables... yes
    checking whether make supports nested variables... (cached) yes
    checking whether to enable maintainer-specific portions of Makefiles... no
    checking build system type... x86_64-unknown-linux-gnu
    checking host system type... x86_64-unknown-linux-gnu
    checking shell variables initial values... done
    checking if basename works... yes
    checking if xargs works... yes
    checking for cached host... none
    checking for config x86_64-unknown-linux-gnu... no
    checking for config x86_64-unknown-linux-gnu... no
    checking for config unknown-linux-gnu... no
    checking for config unknown-linux-gnu... no
    checking for config x86_64-linux-gnu... no
    checking for config x86_64-linux-gnu... no
    checking for config x86_64-unknown... no
    checking for config linux-gnu... found
    compiler 'gcc' is GNU gcc-6.3.0
    compiler 'g++ -std=gnu++11' is GNU g++-6.3.0
    checking for config ./config/site-specific/host-bigpurple-ln2... no
    checking build mode... production
    checking for gcc... gcc
    checking whether the C compiler works... yes
    checking for C compiler default output file name... a.out
    checking for suffix of executables...
    checking whether we are cross compiling... no
    checking for suffix of object files... o
    checking whether we are using the GNU C compiler... yes
    checking whether gcc accepts -g... yes
    checking for gcc option to accept ISO C89... none needed
    checking whether gcc understands -c and -o together... yes
    checking for style of include used by make... GNU
    checking dependency style of gcc... gcc3
    checking if unsupported combinations of configure options are allowed... no
    checking how to run the C preprocessor... /gpfs/home/evrong01/bin/drop_conda/bin/x86_64-conda_cos6-linux-gnu-cpp
    configure: error: in `/tmp/RtmpxYUhev/R.INSTALL7409582d709f/Rhdf5lib/src/hdf5':
    configure: error: C preprocessor "/gpfs/home/evrong01/bin/drop_conda/bin/x86_64-conda_cos6-linux-gnu-cpp" fails sanity check
    See `config.log' for more details
    sh configure
    configure: error: cannot find sources (src/H5.c) in /gpfs/share/apps/samtools/1.9/bcftools-1.9 or ..
    make: *** [_config] Error 1
    configure: HDF5_INCLUDE=hdf5/src
    configure: HDF5_CXX_INCLUDE=hdf5/c++/src
    configure: HDF5_HL_INCLUDE=hdf5/hl/src
    configure: HDF5_HL_CXX_INCLUDE=hdf5/hl/c++/src
    configure: HDF5_LIB=hdf5/src/.libs/libhdf5.a
    configure: HDF5_CXX_LIB=hdf5/c++/src/.libs/libhdf5_cpp.a
    configure: HDF5_HL_LIB=hdf5/hl/src/.libs/libhdf5_hl.a
    configure: HDF5_HL_CXX_LIB=hdf5/hl/c++/src/.libs/libhdf5_hl_cpp.a
    configure: SZIP_LIB=hdf5/szip/src/.libs/libsz.a
    configure: creating ./config.status
    config.status: creating src/Makevars
    ** libs
    mkdir -p "/gpfs/data/bin/Gilad/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-Rhdf5lib/00new/Rhdf5lib/include"
    cp "hdf5/src/"*.h "/gpfs/data/bin/Gilad/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-Rhdf5lib/00new/Rhdf5lib/include"
    cp "hdf5/c++/src/"*.h "/gpfs/data/bin/Gilad/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-Rhdf5lib/00new/Rhdf5lib/include"
    cp "hdf5/hl/src/"*.h "/gpfs/data/bin/Gilad/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-Rhdf5lib/00new/Rhdf5lib/include"
    cp "hdf5/hl/c++/src/"*.h "/gpfs/data/bin/Gilad/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-Rhdf5lib/00new/Rhdf5lib/include"
    mkdir -p "/gpfs/data/bin/Gilad/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-Rhdf5lib/00new/Rhdf5lib/lib/"
    cp "hdf5/src/.libs/libhdf5.a" "/gpfs/data/bin/Gilad/R/x86_64-pc-linux-gnu-library/4.0/00LOCK-Rhdf5lib/00new/Rhdf5lib/lib/"
    cp: cannot stat ‘hdf5/src/.libs/libhdf5.a’: No such file or directory
    make: *** [copying] Error 1
    ERROR: compilation failed for package ‘Rhdf5lib’
  • removing ‘/gpfs/data/bin/Gilad/R/x86_64-pc-linux-gnu-library/4.0/Rhdf5lib’
    Error: Failed to install 'FRASER' from GitHub:
    (converted from warning) installation of package ‘Rhdf5lib’ had non-zero exit status
    Execution halted
    CalledProcessError in line 5 of /gpfs/scratch/evrong01/dropdemo/Snakefile:
    Command '['Rscript', PosixPath('/gpfs/data/bin/drop_dev/drop/installRPackages.R'), PosixPath('/gpfs/data/bin/drop_dev/drop/requirementsR.txt')]' returned non-zero exit status 1.
    File "/gpfs/scratch/evrong01/dropdemo/Snakefile", line 5, in
    File "/gpfs/data/bin/drop_dev/drop/setupDrop.py", line 33, in installRPackages
    File "/gpfs/share/apps/python/cpu/3.6.5/lib/python3.6/subprocess.py", line 369, in check_returncode

from drop.

gevro avatar gevro commented on July 17, 2024

Have you considered distributing drop as a docker? It might solve all of these installation/configuration issues.

from drop.

mumichae avatar mumichae commented on July 17, 2024

Yes, we have a docker at https://github.com/c-mertes/docker_drop, but it uses the old version of drop, and we would have to take the same steps to update it. You can try to use that instead, but you'll still need to update drop, as the bugs you are experiencing are present in that version. Just note that it needs a considerable amount of storage (15GB compressed). I haven't managed to test it on my machine yet due to limited storage, so I can only help you with issues that are not related to docker.

As for your error message, it seems as though you aren't accessing the conda environment properly, as other versions of R are still being picked up. In conda, you shouldn't need to reinstall the R packages, as they should already exist. Check your PATH variable (echo $PATH), as it could list locations of R and python ahead of the conda ones. And make sure you are using the conda R version, not any other local one.
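The PATH check described above can be sketched as a small shell helper. This is only an illustration: `check_in_env` is a hypothetical function name, and it assumes the environment was activated with `conda activate`, which sets `CONDA_PREFIX`.

```shell
# Hypothetical helper: report whether a command resolves inside the active conda env.
# Assumes `conda activate` has set CONDA_PREFIX; returns 0 only if the tool lives under it.
check_in_env() {
    tool="$1"
    prefix="${CONDA_PREFIX:-}"
    if [ -z "$prefix" ]; then
        echo "no conda environment is active (CONDA_PREFIX is empty)"
        return 2
    fi
    # command -v prints the path that the shell would actually execute
    resolved="$(command -v "$tool" 2>/dev/null || true)"
    if [ -z "$resolved" ]; then
        echo "$tool: not found on PATH"
        return 1
    fi
    case "$resolved" in
        "$prefix"/*) echo "$tool: $resolved (inside conda env)" ;;
        *) echo "$tool: $resolved (OUTSIDE conda env)"; return 1 ;;
    esac
}

# Check the interpreters the pipeline relies on; failures are reported, not fatal.
for tool in R Rscript python; do
    check_in_env "$tool" || true
done
```

If R or Rscript is reported as outside the environment, an earlier PATH entry (for example a cluster module) is shadowing the conda copy.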

You can decide which environment to use. For docker, note that you'll have to mount your data in the specific folder structure as described in the README https://github.com/c-mertes/docker_drop. I'm not sure if using your old output data works, but you can definitely give it a try.

from drop.

gevro avatar gevro commented on July 17, 2024

Are you updating docker_drop regularly on https://github.com/c-mertes/docker_drop? Does it have the most up-to-date version of drop?


When I uninstalled wbuild, it messed up the conda environment. It removed and switched many dependencies and for some reason caused R to get uninstalled. I'm not sure why.

This is why in my opinion dockers are better. Conda is in general a buggy system, because it is not completely disconnected from the local environment and therefore it is not good at handling complicated dependencies. Even though docker requires more memory, it will save users the headache of trying to get the pipeline working.

from drop.

gevro avatar gevro commented on July 17, 2024

I fixed the R PATH issue for the conda environment. But snakemake -n for the demo still gave an error:
Warning messages:
1: In install.packages(...) :
installation of package ‘BiocFileCache’ had non-zero exit status
2: In install.packages(...) :
installation of package ‘biomaRt’ had non-zero exit status
3: In install.packages(...) :
installation of package ‘GenomicFeatures’ had non-zero exit status
4: In install.packages(...) :
installation of package ‘VariantAnnotation’ had non-zero exit status
5: In INSTALL(packages[i, 1]) :
installation of package ‘ggplot2’ had non-zero exit status
AttributeError in line 12 of /gpfs/scratch/evrong01/droptest3/Snakefile:
'Config' object has no attribute 'getConfig'
File "/gpfs/scratch/evrong01/droptest3/Snakefile", line 12, in
File "/gpfs/data/bin/drop_dev/drop/config/DropConfig.py", line 26, in __init__

I tried to install these manually, and it looks like the error is because dplyr and associated packages were not installed. The manual install gives an option to install them, so it must be that when DROP runs the demo it doesn't answer 'yes' to installing dependencies such as dplyr.

from drop.

gevro avatar gevro commented on July 17, 2024

I succeeded in getting all the packages installed. Now I'm getting this error for the demo:

droptest3]$ snakemake -n
check for missing R packages
AttributeError in line 12 of /gpfs/scratch/evrong01/droptest3/Snakefile:
'Config' object has no attribute 'getConfig'
File "/gpfs/scratch/evrong01/droptest3/Snakefile", line 12, in
File "/gpfs/data/evronylab/bin/drop_dev/drop/config/DropConfig.py", line 26, in __init__

from drop.

mumichae avatar mumichae commented on July 17, 2024

That's a bit surprising, as you had all the dependencies before and wbuild doesn't depend on any R packages, so removing wbuild shouldn't remove any R dependencies. Are you still using the correct R library path .libPaths()? You need to make sure that the standard packages are running properly in R (and don't have to be reinstalled, as they should already be present in conda). Otherwise you will get errors later on.

If you want to use the docker instead, you only need to download the developer version of drop (as you did before on your local machine) and remove wbuild using pip, as described below. Then you can reinstall the newest drop version from the local repo. Don't forget to prepare and mount your input data as described in the README.

Updating DROP and wbuild
The error you got is what we want, as you are now using the new drop version. Unfortunately wbuild doesn't seem to have been uninstalled properly so try with

pip uninstall wbuild

then conda list should not contain wbuild anymore.
Continue with reinstalling drop

pip install -e <path-to-drop>

And make sure wbuild is installed. In case that doesn't work, try pip uninstall drop before reinstalling it.

Then you should have the correct dependencies to get a successful dryrun.
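The update steps above can be collected into one function. This is a minimal sketch, not part of DROP: `update_drop` and the `DRY_RUN` switch are hypothetical names, and the repo path stays whatever your `<path-to-drop>` is. By default it only prints the pip commands so they can be reviewed first.

```shell
# Hypothetical wrapper around the steps above; prints commands unless DRY_RUN=0.
update_drop() {
    repo="$1"
    if [ ! -d "$repo" ]; then
        echo "not a directory: $repo" >&2
        return 1
    fi
    # With DRY_RUN=1 (the default) each command is echoed instead of executed.
    if [ "${DRY_RUN:-1}" = "1" ]; then run="echo"; else run=""; fi
    $run pip uninstall -y wbuild || return 1   # remove the stale wbuild
    $run pip install -e "$repo" || return 1    # reinstall drop from the local repo
    $run pip show wbuild                       # non-zero exit if wbuild is missing
}

# Usage (the path is a placeholder):
#   update_drop /path/to/drop             # print the commands only
#   DRY_RUN=0 update_drop /path/to/drop   # actually run them
```

If `pip show wbuild` fails after a real run, that matches the fallback mentioned above: run pip uninstall drop and reinstall.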

from drop.

gevro avatar gevro commented on July 17, 2024

For some reason now gatk is not being found in my conda environment. It was there before. I think it's too complicated. I'm not sure what to try next. If there is a simpler way to setup the drop environment, I'm happy to try, but conda seems complicated. If you have a docker that is built and ready to go, I'm happy to try that.

from drop.

mumichae avatar mumichae commented on July 17, 2024

It seems as though you aren't properly accessing your conda environment. You can verify with conda env list to see which one you're in.
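As a rough sketch of that check: `conda env list` marks the active environment with an asterisk, so the name can be pulled out with awk. `active_env_from_list` is a hypothetical helper, and the column layout it relies on is an assumption that may differ between conda versions.

```shell
# Hypothetical helper: read `conda env list` output on stdin and print the
# environment marked active with '*'. Comment lines starting with '#' are skipped.
active_env_from_list() {
    awk '!/^#/ && $2 == "*" { print $1 }   # named env: name, *, path
         !/^#/ && $1 == "*" { print $2 }   # unnamed env: *, path'
}

# Only query conda if it is available in this shell.
if command -v conda >/dev/null 2>&1; then
    conda env list | active_env_from_list
fi

# Tools such as gatk should then resolve from that environment.
command -v gatk || echo "gatk not on PATH"
```

If the printed name is not the DROP environment, activate it before checking for gatk again.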

If you are working with docker, just follow the instructions in https://github.com/c-mertes/docker_drop and then follow my previous instructions on updating to the developer version of drop (git@github.com:mumichae/drop.git, branch import_counts) and wbuild (git@github.com:mumichae/wBuild.git, branch subindex). As they are still work in progress, there is no dedicated docker container for that version.

from drop.

gevro avatar gevro commented on July 17, 2024

Thanks. To make it more simple for users, do you have a docker that is already built and confirmed to work on docker hub?

from drop.

mumichae avatar mumichae commented on July 17, 2024

Well yes, we have a prebuilt docker container that is about 15GB in compressed size. You don't need to rebuild it, just run it as described in the README, and it'll be downloaded from the mertes/drop repository (which will take time):

docker run --rm --network=host -ti mertes/drop

for writing the demo or if you are mounting your own data:

docker run --rm -ti -v ... etc.

It has the old version of drop, so you'll need to manually update to use any github developer version (what I explained before).

Does that answer your question?

from drop.

gevro avatar gevro commented on July 17, 2024

Ok, I think to keep things simple I will just wait until a more stable version of drop is released with the above issues resolved. I would appreciate it if you let me know once it is released. Thanks.

from drop.

gevro avatar gevro commented on July 17, 2024

Hi, We have a new RNA-seq sample from a syndromic family that we would like to try on DROP. Are any of the new versions of DROP available yet? Thanks.

from drop.
