hpp_production_workflows's People

Contributors

adamnovak, avani-k, iviolich, jimin001, juklucas, miramastoras, mobinasri, tpesout

hpp_production_workflows's Issues

Ensembl mapping pipeline for annotation source code

Hi all,
I would like to use the Ensembl mapping pipeline for annotation described in the paper "A draft human pangenome reference", but I could not find its source code in this repository. Is the pipeline not publicly available, or am I missing something?
The paper gives only a description of the process and some code, which is not enough for me to run the Ensembl mapping pipeline for annotation myself.

Thanks!

Best,
Tony

Use of docker image `:latest` tag can result in failed workflow runs

Various QC WDL files, such as QC/tasks/quast.wdl, reference their docker image dependency using the :latest tag. This makes the required image ambiguous, because latest points to different images over time. This can cause a couple of issues:

  • run failures if a user updates their WDL but does not pull the latest images
  • difficulty using an old WDL version, as there is no way to tell which docker image it requires

Recommend that all images be referenced in WDL by an unambiguous tag or their actual digest. That will create an unambiguous dependency.
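
As a starting point for the recommendation above, here is a minimal sketch of how `:latest` references could be located in a WDL file. It only looks for quoted strings ending in `:latest`, so it makes no claim about how each task actually wires its image into the runtime section. The `find_latest_tags` helper and the `example/tool:1.2.3` image are hypothetical names used for illustration; `tpesout/hpp_minimap2:latest` is the image quoted elsewhere in this issue tracker.

```python
import re

# Any quoted string that ends in ":latest", e.g. "tpesout/hpp_minimap2:latest".
LATEST_REF = re.compile(r'"([^"\s]+):latest"')


def find_latest_tags(wdl_text):
    """Return (line_number, image_name) pairs for every ":latest" reference."""
    hits = []
    for lineno, line in enumerate(wdl_text.splitlines(), start=1):
        for match in LATEST_REF.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


# Illustrative WDL fragment (assumed layout, not copied from the repository).
example_wdl = """task asmgene {
    input {
        String dockerImage = "tpesout/hpp_minimap2:latest"
    }
    runtime {
        docker: dockerImage
    }
}
task pinned {
    input {
        String dockerImage = "example/tool:1.2.3"
    }
}
"""

for lineno, image in find_latest_tags(example_wdl):
    print(f"line {lineno}: {image} is referenced by the :latest tag")
```

Each hit could then be replaced by a fixed version tag or an `image@sha256:...` digest, which is what makes the dependency unambiguous.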

CC @0seastar0 @tpesout

WDL and shell lint

A number of the workflows/tasks have lint, as reported by miniwdl check. Some of the suggestions are worth implementing (e.g., quoting).

miniwdl check reference

For example, I ran it on QC/wdl/workflows/standard_qc_haploid.wdl and it reported:

$ miniwdl check hpp_production_workflows/QC/wdl/workflows/standard_qc_haploid.wdl
standard_qc_haploid.wdl
    workflow standardQualityControlHaploid
        (Ln 17, Col 9) UnusedDeclaration, nothing references Boolean isMaleSample
        call asmgene_t.asmgene
        call quast_t.quast
        call meryl_t.runMeryl as meryl
        call merqury_t.merqury
        call yak_t.runYakAssemblyStats as yak
        call consolidate
    task consolidate
        (Ln 135, Col 11) CommandShellCheck, SC2035 Use ./*glob* or -- *glob* so names with dashes won't become options.
        (Ln 135, Col 30) CommandShellCheck, SC2035 Use ./*glob* or -- *glob* so names with dashes won't become options.
    asmgene_t : asmgene.wdl
        workflow runAsmgene (not called)
            call asmgene
                (Ln 4, Col 2) IncompleteCall, required input(s) omitted in call to asmgene (assemblyFasta, genesFasta)
        task asmgene
            (Ln 46, Col 82) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 49, Col 46) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 49, Col 60) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
    merqury_t : merqury.wdl
        workflow runMerqury (not called)
            call merqury
                (Ln 4, Col 5) IncompleteCall, required input(s) omitted in call to merqury (assemblyFasta, kmerTarball)
        task merqury
            (Ln 31, Col 8) CommandShellCheck, SC2034 OMP_NUM_THREADS appears unused. Verify use (or export if used externally).
            (Ln 46, Col 15) CommandShellCheck, SC2207 Prefer mapfile or read -a to split command output (or quote to avoid splitting).
            (Ln 48, Col 19) CommandShellCheck, SC2207 Prefer mapfile or read -a to split command output (or quote to avoid splitting).
            (Ln 49, Col 19) CommandShellCheck, SC2207 Prefer mapfile or read -a to split command output (or quote to avoid splitting).
            (Ln 56, Col 19) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 57, Col 15) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 68, Col 23) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 69, Col 19) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 77, Col 15) CommandShellCheck, SC2206 Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.
            (Ln 80, Col 8) CommandShellCheck, SC2068 Double quote array expansions to avoid re-splitting elements.
            (Ln 83, Col 17) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 83, Col 40) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
    meryl_t : meryl.wdl
        (Ln 4, Col 1) UnusedImport, no use of workflow, tasks, or structs defined in the imported document shardReads_t
        workflow runMeryl
            scatter readFile
                call extractReads_t.extractReads as sampleReadsExtracted
            scatter readFile
                call extractReads_t.extractReads as maternalReadsExtracted
            scatter readFile
                call extractReads_t.extractReads as paternalReadsExtracted
            call arithmetic_t.sum as sampleReadSize
            call arithmetic_t.sum as maternalReadSize
            call arithmetic_t.sum as paternalReadSize
            call arithmetic_t.sum as allReadSize
            call arithmetic_t.max as sampleReadSizeMax
            call arithmetic_t.max as maternalReadSizeMax
            call arithmetic_t.max as paternalReadSizeMax
            scatter readFile
                call merylCount as sampleMerylCount
            scatter readFile
                call merylCount as maternalMerylCount
            scatter readFile
                call merylCount as paternalMerylCount
            call merylUnionSum as sampleMerylUnionSum
            call merylUnionSum as maternalMerylUnionSum
            call merylUnionSum as paternalMerylUnionSum
            call merylHapmer
        task merylCount
            (Ln 198, Col 8) CommandShellCheck, SC2034 OMP_NUM_THREADS appears unused. Verify use (or export if used externally).
            (Ln 201, Col 11) CommandShellCheck, SC2006 Use $(...) notation instead of legacy backticked `...`.
            (Ln 202, Col 92) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 205, Col 16) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 205, Col 30) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 208, Col 15) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
        task merylHapmer
            (Ln 299, Col 8) CommandShellCheck, SC2034 OMP_NUM_THREADS appears unused. Verify use (or export if used externally).
            (Ln 308, Col 13) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 311, Col 36) CommandShellCheck, SC2035 Use ./*glob* or -- *glob* so names with dashes won't become options.
            (Ln 311, Col 42) CommandShellCheck, SC2035 Use ./*glob* or -- *glob* so names with dashes won't become options.
        task merylUnionSum
            (Ln 244, Col 8) CommandShellCheck, SC2034 OMP_NUM_THREADS appears unused. Verify use (or export if used externally).
        arithmetic_t : arithmetic.wdl
            task max
            task sum
        extractReads_t : extract_reads.wdl
            workflow runExtractReads (not called)
                scatter file
                    call extractReads
            task extractReads
                (Ln 58, Col 65) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
                (Ln 64, Col 12) CommandShellCheck, SC2226 This ln has no destination. Check the arguments, or specify '.' explicitly.
                (Ln 65, Col 56) CommandShellCheck, SC2046 Quote this to prevent word splitting.
                (Ln 65, Col 56) CommandShellCheck, SC2006 Use $(...) notation instead of legacy backticked `...`.
                (Ln 65, Col 106) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
                (Ln 67, Col 46) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
                (Ln 73, Col 19) CommandShellCheck, SC2006 Use $(...) notation instead of legacy backticked `...`.
                (Ln 74, Col 21) CommandShellCheck, SC2053 Quote the right-hand side of == in [[ ]] to prevent glob matching.
                (Ln 75, Col 23) CommandShellCheck, SC2006 Use $(...) notation instead of legacy backticked `...`.
                (Ln 77, Col 13) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
        shardReads_t : shard_reads.wdl
            workflow runShardReads (not called)
                call shardReads
                    (Ln 4, Col 5) IncompleteCall, required input(s) omitted in call to shardReads (readFile)
            task shardReads (not called)
                (Ln 46, Col 97) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
                (Ln 48, Col 88) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
                (Ln 52, Col 24) CommandShellCheck, SC2046 Quote this to prevent word splitting.
                (Ln 52, Col 26) CommandShellCheck, SC2012 Use find instead of ls to better handle non-alphanumeric filenames.
    quast_t : quast.wdl
        workflow runQuast (not called)
            call quast
                (Ln 4, Col 2) IncompleteCall, required input(s) omitted in call to quast (assemblyFasta)
        task quast
            (Ln 34, Col 19) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 37, Col 12) CommandShellCheck, SC2226 This ln has no destination. Check the arguments, or specify '.' explicitly.
            (Ln 44, Col 18) CommandShellCheck, SC2206 Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.
            (Ln 51, Col 23) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 54, Col 16) CommandShellCheck, SC2226 This ln has no destination. Check the arguments, or specify '.' explicitly.
            (Ln 56, Col 22) CommandShellCheck, SC2206 Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.
            (Ln 60, Col 14) CommandShellCheck, SC2236 Use -n instead of ! -z.
            (Ln 65, Col 15) CommandShellCheck, SC2206 Quote to prevent word splitting/globbing, or split robustly with mapfile or read -a.
            (Ln 71, Col 17) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 71, Col 38) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
    yak_t : yak.wdl
        (Ln 4, Col 1) UnusedImport, no use of workflow, tasks, or structs defined in the imported document shardReads_t
        workflow runYakAssemblyStats
            (Ln 16, Col 9) UnusedDeclaration, nothing references Int kmerSize
            (Ln 17, Col 9) UnusedDeclaration, nothing references Int shardLinesPerFile
            scatter readFile
                call extractReads_t.extractReads as maternalReadsExtracted
            scatter readFile
                call extractReads_t.extractReads as paternalReadsExtracted
            scatter readFile
                call extractReads_t.extractReads as sampleReadsExtracted
            call arithmetic_t.sum as maternalReadSize
            call arithmetic_t.sum as paternalReadSize
            call arithmetic_t.sum as sampleReadSize
            call yakCount as yakCountMat
            call yakCount as yakCountPat
            call yakCount as yakCountSample
            call yakAssemblyStats
        task yakAssemblyStats
            (Ln 187, Col 81) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 188, Col 81) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 191, Col 112) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 192, Col 112) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 196, Col 8) CommandShellCheck, SC2129 Consider using { cmd1; cmd2; } >> file instead of individual redirects.
            (Ln 196, Col 26) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 197, Col 17) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 197, Col 42) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 198, Col 26) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 199, Col 17) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 199, Col 42) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 200, Col 30) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 201, Col 17) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 201, Col 52) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 202, Col 30) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 203, Col 17) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 203, Col 52) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 206, Col 17) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 206, Col 39) CommandShellCheck, SC2035 Use ./*glob* or -- *glob* so names with dashes won't become options.
        task yakCount
        arithmetic_t : arithmetic.wdl
            task sum
            task max (not called)
        extractReads_t : extract_reads.wdl
            workflow runExtractReads (not called)
                scatter file
                    call extractReads
            task extractReads
                (Ln 58, Col 65) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
                (Ln 64, Col 12) CommandShellCheck, SC2226 This ln has no destination. Check the arguments, or specify '.' explicitly.
                (Ln 65, Col 56) CommandShellCheck, SC2046 Quote this to prevent word splitting.
                (Ln 65, Col 56) CommandShellCheck, SC2006 Use $(...) notation instead of legacy backticked `...`.
                (Ln 65, Col 106) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
                (Ln 67, Col 46) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
                (Ln 73, Col 19) CommandShellCheck, SC2006 Use $(...) notation instead of legacy backticked `...`.
                (Ln 74, Col 21) CommandShellCheck, SC2053 Quote the right-hand side of == in [[ ]] to prevent glob matching.
                (Ln 75, Col 23) CommandShellCheck, SC2006 Use $(...) notation instead of legacy backticked `...`.
                (Ln 77, Col 13) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
        shardReads_t : shard_reads.wdl
            workflow runShardReads (not called)
                call shardReads
                    (Ln 4, Col 5) IncompleteCall, required input(s) omitted in call to shardReads (readFile)
            task shardReads (not called)
                (Ln 46, Col 97) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
                (Ln 48, Col 88) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
                (Ln 52, Col 24) CommandShellCheck, SC2046 Quote this to prevent word splitting.
                (Ln 52, Col 26) CommandShellCheck, SC2012 Use find instead of ls to better handle non-alphanumeric filenames.
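
To decide which suggestions are worth implementing first, it may help to tally the findings by code. The following is a rough sketch that assumes the report has been saved as text in the format shown above; `tally_findings` is a hypothetical helper, not part of miniwdl.

```python
import re
from collections import Counter

# A finding looks like one of:
#   (Ln 46, Col 82) CommandShellCheck, SC2086 Double quote to prevent ...
#   (Ln 17, Col 9) UnusedDeclaration, nothing references Boolean isMaleSample
FINDING = re.compile(r"\(Ln \d+, Col \d+\) (\w+), (SC\d+)?")


def tally_findings(report):
    """Count findings by ShellCheck code, or by lint class when there is no code."""
    counts = Counter()
    for line in report.splitlines():
        match = FINDING.search(line)
        if match:
            lint_class, shellcheck_code = match.groups()
            counts[shellcheck_code or lint_class] += 1
    return counts


# A few lines copied from the report above.
sample_report = """
        (Ln 17, Col 9) UnusedDeclaration, nothing references Boolean isMaleSample
        (Ln 135, Col 11) CommandShellCheck, SC2035 Use ./*glob* or -- *glob* so names with dashes won't become options.
            (Ln 46, Col 82) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
            (Ln 49, Col 46) CommandShellCheck, SC2086 Double quote to prevent globbing and word splitting.
"""

for code, count in tally_findings(sample_report).most_common():
    print(code, count)
```

Note that imported documents are listed once per importing workflow (extract_reads.wdl appears under both meryl_t and yak_t above), so raw counts over the full report include those repeats.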

Please add a LICENSE file

There's currently no license associated with the code in this repo, making it impossible to reuse. Please consider adding a permissive license.

Add option -e to the asmgene.wdl

It would be helpful to add the -e option to the paftools.js asmgene call in asmgene.wdl to aid in the evaluation of misassemblies. It will print both single-copy and multi-copy gene errors.

#paftools.js asmgene
Usage: paftools.js asmgene [options] <ref-splice.paf> <asm-splice.paf> [...]
Options:
-i FLOAT min identity [0.99]
-c FLOAT min coverage [0.99]
-a only evaluate genes mapped to the autosomes
-e print fragmented/missing genes
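
One way to picture the request is as an optional switch on the command line. This is only a sketch: `build_asmgene_command` is a hypothetical helper, and how asmgene.wdl really assembles its command is not shown in this issue. The flags and defaults come from the usage text above.

```python
def build_asmgene_command(ref_paf, asm_paf, min_identity=0.99, min_coverage=0.99,
                          autosomes_only=False, print_fragmented_missing=False):
    """Build the argument list for `paftools.js asmgene` from the documented options."""
    command = ["paftools.js", "asmgene",
               "-i", str(min_identity),   # min identity
               "-c", str(min_coverage)]   # min coverage
    if autosomes_only:
        command.append("-a")              # only evaluate genes mapped to the autosomes
    if print_fragmented_missing:
        command.append("-e")              # print fragmented/missing genes
    command += [ref_paf, asm_paf]
    return command


print(" ".join(build_asmgene_command("ref-splice.paf", "asm-splice.paf",
                                     print_fragmented_missing=True)))
```

In the WDL itself this would presumably correspond to a Boolean task input that defaults to false, so existing runs keep their current output.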

Thank you!!

please push newest docker images

Hi!

I ran the standard_haploid_qc workflow and saw that it was using a really old version of minimap2 in the asmgene call.

The asmgene task (here: https://github.com/human-pangenomics/hpp_production_workflows/blob/master/QC/wdl/tasks/asmgene.wdl) uses the docker image "tpesout/hpp_minimap2:latest".
If you pull and start that docker image and run minimap2 --version:
$ docker run -it --rm tpesout/hpp_minimap2:latest /bin/bash
root@9826e39b4f73:/data# minimap2 --version
2.17-r941

However, the docker build file (here: https://github.com/human-pangenomics/hpp_production_workflows/blob/master/QC/docker/minimap2/Dockerfile) points at a much newer version (one release behind the latest).
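
A mismatch like this could be caught automatically by comparing the version the image reports with the version the Dockerfile is meant to install. The sketch below assumes the `minimap2 --version` output has the form shown above (`2.17-r941`). Both helpers are hypothetical, and the expected version `2.18` is a placeholder, not a value read from the Dockerfile.

```python
import re


def parse_minimap2_version(version_output):
    """Turn output such as '2.17-r941' into a comparable (major, minor) tuple."""
    match = re.match(r"(\d+)\.(\d+)", version_output.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version_output!r}")
    return int(match.group(1)), int(match.group(2))


def image_is_stale(reported, expected):
    """True when the published image reports an older version than expected."""
    return parse_minimap2_version(reported) < parse_minimap2_version(expected)


reported_by_image = "2.17-r941"   # taken from the docker run transcript above
expected_from_build = "2.18"      # placeholder, substitute the Dockerfile's version

print(image_is_stale(reported_by_image, expected_from_build))
```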

We think Trevor (@tpesout) forgot to update the published docker images with the latest build; perhaps the image was built on a local machine but never published.

Thanks,
Sara and Bruce @bkmartinjr

Lint in CI (GitHub Actions)

Ticket #1 covers checking the bash shell code.

I intend this ticket to cover linting the WDL syntax itself with a tool such as WOMtool. To keep this sustainable, it would be nice to run the check in CI, such as GitHub Actions.
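
A CI job could call a small driver script like the sketch below. It uses `miniwdl check` (the tool from the lint ticket) rather than WOMtool, purely as an illustration. The `--strict` flag is, to my understanding, what makes lint warnings produce a non-zero exit status; please confirm with `miniwdl check --help`. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_check_commands(root, strict=True):
    """Return one `miniwdl check` argument list per .wdl file found under root."""
    commands = []
    for wdl in sorted(Path(root).rglob("*.wdl")):
        command = ["miniwdl", "check"]
        if strict:
            command.append("--strict")  # assumed: treat lint warnings as failures
        command.append(str(wdl))
        commands.append(command)
    return commands


def run_checks(root, strict=True):
    """Run every check and return the paths of the WDL files that failed."""
    failed = []
    for command in build_check_commands(root, strict):
        result = subprocess.run(command)
        if result.returncode != 0:
            failed.append(command[-1])
    return failed
```

In a GitHub Actions step, the script would call `run_checks("QC/wdl")` and exit with a non-zero status when the returned list is not empty, which marks the job as failed.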

clarify data source for HSat annotation

Hi,
could you please clarify the data source for the chm13v1.1/HSat annotation used in this step:

2. Incorporating HSATs Coverage Bias
[...]
To do so we need a bed file pointing to the HSat in the reference then we can run the script project_blocks.py to project it back to the assembly.

Or is that already packaged in the container? Thanks!
+Peter
