
labshengli / nanome

NANOME pipeline (Nanopore long-read sequencing data consensus DNA methylation detection)

Home Page: https://www.jax.org/research-and-faculty/faculty/sheng-li

License: MIT License

Python 72.08% Shell 1.50% Nextflow 14.96% Dockerfile 0.27% R 1.29% HTML 0.33% JavaScript 8.90% CSS 0.67%
bioinformatics dna-methylation long-read-sequencing methylation-calling nanopore-sequencing pipeline

nanome's People

Contributors: liuyangzzu, panziwei

nanome's Issues

Open clusterOptions in command line

Very nice workflow! A small suggestion about the Slurm parameters:

  • --qos should not be mandatory, since not every cluster requires it (ours doesn't).
  • It would be very helpful to have a command-line option for setting the Nextflow clusterOptions directive (https://www.nextflow.io/docs/latest/process.html#process-clusteroptions). For example, our Slurm configuration has a mandatory account parameter (-A), and other sites may have different requirements, so this would make the workflow easier to configure on other HPC setups.

Kudos for the workflow; it is very well written, and I was able to run it on our HPC after modifying clusterOptions. If you want, I can make the change and create a PR.
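One possible sketch of how this could look (the parameter name params.cluster_options is hypothetical, not part of nanome): expose clusterOptions as a pipeline parameter in nextflow.config, so users can pass site-specific Slurm options on the command line.

```groovy
// nextflow.config sketch -- params.cluster_options is a hypothetical name
params.cluster_options = ''

process {
    executor = 'slurm'
    // forwarded verbatim to the scheduler submit command
    clusterOptions = params.cluster_options
}
```

A user on a cluster with a mandatory account could then run, for example: nextflow run TheJacksonLaboratory/nanome --cluster_options '-A myaccount'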

Add dynamic resource (disk size) allocation based on input file size

#!/usr/bin/env nextflow

// Pair each input file with its size in bytes
Channel
    .fromPath('hello.txt')
    .map { [it, it.size()] }
    .set { input_ch }

process foo {
  // Choose the disk allocation dynamically from the staged file's size
  disk { x.size() < 600.GB ? 400.GB : 700.GB }

  input:
  set file(x), val(sz) from input_ch

  """
  your_command --input $x
  """
}

In human words:

disk { x.size() < 600.GB ? 400.GB : 700.GB }
"Is the input file size < 600 GB?"

  • If true, allocate disk of size 400.GB in the task
  • If false, allocate disk of size 700.GB in the task
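A related sketch (hypothetical, not part of nanome): the same closure style can scale the allocation with task.attempt, so a retried task automatically gets more disk.

```groovy
process foo {
  // Grow the disk request on each automatic retry:
  // attempt 1 -> 400 GB, attempt 2 -> 800 GB
  disk { 400.GB * task.attempt }
  errorStrategy 'retry'
  maxRetries 2

  """
  your_command
  """
}
```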

No such file

(nanome) [poultrylab1@pbsnode01 nanome]$ nextflow run TheJacksonLaboratory/nanome -profile test,docker
N E X T F L O W  ~  version 21.10.0
Launching `TheJacksonLaboratory/nanome` [chaotic_hamilton] - revision: c181f907e9 [master]
NANOME - NF PIPELINE (v1.3.6)
by Li Lab at The Jackson Laboratory
https://github.com/TheJacksonLaboratory/nanome
=================================
dsname              : CIEcoli
input               : https://github.com/TheJacksonLaboratory/nanome/raw/master/test_data/ecoli_ci_test_fast5.tar.gz
genome              : ecoli

Running settings   : --------
processors          : 2
chrSet              : NC_000913.3
dataType            : ecoli
runBasecall         : Yes
runNanopolish       : Yes
runMegalodon        : Yes
runDeepSignal       : Yes
runGuppy            : Yes

Pipeline settings  : --------
Working dir         : /storage-04/chicken/ont_methylation/nanome/work
Output dir          : outputs
Launch dir          : /storage-04/chicken/ont_methylation/nanome
Script dir          : /storage-01/poultrylab1/.nextflow/assets/TheJacksonLaboratory/nanome
User                : poultrylab1
Profile             : test,docker
Config Files        : /storage-01/poultrylab1/.nextflow/assets/TheJacksonLaboratory/nanome/nextflow.config
Pipeline Release    : master
Container           : docker - liuyangzzu/nanome:v1.2
=================================
executor >  local (1)
[6e/af606b] process > EnvCheck (EnvCheck) [100%] 1 of 1 ✔
[-        ] process > Untar               -
[-        ] process > Basecall            -
[-        ] process > QCExport            -
[-        ] process > Resquiggle          -
[-        ] process > Nanopolish          -
[-        ] process > NplshComb           -
[-        ] process > Megalodon           -
[-        ] process > MgldnComb           -
[-        ] process > DeepSignal          -
[-        ] process > DpSigComb           -
[-        ] process > Guppy               -
[-        ] process > GuppyComb           -
[-        ] process > Report              -
No such file: https://github.com/TheJacksonLaboratory/nanome/raw/master/test_data/ecoli_ci_test_fast5.tar.gz
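One way to surface this kind of error earlier (a sketch, assuming the input is read with Channel.fromPath; this is not nanome's actual code) is to fail fast at channel creation when the path or URL cannot be resolved:

```groovy
// Fail immediately if the input cannot be found,
// instead of erroring out later inside the pipeline
Channel
    .fromPath(params.input, checkIfExists: true)
    .set { input_ch }
```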


Add .github/workflows/ci.yml for implementing CI/CD

NOTE: GitHub Actions requires additional configuration to run on GPU instances.
It is recommended to implement the CPU-only mode first.

There are also limits on the resources available when testing with GitHub Actions:

Hardware specification for Linux virtual machines (used by default)
https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources

Max cpus: 2-core CPU
Max memory: 7 GB of RAM memory
Max disk size: 14 GB of SSD disk space

Therefore, to add CI/CD successfully, a minimal test dataset is required.

Here is a template file that implements both Docker and Singularity CI.
It needs to be created in the repository root at .github/workflows/ci.yml (the folder names are reserved; the file name itself can be changed, e.g. from ci.yml to continuous-integration.yml).

name: splicing-pipelines-nf CI
# This workflow is triggered on pushes and PRs to the repository.
on: [pull_request]

jobs:
  docker:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        nxf_ver: ['20.01.0', '']
    steps:
      - uses: actions/checkout@v1
      - name: Install Nextflow
        run: |
          export NXF_VER=${{ matrix.nxf_ver }}
          wget -qO- get.nextflow.io | bash
          sudo mv nextflow /usr/local/bin/
      - name: Basic workflow tests
        run: |
          nextflow run ${GITHUB_WORKSPACE} -c conf/test.config
  singularity:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        singularity_version: ['3.6.4']
        nxf_ver: ['20.01.0', '']
    steps:
      - uses: actions/checkout@v1
      - uses: eWaterCycle/setup-singularity@v6
        with:
          singularity-version: ${{ matrix.singularity_version }}
      - name: Install Nextflow
        run: |
          export NXF_VER=${{ matrix.nxf_ver }}
          wget -qO- get.nextflow.io | bash
          sudo mv nextflow /usr/local/bin/
      - name: Basic workflow tests
        run: |
          nextflow run ${GITHUB_WORKSPACE} -c conf/test.config

Many configurations can be tested using the matrix strategy.

Add dynamic resource allocation based on error exit status (increase memory and cpus)

Implemented here; we can add it to nanome in the same way:
https://github.com/lifebit-ai/templates/blob/322299f35c354f1b8d86dd5f4848db93850a9288/inst/templates/nextflow/conf/base.config#L28-L66

// contents of nextflow.config

process {
    // Specify increasing resources on failure for a specific process
    withName: 'my_process' {
        disk = "50 GB"
        cpus = { check_max(2 * task.attempt, 'cpus') }
        memory = { check_max(2.GB * task.attempt, 'memory') }
    }
}

// Ready-to-copy function, placed at the END of the nextflow.config file

// Function to ensure that resource requirements don't go beyond
// a maximum limit
def check_max(obj, type) {
  if (type == 'memory') {
    try {
      if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1)
        return params.max_memory as nextflow.util.MemoryUnit
      else
        return obj
    } catch (all) {
      println "   ### ERROR ###   Max memory '${params.max_memory}' is not valid! Using default value: $obj"
      return obj
    }
  } else if (type == 'time') {
    try {
      if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1)
        return params.max_time as nextflow.util.Duration
      else
        return obj
    } catch (all) {
      println "   ### ERROR ###   Max time '${params.max_time}' is not valid! Using default value: $obj"
      return obj
    }
  } else if (type == 'cpus') {
    try {
      return Math.min( obj, params.max_cpus as int )
    } catch (all) {
      println "   ### ERROR ###   Max cpus '${params.max_cpus}' is not valid! Using default value: $obj"
      return obj
    }
  }
}
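To make the retry actually depend on the error exit status (as the issue title suggests), the errorStrategy directive can be made dynamic. A sketch following the approach of the linked lifebit template; the exit codes listed are common out-of-memory/kill codes and are an assumption, not nanome's configuration:

```groovy
// In nextflow.config: retry only on resource-related exit codes,
// otherwise stop the run cleanly
process {
    errorStrategy = { task.exitStatus in [104, 134, 137, 139, 140, 143] ? 'retry' : 'finish' }
    maxRetries = 3
}
```

Combined with the check_max-scaled cpus and memory above, each retry then resubmits the failed task with larger resource requests.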

reference genomes

Hi, I was wondering whether you are considering opening nanome up to other reference genomes? I have been running most of the tools included in nanome on my own datasets/references, but separately, so a tool like nanome would really help here.
Thank you,
Best regards,
P
