labshengli / nanome
NANOME pipeline (Nanopore long-read sequencing data consensus DNA methylation detection)
Home Page: https://www.jax.org/research-and-faculty/faculty/sheng-li
License: MIT License
Very nice workflow! A small thing:
I would suggest modifying some of the Slurm parameters:
--qos should not be mandatory, as not every cluster needs it (ours doesn't).
It would be very helpful to have a command-line option for setting the Nextflow clusterOptions directive (https://www.nextflow.io/docs/latest/process.html#process-clusteroptions). For example, we have a mandatory account parameter (-A) in our Slurm config, and other sites will have their own requirements, so this would make the workflow easier to configure on other HPC setups.
Kudos for the workflow; it is very well written, and I was able to run it on our HPC after modifying clusterOptions. If you want, I can make the change and create a PR.
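One possible way to supply site-specific Slurm options without editing the pipeline is to layer a small config file over it with Nextflow's -c option. The sketch below is only an illustration: the file name site.config and the account name my_account are placeholders, and whether these settings take precedence over options already set in the pipeline's own profiles depends on how those profiles are written.

```shell
# Write a small site-specific config file.
# "site.config" and "my_account" are placeholders, not names used by nanome.
cat > site.config <<'EOF'
process {
    executor = 'slurm'
    // Extra options handed to the scheduler for every job
    clusterOptions = '-A my_account'
}
EOF

# Layer it over the pipeline configuration when launching, for example:
#   nextflow run TheJacksonLaboratory/nanome -profile <your_profile> -c site.config
```
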
#!/usr/bin/env nextflow

Channel
    .fromPath('hello.txt')
    .map { [it, it.size()] }
    .set { input_ch }

process foo {
    // Choose the disk request from the size of the input file
    disk { x.size() < 600.GB ? 400.GB : 700.GB }

    input:
    set file(x), val(sz) from input_ch

    """
    your_command --input $x
    """
}
In human words:
disk { x.size() < 600.GB ? 400.GB : 700.GB }
"Is the input file size < 600 GB? If yes, request 400 GB of disk; otherwise request 700 GB."
errorStrategy = { task.attempt == process.maxRetries ? 'ignore' : task.exitStatus in [3,9,10,14,143,137,104,134,139] ? 'retry' : 'ignore' }
Parse the Tombo log and, if grep finds "Broken pipe", echo a message and redirect it to stderr.
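The log check described above could be sketched as a small shell function. This is an illustration under assumptions: the function name check_broken_pipe, the message wording, and the log file name in the usage line are hypothetical, and the function only reports the problem without changing the exit status.

```shell
# check_broken_pipe LOG_FILE
# Print a message on stderr if the log contains "Broken pipe".
# The function name and message text are illustrative only.
check_broken_pipe() {
    log_file="$1"
    if grep -q "Broken pipe" "$log_file"; then
        # >&2 sends the message to stderr instead of stdout
        echo "Broken pipe found in $log_file" >&2
    fi
    return 0
}

# Usage (the log file name is a placeholder):
#   check_broken_pipe tombo.log
```
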
(nanome) [poultrylab1@pbsnode01 nanome]$ nextflow run TheJacksonLaboratory/nanome -profile test,docker
N E X T F L O W ~ version 21.10.0
Launching `TheJacksonLaboratory/nanome` [chaotic_hamilton] - revision: c181f907e9 [master]
NANOME - NF PIPELINE (v1.3.6)
by Li Lab at The Jackson Laboratory
https://github.com/TheJacksonLaboratory/nanome
=================================
dsname : CIEcoli
input : https://github.com/TheJacksonLaboratory/nanome/raw/master/test_data/ecoli_ci_test_fast5.tar.gz
genome : ecoli
Running settings : --------
processors : 2
chrSet : NC_000913.3
dataType : ecoli
runBasecall : Yes
runNanopolish : Yes
runMegalodon : Yes
runDeepSignal : Yes
runGuppy : Yes
Pipeline settings : --------
Working dir : /storage-04/chicken/ont_methylation/nanome/work
Output dir : outputs
Launch dir : /storage-04/chicken/ont_methylation/nanome
Script dir : /storage-01/poultrylab1/.nextflow/assets/TheJacksonLaboratory/nanome
User : poultrylab1
Profile : test,docker
Config Files : /storage-01/poultrylab1/.nextflow/assets/TheJacksonLaboratory/nanome/nextflow.config
Pipeline Release : master
Container : docker - liuyangzzu/nanome:v1.2
=================================
executor > local (1)
[6e/af606b] process > EnvCheck (EnvCheck) [100%] 1 of 1 ✔
[- ] process > Untar -
[- ] process > Basecall -
[- ] process > QCExport -
[- ] process > Resquiggle -
[- ] process > Nanopolish -
[- ] process > NplshComb -
[- ] process > Megalodon -
[- ] process > MgldnComb -
[- ] process > DeepSignal -
[- ] process > DpSigComb -
[- ] process > Guppy -
[- ] process > GuppyComb -
[- ] process > Report -
No such file: https://github.com/TheJacksonLaboratory/nanome/raw/master/test_data/ecoli_ci_test_fast5.tar.gz
NOTE: GitHub Actions requires additional configuration for running on GPU instances.
It is recommended to implement the CPU-only mode first.
There are also limits on the resources that can be used when testing with GitHub Actions:
Hardware specification for Linux virtual machines (used by default)
https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources
Max cpus: 2-core CPU
Max memory: 7 GB of RAM
Max disk size: 14 GB of SSD disk space
Adding CI/CD successfully therefore requires a minimal test dataset.
Here is the template file that implements both Docker and Singularity CI.
It needs to be created at .github/workflows/ci.yml in the root of the repo
(the folder names are reserved; the file name can be changed, e.g. from ci.yml to continuous-integration.yml)
name: splicing-pipelines-nf CI
# This workflow is triggered on PRs to the repository.
on: [pull_request]
jobs:
  docker:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        nxf_ver: ['20.01.0', '']
    steps:
      - uses: actions/checkout@v1
      - name: Install Nextflow
        run: |
          export NXF_VER=${{ matrix.nxf_ver }}
          wget -qO- get.nextflow.io | bash
          sudo mv nextflow /usr/local/bin/
      - name: Basic workflow tests
        run: |
          nextflow run ${GITHUB_WORKSPACE} --config conf/test.config
  singularity:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        singularity_version: ['3.6.4']
        nxf_ver: ['20.01.0', '']
    steps:
      - uses: actions/checkout@v1
      - uses: eWaterCycle/setup-singularity@v6
        with:
          singularity-version: ${{ matrix.singularity_version }}
      - name: Install Nextflow
        run: |
          export NXF_VER=${{ matrix.nxf_ver }}
          wget -qO- get.nextflow.io | bash
          sudo mv nextflow /usr/local/bin/
      - name: Basic workflow tests
        run: |
          nextflow run ${GITHUB_WORKSPACE} --config conf/test.config
Many configs can be tested using the matrix strategy.
This is implemented here, and we can add it to nanome in the same way:
https://github.com/lifebit-ai/templates/blob/322299f35c354f1b8d86dd5f4848db93850a9288/inst/templates/nextflow/conf/base.config#L28-L66
// contents nextflow.config
// Specify increasing resources on failure for specific process type
process {
    withName: 'my_process' {
        disk = "50 GB"
        cpus = { check_max(2 * task.attempt, 'cpus') }
        memory = { check_max(2.GB * task.attempt, 'memory') }
    }
}
// Ready to copy-paste function, END OF nextflow.config file
// Function to ensure that resource requirements don't go beyond
// a maximum limit
def check_max(obj, type) {
    if (type == 'memory') {
        try {
            if (obj.compareTo(params.max_memory as nextflow.util.MemoryUnit) == 1)
                return params.max_memory as nextflow.util.MemoryUnit
            else
                return obj
        } catch (all) {
            println " ### ERROR ### Max memory '${params.max_memory}' is not valid! Using default value: $obj"
            return obj
        }
    } else if (type == 'time') {
        try {
            if (obj.compareTo(params.max_time as nextflow.util.Duration) == 1)
                return params.max_time as nextflow.util.Duration
            else
                return obj
        } catch (all) {
            println " ### ERROR ### Max time '${params.max_time}' is not valid! Using default value: $obj"
            return obj
        }
    } else if (type == 'cpus') {
        try {
            return Math.min( obj, params.max_cpus as int )
        } catch (all) {
            println " ### ERROR ### Max cpus '${params.max_cpus}' is not valid! Using default value: $obj"
            return obj
        }
    }
}
Consider replacing git clone with either staging the release tar.gz bundle from GitHub (https://github.com/WGLab/DeepMod/archive/refs/tags/v0.1.3.tar.gz) via a channel or simply using wget.
https://github.com/liuyangzzu/nanome/blob/3c344b67c01ab68b541a5c2add856b3ce4ae9cc2/main.nf#L113
https://github.com/liuyangzzu/nanome/blob/3c344b67c01ab68b541a5c2add856b3ce4ae9cc2/main.nf#L198
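The wget alternative could look roughly like the sketch below. The function name fetch_release and the destination directory in the usage line are hypothetical; the example assumes the archive wraps its contents in a single top-level directory, which is stripped on extraction so the result does not depend on that directory's name.

```shell
# fetch_release URL DEST_DIR
# Download a release tarball with wget and unpack it into DEST_DIR.
# The function name is illustrative; it is not part of nanome.
fetch_release() {
    url="$1"
    dest_dir="$2"
    archive="$(mktemp)" || return 1
    mkdir -p "$dest_dir" || { rm -f "$archive"; return 1; }
    # -q: quiet, -O: write the download to the given file
    wget -q -O "$archive" "$url" || { rm -f "$archive"; return 1; }
    # Drop the single top-level directory that wraps the sources
    tar -xzf "$archive" -C "$dest_dir" --strip-components=1
    status=$?
    rm -f "$archive"
    return "$status"
}

# Usage (URL from the suggestion above; "DeepMod" is a placeholder directory):
#   fetch_release https://github.com/WGLab/DeepMod/archive/refs/tags/v0.1.3.tar.gz DeepMod
```

Compared with git clone, this pins the process to a fixed release and avoids fetching the repository history.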
If you want to use the basename without any suffix, you can use the simpleName method.
Example:
guppy_basecaller --output_path ${x.simpleName}_basecalled_folder \
Convert the whole folder to a tar archive to pass it to the next process.
This is a requirement for the process named Megalodon,
so it is a good universal solution.
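Packing an output folder into a single archive could be sketched as follows. The function name pack_dir and the folder name in the usage line are hypothetical, and gzip compression is an assumption; the text above only asks for a tar.

```shell
# pack_dir DIR
# Create DIR.tar.gz next to DIR, keeping DIR as the top-level entry.
# The function name is illustrative only.
pack_dir() {
    dir="${1%/}"    # tolerate a trailing slash
    tar -czf "${dir}.tar.gz" -C "$(dirname "$dir")" "$(basename "$dir")"
}

# Usage (the folder name is a placeholder):
#   pack_dir sample_basecalled_folder
# The receiving process can unpack it again with:
#   tar -xzf sample_basecalled_folder.tar.gz
```

Passing one archive instead of a directory tree gives the downstream process a single file to stage.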
We may need to add extra Miniconda.
Hi, I was wondering if you are considering opening nanome to other references? Basically, I've been running most of the tools you have included in nanome on my datasets/references, but separately. Therefore, it would really help to have a tool such as nanome for it.
Thank you,
Best Regards
P