radian's Introduction

RADIAN

RNA lAnguage informeD decodIng of nAnopore sigNals

Overview

Nanopore direct RNA basecaller that utilises a model of mRNA language.

Since RNA is always sequenced in the 3' to 5' direction, nanopore signals implicitly encode the nucleotide biases in mRNA. This basecaller uses a probabilistic model of human mRNA language to guide basecalling when the signal prediction is ambiguous. The mRNA model is incorporated into a modified CTC beam search decoding algorithm.
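
To illustrate the general idea of letting a sequence model influence CTC beam search, here is a minimal sketch. It is not RADIAN's decoder. It is a textbook CTC prefix beam search with a language-model prior added only on frames where no output class is confident. The function names, the `ambiguity_threshold` parameter, the toy bigram model and the example probabilities are all assumptions made for illustration, and read orientation (3' to 5') is ignored.

```python
import math
from collections import defaultdict

NEG_INF = -math.inf
BLANK = 0  # assumed convention: index 0 of every frame is the CTC blank

def logsumexp(*xs):
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))

def ctc_beam_search(frames, alphabet, lm_logp, beam_width=8,
                    lm_weight=1.0, ambiguity_threshold=0.6):
    """frames: per-timestep log-probs, blank first, then one entry per symbol."""
    # prefix -> (log-prob of paths ending in blank, log-prob ending in a symbol)
    beams = {"": (0.0, NEG_INF)}
    for frame in frames:
        # Hypothetical gate: consult the language model only on unsure frames.
        ambiguous = math.exp(max(frame)) < ambiguity_threshold
        nxt = defaultdict(lambda: (NEG_INF, NEG_INF))

        def add(prefix, logp, ends_in_blank):
            b, nb = nxt[prefix]
            if ends_in_blank:
                nxt[prefix] = (logsumexp(b, logp), nb)
            else:
                nxt[prefix] = (b, logsumexp(nb, logp))

        for prefix, (p_b, p_nb) in beams.items():
            total = logsumexp(p_b, p_nb)
            add(prefix, total + frame[BLANK], True)
            for i, sym in enumerate(alphabet, start=1):
                p = frame[i]
                prior = lm_weight * lm_logp(prefix, sym) if ambiguous else 0.0
                if prefix.endswith(sym):
                    add(prefix, p_nb + p, False)               # repeat collapses
                    add(prefix + sym, p_b + p + prior, False)  # new symbol after a blank
                else:
                    add(prefix + sym, total + p + prior, False)
        ranked = sorted(nxt.items(), key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_width])
    return max(beams, key=lambda k: logsumexp(*beams[k]))

def toy_lm(prefix, sym):
    """Toy bigram model (made up): after an A, C is strongly preferred."""
    if prefix.endswith("A"):
        return math.log(0.7 if sym == "C" else 0.1)
    return math.log(0.25)

# Two frames over (blank, A, C, G, T); the second is a near tie between C and G.
probs = [[0.04, 0.90, 0.02, 0.02, 0.02],
         [0.02, 0.02, 0.46, 0.48, 0.02]]
frames = [[math.log(p) for p in row] for row in probs]
no_lm = ctc_beam_search(frames, "ACGT", lambda prefix, sym: 0.0)
with_lm = ctc_beam_search(frames, "ACGT", toy_lm)
print(no_lm, with_lm)  # AG AC
```

In this toy case the signal alone slightly favours G in the second frame, and the prior tips the near tie towards C. The first, confident frame is decoded from the signal alone.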

Preprint: https://www.biorxiv.org/content/10.1101/2022.10.19.512968v1

RADIAN architecture

Installation

cd <path/to/radian>
pip install --upgrade pip
pip install -r requirements.txt
tar -xvzf radian/models/rnamodel_12mer_pc.tar.gz

Command structure

usage: basecall.py [-h] fast5_dir fasta_dir [--local] [--chunk-len] [--step-size]
                   [--batch-size] [--outlier-clip] [--rna-model]
                   [--sig-model] [--sig-config] [--beam-width]
                   [--decode-type] [--sig-threshold]
                   [--rna-threshold] [--context-len]

positional arguments:
  fast5_dir             Directory of single/multi fast5 files.
  fasta_dir             Directory to output fasta files.

optional arguments:
  -h, --help
  --local
  --chunk-len
  --step-size
  --batch-size
  --outlier-clip
  --rna-model
  --sig-model
  --sig-config
  --beam-width
  --decode-type {global,chunk}
  --sig-threshold
  --rna-threshold
  --context-len

Example usage

We provide a fast5 file containing 5 reads for testing in data/reads.fast5.

To basecall the single or multi-fast5 file(s) in the input directory (data in this example) and write FASTA output to <out_dir>:

cd radian
mkdir out_dir
python3 basecall.py data out_dir

radian's People

Contributors

a-sneddon

radian's Issues

Output FASTA file starts with `@` instead of `>` in sequence headers

After creating an empty rna model configuration file to fix bug #1, radian produced output for me on the test data:

gringer@musculus:~/install/radian/radian$ python3 basecall.py data out_dir
...
2022-10-22 18:15:31.148055: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
1/1 [==============================] - 1s 928ms/step
1/1 [==============================] - 1s 722ms/step
1/1 [==============================] - 1s 754ms/step
Basecalled read 00256416-5423-47a9-ad91-54a87a6be5e5 in 5.77 sec.
1/1 [==============================] - 1s 582ms/step
Basecalled read 018097e8-babe-4525-8426-e7cfe568219c in 1.70 sec.
1/1 [==============================] - 1s 707ms/step
1/1 [==============================] - 1s 704ms/step
1/1 [==============================] - 0s 306ms/step
Basecalled read 022542bc-00b3-4d6a-9226-9f5da14af8ea in 4.75 sec.
1/1 [==============================] - 1s 721ms/step
1/1 [==============================] - 1s 691ms/step
1/1 [==============================] - 1s 699ms/step
1/1 [==============================] - 0s 202ms/step
Basecalled read 04295bc9-d0af-4a85-a59f-fff2ef52ece6 in 5.90 sec.
1/1 [==============================] - 1s 639ms/step
1/1 [==============================] - 1s 701ms/step
1/1 [==============================] - 0s 148ms/step
Basecalled read 049f55ce-f95e-4712-95a7-44a5709134f8 in 4.06 sec.

However, the output file extension indicates FASTA format, while the sequence headers inside the file begin with @ (as in FASTQ format):

gringer@musculus:~/install/radian/radian$ cat out_dir/reads-0.fasta 
@00256416-5423-47a9-ad91-54a87a6be5e5
GAGCCGTCCCTACAAACTGGAGCGTCTGCCCTATTTATCCACCTCACCTAGCAATAGATCCCATCCTCCATAATCAACAACAAAGTAATATTCGCCCCACTCTAGCCAGTACTGTTGACTCTAGCCAGACCTCCTCATTTCTAACCTGAGCGGAAGGACAACCTAGTAGCTTACCCCTACCATCATTGGACAAGCTGCATCCGTACTATACTTCACAACAATCCAACTATAAACAAACTATACTCCCTAAACAGAAACAATACCAATAGGCCTAGA
@018097e8-babe-4525-8426-e7cfe568219c
GGTCATCACTCACTCAGTAAATTAATAATTCATGGTGAGAGCCTTGCTCGAGGAAAGTCCTTAAAAAAGAATCTCAAACCTAAGTTGCATCAAGATGCCCCCACCCCACACTATATCAGAACCTGTACAG
@022542bc-00b3-4d6a-9226-9f5da14af8ea
AACACAAAACCCAAATAATTCAAGCACTGCCATTACAATTTTAACTGGGTCCTATTTACCTCCTACAAAGCCTCAGAGTTACTTCCGAGTCTCTTTTCCCCATTTCCGACGGTACTCTACGGCTTAATTTGGCCTCAGGCTCCACGGACTTACGACATTATTGGCTTCACTTCATACTGCTTCATCCGCCACTAATATTCACTTTTTACATCCAACATCATTGGCCATGAAGCCGCCGCCCTGATATCTGGCATTTTGTTAGATGTGGTTTGACTATTTCTGTATGTCTCCTTTATTGACGGAGGTTAGA
@04295bc9-d0af-4a85-a59f-fff2ef52ece6
ATAGAACTCTAACAGACAACAGAAAGTACCCTAGACCTTGTAGGAAAGGCTAAAGACATGGGTGCTGGAAATGGAGCCTTGGCCTAAAGATTCTAGACTTTAGCTCGCACCTGCATCAGCTGCACTCCAGACCATGGCCCATGGCAGGCACCAGTGACTTAATTTGGAAGCAGGAGATTGAGCACTGAGTGGGAGCTACCTTGCCTGTCCTTACCACTACTTCAGTAAATAAAGGCTCTTGGA
@049f55ce-f95e-4712-95a7-44a5709134f8
GGCAAAGGTTAGACTCCAGATGGAGAGATTTGATTAACGTTGAAAACTCTGAACCACAACTTGCTGGACATAAAGCTATTGAATGATGCTTTTGAGTGAAACACACAAGTTGGGTGAAGTGGATCAATGGCCGAAGGATTATAGTTATGTTTATGGTGGTACTGAAAGAAAGAACATTTGAGCAGTGCAGTTTTGTTGCTGTGCCCATGAATAGGAAATAGAAATTGTTTATTAG

Expected output:

gringer@musculus:~/install/radian/radian$ perl -pe 's/^@/>/' out_dir/reads-0.fasta 
>00256416-5423-47a9-ad91-54a87a6be5e5
GAGCCGTCCCTACAAACTGGAGCGTCTGCCCTATTTATCCACCTCACCTAGCAATAGATCCCATCCTCCATAATCAACAACAAAGTAATATTCGCCCCACTCTAGCCAGTACTGTTGACTCTAGCCAGACCTCCTCATTTCTAACCTGAGCGGAAGGACAACCTAGTAGCTTACCCCTACCATCATTGGACAAGCTGCATCCGTACTATACTTCACAACAATCCAACTATAAACAAACTATACTCCCTAAACAGAAACAATACCAATAGGCCTAGA
>018097e8-babe-4525-8426-e7cfe568219c
GGTCATCACTCACTCAGTAAATTAATAATTCATGGTGAGAGCCTTGCTCGAGGAAAGTCCTTAAAAAAGAATCTCAAACCTAAGTTGCATCAAGATGCCCCCACCCCACACTATATCAGAACCTGTACAG
>022542bc-00b3-4d6a-9226-9f5da14af8ea
AACACAAAACCCAAATAATTCAAGCACTGCCATTACAATTTTAACTGGGTCCTATTTACCTCCTACAAAGCCTCAGAGTTACTTCCGAGTCTCTTTTCCCCATTTCCGACGGTACTCTACGGCTTAATTTGGCCTCAGGCTCCACGGACTTACGACATTATTGGCTTCACTTCATACTGCTTCATCCGCCACTAATATTCACTTTTTACATCCAACATCATTGGCCATGAAGCCGCCGCCCTGATATCTGGCATTTTGTTAGATGTGGTTTGACTATTTCTGTATGTCTCCTTTATTGACGGAGGTTAGA
>04295bc9-d0af-4a85-a59f-fff2ef52ece6
ATAGAACTCTAACAGACAACAGAAAGTACCCTAGACCTTGTAGGAAAGGCTAAAGACATGGGTGCTGGAAATGGAGCCTTGGCCTAAAGATTCTAGACTTTAGCTCGCACCTGCATCAGCTGCACTCCAGACCATGGCCCATGGCAGGCACCAGTGACTTAATTTGGAAGCAGGAGATTGAGCACTGAGTGGGAGCTACCTTGCCTGTCCTTACCACTACTTCAGTAAATAAAGGCTCTTGGA
>049f55ce-f95e-4712-95a7-44a5709134f8
GGCAAAGGTTAGACTCCAGATGGAGAGATTTGATTAACGTTGAAAACTCTGAACCACAACTTGCTGGACATAAAGCTATTGAATGATGCTTTTGAGTGAAACACACAAGTTGGGTGAAGTGGATCAATGGCCGAAGGATTATAGTTATGTTTATGGTGGTACTGAAAGAAAGAACATTTGAGCAGTGCAGTTTTGTTGCTGTGCCCATGAATAGGAAATAGAAATTGTTTATTAG
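
The same workaround as the perl one-liner above can be written in Python. This is a minimal sketch, assuming the file contains only header and sequence lines (no FASTQ quality lines, which could themselves start with @). The names `fix_headers` and `fix_file` are illustrative and not part of radian.

```python
from pathlib import Path

def fix_headers(text):
    """Replace a leading '@' with '>' on every line that starts with '@'."""
    return "".join(
        ">" + line[1:] if line.startswith("@") else line
        for line in text.splitlines(keepends=True)
    )

def fix_file(path):
    """Rewrite one FASTA file in place with corrected headers."""
    p = Path(path)
    p.write_text(fix_headers(p.read_text()))

example = "@00256416-5423-47a9-ad91-54a87a6be5e5\nGAGCC\n"
print(fix_headers(example), end="")
```

Calling `fix_file("out_dir/reads-0.fasta")` would apply the change to the file shown above.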

Minimum Python version

I am trying to install radian on my laptop with Python 3.7, and pip reports: `Ignored the following versions that require a different python version: 3.6.0 Requires-Python >=3.8; 3.6.0rc1 Requires-Python >=3.8; 3.6.0rc2 Requires-Python >=3.8; 3.6.1 Requires-Python >=3.8`. Could the pinned matplotlib version be downgraded?
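
The pip message above says that every matplotlib 3.6.x release requires Python 3.8 or newer. A small guard like the following sketch could report this before installation is attempted. The minimum is taken from that message only, the function name is hypothetical, and other pinned packages may impose further limits.

```python
import sys

# From the "Requires-Python >=3.8" lines reported for matplotlib 3.6.x.
MIN_PYTHON = (3, 8)

def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Compare the (major, minor) part of a version tuple with the minimum."""
    return tuple(version_info[:2]) >= minimum

if not python_is_supported():
    print("Python %d.%d or newer is needed for the pinned matplotlib" % MIN_PYTHON)
```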

Requirements failed on Debian 5.18.16-1

Radian is not installing on my system because the requested tensorflow version is too low:

gringer@musculus:~/install/radian$ pip install -r requirements.txt 
Defaulting to user installation because normal site-packages is not writeable
Collecting attrdict~=2.0.1
  Using cached attrdict-2.0.1-py2.py3-none-any.whl (9.9 kB)
Collecting biopython~=1.79
  Using cached biopython-1.79-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.7 MB)
Collecting keras-tcn~=3.5.0
  Using cached keras_tcn-3.5.0-py3-none-any.whl (13 kB)
Requirement already satisfied: matplotlib~=3.6.1 in /home/gringer/.local/lib/python3.10/site-packages (from -r requirements.txt (line 4)) (3.6.1)
Collecting numpy~=1.19.5
  Using cached numpy-1.19.5.zip (7.3 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: ont-fast5-api~=4.1.0 in /home/gringer/.local/lib/python3.10/site-packages (from -r requirements.txt (line 6)) (4.1.0)
Collecting pysam~=0.19.1
  Using cached pysam-0.19.1-cp310-cp310-manylinux_2_24_x86_64.whl (15.2 MB)
Collecting PyYAML~=6.0
  Using cached PyYAML-6.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (682 kB)
Requirement already satisfied: scikit-learn~=1.1.2 in /home/gringer/.local/lib/python3.10/site-packages (from -r requirements.txt (line 9)) (1.1.2)
ERROR: Could not find a version that satisfies the requirement tensorflow~=2.4.4 (from versions: 2.8.0rc0, 2.8.0rc1, 2.8.0, 2.8.1, 2.8.2, 2.8.3, 2.9.0rc0, 2.9.0rc1, 2.9.0rc2, 2.9.0, 2.9.1, 2.9.2, 2.10.0rc0, 2.10.0rc1, 2.10.0rc2, 2.10.0rc3, 2.10.0, 2.11.0rc0, 2.11.0rc1)
ERROR: No matching distribution found for tensorflow~=2.4.4
gringer@musculus:~/install/radian$ uname -a
Linux musculus 5.18.0-4-amd64 #1 SMP PREEMPT_DYNAMIC Debian 5.18.16-1 (2022-08-10) x86_64 GNU/Linux

I changed the version to 2.9.2 (i.e. tensorflow ~= 2.9.2), and it got past that (but I'm expecting that there will need to be some code changes as well)... but I have further issues:

ERROR: Cannot install -r requirements.txt (line 10), -r requirements.txt (line 2), -r requirements.txt (line 3), -r requirements.txt (line 4), -r requirements.txt (line 6), -r requirements.txt (line 9) and numpy~=1.19.5 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested numpy~=1.19.5
    biopython 1.79 depends on numpy
    keras-tcn 3.5.0 depends on numpy
    matplotlib 3.6.1 depends on numpy>=1.19
    ont-fast5-api 4.1.0 depends on numpy>=1.16
    scikit-learn 1.1.2 depends on numpy>=1.17.3
    tensorflow 2.9.2 depends on numpy>=1.20

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

So I changed the numpy requirements as well, to numpy >= 1.20. This allowed it to install without error:

attrdict~=2.0.1
biopython~=1.79
keras-tcn~=3.5.0
matplotlib~=3.6.1
numpy>=1.20
ont-fast5-api~=4.1.0
pysam~=0.19.1
PyYAML~=6.0
scikit-learn~=1.1.2
tensorflow~=2.9.2 # Requires cuDNN 8.1, CUDA 11.2
textdistance~=4.5.0
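
The conflict reported above follows from how the `~=` (compatible release) operator works: `numpy~=1.19.5` accepts only 1.19.x releases from 1.19.5 upwards, while tensorflow 2.9.2 needs numpy>=1.20. The sketch below reimplements that rule for plain dotted numeric versions to make the mismatch explicit. It is a simplified illustration (no release candidates, epochs or local versions), and the function names are made up for this example.

```python
def parse(version):
    """Turn '1.19.5' into (1, 19, 5); only plain numeric versions are handled."""
    return tuple(int(part) for part in version.split("."))

def satisfies_compatible_release(candidate, pinned):
    """True if `candidate` satisfies `~=pinned`, i.e. >= pinned and same prefix."""
    pin = parse(pinned)
    if len(pin) < 2:
        raise ValueError("~= needs at least two version components")
    cand = parse(candidate)
    cand += (0,) * (len(pin) - len(cand))  # treat '1.19' as '1.19.0'
    prefix = pin[:-1]                      # all but the last pinned component
    return cand >= pin and cand[:len(prefix)] == prefix

print(satisfies_compatible_release("1.19.5", "1.19.5"))  # True
print(satisfies_compatible_release("1.20.0", "1.19.5"))  # False
print(satisfies_compatible_release("2.8.0", "2.4.4"))    # False
```

The last line mirrors the first error: none of the tensorflow versions pip could find (2.8.0 and later) falls inside `~=2.4.4`.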

basecall.py will run, but not complete, presumably due to slightly different library requirements:

gringer@musculus:~/install/radian/radian$ python3 radian/basecall.py data out_dir
2022-10-22 17:14:17.123853: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-10-22 17:14:17.123874: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
Traceback (most recent call last):
  File "/home/gringer/install/radian/radian/basecall.py", line 14, in <module>
    from utilities import get_config, setup_local
  File "/home/gringer/install/radian/radian/utilities.py", line 7, in <module>
    from attrdict import AttrDict
  File "/home/gringer/.local/lib/python3.10/site-packages/attrdict/__init__.py", line 5, in <module>
    from attrdict.mapping import AttrMap
  File "/home/gringer/.local/lib/python3.10/site-packages/attrdict/mapping.py", line 4, in <module>
    from collections import Mapping
ImportError: cannot import name 'Mapping' from 'collections' (/usr/lib/python3.10/collections/__init__.py)
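
The traceback above comes from attrdict importing `Mapping` from `collections`. Those abstract base class aliases were removed in Python 3.10 and now live only in `collections.abc`. One possible workaround, sketched below, is to restore the aliases before attrdict is imported. This is an assumption-laden stopgap rather than a fix in radian itself: the list of names is what attrdict 2.0.1 is believed to import, and it has not been verified against radian here.

```python
import collections
import collections.abc

# Restore the aliases that older libraries expect to find on `collections`.
# The names listed are an assumption about what attrdict 2.0.1 imports.
for name in ("Mapping", "MutableMapping", "Sequence"):
    if not hasattr(collections, name):
        setattr(collections, name, getattr(collections.abc, name))

# With the aliases in place, this import should no longer raise ImportError:
# from attrdict import AttrDict
```

The shim has to run before the first `import attrdict`, for example at the top of basecall.py. Running under an older Python version, or replacing attrdict, avoids the patch altogether.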
