
taiyaki's Introduction


We have a new bioinformatic resource that largely replaces the functionality of this project! See our new repository here: https://github.com/nanoporetech/bonito

This repository is now unsupported and we do not recommend its use. Please contact Oxford Nanopore: [email protected] for help with your application if it is not possible to upgrade to our new resources, or we are missing key features.


Taiyaki

Taiyaki is research software for training models for basecalling Oxford Nanopore reads.

Oxford Nanopore's devices measure the flow of ions through a nanopore, and detect changes in that flow as molecules pass through the pore. These signals can be highly complex and exhibit long-range dependencies, much like spoken or written language. Taiyaki can be used to train neural networks to understand the complex signal from a nanopore device, using techniques inspired by state-of-the-art language processing.

Taiyaki is used to train the models used to basecall DNA and RNA found in Oxford Nanopore's Guppy basecaller and for modified base detection with megalodon. This includes the flip-flop models, which are trained using a technique inspired by Connectionist Temporal Classification (Graves et al 2006).

Main features:

  • Prepare data for training basecallers by remapping signal to reference sequence
  • Train neural networks for flip-flop basecalling and squiggle prediction
  • Export basecaller models for use in Guppy and megalodon

Taiyaki is built on top of pytorch and is compatible with Python 3.5 or later. It is aimed at advanced users, and it is an actively evolving research project, so expect to get your hands dirty.

Contents

  1. Installing system prerequisites
  2. Installing Taiyaki
  3. Tests
  4. Walk through
  5. Workflows
    * Using the workflow Makefile
    * Steps from fast5 files to basecalling
    * Preparing a training set
    * Basecalling
    * Modified bases
    * Abinitio training
  6. Guppy compatibility
    * Q score calibration
    * Standard model parameters
  7. Environment variables
  8. CUDA
    * Troubleshooting
  9. Using multiple GPUs
    * How to launch training with multiple GPUs
    * Choice of learning rates for multi-GPU training
    * Selection of GPUs
    * More than one multi-GPU training group on a single machine
  10. Running on SGE
    * Installation
    * Execution
    * Selection of multiple GPUs in SGE
  11. Diagnostics

Installing system prerequisites

To install required system packages on ubuntu 16.04:

sudo make deps

Other linux platforms may be compatible, but are untested.

To accelerate model training with a GPU, you will need to install CUDA (which should install nvcc and add it to your path). See the instructions from NVIDIA and the CUDA section below.

Taiyaki also makes use of the OpenMP extensions for multi-processing. These are supported by the system-installed compiler on most modern Linux systems, but require a more modern version of the clang/llvm compiler than the one installed on macOS machines. Support for OpenMP was added to clang/llvm in version 3.7 (see http://llvm.org or use brew). Alternatively, you can install GCC on macOS using Homebrew.

Some analysis scripts require a recent version of the BWA aligner.

Windows is not supported.

Installing Taiyaki


NOTE If you intend to use Taiyaki with a GPU, make sure you have installed and set up CUDA before proceeding.

Install Taiyaki in a new virtual environment (RECOMMENDED)

We recommend installing Taiyaki in a self-contained virtual environment.

The following command creates a complete environment for developing and testing Taiyaki, in the directory venv:

make install

Taiyaki will be installed in development mode so that you can easily test your changes. You will need to run source venv/bin/activate at the start of each session when you want to use this virtual environment.

Install Taiyaki system-wide or into activated Python environment

This is not the recommended installation method: we recommend that you install taiyaki in its own virtual environment if possible.

Taiyaki can be installed from source using either:

python3 setup.py install
python3 setup.py develop #[development mode](http://setuptools.readthedocs.io/en/latest/setuptools.html#development-mode)

Alternatively, you can use pip with either:

pip install path/to/taiyaki/repo
pip install -e path/to/taiyaki/repo #[development mode](http://setuptools.readthedocs.io/en/latest/setuptools.html#development-mode)

Tests

Tests can be run as follows, provided that the recommended make install installation method was used:

source venv/bin/activate   # activates taiyaki virtual environment (do this first)
make workflow              # runs scripts which carry out the workflow for basecall-network training and for squiggle-predictor training
make acctest               # runs acceptance tests
make unittest              # runs unit tests
make multiGPU_test         # runs multi-GPU test (GPUs 0 and 1 must be available, and CUDA must be installed - see below)

Walk throughs and further documentation

For a walk-through of Taiyaki model training, including how to obtain sample training data, see docs/walkthrough.rst.

For an example of training a modified base model, see docs/modbase.rst.

Workflows

Using the workflow Makefile

The file at workflow/Makefile can be used to direct the process of generating ingredients for training and then running the training itself.

For example, if we have a directory read_dir containing fast5 files, and a fasta file refs.fa containing a ground-truth reference sequence for each read, we can (from the Taiyaki root directory) use the command line

make -f workflow/Makefile MAXREADS=1000 \
    READDIR=read_dir USER_PER_READ_REFERENCE_FILE=refs.fa \
    DEVICE=3 train_remapuser_ref

This will place the training ingredients in a directory RESULTS/training_ingredients and the training output (including logs and trained models) in RESULTS/remap_training, using GPU 3 and only reading the first 1000 reads in the directory. The fast5 files may be single or multi-read.

Using command line options to make, it is possible to change various other options, including the directory where the results go. Read the Makefile to find out about these options. The Makefile can also be used to follow a squiggle-mapping workflow.

The section below describes the steps in the workflow in more detail.

Steps from fast5 files to basecalling

The script bin/prepare_mapped_reads.py prepares a file containing mapped signals. This file is the main ingredient used to train a basecalling model.

The simplest workflow looks like this. The flow runs from top to bottom and lines show the inputs required for each stage. The scripts in the Taiyaki package are shown, as are the files they work with.

                   fast5 files
                  /          \
                 /            \
                /              \
               /   generate_per_read_params.py
               |                |
               |                |               fasta with reference
               |   per-read-params file         sequence for each read
               |   (tsv, contains shift,        (produced with get_refs_from_sam.py
               |   scale, trim for each read)   or some other method)
                \               |               /
                 \              |              /
                  \             |             /
                   \            |            /
                    \           |           /
                     \          |          /
                     prepare_mapped_reads.py
                     (also uses remapping flip-flop
                     model from models/)
                                |
                                |
                     mapped-signal-file (hdf5)
                                |
                                |
                     train_flipflop.py
                     (also uses definition
                     of model to be trained)
                                |
                                |
                     trained flip-flop model
                                |
                                |
                          dump_json.py
                                |
                                |
                     json model definition
                     (suitable for use by Guppy)

Each script in bin/ has lots of options, which you can find out about by reading the scripts. Basic usage is as follows:

bin/generate_per_read_params.py <directory containing fast5 files> --output <name of output per_read_tsv file>

bin/get_refs_from_sam.py <genomic references fasta> <one or more SAM/BAM files> --output <name of output reference_fasta>

bin/prepare_mapped_reads.py <directory containing fast5 files> <per_read_tsv> <output mapped_signal_file>  <file containing model for remapping>  <reference_fasta>

bin/train_flipflop.py --device <digit specifying GPU> <pytorch model definition> <mapped-signal files to train with>

Some scripts mentioned also have a useful option --limit which limits the number of reads to be used. This allows a quick test of a workflow.
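The four usage lines above can be chained from a small driver script. The sketch below only builds the argument lists and prints them; it does not run anything. The output file names and the helper name build_pipeline are hypothetical placeholders, and the argument order is copied from the usage lines above.

```python
def build_pipeline(read_dir, genome_fasta, sam_files, remap_model, model_def,
                   device=0, workdir="."):
    """Return the command lines for the basic workflow, in order.

    The intermediate file names are placeholders chosen for this sketch.
    """
    per_read_tsv = f"{workdir}/per_read_params.tsv"
    ref_fasta = f"{workdir}/read_references.fasta"
    mapped = f"{workdir}/mapped_reads.hdf5"
    return [
        ["bin/generate_per_read_params.py", read_dir,
         "--output", per_read_tsv],
        ["bin/get_refs_from_sam.py", genome_fasta, *sam_files,
         "--output", ref_fasta],
        ["bin/prepare_mapped_reads.py", read_dir, per_read_tsv, mapped,
         remap_model, ref_fasta],
        ["bin/train_flipflop.py", "--device", str(device), model_def, mapped],
    ]


# Print the commands so they can be inspected before running them by hand.
for cmd in build_pipeline("read_dir", "genome.fa", ["basecalls.sam"],
                          "remapping_model.checkpoint", "model_definition.py"):
    print(" ".join(cmd))
```

Each list can be passed to subprocess.run once the paths have been checked.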

Preparing a training set

The prepare_mapped_reads.py script prepares a data set to use to train a new basecaller. Each member of this data set contains:

  • The raw signal for a complete nanopore read (lifted from a fast5 file)
  • A reference sequence that is the "ground truth" for that read
  • An alignment between the signal and the reference

As input to this script, we need a directory containing fast5 files (either single-read or multi-read) and a fasta file that contains the ground-truth reference for each read. In order to match the raw signal to the correct ground-truth sequence, the IDs in the fasta file should be the unique read IDs assigned by MinKNOW (these are the same IDs that Guppy uses in its fastq output). For example, a record in the fasta file might look like:

>17296436-f2f1-4713-adaf-169ed9cf6aa6
TATGATGTGAGCTTATATTATTAATTTTGTATCAATCTTATTTTCTAATGTATGCATTTTAATGCTATAAATTTCCTTCTAAGCACTAC...
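Before running the remapping step, it can be worth checking that every record name in the per-read fasta looks like a read ID rather than, say, a chromosome name. The sketch below assumes that read IDs are UUIDs in the hyphenated form shown in the example above; that is an inference from the example, not something Taiyaki is stated to enforce. The function names are hypothetical.

```python
import uuid


def read_fasta_ids(lines):
    """Yield the ID (first word after '>') of each fasta record."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def is_read_id(name):
    """True if name is a UUID written in canonical hyphenated form."""
    try:
        return str(uuid.UUID(name)) == name.lower()
    except ValueError:
        return False


records = [
    ">17296436-f2f1-4713-adaf-169ed9cf6aa6",
    "TATGATGTGAGCTTATATTATTAATTTTG",
    ">chr1 some description",
    "ACGT",
]
# Collect record names that do not look like read IDs.
suspicious = [i for i in read_fasta_ids(records) if not is_read_id(i)]
print(suspicious)
```

In practice the lines would come from an open file handle rather than a list.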

The recommended way to produce this fasta file is as follows:

  1. Align Guppy fastq basecalls to a reference genome using Guppy Aligner or Minimap. This will produce one or more SAM files.
  2. Use the get_refs_from_sam.py script to extract a snippet of the reference for each mapped read. You can filter reads by coverage.

The final input required by prepare_mapped_reads.py is a pre-trained basecaller model, which is used to determine the alignment between raw signal and reference sequence. An example of such a model (for DNA sequenced with pore r9) is provided at models/mGru256_flipflop_remapping_model_r9_DNA.checkpoint. This does make the entire training process somewhat circular: you need a model to train a model. However, the new training set can be somewhat different from the data that the remapping model was trained on and things still work out. So, for example, if your samples are a bit weird and whacky, you may be able to improve basecall accuracy by retraining a model with Taiyaki. Internally, we use Taiyaki to train basecallers after incremental pore updates, and as a research tool into better basecalling methods. Taiyaki is not intended to enable training basecallers from scratch for novel nanopores. If it seems like remapping will not work for your data set, then you can use alternative methods so long as they produce data conformant with this format.

Basecalling

Taiyaki comes with a script to perform flip-flop basecalling using a GPU. This script requires CUDA and cupy to be installed.

Example usage:

bin/basecall.py <directory containing fast5s> <model checkpoint>  >  <output fasta>

A limited range of models can also be used with Guppy, which will provide better performance and stability. See the section on Guppy compatibility for more details.

Note: due to the RNA motor processing along the strand from 3' to 5', the base caller sees the read reversed relative to the natural orientation. Use bin/basecall.py --reverse to output the basecall of the read in its natural direction.

With the default settings, the script taiyaki/bin/basecall.py produces fasta files rather than fastqs, so no q-score calibration is needed. However the option --fastq may be used to generate fastqs instead. Because of a number of small differences between the implementation of basecalling in Guppy and Taiyaki, the q scores generated by the two systems will not be identical. Also see the section on qscore calibration below.

Modified Bases

Taiyaki enables the training of modified base basecalling (modbase) models. Modbase models will produce standard canonical basecalls along with the probability that each base is actually a modified alternative (e.g. 5mC, 5hmC, 6mA, etc.).

Modified base training requires the ground truth modified base content of each training read. This is provided as the input to the prepare_mapped_reads.py step of the training pipeline. Alternatively, Megalodon provides options to produce a modified base mapped signal file in a single command for certain sample types. See documentation for these options here.

In either case, the accuracy of this modified base markup is essential to producing a highly accurate modified base model.

Modified bases in the reference FASTA file provided to prepare_mapped_reads.py are represented by a single letter code. Each modified base must be annotated with its corresponding canonical base as well as a "long name". This specification is provided via the --mod argument to prepare_mapped_reads.py, which takes 3 arguments:

  1. Single letter modified base code (used in references FASTA file)
  2. Corresponding single letter canonical base code (A, C, G, or T)
  3. Modified base long name (e.g. 5mC, 5hmC, 6mA, etc.)

These values will be stored in the mapped signal file and later in the produced model. It is recommended that modified base codes follow specifications from the DNAmod database if possible (though many single letter codes are not defined). For example, to encode 5-methyl-cytosine and 6-methyl-adenosine with the single letter codes m and a respectively, the following command line arguments would be added: --mod m C 5mC --mod a A 6mA.
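When several modifications are in play, the repeated --mod arguments can be generated from a small table instead of typed by hand. This is a minimal sketch; the helper name mod_args and its validation rules are illustrative and not part of Taiyaki.

```python
def mod_args(mods):
    """Build repeated --mod arguments from (code, canonical, long_name) triples."""
    args = []
    for code, canonical, long_name in mods:
        if len(code) != 1:
            raise ValueError(f"modified base code must be one letter: {code!r}")
        if len(canonical) != 1 or canonical not in "ACGT":
            raise ValueError(f"canonical base must be A, C, G or T: {canonical!r}")
        args += ["--mod", code, canonical, long_name]
    return args


# The two modifications used as the example in the text.
print(" ".join(mod_args([("m", "C", "5mC"), ("a", "A", "6mA")])))
```

The returned list can be appended to the prepare_mapped_reads.py command line.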

In addition to the modbase training data, a modbase model requires a categorical modifications (cat_mod) model architecture. This model replaces the flip-flop layer with a similar layer that adds the logic to produce modified base probabilities. The recommended architecture is found in models/mLstm_cat_mod_flipflop.py and should be passed to the train_flipflop.py command as the first model argument.

The --mod_factor argument controls the proportion of the training loss attributed to the modified base output stream in comparison to the canonical base output stream. The default value of 1 should provide a high quality model in most cases (note this is different from previous recommendations).

Modified base models can be used in Guppy to call modified bases anchored to the basecalls, or in Megalodon to call modified bases anchored to a reference.

Abinitio training

'Ab initio' is an alternative entry point for Taiyaki that obtains acceptable models with fewer input requirements. In particular, it does not require a previously trained model.

The input for ab initio training is a set of signal-sequence pairs:

  • Fixed length chunks from reads
  • A reference sequence trimmed for each chunk.

The models produced are not as accurate as those produced by the normal training process but can be used to bootstrap it.

The process is described in the abinitio walk-through.

RNA

During DNA sequencing, the strands of DNA go through the pore starting at the 5' end of the molecule. In contrast, during direct RNA sequencing the strands go through the pore starting at the 3' end. As a consequence, the per-read reference sequences used for RNA training must be reversed with respect to the genome/exome reference sequence (there is no need to complement the sequences). Basecalls produced with RNA models will then need to be reversed again in order to align them to a reference.

In terms of the workflow described above, the following steps need to be changed:

  • If using get_refs_from_sam.py to produce per-read references, then add the --reverse option.
  • If using the basecall.py script in taiyaki, then add the --reverse option.
  • If basecalling with Guppy then use an RNA-specific config file (see the Guppy docs for more info).
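The reversal described above is a plain string reversal with no complementing, which is easy to confuse with the more familiar reverse complement. A minimal sketch (the function name is hypothetical):

```python
def reverse_for_rna(seq):
    """Reverse a sequence without complementing it."""
    return seq[::-1]


reference = "ACGGU"
training_ref = reverse_for_rna(reference)   # orientation seen by the pore
restored = reverse_for_rna(training_ref)    # back to the natural orientation
print(training_ref, restored)
```

Applying the function twice returns the original sequence, which mirrors reversing the per-read references for training and then reversing the basecalls again before alignment.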

Guppy compatibility

In order to train a model that is compatible with Guppy (version 2.2 at time of writing), we recommend that you use the model defined in models/mLstm_flipflop.py and that you call train_flipflop.py with:

train_flipflop.py --size 256 --stride 5 --winlen 19 mLstm_flipflop.py <other options...>

You should then be able to export your checkpoint to json (using bin/dump_json.py) that can be used to basecall with Guppy.

See Guppy documentation for more information on how to do this.

Key options include selecting the Guppy config file to be appropriate for your application, and passing the complete path of your .json file.

For example:

guppy_basecaller --input_path /path/to/input_reads --save_path /path/to/save_dir --config dna_r9.4.1_450bps_flipflop.cfg --model path/to/model.json --device cuda:1

Certain other model architectures may also be Guppy-compatible, but it is hard to give an exhaustive list and so we recommend you contact us to get confirmation.

Q score calibration

The Guppy config file contains parameters qscore_shift and qscore_scale which calibrate the q scores in fastq files. These parameters can also be overridden by Guppy basecaller command-line options. Since these parameters are specific to a particular model, the calibration will be incorrect for newly-trained models. The Taiyaki script misc/calibrate_qscores_byread.py can be used to calculate shift and scale parameters for a new model. The ingredients needed are an alignment summary (which may be a .txt file generated by the Guppy aligner or a .samacc file generated by taiyaki/misc/align.py) and the fastq files that go with it.

Standard model parameters

Because of differences in the chemistry, particularly sequencing speed, and sample rate, the models used in Guppy are trained with different parameters depending on condition. The default parameters for Taiyaki are generally those appropriate for a high accuracy DNA model and should be changed depending on what sample is being trained. The table below describes the parameters currently used to train the production models released as part of Guppy:

Condition            chunk_len_min  chunk_len_max  size  stride  winlen
DNA, high accuracy   3000           8000           256   5       19
DNA, fast            2000           4000           96    5       19
RNA, high accuracy   10000          20000          256   10      31
RNA, fast            10000          20000          96    12      31
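The table can be kept as a lookup so that the right options are always passed together. In the sketch below, the --size, --stride and --winlen flags appear in the Guppy compatibility section above; the --chunk_len_min and --chunk_len_max spellings are assumed to match the table headers and should be checked against train_flipflop.py. The dictionary keys are hypothetical labels.

```python
STANDARD_PARAMS = {
    "dna_hac":  dict(chunk_len_min=3000, chunk_len_max=8000, size=256, stride=5, winlen=19),
    "dna_fast": dict(chunk_len_min=2000, chunk_len_max=4000, size=96, stride=5, winlen=19),
    "rna_hac":  dict(chunk_len_min=10000, chunk_len_max=20000, size=256, stride=10, winlen=31),
    "rna_fast": dict(chunk_len_min=10000, chunk_len_max=20000, size=96, stride=12, winlen=31),
}

PARAM_ORDER = ("chunk_len_min", "chunk_len_max", "size", "stride", "winlen")


def training_args(condition):
    """Return command-line options for one row of the table."""
    params = STANDARD_PARAMS[condition]
    args = []
    for name in PARAM_ORDER:
        # Flag names for the chunk lengths are an assumption (see lead-in).
        args += [f"--{name}", str(params[name])]
    return args


print(" ".join(training_args("rna_hac")))
```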

Environment variables

The environment variables OMP_NUM_THREADS, OMP_PROC_BIND and OPENBLAS_NUM_THREADS can have an impact on performance. The optimal value will depend on your system and on the jobs you are running, so experiment. As a starting point, we recommend:

OPENBLAS_NUM_THREADS=1
OMP_NUM_THREADS=8
OMP_PROC_BIND=true

Note that when using multiple GPUs as recommended via torch.distributed.launch, OMP_PROC_BIND=true should be omitted.
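If training is launched from a Python driver, the recommended starting values can be applied to the child process environment without touching the parent shell. This is a sketch under that assumption; the helper name taiyaki_env is hypothetical, and the values are only the starting point suggested above.

```python
import os


def taiyaki_env(multi_gpu=False, omp_threads=8, base=None):
    """Return a copy of the environment with the suggested thread settings."""
    env = dict(os.environ if base is None else base)
    env["OPENBLAS_NUM_THREADS"] = "1"
    env["OMP_NUM_THREADS"] = str(omp_threads)
    if multi_gpu:
        # Omitted for multi-GPU runs, as noted in the text.
        env.pop("OMP_PROC_BIND", None)
    else:
        env["OMP_PROC_BIND"] = "true"
    return env


single = taiyaki_env(base={})
multi = taiyaki_env(multi_gpu=True, base={"OMP_PROC_BIND": "true"})
print(single)
print(multi)
```

The returned dictionary can be passed as the env argument of subprocess.run.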

CUDA

In order to use a GPU to accelerate model training, you will need to ensure that CUDA is installed (specifically nvcc) and that CUDA-related environment variables are set. This should be done before running make install described above. If you forgot to do this, just run make install again once everything is set up. The Makefile will try to detect which version of CUDA is present on your system, and install matching versions of pytorch and cupy. Taiyaki depends on pytorch version 1.2, which supports CUDA versions 9.2 and 10.0.

To see what version of CUDA will be detected and which torch and cupy packages will be installed you can run:

make show_cuda_version

Expert users can override the detected versions on the command line. For example, you might want to do this if you are building Taiyaki on one machine to run on another.

# Force CUDA version 9.2
CUDA=9.2 make install

# Override torch package, and don't install cupy at all
TORCH=my-special-torch-package CUPY= make install

Users who install Taiyaki system-wide or into an existing activated Python environment will need to make sure CUDA and a corresponding version of PyTorch have been installed.

Troubleshooting

During training, if this error occurs:

AttributeError: module 'torch._C' has no attribute '_cuda_setDevice'

or any other error related to the device, it suggests that you are trying to use pytorch's CUDA functionality but that CUDA (specifically nvcc) is either not installed or not correctly set up.

If:

nvcc --version

returns

-bash: nvcc: command not found

nvcc is not installed or it is not on your path.

Ensure that you have installed CUDA (check NVIDIA's instructions) and that the CUDA compiler nvcc is on your path.

To place cuda on your path enter the following:

export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
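The same check can be scripted, for example at the top of an installation or job script. The sketch below looks for a tool on the current path and then in a fallback directory; /usr/local/cuda/bin is the location used in the export lines above and may differ on your system. The function name is hypothetical.

```python
import shutil


def find_tool(name, extra_dirs=("/usr/local/cuda/bin",)):
    """Return the full path of an executable, or None if it cannot be found."""
    path = shutil.which(name)
    if path:
        return path
    for directory in extra_dirs:
        # The path argument restricts the search to the given directory.
        path = shutil.which(name, path=directory)
        if path:
            return path
    return None


nvcc = find_tool("nvcc")
if nvcc is None:
    print("nvcc not found: install CUDA or add its bin directory to PATH")
else:
    print("nvcc found at", nvcc)
```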

If you are installing Taiyaki in a new virtual environment (as recommended), you may need to run make install again once CUDA is correctly configured, to ensure that you have the correct pytorch package to match your CUDA version.

Using multiple GPUs

The script bin/train_flipflop.py can be used in multi-GPU mode with Pytorch's DistributedDataParallel class. With N GPUs available on a single machine, we can run N processes, each using one of the GPUs and processing different random selections from the same training data. The gradients are synchronised by averaging across the processes. The outcome is that the batch size is larger by a factor N than the batch size in single-GPU mode.

How to launch training with multiple GPUs

Multi-GPU training runs can be launched using the Pytorch distributed.launch module. For example, in a Taiyaki environment:

python -m torch.distributed.launch --nproc_per_node=4 train_flipflop.py --lr_max 0.004 --lr_min 0.0002 taiyaki/models/mLstm_flipflop.py mapped_reads.hdf5

This command line launches four processes, each using a GPU. Four GPUs numbered 0,1,2,3 must be available.

Note that all command-line options for train_flipflop.py are used in the same way as normal, apart from device.

The script workflow/test_multiGPU.sh provides an example. Note that the line choosing GPUs (export CUDA_VISIBLE_DEVICES...) may need to be edited to specify the GPUs to be used on your system.

Choice of learning rates for multi-GPU training

A higher learning rate can be used for large-batch or multi-GPU training. As a starting point, with N GPUs we recommend using a learning rate sqrt(N) times higher than used for a single GPU. With these settings we expect to make roughly the same training progress as a single-GPU training run but in N times fewer batches. This will not always be true: as always, experiments are necessary to find the best choice of hyperparameters. In particular, a lower learning rate than suggested by the square-root rule may be necessary in the early stages of training. One way to achieve this is by using the command-line arguments lr_warmup and warmup_batches. Also bear in mind that the timescale for the learning rate schedule, lr_cosine_iters should be changed to take into account the faster progress of training.
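The square-root rule is simple to apply when preparing a launch command. In the sketch below the single-GPU rates are illustrative values, not documented defaults, and the function name is hypothetical.

```python
import math


def scale_learning_rate(single_gpu_lr, n_gpus):
    """Scale a single-GPU learning rate by sqrt(N) for N GPUs."""
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    return single_gpu_lr * math.sqrt(n_gpus)


# Illustrative single-GPU values, scaled for 4 GPUs.
lr_max = scale_learning_rate(0.002, 4)
lr_min = scale_learning_rate(0.0001, 4)
print(f"--lr_max {lr_max:g} --lr_min {lr_min:g}")
```

This only gives a starting point; as the text says, a lower rate may still be needed early in training.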

Selection of GPUs for multi-GPU training

The settings above use the first nproc_per_node GPUs available on the machine. For example, with 8 GPUs and nproc_per_node = 4, we will use the GPUs numbered 0,1,2,3. This selection can be altered using the environment variable CUDA_VISIBLE_DEVICES. For example,

export CUDA_VISIBLE_DEVICES="2,4,6,7"

will make the GPUs numbered 2,4,6,7 available to CUDA as if they were numbers 0,1,2,3. If we then launch using the command line above (python -m torch.distributed.launch...), GPUs 2,4,6,7 will be used.
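The renumbering can be made explicit with a small lookup from the index a process sees to the physical GPU behind it. This sketch assumes CUDA_VISIBLE_DEVICES holds a comma-separated list of integers, as in the example above; the function name is hypothetical.

```python
def physical_gpu(visible_devices, logical_index):
    """Map a logical CUDA device index to the physical GPU number."""
    ids = [int(tok) for tok in visible_devices.split(",") if tok.strip()]
    return ids[logical_index]


visible = "2,4,6,7"
for logical in range(4):
    print(f"process {logical} -> physical GPU {physical_gpu(visible, logical)}")
```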

See below for how this applies in an SGE system.

More than one multi-GPU training group on a single machine

Suppose that there are 8 GPUs on your machine and you want to train two models, each using 4 GPUs. Setting CUDA_VISIBLE_DEVICES to "4,5,6,7" for the second training job, you set things off, but find that the second job fails with an error message like this

File "./bin/train_flipflop.py", line 178, in main
    torch.distributed.init_process_group(backend='nccl')
File "XXXXXX/taiyaki/venv/lib/python3.5/site-packages/torch/distributed/distributed_c10d.py", line 354, in init_process_group
    store, rank, world_size = next(rendezvous(url))
File "XXXXXX/taiyaki/venv/lib/python3.5/site-packages/torch/distributed/rendezvous.py", line 143, in _env_rendezvous_handler
    store = TCPStore(master_addr, master_port, start_daemon)
RuntimeError: Address already in use

The reason is that torch.distributed.launch sets up the process group with a fixed default IP address and port for communication between processes (master_addr 127.0.0.1, master_port 29500). The two process groups are trying to use the same port. To fix this, set off your second process group with a different address and port:

python -m torch.distributed.launch --nproc_per_node=4 --master_addr 127.0.0.2 --master_port 29501 train_flipflop.py <command-line-options>
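When several training groups are started from one script, giving each group its own port avoids the clash by construction. The sketch below changes only --master_port, on the assumption that a distinct port on the default address is sufficient; the command above also changes --master_addr. The helper name and the group numbering are hypothetical.

```python
DEFAULT_MASTER_PORT = 29500  # default port mentioned in the text


def launch_command(group_index, nproc_per_node, script_args):
    """Build a torch.distributed.launch command with a per-group port."""
    port = DEFAULT_MASTER_PORT + group_index
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={nproc_per_node}",
        "--master_port", str(port),
        *script_args,
    ]


for group in range(2):
    print(" ".join(launch_command(group, 4, ["train_flipflop.py"])))
```

Each group still needs its own CUDA_VISIBLE_DEVICES setting, as described above.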

Running on an SGE cluster

There are two things to get right: installing with the correct CUDA version, and executing with the correct choice of GPU.

Installation

It is important that when the package is installed, it knows which version of the CUDA compiler is available on the machine where it will be executed. When running on an SGE cluster we might want to do installation on a different machine from execution. There are two ways of getting around this. You can qlogin to a node which has the same resources as the execution node, and then install using that machine:

qlogin -l h=<nodename>
cd <taiyaki_directory>
make install

...or you can tell Taiyaki at the installation stage which version of CUDA to use. For example

CUDA=9.2 make install

Execution

When executing on an SGE cluster you need to make sure you run on a node which has GPUs available, and then tell Taiyaki to use the correct GPU.

You tell the system to wait for a node which has an available GPU by adding the option -l gpu=1 to your qsub command. To find out which GPU has been allocated to your job, you need to look at the environment variable SGE_HGR_gpu. If it has the value cuda0, then use GPU number 0, and if it has the value cuda1, then use GPU 1. The command line option --device (used by train_flipflop.py) accepts inputs such as 'cuda0' or 'cuda1' or integers 0 or 1, so SGE_HGR_gpu can be passed straight into the --device option.

The easy way to achieve this is with a Makefile like the one in the directory workflow. This Makefile contains comments which will help users run the package on a UGE system.

Selection of multiple GPUs in SGE

When multiple GPUs are available to an SGE job (for example, if we use the command line option -l gpu=4 in qsub to request 4 GPUs), the GPUs allocated are passed to the process in SGE_HGR_gpu. Unfortunately, CUDA_VISIBLE_DEVICES requires a comma-separated list of integers, and the list supplied in SGE_HGR_gpu is space-separated and contains strings like 'cuda0'. To get around this we first convert to a comma-separated list and then remove the word 'cuda'. These lines should be placed in the script before the training script is called.

COMMASEP=${SGE_HGR_gpu// /,}
export CUDA_VISIBLE_DEVICES=${COMMASEP//cuda/}
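For job scripts written in Python rather than bash, the same conversion is a one-liner. This sketch assumes the SGE_HGR_gpu format described above (space-separated tokens such as 'cuda0'); the function name is hypothetical.

```python
def sge_gpus_to_cuda_visible(sge_hgr_gpu):
    """Convert e.g. 'cuda0 cuda2' into '0,2' for CUDA_VISIBLE_DEVICES."""
    return ",".join(tok.replace("cuda", "") for tok in sge_hgr_gpu.split())


print(sge_gpus_to_cuda_visible("cuda0 cuda1 cuda2 cuda3"))
```

The result would then be assigned to os.environ["CUDA_VISIBLE_DEVICES"] before the training process is started.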

Also note that on nodes with many GPUs, port clashes may occur (see 'More than one multi-GPU training group on a single machine' above). They can be avoided with clever use of the command-line arguments of torch.distributed.launch.

Diagnostics

The misc directory contains several scripts that are useful for working out where things went wrong (or understanding why they went right).

Graphs showing the information in mapped read files can be plotted using the script plot_mapped_signals.py. A graph showing the progress of training can be plotted using the script plot_training.py.


This is a research release provided under the terms of the Oxford Nanopore Technologies' Public Licence. Research releases are provided as technology demonstrators to provide early access to features or stimulate Community development of tools. Support for this software will be minimal and is only provided directly by the developers. Feature requests, improvements, and discussions are welcome and can be implemented by forking and pull requests. However much as we would like to rectify every issue and piece of feedback users may have, the developers may have limited resource for support of this software. Research releases may be unstable and subject to rapid iteration by Oxford Nanopore Technologies.

© 2019 Oxford Nanopore Technologies Ltd. Taiyaki is distributed under the terms of the Oxford Nanopore Technologies' Public Licence.


taiyaki's Issues

Size training set ?

Dear,

I am leaving the Nanopore Day in Bordeaux and there Stephen Rudd presented the release of this promising tools.

What should be the size of the training set? Number of reads of the same reference?

I have a couple of experiments where I am able to link 1-to-1 the reads and the real expected sequence. Before writing some code to clean my data and format it to be acceptable by taiyaki, I would like to know if I can expect any improvement or not. :-)

Thank you in advance for any comments.

How to reinstall following git pull

Sorry for basic naive question. I pulled recent updates to the master git repository to my local git clone.

Does it require running make install again for everything to run smoothly?

Resolution Error when running get_refs_from_sam.py

SOLVED! I was just calling the function directly which was looking at my local environment. Calling the function inside bin/ helped solve the issue

When I try to run the get_refs_from_sam.py command I get the following error:

pkg_resources.ResolutionError: Script 'scripts/get_refs_from_sam.py' not found in metadata at '/yshare2/home/aakdemir/taiyaki/taiyaki.egg-info'

As for insight full command :

get_refs_from_sam.py  reference.fasta  ../basecall.sam  --min_coverage  0.8 > read_references.fasta

I did not produce the sam files using the guppy tool as in the example given in the workflow but I dont think it should be an issue.

I installed from source and did,

python3 setup.py install

Also activated the venv environment. generate_per_read_params.py works smoothly.

Replacing T for modified base in reference fasta file.

I am trying to prepare a reference FASTA to build a modified-base training set. I have mapped reads to the reference and then generated a FASTA sequence for each read.

To generate the data, I produced IVT RNA in which U is replaced by a modified U.

Is there an easy way to change every T to Y (the modified-base letter in the alphabet) in the reference FASTA file, which I can then feed into the rest of training?
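A minimal sketch of one way to do this substitution, assuming the reference FASTA has one record per read, that Y is the letter chosen for the modified base, and that every T in the modified reads really is replaced. The function name `substitute_base` and the demo record are hypothetical and not part of taiyaki.

```python
import io


def substitute_base(fasta_in, fasta_out, canonical="T", modified="Y"):
    """Copy a FASTA stream, replacing `canonical` with `modified` in sequence lines only."""
    for line in fasta_in:
        if line.startswith(">"):
            # Header lines carry the read ID and must stay untouched.
            fasta_out.write(line)
        else:
            fasta_out.write(line.replace(canonical, modified))


# In-memory demonstration; the read ID and sequence are made up.
demo_in = io.StringIO(">read_T1\nACGTTGCAT\n")
demo_out = io.StringIO()
substitute_base(demo_in, demo_out)
converted = demo_out.getvalue()
print(converted)
```

With real files, the same function can be called on two handles opened with `open(...)`. Skipping header lines matters because read IDs may themselves contain the letter T.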

Make Install on GPU

Hi,
I am trying to install taiyaki on GPU using 'make install'. Using python 3.5.2 and cuda 9.0, I get the following error:

building 'taiyaki.squiggle_match.squiggle_match' extension
/apps/well/gcc/5.4.0/bin/gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I./taiyaki/squiggle_match -I/gpfs1/well/ont/apps/taiyaki_gpu/taiyaki/venv/lib/python3.5/site-packages/numpy/core/include -I/apps/well/python/3.5.2-gcc5.4.0/include/python3.5m -c taiyaki/squiggle_match/squiggle_match.c -o build/temp.linux-x86_64-3.5/taiyaki/squiggle_match/squiggle_match.o -O3 -fopenmp -std=c99 -march=native
In file included from /gpfs1/well/ont/apps/taiyaki_gpu/taiyaki/venv/lib/python3.5/site-packages/numpy/core/include/numpy/ndarraytypes.h:1822:0,
from /gpfs1/well/ont/apps/taiyaki_gpu/taiyaki/venv/lib/python3.5/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
from /gpfs1/well/ont/apps/taiyaki_gpu/taiyaki/venv/lib/python3.5/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from taiyaki/squiggle_match/squiggle_match.c:626:
/gpfs1/well/ont/apps/taiyaki_gpu/taiyaki/venv/lib/python3.5/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
#warning "Using deprecated NumPy API, disable it with "
^
/tmp/ccqWrNNb.s: Assembler messages:
/tmp/ccqWrNNb.s:8517: Error: no such instruction: `shlx %rax,%rdi,%rdx'
error: command '/apps/well/gcc/5.4.0/bin/gcc' failed with exit status 1
make: *** [install] Error 1

So far I haven't found a solution. Do you have any idea what causes this?

Thanks,
Florian

Model trained on in vitro transcribed RNA works well on test set, but not on RNA isolated from cells

Hi, could you please clarify what --winlen means? Does it mean that windows of 19 bases are used for training (and thus later for basecalling)?

--winlen WINLEN Length of window over data (default: 19)

I'm asking because I've trained an RNA model using in vitro transcribed (IVT) RNA molecules that cover all possible 5-mers. The model works well on both the training and test sets (Fig1), but it doesn't work on RNA isolated from cells (Fig2), while the default guppy3 model works just fine (Fig3).
Would setting --winlen 5 (thus training to detect 5-mers) solve the issue?

Fig1: In vitro transcribed sample [guppy3 using model trained on IVT]

Fig2: Biological sample [guppy3 using model trained on IVT]

Fig3: Biological sample [guppy3 with default model]

In addition, could you please briefly describe how chunks.hdf5 was generated for ab initio training? That would be very useful in the near future!

prepare_reads script running very slowly

I am running this on around 300,000 direct RNA reads with 100% replacement with modified U.

I am running with --jobs 12 (the maximum I can do), but based on the progress so far this would take forever. I am unsure how much faster it would go with more threads.

In htop there only seem to be 4 jobs running at any one time. RAM is not maxed out.

Some warnings are printed, which may be the reason it runs slowly:

/taiyaki/venv/lib/python3.5/site-packages/torch/serialization.py:434: SourceChangeWarning: source code of class 'taiyaki.layers.Serial' has changed. you can retrieve the original source code by accessing the object's source attribute or set torch.nn.Module.dump_patches = True and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
/taiyaki/venv/lib/python3.5/site-packages/torch/serialization.py:434: SourceChangeWarning: source code of class 'taiyaki.layers.Convolution' has changed. you can retrieve the original source code by accessing the object's source attribute or set torch.nn.Module.dump_patches = True and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
/taiyaki/venv/lib/python3.5/site-packages/torch/serialization.py:434: SourceChangeWarning: source code of class 'taiyaki.layers.GruMod' has changed. you can retrieve the original source code by accessing the object's source attribute or set torch.nn.Module.dump_patches = True and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
/taiyaki/venv/lib/python3.5/site-packages/torch/serialization.py:434: SourceChangeWarning: source code of class 'taiyaki.layers.GlobalNormFlipFlop' has changed. you can retrieve the original source code by accessing the object's source attribute or set torch.nn.Module.dump_patches = True and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Dear All,
After several attempts to run taiyaki/bin/train_mod_flipflop.py, I keep getting an error related to cuDNN. Could you please help me resolve it? The error is as follows:

* Taiyaki version 4.1.0
* Command line
/home/gaurav/current_account/soft_scripts/ONT/taiyaki/bin/train_mod_flipflop.py --device 0 --mod_factor 0.01 --outdir training_walk_3 /home/gaurav/current_account/soft_scripts/ONT/taiyaki/models/mGru_cat_mod_flipflop.py intermediate_files/modbase_walk_3.hdf5 --overwrite
* Loading data from intermediate_files/modbase_walk_3.hdf5
* Per read file MD5 243a48f6886023013630f367d73be3eb
* Will train from all strands
* Loaded 5891 reads.
* Using alphabet definition: canonical alphabet ACGT with modified base(s) Y=6mA (alt to A), Z=5mC (alt to C)
* Sampled 1000 chunks: median(mean_dwell)=9.87, mad(mean_dwell)=0.86
* Reading network from /home/gaurav/current_account/soft_scripts/ONT/taiyaki/models/mGru_cat_mod_flipflop.py
1.0.0
Traceback (most recent call last):
  File "/home/gaurav/current_account/soft_scripts/ONT/taiyaki/bin/train_mod_flipflop.py", line 358, in <module>
    main()
  File "/home/gaurav/current_account/soft_scripts/ONT/taiyaki/bin/train_mod_flipflop.py", line 202, in main
    network = helpers.load_model(args.model, **model_kwargs).to(device)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 381, in to
    return self._apply(convert)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/rnn.py", line 118, in _apply
    self.flatten_parameters()
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/rnn.py", line 114, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

real	0m12.389s
user	0m16.438s
sys	0m14.909s

Additional info:
I have installed CUDA v10.0.13 and cuDNN v7.4.1 (compatible with CUDA v10.0; I also tried cuDNN v7.6.2); nvcc and Python 2.7 and 3.5 are available system-wide. The graphics card is an NVIDIA GEFORCE RTX-2080-Ti with Nvidia driver v410.48. I have installed Taiyaki in a venv as recommended, and the 'make install' process successfully installs all the dependencies, including pytorch v1.0.0 (which I believe is also associated with the error).
I also believe this could be caused by incompatibilities between versions of CUDA, cuDNN, PyTorch, Python, the Nvidia driver and the graphics card. It would therefore be helpful if anyone running Taiyaki successfully on an NVIDIA GEFORCE RTX-2080-Ti could post the versions of the dependencies mentioned above. Many thanks!

Best,
Gaurav

train using PCR fragments

Dear Developer,
I have a few PCR products sequenced with Nanopore. However, guppy basecalling is not very good. I would like to train a new model. Is that possible using sequences of PCR fragments and taiyaki?

Cheers
Luigi

pytorch1.2/python3.7/cpu/linux wheel download HTTP 403 error

Issue transferred from megalodon issue reported by @bcatano :

Receiving the following error when trying to install via the installation instructions. Previous step completed with no issues.

~/git/taiyaki$ make install 
rm -rf venv
virtualenv --python=python3 --prompt="(taiyaki) " venv
Running virtualenv with interpreter /home/grid/miniconda3/bin/python3
Using base prefix '/home/grid/miniconda3'
/usr/lib/python3/dist-packages/virtualenv.py:1086: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
New python executable in /home/grid/git/taiyaki/venv/bin/python3
Also creating executable in /home/grid/git/taiyaki/venv/bin/python
Installing setuptools, pkg_resources, pip, wheel...done.
source venv/bin/activate && \
    python3 venv/bin/pip install pip --upgrade && \
    mkdir -p /home/grid/.cache/taiyaki/wheelhouse/ && \
    python3 venv/bin/pip download --dest /home/grid/.cache/taiyaki/wheelhouse/ http://download.pytorch.org/whl/cpu/torch-1.2.0-cp37-cp37m-manylinux1_x86_64.whl && \
    python3 venv/bin/pip install --find-links /home/grid/.cache/taiyaki/wheelhouse/ --no-index torch && \
    python3 venv/bin/pip install -r requirements.txt  && \
    python3 venv/bin/pip install -r develop_requirements.txt && \
    python3 setup.py develop
Requirement already up-to-date: pip in ./venv/lib/python3.7/site-packages (19.2.3)
Collecting torch==1.2.0 from http://download.pytorch.org/whl/cpu/torch-1.2.0-cp37-cp37m-manylinux1_x86_64.whl
  ERROR: HTTP error 403 while getting http://download.pytorch.org/whl/cpu/torch-1.2.0-cp37-cp37m-manylinux1_x86_64.whl
  ERROR: Could not install requirement torch==1.2.0 from http://download.pytorch.org/whl/cpu/torch-1.2.0-cp37-cp37m-manylinux1_x86_64.whl because of error 403 Client Error: Forbidden for url: http://download.pytorch.org/whl/cpu/torch-1.2.0-cp37-cp37m-manylinux1_x86_64.whl
ERROR: Could not install requirement torch==1.2.0 from http://download.pytorch.org/whl/cpu/torch-1.2.0-cp37-cp37m-manylinux1_x86_64.whl because of HTTP error 403 Client Error: Forbidden for url: http://download.pytorch.org/whl/cpu/torch-1.2.0-cp37-cp37m-manylinux1_x86_64.whl for URL http://download.pytorch.org/whl/cpu/torch-1.2.0-cp37-cp37m-manylinux1_x86_64.whl
make: *** [Makefile:48: install] Error 1

Error in modbase-walkthrough

Hi,
I am trying to run taiyaki on the data provided in the walkthrough for modified base detection. I installed taiyaki in a virtual environment (taiyaki 4.1.0, torch 1.1.0, numpy 1.16.3).

When I run prepare_mapped_reads.py on a GPU as follows, I get the following error message:

prepare_mapped_reads.py --limit 100 --mod Z C 5mC --mod Y A 6mA reads flo_files/read_params.tsv flo_files/mapped_reads.hdf5 pretrained/r941_dna_minion.checkpoint modbase_references.fasta --overwrite

Running prepare_mapping using flip-flop remapping
Converting references to labels using canonical alphabet ACGT with modified base(s) Y=6mA (alt to A), Z=5mC (alt to C)
~/miniconda3/envs/flo_taiyaki/lib/python3.6/site-packages/torch-1.1.0-py3.6-linux-x86_64.egg/torch/serialization.py:454: SourceChangeWarning: source code of class 'taiyaki.layers.Serial' has changed. you can retrieve the original source code by accessing the object's source attribute or set torch.nn.Module.dump_patches = True and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)
.... more SourceChangeWarnings
~/miniconda3/envs/flo_taiyaki/lib/python3.6/site-packages/torch-1.1.0-py3.6-linux-x86_64.egg/torch/serialization.py:454: SourceChangeWarning: source code of class 'torch.nn.modules.linear.Linear' has changed. you can retrieve the original source code by accessing the object's source attribute or set torch.nn.Module.dump_patches = True and use the patch tool to revert the changes.
warnings.warn(msg, SourceChangeWarning)

  • 82 reads failed to produce remapping results due to: Failure applying basecall network to remap read.
  • 18 reads failed to produce remapping results due to: No fasta reference found.

I narrowed the problem down to line 65 in prepare_mapping_funcs.py:
transweights = modelOnDevice(signalTensor).cpu().numpy()

File "/users/bsg/kjv309/miniconda3/envs/flo_taiyaki/lib/python3.6/site-packages/taiyaki-4.1.0-py3.6-linux-x86_64.egg/taiyaki/prepare_mapping_funcs.py", line 65, in oneread_remap
transweights = modelOnDevice(signalTensor).cpu().numpy()

When removing the try .. except statement for this line I get the following error text:

Traceback (most recent call last):
  File "./miniconda3/envs/flo_taiyaki/bin/prepare_mapped_reads.py", line 4, in <module>
    __import__('pkg_resources').run_script('taiyaki==4.1.0', 'prepare_mapped_reads.py')
  File "./miniconda3/envs/flo_taiyaki/lib/python3.6/site-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "./miniconda3/envs/flo_taiyaki/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1453, in run_script
    exec(code, namespace, namespace)
  File "/gpfs0/./miniconda3/envs/flo_taiyaki/lib/python3.6/site-packages/taiyaki-4.1.0-py3.6-linux-x86_64.egg/EGG-INFO/scripts/prepare_mapped_reads.py", line 83, in <module>
    main()
  File "/gpfs0/./miniconda3/envs/flo_taiyaki/lib/python3.6/site-packages/taiyaki-4.1.0-py3.6-linux-x86_64.egg/EGG-INFO/scripts/prepare_mapped_reads.py", line 79, in main
    results, args.output, alphabet_info)
  File "./miniconda3/envs/flo_taiyaki/lib/python3.6/site-packages/taiyaki-4.1.0-py3.6-linux-x86_64.egg/taiyaki/prepare_mapping_funcs.py", line 102, in generate_output_from_results
    for resultdict, mesg in results:
  File "./miniconda3/envs/flo_taiyaki/lib/python3.6/site-packages/taiyaki-4.1.0-py3.6-linux-x86_64.egg/taiyaki/iterators.py", line 353, in imap_mp
    for r in map(my_function, args):
  File "./miniconda3/envs/flo_taiyaki/lib/python3.6/site-packages/taiyaki-4.1.0-py3.6-linux-x86_64.egg/taiyaki/prepare_mapping_funcs.py", line 65, in oneread_remap
    transweights = modelOnDevice(signalTensor).cpu().numpy()
  File "./miniconda3/envs/flo_taiyaki/lib/python3.6/site-packages/torch-1.1.0-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "./miniconda3/envs/flo_taiyaki/lib/python3.6/site-packages/taiyaki-4.1.0-py3.6-linux-x86_64.egg/taiyaki/layers.py", line 487, in forward
    x = layer(x)
  File "./miniconda3/envs/flo_taiyaki/lib/python3.6/site-packages/torch-1.1.0-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "./miniconda3/envs/flo_taiyaki/lib/python3.6/site-packages/taiyaki-4.1.0-py3.6-linux-x86_64.egg/taiyaki/layers.py", line 426, in forward
    out = self.activation(self.conv(self.pad(x)))
  File "./miniconda3/envs/flo_taiyaki/lib/python3.6/site-packages/torch-1.1.0-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "./miniconda3/envs/flo_taiyaki/lib/python3.6/site-packages/torch-1.1.0-py3.6-linux-x86_64.egg/torch/nn/modules/conv.py", line 190, in forward
    if self.padding_mode == 'circular':
  File "./miniconda3/envs/flo_taiyaki/lib/python3.6/site-packages/torch-1.1.0-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 539, in __getattr__
    type(self).__name__, name))
AttributeError: 'Conv1d' object has no attribute 'padding_mode'

Is this an issue concerning torch (related to the warnings) or something else? Do you have ideas on this issue?

Thanks,
Florian
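The traceback would be consistent with the pretrained checkpoint having been pickled under an older torch release whose convolution layers had no `padding_mode` attribute. That is an assumption, and the thread does not confirm it. Under that assumption, a possible workaround is to add the attribute after loading. The helper below is a hypothetical sketch, not part of taiyaki or torch; it relies only on the model exposing `modules()`, as `torch.nn.Module` does, and uses stand-in classes so it runs without torch.

```python
CONV_CLASS_NAMES = {"Conv1d", "Conv2d", "Conv3d"}


def add_missing_padding_mode(model, default="zeros"):
    """Set `padding_mode` on convolution layers that were unpickled without it.

    Returns the number of layers patched. "zeros" is assumed to match the
    behaviour the layers had when the checkpoint was written.
    """
    patched = 0
    for module in model.modules():
        is_conv = type(module).__name__ in CONV_CLASS_NAMES
        if is_conv and not hasattr(module, "padding_mode"):
            module.padding_mode = default
            patched += 1
    return patched


# Stand-in classes so the sketch runs without torch installed.
class Conv1d:
    pass


class Linear:
    pass


class FakeModel:
    def __init__(self):
        self.layers = [Conv1d(), Linear()]

    def modules(self):
        return iter(self.layers)


fake = FakeModel()
patched_count = add_missing_padding_mode(fake)
print(patched_count, fake.layers[0].padding_mode)
```

On a real model the call would go right after the checkpoint is loaded and before the first forward pass. Retraining or re-exporting the checkpoint under the installed torch version avoids the patch altogether.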

CUDA out of memory

I am unsure whether this belongs here or on a CUDA-specific thread.

I have reached the final step of training on IVT RNA synthesized with a modified base. Most reads could not be resquiggled, I guess because of the higher error rate caused by the modified base completely replacing U in the T7 pol mix.

The GPU runs out of memory, although nvidia-smi seems to show plenty of free space.

/taiyaki/venv/bin/train_flipflop.py --device 0 models/mGru_flipflop.py training mapped_reads.hdf5 --overwrite

  • Loading data from mapped_reads.hdf5
  • Per read file MD5 5f26a4f43d3e2acd15d8d65f6167136c
  • Reads not filtered by id
  • Loaded 148 reads.
  • Sampled 1000 chunks: median(mean_dwell)=37.97, mad(mean_dwell)=12.13
  • Reading network from models/mGru_flipflop.py
  • Network has 1989160 parameters.
  • Dumping initial model
  • Training
    ....Traceback (most recent call last):
    File "/taiyaki/venv/bin/train_flipflop.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
    File "/taiyaki/bin/train_flipflop.py", line 251, in <module>
    main()
    File "/taiyaki/bin/train_flipflop.py", line 203, in main
    loss.backward()
    File "/taiyaki/venv/lib/python3.5/site-packages/torch/tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
    File "/taiyaki/venv/lib/python3.5/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True) # allow_unreachable flag
    RuntimeError: CUDA out of memory. Tried to allocate 125.38 MiB (GPU 0; 5.93 GiB total capacity; 5.21 GiB already allocated; 119.06 MiB free; 17.47 MiB cached)

Does anyone have experience troubleshooting this? The GPU is an NVIDIA GeForce GTX 1060.

ImportError: /lib64/libc.so.6: version `GLIBC_2.14' not found

Hi:
When I run the make unittest command to confirm that taiyaki is working properly, I get the following error:

 ================================================ ERRORS ================================================
______________________________ ERROR collecting test/unit/test_layers.py _______________________________
ImportError while importing test module '/home/weir/software/taiyaki/test/unit/test_layers.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../test/unit/test_layers.py:6: in <module>
    import torch
../../venv/lib/python3.6/site-packages/torch/__init__.py:84: in <module>
    from torch._C import *
E   ImportError: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /home/weir/software/taiyaki/venv/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
======================================= 1 error in 1.61 seconds ========================================

Do I need to install a newer version of glibc?

guppy with new model trained on modified uridine RNA outputs high identities but low basecall quality

I have a similar yet opposite problem to the one in the previous issue #26 (comment).

I have a model that works well with the basecall.py script, but when it is dumped to a model file for guppy, the output basecalls have very low basecall quality, even though the reads used were originally filtered to pass min_qscore >7. Despite this, both the pass and fail basecalled reads map well (identities well over 95%).

I read that for RNA, increasing the chunk size to 1200 may help, but it helped only very marginally.

The only explanation I can think of is that guppy assumes RNA uses the AGCU alphabet and requires an additional step to output T, whereas I used the AGCT alphabet to train my model.

Below is percent identity vs length for mapped reads output by basecall.py and by guppy. I show output on the training data to illustrate that guppy outputs lower basecall qualities than the input data had. This data is a mixture of unmodified and modified reads.

Output from basecall.py looks solid, of course, because this is the data used for training (screenshot).

Output from guppy using the trained model and rna941_hac.cfg (screenshot).

Here is pycoQC of the sequencing summary file; around 2/3 of reads are classified as 'fail'.

Using the newly trained model for basecalling (screenshot).

Using the standard RNA hac model with guppy (screenshot).

To generate this model:

  1. I took reads generated from IVT mRNA with or without modified uridine. From the raw fast5 files I selected reads that passed guppy's filter and were longer than 500 nt, as I think IVT can produce plenty of prematurely terminated transcripts. I output fastq sequences with T instead of U.

  2. I then generated a reference fasta for each read. For the modified reads, I converted T to Y to mark the modification. For all reads, I reversed the sequence to match the orientation in which the physical mRNA molecule passes through the pore.

  3. I generated the mapped-read hdf5 files separately for unmodified and modified reads and later merged them with the merge.py script. The resquiggle used the pretrained/r941_rna_minion.checkpoint file given in the taiyaki_modbase walkthrough, with the AGCT alphabet, as I originally output T with guppy.

  4. I then trained using taiyaki/models/mGru_cat_mod_flipflop.py for the first round, with the AGCT alphabet and the additional parameters --winlen 31, --size 256, --stride 10 to make the model compatible with RNA hac.

As far as I can tell, training went smoothly. The squiggle-to-sequence mapping doesn't look as clean as in the DNA examples, but this probably has something to do with the DNA adaptor at the 3' end of the RNA molecule and the polyA tail, which are present in the raw read but absent from the basecalled sequence.

Only two kinds of nucleotides predicted by GUPPY on RNA model just trained

Hi:
I have encountered an issue similar to, but different from, #23 (comment).
In my case, only two kinds of nucleotides (UC) are predicted in all reads, like this:

@b706192c-74b0-4c84-809a-1787d3b566ba runid=5bf02a9de9e0ebfa894230c4ecc621d657cfbace sampleid=20190522-NPL0934-K2 read=14611 ch=107 start_time=2019-05-22T09:20:10Z
UCUUCGCUCUUCUCUCUCUCUCUCCUCUCUCUUCUCCU
+
#$$$%"%#%#$%$%#&"%%%$%$%%#$$&$%$#%$%$&
@2a7ead7c-de5f-4d2c-99bd-bd2b77f146c8 runid=5bf02a9de9e0ebfa894230c4ecc621d657cfbace sampleid=20190522-NPL0934-K2 read=16910 ch=261 start_time=2019-05-22T10:08:50Z
UCUCUCUCUUUCUCUCUUUUCUCUCUCUCUCCU
+
$%$%#%#$#$#%$%$%$###&$%$%$%#&$''$
@8b60657e-e05e-454f-89a6-a36511a78926 runid=5bf02a9de9e0ebfa894230c4ecc621d657cfbace sampleid=20190522-NPL0934-K2 read=1551 ch=266 start_time=2019-05-22T02:43:14Z
UUCCUCUUUUCUCUCU
+
$#%%#&##$#$$%#$&
@b7adb84b-e918-47ab-8ac8-a1d15944eb3e runid=5bf02a9de9e0ebfa894230c4ecc621d657cfbace sampleid=20190522-NPL0934-K2 read=27890 ch=478 start_time=2019-05-22T19:10:28Z
UCUCUUCUCUCUCUCCCGCCUCUCUCU

I also trained the model following your guide. The read lengths appear abnormal as well.
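The quality strings in the records above can be decoded to see how low the reported scores are. The sketch below assumes the standard Phred+33 FASTQ encoding and converts each score to an error probability with p = 10^(-Q/10). The function names are illustrative, and nothing here claims to reproduce how Guppy computes its own mean qscore.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


def mean_error_probability(scores):
    """Average the per-base error probabilities p = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in scores) / len(scores)


# Quality string of the third record shown above (16 bases).
quality = "$#%%#&##$#$$%#$&"
scores = phred_scores(quality)
mean_p = mean_error_probability(scores)
print(scores)
print(min(scores), max(scores))
```

Under that encoding every base in this read scores between Q2 and Q5, far below the Q>7 pass threshold mentioned in other issues here.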

train_mod: RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

I am trying to run train_mod on an HPC system with a GPU installed, but I get an error similar to #37.

environment

Ubuntu 16.04.6 LTS (GNU/Linux 4.4.0-75-generic x86_64)

The sysadmin installed Taiyaki system-wide with the setup.py file and created a virtual environment.

(python3) (base) callum@dgt-gpu1:~$ source taiyaki/venv/bin/activate
-bash: taiyaki/venv/bin/activate: No such file or directory
(python3) (base) callum@dgt-gpu1:~$ python -c 'import taiyaki; print(taiyaki.__version__)'
4.1.0
(python3) (base) callum@dgt-gpu1:~$ python -c 'import torch; print(torch.__version__)'
1.0.0
(python3) (base) callum@dgt-gpu1:~$ python -c 'import torch; print(torch.version.cuda)'
9.0.176
(python3) (base) callum@dgt-gpu1:~$ python -c 'import torch; print(torch.backends.cudnn.version())'
7401
(python3) (base) callum@dgt-gpu1:~$ python -c 'import torch; print(torch.backends.cudnn.enabled)'
True

Run

After activating environment

(python3) (base) callum@dgt-gpu1:~$ train_mod_flipflop.py --device cuda:0 --mod_factor 0.01 --outdir taiyaki/taiyaki_modbase/training_1 taiyaki/models/mGru_cat_mod_flipflop.py taiyaki/taiyaki_modbase/intermediate_files/modbase.hdf5 --overwrite

error traceback

* Taiyaki version 4.1.0
* Command line
/usr/local/python3/bin/train_mod_flipflop.py --device cuda:0 --mod_factor 0.01 --outdir taiyaki/taiyaki_modbase/training_1 taiyaki/models/mGru_cat_mod_flipflop.py taiyaki/taiyaki_modbase/intermediate_files/modbase.hdf5 --overwrite
* Loading data from taiyaki/taiyaki_modbase/intermediate_files/modbase.hdf5
* Per read file MD5 ffd4296fb9fce0b45ad0230c2f8936cb
* Will train from all strands
* Loaded 5891 reads.
* Using alphabet definition: canonical alphabet ACGT with modified base(s) Y=6mA (alt to A), Z=5mC (alt to C)
* Sampled 1000 chunks: median(mean_dwell)=9.90, mad(mean_dwell)=0.87
* Reading network from taiyaki/models/mGru_cat_mod_flipflop.py
Traceback (most recent call last):
  File "/usr/local/python3/bin/train_mod_flipflop.py", line 4, in <module>
    __import__('pkg_resources').run_script('taiyaki==4.1.0', 'train_mod_flipflop.py')
  File "/usr/local/python3/lib/python3.5/site-packages/pkg_resources/__init__.py", line 719, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/local/python3/lib/python3.5/site-packages/pkg_resources/__init__.py", line 1504, in run_script
    exec(code, namespace, namespace)
  File "/usr/local/python3/lib/python3.5/site-packages/taiyaki-4.1.0-py3.5-linux-x86_64.egg/EGG-INFO/scripts/train_mod_flipflop.py", line 358, in <module>
    main()
  File "/usr/local/python3/lib/python3.5/site-packages/taiyaki-4.1.0-py3.5-linux-x86_64.egg/EGG-INFO/scripts/train_mod_flipflop.py", line 202, in main
    network = helpers.load_model(args.model, **model_kwargs).to(device)
  File "/usr/local/python3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 381, in to
    return self._apply(convert)
  File "/usr/local/python3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/usr/local/python3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/usr/local/python3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/usr/local/python3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/usr/local/python3/lib/python3.5/site-packages/torch/nn/modules/rnn.py", line 117, in _apply
    self.flatten_parameters()
  File "/usr/local/python3/lib/python3.5/site-packages/torch/nn/modules/rnn.py", line 113, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

The sysadmin tried upgrading and downgrading the torch package and moving CUDA to 10.0.130, but the same error still occurs.

make tests, make acctest and make unittest all pass, but 2 tests were skipped:

../../test/unit/test_layers.py::ANNTest::test_017_decode_simple SKIPPED
../../test/unit/test_layers.py::GlobalNormFlipFlopTest::test_cupy_and_non_cupy_same SKIPPED

I am unsure whether this is related.

I managed to install and run train_mod on another local workstation, inside a docker container running ubuntu 16.04 with the same dependency versions. There I installed with git clone and then make install, not with setup.py.

The sysadmin advised me that there may be a bug in the setup.py file:

taiyaki/setup.py (from git clone) has a bug (the package_data line).
The diff is:

(taiyaki) anon@dgt-gpu1:~/src/taiyaki$ diff -u setup.py.orig setup.py
--- setup.py.orig	2019-08-23 11:04:56.570220292 +0900
+++ setup.py	2019-08-23 11:07:25.946714589 +0900
@@ -73,7 +73,7 @@
     ],
 
     packages=find_packages(exclude=["*.test", "*.test.*", "test.*", "test", "bin"]),
-    package_data={'configs': 'data/configs/*'},
+    package_data={'configs': ['data/configs/*']},
     exclude_package_data={'': ['*.hdf', '*.c', '*.h']},
     ext_modules=extensions,
     setup_requires=["pytest-runner", "pytest-xdist"],
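To illustrate why the one-line change in the diff matters: setuptools documents `package_data` as a mapping from package names to lists of glob patterns, and a bare Python string is itself iterable character by character. The sketch below only demonstrates that language-level difference. The helper name is hypothetical, and how any particular setuptools release reacts to the string form is not shown.

```python
# The value as written before and after the diff above.
wrong = {'configs': 'data/configs/*'}
right = {'configs': ['data/configs/*']}


def patterns_seen(package_data, package):
    """Return what code iterating over the value would treat as glob patterns."""
    return list(package_data[package])


wrong_patterns = patterns_seen(wrong, 'configs')
right_patterns = patterns_seen(right, 'configs')

print(wrong_patterns[:4])   # ['d', 'a', 't', 'a']
print(right_patterns)       # ['data/configs/*']
```

With the string form, anything that loops over the value sees fourteen single-character "patterns" instead of one glob.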

Several questions about processes and definitions

Hi:
I have a few questions I would like help with.

  1. Does a motif (like CCWGG for 5mC in E. coli) need to be specified to train a modified base model? If my modified base does not occur in any motif, will this pipeline still work?
  2. I am confused about the trim parameter. I think it limits the range of each read used for downstream analysis. For instance, for a read of 100 bases, if I set --trim 80 20, only the data in the interval from the 20th to the 80th base will be used. Is that right?
  3. How can I quickly and accurately change the content of references.fasta for each read? If I know the UUID of each read and the locations of the modification, I want to change the canonical base to the modified base at the known sites (e.g. ACCGTAGTGAT --> ACCGTXGTGAT).
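For the third question, a minimal sketch of a site-specific substitution, assuming the per-read references are already held as a dictionary keyed by read UUID and the known sites are given as 0-based positions. The read ID, the dictionaries and the function `mark_modified_sites` are hypothetical; X stands for the modified-base letter as in the example above.

```python
def mark_modified_sites(sequence, positions, modified="X", expected=None):
    """Replace the base at each 0-based position with the modified-base letter.

    If `expected` is given, raise when a site does not hold that canonical
    base, which catches off-by-one errors in the site list.
    """
    bases = list(sequence)
    for pos in positions:
        if expected is not None and bases[pos] != expected:
            raise ValueError("position %d holds %s, not %s" % (pos, bases[pos], expected))
        bases[pos] = modified
    return "".join(bases)


# Hypothetical inputs: one read and one known modified site.
references = {"read-uuid-1": "ACCGTAGTGAT"}
known_sites = {"read-uuid-1": [5]}

edited = {
    read_id: mark_modified_sites(seq, known_sites.get(read_id, []), expected="A")
    for read_id, seq in references.items()
}
print(edited["read-uuid-1"])
```

Reads with no entry in `known_sites` pass through unchanged, so the same loop can cover a mixture of modified and unmodified reads.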

taiyaki useful to try yet?

Is taiyaki as of today (2-Apr-2019) in a state that can be tried with a dataset to generate a new model and then apply the model to basecall a second dataset?

Should we wait for an imminent release that brings needed features to the fore?

Modbase Walkthrough

I've built Taiyaki on a GPU node with "make install".
Because the "generate_per_read_params.py" and "prepare_mapped_reads.py" commands don't appear to make use of the GPU I activated the venv and ran them from another compute node.

I get the error message:

* 2109 reads failed to produce remapping results due to: No fasta reference found.

Is this a problem with the walkthrough or is it a consequence of me running the binary on a different machine?

Thanks,

Dan

The default parameters for training are set for DNA and are not appropriate for RNA. https://github.com/nanoporetech/taiyaki/issues/23#issuecomment-498248344

The default parameters for training are set for DNA and are not appropriate for RNA. #23 (comment)

Until we get the documentation updated, the following table should give guidance as to appropriate settings.

Condition           chunk_len_min  chunk_len_max  size  stride  winlen
DNA, high accuracy  2000           4000           256   2 or 3  19
DNA, fast           2000           4000           96    4       19
RNA, high accuracy  2000           4000           256   10      31
RNA, fast           2000           4000           96    12      31

Originally posted by @tmassingham-ont in #32 (comment)
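The table can be kept as a small lookup that turns a condition into command-line arguments. This is a sketch that assumes each column corresponds to a training-script option of the same name; --size, --stride and --winlen appear in other issues here, while --chunk_len_min and --chunk_len_max as flag names are an assumption. For "DNA, high accuracy" the table allows a stride of 2 or 3, and 2 is picked arbitrarily.

```python
# Values copied from the table above; stride for "DNA, high accuracy" may be 2 or 3.
TRAINING_SETTINGS = {
    "DNA, high accuracy": {"chunk_len_min": 2000, "chunk_len_max": 4000,
                           "size": 256, "stride": 2, "winlen": 19},
    "DNA, fast": {"chunk_len_min": 2000, "chunk_len_max": 4000,
                  "size": 96, "stride": 4, "winlen": 19},
    "RNA, high accuracy": {"chunk_len_min": 2000, "chunk_len_max": 4000,
                           "size": 256, "stride": 10, "winlen": 31},
    "RNA, fast": {"chunk_len_min": 2000, "chunk_len_max": 4000,
                  "size": 96, "stride": 12, "winlen": 31},
}


def training_flags(condition):
    """Build a flat argument list (flag names assumed to match the column names)."""
    flags = []
    for name, value in TRAINING_SETTINGS[condition].items():
        flags.extend(["--" + name, str(value)])
    return flags


rna_hac_flags = training_flags("RNA, high accuracy")
print(" ".join(rna_hac_flags))
```

The resulting list can be appended to a training command passed to `subprocess.run`, once the flag names have been checked against the script's --help output.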

Only two nucleotides predicted by basecall.py on RNA model just trained

I have encountered an issue with basecall.py. Only two nucleotides (AT) are predicted in all reads. When I specify --alphabet ACGU, only AU is basecalled in all reads. Guppy predicts only a single nucleotide with the same model. Any idea what went wrong?
I've trained the model following your guide.

>000ef55b-c583-423c-8ca9-2961b723b07b
AT
>00182b59-9b3f-43ca-9822-3ce4b5e1b1aa
AT
>001dc7a7-a20e-4ec8-b62d-2f349396bdc6
AT
>0024f15a-eec9-4ad6-b0ec-00d9775677b4
AT
>003c5d98-6296-4f98-b821-723401f9f321
AT
>004f287a-5b53-4663-b452-7c846280b427
AT

Reads reported by taiyaki basecall.py are close to random while guppy with same model works as expected

Hi, I've observed weird behaviour of basecall.py with a model trained on RNA reads with modifications: the basecalling accuracy is close to random (1). Interestingly, Guppy works pretty well with the same model (2), better than with the default model shipped with Guppy (3). I'm expecting reads with lengths around 2.2-2.7 kb.
Can you please explain this?
I guess it makes no sense to look at the modifications in the hdf5 file produced by basecall.py, since the sequence prediction isn't accurate, right?

  1. basecall.py from taiyaki v4.1.0 using new model
basecall.py --device cuda:0 --alphabet ACGU --modified_base_output taiyaki.$s/$n.hdf5 $d $s.train2/model_final.checkpoint >  taiyaki.$s/$n.fa

  2. Guppy 3.1.5 using the same model trained with taiyaki v4.1.0
m=$s.train2/model_final.checkpoint; dump_json.py $m > $m.json
guppy_basecaller --device cuda:0 -c rna_r9.4.1_70bps_hac.cfg -m $m.json --compress_fastq --fast5_out -ri $d -s guppy3.$s/$n

  3. Guppy 3.1.5 using default RNA model
guppy_basecaller --device cuda:0 -c rna_r9.4.1_70bps_hac.cfg --num_callers 4 --compress_fastq --fast5_out -ri $d -s guppy3/$d

Filter raw fast5 mappable reads first?

I generated IVT-synthesized mRNA with 100% replacement with modified U.

This particular modification caused a big drop in Phred quality scores and mappability, so that only 10% of fast5 reads could be mapped, similar to the number of reads passing the Q>7 threshold.

I wonder whether it is better to first filter the raw fast5 reads down to those that can be mapped after a standard guppy basecall (maybe using SquiggleKit), and take only these through per_read_params > map_reads (resquiggle)?

From what I can see, map reads will only ever be able to remap to the squiggle those reads that can be aligned to the reference.fasta anyhow. So are only chunks from these reads given to the training scripts? Or does training just sample randomly from all the raw data?

Access to the Guppy template_r9.4.1_450bps_hac.jsn model for further training

I found that the pretrained model provided in the walkthrough (URL Link) performs slightly worse than the Guppy high accuracy model template_r9.4.1_450bps_hac.jsn.

Is it possible to provide the Guppy model for further training? (Or is there a way to convert the .jsn file into a .checkpoint format to be used with taiyaki for further training?)

Thanks a lot!

Taiyaki installation

Hi, we've got some users who are interested in Taiyaki, and although the application is relatively simple for them to install themselves, their storage is somewhat constrained. Additionally, our cluster has limited GPU nodes, so I'd like to write some documentation which encourages users not to run on the GPU nodes the portions of a Taiyaki workflow that do not use GPUs (I think this includes generate_per_read_params.py and prepare_mapped_reads.py). I'd like to install Taiyaki globally. I made a script using sed to edit the "venv" destination and did some testing, but I get an error message:

pkg_resources.DistributionNotFound: The 'taiyaki==4.1.0' distribution was not found and is required by the application

Is this because I have only the "venv" and not other dependencies in the git repo?

Any advice or suggestions would be appreciated.

train_mod_flipflop.py warns about over half of chunks failing maxdwell

Hi, I've started training model for RNA modification detection, but train_mod_flipflop.py warns about over half of chunks failing maxdwell.

* Summary: pass:100 maxdwell:112 tooshort:8 emptysequence:3 meandwell:23
...* Warning: only 126 chunks passed tests after 278 attempts.
* Summary: pass:126 maxdwell:111 tooshort:7 meandwell:29 emptysequence:5
.* Warning: only 111 chunks passed tests after 272 attempts.
* Summary: pass:111 emptysequence:10 maxdwell:114 meandwell:30 tooshort:7
41 0.103  179.49s (128.74 ksample/s 4.55 kbase/s) lr=9.96e-04 pass:332681 maxdwell:194491  tooshort:13425  meandwell:60513  emptysequence:20365

When I played with test data provided with taiyaki less than 10% of chunks failed maxdwell. I suspect some modification may in fact affect helicase thus dwell time may increase significantly and I suspect that's what's happening...

.................................................C  1000 0.029  151.65s (66.01 ksample/s 6.76 kbase/s) lr=4.00e-04 pass:3468812  meandwell:207586  maxdwell:342531  emptysequence:7256

Would it be reasonable to increase the values of --filter_max_dwell and --filter_mean_dwell? If so, how to estimate which values would be good (defaults are 10 and 3, respectively)?
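
Before changing either threshold, it may help to quantify how much each filter rejects. The following is a minimal sketch, not part of taiyaki, that parses the `name:count` tokens from a progress line like the ones above and reports the fraction of chunks in each category. The function names are hypothetical, and the sketch says nothing about which threshold values are appropriate.

```python
def parse_chunk_summary(text):
    """Collect 'name:count' tokens (e.g. 'maxdwell:194491') from one log line."""
    counts = {}
    for token in text.split():
        name, sep, value = token.rpartition(":")
        # Skip tokens such as 'Summary:' (no number) or 'lr=9.96e-04' (no colon).
        if sep and name and value.isdigit():
            counts[name] = int(value)
    return counts


def category_fractions(counts):
    """Return each category's share of all chunks that were examined."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    line = ("41 0.103  179.49s (128.74 ksample/s 4.55 kbase/s) lr=9.96e-04 "
            "pass:332681 maxdwell:194491  tooshort:13425  meandwell:60513  "
            "emptysequence:20365")
    for name, frac in sorted(category_fractions(parse_chunk_summary(line)).items()):
        print("{:15s} {:.3f}".format(name, frac))
```

Comparing these fractions between the test data and the modified-RNA data gives a concrete number for how much more often the dwell filters fire.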

basecall.py fails with RuntimeError: CUDA out of memory

Hi, basecall.py fails with CUDA out of memory toward the end of basecalling - that's odd as processing first 1-4k reads used ~5GB of vRAM on GTX 1080 Ti (11 GB vRAM), thus I don't see any reason why this should happen. In contrast guppy 3.1.5 runs just fine with the same model...

Traceback (most recent call last):
  File "/home/lpryszcz/src/taiyaki/venv/bin/basecall.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/home/lpryszcz/src/taiyaki/bin/basecall.py", line 198, in <module>
    main()
  File "/home/lpryszcz/src/taiyaki/bin/basecall.py", line 177, in main
    is_cat_mod, mods_fp)
  File "/home/lpryszcz/src/taiyaki/bin/basecall.py", line 83, in process_read
    out = model(torch.tensor(chunks, device=device))
  File "/home/lpryszcz/src/taiyaki/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lpryszcz/src/taiyaki/taiyaki/layers.py", line 487, in forward
    x = layer(x)
  File "/home/lpryszcz/src/taiyaki/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lpryszcz/src/taiyaki/taiyaki/layers.py", line 80, in forward
    return reverse(self.layer(reverse(x)))
  File "/home/lpryszcz/src/taiyaki/venv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lpryszcz/src/taiyaki/taiyaki/layers.py", line 362, in forward
    y, hy = self.cudnn_gru.forward(x)
  File "/home/lpryszcz/src/taiyaki/venv/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 179, in forward
    self.dropout, self.training, self.bidirectional, self.batch_first)
RuntimeError: CUDA out of memory. Tried to allocate 7.02 GiB (GPU 0; 10.92 GiB total capacity; 3.63 GiB already allocated; 6.07 GiB free; 647.31 MiB cached)
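
The traceback shows all chunks of a read being passed to the model in a single call, so a very long read could plausibly request one very large allocation. A common workaround for this kind of failure is to process the chunks in bounded batches. The helper below is a generic, hypothetical sketch of that idea using plain Python sequences. It is not taiyaki code, and which tensor axis indexes chunks in basecall.py is not visible in the traceback, so applying it there would need checking against the source.

```python
def iter_batches(items, max_batch):
    """Yield consecutive slices of `items`, each holding at most `max_batch` elements."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    for start in range(0, len(items), max_batch):
        yield items[start:start + max_batch]


if __name__ == "__main__":
    chunks = list(range(10))  # stand-in for the chunks of one long read
    for batch in iter_batches(chunks, 4):
        print(batch)
```

Each batch would then be run through the model separately and the outputs concatenated in order, trading some throughput for a bounded peak memory use.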

Problem with installation and prepare_mapped_reads.py

Hi,
I am new to bioinformatics, so probably my problem is easy to fix if I manage to describe it well enough. I am trying to install Taiyaki in a Docker container that runs Linux (I tried Ubuntu, Debian and Alpine from Docker Hub). I usually try to install all necessary packages (for numpy I tried different versions because I had deprecation warnings at some point), then I run the commands for the installation:

make deps
python3 setup.py develop #[development mode](http://setuptools.readthedocs.io/en/latest/setuptools.html#development-mode) 

When I run the tests (unittest seems to work)

make workflow          
make acctest  

I always end up with an error like the following:

Traceback (most recent call last):
  File "/taiyaki-master/bin/prepare_mapped_reads.py", line 8, in <module>
    from taiyaki.iterators import imap_mp
ImportError: No module named taiyaki.iterators

However, the taiyaki folder contains both an iterators.py and an iterators.pyc file. As I said, I tried several Linux versions but the problem was always very similar. The last time I tried was in Debian. I will attach the command history, but it contains many unnecessary and/or repetitive commands, so I am sorry for that.

root@cc942b1a2091:/taiyaki-master# history
    1  ls
    2  cd taiyaki-master/
    3  make deps
    4  apt-get update
    5  make deps
    6  apt-get install zlib1g-dev
    7  apt-get install libbz2-dev
    8  apt-get apt-utils
    9  apt install  apt-utils
   10  apt-get install libbz2-dev
   11  apt-get install liblzma-dev
   12  su -
   13  apt-get install liblzma-dev
   14  apt-get install libcurl4-gnutls-dev
   15  apt-get install samtools
   16  apt-get install dialog
   17  apt-get install Dialog
   18  apt-get install dialog
   19  apt-get install samtools
   20  apt-get install python-pip
   21  apt install python3-pip
   22  apt-get install python3.6
   23  run apt-get update
   24  apt-get update
   25  apt-get install python3.6
   26* apt-get installpip3 install numpy
   27  pip3 install numpy
   28  pip3 install Cython
   29  pip3 install pytest
   30  pip3 install parameterized
   31  make deps
   32  python3 setup.py develop #[development mode](http://setuptools.readthedocs.io/en/latest/setuptools.html#development-mode)
   33  pip3 uninstall numpy
   34  pip3 install 'numpy<1.16.0'
   35  install biopythonb
   36  install biopython
   37  pip3 install biopython
   38  pip3 install h5py
   39  pip3 install matplotlib
   40  pip3 install ont_fast5_api
   41  pip3 install pysam
   42  make deps
   43  python3 setup.py develop #[development mode](http://setuptools.readthedocs.io/en/latest/setuptools.html#development-mode)
   44  pip3 install pytest
   45  pip3 install parameterized
   46  make unittest
   47  make workflow
   48  make acctest
   49  make install
   50  pip3 install h5py
   51  make install
   52  pip install wheel
   53  pip3 install wheel
   54  apt-get install libhdf5-dev
   55  make install
   56  make unittest
   57  numpy.version.version
   58  numpy
   59  pip list
   60  pip3 list
   61  /usr/bin/python --version
   62  /usr/bin/python3 --version
   63  make unittest
   64  pip3 install numpy
   65  apt-get purge python-numpy
   66  apt-get purge numpy
   67  pip3 list
   68  pip3 install 'numpy>1.16.0'
   69  pip3 list
   70  make unittest
   71  make workflow
   72  make install
   73  history
   74* pip install -e path/to/taiyaki/repo #[development mode](http://setuptools.readthedocs.io/en/latest/setuptools.html#development-mode
   75  pip install taiyaki
   76  ls
   77  cd taiyaki
   78  ls
   79  cd ..
   80  make unittest
   81  make workflow
   82  history
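
The unquoted form of the message (`No module named taiyaki.iterators`) is what Python 2 prints, whereas Python 3 quotes the module name, so one thing worth checking is which interpreter actually ran the script. The following is a small diagnostic sketch, not part of taiyaki, that reports the running interpreter and whether a top-level package can be found by it. The function name is hypothetical.

```python
import importlib.util
import sys


def describe_import(module_name):
    """Report the running interpreter and whether a top-level module is importable."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,
        "python_version": "{}.{}".format(*sys.version_info[:2]),
        "module": module_name,
        "found": spec is not None,
        # Location the module would be loaded from, if it was found.
        "origin": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_import("taiyaki").items():
        print("{}: {}".format(key, value))
```

Running this with the same `python3` that was used for `setup.py develop` shows whether that interpreter can see the package, which can then be compared with the interpreter named in the failing script's first line.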

Warnings when training the modified model on the first round.

Hi:
When I run the following command:
train_mod_flipflop.py --device 0 --mod_factor 0.01 --outdir training taiyaki/models/mGru_cat_mod_flipflop.py modbase.hdf5
The following warning appears

......* Warning: only 47 chunks passed tests after 108 attempts.
* Summary: pass:47 meandwell:17 maxdwell:44
..* Warning: only 56 chunks passed tests after 114 attempts.
* Summary: pass:56 meandwell:14 maxdwell:43 emptysequence:1
...........* Warning: only 59 chunks passed tests after 126 attempts.
* Summary: pass:59 maxdwell:53 meandwell:13 emptysequence:1
.....* Warning: only 51 chunks passed tests after 108 attempts.
* Summary: pass:51 maxdwell:42 meandwell:15
...* Warning: only 46 chunks passed tests after 116 attempts.
* Summary: pass:46 maxdwell:51 meandwell:19
..* Warning: only 43 chunks passed tests after 102 attempts.
* Summary: pass:43 maxdwell:48 meandwell:11
.......* Warning: only 46 chunks passed tests after 100 attempts.
* Summary: pass:46 maxdwell:46 meandwell:8
...* Warning: only 38 chunks passed tests after 102 attempts.
* Summary: pass:38 maxdwell:49 meandwell:15
.   998 0.090  138.71s (70.14 ksample/s 2.37 kbase/s) lr=4.00e-04 pass:3412193  maxdwell:1954485  meandwell:647545  emptysequence:25253  tooshort:58
......* Warning: only 46 chunks passed tests after 106 attempts.
* Summary: pass:46 maxdwell:47 meandwell:13
............* Warning: only 45 chunks passed tests after 102 attempts.
* Summary: pass:45 maxdwell:42 meandwell:15
................* Warning: only 51 chunks passed tests after 108 attempts.
* Summary: pass:51 maxdwell:46 meandwell:11
.............* Warning: only 52 chunks passed tests after 114 attempts.
* Summary: pass:52 maxdwell:46 meandwell:16
...   999 0.090  134.75s (73.59 ksample/s 2.48 kbase/s) lr=4.00e-04 pass:3415745  maxdwell:1956435  meandwell:648244  emptysequence:25283  tooshort:58
..................* Warning: only 47 chunks passed tests after 102 attempts.
* Summary: pass:47 meandwell:8 maxdwell:47
.* Warning: only 48 chunks passed tests after 102 attempts.
* Summary: pass:48 meandwell:9 maxdwell:45
..........* Warning: only 46 chunks passed tests after 104 attempts.
* Summary: pass:46 maxdwell:52 meandwell:6
....* Warning: only 48 chunks passed tests after 104 attempts.
* Summary: pass:48 maxdwell:48 meandwell:8
.* Warning: only 50 chunks passed tests after 104 attempts.
* Summary: pass:50 maxdwell:48 meandwell:6
..............* Warning: only 59 chunks passed tests after 122 attempts.
* Summary: pass:59 maxdwell:51 meandwell:12
.C  1000 0.090  137.18s (72.25 ksample/s 2.43 kbase/s) lr=4.00e-04 pass:3419207  maxdwell:1958427  meandwell:648857  emptysequence:25302  tooshort:58

How should I understand the meaning of "chunk"? Does it mean something like ‘batch size’?
Thank you for your support.

Two questions about the results

Hi:

I find that basecalls.fa, generated by basecall.py with the model_final checkpoint, cannot be mapped to the reference sequences at all (0% mapping ratio).
However, the fastq file basecalled by guppy_basecaller with model.json (derived from the model_final checkpoint) aligns almost completely to the reference sequences (99.89% mapping ratio).
This is just like the problem I found before.
#38 (comment)

In basecalls.hdf5, which contains the information about the presence of modifications, the distribution of modified bases is extremely uneven. Some reads contain dozens of consecutive modified bases with high probability.

Update: The sequence from basecall.py needs to be reversed if the sequencing method is direct RNA.
The second problem was a parallel computation error in my custom script. Now settled.
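
The reversal mentioned in the update can be sketched as follows. This is a minimal illustration, not a taiyaki utility: it reverses each record's sequence without complementing it, joins multi-line records first so the whole sequence is reversed as one string, and ignores any sequence text that appears before the first header. The function name is hypothetical.

```python
def reverse_fasta(lines):
    """Yield (read_id, reversed_sequence) for each FASTA record in `lines`."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)[::-1]
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)[::-1]


if __name__ == "__main__":
    example = [">read1", "ACGU", "UU", ">read2", "AU"]
    for read_id, sequence in reverse_fasta(example):
        print(">{}\n{}".format(read_id, sequence))
```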

pysam wheel install issues

I'm trying to install Taiyaki on my local system (2x10 cores, 64 GB RAM, 1 TB SSD) running Ubuntu 16.04 LTS.
I got over some initial issues (trouble finding the dependency for CUDA 10; I could only work out how to install CUDA 10.1, so I changed the makefile accordingly to TORCH_CUDA=10.1 and CUPY_10.1=cupy-cuda101), and now the makefile seems happy pulling down and installing all the dependencies until it gets to pysam.
It consistently falls over trying to build a pip wheel for pysam (see the attached install terminal text).
Any ideas? Is it pointing to the correct pysam repository in the makefile? Is changing to CUDA 10.1 causing issues?
Any help massively appreciated.

Best wishes
install_script.txt

Nick

"ImportError: No module named 'taiyaki'" while testing workflow on Ubuntu 16.04

Hello Everyone,

Could you help me to resolve this issue?
I have installed Taiyaki in virtual environment mode. And I wanted to test taiyaki workflow.
The command I used: sudo make workflow and I got the error as follows:

./workflow/remap_from_samrefs_then_train_test_workflow.sh
+ set -o pipefail
+ echo ''

+ echo 'Test of extract-ref-from-sam followed by flip-flop remap and basecall network training starting'
Test of extract-ref-from-sam followed by flip-flop remap and basecall network training starting
+ echo ''

+ READ_DIR=test/data/reads
+ SAM_DIR=test/data/aligner_output
++ xargs
++ ls test/data/aligner_output/fastq_runid_9a076f39fd3254aeacc15a915c736105296275f3_0.sam test/data/aligner_output/fastq_runid_9a076f39fd3254aeacc15a915c736105296275f3_1.sam test/data/aligner_output/fastq_runid_9a076f39fd3254aeacc15a915c736105296275f3_2.sam test/data/aligner_output/fastq_runid_9a076f39fd3254aeacc15a915c736105296275f3_3.sam
+ SAMFILES='test/data/aligner_output/fastq_runid_9a076f39fd3254aeacc15a915c736105296275f3_0.sam test/data/aligner_output/fastq_runid_9a076f39fd3254aeacc15a915c736105296275f3_1.sam test/data/aligner_output/fastq_runid_9a076f39fd3254aeacc15a915c736105296275f3_2.sam test/data/aligner_output/fastq_runid_9a076f39fd3254aeacc15a915c736105296275f3_3.sam'
+ REFERENCEFILE=test/data/genomic_reference.fasta
+ echo 'SAMFILES=test/data/aligner_output/fastq_runid_9a076f39fd3254aeacc15a915c736105296275f3_0.sam test/data/aligner_output/fastq_runid_9a076f39fd3254aeacc15a915c736105296275f3_1.sam test/data/aligner_output/fastq_runid_9a076f39fd3254aeacc15a915c736105296275f3_2.sam test/data/aligner_output/fastq_runid_9a076f39fd3254aeacc15a915c736105296275f3_3.sam'
SAMFILES=test/data/aligner_output/fastq_runid_9a076f39fd3254aeacc15a915c736105296275f3_0.sam test/data/aligner_output/fastq_runid_9a076f39fd3254aeacc15a915c736105296275f3_1.sam test/data/aligner_output/fastq_runid_9a076f39fd3254aeacc15a915c736105296275f3_2.sam test/data/aligner_output/fastq_runid_9a076f39fd3254aeacc15a915c736105296275f3_3.sam
+ echo REFERENCEFILE=test/data/genomic_reference.fasta
REFERENCEFILE=test/data/genomic_reference.fasta
++ pwd
+ TAIYAKI_DIR=/opt/taiyaki-master_1
+ RESULT_DIR=/opt/taiyaki-master_1/RESULTS/train_remap_samref
+ rm -rf /opt/taiyaki-master_1/RESULTS/train_remap_samref
+ rm -rf /opt/taiyaki-master_1/RESULTS/training_ingredients
+ make -f workflow/Makefile NETWORK_SIZE=96 MAXREADS=10 READDIR=test/data/reads TAIYAKI_ROOT=/opt/taiyaki-master_1 DEVICE=cpu MAX_TRAINING_ITERS=2 'BAMFILE=test/data/aligner_output/fastq_runid_9a076f39fd3254aeacc15a915c736105296275f3_0.sam test/data/aligner_output/fastq_runid_9a076f39fd3254aeacc15a915c736105296275f3_1.sam test/data/aligner_output/fastq_runid_9a076f39fd3254aeacc15a915c736105296275f3_2.sam test/data/aligner_output/fastq_runid_9a076f39fd3254aeacc15a915c736105296275f3_3.sam' REFERENCEFILE=test/data/genomic_reference.fasta SEED=1 TAIYAKIACTIVATE= train_remap_samref
make[1]: Entering directory '/opt/taiyaki-master_1'

------------Setting up directory /opt/taiyaki-master_1/RESULTS/training_ingredients

mkdir /opt/taiyaki-master_1/RESULTS/training_ingredients

------------Creating reference file from sam or bam at test/data/aligner_output/fastq_runid_9a076f39fd3254aeacc15a915c736105296275f3_0.sam test/data/aligner_output/fastq_runid_9a076f39fd3254aeacc15a915c736105296275f3_1.sam test/data/aligner_output/fastq_runid_9a076f39fd3254aeacc15a915c736105296275f3_2.sam test/data/aligner_output/fastq_runid_9a076f39fd3254aeacc15a915c736105296275f3_3.sam

/opt/taiyaki-master_1/bin/get_refs_from_sam.py test/data/genomic_reference.fasta test/data/aligner_output/fastq_runid_9a076f39fd3254aeacc15a915c736105296275f3_0.sam test/data/aligner_output/fastq_runid_9a076f39fd3254aeacc15a915c736105296275f3_1.sam test/data/aligner_output/fastq_runid_9a076f39fd3254aeacc15a915c736105296275f3_2.sam test/data/aligner_output/fastq_runid_9a076f39fd3254aeacc15a915c736105296275f3_3.sam > /opt/taiyaki-master_1/RESULTS/training_ingredients/per_read_references_from_sam.fa
Traceback (most recent call last):
  File "/opt/taiyaki-master_1/bin/get_refs_from_sam.py", line 6, in <module>
    from taiyaki.bio import reverse_complement
ImportError: No module named 'taiyaki'
workflow/Makefile:240: recipe for target '/opt/taiyaki-master_1/RESULTS/training_ingredients/per_read_references_from_sam.fa' failed
make[1]: *** [/opt/taiyaki-master_1/RESULTS/training_ingredients/per_read_references_from_sam.fa] Error 1
make[1]: Leaving directory '/opt/taiyaki-master_1'
Makefile:117: recipe for target 'workflow' failed
make: *** [workflow] Error 2

Additional info:

  1. In the installation directory (downloaded & unzipped), I used sudo make deps
  2. sudo make install
  3. sudo make workflow. I learnt from the error that packages like numpy and torch were missing, even though I had already run sudo make deps to install the dependencies.
  4. Installed all the dependencies listed in the 'requirements.txt' file of the installation directory, using commands of the form python3 -m pip install matplotlib

System info
Ubuntu 16.04
Python 2.7 & 3.5 installed
pysam==0.15.2, scipy==1.3.0, matplotlib==3.0.3, torch==1.0.1.post2, numpy==1.16.4, biopython==1.73, Cython==0.29.12, h5py==2.9.0

Thanks in advance !

Best regards,
Gaurav
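
One possible explanation, not confirmed in this thread, is that `sudo` ran the workflow with a different environment from the one in which Taiyaki was installed, since sudo commonly resets variables such as PATH. The sketch below is a hypothetical diagnostic, not part of taiyaki, that reports whether the current interpreter is running inside a virtual environment and whether it was launched through sudo.

```python
import os
import sys


def environment_report():
    """Summarise which Python is running and whether a virtualenv or sudo is involved."""
    # sys.base_prefix differs from sys.prefix inside a venv (Python 3.3+).
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return {
        "executable": sys.executable,
        "in_venv": sys.prefix != base_prefix,
        "virtual_env_var": os.environ.get("VIRTUAL_ENV"),
        # sudo sets SUDO_USER to the invoking user's name.
        "sudo_user": os.environ.get("SUDO_USER"),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print("{}: {}".format(key, value))
```

Comparing the output of `python3 report.py` with that of `sudo python3 report.py` (the file name is only an example) shows whether the two invocations resolve to the same interpreter.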

prepare_mapped_reads.py fails to produce remapping on GPU while it works just fine on CPU

Hi, prepare_mapped_reads.py fails to produce remapping on GPU while it works just fine on CPU. I'm training RNA models for detection of modifications using pretrained/r941_rna_minion.checkpoint. Any idea?

prepare_mapped_reads.py --device cuda:0 --jobs 4 --alphabet ACGU ... modbase.hdf5
* 18150 reads failed to produce remapping results due to: Failure applying basecall network to remap read.

prepare_mapped_reads.py --jobs 4 --alphabet ACGU ... modbase_cpu.hdf5
..................................................      50
...
..................................................   18150

ll *.hdf5
-rwxr-xr-x 1 lpryszcz lpryszcz 4.3G May 29 23:25 mapped_reads_cpu.hdf5*
-rwxr-xr-x 1 lpryszcz lpryszcz 6.1K May 29 19:44 mapped_reads.hdf5*

train network with mapped_reads.hdf5 supplied in the intermediate_files

Hi,
While going through the walkthrough, I tried to train the neural network with the mapped_reads.hdf5 file supplied in intermediate_files, and received the following error:

(taiyaki) wenxu@wenxu-C9X299-PG300F:~/Documents/nanopore/demo$ train_flipflop.py --device 0 /home/wenxu/Downloads/taiyaki-master/models/mGru_flipflop.py training mapped_reads.hdf5

  • Taiyaki version 4.1.0
  • Command line
    /home/wenxu/Downloads/taiyaki-master/venv/bin/train_flipflop.py --device 0 /home/wenxu/Downloads/taiyaki-master/models/mGru_flipflop.py training mapped_reads.hdf5
  • Loading data from mapped_reads.hdf5
  • Per read file MD5 fc5b1e9f9399df2a89811755e070e73b
  • Reads not filtered by id
    Traceback (most recent call last):
      File "/home/wenxu/Downloads/taiyaki-master/venv/bin/train_flipflop.py", line 4, in <module>
        __import__('pkg_resources').run_script('taiyaki==4.1.0', 'train_flipflop.py')
      File "/home/wenxu/Downloads/taiyaki-master/venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 739, in run_script
        self.require(requires)[0].run_script(script_name, ns)
      File "/home/wenxu/Downloads/taiyaki-master/venv/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1494, in run_script
        exec(code, namespace, namespace)
      File "/home/wenxu/Downloads/taiyaki-master/venv/lib/python3.6/site-packages/taiyaki-4.1.0-py3.6-linux-x86_64.egg/EGG-INFO/scripts/train_flipflop.py", line 277, in <module>
        main()
      File "/home/wenxu/Downloads/taiyaki-master/venv/lib/python3.6/site-packages/taiyaki-4.1.0-py3.6-linux-x86_64.egg/EGG-INFO/scripts/train_flipflop.py", line 113, in main
        with mapped_signal_files.HDF5Reader(args.input) as per_read_file:
      File "/home/wenxu/Downloads/taiyaki-master/venv/lib/python3.6/site-packages/taiyaki-4.1.0-py3.6-linux-x86_64.egg/taiyaki/mapped_signal_files.py", line 526, in __init__
        assert self.version == _version, 'Incorrect file version, got {} expected {}'.format(self.version, _version)
    AssertionError: Incorrect file version, got 7 expected 8

Is there any help on this? Thanks!

Guppy can't run on GPU with trained model

Hi,
I used Taiyaki (4.1.0) to train a model to detect m6A mods in RNA and converted the model to JSON (with bin/dump_json.py).
I can only use Guppy 3.2.2 (1) with this trained model in CPU mode; in GPU mode I get the following error:
[guppy/error] main: CUDA error at /builds/ofan/ont_core_cpp/ont_core/common/cuda_common.cpp:146: ') [guppy/warning] main: An error occurred in the basecaller. Aborting.

Guppy runs fine on GPU when used with its default model.
Any ideas what the problem could be?

  1. guppy_basecaller --qscore_filtering --compress_fastq --fast5_out -r -i ${1} -s ${2} --flowcell FLO-MIN106 --kit SQK-RNA002 --model path_to_model/model.json --num_callers 4 --gpu_runners_per_device 3 --device "cuda:all"

Training RNA with pretrained models

How can we train a model to work on modified RNA? The walkthroughs illustrate using pretrained models such as flip_flop, but I think these are only used for calling DNA.

Basecall.py takes a long time

Hi:
I am using basecall.py to predict modification information for the whole sample (about 3.5 million reads). The process has been running for more than a week and is still going. The GPU being used is an NVIDIA Tesla P100 with 16 GB RAM. Is there any way to speed up the process?

Training modified RNA model

I am working through the walkthrough and the comments in the issues section here to train a model on IVT-synthesized mRNA with 100% replacement of U by a modified U. I also have a data set with unmodified mRNA.

General
Should I mix these two data sets together (completely modified reads with unmodified reads), or just train on the modified reads? The intermediate models provided already know the context of canonical U, but if training ab initio I guess I need to provide signal from canonical U?

Specifically
For the two steps of train_mod for RNA, which intermediate model should I start with?

  1. mGru_cat_mod_flipflop for the first step
  2. an ab initio model trained on a mixture of modified and unmodified reads

If using mGru_cat_mod_flipflop, should I add the flags mentioned in this previous issue, because this model is used to train for DNA, not RNA?

#32 (comment)

Should I do this in the first part of the training?

The basecalled sequences are quite different from reference sequences

Hi:
I've finished training a modified base model, and I found a strange result: basecalls.fa, generated by basecall.py with the final checkpoint model, is quite different from modbase_references.fasta, which I provided myself.
In my opinion, the training of the model depends on the reference sequence, so when I use the trained model to predict the training set, the generated sequence should be close to the reference sequence.

The prediction results showed a low recall

Hi:
As I said in previous issues, I'm trying to use taiyaki to train my modified RNA model, but I found that the prediction results show a low recall. I'll go through the process of getting the results.
1. Follow the instructions to train a modified base model (about 140k reads covering 2000 transcriptome modification sites, approximately 1.5 modified bases per read).
2. Use the model to basecall the training set itself again.
basecall.py --device 0 --modified_base_output basecalls.hdf5 ${trainning_set_reads} training2/model_final.checkpoint > basecalls.fa
3. Map basecalls.fa to the transcriptome with minimap2 to get an alignment file (minimap2 -t 8 -ax splice -uf -k14 ${transcriptpme} ${workspace}basecalls_reversed.fa > ${workspace}basecalls.sam). A threshold was applied to the probabilities in basecalls.hdf5 to obtain the modified base coordinates on the reads.
4. Convert the coordinates of modified bases from read coordinates to transcriptome coordinates (processing the CIGAR in the BAM file using r.get_reference_positions from pysam).
5. Calculate how many of the 2000 transcriptome modification sites are covered by modified bases. I found that only about 6% of positions were recalled. I don't know what the problem is; maybe my test method is not suitable, or there is a problem with the training.
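
The coordinate conversion in step 4 can be sketched in plain Python as a walk over the CIGAR string. This is an illustrative stand-in for what the step describes, not the poster's script and not pysam's implementation. The function name is hypothetical, `ref_start` is assumed to be the 0-based leftmost aligned reference position, and read positions with no aligned reference base (insertions, soft clips) map to `None`.

```python
import re

_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
_CONSUMES_BOTH = set("M=X")        # advance along read and reference
_CONSUMES_QUERY_ONLY = set("IS")   # advance along read only
_CONSUMES_REF_ONLY = set("DN")     # advance along reference only


def read_to_ref_positions(cigar, ref_start):
    """Map each read position (index in the list) to a reference position or None."""
    positions = []
    ref_pos = ref_start
    for length, op in _CIGAR_RE.findall(cigar):
        length = int(length)
        if op in _CONSUMES_BOTH:
            positions.extend(range(ref_pos, ref_pos + length))
            ref_pos += length
        elif op in _CONSUMES_QUERY_ONLY:
            positions.extend([None] * length)
        elif op in _CONSUMES_REF_ONLY:
            ref_pos += length
        # H and P consume neither the read nor the reference.
    return positions


if __name__ == "__main__":
    print(read_to_ref_positions("2S3M1I2M2D1M", 100))
```

If the FASTA was reversed before mapping, as in the minimap2 command above, positions taken from basecalls.hdf5 would presumably need to be flipped (i to L-1-i for a read of length L) before this lookup; that is an assumption about the file layout and worth verifying on a few reads.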

Attached figures: training_loss, location_sig_ref, chunklog
Unfortunately, this downstream analysis software does not support RNA data. So, does downstream analysis of taiyaki output currently require users to do it themselves?
https://github.com/nanoporetech/megalodon

strange output after guppy

Dear Developer,
I get this output when I use the model in combination with Guppy:

@a2aad6bd-b374-473c-9ad0-ca8649a28be8 runid=58f015113b3e17c33f219e761f9ef6172868dd8a sampleid=ale8 read=557 ch=440 start_time=2018-09-27T08:41:06Z
AGT
+
""#
@e08fc779-d143-4dd8-837e-dd9e1183df11 runid=58f015113b3e17c33f219e761f9ef6172868dd8a sampleid=ale8 read=579 ch=77 start_time=2018-09-27T08:46:35Z
AGT
+
""#
@bda5f326-e22c-4f99-a430-cecbf7d39bb6 runid=58f015113b3e17c33f219e761f9ef6172868dd8a sampleid=ale8 read=621 ch=502 start_time=2018-09-27T08:45:31Z
AGT
+
""#
@cc6df394-873f-48e3-afb8-3410671bdb85 runid=58f015113b3e17c33f219e761f9ef6172868dd8a sampleid=ale8 read=576 ch=293 start_time=2018-09-27T08:44:59Z
AGT
+
""#
@ac90af7a-a3e7-446e-aac2-83eb971fe4b5 runid=58f015113b3e17c33f219e761f9ef6172868dd8a sampleid=ale8 read=554 ch=12 start_time=2018-09-27T08:41:39Z
AGT
+
""#
@3263edd2-8a2f-4930-9aef-4e7b80940fb4 runid=58f015113b3e17c33f219e761f9ef6172868dd8a sampleid=ale8 read=570 ch=293 start_time=2018-09-27T08:44:32Z
AGT
+
""#

Cheers
Luigi

prepare_mapped_reads.py on RNA fails with TypeError: '<' not supported between instances of 'numpy.str_' and 'int'

Hi, I get an error when running prepare_mapped_reads.py on direct RNA reads with m6A. I've run it previously with DNA without any problem, so that's odd to me.

prepare_mapped_reads.py --jobs 6 --mod = A 6mA reads read_params.tsv mapped_reads.hdf5 pretrained/r941_rna_minion.checkpoint read_references.fasta

"""
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/lpryszcz/src/taiyaki/taiyaki/prepare_mapping_funcs.py", line 50, in oneread_remap
    sig.set_trim_absolute(read_params_dict['trim_start'], read_params_dict['trim_end'])
  File "/home/lpryszcz/src/taiyaki/taiyaki/signal.py", line 66, in set_trim_absolute
    if trimstart < 0 or trimend < 0:
TypeError: '<' not supported between instances of 'numpy.str_' and 'int'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/lpryszcz/src/taiyaki/venv/bin/prepare_mapped_reads.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/home/lpryszcz/src/taiyaki/bin/prepare_mapped_reads.py", line 83, in <module>
    main()
  File "/home/lpryszcz/src/taiyaki/bin/prepare_mapped_reads.py", line 79, in main
    results, args.output, alphabet_info)
  File "/home/lpryszcz/src/taiyaki/taiyaki/prepare_mapping_funcs.py", line 101, in generate_output_from_results
    for resultdict, mesg in results:
  File "/home/lpryszcz/src/taiyaki/taiyaki/iterators.py", line 361, in imap_mp
    for r in mapper(my_function, args, chunksize=chunksize):
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 369, in <genexpr>
    return (item for chunk in result for item in chunk)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 761, in next
    raise value
TypeError: '<' not supported between instances of 'numpy.str_' and 'int'
Exception ignored in: <generator object iterate_files_reads_unpaired at 0x7f6eef0176d0>
Traceback (most recent call last):
  File "/home/lpryszcz/src/taiyaki/taiyaki/fast5utils.py", line 63, in iterate_files_reads_unpaired
  File "/home/lpryszcz/src/taiyaki/venv/lib/python3.6/site-packages/ont_fast5_api/fast5_file.py", line 52, in __exit__
  File "/home/lpryszcz/src/taiyaki/venv/lib/python3.6/site-packages/ont_fast5_api/fast5_file.py", line 78, in close
  File "/home/lpryszcz/src/taiyaki/venv/lib/python3.6/site-packages/h5py/_hl/files.py", line 289, in close
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (/tmp/pip-tnf92dft-build/h5py/_objects.c:2853)
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (/tmp/pip-tnf92dft-build/h5py/_objects.c:2811)
  File "h5py/h5f.pyx", line 234, in h5py.h5f.get_obj_ids (/tmp/pip-tnf92dft-build/h5py/h5f.c:3450)
  File "h5py/h5i.pyx", line 37, in h5py.h5i.wrap_identifier (/tmp/pip-tnf92dft-build/h5py/h5i.c:1145)
ImportError: sys.meta_path is None, Python is likely shutting down
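
The first traceback shows `trim_start` or `trim_end` arriving as a string rather than an integer. One way to locate a problematic row is to load the per-read parameter file and cast the trim columns explicitly. The sketch below is a hypothetical check, not taiyaki code. The `trim_start` and `trim_end` names come from the traceback, while the `UUID` column name and the tab-separated layout with a header row are assumptions.

```python
import csv
import io


def load_read_params(handle):
    """Read a tab-separated per-read parameter table and cast trim columns to int."""
    reader = csv.DictReader(handle, delimiter="\t")
    params = {}
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        try:
            trim_start = int(row["trim_start"])
            trim_end = int(row["trim_end"])
        except (KeyError, TypeError, ValueError) as err:
            raise ValueError(
                "line {}: cannot read trim values from {!r}".format(line_number, row)
            ) from err
        params[row["UUID"]] = (trim_start, trim_end)
    return params


if __name__ == "__main__":
    example = "UUID\ttrim_start\ttrim_end\tshift\tscale\nread1\t200\t50\t0.5\t1.2\n"
    print(load_read_params(io.StringIO(example)))
```

A row that fails the cast is reported with its line number, which makes it easier to see whether the file has shifted columns or a stray header line.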

Read qscore decreases after training and rebasecalling

We generate high-accuracy target sequence data on PromethION with live basecalling and are trying taiyaki to improve read accuracy further. After training and rebasecalling, we found that the qscore decreases a little (shown below), but the identity shows the opposite.
So do you think this is an appropriate way to evaluate the performance?

Qscore

          Basecall  Basecall_new
Min.         6.554         4.533
1st Qu.     10.706         8.057
Median      11.856         8.597
Mean        11.575         8.413
3rd Qu.     12.707         8.960
Max.        16.446        10.910

Identity

          Basecall  Basecall_new
Min.        0.7302        0.7188
1st Qu.     0.9348        0.9427
Median      0.9511        0.9572
Mean        0.9434        0.9505
3rd Qu.     0.9611        0.9661
Max.        1.0000        1.0000
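
To put the two tables on the same scale, an identity can be converted to the Phred score it would imply using the standard definition Q = -10 log10(error rate). The sketch below applies this to the mean identities above. It is only a rough comparison: it treats 1 - identity as the error rate, which depends on how identity was computed from the alignments, and converting a mean identity is not the same as averaging per-read scores.

```python
import math


def identity_to_phred(identity):
    """Convert an alignment identity in [0, 1) to the Phred score it implies."""
    if not 0.0 <= identity < 1.0:
        raise ValueError("identity must satisfy 0 <= identity < 1")
    return -10.0 * math.log10(1.0 - identity)


if __name__ == "__main__":
    for label, identity, reported_q in [
        ("Basecall", 0.9434, 11.575),
        ("Basecall_new", 0.9505, 8.413),
    ]:
        implied = identity_to_phred(identity)
        print("{:13s} reported mean Q {:.2f}, implied by identity {:.2f}".format(
            label, reported_q, implied))
```

Under these assumptions the implied scores are roughly 12.5 and 13.1, so the retrained model's reported qscores appear to understate its accuracy, which would suggest the scores are less well calibrated rather than that the basecalls are worse.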

guppy 3.2.2 --fast5_out for modifications with custom model for RNA

According to the guppy devs, mod-bases should already be working with guppy 3.1. If you use the --fast5_out option, guppy will write modification probabilities back to the fast5s that you basecalled. There aren't any tools yet, so you will have to extract this information yourself using hdf5 or h5py. You are looking for a dataset called ModBaseProbs. Please open another issue if you need more help with guppy's mod-bases output.

Originally posted by @myrtlecat in #26 (comment)

I've generated a model for an RNA modification of my choice. I am having issues with the basecall.py script running out of memory when it encounters a large read. If I were to dump the model to a JSON file for Guppy, will --fast5_out record these probabilities? Or is it still only for DNA m5C and m6A?

OSError when I tried to use trained model file to predict test data

Hi:
I tried to predict the labels of a test set using the following command, to evaluate the performance of a custom mod-base model.

   basecall.py --device 0 --modified_base_output basecalls.hdf5 "/root/test_data_7.17/fast5/" "/root/training_data_7.11/training2/model_final.checkpoint" > basecalls.fa

But the following errors occurred:

* Loading model.
* Initializing reads file search.
Traceback (most recent call last):
  File "/root/anaconda3/bin/basecall.py", line 4, in <module>
    __import__('pkg_resources').run_script('taiyaki==4.1.0', 'basecall.py')
  File "/root/anaconda3/lib/python3.7/site-packages/pkg_resources/__init__.py", line 666, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/root/anaconda3/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1446, in run_script
    exec(code, namespace, namespace)
  File "/root/anaconda3/lib/python3.7/site-packages/taiyaki-4.1.0-py3.7-linux-x86_64.egg/EGG-INFO/scripts/basecall.py", line 198, in <module>
    main()
  File "/root/anaconda3/lib/python3.7/site-packages/taiyaki-4.1.0-py3.7-linux-x86_64.egg/EGG-INFO/scripts/basecall.py", line 142, in main
    recursive=args.recursive))
  File "/root/anaconda3/lib/python3.7/site-packages/taiyaki-4.1.0-py3.7-linux-x86_64.egg/taiyaki/fast5utils.py", line 139, in iterate_fast5_reads
    for y in iterate_files_reads_unpaired(filepaths, read_ids, limit, verbose):
  File "/root/anaconda3/lib/python3.7/site-packages/taiyaki-4.1.0-py3.7-linux-x86_64.egg/taiyaki/fast5utils.py", line 52, in iterate_files_reads_unpaired
    with ont_fast5_api.fast5_interface.get_fast5_file(filepath, 'r') as f5file:
  File "/root/anaconda3/lib/python3.7/site-packages/ont_fast5_api/fast5_interface.py", line 6, in get_fast5_file
    if is_multi_read(filepath):
  File "/root/anaconda3/lib/python3.7/site-packages/ont_fast5_api/fast5_interface.py", line 13, in is_multi_read
    with MultiFast5File(filepath, mode='r') as fast5:
  File "/root/anaconda3/lib/python3.7/site-packages/ont_fast5_api/multi_fast5.py", line 14, in __init__
    self.handle = h5py.File(self.filename, self.mode)
  File "/root/anaconda3/lib/python3.7/site-packages/h5py/_hl/files.py", line 394, in __init__
    swmr=swmr)
  File "/root/anaconda3/lib/python3.7/site-packages/h5py/_hl/files.py", line 170, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 85, in h5py.h5f.open
OSError: Unable to open file (truncated file: eof = 32768, sblock->base_addr = 0, stored_eof = 72986)

Could you help me to resolve this issue?
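
The last frames of the traceback show h5py failing to open one of the input files: the file on disk ends at byte 32768, while its header records an end of file at 72986. That pattern usually points to an incomplete copy or write of a fast5 file, not to a problem with the model checkpoint. A minimal sketch for finding such files before basecalling, assuming h5py is installed. The function names and the `opener` parameter are hypothetical and not part of Taiyaki or ont_fast5_api.

```python
import os


def _open_with_h5py(path):
    """Default opener: open the file read-only with h5py, then close it."""
    import h5py  # lazy import; only needed when the default opener is used

    with h5py.File(path, "r"):
        pass


def find_unreadable_fast5(directory, opener=_open_with_h5py):
    """Return (path, error message) pairs for .fast5 files that fail to open."""
    bad = []
    for root, _dirs, files in os.walk(directory):
        for fname in sorted(files):
            if not fname.endswith(".fast5"):
                continue
            path = os.path.join(root, fname)
            try:
                opener(path)
            except OSError as err:
                # h5py raises OSError for truncated or otherwise corrupt files.
                bad.append((path, str(err)))
    return bad


# Hypothetical usage with the directory from the command above:
# for path, message in find_unreadable_fast5("/root/test_data_7.17/fast5/"):
#     print(path, message)
```

Files reported this way can be copied again from the source or moved out of the input directory before rerunning basecall.py.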

Modified Base Output

Hi,
I trained a modification-aware RNA model. The modification that I looked at can occur only at G's. My training dataset consists of reads that are either completely modified (so every G has the modification) or completely unmodified.
After training, I used basecall.py to basecall a held-out test set that also consists of reads that are completely modified or unmodified.
Looking at the modified base scores, modified positions have a score from -2 to 0, and unmodified positions have a score from around -20 to -2. There is a bit of overlap between these populations, but overall the distributions of unmodified and modified reads are separated quite clearly. In docs/FILE_FORMATS you write:

More negative modified base scores indicate modified bases and more positive scores indicate canonical bases.

So that's more or less the opposite of the behavior I see in my data.
Do you have an idea where this discrepancy is coming from? Is there perhaps a mistake in the file-format doc?

Thanks,
Florian
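
One way to check which direction the scores run in a given output, independently of the documentation, is to compare scores from positions whose status is known. A minimal sketch, assuming the scores have already been loaded into two plain lists. The function name and the example values are made up for illustration; the values only mirror the ranges described in the post.

```python
from statistics import median


def higher_means_modified(modified_scores, canonical_scores):
    """True if known-modified positions tend to score higher than canonical ones."""
    return median(modified_scores) > median(canonical_scores)


# Illustrative values only, chosen to match the ranges reported above.
modified = [-1.8, -0.9, -0.3, -0.1]
canonical = [-19.5, -12.0, -6.4, -2.2]
print(higher_means_modified(modified, canonical))
```

Medians are used here so that the small overlap between the two populations does not dominate the comparison.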
