nanoporetech / dorado



Oxford Nanopore's Basecaller

Home Page: https://nanoporetech.com/

License: Other

CMake 6.25% C++ 88.85% C 0.05% Metal 2.67% Shell 1.20% Batchfile 0.11% Python 0.81% Objective-C++ 0.07%

basecalling libtorch nanopore genomics

dorado's Introduction

Dorado

Dorado is a high-performance, easy-to-use, open source basecaller for Oxford Nanopore reads.

Features

One executable with sensible defaults, automatic hardware detection and configuration.
Runs on Apple silicon (M1/2 family) and Nvidia GPUs including multi-GPU with linear scaling (see Platforms).
Modified basecalling.
Duplex basecalling (watch the following video for an introduction to Duplex).
Simplex barcode classification.
Support for aligned read output in SAM/BAM.
Initial support for poly(A) tail estimation.
POD5 support for highest basecalling performance.
Based on libtorch, the C++ API for pytorch.
Multiple custom optimisations in CUDA and Metal for maximising inference performance.

If you encounter any problems building or running Dorado, please report an issue.

Installation

Platforms

Dorado is heavily optimised for Nvidia A100 and H100 GPUs and will deliver maximal performance on systems with these GPUs.

Dorado has been extensively tested and is supported on the following systems:

Platform	GPU/CPU	Minimum Software Requirements
Linux x86_64	(G)V100, A100	CUDA Driver ≥450.80.02
	H100	CUDA Driver ≥520
Linux arm64	Jetson Orin	Linux for Tegra ≥34.1.1
Windows x86_64	(G)V100, A100	CUDA Driver ≥452.39
	H100	CUDA Driver ≥520
Apple	Apple Silicon (M1/M2)

Linux or Windows systems not listed above but which have Nvidia GPUs with ≥8 GB VRAM and architecture from Pascal onwards (except P100/GP100) have not been widely tested but are expected to work. When basecalling with Apple devices, we recommend systems with ≥16 GB of unified memory.

If you encounter problems with running on your system, please report an issue.

AWS Benchmarks on Nvidia GPUs for Dorado 0.3.0 are available here. Please note: Dorado's basecalling speed is continuously improving, so these benchmarks may not reflect performance with the latest release.

Performance tips

For optimal performance, Dorado requires POD5 file input. Please convert your .fast5 files before basecalling.
Dorado will automatically detect your GPU's free memory and select an appropriate batch size.
Dorado will automatically run in multi-GPU cuda:all mode. If you have a heterogeneous collection of GPUs, select the faster GPUs using the --device flag (e.g. --device cuda:0,2). Not doing this will have a detrimental impact on performance.

Running

The following are helpful commands for getting started with Dorado. To see all options and their defaults, run dorado -h and dorado <subcommand> -h.

Model selection foreword

Dorado can automatically select a basecalling model based on the chosen model speed (fast, hac or sup) and the POD5 input data. This feature is not supported for fast5 data. If the model does not exist locally, Dorado will automatically download it and delete it when finished. To re-use models across runs, download them manually using dorado download.

Dorado continues to support model paths.

For details, see Automatic model selection complex.

Simplex basecalling

To run Dorado basecalling with the automatically downloaded hac model on a directory of POD5 files or a single POD5 file (.fast5 files are supported, but will not be as performant), run:

$ dorado basecaller hac pod5s/ > calls.bam

To basecall a single file, simply replace the directory pod5s/ with a path to your data file.

If basecalling is interrupted, it is possible to resume basecalling from a BAM file. To do so, use the --resume-from flag to specify the path to the incomplete BAM file. For example:

$ dorado basecaller hac pod5s/ --resume-from incomplete.bam > calls.bam

calls.bam will contain all of the reads from incomplete.bam plus the new basecalls (incomplete.bam can be discarded after basecalling is complete).

Note: it is important to choose a different filename for the BAM file you are writing to when using --resume-from. If you use the same filename, the interrupted BAM file will lose the existing basecalls and basecalling will restart from the beginning.
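One way to follow this advice is to move the interrupted output aside before resuming, so the new run can never write over it. The bash function below is a hypothetical wrapper, not part of Dorado; it only uses the basecaller arguments shown above, and the DORADO variable is an assumption added so the binary path can be overridden.

```shell
# Hypothetical wrapper (not part of Dorado): resume an interrupted run safely.
# Usage: resume_basecall <model> <reads> <output.bam>
resume_basecall() {
  local model="$1" input="$2" out="$3"
  local partial="${out%.bam}.incomplete.bam"   # e.g. calls.bam -> calls.incomplete.bam
  local dorado="${DORADO:-dorado}"             # override the binary via $DORADO if needed

  if [[ ! -s "$out" ]]; then
    echo "nothing to resume: $out is missing or empty" >&2
    return 1
  fi
  if [[ -e "$partial" ]]; then
    echo "refusing to overwrite existing $partial" >&2
    return 1
  fi

  # Move the interrupted output aside so the new run writes to a different file.
  mv -- "$out" "$partial"
  "$dorado" basecaller "$model" "$input" --resume-from "$partial" > "$out"
}
```

For example, resume_basecall hac pod5s/ calls.bam leaves the old reads in calls.incomplete.bam, which can be discarded once the resumed run has finished.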

DNA adapter and primer trimming

Dorado can detect and remove any adapter and/or primer sequences from the beginning and end of DNA reads. Note that if you intend to demultiplex the reads at some later time, trimming adapters and primers may result in some portions of the flanking regions of the barcodes being removed, which could interfere with correct demultiplexing.

In-line with basecalling

By default, dorado basecaller will attempt to detect any adapter or primer sequences at the beginning and end of reads, and remove them from the output sequence.

This functionality can be altered by using either the --trim or --no-trim options with dorado basecaller. The --no-trim option will prevent the trimming of detected barcode sequences as well as the detection and trimming of adapter and primer sequences.

The --trim option takes as its argument one of the following values:

all This is the same as the default behavior. Any detected adapters or primers will be trimmed, and if barcoding is enabled then any detected barcodes will be trimmed.
primers This will result in any detected adapters or primers being trimmed, but if barcoding is enabled the barcode sequences will not be trimmed.
adapters This will result in any detected adapters being trimmed, but primers will not be trimmed, and if barcoding is enabled then barcodes will not be trimmed either.
none This is the same as using the --no-trim option. Nothing will be trimmed.

If adapter/primer trimming is done in-line with basecalling in combination with demultiplexing, then the software will automatically ensure that the trimming of adapters and primers does not interfere with the demultiplexing process. However, if you intend to do demultiplexing later as a separate step, then it is recommended that you disable adapter/primer trimming when basecalling with the --no-trim option, to ensure that any barcode sequences remain completely intact in the reads.

Trimming existing datasets

Existing basecalled datasets can be scanned for adapter and/or primer sequences at either end, and any such sequences found can be trimmed. To do this, run:

$ dorado trim <reads> > trimmed.bam

<reads> can either be an HTS format file (e.g. FASTQ, BAM, etc.) or a stream of an HTS format (e.g. the output of Dorado basecalling).

The --no-trim-primers option can be used to prevent the trimming of primer sequences. In this case only adapter sequences will be trimmed.

If it is also your intention to demultiplex the data, then it is recommended that you demultiplex before trimming any adapters and primers, as trimming adapters and primers first may interfere with correct barcode classification.

The output of dorado trim will always be unaligned records, regardless of whether the input is aligned/sorted or not.

Custom primer trimming

The software automatically searches for primer sequences used in Oxford Nanopore kits. However, you can specify an alternative set of primer sequences to search for when trimming either in-line with basecalling, or in combination with the --trim option. In both cases, this is accomplished using the --primer-sequences command line option, followed by the full path and filename of a FASTA file containing the primer sequences you want to search for. The record names of the sequences do not matter. Note that if you use this option, the normal primer sequences built into the dorado software will not be searched for.

RNA adapter trimming

Adapters for RNA002 and RNA004 kits are automatically trimmed during basecalling. However, unlike DNA adapters, RNA adapters cannot be trimmed after basecalling.

Modified basecalling

Beyond the traditional A, T, C, and G basecalling, Dorado can also detect modified bases such as 5-methylcytosine (5mC), 5-hydroxymethylcytosine (5hmC), and N⁶-methyladenosine (6mA). These modified bases play crucial roles in epigenetic regulation.

To call modifications, extend the model argument with a comma-separated list of modifications:

$ dorado basecaller hac,5mCG_5hmCG pod5s/ > calls.bam

Refer to the DNA models table's Compatible Modifications column to see available modifications that can be called with the --modified-bases option.

Modified basecalling is also supported with Duplex basecalling, where it produces hemi-methylation calls.

Duplex

To run Duplex basecalling, run the command:

$ dorado duplex sup pod5s/ > duplex.bam

When using the duplex command, two types of DNA sequence results will be produced: 'simplex' and 'duplex'. Any specific position in the DNA which is in a duplex read is also seen in two simplex strands (the template and complement). So, each DNA position which is duplex sequenced will be covered by a minimum of three separate readings in the output.

The dx tag in the BAM record for each read can be used to distinguish between simplex and duplex reads:

dx:i:1 for duplex reads.
dx:i:0 for simplex reads which don't have duplex offspring.
dx:i:-1 for simplex reads which have duplex offspring.
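To see how many reads of each kind a run produced, the dx tag can be tallied from SAM text, for example the output of samtools view (samtools is assumed to be installed; it is not part of Dorado). A minimal bash/awk sketch; count_dx is a hypothetical helper name, and it assumes the tag appears among the SAM optional fields (column 12 onward).

```shell
# Hypothetical helper: tally reads by their dx tag in SAM text read from stdin.
# Usage (samtools assumed installed): samtools view duplex.bam | count_dx
count_dx() {
  awk -F'\t' '
    /^@/ { next }                         # skip SAM header lines, if present
    {
      for (i = 12; i <= NF; i++) {        # optional fields start at column 12
        if ($i ~ /^dx:i:/) { n[substr($i, 6)]++; break }
      }
    }
    END {
      printf "duplex (dx:i:1): %d\n", n["1"]
      printf "simplex without duplex offspring (dx:i:0): %d\n", n["0"]
      printf "simplex with duplex offspring (dx:i:-1): %d\n", n["-1"]
    }'
}
```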

Dorado will report the duplex rate as the number of nucleotides in the duplex basecalls multiplied by two and divided by the total number of nucleotides in the simplex basecalls. This value is a close approximation for the proportion of nucleotides which participated in a duplex basecall.
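The duplex rate formula can be recomputed from the BAM output as a rough cross-check. This sketch reads SAM text (e.g. from samtools view, assumed installed) and takes the length of the sequence in column 10 as the nucleotide count of each record; duplex_rate is a hypothetical helper, and its result may differ slightly from the value Dorado reports, for example if records without a stored sequence are present.

```shell
# Hypothetical helper: approximate the duplex rate from SAM text on stdin as
#   2 * (duplex nucleotides) / (simplex nucleotides)
# using the dx tag: dx:i:1 is duplex, dx:i:0 and dx:i:-1 are simplex.
duplex_rate() {
  awk -F'\t' '
    /^@/ { next }                         # skip SAM header lines, if present
    {
      dx = ""
      for (i = 12; i <= NF; i++) if ($i ~ /^dx:i:/) dx = substr($i, 6)
      if (dx == "1") duplex += length($10)
      else if (dx == "0" || dx == "-1") simplex += length($10)
    }
    END {
      if (simplex == 0) { print "NA"; exit }
      printf "%.4f\n", 2 * duplex / simplex
    }'
}
```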

Duplex basecalling can be performed with modified base detection, producing hemi-methylation calls for duplex reads:

$ dorado duplex hac,5mCG_5hmCG pod5s/ > duplex.bam

More information on how hemi-methylation calls are represented can be found on page 7 of the SAM specification document (version aa7440d) and in the Modkit documentation.

Alignment

Dorado supports aligning existing basecalls or producing aligned output directly.

To align existing basecalls, run:

$ dorado aligner <index> <reads>  > aligned.bam

where <index> is the reference to align to, in FASTQ, FASTA or .mmi format, and <reads> is a folder or file in any HTS format.

When reading from an input folder, dorado aligner also supports emitting aligned files to an output folder, which will preserve the file structure of the inputs:

$ dorado aligner <index> <input_read_folder> --output-dir <output_read_folder>

An alignment summary containing alignment statistics for each read can be generated with the --emit-summary option. The file will be saved in the --output-dir folder.

To produce aligned output directly while basecalling (simplex or duplex), run with the --reference option:

$ dorado basecaller <model> <reads> --reference <index> > calls.bam

Alignment uses minimap2 and by default uses the map-ont preset. This can be overridden with the -k and -w options to set the k-mer and window size, respectively.

Sequencing Summary

The dorado summary command outputs a tab-separated file with read-level sequencing information from the BAM file generated during basecalling. To create a summary, run:

$ dorado summary <bam> > summary.tsv

Note that summary generation is only available for reads basecalled from POD5 files. Reads basecalled from .fast5 files are not compatible with the summary command.

Barcode Classification

Dorado supports barcode classification for existing basecalls as well as producing classified basecalls directly.

In-line with basecalling

In this mode, reads are classified into their barcode groups during basecalling as part of the same command. To enable this, run:

$ dorado basecaller <model> <reads> --kit-name <barcode-kit-name> > calls.bam

This will result in a single output stream with classified reads. The classification will be reflected in the read group name as well as in the BC tag of the output record.
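A quick way to check the classification is to tally the BC tag values. A minimal sketch over SAM text (e.g. samtools view calls.bam, with samtools assumed installed); count_barcodes is a hypothetical helper, and reads without a BC tag are reported as (none) because this sketch makes no assumption about how unclassified reads are tagged.

```shell
# Hypothetical helper: count reads per BC tag value in SAM text read from stdin.
# Prints "barcode<TAB>count" lines; reads without a BC tag count as "(none)".
count_barcodes() {
  awk -F'\t' '
    /^@/ { next }                         # skip SAM header lines, if present
    {
      bc = "(none)"
      for (i = 12; i <= NF; i++) if ($i ~ /^BC:Z:/) bc = substr($i, 6)
      n[bc]++
    }
    END { for (b in n) print b "\t" n[b] }' | sort
}
```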

By default, Dorado trims the barcode from the reads. To disable trimming, add --no-trim to the command line.

The default heuristic for double-ended barcodes is to look for them on either end of the read. This results in a higher classification rate but can also result in a higher false positive count. To address this, dorado basecaller also provides a --barcode-both-ends option to force double-ended barcodes to be detected on both ends before classification. This will reduce false positives dramatically, but also lower overall classification rates.

The output from dorado basecaller can be demultiplexed into per-barcode BAMs using dorado demux. For example:

$ dorado demux --output-dir <output-dir> --no-classify <input-bam>

This will output a BAM file per barcode in the output-dir.

The barcode information is reflected in the BAM RG header too, so demultiplexing is also possible with samtools split. For example:

$ samtools split -u <output-dir>/unclassified.bam -f "<output-dir>/<prefix>_%!.bam" <input-bam>

However, samtools split uses the full RG string as the filename suffix, which can result in very long file names. We recommend using dorado demux to split barcoded BAMs.

Classifying existing datasets

Existing basecalled datasets can be classified as well as demultiplexed into per-barcode BAMs using the standalone demux command in dorado. To use this, run:

$ dorado demux --kit-name <kit-name> --output-dir <output-folder-for-demuxed-bams> <reads>

<reads> can either be a folder or a single file in an HTS format file (e.g. FASTQ, BAM, etc.) or a stream of an HTS format (e.g. the output of dorado basecalling).

This results in multiple BAM files being generated in the output folder, one per barcode (formatted as KITNAME_BARCODEXX.bam) and one for all unclassified reads. As with the in-line mode, --no-trim and --barcode-both-ends are also available as additional options.

If the input file is aligned/sorted and --no-trim is chosen, each of the output barcode-specific BAM files will also be sorted and indexed. However, if trimming is enabled (which is the default), the alignment information is removed and the output BAMs are unaligned. This is done because the alignment tags and positions are invalidated once a sequence is altered.

Here is an example output folder:

$ dorado demux --kit-name SQK-RPB004 --output-dir /tmp/demux reads.fastq

$ ls -1 /tmp/demux
SQK-RPB004_barcode01.bam
SQK-RPB004_barcode02.bam
SQK-RPB004_barcode03.bam
...
unclassified.bam

A summary file listing each read and its classified barcode can be generated with the --emit-summary option in dorado demux. The file will be saved in the --output-dir folder.

Demultiplexing mapped reads

If the input data files contain mapping data, this information can be preserved in the output files. To do this, you must use the --no-trim option. Trimming the barcodes will invalidate any mapping information that may be contained in the input files, and therefore the application will exclude any mapping information if --no-trim is not specified.

It is also possible to get dorado demux to sort and index any output BAM files that contain mapped reads. To enable this, use the --sort-bam option. If you use this option, then you must also use the --no-trim option, as trimming will prevent any mapping information from being included in the output files. Index files (.bai extension) will only be created for BAM files that contain mapped reads and were sorted. Note that for large datasets, sorting the output files may take a few minutes.

Using a sample sheet

Dorado is able to use a sample sheet to restrict the barcode classifications to only those present, and to apply aliases to the detected classifications. This is enabled by passing the path to a sample sheet to the --sample-sheet argument when using the basecaller or demux commands. See here for more information.

Custom barcodes

In addition to supporting the standard barcode kits from Oxford Nanopore, Dorado also supports specifying custom barcode kit arrangements and sequences. This is done by passing a barcode arrangement file via the --barcode-arrangement argument (either to dorado demux or dorado basecaller). Custom barcode sequences can optionally be specified via the --barcode-sequences option. See here for more details.

Poly(A) tail estimation

Dorado has initial support for estimating poly(A) tail lengths for cDNA (PCS and PCB kits) and RNA. Note that Oxford Nanopore cDNA reads are sequenced in two different orientations and Dorado poly(A) tail length estimation handles both (A and T homopolymers). This feature can be enabled by passing --estimate-poly-a to the basecaller command. It is disabled by default. The estimated tail length is stored in the pt:i tag of the output record. Reads for which the tail length could not be estimated will not have the pt:i tag.

Note that if this option is used, then adapter/primer/barcode trimming will be automatically disabled for DNA.
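Because reads without an estimate simply lack the pt:i tag, the number of reads with an estimate and their mean tail length can be read off the output with a short awk script. polya_summary is a hypothetical helper, and samtools (to turn the BAM into SAM text, e.g. samtools view calls.bam | polya_summary) is assumed to be installed.

```shell
# Hypothetical helper: summarise poly(A) tail estimates (pt:i tag) in SAM text on stdin.
# Prints total reads, reads carrying a pt:i tag and, if any, their mean tail length.
polya_summary() {
  awk -F'\t' '
    /^@/ { next }                         # skip SAM header lines, if present
    {
      total++
      for (i = 12; i <= NF; i++)
        if ($i ~ /^pt:i:/) { sum += substr($i, 6); n++; break }
    }
    END {
      printf "reads=%d with_pt=%d", total, n
      if (n > 0) printf " mean_pt=%.1f", sum / n
      printf "\n"
    }'
}
```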

Available basecalling models

To download all available Dorado models, run:

$ dorado download --model all

Decoding Dorado model names

The names of Dorado models are systematically structured, each segment corresponding to a different aspect of the model, which include both chemistry and run settings. Below is a sample model name explained:

dna_r10.4.1_e8.2_400bps_hac@v4.3.0

Analyte Type (dna): This denotes the type of analyte being sequenced. For DNA sequencing, it is represented as dna. If you are using a Direct RNA Sequencing Kit, this will be rna002 or rna004, depending on the kit.
Pore Type (r10.4.1): This section corresponds to the type of flow cell used. For instance, FLO-MIN114/FLO-FLG114 is indicated by r10.4.1, while FLO-MIN106D/FLO-FLG001 is signified by r9.4.1.
Chemistry Type (e8.2): This represents the chemistry type, which corresponds to the kit used for sequencing. For example, Kit 14 chemistry is denoted by e8.2 and Kit 10 or Kit 9 are denoted by e8.
Translocation Speed (400bps): This parameter, selected at the run setup in MinKNOW, refers to the speed of translocation. Prior to starting your run, a prompt will ask if you prefer to run at 260 bps or 400 bps. The former yields more accurate results but provides less data. As of MinKNOW version 23.04, the 260 bps option has been deprecated.
Model Type (hac): This represents the size of the model, where larger models yield more accurate basecalls but take more time. The three types of models are fast, hac, and sup. The fast model is the quickest, sup is the most accurate, and hac provides a balance between speed and accuracy. For most users, the hac model is recommended.
Model Version Number (v4.3.0): This denotes the version of the model. Model updates are regularly released, and higher version numbers typically signify greater accuracy.
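Because the segments are separated by underscores and the version follows the "@", a name can be split mechanically. The bash function below is a hypothetical sketch that handles only five-segment simplex names like the example above; names with a different number of segments (for example rna002_70bps_fast@v3 in the RNA table) are rejected rather than guessed.

```shell
# Hypothetical helper: split a simplex model name of the form
#   <analyte>_<pore>_<chemistry>_<speed>_<type>@<version>
# into labelled fields, one per line.
decode_model_name() {
  local name="$1" fields version analyte pore chemistry speed type extra
  if [[ "$name" != *@* ]]; then
    echo "no @version in: $name" >&2
    return 1
  fi
  version="${name##*@}"   # text after the last "@"
  fields="${name%@*}"     # text before it
  IFS='_' read -r analyte pore chemistry speed type extra <<< "$fields"
  if [[ -z "$type" || -n "$extra" ]]; then
    echo "expected five '_'-separated segments in: $name" >&2
    return 1
  fi
  printf 'analyte=%s\npore=%s\nchemistry=%s\nspeed=%s\ntype=%s\nversion=%s\n' \
    "$analyte" "$pore" "$chemistry" "$speed" "$type" "$version"
}
```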

DNA models:

Below is a table of the available basecalling models and the modified basecalling models that can be used with them. The bolded models are for the latest released condition with 5 kHz data.

The versioning of modification models is bound to the basecalling model. This means that the modification model version is reset for each new simplex model release. For example, 6mA@v2 compatible with v4.3.0 basecalling models is more recent than 6mA@v3 compatible with v4.2.0 basecalling models.

Basecalling Models	Compatible Modifications	Modifications Model Version	Data Sampling Frequency
[email protected]			5 kHz
[email protected]	5mCG_5hmCG 5mC_5hmC 6mA	v1 v1 v2	5 kHz
[email protected]	5mCG_5hmCG 5mC_5hmC 6mA	v1 v1 v2	5 kHz
[email protected]	5mCG_5hmCG	v2	5 kHz
[email protected]	5mCG_5hmCG	v2	5 kHz
[email protected]	5mCG_5hmCG 5mC_5hmC 5mC 6mA	v3.1 v1 v2 v3	5 kHz
[email protected]	5mCG_5hmCG	v2	4 kHz
[email protected]	5mCG_5hmCG	v2	4 kHz
[email protected]	5mCG_5hmCG	v2	4 kHz
[email protected]	5mCG_5hmCG	v2	4 kHz
[email protected]	5mCG_5hmCG	v2	4 kHz
[email protected]	5mCG_5hmCG	v2	4 kHz
[email protected]	5mCG_5hmCG	v2	4 kHz
[email protected]	5mCG_5hmCG	v2	4 kHz
[email protected]	5mCG_5hmCG	v2	4 kHz
[email protected]	5mCG_5hmCG	v2	4 kHz
[email protected]	5mCG_5hmCG	v2	4 kHz
[email protected]	5mCG_5hmCG	v2	4 kHz
[email protected]	5mCG	v2	4 kHz
[email protected]	5mCG	v2	4 kHz
[email protected]	5mCG	v2	4 kHz
[email protected]	5mCG	v2	4 kHz
[email protected]	5mCG	v2	4 kHz
[email protected]	5mCG	v2	4 kHz
[email protected]			4 kHz
[email protected]	5mCG_5hmCG 5mCG	v0 v0.1	4 kHz
[email protected]	5mCG_5hmCG 5mCG	v0 v0.1	4 kHz
[email protected]	5mCG_5hmCG 5mCG	v0 v0.1	4 kHz

RNA models:

Note: The BAM format does not support U bases. Therefore, when Dorado is performing RNA basecalling, the resulting output files will include T instead of U. This is consistent across output file types. The same applies to parsing inputs. Any input HTS file (e.g. FASTQ generated by guppy/basecall_server) with U bases is not handled by dorado.

Basecalling Models	Compatible Modifications	Modifications Model Version	Data Sampling Frequency
[email protected]	N/A	N/A	4 kHz
[email protected]	N/A	N/A	4 kHz
[email protected]	m6A_DRACH	v1	4 kHz
rna002_70bps_fast@v3	N/A	N/A	3 kHz
rna002_70bps_hac@v3	N/A	N/A	3 kHz

Automatic model selection complex

The model argument in dorado can specify either a model path or a model complex. A model complex must start with the simplex model speed, and follows this syntax:

(fast|hac|sup)[@(version|latest)][,modification[@(version|latest)]][,...]

Automatically selected modification models will always match the base simplex model version and will be the latest compatible version unless a specific version is set by the user. Automatic modification model selection will not allow the mixing of modification models which are bound to different simplex model versions.

Here are a few examples of model complexes:

Model Complex	Description
fast	Latest compatible fast model
hac	Latest compatible hac model
sup	Latest compatible sup model
hac@latest	Latest compatible hac simplex basecalling model
hac@v4.2.0	Simplex basecalling hac model with version `v4.2.0`
hac@v3.5.0	Simplex basecalling hac model with version `v3.5.0`
hac,5mCG_5hmCG	Latest compatible hac simplex model and latest 5mCG_5hmCG modifications model for the chosen basecall model
hac,5mCG_5hmCG@v2	Latest compatible hac simplex model and 5mCG_5hmCG modifications model with version `v2.0.0`
sup,5mCG_5hmCG,6mA	Latest compatible sup model and latest compatible 5mCG_5hmCG and 6mA modifications models
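As a rough illustration of the syntax above, the following bash sketch splits a model complex into its simplex part and its modification parts, reporting "latest" where no version is given. split_model_complex is a hypothetical helper, not part of Dorado; it checks only the shape of the string, while whether a given model or version actually exists is decided by Dorado itself.

```shell
# Hypothetical helper: split a model complex such as "sup,5mCG_5hmCG@v2,6mA"
# into one "name version" line per comma-separated part.
# A part without "@" is reported with version "latest".
split_model_complex() {
  local complex="$1" part name version first=1
  local IFS=','                       # split the complex on commas only
  for part in $complex; do
    name="${part%%@*}"
    if [[ "$part" == *@* ]]; then version="${part#*@}"; else version="latest"; fi
    # The first part must be the simplex model speed.
    if (( first )) && [[ ! "$name" =~ ^(fast|hac|sup)$ ]]; then
      echo "invalid simplex speed: $name" >&2
      return 1
    fi
    first=0
    printf '%s %s\n' "$name" "$version"
  done
}
```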

Developer quickstart

Linux dependencies

The following packages are necessary to build Dorado in a barebones environment (e.g. the official ubuntu:jammy docker image).

$ apt-get update && apt-get install -y --no-install-recommends \
        curl \
        git \
        ca-certificates \
        build-essential \
        nvidia-cuda-toolkit \
        libhdf5-dev \
        libssl-dev \
        libzstd-dev \
        cmake \
        autoconf \
        automake

Clone and build

$ git clone https://github.com/nanoporetech/dorado.git dorado
$ cd dorado
$ cmake -S . -B cmake-build
$ cmake --build cmake-build --config Release -j
$ ctest --test-dir cmake-build

The -j flag will use all available threads to build Dorado and usage is around 1-2 GB per thread. If you are constrained by the amount of available memory on your system, you can lower the number of threads, e.g. -j 4.
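One way to choose the thread count is to divide the available memory by a per-job budget and cap the result at the number of CPUs. The helper below is hypothetical and assumes 2 GB per job, the upper end of the 1-2 GB range quoted above.

```shell
# Hypothetical helper: suggest N for `cmake --build cmake-build --config Release -j N`.
# Arguments: available memory in GB, CPU count, optional GB per job (default 2).
pick_build_jobs() {
  local mem_gb="$1" cpus="$2" per_job_gb="${3:-2}" jobs
  jobs=$(( mem_gb / per_job_gb ))     # integer division rounds down
  if (( jobs > cpus )); then jobs=$cpus; fi
  if (( jobs < 1 )); then jobs=1; fi
  echo "$jobs"
}
```

On Linux, the inputs could come from free and nproc; in recent procps versions, column 7 of the Mem: line printed by free -g is the "available" figure:

$ cmake --build cmake-build --config Release -j "$(pick_build_jobs "$(free -g | awk '/^Mem:/ {print $7}')" "$(nproc)")"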

After building, you can run Dorado from the build directory ./cmake-build/bin/dorado or install it somewhere else on your system, e.g. /opt (note: you will need the relevant permissions for the target installation directory).

$ cmake --install cmake-build --prefix /opt

Pre-commit

The project uses pre-commit to ensure code is consistently formatted; you can set this up using pip:

$ pip install pre-commit
$ pre-commit install

Troubleshooting Guide

Library Path Errors

Dorado comes equipped with the necessary libraries (such as CUDA) for its execution. However, on some operating systems, the system libraries might be chosen over Dorado's. This discrepancy can result in various errors, for instance, CuBLAS error 8.

To resolve this issue, you need to set the LD_LIBRARY_PATH to point to Dorado's libraries. Use a command like the following on Linux (change path as appropriate):

$ export LD_LIBRARY_PATH=<PATH_TO_DORADO>/dorado-x.y.z-linux-x64/lib:$LD_LIBRARY_PATH

On macOS, the equivalent export would be (change path as appropriate):

$ export DYLD_LIBRARY_PATH=<PATH_TO_DORADO>/dorado-x.y.z-osx-arm64/lib:$DYLD_LIBRARY_PATH
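To avoid remembering which variable applies on which platform, the two exports can be wrapped in one function. use_dorado_libs is hypothetical; it must be defined or sourced in the current shell (not run as a separate script) for the export to take effect, and <PATH_TO_DORADO>/dorado-x.y.z-... remains a placeholder for your unpacked release directory.

```shell
# Hypothetical helper: prepend <dorado_dir>/lib to the library search variable
# for the current OS. Define or source it in your shell so the export persists.
# Usage: use_dorado_libs <PATH_TO_DORADO>/dorado-x.y.z-linux-x64
use_dorado_libs() {
  local lib="$1/lib"
  if [[ ! -d "$lib" ]]; then
    echo "no lib directory under $1" >&2
    return 1
  fi
  case "$(uname -s)" in
    Linux)
      # ${VAR:+:$VAR} appends the old value only if it was non-empty,
      # avoiding a stray trailing ":" in the search path.
      export LD_LIBRARY_PATH="$lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    Darwin)
      export DYLD_LIBRARY_PATH="$lib${DYLD_LIBRARY_PATH:+:$DYLD_LIBRARY_PATH}" ;;
    *)
      echo "unsupported OS: $(uname -s)" >&2
      return 1 ;;
  esac
}
```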

Improving the Speed of Duplex Basecalling

Duplex basecalling is an IO-intensive process and can perform poorly if using networked storage or HDD. This can generally be improved by splitting up POD5 files appropriately.

First, install the POD5 Python tools (the POD5 documentation can be found here):

$ pip install pod5

Then run pod5 view to generate a table containing the information to split on, specifically the "channel" information:

$ pod5 view /path/to/your/dataset/ --include "read_id, channel" --output summary.tsv

This will create a summary.tsv file, which should look like:

read_id channel
0000173c-bf67-44e7-9a9c-1ad0bc728e74    109
002fde30-9e23-4125-9eae-d112c18a81a7    463
...

Now run pod5 subset to copy records from your source data into per-channel outputs. This might take some time depending on the size of your dataset:

$ pod5 subset /path/to/your/dataset/ --summary summary.tsv --columns channel --output split_by_channel

The command above will create the output directory split_by_channel and write into it one pod5 file per unique channel. Duplex basecalling these split reads will now be much faster.

Running Duplex Basecalling in a Distributed Fashion

If running duplex basecalling in a distributed fashion (e.g. on a SLURM or Kubernetes cluster), it is important to split POD5 files as described above. The reason is that duplex basecalling requires aggregation of reads from across a whole sequencing run, which will be distributed over multiple POD5 files. The splitting strategy described above ensures that all reads which need to be aggregated are in the same POD5 file. Once the split is performed, one can execute multiple jobs against smaller subsets of POD5 files (e.g. one job per 100 channels). This allows basecalling to be distributed across nodes on a cluster and generates multiple BAMs, which can then be merged. This approach also offers some resilience: if any job fails, it can be restarted without having to re-run basecalling against the entire dataset.
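The batching step can be scripted from the summary.tsv produced by pod5 view. The hypothetical helper below assigns each unique channel to a job index, N channels per job (100 by default, matching the example above); how each job then picks up its split POD5 files depends on the file names pod5 subset wrote, which this sketch does not assume.

```shell
# Hypothetical helper: assign each unique channel listed in a `pod5 view`
# summary (header line, then read_id and channel columns) to a job index,
# with <per_job> channels per job (default 100).
# Prints "job<TAB>channel" lines, ordered by channel number.
assign_channel_jobs() {
  local summary="$1" per_job="${2:-100}"
  awk 'NR > 1 { print $2 }' "$summary" |
    sort -n -u |
    awk -v per="$per_job" '{ printf "%d\t%s\n", int((NR - 1) / per), $1 }'
}
```

For example, assign_channel_jobs summary.tsv | awk '$1 == 0 { print $2 }' lists the channels for the first job.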

GPU Out of Memory Errors

Dorado operates on a broad range of GPUs but it is primarily developed for Nvidia A100/H100 and Apple Silicon. Dorado attempts to find the optimal batch size for basecalling. Nevertheless, on some low-RAM GPUs, users may face out of memory crashes.

A potential solution to this issue could be setting a manual batch size using the following command:

$ dorado basecaller --batchsize 64 ...

Note: Reducing memory consumption by modifying the chunksize parameter is not recommended as it influences the basecalling results.

Low GPU Utilization

Low GPU utilization can lead to reduced basecalling speed. This problem can be identified using tools such as nvidia-smi and nvtop. Low GPU utilization often stems from I/O bottlenecks in basecalling. Here are a few steps you can take to improve the situation:

Opt for POD5 instead of .fast5: POD5 has superior I/O performance and will enhance the basecall speed in I/O constrained environments.
Transfer data to the local disk before basecalling: Slow basecalling often occurs because network disks cannot supply Dorado with adequate speed. To mitigate this, make sure your data is as close to your host machine as possible.
Choose SSD over HDD: Particularly for duplex basecalling, using a local SSD can offer significant speed advantages. This is due to the duplex basecalling algorithm's reliance on heavy random access of data.

Licence and Copyright

Dorado is distributed under the terms of the Oxford Nanopore Technologies PLC. Public License, v. 1.0. If a copy of the License was not distributed with this file, You can obtain one at http://nanoporetech.com


dorado's Issues

Software download issues

Hello,

The link address for software download works when I click to download but does not work with wget.

I would like to be able to download straight to our server if possible. Linux address is shown as "https://nanoporetech.box.com/shared/static/h8eqc9htxk938jzpl4fch2rqlm48yeb0.gz"

Paul

multiplexing workaround?

What is the workaround for the lack of multiplexing functionality or alignment if you're running an M1 Mac?

Is the dependent package koi going to be released open source?

Dorado seems to depend on koi from ONT https://github.com/nanoporetech/dorado/blob/master/cmake/Koi.cmake. Is this going to be released as open-source at one point or will it remain closed?

Bonito vs dorado

Hi,

Does dorado use the same models and/or produce the same basecalls as bonito?
The models seem to have very similar names and version numbers.

Best,
Chris

Error - no kernel image is available for execution on the device

I'm just trying out Dorado on our GPU cluster, and it seems to output many (tens of thousands for a single fast5 file so far) instances of the error message:

Error - no kernel image is available for execution on the device

It looks like this might be related to pytorch versions: pytorch/pytorch#31285

nvidia-smi says the server I was running on has: Driver Version: 460.84 CUDA Version: 11.2

Assertion failed: (ptr != nullptr), function mtl_for_tensor, file metal_utils.cpp, line 138.

Greetings!
I am currently attempting to use Dorado on my Macbook Pro and running into the titled issue.
Computer Specs:

❯ sw_vers
ProductName:	macOS
ProductVersion:	12.4
BuildVersion:	21F79

Model Name: MacBook Pro
Model Identifier: MacBookPro18,3
Chip: Apple M1 Pro
Total Number of Cores: 10 (8 performance and 2 efficiency)
Memory: 16 GB
System Firmware Version: 7459.121.3
OS Loader Version: 7459.121.3
Apple M1 Pro:

Chipset Model: Apple M1 Pro
Type: GPU
Bus: Built-In
Total Number of Cores: 16
Vendor: Apple (0x106b)
Metal Family: Supported, Metal GPUFamily Apple 7

cmake version:

❯ cmake --version
cmake version 3.23.1

CMake suite maintained and supported by Kitware (kitware.com/cmake).

and path of cmake

❯ which cmake
/opt/homebrew/bin/cmake

I have been attempting to run basecalling on a run I did last week. My first step was basecalling the barcode for the known lambda phage control I added.
I moved all of the fast5 files for the barcoded lambda control into a directory for testing and got the following error:

❯ ./dorado basecaller models/[email protected] /Users/michaelfoster/sequencing/dorado_test/ >lambda-dorado-test.sam
> Creating basecall pipeline
Assertion failed: (ptr != nullptr), function mtl_for_tensor, file metal_utils.cpp, line 138.
[1]    52214 abort      ./dorado basecaller models/[email protected]  > lambda-dorado-test.sam

okay, just gonna rm -rf the entire dorado directory and do it from the start.

here is the entire process starting from git clone:

❯ git clone git@github.com:nanoporetech/dorado.git
Cloning into 'dorado'...
remote: Enumerating objects: 1172, done.
remote: Counting objects: 100% (1172/1172), done.
remote: Compressing objects: 100% (428/428), done.
remote: Total 1172 (delta 762), reused 1138 (delta 735), pack-reused 0
Receiving objects: 100% (1172/1172), 366.55 KiB | 1.68 MiB/s, done.
Resolving deltas: 100% (762/762), done.
❯ cd dorado
❯ ls
CMakeLists.txt          LICENCE.txt             dorado                  make_koi_archive_win.sh
DEV.md                  README.md               make_koi_archive.sh     tests
❯ cmake -S . -B cmake-build
-- The C compiler identification is AppleClang 13.1.6.13160021
-- The CXX compiler identification is AppleClang 13.1.6.13160021
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Submodule update
Submodule 'dorado/3rdparty/HighFive' (https://github.com/BlueBrain/HighFive.git) registered for path 'dorado/3rdparty/HighFive'
Submodule 'dorado/3rdparty/cpp-httplib' (https://github.com/yhirose/cpp-httplib.git) registered for path 'dorado/3rdparty/cpp-httplib'
Submodule 'dorado/3rdparty/11Zip' (https://github.com/Sygmei/11Zip.git) registered for path 'dorado/3rdparty/elzip'
Submodule 'dorado/3rdparty/hdf_plugins' (https://github.com/nanoporetech/vbz_compression.git) registered for path 'dorado/3rdparty/hdf_plugins'
Submodule 'dorado/3rdparty/toml11' (https://github.com/ToruNiina/toml11.git) registered for path 'dorado/3rdparty/toml11'
Submodule 'dorado/3rdparty/zstd' (https://github.com/facebook/zstd.git) registered for path 'dorado/3rdparty/zstd'
Cloning into '/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/HighFive'...
Cloning into '/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/cpp-httplib'...
Cloning into '/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/elzip'...
Cloning into '/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/hdf_plugins'...
Cloning into '/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/toml11'...
Cloning into '/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/zstd'...
From https://github.com/BlueBrain/HighFive
 * branch            d3afd218ff04c3a2c6fbbd2a26b076715428bd57 -> FETCH_HEAD
Submodule path 'dorado/3rdparty/HighFive': checked out 'd3afd218ff04c3a2c6fbbd2a26b076715428bd57'
Submodule 'deps/catch2' (https://github.com/catchorg/Catch2.git) registered for path 'dorado/3rdparty/HighFive/deps/catch2'
Cloning into '/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/HighFive/deps/catch2'...
Submodule path 'dorado/3rdparty/HighFive/deps/catch2': checked out '216713a4066b79d9803d374f261ccb30c0fb451f'
Submodule path 'dorado/3rdparty/cpp-httplib': checked out 'fee8e97b4eeb34fe2e6e6294413d84e9e7a072a7'
Submodule path 'dorado/3rdparty/elzip': checked out '94a125161e4acab2638d8becd99af352d515b793'
Submodule 'extlibs/minizip' (https://github.com/zlib-ng/minizip-ng.git) registered for path 'dorado/3rdparty/elzip/extlibs/minizip'
Cloning into '/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/elzip/extlibs/minizip'...
From https://github.com/zlib-ng/minizip-ng
 * branch            99d39015e29703af2612277180ea586805f655ea -> FETCH_HEAD
Submodule path 'dorado/3rdparty/elzip/extlibs/minizip': checked out '99d39015e29703af2612277180ea586805f655ea'
Submodule path 'dorado/3rdparty/hdf_plugins': checked out '02fb8f50b93921ffa3c040106e809aaf8adbe5c5'
Submodule 'third_party/streamvbyte' (https://github.com/lemire/streamvbyte.git) registered for path 'dorado/3rdparty/hdf_plugins/third_party/streamvbyte'
Cloning into '/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/hdf_plugins/third_party/streamvbyte'...
Submodule path 'dorado/3rdparty/hdf_plugins/third_party/streamvbyte': checked out '1813d4ec3d732f3f615821fd8d1f725204c15ecc'
Submodule path 'dorado/3rdparty/toml11': checked out '59243256528d4133321e845c3193db2d2725e6ee'
Submodule path 'dorado/3rdparty/zstd': checked out '97a3da1df009d4dc67251de0c4b1c9d7fe286fc1'
-- Downloading metal-cpp
-- Downloading metal-cpp - done
-- Extracting metal-cpp
-- Extracting metal-cpp - done
-- Downloading pod5-0.0.14-Darwin
-- Downloading pod5-0.0.14-Darwin - done
-- Extracting pod5-0.0.14-Darwin
-- Extracting pod5-0.0.14-Darwin - done
-- Building version 1.0.3
-- No CMAKE_BUILD_TYPE set - defaulting to Debug
-- Found HDF5: /opt/homebrew/Cellar/hdf5/1.12.2/lib/libhdf5.dylib;/opt/homebrew/opt/libaec/lib/libsz.dylib;/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk/usr/lib/libz.tbd;/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk/usr/lib/libdl.tbd;/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk/usr/lib/libm.tbd (found suitable version "1.12.2", minimum required is "1.8.16")  
-- Found zstd: /opt/homebrew/lib/libzstd.dylib (found suitable version "1.5.2", minimum required is "1.3.1") 
-- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY
-- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY - Success
-- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY
-- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY - Success
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR - Success
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Downloading torch-1.10.2-Darwin
-- Downloading torch-1.10.2-Darwin - done
-- Extracting torch-1.10.2-Darwin
-- Extracting torch-1.10.2-Darwin - done
CMake Warning at dorado/3rdparty/torch-1.10.2-Darwin/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
  static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
  dorado/3rdparty/torch-1.10.2-Darwin/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
  CMakeLists.txt:195 (find_package)


-- Found Torch: /Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/torch-1.10.2-Darwin/torch/lib/libtorch.dylib  
-- Found HDF5: /opt/homebrew/Cellar/hdf5/1.12.2/lib/libhdf5.dylib;/opt/homebrew/opt/libaec/lib/libsz.dylib;/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk/usr/lib/libz.tbd;/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk/usr/lib/libdl.tbd;/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk/usr/lib/libm.tbd;/opt/homebrew/Cellar/hdf5/1.12.2/lib/libhdf5_cpp.a;/opt/homebrew/Cellar/hdf5/1.12.2/lib/libhdf5.a;/opt/homebrew/opt/libaec/lib/libsz.dylib;/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk/usr/lib/libz.tbd;/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk/usr/lib/libdl.tbd;/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk/usr/lib/libm.tbd (found version "1.12.2") found components: C CXX HL 
-- Found OpenSSL: /opt/homebrew/opt/openssl@3/lib/libcrypto.a (found version "3.0.2")  
-- Using CMake version 3.23.1
CMake Deprecation Warning at dorado/3rdparty/elzip/extlibs/minizip/CMakeLists.txt:56 (cmake_policy):
  The OLD behavior for policy CMP0074 will be removed from a future version
  of CMake.

  The cmake-policies(7) manual explains that the OLD behaviors of all
  policies are deprecated and that a policy should be set to OLD only under
  specific short-term circumstances.  Projects should be ported to the NEW
  behavior and not rely on setting a policy to OLD.


-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for inttypes.h
-- Looking for inttypes.h - found
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of off64_t
-- Check size of off64_t - failed
-- Looking for fseeko
-- Looking for fseeko - found
-- Found PkgConfig: /opt/homebrew/bin/pkg-config (found version "0.29.2") 
-- Checking for module 'openssl'
--   No package 'openssl' found
-- Using OpenSSL 3.0.2
-- Character encoding support requires iconv
-- The following features have been enabled:

 * MZ_COMPAT, Enables compatibility layer
 * MZ_LIBCOMP, Enables Apple compression
 * MZ_FETCH_LIBS, Enables fetching third-party libraries if not found
 * MZ_OPENSSL, Enables OpenSSL for encryption
 * MZ_LIBBSD, Build with libbsd for crypto random

-- The following features have been disabled:

 * MZ_ZLIB, Enables ZLIB compression
 * MZ_BZIP2, Enables BZIP2 compression
 * MZ_LZMA, Enables LZMA & XZ compression
 * MZ_ZSTD, Enables ZSTD compression
 * MZ_FORCE_FETCH_LIBS, Enables fetching third-party libraries always
 * MZ_PKCRYPT, Enables PKWARE traditional encryption
 * MZ_WZAES, Enables WinZIP AES encryption
 * MZ_SIGNING, Enables zip signing support
 * MZ_ICONV, Enables iconv string encoding conversion library
 * MZ_COMPRESS_ONLY, Only support compression
 * MZ_DECOMPRESS_ONLY, Only support decompression
 * MZ_FILE32_API, Builds using posix 32-bit file api
 * MZ_BUILD_TESTS, Builds minizip test executable
 * MZ_BUILD_UNIT_TESTS, Builds minizip unit test project
 * MZ_BUILD_FUZZ_TESTS, Builds minizip fuzzer executables
 * MZ_CODE_COVERAGE, Builds with code coverage flags

-- Configuring done
-- Generating done
-- Build files have been written to: /Users/michaelfoster/sequencing/basecalling/dorado/cmake-build

Okay, that seems to have worked. pkg-config reported that it couldn't find the 'openssl' module, but the build then used OpenSSL 3.0.2 anyway, even though I had installed everything ahead of time. I can list the steps I took to prepare and download the dependencies if needed.

Then I ran the next command:

❯ cmake --build cmake-build --config Release -- -j
[  1%] Creating directories for 'streamvbyte'
[  4%] Compiling metal kernels
[  7%] Building C object dorado/3rdparty/elzip/extlibs/minizip/CMakeFiles/minizip.dir/mz_crypt.c.o
[  7%] Building C object dorado/3rdparty/elzip/extlibs/minizip/CMakeFiles/minizip.dir/mz_strm.c.o
[  8%] Building CXX object dorado/3rdparty/hdf_plugins/vbz_plugin/hdf_test_utils/CMakeFiles/hdf_test_utils.dir/hdf_id_helper.cpp.o
[  8%] Building C object dorado/3rdparty/elzip/extlibs/minizip/CMakeFiles/minizip.dir/mz_strm_buf.c.o
[ 10%] Building C object dorado/3rdparty/elzip/extlibs/minizip/CMakeFiles/minizip.dir/mz_strm_mem.c.o
[ 13%] Building C object dorado/3rdparty/elzip/extlibs/minizip/CMakeFiles/minizip.dir/mz_os.c.o
[ 13%] Building C object dorado/3rdparty/elzip/extlibs/minizip/CMakeFiles/minizip.dir/mz_zip.c.o
[ 14%] Building C object dorado/3rdparty/elzip/extlibs/minizip/CMakeFiles/minizip.dir/mz_strm_split.c.o
[ 16%] Building C object dorado/3rdparty/elzip/extlibs/minizip/CMakeFiles/minizip.dir/mz_zip_rw.c.o
[ 17%] Building C object dorado/3rdparty/elzip/extlibs/minizip/CMakeFiles/minizip.dir/mz_strm_libcomp.c.o
[ 19%] Building C object dorado/3rdparty/elzip/extlibs/minizip/CMakeFiles/minizip.dir/mz_crypt_openssl.c.o
[ 20%] Building C object dorado/3rdparty/elzip/extlibs/minizip/CMakeFiles/minizip.dir/mz_os_posix.c.o
[ 22%] Building C object dorado/3rdparty/elzip/extlibs/minizip/CMakeFiles/minizip.dir/mz_strm_os_posix.c.o
[ 23%] Building C object dorado/3rdparty/elzip/extlibs/minizip/CMakeFiles/minizip.dir/mz_compat.c.o
[ 25%] No download step for 'streamvbyte'
/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/elzip/extlibs/minizip/mz_strm_libcomp.c:120:13: warning: variable 'total_in' set but not used [-Wunused-but-set-variable]
    int32_t total_in = 0;
            ^
1 warning generated.
/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/elzip/extlibs/minizip/mz_zip.c:192:17: warning: unused function 'mz_zip_get_pk_verify' [-Wunused-function]
static uint16_t mz_zip_get_pk_verify(uint32_t dos_date, uint64_t crc, uint16_t flag)
                ^
/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/elzip/extlibs/minizip/mz_compat.c:166:13: warning: unused variable 'written' [-Wunused-variable]
    int32_t written = 0;
            ^
/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/elzip/extlibs/minizip/mz_compat.c:167:11: warning: unused variable 'opaque' [-Wunused-variable]
    void *opaque = NULL;
          ^
[ 26%] No patch step for 'streamvbyte'
2 warnings generated.
[ 27%] Performing configure step for 'streamvbyte'
1 warning generated.
-- No build type selected, default to Release
[ 29%] Linking CXX static library ../../../../../lib/libhdf_test_utils.a
/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/elzip/extlibs/minizip/mz_crypt_openssl.c:40:9: warning: 'ERR_load_BIO_strings' is deprecated [-Wdeprecated-declarations]
        ERR_load_BIO_strings();
        ^
/opt/homebrew/opt/openssl@3/include/openssl/cryptoerr_legacy.h:31:1: note: 'ERR_load_BIO_strings' has been explicitly marked deprecated here
OSSL_DEPRECATEDIN_3_0 int ERR_load_BIO_strings(void);
^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:182:49: note: expanded from macro 'OSSL_DEPRECATEDIN_3_0'
#   define OSSL_DEPRECATEDIN_3_0                OSSL_DEPRECATED(3.0)
                                                ^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:62:52: note: expanded from macro 'OSSL_DEPRECATED'
#     define OSSL_DEPRECATED(since) __attribute__((deprecated))
                                                   ^
/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/elzip/extlibs/minizip/mz_crypt_openssl.c:43:9: warning: 'ENGINE_load_builtin_engines' is deprecated [-Wdeprecated-declarations]
        ENGINE_load_builtin_engines();
        ^
/opt/homebrew/opt/openssl@3/include/openssl/engine.h:358:1: note: 'ENGINE_load_builtin_engines' has been explicitly marked deprecated here
OSSL_DEPRECATEDIN_3_0 void ENGINE_load_builtin_engines(void);
^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:182:49: note: expanded from macro 'OSSL_DEPRECATEDIN_3_0'
#   define OSSL_DEPRECATEDIN_3_0                OSSL_DEPRECATED(3.0)
                                                ^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:62:52: note: expanded from macro 'OSSL_DEPRECATED'
#     define OSSL_DEPRECATED(since) __attribute__((deprecated))
                                                   ^
/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/elzip/extlibs/minizip/mz_crypt_openssl.c:44:9: warning: 'ENGINE_register_all_complete' is deprecated [-Wdeprecated-declarations]
        ENGINE_register_all_complete();
        ^
/opt/homebrew/opt/openssl@3/include/openssl/engine.h:415:1: note: 'ENGINE_register_all_complete' has been explicitly marked deprecated here
OSSL_DEPRECATEDIN_3_0 int ENGINE_register_all_complete(void);
^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:182:49: note: expanded from macro 'OSSL_DEPRECATEDIN_3_0'
#   define OSSL_DEPRECATEDIN_3_0                OSSL_DEPRECATED(3.0)
                                                ^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:62:52: note: expanded from macro 'OSSL_DEPRECATED'
#     define OSSL_DEPRECATED(since) __attribute__((deprecated))
                                                   ^
/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/elzip/extlibs/minizip/mz_crypt_openssl.c:93:18: warning: 'SHA1_Init' is deprecated [-Wdeprecated-declarations]
        result = SHA1_Init(&sha->ctx1);
                 ^
/opt/homebrew/opt/openssl@3/include/openssl/sha.h:49:1: note: 'SHA1_Init' has been explicitly marked deprecated here
OSSL_DEPRECATEDIN_3_0 int SHA1_Init(SHA_CTX *c);
^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:182:49: note: expanded from macro 'OSSL_DEPRECATEDIN_3_0'
#   define OSSL_DEPRECATEDIN_3_0                OSSL_DEPRECATED(3.0)
                                                ^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:62:52: note: expanded from macro 'OSSL_DEPRECATED'
#     define OSSL_DEPRECATED(since) __attribute__((deprecated))
                                                   ^
/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/elzip/extlibs/minizip/mz_crypt_openssl.c:95:18: warning: 'SHA256_Init' is deprecated [-Wdeprecated-declarations]
        result = SHA256_Init(&sha->ctx256);
                 ^
/opt/homebrew/opt/openssl@3/include/openssl/sha.h:73:1: note: 'SHA256_Init' has been explicitly marked deprecated here
OSSL_DEPRECATEDIN_3_0 int SHA256_Init(SHA256_CTX *c);
^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:182:49: note: expanded from macro 'OSSL_DEPRECATEDIN_3_0'
#   define OSSL_DEPRECATEDIN_3_0                OSSL_DEPRECATED(3.0)
                                                ^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:62:52: note: expanded from macro 'OSSL_DEPRECATED'
#     define OSSL_DEPRECATED(since) __attribute__((deprecated))
                                                   ^
/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/elzip/extlibs/minizip/mz_crypt_openssl.c:114:18: warning: 'SHA1_Update' is deprecated [-Wdeprecated-declarations]
        result = SHA1_Update(&sha->ctx1, buf, size);
                 ^
/opt/homebrew/opt/openssl@3/include/openssl/sha.h:50:1: note: 'SHA1_Update' has been explicitly marked deprecated here
OSSL_DEPRECATEDIN_3_0 int SHA1_Update(SHA_CTX *c, const void *data, size_t len);
^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:182:49: note: expanded from macro 'OSSL_DEPRECATEDIN_3_0'
#   define OSSL_DEPRECATEDIN_3_0                OSSL_DEPRECATED(3.0)
                                                ^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:62:52: note: expanded from macro 'OSSL_DEPRECATED'
#     define OSSL_DEPRECATED(since) __attribute__((deprecated))
                                                   ^
/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/elzip/extlibs/minizip/mz_crypt_openssl.c:116:18: warning: 'SHA256_Update' is deprecated [-Wdeprecated-declarations]
        result = SHA256_Update(&sha->ctx256, buf, size);
                 ^
/opt/homebrew/opt/openssl@3/include/openssl/sha.h:74:1: note: 'SHA256_Update' has been explicitly marked deprecated here
OSSL_DEPRECATEDIN_3_0 int SHA256_Update(SHA256_CTX *c,
^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:182:49: note: expanded from macro 'OSSL_DEPRECATEDIN_3_0'
#   define OSSL_DEPRECATEDIN_3_0                OSSL_DEPRECATED(3.0)
                                                ^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:62:52: note: expanded from macro 'OSSL_DEPRECATED'
#     define OSSL_DEPRECATED(since) __attribute__((deprecated))
                                                   ^
/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/elzip/extlibs/minizip/mz_crypt_openssl.c:136:18: warning: 'SHA1_Final' is deprecated [-Wdeprecated-declarations]
        result = SHA1_Final(digest, &sha->ctx1);
                 ^
/opt/homebrew/opt/openssl@3/include/openssl/sha.h:51:1: note: 'SHA1_Final' has been explicitly marked deprecated here
OSSL_DEPRECATEDIN_3_0 int SHA1_Final(unsigned char *md, SHA_CTX *c);
^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:182:49: note: expanded from macro 'OSSL_DEPRECATEDIN_3_0'
#   define OSSL_DEPRECATEDIN_3_0                OSSL_DEPRECATED(3.0)
                                                ^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:62:52: note: expanded from macro 'OSSL_DEPRECATED'
#     define OSSL_DEPRECATED(since) __attribute__((deprecated))
                                                   ^
/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/elzip/extlibs/minizip/mz_crypt_openssl.c:140:18: warning: 'SHA256_Final' is deprecated [-Wdeprecated-declarations]
        result = SHA256_Final(digest, &sha->ctx256);
                 ^
/opt/homebrew/opt/openssl@3/include/openssl/sha.h:76:1: note: 'SHA256_Final' has been explicitly marked deprecated here
OSSL_DEPRECATEDIN_3_0 int SHA256_Final(unsigned char *md, SHA256_CTX *c);
^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:182:49: note: expanded from macro 'OSSL_DEPRECATEDIN_3_0'
#   define OSSL_DEPRECATEDIN_3_0                OSSL_DEPRECATED(3.0)
                                                ^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:62:52: note: expanded from macro 'OSSL_DEPRECATED'
#     define OSSL_DEPRECATED(since) __attribute__((deprecated))
                                                   ^
/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/elzip/extlibs/minizip/mz_crypt_openssl.c:208:5: warning: 'AES_encrypt' is deprecated [-Wdeprecated-declarations]
    AES_encrypt(buf, buf, &aes->key);
    ^
/opt/homebrew/opt/openssl@3/include/openssl/aes.h:56:1: note: 'AES_encrypt' has been explicitly marked deprecated here
OSSL_DEPRECATEDIN_3_0
^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:182:49: note: expanded from macro 'OSSL_DEPRECATEDIN_3_0'
#   define OSSL_DEPRECATEDIN_3_0                OSSL_DEPRECATED(3.0)
                                                ^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:62:52: note: expanded from macro 'OSSL_DEPRECATED'
#     define OSSL_DEPRECATED(since) __attribute__((deprecated))
                                                   ^
/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/elzip/extlibs/minizip/mz_crypt_openssl.c:220:5: warning: 'AES_decrypt' is deprecated [-Wdeprecated-declarations]
    AES_decrypt(buf, buf, &aes->key);
    ^
/opt/homebrew/opt/openssl@3/include/openssl/aes.h:59:1: note: 'AES_decrypt' has been explicitly marked deprecated here
OSSL_DEPRECATEDIN_3_0
^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:182:49: note: expanded from macro 'OSSL_DEPRECATEDIN_3_0'
#   define OSSL_DEPRECATEDIN_3_0                OSSL_DEPRECATED(3.0)
                                                ^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:62:52: note: expanded from macro 'OSSL_DEPRECATED'
#     define OSSL_DEPRECATED(since) __attribute__((deprecated))
                                                   ^
/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/elzip/extlibs/minizip/mz_crypt_openssl.c:237:14: warning: 'AES_set_encrypt_key' is deprecated [-Wdeprecated-declarations]
    result = AES_set_encrypt_key(key, key_bits, &aes->key);
             ^
/opt/homebrew/opt/openssl@3/include/openssl/aes.h:50:1: note: 'AES_set_encrypt_key' has been explicitly marked deprecated here
OSSL_DEPRECATEDIN_3_0
^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:182:49: note: expanded from macro 'OSSL_DEPRECATEDIN_3_0'
#   define OSSL_DEPRECATEDIN_3_0                OSSL_DEPRECATED(3.0)
                                                ^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:62:52: note: expanded from macro 'OSSL_DEPRECATED'
#     define OSSL_DEPRECATED(since) __attribute__((deprecated))
                                                   ^
/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/elzip/extlibs/minizip/mz_crypt_openssl.c:258:14: warning: 'AES_set_decrypt_key' is deprecated [-Wdeprecated-declarations]
    result = AES_set_decrypt_key(key, key_bits, &aes->key);
             ^
/opt/homebrew/opt/openssl@3/include/openssl/aes.h:53:1: note: 'AES_set_decrypt_key' has been explicitly marked deprecated here
OSSL_DEPRECATEDIN_3_0
^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:182:49: note: expanded from macro 'OSSL_DEPRECATEDIN_3_0'
#   define OSSL_DEPRECATEDIN_3_0                OSSL_DEPRECATED(3.0)
                                                ^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:62:52: note: expanded from macro 'OSSL_DEPRECATED'
#     define OSSL_DEPRECATED(since) __attribute__((deprecated))
                                                   ^
/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/elzip/extlibs/minizip/mz_crypt_openssl.c:326:5: warning: 'HMAC_CTX_free' is deprecated [-Wdeprecated-declarations]
    HMAC_CTX_free(hmac->ctx);
    ^
/opt/homebrew/opt/openssl@3/include/openssl/hmac.h:35:1: note: 'HMAC_CTX_free' has been explicitly marked deprecated here
OSSL_DEPRECATEDIN_3_0 void HMAC_CTX_free(HMAC_CTX *ctx);
^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:182:49: note: expanded from macro 'OSSL_DEPRECATEDIN_3_0'
#   define OSSL_DEPRECATEDIN_3_0                OSSL_DEPRECATED(3.0)
                                                ^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:62:52: note: expanded from macro 'OSSL_DEPRECATED'
#     define OSSL_DEPRECATED(since) __attribute__((deprecated))
                                                   ^
/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/elzip/extlibs/minizip/mz_crypt_openssl.c:344:17: warning: 'HMAC_CTX_new' is deprecated [-Wdeprecated-declarations]
    hmac->ctx = HMAC_CTX_new();
                ^
/opt/homebrew/opt/openssl@3/include/openssl/hmac.h:33:1: note: 'HMAC_CTX_new' has been explicitly marked deprecated here
OSSL_DEPRECATEDIN_3_0 HMAC_CTX *HMAC_CTX_new(void);
^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:182:49: note: expanded from macro 'OSSL_DEPRECATEDIN_3_0'
#   define OSSL_DEPRECATEDIN_3_0                OSSL_DEPRECATED(3.0)
                                                ^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:62:52: note: expanded from macro 'OSSL_DEPRECATED'
#     define OSSL_DEPRECATED(since) __attribute__((deprecated))
                                                   ^
/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/elzip/extlibs/minizip/mz_crypt_openssl.c:351:14: warning: 'HMAC_Init_ex' is deprecated [-Wdeprecated-declarations]
    result = HMAC_Init_ex(hmac->ctx, key, key_length, evp_md, NULL);
             ^
/opt/homebrew/opt/openssl@3/include/openssl/hmac.h:43:1: note: 'HMAC_Init_ex' has been explicitly marked deprecated here
OSSL_DEPRECATEDIN_3_0 int HMAC_Init_ex(HMAC_CTX *ctx, const void *key, int len,
^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:182:49: note: expanded from macro 'OSSL_DEPRECATEDIN_3_0'
#   define OSSL_DEPRECATEDIN_3_0                OSSL_DEPRECATED(3.0)
                                                ^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:62:52: note: expanded from macro 'OSSL_DEPRECATED'
#     define OSSL_DEPRECATED(since) __attribute__((deprecated))
                                                   ^
/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/elzip/extlibs/minizip/mz_crypt_openssl.c:367:14: warning: 'HMAC_Update' is deprecated [-Wdeprecated-declarations]
    result = HMAC_Update(hmac->ctx, buf, size);
             ^
/opt/homebrew/opt/openssl@3/include/openssl/hmac.h:45:1: note: 'HMAC_Update' has been explicitly marked deprecated here
OSSL_DEPRECATEDIN_3_0 int HMAC_Update(HMAC_CTX *ctx, const unsigned char *data,
^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:182:49: note: expanded from macro 'OSSL_DEPRECATEDIN_3_0'
#   define OSSL_DEPRECATEDIN_3_0                OSSL_DEPRECATED(3.0)
                                                ^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:62:52: note: expanded from macro 'OSSL_DEPRECATED'
#     define OSSL_DEPRECATED(since) __attribute__((deprecated))
                                                   ^
/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/elzip/extlibs/minizip/mz_crypt_openssl.c:387:18: warning: 'HMAC_Final' is deprecated [-Wdeprecated-declarations]
        result = HMAC_Final(hmac->ctx, digest, (uint32_t *)&digest_size);
                 ^
/opt/homebrew/opt/openssl@3/include/openssl/hmac.h:47:1: note: 'HMAC_Final' has been explicitly marked deprecated here
OSSL_DEPRECATEDIN_3_0 int HMAC_Final(HMAC_CTX *ctx, unsigned char *md,
^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:182:49: note: expanded from macro 'OSSL_DEPRECATEDIN_3_0'
#   define OSSL_DEPRECATEDIN_3_0                OSSL_DEPRECATED(3.0)
                                                ^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:62:52: note: expanded from macro 'OSSL_DEPRECATED'
#     define OSSL_DEPRECATED(since) __attribute__((deprecated))
                                                   ^
/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/elzip/extlibs/minizip/mz_crypt_openssl.c:391:18: warning: 'HMAC_Final' is deprecated [-Wdeprecated-declarations]
        result = HMAC_Final(hmac->ctx, digest, (uint32_t *)&digest_size);
                 ^
/opt/homebrew/opt/openssl@3/include/openssl/hmac.h:47:1: note: 'HMAC_Final' has been explicitly marked deprecated here
OSSL_DEPRECATEDIN_3_0 int HMAC_Final(HMAC_CTX *ctx, unsigned char *md,
^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:182:49: note: expanded from macro 'OSSL_DEPRECATEDIN_3_0'
#   define OSSL_DEPRECATEDIN_3_0                OSSL_DEPRECATED(3.0)
                                                ^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:62:52: note: expanded from macro 'OSSL_DEPRECATED'
#     define OSSL_DEPRECATED(since) __attribute__((deprecated))
                                                   ^
/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/elzip/extlibs/minizip/mz_crypt_openssl.c:418:23: warning: 'HMAC_CTX_new' is deprecated [-Wdeprecated-declarations]
        target->ctx = HMAC_CTX_new();
                      ^
/opt/homebrew/opt/openssl@3/include/openssl/hmac.h:33:1: note: 'HMAC_CTX_new' has been explicitly marked deprecated here
OSSL_DEPRECATEDIN_3_0 HMAC_CTX *HMAC_CTX_new(void);
^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:182:49: note: expanded from macro 'OSSL_DEPRECATEDIN_3_0'
#   define OSSL_DEPRECATEDIN_3_0                OSSL_DEPRECATED(3.0)
                                                ^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:62:52: note: expanded from macro 'OSSL_DEPRECATED'
#     define OSSL_DEPRECATED(since) __attribute__((deprecated))
                                                   ^
/Users/michaelfoster/sequencing/basecalling/dorado/dorado/3rdparty/elzip/extlibs/minizip/mz_crypt_openssl.c:420:14: warning: 'HMAC_CTX_copy' is deprecated [-Wdeprecated-declarations]
    result = HMAC_CTX_copy(target->ctx, source->ctx);
             ^
/opt/homebrew/opt/openssl@3/include/openssl/hmac.h:49:1: note: 'HMAC_CTX_copy' has been explicitly marked deprecated here
OSSL_DEPRECATEDIN_3_0 __owur int HMAC_CTX_copy(HMAC_CTX *dctx, HMAC_CTX *sctx);
^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:182:49: note: expanded from macro 'OSSL_DEPRECATEDIN_3_0'
#   define OSSL_DEPRECATEDIN_3_0                OSSL_DEPRECATED(3.0)
                                                ^
/opt/homebrew/opt/openssl@3/include/openssl/macros.h:62:52: note: expanded from macro 'OSSL_DEPRECATED'
#     define OSSL_DEPRECATED(since) __attribute__((deprecated))
                                                   ^
21 warnings generated.
[ 30%] Linking C static library libminizip.a
[ 30%] Built target hdf_test_utils
[ 30%] Built target minizip
[ 35%] Building CXX object dorado/3rdparty/elzip/CMakeFiles/elzip.dir/src/zipper.cpp.o
[ 35%] Building CXX object dorado/3rdparty/elzip/CMakeFiles/elzip.dir/src/elzip.cpp.o
[ 35%] Building CXX object dorado/3rdparty/elzip/CMakeFiles/elzip.dir/src/unzipper.cpp.o
-- The C compiler identification is AppleClang 13.1.6.13160021
-- The CXX compiler identification is AppleClang 13.1.6.13160021
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Deprecation Warning at CMakeLists.txt:9 (cmake_policy):
  The OLD behavior for policy CMP0065 will be removed from a future version
  of CMake.

  The cmake-policies(7) manual explains that the OLD behaviors of all
  policies are deprecated and that a policy should be set to OLD only under
  specific short-term circumstances.  Projects should be ported to the NEW
  behavior and not rely on setting a policy to OLD.


-- CMAKE_SYSTEM_PROCESSOR: arm64
-- CMAKE_BUILD_TYPE: Release
-- CMAKE_C_COMPILER: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc
-- CMAKE_C_FLAGS: "-std=c99"     
-- CMAKE_C_FLAGS_DEBUG: -g
-- CMAKE_C_FLAGS_RELEASE: -O3 -DNDEBUG
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/michaelfoster/sequencing/basecalling/dorado/cmake-build/streamvbyte/src/streamvbyte-build
[ 36%] Performing build step for 'streamvbyte'
[  4%] Building C object CMakeFiles/streamvbyte_static.dir/src/streamvbyte_encode.c.o
[ 12%] Building C object CMakeFiles/streamvbyte.dir/src/streamvbyte_encode.c.o
[ 12%] Building C object CMakeFiles/streamvbyte.dir/src/streamvbyte_zigzag.c.o
[ 16%] Building C object CMakeFiles/streamvbyte.dir/src/streamvbyte_decode.c.o
[ 20%] Building C object CMakeFiles/streamvbyte.dir/src/streamvbytedelta_decode.c.o
[ 25%] Building C object CMakeFiles/streamvbyte.dir/src/streamvbyte_0124_encode.c.o
[ 29%] Building C object CMakeFiles/streamvbyte.dir/src/streamvbyte_0124_decode.c.o
[ 38%] Creating metallib
[ 37%] Building C object CMakeFiles/streamvbyte.dir/src/streamvbytedelta_encode.c.o
[ 37%] Building C object CMakeFiles/streamvbyte_static.dir/src/streamvbyte_decode.c.o
[ 45%] Building C object CMakeFiles/streamvbyte_static.dir/src/streamvbyte_zigzag.c.o
[ 45%] Building C object CMakeFiles/streamvbyte_static.dir/src/streamvbytedelta_encode.c.o
[ 54%] Building C object CMakeFiles/streamvbyte_static.dir/src/streamvbyte_0124_decode.c.o
[ 54%] Building C object CMakeFiles/streamvbyte_static.dir/src/streamvbytedelta_decode.c.o
[ 58%] Building C object CMakeFiles/streamvbyte_static.dir/src/streamvbyte_0124_encode.c.o
[ 39%] Linking CXX static library libelzip.a
[ 39%] Built target metal-lib
[ 62%] Linking C shared library libstreamvbyte.dylib
[ 39%] Built target elzip
[ 66%] Linking C static library libstreamvbyte_static.a
[ 66%] Built target streamvbyte_static
[ 66%] Built target streamvbyte
[ 75%] Building C object CMakeFiles/writeseq.dir/tests/writeseq.c.o
[ 79%] Building C object CMakeFiles/example.dir/example.c.o
[ 79%] Building C object CMakeFiles/unit.dir/tests/unit.c.o
[ 83%] Building C object CMakeFiles/perf.dir/tests/perf.c.o
[ 87%] Linking C executable example
[ 91%] Linking C executable writeseq
Apple clang version 13.1.6 (clang-1316.0.21.2.3)
Target: arm64-apple-darwin21.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
 "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld" -demangle -lto_library /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/libLTO.dylib -dynamic -arch arm64 -platform_version macos 12.0.0 12.3 -syslibroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk -o example -search_paths_first -headerpad_max_install_names CMakeFiles/example.dir/example.c.o libstreamvbyte_static.a -lSystem /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/13.1.6/lib/darwin/libclang_rt.osx.a
Apple clang version 13.1.6 (clang-1316.0.21.2.3)
Target: arm64-apple-darwin21.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
 "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld" -demangle -lto_library /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/libLTO.dylib -dynamic -arch arm64 -platform_version macos 12.0.0 12.3 -syslibroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk -o writeseq -search_paths_first -headerpad_max_install_names CMakeFiles/writeseq.dir/tests/writeseq.c.o libstreamvbyte_static.a -lSystem /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/13.1.6/lib/darwin/libclang_rt.osx.a
[ 95%] Linking C executable perf
Apple clang version 13.1.6 (clang-1316.0.21.2.3)
Target: arm64-apple-darwin21.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
 "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld" -demangle -lto_library /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/libLTO.dylib -dynamic -arch arm64 -platform_version macos 12.0.0 12.3 -syslibroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk -o perf -search_paths_first -headerpad_max_install_names CMakeFiles/perf.dir/tests/perf.c.o libstreamvbyte_static.a -lm -lSystem /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/13.1.6/lib/darwin/libclang_rt.osx.a
[100%] Linking C executable unit
[100%] Built target example
[100%] Built target writeseq
Apple clang version 13.1.6 (clang-1316.0.21.2.3)
Target: arm64-apple-darwin21.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
 "/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld" -demangle -lto_library /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/libLTO.dylib -dynamic -arch arm64 -platform_version macos 12.0.0 12.3 -syslibroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk -o unit -search_paths_first -headerpad_max_install_names CMakeFiles/unit.dir/tests/unit.c.o libstreamvbyte_static.a -lSystem /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/13.1.6/lib/darwin/libclang_rt.osx.a
[100%] Built target perf
[100%] Built target unit
[ 41%] Performing install step for 'streamvbyte'
Consolidate compiler generated dependencies of target streamvbyte_static
Consolidate compiler generated dependencies of target streamvbyte
[ 66%] Built target streamvbyte_static
[ 66%] Built target streamvbyte
Consolidate compiler generated dependencies of target example
Consolidate compiler generated dependencies of target unit
Consolidate compiler generated dependencies of target writeseq
Consolidate compiler generated dependencies of target perf
[ 75%] Built target example
[ 87%] Built target unit
[100%] Built target writeseq
[100%] Built target perf
Install the project...
-- Install configuration: "Release"
-- Installing: /Users/michaelfoster/sequencing/basecalling/dorado/cmake-build/streamvbyte_lib/include/streamvbyte.h
-- Installing: /Users/michaelfoster/sequencing/basecalling/dorado/cmake-build/streamvbyte_lib/include/streamvbytedelta.h
-- Installing: /Users/michaelfoster/sequencing/basecalling/dorado/cmake-build/streamvbyte_lib/include/streamvbyte_zigzag.h
-- Installing: /Users/michaelfoster/sequencing/basecalling/dorado/cmake-build/streamvbyte_lib/lib/libstreamvbyte.dylib
-- Installing: /Users/michaelfoster/sequencing/basecalling/dorado/cmake-build/streamvbyte_lib/lib/libstreamvbyte_static.a
[ 42%] Completed 'streamvbyte'
[ 42%] Built target streamvbyte
[ 45%] Building CXX object dorado/3rdparty/hdf_plugins/vbz/CMakeFiles/vbz.dir/v0/vbz_streamvbyte.cpp.o
[ 45%] Building CXX object dorado/3rdparty/hdf_plugins/vbz/CMakeFiles/vbz.dir/v1/vbz_streamvbyte.cpp.o
[ 47%] Building CXX object dorado/3rdparty/hdf_plugins/vbz/CMakeFiles/vbz.dir/vbz.cpp.o
[ 48%] Linking CXX static library ../../../../lib/libvbz.a
[ 48%] Built target vbz
[ 52%] Building CXX object dorado/3rdparty/hdf_plugins/vbz_plugin/CMakeFiles/vbz_hdf_plugin.dir/vbz_plugin.cpp.o
[ 52%] Building CXX object dorado/3rdparty/hdf_plugins/vbz/test/CMakeFiles/vbz_test.dir/streamvbyte_test.cpp.o
[ 52%] Building CXX object dorado/3rdparty/hdf_plugins/vbz/test/CMakeFiles/vbz_test.dir/vbz_test.cpp.o
[ 54%] Building CXX object dorado/3rdparty/hdf_plugins/vbz/test/CMakeFiles/vbz_test.dir/main.cpp.o
[ 55%] Linking CXX static library ../../../../lib/libvbz_hdf_plugin.a
[ 55%] Built target vbz_hdf_plugin
[ 57%] Building CXX object dorado/3rdparty/hdf_plugins/vbz_plugin/test/CMakeFiles/vbz_hdf_plugin_test.dir/main.cpp.o
[ 60%] Building CXX object CMakeFiles/dorado_lib.dir/dorado/read_pipeline/ScalerNode.cpp.o
[ 60%] Building CXX object CMakeFiles/dorado_lib.dir/dorado/cli/basecaller.cpp.o
[ 61%] Building CXX object CMakeFiles/dorado_lib.dir/dorado/cli/download.cpp.o
[ 63%] Building CXX object CMakeFiles/dorado_lib.dir/dorado/nn/CRFModel.cpp.o
[ 67%] Building CXX object CMakeFiles/dorado_lib.dir/dorado/read_pipeline/BasecallerNode.cpp.o
[ 67%] Building CXX object CMakeFiles/dorado_lib.dir/dorado/read_pipeline/ReadPipeline.cpp.o
[ 67%] Building CXX object dorado/3rdparty/hdf_plugins/vbz_plugin/test/CMakeFiles/vbz_hdf_plugin_test.dir/vbz_hdf_plugin_test.cpp.o
[ 69%] Building CXX object CMakeFiles/dorado_lib.dir/dorado/data_loader/DataLoader.cpp.o
[ 70%] Building CXX object CMakeFiles/dorado_lib.dir/dorado/read_pipeline/WriterNode.cpp.o
[ 72%] Building CXX object CMakeFiles/dorado_lib.dir/dorado/decode/beam_search.cpp.o
[ 75%] Building CXX object CMakeFiles/dorado_lib.dir/dorado/decode/GPUDecoder.cpp.o
[ 75%] Building CXX object CMakeFiles/dorado_lib.dir/dorado/utils/compat_utils.cpp.o
[ 76%] Building CXX object CMakeFiles/dorado_lib.dir/dorado/decode/CPUDecoder.cpp.o
[ 77%] Building CXX object CMakeFiles/dorado_lib.dir/dorado/decode/fast_hash.cpp.o
[ 80%] Building CXX object CMakeFiles/dorado_lib.dir/dorado/utils/sequence_utils.cpp.o
[ 82%] Building CXX object CMakeFiles/dorado_lib.dir/dorado/nn/MetalCRFModel.cpp.o
[ 83%] Building CXX object CMakeFiles/dorado_lib.dir/dorado/utils/stitch.cpp.o
[ 80%] Building CXX object CMakeFiles/dorado_lib.dir/dorado/utils/tensor_utils.cpp.o
[ 85%] Building CXX object CMakeFiles/dorado_lib.dir/dorado/utils/metal_utils.cpp.o
[ 86%] Building CXX object CMakeFiles/dorado_lib.dir/dorado/decode/MTLDecoder.cpp.o
[ 88%] Linking CXX executable ../../../../../bin/vbz_test
[ 88%] Built target vbz_test
[ 89%] Linking CXX executable ../../../../../bin/vbz_hdf_plugin_test
[ 89%] Built target vbz_hdf_plugin_test
[ 91%] Linking CXX static library lib/libdorado_lib.a
[ 91%] Built target dorado_lib
[ 92%] Building CXX object CMakeFiles/dorado.dir/dorado/main.cpp.o
[ 94%] Building CXX object tests/CMakeFiles/dorado_tests.dir/main.cpp.o
[ 97%] Building CXX object tests/CMakeFiles/dorado_tests.dir/Fast5DataLoaderTest.cpp.o
[ 97%] Building CXX object tests/CMakeFiles/dorado_tests.dir/ReadTest.cpp.o
[ 98%] Linking CXX executable bin/dorado
[ 98%] Built target dorado
[100%] Linking CXX executable dorado_tests
[100%] Built target dorado_tests

Looks like it worked, but there is one warning:

CMake Warning at dorado/3rdparty/torch-1.10.2-Darwin/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
  static library kineto_LIBRARY-NOTFOUND not found.
Call Stack (most recent call first):
  dorado/3rdparty/torch-1.10.2-Darwin/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
  CMakeLists.txt:195 (find_package)

Anyway, I ran the tests and they passed.

❯ ctest --test-dir cmake-build
Internal ctest changing into directory: /Users/michaelfoster/sequencing/basecalling/dorado/cmake-build
Test project /Users/michaelfoster/sequencing/basecalling/dorado/cmake-build
    Start 1: vbz_test
1/3 Test #1: vbz_test .........................   Passed    4.26 sec
    Start 2: vbz_hdf_plugin_test
2/3 Test #2: vbz_hdf_plugin_test ..............   Passed    0.12 sec
    Start 3: dorado_tests
3/3 Test #3: dorado_tests .....................   Passed    1.06 sec

100% tests passed, 0 tests failed out of 3

Total Test time (real) =   5.44 sec

Okay, cool. Let's run the test binaries in cmake-build/bin/:

❯ ./vbz_hdf_plugin_test
===============================================================================
All tests passed (12 assertions in 6 test cases)

That worked. Now the next one:

❯ ./vbz_test
===============================================================================
All tests passed (101 assertions in 13 test cases)

Cool, that one worked as well.

Now to download the models.
I'll just download them to the bin folder, since dorado download saves to the current directory (.) by default.

❯ ./dorado download
 - downloading [email protected] [200]
 - downloading [email protected] [200]
 - downloading [email protected] [200]
 - downloading [email protected] [200]
 - downloading [email protected] [200]
 - downloading [email protected] [200]
 - downloading [email protected] [200]
 - downloading [email protected] [200]
 - downloading [email protected] [200]
 - downloading [email protected] [200]
 - downloading [email protected] [200]
 - downloading [email protected] [200]

Cool, let's basecall the lambda barcode now.

❯ ./dorado basecaller [email protected] /Users/michaelfoster/sequencing/dorado_test/ >dorado-test-lambda.sam
> Creating basecall pipeline
Assertion failed: (ptr != nullptr), function mtl_for_tensor, file metal_utils.cpp, line 138.
[1]    53532 abort      ./dorado basecaller [email protected]  > dorado-test-lambda.sam

Same error.
Where do I go next for troubleshooting? Is it an issue with running it directly from the cmake-build/bin folder? Honestly, I have no idea what the issue is, but hopefully I've provided enough information to assist in troubleshooting.

Thanks again. Looking forward to being able to do entire sequencing runs on my MacBook Pro without having to ssh into the cluster.

AttributeError: module 'collections' has no attribute 'Iterable'

Hi, I have been trying to install dorado but am getting what I think is a dependency error. This is what I am seeing ... I have a feeling it has to do with the deprecation of collections.Iterable in favour of collections.abc.Iterable. However, I am not sure how to fix this error!

Discarding https://files.pythonhosted.org/packages/4e/a6/4a1576e4a51b10b4b7440cbd36fc3c71fb69fb634672ce0725de614182ca/astropy-4.2.1.tar.gz (from https://pypi.org/simple/astropy/) (requires-python:>=3.7): Requested astropy from https://files.pythonhosted.org/packages/4e/a6/4a1576e4a51b10b4b7440cbd36fc3c71fb69fb634672ce0725de614182ca/astropy-4.2.1.tar.gz (from dorado) has inconsistent version: expected '4.2.1', but metadata has '0.0.0'
Using cached astropy-4.2.tar.gz (7.5 MB)
Installing build dependencies ... error
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> [75 lines of output]
Collecting setuptools
Using cached setuptools-65.5.0-py3-none-any.whl (1.2 MB)
Collecting setuptools_scm
Using cached setuptools_scm-7.0.5-py3-none-any.whl (42 kB)
Collecting wheel
Using cached wheel-0.37.1-py2.py3-none-any.whl (35 kB)
Collecting cython==0.29.14
Using cached Cython-0.29.14.tar.gz (2.1 MB)
Preparing metadata (setup.py): started
Preparing metadata (setup.py): finished with status 'error'
error: subprocess-exited-with-error

    × python setup.py egg_info did not run successfully.
    │ exit code: 1
    ╰─> [50 lines of output]
        Unable to find pgen, not compiling formal grammar.
        running egg_info
        creating /private/var/folders/6s/23wxsrjj0pl5kyfs_57rjf7h0000gn/T/pip-pip-egg-info-7eyq_j1r/Cython.egg-info
        writing /private/var/folders/6s/23wxsrjj0pl5kyfs_57rjf7h0000gn/T/pip-pip-egg-info-7eyq_j1r/Cython.egg-info/PKG-INFO
        writing dependency_links to /private/var/folders/6s/23wxsrjj0pl5kyfs_57rjf7h0000gn/T/pip-pip-egg-info-7eyq_j1r/Cython.egg-info/dependency_links.txt
        writing entry points to /private/var/folders/6s/23wxsrjj0pl5kyfs_57rjf7h0000gn/T/pip-pip-egg-info-7eyq_j1r/Cython.egg-info/entry_points.txt
        writing top-level names to /private/var/folders/6s/23wxsrjj0pl5kyfs_57rjf7h0000gn/T/pip-pip-egg-info-7eyq_j1r/Cython.egg-info/top_level.txt
        writing manifest file '/private/var/folders/6s/23wxsrjj0pl5kyfs_57rjf7h0000gn/T/pip-pip-egg-info-7eyq_j1r/Cython.egg-info/SOURCES.txt'
        Traceback (most recent call last):
          File "<string>", line 2, in <module>
          File "<pip-setuptools-caller>", line 34, in <module>
          File "/private/var/folders/6s/23wxsrjj0pl5kyfs_57rjf7h0000gn/T/pip-install-zdwcc9d3/cython_306094b2917846c79bd3e0660c0a186f/setup.py", line 228, in <module>
            setup(
          File "/opt/homebrew/lib/python3.10/site-packages/setuptools/__init__.py", line 87, in setup
            return distutils.core.setup(**attrs)
          File "/opt/homebrew/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
            return run_commands(dist)
          File "/opt/homebrew/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
            dist.run_commands()
          File "/opt/homebrew/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 968, in run_commands
            self.run_command(cmd)
          File "/opt/homebrew/lib/python3.10/site-packages/setuptools/dist.py", line 1217, in run_command
            super().run_command(command)
          File "/opt/homebrew/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
            cmd_obj.run()
          File "/opt/homebrew/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 308, in run
            self.find_sources()
          File "/opt/homebrew/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 316, in find_sources
            mm.run()
          File "/opt/homebrew/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 560, in run
            self.add_defaults()
          File "/opt/homebrew/lib/python3.10/site-packages/setuptools/command/egg_info.py", line 597, in add_defaults
            sdist.add_defaults(self)
          File "/opt/homebrew/lib/python3.10/site-packages/setuptools/command/sdist.py", line 106, in add_defaults
            super().add_defaults()
          File "/opt/homebrew/lib/python3.10/site-packages/setuptools/_distutils/command/sdist.py", line 252, in add_defaults
            self._add_defaults_ext()
          File "/opt/homebrew/lib/python3.10/site-packages/setuptools/_distutils/command/sdist.py", line 336, in _add_defaults_ext
            build_ext = self.get_finalized_command('build_ext')
          File "/opt/homebrew/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 306, in get_finalized_command
            cmd_obj.ensure_finalized()
          File "/opt/homebrew/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 109, in ensure_finalized
            self.finalize_options()
          File "/private/var/folders/6s/23wxsrjj0pl5kyfs_57rjf7h0000gn/T/pip-install-zdwcc9d3/cython_306094b2917846c79bd3e0660c0a186f/Cython/Distutils/build_ext.py", line 20, in finalize_options
            self.distribution.ext_modules[:] = cythonize(
          File "/private/var/folders/6s/23wxsrjj0pl5kyfs_57rjf7h0000gn/T/pip-install-zdwcc9d3/cython_306094b2917846c79bd3e0660c0a186f/Cython/Build/Dependencies.py", line 959, in cythonize
            module_list, module_metadata = create_extension_list(
          File "/private/var/folders/6s/23wxsrjj0pl5kyfs_57rjf7h0000gn/T/pip-install-zdwcc9d3/cython_306094b2917846c79bd3e0660c0a186f/Cython/Build/Dependencies.py", line 752, in create_extension_list
            elif isinstance(patterns, basestring) or not isinstance(patterns, collections.Iterable):
        AttributeError: module 'collections' has no attribute 'Iterable'
        [end of output]
  
    note: This error originates from a subprocess, and is likely not a problem with pip.
  error: metadata-generation-failed
  
  × Encountered error while generating package metadata.
  ╰─> See above for output.
  
  note: This is an issue with the package mentioned above, not pip.
  hint: See above for details.
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.
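The hunch about collections is right in substance: the ABC aliases such as collections.Iterable were deprecated in favour of collections.abc and removed from the top-level collections module in Python 3.10. The traceback shows a Homebrew Python 3.10 building the pinned Cython 0.29.14, whose setup code still uses the old spelling. The minimal sketch below only illustrates the compatible spelling through a hypothetical is_iterable helper; it does not by itself fix the failing build of that pinned Cython release.

```python
import collections.abc  # also binds the name "collections"
import sys


def is_iterable(obj):
    """Return True if obj is an Iterable according to collections.abc.

    collections.abc.Iterable exists since Python 3.3, whereas the
    top-level alias collections.Iterable is gone in Python 3.10+.
    """
    return isinstance(obj, collections.abc.Iterable)


# On Python 3.10 and later the old alias is missing, which is the lookup
# that raises AttributeError inside the Cython 0.29.14 setup script.
print(sys.version_info[:2], "collections.Iterable present:",
      hasattr(collections, "Iterable"))
print(is_iterable(["a", "b"]), is_iterable(42))  # True False
```

Code that must run on both old and new interpreters should import the ABCs from collections.abc rather than from collections.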

docker image available?

I'm still experiencing issues compiling dorado on our HPC. I'm wondering if there is a Docker or Singularity image available with the latest builds? We can run those images on our GPU nodes.

target: Intel + Nvidia A100

output metadata

Hello,

I would like some insight into the SAM output metadata (the optional tags) so I can perform quality checks on the basecalled data, and I am not sure I am interpreting all of them correctly.

Also, what would you recommend to filter out bad reads?
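One possible approach is to read the optional tags on each SAM record. The records posted in the FASTQ thread further down carry a qs:i tag, which appears to be the read's mean basecall quality score, alongside others such as du:f, ns:i and ch:i. The sketch below assumes that interpretation of qs and keeps only records at or above a threshold; the function names and the default threshold of 10 are hypothetical choices for illustration, not a dorado recommendation.

```python
def parse_tags(fields):
    """Turn SAM optional fields like 'qs:i:14' into {'qs': 14}."""
    casts = {"i": int, "f": float}  # other SAM types are kept as strings
    tags = {}
    for field in fields:
        tag, typ, value = field.split(":", 2)  # value may itself contain ':'
        tags[tag] = casts.get(typ, str)(value)
    return tags


def filter_by_qscore(lines, min_qs=10):
    """Yield header lines plus records whose qs tag is at least min_qs.

    Records without a qs tag are dropped. min_qs=10 is an arbitrary example.
    """
    for line in lines:
        if line.startswith("@"):
            yield line  # keep @HD/@PG header lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record (11 mandatory columns)
        tags = parse_tags(fields[11:])
        if tags.get("qs", -1) >= min_qs:
            yield line
```

For example, sys.stdout.writelines(filter_by_qscore(sys.stdin, 12)) streams a SAM file from standard input and keeps the header plus reads with qs of at least 12.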

Thank you,

ziphra

Download models code 200

When I run the Windows executable for dorado to download a specific model, or even all models, with or without the --directory option, the files are nowhere to be found in the folder.

Command:
c:\dorado-0.0.1+4b67720-win64\bin>dorado.exe download --directory C:\dorado-0.0.1+4b67720-win64\models --model [email protected]

Output:
downloading [email protected] [200]
Does this [200] mean the models were or were not downloaded?
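Assuming the bracketed number is the HTTP status code returned for the download request (the thread does not confirm this), Python's standard library can translate it. A 200 means the server answered the request successfully ("OK"); on its own it does not show whether, or where, the file was written to disk. The describe_status helper below is hypothetical.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short human-readable description of an HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    kind = "success" if 200 <= code < 300 else "not a success"
    return f"{status.value} {status.phrase} ({kind})"


print(describe_status(200))  # 200 OK (success)
print(describe_status(404))  # 404 Not Found (not a success)
```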

There was an error while I was executing the command "cmake --build cmake-build --config Release -- -j" on a Jetson Orin:

/dorado/dorado/data_loader/DataLoader.cpp:12:10: fatal error: pod5_format/c_api.h: No such file or directory
12 | #include "pod5_format/c_api.h"
| ^~~~~~~~~~~~~~~~~~~~~
compilation terminated.

stereo duplex models not getting downloaded?

I'm very excited about all the progress with dorado. I just tried basecalling and simplex works like a charm, but I can't get the new stereo duplex to work. When I run it I get the following error:

[2022-12-14 11:15:29.512] [error] toml::parse: file open error -> /home/f002sd4/ont-dependencies/dorado/models/[email protected]/[email protected]/config.toml

I do not get the [email protected] directory when I download the model, and the same appears to be true for all models if I try to download them all.

Thanks again for all the work, can't wait until dorado officially replaces guppy!

m1 neural engine

I can see Dorado using my M1 Max GPU, but does it also use the Apple Neural Engine? I'm curious, as I'm not sure whether there's any way to monitor usage of that feature while the program is running.
I just finished basecalling a sample with the SUP RNA model at a rate of "Samples/s: 1.396180e+05". Is this normal, or can I increase the speed somehow?

Building on mac osx

I'm going to document what I ended up installing on my M1 MacBook Pro Max to compile dorado:

0. Packages installed using brew

cmake
openssl
hdf5

1. Xcode installation and xcode-select switch (needed to use Metal)

You need to download and install Xcode. I would recommend downloading it directly from https://developer.apple.com/download/all/ instead of installing it from the App Store, as the latter is relatively slow (max 2.4 Mb/s download).

Afterwards, you need to switch the active developer directory to the full Xcode installation:
sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer

Error 2

Hello guys,

I hope you are doing well. I am at the build step from the clone and build section ('cmake --build cmake-build --config Release -j') and I come across the following error:

/home/hamid/dorado/dorado/utils/sequence_utils.cpp: In function 'float utils::mean_qscore_from_qstring(const string&)':
/home/hamid/dorado/dorado/utils/sequence_utils.cpp:17:10: error: 'transform' is not a member of 'std'
std::transform(qstring.begin(), qstring.end(), std::back_inserter(scores),
^~~~~~~~~
gmake[2]: *** [CMakeFiles/dorado_lib.dir/build.make:258: CMakeFiles/dorado_lib.dir/dorado/utils/sequence_utils.cpp.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
[ 87%] Linking CXX executable ../../../../../bin/vbz_test
[ 88%] Linking CXX executable ../../../../../bin/vbz_hdf_plugin_test
/usr/bin/ld: /home/hamid/miniconda3/envs/dorado/lib/libhdf5.a(H5CX.o): undefined reference to symbol 'pthread_setspecific@@GLIBC_2.2.5'
//usr/lib64/libpthread.so.0: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
gmake[2]: *** [dorado/3rdparty/hdf_plugins/vbz_plugin/test/CMakeFiles/vbz_hdf_plugin_test.dir/build.make:120: bin/vbz_hdf_plugin_test] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:1258: dorado/3rdparty/hdf_plugins/vbz_plugin/test/CMakeFiles/vbz_hdf_plugin_test.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....
[ 88%] Built target vbz_test
gmake[1]: *** [CMakeFiles/Makefile2:290: CMakeFiles/dorado_lib.dir/all] Error 2
gmake: *** [Makefile:166: all] Error 2

Thanks in advance.

macos binary - quits on startup

I downloaded the binary for a Mac (chipset: Apple M1 Max; OS: Monterey 12.5.1).

I get this error when calling dorado from the command line:

dyld[7721]: Library not loaded: '/opt/homebrew/opt/libaec/lib/libsz.2.dylib'

Any suggestions?

FASTQ output for Dorado has missing '+' symbol

I was trying to debug my script that parses FASTQ files and I noticed that Dorado's FASTQ output is missing a '+' symbol after each read.

Example output:

@HD	VN:1.6	SO:unknown
@PG	ID:basecaller	PN:dorado	VN:0.0.3+09107cc	CL:dorado basecaller [email protected] pod5/
0005c095-e552-4562-87d4-4f8afed89068	4	*	0	0	*	*	0	138	CTTGTACTTCGGTTTCAGTTACGTATTGCCTGACTAAACTGTGCCCACAAAGGGTGTATACACAATGCGGAGGGTTTCTTTTTATTGAACGTGATTGAACAGTCTCATAGCCATATTTCTCGCCCACATTCGCTTATT	&'*--88::;*&&((.,,7;-,++****+113=?/.499:9=...-2111-110027?ABBAEDA@B@><>=9546--00;:?GEGCA?>?A@?A@*)*74777=><=<33312,*)))**/79;A>A?BDCA@?,++	qs:i:14	du:f:0.596500	ns:i:2386	ts:i:139	mx:i:1	ch:i:77	st:Z:2022-11-12T01:52:47.429+00:00	rn:i:44532	f5:Z:output.pod5	sm:f:91.461	sd:f:15.861	sv:Z:quantile
001a4c77-7570-4e4c-990c-9be2525af712	4	*	0	0	*	*	0	158	GTACTTCGTTCAGTTACGTATTGCTGCTATACCCACATCCATGTGTTAGCGGCTATGAACTGTTCGCTAATCACAGCGTTCTCTAGTCGGCGTGATACATGGGGTCCCCTTTGTGGGCGCAGTTTAGTCTCCCGCTAAAGTTCTCCAGTCTGGAGAGC	%'()+,2..26..8855221-.4565:;;=988=?<=>BC5ACCGBA;:;;==>=>>===?>>A?=<;91.,---,+)(**<>>=<=<;:;;<?AC@??@44486337768>>;==>>(((*1246566=?@;8:86<8666=CC337=?/***'''&	qs:i:15	du:f:0.740500	ns:i:2962	ts:i:236	mx:i:4	ch:i:49	st:Z:2022-11-12T01:52:20.883+00:00	rn:i:36513	f5:Z:output.pod5	sm:f:82.420	sd:f:13.078	sv:Z:quantile
001dbafc-b693-4645-acf1-9cafd1e7a418	4	*	0	0	*	*	0	170	ACTTCGTTCAGTTACGTATTGCTCCTCATAAAAATTTCATCTTTTGAGGAGACTAAACTGCGTGCCCCAAGGGCTAAGTCTCGCCTATGGACAACTAGGTAATGCGGTGATTAGCGAACAGTCTCATAACCCCGTCTGTTTTGTAACTCATTCTTACTGTAGCCATGTAC	%(68;5585556677*)))3/001;8976<;2146<77312023.0111*233133--,*'&'),*(('%%).-.4:8999>443*)/,+,?@@?><==>BDFGE@>>?BDD@/...06.../9B:86*)))87677=B><18111=A@7678+(&&%%%%%&*((((++	qs:i:12	du:f:0.703500	ns:i:2814	ts:i:127	mx:i:4	ch:i:54	st:Z:2022-11-12T01:51:41.745+00:00	rn:i:41230	f5:Z:output.pod5	sm:f:81.867	sd:f:14.933	sv:Z:quantile
001e5da9-4a73-4fcb-9f02-3b1b6e7831eb	4	*	0	0	*	*	0	175	CATGTACTTCGTTCAGTTACGTATTGCTACAACTAATGAAGTAATCGAATGTGAGCTATGAGACTGTTCGCTAATCACCGCCGGCATTCTGGTTCGCAGCTATATCGGTGGTTCCCTTTGTGGGCACAGTTTAGTCTCCTAGGAAACTAAACACTCCGTAGCCAGCAATACGTAA	%&'(*7:=?B@@@A?>>>7733/-,*+,-=?A>?><@C?=7001-?((((((()44412:@====@ACC@>==?CB?=;::98:;=@@?<<<>ABBC>>>;==?B@A??@?A<@?=>?BB@@?::9:>@CEEA?>===/.2>=;>>?@@@BD?===>>?A:;<<=<<<?97**)&	qs:i:16	du:f:0.822500	ns:i:3290	ts:i:121	mx:i:1	ch:i:119	st:Z:2022-11-12T01:51:32.186+00:00	rn:i:39654	f5:Z:output.pod5	sm:f:78.174	sd:f:14.006	sv:Z:quantile
0022ceb2-454a-47f7-ad71-73f08da4dbbe	4	*	0	0	*	*	0	117	ACTTCGTTCAGTTACGTATTGCTTATCCTATGCCACAGGAGTTGGCTGCTGGGCTATGAGACTGTTCGCTAATCACTTTAGGGTCGAATATCCACCACCGTATAGCATTTCCCTTTG	$%36::;=B>=A@<<4444;(((),16=;;<,+++,''((**+,++,679:<889:966523=ADFD;>?=@@@9899=:3/)(*)())''*/,3653/8:<?,)))-176*&''%&	qs:i:11	du:f:0.522250	ns:i:2089	ts:i:242	mx:i:4	ch:i:87	st:Z:2022-11-12T01:50:59.196+00:00	rn:i:42808	f5:Z:output.pod5	sm:f:84.592	sd:f:13.171	sv:Z:quantile
0022e929-9cf1-415d-b9d3-47f55e808188	4	*	0	0	*	*	0	309	TGTACTTCGTTCAGTTACGTATTGCTATAATAACCTATGCTGAAACTTAATGGCTATGAGACTGTTCGCTAATCACAAAAGGGTGAAACTCTGCGATGCAGTAGTTAAGGAGACCCTTTGTGGGCACAGTTTAGTCTCTGACATCTTGTTAGAATGAGTATGATTCTTGCCAACATTGCACATACCTCCGAGACTAAACTGTGCCCACAAAGGGGAAAAGCTACTGACGGGCTTGTTAAACAGTGAGACAGTGATTAGCGAACAGTCTCATAGCCGGTAGATGGAGCATTCTTGAGATAGCAATACGGC	(+,,112:889=<;=;98...-77889ADDEECB@?==?CA=>=CAC?FGHFA???ADD@A@@BBEFDCBCCBA?ABB,+,/53556:9:;>7666522445<>@@CB,.2?C>??>>>?;;=>>=>????@:8767<;6574469<=<<@98::*))+-18=CBEAE=<<9:899;776422679<=3168854555>?AA<98787>A@03:4297667589:=?C@>9;??CDEECFGA>>;<?AB211:A@?510--.22>?=????>;<9777=??@@><<<?ADF@@B?437865555//&%%	qs:i:18	du:f:1.731250	ns:i:6925	ts:i:256	mx:i:2	ch:i:101	st:Z:2022-11-12T01:52:43.676+00:00
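The example output above starts with @HD and @PG header lines followed by tab-separated records with optional fields such as qs:i and ch:i, which matches the SAM layout rather than FASTQ; that would explain why there are no '+' separator lines. Under that reading, a minimal sketch that turns such unaligned SAM records into four-line FASTQ entries could look like this (sam_to_fastq is a hypothetical name; samtools fastq does the same job more robustly).

```python
def sam_to_fastq(lines):
    """Yield one FASTQ entry per SAM record in the given lines.

    Column 1 is QNAME, column 10 is SEQ and column 11 is QUAL. They are
    written as-is, which is fine for unaligned records (FLAG 4); reverse-
    strand alignments would need reverse-complementing, not done here.
    """
    for line in lines:
        if line.startswith("@"):
            continue  # @HD/@PG header lines are not reads
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # skip blank or truncated lines
        name, seq, qual = fields[0], fields[9], fields[10]
        if seq == "*" or qual == "*":
            continue  # no sequence or quality stored for this record
        yield f"@{name}\n{seq}\n+\n{qual}\n"
```

For example, sys.stdout.writelines(sam_to_fastq(open("calls.sam"))) prints FASTQ for a SAM file named calls.sam (a hypothetical file name).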

what's the GPU memory utilization rate? Does it depend on read length?

Failed to find the kernel: lstm_simd_768_fwd_16

Using dorado version 0.0.1a0.
I've successfully built dorado on my Mac, and downloading basecaller models works. However, I get the following error with the super accurate model:

(base) mbp-van-thomas:bin thomasg$ ./dorado basecaller [email protected] fast5_pass
> Creating basecall pipeline
Failed to find the kernel: lstm_simd_768_fwd_16

10.4_e8.1_sup models?

Hi Dorado team! Congratulations on the announcements at NCM today. I'm eagerly following the developments, with particular interest in running Stereo Duplex on our MrHAMER data. We have successfully run Guppy duplex on pairs of reads clustered by UMI sequence and have been consistently getting median Q30 values. As you can imagine, I'm quite eager to transition to Dorado and take advantage of the reported performance and accuracy improvements of Stereo Duplex (and other features). However, I'm unable to find a 10.4_e8.1_sup model anywhere in the current version, and as an early Q20EA/R10.4 adopter I have a good amount of data using this chemistry and pore combination. I'd be very thankful if you could point me in the right direction, or if you have any preliminary models that you could share for testing purposes. Thanks!

CUDAOutOfMemoryError for duplex with 3080 Ti (12 GB)

Hi,

Duplex fails with an out-of-memory (OOM) error, as shown below. Is it possible to reduce the memory footprint, as with chunks_per_runner in Guppy?

terminate called after throwing an instance of 'c10::CUDAOutOfMemoryError'
  what():  CUDA out of memory. Tried to allocate 1.95 GiB (GPU 0; 11.77 GiB total capacity; 4.51 GiB already allocated; 1.47 GiB free; 8.38 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Exception raised from malloc at ../c10/cuda/CUDACachingAllocator.cpp:578 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x3e (0x7f12ef63720e in /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/../lib/libc10.so)
frame #1: <unknown function> + 0x1667f (0x7f12ef6a167f in /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/../lib/libc10_cuda.so)
frame #2: <unknown function> + 0x46528 (0x7f12ef6d1528 in /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/../lib/libc10_cuda.so)
frame #3: <unknown function> + 0x46752 (0x7f12ef6d1752 in /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/../lib/libc10_cuda.so)
frame #4: at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>) + 0x7bf (0x7f12f0ee46af in /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/../lib/libtorch_cpu.so)
frame #5: at::detail::empty_cuda(c10::ArrayRef<long>, c10::ScalarType, c10::optional<c10::Device>, c10::optional<c10::MemoryFormat>) + 0x115 (0x7f13098c3ca5 in /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/../lib/libtorch_cuda_cpp.so)
frame #6: at::detail::empty_cuda(c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0x31 (0x7f13098c3f01 in /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/../lib/libtorch_cuda_cpp.so)
frame #7: at::detail::empty_cuda(c10::ArrayRef<long>, c10::TensorOptions const&) + 0x10f (0x7f13098c406f in /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/../lib/libtorch_cuda_cpp.so)
frame #8: <unknown function> + 0x2d720db (0x7f12c22340db in /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/../lib/libtorch_cuda_cu.so)
frame #9: <unknown function> + 0x2e289c6 (0x7f12c22ea9c6 in /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/../lib/libtorch_cuda_cu.so)
frame #10: at::meta::structured_mm::meta(at::Tensor const&, at::Tensor const&) + 0x179 (0x7f12f11f6019 in /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/../lib/libtorch_cpu.so)
frame #11: <unknown function> + 0x2d7ad2a (0x7f12c223cd2a in /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/../lib/libtorch_cuda_cu.so)
frame #12: <unknown function> + 0x2d7adc0 (0x7f12c223cdc0 in /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/../lib/libtorch_cuda_cu.so)
frame #13: at::_ops::mm::call(at::Tensor const&, at::Tensor const&) + 0xc4 (0x7f12f1af5084 in /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/../lib/libtorch_cpu.so)
frame #14: <unknown function> + 0x1a7dc05 (0x7f12f11f8c05 in /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/../lib/libtorch_cpu.so)
frame #15: at::native::matmul(at::Tensor const&, at::Tensor const&) + 0x40 (0x7f12f11f90e0 in /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/../lib/libtorch_cpu.so)
frame #16: <unknown function> + 0x273c1b0 (0x7f12f1eb71b0 in /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/../lib/libtorch_cpu.so)
frame #17: at::_ops::matmul::call(at::Tensor const&, at::Tensor const&) + 0xc4 (0x7f12f1bc4a84 in /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/../lib/libtorch_cpu.so)
frame #18: /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/dorado() [0x4fb255]
frame #19: /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/dorado() [0x5205f1]
frame #20: /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/dorado() [0x520f1e]
frame #21: /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/dorado() [0x58932c]
frame #22: /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/dorado() [0x588233]
frame #23: /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/dorado() [0x5872e1]
frame #24: /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/dorado() [0x583b26]
frame #25: /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/dorado() [0x571bae]
frame #26: /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/dorado() [0x56bdd3]
frame #27: /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/dorado() [0x56359c]
frame #28: /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/dorado() [0x589d4e]
frame #29: /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/dorado() [0x5886b7]
frame #30: /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/dorado() [0x5879d9]
frame #31: /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/dorado() [0x585fba]
frame #32: /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/dorado() [0x5719ea]
frame #33: /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/dorado() [0x65a50f]
frame #34: /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/dorado() [0x659dc9]
frame #35: /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/dorado() [0x65ce78]
frame #36: /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/dorado() [0x65cdba]
frame #37: /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/dorado() [0x65cd29]
frame #38: /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/dorado() [0x65ccb6]
frame #39: /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/dorado() [0x65cc5a]
frame #40: <unknown function> + 0x145a0 (0x7f13187b55a0 in /home/ola/data/dorado-0.1.0+4b0e9a6-Linux/bin/../lib/libtorch_cuda.so)
frame #41: <unknown function> + 0x8609 (0x7f12bed34609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #42: clone + 0x43 (0x7f12be901133 in /lib/x86_64-linux-gnu/libc.so.6)

Aborted (core dumped)
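The OOM message already contains the numbers needed to interpret it. The Python sketch below only parses the message text (it does not query the GPU): it extracts the requested, allocated, free and reserved sizes and derives how far the request exceeded free memory, and how much memory PyTorch's caching allocator had reserved but not allocated. A large second number is the situation the message's max_split_size_mb hint describes. The helper name summarize_oom is illustrative, and whether dorado's bundled libtorch honours PYTORCH_CUDA_ALLOC_CONF is not confirmed here.

```python
import re

# First line of the "what():" message in the traceback above.
MSG = ("CUDA out of memory. Tried to allocate 1.95 GiB (GPU 0; 11.77 GiB total "
       "capacity; 4.51 GiB already allocated; 1.47 GiB free; 8.38 GiB reserved "
       "in total by PyTorch)")

# Fields as worded in this message format; other libtorch versions may differ.
PATTERN = re.compile(
    r"Tried to allocate (?P<request>[\d.]+) GiB.*?"
    r"(?P<allocated>[\d.]+) GiB already allocated; "
    r"(?P<free>[\d.]+) GiB free; "
    r"(?P<reserved>[\d.]+) GiB reserved"
)


def summarize_oom(msg: str) -> dict:
    """Derive the allocation shortfall and cached-but-unused memory (GiB)."""
    match = PATTERN.search(msg)
    if match is None:
        raise ValueError("not a recognised CUDA out-of-memory message")
    f = {name: float(value) for name, value in match.groupdict().items()}
    return {
        # How much more free memory the failed request needed.
        "shortfall_gib": round(f["request"] - f["free"], 2),
        # Memory the caching allocator holds but is not currently using.
        "cached_unused_gib": round(f["reserved"] - f["allocated"], 2),
    }


print(summarize_oom(MSG))  # {'shortfall_gib': 0.48, 'cached_unused_gib': 3.87}
```

For this report, the 1.95 GiB request was about 0.48 GiB short of the 1.47 GiB free, while about 3.87 GiB was reserved but unallocated. Applied to the A100 message in a later report (12.50 GiB requested, 9.90 GiB free, 27.41 GiB allocated, 27.46 GiB reserved), it gives a 2.6 GiB shortfall with only about 0.05 GiB cached, so cached-but-unused memory could not have covered that request.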

PromethION-MinION models

I would like to know whether we can use the currently available dorado models for both PromethION and MinION runs, since I don't see any distinguishing tag (e.g. "_prom" in Guppy). Or is each model specific to one platform?
Thank you very much!

Dorado 'Failed to open pod5'

Trying to run dorado with pod5 gives me this error: Failed to open file ./pod5/output.pod5: IOError: Unknown embedded file type

I converted my fast5 to pod5 with:
pod5-convert-from-fast5 fast5_pass/ pod5
When running dorado (new binary) on the pod5 I get the error (full output below). Running it on the fast5 files works fine, and inspecting the pod5 with pod5-inspect throws no errors.

Thanks for the help!
-Francesco

/home/f002sd4/ont-dependencies/dorado/bin/dorado basecaller  /home/f002sd4/ont-dependencies/dorado/[email protected] ./pod5/ -x "cuda:0" --remora-models /home/f002sd4/ont-dependencies/dorado/[email protected]_5mCG@v2
[2022-11-11 10:48:35.417] [info] > Creating basecall pipeline
@HD     VN:1.6  SO:unknown
@PG     ID:basecaller   PN:dorado       VN:0.0.2+acbca36        CL:dorado basecaller /home/f002sd4/ont-dependencies/dorado/[email protected] ./pod5/ -x cuda:0 --remora-models /home/f002sd4/ont-dependencies/dorado/[email protected]_5mCG@v2
[2022-11-11 10:48:39.807] [error] Failed to open file ./pod5/output.pod5: IOError: Unknown embedded file type
[2022-11-11 10:48:39.807] [error] Failed to query batch count: Invalid: null file passed to C API
[2022-11-11 10:48:40.206] [info] > Reads basecalled: 0
[2022-11-11 10:48:40.206] [info] > Samples/s: 0.000000e+00
[2022-11-11 10:48:40.206] [info] > Finished

CUDAOutOfMemoryError on A100

I managed to compile on Red Hat/CentOS 8, but I'm getting errors with the 'sup' models:
[email protected] and [email protected]

The data are from: FLO-MIN106 SQK-DCS109 dna_r9.4.1_450bps_hac
It is an amplicon run, so there is no prior information other than that it should contain 16S sequences.

The following models work fine: [email protected], [email protected], [email protected] and [email protected]
However, which model would you suggest using, and what would be the best method to compare the results against the Bonito and Guppy output?

Error below:

Creating basecall pipeline
@HD VN:1.5 SO:unknown
@PG ID:basecaller PN:dorado VN:0.0.1a0 CL:dorado basecaller [email protected] /projects/0/lwc2020006/nanopore/0_5cmSedAarhusBay/test
terminate called after throwing an instance of 'c10::CUDAOutOfMemoryError'
what(): CUDA out of memory. Tried to allocate 12.50 GiB (GPU 0; 39.59 GiB total capacity; 27.41 GiB already allocated; 9.90 GiB free; 27.46 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Exception raised from malloc at ../c10/cuda/CUDACachingAllocator.cpp:536 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x148b5fee7d62 in /projects/0/lwc2020006/software/nanoporetech/dorado/dorado/3rdparty/torch-1.10.2-Linux/libtorch/lib/libc10.so)
frame #1: <unknown function> + 0x257de (0x148b179577de in /projects/0/lwc2020006/software/nanoporetech/dorado/dorado/3rdparty/torch-1.10.2-Linux/libtorch/lib/libc10_cuda.so)
frame #2: <unknown function> + 0x264b2 (0x148b179584b2 in /projects/0/lwc2020006/software/nanoporetech/dorado/dorado/3rdparty/torch-1.10.2-Linux/libtorch/lib/libc10_cuda.so)
frame #3: <unknown function> + 0x268e2 (0x148b179588e2 in /projects/0/lwc2020006/software/nanoporetech/dorado/dorado/3rdparty/torch-1.10.2-Linux/libtorch/lib/libc10_cuda.so)
frame #4: at::native::empty_cuda(c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>, c10::optional<c10::MemoryFormat>) + 0x124 (0x148b797d25a4 in /projects/0/lwc2020006/software/nanoporetech/dorado/dorado/3rdparty/torch-1.10.2-Linux/libtorch/lib/libtorch_cuda_cpp.so)
frame #5: <unknown function> + 0x25aaed9 (0x148b1f00eed9 in /projects/0/lwc2020006/software/nanoporetech/dorado/dorado/3rdparty/torch-1.10.2-Linux/libtorch/lib/libtorch_cuda_cu.so)
frame #6: <unknown function> + 0x25ee6fd (0x148b1f0526fd in /projects/0/lwc2020006/software/nanoporetech/dorado/dorado/3rdparty/torch-1.10.2-Linux/libtorch/lib/libtorch_cuda_cu.so)
frame #7: at::TensorIteratorBase::allocate_or_resize_outputs() + 0x25b (0x148b613277db in /projects/0/lwc2020006/software/nanoporetech/dorado/dorado/3rdparty/torch-1.10.2-Linux/libtorch/lib/libtorch_cpu.so)
frame #8: at::TensorIteratorBase::build(at::TensorIteratorConfig&) + 0x1d3 (0x148b61328b23 in /projects/0/lwc2020006/software/nanoporetech/dorado/dorado/3rdparty/torch-1.10.2-Linux/libtorch/lib/libtorch_cpu.so)
frame #9: at::TensorIteratorBase::build_borrowing_binary_op(at::Tensor const&, at::Tensor const&, at::Tensor const&) + 0xd5 (0x148b6132a1f5 in /projects/0/lwc2020006/software/nanoporetech/dorado/dorado/3rdparty/torch-1.10.2-Linux/libtorch/lib/libtorch_cpu.so)
frame #10: <unknown function> + 0x25e2cfd (0x148b1f046cfd in /projects/0/lwc2020006/software/nanoporetech/dorado/dorado/3rdparty/torch-1.10.2-Linux/libtorch/lib/libtorch_cuda_cu.so)
frame #11: <unknown function> + 0x25e2dcf (0x148b1f046dcf in /projects/0/lwc2020006/software/nanoporetech/dorado/dorado/3rdparty/torch-1.10.2-Linux/libtorch/lib/libtorch_cuda_cu.so)
frame #12: at::_ops::mul_Tensor::call(at::Tensor const&, at::Tensor const&) + 0x136 (0x148b61a43556 in /projects/0/lwc2020006/software/nanoporetech/dorado/dorado/3rdparty/torch-1.10.2-Linux/libtorch/lib/libtorch_cpu.so)
frame #13: at::native::mul(at::Tensor const&, c10::Scalar const&) + 0xaf (0x148b614d581f in /projects/0/lwc2020006/software/nanoporetech/dorado/dorado/3rdparty/torch-1.10.2-Linux/libtorch/lib/libtorch_cpu.so)
frame #14: <unknown function> + 0x1e300bf (0x148b61f5e0bf in /projects/0/lwc2020006/software/nanoporetech/dorado/dorado/3rdparty/torch-1.10.2-Linux/libtorch/lib/libtorch_cpu.so)
frame #15: at::_ops::mul_Scalar::call(at::Tensor const&, c10::Scalar const&) + 0x12d (0x148b61dce4cd in /projects/0/lwc2020006/software/nanoporetech/dorado/dorado/3rdparty/torch-1.10.2-Linux/libtorch/lib/libtorch_cpu.so)
frame #16: dorado() [0x5360f5]
frame #17: dorado() [0x536467]
frame #18: dorado() [0x539be5]
frame #19: dorado() [0x559e28]
frame #20: dorado() [0x5593bf]
frame #21: dorado() [0x55860b]
frame #22: dorado() [0x555556]
frame #23: dorado() [0x544502]
frame #24: dorado() [0x5405c0]
frame #25: dorado() [0x53c3ee]
frame #26: dorado() [0x55a90e]
frame #27: dorado() [0x55989b]
frame #28: dorado() [0x558e21]
frame #29: dorado() [0x5576d2]
frame #30: dorado() [0x4cd878]
frame #31: dorado() [0x4cd4a7]
frame #32: dorado() [0x4ccee5]
frame #33: dorado() [0x56009a]
frame #34: dorado() [0x560be8]
frame #35: dorado() [0x56b723]
frame #36: dorado() [0x56b5a1]
frame #37: dorado() [0x56b489]
frame #38: dorado() [0x56b396]
frame #39: dorado() [0x56b320]
frame #40: <unknown function> + 0xc71f (0x148be296371f in /projects/0/lwc2020006/software/nanoporetech/dorado/dorado/3rdparty/torch-1.10.2-Linux/libtorch/lib/libtorch_cuda.so)
frame #41: <unknown function> + 0x814a (0x148be2d7514a in /lib64/libpthread.so.0)
frame #42: clone + 0x43 (0x148b1bf99dc3 in /lib64/libc.so.6)

system:
ThinkSystem SD650-N v2
Intel Xeon Platinum 8360Y (2x), 36 cores/socket, 2.4 GHz (Speed Select SKU), 250W
NVIDIA A100 (4x), 40 GiB HBM2 memory with 5 active memory stacks per GPU
16x 32 GiB, 3200 MHz, DDR4
512 GiB RAM (7.111 GiB per core), 160 GiB HBM2
2x HDR100 ConnectX-6 single port, 2x 25GbE SFP28 LOM, 1x 1GbE RJ45 LOM

Correct parameter format to specify multi-GPUs

I am trying to get Dorado running with multiple GPUs. Is --device cuda:0,1 the correct way to specify two GPUs? Dorado aborts with the following:

> Creating basecall pipeline
terminate called after throwing an instance of 'c10::Error'
  what():  Invalid device string: 'cuda:0,1'
Exception raised from Device at ../c10/core/Device.cpp:115 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f30c2c04d62 in /home/hasindu/hasindu2008.git/dorado/dorado/3rdparty/torch-1.10.2-Linux/libtorch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x7f30c2c0168b in /home/hasindu/hasindu2008.git/dorado/dorado/3rdparty/torch-1.10.2-Linux/libtorch/lib/libc10.so)
frame #2: c10::Device::Device(std::string const&) + 0x315 (0x7f30c2bea195 in /home/hasindu/hasindu2008.git/dorado/dorado/3rdparty/torch-1.10.2-Linux/libtorch/lib/libc10.so)
frame #3: <unknown function> + 0x108fa2 (0x55e15b71cfa2 in ./dorado)
frame #4: <unknown function> + 0x108fe1 (0x55e15b71cfe1 in ./dorado)
frame #5: <unknown function> + 0x1077f4 (0x55e15b71b7f4 in ./dorado)
frame #6: <unknown function> + 0x105301 (0x55e15b719301 in ./dorado)
frame #7: <unknown function> + 0x1036e7 (0x55e15b7176e7 in ./dorado)
frame #8: <unknown function> + 0x10087a (0x55e15b71487a in ./dorado)
frame #9: <unknown function> + 0xfe0e2 (0x55e15b7120e2 in ./dorado)
frame #10: <unknown function> + 0xfafef (0x55e15b70efef in ./dorado)
frame #11: <unknown function> + 0xf5640 (0x55e15b709640 in ./dorado)
frame #12: <unknown function> + 0xede7a (0x55e15b701e7a in ./dorado)
frame #13: <unknown function> + 0xe2c2b (0x55e15b6f6c2b in ./dorado)
frame #14: <unknown function> + 0xd2683 (0x55e15b6e6683 in ./dorado)
frame #15: <unknown function> + 0xbcf4f (0x55e15b6d0f4f in ./dorado)
frame #16: <unknown function> + 0x8b0e3 (0x55e15b69f0e3 in ./dorado)
frame #17: <unknown function> + 0x8ba62 (0x55e15b69fa62 in ./dorado)
frame #18: <unknown function> + 0x87aed (0x55e15b69baed in ./dorado)
frame #19: <unknown function> + 0x877c4 (0x55e15b69b7c4 in ./dorado)
frame #20: <unknown function> + 0x863b1 (0x55e15b69a3b1 in ./dorado)
frame #21: __libc_start_main + 0xe7 (0x7f307e0eac87 in /lib/x86_64-linux-gnu/libc.so.6)
frame #22: <unknown function> + 0x85daa (0x55e15b699daa in ./dorado)
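Frame #2 shows the string reaching libtorch's c10::Device constructor, which parses a single device such as cuda:0, so cuda:0,1 is rejected as a whole. One way a front end could accept several GPUs is to split the list into single-device strings first. The Python helper below is a hypothetical sketch of that step; split_device_spec is an illustrative name, not dorado's code, and whether a given dorado release accepts a comma-separated list at all should be checked against its own help output.

```python
import re

# One CUDA device in the "cuda:<index>" form, e.g. "cuda:0".
SINGLE_CUDA_DEVICE = re.compile(r"cuda:\d+")


def split_device_spec(spec: str) -> list:
    """Hypothetical helper: expand "cuda:0,1" into ["cuda:0", "cuda:1"]."""
    kind, sep, indices = spec.partition(":")
    if kind != "cuda" or not sep or not indices:
        raise ValueError(f"unsupported device spec: {spec!r}")
    devices = [f"cuda:{index.strip()}" for index in indices.split(",")]
    for device in devices:
        # Each piece must be a valid single-device string on its own.
        if not SINGLE_CUDA_DEVICE.fullmatch(device):
            raise ValueError(f"invalid device {device!r} in {spec!r}")
    return devices


print(split_device_spec("cuda:0,1"))  # ['cuda:0', 'cuda:1']
```

Each resulting string is in the single-device form that c10::Device accepts, so a caller could construct one device object per entry.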

Reads basecalled: Metal command buffer execution failed

On macOS (M1, 16 GB memory) I get:

dorado basecaller --emit-fastq ~/bin/dorado-0.0.1+4b67720-Darwin/bin/[email protected] ./fast5 | gzip > run1.fastq.gz

Creating basecall pipeline
Reads basecalled: 200Metal command buffer execution failed: 5, try #0
Reads basecalled: 300Metal command buffer execution failed: 5, try #0
Metal command buffer execution failed: 5, try #1
Metal command buffer execution failed: 5, try #2

and so on.
What can I do to solve this?

Assertion failed: (ptr != nullptr), function mtl_for_tensor, file metal_utils.cpp, line 138.

When using either [email protected] or [email protected] I get:

(base) mbp-van-thomas:bin thomasg$ ./dorado basecaller [email protected] fast5_pass
> Creating basecall pipeline
@HD	VN:1.5	SO:unknown
@PG	ID:basecaller	PN:dorado	VN:0.0.1a0	CL:dorado basecaller [email protected] fast5_pass
Assertion failed: (ptr != nullptr), function mtl_for_tensor, file metal_utils.cpp, line 138.
Abort trap: 6

Dorado for selective sequencing

We are interested in using Dorado to basecall during selective sequencing. Is there a server/client feature like the one available for Guppy?

comment for README.md

Under Developer Quickstart you may want to change the URL to https://github.com/nanoporetech/dorado.git

cheers,

Matt

Compile issues

I'm trying to compile on the P24 (I have also tried our A100 HPC) and am getting the following:

-- Found HDF5: /usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so;/usr/lib/x86_64-linux-gnu/libpthread.so;/usr/lib/x86_64-linux-gnu/libsz.so;/usr/lib/x86_64-linux-gnu/libz.so;/usr/lib/x86_64-linux-gnu/libdl.so;/usr/lib/x86_64-linux-gnu/libm.so (found suitable version "1.10.4", minimum required is "1.8.16")
CMake Error at /usr/local/share/cmake-3.23/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find zstd (missing: ZSTD_LIBRARY ZSTD_INCLUDE_DIR) (Required is
  at least version "1.3.1")
Call Stack (most recent call first):
  /usr/local/share/cmake-3.23/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  dorado/3rdparty/hdf_plugins/cmake/Findzstd.cmake:40 (find_package_handle_standard_args)
  dorado/3rdparty/hdf_plugins/CMakeLists.txt:146 (find_package)

Any ideas? Initially it complained that it couldn't find HDF5 at all, so I ran sudo apt install libhdf5-dev.

Thanks

Andrew

Live basecalling benchmark on Apple silicon chips

We would like to do some live basecalling outside of the lab, and we were thinking of getting an Apple MacBook with Apple silicon, since dorado is compatible with those. We want to do basecalling in real time as we are sequencing with a MinION device. We are considering any of the M1 Pro, M1 Max and M2 chips.

We would like to know if you have benchmarked any of these chips in a live basecalling situation (as it would be done on a GridION machine). Basically, we want to know if these chips are powerful enough that they will not be a bottleneck.

Also, do you know how many instances of live basecalling could be run in parallel? If we had two MinION devices running in parallel, could a single machine keep up?

If any of this is possible, could you let us know which CPU and memory configuration would be required?

Embarrassing problem!

Hi, I am trying to get Dorado working on our HPC A30s/A100s:

~/beggsa-clinicalnanopore/software/dorado/bin/dorado -v basecaller [email protected] pod5/

All I get is the binary starting and displaying:

0.0.3+09107cc

And then stopping!

Any clues as to what I am doing wrong? I've pointed it to the model folder (path shortened here for brevity), and there is a pod5 file in the pod5 directory.

I'm running on Red Hat 8.1 with CUDA 11.3, using the pre-compiled binaries.

[error] [error] key "qscore" not found in the top-level table while trying to perform basecalling with 5mCG models

Hi, I am writing to check whether I am doing something wrong. I have looked for more documentation, but it seems this GitHub repository is all there is about dorado.

I installed dorado on a Windows 10 PC, and it works fine with the normal 10.4 models (fast, hac and sup). But when I run the same process on the same data with the modified-base models, I get this error.


PS C:\Users\phage\OneDrive\Escritorio\dorado-0.0.2+acbca36-win64\bin> ./dorado basecaller .\dorado_trained_models\[email protected]_5mCG@v2\ C:\Users\phage\OneDrive\Escritorio\Salida\NGS_012\NGS_012_30_08_2022\Pool_NGS012\20220830_1143_MN40787_FAT72720_85d0306f\fast5_pass\barcode15\
[2022-11-29 14:38:57.517] [info] > Creating basecall pipeline
[2022-11-29 14:38:57.628] [error] [error] key "qscore" not found in the top-level table
 --> .\dorado_trained_models\[email protected]_5mCG@v2\config.toml
   |
 1 | [general]
   | ^--- the top-level table starts here

Thank you in advance for your help, and I apologize if this error is because I am not running the program properly.

Are PromethION models implemented?

I just compiled the tool successfully on Ubuntu 20.04 (I only needed to upgrade cmake to 3.20.0) and was thinking of giving it a try.

We have PromethION data, which in Guppy is represented by these parameters:

FLO-PRO002 SQK-LSK110 dna_r9.4.1_450bps_hac_prom 2021-05-05_dna_r9.4.1_promethion_384_dd219f32

Do any of the currently available models in Dorado apply to this data? Which one?

If not, is there a timeline for PromethION data?

Should the community expect any bumps in basecalling quality in the near-term with Dorado vs Guppy?

How to acquire Remora models in the toml format that Dorado expects as input?

What's the timeline for getting support for modified basecalling models in Dorado?

(Or is this possible already?)

Can't open model file?

Anything I can do? All file permissions seem fine.

$ ./dorado basecaller --emit-fastq [email protected] /home/exouser/reads
[2022-12-07 03:30:05.329] [info] > Creating basecall pipeline
[2022-12-07 03:30:08.138] [error] toml::parse: file open error -> [email protected]/config.toml
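The "file open error" means the config.toml path built from the model argument could not be opened. The model is given as a relative name, so, assuming dorado opens the path as given, it is resolved against the current working directory: the model directory must sit in (or be given as a path from) the directory dorado is launched from. The helper below is a hypothetical Python pre-flight check using pathlib, not part of dorado; it reports which part of that path is missing.

```python
from pathlib import Path
from typing import Optional


def config_path_problem(model_dir: str) -> Optional[str]:
    """Explain why <model_dir>/config.toml cannot be opened, or return None."""
    model = Path(model_dir)
    if not model.exists():
        # Relative paths are resolved against the current working directory.
        return f"{model} does not exist (resolved from {Path.cwd()})"
    if not model.is_dir():
        return f"{model} is not a directory"
    config = model / "config.toml"
    if not config.is_file():
        return f"{config} is missing"
    return None


print(config_path_problem("model_dir_that_does_not_exist"))
```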

CentOS build instructions needed

As highlighted in #3, build instructions for CentOS are needed in the README.

What about AMD cards?

Are there any plans to support AMD/OpenCL with Dorado?

[Question] Best practice after successful dorado run

Hi,
I managed to successfully run dorado on some POD5 files and piped the output into BAM format. While I understand that dorado is a work in progress, and I appreciate this design choice for BAM's compact, standard file format and comparatively small memory footprint, I was wondering: is there an application you have in mind for downstream processing that avoids going through intermediate FASTQ files? Specifically, now that I have the basecalled reads, I'd like to align them to a reference genome, but (to the best of my knowledge) neither NGMLR nor minimap2 accepts unaligned BAM records as input for mapping.

Best regards

M1 Pro: "Metal command buffer execution failed: 5, try #2..."

I'm testing out Dorado on my 14" M1 Pro MacBook Pro using ~14 GB of fast5 files from a Lambda phage dataset I found here, and after ~15 min of runtime I am getting a bunch of "Metal command buffer execution failed" errors written to stdout:

Metal command buffer execution failed: 5, try #0
Metal command buffer execution failed: 5, try #1
Metal command buffer execution failed: 5, try #2
Metal command buffer execution failed: 5, try #3
Metal command buffer execution failed: 5, try #4
Metal command buffer execution failed: 5, try #0
Metal command buffer execution failed: 5, try #1
...

They don't stop, so I just cancel basecalling; at this point my GPU usage drops but CPU usage spikes (I'm guessing dorado switches to the CPU for basecalling at that point).

I ran this command to get dorado going: dorado basecaller -r 2 -b 768 -c 4000 [email protected] ~/Downloads/lambda_fast5 > ~/Downloads/reads.fastq

Thought I'd report this here and see if others had the same issue, or perhaps if I need to change one of my runtime parameters.

RNA model?

I listed the available models but see no RNA models. Are they not available yet? If not, when are they coming?

Dorado basecaller significantly slower than guppy_basecaller on A30 and A100 GPUs

Hi,

I've been investigating using Dorado instead of Guppy on behalf of researchers at my institute.

I've been comparing Dorado's performance with Guppy with pea DNA using the following commands:

# dorado
dorado basecaller ${DORADO_ROOT}/models/[email protected] test-fast5 > /dev/null

# guppy
guppy_basecaller -x 'cuda:0' -c dna_r10.4.1_e8.2_400bps_sup.cfg --num_callers 4 --gpu_runners_per_device 4 --chunks_per_runner 512 -i test-fast5 -s guppy-A30-fast5-results

dorado processes the data at 2.8365e6 samples/s on 1x A30 GPU
guppy processes the data at 3.2495e6 samples/s on 1x A30 GPU

i.e. dorado is ~13% slower (equivalently, Guppy is ~15% faster). Converting the data to pod5 format didn't produce a meaningful speedup either, and we see a similar performance difference on A100s. We expected dorado to be faster. Any advice on why this difference occurs and on how to speed things up would be much appreciated.
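The percentage depends on which tool is taken as the baseline. The short Python sketch below recomputes it from the two samples/s figures reported above; relative_speed is an illustrative helper name. With these numbers, dorado's throughput is about 12.7% lower than Guppy's, or equivalently Guppy's is about 14.6% higher.

```python
def relative_speed(dorado_sps: float, guppy_sps: float) -> tuple:
    """Return (% dorado is slower than guppy, % guppy is faster than dorado)."""
    slower = (1 - dorado_sps / guppy_sps) * 100  # baseline: guppy
    faster = (guppy_sps / dorado_sps - 1) * 100  # baseline: dorado
    return round(slower, 1), round(faster, 1)


# Samples/s reported above for a single A30.
print(relative_speed(2.8365e6, 3.2495e6))  # (12.7, 14.6)
```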

Information about our system:
HPC cluster with Slurm scheduler
OS: CentOS 7 on all nodes
GPU nodes have CUDA 11.5 and driver 495.29.05 installed.
Each node has 2x Intel(R) Xeon(R) Gold 5318S and 500GB RAM.
IO destinations are on VAST shared filesystem (no meaningful difference for local disk).

Information about build:
git commit 3202db8ff153dbad7d62d7efbddcbdeef70c3e9d
Cmake build command: cmake -S . -B cmake-build -DDORADO_USING_OLD_CPP_ABI=True
dependency versions:

hdf5 1.12.2 built from source with gcc 9.1.0
cmake 3.24.3
gcc 9.1.0
openssl 1.1.1s built from source (spack) with gcc 9.1.0

Unstable CPU frequency on M1 Pro with Metal

I have successfully built and run dorado on an M1 Pro thanks to these threads: #12 (comment) #7 (comment)

There are a few memory limitations with Metal, it seems, but hey, this is still great performance for a pre-alpha.

I noticed the CPU frequency is quite unstable; although this is not detrimental to the outcome, it is unexpected behaviour.

command:
dorado basecaller -x "metal" -r 2 -b 768 -c 4000 [email protected] /Volumes/Isaac_SSD/fast5 > /Volumes/Isaac_SSD/mycoplasma2.fastq

I'm wondering whether this is a side effect of the Metal implementation or something else.

Jetson Orin Support

I was just hoping to get an update on Jetson Orin support. It was identified as being a focus just after the initial release of Dorado, but there has been no update since then. When are we likely to get support for the Jetson Orin?

Compilation Error: conflicting declaration

Hi,
I am trying to install dorado.
I get the below error while running:
cmake --build cmake-build --config Release -j

[ 86%] Building CXX object CMakeFiles/dorado_lib.dir/dorado/utils/compat_utils.cpp.o
[ 86%] Building CXX object CMakeFiles/dorado_lib.dir/dorado/utils/tensor_utils.cpp.o
[ 87%] Linking CXX executable ../../../../../bin/vbz_test
[ 87%] Built target vbz_test
In file included from ~/dorado_new/dorado/3rdparty/hdf5-1.12.1-3/hdf5-1.12.1-3/include/hdf5.h:22,
                 from ~/dorado_new/dorado/3rdparty/hdf_plugins/vbz_plugin/vbz_plugin_user_utils.h:3,
                 from ~/dorado_new/dorado/data_loader/DataLoader.cpp:6:
~/dorado_new/dorado/3rdparty/hdf5-1.12.1-3/hdf5-1.12.1-3/include/H5public.h:167:19: error: conflicting declaration ‘typedef long long int ssize_t’
  167 | typedef long long ssize_t;
      |                   ^~~~~~~
In file included from /usr/include/stdlib.h:394,
                 from /usr/include/c++/9/cstdlib:75,
                 from /usr/include/c++/9/bits/stl_algo.h:59,
                 from /usr/include/c++/9/string:52,
                 from ~/dorado_new/dorado/data_loader/DataLoader.h:2,
                 from ~/dorado_new/dorado/data_loader/DataLoader.cpp:1:
/usr/include/x86_64-linux-gnu/sys/types.h:108:19: note: previous declaration as ‘typedef __ssize_t ssize_t’
  108 | typedef __ssize_t ssize_t;
      |                   ^~~~~~~
make[2]: *** [CMakeFiles/dorado_lib.dir/build.make:174: CMakeFiles/dorado_lib.dir/dorado/data_loader/DataLoader.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
make[1]: *** [CMakeFiles/Makefile2:255: CMakeFiles/dorado_lib.dir/all] Error 2
make: *** [Makefile:166: all] Error 2

flag issue when using --modified-bases on osx

I can successfully run dorado 0.1.0 on an apple silicon M1 Max chip with the following
./dorado-0.1.0+4b0e9a6-Darwin/bin/dorado basecaller [email protected] pod5dir > calls.sam

However, when I add --modified-bases 5mCG to the command, per the instructions, it fails. Could this be a flag-parsing error?

./dorado-0.1.0+4b0e9a6-Darwin/bin/dorado basecaller --modified-bases 5mCG [email protected] pod5dir > calls.sam 
[2022-12-07 15:06:25.351] [error] '[email protected]' is not a supported modification please select from 5mCG, 5mCG_5hmCG
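The error shows the model name being read as a modification name, which is what happens when an option that accepts a variable number of values consumes every token after it. The sketch below illustrates this with Python's argparse and nargs="+" as a stand-in; it is not dorado's C++ parser, and model_dir and pod5dir are placeholder names. Under that assumption, placing --modified-bases after the positional model and data arguments keeps each token where it belongs.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Stand-in CLI: two positionals plus an option taking one or more values."""
    parser = argparse.ArgumentParser(prog="basecaller")
    parser.add_argument("model")
    parser.add_argument("data")
    parser.add_argument("--modified-bases", nargs="+")
    return parser


# Greedy consumption: with only the option defined, every token after the
# flag is taken as a modification name, mirroring the error above.
greedy = argparse.ArgumentParser(prog="basecaller")
greedy.add_argument("--modified-bases", nargs="+")
swallowed = greedy.parse_args(["--modified-bases", "5mCG", "model_dir", "pod5dir"])
print(swallowed.modified_bases)  # ['5mCG', 'model_dir', 'pod5dir']

# Option placed after the positionals: each token lands where intended.
args = build_parser().parse_args(["model_dir", "pod5dir", "--modified-bases", "5mCG"])
print(args.model, args.data, args.modified_bases)  # model_dir pod5dir ['5mCG']
```

If dorado's parser behaves the same way, the equivalent command would list the model and the pod5 directory first and --modified-bases 5mCG last; confirm this against the tool's help output.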

error when basecalling

Hello,

When basecalling with dorado, I get this error:
no kernel image is available for execution on the device

I really don't know what to do; I've also tried compiling it myself, with no luck.
Ubuntu 22.04, GTX 1050 Ti

Can you help me, please?
Thanks!

Future of model training by users

Hello everyone,

I'd like to know what the future holds for custom model training by users. I've used taiyaki, and from my understanding the software is no longer developed. I've not seen any mention of SUP model training or dorado V4 training, and I believe only remora training commands are available for the new generation of software presented at NCM22.

Is there any replacement for custom basecaller training? Are there any plans?

Best,

Alan

Dorado performance compared to Guppy

Are there any parameters that could be tuned in Dorado to get performance equivalent to Guppy's? I have been running both on a server with an SSD and a Tesla V100-16GB GPU, and Dorado seems to be 1.5-2.5X slower depending on the model. See the commands I used and the runtimes below:

FAST model (dorado ~2.5X slower)
ont-guppy-6.1.1/bin/guppy_basecaller -i fast5/ -s fastq_fast1/ -c dna_r9.4.1_e8.1_fast_prom.cfg -x cuda:0
time = 19:11.3
dorado basecaller [email protected] fast5/ -b 3000 --emit-fastq -x cuda:0 > dorado_fast.fastq
time = 50:48.4

HAC model (dorado ~1.5X slower)
ont-guppy-6.1.1/bin/guppy_basecaller -i fast5/ -s fastq_hac1/ -c dna_r9.4.1_e8.1_hac_prom.cfg -x cuda:0
time = 58:28.74
dorado basecaller [email protected] fast5/ -b 800 --emit-fastq -x cuda:0 > dorado_hac.fastq
time = 1:34:45

SUP model (dorado ~1.5X slower)
ont-guppy-6.1.1/bin/guppy_basecaller -i fast5/ -s fastq_sup1/ -c dna_r9.4.1_e8.1_sup.cfg -x cuda:0
time = 4:14:03
dorado basecaller [email protected] fast5/ -b 200 --emit-fastq -x cuda:0 > dorado_sup.fastq
time = 6:00:39
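The slowdown factors can be recomputed directly from the wall-clock times above. The minimal Python sketch below assumes the times are in MM:SS[.s] or H:MM:SS form, as they appear in this report; to_seconds and RUNS are illustrative names. It gives roughly 2.65x for fast, 1.62x for hac and 1.42x for sup. These ratios describe these specific runs and batch sizes (-b 3000, 800 and 200), not a general benchmark.

```python
def to_seconds(t: str) -> float:
    """Parse 'MM:SS[.s]' or 'H:MM:SS[.s]' into seconds."""
    seconds = 0.0
    for part in t.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# Wall-clock times reported above: (guppy, dorado).
RUNS = {
    "fast": ("19:11.3", "50:48.4"),
    "hac": ("58:28.74", "1:34:45"),
    "sup": ("4:14:03", "6:00:39"),
}

for model, (guppy, dorado) in RUNS.items():
    ratio = to_seconds(dorado) / to_seconds(guppy)
    print(f"{model}: dorado takes {ratio:.2f}x as long as guppy")
```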

cmake Error

Hi, I get the following build error:

dorado/dorado/3rdparty/elzip/src/elzip.cpp:83:50: error: ‘relative’ is not a member of ‘std::filesystem’
             auto relativePath = std::filesystem::relative(absolute_path, directory);
                                                  ^~~~~~~~
dorado/3rdparty/elzip/CMakeFiles/elzip.dir/build.make:75: recipe for target 'dorado/3rdparty/elzip/CMakeFiles/elzip.dir/src/elzip.cpp.o' failed
make[2]: *** [dorado/3rdparty/elzip/CMakeFiles/elzip.dir/src/elzip.cpp.o] Error 1


Dorado

Features

Installation

Platforms

Performance tips

Running

Model selection foreword

Simplex basecalling

DNA adapter and primer trimming

In-line with basecalling

Trimming existing datasets

Custom primer trimming

RNA adapter trimming

Modified basecalling

Duplex

Alignment

Sequencing Summary

Barcode Classification

In-line with basecalling

Classifying existing datasets

Demultiplexing mapped reads

Using a sample sheet

Custom barcodes

Poly(A) tail estimation

Available basecalling models

Decoding Dorado model names

DNA models:

RNA models:

Automatic model selection complex

Developer quickstart

Linux dependencies

Clone and build

Pre-commit

Troubleshooting Guide

Library Path Errors

Improving the Speed of Duplex Basecalling

Running Duplex Basecalling in a Distributed Fashion

GPU Out of Memory Errors

Low GPU Utilization

Licence and Copyright

dorado's Issues

0. Packages installed using brew

1. Xcode installation and profile switch for being able to use metal
