bcgsc / tigmint



⛓ Correct misassemblies using linked AND long reads

Home Page: https://bcgsc.github.io/tigmint/

License: GNU General Public License v3.0

Ruby 0.20% Shell 0.65% Python 63.63% R 1.42% Makefile 22.82% Dockerfile 0.38% C++ 10.89%

misassembly-correction genome-scaffolding genome-assembly linked-reads 10xgenomics bioinformatics bioinformatics-tool

tigmint's Introduction

Correct misassemblies in genome assembly drafts using linked or long sequencing reads

Cut sequences at positions with few spanning molecules.

Written by Shaun Jackman, Lauren Coombe, Justin Chu, and Janet Li.

Paper · Slides · Poster

Citation

Shaun D. Jackman, Lauren Coombe, Justin Chu, Rene L. Warren, Benjamin P. Vandervalk, Sarah Yeo, Zhuyi Xue, Hamid Mohamadi, Joerg Bohlmann, Steven J.M. Jones and Inanc Birol (2018). Tigmint: correcting assembly errors using linked reads from large molecules. BMC Bioinformatics, 19(1). doi:10.1186/s12859-018-2425-6

Description

Tigmint identifies and corrects misassemblies using linked (e.g. MGI's stLFR, 10x Genomics Chromium) or long (e.g. Oxford Nanopore Technologies long reads) DNA sequencing reads. The reads are first aligned to the assembly, and the extents of the large DNA molecules are inferred from the alignments of the reads. The physical coverage of the large molecules is more consistent and less prone to coverage dropouts than that of the short read sequencing data. The sequences are cut at positions that have insufficient spanning molecules. Tigmint outputs a BED file of these cut points, and a FASTA file of the cut sequences.
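To make the molecule-inference step concrete, here is a minimal Python sketch. It is not Tigmint's own tigmint-molecule code. It assumes the alignments have already been filtered and carry a barcode, and it treats dist as the largest allowed gap between consecutive reads with the same barcode on the same contig. The Alignment and Molecule types are hypothetical. Molecules shorter than minsize are discarded, mirroring the dist=50000 and minsize=2000 parameters of tigmint-make.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple


class Alignment(NamedTuple):
    contig: str
    start: int    # 0-based start of the read alignment
    end: int      # 0-based, exclusive end of the read alignment
    barcode: str  # BX tag value


class Molecule(NamedTuple):
    contig: str
    start: int
    end: int
    barcode: str
    reads: int


def infer_molecules(alignments: Iterable[Alignment], dist: int = 50000,
                    minsize: int = 2000) -> List[Molecule]:
    # Group the alignments that share a contig and a barcode.
    groups = defaultdict(list)
    for aln in alignments:
        groups[(aln.contig, aln.barcode)].append(aln)

    molecules = []
    for (contig, barcode), alns in groups.items():
        alns.sort(key=lambda a: a.start)
        start, end, reads = alns[0].start, alns[0].end, 0
        for aln in alns:
            # A gap larger than dist closes the current molecule and
            # starts a new one with the same barcode.
            if aln.start - end > dist:
                molecules.append(Molecule(contig, start, end, barcode, reads))
                start, end, reads = aln.start, aln.end, 0
            end = max(end, aln.end)
            reads += 1
        molecules.append(Molecule(contig, start, end, barcode, reads))

    # Drop short molecules and order by contig, start, end.
    return sorted(m for m in molecules if m.end - m.start >= minsize)
```

With the defaults, three reads of barcode A within a few kilobases form one 2650 bp molecule, while a fourth read of the same barcode 100 kbp away starts a separate molecule that is too short to keep.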

Tigmint also allows the use of long reads from Oxford Nanopore Technologies. The long reads are segmented and assigned barcodes, and the following steps of the pipeline are the same as described above.
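The segmentation of long reads into pseudo-linked reads can be sketched as follows. This is an illustration under stated assumptions, not the long-to-linked implementation. Each read is split into consecutive cut-bp pieces (cut=500 by default in tigmint-make), and all pieces share one barcode. Using the read name as that barcode, and keeping a shorter final piece, are assumptions made here.

```python
from typing import Iterator, Tuple


def segment_long_read(name: str, seq: str,
                      cut: int = 500) -> Iterator[Tuple[str, str, str]]:
    """Yield (segment_name, barcode, segment_sequence) for one long read."""
    barcode = name  # assumption: the read's own name serves as its barcode
    for index, offset in enumerate(range(0, len(seq), cut)):
        # Consecutive, non-overlapping cut-bp pieces; the last may be shorter.
        yield f"{name}_{index}", barcode, seq[offset:offset + cut]


def fastq_header(segment_name: str, barcode: str) -> str:
    """Header carrying the barcode in a BX:Z: comment, as for linked reads."""
    return f"@{segment_name} BX:Z:{barcode}"
```

Because every piece of a read carries the same barcode, the downstream molecule and cut steps can treat the pieces exactly like linked reads from one molecule.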

Each window of a specified fixed size is checked for a minimum number of spanning molecules. Sequences are cut where a window with sufficient coverage is followed by one or more windows with insufficient coverage, which are in turn followed by a window with sufficient coverage.
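The windowed rule above can be sketched in Python. This sketch assumes molecules are given as (start, end) intervals on a single contig and that windows do not overlap. count_spanning and find_cuts are hypothetical helpers. The exact cut coordinates, trimming (trim), and edge handling in tigmint-cut may differ.

```python
from typing import List, Sequence, Tuple


def count_spanning(molecules: Sequence[Tuple[int, int]], contig_length: int,
                   window: int = 1000) -> List[int]:
    """Number of molecules that fully span each non-overlapping window."""
    counts = []
    for w_start in range(0, contig_length - window + 1, window):
        w_end = w_start + window
        counts.append(sum(1 for start, end in molecules
                          if start <= w_start and end >= w_end))
    return counts


def find_cuts(spanning: Sequence[int], span: int = 20,
              window: int = 1000) -> List[Tuple[int, int]]:
    """(start, end) bp of each run of low-coverage windows that is preceded
    and followed by a window with at least `span` spanning molecules."""
    cuts = []
    i, n = 1, len(spanning)
    while i < n:
        if spanning[i] < span and spanning[i - 1] >= span:
            j = i
            while j < n and spanning[j] < span:
                j += 1
            if j < n:  # the low run is closed by a well-covered window
                cuts.append((i * window, j * window))
            i = j
        else:
            i += 1
    return cuts
```

For example, per-window counts of [25, 30, 5, 3, 22, 8] with span=20 give one candidate region at 2000 to 4000 bp. The trailing low window is not flanked on its right, so it is not reported.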

Installation

Install Tigmint using Brew

brew install tigmint

Install Tigmint using Conda

conda install -c bioconda -c conda-forge tigmint

Run Tigmint using Docker

docker run quay.io/biocontainers/tigmint

Install Tigmint from the source code

Either clone the Git repository or download and extract a release tarball, then compile with make -C src.

git clone https://github.com/bcgsc/tigmint && cd tigmint
make -C src

curl -L https://github.com/bcgsc/tigmint/releases/download/v1.2.10/tigmint-1.2.10.tar.gz | tar xz && cd tigmint-1.2.10
make -C src

Dependencies

Install the dependencies (Python packages and command-line tools)

conda install -c bioconda intervaltree pybedtools pysam numpy bedtools minimap2 bwa zsh btllib samtools

Install the dependencies of ARCS (optional)

conda install -c bioconda arcs links

Install the dependencies for calculating assembly metrics (optional)

conda install -c bioconda abyss seqtk

Usage

To run Tigmint on the draft assembly myassembly.fa with the reads myreads.fq.gz, which have been run through longranger basic:

tigmint-make tigmint draft=myassembly reads=myreads

bwa mem -C is used to copy the BX tag from the FASTQ header to the SAM tags.
samtools sort -tBX is used to sort first by barcode and then position.
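The barcode handling in these two steps can be illustrated with a small Python sketch. The helpers are hypothetical and not part of Tigmint. The barcode is read from the BX:Z: comment in the FASTQ header, which bwa mem -C appends to the SAM record. Records are then ordered by barcode first and position second, as samtools sort -tBX does.

```python
from typing import List, NamedTuple, Optional


def bx_from_fastq_header(header: str) -> Optional[str]:
    """Return the value of a BX:Z: comment in a FASTQ header, if present."""
    for field in header.split()[1:]:
        if field.startswith("BX:Z:"):
            return field[len("BX:Z:"):]
    return None


class Record(NamedTuple):
    contig: str
    pos: int
    bx: str


def sort_by_barcode(records: List[Record]) -> List[Record]:
    # Barcode first, then coordinate. samtools orders contigs by their
    # order in the BAM header; comparing names here is a simplification.
    return sorted(records, key=lambda r: (r.bx, r.contig, r.pos))
```

Sorting by barcode places all reads from one partition next to each other, which lets the molecule step process one barcode at a time.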

To run both Tigmint and scaffold the corrected assembly with ARCS:

tigmint-make arcs draft=myassembly reads=myreads

To run Tigmint, ARCS, and calculate assembly metrics using the reference genome GRCh38.fa:

tigmint-make metrics draft=myassembly reads=myreads ref=GRCh38 G=3088269832

To run Tigmint with long reads in fasta or fastq format (myreads.fa.gz or myreads.fq.gz - or uncompressed) on the draft assembly myassembly.fa for an organism with a genome size of gsize:

tigmint-make tigmint-long draft=myassembly reads=myreads span=auto G=gsize dist=auto

By default, Tigmint expects Oxford Nanopore Technologies (ONT) long reads and aligns them with minimap2 -x map-ont. To use PacBio long reads, specify the parameter longmap=pb, which calls minimap2 -x map-pb instead. When using PacBio HiFi long reads, specify the parameter longmap=hifi.

Optionally, ntLink (v1.3.6+) can be used to map the long reads to the draft assembly. To use ntLink mappings, add mapping=ntLink to your tigmint-make command.

Notes

tigmint-make is a Makefile script, and so any make options may also be used with tigmint-make, such as -n (--dry-run).
When running Tigmint with linked reads, the file extension of the assembly must be .fa and the reads .fq.gz, and the extension should NOT be included in the parameters draft and reads on the command line (otherwise you will get an error). These specific file name requirements result from implementing the pipeline in GNU Make.
- The requirements for running Tigmint with long reads are the same, but the file extension of the reads file can also be .fa, .fa.gz, or .fq.
The minimum spanning molecules parameter (span) for tigmint-cut is heavily dependent on the sequence coverage of the linked or long reads provided. When running Tigmint with long reads, use span=auto and set G to your assembly organism's haploid genome size for this parameter to be calculated automatically, or explicitly set span to a specific number if you are interested in adjusting it. See Tips for more details.
For tigmint-long, the threshold on the maximum distance between reads should be calculated automatically from the read length distribution. This can be done by setting the parameter dist=auto.
The long-to-linked-pe step of tigmint-long uses a maximum of 6 threads.
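Because draft and reads must be given without their extensions, a small hypothetical Python helper can derive the tigmint-make arguments from actual file names and reject unsupported extensions early. The accepted extensions are taken from the notes above. The helper itself is only an illustration and is not shipped with Tigmint.

```python
from typing import List, Sequence

LINKED_READ_EXTS = (".fq.gz",)
LONG_READ_EXTS = (".fq.gz", ".fa.gz", ".fq", ".fa")


def strip_ext(path: str, exts: Sequence[str]) -> str:
    """Remove the first matching extension, or fail loudly."""
    for ext in exts:
        if path.endswith(ext):
            return path[:-len(ext)]
    raise ValueError(f"{path!r} must end with one of {tuple(exts)}")


def tigmint_make_args(target: str, assembly: str, reads: str,
                      long_reads: bool = False, **params: str) -> List[str]:
    """Build a tigmint-make argument list from file names with extensions."""
    read_exts = LONG_READ_EXTS if long_reads else LINKED_READ_EXTS
    args = ["tigmint-make", target,
            f"draft={strip_ext(assembly, ('.fa',))}",
            f"reads={strip_ext(reads, read_exts)}"]
    args += [f"{key}={value}" for key, value in params.items()]
    return args
```

For example, subprocess.run(tigmint_make_args("tigmint", "myassembly.fa", "myreads.fq.gz", t="8"), check=True) would run tigmint-make tigmint draft=myassembly reads=myreads t=8.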

tigmint-make commands

tigmint: Run Tigmint, and produce a file named $draft.tigmint.fa
tigmint-long: Run Tigmint using long reads, and produce a file named $draft.cut$cut.tigmint.fa
arcs: Run Tigmint and ARCS, and produce a file named $draft.tigmint.arcs.fa
metrics: Run Tigmint and ARCS, calculate assembly metrics using abyss-fac and abyss-samtobreak, and produce TSV files.

Parameters of Tigmint

draft: Name of the draft assembly, myassembly.fa
reads: Name of the reads, myreads.fq.gz
G: Haploid genome size of the draft assembly organism. Required to calculate span parameter automatically. Can be given as an integer or in scientific notation (e.g. '3e9' for human) [0]
span=20: Number of spanning molecules threshold. Set span=auto to automatically select span parameter (currently only recommended for tigmint-long)
cut=500: Cut length for long reads (tigmint-long only)
longmap=ont: Long read platform; ont for Oxford Nanopore Technologies (ONT) long reads, pb for PacBio long reads, hifi for PacBio HiFi long reads (tigmint-long only)
window=1000: Window size (bp) for checking spanning molecules
minsize=2000: Minimum molecule size
as=0.65: Minimum AS/read length ratio
nm=5: Maximum number of mismatches
dist=50000: Maximum distance (bp) between reads to be considered the same molecule. Set dist=auto to automatically calculate dist threshold based on read length distribution (tigmint-long only)
mapq=0: Mapping quality threshold
trim=0: Number of bases to trim off contigs following cuts
t=8: Number of threads
ac=3000: Minimum contig length (bp) for tallying attempted corrections. This is for logging purposes only, and will not affect the performance.
SORT_OPTS: Options to pass to sort
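The alignment and molecule filters controlled by as, nm, mapq, and minsize can be sketched as simple predicates. This is a hedged illustration. Whether each threshold is inclusive (>= or <=) is an assumption here, and the ReadAlignment type is hypothetical. The score is the AS tag and the edit distance is the NM tag of each alignment.

```python
from typing import NamedTuple


class ReadAlignment(NamedTuple):
    read_length: int
    score: int      # AS tag
    edit_dist: int  # NM tag
    mapq: int


def passes_read_filters(aln: ReadAlignment, as_ratio: float = 0.65,
                        nm: int = 5, mapq: int = 0) -> bool:
    # Keep alignments with a high AS/read length ratio, few mismatches,
    # and sufficient mapping quality.
    return (aln.score / aln.read_length >= as_ratio
            and aln.edit_dist <= nm
            and aln.mapq >= mapq)


def passes_minsize(start: int, end: int, minsize: int = 2000) -> bool:
    # Molecules shorter than minsize bp are discarded.
    return end - start >= minsize
```

A 150 bp read with AS=140 (ratio of about 0.93) and NM=2 passes the defaults, while the same read with AS=90 (ratio 0.6) is rejected.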

Parameters of ARCS

c=5
e=30000
r=0.05

Parameters of LINKS

a=0.1
l=10

Parameters for calculating assembly metrics

ref: Reference genome, ref.fa, for calculating assembly contiguity metrics
G: Size of the reference genome, for calculating NG50 and NGA50
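NG50 is the length of the sequence at which the cumulative length, summing from the longest sequence down, first reaches half of the genome size G. NGA50 applies the same calculation to aligned blocks rather than to whole contigs. The following is a minimal Python sketch of the calculation, not the abyss-fac implementation.

```python
from typing import Iterable


def ng50(lengths: Iterable[int], genome_size: float) -> int:
    """Length at which the cumulative length, from the longest sequence
    down, first reaches genome_size / 2; 0 if it never does."""
    total = 0
    for length in sorted(lengths, reverse=True):
        total += length
        if total >= genome_size / 2:
            return length
    return 0
```

Passing the total assembly length instead of G yields the ordinary N50, which is why G must be supplied to compare assemblies of different total sizes.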

Tips

If your barcoded reads are in multiple FASTQ files, the initial alignments of the barcoded reads to the draft assembly can be done in parallel and merged prior to running Tigmint.
When aligning linked reads with BWA-MEM, use the -C option to include the barcode in the BX tag of the alignments.
Sort by BX tag using samtools sort -tBX.
Merge multiple BAM files using samtools merge -tBX.
When aligning long reads with Minimap2, use the -y option to include the barcode in the BX tag of the alignments.
When using long reads, the minimum spanning molecule threshold (span) should be no greater than 1/4 of the sequence coverage. Setting the parameter span=auto allows the appropriate parameter value to be selected automatically (this setting requires the parameter G as well).
When using long reads, the edit distance threshold (nm) is automatically set to the cut length (cut) to compensate for the higher error rate and length. This parameter should be kept relatively high to include as many alignments as possible.
Each Tigmint (and Tigmint-long) step can be run separately by specifying its target with tigmint-make. For example, the bwa index step for Tigmint with linked reads can be launched with tigmint-make tigmint-index.
- Tigmint steps/targets (for linked reads): tigmint-index, tigmint-align, tigmint-molecule, tigmint-cut
- Tigmint-long steps/targets: tigmint-long-estimate, tigmint-long-to-linked, tigmint-long-cut
On the command line, SORT_OPTS can be specified to pass additional options to sort.
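The span rule of thumb for long reads can be checked with simple arithmetic. Coverage is the total number of read bases divided by G, and span should be at most a quarter of it. The sketch below computes that upper bound. It is not the algorithm behind span=auto, which may choose a different value, and the function names are hypothetical.

```python
def parse_genome_size(g: str) -> float:
    """Accept an integer or scientific notation, e.g. '3088269832' or '3e9'."""
    return float(g)


def max_span(total_read_bases: int, genome_size: float) -> int:
    """Upper bound on span: at most 1/4 of the sequence coverage."""
    coverage = total_read_bases / genome_size
    return int(coverage // 4)
```

For example, 90 Gbp of reads for a genome with G=3e9 is 30-fold coverage, so span should be at most 7.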

Using stLFR linked reads

To use stLFR linked reads with Tigmint, you will need to re-format the reads to have the barcode in a BX:Z: tag in the read header. For example, this format

@V100002302L1C001R017000000#0_0_0/1 0	1
TGTCTTCCTGGACAGCTGACATCCCTTTTGTTTTTCTGTTTGCTCAGATGCTGTCTCTTATACACATCTTAGGAAGACAAGCACTGACGACATGATCACC
+
FFFFFFFGFGFFGFDFGFFFFFFFFFFFGFFF@FFFFFFFFFFFF@FFFFFFFFFGGFFEFEFFFF?FFFFGFFFGFFFFFFFGFFEFGFGGFGFFFGFF

should be changed to:

@V100002302L1C001R017000000 BX:Z:0_0_0
TGTCTTCCTGGACAGCTGACATCCCTTTTGTTTTTCTGTTTGCTCAGATGCTGTCTCTTATACACATCTTAGGAAGACAAGCACTGACGACATGATCACC
+
FFFFFFFGFGFFGFDFGFFFFFFFFFFFGFFF@FFFFFFFFFFFF@FFFFFFFFFGGFFEFEFFFF?FFFFGFFFGFFFFFFFGFFEFGFGGFGFFFGFF
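The header conversion can be scripted. Below is a minimal Python sketch, assuming every header has the form @NAME#BARCODE/MATE, optionally followed by other fields, as in the example above. The function names are hypothetical, and the input is a gzipped FASTQ file.

```python
import gzip
import sys
from typing import Iterable, Iterator


def convert_header(header: str) -> str:
    """'@NAME#BARCODE/1 ...' -> '@NAME BX:Z:BARCODE'"""
    read_id = header.split()[0]             # e.g. '@V100...#0_0_0/1'
    name, _, barcode = read_id.partition("#")
    barcode = barcode.split("/")[0]         # drop the /1 or /2 mate suffix
    return f"{name} BX:Z:{barcode}"


def convert_fastq(lines: Iterable[str]) -> Iterator[str]:
    # Every fourth line, starting with the first, is a FASTQ header.
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        yield convert_header(line) if i % 4 == 0 else line


def main(path: str) -> None:
    with gzip.open(path, "rt") as fin:
        for out in convert_fastq(fin):
            sys.stdout.write(out + "\n")
```

Calling main("reads.fq.gz") writes the converted records to standard output, which can then be compressed again with gzip.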

Support

After first looking for an existing issue at https://github.com/bcgsc/tigmint/issues, please report a new issue at https://github.com/bcgsc/tigmint/issues/new. Please report the names of your input files, the exact command line that you are using, and the entire output of Tigmint.

Pipeline


tigmint's Issues

tigmint parameters with multiple linked reads

I have a draft genome assembled from 30x PacBio reads for a plant genome of about 4 Gbp. I would like to use 60x 10x Genomics linked reads for correction. I have 4 libraries from 10x Genomics. I tried tigmint with a dry run. These are the parameters I used:

tigmint-make arcs -n draft=$HOME/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs reads=$HOME/szeged/fk8jybr/input/Illumina_10x/LC001 $HOME/szeged/fk8jybr/input/Illumina_10x/LC002 $HOME/szeged/fk8jybr/input/Illumina_10x/LC003 $HOME/szeged/fk8jybr/input/Illumina_10x/LC004

I got this output message:

bwa index /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs.fa
bwa mem -t8 -pC /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs.fa /home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.fq.gz | samtools view -u -F4 | samtools sort -@8 -tBX -T$(mktemp -u -t /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.sortbx.bam.XXXXXX) -o /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.sortbx.bam
/big/home/fk8jybr/.linuxbrew/Cellar/tigmint/1.1.2_2/libexec/bin/tigmint-molecule -a0.65 -n5 -q0 -d50000 -s2000 /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.sortbx.bam | sort -k1,1 -k2,2n -k3,3n >/home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.bed
samtools faidx /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs.fa
/big/home/fk8jybr/.linuxbrew/Cellar/tigmint/1.1.2_2/libexec/bin/tigmint-cut -p8 -w1000 -n20 -t0 -o /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs.fa /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.bed
bwa index /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa
bwa mem -t8 -pC /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa /home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.fq.gz | samtools view -@8 -h -F4 -o /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.sortn.bam
arcs -s98 -c5 -l0 -z500 -m4-20000 -d0 -e30000 -r0.05 -v \
	-f /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa \
	-b /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.c5_e30000_r0.05.arcs \
	-g /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.c5_e30000_r0.05.arcs.dist.gv \
	--tsv=/home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.c5_e30000_r0.05.arcs.dist.tsv \
	--barcode-counts=/home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.sortn.bam.barcode-counts.tsv \
	/home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.sortn.bam
/big/home/fk8jybr/.linuxbrew/Cellar/tigmint/1.1.2_2/libexec/bin/tigmint-arcs-tsv /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.c5_e30000_r0.05.arcs_original.gv /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.c5_e30000_r0.05.arcs.links.tsv /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa
cp /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.c5_e30000_r0.05.arcs.links.tsv /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.c5_e30000_r0.05.arcs.a0.1_l10.links.tigpair_checkpoint.tsv
LINKS -k20 -l10 -t2 -a0.1 -x1 -s /dev/null -f /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa -b /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.c5_e30000_r0.05.arcs.a0.1_l10.links
sed -r 's/^>scaffold([^,]*),(.*)/>\1 scaffold\1,\2/' /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.c5_e30000_r0.05.arcs.a0.1_l10.links.scaffolds.fa >/home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.c5_e30000_r0.05.arcs.a0.1_l10.links.fa
ln -sf /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs./home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC001.c5_e30000_r0.05.arcs.a0.1_l10.links.fa /home/fk8jybr/szeged/fk8jybr/input/pacbio_assembled_canu/lculinaris.contigs.tigmint.arcs.fa
make: *** No rule to make target '/home/fk8jybr/szeged/fk8jybr/input/Illumina_10x/LC002'.  Stop.

How can I set the parameters to use all 4 linked-read libraries?

Cannot compile the bundled but modified copy of make: make-4.1/glob/glob.c:1342: undefined reference to `__alloca'

Hi,
Yet another undocumented property of tigmint is that it uses its own modified copy of GNU make. Of course, things have changed over the years and this make no longer compiles:

gcc  -g -O2 -Wl,--export-dynamic  -o make ar.o arscan.o commands.o default.o dir.o expand.o file.o function.o getopt.o getopt1.o guile.o implicit.o job.o load.o loadapi.o main.o misc.o output.o read.o remake.o rule.o signame.o strcache.o variable.o version.o vpath.o hash.o xml.o remote-stub.o glob/libglob.a   -ldl 
/usr/bin/ld: glob/libglob.a(glob.o): in function `glob_in_dir':
/foo/tigmint-master/xml-patch-make/make-4.1/make-4.1/glob/glob.c:1367: undefined reference to `__alloca'
/usr/bin/ld: /foo/tigmint-master/xml-patch-make/make-4.1/make-4.1/glob/glob.c:1342: undefined reference to `__alloca'
/usr/bin/ld: foo/tigmint-master/xml-patch-make/make-4.1/make-4.1/glob/glob.c:1256: undefined reference to `__alloca'
/usr/bin/ld: /foo/tigmint-master/xml-patch-make/make-4.1/make-4.1/glob/glob.c:1283: undefined reference to `__alloca'
/usr/bin/ld: glob/libglob.a(glob.o): in function `glob':
/foo/tigmint-master/xml-patch-make/make-4.1/make-4.1/glob/glob.c:581: undefined reference to `__alloca'
/usr/bin/ld: glob/libglob.a(glob.o):/foo/tigmint-master/xml-patch-make/make-4.1/make-4.1/glob/glob.c:732: more undefined references to `__alloca' follow
collect2: error: ld returned 1 exit status

This is why it would be better to submit the XML patch upstream and rely on an improved version of make.

The fix for these compile issues has already been worked out several times elsewhere, e.g.:
linuxboot/heads#352
Please update the README.md (as always) to make it clear that the pipeline uses its own make.

Runtime on finding breakpoints

Dear Shaun Jackman,

We have recently received our 10x Genomics data and are currently optimizing our de novo assembly (a bird genome of 1.2 Gb) using different tools. To check for assembly errors, I decided to run Tigmint. The mapping step was no problem; however, the "finding breakpoints" step, which uses tigmint-cut, has now been running for more than two weeks without any visible progress.

Is it expected that this step takes so much time?

Kind regards,

Jordi

src/long-to-linked-pe v1.0: Using more than 6 threads does not scale, reverting to 6.

The README.md claims one can run tigmint with any number of threads.

It turns out that the maximum, at least in some steps, is just 6:

long-to-linked-pe v1.0: Using more than 6 threads does not scale, reverting to 6.

Please update the "t=8: Number of threads" line in README.md accordingly, explaining which steps will not use multiple CPUs. I guess one still wants to use more threads for the bwa step, but who knows.

tigmint molecule Error 1

I'm trying to run Tigmint with Arks/Links using the arks-make pipeline. My run gets through the mapping but hits a snag in the tigmint molecule step. The error message doesn't give much to go on so I'm hoping someone can provide some help. Here is the error from STDERR:

make[1]: *** [assembly.chromium_reads.as0.65.nm5.molecule.size2000.bed] Error 1
make[1]: *** Deleting file `assembly.chromium_reads.as0.65.nm5.molecule.size2000.bed'
make: *** [assembly.tigmint.fa] Error 2

The command it errors on is:

command time -v -o assembly.chromium_reads.as0.65.nm5.molecule.size2000.bed.time /n/home13/dcard/.linuxbrew/Cellar/tigmint/1.1.2_2/libexec/bin/tigmint-molecule -a0.65 -n5 -q0 -d50000 -s2000 assembly.chromium_reads.sortbx.bam | sort -k1,1 -k2,2n -k3,3n >assembly.chromium_reads.as0.65.nm5.molecule.size2000.bed

Any information and guidance is greatly appreciated. Also, is there a way to restart the run without having to map everything again? Or is that a feature of make (I'm new to using it like this)?

Thank you,
Daren Card

multiple intervals with the same barcode

Hello,
I am looking at tigmint-molecule's output trying to figure out if I can still use the data in a similar way that tigmint-cut does.
I noticed that there are several barcodes that are represented by more than one interval, here some examples:

scaffold2261    1637697 1670812 GTGCGACAGAAGTACT-1      59
scaffold2261    1706848 1726735 GTGCGACAGAAGTACT-1      31
scaffold2261    1781176 1806754 GTGCGACAGAAGTACT-1      30
scaffold2261    1827401 1851617 GTGCGACAGAAGTACT-1      77
[...]
scaffold2261    790459  875292  GATGCTATCTGCTGTC-1      68
scaffold2261    6642083 6656674 GATGCTATCTGCTGTC-1      14
scaffold2261    6676651 6697432 GATGCTATCTGCTGTC-1      17
[...]
scaffold2261    5214581 5225238 TCATTACCACCTGGTG-1      16
scaffold2261    5251458 5322905 TCATTACCACCTGGTG-1      35
scaffold2261    5346107 5365502 TCATTACCACCTGGTG-1      26

I wonder if this is accounted for at the cut step: i.e. if in the second example there is a dip in coverage at 1 Mb, will the barcode be counted as spanning the dip or not?
A more meaningful way to keep track of these cases would be to encode the molecule.bed file in a gff/gtf format, where the e.g. 3 intervals would be children of a larger parent, and use that to count coverage/span of a molecule.
Of course, false positives may increase if there are two segments of an adjacent region in the same GEM, but that should be quite unlikely for a large genome and could be filtered out with stringent cut parameters.

Similarly, what does this exactly mean?

tigmint-molecule --help
-d N, --dist N        Maximum distance between reads in the same molecule
                             [50000]

is this max distance between adjacent reads to keep elongating the molecule or the total size of the molecule?
thanks,
Dario

Tigmint not making cuts

Hi,
Thank you for making tigmint!

I am having a bit of difficulty running tigmint, it seems to run but my output genome is identical to the input genome even when I give it unreasonably strict parameters that should cut something (-n500000 for tigmint-cut). I have 10x chromium linked reads with about 50x coverage and a supernova assembly with N50=9 Mb.

I made the sequence alignment with bwa-mem with:

bwa mem -t24 -pC assembly.fa reads.fq.gz | samtools view -@24 -h -F4 -o alignment.bam
samtools sort --verbosity 3 -@ 24 -m 3G -t BX -o sorted.bam alignment.bam

And then ran tigmint with:

tigmint-molecule -a0.65 -n5 -q0 -d50000 -s2000 sorted.bam | sort -k1,1 -k2,2n -k3,3n > genome.reads.as0.65.nm5.molecule.size2000.bed
samtools faidx assembly.fa
tigmint-cut -p24 -w1000 -n20 -t0 -o assembly.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa assembly.fa assembly.reads.as0.65.nm5.molecule.size2000.bed

But it did not seem to make any cuts as the output assembly had the same number of scaffolds as the input. I tried changing the settings of tigmint-cut to -w100 -n500000 -t100 to try force it to cut something but it did not cut anything. Does this indicate that there is a problem with my run, or is there really nothing for it to cut even with very extreme settings?

The output bed files looks like they were produced normally:

head genome.reads.as0.65.nm5.molecule.size2000.bed

0::0:0-42446733	0	2030	CAACCTCTCGGATGCC-1	9
0::0:0-42446733	0	10082	ATACTTCGTGAGGGAG-1	9
0::0:0-42446733	0	18937	CTCTGTGCATTTGCGA-1	66
0::0:0-42446733	0	22013	CCACTACCAGAGATCG-1	65
0::0:0-42446733	0	22785	TTAGGTGCAACTCATG-1	29

head assembly.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa.bed

0       0       42446733        0
58      0       15111319        58
90      0       30744791        90
114     0       5293578 114
138     0       7375403 138

Thank you

pre-alignment

Within the section 'tips' it is stated: "If your barcoded reads are in multiple FASTQ files, the initial alignments of the barcoded reads to the draft assembly can be done in parallel and merged prior to running Tigmint." What should then be the input for 'tigmint-make tigmint-long'? A paf file?

ImportError: /lib64/libc.so.6: version `GLIBC_2.18' not found

Hi,

Sadly, tigmint stopped with another error...

Cmd line:

$ tigmint tigmint draft=draft reads=reads t=30

Screen output:

/data/qiushi/tubesnout-rj/tigmint/bin/tigmint-molecule -b draft.reads.sortbx.bam -o draft.reads.as0.65.nm5.molecule.tsv -a 0.65 -n 5 -q 0 -d 50000
awk 'NR>1 { print $1"\t"$2-1"\t"$3-1"\tReads="$7",Size="$4",Mapq="$8",AS="$9",NM="$10",BX="$5",MI="$6"\t"$7 }' draft.reads.as0.65.nm5.molecule.tsv \
        | sort -k1,1 -k2,2n -k3,3n >draft.reads.as0.65.nm5.molecule.bed
awk '$3 - $2 >= 2000' draft.reads.as0.65.nm5.molecule.bed >draft.reads.as0.65.nm5.molecule.size2000.bed
samtools faidx draft.fa
/data/qiushi/tubesnout-rj/tigmint/bin/tigmint-cut -p30 -w1000 -n20 -t0 -o draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa draft.fa draft.reads.as0.65.nm5.molecule.size2000.bed
Traceback (most recent call last):
  File "/data/qiushi/tubesnout-rj/tigmint/bin/tigmint-cut", line 11, in <module>
    import pybedtools
  File "/data/qiushi/anaconda3/lib/python3.6/site-packages/pybedtools/__init__.py", line 10, in <module>
    from .cbedtools import (Interval, IntervalFile, overlap, Attributes,
ImportError: /lib64/libc.so.6: version `GLIBC_2.18' not found (required by /home/linuxbrew/.linuxbrew/lib/libstdc++.so.6)
make: *** [draft.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa] Error 1

Information about the server you might need:

$cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.4 (Maipo)

$/usr/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-linker-hash-style=gnu --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin --enable-initfini-array --disable-libgcj --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) 

[qiushi.li@itbioyeaman02 ~]$strings /usr/lib64/libc.so.6 | grep ^GLIBC_
GLIBC_2.2.5
GLIBC_2.2.6
GLIBC_2.3
GLIBC_2.3.2
GLIBC_2.3.3
GLIBC_2.3.4
GLIBC_2.4
GLIBC_2.5
GLIBC_2.6
GLIBC_2.7
GLIBC_2.8
GLIBC_2.9
GLIBC_2.10
GLIBC_2.11
GLIBC_2.12
GLIBC_2.13
GLIBC_2.14
GLIBC_2.15
GLIBC_2.16
GLIBC_2.17
GLIBC_PRIVATE
GLIBC_PRIVATE
GLIBC_2.8
GLIBC_2.3
GLIBC_2.5
GLIBC_2.4
GLIBC_2.9
GLIBC_2.7
GLIBC_2.6
GLIBC_2.3.2
GLIBC_2.3.4
GLIBC_2.3.3
GLIBC_2.15
GLIBC_2.14
GLIBC_2.11
GLIBC_2.16
GLIBC_2.10
GLIBC_2.17
GLIBC_2.12
GLIBC_2.13
GLIBC_2.2.5
GLIBC_2.2.6

I tried to sudo update glibc with local built but failed.

$curl -O http://ftp.gnu.org/gnu/glibc/glibc-2.18.tar.gz
$tar zxf glibc-2.18.tar.gz 
$cd glibc-2.18/
$mkdir build
$cd build/
$../configure --prefix=/usr
$make -j2
$sudo make install

Error info:

Inconsistency detected by ld.so: get-dynamic-info.h: 134: elf_get_dynamic_info: Assertion `info[15] == ((void *)0)' failed!
make[2]: *** [/usr/lib64/gconv/gconv-modules] Error 127
make[2]: Leaving directory `/data/programs/glibc-2.18/iconvdata'
make[1]: *** [iconvdata/subdir_install] Error 2
make[1]: Leaving directory `/data/programs/glibc-2.18'
make: *** [install] Error 2

I would be very cautious about modifying the default glibc. However, I noticed the following:

$strings /home/linuxbrew/.linuxbrew/lib/libc.so.6 | grep ^GLIBC_
GLIBC_2.2.5
GLIBC_2.2.6
GLIBC_2.3
GLIBC_2.3.2
GLIBC_2.3.3
GLIBC_2.3.4
GLIBC_2.4
GLIBC_2.5
GLIBC_2.6
GLIBC_2.7
GLIBC_2.8
GLIBC_2.9
GLIBC_2.10
GLIBC_2.11
GLIBC_2.12
GLIBC_2.13
GLIBC_2.14
GLIBC_2.15
GLIBC_2.16
GLIBC_2.17
GLIBC_2.18
GLIBC_2.22
GLIBC_2.23
GLIBC_PRIVATE
GLIBC_2.23
GLIBC_2.8
GLIBC_2.5
GLIBC_2.9
GLIBC_2.7
GLIBC_2.6
GLIBC_2.18
GLIBC_2.11
GLIBC_2.16
GLIBC_2.10
GLIBC_2.17
GLIBC_2.13
GLIBC_2.2.6

There is GLIBC_2.18 in linuxbrew library.

Any idea about this?

Thanks!
Qiushi

tigmint-make errors

Hello.

When I execute 'tigmint-make', I got some errors like below.

macarima@localhost:~$ tigmint-make
make: command: Command not found
make: command: Command not found
make: command: Command not found
Tigmint: Correct misassemblies using linked reads
Usage: tigmint-make [COMMAND]... [PARAMETER=VALUE]...
Example: tigmint-make tigmint draft=myassembly reads=myreads
For more information see https://bcgsc.github.io/tigmint/

How can I remove these errors?

Can I get the breakpoint information?

In tigmint-cut, I found the function printBreakpoints().
In the output file draft.tigmint.fa.bed, I guess that scaffold_8-1/2/3 are breakpoints.
Is this right?

Tigment not found by Linuxbrew

Hello,

I'm following the instructions to install Tigment. I've got an installation of linuxbrew up and running, but when I try to run "brew install tigment", I get the following error. Any insight? Thanks!

$ brew install tigment
Error: No available formula with the name "tigment"
==> Searching for a previously deleted formula (in the last month)...
Warning: homebrew/core is shallow clone. To get complete history run:
git -C "$(brew --repo homebrew/core)" fetch --unshallow

Error: No previously deleted formula found.
==> Searching for similarly named formulae...
==> Searching local taps...
Error: No similarly named formulae found.
==> Searching taps...
==> Searching taps on GitHub...
Error: No formulae found in taps.

myassembly.myreads.as0.65.nm5.molecule.tsv is null

Hello,

I am running tigmint with the following command, but the myassembly.myreads.as0.65.nm5.molecule.tsv file comes out null. Could you please let me know the reason?

tigmint-make tigmint draft=myassembly reads=myreads t=16

Filter alignments in a single step

I have two small notes:

This is a bit aesthetic, but in the README, you might want to list the sample draft assembly as "myassembly.fa" and the sample reads as "myreads.fq.gz" in order to better match the sample usage commands.
The pipeline only uses the %.as100.bam to filter and output the %.nm$(nm).bam (lines 144-161 of tigmint-make). Therefore, couldn't you combine these two steps either by piping the alignment score gawk command into the mismatches gawk command or by combining the two gawk commands?
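The second note can be illustrated with a minimal Python sketch of a single-pass filter. It assumes, without checking tigmint-make itself, that the two gawk steps test the AS:i (alignment score) and NM:i (edit distance) tags and that NM must be strictly below the limit; the function name filter_sam and the default thresholds are hypothetical.

```python
import re
from typing import Iterable, Iterator

# Optional SAM fields of interest, e.g. "\tAS:i:150" and "\tNM:i:2".
TAG_RE = re.compile(r"\t(AS|NM):i:(-?\d+)")


def filter_sam(lines: Iterable[str], min_as: int = 100, max_nm: int = 5) -> Iterator[str]:
    """Yield SAM header lines plus alignments that pass both the AS and NM filters."""
    for line in lines:
        if line.startswith("@"):
            yield line  # header lines pass through unchanged
            continue
        tags = dict(TAG_RE.findall(line))
        if "AS" not in tags or "NM" not in tags:
            continue  # drop records lacking either tag
        # Both conditions are tested in the same pass over the record.
        if int(tags["AS"]) >= min_as and int(tags["NM"]) < max_nm:
            yield line
```

In a script, sys.stdout.writelines(filter_sam(sys.stdin)) could sit between samtools view -h and samtools view -b, replacing two separate passes with one.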

Error: no progress while running scaffolding

Hello
I ran the tigmint program with a draft assembly in a different folder and a Nanopore read library. The program gets stuck after showing the following output, producing only the file "NFCC4014.fastq.gz.cut500.fq.gz". For hours and hours this single file stays at 50.5 MB, with no further additions or progress. When I kill the program with Ctrl+C, it shows an error and the generated output file is deleted automatically. I don't understand what the problem is. Why does tigmint stop and not proceed further? All other dependency tools are installed.

acer@kishor:~/talasmbly$ tigmint-make tigmint-long draft=4spd2403.fa reads=/home/acer/oxford/NFCC4014.fastq.gz span=auto G=33000000 dist=auto longmap=ont t=12
Program run:
pigz -p12 -dc | /home/linuxbrew/.linuxbrew/Cellar/tigmint/1.2.2/libexec/bin/long-to-linked -l500 -m2000 -g33000000 -s -d -o /home/acer/oxford/NFCC4014.fastq.gz.tigmint-long.params.tsv | pigz -p12 > /home/acer/oxford/NFCC4014.fastq.gz.cut500.fq.gz

Kindly provide me a solution. I have draft assemblies of the Nanopore data from CANU and Flye, and a draft assembly of the Illumina reads from SPAdes. How can I input the Illumina paired reads, which are in 4 files (2 sets): 1F.fq, 1R.fq, 2F.fq and 2R.fq?

make: * No rule to make target `*.as0.65.nm5.molecule.tsv', needed by `tigmint'. Stop.

Hi Shaun,

I'm trying to use tigmint to correct misassemblies with 10x linked reads (barcoded.fastq.gz, generated by longranger basic). There is an immediate make error after running "tigmint tigmint":
make: *** No rule to make target `***.as0.65.nm5.molecule.tsv', needed by `tigmint'. Stop.
Here are the exact command lines I typed.

1, tigmint installation
git clone https://github.com/bcgsc/tigmint && cd tigmint

2, dependency installation in a new conda environment “tigmint3” (Python 3.6.4 was used)
$source activate /path/to/anaconda3/envs/tigmint3
$pip install intervaltree
$pip install pybedtools
$pip install statistics
$pip install pysam (actually pysam was already installed as a dependency of pybedtools)

3, must-have software installation with linuxbrew
$brew install pigz gawk gnu-sed graphviz makefile2graph miller samtools bedtools bwa

4, R and R package installation with conda (brew failed due to missing xml2, which I couldn't solve quickly)
$conda install R
$conda install -c r r-essentials ("ggplot2", "rmarkdown", "tidyverse" are all included)
$Rscript -e 'install.packages(c("uniqtag"), repos = c(CRAN = "https://cran.rstudio.com"))'

5, tigmint command line

test_1
$tigmint tigmint draft=barcoded_pilon_falcon6unzip reads=barcoded
make: *** No rule to make target `barcoded_pilon_falcon6unzip.barcoded.as0.65.nm5.molecule.tsv', needed by `tigmint'. Stop.
test_2
$tigmint tigmint draft=barcoded_pilon_falcon6unzip.fa reads=barcoded.fastq.gz
make: *** No rule to make target `barcoded_pilon_falcon6unzip.fa.barcoded.fastq.gz.as0.65.nm5.molecule.tsv', needed by `tigmint'. Stop.

no output

Just let me know if you need more information!

Best,
Qiushi

Tigmint's principle to break sequences

Hello,

I am looking at the output of Tigmint and I would like to better understand the rationale used to split sequences.
In the attached example, tigmint broke the scaffold at 965,520 (red vertical line). The MP data (green and gray lines) is also iffy, so I agree with splitting it. But why not break it at the nearby gap (black line)? Contig construction and gap closing are more reliable steps than scaffolding, so to me the modifications should happen at the gaps (when possible). Blue bars are repeats; red bars are genes.

Then I have some more general questions/considerations.
First, the output .bed file:

scaffold3872    0       965417  scaffold3872-1
scaffold3872    965417  965520  scaffold3872-2
scaffold3872    965520  3294149 scaffold3872-3

I see that the coordinates are 0-based, but why are some bases present on both broken scaffolds (positions 965417 and 965520 here)? It would be nice to have a 1-based, non-overlapping output to use for making subsets of the intervals (I mean when working manually on a subset of sequences).
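For reference, BED intervals are 0-based and half-open: the start base is included and the end coordinate is not. A line ending at 965417 and the next line starting at 965417 therefore do not share a base. This short sketch converts the three rows above to 1-based, closed coordinates; bed_to_one_based is an illustrative helper, not part of tigmint.

```python
from typing import Tuple


def bed_to_one_based(start: int, end: int) -> Tuple[int, int]:
    """Convert a 0-based, half-open BED interval [start, end) to 1-based, closed [first, last]."""
    return start + 1, end


rows = [
    ("scaffold3872", 0, 965417, "scaffold3872-1"),
    ("scaffold3872", 965417, 965520, "scaffold3872-2"),
    ("scaffold3872", 965520, 3294149, "scaffold3872-3"),
]

for chrom, start, end, name in rows:
    first, last = bed_to_one_based(start, end)
    # Length is the same in both systems: end - start == last - first + 1.
    print(f"{name}\t{chrom}:{first}-{last}\t{last - first + 1} bp")
```

It prints scaffold3872:1-965417, scaffold3872:965418-965520 and scaffold3872:965521-3294149, which do not overlap; the middle piece is 103 bp.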

Would it be possible to avoid having terminal Ns? For example, by moving tigmint's breakpoint to the nearest gap (if one is present within, say, 1 kb) and excluding gap-only contigs from the output? I am not sure whether this happens, but it would be a problem when submitting the broken scaffolds to NCBI, since they don't like terminal Ns.
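The gap-snapping idea could look roughly like the sketch below. It is a hypothetical illustration of the proposal, not tigmint behaviour: nearest_gap and cut_at_gap are invented names, a gap is assumed to be any run of N/n, and coordinates are 0-based like BED.

```python
import re
from typing import List, Optional, Tuple

# A gap is assumed to be any run of N/n characters.
GAP_RE = re.compile(r"[Nn]+")


def nearest_gap(seq: str, pos: int, max_dist: int = 1000) -> Optional[Tuple[int, int]]:
    """Return (gap_start, gap_end) of the N-run closest to pos within max_dist, else None."""
    best: Optional[Tuple[int, int]] = None
    best_dist = max_dist + 1
    for m in GAP_RE.finditer(seq):
        if pos < m.start():
            dist = m.start() - pos
        elif pos > m.end():
            dist = pos - m.end()
        else:
            dist = 0  # pos lies inside the gap or touches one of its ends
        if dist < best_dist:
            best, best_dist = (m.start(), m.end()), dist
    return best


def cut_at_gap(seq: str, pos: int, max_dist: int = 1000) -> List[str]:
    """Cut at the nearest gap (dropping the Ns) if one is close enough, else cut at pos."""
    gap = nearest_gap(seq, pos, max_dist)
    if gap is None:
        pieces = [seq[:pos], seq[pos:]]
    else:
        pieces = [seq[:gap[0]], seq[gap[1]:]]
    # Empty pieces (e.g. a gap at a sequence end) are discarded rather than output.
    return [p for p in pieces if p]
```

Dropping the gap itself means neither piece starts or ends with the Ns of that gap, and no gap-only contig is produced.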

Is it possible to have a parameter for minimum contig size? For example, with minctgsize=1000, if tigmint has evidence to break a scaffold at two positions 522 bp apart, break only one (the one with more support for a misassembly), or something like that. I often see many small contigs, even down to 1 bp!
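One way to express the proposed minimum-size rule is a greedy selection that keeps the best-supported breakpoints first. This is a sketch of the suggestion only; select_breakpoints, min_size and the numeric support score are hypothetical and do not correspond to tigmint options.

```python
from typing import List, Tuple


def select_breakpoints(length: int, candidates: List[Tuple[int, float]],
                       min_size: int = 1000) -> List[int]:
    """Keep breakpoints, strongest support first, so every resulting piece is >= min_size bp.

    candidates holds (position, support) pairs; a higher support value is assumed
    to mean stronger evidence of a misassembly.
    """
    kept: List[int] = []
    for pos, _support in sorted(candidates, key=lambda c: c[1], reverse=True):
        # Skip a cut that would leave a short piece at either end of the sequence.
        if pos < min_size or length - pos < min_size:
            continue
        # Skip a cut that would leave a short piece between it and an accepted cut.
        if all(abs(pos - k) >= min_size for k in kept):
            kept.append(pos)
    return sorted(kept)


if __name__ == "__main__":
    print(select_breakpoints(10_000, [(4_000, 5), (4_522, 9)], min_size=1_000))  # [4522]
```

With a 10,000 bp scaffold and candidates 522 bp apart at 4,000 (support 5) and 4,522 (support 9), only the better-supported 4,522 is kept.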

In the broken scaffolds, what do the lowercase first and last bases mean?
Thanks for the support,
Dario

Tigmint Updates

This issue is the Tigmint mailing list. I'll post news here about significant developments related to Tigmint. Subscribe to this issue if you'd like to receive these periodic updates, including notifications of new releases. This issue is locked to keep the volume low for subscribers.

recipe for target 'merged.barcoded.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.barcoded.sortn.bam' failed

tigmint-make fails after running for a day. I am not sure exactly how to deal with the error.

########################### run log #####################

[main] CMD: bwa mem -t50 -pC merged.fa barcoded.fq.gz
[main] Real time: 19582.030 sec; CPU: 744783.844 sec
[bam_sort_core] merging from 400 files and 50 in-memory blocks...
/usr/local/bin/tigmint-molecule -a0.65 -n5 -q0 -d50000 -s2000 merged.barcoded.sortbx.bam | sort -k1,1 -k2,2n -k3,3n >merged.barcoded.as0.65.nm5.molecule.size2000.bed
samtools faidx merged.fa
/usr/local/bin/tigmint-cut -p50 -w1000 -n20 -t0 -o merged.barcoded.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa merged.fa merged.barcoded.as0.65.nm5.molecule.size2000.bed
Started at: 2019-03-04 01:32:53.356125
Reading contig lengths...
Finding breakpoints...
Cutting assembly at breakpoints...

Tool: bedtools getfasta (aka fastaFromBed)
Version: v2.21.0
Summary: Extract DNA sequences into a fasta file based on feature coordinates.

Usage: bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf> -fo <fasta>

Options:
-fi Input FASTA file
-bed BED/GFF/VCF file of ranges to extract from -fi
-fo Output file (can be FASTA or TAB-delimited)
-name Use the name field for the FASTA header
-split given BED12 fmt., extract and concatenate the sequencesfrom the BED "blocks" (e.g., exons)
-tab Write output in TAB delimited format.
- Default is FASTA format.

    -s      Force strandedness. If the feature occupies the antisense,
            strand, the sequence will be reverse complemented.
            - By default, strand information is ignored.

    -fullHeader     Use full fasta header.
            - By default, only the word before the first space or tab is used.

DONE!
Ended at: 2019-03-04 01:55:48.834468
bwa index merged.barcoded.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa
[bwa_index] Pack FASTA... 0.00 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 0.00 seconds elapse.
[bwa_index] Update BWT... 0.00 sec
[bwa_index] Pack forward-only FASTA... 0.00 sec
[bwa_index] Construct SA from BWT and Occ... 0.00 sec
[main] Version: 0.7.10-r789
[main] CMD: bwa index merged.barcoded.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa
[main] Real time: 0.247 sec; CPU: 0.004 sec
bwa mem -t50 -pC merged.barcoded.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa barcoded.fq.gz | samtools view -@50 -h -F4 -o merged.barcoded.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.barcoded.sortn.bam
[M::main_mem] read 3610110 sequences (500000235 bp)...
bash: line 1: 2396 Segmentation fault bwa mem -t50 -pC merged.barcoded.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa barcoded.fq.gz
2397 Done | samtools view -@50 -h -F4 -o merged.barcoded.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.barcoded.sortn.bam
/usr/local/bin/tigmint-make:215: recipe for target 'merged.barcoded.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.barcoded.sortn.bam' failed
make: *** [merged.barcoded.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.barcoded.sortn.bam] Error 139
make: *** Deleting file 'merged.barcoded.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.barcoded.sortn.bam'
[ekw10@debruijn tigmint]$

samtools sort may be replaced by bamsort which scales better

The samtools sort may be replaced by bamsort from biobambam2 package which scales much better. See https://gitlab.com/german.tischler/biobambam2

yet another "make: *** No rule to make target" issue

Greetings,

First of all, sorry for opening another thread about previously reported issues, but after several attempts I still can't use tigmint. To be precise, I installed LongStitch with conda and created a dedicated environment for it, but I haven't been able to use it; that's why I decided to use tigmint alone, with no luck either.
This is the command I used (removing longmap=pb makes no difference):
tigmint-make tigmint longmap=pb draft=assembly1 reads=H1_1

And I get this message back:

make: *** No rule to make target 'assembly1.H1_1.sortbx.bam', needed by 'assembly1.H1_1.as0.65.nm5.molecule.size2000.dist50000.bed'. Stop.

From what I understand after reading similar issues, the problem was that the command was not being run in the working directory, which my Windows-trained, n00b bioinformatician brain understands as the directory where the files are. However, the problem still persists.

And this directory looks like this: (longstitch) salvatierra@salvatierra-FXZR:~/Documents/Secuencias/PacBio$

I would appreciate your help. Thanks in advance.
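Judging from the error messages in these issues and the sample usage (draft=myassembly for myassembly.fa, reads=myreads for myreads.fq.gz), tigmint-make appears to derive file names by appending .fa and .fq.gz to the draft and reads values and to look for them in the current directory. A hypothetical pre-flight check under that assumption (check_tigmint_inputs is an invented helper):

```python
import os
from typing import List


def check_tigmint_inputs(draft: str, reads: str, workdir: str = ".") -> List[str]:
    """Return problems with the draft= and reads= values (an empty list if none are found).

    Assumes the naming convention <draft>.fa and <reads>.fq.gz inside workdir.
    """
    problems = []
    for key, value, suffix in (("draft", draft, ".fa"), ("reads", reads, ".fq.gz")):
        if os.sep in value:
            problems.append(f"{key}={value} contains a directory; run from the file's "
                            "directory or symlink the file there")
        for ext in (".fa", ".fasta", ".fq.gz", ".fastq.gz"):
            if value.endswith(ext):
                problems.append(f"{key}={value} should be given without the {ext} extension")
        expected = os.path.join(workdir, value + suffix)
        if not os.path.exists(expected):
            problems.append(f"{value}{suffix} not found in {os.path.abspath(workdir)}")
    return problems


if __name__ == "__main__":
    for problem in check_tigmint_inputs("assembly1", "H1_1"):
        print(problem)
```

Under this assumption, the missing target assembly1.H1_1.sortbx.bam would be consistent with H1_1.fq.gz not being present in the directory where the command is run.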

brew install arcs failed

Hi,

I wanted to use ARCS to scaffold the corrected assemblies, but an error occurred:

$ brew tap Homebrew/science
Error: homebrew/science was deprecated. This tap is now empty as all its formulae were migrated.

$ brew install arcs
Error: No available formula with the name "arcs" 
==> Searching for a previously deleted formula (in the last month)...
Warning: homebrew/core is shallow clone. To get complete history run:
  git -C "$(brew --repo homebrew/core)" fetch --unshallow
Error: No previously deleted formula found.
==> Searching for similarly named formulae...
==> Searching local taps...
This similarly named formula was found:
darcs
To install it, run:
  brew install darcs
==> Searching taps...
==> Searching taps on GitHub...
Error: No formulae found in taps.

$ brew search links-scaffolder
==> Searching local taps...
==> Searching taps on GitHub...
==> Searching blacklisted, migrated and deleted formulae...
No formula found for "links-scaffolder".
Closed pull requests:
links-scaffolder 1.8.4 (https://github.com/Homebrew/homebrew-science/pull/4726)
links-scaffolder 1.6.1 (https://github.com/Homebrew/homebrew-science/pull/3391)
perl 5.26.0 (https://github.com/Homebrew/homebrew-core/pull/14849)
Update links-scaffolder.rb (https://github.com/Homebrew/homebrew-science/pull/2628)
links-scaffolder 1.5: Change sha256 (https://github.com/Homebrew/homebrew-science/pull/2498)
Update links-scaffolder.rb (https://github.com/Homebrew/homebrew-science/pull/2289)
Update links-scaffolder.rb (https://github.com/Homebrew/homebrew-science/pull/2239)
links-scaffolder 1.1 (new formula) (https://github.com/Homebrew/homebrew-science/pull/2015)
links 1.1: New formula (https://github.com/Homebrew/homebrew-science/pull/1988)

Can I still use brew to install arcs and links-scaffolder? Thanks!

Qiushi

tigmint-make ignores $PATH and is supposed to be run from unpacked source tree instead

Hi,
Surprisingly, it appears that one cannot properly install tigmint into a system-wide location, because some paths are hard-coded. Why can't we just rely on $PATH?

sh -c 'gunzip -c foo_PacBio_and_Nanopore.fq.gz | \
/usr/bin/tigmint_estimate_dist.py - -n 1000000 -o foo_PacBio_and_Nanopore.tigmint-long.params.tsv'
samtools faidx foo__abyss_106-long-scaffs.fa

gzip: stdout: Broken pipe
/usr/bin/../src/long-to-linked-pe -l 500 -m2000 -g6.8e9 -s -b foo_PacBio_and_Nanopore.barcode-multiplicity.tsv --bx -t16 --fasta -f foo_PacBio_and_Nanopore.tigmint-long.params.tsv foo_PacBio_and_Nanopore.fq.gz | \
minimap2 -y -t16 -x map-ont --secondary=no foo__abyss_106-long-scaffs.fa - | \
/usr/bin/tigmint_molecule_paf.py -q0 -s2000 -p foo_PacBio_and_Nanopore.tigmint-long.params.tsv - | sort -k1,1 -k2,2n -k3,3n > foo__abyss_106-long-scaffs.foo_PacBio_and_Nanopore.cut500.molecule.size2000.bed
zsh:1: no such file or directory: /usr/bin/../src/long-to-linked-pe
[M::mm_idx_gen::79.608*1.56] collected minimizers
[M::mm_idx_gen::83.467*2.21] sorted minimizers
[M::main::83.468*2.21] loaded/built the index for 3530326 target sequence(s)
[M::mm_mapopt_update::85.100*2.19] mid_occ = 1442
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 3530326
[M::mm_idx_stat::86.092*2.17] distinct minimizers: 67761996 (20.86% are singletons); average occurrences: 10.813; average spacing: 5.460

It seems make continues despite an error, because for some reason an extra sh -c is used in the recipe.
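One plausible mechanism, assuming the recipe's shell does not enable pipefail: a shell pipeline reports only the exit status of its last command, so a failed first stage (here the missing long-to-linked-pe) can go unnoticed when the final stage succeeds. The sketch below demonstrates this with bash via Python's subprocess module; exit_status is an illustrative helper.

```python
import subprocess


def exit_status(cmd: str) -> int:
    """Run cmd under bash and return the exit status of the whole command line."""
    return subprocess.run(["bash", "-c", cmd]).returncode


if __name__ == "__main__":
    # Without pipefail the status comes from the last stage, hiding the failing first stage.
    print(exit_status("false | true"))                   # 0
    # With pipefail the rightmost non-zero status is reported instead.
    print(exit_status("set -o pipefail; false | true"))  # 1
```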

Tigmint-cut not working correctly

Hi, I have been trying to run tigmint, but when I run tigmint-cut, the draft.tigmint.fa file comes out empty. I first tried it in a conda environment with tigmint installed, and since that didn't work, I downloaded it directly from source. Other than that, there is no error in stderr.

Started at: 2019-08-05 23:04:06.343181
Reading contig lengths...
Finding breakpoints...

Tool:    bedtools getfasta (aka fastaFromBed)
Version: v2.25.0
Summary: Extract DNA sequences into a fasta file based on feature coordinates.

Usage:   bedtools getfasta [OPTIONS] -fi <fasta> -bed <bed/gff/vcf> -fo <fasta> 

Options: 
	-fi	Input FASTA file
	-bed	BED/GFF/VCF file of ranges to extract from -fi
	-fo	Output file (can be FASTA or TAB-delimited)
	-name	Use the name field for the FASTA header
	-split	given BED12 fmt., extract and concatenate the sequencesfrom the BED "blocks" (e.g., exons)
	-tab	Write output in TAB delimited format.
		- Default is FASTA format.

	-s	Force strandedness. If the feature occupies the antisense,
		strand, the sequence will be reverse complemented.
		- By default, strand information is ignored.

	-fullHeader	Use full fasta header.
		- By default, only the word before the first space or tab is used.

Cutting assembly at breakpoints...
DONE!
Ended at: 2019-08-05 23:50:22.002443

pip installation corrupt

When installing tigmint via pip3 install tigmint, the shebangs of the Python scripts are hard-coded (and obviously do not work on every system):

$ head -1 /usr/bin/tigmint*
==> /usr/bin/tigmint <==
#!/bin/sh

==> /usr/bin/tigmint-arcs-tsv <==
#!/usr/local/opt/python/bin/python3.7

==> /usr/bin/tigmint-cut <==
#!/usr/local/opt/python/bin/python3.7

==> /usr/bin/tigmint-make <==
#!/usr/bin/make -rRf

==> /usr/bin/tigmint-molecule <==
#!/usr/local/opt/python/bin/python3.7
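As a local workaround only (not a fix to tigmint's packaging), the first line of each affected script could be rewritten to use env. A minimal sketch; portable_shebang is a hypothetical helper, and it assumes the scripts are Python 3:

```python
import re

# A shebang line whose interpreter path ends in python, python3, python3.7, ...
SHEBANG_RE = re.compile(r"^#!\s*\S*python[0-9.]*\b.*$")


def portable_shebang(text: str) -> str:
    """Replace a hard-coded Python shebang on the first line with '#!/usr/bin/env python3'."""
    first, sep, rest = text.partition("\n")
    if SHEBANG_RE.match(first):
        return "#!/usr/bin/env python3" + sep + rest
    return text  # non-Python shebangs such as '#!/bin/sh' are left alone


if __name__ == "__main__":
    print(portable_shebang("#!/usr/local/opt/python/bin/python3.7\nimport sys\n"), end="")
```

Scripts with other interpreters, such as tigmint (#!/bin/sh) and tigmint-make (#!/usr/bin/make -rRf), are returned unchanged.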

samtools sort error (sort: invalid option -- 't')

Problem solved: just use the default brew-installed version, 1.7.

gsed: command not found

I encountered an error in the tigmint-arcs pipeline while running the gsed command. I attempted to install it using brew install gnu-sed, but that failed with the error "gnu-sed cannot be built with any available compilers", along with another error.
Does it make a difference to run sed instead of gsed?

tigmint-make tigmint error

Hello, I ran this command:
tigmint-make tigmint draft=/ontas_racon1_pilon1 reads=/10x

The error message is:
make: *** No rule to make target '/ontas_racon1_pilon1./10x.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa', needed by 'tigmint'. Stop.

Could you please tell me how to solve this problem? Thank you very much.
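The target in the message appears to be built by joining the draft and reads values with dots and appending the parameter suffixes, so path prefixes such as / end up in the middle of the name. The sketch below reconstructs that pattern from the error messages in these issues (not from the Makefile); tigmint_target and its keyword names are hypothetical, and the defaults are copied from the file names shown.

```python
def tigmint_target(draft: str, reads: str, as_: float = 0.65, nm: int = 5, size: int = 2000,
                   trim: int = 0, window: int = 1000, span: int = 20) -> str:
    """Rebuild the linked-read tigmint target name pattern seen in the error messages."""
    return (f"{draft}.{reads}.as{as_}.nm{nm}.molecule.size{size}"
            f".trim{trim}.window{window}.span{span}.breaktigs.fa")


if __name__ == "__main__":
    # Reproduces the name from the error above: the "/" prefixes land mid-name.
    print(tigmint_target("/ontas_racon1_pilon1", "/10x"))
```

Passing bare names instead (draft=ontas_racon1_pilon1 reads=10x), from the directory that holds ontas_racon1_pilon1.fa and 10x.fq.gz, gives a name with no directory component.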

Forward+reverse + long reads

Hi there!
I'm reading the documentation and I have a few doubts.

I have 10X linked reads (forward + reverse) and PacBio + ONT long-read files.

What would be the best choice to run first: Tigmint or Tigmint-Long?
If tigmint is the choice, should I pre-align each read file in a separate round of BWA and then sort and merge them?

Thanks in advance!

tigmint-molecule

Hi,

I'm testing tigmint following your instructions. So far I have executed the following commands:

bwa index $ASSEMBLY.fasta
bwa mem -t$THREADS -C $ASSEMBLY.fasta $R1 $R2 | samtools sort -@8 -tBX -o $ALIGNMENT.bam
tigmint-molecule $ALIGNMENT.sam | sort -k1,1 -k2,2n -k3,3n > $ALIGNMENT.bed

I also tried to convert the bam file into a sam and execute tigmint-molecule again. It always gave me a syntax error:

File "/home/fc464/software/tigmint/bin/tigmint-molecule", line 46
sep="\t", file=file)
       ^
SyntaxError: invalid syntax

This is how my FASTQ files are formatted:

@D00352:461:CCYW4ANXX:2:1101:16223:19699_AAAAAAAAACCAGAAA
TAACTAAGAATTCGAAAGAAGATTCGAACTCGCGCCTCCTGAATACCGTCCGGGCGCTCTCACCACTAAGCCATGCGTTCTACTACAAGCTGCGTCGAAATT
+
BFB<<FF<F/F///<FF/FF/<<<//</</<B<<B/BF<F<F/<<FF////F///7<BF</<B7BF//<BFF<B//7B<B/7B/B/BB/B/BF/////B7F<
@D00352:461:CCYW4ANXX:3:1205:1151:93622_AAAAAAAAAGTACCAA
GAAAAACAAAAAAAAAACGAACTACTTTTATCAGATAACATTGTTCTTTGAGCACATTTACAAAATAGCGATTTCATTTCAAAAAAACTAATAATTCATTGT
+
BFBF/FFFFBFFFFFFF<//<B/FF/<F/F<////BFFFFB//</<FFFF<<BFBFFFBFFFFFFFFFB//B/7/FB/<//7BFBBBB7FB<FFFFB7B/77
@D00352:461:CCYW4ANXX:4:2307:1605:2891_AAAAAAAACAATTCCA
ATAGAGGATGAAAGTGGCAGTTCACGTGGCGAAGCCGCGAGCGGGTGGCTAGTAAACAATAAGCGAGTTATCTCACCAGGTAAAATGTTGAAACATGATTCA
+
FFFFFFFFFFFFFFFBFBF<<F/FFB/F/FFFFFFFBFFFFFFF<<BBFFFFFFFFFFFFFFFFFBFBFFFFFFBFFFFFFFFFFFBFFBFFFFFBB/FFFF

This is the same format in which ARCS takes the reads.

What am I doing wrong? Is there a problem with my pysam?

Thanks
F

Using MinION reads

Hi Shaun,

I am Raymond, a student of Robert's. Robert asked you on Twitter whether it is possible to run tigmint with MinION reads, and you suggested: "I haven't tested it, but you ought to be able to use Tigmint with long reads. Map reads to the assembly with e.g. minimap2, convert the PAF file to BED, then run tigmint-cut." I have successfully run tigmint-cut, but could you give me some hints on how to run the "scaffold" step after correcting the misassemblies?

Thank you very much,
Raymond
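The suggested route can be sketched as a PAF-to-BED converter in Python. Assumptions: each long read is treated as one molecule, and the five-column layout (target, start, end, read name, read count of 1) mirrors the molecule BED files shown elsewhere in these issues; whether tigmint-cut accepts it unchanged is not confirmed here. paf_to_bed is a hypothetical name.

```python
from typing import Iterable, Iterator


def paf_to_bed(lines: Iterable[str], min_mapq: int = 0) -> Iterator[str]:
    """Convert PAF alignments to BED-like lines: target, start, end, read name, 1."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip lines without the 12 mandatory PAF columns
        qname, tname = fields[0], fields[5]
        tstart, tend, mapq = int(fields[7]), int(fields[8]), int(fields[11])
        if mapq < min_mapq:
            continue
        # PAF target coordinates are 0-based with an exclusive end, as in BED.
        yield f"{tname}\t{tstart}\t{tend}\t{qname}\t1"


if __name__ == "__main__":
    example = "read1\t5000\t10\t4990\t+\tctg1\t100000\t2000\t6980\t4800\t4980\t60\ttp:A:P\n"
    print(*paf_to_bed([example]), sep="\n")
```

The output would still need sorting, e.g. with sort -k1,1 -k2,2n -k3,3n as in the tigmint-molecule commands above.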

"No rule to make target"

Hi,
After successfully running longranger:
longranger basic --id=4606 --fastqs=input_fastqs --localcores=4 --localmem=30
I am running tigmint-span but I am getting this error:

[copettid@kp141-242 tigmint]$ ~/bin/tigmint/tigmint-make tigmint
make: *** No rule to make target 'draft.reads.as0.65.nm5.molecule.tsv', needed by 'tigmint'.  Stop.

I had to rename the .fastq.gz file to .fq.gz, and the rest seems fine to me:

[copettid@kp141-242 tigmint]$ ~/bin/tigmint/tigmint-make help
Tigmint: Correct misassemblies using linked reads
Usage: tigmint-make [COMMAND]... [PARAMETER=VALUE]...
Example: tigmint-make tigmint draft=myassembly reads=myreads
For more information see https://bcgsc.github.io/tigmint/
[copettid@kp141-242 tigmint]$ ls
4606  input_fastqs  Rabiosa_genome.fa
[copettid@kp141-242 tigmint]$ ls 4606/outs/
barcoded.fq.gz  summary.csv

I am wondering whether that .tsv file should have been created at some point, or what else could be causing this issue.
Thanks,
Dario

pigz may be better replaced by bgzip

pigz may be replaced by bgzip from htslib package from http://www.htslib.org which scales better

samtools/samtools#1318 (comment)

Tigmint only works when files are in working directory

Tigmint was giving some issues with file names before (CentOS 7):

tigmint tigmint draft=supernova_draft/fasta/supernovadraft.1 reads=chromium_data/marmot_nochrombars t=32
bwa mem -t32 -pC supernova_draft/fasta/supernovadraft.1.fa chromium_data/marmot_nochrombars.fq.gz | samtools view -u -F4 | samtools sort -@32 -tBX -T$(mktemp -u -t supernova_draft/fasta/supernovadraft.1.chromium_data/marmot_nochrombars.sortbx.bam.XXXXXX) -o supernova_draft/fasta/supernovadraft.1.chromium_data/marmot_nochrombars.sortbx.bam
mktemp: invalid template, ‘supernova_draft/fasta/supernovadraft.1.chromium_data/marmot_nochrombars.sortbx.bam.XXXXXX’, contains directory separator
[bwt_restore_bwt] Failed to allocate 18446744073709551576 bytes at bwt.c line 452: Cannot allocate memory
[E::hts_open_format] Failed to open file supernova_draft/fasta/supernovadraft.1.chromium_data/marmot_nochrombars.sortbx.bam
samtools sort: can't open "supernova_draft/fasta/supernovadraft.1.chromium_data/marmot_nochrombars.sortbx.bam": No such file or directory
make: *** [supernova_draft/fasta/supernovadraft.1.chromium_data/marmot_nochrombars.sortbx.bam] Error 1

This was resolved by making a separate directory, symlinking the reads and draft files into the directory and running tigmint from there:

mkdir tigmint_stuff
cd tigmint_stuff
ln -s ../chromium_data/marmot_nochrombars.fq.gz marmot_basic.fq.gz
ln -s ../supernova_draft/fasta/supernovadraft.1.fa supernova1.fa
tigmint tigmint draft=supernova1 reads=marmot_basic

Is it possible to support running tigmint where the files are not located within the working directory?

About samtools invalid option in tigmint-make arcs mode

Dear Tigmint team:

I installed Tigmint using conda. When I run tigmint-make arcs with 10x linked reads, samtools reports invalid options.

Could you give me some suggestions for solving the problem?

Thank you very much.

(Python3_6) [hsiang@pomology-serverIII 01_scaffolding_by_wgs]$ tigmint-make arcs draft=myassembly reads=myreads
bwa mem -t8 -pC myassembly.fa myreads.fq.gz | samtools view -u -F4 | samtools sort -@8 -tBX -T$(mktemp -u -t myassembly.myreads.sortbx.bam.XXXXXX) -o myassembly.myreads.sortbx.bam
sort: invalid option -- 't'
sort: invalid option -- 'B'
sort: invalid option -- 'X'
sort: invalid option -- 'T'
sort: invalid option -- '/'
sort: invalid option -- 't'

Usage: samtools sort [options] <in.bam> <out.prefix>
Options: -n sort by read name
-f use <out.prefix> as full file name instead of prefix
-o final output to stdout
-l INT compression level, from 0 to 9 [-1]
-@ INT number of sorting and compression threads [1]
-m INT max memory per thread; suffix K/M/G recognized [768M]

Usage: samtools view [options] <in.bam>|<in.sam> [region1 [...]]

Options: -b output BAM
-h print header for the SAM output
-H print header only (no alignments)
-S input is SAM
-u uncompressed BAM output (force -b)
-1 fast compression (force -b)
-x output FLAG in HEX (samtools-C specific)
-X output FLAG in string (samtools-C specific)
-c print only the count of matching records
-B collapse the backward CIGAR operation
-@ INT number of BAM compression threads [0]
-L FILE output alignments overlapping the input BED FILE [null]
-t FILE list of reference names and lengths (force -S) [null]
-T FILE reference sequence file (force -S) [null]
-o FILE output file name [stdout]
-R FILE list of read groups to be outputted [null]
-f INT required flag, 0 for unset [0]
-F INT filtering flag, 0 for unset [0]
-q INT minimum mapping quality [0]
-l STR only output reads in library STR [null]
-r STR only output reads in read group STR [null]
-s FLOAT fraction of templates to subsample; integer part as seed [-1]
-? longer help

[M::bwa_idx_load_from_disk] read 0 ALT contigs
make: *** [/home/hsiang/miniconda3/envs/Python3_6/bin/share/tigmint-1.2.4-0/bin/tigmint-make:191: myassembly.myreads.sortbx.bam] Error 1

tigmint-molecule silent

Hi @lcoombe,

I changed clusters and had to reinstall the software. Now I'm testing whether the new environment works. I mapped my linked reads to the assembly using

bwa mem -t$THREADS -C $ASSEMBLY.fasta -p $READS | samtools sort -@$THREADS -tBX -o $ASSEMBLY.LinkedReads.sortbx.bam
The first possible problem is that bwa mem doesn't recognise the paired reads, and I think it maps them as single-end reads.

###### Mon Oct 14 14:54:33 BST 2019: Mapping linked-reads with BWA onto AvanCR.Tigmint.test
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 722022 sequences (100000047 bp)...
[M::process] 722022 single-end sequences; 0 paired-end sequences

My read headers look like this:

@A00618:19:HHCTMDMXX:2:1445:7048:35712/1_AAAAAAAAAAAAAGGA
TAAAGAAAATTGGAGGGTACGGTATCAATCTCGTTAGACTTTTAAGATTTATAGGAAAAGAATTGAAGAAGTTGAAGATAAATTAGGAAAAGGACCTGTTATAGGACATTGAAAGGGTATTAGATCG
+
FFFF,,,F,FFF,,,,FF,F:,:F:F,:,,,,FF,F,,,F:FF,,F,,,,FFFF,,,:F:F:FFF::FF:F:::FF,:FFFFFFFF,:,:,:F:,,,F,F,FFFF:FF,FFF,F,FFF:FF:FFFF:
@A00618:19:HHCTMDMXX:2:1445:7048:35712/2_AAAAAAAAAAAAAGGA
AATACAATTTAAATTAACTATAACAATTCCAATTCCTAATATATAGAAAACTTCATCAATTAAAAAACTATAAAACAAAAAATTCAAAAAAAAATTATAAACAACAACAAAATTTTCTATCGATCATATCCTTTTAATAAATTTAAAACA
+
,,F:F,,:,FF::,,::,,:FF:F,:F,F,FF:,::,FF,F,FF,,,,:F,::::,,::,:,,:,,FFF::::FF,F:F:::F,:,F,:,F,FFF,:FF:,,,::FF,F,FF,FFFF,,::F,F,F:,F,:FF,,,,F,,,,:,F,,F,,
@A00618:19:HHCTMDMXX:2:2459:29776:36526/1_AAAAAAAAAAAAAGGA
GAAAAGAAATAAGTTGGGTTTGATTATTTTATTTTTTGATTTTTGTTTATTATATGGTTATGGTTAAATTATTTTTTTTAATTTTTATTTTTTTATTTGTAAAAGAAAATATTTTTTGATATTATGT
+
FFFFFFFF:FF:FF:FFF:F:FF:FFFFFFFF,FF,F,FF::FFFF,FFFFFFF:FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFF:FFFFFF:FF:F:FF:,::F,FF
@A00618:19:HHCTMDMXX:2:2459:29776:36526/2_AAAAAAAAAAAAAGGA
CCTAAAAAAATAACTAAAAAATCTAAAAATAATATATATTTTCTATTATAAAATTCTACGTAAAATAACAACTACTAATATAACCTCTTTATAATTTTCAATACACTTCAACAAACATTACCCCATTCAATCCTCACAACAACCCTACAT
+
:FFFFFFFFFFFFFFFFFFFFF,F,FF,FFFFF:F:FFFFFFFF,FFFFF:F:FFFF:FFF:FF:F:FFFFF,FFF:FFFFF:FFFFFF,F:F,::FFFFFFF,FF,F:,FFFFFFFFFFFFFFFFFFF:FF,FFFFF,F:FFFFF,FF:
@A00618:19:HHCTMDMXX:2:2266:17074:12493/1_AAAAAAAAAAAAATTC
ATTGTTATTATAATAGTATAAAATAATAATATTAATAGTCAAATATTTTATAATAAAATAAACAATGAATTATTAGTGAAAAAAAAAATATAATTGTTATTAAATTAAAATAAAAAATATATTAAAA
+
FFFF:,:FFFF:,FFF,:F,F,F:F,,:FF,FFF:F,FF,F,FFFFFFFF,,F,FF,,,,FF,:,FFF:FF,F,:,FF,F,FFF,F,FFFFFF,F,FF:,FFF,FF:FF::,FFFF,:F:::,FFF,
@A00618:19:HHCTMDMXX:2:2266:17074:12493/2_AAAAAAAAAAAAATTC
CTCTTTAAAAAATTTTCTATCTACAACCATAAAAATTCCAAAATTAATCAATATAACTACTATTTATATAAACAAATCTTTCTAACGCTAAATAATCACTTCACACACTAATCACACTACACACCACACTAATATCACATATTATAAAAC
+
F:F,::FF:F,FF:,F,,FF,FF,,FFF:,FF:FF,,F:F,FFFFFFF,,:,F,:F:,F:FFFFFFFFFF,,FFF,F,F:F:,FFF:,:,,,,,FFF:F:FF,FFF,F,F,,,,FF:,,:,FF,FFF:F:F,FFFF:::FFF,FF::,,:

Maybe the /1 and /2 are the problem.

But I also tried:

bwa mem -t$THREADS -C $ASSEMBLY.fasta $READ1 $READ2 | samtools sort -@$THREADS -tBX -o $ASSEMBLY.LinkedReads.sortbx.bam
and the mapping seems fine.

After that I execute:

tigmint-molecule $ASSEMBLY.LinkedReads.sortbx.bam | sort -k1,1 -k2,2n -k3,3n

but everything is silent.
What am I doing wrong?
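If the barcode is only present as a _BARCODE suffix on the read name, the mates get different names (.../1_... versus .../2_...), so bwa mem -p presumably cannot pair them, which would be consistent with the 0 paired-end sequences in the log. A sketch of rewriting such headers into a name plus a BX:Z: comment, assuming tigmint reads barcodes from the BX tag that bwa mem -C copies from the FASTQ comment; to_bx_header and rewrite_fastq are hypothetical helpers.

```python
import re
from typing import Iterable, Iterator, Optional

# NAME, an optional /1 or /2 mate suffix, then "_" and the barcode at the very end.
HEADER_RE = re.compile(r"^@(?P<name>\S+?)(?P<mate>/[12])?_(?P<bx>[ACGTN]+)$")


def to_bx_header(header: str) -> Optional[str]:
    """Rewrite '@NAME[/1|/2]_BARCODE' as '@NAME BX:Z:BARCODE', or return None if it doesn't match."""
    m = HEADER_RE.match(header.rstrip("\n"))
    if m is None:
        return None
    # Dropping the mate suffix gives both mates the same name.
    return f"@{m.group('name')} BX:Z:{m.group('bx')}"


def rewrite_fastq(lines: Iterable[str]) -> Iterator[str]:
    """Rewrite the header of every 4-line FASTQ record, leaving other lines untouched."""
    for i, line in enumerate(lines):
        if i % 4 == 0:
            new = to_bx_header(line)
            yield (new if new is not None else line.rstrip("\n")) + "\n"
        else:
            yield line if line.endswith("\n") else line + "\n"
```

The same rewrite would also cover the NAME_BARCODE headers without /1 or /2, as in the earlier tigmint-molecule issue.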

Does bin/tigmint_estimate_dist.py really work with FASTA files as well?

The get_n_read_lengths() function seems to always iterate over 4 lines per read, so unless I am misreading the code, it would skip half of the first N reads from a FASTA file.

BTW, the if read_count >= num_reads condition should be moved up and the equality check dropped (resulting in if read_count > num_reads). Checks for equality are much more expensive than "larger than" and "smaller than" comparisons.
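A format-aware alternative, sketched below, inspects the first character to decide between FASTA (multi-line records allowed) and 4-line FASTQ. This is not tigmint's get_n_read_lengths(); read_lengths and first_n_read_lengths are hypothetical names.

```python
import io
from itertools import islice
from typing import Iterator, List, TextIO


def read_lengths(handle: TextIO) -> Iterator[int]:
    """Yield read lengths from FASTA (sequence may span lines) or 4-line FASTQ."""
    header = handle.readline()
    if header.startswith("@"):
        while header:
            seq = handle.readline().strip()
            handle.readline()  # '+' separator line
            handle.readline()  # quality line
            yield len(seq)
            header = handle.readline()
    elif header.startswith(">"):
        length = 0
        for line in handle:
            if line.startswith(">"):
                yield length  # finished the previous record
                length = 0
            else:
                length += len(line.strip())
        yield length  # last record


def first_n_read_lengths(handle: TextIO, n: int) -> List[int]:
    """Return the lengths of at most the first n reads."""
    return list(islice(read_lengths(handle), n))


if __name__ == "__main__":
    print(first_n_read_lengths(io.StringIO(">r1\nACGT\nAC\n>r2\nA\n"), 10))  # [6, 1]
```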

Feature has length = 0, Skipping - followed by empty output from tigmint-long

After tinkering with the exact filename extensions of my input ONT FASTQ reads and even the Wengan hybrid assembly FASTA so that they had the extensions ".fa" and ".fq.gz" (".fasta" and ".fastq.gz" did not work, as mentioned in issue 116), I was able to get tigmint-long to run. However, the primary output file is 0 B, and there was an error message about 3 features having length = 0 (output copied below, starting from the last 3 lines of the minimap2 progress updates):

[M::worker_pipeline::2756.144*8.02] mapped 1041921 sequences
[M::worker_pipeline::2776.803*8.02] mapped 1039793 sequences
[M::worker_pipeline::2785.275*8.01] mapped 507341 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: minimap2 -y -t8 -x map-ont --secondary=no ../TransformedWenganOutputs/SortedPolishedWengan.fa -
[M::main] Real time: 2785.639 sec; CPU: 22318.370 sec; Peak RSS: 16.040 GB
samtools faidx ../TransformedWenganOutputs/SortedPolishedWengan.fa
/home/ext_sam_keating_gmail_com/micromamba/bin/share/tigmint-1.2.9-1/bin/tigmint-cut -p8 -w1000 -t0 -m3000 -f SAMPLE.tigmint-long.params.tsv -o ../TransformedWenganOutputs/SortedPolishedWengan.SAMPLE.cut500.molecule.size2000.distauto.trim0.window1000.spanauto.breaktigs.fa ../TransformedWenganOutputs/SortedPolishedWengan.fa ../TransformedWenganOutputs/SortedPolishedWengan.SAMPLE.cut500.molecule.size2000.distauto.bed
Started at: 2023-04-19 09:10:09.389024
Reading contig lengths...
Finding breakpoints...
Feature (WSC25979:429699-429699) has length = 0, Skipping.
Feature (WSC4271:3661334-3661334) has length = 0, Skipping.
Feature (WSC31873:85925-85925) has length = 0, Skipping.
Attempted corrections: 9841
Cutting assembly at breakpoints...
DONE!
Ended at: 2023-04-19 09:13:29.093514
ln -sf ../TransformedWenganOutputs/SortedPolishedWengan.SAMPLE.cut500.molecule.size2000.distauto.trim0.window1000.spanauto.breaktigs.fa ../TransformedWenganOutputs/SortedPolishedWengan.cut500.tigmint.fa

Scaffolds output tigmint-arcs error

Dear Shaun and Lauren,

Thanks to your advice concerning the tigmint-cut update, I was able to make tigmint work and locate possible misassemblies in my scaffold file. However, when trying to re-scaffold my broken assembly with tigmint arcs, I get an error message when the program tries to write the final scaffold file:

stderr_arcs.txt

Nevertheless, the run output seems fine:

stdout_arcs.txt

Any idea what might have caused this error? I can't seem to figure it out.

This might be related to the LINKS dependency, as brew install links-scaffolder was not available and I had to install the dependencies of ARCS individually (https://github.com/bcgsc/arcs).

brew.txt

Any advice is highly appreciated.

Kind regards,

Jordi de Raad

tigmint-long error

Hi there,

I seem to have problems running tigmint-long (installed via conda) on a draft assembly using my nanopore reads.

I get this weird error about the BED file being malformed. Since this file is not an input from my side, the problem seems to be introduced by tigmint itself. I am not sure which BED file the error refers to, but the one BED file (.cut500.as0.65.nm500.molecule.size2000.bed) has no such malformation at line 33:

contig1006_shasta 0 3935 978453 5
contig1006_shasta 0 4951 125221 8
contig1006_shasta 74 5598 522996 8
contig1006_shasta 92 5686 498725 8
contig1006_shasta 194 5672 394040 7
contig1006_shasta 245 4638 80577 6
contig1006_shasta 254 5649 1035475 7
contig1006_shasta 293 5684 573992 7
contig1006_shasta 349 5718 793715 6
contig1006_shasta 399 5761 343613 9
contig1006_shasta 417 4915 576630 5
contig1006_shasta 428 4905 389381 6
contig1006_shasta 433 4944 414072 5
contig1006_shasta 437 4463 38003 6
contig1006_shasta 451 4952 269440 5
contig1006_shasta 463 4950 75560 5
contig1006_shasta 536 4544 882874 6
contig1006_shasta 543 5544 1040578 7
contig1006_shasta 593 5590 202003 6
contig1006_shasta 672 5688 861879 6
contig1006_shasta 802 5761 641545 7
contig1006_shasta 837 5761 625202 7
contig1006_shasta 843 4891 817031 4
contig1006_shasta 986 4013 839224 4
contig1006_shasta 1231 4722 13220 4
contig1006_shasta 1288 5761 954350 5
contig100_shasta 0 3884 777567 5
contig100_shasta 0 4337 828621 4
contig100_shasta 0 4391 351030 8
contig100_shasta 0 8531 658954 8
contig100_shasta 0 8744 644223 9
contig100_shasta 0 9100 2106 11
contig100_shasta 0 11316 459022 18
contig100_shasta 0 13326 162799 18
contig100_shasta 0 13473 322356 16

Here is the commandline output:
samtools faidx draft.fa
/home/ek/.conda/envs/tigmint/bin/tigmint-cut -p30 -w1000 -t100 -f Bterter.recalled-dna_r9.4.1_450bps_sup.prowler.filtlong.10kb.Q95.tigmint-long.params.tsv -o draft.Bterter.recalled-dna_r9.4.1_450bps_sup.prowler.filtlong.10kb.Q95.cut500.as0.65.nm500.molecule.size2000.trim100.window1000.spanauto.breaktigs.fa draft.fa draft.Bterter.recalled-dna_r9.4.1_450bps_sup.prowler.filtlong.10kb.Q95.cut500.as0.65.nm500.molecule.size2000.bed
Started at: 2021-06-16 19:12:42.185574
Reading contig lengths...
Finding breakpoints...
Cutting assembly at breakpoints...
Error: malformed BED entry at line 33. Start was greater than end. Exiting.
DONE!
Ended at: 2021-06-16 19:22:20.750825

Any ideas what the problem may be?

Thanks in advance
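To pinpoint which entry triggers "Start was greater than end", the BED files in the working directory can be scanned for rows whose start exceeds their end. The following is a minimal Python sketch with a hypothetical helper name (find_malformed_bed_entries), not tigmint-cut's own check. It assumes whitespace-separated columns with the contig name, start and end in the first three fields, and it counts line numbers from the top of the file, which may not match how tigmint-cut counts them.

```python
import sys
from typing import Iterable, List, Tuple


def find_malformed_bed_entries(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, line) for BED entries that look malformed.

    An entry is reported when its start exceeds its end, or when the start/end
    columns are missing or not integers. Blank, comment, track and browser lines
    are skipped but still counted, so numbers match the file's own line numbers.
    """
    bad = []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields or fields[0].startswith(("#", "track", "browser")):
            continue
        try:
            start, end = int(fields[1]), int(fields[2])
        except (IndexError, ValueError):
            # Fewer than three columns, or non-numeric coordinates.
            bad.append((lineno, line.rstrip("\n")))
            continue
        if start > end:
            bad.append((lineno, line.rstrip("\n")))
    return bad


if __name__ == "__main__":
    # Usage: python check_bed.py file1.bed [file2.bed ...]
    for path in sys.argv[1:]:
        with open(path) as handle:
            for lineno, line in find_malformed_bed_entries(handle):
                print(f"{path}:{lineno}: {line}")
```

Running it as python check_bed.py *.bed (the script name is arbitrary) prints path:line: entry for each offending row, which helps narrow down which of the generated files the error message refers to.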

Unknown error preventing tigmint-make across multiple installation methods

Hi all,
Thanks for all your work developing this tool; it looks extremely useful!
I'd really love to get this tool up and running on my workstation. However, I receive the following error no matter which installation method, genome version, etc. I try:

./bin/tigmint-make tigmint draft=draft.fasta reads=barcoded.fastq.gz
"make: *** No rule to make target 'draft.fasta.barcoded.fastq.gz.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa', needed by 'tigmint'. Stop."

Specs: Ubuntu 16.04, Intel Xeon (48 threads), 128 GB RAM, 8 GB graphics card... and a cool mouse?
Any guidance would be greatly appreciated!
Best,
bjp

tigmint_molecule_paf.py: TypeError: expected string or bytes-like object

Does tigmint-make tigmint-long support FASTQ reads from platforms other than 10x Genomics Chromium?

$ bash -x tigmint.sh SCRATCH=/scratch/mmokrejs/job_3024675.cerit-pbs.cerit-sc.cz TMPDIR=/scratch/mmokrejs/job_3024675.cerit-pbs.cerit-sc.cz SORT_OPTS='-S 1G'
+ '[' -z '' ']'
+ threads=14
+ myreads=foo_PacBio_and_Nanopore.fq.gz
+ for f in foo__abyss_*long-scaffs.fa
++ basename foo__abyss_106-long-scaffs.fa .fa
+ p=foo__abyss_106-long-scaffs
+ echo 'tigmint-make tigmint-long draft=foo__abyss_106-long-scaffs.fa reads=foo_PacBio_and_Nanopore.fmlrc2.fa.gz span=auto G=6.8e9 dist=auto'
tigmint-make tigmint-long draft=foo__abyss_106-long-scaffs.fa reads=foo_PacBio_and_Nanopore.fmlrc2.fa.gz span=auto G=6.8e9 dist=auto
++ basename foo__abyss_106-long-scaffs.fa .fa
++ basename foo_PacBio_and_Nanopore.fq.gz .fq.gz
+ tigmint-make tigmint-long draft=foo__abyss_106-long-scaffs reads=foo_PacBio_and_Nanopore longmap=ont span=auto G=6.8e9 dist=auto t=14
long-to-linked-pe -l 500 -m2000 -g6.8e9 -s -b foo_PacBio_and_Nanopore.barcode-multiplicity.tsv --bx -t14 --fasta -f foo_PacBio_and_Nanopore.tigmint-long.params.tsv foo_PacBio_and_Nanopore.fq.gz | \
minimap2 -y -t14 -x map-ont --secondary=no foo__abyss_106-long-scaffs.fa - | \
tigmint_molecule_paf.py -q0 -s2000 -p foo_PacBio_and_Nanopore.tigmint-long.params.tsv - | sort -k1,1 -k2,2n -k3,3n -T /scratch/mmokrejs/job_3024675.cerit-pbs.cerit-sc.cz -S 1G > foo__abyss_106-long-scaffs.foo_PacBio_and_Nanopore.cut500.molecule.size2000.bed
long-to-linked-pe v1.0: Using more than 6 threads does not scale, reverting to 6.
[M::mm_idx_gen::84.761*1.60] collected minimizers
[M::mm_idx_gen::89.593*2.24] sorted minimizers
[M::main::89.593*2.24] loaded/built the index for 3530326 target sequence(s)
[M::mm_mapopt_update::91.203*2.22] mid_occ = 1442
[M::mm_idx_stat] kmer size: 15; skip: 10; is_hpc: 0; #seq: 3530326
[M::mm_idx_stat::92.061*2.21] distinct minimizers: 67761996 (20.86% are singletons); average occurrences: 10.813; average spacing: 5.460
Traceback (most recent call last):
  File "/usr/lib/python-exec/python3.9/tigmint_molecule_paf.py", line 141, in <module>
    main()
  File "/usr/lib/python-exec/python3.9/tigmint_molecule_paf.py", line 138, in main
    MolecIdentifierPaf().run()
  File "/usr/lib/python-exec/python3.9/tigmint_molecule_paf.py", line 98, in run
    self.print_new_molecule(prev_barcode, cur_intervals, out_molecules_file)
  File "/usr/lib/python-exec/python3.9/tigmint_molecule_paf.py", line 43, in print_new_molecule
    barcode_match = re.search(r'^BX:Z:(\S+)', barcode)
  File "/usr/lib/python3.9/re.py", line 201, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object
^Cmake: *** Deleting file `foo__abyss_106-long-scaffs.foo_PacBio_and_Nanopore.cut500.molecule.size2000.bed'
time user=64.51s system=53.79s elapsed=565.02s cpu=20% memory=4 job=long-to-linked-pe -l 500 -m2000 -g6.8e9 -s -b  --bx -t14 --fasta -f
time user=807.08s system=66.89s elapsed=151.51s cpu=576% memory=40947 job=minimap2 -y -t14 -x map-ont --secondary=no  -
time user=0.06s system=0.05s elapsed=151.61s cpu=0% memory=12 job=tigmint_molecule_paf.py -q0 -s2000 -p  -
time user=0.00s system=0.00s elapsed=151.61s cpu=0% memory=0 job=
make: *** [foo__abyss_106-long-scaffs.foo_PacBio_and_Nanopore.cut500.molecule.size2000.bed] Interrupt
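In the traceback, re.search() is handed a barcode value that is not a string; re.search() raises exactly this TypeError when given a value such as None or an integer instead of a string. Why no barcode string was available in this run is not established by the log. A defensive lookup could return None instead of crashing. The names extract_barcode and barcode_from_paf_line are hypothetical, this is not the code of tigmint_molecule_paf.py, and the sketch assumes the barcode travels as a BX:Z: field after the 12 mandatory PAF columns.

```python
import re
from typing import Optional

# Matches a linked-read barcode tag such as "BX:Z:ACGTACGT-1".
BX_TAG = re.compile(r"^BX:Z:(\S+)")


def extract_barcode(tag: Optional[str]) -> Optional[str]:
    """Return the barcode from a BX:Z: tag, or None if absent or malformed.

    Unlike a bare re.search(), this never raises TypeError when the tag is
    missing (None) or is not a string.
    """
    if not isinstance(tag, str):
        return None
    match = BX_TAG.search(tag)
    return match.group(1) if match else None


def barcode_from_paf_line(line: str) -> Optional[str]:
    """Scan the optional fields of a PAF record (columns 13 onwards) for BX:Z:."""
    fields = line.rstrip("\n").split("\t")
    for field in fields[12:]:
        barcode = extract_barcode(field)
        if barcode is not None:
            return barcode
    return None
```

With a guard like this, the caller can skip or report alignments that carry no barcode instead of aborting the whole pipeline.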

Tigmint performance on large genome

Dear developers,

I am trying to run Tigmint on a very large (genome size 18 Gbp) and fragmented assembly (~500k contigs, N50 = 50 kbp, assembly size 22 Gbp due to an excess of haplotigs).

We have four libraries of 10x Chromium linked reads (from a single individual), totaling ~30x coverage of this genome, and have preprocessed the data with longranger and tigmint-molecule. I ran tigmint-cut on this data on 14 cores with full CPU usage for one week, but it never printed anything to disk or to the screen beyond the "Finding breakpoints..." message.

Admittedly, this use case and assembly are somewhat extreme, but I'd still appreciate some feedback. Is it possible that the program stalled for some technical reason (e.g. waiting for a missing dependency program or an expected output)? To your knowledge, has it been used on very large and/or repetitive genomes before? If so, how did it perform?

Is it possible to add debug messages that could reveal a performance bottleneck, or a particular step in the process that needs some tweaking? Any other constructive tips?

Respect $TMPDIR as expected by the sort tool

By default, the sort tool uses /tmp, which is typically very small and may not even be writable by users in cluster environments. If $TMPDIR is non-empty, the Makefile should pass its value down to the sort command line.
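The proposed behaviour can be sketched in Python; the Makefile itself would need the equivalent make/shell conditional, and sort_command is a hypothetical name rather than anything in tigmint-make. GNU sort's -T DIR option selects the directory for temporary files (without -T, GNU sort consults $TMPDIR and then falls back to /tmp), so passing -T explicitly also records the chosen directory in the logged command.

```python
import os
import shlex
from typing import List, Mapping, Optional


def sort_command(keys: List[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build a GNU sort argv that adds -T $TMPDIR when TMPDIR is set and non-empty."""
    env = os.environ if env is None else env
    cmd = ["sort"] + keys
    tmpdir = env.get("TMPDIR", "")
    if tmpdir:
        cmd += ["-T", tmpdir]
    return cmd


if __name__ == "__main__":
    # Same sort keys as the molecule BED step shown in the tigmint-long log.
    print(shlex.join(sort_command(["-k1,1", "-k2,2n", "-k3,3n"])))
```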

tigmint-make: minimap2 is being called with -y argument

Hi,
I wonder what the -y argument does. It is not documented in minimap2 --help. Should that be -Y instead?

-Y use soft clipping for supplementary alignments

I am referring to this line: https://github.com/bcgsc/tigmint/blob/master/bin/tigmint-make#L225

Error when attempting tigmint-make arcs

I have already run Tigmint on my assembly, and now I would like to add scaffolding with ARCS and compute metrics. However, when I first run it with the -n flag to make sure it won't overwrite any files, I get the error below. I got a similar error when I first attempted to run Tigmint, but that turned out to be caused by a wrong file name, which I have since corrected, so I'm not sure what the issue is now. Below are the input file names, my command, and the resulting error.

Input files
Dtrenchii_assembly_v2.CCMP2556_10xreads_longranger.sortbx.bam
Dtrenchii_assembly_v2.fa

Command
tigmint-make metrics draft=Dtrenchii_assembly_v2 reads=CCMP2556_10xreads_longranger -n

Output
abyss-fac -G-1 -t500 Dtrenchii_assembly_v2.fa >Dtrenchii_assembly_v2.abyss-fac.tsv
seqtk seq Dtrenchii_assembly_v2.fa | tr _ '~' | abyss-fatoagp -f Dtrenchii_assembly_v2.scaftigs.fa >Dtrenchii_assembly_v2.scaftigs.fa.agp
abyss-fac -G-1 -t500 Dtrenchii_assembly_v2.scaftigs.fa >Dtrenchii_assembly_v2.scaftigs.abyss-fac.tsv
make: *** No rule to make target 'Dtrenchii_assembly_v2.scaftigs.ref.samtobreak.tsv', needed by 'draft_metrics'. Stop.

Provide 'make install' procedure

Hi Shaun,
I wanted to install Tigmint as I realized abyss-2.1.0 has it as an optional dependency.

First, it appears to me that there are more dependencies than you state in README.md.

Second, make install DESTDIR=$(DESTDIR) is not supported, and I am too lazy to figure out which files should be placed in /usr/bin, /usr/lib, /usr/share/doc/tigmint/, etc. I know there is already a Makefile in the top-level directory, which complicates the situation even more.

Thank you

Changing /tmp location

Hello, I'm running the command tigmint-make tigmint draft=myassembly reads=myreads in the ORCA Docker environment and got this error:

[E::bgzf_close] File write failed
[E::bgzf_flush] File write failed (wrong size)
[E::bgzf_close] File write failed
[E::bgzf_flush] File write failed (wrong size)
[E::bgzf_close] File write failed
[E::bgzf_flush] File write failed (wrong size)
[E::bgzf_close] File write failed
samtools sort: failed to create temporary file "/tmp/myassembly.myreads.sortbx.bam.PVwtv6.0136.bam": No space left on device
samtools sort: failed to create temporary file "/tmp/myassembly.myreads.sortbx.bam.PVwtv6.0137.bam": No space left on device

I believe this is because my filesystem does not have enough space in the /tmp directory. Is there a way to reassign the temporary directory to another location?

Thank you,
Ilayda
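samtools sort accepts -T PREFIX and writes its temporary chunks as PREFIX.NNNN.bam, so pointing the prefix at a larger filesystem avoids filling /tmp. The Python sketch below only builds such a command line; samtools_sort_bx_command is a hypothetical helper, the tigmint Makefile may construct its sort command differently, and whether tigmint-make exposes a setting for this is not confirmed here.

```python
import os
import tempfile
from typing import List, Optional


def samtools_sort_bx_command(
    in_bam: str, out_bam: str, threads: int, tmp_root: Optional[str] = None
) -> List[str]:
    """Build a `samtools sort -tBX` argv whose temporary files go under tmp_root.

    tmp_root defaults to tempfile.gettempdir(), which honours $TMPDIR. The -T
    value is a file-name prefix inside that directory, not the directory itself.
    """
    root = tmp_root or tempfile.gettempdir()
    prefix = os.path.join(root, os.path.basename(out_bam) + ".tmp")
    return ["samtools", "sort", f"-@{threads}", "-tBX", "-T", prefix, "-o", out_bam, in_bam]
```

For example, samtools_sort_bx_command("in.bam", "myassembly.myreads.sortbx.bam", 8, "/data/scratch") places the temporary chunks under /data/scratch instead of /tmp.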

README does not list all dependencies

It seems that, since zsh was added as a dependency of ABySS, it should also be added to the list of dependencies here, and maybe in other tools as well, such as ARCS, LongStitch, etc.

Please update the README to reflect that. Also add bgzip and pigz (from the htslib and pigz packages, respectively).

Write temporary BAM files

Hello,

I was running tigmint-span and the server crashed (I am pretty sure it is not due to Tigmint, unless it suddenly required a huge amount of memory).
I now have a bunch of *.bam.tmp.0628.bam files; I guess it died during the sorting step of this command:
bwa mem -t$t -pC $(draft).fa $< | samtools view -h -F4 | samtools sort -@$t -tBX -o $@
I am thinking of two alternative options:

split the above command into separate steps, with a file for each (kept at least until the following step completes successfully);
feed Tigmint a BAM file produced outside of its pipeline, and run only the actual Tigmint algorithm.
Would it be possible to implement either of these options, or any other that would help deal with long computation times?
Thanks,
Dario
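Keeping one intermediate file per step (the first option) is commonly paired with writing each output under a temporary name and renaming it only once the step has finished, so an interrupted run never leaves a file that looks complete. A minimal Python sketch of that pattern follows; atomic_output is a hypothetical name and this is not how tigmint-make is implemented.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def atomic_output(path: str) -> Iterator[str]:
    """Yield a temporary path next to `path`; rename it to `path` only on success.

    If the body raises, the temporary file is removed and `path` is not created.
    If the process is killed outright, only a stray *.partial file remains and
    `path` still does not exist, so a rerun repeats just this step.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(
        dir=directory, prefix=os.path.basename(path) + ".", suffix=".partial"
    )
    os.close(fd)
    try:
        yield tmp_path
        # Same directory, hence same filesystem: the rename is atomic on POSIX.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

For example, the bwa mem | samtools view step could write to the path yielded by atomic_output("draft.reads.bam"), and a separate sorting step would then start only once that file exists.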
