gear-genomics / tracy
Basecalling, alignment, assembly and deconvolution of Sanger Chromatogram trace files
Home Page: https://www.gear-genomics.com/
License: BSD 3-Clause "New" or "Revised" License
Hi,
I used "decompose" to call variants from a Sanger ab1 file (forward), but I got the following message:
Find Reference Match
Load FM-Index
Couldn't anchor the Sanger trace in the selected reference genome.
I have checked the file and it has no problem.
> The sequence is:
AGATGCCTTGCTGCCCGTGNCTTTTGCGTGCAAGAGAACTGAGAGCNCCCAANGGGATGGAGCTGTGTAAGAAGTACCAGCAGCAGACCGTGGTGGCCATTGACCTGGCTGGAGATGAGACCATCCCAGGAAGCAGCCTCTTGCCTGGACATGTCCAGGCCTACCAGGTGGGTCCTGTGAGAAGGAATGGAGAGGCTGGCCCTGGGTGAGCTTGTCTCCCACCCATAGTTGGTGGAGAACAGTGACGATCGCA
**Note: the reverse file has no problem and runs correctly.**
**Note 2: I have seen a closed issue where someone had the same problem. Unfortunately, I don't know the location of the sequence in the genome, so I can't use a custom FASTA file instead of the reference genome.**
My command is:
tracy decompose -o forward -a homo_sapiens -q 100 -u 100 -r GRch37/Homo_sapiens.GRCh37.dna.primary_assembly.fa.gz AD-F.ab1
I was able to install tracy on Linux and ran ./tracy_v0.5.7_linux_x86_64bit -h
to open the software. The command line prints "This is free software... etc.". I want to run an assembly and typed tracy assemble file1.ab1 file2.ab1,
but I get an error that says: "tracy: command not found." What am I doing wrong?
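A likely explanation, sketched under assumptions: the downloaded file is called tracy_v0.5.7_linux_x86_64bit and sits in the current directory, so the shell knows no command named plain tracy. The helper name link_tracy and the default $HOME/bin target below are hypothetical, not part of tracy.

```shell
# Hypothetical helper: expose a downloaded tracy binary under the name "tracy".
# $1: path to the downloaded binary
# $2: directory that will hold the symlink (default: $HOME/bin)
link_tracy() {
    binary="$1"
    bindir="${2:-$HOME/bin}"
    mkdir -p "$bindir" || return 1
    chmod +x "$binary" || return 1
    # Resolve to an absolute path so the link works from any directory.
    abs="$(cd "$(dirname "$binary")" && pwd)/$(basename "$binary")"
    ln -sf "$abs" "$bindir/tracy"
}

# Usage (not run here):
#   link_tracy ./tracy_v0.5.7_linux_x86_64bit
#   export PATH="$HOME/bin:$PATH"
#   tracy assemble file1.ab1 file2.ab1
```

Without any helper, calling the binary by its full name should work as well: ./tracy_v0.5.7_linux_x86_64bit assemble file1.ab1 file2.ab1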
Hi,
This might be a very basic question, but I can't see how to do it in the manual help files. I would like to provide tracy with over 200 .abi input files at one go. Is it possible to provide a file containing a list of all input files rather than having them listed separately within the input command?
Thanks,
Jenni
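One shell-level workaround, as a sketch: keep the file names in a text file (one path per line; files.txt is a hypothetical name) and let xargs append them to the command. The helper run_on_filelist is illustrative only, and it assumes the chosen tracy subcommand accepts many trace files as positional arguments.

```shell
# Hypothetical helper: run a command with every path listed in a file appended
# as arguments. Paths are split on newlines only, so names with spaces survive.
# $1: list file (one path per line); remaining arguments: the command to run.
run_on_filelist() {
    list="$1"
    shift
    tr '\n' '\0' < "$list" | xargs -0 "$@"
}

# Usage (not run here):
#   ls traces/*.ab1 > files.txt
#   run_on_filelist files.txt tracy assemble
```

If all inputs share a directory, a shell glob such as tracy assemble *.ab1 achieves the same without a list file.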
Hello,
I have a bunch of AB1 files without corrected intensity levels, so only data on fields DATA1-4.
When I try tracy on these files I am getting a segfault error.
Do you know if it is possible to perform basecalling on them? I suspect I will have to generate them again.
Thanks!
Hi,
Thank you for a great tool.
I am using it daily on 50-300 Sanger files, and today one of the ab1 files produced a sequence and a quality string of different lengths in tracy basecall. The ab1 file did not contain useful data, but I thought it might be relevant to report that tracy basecall can produce this issue.
I can send you the ab1 file if you are interested?
I am using Tracy version: v0.5.3 (conda install)
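For batch runs, a quick sanity check on the FASTQ output can flag such records before they reach downstream tools. This is a minimal sketch assuming plain four-line FASTQ records; the function name check_fastq_lengths is hypothetical.

```shell
# Hypothetical check: print the header of every FASTQ record whose sequence
# and quality strings differ in length; return non-zero if any were found.
# Assumes four lines per record (header, sequence, "+", quality).
check_fastq_lengths() {
    awk 'NR % 4 == 1 { name = $0 }
         NR % 4 == 2 { seqlen = length($0) }
         NR % 4 == 0 { if (length($0) != seqlen) { print name; bad = 1 } }
         END { exit bad + 0 }' "$1"
}

# Usage (not run here):
#   check_fastq_lengths out.fastq || echo "length mismatch found"
```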
First off - thanks for all the great work on tracy. It's quite amazing to me how few tools there are for performing trace file assembly - so thanks for filling this void with a very nice tool!
I have been using tracy quite a bit recently to assemble trace files and perform variant calling relative to a reference sequence. Generally, this seems to work very well using tracy, but I have a question (related to a previous issue) on the interplay between base-call confidence (on the chromatogram) and consensus formation.
I'm seeing incorrect consensus calls being made for a particular base where one of the trace files contains a low-confidence call and the other a high-confidence call. From what I understand (based on your previous explanation), tracy does not use the base quality from the chromatogram, and I guess it just chooses one base over the other when there's a disagreement?
Here's what I'm seeing:
This shows 2 trace files in Geneious. When I assemble these using tracy assemble --format fastq --inccons trace1.ab1 trace2.ab1, the resulting consensus contains insertions at both positions highlighted in red. This is strange to me: the base quality in trace 2 is clearly higher than in trace 1. Or is it the case that with insertions in one trace file, there is no base to compare to in the second trace file, so the insertion is included in the consensus, irrespective of quality?
Is this expected behaviour?
Thanks for any help!
It's not made explicit which way round the input signal to noise ratio is interpreted - I assume for example that the default 0.33 means noise 1:3 signal for a base to be called. Since it's not documented, could you clarify for me? Thanks very much.
Hello
To assemble forward and reverse ab1 files from Sanger sequencing, I ran the assemble command without a reference (i.e. de novo assembly):
$ tracy assemble forward.ab1 reverse.ab1
I expected the aligned FASTA to show the sequences in the same strand and direction as the forward read, but in fact it followed the reverse strand.
This does not seem to be random, because I always get output on the reverse strand, and when I switched the order of the arguments (reverse.ab1 first, forward.ab1 second), I got the aligned FASTA on the forward strand.
So I guess that the strand orientation of de novo assembly output always follows the last ab1 file given on the command line.
Is this correct?
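Whatever rule decides the orientation, the output can be flipped afterwards. The sketch below reverse-complements a bare sequence (header removed) read from stdin; revcomp_seq is a hypothetical helper, not a tracy feature, and it leaves N and gap characters untouched.

```shell
# Hypothetical helper: reverse-complement a nucleotide sequence from stdin.
# Line breaks are joined first; characters other than ACGT/acgt pass through.
revcomp_seq() {
    awk '{ s = s $0 }
         END {
             for (i = length(s); i > 0; i--) out = out substr(s, i, 1)
             print out
         }' | tr 'ACGTacgt' 'TGCAtgca'
}

# Usage (not run here):
#   grep -v '^>' consensus.fa | revcomp_seq
```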
Hi!
I tried to create a fm9 file for the hg38, but the program quickly ate up all the 16GB of available RAM and killed the process. Is this normal behavior?
The file was in fasta.gz format.
Thanks!
Hello
I have installed v0.6.1 to assemble forward and reverse Sanger sequences.
Because the source of the sequences is a cloning vector, I need to resolve mismatch sites in the overlap between forward and reverse reads (that is, no heterozygous site is expected).
I think I should do this based on quality information, but now I'm wondering if the out.vertical file is the one I'm looking for.
My question is whether the consensus sequence (the rightmost character) in the out.vertical file always chooses the base with the higher quality.
(example)
-T|T
-C|C
-A|A
-A|A
-A|A
GG|G
AA|A
AA|A
GG|G
TT|T
CC|C
TT|T
AC|A <- is this A chosen because its quality is higher than C?
CC|C
CC|C
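To find the rows worth inspecting, the mismatching columns can be listed mechanically. This sketch assumes the layout shown in the example above (one alignment column per line, trace characters left of "|", consensus character right of it); list_vertical_mismatches is a hypothetical helper, and it does not reveal how the consensus character was chosen.

```shell
# Hypothetical helper: print the rows of an out.vertical-style file where the
# trace characters disagree with each other. Gap characters ("-") are ignored,
# which is a choice made for this sketch.
list_vertical_mismatches() {
    awk -F'|' '
        NF == 2 {
            row++
            traces = $1
            gsub(/-/, "", traces)
            first = substr(traces, 1, 1)
            for (i = 2; i <= length(traces); i++)
                if (substr(traces, i, 1) != first) { print row ": " $0; break }
        }' "$1"
}

# Usage (not run here):
#   list_vertical_mismatches out.vertical
```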
I'm using a FASTA sequence (a single sequence, as you can see in the image below) as a reference to align my *ab1 file using the Indigo module (https://www.gear-genomics.com/indigo/). But I don't know why the program returns the message: "Fasta file has incorrect file type!". Could you help me solve this problem?
Thank you for the awesome tool! I'm using it to deconvolute SARS-CoV-2 Sanger data and I've found that sometimes I get erroneous variant calls because of some underlying noise in one of the reads that is not present in the other read.
For example if you look at the sequence AAACTG there is some underlying noise in the forward read (bottom) but not the reverse read, or there is contrasting noise that should really cancel out, but sometimes is called as a variant.
Ideally I would like to use Tracy like the Indigo tool (but I want to be able to have forward and reverse reads) and I would like to be able to tune the peak percentage cut off.
And another question, is there a way to have automatic annotation of Amino acids in the vcf file? I see that this is done in similar pipelines with bcftools and a gff file.
Hi,
Tracy seems awesome. However, I'm having trouble installing/compiling it on both an Intel Mac and an M1 Mac. Unfortunately, I don't have much experience with compilers and C++.
I am following the instructions in the documentation.
Some issues:
==> bzip2
zlib is keg-only, which means it was not symlinked into /opt/homebrew,
because macOS already provides this software and installing another version in
parallel can cause all kinds of trouble.
For compilers to find zlib you may need to set:
export LDFLAGS="-L/opt/homebrew/opt/zlib/lib"
export CPPFLAGS="-I/opt/homebrew/opt/zlib/include"
For pkg-config to find zlib you may need to set:
export PKG_CONFIG_PATH="/opt/homebrew/opt/zlib/lib/pkgconfig"
==> bzip2
bzip2 is keg-only, which means it was not symlinked into /opt/homebrew,
because macOS already provides this software and installing another version in
parallel can cause all kinds of trouble.
If you need to have bzip2 first in your PATH, run:
echo 'export PATH="/opt/homebrew/opt/bzip2/bin:$PATH"' >> ~/.zshrc
For compilers to find bzip2 you may need to set:
export LDFLAGS="-L/opt/homebrew/opt/bzip2/lib"
export CPPFLAGS="-I/opt/homebrew/opt/bzip2/include"
Do you have any recommendations about adding these compiler flags?
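One detail in the Homebrew messages above: each package suggests its own export LDFLAGS and export CPPFLAGS, so running both pairs as printed leaves only the last assignment in effect. A sketch that keeps both, assuming Homebrew lives under /opt/homebrew as shown; whether tracy's Makefile picks up these variables is not confirmed here.

```shell
# Combine the zlib and bzip2 paths from the Homebrew messages into one
# assignment per variable, so the second export does not overwrite the first.
# Assumption: Homebrew prefix is /opt/homebrew (Apple Silicon default).
export LDFLAGS="-L/opt/homebrew/opt/zlib/lib -L/opt/homebrew/opt/bzip2/lib"
export CPPFLAGS="-I/opt/homebrew/opt/zlib/include -I/opt/homebrew/opt/bzip2/include"
export PKG_CONFIG_PATH="/opt/homebrew/opt/zlib/lib/pkgconfig"
```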
When running make all, I got the following error:
~/code/tracy (main) » make all
if [ -r src/htslib/Makefile ]; then cd src/htslib && autoreconf -i && ./configure --disable-s3 --disable-gcs --disable-libcurl --disable-plugins && /Library/Developer/CommandLineTools/usr/bin/make && /Library/Developer/CommandLineTools/usr/bin/make lib-static && cd ../../ && touch .htslib; fi
/bin/sh: autoreconf: command not found
make: *** [.htslib] Error 127
As such, following Stack Exchange, I downloaded the automake package, which provides the autoreconf command. This should be included in the list of packages (in the documentation) to be installed via homebrew.
g++ -std=c++14 -isystem /Users/adityaprasad/code/tracy/src/jlib/ -isystem /Users/adityaprasad/code/tracy/src/htslib/ -isystem /Users/adityaprasad/code/tracy/src/sdslLite//include -pedantic -W -Wall -O3 -fno-tree-vectorize -DNDEBUG src/tracy.cpp -o src/tracy -L/Users/adityaprasad/code/tracy/src/htslib/ -L/Users/adityaprasad/code/tracy/src/htslib//lib -L/Users/adityaprasad/code/tracy/src/sdslLite//lib -lboost_iostreams -lboost_filesystem -lboost_system -lboost_program_options -lboost_date_time -ldl -lpthread -lhts -lz -llzma -lbz2 -Wl,-rpath,/Users/adityaprasad/code/tracy/src/htslib/
In file included from src/tracy.cpp:13:
src/index.h:9:10: fatal error: 'boost/dynamic_bitset.hpp' file not found
#include <boost/dynamic_bitset.hpp>
^~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
make: *** [src/tracy] Error 1
I'm not sure what to do here. I have my own installation of boost. Do I have to link it to that somehow?
For the M1 Mac, this just fails to find the tracy package.
~ » conda install -c bioconda tracy
Collecting package metadata (current_repodata.json): done
Solving environment: unsuccessful initial attempt using frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: unsuccessful initial attempt using frozen solve. Retrying with flexible solve.
PackagesNotFoundError: The following packages are not available from current channels:
- tracy
Current channels:
- https://conda.anaconda.org/bioconda/osx-arm64
- https://conda.anaconda.org/bioconda/noarch
- https://conda.anaconda.org/conda-forge/osx-arm64
- https://conda.anaconda.org/conda-forge/noarch
- https://repo.anaconda.com/pkgs/main/osx-arm64
- https://repo.anaconda.com/pkgs/main/noarch
- https://repo.anaconda.com/pkgs/r/osx-arm64
- https://repo.anaconda.com/pkgs/r/noarch
To search for alternate channels that may provide the conda package you're
looking for, navigate to
https://anaconda.org
and use the search bar at the top of the page.
For the Intel Mac, it starts out better but still fails. In a pre-existing environment, I get the following error
~ » conda install -c bioconda tracy
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: |
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed
UnsatisfiableError: The following specifications were found to be incompatible with each other:
Output in format: Requested package -> Available versions
Package libcxx conflicts for:
python=3.9 -> libffi[version='>=3.3,<3.4.0a0'] -> libcxx[version='>=4.0.1']
python=3.9 -> libcxx[version='>=10.0.0|>=12.0.0|>=14.0.6']
Package xz conflicts for:
python=3.9 -> xz[version='>=5.2.10,<6.0a0|>=5.4.2,<6.0a0|>=5.2.8,<6.0a0|>=5.2.6,<6.0a0|>=5.2.5,<6.0a0']
tracy -> htslib[version='>=1.17,<1.18.0a0'] -> xz[version='>=5.2.4,<5.3.0a0|>=5.2.5,<5.3.0a0|>=5.2.6,<5.3.0a0|>=5.2.6,<6.0a0']
Package zlib conflicts for:
python=3.9 -> sqlite[version='>=3.41.2,<4.0a0'] -> zlib[version='>=1.2.13,<2.0a0']
python=3.9 -> zlib[version='>=1.2.11,<1.3.0a0|>=1.2.12,<1.3.0a0|>=1.2.13,<1.3.0a0']
In a fresh environment, I get the following error
~ » conda install -c bioconda tracy
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: /
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed
UnsatisfiableError:
Hello,
Tracy looks like a fantastic CLI, thank you for creating and maintaining it.
Can you please provide more details on the stringency -t option in the assembly subcommand? I am guessing 9 is the highest and 1 is the lowest. It would be great to know what the difference between stringency 4 and 5 is, and whether the trimming is at the start/end only or happens on each trace/base.
I notice the decompose subcommand includes additional trimming parameters (trimLeft and trimRight); I think these would be handy to include in assembly too.
Thank you,
Ammar
Hi,
The trim option is used by default in tracy assemble, but can the trim option be disabled?
Thank you!
Hi Tobias,
I know you added the --incref feature to include the reference in the consensus computation. However, I upgraded to the most recent tracy and --incref is still not available. Do I have to manually install this feature somehow?
I'm trying to get Tracy to run (as part of a pipeline we're using in our lab) on a macOS catalina system. Unfortunately the binary can't execute on a mac - I imagine perhaps it simply wasn't designed to run on mac, but do you perhaps have any suggestions on how to get around this?
Per #62, running the bioconda tracy package is really slow for the consensus command, at approximately 15 s per run compared to about 1 s for the precompiled binary. I am calling tracy in a bash pipeline, within a loop, having activated the conda env prior to the loop, in the following way:
if
tracy consensus \
-o "$out_dir"/assembly/"$code"_cons \
-q 0 -u 0 -r 0 -s 0 \
-b "$code" \
"$ffile" \
"$file" \
>> "$out_dir"/logs/basecall_log.txt 2>&1
then
...
The only thing that changes to use the precompiled binary is that tracy consensus becomes ./tracy/tracy consensus. It is unclear to me why this change would result in such a massive slowdown. I could avoid the bioconda package, but I cannot get the precompiled binary to run on mac. Thanks!
I am performing testing on Ubuntu 20.04 with Tracy 0.7.1.
I cannot get tracy consensus to run on certain .ab1 files on a colleague's WSL install. I can, however, get them to run on a native Linux install. The error is the following:
terminate called after throwing an instance of 'std::length_error'
what(): cannot create std::vector larger than max_size()
I've attached a pair of traces from the failing batch below
004_B05.zip
The sequencer shows annotation for insertion one bp ahead instead of showing the position before and after the insertion. What could be the possible issue or is that an acceptable annotation?
I am busy working on a Galaxy wrapper for the tracy subcommands. Could you perhaps provide some sample data to test with?
Hi,
I have been trying to export ab1 basecalling into tsv, but I get the error "File lacks basecalls!". Is there anything I am missing to run it properly?
Here is what I used:
tracy basecall -f tsv -o testout.tsv 10F_Kata_Arg4_Arg_F.ab1
I attach the sample file.
Thanks!
10H_Kata_Ser1_Ser_F.zip
I tried to install Tracy with conda as documented here: https://anaconda.org/bioconda/tracy.
However, this installs the old version (0.5.3) and not the most recent version, 0.5.7. Is this a glitch on conda? I was able to download the recent version as a statically linked binary.
I created a malformed reference file that had a whitespace after ">". When trying to write the variant in the file, BCFTOOLS returned ID -1 error and the variant didn't appear in the final BCF. Still, tracy returned error code 0.
Is there any way that tracy could somehow signal this in the error code, as well as any other problem that might arise? No need for specifics, just a non-zero code. If this is missed, I think there is no way to differentiate the homozygous reference call from this error.
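Until the exit code reflects this, the reference can be screened before the run. A minimal sketch, assuming the only malformation of interest is a header line with whitespace (or nothing) directly after ">"; check_fasta_headers is a hypothetical helper.

```shell
# Hypothetical pre-flight check: list FASTA header lines that have whitespace
# or nothing right after ">", with their line numbers.
# Returns 1 if any such header exists, 0 otherwise.
check_fasta_headers() {
    if grep -n -E '^>([[:space:]]|$)' "$1"; then
        return 1
    fi
    return 0
}

# Usage (not run here):
#   check_fasta_headers ref.fa || echo "fix the reference headers first"
```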
Hello here,
I am working with tracy v0.5.3 on the command line to call variants. I first downloaded a selected section of the human genome as the reference. When calling the variants, the results don't have the rs identifiers, yet the web results give the rs identifiers.
I thought I would resort to the full human genome reference (GRCh38), but this returned the error below:
"terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc"
Hi,
Just a different question too.
We might have some individuals which are aneuploid, based on previous microsatellite work with my organism's populations.
I realise that Tracy wants a max of two alleles for decomposition. What happens if there are three? Does it just give a high decomposition error score?
Many thanks,
Jenni
Hi @tobiasrausch
Thanks for developing and maintaining tracy. I've just discovered your tool and started trying it out, especially within a workflow manager like nextflow.
I noticed some differences between the docker containers of tracy that are made available to the community, and I was wondering which one you maintain and would recommend:
The dockerhub version is the most up-to-date with respect to the repository (built on 2023-10-27); it is really small (~15Mb on disk) and blazing fast! However, there are no tags to track the version of tracy, and the working directory is /root, which you need to look up in the Dockerfile.
The quay.io version has tags and a working directory at /, but it is way (2-5x) slower for no reason that I could point to. This version was built in May but with the most recent tag of the repository.
One of the main differences between the two versions is that I can only use one of them within nextflow. There is no bash command in the dockerhub version. This is an executable container, which is deprecated in workflow managers (see nextflow-io/nextflow#529).
It is unfortunate that the two versions of the containers do not agree, as their combined pros would be awesome (tags, speed, small size, runnable in nextflow)!
Building the container using the Dockerfile in the repository creates a container that is very similar to the one on dockerhub. However, with a small workaround I could make the docker image usable by nextflow by adding bash to the alpine image via:
RUN apk add --no-cache bash
Let me know how you would envision this moving forward with the continuous releases of docker images.
Best,
I got this issue with the latest code when trying to decompose a Sanger sequencing result. What is the matter? Is this a samtools version related issue?
Hi @tobiasrausch - thanks for the great work on this package to date. I've been experimenting with tracy consensus and tracy assemble and have found myself wanting the level of control afforded in the latter when defining the conditions under which a pair of reads would assemble in the consensus command. Specifically, being able to specify a minimum overlap length would be useful. Is there a specific reason why these aren't considered common 'alignment options' across methods?
Thanks in advance for considering!
This is not so much of an issue, but I just wanted to dig a bit into expected completion times for tracy assemble.
I'm running tracy assemble (de novo assembly) on 4 .ab1 files, each with around 1Kb of high quality calls. The sequenced construct is ~2Kb (so there's a lot of overlap between the 4 .ab1 files).
Here's the command I'm running:
tracy assemble primer_f_1.ab1 primer_f_2.ab1 primer_r_1.ab1 primer_r_2.ab1 --format fastq
This takes around 2 minutes to complete, which seems like a long time. Is this expected?
Dear Tracy Team,
I have a .ab1 trace for which I can extract a fasta with Tracy basecall.
However, I am unable to get an anchor on my reference genome (a proprietary plant genome).
my command line is:
tracy decompose -v -g ref.fasta.gz -o BM21_G2_4-G2_F4.bcf trace_files/BM21_G2_4-G2_F4.ab1
Here is the error message from tracy:
[2020-Nov-30 12:57:37] Load ab1 file
[2020-Nov-30 12:57:37] Load FM-Index
[2020-Nov-30 12:57:40] Find Reference Match
Couldn't anchor the Sanger trace in the selected reference genome.
A BLAST search against the database nevertheless returns a hit with 2 HSPs:
>ref
Length=230546352
Score = 948 bits (513), Expect = 0.0
Identities = 594/635 (94%), Gaps = 5/635 (1%)
Strand=Plus/Plus
Query 12 TAT-TTCTAAA-ATTACTTTCAATAATGCCATTTATATTTACTTTGAAGCATATGTTGNT 69
||| ||||||| ||||| || | ||||||||||||||||||||||||||||||||||| |
Sbjct 226270899 TATCTTCTAAATATTAC-TTAATTAATGCCATTTATATTTACTTTGAAGCATATGTTGTT 226270957
Query 70 TGAACTCTTCAAAACTATTGAAATAGGTGCATGTCGGATTCTCTAGAATTAAATTATTTT 129
|||||||||||||| ||||| |||||||||||||||||||||||||||||||||||||||
Sbjct 226270958 TGAACTCTTCAAAATTATTGTAATAGGTGCATGTCGGATTCTCTAGAATTAAATTATTTT 226271017
Query 130 GTATAATTTGACACCAACGCCATAACAATTTTCNAGAGTTCAAACAACATAGTTTGAAAA 189
||| |||||||||||||||| ||||||||||| ||||||||||||||||||||||||||
Sbjct 226271018 GTAGAATTTGACACCAACGCGATAACAATTTTGGAGAGTTCAAACAACATAGTTTGAAAA 226271077
Query 190 CAATCATAATTGAAAAATTGCTGAAAAATATGTTACTCAAACTTTTTAAAAATTCTATCC 249
|||||||||||||||||||| ||||||||||||||||| |||||||||||||||||||||
Sbjct 226271078 CAATCATAATTGAAAAATTGGTGAAAAATATGTTACTCGAACTTTTTAAAAATTCTATCC 226271137
Query 250 AGTGTTTGTTGGATTCTCCATAAGTTGTACATTTTTGGAGGATCTAACACCAGCACGACC 309
|||||||||||||||||||| ||||||||||||||||||||||||||||| || ||||||
Sbjct 226271138 AGTGTTTGTTGGATTCTCCAAAAGTTGTACATTTTTGGAGGATCTAACACGAGTACGACC 226271197
Query 310 ATATTCTCGGAACTATATAAACCAAGTGTGTGTTTCATAGTAATTTTTTCTTATCAGATC 369
||||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||
Sbjct 226271198 ATATTCTCGGAACAATATAAACCAAGTGTGTGTTTCATAGTAATTTTTTCTTATCAGATC 226271257
Query 370 CTTCCAAAATACACTATCACTATTCTGATGGATTTTTCTTTTGACCAAATTTTATTGCC- 428
||| |||||||||||||||||||||||||||||||||||||||| ||||||||||||
Sbjct 226271258 TTTCAAAAATACACTATCACTATTCTGATGGATTTTTCTTTTGACAAAATTTTATTGCAG 226271317
Query 429 AATTTGTTCAATGCCAGTATATACACTTCATCTAGTACAATAAGAATGAGCGCTTTCAGA 488
||||||||||||||||||||| ||||||||||||||||||| |||||| || ||||||||
Sbjct 226271318 AATTTGTTCAATGCCAGTATAGACACTTCATCTAGTACAAT-AGAATGGGCACTTTCAGA 226271376
Query 489 AATGATAAAAAATCCCAAAATTCTCGAAACATCCCAACAAGAAATGGACCATATTGTTGG 548
||||||||||||||||||||||||| ||| ||| ||||||||||||||||| ||| ||||
Sbjct 226271377 AATGATAAAAAATCCCAAAATTCTCAAAAAATCACAACAAGAAATGGACCAAATTATTGG 226271436
Query 549 AAAAAATAGACGTCTAATTGAATCTGATATTCCAAATCTACCTTAATTACGAGTAGTTTG 608
||||||||||||| ||||||||||||||||||||||||||||||| ||||| | | ||||
Sbjct 226271437 AAAAAATAGACGTTTAATTGAATCTGATATTCCAAATCTACCTTATTTACGTGCAATTTG 226271496
Query 609 CAAAGAANCATTTCNAAAACACCCTTCTACCCCAT 643
||||||| |||||| |||||||||||| || ||||
Sbjct 226271497 CAAAGAAGCATTTCGAAAACACCCTTCAACACCAT 226271531
Score = 211 bits (114), Expect = 4e-52
Identities = 181/215 (84%), Gaps = 1/215 (0%)
Strand=Plus/Plus
Query 429 AATTTGTTCAATGCCAGTATATACACTTCATCTAGTACAATAAGAATGAGCGCTTTCAGA 488
|||||||| | ||| ||| | ||||||||||||| |||| |||||| || ||| ||||
Sbjct 226259276 AATTTGTTTACTGCTGGTACAGACACTTCATCTAGCGCAAT-AGAATGGGCACTTGCAGA 226259334
Query 489 AATGATAAAAAATCCCAAAATTCTCGAAACATCCCAACAAGAAATGGACCATATTGTTGG 548
|||||| |||||| | | |||| | ||| | | |||||||||| |||||| ||| ||||
Sbjct 226259335 AATGATGAAAAATTCAACAATTTTGAAAAAAGCACAACAAGAAACGGACCAAATTATTGG 226259394
Query 549 AAAAAATAGACGTCTAATTGAATCTGATATTCCAAATCTACCTTAATTACGAGTAGTTTG 608
||||||||||||| ||||||||||||||||||||||||||||||| ||||| | | ||||
Sbjct 226259395 AAAAAATAGACGTTTAATTGAATCTGATATTCCAAATCTACCTTATTTACGTGCAATTTG 226259454
Query 609 CAAAGAANCATTTCNAAAACACCCTTCTACCCCAT 643
||||||| |||||| |||||||||||| || ||||
Sbjct 226259455 CAAAGAAACATTTCGAAAACACCCTTCAACACCAT 226259489
Hi,
I am using tracy assemble to assemble between 2 and 4 trace files. I am outputting the consensus as a .fastq file, and then aligning this to a reference sequence.
Downstream, I am performing some analysis that filters on per-nucleotide quality scores, and I am not sure that I understand how these are translated from the base signal in the chromatogram to the base quality of the consensus calculated within tracy assemble. Typically, I only see 2 different base quality scores on a consensus (e.g. 19 and 24).
Do you have any insight into this?
I'm calling tracy like so:
tracy assemble \
--format fastq \
--inccons \
--trim 3 \
--outprefix ${colony_id} \
colony_1_p1.ab1 colony_1_p2.ab1
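To see exactly which quality values a consensus FASTQ contains, the quality line can be tallied directly. This sketch assumes four-line records and the common Phred+33 encoding (not confirmed for tracy output here); fastq_quality_counts is a hypothetical helper.

```shell
# Hypothetical helper: count how often each Phred score occurs in a FASTQ file.
# Assumes four-line records and Phred+33 encoding (score = ASCII code - 33).
# Output: one "score count" pair per line, sorted by score.
fastq_quality_counts() {
    awk 'BEGIN { for (i = 33; i <= 126; i++) phred[sprintf("%c", i)] = i - 33 }
         NR % 4 == 0 {
             for (i = 1; i <= length($0); i++) count[phred[substr($0, i, 1)]]++
         }
         END { for (q in count) print q, count[q] }' "$1" | sort -n
}

# Usage (not run here; the file name is a placeholder):
#   fastq_quality_counts consensus.fastq
```

Under that encoding, scores 19 and 24 would appear in the quality line as the characters "4" and "9".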
Would it be possible to implement a trim option to take a set number of bases after an initial trim threshold? e.g. for a sequence:
ACTGATCTACTAGATCCC
-q 5 -'crop' 10
we get:
ACTGA TCTACTAGAT CCC
Does that make sense?
Thanks!
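Until such an option exists, the requested crop can be applied to an already basecalled sequence as a post-processing step. The sketch below is not a tracy option; crop_seq is a hypothetical helper that drops a fixed number of leading bases and keeps the next N, using the example from the post.

```shell
# Hypothetical helper: drop the first $1 bases, then keep the next $2 bases.
# Reads one sequence per line on stdin.
crop_seq() {
    skip="$1"
    keep="$2"
    cut -c "$((skip + 1))-$((skip + keep))"
}

# Example from the post: skip 5 bases (ACTGA), keep 10.
printf 'ACTGATCTACTAGATCCC\n' | crop_seq 5 10   # prints TCTACTAGAT
```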
Hi,
I'm looking to use Tracy to decompose alleles from several hundred sequence traces. It looks like a very useful tool - thanks for writing it.
My organism is quite heterozygous. I could expect that there would be more than one (many) variants within an allele.
Am I right in understanding that during decomposition it takes the first haplotype to be the closest match to the reference sequence, and assigns all other mutations present to the second haplotype?
Is it possible to instead ask it to:
Does the pearl command work this way? Or does it just align already decomposed sequences/primary basecalls from a non-decomposed sequence?
I realise either way it is still guessing the haplotype for the sequence.
Many thanks,
Jenni
Hi,
I am using Tracy (version 0.7.1) to call SNVs from Sanger sequencing ab1 files.
The command is as follows:
./tracy decompose -o forward -a homo_sapiens -r /bioinfo/data/Genomes/NCBI/build37/Sequence/WholeGenomeFasta/human_g1k_v37.fasta GW21T135C07-2_K-E11-F_TSS20210511-021-02103_G01.ab1
However, "Only single-chromosome FASTA files are supported" was returned, and bcf file was not generated as expected.
Any suggestions to fix the issue?
Thanks,
Junfeng
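Given the error message, one possible workaround is to hand tracy a FASTA that holds a single sequence. The sketch below pulls one record out of a multi-FASTA with awk; it assumes the chromosome carrying the amplicon is known, and extract_chromosome as well as the output file name are hypothetical.

```shell
# Hypothetical helper: print a single record from a multi-FASTA file.
# $1: multi-FASTA file
# $2: sequence name, i.e. the first word of the header without ">"
extract_chromosome() {
    awk -v name="$2" '
        /^>/ { keep = (substr($1, 2) == name) }
        keep' "$1"
}

# Usage (not run here):
#   extract_chromosome human_g1k_v37.fasta 10 > chr10.fasta
```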
I used the following command:
tracy basecall -f json -o output_file_path input_file_path
I expected to get the name (identifier) in the output JSON file.
It would be useful if I could get the name using the same command as above.
Can tracy decompose write separate output ab1 files for the decomposed alleles?
Hello.
I assembled Sanger sequences using tracy assemble.
I want to use this tool myself, but I don't understand all of the options.
I looked for a manual in the tracy GitHub repository and in the paper, but there was no explanation anywhere.
I would like to know about these options in detail.
Thanks.
young yu.
Hello,
I have used tracy to call and annotate variants from human samples and it worked well. I now have Sanger-sequenced samples of P. falciparum.
Is there a way I could use tracy to call and annotate variants in these samples?
Looking at the tutorial and examples, you should be able to perform de novo assembly by using tracy assemble without a reference file. But when I attempt to do this with my ab1 files, it says that I need to specify a reference file. Example:
$ tracy assemble *ab1
[2021-Mar-26 13:22:34] tracy assemble 275F.ab1 275F-RC.ab1 3AccOut-RC.ab1 3InOut-RC.ab1 4-3LTR.ab1 4-eGFP-C.ab1 4-Frag-21-L.ab1 4-Frag-26-R-RC.ab1 5AccOut.ab1 5INOut.ab1 5LTR.ab1 eGFP-C-RC.ab1 eGFP-N-RC.ab1 Frag-05-L.ab1 Frag-09-L.ab1 Frag-10-L.ab1 Frag-14-R.ab1 Frag-14-R-RC.ab1 Frag-17-L.ab1 Frag-17-L-RC.ab1 Frag-18-L.ab1 Frag-21-L.ab1 Frag-22-R.ab1 Frag-22-R-RC.ab1 Frag-25-L.ab1 Frag-26-R-RC.ab1 Frag-27-L.ab1 Frag-33-R-RC.ab1 Frag-36-R.ab1 Frag-36-R-RC.ab1 Frag-37-L.ab1 Frag-39-L.ab1
Please specify a reference file!
Any insight would be greatly appreciated. Thanks!
I am using Tracy to generate consensus, and then the mutations are being analysed using Mummer. But when I was comparing the result with the Pearl (online) of the gear-genomics, some mutations were not seen. I traced the problem back to the base-calling. So, if it is possible to update the algorithm of Tracy to that of Pearl for base-calling, it will be great.
It is just my observation. I am not an expert. Excuse me if I am wrong.
tracy decompose -v -o outprefix -g ref.fna Sample.ab1
I used this command for variant calling; I get a JSON file but no BCF.
I assembled the contigs with a reference to obtain the consensus shown below. However, the consensus contains an "A" where there was supposed to be a "G", based on majority rule. Is there a way I can make the consensus generate a "G" based on the majority of bases present? Are there some parameters I can set?
Hi,
I'm experiencing a very similar issue to #34 .
We've sequenced a chunk of the human MBL2 gene of ~250 nt. However, the machine sequences almost 1000 nucleotides; therefore, most of the sequence in the ab1 file is just rubbish.
For that reason I've decided to set -q 50 -u 750, so that the first 50 low-quality bases and the last 750 (false) bases are excluded.
I tried first with the whole GATK GRCh38 genome and got the error "Couldn't anchor the Sanger trace in the selected reference genome." when running both Indigo (setting left and right trim sizes to 50 and 750, respectively) and tracy in the command line as follows:
tracy decompose -o forward -r Homo_sapiens_assembly38.fasta.gz -q 50 -u 750 MF-102_MBL2.ab1
[2023-May-04 12:10:37] tracy decompose -o forward -a homo_sapiens -r Homo_sapiens_assembly38.fasta.gz -q 50 -u 750 MF-102_2MBL2.ab1
[2023-May-04 12:10:37] Load ab1 file
[2023-May-04 12:10:37] Find Reference Match
[2023-May-04 12:10:37] Load FM-Index
Couldn't anchor the Sanger trace in the selected reference genome.
As you pointed out here, that issue could be circumvented using a shorter sequence as a reference file.
Then I downloaded and indexed the fasta file for the MBL2 gene and repeated the process with the same parameters. Although it works well now with Indigo, tracy still fails with the same error message in the command line.
I'm using the tracy v0.7.5 singularity container on CentOS 7.9.
Hello there,
I was getting started with Tracy, and the following error appears when I try to run the program from both the conda installation and the precompiled binary:
(base) [larteag7@apolo banano-cultivables]$ conda create -n sangering bioconda::tracy bioconda::seqtk
(sangering) [larteag7@apolo banano-cultivables]$ tracy basecall -f fastq B100_907R.ab1
FATAL: kernel too old
Aborted
(sangering) [larteag7@apolo banano-cultivables]$ tracy --help
FATAL: kernel too old
Aborted
Is there any way to address it?
Thanks in advance,
Luis Alfonso.
When I run these commands:
bgzip reference1.fasta
tracy index -o reference.fasta.fm9 reference.fasta.gz
tracy align --reference reference1.fasta.gz input1.ab1
I get a different result to doing
tracy align --reference reference1.fasta input1.ab1
The input files (reference1.fasta and input1.ab1) can be found here.
Here is the start of the alignment for the indexed case:
>input1
--------TTTTTTTTTGAGCGGGTCGAACCGTCACGAAAAGAAAAGGGGAAGAACCATCAGCAGGAGTAATCCGTATTTTAATTGGATCCACAT-TCATAGCAAACACCAAAAATCCATATTGGGACCACAATCCCAACAAAGACCACTGGAC-
AGAAGCCAACAAGGTAGGAGTGGGAG-CATTCGG-GCCTGGGTTC--ACTCCCCCACACGGAGGCCTTTTGGGGTGGAGCCCTCAGGCTCAAGGCATGCTGACAACATTACCAGCAAATCCGCCTCCTGCCTCCACCAATCGACAGTCAGGAAGG
CAGCCTACCCCAATCACTCCACCTTTGAG-AGACACTCATCCTCAGGCCATGCAGTGGAATTCCACAACATTCCACCAAGCTCTGCAGGATCCCAGAGTAAATCCTGCTGGTGGCTCCAGTTCCGGAACAGTGAACCCTG-TTCCGACTACTGCC
TCACTCATCTCGTCAATCTTCTCGAGGATTGGGGACCCTGCACCGAACATGGAAAGCATCACATCAGGATTCCTAGGACCCCTGCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAAAATCCTCACAATACCGCAGAGTCTAGACTCGTGGTG
GACTTCTCTCAATTTTCTAGGGGGAGCT--CCCGTGTGTCTTGGCCAAAATTCTCAGT--CCCAAACCTCCAGTCACTCACCAACCTCTTGTCCTCCAATTTGTCCTGCCTATCGCTGGATGTGTCTGCGGCGGTTTATCATATTCCT-CTTCAT
CGTGCTGCT------ATGCCTCATCATCCTGTTGGGTTCGTCTGCACCATCAAAAGAATGTTGCCCCGGGTTGTGATTAAAAATTCCAAGGAGCAAGAAAGCCACCCACTACGGGAACCAGGGCCGGAAGCTGAACAAGTCATTTTTCAAAGGAA
AATGAAAGATTTTCTTTCTTATTTGTGGGGGAAAAGCAAAAAAAGGAAAAAGGAAATTGGGGTTACAAACCCCACCCCCAAGGGATTTGGG--AAATACCATATTTTAAAGGGGAAAGGGCCGCATAACCCATTAAAAATTGCATATTTTAAATT
TTTTTTTTTTGAGAAAGAGGGGGGAC-------------------
and the same for the alignment without indexing:
>input1
TTTTTTTTTGAGCGGGTCGAACCGTCACGAAAAGAAAAGGGGAAGAACCATCAGCAGGAGTAATCCGTATTTTAATTGGATCCACATTCATAGCAAACACCAAAAATCCATATTGGGACCACAATCCCAACAAAGACCACTGGACAGAAGCCAAC
AAGGTAGGAGTGGGAGCATTCGGGCCTGGGTTCACTCCCCCACACGGAGGCCTTTTGGGGTGGAGCCCTCAGGCTCAAGGCATGCTGACAACATTACCAGCAAATCCGCCTCCTGCCTCCACCAATCGACAGTCAGGAAGGCAGCCTACCCCAAT
CACTCCACCTTTGAGAGACACTCATCCTCAGGCCATGCAGTGGAATTCCACAACATTCCACCAAGCTCTGCAGGATCCCAGAGTAAA------------TCCTGCTGGTGGCTCCAGTTCCGGAACAGTGAACCCTGTTCCGACTACTGCCTCAC
TCATCTCGTCAATCTTCTCGAGGATTGGGGACCCTGCACCGAACATGGAAAGCATCACATCAGGATTCCTAGGACCCCTGCTCGTGTTACAGGCGGGGTTTTTCTTGTTGACAAAAATCCTCACAATACCGCAGAGTCTAGACTCGTGGTGGACT
TCTCTCAATTTTCTAGGGGGAGCTCCCGTGTGTCTTGGCCAAAATTCTCAGTCCCAAACCTCCAGTCACTCACCAACCTCTTGTCCTCCAATTTGTCCTGCCTATCGCTGGATGTGTCTGCGGCGGTTTATCATATTCCTCTTCATCGTGCTGCT
ATGCCTCATCATCCTGTTGGGTTCGTCTGCACCATCAAAAGAATGTTGCCCCGGGTTGTGATTAAAAATTCCAAGGAGCAAGAAAGCCACCCACTACGGGAACCAGGGCCGGAAGCTGAACAAGTCATTTTTCAAAGGAAAATGAAAGATTTTCT
TTCTTATTTGTGGGGGAAAAGCAAAAAAAGGAAAAAGGAAATTGGGGTTACAAACCCCACCCCCAAGGGATTTGGGAAATACCATATTTTAAAGGGGAAAGGGCCGCATAACCCATTAAAAATTGCATATTTTA-AATTTTTTTTTTTTGAGAAA
GAGGGGGGAC--------
(Tracy version 0.6.1)
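When comparing two alignment outputs like these, it can help to separate "the bases differ" from "only the gap placement differs". The sketch below strips gaps and line breaks from each aligned row and compares what is left. The function names and the short toy strings are made up for illustration; they are not the sequences posted above.

```python
def strip_gaps(aligned: str) -> str:
    """Remove gap characters and whitespace from one aligned row."""
    return "".join(ch for ch in aligned if ch not in "-\n\r\t ")


def gap_count(aligned: str) -> int:
    """Count gap characters in one aligned row."""
    return aligned.count("-")


def same_bases(row_a: str, row_b: str) -> bool:
    """True when both rows contain the same bases once gaps are ignored."""
    return strip_gaps(row_a).upper() == strip_gaps(row_b).upper()


if __name__ == "__main__":
    # Toy rows standing in for the indexed and non-indexed output.
    indexed = "--TTTGAG-CGG-"
    plain = "TTTGAGCGG--"
    print(same_bases(indexed, plain), gap_count(indexed), gap_count(plain))
```

If `same_bases` is true for the real rows, the two runs agree on the trace sequence and differ only in where the aligner placed gaps against the reference.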
Hi,
Thanks for maintaining Tracy!
There are remarkably few (reliable) programs available for generating consensus sequences from forward and reverse Sanger data! Luckily, Tracy exists; I'm generating consensus sequences of fungal ITS from Sanger forward and reverse seqs in the following way:
# basecall the forward seq for use as reference
tracy basecall -f fasta -o ref.fa forward.ab1
# assemble reverse using forward seq as reference
tracy assemble -t 4 -d 1 -r ref.fa -o con reverse.ab1
Then I'm extracting the gap-free consensus sequence from the output JSON. I am assembling with -d 1 on the understanding that it will ensure the consensus sequence only includes bases where the reads match.
This works so far, but it feels a bit hacky and there are some niceties which would be extremely helpful. Most importantly, in situations such as A - N it would be nice to be able to take A into the consensus sequence rather than dropping it when using -d 1. It would also be nice to be able to output the consensus sequence directly. What do you think?
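To make the requested behaviour concrete, here is a minimal sketch of a per-column merge rule for two already-aligned reads: matching bases are kept, an N paired with a real base takes the real base, and everything else is dropped. This only illustrates the rule being asked for; it is not how tracy builds its consensus, and `merge_column` and `merge_alignment` are hypothetical names.

```python
def merge_column(fwd: str, rev: str) -> str:
    """Return the consensus base for one aligned column, or '' to drop it."""
    fwd, rev = fwd.upper(), rev.upper()
    if fwd == rev:
        # Agreement on N or on a gap carries no information, so drop it.
        return "" if fwd in ("N", "-") else fwd
    if fwd == "N" and rev != "-":
        return rev  # the "A - N" case: keep the called base
    if rev == "N" and fwd != "-":
        return fwd
    return ""  # true disagreement, or a gap in one read


def merge_alignment(fwd: str, rev: str) -> str:
    """Merge two aligned rows of equal length into a gap-free consensus."""
    if len(fwd) != len(rev):
        raise ValueError("aligned rows must have the same length")
    return "".join(merge_column(f, r) for f, r in zip(fwd, rev))


if __name__ == "__main__":
    print(merge_alignment("ACGTN-A", "ACNTAGC"))
```

In the toy call, the third and fifth columns pair a base with N and are kept, while the gap column and the A/C mismatch are dropped.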
I am trying to assemble some old (ca. 2010) Sanger traces and I'm getting the following error.
[2022-Feb-17 15:06:11] tracy assemble --inccons -o data0/tracy_assemble/6_512_1A04 ./data0/traces/6_512_1A04_ITS4_R0.ab1 ./data0/traces/6_512_1A04_ITS1F_R0.ab1
[2022-Feb-17 15:06:11] Load ab1 files
SCF version greater 2.9 required!
Why is this the case, and is there any way we can work around this? In my opinion, it's a limitation of Tracy, since so much .ab1 data is now old/outdated.
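Since the error mentions SCF rather than ABIF, one thing worth checking is which container format these files actually use. The sketch below inspects the first bytes of a trace file: ABIF files begin with the magic bytes "ABIF", and SCF files begin with ".scf" and store a four-character version string at byte offset 36 of the header. Whether these particular files are old-version SCF saved with an .ab1 extension is only a guess; the helper names are hypothetical.

```python
def sniff_trace_format(header: bytes) -> str:
    """Classify a trace file from its leading bytes."""
    magic = header[:4]
    if magic == b"ABIF":
        return "ABIF"
    if magic == b".scf":
        # The SCF header stores the version as 4 ASCII characters at offset 36.
        version = header[36:40].decode("ascii", errors="replace")
        return f"SCF {version}"
    return "unknown"


def sniff_trace_file(path: str) -> str:
    """Read just enough of the file to classify it."""
    with open(path, "rb") as handle:
        return sniff_trace_format(handle.read(128))


if __name__ == "__main__":
    import sys

    for trace_path in sys.argv[1:]:
        print(trace_path, sniff_trace_file(trace_path))
```

Running it over the trace directory shows at a glance whether the files that fail share a format or version.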
Thanks for maintaining Tracy!
What are the units of the trimming stringency in the assemble command?
-t [ --trim ] arg (=4) trimming stringency [1:9]
What does 1-9 mean? Could you please provide a short description of the trimming algorithm, or point us to where it is described?
I have done assembly using different trimming stringency and got the following results:
trim length f_leadingGaps f_mid_gaps m_f_gap_indexes f_trailingGaps r_leadingGaps rev_mid_gaps m_r_gap_indexes r_trailingGaps total_gaps sequence
0 1 348 39 2 [4, 9] 0 0 0 [] 42 41 "TTTGATCGTGGCTCAGGACGAACGCTGGCGGCGTGCCTAATACATGCAAGTAGAACGCTGAGAACTGGTGCTTGCACCGGTTCAAGGAGTTGCGAACGGGTGAGTAACGCGTAGGTAACCTACCTCATAGCGGGGGATAACTATTGGAAACGATAGCTAATACCGCATAAGAGAGACTAACGCATGTTAGTAATTTAAAAGGGGCAATTGCTCCACTATGAGATGGACCTGCGTTGTATTAGCTAGTTGGTGAGGTAAAGGCTCACCAAGGCGACGATACATAGCCGACCTGAGAGGGTGATCGCCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGCGGC"
1 2 315 133 0 [] 0 0 0 [] 120 133 "CGAACGCTGGCGGCGTGCCTAATACATGCAAGTAGAACGCTGAGAACTGGTGCTTGCACCGGTTCAAGGAGTTGCGAACGGGTGAGTAACGCGTAGGTAACCTACCTCATAGCGGGGGATAACTATTGGAAACGATAGCTAATACCGCATAAGAGAGACTAACGCATGTTAGTAATTTAAAAGGGGCAATTGCTCCACTATGAGATGGACCTGCGTTGTATTAGCTAGTTGGTGAGGTAAAGGCTCACCAAGGCGACGATACATAGCCGACCTGAGAGGGTGATCGGCCACACTGGGACTGAGACACGGCCCAG"
2 3 320 57 0 [] 0 0 0 [] 52 57 "GGACGAACGCTGGCGGCGTGCCTAATACATGCAAGTAGAACGCTGAGAACTGGTGCTTGCACCGGTTCAAGGAGTTGCGAACGGGTGAGTAACGCGTAGGTAACCTACCTCATAGCGGGGGATAACTATTGGAAACGATAGCTAATACCGCATAAGAGAGACTAACGCATGTTAGTAATTTAAAAGGGGCAATTGCTCCACTATGAGATGGACCTGCGTTGTATTAGCTAGTTGGTGAGGTAAAGGCTCACCAAGGCGACGATACATAGCCGACCTGAGAGGGTGATCGGCCACACTGGGACTGAGACACGGCCCAGAC"
3 4 327 39 0 [] 0 0 0 [] 46 39 "CTCAGGACGAACGCTGGCGGCGTGCCTAATACATGCAAGTAGAACGCTGAGAACTGGTGCTTGCACCGGTTCAAGGAGTTGCGAACGGGTGAGTAACGCGTAGGTAACCTACCTCATAGCGGGGGATAACTATTGGAAACGATAGCTAATACCGCATAAGAGAGACTAACGCATGTTAGTAATTTAAAAGGGGCAATTGCTCCACTATGAGATGGACCTGCGTTGTATTAGCTAGTTGGTGAGGTAAAGGCTCACCAAGGCGACGATACATAGCCGACCTGAGAGGGTGATCGGCCACACTGGGACTGAGACACGGCCCAGACTCC"
4 5 336 40 0 [] 0 0 0 [] 49 40 "GTGGCTCAGGACGAACGCTGGCGGCGTGCCTAATACATGCAAGTAGAACGCTGAGAACTGGTGCTTGCACCGGTTCAAGGAGTTGCGAACGGGTGAGTAACGCGTAGGTAACCTACCTCATAGCGGGGGATAACTATTGGAAACGATAGCTAATACCGCATAAGAGAGACTAACGCATGTTAGTAATTTAAAAGGGGCAATTGCTCCACTATGAGATGGACCTGCGTTGTATTAGCTAGTTGGTGAGGTAAAGGCTCACCAAGGCGACGATACATAGCCGACCTGAGAGGGTGATCGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGG"
5 6 340 40 1 [3] 0 0 0 [] 49 41 "TCGTGGCTCAGGACGAACGCTGGCGGCGTGCCTAATACATGCAAGTAGAACGCTGAGAACTGGTGCTTGCACCGGTTCAAGGAGTTGCGAACGGGTGAGTAACGCGTAGGTAACCTACCTCATAGCGGGGGATAACTATTGGAAACGATAGCTAATACCGCATAAGAGAGACTAACGCATGTTAGTAATTTAAAAGGGGCAATTGCTCCACTATGAGATGGACCTGCGTTGTATTAGCTAGTTGGTGAGGTAAAGGCTCACCAAGGCGACGATACATAGCCGACCTGAGAGGGTGATCGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGC"
6 7 343 40 1 [5] 0 0 0 [] 47 41 "GATCGTGGCTCAGGACGAACGCTGGCGGCGTGCCTAATACATGCAAGTAGAACGCTGAGAACTGGTGCTTGCACCGGTTCAAGGAGTTGCGAACGGGTGAGTAACGCGTAGGTAACCTACCTCATAGCGGGGGATAACTATTGGAAACGATAGCTAATACCGCATAAGAGAGACTAACGCATGTTAGTAATTTAAAAGGGGCAATTGCTCCACTATGAGATGGACCTGCGTTGTATTAGCTAGTTGGTGAGGTAAAGGCTCACCAAGGCGACGATACATAGCCGACCTGAGAGGGTGATCGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGCG"
7 8 345 40 1 [6] 0 0 0 [] 45 41 "TGATCGTGGCTCAGGACGAACGCTGGCGGCGTGCCTAATAAATGCAAGTAGAACGCTGAGAACTGGTGCTTGCACCGGTTCAAGGAGTTGCGAACGGGTGAGTAACGCGTAGGTAACCTACCTCATAGCGGGGGATAACTATTGGAAACGATAGCTAATACCGCATAAGAGAGACTAACGCATGTTAGTAATTTAAAAGGGGCAATTGCTCCACTATGAGATGGACCTGCGTTGTATTAGCTAGTTGGTGAGGTAAAGGCTCACCAAGGCGACGATACATAGCCGACCTGAGAGGGTGATCGGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGCGG"
8 9 348 39 2 [4, 9] 0 0 0 [] 42 41 "TTTGATCGTGGCTCAGGACGAACGCTGGCGGCGTGCCTAATACATGCAAGTAGAACGCTGAGAACTGGTGCTTGCACCGGTTCAAGGAGTTGCGAACGGGTGAGTAACGCGTAGGTAACCTACCTCATAGCGGGGGATAACTATTGGAAACGATAGCTAATACCGCATAAGAGAGACTAACGCATGTTAGTAATTTAAAAGGGGCAATTGCTCCACTATGAGATGGACCTGCGTTGTATTAGCTAGTTGGTGAGGTAAAGGCTCACCAAGGCGACGATACATAGCCGACCTGAGAGGGTGATCGCCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGCGGC"
Trim 1 and trim 9 produced identical consensus sequences, which does not make sense.
Another interesting observation is that trim value 2 has very high values for leadingGaps and trailingGaps, for both the F and R strands.
Could you please provide a short description of the trimming algorithm so we can understand what's going on?
Thank you!
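For anyone wanting to reproduce gap columns like the ones in the table, here is a minimal sketch of how leading, trailing and internal gaps can be counted from one aligned row. It is a guess at how such numbers might be derived, not the script used for the table; `gap_stats` is a hypothetical name, and reporting internal gap positions relative to the first non-gap base is a choice made here.

```python
def gap_stats(aligned: str) -> dict:
    """Count leading, trailing and internal gaps in one aligned row."""
    without_leading = aligned.lstrip("-")
    leading = len(aligned) - len(without_leading)
    core = without_leading.rstrip("-")
    trailing = len(without_leading) - len(core)
    # Positions are relative to the first non-gap base of the row.
    mid_indexes = [i for i, ch in enumerate(core) if ch == "-"]
    return {
        "leading": leading,
        "mid": len(mid_indexes),
        "mid_indexes": mid_indexes,
        "trailing": trailing,
        "total": leading + len(mid_indexes) + trailing,
    }


if __name__ == "__main__":
    print(gap_stats("--AC-GT-A---"))
```

Comparing these counts across trim values for the forward and reverse rows separately makes it easier to see which read the trimming setting is affecting.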