genomethreader's Introduction

GenomeTools

The GenomeTools genome analysis system is a free collection of bioinformatics tools (in the realm of genome informatics) combined into a single binary named gt. It is based on a C library named libgenometools which contains a wide variety of classes for efficient and convenient implementation of sequence and annotation processing software.

Overview

If you are interested in gene prediction, have a look at GenomeThreader.

GenomeTools has been designed to run on every POSIX compliant UNIX system, for example, Linux, macOS, and OpenBSD.

Debian-based operating systems

Debian and Ubuntu users can install the most recent stable version simply using apt, e.g.

apt-get install genometools

(as root) to install the gt executable. To install the library and development headers, use

apt-get install libgenometools0 libgenometools0-dev

instead. This is not required to just use the tools.

macOS (via Homebrew)

If Homebrew is installed, GenomeTools can be installed on supported macOS versions using brew:

brew install genometools

Building from source

To use GenomeTools on systems that do not have native packages, or to modify GenomeTools at build time, you need to build from source. Source tarballs are available from GitHub. For instructions on how to build the source by yourself, have a look at the INSTALL file. In most cases (e.g. on a 64-bit Linux system) something like

make -j4

should suffice. On 32-bit systems, add the 32bit=yes option. Add cairo=no if you do not have the Cairo libraries and their development headers installed. This will, however, remove AnnotationSketch support from the resulting binary. When your binary has been built, use the install target and prefix option to install the compiled binary on your system. Make sure you repeat all the options from the original make run. So

make -j4 install prefix=~/gt

would install the software in the gt subdirectory in the current user's home directory. If no prefix option is given, the software will be installed system-wide (requires root access).
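The rule about repeating the build options at install time can be wrapped in a small helper. This is a minimal sketch, not something shipped with GenomeTools: the function name `build_and_install` is hypothetical, and it assumes only the make options described above (`cairo=no`, `32bit=yes`, `prefix`).

```shell
# Hypothetical helper (not part of GenomeTools): run the build and the
# install step with exactly the same make options.
# Usage: build_and_install PREFIX [MAKE_OPTION...]
build_and_install() {
    prefix=$1
    shift
    # Build first; install only if the build succeeded.
    make -j4 "$@" && make -j4 "$@" install prefix="$prefix"
}
```

Run from the top of the source tree, `build_and_install ~/gt cairo=no` would build without Cairo support and then install into ~/gt with the same option.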

GenomeTools uses a collective code construction contract for contributions (and the process explains how to submit a patch). Basically, just fork this repository on GitHub, start hacking on your own feature branch and submit a pull request when you are ready. Our recommended coding style is explained in the developer's guide (among other technical guidelines).

To report a bug, ask a question, or suggest new features, use the GenomeTools issue tracker.


genomethreader's Issues

How to merge multiple GFF3 files?

Dear authors,
Thanks for developing such a useful program. I have a question: if I have multiple GFF3 files for training a BSSM, how can I merge them for the gthbssmtrain program? I do not know how to sort the file, and the program reports an error:

"error: the file testPB.gff is not sorted (example: line 9 and 14)"

Thanks again, and I look forward to your help.
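One possible approach, sketched under assumptions: the GenomeTools `gt gff3` tool accepts several GFF3 files and, with `-sort`, writes them out as a single sorted stream; `-tidy` tries to repair minor format problems and `-retainids` keeps the original feature IDs where available. Whether the merged file then satisfies gthbssmtrain for a particular data set is not confirmed anywhere in this thread. The helper name `merge_gff3` and the file names in the usage note are hypothetical.

```shell
# Hypothetical helper: merge and sort several GFF3 files with `gt gff3`.
# Usage: merge_gff3 OUTPUT.gff3 INPUT1.gff3 [INPUT2.gff3 ...]
merge_gff3() {
    out=$1
    shift
    # -sort      : emit features sorted by sequence and position
    # -tidy      : try to fix minor format errors instead of aborting
    # -retainids : keep the IDs from the input files where available
    gt gff3 -sort -tidy -retainids "$@" > "$out"
}
```

For example, `merge_gff3 merged.gff3 testPB.gff other.gff` would write one sorted file. If the input files reuse the same feature IDs, leaving out `-retainids` so that gt assigns fresh IDs may be the safer choice.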

formatting puzzle

For some proteins, I get the following funny offset in the output (for position 121). Not consistent. Must be a trivial bug somewhere. Volker

query
MPAWPAAAVAKAVPSPSTPPPPHSRGAGRRRLRPCGAKKGPGTDERGATAGGGGVVTRGA
LLRSGAALFALGFVDAGYSGDWSRIGAISKDTEEALKLAAYAVVPLCLAVVFSPSSEDGS
NNT*

gdna
AACCGTTTGCATCGGCGGGAGTGCCAGGTTTTGTCCGTACCTCGCACCCCGAGAGCGTGCAGCGCTCCACGTTCTGCTGCTAACGCACGTGTGGACGAGAAGCAAACACCCAAACACCCAACTACCACCGCGCACACAGCACAGCATGCCGGCGTGGCCCGCCGCCGCCGTCGCCAAGGCCGTCCCGTCCCCGTCCACGCCGCCGCCGCCGCACTCGCGCGGAGCAGTCCGCCGTCGCCTCCGCCCGTGCGGCGCAAAGAAGGGCCCCGGGACCGACGAGCGAGGTGCCACCGCCGGCGGCGGCGGCGTGGTCACCAGGGGCGCCCTCCTGCGGTCCGGCGCCGCTCTCTTCGCGCTCGGCTTCGTCGACGCCGGGTGAGTCGCACGCACGCGCGCGCGCGCGCGCGCTGATGACGGATTTCCATTGCGTTTTGCACCGGTGGTCGATCTGGTCGGTCTCACCCGTTTCGGGGCGGCTTTGCTGCAGGTACAGCGGCGACTGGTCACGCATCGGCGCCATCTCCAAGGACACCGAGGAGGCGCTCAAGCTGGCCGCCTACGCCGTCGTGCCTCTCTGCCTCGCCGTCGTGTTCTCGCCATCGTCGGAAGACGGCAGTAACAACACCTGAATCCTAGGAAGAAGAGCCATGCTGCAGATCATGTAGTCGTCGTGATACCATCATGCTTACGTGTTGTATACTTCGATCGTGATCGCAATGGGCATTTCTTTTGCCTAAGGGGTGTGTTATCAGCAGTGAAAAGTTTTGGCGTGTCACATCGAATATACGAATATAGATTGATAGCAAAACAAATTGCAGATTCCGTCTATAAATTACGAGACGAATTTATTATTTAATTAATCCATCATTAGCAAATATTTACTGTACCACCACATTATCAAATCATAGAACAATTAGGCTTAAAAGATTCATCTCGTAATTTACACACAATCTGTGTAATTAGTTATTTACATTTAATGTGACTGAGTGAAAATTTTTTG

gth -protein query -genomic gdna -species rice > out
more out
$ GenomeThreader 1.7.1
$ Date run: 2021-01-06 14:53:45
$ Arguments: -protein query -genomic gdna -species rice


Protein Sequence: file=query, description=query

1 MPAWPAAAVA KAVPSPSTPP PPHSRGAGRR RLRPCGAKKG PGTDERGATA GGGGVVTRGA
61 LLRSGAALFA LGFVDAGYSG DWSRIGAISK DTEEALKLAA YAVVPLCLAV VFSPSSEDGS
121 NNT*

Genomic Template: file=gdna, strand=+, from=1, to=926, description=gdna
....

about -noautoindex

Hi Gordon et al.,
I have been trying to use gth in parallel, using a combination of -noautoindex and -intermediate. I got it to work in a roundabout way only, because I could not figure out how to make a proper index with either mkvtree or a first run of gth. The problem I ran into was presence/absence of .dna in the index files. Below is a script how I got around it. Surely there must be a more elegant solution?
Happy New Year, Volker

#!/bin/bash

NUMPRC=24
GENOME=IRBB7unm.fa
CDNAFILE=IRBB7trinityTranscripts.fa

# We'll run a toy spliced alignment to create the genome index:
head -2 ${CDNAFILE} > tmpcdna
gth -genomic ${GENOME} -cdna tmpcdna -species rice

# ... if everything worked as planned, we should now have the
# genome index files and can go ahead with the real work in parallel.
# However, the created genome index files have the extra tag .dna,
# which is then not recognized when using the -noautoindex option to
# gth next. As a workaround, we rename the index files to get rid of
# the .dna tag. Then gth -noautoindex works (it seems to copy the index
# files it needs, using again the .dna tag, but now we seem to have a
# working index ...).
ls -1 ${GENOME}.dna* > tmpcmda
sed -e "s/\.dna//" tmpcmda > tmpcmdb
sed -i -e "s/^/mv /" tmpcmda
paste tmpcmda tmpcmdb | bash
gth -noautoindex -genomic ${GENOME} -cdna tmpcdna -species rice
\rm tmpcmda tmpcmdb tmpcdna*

# Split the cDNA file into one chunk per process:
gt splitfasta -numfiles ${NUMPRC} ${CDNAFILE}

# Run one gth process per chunk in the background, then wait for all of them:
for cdnafile in ${CDNAFILE}.*
do
  gth -noautoindex -intermediate -xmlout -gzip -o gth.${cdnafile}.gz -genomic ${GENOME} -cdna ${cdnafile} -species rice &
done
wait
echo "... gth intermediate run done"

# Combine the intermediate results:
gthconsensus -o gth.TranscriptsOnIRBB7 gth.${CDNAFILE}.*.gz
echo "... gthconsensus run done"
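The renaming step in the script above can also be written as a plain loop, without the temporary command files. This is a minimal sketch that assumes the index files are named `<genome>.dna.<suffix>`, as described in the post; the function name `strip_dna_tag` is hypothetical, and the sketch does not explain why -noautoindex expects the other naming.

```shell
# Hypothetical helper: rename GENOME.dna.SUFFIX index files to GENOME.SUFFIX.
# Usage: strip_dna_tag GENOME
strip_dna_tag() {
    genome=$1
    for f in "$genome".dna.*; do
        # If nothing matched, the pattern stays literal; skip it.
        [ -e "$f" ] || continue
        # Keep only the part after "GENOME.dna."
        suffix=${f#"$genome".dna.}
        mv -- "$f" "$genome.$suffix"
    done
}
```

In the script, `strip_dna_tag "${GENOME}"` would take the place of the ls/sed/paste lines.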

Slightly different gene structures produced on i386

Hi,

I am investigating a build failure on Debian sid i386 (see https://buildd.debian.org/status/fetch.php?pkg=genomethreader&arch=i386&ver=1.7.3%2Bdfsg-2&stamp=1579736664&raw=0), the only failure across all supported architectures (see https://buildd.debian.org/status/package.php?p=genomethreader).
The reason for this failure is that apparently this 32-bit version outputs two exons in the U89959 test case differently. I confirmed that this issue also appears in the binary release version downloaded from the official GenomeThreader site by copying the testdata and testsuite directories from the GenomeThreader source into the extracted binary distribution directory and running testsuite.rb:

(sid-i386)root@debian:/tmp/gth-1.7.3-Linux_i386-32bit/testsuite# ./testsuite.rb
  1: gth                                                         : ok
  2: gth -help                                                   : ok
  3: gth -help+                                                  : ok
  4: gth -version                                                : ok
  5: gth regression test (assertion in gthsafilter)              : ok
  6: gth regression test (-gff3out -intermediate)                : ok
  7: gth regression test (-gff3out -intermediate -fastdp)        : ok
  8: gth regression test (-gff3out -skipalignmentout)            : ok
  9: gth regression test (-gff3out -skipalignmentout -fastdp)    : ok
[...]
 50: fastdp (U89959)                                             : failed
     [ problem: unexpected return code: 1 != 0
       in: /tmp/gth-1.7.3-Linux_i386-32bit/testsuite/stest_testsuite/test50 ]
 51: fastdp (U89959, introncutout)                               : failed
     [ problem: unexpected return code: 1 != 0
       in: /tmp/gth-1.7.3-Linux_i386-32bit/testsuite/stest_testsuite/test51 ]
[...]
(sid-i386)root@debian:/tmp/gth-1.7.3-Linux_i386-32bit/testsuite# cat /tmp/gth-1.7.3-Linux_i386-32bit/testsuite/stest_testsuite/test50/stdout_2
1333,1336c1333,1336
< 1877523	gth	exon	105010	105218	0.861	+	.	Parent=gene174
< 1877523	gth	five_prime_cis_splice_site	105219	105220	0	+	.	Parent=gene174
< 1877523	gth	three_prime_cis_splice_site	105303	105304	0	+	.	Parent=gene174
< 1877523	gth	exon	105305	105428	0.887	+	.	Parent=gene174
---
> 1877523	gth	exon	105010	105223	0.86	+	.	Parent=gene174
> 1877523	gth	five_prime_cis_splice_site	105224	105225	0	+	.	Parent=gene174
> 1877523	gth	three_prime_cis_splice_site	105308	105309	0	+	.	Parent=gene174
> 1877523	gth	exon	105310	105428	0.891	+	.	Parent=gene174
1345,1348c1345,1348
< 1877523	gth	exon	105009	105218	0.862	+	.	Parent=gene175
< 1877523	gth	five_prime_cis_splice_site	105219	105220	0	+	.	Parent=gene175
< 1877523	gth	three_prime_cis_splice_site	105303	105304	0	+	.	Parent=gene175
< 1877523	gth	exon	105305	105428	0.887	+	.	Parent=gene175
---
> 1877523	gth	exon	105009	105223	0.86	+	.	Parent=gene175
> 1877523	gth	five_prime_cis_splice_site	105224	105225	0	+	.	Parent=gene175
> 1877523	gth	three_prime_cis_splice_site	105308	105309	0	+	.	Parent=gene175
> 1877523	gth	exon	105310	105428	0.891	+	.	Parent=gene175

These tests were run in an i386 Debian sid chroot.

Is this a bug or is this deviation known or acceptable?

Genomethreader treatment of pseudogenes

Hello,

Can someone comment on what GenomeThreader does with pseudogenes? Are they kept or thrown out? I've been digging around in the manual and Gordon's thesis, but I haven't seen anything that explicitly confirms either.
