genometools / genomethreader

GenomeThreader gene prediction software.

License: ISC License


genomethreader's Introduction

GenomeTools

The GenomeTools genome analysis system is a free collection of bioinformatics tools (in the realm of genome informatics) combined into a single binary named gt. It is based on a C library named libgenometools which contains a wide variety of classes for efficient and convenient implementation of sequence and annotation processing software.

If you are interested in gene prediction, have a look at GenomeThreader.

Platforms

GenomeTools has been designed to run on every POSIX compliant UNIX system, for example, Linux, macOS, and OpenBSD.

Building and Installation

Debian-based operating systems

Debian and Ubuntu users can install the most recent stable version simply using apt, e.g.

apt-get install genometools

(as root) to install the gt executable. To install the library and development headers, use

apt-get install libgenometools0 libgenometools0-dev

instead. This is not required to just use the tools.

macOS (via Homebrew)

If Homebrew is installed, GenomeTools can be installed on supported macOS versions using brew:

brew install genometools

Building from source

To use GenomeTools on systems that do not have native packages, or to modify GenomeTools at build time, you need to build from source. Source tarballs are available from GitHub. For instructions on how to build the source by yourself, have a look at the INSTALL file. In most cases (e.g. on a 64-bit Linux system) something like

make -j4

should suffice. On 32-bit systems, add the 32bit=yes option. Add cairo=no if you do not have the Cairo libraries and their development headers installed. This will, however, remove AnnotationSketch support from the resulting binary. When your binary has been built, use the install target and prefix option to install the compiled binary on your system. Make sure you repeat all the options from the original make run. So

make -j4 install prefix=~/gt

would install the software in the gt subdirectory in the current user's home directory. If no prefix option is given, the software will be installed system-wide (requires root access).
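
Because every option from the build has to be repeated for the install step, one way to avoid a mismatch is to keep the options in a single variable. The following is only a minimal sketch: the variable names GT_OPTS and GT_PREFIX are illustrative and not part of the GenomeTools build system, and the commands are printed for review rather than executed.

```shell
#!/bin/sh
# Illustrative variable names (not defined by GenomeTools).
GT_OPTS="cairo=no"        # append 32bit=yes on a 32-bit system
GT_PREFIX="$HOME/gt"

# Build and install share the same option list.
build_cmd="make -j4 $GT_OPTS"
install_cmd="make -j4 install prefix=$GT_PREFIX $GT_OPTS"

# Print the commands for review; run them from the source directory.
echo "$build_cmd"
echo "$install_cmd"
```

Leaving GT_OPTS empty corresponds to the plain 64-bit build with Cairo support described above.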

Contributing

GenomeTools uses a collective code construction contract for contributions (and the process explains how to submit a patch). Basically, just fork this repository on GitHub, start hacking on your own feature branch and submit a pull request when you are ready. Our recommended coding style is explained in the developer's guide (among other technical guidelines).

To report a bug, ask a question, or suggest new features, use the GenomeTools issue tracker.


genomethreader's Issues

How to merge multiple GFF3 files?

Dear authors,
Thank you for developing such a useful program. I have a question: if I have multiple GFF3 files for training a BSSM, how can I merge them for the gthbssmtrain program? I do not know how to sort the file; the program reports the following error:

"error: the file testPB.gff is not sorted (example: line 9 and 14)"

Thanks again. I look forward to your help.
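
One possible approach, offered as a sketch and not confirmed by the authors: the gt gff3 tool from GenomeTools can read several GFF3 files and write them out as a single sorted stream. Whether gthbssmtrain accepts the result for a particular data set is an assumption that should be verified. The function name merge_gff3 and the file names are hypothetical.

```shell
# merge_gff3 is a hypothetical helper name; gt is the GenomeTools binary.
merge_gff3() {
  out=$1
  shift
  # -sort orders the features; -tidy tries to repair minor format problems.
  gt gff3 -sort -tidy "$@" > "$out"
}

# Usage (file names are placeholders):
#   merge_gff3 merged.gff3 first.gff3 second.gff3
```

Without additional options, gt gff3 assigns new feature IDs, which avoids ID clashes between the input files.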

How to set the -species parameter?

Hi,

My target species is Populus. Which parameter file should I choose: rice, arabidopsis, or another one?

Thanks in advance
Bai

formatting puzzle

For some proteins, I get the following funny offset in the output (for position 121). Not consistent. Must be a trivial bug somewhere. Volker

query
MPAWPAAAVAKAVPSPSTPPPPHSRGAGRRRLRPCGAKKGPGTDERGATAGGGGVVTRGA
LLRSGAALFALGFVDAGYSGDWSRIGAISKDTEEALKLAAYAVVPLCLAVVFSPSSEDGS
NNT*

gdna
AACCGTTTGCATCGGCGGGAGTGCCAGGTTTTGTCCGTACCTCGCACCCCGAGAGCGTGCAGCGCTCCACGTTCTGCTGCTAACGCACGTGTGGACGAGAAGCAAACACCCAAACACCCAACTACCACCGCGCACACAGCACAGCATGCCGGCGTGGCCCGCCGCCGCCGTCGCCAAGGCCGTCCCGTCCCCGTCCACGCCGCCGCCGCCGCACTCGCGCGGAGCAGTCCGCCGTCGCCTCCGCCCGTGCGGCGCAAAGAAGGGCCCCGGGACCGACGAGCGAGGTGCCACCGCCGGCGGCGGCGGCGTGGTCACCAGGGGCGCCCTCCTGCGGTCCGGCGCCGCTCTCTTCGCGCTCGGCTTCGTCGACGCCGGGTGAGTCGCACGCACGCGCGCGCGCGCGCGCGCTGATGACGGATTTCCATTGCGTTTTGCACCGGTGGTCGATCTGGTCGGTCTCACCCGTTTCGGGGCGGCTTTGCTGCAGGTACAGCGGCGACTGGTCACGCATCGGCGCCATCTCCAAGGACACCGAGGAGGCGCTCAAGCTGGCCGCCTACGCCGTCGTGCCTCTCTGCCTCGCCGTCGTGTTCTCGCCATCGTCGGAAGACGGCAGTAACAACACCTGAATCCTAGGAAGAAGAGCCATGCTGCAGATCATGTAGTCGTCGTGATACCATCATGCTTACGTGTTGTATACTTCGATCGTGATCGCAATGGGCATTTCTTTTGCCTAAGGGGTGTGTTATCAGCAGTGAAAAGTTTTGGCGTGTCACATCGAATATACGAATATAGATTGATAGCAAAACAAATTGCAGATTCCGTCTATAAATTACGAGACGAATTTATTATTTAATTAATCCATCATTAGCAAATATTTACTGTACCACCACATTATCAAATCATAGAACAATTAGGCTTAAAAGATTCATCTCGTAATTTACACACAATCTGTGTAATTAGTTATTTACATTTAATGTGACTGAGTGAAAATTTTTTG

gth -protein query -genomic gdna -species rice > out
more out
$ GenomeThreader 1.7.1
$ Date run: 2021-01-06 14:53:45
$ Arguments: -protein query -genomic gdna -species rice

Protein Sequence: file=query, description=query

1 MPAWPAAAVA KAVPSPSTPP PPHSRGAGRR RLRPCGAKKG PGTDERGATA GGGGVVTRGA
61 LLRSGAALFA LGFVDAGYSG DWSRIGAISK DTEEALKLAA YAVVPLCLAV VFSPSSEDGS
121 NNT*

Genomic Template: file=gdna, strand=+, from=1, to=926, description=gdna
....

about -noautoindex

Hi Gordon et al.,
I have been trying to use gth in parallel, using a combination of -noautoindex and -intermediate. I got it to work only in a roundabout way, because I could not figure out how to make a proper index with either mkvtree or a first run of gth. The problem I ran into was the presence or absence of .dna in the index file names. Below is the script I used to get around it. Surely there must be a more elegant solution?
Happy New Year, Volker

#!/bin/bash

NUMPRC=24
GENOME=IRBB7unm.fa
CDNAFILE=IRBB7trinityTranscripts.fa

# We'll run a toy spliced alignment to create the genome index:

head -2 IRBB7trinityTranscripts.fa > tmpcdna
gth -genomic ${GENOME} -cdna tmpcdna -species rice

# ... if everything worked as planned, we should now have the
# genome index files and can go ahead with the real work in parallel.
# However, the created genome index files have the extra tag .dna,
# which is then not recognized when using the -noautoindex option to
# gth next. As a workaround, we rename the index files to get rid of
# the .dna tag. Then gth -noautoindex works (it seems to copy the index
# files it needs, using again the .dna tag, but now we seem to have a
# working index ...).

ls -1 ${GENOME}.dna* > tmpcmda
sed -e "s/\.dna//" tmpcmda > tmpcmdb
sed -i -e "s/^/mv /" tmpcmda
paste tmpcmda tmpcmdb | bash
gth -noautoindex -genomic ${GENOME} -cdna tmpcdna -species rice
\rm tmpcmda tmpcmdb tmpcdna*

gt splitfasta -numfiles ${NUMPRC} ${CDNAFILE}

for cdnafile in ${CDNAFILE}.*
do
gth -noautoindex -intermediate -xmlout -gzip -o gth.${cdnafile}.gz -genomic ${GENOME} -cdna ${cdnafile} -species rice &
done
wait
echo "... gth intermediate run done"

gthconsensus -o gth.TranscriptsOnIRBB7 gth.${CDNAFILE}.*.gz
echo "... gthconsensus run done"
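
The index renaming step in the script above could also be written as a plain loop, without the temporary command files. This is only a sketch of the same workaround, not a statement about how gth expects its index files to be named; the function name strip_dna_tag is hypothetical.

```shell
#!/bin/bash
# strip_dna_tag is a hypothetical helper: it renames files such as
# genome.fa.dna.suffix to genome.fa.suffix.
strip_dna_tag() {
  local genome=$1 f
  for f in "${genome}".dna*; do
    [ -e "$f" ] || continue            # no matching index files
    # -n refuses to overwrite an existing file (e.g. the genome itself).
    mv -n -- "$f" "${genome}${f#"${genome}.dna"}"
  done
}

# Usage: strip_dna_tag "${GENOME}"
```

The parameter expansion removes only the leading "<genome>.dna" part of each name, so any other occurrence of "dna" in a file name is left alone.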

homebrew formula for GenomeThreader

With the code signing requirements of macOS Catalina it becomes harder to distribute GenomeThreader binaries. It would be good to have a Homebrew Formula and profit from their code signing efforts.

How to solve the "cannot realloc memory" problem?

I have split the FASTA file, but the sequences are still too long. The problem is still there!

Slightly different gene structures produced on i386

Hi,

I am investigating a build failure on Debian sid i386 (see https://buildd.debian.org/status/fetch.php?pkg=genomethreader&arch=i386&ver=1.7.3%2Bdfsg-2&stamp=1579736664&raw=0), the only failure across all supported architectures (see https://buildd.debian.org/status/package.php?p=genomethreader).
The reason for this failure is that apparently this 32-bit version outputs two exons in the U89959 test case differently. I confirmed that this issue also appears in the binary release version downloaded from the official GenomeThreader site by copying the testdata and testsuite directories from the GenomeThreader source into the extracted binary distribution directory and running testsuite.rb:

(sid-i386)root@debian:/tmp/gth-1.7.3-Linux_i386-32bit/testsuite# ./testsuite.rb
  1: gth                                                         : ok
  2: gth -help                                                   : ok
  3: gth -help+                                                  : ok
  4: gth -version                                                : ok
  5: gth regression test (assertion in gthsafilter)              : ok
  6: gth regression test (-gff3out -intermediate)                : ok
  7: gth regression test (-gff3out -intermediate -fastdp)        : ok
  8: gth regression test (-gff3out -skipalignmentout)            : ok
  9: gth regression test (-gff3out -skipalignmentout -fastdp)    : ok
[...]
 50: fastdp (U89959)                                             : failed
     [ problem: unexpected return code: 1 != 0
       in: /tmp/gth-1.7.3-Linux_i386-32bit/testsuite/stest_testsuite/test50 ]
 51: fastdp (U89959, introncutout)                               : failed
     [ problem: unexpected return code: 1 != 0
       in: /tmp/gth-1.7.3-Linux_i386-32bit/testsuite/stest_testsuite/test51 ]
[...]

(sid-i386)root@debian:/tmp/gth-1.7.3-Linux_i386-32bit/testsuite# cat /tmp/gth-1.7.3-Linux_i386-32bit/testsuite/stest_testsuite/test50/stdout_2
1333,1336c1333,1336
< 1877523	gth	exon	105010	105218	0.861	+	.	Parent=gene174
< 1877523	gth	five_prime_cis_splice_site	105219	105220	0	+	.	Parent=gene174
< 1877523	gth	three_prime_cis_splice_site	105303	105304	0	+	.	Parent=gene174
< 1877523	gth	exon	105305	105428	0.887	+	.	Parent=gene174
---
> 1877523	gth	exon	105010	105223	0.86	+	.	Parent=gene174
> 1877523	gth	five_prime_cis_splice_site	105224	105225	0	+	.	Parent=gene174
> 1877523	gth	three_prime_cis_splice_site	105308	105309	0	+	.	Parent=gene174
> 1877523	gth	exon	105310	105428	0.891	+	.	Parent=gene174
1345,1348c1345,1348
< 1877523	gth	exon	105009	105218	0.862	+	.	Parent=gene175
< 1877523	gth	five_prime_cis_splice_site	105219	105220	0	+	.	Parent=gene175
< 1877523	gth	three_prime_cis_splice_site	105303	105304	0	+	.	Parent=gene175
< 1877523	gth	exon	105305	105428	0.887	+	.	Parent=gene175
---
> 1877523	gth	exon	105009	105223	0.86	+	.	Parent=gene175
> 1877523	gth	five_prime_cis_splice_site	105224	105225	0	+	.	Parent=gene175
> 1877523	gth	three_prime_cis_splice_site	105308	105309	0	+	.	Parent=gene175
> 1877523	gth	exon	105310	105428	0.891	+	.	Parent=gene175

These tests were run in an i386 Debian sid chroot.

Is this a bug or is this deviation known or acceptable?
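
To separate score rounding from real coordinate changes in a diff like the one above, one option is to blank the score column (column 6 of GFF3) before comparing. The sketch below uses one pair of exon lines from the report; strip_score is a hypothetical helper, not part of the GenomeThreader test suite.

```shell
# strip_score is a hypothetical helper: it replaces the GFF3 score
# (column 6) with "." so that only structural differences remain.
strip_score() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ || NF < 9 { print; next }
       { $6 = "."; print }' "$@"
}

# One exon line from each side of the diff in the report above.
left=$(printf '1877523\tgth\texon\t105305\t105428\t0.887\t+\t.\tParent=gene174\n' | strip_score)
right=$(printf '1877523\tgth\texon\t105310\t105428\t0.891\t+\t.\tParent=gene174\n' | strip_score)

if [ "$left" = "$right" ]; then
  echo "only the score differs"
else
  echo "coordinates differ"
fi
```

For this pair the exon start positions differ (105305 versus 105310), so the lines still differ after the score is removed.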

There is no gthbssmtrain in the conda version

Hi,

I only found gth after installing genomethreader via Bioconda. Is this a problem?

Best,
Kun

Genomethreader treatment of pseudogenes

Hello,

Can someone comment on what GenomeThreader does with pseudogenes? Are they kept or thrown out? I've been digging around in the manual and Gordon's thesis, but I haven't seen anything that explicitly confirms either way.

tagged release + instructions to build from source?

It would be nice if the latest release that is available from http://genomethreader.org/download.html (1.7.1) is tagged here, so the sources that correspond to the binary releases can be grabbed easily.

In addition, are there any instructions available on how to compile the latest release from source? Is there a particular reason why you are not providing a source tarball via http://genomethreader.org/download.html ?