wgerlach / fgs Goto Github PK
View Code? Open in Web Editor NEWThis project forked from wltrimbl/fgs
FragGeneScan -- hidden Markov model for predicting prokaryotic coding regions
This project forked from wltrimbl/fgs
FragGeneScan -- hidden Markov model for predicting prokaryotic coding regions
Installation ============= To install FragGeneScan, please follow the steps below: 1. Untar the downloaded file "FragGeneScan.tar.gz". This will automatically generate the directory "FragGeneScan". 2. Make sure that you also have a C compiler such as "gcc" and perl interpreter. 3. Run "makefile" to compile and build excutable "FragGeneScan" make clean make fgs Running the program ==================== 1. To run FragGeneScan, ./run_FragGeneScan.pl -genome=[seq_file_name] -out=[output_file_name] -complete=[1 or 0] -train=[train_file_name] [seq_file_name]: sequence file name including the full path [output_file_name]: output file name including the full path [whole_genome]: 1 if the sequence file has complete genomic sequences 0 if the sequence file has short sequence reads [train_file_name]: file name that contains model parameters; this file should be in the "train" directory. Note that four files containing model parameters already exist in the "train" directory. [complete] for complete genomic sequences or short sequence reads without sequencing error [sanger_5] for Sanger sequencing reads with about 0.5% error rate [sanger_10] for Sanger sequencing reads with about 1% error rate [454_5] for 454 pyrosequencing reads with about 0.5% error rate [454_10] for 454 pyrosequencing reads with about 1% error rate [454_30] for 454 pyrosequencing reads with about 3% error rate [illumina_5] for Illumina sequencing reads with about 0.5% error rate [illumina_10] for Illumina sequencing reads with about 1% error rate 2. To test FragGeneScan with a complete genomic sequence, ./run_FragGeneScan.pl -genome=./example/NC_000913.fna -out=./example/NC_000913.test -complete=1 -train=complete [NC_000913.fna]: this sequence file has the complete genomic sequence of E.coli 3. To test FragGeneScan with sequencing reads, ./run_FragGeneScan.pl -genome=./example/NC_000913-454.fna -out=./example/NC_000913-454.test -complete=0 -train=454_10 [NC_000913-454.fna]: this sequence file has simulated reads (pyrosequencing, average length = 400 bp and sequencing error = 1%) generated using Metasim Output ============ Upon completion, FragGeneScan generates three files. 1. The first file is "[output_file_name].out", which lists the coordinates of putative genes. This file consists of five columns (start position, end position, strand, frame, score). For example, >gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome 108 440 - 3 1.378688 337 2799 + 1 1.303498 2801 3733 + 2 1.317386 3734 5020 + 2 1.293573 5234 5530 + 2 1.354725 5683 6459 - 1 1.290816 6529 7959 - 1 1.326412 8238 9191 + 3 1.286832 9306 9893 + 3 1.317067 2. The second file is '[output_file_name].ffn", which lists nucleotide sequences corresponding to the putative genes in "[output_file_name].out". For example, >gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=108 e nd=338 strand=- GTTGTTACCTCGTTACCTTTGGTCGAAAAAAAAAGCCCGCACTGTCAGGTGCGGGCTTTTTTCTGTGTTTCCTGTACGCGTCAGCCCGCACCGTTACCTG TGGTAATGGTGATGGTGGTGGTAATGGTGGTGCTAATGCGTTTCATGGATGTTGTGTACTCTGTAATTTTTATCTGTCTGTGCGCTATGCCTATATTGGT TAAAGTATTTAGTGACCTAAGTCAA >gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=343 e nd=2799 strand=+ TTGAAGTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCCAGGCAGGGGCAGGTGGCCACCGTCC TCTCTGCCCCCGCCAAAATCACCAACCACCTGGTGGCGATGATTGAAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTAT TTTTGCCGAACTTTTGACGGGACTCGCCGCCGCCCAGCCGGGGTTCCCGCTGGCGCAATTGAAAACTTTCGTCGATCAGGAATTTGCCCAAATAAAACAT GTCCTGCATGGCATTAGTTTGTTGGGGCAGTGCCCGGATAGCATCAACGCTGCGCTGATTTGCCGTGGCGAGAAAATGTCGATCGCCATTATGGCCGGCG 3. The third file is '[output_file_name].faa", which lists amino acid sequences corresponding to the putative genes in "[output_file_name].out". For example, >gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=108 e nd=338 strand=- VVTSLPLVEKKSPHCQVRAFFCVSCTRQPAPLPVVMVMVVVMVVLMRFMDVVYSVIFICLCAMPILVKVFSDLSQ >gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=343 e nd=2799 strand=+ LKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITNHLVAMIEKTISGQDALPNISDAERIFAELLTGLAAAQPGFPLAQLKTFVDQEFAQIKH VLHGISLLGQCPDSINAALICRGEKMSIAIMAGVLEARGHNVTVIDPVEKLLAVGHYLESTVDIAESTRRIAASRIPADHMVLMAGFTAGNEKGELVVLG RNGSDYSAAVLAACLRADCCEIWTDVDGVYTCDPRQVPDARLLKSMSYQEAMELSYFGAKVLHPRTITPIAQFQIPCLIKNTGNPQAPGTLIGASRDEDE LPVKGISNLNNMAMFSVSGPGMKGMVGMAARVFAAMSRARISVVLITQSSSEYSISFCVPQSDCVRAERAMQEEFYLELKEGLLEPLAVTERLAIISVVG DGMRTLRGISAKFFAALARANINIVAIAQGSSERSISVVVNNDDATTGVRVTHQMLFNTDQVIEVFVIGVGGVGGALLEQLKRQQSWLKNKHIDLRVCGV ANSKALLTNVHGLNLENWQEELAQAKEPFNLGRLIRLVKEYHLLNPVIVDCTSSQAVADQYADFLREGFHVVTPNKKANTSSMDYYHQLRYAAEKSRRKF LYDTNVGAGLPVIENLQNLLNAGDELMKFSGILSGSLSYIFGKLDEGMSFSEATTLAREMGYTEPDPRDDLSGMDVARKLLILARETGRELELADIEIEP VLPAEFNAEGDVAAFMANLSQLDDLFAARVAKARDEGKVLRYVGNIDEDGVCRVKIAEVDGNDPLFKVKNGENALAFYSHYYQPLPLVLRGYGAGNDVTA AGVFADLLRTLSWKLGV >gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=2801 end=3733 strand=+ VKVYAPASSANMSVGFDVLGAAVTPVDGALLGDVVTVEAAETFSLNNLGRFADKLPSEPRENIVYQCWERFCQELGKQIPVAMTLEKNMPIGSGLGSSAC SVVAALMAMNEHCGKPLNDTRLLALMGELEGRISGSIHYDNVAPCFLGGMQLMIEENDIISQQVPGFDEWLWVLAYPGIKVSTAEARAILPAQYRRQDCI AHGRHLAGFIHACYSRQPELAAKLMKDVIAEPYRERLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDKPETAQRVADWLGKNYLQNQEGFVHICRLD TAGARVLEN License ============ Copyright (C) 2010 Mina Rho, Yuzhen Ye and Haixu Tang. You may redistribute this software under the terms of the GNU General Public License.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.