rdpstaff / xander-hmmgs
License: GNU General Public License v3.0
Using HMMgs:

See detailed step-by-step instructions in the Xander_assembler repository (https://github.com/rdpstaff/Xander_assembler).

Build - Build a de Bruijn graph from a set of reads

    java -jar hmmgs.jar build <read_file> <bloom_out> <kmerSize> <bloomSizeLog2> [cutoff = 2] [# hashCount = 4] [bitsetSizeLog2 = 30]

    read_file        fasta or fastq files containing the reads to build the graph from
    bloom_out        file to write the bloom filter to
    kmerSize         should be a multiple of 3 (recommend 45, minimum 30, maximum 63)
    bloomSizeLog2    the size of the bloom filter (or memory needed) is 2^bloomSizeLog2 bits; increase it if the predicted false positive rate is greater than 1%
    cutoff           minimum number of times a kmer has to be observed in read_file to be included in the final bloom filter
    hashCount        number of hash functions, recommend 4
    bitsetSizeLog2   the size of one bitSet is 2^bitsetSizeLog2, recommend 30

    Bloom filter stats, such as the predicted false positive rate, are written to stdout.

Search - Perform local assembly starting at the given start points in a given de Bruijn graph

    Output files: <kmers>_nucl.fasta, _prot.fasta. Search stats are written to stdout.

    java -jar hmmgs.jar search [-h] [-u] [-p <n_nodes>] <k> <limit_in_seconds> <bloom_filter> <for_hmm> <rev_hmm> <kmers>

    -u                 don't normalize the hmm input
    -p n_nodes         prune the search if the score does not improve after n_nodes (default 20, set to 0 to disable pruning)
    k                  number of best local assemblies to return for each kmer
    limit_in_seconds   time limit for individual searches (conservative suggestion = 100)
    bloom_filter       bloom filter built using hmmgs build
    for_hmm, rev_hmm   hidden Markov models, HMMER3 format
    kmers              starting points (can use KmerFilter's fast_kmer_filter to identify starting points)
    [#threads]         experimental, suggested 1 (not thoroughly tested)

Merge - Merge the left and right contigs generated by hmmgs search

    java -jar hmmgs.jar merge [options] <hmm> <hmmgs_file> <nucl_contig>

    -a,--all                Generate all combinations for multiple paths for each starting kmer, instead of just the best
    -b,--min-bits <arg>     Minimum bits score
    -l,--min-length <arg>   Minimum length
    -o,--out <arg>          Write output to file instead of stdout

KmerFilter:

fast_kmer_filter - Search a set of reads against a set of reference sequences to identify starting points for assembly

    java -jar KmerFilter.jar fast_kmer_filter <kmerSize> <query_file> [name=]<ref_file> ...

    -a,--aligned              Build trie from aligned sequences
    -o,--out <arg>            Redirect output to file
    -T,--transl-table <arg>   Translation table to use when translating nucleotide to protein sequences
    -t,--threads <arg>        #Threads to use
    <kmerSize>                kmer length, should be a multiple of 3 (recommend 45, minimum 30, maximum 63)
    <query_file>              read file to search for starting points in (use the same fasta file used to build the de Bruijn graph)
    [name=]<ref_file> ...     1 or more aligned reference files (aligned using the same HMM that will be used to search) with an optional reference name (i.e. nifh=my_nifh_refs_aligned.fasta)

Other uses:

HMMgs can also be used to extract subgraphs from starting points, instead of contigs, for further analysis (see edu.msu.cme.rdp.graph.GraphSearch).

HMMgs can also be used to compute base coverage for contigs generated by hmmgs or other programs (see edu.msu.cme.rdp.graph.abundance.ReadKmerMapper and base_coverage.py).

NOTES:

When using fast_kmer_filter to identify start points, there are two things to be aware of.

1. While the Bloom Filter Builder allows any k-size (hmmgs requires a k divisible by 3, however), fast_kmer_filter requires k <= 63.

2. fast_kmer_filter allows starting points for multiple genes to be searched for at the same time (since each gene requires a scan over the read file, it is faster to do every gene at once). However, this means the output file is multiplexed and must be demultiplexed before it is used in hmmgs search. This can be done with the following command:

    grep 'gene_name' <multiplexed_starts_file> | cut -f2- > <demultiplexed_gene_start_points>
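
For pipelines that already use Python, the demultiplexing step above can be sketched roughly as follows. This is a minimal sketch and not part of HMMgs or KmerFilter. It assumes that the multiplexed file is tab-separated and that the gene name is the first field on each line, which is what the cut -f2- in the command above suggests. The function name demultiplex_starts and the sample lines are hypothetical. Unlike grep, which matches 'gene_name' anywhere in the line, this sketch only accepts an exact match on the first field.

```python
from typing import Iterable, Iterator


def demultiplex_starts(lines: Iterable[str], gene_name: str) -> Iterator[str]:
    """Yield start-point lines for one gene with the gene-name column removed.

    Assumption: each line is tab-separated and its first field is the gene
    name given to fast_kmer_filter (e.g. the 'nifh' in nifh=refs.fasta).
    """
    for line in lines:
        line = line.rstrip("\n")
        # Split off the first tab-separated field, keep the rest untouched.
        name, sep, rest = line.partition("\t")
        # Skip lines without a tab and lines that belong to other genes.
        if sep and name == gene_name:
            yield rest


if __name__ == "__main__":
    # Made-up sample lines, for illustration only; real output has more columns.
    sample = [
        "nifh\tread_1\tACGTAC\n",
        "rplb\tread_2\tTTGACA\n",
        "nifh\tread_3\tGGCATT\n",
    ]
    for start in demultiplex_starts(sample, "nifh"):
        print(start)
```

To process a real file, pass an open file handle as lines and write each yielded line, followed by a newline, to the demultiplexed output file for that gene.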