
repcluster's Introduction

REPcluster aims to cluster repeat sequences that have similar content but distinct structures, such as (TTTAGGG)m vs. (TTTAGGG)n (tandem repeats with different copy numbers) or A-B-C vs. A-C (some transposable element (TE) sequences).
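To make the idea concrete, the sketch below compares two tandem repeats with the same unit but different copy numbers, using the Jaccard index of their k-mer sets. It illustrates the general principle only and is not REPcluster's code: the helper names `kmers` and `jaccard` are hypothetical, and k=5 is chosen for the short toy sequences (the tool's default is 15).

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


unit = "TTTAGGG"
short_repeat = unit * 3   # (TTTAGGG)3
long_repeat = unit * 5    # (TTTAGGG)5

k = 5  # illustrative value, much smaller than the tool's default of 15
sim = jaccard(kmers(short_repeat, k), kmers(long_repeat, k))
print(sim)
```

Both sequences contain the same seven distinct 5-mers, so their k-mer content is identical even though their lengths differ.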

Quick install and start

git clone https://github.com/zhangrengang/REPcluster
cd REPcluster

# install
conda install -c bioconda kmer-db mcl xopen 
python3 setup.py install

# run an example
cd example_data
REPclust hifi.trf.fa -x 2	 # for tandem repeats

Outputs

repclust.a2a.csv.jaccard	# Similarity matrix
repclust.network	# Network to import into Cytoscape
repclust.attr		# Attributes to import into Cytoscape
repclust.mcl		# One cluster per line
repclust.fa			# Center sequence for each cluster
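The cluster file can be post-processed with a few lines of Python. The sketch below assumes that each line of `repclust.mcl` lists the sequence IDs of one cluster separated by whitespace (MCL's usual label output); `read_clusters` is a hypothetical helper, and the IDs in the example are made up.

```python
def read_clusters(lines):
    """Parse cluster lines into a list of lists of sequence IDs.

    Assumes one cluster per line with whitespace-separated member IDs.
    Blank lines are skipped.
    """
    clusters = []
    for line in lines:
        members = line.split()
        if members:
            clusters.append(members)
    return clusters


# Made-up example content standing in for the lines of repclust.mcl
example_lines = ["seq1\tseq4\tseq7\n", "seq2\tseq3\n", "\n", "seq5\n"]
clusters = read_clusters(example_lines)
sizes = [len(c) for c in clusters]
print(sizes)
```

For a real run, pass an open file handle instead of the list, for example `read_clusters(open("repclust.mcl"))`.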

Usage

usage: REPclust [-h] [-pre STR] [-o DIR] [-tmpdir DIR] [-x INT] [-k INT]
               [-m {jaccard,min,max,cosine}] [-c FLOAT] [-I FLOAT] [-p INT]
               [-cleanup] [-overwrite] [-v]
               FILE [FILE ...]

Cluster Repeat Sequences.

optional arguments:
  -h, --help            show this help message and exit

Input:
  FILE                  Each sequence in a FASTA file is treated as a separate
                        sample

Output:
  -pre STR, -prefix STR
                        Prefix for output [default=repclust]
  -o DIR, -outdir DIR   Output directory [default=.]
  -tmpdir DIR           Temporary directory [default=tmp]

Kmer matrix:
  -x INT, -multiple INT
                        Repeat sequences to cluster tandem repeat or circular
                        sequences [default=1]
  -k INT                Length of kmer [default=15]
  -m {jaccard,min,max,cosine}, -measure {jaccard,min,max,cosine}
                        The similarity measure to be calculated.
                        [default=jaccard]

Cluster:
  -c FLOAT, -min_similarity FLOAT
                        Minimum similarity to cluster [default=0.2]
  -I FLOAT, -inflation FLOAT
                        Inflation for MCL (varying this parameter affects
                        granularity) [default=2.0]

Other options:
  -p INT, -ncpu INT     Maximum number of processors to use [default=32]
  -cleanup              Remove the temporary directory [default=False]
  -overwrite            Overwrite even if check point files existed
                        [default=False]
  -v, -version          show program's version number and exit
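
The effect of -x on tandem repeats can be illustrated with a small sketch. It assumes that -x INT concatenates each sequence INT times before k-mers are extracted, which is one reading of the help text above and not a confirmed description of the implementation. The helpers `kmers` and `jaccard` are hypothetical, and k=5 is used only because the toy sequences are short.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


k = 5
unit = "TTTAGGG"       # one copy of a repeat unit
rotated = "AGGGTTT"    # the same unit reported in a different phase

# Without repetition (x=1) the two phases share no 5-mers.
before = jaccard(kmers(unit, k), kmers(rotated, k))

# Assumed behaviour of -x 2: duplicate each sequence before extracting k-mers,
# so that k-mers spanning the unit boundary are included.
x = 2
after = jaccard(kmers(unit * x, k), kmers(rotated * x, k))

print(before, after)
```

Under this assumption, duplicating the unit recovers the k-mers that span the junction between copies, so two phase-shifted versions of the same repeat become identical in k-mer content. The same reasoning applies to circular sequences linearized at different positions.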
