
cath-nr-generate-clusters

Generate NR clusters from BLAST database

This repo contains a Perl script used to generate non-redundant clusters of sequences from an all-vs-all BLAST database. The representatives from the resulting clusters are guaranteed to share less than X% sequence identity over Y% coverage.
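The script's internals are not described here, so the following is only a minimal sketch of one common way to get that kind of guarantee: greedy clustering over pairwise hits. The function name `greedy_nr_clusters`, the tuple layout of `hits`, and the use of `>=` at the cutoffs are all assumptions made for illustration, not the Perl script's actual code or input format.

```python
from collections import defaultdict


def greedy_nr_clusters(ids, hits, seq_cutoff=40.0, overlap_cutoff=60.0):
    """Greedy clustering sketch (hypothetical helper, not the Perl script).

    ids  : sequence ids in priority order (earlier ids are preferred as
           representatives)
    hits : iterable of (id_a, id_b, percent_identity, percent_overlap)
           tuples, an assumed simplification of all-vs-all BLAST results
    """
    # Record which pairs count as redundant at the given cutoffs.
    # Assumption: a pair is redundant when BOTH values reach their cutoff.
    redundant = defaultdict(set)
    for a, b, identity, overlap in hits:
        if a != b and identity >= seq_cutoff and overlap >= overlap_cutoff:
            redundant[a].add(b)
            redundant[b].add(a)

    clusters = {}  # representative id -> list of member ids
    for seq_id in ids:
        # Join the first existing representative this sequence is redundant with.
        rep = next((r for r in clusters if r in redundant[seq_id]), None)
        if rep is None:
            clusters[seq_id] = [seq_id]  # no match: becomes a new representative
        else:
            clusters[rep].append(seq_id)
    return clusters


# Made-up example data, for illustration only.
example_ids = ["d1", "d2", "d3", "d4"]
example_hits = [
    ("d1", "d2", 95.0, 90.0),  # redundant: both cutoffs reached
    ("d2", "d3", 45.0, 80.0),  # redundant pair, but d2 is not a representative
    ("d3", "d4", 38.0, 95.0),  # identity below cutoff
    ("d1", "d4", 90.0, 30.0),  # overlap below cutoff
]
example_clusters = greedy_nr_clusters(example_ids, example_hits)
```

Because a sequence only becomes a representative when it is not redundant with any earlier representative, no two representatives in the result reach both cutoffs against each other. Non-representative members can still be similar to members of other clusters, as `d2` and `d3` are in the example.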

This script was written a long time ago and was not designed to be run outside of the internal release pipeline in CATH. It hasn't been tested outside of CATH walls - it's here for information.

If I were doing this now, I would be tempted to look at mmseqs or CD-HIT as (much faster) alternatives.

Install dependencies

The following seems to work in relatively modern versions of Perl (>=5.30):

cpan -I .

Usage

$ perl script/generate_long_non_redundant_list.pl

usage: generate_long_non_redundant_list.pl [options] <S100_domain_list> <blast_results_directory>

options:

  -o|--out   <file>          Output NR file [default: nr_list_s<SEQ>_o<OV>.txt]
  --seq      Num[20-100]     Specify sequence id cutoff [default: 40.0] 
  --overlap  Num[20-100]     Specify overlap cutoff [default: 60.0] 

Contributors

sillitoe
