RACCROCHE

Overview

Given the phylogenetic relationships of several extant species, the reconstruction of their ancestral genomes at the gene and chromosome level is made difficult by the cycles of whole genome doubling followed by fractionation in plant lineages. Fractionation scrambles the gene adjacencies that enable existing reconstruction methods. We propose an alternative approach that postpones the selection of gene adjacencies for reconstructing small ancestral segments and instead accumulates a very large number of syntenically validated candidate adjacencies to produce long ancestral contigs through maximum weight matching. Likewise, we do not construct chromosomes by successively piecing together contigs into larger segments, but instead count all contig co-occurrences on the input genomes and cluster these, so that chromosomal assemblies of contigs all emerge naturally ordered at each ancestral node of the phylogeny. These strategies result in substantially more complete reconstructions than existing methods. We deploy a number of quality measures: contig lengths, continuity of contig structure on successive ancestors, coverage of the reconstruction on the input genomes, and rearrangement implications of the chromosomal structures obtained. The reconstructed ancestors can be functionally annotated and are visualized by painting the ancestral projections on the descendant genomes, and by highlighting syntenic ancestor-descendant relationships. Our methods can be applied to genomes drawn from a broad range of clades or orders.

The RACCROCHE pipeline consists of four major modules comprising nine steps. Scripts in the pipeline are organized by module, as depicted in the diagram of program architecture and file structure.

Module 1: construct gene families and list candidate adjacencies

  • Step 1: Pre-process gene families
  • Step 2: List generalized adjacencies
  • Step 3: List candidate adjacencies
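
As a rough illustration of Steps 2 and 3, the sketch below counts how often two gene families occur close together across the input genomes. It is not the pipeline's own code. It assumes that a generalized adjacency is any pair of gene families lying within `window` positions of each other on the same chromosome, it ignores gene orientation, and the function names and toy genomes are hypothetical.

```python
from collections import Counter


def generalized_adjacencies(genome, window=2):
    """Yield unordered gene-family pairs lying within `window` positions
    of each other on the same chromosome (orientation is ignored)."""
    for chrom in genome:
        for i, a in enumerate(chrom):
            for b in chrom[i + 1 : i + 1 + window]:
                if a != b:
                    yield tuple(sorted((a, b)))


def count_candidates(genomes, window=2):
    """Count, for each pair, the number of genomes in which it occurs."""
    counts = Counter()
    for genome in genomes.values():
        # A set ensures each pair is counted at most once per genome.
        counts.update(set(generalized_adjacencies(genome, window)))
    return counts


# Hypothetical toy input: each genome is a list of chromosomes,
# each chromosome an ordered list of gene-family labels.
genomes = {
    "A": [["g1", "g2", "g3", "g4"]],
    "B": [["g1", "g3", "g2", "g4"]],
}
candidates = count_candidates(genomes, window=2)
for pair, n in sorted(candidates.items()):
    print(pair, n)
```

In this toy input the pair `("g1", "g4")` never falls within the window, so it does not become a candidate; every other pair is seen in both genomes.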

Module 2: construct ancestral contigs by Maximum Weight Matching

  • Step 4: Construct contigs by maximum weight matching
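
The idea behind Step 4 can be sketched on a toy instance. The example assumes that each gene family has a head (`"h"`) and a tail (`"t"`) extremity and that a weighted candidate adjacency joins two extremities; this representation, the function names and the weights are illustrative assumptions. The matching is found by exhaustive search, which is exact but only practical for very small inputs, and is not the solver used by the pipeline.

```python
def max_weight_matching(edges):
    """Exact maximum weight matching by exhaustive search (toy sizes only).

    edges: list of (u, v, weight), where u and v are gene extremities.
    Returns (total_weight, chosen_edges).
    """
    if not edges:
        return 0, []
    (u, v, w), rest = edges[0], edges[1:]
    # Option 1: leave the first edge out.
    best = max_weight_matching(rest)
    # Option 2: take it and drop every edge that shares an extremity with it.
    compatible = [e for e in rest if u not in e[:2] and v not in e[:2]]
    total, chosen = max_weight_matching(compatible)
    if total + w > best[0]:
        best = (total + w, [(u, v, w)] + chosen)
    return best


def contigs_from_matching(genes, matching):
    """Follow matched adjacencies to read off linear contigs.

    Circular components (no free extremity) are skipped in this sketch.
    A leading "-" marks a gene traversed from tail to head.
    """
    partner = {}
    for u, v, _ in matching:
        partner[u], partner[v] = v, u
    other = {"h": "t", "t": "h"}
    seen, contigs = set(), []
    for g in genes:
        for end in ("h", "t"):
            if g in seen or (g, end) in partner:
                continue
            contig, cur, entry = [], g, end
            while True:
                seen.add(cur)
                contig.append(cur if entry == "h" else "-" + cur)
                exit_ext = (cur, other[entry])
                if exit_ext not in partner:
                    break
                cur, entry = partner[exit_ext]
            contigs.append(contig)
    return contigs


# Hypothetical candidate adjacencies with their weights.
edges = [
    (("g1", "t"), ("g2", "h"), 3),
    (("g2", "t"), ("g3", "h"), 2),
    (("g1", "t"), ("g3", "h"), 1),
]
weight, matching = max_weight_matching(edges)
contigs = contigs_from_matching(["g1", "g2", "g3"], matching)
print(weight, contigs)
```

Because every extremity is used by at most one chosen adjacency, each gene has at most one neighbour on each side, so the matched adjacencies decompose into paths that can be read off as contigs.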

Module 3: match contigs, cluster and sort ancestral chromosomes, paint extant genomes with ancestral synteny blocks

  • Step 5: Match synteny blocks between ancestral genome and extant genomes
  • Step 6: Cluster ancestral contigs into ancestral chromosomes based on contig co-occurrence
  • Step 7: Paint the extant genomes according to the ancestral chromosomes

More details can be found in the diagram of module 3 program architecture and file structure.
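
A minimal sketch of the co-occurrence idea in Step 6 is given below. It counts how often two contigs are found on the same chromosome of an extant genome, then groups contigs whose count reaches a threshold. The grouping uses a simple union-find (single-linkage) rule chosen for brevity; the clustering method used by the pipeline may differ, and the names `placements` and `min_count` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(placements):
    """placements: {genome: {chromosome: set of contig ids matched there}}.
    Returns a Counter of contig pairs seen on the same chromosome."""
    counts = Counter()
    for chroms in placements.values():
        for contigs in chroms.values():
            for a, b in combinations(sorted(contigs), 2):
                counts[(a, b)] += 1
    return counts


def cluster(contigs, counts, min_count):
    """Group contigs linked by at least `min_count` co-occurrences."""
    parent = {c: c for c in contigs}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for (a, b), n in counts.items():
        if n >= min_count:
            parent[find(a)] = find(b)

    groups = {}
    for c in contigs:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical contig placements on three extant genomes.
placements = {
    "genomeA": {"chr1": {"c1", "c2"}, "chr2": {"c3", "c4"}},
    "genomeB": {"chr1": {"c1", "c2", "c3"}, "chr2": {"c4"}},
    "genomeC": {"chr1": {"c1", "c2"}, "chr2": {"c3", "c4"}},
}
counts = cooccurrence_counts(placements)
chromosomes = cluster(["c1", "c2", "c3", "c4"], counts, min_count=2)
print(chromosomes)
```

Here `c3` shares a chromosome with `c1` and `c2` in only one genome, which is below the threshold, so it is grouped with `c4` instead.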

Module 4: compare ancestors with extant genomes by MCScanX

  • Step 8: Adapt MCScanX to match ancestral genomes with extant genomes
  • Step 9: Measures of Quality
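
Two of the quality measures named in the overview, contig lengths and coverage of the reconstruction on the input genomes, can be illustrated as follows. The exact definitions used by the pipeline are not given here, so both functions are assumptions: contig lengths are summarized by N50, a common assembly statistic, and coverage is taken to be the fraction of extant genes whose gene family appears in the ancestor. All names and values are hypothetical.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def coverage(extant_genes, ancestral_families, family_of):
    """Fraction of extant genes whose gene family is present in the ancestor.
    Genes with no family assignment count as uncovered."""
    covered = sum(
        1 for g in extant_genes if family_of.get(g) in ancestral_families
    )
    return covered / len(extant_genes)


# Hypothetical contig lengths, measured in genes.
contig_n50 = n50([10, 6, 4, 2, 2])

# Hypothetical gene-to-family assignments for one extant genome.
family_of = {"a1": "f1", "a2": "f2", "a3": "f3"}
cov = coverage(["a1", "a2", "a3", "a4"], {"f1", "f2"}, family_of)
print(contig_n50, cov)
```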

See the manual for how to install and use the pipeline.

In addition to the pipeline, we also provide our project data (under the "project-monocots" directory) on six genomes of monocot orders, confirming the tetraploidization event “tau” in the stem lineage between the alismatids and the lilioids.

Authors: Qiaoji Xu (QiaojiXu), Lingling Jin (LinglingJin), Chunfang Zheng, James H. Leebens-Mack, David Sankoff

Emails: [email protected], [email protected]

License: BSD
