orthofinder-tools's Introduction

OrthoFinder Tools

Idea

  • Calculate the most common gene name of each orthogroup by majority vote: annotate_orthogroups
  • Create plots analogous to roary_plots: orthofinder_plots

Setup

pip install orthofinder-tools

Usage

annotate_orthogroups

Prerequisites

Your FASTA sequences must have some description, e.g.:

>gnl|extdb|STRAIN-XY_000001 DNA-directed RNA polymerase subunit beta' [Pediococcus stilesii]
MIDVNKFESMQIGLASPDKIRMWSYGEVKKPETINYRTLKPEKDGLFDERIFGPTKDYECACGKYKRIRY
...

From this protein, DNA-directed RNA polymerase subunit beta' will be extracted.
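
The extraction rule can be illustrated with a minimal sketch. It assumes the description is everything after the first whitespace in the header, minus a trailing bracketed organism name. The function name extract_description is hypothetical, and the package's actual parsing may differ.

```python
import re


def extract_description(header: str) -> str:
    """Return the free-text description from a FASTA header line.

    Assumption: the header looks like '>ID description [organism]'.
    """
    # Drop the leading '>' and separate the ID from the rest at the first whitespace.
    parts = header.lstrip(">").split(maxsplit=1)
    if len(parts) < 2:
        raise ValueError(f"No description found in header: {header!r}")
    # Remove a trailing '[organism]' block, if present.
    return re.sub(r"\s*\[[^\]]*\]\s*$", "", parts[1]).strip()


header = ">gnl|extdb|STRAIN-XY_000001 DNA-directed RNA polymerase subunit beta' [Pediococcus stilesii]"
print(extract_description(header))
```

A header that consists of an ID only, such as '>gene-001', raises ValueError in this sketch.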

Command line usage

annotate_orthogroups --help

annotate_orthogroups \
    --orthogroups_tsv /path/to/N0_or_Orthogroups.tsv \
    --hog True \
    --fasta_dir /path/to/fastas \
    --file_endings faa \
    --out outfile.tsv \
    --simple True \
    --header True

If --simple=False, the resulting TSV looks like this:

HOG Best Gene Name Gene Name Occurrences
N0.HOG0000000 amino acid ABC transporter {JSON}
N0.HOG0000001 IS30 family transposase {JSON}
N0.HOG0000002 IS5/IS1182 family transposase {JSON}

The JSON is a dictionary with key='gene name' -> value=occurrence, for example:

{
  'Integrase core domain protein': 47,
  'hypothetical protein': 15,
  'IS30 family transposase': 126
}
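
The majority vote over such a dictionary can be sketched with collections.Counter. This illustrates the idea only; how annotate_orthogroups breaks ties or treats names like 'hypothetical protein' is not stated here.

```python
from collections import Counter

# Occurrence counts for one orthogroup, taken from the example above.
occurrences = Counter({
    'Integrase core domain protein': 47,
    'hypothetical protein': 15,
    'IS30 family transposase': 126,
})

# most_common(1) returns a one-element list holding the (name, count)
# pair with the highest count.
best_name, count = occurrences.most_common(1)[0]
print(best_name, count)
```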

If --simple=True, the resulting TSV looks like this (no header):

N0.HOG0000000 amino acid ABC transporter
N0.HOG0000001 IS30 family transposase
N0.HOG0000002 IS5/IS1182 family transposase
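
A two-column file like this is easy to load back into a dictionary. The sketch below assumes the columns are tab-separated and uses an in-memory string in place of outfile.tsv; read_best_names is a hypothetical helper, not part of the package.

```python
import csv
import io


def read_best_names(handle) -> dict:
    """Map orthogroup ID -> best gene name from a headerless two-column TSV."""
    # QUOTE_NONE keeps quote characters in gene names (e.g. beta') untouched.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row[0]: row[1] for row in reader if len(row) >= 2}


# Stand-in for: open('outfile.tsv', newline='')
example = io.StringIO(
    "N0.HOG0000000\tamino acid ABC transporter\n"
    "N0.HOG0000001\tIS30 family transposase\n"
    "N0.HOG0000002\tIS5/IS1182 family transposase\n"
)
best_names = read_best_names(example)
print(best_names["N0.HOG0000001"])
```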

Usage as Python class

# load class
from orthofinder_tools import OrthogroupToGeneName

PATH_TO_ORTHOFINDER_FASTAS = '/path/to/OrthoFinder/fastas'
CURRENT_FOLDER = 'Results_Mon00'

otg = OrthogroupToGeneName(
    fasta_dir=PATH_TO_ORTHOFINDER_FASTAS,
    file_endings='faa',
)
otg.load_hog(
    hog_tsv=f'{PATH_TO_ORTHOFINDER_FASTAS}/OrthoFinder/{CURRENT_FOLDER}/Phylogenetic_Hierarchical_Orthogroups/N0.tsv'
)

otg.majority_dict is then a Python dict with key='orthogroup' -> value='best name', for example:

{
  'N0.HOG0000000': 'amino acid ABC transporter',
  'N0.HOG0000001': 'IS30 family transposase',
  'N0.HOG0000002': 'IS5/IS1182 family transposase',
}
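
Because the result is an ordinary dict, standard Python idioms apply to it. The sketch below uses a literal copy of the example above as a stand-in for otg.majority_dict and selects the orthogroups whose best name contains a keyword; the keyword is chosen for illustration only.

```python
# Stand-in for otg.majority_dict, copied from the example above.
majority_dict = {
    'N0.HOG0000000': 'amino acid ABC transporter',
    'N0.HOG0000001': 'IS30 family transposase',
    'N0.HOG0000002': 'IS5/IS1182 family transposase',
}

keyword = 'transposase'
# Keep the orthogroups whose best name mentions the keyword (case-insensitive).
matching = [
    orthogroup
    for orthogroup, name in majority_dict.items()
    if keyword in name.lower()
]
print(matching)
```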

otg.save_majority_df(outfile='path/to/outfile.tsv') writes the following file:

HOG Best Gene Name  Gene Name Occurrences
N0.HOG0000000   amino acid ABC transporter Counter({'amino acid ABC transporter': 43})
...

otg.save_orthogroup_to_gene_ids(outfile='path/to/outfile.tsv') writes the following file (no header):

N0.HOG0000000   gene_1  gene_2
N0.HOG0000001   gene_3  gene_4  gene_5
...
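
Rows in this file have a variable number of columns, so splitting each line by hand is simpler than forcing it into a fixed table. The sketch below assumes tab-separated fields with the orthogroup ID first; parse_gene_ids is a hypothetical helper.

```python
def parse_gene_ids(lines) -> dict:
    """Map orthogroup ID -> list of gene IDs from a ragged, headerless TSV."""
    result = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not fields[0]:
            continue  # skip blank lines
        # Everything after the first column is a gene ID; drop empty fields.
        result[fields[0]] = [field for field in fields[1:] if field]
    return result


# Stand-in for the lines of the written file.
example_lines = [
    "N0.HOG0000000\tgene_1\tgene_2\n",
    "N0.HOG0000001\tgene_3\tgene_4\tgene_5\n",
]
gene_ids = parse_gene_ids(example_lines)
print(gene_ids["N0.HOG0000001"])
```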

otg.save_orthogroup_to_gene_ids(outfile='path/to/outfile.tsv') writes the following file (no header):

N0.HOG0000000	amino acid ABC transporter ATP-binding protein
N0.HOG0000001	ATP-binding cassette domain-containing protein
...

orthofinder_plots

Disclaimer: This script is a port of roary_plots by Marco Galardini.

# Command line usage:
orthofinder_plots --help
orthofinder_plots --tree data/SpeciesTree_rooted.txt --orthogroups_tsv data/Orthogroups.tsv --out output

Three files will be created.

Usage as Python function

# load function
from orthofinder_tools import create_plots

create_plots(
    tree='/path/to/SpeciesTree_rooted.txt',
    orthogroups_tsv='/path/to/Orthogroups.tsv',
    format='svg',
    no_labels=False,
    out='/path/to/output/folder'
)
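
Plots in the style of roary_plots summarise in how many genomes each gene cluster occurs. The sketch below is not how orthofinder_plots is implemented; it only shows how such counts could be derived. It assumes Orthogroups.tsv has an Orthogroup column followed by one column per genome, with empty cells where a genome has no member. The sample table is made up.

```python
import csv
import io


def genomes_per_orthogroup(handle):
    """Return (counts, n_genomes): genomes with at least one gene per orthogroup."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)       # 'Orthogroup', then one column per genome
    n_genomes = len(header) - 1
    counts = {}
    for row in reader:
        if not row:
            continue
        # A genome is "present" if its cell is non-empty.
        counts[row[0]] = sum(1 for cell in row[1:] if cell.strip())
    return counts, n_genomes


# Made-up stand-in for open('/path/to/Orthogroups.tsv', newline='')
sample = io.StringIO(
    "Orthogroup\tgenome_A\tgenome_B\tgenome_C\n"
    "OG0000000\tgene_1, gene_2\tgene_3\tgene_4\n"
    "OG0000001\tgene_5\t\t\n"
)
counts, n_genomes = genomes_per_orthogroup(sample)
# Orthogroups present in every genome.
core = [og for og, n in counts.items() if n == n_genomes]
print(counts, core)
```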

Contributors

mrtomrod

orthofinder-tools's Issues

Failed to extract description for strain=CerAGI:

I ran the documented command with all the required parameters but received an error.
The command and the resulting error are listed below.

$ annotate_orthogroups --orthogroups_tsv /home/shivank/Results_Aug23/Phylogenetic_Hierarchical_Orthogroups/N0.tsv --hog True --fasta_dir /home/shivank/protein/ --file_endings faa --out /home/shivank/outfile.tsv --simple True
AssertionError: Failed to extract description for strain=CerAGI:
gene.id=jgi|CerAGI|462322|estExt_Genewise1.C_1_t10001
gene.description=jgi|CerAGI|462322|estExt_Genewise1.C_1_t10001
description=['jgi|CerAGI|462322|estExt_Genewise1.C_1_t10001']

What is the input fasta file format?

I used orthofinder-tools, but the sequence names in my FASTA file have only one field, like '>gene-001', so the error occurred.
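
Both reports above involve headers that contain only an ID. A quick pre-flight check, sketched below, can list such records before annotate_orthogroups is run. It assumes a description is any text after the first whitespace in the header; headers_without_description is a hypothetical helper, not part of the package, and the sequences in the sample are placeholders.

```python
def headers_without_description(lines) -> list:
    """Return the IDs of FASTA records whose header has no description."""
    missing = []
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line
        parts = line[1:].split(maxsplit=1)
        if len(parts) < 2:
            missing.append(parts[0] if parts else "")
    return missing


# Stand-in for the lines of a .faa file; sequences are placeholders.
fasta_lines = [
    ">gene-001",
    "MIDVNKFESMQ",
    ">jgi|CerAGI|462322|estExt_Genewise1.C_1_t10001",
    "MIDVNKFESMQ",
    ">gnl|extdb|STRAIN-XY_000001 DNA-directed RNA polymerase subunit beta' [Pediococcus stilesii]",
    "MIDVNKFESMQ",
]
print(headers_without_description(fasta_lines))
```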

Failed to extract description for strain=G1

I changed the sequence names in the FASTA directory and in orthogroups.tsv, but then got:

  File "/mnt/f/ortho/orthogroup_to_genename.py", line 106, in <listcomp>
    lambda ids: [gene_id_to_name[id] for id in ids])
KeyError: ''
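
A KeyError: '' in a lookup like this means an empty string was used as a gene ID. One way that can happen, offered as a guess rather than a confirmed diagnosis, is splitting an empty cell of the orthogroups table: in Python, ''.split(', ') returns [''], not an empty list. The sketch below shows the effect and a defensive variant, assuming comma-separated cells; split_gene_ids and the sample mapping are hypothetical.

```python
# Made-up sample mapping; in the traceback this is gene_id_to_name.
gene_id_to_name = {"gene_1": "IS30 family transposase"}


def split_gene_ids(cell: str) -> list:
    """Split a comma-separated cell into gene IDs, dropping empty entries."""
    return [gene_id.strip() for gene_id in cell.split(",") if gene_id.strip()]


naive = "".split(", ")       # contains one empty string
safe = split_gene_ids("")    # contains nothing
print(naive, safe)

# With the defensive split, a trailing separator no longer yields an empty ID.
names = [gene_id_to_name[gene_id] for gene_id in split_gene_ids("gene_1, ")]
print(names)
```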
