uffescience / nmfst.p

The NMFSt.P Notebook is introduced as a tool in bioinformatics for conducting exploratory analysis on phylogenetic trees. It simplifies the comparison of multiple trees to identify frequent subtrees, addressing the computational intensity and laboriousness associated with this task, particularly in scenarios involving large input tree sets.

Home Page: https://sol.sbc.org.br/index.php/bresci/article/view/25492

NMFSt.P

NMFSt.P: A Notebook for Parallel Identification of Frequent Subtrees in Phylogenetic Tree Ensembles.

Prerequisites

  • Python 3.10.12
  • ClustalW (version 2.1)
  • A FASTA file with protein sequences

Installation of Dependencies

Before running the project, you must install the Python dependencies specified in the "requirements.txt" file. To do this, run the following command in the terminal:

pip install -r requirements.txt

To install ClustalW on Linux (Ubuntu):

sudo apt update
sudo apt-get install clustalw
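Before running the notebooks, it can help to confirm that the interpreter version and the ClustalW binary are visible. The sketch below is an illustrative check, not part of the repository. It assumes the Ubuntu package installs an executable named "clustalw" on the PATH, and the helper names clustalw_available and check_environment are hypothetical.

```python
import shutil
import sys


def clustalw_available(executable: str = "clustalw") -> bool:
    """Return True if the given executable name resolves on the current PATH."""
    return shutil.which(executable) is not None


def check_environment(min_python: tuple = (3, 10)) -> list:
    """Collect human-readable setup problems; an empty list means all checks passed."""
    problems = []
    # The README lists Python 3.10.12; (3, 10) as a minimum is an assumption.
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ expected, found "
            f"{sys.version_info.major}.{sys.version_info.minor}"
        )
    if not clustalw_available():
        problems.append("clustalw not found on PATH")
    return problems
```

Calling check_environment() returns an empty list when both checks pass, so it can be run in the first cell of a notebook to fail early with a readable message.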

File Organization

Ensure that all required files, including protein sequences in FASTA format, are in the directory specified in 'input_path'.
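A minimal sketch of locating and reading the FASTA inputs with only the standard library, assuming the files sit directly inside 'input_path' and use one of the extensions listed in FASTA_SUFFIXES (an assumption; the repository does not specify them). The helpers find_fasta_files and read_fasta are hypothetical names, not functions from the notebooks.

```python
from pathlib import Path

# Assumed FASTA extensions; adjust to match the files actually in input_path.
FASTA_SUFFIXES = {".fasta", ".fa", ".faa"}


def find_fasta_files(input_path):
    """Return the FASTA files directly inside input_path, sorted by name."""
    return sorted(
        p for p in Path(input_path).iterdir()
        if p.is_file() and p.suffix.lower() in FASTA_SUFFIXES
    )


def read_fasta(path):
    """Parse a FASTA file into a dict mapping record IDs to sequences."""
    records = {}
    current_id = None
    chunks = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines between records
            if line.startswith(">"):
                if current_id is not None:
                    records[current_id] = "".join(chunks)
                fields = line[1:].split()
                # The record ID is the first word of the header line.
                current_id = fields[0] if fields else ""
                chunks = []
            else:
                chunks.append(line)  # sequences may span several lines
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records
```

Checking that every file parses and contains at least one sequence before starting the alignment avoids a long ClustalW run failing on a malformed input.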

Workflow Steps

1. Construction of Phylogenetic Trees

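The first step of the workflow is the construction of phylogenetic trees from the provided protein sequences. To do this, run the "Constructor.ipynb" notebook, either cell by cell in Jupyter or from the terminal with IPython (the plain python interpreter cannot execute .ipynb files):

ipython Constructor.ipynb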

This notebook performs multiple sequence alignment with ClustalW and then builds the phylogenetic trees using the Neighbor-Joining (NJ) method.
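To illustrate the tree-building step, here is a compact Neighbor-Joining sketch that works on a precomputed distance matrix. It is a generic textbook version of NJ, not the code in "Constructor.ipynb"; how the notebook derives distances from the ClustalW alignment is not described here, so the matrix is simply taken as input. The function name neighbor_joining and the choice to split the final edge in half (to emit a rooted Newick string) are assumptions.

```python
def neighbor_joining(names, dist):
    """Build a Neighbor-Joining tree and return it as a Newick string.

    names: taxon labels (at least two); dist: symmetric matrix (list of lists)
    with zeros on the diagonal, in the same order as names.
    """
    # Active nodes are identified by their Newick fragment (leaf name or clade).
    nodes = list(names)
    d = {a: {b: float(dist[i][j]) for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums r_i over the currently active nodes.
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}
        # Q-criterion: join the pair minimising (n - 2) * d(i, j) - r_i - r_j.
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the two joined nodes to their new parent u.
        la = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:.4g},{b}:{lb:.4g})"
        # Distance from u to every remaining node k.
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (a, b):
                d[u][k] = d[k][u] = (d[a][k] + d[b][k] - d[a][b]) / 2
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    # Two nodes left: split the last edge in half to emit a rooted Newick string.
    a, b = nodes
    half = d[a][b] / 2
    return f"({a}:{half:.4g},{b}:{half:.4g});"
```

With the five-taxon matrix used in many NJ tutorials (a, b, c, d, e), the first join is a with b, because their Q value of -50 is the smallest. For production use, Biopython also provides NJ through Bio.Phylo.TreeConstruction.DistanceTreeConstructor.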

2. Construction of Subtrees and MAF Analysis

After constructing the phylogenetic trees, the next step is to generate subtrees from the main trees and perform the MAF (subtree pair frequency matrix) analysis.
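One way to make "generate subtrees" concrete: treat every internal node as the root of a subtree and give each subtree an order-independent text form, so that (a,b) and (b,a) count as the same subtree. This is a sketch under assumptions: trees are nested Python tuples with leaf names as strings, branch lengths are ignored, and canonical and subtrees are hypothetical helpers rather than the notebook's own functions.

```python
def canonical(node):
    """Order-independent Newick-style string for a nested-tuple tree node."""
    if isinstance(node, str):
        return node  # a leaf is just its name
    # Sorting the children's strings makes (a,b) and (b,a) identical.
    return "(" + ",".join(sorted(canonical(child) for child in node)) + ")"


def subtrees(tree):
    """List the canonical form of the subtree rooted at every internal node."""
    found = []
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            continue  # single leaves are not counted as subtrees here
        found.append(canonical(node))
        stack.extend(node)
    return found
```

Because the canonical strings are hashable, they can be passed straight to a set or collections.Counter when counting how often each subtree appears across the tree ensemble.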

Run the "sub_find.ipynb" notebook in Jupyter, or from the terminal with IPython:

ipython sub_find.ipynb

This notebook generates all subtrees from the phylogenetic trees and then calculates the subtree pair frequency (MAF) matrix. The result is displayed in the terminal.
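The exact definition of the MAF matrix belongs to the accompanying paper rather than this guide, so the following is only an illustrative interpretation: entry (i, j) counts the distinct subtrees that trees i and j share, the diagonal holds each tree's own subtree count, and subtree_frequencies reports how many trees contain each subtree. All function names are hypothetical, trees are assumed to be nested tuples of leaf names, and branch lengths are ignored.

```python
from collections import Counter
from itertools import combinations


def canonical(node):
    """Order-independent Newick-style string for a nested-tuple tree node."""
    if isinstance(node, str):
        return node
    return "(" + ",".join(sorted(canonical(c) for c in node)) + ")"


def clade_set(tree):
    """Distinct canonical subtrees (one per internal node) of a single tree."""
    out = set()
    stack = [tree]
    while stack:
        node = stack.pop()
        if not isinstance(node, str):
            out.add(canonical(node))
            stack.extend(node)
    return out


def subtree_frequencies(trees):
    """Count in how many input trees each canonical subtree occurs."""
    counts = Counter()
    for tree in trees:
        counts.update(clade_set(tree))
    return counts


def pairwise_shared_matrix(trees):
    """Square matrix whose (i, j) entry is the number of subtrees i and j share."""
    sets = [clade_set(t) for t in trees]
    n = len(sets)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = len(sets[i])
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = len(sets[i] & sets[j])
    return matrix
```

Each pair of trees is compared once, so the work grows quadratically with the number of trees. That pairwise loop is the part that benefits most from the parallel execution the project targets.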

Outputs

The generated subtrees are saved in the "out/Subtrees" directory. In addition, the subtree pair frequency (MAF) matrix is displayed in the terminal while the "sub_find.ipynb" notebook runs.
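If each subtree is stored as its own Newick file, saving them can look like the sketch below. The file naming scheme (subtree_0001.nwk, ...) and the function save_subtrees are hypothetical; check the "out/Subtrees" directory produced by the notebook for the actual layout.

```python
from pathlib import Path


def save_subtrees(subtrees, out_dir="out/Subtrees"):
    """Write each subtree (a Newick string) to its own numbered file.

    The naming scheme subtree_0001.nwk, subtree_0002.nwk, ... is hypothetical.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)  # create out/Subtrees if missing
    written = []
    for index, newick in enumerate(subtrees, start=1):
        path = target / f"subtree_{index:04d}.nwk"
        # Newick strings conventionally end with ';'; add it when absent.
        text = newick if newick.endswith(";") else newick + ";"
        path.write_text(text + "\n", encoding="utf-8")
        written.append(path)
    return written
```

Zero-padded indices keep the files in numeric order when listed alphabetically, which makes later batch processing of the directory predictable.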

Temporary File Cleanup

The "Constructor.ipynb" and "sub_find.ipynb" notebooks automatically clean up the temporary files generated during the process by deleting them from the "out/tmp/" directory.
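The cleanup behaviour can be approximated with a context manager that clears "out/tmp/" even when a step raises an exception. This is a sketch, not the notebooks' implementation; clear_directory and temporary_workspace are hypothetical names, and keeping the directory itself while deleting its contents is an assumption.

```python
import shutil
from contextlib import contextmanager
from pathlib import Path


def clear_directory(tmp_dir="out/tmp"):
    """Delete every file and subdirectory inside tmp_dir; return how many were removed."""
    root = Path(tmp_dir)
    if not root.exists():
        return 0
    removed = 0
    for entry in root.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove nested directories recursively
        else:
            entry.unlink()  # files and symlinks
        removed += 1
    return removed


@contextmanager
def temporary_workspace(tmp_dir="out/tmp"):
    """Create tmp_dir, yield it, and clear its contents afterwards even on error."""
    root = Path(tmp_dir)
    root.mkdir(parents=True, exist_ok=True)
    try:
        yield root
    finally:
        clear_directory(root)
```

Wrapping intermediate steps in a "with temporary_workspace() as ws:" block keeps alignment and tree files in one place and guarantees they are removed even if ClustalW or tree construction fails midway.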

Final considerations

This guide provides an overview of the workflow for building phylogenetic trees and analyzing subtrees. Make sure the input files are organized in the indicated directories, then run the notebooks in the order described above.

License

This project is licensed under the MIT License.

Contributors

johkempo, milabixcode
