Code Monkey home page Code Monkey logo

tpma's Introduction

TPMA: a two pointers meta-alignment tool to ensemble different nucleic acid multiple sequence alignment results

TPMA is a tool written in C++20 for combining and refining multiple MSA results using a two-pointer algorithm. It runs on Linux.

🔨Installation and Usage

1.1 OSX/Linux/WSL(Windows Subsystem for Linux ) - from Anaconda

1.Install WSL for Windows. Instructional video 1 or 2 (Copyright belongs to the original work).

2.Download and install Anaconda. Download Anaconda for different systems here. Instructional video of anaconda installation 1 or 2 (Copyright belongs to the original work).

3.Install TPMA.

#1 Create and activate a conda environment for TPMA
conda create -n tpma_env
conda activate tpma_env

#2 Add channels to conda
conda config --add channels malab

#3 Install TPMA
conda install -c malab tpma

#4 Test TPMA
tpma -h

1.2 OSX/Linux/WSL(Windows Subsystem for Linux ) - from the source code

  1. Download and Compile the source code. (Make sure your version of gcc >= 9.4.0 or clang >= 13.0.0)
#1 Download
git clone https://github.com/malabz/TPMA.git

#2 Open the folder
cd TPMA

#3 Compile
make

#4 Test TPMA
./tpma -h

2 Usage

./tpma -a <initial_alignments_list> -r <raw_data> -o <output>
   -a is used to specify the paths of all initial alignments to be merged
   -r is used to specify the path of raw data, a file in FASTA format
   -o is used to specify the output for TPMA result, a file in FASTA format
   -h print help message

🔬Test dataset and the use case

1. Information about the test dataset

Dataset Sequences Num Repeats Num Avg Length Similarity
mt genomes 30 4 about 16568bp The average similarity is about 99.7%
HVS-II 10 10 about 365bp The average similarity is about 98.6%
16S rRNA 100 8 about 1440bp The average similarity is about 74.6%
23S rRNA 64 10 about 3113bp The average similarity is about 92.7%
SARS-CoV-2_20200301 39 4 about 29858bp The average similarity is about 99.8%
SARS-CoV-2_20200417 100 4 about 27623bp The average similarity is about 85.3%
16s simu 100 3 about 1550bp 14 sets of data with different similarities (99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, 90%, 85%, 80%, 75%, 70%)
23s simu 100 3 about 3900bp 14 sets of data with different similarities (99%, 98%, 97%, 96%, 95%, 94%, 93%, 92%, 91%, 90%, 85%, 80%, 75%, 70%)
CIPRES-128 255 10 about 1550bp The average similarity is about 80%
CIPRES-256 511 10 about 1550bp The average similarity is about 80%

2. The use case

# Download data
wget http://lab.malab.cn/~zyx/tools/TPMA/data/16s_rRNA.tar.gz

# Unzip data
tar -zxvf 16s_rRNA.tar.gz

# Get the folder path
cd 16s_rRNA

# Run TPMA
./tpma -a msa_results/16s_rRNA_100seq_rep1/16s_rRNA_100seq_rep1_clustalw2.fasta msa_results/16s_rRNA_100seq_rep1/16s_rRNA_100seq_rep1_mafft.fasta msa_results/16s_rRNA_100seq_rep1/16s_rRNA_100seq_rep1_muscle3.fasta msa_results/16s_rRNA_100seq_rep1/16s_rRNA_100seq_rep1_tcoffee.fasta -r raw_data/16s_rRNA_100seq_rep1.fasta -o 16s_rRNA_100seq_rep1_tpma_c4.fasta 

📍Reminder

  1. Currently TPMA is ONLY available for DNA/RNA.
  2. The application of TPMA assumes that the sequences' IDs within the MSAs are unique. (E.g. Due to the excessively long length of the sequence IDs in the original data set, Clustal format(the default output format of ClustalW2/PCMA/POA/T-Coffee) may truncate the IDs, resulting in consistent IDs in the alignment output that TPMA cannot process. If the IDs in the original data are too long, we suggest manually renumbering them before using MSA software).
  3. Convert Clustal format to FASTA format using tool: aln2fasta
  4. TPMA will sort the sequences, but we recommend that the input sequences be in the same order as the original file.
  5. TPMA will remove any bad characters that are present in the input data.

🖥️Environment

System GCC version
OSX clang 13.0.0
Linux GCC 9.4.0
WSL GCC 9.4.0

🔖Citation

If you use this software, please cite:

Zhai, Y., Chao, J., Wang, Y., Zhang, P., Tang, F., & Zou, Q. (2024). TPMA: A two pointers meta-alignment tool to ensemble different multiple nucleic acid sequence alignments. PLOS Computational Biology, 20(4), e1011988.

👋Contacts

The software tools are developed and maintained by 🧑‍🏫ZOU's lab.

If you find any bug, welcome to contact us on the issues page or email us at 👉📩.

More tools and infomation can visit our github.

tpma's People

Contributors

smilida avatar

Stargazers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.