Molecular-epi-ncov

This repository is used to provide a molecular epidemiology analysis of SARS-CoV-2 data from GISAID.

Installation 🖥️

Hardware requirements

The molecular-epi-ncov package requires a standard server or computer with enough RAM to support in-memory operations.

OS Requirements

All code was tested on Linux (Ubuntu 20.04.1) and macOS.

The R script is compatible with Windows, Mac, and Linux operating systems.

Software Requirements

You will require the following software installed on your server or computer before starting:

Package dependencies

From a terminal, use pip to install the following Python dependencies before running the script:

pip install pandas seaborn matplotlib

From an R session, type:

install.packages(c("reshape2", "ggplot2", "htmlwidgets", "webshot"))
devtools::install_github("hrbrmstr/streamgraph")

Repo contents 🕸️

Usage ☘️

1. Obtain SARS-CoV-2 sequences from public databases

To obtain the most recent GISAID SARS-CoV-2 data in a single file, use the batch download feature on the GISAID website (https://gisaid.org/). The files you need are metadata_tsv_2024_01_27.tar.xz and sequences_fasta_2024_01_27.tar.xz. Note that GISAID membership is required to access this feature.
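Both downloads are xz-compressed tar archives and must be unpacked before the later steps. The following is a minimal sketch using only the Python standard library; the function name extract_archive and the destination directory are hypothetical and not part of this repository.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.xz archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:xz" opens an LZMA-compressed tar archive for reading.
    with tarfile.open(archive_path, "r:xz") as tar:
        names = tar.getnames()
        tar.extractall(dest)
    return names


# Example usage (the "data" directory is an assumption):
# extract_archive("metadata_tsv_2024_01_27.tar.xz", "data")
# extract_archive("sequences_fasta_2024_01_27.tar.xz", "data")
```

Only extract archives obtained directly from GISAID, since extractall trusts the paths stored in the archive.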

2. Select sequences that meet the specified criteria.

To download the SARS-CoV-2 reference dataset for Nextclade, run:

nextclade dataset get \
  --name 'nextstrain/sars-cov-2/wuhan-hu-1' \
  --tag '2024-01-28T00:00:00Z' \
  --output-dir "$HOME/sars-cov-2-2024-01-28update"

To filter sequences flagged by any of the following Nextclade quality control (QC) criteria, run seq_filter.ipynb (global sequences) or gd_seq.ipynb (Guangdong, China sequences) in Jupyter Notebook or VS Code: suspiciously clustered single-nucleotide polymorphisms (SNPs) (QC SNP clusters status not "good"; ≥ 6 mutations in 100 bases), too many private mutations (QC private mutations status not "good"; ≥ 10 mutations from the nearest tree node), or overall bad quality (Nextclade QC overall status "bad").
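The three QC criteria can be expressed as a simple row filter over Nextclade's tab-separated output. This is a hedged sketch, not the logic of seq_filter.ipynb: it assumes that flagged sequences are excluded, and that the TSV uses the column names seqName, qc.snpClusters.status, qc.privateMutations.status, and qc.overallStatus. Check these names against your own Nextclade output. The function names are hypothetical.

```python
import csv

# Assumed Nextclade TSV column names; verify against your output file.
SNP_COL = "qc.snpClusters.status"
PRIVATE_COL = "qc.privateMutations.status"
OVERALL_COL = "qc.overallStatus"


def passes_qc(row):
    """Return True if a row trips none of the three QC criteria."""
    if row[SNP_COL] != "good":
        return False  # suspiciously clustered SNPs
    if row[PRIVATE_COL] != "good":
        return False  # too many private mutations
    if row[OVERALL_COL] == "bad":
        return False  # overall bad quality
    return True


def kept_sequence_names(tsv_handle):
    """Read a Nextclade TSV from an open handle and return names that pass QC."""
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    return [row["seqName"] for row in reader if passes_qc(row)]


# Example usage (the file name is an assumption):
# with open("nextclade.tsv", newline="") as handle:
#     keep = kept_sequence_names(handle)
```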

3. Obtain genotype and clade distribution plots.

Run clade_distribution.ipynb to obtain weekly_clade_distribution.csv, then run clade-distribution/scripts/clade.R to obtain genotype distribution plots.
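A weekly clade distribution is, in essence, a count of sequences per week and clade. The sketch below illustrates that aggregation with the standard library only. It is not the code in clade_distribution.ipynb, and the input format (pairs of a YYYY-MM-DD collection date and a clade label) and the layout of weekly_clade_distribution.csv are assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta


def week_start(day):
    """Return the Monday of the week containing the given date."""
    return day - timedelta(days=day.weekday())


def weekly_clade_counts(records):
    """Count sequences per (week start, clade) from (date string, clade) pairs."""
    counts = Counter()
    for date_str, clade in records:
        try:
            day = datetime.strptime(date_str, "%Y-%m-%d").date()
        except ValueError:
            continue  # skip incomplete dates such as "2023" or "2023-01"
        counts[(week_start(day), clade)] += 1
    return counts
```

Records with incomplete collection dates are skipped here because they cannot be assigned to a week; whether the notebook does the same is not stated in this README.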

4. Get mutation heatmaps of the SARS-CoV-2 genome.

Note

Before starting, run ancestral_sequence.sh to obtain the ancestral sequence.

To get the mutation heatmap of SARS-CoV-2, run mutation_heatmap.ipynb. Ancestral sequences were reconstructed using TreeTime. Taking BA.5.2.48 as an example, run sh ancestral_sequence.sh to retrieve its ancestral node. Note that the treetime ancestral command is computationally intensive and requires a significant amount of memory.
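A mutation heatmap is built from per-position differences between each sample and the reconstructed ancestral sequence. The following sketch shows one way to compute those differences for sequences that are already aligned to the same length. It is an illustration under that assumption, not the implementation in mutation_heatmap.ipynb, and the function names are hypothetical.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def substitutions(ancestral, sample):
    """List (1-based position, ancestral base, sample base) for aligned sequences."""
    if len(ancestral) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    subs = []
    for pos, (a, s) in enumerate(zip(ancestral.upper(), sample.upper()), start=1):
        # Ignore gaps and ambiguous bases such as N on either side.
        if a in VALID_BASES and s in VALID_BASES and a != s:
            subs.append((pos, a, s))
    return subs


def position_counts(ancestral, samples):
    """Count, per genome position, how many samples differ from the ancestor."""
    counts = Counter()
    for sample in samples:
        for pos, _, _ in substitutions(ancestral, sample):
            counts[pos] += 1
    return counts
```

The resulting per-position counts can be passed to a plotting library such as seaborn to draw the heatmap.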

License ⚖️

This project is covered under the MIT License.

Contributors

tyumen001
