OriC_Finder is no longer in development. The project has moved here: https://github.com/ZoyavanMeel/ORCA/

OriC_Finder

Python scripts that predict and plot the location of the origin of replication (oriC) of circular bacterial genomes based on Z-curve and GC-skew analysis.

Order of operation

The scripts are meant to be executed in a specific order, but each one also works independently, so every script can serve as a checkpoint.

DoriC data prep

The DoriC data can be downloaded from http://tubic.tju.edu.cn/doric/public/index.php as a .RAR archive. Unpacking it leaves you with a CSV file.

  1. data_prep_doric.py: Creates three new CSV files, each ordering the relevant DoriC data slightly differently. Only the _concat.csv output works for now.

NCBI data prep

These scripts prepare the NCBI data for analysis. Each script has docstrings with further information.

  1. ncbi_download.py: Use this script to download a dataset of your choice. Documentation for the ncbi-genome-download package can be found here: https://github.com/kblin/ncbi-genome-download.
  2. ncbi_to_fasta.py: Unzips and extracts the downloaded FASTA-files from the dataset. Multiple cleaning/filtering options available.
  3. fasta_to_oriC_csv.py: Predicts the oriC(s) for the whole dataset.
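The extraction step above can be illustrated with a minimal sketch. It assumes the downloaded files are gzip-compressed FASTA files (for example `*_genomic.fna.gz`); the function name `read_gzipped_fasta` is hypothetical and is not taken from ncbi_to_fasta.py, which also offers cleaning and filtering options not shown here.

```python
import gzip


def read_gzipped_fasta(path):
    """Read a gzip-compressed FASTA file into a {header: sequence} dict.

    Hypothetical helper for illustration only; not part of the repository.
    """
    chunks_by_header = {}
    header = None
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new record starts; drop the leading '>'
                header = line[1:]
                chunks_by_header[header] = []
            elif header is not None:
                # Sequence lines belong to the most recent header
                chunks_by_header[header].append(line.upper())
    # Join the per-line chunks into one sequence string per record
    return {name: "".join(chunks) for name, chunks in chunks_by_header.items()}
```

Joining the chunks once at the end avoids repeated string concatenation, which matters for genome-sized sequences.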

Comparison

Once both the DoriC and NCBI datasets have been processed, they can be compared. This is done with oriC_comparison.py.
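Because the genomes are circular, a predicted oriC near the end of the sequence can be close to a reference oriC near the start. The sketch below shows one way such a distance could be measured. It is an assumption for illustration and does not claim to be what oriC_comparison.py does; the name `circular_distance` is hypothetical.

```python
def circular_distance(pos_a, pos_b, genome_length):
    """Shortest distance in bp between two positions on a circular genome.

    Hypothetical helper for illustration only.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    # Distance going one way around the circle
    direct = abs(pos_a - pos_b) % genome_length
    # The other way around may be shorter
    return min(direct, genome_length - direct)
```

For a genome of 1000 bp, positions 10 and 990 are 20 bp apart under this measure, not 980.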

oriC_Finder.py

This script predicts the origin of replication for circular bacterial DNA. It makes use of a combination of Z-curve and GC-skew analysis. You can load the required FASTA files yourself, or simply provide an accession and NCBI-account email and the find_ori function will fetch them.
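To illustrate the two analyses named above, here is a minimal sketch of the standard cumulative Z-curve components and a cumulative GC-skew. It is not the code from oriC_Finder.py: the function names `z_curve_and_gc_skew` and `skew_minimum` are hypothetical, and using the minimum of the cumulative GC-skew as a rough oriC indicator is a common heuristic assumed here, not the script's full method.

```python
def z_curve_and_gc_skew(seq):
    """Return cumulative Z-curve components (x, y, z) and cumulative GC-skew.

    x: purines (A, G) minus pyrimidines (C, T)
    y: amino (A, C) minus keto (G, T)
    z: weak H-bond (A, T) minus strong H-bond (G, C)
    gc: cumulative count of G minus C
    Hypothetical helper for illustration only.
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    x, y, z, gc = [], [], [], []
    for base in seq.upper():
        if base in counts:  # ambiguous bases such as N change no count
            counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        x.append((a + g) - (c + t))
        y.append((a + c) - (g + t))
        z.append((a + t) - (g + c))
        gc.append(g - c)
    return x, y, z, gc


def skew_minimum(gc):
    """Index of the first minimum of the cumulative GC-skew."""
    return min(range(len(gc)), key=gc.__getitem__)
```

Plain lists keep the sketch dependency-free; for real genomes, NumPy arrays would be considerably faster.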

Required packages:

Plotting_functions.py

There are 3 general functions in this file, which can be used to plot any generic 1D np.array. To use these functions, make sure matplotlib is installed.

  • plot_Z_curve_3D: Makes a 3D-plot of the Z-curve.
  • plot_Z_curve_2D: Can plot a maximum of four axes in a single 2D-plot. This one is useful for plotting single or multiple Z-curve or GC-skew components against each other.
  • plot_GC_skew: Does the same as plot_Z_curve_2D, except only takes one array.
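As a rough illustration of plotting a single 1D array with matplotlib, consider the sketch below. It does not reproduce the signatures of the functions listed above, which are not shown in this document; `plot_1d_sketch` and its parameters are hypothetical, and marking the minimum is an added assumption.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_1d_sketch(values, label="GC-skew"):
    """Plot a 1D array against position and mark its minimum.

    Hypothetical helper for illustration only.
    """
    values = np.asarray(values)
    fig, ax = plt.subplots()
    ax.plot(np.arange(values.size), values, label=label)
    # Mark the position of the lowest value with a dashed vertical line
    lowest = int(np.argmin(values))
    ax.axvline(lowest, linestyle="--", color="grey", label="minimum")
    ax.set_xlabel("Position (bp)")
    ax.set_ylabel(label)
    ax.legend()
    return fig, ax, lowest
```

The returned figure can be written to disk with `fig.savefig("skew.png")`, or shown interactively if the backend line is removed.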

Contributors

mister-teapot
