DRAFTS

DNA Regulatory element Analysis by cell-Free Transcription and Sequencing

Code and materials from the paper "Multiplex transcriptional characterizations across diverse bacterial species using cell-free systems". Yim SS*, Johns NI*, Park J, Gomes ALC, McBee RM, Richardson M, Ronda C, Chen SP, Garenne D, Noireaux V, Wang HH. Molecular Systems Biology (2019) 15, e8875. (* denotes equal contribution)

The full paper and supplementary information can be accessed here.

Raw sequencing data can be found at NCBI SRA under PRJNA509603.

dependencies

The following must be installed prior to executing the code in this repository. For Python packages, it may be convenient to obtain these through a distribution such as Anaconda. Installation should only take a few minutes.

python 3.6.X, ipython/jupyter
- biopython
- pandas
- numpy
- scipy
- matplotlib
- seaborn
bbmerge

1. processing of raw sequencing data

01_DRAFTS_process_raw.sh

expects a NextSeq/MiSeq raw data folder in which each sample folder holds 2 files, R1 and R2 (paired-end reads), and reads sequenced on different flowcell lanes are split across four folders labeled with _L00n
assumes folder names are Samplename_L001_, Samplename_L002_, Samplename_L003_, or Samplename_L004_, where Samplename is the SampleID of each sample in the sample sheet for the Illumina sequencing run
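
The naming convention above can be illustrated with a short Python sketch that groups lane folders by sample. It is not part of the repository: the function name `group_by_sample` and the example folder names are hypothetical, and whatever follows the `_L00n` tag in a real folder name is assumed to be an arbitrary suffix.

```python
import re
from collections import defaultdict

# Matches names such as "Samplename_L001_..." and captures the sample ID
# and the lane number (1-4). Only the "_L00n" tag comes from the README;
# the optional trailing suffix is an assumption.
LANE_PATTERN = re.compile(r"^(?P<sample>.+)_L00(?P<lane>[1-4])(?:_.*)?$")


def group_by_sample(folder_names):
    """Map each sample ID to its lane folders, sorted by lane number."""
    groups = defaultdict(list)
    for name in folder_names:
        match = LANE_PATTERN.match(name)
        if match is None:
            continue  # not a lane folder, so skip it
        lane = int(match.group("lane"))
        groups[match.group("sample")].append((lane, name))
    return {sample: [name for _, name in sorted(lanes)]
            for sample, lanes in groups.items()}


# Hypothetical folder names for illustration only.
example = ["S1_L002_", "S1_L001_", "S2_L001_", "notes"]
grouped = group_by_sample(example)
```

A listing such as `os.listdir(search_dir)` could be passed in place of `example` to check that every sample has all of its expected lane folders before running the shell script.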

run 01_DRAFTS_process_raw.sh to 1) find and combine raw NextSeq data in search_dir, 2) unzip them to out_dir, and 3) assemble paired-end reads

bash 01_DRAFTS_process_raw.sh [search_dir] [out_dir (optional)]

after running 01_DRAFTS_process_raw.sh, group DNA-seq and RNA-seq reads in separate folders for further analysis

2. error filtering and barcode counting

02_DRAFTS_extract_data.py

out_dir should contain a folder named 01_bccounts with 2 empty folders inside named [01_dna_bccounts, 02_rna_bccounts],
and a folder named 02_log with 11 empty folders inside named [01_bccounts, 02_lowq, 03_missingadapter, 04_badbc, 05_goodbc_badalign, 06_frag, 07_goodbc_perfectalign, 08_goodbc_goodalign, 09_goodbc_perfectalign_bccounts, 10_goodbc_goodalign_bccounts, 11_log_files]
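
Creating these folders by hand is error-prone, so the layout can be set up with a few lines of Python. This is a minimal sketch and not part of the repository: the helper name `make_layout` is hypothetical, and the folder names are copied from the list above. The demo at the end writes into a temporary directory; pass your real out_dir instead.

```python
import tempfile
from pathlib import Path

# Folder names copied from the README text above.
BCCOUNT_DIRS = ["01_dna_bccounts", "02_rna_bccounts"]
LOG_DIRS = [
    "01_bccounts", "02_lowq", "03_missingadapter", "04_badbc",
    "05_goodbc_badalign", "06_frag", "07_goodbc_perfectalign",
    "08_goodbc_goodalign", "09_goodbc_perfectalign_bccounts",
    "10_goodbc_goodalign_bccounts", "11_log_files",
]


def make_layout(out_dir):
    """Create the empty folder tree under out_dir and return the paths."""
    out_dir = Path(out_dir)
    created = []
    for parent, children in (("01_bccounts", BCCOUNT_DIRS),
                             ("02_log", LOG_DIRS)):
        for child in children:
            path = out_dir / parent / child
            # exist_ok=True makes the call safe to repeat
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created


# Demo in a throwaway directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    created = make_layout(tmp)
    n_created = len(created)
    all_exist = all(p.is_dir() for p in created)
```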

run 02_DRAFTS_extract_data.py to 1) filter out errors from oligo library synthesis or sequencing, 2) extract barcode counts, and 3) collect other information for QC and additional analysis

python 02_DRAFTS_extract_data.py [ref_csv] [dna_directory] [rna_directory] [out_dir]
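
As a rough illustration of the filtering and counting idea, the sketch below sorts reads into categories that mirror some of the log folder names (missingadapter, badbc, goodbc) and counts the accepted barcodes. It is not the logic of 02_DRAFTS_extract_data.py: the adapter sequence, the barcode length, the assumption that the barcode sits directly after the adapter, and the function name `count_barcodes` are all hypothetical, and quality filtering and alignment checks are left out.

```python
from collections import Counter


def count_barcodes(reads, adapter, bc_len, known_barcodes):
    """Return (barcode counts, per-category read tallies).

    Assumes, for illustration only, that the barcode is the bc_len
    bases immediately following the adapter sequence.
    """
    counts = Counter()
    tallies = Counter()
    for read in reads:
        pos = read.find(adapter)
        if pos == -1:
            tallies["missingadapter"] += 1
            continue
        start = pos + len(adapter)
        barcode = read[start:start + bc_len]
        if barcode not in known_barcodes:
            tallies["badbc"] += 1  # truncated or unknown barcode
            continue
        tallies["goodbc"] += 1
        counts[barcode] += 1
    return counts, tallies


# Toy reads with a made-up adapter ("GACGT") and 3-base barcodes.
reads = ["TTGACGTAAAC", "TTGACGTCCCG", "TTGACGTAAAC",
         "GGGGGGGGGGG", "TTGACGTTTTT"]
counts, tallies = count_barcodes(
    reads, adapter="GACGT", bc_len=3, known_barcodes={"AAA", "CCC"})
```

In practice the set of known barcodes would come from the reference CSV passed as ref_csv.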

3. calculation of transcription levels

03_DRAFTS_compute_tx.py

out_dir should contain a folder named 01_tx

run 03_DRAFTS_compute_tx.py to compute 1) abundances of DNA and RNA barcodes from their counts and 2) transcription levels

python 03_DRAFTS_compute_tx.py [ref_csv] [dna_bc_directory] [rna_bc_directory] [out_dir]
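
The calculation can be sketched as follows, assuming that a barcode's abundance is its share of the total counts in a library and that its transcription level is the ratio of RNA abundance to DNA abundance. Both definitions are assumptions made for illustration, and the normalization and count thresholds in 03_DRAFTS_compute_tx.py may differ. The function names and the `min_dna_count` parameter are hypothetical.

```python
def relative_abundance(counts):
    """Convert raw barcode counts to fractions of the library total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no barcode counts")
    return {bc: n / total for bc, n in counts.items()}


def transcription_levels(dna_counts, rna_counts, min_dna_count=1):
    """RNA abundance divided by DNA abundance, per barcode.

    Barcodes with fewer than min_dna_count DNA reads (or none at all)
    are skipped, which also avoids division by zero. Barcodes absent
    from the RNA library get a level of 0.0.
    """
    dna_ab = relative_abundance(dna_counts)
    rna_ab = relative_abundance(rna_counts)
    levels = {}
    for bc, dna_n in dna_counts.items():
        if dna_n == 0 or dna_n < min_dna_count:
            continue
        levels[bc] = rna_ab.get(bc, 0.0) / dna_ab[bc]
    return levels


# Toy counts for two made-up barcodes.
dna = {"AAA": 50, "CCC": 50}
rna = {"AAA": 30, "CCC": 10}
levels = transcription_levels(dna, rna)
```

Dividing by DNA abundance corrects for uneven representation of constructs in the library, so two constructs with equal promoter strength yield similar values even when one has more template.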
