Code Monkey home page Code Monkey logo

subscripts's Introduction

DESCRIPTION

This repository contains all the scripts required for the correct functioning of the master script of inmunoglobulin detection tool.

STRUCTURE

  1. Variables confirmation
  2. Make CDHIT reductions.
  3. Align with CLUSTALW2 the RSS reductions.
  4. Make BLAST analysis.
  5. Convert BLAST m6 table to gff.
  6. Extract scaffolds of interest.
  7. Make EXONERATE analysis.
  8. Clean vulgar format.
  9. Make HMMER analysis for RSS.
  10. Filter EXONERATE files for exons and genes hits.
  11. Reduce to eliminate redundancy in filtered EXONERATE files.
  12. Make overlap analysis to detect V segments with exon and SP
  13. Make MINIPROT analysis.
  14. Correct V segments and RSS-J coordinates.
  15. Predict D segments based in founded RSS-D.

MAIN MODULES

blast_m6_to_gff.py

This script transforms blast format 6 (tabular with 11 columns) into a gff file, along with some filters.

Syntax

blastm6_to_gff.py --file [FILE] --source [SOURCE] --bitscore [BITSCORE]

Options

Options Description
file The file produced by blast with output format 6 (tabular).
source The mode in which blast was performed, e.g. blastn, blastp, tblastx, ...
bitscore Bitscore obtained by blast. Used to add a level of filter to the result. If 0, all results will be maintained.

vulgar_to_table.R

This script takes the vulgar format from the exonerate analysis and transforms the vulgar syntax into a table of condensed results.

Syntax

vulgar_to_table.R --file [FILE]

Options

Options Description
file The filtered exonerate result file containing only the records with vulgar formats.

hmmer_tbl_to_gff.py

This script transforms hmmer tbl format into a gff file.

Syntax

hmmer_tbl_to_gff.py --file [FILE]

Options

Options Description
file The file produced by HMMER analysis, in tbl format.

gff_disambiguation.R

This script takes the filtered exonerate files (genes or exons) and perform a reduction of the overlaping sequences in order to reduce ambiguity produced from matches located at the same coordinates. The result is one genomic range per overlapping individual ranges.

Syntax

gff_disambiguation.R --file [FILE]

Options

Options Description
file The files resul

predict_ighv_by_overlaps.R

This script takes two filtered files, the reduced exons and genes records from EXONERATE results. In order to detect if a V segment is a has its structural exon and its signal peptide, at least two exons need to be detected in each gene record. To do this, the number of exons per gene is counted and every gene that has two or more exons are annotated as genes, whereas genes with one or less associated exons are annotated as pseudogenes. Every record is numbered in a unique fashion to give each segment a unique name.

Syntax

SCRIPTS/SUBSCRIPTS/predict_ighv_by_overlaps.R --query [FILE] --subject [FILE]

Options

Options Description
query The reduced gff file from filtered exon annotated records from EXONERATE
subject The reduced gff file from filtered gene annotated records from EXONERATE

locate_nearby_rss.R

This script detect which RSS is locate nearby each V/J segment. This is achieved by increasing the coordinates of RSS segments in both the start and the end of the match, in order to try to make an artificial overlap with the nearby V/J segment

Syntax

locate_nearby_rss.R -n [FILE] -t [FILE] -v [FILE] -r [INT] -m [STRING]

Options

Options Description
n The HMMER gff file
t The HMMER tbl file
v The overlap gff file prediction containing the names of genes and pseudogenes
r The number of positions to increase in both sides of the HMMER matches, in which to detect possible overlaps with the V segments
m Mode. Depends on the input files. One of the followings: [V_segments/J_segments]

predict_ighd_by_rss.R

This script detects the probable location of true D segments. To do this, the reasoning was that true D segments would by flanked by D RSS signal in both 5' and 3' ends. Every genomic region comprising both ends that have less tha 50 bp lenght is considered as a potential true D segment.

Syntax

predict_ighd_by_rss.R -f [FILE] -t [FILE]

Options

Options Description
f The HMMMER gff file produced with the 5' RSS database
t The HMMMER gff file produced with the 3' RSS database

DOCKER

The script uses many dependencies that could be annoying to install, test, and put the required locations. To prevent this, we designed a docker image in order to have a clean and ready-to-use environment to perform the genomic analysis using this script.

DEPENDENCIES

This are all the dependencies and the versions used in order to make the script to make all the analysis. Here are considered only the programs that could not be preinstalled in a new linux installation.

Programming Language Library, package Use in Version
Conda --- --- ---
python pandas --- ---
R rtracklayer --- ---
R GenomicRanges --- ---

subscripts's People

Contributors

biodfrl89 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.