PythonNGSTools

Scripts for NGS data processing

HapCUT2VCF.py

This script converts an output file from HapCUT2 (https://github.com/pjedge/hapcut2) into VCF format.

Usage(python3):

python HapCUT2VCF.py  hapcut2_table  output_VCF   original_VCF(optional)  

The hapcut2_table keeps only phased variants. Unphased variants can be recovered if the original VCF file that was given to HapCUT2 is provided; they are then combined with the phased variants in the same output file. Each block in the HapCUT2 table is stored as a haplotype block in the VCF file and marked with the PS tag. PS numbering starts from 1 for each run. To get PS IDs that are unique across the whole genome, concatenate the HapCUT2 output tables before converting.
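The PS numbering described above can be illustrated with a minimal sketch. It is not the code in HapCUT2VCF.py: the function name blocks_to_vcf_lines and the pre-parsed tuple layout (chrom, pos, ref, alt, hap1, hap2) are assumptions made for illustration, and parsing of the HapCUT2 table itself is left out.

```python
def blocks_to_vcf_lines(blocks):
    """Turn pre-parsed haplotype blocks into VCF record lines.

    `blocks` is a list of blocks; each block is a list of
    (chrom, pos, ref, alt, hap1, hap2) tuples. This layout is a
    hypothetical stand-in for a parsed HapCUT2 table.
    """
    lines = []
    # PS IDs start from 1 and increase by one per block within a run.
    for ps_id, block in enumerate(blocks, start=1):
        for chrom, pos, ref, alt, hap1, hap2 in block:
            genotype = f"{hap1}|{hap2}"  # "|" marks a phased genotype
            fields = [chrom, str(pos), ".", ref, alt, ".", ".", ".",
                      "GT:PS", f"{genotype}:{ps_id}"]
            lines.append("\t".join(fields))
    return lines


example_blocks = [
    [("chr1", 100, "A", "G", 0, 1), ("chr1", 150, "C", "T", 1, 0)],
    [("chr1", 900, "G", "A", 0, 1)],
]
ps_lines = blocks_to_vcf_lines(example_blocks)
for line in ps_lines:
    print(line)
```

Because the counter restarts for every call, two separately converted tables would both contain PS 1, which is why concatenating the tables first gives unique IDs.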

VCFsplit4HapCUT2.py

This script splits a VCF file into smaller files, so that HapCUT2 can be run on them in parallel.

Usage(python3):

python VCFsplit4HapCUT2.py  vcf_input  line_number_per_file
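As a rough sketch of the splitting idea (not the actual logic of VCFsplit4HapCUT2.py), the function below cuts the record lines into chunks of a fixed size. Repeating the header in every chunk is an assumption made here so that each piece is a valid VCF on its own; the name split_vcf_lines is hypothetical.

```python
def split_vcf_lines(lines, records_per_file):
    """Split VCF lines into chunks of at most `records_per_file` records.

    Header lines (starting with "#") are copied into every chunk.
    This is an assumption of the sketch, not documented behaviour.
    """
    if records_per_file < 1:
        raise ValueError("records_per_file must be at least 1")
    header = [line for line in lines if line.startswith("#")]
    records = [line for line in lines if not line.startswith("#")]
    chunks = []
    for start in range(0, len(records), records_per_file):
        chunks.append(header + records[start:start + records_per_file])
    return chunks


example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
    "chr1\t100\t.\tA\tG\t.\t.\t.\tGT\t0/1",
    "chr1\t150\t.\tC\tT\t.\t.\t.\tGT\t0/1",
    "chr1\t900\t.\tG\tA\t.\t.\t.\tGT\t0/1",
]
vcf_chunks = split_vcf_lines(example_vcf, 2)
print(len(vcf_chunks))
```

Variants on either side of a chunk boundary are phased in separate runs, so a haplotype block cannot span two chunks.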

MUMmerSNPs2VCF.py

This script converts the output of show-snps (MUMmer) into VCF format. Run show-snps with the "-x 1" option so that the reference fasta file is not needed.

Usage:

show-snps -Clr -x 1  -T mum.delta.filter  >mum.delta.filterX.snps  
python3.4 MUMmerSNPs2VCF.py mum.delta.filterX.snps  mum_filterX.snps.vcf  

Notes for the code:
To get correctly converted VCF files from MUMmer show-snps output:

  1. The reference sequence is needed to rebuild insertions and deletions.
    Instead of reading the original reference fasta file, I used "show-snps -x 1", so that the surrounding nucleotides are also reported.
  2. For insertions, if the query sequence is mapped to the reference in reverse orientation, the inserted nucleotides are reported in reverse order.
    So they need to be concatenated in reverse order.
  3. The coordinates of insertions and deletions.
    For insertions, the coordinate in the show-snps output is that of the nucleotide before the insertion. It is kept unchanged in the VCF file.
    For deletions, the coordinates in the show-snps output are those of the deleted nucleotides. The coordinate in the VCF file should be: first_position_of_deletion_block - 1.
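The coordinate rules in the notes can be written down as a small sketch. The function names and arguments are hypothetical and do not come from MUMmerSNPs2VCF.py; base_before stands for the reference nucleotide preceding the indel, which is assumed here to have been taken from the context reported by "show-snps -x 1". Only the order reversal from note 2 is modelled.

```python
def insertion_to_vcf(pos_before, base_before, inserted_bases, reverse_mapped=False):
    """Return (POS, REF, ALT) for an insertion.

    `pos_before` is the reference coordinate of the nucleotide before the
    insertion, which is kept unchanged as the VCF POS.
    """
    if reverse_mapped:
        # Bases were reported in reverse order, so restore the order.
        inserted_bases = inserted_bases[::-1]
    return pos_before, base_before, base_before + inserted_bases


def deletion_to_vcf(first_deleted_pos, base_before, deleted_bases):
    """Return (POS, REF, ALT) for a deletion block.

    VCF anchors a deletion on the preceding base, hence
    POS = first_position_of_deletion_block - 1.
    """
    return first_deleted_pos - 1, base_before + deleted_bases, base_before


print(insertion_to_vcf(100, "A", "GT"))
print(insertion_to_vcf(100, "A", "GT", reverse_mapped=True))
print(deletion_to_vcf(201, "C", "TT"))
```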

Download_FTP_ENA.py

Download fastq files from ENA.

Usage:

  1. Search for the project in ENA to open the information page of the study.
  2. Click the "Read Files" tab next to "Navigation".
  3. Download the TEXT file (save it as ENA_description.txt).
  4. Generate the shell scripts that download the fastq files (the master shell script submits the download jobs to a queue on the cluster):

python2.7 Download_FTP_ENA.py  ./  ENA_description.txt  

./ is the current directory; replace it with the directory where the fastq files should be stored.
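The step of turning the description file into download commands might look roughly like the sketch below. It assumes that the TEXT file is tab-separated and has a fastq_ftp column holding semicolon-separated addresses; that layout, the function name wget_commands, and the placeholder accession and host in the example are all assumptions, not details taken from Download_FTP_ENA.py. Job submission to a cluster queue is not shown.

```python
import csv
import io


def wget_commands(report_text, out_dir="./", column="fastq_ftp"):
    """Build one wget command per fastq address found in the report.

    Assumes a tab-separated table with a header row and a `column`
    whose cells hold semicolon-separated addresses.
    """
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    commands = []
    for row in reader:
        for url in (row.get(column) or "").split(";"):
            url = url.strip()
            if not url:
                continue
            if "://" not in url:
                url = "ftp://" + url  # add a scheme when the cell omits it
            # -P sets the directory where wget saves the file
            commands.append(f"wget -P {out_dir} {url}")
    return commands


# Placeholder content, not a real ENA record.
example_report = (
    "run_accession\tfastq_ftp\n"
    "RUN0001\tftp.example.org/RUN0001_1.fastq.gz;ftp.example.org/RUN0001_2.fastq.gz\n"
)
ena_commands = wget_commands(example_report)
for command in ena_commands:
    print(command)
```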

MUMmer helper

Tools to split a large query file, submit jobs to a cluster, and merge the delta output files from MUMmer.

Contributors

liangjiaoxue