Awesome Bioinformatics

Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biological data. — Wikipedia

A curated list of awesome Bioinformatics software, resources, and libraries. Mostly command line based, and free or open-source. Please feel free to contribute!

Data Processing

Command Line Utilities

  • datamash - Data transformations and statistics.
  • Bioinformatics One Liners - Git repo of useful single line commands.
  • Bedtools2 - A Swiss Army knife for genome arithmetic.
  • CSVKit - Utilities for working with CSV/Tab-delimited files.
  • csvtk - Another cross-platform, efficient, practical and pretty CSV/TSV toolkit.
  • easy_qsub - Easily submit PBS jobs with a script template. Multiple input files supported.
  • GNU parallel - General parallelizer that runs jobs in parallel on a single multi-core machine. Here are some example scripts using GNU parallel.
  • Ruffus - Computation pipeline library for Python, widely used in science and bioinformatics.

Next Generation Sequencing

Pipelines

  • bcbio-nextgen - Batteries included genomic analysis pipeline for variant and RNA-Seq analysis, structural variant calling, annotation, and prediction.

Sequence Processing

Sequence processing includes tasks such as demultiplexing raw read data and trimming low-quality bases.

  • fakit - A cross-platform and efficient toolkit for FASTA/Q file manipulation.
  • Fastqp - FASTQ and SAM quality control using Python.
  • FastQC - A quality control tool for high throughput sequence data.
  • Fastx Toolkit - FASTQ/A short-reads pre-processing tools: Demultiplexing, trimming, clipping, quality filtering, and masking utilities.
  • Seqtk - Toolkit for processing sequences in FASTA/Q formats.
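
The two tasks named at the top of this section can be sketched in a few lines of Python. This is only an illustration of the ideas, not how any of the tools listed above implement them. It assumes Phred+33 quality encoding, exact-match barcodes at the start of each read, and reads already parsed into (sequence, quality) tuples. The function names and the `unassigned` bucket are hypothetical.

```python
from collections import defaultdict


def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(sequence, quality, min_quality=20):
    """Drop bases from the 3' end until the last base meets min_quality."""
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality[:end]


def demultiplex(reads, barcodes):
    """Group reads by an exact-match barcode prefix and strip the barcode.

    reads is an iterable of (sequence, quality) tuples; barcodes maps a
    barcode string to a sample name. Reads matching no barcode are kept
    untouched under the hypothetical key "unassigned".
    """
    samples = defaultdict(list)
    for sequence, quality in reads:
        for barcode, sample in barcodes.items():
            if sequence.startswith(barcode):
                n = len(barcode)
                samples[sample].append((sequence[n:], quality[n:]))
                break
        else:
            samples["unassigned"].append((sequence, quality))
    return dict(samples)


if __name__ == "__main__":
    print(trim_3prime("ACGTAC", "IIII##"))
    print(demultiplex([("AAGGTT", "IIIIII"), ("TTGGTT", "IIIIII")], {"AA": "s1"}))
```

Real data usually calls for mismatch-tolerant barcode matching and sliding-window trimming, which the dedicated tools above provide.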

Sequence Alignment

De Novo Alignment

DNA Resequencing

  • BWA - Burrows-Wheeler Aligner for pairwise alignment between DNA sequences.

Variant Calling

  • samtools/bcftools/htslib - A suite of tools for manipulating next-generation sequencing data.
  • freebayes - Bayesian haplotype-based polymorphism discovery and genotyping.

BAM File Utilities

  • Bamtools - Collection of tools for working with BAM files.

VCF File Utilities

  • vcflib - A C++ library for parsing and manipulating VCF files.
  • bcftools - Set of tools for manipulating VCF files.
  • vcftools - VCF manipulation and statistics (e.g. linkage disequilibrium, allele frequency, Fst).

Genomic Traits

Genomic traits are differences in DNA structure or content observed among populations that may be regulated by genetic variation, for example telomere length or rDNA copy number.

  • Telseq - Tool for estimating telomere length from whole genome sequence data.
  • Bam Toolbox - Outputs the ratio of MtDNA:nuclear coverage, a proxy for mitochondrial content.
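
As a rough illustration of the MtDNA:nuclear coverage ratio mentioned above, the sketch below divides mean mitochondrial coverage by length-weighted mean nuclear coverage. It is not Bam Toolbox's implementation. The function name, the input dictionaries, and the `chrM` contig name are assumptions made for the example. In practice the aligned-base counts would come from a BAM file.

```python
def mt_nuclear_ratio(aligned_bases, contig_lengths, mt_contig="chrM"):
    """Return mean mitochondrial coverage divided by mean nuclear coverage.

    aligned_bases maps contig name -> total bases aligned to that contig.
    contig_lengths maps contig name -> contig length in bases.
    mt_contig is the assumed name of the mitochondrial contig.
    """
    mt_coverage = aligned_bases[mt_contig] / contig_lengths[mt_contig]

    # Pool all non-mitochondrial contigs so long contigs weigh more.
    nuclear = [name for name in aligned_bases if name != mt_contig]
    nuclear_bases = sum(aligned_bases[name] for name in nuclear)
    nuclear_length = sum(contig_lengths[name] for name in nuclear)
    nuclear_coverage = nuclear_bases / nuclear_length

    return mt_coverage / nuclear_coverage


if __name__ == "__main__":
    bases = {"chr1": 3000, "chr2": 1000, "chrM": 1600}
    lengths = {"chr1": 100, "chr2": 100, "chrM": 16}
    print(mt_nuclear_ratio(bases, lengths))
```

A higher ratio suggests more mitochondrial genome copies per nuclear genome copy, which is why it serves as a proxy for mitochondrial content.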

Variant Simulation

  • wgsim - Read simulator that comes with samtools.
  • Bam Surgeon - Tools for adding mutations to existing .bam files, used for testing mutation callers.

Variant Filtering / Quality Control

Variant Prediction/Annotation

  • SIFT - Predicts whether an amino acid substitution affects protein function.
  • SnpEff - Genetic variant annotation and effect prediction toolbox.

Python Modules

Data

  • cruzdb - Pythonic access to the UCSC Genome database.
  • pyensembl - Pythonic Access to the Ensembl database.

Tools

Visualization

Genome Browsers / Gene diagrams

The following tools can be used to visualize genomic data or to construct customized visualizations of it, including sequence data from DNA-Seq, RNA-Seq, and ChIP-Seq, variants, and more.

  • biodalliance - Embeddable genome viewer. Integrates data from a wide variety of sources, and can load data directly from popular genomics file formats including bigWig, BAM, and VCF.
  • IGV js - JavaScript-based browser. Fast, efficient, scalable visualization tool for genomics data and annotations. Handles a large variety of formats.
  • Island Plot - D3 JavaScript based genome viewer. Constructs SVGs.
  • pileup.js - JavaScript library that can be used to generate interactive and highly customizable web-based genome browsers.
  • scribl - JavaScript library for drawing canvas-based gene diagrams. The Homepage has examples.
  • DNAism - Horizon chart d3-based js library for DNA data.

Database Access

Resources

Becoming a Bioinformatician

Sequencing

  • Next-Generation Sequencing Technologies - Elaine Mardis (2014) [1:34:35] - Excellent (technical) overview of next-generation and third-generation sequencing technologies, along with some applications in cancer research.
  • Annotated bibliography of *Seq assays - List of ~100 papers on various sequencing technologies and assays ranging from transcription to transposable element discovery.
  • For all you seq... (PDF) (3456x5471) - Massive infographic by Illumina illustrating how many sequencing techniques work. Techniques cover protein-protein interactions, RNA transcription, RNA-protein interactions, RNA low-level detection, RNA modifications, RNA structure, DNA rearrangements and markers, DNA low-level detection, epigenetics, and DNA-protein interactions. References included.

RNA-Seq

ChIP-Seq

YouTube Channels and Playlists

  • Current Topics in Genome Analysis 2016 - Excellent series of fourteen lectures given at NIH about current topics in genomics ranging from sequence analysis, to sequencing technologies, and even more translational topics such as genomic medicine.
  • GenomeTV - "GenomeTV is NHGRI's collection of official video resources from lectures, to news documentaries, to full video collections of meetings that tackle the research, issues and clinical applications of genomic research."
  • Leading Strand - Keynote lectures from Cold Spring Harbor Laboratory (CSHL) Meetings. More on The Leading Strand.
  • Genomics, Big Data and Medicine Seminar Series - "Our seminars are dedicated to the critical intersection of GBM, delving into 'bleeding edge' technology and approaches that will deeply shape the future."
  • Rafael Irizarry's Channel - Dr. Rafael Irizarry's lectures and academic talks on statistics for genomics.
  • NIH VideoCasting and Podcasting - "NIH VideoCast broadcasts seminars, conferences and meetings live to a world-wide audience over the Internet as a real-time streaming video." Not exclusively genomics and bioinformatics videos, but includes many great talks on domain-specific uses of bioinformatics and genomics.

Blogs

  • ACGT - Dr. Keith Bradnam writes his "thoughts on biology, genomics, and the ongoing threat to humanity from the bogus use of bioinformatics acronyms."
  • Opiniomics - Dr. Mick Watson writes on bioinformatics, genomes, and biology.
  • Bits of DNA - Dr. Lior Pachter writes reviews and commentary on computational biology.
  • it is NOT junk - Dr. Michael Eisen writes "a blog about genomes, DNA, evolution, open science, baseball and other important things"

Miscellaneous

  • The Leek group guide to genomics papers - Expertly curated genomics papers to get up to speed on genomics, RNA-seq, statistics (used in genomics), software development, and more.
  • A New Online Computational Biology Curriculum - "This article introduces a catalog of several hundred free video courses of potential interest to those wishing to expand their knowledge of bioinformatics and computational biology. The courses are organized into eleven subject areas modeled on university departments and are accompanied by commentary and career advice."
  • How Perl Saved the Human Genome Project - An anecdote by Lincoln D. Stein on the importance of the Perl programming language in the Human Genome Project.

License

CC0
