Athena

A toolkit for exploring structural variation mutation rates and dosage sensitivity

Copyright (c) 2019-Present, Ryan L. Collins and the Talkowski Laboratory.
Distributed under terms of the MIT License (see LICENSE).

Note: the functionality of this repo is incomplete and is under active development.


Table of Contents

Getting started
  Run from Docker
  Manual installation
  Invoking Athena

Other
  A note on design
  About the name


Run from Docker

The recommended way to run Athena is from its dedicated Docker container hosted on Google Container Registry. This will handle all dependencies and installation for you, and ensure you are running the latest version.

$ docker pull us.gcr.io/broad-dsmap/athena
$ docker run --rm -it us.gcr.io/broad-dsmap/athena

Manual installation

If you would prefer to install Athena on your own system, you can do so with pip.

$ git clone https://github.com/talkowski-lab/athena.git
$ cd athena
$ pip install -e .

Invoking Athena

Athena is called from the command line:

$ athena --help
Usage: athena [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  annotate-bins          Annotate bins
  annotate-pairs         Annotate pairs
  breakpoint-confidence  Annotate breakpoint uncertainty
  count-sv               Intersect SV and 1D bins or 2D bin-pairs
  eigen-bins             Eigendecomposition of annotations
  feature-hists          Plot bin annotation distributions
  feature-stats          Compute feature distributions
  make-bins              Create sequential bins
  mu-predict             Predict mutation rates with a trained model
  mu-query               Query a mutation rate matrix
  mu-train               Train mutation rate model
  pair-bins              Create pairs of bins
  slice-remote           Localize slices of remote genomic data
  transform              Transform one or more annotations
  vcf-filter             Filter an input VCF
  vcf-stats              Get SV size & spacing

Athena has numerous subcommands. Pass --help to any subcommand to list its available options.

A note on design

This package was designed with canonical CNVs from the gnomAD-SV callset in mind.

To that end, it assumes input data follows gnomAD-SV formatting standards. This may cause issues for alternative styles of SV representation, for SV types other than canonical CNVs, or for different metadata labels.

If using non-gnomAD data with Athena, please compare your VCF formatting against the gnomAD-SV conventions, paying particular attention to the INFO field.

You can read more about the gnomAD-SV dataset in the corresponding preprint.
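One way to start such a comparison is to list the INFO keys declared in each VCF header and look at the differences. The sketch below is not part of Athena; the function names `info_ids` and `compare_info` are hypothetical, and it only reads the standard `##INFO=<ID=...>` header meta-lines. It does not know which INFO keys Athena actually requires, so treat the result as a starting point for manual review.

```python
import gzip
import re


def info_ids(vcf_path):
    """Return the set of INFO IDs declared in a VCF header (plain or gzip/bgzip)."""
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    ids = set()
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if not line.startswith("##"):
                break  # meta-lines end at the #CHROM column header
            match = re.match(r"##INFO=<ID=([^,>]+)", line)
            if match:
                ids.add(match.group(1))
    return ids


def compare_info(my_vcf, reference_vcf):
    """Report INFO IDs declared in one header but not the other."""
    mine, ref = info_ids(my_vcf), info_ids(reference_vcf)
    return {
        "missing_from_mine": sorted(ref - mine),
        "extra_in_mine": sorted(mine - ref),
    }
```

Calling `compare_info("my_callset.vcf.gz", "gnomad_sv.sites.vcf.gz")` (both file names are placeholders) returns the keys that appear only in the reference header and the keys that appear only in yours.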

About the name

This package is named after Athena, the Greek goddess of wisdom, strategy, tactics, and mathematics. She was selected as the namesake for this package given that it relies on understanding the features that influence structural variation mutation rates (wisdom), incorporating those features into a statistical model (mathematics), and using these models to infer which components of the genome are vulnerable to changes in copy number (a kind of genomic tactics/strategy).

athena's Issues

count-sv groups BED queries by chr and not chr_start_end

When I call count-sv with a 3-column BED file like this:

athena count-sv --query-format bed -o DEL.counts.tsv gnomad_v3.sv.filtered.DEL.sites.vcf.gz gene_desert_coords_chr22.bed

Where gene_desert_coords_chr22.bed looks like this:

#chr	start	end
chr22	1	100001
chr22	100001	200001
chr22	200001	300001
chr22	300001	400001
chr22	400001	500001
...

I get this output grouped by chr:

#query	n_svs
.	0
chr22	1689

I expected the output to be grouped by row, i.e. by chr_start_end. Could this be related to the last commit, which is named "Extend count-sv to work for gtf and generic BED"?

The workaround is to add a fourth column containing a name in the format chr_start_end, which creates those groupings manually and works fine.
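For anyone who wants to script that workaround, here is a minimal sketch. It is not part of Athena, and the function name `add_name_column` is hypothetical. It assumes a tab-delimited BED with an optional `#`-prefixed header line; the `name` label written to the header is also an assumption, since the issue does not say what the fourth column should be called.

```python
def add_name_column(bed_in, bed_out):
    """Copy a 3-column BED, appending chr_start_end as a fourth column."""
    with open(bed_in) as src, open(bed_out, "w") as dst:
        for line in src:
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#"):
                # Header line: the "name" label is an assumption, not an Athena requirement
                dst.write("\t".join(fields[:3] + ["name"]) + "\n")
            elif len(fields) >= 3:
                chrom, start, end = fields[:3]
                dst.write(f"{chrom}\t{start}\t{end}\t{chrom}_{start}_{end}\n")
            # Lines with fewer than three fields (e.g. blank lines) are skipped


# Example call (the output path is a placeholder):
# add_name_column("gene_desert_coords_chr22.bed", "gene_desert_coords_chr22.named.bed")
```

The resulting four-column file can then be passed to the same count-sv command shown above in place of the original BED.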
