โš ๏ธ THIS SOFTWARE DOES NOT DO MUCH YET!

seeka

Get microbial sequence data easier and faster

Motivation

Traditionally if you saw an accession number in a manuscript, you would paste it into NCBI Search and then muck around trying to download the associated data. There were wizards who could use the Entrez interface and its associated command line tools, but it needs to be easier.

A variety of tools now exist to download data from NCBI and ENA:

Combined, they are powerful. I use them. Some work with reads, some with assemblies, some with both, but often with confusing caveats and annoying parameters I don't feel I should have to think about.

I just want to do this:

% seeka PRJEB5167

% cd PRJEB5167
% ls
ERR405852 ERR405853 ERR405854 ERR405855 ERR405856 ERR405857 ERR405858
ERR405859 ERR405860 ERR405861 ERR405862 ERR405863 ERR405864 ERR405865
ERR405866 ERR405867 ERR405868 ERR405869 ERR405870 ERR405871 ERR405872
PRJEB5167.tsv

% head -n 1 PRJEB5167.tsv | tr "\t" "\n" | head | nl
     1	study_accession
     2	secondary_study_accession
     3	sample_accession
     4	secondary_sample_accession
     5	experiment_accession
     6	run_accession
     7	submission_accession
     8	tax_id
     9	scientific_name
    10	instrument_platform

% cd ERR405855
% ls
ERR405855_1.fastq.gz ERR405855_2.fastq.gz
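The metadata TSV has one header line of column names followed by one row per run. Here is a minimal sketch of pulling out a single column by name (for example `run_accession`) rather than by position. It assumes the file is plain tab-separated text with exactly one header row. `tsv_column` is a hypothetical helper, not a seeka command.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of seeka: print one column of a metadata TSV,
# selected by its header name so it keeps working if columns are reordered.
# Usage: tsv_column FILE COLUMN_NAME
tsv_column() {
  local file="$1" name="$2"
  awk -F '\t' -v col="$name" '
    NR == 1 {
      # find the index of the requested column in the header line
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "no such column: " col > "/dev/stderr"; exit 1 }
      next
    }
    { print $idx }
  ' "$file"
}
```

For example, `tsv_column PRJEB5167.tsv run_accession` should list the same run accessions as the directory names shown above, one per line.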

Quick Start

% seeka --version
seeka 0.4.2

# download a single run
% seeka ERR405852

# get data for a biosample
% seeka SAMEA2297485

# get every read set in a project
% seeka PRJEB5167

Accession IDs to be supported

  • GCA_nnnnnnnnn.v - Genbank assembly
  • [A-Z]{4}01000000 - Genbank assembly
  • GCF_nnnnnnnnn.v - Refseq assembly
  • NC_nnnnnn.v - Refseq assembly
  • PRJ{EB,NA} - SRA project
  • [SED]RRnnnnnnn - SRA read set (FASTQ)
  • [SED]RXnnnnnnn - SRA experiment
  • [SED]RPnnnnnnn - SRA study
  • [SED]RSnnnnnnn - SRA sample
  • SAM[NED] - Biosamples
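One way to tell these accession types apart is to match each ID against a regular expression. The patterns below are an interpretation of the list above, not seeka's own code, and `classify_accession` is a hypothetical name. Run, experiment, study and sample IDs use `[0-9]{6,}` rather than exactly seven digits, because the example run ERR405852 has only six.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not seeka's implementation: map an accession ID to the
# record types listed above. Patterns are assumptions based on that list.
# The first matching pattern wins.
patterns=(
  '^GCA_[0-9]{9}\.[0-9]+$'
  '^[A-Z]{4}01000000$'
  '^GCF_[0-9]{9}\.[0-9]+$'
  '^NC_[0-9]{6}\.[0-9]+$'
  '^PRJ(EB|NA)[0-9]+$'
  '^[SED]RR[0-9]{6,}$'
  '^[SED]RX[0-9]{6,}$'
  '^[SED]RP[0-9]{6,}$'
  '^[SED]RS[0-9]{6,}$'
  '^SAM[NED][A-Z]?[0-9]+$'
)
labels=(
  "Genbank assembly"
  "Genbank assembly"
  "Refseq assembly"
  "Refseq assembly"
  "SRA project"
  "SRA read set (FASTQ)"
  "SRA experiment"
  "SRA study"
  "SRA sample"
  "Biosample"
)

# classify_accession ID: prints the record type, or "unknown" and returns 1.
classify_accession() {
  local acc="$1" i
  for i in "${!patterns[@]}"; do
    # the regex is left unquoted so bash treats it as a pattern, not a literal
    if [[ $acc =~ ${patterns[$i]} ]]; then
      printf '%s\n' "${labels[$i]}"
      return 0
    fi
  done
  printf 'unknown\n'
  return 1
}

# Demo using accessions that appear in this README.
for acc in ERR405852 SAMEA2297485 PRJEB5167; do
  printf '%-14s %s\n' "$acc" "$(classify_accession "$acc")"
done
```

Keeping the patterns in a table makes it easy to add prefixes later, such as other BioProject centres, without touching the matching loop.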

Output files

  • seeka.ACCESSION.tsv - metadata TSV for search query
  • *.fastq.gz - any read data
  • *.fna.gz - any assemblies in FASTA
  • *.gbff.gz - any assemblies in Genbank format

Installation

Conda

Install Conda or Miniconda:

conda install -c conda-forge -c bioconda -c defaults seeka # COMING SOON

Homebrew

Install Homebrew (Mac OS X) or Linuxbrew (Linux).

brew install brewsci/bio/seeka # COMING SOON

Source

This will install the latest version directly from GitHub. You'll need to add the seeka bin directory to your $PATH, and also ensure all the dependencies are installed.

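cd $HOME
git clone https://github.com/tseemann/seeka.git
# put seeka on your $PATH (add this line to your shell startup file to keep it)
export PATH="$HOME/seeka/bin:$PATH"
seeka --help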

Dependencies

  • perl >= 5.26
  • ascp from the Aspera Command Line Tools
  • rsync
  • esearch, efetch, elink from the Entrez edirect toolkit
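A quick pre-flight check for the tools above might look like the following. It is a sketch, not part of seeka, and `check_deps` and `perl_is_recent` are hypothetical names. It relies only on the POSIX `command -v` builtin and on perl refusing to run `use v5.26` on older interpreters.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of seeka.

# check_deps CMD...: print one status line per command and return the number
# of commands that could not be found on $PATH (0 means all were found).
check_deps() {
  local cmd missing=0
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      printf 'ok      %s\n' "$cmd"
    else
      printf 'MISSING %s\n' "$cmd"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# perl_is_recent: `use v5.26` makes perl die on older versions, so the exit
# status doubles as the "perl >= 5.26" check. Also fails if perl is absent.
perl_is_recent() {
  perl -e 'use v5.26;' >/dev/null 2>&1
}

# Tool names taken from the Dependencies list above.
check_deps perl ascp rsync esearch efetch elink || echo "some dependencies are missing" >&2
perl_is_recent || echo "perl >= 5.26 not found" >&2
```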

License

seeka is free software, released under the GPL 3.0.

Issues

Please submit suggestions and bug reports to the Issue Tracker.

References

Author

Torsten Seemann
