gisaid-download's Introduction

gisaid-download

Purpose: Assisted download of selected samples from GISAID or EPI_SET creation.

gisaid-download is a tool for acquiring metadata for selected samples from GISAID. It was produced primarily for use with the EpiCoV database but should work just as well for EpiFlu or EpiPox. Fully automated download from GISAID's website is tricky, but manual download is a slow, error-prone process that requires renaming and moving files as you go. gisaid-download guides you through the process of downloading the samples you want.

Features

Downloading Samples:

  • keeps track of which samples you (or your team-mates) have already downloaded so you only get the new ones
  • can download batches of samples from multiple locations
  • moves/renames files to a consolidated directory as they're downloaded
  • can limit which metadata files you download to any combination of the following:
    • fasta: [Nucleotide sequences]
    • meta: [Dates and Location, Patient status metadata, Sequencing technology metadata]
    • ackno: [Acknowledgement pdfs] (can only do this for 500 samples at a time, so EPI_SETs are better)
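
As an illustration of the first feature above, skipping already-downloaded samples amounts to a set difference between the samples you ask for and the ones already on record. This is a minimal sketch, not gisaid-download's actual code; the function name and the sample IDs are made up for the example.

```python
def select_new_samples(requested, already_downloaded):
    """Return the requested sample IDs that are not yet downloaded.

    Hypothetical helper: keeps the order of `requested` and drops
    anything found in `already_downloaded`.
    """
    seen = set(already_downloaded)
    return [sample for sample in requested if sample not in seen]


# Placeholder IDs, not real GISAID accessions.
requested = ["sample_A", "sample_B", "sample_C"]
already_downloaded = ["sample_B"]
print(select_new_samples(requested, already_downloaded))
# prints ['sample_A', 'sample_C']
```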

Uploading samples for further analysis:

  • can upload all that data to an hpc via sftp
  • can add a command to run after upload to start your analysis pipeline (We might make ours available someday!)

Getting an EPI_SET:

  • can walk you through getting an EPI_SET identifier for all of your samples
  • this will be emailed to you by GISAID

Some of the above features use sftp or ssh via scripted commands with the help of the package hpc-interact, which keeps login credentials in its own config file. If that file is needed and does not yet exist, you will be prompted for your credentials on the command line. The path to that file can be specified in gisaid_config.ini, and the file only requires two lines:

username=myuser
password=mypass
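
To make the two-line format concrete, here is a minimal sketch of how such a key=value file could be read. It is not hpc-interact's own parser (which may behave differently); the function name `read_credentials` and the handling of blank lines are assumptions made for illustration.

```python
from pathlib import Path


def read_credentials(path):
    """Parse a key=value credentials file into a dict.

    Hypothetical helper, not part of hpc-interact. Splits each line on
    the first '=' so that passwords containing '=' stay intact.
    """
    creds = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {line!r}")
        creds[key.strip()] = value
    missing = {"username", "password"} - creds.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return creds
```

Since the file stores a password in plain text, it is worth restricting its permissions so that only your user can read it.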

Installation

pip install gisaid-download

Usage

The first time you use gisaid-download, you'll need to set up a config file: gisaid_config.ini. Download it to your current working directory or a chosen outdir via:

gisaid_download --example -o gisaid/directory

Edit the config file to suit your needs. The file includes details that explain most of the settings.

Now you're ready to do the download and/or get your EPI_SET. Let's say we only want samples that were collected before 2023. We'll set a date (required for creating unique filenames). To run, do this:

sample_date='2023-01-01'
gisaid_download ${sample_date}

Behavior

The above command triggers up to four steps. Steps 1, 3, and 4 only happen if you're interacting with the hpc cluster. If using the --no_cluster (or -n) flag, they will be skipped.

Step 1: Update local list of downloaded sequences

If you're planning on transferring downloaded data to an hpc, the above command will first look for samples that already exist on the hpc at your cluster_epicov_dir (from config). This uses sftp (via hpc-interact). If no data exists on the cluster yet, or you're the only one downloading samples and transferring them to the hpc, you can skip this step by adding the --skip_local_update (or -s) flag like this:

gisaid_download ${sample_date} --skip_local_update

The flag -n can also be used to skip this step along with Steps 3 and 4.

Step 2: Download sequences

This is an interactive download that by default requires pressing enter after each step. If you don't want to press enter as often (for example, once you get into a rhythm and don't want to stop), you can add the --quick (or -q) flag like this:

gisaid_download ${sample_date} --quick

Step 3: Upload sequences to hpc

Using sftp (via hpc-interact), all the data downloaded in Step 2 will be uploaded to the hpc at your cluster_epicov_dir.

The flag -n can also be used to skip this step along with Steps 1 and 4.

Step 4: Run a followup command on the hpc

If specified in your gisaid_config.ini, followup_command will be run by ssh through hpc-interact. This could be any string, but we recommend setting it to run a script that will begin analyzing the data you just uploaded.
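
The interaction between the four steps and the flags described above can be summarized in a short sketch. This is a model of the documented behavior only, not the tool's implementation; the function `planned_steps` and its parameter names are hypothetical.

```python
def planned_steps(no_cluster=False, skip_local_update=False, has_followup=True):
    """List the steps that would run for a given combination of flags.

    Hypothetical model of the README's description:
    - no_cluster mirrors --no_cluster / -n (skips Steps 1, 3, and 4)
    - skip_local_update mirrors --skip_local_update / -s (skips Step 1)
    - has_followup is True when followup_command is set in the config
    """
    steps = []
    if not no_cluster and not skip_local_update:
        steps.append("1: update local list of downloaded sequences")
    steps.append("2: download sequences")
    if not no_cluster:
        steps.append("3: upload sequences to hpc")
        if has_followup:
            steps.append("4: run followup command on the hpc")
    return steps


for step in planned_steps(skip_local_update=True):
    print(step)
```

With `no_cluster=True`, only Step 2 remains, which matches the description of the -n flag.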

gisaid-download's People

Contributors

skunklem

gisaid-download's Issues

Unable to set up the config file

I am unable to set up my config file; the following error appears:

Traceback (most recent call last):
  File "/Users/u2240321/Documents/PIP/GISAIDExtract/epiflu/bin/gisaid_download", line 8, in <module>
    sys.exit(main())
  File "/Users/u2240321/Documents/PIP/GISAIDExtract/epiflu/lib/python3.9/site-packages/gisaid_download/gisaid_download.py", line 609, in main
    date,locations,downloads,filetype_choices,meta_files,get_epi_set,epicov_dir,ssh_vars,wait,skip_local_update,followup_command,cluster_interact,custom_filters,example,outdir = getVariables()
  File "/Users/u2240321/Documents/PIP/GISAIDExtract/epiflu/lib/python3.9/site-packages/gisaid_download/gisaid_download.py", line 181, in getVariables
    return args.date,args.location,args.downloads,filetype_choices,meta_files,args.get_epi_set,args.epicov_dir,ssh_vars,args.wait,args.skip_local_update,followup_command,args.cluster_interact,custom_filters,example,args.outdir
UnboundLocalError: local variable 'meta_files' referenced before assignment

I am using Python 3.9.13.
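
For context, an UnboundLocalError of this kind usually means a local variable is assigned only inside a conditional branch that was not taken before the variable is used. The sketch below reproduces that pattern in isolation. It is not the code from gisaid_download.py, and whether this is the actual cause there is an assumption; the function names and values are invented for illustration.

```python
def get_variables(filetype_choices):
    """Illustrative only: 'meta_files' is assigned on one branch only."""
    if "meta" in filetype_choices:
        meta_files = ["dates_and_location"]
    # Raises UnboundLocalError when the branch above was skipped.
    return meta_files


def get_variables_fixed(filetype_choices):
    """Same logic with a default assigned before the conditional."""
    meta_files = []
    if "meta" in filetype_choices:
        meta_files = ["dates_and_location"]
    return meta_files


try:
    get_variables(["fasta"])
except UnboundLocalError as err:
    print("reproduced:", type(err).__name__)

print(get_variables_fixed(["fasta"]))
```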
