idseq-cli's Introduction

IDseq

Warning: this CLI will soon be deprecated; consider switching to version 2. This project is stable and still maintained, but it is not under active development. Any new features will be added to version 2. A list of differences from version 1, including new features, is available for version 2.

Infectious Disease Sequencing Platform

IDseq is an unbiased global software platform that helps scientists identify pathogens in metagenomic sequencing data.

  • Discover - Identify the pathogen landscape
  • Detect - Monitor and review potential outbreaks
  • Decipher - Find potential infecting organisms in large datasets

A collaborative open project of Chan Zuckerberg Initiative and Chan Zuckerberg Biohub.

Getting Started/Usage Instructions

(1) Install and configure the Amazon Web Services Command Line Interface (AWS CLI):

For macOS users: We recommend trying the Homebrew package manager to install awscli. You can install by running these commands:

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

brew install awscli

(2) Install the IDseq CLI:

pip install git+https://github.com/chanzuckerberg/idseq-cli.git --upgrade

  • Tips: Make sure you have Python installed already. Try running pip --version or python --version.

  • Try running with pip2 or pip3 depending on your configuration. Try sudo pip if you run into permissions errors. You can use this same command in the future to update the CLI if needed.
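
Before moving on, it can help to confirm that the tools from steps (1) and (2) are actually on your PATH. The following is a minimal sketch, not part of the IDseq CLI; the helper name missing_tools is hypothetical, and it only checks that the executables aws, pip, and idseq can be found.

```python
import shutil


def missing_tools(names=("aws", "pip", "idseq")):
    """Return the executables from `names` that are not found on PATH.

    Hypothetical helper for illustration; it does not verify versions
    or configuration, only that each command can be located.
    """
    return [name for name in names if shutil.which(name) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools were found.")
```

If pip is reported missing, the pip2 or pip3 variants mentioned above may still be available under those names.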

(3) Upload a single sample:

idseq -e YOUR_EMAIL -t YOUR_TOKEN -p 'Your Project Name' -s 'Your Sample Name' \
  --r1 your_sample_R1.fastq.gz --r2 your_sample_R2.fastq.gz

  • Replace YOUR_EMAIL with your IDseq email and YOUR_TOKEN with your upload token.

  • Supported file types: .fastq/.fq/.fasta/.fa or .fastq.gz/.fq.gz/.fasta.gz/.fa.gz

  • You can create a new project as part of this command. Provide the name of a project that doesn't exist yet and the command will create it for you.

  • You will be prompted to upload metadata in a CSV file with your samples. This is also where you will specify the Host Genome.

  • Your authentication token for uploading is the token after -t. Keep this private like a password!

  • Tips: Avoid copying commands into programs like TextEdit because it may change "straight quotes" into “smart quotes” (“ ‘ ’ ”) which will not be parsed correctly in your terminal.

  • The '\' symbol means to continue on the next line in the terminal. If you use this in your command, make sure it is not followed by a space before the line break.

  • New to using a command line? You will need to use cd and ls to navigate to the folder on your computer containing the source files you want to upload. Guide here.
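
If quoting in the terminal is a recurring source of trouble, one option is to assemble the command in Python and let the standard library handle the quoting. This is a minimal sketch under assumptions: the helper names is_supported and build_upload_command are hypothetical, and only the flags and file types shown above are used.

```python
import shlex

# File extensions listed above as supported.
SUPPORTED_SUFFIXES = (
    ".fastq", ".fq", ".fasta", ".fa",
    ".fastq.gz", ".fq.gz", ".fasta.gz", ".fa.gz",
)


def is_supported(filename):
    """Return True if the file name ends with a supported extension."""
    return filename.lower().endswith(SUPPORTED_SUFFIXES)


def build_upload_command(email, token, project, sample, r1, r2):
    """Build the argument list for a single-sample upload.

    Hypothetical helper: it only arranges the flags shown in this README.
    """
    for path in (r1, r2):
        if not is_supported(path):
            raise ValueError("unsupported file type: {}".format(path))
    return ["idseq", "-e", email, "-t", token, "-p", project,
            "-s", sample, "--r1", r1, "--r2", r2]


cmd = build_upload_command(
    "YOUR_EMAIL", "YOUR_TOKEN", "Your Project Name", "Your Sample Name",
    "your_sample_R1.fastq.gz", "your_sample_R2.fastq.gz",
)
# shlex.join (Python 3.8+) adds straight quotes around arguments with spaces.
print(shlex.join(cmd))
```

The list in cmd can also be passed directly to subprocess.run, which avoids shell quoting altogether.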

(Optional) Run the program in interactive mode:

Having trouble? Just run idseq without any parameters and the program will guide you through the process.

(Optional) Upload samples in bulk mode by specifying a folder:

idseq -e YOUR_EMAIL -t YOUR_TOKEN -p 'Your Project Name' \
  --bulk /path/to/your/folder

Edit the command in this text box and copy-and-paste:

idseq -e YOUR_EMAIL -t YOUR_TOKEN -p 'Your Project Name' --bulk .

  • The '.' refers to the current folder in your terminal. The program will try to auto-detect files in the folder.
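
To preview which files in a folder have a supported extension before starting a bulk upload, a short script like the one below may help. It is only an illustration: the function name find_candidate_files is hypothetical, and the CLI's own auto-detection may apply different or additional rules.

```python
from pathlib import Path

# File extensions listed in this README as supported.
SUPPORTED_SUFFIXES = (
    ".fastq", ".fq", ".fasta", ".fa",
    ".fastq.gz", ".fq.gz", ".fasta.gz", ".fa.gz",
)


def find_candidate_files(folder="."):
    """List file names in `folder` that end with a supported extension."""
    return sorted(
        entry.name
        for entry in Path(folder).iterdir()
        if entry.is_file() and entry.name.lower().endswith(SUPPORTED_SUFFIXES)
    )


# '.' is the current folder, as in the --bulk example above.
for name in find_candidate_files("."):
    print(name)
```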

Troubleshooting

OverflowError: cannot fit 'int' into an index-sized integer.

Your computer might be limited on RAM. Try reducing your file chunk size by adding --uploadchunksize 250 to your command. This splits your file into smaller pieces before uploading each one to IDseq.
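
The idea behind a smaller chunk size can be sketched in a few lines of Python: reading a file piece by piece keeps only one piece in memory at a time. This is not the CLI's actual implementation; the function name iter_chunks is hypothetical, and treating the value 250 as megabytes is an assumption.

```python
def iter_chunks(path, chunk_size_mb=250):
    """Yield successive pieces of a file, each at most chunk_size_mb MB.

    Illustrative only: the unit (megabytes) is an assumption, and the
    real CLI may split files differently.
    """
    chunk_bytes = chunk_size_mb * 1024 * 1024
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_bytes)
            if not chunk:  # an empty read means end of file
                break
            yield chunk
```

Because the function is a generator, memory use is bounded by the chunk size rather than by the size of the whole file.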

Contributing

This project is not currently accepting contributions.

This project adheres to the Contributor Covenant code of conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior to [email protected].

Reporting Security Issues

Please note: If you believe you have found a security issue, please responsibly disclose by contacting us at [email protected].

See SECURITY.md for more information.

idseq-cli's People

Contributors

cdebourcy, davidrissato, gregdingle, j-x-han, jameshowardwang, jshoe, kislyuk, markazhang, morsecodist, ryanking, tfrcarvalho, yunfangjuan

Forkers

lvreynoso rtvt123

idseq-cli's Issues

Status code: 502

Hi,
I am uploading FASTQ files in batch using the CLI and received Status Code: 502 for one of my samples (maybe more, but I am writing this in the middle of my upload). I can't seem to find a reference for the various error codes we might encounter.

The sample that failed with this status was a multi-part upload, in which the large file was split into 5000 MB chunks. However, other samples that needed a multi-part upload were uploaded successfully. Do you have a sense of what the 502 code could indicate and how I could troubleshoot it? This error is most commonly a "bad gateway" error, but clearly that isn't applicable here.

Any thoughts are greatly appreciated! I'm happy to add clarifications where needed. I just wasn't sure where to start.

Setup Airbrake for your Python application

Installation

Using pip

pip install -U airbrake

Setup

The easiest way to get set up is with a few environment variables (you can find your project ID and API key in your project's settings):

export AIRBRAKE_API_KEY=<Your project API KEY>
export AIRBRAKE_PROJECT_ID=<Your project ID>
export AIRBRAKE_ENVIRONMENT=production

and you're done!

Otherwise, you can instantiate your AirbrakeHandler by passing these values as arguments to the getLogger() helper:

import airbrake


logger = airbrake.getLogger(api_key="<Your project API KEY>", project_id=<Your project ID>)


try:
    1/0
except Exception:
    logger.exception("Bad math.")

For more information please visit our official GitHub repo.

Error: sequence item 0: expected str instance, bytes found

The splitting process fails to finish with error

Error: sequence item 0: expected str instance, bytes found

It occurs where the parts method returns:

subprocess.check_output(
                "ls {}*".format(part_prefix), shell=True).splitlines()

Indeed, the result is a list of byte strings, as you can see here (printing the result of the command before returning it):

[b'./s1.fastq__AWS-MULTI-PART-aa', b'./s1.fastq__AWS-MULTI-PART-ab', b'./s1.fastq__AWS-MULTI-PART-ac', b'./s1.fastq__AWS-MULTI-PART-ad', b'./s1.fastq__AWS-MULTI-PART-ae']

The problem is when the return of this method is used with join here:

"parts":
     ", ".join(f.parts(max_part_size)),

because str.join only accepts str items and raises a TypeError when given bytes.

It is solved by replacing

subprocess.check_output(
                "ls {}*".format(part_prefix), shell=True).splitlines()

by

subprocess.check_output(
                "ls {}*".format(part_prefix), shell=True).decode("ascii").splitlines()

This issue could be related to:

  • my python version: Python 3.6.0
  • my locale: fr_FR.UTF-8
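
For reference, the behavior can be reproduced outside idseq-cli with the small script below. It is a sketch under assumptions: a child Python process stands in for the ls call so that it runs on any platform, the part names are made up, and it decodes with UTF-8 rather than ASCII so that file names with non-ASCII characters would not raise a decoding error.

```python
import subprocess
import sys

# A child process prints two made-up part names, one per line.
# It stands in for the `ls` call so the example runs on any platform.
raw = subprocess.check_output(
    [sys.executable, "-c", "print('s1-aa'); print('s1-ab')"])

# On Python 3, check_output returns bytes, so splitlines() yields bytes.
parts_bytes = raw.splitlines()
try:
    ", ".join(parts_bytes)  # str.join rejects bytes items
except TypeError as err:
    print("Error:", err)

# Decoding first turns the output into str, which join accepts.
parts_text = raw.decode("utf-8").splitlines()
print(", ".join(parts_text))
```

On Python 2 the same check_output call returned str, which is likely why the original code worked there.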

Thanks in advance!

TypeError: string indices must be integers

Hi! I run into this problem whenever I try to run idseq in bulk mode. If I am not mistaken, IDseq was designed for Illumina sequencing platforms, correct? My data comes from Nanopore; could that be the problem?

Thanks,

Daniel

idseq-cli cannot continue to transfer from breakpoint

idseq-cli cannot resume a transfer from the point where it stopped after an upload fails. This makes uploading files from a local machine really hard so far.

Support for uploading via SFTP or from a downloadable link on another service would be great, as FASTQ files are usually large. Thank you.
