chanzuckerberg / idseq-cli

IDseq infectious disease command-line interface

License: MIT License

idseq-cli's Introduction

Warning: This CLI will soon be deprecated; consider switching to version 2. This project is stable and still maintained, but it is not under active development. Any new features will be added to version 2. Here is a list of differences from version 1 including exciting new features!

Infectious Disease Sequencing Platform

IDseq is an unbiased global software platform that helps scientists identify pathogens in metagenomic sequencing data.

Discover - Identify the pathogen landscape
Detect - Monitor and review potential outbreaks
Decipher - Find potential infecting organisms in large datasets

A collaborative open project of Chan Zuckerberg Initiative and Chan Zuckerberg Biohub.

Check out our repositories:

idseq-web - Frontend portal
idseq-workflows - Bioinformatics workflows
idseq-cli - Command line upload interface (here)
idseq-bench - Pipeline benchmarking tools

Getting Started/Usage Instructions

See live instructions and view your user token at https://idseq.net/cli_user_instructions

(1) Install and configure the Amazon Web Services Command Line Interface (AWS CLI):

For macOS users: We recommend trying the Homebrew package manager to install awscli. You can install by running these commands:

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

brew install awscli

Otherwise follow the AWS installation instructions here: https://docs.aws.amazon.com/cli/latest/userguide/installing.html
Verify it works by running aws help, which should display usage instructions. You do not need to set up AWS credentials unless you're using the bulk upload mode.

(2) Install the IDseq CLI:

pip install git+https://github.com/chanzuckerberg/idseq-cli.git --upgrade

Tips: Make sure you have Python installed already. Try running pip --version or python --version.
Try running with pip2 or pip3 depending on your configuration. Try sudo pip if you run into permissions errors. You can use this same command in the future to update the CLI if needed.

(3) Upload a single sample:

idseq -e YOUR_EMAIL -t YOUR_TOKEN -p 'Your Project Name' -s 'Your Sample Name' \
  --r1 your_sample_R1.fastq.gz --r2 your_sample_R2.fastq.gz

Replace YOUR_EMAIL with your IDseq email and YOUR_TOKEN with your upload token.
Supported file types: .fastq/.fq/.fasta/.fa or .fastq.gz/.fq.gz/.fasta.gz/.fa.gz
You can create a new project as part of this command. Provide the name of a project that doesn't exist yet and the command will create it for you.
You will be prompted to upload metadata in a CSV file with your samples. This is also where you will specify the Host Genome.
- Instructions: https://idseq.net/metadata/instructions
- Metadata dictionary and supported host genomes: https://idseq.net/metadata/dictionary
- Metadata CSV template: https://idseq.net/metadata/metadata_template_csv
Your authentication token for uploading is the token after -t. Keep this private like a password!
Tips: Avoid copying commands into programs like TextEdit because it may change "straight quotes" into “smart quotes” (“ ‘ ’ ”) which will not be parsed correctly in your terminal.
The '\' symbol means to continue on the next line in the terminal. If you use this in your command, make sure it is not followed by a space before the line break.
New to using a command line? You will need to use cd and ls to navigate to the folder on your computer containing the source files you want to upload. Guide here.
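
The supported file types listed above can be checked before starting an upload. The following is a minimal sketch, not part of the idseq CLI: the helper name `is_supported` is hypothetical, and the match is case-sensitive because the list above only shows lowercase extensions.

```python
# Hypothetical pre-upload check; this is not code from the idseq CLI.
# The extensions are copied from the "Supported file types" note.
BASE_EXTENSIONS = (".fastq", ".fq", ".fasta", ".fa")
SUPPORTED_EXTENSIONS = BASE_EXTENSIONS + tuple(ext + ".gz" for ext in BASE_EXTENSIONS)


def is_supported(filename):
    """Return True if filename ends with one of the listed extensions."""
    # str.endswith accepts a tuple and matches any of its entries.
    return filename.endswith(SUPPORTED_EXTENSIONS)


if __name__ == "__main__":
    for name in ["your_sample_R1.fastq.gz", "notes.txt"]:
        print(name, is_supported(name))
```

Running a check like this on your input files first can catch a wrong path or an unexpected extension before any data is sent.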

(Optional) Run the program in interactive mode:

Having trouble? Just run idseq without any parameters and the program will guide you through the process.

(Optional) Upload samples in bulk mode by specifying a folder:

idseq -e YOUR_EMAIL -t YOUR_TOKEN -p 'Your Project Name' \
  --bulk /path/to/your/folder

If your files are in the current folder, edit this command and copy-and-paste it into your terminal:

idseq -e YOUR_EMAIL -t YOUR_TOKEN -p 'Your Project Name' --bulk .

The '.' refers to the current folder in your terminal. The program will try to auto-detect files in the folder.

Troubleshooting

`OverflowError: cannot fit 'int' into an index-sized integer.`

Your computer might be limited on RAM. Try reducing your file chunk size by adding --uploadchunksize 250 to your command. This splits your file into smaller pieces before uploading each one to IDseq.
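
To illustrate why a smaller chunk size lowers memory use, here is a minimal sketch of reading a file in fixed-size pieces so that only one piece is held in memory at a time. It is an illustration of the general idea only: the function name `iter_chunks` is hypothetical, and this is not how the idseq CLI splits files internally.

```python
import os
import tempfile


def iter_chunks(path, chunk_size_bytes):
    """Yield successive pieces of at most chunk_size_bytes from a file."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size_bytes)
            if not chunk:
                # An empty read means the end of the file was reached.
                break
            yield chunk


if __name__ == "__main__":
    # Demo on a small temporary file instead of a real FASTQ file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"0123456789")
    try:
        for piece in iter_chunks(tmp.name, 4):
            print(len(piece))
    finally:
        os.remove(tmp.name)
```

Peak memory in this pattern is bounded by the chunk size rather than the file size, which is why a smaller value can help on a machine with limited RAM.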

Contributing

This project is not accepting contributions at this time.

This project adheres to the Contributor Covenant code of conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior to [email protected].

Reporting Security Issues

Please note: If you believe you have found a security issue, please responsibly disclose by contacting us at [email protected].

See SECURITY.md for more information.

idseq-cli's Issues

Status code: 502

Hi,
I am uploading fastq files in batch using the CLI and received Status Code: 502 for one of my samples (maybe more, but I am writing this in the middle of my upload), and I can't seem to find a key for various errors we might encounter.

The sample that failed with this status was a multi-part upload, in which the large file was split into 5000 MB chunks. However, I had other samples that needed a multi-part upload and uploaded successfully... Do you have a sense of what the 502 code could indicate and how I could troubleshoot? This error is most commonly a "bad gateway" error, but clearly that isn't applicable here.

Any thoughts would be greatly appreciated! I'm happy to add clarifications where needed. I just wasn't sure where to start.

Cosmetic problem: "RuntimeError: Set changed size during iteration"

This error message comes from within tqdm. Tried workaround of setting tqdm.monitor_interval = 0 but does not always work. The files are uploaded fine though.

align tests

Setup Airbrake for your Python application

Installation

Using pip

pip install -U airbrake

Setup

The easiest way to get set up is with a few environment variables (You can find your project ID and API KEY with your project's settings):

export AIRBRAKE_API_KEY=<Your project API KEY>
export AIRBRAKE_PROJECT_ID=<Your project ID>
export AIRBRAKE_ENVIRONMENT=production

and you're done!

Otherwise, you can instantiate your AirbrakeHandler by passing these values as arguments to the getLogger() helper:

import airbrake


logger = airbrake.getLogger(api_key="<Your project API KEY>", project_id=<Your project ID>)


try:
    1/0
except Exception:
    logger.exception("Bad math.")

For more information please visit our official GitHub repo.

Uses deprecated iteritems() function for dictionaries

The CLI fails with Python 3.6 because dict.iteritems() was removed in Python 3; .items() is the replacement. Suggested solutions here.

an example error:

line 68, in clean_samples2files

`return {k: sorted(v) for k, v in samples2files.iteritems() if len(v) in [1, 2]}`
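
A minimal sketch of the same line written with `.items()`, which works on both Python 2 and Python 3. The function name comes from the traceback above; the single-argument signature and the reading of "one or two files" as single-end or paired-end samples are assumptions.

```python
def clean_samples2files(samples2files):
    """Keep only samples with one or two files, with file names sorted.

    Assumption: one file is a single-end sample and two files are a
    paired-end sample; the original code only shows the length check.
    """
    # dict.items() exists in Python 2 and 3; iteritems() is Python 2 only.
    return {k: sorted(v) for k, v in samples2files.items() if len(v) in (1, 2)}


if __name__ == "__main__":
    example = {
        "s1": ["s1_R2.fastq", "s1_R1.fastq"],
        "s2": [],
    }
    print(clean_samples2files(example))
```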

Investigate CLI upload problems

Specifically token access, uploading local files, and documentation

Switch to Python3 eventually

Probably a good idea since idseq-dag will also be on python3 / for future-proofing

Error: sequence item 0: expected str instance, bytes found

The splitting process fails to finish with error

Error: sequence item 0: expected str instance, bytes found

It occurs where the method parts returns

subprocess.check_output(
                "ls {}*".format(part_prefix), shell=True).splitlines()

Indeed the results of this is a byte string as you can see here (printing the result of the command before returning it):

[b'./s1.fastq__AWS-MULTI-PART-aa', b'./s1.fastq__AWS-MULTI-PART-ab', b'./s1.fastq__AWS-MULTI-PART-ac', b'./s1.fastq__AWS-MULTI-PART-ad', b'./s1.fastq__AWS-MULTI-PART-ae']

The problem is when the return of this method is used with join here:

"parts":
     ", ".join(f.parts(max_part_size)),

knowing that join cannot work with a byte string.

It is solved by replacing

subprocess.check_output(
                "ls {}*".format(part_prefix), shell=True).splitlines()

with

subprocess.check_output(
                "ls {}*".format(part_prefix), shell=True).decode("ascii").splitlines()
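
An alternative that sidesteps the bytes/str question entirely is to list the part files with the standard `glob` module instead of shelling out to `ls`. This is a sketch of a different technique, not the fix proposed above, and the function name `list_parts` is hypothetical.

```python
import glob
import os
import tempfile


def list_parts(part_prefix):
    """Return the files whose names start with part_prefix, as sorted str."""
    # glob.escape protects any wildcard characters inside the prefix itself.
    return sorted(glob.glob(glob.escape(part_prefix) + "*"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        prefix = os.path.join(folder, "s1.fastq__AWS-MULTI-PART-")
        for suffix in ("ab", "aa"):
            open(prefix + suffix, "w").close()
        # The result is a list of str, so str.join works without decoding.
        print(", ".join(list_parts(prefix)))
```

Because `glob.glob` returns `str` when given a `str` pattern, the later `", ".join(...)` call needs no decoding step, and file names are not limited to ASCII.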

This issue could be related to:

my python version: Python 3.6.0
my locale: fr_FR.UTF-8

Thanks in advance!

TypeError: string indices must be integers

Hi! I am having this problem whenever I try to run idseq in bulk mode. I've been thinking, and if I am not mistaken IDseq was designed for illumina sequencing platforms, correct? My data comes from nanopore, could that be the problem?

Thanks,

Daniel

idseq-cli cannot continue to transfer from breakpoint

idseq-cli cannot resume a transfer from the point where it failed. This makes uploading files from a local machine really hard so far.

Upload via sftp or a downloadable link from another service would be great, as fastq files are usually large. Thank you.
