Comments (6)
Make sure you read the CLI docs ("Usage" and "Reference" pages):
https://docs.nextstrain.org/projects/nextclade/en/stable/user/nextclade-cli/usage.html
Both of your invocations are invalid because you did not provide the --name argument. The correct way to request the SARS-CoV-2 dataset by name is:
nextclade dataset get --name="nextstrain/sars-cov-2/wuhan-hu-1/orfs" --output-dir="dataset/"
You can also use the shortcut name of this particular dataset:
nextclade dataset get --name="sars-cov-2" --output-dir="dataset/"
These two invocations do the same thing.
Think of arguments as key-value pairs separated from other arguments with spaces:
--key1=value1 --key2=value2 --key3=value3
Each argument has a specific meaning in the context of the program you are using. In most cases you need both the key and the value. The key is the pre-agreed name of the argument. By looking at the key, the program understands what kind of information you want to provide. In the case of the dataset get command, the argument to request a dataset by name happens to be called name, so you must write --name before giving the actual name of the dataset. The value is the piece of information you want to give to the program. In this case it is the actual name of the dataset you request: "sars-cov-2". The equals sign (=) between key and value is optional.
It is usually better to wrap values in quotation marks, especially if they contain spaces:
--some-arg="my value with spaces"
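To see why the quotes matter, here is a minimal sketch. The shell splits the command line on spaces before the program ever sees it, so an unquoted value with spaces arrives as several separate arguments. The count_args function is a hypothetical helper written for this illustration, not part of Nextclade.

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() {
  echo "$#"
}

# Without quotes the shell splits the value at each space.
count_args --some-arg=my value with spaces     # prints 4

# With quotes the whole value stays attached to its key.
count_args --some-arg="my value with spaces"   # prints 1
```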
Some arguments that turn something on or off don't need a value, only the key (this kind of argument is sometimes called a "flag"). A good example is --only-names of the dataset list command:
nextclade dataset list --only-names
As you can see, it has no value after it. It just switches on printing of only the dataset names, instead of the big table that is printed by default.
There are also so-called "positional" arguments, which have no key, only a value. For example, when you pass a fasta file to nextclade run:
nextclade run --input-dataset="dataset/" --output-dir="results/" "my_input_1.fasta" "my_input_2.fasta"
In this case there are two positional arguments: "my_input_1.fasta" and "my_input_2.fasta". As you can see, positional arguments are handy when you need to pass multiple things into the program.
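The three kinds of arguments described above can be sketched as a small shell function that labels whatever it is given. This is only an illustration: classify_args is a hypothetical name, and it assumes the --key=value form with an equals sign. A real program knows from its own argument definitions whether a key takes a value, so it can also accept the form with a space instead of "=", which this sketch cannot tell apart from a flag followed by a positional argument.

```shell
# Hypothetical helper (not part of Nextclade): labels each argument as
# key-value, flag, or positional, assuming the --key=value spelling.
classify_args() {
  for arg in "$@"; do
    case "$arg" in
      # Starts with -- and contains "=": split at the first "=".
      --*=*) echo "key-value: key=${arg%%=*} value=${arg#*=}" ;;
      # Starts with -- but has no "=": treated as a flag here.
      --*)   echo "flag: $arg" ;;
      # Anything else has no key at all.
      *)     echo "positional: $arg" ;;
    esac
  done
}

classify_args --output-dir="results/" --only-names "my_input_1.fasta"
# key-value: key=--output-dir value=results/
# flag: --only-names
# positional: my_input_1.fasta
```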
You can find the available arguments and their meaning in the built-in help screen, by running the program with only the --help flag:
nextclade --help
In the case of Nextclade, you can also read a dedicated help screen for each of the subcommands:
nextclade run --help
nextclade dataset list --help
nextclade dataset get --help
nextclade sort --help
None of this is specific to Nextclade (nextclade is used as a relevant example). These are the basics of using command-line programs (aka console or terminal programs, or CLI). There should be plenty of learning materials on this topic on the internet.
Regarding specifics of Nextclade, I would not download the dataset into the current directory (the . passed to --output-dir in your example means "output directory is the current working directory"). It might make it difficult to separate input and output files later: they will all end up mixed together in the same directory.
There is also another, simpler way to run a Nextclade analysis:
nextclade run --dataset-name="sars-cov-2" --output-dir="results/" "my_input.fasta"
This does not need a separate dataset get step. When using the --dataset-name argument, the dataset is downloaded each time you run the program, and the dataset files are not written to your computer (which may or may not be what you want).
from nextclade.
Thank you for this assistance. I've gotten caught up in several other projects in the past couple of weeks, but I'm setting out this weekend to learn this in earnest.
I printed out the Nextclade documentation and spent the last week reading it all. That, along with your advice here has been helpful. I think I've figured out the basics of how to run Nextclade and get the sort of file I want (an ndjson at the moment).
I do have a few questions about parts of the documentation.
- "Add multiple occurrences to increase verbosity further." I need all the help I can get, so I'd like to make Nextclade as verbose as possible. But I'm not sure what "multiple occurrences" means. Does it mean that you type in multiple v's, as in -vvvvv or maybe -v -v -v -v? How many do you have to enter for maximum verbosity?
- Are the brackets and other symbols in documentation real or not? For example, in the section below from the Nextclade documentation, am I supposed to include the brackets, the < and > symbols, and the "..."? Or are those not real? Based on what you said in the post above, I'm guessing they're not real, but I want to be 100% sure. I don't know how to tell things that are required to be in the code from things that are there but aren't supposed to be part of the code, and there doesn't seem to me to be any possible way to tell the difference. Is there an easy way to know this?
from nextclade.
Does it mean that you type in multiple v's, as in:
-vvvvv or maybe -v -v -v -v ? How many do you have to enter for maximum verbosity?
I think either should work, but I usually use the -vvv form. Note that the verbosity levels higher than info (which is one -v), namely the debug and trace levels, are mostly useful only for developers. They print far too much technical detail, which will just confuse you more (e.g. do you really want to know which crypto algorithm is negotiated during the SSL handshake of the HTTP connection when a dataset file is downloaded? Probably not). Note also that verbosity levels only affect what is printed to the console. Output files are always the same.
All verbosity levels are listed under the --verbosity argument. It lets you set the level you want directly, without counting how many -v or -q flags you need. The default level is warn. One -v moves verbosity a level up, one -q moves it a level down.
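The counting can be sketched as a small shell function. This mirrors the description above and is not Nextclade's actual code. The levels warn, info, debug and trace are the ones mentioned in this thread. The two levels below warn (error and off) and the behaviour of stopping at the ends of the list are assumptions of this sketch, so check nextclade --help for the real list.

```shell
# Sketch only: verbosity_level is a hypothetical helper.
# $1 = number of -v flags, $2 = number of -q flags.
verbosity_level() {
  # Assumed order, quietest first: off error warn info debug trace.
  # The default (no flags) is "warn", which sits at index 2.
  idx=$(( 2 + $1 - $2 ))
  # Assumption: extra flags beyond either end of the list have no effect.
  if [ "$idx" -lt 0 ]; then idx=0; fi
  if [ "$idx" -gt 5 ]; then idx=5; fi
  case "$idx" in
    0) echo off ;;
    1) echo error ;;
    2) echo warn ;;
    3) echo info ;;
    4) echo debug ;;
    5) echo trace ;;
  esac
}

verbosity_level 0 0   # no flags: warn
verbosity_level 1 0   # -v: info
verbosity_level 3 0   # -vvv: trace
```

Under these assumptions, three -v flags already reach the most verbose level, which matches the -vvv form mentioned above.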
Are the brackets and other symbols in documentation real or not?
This is a convention for denoting variables (placeholders). <thing> means a required value, that is, you need to put a thing there. [thing] means an optional value, and [thing]... a repeatable value, i.e. you can put one or multiple things there. You don't type the brackets, only the value itself.
I think the convention originally comes from man (1, 2, 3), a tool for reading manual pages on Unix-like systems. The page you screenshotted is autogenerated using a docs generation utility though, and I am not sure how closely it follows the convention. But that's the general idea.
I think I've figured out the basics of how to run Nextclade and get the sort of file I want (an ndjson at the moment)
I would not recommend the JSON and NDJSON outputs, because they are unstable, meaning the format can change without notice. This is mentioned in the docs. You probably want the TSV output (--output-tsv). It's stable and easy to open in Excel, Google Sheets or any other spreadsheet software. Use "tab" (\t) as the column delimiter if your software cannot detect it automatically (it will typically ask you). That's what we recommend for most users. By the way, the output files in the CLI are exactly the same as those on the "Export" page of the web app. So if you are accustomed to using Nextclade Web export files, you will find the CLI outputs familiar as well.
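If you would rather search the TSV from the terminal than from a spreadsheet, something like the following might help. It is a minimal sketch: tsv_select is a hypothetical helper written for this example, not a Nextclade command, and the sample rows are made up. It picks columns by header name rather than by position, so it does not depend on the column order.

```shell
# Sketch: print selected columns of a tab-separated file, chosen by header name.
# Usage: tsv_select FILE COLUMN [COLUMN...]
tsv_select() {
  file="$1"
  shift
  # Join the requested column names with commas to hand them to awk.
  cols="$(IFS=,; echo "$*")"
  awk -F'\t' -v OFS='\t' -v cols="$cols" '
    NR == 1 {
      # Remember the position of every header, then check the requested ones.
      for (i = 1; i <= NF; i++) pos[$i] = i
      n = split(cols, want, ",")
      for (j = 1; j <= n; j++) {
        if (!(want[j] in pos)) {
          print "no such column: " want[j] > "/dev/stderr"
          exit 1
        }
      }
    }
    {
      # Build the output row from the requested columns, in the order asked.
      line = $(pos[want[1]])
      for (j = 2; j <= n; j++) line = line OFS $(pos[want[j]])
      print line
    }
  ' "$file"
}

# Made-up sample data; a real Nextclade TSV has many more columns.
sample="$(mktemp)"
printf 'index\tseqName\tclade\n0\tseqA\t21L\n1\tseqB\t22B\n' > "$sample"
tsv_select "$sample" seqName clade
rm -f "$sample"
```

On the sample above this prints the seqName and clade columns, header included, separated by tabs.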
from nextclade.
Thank you. I have used the Nextclade Web export files a lot, so I'm familiar with those. I'm trying to get ndjson files because I want to be able to search them using Julia, which I (half) learned and was ready to start using before I realized there was something called bash that I really should've learned before I ever even tried Julia because it's impossible to do anything without bash.
from nextclade.
Is there a way to get the GISAID accession numbers from Nextclade? I'm doing a search and the only results I can get are the sequence names, which I then have to paste one at a time into the GISAID text search in order to find and download the fastas. I'd like to be able to paste all the EPI_ISL numbers at once so I can download them easily, but I don't see them anywhere in the TSV file or the ndjson file and I'm not sure where else they would be.
from nextclade.
Is there a way to get the GISAID accession numbers from Nextclade?
Not sure what you mean here. Nextclade does not deal with GISAID and does not know what an accession is, or even that GISAID exists. We don't rely on any database. The only source of data is the input files users provide: input fasta files and dataset files.
Sequence names are taken from your input fasta file and presented in the output files as is. If your fasta file does not contain accessions, you will not get them from Nextclade. So it's your responsibility to set the names in your input fasta such that you get the desired names in the output TSV.
Or do you mean something else?
By the way, sequence names are not guaranteed to be unique. Scientists often don't bother much with naming the sequences they produce, and it's a bit chaotic. So it's not always possible to deduce the exact sequence just from the name.
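If the header lines of your input fasta do happen to contain the accession, you could pull the IDs out of the fasta directly, without involving Nextclade at all. The sketch below assumes headers of the form ">name|EPI_ISL_12345|date". That header layout, the extract_epi_isl name, and the sample IDs are all assumptions made for this example. If your headers carry only the name, this will print nothing.

```shell
# Sketch: list each unique EPI_ISL_<digits> ID found in fasta header lines.
# extract_epi_isl is a hypothetical helper, not part of Nextclade.
extract_epi_isl() {
  # Keep only header lines (starting with ">"), print just the matching
  # part of each line, then sort and drop duplicates.
  grep '^>' "$1" | grep -oE 'EPI_ISL_[0-9]+' | sort -u
}

# Made-up sample input: the second record has no accession in its header.
sample="$(mktemp)"
printf '>virusA|EPI_ISL_222|2021-01-01\nACGT\n>virusB\nACGT\n>virusC|EPI_ISL_111\nACGT\n' > "$sample"
extract_epi_isl "$sample"
# EPI_ISL_111
# EPI_ISL_222
rm -f "$sample"
```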
from nextclade.