qiime2 / docs

Home Page: https://docs.qiime2.org

License: BSD 3-Clause "New" or "Revised" License

Makefile 8.85% CSS 8.19% JavaScript 5.15% Python 63.31% HTML 14.49%

hacktoberfest

docs's Introduction

qiime2 (the QIIME 2 framework)

Source code repository for the QIIME 2 framework.

QIIME 2™ is a powerful, extensible, and decentralized microbiome bioinformatics platform that is free, open source, and community developed. With a focus on data and analysis transparency, QIIME 2 enables researchers to start an analysis with raw DNA sequence data and finish with publication-quality figures and statistical results.

Visit https://qiime2.org to learn more about the QIIME 2 project.

Installation

Detailed instructions are available in the documentation.

Users

Head to the user docs for help getting started, core concepts, tutorials, and other resources.

Just have a question? Please ask it in our forum.

Developers

Please visit the contributing page for more information on contributions, documentation links, and more.

Citing QIIME 2

If you use QIIME 2 for any published research, please include the following citation:

Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, Alexander H, Alm EJ, Arumugam M, Asnicar F, Bai Y, Bisanz JE, Bittinger K, Brejnrod A, Brislawn CJ, Brown CT, Callahan BJ, Caraballo-Rodríguez AM, Chase J, Cope EK, Da Silva R, Diener C, Dorrestein PC, Douglas GM, Durall DM, Duvallet C, Edwardson CF, Ernst M, Estaki M, Fouquier J, Gauglitz JM, Gibbons SM, Gibson DL, Gonzalez A, Gorlick K, Guo J, Hillmann B, Holmes S, Holste H, Huttenhower C, Huttley GA, Janssen S, Jarmusch AK, Jiang L, Kaehler BD, Kang KB, Keefe CR, Keim P, Kelley ST, Knights D, Koester I, Kosciolek T, Kreps J, Langille MGI, Lee J, Ley R, Liu YX, Loftfield E, Lozupone C, Maher M, Marotz C, Martin BD, McDonald D, McIver LJ, Melnik AV, Metcalf JL, Morgan SC, Morton JT, Naimey AT, Navas-Molina JA, Nothias LF, Orchanian SB, Pearson T, Peoples SL, Petras D, Preuss ML, Pruesse E, Rasmussen LB, Rivers A, Robeson MS, Rosenthal P, Segata N, Shaffer M, Shiffer A, Sinha R, Song SJ, Spear JR, Swafford AD, Thompson LR, Torres PJ, Trinh P, Tripathi A, Turnbaugh PJ, Ul-Hasan S, van der Hooft JJJ, Vargas F, Vázquez-Baeza Y, Vogtmann E, von Hippel M, Walters W, Wan Y, Wang M, Warren J, Weber KC, Williamson CHD, Willis AD, Xu ZZ, Zaneveld JR, Zhang Y, Zhu Q, Knight R, and Caporaso JG. 2019. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology 37:852–857. https://doi.org/10.1038/s41587-019-0209-9

docs's People

Contributors

jakereps gregcaporaso ebolyen andrewsanchez maxvonhippel nervous-laughter eldeveloper kestrelgorlick jairideout mortonjt nbokulich patthehat033 colinbrislawn kastman cduvallet johnchase chriskeefe turanoo wasade antgonza lisa55asil willwcb sejsong jordan2994 uggulhan patrickegon jwdebelius shiffer1 mikerobeson mestaki yixiangzhang1996 luponsky benkaehler oddant1 rckarns8 fedarko helenhip artzet-s jiduhai mcmk3 shihuang047 ziyan-lin nvt-1009 keegan-evans lizgehret larteyis adamovanja xphab valentynbez marijakopanja stephts gajenderaleti fearghallyons gayansamuditha cherman2 hagenjp

docs's Issues

automate uploading build to S3

Built docs are currently hosted via GitHub Pages. We'll need to transition to an Amazon S3 bucket at some point. Having a script to automate uploading built docs to a versioned subdirectory in an S3 bucket would be useful. Thanks @thermokarst for the idea!

Various thoughts/recommendations on documentation

Improvement Description
I have reviewed the documentation and tutorial with an eye toward figuring out how a completely novice user (e.g., new microbiology grad student without any bioinformatics or programming experience) would view the material. Most of the documentation is fantastic (esp. for alpha) and I love new features, such as the glossary, that improve usability over the qiime1 docs. I have various suggestions below, labeled by what I see as the importance: "[high]", "[low]", or "[enhancement]" (the latter meaning it would enhance usability but is not currently a hindrance to understanding the docs). Some I expect are already planned anyway, but I hope my comments may help hammer these out.

Importing Data [high]

This section needs more detail on the expected file names and other requirements for each type. For example, it is unclear that specific filenames are actually enforced for FeatureData[Sequence] (and I suspect for other semantic types as well). The following error is clear (to me), but it's best to avoid this via better documentation:
ValueError: Missing one or more files for EMPMultiplexedDirFmt: 'sequences.fastq.gz'
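
A pre-import check along these lines could be shown in the docs so users catch a misnamed file before running the import. This is only a sketch: the helper name is hypothetical, and of the two filenames listed only 'sequences.fastq.gz' is confirmed by the error above; 'barcodes.fastq.gz' is an assumption about what the format expects.

```python
from pathlib import Path

# Assumed file list: only 'sequences.fastq.gz' is confirmed by the error
# message above; 'barcodes.fastq.gz' is an assumption for illustration.
REQUIRED_FILES = ("sequences.fastq.gz", "barcodes.fastq.gz")


def missing_files(directory, required=REQUIRED_FILES):
    """Return the required filenames that are absent from `directory`."""
    root = Path(directory)
    return [name for name in required if not (root / name).is_file()]
```

An empty list means every expected file is present under its exact name.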

Directory of Methods [high] COMPLETED

Need a directory of methods, to fill the same niche as qiime1's script index. Most are currently covered in the tutorial, but not all, and this will only expand as qiime2 grows. One issue is that qiime2's methods are hidden within plugins, not free-standing commands, and hence just listing the plugins does not reveal all potential methods (and just a few names are not immediately transparent or are jargony). Some sort of function description index, written in plain English for new users, rather than a list of method names, could be a useful way to approach this (and an enhancement above qiime1's script index, which was difficult to navigate and translate at times). Instead of listing the plugin or command, list the function. Each function will link to 1) the method entry on the plugin doc page or 2) a tutorial page for multi-step procedures (e.g., Procrustes plots). For example, functions such as "demultiplex sequences", "build phylogeny", and "pick OTUs" could all be listed as functions.

Artifact Format Documentation [low]

qzv/qza formats are confusing, and as someone very familiar with qiime1 it took me some time to understand what these file formats are and why they are used. The rationale for these formats should be better documented, along with an explanation that these files can be unzipped to examine the contents. This rationale can link to the pages on semantic types and provenance tracking to discuss those topics. Some discussion appears here, but this should be more clearly documented here and elsewhere (perhaps on its own page that appears in the table of contents). Also make a note of this in the glossary.
As an aside (and I know it's too late to quip about this), I don't really like the choice of the term "artifact", because it has other meaning in biology, e.g., "sequencing artifact".
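
To illustrate the point that .qza/.qzv files can be unzipped, here is a minimal sketch that lists an archive's contents with Python's standard zipfile module. The function name is made up for this example, and nothing is assumed about the internal layout beyond the file being a zip archive.

```python
import zipfile


def list_archive(path):
    """Return the sorted member paths stored in a .qza/.qzv archive."""
    with zipfile.ZipFile(path) as archive:
        return sorted(archive.namelist())
```

Calling list_archive on any artifact or visualization shows what is inside without extracting anything.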

Taxonomy Format [low]

A discussion of the taxonomy format could be useful. Terms like "level 2" are used in the docs but are not immediately apparent to outsiders, nor will a google search be much help. This may be appropriate to include within a file format page (see recommendation below).

Doc Version Archive [enhancement] COMPLETED

The "ported wiki documentation" is very useful, and I recommend continuing to build this as an archive of release docs if possible, rather than removing these pages. One frustration with the qiime1 site was that docs only covered the release version, and if working with an earlier version of qiime or reviewing a list of commands/files generated using an earlier version of qiime, the older docs no longer existed. As qiime2 grows, may I recommend keeping the "ported wiki documentation" as a table of contents (TOC) at the bottom of the current release docs TOC, which will link to TOCs for archived doc versions.

Glossary: add other glossaries? [enhancement]

I LOVE the glossary, as it defines some of the lingo-y words that are new to qiime2. This should be on the reading list of everyone starting with qiime2, to whom "action" and "method" are otherwise more general terms, and "artifact" is not entirely intuitive. I wonder whether it would be useful to include separate glossaries on more general microbiome terminology, and on file types. I recommend separate, because this will keep the technical glossary pure and simple.

Microbiome Terminology: Much of this goes outside of the jurisdiction of qiime, but could be very useful to new users (and would give the developers control over the terminology). After all, users come from all backgrounds and qiime may be the first exposure to any kind of bioinformatics software, microbiome/ecology concepts, or all of the above for many users. For many of these terms, great explanations exist elsewhere on the web (though not necessarily with a simple google search), and a short sentence and link will suffice (and link to citation if appropriate). Some useful terms: distance matrix, OTU, feature table, demultiplex, barcode, index (see barcode), metadata, phiX, chimera, biom, metric (e.g., alpha diversity), (include alpha/beta diversity metrics in glossary, short sentences such as shown here and a link ideally to the original citation would suffice), alpha diversity, beta diversity, discrete (metadata), continuous (metadata), ordination, PCoA, richness

File Formats/Types: In many ways, this should be similar to qiime1's file types page. A similar resource does not yet exist in qiime2. This is in part to describe file formats that are used in qiime2, and in part to describe how to input specific file types into qiime2 artifacts (yeah, yeah, could be more appropriately described in importing data but if that doc expands to include this you can link to the entries for each file format in that doc from this glossary). Some formats/terms to include: fasta, fastq, gz, qza, qzv, mapping file, biom, OTU table, feature table

Hope this all helps. I can elaborate on details / brainstorm more if prompted.

References
Ported from original issue.

should commands use explicit .qza/.qzv extension for output files?

@thermokarst and I were chatting about this today. Commands in the docs are a little magical because output artifacts/visualizations aren't named with a .qza/.qzv extension. When only specifying the basename of the output file (sans extension), it isn't clear how the .qza/.qzv extension factors into output files. It also is hard to determine if the output filename is an artifact or visualization b/c the extension isn't in the output name. Using the basename only also makes it ambiguous whether the output is being saved as a directory or a file without an extension.

I had to explain this behavior several times during the Phoenix workshop, and it seemed to confuse users because .qza/.qzv files were a new concept that was just introduced, and the commands don't explicitly reflect how those extensions relate to output files.

Can we use explicit file extensions in the docs, and note somewhere that if the extension is omitted, it will automatically be appended?
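
The behavior described above (the extension is appended when it is omitted) can be summarized in a few lines. This is a sketch of the rule as stated in this issue, not the actual q2cli implementation; the function name and the 'artifact'/'visualization' labels are hypothetical.

```python
def with_extension(output_name, kind):
    """Append .qza or .qzv to `output_name` unless it is already present.

    `kind` is 'artifact' or 'visualization' (labels chosen for this sketch).
    """
    ext = {"artifact": ".qza", "visualization": ".qzv"}[kind]
    return output_name if output_name.endswith(ext) else output_name + ext
```

Writing the extension explicitly in the docs would make the result of this rule visible in every command.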

list older AMIs in addition to current AMI

Similar to how we're linking to older QIIME 2 VB images.

commandblock support for multi-line command execution

It would be great if Sphinx and/or our commandblock directive supported multiline commands:

$ qiime \
  tools \
  import \
  --help
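
One way a directive could support this is to join backslash-continued lines into a single logical command before executing it. A rough sketch, assuming the directive receives the block as a list of lines (the function name is hypothetical and this is not the existing commandblock code):

```python
def join_continuations(lines):
    """Merge backslash-continued lines into single logical commands."""
    commands, parts = [], []
    for line in lines:
        stripped = line.rstrip()
        if stripped.endswith("\\"):
            # Drop the trailing backslash and keep collecting.
            parts.append(stripped[:-1].strip())
        else:
            parts.append(stripped.strip())
            commands.append(" ".join(p for p in parts if p))
            parts = []
    if parts:
        # The block ended on a continuation line; keep what was collected.
        commands.append(" ".join(p for p in parts if p))
    return [c for c in commands if c]
```

The rendered docs could still display the original multi-line form while the joined command is what gets executed.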

can we import unaligned sequences in feature-classifier tutorial

In the feature-classifier.rst document we import aligned reads:

    qiime tools import \
      --type FeatureData[Sequence] \
      --input-path aligned_85_otu_sequences.fasta.gz \
      --output-path 85_otus.qza

and then run extract-reads where we extract based on the primers:

    qiime feature-classifier extract-reads \
      --i-sequences 85_otus.qza \
      --p-f-primer GTGCCAGCMGCCGCGGTAA \
      --p-r-primer GGACTACHVGGGTWTCTAAT \
      --p-read-length 100 \
      --o-reads ref-seqs.qza

@BenKaehler, can we import unaligned sequences here? It looks like we just strip the gap characters, and unaligned reads are easier for users to provide in general (and also work directly if we want to train a classifier without extracting reads). So importing unaligned reads is preferable if it works.
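
For reference, stripping gap characters from an aligned FASTA file is a small transformation. The sketch below assumes '-' and '.' are the gap characters, which is an assumption on my part and not taken from the q2 source:

```python
def degap_fasta(lines, gap_chars="-."):
    """Yield FASTA lines with gap characters removed from sequence lines.

    Header lines (starting with '>') are passed through unchanged.
    """
    table = str.maketrans("", "", gap_chars)
    for line in lines:
        line = line.rstrip("\n")
        yield line if line.startswith(">") else line.translate(table)
```

If this matches what happens at import, supplying unaligned sequences directly should give the same reference reads.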

(nested) unordered lists render oddly

Example:

update feature-classification required resources text

The last couple of sentences are outdated:

MVP data appear to be phred+64

The metadata.yaml following the import of the raw sequences, and the metadata.yaml following demux, indicate the data are phred 33. However, that does not appear to be accurate as the character set used includes characters defined outside of phred 33 encoding (e.g., "["):

$ funzip sequences.fastq.gz | head
@HWI-EAS440_0386:1:23:17547:1423#0/1
TACGNAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGATGGATGTTTAAGTCAGTTGTGAAAGTTTGCGGCTCAACCGTAAAATTGCAGTTGATACTGGATATCTTGAGTGCAGTTGAGGCAGGGGGGGATTGGTGTG
+
hhhdHddddddddfehhfhhhghggfhhhfhhgggfhhgfgdfcfhehfdgfhggfggfggffgddfgdffdgdaagaaddcbdccc]a^ad__a]_____ba_`a`__^__\]^OWZR\Z\\WYTZ_U^BBBBBBBBBBBBBBBBBBBBBB

I checked with @gregcaporaso and he indicated that it made sense given the age of the data.

At the time of creation of this issue, I do not believe there is a functional impact with the inaccuracy of metadata.yaml as quality scores are not interrogated explicitly by Q2 in this tutorial.
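
A quick heuristic can flag this kind of mismatch from the quality lines alone. The sketch below is an assumption-laden rule of thumb, not something QIIME 2 does: characters below ASCII 59 are taken to indicate phred+33, characters above ASCII 74 ('J') are taken to indicate phred+64, and anything else is reported as ambiguous. Data with unusually high phred+33 scores would fool it.

```python
def guess_phred_offset(quality_lines):
    """Guess the quality encoding from FASTQ quality strings.

    Returns 'phred33', 'phred64', 'ambiguous', or 'unknown' (no input).
    The thresholds are a heuristic, not a definitive test.
    """
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        return "unknown"
    if min(codes) < 59:
        # Characters this low cannot occur in a +64 encoding.
        return "phred33"
    if max(codes) > 74:
        # Above 'J', which is beyond the usual +33 range.
        return "phred64"
    return "ambiguous"
```

Applied to the quality line shown above, whose characters run from 'B' to 'h', this reports phred64.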

Footer not contained in main container

It would be nice if the footer were moved into the main container so that it is centered on the screen, etc.

Revise existing command blocks to use multiline formatting

Low-resolution displays make it hard to follow what is going on with long commands.

Blocked by #37

updates to commands in tutorials

Ported from original issue.

--p-counts-per-sample → --p-sampling-depth (pending qiime2/q2-feature-table#45 and qiime2/q2-diversity#57)
~~feature-table merge-taxa-data → taxa merge~~ (see comment on qiime2/q2-feature-table#47)
feature-table view-taxa-data → taxa tabulate (pending qiime2/q2-feature-table#46)
feature-table view-seq-data → feature-table tabulate-seqs (pending qiime2/q2-feature-table#46)

resources.qiime2.org

Create page(s) for resources similar to QIIME 1's resources.qiime.org. This will include:

protocols (#3)
virtual machine image download links/AMIs/etc
reference databases
useful datasets (link to mockrobiota, link to Qiita, ...)

`q2studio` installation and usage doc doesn't specify to be in conda env

Related to a question on the forum. The q2studio documentation doesn't mention that you have to be in your conda environment for it to be able to find qiime2 and run properly.

artifact api doc shows output from wrong FeatureTable

The table that is being used in the artifact api doc changed, but I didn't update the output from the commands that print to the terminal. We should update this to avoid confusion.

port the qiime 2 usage information from the native install guide to a separate document

Currently there is information in the native install guide that will also be relevant to users who install QIIME 2 using the virtual machines.

support incremental builds with `make html` (commandblock directive)

Current Behavior
make html will perform a make clean first, effectively disabling incremental builds. The current commandblock directive is not safe to use with incremental builds because data that has already been generated from a previous build will not be regenerated, and data that should be cleaned/removed in the current (incremental) run will remain as orphaned files in the build.

Proposed Behavior
Incremental builds should be possible if a hook can be added that will delete the current file's corresponding data assets directory (under source/assets/) before any commands are executed in the file.
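
The proposed hook could be as small as the sketch below. It assumes each source file maps to a directory of the same name under source/assets/; that mapping, the function name, and the default path are assumptions for illustration, not the current directive's behavior.

```python
import shutil
from pathlib import Path


def remove_assets(docname, assets_root="source/assets"):
    """Delete the assets directory for `docname`, if one exists.

    Returns True when a directory was removed, False otherwise.
    """
    target = Path(assets_root) / docname
    if target.is_dir():
        shutil.rmtree(target)
        return True
    return False
```

Running this before a file's commands execute would prevent orphaned outputs from a previous build from lingering.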

References
Ported from original issue.

Reword Moving Pictures tutorial to prevent overloading the word "artifact"

See this section, for an example.

Look into Sphinx-alternative

References
Frozen-Flask came up in discussion this week.

Sort `command-block` collected qza/qzv files

Current Behavior
It's a little confusing when the itemized list of outputs for a multi-command block isn't presented in the same order as the commands are executed.
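
One possible approach is to sort the collected files by where each filename first appears in the block's commands. This is a sketch with a hypothetical function name; it relies on plain substring matching, so a name that is a suffix of another (for example table.qza and rarefied-table.qza) could be ordered incorrectly.

```python
def order_outputs(commands, outputs):
    """Sort output filenames by first appearance in the command text.

    Names that never appear in the commands are placed last.
    """
    text = "\n".join(commands)

    def first_use(name):
        pos = text.find(name)
        return pos if pos != -1 else len(text)

    # sorted() is stable, so ties keep their original relative order.
    return sorted(outputs, key=first_use)
```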

RawSequences is not described in semantic types

In the description of semantic types (https://docs.qiime2.org/2.0.6/semantic-types/), RawSequences is missing. And, if there is any documentation on it (e.g., required format for creating an artifact from sequence data), Google does not know about it. All I was able to find was the tutorial that made use of it, but never really described it. I can read the source code and figure it out, but it would be much more efficient if it was well-documented somewhere.

I am a CS person moving into bioinformatics, if this background helps explain my thought processes.

Update Slackin Inviter URL

Change to https://slack.qiime2.org/

Note, this subdomain isn't set up yet. Will update here when ready.

Client-side URL "upgrade"

I am wondering if we should do something like the Django docs, where we provide a "version selector", and also some way to indicate that an old doc is "stale" or "outdated" (see the red bar at the top of the page).

Moving Pictures tutorial references "sample type"

"Sample type" is now "body site", update the prose.

split multiple commands into their own command blocks

References
See #107 for discussion.

Drop `curl` from `.. download::`

`make html` fails on macOS 10.12 due to conda-installed curl

The R packages installed as part of bootstrapping a QIIME 2 environment apparently have a curl requirement. It looks like the recipe uses my system-installed curl, but conda rewrites the certs path. Running make html with the -s flag removed from a curl command gives:

Extension error:
Command 'curl https://codeload.github.com/qiime2/q2studio/zip/0.0.6 -o q2studio-0.0.6.zip' exited with non-zero return code 77.

stdout:



stderr:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  curl: (77) error setting certificate verify locations:
  CAfile: /Users/matthew/miniconda3/envs/q206/etc/pki/tls/certs/cacert.pem
  CApath: none

We have had a few curl-related issues pop up lately, maybe it is worth looking at wget instead?
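
Another option, since the doc build already runs in Python, would be to fetch files with the standard library and avoid depending on either binary. A minimal sketch (the function name is hypothetical; urllib follows HTTP redirects by default, and whether it sidesteps the certificate problem above is untested):

```python
import shutil
import urllib.request


def fetch(url, dest):
    """Download `url` to the local path `dest` and return `dest`."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```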

add page on protocols

Ported from original issue.

This was requested during the Iceland workshop. Basically protocols/recommendations for researchers starting a study: links to primers, PrimerProspector, EMP protocol, etc. See http://resources.qiime.org for some useful protocols to include.

cURL commands for retrieval of MVP tutorial data are cURL version dependent

It appears that different versions of cURL handle redirects differently. wget works, but it does not ship with OSX by default. Versions of cURL observed to have the problem:

Linux:

$ curl --version
curl 7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2
Protocols: tftp ftp telnet dict ldap ldaps http file https ftps scp sftp 
Features: GSS-Negotiate IDN IPv6 Largefile NTLM SSL libz

OSX (Sierra):

$ curl --version
curl 7.49.1 (x86_64-apple-darwin16.0) libcurl/7.49.1 SecureTransport zlib/1.2.8
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz UnixSockets

Beef up VBox install doc

Pull in info from the forum --- a lot of work went into providing detailed answers, we should be able to recycle a fair amount of that here.

Notes from office hours:

need to address BIOS settings (pull text from QIIME 1)
VBox version (minimum? exact?)

plugin dev doc: describe when to create methods vs visualizers

This question often comes up -- when should the output be an artifact vs a visualization? Having some guidelines in place would be helpful.

add instructions for zsh tab-completion to install doc

zsh tab-completion support was added to q2cli in qiime2/q2cli#100

add "importing raw sequence data" tutorial

This should illustrate importing demultiplexed or multiplexed reads, and work from the moving pictures tutorial.

MP tutorial add `qiime tools peek` after importing

Came up at NH Workshop

Bootstrap container width

Maybe this can be bumped up 100px-200px? It would be nice to see a rendering or two at either extreme.

add tutorial on filtering of feature tables

add `tip` to install docs about conda env hook for tab-completion

Instead of having to run source tab-qiime after activating a q2cli environment, users can edit a conda env file to run the command each time the environment is activated. Thanks @thermokarst for finding this 💎 !
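
The issue does not say which file is meant. One mechanism conda provides is a directory of activation scripts, etc/conda/activate.d inside the environment prefix, whose scripts are sourced on activation. The sketch below writes such a script; the script filename and function name are made up for this example, and it is an assumption that this is the hook referred to above.

```python
from pathlib import Path


def install_activate_hook(env_prefix, command="source tab-qiime"):
    """Write an activation script into the environment at `env_prefix`.

    The script name 'tab-qiime.sh' is arbitrary and chosen for this sketch.
    """
    hook_dir = Path(env_prefix) / "etc" / "conda" / "activate.d"
    hook_dir.mkdir(parents=True, exist_ok=True)
    hook = hook_dir / "tab-qiime.sh"
    hook.write_text(command + "\n")
    return hook
```

With an active environment, the prefix is available in the CONDA_PREFIX environment variable.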

Importing fasta+qual instead of fastq

I have fasta+qual files. Qiime1 has convert_fastaqual_fastq.py. What's the equivalent in qiime2? Or, even better, how to create a qiime2 artifact directly from the fasta+qual+mapping files? I looked here, which seems to be the appropriate location:
https://docs.qiime2.org/2.0.6/tutorials/import-sequence-data/

add `make html` to travis build

Travis now supports a directive for long-running commands, so it should be possible to hook up the full doc build (make html) to Travis. This will add significant time to the Travis build (~20-30 mins) but it seems worth it to catch any build errors in the docs before merge.

q2 filtering tutorial- add examples of OR and IN SQL uses.

In the q2 filtering tutorial, add in examples for filtering the feature table with two instances of a category using AND and OR SQL commands like those found in the docs.
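
To show what such examples might look like, here is a small sketch that runs AND, OR, and IN clauses against an in-memory SQLite table. The column names and rows are invented sample metadata, and the WHERE text is interpolated directly into the query, which is acceptable for a demonstration but not for untrusted input.

```python
import sqlite3

# Invented sample metadata: (SampleID, BodySite, Subject).
ROWS = [
    ("s1", "gut", "subject-1"),
    ("s2", "tongue", "subject-1"),
    ("s3", "gut", "subject-2"),
    ("s4", "left palm", "subject-2"),
]


def matching_samples(rows, where):
    """Return the sorted sample IDs whose metadata satisfy `where`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE metadata (SampleID TEXT, BodySite TEXT, Subject TEXT)"
    )
    conn.executemany("INSERT INTO metadata VALUES (?, ?, ?)", rows)
    query = f"SELECT SampleID FROM metadata WHERE {where} ORDER BY SampleID"
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids
```

For example, "BodySite='gut' AND Subject='subject-1'" keeps only samples matching both conditions, "BodySite='gut' OR BodySite='tongue'" keeps samples matching either, and "BodySite IN ('left palm', 'tongue')" is a shorter way to write a multi-value OR.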

Add Sphinx directive for displaying download/fetch options

Related to #36

make header logo link to docs.qiime2.org

investigate why `make html` fails on certain machines

@thermokarst reported issues with make html on his macOS machine and on various Linux VMs. It sounds like commands marked with :no-exec: are being executed. @thermokarst is going to work with me to debug this issue. Note that @gregcaporaso and I didn't have this issue on our OS X machines.

Troubleshooting/FAQ Section

References
Moving this here.

include example of BIOM v2.1 import

The importing data tutorial shows how to import a .biom file in BIOM v1.0 format. It'd be helpful to include an example of importing a BIOM v2.1 file, which is the more common case (and QIIME doesn't autodetect v2.1). This question came up on the forum here.
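
Because the version is not autodetected, users need a way to tell which kind of file they have. BIOM v1.0 files are JSON text while v2.x files are HDF5, so the first bytes are usually enough. A sketch (the function name is hypothetical, and it assumes the HDF5 signature sits at the start of the file, which is the common case):

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def guess_biom_format(path):
    """Return 'hdf5', 'json', or 'unknown' based on the file's first bytes."""
    with open(path, "rb") as handle:
        head = handle.read(len(HDF5_SIGNATURE))
    if head == HDF5_SIGNATURE:
        return "hdf5"  # BIOM v2.x
    if head.lstrip().startswith(b"{"):
        return "json"  # BIOM v1.0
    return "unknown"
```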