docs's Introduction

qiime2 (the QIIME 2 framework)

Source code repository for the QIIME 2 framework.

QIIME 2™ is a powerful, extensible, and decentralized microbiome bioinformatics platform that is free, open source, and community developed. With a focus on data and analysis transparency, QIIME 2 enables researchers to start an analysis with raw DNA sequence data and finish with publication-quality figures and statistical results.

Visit https://qiime2.org to learn more about the QIIME 2 project.

Installation

Detailed instructions are available in the documentation.

Users

Head to the user docs for help getting started, core concepts, tutorials, and other resources.

Just have a question? Please ask it in our forum.

Developers

Please visit the contributing page for more information on contributions, documentation links, and more.

Citing QIIME 2

If you use QIIME 2 for any published research, please include the following citation:

Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, Alexander H, Alm EJ, Arumugam M, Asnicar F, Bai Y, Bisanz JE, Bittinger K, Brejnrod A, Brislawn CJ, Brown CT, Callahan BJ, Caraballo-Rodríguez AM, Chase J, Cope EK, Da Silva R, Diener C, Dorrestein PC, Douglas GM, Durall DM, Duvallet C, Edwardson CF, Ernst M, Estaki M, Fouquier J, Gauglitz JM, Gibbons SM, Gibson DL, Gonzalez A, Gorlick K, Guo J, Hillmann B, Holmes S, Holste H, Huttenhower C, Huttley GA, Janssen S, Jarmusch AK, Jiang L, Kaehler BD, Kang KB, Keefe CR, Keim P, Kelley ST, Knights D, Koester I, Kosciolek T, Kreps J, Langille MGI, Lee J, Ley R, Liu YX, Loftfield E, Lozupone C, Maher M, Marotz C, Martin BD, McDonald D, McIver LJ, Melnik AV, Metcalf JL, Morgan SC, Morton JT, Naimey AT, Navas-Molina JA, Nothias LF, Orchanian SB, Pearson T, Peoples SL, Petras D, Preuss ML, Pruesse E, Rasmussen LB, Rivers A, Robeson MS, Rosenthal P, Segata N, Shaffer M, Shiffer A, Sinha R, Song SJ, Spear JR, Swafford AD, Thompson LR, Torres PJ, Trinh P, Tripathi A, Turnbaugh PJ, Ul-Hasan S, van der Hooft JJJ, Vargas F, Vázquez-Baeza Y, Vogtmann E, von Hippel M, Walters W, Wan Y, Wang M, Warren J, Weber KC, Williamson CHD, Willis AD, Xu ZZ, Zaneveld JR, Zhang Y, Zhu Q, Knight R, and Caporaso JG. 2019. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology 37:852–857. https://doi.org/10.1038/s41587-019-0209-9

docs's People

Contributors

andrewsanchez, benkaehler, brett-van-tussler, cduvallet, chriskeefe, colinbrislawn, ebolyen, eldeveloper, fedarko, gregcaporaso, hmaru, jairideout, jakereps, keegan-evans, lisa55asil, lizgehret, maxvonhippel, mcmk3, mikerobeson, nbokulich, oddant1, patthehat033, q2d2, rnaer, shiffer1, thermokarst, turanoo, valentynbez, wasade, willwcb

docs's Issues

automate uploading build to S3

Built docs are currently hosted via GitHub Pages. We'll need to transition to an Amazon S3 bucket at some point. Having a script to automate uploading built docs to a versioned subdirectory in an S3 bucket would be useful. Thanks @thermokarst for the idea!
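
One way to approach the versioned-subdirectory part is to compute the S3 key for every built file before uploading anything. The sketch below is only an illustration: the function name `plan_s3_keys`, the build directory, and the version label are hypothetical, and the upload step itself is deliberately left out.

```python
import os


def plan_s3_keys(build_dir, version):
    """Map each file under build_dir to a key under the `version/` prefix.

    Returns a list of (local_path, s3_key) pairs sorted by key.
    """
    plan = []
    for root, _dirs, files in os.walk(build_dir):
        for name in files:
            local = os.path.join(root, name)
            relative = os.path.relpath(local, build_dir)
            # S3 keys always use forward slashes, whatever the local OS uses.
            key = "/".join([version] + relative.split(os.sep))
            plan.append((local, key))
    return sorted(plan, key=lambda pair: pair[1])
```

Each pair could then be handed to an S3 client, for example boto3's `upload_file(Filename, Bucket, Key)`. Keeping the key planning separate makes it possible to review what would be uploaded without credentials.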

Various thoughts/recommendations on documentation

Improvement Description
I have reviewed the documentation and tutorial with an eye toward figuring out how a completely novice user (e.g., new microbiology grad student without any bioinformatics or programming experience) would view the material. Most of the documentation is fantastic (esp. for alpha) and I love new features, such as the glossary, that improve usability over the qiime1 docs. I have various suggestions below, labeled by what I see as the importance: "[high]", "[low]", or "[enhancement]" (the latter meaning it would enhance usability but is not currently a hindrance to understanding the docs). Some I expect are already planned anyway, but I hope my comments may help hammer these out.

Importing Data [high]

This section needs more detail on the expected file names and other requirements for each type. For example, it is unclear that specific filenames are actually enforced for FeatureData[Sequence] (and I suspect for other semantic types as well). The following error is clear (to me), but it's best to avoid this via better documentation:
ValueError: Missing one or more files for EMPMultiplexedDirFmt: 'sequences.fastq.gz'

Directory of Methods [high] COMPLETED

Need a directory of methods, to fill the same niche as qiime1's script index. Most are currently covered in the tutorial, but not all, and this will only expand as qiime2 grows. One issue is that qiime2's methods are hidden within plugins, not free-standing commands, and hence just listing the plugins does not reveal all potential methods (and a few names are not immediately transparent or are jargony). Some sort of function description index, written in plain English for new users, rather than a list of method names, could be a useful way to approach this (and an enhancement above qiime1's script index, which was difficult to navigate and translate at times). Instead of listing the plugin or command, list the function. Each function will link to 1) the method entry on the plugin doc page or 2) a tutorial page for multi-step procedures (e.g., procrustes plots). For example, functions such as "demultiplex sequences", "build phylogeny", and "pick OTUs" could all be listed as functions.

Artifact Format Documentation [low]

qzv/qza formats are confusing, and as someone very familiar with qiime1 it took me some time to understand what these file formats are and why they are used. The rationale for these formats should be better documented, along with an explanation that these files can be unzipped to examine the contents. This rationale can link to the pages on semantic types and provenance tracking to discuss those topics. Some discussion appears here, but this should be more clearly documented here and elsewhere (perhaps on its own page that appears in the table of contents). Also make a note of this in the glossary.
As an aside (and I know it's too late to quip about this), I don't really like the choice of the term "artifact", because it has other meaning in biology, e.g., "sequencing artifact".
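
Because .qza and .qzv files can be unzipped, their contents can also be listed with any zip library. A minimal sketch using only the Python standard library follows; `list_artifact_members` is an illustrative name, not a QIIME 2 function, and nothing is assumed here about the internal layout of the archive.

```python
import zipfile


def list_artifact_members(path):
    """Return the sorted member names stored in a .qza/.qzv (zip) file."""
    with zipfile.ZipFile(path) as archive:
        return sorted(archive.namelist())
```

Calling `list_artifact_members("table.qza")` on an artifact (the filename is just an example) returns the paths stored inside it, which is often enough to show new users that an artifact is data plus metadata rather than an opaque binary.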

Taxonomy Format [low]

A discussion of the taxonomy format could be useful. Terms like "level 2" are used in the docs but are not immediately apparent to outsiders, nor will a google search be much help. This may be appropriate to include within a file format page (see recommendation below).

Doc Version Archive [enhancement] COMPLETED

The "ported wiki documentation" is very useful, and I recommend continuing to build this as an archive of release docs if possible, rather than removing these pages. One frustration with the qiime1 site was that docs only covered the release version, and if working with an earlier version of qiime or reviewing a list of commands/files generated using an earlier version of qiime, the older docs no longer existed. As qiime2 grows, may I recommend keeping the "ported wiki documentation" as a table of contents (TOC) at the bottom of the current release docs TOC, which will link to TOCs for archived doc versions.

Glossary: add other glossaries? [enhancement]

I LOVE the glossary, as it defines some of the lingo-y words that are new to qiime2. This should be on the reading list of everyone starting with qiime2, to whom "action" and "method" are otherwise more general terms, and "artifact" is not entirely intuitive. I wonder whether it would be useful to include separate glossaries on more general microbiome terminology, and on file types. I recommend separate, because this will keep the technical glossary pure and simple.

Microbiome Terminology: Much of this goes outside of the jurisdiction of qiime, but could be very useful to new users (and would give the developers control over the terminology). After all, users come from all backgrounds and qiime may be the first exposure to any kind of bioinformatics software, microbiome/ecology concepts, or all of the above for many users. For many of these terms, great explanations exist elsewhere on the web (though not necessarily with a simple google search), and a short sentence and link will suffice (and link to citation if appropriate). Some useful terms: distance matrix, OTU, feature table, demultiplex, barcode, index (see barcode), metadata, phiX, chimera, biom, metric (e.g., alpha diversity), (include alpha/beta diversity metrics in glossary, short sentences such as shown here and a link ideally to the original citation would suffice), alpha diversity, beta diversity, discrete (metadata), continuous (metadata), ordination, PCoA, richness

File Formats/Types: In many ways, this should be similar to qiime1's file types page. A similar resource does not yet exist in qiime2. This is in part to describe file formats that are used in qiime2, and in part to describe how to input specific file types into qiime2 artifacts (yeah, yeah, could be more appropriately described in importing data but if that doc expands to include this you can link to the entries for each file format in that doc from this glossary). Some formats/terms to include: fasta, fastq, gz, qza, qzv, mapping file, biom, OTU table, feature table

Hope this all helps. I can elaborate on details / brainstorm more if prompted.

References
Ported from original issue.

should commands use explicit .qza/.qzv extension for output files?

@thermokarst and I were chatting about this today. Commands in the docs are a little magical because output artifacts/visualizations aren't named with a .qza/.qzv extension. When only specifying the basename of the output file (sans extension), it isn't clear how the .qza/.qzv extension factors into output files. It also is hard to determine if the output filename is an artifact or visualization b/c the extension isn't in the output name. Using the basename only also makes it ambiguous whether the output is being saved as a directory or a file without an extension.

I had to explain this behavior several times during the Phoenix workshop, and it seemed to confuse users because .qza/.qzv files were a new concept that was just introduced, and the commands don't explicitly reflect how those extensions relate to output files.

Can we use explicit file extensions in the docs, and note somewhere that if the extension is omitted, it will automatically be appended?
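
The "appended if omitted" rule is simple enough to state as code in such a note. The function below is a hypothetical illustration of the behavior described above, not QIIME 2's actual implementation:

```python
def resolve_output_path(path, extension):
    """Append `extension` (e.g. ".qza") unless `path` already ends with it."""
    if path.endswith(extension):
        return path
    return path + extension
```

Under this rule, `resolve_output_path("table", ".qza")` and `resolve_output_path("table.qza", ".qza")` both give `"table.qza"`, which is why writing the extension explicitly in the docs would not change what the commands produce.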

can we import unaligned sequences in feature-classifier tutorial

In the feature-classifier.rst document we import aligned reads:

    qiime tools import \
      --type FeatureData[Sequence] \
      --input-path aligned_85_otu_sequences.fasta.gz \
      --output-path 85_otus.qza

and then run extract-reads where we extract based on the primers:

    qiime feature-classifier extract-reads \
      --i-sequences 85_otus.qza \
      --p-f-primer GTGCCAGCMGCCGCGGTAA \
      --p-r-primer GGACTACHVGGGTWTCTAAT \
      --p-read-length 100 \
      --o-reads ref-seqs.qza

@BenKaehler, can we import unaligned sequences here? It looks like we just strip the gap characters, and unaligned reads are easier for users to provide in general (and also work directly if we want to train a classifier without extracting reads). So importing unaligned reads is preferable if it works.
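
For users who only have an aligned file, removing the gaps ahead of import is a small preprocessing step. The sketch below assumes that `-` and `.` are the only gap characters and that the input is plain (uncompressed) FASTA text; `strip_gaps` is an illustrative helper, not part of QIIME 2.

```python
def strip_gaps(fasta_lines, gap_chars="-."):
    """Yield FASTA lines with gap characters removed from sequence lines."""
    table = str.maketrans("", "", gap_chars)
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            yield line  # header lines are left untouched
        else:
            yield line.translate(table)
```

Headers are passed through unchanged so that identifiers containing `-` or `.` are not altered.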

MVP data appear to be phred+64

The metadata.yaml following the import of the raw sequences, and the metadata.yaml following demux, indicate the data are phred 33. However, that does not appear to be accurate as the character set used includes characters defined outside of phred 33 encoding (e.g., "["):

$ funzip sequences.fastq.gz | head
@HWI-EAS440_0386:1:23:17547:1423#0/1
TACGNAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGAGCGTAGATGGATGTTTAAGTCAGTTGTGAAAGTTTGCGGCTCAACCGTAAAATTGCAGTTGATACTGGATATCTTGAGTGCAGTTGAGGCAGGGGGGGATTGGTGTG
+
hhhdHddddddddfehhfhhhghggfhhhfhhgggfhhgfgdfcfhehfdgfhggfggfggffgddfgdffdgdaagaaddcbdccc]a^ad__a]_____ba_`a`__^__\]^OWZR\Z\\WYTZ_U^BBBBBBBBBBBBBBBBBBBBBB

I checked with @gregcaporaso and he indicated that it made sense given the age of the data.

As of this writing, I do not believe the inaccurate metadata.yaml has any functional impact, because QIIME 2 does not explicitly inspect quality scores in this tutorial.
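
A check like the one done by eye above can be sketched in a few lines. The thresholds are a common heuristic and an assumption of this example, not something QIIME 2 is known to apply: characters below `;` (ASCII 59) are not used by the +64 encodings, and characters above `J` (ASCII 74) fall outside the usual Phred+33 range.

```python
def guess_phred_offset(quality_lines):
    """Guess the Phred offset (33 or 64) from FASTQ quality strings.

    Returns None when the observed characters fit both encodings.
    """
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        return None
    if min(codes) < 59:   # below ';' is not used by the +64 encodings
        return 33
    if max(codes) > 74:   # above 'J' is outside the usual Phred+33 range
        return 64
    return None
```

Quality strings dominated by lowercase letters, like the one shown above, come out as 64 under this heuristic. A short read whose characters all lie between `;` and `J` is ambiguous, so more reads should be sampled before drawing a conclusion.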

resources.qiime2.org

Create page(s) for resources similar to QIIME 1's resources.qiime.org. This will include:

  • protocols (#3)
  • virtual machine image download links/AMIs/etc
  • reference databases
  • useful datasets (link to mockrobiota, link to Qiita, ...)

support incremental builds with `make html` (commandblock directive)

Current Behavior
make html will perform a make clean first, effectively disabling incremental builds. The current commandblock directive is not safe to use with incremental builds because data that has already been generated from a previous build will not be regenerated, and data that should be cleaned/removed in the current (incremental) run will remain as orphaned files in the build.

Proposed Behavior
Incremental builds should be possible if a hook can be added that will delete the current file's corresponding data assets directory (under source/assets/) before any commands are executed in the file.
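
The cleanup step of such a hook might look like the sketch below. It assumes each source file's assets live in a directory named after the document under the assets root; that layout, and the name `remove_stale_assets`, are hypothetical. How the function would be wired into the commandblock directive is not shown.

```python
import shutil
from pathlib import Path


def remove_stale_assets(assets_root, docname):
    """Delete the assets directory for one source file, if it exists.

    Returns True when a directory was removed, False otherwise.
    """
    target = Path(assets_root) / docname
    if target.is_dir():
        shutil.rmtree(target)
        return True
    return False
```

Running this before any commands in a file are executed would ensure that data from a previous build is regenerated and that no orphaned files survive into the incremental build.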

References
Ported from original issue.

Sort `command-block` collected qza/qzv files

Current Behavior
It's a little confusing when the itemized list of outputs for a multi-command block isn't presented in the same order in which the commands are executed.
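
One possible ordering rule is to sort the collected files by where their names first appear in the block's commands. The sketch below is an assumption about how this could be done, not the directive's current code; `order_outputs` and its arguments are hypothetical.

```python
def order_outputs(commands, outputs):
    """Sort output filenames by where they first appear in the commands.

    Outputs never mentioned in any command keep their relative order
    and are placed last.
    """
    text = "\n".join(commands)

    def first_mention(name):
        index = text.find(name)
        return index if index >= 0 else len(text)

    # sorted() is stable, so ties (including unmentioned files) keep their order.
    return sorted(outputs, key=first_mention)
```

This uses plain substring matching, so a name that is a suffix of another filename (for instance `table.qza` inside `rarefied-table.qza`) could be matched too early. A real implementation would probably want to match whole command-line tokens instead.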

RawSequences is not described in semantic types

In the description of semantic types (https://docs.qiime2.org/2.0.6/semantic-types/), RawSequences is missing. And, if there is any documentation on it (e.g., required format for creating an artifact from sequence data), Google does not know about it. All I was able to find was the tutorial that made use of it, but never really described it. I can read the source code and figure it out, but it would be much more efficient if it was well-documented somewhere.

I am a CS person moving into bioinformatics, if this background helps explain my thought processes.

Client-side URL "upgrade"

I am wondering if we should do something like the Django docs, where we provide a "version selector", and also some way to indicate that an old doc is "stale" or "outdated" (see the red bar at the top of the page).

`make html` fails on macOS 10.12 due to conda-installed curl

The R packages installed as part of bootstrapping a QIIME 2 environment apparently have a curl requirement. It looks like the recipe uses my system-installed curl, but conda rewrites the certs path. Running make html with the -s flag removed from a curl command gives:

Extension error:
Command 'curl https://codeload.github.com/qiime2/q2studio/zip/0.0.6 -o q2studio-0.0.6.zip' exited with non-zero return code 77.

stdout:



stderr:

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  curl: (77) error setting certificate verify locations:
  CAfile: /Users/matthew/miniconda3/envs/q206/etc/pki/tls/certs/cacert.pem
  CApath: none

We have had a few curl-related issues pop up lately; maybe it is worth looking at wget instead?

cURL commands for retrieval of MVP tutorial data are cURL version dependent

It appears that different versions of cURL handle redirects differently. wget works, but it does not ship with OSX by default. Versions of cURL observed to have the problem:

Linux:

$ curl --version
curl 7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.15.3 zlib/1.2.3 libidn/1.18 libssh2/1.4.2
Protocols: tftp ftp telnet dict ldap ldaps http file https ftps scp sftp 
Features: GSS-Negotiate IDN IPv6 Largefile NTLM SSL libz

OSX (Sierra):

$ curl --version
curl 7.49.1 (x86_64-apple-darwin16.0) libcurl/7.49.1 SecureTransport zlib/1.2.8
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp smb smbs smtp smtps telnet tftp 
Features: AsynchDNS IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz UnixSockets

Beef up VBox install doc

Pull in info from the forum. A lot of work went into providing detailed answers there, so we should be able to recycle a fair amount of it here.

Notes from office hours:

  • need to address BIOS settings (pull text from QIIME 1)
  • VBox version (minimum? exact?)

Bootstrap container width

Maybe this can be bumped up 100px-200px? It would be nice to see a rendering or two at either extreme.

add `make html` to travis build

Travis now supports a directive for long-running commands, so it should be possible to hook up the full doc build (make html) to Travis. This will add significant time to the Travis build (~20-30 mins) but it seems worth it to catch any build errors in the docs before merge.

include example of BIOM v2.1 import

The importing data tutorial shows how to import a .biom file in BIOM v1.0 format. It'd be helpful to include an example of importing a BIOM v2.1 file, which is the more common case (and QIIME doesn't autodetect v2.1). This question came up on the forum here.

add tutorial on exporting data

This could be a separate tutorial from the importing tutorial, or we could modify the importing data tutorial to include exporting.

FMT input artifacts have no provenance

Current Behavior
The fmt-tutorial-demux-*.qza artifacts have no provenance as they were generated with 2.0.5 (I think).

Comments
This is not urgent, but should be addressed when we ultimately re-write this tutorial to include paired tests.
