azimuth-references's Introduction

azimuth-references

Repo with workflows for generating azimuth reference objects.

Overview

This repository contains the Dockerfile and Snakemake workflows used to generate the Azimuth references that are hosted online. Each reference directory contains a Snakefile and associated scripts that regenerate that reference (and its associated demo data) from publicly available download links for the underlying data. To run:

snakemake --use-singularity --cores 1 all
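
As a minimal sketch of how the command above might be wrapped, the helper below checks that a reference directory contains a Snakefile, previews the planned jobs with Snakemake's `-n` (dry-run) flag, and only then runs the build. The function name `run_reference` and the `SNAKEMAKE` override variable are hypothetical and not part of this repository; the directory name `human_fetus` is taken from the issue log further down this page.

```shell
# Hypothetical helper (not part of this repo): build one reference
# from inside its own directory.
# Usage: run_reference <reference_dir> [cores]
run_reference() {
    local ref_dir="$1"
    local cores="${2:-1}"
    # Assumption: SNAKEMAKE may be set to point at a different executable.
    local snakemake_cmd="${SNAKEMAKE:-snakemake}"

    # Each reference directory is expected to hold its own Snakefile.
    if [ ! -f "$ref_dir/Snakefile" ]; then
        echo "no Snakefile found in $ref_dir" >&2
        return 1
    fi

    # Use a subshell so the caller's working directory is left unchanged.
    (
        cd "$ref_dir" || exit 1
        # First a dry run (-n) that only lists the jobs, then the real build.
        "$snakemake_cmd" --use-singularity --cores "$cores" -n all &&
            "$snakemake_cmd" --use-singularity --cores "$cores" all
    )
}
```

For example, `run_reference human_fetus 8` would attempt the fetus reference with eight cores. Running the dry run first is a design choice of this sketch, meant to surface workflow errors before any large download starts.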

Reference format

The Azimuth package provides the AzimuthReference function to facilitate converting existing Seurat objects into the specific format expected by Azimuth. Details on the required reference format can be viewed here. For examples starting with a Seurat object, see the export.R scripts in the workflows in this repo (e.g. human pancreas).

azimuth-references's People

Contributors

andrewwbutler, austinhartman, gesmira, jaisonj708, rsatija, timoast

azimuth-references's Issues

Fetus reference fails to build due to a missing file

Snakemake fails on rule download_asp because this file on AWS is missing: 'https://md-datasets-cache-zipfiles-prod.s3.eu-west-1.amazonaws.com/mbvhhf8m62-2.zip'

Below is the complete log.

The flag 'directory' used in rule all is only valid for outputs, not inputs.
Building DAG of jobs...
Pulling singularity image docker://satijalab/azimuth-references:vitessce.
Pulling singularity image docker://satijalab/azimuth-references:latest.
Using shell: /usr/bin/bash
Provided cores: 8
Rules claiming more threads will be scaled down.
Job stats:
job                               count    min threads    max threads
------------------------------  -------  -------------  -------------
all                                   1              1              1
download_BBI_individual_organs       14              1              1
download_BBI_metadata                 1              1              1
download_BBI_subsampled               1              1              1
download_asp                          1              1              1
download_enge                         1              1              1
download_lind                         1              1              1
export_zarr                           1              1              1
make_BBI_query                        1              1              1
make_BBI_reference                    1              1              1
preprocess_BBI_reference              1              1              1
setup_asp                             1              1              1
setup_enge                            1              1              1
setup_lind                            1              1              1
total                                27              1              1

Select jobs to execute...

[Fri Mar 17 18:53:04 2023]
rule download_BBI_individual_organs:
    output: data/Stomach_gene_count.RDS
    jobid: 18
    reason: Missing output files: data/Stomach_gene_count.RDS
    wildcards: sample=Stomach
    resources: tmpdir=/tmp

Activating singularity image /prj/refs/azimuth-references/human_fetus/.snakemake/singularity/e4faa2430c925e8c50307f5d2609b777.simg

[Fri Mar 17 18:53:04 2023]
rule download_BBI_individual_organs:
    output: data/Liver_gene_count.RDS
    jobid: 11
    reason: Missing output files: data/Liver_gene_count.RDS
    wildcards: sample=Liver
    resources: tmpdir=/tmp

Activating singularity image /prj/refs/azimuth-references/human_fetus/.snakemake/singularity/e4faa2430c925e8c50307f5d2609b777.simg

[Fri Mar 17 18:53:04 2023]
rule download_BBI_subsampled:
    output: data/gene_count_sampled.RDS
    jobid: 3
    reason: Missing output files: data/gene_count_sampled.RDS
    resources: tmpdir=/tmp

Activating singularity image /prj/refs/azimuth-references/human_fetus/.snakemake/singularity/e4faa2430c925e8c50307f5d2609b777.simg

[Fri Mar 17 18:53:04 2023]
rule download_BBI_individual_organs:
    output: data/Thymus_gene_count.RDS
    jobid: 19
    reason: Missing output files: data/Thymus_gene_count.RDS
    wildcards: sample=Thymus
    resources: tmpdir=/tmp

Activating singularity image /prj/refs/azimuth-references/human_fetus/.snakemake/singularity/e4faa2430c925e8c50307f5d2609b777.simg

[Fri Mar 17 18:53:04 2023]
rule download_BBI_metadata:
    output: data/df_cell.RDS, data/df_gene.RDS
    jobid: 4
    reason: Missing output files: data/df_gene.RDS, data/df_cell.RDS
    resources: tmpdir=/tmp

Activating singularity image /prj/refs/azimuth-references/human_fetus/.snakemake/singularity/e4faa2430c925e8c50307f5d2609b777.simg

[Fri Mar 17 18:53:04 2023]
rule download_lind:
    output: data/lind_data/counts.tsv
    jobid: 23
    reason: Missing output files: data/lind_data/counts.tsv
    resources: tmpdir=/tmp

Activating singularity image /prj/refs/azimuth-references/human_fetus/.snakemake/singularity/e4faa2430c925e8c50307f5d2609b777.simg

[Fri Mar 17 18:53:04 2023]
rule download_enge:
    output: data/enge_data, data/enge_data/enge.tar
    jobid: 25
    reason: Missing output files: data/enge_data/enge.tar
    resources: tmpdir=/tmp

Activating singularity image /prj/refs/azimuth-references/human_fetus/.snakemake/singularity/e4faa2430c925e8c50307f5d2609b777.simg

[Fri Mar 17 18:53:04 2023]
rule download_BBI_individual_organs:
    output: data/Heart_gene_count.RDS
    jobid: 10
    reason: Missing output files: data/Heart_gene_count.RDS
    wildcards: sample=Heart
    resources: tmpdir=/tmp

Activating singularity image /prj/refs/azimuth-references/human_fetus/.snakemake/singularity/e4faa2430c925e8c50307f5d2609b777.simg
[Fri Mar 17 18:53:06 2023]
Finished job 18.
1 of 27 steps (4%) done
Select jobs to execute...

[Fri Mar 17 18:53:06 2023]
rule download_BBI_individual_organs:
    output: data/Lung_gene_count.RDS
    jobid: 12
    reason: Missing output files: data/Lung_gene_count.RDS
    wildcards: sample=Lung
    resources: tmpdir=/tmp

Activating singularity image /prj/refs/azimuth-references/human_fetus/.snakemake/singularity/e4faa2430c925e8c50307f5d2609b777.simg
[Fri Mar 17 18:53:08 2023]
Finished job 19.
2 of 27 steps (7%) done
Select jobs to execute...

[Fri Mar 17 18:53:08 2023]
rule download_BBI_individual_organs:
    output: data/Pancreas_gene_count.RDS
    jobid: 14
    reason: Missing output files: data/Pancreas_gene_count.RDS
    wildcards: sample=Pancreas
    resources: tmpdir=/tmp

Activating singularity image /prj/refs/azimuth-references/human_fetus/.snakemake/singularity/e4faa2430c925e8c50307f5d2609b777.simg
[Fri Mar 17 18:53:11 2023]
Finished job 4.
3 of 27 steps (11%) done
Select jobs to execute...

[Fri Mar 17 18:53:11 2023]
rule download_BBI_individual_organs:
    output: data/Cerebrum_gene_count.RDS
    jobid: 8
    reason: Missing output files: data/Cerebrum_gene_count.RDS
    wildcards: sample=Cerebrum
    resources: tmpdir=/tmp

Activating singularity image /prj/refs/azimuth-references/human_fetus/.snakemake/singularity/e4faa2430c925e8c50307f5d2609b777.simg
[Fri Mar 17 18:53:21 2023]
Finished job 12.
4 of 27 steps (15%) done
Select jobs to execute...

[Fri Mar 17 18:53:21 2023]
rule download_BBI_individual_organs:
    output: data/Adrenal_gene_count.RDS
    jobid: 6
    reason: Missing output files: data/Adrenal_gene_count.RDS
    wildcards: sample=Adrenal
    resources: tmpdir=/tmp

Activating singularity image /prj/refs/azimuth-references/human_fetus/.snakemake/singularity/e4faa2430c925e8c50307f5d2609b777.simg
[Fri Mar 17 18:53:30 2023]
Finished job 11.
5 of 27 steps (19%) done
Select jobs to execute...

[Fri Mar 17 18:53:30 2023]
rule download_asp:
    output: data/asp_data/counts_filtered.tsv, data/asp_data/md_filtered.tsv
    jobid: 21
    reason: Missing output files: data/asp_data/md_filtered.tsv, data/asp_data/counts_filtered.tsv
    resources: tmpdir=/tmp

Activating singularity image /prj/refs/azimuth-references/human_fetus/.snakemake/singularity/e4faa2430c925e8c50307f5d2609b777.simg
[Fri Mar 17 18:53:31 2023]
Error in rule download_asp:
    jobid: 21
    output: data/asp_data/counts_filtered.tsv, data/asp_data/md_filtered.tsv
    shell:
        
        mkdir -p data/asp_data
        cd data/asp_data
        wget 'https://md-datasets-cache-zipfiles-prod.s3.eu-west-1.amazonaws.com/mbvhhf8m62-2.zip'
        unzip mbvhhf8m62-2.zip
        unzip Filtered/'Developmental_heart_filtered_scRNA-seq_and_meta_data.zip'
        gunzip share_files/*
        mv share_files/all_cells_count_matrix_filtered.tsv counts_filtered.tsv
        mv share_files/all_cells_meta_data_filtered.tsv md_filtered.tsv
        echo "ASP data downloaded on: $(date)" > ../../logs/download_asp.log
        
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

[Fri Mar 17 18:53:34 2023]
Finished job 10.
6 of 27 steps (22%) done
[Fri Mar 17 18:53:36 2023]
Finished job 3.
7 of 27 steps (26%) done
[Fri Mar 17 18:53:37 2023]
Finished job 14.
8 of 27 steps (30%) done
[Fri Mar 17 18:53:52 2023]
Finished job 6.
9 of 27 steps (33%) done
[Fri Mar 17 18:55:04 2023]
Finished job 8.
10 of 27 steps (37%) done
[Fri Mar 17 18:55:09 2023]
Finished job 23.
11 of 27 steps (41%) done
[Fri Mar 17 18:55:59 2023]
Finished job 25.
12 of 27 steps (44%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-03-17T184850.349820.snakemake.log
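
A log like the one above can be triaged with a couple of small shell functions. This is only a sketch: `failed_rules` and `url_reachable` are hypothetical names, the first relies on the `Error in rule <name>:` line format seen in this log, and the second needs network access and uses `wget --spider` to test a URL without downloading the file.

```shell
# Hypothetical helpers for triaging a Snakemake log such as the one above.

# Print the name of every rule reported as failed in the given log file.
# Relies on the "Error in rule <name>:" line format shown in this log.
failed_rules() {
    sed -n 's/^Error in rule \(.*\):$/\1/p' "$1"
}

# Return success if the URL still exists, without downloading the file.
# Needs network access; --spider makes wget check the URL only.
url_reachable() {
    wget --spider --quiet "$1"
}
```

For example, `failed_rules .snakemake/log/2023-03-17T184850.349820.snakemake.log` should print `download_asp` for the run above, and passing the zip URL from that rule to `url_reachable` would show whether the file is still missing.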

How to generate reference with my own single-cell RNA-seq datasets

I used Seurat for my image-based spatial data from NanoString, and it works great. Instead of performing an unsupervised analysis, I followed the instructions to map the profiles to the Azimuth Human Pancreas reference with:

    pancreas.object <- RunAzimuth(pancreas.object, reference = "pancreasref")

Thanks to @AustinHartman for the help.

Now I would like to do the same thing with my own single-cell RNA-seq data as the reference. To my understanding, I can use "AzimuthReference" to generate the reference. However, I could not find instructions, a protocol, or an example showing how. I am very new to Seurat; could you please give me some guidance? Thanks a lot!

Running the Seurat+Azimuth workflow on an off-line cluster (due to European Life Sciences GDPR laws)

Dear Andrew,

Thank you very much for providing these "bleeding edge" examples of using Azimuth to annotate real-life datasets (like in https://github.com/satijalab/azimuth-meta-analysis).

I need to annotate human PBMC cells from COVID patients. I was hoping to replicate your lung annotation and then modify it to work on PBMCs. (By the way, can you suggest a good example of how to correctly annotate these "NEW COVID-specific" cell types?)

Due to European Life Science GDPR laws, I must do this in a very restrictive off-line HPC environment, where the only way to install anything is to upload a Singularity image. I was able to build one from your docker://satijalab/seurat:latest, in which I additionally installed Azimuth as an R package, added snakemake (to run your workflows), and added rstudio-server (to have a decent IDE).

Unfortunately, there are many versions of your docker://satijalab/azimuth-references image, and your example (https://github.com/satijalab/azimuth-meta-analysis) uses "azimuth-references:vitessce" rather than "azimuth-references:latest". Judging by the Docker definition file, docker://satijalab/azimuth-references seems to be just docker://satijalab/seurat plus a few more layers.
Is that correct?
Is it safe to substitute "satijalab/azimuth-references" where the workflow asks for the "satijalab/seurat" image?
If not, is there a single "definitely latest" image with Seurat + Azimuth? A single image would be so much easier to convert to Singularity and use on an off-line cluster...

Thank you in advance,
Daniil

Sample information for BM map

Hi,

Thank you for making this data accessible!
I downloaded the reference, but I could not see the sample information for the cells.
I was wondering whether you are going to update the annotations at some point?

Best,

Missing data files in human_bonemarrow (bmmc-reference)

Hi,

I attempted to recreate the bmmc-reference and noticed that the current repo does not contain the following files, nor the code that generates them:

  • data/hca/manifest.tsv, which I take to be the HCA manifest file. Can you confirm?
  • data/annotations.csv.gz: The Snakefile is the only place in this repo where this file is mentioned. What exactly does this file contain, and how is it generated?

I would appreciate it if you can clarify these issues.

Question about generating Azimuth reference files from new lung tumor cell atlas

Hi!

I would like to generate Azimuth reference files for this lung tumor cell atlas:

https://datasets.cellxgene.cziscience.com/6e5e887d-96f7-40af-908c-9b4fc5057ef9.rds

Is it possible to use this Seurat v5 object for that or do I need some other files? I looked at the code that you used for generating the "human lung cell atlas v2" reference files and I am not sure if I can just use SCTransform on the atlas Seurat object and go from there?

Any help would be greatly appreciated. Thank you!

Understanding reference building approach

Hi!

Really cool tool and web portal! I was trying to understand how you have been building references for Azimuth. A lot of what you did makes sense to me, but I stumbled across a line in the human BMMC reference script indicating an additional normalization step after individual batches had already been scTransformed.

See:

integrated <- NormalizeData(integrated)

I was wondering why the second normalization step is needed here?

Thanks a lot!

Error in eval(predvars, data, env) : object 'log_umi' not found

When I run this step:

    query <- SCTransform(
      object = query,
      assay = "RNA",
      new.assay.name = "refAssay",
      residual.features = rownames(x = reference$map),
      reference.SCT.model = reference$map[["refAssay"]]@SCTModel.list$refmodel,
      method = 'glmGamPoi',
      ncells = 2000,
      n_genes = 2000,
      do.correct.umi = FALSE,
      do.scale = FALSE,
      do.center = TRUE
    )

the following error occurs:

    Error in eval(predvars, data, env) : object 'log_umi' not found

What is the reason? I need help!
