Code Monkey home page Code Monkey logo

scrnaseq_medicago_a17_sunn4's Introduction

Scripts to reproduce the scRNA-seq analysis of Medicago truncatula roots

About the M. truncatula scRNA-seq analysis

This repository contains the source code and instructions to reproduce the results of the manuscript "The Single-Cell Transcriptome Program of Nodule Development Cellular Lineages in Medicago truncatula" published by Pereira et al., 2023.

Generating counting matrices from the raw data

The raw data can be downloaded from the NCBI’s Gene Expression Omnibus, accession number GSE22453980.

The data then need to be processed using Cell Ranger v. 7.0.1. The reference genome can be downloaded from https://medicago.toulouse.inra.fr/MtrunA17r5.0-ANR/, release 1.9.

Cell ranger was used with standard parameters for each sample individually.

Downstream analysis

Installing the necessary software.

All analyses were conducted using R v.4.1.2. A list of necessary R packages, and the commands to install them, is present in the script installing_necessary_packages.R.

To run the script and perform the packages installation, open a terminal and execute the script below:

Rscript installing_necessary_packages.R

Necessary inputs for the analysis

  1. Single-cell RNA-seq data for all samples were generated by Cell Ranger, as mentioned above.

Other necessary inputs are available at FigShare (10.6084/m9.figshare.24461014). On FigShare, one can also find the RDS files containing the clustered datasets that will be obteined after executing the code below.

Before downstream analasis, the data needs to be organized as shown below.

> tree data/
data
├── A17_0h
│   ├── molecule_info.h5
│   └── raw_feature_bc_matrix
│       ├── barcodes.tsv.gz
│       ├── features.tsv.gz
│       └── matrix.mtx.gz
├── A17_24h
│   ├── molecule_info.h5
│   └── raw_feature_bc_matrix
│       ├── barcodes.tsv.gz
│       ├── features.tsv.gz
│       └── matrix.mtx.gz
├── A17_48h
│   ├── molecule_info.h5
│   └── raw_feature_bc_matrix
│       ├── barcodes.tsv.gz
│       ├── features.tsv.gz
│       └── matrix.mtx.gz
├── A17_96h
│   ├── molecule_info.h5
│   └── raw_feature_bc_matrix
│       ├── barcodes.tsv.gz
│       ├── features.tsv.gz
│       └── matrix.mtx.gz
├── Sunn_0h
│   ├── molecule_info.h5
│   └── raw_feature_bc_matrix
│       ├── barcodes.tsv.gz
│       ├── features.tsv.gz
│       └── matrix.mtx.gz
├── Sunn_24h
│   ├── molecule_info.h5
│   └── raw_feature_bc_matrix
│       ├── barcodes.tsv.gz
│       ├── features.tsv.gz
│       └── matrix.mtx.gz
├── Sunn_48h
│   ├── molecule_info.h5
│   └── raw_feature_bc_matrix
│       ├── barcodes.tsv.gz
│       ├── features.tsv.gz
│       └── matrix.mtx.gz
└── Sunn_96h
    ├── molecule_info.h5
    └── raw_feature_bc_matrix
        ├── barcodes.tsv.gz
        ├── features.tsv.gz
        └── matrix.mtx.gz

Quality control

Rscript remove_empty_cells.R
Rscript QC_and_doublets_identification.R

Analyse of the genotypes independently

Rscript combining_data_and_clustering_A17.R --cores 8 --UMI 400 --out rds_files/Medicago_ONLY_A17_integ_cds_clustered.rds
Rscript combining_data_and_clustering_sunn4.R --cores 8 --UMI 400 --out rds_files/Medicago_ONLY_SUNN4_integ_cds_clustered.rds
Rscript comparing_expression_in_separeted_datasets.R

Analysis of the combined dataset

Generating the combined dataset and clustering

Rscript combining_all_samples_and_clustering.R

Estimating the number of cells per cluster and timepoint

Rscript number_of_cells_per_cluster_over_time.R --rds rds_files/batched_integrated_clustered_complete_dataset.rds --prefix number_of_cells_per_cluster/whole_dataset

Cluster annotation and cell type identification

Rscript add_annotation_to_markers.R --dir top_1000_markers_per_cluster/by_specificity/ --log_name add_annotation_to_markers_wholedataset_top_1000.out
Rscript add_annotation_to_markers.R --dir top_1000_markers_per_cluster/by_specificity/top100 --log_name add_annotation_to_markers_wholedataset_top_100.out

Rscript investigating_genes_using_other_sources_2.0.R --rds rds_files/batched_integrated_clustered_complete_dataset.rds --out_images images/annotation_using_other_sources_images --markers_dir top_1000_markers_per_cluster/by_specificity/top100 --color_scheme VIRIDIS --show_gene_names FALSE --image_format png

Rscript investigating_genes_using_other_sources_2.0.R --rds rds_files/batched_integrated_clustered_complete_dataset.rds --out_images images/annotation_using_other_sources_images_svg --markers_dir top_1000_markers_per_cluster/by_specificity/top100 --color_scheme VIRIDIS --show_gene_names FALSE --image_format svg

Rscript Dotplots_using_Seurat_whole_data.R
mkdir -p images/expression_profiles_top100/
i=1
while [ $i -le 24 ]
do

Rscript visual_inspection_of_markers.R --rds rds_files/batched_integrated_clustered_complete_dataset.rds --markers top_1000_markers_per_cluster/by_specificity/top100/list_of_top_100_for_cluster_${i}_specificity.tsv --out images/expression_profiles_top100/top_100_for_cluster_${i} --common=FALSE

i=$(($i+1))
done

Visualizing the expression profile of key RNS genes reported from other publications in the whole dataset.

Rscript visual_inspection_of_markers.R --rds rds_files/batched_integrated_clustered_complete_dataset.rds --markers selected_markers/figure_2_roy_et_al_2020.csv --out images/annotation_using_markers/figure_2_roy_et_al_2020 --common=TRUE
Rscript visual_inspection_of_markers.R --rds rds_files/batched_integrated_clustered_complete_dataset.rds --markers selected_markers/markers_Schiessl_2019_table_s2.csv --out images/annotation_using_markers/markers_Schiessl_2019_table_s2 --common=TRUE
Rscript visual_inspection_of_markers.R --rds rds_files/batched_integrated_clustered_complete_dataset.rds --markers selected_markers/RNS_genes_curated_list.tsv --out images/annotation_using_markers/RNS_genes_curated_list --common=TRUE

Generates heatmap of key gene markers

Rscript generates_the_heatmap_containing_gene_markers_for_figure1.R
Rscript investigating_genes_using_other_sources_2.0.R --rds rds_files/batched_integrated_clustered_complete_dataset.rds --out_images images/annotation_using_other_sources_images --markers_dir top_1000_markers_per_cluster/by_specificity/top100 --color_scheme VIRIDIS --show_gene_names FALSE --image_format png

high-dimension weighted gene co-expression network analysis (hdWGCNA)

Note, this step demans a large amoung of RAM memory. We recommend using a cluster with at least 150GB of RAM.

Rscript Create_Seurat_object_from_monocle3_whole_dataset.R
Rscript hdWGCNA_whole_dataset.R
Rscript add_annotation_to_markers_hdWGCNA.R
Rscript subseting_the_network.R

RNA velocity

Note, this step demans a large amoung of RAM memory. We recommend using a cluster with at least 150GB of RAM.

sbatch velocyto.sh # run it for each sample
python combining_loom_files_with_loompy.py
Rscript seurat_to_anndata_after_velocyto_whole_dataset.R
python sc_velo_seurat_obj_whole_dataset_dynamical_model.py

Reclustering analysis of the pericycle cells

Rscript reclustering_pericycle.R 
Rscript add_annotation_to_markers.R --dir RECLUSTERING/PERICYCLE/top_1000_markers_per_cluster/by_specificity --log_name add_annotation_to_markers_pericycle_top_1000.out
Rscript Create_Seurat_object_from_monocle3_pericycle.R
Rscript Dotplots_using_Seurat_pericycle.R
Rscript selecting_stele_clusters.R
Rscript Create_Seurat_object_from_all_stele_clusters.R
Rscript Dotplots_using_Seurat_stele.R

Reclustering analysis of the root hair and IT

Rscript reclustering_root_hair_IT.R
Rscript add_annotation_to_markers.R --dir RECLUSTERING/Epidermis_roothair/top_1000_markers_per_cluster/by_specificity --log_name add_annotation_to_markers_roothair_IT_top_1000.out
Rscript Create_Seurat_object_from_monocle3_epidermis.R
Rscript Dotplots_using_Seurat_epidermis.R

Reclustering analysis of cortex and nodule cells

Rscript reclustering_cortex_nodule.R
Rscript add_annotation_to_markers.R --dir RECLUSTERING/CORTEX_NOD/top_1000_markers_per_cluster/by_specificity --log_name add_annotation_to_markers_cortex_nodule_top_1000.out
Rscript add_annotation_to_markers.R --dir RECLUSTERING/CORTEX_NOD/top_1000_markers_per_cluster/by_specificity/top_100_markers_per_cluster/by_specificity/ --log_name add_annotation_to_markers_cortex_nodule_top_100.out
Rscript investigating_genes_using_other_sources_2.0_cortex_nod.R --rds RECLUSTERING/CORTEX_NOD/rds_file_subset/medicago_integrated_subset_cortex_nodule.rds --out_images RECLUSTERING/CORTEX_NOD/annotation_using_other_sources_images_Cortex_NOD --markers_dir RECLUSTERING/CORTEX_NOD/top_1000_markers_per_cluster/by_specificity/top_100_markers_per_cluster/by_specificity --color_scheme VIRIDIS --show_gene_names FALSE --image_format svg

Rscript investigating_genes_using_other_sources_2.0_cortex_nod.R --rds RECLUSTERING/CORTEX_NOD/rds_file_subset/medicago_integrated_subset_cortex_nodule.rds --out_images RECLUSTERING/CORTEX_NOD/annotation_using_other_sources_images_Cortex_NOD --markers_dir RECLUSTERING/CORTEX_NOD/top_1000_markers_per_cluster/by_specificity/top_100_markers_per_cluster/by_specificity --color_scheme VIRIDIS --show_gene_names FALSE --image_format png
Rscript visual_inspection_of_markers.R --rds RECLUSTERING/CORTEX_NOD/rds_file_subset/medicago_integrated_subset_cortex_nodule.rds  --markers selected_markers/RNS_genes_curated_list.tsv --out RECLUSTERING/CORTEX_NOD/annotation_using_markers/RNS_genes_curated_list --common=TRUE
Rscript visual_inspection_of_markers.R --rds RECLUSTERING/CORTEX_NOD/rds_file_subset/medicago_integrated_subset_cortex_nodule.rds --markers selected_markers/figure_2_roy_et_al_2020.csv --out RECLUSTERING/CORTEX_NOD/annotation_using_markers/figure_2_roy_et_al_2020 --common=TRUE
Rscript visual_inspection_of_markers.R --rds RECLUSTERING/CORTEX_NOD/rds_file_subset/medicago_integrated_subset_cortex_nodule.rds --markers selected_markers/markers_Schiessl_2019_table_s2.csv --out RECLUSTERING/CORTEX_NOD/annotation_using_markers/markers_Schiessl_2019_table_s2 --common=TRUE
Rscript visual_inspection_of_markers.R --rds RECLUSTERING/CORTEX_NOD/rds_file_subset/medicago_integrated_subset_cortex_nodule.rds --markers selected_markers/GusGenes.csv --out RECLUSTERING/CORTEX_NOD/annotation_using_markers/GusGenes --common=FALSE
Rscript Dotplots_using_Seurat_cortex_nodule.R
mkdir -p RECLUSTERING/CORTEX_NOD/images/expression_profiles_top1000/
i=1
while [ $i -le 11 ]
do

Rscript visual_inspection_of_markers.R --rds RECLUSTERING/CORTEX_NOD/rds_file_subset/medicago_integrated_subset_cortex_nodule.rds --markers RECLUSTERING/CORTEX_NOD/top_1000_markers_per_cluster/by_specificity/top_100_markers_per_cluster/by_specificity/list_of_top_100_for_cluster_${i}_specificity.tsv --out RECLUSTERING/CORTEX_NOD/images/expression_profiles_top100/top_100_for_cluster_${i} --common=FALSE

i=$(($i+1))
done

Trajectory inference on the reclustered dataset

Rscript trajectory_inference_cortex_nodule.R
Rscript TradeSeq.R

## Additional analysis

### t-test number of nodules per plant _MtSTY4_

```sh
Rscript t_test_number_nodules_per_plants_RNAi_STY4.R

Produce vides of cell distribution in the datasets

Rscript produce_gifs_from_umaps_video1_whole_dataset.R
Rscript produce_gifs_from_umaps_video2_roothair_IT.R
Rscript produce_gifs_from_umaps_video3_cortex_nodule.R

Version of R packages

sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS 14.0

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
    [1] grid      stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
    [1] velocyto.R_0.6              patchwork_1.1.3             hdWGCNA_0.2.02
[4] monocle3_1.2.12             tradeSeq_1.8.0              slingshot_2.2.1
[7] TrajectoryUtils_1.2.0       princurve_2.1.6             WGCNA_1.72-1
[10] fastcluster_1.2.3           dynamicTreeCut_1.63-1       ComplexHeatmap_2.10.0
[13] BiocParallel_1.28.3         scater_1.22.0               scuttle_1.4.0
[16] DropletUtils_1.14.2         SingleCellExperiment_1.16.0 SummarizedExperiment_1.24.0
[19] Biobase_2.54.0              GenomicRanges_1.46.1        GenomeInfoDb_1.30.1
[22] IRanges_2.28.0              S4Vectors_0.32.4            BiocGenerics_0.40.0
[25] MatrixGenerics_1.6.0        matrixStats_1.0.0           scDblFinder_1.8.0
[28] animation_2.7               paletteer_1.5.0             Matrix_1.5-1
[31] SeuratObject_4.1.3          Seurat_4.3.0                pheatmap_1.0.12
[34] RColorBrewer_1.1-3          dynutils_1.0.11             scales_1.2.1
[37] circlize_0.4.15             viridis_0.6.4               viridisLite_0.4.2
[40] docopt_0.7.1                vroom_1.6.3                 lubridate_1.9.2
[43] forcats_1.0.0               stringr_1.5.0               purrr_1.0.2
[46] readr_2.1.4                 tibble_3.2.1                tidyverse_2.0.0
[49] ggplot2_3.4.4               tidyr_1.3.0                 dplyr_1.1.2
[52] cowplot_1.1.1

loaded via a namespace (and not attached):
    [1] scattermore_1.2           R.methodsS3_1.8.2         knitr_1.45
[4] bit64_4.0.5               irlba_2.3.5.1             DelayedArray_0.20.0
[7] R.utils_2.12.2            rpart_4.1.19              data.table_1.14.8
[10] KEGGREST_1.34.0           RCurl_1.98-1.12           doParallel_1.0.17
[13] generics_0.1.3            preprocessCore_1.56.0     ScaledMatrix_1.2.0
[16] callr_3.7.3               terra_1.7-39              RSQLite_2.3.1
[19] RANN_2.6.1                future_1.33.0             bit_4.0.5
[22] tzdb_0.4.0                spatstat.data_3.0-3       httpuv_1.6.10
[25] assertthat_0.2.1          xfun_0.40                 hms_1.1.3
[28] evaluate_0.22             promises_1.2.1            fansi_1.0.4
[31] igraph_1.5.1              DBI_1.1.3                 htmlwidgets_1.6.2
[34] spatstat.geom_3.2-1       ellipsis_0.3.2            backports_1.4.1
[37] RcppParallel_5.1.7        deldir_1.0-9              sparseMatrixStats_1.6.0  
[40] vctrs_0.6.2               remotes_2.4.2.1           ROCR_1.0-11
[43] abind_1.4-5               cachem_1.0.8              withr_2.5.2
[46] progressr_0.14.0          checkmate_2.2.0           sctransform_0.3.5
[49] prettyunits_1.1.1         scran_1.22.1              goftest_1.2-3
[52] cluster_2.1.4             lazyeval_0.2.2            crayon_1.5.2
[55] spatstat.explore_3.1-0    edgeR_3.36.0              pkgconfig_2.0.3
[58] nlme_3.1-163              vipor_0.4.5               nnet_7.3-19
[61] rlang_1.1.1               globals_0.16.2            lifecycle_1.0.3
[64] miniUI_0.1.1.1            rsvd_1.0.5                rprojroot_2.0.3
[67] polyclip_1.10-4           lmtest_0.9-40             boot_1.3-28.1
[70] Rhdf5lib_1.16.0           zoo_1.8-12                base64enc_0.1-3
[73] beeswarm_0.4.0            processx_3.8.2            ggridges_0.5.4
[76] GlobalOptions_0.1.2       png_0.1-8                 rjson_0.2.21
[79] bitops_1.0-7              R.oo_1.25.0               KernSmooth_2.23-22
[82] rhdf5filters_1.6.0        Biostrings_2.62.0         blob_1.2.4
[85] DelayedMatrixStats_1.16.0 shape_1.4.6               parallelly_1.36.0
[88] spatstat.random_3.1-4     beachmat_2.10.0           memoise_2.0.1
[91] magrittr_2.0.3            plyr_1.8.8                ica_1.0-3
[94] zlibbioc_1.40.0           compiler_4.1.2            dqrng_0.3.0
[97] pcaMethods_1.86.0         lme4_1.1-34               clue_0.3-64
[100] fitdistrplus_1.1-11       cli_3.6.1                 XVector_0.34.0
[103] listenv_0.9.0             ps_1.7.5                  pbapply_1.7-2
[106] htmlTable_2.4.1           Formula_1.2-5             mgcv_1.9-0
[109] MASS_7.3-60               tidyselect_1.2.0          stringi_1.7.12
[112] BiocSingular_1.10.0       locfit_1.5-9.8            ggrepel_0.9.3
[115] tools_4.1.2               timechange_0.2.0          future.apply_1.11.0
[118] parallel_4.1.2            rstudioapi_0.15.0         foreign_0.8-84
[121] bluster_1.4.0             foreach_1.5.2             metapod_1.2.0
[124] gridExtra_2.3             Rtsne_0.16                proxyC_0.3.3
[127] digest_0.6.33             BiocManager_1.30.22       shiny_1.7.5.1
[130] Rcpp_1.0.11               later_1.3.1               RcppAnnoy_0.0.21
[133] httr_1.4.7                AnnotationDbi_1.56.2      colorspace_2.1-0
[136] tensor_1.5                reticulate_1.28           splines_4.1.2
[139] uwot_0.1.16               statmod_1.5.0             rematch2_2.1.2
[142] spatstat.utils_3.0-3      sp_1.6-0                  xgboost_1.7.5.1
[145] plotly_4.10.2             xtable_1.8-4              nloptr_2.0.3
[148] jsonlite_1.8.7            R6_2.5.1                  Hmisc_5.1-0
[151] pillar_1.9.0              htmltools_0.5.5           mime_0.12
[154] minqa_1.2.5               glue_1.6.2                fastmap_1.1.1
[157] BiocNeighbors_1.12.0      codetools_0.2-19          pkgbuild_1.4.2
[160] utf8_1.2.3                lattice_0.21-8            spatstat.sparse_3.0-1
[163] curl_5.0.0                ggbeeswarm_0.7.2          leiden_0.4.3
[166] GO.db_3.14.0              survival_3.5-7            limma_3.50.3
[169] rmarkdown_2.25            desc_1.4.2                munsell_0.5.0
[172] GetoptLong_1.0.5          rhdf5_2.38.1              GenomeInfoDbData_1.2.7
[175] iterators_1.0.14          impute_1.68.0             HDF5Array_1.22.1
[178] reshape2_1.4.4            gtable_0.3.4

scrnaseq_medicago_a17_sunn4's People

Contributors

wendelljpereira avatar

Stargazers

Min-Yao Jhu avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.