escamero / mirlyn

11.0 11.0 3.0 359 KB

R 100.00%

mirlyn's People

lvelosuarez jibarozzo

mirlyn's Issues

Issue with mirl() in v1.4.1

Hi,

First of all, this is a great package! Thank you very much.

Since the recent update of mirl() to v1.4.1, I am no longer able to pass a phyloseq object as x to the function.
Using the example data from mirlyn:

data("example")
example

phyloseq-class experiment-level object
otu_table() OTU Table: [ 3342 taxa and 6 samples ]
sample_data() Sample Data: [ 6 samples by 1 sample variables ]
tax_table() Taxonomy Table: [ 3342 taxa by 7 taxonomic ranks ]
phy_tree() Phylogenetic Tree: [ 3342 tips and 3336 internal nodes ]

mirlps <- mirl(example, rep = 5)

Error in if (ncol(x) > ncol(mirlobj[[1]])) { : argument is of length zero

It seems to work fine with just an otu table, but I'm no longer able to run mirl() on a phyloseq object.
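The message above is what base R raises when an if() condition has length zero. One likely explanation, assumed here and not checked against mirlyn's source, is that ncol() returns NULL for a phyloseq object because that S4 class has no dim(), so ncol(x) > ncol(mirlobj[[1]]) evaluates to logical(0). The base-R sketch below reproduces the message with a stand-in object and adds a defensive check; check_has_columns() is a hypothetical helper, not part of mirlyn.

```r
# Stand-in for an object without a dim attribute (assumed to behave like a
# phyloseq object when passed to ncol())
no_dim <- list(otu = matrix(1:4, nrow = 2))
ncol(no_dim)   # NULL, because dim(no_dim) is NULL

# NULL > 1 is logical(0), and if() rejects a zero-length condition
msg <- tryCatch(
  if (ncol(no_dim) > 1) "wide" else "narrow",
  error = function(e) conditionMessage(e)
)
msg   # "argument is of length zero"

# Hypothetical guard that fails with a clearer message instead
check_has_columns <- function(x) {
  n <- ncol(x)
  if (is.null(n)) {
    stop("x has no column dimension; pass an OTU table or matrix instead", call. = FALSE)
  }
  n
}
check_has_columns(matrix(1:6, nrow = 2))   # 3
```

Passing only the OTU table, as described above, avoids the check, but the result then carries no sample_data().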

Error using rarecurve

When I try to plot rarefaction curves, rarecurve() raises the error:

Error in round(x): non-numeric argument to mathematical function

I explored the stack trace for a while and I don't understand what the round() function is used for. I suppose it checks that the sample names provided are ordinal numbers or integers, but even when I use a mock column containing only integers, it doesn't work.
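If mirlyn's rarecurve() performs an integer-count check like the one in vegan::rarecurve() (an assumption; the two functions may differ), round() is applied to the count table itself, not to the sample names. Any non-numeric column in that table, such as a sample-ID column, makes as.matrix() return a character matrix, and round() then fails with exactly this message. A minimal base-R sketch:

```r
counts <- data.frame(
  sample = c("S1", "S2"),   # identifier column: makes as.matrix() return characters
  ASV1 = c(10, 0),
  ASV2 = c(3, 7)
)
m <- as.matrix(counts)
msg <- tryCatch(round(m), error = function(e) conditionMessage(e))
msg   # "non-numeric argument to mathematical function"

# Fix: move identifiers into row names and keep only the numeric count columns
rownames(counts) <- counts$sample
m_ok <- as.matrix(counts[, vapply(counts, is.numeric, logical(1))])

# Integer-count check in the style of vegan::rarecurve()
identical(all.equal(m_ok, round(m_ok)), TRUE)   # TRUE: whole-number counts pass
```

This also explains why adding a mock column of integers does not help: as long as any character column remains, the whole matrix is character.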

convert mirlyn object back to phyloseq

Hi! First, thank you for creating this package; it is really helpful. I was wondering if I could ask for your help with the following. I just rarefied my data using the mirl function, and now I would like to transform the resulting mirl object back into a phyloseq object. There are a few steps in my downstream analysis that I cannot do with the mirl object (e.g. PERMANOVA tests, beta diversity with UniFrac distances). Could you explain how to transform the mirl object back into a phyloseq object? If possible, I would like to get an average across the iterations for each observed ASV.
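One way to obtain a single table averaged over iterations is an element-wise mean of the per-iteration count matrices. The base-R sketch below assumes every iteration has the same taxa and samples in the same order (as in the mirl output shown in the alphadivDF issue, where each iteration is 7836 x 37); average_iterations() is a hypothetical helper, not part of mirlyn. Averaged counts are generally not integers, which matters for methods that expect raw counts.

```r
# Hypothetical helper: element-wise mean of per-iteration count matrices
# (taxa x samples), assuming all iterations share identical dimnames
average_iterations <- function(mats) {
  ref <- dimnames(mats[[1]])
  same_layout <- vapply(mats, function(m) identical(dimnames(m), ref), logical(1))
  if (!all(same_layout)) {
    stop("all iterations must have the same taxa and samples, in the same order")
  }
  Reduce(`+`, mats) / length(mats)
}

# Toy example: two iterations of a 2-taxa x 2-sample table
dn <- list(c("ASV1", "ASV2"), c("S1", "S2"))
it1 <- matrix(c(4, 0, 2, 6), nrow = 2, dimnames = dn)
it2 <- matrix(c(2, 2, 4, 4), nrow = 2, dimnames = dn)
avg <- average_iterations(list(it1, it2))
avg
#      S1 S2
# ASV1  3  3
# ASV2  1  5
```

With phyloseq loaded, the per-iteration matrices could be extracted with lapply(mirl_obj, function(p) as(otu_table(p), "matrix")), where mirl_obj is a placeholder name, and the averaged matrix wrapped back with otu_table(avg, taxa_are_rows = TRUE) and combined with the original sample_data() and tax_table() through phyloseq().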

Error when using alphadivDF

Hello developers, first of all I would like to congratulate you on your work on mirlyn.

I'm trying to perform multiple rarefaction iterations on my phyloseq object, but I've run into some difficulties. I apologize because I'm new to this topic and I'm not a programmer.

I had this phyloseq object:

phyloseq.Bacteria
phyloseq-class experiment-level object
otu_table() OTU Table: [ 7836 taxa and 48 samples ]
sample_data() Sample Data: [ 48 samples by 3 sample variables ]
tax_table() Taxonomy Table: [ 7836 taxa by 6 taxonomic ranks ]

str(phyloseq.Bacteria)
Formal class 'phyloseq' [package "phyloseq"] with 5 slots
..@ otu_table:Formal class 'otu_table' [package "phyloseq"] with 2 slots
.. .. ..@ .Data : int [1:48, 1:7836] 45962 0 0 0 0 0 0 0 0 0 ...
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : chr [1:48] "20082_1" "20085_1" "20089_1" "20108_1" ...
.. .. .. .. ..$ : chr [1:7836] "ASV1" "ASV6" "ASV9" "ASV12" ...
.. .. ..@ taxa_are_rows: logi FALSE
.. .. ..$ dim : int [1:2] 48 7836
.. .. ..$ dimnames:List of 2
.. .. .. ..$ : chr [1:48] "20082_1" "20085_1" "20089_1" "20108_1" ...
.. .. .. ..$ : chr [1:7836] "ASV1" "ASV6" "ASV9" "ASV12" ...
..@ tax_table:Formal class 'taxonomyTable' [package "phyloseq"] with 1 slot
.. .. ..@ .Data: chr [1:7836, 1:6] "Bacteria" "Bacteria" "Bacteria" "Bacteria" ...
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : chr [1:7836] "ASV1" "ASV6" "ASV9" "ASV12" ...
.. .. .. .. ..$ : chr [1:6] "Kingdom" "Phylum" "Class" "Order" ...
.. .. ..$ dim : int [1:2] 7836 6
.. .. ..$ dimnames:List of 2
.. .. .. ..$ : chr [1:7836] "ASV1" "ASV6" "ASV9" "ASV12" ...
.. .. .. ..$ : chr [1:6] "Kingdom" "Phylum" "Class" "Order" ...
..@ sam_data :'data.frame': 48 obs. of 3 variables:
Formal class 'sample_data' [package "phyloseq"] with 4 slots
.. .. ..@ .Data :List of 3
.. .. .. ..$ : int [1:48] 1 1 1 1 1 1 0 0 1 0 ...
.. .. .. ..$ : int [1:48] 1 1 1 1 1 1 0 0 0 0 ...
.. .. .. ..$ : int [1:48] 0 0 0 0 0 0 0 0 0 0 ...
.. .. ..@ names : chr [1:3] "Llena" "Parto" "Muestreo"
.. .. ..@ row.names: chr [1:48] "20082_1" "20085_1" "20089_1" "20108_1" ...
.. .. ..@ .S3Class : chr "data.frame"
..@ phy_tree : NULL
..@ refseq : NULL

Since the taxa are columns in the otu_table, I transposed the OTU table like this:

trans_phyloseq_Bacteria <- t(otu_table(phyloseq.Bacteria))

Now it looks like this:

str(trans_phyloseq_Bacteria)
Formal class 'otu_table' [package "phyloseq"] with 2 slots
..@ .Data : int [1:7836, 1:48] 45962 0 10862 10618 0 0 0 0 0 22588 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:7836] "ASV1" "ASV6" "ASV9" "ASV12" ...
.. .. ..$ : chr [1:48] "20082_1" "20085_1" "20089_1" "20108_1" ...
..@ taxa_are_rows: logi TRUE
..$ dim : int [1:2] 7836 48
..$ dimnames:List of 2
.. ..$ : chr [1:7836] "ASV1" "ASV6" "ASV9" "ASV12" ...
.. ..$ : chr [1:48] "20082_1" "20085_1" "20089_1" "20108_1" ...

Then I created the mirl object like this:

mirl_object_10000 <- mirl(trans_phyloseq_Bacteria, libsize = 10000, rep = 1000, replace = FALSE, set.seed = 120)

And they look like this:

str(mirl_object_10000)
List of 1000
$ :Formal class 'otu_table' [package "phyloseq"] with 2 slots
.. ..@ .Data : num [1:7836, 1:37] 3952 0 966 951 0 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:7836] "ASV1" "ASV6" "ASV9" "ASV12" ...
.. .. .. ..$ : chr [1:37] "20082_1" "20085_1" "20089_1" "20110_1" ...
.. ..@ taxa_are_rows: logi TRUE
.. ..$ dim : int [1:2] 7836 37
.. ..$ dimnames:List of 2
.. .. ..$ : chr [1:7836] "ASV1" "ASV6" "ASV9" "ASV12" ...
.. .. ..$ : chr [1:37] "20082_1" "20085_1" "20089_1" "20110_1" ...
$ :Formal class 'otu_table' [package "phyloseq"] with 2 slots
.. ..@ .Data : num [1:7836, 1:37] 4007 0 984 905 0 ...
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:7836] "ASV1" "ASV6" "ASV9" "ASV12" ...
.. .. .. ..$ : chr [1:37] "20082_1" "20085_1" "20089_1" "20110_1" ...
.. ..@ taxa_are_rows: logi TRUE
.. ..$ dim : int [1:2] 7836 37
.. ..$ dimnames:List of 2
.. .. ..$ : chr [1:7836] "ASV1" "ASV6" "ASV9" "ASV12" ...
.. .. ..$ : chr [1:37] "20082_1" "20085_1" "20089_1" "20110_1" ...
....etc.......

But when I use the alphadivDF function:

alphadiv_df_10000 <- alphadivDF(mirl_object_10000)

this error appears:
Error in access(object, "sam_data", errorIfNULL) : sam_data slot is empty.

I don't know what I'm doing wrong. Could you help me?
Thank you in advance,
Maila
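The transposition step passes t(otu_table(phyloseq.Bacteria)) to mirl(), which is a bare otu_table with no sample data attached, while alphadivDF() asks for sample data (the version posted later in this thread starts with md <- sample_data(x[[1]])). A minimal sketch, assuming phyloseq's otu_table<- replacement method, reorients the table inside the phyloseq object so the sample data stay attached. taxa_as_rows() is a hypothetical helper, and whether mirl() accepts phyloseq input in a particular mirlyn version is not confirmed here.

```r
library(phyloseq)

# Hypothetical helper: put taxa on the rows while keeping a phyloseq object,
# so sample_data() remains available downstream
taxa_as_rows <- function(ps) {
  if (!taxa_are_rows(ps)) {
    otu_table(ps) <- t(otu_table(ps))   # t() on an otu_table flips taxa_are_rows
  }
  ps
}

# Toy object: 2 samples x 3 ASVs with taxa as columns, like phyloseq.Bacteria
otu <- otu_table(
  matrix(c(5, 0, 3, 2, 0, 7), nrow = 2,
         dimnames = list(c("S1", "S2"), c("ASV1", "ASV2", "ASV3"))),
  taxa_are_rows = FALSE
)
sam <- sample_data(data.frame(Llena = c(1L, 0L), row.names = c("S1", "S2")))
ps <- phyloseq(otu, sam)

ps_rows <- taxa_as_rows(ps)
taxa_are_rows(ps_rows)   # TRUE
nsamples(ps_rows)        # 2, and sample_data() is still attached
```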

Warning when using get_asv_table

While using the get_asv_table function, I got the following warning:

Warning message:
funs() was deprecated in dplyr 0.8.0.
i Please use a list of either functions or lambdas:

Simple named list: list(mean = mean, median = median)

Auto named with `tibble::lst()`: tibble::lst(mean, median)

Using lambdas list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))

i The deprecated feature was likely used in the mirlyn package.
Please report the issue to the authors.
This warning is displayed once every 8 hours.
Call lifecycle::last_lifecycle_warnings() to see where this warning was generated.
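The warning comes from code inside mirlyn, so on the user side it can only be silenced, not fixed. The sketch below shows the kind of migration dplyr recommends, replacing funs() with a named list of functions inside across() (available since dplyr 1.0.0). Which mirlyn call uses funs(), and with which summary functions, is not shown in this report, so the columns and the mean summary here are placeholders.

```r
library(dplyr)

df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# Deprecated pattern that produces this kind of warning:
#   summarise_all(df, funs(mean))
# Current equivalent: across() with a named list of functions
out <- summarise(df, across(everything(), list(mean = mean)))
out   # one row, with columns a_mean = 2 and b_mean = 5
```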

Alpha Diversity: Which metric?

Thanks so much for creating and maintaining this package!

I am wondering which diversity metric is calculated for alpha diversity. The graph just says "Diversity Index", and the paper mentions both the Shannon index and Hill numbers. Is there a way to specify which metric we want?
Sorry if this is already in the docs and I missed it.
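For reference, the Shannon index and the Hill numbers mentioned in the paper are directly related. With p_i the relative abundances of the observed taxa, the Hill number of order q is (sum of p_i^q)^(1/(1-q)): q = 0 gives observed richness, q = 1 (as a limit) gives exp(Shannon), and q = 2 gives the inverse Simpson index. The base-R sketch below computes them for a single sample; it is illustrative only and does not show which index mirlyn plots. vegan::diversity(), which the alphadivDF code in this thread calls, accepts index = "shannon", "simpson" or "invsimpson".

```r
# Relative abundances of the non-zero taxa in one sample
rel_abund <- function(counts) {
  counts <- counts[counts > 0]
  counts / sum(counts)
}

# Shannon index with the natural log (the default base in vegan::diversity())
shannon <- function(counts) {
  p <- rel_abund(counts)
  -sum(p * log(p))
}

# Hill number of order q; q = 1 is defined as the limit exp(Shannon)
hill <- function(counts, q) {
  p <- rel_abund(counts)
  if (q == 1) exp(-sum(p * log(p))) else sum(p^q)^(1 / (1 - q))
}

counts <- c(ASV1 = 40, ASV2 = 30, ASV3 = 20, ASV4 = 10, ASV5 = 0)
shannon(counts)
hill(counts, 0)   # 4: observed richness
hill(counts, 1)   # equals exp(shannon(counts))
hill(counts, 2)   # inverse Simpson, 1 / sum(p^2)
```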

More diversity indices-Potential pull request

Hi! Thank you for this package and great paper.

I needed to calculate more diversity indices, so I modified alphadivDF to wrap vegan::specnumber (observed richness) and to accept the "simpson" and "invsimpson" indices as well. I had help from AI because I was short on time. Maybe this could be added as a pull request.

library(phyloseq)
library(vegan)
library(dplyr)
library(purrr)
library(tibble)

alphadivDF <- function(x, diversity = c("shannon", "simpson", "invsimpson")) {
    # Sample metadata taken from the first iteration
    md <- sample_data(x[[1]])

    # Samples-by-taxa matrix across all iterations, computed once and reused
    t_otu_table <- t(repotu_df(x))

    # Observed richness: number of taxa with non-zero counts
    observed <- specnumber(t_otu_table)
    observed_richness_df <- rownames_to_column(as.data.frame(observed), var = "Unique_ID")

    # One data frame per requested index, named after the index
    div_df_list <- lapply(diversity, function(index) {
        div_values <- vegan::diversity(t_otu_table, index = index)
        div_df <- data.frame(Unique_ID = names(div_values), div_values, stringsAsFactors = FALSE)
        names(div_df)[2] <- index
        div_df
    })

    final <- reduce(div_df_list, function(df1, df2) inner_join(df1, df2, by = "Unique_ID"))

    # left_join() keeps the row order of final; merge() would re-sort by Unique_ID
    final <- left_join(final, observed_richness_df, by = "Unique_ID")

    final <- cbind(md, final)

    return(final)
}

Problems running example code

Thanks for providing this package.

Following your advice, I plan to run alphawichVis to see the differences and decide how many iterations to use. I tried the example data first, but ran into a problem when running alphawichVis(example, "Id"): an "R Session Error" occurs, showing "The previous R session was abnormally terminated due to an unexpected crash".

Here is the full code.

library(mirlyn)
data(example)
example1 <- mirl(example, rep = 100)
example2 <- alphadivDF(example1)
alphawichVis(example2, "Id")

Looking forward to your reply. Thanks!

Viven

How to proceed with the 'betamatPCA' function when the phyloseq object is very large?

I used the 'mirl' function on my phyloseq object, named ps05, to create a mirl object:

mirl_ps05 <- mirlyn::mirl(ps05, libsize = 1000, rep = 100, set.seed = 120)

Examining the mirl object, I can see it holds a list of 100 phyloseq objects, each containing:
otu table [7212 * 237]
tax table [7212 * 27]
sam data [237 * 22]

I now want to use the 'betamatPCA' function, perhaps like this:

betamatPCA_ps05 <- mirlyn::betamatPCA(mirl_ps05, dsim = "bray")

But this appears to be impossible on my computer: RStudio keeps crashing.
I assume this happens because the PCA I am requesting is too computationally heavy.

Do you have any advice on how to proceed with 'betamatPCA' when the phyloseq object is very large?
I expect this will become a common issue, since DNA metabarcoding very often produces very large OTU and taxonomy tables.

My otu, tax and sam tables can be obtained from these dropbox weblinks
https://www.dropbox.com/scl/fi/lck08gsme1b3kiqyyl8en/table_df_tax05.csv?rlkey=n5c1qny7tdrm39la6eb42tiz0&dl=0
https://www.dropbox.com/scl/fi/r4okjl5mefg54cvixdcvt/table_df_sam05.csv?rlkey=t0trgzi77jybreslsnnvial2p&dl=0
https://www.dropbox.com/scl/fi/w3emynu0ohor0iynpcfkw/table_df_otu05.csv?rlkey=gv4g4eo7z5nj10bk2w2bzc6k7&dl=0
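A back-of-envelope memory estimate can help locate the bottleneck. The sketch assumes dense 8-byte numeric storage (the mirl output in the alphadivDF issue is stored as num) and uses the dimensions reported above. Whether betamatPCA() ordinates each iteration separately or pools all iterations into one analysis is not confirmed here, so both cases are computed.

```r
bytes_per_double <- 8
n_taxa <- 7212
n_samples <- 237
n_iter <- 100

# Dense numeric OTU tables for all iterations, in GB
otu_gb <- n_taxa * n_samples * n_iter * bytes_per_double / 1e9

# One Bray-Curtis "dist" object per iteration stores n(n - 1) / 2 values, in MB
dist_per_iter_mb <- n_samples * (n_samples - 1) / 2 * bytes_per_double / 1e6

# Assumption: all iterations pooled into one analysis, n = samples x iterations
n_pooled <- n_samples * n_iter
dist_pooled_gb <- n_pooled * (n_pooled - 1) / 2 * bytes_per_double / 1e9
full_matrix_pooled_gb <- n_pooled^2 * bytes_per_double / 1e9

round(c(otu_gb = otu_gb, dist_per_iter_mb = dist_per_iter_mb,
        dist_pooled_gb = dist_pooled_gb,
        full_matrix_pooled_gb = full_matrix_pooled_gb), 2)
```

Under these assumptions each per-iteration Bray-Curtis matrix is tiny (about 0.22 MB), the 100 OTU tables take about 1.37 GB, and a pooled 23,700 x 23,700 analysis would need roughly 2.25 GB for the dist object and 4.49 GB as a full matrix, before any intermediate copies.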

Doubt about mirlyn graphic outputs

Dear mirlyn developers,

First of all, I would like to congratulate you on the mirlyn R package and the concept of multi-iterative rarefaction.

I know that the R package is in an early phase, but I would like to replicate Figure 4 and Figure S1,I of your original manuscript, "Enhancing diversity analysis by repeatedly rarefying next generation sequencing data describing microbial communities", and I have not been able to.

I noticed that the alphacone() and alphadivDF() functions have a diversity argument, but I can't find an argument to choose between rarefying with or without replacement.

Regarding the figures mentioned above: am I right that they cannot be reproduced with the mirlyn R package? My understanding is that you customized the analysis and ran it separately at library sizes of 500, 1000, 5000 and 11213. How did you plot the observed situation?

Thanks in advance for your hints/comments,
Magi.
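mirl() itself takes a replace argument (the alphadivDF issue above calls it with replace = FALSE). To illustrate what that choice means, here is a minimal base-R sketch of one rarefaction draw with and without replacement, repeated at several library sizes. rarefy_once() is hypothetical and is not mirlyn's implementation.

```r
# One rarefaction draw: subsample `libsize` reads from a vector of taxon counts
rarefy_once <- function(counts, libsize, replace = FALSE) {
  reads <- rep(seq_along(counts), times = counts)   # one entry per read, labelled by taxon index
  picked <- reads[sample.int(length(reads), libsize, replace = replace)]
  out <- tabulate(picked, nbins = length(counts))   # reads per taxon after subsampling
  names(out) <- names(counts)
  out
}

counts <- c(ASV1 = 50, ASV2 = 30, ASV3 = 20)
set.seed(120)
without_repl <- rarefy_once(counts, libsize = 40)                  # no read is drawn twice
with_repl    <- rarefy_once(counts, libsize = 40, replace = TRUE)  # reads can be redrawn

# Five repeated draws at each of several library sizes (taxa x iterations matrices)
draws <- lapply(c(20, 40, 80), function(n) replicate(5, rarefy_once(counts, n)))
```

Without replacement a taxon can never exceed its original count; with replacement it can, and the library size may even exceed the sample's total reads.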

Please create proper releases of this software

HPC sites prefer properly released versions.

Problems running example code

Hi, thanks very much for creating this package!

I'm having a few problems running the example code:

1. The code to visualize alpha diversity, alphawichVis(example, xvar = "Sample"), didn't work for me, although I think I managed to produce the intended plot with alphawichVis(alphadiv_df, xvar = "Id").
2. I was able to create a data frame of the alpha-diversity metric across all library sizes using alphacone, but the plotting code, alphaconeVis(alphacone_example, "Sample"), did not work for me. The error was: Error: geom_ribbon requires the following missing aesthetics: x or y, xmin and xmax.
3. The code for visualizing the PCA is missing quotation marks around the grouping variable name "F", but otherwise it worked fine.

I'm also wondering whether it is possible to use multiple cores when running the mirl function?

Thanks again, this is a very helpful package.

Laura
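Nothing in this thread shows a cores argument for mirl(). A hedged workaround is to run independent rarefaction iterations in parallel yourself with the base parallel package. The sketch below uses a hypothetical draw() function in place of mirl() and mclapply(), which forks worker processes and is therefore limited to a single core on Windows.

```r
library(parallel)

# One rarefaction draw without replacement (illustrative, not mirlyn's code)
draw <- function(counts, libsize) {
  reads <- rep(seq_along(counts), times = counts)
  tabulate(reads[sample.int(length(reads), libsize)], nbins = length(counts))
}

counts <- c(ASV1 = 500, ASV2 = 300, ASV3 = 200)

# Forking is unavailable on Windows, so fall back to one core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

RNGkind("L'Ecuyer-CMRG")   # RNG kind that gives each forked worker its own stream
set.seed(120)
iterations <- mclapply(seq_len(10), function(i) draw(counts, libsize = 100),
                       mc.cores = n_cores)
```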