storeylab / biobroom
Tidy up computational biology objects
I've been using a broom-style function to tidy seqinr::read.fasta objects. Would there be any interest in adding this to biobroom if I do a pull request?
library(magrittr)  # provides %>%

read_fasta <- function(fasta_filename, annot = FALSE) {
  fasta <- seqinr::read.fasta(fasta_filename, as.string = TRUE)
  # Convert the list of seqinr SeqFastadna objects to a data.frame.
  # Note: broom::fix_data_frame() is no longer exported as of broom 0.7.0.
  fasta_df <- fasta %>%
    sapply(function(x) x[1:length(x)]) %>%
    as.data.frame() %>%
    broom::fix_data_frame(newcol = "ID", newnames = "Sequence")
  if (annot) {
    # Append the annotation (header) line of each record as an extra column
    annot_df <- seqinr::getAnnot(fasta) %>%
      sapply(function(x) x[1:length(x)]) %>%
      as.data.frame() %>%
      broom::fix_data_frame(newnames = "Annot")
    fasta_df <- cbind(fasta_df, annot_df)
  }
  return(fasta_df)
}

read_fasta('https://www.uniprot.org/uniprot/?query=PGH1&format=fasta&limit=10')
https://gist.github.com/clairemcwhite/a5e889f6192a664be45c0226d0ab5813
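Since broom::fix_data_frame() is no longer exported as of broom 0.7.0, the conversion step could also be written in base R. The following is only a minimal sketch: tidy_fasta_list() is a hypothetical helper name, and the example list is a stand-in that assumes the named-list shape returned by seqinr::read.fasta(..., as.string = TRUE).

```r
# Hypothetical helper (not part of biobroom or seqinr): turn a *named* list
# of sequence strings into a two-column data frame without using broom.
tidy_fasta_list <- function(fasta, newcol = "ID", newname = "Sequence") {
  # as.character() drops the seqinr attributes; keep one string per record
  seqs <- vapply(fasta, function(x) as.character(x)[1], character(1))
  out <- data.frame(names(seqs), unname(seqs), stringsAsFactors = FALSE)
  colnames(out) <- c(newcol, newname)
  out
}

# Stand-in for the output of seqinr::read.fasta(..., as.string = TRUE)
fasta <- list(seq1 = "atgc", seq2 = "ggcc")
fasta_df <- tidy_fasta_list(fasta)
print(fasta_df)
```

The helper assumes every list element has a name, which is what provides the ID column.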
The scater package implements a number of dplyr verbs for SingleCellExperiment objects, e.g., mutate. I have been trying to get rid of these functions for a while, and I was wondering whether biobroom would be a better home for them (once generalized to work on SummarizedExperiment objects).
This would be a win-win for all of us: tidyverse/BioC users would no longer have to put up with masking issues (alanocallaghan/scater#74); biobroom would gain and centralize functionality relating to tidyverse/BioC integration; and I would no longer have to maintain verbs that I never use.
Let me know if this is of interest - I am willing to put in a PR.
Need to add sva tidiers
Thanks for the great package! Could you include the fold change in the table returned by tidy() when it is called on a limma eBayes fit?
augment.DGEList stops with the error
stop("No columns to augment in DGEList")
independently of input, because names(list()) is always NULL.
Change
if (is.null(names(list())))
to
if (is.null(names(ret)))
in line 13 of the function augment.DGEList.
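The difference between the two checks can be reproduced in isolation. This is only an illustration: it assumes that ret is the list of columns collected inside augment.DGEList, and the group column is made up.

```r
# Stand-in for the list of columns collected by augment.DGEList (assumption)
ret <- list()
ret$group <- c("a", "b")

# Original check: a freshly created empty list never has names,
# so this condition is TRUE regardless of the input
always_true <- is.null(names(list()))

# Proposed check: looks at the columns that were actually collected
fixed_check <- is.null(names(ret))

print(c(original = always_true, proposed = fixed_check))
```

With the proposed check, the error would be raised only when ret is really empty.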
When I used tidy.qvalue(), the following warning came up:
tidy.qvalue(qobj)
Warning: `tbl_df()` was deprecated in dplyr 1.0.0.
ℹ Please use `tibble::as_tibble()` instead.
ℹ The deprecated feature was likely used in the biobroom package.
  Please report the issue at https://github.com/StoreyLab/biobroom/issues.
Running biobroom::tidy on a DESeq2 object, I saw the warning:
Warning message:
`tbl_df()` is deprecated as of dplyr 1.0.0.
Please use `tibble::as_tibble()` instead.
This was with biobroom v1.20.0, and dplyr v1.0.2.
Ideally, biobroom would be updated to avoid this warning.
Thank you!
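The replacement suggested by the warning is usually a one-line change. The sketch below is not biobroom's actual code: ret is a hypothetical stand-in for the data frame a tidier builds before returning it.

```r
library(tibble)

# Hypothetical stand-in for the data frame built inside a tidier
ret <- data.frame(p.value = c(0.01, 0.20), q.value = c(0.02, 0.20))

# dplyr::tbl_df(ret) is the deprecated call named in the warning;
# tibble::as_tibble() is the replacement the warning recommends
out <- tibble::as_tibble(ret)
print(class(out))
```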
Hi,
Thanks for the really useful package. Sometimes sample names get mangled if they contain special characters, e.g.:
> data(hammer)
> pData(hammer)
sample.id num.tech.reps protocol strain Time
SRX020102 SRX020102 1 control Sprague Dawley 2 months
SRX020103 SRX020103 2 control Sprague Dawley 2 months
SRX020104 SRX020104 1 L5 SNL Sprague Dawley 2 months
SRX020105 SRX020105 2 L5 SNL Sprague Dawley 2months
SRX020091-3 SRX020091-3 1 control Sprague Dawley 2 weeks
SRX020088-90 SRX020088-90 2 control Sprague Dawley 2 weeks
SRX020094-7 SRX020094-7 1 L5 SNL Sprague Dawley 2 weeks
SRX020098-101 SRX020098-101 2 L5 SNL Sprague Dawley 2 weeks
> tidy(hammer)
# A tibble: 236,128 x 3
gene sample value
<chr> <chr> <int>
1 ENSRNOG00000000001 SRX020102 2
2 ENSRNOG00000000007 SRX020102 4
3 ENSRNOG00000000008 SRX020102 0
4 ENSRNOG00000000009 SRX020102 0
5 ENSRNOG00000000010 SRX020102 19
6 ENSRNOG00000000012 SRX020102 7
7 ENSRNOG00000000014 SRX020102 0
8 ENSRNOG00000000017 SRX020102 4
9 ENSRNOG00000000021 SRX020102 7
10 ENSRNOG00000000024 SRX020102 86
# ... with 236,118 more rows
> pData(hammer) %>% dplyr::filter(grepl('SRX020091',sample.id))
sample.id num.tech.reps protocol strain Time
1 SRX020091-3 1 control Sprague Dawley 2 weeks
> tidy(hammer) %>% dplyr::filter(grepl('SRX020091',sample))
# A tibble: 29,516 x 3
gene sample value
<chr> <chr> <int>
1 ENSRNOG00000000001 SRX020091.3 7
2 ENSRNOG00000000007 SRX020091.3 5
3 ENSRNOG00000000008 SRX020091.3 0
4 ENSRNOG00000000009 SRX020091.3 0
5 ENSRNOG00000000010 SRX020091.3 50
6 ENSRNOG00000000012 SRX020091.3 31
7 ENSRNOG00000000014 SRX020091.3 0
8 ENSRNOG00000000017 SRX020091.3 21
9 ENSRNOG00000000021 SRX020091.3 30
10 ENSRNOG00000000024 SRX020091.3 257
# ... with 29,506 more rows
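One plausible cause, stated here as an assumption because biobroom's internals are not shown in this report, is a conversion through data.frame() with its default check.names = TRUE, which rewrites column names with make.names(). The toy matrix below reuses three of the sample IDs above; the gene names and counts are made up.

```r
ids <- c("SRX020102", "SRX020091-3", "SRX020088-90")

# Toy expression matrix with the sample IDs as column names
m <- matrix(1:6, nrow = 2, dimnames = list(c("g1", "g2"), ids))

# Default behaviour: column names are passed through make.names(),
# which replaces "-" with "."
mangled <- colnames(data.frame(m))

# Opting out keeps the original sample IDs
kept <- colnames(data.frame(m, check.names = FALSE))

print(mangled)
print(kept)
```

If this is indeed the cause, passing check.names = FALSE at the conversion step would preserve the IDs.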
Hi,
would you be able to easily add an untidy() function that reverts the tidy object to its original format, including any changes made to the tidy version?
Something like:
edgeR_object_tidy <- edgeR_object %>% tidy()
edgeR_object <- edgeR_object_tidy %>% untidy()
I often have to switch between the two formats, because the package functions need the original format.
Cheers
Jakob
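For the expression values alone, the reverse step could look roughly like the sketch below. untidy_matrix() is a hypothetical name, not an existing biobroom function, and it assumes the gene/sample/value column layout that tidy() produces for count data; it rebuilds only the matrix, not the full original object.

```r
# Hypothetical helper: rebuild a gene-by-sample matrix from a tidy table
untidy_matrix <- function(td, row = "gene", col = "sample", value = "value") {
  rows <- unique(as.character(td[[row]]))
  cols <- unique(as.character(td[[col]]))
  m <- matrix(NA, nrow = length(rows), ncol = length(cols),
              dimnames = list(rows, cols))
  # Fill each cell by its (row, column) index pair
  idx <- cbind(match(td[[row]], rows), match(td[[col]], cols))
  m[idx] <- td[[value]]
  m
}

# Made-up tidy input
td <- data.frame(
  gene   = c("g1", "g2", "g1", "g2"),
  sample = c("s1", "s1", "s2", "s2"),
  value  = c(2, 4, 7, 5),
  stringsAsFactors = FALSE
)
m <- untidy_matrix(td)
print(m)
```

The resulting matrix could then be passed to a constructor such as edgeR::DGEList(counts = m); sample and gene annotations would have to be restored separately.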
Hi,
(firstly, thanks a lot for such a convenient package!)
I was wondering what your view is on having a tidy() method for DESeqTransform objects (coming from the rlog() and varianceStabilizingTransformation() functions)?
Here's a gist with one:
https://gist.github.com/tavareshugo/3973461a7daf8a43e65e3566d5deed14
So, this should work:
# load libraries
library(DESeq2)
library(biobroom)
library(magrittr)
# Source gist
devtools::source_gist("3973461a7daf8a43e65e3566d5deed14", filename = "tidy_DESeqTransform.R")
# Example
dds <- makeExampleDESeqDataSet(betaSD = 1)
# transformations
vst_norm <- varianceStabilizingTransformation(dds)
rlog_norm <- rlog(dds)
# tidying
tidy(vst_norm)
tidy(vst_norm, colData = TRUE)
tidy(rlog_norm)
tidy(rlog_norm, colData = TRUE)
I'm happy to fork and submit a pull request, if you think something along these lines is worth it.
Too easy?
Hi there! The broom dev team just ran reverse dependency checks on the upcoming broom 0.7.0 release and found new errors/test failures for the CRAN version of this package. I've pasted the results below, which seem to result from our decision to no longer export the fix_data_frame() function (for maintainability purposes.)
ERROR
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
Backtrace:
 1. generics::tidy(dds)
 2. biobroom::tidy.EList(dds)
 3. biobroom:::tidy_matrix(x$E)
 7. broom::fix_data_frame
 8. base::getExportedValue(pkg, name)
══ testthat results ═══════════════════════════════════════════════════════════
[ OK: 33 | SKIPPED: 0 | WARNINGS: 0 | FAILED: 3 ]
1. Error: limma tidier works as expected (@test-limma_tidiers.R#5)
2. Error: voom tidier adds weight column (@test-limma_tidiers.R#26)
3. Error: voomWithQualityWeights tidier adds weight and sample.weight columns (@test-limma_tidiers.R#49)
Error: testthat unit tests failed
Execution halted
I've pasted the most recently exported function definition below as a place to start from in making the necessary fixes.🙂
# Note: unrowname() and as_tibble() are not defined in this snippet
fix_data_frame <- function(x, newnames = NULL, newcol = "term") {
  if (!is.null(newnames) && length(newnames) != ncol(x)) {
    stop("newnames must be NULL or have length equal to number of columns")
  }
  if (all(rownames(x) == seq_len(nrow(x)))) {
    # don't need to move rownames into a new column
    ret <- data.frame(x, stringsAsFactors = FALSE)
    if (!is.null(newnames)) {
      colnames(ret) <- newnames
    }
  } else {
    # move rownames into the first column, named by newcol
    ret <- data.frame(
      ...new.col... = rownames(x),
      unrowname(x),
      stringsAsFactors = FALSE
    )
    colnames(ret)[1] <- newcol
    if (!is.null(newnames)) {
      colnames(ret)[-1] <- newnames
    }
  }
  as_tibble(ret)
}
We hope to submit this new version of the package to CRAN in the coming weeks. If you encounter any problems fixing these issues, please feel free to reach out!