bjmt / universalmotif Goto Github PK
View Code? Open in Web Editor NEWMotif manipulation functions for R.
License: GNU General Public License v3.0
Motif manipulation functions for R.
License: GNU General Public License v3.0
although as.data.frame
behaves correctly, summarise_motifs
always drops the 'altname' column.
m <- create_motif()
m["altname"] <- "myAlt"
m["organism"] <- "drosophila"
m["family"] <- "ETS"
df <- universalmotif::as.data.frame(m)
summary <- universalmotif::summarise_motifs(m, na.rm = F)
df$altname == summary$altname
This is caused by a typo in summarise_motifs_cpp.
Hello!
I have the issue that I can't run the run_meme function with custom alphabet. The first traceable error is in 'get_custom_meme_alph', where the alphabet file is read. It searches for a match to "END ALPHABET" which can't exist, if the alphabet is to pass other checks. I have made a small reproducible example with Biostrings version 2.64 and universalmotif version 1.13.9 where I create the custom methylated DNA alphabet you have presented in the vignettes and write it to a file then try to use it (https://bioconductor.org/packages/release/bioc/manuals/universalmotif/man/universalmotif.pdf)
library(universalmotif)
library(Biostrings)
meme_alph("ACm", complements = "TGM", alph.name = "MethDNA",
letter.names = c(A = "Adenine", C = "Cytosine", G = "Guanine",
T = "Thymine", m = "Methylcytosine", M = "mC:Guanine"),
like = "DNA", ambiguity = c(N = "ACGTmM"),
file = "temp.alph")
sequences <- DNAStringSet(c("ccaagtttcaa", "taaagttgtat", "aaatgccctac", "tccaagtttca", "gtaccttgtga", "ttaaaatcgtc", "gaacactttct", "aatcacttttg", "ttgtcctggcg"))
# Runs great with standard DNA alphabet
first_attempt <- run_meme(sequences, bin = "/path/to/bin", verbose = 0)
# Doesn't run
secont_attempt <- run_meme(sequences, bin = "/path/to/bin", verbose = 0, alph = "temp.alph")
I am trying to query jaspar for all the Homo Sapiens TFs, but when I try to convert the PFMatrixList but I get the following error.
Error in if (mot_classes == classin) return(motifs) :
missing value where TRUE/FALSE needed
library(JASPAR2018)
library(TFBSTools)
library(universalmotif)
opts <- list()
opts[["species"]] <- 9606
opts[["all_versions"]] <- TRUE
PFMatrixList.jaspar <- getMatrixSet(JASPAR2018, opts)
test <- convert_motifs(PFMatrixList.jaspar)
This may be beyond the purview of universalmotif, but I wanted to float this
idea with you because, to me, universalmotif has done a beautiful job of
standardizing the process of manipulating motifs in R and I would like for it to
grow to be come the de-facto standard data structure for these tasks. I really
love this package.
With this in mind, one area I feel could be improved is how motifs are
manipulated and filtered in bulk. filter_motifs
does a good job of filtering
on specific values, but an aspect I have struggled with in the past is the
process of tidying motifs and their metadata. In particular because I have to
write lots of boilerplate each time to do so.
For example, certain entries in motifDb
have out of date information, because
the annotations it pulls from are not always well maintained (this is
particularly true for model organisms like Drosophila). In these instances,
it's necessary to make changes like renaming motifs, or join metadata from
disparate sources into 1 object.
For example, say I have a set of motifs which are named: "geneA_FBgn1000",
"geneB_FBgn2000", and I would like the universalmotif name
slot to hold:
"geneA" and "geneB", while altname
holds "FBgn1000", "FBgn2000". I need to
loop over each universalmotif object in the list, extract the name
slot (using
@
is the only way I know to do this), split on "_", then directly reassign
each slot, again using @
. At the very least, I think there should be some
setters/getters for each slot so this can be vectorized.
Extending this idea somewhat, say I now want to do this for data in the
extra_info
slot, there's more boilerplate to deal with the assignment, and if
I'm thinking about this correctly, the potential for serious slowdowns because
the list will have to be reallocated to grow in size.
In other words, if universalmotif
objects could become as easy to manipulate
as data.frames, this could greatly improve the workflows of motif-centric
analysis.
To explore these ideas, I created a prototype object using data.frames and
universalmotif
together see here. In my
experiments, this structure has greatly improved the process of analyzing motif
databases themselves and the process of data cleaning using both base R and
tidyverse
approaches. The idea is essentially to allow data.frames to convert
seamlessly back and forth between universalmotif and data.frame format.
In this way, all tools for data.frame manipulation are enabled for free in a way
that is compatible with universalmotif objects. Ideally, this would also allow
new columns to be stored as extra_info
, but I haven't implemented anything
like that so far.
I was wondering if something like this may be of interest to formally port to
universalmotif
. Perhaps the object could be extended to enable different
views
similar to tidygraph
, where in one form, the data could be viewed as a
data.frame, or alternatively, in the current universalmotif
format.
If you think something like this is worthwhile, I would be happy to help
implement it.
Hi,
Thank you for write this package!
I am having troubles importing pwms data from cis_bp database. I already tried with both, the Bioconductor and the Github version.
What I am looking for a way to obtain the concensus sequences of all the downloaded pwms files from cis_bp database. One example:
read_cisbp("M11070_2.00.txt")
I am gettting the following warning:
Error in mapply(function(x, y) raw_lines[x:y], meta_starts, meta_stops, :
zero-length inputs cannot be mixed with those of non-zero length
How M11070_2.00.txt file looks?
Pos A C G T
1 0.304347826086957 0.0869565217391304 0.565217391304348 0.0434782608695652
2 0.0 0.0 0.0 1.0
3 1.0 0.0 0.0 0.0
4 1.0 0.0 0.0 0.0
5 0.0 1.0 0.0 0.0
6 0.0434782608695652 0.565217391304348 0.326086956521739 0.0652173913043478
7 0.0 0.0 1.0 0.0
8 0.108695652173913 0.173913043478261 0.0 0.717391304347826
9 0.0888888888888889 0.177777777777778 0.0 0.733333333333333
10 0.0666666666666667 0.0444444444444444 0.0 0.888888888888889
11 0.146341463414634 0.170731707317073 0.024390243902439 0.658536585365854
12 0.292682926829268 0.170731707317073 0.024390243902439 0.51219512195122
13 0.275 0.1 0.075 0.55
Could you please help me to find the issue here?
thank you!
Hi!
First of all, congratulations for the package. It is really useful.
After run_meme (MEME v. 5.05, 2400 sequences, 7000 background sequences, and trying with either objfun de and se), I always get the same error:
Error in universalmotif_cpp(name = x, type = "PPM", altname = x2, nsites = y[1], :
'bkg' vector is too short
MEME runs fine, the issue seems to be with read_meme. Any ideas? Is there something I may be doing wrong?
EDIT: used run_meme with classic mode, only sequences with no control, and get the same mistake.
The devel version also gives the same error.
I attach sessioninfo():
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] LC_CTYPE=es_ES.UTF-8 LC_NUMERIC=C LC_TIME=es_ES.UTF-8 LC_COLLATE=es_ES.UTF-8 LC_MONETARY=es_ES.UTF-8
[6] LC_MESSAGES=es_ES.UTF-8 LC_PAPER=es_ES.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] universalmotif_1.3.65 GrpString_0.3.2 Biostrings_2.53.2 XVector_0.25.0 IRanges_2.19.16 S4Vectors_0.23.25 BiocGenerics_0.31.6
loaded via a namespace (and not attached):
[1] treeio_1.9.3 gtools_3.8.1 tidyselect_0.2.5 purrr_0.3.2 lattice_0.20-38 colorspace_1.4-1 vctrs_0.2.0 htmltools_0.4.0
[9] yaml_2.2.0 rlang_0.4.0 later_1.0.0 pillar_1.4.2 glue_1.3.1 lifecycle_0.1.0 rvcheck_0.1.5 plyr_1.8.4
[17] stringr_1.4.0 ggseqlogo_0.1 zlibbioc_1.31.0 munsell_0.5.0 gtable_0.3.0 ps_1.3.0 httpuv_1.5.2 gbRd_0.4-11
[25] crosstalk_1.0.0 Rcpp_1.0.2 xtable_1.8-4 promises_1.1.0 scales_1.0.0 backports_1.1.5 BiocManager_1.30.7 jsonlite_1.6
[33] mime_0.7 ggplot2_3.2.1 digest_0.6.21 stringi_1.4.3 processx_3.4.1 dplyr_0.8.3 shiny_1.3.2 grid_3.6.1
[41] bibtex_0.4.2 ggtree_1.99.1 Rdpack_0.11-0 tools_3.6.1 magrittr_1.5 lazyeval_0.2.2 tibble_2.1.3 crayon_1.3.4
[49] ape_5.3 tidyr_1.0.0 pkgconfig_2.0.3 zeallot_0.1.0 tidytree_0.2.8 assertthat_0.2.1 rstudioapi_0.10 R6_2.4.0
[57] nlme_3.1-141 compiler_3.6.1
Thanks a lot :)
This project lacks community guidelines, as described in the JOSS review checklist.
Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support
A CONTRIBUTING.md file would solve this easily enough
Hi,
Thanks for developing the package! But I'm having a problem with getting scan_sequences
to calculate p-values.
I'm trying to run scan_sequences
with the motifs argument as a list of PWMatrixList objects, and sequences is a DNAStringSet.
Its runs ok with calc.pvals = FALSE
but when I set it to TRUE I get the following error message:
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘convert_motifs’ for signature ‘"NULL"’
The code is:
UM_scan_res_test <- scan_sequences(motifs=test_PWM,sequences = test_seq,threshold=0, threshold.type = "logodds", RC=TRUE,calc.pvals = TRUE,nthreads = 22)
I have tried using convert_motifs
to convert the list of PWMatrixLists into a list of universalmotif objects and still get the same error.
I have tried to work around the issue by calculating p-values separately using motif_pvalue
and can partially manage this with my data set. Unfortunately, there is a complicating secondary issue in that the results of scan_sequences
(a DFrame) do not incorporate the motif ID (only '@ListData$motif' which corresponds to '@ListData$@name' of my PWMatrixList). It would be really helpful if it also outputs the ID slot (and this is output using as.data.table
) which I think will always be unique. Where this all causes me headaches is the JASPAR database SPLICE collection where the motif names are not unique but the ID slot is. This means that I can output the results of the scan_sequences function (with scores), then use the scores as input to the motif_pvalue but I cannot supply the appropriate PWM in this particular case (it could be any of 6 because they are all named the same!).
I hope this makes sense. In the short term I will just not calculate p-values using univeralmotif for the JASPAR SPLICE collection.
As another aside I have deliberately set the threhold very low as I wanted to calculate a relative threshold in the same way TFBSTools does (and compare results). This uses a percentage (quantile) which is not percentage of the maximum score but a percentage of the range (from minimum to maximum)...this is digressing somewhat from the main issue though!
Hi, I am trying to use the shuffle_sequence function from the universalmotif package in order to shuffle a data containing ~19300 fasta sequeunces of 50 nt length. When I run my script using jus 10 sequences it works perfectly, but when I tried to used with the big data I get this error message :
fastaFile = readDNAStringSet("m6A_5-3-sequences.fasta")
euler <- shuffle_sequences(fastaFile, k = 2, method = "euler")
Error in edgematrix[i, alph.i[lastlets[[i]]]] :
incorrect number of dimensions
Hi All,
I've just run across this error where the stand information must not be "", however, convert_motifs
initially generated the "" when converting a matrix to a PFMatrix, see example below.
R> tmp <- matrix(1:20, nrow = 4, dimnames = list(c("A", "C", "T", "G")))
R> tmp
[,1] [,2] [,3] [,4] [,5]
A 1 5 9 13 17
C 2 6 10 14 18
T 3 7 11 15 19
G 4 8 12 16 20
R> convert_motifs(tmp, "TFBSTools-PFMatrix")
An object of class PFMatrix
ID:
Name: motif
Matrix Class: Unknown
strand: *
Tags:
list()
Background:
A C G T
0.25 0.25 0.25 0.25
Matrix:
N N N N N
A 1 2 2 2 2
C 2 2 2 2 2
G 4 3 3 3 3
T 3 3 3 3 3
R> convert_motifs(convert_motifs(tmp, "TFBSTools-PFMatrix"))
Error in validObject_universalmotif(motifs) :
* strand must be one of +, -, +-
Thanks for the help
-Dave
Hi there,
I'm new to this package.
When I tried to import motif from MEME, I got error information that
Error in get_custom_meme_alph(raw_lines) :
Alphabet type cannot be detected, custom alphabets are not currently supported
However my meme motif is in correct format where
ALPHABET= ACGT
My code is
motif_1 <- read_meme(system.file("/Users/Desktop/motif/pssm/up150_motif_1_pssm.txt",
package = "universalmotif"))
When I tried with this code below
motif_1 <- read_meme("/Users/Desktop/motif/pssm/up150_motif_1_pssm.txt")
It returned that
Warning message:
In readLines(con <- file(file)) :
incomplete final line found on '/Users/zhangruixuan/Desktop/medusavirus_paper/motif/pssm/up150_motif_1_pssm.txt'
I'm not sure whether this warning message meaning, the last part of motif format is letter-probability matrix and I'm sure it is complete. Why did this warning come out?
Could you give me some advices? Thank you in advance.
Hi,
I ran homer on a list of genomic regions, and my output generated 202 motifs.
Next, I ran I try to read the output using the read_homer function, but my R session aborts after 1-3 minutes. I do not get the same problem when I run the same code but when there are only 8 motifs in the directory (rather than 202). I'm not sure if the problem has something to do with the error I get (see below) about 'treeio'.
setwd("/Volumes/BINF1_Raid/home/aspepin/projects/H3K4me3_McMasterU/exp2/")
library(universalmotif)
# Bioconductor version 3.11 (BiocManager 1.30.12),
# ?BiocManager::install for help
# Bioconductor version '3.11' is out-of-date; the
# current release version '3.14' is available with R
# version '4.1'; see https://bioconductor.org/install
# Registered S3 method overwritten by 'treeio':
# method from
# root.phylo ape
#the directory below contains the output of homer i.e. 202 motifs
up.dir <- "./output/homer_vs_whole_genome/CDvsHFD_up/knownResults/"
up.motif.list <- list.files(up.dir,
pattern="*.motif",
full.names = TRUE,
recursive = TRUE)
up.motifs <- lapply(up.motif.list, function(x) read_homer(x))
Whether I run the above, or if I replace the last line (with lapply function) with the following:
up.motifs <- list()
up.motifs <- lapply(1:length(up.motif.list), function(i){
motifs <- read_homer(up.motif.list[i])
return(motifs)
})
My R session aborts and it says "R session aborted, encountered fatal error".
Any idea how I can solve this?
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.5
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets
[6] methods base
other attached packages:
[1] universalmotif_1.6.4
loaded via a namespace (and not attached):
[1] treeio_1.12.0 tinytex_0.31
[3] tidyselect_1.1.1 xfun_0.29
[5] purrr_0.3.4 lattice_0.20-41
[7] ggfun_0.0.5 colorspace_2.0-2
[9] vctrs_0.3.8 generics_0.1.1
[11] stats4_4.0.2 yaml_2.2.1
[13] utf8_1.2.2 gridGraphics_0.5-1
[15] rlang_0.4.12 pillar_1.6.4
[17] glue_1.6.0 DBI_1.1.2
[19] BiocGenerics_0.34.0 rvcheck_0.1.8
[21] lifecycle_1.0.1 ggseqlogo_0.1
[23] zlibbioc_1.34.0 Biostrings_2.56.0
[25] munsell_0.5.0 gtable_0.3.0
[27] IRanges_2.22.2 ps_1.6.0
[29] parallel_4.0.2 fansi_0.5.0
[31] Rcpp_1.0.7 scales_1.1.1
[33] BiocManager_1.30.12 S4Vectors_0.26.1
[35] jsonlite_1.7.2 XVector_0.28.0
[37] ggplot2_3.3.5 aplot_0.1.2
[39] processx_3.5.2 dplyr_1.0.7
[41] rbibutils_2.2.7 grid_4.0.2
[43] ggtree_2.2.4 Rdpack_2.1.4
[45] tools_4.0.2 yulab.utils_0.0.4
[47] magrittr_2.0.1 lazyeval_0.2.2
[49] patchwork_1.1.1 tibble_3.1.6
[51] crayon_1.4.2 ape_5.6-2
[53] tidyr_1.1.4 pkgconfig_2.0.3
[55] MASS_7.3-53.1 ellipsis_0.3.2
[57] tidytree_0.3.9 ggplotify_0.0.6
[59] assertthat_0.2.1 rstudioapi_0.13
[61] R6_2.5.1 nlme_3.1-152
[63] compiler_4.0.2
Given a list of sequences (as AAStringSet
), create_motif
returns an obviously wrong PWM and consensus. It appears that there are some issues with inconsistent ordering of amino acid label values.
create_motif
creates a matrix with row labels from Biostring's AA_STANDARD
. This is a list of single-letter amino acid codes that are in the order:
"A" "R" "N" "D" "C" "Q" "E" "G" "H" "I" "L" "K" "M" "F" "P" "S" "T" "W" "Y" "V"
Later manipulations of this matrix seem to expect the order to be alphabetical.
library(universalmotif)
library(Biostrings)
sequences <- AAStringSet(c("VTTDLQVKV", "STSDLLTLR", "TSLHLLVLR", "QALELLPRL", "LTDTLVSKL",
"TSLHLVLRL", "TSLRLLTSL", "LSTPVLRFT", "APEEHPVLL", "GSSDFLVKL",
"VTFLLPAGW", "LTSELLTHL", "TSSSLLLLR", "LSTEVNPKL", "QSLPTKETL",
"LLDPHVVLL", "SGLVLKVLL", "LTAHVEPLL", "STVKVLLRL", "FLDTVLLSW",
"LSKALVAYY", "KASSLVPKL", "LTADLARVL", "SGTDRQVTL", "TFDVALSPR",
"EDFTLLVNL", "FDDVAVVTF", "SGAYLKVSL", "LWDLSLLTR", "LTTKALYRN",
"GVAPLQVVK", "FFDPVTLHL", "LVSALQLLL", "TESKYYVTL", "LFDLFRFGF",
"LSVPLFKQF", "KRTLLDVVY", "KSFEAPLLK", "TTTPQQTKL", "SAADLPLNL",
"VSSKLLLVL", "QSLPTKETL", "VTLFKVAAP", "LTAHVEPLL", "LDVRYLLDL",
"TTGTLLKTL", "MLLDVYLTL", "SGLVVLKLL", "KSTDVFTTF", "LTAQHKLMA",
"HFDLLLRVN", "KALDSSKTF", "YNDEALLLR", "KSLTLTPQL", "YTRYGPKAF",
"MVAKKPNLL", "YQPDFYFEF", "KDLLMVPTF", "FSLPWRSST", "LPDSSPRTL",
"SMAALFVLL", "PELEVKVTV", "KTPVKVPVL", "LKLLLGLLL", "VLTTKLLVL",
"LSQRKSTSL", "KTTPDVLFV", "LEELSKYLF", "QSLPLFVQL", "KDTKTLVLL",
"HGFFLPEKL", "KLYYQEFKK", "HSLTEDVTL", "ASSTNLLHL", "NDAYLVQGL",
"STLLKFEAA", "HSAELLAEL", "PDLLTKLTF", "TFTKTQETL", "LSGRLLTVL",
"KPEVVFLLL", "KGFVGSFLV", "KAVDTSKTF", "FDDTTFGTF", "VQVVLMLLL",
"VALAKSLYY", "TAHDLLAEL", "KAAKKAPLN", "SYVKLLLSY", "LPLFVSLDL",
"VNFLVLVRT", "FLKAPLLFL", "TLPHLSESF", "TPHDPTVPL", "VDGKTLVNV",
"KLTSEVLNL", "EPFVLPLTW", "VTDLHKTSL", "KQKWLALLK", "MVAKKPNLL",
"SLRNVKVTL", "QNTLAVPEL", "PSPFAALVH", "HTFWGVVFF", "KSDVFLTEL",
"FTDARAYTT", "LTERFTLVF", "KGTSTTHLL", "VGNLRALVR", "KEASLQLVL",
"PVTTKPVTL", "KNASLYLLV", "HTAELVLVL", "TYDLQESNV", "TTATQVLLL",
"KDGLFWVLV", "VSTGLVKLR", "KANEKLAVK", "LVSVQVVLV", "VAKVNAYTF",
"KAAELQTGL", "QVKFAGVKL", "FDDDSKLFW", "AVRMVGLQL", "LSNVAYPVL",
"MFDDTELLF", "KLTLTEVEP", "YRSLGPALR", "LLARASLLL", "LTNSSTVTL",
"GTDLASFNL", "LCNAKLYLF", "KLEDFAFTF", "HTNALQTLL", "KVDSVYYLF",
"VVKAKVNAP", "TTLLKEVEP", "ASLPRSVLF", "WLAWSTFGE", "LTDGYKLTL",
"QGYEKLVEV", "KDGFTLFYF", "LWPLLAVAL", "LDPLSVKTF", "KVQQYAVKL",
"TDELSPHLL", "VDFLLATWF", "CYGRSVLNY", "GSTERNVTL", "KDEVYYVKL",
"KNKAAVLQL", "TDDYMELLF", "TTTLAKVEV", "FLGKALFFL", "LPDMSQPLW",
"KTRTEVSQY", "TEPEYLTEY", "HATTQNVLL", "KSDVFTLEL"))
Compare the output of create_motif
with that of consensusMatrix
:
create_motif(sequences, alphabet="AA", type="PWM")
consensusMatrix(sequences)
Note that I'm using v1.4.0 of universalmotif
, but I don't think this issue has been addressed subsequently.
sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.5
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] Biostrings_2.54.0 XVector_0.26.0 IRanges_2.20.0 S4Vectors_0.24.0 BiocGenerics_0.32.0 universalmotif_1.4.0
loaded via a namespace (and not attached):
[1] treeio_1.10.0 tidyselect_0.2.5 remotes_2.1.0 purrr_0.3.3 lattice_0.20-38 colorspace_1.4-1
[7] vctrs_0.2.0 yaml_2.2.0 rlang_0.4.1 pkgbuild_1.0.6 pillar_1.4.2 glue_1.3.1
[13] withr_2.1.2 rvcheck_0.1.6 lifecycle_0.1.0 stringr_1.4.0 ggseqlogo_0.1 zlibbioc_1.32.0
[19] munsell_0.5.0 gtable_0.3.0 callr_3.3.2 ps_1.3.0 gbRd_0.4-11 curl_4.2
[25] Rcpp_1.0.2 scales_1.0.0 backports_1.1.5 BiocManager_1.30.9 jsonlite_1.6 ggplot2_3.2.1
[31] packrat_0.5.0 stringi_1.4.3 processx_3.4.1 dplyr_0.8.3 grid_3.6.1 rprojroot_1.3-2
[37] bibtex_0.4.2 ggtree_2.0.0 Rdpack_0.11-0 cli_1.1.0 tools_3.6.1 magrittr_1.5
[43] lazyeval_0.2.2 tibble_2.1.3 crayon_1.3.4 ape_5.3 tidyr_1.0.0 pkgconfig_2.0.3
[49] zeallot_0.1.0 MASS_7.3-51.4 tidytree_0.2.9 prettyunits_1.0.2 assertthat_0.2.1 rstudioapi_0.10
[55] R6_2.4.0 nlme_3.1-141 compiler_3.6.1
I was just messing around with a set of motifs and I wanted to selectively RC one while using a universalmotif_df.
df <- to_df(c(create_motif, create_motif(name = "motif2")))
df %>%
mutate(motif = ifelse(name == "motif2", motif_rc(motif), motif))
Of course, this doesn't work because of the AsIs
class.
So this got me thinking if there could be a way for folks to deprotect the motif for cases like this. Users doing this would know the operation is unsafe. Here's the silly implementation I came up with:
edit_motif <- function(m){
class(m) <- NULL
m
}
So the above becomes:
df <- to_df(c(create_motif, create_motif(name = "motif2")))
df %>%
mutate(motif = ifelse(name == "motif2", motif_rc(edit_motif(motif)), edit_motif(motif)))
Not sure if this is the perfect solution. For instance, this won't keep the AsIs
attribute after the mutate, so maybe a macro for wrapping the whole ifelse
operation to deprotect could work? I'll do some more thinking on this, but figured I'd post it now.
df <- create_motif("AAT", name = "motif") %>%
to_df()
df$new_col <- "test"
df2 <- update_motifs(df, extrainfo = TRUE)
df2$new_col # Doesn't exist!
df2$einfo
> "test"
The issue is not having a drop = FALSE
in extrainfo_to_df
. PR incoming.
BiocManager::install("universalmotif")
Bioconductor version 3.12 (BiocManager 1.30.10), R 4.0.2 (2020-06-22) Installing package(s) 'universalmotif' trying URL 'https://bioconductor.org/packages/3.12/bioc/src/contrib/universalmotif_1.8.3.tar.gz' Content type 'application/x-gzip' length 3732415 bytes (3.6 MB) ================================================== downloaded 3.6 MB * installing *source* package ‘universalmotif’ ... ** using staged installation ** libs g++ -std=gnu++11 -I"/usr/local/lib64/R/include" -DNDEBUG -I'/home/zhou/Rlib/Rcpp/include' -I'/home/zhou/Rlib/RcppThread/include' -I/usr/local/include -fpic -g -O2 -c RcppExports.cpp -o RcppExports.o g++ -std=gnu++11 -I"/usr/local/lib64/R/include" -DNDEBUG -I'/home/zhou/Rlib/Rcpp/include' -I'/home/zhou/Rlib/RcppThread/include' -I/usr/local/include -fpic -g -O2 -c add_multifreq.cpp -o add_multifreq.o g++ -std=gnu++11 -I"/usr/local/lib64/R/include" -DNDEBUG -I'/home/zhou/Rlib/Rcpp/include' -I'/home/zhou/Rlib/RcppThread/include' -I/usr/local/include -fpic -g -O2 -c compare_motifs.cpp -o compare_motifs.o In file included from /home/zhou/Rlib/RcppThread/include/RcppThread.h:11:0, from compare_motifs.cpp:2: /home/zhou/Rlib/RcppThread/include/RcppThread/Thread.hpp: In lambda function: /home/zhou/Rlib/RcppThread/include/RcppThread/Thread.hpp:42:19: error: parameter packs not expanded with ‘...’: f(args...); ^ /home/zhou/Rlib/RcppThread/include/RcppThread/Thread.hpp:42:19: note: ‘args’ /home/zhou/Rlib/RcppThread/include/RcppThread/Thread.hpp:42:23: error: expansion pattern ‘args’ contains no argument packs f(args...); ^ In file included from /home/zhou/Rlib/RcppThread/include/RcppThread.h:13:0, from compare_motifs.cpp:2: /home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp: In member function ‘void RcppThread::ThreadPool::push(F&&, Args&& ...)’: /home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:129:31: error: expected ‘,’ before ‘...’ token jobs_.emplace([f, args...] { f(args...); }); ^ /home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:129:31: error: expected identifier before ‘...’ token /home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:129:34: error: parameter packs not expanded with ‘...’: jobs_.emplace([f, args...] { f(args...); }); ^ /home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:129:34: note: ‘args’ /home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp: In lambda function: /home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:129:44: error: expansion pattern ‘args’ contains no argument packs jobs_.emplace([f, args...] { f(args...); }); ^ /home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp: In member function ‘std::future<decltype (f(args ...))> RcppThread::ThreadPool::pushReturn(F&&, Args&& ...)’: /home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:146:54: error: expected ‘,’ before ‘...’ token auto job = std::make_shared<jobPackage>([&f, args...] { ^ /home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:146:54: error: expected identifier before ‘...’ token /home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:146:57: error: parameter packs not expanded with ‘...’: auto job = std::make_shared<jobPackage>([&f, args...] { ^ /home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:146:57: note: ‘args’ /home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp: In lambda function: /home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:147:22: error: expansion pattern ‘args’ contains no argument packs return f(args...); ^ /home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp: In instantiation of ‘struct RcppThread::ThreadPool::push(F&&, Args&& ...) [with F = RcppThread::ThreadPool::parallelFor(ptrdiff_t, ptrdiff_t, F&&, size_t) [with F = compare_motifs_cpp(const List&, const std::vector<int>&, const std::vector<int>&, const string&, double, bool, std::vector<std::vector<double> >&, int, bool, double, bool, int, double, const std::vector<double>&, const string&)::__lambda13&; ptrdiff_t = long int; size_t = long unsigned int]::__lambda8&; Args = {const RcppThread::Batch&}]::__lambda5’: /home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:129:9: required from ‘void RcppThread::ThreadPool::push(F&&, Args&& ...) [with F = RcppThread::ThreadPool::parallelFor(ptrdiff_t, ptrdiff_t, F&&, size_t) [with F = compare_motifs_cpp(const List&, const std::vector<int>&, const std::vector<int>&, const string&, double, bool, std::vector<std::vector<double> >&, int, bool, double, bool, int, double, const std::vector<double>&, const string&)::__lambda13&; ptrdiff_t = long int; size_t = long unsigned int]::__lambda8&; Args = {const RcppThread::Batch&}]’ /home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:201:9: required from ‘void RcppThread::ThreadPool::parallelFor(ptrdiff_t, ptrdiff_t, F&&, size_t) [with F = compare_motifs_cpp(const List&, const std::vector<int>&, const std::vector<int>&, const string&, double, bool, std::vector<std::vector<double> >&, int, bool, double, bool, int, double, const std::vector<double>&, const string&)::__lambda13&; ptrdiff_t = long int; size_t = long unsigned int]’ /home/zhou/Rlib/RcppThread/include/RcppThread/parallelFor.hpp:48:5: required from ‘void RcppThread::parallelFor(ptrdiff_t, ptrdiff_t, F&&, size_t, size_t) [with F = compare_motifs_cpp(const List&, const std::vector<int>&, const std::vector<int>&, const string&, double, bool, std::vector<std::vector<double> >&, int, bool, double, bool, int, double, const std::vector<double>&, const string&)::__lambda13; ptrdiff_t = long int; size_t = long unsigned int]’ compare_motifs.cpp:1426:18: required from here /home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:129:27: error: using invalid field ‘RcppThread::ThreadPool::push(F&&, Args&& ...)::__lambda5::__args’ jobs_.emplace([f, args...] { f(args...); }); ^ /home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp: In instantiation of ‘struct RcppThread::ThreadPool::push(F&&, Args&& ...) [with F = RcppThread::ThreadPool::parallelFor(ptrdiff_t, ptrdiff_t, F&&, size_t) [with F = compare_motifs_all_cpp(const List&, const string&, double, bool, std::vector<std::vector<double> >&, int, bool, double, bool, int, double, const std::vector<double>&, const string&)::__lambda14&; ptrdiff_t = long int; size_t = long unsigned int]::__lambda8&; Args = {const RcppThread::Batch&}]::__lambda5’: /home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:129:9: required from ‘void RcppThread::ThreadPool::push(F&&, Args&& ...) [with F = RcppThread::ThreadPool::parallelFor(ptrdiff_t, ptrdiff_t, F&&, size_t) [with F = compare_motifs_all_cpp(const List&, const string&, double, bool, std::vector<std::vector<double> >&, int, bool, double, bool, int, double, const std::vector<double>&, const string&)::__lambda14&; ptrdiff_t = long int; size_t = long unsigned int]::__lambda8&; Args = {const RcppThread::Batch&}]’ /home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:201:9: required from ‘void RcppThread::ThreadPool::parallelFor(ptrdiff_t, ptrdiff_t, F&&, size_t) [with F = compare_motifs_all_cpp(const List&, const string&, double, bool, std::vector<std::vector<double> >&, int, bool, double, bool, int, double, const std::vector<double>&, const string&)::__lambda14&; ptrdiff_t = long int; size_t = long unsigned int]’ /home/zhou/Rlib/RcppThread/include/RcppThread/parallelFor.hpp:48:5: required from ‘void RcppThread::parallelFor(ptrdiff_t, ptrdiff_t, F&&, size_t, size_t) [with F = compare_motifs_all_cpp(const List&, const string&, double, bool, std::vector<std::vector<double> >&, int, bool, double, bool, int, double, const std::vector<double>&, const string&)::__lambda14; ptrdiff_t = long int; size_t = long unsigned int]’ compare_motifs.cpp:1485:18: required from here /home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:129:27: error: using invalid field ‘RcppThread::ThreadPool::push(F&&, Args&& ...)::__lambda5::__args’ /home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp: In instantiation of ‘struct RcppThread::ThreadPool::push(F&&, Args&& ...) [with F = RcppThread::ThreadPool::parallelFor(ptrdiff_t, ptrdiff_t, F&&, size_t) [with F = pval_extractor(const std::vector<int>&, const std::vector<double>&, const std::vector<int>&, const std::vector<int>&, const string&, const std::vector<int>&, const std::vector<int>&, const std::vector<double>&, const std::vector<double>&, const std::vector<std::basic_string<char> >&, int)::__lambda15&; ptrdiff_t = long int; size_t = long unsigned int]::__lambda8&; Args = {const RcppThread::Batch&}]::__lambda5’: /home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:129:9: required from ‘void RcppThread::ThreadPool::push(F&&, Args&& ...) [with F = RcppThread::ThreadPool::parallelFor(ptrdiff_t, ptrdiff_t, F&&, size_t) [with F = pval_extractor(const std::vector<int>&, const std::vector<double>&, const std::vector<int>&, const std::vector<int>&, const string&, const std::vector<int>&, const std::vector<int>&, const std::vector<double>&, const std::vector<double>&, const std::vector<std::basic_string<char> >&, int)::__lambda15&; ptrdiff_t = long int; size_t = long unsigned int]::__lambda8&; Args = {const RcppThread::Batch&}]’ /home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:201:9: required from ‘void RcppThread::ThreadPool::parallelFor(ptrdiff_t, ptrdiff_t, F&&, size_t) [with F = pval_extractor(const std::vector<int>&, const std::vector<double>&, const std::vector<int>&, const std::vector<int>&, const string&, const std::vector<int>&, const std::vector<int>&, const std::vector<double>&, const std::vector<double>&, const std::vector<std::basic_string<char> >&, int)::__lambda15&; ptrdiff_t = long int; size_t = long unsigned int]’ /home/zhou/Rlib/RcppThread/include/RcppThread/parallelFor.hpp:48:5: required from ‘void RcppThread::parallelFor(ptrdiff_t, ptrdiff_t, F&&, size_t, size_t) [with F = pval_extractor(const std::vector<int>&, const std::vector<double>&, const std::vector<int>&, const std::vector<int>&, const string&, const std::vector<int>&, const std::vector<int>&, const std::vector<double>&, const std::vector<double>&, const std::vector<std::basic_string<char> >&, int)::__lambda15; ptrdiff_t = long int; size_t = long unsigned int]’ compare_motifs.cpp:1859:18: required from here /home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:129:27: error: using invalid field ‘RcppThread::ThreadPool::push(F&&, Args&& ...)::__lambda5::__args’ make: *** [compare_motifs.o] Error 1 ERROR: compilation failed for package ‘universalmotif’ * removing ‘/home/zhou/Rlib/universalmotif’ The downloaded source packages are in ‘/tmp/RtmpS1nRO3/downloaded_packages’ Installation path not writeable, unable to update packages: boot, class, cluster, codetools, foreign, KernSmooth, MASS, Matrix, mgcv, nlme, nnet, spatial, survival Warning message: In install.packages(...) : installation of package ‘universalmotif’ had non-zero exit status
--
> read_meme("dreme.txt")
Error in strsplit(alph, "\\s+")[[1]] : subscript out of bounds
This happens when the ALPHABET string defines a DNA/RNA/AA-LIKE or custom alphabet.
A few outcomes could happen here:
4 is quite the undertaking, so I've implemented 2 & 3. PR incoming.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.