universalmotif's Issues

altname is dropped during summarise_motifs

Although as.data.frame behaves correctly, summarise_motifs always drops the 'altname' column.

m <- create_motif()

m["altname"] <- "myAlt"
m["organism"] <- "drosophila"
m["family"] <- "ETS"

df <- universalmotif::as.data.frame(m)
summary <- universalmotif::summarise_motifs(m, na.rm = FALSE)
df$altname == summary$altname

This is caused by a typo in summarise_motifs_cpp.

Can't run meme with custom alphabet

Hello!
I can't run the run_meme function with a custom alphabet. The first traceable error is in 'get_custom_meme_alph', where the alphabet file is read. It searches for a match to "END ALPHABET", which can't exist if the alphabet is to pass the other checks. I have made a small reproducible example with Biostrings version 2.64 and universalmotif version 1.13.9, where I create the custom methylated DNA alphabet presented in the vignettes (https://bioconductor.org/packages/release/bioc/manuals/universalmotif/man/universalmotif.pdf), write it to a file, and then try to use it.

library(universalmotif)
library(Biostrings)

meme_alph("ACm", complements = "TGM", alph.name = "MethDNA",
          letter.names = c(A = "Adenine", C = "Cytosine", G = "Guanine",
                           T = "Thymine", m = "Methylcytosine", M = "mC:Guanine"),
          like = "DNA", ambiguity = c(N = "ACGTmM"),
          file = "temp.alph")

sequences <- DNAStringSet(c("ccaagtttcaa", "taaagttgtat", "aaatgccctac", "tccaagtttca", "gtaccttgtga", "ttaaaatcgtc", "gaacactttct", "aatcacttttg", "ttgtcctggcg"))
# Runs great with standard DNA alphabet
first_attempt <- run_meme(sequences, bin = "/path/to/bin", verbose = 0)
# Doesn't run
second_attempt <- run_meme(sequences, bin = "/path/to/bin", verbose = 0, alph = "temp.alph")

Error in convert_motifs for TFBSTools-PFMatrixList

I am trying to query JASPAR for all the Homo sapiens TFs, but when I try to convert the PFMatrixList I get the following error.

Error in if (mot_classes == classin) return(motifs) :
missing value where TRUE/FALSE needed

library(JASPAR2018)
library(TFBSTools)
library(universalmotif)

opts <- list()

opts[["species"]] <- 9606

opts[["all_versions"]] <- TRUE

PFMatrixList.jaspar <- getMatrixSet(JASPAR2018, opts)

test <- convert_motifs(PFMatrixList.jaspar)
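
For context, the message quoted above is what base R raises whenever an `if` condition evaluates to NA. The sketch below reproduces that mechanism without any motif packages. The names `mot_class` and `classin` are stand-ins chosen to echo the error text; the values actually seen inside convert_motifs are not known from this report.

```r
# Stand-in values: a target class and a class lookup that came back NA.
classin <- "some-target-class"
mot_class <- NA_character_

# `NA == "x"` is NA, and `if (NA)` stops with the message quoted above.
msg <- tryCatch(
  if (mot_class == classin) "same" else "different",
  error = function(e) conditionMessage(e)
)
print(msg)

# A guarded comparison treats NA as "not equal" instead of failing.
same_class <- function(a, b) isTRUE(a == b)
print(same_class(mot_class, classin))
print(same_class(classin, classin))
```

Wrapping the comparison in `isTRUE()` is only one possible guard; whether it is the right fix depends on why the class lookup yields NA in the first place.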

Improve support for metadata-based manipulations

This may be beyond the purview of universalmotif, but I wanted to float this
idea with you because, to me, universalmotif has done a beautiful job of
standardizing the process of manipulating motifs in R and I would like for it to
grow to become the de facto standard data structure for these tasks. I really
love this package.

With this in mind, one area I feel could be improved is how motifs are
manipulated and filtered in bulk. filter_motifs does a good job of filtering
on specific values, but an aspect I have struggled with in the past is the
process of tidying motifs and their metadata. In particular because I have to
write lots of boilerplate each time to do so.

For example, certain entries in motifDb have out of date information, because
the annotations it pulls from are not always well maintained (this is
particularly true for model organisms like Drosophila). In these instances,
it's necessary to make changes like renaming motifs, or join metadata from
disparate sources into 1 object.

For example, say I have a set of motifs which are named: "geneA_FBgn1000",
"geneB_FBgn2000", and I would like the universalmotif name slot to hold:
"geneA" and "geneB", while altname holds "FBgn1000", "FBgn2000". I need to
loop over each universalmotif object in the list, extract the name slot (using
@ is the only way I know to do this), split on "_", then directly reassign
each slot, again using @. At the very least, I think there should be some
setters/getters for each slot so this can be vectorized.
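
As a rough illustration of the vectorized version of this renaming, the sketch below works on a plain character vector in base R. The motif names are the hypothetical ones from the paragraph above, and the step of writing the results back into universalmotif objects is deliberately left out.

```r
# Hypothetical motif names of the form "<symbol>_<FlyBase ID>".
motif_names <- c("geneA_FBgn1000", "geneB_FBgn2000")

# Split at the first underscore only, so IDs that contain "_" survive.
new_name <- sub("_.*$", "", motif_names)
new_altname <- sub("^[^_]*_", "", motif_names)

meta <- data.frame(name = new_name, altname = new_altname,
                   stringsAsFactors = FALSE)
print(meta)
```

With getters and setters of the kind requested here, the two vectors could be assigned back in one step each instead of looping over the list with @.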

Extending this idea somewhat, say I now want to do this for data in the
extra_info slot, there's more boilerplate to deal with the assignment, and if
I'm thinking about this correctly, the potential for serious slowdowns because
the list will have to be reallocated to grow in size.

In other words, if universalmotif objects could become as easy to manipulate
as data.frames, this could greatly improve the workflows of motif-centric
analysis.

To explore these ideas, I created a prototype object using data.frames and
universalmotif together (see here). In my
experiments, this structure has greatly improved the process of analyzing motif
databases themselves and the process of data cleaning using both base R and
tidyverse approaches. The idea is essentially to allow data.frames to convert
seamlessly back and forth between universalmotif and data.frame format.
In this way, all tools for data.frame manipulation are enabled for free in a way
that is compatible with universalmotif objects. Ideally, this would also allow
new columns to be stored as extra_info, but I haven't implemented anything
like that so far.

I was wondering if something like this may be of interest to formally port to
universalmotif. Perhaps the object could be extended to enable different
views similar to tidygraph, where in one form, the data could be viewed as a
data.frame, or alternatively, in the current universalmotif format.

If you think something like this is worthwhile, I would be happy to help
implement it.

Importing PWMs with read_cisbp

Hi,

Thank you for writing this package!
I am having trouble importing PWM data from the CIS-BP database. I have already tried both the Bioconductor and the GitHub versions.
I am looking for a way to obtain the consensus sequences of all the PWM files downloaded from the CIS-BP database. One example:

read_cisbp("M11070_2.00.txt")

I am getting the following error:

Error in mapply(function(x, y) raw_lines[x:y], meta_starts, meta_stops, :
zero-length inputs cannot be mixed with those of non-zero length

What the M11070_2.00.txt file looks like:

Pos A C G T
1 0.304347826086957 0.0869565217391304 0.565217391304348 0.0434782608695652
2 0.0 0.0 0.0 1.0
3 1.0 0.0 0.0 0.0
4 1.0 0.0 0.0 0.0
5 0.0 1.0 0.0 0.0
6 0.0434782608695652 0.565217391304348 0.326086956521739 0.0652173913043478
7 0.0 0.0 1.0 0.0
8 0.108695652173913 0.173913043478261 0.0 0.717391304347826
9 0.0888888888888889 0.177777777777778 0.0 0.733333333333333
10 0.0666666666666667 0.0444444444444444 0.0 0.888888888888889
11 0.146341463414634 0.170731707317073 0.024390243902439 0.658536585365854
12 0.292682926829268 0.170731707317073 0.024390243902439 0.51219512195122
13 0.275 0.1 0.075 0.55

Could you please help me to find the issue here?
thank you!
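
The file shown above holds only a position-by-letter table, and the error mentions `meta_starts`, which suggests (as an assumption, not a confirmed diagnosis) that the reader is looking for metadata lines the file does not contain. As a minimal base R sketch of one way around this, the table can be parsed directly and a simple consensus taken from it. Only the first five rows are inlined here, to keep the example self-contained.

```r
# First five rows of the file shown above, inlined for a self-contained sketch.
txt <- "Pos A C G T
1 0.304347826086957 0.0869565217391304 0.565217391304348 0.0434782608695652
2 0.0 0.0 0.0 1.0
3 1.0 0.0 0.0 0.0
4 1.0 0.0 0.0 0.0
5 0.0 1.0 0.0 0.0"

tab <- read.table(text = txt, header = TRUE)

# Rows become letters and columns become positions.
ppm <- t(as.matrix(tab[, c("A", "C", "G", "T")]))
colnames(ppm) <- tab$Pos

# Simple consensus: the most probable letter per position (no ambiguity codes).
consensus <- paste(rownames(ppm)[apply(ppm, 2, which.max)], collapse = "")
print(consensus)
```

For a real file, `read.table("M11070_2.00.txt", header = TRUE)` would replace the inlined text. A matrix in this layout could then presumably be handed to create_motif() to get a universalmotif object, though that step is not shown or tested here.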

Error with read_meme

Hi!

First of all, congratulations for the package. It is really useful.

After run_meme (MEME v. 5.05, 2400 sequences, 7000 background sequences, and trying with either objfun de and se), I always get the same error:

Error in universalmotif_cpp(name = x, type = "PPM", altname = x2, nsites = y[1], :
'bkg' vector is too short

MEME runs fine, the issue seems to be with read_meme. Any ideas? Is there something I may be doing wrong?

EDIT: I used run_meme in classic mode, with sequences only and no control, and got the same error.
The devel version also gives the same error.

I attach sessionInfo():

R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 
 
locale:
 [1] LC_CTYPE=es_ES.UTF-8       LC_NUMERIC=C               LC_TIME=es_ES.UTF-8        LC_COLLATE=es_ES.UTF-8     LC_MONETARY=es_ES.UTF-8   
 [6] LC_MESSAGES=es_ES.UTF-8    LC_PAPER=es_ES.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] universalmotif_1.3.65 GrpString_0.3.2       Biostrings_2.53.2     XVector_0.25.0        IRanges_2.19.16       S4Vectors_0.23.25     BiocGenerics_0.31.6  

loaded via a namespace (and not attached):
 [1] treeio_1.9.3       gtools_3.8.1       tidyselect_0.2.5   purrr_0.3.2        lattice_0.20-38    colorspace_1.4-1   vctrs_0.2.0        htmltools_0.4.0   
 [9] yaml_2.2.0         rlang_0.4.0        later_1.0.0        pillar_1.4.2       glue_1.3.1         lifecycle_0.1.0    rvcheck_0.1.5      plyr_1.8.4        
[17] stringr_1.4.0      ggseqlogo_0.1      zlibbioc_1.31.0    munsell_0.5.0      gtable_0.3.0       ps_1.3.0           httpuv_1.5.2       gbRd_0.4-11       
[25] crosstalk_1.0.0    Rcpp_1.0.2         xtable_1.8-4       promises_1.1.0     scales_1.0.0       backports_1.1.5    BiocManager_1.30.7 jsonlite_1.6      
[33] mime_0.7           ggplot2_3.2.1      digest_0.6.21      stringi_1.4.3      processx_3.4.1     dplyr_0.8.3        shiny_1.3.2        grid_3.6.1        
[41] bibtex_0.4.2       ggtree_1.99.1      Rdpack_0.11-0      tools_3.6.1        magrittr_1.5       lazyeval_0.2.2     tibble_2.1.3       crayon_1.3.4      
[49] ape_5.3            tidyr_1.0.0        pkgconfig_2.0.3    zeallot_0.1.0      tidytree_0.2.8     assertthat_0.2.1   rstudioapi_0.10    R6_2.4.0          
[57] nlme_3.1-141       compiler_3.6.1 

Thanks a lot :)

[JOSS REVIEW] No community/contribution guidelines

This project lacks community guidelines, as described in the JOSS review checklist.

Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

A CONTRIBUTING.md file would solve this easily enough

Cannot get scan_sequences to report p-values

Hi,
Thanks for developing the package! But I'm having a problem with getting scan_sequences to calculate p-values.

I'm trying to run scan_sequences with the motifs argument as a list of PWMatrixList objects, and sequences is a DNAStringSet.
It runs OK with calc.pvals = FALSE, but when I set it to TRUE I get the following error message:

Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘convert_motifs’ for signature ‘"NULL"’

The code is:
UM_scan_res_test <- scan_sequences(motifs=test_PWM,sequences = test_seq,threshold=0, threshold.type = "logodds", RC=TRUE,calc.pvals = TRUE,nthreads = 22)

I have tried using convert_motifs to convert the list of PWMatrixLists into a list of universalmotif objects and still get the same error.

I have tried to work around the issue by calculating p-values separately using motif_pvalue and can partially manage this with my data set. Unfortunately, there is a complicating secondary issue in that the results of scan_sequences (a DFrame) do not incorporate the motif ID (only '@ListData$motif' which corresponds to '@ListData$@name' of my PWMatrixList). It would be really helpful if it also outputs the ID slot (and this is output using as.data.table) which I think will always be unique. Where this all causes me headaches is the JASPAR database SPLICE collection where the motif names are not unique but the ID slot is. This means that I can output the results of the scan_sequences function (with scores), then use the scores as input to the motif_pvalue but I cannot supply the appropriate PWM in this particular case (it could be any of 6 because they are all named the same!).

I hope this makes sense. In the short term I will just not calculate p-values using universalmotif for the JASPAR SPLICE collection.

As another aside, I have deliberately set the threshold very low because I wanted to calculate a relative threshold in the same way TFBSTools does (and compare results). This uses a percentage (quantile) which is not a percentage of the maximum score but a percentage of the range (from minimum to maximum). This digresses somewhat from the main issue, though!
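
To make the distinction in the last paragraph concrete, here is a small base R sketch of a threshold taken as a fraction of the score range rather than of the maximum score. The log-odds matrix is made up for illustration, and `range_threshold` is a hypothetical helper name, not a function from universalmotif or TFBSTools.

```r
# Toy log-odds PWM (rows = letters, columns = positions); values are made up.
pwm <- matrix(c( 1.0, -2.0,  0.5,
                -1.0,  1.5, -0.5,
                -0.5, -1.0,  1.0,
                 0.0,  0.5, -2.0),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), NULL))

# Lowest and highest achievable scores: sum the per-position minima and maxima.
min_score <- sum(apply(pwm, 2, min))
max_score <- sum(apply(pwm, 2, max))

# Threshold as a fraction q of the range from minimum to maximum.
range_threshold <- function(q) min_score + q * (max_score - min_score)

print(c(min = min_score, max = max_score))
print(range_threshold(0.8))
print(0.8 * max_score)  # fraction of the maximum only, for comparison
```

The two definitions coincide only when the minimum score is zero, which is why results can differ between tools that use the same nominal percentage.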

shuffle_sequence truncated

Hi, I am trying to use the shuffle_sequences function from the universalmotif package to shuffle a data set containing ~19300 FASTA sequences of 50 nt length. When I run my script using just 10 sequences it works perfectly, but when I tried it with the full data set I got this error message:
fastaFile = readDNAStringSet("m6A_5-3-sequences.fasta")
euler <- shuffle_sequences(fastaFile, k = 2, method = "euler")

Error in edgematrix[i, alph.i[lastlets[[i]]]] :
incorrect number of dimensions
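
As general background, and not a diagnosis of shuffle_sequences itself, this message is what base R gives when two-index subsetting is applied to something that is no longer a matrix. One common way that happens is the default dimension dropping of `[`, sketched below with a stand-in object named after the one in the error.

```r
# A 1 x 3 matrix, as might arise when only one row is left.
edgematrix <- matrix(1:3, nrow = 1)

# The default `[` drops the matrix down to a plain vector...
dropped <- edgematrix[1, ]
print(is.null(dim(dropped)))

# ...and two-index subsetting on a vector fails with the quoted message.
msg <- tryCatch(dropped[1, 2], error = function(e) conditionMessage(e))
print(msg)

# drop = FALSE keeps the dimensions, so later [i, j] indexing still works.
kept <- edgematrix[1, , drop = FALSE]
print(kept[1, 2])
```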

validObject_universalmotif(motifs) fails when strand is "*"

Hi All,

I've just run across this error, where the strand information must not be "*". However, convert_motifs itself generated the "*" when converting a matrix to a PFMatrix; see the example below.

R> tmp <- matrix(1:20, nrow = 4, dimnames = list(c("A", "C", "T", "G")))
R> tmp
  [,1] [,2] [,3] [,4] [,5]
A    1    5    9   13   17
C    2    6   10   14   18
T    3    7   11   15   19
G    4    8   12   16   20

R> convert_motifs(tmp, "TFBSTools-PFMatrix")
An object of class PFMatrix
ID: 
Name: motif
Matrix Class: Unknown
strand: *
Tags: 
list()
Background: 
   A    C    G    T 
0.25 0.25 0.25 0.25 
Matrix: 
  N N N N N
A 1 2 2 2 2
C 2 2 2 2 2
G 4 3 3 3 3
T 3 3 3 3 3

R> convert_motifs(convert_motifs(tmp, "TFBSTools-PFMatrix"))
Error in validObject_universalmotif(motifs) : 
* strand must be one of +, -, +-

Thanks for the help

-Dave

read_meme() reports "alphabet type cannot be detected" although "ALPHABET= ACGT" is provided

Hi there,

I'm new to this package.

When I tried to import a motif from MEME, I got the following error:

Error in get_custom_meme_alph(raw_lines) :
Alphabet type cannot be detected, custom alphabets are not currently supported

However my meme motif is in correct format where

ALPHABET= ACGT

My code is

motif_1 <- read_meme(system.file("/Users/Desktop/motif/pssm/up150_motif_1_pssm.txt",
                                           package = "universalmotif"))

When I tried with this code below

motif_1 <- read_meme("/Users/Desktop/motif/pssm/up150_motif_1_pssm.txt")

It returned that

Warning message:
In readLines(con <- file(file)) :
incomplete final line found on '/Users/zhangruixuan/Desktop/medusavirus_paper/motif/pssm/up150_motif_1_pssm.txt'

I'm not sure what this warning message means. The last part of the motif file is the letter-probability matrix, and I'm sure it is complete. Why did this warning appear?

Could you give me some advice? Thank you in advance.
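
Two base R behaviours are relevant to the two attempts above, and both can be checked without the package. First, `system.file()` only resolves paths inside an installed package, so for any other path it returns an empty string. Second, `readLines()` emits this warning when the last line of a file has no trailing newline, while still returning the full content. The file name and contents in this sketch are made up.

```r
# system.file() looks inside an installed package; for a path that is not part
# of that package it returns "", so nothing is actually read.
p <- system.file("no/such/dir/motif.txt", package = "base")
print(p)

# A text file whose last line lacks a trailing newline triggers the warning.
f <- tempfile(fileext = ".txt")
cat("ALPHABET= ACGT", file = f)   # no "\n" at the end

warn <- tryCatch(readLines(f), warning = function(w) conditionMessage(w))
print(warn)

# The content is still read in full when the warning is allowed through.
content <- suppressWarnings(readLines(f))
print(content)
```

So the warning by itself does not indicate a truncated matrix; adding a final newline to the file should silence it.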

R session aborted / fatal error when running read_homer from a list of motifs

Hi,
I ran homer on a list of genomic regions, and my output generated 202 motifs.
Next, I tried to read the output using the read_homer function, but my R session aborts after 1-3 minutes. I do not get the same problem when I run the same code with only 8 motifs in the directory (rather than 202). I'm not sure if the problem has something to do with the message I get (see below) about 'treeio'.

setwd("/Volumes/BINF1_Raid/home/aspepin/projects/H3K4me3_McMasterU/exp2/")
library(universalmotif)

# Bioconductor version 3.11 (BiocManager 1.30.12),
# ?BiocManager::install for help
# Bioconductor version '3.11' is out-of-date; the
# current release version '3.14' is available with R
# version '4.1'; see https://bioconductor.org/install
# Registered S3 method overwritten by 'treeio':
#   method     from
# root.phylo ape 

#the directory below contains the output of homer i.e. 202 motifs
up.dir <- "./output/homer_vs_whole_genome/CDvsHFD_up/knownResults/" 
up.motif.list <- list.files(up.dir,
                         pattern="*.motif",
                         full.names = TRUE,
                         recursive = TRUE)
up.motifs <- lapply(up.motif.list, function(x) read_homer(x))

Whether I run the above, or if I replace the last line (with lapply function) with the following:

up.motifs <- list()
up.motifs <- lapply(1:length(up.motif.list), function(i){
  motifs <- read_homer(up.motif.list[i])
  return(motifs)
})

My R session aborts and it says "R session aborted, encountered fatal error".

Any idea how I can solve this?

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.5

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets 
[6] methods   base     

other attached packages:
[1] universalmotif_1.6.4

loaded via a namespace (and not attached):
 [1] treeio_1.12.0       tinytex_0.31       
 [3] tidyselect_1.1.1    xfun_0.29          
 [5] purrr_0.3.4         lattice_0.20-41    
 [7] ggfun_0.0.5         colorspace_2.0-2   
 [9] vctrs_0.3.8         generics_0.1.1     
[11] stats4_4.0.2        yaml_2.2.1         
[13] utf8_1.2.2          gridGraphics_0.5-1 
[15] rlang_0.4.12        pillar_1.6.4       
[17] glue_1.6.0          DBI_1.1.2          
[19] BiocGenerics_0.34.0 rvcheck_0.1.8      
[21] lifecycle_1.0.1     ggseqlogo_0.1      
[23] zlibbioc_1.34.0     Biostrings_2.56.0  
[25] munsell_0.5.0       gtable_0.3.0       
[27] IRanges_2.22.2      ps_1.6.0           
[29] parallel_4.0.2      fansi_0.5.0        
[31] Rcpp_1.0.7          scales_1.1.1       
[33] BiocManager_1.30.12 S4Vectors_0.26.1   
[35] jsonlite_1.7.2      XVector_0.28.0     
[37] ggplot2_3.3.5       aplot_0.1.2        
[39] processx_3.5.2      dplyr_1.0.7        
[41] rbibutils_2.2.7     grid_4.0.2         
[43] ggtree_2.2.4        Rdpack_2.1.4       
[45] tools_4.0.2         yulab.utils_0.0.4  
[47] magrittr_2.0.1      lazyeval_0.2.2     
[49] patchwork_1.1.1     tibble_3.1.6       
[51] crayon_1.4.2        ape_5.6-2          
[53] tidyr_1.1.4         pkgconfig_2.0.3    
[55] MASS_7.3-53.1       ellipsis_0.3.2     
[57] tidytree_0.3.9      ggplotify_0.0.6    
[59] assertthat_0.2.1    rstudioapi_0.13    
[61] R6_2.5.1            nlme_3.1-152       
[63] compiler_4.0.2 

`create_motif` makes incorrect motif for amino acid sequences

What happens

Given a list of sequences (as AAStringSet), create_motif returns an obviously wrong PWM and consensus. It appears that there are some issues with inconsistent ordering of amino acid label values.

What I suspect might be causing the problem

create_motif creates a matrix with row labels from Biostrings' AA_STANDARD. This is a vector of single-letter amino acid codes in the following order:
"A" "R" "N" "D" "C" "Q" "E" "G" "H" "I" "L" "K" "M" "F" "P" "S" "T" "W" "Y" "V"

Later manipulations of this matrix seem to expect the order to be alphabetical.
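
The suspected mechanism can be illustrated in base R alone. The sketch below copies the AA_STANDARD order quoted above into a plain vector (so no packages are needed) and contrasts reordering rows by name with merely relabelling them alphabetically. It is a toy reconstruction of the hypothesis, not the package's actual code.

```r
# Order stated above for Biostrings' AA_STANDARD, copied so no package is needed.
aa_standard <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
                 "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")

# Toy count matrix: one column, each row holds its own index in that order.
counts <- matrix(seq_along(aa_standard), ncol = 1,
                 dimnames = list(aa_standard, NULL))

# Reordering by row name keeps every value attached to its letter.
by_name <- counts[sort(rownames(counts)), , drop = FALSE]

# Relabelling without reordering silently reassigns values to other letters.
relabelled <- counts
rownames(relabelled) <- sort(aa_standard)

print(by_name["V", 1])
print(relabelled["V", 1])
```

If the two printed values differ, the value for "V" has ended up under a different letter, which is the kind of mismatch that would produce a wrong consensus.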

How to reproduce

library(universalmotif)
library(Biostrings)

sequences <- AAStringSet(c("VTTDLQVKV", "STSDLLTLR", "TSLHLLVLR", "QALELLPRL", "LTDTLVSKL", 
  "TSLHLVLRL", "TSLRLLTSL", "LSTPVLRFT", "APEEHPVLL", "GSSDFLVKL", 
  "VTFLLPAGW", "LTSELLTHL", "TSSSLLLLR", "LSTEVNPKL", "QSLPTKETL", 
  "LLDPHVVLL", "SGLVLKVLL", "LTAHVEPLL", "STVKVLLRL", "FLDTVLLSW", 
  "LSKALVAYY", "KASSLVPKL", "LTADLARVL", "SGTDRQVTL", "TFDVALSPR", 
  "EDFTLLVNL", "FDDVAVVTF", "SGAYLKVSL", "LWDLSLLTR", "LTTKALYRN", 
  "GVAPLQVVK", "FFDPVTLHL", "LVSALQLLL", "TESKYYVTL", "LFDLFRFGF", 
  "LSVPLFKQF", "KRTLLDVVY", "KSFEAPLLK", "TTTPQQTKL", "SAADLPLNL", 
  "VSSKLLLVL", "QSLPTKETL", "VTLFKVAAP", "LTAHVEPLL", "LDVRYLLDL", 
  "TTGTLLKTL", "MLLDVYLTL", "SGLVVLKLL", "KSTDVFTTF", "LTAQHKLMA", 
  "HFDLLLRVN", "KALDSSKTF", "YNDEALLLR", "KSLTLTPQL", "YTRYGPKAF", 
  "MVAKKPNLL", "YQPDFYFEF", "KDLLMVPTF", "FSLPWRSST", "LPDSSPRTL", 
  "SMAALFVLL", "PELEVKVTV", "KTPVKVPVL", "LKLLLGLLL", "VLTTKLLVL", 
  "LSQRKSTSL", "KTTPDVLFV", "LEELSKYLF", "QSLPLFVQL", "KDTKTLVLL", 
  "HGFFLPEKL", "KLYYQEFKK", "HSLTEDVTL", "ASSTNLLHL", "NDAYLVQGL", 
  "STLLKFEAA", "HSAELLAEL", "PDLLTKLTF", "TFTKTQETL", "LSGRLLTVL", 
  "KPEVVFLLL", "KGFVGSFLV", "KAVDTSKTF", "FDDTTFGTF", "VQVVLMLLL", 
  "VALAKSLYY", "TAHDLLAEL", "KAAKKAPLN", "SYVKLLLSY", "LPLFVSLDL", 
  "VNFLVLVRT", "FLKAPLLFL", "TLPHLSESF", "TPHDPTVPL", "VDGKTLVNV", 
  "KLTSEVLNL", "EPFVLPLTW", "VTDLHKTSL", "KQKWLALLK", "MVAKKPNLL", 
  "SLRNVKVTL", "QNTLAVPEL", "PSPFAALVH", "HTFWGVVFF", "KSDVFLTEL", 
  "FTDARAYTT", "LTERFTLVF", "KGTSTTHLL", "VGNLRALVR", "KEASLQLVL", 
  "PVTTKPVTL", "KNASLYLLV", "HTAELVLVL", "TYDLQESNV", "TTATQVLLL", 
  "KDGLFWVLV", "VSTGLVKLR", "KANEKLAVK", "LVSVQVVLV", "VAKVNAYTF", 
  "KAAELQTGL", "QVKFAGVKL", "FDDDSKLFW", "AVRMVGLQL", "LSNVAYPVL", 
  "MFDDTELLF", "KLTLTEVEP", "YRSLGPALR", "LLARASLLL", "LTNSSTVTL", 
  "GTDLASFNL", "LCNAKLYLF", "KLEDFAFTF", "HTNALQTLL", "KVDSVYYLF", 
  "VVKAKVNAP", "TTLLKEVEP", "ASLPRSVLF", "WLAWSTFGE", "LTDGYKLTL", 
  "QGYEKLVEV", "KDGFTLFYF", "LWPLLAVAL", "LDPLSVKTF", "KVQQYAVKL", 
  "TDELSPHLL", "VDFLLATWF", "CYGRSVLNY", "GSTERNVTL", "KDEVYYVKL", 
  "KNKAAVLQL", "TDDYMELLF", "TTTLAKVEV", "FLGKALFFL", "LPDMSQPLW", 
  "KTRTEVSQY", "TEPEYLTEY", "HATTQNVLL", "KSDVFTLEL"))

Compare the output of create_motif with that of consensusMatrix:

create_motif(sequences, alphabet="AA", type="PWM")
consensusMatrix(sequences)

Note that I'm using v1.4.0 of universalmotif, but I don't think this issue has been addressed subsequently.

sessionInfo()

R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages:
[1] Biostrings_2.54.0 XVector_0.26.0 IRanges_2.20.0 S4Vectors_0.24.0 BiocGenerics_0.32.0 universalmotif_1.4.0

loaded via a namespace (and not attached):
[1] treeio_1.10.0 tidyselect_0.2.5 remotes_2.1.0 purrr_0.3.3 lattice_0.20-38 colorspace_1.4-1
[7] vctrs_0.2.0 yaml_2.2.0 rlang_0.4.1 pkgbuild_1.0.6 pillar_1.4.2 glue_1.3.1
[13] withr_2.1.2 rvcheck_0.1.6 lifecycle_0.1.0 stringr_1.4.0 ggseqlogo_0.1 zlibbioc_1.32.0
[19] munsell_0.5.0 gtable_0.3.0 callr_3.3.2 ps_1.3.0 gbRd_0.4-11 curl_4.2
[25] Rcpp_1.0.2 scales_1.0.0 backports_1.1.5 BiocManager_1.30.9 jsonlite_1.6 ggplot2_3.2.1
[31] packrat_0.5.0 stringi_1.4.3 processx_3.4.1 dplyr_0.8.3 grid_3.6.1 rprojroot_1.3-2
[37] bibtex_0.4.2 ggtree_2.0.0 Rdpack_0.11-0 cli_1.1.0 tools_3.6.1 magrittr_1.5
[43] lazyeval_0.2.2 tibble_2.1.3 crayon_1.3.4 ape_5.3 tidyr_1.0.0 pkgconfig_2.0.3
[49] zeallot_0.1.0 MASS_7.3-51.4 tidytree_0.2.9 prettyunits_1.0.2 assertthat_0.2.1 rstudioapi_0.10
[55] R6_2.4.0 nlme_3.1-141 compiler_3.6.1

Option to deprotect `motif` column in universalmotif_df

I was just messing around with a set of motifs and I wanted to selectively RC one while using a universalmotif_df.

df <- to_df(c(create_motif(), create_motif(name = "motif2")))

df %>%
   mutate(motif = ifelse(name == "motif2", motif_rc(motif), motif))

Of course, this doesn't work because of the AsIs class.

So this got me thinking if there could be a way for folks to deprotect the motif for cases like this. Users doing this would know the operation is unsafe. Here's the silly implementation I came up with:

edit_motif <- function(m){
  class(m) <- NULL
  m 
}

So the above becomes:

df <- to_df(c(create_motif(), create_motif(name = "motif2")))

df %>%
   mutate(motif = ifelse(name == "motif2", motif_rc(edit_motif(motif)), edit_motif(motif)))

Not sure if this is the perfect solution. For instance, this won't keep the AsIs attribute after the mutate, so maybe a macro for wrapping the whole ifelse operation to deprotect could work? I'll do some more thinking on this, but figured I'd post it now.
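
One way to picture the "deprotect, edit, re-protect" idea is the base R sketch below. It uses a plain list wrapped in `I()` as a stand-in for the protected column and a string-reversing function as a stand-in for motif_rc(). Both stand-ins are assumptions for illustration; a real universalmotif_df may carry additional checks that this does not model.

```r
# Stand-in for a protected list column: I() adds the "AsIs" class.
motif_col <- I(list("ACGT", "TTGA"))
name_col <- c("motif", "motif2")

# Hypothetical per-element transform standing in for motif_rc().
rev_string <- function(s) paste(rev(strsplit(s, "")[[1]]), collapse = "")

# Update only the selected elements, then restore the AsIs class.
plain <- unclass(motif_col)
idx <- name_col == "motif2"
plain[idx] <- lapply(plain[idx], rev_string)
motif_col <- I(plain)

print(class(motif_col))
print(motif_col[[2]])
```

Indexed assignment with `lapply()` sidesteps `ifelse()`, which is designed for atomic vectors rather than list columns, and re-wrapping with `I()` addresses the concern about losing the AsIs attribute.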

error in install

BiocManager::install("universalmotif")

Bioconductor version 3.12 (BiocManager 1.30.10), R 4.0.2 (2020-06-22)
Installing package(s) 'universalmotif'
trying URL 'https://bioconductor.org/packages/3.12/bioc/src/contrib/universalmotif_1.8.3.tar.gz'
Content type 'application/x-gzip' length 3732415 bytes (3.6 MB)
==================================================
downloaded 3.6 MB

* installing *source* package ‘universalmotif’ ...
** using staged installation
** libs
g++ -std=gnu++11 -I"/usr/local/lib64/R/include" -DNDEBUG  -I'/home/zhou/Rlib/Rcpp/include' -I'/home/zhou/Rlib/RcppThread/include' -I/usr/local/include   -fpic  -g -O2  -c RcppExports.cpp -o RcppExports.o
g++ -std=gnu++11 -I"/usr/local/lib64/R/include" -DNDEBUG  -I'/home/zhou/Rlib/Rcpp/include' -I'/home/zhou/Rlib/RcppThread/include' -I/usr/local/include   -fpic  -g -O2  -c add_multifreq.cpp -o add_multifreq.o
g++ -std=gnu++11 -I"/usr/local/lib64/R/include" -DNDEBUG  -I'/home/zhou/Rlib/Rcpp/include' -I'/home/zhou/Rlib/RcppThread/include' -I/usr/local/include   -fpic  -g -O2  -c compare_motifs.cpp -o compare_motifs.o
In file included from /home/zhou/Rlib/RcppThread/include/RcppThread.h:11:0,
                 from compare_motifs.cpp:2:
/home/zhou/Rlib/RcppThread/include/RcppThread/Thread.hpp: In lambda function:
/home/zhou/Rlib/RcppThread/include/RcppThread/Thread.hpp:42:19: error: parameter packs not expanded with ‘...’:
                 f(args...);
                   ^
/home/zhou/Rlib/RcppThread/include/RcppThread/Thread.hpp:42:19: note:         ‘args’
/home/zhou/Rlib/RcppThread/include/RcppThread/Thread.hpp:42:23: error: expansion pattern ‘args’ contains no argument packs
                 f(args...);
                       ^
In file included from /home/zhou/Rlib/RcppThread/include/RcppThread.h:13:0,
                 from compare_motifs.cpp:2:
/home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp: In member function ‘void RcppThread::ThreadPool::push(F&&, Args&& ...)’:
/home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:129:31: error: expected ‘,’ before ‘...’ token
         jobs_.emplace([f, args...] { f(args...); });
                               ^
/home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:129:31: error: expected identifier before ‘...’ token
/home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:129:34: error: parameter packs not expanded with ‘...’:
         jobs_.emplace([f, args...] { f(args...); });
                                  ^
/home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:129:34: note:         ‘args’
/home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp: In lambda function:
/home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:129:44: error: expansion pattern ‘args’ contains no argument packs
         jobs_.emplace([f, args...] { f(args...); });
                                            ^
/home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp: In member function ‘std::future<decltype (f(args ...))> RcppThread::ThreadPool::pushReturn(F&&, Args&& ...)’:
/home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:146:54: error: expected ‘,’ before ‘...’ token
     auto job = std::make_shared<jobPackage>([&f, args...] {
                                                      ^
/home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:146:54: error: expected identifier before ‘...’ token
/home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:146:57: error: parameter packs not expanded with ‘...’:
     auto job = std::make_shared<jobPackage>([&f, args...] {
                                                         ^
/home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:146:57: note:         ‘args’
/home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp: In lambda function:
/home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:147:22: error: expansion pattern ‘args’ contains no argument packs
         return f(args...);
                      ^
/home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp: In instantiation of ‘struct RcppThread::ThreadPool::push(F&&, Args&& ...) [with F = RcppThread::ThreadPool::parallelFor(ptrdiff_t, ptrdiff_t, F&&, size_t) [with F = compare_motifs_cpp(const List&, const std::vector<int>&, const std::vector<int>&, const string&, double, bool, std::vector<std::vector<double> >&, int, bool, double, bool, int, double, const std::vector<double>&, const string&)::__lambda13&; ptrdiff_t = long int; size_t = long unsigned int]::__lambda8&; Args = {const RcppThread::Batch&}]::__lambda5’:
/home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:129:9:   required from ‘void RcppThread::ThreadPool::push(F&&, Args&& ...) [with F = RcppThread::ThreadPool::parallelFor(ptrdiff_t, ptrdiff_t, F&&, size_t) [with F = compare_motifs_cpp(const List&, const std::vector<int>&, const std::vector<int>&, const string&, double, bool, std::vector<std::vector<double> >&, int, bool, double, bool, int, double, const std::vector<double>&, const string&)::__lambda13&; ptrdiff_t = long int; size_t = long unsigned int]::__lambda8&; Args = {const RcppThread::Batch&}]’
/home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:201:9:   required from ‘void RcppThread::ThreadPool::parallelFor(ptrdiff_t, ptrdiff_t, F&&, size_t) [with F = compare_motifs_cpp(const List&, const std::vector<int>&, const std::vector<int>&, const string&, double, bool, std::vector<std::vector<double> >&, int, bool, double, bool, int, double, const std::vector<double>&, const string&)::__lambda13&; ptrdiff_t = long int; size_t = long unsigned int]’
/home/zhou/Rlib/RcppThread/include/RcppThread/parallelFor.hpp:48:5:   required from ‘void RcppThread::parallelFor(ptrdiff_t, ptrdiff_t, F&&, size_t, size_t) [with F = compare_motifs_cpp(const List&, const std::vector<int>&, const std::vector<int>&, const string&, double, bool, std::vector<std::vector<double> >&, int, bool, double, bool, int, double, const std::vector<double>&, const string&)::__lambda13; ptrdiff_t = long int; size_t = long unsigned int]’
compare_motifs.cpp:1426:18:   required from here
/home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:129:27: error: using invalid field ‘RcppThread::ThreadPool::push(F&&, Args&& ...)::__lambda5::__args’
         jobs_.emplace([f, args...] { f(args...); });
                           ^
/home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp: In instantiation of ‘struct RcppThread::ThreadPool::push(F&&, Args&& ...) [with F = RcppThread::ThreadPool::parallelFor(ptrdiff_t, ptrdiff_t, F&&, size_t) [with F = compare_motifs_all_cpp(const List&, const string&, double, bool, std::vector<std::vector<double> >&, int, bool, double, bool, int, double, const std::vector<double>&, const string&)::__lambda14&; ptrdiff_t = long int; size_t = long unsigned int]::__lambda8&; Args = {const RcppThread::Batch&}]::__lambda5’:
/home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:129:9:   required from ‘void RcppThread::ThreadPool::push(F&&, Args&& ...) [with F = RcppThread::ThreadPool::parallelFor(ptrdiff_t, ptrdiff_t, F&&, size_t) [with F = compare_motifs_all_cpp(const List&, const string&, double, bool, std::vector<std::vector<double> >&, int, bool, double, bool, int, double, const std::vector<double>&, const string&)::__lambda14&; ptrdiff_t = long int; size_t = long unsigned int]::__lambda8&; Args = {const RcppThread::Batch&}]’
/home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:201:9:   required from ‘void RcppThread::ThreadPool::parallelFor(ptrdiff_t, ptrdiff_t, F&&, size_t) [with F = compare_motifs_all_cpp(const List&, const string&, double, bool, std::vector<std::vector<double> >&, int, bool, double, bool, int, double, const std::vector<double>&, const string&)::__lambda14&; ptrdiff_t = long int; size_t = long unsigned int]’
/home/zhou/Rlib/RcppThread/include/RcppThread/parallelFor.hpp:48:5:   required from ‘void RcppThread::parallelFor(ptrdiff_t, ptrdiff_t, F&&, size_t, size_t) [with F = compare_motifs_all_cpp(const List&, const string&, double, bool, std::vector<std::vector<double> >&, int, bool, double, bool, int, double, const std::vector<double>&, const string&)::__lambda14; ptrdiff_t = long int; size_t = long unsigned int]’
compare_motifs.cpp:1485:18:   required from here
/home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:129:27: error: using invalid field ‘RcppThread::ThreadPool::push(F&&, Args&& ...)::__lambda5::__args’
/home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp: In instantiation of ‘struct RcppThread::ThreadPool::push(F&&, Args&& ...) [with F = RcppThread::ThreadPool::parallelFor(ptrdiff_t, ptrdiff_t, F&&, size_t) [with F = pval_extractor(const std::vector<int>&, const std::vector<double>&, const std::vector<int>&, const std::vector<int>&, const string&, const std::vector<int>&, const std::vector<int>&, const std::vector<double>&, const std::vector<double>&, const std::vector<std::basic_string<char> >&, int)::__lambda15&; ptrdiff_t = long int; size_t = long unsigned int]::__lambda8&; Args = {const RcppThread::Batch&}]::__lambda5’:
/home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:129:9:   required from ‘void RcppThread::ThreadPool::push(F&&, Args&& ...) [with F = RcppThread::ThreadPool::parallelFor(ptrdiff_t, ptrdiff_t, F&&, size_t) [with F = pval_extractor(const std::vector<int>&, const std::vector<double>&, const std::vector<int>&, const std::vector<int>&, const string&, const std::vector<int>&, const std::vector<int>&, const std::vector<double>&, const std::vector<double>&, const std::vector<std::basic_string<char> >&, int)::__lambda15&; ptrdiff_t = long int; size_t = long unsigned int]::__lambda8&; Args = {const RcppThread::Batch&}]’
/home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:201:9:   required from ‘void RcppThread::ThreadPool::parallelFor(ptrdiff_t, ptrdiff_t, F&&, size_t) [with F = pval_extractor(const std::vector<int>&, const std::vector<double>&, const std::vector<int>&, const std::vector<int>&, const string&, const std::vector<int>&, const std::vector<int>&, const std::vector<double>&, const std::vector<double>&, const std::vector<std::basic_string<char> >&, int)::__lambda15&; ptrdiff_t = long int; size_t = long unsigned int]’
/home/zhou/Rlib/RcppThread/include/RcppThread/parallelFor.hpp:48:5:   required from ‘void RcppThread::parallelFor(ptrdiff_t, ptrdiff_t, F&&, size_t, size_t) [with F = pval_extractor(const std::vector<int>&, const std::vector<double>&, const std::vector<int>&, const std::vector<int>&, const string&, const std::vector<int>&, const std::vector<int>&, const std::vector<double>&, const std::vector<double>&, const std::vector<std::basic_string<char> >&, int)::__lambda15; ptrdiff_t = long int; size_t = long unsigned int]’
compare_motifs.cpp:1859:18:   required from here
/home/zhou/Rlib/RcppThread/include/RcppThread/ThreadPool.hpp:129:27: error: using invalid field ‘RcppThread::ThreadPool::push(F&&, Args&& ...)::__lambda5::__args’
make: *** [compare_motifs.o] Error 1
ERROR: compilation failed for package ‘universalmotif’
* removing ‘/home/zhou/Rlib/universalmotif’

The downloaded source packages are in
	‘/tmp/RtmpS1nRO3/downloaded_packages’
Installation path not writeable, unable to update packages: boot, class, cluster,
  codetools, foreign, KernSmooth, MASS, Matrix, mgcv, nlme, nnet, spatial, survival
Warning message:
In install.packages(...) :
  installation of package ‘universalmotif’ had non-zero exit status
--

read_meme fails when alphabet is DNA/RNA/AA-LIKE or custom

dreme.txt

> read_meme("dreme.txt")
Error in strsplit(alph, "\\s+")[[1]] : subscript out of bounds

This happens when the ALPHABET string defines a DNA/RNA/AA-LIKE or custom alphabet.

There are a few ways read_meme could handle this:

  1. Throw an informative error for both *-LIKE alphabets and custom alphabets
  2. Treat *-LIKE alphabets as plain DNA/RNA/AA
  3. Throw an informative error for custom alphabets only
  4. Fully support custom alphabets

Option 4 is quite an undertaking, so I've implemented options 2 and 3. PR incoming.
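
For illustration only, here is a minimal sketch of how options 2 and 3 might look. It is not the code from the PR: `normalise_meme_alphabet` is a hypothetical helper, and it assumes the alphabet header is either a standard `ALPHABET= ACGT` style line or a custom `ALPHABET "name" DNA-LIKE` style line (accepting both `AA-LIKE` and `PROTEIN-LIKE` for proteins is also an assumption).

```r
# Hypothetical helper (not part of universalmotif): map a MEME alphabet
# header line to "DNA", "RNA" or "AA", or fail with a readable error.
normalise_meme_alphabet <- function(header) {
  header <- trimws(header)

  # Standard alphabets are declared as "ALPHABET= <letters>".
  if (startsWith(header, "ALPHABET=")) {
    letters <- trimws(sub("^ALPHABET=", "", header))
    known <- c(ACGT = "DNA", ACGU = "RNA", ACDEFGHIKLMNPQRSTVWY = "AA")
    if (!letters %in% names(known)) {
      stop("Unrecognised standard alphabet: ", letters, call. = FALSE)
    }
    return(unname(known[letters]))
  }

  # Option 2: a trailing "*-LIKE" tag is treated as the base alphabet.
  like <- regmatches(header, regexpr("(DNA|RNA|AA|PROTEIN)-LIKE$", header))
  if (length(like) == 0) {
    # Option 3: anything else is a fully custom alphabet, so stop early
    # with an informative message instead of a subscript error.
    stop("Custom alphabets are not supported: ", header, call. = FALSE)
  }
  base <- sub("-LIKE$", "", like)
  if (base == "PROTEIN") "AA" else base
}

normalise_meme_alphabet("ALPHABET= ACGT")
normalise_meme_alphabet('ALPHABET "MethDNA" DNA-LIKE')
```

Both calls above resolve to the DNA alphabet, while a header with no `*-LIKE` tag, such as `ALPHABET "Custom"`, stops with the "not supported" message.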
