Code Monkey home page Code Monkey logo

Comments (11)

bjmt avatar bjmt commented on August 16, 2024

Hi,

Thanks for the report. The calc.pvals is a recent addition so I appreciate the feedback.

I've found the source of the error message for scan_sequences() and corrected it here on github. It will take a few days for it to get to Bioconductor if that's where you're installing the package from.

As for the error you are getting when trying to convert the PWMatrixList: unfortunately I cannot replicate this one. If you wouldn't mind, could you share the object? (You can use the dput() function and paste the results.) Totally up to you, I understand if you want to keep it private.

As for the motif ID: first, the IDs from the PWMatrixList objects are carried over to universalmotif objects under the title Alternate name (the slot is called altname). As for it not being present in the DFrame: I predicted things like this could be a problem (i.e. using lists of motifs with duplicate names, or having the altname slot hold the unique identifier), so I added an extra integer index column (motif.i). You can then use this to associate the motif IDs with each row of the scan_sequences() output. For example, if you are working with a PWMatrixList:

IDs <- vapply(my_pwm_list, function(x) x@ID, character(1))
scan_output$ID <- IDs[scan_output$motif.i]

Finally, regarding threshold calculation: when using threshold.type = "logodds", the final threshold is calculated as the maximum possible score multiplied by the value of the threshold parameter. I've gone with this approach instead of getting the quantile of the range of possible scores because some motifs can have negative infinity minimum scores, and anyways generally people are only interested in positive scores (at least that's my impression -- doesn't a negative score basically mean it's more likely to be a background hit?). However, it is still possible to get the kind of threshold you want using the motif_score() function:

motifs <- convert_motifs(my_pwm_list)

# No normalization (requires that your motifs do not have negative infinity scores):
scores <- vapply(motifs, function(x) motif_score(x, allow.nonfinite = TRUE, threshold.type = "total"), numeric(1))
# With normalization:
scores2 <- vapply(motifs, function(x) motif_score(x, allow.nonfinite = FALSE, threshold.type = "total"), numeric(1))

# Now scan:
scan_sequences(my_pwm_list, my_sequences, threshold = scores, threshold.type = "logodds.abs", RC = TRUE, calc.pvals = TRUE)

Let me know if this solves your issues. If you don't want to pursue the second convert_motifs() error then I'll go ahead and close the issue.

Edit: typo

from universalmotif.

OXB-DC avatar OXB-DC commented on August 16, 2024

Thanks for the really quick reply and update to the code with calc.pvals = TRUE. I have updated my package directly from github and it now appears to work with a simple test example.

There was no error when when trying to convert a PFMatrixList, I just tried converting my PFMatrixList object to a universalmotif object before running scan_sequences (and having now looked at the source, and read the help in more detail realised this wouldn't have made any difference anyway, and this is done automatically).

I couldn't get your exact code to work but it pointed me in the right direction. I managed to use the scores_motif function to calculate the min and max scores and from there calculate a different absolute threshold for reach element (in the same way that TFBSTools::searchSeq does when using a percentage threshold).

Maybe keep the issue open for a day or two in case I encounter any further issues. Thanks for the help anyway. It was much appreciated.

from universalmotif.

bjmt avatar bjmt commented on August 16, 2024

Yes, I can see why the code wouldn't work --- I forgot to perform the quantile calculation, sorry about that.

Glad I was of help. I'll close the issue in a couple of days.

from universalmotif.

OXB-DC avatar OXB-DC commented on August 16, 2024

So this is mostly now working but I am having problems with very specific input PWMs causing RStudio to abort (requires restart).

I have copied the source for scan_sequences (and all the necessary accessory functions), defined the input variables and gone through it line by line with an example of a problem PWM (this is the only 1 I have identified so far but there are others). Its been quite laborious. Anyway, I get to line 361, and this causes the abort:

    out$pvalue <- motif_pvalue(motifs[out$motif.i], out$score, use.freq = use.freq,
      nthreads = nthreads, allow.nonfinite = allow.nonfinite, k = motif_pvalue.k)

The dput() for the PWM is:

new("universalmotif", name = "LM40", altname = "CN0040.1", family = "", 
    organism = "Homo sapiens", motif = structure(c(-2.12395465449616, 
    -0.888956530335224, -2.05380206183545, 1.68898533119274, 
    -2.06039732558369, 1.53016323143546, -2.4646618883504, -0.521527543882626, 
    -1.50058150082382, 1.39482961992938, -4.96851513448883, 0.0381893039501086, 
    -1.63302050188377, 1.5197916427118, -3.61387808386758, -0.43016412966062, 
    1.62835639951109, -2.05879318255831, -2.54614282233213, -1.05798575660883, 
    -2.19044153063008, 0.189008780376107, -1.78294918891624, 
    1.33249264619988, -2.12395465449616, -2.05879318255831, -2.24465522706756, 
    1.8434398606671, 0.359930318200709, -2.76692627201666, 1.22643877996926, 
    -3.42379940917238, 1.73520175977255, -3.74649583346093, -1.15278060668211, 
    -3.61181747898124, -2.26014112173258, 1.6820616555792, -3.1645754647309, 
    -1.12806042801111, -0.271042966536174, -2.49045395065771, 
    -1.39546859527949, 1.4869826000752, -3.96370730181868, -1.93950687517175, 
    -3.61387808386758, 1.96201785350123, -3.96370730181868, 1.50934495201134, 
    -4.57733523106496, 0.131114698865368, 1.90800806072115, -4.2157840977931, 
    -3.80124574139517, -3.82806355006355, 1.78363325477044, -3.39297024929661, 
    -1.59779756016926, -3.82806355006355, 1.04147061270769, -4.91578062160256, 
    -5.50691437540247, 1.01249399843676, -1.72858934257226, -4.91578062160256, 
    1.75891374079746, -3.25748347951178, -0.75860991337402, -4.2157840977931, 
    1.61171365725559, -2.84969132280715, -0.307475314489587, 
    -1.88336938508686, 1.3941740309422, -2.73589128754153, 1.83467689942055, 
    -3.96214190038266, -2.38753788163716, -2.97324259102268, 
    0.134054393518394, -2.33177854740234, 1.23255940854213, -2.12146844533144, 
    -1.45900573423771, 0.75551474042414, -1.99541050743711, 0.848849376675759, 
    -1.34101748804916, -0.782441237407515, -2.82247225097757, 
    1.63711857127716), .Dim = c(4L, 23L), .Dimnames = list(c("A", 
    "C", "G", "T"), c("T", "C", "Y", "C", "A", "Y", "T", "R", 
    "A", "C", "T", "T", "Y", "A", "A", "W", "G", "G", "G", "A", 
    "R", "Y", "T"))), alphabet = "DNA", type = "PWM", icscore = 21.7784742398783, 
    nsites = numeric(0), pseudocount = 1, bkg = c(A = 0.253481102585962, 
    C = 0.253196930946292, G = 0.263285024154589, T = 0.230036942313157
    ), bkgsites = numeric(0), consensus = "TCYCAYTRACTTYAAWGGGARYT", 
    strand = "+", pval = numeric(0), qval = numeric(0), eval = numeric(0), 
    multifreq = list(), extrainfo = c(comment = "-", consensus = "TCCCATTGACTTCAAWGGGAGYT", 
    medline = "17442748", collection = "CNE", species.9606 = "Homo sapiens", 
    acc = ""), gapinfo = new("universalmotif_gapped", isgapped = FALSE, 
        gaploc = numeric(0), mingap = numeric(0), maxgap = numeric(0), 
        extrapvals = numeric(0)))

And it aborts RStudio regardless of whether I change the score or nthreads (FYI out$score was originally 30.094). All other arguments are defaults. Can you reproduce this on your side? And if so can you help to resolve?

Kind Regards,
Danny

from universalmotif.

OXB-DC avatar OXB-DC commented on August 16, 2024

Oh I also forgot to mention there also looked like there were some possible issues with missing braces and if statements. Not sure if this matters e.g. line 194-197 and in a few other places

from universalmotif.

OXB-DC avatar OXB-DC commented on August 16, 2024

Went a bit further down the rabbit hole as far as

motif_pvalue_cpp <- function(motifs, bkg, scores, k = 6L, nthreads = 1L, allow_nonfinite = FALSE) {
    .Call('_universalmotif_motif_pvalue_cpp', PACKAGE = 'universalmotif', motifs, bkg, scores, k, nthreads, allow_nonfinite)
}

which is getting into C++ territory and presently beyond me!

from universalmotif.

bjmt avatar bjmt commented on August 16, 2024

Thanks for the detailed reports, and I'm so sorry you're having all this trouble. I've been able to get crashing happening by playing around with motif_pvalue() and your motif. Something must be going wrong in the c++ function (as you correctly pointed out). I will investigate! But I won't get an answer for you for a few hours at least.

Again I really appreciate the feedback.

from universalmotif.

OXB-DC avatar OXB-DC commented on August 16, 2024

So I tried changing k, and for k=4 through to k=7 it reported a value of 0
For k=8 it causes RStudio to abort
For k=9 it reports a value of 0
For k=10, 3.033576e-12
For k=11, 2.234804e-11
For k=12, 2.693484e-11
For k=13, 2.693484e-11
And after that it gives a warning and reverts to k=13

I am presently rerunning all my scans with k=12...may take a while...will let you know

from universalmotif.

OXB-DC avatar OXB-DC commented on August 16, 2024

It works fine with k=12

from universalmotif.

bjmt avatar bjmt commented on August 16, 2024

Think I've found the bug. When k resulted in the motif being split into more than two (so for your case when k<12), there was a memory access bug which resulted in erratic behaviour and occasional crashes.

I've patched the bug now, and the Bioconductor release will get it in a day or two. I'll leave this issue open for another day or two in case you still have problems.

from universalmotif.

bjmt avatar bjmt commented on August 16, 2024

Closing. As far as I know the bug is fixed in version 1.8.1 and above.

from universalmotif.

Related Issues (17)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.