Comments (7)

snystrom commented on August 16, 2024

What is your desired output exactly? Or your intended use-case for the merged motif? A variable-length gap isn't well encoded by a single PWM, so you'll need to make some sort of tradeoff here.

from universalmotif.

bjmt commented on August 16, 2024

Thanks for checking out the package!

As @snystrom has mentioned, a PWM isn't the best format here. As far as the universalmotif package is concerned, motifs are assumed to be of fixed length. I do implement a certain kind of variable gapped motif in the universalmotif package (see add_gap()), but its use is currently limited only to scanning for occurrences of the motifs in sequences. Those gaps are totally ignored by compare_motifs(), view_motifs(), merge_motifs(), etc.

If you absolutely need to merge the two segments, you could always try doing it manually. For example, you could first identify which positions are of interest (i.e., high information content positions) with colSums(convert_type(my_motif, "ICM")), then create individual segments based on which positions you want using subset(my_motif, 3:8) before trying with merge_motifs() (for example).
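
A minimal sketch of that manual route, assuming the universalmotif package is installed. The two motifs below are made-up stand-ins built with create_motif() (`gapped` plays the role of `my_motif`), and the 1-bit cutoff is an arbitrary choice, not a recommendation from the package:

```r
library(universalmotif)

# Hypothetical stand-in for `my_motif`: two half-sites separated by a spacer
gapped <- create_motif("NNTAATTANNNNCAATTANN", name = "gapped",
                       nsites = 100, pseudocount = 1)
# Hypothetical ungapped motif for the same factor
ungapped <- create_motif("CTAATTAG", name = "ungapped",
                         nsites = 100, pseudocount = 1)

# Information content (bits) at each position of the gapped motif
ic <- colSums(convert_type(gapped, "ICM"))
print(round(ic, 2))

# Positions above an arbitrary cutoff (assumption: 1 bit), used to pick a segment
informative <- which(ic > 1)
print(informative)

# Keep only the first half-site, as in subset(my_motif, 3:8)
segment <- subset(gapped, 3:8)

# Merge the extracted segment with the ungapped motif
merged <- merge_motifs(list(segment, ungapped))
print(merged)
```

The second half-site can be extracted the same way and merged separately, which sidesteps the spacer entirely.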

Other than that I cannot think of a possible solution using available universalmotif functionality sadly, so I will close this issue. Feel free to reopen if you have additional questions.

xiao00su commented on August 16, 2024

Thank you very much for your quick replies.
I have recently been working on snATAC data and collected lots of motifs from different databases. Some genes have hundreds of similar motifs, so I think it would be useful to merge them into a single motif for the motif scan.
The motifs shown in the picture are motifs of the same gene collected from different databases.
[Image: Lhx2]

bjmt commented on August 16, 2024

Interesting. I agree, merging them into a consensus motif before scanning is a good idea. However, in my opinion you shouldn't try to merge the variable-gap motifs with the rest, since they are too different from everything else.

xiao00su commented on August 16, 2024

I tried two strategies for the merge.

A:
1. Calculate the similarity score of each pair of motifs and get the Topological Overlap Matrix (TOM). (homer compareMotifs.pl)
2. Cluster the motifs based on the TOM. (seurat)
3. Merge the motifs of each gene by cluster. (stackMotif/universalMotif, mergeMotifs)

B: Merge the motifs of each gene with universalmotif::merge_similar. (Easy, but the parameters may need adjusting for each gene.)
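
Both strategies can be sketched with universalmotif alone. This is only an illustration under stated assumptions: compare_motifs() plus base R hclust() stand in for the homer and seurat steps of strategy A, the four toy motifs are made up, and the cut height (0.3) and threshold (0.8) are arbitrary values that would need tuning:

```r
library(universalmotif)

# Hypothetical toy set: three variants of one site plus one unrelated motif
motifs <- list(
  create_motif("TAATTAGC", name = "m1", nsites = 50, pseudocount = 1),
  create_motif("TAATTAGS", name = "m2", nsites = 50, pseudocount = 1),
  create_motif("YTAATTAG", name = "m3", nsites = 50, pseudocount = 1),
  create_motif("GGGCGGGG", name = "m4", nsites = 50, pseudocount = 1)
)

# Strategy A (stand-in): all-vs-all similarity, then hierarchical clustering
sim <- compare_motifs(motifs, method = "PCC", min.mean.ic = 0)
clusters <- cutree(hclust(as.dist(1 - sim)), h = 0.3)  # cut height is an assumption

# Merge the members of each cluster; singleton clusters are kept unchanged
merged_a <- lapply(split(motifs, clusters), function(ms) {
  if (length(ms) > 1) merge_motifs(ms, method = "PCC") else ms[[1]]
})

# Strategy B: let merge_similar() do the comparison and merging in one call
merged_b <- merge_similar(motifs, threshold = 0.8, method = "PCC")

print(length(merged_a))
print(merged_b)
```

For a collection grouped by gene, the same calls could be applied to each gene's motifs in turn, for example with lapply() over a list split by gene name.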

**My concern is: how likely is it that the consensus motif is the right one?**
Both methods give reasonable-looking consensus motifs.
I collected ~6000 motifs from ~700 genes. How can I evaluate the consensus motifs in batch (not by eye)?

[Image: motif_cluster2]

merge_similar:
[Image: Lhx2-merge]

bjmt commented on August 16, 2024

Oh wow, neat to see the two approaches give such similar results.

Unfortunately, I don't think there's an easy answer to your question. What I've done myself recently is to optimize the clustering for consensus motifs that show the strongest enrichment in the target sequences versus the background. In other words, if the enrichment of the merged motif is less significant than that of the original motifs, I would not use the merged motif and would change the clustering parameters instead.
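
The acceptance rule described above can be written down in a few lines of base R. This is a sketch, not code from the package: `keep_merged()` and `pick_threshold()` are hypothetical helpers, the p-values are assumed to come from an enrichment test run elsewhere (for example universalmotif's enrich_motifs()), and "weaker than the original motifs" is interpreted here as weaker than the best original motif.

```r
# Keep a merged motif only if its enrichment p-value is at least as strong
# (as small) as that of the best original motif.
keep_merged <- function(merged_p, original_p) {
  stopifnot(length(merged_p) == 1, length(original_p) >= 1)
  merged_p <= min(original_p)
}

# Among several clustering thresholds, return the one whose merged motif is
# most enriched, or NA if no merged motif matches the best original.
pick_threshold <- function(thresholds, merged_p, original_p) {
  stopifnot(length(thresholds) == length(merged_p))
  ok <- merged_p <= min(original_p)
  if (!any(ok)) return(NA_real_)
  thresholds[ok][which.min(merged_p[ok])]
}

# Illustrative p-values only (made up, not real results)
original_p <- c(1e-8, 1e-10)
print(keep_merged(1e-12, original_p))
print(pick_threshold(c(0.7, 0.8, 0.9), c(1e-5, 1e-12, 1e-9), original_p))
```

Running this per gene gives a batch-wise check that does not rely on inspecting logos by eye.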

xiao00su commented on August 16, 2024

Enrichment scoring is a good idea for testing the consensus motifs. I will try it on some TFs.
I hope the clustering optimization goes well and becomes available to use soon.
