
Comments (11)

cjfields commented on August 17, 2024

@passdan That is by design; the bases at those positions in the 'probe' are a defined sequence. To get approximate matches you will need to use a module like Text::Fuzzy or String::Approx.

from bioperl-live.

tsibley commented on August 17, 2024

@passdan Try Bio::Regexp

cjfields commented on August 17, 2024

@tsibley interesting, that may be a good internal replacement for IUPAC or SeqPattern

tsibley commented on August 17, 2024

Yeah, I've been wondering about such things. It looks quite nice.

passdan commented on August 17, 2024

Thanks for the advice. As soon as I got the first reply I realised how obviously wrong it would have been to match in that direction!

I'm considering the best way of incorporating the fuzzy or approx matching. At the moment I take an input IUPAC-compliant 'probe' sequence, generate a regex and perform a match:

    if ( $seq->seq =~ m/^(.*)($regexp)(.*)/i ){

but the fuzzy matching tools won't accept a regular expression. Would the best method be to create an array of the possible sequences then loop through them and fuzzy match each? I imagine that this could increase the compute time significantly, but is there a better way?

Thanks
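For readers following along, the regex-generation step described above could look roughly like the sketch below. The helper name `iupac_to_regex` and the sample strings are assumptions made for illustration, not code from this thread, and U is treated as T for simplicity.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes mapped to the bases they stand for.
# Assumption: U is treated as T so RNA-style probes also work.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',   U => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Hypothetical helper: turn an IUPAC probe into a regex source string.
sub iupac_to_regex {
    my ($probe) = @_;
    my $re = '';
    for my $code (split //, uc $probe) {
        my $bases = $IUPAC{$code}
            or die "Unknown IUPAC code '$code'\n";
        # Unambiguous codes stay literal; degenerate ones become a character class.
        $re .= length($bases) == 1 ? $bases : "[$bases]";
    }
    return $re;
}

my $regexp = iupac_to_regex('ACRYN');    # 'AC[AG][CT][ACGT]'
my $target = 'ttgACGTAcc';

if ($target =~ /($regexp)/i) {
    # $-[1] is the start offset of the first capture group.
    printf "matched %s at offset %d\n", $1, $-[1];
}
# prints: matched ACGTA at offset 3
```

One design note: a leading greedy `(.*)`, as in the one-liner above, makes Perl report the rightmost occurrence of the probe. Reading the offset from `@-` finds the leftmost one and avoids copying the flanking sequence.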

tsibley commented on August 17, 2024

@passdan Did you look at Bio::Regexp? It'll do fuzzy matching from an IUPAC-compliant string or your (constrained) regular expression.

    my $probe   = Bio::Regexp->new->dna->add( $IUPAC_string );
    my @matches = $probe->match( $seq->seq );

passdan commented on August 17, 2024

I had a look at that, but I can't see the fuzzy matching reference. From what I see it will only match a defined sequence or regex and not allow any substitutions.

I've been looking at String::Approx and it seems appropriate: the modifiers Ix, Dx and Sx allow up to x insertions, deletions or substitutions:

    my $index = aindex($probe, ["I0","D0","S1"], $ref);

Unfortunately, with everything set to 0 it seems to work, but once I apply S1 it matches far more indiscriminately than a single-base substitution would.
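One way to sanity-check what a strict "at most one substitution, no indels" match should return is a plain sliding-window comparison. The sketch below is not String::Approx's algorithm and makes no claim about why S1 behaves as described; the helper name `substitution_matches` and the sample data are assumptions for illustration. Degenerate probe codes are honoured, while an ambiguous base in the target simply counts as a mismatch.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes mapped to the bases they stand for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Hypothetical helper: return [offset, mismatches] for every window of $seq
# that matches $probe with at most $max_subs substitutions (no insertions
# or deletions).
sub substitution_matches {
    my ($probe, $seq, $max_subs) = @_;

    # One set of allowed bases per probe position.
    my @sets;
    for my $code (split //, uc $probe) {
        my $bases = $IUPAC{$code}
            or die "Unknown IUPAC code '$code'\n";
        push @sets, $bases;
    }

    $seq = uc $seq;
    my @hits;
    WINDOW:
    for my $start (0 .. length($seq) - @sets) {
        my $mismatches = 0;
        for my $i (0 .. $#sets) {
            my $base = substr $seq, $start + $i, 1;
            next if index($sets[$i], $base) >= 0;           # base is allowed here
            next WINDOW if ++$mismatches > $max_subs;       # too many substitutions
        }
        push @hits, [ $start, $mismatches ];
    }
    return @hits;
}

for my $hit (substitution_matches('ACRT', 'GGACGTTTACCTAA', 1)) {
    printf "offset %d: %d substitution(s)\n", @$hit;
}
# prints:
# offset 2: 0 substitution(s)
# offset 8: 1 substitution(s)
```

The cost is proportional to sequence length times probe length, which is usually acceptable for short probes. Comparing its hits with the offsets reported by the fuzzy-matching call can show which of those matches are not single-base substitutions.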

Sorry, I know this is supposed to be a bug report channel and I'm overrunning it.

tsibley commented on August 17, 2024

Ah, you're right. Apologies for not reading more closely.

You might look at Bio::Grep with the Vmatch backend?

passdan commented on August 17, 2024

That does look good. I'll look into it, thanks.

tsibley commented on August 17, 2024

@passdan Out of curiosity, I turned your musing of "generating all possible sequences and matching on those" on its head and inverted the match sense while generating all possible probes. The idea is that probes are shorter and much more likely to have fewer degenerate bases. This was mostly just an exercise in curiosity, but here it is: https://gist.github.com/tsibley/15edab0b97683c3ce9fe

There are some obvious improvements that could be made in the name of memory efficiency, but they slightly complicate the clarity of the approach.
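For illustration only, the "generate all possible probes" step can be sketched as below. This is not the code from the gist; the helper name `expand_probe` and the sample strings are assumptions. It expands a degenerate probe into every concrete sequence it encodes and then does a plain exact search with each one.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes mapped to the bases they stand for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Hypothetical helper: list every concrete sequence a degenerate probe encodes.
sub expand_probe {
    my ($probe) = @_;
    my @expanded = ('');
    for my $code (split //, uc $probe) {
        my $bases = $IUPAC{$code}
            or die "Unknown IUPAC code '$code'\n";
        # Extend every partial sequence with each base allowed at this position.
        my @next;
        for my $prefix (@expanded) {
            push @next, $prefix . $_ for split //, $bases;
        }
        @expanded = @next;
    }
    return @expanded;
}

my @probes = expand_probe('ACRY');    # ACAC ACAT ACGC ACGT
my $target = 'TTACGTAAACATGG';

for my $p (@probes) {
    my $pos = index $target, $p;      # first exact occurrence only
    print "$p found at offset $pos\n" if $pos >= 0;
}
# prints:
# ACAT found at offset 8
# ACGT found at offset 2
```

The number of expanded probes is the product of the degeneracy at each position, so it stays small only for short probes with few degenerate bases, which is the premise stated above. Each concrete probe could equally be handed to a fuzzy matcher instead of `index`.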

cjfields commented on August 17, 2024

@tsibley If we could only +1 a response 👍

Marking as closed.
