
Comments (16)

rmcgibbo commented on June 19, 2024

from mdtraj.

kyleabeauchamp commented on June 19, 2024

Yes, I was explicitly thinking about sklearn as a model.

kyleabeauchamp commented on June 19, 2024

We should also think about how to balance distance metric versus feature representations.

tjlane commented on June 19, 2024

Tagging this issue.

Kyle, is your vision for this a kind of map function, where you can efficiently apply a function that takes each member of a set of snapshots to a vector? Maybe you could write a little more about the intended use cases, so I (and maybe others) can get an idea of what you're thinking.

rmcgibbo commented on June 19, 2024

I think it's basically just re-imagining metric.prepare_trajectory as a separate class.

kyleabeauchamp commented on June 19, 2024

As an example of how we might use pandas here, think about the case of calculating dihedral angles.

Right now, the output is this:

Returns
-------
rid : np.ndarray, shape=(n_chi, 4)
    The indices of the atoms involved in each of the
    chi dihedral angles
angles : np.ndarray, shape=(n_frames, n_chi)
    The value of the dihedral angle for each of the angles in each of
    the frames.

The way pandas could be useful is by providing a natural way to merge the metadata (rid) and the values (angles). We could also have a way to switch between a string index and a "multiindex", where the multi-index would contain the following:

Type of calculation (chi torsion)
Residue ID
atom ID

I'm not trying to claim that this is the best way to do this, but it is one way to help streamline this stuff...
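
As a rough illustration of the idea above, here is a minimal sketch of merging the metadata and the values into one pandas DataFrame. The array contents, the residue IDs, and the "0-1-2-3" atom label format are all made up for the example; only the two array shapes come from the docstring.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for the two arrays described in the docstring.
n_frames, n_chi = 3, 2
rid = np.array([[0, 1, 2, 3], [4, 5, 6, 7]])  # atom indices, shape (n_chi, 4)
angles = np.zeros((n_frames, n_chi))          # values, shape (n_frames, n_chi)
residue_ids = [10, 11]                        # hypothetical residue ID per chi angle

# One column per dihedral; the three column levels carry the metadata.
columns = pd.MultiIndex.from_tuples(
    [
        ("chi", res, "-".join(str(a) for a in atoms))
        for res, atoms in zip(residue_ids, rid)
    ],
    names=["calculation", "residue", "atoms"],
)
frame = pd.DataFrame(angles, columns=columns)

# Switching to a flat string index amounts to joining the levels.
flat = frame.copy()
flat.columns = ["-".join(str(level) for level in col) for col in columns]
```

With this layout the values stay a plain (n_frames, n_chi) block, and the labels travel with the columns through slicing and concatenation.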

kyleabeauchamp commented on June 19, 2024

Then the "job" of the vectorizer would be to do two things:

  1. Calculate the quantity
  2. Give it a self-consistent label
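
A minimal sketch of those two jobs, assuming a callable object with a separate labels method. The class name DistanceFeaturizer, the "dist_i_j" label format, and the (n_frames, n_atoms, 3) coordinate array are illustrative assumptions, not an existing mdtraj interface.

```python
import numpy as np


class DistanceFeaturizer:
    """Hypothetical featurizer: pairwise atom distances, one label per column."""

    def __init__(self, pairs):
        self.pairs = [tuple(p) for p in pairs]

    def labels(self):
        # Job 2: a label that is derived only from the constructor arguments,
        # so it is the same on every call.
        return ["dist_%d_%d" % (i, j) for i, j in self.pairs]

    def __call__(self, xyz):
        # Job 1: calculate the quantity.
        # xyz is assumed to have shape (n_frames, n_atoms, 3).
        xyz = np.asarray(xyz, dtype=float)
        i, j = np.array(self.pairs).T
        return np.linalg.norm(xyz[:, i] - xyz[:, j], axis=-1)
```

The returned array has one row per frame and one column per label, in the same order as labels().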

rmcgibbo commented on June 19, 2024

I was just talking to @msultan about this yesterday afternoon. A useful part of the (feature/vector)izer API would be a minimal operator-type logic. For algorithms like ktICA, where you're in some sense throwing the kitchen sink at the problem in terms of very large feature spaces (sure, they might be implicit, but it's still in the spirit), you might want to use a sort of operator logic on featurizers to build up a complex "compound" feature space.

i.e.

[...]
>>> traj.n_frames == 100
>>> # applying two featurizers like normal
>>> dihedral_featurizer(traj).shape == (100, 5)
>>> contact_featurizer(traj).shape == (100, 5)

>>> joint_featurizer = dihedral_featurizer + contact_featurizer
>>> # adding two operators together yields a new compound operator
>>> # when you apply it, you get something that does both
>>> joint_featurizer(traj).shape == (100, 10)

I'm not sure what operations really make sense. There's adding two featurizers. Multiplying by a scalar
makes sense. Also, perhaps a kind of generalized outer product makes sense. For example, if you have
two binary featurizers that each induce a 10 dimensional space and you take their outer product under
the 'logical and' operator, you'd get a 100 dimensional feature space with all of the pairwise logical ands --
of the form space1[i] && space2[j].

Maybe I'm overthinking this. I'm not really sure what the use cases are except for some kind of very exhaustive enumeration.
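
To make the two operations concrete, here is a small sketch using plain functions as featurizers. The helper names concatenate and outer_and are hypothetical, and a featurizer is assumed to be any callable that returns an (n_frames, n_features) array.

```python
import numpy as np


def concatenate(*featurizers):
    """Compound featurizer: apply each one and stack the columns side by side."""
    def compound(traj):
        return np.hstack([f(traj) for f in featurizers])
    return compound


def outer_and(f, g):
    """Compound featurizer: all pairwise logical ANDs of two binary featurizers."""
    def compound(traj):
        a = np.asarray(f(traj), dtype=bool)      # (n_frames, n_a)
        b = np.asarray(g(traj), dtype=bool)      # (n_frames, n_b)
        pairs = a[:, :, None] & b[:, None, :]    # (n_frames, n_a, n_b)
        # Column i * n_b + j holds a[:, i] & b[:, j].
        return pairs.reshape(len(a), -1)
    return compound
```

Column counts add under concatenation and multiply under the outer product, which is why the latter grows so quickly.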

cc: @schwancr

kyleabeauchamp commented on June 19, 2024

I think Christian should give his thoughts on the desired properties.

schwancr commented on June 19, 2024

It sounds like a decent idea to provide some operators for the featurizer objects, but I'm worried it could be confusing, and I bet someone will end up adding the results of two featurizer.__call__ invocations as opposed to adding the two featurizers themselves.

For instance, you could add operators for building a Hybrid metric:

>>> rmsd = RMSD()
>>> dihedral = Dihedral()
>>> hybrid = 0.1 * rmsd + 0.9 * dihedral
...
>>> hybrid = Hybrid([rmsd, dihedral], [0.1, 0.9])

Those two methods would be the same, but the first one to me could be confusing. I think I'd prefer to have a hybrid featurizer just like we have a hybrid metric. In fact, even in the __add__ case we would need (I think) to write this class as well, meaning it would just be initialized in a unique way.
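
The point that the operators would only be another way to construct the same class can be sketched as follows. Everything here is an assumption made for illustration: the base class, the _terms helper, the choice to have Hybrid concatenate weighted columns, and the toy Columns featurizer.

```python
import numpy as np


class Featurizer:
    """Hypothetical base class; the operators only build Hybrid instances."""

    def __call__(self, traj):
        raise NotImplementedError

    def _terms(self):
        return [(self, 1.0)]

    def __add__(self, other):
        terms = self._terms() + other._terms()
        return Hybrid([f for f, _ in terms], [w for _, w in terms])

    def __rmul__(self, weight):
        terms = self._terms()
        return Hybrid([f for f, _ in terms], [weight * w for _, w in terms])


class Hybrid(Featurizer):
    """Weighted concatenation of several featurizers."""

    def __init__(self, featurizers, weights):
        if len(featurizers) != len(weights):
            raise ValueError("need exactly one weight per featurizer")
        self.featurizers = list(featurizers)
        self.weights = [float(w) for w in weights]

    def _terms(self):
        return list(zip(self.featurizers, self.weights))

    def __call__(self, traj):
        return np.hstack([w * f(traj) for f, w in self._terms()])


class Columns(Featurizer):
    """Toy featurizer for the demo: selects a column range of a 2-D array."""

    def __init__(self, start, stop):
        self.start, self.stop = start, stop

    def __call__(self, traj):
        return np.asarray(traj, dtype=float)[:, self.start:self.stop]
```

In this sketch, 0.1 * a + 0.9 * b and Hybrid([a, b], [0.1, 0.9]) produce objects with the same featurizers and weights, so the explicit constructor has to exist either way.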

kyleabeauchamp commented on June 19, 2024

I agree that some form of addition operation is critical, as we don't want to manually keep track of calculating each feature.

kyleabeauchamp commented on June 19, 2024

I'm not as fond of the outer product.

This does bring up the related point of how to design the MSMB3 hooks etc.

schwancr commented on June 19, 2024

The outer product is a lot like using the kernel trick with second-degree polynomials, so we could have a Polynomial featurizer if we wanted to. But again, I think it's clearer to have the featurizers be initialized by calling __init__.
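
One way such a Polynomial featurizer might look, as a sketch under assumptions: the class name PolynomialFeaturizer and the wrapper-around-a-base-featurizer design are hypothetical, and the explicit feature map below matches a second-degree polynomial kernel only up to constant scaling of the columns.

```python
import numpy as np


class PolynomialFeaturizer:
    """Hypothetical wrapper: base features plus all of their degree-2 products."""

    def __init__(self, base):
        self.base = base

    def __call__(self, traj):
        x = np.asarray(self.base(traj), dtype=float)  # (n_frames, n)
        i, j = np.triu_indices(x.shape[1])            # index pairs with i <= j
        return np.hstack([x, x[:, i] * x[:, j]])      # n + n * (n + 1) / 2 columns
```

It is constructed like any other featurizer, for example PolynomialFeaturizer(some_featurizer), with no operator overloading involved.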

kyleabeauchamp commented on June 19, 2024

OK, I'm fine with creating features from lists, rather than explicitly adding them.

schwancr commented on June 19, 2024

By the way, are we set on the "featurizer" name?

rmcgibbo commented on June 19, 2024

I don't think it was quite clear to me last night that this operator stuff is just an alternative interface to the constructors (__init__ methods) of classes like SumFeaturizer and ScalarMultipleFeaturizer, etc.

I'm not set on the name featurizer though. 

-Robert

On Tue, Aug 6, 2013 at 10:08 AM, Christian Schwantes wrote:

By the way, are we set on the "featurizer" name?

https://github.com/rmcgibbo/mdtraj/issues/49#issuecomment-22193291
