Comments (16)
- We might want the option to save the vectors to disk to avoid recomputing them.
- The APIs from sklearn for feature extraction are probably good models: http://scikit-learn.org/dev/modules/feature_extraction.html
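For concreteness, here is a toy sketch of the sklearn contract being pointed to — configure in __init__, learn state in fit(), map raw inputs to vectors in transform(). The ToyCountVectorizer class below is invented for illustration; it is not part of sklearn or mdtraj.

```python
# A minimal, self-contained imitation of sklearn's feature-extraction
# API shape: fit() learns a vocabulary, transform() maps documents to
# fixed-length count vectors.

class ToyCountVectorizer:
    def fit(self, docs):
        # Learn a sorted vocabulary from the training documents.
        vocab = sorted({word for doc in docs for word in doc.split()})
        self.vocabulary_ = {word: i for i, word in enumerate(vocab)}
        return self

    def transform(self, docs):
        # Map each document to a fixed-length count vector.
        rows = []
        for doc in docs:
            row = [0] * len(self.vocabulary_)
            for word in doc.split():
                if word in self.vocabulary_:
                    row[self.vocabulary_[word]] += 1
            rows.append(row)
        return rows

docs = ["chi phi", "phi psi phi"]
vec = ToyCountVectorizer().fit(docs)
print(vec.transform(docs))  # -> [[1, 1, 0], [0, 2, 1]]
```

The point is only the interface shape: a featurizer for trajectories could follow the same configure/fit/transform convention.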
from mdtraj.
Yes, I was explicitly thinking about sklearn as a model.
We should also think about how to balance distance metric versus feature representations.
Tagging this issue.
Kyle, is your vision for this a kind of map function, where you can efficiently apply a function that maps each member of a set of snapshots to a vector? Maybe you could write a little more about the intended use cases, so I (and maybe others) can get an idea of what you're thinking.
I think it's basically just re-imagining metric.prepare_trajectory as a separate class.
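A hedged sketch of what that separate class might look like — DistanceFeaturizer and its atom_pairs argument are hypothetical names, not an existing mdtraj API, and the trajectory is faked as a bare (n_frames, n_atoms, 3) coordinate array:

```python
import numpy as np

# Hypothetical sketch of metric.prepare_trajectory re-imagined as a
# class: a featurizer object turns a trajectory into an
# (n_frames, n_features) array.

class DistanceFeaturizer:
    def __init__(self, atom_pairs):
        self.atom_pairs = atom_pairs

    def __call__(self, xyz):
        # One feature per atom pair: the Euclidean distance per frame.
        cols = []
        for i, j in self.atom_pairs:
            cols.append(np.linalg.norm(xyz[:, i] - xyz[:, j], axis=1))
        return np.stack(cols, axis=1)

xyz = np.zeros((4, 3, 3))          # 4 frames, 3 atoms
xyz[:, 1, 0] = 1.0                 # atom 1 offset by 1 along x
featurizer = DistanceFeaturizer(atom_pairs=[(0, 1), (0, 2)])
print(featurizer(xyz).shape)       # -> (4, 2)
```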
As an example of how we might use pandas here, think about the case of calculating dihedral angles.
Right now, the output is this:
Returns
-------
rid : np.ndarray, shape=(n_chi, 4)
    The indices of the atoms involved in each of the
    chi dihedral angles.
angles : np.ndarray, shape=(n_frames, n_chi)
    The value of the dihedral angle for each of the angles in each of
    the frames.
The way pandas could be useful is by providing a natural way to merge the metadata (rid) and the values (angles). We could also have a way to switch between a string index and a multi-index, where the multi-index would contain the following:
- Type of calculation (chi torsion)
- Residue ID
- Atom ID
I'm not trying to claim that this is the best way to do this, but it is one way to help streamline this stuff...
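One way that merge could look in pandas, with made-up arrays standing in for real dihedral output and the level names (calculation, torsion_id, atom_ids) chosen purely for illustration:

```python
import numpy as np
import pandas as pd

# Attach the (rid) metadata to the (angles) values as a MultiIndex on
# the columns, with levels for calculation type, torsion ID, and the
# participating atom IDs. The arrays below are fabricated.

n_frames, n_chi = 3, 2
rid = np.array([[0, 1, 2, 3], [4, 5, 6, 7]])            # shape (n_chi, 4)
angles = np.random.uniform(-np.pi, np.pi, (n_frames, n_chi))

columns = pd.MultiIndex.from_tuples(
    [("chi", i, tuple(atoms)) for i, atoms in enumerate(rid)],
    names=["calculation", "torsion_id", "atom_ids"],
)
df = pd.DataFrame(angles, columns=columns)
print(df["chi"].shape)   # selecting by calculation type keeps (n_frames, n_chi)
```

Selecting on the outer level (df["chi"]) then gives all chi torsions at once, which is the kind of metadata-aware slicing a plain ndarray can't do.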
Then the "job" of the vectorizer would be to do two things:
- Calculate the quantity
- Give it a self-consistent label
I was just talking to @msultan about this yesterday afternoon. A useful part of the (feature/vector)izer API would be a minimal operator-type logic. For algorithms like ktICA, where you're, in some sense, throwing the kitchen sink at the problem in terms of very large feature spaces (sure, they might be implicit, but it's still in the spirit), you might want to use a sort of operator logic on featurizers to build up a complex "compound" feature space.
i.e.
[...]
>>> traj.n_frames == 100
>>> # applying two featurizers like normal
>>> dihedral_featurizer(traj).shape == (100, 5)
>>> contact_featurizer(traj).shape == (100, 5)
>>> joint_featurizer = dihedral_featurizer + contact_featurizer
>>> # adding two operators together yields a new compound operator
>>> # when you apply it, you get something that does both
>>> joint_featurizer(traj).shape == (100, 10)
I'm not sure what operations really make sense. Adding two featurizers makes sense, and so does multiplying by a scalar. Perhaps a kind of generalized outer product also makes sense: for example, if you have two binary featurizers that each induce a 10-dimensional space and you take their outer product under the 'logical and' operator, you'd get a 100-dimensional feature space with all of the pairwise logical ands, of the form space1[i] && space2[j].
Maybe I'm overthinking this. I'm not really sure what the use cases are, except for some kind of very exhaustive enumeration.
cc: @schwancr
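A minimal sketch of the operator logic proposed above, with every class name hypothetical: __add__ concatenates feature spaces into a compound featurizer, and __mul__ scales one.

```python
import numpy as np

# Each featurizer maps an (n_frames, ...) array to (n_frames,
# n_features). The operators return new featurizer objects rather
# than computing anything immediately.

class Featurizer:
    def __add__(self, other):
        return SumFeaturizer([self, other])

    def __mul__(self, scalar):
        return ScaledFeaturizer(self, scalar)

    __rmul__ = __mul__

class FunctionFeaturizer(Featurizer):
    def __init__(self, func):
        self.func = func

    def __call__(self, traj):
        return self.func(traj)

class SumFeaturizer(Featurizer):
    def __init__(self, parts):
        self.parts = parts

    def __call__(self, traj):
        # Concatenate the component feature spaces along the feature axis.
        return np.concatenate([p(traj) for p in self.parts], axis=1)

class ScaledFeaturizer(Featurizer):
    def __init__(self, part, scalar):
        self.part, self.scalar = part, scalar

    def __call__(self, traj):
        return self.scalar * self.part(traj)

traj = np.ones((100, 3))                       # stand-in trajectory
dihedral = FunctionFeaturizer(lambda t: np.tile(t[:, :1], (1, 5)))
contact = FunctionFeaturizer(lambda t: np.tile(t[:, :1], (1, 5)))
joint = dihedral + 0.5 * contact
print(joint(traj).shape)                       # -> (100, 10)
```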
I think Christian should give his thoughts on the desired properties.
It sounds like a decent idea to provide some operators for the featurizer objects, but I'm worried it could be confusing: I bet someone will end up adding the results of two featurizer __call__'s as opposed to adding two featurizers.
For instance, you could add operators for building a Hybrid metric:
>>> rmsd = RMSD()
>>> dihedral = Dihedral()
>>> hybrid = 0.1 * rmsd + 0.9 * dihedral
...
>>> hybrid = Hybrid([rmsd, dihedral], [0.1, 0.9])
Those two constructions would be equivalent, but the first seems confusing to me. I think I'd prefer to have a hybrid featurizer, just like we have a hybrid metric. In fact, even in the __add__ case we would (I think) need to write this class anyway; the operator would just be another way of initializing it.
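For comparison, a sketch of the constructor-based form — HybridFeaturizer and the stand-in component featurizers are invented for illustration, not existing classes:

```python
import numpy as np

# A hybrid featurizer initialized explicitly with a list of
# featurizers and matching weights; no operator overloading involved.

class HybridFeaturizer:
    def __init__(self, featurizers, weights):
        assert len(featurizers) == len(weights)
        self.featurizers = featurizers
        self.weights = weights

    def __call__(self, traj):
        # Scale each component's features and concatenate them.
        parts = [w * f(traj)
                 for f, w in zip(self.featurizers, self.weights)]
        return np.concatenate(parts, axis=1)

rmsd = lambda traj: np.ones((len(traj), 2))        # stand-in components
dihedral = lambda traj: np.ones((len(traj), 3))
hybrid = HybridFeaturizer([rmsd, dihedral], [0.1, 0.9])
print(hybrid(np.zeros((4, 1))).shape)              # -> (4, 5)
```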
I agree that some form of addition operation is critical, as we don't want to manually keep track of calculating each feature.
I'm not as fond of the outer product.
This does bring up the related point of how to design the MSMB3 hooks etc.
The outer product is a lot like using the kernel trick with second-degree polynomials, so we could have a Polynomial featurizer if we wanted to. But again, I think it's clearer to have the featurizers be initialized by calling __init__.
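The analogy can be made concrete: all pairwise products of two feature vectors are exactly the explicit feature map of a product kernel, so inner products in the outer-product space factor into products of inner products in the original spaces. A small sketch, where outer_features is a made-up helper (for binary features, the product a[i]*b[j] is precisely the logical and mentioned earlier):

```python
import numpy as np
from itertools import product

def outer_features(a, b):
    # For each frame, all pairwise products a[i] * b[j].
    return np.array([[x[i] * y[j]
                      for i, j in product(range(len(x)), range(len(y)))]
                     for x, y in zip(a, b)])

a = np.array([[1.0, 2.0]])        # one frame, 2 features
b = np.array([[3.0, 4.0, 5.0]])   # one frame, 3 features
feats = outer_features(a, b)
print(feats.shape)                # -> (1, 6)

# The kernel-trick identity: <x (x) y, x' (x) y'> = <x, x'> <y, y'>
lhs = feats @ feats.T
rhs = (a @ a.T) * (b @ b.T)
print(np.allclose(lhs, rhs))      # -> True
```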
OK, I'm fine with creating features from lists, rather than explicitly adding them.
By the way, are we set on the "featurizer" name?
I don't think it was quite clear to me last night that this operator stuff is just an alternative interface to the constructors of classes like SumFeaturizer and ScalarMultipleFeaturizer, etc.
I'm not set on the name featurizer, though.