Comments (16)
- We might want the option to save the vectors to disk to avoid recomputing them.
- The APIs from sklearn for feature extraction are probably good models: http://scikit-learn.org/dev/modules/feature_extraction.html
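For concreteness, here is a toy sketch of the sklearn contract being pointed to — configure in __init__, learn state in fit(), map raw inputs to vectors in transform(). The ToyCountVectorizer class below is invented for illustration; it is not part of sklearn or mdtraj.

```python
# A minimal, self-contained imitation of sklearn's feature-extraction
# API shape: fit() learns a vocabulary, transform() maps documents to
# fixed-length count vectors.

class ToyCountVectorizer:
    def fit(self, docs):
        # Learn a sorted vocabulary from the training documents.
        vocab = sorted({word for doc in docs for word in doc.split()})
        self.vocabulary_ = {word: i for i, word in enumerate(vocab)}
        return self

    def transform(self, docs):
        # Map each document to a fixed-length count vector.
        rows = []
        for doc in docs:
            row = [0] * len(self.vocabulary_)
            for word in doc.split():
                if word in self.vocabulary_:
                    row[self.vocabulary_[word]] += 1
            rows.append(row)
        return rows

docs = ["chi phi", "phi psi phi"]
vec = ToyCountVectorizer().fit(docs)
print(vec.transform(docs))  # -> [[1, 1, 0], [0, 2, 1]]
```

The point is only the interface shape: a featurizer for trajectories could follow the same configure/fit/transform convention.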
from mdtraj.
Yes, I was explicitly thinking about sklearn as a model.
We should also think about how to balance distance metric versus feature representations.
Tagging this issue.
Kyle, is your vision for this a kind of map function, where you can efficiently apply a function that maps each member of a set of snapshots to a vector? Maybe you could write a little more about the intended use cases, so I (and maybe others) can get an idea of what you're thinking.
I think it's basically just re-imagining metric.prepare_trajectory as a separate class.
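A hedged sketch of what that separate class might look like — DistanceFeaturizer and its atom_pairs argument are hypothetical names, not an existing mdtraj API, and the trajectory is faked as a bare (n_frames, n_atoms, 3) coordinate array:

```python
import numpy as np

# Hypothetical sketch of metric.prepare_trajectory re-imagined as a
# class: a featurizer object turns a trajectory into an
# (n_frames, n_features) array.

class DistanceFeaturizer:
    def __init__(self, atom_pairs):
        self.atom_pairs = atom_pairs

    def __call__(self, xyz):
        # One feature per atom pair: the Euclidean distance per frame.
        cols = []
        for i, j in self.atom_pairs:
            cols.append(np.linalg.norm(xyz[:, i] - xyz[:, j], axis=1))
        return np.stack(cols, axis=1)

xyz = np.zeros((4, 3, 3))          # 4 frames, 3 atoms
xyz[:, 1, 0] = 1.0                 # atom 1 offset by 1 along x
featurizer = DistanceFeaturizer(atom_pairs=[(0, 1), (0, 2)])
print(featurizer(xyz).shape)       # -> (4, 2)
```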
As an example of how we might use pandas here, think about the case of calculating dihedral angles.
Right now, the output is this:
Returns
-------
rid : np.ndarray, shape=(n_chi, 4)
    The indices of the atoms involved in each of the
    chi dihedral angles.
angles : np.ndarray, shape=(n_frames, n_chi)
    The value of the dihedral angle for each of the angles in each of
    the frames.
The way pandas could be useful is by providing a natural way to merge the metadata (rid) and the values (angles). We could also have a way to switch between a string index and a multi-index, where the multi-index would contain the following:
- Type of calculation (chi torsion)
- Residue ID
- Atom ID
I'm not trying to claim that this is the best way to do this, but it is one way to help streamline this stuff...
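One way that merge could look in pandas, with made-up arrays standing in for real dihedral output and the level names (calculation, torsion_id, atom_ids) chosen purely for illustration:

```python
import numpy as np
import pandas as pd

# Attach the (rid) metadata to the (angles) values as a MultiIndex on
# the columns, with levels for calculation type, torsion ID, and the
# participating atom IDs. The arrays below are fabricated.

n_frames, n_chi = 3, 2
rid = np.array([[0, 1, 2, 3], [4, 5, 6, 7]])            # shape (n_chi, 4)
angles = np.random.uniform(-np.pi, np.pi, (n_frames, n_chi))

columns = pd.MultiIndex.from_tuples(
    [("chi", i, tuple(atoms)) for i, atoms in enumerate(rid)],
    names=["calculation", "torsion_id", "atom_ids"],
)
df = pd.DataFrame(angles, columns=columns)
print(df["chi"].shape)   # selecting by calculation type keeps (n_frames, n_chi)
```

Selecting on the outer level (df["chi"]) then gives all chi torsions at once, which is the kind of metadata-aware slicing a plain ndarray can't do.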
Then the "job" of the vectorizer would be to do two things:
- Calculate the quantity
- Give it a self-consistent label
I was just talking to @msultan about this yesterday afternoon. A useful part of the (feature/vector)izer API would be a minimal operator-type logic. For algorithms like ktICA, where you're, in some sense, throwing the kitchen sink at the problem in terms of very large feature spaces (sure, they might be implicit, but it's still in the spirit), you might want to use a sort of operator logic on featurizers to build up a complex "compound" feature space.
i.e.
[...]
>>> traj.n_frames == 100
>>> # applying two featurizers like normal
>>> dihedral_featurizer(traj).shape == (100, 5)
>>> contact_featurizer(traj).shape == (100, 5)
>>> joint_featurizer = dihedral_featurizer + contact_featurizer
>>> # adding two operators together yields a new compound operator
>>> # when you apply it, you get something that does both
>>> joint_featurizer(traj).shape == (100, 10)
I'm not sure what operations really make sense. Adding two featurizers makes sense, and so does multiplying by a scalar. Perhaps a kind of generalized outer product also makes sense: for example, if you have two binary featurizers that each induce a 10-dimensional space and you take their outer product under the 'logical and' operator, you'd get a 100-dimensional feature space with all of the pairwise logical ands, of the form space1[i] && space2[j].
Maybe I'm overthinking this. I'm not really sure what the use cases are, except for some kind of very exhaustive enumeration.
cc: @schwancr
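A minimal sketch of the operator logic proposed above, with every class name hypothetical: __add__ concatenates feature spaces into a compound featurizer, and __mul__ scales one.

```python
import numpy as np

# Each featurizer maps an (n_frames, ...) array to (n_frames,
# n_features). The operators return new featurizer objects rather
# than computing anything immediately.

class Featurizer:
    def __add__(self, other):
        return SumFeaturizer([self, other])

    def __mul__(self, scalar):
        return ScaledFeaturizer(self, scalar)

    __rmul__ = __mul__

class FunctionFeaturizer(Featurizer):
    def __init__(self, func):
        self.func = func

    def __call__(self, traj):
        return self.func(traj)

class SumFeaturizer(Featurizer):
    def __init__(self, parts):
        self.parts = parts

    def __call__(self, traj):
        # Concatenate the component feature spaces along the feature axis.
        return np.concatenate([p(traj) for p in self.parts], axis=1)

class ScaledFeaturizer(Featurizer):
    def __init__(self, part, scalar):
        self.part, self.scalar = part, scalar

    def __call__(self, traj):
        return self.scalar * self.part(traj)

traj = np.ones((100, 3))                       # stand-in trajectory
dihedral = FunctionFeaturizer(lambda t: np.tile(t[:, :1], (1, 5)))
contact = FunctionFeaturizer(lambda t: np.tile(t[:, :1], (1, 5)))
joint = dihedral + 0.5 * contact
print(joint(traj).shape)                       # -> (100, 10)
```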
I think Christian should give his thoughts on the desired properties.
It sounds like a decent idea to provide some operators for the featurizer objects, but I'm worried it could be confusing: I bet someone will end up adding the results of two featurizer __call__'s as opposed to adding two featurizers.
For instance, you could add operators for building a Hybrid metric:
>>> rmsd = RMSD()
>>> dihedral = Dihedral()
>>> hybrid = 0.1 * rmsd + 0.9 * dihedral
...
>>> hybrid = Hybrid([rmsd, dihedral], [0.1, 0.9])
Those two constructions would be equivalent, but the first seems confusing to me. I think I'd prefer to have a hybrid featurizer, just like we have a hybrid metric. In fact, even in the __add__ case we would (I think) need to write this class anyway; the operator would just be another way of initializing it.
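For comparison, a sketch of the constructor-based form — HybridFeaturizer and the stand-in component featurizers are invented for illustration, not existing classes:

```python
import numpy as np

# A hybrid featurizer initialized explicitly with a list of
# featurizers and matching weights; no operator overloading involved.

class HybridFeaturizer:
    def __init__(self, featurizers, weights):
        assert len(featurizers) == len(weights)
        self.featurizers = featurizers
        self.weights = weights

    def __call__(self, traj):
        # Scale each component's features and concatenate them.
        parts = [w * f(traj)
                 for f, w in zip(self.featurizers, self.weights)]
        return np.concatenate(parts, axis=1)

rmsd = lambda traj: np.ones((len(traj), 2))        # stand-in components
dihedral = lambda traj: np.ones((len(traj), 3))
hybrid = HybridFeaturizer([rmsd, dihedral], [0.1, 0.9])
print(hybrid(np.zeros((4, 1))).shape)              # -> (4, 5)
```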
I agree that some form of addition operation is critical, as we don't want to manually keep track of calculating each feature.
I'm not as fond of the outer product.
This does bring up the related point of how to design the MSMB3 hooks etc.
The outer product is a lot like using the kernel trick with second-degree polynomials, so we could have a Polynomial featurizer if we wanted to. But again, I think it's clearer to have the featurizers be initialized by calling __init__.
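The analogy can be made concrete: all pairwise products of two feature vectors are exactly the explicit feature map of a product kernel, so inner products in the outer-product space factor into products of inner products in the original spaces. A small sketch, where outer_features is a made-up helper (for binary features, the product a[i]*b[j] is precisely the logical and mentioned earlier):

```python
import numpy as np
from itertools import product

def outer_features(a, b):
    # For each frame, all pairwise products a[i] * b[j].
    return np.array([[x[i] * y[j]
                      for i, j in product(range(len(x)), range(len(y)))]
                     for x, y in zip(a, b)])

a = np.array([[1.0, 2.0]])        # one frame, 2 features
b = np.array([[3.0, 4.0, 5.0]])   # one frame, 3 features
feats = outer_features(a, b)
print(feats.shape)                # -> (1, 6)

# The kernel-trick identity: <x (x) y, x' (x) y'> = <x, x'> <y, y'>
lhs = feats @ feats.T
rhs = (a @ a.T) * (b @ b.T)
print(np.allclose(lhs, rhs))      # -> True
```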
OK, I'm fine with creating features from lists, rather than explicitly adding them.
By the way, are we set on the "featurizer" name?
I don't think it was quite clear to me last night that this operator stuff is just an alternative interface to the constructors of classes like SumFeaturizer and ScalarMultipleFeaturizer, etc.
I'm not set on the name featurizer, though.