Comments (6)
@MelissaKR thanks for the question. Can you tell us more about your use case?
Basically, if you want to do some feature pre-processing on the pre-trained embedding columns, then yes, you can feed them to NVTabular as continuous features.
Let us know if you have further questions.
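As a minimal sketch of that pre-processing step (the column and variable names here are illustrative, not part of NVTabular's API): one common approach is to expand a vector-valued embedding column into n scalar columns, which can then be listed as continuous features.

```python
import numpy as np
import pandas as pd

# Toy frame: each row carries a pre-trained embedding of dimension 4.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "movie_embedding": [rng.random(4) for _ in range(3)],
})

# Expand the vector column into scalar columns emb_0 .. emb_3 so each
# component can be treated as an ordinary continuous feature.
emb = np.stack(df["movie_embedding"].to_numpy())
emb_cols = [f"emb_{i}" for i in range(emb.shape[1])]
emb_df = pd.DataFrame(emb, columns=emb_cols, index=df.index)
df = pd.concat([df.drop(columns=["movie_embedding"]), emb_df], axis=1)

print(df.columns.tolist())  # → ['movie_id', 'emb_0', 'emb_1', 'emb_2', 'emb_3']
```

The resulting emb_* columns would then go in the list of continuous features handed to the dataloader.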
@rnyak Thank you for your response. I have another model that outputs embeddings for a given set of features, and I want to replace those features in the original model with the embeddings I have obtained. Should I simply pass these new feature columns as `conts` in `TorchAsyncItr`? It would be great if I could see example code of how pre-trained embeddings are passed to NVTabular's `TorchAsyncItr`.
@rnyak, just following up on this.
@MelissaKR This issue has been open for a while. Do you mind giving a bit more detail about what you want to do with the embeddings you are getting from the other model, and what your original model is? We currently support feeding pre-trained embeddings into an embedding layer; you can see the TensorFlow example for that. Let us know if that is what you were looking for, or if it is something else. Thanks.
@rnyak Thank you for getting back to me on this! In my main model (which uses PyTorch, by the way), suppose I have a feature for different movies. I could pass it as a regular categorical feature to be fed into an embedding layer. However, I have trained a separate collaborative-filtering model that learns much better embeddings for these movies. So for each movie in the main model's training and validation sets, I now have a vector of size n holding the learned embedding, and I no longer need to pass the movie feature through an embedding layer. Instead, I want to remove it from my dataset and use the learned embeddings from the second model. Is there a straightforward way of doing this, other than manually defining n new numeric features for each element of the movie embedding and passing them to NVTabular? In other words, how can I pass pre-trained embeddings as-is to my model?
I hope that clarifies my question and use case.
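To make the replacement concrete, here is a hedged numpy sketch, assuming the collaborative-filtering model exposes its learned embeddings as a plain weight matrix indexed by movie id (all names here are hypothetical):

```python
import numpy as np

# Hypothetical pre-trained embedding matrix from the CF model:
# row i holds the learned vector for movie id i.
n_movies, emb_dim = 100, 8
rng = np.random.default_rng(42)
cf_embeddings = rng.random((n_movies, emb_dim)).astype(np.float32)

# Movie ids appearing in a batch of the main model's data.
movie_ids = np.array([3, 17, 3, 42])

# Instead of passing movie_ids through a trainable embedding layer,
# look the vectors up directly and use them as dense input features.
dense_features = cf_embeddings[movie_ids]

print(dense_features.shape)  # → (4, 8): one emb_dim vector per row
```

The lookup replaces the embedding layer entirely: the movie_id categorical feature is dropped, and the frozen CF vectors enter the model as continuous inputs.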
@MelissaKR Thanks for the clarification. We are currently working on that and will be publishing an example shortly. The example might not be in PyTorch, but I believe you can adapt it to your framework :) Can you please tell me the architecture of your main model? Is it an MLP, or something more complicated? Also, can you share a simple snapshot of what your data looks like? Does it contain nested 3D arrays, or is it something like the layout below?
movie_id    movie_embedding
1           [float1, float2, ..., float64]
2           [float1, float2, ..., float64]
...
n           [float1, float2, ..., float64]
or more like this:
movie_id    movie_genres_id    movie_genres_embeddings
1           [1, 2, 3]          [[float1, float2, ..., float64], [float1, float2, ..., float64], ...]
2           [3, 5]             [[float1, float2, ..., float64], [float1, float2, ..., float64]]
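For the first layout above, attaching the embeddings to the training data is just a join on movie_id; a minimal pandas sketch (column names illustrative):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Lookup table in the first layout: one embedding per movie_id.
emb_table = pd.DataFrame({
    "movie_id": [1, 2, 3],
    "movie_embedding": [rng.random(4) for _ in range(3)],
})

# Interactions from the main model's training data.
train = pd.DataFrame({"movie_id": [2, 1, 2], "label": [1, 0, 1]})

# Attach each row's pre-trained embedding via a left join on movie_id.
train = train.merge(emb_table, on="movie_id", how="left")

print(train.shape)  # → (3, 3)
```

From there, the movie_embedding column can be expanded into scalar columns or kept as a list column, depending on what the dataloader expects.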