Comments (2)
Hi @LFGMUW,
> I tried following the documentation on how to train and evaluate embedding models in pykeen, but the OpenBioLink dataset caused CUDA OOM issues (partly because slicing is not implemented for most models). Can I adapt any parameters to make it fit?
Please consider that OpenBioLink is a comparatively large dataset. To avoid CUDA out-of-memory exceptions, you can reduce the embedding dimension of your model and make use of the automatic memory optimization, i.e. `pipeline(model_kwargs=dict(automatic_memory_optimization=True, ...), ...)`. Please note that if your model uses batch normalization, `automatic_memory_optimization` cannot be used, since it relies on sub-batching whereas batch normalization requires the full batch when applied.
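To make the sub-batching caveat concrete, here is a toy sketch in plain Python. It is not PyKEEN's implementation; the helper names (`batch_norm`, `summed_squares`, `chunks`) and the numbers are made up for illustration. It shows that a per-example computation gives the same total whether the batch is processed at once or in sub-batches, while a batch-normalized computation does not.

```python
from statistics import fmean, pstdev


def batch_norm(values):
    """Normalize values using the mean and std of exactly the values passed in."""
    mean, std = fmean(values), pstdev(values)
    return [(v - mean) / std for v in values]


def summed_squares(values):
    """A per-example computation (no batch statistics), summed over the batch."""
    return sum(v * v for v in values)


def chunks(values, size):
    """Split a batch into consecutive sub-batches of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


# Hypothetical batch of four scalar activations.
batch = [1.0, 2.0, 3.0, 10.0]
sub_batches = chunks(batch, 2)

# Without batch statistics, accumulating over sub-batches matches the full batch.
full = summed_squares(batch)
accumulated = sum(summed_squares(sb) for sb in sub_batches)
print(full == accumulated)

# With batch normalization, each sub-batch is normalized with its own statistics,
# so the outputs differ from normalizing the full batch at once.
full_bn = batch_norm(batch)
sub_bn = [x for sb in sub_batches for x in batch_norm(sb)]
print(full_bn)
print(sub_bn)
```

In the sub-batched case every pair is normalized to the same two values, so the information that 10.0 is an outlier in the full batch is lost. This is the reason sub-batching cannot transparently replace a full batch once batch normalization is involved.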
> Side note: how are OBLF1 and F2 supposed to be used?
Openbiolinkf1 and Openbiolinkf2 are subsets of OpenBioLink that we created, which we will upload soon.
from pykeen.pipeline import pipeline

pipeline_result = pipeline(
    dataset='WN18RR',
    model='RotatE',
    model_kwargs=dict(
        embedding_dim=500,
        # automatic_memory_optimization=True,
    ),
)
We performed a reproducibility study and integrated the corresponding configurations. The configuration for RotatE on WN18RR can be found at https://github.com/pykeen/pykeen/blob/master/src/pykeen/experiments/rotate/sun2019_rotate_wn18rr.json
The best RotatE-WN18RR configuration that we found is available at https://github.com/pykeen/benchmarking/blob/master/ablation/results/rotate/wn18rr/random/adam/2020-04-25-19-04_217bcf38-2101-461b-9593-b133b4201e6a/0000_wn18rr_rotate/best_pipeline/pipeline_config.json
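One way to use such a configuration file without retyping hyperparameters is sketched below. It assumes the downloaded JSON has the top-level `metadata` and `pipeline` keys that `pykeen.pipeline.pipeline_from_config` reads; `override_model_kwargs`, `run_from_file`, and `example_config` are hypothetical names and values introduced only for this illustration, and the stand-in config is not the content of either linked file.

```python
import copy
import json


def override_model_kwargs(config, **overrides):
    """Return a copy of the config with entries in pipeline.model_kwargs replaced."""
    new_config = copy.deepcopy(config)
    pipeline_section = new_config.setdefault("pipeline", {})
    pipeline_section.setdefault("model_kwargs", {}).update(overrides)
    return new_config


def run_from_file(path, **overrides):
    """Load a configuration JSON and run it (requires pykeen; not called here)."""
    # Deferred import so the helper above can be used without pykeen installed.
    from pykeen.pipeline import pipeline_from_config

    with open(path) as file:
        config = json.load(file)
    return pipeline_from_config(override_model_kwargs(config, **overrides))


# Hypothetical, heavily shortened stand-in for a downloaded configuration file.
example_config = {
    "metadata": {"title": "illustrative example"},
    "pipeline": {
        "dataset": "wn18rr",
        "model": "RotatE",
        "model_kwargs": {"embedding_dim": 500},
    },
}

# Shrink the embedding dimension to fit a smaller GPU; the original is untouched.
smaller = override_model_kwargs(example_config, embedding_dim=64)
print(smaller["pipeline"]["model_kwargs"])
```

Any override, such as a smaller `embedding_dim`, changes the experiment, so the resulting metrics should not be expected to match the published ones.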
Please let us know if you need further help :)
from pykeen.
Hi @mali-git,
Thank you for replying to my questions.
> Please consider that OpenBioLink is a comparatively large dataset. To avoid CUDA out-of-memory exceptions, you can reduce the embedding dimension of your model and make use of the automatic memory optimization
Yes, but I thought that since the dataset is built in, there would perhaps be a way of training on it. I did use the memory optimization parameter as well and ran it with dimension 50; making it much smaller does not seem promising for the learned embeddings.
> Please note that if your model uses batch normalization, the automatic_memory_optimization cannot be used since it relies on sub-batching whereas batch normalization requires the full batch when applied.
Yes, thank you. I was hoping there was a way around the memory constraints, seeing that you published results for these models.
I did see the experiment configs; they were the basis of my attempt at recreating the results. I was surprised by how many orders of magnitude my metric results differed. But thank you.