Comments (5)
Note that in our paper, ComplEx refers to ComplEx with traditional 1-1 scoring. To produce the results for ComplEx with 1-1 scoring, we used my co-author's software: https://github.com/uclmr/inferbeddings
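For reference, 1-1 scoring means each triple (s, r, o) is scored individually; the ComplEx score is Re(&lt;e_s, w_r, conj(e_o)&gt;). A minimal numpy sketch of the scoring function (illustrative only, not the inferbeddings implementation):

```python
import numpy as np

def complex_score(e_s, w_r, e_o):
    """ComplEx 1-1 score for a single triple: Re(<e_s, w_r, conj(e_o)>).

    e_s, w_r, e_o are complex-valued embedding vectors of equal dimension.
    """
    return np.sum(e_s * w_r * np.conj(e_o)).real

rng = np.random.default_rng(0)
dim = 200  # embedding_size=200, as in the settings below
e_s = rng.normal(size=dim) + 1j * rng.normal(size=dim)
w_r = rng.normal(size=dim) + 1j * rng.normal(size=dim)
e_o = rng.normal(size=dim) + 1j * rng.normal(size=dim)
print(complex_score(e_s, w_r, e_o))  # a single real-valued score
```

Because w_r is complex, the score is not symmetric in s and o, which lets ComplEx model asymmetric relations.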
The following is an excerpt from an email that also asks about parameter settings for ComplEx on FB15k-237:
thank you for your interest in our work. Pasquale (CC'd) worked on this with his code, which you can find here: https://github.com/uclmr/inferbeddings
Here are some example results; they are not the best that we got, though. The batch size for Pasquale's code is the entire training set split into 10 pieces.
Here are also the hyperparameters for ComplEx. These results are actually a bit better than those we report in the paper, so we might have missed them. We updated our paper accordingly.
$ ~/workspace/inferbeddings/tools/parse_results_mrr_filtered.sh *ComplEx*
192
Best MRR, Filt: ucl_fb15k-237_v1.embedding_size=200_epochs=1000_loss=pairwise_hinge_margin=5_model=ComplEx_similarity=dot.log
Test - Best Raw MRR: 0.149
Test - Best Filt MRR: 0.247
Test - Best Raw MR: 526.44904
Test - Best Filt MR: 338.9463
Test - Best Raw Hits@1: 8.077%
Test - Best Filt Hits@1: 15.797%
Test - Best Raw Hits@3: 15.489%
Test - Best Filt Hits@3: 27.485%
Test - Best Raw Hits@5: 20.791%
Test - Best Filt Hits@5: 33.58%
Test - Best Raw Hits@10: 29.437%
Test - Best Filt Hits@10: 42.83%
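The Raw vs. Filt numbers above differ only in how the rank of the correct entity is computed: filtered ranking ignores other candidates that also form known true triples. A minimal sketch of both ranks for one test triple (illustrative; `raw_and_filtered_rank` is a hypothetical helper, not the evaluation code used here):

```python
import numpy as np

def raw_and_filtered_rank(scores, true_idx, known_true):
    """Rank of the correct entity among all candidate scores.

    scores      -- model score for every candidate object entity
    true_idx    -- index of the correct object
    known_true  -- indices of objects that also form true triples
                   (from train/valid/test) and are filtered out
    """
    raw_rank = 1 + np.sum(scores > scores[true_idx])
    filtered = scores.copy()
    mask = [i for i in known_true if i != true_idx]
    filtered[mask] = -np.inf  # remove competing true triples
    filt_rank = 1 + np.sum(filtered > filtered[true_idx])
    return raw_rank, filt_rank

scores = np.array([0.9, 0.7, 0.8, 0.2])
raw, filt = raw_and_filtered_rank(scores, true_idx=1, known_true=[0, 1])
print(raw, filt)  # raw rank 3, filtered rank 2
```

MRR is then the mean of 1/rank over all test triples, and Hits@k the fraction of ranks at most k.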
I am quoting the WN18RR parameters and results from another email:
thank you! Here are the parameter configuration and the final results for DistMult and ComplEx from Pasquale:
For ComplEx:
inferbeddings$ ./tools/parse_results_mrr_filtered.sh ~/workspace/inferbeddings/logs/schematic-memory/filtered/ucl_wn18rr_v1/*.log
384
Best MRR, Filt: /home/pminervi/workspace/inferbeddings/logs/schematic-memory/filtered/ucl_wn18rr_v1/ucl_wn18rr_v1.embedding_size=200_epochs=200_loss=hinge_margin=2_model=ComplEx_similarity=dot.log
Test - Best Raw MRR: 0.309
Test - Best Filt MRR: 0.444
Test - Best Raw MR: 5274.99697
Test - Best Filt MR: 5261.30121
Test - Best Raw Hits@1: 21.171%
Test - Best Filt Hits@1: 41.114%
Test - Best Raw Hits@3: 38.21%
Test - Best Filt Hits@3: 45.836%
Test - Best Raw Hits@5: 43.539%
Test - Best Filt Hits@5: 47.846%
Test - Best Raw Hits@10: 47.336%
Test - Best Filt Hits@10: 50.734%
These parameter settings should reproduce the results in the paper.
Hi, could you please share the detailed settings of DistMult on FB15k-237 and WN18RR?
The following are parameters for DistMult on WN18RR:
For DistMult:
inferbeddings$ ./tools/parse_results_mrr_filtered.sh ~/workspace/inferbeddings/logs/schematic-memory/filtered/ucl_wn18rr_v1/*DistMult*.log
192
Best MRR, Filt: /home/pminervi/workspace/inferbeddings/logs/schematic-memory/filtered/ucl_wn18rr_v1/ucl_wn18rr_v1.embedding_size=200_epochs=500_loss=hinge_margin=2_model=DistMult_similarity=dot.log
Test - Best Raw MRR: 0.301
Test - Best Filt MRR: 0.425
Test - Best Raw MR: 5124.58998
Test - Best Filt MR: 5110.78287
Test - Best Raw Hits@1: 20.453%
Test - Best Filt Hits@1: 38.864%
Test - Best Raw Hits@3: 37.077%
Test - Best Filt Hits@3: 43.874%
Test - Best Raw Hits@5: 41.895%
Test - Best Filt Hits@5: 45.82%
Test - Best Raw Hits@10: 46.011%
Test - Best Filt Hits@10: 49.059%
And these are the parameters on FB15k-237:
237_v1.embedding_size=200_epochs=1000_loss=hinge_margin=2_model=DistMult_similarity=dot.log
Test - Best Raw MRR: 0.159
Test - Best Filt MRR: 0.241
Test - Best Raw MR: 441.64219
Test - Best Filt MR: 254.15069
Test - Best Raw Hits@1: 9.096%
Test - Best Filt Hits@1: 15.523%
Test - Best Raw Hits@3: 16.598%
Test - Best Filt Hits@3: 26.27%
Test - Best Raw Hits@5: 21.599%
Test - Best Filt Hits@5: 32.486%
Test - Best Raw Hits@10: 30.104%
Test - Best Filt Hits@10: 41.896%
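DistMult, whose settings are given above, scores a triple as the trilinear dot product &lt;e_s, w_r, e_o&gt; with real-valued embeddings. A minimal sketch (illustrative, not the inferbeddings implementation):

```python
import numpy as np

def distmult_score(e_s, w_r, e_o):
    """DistMult 1-1 score: <e_s, w_r, e_o> = sum_i e_s[i] * w_r[i] * e_o[i]."""
    return float(np.sum(e_s * w_r * e_o))

# DistMult is symmetric in subject and object, which is one reason it can
# struggle on datasets with many asymmetric relations.
rng = np.random.default_rng(0)
e_s, w_r, e_o = rng.normal(size=(3, 200))
assert np.isclose(distmult_score(e_s, w_r, e_o), distmult_score(e_o, w_r, e_s))
```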
Hi Tim, first of all thanks for the great repo! One additional question on the DistMult settings: Which dropout value did you use? Also, could you replicate these results with the code in your repo, or only with 1-1 scoring and the inferbeddings code?
For the inferbeddings code we did not use any dropout, and we also did not use L2 regularization. However, we do re-normalize the embedding vectors to L2 norm <= 1 after each weight update, which has a regularizing effect. All of this is for 1-1 scoring.
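The re-normalization described above can be sketched as projecting each embedding row back onto the unit L2 ball after the gradient step (a sketch of the idea, assuming a plain numpy embedding matrix; not the inferbeddings code):

```python
import numpy as np

def renormalize(embeddings, max_norm=1.0):
    """Project each row onto the L2 ball of radius max_norm.

    Rows with norm <= max_norm are left untouched; larger rows are
    rescaled to exactly max_norm, which acts as a hard regularizer.
    """
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    scale = np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))
    return embeddings * scale

E = np.array([[3.0, 4.0],   # norm 5   -> rescaled to norm 1
              [0.3, 0.4]])  # norm 0.5 -> unchanged
E = renormalize(E)
print(np.linalg.norm(E, axis=1))  # norms are now [1.0, 0.5]
```

Applying this after every weight update keeps all embeddings inside the unit ball, similar in effect to (but stricter than) an L2 penalty.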
For 1-N scoring, the results differ between datasets: for some it improves performance, for others it does not. I think for WN18RR it decreases performance, so 1-1 scoring seems to do better there (or is it the margin loss, or the re-normalization? We did not test this causally).
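For contrast with 1-1 scoring, 1-N scoring (as used in ConvE) scores a (subject, relation) pair against every candidate object at once; with DistMult this reduces to a single matrix product. A hedged sketch, assuming a numpy entity-embedding matrix:

```python
import numpy as np

def distmult_1_to_n(e_s, w_r, all_entities):
    """Score (s, r) against every entity in one shot: (e_s * w_r) @ E^T."""
    return (e_s * w_r) @ all_entities.T

rng = np.random.default_rng(0)
num_entities, dim = 1000, 200
E = rng.normal(size=(num_entities, dim))  # one row per entity
e_s, w_r = E[0], rng.normal(size=dim)
scores = distmult_1_to_n(e_s, w_r, E)
print(scores.shape)  # (1000,) -- one score per candidate object
```

Each entry of `scores` equals the 1-1 DistMult score for the corresponding object; the difference lies in the training signal (all objects per pair at once versus sampled triples), not in the scoring function itself.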
Hope it helps. Let me know if you have any more questions.
Related Issues (20)
- I cant see the Results
- Why the left and right results differ so much?
- What's the meaning about param 'samples_per_fille' in preprocess? Will it influence the train set?
- changing embedding-dim
- about evaluation sort part question
- question about the model of the convE
- How can I solve this problem?
- question about the num_entities and num_relations?
- Question about the ETA
- Question about the batch-size
- YAGO dataset
- Log info
- Can I use more network layers?
- How does the model implement negative sampling?
- What meaning is Hits@10 left and right?
- i want ro run on colab how should i do that
- About the activation function
- About the indegree
- About spaCy
- Validation and test exits with "^C"