I noticed this because I found the batch_sim is a square matrix. But the batch_sim sho

Why do test pairs show up in the batch_sim matrix generation? (clusterea, closed)

AdFiFi commented on September 7, 2024

Why do test pairs show up in the batch_sim matrix generation?

from clusterea.

Comments (6)

xz-liu commented on September 7, 2024

I believe it is common practice to set a range of matchable entities using a test set before computing similarity, which, in my view, relies on a 1-to-1 mapping assumption. This evaluation setting is widely used in other repositories as well. When running the evaluation code, the embeddings are typically sorted and filtered based on the test pairs before proceeding with the evaluation.
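For readers who want to see what this filtered evaluation looks like, here is a minimal sketch in NumPy. It is not taken from the clusterea code: the function name `filtered_hits_at_1`, the toy embeddings, and the choice of cosine similarity are all assumptions made for illustration. It shows why the resulting similarity matrix is square: both sides are reduced to the test entities and ordered so that row i and column i form a ground-truth pair.

```python
import numpy as np

def filtered_hits_at_1(src_emb, tgt_emb, test_pairs):
    """Hits@1 restricted to test entities (hypothetical helper, not clusterea's API)."""
    test_pairs = np.asarray(test_pairs)
    # Keep only the embeddings of test entities, ordered so that
    # row i of `src` and row i of `tgt` are a ground-truth pair.
    src = src_emb[test_pairs[:, 0]]
    tgt = tgt_emb[test_pairs[:, 1]]
    # L2-normalise so the dot product equals cosine similarity.
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    sim = src @ tgt.T  # square matrix of shape (num_test, num_test)
    # Under the 1-to-1 assumption the correct column for row i is i.
    pred = sim.argmax(axis=1)
    hits = float((pred == np.arange(len(test_pairs))).mean())
    return sim, hits

# Toy data (assumed): 6 source entities, 7 target entities, 3 test pairs.
rng = np.random.default_rng(0)
src_emb = rng.normal(size=(6, 8))
tgt_emb = rng.normal(size=(7, 8))
test_pairs = [(0, 3), (2, 5), (4, 1)]
for s, t in test_pairs:
    tgt_emb[t] = src_emb[s]  # make each ground-truth pair identical

sim, hits1 = filtered_hits_at_1(src_emb, tgt_emb, test_pairs)
print(sim.shape, hits1)
```

Note that the candidate set here contains only test entities from the target graph, which is the point raised in the question: entities outside the test set never compete as candidates.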

AdFiFi commented on September 7, 2024

Using ground-truth counterparts as candidates?

xz-liu commented on September 7, 2024

Yes. In almost all papers, only the test pairs are considered when calculating the embeddings. Our paper introduces small blocks to allow for scalability, and this filtering process is implemented within these small blocks. This is equivalent to filtering globally during the evaluation.

You can find similar implementations in OpenEA and DualAMN. I believe this approach adheres to the assumption of 1-to-1 mapping.

If you are interested in exploring beyond the 1-to-1 mapping assumption, you may want to look into the paper on knowledge graph alignment with dangling cases.

Thank you so much for your interest in our work. We are open to questions at any time.

AdFiFi commented on September 7, 2024

But why aren't global_matrix and global_matrix_t in main.py square matrices? The size is the number of nodes in the source graph and the target graph, right?

xz-liu commented on September 7, 2024

Yes. I recall that when I implemented that, I used a sparse matrix so that the similarity matrix would not include the filtered entries. This allowed for filtered evaluation even though the matrix size remains the full size. This was the most convenient way to implement it since we need sparse matrices to store the similarity between a large number of items anyway, and the matrix size is just metadata, not reflecting the actual size of the data.

You could help me check whether this implementation is correct. If not, by fixing it, you would probably achieve a better score than mine.
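The sparse-matrix idea described in this comment can be sketched as follows. This is an illustration under stated assumptions, not the code in main.py: the names `block_sim` and `global_sim`, the toy sizes, and the random similarity values are all hypothetical. The sketch uses SciPy's `coo_matrix` to place a dense block of test-entity similarities into a matrix whose declared shape is the full number of source and target entities, while storing only the filtered entries.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Toy sizes (assumed): 6 source entities, 7 target entities, 3 test pairs.
num_src, num_tgt = 6, 7
test_pairs = np.array([[0, 3], [2, 5], [4, 1]])
n = len(test_pairs)

# Stand-in for a dense similarity block computed among test entities only.
rng = np.random.default_rng(0)
block_sim = rng.random((n, n))

# Map local block indices back to global entity ids.
# ravel() is row-major, so entry (i, j) pairs source i with target j.
rows = np.repeat(test_pairs[:, 0], n)
cols = np.tile(test_pairs[:, 1], n)

# The shape is the full graph size, but only n * n entries are stored.
global_sim = coo_matrix(
    (block_sim.ravel(), (rows, cols)), shape=(num_src, num_tgt)
).tocsr()

print(global_sim.shape)   # full, non-square shape
print(global_sim.nnz)     # number of stored (filtered) entries
print(global_sim[1].nnz)  # source entity 1 is not a test entity
```

In this sketch the non-square shape is only a declared dimension; rows and columns belonging to non-test entities hold no stored values, so any ranking over stored entries is effectively restricted to the test set.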

AdFiFi commented on September 7, 2024

Thank you for your answer and for sharing; it helps me understand this work better.
