selfkg's Introduction

SelfKG: Self-Supervised Entity Alignment in Knowledge Graphs

Original implementation of the paper SelfKG: Self-Supervised Entity Alignment in Knowledge Graphs.

This paper was accepted to The Web Conference 2022 and nominated for best paper! 😆

SelfKG is the first self-supervised entity alignment method that needs no label supervision, and it matches or comes close to state-of-the-art supervised baselines. Its performance suggests that self-supervised learning has great potential for entity alignment in knowledge graphs.

SelfKG: Self-Supervised Entity Alignment in Knowledge Graphs

https://doi.org/10.1145/3485447.3511945

Installation

Requirements

torch==1.9.0
faiss-cpu==1.7.1
numpy==1.19.2
pandas==1.0.5
tqdm==4.61.1
transformers==4.8.2
torchtext==0.10.0

You can use setup.sh to set up your Anaconda environment by running:

bash setup.sh
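
After setup, it can help to confirm that the installed packages match the pinned versions listed under Requirements. The following is a minimal sketch, not part of the repository: the helper name `check_versions` is hypothetical, and it assumes Python 3.8 or later, where `importlib.metadata` is available in the standard library.

```python
from importlib import metadata

# Pinned versions copied from the Requirements list above.
PINNED = {
    "torch": "1.9.0",
    "faiss-cpu": "1.7.1",
    "numpy": "1.19.2",
    "pandas": "1.0.5",
    "tqdm": "4.61.1",
    "transformers": "4.8.2",
    "torchtext": "0.10.0",
}


def check_versions(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for each mismatch.

    `found` is None when the package is not installed.
    (Hypothetical helper, not part of the SelfKG code base.)
    """
    problems = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # A local build tag such as "1.9.0+cu111" still counts as a match.
        if found is None or found.split("+")[0] != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

If the script prints nothing, every pinned package was found at the expected version.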

Quick Start

Data Preparation

You can download our data from here. The final project structure should be:

├── data
│   ├── DBP15K
│   │   ├── fr_en
│   │   ├── ja_en
│   │   └── zh_en
│   ├── DWY100K
│   │   ├── dbp_wd
│   │   └── dbp_yg
│   ├── LaBSE
│   │   ├── bert_config.json
│   │   ├── bert_model.ckpt.index
│   │   ├── checkpoint
│   │   ├── config.json
│   │   ├── pytorch_model.bin
│   │   └── vocab.txt
│   └── getdata.sh
├── loader
├── model
├── run.sh # Please use this bash script to run the experiments!
├── run_DWY_LaBSE_neighbor.py # SelfKG on DWY100K
├── run_LaBSE_neighbor.py # SelfKG on DBP15K
├── ... # run_LaBSE_*.py # Ablation code will be available soon
├── script
│   └── preprocess
├── settings.py
└── setup.sh # Can be used to set up your Anaconda environment

You can also use the following scripts to download the datasets directly:

cd data
bash getdata.sh # Download speed depends on your network connection. If it is too slow, download the datasets directly from the link mentioned above.

โญRun Experiments

Please use

bash run.sh

to reproduce our experimental results. For more details, please refer to run.sh and our code.

❗ Common Issues

"XXX file not found"
Please make sure you have downloaded all the datasets as described in this README.
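
One way to track down a "file not found" error is to compare your checkout against the project structure shown under Data Preparation. The sketch below is an illustration only: `missing_paths` is a hypothetical helper, and the path list covers just a subset of the entries in the tree above, not every file the code reads.

```python
from pathlib import Path

# Paths taken from the project structure shown under "Data Preparation".
# The individual dataset files inside each folder are not listed in the
# README, so only the folders and a few LaBSE files are checked here.
EXPECTED = [
    "data/DBP15K/fr_en",
    "data/DBP15K/ja_en",
    "data/DBP15K/zh_en",
    "data/DWY100K/dbp_wd",
    "data/DWY100K/dbp_yg",
    "data/LaBSE/config.json",
    "data/LaBSE/pytorch_model.bin",
    "data/LaBSE/vocab.txt",
]


def missing_paths(root, expected=EXPECTED):
    """Return the expected relative paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing paths:")
        for p in missing:
            print("  " + p)
    else:
        print("All expected paths are present.")
```

Run it from the repository root. An empty result only shows that the top-level layout is in place; it does not prove that each downloaded file is complete.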

to be continued ...

Citing SelfKG

If you use SelfKG in your research or wish to refer to the baseline results, please use the following BibTeX.

@article{DBLP:journals/corr/abs-2203-01044,
  author    = {Xiao Liu and
               Haoyun Hong and
               Xinghao Wang and
               Zeyi Chen and
               Evgeny Kharlamov and
               Yuxiao Dong and
               Jie Tang},
  title     = {SelfKG: Self-Supervised Entity Alignment in Knowledge Graphs},
  journal   = {CoRR},
  volume    = {abs/2203.01044},
  year      = {2022},
  url       = {https://arxiv.org/abs/2203.01044},
  eprinttype = {arXiv},
  eprint    = {2203.01044},
  timestamp = {Mon, 07 Mar 2022 16:29:57 +0100},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2203-01044.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

selfkg's Issues

How to produce dataset

While I understood the general structure of the dataset and how to produce most of the elements, I still can't figure out how triples_n files are generated. Can you help me?

Code Error

Traceback (most recent call last):
  File "run.py", line 5, in <module>
    trainer = Trainer(seed=2020)
  File "/home/wangzhen/program/GraduationProject/SelfKG/model/layers.py", line 315, in __init__
    myset1 = Mydataset(token1.id_features_dict) # dataset
TypeError: __init__() missing 1 required positional argument: 'adj_tensor_dict'

Experiments about self negative sampling

I have some questions about the self negative sampling experiments.

  1. In Table 4, the results show that removing self negative sampling hurts performance. Have you tried sampling negative entities from both Gx and Gy?
  2. Since the alignment accuracy is quite good on the examined datasets, have you tried using the alignment results to reduce the number of conflicts when sampling from the other KG?

Results of other dataset

Hi, @HaoyunHong

I'm very excited about your work on self-supervised learning for EA. According to your paper, SelfKG performs well on DBP15K and DWY100K. These are relatively simple EA datasets, and I wonder how SelfKG performs on more complicated ones, such as DBP100K or DBP1M. Have you tried SelfKG on other datasets? Can you share more experimental results?

best,
Meihao

Missing file

I can't find run_LaBSE_SSL_DWY.py and run_LaBSE_SSL.py.

Hi Dear

Sorry, I can't express myself clearly in English. After downloading the data "LaBSE.zip", when I unzip it, the system tells me "pytorch_model.bin" is broken.

Missing file again

No module named 'model.layers_LaBSE_SSL': your model directory doesn't contain layers_LaBSE_SSL.py. Please make sure run_LaBSE_SSL.py can run. Thanks!

mark

ๆญๅ–œๅคงไฝฌๅ–œไธญbest paperใ€‚ไธ่ฏปไธช10้ๅฏนไธ่ตทbest paperใ€‚

paper not found

Thank you for this great work.
I can't find the paper from the paper link.
Could you help me?
Thanks again.

File Error

Thanks for your great work. When I ran your model on the different datasets you provided, I found that some files may be missing and others corrupted.

For DBP15K fr_en and DWY100K dbp_yg, some files seem corrupted and unreadable. Both datasets produce errors like:
Traceback (most recent call last):
  File "run_LaBSE_neighbor.py", line 5, in <module>
    trainer = Trainer(seed=37)
  File "SelfKG/model/layers_LaBSE_neighbor.py", line 193, in __init__
    loader1 = DBP15KRawNeighbors(self.args.language, "1")
  File "SelfKG/loader/DBP15KRawNeighbors.py", line 16, in __init__
    self.load()
  File "SelfKG/loader/DBP15KRawNeighbors.py", line 22, in load
    self.id_entity = pickle.load(f)
_pickle.UnpicklingError: invalid load key, '\x00'.

For the DWY100K dataset, some files may be missing. There is an error like:
No such file or directory: 'SelfKG/data/DWY100K/dbp_wd/valid.ref'

่ƒฝๅฆๆไพ›relationๅฏนๅบ”็š„ๆ ‡็ญพๆ–‡ไปถ๏ผŸ

ไฝ ๅฅฝ๏ผŒๆˆ‘ๆณจๆ„ๅˆฐๆ•ฐๆฎ้›†ๆ–‡ไปถไธญๅชๆไพ›ๅฎžไฝ“ๅฏนๅบ”็š„ๆ ‡ๆณจๅบๅท

(xxxx) [xx@xxx xxxx]$ head -5 /xxxx/data/DWY100K/dbp_yg/id_ent_1
0       Simeon Burt Wolbach
1       Peter Kyros
2       Yoshihito Fujita
3       Angus M. Woodbury
4       Avianca Perú

The label file for relation types is not provided, so there is no way to tell what the relation between two entities is:

(xxxx) [xx@xxx xxxx]$ head -5 /xxxx/selfkg/data/DWY100K/dbp_yg/triples_1
11495   75     23339
24003   288     157884
7342    268     39411
23111   216     67889
150614  230     27619

For example, for the triple (11495, 75, 23339), no file indicates which relation 75 refers to.

Could you provide the relation ID files for DWY100K and DBP15K?
Looking forward to your reply 😁!
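
For readers following the file formats quoted in this issue, here is a minimal sketch of how the two files could be read together. It rests on assumptions drawn only from the excerpts above: fields are whitespace-separated, the entity file holds an ID followed by a name, and each triple line is ordered head, relation, tail. The function names are hypothetical and are not part of the SelfKG code base.

```python
def load_entity_names(path):
    """Read lines of the form '<id> <name>' into a dict {id: name}."""
    names = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            # Split on the first run of whitespace only; names may contain spaces.
            parts = line.strip().split(None, 1)
            if len(parts) == 2:
                names[int(parts[0])] = parts[1]
    return names


def load_triples(path):
    """Read lines of three integers into (head, relation, tail) tuples.

    The column order is an assumption based on the issue text.
    """
    triples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) == 3:
                triples.append(tuple(int(p) for p in parts))
    return triples


def describe(triple, names):
    """Render a triple with entity names where known; the relation stays a bare ID."""
    head, rel, tail = triple
    return f"{names.get(head, head)} --[relation {rel}]--> {names.get(tail, tail)}"
```

With the files from the excerpts, `describe(load_triples(".../triples_1")[0], load_entity_names(".../id_ent_1"))` would resolve the two entity names but still show the relation as a number, which is the gap this issue points out.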

Question about the initial embedding vectors

Hello, I noticed that the initial embedding files provided with your code, ./data/DBP15K/zh_en/raw_LaBSE_emb_1.pkl and ./data/DBP15K/zh_en/raw_LaBSE_emb_2.pkl, contain 5244 identical vectors. That overlap rate seems rather high. Were these embeddings obtained from LaBSE without any translation, or after translation? If the latter, could you provide embeddings obtained without translation?
Also, could you provide initial embeddings produced with FastText as the embedding model, without translation?
Looking forward to your reply! Thank you!
