Hyper-CL: Conditioning Sentence Representations with Hypernetworks

Official repository for "Hyper-CL: Conditioning Sentence Representations with Hypernetworks" [Paper (arXiv)].

Young Hyun Yoo, Jii Cha, Changhyeon Kim, and Taeuk Kim. Accepted to ACL 2024 as a long paper.

C-STS

In this section, we describe how to train a Hyper-CL model using our code. This code is based on C-STS.

Requirements

Run the following command to install the dependencies. The requirements are the same as for C-STS.

pip install -r requirements.txt

Data

Download the C-STS dataset and place the file in data/ (see the C-STS repository for more details).

Training

We provide example training scripts for finetuning and evaluating the models in the paper. Go to C-STS/ and execute the following command:

bash run_sts.sh

In addition to the arguments of C-STS, the following arguments are available:

  • --objective: Training objective. To train Hyper-CL, use triplet_cl_mse.

  • --cl_temp: Temperature for contrastive loss

  • --cl_in_batch_neg: Add in-batch negative loss to main loss

  • --hypernet_scaler: Divisor of the embedding size, which sets the value of K for the low-rank variants of Hyper-CL (i.e., hyper64-cl, hyper85-cl). For instance, in the base model, K=64 for hyper64-cl means that the embedding size of 768 is divided by 12, so hypernet_scaler is set to 12.

  • --hypernet_dual: Dual encoding that uses two separate encoders: one for sentences 1 and 2, and one for the condition.
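The relationship between --hypernet_scaler and K described above can be checked with a few lines of Python. This is a minimal sketch, not code from the repository: the helper name low_rank_k is hypothetical, floor division is an assumption, and the large-model case (embedding size 1024) is inferred from the hyper85-cl name rather than stated in this README.

```python
def low_rank_k(embedding_size: int, hypernet_scaler: int) -> int:
    """Return K as the embedding size divided by the scaler (hypothetical helper).

    Floor division is an assumption; the README only states the base-model case.
    """
    if hypernet_scaler <= 0:
        raise ValueError("hypernet_scaler must be a positive integer")
    return embedding_size // hypernet_scaler


# Base model: 768 / 12 = 64, matching hyper64-cl as described above.
print(low_rank_k(768, 12))   # 64

# Large model (assumed embedding size 1024): floor(1024 / 12) = 85,
# which would be consistent with the hyper85-cl name.
print(low_rank_k(1024, 12))  # 85
```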

Hyperparameters

We use the following hyperparameters for training Hyper-CL:

Embedding model           Learning rate (lr)   Weight decay (wd)   Temperature (temp)
DiffCSE_base+hyper-cl     3e-5                 0.1                 1.5
DiffCSE_base+hyper64-cl   1e-5                 0.0                 1.5
SimCSE_base+hyper-cl      3e-5                 0.1                 1.9
SimCSE_base+hyper64-cl    2e-5                 0.1                 1.7
SimCSE_large+hyper-cl     2e-5                 0.1                 1.5
SimCSE_large+hyper85-cl   1e-5                 0.1                 1.9
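For convenience, the table above can be kept as a small lookup and turned into command-line arguments. The sketch below is only an illustration: --objective and --cl_temp are the arguments named in this README, while --learning_rate and --weight_decay are assumed to follow the Hugging Face TrainingArguments naming and should be checked against run_sts.sh before use. The function name training_flags is hypothetical.

```python
from typing import Dict, List

# Values copied from the hyperparameter table above.
HYPERPARAMS: Dict[str, Dict[str, float]] = {
    "DiffCSE_base+hyper-cl":   {"lr": 3e-5, "wd": 0.1, "temp": 1.5},
    "DiffCSE_base+hyper64-cl": {"lr": 1e-5, "wd": 0.0, "temp": 1.5},
    "SimCSE_base+hyper-cl":    {"lr": 3e-5, "wd": 0.1, "temp": 1.9},
    "SimCSE_base+hyper64-cl":  {"lr": 2e-5, "wd": 0.1, "temp": 1.7},
    "SimCSE_large+hyper-cl":   {"lr": 2e-5, "wd": 0.1, "temp": 1.5},
    "SimCSE_large+hyper85-cl": {"lr": 1e-5, "wd": 0.1, "temp": 1.9},
}


def training_flags(config_name: str) -> List[str]:
    """Build an argument list for one row of the table (hypothetical helper)."""
    hp = HYPERPARAMS[config_name]
    return [
        "--objective", "triplet_cl_mse",   # named in this README
        "--cl_temp", str(hp["temp"]),      # named in this README
        # Assumed flag names (Hugging Face TrainingArguments convention);
        # verify them against run_sts.sh.
        "--learning_rate", str(hp["lr"]),
        "--weight_decay", str(hp["wd"]),
    ]


print(" ".join(training_flags("SimCSE_base+hyper64-cl")))
```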

SimKGC

We provide example training scripts for finetuning and evaluating the models in the paper. This code is based on SimKGC. Go to sim-kcg/ and execute the following commands.

Preprocessing WN18RR dataset

bash scripts/preprocess.sh WN18RR

Training

bash scripts/train_wn.sh

The arguments are as follows:

  • --pretrained-model: Backbone model checkpoint (bert-base-uncased or bert-large-uncased)
  • --encoding_type: Encoding type (bi_encoder or tri_encoder)
  • --triencoder_head: Triencoder head (concat, hadamard or hypernet)
  • Refer to config.py for other arguments.
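To give some intuition for the three --triencoder_head options, the following toy sketch shows one plausible way each head could combine an entity embedding h with a condition (relation) embedding c. It is an illustration in pure Python, not the repository's implementation: the function names, the tiny dimension (d = 2), and the fixed weights are all hypothetical, and the actual model uses learned layers.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def concat_head(h: Vector, c: Vector, w: Matrix) -> Vector:
    """Concatenate h and c, then project back to d dimensions (w is d x 2d)."""
    return matvec(w, h + c)


def hadamard_head(h: Vector, c: Vector) -> Vector:
    """Element-wise product of h and c."""
    return [a * b for a, b in zip(h, c)]


def hypernet_head(h: Vector, c: Vector, hyper_w: Matrix) -> Vector:
    """Generate a d x d matrix from c, then apply it to h.

    hyper_w has shape (d*d) x d; its output is reshaped row by row.
    """
    d = len(h)
    flat = matvec(hyper_w, c)
    w_c = [flat[i * d:(i + 1) * d] for i in range(d)]
    return matvec(w_c, h)


h = [1.0, 2.0]
c = [3.0, 4.0]

print(hadamard_head(h, c))  # [3.0, 8.0]
print(concat_head(h, c, [[1.0, 0.0, 1.0, 0.0],
                         [0.0, 1.0, 0.0, 1.0]]))  # [4.0, 6.0]
# The condition c yields the matrix [[3, 4], [4, 3]], which is applied to h.
print(hypernet_head(h, c, [[1.0, 0.0],
                           [0.0, 1.0],
                           [0.0, 1.0],
                           [1.0, 0.0]]))  # [11.0, 10.0]
```

The point of the hypernet variant in this sketch is that the transformation applied to h is itself a function of c, whereas the Hadamard head is limited to per-dimension scaling.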

Evaluation of Performance and Inference Time

bash scripts/eval.sh ./checkpoint/WN18RR/model_best.mdl WN18RR

Citation

Please cite our paper if you use Hyper-CL in your work:

@article{yoo2024hyper,
  title={Hyper-CL: Conditioning Sentence Representations with Hypernetworks},
  author={Yoo, Young Hyun and Cha, Jii and Kim, Changhyeon and Kim, Taeuk},
  journal={arXiv preprint arXiv:2403.09490},
  year={2024}
}

hyper-cl's Issues

Question about the inference time

Hi,
Thanks for the awesome work.
I am trying to reproduce your experiments, but I couldn't match the inference time you report in Table 3. It took me only around 630 seconds to complete the entire script with default parameters and a SimCSE_large model. How is the inference time computed or recorded?

sincerely,
