palign's Introduction

Personality Alignment of Large Language Models

Welcome to the official repository for Personality Alignment with PASO (Personality Activate Search Optimize). This repository is dedicated to advancing the field of personalized AI by aligning large language models (LLMs) with individual user preferences and personality traits. Here, you'll find the code and data supporting our research.

Overview

In the evolving landscape of AI, personality alignment stands as a pivotal advancement. Traditional models align with broad human values, but PASO goes further by fine-tuning models to reflect the nuanced preferences and traits of individual users. This repository provides the tools and data to implement and evaluate such alignment, making AI interactions more relevant, meaningful, and personalized.

Features

  • Personality Alignment: Implement PASO to dynamically adjust model activations, achieving nuanced alignment with user-specific traits.
  • Comprehensive PAPI Dataset: Utilize a rich dataset of personality profiles to train and evaluate models.
  • Benchmarking: Compare the performance of PASO against state-of-the-art methods like DPO, PPO, and various prompt-based techniques.
  • Open-Ended Generation: Assess model performance on complex reasoning and personalized response tasks.

Installation

Install the required packages:

pip install .

Data: PAPI Dataset

The Personality Alignment with Personality Inventories (PAPI) dataset is central to our approach. It consists of detailed personality profiles collected from over 300,000 individuals using the IPIP-NEO personality inventory. This dataset forms the backbone of our alignment process, enabling models to learn and adapt to individual user traits.

Data Files Description

  • IPIP-NEO-ItemKey.xls: Contains the item keys for the IPIP-NEO personality inventory.
  • mpi_120.csv: Responses to the IPIP-NEO-120 questionnaire.
  • mpi_300.csv: Responses to the IPIP-NEO-300 questionnaire.
  • mpi_300_split.json: The test-set split for the PAPI dataset.
  • Test-set.json: The test-set data for the PAPI dataset.
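
As a rough illustration of how questionnaire responses like those in mpi_120.csv could be turned into trait scores, the sketch below scores 5-point Likert answers against an item key that marks some items as reverse-keyed. The item IDs, the layout of the key, and the function name are hypothetical and chosen for illustration only; the actual columns in IPIP-NEO-ItemKey.xls and the CSV files may differ.

```python
from collections import defaultdict

# Hypothetical item key: item id -> (trait, keying). "+" items are scored as
# answered, "-" items are reverse-scored. The real key is in
# IPIP-NEO-ItemKey.xls and may be laid out differently.
ITEM_KEY = {
    "q1": ("Extraversion", "+"),
    "q2": ("Extraversion", "-"),
    "q3": ("Neuroticism", "+"),
    "q4": ("Neuroticism", "-"),
}

SCALE_MIN, SCALE_MAX = 1, 5  # 5-point Likert scale


def score_traits(responses, item_key=ITEM_KEY):
    """Return the mean score per trait for one respondent."""
    per_trait = defaultdict(list)
    for item_id, answer in responses.items():
        if not SCALE_MIN <= answer <= SCALE_MAX:
            raise ValueError(f"answer out of range for {item_id}: {answer}")
        trait, keying = item_key[item_id]
        if keying == "-":
            # Reverse-keyed item: 1 <-> 5, 2 <-> 4, 3 stays 3.
            answer = SCALE_MIN + SCALE_MAX - answer
        per_trait[trait].append(answer)
    return {trait: sum(vals) / len(vals) for trait, vals in per_trait.items()}


example = {"q1": 5, "q2": 1, "q3": 2, "q4": 4}
scores = score_traits(example)
print(scores)
```

Averaging per trait rather than summing keeps scores comparable between the 120-item and 300-item questionnaires, which have different numbers of items per trait.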

Download All Datasets

We have released the PAPI dataset on Google Drive and Hugging Face 🤗!

PAPI-300K: the 300K dataset for PAPI. It includes the IPIP-NEO-120 and IPIP-NEO-300 questionnaires, with answers from 300K subjects.

PAPA-120-600K: the 600K dataset for PAPI. It includes ONLY the IPIP-NEO-120 questionnaire.

Data Permissions

This project uses IPIP items, scales, and inventories, which are in the public domain. Permission has been automatically granted for any use, commercial or non-commercial. Refer to IPIP Permission for more details.

Method: PAS (Personalized Activate Search)

PAS is an innovative method designed to fine-tune LLMs to align with individual user preferences. It dynamically adjusts model activations based on user-specific traits, ensuring that the model's responses are personalized and relevant.

Key Steps in PASO

  1. Personality Alignment: Use the PAPI dataset to train the model on individual user profiles.
  2. Activation Intervention: During inference, adjust the model's activations in real-time to reflect user-specific traits.
  3. Evaluation: Assess the model's performance using both multiple-choice and open-ended tasks to ensure robust alignment.
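
The activation intervention step can be pictured with a small, self-contained sketch. It is written in the general style of inference-time intervention (shift an activation along a direction estimated from contrasting examples) and is not the repository's implementation; the function names, the mean-difference direction, and the `alpha` strength parameter are all assumptions made for illustration.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def steering_direction(pos_acts, neg_acts):
    """Unit vector pointing from the mean 'disagree' activation to the
    mean 'agree' activation (an assumed, simple choice of direction)."""
    diff = [p - q for p, q in zip(mean_vector(pos_acts), mean_vector(neg_acts))]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0:
        raise ValueError("activation sets have identical means")
    return [d / norm for d in diff]


def intervene(activation, direction, alpha=1.0):
    """Shift one activation vector along the direction by strength alpha."""
    return [a + alpha * d for a, d in zip(activation, direction)]


# Toy 2-dimensional activations for statements the user agrees with (pos)
# and disagrees with (neg).
pos = [[1.0, 2.0], [1.0, 4.0]]
neg = [[1.0, 0.0], [1.0, 0.0]]

direction = steering_direction(pos, neg)
shifted = intervene([0.5, 0.5], direction, alpha=2.0)
print(direction, shifted)
```

In a real model the vectors would be per-head or per-layer hidden states, and the strength would be tuned per user; the toy example only shows the arithmetic of the shift.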

Training and Evaluation

To train and evaluate the models using the PAS method, execute:

python main.py

This script aligns the language model with the specified user profiles and evaluates its performance on multiple-choice tasks.
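
One plausible way to score a multiple-choice evaluation of this kind is the mean absolute gap between the option the model picks and the option the user picked on the same Likert item. The sketch below assumes answers are encoded as integers from 1 to 5; the function name and the metric itself are illustrative assumptions and may not match the score computed by main.py or reported in the paper.

```python
def mean_abs_difference(model_answers, user_answers):
    """Average absolute gap between model and user answers (lower is better)."""
    if len(model_answers) != len(user_answers):
        raise ValueError("answer lists must have the same length")
    if not model_answers:
        raise ValueError("answer lists must not be empty")
    gaps = [abs(m - u) for m, u in zip(model_answers, user_answers)]
    return sum(gaps) / len(gaps)


model = [5, 3, 2, 4]  # options chosen by the aligned model
user = [4, 3, 1, 2]   # options the user actually chose
gap = mean_abs_difference(model, user)
print(gap)
```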

Contributions

We welcome contributions to enhance the personalized alignment capabilities of LLMs. Please feel free to fork this repository, make your changes, and submit a pull request.

References

For a detailed understanding of our methods and results, refer to our latest paper on personalized alignment using the PAS method. Additionally, you can find implementations of DPO, PPO, and other baseline methods within this repository.

@misc{zhu2024personalityalignmentlargelanguage,
      title={Personality Alignment of Large Language Models}, 
      author={Minjun Zhu and Linyi Yang and Yue Zhang},
      year={2024},
      eprint={2408.11779},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2408.11779}, 
}

Explore the future of personalized AI with PAS, and let's build models that truly understand us! 🚀


🎉 What's Next?

This is one of our "causal intervention" projects. If you're hungry for more AI safety goodness, check out these related projects:

  • SafetyLock (Stay safe, stay aligned, and may your LLMs always respond appropriately! 🦜✨)

Happy Aligning!

palign's Issues

About the IPIP directory and the file selected_IPIP300_samples.json

Hello, I would like to ask:
1. Is the directory 'IPIP' that appears in the code the same as the directory 'PAPI'? (I could not find a directory named "IPIP" in your published code.)
2. Is "selected_IPIP300_samples.json", which appears in the code, the same as "Test-set.json"? (I could not find the former either.)
3. I read the code that generates "selected_IPIP300_samples.json", and it shows that only 300 samples were used for the experiment. Do I understand this correctly?
Thank you for answering!

Directions of personal preference

Hello, authors.
As illustrated in the figure below, the original ITI method (Inference-Time Intervention) includes a parameter θ that represents the direction of truthfulness. However, this parameter is omitted in PAS, where personal preference is solely represented by standard vectors.

[Figure: image attached to the original issue]
Could you provide further explanation as to why this difference exists and how it is effective in practice?

About PAS

Hello authors! I would like to ask two questions:
1. I noticed that in main.py, the 120 questions used for training (in mpi_300_split.json) are not actually the IPIP-NEO-120, which differs from what the paper says. Which set did you use when generating the PAS results in Table 1? (By the way, if I understand correctly, the data in Table 1 is the behavioral difference, Aligned Score (2).)

2. In main.py, when testing PAS, the system_prompt contains the 120 answers from the training data. Did you also use it when generating the PAS results in Table 1?
