- (2024.05.24) Initial Release
We propose Continuity-Relativity indExing with gAussian Middle (CREAM), which interpolates positional encodings by manipulating position indices.
Apart from being simple, CREAM is training-efficient: it only requires fine-tuning at the pre-trained context window (e.g., Llama 2-4K) and can extend LLMs to a much longer target context length (e.g., 256K).
To ensure that the model focuses more on the information in the middle, we introduce a truncated Gaussian to encourage sampling from the middle part of the context during fine-tuning, thus alleviating the “Lost-in-the-Middle” problem faced by long-context LLMs.
Experimental results show that CREAM successfully extends LLMs to the target length for both Base and Chat versions of Llama2-7B with “Never Miss A Beat”.
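The two ideas above can be sketched in a few lines. This is an illustrative sketch only: the function names, the linear-interpolation formula, and the standard-deviation choice are our assumptions for exposition, not CREAM's exact recipe.

```python
import random


def interpolate_positions(target_len, train_len):
    """Linearly squeeze target-length position indices into the
    pre-trained window [0, train_len), in the spirit of position
    interpolation (formula is illustrative, not CREAM's exact indexing)."""
    scale = train_len / target_len
    return [i * scale for i in range(target_len)]


def truncated_gaussian_index(length, std_frac=0.25):
    """Sample a position index from a Gaussian centered on the middle of
    [0, length), truncated to the valid range by rejection sampling, so
    middle positions are drawn more often during fine-tuning.
    `std_frac` is an assumed hyperparameter, not the paper's setting."""
    mu = (length - 1) / 2.0
    sigma = max(1.0, std_frac * length)
    while True:
        x = random.gauss(mu, sigma)
        if 0 <= x < length:
            return int(x)
```

For example, `interpolate_positions(8192, 4096)` maps an 8K-token sequence onto the 4K pre-trained position range, while repeated calls to `truncated_gaussian_index(4096)` cluster around index 2048, biasing fine-tuning toward the middle of the context.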
```bash
# clone project
git clone git@github.com:wutong4012/CREAM.git
cd CREAM

# create conda environment (Python version is assumed; check the repo's requirements)
conda create -n CREAM python=3.10
conda activate CREAM

# install requirements
pip install -r requirements.txt
```
You can download all the fine-tuning and evaluation data from Video-LLaVA/DATA.
Our training framework offers tailored scripts to meet the diverse needs of researchers.
Train model

```bash
bash scripts/run_CREAM.sh 8 linear llama2 5946 CREAM
bash scripts/run_CREAM_chat.sh 8 linear llama2_chat 5946 CREAM
```
Evaluate model

```bash
bash scripts/eval_longchat_lines.sh 8 linear llama2 CREAM 1000
bash scripts/eval_lost_in_the_middle.sh 8 linear llama2 CREAM 1000
bash scripts/eval_needle.sh 8 linear llama2_chat CREAM 100
bash scripts/eval_longbench.sh 8 linear llama2_chat CREAM 100
bash scripts/eval_ppl.sh 8 linear llama2 CREAM 1000
bash scripts/eval_long_ppl.sh 64 linear llama2 CREAM 1000
```
- LongChat-Lines
- Lost in the Middle
- Needle in a Haystack
- LongBench
Data / Code: