arcee-ai / dalm Goto Github PK

View Code? Open in Web Editor NEW

268.0 11.0 32.0 19.36 MB

Domain Adapted Language Modeling Toolkit - E2E RAG

Home Page: https://www.arcee.ai

License: Apache License 2.0

Python 100.00%

language-model llm retrieval retrieval-augmented-generation

dalm's Introduction

Domain Adapted Language Modeling Toolkit

Manifesto

A great rift has emerged between general LLMs and the vector stores that are providing them with contextual information. The unification of these systems is an important step in grounding AI systems in efficient, factual domains, where they are utilized not only for their generality, but for their specificity and uniqueness. To this end, we are excited to open source the Arcee Domain Adapted Language Model (DALM) toolkit for developers to build on top of our Arcee open source Domain Pretrained (DPT) LLMs. We believe that our efforts will help as we begin next phase of language modeling, where organizations deeply tailor AI to operate according to their unique intellectual property and worldview.

Demo DALMs

Query example DALMs created by the Arcee Team.

DALM-Patent	DALM-PubMed	DALM-SEC	DALM-Yours

Research Contents

This repository primarily contains code for fine-tuning a fully differential Retrieval Augmented Generation (RAG-end2end) architecture.

For the first time in the literature, we modified the initial RAG-end2end model (TACL paper, HuggingFace implementation) to work with decoder-only language models like Llama, Falcon, or GPT. We also incorporated the in-batch negative concept alongside the RAG's marginalization to make the entire process efficient.

Inside the training folder, you'll find two codes to train the RAG-end2end and Retriever with contrastive learning.
All evaluations related to the Retriever and the Generator are located in the eval folder.
Additionally, we have data processing codes and synthetic data generation code inside the datasets folder.

Usage

To perform training and evaluation for both the retriever model and the new rag-e2e model, please adhere to the following steps.

System Requirements

The system reqs depend on the retriever model, generator model, and batch size. But for reference (e2e rag), we used the following for our experiments (eval results below):

retriever: BAAI/bge-large-en
generator: meta-llama/Llama-2-7b-hf
batch size: 18
dataset size: 200k

This took 7 hours on a single A100 GPU (80GB).

Installation

You can install this repo directly via pip install indomain

Alternatively, for development or research, you can clone and install the repo locally:

git clone https://github.com/arcee-ai/DALM.git && cd DALM
pip install --upgrade -e .

This will install the DALM repo and all necessary dependencies.

Make sure things are installed correctly by running dalm version. On an non-intel Mac you may need to downgrade transformers library: pip install transformers==4.30.

Data setup

tl;dr

You can run dalm qa-gen <path-to-dataset> to preprocess your dataset for training. See dalm qa-gen --help for more options
If you do not have a dataset, you can start with ours

# Note - our dataset already has queries and answers, so you don't actually need to run this.
# replace `toy_dataset_train.csv` with your dataset of titles and passages
dalm qa-gen dalm/datasets/toy_data_train.csv

The setup for training and evaluation can be effortlessly executed provided you possess a CSV file containing two/three columns: Passage, Query (and Answer if running e2e). You can utilize the script question_answer_generation.py to generate this CSV.
It's important to highlight that the retriever-only training method employs solely the passages and queries, whereas the rag-e2e training code utilizes all three columns.
In our experiments, we utilize BAAI/bge-large-en as the default retriever and employ meta-llama/Llama-2-7b-hf as the default generator. The code is designed to be compatible with any embedding model or autoregressive model available in the Hugging Face model repository at https://huggingface.co/models.

Training

You can leverage our scripts directly if you'd like, or you can use the dalm cli. The arguments for both are identical

Train Retriever Only

Train BAAI/bge-large-en retriever with contrastive learning.

python dalm/training/retriever_only/train_retriever_only.py \
--dataset_path "./dalm/datasets/toy_data_train.csv" \
--retriever_name_or_path "BAAI/bge-large-en" \
--output_dir "retriever_only_checkpoints" \
--use_peft \
--with_tracking \
--report_to all \
--per_device_train_batch_size 150

dalm train-retriever-only "BAAI/bge-large-en" "./dalm/datasets/toy_data_train.csv" \
--output-dir "retriever_only_checkpoints" \
--use-peft \
--with-tracking \
--report-to all \
--per-device-train-batch-size 150

For all available arguments and options, see dalm train-retriever-only --help

Train Retriever and Generator Jointly (RAG-e2e)

Train Llama-2-7b generator jointly with the retriever model BAAI/bge-large-en.

python dalm/training/rag_e2e/train_rage2e.py \
  --dataset_path "./dalm/datasets/toy_data_train.csv" \
  --retriever_name_or_path "BAAI/bge-large-en" \
  --generator_name_or_path "meta-llama/Llama-2-7b-hf" \
  --output_dir "rag_e2e_checkpoints" \
  --with_tracking \
  --report_to all \
  --per_device_train_batch_size 20

dalm train-rag-e2e \
"./dalm/datasets/toy_data_train.csv" \
"BAAI/bge-large-en" \
"meta-llama/Llama-2-7b-hf" \
--output-dir "rag_e2e_checkpoints" \
--with-tracking \
--report-to all \
--per-device-train-batch-size 20

For all available arguments and options, see dalm train-rag-e2e --help

Evaluation

Summary of how evaluation is done

The Retriever in general is trained to be good at finding the most relevant passages in a corpus given a query.

Given a ground-truth test dataset that is a 200,000-line CSV containing patent abstracts and more importantly this evaluation dataset was not present in the training dataset, the below listed steps were followed:

Use the trained retriever to encode all passages into an ad-hoc indexed vector store using the HNSW library.
Take each query and use the trained retriever to encode it into an embedding vector (QE)
For each encoded passage (PE) in the vector store, find the nearest neighbor similarity search score between QE and PE (Note: with HNSW, exhaustiveness is avoided)
Find the top-K (eg, top 5) best matches based on nearest neighbor similarity search scores
Compare the matches against the ground truth top-K best matches to calculate recall and hit rate.

Results

Type of Retriever	Recall	Hit rate
Plain Retriever	0.45984	0.45984
Retriever with contrastive learning	0.46037	0.46038
Retriever End2End	0.73634	0.73634

To run retriever only eval (make sure you have the checkpoints in the project root)

 python dalm/eval/eval_retriever_only.py  \
 --dataset_path qa_pairs_test.csv \
 --retriever_name_or_path "BAAI/bge-large-en" \
 --passage_column_name Abstract \
 --query_column_name Question \
 --retriever_peft_model_path retriever_only_checkpoints

dalm eval-retriever qa_pairs_test.csv \
 --retriever-name-or-path "BAAI/bge-large-en" \
 --passage-column-name Abstract \
 --query-column-name Question \
 --retriever-peft-model-path retriever_only_checkpoints

See dalm eval-retriever --help for all available arguments

For the e2e eval

python dalm/eval/eval_rag.py  \
 --dataset_path qa_pairs_test_2.csv \
 --retriever_name_or_path "BAAI/bge-large-en" \
 --generator_name_or_path "meta-llama/Llama-2-7b-hf" \
 --passage_column_name Abstract \
 --query_column_name Question \
 --answer_column_name Answer \
 --evaluate_generator \
 --query_batch_size 5 \
 --retriever_peft_model_path rag_e2e_checkpoints/retriever \
 --generator_peft_model_path rag_e2e_checkpoints/generator

dalm eval-rag qa_pairs_test.csv \
 --retriever-name-or-path "BAAI/bge-large-en" \
 --generator-name-or-path "meta-llama/Llama-2-7b-hf" \
 --retriever-peft-model-path rag_e2e_checkpoints/retriever \
 --generator-peft-model-path rag_e2e_checkpoints/generator \
 --passage-column-name Abstract \
 --query-column-name Question \
 --answer-column-name Answer \
 --query-batch-size 5

See dalm eval-rag --help for all available arguments

Contributing

See CONTRIBUTING

dalm's People

Contributors

Stargazers

Watchers

dalm's Issues

get test cov to 95%

Eval e2e Rag raising device mismatch error

Hi,
After training rag_e2e I am facing a device mismatch problem for evaluating the model by eval_rag.py.
My evaluation command is:

python dalm/eval/eval_rag.py  \
 --dataset_path ./dalm/datasets/toy_data_train.csv \
 --retriever_name_or_path "myretriever" \
 --generator_name_or_path "mygenerator" \
 --passage_column_name Abstract \
 --query_column_name Question \
 --answer_column_name Answer \
 --evaluate_generator \
 --query_batch_size 5

Error I am getting:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)

After debuging I figure out the AutoModelForRagE2E is CPU based.
If I convert the AutoModelForRagE2E load model to "cuda" then it works fine.
Mybe inside the AutoModelForRagE2E while loading the retriever and generator we need to convert the load model to CUDA.
Should I do this and send a pull request or am I missing something?
Please let me know.

Kind regards

Add a `dalm` cli for users

dalm train e2e ...

Clean up repo

Function decomp
variable cleanup

After #12

Readme documentation

We need to document the structure of the repo and how to use it. Commands for both cli and sdk usage

I can work with @shamanez on this

Incorrect pooling BGE model

BGE models (the default option) require CLS pooling and not the mean pooling:
https://huggingface.co/BAAI/bge-large-en#frequently-asked-questions

While in the actual code, the mean pooling is the default option:
https://github.com/arcee-ai/DALM/blob/main/dalm/models/retriever_only_base_model.py#L60

Is this a bug or an expected behaviour?

Minimum system requirements to train

Hi, What are the minimum system requirements like GPU memory to train rag_e2e?. Thanks

CUDA OOM doing reading comprehension on A10 24GB VRAM GPU

With a subset of the nuclear patent dataset, it throws this error:

12/05/2023 16:14:48 - INFO - dalm.pipelines.reading_comprehension_pipeline - LLM RC dataset generated text of length 2415 from context of length 670
12/05/2023 16:14:48 - INFO - dalm.pipelines.reading_comprehension_pipeline - Writing unprocessed LLM output to context_data_c8307498-165e-49b6-b073-214fbe9bb8e0.csv8_0.json
12/05/2023 16:14:48 - INFO - dalm.pipelines.reading_comprehension_pipeline - Writing Q & A chat completions of length 9 to context_data_c8307498-165e-49b6-b073-214fbe9bb8e0.csv8_0.json
12/05/2023 16:15:17 - INFO - dalm.pipelines.reading_comprehension_pipeline - LLM RC dataset generated text of length 2855 from context of length 11202
12/05/2023 16:15:17 - INFO - dalm.pipelines.reading_comprehension_pipeline - Writing unprocessed LLM output to context_data_c8307498-165e-49b6-b073-214fbe9bb8e0.csv9_0.json
12/05/2023 16:15:17 - INFO - dalm.pipelines.reading_comprehension_pipeline - Writing Q & A chat completions of length 9 to context_data_c8307498-165e-49b6-b073-214fbe9bb8e0.csv9_0.json
/opt/conda/lib/python3.10/site-packages/transformers/pipelines/base.py:1101: UserWarning: You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset
12/05/2023 16:15:41 - INFO - dalm.pipelines.reading_comprehension_pipeline - LLM RC dataset generated text of length 2240 from context of length 2841
12/05/2023 16:15:41 - WARNING - dalm.datasets.reading_comprehension_generation.utils - Found a question with no answer: {'question': 's and answer task:', 'answer': 'TBD'}.  Skipping.
12/05/2023 16:15:41 - INFO - dalm.pipelines.reading_comprehension_pipeline - Writing unprocessed LLM output to context_data_c8307498-165e-49b6-b073-214fbe9bb8e0.csv10_0.json
12/05/2023 16:15:41 - INFO - dalm.pipelines.reading_comprehension_pipeline - Writing Q & A chat completions of length 7 to context_data_c8307498-165e-49b6-b073-214fbe9bb8e0.csv10_0.json

12/05/2023 16:15:42 - ERROR - root - Training failed with exception: CUDA out of memory. Tried to allocate 2.00 GiB (GPU 0; 22.20 GiB total capacity; 17.54 GiB already allocated; 327.12 MiB free; 20.83 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Traceback (most recent call last):
  File "//train_generator.py", line 153, in <module>
    create_reading_comprehension_dataset_and_train()
  File "//train_generator.py", line 134, in create_reading_comprehension_dataset_and_train
    pipeline(
  File "/opt/conda/lib/python3.10/site-packages/dalm/pipelines/reading_comprehension_pipeline.py", line 146, in pipeline
    for index, text_identifier, context, gen_text in llm_rc_dataset_generator:
  File "/opt/conda/lib/python3.10/site-packages/dalm/datasets/reading_comprehension_generation/synthetic_based.py", line 119, in generate_synthetic_dataset
    gen_text = generate_synthetic_data(model_pipeline, chunk_, generation_params)
  File "/opt/conda/lib/python3.10/site-packages/dalm/datasets/reading_comprehension_generation/synthetic_based.py", line 82, in generate_synthetic_data
    outputs = model_pipeline(prompt, **generation_params)
  File "/opt/conda/lib/python3.10/site-packages/transformers/pipelines/text_generation.py", line 208, in __call__
    return super().__call__(text_inputs, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1140, in __call__
    return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
  File "/opt/conda/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1147, in run_single
    model_outputs = self.forward(model_inputs, **forward_params)
  File "/opt/conda/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1046, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/opt/conda/lib/python3.10/site-packages/transformers/pipelines/text_generation.py", line 271, in _forward
    generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 1719, in generate
    return self.sample(
  File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 2801, in sample
    outputs = self(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 1009, in forward
    outputs = self.model(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 897, in forward
    layer_outputs = decoder_layer(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 626, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/mistral/modeling_mistral.py", line 286, in forward
    attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query_states.dtype)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/functional.py", line 1845, in softmax
    ret = input.softmax(dim, dtype=dtype)

how to use title question and passage all together while training retriever only

cannot import name '_sentencepiece' from partially initialized module 'sentencepiece'

I have no clue about this error.
Any suggestions?
I installed sentencepiece with transformers.

Minor enhancements on the code side

dataset path param is not used anywhere in train_rage2e.py
script name is a different from the one in the README.md
evaluate module is missing
I think the args here need to be renamed https://github.com/arcee-ai/arcee-retriever/blob/main/RAG_e2e_in_batch_negs/train_utils.py#L78 i.e the zip part sample_doc_logprobs is aliased twice

Reading comprehension synthetic data regex improvements

Steps to repro

Run DALM + reading comprehension (unmerged PR) on the Pubmed 500 dataset

Expected

Most entries should end up in reading comprehension dataset

Actual

Only 5% entries make it.

This can be fixed by improving the regexes that look for questions + answers in the response returned from the helper model.

training fails at the end when with-tracking is false

dalm train-rag-e2e \
"qa-outputs/question_answer_pairs.csv" \
"BAAI/bge-large-en" \
"meta-llama/Llama-2-7b-hf" \
--dataset-passage-col-name text \
--output-dir "rag_e2e_checkpoints" \
--no-with-tracking \
--per-device-train-batch-size 12

python ../dalm/training/rag_e2e/train_rage2e.py \
  --dataset_path "qa-outputs/question_answer_pairs.csv" \
  --retriever_name_or_path "BAAI/bge-large-en" \
  --generator_name_or_path "meta-llama/Llama-2-7b-hf" \
  --output_dir "rag_e2e_checkpoints" \
  --per_device_train_batch_size 12

gets the error

Special tokens file saved in rag_e2e_checkpoints/generator/special_tokens_map.json
Traceback (most recent call last):

  File "/workspace/ben/DALM/.venv/bin/dalm", line 8, in <module>
    sys.exit(cli())

  File "/workspace/ben/DALM/dalm/cli.py", line 128, in train_rag_e2e
    train_e2e(

  File "/workspace/ben/DALM/dalm/training/rag_e2e/train_rage2e.py", line 516, in train_e2e
    accelerator.end_training()

  File "/workspace/ben/DALM/.venv/lib/python3.10/site-packages/accelerate/accelerator.py", line 615, in _inner
    return PartialState().on_main_process(function)(*args, **kwargs)

  File "/workspace/ben/DALM/.venv/lib/python3.10/site-packages/accelerate/accelerator.py", line 2499, in end_training
    for tracker in self.trackers:

AttributeError: 'Accelerator' object has no attribute 'trackers'

because of huggingface/accelerate#1994

Installation: Why 'indomain' and not 'dalm' ?

Hi,

I noticed that in the installation instructions here, the DALM repository is installed using the command

pip install indomain

However, based on the name of the toolkit "Domain Adapted Language Modeling Toolkit," it seems like it should be installed using the following command to be more intuitive.

pip install dalm

I wanted to confirm if this is the intended installation command and if there is a specific reason for using "indomain". If not, would it be possible to update the installation command to pip install dalm to better align with the repository's name?

TypeError: Argument() missing 1 required positional argument: 'default'

This error can happen if you have an older version of the typer library installed. The error is known to happen with typer version 0.7.0, whereas typer version 0.9.0 is known to be compatible.

The workaround is to pip install --upgrade typer.

A possible fix would be to update the required dependencies to specify that a more recent version of typer is required.

Contrastive losses and marginalization connection

(1) Contrastive losses

https://github.com/arcee-ai/arcee-retriever/blob/main/RAG_e2e_in_batch_negs/train_rage2e.py#L512
I had a comment on contrastive losses you've used and then as I typed it out I wondered if I'm taking your use of the word contrastive too literally. I understand how you're using
I ask because the contrastive losses I expected to stumble upon is something like the one here: https://lilianweng.github.io/posts/2021-05-31-contrastive/

(2) Question on marginalization

I am wondering if the line here i.e.

marginalized_prob_sum = answer_log_prob + doc_logprobs

is a translation of eq (2) in your paper but expressed in log space because of the use of log probs?

On a different thread but in the same context, I could be completely off my rocker, but it seems all the prob values in are being incremented by the same amount uniformly (unless I got my broadcast rules wrong)

from a runtime shape output from within the function

answer_log_prob torch.Size([102, 50257])
doc_logprobs torch.Size([1, 1])
tensor([[0.]], device='cuda:0', grad_fn=<UnbindBackward0>)
query_passage_log_prob torch.Size([25, 50257])

if this is correct, I would reason the addition of the doc_logprobs isn't much in terms of achieving the goal. Again it feels like I'm off my rocker, I'll leave it to you to tell where I am wrong

I'm wondering if all the other probabilities that we have with us in the row as opposed to just the diagonal elements may somehow aid us. It coould just be my sleepy self coming up with half-assed ideas. I guess we should discuss this

Update README to document reading comprehension

The current README is missing documentation regarding reading comprehension datasets and related generator training (#74)

Add py.typed for typing

From https://stackoverflow.com/a/73766539

For hatchling

Desiring Retriever Only Inference

from dalm.models.rag_e2e_base_model import AutoModelForRagE2E, Mode
import numpy as np
import torch

RETRIEVER_ = "arcee-ai/bge-code-retriever"
GENERATOR_ = "codellama/CodeLlama-7b-Instruct-hf"

COLUMN_NAME = "function"
EMBED_DIM = 384

MAX_LENGTH = 512

selected_torch_dtype = torch.float16

rag_model = AutoModelForRagE2E(RETRIEVER_, GENERATOR_, use_bnb=Mode.GENERATOR)

Here for example if I only want to use the retriever I would like to only load the retriever into memory

dalm qa-gen toy_data_train.csv doesn't work out of the box.

I'm trying to generate triplets using dalm qa-gen on my local CSV file to that has a column called 'Passage' which contains my chunked texts. Apparently it's expecting a title column? Any chance you guys have an example of how the input data should be formatted for ingestion? Thanks!

Rag-end2end didn't achieve any improvement in recall score compared to training only with Retriever

Thanks a lot for this awesome open-source framework. I used a portion of an open-source reading comprehension dataset for training, with 10,000 training data and 6,685 test data.

Here are the results as shown in the table below, use --top_k 5

	no prompt	query = prompt + query
base-model bge-v1.0-large	0.8192969	0.8315632
Retriever with contrastive learning - epoch 3	0.8622288	0.8806282
Retriever End2End - epoch 1	0.8674644	0.8577412
Retriever End2End - epoch 2	0.8557965	0.8560957

For the prompt, I used the prompt from the BGE model: 'Represent this sentence for searching relevant passages.' Since the BGE model's mean pooling uses the CLS token features, I made modifications to the mean pooling part of the retrieval module in DALM. During the training process, RAG-End2End used a batch size of 16, while Retriever_Only used a batch size of 40. Other parameters were set to their default values.

The conclusion I have reached is that after fine-tuning with contrastive learning, there is a decent improvement in the top 5 recall score, but rag end2end did not help. I wanted to ask about the hyperparameters, training and testing data used for evaluation in the readme, and if I have missed anything?

RAG-end2end loss:

Retriever with contrastive learning loss:

Extract training code as a Trainer class instance for testable code

What needs to be done

Either

Make a base trainer class similar to trl's base trainer
Mold code in train_rage2e.py as said base trainer class instance

take the functional approach of breaking down code into multiple functions

Reason for proposal

The hope is to make the code more amenable to testing

How to run model + finetuned adapter in LlamaIndex or Langchain?

Hi,

Does anyone have an example of how to get the models + adapters running in a RAG pipeline using the LlamaIndex or Langchain framework? I want to try to use the RAG Fusion retriever, but I've just been spinning my wheels trying to wrap the models with Transformers in a way that is useable in these frameworks.

Thanks!

E2E training checkpoint saving doesn't work

The trace

09/21/2023 02:37:25 - INFO - accelerate.accelerator - Saving current state to ./**********/step_5000                                                                                              
Traceback (most recent call last):                                                                                                                                                                                   
  File "/****/DALM/dalm/training/rag_e2e/train_rage2e.py", line 490, in <module>                                                                                                                       
    main()                                                                                                                                                                                                           
  File "/****/DALM/dalm/training/rag_e2e/train_rage2e.py", line 449, in main                                                                                                                           
    accelerator.save_state(output_dir)                                                                                                                                                                               
  File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 2653, in save_state                                                                                                                 
    hook(self._models, weights, output_dir)
  File "/****/DALM/dalm/training/utils/train_utils.py", line 10, in save_model_hook
    model.save_pretrained(output_dir, state_dict=weights[i])
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'AutoModelForRagE2E' object has no attribute 'save_pretrained'

There seems to be a straight forward enough fix

14.30  × Preparing metadata (pyproject.toml) did not run successfully.
14.30  │ exit code: 1
14.30  ╰─> [33 lines of output]
14.30    Traceback (most recent call last):
14.30     File "/opt/conda/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
14.30      main()
14.30     File "/opt/conda/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
14.30      json_out['return_val'] = hook(**hook_input['kwargs'])
14.30     File "/opt/conda/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 152, in prepare_metadata_for_build_wheel
14.30      whl_basename = backend.build_wheel(metadata_directory, config_settings)
14.30     File "/tmp/pip-build-env-yo0q6xyo/overlay/lib/python3.10/site-packages/hatchling/build.py", line 58, in build_wheel
14.30      return os.path.basename(next(builder.build(directory=wheel_directory, versions=['standard'])))
14.30     File "/tmp/pip-build-env-yo0q6xyo/overlay/lib/python3.10/site-packages/hatchling/builders/plugin/interface.py", line 155, in build
14.30      artifact = version_api[version](directory, **build_data)
14.30     File "/tmp/pip-build-env-yo0q6xyo/overlay/lib/python3.10/site-packages/hatchling/builders/wheel.py", line 412, in build_standard
14.30      for included_file in self.recurse_included_files():
14.30     File "/tmp/pip-build-env-yo0q6xyo/overlay/lib/python3.10/site-packages/hatchling/builders/plugin/interface.py", line 176, in recurse_included_files
14.30      yield from self.recurse_selected_project_files()
14.30     File "/tmp/pip-build-env-yo0q6xyo/overlay/lib/python3.10/site-packages/hatchling/builders/plugin/interface.py", line 180, in recurse_selected_project_files
14.30      if self.config.only_include:
14.30     File "/tmp/pip-build-env-yo0q6xyo/overlay/lib/python3.10/site-packages/hatchling/builders/config.py", line 781, in only_include
14.30      only_include = only_include_config.get('only-include', self.default_only_include()) or self.packages
14.30     File "/tmp/pip-build-env-yo0q6xyo/overlay/lib/python3.10/site-packages/hatchling/builders/wheel.py", line 231, in default_only_include
14.30      return self.default_file_selection_options.only_include
14.30     File "/opt/conda/lib/python3.10/functools.py", line 981, in __get__
14.30      val = self.func(instance)
14.30     File "/tmp/pip-build-env-yo0q6xyo/overlay/lib/python3.10/site-packages/hatchling/builders/wheel.py", line 219, in default_file_selection_options
14.30      raise ValueError(message)
14.30    ValueError: Unable to determine which files to ship inside the wheel using the following heuristics: https://hatch.pypa.io/latest/plugins/builder/wheel/#default-file-selection

It should be fixed once #86 is merged.

paper released?

Hi, Thank you for the great work.
I wonder whether your paper has released? If so, please let me know