
setfit's Introduction

🤗 Models | 📊 Datasets | 📕 Documentation | 📖 Blog | 📃 Paper

SetFit - Efficient Few-shot Learning with Sentence Transformers

SetFit is an efficient and prompt-free framework for few-shot fine-tuning of Sentence Transformers. It achieves high accuracy with little labeled data - for instance, with only 8 labeled examples per class on the Customer Reviews sentiment dataset, SetFit is competitive with fine-tuning RoBERTa Large on the full training set of 3k examples 🤯!

Compared to other few-shot learning methods, SetFit has several unique features:

  • 🗣 No prompts or verbalizers: Current techniques for few-shot fine-tuning require handcrafted prompts or verbalizers to convert examples into a format suitable for the underlying language model. SetFit dispenses with prompts altogether by generating rich embeddings directly from text examples.
  • 🏎 Fast to train: SetFit doesn't require large-scale models like T0 or GPT-3 to achieve high accuracy. As a result, it is typically an order of magnitude (or more) faster to train and run inference with.
  • 🌎 Multilingual support: SetFit can be used with any Sentence Transformer on the Hub, which means you can classify text in multiple languages by simply fine-tuning a multilingual checkpoint, as sketched below.
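
A minimal sketch of the multilingual case (the checkpoint below is one public multilingual option on the Hub; any other would work the same way):

from setfit import SetFitModel

# Any multilingual Sentence Transformer checkpoint can serve as the model body
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-multilingual-mpnet-base-v2")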

Check out the SetFit Documentation for more information!

Installation

Download and install setfit by running:

pip install setfit

If you want the bleeding-edge version instead, install from source by running:

pip install git+https://github.com/huggingface/setfit.git

Usage

The quickstart is a good place to learn about training, saving, loading, and performing inference with SetFit models.

For more examples, check out the notebooks directory, the tutorials, or the how-to guides.

Training a SetFit model

setfit is integrated with the Hugging Face Hub and provides two main classes:

  • SetFitModel: a wrapper that combines a pretrained body from sentence_transformers and a classification head from either scikit-learn or SetFitHead (a differentiable head built upon PyTorch with similar APIs to sentence_transformers).
  • Trainer: a helper class that wraps the fine-tuning process of SetFit.

Here is a simple end-to-end training example using the default classification head from scikit-learn:

from datasets import load_dataset
from setfit import SetFitModel, Trainer, TrainingArguments, sample_dataset


# Load a dataset from the Hugging Face Hub
dataset = load_dataset("sst2")

# Simulate the few-shot regime by sampling 8 examples per class
train_dataset = sample_dataset(dataset["train"], label_column="label", num_samples=8)
eval_dataset = dataset["validation"].select(range(100))
test_dataset = dataset["validation"].select(range(100, len(dataset["validation"])))

# Load a SetFit model from Hub
model = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-mpnet-base-v2",
    labels=["negative", "positive"],
)

args = TrainingArguments(
    batch_size=16,
    num_epochs=4,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    metric="accuracy",
    column_mapping={"sentence": "text", "label": "label"}  # Map dataset columns to text/label expected by trainer
)

# Train and evaluate
trainer.train()
metrics = trainer.evaluate(test_dataset)
print(metrics)
# {'accuracy': 0.8691709844559585}

# Push model to the Hub
trainer.push_to_hub("tomaarsen/setfit-paraphrase-mpnet-base-v2-sst2")

# Download from Hub
model = SetFitModel.from_pretrained("tomaarsen/setfit-paraphrase-mpnet-base-v2-sst2")
# Run inference
preds = model.predict(["i loved the spiderman movie!", "pineapple on pizza is the worst 🤮"])
print(preds)
# ["positive", "negative"]

Reproducing the results from the paper

We provide scripts to reproduce the results for SetFit and various baselines presented in Table 2 of our paper. Check out the setup and training instructions in the scripts/ directory.

Developer installation

To run the code in this project, first create a Python virtual environment using e.g. Conda:

conda create -n setfit python=3.9 && conda activate setfit

Then install the base requirements with:

pip install -e '.[dev]'

This will install mandatory packages for SetFit like datasets as well as development packages like black and isort that we use to ensure consistent code formatting.

Formatting your code

We use black and isort to ensure consistent code formatting. After following the installation steps, you can check your code locally by running:

make style && make quality

Project structure

├── LICENSE
├── Makefile        <- Makefile with commands like `make style` or `make tests`
├── README.md       <- The top-level README for developers using this project.
├── docs            <- Documentation source
├── notebooks       <- Jupyter notebooks.
├── final_results   <- Model predictions from the paper
├── scripts         <- Scripts for training and inference
├── setup.cfg       <- Configuration file to define package metadata
├── setup.py        <- Make this project pip installable with `pip install -e`
├── src             <- Source code for SetFit
└── tests           <- Unit tests

Related work

Citation

@misc{https://doi.org/10.48550/arxiv.2209.11055,
  doi = {10.48550/ARXIV.2209.11055},
  url = {https://arxiv.org/abs/2209.11055},
  author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
  keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {Efficient Few-Shot Learning Without Prompts},
  publisher = {arXiv},
  year = {2022},
  copyright = {Creative Commons Attribution 4.0 International}
}


setfit's Issues

Predict method problem

Hi, I'm trying to run the predict method on a dataframe text column and I get the following message.
The code: df_ri_manclassif['predicted'] = model(df_ri_manclassif['global_text'].to_list())
The message:

---------------------------------------------------------------------------
NotFittedError                            Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_7856/2981536964.py in <module>
----> 1 df_ri_manclassif['predicted'] = model(df_ri_manclassif['global_text'].to_list())

c:\Users\doub2420\AppData\Local\Programs\Python\Python39\lib\site-packages\setfit\modeling.py in __call__(self, inputs)
     60     def __call__(self, inputs):
     61         embeddings = self.model_body.encode(inputs)
---> 62         return self.model_head.predict(embeddings)
     63
     64     def _save_pretrained(self, save_directory):

c:\Users\doub2420\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\linear_model\_base.py in predict(self, X)
    445         Vector containing the class labels for each sample.
    446         """
--> 447         scores = self.decision_function(X)
    448         if len(scores.shape) == 1:
    449             indices = (scores > 0).astype(int)

c:\Users\doub2420\AppData\Local\Programs\Python\Python39\lib\site-packages\sklearn\linear_model\_base.py in decision_function(self, X)
    425         this class would be predicted.
    426         """
--> 427         check_is_fitted(self)
    428
    429         X = self._validate_data(X, accept_sparse="csr", reset=False)
...
-> 1345     raise NotFittedError(msg % {"name": type(estimator).__name__})
   1346
   1347

NotFittedError: This LogisticRegression instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.

Can you help? Thank you!

SetFit for Multilabel Text Classification fails to run

SetFit for Multilabel Text Classification fails to run and throws an error when the code trainer.train() is executed.

Error thrown:

IndexError                                Traceback (most recent call last)
Cell In [22], line 1
----> 1 trainer.train()

File .../site-packages/setfit/trainer.py:161, in SetFitTrainer.train(self)
    158 train_examples = []
    160 for _ in range(self.num_iterations):
--> 161     train_examples = sentence_pairs_generation(np.array(x_train), np.array(y_train), train_examples)
    163 train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=self.batch_size)
    164 train_loss = self.loss_class(self.model.model_body)

File .../site-packages/setfit/modeling.py:214, in sentence_pairs_generation(sentences, labels, pairs)
    212 current_sentence = sentences[first_idx]
    213 label = labels[first_idx]
--> 214 second_idx = np.random.choice(idx[np.where(num_classes == label)[0][0]])
    215 positive_sentence = sentences[second_idx]
    216 # Prepare a positive pair and update the sentences and labels
    217 # lists, respectively

IndexError: index 0 is out of bounds for axis 0 with size 0

Using SetFit for regression tasks?

I was curious about using SetFit for ordinal Likert scale outcomes (ie IMDB movie reviews). It doesn't seem like an obvious option in the SetFit API. Has anyone tried using SetFit for regression tasks?
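
One untested direction, sketched here as an assumption rather than a supported API: since the head is a plain scikit-learn estimator fit on the body's embeddings (see the fit/predict snippet quoted in a later issue), it may be possible to swap it for a regressor. Note that the contrastive pair generation still assumes discrete labels, so this only changes the head stage:

from sklearn.linear_model import Ridge
from setfit import SetFitModel

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
# Assumption: any estimator with fit/predict over embeddings can act as the head
model.model_head = Ridge()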

Problems with multi class classification

Hi,
I have multiple problems with multi class classification.

The README says that if you do multi-class classification you must specify multi_target_strategy.

The strange thing is that everything works fine with multiple classes when I do NOT add multi_target_strategy.

But when I specify multi_target_strategy the code complains with:

>> trainer created, start training
Traceback (most recent call last):
  File "/home/phmay/dev/set_fit_frag_magenta/15_fit.py", line 60, in <module>
    trainer.train()
  File "/home/phmay/miniconda3/envs/paraphrase-mining/lib/python3.9/site-packages/setfit/trainer.py", line 255, in train
    train_examples = sentence_pairs_generation_multilabel(
  File "/home/phmay/miniconda3/envs/paraphrase-mining/lib/python3.9/site-packages/setfit/modeling.py", line 257, in sentence_pairs_generation_multilabel
    sample_labels = np.where(labels[first_idx, :] == 1)[0]
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

What did I miss? Is my data format wrong? IMO the README should document this.

Can anyone please help?

num_epochs range

Hi there!
I was wondering whether you could provide a range of typically "good" values to use/test for the num_epochs argument, both in the single-label and the multi-label classification case. Of course, the best-performing number depends on the classes to be predicted and the dataset, but in non-FSL settings one typically uses a range of 2-5 (and many researchers stick to common defaults such as 3). I'm asking because I noticed that you rather use num_epochs = 20 in your example scripts, so perhaps num_epochs should generally be higher in setfit than in non-FSL settings?

Fine-tuning for Question-Answering

Hello,

Can this library be used for fine-tuning a question-answering model with a small amount of data as well?

I have data in the same format as SQuAD: a small amount of context, question, and answer examples.

Is it possible to use this library to fine-tune a question-answering model from huggingface (e.g. deepset/roberta-base-squad2) on my small dataset? If so, how should I set the column_mapping argument of SetFitTrainer()?

cache_dir argument of from_pretrained method not fully functional

SetFitModel.from_pretrained (defined on modeling.py#L69) does not pass necessary variables such as cache_folder to SentenceTransformer during initialization on line 82.

This can render the method useless in certain environments where API frameworks set the home directory to '/nonexistent' with no write permissions. Calling SetFitModel.from_pretrained will then throw an error:

    self.model = SetFitModel.from_pretrained(self.hf_model_path, cache_dir=self.hf_cache_dir)
  File "/app/src/python/engine/py_image.binary.runfiles/pip_deps_huggingface_hub/huggingface_hub/hub_mixin.py", line 211, in from_pretrained
    return cls._from_pretrained(
  File "/app/src/python/engine/py_image.binary.runfiles/pip_deps_setfit/setfit/modeling.py", line 72, in _from_pretrained
    model_body = SentenceTransformer(model_id)
  File "/app/src/python/engine/py_image.binary.runfiles/pip_deps_sentence_transformers/sentence_transformers/SentenceTransformer.py", line 87, in __init__
    snapshot_download(model_name_or_path,
  File "/app/src/python/engine/py_image.binary.runfiles/pip_deps_sentence_transformers/sentence_transformers/util.py", line 476, in snapshot_download
    os.makedirs(nested_dirname, exist_ok=True)
  File "/usr/lib/python3.8/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/lib/python3.8/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/lib/python3.8/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  [Previous line repeated 1 more time]
  File "/usr/lib/python3.8/os.py", line 223, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/nonexistent'
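
A one-line sketch of the fix suggested here (SentenceTransformer's constructor does accept a cache_folder argument, so the wrapper could forward it; cache_dir and model_id are the variables from the snippet above):

# Forward the caching location into the Sentence Transformer body
model_body = SentenceTransformer(model_id, cache_folder=cache_dir)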

Mac M1 GPU support

Hi,

I am wondering if there is a way to send the model to the MPS GPU (Apple M1 Mac).

Something like:

device = torch.device("mps")

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

model.to(device)
....

in order to exploit the GPU since so far it uses only CPU.

Many thanks in advance for your help
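
A minimal sketch of one possible workaround, assuming that moving the Sentence Transformer body (a regular torch.nn.Module) is sufficient for inference:

import torch
from setfit import SetFitModel

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
if torch.backends.mps.is_available():
    # Move only the embedding body; the scikit-learn head runs on CPU anyway
    model.model_body = model.model_body.to(torch.device("mps"))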

Update model card generation for SetFit tasks

The current push_to_hub() implementation uses sentence-transformers's default model card, which isn't quite correct because the task is text-classification and not feature-extraction.

This results in a model card that is (a) missing nice information about the model / training process and (b) has the wrong widget:

[Screenshot: a SetFit model on the Hub rendered with the default sentence-transformers card and a feature-extraction widget]

It would be great to have a dedicated model card for setfit models that includes a template with:

  • The model name and a code snippet on how to use it
  • A pointer back to this repo
  • Tags like setfit and text-classification that we can use to track / filter these models

Now that the huggingface_hub library has dedicated APIs for model cards, we could use these to make this process simple

[FR] use `MultipleNegativesRankingLoss`

If we have more classes than (or as many classes as) the batch size, it should be possible to use MultipleNegativesRankingLoss. This might improve performance.

I would like to get comments about this idea and might provide a PR later.
Does anyone have an opinion on this?
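
A sketch of what the call site might look like, assuming loss_class accepts arbitrary sentence-transformers losses (see also the loss_class documentation request elsewhere in this tracker):

from sentence_transformers.losses import MultipleNegativesRankingLoss
from setfit import SetFitTrainer

trainer = SetFitTrainer(
    model=model,
    train_dataset=train_dataset,
    loss_class=MultipleNegativesRankingLoss,  # assumes batches are built compatibly
)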

[FR] Make warmup_steps configurable

I would like to make warmup_steps configurable and not have it by default set to 0.1 times steps.

I can provide a PR if wanted.

My experience from hyperparameter optimization with optuna and sentence embeddings is that a value between 0.2 and 0.3 is best.

Multilabel support

Hi!

do you think it could be possible to handle multilabel classification as well (and so handling an array of labels for each single example)? Like e.g. wrapping the LogisticRegression using an OneVsRestClassifier?

Thanks!

PS: this is very nice work! 😄
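
A sketch of the proposed wrapping, assuming the head can be replaced before fitting and that labels come as multi-hot arrays:

from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# One binary LogisticRegression per label, trained on the body's embeddings
model.model_head = OneVsRestClassifier(LogisticRegression())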

Use pre-computed token embeddings as features

Thank you for sharing your work!

I would like to use pre-computed token embeddings inputs_embeds as opposed to relying on a tokenizer to generate input_ids from the column mapped to "text".

For context, my pipeline looks as follows:

  1. Embed sequences with a transformer to obtain contextual embeddings of tokens.
  2. For each sequence, keep embeddings for certain subsets of tokens.
  3. Train classifier with SetFit using token embeddings from (2) as features.

I had to make some tweaks for this approach to work (including in sentence-transformers) so I was wondering if there is a simpler way I may be missing.

Thank you.

Save model to disk after training

I am working on an HPC with no internet access on the worker nodes and the only option to save a model after training it is to push it to HuggingFace hub. How do I go about saving it locally to disk?
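
A sketch using save_pretrained, which another issue further down this list uses successfully with a local path (note that a separate issue below also reports a loading quirk with absolute paths):

# Save the trained model to a local directory (no Hub access needed)
trainer.model.save_pretrained("path/to/local-model")

# Reload it later
from setfit import SetFitModel
model = SetFitModel.from_pretrained("path/to/local-model")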

Implement end-to-end differentiable model

The current SetFit implementation combines the embeddings of a frozen body, with the learnable parameters of a logistic regression classifier.

It would be desirable to have a PyTorch implementation that is end-to-end differentiable. This isn't entirely trivial as we'll likely need different learning rates for the body vs the head.

Sentence Pair Generation Question

I am trying to understand SetFit - when I run the default code in the readme - 8 examples, 2 classes - I noticed the training dataset has 640 examples. That 640 is probably going to include duplicate examples, right? Can you share a bit more detail on the sentence generation process and the impact of duplicates?
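
For reference, the count is consistent with the pair-generation formula discussed in a later issue, assuming the default num_iterations of 20: 8 examples x 2 classes = 16 sentences, and each iteration draws 1 positive and 1 negative pair per sentence, so 16 * 2 * 20 = 640 pairs. Since the number of distinct pairs is far smaller, duplicates are indeed possible.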

Running Evaluation

Hi,
Thanks for sharing this work.
I am wondering if it is possible to run an evaluation dataset to tune hyperparameters.
The SetFitTrainer doesn't seem to accept arguments like 'evaluation_strategy', 'save_strategy', 'compute_metrics', etc.
Or perhaps I'm doing something wrong?
Thanks.

Error when loading a locally saved model

When I try to load a locally saved model:

from setfit import SetFitModel

model = SetFitModel.from_pretrained("/path/to/model-directory", local_files_only=True)

I get an error:

HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/path/to/model-directory'. Use `repo_type` argument if needed.

I think this could be solved by changing these lines from

        if os.path.isdir(model_id) and MODEL_HEAD_NAME in os.listdir(model_id):
            model_head_file = os.path.join(model_id, MODEL_HEAD_NAME)

to something like

        if os.path.isdir(model_id):
            if MODEL_HEAD_NAME in os.listdir(model_id):
                model_head_file = os.path.join(model_id, MODEL_HEAD_NAME)
            else:
                model_head_file = None

[FR] add documentation about the `loss_class` selection

Can you please add documentation about the loss_class selection?

There seems to be a new loss class implemented called SupConLoss.
Maybe you could add some documentation on how to select them and which advantages and disadvantages each might have.

Many thanks
Philip

Redirect __call__ to predict in SetFitModel

Hi,

IMO, because of the DRY principle, this function:

    def __call__(self, inputs):
        embeddings = self.model_body.encode(inputs)
        return self.model_head.predict(embeddings)

should just redirect to predict, like this:

    def __call__(self, inputs):
        return self.predict(inputs)

I can provide a PR if wanted.

Using SetFit Embeddings for Semantic Search?

Hi,

I was wondering if semantic search would improve if one trained a multi-label classification model and used those embeddings.

After training a binary classification model, I have seen that the embeddings of similar topics on all-MiniLM-L12-v2 vs all-MiniLM-L12-v2-setfit (the fitted model) are very close in the fitted model, which makes sense to me.

from scipy import spatial

# Cosine similarity between two embedding vectors
def get_cosine_similarity(vector1, vector2):
    sim = 1 - spatial.distance.cosine(vector1, vector2)
    return sim

word_1 = "acne"
word_2 = "red skin"

emb_fit_1 = model.model_body.encode([word_1])
emb_fit_2 = model.model_body.encode([word_2])

emb_base_1 = model_sbert.encode([word_1])
emb_base_2 = model_sbert.encode([word_2])

print(f"{word_1} vs {word_2} (base)", get_cosine_similarity(emb_base_1, emb_base_2))
print(f"{word_1} vs {word_2} (fit)", get_cosine_similarity(emb_fit_1, emb_fit_2))

Example output across a few word pairs:

acne vs pimple (base) 0.5959747433662415
acne vs pimple (fit) 0.9996786117553711

acne vs red skin (base) 0.36421263217926025
acne vs red skin (fit) 0.9994498491287231

acne vs red car (base) 0.17558744549751282
acne vs red car (fit) 0.0051751588471233845

I would assume that if the model is trained on a multi-label classification task, the embeddings would somehow be clustered based on the labels provided during training. Would that improve semantic search if enough labels are provided during training?

Of course I could train a model and test it but maybe you have done similar tests and already know if it's working or not :-)

Thanks!

Question regarding framework

Is it possible to just use the fine-tuned model to do encoding, like the original sentence transformer model? If so, could you please let me know how? Thanks in advance.

Onnx support?

Hi, really like this work!

Given its advantage of faster inference, have you considered adding support functions, like the example below, to compile a trained SetFit model into the ONNX format for production usage?

If that sounds promising, I will be happy to make this feature work!

Example:

# Train 
trainer.train()

# Compile to onnx
onnx_path = "path/to/store/compiled/model.onnx"
trainer.to_onnx(onnx_path, **onnx_related_kwargs)

Normalize before using LogisticRegression

Hi,

as far as I can see, setfit applies LogisticRegression on top of the output of the sentence transformer model. See here:

    def fit(self, x_train, y_train):
        embeddings = self.model_body.encode(x_train)
        self.model_head.fit(embeddings, y_train)

    def predict(self, x_test):
        embeddings = self.model_body.encode(x_test)
        return self.model_head.predict(embeddings)

The problem I see is that the output is not normalized by default. Since we use cosine similarity to compare embeddings, the length of the vector does not matter there. That is fine for cosine similarity, but IMO it is not fine when you apply LogisticRegression.

IMO the embeddings should be normalized to unit length before LogisticRegression is applied. That would be done by passing
normalize_embeddings=True to the encode function. See here:

https://github.com/UKPLab/sentence-transformers/blob/0422a5e07a5a998948721dea435235b342a9f610/sentence_transformers/SentenceTransformer.py#L111-L118

What do you think? I can provide a PR if wanted.
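
The proposal amounts to a one-argument change inside fit (and analogously in predict), sketched here:

def fit(self, x_train, y_train):
    # Unit-normalize embeddings so LogisticRegression sees direction, not magnitude
    embeddings = self.model_body.encode(x_train, normalize_embeddings=True)
    self.model_head.fit(embeddings, y_train)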

Few-Shot Named Entity Recognition work

Hi, I really like your work. Have you considered using this framework for few-shot named entity recognition? Or do you have example code for it? Looking forward to progress on few-shot NER!

SetFit CUDA error in Kaggle Notebooks

I tried running the example notebooks provided here (Multiclass and Multilabel). They both throw the same error while training:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_134/4032920361.py in <module>
----> 1 trainer.train()

/opt/conda/lib/python3.7/site-packages/setfit/trainer.py in train(self, trial)
    280             optimizer_params={"lr": self.learning_rate},
    281             warmup_steps=warmup_steps,
--> 282             show_progress_bar=True,
    283         )
    284 

/opt/conda/lib/python3.7/site-packages/sentence_transformers/SentenceTransformer.py in fit(self, train_objectives, evaluator, epochs, steps_per_epoch, scheduler, warmup_steps, optimizer_class, optimizer_params, weight_decay, evaluation_steps, output_path, save_best_model, max_grad_norm, use_amp, callback, show_progress_bar, checkpoint_path, checkpoint_save_steps, checkpoint_save_total_limit)
    719                         skip_scheduler = scaler.get_scale() != scale_before_step
    720                     else:
--> 721                         loss_value = loss_model(features, labels)
    722                         loss_value.backward()
    723                         torch.nn.utils.clip_grad_norm_(loss_model.parameters(), max_grad_norm)

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.7/site-packages/sentence_transformers/losses/CosineSimilarityLoss.py in forward(self, sentence_features, labels)
     37 
     38     def forward(self, sentence_features: Iterable[Dict[str, Tensor]], labels: Tensor):
---> 39         embeddings = [self.model(sentence_feature)['sentence_embedding'] for sentence_feature in sentence_features]
     40         output = self.cos_score_transformation(torch.cosine_similarity(embeddings[0], embeddings[1]))
     41         return self.loss_fct(output, labels.view(-1))

/opt/conda/lib/python3.7/site-packages/sentence_transformers/losses/CosineSimilarityLoss.py in <listcomp>(.0)
     37 
     38     def forward(self, sentence_features: Iterable[Dict[str, Tensor]], labels: Tensor):
---> 39         embeddings = [self.model(sentence_feature)['sentence_embedding'] for sentence_feature in sentence_features]
     40         output = self.cos_score_transformation(torch.cosine_similarity(embeddings[0], embeddings[1]))
     41         return self.loss_fct(output, labels.view(-1))

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/container.py in forward(self, input)
    139     def forward(self, input):
    140         for module in self:
--> 141             input = module(input)
    142         return input
    143 

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.7/site-packages/sentence_transformers/models/Transformer.py in forward(self, features)
     64             trans_features['token_type_ids'] = features['token_type_ids']
     65 
---> 66         output_states = self.auto_model(**trans_features, return_dict=False)
     67         output_tokens = output_states[0]
     68 

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.7/site-packages/transformers/models/mpnet/modeling_mpnet.py in forward(self, input_ids, attention_mask, position_ids, head_mask, inputs_embeds, output_attentions, output_hidden_states, return_dict, **kwargs)
    558             output_attentions=output_attentions,
    559             output_hidden_states=output_hidden_states,
--> 560             return_dict=return_dict,
    561         )
    562         sequence_output = encoder_outputs[0]

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.7/site-packages/transformers/models/mpnet/modeling_mpnet.py in forward(self, hidden_states, attention_mask, head_mask, output_attentions, output_hidden_states, return_dict, **kwargs)
    345                 position_bias,
    346                 output_attentions=output_attentions,
--> 347                 **kwargs,
    348             )
    349             hidden_states = layer_outputs[0]

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.7/site-packages/transformers/models/mpnet/modeling_mpnet.py in forward(self, hidden_states, attention_mask, head_mask, position_bias, output_attentions, **kwargs)
    303             head_mask,
    304             position_bias=position_bias,
--> 305             output_attentions=output_attentions,
    306         )
    307         attention_output = self_attention_outputs[0]

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.7/site-packages/transformers/models/mpnet/modeling_mpnet.py in forward(self, hidden_states, attention_mask, head_mask, position_bias, output_attentions, **kwargs)
    244             head_mask,
    245             position_bias,
--> 246             output_attentions=output_attentions,
    247         )
    248         attention_output = self.LayerNorm(self.dropout(self_outputs[0]) + hidden_states)

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.7/site-packages/transformers/models/mpnet/modeling_mpnet.py in forward(self, hidden_states, attention_mask, head_mask, position_bias, output_attentions, **kwargs)
    166     ):
    167 
--> 168         q = self.q(hidden_states)
    169         k = self.k(hidden_states)
    170         v = self.v(hidden_states)

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1109                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1110             return forward_call(*input, **kwargs)
   1111         # Do not call functions when jit is used
   1112         full_backward_hooks, non_full_backward_hooks = [], []

/opt/conda/lib/python3.7/site-packages/torch/nn/modules/linear.py in forward(self, input)
    101 
    102     def forward(self, input: Tensor) -> Tensor:
--> 103         return F.linear(input, self.weight, self.bias)
    104 
    105     def extra_repr(self) -> str:

RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`

I tried different configurations and changed batch_size to 1, but it isn't working.

Side note: Kaggle recently updated their backend, providing a larger CPU and two choices of GPU (P100 or 2x T4). I tried both GPUs, but the notebook isn't running successfully. Can you please help?

Does num_iterations create duplicate data?

I am trying to get a better understanding of this hyperparameter. As far as I understand, you iterate over the data num_iterations times and create a positive and a negative pair by sampling. Could this result in duplicate data?

Also, it sometimes results in more examples than there are potential pairs. For example, in imdb with 3-shot there are 6 examples, 3 per class. Setting num_iterations to 5 creates 6 (examples) * 2 (1 positive + 1 negative) * 5 (num_iterations) = 60 pairs, whereas the possible distinct pairs are only (6*6 - 6)/2 = 15, i.e. half of the matrix of all pairs without the diagonal.

If the above is correct, it seems like this is running training for multiple epochs. Is that right? If so, why not create all pairs instead and keep the epochs hyperparameter as-is, which might be more intuitive? And if you want a way to sample less data, why not introduce a sample_size to cap those combinations to a smaller number for experimentation?

Error when trying to run run_fewshot_multilabel.py

Hi! Congratulations on the work! I wanted to use this model for my multilabel task, but I have an issue when trying to run the script for multilabel classification (run_fewshot_multilabel.py). I get this error:

from setfit.modeling import LOSS_NAME_TO_CLASS, SetFitBaseModel, SKLearnWrapper, sentence_pairs_generation_multilabel
ImportError: cannot import name 'LOSS_NAME_TO_CLASS' from 'setfit.modeling'

I went ahead and checked the file it was referring to (modeling.py), and there was no LOSS_NAME_TO_CLASS variable there.
Any help would be appreciated.

Multilabel Support

What is the current status of multi label support? I saw a notebook for a multi label task here, but can't find it anymore. Can someone provide a working example for a multi label classification task? Thanks!
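
One hedged sketch, based on the README guidance quoted in an earlier issue here (assumes labels are provided as multi-hot vectors):

from setfit import SetFitModel

model = SetFitModel.from_pretrained(
    "sentence-transformers/paraphrase-mpnet-base-v2",
    multi_target_strategy="one-vs-rest",
)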

Implement trainer for distillation

In our paper, we used knowledge distillation to compress a SetFit model into a smaller student. This logic is currently implemented as a script, but it would be valuable to refactor this out to a SetFitDistillationTrainer that people can use in e.g. a Jupyter notebook.

cc @orenpereg

'SetFitTrainer' object has no attribute 'config'

Hi, first of all congrats for this huge work!

I have trained a model and am trying to make predictions with pipeline; however, it keeps throwing 'SetFitTrainer' object has no attribute 'config'. If pipeline is not supported, what's the correct way to run this model?

Many thanks.

Warmup steps calculation should not ignore epochs.

The warmup steps calculation should not ignore epochs. See here:

warmup_steps = math.ceil(train_steps * 0.1)

The bug is in how train_steps is calculated here:

train_steps = len(train_dataloader)

The right implementation is here (but it is not always executed):

train_steps = len(train_dataloader) * self.num_epochs

I can provide a PR.

Use amp / FP16

Hi, since sentence transformers supports FP16 training (by setting use_amp=True),
can this also be built in here?

Not sure if this is a FR or a question. :-)

How to take advantage of Mac M1 GPUs?

More than an issue, this is a request for help.

Do you have advice on how to take advantage of the Mac M1 Pro GPU for training a model, assuming the underlying Torch implementation provides support?

There are some tutorials on how to use Torch with the MPS driver, but I'm not sure how to signal SetFit to use a specific GPU.

Using Setfit for similarity classification

Hello,
I would like to test this promising framework on a similarity classification task. Basically, I have a dataset with 3 columns: (sentence1, sentence2, label). From what I understand, it is currently only possible to train on a single-sentence classification problem.
Is there a workaround to use SetFit for a sentence-pair classification problem? If not, would it be possible to add this feature in a future release?

Thank you in advance

id2label and label2id setup for inference

I have added id2label and label2id to the pretrained model. I know this is a Sentence Transformer model, but I don't know if there is a way to pass the label mapping to the head of the SetFitModel:

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2", num_labels=num_classes, id2label=id2label, label2id=label2id)

That way, at inference time we could just call the SetFit model.

HFValidationError when loading the model

Hi there! I am trying to load a model I have stored at Google Drive for inferencing:

# Load SetFit model
tuned_model = SetFitModel.from_pretrained("/content/drive/My Drive/models/tuned-model")
# Run inference
tuned_model(["i didnt feel humiliated", "i feel romantic too", "im grabbing a minute to post i feel greedy wrong"])

But I get the following error:

HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/content/drive/My Drive/models/tuned-model'. Use repo_type argument if needed.

It works, however, when I load it from the same script in which I saved it, via:

# Save trained model to disk
trainer.model.save_pretrained("/content/drive/My Drive/models/tuned-model")

What could be the problem? Can't I just save/load a pretrained model to Google Drive?

Many thanks in advance for your support and terrific work.

Add support for zero-shot classification

In our paper, we ran some experiments where adding synthetic data like:

{"text": "This sentence is <class-name-0>", "label": <class-label-0>}
{"text": "This sentence is <class-name-1>", "label": <class-label-1>}
...

and applying the SetFit method gave a boost in performance. In particular, one can use this technique for zero-shot classification, and we found that it typically outperforms the BART model used in the zero-shot-classification pipeline in transformers.

It would be nice to enable this feature by having a function like:

from datasets import Dataset

def add_zeroshot_examples(dataset: Dataset, candidate_labels: Union[str, List[str]], template: str = "This sentence is {}") -> Dataset:
    # Apply logic to create `Dataset` from `template` and `candidate_labels`

This way one could have a workflow like:

from datasets import load_dataset

dataset = load_dataset("sst2", split="train")

dataset_with_zeroshot_examples = add_zeroshot_examples(dataset, candidate_labels=["negative", "positive"])
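
A hedged sketch of what the proposed helper could implement (the concatenation step and the integer label scheme are assumptions; real datasets such as sst2 may also need a column mapping first):

from typing import List
from datasets import Dataset, concatenate_datasets

def add_zeroshot_examples(dataset: Dataset, candidate_labels: List[str], template: str = "This sentence is {}") -> Dataset:
    # Build one synthetic example per candidate label
    synthetic = Dataset.from_dict({
        "text": [template.format(label) for label in candidate_labels],
        "label": list(range(len(candidate_labels))),
    })
    # Append the synthetic examples to the original dataset
    return concatenate_datasets([dataset, synthetic])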

How many samples for setfit?

I understand that SetFit is a lightweight solution for few-shot learning. Two questions came up:

  • At what number of samples per class would you switch to standard supervised learning and fine-tuning? E.g. 100 samples?
  • Is there any disadvantage to generating too many pairs (num_iterations)? If I have 30 classes, wouldn't the default of 20 be too small to learn meaningful embeddings?

Multi-label classification?

Hi y'all! Awesome library, thanks a lot :-) I was wondering whether you are planning on adding support for multi-label classification, i.e., where each example has 0 or more true classes. As far as I understood the docs and code, currently setfit supports only single-label classification. Do you have any plans or timeline for this?
