Hi, I was wondering if the semantic search would improve if one woul

Using SetFit Embeddings for Semantic Search? (setfit, open)

Raidus commented on June 11, 2024 3

Using SetFit Embeddings for Semantic Search?

Comments (5)

Raidus commented on June 11, 2024 7

I reduced the dimensions with UMAP and visualized the training-set embeddings from all-MiniLM-L12-v2 vs. all-MiniLM-L12-v2-setfit (the fitted model). Then I highlighted every text that includes "acne" or "pimple". The green points are the texts that include neither "acne" nor "pimple". The actual task was binary classification: is a text related to skincare or not.

It looks like the model "learned" that "acne" and "pimple" are very close: their embeddings appear closer on average after fitting the model on the training data. I did not calculate the average distance between those embeddings, but visually they look closer together.
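The visual impression could be checked numerically. Below is a minimal sketch, assuming the embeddings are available as NumPy arrays in the original embedding space (not the 2D UMAP projection, which distorts distances). The function names `rows_with_keyword` and `mean_cross_cosine_distance` are made up for illustration and are not part of SetFit or sentence-transformers.

```python
import numpy as np


def rows_with_keyword(texts, embeddings, keyword):
    """Return the embedding rows whose text contains `keyword` (case-insensitive)."""
    mask = np.array([keyword.lower() in t.lower() for t in texts], dtype=bool)
    return np.asarray(embeddings, dtype=float)[mask]


def mean_cross_cosine_distance(a, b):
    """Mean cosine distance (1 - cosine similarity) over all pairs (row of a, row of b)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Normalize rows to unit length so the dot product equals cosine similarity.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return float(np.mean(1.0 - a @ b.T))


# Hypothetical usage: `texts` is the training set, `emb_base` and `emb_fitted`
# are the embeddings of the same texts from the base and the fitted model.
#
# for name, emb in [("base", emb_base), ("fitted", emb_fitted)]:
#     acne = rows_with_keyword(texts, emb, "acne")
#     pimple = rows_with_keyword(texts, emb, "pimple")
#     print(name, mean_cross_cosine_distance(acne, pimple))
```

If the number for the fitted model is clearly lower than for the base model, that would support the impression from the plot. Comparing it against the mean distance to the texts without either keyword would show whether the two groups moved toward each other specifically or everything skincare-related simply collapsed together.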

That tells me that even after fitting for binary classification, the embeddings could be used to improve semantic search. I'll run another test with multi-label classification, but creating the training set needs some data wrangling. When I find time to run the test, I'll post the results here.

hanshupe commented on June 11, 2024

I am very interested in this topic too - planning to use only the fine-tuning part and use the embeddings for semantic search. Any thoughts?
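For what it's worth, here is a rough sketch of what "use the embeddings for semantic search" could look like. It is written against a generic `encode` callable so it runs without any model; the assumption is that with a fitted SetFit model you would pass the sentence-transformer body's encode method (in SetFit that body is exposed as `model.model_body`). The function name `semantic_search` and its parameters are illustrative only.

```python
import numpy as np


def semantic_search(query, corpus, encode, top_k=3):
    """Rank `corpus` texts by cosine similarity to `query`.

    `encode` is any callable mapping a list of strings to a 2D array-like
    of embeddings, e.g. the encode method of a sentence-transformer.
    """
    corpus_emb = np.asarray(encode(corpus), dtype=float)
    query_emb = np.asarray(encode([query]), dtype=float)[0]
    # Unit-normalize so the dot product is the cosine similarity.
    corpus_emb = corpus_emb / np.linalg.norm(corpus_emb, axis=1, keepdims=True)
    query_emb = query_emb / np.linalg.norm(query_emb)
    scores = corpus_emb @ query_emb
    order = np.argsort(-scores)[:top_k]
    return [(corpus[i], float(scores[i])) for i in order]


# Hypothetical usage with a fitted SetFit model (assumption, not run here):
#
# hits = semantic_search("how to treat pimples", documents, model.model_body.encode)
```

Whether the fitted body actually ranks better than the base model is an empirical question; running the same queries through both encoders and comparing the top results would be the obvious check.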

pleonova commented on June 11, 2024

This is super neat! Thanks for sharing the UMAP comparison @Raidus!

Tangential question: are you uploading your model to the HF hub, or are you storing the fine-tuned model locally and then calling it to get the embeddings?

tomaarsen commented on June 11, 2024

Very interesting experimental results. Out of curiosity, the model_sbert/all-MiniLM-L12-v2 SentenceTransformer is not finetuned on the data, right?

karndeepsingh commented on June 11, 2024

Hi,
How can I train a SetFit model for semantic search if I don't have labeled data (let's say I only have product descriptions)? How can I use the SetFit trainer to create positive and negative samples? As per the Hugging Face blog, it needs a few labels to train, right? (Correct me if I am wrong.)
Please help me understand how I can use just the product descriptions to train a SetFit model and then use it on my queries for semantic search.

Thanks
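On the labels question: SetFit's contrastive fine-tuning step builds its positive and negative pairs from class labels, which is why at least a few labeled examples are needed. The sketch below illustrates the idea only. It enumerates every pair exhaustively, which is a simplification and not SetFit's own sampling code, and `make_contrastive_pairs` is a made-up name.

```python
from itertools import combinations


def make_contrastive_pairs(texts, labels):
    """Build (text_a, text_b, target) triples from labeled examples.

    target is 1.0 when both texts share a label (positive pair)
    and 0.0 when their labels differ (negative pair).
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(zip(texts, labels), 2):
        pairs.append((text_a, text_b, 1.0 if label_a == label_b else 0.0))
    return pairs


# Toy example: two skincare texts (label 1) and one unrelated text (label 0).
examples = ["acne cream for oily skin", "overnight pimple patch", "winter car tires"]
example_labels = [1, 1, 0]
pairs = make_contrastive_pairs(examples, example_labels)
```

Without labels there is nothing to decide which descriptions count as "similar", so plain product descriptions would first need some grouping signal (for example a product category, if one exists in the data) to stand in for the label.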
