Comments (8)
Yes, I am looking for support in LangKit's OpenAIDefault. Currently, if I need to do hallucination checks, it looks like I cannot do it with Azure OpenAI; it is only supported for OpenAI. I am looking at using LangKit to evaluate responses from Azure OpenAI for hallucination, prompt injection, contextual relevancy, and so on.
Thanks for the reply, @rajib76.
For #1, yes, it should be possible to perform the semantic-similarity-based consistency check without the presence of an LLM.
And #2 also makes a lot of sense for your and others' scenarios.
I created two issues to reflect both topics we are discussing:
- response_hallucination: remove llm requirement for consistency check
- Response Hallucination: decouple LLMs for sample generation and consistency checking
We'll plan those changes in future sprints.
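For reference, here is a minimal sketch of what an LLM-free, semantic-similarity-based consistency check could look like. This is only an illustration using sentence-transformers, not LangKit's implementation; the model name and the threshold are arbitrary assumptions.

```python
from sentence_transformers import SentenceTransformer, util

# Any sentence-embedding model would do; this choice is an assumption for illustration.
model = SentenceTransformer("all-MiniLM-L6-v2")

def consistency_score(response: str, references: list[str]) -> float:
    """Return the max cosine similarity between the response and reference texts."""
    response_emb = model.encode(response, convert_to_tensor=True)
    reference_embs = model.encode(references, convert_to_tensor=True)
    return float(util.cos_sim(response_emb, reference_embs).max())

score = consistency_score(
    "The Eiffel Tower is in Paris.",
    ["The Eiffel Tower is located in Paris, France."],
)
flagged = score < 0.75  # threshold is an arbitrary example value
```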
Hi @rajib76! Thanks for opening an issue.
Are you looking for support for an Azure OpenAI-deployed LLM in LangKit's OpenAIDefault, along the lines of what we do in the Choosing an LLM example, or something else?
I want to get better support for Azure-hosted models into LangKit soon; we could probably focus on changes to support the Azure OpenAI models (e.g. gpt-35-turbo, gpt-4) as a first iteration, if that is helpful.
Ok, working on it. If you want to try an initial dev build:
pip install langkit==0.0.26.dev0
The new class and usage look like this:
from langkit import response_hallucination
from langkit.openai import OpenAIAzure

# "engine" is the name of your Azure OpenAI deployment
response_hallucination.init(llm=OpenAIAzure(engine="LangKit-test-01"), num_samples=1)
You also need to set these new environment variables:
os.environ["AZURE_OPENAI_ENDPOINT"]
os.environ["AZURE_OPENAI_KEY"]
As referenced in this example: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/chatgpt?tabs=python&pivots=programming-language-chat-completions#working-with-the-gpt-35-turbo-and-gpt-4-models
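Putting the pieces above together, an end-to-end setup could look like this; the endpoint and key values are placeholders for your own Azure OpenAI resource.

```python
import os

# Placeholder values; point these at your own Azure OpenAI resource.
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://<your-resource>.openai.azure.com/"
os.environ["AZURE_OPENAI_KEY"] = "<your-azure-openai-key>"

from langkit import response_hallucination
from langkit.openai import OpenAIAzure

# "engine" is the name of your Azure OpenAI deployment (e.g. a gpt-35-turbo deployment).
response_hallucination.init(llm=OpenAIAzure(engine="LangKit-test-01"), num_samples=1)
```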
I was able to run it, but I saw one thing. I tried the example code for the hallucination score, and I did not see a way to add a context and then tell the LLM to grade the response based on the question and context.
I think I got how it is working. It is a self-check validation: the prompt is sent to the same model to get an answer again, and then the new answer is checked against the response (a rough sketch of that flow is included after the questions below). Is it possible to do the following?
- Send a ground truth instead of the LLM creating a sample
- If I need to create a sample with an LLM, can I use a different LLM than the one that actually does the hallucination check? (Or is the hallucination check done by the LLM or through an ML model?)
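To make the self-check mechanism described above concrete, here is a rough, framework-agnostic sketch of the idea; the ask_llm and consistency helpers are hypothetical placeholders, not LangKit functions.

```python
from typing import Callable

def self_check_hallucination(
    prompt: str,
    response: str,
    ask_llm: Callable[[str], str],             # hypothetical: calls the LLM once
    consistency: Callable[[str, str], float],  # hypothetical: scores agreement in [0, 1]
    num_samples: int = 3,
) -> float:
    """Re-ask the same prompt, then score how consistent the original response
    is with the freshly generated samples (higher = more consistent)."""
    samples = [ask_llm(prompt) for _ in range(num_samples)]
    scores = [consistency(response, sample) for sample in samples]
    return sum(scores) / len(scores)
```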
Hi, @rajib76
Yes, this is exactly how response_hallucination works. To try to answer your questions:
- This module was designed with the zero-resource scenario in mind, without ground truth or context available, so sending a ground truth is not currently possible. This is something we could look into, though. Would it work for your use case if we had a variant where you pass the response and ground truth/context columns, thus removing the need to generate additional samples? (One LLM call would still be required for the consistency check; a rough sketch of that variant follows this list.)
- Currently, both processes (creating the samples and the consistency check) use the same LLM, so it is not possible to use one LLM to create the samples and another one to do the hallucination check. Can you explain your scenario a bit more?
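As an illustration of the variant described in the first bullet, a single-LLM-call consistency check against a provided ground truth might look roughly like this; the ask_llm callable and the grading prompt are assumptions for illustration, not LangKit's actual implementation.

```python
from typing import Callable

# Hypothetical grading prompt; LangKit's real prompt may differ.
GRADING_PROMPT = """You will be given a RESPONSE and a GROUND TRUTH.
Answer with a single number between 0 and 1 indicating how well the
RESPONSE is supported by the GROUND TRUTH (1 = fully supported).

RESPONSE: {response}
GROUND TRUTH: {ground_truth}
Score:"""

def grounded_consistency(response: str, ground_truth: str,
                         ask_llm: Callable[[str], str]) -> float:
    """One LLM call: ask the model to grade the response against the ground truth."""
    raw = ask_llm(GRADING_PROMPT.format(response=response, ground_truth=ground_truth))
    try:
        return float(raw.strip())
    except ValueError:
        return 0.0  # treat unparseable output as not supported
```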
Thanks Felipe. For #1, it will work if we can just pass the response and the ground truth. But do we need an LLM to do the consistency check? Could we not have an option to do a semantic match with an embedding model and then apply a threshold score?
For #2, I am planning to implement chain-of-verification, as I mentioned in this recording (a rough outline of that flow is sketched after the link); I wanted to check whether this could be supported out of the box by LangKit:
https://www.youtube.com/watch?v=qRyCmi0DeU8
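For context, chain-of-verification typically follows a draft / verify / revise loop. Below is a very rough sketch of that loop, with ask_llm as a hypothetical single-call helper; this is not a LangKit API.

```python
from typing import Callable

def chain_of_verification(question: str, ask_llm: Callable[[str], str]) -> str:
    """Draft an answer, generate verification questions, answer them
    independently, then revise the draft in light of the verification answers."""
    draft = ask_llm(f"Answer the question: {question}")
    verification_qs = ask_llm(
        f"List short questions that would verify the factual claims in:\n{draft}"
    ).splitlines()
    verifications = [ask_llm(q) for q in verification_qs if q.strip()]
    return ask_llm(
        "Revise the draft answer so it is consistent with the verification answers.\n"
        f"Question: {question}\nDraft: {draft}\nVerifications: {verifications}"
    )
```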
Related Issues (20)
- response_hallucination: remove llm requirement for consistency check
- Response Hallucination: decouple LLMs for sample generation and consistency checking
- text-davinci-003 OpenAI model deprecated
- Add feature to use bedrock agent to do the evaluation
- faiss-cpu - installation through pip not supported
- Original data
- Jupyter kernel crashes on running injections module in Mac
- Need example of prompt column name override
- Importing metrics issue since there is not a way to pass the model path if stored locally
- Need env variable to avoid attempted download of nltk artifacts
- TypeError: unsupported operand type(s) for +=: 'int' and 'NoneType'
- Better error message if OpenAI key is not provided in response_hallucination
- Explain refusal similarity
- Specify Python version compatibility
- consistency_check in response_hallucination
- Use custom prompt in response_hallucination
- add support for detoxify local models
- reduce latency in lazy initialization
- SWE-Agent: Implement the hallucinations metric from the main branch in the workflow branch
- Possible merge issue with 0.0.31 release