Comments

rajib76 commented on June 14, 2024

Yes, I'm looking for support in LangKit's OpenAIDefault. Currently, if I need to do hallucination checks, it looks like I cannot do it using Azure OpenAI; it is only supported for OpenAI. I am looking to use LangKit to evaluate responses from Azure OpenAI for hallucination, prompt injection, contextual relevancy, and so on.

FelipeAdachi commented on June 14, 2024

Thanks for the reply, @rajib76.

For #1, yes, it should be possible to perform the semantic-similarity-based consistency check without the presence of an LLM.
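
For illustration, here is a rough sketch of what an LLM-free, embedding-based consistency check could look like. It uses the sentence-transformers package (an assumption for this sketch) and is not LangKit's current API:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def consistency_score(response: str, references: list[str]) -> float:
    # Mean cosine similarity between the response and the reference texts
    # (ground truth, context, or additionally sampled answers).
    response_emb = model.encode(response, convert_to_tensor=True)
    reference_embs = model.encode(references, convert_to_tensor=True)
    return util.cos_sim(response_emb, reference_embs).mean().item()

# A low score suggests the response disagrees with the references (possible hallucination).
score = consistency_score(
    "The Eiffel Tower is in Berlin.",
    ["The Eiffel Tower is located in Paris.", "It stands in Paris, France."],
)
print(f"consistency score: {score:.2f}")

A threshold on this score, as suggested later in the thread, would then decide whether a response gets flagged.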

And #2 also makes a lot of sense for your and others' scenarios.

I created two issues to reflect both topics we are discussing.

We'll plan those changes in future sprints.

jamie256 commented on June 14, 2024

Hi @rajib76! Thanks for opening an issue.

Are you looking for support for an Azure OpenAI-deployed LLM in LangKit's OpenAIDefault, the kind of thing we do in the Choosing an LLM example? Or something else?

I wanted to get better support for Azure-hosted models into LangKit soon. We could probably focus on changes to support the Azure OpenAI models (e.g. gpt-35-turbo, gpt-4) as a first iteration, if that is helpful?

jamie256 commented on June 14, 2024

Ok, working on it. If you want to try an initial dev build:
pip install langkit==0.0.26.dev0

The new class and usage look like this:

from langkit import response_hallucination
from langkit.openai import OpenAIAzure

response_hallucination.init(llm=OpenAIAzure(engine="LangKit-test-01"), num_samples=1)

You also need to set these new env vars:

os.environ["AZURE_OPENAI_ENDPOINT"]  # e.g. "https://<your-resource>.openai.azure.com/"
os.environ["AZURE_OPENAI_KEY"]       # your Azure OpenAI API key

As referenced in this example: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/chatgpt?tabs=python&pivots=programming-language-chat-completions#working-with-the-gpt-35-turbo-and-gpt-4-models
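
Putting those pieces together might look roughly like this. The endpoint, key, and deployment name are placeholders, and reading the metric via langkit.extract is an assumption based on LangKit's general usage pattern rather than something confirmed in this thread:

import os

from langkit import extract, response_hallucination
from langkit.openai import OpenAIAzure

# Placeholders: point these at your own Azure OpenAI resource.
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://<your-resource>.openai.azure.com/"
os.environ["AZURE_OPENAI_KEY"] = "<your-azure-openai-key>"

# "LangKit-test-01" is the Azure deployment (engine) name from the example above.
response_hallucination.init(llm=OpenAIAzure(engine="LangKit-test-01"), num_samples=1)

# Compute metrics for a single prompt/response pair.
result = extract({
    "prompt": "Who wrote Pride and Prejudice?",
    "response": "Pride and Prejudice was written by Jane Austen.",
})
print(result)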

rajib76 commented on June 14, 2024

I was able to run it, but noticed one thing. I tried the example code for the hallucination score, and I did not see a way to add a context and then tell the LLM to grade the response based on the question and context.

I think I got how it is working. It is self-check validation: the prompt is sent to the same model to get an answer again, and then that is checked against the original response. Is it possible to do the following?

  1. Send a ground truth instead of having the LLM create samples.
  2. If I need to create samples with an LLM, can I use a different LLM than the one that actually does the hallucination check? (Or is the hallucination check done by an LLM or by an ML model?)

FelipeAdachi commented on June 14, 2024

Hi, @rajib76

Yes, this is exactly how response_hallucination works. To try to answer your questions:

  1. This module was designed with the zero-resource scenario in mind - without ground truth or context available - so sending a ground truth is not currently possible. This is something we could look into, though. Would it work for your use case if we had a variant where you pass the response and ground truth/context columns, thus removing the need to generate additional samples? (One LLM call would still be required for the consistency check.)

  2. Currently, both processes (creating the samples and the consistency check) use the same LLM, so it is not possible to use one LLM to create the samples and another one to do the hallucination check. Can you explain your scenario a bit more?
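
To illustrate what decoupling those two steps could look like outside of LangKit, here is a rough sketch using the openai Python client directly. The deployment names, prompts, and API version are placeholders, and this is not how response_hallucination is implemented:

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com/",
    api_key="<your-azure-openai-key>",
    api_version="2024-02-01",
)

def generate_samples(prompt: str, n: int = 3) -> list[str]:
    # Step 1: sample additional answers with one (e.g. cheaper) deployment.
    out = client.chat.completions.create(
        model="<sampler-deployment>",
        messages=[{"role": "user", "content": prompt}],
        n=n,
        temperature=1.0,
    )
    return [choice.message.content for choice in out.choices]

def consistency_check(response: str, samples: list[str]) -> str:
    # Step 2: ask a different (e.g. stronger) deployment to judge consistency.
    judge_prompt = (
        "Rate from 0 to 1 how consistent the RESPONSE is with the SAMPLES, "
        "and briefly explain why.\n"
        f"RESPONSE: {response}\nSAMPLES: {samples}"
    )
    out = client.chat.completions.create(
        model="<judge-deployment>",
        messages=[{"role": "user", "content": judge_prompt}],
    )
    return out.choices[0].message.content

samples = generate_samples("Who painted the Mona Lisa?")
print(consistency_check("The Mona Lisa was painted by Leonardo da Vinci.", samples))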

rajib76 commented on June 14, 2024

Thanks Felipe. For #1, it will work if we can just pass the response and the ground truth. But do we need an LLM to do the consistency check? Could we not have an option to do a semantic match with an embedding model and then apply a threshold score?

For #2, I am planning to implement chain-of-verification, as I mentioned in this recording. I wanted to check if this can be done out of the box with LangKit:

https://www.youtube.com/watch?v=qRyCmi0DeU8
