
Comments (7)

typpo commented on May 20, 2024

Thanks for the comment. This is similar in spirit to #57, where the solution is to create additional prompt configs, but that is tedious (and does not lend itself well to reusability).

One option would be to let developers set up multiple test suites in a single file. Another option would be to let you pass multiple test suite configs to the promptfoo eval runner.

from promptfoo.

ForgeJosh commented on May 20, 2024

Thanks for the quick reply. It is pretty similar, but I think one difference is that in the use case I'm considering, each prompt could have very different variables. I actually don't hate multiple prompt configs, since the prompts really are different, but you're right that it's a bit clunky.

I think allowing multiple test suites in a file makes sense to me. Everything would work the same as now, except there could be multiple test blocks, each pointing to a prompt file and a set of tests?


flipace commented on May 20, 2024

We also have a slightly different use case: we have more complicated chains of prompts on the service side. We don't really want to test the individual prompts, but rather the final output of these chains (basically simulate what a user would see at the end, i.e. have promptfoo call our API with certain options).

For this we implemented a custom provider and apply a patch (via `patch-package`) to promptfoo that simply passes the `vars` object down to the providers in addition to the prompt string.

For us, the prompt string given to promptfoo is just `Run #1` etc.; instead of the prompt, we use the vars set on the individual tests and send them to our API. The result string is then returned to promptfoo as usual.

We trigger `promptfoo.evaluate` inside a custom script, and we can specify a suite of individual tests to run (i.e. `yarn test:ai foo` or `yarn test:ai bar` will each pass only their respective test cases to `evaluate`).

The `TestCase` object in our case gets extended with specific additional (typed) keys in the `vars` object, where we can reuse our existing types for this API request (so it's all nicely typed across the board). It would be nice if `TestCase` accepted generics for its `vars` 🤔
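To illustrate the pattern, here is a minimal sketch of such a provider. The class name, the var keys, and the stubbed API call are all illustrative assumptions, not promptfoo APIs; only the `callApi(prompt, context)` shape follows what is described above:

```javascript
// Sketch of a custom provider that ignores the prompt string and builds
// an API request from each test case's vars instead.
class ChainProvider {
  id() {
    return 'our-service-chain'; // label shown in the results table
  }

  async callApi(prompt, context) {
    // `prompt` is just "Run #1" etc.; `context.vars` carries the real
    // (typed) options set on the individual test case.
    const payload = {
      chain: context.vars.chain,
      userInput: context.vars.userInput,
    };
    // Stand-in for the real HTTP call to the service-side prompt chain.
    const output = `response for ${JSON.stringify(payload)}`;
    return { output };
  }
}

// Simulate what the eval runner does for one test case:
const provider = new ChainProvider();
provider
  .callApi('Run #1', { vars: { chain: 'summarize', userInput: 'Hello' } })
  .then((result) => console.log(result.output));
```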

We might create some PRs with suggestions / additions here for promptfoo, but since I'm going to be on vacation the next 2 weeks it could take some time 😅


zhlmmc commented on May 20, 2024

I'm having a similar case. I need to run multiple, totally different prompts against the same set of LLMs, and each prompt has different variables and assertions. The only way I can see now is to write a different config for each prompt, which is OK since I can manually combine the results, but it would be great to see the results in the same report.


typpo commented on May 20, 2024

Hi @zhlmmc,

I think Scenarios solve your problem. It's a way to include multiple sets of vars and tests on the same set of LLMs.
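For concreteness, a scenarios config groups sets of vars with the tests that should run over each set. A minimal sketch (the providers, var names, and assertion values here are illustrative placeholders):

```yaml
prompts: [prompt.txt]
providers: [openai:gpt-3.5-turbo, openai:gpt-4]
scenarios:
  - config:
      # each entry is one set of vars applied to all tests below
      - vars:
          language: French
      - vars:
          language: Spanish
    tests:
      - vars:
          input: Hello, world
        assert:
          - type: llm-rubric
            value: is a greeting written in {{language}}
```

Each scenario runs every test against every `config` entry, so all combinations land in one report.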

@flipace I'm late to your feedback, but d9623cb makes `TestCase` generic and also passes `context: {vars}` as a second argument to `callApi`. It will be in the next release :)


zhlmmc commented on May 20, 2024

> Hi @zhlmmc,
>
> I think Scenarios solve your problem. It's a way to include multiple sets of vars and tests on the same set of LLMs.
>
> @flipace I'm late to your feedback, but d9623cb makes `TestCase` generic and also passes `context: {vars}` as a second argument to `callApi`. It will be in the next release :)

Hi,

I have tried playing with Scenarios, but it doesn't seem to work for my case. What I need is really a "prompt-test" combination, not a "data-test" combination: all of my prompts are fixed prompts that do not take variables at runtime. The ideal output looks like this:

| Prompt | GPT-3.5 | GPT-4 | Llama2 | ChatGLM | PaLM  |
|--------|---------|-------|--------|---------|-------|
| 1      | score   | score | score  | score   | score |
| 2      |         |       |        |         |       |
| 3      |         |       |        |         |       |
| 4      |         |       |        |         |       |
| 5      |         |       |        |         |       |

Not sure if there is a way to achieve this, thanks!


typpo commented on May 20, 2024

To get this sort of layout, I would suggest the following:

`prompt.txt` simply outputs the `prompt` var:

```
{{ prompt }}
```

`promptfooconfig.yaml` sets multiple providers and supplies each unique prompt as a var:

```yaml
prompts: [prompt.txt]
providers: [openai:gpt-3.5-turbo, openai:gpt-4, llama2, chatglm, palm]
tests:
  - vars:
      prompt: <insert prompt 1 here>
  - vars:
      prompt: <insert prompt 2 here>
  - vars:
      prompt: <insert prompt 3 here>
  # ...
```

This will produce a table similar to what you have outlined above.
