
Comments (7)

typpo commented on May 20, 2024

Thanks for the comment. This is similar in spirit to #57, where the solution is to create additional prompt configs, but that is tedious (and does not lend itself well to reusability).

One option would be to let developers set up multiple test suites in a single file. Another option would be to let you pass multiple test suite configs to the promptfoo eval runner.

from promptfoo.

ForgeJosh commented on May 20, 2024

Thanks for the quick reply. It is pretty similar, but I think one difference is that in the use case I'm considering, each prompt could have very different variables. I actually don't hate multiple prompt configs, since the prompts really are different, but you're right that it's a bit clunky.

I think allowing multiple test suites in a file makes sense to me. Everything would work the same as now, except there could be multiple test blocks, each pointing to a prompt file and a set of tests?


flipace commented on May 20, 2024

We also have a slightly different use case: we have more complicated chains of prompts on the service side. We don't really want to test the individual prompts, but rather the final output of these chains (basically simulate what a user would see at the end, i.e. have promptfoo call our API with certain options).

For this we implemented a custom provider and apply a patch (via `patch-package`) to promptfoo that simply passes the `vars` object down to the providers in addition to the prompt string.

For us, the prompt string given to promptfoo is just `Run #1` etc.; instead of the prompt, we use the vars set on the individual tests and send them to our API. The result string is then returned to promptfoo as usual.

We trigger `promptfoo.evaluate` inside a custom script, and we can specify a suite of individual tests to run (i.e. `yarn test:ai foo` or `yarn test:ai bar` will each pass only their respective test cases to `evaluate`).

The `TestCase` object in our case gets extended with specific additional (typed) keys in the `vars` object, where we can reuse our existing types for this API request (so it's all nicely typed across the board). It would be nice if `TestCase` accepted generics for its `vars` 🤔
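To illustrate the pattern, here is a minimal sketch of such a provider. The class name, the var keys, and the stubbed API call are all illustrative assumptions, not promptfoo APIs; only the `callApi(prompt, context)` shape follows what is described above:

```javascript
// Sketch of a custom provider that ignores the prompt string and builds
// an API request from each test case's vars instead.
class ChainProvider {
  id() {
    return 'our-service-chain'; // label shown in the results table
  }

  async callApi(prompt, context) {
    // `prompt` is just "Run #1" etc.; `context.vars` carries the real
    // (typed) options set on the individual test case.
    const payload = {
      chain: context.vars.chain,
      userInput: context.vars.userInput,
    };
    // Stand-in for the real HTTP call to the service-side prompt chain.
    const output = `response for ${JSON.stringify(payload)}`;
    return { output };
  }
}

// Simulate what the eval runner does for one test case:
const provider = new ChainProvider();
provider
  .callApi('Run #1', { vars: { chain: 'summarize', userInput: 'Hello' } })
  .then((result) => console.log(result.output));
```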

We might create some PRs with suggestions / additions here for promptfoo, but since I'm going to be on vacation the next 2 weeks it could take some time 😅


zhlmmc commented on May 20, 2024

I'm having a similar case. I need to run multiple, totally different prompts against the same set of LLMs, and each prompt has different variables and assertions. The only way I can see now is to write a different config for each prompt, which is OK since I can manually combine the results, but it would be great to see the results in the same report.


typpo commented on May 20, 2024

Hi @zhlmmc,

I think Scenarios solve your problem. It's a way to include multiple sets of vars and tests on the same set of LLMs.
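For concreteness, a scenarios config groups sets of vars with the tests that should run over each set. A minimal sketch (the providers, var names, and assertion values here are illustrative placeholders):

```yaml
prompts: [prompt.txt]
providers: [openai:gpt-3.5-turbo, openai:gpt-4]
scenarios:
  - config:
      # each entry is one set of vars applied to all tests below
      - vars:
          language: French
      - vars:
          language: Spanish
    tests:
      - vars:
          input: Hello, world
        assert:
          - type: llm-rubric
            value: is a greeting written in {{language}}
```

Each scenario runs every test against every `config` entry, so all combinations land in one report.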

@flipace I'm late to your feedback, but d9623cb makes `TestCase` generic and also passes `context: {vars}` as a second argument to `callApi`. It will be in the next release :)


zhlmmc commented on May 20, 2024

> Hi @zhlmmc,
>
> I think Scenarios solve your problem. It's a way to include multiple sets of vars and tests on the same set of LLMs.
>
> @flipace I'm late to your feedback, but d9623cb makes `TestCase` generic and also passes `context: {vars}` as a second argument to `callApi`. It will be in the next release :)

Hi,

I have tried playing with Scenarios, but it doesn't seem to work for my case. What I need is really a "prompt-test" combination, not a "data-test" combination: all of my prompts are fixed prompts that do not take variables at runtime. The ideal output looks like this:

| Prompt | GPT-3.5 | GPT-4 | Llama2 | ChatGLM | PaLM  |
|--------|---------|-------|--------|---------|-------|
| 1      | score   | score | score  | score   | score |
| 2      |         |       |        |         |       |
| 3      |         |       |        |         |       |
| 4      |         |       |        |         |       |
| 5      |         |       |        |         |       |

Not sure if there is a way to achieve this, thanks!


typpo commented on May 20, 2024

To get this sort of layout, I would suggest the following:

`prompt.txt` simply outputs the `prompt` var:

```
{{ prompt }}
```

`promptfooconfig.yaml` sets multiple providers and supplies each unique prompt as a var:

```yaml
prompts: [prompt.txt]
providers: [openai:gpt-3.5-turbo, openai:gpt-4, llama2, chatglm, palm]
tests:
  - vars:
      prompt: <insert prompt 1 here>
  - vars:
      prompt: <insert prompt 2 here>
  - vars:
      prompt: <insert prompt 3 here>
  # ...
```

This will produce a table similar to what you have outlined above.
