Comments (7)
Thanks for the comment. This is similar in spirit to #57, where the solution is to create additional prompt configs, but that is tedious (and does not lend itself well to reusability).
One option would be to let developers set up multiple test suites in a single file. Another option would be to let you pass multiple test suite configs to the promptfoo eval runner.
from promptfoo.
Thanks for the quick reply. It is pretty similar, but I think one difference is that in the use case I'm considering, each prompt could have very different variables. I actually don't hate multiple prompt configs, since the prompts really are different, but you're right that it's a bit clunky.
I think allowing multiple test suites in a file makes sense to me. Everything would work the same as now, except there could be multiple test blocks, each pointing to a prompt file and a set of tests?
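To illustrate the idea (this is a hypothetical sketch, not existing promptfoo syntax), a multi-suite file might look something like:

```yaml
# Hypothetical syntax only -- promptfoo does not currently support this.
suites:
  - prompts: [summarize.txt]
    tests:
      - vars:
          article: Some long article text
  - prompts: [translate.txt]
    tests:
      - vars:
          text: Hello
          language: French
```

Each suite would carry its own prompt file and its own variable set, which addresses the "very different variables per prompt" concern.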
We also have a slightly different use case, where we have more complicated chains of prompts on the service side. We don't really want to test the individual prompts, but instead the final output of these chains (basically simulate what a user would see at the end -> have promptfoo call our API with certain options).
For this we implemented a custom provider and apply a patch (via patch-package) to promptfoo which simply passes the `vars` object down to the providers, in addition to the `prompt` string.
For us, the `prompt` string given to promptfoo is just `Run #1` etc. Instead of the prompt, we use the vars which are set on the individual tests and send them to our API. The result string is then returned to promptfoo as usual.
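A minimal sketch of that patched setup: a custom provider that ignores the prompt string (`Run #1`, etc.) and builds the real request from the test's vars. `ChainRequest` and `callMyChainApi` are hypothetical stand-ins for the commenter's service API, not promptfoo APIs.

```typescript
// Hypothetical request shape for the service-side chain of prompts.
interface ChainRequest {
  feature: string;
  options: Record<string, unknown>;
}

// Hypothetical service call; the real one would hit the chain API over HTTP.
async function callMyChainApi(req: ChainRequest): Promise<string> {
  return `result for ${req.feature}`;
}

class ChainProvider {
  id(): string {
    return "my-chain-provider";
  }

  // With the patch described above, the test's vars arrive alongside the
  // prompt string, so the provider can build its request entirely from vars.
  async callApi(
    prompt: string,
    context: { vars: Record<string, unknown> },
  ): Promise<{ output: string }> {
    const request: ChainRequest = {
      feature: String(context.vars.feature ?? ""),
      options: context.vars,
    };
    const output = await callMyChainApi(request);
    return { output };
  }
}
```

The returned `{ output }` string then flows back into promptfoo's normal result handling.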
We trigger `promptfoo.evaluate` inside a custom script, and we can specify a suite of individual tests to run (i.e. `yarn test:ai foo` or `yarn test:ai bar` will each only pass their respective test cases to `evaluate`).
The `TestCase` object in our case gets extended with specific additional (typed) keys in the `vars` object, where we can use our existing types for this API request (so it's all nicely typed across the board) - it would be nice if `TestCase` accepted generics for its `vars` 🤔
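A sketch of what that could look like (hypothetical shapes, not the actual promptfoo types): a type parameter on `TestCase` lets an existing request type flow through to `vars`.

```typescript
// Hypothetical generic TestCase: vars is constrained by a type parameter
// instead of being an untyped record.
interface TestCase<
  Vars extends Record<string, unknown> = Record<string, unknown>,
> {
  description?: string;
  vars: Vars;
}

// Hypothetical existing API request type, reused as the vars shape.
interface MyApiRequest extends Record<string, unknown> {
  userId: string;
  mode: "chat" | "search";
}

// The compiler now checks vars against MyApiRequest.
const typedCase: TestCase<MyApiRequest> = {
  description: "chat flow",
  vars: { userId: "u1", mode: "chat" },
};
```

With this, a typo like `mode: "chta"` becomes a compile-time error instead of a silently wrong API call.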
We might create some PRs with suggestions / additions here for promptfoo, but since I'm going to be on vacation the next 2 weeks it could take some time 😅
I have a similar case. I need to run multiple totally different prompts against the same set of LLMs. Each prompt has different variables and assertions. The only way I can see right now is to write a different config for each prompt, which is OK since I can manually combine the results, but it would be great to be able to see the results in the same report.
Hi @zhlmmc,
I think Scenarios solve your problem. It's a way to include multiple sets of vars and tests on the same set of LLMs.
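Roughly, a scenarios config pairs several sets of vars with one shared list of tests (shape based on the promptfoo docs; treat the exact keys as an approximation):

```yaml
prompts: [prompt.txt]
providers: [openai:gpt-3.5-turbo, openai:gpt-4]
scenarios:
  - config:
      - vars:
          language: French
      - vars:
          language: German
    tests:
      - vars:
          input: Hello world
        assert:
          - type: llm-rubric
            value: is a greeting written in {{ language }}
```

Each entry under `config` is run against every test in `tests`, so one file covers all the var/test combinations.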
@flipace I'm late to your feedback, but d9623cb makes `TestCase` generic and also passes `context: {vars}` as a second argument to `callApi`. It will be in the next release :)
Hi,
I have tried playing with Scenarios, but it doesn't seem to work in my case. What I need is actually a "prompt-test" combination, not a "data-test" combination. In my case, all prompts are fixed prompts that do not need variables at runtime. The ideal output looks like this:
| Prompt | GPT-3.5 | GPT-4 | Llama2 | ChatGLM | PaLM |
|---|---|---|---|---|---|
| 1 | score | score | score | score | score |
| 2 | | | | | |
| 3 | | | | | |
| 4 | | | | | |
| 5 | | | | | |
Not sure if there is a way to achieve this, thanks!
To get this sort of layout, I would suggest the following:

`prompt.txt` simply outputs the `prompt` var:

```
{{ prompt }}
```

`promptfooconfig.yaml` sets multiple providers, with each unique prompt as a var:

```yaml
prompts: [prompt.txt]
providers: [openai:gpt-3.5-turbo, openai:gpt-4, llama2, chatglm, palm]
tests:
  - vars:
      prompt: <insert prompt 1 here>
  - vars:
      prompt: <insert prompt 2 here>
  - vars:
      prompt: <insert prompt 3 here>
  # ...
```
This will produce a table similar to what you have outlined above.