Hey @streichsbaer,
#482 starts the work on this. There are a couple of areas where I'm seeking feedback:
- I've made `chosen` and `rejected` string arrays rather than strings, because some users have multiple columns. Does that make sense to you, or does it make ingesting this data more cumbersome?
- I am assuming that `system` and `question` are test case vars. Does that match your workflow? I'm wondering if there is a better way to do this more generically. I could just add the `vars` map to the object.
This is an interesting idea, and you're right that we essentially have all the information already. What is the simplest implementation that would be useful to you? e.g. an "export" button or command that outputs {prompt, response, preferred} in a machine-readable format?
Glad to hear it!
Yes, an "export" button that produces a file in this format:
```json
[
  {
    "system": "You are an AI assistant...",
    "question": "Generate an approximately...",
    "chosen": "Midsummer House is a moderately...",
    "rejected": " Sure! Here's a sentence that..."
  },
  ...
]
```
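Whatever produces this file, a consumer will likely want to confirm that every record carries the four string fields above before training on it. A minimal Python sketch, assuming the export is a JSON array shaped exactly like the example; `validate_preference_records` is a hypothetical helper, not part of promptfoo:

```python
import json

# Field names taken from the example format requested in this thread.
REQUIRED_FIELDS = ("system", "question", "chosen", "rejected")


def validate_preference_records(text: str) -> list[dict]:
    """Parse a JSON export and check that each record has the four string fields.

    Hypothetical helper for illustration; raises ValueError on the first bad record.
    """
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    for i, record in enumerate(records):
        if not isinstance(record, dict):
            raise ValueError(f"record {i} is not an object")
        for field in REQUIRED_FIELDS:
            if not isinstance(record.get(field), str):
                raise ValueError(f"record {i}: missing or non-string field {field!r}")
    return records


example = '[{"system": "You are an AI assistant...", "question": "Generate...", "chosen": "Midsummer House...", "rejected": "Sure!..."}]'
print(len(validate_preference_records(example)))  # prints 1
```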
User experience considerations
Evals for 2 models
- Allow setting a thumbs up for an entire column (e.g. automatically prefer GPT-4 over GPT-3.5)
- Allow filtering/exporting for only failed test cases, where it would automatically select the passing one as chosen and the failing one as rejected.
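The second suggestion (passing output becomes `chosen`, failing output becomes `rejected`) can be sketched as below. This is an illustration only: `ModelResult` and its `output`/`passed` fields are hypothetical stand-ins for whatever per-cell results the eval produces, and test cases where both or neither model passed are skipped because they carry no preference signal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelResult:
    # Hypothetical per-cell result: the model's output and whether its assertions passed.
    output: str
    passed: bool


def to_preference_record(test_vars: dict, a: ModelResult, b: ModelResult) -> Optional[dict]:
    """Return a {system, question, chosen, rejected} record, or None if there is no preference."""
    if a.passed == b.passed:
        # Both passed or both failed: nothing to prefer.
        return None
    chosen, rejected = (a, b) if a.passed else (b, a)
    return {
        "system": test_vars["system"],
        "question": test_vars["question"],
        "chosen": chosen.output,
        "rejected": rejected.output,
    }


record = to_preference_record(
    {"system": "You are an AI assistant...", "question": "Generate..."},
    ModelResult(" Sure! Here's a sentence that...", passed=False),
    ModelResult("Midsummer House is a moderately...", passed=True),
)
print(record["chosen"])  # prints Midsummer House is a moderately...
```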
Hi @typpo, thanks for that, and sorry for the late response.
- Strings should be fine for `chosen` and `rejected`, but string arrays are OK too; post-processing them is not an issue.
- Correct, `system` and `question` are the test case vars. Adding the `vars` map to the object is a good idea, since it gives more flexibility in the post-processing steps.
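If the export keeps `chosen`/`rejected` as string arrays and carries the full `vars` map, flattening it into the simple four-field format requested earlier takes only a few lines. The input shape below (`vars`, `chosen`, `rejected` keys, with list values for the latter two) is an assumption based on this discussion rather than a documented promptfoo schema, and `flatten_export` is a hypothetical helper:

```python
from itertools import product


def flatten_export(records: list[dict]) -> list[dict]:
    """Expand array-valued chosen/rejected into one flat record per pair.

    Assumed input shape: {"vars": {...}, "chosen": [str, ...], "rejected": [str, ...]}.
    """
    flat = []
    for record in records:
        test_vars = record.get("vars", {})
        # Every chosen output is paired with every rejected output.
        for chosen, rejected in product(record["chosen"], record["rejected"]):
            flat.append({
                "system": test_vars.get("system", ""),
                "question": test_vars.get("question", ""),
                "chosen": chosen,
                "rejected": rejected,
            })
    return flat


# Illustrative input only; the second rejected string is a placeholder.
export = [{
    "vars": {"system": "You are an AI assistant...", "question": "Generate..."},
    "chosen": ["Midsummer House is a moderately..."],
    "rejected": [" Sure! Here's a sentence that...", "A second rejected output..."],
}]
print(len(flatten_export(export)))  # prints 2
```

Pairing every chosen output with every rejected one is just one option; pairing by column position is another, and which fits better depends on how the columns map to models.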
This feature has been released in 0.50.0. Thanks, and let me know if you have any more feedback!
Awesome, thanks @typpo 🙏