Comments (3)
Also, the nice thing about running a server right now is that you can expose APIs, which IMO are a lot easier to work with than text files. For example, I currently have a use case that requires a lot of hacking around the text-file regime. A lot of my LLM usage is for static content generated in sequential fashion: essentially a chain of multiple-choice prompts, where at each stage of the chain the user can select from a set of statically generated LLM outputs. I have generated all permutations of the chain and now want to evaluate random paths through it.
To accomplish this, I'm basically thinking of writing a script to generate n custom Python script providers, where the nth provider reads the nth column of a CSV file that I have (each row containing a random path of length n). Then I'm thinking of using a blank vars.txt file with a single variable to pass the row number to each of the provider files ... lol
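For concreteness, here is a minimal sketch of what one of those generated providers might look like. This assumes promptfoo's custom Python provider contract (a `call_api(prompt, options, context)` function returning a dict with an `"output"` key); the file name `paths.csv`, the `row` variable, and the `COLUMN` constant are my own illustrative choices, not anything from promptfoo itself:

```python
# provider_col2.py -- hypothetical generated provider for step 2 of the chain.
# Instead of calling a live model, it returns the pre-generated output stored
# in column COLUMN of paths.csv, for the row selected via the "row" variable.
import csv

COLUMN = 2  # which step of the chain this provider serves


def call_api(prompt, options, context):
    # The row number is passed in through promptfoo vars.
    row = int(context["vars"]["row"])
    with open("paths.csv", newline="") as f:
        rows = list(csv.reader(f))
    # Serve the statically generated output for this path step.
    return {"output": rows[row][COLUMN]}
```

The generator script would then stamp out one such file per column, varying only `COLUMN`.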
Maybe this is kind of a psycho use case for promptfoo, since it's mostly static content I'm dealing with (btw, if there is a better solution, please suggest one), but I think I'd still prefer to use promptfoo because I can reuse the flexibility of the custom script providers for a more dynamic setup in the future.
Anyways, the point is, I'm having to generate a lot of text files as part of the test script, and I feel like an API solution might be simpler to plug into my existing code. Would love to hear your thoughts on this, and I'd be down to help make this happen in some limited capacity, although I'm not a TypeScript guy.
from promptfoo.
Hi @JohnPeng47,
Thanks for the suggestion. Definitely planning to move in this direction, including a self-hosted server - I also work with a team that would benefit greatly.
Have you tried using `promptfoo share` (docs)? It generates a shareable URL, for example: https://app.promptfoo.dev/eval/f:3756cd5e-9ae9-4e91-9a57-cad229cd646f. It won't solve all the use cases you listed, but it at least makes results easier to share with nontechnical people.
Roughly speaking, what would your ideal API look like?
Yes, the share feature is actually great haha, love the live server integration.
About the API design ...
Just spit-balling, I'm thinking maybe an API design that separates
a) defining the run configurations
b) actually running the test suite
c) some kind of pull/push/webhook-based interface for the custom providers?
Separating a) and b) makes sense to me, especially for a web UI that would, presumably, let you define and persist run configurations and view their results. Maybe also introduce a test-suite abstraction over single tests, so you can run and view tests in batches.
c) seems like it would be the hardest to do? But IMO if it could be implemented, it would be super clutch. Not sure what test execution would look like with a custom webhook inside the user's codebase ... but if you can design this nicely, it would be super sweet, because it would effectively hook right into their CI/CD. I think as LLM sophistication increases, custom providers are a no-brainer (custom models, custom pre-/post-processing), and CI/CD LLM evals would be absolutely critical.
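To make the a/b/c split above concrete, here's a purely speculative sketch of what such a REST surface might look like. None of these endpoints exist in promptfoo; every route name below is my invention:

```python
# Hypothetical REST surface for the a/b/c separation -- speculative only,
# nothing here is part of promptfoo's actual API.
HYPOTHETICAL_API = {
    # (a) define and persist run configurations
    "POST /configs": "create a named eval configuration (prompts, providers, tests)",
    "GET /configs/{id}": "fetch a previously saved configuration",
    # (b) execute a saved configuration
    "POST /configs/{id}/runs": "kick off an eval run, returns a run id",
    "GET /runs/{id}": "poll run status and results",
    # (c) pull/push interface for custom providers
    "GET /runs/{id}/pending": "a custom provider pulls prompts it still owes outputs for",
    "POST /runs/{id}/results": "a custom provider pushes outputs back (webhook-style)",
}
```

The pull variant of c) is probably the easier starting point, since providers behind a firewall can poll outward without the server needing to reach into their network.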
Anyways, my two cents, would love to know what you think.
Related Issues (20)
- FR: Ollama Chat API HOT 2
- Question: Is it possible to take an external LLM response(output) and promptfoo just runs the metrics on the output? HOT 4
- Prompfoo giving false-positives? HOT 3
- Bedrock model identifier for Claude 2.1 is `anthropic.claude-v2:1`; Code splits : so it hits v2 HOT 2
- Adding AZURE_OPENAI_API_KEY in the config.yaml file. Is it possible? HOT 2
- Does the provider support loading from files? HOT 1
- Add provider for Together AI HOT 2
- Python assert do not work HOT 5
- HF Text Generation Inference Response Parsing Issue HOT 2
- Loading multi prompt with config set in command line doesn't work HOT 3
- Using `repeat` with cache does not work as expected HOT 3
- Specifying multiple __expected column would test only one of them HOT 2
- Cannot test complex tools that use $refs and $defs HOT 3
- Stop sequence validation error for `gpt-4-vision-preview` HOT 2
- model-graded-factuality and model-graded-closedqa asserts using AzureOpenAI HOT 2
- Question: What models actually support the perplexity metric? HOT 3
- Is there a way to assign a prompt file to a provider HOT 1
- Prompt and variable files are not always relative to configuration directory HOT 2
- HTML Output rendering does not have all information HOT 1
- Error when calling gemini-pro HOT 2