openshift-psap / llm-load-test Goto Github PK
View Code? Open in Web Editor NEWLicense: MIT License
License: MIT License
It would be useful to optionally test only with long inputs. It would also be helpful to avoid errors due to too-long sequence lengths. We can solve both of these issues by adding two new config.yaml options to filter the dataset based on:
Be able to capture the model's output token length in ghz.
This depends on ghz being able to do it (investigation / upstream PR etc).
Add initial CI so that the linter runs on all PRs
Add a field to the config.yaml file for specifying a string template that is used to define the prompt / system prompt format for the inputs coming from the dataset. This would also require regenerating the dataset to remove the system prompts which are currently hard-coded for llama-type models.
Currently at the end of the test duration, the main process waits for the user processes to finish all active requests. This behavior can produce strange results when load test concurrency goes above the maximum batch size that the runtime can handle for a given model. In cases like these, the server side throughput looks lower because of the time spent finishing up the last few pending requests, not fully utilizing the server side resources.
Some potential solutions:
It will be nice for the runtime performance benchamrking tool to also track some evaluation metrics on the test dataset. This can help shed light on cases like if the improved performance is coming at the cost of degradation in evaluation metrics. Reporting evaluation metrics along with runtime performance metrics (throughput/latency) will provide a more comprehensive picture.
Be able to capture the model's output in ghz.
This depends on ghz being able to do it (investigation / upstream PR etc).
See $title.
Add end to end tests in CI.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.