Code Monkey home page Code Monkey logo

llm-load-test's People

Contributors

ccamacho avatar dagrayvid avatar drewrip avatar fcami avatar npalaska avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

llm-load-test's Issues

Add dataset config to set min input tokens

It would be useful to optionally test only with long inputs. It would also be helpful to avoid errors due to too-long sequence lengths. We can solve both of these issues by adding two new config.yaml options to filter the dataset based on:

  • minimum input token length
  • max sequence length

Make the system prompt and prompt format configurable

Add a field to the config.yaml file for specifying a string template that is used to define the prompt / system prompt format for the inputs coming from the dataset. This would also require regenerating the dataset to remove the system prompts which are currently hard-coded for llama-type models.

Interrupt ongoing requests at end of test

Currently at the end of the test duration, the main process waits for the user processes to finish all active requests. This behavior can produce strange results when load test concurrency goes above the maximum batch size that the runtime can handle for a given model. In cases like these, the server side throughput looks lower because of the time spent finishing up the last few pending requests, not fully utilizing the server side resources.

Some potential solutions:

  • In streaming case, user processes can check if the test is over between each token
  • Main process can communicate expected end time to the user processes, and user processes can add a timeout to the http requests based on the end time of the test.
  • Keep existing test behavior and filter out the results for requests that ended after the test end time in the results processing code.

Add evaluation metrics on the test dataset

It will be nice for the runtime performance benchamrking tool to also track some evaluation metrics on the test dataset. This can help shed light on cases like if the improved performance is coming at the cost of degradation in evaluation metrics. Reporting evaluation metrics along with runtime performance metrics (throughput/latency) will provide a more comprehensive picture.

Capture the model's output

Be able to capture the model's output in ghz.
This depends on ghz being able to do it (investigation / upstream PR etc).

RFE: Warmup

  • add the ability to warm-up Model Mesh so that all pods are able to serve the model equally

Parallel execution

  • launch multiple (ghz, etc) instances in parallel with different parameters (inference query, nb users, rps, etc)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.