Comments (7)

lewtun commented on June 15, 2024

The specific use case is the following:

  • Download a model and dataset from the Hub
  • Optimise the model with optimum
  • Run evaluation of base vs optimised model on dataset (ideally with latencies / throughput reported)

Since we're using the pipeline() function under the hood, I think it would be fine to just support CPU / GPU via the device argument. This would give a baseline for users to start from, and they can always roll their own hardware-specific loop if needed.
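
As a rough sketch of that flow (the checkpoint and dataset names below are only placeholders, and the optimum export step is elided since its API depends on the backend):

from datasets import load_dataset
from transformers import pipeline

# Placeholder checkpoint / dataset -- swap in whatever you're comparing.
dataset = load_dataset("imdb", split="test")
base_pipe = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=0,  # GPU 0; use device=-1 for CPU
)
# The optimised model (e.g. exported with optimum) would be wrapped in a second
# pipeline the same way, so both are evaluated with identical settings.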

For reference, this is currently the function I'm using to compute latencies:

import numpy as np
from time import perf_counter 

def time_pipeline(pipeline, dataset, num_samples=100):
    # Sample a fixed, reproducible subset so runs are comparable
    sample_ds = dataset.shuffle(seed=42).select(range(num_samples))
    latencies = []
    # Timed run
    for sample in sample_ds:
        start_time = perf_counter()
        _ = pipeline(sample["text"])
        latency = perf_counter() - start_time
        latencies.append(latency)
    # Compute run statistics
    time_avg_ms = 1000 * np.mean(latencies)
    time_std_ms = 1000 * np.std(latencies)
    print(f"Average latency (ms) - {time_avg_ms:.2f} +/- {time_std_ms:.2f}")
    return {"time_avg_ms": time_avg_ms, "time_std_ms": time_std_ms}
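
Usage would then be something like this (optimised_pipe being the hypothetical second pipeline built from the optimum-exported model):

# Both calls shuffle with the same seed, so the base and optimised models see the same samples.
base_stats = time_pipeline(base_pipe, dataset)
# optimised_stats = time_pipeline(optimised_pipe, dataset)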

philschmid commented on June 15, 2024

For latency and throughput, we should use a dummy input at different sequence lengths rather than selecting a few samples from the dataset.
Latency and throughput are normally reported for a fixed sequence length, e.g. 128.
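
A minimal sketch of that approach, assuming a transformers-style pipeline (the repeated single word is just a crude way to hit a target token length; an exact construction would go through the tokenizer):

from time import perf_counter

import numpy as np

def time_dummy_input(pipe, seq_len=128, num_runs=100):
    # Build a synthetic input of roughly `seq_len` tokens.
    dummy_text = " ".join(["hello"] * seq_len)
    # Warmup run so model loading / CUDA init doesn't skew the numbers.
    _ = pipe(dummy_text)
    latencies = []
    for _ in range(num_runs):
        start = perf_counter()
        _ = pipe(dummy_text)
        latencies.append(perf_counter() - start)
    return {
        "seq_len": seq_len,
        "time_avg_ms": 1000 * np.mean(latencies),
        "samples_per_second": 1 / np.mean(latencies),
    }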

ola13 commented on June 15, 2024

I think that's a great idea! True - it is backend dependent, but it will be very useful for debugging.

I wonder if it would be useful to optionally output not only the metric values but some sort of an evaluation report - basic setup information along with the runtime metrics, e.g. what device it was evaluated on etc.

I think it would be valuable to have these numbers for the full evaluation, not only for dummy inputs, since it doesn't really cost anything and can provide additional insights (again, mainly in the debugging scenario).
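
As a strawman, such a report could simply be a nested dict returned next to the metric values, e.g. (all field names and values here are illustrative, not an actual API):

report = {
    "metrics": {"accuracy": 0.91},
    "runtime": {"time_avg_ms": 12.3, "time_std_ms": 1.1, "samples_per_second": 81.3},
    "setup": {
        "device": "cuda:0",             # what it was evaluated on
        "model": "my-optimised-model",  # placeholder identifier
        "dataset": "imdb",
        "num_samples": 1000,
    },
}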

lvwerra commented on June 15, 2024

I think we can just add the throughput information to the dict that is returned by the evaluator.

I like the idea of an evaluation report, however, I don't think we can assume to know e.g. the device the pipeline is running on: for now it is a transformers pipeline, but it could be any callable, so we would not know how to get that info. The evaluate.save function lets you store any information and by default also saves some system information. Maybe we could extend this and then let the user add whatever cannot be easily inferred (e.g. the device of the pipeline). What do you think?
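
For reference, a sketch of how that could look with evaluate.save, where the user adds whatever the library cannot infer (the device and model values below are made up):

import evaluate

# Hypothetical evaluator output: metric values plus the proposed throughput info.
results = {"accuracy": 0.91, "time_avg_ms": 12.3, "samples_per_second": 81.3}

# evaluate.save stores the passed data along with some system information;
# fields it can't infer (e.g. the pipeline's device) are added by hand.
evaluate.save("./results/", **results, device="cuda:0", model="my-optimised-model")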

As for dummy inputs: I think this is something we should let the user handle. Maybe we can extend the docs with a dedicated "Evaluator" section and a "How to measure the performance of your pipeline" guide where we show best practices.

lvwerra commented on June 15, 2024

Also @douwekiela proposed this in #23. The difficulty is that these numbers are hardware- and inference-setup-dependent, so the question is what their value would be. Also, how would you calculate them with bootstrapping? Do you have a specific use case?

cc @ola13

douwekiela commented on June 15, 2024

I like that use case! Even if the numbers are not directly comparable across systems, I do think people would appreciate the convenience (e.g. if I want to benchmark two models on the same system).

ola13 commented on June 15, 2024

I'm happy to look into this within the next 2 weeks; if someone feels like taking a stab at it, feel free to reassign :)
