
mlx-benchmark

This repo aims to benchmark Apple's MLX operations and layers, on all Apple Silicon chips, along with some GPUs.

Contribute: If you have a device not yet featured in the benchmark, especially the ones listed below, your PR is welcome to broaden the scope and accuracy of this project.

Current devices: M1, M1 Pro, M2, M2 Pro, M2 Max, M2 Ultra, M3 Pro, M3 Max.

Missing devices: M1 Max, M1 Ultra, M3, M3 Ultra, and other CUDA GPUs.

Benchmarks

Benchmarks are generated by measuring the runtime of every MLX operation, along with its PyTorch equivalent on the mps, cpu, and cuda backends. For each operation, we measure the runtime over multiple experiments, and we propose two benchmarks based on these experiments.
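The measurement loop can be sketched as follows. This is a minimal illustration, not the repo's actual harness; the `time_op` helper and the workload are hypothetical:

```python
import time

def time_op(fn, n_iters=100):
    """Average wall-clock runtime of fn, in milliseconds, over n_iters runs."""
    fn()  # warmup run, excluded from the measurement
    start = time.perf_counter()
    for _ in range(n_iters):
        fn()
    return (time.perf_counter() - start) / n_iters * 1000.0

# A framework op would go here in practice. Note that MLX is lazy, so an MLX
# op must be forced with mx.eval(), and a torch op on CUDA must be followed
# by torch.cuda.synchronize() before the clock is read.
ms = time_op(lambda: sum(x * x for x in range(10_000)))
```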

Installation

Installation on Mac devices

Running the benchmark locally is straightforward. Create a new env with osx-arm64 architecture and install the dependencies.

CONDA_SUBDIR=osx-arm64 conda create -n mlx_benchmark python=3.10 numpy pytorch torchvision scipy requests -c conda-forge

pip install -r requirements.txt

Installation on other devices

Operating systems other than macOS can only run the torch experiments, on CPU or with a CUDA device. Create a new env without the CONDA_SUBDIR=osx-arm64 prefix and install the torch build that matches your CUDA version. Then install everything in requirements.txt except mlx.

Finally, open the config.py file and set:

USE_MLX = False

to avoid importing the mlx package, which cannot be installed on non-Mac devices.
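A minimal sketch of how such a flag can gate the import (hypothetical; the repo's actual config.py may differ):

```python
USE_MLX = False  # set to False in config.py on non-Mac devices

if USE_MLX:
    # mlx only installs on Apple Silicon, so the import must be guarded
    import mlx.core as mx
```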

Run the benchmark

Run on Mac

To run the benchmark on mps, mlx and CPU:

python run_benchmark.py --include_mps=True --include_mlx=True --include_cpu=True

Run on other devices

To run the torch benchmark on CUDA and CPU:

python run_benchmark.py --include_mps=False --include_mlx=False --include_cuda=True --include_cpu=True

Contributing

Everyone can contribute to the benchmark! If you have a missing device or if you want to add a missing layer/operation, please read the contribution guidelines.

mlx-benchmark's People

Contributors

alexziskind1, arnabkumarroy02, dasayan05, ivanfioravanti, menzhse, tristanbilot


mlx-benchmark's Issues

Question: Interpretation of cuda results

Tristan, I'm going to officially do a PR, but wanted to run this preview by you and get your take on how to interpret the results. This is from a machine with Core i9 and RTX4090. Does it look right? What do these results mean (specifically in the cuda column)? Thanks!

Average benchmark:

Operation cpu cuda
Argmax 9.36 0.06
BCE 26.26 25.90
Concat 30.27 0.06
Conv1d 61.74 0.17
Conv2d 26.07 0.09
LeakyReLU 6.11 0.04
Linear 103.52 0.07
MatMul 81.08 0.05
PReLU 4.86 0.05
ReLU 10.29 0.06
SeLU 7.34 0.06
Sigmoid 7.88 0.04
Softmax 21.22 0.04
Softplus 7.25 0.04
Sort 61.68 0.11
Sum 13.22 0.06
SumAll 8.50 0.06
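One common explanation for near-zero CUDA timings (a general PyTorch caveat, not a claim about this repo's harness): torch launches CUDA kernels asynchronously, so a naive timer can stop before the GPU has actually finished the work. A hedged sketch of a synchronized measurement, using a hypothetical `time_cuda_op` helper:

```python
import time
import torch

def time_cuda_op(fn, n_iters=100):
    """Average runtime of fn in ms, waiting for pending CUDA work to finish."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # drain queued kernels before starting the clock
    start = time.perf_counter()
    for _ in range(n_iters):
        fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait until all launched kernels complete
    return (time.perf_counter() - start) / n_iters * 1000.0

# Without the final synchronize, only the (very cheap) kernel launches are
# timed, which can produce sub-millisecond figures like those in the table.
```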

Benchmark logic changed, current one can't give negative results

@TristanBilot in commit ed51428 I see that the logic for mps/mlx changed, i.e.: v[h] = (v["mps"] / v["mlx_gpu"] - 1) + 1
This code can't generate negative values, but in the benchmark you posted for the M1 Pro they are visible.
It seems that the last + 1 was not taken into account.

I will soon post a PR with benchmark results for the M3 Max using the current code, but I think you need to run the M1 Pro benchmark again.
No?
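For reference, the expression above simplifies algebraically: (mps/mlx_gpu - 1) + 1 is just mps/mlx_gpu, a ratio of positive runtimes, so it can never be negative; the version without the trailing + 1 can. A worked example with hypothetical runtimes:

```python
mps, mlx_gpu = 2.0, 4.0                  # hypothetical runtimes in ms (mps faster here)

without_plus_one = mps / mlx_gpu - 1     # -0.5: negative whenever mps is faster
with_plus_one = (mps / mlx_gpu - 1) + 1  #  0.5: reduces to the plain ratio

assert with_plus_one == mps / mlx_gpu    # always positive for positive runtimes
```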

Minimum machine requirements for the benchmark

Hi Tristan. I'm running this on a few machines, and the 8GB machines can't deal with it. Not that anyone will really try to use an 8GB machine for MLX, but wondering if you know of the minimum RAM requirements. My other machines are handling it just fine.
