
tt-metal's Introduction

TT-NN is a Python and C++ neural network OP library.


Grayskull (GS) Models

| Model | Batch | End-to-end throughput [1] | Device throughput [2] | Target |
|---|---|---|---|---|
| ResNet-50 (fps) | 20 | 5,100 | 7,700 | 10,000 |
| BERT-Large (sen/s) | 12 | 370 | 406 | 410 |
| Falcon7B-decode (t/s) | 32 | 135 | 135 | 140 |
| ViT (fps) | 8 | 860 | 1,570 | 2,000 |
| T5 small (sen/s) | | 140 | | |
| Bloom (sen/s) | | 70 | | |
| U-Net | coming soon | | | |

[1] - Observed from the host. Includes dispatch overhead and kernel execution time.

[2] - Ignoring host overhead. Kernel execution time only.
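
Footnotes [1] and [2] suggest that the gap between the two throughput columns is host-side cost. As a rough sketch, assuming that time per item is the reciprocal of throughput and that host overhead adds linearly on top of kernel time (a simplification, since any overlap between host and device work is not described here), the implied overhead per item can be estimated. The helper name `host_overhead_us` is hypothetical.

```python
def host_overhead_us(end_to_end: float, device: float) -> float:
    """Estimate per-item host overhead in microseconds.

    Assumes time per item = 1 / throughput and that host overhead
    adds linearly on top of kernel execution time (a simplification).
    """
    if end_to_end <= 0 or device <= 0:
        raise ValueError("throughputs must be positive")
    return (1.0 / end_to_end - 1.0 / device) * 1e6


# ResNet-50 on Grayskull, values from the table above (fps).
resnet_overhead = host_overhead_us(end_to_end=5100, device=7700)
print(f"ResNet-50: ~{resnet_overhead:.0f} us of host overhead per frame")

# Falcon7B-decode reports 135 t/s in both columns, so the estimate is zero.
falcon_overhead = host_overhead_us(end_to_end=135, device=135)
print(f"Falcon7B-decode: ~{falcon_overhead:.0f} us per token")
```

Under these assumptions, a model whose two columns match is limited by kernel execution, while a large gap points at dispatch cost on the host.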

Wormhole (WH) Models

Note

All model demos in this table function on both N150 and N300 Wormhole cards, unless otherwise stated.

| Model | Gen. Token [3] | Batch | End-to-end throughput [1] | Device throughput [2] | Target |
|---|---|---|---|---|---|
| Falcon7B-decode | 129th | 32 | 11.6 t/s/u - 371 t/s | 15.4 t/s/u - 493 t/s | 21 |
| Mistral-7B-decode | 33rd | 32 | 11.7 t/s/u - 374 t/s | 16.7 t/s/u - 538 t/s | 21 |
| Mamba-2.8B-decode | any | 32 | 9.6 t/s/u - 307 t/s | 15.8 t/s/u - 506 t/s | 22 |
| BERT-Large (sen/s) [4] | | 8 | 270 | 340 | 400 |
| Stable Diffusion 1.4 512x512 (sec/img) | | 1 | 6 | | 5 |

[1] - Observed from the host. Includes dispatch overhead and kernel execution time.

[2] - Ignoring host overhead. Kernel execution time only.

[3] - Generating the i'th token in a sequence while the kv_cache is filled with i-1 rows.

[4] - This model demo does not work on N150. It does work on N300.
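
Footnote [3] can be read as follows: a "129th" entry measures the decode step that produces token 129 while the KV cache already holds 128 rows. The toy loop below sketches only that bookkeeping. `decode_step` and the list-based cache are hypothetical stand-ins for illustration, not TT-NN APIs.

```python
def decode_step(kv_cache: list, token_index: int) -> int:
    """Toy decode step: check the cache size, then append one row.

    Returns the number of cache rows that were visible while generating
    the token at `token_index` (1-based).
    """
    rows_before = len(kv_cache)
    # When generating the i-th token, the cache holds i-1 rows.
    assert rows_before == token_index - 1
    kv_cache.append(("k", "v", token_index))  # placeholder for real K/V rows
    return rows_before


kv_cache = []
rows_seen = [decode_step(kv_cache, i) for i in range(1, 130)]

# Number of rows cached during the step that generated token 129.
print(rows_seen[-1])
```

Because each decode step attends over the whole cache, the position at which throughput is measured matters, which is presumably why the tables report it.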

T3000 (2x4 mesh of WHs) Models

| Model | Technique | Gen. Token [3] | Batch | End-to-end throughput [1] | Device throughput [2] | Target |
|---|---|---|---|---|---|---|
| Falcon7B-decode | Data Parallel | 129th | 256 | 4.4 t/s/u - 1114 t/s | coming soon | 21 t/s/u |
| LLaMA-2-70B-decode | Tensor Parallel | 129th | 32 | 8.5 t/s/u - 272 t/s | 13.9 t/s/u - 445 t/s | 20 t/s/u |
| LLaMA-3-70B-decode | Tensor Parallel | 129th | 32 | 8.1 t/s/u - 257 t/s | 13.9 t/s/u - 445 t/s | 20 t/s/u |
| Falcon40B-decode | Tensor Parallel | 129th | 32 | 1.5 t/s/u - 48 t/s | 14.0 t/s/u - 448 t/s | 30 t/s/u |
| Mixtral7Bx8-decode | Tensor Parallel | 129th | 32 | 7.0 t/s/u - 225 t/s | 27.0 t/s/u - 864 t/s | 28 t/s/u |
| ResNet50 | Data Parallel | coming soon | | | | |
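
The decode rows report throughput as a pair, for example "8.5 t/s/u - 272 t/s". Reading "t/s/u" as tokens per second per user and treating each batch entry as one user (both are assumptions, since the abbreviations are not expanded here), the aggregate figure is approximately the per-user figure multiplied by the batch size. A minimal check against a few reported rows, with the hypothetical helper `aggregate_tps`:

```python
def aggregate_tps(tokens_per_sec_per_user: float, batch: int) -> float:
    """Aggregate tokens/second, assuming one user per batch entry."""
    return tokens_per_sec_per_user * batch


# (model, t/s/u, batch, reported t/s) taken from the end-to-end column.
rows = [
    ("LLaMA-2-70B-decode", 8.5, 32, 272),
    ("Falcon40B-decode", 1.5, 32, 48),
    ("Mixtral7Bx8-decode", 7.0, 32, 225),
]

for model, per_user, batch, reported in rows:
    estimate = aggregate_tps(per_user, batch)
    print(f"{model}: {estimate:.0f} t/s estimated, {reported} t/s reported")
```

Small mismatches, such as 224 against the reported 225, are consistent with the per-user figure having been rounded to one decimal place.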

Using TT-NN ops and tensors

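import ttnn
import torch

# Open device 0; the context manager closes it on exit.
with ttnn.manage_device(device_id=0) as device:
    # Create host-side PyTorch tensors.
    a = torch.ones((5, 7))
    b = torch.ones((1, 7))

    # Move both tensors to the device as bfloat16 in tile layout.
    a = ttnn.from_torch(a, device=device, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT)
    b = ttnn.from_torch(b, device=device, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT)

    # Add on the device, then convert the result back to a PyTorch tensor.
    output = a + b
    output = ttnn.to_torch(output)

print(output)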
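
In the example above, `a` has shape (5, 7) and `b` has shape (1, 7), so `a + b` relies on broadcasting `b` across the rows of `a`. Assuming TT-NN follows the usual PyTorch/NumPy broadcasting rule here (the shapes in the example suggest it, but it is not stated), the output shape can be worked out in plain Python without a device. `broadcast_shape` is a hypothetical helper written for this sketch.

```python
def broadcast_shape(shape_a: tuple, shape_b: tuple) -> tuple:
    """Return the NumPy/PyTorch-style broadcast shape of two shapes."""
    result = []
    # Walk dimensions from the right, padding the shorter shape with 1s.
    for i in range(1, max(len(shape_a), len(shape_b)) + 1):
        dim_a = shape_a[-i] if i <= len(shape_a) else 1
        dim_b = shape_b[-i] if i <= len(shape_b) else 1
        if dim_a != dim_b and 1 not in (dim_a, dim_b):
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
        # A size-1 dimension stretches to match the other operand.
        result.append(dim_b if dim_a == 1 else dim_a)
    return tuple(reversed(result))


print(broadcast_shape((5, 7), (1, 7)))
```

Under that assumption, the printed tensor in the TT-NN example should have five rows and seven columns.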

TT-Metalium is our low-level programming model, enabling kernel development for Tenstorrent hardware.

Getting started

Get started with simple kernels.

tt-metal's People

Contributors

tt-rkim, tt-aho, tt-billteng, arakhmati, pgkeller, abhullar-tt, tt-brianliu, drjessop, nemanjagrujic, mywoodstock, tt-nshanker, umadevimcw, tt-dma, yugaott, tarafdartt, dongjin-na, muthutt, mo-tenstorrent, banekg, aliutt, ashayestehmanesh, kkwong10, davorchap, npetrovic-tenstorrent, mikevin920, caixunshiren, hschoi4448, farbabi, vtangtt, acejkov