  • 👋 Hi, I’m @shaojiewang
  • 👀 I’m interested in AI computing on CPU/GPU/DSP
  • 🌱 I’m currently learning GPU computing and AI compilers
  • 💞️ I’m looking to collaborate on GPU computing
  • 📫 How to reach me tel/wechat:18317533864

Shaojie WANG's Projects

aitemplate

AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

collect_perf_data

Collect performance data for CK/MISA/MIOpen to quickly create presentation sheets.

composable_kernel

Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators

cpu_gemm_opt

How to design a CPU GEMM on x86 with 256-bit AVX that can beat OpenBLAS.

cutlass

CUDA Templates for Linear Algebra Subroutines

fbgemm

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

gemm-vega64

Implement an assembly GEMM on Vega 64 for 4096x4096 FP32 matrices.

gpubenchmark

A performance benchmark for GPGPU or GPU-based AI chips.

incubator-tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators

kaggle_wsj

This is where I keep the code I write for some Kaggle problems. (kaggle == 卡狗, a phonetic rendering of "Kaggle" in Chinese.)

llama

Inference code for LLaMA models

llm-awq

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

llvm-project

This is the canonical git mirror of the LLVM subversion repository. The repository does not accept github pull requests at this moment. Please submit your patches at http://reviews.llvm.org.

lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

megatron-lm

Ongoing research training transformer models at scale

onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

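paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice. Core framework of 飞桨 (PaddlePaddle): high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning.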

pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

rccl

ROCm Communication Collectives Library (RCCL)
