Shaojie WANG's Projects
Learning the AITemplate codebase
AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
A list of awesome compiler projects and papers for tensor computation and deep learning.
Buy things from the Taobao website
Collect performance data for CK/MISA/MIOpen to quickly create presentation sheets.
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
How to design a CPU GEMM on x86 with AVX2 (256-bit) that can beat OpenBLAS.
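As a rough illustration of the idea behind that project (not its actual kernel), a competitive CPU GEMM starts from cache blocking with the reduction loop hoisted for operand reuse; the block size `BLK` and function name below are hypothetical, and a real AVX2 implementation would add register tiling and `_mm256_fmadd_ps` intrinsics on top of this loop structure:

```c
#include <stddef.h>
#include <string.h>

#define BLK 64  /* hypothetical cache-block size; tuned per CPU in practice */

/* Cache-blocked GEMM sketch: C = A * B, row-major n x n matrices.
 * The i/k/j blocking keeps tiles of A and B hot in cache, and hoisting
 * A[i][k] out of the innermost loop exposes it for register reuse. */
static void gemm_blocked(const float *A, const float *B, float *C, size_t n) {
    memset(C, 0, n * n * sizeof(float));
    for (size_t i0 = 0; i0 < n; i0 += BLK)
        for (size_t k0 = 0; k0 < n; k0 += BLK)
            for (size_t j0 = 0; j0 < n; j0 += BLK)
                for (size_t i = i0; i < i0 + BLK && i < n; ++i)
                    for (size_t k = k0; k < k0 + BLK && k < n; ++k) {
                        float a = A[i * n + k];  /* reused across the j loop */
                        for (size_t j = j0; j < j0 + BLK && j < n; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

The innermost `j` loop is contiguous over both `B` and `C`, which is what a compiler (or hand-written intrinsics) can vectorize with 256-bit FMAs.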
CUDA Templates for Linear Algebra Subroutines
Transformer related optimization, including BERT, GPT
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
Implement an assembly GEMM on Vega64 for 4096x4096 FP32 matrices
A helper to check a GPU kernel's shared memory usage
GPU coding practice
A performance benchmark for GPGPUs and GPU-based AI chips.
14 basic topics for Vega64 performance optimization
Open deep learning compiler stack for CPUs, GPUs and specialized accelerators
This is where I keep code for some Kaggle problems. (卡狗 is my nickname for Kaggle.)
Code implementations for the book Statistical Learning Methods (《统计学习方法》)
Inference code for LLaMA models
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
This is the canonical git mirror of the LLVM subversion repository. The repository does not accept github pull requests at this moment. Please submit your patches at http://reviews.llvm.org.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Ongoing research training transformer models at scale
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of PaddlePaddle 『飞桨』: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)
Tensors and Dynamic neural networks in Python with strong GPU acceleration
ROCm Communication Collectives Library (RCCL)