
The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory, etc.) and ready to deploy on Qualcomm® devices.

Home Page: https://aihub.qualcomm.com

License: BSD 3-Clause "New" or "Revised" License

Python 97.84% Shell 0.39% Java 1.77%
deeplearning demos inference inference-api inference-engine machine-learning machinelearning onnx pytorch qnn

ai-hub-models's Introduction

Qualcomm® AI Hub Models


The Qualcomm® AI Hub Models are a collection of state-of-the-art machine learning models optimized for performance (latency, memory, etc.) and ready to deploy on Qualcomm® devices.

  • Explore models optimized for on-device deployment of vision, speech, text, and generative AI.
  • View open-source recipes to quantize, optimize, and deploy these models on-device.
  • Browse through performance metrics captured for these models on several devices.
  • Access the models through Hugging Face.
  • Sign up to run these models on hosted Qualcomm® devices.

Supported runtimes

Supported operating systems:

  • Android 11+

Supported compute units

Supported precision

  • Floating point: FP16
  • Integer: INT8 (8-bit weight and activation on select models), INT4 (4-bit weight, 16-bit activation on select models)

Supported chipsets

Select supported devices

  • Samsung Galaxy S21 Series, Galaxy S22 Series, Galaxy S23 Series, Galaxy S24 Series
  • Xiaomi 12, 13
  • Google Pixel 3, 4, 5

and many more.

Installation

We currently support Python >= 3.8 and <= 3.10. We recommend using a Python virtual environment (miniconda or virtualenv).

You can set up a virtualenv using:

python -m venv qai_hub_models_env && source qai_hub_models_env/bin/activate
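
If you prefer miniconda (mentioned above), an equivalent environment can be created with, for example:

conda create -n qai_hub_models_env python=3.10 && conda activate qai_hub_models_env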

Once the environment is set up, you can install the base package using:

pip install qai_hub_models

Some models (e.g. YOLOv7) require additional dependencies. You can install those dependencies automatically using:

pip install "qai_hub_models[yolov7]"

Getting Started

Each model comes with the following set of CLI demos:

  • Locally runnable PyTorch based CLI demo to validate the model off device.
  • On-device CLI demo that produces a model ready for on-device deployment and runs the model on a hosted Qualcomm® device (needs sign up).

All the models produced by these demos are freely available on Hugging Face or through our website. See the individual model readme files (e.g. YOLOv7) for more details.

Local CLI Demo with PyTorch

All models contain CLI demos that run the model in PyTorch locally with sample input. Demos are optimized for code clarity rather than latency, and run exclusively in PyTorch. Optimal model latency can be achieved with model export via Qualcomm® AI Hub.

python -m qai_hub_models.models.yolov7.demo

For additional details on how to use the demo CLI, use the --help option:

python -m qai_hub_models.models.yolov7.demo --help

See the model directory below to explore all other models.


Note that most ML use cases require some pre- and post-processing that are not part of the model itself. A Python reference implementation of this is provided for each model in app.py. Apps load and pre-process model input, run model inference, and post-process model output before returning it to you.

Here is an example of how the PyTorch CLI works for YOLOv7:

from PIL import Image
from qai_hub_models.models.yolov7 import Model as YOLOv7Model
from qai_hub_models.models.yolov7 import App as YOLOv7App
from qai_hub_models.utils.asset_loaders import load_image
from qai_hub_models.models.yolov7.demo import IMAGE_ADDRESS

# Load pre-trained model
torch_model = YOLOv7Model.from_pretrained()

# Load a simple PyTorch based application
app = YOLOv7App(torch_model)
image = load_image(IMAGE_ADDRESS, "yolov7")

# Perform prediction on a sample image
pred_image = app.predict(image)[0]
Image.fromarray(pred_image).show()

CLI demo to run on hosted Qualcomm® devices

Some models contain CLI demos that run the model on a hosted Qualcomm® device using Qualcomm® AI Hub.

To run the model on a hosted device, sign up for access to Qualcomm® AI Hub. Sign in to Qualcomm® AI Hub with your Qualcomm® ID. Once signed in, navigate to Account -> Settings -> API Token.

With this API token, you can configure your client to run models on the cloud hosted devices.

qai-hub configure --api_token API_TOKEN

Navigate to docs for more information.
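
To quickly verify that the client is configured, you can list the cloud hosted devices available to your account. This is a minimal sketch using the qai_hub client; the call fails if the API token is missing or invalid:

import qai_hub as hub

# Print the names of the cloud hosted devices you can submit jobs to.
for device in hub.get_devices():
    print(device.name)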

The on-device CLI demo performs the following:

  • Exports the model for on-device execution.
  • Profiles the model on-device on a cloud hosted Qualcomm® device.
  • Runs the model on-device on a cloud hosted Qualcomm® device and compares accuracy between a local CPU based PyTorch run and the on-device run.
  • Downloads models (and other required assets) that can be deployed on-device in an Android application.

python -m qai_hub_models.models.yolov7.export

Many models may have initialization parameters that allow loading custom weights and checkpoints. See --help for more details:

python -m qai_hub_models.models.yolov7.export --help
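
For example, the target device can be selected directly on the command line. The device name below is illustrative; the --device flag is the same one used in the issues later on this page:

python -m qai_hub_models.models.yolov7.export --device "Samsung Galaxy S23"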

How does this export script work?

As described above, the export script compiles, optimizes, and runs the model on a cloud hosted Qualcomm® device. The demo uses Qualcomm® AI Hub's Python APIs.

Qualcomm® AI Hub explained

Here is a simplified example of code that can be used to run the entire model on a cloud hosted device:

from typing import Tuple
import torch
import qai_hub as hub
from qai_hub_models.models.yolov7 import Model as YOLOv7Model

# Load YOLOv7 in PyTorch
torch_model = YOLOv7Model.from_pretrained()
torch_model.eval()

# Trace the PyTorch model using one data point from the provided sample inputs.
example_input = [torch.tensor(data[0]) for name, data in torch_model.sample_inputs().items()]
pt_model = torch.jit.trace(torch_model, example_input)

# Select a device
device = hub.Device("Samsung Galaxy S23")

# Compile model for a specific device
compile_job = hub.submit_compile_job(
    model=pt_model,
    device=device,
    input_specs=torch_model.get_input_spec(),
)

# Get target model to run on a cloud hosted device
target_model = compile_job.get_target_model()

# Profile the previously compiled model on a cloud hosted device
profile_job = hub.submit_profile_job(
    model=target_model,
    device=device,
)

# Perform on-device inference on a cloud hosted device
input_data = torch_model.sample_inputs()
inference_job = hub.submit_inference_job(
    model=target_model,
    device=device,
    inputs=input_data,
)

# Returns the output as dict{name: numpy}
on_device_output = inference_job.download_output_data()
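
If you also want the compiled asset on disk (for example, to bundle it into an Android application), the compile job can write it out directly. A minimal sketch, with an illustrative output directory:

# Save the compiled model locally for on-device deployment.
compile_job.download_target_model("build/")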

Working with source code

You can clone the repository using:

git clone https://github.com/quic/ai-hub-models.git
cd ai-hub-models
pip install -e .

To install the additional dependencies needed to prepare a specific model, use:

cd ai-hub-models
pip install -e ".[yolov7]"

All models have accuracy and end-to-end tests when applicable. These tests are designed to be run locally and verify that the PyTorch code produces correct results. To run the tests for a model:

python -m pytest --pyargs qai_hub_models.models.yolov7.test

For any issues, please contact us at [email protected].


LICENSE

Qualcomm® AI Hub Models is licensed under BSD-3. See the LICENSE file.


Model Directory

Computer Vision

Model | README | Torch App | Device Export | CLI Demo
Image Classification
ConvNext-Tiny qai_hub_models.models.convnext_tiny ✔️ ✔️ ✔️
DenseNet-121 qai_hub_models.models.densenet121 ✔️ ✔️ ✔️
EfficientNet-B0 qai_hub_models.models.efficientnet_b0 ✔️ ✔️ ✔️
GoogLeNet qai_hub_models.models.googlenet ✔️ ✔️ ✔️
GoogLeNetQuantized qai_hub_models.models.googlenet_quantized ✔️ ✔️ ✔️
Inception-v3 qai_hub_models.models.inception_v3 ✔️ ✔️ ✔️
Inception-v3-Quantized qai_hub_models.models.inception_v3_quantized ✔️ ✔️ ✔️
MNASNet05 qai_hub_models.models.mnasnet05 ✔️ ✔️ ✔️
MobileNet-v2 qai_hub_models.models.mobilenet_v2 ✔️ ✔️ ✔️
MobileNet-v2-Quantized qai_hub_models.models.mobilenet_v2_quantized ✔️ ✔️ ✔️
MobileNet-v3-Large qai_hub_models.models.mobilenet_v3_large ✔️ ✔️ ✔️
MobileNet-v3-Large-Quantized qai_hub_models.models.mobilenet_v3_large_quantized ✔️ ✔️ ✔️
MobileNet-v3-Small qai_hub_models.models.mobilenet_v3_small ✔️ ✔️ ✔️
RegNet qai_hub_models.models.regnet ✔️ ✔️ ✔️
ResNeXt101 qai_hub_models.models.resnext101 ✔️ ✔️ ✔️
ResNeXt101Quantized qai_hub_models.models.resnext101_quantized ✔️ ✔️ ✔️
ResNeXt50 qai_hub_models.models.resnext50 ✔️ ✔️ ✔️
ResNeXt50Quantized qai_hub_models.models.resnext50_quantized ✔️ ✔️ ✔️
ResNet101 qai_hub_models.models.resnet101 ✔️ ✔️ ✔️
ResNet101Quantized qai_hub_models.models.resnet101_quantized ✔️ ✔️ ✔️
ResNet18 qai_hub_models.models.resnet18 ✔️ ✔️ ✔️
ResNet18Quantized qai_hub_models.models.resnet18_quantized ✔️ ✔️ ✔️
ResNet50 qai_hub_models.models.resnet50 ✔️ ✔️ ✔️
Shufflenet-v2 qai_hub_models.models.shufflenet_v2 ✔️ ✔️ ✔️
Shufflenet-v2Quantized qai_hub_models.models.shufflenet_v2_quantized ✔️ ✔️ ✔️
SqueezeNet-1_1 qai_hub_models.models.squeezenet1_1 ✔️ ✔️ ✔️
SqueezeNet-1_1Quantized qai_hub_models.models.squeezenet1_1_quantized ✔️ ✔️ ✔️
Swin-Base qai_hub_models.models.swin_base ✔️ ✔️ ✔️
Swin-Small qai_hub_models.models.swin_small ✔️ ✔️ ✔️
Swin-Tiny qai_hub_models.models.swin_tiny ✔️ ✔️ ✔️
VIT qai_hub_models.models.vit ✔️ ✔️ ✔️
WideResNet50 qai_hub_models.models.wideresnet50 ✔️ ✔️ ✔️
WideResNet50-Quantized qai_hub_models.models.wideresnet50_quantized ✔️ ✔️ ✔️
Image Editing
AOT-GAN qai_hub_models.models.aotgan ✔️ ✔️ ✔️
LaMa-Dilated qai_hub_models.models.lama_dilated ✔️ ✔️ ✔️
Image Generation
StyleGAN2 qai_hub_models.models.stylegan2 ✔️ ✔️ ✔️
Super Resolution
ESRGAN qai_hub_models.models.esrgan ✔️ ✔️ ✔️
QuickSRNetLarge qai_hub_models.models.quicksrnetlarge ✔️ ✔️ ✔️
QuickSRNetLarge-Quantized qai_hub_models.models.quicksrnetlarge_quantized ✔️ ✔️ ✔️
QuickSRNetMedium qai_hub_models.models.quicksrnetmedium ✔️ ✔️ ✔️
QuickSRNetMedium-Quantized qai_hub_models.models.quicksrnetmedium_quantized ✔️ ✔️ ✔️
QuickSRNetSmall qai_hub_models.models.quicksrnetsmall ✔️ ✔️ ✔️
QuickSRNetSmall-Quantized qai_hub_models.models.quicksrnetsmall_quantized ✔️ ✔️ ✔️
Real-ESRGAN-General-x4v3 qai_hub_models.models.real_esrgan_general_x4v3 ✔️ ✔️ ✔️
Real-ESRGAN-x4plus qai_hub_models.models.real_esrgan_x4plus ✔️ ✔️ ✔️
SESR-M5 qai_hub_models.models.sesr_m5 ✔️ ✔️ ✔️
SESR-M5-Quantized qai_hub_models.models.sesr_m5_quantized ✔️ ✔️ ✔️
XLSR qai_hub_models.models.xlsr ✔️ ✔️ ✔️
XLSR-Quantized qai_hub_models.models.xlsr_quantized ✔️ ✔️ ✔️
Semantic Segmentation
DDRNet23-Slim qai_hub_models.models.ddrnet23_slim ✔️ ✔️ ✔️
DeepLabV3-Plus-MobileNet qai_hub_models.models.deeplabv3_plus_mobilenet ✔️ ✔️ ✔️
DeepLabV3-Plus-MobileNet-Quantized qai_hub_models.models.deeplabv3_plus_mobilenet_quantized ✔️ ✔️ ✔️
DeepLabV3-ResNet50 qai_hub_models.models.deeplabv3_resnet50 ✔️ ✔️ ✔️
FCN_ResNet50 qai_hub_models.models.fcn_resnet50 ✔️ ✔️ ✔️
FFNet-122NS-LowRes qai_hub_models.models.ffnet_122ns_lowres ✔️ ✔️ ✔️
FFNet-40S qai_hub_models.models.ffnet_40s ✔️ ✔️ ✔️
FFNet-40S-Quantized qai_hub_models.models.ffnet_40s_quantized ✔️ ✔️ ✔️
FFNet-54S qai_hub_models.models.ffnet_54s ✔️ ✔️ ✔️
FFNet-54S-Quantized qai_hub_models.models.ffnet_54s_quantized ✔️ ✔️ ✔️
FFNet-78S qai_hub_models.models.ffnet_78s ✔️ ✔️ ✔️
FFNet-78S-LowRes qai_hub_models.models.ffnet_78s_lowres ✔️ ✔️ ✔️
FFNet-78S-Quantized qai_hub_models.models.ffnet_78s_quantized ✔️ ✔️ ✔️
FastSam-S qai_hub_models.models.fastsam_s ✔️ ✔️ ✔️
FastSam-X qai_hub_models.models.fastsam_x ✔️ ✔️ ✔️
MediaPipe-Selfie-Segmentation qai_hub_models.models.mediapipe_selfie ✔️ ✔️ ✔️
SINet qai_hub_models.models.sinet ✔️ ✔️ ✔️
Segment-Anything-Model qai_hub_models.models.sam ✔️ ✔️ ✔️
Unet-Segmentation qai_hub_models.models.unet_segmentation ✔️ ✔️ ✔️
YOLOv8-Segmentation qai_hub_models.models.yolov8_seg ✔️ ✔️ ✔️
Object Detection
DETR-ResNet101 qai_hub_models.models.detr_resnet101 ✔️ ✔️ ✔️
DETR-ResNet101-DC5 qai_hub_models.models.detr_resnet101_dc5 ✔️ ✔️ ✔️
DETR-ResNet50 qai_hub_models.models.detr_resnet50 ✔️ ✔️ ✔️
DETR-ResNet50-DC5 qai_hub_models.models.detr_resnet50_dc5 ✔️ ✔️ ✔️
MediaPipe-Face-Detection qai_hub_models.models.mediapipe_face ✔️ ✔️ ✔️
MediaPipe-Hand-Detection qai_hub_models.models.mediapipe_hand ✔️ ✔️ ✔️
YOLOv8-Detection qai_hub_models.models.yolov8_det ✔️ ✔️ ✔️
YOLOv8-Detection-Quantized qai_hub_models.models.yolov8_det_quantized ✔️ ✔️ ✔️
Yolo-v6 qai_hub_models.models.yolov6 ✔️ ✔️ ✔️
Yolo-v7 qai_hub_models.models.yolov7 ✔️ ✔️ ✔️
Yolo-v7-Quantized qai_hub_models.models.yolov7_quantized ✔️ ✔️ ✔️
Pose Estimation
HRNetPose qai_hub_models.models.hrnet_pose ✔️ ✔️ ✔️
LiteHRNet qai_hub_models.models.litehrnet ✔️ ✔️ ✔️
MediaPipe-Pose-Estimation qai_hub_models.models.mediapipe_pose ✔️ ✔️ ✔️
OpenPose qai_hub_models.models.openpose ✔️ ✔️ ✔️

Audio

Model | README | Torch App | Device Export | CLI Demo
Speech Recognition
HuggingFace-WavLM-Base-Plus qai_hub_models.models.huggingface_wavlm_base_plus ✔️ ✔️ ✔️
Whisper-Base-En qai_hub_models.models.whisper_base_en ✔️ ✔️ ✔️
Whisper-Small-En qai_hub_models.models.whisper_small_en ✔️ ✔️ ✔️
Whisper-Tiny-En qai_hub_models.models.whisper_tiny_en ✔️ ✔️ ✔️
Audio Enhancement
Facebook-Denoiser qai_hub_models.models.facebook_denoiser ✔️ ✔️ ✔️

Multimodal

Model | README | Torch App | Device Export | CLI Demo
TrOCR qai_hub_models.models.trocr ✔️ ✔️ ✔️
OpenAI-Clip qai_hub_models.models.openai_clip ✔️ ✔️ ✔️

Generative AI

Model | README | Torch App | Device Export | CLI Demo
Image Generation
ControlNet qai_hub_models.models.controlnet_quantized ✔️ ✔️ ✔️
Stable-Diffusion qai_hub_models.models.stable_diffusion_quantized ✔️ ✔️ ✔️
Text Generation
Baichuan-7B qai_hub_models.models.baichuan_7b_quantized ✔️ ✔️ ✔️
Llama-v2-7B-Chat qai_hub_models.models.llama_v2_7b_chat_quantized ✔️ ✔️ ✔️

ai-hub-models's People

Contributors

kory, mynameistechno, qaihm-bot, quic-ppant, quic-rneti


ai-hub-models's Issues

How to quantize LLM to INT4?

I want to quantize my fine-tuned Llama model to INT4 and deploy it on my Snapdragon 8 Gen 3 device, but I don't know how to do it. When will a tutorial be available?

Image_Classification APK crashes after a device runtime is selected

Describe the bug
Followed the instructions in "apps/android/ImageClassification" and classification-debug.apk was built successfully.
Installed the APK on a Samsung Galaxy S21 Ultra.
A sample image can be selected and is displayed in the UI.
After selecting the CPU or NPU runtime, the Image_Classification app crashes.

Question
I only have a Samsung Galaxy S21 Ultra and can't test this APK on other phones.
Is this example app limited to certain chipsets?
I saw that the GPU is used rather than the NPU with python -m qai_hub_models.models.mobilenet_v3_small.export --device "Samsung Galaxy S21 Ultra".
Do any changes need to be made for this case?
Thank you.

Feature Request: Bundle Multiple Models into a Pipeline for Easy Download and Usage

Context

We want to run the ControlNet model on Android devices.

Problem

Currently, users interested in utilizing the [TextEncoder, Unet, ControlNet, VAEDecoder] models as a pipeline need to download each model separately and manually connect them later. This process can be cumbersome and inefficient, especially for users unfamiliar with model integration.

Proposed Solution

It would be beneficial to provide a bundled option where users can download all the necessary models as one package and easily integrate them into their workflow without the need for manual connections. Also, a comprehensive notebook with demos would be highly appreciated! Of course, it would be best if we could use StableDiffusionControlNetPipeline.

[BUG] Controlnet models are providing noisy image outputs

Describe the bug
The ControlNet pipeline gives me noisy image output; in general it does its thing, but the image quality makes it unusable. Prompt: "photo, mid century modern, natural light, pink walls". The input image and outputs with various seed/guidance scale settings were attached to the original issue.

To Reproduce
Try to run the demo application with the following settings:

  • dpmsolver++ scheduler
  • 23 or 35 steps
  • guidance scale 7.5 or 12.5
  • seed: between 1e7 - 1e10

I get this noise in every image; the image quality is nowhere near the original float ControlNet model here: https://github.com/AUTOMATIC1111/stable-diffusion-webui

Expected behavior
Output comparable to the original float ControlNet model from here: https://github.com/Mikubill/sd-webui-controlnet (an example image was attached to the original issue).

Host configuration:

  • OS and version: Win 11
  • QAI-Hub-Models version: latest controlnet

Am I missing something? Should I use another scheduler with different options, or is there a better quantized model for ControlNet?

Execute quantized model on TFLite with QCS6490

Hi all, I have an AI-BOX with Ubuntu 20.04 from a Qualcomm OEM/ODM with the QCS6490 chipset.

I used the AI Hub website to quantize a YOLOv7 model to a .tflite model, and I'd like to run inference on the QCS6490 device mentioned above.

This is the code I'm using:

import numpy as np
import tensorflow as tf

# Load your TFLite model
# Replace 'model.tflite' with the path to your actual model
tflite_model_path = 'yolov7.tflite'
tflite_model = open(tflite_model_path, 'rb').read()

# Set up the TFLite interpreter
# To use the Hexagon DSP with TensorFlow Lite, you would typically need to
# build the TensorFlow Lite Hexagon delegate. However, this script assumes
# that the delegate is already available and part of the TFLite runtime.
interpreter = tf.lite.Interpreter(
    model_content=tflite_model,
    experimental_delegates=[tf.lite.experimental.load_delegate('libhexagon_delegate.so')]
)
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Test the model on random input data.
input_shape = input_details[0]['shape']

print(f"[INFO] Input Shape = {input_shape}")

input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()

# Get the output of the model
output_data = interpreter.get_tensor(output_details[0]['index'])
print(output_data)

My question is: where can I find or download the libhexagon_delegate.so library?

Create ControlNet catalog

Context

We want to run the ControlNet model on Android devices.

Problem

It is not clear what condition was used for ControlNet. It would be very kind of you if you could specify what type of condition was used (Canny Edge, M-LSD Lines, Scribbles, etc.).

Proposed Solution

Make a catalog of quantized ControlNet models for each type of condition with the corresponding description.

[Feature Request] 16:9 ratio input image for controlnet

I would like to use the ControlNet (https://github.com/quic/ai-hub-models/blob/main/qai_hub_models/models/controlnet_quantized/README.md) model pipeline with a 16:9 resolution input image, but currently the image resolution of this pipeline is fixed at 512x512.

Describe the solution you'd like
A variable input/output resolution (WxH) would be the best solution, but a fixed-resolution model would also work, for example 960x540.

Describe alternatives you've considered
I considered resizing the input 16:9 image, but that distorts the image unfortunately.

[Feature Request] Whisper Prompting feature

Is your feature request related to a problem? Please describe.
In the application I am considering, the recognition of technical terms is fundamental to guarantee the release of a successful solution.
However, ASR models in general have difficulties in recognizing very specific terms.

OpenAI Whisper allows feeding a prompt to the decoder, which makes use of a simple language model. The prompt can be used to help stitch together multiple audio segments or as a spelling guide to improve the recognition of specific terms, and it has proved very useful.

Reference: https://cookbook.openai.com/examples/whisper_prompting_guide

Describe the solution you'd like
It would be very useful to update the Whisper model to accept another input which is a set of decoder_input_ids representing the prompt.
An image of the idea behind this can be found at the following link: openai/whisper#117

Describe alternatives you've considered
The recognition of specific terms can be improved using other strategies such as fine-tuning, but prompting is a much easier and faster alternative in many cases.

Additional context
This feature is already supported in HuggingFace Transformers (huggingface/transformers#22395)
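
For reference, here is a minimal sketch of how the prompting feature looks in Hugging Face Transformers (assuming a transformers version that includes the support referenced above; the checkpoint, prompt text, and placeholder audio are illustrative, not part of AI Hub Models):

import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny.en")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny.en")

# Placeholder audio: one second of silence at 16 kHz.
audio = torch.zeros(16000)
inputs = processor(audio.numpy(), sampling_rate=16000, return_tensors="pt")

# The prompt acts as a spelling guide for domain-specific terms.
prompt_ids = processor.get_prompt_ids("Qualcomm, Hexagon, Snapdragon", return_tensors="pt")
generated = model.generate(inputs.input_features, prompt_ids=prompt_ids)
print(processor.batch_decode(generated, skip_special_tokens=True))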

Other useful links:

[MODEL REQUEST] requesting Depth-Anything model

Details of model being requested

Additional context for requested model
Real-time depth estimation on live camera video streams holds immense potential for various industries, offering immersive experiences and enabling innovative applications, from augmented reality to autonomous navigation.

[BUG] 3x Conv-3 Layers runs out of memory on Hub

This bug is being filed based on the discussion with Manuel Kolmet in AI Hub Models slack community. https://qualcomm-ai-hub.slack.com/archives/C06LT6T3REY/p1709827335261829

Bug report: A fairly simple model (3x Conv-3 layers) runs out of memory when converted through the model hub but works fine out of the qnn-pytorch-converter.
The model was created as below; the TorchScript is attached.

import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 1, kernel_size=3, padding=1),
)
When running the model with qnn-net-run I get
qnn-net-run pid:22832
WARNING: linker: Warning: unable to normalize "$/data/local/tmp/QNN-2.19" (ignoring)
WARNING: linker: Warning: unable to normalize "$/data/local/tmp/QNN-2.19" (ignoring)
Graph Finalize failure
Our own runner tool which prints all debug output shows
2022-06-12 21:52:57.097 - E/QNN-RUNNER-CALLBACK: fa_alloc.cc:3747:ERROR:graph requires estimated allocation of 4176043 KB, limit is 2097152 KB

2022-06-12 21:52:57.098 - E/QNN-RUNNER-CALLBACK: graph_prepare.cc:638:ERROR:error during serialize: memory usage too large

2022-06-12 21:52:57.098 - E/QNN-RUNNER-CALLBACK: graph_prepare.cc:5512:ERROR:Serialize error: memory usage too large

2022-06-12 21:52:57.098 - E/QNN-RUNNER-CALLBACK: Weight Offset (0) + Weight data (0) sizes != total pickle size (712704) !!

2022-06-12 21:52:57.098 - E/QNN-RUNNER-CALLBACK: Error getting size and offsets of weights

2022-06-12 21:52:57.491 - E/QNN-RUNNER-CALLBACK: Failed to initialize graph memory

2022-06-12 21:52:57.491 - E/QNN-RUNNER-CALLBACK: Failed to finalize graph input_model with err: 6020

2022-06-12 21:52:57.491 - E/QNN-RUNNER-CALLBACK: Failed to finalize graph (id: 1) with err 6020

2022-06-12 21:52:57.491 - E/QNN-RUNNER: Graph Finalize failure
The job I've used to convert the model is here: https://app.aihub.qualcomm.com/jobs/jz5763ng3/

Failure to Trace MiDaS Model

I tried to use AI Hub on MiDaS (a very popular image-to-depth-map model), but from what I understand it cannot be traced.
Here is the code I used:

import torch
import urllib.request

import qai_hub as hub

# Load MiDaS model
model_type = "DPT_Large"     # MiDaS v3 - Large     (highest accuracy, slowest inference speed)
#model_type = "DPT_Hybrid"   # MiDaS v3 - Hybrid    (medium accuracy, medium inference speed)
#model_type = "MiDaS_small"  # MiDaS v2.1 - Small   (lowest accuracy, highest inference speed)
midas = torch.hub.load("intel-isl/MiDaS", model_type)

# Trace MiDaS model
input_shape = (1, 3, 384, 384)  # Adjust input shape as needed
example_input = torch.rand(input_shape)
traced_midas = torch.jit.trace(midas, example_input)

# Optimize model for the chosen device
device = hub.Device("Samsung Galaxy S23 Ultra")
compile_job = hub.submit_compile_job(
    model=traced_midas,
    name="MyMiDaSModel",
    device=device,
    input_specs=dict(image=input_shape),
)

# Run the model on a hosted device
profile_job = hub.submit_profile_job(
    model=compile_job.get_target_model(),
    device=device,
)

Here is the log of the execution:

C:\Users\iphone/.cache\torch\hub\intel-isl_MiDaS_master\midas\backbones\vit.py:22: TracerWarning: Using len to get tensor shape might cause the trace to be incorrect. Recommended usage would be tensor.shape[0]. Passing a tensor of different shape might lead to errors or silently give incorrect results.
  gs_old = int(math.sqrt(len(posemb_grid)))
Traceback (most recent call last):
  File "C:\Users\iphone\midas_qualcomm\test.py", line 14, in <module>
    traced_midas = torch.jit.trace(midas, example_input)
  File "C:\Users\iphone\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\jit\_trace.py", line 794, in trace
    return trace_module(
  File "C:\Users\iphone\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\jit\_trace.py", line 1056, in trace_module
    module._c._create_method_from_trace(
  File "C:\Users\iphone\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\iphone\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1488, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "C:\Users\iphone/.cache\torch\hub\intel-isl_MiDaS_master\midas\dpt_depth.py", line 166, in forward
    return super().forward(x).squeeze(dim=1)
  File "C:\Users\iphone/.cache\torch\hub\intel-isl_MiDaS_master\midas\dpt_depth.py", line 114, in forward
    layers = self.forward_transformer(self.pretrained, x)
  File "C:\Users\iphone/.cache\torch\hub\intel-isl_MiDaS_master\midas\backbones\vit.py", line 13, in forward_vit
    return forward_adapted_unflatten(pretrained, x, "forward_flex")
  File "C:\Users\iphone/.cache\torch\hub\intel-isl_MiDaS_master\midas\backbones\utils.py", line 99, in forward_adapted_unflatten
    nn.Unflatten(
  File "C:\Users\iphone\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\flatten.py", line 110, in __init__
    self._require_tuple_int(unflattened_size)
  File "C:\Users\iphone\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\flatten.py", line 133, in _require_tuple_int
    raise TypeError("unflattened_size must be tuple of ints, " +
TypeError: unflattened_size must be tuple of ints, but found element of type Tensor at pos 0

Did I do something wrong?

Performance issue

Hello, when I use the NPU (HTP, aka cDSP) on Snapdragon 8 Gen 3, I run into some performance problems.
Here are the details.
When we use qhblas_hvx_ah_matrix_vector_mpy_ab from the Hexagon SDK qhl_hvx library, we find it is much slower than computing directly on the CPU with Arm Neon. The results are shown below.
cDSP(NPU) [13008, 5120] * [5120, 1] 45ms
CPU [13008, 5120] * [5120, 1] 10ms
After setting the power mode to performance mode, the cDSP execution time is a little faster, but still slower than the CPU.
cDSP(NPU) [13008, 5120] * [5120, 1] 36ms
CPU [13008, 5120] * [5120, 1] 10ms
I want to know if this result is expected and consistent with your tests. Looking forward to your response.

Exceptions while trying to run Whisper ASR example

I was trying to run the Whisper ASR example.
However, I encountered some issues which I would like to point out.

  1. After running the following lines:
    pip install "qai_hub_models[whisper_asr]"
    python -m qai_hub_models.models.whisper_asr.demo
    
    an exception is raised as reported in #15.
  2. Solving that as reported in the same issue and proceeding with the export command python -m qai_hub_models.models.whisper_asr.export raises another exception, which is equivalent to the one before and can be fixed in a similar manner inside the source code.
  3. Then, trying to execute the command again gives another exception during the profiling step:
     qai_hub/client.py", line 1097, in _determine_model_type
     raise UserError(qai_hub.client.UserError: Unsupported model type. The following types are supported
         - TorchScript: Extension .pt or .pth
         - Core ML Model: Extension .mlmodel
         - Compiled Core ML Model: Extension .mlmodelc or .mlmodelc.zip
         - Tensorflow Lite: Extension .tflite
         - ONNX: Extension .onnx
         - ONNX (ORT model format): Extension .ort
         - QNN Binary: Extension .bin
         - AIMET Model: Directory ending with .aimet
    
  4. Trying to skip the profiling operation (python -m qai_hub_models.models.whisper_asr.export --skip-profiling) of course avoids the previous exception, but gives another exception similar to the first one:
    qai_hub_models/utils/base_model.py", line 187, in sample_inputs
     inputs_dict[input_name] = [inputs_list[i].numpy()]
    TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
    

Why is it not possible to proceed with the profiling operation? The same issue is also encountered when trying to run the inference step, but I guess that these two problems are due to the same reason.
I hope that the information I reported can be helpful.
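
For readers who hit the same TypeError in step 4, the error message itself points at the fix: move the tensor to host memory before calling .numpy(). A standalone illustration of that pattern (not a patch of the actual qai_hub_models code):

import torch

def to_numpy(t: torch.Tensor):
    # CUDA tensors must be detached and copied to the CPU before .numpy() works.
    return t.detach().cpu().numpy()

x = torch.rand(2, 3, device="cuda" if torch.cuda.is_available() else "cpu")
print(to_numpy(x).shape)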

Question on LLama 7B Q4 Metrics

The metrics for Llama 2 7B are confusing. Can you clarify what they represent?
For example, I am interested in these metrics:

  1. Time to first token (for input of a given token length)
  2. Tokens per second (same input length)

How do I extrapolate those from the metrics on that page?

Does this mean that the input prompt was maxed out, and that it corresponds to 8.48 tokens/s:

  • Max context length:1024
  • Prompt processor input:1024 tokens
  • Llama-TokenGenerator-KVCache-Quantized: 8.48 token/s

Does this mean that the output is 1 token?

  • Prompt processor output:1 output token + KVCache for token generator

Also, are there memory metrics for this model?

Thank you!

[BUG] Model converted from TorchScript gets stuck on qnn-net-run

Describe the bug
I have a model with only layers like Conv2d, LeakyReLU, Sigmoids and element-wise multiplications. The model converts successfully, here is an example job: https://app.aihub.qualcomm.com/jobs/jegn9nk5o/
However, running the model on device (Vivo X90 Pro+ with 8550 chipset) through qnn-net-run gets stuck.
If I run the model through the qnn-pytorch-converter it converts successfully and I can run it on device.

TorchScript can be downloaded here: https://drive.google.com/file/d/19Y2H8TyqEsJ_QbNISDBA98b_6RCYa38k/view?usp=share_link

To Reproduce

  • Download model from link above.
  • Run code to convert:
compile_job = qai_hub.submit_compile_job(
    model="checkpoints/Mock.pt",
    device=qai_hub.Device("Samsung Galaxy S23+"),
    options="--target_runtime qnn_lib_aarch64_android",
    input_specs=dict(image=x.shape),
)
compile_job.download_target_model("checkpoints/")
  • Create a demo input with np.random.rand(256,256,1).astype(np.float32).tofile("input.raw") and an input list with echo input.raw > input.txt
  • Copy the model, demo input and input list to device
  • Run the model with qnn-net-run --backend libQnnHtp.so --model job_jegn9nk5o_optimized_so_m7n1evpm5.so --input_list input.txt

Expected behavior
The qnn-net-run command should exit after a few seconds, generating an output/Result_0 folder.

Stack trace
Stack trace with cancellation after a minute:

./qnn-net-run --backend libQnnHtp.so --model res/job_jegn9nk5o_optimized_so_m7n1evpm5.so --input_list res/input.txt                                  
qnn-net-run pid:11906
WARNING: linker: Warning: unable to normalize "$/data/local/tmp/QNN-2.19" (ignoring)
WARNING: linker: Warning: unable to normalize "$/data/local/tmp/QNN-2.19" (ignoring)
^C

Host configuration:

  • QAI-Hub client version: '0.9.0'

Incorrect formats

I believe this repo is the source for the models under the Qualcomm Hugging Face account? If not, feel free to close this issue as it does not apply.

I was going through some models and noticed that although the file extension is .tflite, some files are actually in the ELF format:

❯ head -c 20 ResNet50.tflite
ELF�%                

Repos that are fine:

  • VIT

Repos where I noticed the .tflite files are actually ELF:

  • ResNet50
  • ResNext
  • Real-ESRGAN-x4plus
  • FFNet-40S

[BUG] Model fails in ONNX export with relu6 op not supported

This bug is being filed in conjunction with the discussion on slack between Adam and Kory. https://qualcomm-ai-hub.slack.com/archives/C06KMD15QH4/p1709746420588749?thread_ts=1709570637.688689&cid=C06KMD15QH4

The compile job on Hub for a quantized model failed with the following error message: "Failure occurred in the Torch to ONNX export: Exporting the operator 'quantized::relu6' to ONNX opset version 17 is not supported." How can I get rid of it?

https://app.aihub.qualcomm.com/jobs/jn5qww457/

An update will be shared here once available.

[BUG] Model with element-wise select / torch.where() fails to finalize

Describe the bug
If a model contains the QNN ElementWiseSelect operation, equivalent to torch.where(), it converts successfully but the graph fails to finalize on device or in the emulator.

To Reproduce
Steps to reproduce the behavior:

import torch
import torch.nn as nn

class Where(nn.Module):
    def __init__(self):
        super(Where, self).__init__()
        self.conv1 = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, x):
        mask = x > 0.5
        y = x - 1.0
        x = torch.where(mask, x, y)
        x = self.conv1(x)
        return x

x = torch.rand(1, 3, 16, 16)
y = Where()(x)

Expected behavior
After conversion, the model should run on the emulator and device as expected.

Stack trace
Output from qnn-net-run:

qnn-net-run pid:21299
WARNING: linker: Warning: unable to normalize "$/data/local/tmp/QNN-2.19" (ignoring)
WARNING: linker: Warning: unable to normalize "$/data/local/tmp/QNN-2.19" (ignoring)
Graph Finalize failure

Host configuration:

  • QAI-Hub-Models version: aihub-2024.03.07.0
  • QAI-Hub client version: 0.9.0

Additional context
As a heads-up: the operation also fails to finalize when converting through the qnn-pytorch-converter or when creating the model "manually" in C++.

[BUG] whisper_asr demo Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same

Describe the bug
I followed the guidance in the documentation, installed pip install "qai_hub_models[whisper_asr]", and after running the CLI demo python -m qai_hub_models.models.whisper_asr.demo, I encountered an error:

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

To Reproduce
Steps to reproduce the behavior:

  1. pip install "qai_hub_models[whisper_asr]"
  2. python -m qai_hub_models.models.whisper_asr.demo

[BUG] openAI-Clip demo failed on cuda machine

On the AI Hub Models Slack, Hu Eric shared that the OpenAI-Clip demo failed for him. https://qualcomm-ai-hub.slack.com/archives/C06LT6T3REY/p1709470194079099

Kory initially took a look. It seems like this has something to do with CUDA availability. This bug is being filed to look into the initially reported bug.

(qai_hub) a19284@njai-ubuntu:~/workspace/qai-hub-clip$ python -m qai_hub_models.models.openai_clip.demo
Traceback (most recent call last):
File "/home/a19284/mambaforge/envs/qai_hub/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/a19284/mambaforge/envs/qai_hub/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/a19284/mambaforge/envs/qai_hub/lib/python3.8/site-packages/qai_hub_models/models/openai_clip/demo.py", line 98, in
main()
File "/home/a19284/mambaforge/envs/qai_hub/lib/python3.8/site-packages/qai_hub_models/models/openai_clip/demo.py", line 72, in main
predictions = app.predict_similarity(images, text).flatten()
File "/home/a19284/mambaforge/envs/qai_hub/lib/python3.8/site-packages/qai_hub_models/models/openai_clip/app.py", line 64, in predict_similarity
image_features = self.image_encoder(image)
File "/home/a19284/mambaforge/envs/qai_hub/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/a19284/mambaforge/envs/qai_hub/lib/python3.8/site-packages/qai_hub_models/models/openai_clip/model.py", line 134, in forward
image_features = self.net.encode_image(image)
File "/home/a19284/.qaihm/models/openai_clip/v1/openai_CLIP_git/clip/model.py", line 341, in encode_image
return self.visual(image.type(self.dtype))
File "/home/a19284/mambaforge/envs/qai_hub/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/a19284/.qaihm/models/openai_clip/v1/openai_CLIP_git/clip/model.py", line 224, in forward
x = self.conv1(x) # shape = [*, width, grid, grid]
File "/home/a19284/mambaforge/envs/qai_hub/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/a19284/mambaforge/envs/qai_hub/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/home/a19284/mambaforge/envs/qai_hub/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument weight in method wrapper___slow_conv2d_forward)
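
For context, the RuntimeError above is the generic PyTorch device-mismatch error: it occurs whenever the weights and the input live on different devices. A standalone illustration of the error class (not the CLIP code itself):

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
conv = torch.nn.Conv2d(3, 8, kernel_size=3).to(device)
x = torch.rand(1, 3, 224, 224, device=device)  # input on the same device as the weights
print(conv(x).shape)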

Benchmarks for QCS610 and QCS6490

Hi, I'd like to know if there are any benchmarks done on QCS610 and QCS6490. Those chipsets are available on some Qualcomm OEM/ODM vendors.

No such file or directory: '/opt/qcom/aistack/qairt/2.21.0.240401/lib/android/qtld-release.aar'

ai-hub-models-main\apps\android\ImageClassification\classification

Traceback (most recent call last):
File "build_apk.py", line 119, in
shutil.copy(aarfile, destaar)
File "/usr/lib/python3.8/shutil.py", line 418, in copy
copyfile(src, dst, follow_symlinks=follow_symlinks)
File "/usr/lib/python3.8/shutil.py", line 264, in copyfile
with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: '/opt/qcom/aistack/qairt/2.21.0.240401/lib/android/qtld-release.aar'

To Reproduce
Steps to reproduce the behavior:

  1. Go to: ai-hub-models-main\apps\android\ImageClassification\classification
  2. Run python build_apk.py -q $QNN_SDK_PATH -m mobilenet_v3_large
  3. When asked "Do you want us to download the model from AI hub (y/n)", answer n
  4. Click on 'N'
  5. Give the model file as input: ./mobilenet_v3.tflite
  6. Traceback (most recent call last):
     File "build_apk.py", line 119, in
     shutil.copy(aarfile, destaar)
     File "/usr/lib/python3.8/shutil.py", line 418, in copy
     copyfile(src, dst, follow_symlinks=follow_symlinks)
     File "/usr/lib/python3.8/shutil.py", line 264, in copyfile
     with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
     FileNotFoundError: [Errno 2] No such file or directory: '/opt/qcom/aistack/qairt/2.21.0.240401/lib/android/qtld-release.aar'

Host configuration:

  • OS and version: Ubuntu 20.04 (Windows WSL2)
  • qairt: 2.21.0.240401
