google-ai-edge / litert

LiteRT is the new name for TensorFlow Lite (TFLite). While the name is new, it's still the same trusted, high-performance runtime for on-device AI, now with an expanded vision.

Home Page: https://ai.google.dev/edge/litert

License: Apache License 2.0

Shell 0.38% Python 4.77% Dockerfile 0.01% Jupyter Notebook 1.84% Java 1.53% Starlark 3.63% CMake 0.38% C 2.09% C++ 76.03% Objective-C 0.44% Objective-C++ 0.73% Ruby 0.02% C# 0.04% HTML 7.87% CSS 0.01% Swift 0.22% Makefile 0.01%

litert's Introduction

LiteRT

GitHub repository for LiteRT, Google's open-source, high-performance runtime for on-device AI, formerly known as TensorFlow Lite.

More details of the LiteRT announcement are in this blog post.

The official documentation can be found at https://ai.google.dev/edge/litert

PyPI Installation Requirements

  • Python versions: 3.9, 3.10, 3.11, 3.12
  • Operating systems: Linux, macOS

FAQs

  1. How do I contribute code?

    For now, please contribute code to the existing TensorFlow Lite repository.

  2. What is happening to the .tflite file extension and file format?

    No changes are being made to the .tflite file extension or format. Conversion tools will continue to output .tflite flatbuffer files, and .tflite files will be readable by LiteRT.

  3. How do I convert models to .tflite format?

    For TensorFlow, Keras, and JAX you can continue to use the same flows (a minimal conversion sketch follows this FAQ list). For PyTorch support, check out ai-edge-torch.

  4. Will there be any changes to classes and methods?

    No. Aside from package names, you won't have to change any code you've written for now.

  5. Is TensorFlow Lite still being actively developed?

    Yes, but under the name LiteRT. Active development will continue on the runtime (now called LiteRT), as well as the conversion and optimization tools. To ensure you're using the most up-to-date version of the runtime, please use LiteRT.
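
As noted in FAQ 3 above, the TensorFlow/Keras conversion flow is unchanged. For illustration, here is a minimal sketch using the standard tf.lite.TFLiteConverter API; the two-layer Keras model below is just a placeholder.

    import tensorflow as tf

    # Placeholder model; substitute your own Keras model.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1),
    ])

    # The conversion flow is unchanged by the LiteRT rename.
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    tflite_model = converter.convert()

    # The output is still a .tflite flatbuffer, readable by LiteRT.
    with open("model.tflite", "wb") as f:
        f.write(tflite_model)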

Build From Source with Bazel

  1. From the LiteRT root folder (where this README file is), run

    git submodule init && git submodule update --remote
    

    to make sure you have the latest version of the tensorflow submodule.

  2. You will need Docker, but nothing else. Create the image with

    docker build . -t tflite-builder -f ci/tflite-py3.Dockerfile
    
    # Confirm the image was built with
    docker image ls
    
  3. Run bash inside the container

    docker run -it -w /host_dir -v $PWD:/host_dir -v $HOME/.cache/bazel:/root/.cache/bazel tflite-builder bash
    

    where -v $HOME/.cache/bazel:/root/.cache/bazel is optional, but useful to map your Bazel cache into the container.

  4. Run the configure script; use the default settings for this example.

    ./configure
    
  5. Build and run a target, e.g. //tflite:interpreter_test

    bazel test //tflite:interpreter_test
    

litert's People

Contributors

ai-edge-bot, lukeboyer, junjiang-lab, terryheo, ecalubaquib, qukhan, yunanaz, gaikwadrahul8, fcounda, turbotoribio, fergushenderson, alankelly, weilhuan-quic, niuchl, zichuan-wei, whhone, v-dziuba, majiddadashi, pak-laura, cantonios, sallenkey-wei, pasweistorz, misterbart, kwoncy2020, fiberflow, chunhsue, c8ef, sirakiin, ototot, tilakrayal


litert's Issues

Python Wheel generation of TensorFlow Lite 2.17 for ARMv7l 32 bits not working

I have previously cross-compiled different versions of TensorFlow Lite (2.14, 2.15.1, 2.16.2) for Python 3.10 using CMake, following the instructions from the website.

So far, the only changes required were to set my armhf flags:

echo "ARMCC_FLAGS=\"-march=armv7-a -mfpu=neon-vfpv3 -funsafe-math-optimizations \
adjust the Python version in the Makefile (tensorflow/lite/tools/pip_package/Makefile), and run the make command:

make -C tensorflow/lite/tools/pip_package docker-build \
  TENSORFLOW_TARGET=armhf PYTHON_VERSION=3.10
Now I have tried the same approach for release 2.17.0 ( ad6d8cc ), and although the build executes and the wheel is generated without errors, I keep getting the following error at the moment of importing the interpreter:

python3 simpletest.py 
Traceback (most recent call last):
  File "/mnt/simpletest.py", line 2, in <module>
    import tflite_runtime.interpreter as tflite
  File "/usr/local/lib/python3.10/dist-packages/tflite_runtime/interpreter.py", line 33, in <module>
    from tflite_runtime import _pywrap_tensorflow_interpreter_wrapper as _interpreter_wrapper
ImportError: /usr/local/lib/python3.10/dist-packages/tflite_runtime/_pywrap_tensorflow_interpreter_wrapper.so: undefined symbol: TfLiteXNNPackDelegateOptionsDefault

TFLITE: Execution on GPU delegate gives runtime error with no CPU fallback

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

source

TensorFlow version

tf 2.14

Custom code

Yes

OS platform and distribution

aarch64 linux

Mobile device

No response

Python version

python 3.10.9

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

I am using an aarch64 device similar to a Raspberry Pi, running TF 2.14.
I installed the latest version of tflite_runtime using pip3 install tflite_runtime, which installed v2.14.

I have a tflite model sourced from here: https://github.com/usefulsensors/openai-whisper/blob/main/models/whisper.tflite
It works well on CPU, but when I try to execute it on the GPU or NNAPI TFLite delegate, I get a runtime error with no other error log accompanying it.

The error snippet is below:

INFO: Created TensorFlow Lite delegate for GPU.
Traceback (most recent call last):
  File "/home/root/whisper_interpreter1.py", line 19, in <module>
    interpreter = tflite.Interpreter(args.model, experimental_delegates=[tflite.load_delegate('gpu_external_delegate.so')], num_threads=args.threads)
  File "/usr/lib/python3.10/site-packages/tflite_runtime/interpreter.py", line 513, in __init__
    self._interpreter.ModifyGraphWithDelegate(
RuntimeError

The code I am using is similar to the one mentioned in this comment: tensorflow/tensorflow#59273 (comment)

I checked the model's op support using the Model Analyzer:

import tensorflow as tf
tf.lite.experimental.Analyzer.analyze(model_path='whisper.tflite',
                                      gpu_compatibility=True)

and I get the output:

GPU COMPATIBILITY WARNING: Not supported op WHILE

GPU COMPATIBILITY WARNING: Subgraph#0 has GPU delegate compatibility issues at nodes 357, 358, 359, 360, 361, 362, 694 on TFLite runtime version 2.15.0

the entire log is attached:
model_analyzer_log.txt

Not all ops in this model are supported on the GPU, but the others are. My understanding is that model ops which are not supported on the delegate should fall back onto the CPU. But instead of falling back, I end up getting a RuntimeError. Why are unsupported ops not falling back onto the CPU?

Are unsupported ops not falling back onto CPU by default in TFLite?
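
For illustration only, here is a minimal sketch (assuming the same tflite_runtime Python API used in this report) of falling back to a CPU-only interpreter when the delegate cannot be loaded or applied; it works around the failure rather than explaining it.

import tflite_runtime.interpreter as tflite

MODEL_PATH = "models/whisper.tflite"  # path from the report

def build_interpreter(model_path, delegate_path="gpu_external_delegate.so"):
    """Try the GPU external delegate first; fall back to a CPU-only interpreter."""
    try:
        delegate = tflite.load_delegate(delegate_path)
        return tflite.Interpreter(model_path, experimental_delegates=[delegate])
    except (ValueError, RuntimeError) as exc:
        print(f"GPU delegate unavailable ({exc}); falling back to CPU.")
        return tflite.Interpreter(model_path)

interpreter = build_interpreter(MODEL_PATH)
interpreter.allocate_tensors()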

Standalone code to reproduce the issue

import os
from timeit import default_timer as timer
import wave
import argparse
import tflite_runtime.interpreter as tflite
import numpy as np
import whisper
import re

parser = argparse.ArgumentParser(description="Running Whisper TFlite test inference.")
parser.add_argument("-f", "--folder", default="./test_wavs/", help="Folder with WAV input files")
parser.add_argument("-m", "--model", default="models/whisper.tflite", help="Path to model")
parser.add_argument("-t", "--threads", type=int, default=2, help="Threads used")
args = parser.parse_args()

interpreter = tflite.Interpreter(args.model, experimental_delegates=[tflite.load_delegate('gpu_external_delegate.so')], num_threads=args.threads)
interpreter.allocate_tensors()
input_tensor = interpreter.get_input_details()[0]['index']
output_tensor = interpreter.get_output_details()[0]['index']
wtokenizer = whisper.tokenizer.get_tokenizer(False, language="en")

def transcribe(audio_file):
  wf = wave.open(audio_file, "rb")
  if (wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE" or wf.getframerate() != 16000):
    print("Audio file must be WAV format mono PCM.")
    wf.close()
    exit(1)

  mel_from_file = whisper.audio.log_mel_spectrogram(audio_file)
  input_data = whisper.audio.pad_or_trim(mel_from_file, whisper.audio.N_FRAMES)
  input_data = np.expand_dims(input_data, 0)

  interpreter.set_tensor(input_tensor, input_data)
  interpreter.invoke()
  output_data = interpreter.get_tensor(output_tensor)

  for token in output_data:
    token[token == -100] = wtokenizer.eot
    text = wtokenizer.decode([t for t in token if t not in wtokenizer.special_tokens])
    

  _re_special = re.compile(r"\<\|.+?\|\>")
  def strip_special_tokens(string):
    return re.sub(_re_special, "", string)

  print(strip_special_tokens(text))



test_files = os.listdir(args.folder)
for file in test_files:
  if file.endswith(".wav"):
    print(file)
    inference_start = timer()
    transcribe(args.folder + file)
    print("\nInference took {:.3}s".format(timer() - inference_start))

Relevant log output

No response

INT8 quantization

Does LiteRT support int8 quantization? Does int8 quantization support quantizing only specified layers?
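
For reference, a minimal sketch of whole-model post-training full-integer (int8) quantization with the existing tf.lite.TFLiteConverter API; the Keras model and calibration data below are placeholders, and per-layer selective quantization is not covered here.

import numpy as np
import tensorflow as tf

# Placeholder model and calibration data; substitute your own.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
calibration_data = np.random.rand(100, 8).astype(np.float32)

def representative_dataset():
    # Yield representative samples used to calibrate quantization ranges.
    for sample in calibration_data:
        yield [sample[np.newaxis, :]]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Restrict to int8 builtin ops and use int8 inputs/outputs.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("model_int8.tflite", "wb") as f:
    f.write(converter.convert())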

Unable to Force-load TensorFlowLiteSelectTfOps.framework, created with Selective Build, in iOS

IDE: Xcode 15
Platform: iOS17
TensorFlow version: r2.9

I am developing both iOS and Android apps that run a TensorFlow Lite model. Because my model uses LSTM, I have to make use of TF Select ops.

In addition, because the TensorFlowLiteSelectTfOps library is large, I have to do a selective build. After much effort, I succeeded in getting my TensorFlow Lite model running smoothly in the Android app using the selectively built libraries.

On iOS, however, I used a Bazel build:

bash tensorflow/lite/ios/build_frameworks.sh \
  --input_models=model1.tflite,model2.tflite \
  --target_archs=x86_64,armv7,arm64

to generate the:

  1. TensorFlowLiteSelectTfOps.framework
  2. TensorFlowLiteC.framework

After that I edited the TensorFlowLiteSwift.podspec to include the framework:

Pod::Spec.new do |s|
  s.name             = 'TensorFlowLiteSwift'
  s.version          = '2.14.0'
  s.authors          = 'Google Inc.'
  s.license          = { :type => 'Apache' }
  s.homepage         = 'https://github.com/tensorflow/tensorflow'
  s.source           = { :git => 'https://github.com/tensorflow/tensorflow.git', :tag => "v#{s.version}" }
  s.summary          = 'TensorFlow Lite for Swift'
  s.description      = <<-DESC

  TensorFlow Lite is TensorFlow's lightweight solution for Swift developers. It
  enables low-latency inference of on-device machine learning models with a
  small binary size and fast performance supporting hardware acceleration.
                       DESC

  s.ios.deployment_target = '17.0'

  s.module_name = 'TensorFlowLite'
  s.static_framework = true

  tfl_dir = 'tensorflow/lite/'
  swift_dir = tfl_dir + 'swift/'

  s.default_subspec = 'Core'

  s.subspec 'Core' do |core|
    # Adjust the path to point to your custom frameworks
    core.vendored_frameworks = [
      'frameworks/TensorFlowLiteC.framework',
      'frameworks/TensorFlowLiteSelectTfOps.framework'
    ]
    
    core.source_files = swift_dir + 'Sources/*.swift'
    core.exclude_files = swift_dir + 'Sources/{CoreML,Metal}Delegate.swift'

    core.test_spec 'Tests' do |ts|
      ts.source_files = swift_dir + 'Tests/*.swift'
      ts.exclude_files = swift_dir + 'Tests/MetalDelegateTests.swift'
      ts.resources = [
        tfl_dir + 'testdata/add.bin',
        tfl_dir + 'testdata/add_quantized.bin',
      ]
    end
  end

  s.subspec 'CoreML' do |coreml|
    coreml.source_files = swift_dir + 'Sources/CoreMLDelegate.swift'
    coreml.dependency 'TensorFlowLiteSwift/Core', "#{s.version}"
  end

  s.subspec 'Metal' do |metal|
    metal.source_files = swift_dir + 'Sources/MetalDelegate.swift'
    metal.dependency 'TensorFlowLiteSwift/Core', "#{s.version}"

    metal.test_spec 'Tests' do |ts|
      ts.source_files = swift_dir + 'Tests/{Interpreter,MetalDelegate}Tests.swift'
      ts.resources = [
        tfl_dir + 'testdata/add.bin',
        tfl_dir + 'testdata/add_quantized.bin',
        tfl_dir + 'testdata/multi_add.bin',
      ]
    end
  end
end

I then do in the Podfile:

pod 'TensorFlowLiteSwift', :path => '../../local-podspecs/TensorFlowLiteSwift.podspec'

Then in terminal:

pod install

In my Other Linker Flags, I put:
-force_load $(SRCROOT)/local-podspecs/frameworks/TensorFlowLiteSelectTfOps.framework/TensorFlowLiteSelectTfOps

But when I build, I get the errors shown below:

FacialRecognition
Undefined symbol: google::protobuf::TextFormat::PrintToString(google::protobuf::Message const&, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator>*)

Undefined symbol: google::protobuf::TextFormat::Parse(google::protobuf::io::ZeroCopyInputStream*, google::protobuf::Message*)

Undefined symbol: google::protobuf::DoubleValue::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*)

Undefined symbol: google::protobuf::DoubleValue::MergeFrom(google::protobuf::DoubleValue const&)

Undefined symbol: google::protobuf::DoubleValue::DoubleValue(google::protobuf::DoubleValue const&)

Undefined symbol: google::protobuf::MessageLite::ParseFromArray(void const*, int)

Undefined symbol: google::protobuf::MessageLite::ParseFromString(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator> const&)

Undefined symbol: google::protobuf::MessageLite::ParseFromCodedStream(google::protobuf::io::CodedInputStream*)

Undefined symbol: google::protobuf::RepeatedField::Swap(google::protobuf::RepeatedField*)

Undefined symbol: google::protobuf::RepeatedField::Reserve(int)

Undefined symbol: google::protobuf::RepeatedField::~RepeatedField()

Undefined symbol: google::protobuf::RepeatedField::Swap(google::protobuf::RepeatedField*)

Undefined symbol: google::protobuf::RepeatedField::Reserve(int)

Undefined symbol: google::protobuf::RepeatedField::~RepeatedField()

Undefined symbol: google::protobuf::RepeatedField::Swap(google::protobuf::RepeatedField*)

Undefined symbol: google::protobuf::RepeatedField::Reserve(int)

Undefined symbol: google::protobuf::RepeatedField::~RepeatedField()

Undefined symbol: google::protobuf::RepeatedField::Swap(google::protobuf::RepeatedField*)

Undefined symbol: google::protobuf::RepeatedField::Reserve(int)

Undefined symbol: google::protobuf::RepeatedField::~RepeatedField()

Undefined symbol: google::protobuf::RepeatedField::Swap(google::protobuf::RepeatedField*)

Undefined symbol: google::protobuf::RepeatedField::Reserve(int)

Undefined symbol: google::protobuf::RepeatedField::~RepeatedField()

Undefined symbol: google::protobuf::RepeatedField::Swap(google::protobuf::RepeatedField*)

Undefined symbol: google::protobuf::RepeatedField::Reserve(int)

Undefined symbol: google::protobuf::RepeatedField::~RepeatedField()

Undefined symbol: google::protobuf::RepeatedField::Swap(google::protobuf::RepeatedField*)

Undefined symbol: google::protobuf::RepeatedField::Reserve(int)

Undefined symbol: google::protobuf::RepeatedField::~RepeatedField()

Undefined symbol: google::protobuf::FieldDescriptor::TypeOnceInit(google::protobuf::FieldDescriptor const*)

Undefined symbol: google::protobuf::FieldDescriptor::kCppTypeToName

Undefined symbol: google::protobuf::FieldDescriptor::kTypeToCppTypeMap

Undefined symbol: google::protobuf::UnknownFieldSet::ClearFallback()

Undefined symbol: google::protobuf::UnknownFieldSet::MergeFrom(google::protobuf::UnknownFieldSet const&)

Undefined symbol: google::protobuf::RepeatedPtrField<std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator>>::~RepeatedPtrField()

Undefined symbol: google::protobuf::Any_default_instance

Undefined symbol: google::protobuf::BoolValue_default_instance

Undefined symbol: google::protobuf::io::CodedInputStream::SkipFallback(int, int)

Undefined symbol: google::protobuf::io::CodedInputStream::ReadTagFallback(unsigned int)

Undefined symbol: google::protobuf::io::CodedInputStream::ReadVarint32Fallback(unsigned int)

Undefined symbol: google::protobuf::io::CodedInputStream::ReadVarint64Fallback()

Undefined symbol: google::protobuf::io::CodedInputStream::GetDirectBufferPointer(void const**, int*)

Undefined symbol: google::protobuf::io::CodedInputStream::default_recursion_limit_

Undefined symbol: google::protobuf::io::CodedInputStream::ReadLittleEndian32Fallback(unsigned int*)

Undefined symbol: google::protobuf::io::CodedInputStream::ReadLittleEndian64Fallback(unsigned long long*)

Undefined symbol: google::protobuf::io::CodedInputStream::ReadVarintSizeAsIntFallback()

Undefined symbol: google::protobuf::io::CodedInputStream::DecrementRecursionDepthAndPopLimit(int)

Undefined symbol: google::protobuf::io::CodedInputStream::IncrementRecursionDepthAndPushLimit(int)

Undefined symbol: google::protobuf::io::CodedInputStream::ReadRaw(void*, int)

Undefined symbol: google::protobuf::io::CodedInputStream::Refresh()

Undefined symbol: google::protobuf::io::CodedInputStream::PopLimit(int)

Undefined symbol: google::protobuf::io::CodedInputStream::PushLimit(int)

Undefined symbol: google::protobuf::io::CodedInputStream::~CodedInputStream()

Undefined symbol: google::protobuf::io::ArrayOutputStream::ArrayOutputStream(void*, int, int)

Undefined symbol: google::protobuf::io::CodedOutputStream::EnableAliasing(bool)

Undefined symbol: google::protobuf::io::CodedOutputStream::WriteVarint32SlowPath(unsigned int)

Undefined symbol: google::protobuf::io::CodedOutputStream::WriteVarint64SlowPath(unsigned long long)

Undefined symbol: google::protobuf::io::CodedOutputStream::WriteStringWithSizeToArray(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator> const&, unsigned char*)

Undefined symbol: google::protobuf::io::CodedOutputStream::CodedOutputStream(google::protobuf::io::ZeroCopyOutputStream*)

Undefined symbol: google::protobuf::io::CodedOutputStream::~CodedOutputStream()

Undefined symbol: google::protobuf::io::ZeroCopyOutputStream::WriteAliasedRaw(void const*, int)

Undefined symbol: google::protobuf::DoubleValue_default_instance

Undefined symbol: google::protobuf::Any::MergePartialFromCodedStream(google::protobuf::io::CodedInputStream*)

Undefined symbol: google::protobuf::Any::Clear()

Undefined symbol: google::protobuf::Any::MergeFrom(google::protobuf::Any const&)

Undefined symbol: google::protobuf::Any::Any(google::protobuf::Any const&)

Undefined symbol: google::protobuf::DoubleValue* google::protobuf::Arena::CreateMaybeMessagegoogle::protobuf::DoubleValue(google::protobuf::Arena*)

Undefined symbol: google::protobuf::Any* google::protobuf::Arena::CreateMaybeMessagegoogle::protobuf::Any(google::protobuf::Arena*)

Undefined symbol: google::protobuf::BoolValue* google::protobuf::Arena::CreateMaybeMessagegoogle::protobuf::BoolValue(google::protobuf::Arena*)

Undefined symbol: google::protobuf::Message::DiscardUnknownFields()

Undefined symbol: google::protobuf::Message::CheckTypeAndMergeFrom(google::protobuf::MessageLite const&)

Undefined symbol: google::protobuf::Message::CopyFrom(google::protobuf::Message const&)

Undefined symbol: google::protobuf::Message::MergeFrom(google::protobuf::Message const&)

Undefined symbol: google::protobuf::internal::LogMessage::LogMessage(google::protobuf::LogLevel, char const*, int)

Undefined symbol: google::protobuf::internal::LogMessage::~LogMessage()

Undefined symbol: google::protobuf::internal::LogMessage::operator<<(char const*)

Undefined symbol: google::protobuf::internal::LogMessage::operator<<(std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator> const&)

Undefined symbol: google::protobuf::internal::LogMessage::operator<<(long long)

Undefined symbol: google::protobuf::internal::NameOfEnum(google::protobuf::EnumDescriptor const*, int)

Undefined symbol: google::protobuf::internal::WireFormat::SerializeUnknownFields(google::protobuf::UnknownFieldSet const&, google::protobuf::io::CodedOutputStream*)

Undefined symbol: google::protobuf::internal::WireFormat::ComputeUnknownFieldsSize(google::protobuf::UnknownFieldSet const&)

Undefined symbol: google::protobuf::internal::WireFormat::SerializeUnknownFieldsToArray(google::protobuf::UnknownFieldSet const&, unsigned char*)

Undefined symbol: google::protobuf::internal::WireFormat::SkipField(google::protobuf::io::CodedInputStream*, unsigned int, google::protobuf::UnknownFieldSet*)

Undefined symbol: google::protobuf::internal::GenericSwap(google::protobuf::MessageLite*, google::protobuf::MessageLite*)

Undefined symbol: google::protobuf::internal::InitSCCImpl(google::protobuf::internal::SCCInfoBase*)

Undefined symbol: google::protobuf::internal::LogFinisher::operator=(google::protobuf::internal::LogMessage&)

Undefined symbol: google::protobuf::internal::MapFieldBase::SetMapDirty()

Undefined symbol: google::protobuf::internal::MapFieldBase::~MapFieldBase()

Undefined symbol: google::protobuf::internal::OnShutdownRun(void ()(void const), void const*)

Undefined symbol: google::protobuf::internal::ReflectionOps::Merge(google::protobuf::Message const&, google::protobuf::Message*)

Undefined symbol: google::protobuf::internal::VerifyVersion(int, int, char const*)

Undefined symbol: google::protobuf::internal::AddDescriptors(google::protobuf::internal::DescriptorTable const*)

Undefined symbol: google::protobuf::internal::DestroyMessage(void const*)

Undefined symbol: google::protobuf::internal::WireFormatLite::UInt32Size(google::protobuf::RepeatedField const&)

Undefined symbol: google::protobuf::internal::WireFormatLite::UInt64Size(google::protobuf::RepeatedField const&)

Undefined symbol: google::protobuf::internal::WireFormatLite::WriteBytes(int, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator> const&, google::protobuf::io::CodedOutputStream*)

Undefined symbol: google::protobuf::internal::WireFormatLite::WriteFloat(int, float, google::protobuf::io::CodedOutputStream*)

Undefined symbol: google::protobuf::internal::WireFormatLite::WriteInt32(int, int, google::protobuf::io::CodedOutputStream*)

Undefined symbol: google::protobuf::internal::WireFormatLite::WriteInt64(int, long long, google::protobuf::io::CodedOutputStream*)

Undefined symbol: google::protobuf::internal::WireFormatLite::WriteDouble(int, double, google::protobuf::io::CodedOutputStream*)

Linker command failed with exit code 1 (use -v to see invocation)

TfLite+OpenCL with many Interpreters crashes on Nvidia GPUs

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

source

TensorFlow version

Nightly, 2.16, 2.10

Custom code

No

OS platform and distribution

Windows 10 Pro & Home and Ubuntu 24.04 LTS (up to date)

Mobile device

No response

Python version

Irrelevant, C++ API is used

Bazel version

No response

GCC/compiler version

Microsoft Visual Studio 2022 C++ compiler (on Windows) and gcc 13.2 (on Ubuntu)

CUDA/cuDNN version

No response

GPU model and memory

Geforce RTX 2080 (8 GB), Geforce GT 940M (1 GB), Geforce GT 1030 (2 GB GDDR5)

Current behavior?

TfLite with OpenCL crashes on Nvidia GPUs if I use many Interpreters:
(1) Geforce RTX 2080, Windows, graphics driver version: 536.23. Our application consistently crashes after TfLite constructs the 36th Interpreter. Error message: "ERROR: Failed to create a compute context - Out of resources", originates from cl_context.cc. I noticed the number of Interpreters equals the number of CL contexts. Computer has 128 GB RAM.

(2) Geforce GT 940M, Windows, graphics driver version: 472.91. Application crashes after TfLite constructs the 40th or 41st Interpreter. "ERROR: Failed to create a compute context - Out of host memory". Computer (=host) has 8 GB RAM.

(3) Geforce GT 1030, Windows, graphics driver version: 536.23. Application crashes after TfLite constructs the 58th Interpreter. "ERROR: Failed to create a compute context - Out of host memory". Computer has 8 GB RAM.

(4) Geforce GT 1030, Ubuntu, graphics driver version: 470.256.02. Ubuntu Desktop Environment becomes very slow around 40 Interpreters, and the Ubuntu Desktop Environment crashes around 50 Interpreters. Computer has 8 GB RAM.

I measured CPU usage and maximum RAM usage (via /usr/bin/time -v ./application) of a minimal-working example:

num_interpreters = 10
    Percent of CPU this job got: 89%
    Maximum resident set size (kbytes): 1276484
num_interpreters = 20
    Percent of CPU this job got: 98%
    Maximum resident set size (kbytes): 2522700
num_interpreters = 30
    Percent of CPU this job got: 99%
    Maximum resident set size (kbytes): 3771720
num_interpreters = 40
    Percent of CPU this job got: 99%
    Maximum resident set size (kbytes): 5023688 = 4906 megabytes = 4.8 gigabytes

The CPU implementation (comment out #define TFLITE_GPU_ENABLE in minimal-working-example code below) uses very little RAM:

num_interpreters = 10
    Percent of CPU this job got: 100%
    Maximum resident set size (kbytes): 8064
num_interpreters = 20
    Percent of CPU this job got: 75%
    Maximum resident set size (kbytes): 8832
num_interpreters = 30
    Percent of CPU this job got: 100%
    Maximum resident set size (kbytes): 9344
num_interpreters = 40
    Percent of CPU this job got: 99%
    Maximum resident set size (kbytes): 9984 = 9.8 megabytes
num_interpreters = 50
    Percent of CPU this job got: 88%
    Maximum resident set size (kbytes): 11008

I believe cases (2)-(4) crash for the same reason: insufficient host memory. Why does TfLite (or Nvidia) consume so much memory? (Could it be because of the many CL contexts created?)

No crashes on Intel GPUs.

Our application employs 10 DNN models. So 4 Interpreters per DNN model, which is not that many, is already too much for the RTX 2080.

I have been able to construct a minimal-working example to reproduce the issue (see below).

Standalone code to reproduce the issue

Build commands on Windows (in Command Prompt):
git clone --single-branch --branch nightly https://github.com/tensorflow/tensorflow tensorflow_src
mkdir tflite_x64_release
cd tflite_x64_release
cmake -G "Visual Studio 17 2022" -A x64 -DCMAKE_MSVC_RUNTIME_LIBRARY=MultiThreaded -DCMAKE_BUILD_TYPE=release -DTFLITE_ENABLE_GPU=ON ..\tensorflow_src\tensorflow\lite
cmake --build . -j 8 --config release

Build commands on Ubuntu:
git clone --single-branch --branch nightly https://github.com/tensorflow/tensorflow tensorflow_src
mkdir tflite_x64_release
cd tflite_x64_release
cmake -DCMAKE_BUILD_TYPE=release -DTFLITE_ENABLE_GPU=ON ../tensorflow_src/tensorflow/lite
cmake --build . -j 8 --config release


#define TFLITE_GPU_ENABLE
#include "tensorflow/lite/logger.h"
#include "tensorflow/lite/model.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/delegates/gpu/delegate.h"
#include <array>
#include <iostream>

int main() {
    tflite::LoggerOptions::SetMinimumLogSeverity(tflite::TFLITE_LOG_VERBOSE);

    std::unique_ptr<tflite::FlatBufferModel> model = tflite::FlatBufferModel::BuildFromFile("./lite-model_deeplabv3_1_metadata_2.tflite");  //Model from https://www.tensorflow.org/lite/examples/segmentation/overview
    tflite::ops::builtin::BuiltinOpResolver resolver;
    tflite::InterpreterBuilder interpreter_builder(*model, resolver);
    interpreter_builder.SetNumThreads(1);
    #ifdef TFLITE_GPU_ENABLE
    TfLiteDelegate* gpu_delegate = TfLiteGpuDelegateV2Create(nullptr);
    interpreter_builder.AddDelegate(gpu_delegate);
    #endif

    //Construct multiple Interpreters
    constexpr int num_interpreters = 50;
    std::array<std::unique_ptr<tflite::Interpreter>, num_interpreters> interpreters;
    for (int i = 0; i < num_interpreters; ++i) {
        interpreter_builder(&interpreters[i]);
    }

    std::cout << "Done\n";
    return EXIT_SUCCESS;
}

Relevant log output

No response

Build/release Python 3.11 tflite-runtime MacOS wheels to PyPI

Issue type

Feature Request

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

source

TensorFlow version

2.14.0

Custom code

Yes

OS platform and distribution

macOS

Mobile device

No response

Python version

3.11

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

We currently need to run automated jobs on macOS using tflite-runtime and Python 3.11, but the wheels on PyPI only cover Linux.

Standalone code to reproduce the issue

pip install tflite-runtime

Relevant log output

ERROR: Could not find a version that satisfies the requirement tflite-runtime (from versions: none)
ERROR: No matching distribution found for tflite-runtime

multiple tflite delegate support?

Issue type

Feature Request

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

tf2.13

Custom code

No

OS platform and distribution

No response

Mobile device

No response

Python version

No response

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

Currently, if you use the C API, you can invoke various delegates, e.g. TfLiteHexagonDelegateCreate() for Hexagon, TfLiteGpuDelegateV2Create() for GPU, and TfLiteXNNPackDelegateOptionsDefault() for CPU.
Note that you can only use one at a time. For example, if you have a model with an op not supported by Hexagon, that op will run on the CPU without using XNNPACK.
However, this is not the case with benchmark_model, which applies the XNNPACK delegate by default, so you can get both Hexagon and XNNPACK in one run.
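
For context, a rough sketch of stacking delegates from Python, assuming external delegate shared libraries (the library paths below are placeholders, and this is not the C API the request asks for). Delegates passed via experimental_delegates are applied in list order, and ops not claimed by any delegate run on the default CPU kernels.

import tflite_runtime.interpreter as tflite

# External delegate shared libraries; the paths are placeholders.
hexagon = tflite.load_delegate("libhexagon_delegate.so")
gpu = tflite.load_delegate("libtensorflowlite_gpu_delegate.so")

# Delegates are applied in order; unclaimed ops fall back to the CPU kernels.
interpreter = tflite.Interpreter(
    "model.tflite",
    experimental_delegates=[hexagon, gpu],
)
interpreter.allocate_tensors()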

Standalone code to reproduce the issue

I would like to see the C API incorporate the benchmark_model behavior, or provide a similar API (e.g. create all delegates and pass them down to a new API such as TfLiteDelegateCreate(<delegate>)).

Relevant log output

No response

GPUv2 segfaults on split-head attention CLIP model

System information

  • Google Pixel 7 / Android 13 / Google Tensor G2
  • TFLite 2.16.1 (stock)

Standalone code to reproduce the issue

Model asset: tflite_66721_sha_clip_gpuv2_segfault.tflite

Run model through TFLite (GPUv2) on an Android device (for instance through benchmark tool).

Any other info / logs

Runtime log (executed on https://aihub.qualcomm.com/)

[30/Apr/2024:10:26:55 -07:00: profiler/info] -=- Tungsten Initializing -=-
[30/Apr/2024:10:26:55 -07:00: profiler/info] Android system property: ro.board.platform = gs201
[30/Apr/2024:10:26:55 -07:00: profiler/info] Android system property: ro.boot.hardware = panther
[30/Apr/2024:10:26:55 -07:00: profiler/info] Android system property: ro.boot.hardware.platform = gs201
[30/Apr/2024:10:26:55 -07:00: profiler/info] Android system property: ro.system.build.id = TQ1A.221205.011
[30/Apr/2024:10:26:55 -07:00: profiler/info] Android system property: ro.system.build.version.release = 13
[30/Apr/2024:10:26:55 -07:00: profiler/info] Android system property: ro.hardware = panther
[30/Apr/2024:10:26:55 -07:00: profiler/info] Android system property: ro.hardware.chipname = 
[30/Apr/2024:10:26:55 -07:00: profiler/info] Android system property: ro.product.board = panther
[30/Apr/2024:10:26:55 -07:00: profiler/info] Android system property: ro.product.brand = google
[30/Apr/2024:10:26:55 -07:00: profiler/info] Android system property: ro.product.device = panther
[30/Apr/2024:10:26:55 -07:00: profiler/info] Android system property: ro.product.build.fingerprint = google/panther/panther:13/TQ1A.221205.011/9244662:user/release-keys
[30/Apr/2024:10:26:55 -07:00: profiler/info] Android system property: ro.product.manufacturer = Google
[30/Apr/2024:10:26:55 -07:00: profiler/info] Android system property: ro.product.model = Pixel 7
[30/Apr/2024:10:26:55 -07:00: profiler/info] Android system property: ro.product.name = panther
[30/Apr/2024:10:26:55 -07:00: profiler/info] Android system property: ro.soc.manufacturer = Google
[30/Apr/2024:10:26:55 -07:00: profiler/info] Android system property: ro.soc.model = GS201
[30/Apr/2024:10:26:55 -07:00: profiler/info] [Manager] DeviceManager::DeviceManager
[30/Apr/2024:10:26:55 -07:00: profiler/info] [Manager] findAvailableDevices
[30/Apr/2024:10:26:55 -07:00: profiler/info] [Manager] Found interface google-edgetpu (version = 2.0)
[30/Apr/2024:10:26:55 -07:00: profiler/info] [Manager] Found interface google-armnn (version = ArmNN)
[30/Apr/2024:10:26:55 -07:00: profiler/info] NNAPI devices: google-edgetpu,google-armnn,nnapi-reference
[30/Apr/2024:10:26:55 -07:00: profiler/info] GPU device: ARM Mali-G710
[30/Apr/2024:10:26:55 -07:00: profiler/info] OpenGL Version: OpenGL ES 3.2 v1.r36p0-01eac0.1f36dec337e44918d811de9a8a2acf4d
[30/Apr/2024:10:26:55 -07:00: profiler/info] OpenCL Version: OpenCL C 1.2 v1.r36p0-01eac0.1f36dec337e44918d811de9a8a2acf4d
[30/Apr/2024:10:26:55 -07:00: profiler/info] -=- Tungsten Running Task: Loading -=-
[30/Apr/2024:10:26:55 -07:00: profiler/info] Detected chipset 3101, made by 3000.
[30/Apr/2024:10:26:55 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] Loading tflite model Models/model.tflite
[30/Apr/2024:10:26:55 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] Malloc VM size before: 24632.0 kB, allocated: 13796.0 kB, slack: 10836.0 kB.
[30/Apr/2024:10:26:55 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] Current memory baseline range: 57552.0-68388.0 kB.
[30/Apr/2024:10:26:55 -07:00: profiler/debug] [job_id: jygz19nxp] [model.tflite] Runtime metadata not found in Models/model.tflite/trt_metadata.json or Models/model.tflite/trt_metadata.pb
[30/Apr/2024:10:26:55 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] TF Lite version 2.16.1. Loading model from Models/model.tflite.
[30/Apr/2024:10:26:55 -07:00: profiler/debug] [job_id: jygz19nxp] [model.tflite] Mapping resource file in Models/model.tflite
[30/Apr/2024:10:26:55 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] Loaded model. Minimum TF Lite version = 2.3.0.
[30/Apr/2024:10:26:55 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] No delegates specified; using compute unit=cpu_and_gpu.
[30/Apr/2024:10:26:55 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] [tflite] Initialized TensorFlow Lite runtime.
[30/Apr/2024:10:26:55 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] GPUV2 delegate requested. OpenCL detected.
[30/Apr/2024:10:26:55 -07:00: profiler/debug] [job_id: jygz19nxp] [model.tflite] Enabling delegate cache in dir=/data/user/0/ai.tetra.tungsten/cache/1714498015468/ai.tetra.runtime/0.6.0/model.tflite_8945450969824422876_1714498006500/gpuv2.
[30/Apr/2024:10:26:55 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] [tflite] Created TensorFlow Lite delegate for GPU.
[30/Apr/2024:10:26:55 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] [tflite] Replacing 2003 out of 2003 node(s) with delegate (TfLiteGpuDelegateV2) node, yielding 1 partitions for the whole graph.
[30/Apr/2024:10:26:56 -07:00: profiler/warning] [job_id: jygz19nxp] [model.tflite] [tflite] File /data/user/0/ai.tetra.tungsten/cache/1714498015468/ai.tetra.runtime/0.6.0/model.tflite_8945450969824422876_1714498006500/gpuv2/gpuv2_1297717803319390986.bin couldn't be opened for reading: No such file or directory
[30/Apr/2024:10:27:00 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] [tflite] Initialized OpenCL-based API.
[30/Apr/2024:10:27:01 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] [tflite] Created 1 GPU delegate kernels.
[30/Apr/2024:10:27:01 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] Applied 1 delegates: GPUV2/OpenCL. Model is fully delegated=true.
[30/Apr/2024:10:27:01 -07:00: profiler/debug] [job_id: jygz19nxp] [model.tflite] Saving delegate selection for subsequent steps.
[30/Apr/2024:10:27:01 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] Malloc VM size after: 256412.0 kB, allocated: 233690.0 kB, slack: 22722.0 kB.
[30/Apr/2024:10:27:01 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] Status Successfully Loaded Cold with t = 5381726 us and usage: before = 68388.0 kB; peakBefore = 68388.0 kB; mallocUnusedBefore = 10836.0 kB; after = 291732.0 kB; peakAfter = 805160.0 kB; mallocUnusedAfter = 22722.0 kB; increase = 200622.0-211458.0 kB; peak = 736772.0-747608.0 kB
[30/Apr/2024:10:27:01 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] Saving results to /storage/emulated/0/Android/data/ai.tetra.tungsten/files/Results/job_jygz19nxp/job_jygz19nxp_results.bin
[30/Apr/2024:10:27:01 -07:00: profiler/info] -=- Tungsten Running Task: Loading -=-
[30/Apr/2024:10:27:01 -07:00: profiler/info] Detected chipset 3101, made by 3000.
[30/Apr/2024:10:27:01 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] Loading previously saved results in /storage/emulated/0/Android/data/ai.tetra.tungsten/files/Results/job_jygz19nxp/job_jygz19nxp_results.bin
[30/Apr/2024:10:27:01 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] Loading tflite model Models/model.tflite
[30/Apr/2024:10:27:01 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] Malloc VM size before: 77880.0 kB, allocated: 16704.4 kB, slack: 61175.6 kB.
[30/Apr/2024:10:27:01 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] Current memory baseline range: 25732.4-86908.0 kB.
[30/Apr/2024:10:27:01 -07:00: profiler/debug] [job_id: jygz19nxp] [model.tflite] Runtime metadata not found in Models/model.tflite/trt_metadata.json or Models/model.tflite/trt_metadata.pb
[30/Apr/2024:10:27:01 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] TF Lite version 2.16.1. Loading model from Models/model.tflite.
[30/Apr/2024:10:27:01 -07:00: profiler/debug] [job_id: jygz19nxp] [model.tflite] Mapping resource file in Models/model.tflite
[30/Apr/2024:10:27:01 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] Loaded model. Minimum TF Lite version = 2.3.0.
[30/Apr/2024:10:27:01 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] GPUV2 delegate requested. OpenCL detected.
[30/Apr/2024:10:27:01 -07:00: profiler/debug] [job_id: jygz19nxp] [model.tflite] Enabling delegate cache in dir=/data/user/0/ai.tetra.tungsten/cache/1714498015468/ai.tetra.runtime/0.6.0/model.tflite_8945450969824422876_1714498006500/gpuv2.
[30/Apr/2024:10:27:01 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] [tflite] Replacing 2003 out of 2003 node(s) with delegate (TfLiteGpuDelegateV2) node, yielding 1 partitions for the whole graph.
[30/Apr/2024:10:27:02 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] [tflite] Found serialized data for model gpuv2 (175507208 B) at /data/user/0/ai.tetra.tungsten/cache/1714498015468/ai.tetra.runtime/0.6.0/model.tflite_8945450969824422876_1714498006500/gpuv2/gpuv2_1297717803319390986.bin
[30/Apr/2024:10:27:02 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] [tflite] Initialized OpenCL-based API from serialized data.
[30/Apr/2024:10:27:02 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] [tflite] Created 1 GPU delegate kernels.
[30/Apr/2024:10:27:02 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] Applied 1 delegates: GPUV2/OpenCL. Model is fully delegated=true.
[30/Apr/2024:10:27:02 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] Malloc VM size after: 252240.0 kB, allocated: 225091.0 kB, slack: 27149.0 kB.
[30/Apr/2024:10:27:02 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] Status Successfully Loaded Warm with t = 1281645 us and usage: before = 86908.0 kB; peakBefore = 86908.0 kB; mallocUnusedBefore = 61175.6 kB; after = 283312.0 kB; peakAfter = 785988.0 kB; mallocUnusedAfter = 27149.0 kB; increase = 169255.0-230430.6 kB; peak = 699080.0-760255.6 kB
[30/Apr/2024:10:27:03 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] Saving results to /storage/emulated/0/Android/data/ai.tetra.tungsten/files/Results/job_jygz19nxp/job_jygz19nxp_results.bin
[30/Apr/2024:10:27:03 -07:00: profiler/info] -=- Tungsten Running Task: Performing inference by layer -=-
[30/Apr/2024:10:27:03 -07:00: profiler/info] Detected chipset 3101, made by 3000.
[30/Apr/2024:10:27:03 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] Loading previously saved results in /storage/emulated/0/Android/data/ai.tetra.tungsten/files/Results/job_jygz19nxp/job_jygz19nxp_results.bin
[30/Apr/2024:10:27:03 -07:00: profiler/debug] [job_id: jygz19nxp] [model.tflite] Starting profiler
[30/Apr/2024:10:27:03 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] Loading tflite model Models/model.tflite
[30/Apr/2024:10:27:03 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] Malloc VM size before: 77880.0 kB, allocated: 16961.8 kB, slack: 60918.2 kB.
[30/Apr/2024:10:27:03 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] Current memory baseline range: 45341.8-106260.0 kB.
[30/Apr/2024:10:27:03 -07:00: profiler/debug] [job_id: jygz19nxp] [model.tflite] Runtime metadata not found in Models/model.tflite/trt_metadata.json or Models/model.tflite/trt_metadata.pb
[30/Apr/2024:10:27:03 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] TF Lite version 2.16.1. Loading model from Models/model.tflite.
[30/Apr/2024:10:27:03 -07:00: profiler/debug] [job_id: jygz19nxp] [model.tflite] Mapping resource file in Models/model.tflite
[30/Apr/2024:10:27:03 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] Loaded model. Minimum TF Lite version = 2.3.0.
[30/Apr/2024:10:27:03 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] GPUV2 delegate requested. OpenCL detected.
[30/Apr/2024:10:27:03 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] [tflite] Replacing 2003 out of 2003 node(s) with delegate (TfLiteGpuDelegateV2) node, yielding 1 partitions for the whole graph.
[30/Apr/2024:10:27:07 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] [tflite] Created 1 GPU delegate kernels.
[30/Apr/2024:10:27:07 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] Applied 1 delegates: GPUV2/OpenCL. Model is fully delegated=true.
[30/Apr/2024:10:27:07 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] Malloc VM size after: 259724.0 kB, allocated: 243057.6 kB, slack: 16666.4 kB.
[30/Apr/2024:10:27:07 -07:00: profiler/info] [job_id: jygz19nxp] [model.tflite] Status Successfully Loaded Warm with t = 3769509 us and usage: before = 106260.0 kB; peakBefore = 106260.0 kB; mallocUnusedBefore = 60918.2 kB; after = 300360.0 kB; peakAfter = 635204.0 kB; mallocUnusedAfter = 16666.4 kB; increase = 177433.6-238351.8 kB; peak = 528944.0-589862.2 kB

The process ended because of a segmentation fault. Consult the runtime log for more details.
The following is the suspected stack trace.
 * ? (/vendor/lib64/egl/libGLES_mali.so)
 * ? (/vendor/lib64/egl/libGLES_mali.so)
 * ? (/vendor/lib64/egl/libGLES_mali.so)
 * ? (/vendor/lib64/egl/libGLES_mali.so)
 * ? (/vendor/lib64/egl/libGLES_mali.so)
 * clEnqueueNDRangeKernel (/vendor/lib64/egl/libGLES_mali.so)
 * tflite::gpu::cl::CLCommandQueue::Dispatch() (/data/app/~~58txPSlH_K9lY1T48CQEEw==/ai.tetra.tungsten-qGxzJmjd0hwWY5NXpr8n5A==/lib/arm64/libtungsten-native-bridge.so)
 * tflite::gpu::cl::ProfilingCommandQueue::DispatchNTimes() (/data/app/~~58txPSlH_K9lY1T48CQEEw==/ai.tetra.tungsten-qGxzJmjd0hwWY5NXpr8n5A==/lib/arm64/libtungsten-native-bridge.so)
 * tflite::gpu::cl::InferenceContext::ProfileTime() (/data/app/~~58txPSlH_K9lY1T48CQEEw==/ai.tetra.tungsten-qGxzJmjd0hwWY5NXpr8n5A==/lib/arm64/libtungsten-native-bridge.so)
 * tflite::gpu::cl::InferenceContext::Profile() (/data/app/~~58txPSlH_K9lY1T48CQEEw==/ai.tetra.tungsten-qGxzJmjd0hwWY5NXpr8n5A==/lib/arm64/libtungsten-native-bridge.so)
 * ? (/data/app/~~58txPSlH_K9lY1T48CQEEw==/ai.tetra.tungsten-qGxzJmjd0hwWY5NXpr8n5A==/lib/arm64/libtungsten-native-bridge.so)
 * ? (/data/app/~~58txPSlH_K9lY1T48CQEEw==/ai.tetra.tungsten-qGxzJmjd0hwWY5NXpr8n5A==/lib/arm64/libtungsten-native-bridge.so)
 * ? (/data/app/~~58txPSlH_K9lY1T48CQEEw==/ai.tetra.tungsten-qGxzJmjd0hwWY5NXpr8n5A==/lib/arm64/libtungsten-native-bridge.so)
 * tflite::Subgraph::InvokeImpl() (/data/app/~~58txPSlH_K9lY1T48CQEEw==/ai.tetra.tungsten-qGxzJmjd0hwWY5NXpr8n5A==/lib/arm64/libtungsten-native-bridge.so)
 * tflite::Subgraph::Invoke() (/data/app/~~58txPSlH_K9lY1T48CQEEw==/ai.tetra.tungsten-qGxzJmjd0hwWY5NXpr8n5A==/lib/arm64/libtungsten-native-bridge.so)
 * tflite::impl::Interpreter::Invoke() (/data/app/~~58txPSlH_K9lY1T48CQEEw==/ai.tetra.tungsten-qGxzJmjd0hwWY5NXpr8n5A==/lib/arm64/libtungsten-native-bridge.so)
 * backend::tflite::TfLiteModel::Run() (/data/app/~~58txPSlH_K9lY1T48CQEEw==/ai.tetra.tungsten-qGxzJmjd0hwWY5NXpr8n5A==/lib/arm64/libtungsten-native-bridge.so)
 * tungsten::Profiler::ProfileOrValidate() (/data/app/~~58txPSlH_K9lY1T48CQEEw==/ai.tetra.tungsten-qGxzJmjd0hwWY5NXpr8n5A==/lib/arm64/libtungsten-native-bridge.so)
 * tungsten::ProfilerRunner::ProfileModels() (/data/app/~~58txPSlH_K9lY1T48CQEEw==/ai.tetra.tungsten-qGxzJmjd0hwWY5NXpr8n5A==/lib/arm64/libtungsten-native-bridge.so)
 * tungsten::ProfilerRunner::RunTask() (/data/app/~~58txPSlH_K9lY1T48CQEEw==/ai.tetra.tungsten-qGxzJmjd0hwWY5NXpr8n5A==/lib/arm64/libtungsten-native-bridge.so)
 * Java_ai_tetra_tungsten_ProfilerRunner_profileModels (/data/app/~~58txPSlH_K9lY1T48CQEEw==/ai.tetra.tungsten-qGxzJmjd0hwWY5NXpr8n5A==/lib/arm64/libtungsten-native-bridge.so)
 * ? (/apex/com.android.art/lib64/libart.so)
 * ? (/apex/com.android.art/lib64/libart.so)
 * ? (/apex/com.android.art/lib64/libart.so)

tflite model maker installation issue in kaggle

Unable to install and use the tflite-model-maker package in Kaggle (GPU P100). Below are the complete logs:
The following NEW packages will be installed:
libportaudio2
0 upgraded, 1 newly installed, 0 to remove and 68 not upgraded.
Need to get 65.3 kB of archives.
After this operation, 223 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libportaudio2 amd64 19.6.0-1.1 [65.3 kB]
Fetched 65.3 kB in 0s (184 kB/s)

Selecting previously unselected package libportaudio2:amd64.
(Reading database ... 122996 files and directories currently installed.)
Preparing to unpack .../libportaudio2_19.6.0-1.1_amd64.deb ...
Unpacking libportaudio2:amd64 (19.6.0-1.1) ...
Setting up libportaudio2:amd64 (19.6.0-1.1) ...
Processing triggers for libc-bin (2.35-0ubuntu3.8) ...

ERROR: Could not find a version that satisfies the requirement scann==1.2.6 (from tflite-model-maker) (from versions: 1.2.7, 1.2.8, 1.2.9, 1.2.10, 1.3.0, 1.3.1, 1.3.2, 1.3.3, 1.3.4)
ERROR: No matching distribution found for scann==1.2.6 (from tflite-model-maker)
ERROR: Ignored the following yanked versions: 3.4.11.39, 3.4.11.41, 4.4.0.40, 4.4.0.42, 4.4.0.44, 4.5.5.62, 4.7.0.68, 4.8.0.74
ERROR: Could not find a version that satisfies the requirement opencv-python-headless==4.1.2.30 (from versions: 3.4.10.37, 3.4.11.43, 3.4.11.45, 3.4.13.47, 3.4.15.55, 3.4.16.59, 3.4.17.61, 3.4.17.63, 3.4.18.65, 4.3.0.38, 4.4.0.46, 4.5.1.48, 4.5.3.56, 4.5.4.58, 4.5.4.60, 4.5.5.64, 4.6.0.66, 4.7.0.72, 4.8.0.76, 4.8.1.78, 4.9.0.80, 4.10.0.82, 4.10.0.84)
ERROR: No matching distribution found for opencv-python-headless==4.1.2.30
Found existing installation: tensorflow 2.16.1
Uninstalling tensorflow-2.16.1:
Successfully uninstalled tensorflow-2.16.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow-decision-forests 1.9.1 requires tensorflow~=2.16.1, but you have tensorflow 2.8.0 which is incompatible.
tensorflow-serving-api 2.16.1 requires tensorflow<3,>=2.16.1, but you have tensorflow 2.8.0 which is incompatible.
tensorflow-text 2.16.1 requires tensorflow<2.17,>=2.16.1; platform_machine != "arm64" or platform_system != "Darwin", but you have tensorflow 2.8.0 which is incompatible.
tf-keras 2.16.0 requires tensorflow<2.17,>=2.16, but you have tensorflow 2.8.0 which is incompatible.

I am using the following code to install:

!sudo apt -y install libportaudio2
!pip install -q --use-deprecated=legacy-resolver tflite-model-maker
!pip install -q pycocotools
!pip install -q opencv-python-headless==4.1.2.30
!pip uninstall -y tensorflow && pip install -q tensorflow==2.8.0

TfLite: undefined symbol TfLiteGpuDelegateV2Create in android

Issue type

Support

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

2.15

Custom code

Yes

OS platform and distribution

macOS Big Sur

Mobile device

aarch64 device

Python version

No response

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

Hi,

I have a whisper TFLite model and an Android app sourced from https://github.com/nyadla-sys/whisper.tflite/tree/main/whisper_android, which works well on the CPU of an aarch64 target similar to a Raspberry Pi.

I wanted to add support for the GPU and NNAPI delegates in the app source code in Android Studio.
Note that the model (or at least some of its ops) is unsupported on GPU/NNAPI. According to my understanding, if a model op is unsupported on GPU/NNAPI, it will fall back onto the CPU, so the model should still fall back and execute on the CPU.

When I add support for the GPU delegate using the code snippet below, sourced from https://www.tensorflow.org/lite/android/delegates/gpu_native#enable_gpu_acceleration,

// Set up interpreter
auto model = FlatBufferModel::BuildFromFile(model_path);
if (!model) return false;
ops::builtin::BuiltinOpResolver op_resolver;
std::unique_ptr<Interpreter> interpreter;
InterpreterBuilder(*model, op_resolver)(&interpreter);

// NEW: Prepare GPU delegate.
auto* delegate = TfLiteGpuDelegateV2Create(/*default options=*/nullptr);
if (interpreter->ModifyGraphWithDelegate(delegate) != kTfLiteOk) return false;

// Run inference.
WriteToInputTensor(interpreter->typed_input_tensor<float>(0));
if (interpreter->Invoke() != kTfLiteOk) return false;
ReadFromOutputTensor(interpreter->typed_output_tensor<float>(0));

// NEW: Clean up.
TfLiteGpuDelegateV2Delete(delegate);

I get an Android build error saying

ld: error: undefined symbol: TfLiteGpuDelegateV2Create

I have added the relevant header file to my C++ code, and my Gradle build file sources the CMakeLists.txt file from the C++ source directory to build the app.

My understanding is that I will get the 'undefined symbol' error if the model is unsupported on the delegate. Is my understanding correct?
If the error is truly because of an unsupported model, shouldn't the app still compile and simply run on the CPU via fallback, instead of erroring out at build time?

I get the same error if I try to add NNAPI support as well.
Why is fallback from the delegate to the CPU not working at the Android build stage, given that the whisper TFLite model works well on the CPU?

thanks

Standalone code to reproduce the issue

// code snippet for GPU delegate addition

const TfLiteGpuDelegateOptionsV2 options = TfLiteGpuDelegateOptionsV2Default();
TfLiteDelegate* delegate = TfLiteGpuDelegateV2Create(&options);

if (interpreter->ModifyGraphWithDelegate(delegate) != kTfLiteOk) return "";
interpreter->SetNumThreads(4);
if (interpreter->Invoke() != kTfLiteOk) return "";

Relevant log output

No response

[RNN] TFLite does not appear to be using the UnidirectionalSequenceLSTM

1. System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS
  • TensorFlow installation (pip package or built from source): from pip
  • TensorFlow library (version, if pip package or github SHA, if built from source): keras-nightly-3.4.1.dev2024080603 tb-nightly-2.18.0a20240806 tf-nightly-2.18.0.dev20240806

2. Code

  1. https://colab.research.google.com/gist/eric/f00f071e527f9fa7b2ed39f8d482fbb4/tensorflow-datasets.ipynb
  2. https://colab.research.google.com/gist/eric/a292799568831371b7686a2b8cefcd0b/tensorflow-lite-debugger-colab.ipynb

model.zip

3. Failure after conversion

I've found that any Keras LSTM causes this issue.

I expected TFLite to use the UnidirectionalSequenceLSTM op, but instead it seems to be doing something else that then requires flex ops, which I would like to avoid having to get working in my TFLite deployment situation.
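
For illustration, a minimal sketch (assuming a simple Keras LSTM model, not the reporter's model) of restricting conversion to builtin ops and inspecting the result; this is one way to surface which op blocks the fused builtin lowering.

import tensorflow as tf

# Placeholder model: a single Keras LSTM with a fixed sequence length.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20, 8)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(4),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Allow only builtin TFLite ops, so conversion fails loudly instead of
# silently pulling in flex (SELECT_TF_OPS) kernels.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
tflite_model = converter.convert()

with open("lstm.tflite", "wb") as f:
    f.write(tflite_model)

# List the ops in the converted model (UNIDIRECTIONAL_SEQUENCE_LSTM is
# expected when the fused lowering succeeds).
tf.lite.experimental.Analyzer.analyze(model_content=tflite_model)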

[Bug] compiling the tf lite benchmark tool fails on macos (CMake)

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

source

TensorFlow version

2.15

Custom code

No

OS platform and distribution

MacOS 13.5.2

Mobile device

No response

Python version

No response

Bazel version

No response

GCC/compiler version

Apple clang version 15.0.0 (clang-1500.1.0.2.5) Target: arm64-apple-darwin22.6.0 Thread model: posix InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

I am trying to compile the TF Lite benchmark tool on macOS.
For this, I ran the following commands on TF 2.15 and master:

git clone https://github.com/tensorflow/tensorflow.git tensorflow

mkdir tflite_build
cd tflite_build

cmake ../tensorflow/tensorflow/lite

cmake --build . -j -t benchmark_model

This fails with the following error:

[ 95%] Linking CXX executable benchmark_model
ld: Undefined symbols:
  _TfLiteCoreMlDelegateCreate, referenced from:
      tflite::evaluation::CreateCoreMlDelegate() in utils.cc.o
  _TfLiteCoreMlDelegateDelete, referenced from:
      tflite::evaluation::CreateCoreMlDelegate() in utils.cc.o
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[3]: *** [tools/benchmark/benchmark_model] Error 1
make[2]: *** [tools/benchmark/CMakeFiles/benchmark_model.dir/all] Error 2
make[1]: *** [tools/benchmark/CMakeFiles/benchmark_model.dir/rule] Error 2
make: *** [benchmark_model] Error 2

I would expect the benchmark to be compiled successfully.

Standalone code to reproduce the issue

Look above

Relevant log output

cmake --build . -j -t benchmark_model
[  0%] Built target fft2d_fftsg
[  0%] Built target microkernel-utils
[  0%] Built target pthreadpool
[  0%] Built target ruy_system_aligned_alloc
[  0%] Built target ruy_have_built_path_for_avx512
[  0%] Built target ruy_have_built_path_for_avx2_fma
[  0%] Built target absl_flags_commandlineflag_internal
[  4%] Built target ruy_have_built_path_for_avx
[  4%] Built target absl_spinlock_wait
[  4%] Built target ruy_profiler_instrumentation
[  4%] Built target cpuinfo
[  4%] Built target ruy_denormal
[  4%] Built target absl_exponential_biased
[  4%] Built target eight_bit_int_gemm
[  4%] Built target ruy_wait
[  4%] Built target farmhash
[  4%] Built target absl_civil_time
[  4%] Built target absl_log_severity
[  4%] Built target absl_int128
[  4%] Built target ruy_apply_multiplier
[  9%] Built target indirection
[  9%] Built target fft2d_fftsg2d
[  9%] Built target normalization
[  9%] Built target absl_strerror
[ 14%] Built target logging
[ 14%] Built target packing
[ 14%] Built target allocator
[ 14%] Built target microparams-init
[ 14%] Built target flatbuffers
[ 42%] Built target microkernels-prod
[ 42%] Built target ruy_cpuinfo
[ 42%] Built target ruy_allocator
[ 42%] Built target absl_raw_logging_internal
[ 42%] Built target ruy_block_map
[ 42%] Built target absl_time_zone
[ 42%] Built target ruy_prepacked_cache
[ 42%] Built target memory
[ 42%] Built target ruy_blocking_counter
[ 42%] Built target hardware-config
[ 42%] Built target mutex
[ 42%] Built target operator-run
[ 42%] Built target operator-utils
[ 42%] Built target post-operation
[ 42%] Built target cache
[ 42%] Built target ruy_tune
[ 42%] Built target absl_bad_variant_access
[ 42%] Built target absl_debugging_internal
[ 42%] Built target absl_bad_optional_access
[ 42%] Built target absl_throw_delegate
[ 42%] Built target absl_cordz_functions
[ 42%] Built target ruy_thread_pool
[ 42%] Built target absl_base
[ 42%] Built target ruy_pack_arm
[ 42%] Built target absl_stacktrace
[ 42%] Built target ruy_kernel_avx512
[ 42%] Built target ruy_pack_avx2_fma
[ 42%] Built target ruy_kernel_avx
[ 42%] Built target ruy_pack_avx512
[ 42%] Built target ruy_kernel_arm
[ 47%] Built target operators
[ 47%] Built target ruy_pack_avx
[ 47%] Built target ruy_ctx
[ 47%] Built target absl_crc_cpu_detect
[ 47%] Built target absl_city
[ 52%] Built target absl_low_level_hash
[ 52%] Built target ruy_kernel_avx2_fma
[ 52%] Built target absl_demangle_internal
[ 52%] Built target absl_strings_internal
[ 52%] Built target absl_malloc_internal
[ 52%] Built target ruy_context
[ 52%] Built target ruy_trmul
[ 52%] Built target ruy_prepare_packed_matrices
[ 52%] Built target absl_graphcycles_internal
[ 52%] Built target absl_crc_internal
[ 57%] Built target subgraph
[ 57%] Built target ruy_context_get_ctx
[ 57%] Built target ruy_frontend
[ 61%] Built target absl_strings
[ 61%] Built target jit
[ 61%] Built target absl_symbolize
[ 61%] Built target absl_flags_commandlineflag
[ 61%] Built target absl_hash
[ 61%] Built target absl_crc32c
[ 61%] Built target absl_time
[ 61%] Built target XNNPACK
[ 61%] Built target absl_str_format_internal
[ 61%] Built target absl_flags_private_handle_accessor
[ 61%] Built target absl_crc_cord_state
[ 66%] Built target absl_flags_marshalling
[ 66%] Built target absl_synchronization
[ 66%] Built target absl_cord_internal
[ 66%] Built target absl_flags_program_name
[ 66%] Built target absl_cordz_handle
[ 66%] Built target absl_hashtablez_sampler
[ 66%] Built target absl_cordz_info
[ 66%] Built target absl_raw_hash_set
[ 66%] Built target absl_flags_config
[ 66%] Built target absl_flags_internal
[ 71%] Built target absl_cord
[ 71%] Built target absl_flags_reflection
[ 71%] Built target absl_status
[ 71%] Built target absl_flags
[ 95%] Built target tensorflow-lite
[ 95%] Linking CXX executable benchmark_model
ld: Undefined symbols:
  _TfLiteCoreMlDelegateCreate, referenced from:
      tflite::evaluation::CreateCoreMlDelegate() in utils.cc.o
  _TfLiteCoreMlDelegateDelete, referenced from:
      tflite::evaluation::CreateCoreMlDelegate() in utils.cc.o
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[3]: *** [tools/benchmark/benchmark_model] Error 1
make[2]: *** [tools/benchmark/CMakeFiles/benchmark_model.dir/all] Error 2
make[1]: *** [tools/benchmark/CMakeFiles/benchmark_model.dir/rule] Error 2
make: *** [benchmark_model] Error 2

Building TFLite fails on nnapi_delegate requiring C++20 extensions

Issue type

Build/Install

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

2.16.1

Custom code

No

OS platform and distribution

Docker

Mobile device

No response

Python version

3.11

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

Hi,

I'm following the steps to build TF Lite using Docker (https://www.tensorflow.org/lite/android/lite_build#set_up_build_environment_using_docker) with TF release 2.16.1 (also tried 2.15). There are no modifications to the sources, Dockerfile, or commands; the only change is limiting the target architecture to arm64-v8a.

The step of building the AAR for a given model fails because the nnapi delegate code uses designated initializers, which are a C++20 extension, while the rest of the tooling compiles with gnu++17.

Standalone code to reproduce the issue

docker build . -t tflite-builder -f tflite-android.Dockerfile
docker run -it -v $PWD:/host_dir tflite-builder bash

sdkmanager \
  "build-tools;${ANDROID_BUILD_TOOLS_VERSION}" \
  "platform-tools" \
  "platforms;android-${ANDROID_API_LEVEL}"

./configure

bazel build -c opt --cxxopt=--std=c++17 --config=android_arm64 \
  --fat_apk_cpu=arm64-v8a \
  --define=android_dexmerger_tool=d8_dexmerger \
  --define=android_incremental_dexing_tool=d8_dexbuilder \
  //tensorflow/lite/java:tensorflow-lite


bash tensorflow/lite/tools/build_aar.sh \
  --input_models=model.tflite \
  --target_archs=arm64-v8a

Relevant log output

ERROR: /host_dir/tensorflow-2.16.1/tensorflow/lite/delegates/nnapi/BUILD:14:11: Compiling tensorflow/lite/delegates/nnapi/nnapi_delegate.cc failed: (Exit 1): clang failed: error executing command (from target //tensorflow/lite/delegates/nnapi:nnapi_delegate_no_nnapi_implementation) external/androidndk/toolchains/llvm/prebuilt/linux-x86_64/bin/clang -no-canonical-prefixes '--target=aarch64-linux-android30' -fdiagnostics-color -Wa,--noexecstack -fno-exceptions '-std=gnu++17' ... (remaining 158 arguments skipped)
tensorflow/lite/delegates/nnapi/nnapi_delegate.cc:522:7: error: designated initializers are a C++20 extension [-Werror,-Wc++20-designator]
      .type = nn_type,
      ^
tensorflow/lite/delegates/nnapi/nnapi_delegate.cc:1529:45: error: designated initializers are a C++20 extension [-Werror,-Wc++20-designator]
    ANeuralNetworksOperandType operand_type{.type = nn_type};
                                            ^
tensorflow/lite/delegates/nnapi/nnapi_delegate.cc:1682:45: error: designated initializers are a C++20 extension [-Werror,-Wc++20-designator]
    ANeuralNetworksOperandType operand_type{.type = nn_type};
                                            ^
tensorflow/lite/delegates/nnapi/nnapi_delegate.cc:1701:45: error: designated initializers are a C++20 extension [-Werror,-Wc++20-designator]
    ANeuralNetworksOperandType operand_type{.type = nn_type,
                                            ^
tensorflow/lite/delegates/nnapi/nnapi_delegate.cc:1743:9: error: designated initializers are a C++20 extension [-Werror,-Wc++20-designator]
        .type = nn_type,
        ^
tensorflow/lite/delegates/nnapi/nnapi_delegate.cc:1844:17: error: designated initializers are a C++20 extension [-Werror,-Wc++20-designator]
                .channelDim = static_cast<uint32_t>(
                ^
tensorflow/lite/delegates/nnapi/nnapi_delegate.cc:6915:7: error: designated initializers are a C++20 extension [-Werror,-Wc++20-designator]
      .init = [](TfLiteContext* context, const char* buffer,
      ^
7 errors generated.
Target //tmp:tensorflow-lite failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 11.395s, Critical Path: 8.54s
INFO: 62 processes: 21 internal, 41 local.
FAILED: Build did NOT complete successfully

Compilation crashes on MacOS 15.0

Using MacOS 15.0. Running:

PYTHON=python3 tensorflow/lite/tools/pip_package/build_pip_package_with_cmake.sh native

as recommended, leads to a crash during compilation. Log:

[ 36%] Linking CXX static library libabsl_str_format_internal.a
cd /Users/feranick/Desktop/LiteRT/third_party/tensorflow/tensorflow/lite/tools/pip_package/gen/tflite_pip/python3/cmake_build/_deps/abseil-cpp-build/absl/strings && /opt/local/bin/cmake -P CMakeFiles/absl_str_format_internal.dir/cmake_clean_target.cmake
cd /Users/feranick/Desktop/LiteRT/third_party/tensorflow/tensorflow/lite/tools/pip_package/gen/tflite_pip/python3/cmake_build/_deps/abseil-cpp-build/absl/strings && /opt/local/bin/cmake -E cmake_link_script CMakeFiles/absl_str_format_internal.dir/link.txt --verbose=1
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ar qc libabsl_str_format_internal.a CMakeFiles/absl_str_format_internal.dir/internal/str_format/arg.cc.o CMakeFiles/absl_str_format_internal.dir/internal/str_format/bind.cc.o CMakeFiles/absl_str_format_internal.dir/internal/str_format/extension.cc.o CMakeFiles/absl_str_format_internal.dir/internal/str_format/float_conversion.cc.o CMakeFiles/absl_str_format_internal.dir/internal/str_format/output.cc.o CMakeFiles/absl_str_format_internal.dir/internal/str_format/parser.cc.o
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib libabsl_str_format_internal.a
gmake[3]: Leaving directory '/Users/feranick/Desktop/LiteRT/third_party/tensorflow/tensorflow/lite/tools/pip_package/gen/tflite_pip/python3/cmake_build'
[ 36%] Built target absl_str_format_internal
gmake[2]: Leaving directory '/Users/feranick/Desktop/LiteRT/third_party/tensorflow/tensorflow/lite/tools/pip_package/gen/tflite_pip/python3/cmake_build'
gmake[1]: *** [CMakeFiles/Makefile2:1532: CMakeFiles/_pywrap_tensorflow_interpreter_wrapper.dir/rule] Error 2
gmake[1]: Leaving directory '/Users/feranick/Desktop/LiteRT/third_party/tensorflow/tensorflow/lite/tools/pip_package/gen/tflite_pip/python3/cmake_build'
gmake: *** [Makefile:208: _pywrap_tensorflow_interpreter_wrapper] Error 2

libtensorflowlite_c is slower than tf.lite

Issue type

Performance

Have you reproduced the bug with TensorFlow Nightly?

No

Source

binary

TensorFlow version

v2.9.1

Custom code

No

OS platform and distribution

Linux NixOs

Mobile device

No response

Python version

3.10

Bazel version

5

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

Running a model on my Linux x86_64 machine in Python is about twice as fast as running the same model through the symbols exposed by the compiled libtensorflowlite_c.so. Why is the code exposed through Python's tensorflow.lite.Interpreter faster?
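For what it's worth, here is a minimal timing sketch on the Python side (the model path is a placeholder and a float32 input is assumed); one thing worth comparing is the thread count and the XNNPACK setting on both sides, since the prebuilt Python interpreter and a custom libtensorflowlite_c build may not be configured identically:

import time
import numpy as np
import tensorflow as tf

# Placeholder model path; num_threads made explicit for a fair comparison.
interpreter = tf.lite.Interpreter(model_path="model.tflite", num_threads=4)
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
data = np.random.rand(*inp["shape"]).astype(np.float32)

# Warm up once, then time repeated invocations.
interpreter.set_tensor(inp["index"], data)
interpreter.invoke()

start = time.perf_counter()
for _ in range(100):
    interpreter.set_tensor(inp["index"], data)
    interpreter.invoke()
print("avg ms:", (time.perf_counter() - start) / 100 * 1000)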

Standalone code to reproduce the issue

Bazel Build Command: "bazel" "--output_base=/target/x86_64-unknown-linux-gnu/release/build/tflitec-d675effdd7095bee/out/tensorflow_v2.9.1_output_base" "build" "-c" "opt" "--config=linux" "//tensorflow/lite/c/tmp:tensorflowlite_c" "--copt=-O3"

Relevant log output

No response

INT4 and other low-precision conversion support status

What is the current status of model conversion (PTQ specifically) with INT4 precision?

The question was raised before here tensorflow/tensorflow#60125.
Also it looks like INT4 support is being added to various parts of tensorflow, as evidenced by tensorflow/tensorflow#63870 and https://github.com/tensorflow/tensorflow/blob/6738c28cf4eed334d75d02b6e97fe16c34069ef1/tensorflow/compiler/mlir/quantization/tensorflow/quantization_options.proto#L76.

However, at the moment there seems to be no way to quantize a model to INT4 (specifically the weights).

Can anyone who actively works on this on the TF team shed some light on the current direction and where one needs to dig to add INT4 PTQ support?
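For reference, the closest documented flow today is INT8 post-training quantization; a minimal sketch is below (the saved-model path and calibration data are placeholders). As far as I can tell, this is where an INT4 option would have to plug in, since the converter currently only exposes int8/float16-style paths:

import numpy as np
import tensorflow as tf

def representative_dataset():
    # Placeholder calibration samples; replace with real data.
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Full-integer INT8 quantization; there is no equivalent INT4 switch here today.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()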

Support for 16KB page sizes on other processor architectures

Issue type

Feature Request

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

source

TensorFlow version

tensorflow-lite 2.16.1

Custom code

Yes

OS platform and distribution

Android 15

Mobile device

No response

Python version

No response

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

tensorflow-lite does not support 16 KB page sizes on the x86, x86_64, and armeabi-v7a architectures.

Android 15, which will be released next year, supports 16 KB page sizes.
I'm going to introduce it in my app to improve performance.
However, tensorflow-lite does not seem to support 16 KB page sizes.
Is it possible to support 16 KB page sizes?

My app supports the architecture below.

  • x86
  • x86_64
  • arm64-v8a
  • armeabi-v7a

I found that support for arm64-v8a was added through the following issue:
tensorflow/tensorflow#69459

Please review whether it is possible to support the x86, x86_64, and armeabi-v7a architectures as well.

The android guide related to this is as follows.
https://developer.android.com/guide/practices/page-sizes

Standalone code to reproduce the issue

1. APK with tensorflow-lite v2.16.1
2. Create an alignment.sh file for 16KB alignment test with reference to the link below
https://developer.android.com/guide/practices/page-sizes?hl=en#test
3. Run below command
$ ./alignment.sh ${UNZIPPED_APK_FOLDER}

currently the output is below,

apk/lib/x86/libtensorflowlite_jni.so: UNALIGNED (2**12)
apk/lib/x86_64/libtensorflowlite_jni.so: UNALIGNED (2**12)
apk/lib/arm64-v8a/libtensorflowlite_jni.so: UNALIGNED (2**12)
apk/lib/armeabi-v7a/libtensorflowlite_jni.so: UNALIGNED (2**12)

Relevant log output

No response

LiteRT runtime build not working for Linux+Aarch64

There is no aarch64 linux wheel for ai-edge-litert.
https://pypi.org/project/ai-edge-litert/#files

For tensorflow there is e.g. CPython 3.11 manylinux: glibc 2.17+ ARM64
https://pypi.org/project/tensorflow/#files

It appears not to be possible to build natively for aarch64+linux from source (using the ci/build_pip_package_with_bazel.sh script):

How to reproduce on Linux+Aarch64:
./ci/build_pip_package_with_bazel.sh

Building TFL runtime in tensorflow on Linux+Aarch64 works:
./tensorflow/lite/tools/pip_package/build_pip_package_with_cmake.sh

TensorflowLite for Windows on Arm

I have been working on making Tensorflow Lite available for the Windows on Arm platform. My current patch can be found here https://github.com/everton1984/tensorflow.git on the branch woa_enablement2. I have written a wiki on how to compile it, which can be accessed here https://linaro.atlassian.net/wiki/spaces/WOAR/pages/29206609925/Tensorflow+Lite. All that is left is getting the XNNPACK patch upstream, and I am currently discussing it with the maintainers.

I am wondering whether there is the possibility of, and interest in, supporting a new platform like Windows on Arm, and if so, whether that patch is enough or something else needs to be tackled.

tflite-support not compatible with python 3.11 aarch64 Raspberry Pi 4

Issue type

Build/Install

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

tflite-support 0.4.4

Custom code

No

OS platform and distribution

Raspberry Pi OS Bookworm 64 bit

Mobile device

Raspberry Pi

Python version

3.11.12

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

I am unable to install tflite-support into my environment, which uses Python 3.11.
Install method: pip3 install tflite-support

I also tried downloading the source and building it, but I still get the same errors when I try to install:

git clone https://github.com/tensorflow/tflite-support.git
cd tflite-support
pip install .

Standalone code to reproduce the issue

using python 3.11 install via pip
 pip3 install tflite-support

Relevant log output

No response

Unable to build TFLite GPU Delegate for Android on MacOS

Issue type

Build/Install

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

tf 2.16

Custom code

No

OS platform and distribution

macOS Sonoma 14.5

Mobile device

Android

Python version

3.12

Bazel version

7.2.0

GCC/compiler version

Apple clang 15.0.0

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

I am attempting to build the Android GPU delegate, but I am running into difficulties producing a shared library. I have followed the build guide found at: https://www.tensorflow.org/lite/android/delegates/gpu_native. I am trying to build on macOS Sonoma 14.5 as it is our development platform, and that appears to be causing issues during linking: the linker attempts to link the CoreFoundation framework, which doesn't make much sense for an Android build. Building on a Linux platform, we don't run into this issue.

When configuring, we aren't building with ROCm or CUDA support, we're using the default optimization flags, we're configuring the workspace for Android, and we're not supporting iOS.

Standalone code to reproduce the issue

export ANDROID_SDK_HOME="/Users/$USER/Library/Android/sdk"
export ANDROID_NDK_HOME="/Users/$USER/Library/Android/sdk/ndk/25.2.9519653"
export ANDROID_API_LEVEL="34"
export ANDROID_NDK_API_LEVEL="26"
export ANDROID_BUILD_TOOLS_VERSION="30.0.3"

./tensorflow/configure < AndroidConfiguration

bazel build -c opt --cxxopt=--std=c++20 --config=android_arm64 \
   --fat_apk_cpu=arm64-v8a \
   --define=android_dexmerger_tool=d8_dexmerger \
   --define=android_incremental_dexing_tool=d8_dexbuilder \
   --define=xnn_enable_arm_i8mm=false \
   --define=xnn_enable_arm_bf16=false \
   --verbose_failures \
   //tensorflow/lite/delegates/gpu:libtensorflowlite_gpu_delegate.so

Relevant log output

ld.lld: error: unknown argument '-framework'
ld.lld: error: cannot open CoreFoundation: No such file or directory
clang: error: linker command failed with exit code 1 (use -v to see invocation)
Target //tensorflow/lite/delegates/gpu:libtensorflowlite_gpu_delegate.so failed to build

Interpreter API (Java) - GpuDelegateV2 support

Hi,
I am trying to run a TFLite model on the GPU of an Android device.
According to this documentation, it is possible to use both the Interpreter API and the Native c++ API to achieve this.

At the moment, I am using the following dependencies:

implementation 'org.tensorflow:tensorflow-lite:2.15.0'
implementation 'org.tensorflow:tensorflow-lite-select-tf-ops:2.15.0'
implementation 'org.tensorflow:tensorflow-lite-support:0.4.4'
implementation 'org.tensorflow:tensorflow-lite-gpu:2.15.0'
implementation 'org.tensorflow:tensorflow-lite-gpu-api:2.15.0'
implementation 'org.tensorflow:tensorflow-lite-gpu-delegate-plugin:0.4.4'

I was able to successfully run my model using the GpuDelegate provided by the Java Interpreter API. However, this delegate does not allow specifying the inference priority options (TFLITE_GPU_INFERENCE_PRIORITY_MIN_LATENCY, TFLITE_GPU_INFERENCE_PRIORITY_MIN_MEMORY_USAGE, TFLITE_GPU_INFERENCE_PRIORITY_MAX_PRECISION).

These options can be specified when the native C++ API is used, given the presence of GpuDelegateV2. However, at the moment I don't see this option in the Interpreter API, since there is no class named GpuDelegateV2.

Is there a way to make use of this new delegate without having to use the native C++ API?

GPU delegate linker errors when building TFLite 2.16.1 for Android with CMake

Issue type

Build/Install

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

source

TensorFlow version

2.16.1

Custom code

Yes

OS platform and distribution

No response

Mobile device

No response

Python version

No response

Bazel version

No response

GCC/compiler version

Clang 14.0.6 (Android NDK r25b)

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

I'm trying to build TFLite 2.16.1 for Android using CMake and Android NDK r25b and got the following linker errors related to the GPU delegate. It looks like some source files were not added to the CMake configuration files.

ld: error: undefined symbol: tflite::gpu::OptionalAndroidHardwareBuffer::OptionalAndroidHardwareBuffer()
>>> referenced by android_hardware_buffer.h:69 (tensorflow/lite/delegates/gpu/android_hardware_buffer.h:69)
>>>               CMakeFiles/tensorflow-lite.dir/delegates/gpu/delegate.cc.o:(TfLiteGpuDelegateV2CreateAsync)
>>> referenced by android_hardware_buffer.h:69 (tensorflow/lite/delegates/gpu/android_hardware_buffer.h:69)
>>>               CMakeFiles/tensorflow-lite.dir/delegates/gpu/delegate.cc.o:(tflite::gpu::(anonymous namespace)::DelegateAsyncKernel::RegisterBuffer(TfLiteOpaqueContext*, TfLiteIoType, TfLiteBackendBuffer const*, TfLiteAttributeMap const*, int))
>>> referenced by android_hardware_buffer.h:69 (tensorflow/lite/delegates/gpu/android_hardware_buffer.h:69)
>>>               CMakeFiles/tensorflow-lite.dir/delegates/gpu/delegate.cc.o:(tflite::gpu::(anonymous namespace)::DelegateAsyncKernel::RegisterBuffer(TfLiteOpaqueContext*, TfLiteIoType, TfLiteBackendBuffer const*, TfLiteAttributeMap const*, int))
>>> referenced 5 more times

ld: error: undefined symbol: tflite::delegates::BackendAsyncKernelInterface::BackendAsyncKernelInterface()
>>> referenced by delegate.cc:687 (tensorflow/lite/delegates/gpu/delegate.cc:687)
>>>               CMakeFiles/tensorflow-lite.dir/delegates/gpu/delegate.cc.o:(tflite::gpu::(anonymous namespace)::CreateAsyncRegistration()::$_2::__invoke(TfLiteContext*, char const*, unsigned long))
>>> did you mean: tflite::delegates::BackendAsyncKernelInterface::~BackendAsyncKernelInterface()
>>> defined in: CMakeFiles/tensorflow-lite.dir/delegates/gpu/delegate.cc.o

ld: error: undefined symbol: tflite::delegates::utils::ReadBufferAttrs(TfLiteAttributeMap const*)
>>> referenced by delegate.cc:1048 (tensorflow/lite/delegates/gpu/delegate.cc:1048)
>>>               CMakeFiles/tensorflow-lite.dir/delegates/gpu/delegate.cc.o:(tflite::gpu::(anonymous namespace)::DelegateAsyncKernel::RegisterBuffer(TfLiteOpaqueContext*, TfLiteIoType, TfLiteBackendBuffer const*, TfLiteAttributeMap const*, int))
>>> referenced by delegate.cc:911 (tensorflow/lite/delegates/gpu/delegate.cc:911)
>>>               CMakeFiles/tensorflow-lite.dir/delegates/gpu/delegate.cc.o:(tflite::gpu::(anonymous namespace)::DelegateAsyncKernel::ReconcileRestrictions(TfLiteOpaqueContext const*, TfLiteOpaqueNode const*, int, TfLiteAttributeMap const*, TfLiteAttributeMap*, TfLiteAttributeMap*) const)

ld: error: undefined symbol: tflite::delegates::utils::WriteBufferAttrs(tflite::delegates::utils::BufferAttributes const&, TfLiteAttributeMap*)
>>> referenced by delegate.cc:913 (tensorflow/lite/delegates/gpu/delegate.cc:913)
>>>               CMakeFiles/tensorflow-lite.dir/delegates/gpu/delegate.cc.o:(tflite::gpu::(anonymous namespace)::DelegateAsyncKernel::ReconcileRestrictions(TfLiteOpaqueContext const*, TfLiteOpaqueNode const*, int, TfLiteAttributeMap const*, TfLiteAttributeMap*, TfLiteAttributeMap*) const)
>>> referenced by delegate.cc:913 (tensorflow/lite/delegates/gpu/delegate.cc:913)
>>>               CMakeFiles/tensorflow-lite.dir/delegates/gpu/delegate.cc.o:(tflite::gpu::(anonymous namespace)::DelegateAsyncKernel::ReconcileRestrictions(TfLiteOpaqueContext const*, TfLiteOpaqueNode const*, int, TfLiteAttributeMap const*, TfLiteAttributeMap*, TfLiteAttributeMap*) const)
>>> referenced by delegate.cc:913 (tensorflow/lite/delegates/gpu/delegate.cc:913)
>>>               CMakeFiles/tensorflow-lite.dir/delegates/gpu/delegate.cc.o:(tflite::gpu::(anonymous namespace)::DelegateAsyncKernel::ReconcileRestrictions(TfLiteOpaqueContext const*, TfLiteOpaqueNode const*, int, TfLiteAttributeMap const*, TfLiteAttributeMap*, TfLiteAttributeMap*) const)
>>> referenced 2 more times

ld: error: undefined symbol: tflite::delegates::utils::ReadSyncAttrs(TfLiteAttributeMap const*)
>>> referenced by delegate.cc:936 (tensorflow/lite/delegates/gpu/delegate.cc:936)
>>>               CMakeFiles/tensorflow-lite.dir/delegates/gpu/delegate.cc.o:(tflite::gpu::(anonymous namespace)::DelegateAsyncKernel::ReconcileRestrictions(TfLiteOpaqueContext const*, TfLiteOpaqueNode const*, int, TfLiteAttributeMap const*, TfLiteAttributeMap*, TfLiteAttributeMap*) const)
>>> referenced by delegate.cc:969 (tensorflow/lite/delegates/gpu/delegate.cc:969)
>>>               CMakeFiles/tensorflow-lite.dir/delegates/gpu/delegate.cc.o:(tflite::gpu::(anonymous namespace)::DelegateAsyncKernel::SetAttributes(TfLiteOpaqueContext*, TfLiteOpaqueNode*, int, TfLiteAttributeMap const*))

ld: error: undefined symbol: tflite::delegates::utils::WriteSyncAttrs(tflite::delegates::utils::SyncAttributes const&, TfLiteAttributeMap*)
>>> referenced by delegate.cc:938 (tensorflow/lite/delegates/gpu/delegate.cc:938)
>>>               CMakeFiles/tensorflow-lite.dir/delegates/gpu/delegate.cc.o:(tflite::gpu::(anonymous namespace)::DelegateAsyncKernel::ReconcileRestrictions(TfLiteOpaqueContext const*, TfLiteOpaqueNode const*, int, TfLiteAttributeMap const*, TfLiteAttributeMap*, TfLiteAttributeMap*) const)
>>> referenced by delegate.cc:940 (tensorflow/lite/delegates/gpu/delegate.cc:940)
>>>               CMakeFiles/tensorflow-lite.dir/delegates/gpu/delegate.cc.o:(tflite::gpu::(anonymous namespace)::DelegateAsyncKernel::ReconcileRestrictions(TfLiteOpaqueContext const*, TfLiteOpaqueNode const*, int, TfLiteAttributeMap const*, TfLiteAttributeMap*, TfLiteAttributeMap*) const)

ld: error: undefined symbol: tflite::gpu::gl::WaitFdGpu(int)
>>> referenced by delegate.cc:1167 (tensorflow/lite/delegates/gpu/delegate.cc:1167)
>>>               CMakeFiles/tensorflow-lite.dir/delegates/gpu/delegate.cc.o:(tflite::gpu::(anonymous namespace)::DelegateAsyncKernel::Eval(TfLiteOpaqueContext*, TfLiteOpaqueNode*, TfLiteExecutionTask*))

ld: error: undefined symbol: tflite::delegates::utils::WaitForAllFds(absl::lts_20230802::Span<int const>)
>>> referenced by delegate.cc:1170 (tensorflow/lite/delegates/gpu/delegate.cc:1170)
>>>               CMakeFiles/tensorflow-lite.dir/delegates/gpu/delegate.cc.o:(tflite::gpu::(anonymous namespace)::DelegateAsyncKernel::Eval(TfLiteOpaqueContext*, TfLiteOpaqueNode*, TfLiteExecutionTask*))

ld: error: undefined symbol: tflite::gpu::gl::EglEnvironment::NewEglEnvironment(std::__ndk1::unique_ptr<tflite::gpu::gl::EglEnvironment, std::__ndk1::default_delete<tflite::gpu::gl::EglEnvironment> >*)
>>> referenced by delegate.cc:1175 (tensorflow/lite/delegates/gpu/delegate.cc:1175)
>>>               CMakeFiles/tensorflow-lite.dir/delegates/gpu/delegate.cc.o:(tflite::gpu::(anonymous namespace)::DelegateAsyncKernel::Eval(TfLiteOpaqueContext*, TfLiteOpaqueNode*, TfLiteExecutionTask*))

ld: error: undefined symbol: tflite::delegates::utils::ConvertToTfLiteStatus(absl::lts_20230802::Status)
>>> referenced by delegate.cc:1175 (tensorflow/lite/delegates/gpu/delegate.cc:1175)
>>>               CMakeFiles/tensorflow-lite.dir/delegates/gpu/delegate.cc.o:(tflite::gpu::(anonymous namespace)::DelegateAsyncKernel::Eval(TfLiteOpaqueContext*, TfLiteOpaqueNode*, TfLiteExecutionTask*))
>>> referenced by delegate.cc:1186 (tensorflow/lite/delegates/gpu/delegate.cc:1186)
>>>               CMakeFiles/tensorflow-lite.dir/delegates/gpu/delegate.cc.o:(tflite::gpu::(anonymous namespace)::DelegateAsyncKernel::Eval(TfLiteOpaqueContext*, TfLiteOpaqueNode*, TfLiteExecutionTask*))
>>> referenced by delegate.cc:1187 (tensorflow/lite/delegates/gpu/delegate.cc:1187)
>>>               CMakeFiles/tensorflow-lite.dir/delegates/gpu/delegate.cc.o:(tflite::gpu::(anonymous namespace)::DelegateAsyncKernel::Eval(TfLiteOpaqueContext*, TfLiteOpaqueNode*, TfLiteExecutionTask*))
>>> referenced 3 more times

ld: error: undefined symbol: tflite::gpu::gl::EglEnvironment::~EglEnvironment()
>>> referenced by memory:2427 (android-ndk-r25b/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/c++/v1/memory:2427)
>>>               CMakeFiles/tensorflow-lite.dir/delegates/gpu/delegate.cc.o:(tflite::gpu::(anonymous namespace)::DelegateAsyncKernel::Eval(TfLiteOpaqueContext*, TfLiteOpaqueNode*, TfLiteExecutionTask*))
>>> referenced by memory:2427 (android-ndk-r25b/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include/c++/v1/memory:2427)
>>>               CMakeFiles/tensorflow-lite.dir/delegates/gpu/delegate.cc.o:(tflite::gpu::(anonymous namespace)::DelegateAsyncKernel::Eval(TfLiteOpaqueContext*, TfLiteOpaqueNode*, TfLiteExecutionTask*))

ld: error: undefined symbol: tflite::gpu::AsyncBuffer::GetOpenGlBuffer(unsigned int&)
>>> referenced by delegate.cc:1186 (tensorflow/lite/delegates/gpu/delegate.cc:1186)
>>>               CMakeFiles/tensorflow-lite.dir/delegates/gpu/delegate.cc.o:(tflite::gpu::(anonymous namespace)::DelegateAsyncKernel::Eval(TfLiteOpaqueContext*, TfLiteOpaqueNode*, TfLiteExecutionTask*))
>>> referenced by delegate.cc:1199 (tensorflow/lite/delegates/gpu/delegate.cc:1199)
>>>               CMakeFiles/tensorflow-lite.dir/delegates/gpu/delegate.cc.o:(tflite::gpu::(anonymous namespace)::DelegateAsyncKernel::Eval(TfLiteOpaqueContext*, TfLiteOpaqueNode*, TfLiteExecutionTask*))

ld: error: undefined symbol: tflite::gpu::gl::CreateFdGpu()
>>> referenced by delegate.cc:1211 (tensorflow/lite/delegates/gpu/delegate.cc:1211)
>>>               CMakeFiles/tensorflow-lite.dir/delegates/gpu/delegate.cc.o:(tflite::gpu::(anonymous namespace)::DelegateAsyncKernel::Eval(TfLiteOpaqueContext*, TfLiteOpaqueNode*, TfLiteExecutionTask*))
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [CMakeFiles/tensorflow-lite.dir/build.make:6296: libtensorflow-lite.so] Error 1
make[1]: *** [CMakeFiles/Makefile2:1336: CMakeFiles/tensorflow-lite.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

Standalone code to reproduce the issue

CMake options:

cmake -S tensorflow/lite -B build -D CMAKE_BUILD_TYPE=Release -D CMAKE_SYSTEM_PROCESSOR=aarch64 -D BUILD_SHARED_LIBS=ON -D TFLITE_ENABLE_GPU=ON -D CMAKE_SYSTEM_NAME=Linux -D CMAKE_SYSTEM_VERSION=29 -D ANDROID_PLATFORM=29 -D CMAKE_ANDROID_ARCH_ABI=arm64-v8a -D ANDROID_ABI=arm64-v8a -D CMAKE_TOOLCHAIN_FILE=android-ndk-r25b/build/cmake/android.toolchain.cmake

Relevant log output

No response

Support for Quantized ELU is missing in TFLite MLIR converter

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): WSL2 Ubuntu
  • TensorFlow installed from (source or binary): pip
  • TensorFlow version (or github SHA if from source): Latest

Unlike ReLU, for example, ELU currently isn't supported for 8-bit quantization in TFLite.
I think we can confirm this by looking at ELU's definition in tfl_ops.td; it doesn't have the quantizable trait.

While researching this issue, I stumbled upon an old Stack Overflow question implying that quantized ELU was once supported. Maybe it was dropped after the move to MLIR?

Are there plans to support it? If not, would it be straightforward to implement it on my own?
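A minimal way to reproduce the observation (toy model and shapes, purely for illustration) is to run post-training quantization on a model containing ELU and inspect whether the op stays quantized or ends up as a float op wrapped in QUANTIZE/DEQUANTIZE:

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),
    tf.keras.layers.Dense(8),
    tf.keras.layers.ELU(),      # the op in question
    tf.keras.layers.Dense(1),
])

def representative_dataset():
    for _ in range(100):
        yield [np.random.rand(1, 8).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
tflite_model = converter.convert()

# If ELU is not quantizable, the listing should show a float ELU surrounded
# by QUANTIZE/DEQUANTIZE ops instead of a fully quantized ELU.
print(tf.lite.experimental.Analyzer.analyze(model_content=tflite_model))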

The constant folding pass of the TFLite converter prevents storing packed tensors and stores dequantized tensors instead

1. System information

WSL Linux l 5.14.0-427.18.1.el9_4.x86_64 GNU/Linux
tensorflow==2.10.1
tensorflow-cpu==2.10.1
installed using pip

2. Code

Ignore the fact that the dequantization process is currently wrong; this is just for testing.

import tensorflow as tf
from tensorflow.keras.layers import Layer

class TestBinary(Layer):
    def __init__(self, units):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        self.input_size = input_shape[-1]
        assert input_shape[-1] * self.units % 8 == 0
        compressed_size = int(input_shape[-1] * self.units / 8)
        self.kernel = self.add_weight(shape=(compressed_size,), initializer="ones", dtype=tf.int8)

    def call(self, x):
        compressed_weights = self.kernel
        tensors = []
        for i in range(8):
            tensors.append(compressed_weights)  # will use a formula to dequantize in the future
        
        kernel = tf.stack(tensors)
        kernel = tf.cast(kernel, tf.float32)  # seems to store the variable at this point on disk
        kernel = tf.reshape(kernel, (self.input_size, self.units))
        x = x @ kernel
        return x

quantDense = TestBinary(1000)
test_input_shape = (500,)
quantDense(tf.ones((1, *test_input_shape))) # initialize the weights

converter = tf.lite.TFLiteConverter.from_keras_model(quantDense)
converted = converter.convert()

with open("test.tflite", "wb") as f:
    f.write(converted)

3. conversion

The conversion is successful, but the model size is that of a model storing the full 1000*500 weight matrix in int8 format (about 500KB), when it should store the packed weights and weigh ~63KB.

I assume this is the result of the constant folding pass of the converter, which stores the weights that have just been cast to float32 instead of storing the int8 weights and re-doing the dequantization process each time.
This can be seen in the graph of the resulting tflite file.
(I am not sure why the tensor is not saved after the transpose instead.)

This is also evidenced by the fact that I can replace compressed_weights = self.kernel with compressed_weights = self.kernel + tf.cast(x[0,0], dtype=tf.int8) * 0 and have the compressed weights saved on disk this way, because x cannot be constant-folded.
However, this costs extra operations and forces me to activate additional supported ops in the converter, which is not ideal.

Note that I also tried adding this code, but it does not change anything:

tf.config.optimizer.set_experimental_options({
    "constant_folding": False,
    "disable_model_pruning": True,
    "remapping": False,
    })

So, is there a way to prevent constant folding? Perhaps with a global flag, but preferably by introducing a no-op in the graph at a specific point to prevent the folding of just these nodes.

Maybe there is also a way to guarantee the storage of a particular parameter in packed form for binary and ternary weights?
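For completeness, the input-dependent trick described above can be written as a drop-in replacement for TestBinary.call (a sketch of the workaround I already mentioned, not a proper fix, since it still adds ops and changes the supported-ops requirements):

def call(self, x):
    # Tie the packed kernel to the input so the converter's constant-folding
    # pass cannot pre-compute the float32 kernel at conversion time.
    compressed_weights = self.kernel + tf.cast(x[0, 0], dtype=tf.int8) * 0
    tensors = [compressed_weights for _ in range(8)]  # placeholder dequantization
    kernel = tf.stack(tensors)
    kernel = tf.cast(kernel, tf.float32)
    kernel = tf.reshape(kernel, (self.input_size, self.units))
    return x @ kernel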

Having non-converted operations, even for the simplest models

System information

  • Platform: Tried on Google Colab
  • TensorFlow version: 2.15.0

Steps to reproduce

  • Creating a Python file with the following content in Google Colab (let's call it test.py):
import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape=(1,)),
    tf.keras.layers.Dense(1)
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

with open('model.tflite', 'wb') as f:
  f.write(tflite_model)
  • Calling it in Jupyter Notebook by !python test.py
  • The output will be:
2024-01-28 11:17:11.939381: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-01-28 11:17:11.939450: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-01-28 11:17:11.941203: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-01-28 11:17:13.646897: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2024-01-28 11:17:16.713671: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:378] Ignored output_format.
2024-01-28 11:17:16.713736: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:381] Ignored drop_control_dependency.
Summary on the non-converted ops:
---------------------------------
 * Accepted dialects: tfl, builtin, func
 * Non-Converted Ops: 1, Total Ops 6, % non-converted = 16.67 %
 * 1 ARITH ops

- arith.constant:    1 occurrences  (f32: 1)



  (f32: 1)

Note: running the code directly in Jupyter Notebook won't print anything

Problem

  • Why can it not convert all the operations in such a simple model?
  • Was it supposed to be like that?
  • Does having non-converted operations affect the performance of the model when deployed on a microcontroller using TFLM?
  • Is there a way to solve it?
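As a quick sanity check (a sketch; model.tflite is the file written by test.py above), the converted model still loads and runs despite the summary listing one non-converted arith.constant:

import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]
interpreter.set_tensor(inp["index"], np.array([[1.0]], dtype=np.float32))
interpreter.invoke()
print(interpreter.get_tensor(out["index"]))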

Inference time using Interpreter API on Android inconsistent and 10–50 times slower than same tflite model on iOS

Issue type

Performance

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

binary

TensorFlow version

2.15.0

Custom code

Yes

OS platform and distribution

No response

Mobile device

Google Pixel 4a running Android 13

Python version

No response

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

I'm running inference on a Yolov8-based tflite model on Android using the Interpreter API. I noticed that the first 30 or so calls to the Interpreter.run() function take much longer than the subsequent calls. The difference is quite marked, starting at about 3500ms per run and ending at about 500ms.

I thought perhaps it was something about the input data, so I tried running the same call with the same input 100 times in a loop. Same behaviour: the first handful of inference runs take around 3 seconds, slowly speeding up to about 500–700ms by the 100th iteration.

I wanted to find out whether there is a specific combination of the interpreter options causing this behaviour so I wrote a test matrix initialising interpreters with different options:

  • Using GPU delegate
    • Using Google Play Services runtime
      • Using model with precision reduced from float32 to float16
    • Using bundled runtime
      • Using model with precision reduced from float32 to float16
  • Using NNAPI delegate
    • Using Google Play Services runtime
      • Using model with precision reduced from float32 to float16
    • Using bundled runtime
      • Using model with precision reduced from float32 to float16
  • Using CPU with XNNPACK
    • Using Google Play Services runtime
      • Using model with precision reduced from float32 to float16
    • Using bundled runtime
      • Using model with precision reduced from float32 to float16
  • Using CPU without XNNPACK
    • Using Google Play Services runtime
      • Using model with precision reduced from float32 to float16
    • Using bundled runtime
      • Using model with precision reduced from float32 to float16

There doesn't seem to be any difference: whichever combination runs first takes a suspicious amount of time for the first handful of inference runs. Sometimes the time never decreases and all the inference runs for the given configuration take a very long time (~3 seconds).

I'm including the code using the bundled runtime. The Play Services runtime times were in line with the bundled runtime.

The device (Google Pixel 4a) is used only for development. There are no other apps installed aside from the test app and whatever was pre-installed on the phone. The device wasn't connected to the internet while running the test.

iOS comparison

In comparison, version 2.14.0 of TfLite for Swift (latest available on CocoaPods) using the CoreML delegate runs inference on the same input using the same model in 70ms on iPhone 12.

Standalone code to reproduce the issue

fun testInferenceSpeed() {
    val context = InstrumentationRegistry.getInstrumentation().context
    val assetManager = context.assets
    // Input serialized as a float array in JSON
    val jsonFile = "face_on_iPad_001.jpg-flat.json"
    assetManager.open(jsonFile).use { inputStream ->
        val json = inputStream.bufferedReader().use { it.readText() }
        val floatArray = Json.decodeFromString<FloatArray>(json)
        // Models – float32 and float16
        val models = arrayOf("ARC_PSD-001_1.1.122_bst_yl80201_float32.tflite", "ARC_PSD-001_1.1.122_bst_yl80201_float16.tflite")
        val options = arrayOf("gpu", "nnapi", "cpu", "xnnpack")
        for (model in models) {
            assetManager.open(model).use { modelInputStream ->
                // Copy the model from assets to the cache directory
                val modelFile = File(context.cacheDir, model)
                modelFile.outputStream().use { outputStream ->
                    modelInputStream.copyTo(outputStream)
                }
                for (option in options) {
                    val interpreterOptions = InterpreterApi.Options()
                    val compatibilityList = CompatibilityList()
                    when (option) {
                        "gpu" -> {
                            compatibilityList.use {
                                if (it.isDelegateSupportedOnThisDevice) {
                                    interpreterOptions.addDelegate(
                                        GpuDelegate(
                                            it.bestOptionsForThisDevice
                                        )
                                    )
                                }
                            }
                        }
                        "nnapi" -> {
                            if (android.os.Build.VERSION.SDK_INT >= android.os.Build.VERSION_CODES.P) {
                                interpreterOptions.addDelegate(NnApiDelegate())
                                interpreterOptions.useNNAPI = true
                            }
                        }
                        "cpu" -> {
                            interpreterOptions.numThreads =
                                Runtime.getRuntime().availableProcessors()
                            interpreterOptions.useXNNPACK = false
                        }

                        "xnnpack" -> {
                            interpreterOptions.numThreads =
                                Runtime.getRuntime().availableProcessors()
                            interpreterOptions.useXNNPACK = true
                        }
                        else -> throw IllegalArgumentException("Unknown option: $option")
                    }
                    InterpreterApi.create(modelFile, interpreterOptions)
                        .use { interpreterApi ->
                            val times = mutableListOf<Long>()
                            for (i in 0 until 100) {
                                interpreterApi.allocateTensors()
                                val input = FloatBuffer.wrap(floatArray)
                                val output =
                                    FloatBuffer.allocate(5 * 8400).also { it.rewind() }
                                val time = measureTimeMillis {
                                    interpreterApi.run(input, output)
                                }
                                times.add(time)
                            }
                            Log.d(
                                TAG,
                                "Model: $model, Option: $option, Inference times (ms): [${times.map { it.toString()+"ms" }.joinToString()}], Average inference time: ${times.average()} ms"
                            )
                        }
                }
            }
        }
    }
}

Relevant log output

Model: ARC_PSD-001_1.1.122_bst_yl80201_float32.tflite, Option: gpu, Inference times (ms): [2502ms, 3011ms, 2987ms, 2723ms, 3529ms, 4245ms, 3387ms, 4510ms, 4133ms, 4034ms, 4015ms, 3307ms, 3207ms, 3240ms, 2718ms, 2978ms, 2985ms, 3357ms, 2751ms, 2969ms, 2942ms, 3028ms, 2916ms, 3029ms, 4428ms, 2727ms, 4982ms, 4320ms, 3211ms, 2980ms, 4010ms, 3239ms, 2712ms, 3974ms, 3994ms, 3999ms, 3997ms, 3047ms, 3687ms, 3744ms, 2972ms, 2944ms, 3709ms, 3936ms, 3971ms, 3998ms, 3315ms, 4495ms, 3285ms, 4655ms, 2758ms, 3307ms, 4880ms, 4912ms, 3599ms, 2750ms, 2004ms, 2643ms, 3383ms, 3372ms, 1664ms, 3297ms, 2969ms, 1714ms, 2834ms, 3381ms, 1764ms, 2303ms, 1715ms, 3314ms, 3379ms, 1434ms, 3221ms, 2842ms, 1783ms, 1784ms, 1418ms, 1618ms, 1400ms, 1777ms, 1960ms, 1962ms, 1471ms, 2355ms, 2883ms, 1494ms, 2806ms, 2281ms, 2482ms, 2915ms, 1504ms, 2772ms, 3376ms, 1753ms, 3300ms, 1748ms, 2584ms, 3377ms, 3384ms, 1648ms], Average inference time: 3021.08 ms
Model: ARC_PSD-001_1.1.122_bst_yl80201_float32.tflite, Option: nnapi, Inference times (ms): [2288ms, 2105ms, 1637ms, 2280ms, 2085ms, 1695ms, 1634ms, 1759ms, 1637ms, 2006ms, 2210ms, 2018ms, 2050ms, 1979ms, 1698ms, 2201ms, 2105ms, 1989ms, 2040ms, 1966ms, 2034ms, 1970ms, 2031ms, 1970ms, 2033ms, 1968ms, 2034ms, 1966ms, 1763ms, 2160ms, 2077ms, 1987ms, 2040ms, 1966ms, 2033ms, 1859ms, 2106ms, 1993ms, 2041ms, 1965ms, 1826ms, 2117ms, 2073ms, 1979ms, 2041ms, 1969ms, 1632ms, 2109ms, 2212ms, 2024ms, 1362ms, 1284ms, 1970ms, 1806ms, 1212ms, 1800ms, 1231ms, 1452ms, 1465ms, 1128ms, 1185ms, 1519ms, 1246ms, 1824ms, 1224ms, 1719ms, 1234ms, 1964ms, 1133ms, 1973ms, 1689ms, 1241ms, 1890ms, 1194ms, 1187ms, 1108ms, 1089ms, 1091ms, 1086ms, 1084ms, 958ms, 1021ms, 1009ms, 999ms, 964ms, 1025ms, 1041ms, 980ms, 850ms, 1082ms, 1091ms, 976ms, 960ms, 1021ms, 1019ms, 991ms, 958ms, 850ms, 1008ms, 873ms], Average inference time: 1614.26 ms
Model: ARC_PSD-001_1.1.122_bst_yl80201_float32.tflite, Option: cpu, Inference times (ms): [1445ms, 1504ms, 1364ms, 1337ms, 1383ms, 1350ms, 1364ms, 1365ms, 1354ms, 1413ms, 1403ms, 1310ms, 1336ms, 1823ms, 1355ms, 1728ms, 1450ms, 1492ms, 1383ms, 1274ms, 1370ms, 1251ms, 1719ms, 1800ms, 1539ms, 1546ms, 1722ms, 1390ms, 1394ms, 1330ms, 1338ms, 1373ms, 1362ms, 1424ms, 1604ms, 1316ms, 1431ms, 1313ms, 1381ms, 1265ms, 1449ms, 1663ms, 1354ms, 1372ms, 1358ms, 1419ms, 1356ms, 1355ms, 1310ms, 1430ms, 1346ms, 1304ms, 1405ms, 1315ms, 1816ms, 1320ms, 1397ms, 1311ms, 1393ms, 1345ms, 1416ms, 1375ms, 1370ms, 1373ms, 1274ms, 1365ms, 1433ms, 1362ms, 1352ms, 1304ms, 1351ms, 1337ms, 1438ms, 1401ms, 1369ms, 1365ms, 1633ms, 1670ms, 1396ms, 1657ms, 1367ms, 1404ms, 1373ms, 1439ms, 1387ms, 1371ms, 1339ms, 1411ms, 1416ms, 1370ms, 1483ms, 1389ms, 1341ms, 1402ms, 1320ms, 1370ms, 1424ms, 1479ms, 1520ms, 1308ms], Average inference time: 1414.73 ms
Model: ARC_PSD-001_1.1.122_bst_yl80201_float32.tflite, Option: xnnpack, Inference times (ms): [1159ms, 1131ms, 1130ms, 1130ms, 1130ms, 1131ms, 1130ms, 1122ms, 1130ms, 1130ms, 1130ms, 1131ms, 1130ms, 1130ms, 1130ms, 1130ms, 1130ms, 1129ms, 1131ms, 1130ms, 1130ms, 1131ms, 1131ms, 1130ms, 1130ms, 1130ms, 1130ms, 1132ms, 1130ms, 1130ms, 1130ms, 1130ms, 1131ms, 1130ms, 1130ms, 1130ms, 1130ms, 1131ms, 1130ms, 1130ms, 1130ms, 1130ms, 1130ms, 1130ms, 1130ms, 1131ms, 1129ms, 1130ms, 1131ms, 1130ms, 1129ms, 1129ms, 1131ms, 1130ms, 1130ms, 1129ms, 1131ms, 1130ms, 1130ms, 1129ms, 1130ms, 1130ms, 1131ms, 1130ms, 1129ms, 1129ms, 1130ms, 1130ms, 1130ms, 1129ms, 1130ms, 1134ms, 1129ms, 1131ms, 1130ms, 1129ms, 1130ms, 1130ms, 1130ms, 1131ms, 1129ms, 1131ms, 1130ms, 1129ms, 1130ms, 1130ms, 1130ms, 1130ms, 1130ms, 1131ms, 1129ms, 1130ms, 1130ms, 1130ms, 1130ms, 1130ms, 1131ms, 1129ms, 1131ms, 1129ms], Average inference time: 1130.3 ms
Model: ARC_PSD-001_1.1.122_bst_yl80201_float16.tflite, Option: gpu, Inference times (ms): [418ms, 714ms, 771ms, 622ms, 817ms, 814ms, 785ms, 813ms, 810ms, 812ms, 591ms, 812ms, 812ms, 812ms, 815ms, 662ms, 811ms, 812ms, 815ms, 624ms, 810ms, 807ms, 809ms, 811ms, 813ms, 814ms, 810ms, 813ms, 809ms, 809ms, 784ms, 810ms, 810ms, 809ms, 809ms, 770ms, 775ms, 812ms, 811ms, 804ms, 787ms, 809ms, 811ms, 810ms, 663ms, 816ms, 809ms, 812ms, 601ms, 809ms, 811ms, 808ms, 810ms, 809ms, 810ms, 816ms, 811ms, 810ms, 675ms, 809ms, 811ms, 810ms, 624ms, 808ms, 808ms, 813ms, 812ms, 811ms, 810ms, 816ms, 810ms, 809ms, 810ms, 812ms, 809ms, 660ms, 811ms, 806ms, 810ms, 808ms, 808ms, 812ms, 811ms, 820ms, 809ms, 809ms, 814ms, 813ms, 812ms, 811ms, 812ms, 817ms, 809ms, 810ms, 809ms, 811ms, 810ms, 589ms, 812ms, 812ms], Average inference time: 786.15 ms
Model: ARC_PSD-001_1.1.122_bst_yl80201_float16.tflite, Option: nnapi, Inference times (ms): [1156ms, 1128ms, 1127ms, 1127ms, 1127ms, 1127ms, 1127ms, 1128ms, 1127ms, 1127ms, 1127ms, 1127ms, 1128ms, 1127ms, 1126ms, 1127ms, 1129ms, 1128ms, 1128ms, 1128ms, 1128ms, 1129ms, 1128ms, 1127ms, 1128ms, 1127ms, 1128ms, 1127ms, 1127ms, 1128ms, 1127ms, 1127ms, 1128ms, 1127ms, 1128ms, 1128ms, 1127ms, 1128ms, 1128ms, 1127ms, 1127ms, 1128ms, 1128ms, 1128ms, 1127ms, 1129ms, 1128ms, 1127ms, 1129ms, 1127ms, 1128ms, 1127ms, 1127ms, 1128ms, 1130ms, 1126ms, 1127ms, 1127ms, 1127ms, 1127ms, 1128ms, 1127ms, 1127ms, 1127ms, 1127ms, 1130ms, 1128ms, 1127ms, 1127ms, 1129ms, 1127ms, 1127ms, 1128ms, 1127ms, 1127ms, 1127ms, 1127ms, 1128ms, 1128ms, 1127ms, 1128ms, 1127ms, 1127ms, 1127ms, 1127ms, 1127ms, 1127ms, 1129ms, 1127ms, 1127ms, 1127ms, 1123ms, 1127ms, 1128ms, 1127ms, 1127ms, 1127ms, 1128ms, 1126ms, 1128ms], Average inference time: 1127.71 ms
Model: ARC_PSD-001_1.1.122_bst_yl80201_float16.tflite, Option: cpu, Inference times (ms): [1293ms, 1412ms, 1377ms, 1389ms, 1452ms, 1516ms, 1465ms, 1520ms, 1476ms, 1383ms, 1373ms, 1440ms, 1557ms, 1592ms, 1405ms, 1328ms, 1385ms, 1342ms, 1356ms, 1348ms, 1743ms, 1693ms, 1603ms, 1329ms, 1391ms, 1356ms, 1441ms, 1439ms, 1316ms, 1309ms, 1305ms, 1556ms, 1467ms, 1641ms, 1385ms, 1420ms, 1352ms, 1342ms, 1584ms, 1272ms, 1332ms, 1388ms, 1327ms, 1311ms, 1446ms, 1699ms, 1380ms, 1692ms, 1779ms, 1335ms, 1389ms, 1598ms, 1441ms, 1441ms, 1340ms, 1363ms, 1435ms, 1360ms, 1407ms, 1321ms, 1447ms, 1422ms, 1362ms, 1474ms, 1366ms, 1390ms, 1622ms, 1723ms, 1386ms, 1438ms, 1412ms, 1352ms, 1650ms, 1679ms, 1432ms, 1742ms, 1469ms, 1291ms, 1403ms, 1446ms, 1419ms, 1416ms, 1395ms, 1280ms, 1491ms, 1644ms, 1297ms, 1314ms, 1391ms, 1429ms, 1379ms, 1755ms, 1505ms, 1551ms, 1662ms, 1396ms, 1317ms, 1409ms, 1366ms, 1360ms], Average inference time: 1444.19 ms
Model: ARC_PSD-001_1.1.122_bst_yl80201_float16.tflite, Option: xnnpack, Inference times (ms): [1158ms, 1127ms, 1128ms, 1127ms, 1128ms, 1128ms, 1128ms, 1127ms, 1127ms, 1127ms, 1128ms, 1127ms, 1131ms, 1127ms, 1129ms, 1128ms, 1126ms, 1127ms, 1127ms, 1126ms, 1127ms, 1128ms, 1128ms, 1127ms, 1127ms, 1130ms, 1128ms, 1128ms, 1128ms, 1127ms, 1127ms, 1127ms, 1127ms, 1127ms, 1128ms, 1127ms, 1129ms, 1126ms, 1128ms, 1129ms, 1127ms, 1128ms, 1128ms, 1128ms, 1129ms, 1127ms, 1128ms, 1128ms, 1129ms, 1128ms, 1127ms, 1128ms, 1127ms, 1128ms, 1127ms, 1127ms, 1127ms, 1127ms, 1128ms, 1127ms, 1127ms, 1128ms, 1127ms, 1127ms, 1128ms, 1127ms, 1128ms, 1127ms, 1128ms, 1128ms, 1127ms, 1128ms, 1127ms, 1128ms, 1126ms, 1127ms, 1128ms, 1127ms, 1127ms, 1127ms, 1128ms, 1130ms, 1127ms, 1127ms, 1128ms, 1128ms, 1127ms, 1128ms, 1127ms, 1128ms, 1127ms, 1127ms, 1127ms, 1128ms, 1127ms, 1125ms, 1128ms, 1128ms, 1127ms, 1128ms], Average inference time: 1127.84 ms

TensorFlow Lite with iOS MTLBuffer doesn't support dynamic shape?

Issue type

Feature Request

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

source

TensorFlow version

tf 2.16.1

Custom code

No

OS platform and distribution

iOS

Mobile device

iPhone

Python version

No response

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

I'm trying to use tflite with Metal MTLBuffer on iOS following the doc : https://www.tensorflow.org/lite/ios/delegates/gpu#inputoutput_buffers_using_c_api

My model.tflite is designed for a dynamic-shape use case, so the input/output shape is saved as [1,-1,-1,1] in the model.

I've tried calling ResizeInputTensor before ModifyGraphWithDelegate, but then Invoke fails with:
Execution of the command buffer was aborted due to an error during execution. Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)

If I don't call ResizeInputTensor before ModifyGraphWithDelegate, I still get nothing from the output tensor.

I want to know whether the TensorFlow Lite Metal delegate supports dynamic shapes, and if so, how to use it with dynamic shapes correctly.

Standalone code to reproduce the issue

tflite::ops::builtin::BuiltinOpResolver op_resolver;
        tflite::InterpreterBuilder interpreter_builder(model, op_resolver);

         // Configure and create the delegate.
        TFLGpuDelegateOptions options;
        options.enable_quantization = true;
        options.allow_precision_loss = true;
        options.wait_type = TFLGpuDelegateWaitType::TFLGpuDelegateWaitTypeActive;
        _gpu_delegate = TFLGpuDelegateCreate(&options);

        if (interpreter_builder(&_predictor) != kTfLiteOk || !_predictor) {
            GLOGE("Unable to prepare TfLite interpreter.");
        }
        TfLiteStatus status;
       status  = _predictor->ResizeInputTensor(0, {1, input_height, input_width, 1});
        if (status != kTfLiteOk) {
            GLOGE("Failed to resize input tensor: {}", status);
            return;
        }

        status = _predictor->ModifyGraphWithDelegate(_gpu_delegate);
        if (status != kTfLiteOk) {
            GLOGE("Failed to ModifyGraphWithDelegate: {}", status);
            return;
        }

        _predictor->SetAllowBufferHandleOutput(true);  // disable default gpu->cpu copy
        
       // id<MTLBuffer> input and  id<MTLBuffer> output from other parts of my codes
        if (!TFLGpuDelegateBindMetalBufferToTensor(
            _gpu_delegate, _predictor->inputs()[0], input)) {
            GLOGE("Failed to TFLGpuDelegateBindMetalBufferToTensor input");
            return false;
        }
        if (!TFLGpuDelegateBindMetalBufferToTensor(
                _gpu_delegate, _predictor->outputs()[0], output)) {
            GLOGE("Failed to TFLGpuDelegateBindMetalBufferToTensor output");
            return false;
        }

        id<MTLCommandBuffer> command_buffer = [_metal_queue commandBuffer];
        command_buffer.label = @"TfliteMetalRunner";
        TFLGpuDelegateSetCommandBuffer(_gpu_delegate, command_buffer);

        if (_predictor->Invoke() != kTfLiteOk) {
            GLOGE("metal runner invoke failed");
            return false;
        }
            GLOGE("metal runner invoke success");

        [command_buffer commit];
        [command_buffer waitUntilScheduled];

Relevant log output

2024-07-12 18:54:38.834941+0800 myapp[6052:2224696] Execution of the command buffer was aborted due to an error during execution. Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)
2024-07-12 18:54:38.890017+0800 myapp[6052:2214145] Execution of the command buffer was aborted due to an error during execution. Caused GPU Address Fault Error (0000000b:kIOGPUCommandBufferCallbackErrorPageFault)
2024-07-12 18:54:38.920375+0800 myapp[6052:2214145] Execution of the command buffer was aborted due to an error during execution. Ignored (for causing prior/excessive GPU errors) (00000004:kIOGPUCommandBufferCallbackErrorSubmissionsIgnored)
2024-07-12 18:54:38.920508+0800 myapp[6052:2214145] Execution of the command buffer was aborted due to an error during execution. Ignored (for causing prior/excessive GPU errors) (00000004:kIOGPUCommandBufferCallbackErrorSubmissionsIgnored)

Consider dilation parameter support for Conv2dTranspose

The Web Neural Network (WebNN) API defines a web-friendly abstraction layer that makes use of the machine learning capabilities of operating systems and underlying hardware platforms. TensorFlow Lite will be used as the inference runtime on CrOS/Linux/Android to implement the WebNN API in the Chromium browser, so WebNN operations need to be converted to TFLite builtin operators with the TFLite schema. WebNN convTranspose2d defines a dilation parameter in the spec, but the TFLite schema doesn't support it. Do you have plans to support the dilation parameter?
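To make the gap concrete, here is a minimal sketch (toy shapes) of the kind of operation WebNN convTranspose2d allows; since the TRANSPOSE_CONV builtin's options carry no dilation field, a dilation_rate other than 1 has no direct builtin mapping, and conversion may fail or fall back depending on the converter version:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(16, 16, 3)),
    tf.keras.layers.Conv2DTranspose(
        filters=8, kernel_size=3, strides=1, dilation_rate=2, padding="same"),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
try:
    tflite_model = converter.convert()
    print("converted:", len(tflite_model), "bytes")
except Exception as e:  # behaviour depends on the converter version
    print("conversion failed:", e)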

TFLite: Android benchmark test gets different results on GPU delegate

Hi,
When I use the benchmark script and the benchmark apk to test my model's performance, I get the same performance on the CPU (XNNPACK delegate), but different performance on the GPU (OpenCL delegate).

And the results:

Time performance (ms)    benchmark script      benchmark apk
                         cpu       gpu         cpu       gpu
mobilenetv2              5.7       7.4         5.6       4.3
mobilenetv3_small        1.8       4.9         1.7       2.1
mobilenetv3_large        5.3       6           5         3.8
The benchmark apk is almost twice as fast as the benchmark script on GPU.

Q: What's the difference between the benchmark script and the apk?

Running the benchmark script

Log of benchmark script:

╰─$ adb install -r -d -g android_aarch64_benchmark_model.apk
╰─$ adb shell /data/local/tmp/android_aarch64_benchmark_model   --graph=/data/local/tmp/mobilenetv3_large.tflite \
  --num_threads=4  --num_runs=50 --use_gpu=true
INFO: STARTING!
INFO: Log parameter values verbosely: [0]
INFO: Min num runs: [50]
INFO: Num threads: [4]
INFO: Graph: [/data/local/tmp/mobilenetv3_large.tflite]
INFO: #threads used for CPU inference: [4]
INFO: Use gpu: [1]
INFO: Loaded model /data/local/tmp/mobilenetv3_large.tflite
INFO: Initialized TensorFlow Lite runtime.
INFO: Created TensorFlow Lite delegate for GPU.
INFO: GPU delegate created.
VERBOSE: Replacing 126 out of 126 node(s) with delegate (TfLiteGpuDelegateV2) node, yielding 1 partitions for the whole graph.
INFO: Initialized OpenCL-based API.
INFO: Created 1 GPU delegate kernels.
INFO: Explicitly applied GPU delegate, and the model graph will be completely executed by the delegate.
INFO: The input model file size (MB): 21.9417
INFO: Initialized session in 1342.81ms.
INFO: Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
INFO: count=72 first=11653 curr=6233 min=5959 max=11653 avg=6835.62 std=903

INFO: Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
INFO: count=148 first=6394 curr=6988 min=6096 max=8656 avg=6650.52 std=446

INFO: Inference timings in us: Init: 1342809, First inference: 11653, Warmup (avg): 6835.62, Inference (avg): 6650.52
INFO: Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
INFO: Memory footprint delta from the start of the tool (MB): init=108.359 overall=108.359

Running on benchmark apk

╰─$ adb shell am start -S \
  -n org.tensorflow.lite.benchmark/.BenchmarkModelActivity \
  --es args '"--graph=/data/local/tmp/mobilenetv3_large.tflite \
              --num_threads=4 --num_runs=50 --use_gpu=true"'

Log on logcat:

03-04 16:23:46.821  6210  6210 I tflite  : Initialized OpenCL-based API.
03-04 16:23:46.859  6210  6210 I tflite  : Created 1 GPU delegate kernels.
03-04 16:23:46.859  6210  6210 I tflite  : Explicitly applied GPU delegate, and the model graph will be completely executed by the delegate.
03-04 16:23:46.859  6210  6210 I tflite  : The input model file size (MB): 21.9417
03-04 16:23:46.859  6210  6210 I tflite  : Initialized session in 856.303ms.
03-04 16:23:46.860  6210  6210 I tflite  : Running benchmark for at least 1 iterations and at least 0.5 seconds but terminate if exceeding 150 seconds.
03-04 16:23:47.164  8268  9284 I DisplayFrameSetting: homeToAppEnd pkg=org.tensorflow.lite.benchmark
03-04 16:23:47.363  6210  6210 I tflite  : count=124 first=4601 curr=3901 min=3882 max=4601 avg=4018.02 std=126
03-04 16:23:47.363  6210  6210 I tflite  : Running benchmark for at least 50 iterations and at least 1 seconds but terminate if exceeding 150 seconds.
03-04 16:23:48.153  1895  3405 I MiuiNetworkPolicy: bandwidth: 0 KB/s, Max bandwidth: 200 KB/s
03-04 16:23:48.365  6210  6210 I tflite  : count=234 first=3909 curr=4552 min=3856 max=5840 avg=4207.86 std=334
03-04 16:23:48.366  6210  6210 I tflite  : Inference timings in us: Init: 856303, First inference: 4601, Warmup (avg): 4018.02, Inference (avg): 4207.86
03-04 16:23:48.366  6210  6210 I tflite  : Note: as the benchmark tool itself affects memory footprint, the following is only APPROXIMATE to the actual memory footprint of the model at runtime. Take the information at your discretion.
03-04 16:23:48.366  6210  6210 I tflite  : Memory footprint delta from the start of the tool (MB): init=103.145 overall=103.145

System information

  • Android Device information: xiaomi-12

Can't cross-compile the TensorFlow Lite C library using CMake

Issue type

Build/Install

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

source

TensorFlow version

2.17.0

Custom code

Yes

OS platform and distribution

Ubuntu 22.04.4 LTS

Mobile device

No response

Python version

No response

Bazel version

cmake version 3.22.1

GCC/compiler version

gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

I need to use the tflite c library in a beaglebone black rev3 with this specs:

processor       : 0
model name      : ARMv7 Processor rev 2 (v7l)
BogoMIPS        : 995.32
Features        : half thumb fastmult vfp edsp thumbee neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x3
CPU part        : 0xc08
CPU revision    : 2

Hardware        : Generic AM33XX (Flattened Device Tree)
Revision        : 0000
Serial          : 2218SBB15982
ldd (Debian GLIBC 2.28-10) 2.28
gcc version 8.3.0

I'm trying to build the library using this toolchain: gcc-arm-8.3-2019.03-x86_64-arm-linux-gnueabihf
Following this tutorial https://www.tensorflow.org/lite/guide/build_cmake_arm I used this command first:

mkdir tflite_build
cd tflite_build
ARMCC_FLAGS="-march=armv7-a -mfpu=neon-vfpv3 -funsafe-math-optimizations -mfp16-format=ieee"
ARMCC_PREFIX=${HOME}/toolchains/gcc-arm-8.3-2019.03-x86_64-arm-linux-gnueabihf/bin/arm-linux-gnueabihf-
cmake -DCMAKE_C_COMPILER=${ARMCC_PREFIX}gcc \
  -DCMAKE_CXX_COMPILER=${ARMCC_PREFIX}g++ \
  -DCMAKE_C_FLAGS="${ARMCC_FLAGS}" \
  -DCMAKE_CXX_FLAGS="${ARMCC_FLAGS}" \
  -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON \
  -DCMAKE_SYSTEM_NAME=Linux \
  -DCMAKE_SYSTEM_PROCESSOR=armv7 \
  ../tensorflow/lite/

but then I get errors when I run cmake --build . -j

Standalone code to reproduce the issue

mkdir tflite_build
cd tflite_build
ARMCC_FLAGS="-march=armv7-a -mfpu=neon-vfpv3 -funsafe-math-optimizations -mfp16-format=ieee"
ARMCC_PREFIX=${HOME}/toolchains/gcc-arm-8.3-2019.03-x86_64-arm-linux-gnueabihf/bin/arm-linux-gnueabihf-
cmake -DCMAKE_C_COMPILER=${ARMCC_PREFIX}gcc \
  -DCMAKE_CXX_COMPILER=${ARMCC_PREFIX}g++ \
  -DCMAKE_C_FLAGS="${ARMCC_FLAGS}" \
  -DCMAKE_CXX_FLAGS="${ARMCC_FLAGS}" \
  -DCMAKE_VERBOSE_MAKEFILE:BOOL=ON \
  -DCMAKE_SYSTEM_NAME=Linux \
  -DCMAKE_SYSTEM_PROCESSOR=armv7 \
  ../tensorflow/lite/
cmake --build . -j

Relevant log output

No response

TFLite Podspecs haven't been updated after v2.14

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

No

Source

binary

TensorFlow version

2.16.2

Custom code

No

OS platform and distribution

iOS

Mobile device

No response

Python version

No response

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

The TensorFlowLiteC podspec still points at the 2.14 binary and hasn't been updated since v2.14. It needs to be updated so that the latest versions can be integrated via the C API on iOS.

Standalone code to reproduce the issue

N/A

Relevant log output

No response

[TensorFlowLite] Add new data types in the TFLite format schema

Add new data types in the TFLite format schema that could be leveraged by custom accelerators, such as:

  • Custom floating point format, having configurable number of bits for exponent and mantissa (something like TensorType_FLOAT8 using custom format options)

This is relevant only for the TFLite format (schema) used for model representation, not for the TFLite tools or runtime.
It matters when industry has proprietary tools that rely on the TFLite format but not on the TFLite tools or runtime (interpreter).
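
To make the "configurable number of bits for exponent and mantissa" idea concrete, here is a small illustrative sketch (not a TFLite API; the IEEE-style reservation of the top exponent code is an assumption) of the numeric properties such a format would have:

# Properties of a hypothetical FLOATn format with e exponent bits and m mantissa bits.
def float_format_stats(exponent_bits: int, mantissa_bits: int):
    bias = 2 ** (exponent_bits - 1) - 1
    max_exponent = 2 ** exponent_bits - 2 - bias   # top exponent code reserved for inf/nan (IEEE style)
    max_value = (2 - 2 ** -mantissa_bits) * 2.0 ** max_exponent
    min_normal = 2.0 ** (1 - bias)
    return {"bias": bias, "max_value": max_value, "min_normal": min_normal}

print(float_format_stats(4, 3))  # e4m3-style layout (real FP8 E4M3 reserves codes differently)
print(float_format_stats(5, 2))  # e5m2-style layout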

ruy::CpuInfo::Initialize() Null pointer dereference: SIGSEGV 0x0000000000000008

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

tensorflow-lite:2.16.1

Custom code

No

OS platform and distribution

No response

Mobile device

Android

Python version

No response

Bazel version

6.5.0

GCC/compiler version

Clang 8.0.7

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

We recently upgraded TensorFlow Lite from 2.9.0 to 2.16.1 (the latest version from MavenCentral) and received a fresh native crash issue on Firebase Crashlytics (possibly related to #74043). But since the official release of the shared library on MavenCentral was stripped (#72877), we can't investigate further. Our team decided to rebuild TensorFlow Lite 2.16.1 from source with the flag tflite_keep_symbols=true to keep the stack trace more readable. The issue has about 30k crash events, affecting 7k users. We still cannot reproduce the crash locally.

(crashlytics_report screenshot attached in the original issue)

Standalone code to reproduce the issue

We use only two main functions to run inference on the model; the crash happened in both cases.


org.tensorflow.lite.InterpreterApi#run
org.tensorflow.lite.Interpreter#runSignature(java.util.Map<java.lang.String,java.lang.Object>, java.util.Map<java.lang.String,java.lang.Object>, java.lang.String)


Relevant log output

null pointer dereference: SIGSEGV  0x0000000000000008
#00 pc 0x23ddd8 libtensorflowlite_jni.so (ruy::CpuInfo::Initialize() [cpuinfo.cc:66]) (BuildId: 488448e6d222d2f499e4c2978695bdc8ea291e71)
#01 pc 0x23ddd4 libtensorflowlite_jni.so (ruy::CpuInfo::Initialize() [cpuinfo.cc:64]) (BuildId: 488448e6d222d2f499e4c2978695bdc8ea291e71)
#02 pc 0x23de6c libtensorflowlite_jni.so (ruy::CpuInfo::NeonDotprod() [cpuinfo.cc:35]) (BuildId: 488448e6d222d2f499e4c2978695bdc8ea291e71)
#03 pc 0x23c2c8 libtensorflowlite_jni.so (ruy::Ctx::GetRuntimeEnabledPaths() [ctx.cc:125]) (BuildId: 488448e6d222d2f499e4c2978695bdc8ea291e71)
#04 pc 0x23c3b4 libtensorflowlite_jni.so (ruy::Ctx::SelectPath(ruy::Path) [ctx.cc:177]) (BuildId: 488448e6d222d2f499e4c2978695bdc8ea291e71)
#05 pc 0xa63b4 libtensorflowlite_jni.so (void ruy::detail::CreateTrMulParamsAssumingColMajorDst<(ruy::Path)49, float, float, float, float>(ruy::Mat<float> const&, ruy::Mat<float> const&, ruy::Mat<float> const&, ruy::MulParams<float, float> const&, ruy::ChannelDimension, ruy::Ctx*, ruy::TrMulParams*) [create_trmul_params.h:423]) (BuildId: 488448e6d222d2f499e4c2978695bdc8ea291e71)
#06 pc 0xa626c libtensorflowlite_jni.so (void ruy::MulFrontEnd<(ruy::Path)49, float, float, float, float>(ruy::Mat<float> const&, ruy::Mat<float> const&, ruy::MulParams<float, float> const&, ruy::Ctx*, ruy::Mat<float>*) [create_trmul_params.h:472]) (BuildId: 488448e6d222d2f499e4c2978695bdc8ea291e71)
#07 pc 0xa5cb8 libtensorflowlite_jni.so (tflite::cpu_backend_gemm::detail::GemmImplUsingRuy<float, float, float, float, (tflite::cpu_backend_gemm::QuantizationFlavor)0>::Run(tflite::cpu_backend_gemm::MatrixParams<float> const&, float const*, tflite::cpu_backend_gemm::MatrixParams<float> const&, float const*, tflite::cpu_backend_gemm::MatrixParams<float> const&, float*, tflite::cpu_backend_gemm::GemmParams<float, float, (tflite::cpu_backend_gemm::QuantizationFlavor)0> const&, tflite::CpuBackendContext*) [ruy.h:46]) (BuildId: 488448e6d222d2f499e4c2978695bdc8ea291e71)
#08 pc 0x12c9bc libtensorflowlite_jni.so (tflite::optimized_ops::FullyConnected(tflite::FullyConnectedParams const&, tflite::RuntimeShape const&, float const*, tflite::RuntimeShape const&, float const*, tflite::RuntimeShape const&, float const*, tflite::RuntimeShape const&, float*, tflite::CpuBackendContext*) [optimized_ops.h:306]) (BuildId: 488448e6d222d2f499e4c2978695bdc8ea291e71)
#09 pc 0x12b108 libtensorflowlite_jni.so (TfLiteStatus tflite::ops::builtin::fully_connected::EvalFloat<(tflite::ops::builtin::fully_connected::KernelType)1>(TfLiteContext*, TfLiteNode*, TfLiteFullyConnectedParams*, tflite::ops::builtin::fully_connected::OpData*, TfLiteTensor const*, TfLiteTensor const*, TfLiteTensor const*, TfLiteTensor*) [fully_connected.cc:1563]) (BuildId: 488448e6d222d2f499e4c2978695bdc8ea291e71)
#10 pc 0x129aa0 libtensorflowlite_jni.so (TfLiteStatus tflite::ops::builtin::fully_connected::Eval<(tflite::ops::builtin::fully_connected::KernelType)1>(TfLiteContext*, TfLiteNode*) [fully_connected.cc:1605]) (BuildId: 488448e6d222d2f499e4c2978695bdc8ea291e71)
#11 pc 0x3403b0 libtensorflowlite_jni.so (tflite::Subgraph::InvokeImpl() [subgraph.cc:1396]) (BuildId: 488448e6d222d2f499e4c2978695bdc8ea291e71)
#12 pc 0x33fd98 libtensorflowlite_jni.so (tflite::Subgraph::Invoke() [subgraph.cc:1581]) (BuildId: 488448e6d222d2f499e4c2978695bdc8ea291e71)
#13 pc 0x333bc8 libtensorflowlite_jni.so (tflite::impl::SignatureRunner::Invoke() [signature_runner.cc:82]) (BuildId: 488448e6d222d2f499e4c2978695bdc8ea291e71)
#14 pc 0x28a4c libtensorflowlite_jni.so (Java_org_tensorflow_lite_NativeSignatureRunnerWrapper_nativeInvoke [nativesignaturerunner_jni.cc:268]) (BuildId: 488448e6d222d2f499e4c2978695bdc8ea291e71)
#15 pc 0x351e30 libart.so (BuildId: ddcc440d4609d2099db9d20895487a78)

Adding a GPU model to the compatibility database in tflite

Issue type

Support

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

tf2.15

Custom code

Yes

OS platform and distribution

No response

Mobile device

No response

Python version

No response

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

I have a question in the "acceleration/compatibility" part in tflite.
As far as I know, Samsung mobile model "samsung_xclipse_920" (S22) information was recently added to the compatibility database to check and support the GPU delegation.
tensorflow/tensorflow@1f3d148

I would like to add "samsung_xclipse_940", an S24 model that was recently released officially on the market.
Are there any ways to do it? Is there anything already in progress? Or, can I upload some PRs (gpu_compatibility.bin ...) related to this manually?

Standalone code to reproduce the issue

flatc -t --raw-binary --strict-json database.fbs -- gpu_compatibility.bin
flatc -b database.fbs gpu_compatibility.json

Relevant log output

No response

Given shapes are not broadcastable: TF to TFLite conversion error

1. System information

Google colab as of 2024-02-27
tf version: 2.15.0

Problem

I receive the following error when converting a tf module from TF to TFLite:

RuntimeError: Given shapes, [1,436,1024,3] and [1,218,512,3], are not broadcastable.Node number 1 (SUB) failed to prepare.

Below is the relevant class:

# Imports added for completeness (omitted in the original snippet):
import tensorflow as tf
from tensorflow.keras import Model
from tensorflow.keras.layers import Resizing

class Inpainting(Model):
    def __init__(self, downsample_factor, **kwargs):
        super(Inpainting, self).__init__(**kwargs)
        self.downsample_factor = downsample_factor
        self.downsize = Resizing(218, 512)

    @tf.function(input_signature=[
        tf.TensorSpec(shape=(1, 218, 512, 3)),
        tf.TensorSpec(shape=(1, 218, 512, 1)),
        tf.TensorSpec(shape=(1, 436, 1024, 3)),
        tf.TensorSpec(shape=(1, 436, 1024, 1)),
        ])
    def call(self, img_t_lr, depth_t_lr, img_t_w, depth_t_w):

        img_t_wlr = self.downsize(img_t_w)
        depth_t_wlr = self.downsize(depth_t_w)
        assert img_t_wlr.shape[1:] == img_t_lr.shape[1:], "must be same shape"
        diff_color = tf.math.subtract(img_t_lr, img_t_wlr)
        assert depth_t_lr.shape[1:] == depth_t_wlr.shape[1:], "must be same shape"
        diff_depth = tf.math.subtract(depth_t_lr, depth_t_wlr)
        diff = tf.concat([diff_color, diff_depth], axis=3)
        return diff

Please see here for the full gist.

I am a bit surprised that this happens. Could this be a bug?
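
For context, a minimal conversion sketch for the class above (the converter flow and the point where the error surfaces are assumptions based on the report, not verified here):

import tensorflow as tf

# Convert the tf.function above via its concrete function (shapes come from the
# input_signature), then build an interpreter.
model = Inpainting(downsample_factor=2)
concrete_fn = model.call.get_concrete_function()
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_fn], model)
tflite_model = converter.convert()

interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()  # the reported "Node number 1 (SUB) failed to prepare" surfaces around here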

TFLiteGPUDelegate : FirstNLargestPartitions : it may not be a very good solution

Issue type

Performance

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

tf 2.4.1

Custom code

Yes

OS platform and distribution

Linux Ubuntu 18.04

Mobile device

Jetson-Xavier nx

Python version

3.6.9

Bazel version

3.1.0

GCC/compiler version

7.5.0

CUDA/cuDNN version

x

GPU model and memory

No response

Current behavior?

Suppose the input tflite model is accelerated on the GPU. If there are layers in the middle of the model that are incompatible with the GPU backend (called fallback layers in tflite), the partition_helper class divides the model into delegatable and non-delegatable partitions using the graph_info class, and the FirstNLargestPartitions function is then applied to select the partitions to delegate. In the FirstNLargestPartitions logic, the partitions containing the most layers are chosen first. The variable N can be set directly by the user, and the default is 1. I see two major problems with this GPU delegation policy.
First, it may not be appropriate to preferentially select the partitions that contain the most layers. In most cases such partitions will also have the largest amount of computation, but there can be partitions with few layers and a very large amount of computation. For example, a partition with two convolutional layers may require more computation than a partition with 10 simple layers such as ADD or MUL. In such cases, the FirstNLargest logic does not capture the benefit of acceleration well.
Second, it is inconvenient that the user has to set the value of N, i.e. how many partitions to select and delegate. The user must tune and test one value at a time to find the N that accelerates the model the most. Since there is data-exchange overhead between the CPU and GPU, inference is usually fastest when N is 1, but depending on the structure of the model there are cases where it is not.
As a result, I think the existing delegation policy has room for improvement by addressing the problems above. However, even the most recent version of tflite still seems to use this policy as-is.
I conducted an inference performance test with yolov4-tiny, delegating every possible combination of partitions. As a result, the more computation the delegated partitions contained, the better the inference performance.
However, this trend did not always hold, because the larger the N value, the greater the CPU-GPU data-exchange overhead.
Given this overall tendency, for the same value of N it would be better to choose the partitions with the largest amount of computation rather than the partitions containing the most layers.
In addition, the best N value may vary depending on the performance of the target hardware. I have not yet developed logic that frees the user from tuning N and automatically selects and delegates partitions internally; it seems difficult to find the most appropriate N by analyzing the target hardware without something like a profiling step.
On the other hand, estimating an approximate amount of computation for the delegatable partitions and delegating the largest ones first would be simple and effective compared to the current method.
I wonder why tflite still uses the FirstNLargestPartitions logic, and whether the approach described above is appropriate from an overall perspective.

Standalone code to reproduce the issue

// Rough per-layer FLOPs estimate for CONV_2D nodes. Here `tensor` is presumably
// the output tensor, `i_tensor` the input tensor, and `filter` the kernel size,
// all set elsewhere in the surrounding code.
if (strcmp(GetOpName(reg), "CONV_2D") == 0) {
  double mac = tensor->dims->data[1] * tensor->dims->data[2] * tensor->dims->data[3] *
               i_tensor->dims->data[3] * filter * filter;
  flops = 2 * mac / 1000000;  // MFLOPs
  tot += flops;
  printf("\033[0;31mFLOPs : %.1f\033[0m\n", flops);
}

The code above roughly calculates the amount of computation per layer.
If the contents mentioned in the above current behavior are appropriate, a new delegation policy can be developed based on the above code.
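
To illustrate the alternative policy sketched above, here is a small standalone example (plain Python, not the partition_helper API; partition contents and MAC numbers are made up) of ranking partitions by estimated compute instead of layer count:

def pick_partitions_by_compute(partitions, n):
    """partitions: list of lists of (op_name, macs) tuples.
    Returns the indices of the n partitions with the highest total estimated MACs."""
    totals = [sum(macs for _, macs in part) for part in partitions]
    ranked = sorted(range(len(partitions)), key=lambda i: totals[i], reverse=True)
    return ranked[:n]

# Example: partition 1 has few layers but dominates the compute, so it is chosen first,
# whereas a largest-layer-count policy would pick partition 0.
parts = [
    [("ADD", 1.0), ("MUL", 1.0), ("RELU", 0.5)],   # many cheap layers
    [("CONV_2D", 900.0), ("CONV_2D", 850.0)],      # few expensive layers
]
print(pick_partitions_by_compute(parts, n=1))  # -> [1]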

A program to analyze the inference results by delegating delegatable partitions to all possible combinations was conducted at the link below.
[tflite APP code] https://github.com/easyhardhoon/FBF-TF-hoon/tree/hoon/APP_DOT 
[tflite source code] https://github.com/easyhardhoon/FBF-TF

Relevant log output

Testing yolov4-tiny.tflite.
There are seven delegatable partitions according to the partition helper class.
The log below is the result of the delegation test on a Jetson Xavier NX, using my custom tflite application & tflite source.
The log shows that the default GPU delegation policy in tflite is not appropriate.


=== Fallback node number info === : 
8 20 32 55 57 60 63 67 69 70 73 75 76 81 86 91 95 97 98 103 105 108 111 117 122 127 132 134 135 138 140 141 144 146 147 
=== Delegated_partitions info === : 
[0] : 0 1 2 3 4 5 6 7 
[1] : 9 10 11 12 13 14 15 16 17 18 19 
[2] : 21 22 23 24 25 26 27 28 29 30 31 
[3] : 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 102 
[4] : 56 59 62 66 72 78 79 80 83 84 85 88 89 90 94 104 107 110 114 115 116 119 120 121 124 125 126 131 137 143 
[5] : 58 61 64 65 68 74 82 87 92 93 96 106 109 112 113 118 123 128 129 130 133 139 145 
[6] : 71 77 99 100 101 136 142 148 149 150 151 
=== Fallback reason info === : 
ERROR: Following operations are not supported by GPU delegate:
ADD: tensor type error 
MUL: tensor type error
SPLIT: Operation is not supported.
SPLIT_V: Operation is not supported.

N = 1
[0 ]case's latency is 669ms
[1 ]case's latency is 677ms
[2 ]case's latency is 645ms
[3 ]case's latency is 402.00ms  -> Best case
[4 ]case's latency is 632ms   ---> chosen at default delegation.
[5 ]case's latency is 630ms
[6 ]case's latency is 618ms
[END]...Choose_1's average latency is 610.43ms
N = 2
[0 1 ]case's latency is 393ms
[0 2 ]case's latency is 486ms
[0 3 ]case's latency is 293ms
[0 4 ]case's latency is 534ms
[0 5 ]case's latency is 511ms
[0 6 ]case's latency is 510ms
[1 2 ]case's latency is 408ms
[1 3 ]case's latency is 289.00ms
[1 4 ]case's latency is 519ms
[1 5 ]case's latency is 520ms
[1 6 ]case's latency is 519ms
[2 3 ]case's latency is 296ms --> best case
[2 4 ]case's latency is 536ms
[2 5 ]case's latency is 550ms
[2 6 ]case's latency is 534ms
[3 4 ]case's latency is 403ms
[3 5 ]case's latency is 404ms
[3 6 ]case's latency is 408ms
[4 5 ]case's latency is 638ms  --> chosen at default delegation.
[4 6 ]case's latency is 628ms
[5 6 ]case's latency is 635ms
[END]...Choose_2's average latency is 476.86ms
N = 3
[0 1 2 ]case's latency is 280ms
[0 1 3 ]case's latency is 171.00ms
[0 1 4 ]case's latency is 389ms
[0 1 5 ]case's latency is 385ms
[0 1 6 ]case's latency is 391ms
[0 2 3 ]case's latency is 216ms
[0 2 4 ]case's latency is 470ms
[0 2 5 ]case's latency is 481ms
[0 2 6 ]case's latency is 454ms
[0 3 4 ]case's latency is 285ms
[0 3 5 ]case's latency is 287ms
[0 3 6 ]case's latency is 277ms
[0 4 5 ]case's latency is 526ms
[0 4 6 ]case's latency is 546ms
[0 5 6 ]case's latency is 517ms
[1 2 3 ]case's latency is 187ms
[1 2 4 ]case's latency is 407ms
[1 2 5 ]case's latency is 414ms
[1 2 6 ]case's latency is 408ms
[1 3 4 ]case's latency is 292ms
[1 3 5 ]case's latency is 293ms
[1 3 6 ]case's latency is 292ms
[1 4 5 ]case's latency is 529ms
[1 4 6 ]case's latency is 522ms
[1 5 6 ]case's latency is 524ms
[2 3 4 ]case's latency is 298ms
[2 3 5 ]case's latency is 301ms
[2 3 6 ]case's latency is 306ms
[2 4 5 ]case's latency is 535ms
[2 4 6 ]case's latency is 547ms
[2 5 6 ]case's latency is 549ms
[3 4 5 ]case's latency is 406ms
[3 4 6 ]case's latency is 402ms
[3 5 6 ]case's latency is 408ms
[4 5 6 ]case's latency is 640ms
[END]...Choose_3's average latency is 398.14ms
N = 4
[0 1 2 3 ]case's latency is 65.00ms
[0 1 2 4 ]case's latency is 287ms
[0 1 2 5 ]case's latency is 281ms
[0 1 2 6 ]case's latency is 277ms
[0 1 3 4 ]case's latency is 168ms
[0 1 3 5 ]case's latency is 169ms
[0 1 3 6 ]case's latency is 168ms
[0 1 4 5 ]case's latency is 390ms
[0 1 4 6 ]case's latency is 387ms
[0 1 5 6 ]case's latency is 389ms
[0 2 3 4 ]case's latency is 216ms
[0 2 3 5 ]case's latency is 232ms
[0 2 3 6 ]case's latency is 230ms
[0 2 4 5 ]case's latency is 456ms
[0 2 4 6 ]case's latency is 436ms
[0 2 5 6 ]case's latency is 480ms
[0 3 4 5 ]case's latency is 283ms
[0 3 4 6 ]case's latency is 277ms
[0 3 5 6 ]case's latency is 281ms
[0 4 5 6 ]case's latency is 534ms
[1 2 3 4 ]case's latency is 189ms
[1 2 3 5 ]case's latency is 188ms
[1 2 3 6 ]case's latency is 187ms
[1 2 4 5 ]case's latency is 410ms
[1 2 4 6 ]case's latency is 410ms
[1 2 5 6 ]case's latency is 405ms
[1 3 4 5 ]case's latency is 300ms
[1 3 4 6 ]case's latency is 292ms
[1 3 5 6 ]case's latency is 312ms
[1 4 5 6 ]case's latency is 525ms
[2 3 4 5 ]case's latency is 301ms
[2 3 4 6 ]case's latency is 313ms
[2 3 5 6 ]case's latency is 298ms
[2 4 5 6 ]case's latency is 543ms
[3 4 5 6 ]case's latency is 411ms
[END]...Choose_4's average latency is 316.86ms
N = 5
[0 1 2 3 4 ]case's latency is 66ms
[0 1 2 3 5 ]case's latency is 67ms
[0 1 2 3 6 ]case's latency is 64.00ms
[0 1 2 4 5 ]case's latency is 299ms
[0 1 2 4 6 ]case's latency is 285ms
[0 1 2 5 6 ]case's latency is 277ms
[0 1 3 4 5 ]case's latency is 178ms
[0 1 3 4 6 ]case's latency is 175ms
[0 1 3 5 6 ]case's latency is 177ms
[0 1 4 5 6 ]case's latency is 389ms
[0 2 3 4 5 ]case's latency is 193ms
[0 2 3 4 6 ]case's latency is 202ms
[0 2 3 5 6 ]case's latency is 177ms
[0 2 4 5 6 ]case's latency is 462ms
[0 3 4 5 6 ]case's latency is 286ms
[1 2 3 4 5 ]case's latency is 189ms
[1 2 3 4 6 ]case's latency is 190ms
[1 2 3 5 6 ]case's latency is 188ms
[1 2 4 5 6 ]case's latency is 412ms
[1 3 4 5 6 ]case's latency is 299ms
[2 3 4 5 6 ]case's latency is 332ms
[END]...Choose_5's average latency is 233.67ms
N = 6
[0 1 2 3 4 5 ]case's latency is 69ms
[0 1 2 3 4 6 ]case's latency is 67.00ms
[0 1 2 3 5 6 ]case's latency is 68ms
[0 1 2 4 5 6 ]case's latency is 292ms
[0 1 3 4 5 6 ]case's latency is 174ms
[0 2 3 4 5 6 ]case's latency is 225ms
[1 2 3 4 5 6 ]case's latency is 193ms
[END]...Choose_6's average latency is 155.43ms
N = 7
[0 1 2 3 4 5 6 ]case's latency is 70.00ms
[END]...Choose_7's average latency is 70.00ms
Minimum CAES's value : 64.00ms
Minimum CASE's N : 5
Minimum CASE's th : 2
Minimum CASE's combination : [0 1 2 3 6 ]

Object Detection in Android using front camera: the detected bounding boxes are drawn incorrectly

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

0.4.0

Custom code

No

OS platform and distribution

No response

Mobile device

No response

Python version

No response

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

When I move an object (initially placed at the center of the screen) from a far distance towards the camera lens, the left position of the bounding box gradually shifts to the right side of the screen instead of staying centered.

Here is the recording of the issue, https://drive.google.com/file/d/144zCu8yPXYSeVPKk4RjDTG40YOcbLAtX/view

This issue is reproducible in https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/android.

I have changed the camera lens to "LENS_FACING_FRONT" in the CameraFragment class. Additionally, to resolve the issue with the inverted (mirrored) coordinates, I have flipped the coordinates horizontally by adding the following code in the OverlayView class.


            val boundingBox = result.boundingBox

            val objectTop = boundingBox.top * scaleFactor
            val objectBottom = boundingBox.bottom * scaleFactor
            val objectOriginalLeft = boundingBox.left * scaleFactor
            val objectOriginalRight = boundingBox.right * scaleFactor
            val objectWidth = objectOriginalRight - objectOriginalLeft

             var objectLeft = width - objectOriginalLeft + objectWidth
             var objectRight = objectLeft + objectWidth
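
For reference, a minimal sketch of the standard horizontal mirror (plain Python with made-up numbers, independent of the example app's classes): mapping left -> width - right and right -> width - left keeps the box width constant, whereas adding the box width back in changes the result by twice the box width, which grows as the detection gets larger and may be related to the drift described above.

# Standard horizontal mirror of a bounding box inside a view of width view_width.
# Names are illustrative, not from the example app.
def mirror_box_horizontally(left, right, view_width):
    mirrored_left = view_width - right    # new left edge comes from the old right edge
    mirrored_right = view_width - left    # new right edge comes from the old left edge
    return mirrored_left, mirrored_right

# Example: a box spanning [100, 300] in a 1080-wide view mirrors to [780, 980];
# the box width (200) is preserved.
print(mirror_box_horizontally(100.0, 300.0, 1080.0))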

Standalone code to reproduce the issue

https://github.com/tensorflow/examples/tree/master/lite/examples/object_detection/android

Relevant log output

No response

TFLite FP16 with Core ML Delegate gets wrong result and has no speed up

1. System information

  • OS Platform and Distribution: iOS 17.2 (iPhone 13 Pro)
  • TensorFlow library: TensorFlowLiteSwift/CoreML ~> 0.0.1-nightly

2. Code

Please refer to Tensorflow Lite MiDaS iOS Example
I found that the Core ML delegate inference speed did not increase; it was the same as the FP32 model.

var options = Interpreter.Options()
var delegates: [Delegate] = [coreMLDelegate]
var interpreter = try Interpreter(modelPath:"MiDaS_FP16.tflite", options: options, delegates: delegates)
try interpreter.allocateTensors()
inputTensor = try interpreter.input(at: 0)
outputTensor = try interpreter.output(at: 0)
do {
    try interpreter.copy(data, toInputAt: 0)

    // Run inference by invoking the `Interpreter`.
    try interpreter.invoke()

    // Get the output `Tensor` to process the inference results.
    outputTensor = try interpreter.output(at: 0)
      

  } catch let error {
    os_log(
      "Failed to invoke the interpreter with error: %s", type: .error,
      error.localizedDescription)
    return
  }

3. Failure after conversion

  • The original image
    COCO_val2014_000000003837

  • Model produces results using Core ML Delegate.
    COCO_val2014_000000003837_ANE_1

  • Model produces results using Metal Delegate.
    COCO_val2014_000000003837_GPU_1

5. (optional) Any other info / logs

MiDaS Float 16 TFLite model download from my OneDrive

Efficientdet models do not work on hexagon delegate

System information
Tensorflow lite v2.16.1
Hexagon library 1.20.0.1

Standalone code to reproduce the issue
I generated efficientdet normal and lite int8 models from automl github.
https://github.com/google/automl/blob/master/efficientdet/tf2/tutorial.ipynb

The models work well on the CPU but do not work on the Hexagon delegate.
When I feed the same input values, the Hexagon delegate produces invalid output.

Any other info / logs
I attached the efficientdet lite0 int8 model as a sample.
https://1drv.ms/u/s!AnqHHtrBqwyUg8N31bJmQjMmrF9QnA?e=lmvIdI
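
As a side note, here is a minimal sketch of the kind of CPU-vs-delegate output comparison behind such a report (the model file name, the zero input, and the delegate library name are assumptions; the Hexagon delegate additionally requires its libraries to be present on the device):

import numpy as np
import tensorflow as tf

def run(model_path, delegates=None):
    # Build an interpreter, feed a fixed input, and return the first output tensor.
    interp = tf.lite.Interpreter(model_path=model_path, experimental_delegates=delegates)
    interp.allocate_tensors()
    inp = interp.get_input_details()[0]
    x = np.zeros(inp["shape"], dtype=inp["dtype"])
    interp.set_tensor(inp["index"], x)
    interp.invoke()
    return interp.get_tensor(interp.get_output_details()[0]["index"])

cpu_out = run("efficientdet_lite0_int8.tflite")
hex_out = run("efficientdet_lite0_int8.tflite",
              delegates=[tf.lite.experimental.load_delegate("libhexagon_delegate.so")])
print(np.max(np.abs(cpu_out.astype(np.float64) - hex_out.astype(np.float64))))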

Full Integer Quantization Issue with Multiple Signatures in TensorFlow Lite

1. System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):Ubuntu 22.04.3 LTS
  • TensorFlow installation (pip package or built from source):pip
  • TensorFlow library (version, if pip package or github SHA, if built from source):2.15.0

2. Code

To help reproduce this issue, I am providing a link to a custom Colab notebook:
Full Integer Quantization Issue with Multiple Signatures in TensorFlow Lite

3. Failure after conversion

In the dynamic range quantization process of TensorFlow Lite, it appears that for models with multiple signatures (including aliased ones), the quantization treats references to the same computational graph as a single entity. This is evidenced by the TFLite ModelAnalyzer report showing two subgraphs with identical sizes, yet the overall model size corresponds roughly to the size of a single subgraph. Specifically:

 TFLite ModelAnalyzer Output of  Dynamic Range Quantization:
              Model size:      56528 bytes
    Non-data buffer size:       3384 bytes (05.99 %)
  Total data buffer size:      53144 bytes (94.01 %)
          - Subgraph#0  :      53040 bytes (93.83 %)
          - Subgraph#1  :      53040 bytes (93.83 %)
    (Zero value buffers):          0 bytes (00.00 %)
    The total model size is not the sum of the two subgraphs, suggesting that the same subgraph is counted twice but only stored once.

However, the situation is markedly different in the full integer quantization process. Here, the quantization leads to two subgraphs with significantly different sizes, which indicates a distinct treatment of the computational graph segments during quantization. This behavior contrasts with the dynamic range quantization and suggests that the full integer quantization process might interpret or handle the aliased signatures differently, resulting in varied optimization or quantization strategies for the subgraphs. The detailed output is as follows:

 TFLite ModelAnalyzer Output of Full Integer Quantization:
              Model size:     259144 bytes
    Non-data buffer size:       4360 bytes (01.68 %)
  Total data buffer size:     254784 bytes (98.32 %)
          - Subgraph#0  :      51120 bytes (19.73 %)
          - Subgraph#1  :     203568 bytes (78.55 %)
    (Zero value buffers):          0 bytes (00.00 %)
    Here, the total model size reflects the sum of two distinctly sized subgraphs, highlighting a genuine differentiation in how each subgraph is quantized and stored.

This discrepancy between dynamic range and full integer quantization processes raises questions about the underlying mechanisms TensorFlow Lite employs for handling multiple signatures, especially when they reference the same computational graph segment. The difference in subgraph sizes under full integer quantization suggests that the process may inadvertently treat aliased signatures or multiple references as distinct computational entities, potentially leading to inefficiencies in model size and performance.
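
For anyone trying to reproduce the setup outside the notebook, here is a minimal sketch (the model, signature names, and path are made-up assumptions) of saving two signatures that alias the same function and analyzing the dynamic-range-quantized result; the full-integer path additionally sets a representative dataset and integer-only ops on the same converter:

import tensorflow as tf

class TwoSignatureModel(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec([1, 8], tf.float32)])
    def infer(self, x):
        return {"y": tf.nn.relu(x) * 2.0}

model = TwoSignatureModel()
# Two signature keys pointing at the same concrete function (aliased signatures).
signatures = {
    "serving_default": model.infer.get_concrete_function(),
    "alias": model.infer.get_concrete_function(),
}
tf.saved_model.save(model, "/tmp/two_sig_model", signatures=signatures)

converter = tf.lite.TFLiteConverter.from_saved_model("/tmp/two_sig_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # dynamic range quantization
tflite_model = converter.convert()

# Prints the per-subgraph size breakdown discussed above.
tf.lite.experimental.Analyzer.analyze(model_content=tflite_model)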

tvOS support (TFLite)

Issue type

Build/Install

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

source

TensorFlow version

2.9.3

Custom code

No

OS platform and distribution

macOS 14.4

Mobile device

tvOS 17.5.1

Python version

3.13

Bazel version

6.5.0

GCC/compiler version

Apple clang version 15.0.0 (clang-1500.1.0.2.5) Target: arm64-apple-darwin23.4.0 Thread model: posix InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

CUDA/cuDNN version

no

GPU model and memory

m1 pro

Current behavior?

no tvOS support

Standalone code to reproduce the issue

Due to unfamiliarity with the Bazel build system and the lack of available resources, I have attempted multiple times but have not successfully cross-compiled TensorFlow Lite to tvOS. I hope the official team can provide support for cross-compiling to tvOS.

Relevant log output

No response

about the official release schedule of "Play Services TFLite Java" version 16.2.0

System information

  • Android Device information (use adb shell getprop ro.build.fingerprint
    if possible):
    samsung/d2que/d2q:12/SP1A.210812.016/N975U1UES7HVF4:user/release-keys

  • TensorFlow Lite in Play Services SDK version (found in build.gradle):
    target sdk 34
    com.google.android.gms:play-services-tflite-java:16.2.0-beta02

  • Google Play Services version
    (Settings > Apps > Google Play Services > App details):
    24.26.32

Standalone code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate
the problem. If possible, please share a link to or attach code demonstrating
the problem.

I found that if I use com.google.android.gms:play-services-tflite, "libtensorflowlite_jni_gms_client.so" is included in the APK.
16 KB page alignment was not applied in version 16.1.0, but in version 16.2.0-beta02 it was applied to the 64-bit library.

However, the beta version is not safe to use.
So I would like to know the official release schedule for 16.2.0.

Thank you.

Any other info / logs
Include any logs or source code that would be helpful to diagnose the problem.
If including tracebacks, please include the full traceback. Large logs and files
should be attached.


TFLite selective builds using TF ops (flex delegate) for embedded linux on aarch64

Issue type

Feature Request

Have you reproduced the bug with TensorFlow Nightly?

No

Source

source

TensorFlow version

2.10 ... 2.15

Custom code

No

OS platform and distribution

Linux Ubuntu 22.04

Mobile device

Yocto based Linux running kernel 6.1.x

Python version

No response

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

Hi,

From what it is described in Select TensorFlow operators and Reduce TensorFlow Lite binary size, it is possible to generate reduced size binaries (minimal TFLite runtime + specific Flex ops) for Android, and it is also described how to build custom C/C++ shared libraries containing the Flex ops that are part of the given models during the build process.

When building the shared libs with models containing flex ops and elinux_aarch64 as the config, the artifacts are built at full size, including all the TF ops rather than only the ones selected from the model. The behavior can be reproduced by trying to build benchmark-model with flex (or even the standalone shared lib):

tmp/BUILD (change init_tensorflow visibility if needed, custom-model.tflite should be a model containing at least 1 flex op)

load("@org_tensorflow//tensorflow:tensorflow.bzl", "tf_cc_binary", "clean_dep")
load("@org_tensorflow//tensorflow/lite:build_def.bzl", "tflite_copts", "tflite_copts_warnings", "tflite_linkopts")
load("@org_tensorflow//tensorflow/lite/delegates/flex:build_def.bzl", "tflite_flex_shared_library")

tflite_flex_shared_library(
    name = "tensorflowlite_flex_dynamic",
    models=[
       ":custom-model.tflite",
    ],
)

cc_import(
    name = "libtensorflowlite_flex_dynamic",
    shared_library = ":tensorflowlite_flex_dynamic",
)

tf_cc_binary(
    name = "benchmark_model_plus_flex_dynamic",
    srcs = [
        "//tensorflow/lite/tools/benchmark:benchmark_plus_flex_main.cc",
    ],
    copts = tflite_copts() + tflite_copts_warnings(),
    linkopts = tflite_linkopts(),
    deps = [
        ":libtensorflowlite_flex_dynamic",
        "//tensorflow/lite/tools/benchmark:benchmark_tflite_model_lib",
        "//tensorflow/lite/testing:init_tensorflow",
        "//tensorflow/lite/tools:logging",
    ],
)

Build command

bazel build -c opt \
	--cxxopt=--std=c++17 \
	--config=monolithic \
	--config=elinux_aarch64 \
	--host_crosstool_top=@bazel_tools//tools/cpp:toolchain \
	--verbose_failures  \
	//tmp:tensorflowlite_flex_dynamic

I'd like to know if building it at full size is the expected behavior. If yes, I'd like to know whether enabling selective builds is part of the roadmap, as it seems like low-hanging fruit (it already works for the same arch on Android).


I experimented a little bit, partially succeeding (could build reduced size binaries by breaking plenty of bazel targets). I'm adding it to the issue with the hope that someone might find it helpful (disclaimer: take it with a pinch of salt as I've got skill issues with TF internals and the bazel build system).

  • Naively, find all targets that selectively depend on android or mobile configs and ensure that a branch includes elinux_aarch64 (usually the targets are portable_tensorflow_lib and portable_tensorflow_lib_lite). This can be done by adding elinux_aarch64 as part of the mobile configuration and/or the if_* functions wrapping select(). Long story short, replicate the dependencies used for the android config (where it makes sense). Ensure that IS_MOBILE_PLATFORM is defined for portable_tensorflow_lib_lite.

  • Caveats: the python tool to generate the ops_to_register.h header (print_selective_registration_header) is built natively, and there are some conflicts. One of them is a double registration issue (maybe related to some diamond dependency problem?). The other is related to some missing shared libs used by the tool. The only way I found to fix this is changing dependencies of targets from tensorflow/python/*/BUILD. Of course, this affects other dependent targets. I also tried to duplicate targets with modified names to avoid breaking the original ones, but I was unsuccessful.

If relevant, I can try to share the full patch. Just remember the changes will affect building other targets.

Some size results for libtensorflowlite_flex.so

(opt, all)       -> 99 MB
(opt, selected)  -> 7 MB
(dbg, all)       -> 5.2 GB
(dbg, selected)  -> 2.1 GB

Standalone code to reproduce the issue

Check the *Current behavior* section. Generate the tmp/BUILD file, ensure a model with flex ops is present, and that the name in the `models` argument of `tflite_flex_shared_library` is correct.

Relevant log output

No response

Tflite model fails on Android GPU delegate due to 'null pointer dereference'

Hi there,

My tflite model fails on GPU delegate, using Android. Without the GPU delegate the model works fine using CPU.

I cut the model down until I found the point at which it crashes. It seems to be at the node shown in the screenshot attached to the original issue.

Models used:
Model succeeding: https://www.dropbox.com/scl/fi/iz96vmzrr99wpl40mr5pv/till480_7_7_float32-works.tflite?rlkey=ofqjleukotz1wnbmzovb5cddm&st=ww59mctf&dl=0
Model failing: https://www.dropbox.com/scl/fi/2ne8owmufj7ks3mcki818/960_7_7_float32-crashes.tflite?rlkey=ij49ion6d3xjt5lp0fapp2n2r&st=x6ftxmbo&dl=0

I'm using tensorflow 2.16.1 and onnx2tf 1.22.4:
implementation 'org.tensorflow:tensorflow-lite:2.16.1'
implementation 'org.tensorflow:tensorflow-lite-gpu:2.16.1'
implementation 'org.tensorflow:tensorflow-lite-gpu-api:2.16.1'

Any idea what could be wrong or how I can debug this further?

Thanks for your help!
Best regards,
Ramon

Full stacktrace:
I/GPU (10563): org.tensorflow.lite.gpu.GpuDelegate$Options@3e97741
I/GPU (10563): GPU is supported and will be used for inference.
I/tflite (10563): Replacing 299 out of 299 node(s) with delegate (TfLiteGpuDelegateV2) node, yielding 1 partitions for the whole graph.
F/libc (10563): Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x18 in tid 11103 (background), pid 10563 (g_app)


Build fingerprint: 'samsung/a52qnseea/a52q:13/TP1A.220624.014/A525FXXS6DWG2:user/release-keys'
Revision: '8'
ABI: 'arm64'
Processor: '7'
Timestamp: 2024-06-30 17:23:08.630221214+0200
Process uptime: 137s
Cmdline: com.example.debug_app
pid: 10563, tid: 11103, name: background >>> com.example.debug_app <<<
uid: 10448
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0000000000000018
Cause: null pointer dereference
x0 b400007a9f865700 x1 0000000000000079 x2 0000000000000000 x3 0000000000000004
x4 0000000000000000 x5 0000000000000000 x6 0000000000000000 x7 0000000000000000
x8 0000000000000004 x9 0000000000000001 x10 b400007a5983a370 x11 380c88acc82776a5
x12 000000000000001f x13 0000000000000001 x14 0000007a6a05251e x15 0000007a6ad1f0f4
x16 0000000000000000 x17 00000000000000ff x18 0000007a64a28000 x19 0000007b021f6000
x20 0000000000000000 x21 b400007a9f865700 x22 0000000000000079 x23 0000000000000000
x24 0000007b021f17f0 x25 0000000000000002 x26 0000000000000004 x27 0000007b021f6000
x28 0000000000000079 x29 0000007b021f1410
lr 0000000000000000 sp 0000007b021f12f0 pc 0000007a6acd49f0 pst 0000000080001000
backtrace:
#00 pc 0000000000dcc9f0 /vendor/lib64/libllvm-qcom.so (llvm::SelectionDAG::getNode(unsigned int, llvm::DebugLoc, llvm::EVT, llvm::SDValue)+64) (BuildId: 90db7b70a33a3223117d01259e5d89fd)
#1 pc 0000000000e172a0 /vendor/lib64/libllvm-qcom.so (getCopyFromParts(llvm::SelectionDAG&, llvm::DebugLoc, llvm::SDValue const*, unsigned int, llvm::EVT, llvm::EVT, llvm::ISD::NodeType)+2952) (BuildId: 90db7b70a33a3223117d01259e5d89fd)
#2 pc 0000000000e1a1a0 /vendor/lib64/libllvm-qcom.so (llvm::SelectionDAGISel::LowerArguments(llvm::BasicBlock const*)+3840) (BuildId: 90db7b70a33a3223117d01259e5d89fd)
#3 pc 0000000000e251e8 /vendor/lib64/libllvm-qcom.so (llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&)+1336) (BuildId: 90db7b70a33a3223117d01259e5d89fd)
#4 pc 0000000000e2449c /vendor/lib64/libllvm-qcom.so (llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&)+1420) (BuildId: 90db7b70a33a3223117d01259e5d89fd)
#5 pc 000000000092942c /vendor/lib64/libllvm-qcom.so (llvm::FPPassManager::runOnFunction(llvm::Function&)+884) (BuildId: 90db7b70a33a3223117d01259e5d89fd)
#6 pc 0000000000928bc8 /vendor/lib64/libllvm-qcom.so (llvm::FunctionPassManagerImpl::run(llvm::Function&)+192) (BuildId: 90db7b70a33a3223117d01259e5d89fd)
#7 pc 00000000009289f8 /vendor/lib64/libllvm-qcom.so (llvm::FunctionPassManager::run(llvm::Function&)+104) (BuildId: 90db7b70a33a3223117d01259e5d89fd)
#8 pc 0000000000f1a1cc /vendor/lib64/libllvm-qcom.so (llvm::llclib::CompileInComplexPipeline(llvm::Module&, llvm::Triple const&, llvm::TargetMachine&, void* ()(unsigned int), llvm::PassOverrides&, llvm::RunnableListType<llvm::RunnableListTypellvm::OptManager >, llvm::CLPrintfInterpreter const*, llvm::TimeRegion*, llvm::Module*, llvm::formatted_raw_ostream&, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >&, char**, unsigned int&)+2164) (BuildId: 90db7b70a33a3223117d01259e5d89fd)
#9 pc 0000000000f1b05c /vendor/lib64/libllvm-qcom.so (llvm::llclib::Compile(llvm::Module*, void* ()(unsigned int), char**, unsigned int&, llvm::Module, llvm::CLPrintfInterpreter const*)+2460) (BuildId: 90db7b70a33a3223117d01259e5d89fd)
#10 pc 0000000001ac905c /vendor/lib64/libllvm-qcom.so (clang::clanglib::Codegen(llvm::MemoryBuffer*, cl_compiler_target, std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > const&, llvm::OwningArrayPtr&, unsigned int&, cl_rs_compiler_info*)+1052) (BuildId: 90db7b70a33a3223117d01259e5d89fd)
#11 pc 0000000001aeb554 /vendor/lib64/libllvm-qcom.so ((anonymous namespace)::BasicCompilation::link()+4044) (BuildId: 90db7b70a33a3223117d01259e5d89fd)
#12 pc 0000000001ae5dfc /vendor/lib64/libllvm-qcom.so (cl_compiler_link_program+492) (BuildId: 90db7b70a33a3223117d01259e5d89fd)
#13 pc 000000000027b52c /vendor/lib64/libCB.so (cl_program_link_immediate+956) (BuildId: 4a1aa94f6bc5d4e8fb2d539d1411636d)
#14 pc 000000000027a5d0 /vendor/lib64/libCB.so (cl_program_build_immediate+272) (BuildId: 4a1aa94f6bc5d4e8fb2d539d1411636d)
#15 pc 0000000000280448 /vendor/lib64/libCB.so (cb_build_program+1072) (BuildId: 4a1aa94f6bc5d4e8fb2d539d1411636d)
#16 pc 0000000000013360 /vendor/lib64/libOpenCL.so (qCLDrvAPI_clBuildProgram+112) (BuildId: c2a2acac9160da02966c1071cb85246e)
#17 pc 000000000013c1b8 /data/app/~~sh8Q2LYw9jz26oygi1RiqQ==/com.example.debug_app-0VmdEVI_oHQgAjGD37PnJQ==/lib/arm64/libtensorflowlite_gpu_jni.so (BuildId: 3b36f80b867134f597c198e81d9bec61)
#18 pc 000000000013c0e4 /data/app/~~sh8Q2LYw9jz26oygi1RiqQ==/com.example.debug_app-0VmdEVI_oHQgAjGD37PnJQ==/lib/arm64/libtensorflowlite_gpu_jni.so (BuildId: 3b36f80b867134f597c198e81d9bec61)
#19 pc 000000000013782c /data/app/~~sh8Q2LYw9jz26oygi1RiqQ==/com.example.debug_app-0VmdEVI_oHQgAjGD37PnJQ==/lib/arm64/libtensorflowlite_gpu_jni.so (BuildId: 3b36f80b867134f597c198e81d9bec61)
#20 pc 00000000000e6e7c /data/app/~~sh8Q2LYw9jz26oygi1RiqQ==/com.example.debug_app-0VmdEVI_oHQgAjGD37PnJQ==/lib/arm64/libtensorflowlite_gpu_jni.so (BuildId: 3b36f80b867134f597c198e81d9bec61)
#21 pc 00000000000dd4a0 /data/app/~~sh8Q2LYw9jz26oygi1RiqQ==/com.example.debug_app-0VmdEVI_oHQgAjGD37PnJQ==/lib/arm64/libtensorflowlite_gpu_jni.so (BuildId: 3b36f80b867134f597c198e81d9bec61)
#22 pc 00000000000dcf88 /data/app/~~sh8Q2LYw9jz26oygi1RiqQ==/com.example.debug_app-0VmdEVI_oHQgAjGD37PnJQ==/lib/arm64/libtensorflowlite_gpu_jni.so (BuildId: 3b36f80b867134f597c198e81d9bec61)
#23 pc 00000000000dcd60 /data/app/~~sh8Q2LYw9jz26oygi1RiqQ==/com.example.debug_app-0VmdEVI_oHQgAjGD37PnJQ==/lib/arm64/libtensorflowlite_gpu_jni.so (BuildId: 3b36f80b867134f597c198e81d9bec61)
#24 pc 00000000000d7304 /data/app/~~sh8Q2LYw9jz26oygi1RiqQ==/com.example.debug_app-0VmdEVI_oHQgAjGD37PnJQ==/lib/arm64/libtensorflowlite_gpu_jni.so (BuildId: 3b36f80b867134f597c198e81d9bec61)
#25 pc 00000000000ad7b8 /data/app/~~sh8Q2LYw9jz26oygi1RiqQ==/com.example.debug_app-0VmdEVI_oHQgAjGD37PnJQ==/lib/arm64/libtensorflowlite_gpu_jni.so (BuildId: 3b36f80b867134f597c198e81d9bec61)
#26 pc 00000000000ad22c /data/app/~~sh8Q2LYw9jz26oygi1RiqQ==/com.example.debug_app-0VmdEVI_oHQgAjGD37PnJQ==/lib/arm64/libtensorflowlite_gpu_jni.so (BuildId: 3b36f80b867134f597c198e81d9bec61)
#27 pc 00000000000ae204 /data/app/~~sh8Q2LYw9jz26oygi1RiqQ==/com.example.debug_app-0VmdEVI_oHQgAjGD37PnJQ==/lib/arm64/libtensorflowlite_gpu_jni.so (BuildId: 3b36f80b867134f597c198e81d9bec61)
#28 pc 00000000002febd8 /data/app/~~sh8Q2LYw9jz26oygi1RiqQ==/com.example.debug_app-0VmdEVI_oHQgAjGD37PnJQ==/lib/arm64/libtensorflowlite_jni.so (BuildId: 385d6d92f29b4b4cb3ffca33758d3471)
#29 pc 00000000002fe5bc /data/app/~~sh8Q2LYw9jz26oygi1RiqQ==/com.example.debug_app-0VmdEVI_oHQgAjGD37PnJQ==/lib/arm64/libtensorflowlite_jni.so (BuildId: 385d6d92f29b4b4cb3ffca33758d3471)
#30 pc 00000000002fe1d8 /data/app/~~sh8Q2LYw9jz26oygi1RiqQ==/com.example.debug_app-0VmdEVI_oHQgAjGD37PnJQ==/lib/arm64/libtensorflowlite_jni.so (BuildId: 385d6d92f29b4b4cb3ffca33758d3471)
#31 pc 00000000000aa7f0 /data/app/~~sh8Q2LYw9jz26oygi1RiqQ==/com.example.debug_app-0VmdEVI_oHQgAjGD37PnJQ==/lib/arm64/libtensorflowlite_gpu_jni.so (BuildId: 3b36f80b867134f597c198e81d9bec61)
#32 pc 0000000000302f98 /data/app/~~sh8Q2LYw9jz26oygi1RiqQ==/com.example.debug_app-0VmdEVI_oHQgAjGD37PnJQ==/lib/arm64/libtensorflowlite_jni.so (BuildId: 385d6d92f29b4b4cb3ffca33758d3471)
#33 pc 000000000030353c /data/app/~~sh8Q2LYw9jz26oygi1RiqQ==/com.example.debug_app-0VmdEVI_oHQgAjGD37PnJQ==/lib/arm64/libtensorflowlite_jni.so (BuildId: 385d6d92f29b4b4cb3ffca33758d3471)
#34 pc 00000000002f6370 /data/app/~~sh8Q2LYw9jz26oygi1RiqQ==/com.example.debug_app-0VmdEVI_oHQgAjGD37PnJQ==/lib/arm64/libtensorflowlite_jni.so (BuildId: 385d6d92f29b4b4cb3ffca33758d3471)
#35 pc 00000000002f914c /data/app/~~sh8Q2LYw9jz26oygi1RiqQ==/com.example.debug_app-0VmdEVI_oHQgAjGD37PnJQ==/lib/arm64/libtensorflowlite_jni.so (BuildId: 385d6d92f29b4b4cb3ffca33758d3471)
#36 pc 00000000002f9668 /data/app/~~sh8Q2LYw9jz26oygi1RiqQ==/com.example.debug_app-0VmdEVI_oHQgAjGD37PnJQ==/lib/arm64/libtensorflowlite_jni.so (BuildId: 385d6d92f29b4b4cb3ffca33758d3471)
#37 pc 000000000007f4d8 /data/app/~~sh8Q2LYw9jz26oygi1RiqQ==/com.example.debug_app-0VmdEVI_oHQgAjGD37PnJQ==/lib/arm64/libtensorflowlite_jni.so (Java_org_tensorflow_lite_NativeInterpreterWrapper_createInterpreter+692) (BuildId: 385d6d92f29b4b4cb3ffca33758d3471)
#38 pc 0000000000355830 /apex/com.android.art/lib64/libart.so (art_quick_generic_jni_trampoline+144) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#39 pc 000000000033f080 /apex/com.android.art/lib64/libart.so (art_quick_invoke_static_stub+640) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#40 pc 000000000037dde8 /apex/com.android.art/lib64/libart.so (art::interpreter::ArtInterpreterToCompiledCodeBridge(art::Thread*, art::ArtMethod*, art::ShadowFrame*, unsigned short, art::JValue*)+416) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#41 pc 000000000037d598 /apex/com.android.art/lib64/libart.so (bool art::interpreter::DoCall(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, bool, art::JValue*)+1960) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#42 pc 000000000049a6d8 /apex/com.android.art/lib64/libart.so (void art::interpreter::ExecuteSwitchImplCpp(art::interpreter::SwitchImplContext*)+14012) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#43 pc 0000000000357fd8 /apex/com.android.art/lib64/libart.so (ExecuteSwitchImplAsm+8) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#44 pc 000000000006e220 [anon:dalvik-classes7.dex extracted in memory from /data/app/~~sh8Q2LYw9jz26oygi1RiqQ==/com.example.debug_app-0VmdEVI_oHQgAjGD37PnJQ==/base.apk!classes7.dex] (org.tensorflow.lite.NativeInterpreterWrapper.init+0)
#45 pc 0000000000374120 /apex/com.android.art/lib64/libart.so (art::interpreter::Execute(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool, bool) (.__uniq.112435418011751916792819755956732575238.llvm.420609892041422114)+232) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#46 pc 000000000037db04 /apex/com.android.art/lib64/libart.so (art::interpreter::ArtInterpreterToInterpreterBridge(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame*, art::JValue*)+100) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#47 pc 000000000037d534 /apex/com.android.art/lib64/libart.so (bool art::interpreter::DoCall(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, bool, art::JValue*)+1860) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#48 pc 0000000000499b48 /apex/com.android.art/lib64/libart.so (void art::interpreter::ExecuteSwitchImplCpp(art::interpreter::SwitchImplContext*)+11052) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#49 pc 0000000000357fd8 /apex/com.android.art/lib64/libart.so (ExecuteSwitchImplAsm+8) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#50 pc 000000000006df38 [anon:dalvik-classes7.dex extracted in memory from /data/app/~~sh8Q2LYw9jz26oygi1RiqQ==/com.example.debug_app-0VmdEVI_oHQgAjGD37PnJQ==/base.apk!classes7.dex] (org.tensorflow.lite.NativeInterpreterWrapper.+0)
#51 pc 0000000000374120 /apex/com.android.art/lib64/libart.so (art::interpreter::Execute(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool, bool) (.__uniq.112435418011751916792819755956732575238.llvm.420609892041422114)+232) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#52 pc 0000000000511d1c /apex/com.android.art/lib64/libart.so (bool art::interpreter::DoCall(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, bool, art::JValue*)+5252) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#53 pc 0000000000497814 /apex/com.android.art/lib64/libart.so (void art::interpreter::ExecuteSwitchImplCpp(art::interpreter::SwitchImplContext*)+2040) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#54 pc 0000000000357fd8 /apex/com.android.art/lib64/libart.so (ExecuteSwitchImplAsm+8) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#55 pc 000000000006d8f8 [anon:dalvik-classes7.dex extracted in memory from /data/app/~~sh8Q2LYw9jz26oygi1RiqQ==/com.example.debug_app-0VmdEVI_oHQgAjGD37PnJQ==/base.apk!classes7.dex] (org.tensorflow.lite.NativeInterpreterWrapperExperimental.+0)
#56 pc 0000000000374120 /apex/com.android.art/lib64/libart.so (art::interpreter::Execute(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool, bool) (.__uniq.112435418011751916792819755956732575238.llvm.420609892041422114)+232) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#57 pc 0000000000511d1c /apex/com.android.art/lib64/libart.so (bool art::interpreter::DoCall(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, bool, art::JValue*)+5252) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#58 pc 0000000000497814 /apex/com.android.art/lib64/libart.so (void art::interpreter::ExecuteSwitchImplCpp(art::interpreter::SwitchImplContext*)+2040) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#59 pc 0000000000357fd8 /apex/com.android.art/lib64/libart.so (ExecuteSwitchImplAsm+8) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#60 pc 000000000006d728 [anon:dalvik-classes7.dex extracted in memory from /data/app/~~sh8Q2LYw9jz26oygi1RiqQ==/com.example.debug_app-0VmdEVI_oHQgAjGD37PnJQ==/base.apk!classes7.dex] (org.tensorflow.lite.Interpreter.+0)
#61 pc 0000000000374120 /apex/com.android.art/lib64/libart.so (art::interpreter::Execute(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool, bool) (.__uniq.112435418011751916792819755956732575238.llvm.420609892041422114)+232) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#62 pc 0000000000511d1c /apex/com.android.art/lib64/libart.so (bool art::interpreter::DoCall(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, bool, art::JValue*)+5252) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#63 pc 0000000000497814 /apex/com.android.art/lib64/libart.so (void art::interpreter::ExecuteSwitchImplCpp(art::interpreter::SwitchImplContext*)+2040) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#64 pc 0000000000357fd8 /apex/com.android.art/lib64/libart.so (ExecuteSwitchImplAsm+8) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#65 pc 0000000000006fc4 [anon:dalvik-classes5.dex extracted in memory from /data/app/~~sh8Q2LYw9jz26oygi1RiqQ==/com.example.debug_app-0VmdEVI_oHQgAjGD37PnJQ==/base.apk!classes5.dex] (com.example.debug_app.detection.TfliteEdgeyoloRunner.+0)
#66 pc 0000000000374120 /apex/com.android.art/lib64/libart.so (art::interpreter::Execute(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool, bool) (.__uniq.112435418011751916792819755956732575238.llvm.420609892041422114)+232) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#67 pc 0000000000511d1c /apex/com.android.art/lib64/libart.so (bool art::interpreter::DoCall(art::ArtMethod*, art::Thread*, art::ShadowFrame&, art::Instruction const*, unsigned short, bool, art::JValue*)+5252) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#68 pc 0000000000497814 /apex/com.android.art/lib64/libart.so (void art::interpreter::ExecuteSwitchImplCpp(art::interpreter::SwitchImplContext*)+2040) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#69 pc 0000000000357fd8 /apex/com.android.art/lib64/libart.so (ExecuteSwitchImplAsm+8) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#70 pc 00000000000049a0 [anon:dalvik-classes5.dex extracted in memory from /data/app/~~sh8Q2LYw9jz26oygi1RiqQ==/com.example.debug_app-0VmdEVI_oHQgAjGD37PnJQ==/base.apk!classes5.dex] (com.example.debug_app.detection.Detector$detectInImage$backgroundThread$1.run+0)
#71 pc 0000000000374120 /apex/com.android.art/lib64/libart.so (art::interpreter::Execute(art::Thread*, art::CodeItemDataAccessor const&, art::ShadowFrame&, art::JValue, bool, bool) (.__uniq.112435418011751916792819755956732575238.llvm.420609892041422114)+232) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#72 pc 0000000000373a18 /apex/com.android.art/lib64/libart.so (artQuickToInterpreterBridge+964) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#73 pc 0000000000355968 /apex/com.android.art/lib64/libart.so (art_quick_to_interpreter_bridge+88) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#74 pc 000000000033eda4 /apex/com.android.art/lib64/libart.so (art_quick_invoke_stub+612) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#75 pc 0000000000239d54 /apex/com.android.art/lib64/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+144) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#76 pc 000000000053a1b0 /apex/com.android.art/lib64/libart.so (art::Thread::CreateCallback(void*)+1600) (BuildId: 735f12f804f88d62a2cb437261076ff7)
#77 pc 00000000000f5298 /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+208) (BuildId: 1bcad8bca80d38bceb9089f70d394e33)
#78 pc 000000000008ebdc /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68) (BuildId: 1bcad8bca80d38bceb9089f70d394e33)
Lost connection to device.

Exited.

App Storage Size Increases with CoreML or Metal Usage on iOS

Issue type

Bug

Have you reproduced the bug with TensorFlow Nightly?

Yes

Source

source

TensorFlow version

0.0.1-nightly

Custom code

No

OS platform and distribution

iOS 17.5.1

Mobile device

iPhone 13 mini

Python version

No response

Bazel version

No response

GCC/compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current behavior?

I'm encountering an issue where the app's storage size increases each time an AI model is loaded, and the storage doesn't decrease afterward. Specifically, I'm using the PoseNet TensorFlow Lite model on iPhone to demonstrate the problem.

For more detail, check out this StackOverflow post.

Standalone code to reproduce the issue

Run the [TensorFlow Lite Pose Estimation iOS Demo](https://github.com/tensorflow/examples/tree/master/lite/examples/pose_estimation/ios#tensorflow-lite-pose-estimation-ios-demo) using `Metal` or `CoreML`, and you'll notice the app's storage size increasing each time the pose detection model is executed.

Relevant log output

No response
