lithium0003 / whisper_for_ios

License: MIT License

Python 27.69% Swift 72.31%

whisper_for_ios's Issues

Man, awesome job! I like you ❤️

👍

Always translates into English

Transcription does not always work. At some point, speech in any language starts being automatically translated into English, and this cannot be turned off.

What's your conversion environment?

Hi @lithium0003,
I tried to run convert.py, but it failed with the following messages:

ModelDimensions(n_mels=80, n_audio_ctx=1500, n_audio_state=768, n_audio_head=12, n_audio_layer=12, n_vocab=51865, n_text_ctx=448, n_text_state=768, n_text_head=12, n_text_layer=12)
/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/whisper/model.py:152: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert x.shape[1:] == self.positional_embedding.shape, "incorrect audio shape"
/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/whisper/model.py:90: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  scale = (n_state // self.n_head) ** -0.25
Converting PyTorch Frontend ==> MIL Ops: 100%|███████████████████████████████████████████████████████████████████████████████████████████████▉| 1025/1026 [00:00<00:00, 3627.16 ops/s]
Running MIL Common passes: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 38/38 [00:00<00:00, 78.42 passes/s]
Running MIL FP16ComputePrecision pass: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.92 passes/s]
Running MIL Clean up passes: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:01<00:00,  6.39 passes/s]
ModelDimensions(n_mels=80, n_audio_ctx=1500, n_audio_state=768, n_audio_head=12, n_audio_layer=12, n_vocab=51865, n_text_ctx=448, n_text_state=768, n_text_head=12, n_text_layer=12)
Converting PyTorch Frontend ==> MIL Ops: 100%|███████████████████████████████████████████████████████████████████████████████████████████████▌| 1657/1664 [00:00<00:00, 1922.04 ops/s]
Traceback (most recent call last):
  File "/Volumes/Coding/whisper_for_iOS/convert.py", line 52, in <module>
    convert_decoder(size)
  File "/Volumes/Coding/whisper_for_iOS/convert.py", line 37, in convert_decoder
    model = ct.convert(
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/coremltools/converters/_converters_entry.py", line 451, in convert
    mlmodel = mil_convert(
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 193, in mil_convert
    return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 220, in _mil_convert
    proto, mil_program = mil_convert_to_proto(
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 285, in mil_convert_to_proto
    prog = frontend_converter(model, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 115, in __call__
    return load(*args, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 53, in load
    return _perform_torch_convert(converter, debug)
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 100, in _perform_torch_convert
    raise e
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 92, in _perform_torch_convert
    prog = converter.convert()
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 269, in convert
    convert_nodes(self.context, self.graph)
  File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 92, in convert_nodes
    raise RuntimeError(
RuntimeError: PyTorch convert function for op 'numpy_t' not implemented.

I think the issue is caused by a different build environment. My machine is an M1 Max running macOS Ventura 13.1, and I use conda to manage the Python environment:

name: whipser-iOS
channels:
  - defaults
dependencies:
  - python=3.9.*
  - pip
  - pip:
    - whisper
    - numpy==1.23.4
    - coremltools==5.*
    - torch==1.12.1
    - torchvision==0.13.1
    - torchaudio==0.12.1
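
A note on the `numpy_t` error above: in TorchScript traces, the `.T` attribute on a tensor is recorded as `aten::numpy_T`, which is likely the op the converter reports as unimplemented. Where that `.T` sits in the traced decoder is not visible in the traceback, so the following is only a sketch under that assumption. `ProjWithT` and `ProjWithTranspose` are hypothetical stand-in modules, not code from whisper or convert.py. They show that swapping `.T` for `torch.transpose(w, 0, 1)` on a 2-D weight gives the same result while tracing to an ordinary transpose op.

```python
import torch
import torch.nn as nn


class ProjWithT(nn.Module):
    """Hypothetical stand-in: projects x with the transposed weight via `.T`."""

    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(16, 4))

    def forward(self, x):
        return x @ self.weight.T


class ProjWithTranspose(ProjWithT):
    """Same computation, but with an explicit transpose instead of `.T`."""

    def forward(self, x):
        return x @ torch.transpose(self.weight, 0, 1)


with_t = ProjWithT()
with_transpose = ProjWithTranspose()
# Copy the weights so both modules compute on identical parameters.
with_transpose.load_state_dict(with_t.state_dict())

x = torch.randn(2, 4)
out_t = with_t(x)

# Trace the rewritten module and run it on the same input.
traced_transpose = torch.jit.trace(with_transpose, x)
out_transpose = traced_transpose(x)
```

If the offending `.T` lives inside an installed package rather than in your own script, editing it is a local patch; trying a different coremltools release is another option, though which releases handle this op is not something the log confirms.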

Decoder unable to utilize GPU/Neural Engine

After converting the model, you can select it in Xcode and benchmark it in the Performance tab.
I found that encoder.mlpackage can run on the GPU/Neural Engine.

But decoder.mlpackage appears unable to run on the GPU/Neural Engine, even though I'm pretty sure I set the compute units to ALL.

This is quite important for performance. Maybe you have some idea why?
Looking forward to your reply.

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

The error occurs on this line:

--> traced_model = torch.jit.trace(model.encoder, input_mel)

ModelDimensions(n_mels=80, n_audio_ctx=1500, n_audio_state=768, n_audio_head=12, n_audio_layer=12, n_vocab=51865, n_text_ctx=448, n_text_state=768, n_text_head=12, n_text_layer=12)

RuntimeError Traceback (most recent call last)
in
48
49 size = "small"
---> 50 convert_encoder(size)
51 convert_decoder(size)

10 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py in _conv_forward(self, input, weight, bias)
302 _single(0), self.dilation, self.groups)
303 return F.conv1d(input, weight, bias, self.stride,
--> 304 self.padding, self.dilation, self.groups)
305
306 def forward(self, input: Tensor) -> Tensor:

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
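
The message says the input tensor is on the CPU while the model weights are on CUDA, which typically happens when the model is loaded on a GPU runtime but the dummy input is created with default (CPU) placement. Below is a minimal sketch of one way around it, assuming the goal is only to trace the model for conversion: keep both on the same device, here the CPU. The `Conv1d` is a hypothetical stand-in for the encoder's first layer, and `to_module_device` is an illustrative helper, not part of convert.py.

```python
import torch
import torch.nn as nn


def to_module_device(module: nn.Module, tensor: torch.Tensor) -> torch.Tensor:
    """Return `tensor` on the same device as the module's first parameter."""
    return tensor.to(next(module.parameters()).device)


# Stand-in for model.encoder; 80 input channels matches n_mels=80 in the log.
# .cpu() pins the weights to the CPU, .eval() disables training-only behavior.
encoder_like = nn.Conv1d(80, 8, kernel_size=3, padding=1).cpu().eval()

# Dummy mel input, moved to wherever the module's weights live.
input_mel = torch.randn(1, 80, 100)
input_mel = to_module_device(encoder_like, input_mel)

# With both on the same device, tracing no longer hits the type mismatch.
traced_model = torch.jit.trace(encoder_like, input_mel)
output = traced_model(input_mel)
```

In the script from the issue, the equivalent would be calling `.cpu()` on the loaded model before tracing, or moving `input_mel` to the model's device with `.to(...)`.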
