lithium0003 / whisper_for_ios
License: MIT License
Transcription does not always work. At some point, every language is automatically translated into English, and this cannot be turned off.
Hi @lithium0003,
I tried to run convert.py, but it failed with the following messages:
ModelDimensions(n_mels=80, n_audio_ctx=1500, n_audio_state=768, n_audio_head=12, n_audio_layer=12, n_vocab=51865, n_text_ctx=448, n_text_state=768, n_text_head=12, n_text_layer=12)
/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/whisper/model.py:152: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert x.shape[1:] == self.positional_embedding.shape, "incorrect audio shape"
/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/whisper/model.py:90: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
scale = (n_state // self.n_head) ** -0.25
Converting PyTorch Frontend ==> MIL Ops: 100%|██████████| 1025/1026 [00:00<00:00, 3627.16 ops/s]
Running MIL Common passes: 100%|██████████| 38/38 [00:00<00:00, 78.42 passes/s]
Running MIL FP16ComputePrecision pass: 100%|██████████| 1/1 [00:00<00:00, 1.92 passes/s]
Running MIL Clean up passes: 100%|██████████| 11/11 [00:01<00:00, 6.39 passes/s]
ModelDimensions(n_mels=80, n_audio_ctx=1500, n_audio_state=768, n_audio_head=12, n_audio_layer=12, n_vocab=51865, n_text_ctx=448, n_text_state=768, n_text_head=12, n_text_layer=12)
Converting PyTorch Frontend ==> MIL Ops: 100%|██████████| 1657/1664 [00:00<00:00, 1922.04 ops/s]
Traceback (most recent call last):
File "/Volumes/Coding/whisper_for_iOS/convert.py", line 52, in <module>
convert_decoder(size)
File "/Volumes/Coding/whisper_for_iOS/convert.py", line 37, in convert_decoder
model = ct.convert(
File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/coremltools/converters/_converters_entry.py", line 451, in convert
mlmodel = mil_convert(
File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 193, in mil_convert
return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)
File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 220, in _mil_convert
proto, mil_program = mil_convert_to_proto(
File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 285, in mil_convert_to_proto
prog = frontend_converter(model, **kwargs)
File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/coremltools/converters/mil/converter.py", line 115, in __call__
return load(*args, **kwargs)
File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 53, in load
return _perform_torch_convert(converter, debug)
File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 100, in _perform_torch_convert
raise e
File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 92, in _perform_torch_convert
prog = converter.convert()
File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 269, in convert
convert_nodes(self.context, self.graph)
File "/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 92, in convert_nodes
raise RuntimeError(
RuntimeError: PyTorch convert function for op 'numpy_t' not implemented.
I think the issue is caused by a different build environment. My machine is an M1 Max running macOS Ventura 13.1, and I use conda to manage the Python environment:
name: whipser-iOS
channels:
- defaults
dependencies:
- python=3.9.*
- pip
- pip:
- whisper
- numpy==1.23.4
- coremltools==5.*
- torch==1.12.1
- torchvision==0.13.1
- torchaudio==0.12.1
After converting the model, select it in Xcode and open the Performance tab to benchmark it.
I found that encoder.mlpackage can run on the GPU/Neural Engine.
But decoder.mlpackage does not seem to be able to run on the GPU/Neural Engine, even though I am pretty sure I set the compute units to ALL.
This is quite important for performance; maybe you have some idea why.
Looking forward to your reply.
I get an error on this line:
--> traced_model = torch.jit.trace(model.encoder, input_mel)
RuntimeError Traceback (most recent call last)
in
48
49 size = "small"
---> 50 convert_encoder(size)
51 convert_decoder(size)
10 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py in _conv_forward(self, input, weight, bias)
302 _single(0), self.dilation, self.groups)
303 return F.conv1d(input, weight, bias, self.stride,
--> 304 self.padding, self.dilation, self.groups)
305
306 def forward(self, input: Tensor) -> Tensor:
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor
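The message above says the example input is a CPU tensor while the convolution weights are on the GPU, which is what happens when the model is loaded on a CUDA runtime and the input is created with default settings. Below is a minimal sketch of one way to line them up before calling torch.jit.trace. It assumes nothing about the rest of convert.py: the small Conv1d is a hypothetical stand-in for model.encoder, and the input shape is chosen only to keep the example fast.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for model.encoder (assumption: the real encoder
# is not reproduced here, only a Conv1d over 80 mel channels).
encoder = nn.Conv1d(in_channels=80, out_channels=8, kernel_size=3, padding=1).eval()

# Find out where the weights live (CPU here, a CUDA device on a GPU runtime).
device = next(encoder.parameters()).device

# Create the example input, then move it to the same device as the weights.
input_mel = torch.rand(1, 80, 100).to(device)

# Tracing now sees input and weights on one device.
traced_model = torch.jit.trace(encoder, input_mel)
out = traced_model(input_mel)
```

The same mismatch can be resolved in the other direction by moving the model to the CPU with model.cpu() before tracing, so that a default CPU input matches it.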