
fast_rnnt's Introduction

This project implements a method for faster and more memory-efficient RNN-T loss computation, called pruned RNN-T.

Note: There is also a fast RNN-T loss implementation in the k2 project, which shares the same code as this repository. We make fast_rnnt a stand-alone project in case someone wants only this RNN-T loss.

How does the pruned RNN-T work?

We first obtain pruning bounds for the RNN-T recursion using a simple joiner network that is just an addition of the encoder and decoder, then we use those pruning bounds to evaluate the full, non-linear joiner network.

The picture below displays the gradients of the lattice nodes (obtained by rnnt_loss_simple with return_grad=True). At each time frame, only a small set of nodes has a non-zero gradient, which justifies the pruned RNN-T loss, i.e., putting a limit on the number of symbols per frame.

This picture is taken from here

Installation

You can install it via pip:

pip install fast_rnnt

You can also install from source:

git clone https://github.com/danpovey/fast_rnnt.git
cd fast_rnnt
python setup.py install

To check that fast_rnnt was installed successfully, please run

python3 -c "import fast_rnnt; print(fast_rnnt.__version__)"

which should print the version of the installed fast_rnnt, e.g., 1.0.

How to display the installation log?

Use

pip install --verbose fast_rnnt

How to reduce installation time?

Use

export FT_MAKE_ARGS="-j"
pip install --verbose fast_rnnt

It will pass -j to make.

Which version of PyTorch is supported?

It has been tested on PyTorch >= 1.5.0.

Note: The CUDA version used to build PyTorch must match the CUDA version in your environment; otherwise, compilation will fail.

How to install a CPU version of fast_rnnt?

Use

export FT_CMAKE_ARGS="-DCMAKE_BUILD_TYPE=Release -DFT_WITH_CUDA=OFF"
export FT_MAKE_ARGS="-j"
pip install --verbose fast_rnnt

It will pass -DCMAKE_BUILD_TYPE=Release -DFT_WITH_CUDA=OFF to cmake.

Where to get help if I have problems with the installation?

Please file an issue at https://github.com/danpovey/fast_rnnt/issues and describe your problem there.

Usage

For rnnt_loss_simple

This is a simple case of the RNN-T loss, where the joiner network is just addition.

Note: termination_symbol plays the role of blank in other RNN-T loss implementations; we call it termination_symbol because it terminates the symbols of the current frame.
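All of the usage snippets below share the same tensor shapes. The preamble here is a minimal, illustrative setup for them (the sizes and the target_lengths / num_frames values are placeholders chosen for this page, not values prescribed by fast_rnnt). Column 2 of each boundary row holds the per-utterance target length and column 3 the number of acoustic frames, which is why the snippets fill them from target_lengths and num_frames.

import torch
import fast_rnnt

# Illustrative sizes: batch, acoustic frames, symbols per utterance, vocabulary size.
# Note that the simple/smoothed losses require T >= S.
B, T, S, C = 8, 100, 20, 500

# Per-utterance lengths; here every utterance uses all S symbols and all T frames.
target_lengths = torch.full((B,), S, dtype=torch.int64)
num_frames = torch.full((B,), T, dtype=torch.int64)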

am = torch.randn((B, T, C), dtype=torch.float32)
lm = torch.randn((B, S + 1, C), dtype=torch.float32)
symbols = torch.randint(0, C, (B, S))
termination_symbol = 0

boundary = torch.zeros((B, 4), dtype=torch.int64)
boundary[:, 2] = target_lengths
boundary[:, 3] = num_frames

loss = fast_rnnt.rnnt_loss_simple(
    lm=lm,
    am=am,
    symbols=symbols,
    termination_symbol=termination_symbol,
    boundary=boundary,
    reduction="sum",
)

For rnnt_loss_smoothed

The same as rnnt_loss_simple, except that it supports am_only and lm_only smoothing, which allows you to make the loss function a combination of the form:

      lm_only_scale * lm_probs +
      am_only_scale * am_probs +
      (1-lm_only_scale-am_only_scale) * combined_probs

where lm_probs and am_probs are the probabilities given by the language model and the acoustic model independently.

am = torch.randn((B, T, C), dtype=torch.float32)
lm = torch.randn((B, S + 1, C), dtype=torch.float32)
symbols = torch.randint(0, C, (B, S))
termination_symbol = 0

boundary = torch.zeros((B, 4), dtype=torch.int64)
boundary[:, 2] = target_lengths
boundary[:, 3] = num_frames

loss = fast_rnnt.rnnt_loss_smoothed(
    lm=lm,
    am=am,
    symbols=symbols,
    termination_symbol=termination_symbol,
    lm_only_scale=0.25,
    am_only_scale=0.0,
    boundary=boundary,
    reduction="sum",
)

For rnnt_loss_pruned

rnnt_loss_pruned cannot be used alone; it needs the gradients returned by rnnt_loss_simple/rnnt_loss_smoothed to obtain the pruning bounds.

am = torch.randn((B, T, C), dtype=torch.float32)
lm = torch.randn((B, S + 1, C), dtype=torch.float32)
symbols = torch.randint(0, C, (B, S))
termination_symbol = 0

boundary = torch.zeros((B, 4), dtype=torch.int64)
boundary[:, 2] = target_lengths
boundary[:, 3] = num_frames

# rnnt_loss_simple can also be replaced with rnnt_loss_smoothed
simple_loss, (px_grad, py_grad) = fast_rnnt.rnnt_loss_simple(
    lm=lm,
    am=am,
    symbols=symbols,
    termination_symbol=termination_symbol,
    boundary=boundary,
    reduction="sum",
    return_grad=True,
)
s_range = 5  # can be other values
ranges = fast_rnnt.get_rnnt_prune_ranges(
    px_grad=px_grad,
    py_grad=py_grad,
    boundary=boundary,
    s_range=s_range,
)

am_pruned, lm_pruned = fast_rnnt.do_rnnt_pruning(am=am, lm=lm, ranges=ranges)

logits = model.joiner(am_pruned, lm_pruned)  # your own (non-linear) joiner network applied to the pruned activations
pruned_loss = fast_rnnt.rnnt_loss_pruned(
    logits=logits,
    symbols=symbols,
    ranges=ranges,
    termination_symbol=termination_symbol,
    boundary=boundary,
    reduction="sum",
)

You can also find recipes here that use rnnt_loss_pruned to train a model.
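In such recipes the simple and pruned losses are usually combined into a single training objective with a scale on the simple part. The snippet below is only an illustrative sketch; the scale value is a placeholder, not something prescribed by fast_rnnt.

# A sketch of combining the two losses; the scale value is illustrative only.
# (In a real setup, am and lm come from the encoder and decoder, so total_loss
# can be back-propagated through the whole model.)
simple_loss_scale = 0.5
total_loss = simple_loss_scale * simple_loss + pruned_loss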

For rnnt_loss

The unpruned rnnt_loss is the same as the torchaudio rnnt_loss; it produces the same output as torchaudio for the same input.

logits = torch.randn((B, T, S + 1, C), dtype=torch.float32)
symbols = torch.randint(0, C, (B, S))
termination_symbol = 0

boundary = torch.zeros((B, 4), dtype=torch.int64)
boundary[:, 2] = target_lengths
boundary[:, 3] = num_frames

loss = fast_rnnt.rnnt_loss(
    logits=logits,
    symbols=symbols,
    termination_symbol=termination_symbol,
    boundary=boundary,
    reduction="sum",
)
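Since the unpruned loss is meant to match torchaudio, one way to sanity-check this on the setup above is to feed the same inputs to torchaudio.functional.rnnt_loss, mapping termination_symbol to blank and the boundary columns to the length arguments. This is a hedged sketch that assumes torchaudio >= 0.10 is installed; the tolerance is an arbitrary choice.

import torchaudio.functional as TAF

# torchaudio expects int32 targets/lengths; the logits keep the (B, T, S + 1, C) layout.
ta_loss = TAF.rnnt_loss(
    logits=logits,
    targets=symbols.to(torch.int32),
    logit_lengths=num_frames.to(torch.int32),
    target_lengths=target_lengths.to(torch.int32),
    blank=termination_symbol,
    reduction="sum",
)

print(torch.allclose(loss, ta_loss, atol=1e-3))  # expected to print True if the two agree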

Benchmarking

The benchmarking repo compares the speed and memory usage of several transducer losses. The summary in the following table is taken from there; you can check that repository for more details.

Note: As stated above, fast_rnnt is also implemented in the k2 project, so k2 and fast_rnnt are equivalent in this benchmark.

Name                    Average step time (us)    Peak memory usage (MB)
torchaudio              601447                    12959.2
fast_rnnt(unpruned)     274407                    15106.5
fast_rnnt(pruned)       38112                     2647.8
optimized_transducer    567684                    10903.1
warprnnt_numba          229340                    13061.8
warp-transducer         210772                    13061.8

fast_rnnt's People

Contributors

danpovey, durson, glynpu, pkufool, toshas, yfyeung


fast_rnnt's Issues

Why T>=S constraint?

code

Why do we need this constraint? In a regular RNN-T, the joiner normally emits many blank symbols, in which case T > S. But it is also possible that S > T, e.g., if we emit at least one non-blank symbol for each encoder frame.

Actually, I have run into this:

  File "/rnnt_related/rnnt-mlperf-training/model_rnnt.py", line 203, in fast_joint
    simple_loss, (px_grad, py_grad) = fast_rnnt.rnnt_loss_simple(
  File "/anaconda3/envs/fast-rnnt/lib/python3.8/site-packages/fast_rnnt-1.2-py3.8-linux-x86_64.egg/fast_rnnt/rnnt_loss.py", line 282, in rnnt_loss_simple
    px, py = get_rnnt_logprobs(
  File "/anaconda3/envs/fast-rnnt/lib/python3.8/site-packages/fast_rnnt-1.2-py3.8-linux-x86_64.egg/fast_rnnt/rnnt_loss.py", line 149, in get_rnnt_logprobs
    assert T >= S, (T, S)
AssertionError: (272, 274)

T>=S constraint in latest pip version

Hello, I am evaluating pruned-rnnt (regular version) for my use case and just ran into the T>=S assertion. I am avoiding building from source because it takes too long (10+ mins) to build. I was wondering if you are going to have another release anytime soon.

An error occurred while compiling the source code

Thank you for fast_rnnt. I got the error shown below when running "python setup.py install".
(screenshot of the compilation error omitted)

python=3.8.11, torch version=1.10.1, cudatoolkit=10.2.89, CUDA version=10.2, GCC version=5.3.1, cmake version=3.23.0

Is this a known issue? How can it be debugged and solved?

Thank you!

Importing fast_rnnt fails

Hi team,

I installed successfully with pip install fast_rnnt.
But I can't import fast_rnnt (error screenshot attached).

Here's my environment information.

  • Win11 / Python 3.11
  • CUDA 11.8 / torch 2.0.1
  • Also, I tried installing from source, but importing fast_rnnt does not work, with the same error message.

pip error

I installed the specific CUDA-related kit following the tutorial for k2, but I still have this problem. My CUDA version is 11.6.
(screenshot of the pip error omitted)

#error -- unsupported GNU version! gcc versions later than 5.3 are not supported!

I ran into the following problem when running setup.py:

/opt/lib/cuda-8.0/bin/..//include/host_config.h:115:2 #error -- unsupported GNU version! gcc versions later than 5.3 are not supported!

The environment is:
gcc 5.4.0
pytorch 1.7.1
cuda-10.2
python 3.7
(There are several different versions of cuda in the environment.)

When I set the following configs in CMakeLists.txt, I still get the same error:

set(CUDA_TOOLKIT_ROOT_DIR /opt/lib/cuda-10.2)
set(CMAKE_C_COMPILER /usr/bin/gcc) (gcc -v 4.8.5)
set(CMAKE_CXX_COMPILER /usr/bin/++)

Error while installing

While installing fast_rnnt, I get the following error, both while installing via pip as well as for the manual installation:

-- Found Torch: /home/ujjwaleshwar/Projects/py/FYP/venv/lib/python3.10/site-packages/torch/lib/libtorch.so  
-- PyTorch version: 1.13.1+cu117
-- Configuring done
-- Generating done
-- Build files have been written to: /home/ujjwaleshwar/fast_rnnt/build/temp.linux-x86_64-cpython-310
[ 16%] Building CUDA object fast_rnnt/csrc/CMakeFiles/mutual_information_core.dir/mutual_information_cpu.cu.o
[ 33%] Building CUDA object fast_rnnt/csrc/CMakeFiles/mutual_information_core.dir/mutual_information_cuda.cu.o
/home/ujjwaleshwar/Projects/py/FYP/venv/lib/python3.10/site-packages/torch/include/pybind11/cast.h: In function ‘typename pybind11::detail::type_caster<typename pybind11::detail::intrinsic_type<T>::type>::cast_op_type<T> pybind11::detail::cast_op(make_caster<T>&)’:
/home/ujjwaleshwar/Projects/py/FYP/venv/lib/python3.10/site-packages/torch/include/pybind11/cast.h:42:120: error: expected template-name before ‘<’ token
   42 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
      |                                                                                                                        ^
/home/ujjwaleshwar/Projects/py/FYP/venv/lib/python3.10/site-packages/torch/include/pybind11/cast.h:42:120: error: expected identifier before ‘<’ token
/home/ujjwaleshwar/Projects/py/FYP/venv/lib/python3.10/site-packages/torch/include/pybind11/cast.h:42:123: error: expected primary-expression before ‘>’ token
   42 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
      |                                                                                                                           ^
/home/ujjwaleshwar/Projects/py/FYP/venv/lib/python3.10/site-packages/torch/include/pybind11/cast.h:42:126: error: expected primary-expression before ‘)’ token
   42 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
      |                                                                                                                              ^
/home/ujjwaleshwar/Projects/py/FYP/venv/lib/python3.10/site-packages/torch/include/pybind11/cast.h: In function ‘typename pybind11::detail::type_caster<typename pybind11::detail::intrinsic_type<T>::type>::cast_op_type<T> pybind11::detail::cast_op(make_caster<T>&)’:
/home/ujjwaleshwar/Projects/py/FYP/venv/lib/python3.10/site-packages/torch/include/pybind11/cast.h:42:120: error: expected template-name before ‘<’ token
   42 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
      |                                                                                                                        ^
/home/ujjwaleshwar/Projects/py/FYP/venv/lib/python3.10/site-packages/torch/include/pybind11/cast.h:42:120: error: expected identifier before ‘<’ token
/home/ujjwaleshwar/Projects/py/FYP/venv/lib/python3.10/site-packages/torch/include/pybind11/cast.h:42:123: error: expected primary-expression before ‘>’ token
   42 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
      |                                                                                                                           ^
/home/ujjwaleshwar/Projects/py/FYP/venv/lib/python3.10/site-packages/torch/include/pybind11/cast.h:42:126: error: expected primary-expression before ‘)’ token
   42 |     return caster.operator typename make_caster<T>::template cast_op_type<T>();
      |                                                                                                                              ^
make[3]: *** [fast_rnnt/csrc/CMakeFiles/mutual_information_core.dir/build.make:77: fast_rnnt/csrc/CMakeFiles/mutual_information_core.dir/mutual_information_cpu.cu.o] Error 1
make[3]: *** Waiting for unfinished jobs....
make[3]: *** [fast_rnnt/csrc/CMakeFiles/mutual_information_core.dir/build.make:92: fast_rnnt/csrc/CMakeFiles/mutual_information_core.dir/mutual_information_cuda.cu.o] Error 1
make[2]: *** [CMakeFiles/Makefile2:180: fast_rnnt/csrc/CMakeFiles/mutual_information_core.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:213: fast_rnnt/python/csrc/CMakeFiles/_fast_rnnt.dir/rule] Error 2
make: *** [Makefile:137: _fast_rnnt] Error 2

I am using pytorch 1.13 + cu11.7.
nvcc --version returns the following:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Feb__7_19:32:13_PST_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0

Any help is appreciated.

Issue in installation

Hi,

I am trying to install this package, but I run into the following error after cloning and running setup.py

[ 20%] Linking CXX shared library ../../lib/libmutual_information_core.so
/usr/bin/ld: cannot find -lmkl_intel_ilp64
/usr/bin/ld: cannot find -lmkl_core
/usr/bin/ld: cannot find -lmkl_intel_thread
collect2: error: ld returned 1 exit status
make[3]: *** [fast_rnnt/csrc/CMakeFiles/mutual_information_core.dir/build.make:99: lib/libmutual_information_core.so] Error 1
make[2]: *** [CMakeFiles/Makefile2:191: fast_rnnt/csrc/CMakeFiles/mutual_information_core.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:225: fast_rnnt/python/csrc/CMakeFiles/_fast_rnnt.dir/rule] Error 2
make: *** [Makefile:131: _fast_rnnt] Error 2
Traceback (most recent call last):
  File "setup.py", line 105, in <module>
    setuptools.setup(
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/opt/conda/envs/ptca/lib/python3.8/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/opt/conda/envs/ptca/lib/python3.8/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/opt/conda/envs/ptca/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/setuptools/command/install.py", line 74, in run
    self.do_egg_install()
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/setuptools/command/install.py", line 116, in do_egg_install
    self.run_command('bdist_egg')
  File "/opt/conda/envs/ptca/lib/python3.8/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/envs/ptca/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/setuptools/command/bdist_egg.py", line 164, in run
    cmd = self.call_command('install_lib', warn_dir=0)
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/setuptools/command/bdist_egg.py", line 150, in call_command
    self.run_command(cmdname)
  File "/opt/conda/envs/ptca/lib/python3.8/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/envs/ptca/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/setuptools/command/install_lib.py", line 11, in run
    self.build()
  File "/opt/conda/envs/ptca/lib/python3.8/distutils/command/install_lib.py", line 107, in build
    self.run_command('build_ext')
  File "/opt/conda/envs/ptca/lib/python3.8/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/envs/ptca/lib/python3.8/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/opt/conda/envs/ptca/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
    _build_ext.run(self)
  File "/opt/conda/envs/ptca/lib/python3.8/distutils/command/build_ext.py", line 340, in run
    self.build_extensions()
  File "/opt/conda/envs/ptca/lib/python3.8/distutils/command/build_ext.py", line 449, in build_extensions
    self._build_extensions_serial()
  File "/opt/conda/envs/ptca/lib/python3.8/distutils/command/build_ext.py", line 474, in _build_extensions_serial
    self.build_extension(ext)
  File "setup.py", line 59, in build_extension
    raise Exception(
Exception:
Build fast_rnnt failed. Please check the error message.
You can ask for help by creating an issue on GitHub.

Click:
https://github.com/danpovey/fast_rnnt/issues/new

Train loss is nan or inf

After using the fast_rnnt loss in my environment, the training loss always falls to NaN or Inf.
The configuration of my Conformer-Transducer environment is as follows:

  • v100-32g-4gpu * 2
  • platform: fairseq
  • max_tokens: 5000 and update_freq: 13 (i.e., batch_size 5000 * 13 * 8)
  • warmup_lr 1e-7, lr 1e-4, lr_scheduler inverse_sqrt, warmup_updates 8000
  • optimizer: adam
  • pruned_loss_scaled = 0 if num_updates <= 10000,
    pruned_loss_scaled = 0.1 if 10000 < num_updates <= 20000,
    pruned_loss_scaled = 1 if num_updates > 20000

Finally, 6k hours of training data are used to train the RNN-T model. At the warmup stage (i.e., pruned_loss_scaled = 0), the loss always falls to NaN; also, when pruned_loss_scaled is set to 0.1, the loss always falls to Inf.

Are there any suggestions to solve this problem?

[feature request] Enable github actions

I just found that GitHub Actions have not been enabled for this repo yet.

I suggest that we enable GitHub Actions for this repo and also add CI tests.

Issues like #27 should be covered by CI.

missing: CUDNN_LIBRARY_PATH CUDNN_INCLUDE_PATH when installing

I'm getting this error when trying to install fast_rnnt from the GitHub repository. These are the commands I used:

$ git clone https://github.com/danpovey/fast_rnnt.git
$ cd fast_rnnt
$ python setup.py install

All needed requirements are met:

  • cmake version: 3.17.5
  • gcc version: 8.3.1
  • python version: 3.9.4
  • pytorch version: 1.10.1+cu102

The following is the full error trace:

running install
/gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/lib/python3.9/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
/gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/lib/python3.9/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running bdist_egg
running egg_info
writing fast_rnnt.egg-info/PKG-INFO
writing dependency_links to fast_rnnt.egg-info/dependency_links.txt
writing requirements to fast_rnnt.egg-info/requires.txt
writing top-level names to fast_rnnt.egg-info/top_level.txt
reading manifest file 'fast_rnnt.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files matching '*.pyc' found anywhere in distribution
adding license file 'LICENSE'
writing manifest file 'fast_rnnt.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
copying fast_rnnt/python/fast_rnnt/__init__.py -> build/lib.linux-x86_64-cpython-39/fast_rnnt
running build_ext
For fast compilation, run:
export FT_MAKE_ARGS="-j"; python setup.py install
Setting PYTHON_EXECUTABLE to /gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/bin/PYTHON39
build command is:

            cd build/temp.linux-x86_64-cpython-39

            cmake -DCMAKE_BUILD_TYPE=Release -DPYTHON_EXECUTABLE=/gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/bin/PYTHON39 /gfs/project/stag/users/manwar/speechbrain_PR/fast_rnnt

            make  _fast_rnnt
        
-- C++ Standard version: 14
-- Enabled languages: CXX;CUDA
-- Automatic GPU detection failed. Building for common architectures.
-- Autodetected CUDA architecture(s): 3.5;5.0;5.2;6.0;6.1;7.0;7.5;7.5+PTX
-- FT_COMPUTE_ARCH_FLAGS: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_75,code=compute_75
-- FT_COMPUTE_ARCH_CANDIDATES 35;50;60;61;70;75
-- Adding arch 35
-- Adding arch 50
-- Adding arch 60
-- Adding arch 61
-- Adding arch 70
-- Adding arch 75
-- FT_COMPUTE_ARCHS: 35;50;60;61;70;75
-- Downloading pybind11
-- pybind11 is downloaded to /home/manwar/stag/speechbrain_PR/fast_rnnt/build/temp.linux-x86_64-cpython-39/_deps/pybind11-src
-- pybind11 v2.6.0 
-- Python executable: /home/manwar/stag/speechbrain_PR/py39_PR/bin/PYTHON39
-- Caffe2: CUDA detected: 10.2
-- Caffe2: CUDA nvcc is: /nfs/core/cuda/10.2/bin/nvcc
-- Caffe2: CUDA toolkit directory: /nfs/core/cuda/10.2
-- Caffe2: Header version is: 10.2
-- Could NOT find CUDNN (missing: CUDNN_LIBRARY_PATH CUDNN_INCLUDE_PATH) 
CMake Warning at /home/manwar/stag/speechbrain_PR/py39_PR/lib/python3.9/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:111 (message):
  Caffe2: Cannot find cuDNN library.  Turning the option off
Call Stack (most recent call first):
  /home/manwar/stag/speechbrain_PR/py39_PR/lib/python3.9/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:88 (include)
  /home/manwar/stag/speechbrain_PR/py39_PR/lib/python3.9/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
  cmake/torch.cmake:11 (find_package)
  CMakeLists.txt:135 (include)


-- /nfs/core/cuda/10.2/lib64/libnvrtc.so shorthash is 08c4863f
-- Automatic GPU detection failed. Building for common architectures.
-- Autodetected CUDA architecture(s): 3.5;5.0;5.2;6.0;6.1;7.0;7.5;7.5+PTX
-- Added CUDA NVCC flags for: -gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_75,code=compute_75
CMake Error at /home/manwar/stag/speechbrain_PR/py39_PR/lib/python3.9/site-packages/torch/share/cmake/Caffe2/Caffe2Config.cmake:96 (message):
  Your installed Caffe2 version uses cuDNN but I cannot find the cuDNN
  libraries.  Please set the proper cuDNN prefixes and / or install cuDNN.
Call Stack (most recent call first):
  /home/manwar/stag/speechbrain_PR/py39_PR/lib/python3.9/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
  cmake/torch.cmake:11 (find_package)
  CMakeLists.txt:135 (include)


-- Configuring incomplete, errors occurred!
See also "/home/manwar/stag/speechbrain_PR/fast_rnnt/build/temp.linux-x86_64-cpython-39/CMakeFiles/CMakeOutput.log".
See also "/home/manwar/stag/speechbrain_PR/fast_rnnt/build/temp.linux-x86_64-cpython-39/CMakeFiles/CMakeError.log".
make: *** No rule to make target '_fast_rnnt'.  Stop.
Traceback (most recent call last):
  File "/gfs/project/stag/users/manwar/speechbrain_PR/fast_rnnt/setup.py", line 106, in <module>
    setuptools.setup(
  File "/gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/lib/python3.9/site-packages/setuptools/__init__.py", line 87, in setup
    return distutils.core.setup(**attrs)
  File "/gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 148, in setup
    return run_commands(dist)
  File "/gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 163, in run_commands
    dist.run_commands()
  File "/gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 967, in run_commands
    self.run_command(cmd)
  File "/gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/lib/python3.9/site-packages/setuptools/dist.py", line 1224, in run_command
    super().run_command(command)
  File "/gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
    cmd_obj.run()
  File "/gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/lib/python3.9/site-packages/setuptools/command/install.py", line 74, in run
    self.do_egg_install()
  File "/gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/lib/python3.9/site-packages/setuptools/command/install.py", line 123, in do_egg_install
    self.run_command('bdist_egg')
  File "/gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/lib/python3.9/site-packages/setuptools/dist.py", line 1224, in run_command
    super().run_command(command)
  File "/gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
    cmd_obj.run()
  File "/gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/lib/python3.9/site-packages/setuptools/command/bdist_egg.py", line 165, in run
    cmd = self.call_command('install_lib', warn_dir=0)
  File "/gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/lib/python3.9/site-packages/setuptools/command/bdist_egg.py", line 151, in call_command
    self.run_command(cmdname)
  File "/gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/lib/python3.9/site-packages/setuptools/dist.py", line 1224, in run_command
    super().run_command(command)
  File "/gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
    cmd_obj.run()
  File "/gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/lib/python3.9/site-packages/setuptools/command/install_lib.py", line 11, in run
    self.build()
  File "/gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/lib/python3.9/site-packages/setuptools/_distutils/command/install_lib.py", line 107, in build
    self.run_command('build_ext')
  File "/gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/lib/python3.9/site-packages/setuptools/dist.py", line 1224, in run_command
    super().run_command(command)
  File "/gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
    cmd_obj.run()
  File "/gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 79, in run
    _build_ext.run(self)
  File "/gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 339, in run
    self.build_extensions()
  File "/gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 448, in build_extensions
    self._build_extensions_serial()
  File "/gfs/project/stag/users/manwar/speechbrain_PR/py39_PR/lib/python3.9/site-packages/setuptools/_distutils/command/build_ext.py", line 473, in _build_extensions_serial
    self.build_extension(ext)
  File "/gfs/project/stag/users/manwar/speechbrain_PR/fast_rnnt/setup.py", line 60, in build_extension
    raise Exception(
Exception: 
Build fast_rnnt failed. Please check the error message.
You can ask for help by creating an issue on GitHub.

Click:
	https://github.com/danpovey/fast_rnnt/issues/new


CUDA error

I have the following problems when using rnnt_loss_simple:

(screenshot of the error message omitted)

I checked the shape of the input tensor:
(screenshot of the input tensor shapes omitted)

and the output of the code:
(screenshot of the code output omitted)

python=3.8.11, torch version=1.10.1, cudatoolkit=10.2.89, CUDA version=10.2.

Is this a known issue? How can it be debugged and solved?

Thank you!

RuntimeError: invalid device ordinal

I ran the fast_rnnt.get_rnnt_prune_ranges() function and got RuntimeError: invalid device ordinal.
Here are the error details.
    ranges = self.fast_rnnt.get_rnnt_prune_ranges(
  File "/opt/conda/lib/python3.8/site-packages/fast_rnnt-1.0-py3.8-linux-x86_64.egg/fast_rnnt/rnnt_loss.py", line 580, in get_rnnt_prune_ranges
    s_begin = _adjust_pruning_lower_bound(s_begin, 2 if T1 == T else s_range)
  File "/opt/conda/lib/python3.8/site-packages/fast_rnnt-1.0-py3.8-linux-x86_64.egg/fast_rnnt/rnnt_loss.py", line 466, in _adjust_pruning_lower_bound
    fast_rnnt.monotonic_lower_bound(s_begin)
RuntimeError: invalid device ordinal

C++ Version Error While Installing

Hello,
I'm trying to install fast_rnnt in a GPU Ubuntu environment with torch==2.1.0 and CUDA version 11.8.

If I just run pip install --user fast_rnnt, I get the following error log (TLDR I think torch 2.1 wants C++ 17)

Building wheels for collected packages: fast_rnnt
  Building wheel for fast_rnnt (setup.py) ... error
  error: subprocess-exited-with-error
  
  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [238 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-310
      creating build/lib.linux-x86_64-cpython-310/fast_rnnt
      copying fast_rnnt/python/fast_rnnt/__init__.py -> build/lib.linux-x86_64-cpython-310/fast_rnnt
      copying fast_rnnt/python/fast_rnnt/mutual_information.py -> build/lib.linux-x86_64-cpython-310/fast_rnnt
      copying fast_rnnt/python/fast_rnnt/rnnt_loss.py -> build/lib.linux-x86_64-cpython-310/fast_rnnt
      running build_ext
      Setting PYTHON_EXECUTABLE to /opt/conda/bin/python3.10
      build command is:
      
                  cd build/temp.linux-x86_64-cpython-310
      
                  cmake -DCMAKE_BUILD_TYPE=Release -DFT_BUILD_TESTS=OFF -DPYTHON_EXECUTABLE=/opt/conda/bin/python3.10 /tmp/pip-install-piiz6sr_/fast-rnnt_4708b187802f48f2b24324c089385516
      
                  make  -j  _fast_rnnt
      
      -- C++ Standard version: 14
      -- Enabled languages: CXX;CUDA
      -- The CXX compiler identification is GNU 9.4.0
      -- The CUDA compiler identification is NVIDIA 11.8.89
      -- Check for working CXX compiler: /usr/bin/c++
      -- Check for working CXX compiler: /usr/bin/c++ -- works
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc
      -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc -- works
      -- Detecting CUDA compiler ABI info
      -- Detecting CUDA compiler ABI info - done
      -- Autodetected CUDA architecture(s):  8.0
      -- FT_COMPUTE_ARCH_FLAGS: -gencode;arch=compute_80,code=sm_80
      -- FT_COMPUTE_ARCH_CANDIDATES 35;50;60;61;70;75;80;86
      -- Skipping arch 35
      -- Skipping arch 50
      -- Skipping arch 60
      -- Skipping arch 61
      -- Skipping arch 70
      -- Skipping arch 75
      -- Adding arch 80
      -- Skipping arch 86
      -- FT_COMPUTE_ARCHS: 80
      -- Downloading pybind11
      -- pybind11 is downloaded to /tmp/pip-install-piiz6sr_/fast-rnnt_4708b187802f48f2b24324c089385516/build/temp.linux-x86_64-cpython-310/_deps/pybind11-src
      -- pybind11 v2.6.0
      -- Found PythonInterp: /opt/conda/bin/python3.10 (found version "3.10.12")
      -- Found PythonLibs: /opt/conda/lib/libpython3.10.so
      -- Performing Test HAS_FLTO
      -- Performing Test HAS_FLTO - Success
      -- Python executable: /opt/conda/bin/python3.10
      -- Found CUDA: /usr/local/cuda (found version "11.8")
      -- Found CUDAToolkit: /usr/local/cuda/include (found version "11.8.89")
      -- Looking for C++ include pthread.h
      -- Looking for C++ include pthread.h - found
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
      -- Looking for pthread_create in pthreads
      -- Looking for pthread_create in pthreads - not found
      -- Looking for pthread_create in pthread
      -- Looking for pthread_create in pthread - found
      -- Found Threads: TRUE
      -- Caffe2: CUDA detected: 11.8
      -- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
      -- Caffe2: CUDA toolkit directory: /usr/local/cuda
      -- Caffe2: Header version is: 11.8
      -- /usr/local/cuda/lib64/libnvrtc.so shorthash is 672ee683
      -- USE_CUDNN is set to 0. Compiling without cuDNN support
      -- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support
      -- Autodetected CUDA architecture(s):  8.0
      -- Added CUDA NVCC flags for: -gencode;arch=compute_80,code=sm_80
      CMake Warning at /opt/conda/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
        static library kineto_LIBRARY-NOTFOUND not found.
      Call Stack (most recent call first):
        /opt/conda/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
        cmake/torch.cmake:11 (find_package)
        CMakeLists.txt:136 (include)
      
      
      -- Found Torch: /opt/conda/lib/python3.10/site-packages/torch/lib/libtorch.so
      -- PyTorch version: 2.1.0
      -- Configuring done
      CMake Warning at build/temp.linux-x86_64-cpython-310/_deps/pybind11-src/tools/pybind11Tools.cmake:147 (add_library):
        Cannot generate a safe runtime search path for target _fast_rnnt because
        files in some directories may conflict with libraries in implicit
        directories:
      
          runtime library [libcudart.so.11.0] in /usr/local/cuda/lib64 may be hidden by files in:
            /opt/conda/lib
          runtime library [libnvToolsExt.so.1] in /usr/local/cuda/lib64 may be hidden by files in:
            /opt/conda/lib
          runtime library [libcufft.so.10] in /usr/local/cuda/lib64 may be hidden by files in:
            /opt/conda/lib
          runtime library [libcurand.so.10] in /usr/local/cuda/lib64 may be hidden by files in:
            /opt/conda/lib
          runtime library [libcublas.so.11] in /usr/local/cuda/lib64 may be hidden by files in:
            /opt/conda/lib
          runtime library [libcublasLt.so.11] in /usr/local/cuda/lib64 may be hidden by files in:
            /opt/conda/lib
          runtime library [libnvrtc.so.11.2] in /usr/local/cuda/lib64 may be hidden by files in:
            /opt/conda/lib
      
        Some of these libraries may not be found correctly.
      Call Stack (most recent call first):
        fast_rnnt/python/csrc/CMakeLists.txt:16 (pybind11_add_module)
      
      
      -- Generating done
      -- Build files have been written to: /tmp/pip-install-piiz6sr_/fast-rnnt_4708b187802f48f2b24324c089385516/build/temp.linux-x86_64-cpython-310
      Scanning dependencies of target mutual_information_core
      [ 16%] Building CUDA object fast_rnnt/csrc/CMakeFiles/mutual_information_core.dir/mutual_information_cpu.cu.o
      [ 33%] Building CUDA object fast_rnnt/csrc/CMakeFiles/mutual_information_core.dir/mutual_information_cuda.cu.o
      In file included from /opt/conda/lib/python3.10/site-packages/torch/include/torch/extension.h:5,
                       from /tmp/pip-install-piiz6sr_/fast-rnnt_4708b187802f48f2b24324c089385516/fast_rnnt/csrc/mutual_information.h:26,
                       from /tmp/pip-install-piiz6sr_/fast-rnnt_4708b187802f48f2b24324c089385516/fast_rnnt/csrc/mutual_information_cpu.cu:22:
      /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4:2: error: #error C++17 or later compatible compiler is required to use PyTorch.
          4 | #error C++17 or later compatible compiler is required to use PyTorch.
            |  ^~~~~
      In file included from /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/string_view.h:4,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/StringUtil.h:6,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/Exception.h:5,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/c10/core/Device.h:5,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/c10/core/impl/InlineDeviceGuard.h:6,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/c10/core/DeviceGuard.h:3,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/c10/cuda/CUDAStream.h:8,
                       from /tmp/pip-install-piiz6sr_/fast-rnnt_4708b187802f48f2b24324c089385516/fast_rnnt/csrc/mutual_information_cuda.cu:21:
      /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/C++17.h:27:2: error: #error You need C++17 to compile PyTorch
         27 | #error You need C++17 to compile PyTorch
            |  ^~~~~
      In file included from /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/string_view.h:4,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/StringUtil.h:6,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/Exception.h:5,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/c10/core/Device.h:5,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/ATen/core/TensorBody.h:11,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/ATen/core/Tensor.h:3,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/ATen/Tensor.h:3,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/autograd/function_hook.h:3,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/autograd/cpp_hook.h:2,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/autograd/variable.h:6,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/autograd/autograd.h:3,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/autograd.h:3,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/all.h:7,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/torch/extension.h:5,
                       from /tmp/pip-install-piiz6sr_/fast-rnnt_4708b187802f48f2b24324c089385516/fast_rnnt/csrc/mutual_information.h:26,
                       from /tmp/pip-install-piiz6sr_/fast-rnnt_4708b187802f48f2b24324c089385516/fast_rnnt/csrc/mutual_information_cpu.cu:22:
      /opt/conda/lib/python3.10/site-packages/torch/include/c10/util/C++17.h:27:2: error: #error You need C++17 to compile PyTorch
         27 | #error You need C++17 to compile PyTorch
            |  ^~~~~
      In file included from /opt/conda/lib/python3.10/site-packages/torch/include/torch/extension.h:5,
                       from /tmp/pip-install-piiz6sr_/fast-rnnt_4708b187802f48f2b24324c089385516/fast_rnnt/csrc/mutual_information.h:26,
                       from /tmp/pip-install-piiz6sr_/fast-rnnt_4708b187802f48f2b24324c089385516/fast_rnnt/csrc/mutual_information_cuda.cu:24:
      /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4:2: error: #error C++17 or later compatible compiler is required to use PyTorch.
          4 | #error C++17 or later compatible compiler is required to use PyTorch.
            |  ^~~~~
      In file included from /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:4,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/all.h:9,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/torch/extension.h:5,
                       from /tmp/pip-install-piiz6sr_/fast-rnnt_4708b187802f48f2b24324c089385516/fast_rnnt/csrc/mutual_information.h:26,
                       from /tmp/pip-install-piiz6sr_/fast-rnnt_4708b187802f48f2b24324c089385516/fast_rnnt/csrc/mutual_information_cpu.cu:22:
      /opt/conda/lib/python3.10/site-packages/torch/include/ATen/ATen.h:4:2: error: #error C++17 or later compatible compiler is required to use ATen.
          4 | #error C++17 or later compatible compiler is required to use ATen.
            |  ^~~~~
      In file included from /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:4,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/all.h:9,
                       from /opt/conda/lib/python3.10/site-packages/torch/include/torch/extension.h:5,
                       from /tmp/pip-install-piiz6sr_/fast-rnnt_4708b187802f48f2b24324c089385516/fast_rnnt/csrc/mutual_information.h:26,
                       from /tmp/pip-install-piiz6sr_/fast-rnnt_4708b187802f48f2b24324c089385516/fast_rnnt/csrc/mutual_information_cuda.cu:24:
      /opt/conda/lib/python3.10/site-packages/torch/include/ATen/ATen.h:4:2: error: #error C++17 or later compatible compiler is required to use ATen.
          4 | #error C++17 or later compatible compiler is required to use ATen.
            |  ^~~~~
      make[3]: *** [fast_rnnt/csrc/CMakeFiles/mutual_information_core.dir/build.make:63: fast_rnnt/csrc/CMakeFiles/mutual_information_core.dir/mutual_information_cpu.cu.o] Error 1
      make[3]: *** Waiting for unfinished jobs....
      make[3]: *** [fast_rnnt/csrc/CMakeFiles/mutual_information_core.dir/build.make:76: fast_rnnt/csrc/CMakeFiles/mutual_information_core.dir/mutual_information_cuda.cu.o] Error 1
      make[2]: *** [CMakeFiles/Makefile2:191: fast_rnnt/csrc/CMakeFiles/mutual_information_core.dir/all] Error 2
      make[1]: *** [CMakeFiles/Makefile2:225: fast_rnnt/python/csrc/CMakeFiles/_fast_rnnt.dir/rule] Error 2
      make: *** [Makefile:131: _fast_rnnt] Error 2
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-piiz6sr_/fast-rnnt_4708b187802f48f2b24324c089385516/setup.py", line 105, in <module>
          setuptools.setup(
        File "/opt/conda/lib/python3.10/site-packages/setuptools/__init__.py", line 103, in setup
          return distutils.core.setup(**attrs)
        File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
          return run_commands(dist)
        File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
          dist.run_commands()
        File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
          self.run_command(cmd)
        File "/opt/conda/lib/python3.10/site-packages/setuptools/dist.py", line 963, in run_command
          super().run_command(command)
        File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/opt/conda/lib/python3.10/site-packages/wheel/bdist_wheel.py", line 368, in run
          self.run_command("build")
        File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/opt/conda/lib/python3.10/site-packages/setuptools/dist.py", line 963, in run_command
          super().run_command(command)
        File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 131, in run
          self.run_command(cmd_name)
        File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/opt/conda/lib/python3.10/site-packages/setuptools/dist.py", line 963, in run_command
          super().run_command(command)
        File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/opt/conda/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 88, in run
          _build_ext.run(self)
        File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
          self.build_extensions()
        File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
          self._build_extensions_serial()
        File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
          self.build_extension(ext)
        File "/tmp/pip-install-piiz6sr_/fast-rnnt_4708b187802f48f2b24324c089385516/setup.py", line 59, in build_extension
          raise Exception(
      Exception:
      Build fast_rnnt failed. Please check the error message.
      You can ask for help by creating an issue on GitHub.
      
      Click:
          https://github.com/danpovey/fast_rnnt/issues/new
      
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for fast_rnnt
  Running setup.py clean for fast_rnnt
Failed to build fast_rnnt
ERROR: Could not build wheels for fast_rnnt, which is required to install pyproject.toml-based projects

To fix it, I tried setting the C++ version to 17 here and then running python setup.py install. I then get the following:

running bdist_egg
running egg_info
writing fast_rnnt.egg-info/PKG-INFO
writing dependency_links to fast_rnnt.egg-info/dependency_links.txt
writing requirements to fast_rnnt.egg-info/requires.txt
writing top-level names to fast_rnnt.egg-info/top_level.txt
reading manifest file 'fast_rnnt.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files matching '*.pyc' found anywhere in distribution
adding license file 'LICENSE'
writing manifest file 'fast_rnnt.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
copying fast_rnnt/python/fast_rnnt/mutual_information.py -> build/lib.linux-x86_64-cpython-310/fast_rnnt
copying fast_rnnt/python/fast_rnnt/rnnt_loss.py -> build/lib.linux-x86_64-cpython-310/fast_rnnt
copying fast_rnnt/python/fast_rnnt/__init__.py -> build/lib.linux-x86_64-cpython-310/fast_rnnt
running build_ext
Setting PYTHON_EXECUTABLE to /opt/conda/bin/python
build command is:

            cd build/temp.linux-x86_64-cpython-310

            cmake -DCMAKE_BUILD_TYPE=Release -DFT_BUILD_TESTS=OFF -DPYTHON_EXECUTABLE=/opt/conda/bin/python /home/jovyan/fast_rnnt

            make  -j  _fast_rnnt
        
-- C++ Standard version: 17
-- Enabled languages: CXX;CUDA
CMake Error at /home/jovyan/fast_rnnt/build/temp.linux-x86_64-cpython-310/CMakeFiles/CMakeTmp/CMakeLists.txt:15 (add_executable):
  CUDA_STANDARD is set to invalid value '17'


CMake Error at cmake/select_compute_arch.cmake:141 (try_run):
  Failed to generate test project build system.
Call Stack (most recent call first):
  cmake/select_compute_arch.cmake:201 (CUDA_DETECT_INSTALLED_GPUS)
  CMakeLists.txt:95 (cuda_select_nvcc_arch_flags)


-- Configuring incomplete, errors occurred!
See also "/home/jovyan/fast_rnnt/build/temp.linux-x86_64-cpython-310/CMakeFiles/CMakeOutput.log".
make: *** No rule to make target '_fast_rnnt'.  Stop.
Traceback (most recent call last):
  File "/home/jovyan/fast_rnnt/setup.py", line 105, in <module>
    setuptools.setup(
  File "/opt/conda/lib/python3.10/site-packages/setuptools/__init__.py", line 103, in setup
    return distutils.core.setup(**attrs)
  File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
  File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "/opt/conda/lib/python3.10/site-packages/setuptools/dist.py", line 963, in run_command
    super().run_command(command)
  File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.10/site-packages/setuptools/command/install.py", line 84, in run
    self.do_egg_install()
  File "/opt/conda/lib/python3.10/site-packages/setuptools/command/install.py", line 132, in do_egg_install
    self.run_command('bdist_egg')
  File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/lib/python3.10/site-packages/setuptools/dist.py", line 963, in run_command
    super().run_command(command)
  File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.10/site-packages/setuptools/command/bdist_egg.py", line 167, in run
    cmd = self.call_command('install_lib', warn_dir=0)
  File "/opt/conda/lib/python3.10/site-packages/setuptools/command/bdist_egg.py", line 153, in call_command
    self.run_command(cmdname)
  File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/lib/python3.10/site-packages/setuptools/dist.py", line 963, in run_command
    super().run_command(command)
  File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.10/site-packages/setuptools/command/install_lib.py", line 11, in run
    self.build()
  File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/command/install_lib.py", line 111, in build
    self.run_command('build_ext')
  File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
    self.distribution.run_command(command)
  File "/opt/conda/lib/python3.10/site-packages/setuptools/dist.py", line 963, in run_command
    super().run_command(command)
  File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
    cmd_obj.run()
  File "/opt/conda/lib/python3.10/site-packages/setuptools/command/build_ext.py", line 88, in run
    _build_ext.run(self)
  File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
    self.build_extensions()
  File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
    self._build_extensions_serial()
  File "/opt/conda/lib/python3.10/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
    self.build_extension(ext)
  File "/home/jovyan/fast_rnnt/setup.py", line 59, in build_extension
    raise Exception(
Exception: 
Build fast_rnnt failed. Please check the error message.
You can ask for help by creating an issue on GitHub.

Click:
        https://github.com/danpovey/fast_rnnt/issues/new

Is there any way to install this package using torch 2.1.0? Thanks for your help!

Trying to Understand pruned_loss

Using my transducer model, I have tried both the pruned and the unpruned loss. The unpruned version worked pretty well, even outperforming torchaudio's rnnt_loss. The problem is with the pruned version: the model is very slow to converge, and the WER and CER are not improving, even though I tried different prune_range values. Is this expected?

Also, I was wondering what is the best way to understand the pruned loss other than reading the code?

AssertionError: assert py.is_contiguous()

I'm working on integrating fast_rnnt with SpeechBrain; check this Pull Request.

At the moment, I'm trying to train a transducer model on the multilingual TEDx dataset (mTEDx) for French. Whenever I train my model, I get this assertion error (the issue's title). However, it says in the mutual_information.py file that:

# The following assertions are for efficiency
assert px.is_contiguous()
assert py.is_contiguous()

Once I comment out these two lines, everything works just fine. Using a transducer model with a pre-trained wav2vec2 encoder plus one linear layer, and a one-layer GRU as the decoder, the model trains just fine and I got 14.37 WER on the French test set, which is way better than our baseline.

Now, I have these two questions:

  • How do I avoid getting this AssertionError?
  • Does commenting these two assertions hurt the performance?

Your guidance is much appreciated!
