mdangschat / ctc-asr

End-to-end trained speech recognition system, based on RNNs and the connectionist temporal classification (CTC) cost function.

License: MIT License

Shell 1.46% Python 98.54%

ctc asr machine-learning speech-recognition mit neural-network tensorflow python python3

ctc-asr's Issues

Update Download Script for Common Voice v2

New release available: https://voice.mozilla.org/en/datasets
Not sure if v2 is intended to be downloaded automatically, though.

Inference graph

Hello,
Where is the .pbtxt file saved? I noticed that your code creates this graph file, but I cannot locate the code that writes it.
Thanks in advance.

Error with output node name when freezing the graph

Hi, I am trying to freeze the graph, but when I run "bazel-bin/tensorflow/tools/graph_transforms/summarize_graph --in_graph=/path_to_file/graph.pbtxt", I get the output below.
How do I know which of these to use as the output node?

No inputs spotted.
Found 36 variables:
(name=global_step, type=int64(9), shape=[])
(name=conv/conv2d/kernel, type=float(1), shape=[11,41,1,32])
(name=conv/conv2d/bias, type=float(1), shape=[32])
(name=conv/conv2d_1/kernel, type=float(1), shape=[11,21,32,32])
(name=conv/conv2d_1/bias, type=float(1), shape=[32])
(name=conv/conv2d_2/kernel, type=float(1), shape=[11,21,32,96])
(name=conv/conv2d_2/bias, type=float(1), shape=[96])
(name=rnn/cudnn_lstm/opaque_kernel, type=float(1), shape=)
(name=dense4/dense/kernel, type=float(1), shape=[4096,2048])
(name=dense4/dense/bias, type=float(1), shape=[2048])
(name=logits/dense/kernel, type=float(1), shape=[2048,29])
(name=logits/dense/bias, type=float(1), shape=[29])
(name=beta1_power, type=float(1), shape=[])
(name=beta2_power, type=float(1), shape=[])
(name=conv/conv2d/kernel/Adam, type=float(1), shape=[11,41,1,32])
(name=conv/conv2d/kernel/Adam_1, type=float(1), shape=[11,41,1,32])
(name=conv/conv2d/bias/Adam, type=float(1), shape=[32])
(name=conv/conv2d/bias/Adam_1, type=float(1), shape=[32])
(name=conv/conv2d_1/kernel/Adam, type=float(1), shape=[11,21,32,32])
(name=conv/conv2d_1/kernel/Adam_1, type=float(1), shape=[11,21,32,32])
(name=conv/conv2d_1/bias/Adam, type=float(1), shape=[32])
(name=conv/conv2d_1/bias/Adam_1, type=float(1), shape=[32])
(name=conv/conv2d_2/kernel/Adam, type=float(1), shape=[11,21,32,96])
(name=conv/conv2d_2/kernel/Adam_1, type=float(1), shape=[11,21,32,96])
(name=conv/conv2d_2/bias/Adam, type=float(1), shape=[96])
(name=conv/conv2d_2/bias/Adam_1, type=float(1), shape=[96])
(name=rnn/cudnn_lstm/opaque_kernel/Adam, type=float(1), shape=)
(name=rnn/cudnn_lstm/opaque_kernel/Adam_1, type=float(1), shape=)
(name=dense4/dense/kernel/Adam, type=float(1), shape=[4096,2048])
(name=dense4/dense/kernel/Adam_1, type=float(1), shape=[4096,2048])
(name=dense4/dense/bias/Adam, type=float(1), shape=[2048])
(name=dense4/dense/bias/Adam_1, type=float(1), shape=[2048])
(name=logits/dense/kernel/Adam, type=float(1), shape=[2048,29])
(name=logits/dense/kernel/Adam_1, type=float(1), shape=[2048,29])
(name=logits/dense/bias/Adam, type=float(1), shape=[29])
(name=logits/dense/bias/Adam_1, type=float(1), shape=[29])
Found 59 possible outputs:
(name=global_step/read, op=Identity)
(name=global_step/cond/switch_t, op=Identity)
(name=global_step/cond/switch_f, op=Identity)
(name=global_step/add, op=Add)
(name=seed2, op=Select)
(name=IteratorToStringHandle, op=IteratorToStringHandle)
(name=rnn/cudnn_lstm/Identity, op=Identity)
(name=rnn/cudnn_lstm/zeros/Less, op=Less)
(name=rnn/cudnn_lstm/zeros_1/Less, op=Less)
(name=dense4/dense/kernel/Regularizer/l2_regularizer, op=Mul)
(name=dense_to_sparse/Shape, op=Shape)
(name=gradients/zeros_like, op=ZerosLike)
(name=gradients/dense4/dropout/dropout/mul_grad/tuple/control_dependency_1, op=Identity)
(name=gradients/dense4/dropout/dropout/truediv_grad/tuple/control_dependency_1, op=Identity)
(name=gradients/dense4/Minimum_grad/tuple/control_dependency_1, op=Identity)
(name=gradients/zeros_like_3, op=ZerosLike)
(name=gradients/rnn/cudnn_lstm/CudnnRNN_grad/tuple/control_dependency_1, op=Identity)
(name=gradients/rnn/cudnn_lstm/CudnnRNN_grad/tuple/control_dependency_2, op=Identity)
(name=gradients/conv/Minimum_2_grad/tuple/control_dependency_1, op=Identity)
(name=gradients/conv/Minimum_1_grad/tuple/control_dependency_1, op=Identity)
(name=gradients/conv/Minimum_grad/tuple/control_dependency_1, op=Identity)
(name=gradients/conv/conv2d/Conv2D_grad/tuple/control_dependency, op=Identity)
(name=conv/conv2d/kernel/Adam/read, op=Identity)
(name=conv/conv2d/kernel/Adam_1/read, op=Identity)
(name=conv/conv2d/bias/Adam/read, op=Identity)
(name=conv/conv2d/bias/Adam_1/read, op=Identity)
(name=conv/conv2d_1/kernel/Adam/read, op=Identity)
(name=conv/conv2d_1/kernel/Adam_1/read, op=Identity)
(name=conv/conv2d_1/bias/Adam/read, op=Identity)
(name=conv/conv2d_1/bias/Adam_1/read, op=Identity)
(name=conv/conv2d_2/kernel/Adam/read, op=Identity)
(name=conv/conv2d_2/kernel/Adam_1/read, op=Identity)
(name=conv/conv2d_2/bias/Adam/read, op=Identity)
(name=conv/conv2d_2/bias/Adam_1/read, op=Identity)
(name=cond/switch_t, op=Identity)
(name=cond/switch_f, op=Identity)
(name=zeros, op=Fill)
(name=rnn/cudnn_lstm/opaque_kernel/Adam/cond/switch_t, op=Identity)
(name=rnn/cudnn_lstm/opaque_kernel/Adam/cond/switch_f, op=Identity)
(name=rnn/cudnn_lstm/opaque_kernel/Adam/read, op=Identity)
(name=cond_1/switch_t, op=Identity)
(name=cond_1/switch_f, op=Identity)
(name=zeros_1, op=Fill)
(name=rnn/cudnn_lstm/opaque_kernel/Adam_1/cond/switch_t, op=Identity)
(name=rnn/cudnn_lstm/opaque_kernel/Adam_1/cond/switch_f, op=Identity)
(name=rnn/cudnn_lstm/opaque_kernel/Adam_1/read, op=Identity)
(name=dense4/dense/kernel/Adam/read, op=Identity)
(name=dense4/dense/kernel/Adam_1/read, op=Identity)
(name=dense4/dense/bias/Adam/read, op=Identity)
(name=dense4/dense/bias/Adam_1/read, op=Identity)
(name=logits/dense/kernel/Adam/read, op=Identity)
(name=logits/dense/kernel/Adam_1/read, op=Identity)
(name=logits/dense/bias/Adam/read, op=Identity)
(name=logits/dense/bias/Adam_1/read, op=Identity)
(name=Adam, op=AssignAdd)
(name=concat, op=ConcatV2)
(name=concat_1, op=ConcatV2)
(name=Merge/MergeSummary, op=MergeSummary)
(name=save/Identity, op=Identity)
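
Most of the candidates in a listing like this belong to training (gradients, Adam optimizer slots, the saver, the step counter) and are unlikely to be the node wanted for inference. A minimal sketch for narrowing the list is shown here. It parses the text printed by summarize_graph and drops entries whose names contain training-related markers. The function name inference_candidates and the marker list are assumptions made for illustration; they are not part of ctc-asr or TensorFlow, and the real output node may not appear among the survivors at all.

```python
import re

# Matches entries of the form "(name=some/node, op=SomeOp)".
ENTRY = re.compile(r"\(name=([^,]+), op=([^)]+)\)")

# Hypothetical heuristic: substrings that usually indicate training-only nodes.
TRAINING_MARKERS = ("gradients/", "Adam", "save/", "global_step",
                    "Regularizer", "Merge/")


def inference_candidates(summary_text):
    """Return (name, op) pairs that do not look like training-only nodes."""
    kept = []
    for name, op in ENTRY.findall(summary_text):
        if any(marker in name for marker in TRAINING_MARKERS):
            continue
        kept.append((name, op))
    return kept


if __name__ == "__main__":
    sample = (
        "(name=global_step/read, op=Identity) "
        "(name=rnn/cudnn_lstm/Identity, op=Identity) "
        "(name=gradients/zeros_like, op=ZerosLike) "
        "(name=logits/dense/bias/Adam/read, op=Identity) "
        "(name=concat, op=ConcatV2) "
        "(name=save/Identity, op=Identity)"
    )
    for name, op in inference_candidates(sample):
        print(name, op)
```

The filter only shortens the list; which remaining node carries the network output still has to be confirmed against the model code.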

How can I train with my dataset?

I have a dataset consisting of one folder, 'wav', containing the .wav files, and one text file with one line per .wav file in the format: name_wav text_of_wav
How can I train with this data? Thanks so much, I'm a beginner.
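
A transcript file in the "name_wav text_of_wav" layout can be turned into a CSV index with a few lines of Python. The sketch below is only an illustration: the column names "path" and "label", the ";" delimiter, and the helper names transcript_to_rows and write_csv are assumptions, not the format that ctc-asr requires, so the output would need to be adapted to whatever the project's README specifies.

```python
import csv
import io


def transcript_to_rows(lines, wav_dir="wav"):
    """Split each 'name_wav text_of_wav' line into (wav path, transcript)."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, _, text = line.partition(" ")
        if not name.lower().endswith(".wav"):
            name += ".wav"  # assumption: names may be listed without extension
        rows.append((f"{wav_dir}/{name}", text.strip()))
    return rows


def write_csv(rows, handle):
    """Write rows to an open text handle using a hypothetical CSV layout."""
    writer = csv.writer(handle, delimiter=";")
    writer.writerow(["path", "label"])
    writer.writerows(rows)


if __name__ == "__main__":
    example = ["a001 Hello world", "b002.wav Good morning"]
    buffer = io.StringIO()
    write_csv(transcript_to_rows(example), buffer)
    print(buffer.getvalue())
```

For a real dataset, the list of lines would come from reading the transcript file, and the handle would be a file opened with newline="" as recommended by the csv module.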

Update Documentation

Directories:
- Point out the required speech_checkpoints and speech-corpus dirs.
- Remember to update the tree output.
CSV: Add information about the required CSV format to README.md. (#8)
Reference the speech-corpus-dl git.
Reset params.py and validate default params. (#10)

Common Voice Dataset

Hi,

I just wanted to know if all the datasets you have used are clean speech. I am wondering about the Common Voice dataset in particular: have you analyzed it by any chance? Since they offer a mobile app as well as a browser platform for recording, I feel there is a chance that the recordings are noisy.

Thank you

About models

Hello, could you share a trained model that you no longer need? My computer has relatively little computing power. I would like to test the model's results first and then consider training with cloud services.

Configuration for low memory GPU

I use a laptop with 2 GB of GPU memory (Nvidia MX150).

I am trying to build a new language model, so I have tried a lot of source code from DeepSpeech, PyTorch, etc.

To make my laptop capable of handling the process, I configure the other code bases with a low batch size and a low n_hidden. Here I have already reduced the batch size to 1 and num_units_rnn to 1024, but your code still runs out of GPU memory.

Do you have any recommendations for the settings?

The command that I use:
python3 asr/train.py -- --used_model ds2 --rnn_cell rnn_relu --feature_type mfcc --batch_size 1 --max_epochs 15 --cudnn True --allow_vram_growth True --num_units_rnn 1024 --delete tensorboard learning_rate 0.00001

Input & output of graph

What are the input and output nodes of the graph generated by your code? Could you please provide this information?
Thank you in advance.

Issue with input and output names

Hello, I am currently trying to freeze the graph from this model, but I am unable to do so: when I inspect the "graph.pbtxt" created after training, there is no node named "logits/dense".

Please help me figure out what the output node name is so I can freeze the graph to .pb.

Thank you.
Regards,
Rahul B

Value of Beam Width

Hi,
Could you explain why a beam width of 1024 was chosen? What is the role of the beam width parameter? I am confused about it.
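
In CTC decoding, the beam width is the number of candidate transcriptions (prefixes) kept after each time step. A larger width explores more hypotheses at a higher compute and memory cost, while a width of 1 keeps only the single best prefix per step. The toy prefix beam search below illustrates that role. It is a simplified sketch without a language model, not the decoder used in ctc-asr, and the function name ctc_prefix_beam_search and its parameters are chosen for illustration only.

```python
from collections import defaultdict


def ctc_prefix_beam_search(probs, alphabet, beam_width, blank=0):
    """Toy CTC prefix beam search.

    probs: list of per-frame probability lists (one entry per symbol).
    alphabet: string mapping symbol index to character; the character at
              index `blank` is a placeholder and never emitted.
    Returns the best prefix and its total probability.
    """
    # Each prefix tracks (prob. of paths ending in blank, ending in non-blank).
    beams = {"": (1.0, 0.0)}
    for frame in probs:
        nxt = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for idx, p in enumerate(frame):
                if idx == blank:
                    # A blank keeps the prefix unchanged.
                    nxt[prefix][0] += (p_b + p_nb) * p
                    continue
                ch = alphabet[idx]
                if prefix and prefix[-1] == ch:
                    # Repeated symbol without a blank collapses into the prefix.
                    nxt[prefix][1] += p_nb * p
                    # A genuinely new repeat requires a blank in between.
                    nxt[prefix + ch][1] += p_b * p
                else:
                    nxt[prefix + ch][1] += (p_b + p_nb) * p
        # Pruning step: keep only the `beam_width` most probable prefixes.
        ranked = sorted(nxt.items(), key=lambda kv: kv[1][0] + kv[1][1],
                        reverse=True)
        beams = {prefix: tuple(score) for prefix, score in ranked[:beam_width]}
    best_prefix, best_score = max(beams.items(), key=lambda kv: sum(kv[1]))
    return best_prefix, sum(best_score)


if __name__ == "__main__":
    # Two frames over the alphabet {blank, "a"}.
    frames = [[0.6, 0.4], [0.6, 0.4]]
    for width in (1, 2):
        print(width, ctc_prefix_beam_search(frames, "_a", beam_width=width))
```

On this two-frame input, a width of 1 prunes the prefix "a" after the first frame and ends with the empty transcription, whereas a width of 2 keeps it and finds that "a" is more probable once all of its alignments are summed. How large the width needs to be for a real model is an empirical trade-off between accuracy and decoding time.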
