pannous / tensorflow-speech-recognition

🎙Speech recognition using the tensorflow deep learning framework, sequence-to-sequence neural networks

License: Other

Python 98.02% Swift 1.98%

deep-learning neural-network speech-recognition speech-to-text stt tensorflow

tensorflow-speech-recognition's Issues

How to record my own speech?

I have tried to record my own speech, without success. There is a module, Record.py, but it doesn't save the speech.

What should I add to the code so that it recognizes my own speech?
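
One possible direction, as a sketch only: once raw audio chunks have been captured, the saving step can be done with Python's standard `wave` module. The function name `save_wav`, the 16 kHz mono 16-bit format, and the output file name are assumptions for illustration, not what Record.py actually does. The chunks here are synthesized by a stand-in generator instead of coming from a microphone.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 16000   # assumed sample rate in Hz
SAMPLE_WIDTH = 2      # assumed bytes per sample (16-bit PCM)
CHANNELS = 1          # assumed mono input

# Hypothetical output location; any writable path works.
OUTPUT_PATH = os.path.join(tempfile.gettempdir(), "my_recording.wav")


def save_wav(path, chunks, sample_rate=SAMPLE_RATE):
    """Write an iterable of raw PCM byte chunks to a WAV file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(sample_rate)
        for chunk in chunks:
            wav_file.writeframes(chunk)


def fake_chunks(seconds=1, chunk_size=1024, freq=440.0):
    """Stand-in for microphone input: yields a sine tone as 16-bit PCM chunks."""
    total = int(SAMPLE_RATE * seconds)
    for start in range(0, total, chunk_size):
        stop = min(start + chunk_size, total)
        samples = [
            int(12000 * math.sin(2 * math.pi * freq * n / SAMPLE_RATE))
            for n in range(start, stop)
        ]
        yield struct.pack("<%dh" % len(samples), *samples)


save_wav(OUTPUT_PATH, fake_chunks())
print("saved", OUTPUT_PATH)
```

In a real recorder the generator would be replaced by whatever yields the bytes read from the input stream.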

problems with tensorboard_util.py

tensorboard_util.py will not run on Windows; I made the following changes to make it work.

In layer/__init__.py:
changed "from tensorboard_util import *" to "from .tensorboard_util import *"

tensorboard_logs = '/tmp/tensorboard_logs/'
needs to be updated for Windows; I just changed it to tensorboard_logs = './tmp/tensorboard_logs/'

And I changed
logs=subprocess.check_output(["ls", tensorboard_logs]).split("\n")
to
logs=subprocess.check_output(["ls", tensorboard_logs]).decode("utf-8").split("\n")

thanks..
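
A portable alternative, sketched under the assumption that the `ls` call is only used to list the entries of the log directory: `os.listdir` avoids the external command (which a plain Windows shell lacks) and the bytes-versus-str decoding step, and `tempfile.gettempdir()` avoids hard-coding `/tmp`. The helper name `list_log_runs` is hypothetical and not part of tensorboard_util.py.

```python
import os
import tempfile

# Platform-independent stand-in for the hard-coded '/tmp/tensorboard_logs/'.
tensorboard_logs = os.path.join(tempfile.gettempdir(), "tensorboard_logs")


def list_log_runs(log_dir=tensorboard_logs):
    """Return the sorted entry names in log_dir, creating it if needed."""
    os.makedirs(log_dir, exist_ok=True)
    return sorted(os.listdir(log_dir))


logs = list_log_runs()
print(logs)
```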

Failed to find any matching files for tflearn.lstm.model in speech2text-tflearn.py

speech2text-tflearn.py fails with the error:
Failed to find any matching files for tflearn.lstm.model

or
ValueError: Restore called with invalid save path: 'tflearn.lstm.model'. File path is: 'tflearn.lstm.model'

on both Linux and Windows.

The lines involved are:
model.load("tflearn.lstm.model")
and
model.save("tflearn.lstm.model")

thanks..
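
One possible cause, stated as an assumption rather than a diagnosis: `model.load` runs before any checkpoint has been written. TensorFlow checkpoints are typically stored as several files sharing a prefix (for example `tflearn.lstm.model.index`), so a guard can look for files that start with that prefix. `checkpoint_exists` is a hypothetical helper, and the commented-out `model` calls stand for the script's own object.

```python
import glob
import os


def checkpoint_exists(prefix):
    """True if a file named `prefix` or `prefix.<suffix>` exists."""
    return os.path.exists(prefix) or bool(glob.glob(glob.escape(prefix) + ".*"))


MODEL_PATH = "tflearn.lstm.model"

if checkpoint_exists(MODEL_PATH):
    print("checkpoint found, safe to load:", MODEL_PATH)
    # model.load(MODEL_PATH)
else:
    print("no checkpoint yet, train and save first:", MODEL_PATH)
    # model.fit(...) followed by model.save(MODEL_PATH)
```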

spoken_words_wav.tar

Could you, please, upload spoken_words_wav.tar somewhere?
Thank you.

errors

There are many errors when I use TensorFlow 0.12. Could you give me a README for speech recognition?
thanks!

Any solution for the error -> Exception: Invalid objective: catagorical_crossentropy

I came across some errors. I solved some of them, but I can't solve this one.
Speech_data.py fails with the error below.

Looking for data spoken_numbers_pcm.tar in data/
Extracting data/spoken_numbers_pcm.tar to data/
'tar' is not recognized as an internal or external command,
operable program or batch file.
Data ready!
loaded batch of 2402 files
Traceback (most recent call last):
  File "demo.py", line 15, in <module>
    net = tflearn.regression(net, optimizer='adam', learning_rate=learning_rate, loss='catagorical_crossentropy')
  File "C:\python35\lib\site-packages\tflearn\layers\estimator.py", line 174, in regression
    loss = objectives.get(loss)(incoming, placeholder)
  File "C:\python35\lib\site-packages\tflearn\objectives.py", line 10, in get
    return get_from_module(identifier, globals(), 'objective')
  File "C:\python35\lib\site-packages\tflearn\utils.py", line 25, in get_from_module
    raise Exception('Invalid ' + str(module_name) + ': ' + str(identifier))
Exception: Invalid objective: catagorical_crossentropy

What is this error and how do I resolve it?

Can someone tell me how to fix this issue, please?
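
The traceback points at the string passed as `loss`: TFLearn looks the name up in its `objectives` module, and `catagorical_crossentropy` is a misspelling of `categorical_crossentropy`. The sketch below shows how such a lookup failure could be caught early with a spelling suggestion. `KNOWN_OBJECTIVES` is a hand-written, partial list for illustration, and `suggest_objective` is a hypothetical helper, not part of TFLearn.

```python
import difflib

# Partial list of TFLearn objective names, written out by hand for illustration.
KNOWN_OBJECTIVES = [
    "categorical_crossentropy",
    "binary_crossentropy",
    "softmax_categorical_crossentropy",
    "mean_square",
    "hinge_loss",
]


def suggest_objective(name):
    """Return name if it is known, else the closest known name, else None."""
    if name in KNOWN_OBJECTIVES:
        return name
    matches = difflib.get_close_matches(name, KNOWN_OBJECTIVES, n=1)
    return matches[0] if matches else None


print(suggest_objective("catagorical_crossentropy"))
```

With the corrected spelling, the call in demo.py would pass loss='categorical_crossentropy'.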

Broken dependency ?? in densenet_layer.py

I am not able to run densenet_layer.py; I get the error below:

Traceback (most recent call last):
  File "densenet_layer.py", line 4, in <module>
    import layer
  File "D:\git\AI\tensorflow-speech-recognition\layer\__init__.py", line 1, in <module>
    from net import *
ImportError: No module named 'net'

I fixed the problem by changing line 1 of D:\git\AI\tensorflow-speech-recognition\layer\__init__.py to:

from .net import *

Running on Windows 10, TensorFlow 0.12, Python 3.5.

Thanks..
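
For context: Python 3 has no implicit relative imports, so inside a package `from net import *` looks for a top-level module called `net`, while `from .net import *` looks inside the package itself. The sketch below builds a throwaway package to show the relative form working. The package name `demo_layer` and the function `describe` are made up for this illustration and do not exist in the repository.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root):
    """Create demo_layer/ with net.py and an __init__.py using a relative import."""
    pkg = root / "demo_layer"
    pkg.mkdir()
    (pkg / "net.py").write_text("def describe():\n    return 'net loaded'\n")
    # The leading dot makes Python resolve 'net' inside the package.
    (pkg / "__init__.py").write_text("from .net import *\n")


def load_demo_package():
    """Build the package in a temporary directory and import it."""
    root = Path(tempfile.mkdtemp())
    build_demo_package(root)
    sys.path.insert(0, str(root))
    try:
        importlib.invalidate_caches()
        return importlib.import_module("demo_layer")
    finally:
        sys.path.remove(str(root))


demo = load_demo_package()
print(demo.describe())
```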

record my own voice

Can anyone tell me how to record my own voice, and where to put the recording so that it gets converted from speech to text?

i am getting this issue

NotFoundError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /home/hitesh/speec/tflearn.lstm.model
[[Node: save_1/RestoreV2_16 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save_1/Const_0, save_1/RestoreV2_16/tensor_names, save_1/RestoreV2_16/shape_and_slices)]]

.

Data for CTC in lstm to chars.

The data directory for the CTC data in lstm_to_chars.py is given as:
INPUT_PATH = '/data/ctc/sample_data/mfcc' # directory of MFCC nFeatures x nFrames 2-D array .npy files
Where can I find the data (since it is not available in speech_data.py)?

Train data is used to determine accuracy in dense_layer

I'm trying to use dense_layer. It uses spectro_batch_generator from speech_data.py to fetch batches of data. There it is already noted that the training and testing/validation sets need to be split:
# shuffle(files) # todo : split test_fraction batch here!

A bit further in dense_layer, the function train from layer/net.py is used. In the train function, currently around line 389, there is:

  feed_dict = {x: batch_xs, y: batch_ys, keep_prob: dropout, self.train_phase: True}
  loss,_= session.run([self.cost,self.optimizer], feed_dict=feed_dict)

Immediately followed by:

  if step % display_step == 0:
    # Calculate batch accuracy, loss
    feed = {x: batch_xs, y: batch_ys, keep_prob: 1., self.train_phase: False}
    acc , summary = session.run([self.accuracy,self.summaries], feed_dict=feed)

If I understand it correctly (and I am new to this, so it's likely that I am wrong), the data is first fed into the train step, after which the exact same data is used to determine the accuracy.
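
A minimal sketch of the split that the todo comment asks for, assuming the batches are built from a list of file names. `split_files`, its parameters, and the sample file names are hypothetical and not taken from speech_data.py. Accuracy would then be measured on batches drawn from the held-out list only.

```python
import random


def split_files(files, test_fraction=0.1, seed=0):
    """Shuffle a copy of files and split it into (train, test) lists."""
    shuffled = list(files)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Made-up file names standing in for the spectrogram images.
files = ["sample_%03d.png" % i for i in range(100)]
train_files, test_files = split_files(files, test_fraction=0.2)
print(len(train_files), len(test_files))
```

Using a fixed seed keeps the split stable between runs, so a file never moves from the test set into the training set.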

How to augment data for the spectrogram?

Is there any sample code for data augmentation on the spectrograms?
I observed that the spectrograms of words are mainly in the 160 format. How do you get other variations such as 40, 60+? What kind of transformation is that?

Thanks!
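
As a sketch only: two generic augmentations that are often applied to spectrograms are shifting along the time axis and masking a few frames. The code below does both on a plain list of rows (one row per frequency bin). It does not reproduce the 40/60/160 variations mentioned above, whose origin is not explained here, and all function names and parameters are hypothetical.

```python
import random


def shift_time(spectrogram, offset, fill=0.0):
    """Shift every frequency row by `offset` frames, padding with `fill`."""
    shifted = []
    for row in spectrogram:
        n = len(row)
        if offset >= 0:
            new_row = [fill] * min(offset, n) + row[: max(n - offset, 0)]
        else:
            new_row = row[min(-offset, n):] + [fill] * min(-offset, n)
        shifted.append(new_row)
    return shifted


def mask_time(spectrogram, start, width, fill=0.0):
    """Blank out `width` consecutive frames starting at frame `start`."""
    return [
        [fill if start <= t < start + width else value for t, value in enumerate(row)]
        for row in spectrogram
    ]


def augment(spectrogram, rng, max_shift=2, mask_width=1):
    """Apply a random time shift followed by a random time mask."""
    n_frames = len(spectrogram[0])
    offset = rng.randint(-max_shift, max_shift)
    start = rng.randrange(n_frames)
    return mask_time(shift_time(spectrogram, offset), start, mask_width)


# Tiny made-up spectrogram: 2 frequency bins x 4 frames.
spec = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
print(augment(spec, random.Random(0)))
```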

predict.py

I need a predict.py file. Can anyone please help me out with it?

Could not run pannous/tensorflow-speech-recognition

Sorry, sir, I have tried a lot to run your application on my local system, but I am unable to. Please help me run it. I have already installed the Python-based TensorFlow setup on my system.

could not run train.py

train.py uses a function prepare_data from speech_data, but no such function is defined in speech_data.

Training Number

The training keeps going on and on no matter what. I set the training_iters value to 3000 and it still keeps going. What is the reason?

How to...

How does one use this code? More specifically: how can someone who doesn't have an NVIDIA GPU to train a model use the speech-to-text?

missing import in speech2text-tflearn.py

The import for librosa is missing in speech2text-tflearn.py.

It also requires python-tk to be installed through apt-get.
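
A rough way to check for such gaps before running a script, sketched with the standard library only. `missing_modules` is a hypothetical helper, and the mapping from the python-tk system package to the `tkinter` module name is an assumption that holds for Python 3 (Python 2 names the module `Tkinter`).

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# 'librosa' is the import named above; 'tkinter' is the Python 3 module that
# the python-tk / python3-tk system package is assumed to provide.
required = ["librosa", "tkinter"]
print("missing:", missing_modules(required))
```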

Requirements File, Installation Guide

Hello,

The requirements.txt file is missing, I was not able to install the project.
It would be great if we can have an installation guide.

Thanks in advance,
Regards

Error when running densenet_layer.py

Hello everybody,
I have a problem.
./number_classifier_tflearn.py and ./speaker_classifier_tflearn.py run successfully, but densenet_layer.py does not work.
I followed these steps in Docker:

docker run -it -v C:\WorkData\GitRespostory\tensorflow-speech-recognition:/tf_speech gcr.io/tensorflow/tensorflow:latest-devel

After that, in the shell:

cd /tensorflow
git pull

Then I ran these steps:

apt-get update
apt-get install -y libasound-dev portaudio19-dev libportaudio2 libportaudiocpp0
cd /tf_speech
pip install -r requirements.txt
pip install h5py
pip install librosa

Note: I downloaded the spoken_words.tar file manually and copied it to the folder.
And now:
python densenet_layer.py

But it shows this error. Please help me.

Traceback (most recent call last):
  File "densenet_layer.py", line 69, in <module>
    net.train(data=batch,batch_size=10,steps=5000,dropout=0.6,display_step=10,test_step=100) # run
  File "/tf_speech/layer/net.py", line 385, in train
    loss,_= session.run([self.cost,self.optimizer], feed_dict=feed_dict)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 943, in _run
    % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (10, 262144) for Tensor u'data/Placeholder:0', which has shape '(?, 4096, 4096)'

multiple problems with speech2text-seq2seq.py

Broken dependency "sugartensor" not in requirements.txt

AND

"tensorflow.examples.tutorials" is not available in the Windows install of TensorFlow; it needs to be copied manually from the TensorFlow git repo.

AND

Line 13, "Update:", needs to be commented out in speech2text-seq2seq.py

AND

Traceback (most recent call last):
  File "speech2text-seq2seq.py", line 65, in <module>
    z = x.sg_conv1d(size=1, dim=num_dim, act='tanh', bn=True)
AttributeError: 'list' object has no attribute 'sg_conv1d'

OS: Windows 10,
TensorFlow: 0.12
Python: 3.5

How to classify an entire sentence.

I have many sound files in WAV format, each about 10 seconds long. Every file contains one of ten sentences, for example: "The number you have dialed is power off"/"dialed number is not exist"/ some other sentences. Is your project suitable for this?

Training is not using GPU capacity

Hey everyone!

I'm trying to train my model with the speech2text-tflearn code. Unfortunately it takes ages to train (a few days). I have installed TensorFlow with GPU support, but the code is not using any of the GPU's capacity. I have not changed any parameters. What am I getting wrong? Any suggestions?

Thanks!
Cheers
julitosm

newer paper?

Hello, Pannous,
You do great work!!! Really great!!!
I'm a newcomer to speech recognition. I noticed that the project cites some papers from 2012 and 2014. Now it is 2017. If I want to reproduce state-of-the-art work, do you think I should read some recent papers beyond the ones you use in this project?
Please give me some suggestions. Thanks in advance.

ImportError: No module named core_rnn

Hi,

I'm trying to run ./number_classifier_tflearn.py and the following error occurs:
hdf5 is not supported on this machine (please install/reinstall h5py for optimal experience)
Traceback (most recent call last):
  File "./number_classifier_tflearn.py", line 3, in <module>
    import tflearn
  File "/usr/local/lib/python2.7/site-packages/tflearn/__init__.py", line 21, in <module>
    from .layers import normalization
  File "/usr/local/lib/python2.7/site-packages/tflearn/layers/__init__.py", line 10, in <module>
    from .recurrent import lstm, gru, simple_rnn, bidirectional_rnn, \
  File "/usr/local/lib/python2.7/site-packages/tflearn/layers/recurrent.py", line 8, in <module>
    from tensorflow.contrib.rnn.python.ops.core_rnn import static_rnn as _rnn, \
ImportError: No module named core_rnn

Any suggestions?

OS: OS X Yosemite

format of file in /data/speech

Could you give an example of a file in /data/speech?

License of the project

Hi!
What license does the project have? Also, who is meant to be the copyright owner of the code committed to the project by third-party developers?

Where is requirements.txt file?

Who has successfully found requirements.txt file?

ls

Getting some ideas from Wavenet?

Since this project is still in the planning stage, I guess we are more open to new ideas. The README mentions the LSTM, but WaveNet yields better results than LSTM according to DeepMind's paper. WaveNet is explained in the following white paper. Do you think it would be too difficult for us to use the WaveNet approach?

https://drive.google.com/file/d/0B3cxcnOkPx9AeWpLVXhkTDJINDQ/view

Thanks.

ImportError: No module named layer while running densenet_layer.py

Hi,

I could not run densenet_layer.py, since it throws an import error for the module layer.

Traceback (most recent call last):
  File "densenet_layer.py", line 6, in <module>
    import layer
ImportError: No module named layer

From my understanding, this refers to the layers in tflearn. But the model architecture defined here doesn't work:
net = layer.net(simple_dense, input_shape=(width,height), output_width=classes, learning_rate=0.01)

Thanks
Manishanker

The "number detection using speech" example in this module does not work

Hi @pannous,

I am happy to find an audio classification example like yours. But I see that you need to update your code, because it has some problems when running.

Speaker Classification Clarification

Hello, I was investigating your speaker classification example that uses TFLearn. I have a question about the audio sample that was used to test the model. I may be mistaken, but I believe that this sample is inside the training set, which would not be ideal for testing. Why is this done (or isn't it, if I am wrong)?

Thank you in advance for your help!

Train new language

Is it possible to train a new language with a hundred words?

The error when running densenet_layer.py

ALSA lib confmisc.c:1286:(snd_func_refer) Unable to find definition 'cards.CA0106.pcm.hdmi.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2'
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:4771:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM hdmi
ALSA lib confmisc.c:1286:(snd_func_refer) Unable to find definition 'cards.CA0106.pcm.hdmi.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2'
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:4771:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM hdmi
ALSA lib confmisc.c:1286:(snd_func_refer) Unable to find definition 'cards.CA0106.pcm.modem.0:CARD=0'
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:4771:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline:CARD=0,DEV=0
ALSA lib confmisc.c:1286:(snd_func_refer) Unable to find definition 'cards.CA0106.pcm.modem.0:CARD=0'
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:4771:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline:CARD=0,DEV=0
ALSA lib confmisc.c:1286:(snd_func_refer) Unable to find definition 'cards.CA0106.pcm.modem.0:CARD=0'
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:4771:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM phoneline
ALSA lib confmisc.c:1286:(snd_func_refer) Unable to find definition 'cards.CA0106.pcm.modem.0:CARD=0'
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:4771:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM phoneline
[Errno Input overflowed] -9981
Expression 'ret' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 1735
Expression 'AlsaOpen( &alsaApi->baseHostApiRep, params, streamDir, &self->pcm )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 1902
Expression 'PaAlsaStreamComponent_Initialize( &self->capture, alsaApi, inParams, StreamDirection_In, NULL != callback )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2166
Expression 'PaAlsaStream_Initialize( stream, alsaHostApi, inputParameters, outputParameters, sampleRate, framesPerBuffer, callback, streamFlags, userData )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2835
Traceback (most recent call last):
  File "record.py", line 101, in record
    dataraw = stream.read(CHUNK)
  File "/usr/lib/python3/dist-packages/pyaudio.py", line 605, in read
    return pa.read_stream(self._stream, num_frames)
OSError: [Errno Input overflowed] -9981

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "record.py", line 138, in <module>
    record()
  File "record.py", line 104, in record
    stream=get_audio_input_stream()
  File "record.py", line 71, in get_audio_input_stream
    input_device_index=INDEX)
  File "/usr/lib/python3/dist-packages/pyaudio.py", line 747, in open
    stream = Stream(self, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/pyaudio.py", line 442, in __init__
    self._stream = pa.open(**arguments)
OSError: [Errno Device unavailable] -9985

How to create custom 'train_words_index.txt'

I would like to know how to get this sequence of numbers (2 42 14 66 93 19 46 42 24 43 49 3).

In train_words_index.txt there are a number of lines, each with a word file and a sequence of numbers, like this:
'measurement_Victoria_160.wav.png 2 42 14 66 93 19 46 42 24 43 49 3'. I have looked in many places for a way to create this sequence of numbers but couldn't find one.

Thank you in advance,

Error while reshaping tensors

I am having a problem reshaping my tensors. Right now I am running train.py from your source, but I get the following error:

File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 625, in _run
    % (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (100, 4096) for Tensor 'Placeholder:0', which has shape '(?, 262144)'

This is my code snippet:

for i in range(6000-1):
    batch_xs, batch_ys = speech.train.next_batch(100)
    # WTF, tensorflow can't do 3D tensor operations?
    # https://github.com/tensorflow/tensorflow/issues/406 =>

    batch_xs = [flatten(matrix) for matrix in batch_xs]

    #batch_ys = np.reshape(batch_ys, (100,4096))
    #batch_xs = np.reshape(batch_xs, (4096,100))

    #  you have to reshape to flat/matrix data? why didn't they call it matrixflow?
    feed = {x: batch_xs, y_: batch_ys}
    speech_step.run(feed) # better for encod_entropy too! (later)
    if i % 100 == 0:
        print("iteration %d" % i)#, end=' ')
        eval(feed)
    if (i + 1) % 7000 == 0:
        print("l_rate*=0.1")
        sess.run(tf.assign(l_rate, l_rate * 0.1))
print("Train")
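
The numbers in the error message can be checked by hand: 4096 = 64 × 64 and 262144 = 512 × 512. So the flattening itself works, but the data and the placeholder appear to assume different image sizes (64 × 64 versus 512 × 512). That reading is an inference from the arithmetic, not something the source confirms. The sketch below mimics the compatibility rule behind the message; both helper names are hypothetical.

```python
import math


def check_feed_shape(value_shape, placeholder_shape):
    """Mimic the feed check: None in the placeholder matches any size."""
    if len(value_shape) != len(placeholder_shape):
        return False
    return all(p is None or p == v for v, p in zip(value_shape, placeholder_shape))


def square_side(flat_size):
    """Return the side length if flat_size is a perfect square, else None."""
    side = math.isqrt(flat_size)
    return side if side * side == flat_size else None


print(check_feed_shape((100, 4096), (None, 262144)))
print(square_side(4096), square_side(262144))
```

Under that assumption, either the placeholder width or the size of the loaded images would need to change so that both sides agree.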

How to create a speech to text conversion model using this library?

Hi, I'm a beginner in ML concepts and I was wondering whether a speech-to-text model can be constructed using this library. As of now I'm clueless about how to do it. I'd love to be able to test one out and learn from it. Thanks,

Can people tell me in what order to run these scripts?

I have been studying this code recently, but I do not know which script I should run first and which next. Can someone help me, please?

data download link is broken for words test data?

Any other alternative download link for the data?
I tried clicking the link provided in the source file and in caffe's implementation but to no avail:
https://www.dropbox.com/s/eb5zqskvnuj0r78/spoken_words.tar?dl=0

Thanks!

spoken_words url broken

The URL for spoken_words is broken:

Downloading from http://pannous.net/files/spoken_words to data/spoken_words
Traceback (most recent call last):
File "speaker_classifier_tflearn.py", line 17, in

urllib.error.HTTPError: HTTP Error 404: Not Found

Currently empty

When will the tensorflow version be implemented?

ValueError: No variables to save

I am getting an error while running the examples number_classifier_tflearn.py and speaker_classifier_tflearn.py. Details are below:

Looking for data spoken_numbers_pcm.tar in data/
Extracting data/spoken_numbers_pcm.tar to data/
Data ready!
loaded batch of 2402 files
loaded batch of 2402 files
loaded batch of 2402 files
loaded batch of 2402 files
loaded batch of 2402 files
Traceback (most recent call last):
  File "number_classifier_tflearn.py", line 26, in <module>
    model = tflearn.DNN(net)
  File "C:\Program Files\Anaconda3\lib\site-packages\tflearn\models\dnn.py", line 57, in __init__
    session=session)
  File "C:\Program Files\Anaconda3\lib\site-packages\tflearn\helpers\trainer.py", line 125, in __init__
    keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours)
  File "C:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\training\saver.py", line 1000, in __init__
    self.build()
  File "C:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\training\saver.py", line 1021, in build
    raise ValueError("No variables to save")
ValueError: No variables to save

AND

15 speakers: ['Ralph', 'Albert', 'Vicki', 'Samantha', 'Junior', 'Kathy', 'Fred', 'Princess', 'Steffi', 'Alex', 'Daniel', 'Agnes', 'Victoria', 'Tom', 'Bruce']
speakers ['Ralph', 'Albert', 'Vicki', 'Samantha', 'Junior', 'Kathy', 'Fred', 'Princess', 'Steffi', 'Alex', 'Daniel', 'Agnes', 'Victoria', 'Tom', 'Bruce']
Looking for data spoken_numbers_pcm.tar in data/
Extracting data/spoken_numbers_pcm.tar to data/
Data ready!
15 speakers: ['Ralph', 'Albert', 'Vicki', 'Samantha', 'Junior', 'Kathy', 'Fred', 'Princess', 'Steffi', 'Alex', 'Daniel', 'Agnes', 'Victoria', 'Tom', 'Bruce']
loaded batch of 2402 files
Traceback (most recent call last):
  File "speaker_classifier_tflearn.py", line 27, in <module>
    model = tflearn.DNN(net)
  File "C:\Program Files\Anaconda3\lib\site-packages\tflearn\models\dnn.py", line 57, in __init__
    session=session)
  File "C:\Program Files\Anaconda3\lib\site-packages\tflearn\helpers\trainer.py", line 125, in __init__
    keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours)
  File "C:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\training\saver.py", line 1000, in __init__
    self.build()
  File "C:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\training\saver.py", line 1021, in build
    raise ValueError("No variables to save")
ValueError: No variables to save

Thanks..

tflearn error: No variables to save

Hi,
I downloaded speech_data.py and speaker_classifier_tflearn.py.
When I run speaker_classifier_tflearn.py, I get the following errors:
Traceback (most recent call last):
  File "speaker_classifier_tflearn.py", line 28, in <module>
    model = tflearn.DNN(net)
  File "/root/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tflearn/models/dnn.py", line 57, in __init__
    session=session)
  File "/root/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tflearn/helpers/trainer.py", line 125, in __init__
    keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours)
  File "/root/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1000, in __init__
    self.build()
  File "/root/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1021, in build
    raise ValueError("No variables to save")
ValueError: No variables to save

python number_classifier_tflearn.py hangs on a wrong path instead of raising an error, and takes a long time to get the data

Hi all, awesome project! I checked it out and wasn't able to run the classification procedure.

Running
python number_classifier_tflearn.py causes the code to run properly up to some line; then it hangs for a long time and captures the mouse while it hangs.

Create Spectrograms

How did you create "Sample spectrogram, Karen uttering 'zero' with 160 words per minute."? How did you create that gray scale spectrogram?

everything works fine except that patterns recognition index is not retrieved

Greetings

I did the following:

cloned the repository;
installed pre-requisites
trained my dataset using ./number_classifier_tflearn.py
ran ./record.py
erased the repository and repeated the steps copying the output to gist

as stated below
https://gist.github.com/tiagmoraismorgado/673ca5de5317a1583761a314e7d38ab1

Everything works fine, except that the recognized pattern index is not retrieved: TensorFlow records the voice, but it doesn't return the index of the recognized pattern. Looking forward to your help.

What should dtype of placeholder y_ in training be?

From speech_encoder.py,
batch_xs, batch_ys = speech.train.next_batch(100)
batch_xs=[flatten(matrix) for matrix in batch_xs]
feed = {x: batch_xs, y_: batch_ys}

The above has the following error:

ValueError: invalid literal for float(): 2 14 68 6 32 14 73 6 47 14 73 3

What should the dtype of the placeholder y_ be?

Thanks!
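
The error message shows the label arriving as the text "2 14 68 ...", which cannot be cast to a float. Below is a sketch of turning such a line into integers first, using the line format quoted in the train_words_index.txt issue above. `parse_index_line` and `pad_sequence` are hypothetical helpers. Whether the network expects integer sequences, padded vectors, or one-hot labels is not stated here, so the padding step is only one assumed option.

```python
def parse_index_line(line):
    """Split '<file name> <int> <int> ...' into (file name, [ints])."""
    name, *numbers = line.split()
    return name, [int(n) for n in numbers]


def pad_sequence(sequence, length, pad_value=0):
    """Right-pad (or truncate) a label sequence to a fixed length."""
    return (sequence + [pad_value] * length)[:length]


name, labels = parse_index_line(
    "measurement_Victoria_160.wav.png 2 42 14 66 93 19 46 42 24 43 49 3"
)
print(name, labels)
print(pad_sequence(labels, 16))
```

With labels in numeric form, the placeholder dtype can then be chosen to match (an integer type for class indices, a float type for one-hot vectors).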

TODOs

split test set(s) #28
make input->chars converge well (input->class works well already)
sliding window
merge WarpCTC or alternative
peer2peer training!
