pannous / tensorflow-speech-recognition
🎙Speech recognition using the tensorflow deep learning framework, sequence-to-sequence neural networks
License: Other
I have tried to record my own speech, but without success. There is a module, Record.py, but it doesn't save the speech.
What should I add to the code in order to recognize my own speech?
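One way to persist recorded audio is Python's standard `wave` module. The sketch below assumes the recorder yields raw 16-bit mono PCM chunks (as a PyAudio `stream.read(CHUNK)` call would); `save_wav`, `wav_duration_seconds`, the 16 kHz rate and the file name are hypothetical and not part of the repository.

```python
import wave


def save_wav(path, frames, sample_rate=16000, channels=1, sample_width=2):
    """Write an iterable of raw PCM byte chunks to a WAV file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"".join(frames))


def wav_duration_seconds(path):
    """Return the length of a WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / float(wav_file.getframerate())
```

In a recording loop, each chunk returned by the stream would be appended to a list and the list passed to `save_wav("my_recording.wav", frames)` once recording stops.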
tensorboard_util.py will not run on Windows; I made the following changes to make it work.
In layer/__init__.py:
changed "from tensorboard_util import *"
to "from .tensorboard_util import *"
tensorboard_logs = '/tmp/tensorboard_logs/'
needs to be updated for Windows; I just changed it to tensorboard_logs = './tmp/tensorboard_logs/'
I also changed
logs=subprocess.check_output(["ls", tensorboard_logs]).split("\n")
to
logs=subprocess.check_output(["ls", tensorboard_logs]).decode("utf-8").split("\n")
Thanks.
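The `ls` call above only works where an `ls` executable is on the PATH. A minimal sketch of a portable alternative, assuming the only goal is to list the entries of the log directory; `list_log_dirs` is a hypothetical helper, not a function from tensorboard_util.py:

```python
import os


def list_log_dirs(log_root):
    """Return the sorted entry names in log_root, or [] if it does not exist."""
    if not os.path.isdir(log_root):
        return []
    # os.listdir returns str names directly, so no subprocess or decode is needed.
    return sorted(os.listdir(log_root))
```

Because `os.listdir` returns text rather than bytes, the `.decode("utf-8")` step is no longer required.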
speech2text-tflearn.py fails with the error:
Failed to find any matching files for tflearn.lstm.model
or
ValueError: Restore called with invalid save path: 'tflearn.lstm.model'. File path is: 'tflearn.lstm.model'
on both Linux and Windows,
on the lines:
model.load("tflearn.lstm.model")
and
model.save("tflearn.lstm.model")
Thanks.
Could you please upload spoken_words_wav.tar somewhere?
Thank you.
There are many errors when I use TensorFlow 0.12. Could you give me a README for speech recognition?
thanks!
I came across some errors. I solved some of them but I can't solve this one.
speech_data.py produces the error below.
Looking for data spoken_numbers_pcm.tar in data/
Extracting data/spoken_numbers_pcm.tar to data/
'tar' is not recognized as an internal or external command,
operable program or batch file.
Data ready!
loaded batch of 2402 files
Traceback (most recent call last):
File "demo.py", line 15, in <module>
net = tflearn.regression(net, optimizer='adam', learning_rate=learning_rate, loss='catagorical_crossentropy')
File "C:\python35\lib\site-packages\tflearn\layers\estimator.py", line 174, in regression
loss = objectives.get(loss)(incoming, placeholder)
File "C:\python35\lib\site-packages\tflearn\objectives.py", line 10, in get
return get_from_module(identifier, globals(), 'objective')
File "C:\python35\lib\site-packages\tflearn\utils.py", line 25, in get_from_module
raise Exception('Invalid ' + str(module_name) + ': ' + str(identifier))
Exception: Invalid objective: catagorical_crossentropy
What is this error and how do I resolve it?
Can someone please tell me how to fix this issue?
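Two separate things go wrong in the log above. The `tar` command is not available on this Windows machine, so that extraction step fails, and the exception itself comes from the misspelled loss name: tflearn registers the objective as `categorical_crossentropy`. For the first problem, a minimal sketch that unpacks the archive with Python's standard `tarfile` module instead of the external command; `extract_tar` is a hypothetical helper, not part of speech_data.py:

```python
import os
import tarfile


def extract_tar(archive_path, target_dir):
    """Extract archive_path into target_dir and return the member names."""
    os.makedirs(target_dir, exist_ok=True)
    with tarfile.open(archive_path) as archive:
        names = archive.getnames()
        archive.extractall(target_dir)
    return names
```

With the paths from the log, the call would be `extract_tar("data/spoken_numbers_pcm.tar", "data/")`.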
I am not able to run densenet_layer.py; I get the error below:
Traceback (most recent call last):
File "densenet_layer.py", line 4, in <module>
import layer
File "D:\git\AI\tensorflow-speech-recognition\layer\__init__.py", line 1, in <module>
from net import *
ImportError: No module named 'net'
I fixed the problem by changing that line to:
from .net import *
Running on Windows 10, TensorFlow 0.12, Python 3.5.
Thanks.
Can anyone tell me how to record my own voice, and where to put the recording so that it gets converted from speech to text?
NotFoundError (see above for traceback): Unsuccessful TensorSliceReader constructor: Failed to find any matching files for /home/hitesh/speec/tflearn.lstm.model
[[Node: save_1/RestoreV2_16 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_recv_save_1/Const_0, save_1/RestoreV2_16/tensor_names, save_1/RestoreV2_16/shape_and_slices)]]
The data directory for the CTC data in lstm_to_chars.py is given as:
INPUT_PATH = '/data/ctc/sample_data/mfcc' # directory of MFCC nFeatures x nFrames 2-D array .npy files
Where can I find the data (since it is not available in speech_data.py)?
I'm trying to use dense_layer. It uses spectro_batch_generator from speech_data.py to fetch batches of data. The code there already notes that the training and testing/validation sets need to be split:
# shuffle(files) # todo : split test_fraction batch here!
A bit further in dense_layer, the function train from layer/net.py is used. In the train function, currently around line 389, there is:
feed_dict = {x: batch_xs, y: batch_ys, keep_prob: dropout, self.train_phase: True}
loss,_= session.run([self.cost,self.optimizer], feed_dict=feed_dict)
Immediately followed by:
if step % display_step == 0:
    # Calculate batch accuracy, loss
    feed = {x: batch_xs, y: batch_ys, keep_prob: 1., self.train_phase: False}
    acc , summary = session.run([self.accuracy,self.summaries], feed_dict=feed)
If I understand it correctly (and I am new to this, so it's likely that I am wrong), the data is first fed into the train step, after which the exact same data is used to determine the accuracy.
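The split that the todo comment asks for could look roughly like the sketch below, which holds out a fraction of the files before any batches are generated. `split_files`, the default `test_fraction` and the seed are assumptions for illustration; speech_data.py does not define them.

```python
import random


def split_files(files, test_fraction=0.1, seed=0):
    """Shuffle a copy of files and split it into (train, test) lists."""
    shuffled = list(files)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

Accuracy would then be computed on batches drawn from the held-out list rather than on the batch that was just used for the optimizer step.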
Any sample code to do data augmentation on the spectrograms?
I observed that the word spectrograms are mainly in the 160 format. How do you get other variations such as 40, 60 and higher? What kind of transformation is that?
Thanks!
I need a predict.py file. Can anyone please help me out with it?
Sorry sir, but I have tried many times to run your application on my local system and have not been able to. Please help me run it; I have already installed the Python-based TensorFlow setup on my system.
train.py uses a function prepare_data from speech_data, but no such function is defined in speech_data.
How does one use this code? More specifically: how does someone who doesn't have an NVIDIA GPU for training a model use the speech-to-text?
The import for librosa is missing in speech2text-tflearn.py.
It also requires python-tk to be installed through apt-get.
Hello,
The requirements.txt file is missing, so I was not able to install the project.
It would be great to have an installation guide.
Thanks in advance,
Regards
Hello everybody,
I have a problem.
./number_classifier_tflearn.py and ./speaker_classifier_tflearn.py run successfully, but densenet_layer.py does not work.
I followed these steps in Docker:
docker run -it -v C:\WorkData\GitRespostory\tensorflow-speech-recognition:/tf_speech gcr.io/tensorflow/tensorflow:latest-devel
Then, in the shell that opens:
cd /tensorflow
git pull
Then I ran these steps:
apt-get update
apt-get install -y libasound-dev portaudio19-dev libportaudio2 libportaudiocpp0
cd /tf_speech
pip install -r requirements.txt
pip install h5py
pip install librosa
Note: the spoken_words.tar file was downloaded manually and copied to the folder.
And now:
python densenet_layer.py
But it shows this error. Please help me.
Traceback (most recent call last):
File "densenet_layer.py", line 69, in <module>
net.train(data=batch,batch_size=10,steps=5000,dropout=0.6,display_step=10,test_step=100) # run
File "/tf_speech/layer/net.py", line 385, in train
loss,_= session.run([self.cost,self.optimizer], feed_dict=feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 766, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 943, in _run
% (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (10, 262144) for Tensor u'data/Placeholder:0', which has shape '(?, 4096, 4096)'
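The two shapes in the ValueError do not describe the same amount of data: 262144 values per sample equal 512 x 512, whereas a (4096, 4096) placeholder expects 16777216 values per sample. A quick NumPy check, as a sketch; the reading that each sample is a flattened square spectrogram is an assumption based only on the numbers in the message:

```python
import numpy as np

# Dummy batch with the shape reported in the error message.
batch = np.zeros((10, 262144), dtype=np.float32)

# Side length if each flattened sample is a square image.
side = int(round(np.sqrt(batch.shape[1])))
assert side * side == batch.shape[1]

# Reshape the flat samples back into 2-D images.
images = batch.reshape(-1, side, side)

# Number of values per sample that a (4096, 4096) placeholder would expect.
expected_by_placeholder = 4096 * 4096
```

Under that assumption, the placeholder would have to be built for 512 x 512 inputs, or the data generated at the size the network was declared with; reshaping alone cannot reconcile the two.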
Broken dependency: "sugartensor" is not in requirements.txt
AND
"tensorflow.examples.tutorials" is not available in the Windows install of TensorFlow and needs to be copied manually from the TensorFlow git repo.
AND
Line 13, "Update:", needs to be commented out in speech2text-seq2seq.py
AND
Traceback (most recent call last):
File "speech2text-seq2seq.py", line 65, in <module>
z = x.sg_conv1d(size=1, dim=num_dim, act='tanh', bn=True)
AttributeError: 'list' object has no attribute 'sg_conv1d'
OS: Windows 10,
TensorFlow: 0.12
Python: 3.5
I have many sound files in WAV format, each about 10 seconds long. Every file contains one of ten sentences, for example "The number you have dialed is power off", "dialed number is not exist", or some other sentence. Is your project suitable for this task?
Hey everyone!
I'm trying to train my model with the speech2text-tflearn code. Unfortunately it takes ages to train (a few days). I have installed TensorFlow with GPU support, but the code is not using any of the GPU's capacity. I have not changed any parameters. What am I getting wrong? Any suggestions?
Thanks!
Cheers
julitosm
Hello, Pannous,
You do great work!!! Really great!!!
I'm a newcomer to speech recognition. I noticed that the project cites some papers from 2012 and 2014. Now it is 2017. If I want to reproduce state-of-the-art work, do you think I should read some more recent papers beyond the ones you use in this project?
Please give me some suggestions. Thanks in advance.
Hi,
I'm trying to run ./number_classifier_tflearn.py
and the following error occurs:
hdf5 is not supported on this machine (please install/reinstall h5py for optimal experience)
Traceback (most recent call last):
File "./number_classifier_tflearn.py", line 3, in <module>
import tflearn
File "/usr/local/lib/python2.7/site-packages/tflearn/__init__.py", line 21, in <module>
from .layers import normalization
File "/usr/local/lib/python2.7/site-packages/tflearn/layers/__init__.py", line 10, in <module>
from .recurrent import lstm, gru, simple_rnn, bidirectional_rnn, \
File "/usr/local/lib/python2.7/site-packages/tflearn/layers/recurrent.py", line 8, in <module>
from tensorflow.contrib.rnn.python.ops.core_rnn import static_rnn as _rnn, \
ImportError: No module named core_rnn
Any suggestions?
OS: OS X Yosemite (Unix)
Could you give an example of a file in /data/speech?
Hi!
What license does the project have? Also, who is meant to be the copyright owner of the code committed to the project by third-party developers?
Has anyone managed to find the requirements.txt file?
Since this project is still in the planning stage, I guess we are more open to new ideas. The README mentions LSTM, but WaveNet yields better results than LSTM according to DeepMind's paper. WaveNet is explained in the following white paper. Do you think it would be too difficult for us to use the WaveNet approach?
https://drive.google.com/file/d/0B3cxcnOkPx9AeWpLVXhkTDJINDQ/view
Thanks.
Hi,
I could not run densenet_layer.py since it throws an import error for the module layer:
Traceback (most recent call last):
File "densenet_layer.py", line 6, in <module>
import layer
ImportError: No module named layer
From my understanding, this refers to layers in tflearn. But the model architecture defined here doesn't work:
net = layer.net(simple_dense, input_shape=(width,height), output_width=classes, learning_rate=0.01)
Thanks
Manishanker
Hi @pannous ,
I'm happy to find an example like yours of audio classification. But I see that you need to update your code, because there are some problems when running it.
Hello, I was investigating your speaker classification example that uses TFLearn. I have a question about the test audio sample that is used to test the model. I may be mistaken, but I believe that this sample is inside the training set, which would not be ideal for testing. Why is this done (or isn't it, if I am wrong)?
Thank you in advance for your help!
Is it possible to train a new language with a hundred words?
ALSA lib confmisc.c:1286:(snd_func_refer) Unable to find definition 'cards.CA0106.pcm.hdmi.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2'
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:4771:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM hdmi
ALSA lib confmisc.c:1286:(snd_func_refer) Unable to find definition 'cards.CA0106.pcm.hdmi.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2'
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:4771:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM hdmi
ALSA lib confmisc.c:1286:(snd_func_refer) Unable to find definition 'cards.CA0106.pcm.modem.0:CARD=0'
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:4771:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline:CARD=0,DEV=0
ALSA lib confmisc.c:1286:(snd_func_refer) Unable to find definition 'cards.CA0106.pcm.modem.0:CARD=0'
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:4771:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline:CARD=0,DEV=0
ALSA lib confmisc.c:1286:(snd_func_refer) Unable to find definition 'cards.CA0106.pcm.modem.0:CARD=0'
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:4771:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM phoneline
ALSA lib confmisc.c:1286:(snd_func_refer) Unable to find definition 'cards.CA0106.pcm.modem.0:CARD=0'
ALSA lib conf.c:4292:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory
ALSA lib conf.c:4771:(snd_config_expand) Evaluate error: No such file or directory
ALSA lib pcm.c:2266:(snd_pcm_open_noupdate) Unknown PCM phoneline
[Errno Input overflowed] -9981
Expression 'ret' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 1735
Expression 'AlsaOpen( &alsaApi->baseHostApiRep, params, streamDir, &self->pcm )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 1902
Expression 'PaAlsaStreamComponent_Initialize( &self->capture, alsaApi, inParams, StreamDirection_In, NULL != callback )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2166
Expression 'PaAlsaStream_Initialize( stream, alsaHostApi, inputParameters, outputParameters, sampleRate, framesPerBuffer, callback, streamFlags, userData )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2835
Traceback (most recent call last):
File "record.py", line 101, in record
dataraw = stream.read(CHUNK)
File "/usr/lib/python3/dist-packages/pyaudio.py", line 605, in read
return pa.read_stream(self._stream, num_frames)
OSError: [Errno Input overflowed] -9981
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "record.py", line 138, in <module>
record()
File "record.py", line 104, in record
stream=get_audio_input_stream()
File "record.py", line 71, in get_audio_input_stream
input_device_index=INDEX)
File "/usr/lib/python3/dist-packages/pyaudio.py", line 747, in open
stream = Stream(self, *args, **kwargs)
File "/usr/lib/python3/dist-packages/pyaudio.py", line 442, in __init__
self._stream = pa.open(**arguments)
OSError: [Errno Device unavailable] -9985
I would like to know how to get this sequence of numbers (2 42 14 66 93 19 46 42 24 43 49 3).
In train_words_index.txt each line contains a word file name followed by a sequence of numbers, like this:
'measurement_Victoria_160.wav.png 2 42 14 66 93 19 46 42 24 43 49 3'. I have looked in many places for a way to create this sequence but could not find one.
Thank you in advance,
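Reading such a line is straightforward even though the thread does not say how the numbers are produced. A minimal sketch, assuming each line is a file name followed by space-separated integers; `parse_index_line` is a hypothetical helper, and the meaning of the integers is not established here:

```python
def parse_index_line(line):
    """Split 'name.png 2 42 14' into ('name.png', [2, 42, 14])."""
    parts = line.split()
    return parts[0], [int(token) for token in parts[1:]]


name, ids = parse_index_line(
    "measurement_Victoria_160.wav.png 2 42 14 66 93 19 46 42 24 43 49 3")
```

Converting the label part to a list of integers this way also avoids passing the whole string to a float conversion.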
I am facing a problem reshaping my tensors. I am running train.py from your source and get the following error:
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 625, in _run
% (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (100, 4096) for Tensor 'Placeholder:0', which has shape '(?, 262144)'
This is my code snippet:
for i in range(6000-1):
    batch_xs, batch_ys = speech.train.next_batch(100)
    # WTF, tensorflow can't do 3D tensor operations?
    # https://github.com/tensorflow/tensorflow/issues/406 =>
    batch_xs=[flatten(matrix) for matrix in batch_xs]
    #batch_ys = np.reshape(batch_ys, (100,4096))
    #batch_xs = np.reshape(batch_xs, (4096,100))
    # you have to reshape to flat/matrix data? why didn't they call it matrixflow?
    feed = {x: batch_xs, y_: batch_ys}
    speech_step.run(feed) # better for encod_entropy too! (later)
    if(i%100==0):
        print("iteration %d"%i)#, end=' ')
        eval(feed)
    if((i+1)%7000==0):
        print("l_rate*=0.1")
        sess.run(tf.assign(l_rate,l_rate*0.1))
print("Train")
Hi, I'm a beginner in ML concepts and I was wondering whether a speech-to-text model can be constructed using this library. I'm clueless as of now about how to do it. I'd love to be able to test one out and learn from it. Thanks,
I have been studying this code recently, but I do not know which script I should run first and which next. Can someone help me, please?
Is there an alternative download link for the data?
I tried the link provided in the source file and in the Caffe implementation, but to no avail:
https://www.dropbox.com/s/eb5zqskvnuj0r78/spoken_words.tar?dl=0
Thanks!
The spoken_words URL is broken:
Downloading from http://pannous.net/files/spoken_words to data/spoken_words
Traceback (most recent call last):
File "speaker_classifier_tflearn.py", line 17, in <module>
urllib.error.HTTPError: HTTP Error 404: Not Found
When will the tensorflow version be implemented?
I am getting an error while running the examples number_classifier_tflearn.py and speaker_classifier_tflearn.py. Details below:
Looking for data spoken_numbers_pcm.tar in data/
Extracting data/spoken_numbers_pcm.tar to data/
Data ready!
loaded batch of 2402 files
loaded batch of 2402 files
loaded batch of 2402 files
loaded batch of 2402 files
loaded batch of 2402 files
Traceback (most recent call last):
File "number_classifier_tflearn.py", line 26, in <module>
model = tflearn.DNN(net)
File "C:\Program Files\Anaconda3\lib\site-packages\tflearn\models\dnn.py", line 57, in __init__
session=session)
File "C:\Program Files\Anaconda3\lib\site-packages\tflearn\helpers\trainer.py", line 125, in __init__
keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours)
File "C:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\training\saver.py", line 1000, in __init__
self.build()
File "C:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\training\saver.py", line 1021, in build
raise ValueError("No variables to save")
ValueError: No variables to save
AND
15 speakers: ['Ralph', 'Albert', 'Vicki', 'Samantha', 'Junior', 'Kathy', 'Fred', 'Princess', 'Steffi', 'Alex', 'Daniel', 'Agnes', 'Victoria', 'Tom', 'Bruce']
speakers ['Ralph', 'Albert', 'Vicki', 'Samantha', 'Junior', 'Kathy', 'Fred', 'Princess', 'Steffi', 'Alex', 'Daniel', 'Agnes', 'Victoria', 'Tom', 'Bruce']
Looking for data spoken_numbers_pcm.tar in data/
Extracting data/spoken_numbers_pcm.tar to data/
Data ready!
15 speakers: ['Ralph', 'Albert', 'Vicki', 'Samantha', 'Junior', 'Kathy', 'Fred', 'Princess', 'Steffi', 'Alex', 'Daniel', 'Agnes', 'Victoria', 'Tom', 'Bruce']
loaded batch of 2402 files
Traceback (most recent call last):
File "speaker_classifier_tflearn.py", line 27, in <module>
model = tflearn.DNN(net)
File "C:\Program Files\Anaconda3\lib\site-packages\tflearn\models\dnn.py", line 57, in __init__
session=session)
File "C:\Program Files\Anaconda3\lib\site-packages\tflearn\helpers\trainer.py", line 125, in __init__
keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours)
File "C:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\training\saver.py", line 1000, in __init__
self.build()
File "C:\Program Files\Anaconda3\lib\site-packages\tensorflow\python\training\saver.py", line 1021, in build
raise ValueError("No variables to save")
ValueError: No variables to save
Thanks..
Hi,
I downloaded speech_data.py and speaker_classifier_tflearn.py.
When I run speaker_classifier_tflearn.py, I get the following errors:
Traceback (most recent call last):
File "speaker_classifier_tflearn.py", line 28, in <module>
model = tflearn.DNN(net)
File "/root/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tflearn/models/dnn.py", line 57, in __init__
session=session)
File "/root/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tflearn/helpers/trainer.py", line 125, in __init__
keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours)
File "/root/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1000, in __init__
self.build()
File "/root/anaconda2/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1021, in build
raise ValueError("No variables to save")
ValueError: No variables to save
Hi all, awesome project! I checked it out but wasn't able to run the classification procedure.
Running python number_classifier_tflearn.py works properly up to some line, then hangs for a long time and captures the mouse while it hangs.
How did you create "Sample spectrogram, Karen uttering 'zero' with 160 words per minute."? How did you create that gray scale spectrogram?
I did the following, as described in this gist:
https://gist.github.com/tiagmoraismorgado/673ca5de5317a1583761a314e7d38ab1
Everything works fine except that the recognized pattern index is not retrieved: TensorFlow records the voice but does not return the recognized pattern index. Looking forward to help.
From speech_encoder.py,
batch_xs, batch_ys = speech.train.next_batch(100)
batch_xs=[flatten(matrix) for matrix in batch_xs]
feed = {x: batch_xs, y_: batch_ys}
The above produces the following error:
ValueError: invalid literal for float(): 2 14 68 6 32 14 73 6 47 14 73 3
What should the placeholder for y_ be?
Thanks!