bill9800 / speech_separation
Include some core functions and model to handle speech separation
License: MIT License
When I run AV_MODER_V2, I get an error: Unknown loss function:loss_func. I can't understand it.
Hi there,
I'm a bit confused. Your diagram shows that you get 75 * 1 * 1 * 1792 face embeddings from FaceNet, but in the original paper they used 1×1×1024 face embeddings, taken from the layer called "avg pool" in FaceNet. In the code (pretrain_load_test.py) it seems like you're using the "avg pool" layer too. So why 1792?
Greetings:)
I think we don't need the "cd" in this function. If I am wrong, please correct me.
@bill9800 I'm getting the following error while evaluating the audio-visual model:
Traceback (most recent call last):
File "AV_model_eval.py", line 92, in <module>
T = utils.fast_istft(F,power=False)
File "/home/lenovo/Downloads/speech_separation/lib/utils.py", line 75, in fast_istft
data = istft(real_imag_shrink(data))
File "/home/lenovo/Downloads/speech_separation/lib/utils.py", line 30, in istft
Total[start:end] = Total[start:end] + data[i, :] * windows
ValueError: operands could not be broadcast together with shapes (257,) (512,)
How can I solve this?
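The numbers in the error point at the likely cause: a 512-point real FFT gives 512 // 2 + 1 = 257 frequency bins, and each frame has to be turned back into 512 time samples with an inverse real FFT before it can be multiplied by a 512-sample window. So data[i, :] probably still holds spectral bins at that point, for example because the array passed to fast_istft does not have the layout it expects. The sketch below assumes a 512-point FFT, a hop of 160 samples and a Hann window; those values and the helper name istft_overlap_add are assumptions for illustration, not taken from lib/utils.py, and window-sum normalization is left out.

```python
import numpy as np


def istft_overlap_add(spec, n_fft=512, hop=160):
    """Overlap-add inverse STFT (hypothetical helper, not the repo's istft).

    spec: complex array of shape (n_frames, n_fft // 2 + 1), e.g. (n, 257).
    """
    n_frames, n_bins = spec.shape
    if n_bins != n_fft // 2 + 1:
        raise ValueError(f"expected {n_fft // 2 + 1} bins, got {n_bins}")
    window = np.hanning(n_fft)                      # 512-sample synthesis window
    out = np.zeros(n_fft + hop * (n_frames - 1))
    for i in range(n_frames):
        frame = np.fft.irfft(spec[i], n=n_fft)      # 257 bins -> 512 samples
        start = i * hop
        out[start:start + n_fft] += frame * window  # (512,) * (512,) broadcasts fine
    return out


rng = np.random.default_rng(0)
spec = np.fft.rfft(rng.standard_normal((5, 512)), axis=1)  # shape (5, 257)

try:
    spec[0] * np.hanning(512)  # 257 bins times a 512-sample window
    broadcast_failed = False
except ValueError:
    broadcast_failed = True    # the same kind of error as in the report

signal = istft_overlap_add(spec)
print(broadcast_failed, signal.shape)  # True (1152,)
```

The output length is n_fft + hop * (n_frames - 1), so 512 + 160 * 4 = 1152 samples for five frames.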
Thanks for sharing the code. Due to the limitations of our devices, I have been working on this model for a long time, but I still don't get good results. I would like to see your results and compare them with my work. Would you mind sharing a pre-trained model? This would help me a lot in continuing my work.
Thanks a lot!
@bill9800 Sir, while evaluating the audio_only model, .wav files are generated in the pred folder, but all of them are silent: there is no voice in any of them. Why is that?
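One common cause of silent output files, offered here as a guess rather than a confirmed diagnosis for this repo, is writing floating-point samples in [-1, 1] as 16-bit PCM without scaling them first: converting 0.5 to an integer gives 0. The stdlib-only sketch below shows the difference; the helper names and the 16 kHz sample rate are assumptions. If the predicted waveform itself is all zeros (for example, because the predicted mask is near zero), scaling will not help.

```python
import array
import io
import sys
import wave


def naive_int16(samples):
    # Plain int() truncates every value in (-1, 1) to 0 -> silent file.
    return array.array("h", [int(s) for s in samples])


def float_to_int16(samples):
    # Clip to [-1, 1], then scale to the 16-bit PCM range.
    return array.array(
        "h", [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    )


def write_wav(fileobj, pcm, sample_rate=16000):
    data = array.array("h", pcm)  # copy so the caller's array is untouched
    if sys.byteorder == "big":
        data.byteswap()  # WAV stores samples little-endian
    with wave.open(fileobj, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(sample_rate)
        w.writeframes(data.tobytes())


samples = [0.5, -0.25, 0.9, -0.7]
print(list(naive_int16(samples)))  # [0, 0, 0, 0]
print(list(float_to_int16(samples)))

buf = io.BytesIO()
write_wav(buf, float_to_int16(samples))
```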
I have trained my own model and have been able to use model_v2/AV_model_eval.py to evaluate it on the test data.
Have you been able to run inference on a video that the system has not seen before?
What steps are needed to prepare a new video for inference?
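A rough outline, assuming 3-second segments (mentioned later in this thread) at 25 fps, which gives the 75 frames seen in the face-embedding shapes, and 16 kHz mono audio: extract the audio and the frames with ffmpeg, cut both into aligned 3-second chunks, then run the same face-embedding and STFT steps used for the training data on each chunk. The helper names are hypothetical, and the sketch only builds the ffmpeg argument lists and the chunk boundaries; it does not run face detection or embedding.

```python
from pathlib import Path

SEGMENT_SEC = 3      # assumed segment length
FPS = 25             # 3 s * 25 fps = 75 frames per segment
SAMPLE_RATE = 16000  # assumed audio sample rate


def ffmpeg_commands(video, out_dir):
    """Build (but do not run) ffmpeg argument lists for audio and frames."""
    video, out_dir = Path(video), Path(out_dir)
    audio_cmd = ["ffmpeg", "-y", "-i", str(video), "-vn", "-ac", "1",
                 "-ar", str(SAMPLE_RATE), str(out_dir / "audio.wav")]
    frames_cmd = ["ffmpeg", "-y", "-i", str(video), "-vf", f"fps={FPS}",
                  str(out_dir / "frames" / "%05d.jpg")]
    return audio_cmd, frames_cmd


def segment_bounds(duration_sec):
    """(frame_start, frame_end, sample_start, sample_end) for each full segment."""
    bounds = []
    for k in range(int(duration_sec // SEGMENT_SEC)):
        f0 = k * SEGMENT_SEC * FPS
        s0 = k * SEGMENT_SEC * SAMPLE_RATE
        bounds.append((f0, f0 + SEGMENT_SEC * FPS, s0, s0 + SEGMENT_SEC * SAMPLE_RATE))
    return bounds


audio_cmd, frames_cmd = ffmpeg_commands("input.mp4", "out")
print(segment_bounds(7.5))  # [(0, 75, 0, 48000), (75, 150, 48000, 96000)]
```

Each command list can be run with subprocess.run(cmd, check=True) once the output and frames directories exist. A trailing partial segment (the last 1.5 s above) is simply dropped in this sketch.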
Hey @bill9800, thanks for sharing this repo. We've been working on some versions of this model since the AVSpeech dataset was published. Do you have any evaluation metrics, or do you mind sharing a pre-trained v2 model? We've been playing with a much narrower model trained on embeddings with 128 features, and it would be great to understand the differences a bit.
Thanks again!
Trained using:
I ran inference on a video that the model has not seen.
The output contains 2 audio files that sound very similar to each other. I cannot distinguish voices between each audio file.
Has anyone got a working model that seems to be doing better than mine?
I have created a pull request addressing the indexing issue (#17). At run time, Keras adds another dimension at axis 0, which becomes the batch axis. Hence the sliced() function slices through the second-to-last dimension, i.e. the embedding, and not people_num.
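A NumPy sketch of the indexing problem described in the PR: once the batch axis is added, a fixed x[:, :, :, index] lands on the embedding axis instead of the speaker axis. The shapes (75 frames, 1, a 1792-dim embedding, 2 speakers) and both function names are illustrative assumptions; indexing the last axis with x[..., index] works with or without a batch axis.

```python
import numpy as np


def sliced_fixed_rank(x, index):
    # Written for an input without a batch axis: (frames, 1, emb, people).
    return x[:, :, :, index]


def sliced_last_axis(x, index):
    # The ellipsis always selects from the last (people) axis.
    return x[..., index]


frames, emb, people = 75, 1792, 2
unbatched = np.zeros((frames, 1, emb, people))
batched = np.zeros((4, frames, 1, emb, people))  # Keras adds axis 0 at run time

print(sliced_fixed_rank(unbatched, 0).shape)  # (75, 1, 1792): people axis, as intended
print(sliced_fixed_rank(batched, 0).shape)    # (4, 75, 1, 2): embedding axis, wrong
print(sliced_last_axis(batched, 0).shape)     # (4, 75, 1, 1792)
```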
The cost function is defined as follows in your code:
from keras import backend as K

def audio_discriminate_loss2(gamma=0.1,beta = 2*0.1,num_speaker=2):
    def loss_func(S_true,S_pred,gamma=gamma,beta=beta,num_speaker=num_speaker):
        sum_mtr = K.zeros_like(S_true[:,:,:,:,0])
        for i in range(num_speaker):
            sum_mtr += K.square(S_true[:,:,:,:,i]-S_pred[:,:,:,:,i])
            for j in range(num_speaker):
                if i != j:
                    sum_mtr -= gamma*(K.square(S_true[:,:,:,:,i]-S_pred[:,:,:,:,j]))
        for i in range(num_speaker):
            for j in range(i+1,num_speaker):
                #sum_mtr -= beta*K.square(S_pred[:,:,:,i]-S_pred[:,:,:,j])
                #sum_mtr += beta*K.square(S_true[:,:,:,:,i]-S_true[:,:,:,:,j])
                pass
        #sum = K.sum(K.maximum(K.flatten(sum_mtr),0))
        loss = K.mean(K.flatten(sum_mtr))
        return loss
    return loss_func
However, I do not understand the meaning of this part:
for j in range(num_speaker):
    if i != j:
        sum_mtr -= gamma*(K.square(S_true[:,:,:,:,i]-S_pred[:,:,:,:,j]))
I guess you want to use the permutation invariant training (PIT) loss, but that is not how PIT is defined. What is the meaning of this part?
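For comparison, a NumPy sketch of both objectives on a single utterance, with speakers on the last axis. discriminative_loss mirrors the loop quoted above (squared error to the matching target minus gamma times squared error to the other speakers' targets), which is a discriminative objective with a fixed speaker order rather than PIT. pit_mse takes the minimum MSE over all speaker permutations, which is how PIT is usually defined. Both function names and the toy shapes are hypothetical.

```python
import itertools

import numpy as np


def discriminative_loss(S_true, S_pred, gamma=0.1):
    # Mirrors the loop in the issue; speakers live on the last axis.
    n = S_true.shape[-1]
    total = np.zeros(S_true.shape[:-1])
    for i in range(n):
        total += (S_true[..., i] - S_pred[..., i]) ** 2
        for j in range(n):
            if i != j:
                total -= gamma * (S_true[..., i] - S_pred[..., j]) ** 2
    return total.mean()


def pit_mse(S_true, S_pred):
    # Utterance-level PIT: best MSE over all speaker orderings of the prediction.
    n = S_true.shape[-1]
    return min(
        np.mean((S_true - S_pred[..., list(perm)]) ** 2)
        for perm in itertools.permutations(range(n))
    )


rng = np.random.default_rng(0)
S = rng.standard_normal((50, 4, 2))  # (time, freq, speakers), toy sizes
swapped = S[..., ::-1]               # correct sources, wrong speaker order

print(pit_mse(S, swapped))                  # 0.0: PIT ignores the ordering
print(discriminative_loss(S, swapped) > 0)  # True: the fixed ordering is penalized
print(discriminative_loss(S, S) < 0)        # True: can go negative when gamma > 0
```

For a batch, PIT would normally take the minimum separately for each example rather than once over the whole batch.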
No function m_link in avhandler.py
@bill9800 Hi, this is really amazing work. Thanks for sharing the code. However, I have some problems with the training loss. I trained for 9 epochs (a dataset of about 30,000 videos, batch size = 2).
My training loss started at about 0.45. After 9 epochs it is about 0.18 and does not decrease any further. Is this normal? What does your training loss look like?
I look forward to your reply!
@bill9800 This is really amazing work! Thanks a lot for sharing the code. However, I run into a problem when I try to continue training from the pretrained model, as follows:
Traceback (most recent call last):
File "/home/yyh/pycharm-2018.3.4/helpers/pydev/pydevd.py", line 1741, in <module>
main()
File "/home/yyh/pycharm-2018.3.4/helpers/pydev/pydevd.py", line 1735, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "/home/yyh/pycharm-2018.3.4/helpers/pydev/pydevd.py", line 1135, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/yyh/pycharm-2018.3.4/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/yyh/xym/speech_separation-master/model/model_v2/AV_train.py", line 74, in <module>
AV_model = load_model(latest_file,custom_objects={"tf": tf})
File "/home/yyh/anaconda3/envs/xym/lib/python3.6/site-packages/keras/engine/saving.py", line 289, in load_model
sample_weight_mode=sample_weight_mode)
File "/home/yyh/anaconda3/envs/xym/lib/python3.6/site-packages/keras/engine/training.py", line 139, in compile
loss_function = losses.get(loss)
File "/home/yyh/anaconda3/envs/xym/lib/python3.6/site-packages/keras/losses.py", line 133, in get
return deserialize(identifier)
File "/home/yyh/anaconda3/envs/xym/lib/python3.6/site-packages/keras/losses.py", line 114, in deserialize
printable_module_name='loss function')
File "/home/yyh/anaconda3/envs/xym/lib/python3.6/site-packages/keras/utils/generic_utils.py", line 165, in deserialize_keras_object
':' + function_name)
ValueError: Unknown loss function:loss_func
My environment: tensorflow-gpu==1.8.0, keras==2.2.2, Python 3.6.
Could you give me some advice?
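Keras saves a loss under the name of the function object, and audio_discriminate_loss2(...) returns an inner function that is always called loss_func, so load_model can only resolve it if custom_objects maps that exact name. A minimal sketch, assuming the repo's loss factory (a stand-in with the same structure is defined here so the block runs on its own) and the same gamma and num_speaker values used in training; build_custom_objects and load_av_model are hypothetical helpers.

```python
def audio_discriminate_loss2(gamma=0.1, num_speaker=2):
    """Stand-in for the repo's loss factory (same structure, body omitted).

    Like the original, it returns an inner function named "loss_func",
    and that name is what Keras records in the saved .h5 file.
    """
    def loss_func(S_true, S_pred):
        raise NotImplementedError("use the repo's real loss body here")
    return loss_func


def build_custom_objects(gamma=0.1, num_speaker=2):
    # Rebuild the loss with the training hyperparameters and register it
    # under the exact name Keras will look up when deserializing.
    loss = audio_discriminate_loss2(gamma=gamma, num_speaker=num_speaker)
    return {loss.__name__: loss}


def load_av_model(path, gamma=0.1, num_speaker=2):
    # Imports are deferred so the helpers above work without Keras installed.
    import tensorflow as tf
    from keras.models import load_model

    custom_objects = {"tf": tf}  # already passed in AV_train.py
    custom_objects.update(build_custom_objects(gamma, num_speaker))
    return load_model(path, custom_objects=custom_objects)


print(sorted(build_custom_objects()))  # ['loss_func']
```

Another option is load_model(path, custom_objects={"tf": tf}, compile=False) followed by compiling the model again with the loss; this skips deserializing the loss entirely, but the saved optimizer state is not restored.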
I am trying to download the datasets using audio_downloader.py, but I get an error in the download function of AVHandler.py which says: "The filename, directory name, or volume label syntax is incorrect. The system cannot find the path specified". Any help would be appreciated.
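Both messages quoted above are Windows errors. If the download helper builds a single shell string that chains a cd with ";" (a "cd" in a function is mentioned in another report on this page), cmd.exe does not treat ";" as a command separator and reads the rest of the string as part of the path. That diagnosis is an assumption, since the helper's code is not shown here. A portable alternative is to pass an argument list plus a working directory to subprocess.run instead of using cd. The youtube-dl options below (-x, --audio-format, -o) are standard; the function names and output template are hypothetical.

```python
import subprocess
from pathlib import Path


def youtube_audio_command(video_id, out_dir, name):
    """Return (args, cwd) for downloading one clip's audio as WAV."""
    url = "https://www.youtube.com/watch?v=" + video_id
    args = ["youtube-dl", "-x", "--audio-format", "wav",
            "-o", name + ".%(ext)s", url]
    return args, Path(out_dir)


def download_audio(video_id, out_dir, name):
    args, cwd = youtube_audio_command(video_id, out_dir, name)
    cwd.mkdir(parents=True, exist_ok=True)
    # No shell and no "cd": the working directory is set for this process only.
    subprocess.run(args, cwd=str(cwd), check=True)


args, cwd = youtube_audio_command("abc123", "audio_train", "clip0")
print(args)
```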
Thanks for your efforts. I have an issue and a couple of questions.
I followed the steps to preprocess the data and then downloaded your pretrained h5 model from Google Drive (https://drive.google.com/file/d/1GfTtnisfnRluUf-V1FQzCWe8_BG5tNYI/view?usp=drivesdk). After that I tried to run the evaluation script of model_v2 (using Python 2.7 and 3.5), but the code produces a segmentation fault when calling the load_model function.
Do you have any suggestions?
What is your python version?
What are your Keras and tensorflow versions?
How many gigabytes of vram required to load the model during the evaluation time?
How long does it take to process (i.e. feed forward) a single 3-second segment on your GPU?
Do you have any sample outputs (wav files that have been generated by your pretrained model) to share with us?
Thank you very much :D
Hello, I have a question:
While evaluating the audio_only model with the AOmodel-2p-001-0.00000.h5 file, .wav files are generated in the pred folder, but all of them are silent: there is no voice in any of them. Why is that?
Hi there,
When I try to predict, I get this error:
Line 41 in speech_separation/model/model_v2/AV_model_eval.py:
face_embs[1, :, :, :, i] = np.load(face_path + single_idxs[i] + "_face_emb.npy")
IndexError: index 1 is out of bounds for axis 0 with size 1
The shape of face_embs is (1,75,1,1972,2).
In my case, i can be 0 or 1.
The shape of np.load(face_path + single_idxs[i] + "_face_emb.npy") is (75,1,1972).
What's wrong here? Do we need to change line 41 from face_embs[1, :, :, :, i] to face_embs[0, :, :, :, i]?
Greetings:)
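Assuming the evaluation script fills one mixture at a time, face_embs has a batch axis of size 1, so the only valid batch index is 0 and the change proposed above is the right one. A NumPy sketch with the shapes from the report; the loaded arrays are stand-ins for the np.load calls, and the embedding size only has to match the saved .npy files.

```python
import numpy as np

num_people, emb_dim = 2, 1972  # shapes as given in the report
face_embs = np.zeros((1, 75, 1, emb_dim, num_people))  # batch axis has size 1

# Stand-ins for np.load(face_path + single_idxs[i] + "_face_emb.npy")
loaded = [np.full((75, 1, emb_dim), float(i + 1)) for i in range(num_people)]

try:
    face_embs[1, :, :, :, 0] = loaded[0]  # batch index 1 does not exist
    index_error = False
except IndexError:
    index_error = True                    # the same IndexError as reported

for i in range(num_people):
    face_embs[0, :, :, :, i] = loaded[i]  # batch index 0, speaker index i

print(index_error, face_embs[0, ..., 1].mean())  # True 2.0
```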
@bill9800 Thanks for doing this amazing project and sharing the code. However, I ran into a problem while training the model with /model_v2/AV_train.py, as follows:
Epoch 1/100
Traceback (most recent call last):
File "AV_train.py", line 107, in <module>
initial_epoch=initial_epoch
File "/home/avicky/env/lib/python3.7/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/home/avicky/env/lib/python3.7/site-packages/keras/engine/training.py", line 1732, in fit_generator
initial_epoch=initial_epoch)
File "/home/avicky/env/lib/python3.7/site-packages/keras/engine/training_generator.py", line 185, in fit_generator
generator_output = next(output_generator)
File "/home/avicky/env/lib/python3.7/site-packages/keras/utils/data_utils.py", line 625, in get
six.reraise(*sys.exc_info())
File "/home/avicky/env/lib/python3.7/site-packages/six.py", line 696, in reraise
raise value
File "/home/avicky/env/lib/python3.7/site-packages/keras/utils/data_utils.py", line 610, in get
inputs = future.get(timeout=30)
File "/usr/lib/python3.7/multiprocessing/pool.py", line 657, in get
raise self._value
File "/usr/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/home/avicky/env/lib/python3.7/site-packages/keras/utils/data_utils.py", line 406, in get_index
return _SHARED_SEQUENCES[uid][i]
File "../lib/MyGenerator.py", line 84, in __getitem__
[X1, X2], y = self.__data_generation(filename_temp)
File "../lib/MyGenerator.py", line 106, in __data_generation
y[i, :, :, :, j] = np.load(self.database_dir_path+'audio/AV_model_database/crm/' + info[j + 1])
File "/home/avicky/env/lib/python3.7/site-packages/numpy/lib/npyio.py", line 428, in load
fid = open(os_fspath(file), "rb")
FileNotFoundError: [Errno 2] No such file or directory: '../../data/audio/AV_model_database/crm/mix_face_emb.npy'
I ran the code on a small amount of data to check for errors. I have attached photos of the data files generated for training. Can you please check whether they were generated correctly, and help me solve the problem above so that I can go ahead? Thank you!!!
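The path in the error ends in crm/mix_face_emb.npy, so info[j + 1] on that line of the dataset list is not a cRM file name. One possible reading, which is an assumption, is that the generator was given a list file written in a different format from the one it parses. A small pre-flight check can catch such mismatches before fit_generator starts. The stdlib sketch below assumes one whitespace-separated entry per line with the cRM file names in columns 1..num_people, mirroring info[j + 1] in the traceback; missing_crm_files and the demo file names are hypothetical.

```python
import tempfile
from pathlib import Path


def missing_crm_files(list_path, database_dir, num_people=2):
    """Return (line_no, path) for each referenced cRM file that does not exist."""
    crm_dir = Path(database_dir) / "audio" / "AV_model_database" / "crm"
    missing = []
    with open(list_path) as f:
        for line_no, line in enumerate(f, start=1):
            info = line.split()
            if not info:
                continue  # skip blank lines
            for j in range(num_people):
                if j + 1 >= len(info):
                    missing.append((line_no, "<column %d absent>" % (j + 1)))
                    continue
                path = crm_dir / info[j + 1]
                if not path.is_file():
                    missing.append((line_no, str(path)))
    return missing


# Demo with a temporary database: line 2 points at a non-cRM file name.
with tempfile.TemporaryDirectory() as tmp:
    crm_dir = Path(tmp) / "audio" / "AV_model_database" / "crm"
    crm_dir.mkdir(parents=True)
    for name in ("a_crm.npy", "b_crm.npy"):
        (crm_dir / name).touch()
    list_file = Path(tmp) / "dataset.txt"
    list_file.write_text("mix0.npy a_crm.npy b_crm.npy\n"
                         "mix1.npy mix_face_emb.npy b_crm.npy\n")
    problems = missing_crm_files(list_file, tmp)

print(problems)
```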
subj
Cannot download YouTube videos or audio.
@bill9800 The output of AV_model_eval.py in the pred folder is nothing but the mixed audio files, but we are supposed to get separated audio for each speaker. Please help with this.
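Getting the mixture back usually means the mask applied to it is close to the identity (1 + 0j) or is not applied at all. Separation with a complex ratio mask is a per-bin complex multiplication of each speaker's mask with the mixture spectrogram, done before the inverse STFT. A NumPy sketch, assuming spectrograms and masks stored with real and imaginary parts in the last axis, shape (frames, bins, 2); apply_crm and the toy sizes are hypothetical, and if the model was trained on a compressed mask it has to be decompressed first (not shown).

```python
import numpy as np


def apply_crm(mix, crm):
    """Complex ratio mask: S = M * Y in every time-frequency bin.

    mix, crm: real arrays of shape (..., 2) holding real and imaginary parts.
    """
    yr, yi = mix[..., 0], mix[..., 1]
    mr, mi = crm[..., 0], crm[..., 1]
    return np.stack([mr * yr - mi * yi, mr * yi + mi * yr], axis=-1)


rng = np.random.default_rng(0)
mix = rng.standard_normal((10, 257, 2))

identity = np.zeros_like(mix)
identity[..., 0] = 1.0  # mask 1 + 0j leaves the mixture unchanged
print(np.allclose(apply_crm(mix, identity), mix))  # True

speaker_mask = rng.standard_normal((10, 257, 2))
est = apply_crm(mix, speaker_mask)
# Same result as multiplying complex-valued arrays directly
ref = (speaker_mask[..., 0] + 1j * speaker_mask[..., 1]) * (mix[..., 0] + 1j * mix[..., 1])
print(np.allclose(est[..., 0] + 1j * est[..., 1], ref))  # True
```

If the saved outputs match the mixture, checking whether the predicted masks are close to 1 + 0j is a quick way to tell a model problem from a post-processing problem.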
If so, could you share that with us?