speech_separation's People

Contributors

bill9800, njerschow

speech_separation's Issues

About test loss

When I run the model_v2 AV model, I get an error: Unknown loss function:loss_func. I can't understand it.

Face embeddings

Hi there,
I'm a bit confused. Your diagram shows that you get 75 × 1 × 1 × 1792 face embeddings from FaceNet, but the original paper used 1 × 1 × 1024 face embeddings. They used the layer called "avg pool" in FaceNet. In the code (pretrain_load_test.py), it seems you are also using the "avg pool" layer. So why 1792?
Greetings:)

Operands error

@bill9800 I'm getting the following error while evaluating the audio-visual model:
Traceback (most recent call last):
  File "AV_model_eval.py", line 92, in <module>
    T = utils.fast_istft(F,power=False)
  File "/home/lenovo/Downloads/speech_separation/lib/utils.py", line 75, in fast_istft
    data = istft(real_imag_shrink(data))
  File "/home/lenovo/Downloads/speech_separation/lib/utils.py", line 30, in istft
    Total[start:end] = Total[start:end] + data[i, :] * windows
ValueError: operands could not be broadcast together with shapes (257,) (512,)

How to solve this?
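The shapes in the error, (257,) and (512,), are what one gets when a 512-sample window is multiplied by a one-sided spectrum frame (a 512-point real FFT keeps only 512 // 2 + 1 = 257 bins), which suggests the frames reaching istft were never transformed back into 512 time-domain samples. As a minimal sketch, not the repository's lib/utils.py, here is an STFT/ISTFT pair that stays consistent by using np.fft.rfft and np.fft.irfft with n=512. The FFT size of 512, the hop of 160 samples and the Hann window are assumptions, and stft_onesided / istft_onesided are hypothetical names.

```python
import numpy as np

N_FFT = 512  # assumed FFT size; one-sided spectra then have N_FFT // 2 + 1 = 257 bins
HOP = 160    # assumed hop length in samples (10 ms at 16 kHz)


def stft_onesided(signal, n_fft=N_FFT, hop=HOP):
    """Return a (num_frames, n_fft // 2 + 1) complex spectrogram."""
    window = np.hanning(n_fft)
    num_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(num_frames)])
    return np.fft.rfft(frames, n=n_fft, axis=1)


def istft_onesided(spec, n_fft=N_FFT, hop=HOP):
    """Invert a one-sided spectrogram by windowed overlap-add."""
    window = np.hanning(n_fft)
    # irfft with n=n_fft turns each 257-bin frame back into 512 samples,
    # so every frame now has the same shape as the window.
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    length = n_fft + hop * (spec.shape[0] - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        start = i * hop
        out[start:start + n_fft] += frame * window
        norm[start:start + n_fft] += window ** 2
    # Divide by the summed squared window wherever it is not negligible.
    nonzero = norm > 1e-8
    out[nonzero] /= norm[nonzero]
    return out
```

Windowed overlap-add followed by division by the summed squared window recovers the signal wherever frames overlap; the very first and last samples, where the Hann window is close to zero, are not reliable.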

Question about pre-trained AV Model

Thanks for sharing the code. Due to the limitations of our devices, I have been working on this model for a long time, but I still don't get good results. I would like to see your results and compare them with my work. Would you mind sharing a pre-trained model? It would help me a lot to continue my work.
Thanks a lot!

How would you infer on a downloaded video?

I have trained my own model and have been able to use the model_v2/AV_model_eval.py to test against testing data.

Have you been able to infer on a video that the system has not seen before?

What steps are needed to process a new video so that it is ready for inference?
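Roughly, a new video has to be cut into the same fixed-length pieces the model was trained on. The 75-frame face embeddings mentioned in these issues correspond to 3 seconds at 25 fps, the setup of the original paper; the 16 kHz audio rate and the helper below (split_into_clips, a hypothetical name) are assumptions, not code from this repository. Decoding the video (for example with ffmpeg), computing spectrograms and extracting face embeddings are not shown.

```python
SAMPLE_RATE = 16000  # assumed audio sample rate (Hz)
FPS = 25             # assumed video frame rate
CLIP_SECONDS = 3     # 3 s at 25 fps gives the 75 frames the model expects


def split_into_clips(audio, frames, sample_rate=SAMPLE_RATE, fps=FPS,
                     clip_seconds=CLIP_SECONDS):
    """Cut time-aligned (audio, frames) pairs of clip_seconds each.

    audio:  1-D array of audio samples.
    frames: array whose first axis is time (one entry per video frame).
    A trailing partial clip is dropped, and the number of clips is limited
    by whichever stream is shorter.
    """
    samples_per_clip = sample_rate * clip_seconds
    frames_per_clip = fps * clip_seconds
    n_clips = min(len(audio) // samples_per_clip, len(frames) // frames_per_clip)
    clips = []
    for k in range(n_clips):
        a = audio[k * samples_per_clip:(k + 1) * samples_per_clip]
        v = frames[k * frames_per_clip:(k + 1) * frames_per_clip]
        clips.append((a, v))
    return clips
```

Each audio clip would then go through the same STFT, and each stack of frames through the same face-embedding step, as the training data before being fed to the model.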

Evaluation or Pre-Trained AV Model?

Hey @bill9800, thanks for sharing this repo. We've been working on some versions of this model since the AVSpeech dataset was published. Do you have any evaluation metrics, or do you mind sharing a pre-trained v2 model? We've been playing with a much narrower model trained on embeddings with 128 features, and it would be great to understand the differences a bit.

Thanks again!

No improvement of model after 5 epochs

Trained using:

  • 2 GPUS
  • batch size=2
  • number audio/video files=50k
  • epochs=5

I ran inference on a video that the model had not seen.

The output contains 2 audio files that sound very similar to each other; I cannot tell the voices apart between the two files.

Has anyone got a working model that seems to be doing better than mine?

Indexing issue in model/lib/model_AV_new.py

I have created a pull request addressing the indexing issue (#17). At run time, Keras adds another dimension at axis 0, which becomes the batch axis. Hence the sliced() function slices along the second-to-last dimension, i.e. the embedding, and not along people_num.
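The point of the pull request can be illustrated with NumPy. Assuming per-sample face embeddings shaped (75, 1, 1792, people_num) (the 1792 comes from the diagram discussed in the face-embeddings issue), a slice written as x[:, :, :, index] selects people_num only while there is no batch axis; once Keras prepends one, the same slice lands on the embedding axis. Both functions are hypothetical stand-ins; the actual body of sliced() is not shown here.

```python
import numpy as np

BATCH, FRAMES, EMB, PEOPLE = 2, 75, 1792, 2

# Inside a Keras model every tensor gains a leading batch axis:
# (batch, 75, 1, 1792, people_num) instead of (75, 1, 1792, people_num).
x = np.random.default_rng(0).standard_normal((BATCH, FRAMES, 1, EMB, PEOPLE))


def sliced_wrong(x, index):
    # Written for the per-sample layout, where axis 3 is people_num.
    # With the batch axis prepended, axis 3 is the embedding axis instead.
    return x[:, :, :, index]


def sliced_right(x, index):
    # The Ellipsis always indexes the last axis (people_num),
    # whether or not a batch axis is present.
    return x[..., index]
```

Indexing from the end with an Ellipsis keeps the slice correct in both layouts, which avoids depending on whether the batch axis has been added.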

Question about the loss function

The cost function is defined as follows in your code:
from keras import backend as K

def audio_discriminate_loss2(gamma=0.1,beta = 2*0.1,num_speaker=2):
    def loss_func(S_true,S_pred,gamma=gamma,beta=beta,num_speaker=num_speaker):
        sum_mtr = K.zeros_like(S_true[:,:,:,:,0])
        for i in range(num_speaker):
            sum_mtr += K.square(S_true[:,:,:,:,i]-S_pred[:,:,:,:,i])
            for j in range(num_speaker):
                if i != j:
                    sum_mtr -= gamma*(K.square(S_true[:,:,:,:,i]-S_pred[:,:,:,:,j]))

        for i in range(num_speaker):
            for j in range(i+1,num_speaker):
                #sum_mtr -= beta*K.square(S_pred[:,:,:,i]-S_pred[:,:,:,j])
                #sum_mtr += beta*K.square(S_true[:,:,:,:,i]-S_true[:,:,:,:,j])
                pass
        #sum = K.sum(K.maximum(K.flatten(sum_mtr),0))

        loss = K.mean(K.flatten(sum_mtr))
        return loss
    return loss_func
However, I do not understand the meaning of this part:
for j in range(num_speaker):
    if i != j:
        sum_mtr -= gamma*(K.square(S_true[:,:,:,:,i]-S_pred[:,:,:,:,j]))
I guess you want to use a permutation invariant training (PIT) loss, but that is not how PIT is defined. What does this part mean?
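The gamma term is indeed not permutation invariant training. PIT evaluates the error under every assignment of outputs to speakers and keeps the smallest; the quoted loss instead keeps output i paired with speaker i and subtracts gamma times the error against every other speaker's target, which rewards each output for differing from the other speakers' targets (a discriminative penalty). The NumPy sketch below transcribes the active part of loss_func (the beta terms are commented out in the quoted code) next to a simple utterance-level PIT loss for comparison. It assumes the speaker axis is last, as in the quoted code; the function names are hypothetical and the PIT function is an illustration, not part of the repository.

```python
import itertools

import numpy as np


def discriminative_loss(s_true, s_pred, gamma=0.1):
    """NumPy transcription of the gamma part of loss_func (fixed assignment)."""
    num_speaker = s_true.shape[-1]
    total = np.zeros_like(s_true[..., 0])
    for i in range(num_speaker):
        # Error against the speaker this output is assigned to.
        total += np.square(s_true[..., i] - s_pred[..., i])
        for j in range(num_speaker):
            if i != j:
                # Penalty shrinks the loss when output j is far from target i.
                total -= gamma * np.square(s_true[..., i] - s_pred[..., j])
    return total.mean()


def pit_loss(s_true, s_pred):
    """Utterance-level PIT: per-sample MSE under the best speaker permutation."""
    num_speaker = s_true.shape[-1]
    per_perm = []
    for perm in itertools.permutations(range(num_speaker)):
        err = np.square(s_true - s_pred[..., list(perm)])
        per_perm.append(err.reshape(err.shape[0], -1).mean(axis=1))
    return np.min(np.stack(per_perm), axis=0).mean()
```

With the speaker order swapped, the PIT loss stays at zero while the discriminative loss becomes positive, so the two objectives genuinely differ; note also that the discriminative loss can go negative. In an audio-visual model each output is tied to an input face, which may be why a fixed assignment was used; that is an interpretation, not something the quoted code states.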

Question about the training loss

@bill9800 Hi, this is really amazing work. Thanks for sharing the code. However, I have some problems with the training loss. I trained for 9 epochs (a dataset of about 30,000 videos, batch size = 2).
My initial training loss was about 0.45. After 9 epochs it is about 0.18 and no longer decreases. Is this normal? What did your training loss look like?
I look forward to your reply!

Problems continuing training from the pretrained model

@bill9800 This is really amazing work! Thanks a lot for sharing the code. However, I run into a problem when I try to continue training from the pretrained model, as follows:
Traceback (most recent call last):
  File "/home/yyh/pycharm-2018.3.4/helpers/pydev/pydevd.py", line 1741, in <module>
    main()
  File "/home/yyh/pycharm-2018.3.4/helpers/pydev/pydevd.py", line 1735, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/home/yyh/pycharm-2018.3.4/helpers/pydev/pydevd.py", line 1135, in run
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "/home/yyh/pycharm-2018.3.4/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/yyh/xym/speech_separation-master/model/model_v2/AV_train.py", line 74, in <module>
    AV_model = load_model(latest_file,custom_objects={"tf": tf})
  File "/home/yyh/anaconda3/envs/xym/lib/python3.6/site-packages/keras/engine/saving.py", line 289, in load_model
    sample_weight_mode=sample_weight_mode)
  File "/home/yyh/anaconda3/envs/xym/lib/python3.6/site-packages/keras/engine/training.py", line 139, in compile
    loss_function = losses.get(loss)
  File "/home/yyh/anaconda3/envs/xym/lib/python3.6/site-packages/keras/losses.py", line 133, in get
    return deserialize(identifier)
  File "/home/yyh/anaconda3/envs/xym/lib/python3.6/site-packages/keras/losses.py", line 114, in deserialize
    printable_module_name='loss function')
  File "/home/yyh/anaconda3/envs/xym/lib/python3.6/site-packages/keras/utils/generic_utils.py", line 165, in deserialize_keras_object
    ':' + function_name)
ValueError: Unknown loss function:loss_func

My environment: tensorflow-gpu==1.8.0, keras==2.2.2, python 3.6.
Could you give me some advice?

Cannot load the pretrained model

Thanks for your efforts. I have an issue and a couple of questions.
I've followed the steps to preprocess the data, then downloaded your pretrained h5 model from Google Drive (https://drive.google.com/file/d/1GfTtnisfnRluUf-V1FQzCWe8_BG5tNYI/view?usp=drivesdk). After that I tried to run the evaluation script of model_v2 (using Python 2.7 and 3.5), but the code produces a segmentation fault when calling the load_model function.
Do you have any suggestions?
What is your python version?
What are your Keras and tensorflow versions?
How many gigabytes of VRAM are required to load the model at evaluation time?
How long does it take to process (i.e., feed forward) a single 3-second segment on your GPU?
Do you have any sample outputs (wav files that have been generated by your pretrained model) to share with us?

Thank you very much :D

Pre-trained_model_v1() not working?

Hello, I have a question.
While evaluating the audio-only model with the AOmodel-2p-001-0.00000.h5 file, the .wav files generated in the pred folder are all silent; there is no voice in them. Why is that?

IndexError in AV_model_eval.py at parse_X_data()

Hi there,
When I try to predict, I get this error:

line 41 in speech_separation/model/model_v2/AV_model_eval.py:

face_embs[1, :, :, :, i] = np.load(face_path + single_idxs[i] + "_face_emb.npy")

IndexError: index 1 is out of bounds for axis 0 with size 1

The shape of face_embs is (1,75,1,1972,2).
In my case, i can be 0 or 1.
The shape of np.load(face_path + single_idxs[i] + "_face_emb.npy") is (75,1,1972).

What's wrong here? Do we need to change line 41 from face_embs[1, :, :, :, i] to face_embs[0, :, :, :, i]?

Greetings:)
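When a single sample is evaluated, face_embs has size 1 along axis 0, so 0 is the only valid batch index; writing into index 1 raises exactly this IndexError. A minimal sketch of the fill loop with the batch index fixed at 0, in which the list of arrays stands in for the np.load(...) calls and build_face_embs is a hypothetical name. The embedding size is read from the loaded arrays rather than hard-coded.

```python
import numpy as np


def build_face_embs(per_speaker_embs):
    """Stack per-speaker arrays of shape (75, 1, D) into (1, 75, 1, D, num_speakers)."""
    frames, one, dim = per_speaker_embs[0].shape
    face_embs = np.zeros((1, frames, one, dim, len(per_speaker_embs)))
    for i, emb in enumerate(per_speaker_embs):
        # Batch index 0: the batch axis has size 1 for a single sample.
        face_embs[0, :, :, :, i] = emb
    return face_embs
```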

Problem while trying to run AV_train.py file

@bill9800 Thanks for this amazing project and for sharing the code. However, I ran into a problem while training the model (/model_v2/AV_train.py), as follows:

Epoch 1/100
Traceback (most recent call last):
  File "AV_train.py", line 107, in <module>
    initial_epoch=initial_epoch
  File "/home/avicky/env/lib/python3.7/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/avicky/env/lib/python3.7/site-packages/keras/engine/training.py", line 1732, in fit_generator
    initial_epoch=initial_epoch)
  File "/home/avicky/env/lib/python3.7/site-packages/keras/engine/training_generator.py", line 185, in fit_generator
    generator_output = next(output_generator)
  File "/home/avicky/env/lib/python3.7/site-packages/keras/utils/data_utils.py", line 625, in get
    six.reraise(*sys.exc_info())
  File "/home/avicky/env/lib/python3.7/site-packages/six.py", line 696, in reraise
    raise value
  File "/home/avicky/env/lib/python3.7/site-packages/keras/utils/data_utils.py", line 610, in get
    inputs = future.get(timeout=30)
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
  File "/usr/lib/python3.7/multiprocessing/pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "/home/avicky/env/lib/python3.7/site-packages/keras/utils/data_utils.py", line 406, in get_index
    return _SHARED_SEQUENCES[uid][i]
  File "../lib/MyGenerator.py", line 84, in __getitem__
    [X1, X2], y = self.__data_generation(filename_temp)
  File "../lib/MyGenerator.py", line 106, in __data_generation
    y[i, :, :, :, j] = np.load(self.database_dir_path+'audio/AV_model_database/crm/' + info[j + 1])
  File "/home/avicky/env/lib/python3.7/site-packages/numpy/lib/npyio.py", line 428, in load
    fid = open(os_fspath(file), "rb")
FileNotFoundError: [Errno 2] No such file or directory: '../../data/audio/AV_model_database/crm/mix_face_emb.npy'

I have run the code on a small amount of data to check for errors. I have attached photos of the data files generated for training. Could you please check whether they were generated correctly, and help me solve the problem above so that I can go ahead? Thank you!

[Attached: four screenshots of the generated training data files, dated 2020-03-15.]
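The traceback shows the generator building each CRM path as database_dir_path + 'audio/AV_model_database/crm/' + info[j + 1], and the name it ended up with, mix_face_emb.npy, does not look like a CRM file. That suggests the lines in the training list are not in the layout the generator expects. A small check like the hypothetical find_missing_crm below can be run over the list before training. It assumes space-separated lines with the CRM filenames at positions 1 to num_speaker, which is only inferred from the info[j + 1] lookup in the traceback.

```python
import os


def find_missing_crm(list_path, crm_dir, num_speaker=2):
    """Return (line_number, filename) pairs whose CRM file does not exist.

    Assumes each line is space-separated with the CRM filenames at positions
    1..num_speaker, mirroring the info[j + 1] lookup in the traceback.
    A line with too few fields is reported with filename None.
    """
    missing = []
    with open(list_path) as f:
        for line_no, line in enumerate(f, start=1):
            info = line.strip().split()
            if not info:
                continue  # skip blank lines
            for j in range(num_speaker):
                if j + 1 >= len(info):
                    missing.append((line_no, None))
                    break
                name = info[j + 1]
                if not os.path.isfile(os.path.join(crm_dir, name)):
                    missing.append((line_no, name))
    return missing
```

An empty result means every CRM path the generator would build actually exists; any reported entry points at the list line that needs to be regenerated or fixed.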
