soundnet's Issues

Error running extract_predictions.lua: CUDNN_STATUS_NOT_SUPPORTED

I am trying to run the original extract_predictions code to get category classifications.

Env: Torch7, CUDA 10.0, cuDNN version 7501

I ran into the following problem.
Linux command: list=/home/kzhang3256/soundnet/forSoundNet/data.txt th extract_predictions.lua

Results:
{
force : 0
write : 0
model : "models/soundnet8_final.t7"
list : "/home/kzhang3256/soundnet/forSoundNet/data.txt"
}
Loading network: models/soundnet8_final.t7
Network:
nn.Sequential {
[input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> output]
(1): cudnn.SpatialConvolution(1 -> 16, 1x64, 1,2, 0,32)
(2): nn.SpatialBatchNormalization (4D) (16)
(3): cudnn.ReLU
(4): cudnn.SpatialMaxPooling(1x8, 1,8)
(5): cudnn.SpatialConvolution(16 -> 32, 1x32, 1,2, 0,16)
(6): nn.SpatialBatchNormalization (4D) (32)
(7): cudnn.ReLU
(8): cudnn.SpatialMaxPooling(1x8, 1,8)
(9): cudnn.SpatialConvolution(32 -> 64, 1x16, 1,2, 0,8)
(10): nn.SpatialBatchNormalization (4D) (64)
(11): cudnn.ReLU
(12): cudnn.SpatialConvolution(64 -> 128, 1x8, 1,2, 0,4)
(13): nn.SpatialBatchNormalization (4D) (128)
(14): cudnn.ReLU
(15): cudnn.SpatialConvolution(128 -> 256, 1x4, 1,2, 0,2)
(16): nn.SpatialBatchNormalization (4D) (256)
(17): cudnn.ReLU
(18): cudnn.SpatialMaxPooling(1x4, 1,4)
(19): cudnn.SpatialConvolution(256 -> 512, 1x4, 1,2, 0,2)
(20): nn.SpatialBatchNormalization (4D) (512)
(21): cudnn.ReLU
(22): cudnn.SpatialConvolution(512 -> 1024, 1x4, 1,2, 0,2)
(23): nn.SpatialBatchNormalization (4D) (1024)
(24): cudnn.ReLU
(25): nn.ConcatTable {
input
|-> (1): cudnn.SpatialConvolution(1024 -> 1000, 1x8, 1,2) -> (2): cudnn.SpatialConvolution(1024 -> 401, 1x8, 1,2)
... -> output
}
(26): nn.MapTable {
cudnn.SpatialSoftMax
}
}
/home/kzhang3256/torch/install/bin/luajit: .../kzhang3256/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:

/home/kzhang3256/torch/install/share/lua/5.1/cudnn/init.lua:58: Error in CuDNN: CUDNN_STATUS_NOT_SUPPORTED (cudnnSetFilterNdDescriptor)
stack traceback:
[C]: in function 'error'
/home/kzhang3256/torch/install/share/lua/5.1/cudnn/init.lua:58: in function 'errcheck'
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:45: in function 'resetWeightDescriptors'
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:358: in function <...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:357>
[C]: in function 'xpcall'
.../kzhang3256/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
...kzhang3256/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
extract_predictions.lua:74: in main chunk
[C]: in function 'dofile'
...3256/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x555d0ebb7610

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
.../kzhang3256/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
...kzhang3256/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
extract_predictions.lua:74: in main chunk
[C]: in function 'dofile'
...3256/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x555d0ebb7610

Does anyone know what this error means: "Error in CuDNN: CUDNN_STATUS_NOT_SUPPORTED (cudnnSetFilterNdDescriptor)"?

Thanks!

CPU model

Hello,
I have downloaded the model, but when I run demo.lua to load it, I get this error:

/home/parallels/torch/install/bin/luajit: /home/parallels/torch/install/share/lua/5.1/torch/File.lua:343: unknown Torch class <torch.CudaTensor>
stack traceback:
	[C]: in function 'error'
	/home/parallels/torch/install/share/lua/5.1/torch/File.lua:343: in function 'readObject'
	/home/parallels/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
	/home/parallels/torch/install/share/lua/5.1/nn/Module.lua:192: in function 'read'
	/home/parallels/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
	/home/parallels/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
	demo.lua:15: in main chunk
	[C]: in function 'dofile'
	...lels/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x00405d50

I think this is because the model was trained on a GPU, but I have to use it in a CPU-only environment. How can I convert the model to CPU, or could you provide a CPU model?
Thanks very much.

Error when trying to evaluate a new dataset: CUDNN_STATUS_BAD_PARAM

Hi, thanks for releasing this code.

I want to evaluate SoundNet with another dataset.
I have created text files for training and testing that contain a column for the full path to the WAV files and another column for their class. All the audio files are the same length.

I have modified the eval_dcase.lua script to read these text files and expect the duration of the files.

When I run it, I get the following error:

/home/jdieza15/torch/install/bin/luajit: /home/jdieza15/torch/install/share/lua/5.1/nn/Container.lua:67: 
In 25 module of nn.Sequential:
In 1 module of nn.ConcatTable:
/home/jdieza15/torch/install/share/lua/5.1/cudnn/init.lua:162: Error in CuDNN: CUDNN_STATUS_BAD_PARAM (cudnnGetConvolutionNdForwardOutputDim)
stack traceback:
	[C]: in function 'error'
	/home/jdieza15/torch/install/share/lua/5.1/cudnn/init.lua:162: in function 'errcheck'
	...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:140: in function 'createIODescriptors'
	...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:188: in function <...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:186>
	[C]: in function 'xpcall'
	/home/jdieza15/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
	.../jdieza15/torch/install/share/lua/5.1/nn/ConcatTable.lua:11: in function <.../jdieza15/torch/install/share/lua/5.1/nn/ConcatTable.lua:9>
	[C]: in function 'xpcall'
	/home/jdieza15/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
	...e/jdieza15/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
	eval_urbansound8k.lua:64: in function 'read_dataset'
	eval_urbansound8k.lua:115: in main chunk
	[C]: in function 'dofile'
	...za15/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x00405d50

WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
	[C]: in function 'error'
	/home/jdieza15/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
	...e/jdieza15/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
	eval_urbansound8k.lua:64: in function 'read_dataset'
	eval_urbansound8k.lua:115: in main chunk
	[C]: in function 'dofile'
	...za15/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
	[C]: at 0x00405d50

This is the code in line 64:
net:forward(snd:view(1,1,-1,1):cuda())

I have never used Lua before, so I do not know how to interpret this error. Is it related to my cuDNN installation, or am I doing something wrong when running the code?

Thanks
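
One possible cause, offered as a guess rather than a confirmed diagnosis: the failing module (module 25) applies a 1x8 convolution without padding, so it needs at least 8 time steps from module 24, and short clips may not provide that many. The sketch below, in Python rather than Lua, walks the kernel, stride, and padding values from the network printout in the first issue on this page and assumes Torch's default floor rounding. The function names and the 88200-sample example (4 seconds at 22050 Hz) are illustrative assumptions, not values taken from this report.

```python
# (kernel, stride, padding) along the time axis for modules 1-24, copied from
# the printed network; batch norm and ReLU do not change the length.
TRUNK = [
    (64, 2, 32), (8, 8, 0),   # conv1, max pool
    (32, 2, 16), (8, 8, 0),   # conv2, max pool
    (16, 2, 8),               # conv3
    (8, 2, 4),                # conv4
    (4, 2, 2), (4, 4, 0),     # conv5, max pool
    (4, 2, 2),                # conv6
    (4, 2, 2),                # conv7
]
HEAD_KERNEL = 8  # module 25: 1x8 convolution, stride 2, no padding


def trunk_length(num_samples):
    """Time steps that reach module 25 for a waveform of num_samples samples."""
    n = num_samples
    for kernel, stride, padding in TRUNK:
        n = (n + 2 * padding - kernel) // stride + 1
        if n < 1:
            return 0  # the signal vanished before reaching the last layer
    return n


def is_long_enough(num_samples):
    """True if module 25 would receive at least one full kernel of input."""
    return trunk_length(num_samples) >= HEAD_KERNEL


# Assumed example: a 4-second clip at 22050 Hz.
print(trunk_length(88200))    # 4
print(is_long_enough(88200))  # False
```

If `is_long_enough` returns False for a clip, padding or looping the waveform before the forward pass is one way to test this hypothesis.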

A question about the output of visual CNN.

Hi, thanks for your nice paper. I have a question: the paper says that the number of frames per video is variable. How do you fuse the CNN outputs from different frames so that the final output has a constant length? Do you just compute the average, or something else? Thank you very much.

size problems for audio classification

Sorry to disturb you. I have some questions about using the pre-trained SoundNet for speech emotion recognition. Could you please give me a hand? Thanks.

Question 1:
wav, sr = torchaudio.load(path) reads the audio samples, which are then preprocessed with wav.unsqueeze(1).unsqueeze(-1).repeat(1,1,8,1).
What are the requirements for the audio sample rate? Must the sample rate be 22050? Are there any other restrictions?

Question 2:
The last layer, nn.Conv2d(1024, 401, kernel_size=(8, 1), stride=(2, 1)), is used to extract speech features.
The feature size varies with the length of the audio. What does it depend on? I want to use the features for audio classification. How do I get a constant-dimension feature vector for all of my audio files?
As mentioned in the "Size of feature" issue, an audio file with 1476864 samples produces a feature of dimension [1x1024x46x1], and another file with 2199168 samples produces a feature of dimension [1x1024x68x1]. In [1x1024x46x1], the first 1 is the batch size and 1024 is the number of output channels. What does 46 represent? What does the last dimension, 1, represent?

Question 3:
How do I get a constant-dimension feature vector for both files? Finally, when I try to classify, what do I need to do with the output features (1, 401, feature, 1) so that I can use them in the final classification task? Is it better to flatten them to (batch, channel_out * 1, feature), to average over the channels, or to use some other method?

PS
I am new to audio and deep learning, so sorry for asking basic questions.
Thanks
best
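
One common way to obtain a fixed-size vector from a variable-length feature map is to pool over the time axis. This is a generic technique, not something the SoundNet authors are confirmed to use. A minimal Python sketch, assuming the feature has already been squeezed to a nested list of shape [channels][time] (the function names are hypothetical):

```python
def mean_pool_time(feature):
    """Average each channel over time: [channels][time] -> [channels]."""
    return [sum(channel) / len(channel) for channel in feature]


def max_pool_time(feature):
    """Keep each channel's largest activation: [channels][time] -> [channels]."""
    return [max(channel) for channel in feature]


# Two toy feature maps with 2 channels but different numbers of time steps.
short_clip = [[1.0, 3.0], [2.0, 4.0]]
long_clip = [[1.0, 2.0, 3.0], [0.0, 0.0, 6.0]]

print(mean_pool_time(short_clip))  # [2.0, 3.0]
print(mean_pool_time(long_clip))   # [2.0, 2.0]
print(max_pool_time(long_clip))    # [3.0, 6.0]
```

Both clips map to a vector with one entry per channel, so a 1024-channel feature becomes a 1024-dimensional vector regardless of clip length.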

Size of feature

@cvondrick @yusufaytar
Thank you very much for sharing this code.

I am new to audio. I was trying to extract features from my audio files. The feature size varies with the length of the audio. What does it depend on? I want to use the features for classification. How do I get a constant-dimension feature vector for all of my audio files?
For example, an audio file with 1476864 samples produces a feature of dimension [1x1024x46x1], and another file with 2199168 samples produces a feature of dimension [1x1024x68x1]. How do I get a constant-dimension feature vector for both files?

How do I have to modify the sound signal to apply the net in a sliding-window fashion in the temporal direction?
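
The feature length can be reproduced from the layer printout in the first issue on this page. The sketch below (Python, with hypothetical names, assuming floor rounding in every convolution and pooling layer) applies the usual output-length formula to modules 1 to 24 and recovers the 46 and 68 time steps quoted above. The strides multiply to 32768, so each output step advances by roughly 32768 input samples.

```python
from math import prod

# (name, kernel, stride, padding) along the time axis for modules 1-24,
# transcribed from the printed nn.Sequential. Batch norm and ReLU layers
# leave the length unchanged and are omitted.
LAYERS = [
    ("conv1", 64, 2, 32), ("pool1", 8, 8, 0),
    ("conv2", 32, 2, 16), ("pool2", 8, 8, 0),
    ("conv3", 16, 2, 8),
    ("conv4", 8, 2, 4),
    ("conv5", 4, 2, 2), ("pool5", 4, 4, 0),
    ("conv6", 4, 2, 2),
    ("conv7", 4, 2, 2),
]


def output_length(n, kernel, stride, padding):
    """Floor-mode output length of one convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1


def feature_length(num_samples):
    """Time steps after module 24; assumes the input is long enough for every layer."""
    n = num_samples
    for _name, kernel, stride, padding in LAYERS:
        n = output_length(n, kernel, stride, padding)
    return n


# Product of all strides: the hop, in input samples, between output steps.
TOTAL_STRIDE = prod(stride for _name, _kernel, stride, _padding in LAYERS)

print(feature_length(1476864))  # 46
print(feature_length(2199168))  # 68
print(TOTAL_STRIDE)             # 32768
```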

label_text_file = '/data/vision/torralba/crossmodal/soundnet/lmdbs/train_frames4_%04d.txt'

Hello, thank you very much for sharing the code and the dataset.
I am new to audio-visual learning.
I want to reproduce the results from the source code.
While reading the code, I found that I do not have the file referred to as "label_text_file" in main_train.lua.
If I want to train the model myself, I need the raw mp3s (359GB) and the image features (88GB), but I don't know how to get the label_text_file.
If anyone knows, or has re-trained the model themselves, please give me some advice or share your experience.
Thank you very much.

Tips on crawling images from video

Hi, I'd like to ask for some tips that are generally applicable to deep learning on video and images; so far I've only worked on music-related tasks. Some (if not all) of these might seem dumb ;)

  • What would be a good image format for extracted video frames, jpeg or png?
  • What was the image sampling rate in this work, i.e. how many images per second did you sample?
  • Any other tips or hacks would be appreciated.

Thanks!

Errors when running finetune

Hello @cvondrick ,

It seems that audio_simple, the dataset variable defined in main_finetune.lua, is not a valid dataset for data.lua to load the input audio. It shows the following error:

/home/yclin/distro/install/bin/luajit: /home/yclin/Workspace/soundnet/data/data.lua:24: Unknown dataset: audio_simple

I also tried replacing audio_simple with donkey_audio and donkey_audio_labeled, but neither of them works.

Could you please have a look at the finetune section of the README?

Question: Steps to get category labels

Hi, thanks for making this great implementation!

I tried to extract features from a sound by using the pretrained models like:

sky: 43.56%
stage, indoor: 5.46%
amusement park: 5.24%

spotlight: 16.74%
fountain: 12.33%
traffic light: 5.76%

I want to get the category label for each prediction, but I don't understand how to convert them from the HDF5 format.
Could you please explain, step by step, how to get the category labels?

Any help will be appreciated.
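
Once the scores are out of the HDF5 file, mapping them to names is a sort. The sketch below is in Python and assumes two things not stated in this issue: that the scores for one clip have been loaded into a plain list (for example with h5py) and averaged over time, and that a list of category names in the same order as the network outputs is available. The function name and the toy numbers are hypothetical.

```python
def top_categories(probabilities, labels, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    if len(probabilities) != len(labels):
        raise ValueError("need exactly one label per probability")
    ranked = sorted(zip(labels, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy example with three made-up scores; a real output head has one score
# per category (1000 or 401 in the printed network).
scores = [0.05, 0.60, 0.35]
names = ["sky", "fountain", "spotlight"]

for label, probability in top_categories(scores, names, k=2):
    print(f"{label}: {probability:.2%}")
```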

Not working

munender@cseproj149:~/code_space/soundnet$ list=data.txt th extract_feat.lua
/users/gpu/munender/src/torch/install/bin/lua: ...ender/src/torch/install/share/lua/5.1/trepl/init.lua:389: ...unender/src/torch/install/share/lua/5.1/hdf5/ffi.lua:56: expected align(#) on line 687
stack traceback:
[C]: in function 'error'
...ender/src/torch/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
extract_feat.lua:4: in main chunk
[C]: in function 'dofile'
.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: ?

All features negative

Hi,
I used the trained model to extract features from mp3s, but all of the 1000-dimensional features are negative. Is this normal?

Unzipping the frames folder!

Dear @cvondrick ,
Thank you very much for making this code and the dataset publicly available.
I have had a strange experience with unzipping the frames folder. It has been 2 days and it is still unzipping the frames.
I simply used tar -xvzf frames_public.tar.gz

Have you had similar experience with that? Or is the problem on my side?

Thank you very much.

What's the place CNN exactly?

Hi, I couldn't find what pre-trained network you used for the place CNN from the paper or the website. Where does it come from?
