cvondrick / soundnet
SoundNet: Learning Sound Representations from Unlabeled Video. NIPS 2016
Home Page: http://projects.csail.mit.edu/soundnet/
License: MIT License
I am trying to run the original extract_predictions code to get the category classification.
Env: Torch7, CUDA 10.0, cuDNN 7501
I ran into the following problem:
Linux command: list=/home/kzhang3256/soundnet/forSoundNet/data.txt th extract_predictions.lua
Results:
{
force : 0
write : 0
model : "models/soundnet8_final.t7"
list : "/home/kzhang3256/soundnet/forSoundNet/data.txt"
}
Loading network: models/soundnet8_final.t7
Network:
nn.Sequential {
[input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> output]
(1): cudnn.SpatialConvolution(1 -> 16, 1x64, 1,2, 0,32)
(2): nn.SpatialBatchNormalization (4D) (16)
(3): cudnn.ReLU
(4): cudnn.SpatialMaxPooling(1x8, 1,8)
(5): cudnn.SpatialConvolution(16 -> 32, 1x32, 1,2, 0,16)
(6): nn.SpatialBatchNormalization (4D) (32)
(7): cudnn.ReLU
(8): cudnn.SpatialMaxPooling(1x8, 1,8)
(9): cudnn.SpatialConvolution(32 -> 64, 1x16, 1,2, 0,8)
(10): nn.SpatialBatchNormalization (4D) (64)
(11): cudnn.ReLU
(12): cudnn.SpatialConvolution(64 -> 128, 1x8, 1,2, 0,4)
(13): nn.SpatialBatchNormalization (4D) (128)
(14): cudnn.ReLU
(15): cudnn.SpatialConvolution(128 -> 256, 1x4, 1,2, 0,2)
(16): nn.SpatialBatchNormalization (4D) (256)
(17): cudnn.ReLU
(18): cudnn.SpatialMaxPooling(1x4, 1,4)
(19): cudnn.SpatialConvolution(256 -> 512, 1x4, 1,2, 0,2)
(20): nn.SpatialBatchNormalization (4D) (512)
(21): cudnn.ReLU
(22): cudnn.SpatialConvolution(512 -> 1024, 1x4, 1,2, 0,2)
(23): nn.SpatialBatchNormalization (4D) (1024)
(24): cudnn.ReLU
(25): nn.ConcatTable {
input
|-> (1): cudnn.SpatialConvolution(1024 -> 1000, 1x8, 1,2)
|-> (2): cudnn.SpatialConvolution(1024 -> 401, 1x8, 1,2)
... -> output
}
(26): nn.MapTable {
cudnn.SpatialSoftMax
}
}
/home/kzhang3256/torch/install/bin/luajit: .../kzhang3256/torch/install/share/lua/5.1/nn/Container.lua:67:
In 1 module of nn.Sequential:
/home/kzhang3256/torch/install/share/lua/5.1/cudnn/init.lua:58: Error in CuDNN: CUDNN_STATUS_NOT_SUPPORTED (cudnnSetFilterNdDescriptor)
stack traceback:
[C]: in function 'error'
/home/kzhang3256/torch/install/share/lua/5.1/cudnn/init.lua:58: in function 'errcheck'
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:45: in function 'resetWeightDescriptors'
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:358: in function <...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:357>
[C]: in function 'xpcall'
.../kzhang3256/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
...kzhang3256/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
extract_predictions.lua:74: in main chunk
[C]: in function 'dofile'
...3256/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x555d0ebb7610
WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
.../kzhang3256/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
...kzhang3256/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
extract_predictions.lua:74: in main chunk
[C]: in function 'dofile'
...3256/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x555d0ebb7610
Does anyone know what this error means: "Error in CuDNN: CUDNN_STATUS_NOT_SUPPORTED (cudnnSetFilterNdDescriptor)"?
Thanks!
Dear all,
I have downloaded the model, but when I run demo.lua to load it, I get this error:
/home/parallels/torch/install/bin/luajit: /home/parallels/torch/install/share/lua/5.1/torch/File.lua:343: unknown Torch class <torch.CudaTensor>
stack traceback:
[C]: in function 'error'
/home/parallels/torch/install/share/lua/5.1/torch/File.lua:343: in function 'readObject'
/home/parallels/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/home/parallels/torch/install/share/lua/5.1/nn/Module.lua:192: in function 'read'
/home/parallels/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
/home/parallels/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
demo.lua:15: in main chunk
[C]: in function 'dofile'
...lels/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50
I think this is because the model was trained on a GPU, but I must use it in a CPU-only environment. How can I convert the model to CPU, or could you provide a CPU model?
Thanks very much.
Hi, on the SoundNet website I can see the MP3s are available to download, but I do not see their corresponding class probabilities.
Are they not available yet? Otherwise could you point me where I can find them?
Thanks
Right now I am simply working inside a virtual machine, so I have no GPU. I just want to use the pretrained model to get some results. Can I simply use the CPU instead of the GPU? If so, what should I do?
Hi, thanks for releasing this code.
I want to evaluate SoundNet with another dataset.
I have created text files for training and testing that contain a column for the full path to the WAV files and another column for their class. All the audio files are the same length.
I have modified the eval_dcase.lua script to read these text files and to expect the duration of my files.
When I run it, I get the following error:
/home/jdieza15/torch/install/bin/luajit: /home/jdieza15/torch/install/share/lua/5.1/nn/Container.lua:67:
In 25 module of nn.Sequential:
In 1 module of nn.ConcatTable:
/home/jdieza15/torch/install/share/lua/5.1/cudnn/init.lua:162: Error in CuDNN: CUDNN_STATUS_BAD_PARAM (cudnnGetConvolutionNdForwardOutputDim)
stack traceback:
[C]: in function 'error'
/home/jdieza15/torch/install/share/lua/5.1/cudnn/init.lua:162: in function 'errcheck'
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:140: in function 'createIODescriptors'
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:188: in function <...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:186>
[C]: in function 'xpcall'
/home/jdieza15/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
.../jdieza15/torch/install/share/lua/5.1/nn/ConcatTable.lua:11: in function <.../jdieza15/torch/install/share/lua/5.1/nn/ConcatTable.lua:9>
[C]: in function 'xpcall'
/home/jdieza15/torch/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
...e/jdieza15/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
eval_urbansound8k.lua:64: in function 'read_dataset'
eval_urbansound8k.lua:115: in main chunk
[C]: in function 'dofile'
...za15/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50
WARNING: If you see a stack trace below, it doesn't point to the place where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
/home/jdieza15/torch/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
...e/jdieza15/torch/install/share/lua/5.1/nn/Sequential.lua:44: in function 'forward'
eval_urbansound8k.lua:64: in function 'read_dataset'
eval_urbansound8k.lua:115: in main chunk
[C]: in function 'dofile'
...za15/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00405d50
This is the code in line 64:
net:forward(snd:view(1,1,-1,1):cuda())
I have never used Lua before, so I do not know how to interpret this error. Is it related to my CuDNN installation, or am I doing something wrong when running the code?
Thanks
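Since the failing call is cudnnGetConvolutionNdForwardOutputDim, one plausible culprit (an assumption, not a diagnosis from the repo) is that an input clip is too short: once the temporal dimension shrinks below a layer's kernel size, the computed output size becomes non-positive and cuDNN rejects it with CUDNN_STATUS_BAD_PARAM. A quick way to check is to trace the time axis through the layer shapes shown in the network printout in the first issue above:

```python
# Rough sanity check (not code from the repo): compute the temporal output
# size of each SoundNet-8 layer for a given input length, using the
# (kernel, stride, pad) values read off the printed network. A non-positive
# size at any layer is exactly what makes cuDNN fail when computing the
# forward output dimensions.

LAYERS = [
    ("conv1", 64, 2, 32), ("pool1", 8, 8, 0),
    ("conv2", 32, 2, 16), ("pool2", 8, 8, 0),
    ("conv3", 16, 2, 8),
    ("conv4", 8, 2, 4),
    ("conv5", 4, 2, 2), ("pool5", 4, 4, 0),
    ("conv6", 4, 2, 2),
    ("conv7", 4, 2, 2),
    ("conv8", 8, 2, 0),   # the final 1x8 conv in the ConcatTable head
]

def output_lengths(n_samples):
    """Trace the temporal dimension through every layer."""
    sizes = {}
    n = n_samples
    for name, k, s, p in LAYERS:
        n = (n + 2 * p - k) // s + 1   # standard conv/pool output formula
        sizes[name] = n
    return sizes

if __name__ == "__main__":
    # 1476864 samples is the clip length mentioned in another issue below;
    # the conv7 feature should come out at 46 time steps.
    print(output_lengths(1476864))
```

If conv8 (or any earlier layer) comes out zero or negative for one of your files, that file is too short for the network as configured.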
Hi, thanks for your nice paper. I have a question: in the paper you say the number of frames per video is variable. How do you fuse the CNN outputs from different frames so that the final output has a constant length? Do you just compute the average, or something else? Thank you very much.
Sorry to disturb you. When I use the pretrained SoundNet for speech emotion recognition, I have some questions. Could you please give me a hand? Thanks.
Question 1:
wav, sr = torchaudio.load(path) reads the audio samples; they are then preprocessed with wav.unsqueeze(1).unsqueeze(-1).repeat(1,1,8,1).
What are the requirements on the audio sample rate? Must it be 22050? Are there other restrictions?
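For what it's worth, here is a minimal preprocessing sketch based on my reading of the original repo; the 22050 Hz sample rate and the [-256, 256] value scaling are assumptions to verify against the repo's data loader, and linear_resample is a naive illustrative helper, not a production resampler:

```python
# Assumed preprocessing (verify against the repo): SoundNet appears to expect
# mono audio at 22050 Hz with sample values scaled to roughly [-256, 256]
# rather than the usual [-1, 1] float range.

def linear_resample(wav, sr_in, sr_out):
    """Naive linear-interpolation resampling; fine for a sanity check,
    but use a proper resampler (librosa/torchaudio) for real work."""
    if sr_in == sr_out or len(wav) < 2:
        return list(wav)
    n_out = int(round(len(wav) * sr_out / sr_in))
    out = []
    for i in range(n_out):
        pos = i * (len(wav) - 1) / (n_out - 1)   # fractional source index
        lo = int(pos)
        hi = min(lo + 1, len(wav) - 1)
        frac = pos - lo
        out.append(wav[lo] * (1 - frac) + wav[hi] * frac)
    return out

def preprocess(wav, sr):
    """Resample to 22050 Hz and rescale [-1, 1] floats to [-256, 256]."""
    wav = linear_resample(wav, sr, 22050)
    return [x * 256.0 for x in wav]
```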
Question 2:
The last layer is nn.Conv2d(1024, 401, kernel_size=(8, 1), stride=(2, 1)), used to extract speech features.
The feature size varies with the length of the audio; what exactly does it depend on? I want to use the features for audio classification. How do I get a constant-dimension feature vector for all of my audio files?
As you mentioned elsewhere, an audio file with 1476864 samples produces a feature of dimension [1x1024x46x1], while a file with 2199168 samples produces [1x1024x68x1]. In [1x1024x46x1], 1 is the batch and 1024 the output channels; what does the 46 represent, and what does the last dimension 1 represent?
Question 3:
How do I get a constant-dimension feature vector for both files? Finally, when I try to classify, what do I need to do with the output features of shape (1, 401, feature, 1) so that I can use them in the final classification task? Which flattening method is better: (batch, channel_out * feature), averaging over the channels, or some other method?
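One common recipe (my suggestion, not something the authors prescribe) is to average the [1, C, T, 1] feature over the variable time axis T, which yields a C-dimensional vector regardless of clip length; max pooling over T is a popular alternative:

```python
# Global average pooling over the time axis: collapse [1, C, T, 1] into a
# fixed C-dimensional vector, independent of the audio length T.

def global_average_pool(feat):
    """feat: nested list shaped [1][C][T][1] -> list of C floats."""
    channels = feat[0]
    return [sum(t[0] for t in ch) / len(ch) for ch in channels]

# Tiny example: batch 1, C=2 channels, T=3 time steps.
feat = [[[[1.0], [2.0], [3.0]],
         [[4.0], [5.0], [6.0]]]]
vec = global_average_pool(feat)   # one value per channel: [2.0, 5.0]
```

Because the average is over T, two clips of different lengths both map to a vector of length C, which can then feed any fixed-input classifier.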
PS
I am new to audio and DL, so sorry for asking basic questions.
Thanks
best
I simply ran demo.lua and it ended with:
lua: cannot open <models/soundnet8_final.lua> in mode r at /home/yangshuo/torch/pkg/torch/lib/TH/THDiskFile.c:670
I searched https://projects.csail.mit.edu/soundnet/ and the git repo, but I still could not find soundnet8_final.lua.
Where can I find it?
@cvondrick @yusufaytar
Thank you very much for sharing this code.
I am new to audio. I was trying to extract features from my audio files. The feature size varies depending on the length of the audio; what does it depend on? I want to use the features for classification. How do I get a constant-dimension feature vector for all of my audio files?
For example, an audio file with 1476864 samples produces a feature of dimension [1x1024x46x1], while a file with 2199168 samples produces [1x1024x68x1]. How do I get a constant-dimension feature vector for both files?
How do I have to modify the sound signal to apply the net in a sliding-window fashion along the temporal direction?
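A sliding-window approach can be sketched like this (sliding_windows is a hypothetical helper, not part of the repo): cut the waveform into fixed-length, possibly overlapping chunks, run the network on each chunk, then aggregate the per-chunk features, for example by averaging them:

```python
# Split a waveform into fixed-length windows with a given hop size; each
# window can then be fed to the network independently.

def sliding_windows(wav, win_len, hop):
    """Return fixed-length chunks of the signal; a final partial chunk
    shorter than win_len is dropped."""
    return [wav[i:i + win_len]
            for i in range(0, len(wav) - win_len + 1, hop)]

windows = sliding_windows(list(range(10)), win_len=4, hop=3)
# windows[0] is [0, 1, 2, 3], windows[1] is [3, 4, 5, 6], ...
```

With hop < win_len the windows overlap; with hop == win_len they tile the signal without overlap. Fixed win_len guarantees every window yields a feature of the same shape.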
Hello, thank you very much for sharing the code and the dataset.
I am new to audio-visual learning and want to reproduce the results.
While reading the code, I found that I do not have the file "label_text_file" used in main_train.lua.
If I want to train the model myself, I need the raw MP3s (359 GB) and the image features (88 GB), but I don't know how to get the label_text_file.
If anyone knows, or has re-trained the model themselves, please give me some advice or share your experience.
Thank you very much.
Hi, I'd like to ask for some tips that would be generally applicable to video/image deep learning; I have only worked on music-related projects so far. Some (if not all) of my questions might seem dumb ;)
Thanks!
I have downloaded the training data from the demo website (https://projects.csail.mit.edu/soundnet/), and was trying to run the main_train.lua script. But I always get an out-of-memory error at the following line:
optim.adam(fx, parameters, optimState)
The same thing happens even if I run main_train_small.lua.
I am using 120 GB of CPU memory and 4.7 GB of GPU memory. Do I need more?
Hello @cvondrick ,
It seems that audio_simple, the dataset variable defined in main_finetune.lua, is not valid for data.lua to load the input audios. It shows the following error:
/home/yclin/distro/install/bin/luajit: /home/yclin/Workspace/soundnet/data/data.lua:24: Unknown dataset: audio_simple
I also tried replacing audio_simple with donkey_audio and donkey_audio_labeled, but neither of them works.
Would you please have a look at the finetune section of the README?
Hi, thanks for making this great implementation!
I tried to extract features from a sound using the pretrained models and got results like:
sky: 43.56%
stage, indoor: 5.46%
amusement park: 5.24%
spotlight: 16.74%
fountain: 12.33%
traffic light: 5.76%
I want to get the category labels, but I don't understand how to convert them from the HDF5 format.
Could you please explain step by step how to get the category labels?
Any help would be appreciated.
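In case it helps, the ranking step itself is straightforward once you have the probability vector and a list of category names (the labels below are illustrative only; the repo's categories file and the exact HDF5 dataset path are assumptions to check, e.g. reading the vector with h5py):

```python
# Map a probability vector back to label strings and return the top-k.
# The probability vector itself would come from the exported HDF5 file
# (e.g. via h5py -- dataset path is an assumption, verify it yourself).

def top_k_labels(probs, labels, k=3):
    """Pair each probability with its label and return the k largest."""
    ranked = sorted(zip(labels, probs), key=lambda p: p[1], reverse=True)
    return ranked[:k]

labels = ["sky", "stage, indoor", "amusement park"]   # illustrative only
probs = [0.4356, 0.0546, 0.0524]
print(top_k_labels(probs, labels, k=2))
```

The category names ship separately from the features, so the only requirement is that the label list is in the same order as the output dimensions.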
Hi guys, to be honest I found this repo very frustrating to work with. For anyone else searching, the dataset link is here: https://projects.csail.mit.edu/soundnet/
munender@cseproj149:~/code_space/soundnet$ list=data.txt th extract_feat.lua
/users/gpu/munender/src/torch/install/bin/lua: ...ender/src/torch/install/share/lua/5.1/trepl/init.lua:389: ...unender/src/torch/install/share/lua/5.1/hdf5/ffi.lua:56: expected align(#) on line 687
stack traceback:
[C]: in function 'error'
...ender/src/torch/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
extract_feat.lua:4: in main chunk
[C]: in function 'dofile'
.../torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: ?
What are the layer number inputs for other CNN layers?
Hi, when I run torch.load('soundnet8_final.t7') with Python 3.6 and PyTorch 0.4.0, I got this error.
Do you know what's going on?
Thank you~
Hi,
I used the trained model to extract features from MP3s, but all of the 1000-dimensional features are negative. Is this normal?
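Negative values are not necessarily a bug: if the features were extracted before the final cudnn.SpatialSoftMax (module 26 in the network printout in the first issue above), they are raw scores and can be negative. Applying a softmax turns them into non-negative probabilities; a minimal sketch:

```python
import math

# Numerically stable softmax: raw scores (which may all be negative) become
# probabilities in (0, 1) that sum to 1.

def softmax(scores):
    m = max(scores)                          # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([-3.0, -1.0, -2.0])
# every entry is positive and the entries sum to 1,
# even though every input score was negative
```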
Dear @cvondrick ,
Thank you very much for making this code and the dataset publicly available.
I had a strange experience unzipping the frames folder: it has been 2 days and it is still extracting frames.
I simply used tar -xvzf frames_public.tar.gz
Have you had a similar experience, or is the problem on my side?
Thank you very much.
Hi, I couldn't find which pre-trained network you used for the Places CNN, either in the paper or on the website. Where does it come from?