
slowfast's Issues

Error with Detectron2 when using run_net.py

Hello,
Thanks for sharing the code. When running the model with run_net.py, I get the following error, which seems to be related to Detectron2:

Exception has occurred: ImportError
/home/user/miniconda3/envs/pytorch/lib/python3.9/site-packages/detectron2/_C.cpython-39-x86_64-linux-gnu.so: undefined symbol: _ZNK2at6Tensor7reshapeEN3c108ArrayRefIlEE
  File "/home/user/vtn/slowfast/models/head_helper.py", line 8, in <module>
    from detectron2.layers import ROIAlign
  File "/home/user/vtn/slowfast/models/video_model_builder.py", line 14, in <module>
    from . import head_helper, resnet_helper, stem_helper, vtn_helper
  File "/home/user/vtn/slowfast/models/__init__.py", line 6, in <module>
    from .video_model_builder import ResNet, SlowFast  # noqa
  File "/home/user/vtn/slowfast/utils/misc.py", line 21, in <module>
    from slowfast.models.batchnorm_helper import SubBatchNorm3d
  File "/home/user/vtn/run_net.py", line 5, in <module>
    from slowfast.utils.misc import launch_job

This is puzzling, because I have installed all the required packages; the installed versions are as follows:

Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
absl-py 1.3.0 pypi_0 pypi
antlr4-python3-runtime 4.9.3 pypi_0 pypi
aom 3.3.0 h27087fc_1 conda-forge
av 9.2.0 py310h1b041b7_0 conda-forge
binutils_impl_linux-64 2.39 h6ceecb4_0 conda-forge
black 22.3.0 pypi_0 pypi
blas 1.0 mkl
brotlipy 0.7.0 py310h7f8727e_1002
bzip2 1.0.8 h7b6447c_0
ca-certificates 2022.07.19 h06a4308_0
cachetools 5.2.0 pypi_0 pypi
certifi 2022.9.24 py310h06a4308_0
cffi 1.15.1 py310h74dc2b5_0
charset-normalizer 2.0.4 pyhd3eb1b0_0
click 8.1.3 pypi_0 pypi
cloudpickle 2.2.0 pypi_0 pypi
contourpy 1.0.5 pypi_0 pypi
cryptography 37.0.1 py310h9ce1e76_0
cudatoolkit 11.3.1 h2bc3f7f_2
cycler 0.11.0 pypi_0 pypi
cython 0.29.32 pypi_0 pypi
decorator 4.4.2 pypi_0 pypi
decord 0.6.0 pypi_0 pypi
detectron2 0.6 dev_0
fairscale 0.4.12 pypi_0 pypi
ffmpeg 4.4.1 h964e5f1_4 conda-forge
fftw 3.3.9 h27cfd23_1
filelock 3.8.0 pypi_0 pypi
fonttools 4.38.0 pypi_0 pypi
freetype 2.11.0 h70c0345_0
future 0.18.2 pypi_0 pypi
fvcore 0.1.5 pypi_0 pypi
gcc 12.2.0 h26027b1_11 conda-forge
gcc7 0.0.7 pypi_0 pypi
gcc_impl_linux-64 12.2.0 hcc96c02_18 conda-forge
giflib 5.2.1 h7b6447c_0
gmp 6.2.1 h295c915_3
gnutls 3.6.15 he1e5248_0
google-auth 2.13.0 pypi_0 pypi
google-auth-oauthlib 0.4.6 pypi_0 pypi
grpcio 1.50.0 pypi_0 pypi
gxx 12.2.0 h26027b1_11 conda-forge
gxx_impl_linux-64 12.2.0 hcc96c02_18 conda-forge
huggingface-hub 0.10.1 pypi_0 pypi
hydra-core 1.2.0 pypi_0 pypi
icu 70.1 h27087fc_0 conda-forge
idna 3.4 py310h06a4308_0
imageio 2.22.2 pypi_0 pypi
imageio-ffmpeg 0.4.7 pypi_0 pypi
intel-openmp 2021.4.0 h06a4308_3561
iopath 0.1.9 pypi_0 pypi
joblib 1.2.0 pypi_0 pypi
jpeg 9e h7f8727e_0
kernel-headers_linux-64 2.6.32 he073ed8_15 conda-forge
kiwisolver 1.4.4 pypi_0 pypi
lame 3.100 h7b6447c_0
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.39 hc81fddc_0 conda-forge
lerc 3.0 h295c915_0
libdeflate 1.8 h7f8727e_5
libdrm 2.4.113 h166bdaf_0 conda-forge
libffi 3.3 he6710b0_2
libgcc-devel_linux-64 12.2.0 h3b97bd3_18 conda-forge
libgcc-ng 12.2.0 h65d4601_18 conda-forge
libgfortran-ng 11.2.0 h00389a5_1
libgfortran5 11.2.0 h1234567_1
libgomp 12.2.0 h65d4601_18 conda-forge
libiconv 1.16 h7f8727e_2
libidn2 2.3.2 h7f8727e_0
libpciaccess 0.16 h516909a_0 conda-forge
libpng 1.6.37 hbc83047_0
libsanitizer 12.2.0 h46fd767_18 conda-forge
libstdcxx-devel_linux-64 12.2.0 h3b97bd3_18 conda-forge
libstdcxx-ng 12.2.0 h46fd767_18 conda-forge
libtasn1 4.16.0 h27cfd23_0
libtiff 4.4.0 hecacb30_0
libunistring 0.9.10 h27cfd23_0
libuuid 1.0.3 h7f8727e_2
libva 2.16.0 h166bdaf_0 conda-forge
libvpx 1.11.0 h9c3ff4c_3 conda-forge
libwebp 1.2.4 h11a3e52_0
libwebp-base 1.2.4 h5eee18b_0
libxcb 1.13 h7f98852_1004 conda-forge
libxml2 2.10.2 h22db469_0 conda-forge
libzlib 1.2.12 h166bdaf_3 conda-forge
lz4-c 1.9.3 h295c915_1
markdown 3.4.1 pypi_0 pypi
markupsafe 2.1.1 pypi_0 pypi
matplotlib 3.6.0 pypi_0 pypi
mkl 2021.4.0 h06a4308_640
mkl-service 2.4.0 py310h7f8727e_0
mkl_fft 1.3.1 py310hd6ae3a3_0
mkl_random 1.2.2 py310h00e6091_0
moviepy 1.0.3 pypi_0 pypi
mypy-extensions 0.4.3 pypi_0 pypi
ncurses 6.3 h5eee18b_3
nettle 3.7.3 hbbd107a_1
networkx 2.8.7 pypi_0 pypi
numpy 1.23.3 py310hd5efca6_0
numpy-base 1.23.3 py310h8e6c178_0
nvidia-ml-py 11.515.75 pypi_0 pypi
nvitop 0.10.1 pypi_0 pypi
oauthlib 3.2.2 pypi_0 pypi
omegaconf 2.2.3 pypi_0 pypi
opencv-python 4.6.0.66 pypi_0 pypi
openh264 2.1.1 h4ff587b_0
openssl 1.1.1q h7f8727e_0
packaging 21.3 pypi_0 pypi
pandas 1.5.1 pypi_0 pypi
parameterized 0.8.1 pypi_0 pypi
pathspec 0.10.1 pypi_0 pypi
pillow 9.2.0 py310hace64e9_1
pip 22.1.2 py310h06a4308_0 anaconda
platformdirs 2.5.2 pypi_0 pypi
portalocker 2.6.0 pypi_0 pypi
proglog 0.1.10 pypi_0 pypi
protobuf 3.19.6 pypi_0 pypi
psutil 5.9.3 pypi_0 pypi
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
pyasn1 0.4.8 pypi_0 pypi
pyasn1-modules 0.2.8 pypi_0 pypi
pycocotools 2.0.5 pypi_0 pypi
pycparser 2.21 pyhd3eb1b0_0
pydot 1.4.2 pypi_0 pypi
pyopenssl 22.0.0 pyhd3eb1b0_0
pyparsing 3.0.9 pypi_0 pypi
pysocks 1.7.1 py310h06a4308_0
python 3.10.0 h12debd9_5
python-dateutil 2.8.2 pypi_0 pypi
python_abi 3.10 2_cp310 conda-forge
pytorch 1.12.1 py3.10_cuda11.3_cudnn8.3.2_0 pytorch
pytorch-mutex 1.0 cuda pytorch
pytorchvideo 0.1.5 pypi_0 pypi
pytz 2022.5 pypi_0 pypi
pyyaml 6.0 pypi_0 pypi
readline 8.1.2 h7f8727e_1
regex 2022.9.13 pypi_0 pypi
requests 2.28.1 py310h06a4308_0
requests-oauthlib 1.3.1 pypi_0 pypi
rsa 4.9 pypi_0 pypi
scikit-learn 1.1.2 pypi_0 pypi
scipy 1.9.3 pypi_0 pypi
seaborn 0.12.1 pypi_0 pypi
setuptools 63.4.1 py310h06a4308_0
simplejson 3.17.6 pypi_0 pypi
six 1.16.0 pyhd3eb1b0_1
sqlite 3.39.3 h5082296_0
svt-av1 1.1.0 h27087fc_1 conda-forge
sysroot_linux-64 2.12 he073ed8_15 conda-forge
tabulate 0.9.0 pypi_0 pypi
tensorboard 2.10.1 pypi_0 pypi
tensorboard-data-server 0.6.1 pypi_0 pypi
tensorboard-plugin-wit 1.8.1 pypi_0 pypi
termcolor 2.0.1 pypi_0 pypi
threadpoolctl 3.1.0 pypi_0 pypi
timm 0.6.11 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
tokenizers 0.13.1 pypi_0 pypi
tomli 2.0.1 pypi_0 pypi
torchaudio 0.12.1 py310_cu113 pytorch
torchvideo 0.0.0 pypi_0 pypi
torchvision 0.13.1 py310_cu113 pytorch
tqdm 4.64.1 pypi_0 pypi
transformers 4.23.1 pypi_0 pypi
typing_extensions 4.3.0 py310h06a4308_0
tzdata 2022e h04d1e81_0
urllib3 1.26.12 py310h06a4308_0
werkzeug 2.2.2 pypi_0 pypi
wheel 0.37.1 pyhd3eb1b0_0
x264 1!161.3030 h7f98852_1 conda-forge
x265 3.5 h924138e_3 conda-forge
xorg-fixesproto 5.0 h7f98852_1002 conda-forge
xorg-kbproto 1.0.7 h7f98852_1002 conda-forge
xorg-libx11 1.7.2 h7f98852_0 conda-forge
xorg-libxau 1.0.9 h7f98852_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xorg-libxext 1.3.4 h7f98852_1 conda-forge
xorg-libxfixes 5.0.3 h7f98852_1004 conda-forge
xorg-xextproto 7.3.0 h7f98852_1002 conda-forge
xorg-xproto 7.0.31 h7f98852_1007 conda-forge
xz 5.2.6 h5eee18b_0
yacs 0.1.8 pypi_0 pypi
zlib 1.2.12 h5eee18b_3
zstd 1.5.2 ha4553b6_0

How can I fix this problem? Please let me know.

Thank you
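
For context, an undefined C++ symbol in detectron2's compiled _C extension typically means detectron2 was built against a different PyTorch version than the one currently installed, and the usual remedy is to rebuild detectron2 from source against the installed torch. A minimal sanity check, as a sketch (the version pairing is an assumption, not a confirmed diagnosis of this environment):

    # Check that detectron2 and torch agree; an undefined-symbol ImportError
    # from detectron2/_C usually indicates an ABI mismatch between the two.
    import torch
    import detectron2  # the top-level import does not load the _C extension

    print("torch:", torch.__version__, "cuda:", torch.version.cuda)
    print("detectron2:", detectron2.__version__)
    # If these were built against different torch versions, reinstalling
    # detectron2 from source against the current torch usually fixes it.

Note also that the traceback points at a python3.9 site-packages directory while the package list above ships Python 3.10, which suggests the failing detectron2 build belongs to a different environment.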

How to reproduce results on a live camera feed without saving a video

Hi @bomri,
Thanks a lot for making the repo public.
I want to reproduce the results on a live camera feed, without saving it as a video file.
For example, I start the camera feed, build a stack of 16 frames, and then provide those frames as input to the model.
To that end I went through the data loading code, but I couldn't fully work out where to start.
I traced run_net.py to test_net.py, which in turn leads to loader.py for my use case.
Can you please guide me through this?
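
For reference, a minimal sketch of the idea described above, assuming an OpenCV camera source and a model already built elsewhere; the preprocessing values and the single-pathway model([clip]) call are assumptions and should be matched to the config used in training:

    from collections import deque

    import cv2
    import torch

    def run_live(model, num_frames=16, size=224, device="cpu"):
        """Keep a rolling window of frames from the webcam and classify
        each full window; the preprocessing here is a placeholder."""
        frames = deque(maxlen=num_frames)
        cap = cv2.VideoCapture(0)  # default camera
        try:
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                frame = cv2.cvtColor(cv2.resize(frame, (size, size)), cv2.COLOR_BGR2RGB)
                frames.append(torch.from_numpy(frame))
                if len(frames) == num_frames:
                    # (T, H, W, C) -> (1, C, T, H, W), scaled to [0, 1]
                    clip = torch.stack(list(frames)).permute(3, 0, 1, 2).unsqueeze(0)
                    clip = clip.float().div(255).to(device)
                    with torch.no_grad():
                        preds = model([clip])  # SlowFast-style models take a list of pathways
                    print(preds.argmax(dim=1).item())
        finally:
            cap.release()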

How to prepare the Kinetics-400 dataset to run your code?

When I run the code, I get the following error:
Traceback (most recent call last):
  File "/media/hulijuan/hdisk1/bishe_code/SlowFast/tools/run_net.py", line 46, in <module>
    main()
  File "/media/hulijuan/hdisk1/bishe_code/SlowFast/tools/run_net.py", line 27, in main
    launch_job(cfg=cfg, init_method=args.init_method, func=train)
  File "/media/hulijuan/hdisk1/bishe_code/SlowFast/slowfast/utils/misc.py", line 303, in launch_job
    func(cfg=cfg)
  File "/media/hulijuan/hdisk1/bishe_code/SlowFast/tools/train_net.py", line 392, in train
    train_loader = loader.construct_loader(cfg, "train")
  File "/media/hulijuan/hdisk1/bishe_code/SlowFast/slowfast/datasets/loader.py", line 83, in construct_loader
    dataset = build_dataset(dataset_name, cfg, split)
  File "/media/hulijuan/hdisk1/bishe_code/SlowFast/slowfast/datasets/build.py", line 31, in build_dataset
    return DATASET_REGISTRY.get(name)(cfg, split)
  File "/media/hulijuan/hdisk1/bishe_code/SlowFast/slowfast/datasets/kinetics.py", line 85, in __init__
    self._construct_loader()
  File "/media/hulijuan/hdisk1/bishe_code/SlowFast/slowfast/datasets/kinetics.py", line 109, in _construct_loader
    == 2
AssertionError

So I want to ask what I should do with the data. Could you show me your data directory structure and the parameters you pass on the command line?
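
For reference, the assertion at kinetics.py line 109 fires when a line of the split csv does not parse into exactly two fields. In the SlowFast codebase the Kinetics loader looks for train.csv, val.csv, and test.csv under DATA.PATH_TO_DATA_DIR, with one video path and one integer label per line. A sketch of the expected layout, assuming the default space separator (the paths below are hypothetical):

    DATA.PATH_TO_DATA_DIR/
        train.csv
        val.csv
        test.csv

where each line of a csv is "<path_to_video> <label>", for example:

    /data/kinetics400/train/abseiling/abc123.mp4 0
    /data/kinetics400/train/air_drumming/def456.mp4 1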

Visualization: how effective is a PART of a frame in decision making

Hey,

thanks for your fantastic work! It inspires me a lot.

I'm trying to visualize the [CLS] token as described in the paper. I take the attention weights of the [CLS] token and visualize them. What I can get is which frames help the model make its classification decision.

But as I understand the paper, you are able to visualize how a part of a frame helps the model make its decision. For example, in a frame where hands, a rope, and a shackle all appear, the weights satisfy hands > rope > shackle.

May I ask how you visualised the impact on the result for a part of the image, instead of the image as a whole?

Thanks in advance,
Leo
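
For what it's worth, one common way to get part-of-frame saliency from a ViT backbone, not necessarily the method used in the VTN paper, is to take the spatial [CLS] token's attention over the patch tokens, reshape it onto the patch grid, and upsample it to image resolution as a heatmap. A minimal sketch, assuming a 224x224 input with 16x16 patches and an attention matrix with the [CLS] token at index 0:

    import torch
    import torch.nn.functional as F

    def patch_attention_heatmap(attn, image_size=224, patch_size=16):
        """attn: (num_heads, num_tokens, num_tokens) attention from one ViT
        block, with the spatial [CLS] token at index 0."""
        grid = image_size // patch_size              # 14 for ViT-B/16 at 224
        heat = attn[:, 0, 1:].mean(dim=0)            # average heads -> (grid*grid,)
        heat = heat.reshape(1, 1, grid, grid)
        heat = F.interpolate(heat, size=(image_size, image_size),
                             mode="bilinear", align_corners=False).squeeze()
        return (heat - heat.min()) / (heat.max() - heat.min() + 1e-8)  # [0, 1]

Overlaying the returned heatmap on the frame then shows which regions (e.g. hands vs. rope vs. shackle) carry the most weight.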

How to visualize the [CLS] token in VTN?

Hello,

Your work on VTN is excellent; it inspires me a lot.
In the paper you said you visualize the [CLS] token attention weights. I am trying to visualize them, but I struggled to understand the meaning of the [CLS] token. I can now get the [CLS] token before the MLP head, with shape (batch_size, 768); how can I visualize it?

I would appreciate it if you could tell me. If you can help me with a simple example, that would be great.

Thank you!
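
As a side note, what typically gets visualized is not the 768-d [CLS] embedding itself but the temporal attention weights from the [CLS] token to the per-frame tokens. A minimal sketch, assuming you can extract such an attention matrix (e.g. via a forward hook) with the [CLS] token at index 0:

    import numpy as np
    import matplotlib.pyplot as plt

    def plot_cls_attention(attn, num_frames):
        """attn: (num_heads, seq_len, seq_len) temporal attention matrix."""
        cls_weights = attn[:, 0, 1:1 + num_frames].mean(axis=0)  # average over heads
        plt.bar(np.arange(num_frames), cls_weights)
        plt.xlabel("frame index")
        plt.ylabel("[CLS] attention weight")
        plt.show()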

VTN input example

Hello,

I am trying to set up the VTN model for training. To do that, I am digging into the model's architecture, specifically the model's forward function. But I am struggling to understand how the input values should be constructed, specifically what position_ids is and how it is calculated. I have searched the internet for examples and it seems there are none.

If you can help me with a simple example, that would be great. I also can help by adding documentation to this bit of code.
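
For what it's worth, in BERT/Longformer-style encoders position_ids are simply the integer positions of the tokens, used to index a learned positional-embedding table. A plausible construction for a clip of F frame tokens plus a [CLS] token, as a sketch (not confirmed against this repo's exact forward signature):

    import torch

    batch_size, num_frames = 2, 16
    seq_len = num_frames + 1  # +1 for the [CLS] token
    position_ids = torch.arange(seq_len).unsqueeze(0).expand(batch_size, -1)
    print(position_ids.shape)  # torch.Size([2, 17])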

Reproducing top 1 accuracy in Figure 4

Hi,

I am trying to reproduce the top-1 accuracy of ViT-B-VTN in Figure 4. I trained a model using the config file configs/Kinetics/VIT_B_VTN.yaml, but I only get 68.5% accuracy after epoch 9. Below are the train and val results from my experiment. Is there anything I have overlooked? Thanks!

train:

epoch   top-1 accuracy (%)
1/25    41.12769
2/25    54.65638
3/25    58.83401
4/25    61.77271
5/25    63.87714
6/25    65.69931
7/25    67.11017
8/25    68.33328
9/25    69.61342

val:

epoch   top-1 accuracy (%)
2/25    59.21026
3/25    62.00704
4/25    64.85412
5/25    65.65895
6/25    67.18813
7/25    67.39437
8/25    68.49095
9/25    68.50604

Unable to load weights of the 'VIT_B_VTN' model

While trying to test run_net.py using the weights VTN_VIT_B_KINETICS.pyth (available from the model zoo link), I am unable to load the weights. I get the error "PytorchStreamReader failed reading zip archive: failed finding central directory" and cannot proceed further.

Would like to request help for the same, thanks! :)
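
For reference, "PytorchStreamReader failed reading zip archive" usually indicates a corrupted or partially downloaded checkpoint, since modern PyTorch checkpoints are zip archives. A quick sanity check before digging deeper:

    import zipfile

    # PyTorch >= 1.6 checkpoints are zip files; False here means the download
    # is incomplete or corrupted and the file should be fetched again.
    print(zipfile.is_zipfile("VTN_VIT_B_KINETICS.pyth"))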

Model weights for ViT-B-VTN (1 layer)

Thanks for open-sourcing your work!

Do you still have the model weights for the 1-layer ViT-B-VTN lying around, and would you be willing to add them to the MODEL_ZOO?

It would be greatly appreciated 🙏

Size of the checkpoint file

Hello,
I have trained a model with VTN, but the checkpoint file is about 900 MB. Is that right? The file seems too big to me.
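
For reference, training checkpoints commonly bundle optimizer state (e.g. Adam moment buffers) alongside the model weights, which can make the file several times larger than the weights alone. A sketch for inspecting what a checkpoint actually contains (the key names follow the SlowFast conventions and are an assumption; "checkpoint.pyth" is a placeholder path):

    import torch

    ckpt = torch.load("checkpoint.pyth", map_location="cpu")
    print(list(ckpt.keys()))  # e.g. ["epoch", "model_state", "optimizer_state", "cfg"]

    num_params = sum(v.numel() for v in ckpt["model_state"].values())
    print(f"model parameters: {num_params / 1e6:.1f} M")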

How to use ResNet as the backbone?

Hi, thanks for your work.
I want to know how to use ResNet as the backbone, as mentioned in the paper.
Should I drop the fully-connected layer and/or the average-pooling layer of the ResNet?
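
For what it's worth, a common way to use a 2D ResNet as a per-frame feature extractor is to drop only the final fully-connected layer and keep the global average pool, yielding one 2048-d embedding per frame. A minimal sketch with torchvision (this mirrors common practice; the paper's exact setup may differ):

    import torch
    import torch.nn as nn
    from torchvision.models import resnet50

    backbone = resnet50()        # load ImageNet weights here in practice
    backbone.fc = nn.Identity()  # drop the classifier head, keep the avg-pool

    frames = torch.randn(16, 3, 224, 224)  # the 16 frames of one clip
    features = backbone(frames)            # -> (16, 2048) per-frame embeddings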

VTN attention window

Hi,
Thank you for sharing the code.
In the VTN paper, the authors say: "For the Longformer, we use an effective attention window of size 32." In the code, the attention window is 18. Could you please specify the window size needed to reproduce the results?

Thank you very much.

VTN not converging on UCF101

Hello there,

Thanks for sharing the code. I am not using the full codebase; instead, I am taking your model and fitting it into my own training code. My training code is standard UCF101 classification code and works well with the known 3D-CNN architectures. This issue could be crucial for people who are trying to use just your architecture for other video understanding tasks. I have done the following, and I am not sure what went wrong, but VTN is not converging with my code:

  1. I installed the dependencies and was able to build the VTN model from https://github.com/bomri/SlowFast/blob/master/slowfast/models/video_model_builder.py#L765. I used the https://github.com/bomri/SlowFast/blob/master/configs/Kinetics/VIT_B_VTN.yaml config file, passed a random input tensor, and VTN produced output of the expected dimension without any error (see the sketch after this list).
  2. I then changed NUM_CLASSES to 101 for UCF101 in the config yaml file and put the model into my training framework, which uses the Adam optimizer and a dataloader that produces 16 frames with a skip rate of 2 at a resolution of 224. The training loss decreases slightly, from 4.8 to 4.6, in an initial epoch and then just gets stuck there.
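
A sketch of the smoke test in step 1, following the SlowFast config conventions (the exact field overrides are assumptions and may need adjusting):

    import torch
    from slowfast.config.defaults import get_cfg
    from slowfast.models import build_model

    cfg = get_cfg()
    cfg.merge_from_file("configs/Kinetics/VIT_B_VTN.yaml")
    cfg.NUM_GPUS = 0  # CPU smoke test; set to 1 to run on a GPU

    model = build_model(cfg)
    model.eval()
    clip = torch.randn(1, 3, 16, 224, 224)  # (B, C, T, H, W)
    with torch.no_grad():
        out = model([clip])                 # SlowFast models take a list of pathways
    print(out.shape)                        # expected: (1, cfg.MODEL.NUM_CLASSES)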

I have no clue what has gone wrong; any lead would be appreciated. Hoping to hear from you soon.

-Ishan
