
Semantic-Aware Scene Recognition


Official PyTorch implementation of Semantic-Aware Scene Recognition by Alejandro López-Cifuentes, Marcos Escudero-Viñolo, Jesús Bescós and Álvaro García-Martín (Elsevier Pattern Recognition).

[Figure: example focus visualization]

Summary

This paper proposes to improve scene recognition by using object information to focalize learning during the training process. The main contributions of the paper are threefold:

  • We propose an end-to-end multi-modal deep learning architecture which gathers both image and context information using a two-branched CNN architecture.
  • We propose to use semantic segmentation as an additional information source to automatically create, through a convolutional neural network, an attention model to reinforce the learning of relevant contextual information.
  • We validate the effectiveness of the proposed method through experimental results on public scene recognition datasets such as ADE20K, MIT Indoor 67, SUN 397 and Places365 obtaining state-of-the-art results.

The proposed CNN architecture is as follows:

[Figure: network architecture diagram]
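As a rough illustration of the two-branch idea, a minimal PyTorch sketch might look like the following. This is not the repository's SASceneNet implementation: the layer sizes, the semantic-branch design and the gating scheme are assumptions for exposition only.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class TwoBranchSceneNet(nn.Module):
    """Toy two-branch scene classifier: semantic features gate RGB features."""
    def __init__(self, num_classes, num_sem_classes=151):
        super().__init__()
        # RGB branch: a ResNet-18 trunk without its classification head.
        # (torchvision >= 0.13; use pretrained=False on older versions)
        rgb = models.resnet18(weights=None)
        self.rgb_branch = nn.Sequential(*list(rgb.children())[:-2])
        # Semantic branch: a small CNN over one-hot segmentation maps.
        self.sem_branch = nn.Sequential(
            nn.Conv2d(num_sem_classes, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(7),
        )
        # Attention: per-location gates derived from the semantic features.
        self.attention = nn.Sequential(nn.Conv2d(128, 512, 1), nn.Sigmoid())
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, rgb, sem):
        f_rgb = self.rgb_branch(rgb)            # (B, 512, 7, 7) for 224x224 input
        f_sem = self.sem_branch(sem)            # (B, 128, 7, 7)
        gated = f_rgb * self.attention(f_sem)   # reinforce contextually relevant cells
        return self.classifier(gated.mean(dim=(2, 3)))
```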

State-of-the-art Results

ADE20K Dataset

| RGB | Semantic | Top@1 | Top@2 | Top@5 | MCA |
|:---:|:--------:|:-----:|:-----:|:-----:|:-----:|
| ✓ |   | 55.90 | 67.25 | 78.00 | 20.96 |
|   | ✓ | 50.60 | 60.45 | 72.10 | 12.17 |
| ✓ | ✓ | 62.55 | 73.25 | 82.75 | 27.00 |

MIT Indoor 67 Dataset

| Method | Backbone | Number of Parameters | Top@1 |
|--------|----------|----------------------|-------|
| PlaceNet | Places-CNN | 62 M | 68.24 |
| MOP-CNN | CaffeNet | 62 M | 68.90 |
| CNNaug-SVM | OverFeat | 145 M | 69.00 |
| HybridNet | Places-CNN | 62 M | 70.80 |
| URDL + CNNaug | AlexNet | 62 M | 71.90 |
| MPP-FCR2 | AlexNet | 62 M | 75.67 |
| DSFL + CNN (7 Scales) | AlexNet | 62 M | 76.23 |
| MPP + DSFL | AlexNet | 62 M | 80.78 |
| CFV | VGG-19 | 143 M | 81.00 |
| CS | VGG-19 | 143 M | 82.24 |
| SDO (1 Scale) | 2 x VGG-19 | 276 M | 83.98 |
| VSAD | 2 x VGG-19 | 276 M | 86.20 |
| SDO (9 Scales) | 2 x VGG-19 | 276 M | 86.76 |
| Ours | ResNet-18 + Sem Branch + G-RGB-H | 47 M | 85.58 |
| Ours* | ResNet-50 + Sem Branch + G-RGB-H | 85 M | 87.10 |

SUN 397 Dataset

| Method | Backbone | Number of Parameters | Top@1 |
|--------|----------|----------------------|-------|
| Decaf | AlexNet | 62 M | 40.94 |
| MOP-CNN | CaffeNet | 62 M | 51.98 |
| HybridNet | Places-CNN | 62 M | 53.86 |
| Places-CNN | Places-CNN | 62 M | 54.23 |
| Places-CNN ft | Places-CNN | 62 M | 56.20 |
| CS | VGG-19 | 143 M | 64.53 |
| SDO (1 Scale) | 2 x VGG-19 | 276 M | 66.98 |
| VSAD | 2 x VGG-19 | 276 M | 73.00 |
| SDO (9 Scales) | 2 x VGG-19 | 276 M | 73.41 |
| Ours | ResNet-18 + Sem Branch + G-RGB-H | 47 M | 71.25 |
| Ours* | ResNet-50 + Sem Branch + G-RGB-H | 85 M | 74.04 |

Places 365 Dataset

| Network | Number of Parameters | Top@1 | Top@2 | Top@5 | MCA |
|---------|----------------------|-------|-------|-------|-----|
| AlexNet | 62 M | 47.45 | 62.33 | 78.39 | 49.15 |
| AlexNet* | 62 M | 53.17 | - | 82.59 | - |
| GoogLeNet* | 7 M | 53.63 | - | 83.88 | - |
| ResNet-18 | 12 M | 53.05 | 68.87 | 83.86 | 54.40 |
| ResNet-50 | 25 M | 55.47 | 70.40 | 85.36 | 55.47 |
| ResNet-50* | 25 M | 54.74 | - | 85.08 | - |
| VGG-19* | 143 M | 55.24 | - | 84.91 | - |
| DenseNet-161 | 29 M | 56.12 | 71.48 | 86.12 | 56.12 |
| Ours | 47 M | 56.51 | 71.57 | 86.00 | 56.51 |

Setup

Requirements

The repository has been tested with the following software versions:

  • Ubuntu 16.04
  • Python 3.6
  • Anaconda 4.6

Clone Repository

Clone the repository by running the following command:

$ git clone https://github.com/vpulab/Semantic-Aware-Scene-Recognition.git

Anaconda Environment

To create and set up the Anaconda environment, run the following terminal commands from the repository folder:

$ conda env create -f Config/Conda_Env.yml
$ conda activate SA-Scene-Recognition

Datasets

Download and setup instructions for each dataset are provided in the following links:

Evaluation

Model Zoo

To evaluate the models independently, download them from the following links and indicate the path in the YAML configuration files (usually /Data/Model Zoo/DATASET FOLDER).

[Recommended] Alternatively, you can run the following script from the repository folder to download all available Model Zoo models:

bash ./Scripts/download_ModelZoo.sh

ADE20K

MIT Indoor 67

SUN 397

Places 365

Run Evaluation

To evaluate the models, run the evaluation.py file from the repository folder, indicating the dataset's YAML configuration path:

python evaluation.py --ConfigPath [PATH to configuration file]

Example for ADE20K Dataset:

python evaluation.py --ConfigPath Config/config_ADE20K.yaml

All desired configuration options (backbone architecture, model to load, batch size, etc.) should be changed in each dataset's separate YAML configuration file.
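For example, the MODEL section of a configuration file looks roughly like the following (the values are taken from the ADE20K config as printed in one of the issues below; the ARCH entry is an assumption inferred from the ResNet-18 checkpoint name):

```yaml
MODEL:
  ARCH: ResNet-18                               # backbone architecture to use
  PATH: ./Data/Model Zoo/ADEChallengeData2016/  # where the checkpoint lives
  NAME: SAScene_ResNet18_ADE                    # checkpoint file to load
  ONLY_RGB: FALSE                               # evaluate the RGB branch alone
  ONLY_SEM: FALSE                               # evaluate the semantic branch alone
```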

Computed performance metrics for both training and validation sets are:

  • Top@1
  • Top@2
  • Top@5
  • Mean Class Accuracy (MCA)
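For reference, these metrics can be computed from model logits roughly as follows. This is a hedged sketch mirroring the metric definitions, not the repository's utils.py.

```python
import torch

def topk_accuracy(logits, targets, ks=(1, 2, 5)):
    """Fraction of samples whose true label is among the k highest logits."""
    maxk = max(ks)
    _, pred = logits.topk(maxk, dim=1)            # (B, maxk) predicted class ids
    correct = pred.eq(targets.unsqueeze(1))       # (B, maxk) boolean hits
    return [correct[:, :k].any(dim=1).float().mean().item() for k in ks]

def mean_class_accuracy(logits, targets, num_classes):
    """Average of per-class accuracies; insensitive to class imbalance."""
    pred = logits.argmax(dim=1)
    accs = []
    for c in range(num_classes):
        mask = targets == c
        if mask.any():                            # skip classes absent from the set
            accs.append((pred[mask] == c).float().mean())
    return torch.stack(accs).mean().item()
```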

Citation

If you find this code and work useful, please consider citing:

@article{lopez2020semantic,
  title={Semantic-Aware Scene Recognition},
  author={L{\'o}pez-Cifuentes, Alejandro and Escudero-Vi{\~n}olo, Marcos and Besc{\'o}s, Jes{\'u}s and Garc{\'\i}a-Mart{\'\i}n, {\'A}lvaro},
  journal={Pattern Recognition},
  pages={107256},
  year={2020},
  publisher={Elsevier}
}

Acknowledgment

This study has been partially supported by the Spanish Government through its TEC2017-88169-R MobiNetVideo project.


Contributors

alexlopezcifuentes, dhatwalia, jiahangwu

Issues

ResNet-50 RGB-branch model

I used the ResNet-50 RGB-branch model on the MIT Indoor 67 dataset to predict images from the same dataset, but most of the results are wrong. Why is that?
Thanks a lot!

Question about two attention modules

There are two attention modules used in SASceneNet: one is the chained 3xChAM and the other is the "Attention Module".

Q1: What happens when features pass through the 3xChAM? (Does it concentrate on several specific channels strongly related to the scene?)
Q2: Why do we need 3 ChAMs rather than fewer or more? (Is it because 3 modules make the features concentrate more on the decisive features that help determine the scene?)
Q3: Why do we need the "Attention Module", and how does it differ from ChAM in function? (Is it like one judging "what" and the other "where", as in CBAM?)

I very much look forward to your reply.
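For readers unfamiliar with channel attention, a generic CBAM-style channel-attention block looks roughly like the sketch below. This is a reference illustration only; the repository's ChAM may differ in its details.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """CBAM-style channel attention: learn per-channel gates from pooled stats."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):                      # x: (B, C, H, W)
        avg = self.mlp(x.mean(dim=(2, 3)))     # squeeze via global average pool
        mx = self.mlp(x.amax(dim=(2, 3)))      # ...and via global max pool
        weights = torch.sigmoid(avg + mx)      # per-channel gates in (0, 1)
        return x * weights[:, :, None, None]   # reweight the channels
```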

How to test this model?

Thank you for the amazing repo!
However, it appears as if this repo is used for training the model. Is it possible to create a file that just takes an input image and generates an output from the trained model?

Places-365 noisy training data not found

I tried the bash ./Scripts/download_Places365_extra.sh ./Data/Datasets/places365_standard command to get the precomputed semantic segmentation masks.

The training masks are not downloaded; only the validation masks are included.

Please update/fix the .sh file to provide the training data for the precomputed semantic segmentation masks.

Thanks.

How to train your model: need the train.py file or the command to train

Hi, thank you for releasing your code. Could you provide the command to train your model?
evaluation.py quits with "No checkpoint found". I don't want to use your checkpoint; I would like to have my own created during training. Any idea how to do that with your current repository?

Thank you

noisy_annotations vs noisy_scores

|----images
	|--- training
		ADE_train_00000001.jpg
		...
	|--- validation
		ADE_val_00000001.jpg
		...
|----noisy_annotations_RGB
	|--- training
		ADE_train_00000001.png
		...
	|--- validation
		ADE_train_00000001.png
		...
|----noisy_scores_RGB
	|--- training
		ADE_train_00000001.png
		...
	|--- validation
		ADE_train_00000001.png
		...

The difference between the images folder and the other two folders seems to be .jpg vs .png. But what's the difference between the noisy_annotations_RGB folder and the noisy_scores_RGB folder? Or is there a typo?

And the validation folders of the noisy_annotations_RGB and noisy_scores_RGB folders should not contain train set files, as shown above, right?

The training code

Hello! I want to reproduce the experimental results, but I could not find the training code. Could you upload it, please?

Pre-trained model

First off, thank you very much for open-sourcing the code for your state-of-the-art results!

Would it be possible to open-source the best performing model you obtained?

I understand that you have provided some code, but I don't know if enough code/details are provided to train the model from scratch (like there is no train.py, no mention of how much computational power is required, etc.)...

Could one of the two things above be provided please?

How to convert semantic segmentation results into required format

Hi,
Thanks for your amazing work.

I ran into a question when applying this model to other, unseen data. The model requires two extra inputs: sem_labels and sem_scores. I checked your paper and couldn't find specific instructions on how to convert the original semantic segmentation results into these two new inputs.

The semantic segmentation model is this. The model will predict a W x H x L score matrix. Can you explain a little about the following operations?

Best,
Neo
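One plausible conversion is sketched below, assuming (from the "_RGB" suffix in the dataset layout shown in an earlier issue) that the three PNG channels hold the top-3 per-pixel class indices and their scores. This is a guess at the format, not the authors' documented procedure; save_semantic_outputs is a hypothetical helper.

```python
import numpy as np
from PIL import Image

def save_semantic_outputs(scores, label_path, score_path):
    """scores: float array of shape (H, W, L) with per-pixel class probabilities."""
    # Top-3 class ids per pixel, best first; ADE20K ids (<= 151) fit in uint8.
    top3 = np.argsort(scores, axis=2)[:, :, ::-1][:, :, :3]
    top3_scores = np.take_along_axis(scores, top3, axis=2)   # their probabilities
    # Pack the three ids / scores into the three channels of an RGB PNG.
    Image.fromarray(top3.astype(np.uint8), mode="RGB").save(label_path)
    Image.fromarray((top3_scores * 255).astype(np.uint8), mode="RGB").save(score_path)
```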

When I use the RGB_ResNet50_SUN model and select ONLY_RGB: TRUE, evaluation.py reports an error

In config_SUN397.yaml:

```yaml
MODEL:
  ARCH: ResNet-50
  PATH: ./Data/Model Zoo/SUN397/
  NAME: RGB_ResNet50_SUN
  ONLY_RGB: TRUE
  ONLY_SEM: FALSE

TRAINING:
  PRINT_FREQ: 10
  PRECOMPUTED_SEM: FALSE
  BATCH_SIZE:
    TRAIN: 100
    TEST: 1
  LR: 2.5e-4
  LR_DECAY: 10
  MOMENTUM: 0.9
  OPTIMIZER: DFW
  POLY_POWER: 0.9
  WEIGHT_DECAY: 5.0e-4
  AVERAGE_LOSS: 20

VALIDATION:
  PRINT_FREQ: 10
  BATCH_SIZE:
    TRAIN: 100
    TEST: 1
  TEN_CROPS: TRUE
```

error:
FileNotFoundError: [Errno 2] No such file or directory: './Data/Datasets/SUN397/noisy_annotations_RGB/val/conference_room/sun_aatxlublfjchvvzu.png'

The config sets PRECOMPUTED_SEM: FALSE; are the noisy_annotations_RGB and noisy_scores_RGB folders still required?

Thank you very much for answering my question!

Model zoo links expired

Hello, I am trying to download the models from the model zoo links, but it seems that they have expired. Could you fix that for us please? Thanks a lot.

Runtime Error while evaluating the model

I am getting the following error on running evaluation.py

RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

This is the stack trace:

Traceback (most recent call last):
  File "evaluation.py", line 300, in <module>
    val_top1, val_top2, val_top5, val_loss, val_ClassTPDic = evaluationDataLoader(val_loader, model, set='Validation')
  File "evaluation.py", line 110, in evaluationDataLoader
    prec1, prec2, prec5 = utils.accuracy(outputSceneLabel.data, sceneLabelGT, topk=(1, 2, 5))
  File "/content/Semantic-Aware-Scene-Recognition/Libs/Utils/utils.py", line 108, in accuracy
    correct_k = correct[:k].view(-1).float().sum(0)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

Could you please help me with this issue?
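A likely fix, following the error message's own suggestion, is to swap the .view call in Libs/Utils/utils.py for .reshape, which also handles non-contiguous tensors:

```python
# Libs/Utils/utils.py, line 108 — .reshape copes with non-contiguous tensors
# where .view cannot (this mirrors the error message's own suggestion).
correct_k = correct[:k].reshape(-1).float().sum(0)
```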

download dataset

I cannot download the datasets, such as ADE20K. Could you check that problem? I am in China. Thank you very much!

Why did the semantic score map become 152 channels?

Hello! First of all, thank you very much for sharing the source code of your paper, which has played a very positive role in my research work.
It is well known that the number of object classes in the ADE20K dataset is 150, but for some reason you set the number of channels in the source code to 152. Can you explain why?

Once again, I would like to express my sincere respect for your work

Thank you!

Hi,

I just wanted to say thank you for sharing your code and the pretrained weights. It's a super cool work! :)

AttributeError: Can't pickle local object 'ADE20KDataset.__init__.<locals>.<lambda>'

Hi, I downloaded all the available Model Zoo models and ran evaluation.py. However, there are some bugs in my project. Please give me some suggestions and solutions.
Thanks.

The error is the following:

Traceback (most recent call last):
  File "evaluation.py", line 279, in <module>
    sample = next(iter(val_loader))
  File "D:\Software\Code\Anaconda3\envs\SA-Scene-Recognition\lib\site-packages\torch\utils\data\dataloader.py", line 279, in __iter__
    return _MultiProcessingDataLoaderIter(self)
  File "D:\Software\Code\Anaconda3\envs\SA-Scene-Recognition\lib\site-packages\torch\utils\data\dataloader.py", line 719, in __init__
    w.start()
  File "D:\Software\Code\Anaconda3\envs\SA-Scene-Recognition\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "D:\Software\Code\Anaconda3\envs\SA-Scene-Recognition\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "D:\Software\Code\Anaconda3\envs\SA-Scene-Recognition\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "D:\Software\Code\Anaconda3\envs\SA-Scene-Recognition\lib\multiprocessing\popen_spawn_win32.py", line 89, in __init__
    reduction.dump(process_obj, to_child)
  File "D:\Software\Code\Anaconda3\envs\SA-Scene-Recognition\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'ADE20KDataset.__init__.<locals>.<lambda>'
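For what it's worth, this error typically arises because Windows uses spawn-based multiprocessing, which must pickle the dataset object, and a lambda defined inside __init__ cannot be pickled. A minimal illustration (not the repository's code):

```python
import pickle

def double(x):                             # module-level function: picklable
    return x * 2

class GoodDataset:
    def __init__(self):
        self.transform = double

class BadDataset:
    def __init__(self):
        self.transform = lambda x: x * 2   # local lambda: NOT picklable

pickle.dumps(GoodDataset())                # fine
# pickle.dumps(BadDataset())               # AttributeError: Can't pickle local object
# Alternatively, setting num_workers=0 on the DataLoader keeps loading in the
# main process, so the dataset never needs to be pickled at all.
```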

Model Zoo Links Broken

I tried downloading the models on the MIT Indoor 67, SUN 397, and Places 365 dataset, and all of these links seem to be broken. The only link that works is for the ADE20K dataset.

Can these please be fixed?

questions about the precomputed segmentation mask

Hi, I have just read your paper and I wonder how to get the precomputed semantic segmentation masks. Since the original feature maps have a dimension of 150, I want to know the code for generating the semantic segmentation maps and processing them into '.png' files.
Thanks very much!

evaluation.py KeyError: 'ARCH'

When I run python evaluation.py --ConfigPath Config/config_ADE20K.yaml after doing everything in the README.md, I get the following error:

-----------------------------------------------------------------
Evaluation starting...
-----------------------------------------------------------------
Evaluating complete model
Traceback (most recent call last):
  File "evaluation.py", line 168, in <module>
    print('Selected RG backbone architecture: ' + CONFIG['MODEL']['ARCH'])
KeyError: 'ARCH'

When I print out the contents of CONFIG['MODEL'], I get:

CONFIG[MODEL]: {'PATH': './Data/Model Zoo/ADEChallengeData2016/', 'NAME': 'SAScene_ResNet18_ADE', 'ONLY_RGB': False, 'ONLY_SEM': False}

Any ideas why I may be getting this error?
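Judging from the printed CONFIG contents, the MODEL section of this particular configuration file appears to be missing its ARCH entry. A plausible fix (the value is an assumption inferred from the SAScene_ResNet18_ADE checkpoint name) is to add it, mirroring the SUN 397 config shown in an earlier issue:

```yaml
MODEL:
  ARCH: ResNet-18   # assumed backbone, inferred from the checkpoint name
```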
