Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction

Introduction

Scene graph prediction is the task of mapping an image into a set of bounding boxes, along with their categories and relations (e.g., see [2, 3, 4, 5, 6]).

In the paper Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction (2018) [1], we present a new architecture for graph inference with the following structural property: on the one hand, the architecture is invariant to input permutations; on the other hand, every permutation-invariant function can be implemented via this architecture.
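As a compact statement of this property, the following sketch uses the notation of the paper [1], where z_i denotes the features of entity i and z_{i,j} the features of the ordered pair (i, j). The first line says that relabeling the input entities relabels the outputs in the same way; the second line is the functional form that, per the paper, characterizes such functions. The symbols are taken from the paper rather than from this README, so please consult [1] for the precise conditions.

```latex
% Graph-permutation invariance: permuting the inputs permutes the outputs.
\mathcal{F}(\sigma(z)) = \sigma(\mathcal{F}(z))
\quad \text{for every permutation } \sigma \text{ of the } n \text{ entities}

% Characterization: F has this property iff it can be written as
[\mathcal{F}(z)]_k = \rho\Big( z_k,\ \sum_{i=1}^{n} \alpha\Big( z_i,\ \sum_{j \neq i} \phi(z_i, z_{i,j}, z_j) \Big) \Big)
```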

In this repository, we share our architecture implementation for the task of scene graph prediction.

Model implementation

Scene Graph Predictor (SGP) takes as input initial confidence distributions per entity and relation and processes them to obtain new labels. SGP satisfies the graph permutation invariance property introduced in the paper. The model is implemented in TensorFlow. For the initial confidence distributions per entity and relation, we simply reuse features learned by the baseline model from Zellers et al. (2017) [6] (git repository: https://github.com/rowanz/neural-motifs).

SGP architecture

Our SGP implementation uses an iterative RNN to process predictions. Each step outputs improved predictions.

A schematic representation of the architecture, in the notation of the paper [1]. Given an image, a label predictor outputs initial predictions z_i (entities) and z_{i,j} (relations). Then, our SGP model computes each φ(z_i, z_{i,j}, z_j) element-wise. Next, they are summed to create a vector s_i, which is concatenated with z_i. Then, α is applied, and another summation creates the graph representation g. Finally, ρ_entity classifies objects and ρ_relation classifies relations. The SGP process can be repeated iteratively (in the paper we repeat it 3 times).
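The data flow in the schematic above can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's TensorFlow code: the helper names (`mlp`, `make_sgp`, `step`), the random single-layer networks standing in for the learned functions φ, α and ρ, and all dimensions are assumptions made for the example.

```python
import numpy as np

def mlp(rng, in_dim, out_dim):
    """Random one-layer ReLU map; a stand-in for a learned network (assumption)."""
    w = rng.standard_normal((in_dim, out_dim)) / np.sqrt(in_dim)
    return lambda x: np.maximum(x @ w, 0.0)

def make_sgp(rng, d_ent, d_rel, hidden=16):
    """Build one SGP step over entity features (n, d_ent) and relation features (n, n, d_rel)."""
    phi = mlp(rng, 2 * d_ent + d_rel, hidden)
    alpha = mlp(rng, d_ent + hidden, hidden)
    rho_entity = mlp(rng, d_ent + hidden, d_ent)
    rho_relation = mlp(rng, 2 * d_ent + d_rel + hidden, d_rel)

    def step(z_ent, z_rel):
        n = z_ent.shape[0]
        zi = np.broadcast_to(z_ent[:, None, :], (n, n, d_ent))
        zj = np.broadcast_to(z_ent[None, :, :], (n, n, d_ent))
        pair = np.concatenate([zi, z_rel, zj], axis=-1)
        phi_ij = phi(pair)                               # phi applied to every pair (i, j)
        mask = 1.0 - np.eye(n)[:, :, None]               # exclude j == i from the sum
        s = (phi_ij * mask).sum(axis=1)                  # s_i: sum over neighbors j
        a = alpha(np.concatenate([z_ent, s], axis=-1))   # alpha on [z_i, s_i]
        g = a.sum(axis=0)                                # graph representation
        g_ent = np.broadcast_to(g, (n, hidden))
        g_rel = np.broadcast_to(g, (n, n, hidden))
        new_ent = rho_entity(np.concatenate([z_ent, g_ent], axis=-1))
        new_rel = rho_relation(np.concatenate([pair, g_rel], axis=-1))
        return new_ent, new_rel

    return step

rng = np.random.default_rng(0)
step = make_sgp(rng, d_ent=5, d_rel=4)
z_ent = rng.standard_normal((6, 5))       # 6 entities
z_rel = rng.standard_normal((6, 6, 4))    # one feature vector per ordered pair
for _ in range(3):                        # repeat the step three times
    z_ent, z_rel = step(z_ent, z_rel)
```

Because every aggregation is a plain sum, relabeling the entities only relabels the rows of the outputs, which is the invariance property the architecture is built around.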

For more information, please look at the code (Module/Module.py file) and the paper.

Attention with SGP architecture

Our SGP architecture uses attention at the feature level for each node during inference. We weight the significance of each feature per node, so that the network can choose which features from adjacent nodes contribute the most information.
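One way to picture feature-level attention is as a weighted sum in which every feature dimension gets its own softmax over the neighboring nodes. The sketch below assumes exactly that; the function names, the shapes, and the choice of a softmax over neighbors are illustrative assumptions, not a description of the code in Module/Module.py. For brevity it does not mask out the node itself.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def feature_attention_pool(features, scores):
    """Pool neighbor features with one attention weight per feature.

    features, scores: arrays of shape (n, n, d), indexed [node i, neighbor j, feature f].
    Returns the pooled features of shape (n, d) and the attention weights (n, n, d).
    """
    weights = softmax(scores, axis=1)            # normalize over neighbors j, per feature
    pooled = (weights * features).sum(axis=1)    # weighted sum replaces the plain sum
    return pooled, weights

rng = np.random.default_rng(0)
features = rng.standard_normal((4, 4, 3))
scores = rng.standard_normal((4, 4, 3))          # would come from a learned network
pooled, weights = feature_attention_pool(features, scores)
```

With constant scores the weights become uniform and the pooling reduces to a mean over neighbors, so the attention can only help by deviating from that baseline.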

An example of per-entity attention and global attention over all nodes. The size and location of objects provide a key signal to the attention mechanism. The model assigns a higher confidence to the label "tie" when the label "shirt" is detected (third panel from the left). Similarly, the model assigns a higher confidence to the label "eye" when it is located near "hair".

Dependencies

To get started with the framework, install the dependencies:

Run "pip install -r requirements.txt" to install all the requirements.

Usage

  1. Run "python Run.py download" to download and extract the train, validation and test data. The data already contains the result of applying the baseline detector to the VisualGenome data.
  2. Run "python Run.py eval gpi_linguistic_pretrained <gpu-number>" to evaluate the pre-trained model of our best variant, linguistic with multi-head attention (recall@100 SG Classification).
  3. Run "python Run.py train gpi_linguistic <gpu-number>" to train a new model (linguistic with multi-head attention).
  4. Run "python Run.py eval gpi_linguistic_best <gpu-number>" to evaluate the new model (recall@100 SG Classification).
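For readers unfamiliar with the metric reported by the evaluation steps above, recall@100 is commonly computed as the fraction of ground-truth triplets that appear among the 100 highest-scoring predicted triplets of an image. The sketch below illustrates that idea only. The function name `recall_at_k` and the representation of triplets as (subject, predicate, object) string tuples are assumptions for the example; the repository's evaluation script works on its own data structures.

```python
def recall_at_k(gt_triplets, scored_triplets, k=100):
    """Fraction of ground-truth triplets found among the top-k predictions.

    gt_triplets: iterable of (subject, predicate, object) tuples.
    scored_triplets: iterable of (triplet, score) pairs.
    """
    gt = set(gt_triplets)
    if not gt:
        return 0.0
    ranked = sorted(scored_triplets, key=lambda pair: pair[1], reverse=True)
    top_k = {triplet for triplet, _ in ranked[:k]}
    return len(gt & top_k) / len(gt)

# Toy example with made-up labels and scores.
ground_truth = [("man", "wearing", "shirt"), ("man", "has", "hair")]
predictions = [
    (("man", "wearing", "shirt"), 0.9),
    (("man", "on", "street"), 0.8),
    (("man", "has", "hair"), 0.1),
]
score_top2 = recall_at_k(ground_truth, predictions, k=2)
score_top3 = recall_at_k(ground_truth, predictions, k=3)
```

In the toy example only one of the two ground-truth triplets is within the top 2, while both are within the top 3.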

About this repository

This repository contains an implementation of our best variant (linguistic with multi-head attention) of the Scene Graph Prediction (SGP) model introduced in the paper Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction [1]. (The repository is updated for version 1 of the paper; the results of the latest version will be published in the future.) Specifically, the repository allows you to run the scene-graph classification (recall@100) evaluation script on our pre-trained model, or alternatively to (1) train an SGP model and (2) evaluate the trained model using the scene-graph classification (recall@100) evaluation script.

References

[1] Roei Herzig, Moshiko Raboh, Gal Chechik, Jonathan Berant, Amir Globerson, Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction, 2018.

[2] Justin Johnson, Ranjay Krishna, Michael Stark, Li-Jia Li, David A. Shamma, Michael S. Bernstein, Li Fei-Fei, Image Retrieval using Scene Graphs, CVPR, 2015.

[3] Cewu Lu, Ranjay Krishna, Michael S. Bernstein, Li Fei-Fei, Visual Relationship Detection with Language Priors, ECCV, 2016.

[4] Danfei Xu, Yuke Zhu, Christopher Choy, Li Fei-Fei, Scene Graph Generation by Iterative Message Passing, CVPR, 2017.

[5] Alejandro Newell and Jia Deng, Pixels to Graphs by Associative Embedding, NIPS, 2017.

[6] Rowan Zellers, Mark Yatskar, Sam Thomson, Yejin Choi, Neural Motifs: Scene Graph Parsing with Global Context, CVPR, 2018.

Cite

Please cite our paper if you use this code in your own work:

@inproceedings{hr18perminvimg2sg,
  author    = {Roei Herzig and
               Moshiko Raboh and
               Gal Chechik and
               Jonathan Berant and
               Amir Globerson},
  title     = {Mapping Images to Scene Graphs with Permutation-Invariant Structured
               Prediction},
  booktitle = {Advances in Neural Information Processing Systems (NIPS)},
  year      = {2018}
}
