
Egoshots dataset and Semantic Fidelity metric

This repo contains code for our paper

Egoshots, an ego-vision life-logging dataset and semantic fidelity metric to evaluate diversity in image captioning models, accepted at the Machine Learning in Real Life (ML-IRL) ICLR 2020 workshop.

Dataset

Egoshots consists of real-life ego-vision images captioned with state-of-the-art image captioning models. It aims at evaluating the robustness, diversity, and sensitivity of these models, and at providing an in-the-wild life-logging dataset for evaluating captioning in real settings. The images were collected by two computer scientists while interning at Philips Research, Netherlands, for one month each.

Images are taken automatically by the Autographer wearable camera when events of interest are detected.

Egoshots dataset images are available at the Egoshots repo, with the corresponding (transfer-learning pre-trained) captions here.

Captioning Egoshots

Unlabelled images of the Egoshots dataset are captioned by exploiting different image captioning models. We limit our work to three models, namely:

  1. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
  2. nocaps: novel object captioning at scale
  3. Decoupled Novel Object Captioner

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

cd image-captioning/ShowAttendAndTell
conda create -n myenv
conda activate myenv
pip install -r requirements.txt

The images to be captioned need to be placed in the folder test/images/. The pre-trained weights of the image captioning network should be downloaded from this link and extracted into the current folder.

The pre-trained image captioning model can be used to caption the dataset by running the following command:

python main.py --phase=test \
    --model_file='./models/289999.npy' \
    --beam_size=3
  • All generated captions are saved in the test folder as results.csv; see the loading sketch below.
  • The code for captioning the Egoshots images and the pre-trained weights build upon this repository.
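For the later annotation steps it can help to inspect the generated captions programmatically. A minimal loading sketch, assuming results.csv is a standard CSV with a header row (the exact column names depend on the repository's output format):

import csv

# Load the Show, Attend and Tell captions; the column names in results.csv
# are not documented here, so inspect the header before relying on them.
with open("test/results.csv", newline="") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(row)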

nocaps: novel object captioning at scale

cd image-captioning/nocaps

To prevent version mismatches (such as conflicts between the GPU TensorFlow build and Caffe), a separate virtual environment is used:

conda create -n caffe
conda activate caffe
pip install -r requirements.txt

The images to be captioned are put in the folder images/. The pre-trained weights are downloaded using

./download_models.sh

Images are captioned by

python noc_captioner.py
  • The generated captions are saved in the results folder; see the collection sketch below.
  • The code for captioning the images and the pre-trained weights build upon this repository.
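If you want the nocaps outputs in one file, a minimal collection sketch follows; it assumes the results folder contains one plain-text caption file per image, which may differ from the actual output format of noc_captioner.py:

import csv, glob, os

# Gather per-image caption files from results/ into a single CSV.
# One .txt file per image is an assumption; adapt to the real layout.
rows = []
for path in sorted(glob.glob("results/*.txt")):
    with open(path) as f:
        caption = f.read().strip()
    rows.append((os.path.splitext(os.path.basename(path))[0], caption))
with open("nocaps_captions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["image", "caption"])
    writer.writerows(rows)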

Decoupled Novel Object Captioner

cd image-captioning/dnoc

The images to be captioned are put in the folder prepare_data/mscoco/val2014/. All images are pre-processed using the following commands:

conda activate myenv
cd prepare_data
sh step2_detection.sh
sh step3_image_feature_extraction.sh
sh step4_transfer_coco_to_noc.sh
python run.py
cd ..

The pre-trained weights of the model can be downloaded from here. The pre-processed images are captioned using

python run.py --stage test
  • All generated captions are saved in dnoc_ego.txt; see the parsing sketch below.
  • The code for preparing the data and captioning the images, as well as the pre-trained weights, build upon this repository.
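To reuse the DNOC captions in later steps, they can be read back from dnoc_ego.txt. A minimal parsing sketch, assuming one line per image with the image identifier followed by its caption (verify against the actual file layout):

# Parse dnoc_ego.txt into (image, caption) pairs.
# The one-line-per-image, tab- or space-separated format is an assumption.
pairs = []
with open("dnoc_ego.txt") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        image, _, caption = line.partition("\t")
        if not caption:
            image, _, caption = line.partition(" ")
        pairs.append((image, caption.strip()))
print(f"Loaded {len(pairs)} captions")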

Object-Detector

To measure the Semantic Fidelity of a given caption, all object classes present in the image also need to be detected. The initial metric uses YOLO-9000 because of its ability to detect 9000 different object classes.

YOLO-9000

cd image-captioning/YOLO-9000

To detect all object classes present in each Egoshots image, YOLO-9000 is used. The detection code and pre-trained weights are taken from this repository. All images are stored in darknet/data/EgoShots/. To run the YOLO-9000 object detector on all images of the Egoshots dataset, run:

for i in data/EgoShots/*.jpg; do ./darknet detector test cfg/combine9k.data cfg/yolo9000.cfg ../yolo9000-weights/yolo9000.weights "$i" ; done > detected_object.txt
  • The detected objects for each image are stored in the file detected_object.txt; see the parsing sketch below.
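The raw darknet log can then be grouped into per-image object lists. A minimal parsing sketch, assuming the usual darknet console format in which each image line contains "Predicted in ... seconds." and is followed by "label: confidence%" lines (adjust if your build prints differently):

import re
from collections import defaultdict

# Group detected object labels by image from the darknet console log.
objects = defaultdict(list)
current_image = None
with open("detected_object.txt") as f:
    for line in f:
        line = line.strip()
        if "Predicted in" in line:
            current_image = line.split(":")[0]  # e.g. data/EgoShots/0001.jpg
        elif current_image and re.match(r".+: \d+%$", line):
            objects[current_image].append(line.rsplit(":", 1)[0])
for image, labels in objects.items():
    print(image, sorted(set(labels)))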

Objects and Caption Annotation

Merge the individual CSVs into two files: Captions.csv, listing the corresponding captions for each image, and Objects.csv, listing all detected object classes for each image.

python caption_annotation.py
python image-captioning/YOLO-9000/darknet/object_detector_annotation.py

Semantic Fidelity (SF) metric computation

python metrics.py

The code calculates the Semantic Fidelity value for each caption, and the final values are saved as Meta-data.csv.
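The exact SF formula lives in metrics.py and the paper; as a rough intuition, SF rewards captions that mention the object classes detected in the image. The sketch below illustrates that idea only (it is not the paper's formula) and assumes placeholder image, caption, and objects columns in Captions.csv and Objects.csv:

import pandas as pd

# Illustrative caption/object overlap score, NOT the paper's SF formula.
captions = pd.read_csv("Captions.csv")  # assumed columns: image, caption
objects = pd.read_csv("Objects.csv")    # assumed columns: image, objects

def overlap_score(caption, object_list):
    words = set(str(caption).lower().split())
    labels = [o.strip().lower() for o in str(object_list).split(",") if o.strip()]
    return sum(label in words for label in labels) / len(labels) if labels else 0.0

merged = captions.merge(objects, on="image")
merged["sf_like_score"] = [
    overlap_score(c, o) for c, o in zip(merged["caption"], merged["objects"])
]
print(merged.head())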

SF and its variants

To validate the initial SF metric, we compare it against manually annotated images through a Human Semantic Fidelity metric, and compare several SF variants using the Pearson correlation coefficient (ρ) and the coefficient of determination (R²):

The notebook SFs_plot.ipynb shows regression plots, with corresponding confidence intervals, for the different SF variants.
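The agreement statistics themselves are easy to reproduce. A minimal sketch using scipy and scikit-learn, assuming Meta-data.csv exposes one column per SF variant and one for the human annotation (the column names below are placeholders):

import pandas as pd
from scipy.stats import pearsonr
from sklearn.metrics import r2_score

# Compare an SF variant against the human Semantic Fidelity annotations.
data = pd.read_csv("Meta-data.csv")
human = data["human_sf"]          # placeholder column name
for variant in ["sf"]:            # add further SF variant columns here
    rho, p_value = pearsonr(data[variant], human)
    r2 = r2_score(human, data[variant])
    print(f"{variant}: rho={rho:.3f} (p={p_value:.3g}), R^2={r2:.3f}")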

Final Caption

The calculated Semantic Fidelity is used to output the final captions (ordered from highest to lowest SF) for a given image:

python final_caption.py --image ****.jpg
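Conceptually, final_caption.py ranks the candidate captions of an image by their SF value. A minimal sketch of that ranking step, again with placeholder column names for Meta-data.csv:

import argparse
import pandas as pd

# Rank the candidate captions of one image by Semantic Fidelity.
# The "image", "caption" and "sf" column names are placeholders.
parser = argparse.ArgumentParser()
parser.add_argument("--image", required=True)
args = parser.parse_args()
data = pd.read_csv("Meta-data.csv")
rows = data[data["image"] == args.image].sort_values("sf", ascending=False)
for _, row in rows.iterrows():
    print(f"{row['sf']:.3f}  {row['caption']}")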

Acknowledgement

We thank Meng-Jiun Chiou, vsubhashini, Yu-Wu, and Philippe Rémy for releasing the pre-trained weights of the image captioning and object detection models, which helped in labelling the Egoshots dataset.

Paper associated

If you use this dataset or metric, please cite:

Egoshots, an ego-vision life-logging dataset and semantic fidelity metric to evaluate diversity in image captioning models. Pranav Agarwal, Alejandro Betancourt, Vana Panagiotou, Natalia Díaz-Rodríguez. Machine Learning in Real Life (ML-IRL) ICLR 2020 Workshop. https://arxiv.org/abs/2003.11743

@InProceedings{Agarwal20egoshots,
    author={Pranav Agarwal and Alejandro Betancourt and Vana Panagiotou and Natalia Díaz-Rodríguez},
    year={2020},
    month = {Mar},
    title = {Egoshots, an ego-vision life-logging dataset and semantic fidelity metric to evaluate diversity in image captioning models},
    booktitle = {Machine Learning in Real Life (ML-IRL) Workshop at the International Conference on Learning Representations  (ICLR)},
    url={https://arxiv.org/abs/2003.11743}
}
