Identify Known Sites in Photo Album

Introduction:

This is a university project based on the Google Landmark Recognition 2020 Kaggle competition.

The goal of this project is to correctly classify images of known sites from around the world, given a large and challenging train set to learn from and a test set that contains mainly out-of-domain images.

To address the special and challenging features of the data set, we proposed and implemented two possible solutions using machine learning techniques.

The first solution, a baseline, is a simple, straightforward approach: training a CNN (EfficientNet with the RAdam optimizer) and using it as a classifier. This solution fails to overcome the challenging aspects of the data set and yields poor results.

The second solution is a retrieval-based solution that draws inspiration from other teams' solutions to this competition.

This solution consists of two steps. The first is to clean the test set of out-of-domain images using object detection (we used the YOLO Darknet implementation). Object detection examples:

[Images: two object detection examples]
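The cleaning step can be sketched as a filter over detector output. This is only an illustration, not the project's actual code: it assumes the detections have already been parsed into (label, confidence) pairs per image, and the class list, the threshold and the function names are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical label set: classes treated as evidence that an image shows a landmark.
# The classes actually used in the project are not stated in this README.
LANDMARK_LIKE_CLASSES = {"Building", "Tower", "Castle", "Sculpture", "Fountain"}

# One detection is a (class label, confidence in [0, 1]) pair.
Detection = Tuple[str, float]


def is_in_domain(detections: List[Detection], min_confidence: float = 0.5) -> bool:
    """Return True if any landmark-like object is detected confidently enough."""
    return any(
        label in LANDMARK_LIKE_CLASSES and confidence >= min_confidence
        for label, confidence in detections
    )


def clean_test_set(
    detections_per_image: Dict[str, List[Detection]],
    min_confidence: float = 0.5,
) -> List[str]:
    """Keep only the image ids whose detections suggest a landmark."""
    return [
        image_id
        for image_id, detections in detections_per_image.items()
        if is_in_domain(detections, min_confidence)
    ]


# Made-up detector output for four test images.
example = {
    "img_001": [("Building", 0.91), ("Person", 0.40)],
    "img_002": [("Person", 0.88), ("Dog", 0.75)],
    "img_003": [("Tower", 0.30)],
    "img_004": [],
}
kept = clean_test_set(example)
print(kept)
```

Lowering `min_confidence` keeps more borderline images (here `img_003`), trading fewer discarded landmarks for more out-of-domain images passed on to the classifier.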

The second step is classification with a nearest neighbor algorithm applied to the images' feature vectors.
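A minimal sketch of nearest neighbor classification over feature vectors follows. It is written in plain Python for readability and makes assumptions the README does not confirm: cosine similarity as the distance measure, a majority vote over the neighbors, and made-up two-dimensional features and labels. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_classify(
    query: Sequence[float],
    train_features: List[Sequence[float]],
    train_labels: List[str],
    k: int = 5,
) -> Tuple[str, List[int]]:
    """Return the majority label among the k most similar train vectors,
    together with the indices of those neighbors (most similar first)."""
    ranked = sorted(
        range(len(train_features)),
        key=lambda i: cosine_similarity(query, train_features[i]),
        reverse=True,
    )
    neighbors = ranked[:k]
    votes = Counter(train_labels[i] for i in neighbors)
    predicted = votes.most_common(1)[0][0]
    return predicted, neighbors


# Toy train set: in practice the vectors would come from a feature extractor.
train_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.8, 0.2]]
train_labels = ["eiffel_tower", "eiffel_tower", "colosseum", "colosseum", "eiffel_tower"]

predicted, neighbors = knn_classify([1.0, 0.05], train_features, train_labels, k=3)
print(predicted, neighbors)
```

Returning the neighbor indices alongside the label makes it easy to display the retrieved train images next to the query image.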

The power of using feature vectors and K-NN (the test set image is on the left; next to it are its 5 nearest neighbors from the train set. A small red X in the lower right corner of an image means that its class differs from that of the test set image):

[Images: three K-NN retrieval examples]

Even when the classification is not successful, the nearest neighbors still bear some resemblance to the test set image:

[Image: K-NN retrieval example with an unsuccessful classification]

This solution is built to address the challenging features of the data set, and although the results it yields are far from great, they are much better than the baseline's results.

Prerequisites

To run the whole code of this project, one needs the following libraries (in the specified version or higher):

Library Version
Python 3.6
torch 1.8.0
torchvision 0.9.0
pandas 1.25.0
numpy 1.19.0
opencv 4.2.0
matplotlib 3.2.1
seaborn 0.11.0
efficientnet_pytorch 0.7.0
torch_optimizer 0.1.0
sklearn 0.21.3
Pillow 6.1.0
tqdm 4.55.0
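One way to check an environment against these minimum versions is sketched below. It is an assumption-laden helper, not part of the repository: it relies on `importlib.metadata` (Python 3.8 or newer), lists only a subset of the table for brevity, and assumes the names given are the pip distribution names.

```python
import re
from importlib import metadata
from typing import Dict, Tuple

# Subset of the table above; keys are assumed to be pip distribution names.
MINIMUM_VERSIONS = {
    "torch": "1.8.0",
    "torchvision": "0.9.0",
    "numpy": "1.19.0",
    "tqdm": "4.55.0",
}


def version_tuple(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric components, e.g. '1.8.0+cu111' -> (1, 8, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, padding with zeros so '1.8' equals '1.8.0'."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums: Dict[str, str]) -> Dict[str, str]:
    """Return 'ok', 'too old (<version>)' or 'missing' for each package."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed})"
    return report


for name, status in check_requirements(MINIMUM_VERSIONS).items():
    print(f"{name}: {status}")
```

The comparison ignores pre-release and local version suffixes, which is enough for a quick sanity check but not a full version-specifier implementation.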

In this project we also used the YOLO Darknet implementation as an object detector. We used version 3 and version 4 networks that were pre-trained on the Open Images Dataset and the COCO Dataset, respectively.

Much of the code in this project is part of Jupyter notebooks. Unfortunately, GitHub is not able to render all the notebooks successfully, so one can download them and run them locally or via Colab, or view them using nbviewer with the links in the nbviewer directory.

Code and Repository Organization

The code we wrote for this project is organized in sub-directories, one for each part of the project. Each sub-directory contains the relevant code files (.py or .ipynb) and may contain csv files or images.

Sub-Directory Content
\baseline directory containing implementation of the baseline, results and evaluation
\data directory containing GLDv2 dataset analysis
\feature_extraction directory containing implementation of feature extraction and K-NN classifier
\images directory containing images used in this repository
\landmark_classifier directory containing pre-processing of the data as input to the YOLO Darknet implementation, and analysis and evaluation of its results
\nbviewer directory containing nbviewer links for the Jupyter notebooks in this repository
\poster directory containing the project poster
\results_and_evaluation directory containing the classification results and evaluation

We tried to keep the code organized and well documented.

Team:

Matan Kleiner

Yuval Snir

Supervised by Ori Linial

References

[1] T. Weyand, A. Araujo, B. Cao and J. Sim, "Google Landmarks Dataset v2 - A Large-Scale Benchmark for Instance-Level Recognition and Retrieval", Proc. CVPR'20.

[2] K. Chen et al., "2nd Place and 2nd Place Solution to Kaggle Landmark Recognition and Retrieval Competition 2019", arXiv:1906.03990 [cs.CV], Jun. 2019.

[3] J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement", arXiv:1804.02767v1 [cs.CV], Apr. 2018.

[4] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet classification with deep convolutional neural networks", In Proceedings of NIPS, pages 1106–1114, 2012.
