
bachelorthesis's Introduction

Understanding Variational Autoencoders' Latent Representations of Remote Sensing Images

Disclaimer: I would not write code like this today :3

Abstract

In computer vision, neural networks have been successfully employed to solve multiple tasks in parallel within one model. The joint exploitation of related objectives can improve the performance of each individual task. The architecture of these multi-task models can be more complex, since the networks branch into the atomic tasks at a certain depth. For this reason, the process of designing such architectures often involves a lot of time-consuming trial and error. A more systematic taxonomy is therefore desirable to replace this experimental process. Constructing this taxonomy requires an understanding of the latent information learned by specific layers of single-task models. This work uses convolutional variational autoencoders to produce latent representations of aerial images, which are analyzed to understand the inner workings of the models. The method relies on testing whether or not learned clusters can be attributed to different high-level input features, such as topographic classes common in the field of remote sensing. Visualizations are produced to gain insight into the information captured in the latent space of the variational autoencoders. Moreover, it is observed how different architectural choices affect the reconstructions and the latent space. Code to reproduce the experiments is publicly available here: https://github.com/HannesStaerk/bachelorThesis.

Models

Every script like Kernel3adjusted2x2x256.py contains an architecture with an adjustable coding size for the latent vector. A data_source_dir containing 128x128 images has to be specified; functions can then be called to train, to make predictions and generations, and to create the t-SNE and PCA visualizations.
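The visualization step can be illustrated independently of the repository's scripts. Below is a minimal, stand-alone sketch (not the repository's actual code) that projects latent vectors onto their first two principal components via an SVD-based PCA; the real scripts additionally use t-SNE:

```python
import numpy as np

def pca_2d(latents: np.ndarray) -> np.ndarray:
    """Project latent vectors of shape (n_samples, coding_size) onto
    their first two principal components for 2D visualization."""
    centered = latents - latents.mean(axis=0)
    # SVD of the centered data; rows of vt are the principal directions,
    # ordered by decreasing singular value (i.e. explained variance)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Example: 100 hypothetical 32-dimensional latent codes
rng = np.random.default_rng(0)
codes = rng.normal(size=(100, 32))
projected = pca_2d(codes)
print(projected.shape)  # (100, 2)
```

The 2D points can then be scatter-plotted and colored by topographic class to check whether clusters align with high-level input features.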

Data

In utils.py there is a function to split and resize images. This can be used to split or resize the 1024x1024 images given by the dataset into the 128x128 images that the models take as input.
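The splitting step can be sketched as follows. This is a minimal stand-alone version using NumPy, not the implementation in utils.py:

```python
import numpy as np

def split_image(image: np.ndarray, tile: int = 128) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping (tile, tile, C)
    patches, returned as an array of shape (n_patches, tile, tile, C)."""
    h, w, c = image.shape
    assert h % tile == 0 and w % tile == 0, "image must divide evenly"
    # Reshape into a grid of tiles, then flatten the grid dimensions
    patches = image.reshape(h // tile, tile, w // tile, tile, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, tile, tile, c)

# A 1024x1024 RGB image yields 64 patches of 128x128
img = np.zeros((1024, 1024, 3), dtype=np.uint8)
print(split_image(img).shape)  # (64, 128, 128, 3)
```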

The data can be downloaded here: http://www.grss-ieee.org/community/technical-committees/data-fusion/2019-ieee-grss-data-fusion-contest-data/

bachelorthesis's Issues

Description of architecture

I need much more information about the architecture. I think it would be meaningful if we always had a short table listing the successively applied layers (convolutions and their strides, (un)pooling layers, ReLU functions, skip connections) as well as the input image size and the resulting feature map sizes. Can you create a document for the latest architecture that describes this sequence? With a current description we can discuss with Matthias a little better.

Data Preprocessing

Initially, the network has to work on images that are very different from those of the MNIST dataset. First, we have to decrease the resolution of the images by a factor of 4 using an interpolation method (bilinear interpolation?) in order to use the proposed architecture.

If the resolution has to stay approximately the same as the input, we can instead downscale the images by a factor of 1.5 or 2 and divide them into subpatches. Would this be a better approach?
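Downscaling by an integer factor can be sketched with simple block averaging. This is a crude stand-in for bilinear interpolation (and only works for integer factors, so a factor of 1.5 would need a real resampling routine such as a library resize function):

```python
import numpy as np

def downscale(image: np.ndarray, factor: int) -> np.ndarray:
    """Downscale an (H, W, C) image by averaging factor x factor blocks."""
    h, w, c = image.shape
    assert h % factor == 0 and w % factor == 0, "factor must divide H and W"
    # Group pixels into factor x factor blocks and average each block
    blocks = image.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))

# Reducing a 1024x1024 image by a factor of 4 gives 256x256
img = np.ones((1024, 1024, 3))
print(downscale(img, 4).shape)  # (256, 256, 3)
```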

Data Augmentation

  • If data with very wide and different appearances are utilized, network convergence will be difficult and require larger amounts of training data
  • A deep CNN has millions of parameters, which require a lot of training data in order to prevent over-fitting
  • Synthetic data augmentation:
    • Multi-resolution
    • Jittering
    • Rotation (images are rotated by an $angle \in [-15, 15]$ degrees)
    • Translation (images are cropped randomly - RGB and DSM)
    • Flip (images are flipped horizontally and vertically)
    • Color space (each band is multiplied by a random $value \in [0.5, 1.5]$)

I will need such techniques for further work with other students, so I have to implement these data augmentation methods.
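The augmentation steps above can be sketched in NumPy. Rotation by an arbitrary angle is omitted here because it requires an interpolation routine (e.g. scipy.ndimage.rotate); all names in this sketch are illustrative, not taken from the repository:

```python
import numpy as np

def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply random flips, a random 128x128 crop (translation), and a
    per-band color-space scaling to an (H, W, C) image."""
    out = image
    # Flip horizontally and/or vertically with probability 0.5 each
    if rng.random() < 0.5:
        out = out[:, ::-1]
    if rng.random() < 0.5:
        out = out[::-1, :]
    # Translation: crop a random 128x128 window
    h, w, _ = out.shape
    y = rng.integers(0, h - 128 + 1)
    x = rng.integers(0, w - 128 + 1)
    out = out[y:y + 128, x:x + 128]
    # Color space: multiply each band by a random value in [0.5, 1.5]
    scales = rng.uniform(0.5, 1.5, size=out.shape[-1])
    return out * scales

rng = np.random.default_rng(42)
img = np.ones((160, 160, 3))
print(augment(img, rng).shape)  # (128, 128, 3)
```

For aligned RGB and DSM inputs, the same random crop and flip decisions would have to be applied to both modalities.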
