Vision Language Warehouse

Bridging visual modalities and natural language is an interesting yet challenging task. It is attracting growing research attention and requires interdisciplinary effort spanning Computer Vision, Natural Language Processing and Machine Learning.

This repository contains recent papers, projects and materials on Image Captioning, Text-Image Matching and Text-to-Image Generation.

Contents

Image captioning

Template-based methods

VIsual TRAnslator: Linking perceptions and natural language descriptions PDF

Learning visually grounded words and syntax for a scene description task PDF

Every picture tells a story: Generating sentences from images PDF

BabyTalk: Understanding and generating simple image descriptions PDF

Deep-learning-based approaches

Show and Tell: A Neural Image Caption Generator (CVPR2015) PDF

Deep Visual-Semantic Alignments for Generating Image Descriptions (CVPR2015) PDF code site

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (ICML2015) PDF code site

Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks (NIPS2015) PDF

Areas of Attention for Image Captioning (ICCV2017) PDF

Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning (CVPR2017) PDF code

SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning (CVPR2017) PDF code

Self-critical Sequence Training for Image Captioning (CVPR2017) PDF

Stack-Captioning: Coarse-to-Fine Learning for Image Captioning (AAAI2018) PDF code

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering (CVPR2018) PDF code

Convolutional Image Captioning (CVPR2018) PDF code

Rethinking the Form of Latent States in Image Captioning (ECCV2018) PDF code

Recurrent Fusion Network for Image Captioning (ECCV2018) PDF

Materials

GitHub repositories

pytorch-tutorial/image_captioning

ruotianluo/ImageCaptioning.pytorch

tylin/coco-caption

alecwangcq/show-attend-and-tell

sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning

daveredrum/image-captioning

Docs

Deep Visual-Semantic Alignments for Generating Image Descriptions

Automated Image Captioning

Caption this, with TensorFlow

Soft & hard attention

Text-Image Matching

Cross-modal Retrieval with Correspondence Autoencoder (ACMMM2014) PDF

Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models (arXiv 2014) PDF

Multimodal Convolutional Neural Networks for Matching Image and Sentence (ICCV2015) PDF

Identity-Aware Textual-Visual Matching with Latent Co-attention (ICCV2017) PDF

Instance-aware Image and Sentence Matching with Selective Multimodal LSTM (CVPR2017) PDF

Deep Cross-Modal Projection Learning for Image-Text Matching (ECCV2018) PDF

End-to-end cross-modality retrieval with CCA projections and pairwise ranking loss (IJMIR2018) PDF

Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models (CVPR2018) PDF

Text-to-Image Generation

Generating Images From Captions with Attention (ICLR2016) PDF code

Learning What and Where to Draw (NIPS2016) PDF code

Generative Adversarial Text to Image Synthesis (ICML2016) PDF code

StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks (ICCV2017) PDF code

ChatPainter: Improving Text to Image Generation using Dialogue (arXiv 2018) PDF

AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks (CVPR2018) PDF code code

Text2Scene: Generating Abstract Scenes from Textual Descriptions (arXiv 2018) PDF

Contributors

daveredrum