Paper Collection - A List of Computer Vision Papers and Notes

Image Classification
Popular Module
Object Detection in Image
Image Caption
Image Generation
Image and Language
Activation Maximization
Style Transfer
Super Resolution
Image Segmentation
Open Courses
Online Books

Image Classification

Network in Network [Paper] [Note] [Torch Code]

Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." arXiv preprint arXiv:1312.4400 (2013).

VGG [Paper] [Note] [Torch Code]

Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).

GoogLeNet [Paper] [Note] [Torch Code]

Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

ResNet [Paper] [Note] [Torch Code]

He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.

Popular Module

Dropout [Paper] [Note]

Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." Journal of Machine Learning Research 15.1 (2014): 1929-1958.

Batch Normalization [Paper] [Note]

Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[J]. arXiv preprint arXiv:1502.03167, 2015.

Object Detection in Image

RCNN [Paper] [Note] [Code]

Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, arXiv:1311.2524.

Spatial pyramid pooling in deep convolutional networks for visual recognition [Paper](http://arxiv.org/abs/1406.4729) [Note] [Code]

He K, Zhang X, Ren S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2015, 37(9): 1904-1916.

Fast R-CNN [Paper](http://arxiv.org/pdf/1504.08083) [Note] [Code]

Ross Girshick, Fast R-CNN, arXiv:1504.08083.

Faster R-CNN, Microsoft Research [Paper](http://arxiv.org/pdf/1506.01497) [Note] [Code] [Python Code]

Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, arXiv:1506.01497.

End-to-end people detection in crowded scenes [Paper](http://arxiv.org/abs/1506.04878) [Note] [Code]

Russell Stewart, Mykhaylo Andriluka, End-to-end people detection in crowded scenes, arXiv:1506.04878.

You Only Look Once: Unified, Real-Time Object Detection [Paper](http://arxiv.org/abs/1506.02640) [Note] [Code]

Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi, You Only Look Once: Unified, Real-Time Object Detection, arXiv:1506.02640.

Adaptive Object Detection Using Adjacency and Zoom Prediction [Paper](http://arxiv.org/abs/1512.07711) [Note]

Lu Y, Javidi T, Lazebnik S. Adaptive Object Detection Using Adjacency and Zoom Prediction[J]. arXiv:1512.07711, 2015.

Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks [Paper] [Note]

Sean Bell, C. Lawrence Zitnick, Kavita Bala, Ross Girshick. arXiv:1512.04143, 2015.

G-CNN: an Iterative Grid Based Object Detector [Paper]

Mahyar Najibi, Mohammad Rastegari, Larry S. Davis. arXiv:1512.07729, 2015.

Seq-NMS for Video Object Detection [Paper] [Note]

Wei Han, Pooya Khorrami, Tom Le Paine, Prajit Ramachandran, Mohammad Babaeizadeh, Honghui Shi, Jianan Li, Shuicheng Yan, Thomas S. Huang. Seq-NMS for Video Object Detection. arXiv preprint arXiv:1602.08465, 2016.

Image Caption

Exploring Nearest Neighbor Approaches for Image Captioning [Paper]

Devlin J, Gupta S, Girshick R, et al. Exploring Nearest Neighbor Approaches for Image Captioning[J]. arXiv preprint arXiv:1505.04467, 2015.

Show and Tell: A Neural Image Caption Generator [Paper] [Note]

Vinyals, Oriol, et al. "Show and tell: A neural image caption generator." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

Image Generation

Pixel Recurrent Neural Networks [Paper] [Note]

van den Oord A, Kalchbrenner N, Kavukcuoglu K. Pixel Recurrent Neural Networks[J]. arXiv preprint arXiv:1601.06759, 2016.

Variational Autoencoder [Paper] [Note]

Kingma D P, Welling M. Auto-encoding variational bayes[J]. arXiv preprint arXiv:1312.6114, 2013.

DRAW: A recurrent neural network for image generation [Paper] [Torch Code] [Tensorflow Code] [Note]

Gregor K, Danihelka I, Graves A, et al. DRAW: A recurrent neural network for image generation[J]. arXiv preprint arXiv:1502.04623, 2015.

Scribbler: Controlling Deep Image Synthesis with Sketch and Color [Paper] [Note]

Patsorn Sangkloy, Jingwan Lu, et al. Scribbler: Controlling Deep Image Synthesis with Sketch and Color. arXiv preprint arXiv:1612.00835, 2016.

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks [Paper]

Radford A, Metz L, Chintala S. Unsupervised representation learning with deep convolutional generative adversarial networks[J]. arXiv preprint arXiv:1511.06434, 2015.

Improved Techniques for Training GANs [Paper]

Salimans T, Goodfellow I, Zaremba W, et al. Improved Techniques for Training GANs[J]. arXiv preprint arXiv:1606.03498, 2016.

InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets [Paper]

Chen X, Duan Y, Houthooft R, et al. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets[J]. arXiv preprint arXiv:1606.03657, 2016.

Image-to-Image Translation with Conditional Adversarial Networks [Paper] [Note] [Torch Code] [Tensorflow Code]

Isola P, Zhu J Y, Zhou T, et al. Image-to-Image Translation with Conditional Adversarial Networks[J]. arXiv preprint arXiv:1611.07004, 2016.

Learning to Generate Images of Outdoor Scenes from Attributes and Semantic Layouts [Paper] [Note]

Levent Karacan, Zeynep Akata, Aykut Erdem, Erkut Erdem. Learning to Generate Images of Outdoor Scenes from Attributes and Semantic Layouts[J]. arXiv preprint arXiv:1612.00215, 2016.

Learning to Discover Cross-Domain Relations with Generative Adversarial Networks [Paper] [Note]

Kim, Taeksoo, et al. "Learning to Discover Cross-Domain Relations with Generative Adversarial Networks." arXiv preprint arXiv:1703.05192 (2017).

Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks [Paper] [Note]

Zhu J Y, Park T, Isola P, et al. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks[J]. arXiv preprint arXiv:1703.10593, 2017.

BEGAN: Boundary Equilibrium Generative Adversarial Networks [Paper] [Note]

Berthelot, David, Tom Schumm, and Luke Metz. "BEGAN: Boundary Equilibrium Generative Adversarial Networks." arXiv preprint arXiv:1703.10717 (2017).

StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks [Paper] [Note] [Tensorflow Code]

Zhang, Han, et al. "StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks." arXiv preprint arXiv:1612.03242 (2016).

Invertible Conditional GANs for image editing [Paper] [Note]

Perarnau G, van de Weijer J, Raducanu B, et al. Invertible Conditional GANs for image editing[J]. arXiv preprint arXiv:1611.06355, 2016.

Stacked Generative Adversarial Networks [Paper] [Note]

Huang X, Li Y, Poursaeed O, et al. Stacked generative adversarial networks[J]. arXiv preprint arXiv:1612.04357, 2016.

Rotating Your Face Using Multi-task Deep Neural Network [Paper] [Note]

Yim J, Jung H, Yoo B I, et al. Rotating your face using multi-task deep neural network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 676-684.

Image and Language

Learning Deep Representations of Fine-Grained Visual Descriptions [Paper] [Note]

Reed, Scott, et al. "Learning deep representations of fine-grained visual descriptions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.

Activation Maximization

Synthesizing the preferred inputs for neurons in neural networks via deep generator networks [Paper] [Note]

Nguyen A, Dosovitskiy A, Yosinski J, et al. Synthesizing the preferred inputs for neurons in neural networks via deep generator networks[J]. arXiv preprint arXiv:1605.09304, 2016.

Style Transfer

A neural algorithm of artistic style [Paper] [Note]

Gatys L A, Ecker A S, Bethge M. A neural algorithm of artistic style[J]. arXiv preprint arXiv:1508.06576, 2015.

Perceptual losses for real-time style transfer and super-resolution [Paper] [Note]

Johnson J, Alahi A, Fei-Fei L. Perceptual losses for real-time style transfer and super-resolution[J]. arXiv preprint arXiv:1603.08155, 2016.

Preserving Color in Neural Artistic Style Transfer [Paper] [Note] [Pytorch Code]

Gatys, Leon A., et al. "Preserving color in neural artistic style transfer." arXiv preprint arXiv:1606.05897 (2016).

A Learned Representation For Artistic Style [Paper] [Note] [Tensorflow Code] [Lasagne Code]

Dumoulin, Vincent, Jonathon Shlens, and Manjunath Kudlur. "A learned representation for artistic style." (2017).

Demystifying Neural Style Transfer [Paper]

Li, Yanghao, et al. "Demystifying Neural Style Transfer." arXiv preprint arXiv:1701.01036 (2017).

Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization [Paper]

Huang, Xun, and Serge Belongie. "Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization." arXiv preprint arXiv:1703.06868 (2017).

Fast Patch-based Style Transfer of Arbitrary Style [Paper]

Chen, Tian Qi, and Mark Schmidt. "Fast Patch-based Style Transfer of Arbitrary Style." arXiv preprint arXiv:1612.04337 (2016).

Low-Level Vision

Texture Enhancement via High-Resolution Style Transfer for Single-Image Super-Resolution [Paper] [Note]

Il Jun Ahn, Woo Hyun Nam. Texture Enhancement via High-Resolution Style Transfer for Single-Image Super-Resolution[J]. arXiv preprint arXiv:1612.00085, 2016.

Deep Joint Image Filtering [Paper] [Note]

Li Y, Huang J B, Ahuja N, et al. Deep joint image filtering[C]//European Conference on Computer Vision. Springer International Publishing, 2016: 154-169.

Image Segmentation

Fully convolutional networks for semantic segmentation [Paper] [Note]

Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 3431-3440.

Video Editing

Deep Video Color Propagation [Paper] [Note]

Meyer S, Cornillère V, Djelouah A, et al. Deep Video Color Propagation. BMVC 2018.

Deep Matching

AnchorNet: A Weakly Supervised Network to Learn Geometry-sensitive Features For Semantic Matching [Paper] [Note]

Novotný D, Larlus D, Vedaldi A. AnchorNet: A Weakly Supervised Network to Learn Geometry-Sensitive Features for Semantic Matching. CVPR 2017.

Open Courses

CS231n: Convolutional Neural Networks for Visual Recognition [Course Page]
CS224d: Deep Learning for Natural Language Processing [Course Page]

Online Books

Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville

Mathematics

Introduction to Probability Models, Sheldon M. Ross

Misc

k-means++: The advantages of careful seeding [Paper] [Note]

Arthur D, Vassilvitskii S. k-means++: The advantages of careful seeding[C]//Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics, 2007: 1027-1035.
