
arbitrary_style_transfer's Introduction

Arbitrary-Style-Transfer

Arbitrary-Style-Per-Model Fast Neural Style Transfer Method

Description

A deep convolutional neural network with an Encoder-AdaIN-Decoder architecture serves as the Style Transfer Network (STN). It accepts two arbitrary images as inputs (one as content, the other as style) and outputs a generated image that recombines the content and spatial structure of the former with the style (color, texture) of the latter, without re-training the network. The STN is trained using the MS-COCO dataset (about 12.6 GB) and the WikiArt dataset (about 36 GB).

This code is based on Huang et al., "Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization" (ICCV 2017).

[Figure: stn_overview] System overview (picture from Huang et al.'s original paper). The encoder is a fixed VGG-19 (up to relu4_1) pre-trained on the ImageNet dataset for image classification. We train the decoder to invert the AdaIN output from feature space back to image space.
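
In essence, AdaIN aligns the channel-wise mean and standard deviation of the content features with those of the style features. Below is a minimal NumPy sketch of the operation, assuming NHWC feature maps (the repo itself works on TensorFlow tensors):

```python
import numpy as np

def adain(content_feat, style_feat, epsilon=1e-5):
    """Adaptive Instance Normalization (Huang et al., ICCV 2017).

    Shifts the per-channel statistics of the content feature map so that
    they match those of the style feature map. Inputs are NHWC arrays.
    """
    # Per-sample, per-channel statistics over the spatial axes (H, W).
    c_mean = content_feat.mean(axis=(1, 2), keepdims=True)
    c_std = content_feat.std(axis=(1, 2), keepdims=True)
    s_mean = style_feat.mean(axis=(1, 2), keepdims=True)
    s_std = style_feat.std(axis=(1, 2), keepdims=True)

    # AdaIN(x, y) = sigma(y) * (x - mu(x)) / sigma(x) + mu(y)
    normalized = (content_feat - c_mean) / (c_std + epsilon)
    return s_std * normalized + s_mean
```

The decoder is then trained to map adain(f(c), f(s)) back to image space, where f is the fixed VGG-19 encoder.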

Prerequisites

Trained Model

You can download my trained model from here; it was trained with a style weight of 2.0.
Alternatively, run download_trained_model.sh in the repo.

Manual

  • The main file main.py is a demo that contains both the training procedure and the inference procedure (inference means generating stylized images).
    You can switch between the two procedures by changing the flag IS_TRAINING.
  • By default,
    (1) The content images lie in the folder "./images/content/"
    (2) The style images lie in the folder "./images/style/"
    (3) The weights file of the pre-trained VGG-19 lies in the current working directory. (See Prerequisites above. By the way, download_vgg19.sh already takes care of this.)
    (4) The MS-COCO images dataset for training lies in the folder "../MS_COCO/" (See Prerequisites above)
    (5) The WikiArt images dataset for training lies in the folder "../WikiArt/" (See Prerequisites above)
    (6) The checkpoint files of trained models lie in the folder "./models/" (You should create this folder manually before training.)
    (7) After the inference procedure, the stylized images are generated and written to the folder "./outputs/"
  • For training, make sure (3), (4), (5) and (6) are prepared correctly.
  • For inference, make sure (1), (2), (3) and (6) are prepared correctly.
  • Of course, you can organize the files and folders however you like; just modify the related parameters in main.py (see the configuration sketch after this list).
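
Putting the defaults above together, the configuration near the top of main.py looks roughly like the sketch below. Only the flag IS_TRAINING is confirmed by this README; the other variable names (and the VGG weights filename) are illustrative assumptions, so check them against the actual main.py:

```python
# Hypothetical configuration sketch -- only IS_TRAINING is named in the
# README; verify the remaining names against main.py before relying on them.
IS_TRAINING = False  # True: train the STN; False: infer (generate stylized images)

CONTENT_DIR = './images/content/'               # (1) content images for inference
STYLE_DIR   = './images/style/'                 # (2) style images for inference
VGG_PATH    = './imagenet-vgg-verydeep-19.mat'  # (3) pre-trained VGG-19 weights (filename assumed)
COCO_DIR    = '../MS_COCO/'                     # (4) content training set
WIKIART_DIR = '../WikiArt/'                     # (5) style training set
MODEL_DIR   = './models/'                       # (6) checkpoints (create this folder first)
OUTPUT_DIR  = './outputs/'                      # (7) where stylized images are written
```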

Results

[Figure: style images and the corresponding outputs (generated images)]

My Running Environment

Hardware

  • CPU: Intel® Core™ i9-7900X (3.30GHz x 10 cores, 20 threads)
  • GPU: NVIDIA® Titan Xp (Architecture: Pascal, Frame buffer: 12GB)
  • Memory: 32GB DDR4

Operating System

  • Ubuntu 16.04.3 LTS

Software

  • Python 3.6.2
  • NumPy 1.13.1
  • TensorFlow 1.3.0
  • SciPy 0.19.1
  • CUDA 8.0.61
  • cuDNN 6.0.21

References

  • The encoder, implemented with the first few layers (up to relu4_1) of a pre-trained VGG-19, is based on Anish Athalye's vgg.py

Citation

  @misc{ye2017arbitrarystyletransfer,
    author = {Wengao Ye},
    title = {Arbitrary Style Transfer},
    year = {2017},
    publisher = {GitHub},
    journal = {GitHub repository},
    howpublished = {\url{https://github.com/elleryqueenhomels/arbitrary_style_transfer}}
  }


arbitrary_style_transfer's Issues

Is it possible to run this on a mobile device?

What is the speed compared to Fast style transfer?
Also, is it possible to shrink the model? The model plus VGG-19 is about 200 MB, which is too much for mobile devices.
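
One way to probe the size question without touching the architecture is to cast the trained weights to float16, which roughly halves the checkpoint's footprint; whether stylization quality survives the cast would need testing. A sketch using the TF 1.x checkpoint reader (the checkpoint path is a placeholder):

```python
import numpy as np
import tensorflow as tf

CKPT_PATH = './models/style_weight_2e0.ckpt'  # placeholder; use your own checkpoint

# Read every variable from the checkpoint and re-save it as float16 in a
# compressed .npz archive, roughly halving the on-disk size.
reader = tf.train.NewCheckpointReader(CKPT_PATH)
halved = {name.replace('/', '__'): reader.get_tensor(name).astype(np.float16)
          for name in reader.get_variable_to_shape_map()}
np.savez_compressed('stn_weights_fp16.npz', **halved)
```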

How should I train my own model?

Hello,
I have a question about training a new model: how should I train my own model? I always get an error and have no idea why.
I would appreciate it if you could tell me how to do this. Thank you @elleryqueenhomels

Resource Exhausted Error while testing

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1,64,1600,1330] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

	 [[node Conv2D_10 (defined at /localhome/prathmeshmadhu/work/EFI/Code/arbitrary_style_transfer/encoder.py:95)  = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Conv2D_10-0-TransposeNHWCToNCHW-LayoutOptimizer, encoder/conv1_2/kernel/read)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[{{node clip_by_value/_77}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_487_clip_by_value", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

I am trying to generate 40,000+ different stylized images; it generates about 220 images and then always throws this resource error. My hardware details are as follows:

GPU: GeForce GTX 970, 4 GB.
CPU: 8 cores, 64 GB RAM.

Do let me know if you need any further details.
Thanks.
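
The failing allocation ([1, 64, 1600, 1330]) is the 64-channel conv1 feature map of a 1600x1330 input, which is a lot for a 4 GB card. A common workaround, sketched here with Pillow (the size cap is an assumption to tune for your GPU), is to shrink each content image before stylizing it:

```python
from PIL import Image

MAX_SIDE = 1024  # assumed cap; lower it until inference fits in GPU memory

def shrink_to_fit(in_path, out_path, max_side=MAX_SIDE):
    """Downscale an image so its longer side is at most max_side pixels."""
    img = Image.open(in_path)
    scale = max_side / float(max(img.size))
    if scale < 1.0:  # only ever shrink, never enlarge
        new_size = (int(img.width * scale), int(img.height * scale))
        img = img.resize(new_size, Image.LANCZOS)
    img.save(out_path)
```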

The results I got from this program were not right

Firstly, thank you very much. This program is very useful and easy to understand.
But after I trained this network, I did not get the right result; I just got an image of a single color.
I changed BATCH_SIZE to 2 and used 10,000 images (content and style) for training.
I do not know where I went wrong.
I would be very grateful for any advice. Thank you.

When I train the model, why is it always interrupted?

I downloaded the WikiArt and MS-COCO datasets; they contain about 80k and 110k images respectively. I preprocessed the content and style images (the WikiArt dataset has some damaged pictures, but MS-COCO does not). Then I began training the model, but it is always interrupted before finishing the first epoch. Can you give me some advice?
The erroneous output is below (one epoch should have about 9,900 steps):

step: 5620, total loss: 2517.050, elapsed time: 0:54:17.473793
content loss: 2478.894
style loss : 3815.609, weighted style loss: 38.156

Something wrong happens! Current model is saved to <D:\wxd\arbitrary_style_transfer-master_0\tmp_model>

Done training! Elapsed time: 0:54:30.136937
Model is saved to: model_retrain/style_weight_1e-2.ckpt

Successfully! Done all training...

we end at 2019-01-06 10:24:25.703097

Yeah, I think your suggestion is right, and I will give it a try. Thank you very much. I still have a question: when you trained the model, was the loss as big as mine? The content loss is about 2000-5000 and it seems to never converge.

Originally posted by @wxd19961228 in #8 (comment)
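
For context on the loss magnitudes discussed above: the objective in Huang et al. is L = L_c + λ·L_s, where the content loss compares the re-encoded output against the AdaIN target and the style loss matches channel-wise means and standard deviations over several VGG layers. A NumPy sketch of that bookkeeping follows (NHWC layout assumed; note that reducing with a sum versus a mean changes the absolute numbers, which is one reason raw loss values are hard to compare across implementations):

```python
import numpy as np

def mean_std(feat):
    """Channel-wise mean and std over the spatial axes of an NHWC map."""
    return feat.mean(axis=(1, 2)), feat.std(axis=(1, 2))

def total_loss(enc_output, adain_target, gen_feats, style_feats,
               style_weight=2.0):
    """Huang et al. objective: content loss + weighted style loss.

    enc_output:   the generated image re-encoded at relu4_1
    adain_target: AdaIN(content, style) features (the decoder's target)
    gen_feats:    generated-image features at several VGG layers
    style_feats:  style-image features at the same layers
    """
    content_loss = np.mean((enc_output - adain_target) ** 2)

    style_loss = 0.0
    for f_gen, f_sty in zip(gen_feats, style_feats):
        mu_g, sd_g = mean_std(f_gen)
        mu_s, sd_s = mean_std(f_sty)
        style_loss += np.mean((mu_g - mu_s) ** 2) + np.mean((sd_g - sd_s) ** 2)

    return content_loss + style_weight * style_loss
```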

VGG Model File Source (And a question)

Two questions:

  1. Where can I find the code used to train the VGG-19 model that is available for download in the repository's README?

  2. During training, are the weights of the VGG model updated, or are they kept fixed? The project I'm working on is style transfer for webpage screenshots. I suspect that I need to train a model for classifying webpages before I can do style transfer. That's why it would be useful to know how you trained this VGG model, so that I can do the same (with my own dataset).

Many thanks, great repo!
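
On the second question: per the README, the encoder is a fixed VGG-19 and only the decoder is trained. One common TF 1.x way to guarantee that (in the spirit of Anish Athalye's vgg.py, which this repo credits) is to build the encoder's conv layers from tf.constant, so the pre-trained kernels never enter tf.trainable_variables(). A hedged sketch; the kernel and bias values would come from the downloaded VGG-19 weights file:

```python
import tensorflow as tf

def frozen_conv_relu(x, kernel_values, bias_values):
    """Conv + ReLU whose weights are tf.constant: no gradients flow into
    them and they never appear in tf.trainable_variables(), so the
    optimizer can only ever update the decoder."""
    kernel = tf.constant(kernel_values, dtype=tf.float32)
    bias = tf.constant(bias_values, dtype=tf.float32)
    conv = tf.nn.conv2d(x, kernel, strides=[1, 1, 1, 1], padding='SAME')
    return tf.nn.relu(tf.nn.bias_add(conv, bias))
```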
