
gordicaleksa / pytorch-neural-style-transfer

Reconstruction of the original paper on neural style transfer (Gatys et al.). I've additionally included reconstruction scripts which allow you to reconstruct only the content or the style of an image, for a better understanding of how NST works.

Home Page: https://youtube.com/c/TheAIEpiphany

License: MIT License

Python 100.00%
neural-style-transfer machine-learning deep-learning python pytorch non-photorealistic-rendering neural-style-transfer-pytorch neural-style-transfer-tutorial deep-learning-tutorial style-transfer

pytorch-neural-style-transfer's Introduction

Neural Style Transfer (optimization method) 💻 + 🎨 = ❤️

This repo contains a concise PyTorch implementation of the original NST paper (Gatys et al.).

It's an accompanying repository for this video series on YouTube.

[NST intro animation]

What is the NST algorithm?

The algorithm transfers the style of one input image (the style image) onto another input image (the content image) using a convolutional neural network (usually VGG-16/19). It outputs a composite, stylized image that keeps the content of the content image but takes on the style of the style image.
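For intuition, here is a minimal sketch of the objective being optimized; the function names, tensor names, and default weights are illustrative, not this repo's actual API. The pixels of the generated image are the only trainable parameters; the CNN stays frozen.

```python
import torch
import torch.nn.functional as F

def nst_loss(target_content_feat, target_style_grams,   # fixed targets, precomputed once
             cur_content_feat, cur_style_grams,         # features of the image being optimized
             img, content_w=1e5, style_w=3e4, tv_w=1e0):
    # content: match the feature maps of one VGG layer
    content_loss = F.mse_loss(cur_content_feat, target_content_feat)
    # style: match Gram matrices across several VGG layers
    style_loss = sum(F.mse_loss(c, t)
                     for c, t in zip(cur_style_grams, target_style_grams))
    # total variation: penalize differences between neighbouring pixels
    tv_loss = (img[..., :, 1:] - img[..., :, :-1]).abs().sum() + \
              (img[..., 1:, :] - img[..., :-1, :]).abs().sum()
    return content_w * content_loss + style_w * style_loss + tv_w * tv_loss
```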

Why yet another NST repo?

It's the cleanest and most concise NST repo that I know of + it's written in PyTorch! ❤️

Most NST repos were written in TensorFlow (before it even had an L-BFGS optimizer) or Torch (an obsolete, Lua-based framework), and they are often overly complicated, cramming multiple functionalities (video, static image, color transfer, etc.) into one repo and exposing 100 command-line parameters, of which maybe 5 or 6 are actually used on a regular basis.

Examples

Transferring style gives beautiful artistic results:

And here are some results coupled with their style:

Note: all of the stylized images were produced by me (using this repo); credit for the original images goes to the artists listed below.

Content/Style tradeoff

Changing the style weight gives you less or more style in the final image, assuming you keep the content weight constant.
I increased the style weight by factors of 10 here (1e1, 1e2, 1e3, 1e4), kept the content weight constant at 1e5, and used a random image as the initialization image.
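A sweep like the following should reproduce this experiment; the --content_weight, --style_weight and --init_method flags are assumptions on my part, so check the argparse section of neural_style_transfer.py for the exact names:

```
python neural_style_transfer.py --content_img_name <content-img-name> --style_img_name <style-img-name> --content_weight 1e5 --style_weight 1e1 --init_method random
python neural_style_transfer.py --content_img_name <content-img-name> --style_img_name <style-img-name> --content_weight 1e5 --style_weight 1e4 --init_method random
```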

Impact of total variation (tv) loss

Rarely explained, the total variation (tv) loss, i.e. its corresponding weight, controls the smoothness of the image.
I again used factors of 10 here (1e1, 1e4, 1e5, 1e6) and used the content image as the initialization image.
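For reference, a common (anisotropic) formulation of the tv loss looks like the sketch below; the repo's exact variant (summed vs. averaged, absolute vs. squared differences) may differ:

```python
import torch

def total_variation(img: torch.Tensor) -> torch.Tensor:
    # img: (batch, channels, height, width)
    # Sum of absolute differences between horizontally and vertically
    # neighbouring pixels; a larger weight on this term -> smoother image.
    return (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().sum() + \
           (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().sum()
```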

Optimization initialization

Starting from different initialization images (white or Gaussian noise, the content image, or the style image) leads to different results.
Empirically, the content image gives the best results, as explored in this research paper as well.
Here you can see the results for content, random and style initialization, in that order (left to right):

You can also see that with style initialization some content from the artwork leaked directly into the output.
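To try this yourself, the initialization can presumably be switched via a command-line flag; --init_method with values content/random/style is an assumption here, so check the argparse section if it differs:

```
python neural_style_transfer.py --content_img_name <content-img-name> --style_img_name <style-img-name> --init_method style
```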

Famous "Figure 3" reconstruction

Finally, if I hadn't included this portion, you couldn't say I'd successfully reproduced the original paper (laughs in Python):

I haven't given it much effort; the results could be much nicer.

Content reconstruction

If we use only the content (perceptual) loss and try to minimize that objective function, this is what we get (starting from noise):

These are steps 0, 26, 70 and 509 of the L-BFGS numerical optimizer, using layer relu3_1 for the content representation.
Check out this section if you want to play with this.
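For the curious, content reconstruction condenses to a very small optimization loop. The sketch below is not the repo's exact script (the layer index, preprocessing and file names are assumptions): it freezes VGG-19, takes the relu3_1 activation of the content image as the target, and optimizes a noise image with L-BFGS until its activation matches.

```python
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Frozen VGG-19 truncated right after relu3_1 (feature index 11 in torchvision's layout)
vgg = models.vgg19(pretrained=True).features[:12].to(device).eval()
for p in vgg.parameters():
    p.requires_grad_(False)

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
content = preprocess(Image.open('content.jpg')).unsqueeze(0).to(device)
target = vgg(content).detach()                    # fixed content representation

img = torch.randn_like(content).requires_grad_()  # start from Gaussian noise
optimizer = torch.optim.LBFGS([img], max_iter=500)

def closure():
    optimizer.zero_grad()
    loss = F.mse_loss(vgg(img), target)
    loss.backward()
    return loss

optimizer.step(closure)  # runs up to max_iter L-BFGS iterations
```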

Style reconstruction

We can do the same thing for style (on the left is the original artwork, "Candy"), starting from noise:

These are steps 45, 129 and 510 of L-BFGS, using layers relu1_1, relu2_1, relu3_1, relu4_1 and relu5_1 for the style representation.
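The style representation is built from Gram matrices, i.e. channel-wise feature correlations. A minimal sketch (the normalization constant is a common choice; the repo's may differ):

```python
import torch

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    # feat: (batch, channels, height, width) activation of one VGG layer
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    # (B, C, C) matrix of channel co-activations, normalized by map size
    return f @ f.transpose(1, 2) / (c * h * w)
```

The style loss is then the MSE between the Gram matrices of the generated image and those of the style image, summed over the five relu layers listed above.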

Setup

  1. Open Anaconda Prompt and navigate into the project directory: `cd path_to_repo`
  2. Run `conda env create` (while in the project directory)
  3. Run `activate pytorch-nst`

That's it! It should work out of the box; `conda env create` consumes the `environment.yml` file, which deals with the dependencies.


The PyTorch package will pull in some version of CUDA with it, but it is highly recommended that you install system-wide CUDA beforehand, mostly because of GPU drivers. I also recommend using the Miniconda installer as a way to get conda on your system.

Follow through points 1 and 2 of this setup and use the most up-to-date versions of Miniconda (Python 3.7) and CUDA/cuDNN. (I recommend CUDA 10.1, as it is compatible with PyTorch 1.4, which is used in this repo, along with the newest compatible cuDNN.)
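Once the environment is active, a quick sanity check (plain PyTorch, nothing repo-specific) confirms that the CUDA build was installed correctly; several of the issues further below boil down to this check failing:

```python
import torch

print(torch.__version__)          # this repo targets PyTorch 1.4
print(torch.cuda.is_available())  # True means the CUDA build and the GPU driver line up
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```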

Usage

  1. Copy content images to the default content image directory: `/data/content-images/`
  2. Copy style images to the default style image directory: `/data/style-images/`
  3. Run `python neural_style_transfer.py --content_img_name <content-img-name> --style_img_name <style-img-name>`

It's that easy. For more advanced usage, take a look at the code; it's (hopefully) self-explanatory (if you speak Python ^^).

Or take a look at the accompanying YouTube video; it explains how to use this repo in greater detail.

Just run it! You'll get something like this: ❤️

Debugging/Experimenting

Q: L-BFGS can't run on my computer, it takes too much GPU VRAM?
A: Set Adam as your optimizer instead, and take a look at the code for the initial style/content/tv weights you should use as a starting point (example command after this Q&A).

Q: The output image looks too much like the style image?
A: Decrease the style weight, or take a look at the table of weights (in neural_style_transfer.py) that I've included and that is known to work.

Q: There is too much noise (the image is not smooth)?
A: Increase the total variation (tv) weight (usually by factors of 10; again, the table is your friend here, or just experiment yourself).
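As a hedged example for the first tip, switching the optimizer presumably looks like this; the --optimizer flag and its adam value are assumptions on my part, so check the argparse section of neural_style_transfer.py:

```
python neural_style_transfer.py --content_img_name <content-img-name> --style_img_name <style-img-name> --optimizer adam
```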

Reconstruct image from representation

I've also included a file that will help you better understand how the algorithm works and what the neural net sees.
It allows you to visualize the content representations (feature maps) and style representations (Gram matrices).
It will also reconstruct either the style or the content alone from those representations, using the model that produced them.

Just run this:
`python reconstruct_image_from_representation.py --should_reconstruct_content <Bool> --should_visualize_representation <Bool>`

And that's it! If set to True, --should_visualize_representation will visualize these for you, while --should_reconstruct_content picks between style and content reconstruction.
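For example, to reconstruct content and also dump the visualizations (booleans filled in purely for illustration):

```
python reconstruct_image_from_representation.py --should_reconstruct_content True --should_visualize_representation True
```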

Here are some feature maps (relu1_1, VGG 19) as well as a Gram matrix (relu2_1, VGG 19) for Van Gogh's famous Starry Night:

No more dark magic.

Acknowledgements

I found these repos useful while developing this one:

I found some of the content/style images I used here:

Other images are now already classics in the NST world.

Citation

If you find this code useful for your research, please cite the following:

@misc{Gordić2020nst,
  author = {Gordić, Aleksa},
  title = {pytorch-neural-style-transfer},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/gordicaleksa/pytorch-neural-style-transfer}},
}

Connect with me

If you'd love to have some more AI-related content in your life 🤓, consider:

License

License: MIT


pytorch-neural-style-transfer's Issues

Getting it to work with CUDA 11.1

Hi Aleksa,

thanks for this fantastic Python implementation. I tried to get it to run under Windows 10 with an NVIDIA GPU, but I am failing with this error:

```
Traceback (most recent call last):
  File "neural_style_transfer.py", line 183, in <module>
    results_path = neural_style_transfer(optimization_config)
  File "neural_style_transfer.py", line 64, in neural_style_transfer
    content_img = utils.prepare_img(content_img_path, config['height'], device)
  File "D:\pytorch-neural-style-transfer\utils\utils.py", line 51, in prepare_img
    img = transform(img).to(device).unsqueeze(0)
  File "C:\Users\Sebastian\anaconda3\envs\pytorch-nst\lib\site-packages\torch\cuda\__init__.py", line 196, in _lazy_init
    _check_driver()
  File "C:\Users\Sebastian\anaconda3\envs\pytorch-nst\lib\site-packages\torch\cuda\__init__.py", line 94, in _check_driver
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
```

My nvidia-smi output is:

```
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 456.81       Driver Version: 456.81       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 970    WDDM  | 00000000:01:00.0  On |                  N/A |
|  0%   42C    P8    17W / 200W |    268MiB /  4096MiB |      4%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
```

To me it seems that the version of the installed CUDA toolkit and the version that PyTorch requires are different?
Sorry, I am just starting to learn Python and CUDA...

Thank you

Sebastian

Why are there better results when using images in range [0, 255] instead of [0, 1]?

I was running into issues trying to re-create the original paper, and stumbled upon this repository.

I was able to re-create the results when using the Caffe pretrained model (which takes images in the range [0, 255]), but had drastically different results when using PyTorch's pretrained model (which takes images in the range [0, 1]). I noticed this tidbit of code in your repository:

```python
# normalize using ImageNet's mean
# [0, 255] range worked much better for me than [0, 1] range (even though PyTorch models were trained on the latter)
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Lambda(lambda x: x.mul(255)),
    transforms.Normalize(mean=IMAGENET_MEAN_255, std=IMAGENET_STD_NEUTRAL)
])
```

I applied that same transformation and got results comparable to the original paper. I am somewhat confused about why this works, though. If PyTorch's vgg19 is trained on millions of images in the range [0, 1], wouldn't it just interpret anything above 1 as pure white?

Support for Nvidia Ampere cards

Just an FYI: I was playing around with this repo all morning (awesome work!) and was wondering why it was so slow. It turns out it was using the CPU, not CUDA/GPU. (It would be nice if you printed a warning for others, in case they are not as stubborn as me.) I am using a newer Ampere card that needs CUDA 11, so I had to solve the issue as shown below; I hope this might help others. Note that this is similar to another closed issue by somebody else, but I had to make further changes, such as adding cudatoolkit=11.1 to my environment.yml:

```yaml
name: pytorch-nst
channels:
  - defaults
  - pytorch
  - conda-forge
dependencies:
  - cudatoolkit=11.1
  - python=3.7.6
  - pip=20.0.2
  - matplotlib=3.1.3
  - torchvision
  - torchaudio
  - conda-forge::pytorch  # conda channel syntax; this belongs under dependencies, not pip
  - pip:
    - numpy==1.18.1
    - opencv-python==4.2.0.32
```

generated image could be better

I've run through the code and was able to generate images by running the model. I noticed that the generated images were somewhat overexposed. I modified the code to use PyTorch's own save_image function, instead of OpenCV as used in the code, and found that I could achieve better results.

Using a mean and std of:

```python
mean = np.asarray([0.485, 0.456, 0.406])
std = np.asarray([0.229, 0.224, 0.225])
```

I used these transforms for loading:

```python
transforms.ToTensor(),
transforms.Normalize(mean, std, inplace=True),  # normalize based on ImageNet's mean and std
transforms.Lambda(lambda x: x.mul(255.))
```

and these transforms for saving:

```python
transforms.Lambda(lambda x: x.div(255.)),
transforms.Normalize((-1 * mean / std), (1.0 / std), inplace=True)
```

Saving to file uses the torchvision.utils.save_image function, which accepts a tensor.

A couple of things puzzle me:

  1. Running the training is very fast: 3000 epochs completed within 6+ minutes. I could not figure out how you made it so fast. Could you please advise me?
  2. loss.backward() doesn't require retain_graph=True. I independently wrote an implementation myself and it gave an error during training about the graph being freed and backward being called a second time; I only managed to train it by setting retain_graph to True. Could you please enlighten me why you can reuse the content and style outputs for the respective images, generate outputs only for the optimizing_img, and yet don't need to set retain_graph to True?

Changing height value doesn't produce any results.

Editing neural_style_transfer.py and changing the default value for --height, or using --height <value> on the command line, produces no end result. The data/output-images folder that gets created is blank.

I edited the default value and changed it to 1280 for a photo I wanted to use. I have a Titan RTX.

```
(pytorch-nst) gateway@gateway-media:~/work/ns/pytorch-neural-style-transfer$ python neural_style_transfer.py --content_img_name i1.jpg --style_img_name tre.jpg
Using vgg19 in the optimization procedure.
L-BFGS | iteration: 000, total loss=2920575401984.0000, content_loss=      0.0000, style loss=2920527360000.0000, tv loss=48057560.0000
L-BFGS | iteration: 001, total loss=2920575401984.0000, content_loss=      0.0001, style loss=2920527360000.0000, tv loss=48057560.0000
L-BFGS | iteration: 002, total loss=2920575401984.0000, content_loss=      0.0001, style loss=2920527360000.0000, tv loss=48057560.0000
L-BFGS | iteration: 003, total loss=2920575401984.0000, content_loss=      0.0001, style loss=2920527360000.0000, tv loss=48057560.0000
L-BFGS | iteration: 004, total loss=2920575401984.0000, content_loss=      0.0001, style loss=2920527360000.0000, tv loss=48057560.0000
```

Using the --height command-line switch:

```
(pytorch-nst) gateway@gateway-media:~/work/ns/pytorch-neural-style-transfer$ python neural_style_transfer.py --content_img_name i1.jpg --style_img_name tre.jpg --height 1280
Using vgg19 in the optimization procedure.
L-BFGS | iteration: 000, total loss=10584828411904.0000, content_loss=      0.0000, style loss=10584816000000.0000, tv loss=12864146.0000
L-BFGS | iteration: 001, total loss=10584828411904.0000, content_loss=      0.0002, style loss=10584816000000.0000, tv loss=12864146.0000
L-BFGS | iteration: 002, total loss=10584828411904.0000, content_loss=      0.0002, style loss=10584816000000.0000, tv loss=12864146.0000
L-BFGS | iteration: 003, total loss=10584828411904.0000, content_loss=      0.0002, style loss=10584816000000.0000, tv loss=12864146.0000
L-BFGS | iteration: 004, total loss=10584828411904.0000, content_loss=      0.0002, style loss=10584816000000.0000, tv loss=12864146.0000
```

nvidia-smi output:

```
Tue Oct 13 16:35:56 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  TITAN RTX           Off  | 00000000:01:00.0 Off |                  N/A |
| 41%   29C    P8    15W / 280W |    292MiB / 24220MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 00000000:02:00.0 Off |                  N/A |
| 21%   33C    P8     5W / 180W |      2MiB /  8119MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2061      G   /usr/lib/xorg/Xorg                191MiB |
|    0   N/A  N/A      2745      G   ...mviewer/tv_bin/TeamViewer       13MiB |
|    0   N/A  N/A      2949      G   /usr/bin/gnome-shell               83MiB |
+-----------------------------------------------------------------------------+
```

The Meaning of Total Variation Loss

I found an explanation of the total variation loss in the tutorial on the official TensorFlow website. It says that total variation loss is a regularization term that can decrease the high-frequency artifacts of the output image; neural style transfer without total variation loss tends to amplify the high-frequency components of the content image. Maybe we can add this to the README file.

Numpy version mismatch

I am having an issue running the reconstruction code:

```
RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd
```

I ran the `pip install numpy --upgrade` command, which did not help.

May I ask you to post the conda requirements.txt file?

Thank you

VGG pre-training and fine-tuning

Hello everyone,
I would like to ask about the image dataset used for the VGG pre-trained model (vgg19.path) you provided. I am currently in the preliminary stages of deep learning and I am not sure if I need to fine-tune it for our task, or did I not notice/understand that fine-tuning is already included in your code?

How fast is it

Hey,
I am really interested in neural networks, especially running them on webcams.
How fast is your implementation on a decent GPU?
Best Regards,
JoJo

Style Transfer usage for Generative Art

Would it be possible to use this or a similar library to, instead of transferring a style to another image, build an ML model that learns the style characteristics (color, shapes, objects, etc.) of a collection of inputs (e.g., N similar art pieces by a particular artist), which can then be used to output Y unique generative images? In other words, the algorithm would transfer the style from the physical to the digital as a set of configurable parameters for generating art. This would allow non-technical artists to enter the generative space.
