Comments (9)
Deep neural nets take a lot of resources to train, and training on a laptop CPU will be slow. Based on my benchmarks here https://github.com/jcjohnson/cnn-benchmarks, a top-of-the-line Pascal Titan X GPU is about 50x to 75x faster than a pair of 8-core server CPUs; your MacBook Pro CPU is either dual-core or quad-core, and in either case is much slower than server CPUs. Training a style transfer model takes about 4-6 hours on a Pascal Titan X, so 30 days to train means you are 120x to 180x slower than a Pascal Titan X, which sounds about right.
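As a quick sanity check on those numbers (a back-of-the-envelope sketch I'm adding here, not part of the original benchmarks), the observed 30-day laptop run implies a slowdown factor in exactly the quoted range:

```python
# Back-of-the-envelope check: how much slower is a 30-day laptop run
# than the quoted 4-6 hours on a Pascal Titan X?
titan_x_hours = (4, 6)        # quoted training time range on a Pascal Titan X
laptop_hours = 30 * 24        # 30 days of laptop training = 720 hours

# The fastest quoted GPU time gives the largest implied slowdown, and vice versa
slowdown_hi = laptop_hours / titan_x_hours[0]   # 720 / 4 = 180x
slowdown_lo = laptop_hours / titan_x_hours[1]   # 720 / 6 = 120x
print(f"laptop is {slowdown_lo:.0f}x to {slowdown_hi:.0f}x slower")
```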
Large values of -style_image_size do not affect the training speed. Training with smaller content images will affect the training speed, but to do that you'll need to rerun make_style_dataset.py with smaller values of -height and -width; I haven't tried it, but I'd expect reasonable results if you set these to 128.
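To get a feel for the potential speedup (a rough estimate I'm adding, assuming the default content size is 256 and that convolution cost scales roughly linearly with pixel count, ignoring fixed overhead):

```python
# Rough estimate of the speedup from training on smaller content images.
# Convolution cost scales roughly with the number of pixels, so halving
# both height and width cuts the per-iteration work about 4x.
default_hw = 256   # assumed default -height/-width for make_style_dataset.py
smaller_hw = 128   # the suggested smaller size

speedup = (default_hw ** 2) / (smaller_hw ** 2)
print(f"~{speedup:.0f}x less conv work per iteration")  # ~4x
```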
40,000 iterations might not be necessary, but it varies a lot from style to style and it's nearly impossible to predict how many iterations are necessary without trying it. For most styles I'd expect you to need at least 10,000 iterations to start getting reasonable results.
To speed up training you can also try modifying the model architecture. The default architecture is pretty big; try adding the flag -arch c9s1-16,d32,d64,R64,R64,R64,R64,R64,u32,u16,c9s1-3 to train.lua; this will use only half the number of filters per layer, which will be faster to train.
from fast-neural-style.
Thanks.
"-arch c9s1-16,d32,d64,R64,R64,R64,R64,R64,u32,u16,c9s1-3" is the instance normalization, correct?
Instance norm is controlled with the -use_instance_norm flag; the default value is 1, which turns on instance norm, so you don't need to add any explicit flags to use it.
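For reference, instance normalization just normalizes each feature map to zero mean and unit variance per channel and per image, using that map's own statistics. A minimal pure-Python sketch of the idea (not the Torch implementation, and the learned scale/shift parameters are omitted):

```python
import math

def instance_norm(channel, eps=1e-5):
    """Normalize one feature map (here a flat list of activations)
    to zero mean and unit variance, as instance norm does for each
    channel of each image independently."""
    n = len(channel)
    mean = sum(channel) / n
    var = sum((x - mean) ** 2 for x in channel) / n
    return [(x - mean) / math.sqrt(var + eps) for x in channel]

normed = instance_norm([1.0, 2.0, 3.0, 4.0])
print(normed)  # zero-mean, unit-variance activations
```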
Do you mean that the "-arch c9s1-16,d32,d64,R64,R64,R64,R64,R64,u32,u16,c9s1-3" model is even smaller than the "instance normalization" based model?
Any architecture can be used with or without instance normalization.
The models from the paper do not use instance normalization and their architecture is
-arch c9s1-32,d64,d128,R128,R128,R128,R128,R128,u64,u32,c9s1-3
The pretrained models with instance normalization use
-arch c9s1-16,d32,d64,R64,R64,R64,R64,R64,u32,u16,c9s1-3
which uses half as many filters per layer as the models from the paper.
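To make the "half as many filters" claim concrete, here is a small sketch that pulls the filter counts out of both -arch strings (my reading of the token format, where the trailing number of each comma-separated token is that layer's filter count; this is illustrative, not code from the repo):

```python
import re

def filter_counts(arch):
    """Take the trailing number of each -arch token as its filter
    count, e.g. 'c9s1-32' -> 32, 'd64' -> 64, 'R128' -> 128."""
    return [int(re.search(r"(\d+)$", tok).group(1))
            for tok in arch.split(",")]

paper = "c9s1-32,d64,d128,R128,R128,R128,R128,R128,u64,u32,c9s1-3"
pretrained = "c9s1-16,d32,d64,R64,R64,R64,R64,R64,u32,u16,c9s1-3"

ratios = [p / q for p, q in zip(filter_counts(paper), filter_counts(pretrained))]
print(ratios)  # 2.0 for every layer except the final 3-channel RGB output
```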
I saw Nvidia has a more expensive Tesla card. Is that better than the Pascal Titan X?
@gecong If you are referring to the P100, it should be faster than the Pascal Titan X, though it's only available as part of the NVIDIA DGX-1 supercomputer. Priced at $129,000 :)
Training a style transfer model takes about 4-6 hours on a Pascal Titan X
@jcjohnson This is good to hear. I was worried when looking at the readme, since I couldn't find any info on speed.
@gecong Pascal Titan X is the top consumer-level card from NVIDIA; they also have Tesla P40 and Tesla P100 GPUs. These cards are designed to be more reliable in a server or datacenter environment.
The Tesla P40 and Pascal Titan X are essentially the same card; they are both based on the GP102 processor, but the Pascal Titan X has only 28 of 30 streaming multiprocessors (SMs) enabled while the P40 has all 30 enabled; the P40 also has a slightly lower base clock rate. I haven't run benchmarks, but I'd expect the P40 to be about 5% faster than a Pascal Titan X.
The Tesla P100 is quite different; it is based on the GP100 processor and has HBM2 memory. The P100 should perform about the same as a P40 for single-precision operations (which is what we usually use in deep learning), but the P100 has drastically better double- and half-precision floating point performance compared to the P40 and Pascal Titan X.
The bottom line is that the Tesla cards would be only marginally faster than a Pascal Titan X for fast-neural-style, so you are best off with a Pascal Titan X.
Related Issues (20)
- cuda runtime error when training
- invalid argument: /path/to/output/file.h5
- Training image sizes
- train requirements.txt error 'ascii' codec can't decode byte 0xe2 in position 1178
- How to understand the gradient backward propagation of perceptual loss?
- did you try it with nvidia jetson tx2 / nvidia xavier hardware?
- cannot open <starry_night.t7>
- Cannot run fast_neural_style.lua script (libjpeg library problem)
- Diffrent results on diffrent machines
- C# Implementation
- predict result is different when running opencv and fast_neural_style.lua script
- get bad results after training candy
- unable to set v4l2 format: Invalid argument
- File.lua: unknown object
- What is motivation of using 9x9 conv at first and last layer?
- tanh 150 constant - why?
- could we apply fast-neural-transfer to image deformation?
- fork failed: cannot allocate memory
- Problems with lua
- Does anyone know the original source of the mosaic image?