Comments (9)

jcjohnson commented on September 7, 2024

Deep neural nets take a lot of resources to train, and training on a laptop CPU will be slow. Based on my benchmarks at https://github.com/jcjohnson/cnn-benchmarks, a top-of-the-line Pascal Titan X GPU is about 50x to 75x faster than a pair of 8-core server CPUs; your MacBook Pro CPU is either dual-core or quad-core, and in either case much slower than server CPUs. Training a style transfer model takes about 4-6 hours on a Pascal Titan X, so 30 days to train means you are 120x to 180x slower than a Pascal Titan X, which sounds about right.
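(For the arithmetic: 30 days is about 720 hours of training; 720 / 6 = 120 and 720 / 4 = 180, which is where the 120x to 180x range comes from.)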

Large values of -style_image_size do not affect the training speed. Training with smaller content images will affect the training speed, but to do that you'll need to rerun make_style_dataset.py with smaller values of -height and -width; I haven't tried it, but I'd expect reasonable results if you set these to 128.
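A minimal sketch of rebuilding the dataset at that size (the -height and -width flags are the ones named above; the directory and output flags follow the repo's training docs, and the single-dash spelling here follows the comment above, so check both against your checkout of scripts/make_style_dataset.py; paths are placeholders):

python scripts/make_style_dataset.py \
  -train_dir path/to/train/images \
  -val_dir path/to/val/images \
  -output_file data/style-dataset-128.h5 \
  -height 128 -width 128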

40,000 iterations might not be necessary, but it varies a lot from style to style and it's nearly impossible to predict how many iterations are necessary without trying it. For most styles I'd expect you to need at least 10,000 iterations to start getting reasonable results.

To speed up training you can also try modifying the model architecture. The default architecture is pretty big; try adding the flag -arch c9s1-16,d32,d64,R64,R64,R64,R64,R64,u32,u16,c9s1-3 to train.lua; this will use only half the number of filters per layer, which will be faster to train.
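As a rough sketch, the full training command might look like this (a hedged example: -h5_file and -style_image follow the repo's training docs, and the paths are placeholders):

th train.lua \
  -h5_file path/to/dataset.h5 \
  -style_image path/to/style.jpg \
  -arch c9s1-16,d32,d64,R64,R64,R64,R64,R64,u32,u16,c9s1-3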

gecong commented on September 7, 2024

Thanks.

"-arch c9s1-16,d32,d64,R64,R64,R64,R64,R64,u32,u16,c9s1-3" is the instance normalization, correct?

jcjohnson commented on September 7, 2024

Instance norm is controlled with the -use_instance_norm flag; the default value is 1, which turns on instance norm, so you don't need to add any explicit flags to use it.
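So, concretely, both of these are valid (a minimal sketch using only the flag named above; all other flags omitted):

th train.lua -use_instance_norm 1 ...  # the default: instance norm on
th train.lua -use_instance_norm 0 ...  # instance norm off, as in the models from the paper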

gecong commented on September 7, 2024

Do you mean that the "-arch c9s1-16,d32,d64,R64,R64,R64,R64,R64,u32,u16,c9s1-3" model is even smaller than the instance normalization model?

jcjohnson commented on September 7, 2024

Any architecture can be used with or without instance normalization.

The models from the paper do not use instance normalization and their architecture is

-arch c9s1-32,d64,d128,R128,R128,R128,R128,R128,u64,u32,c9s1-3

The pretrained models with instance normalization use

-arch c9s1-16,d32,d64,R64,R64,R64,R64,R64,u32,u16,c9s1-3

which uses half as many filters per layer as the models from the paper.
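For anyone puzzling over the -arch strings, here is my reading of the notation (a hedged legend; the parser in train.lua is authoritative):

c9s1-32  9x9 conv, stride 1, 32 filters
d64      downsampling conv (stride 2), 64 filters
R64      residual block with 64 filters
u64      upsampling, then conv to 64 filters
c9s1-3   final 9x9 conv, stride 1, back to 3 (RGB) channels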

gecong commented on September 7, 2024

I saw that NVIDIA has a more expensive Tesla card. Is that better than the Pascal Titan X?

lcb931023 commented on September 7, 2024

@gecong If you are referring to the P100, it should be faster than the Pascal Titan X, though it's only available as part of the NVIDIA DGX-1 supercomputer, priced at $129,000 :)

lcb931023 commented on September 7, 2024

"Training a style transfer model takes about 4-6 hours on a Pascal Titan X"

@jcjohnson This is good to hear. I was worried when looking at the readme because I couldn't find any info on speed.

jcjohnson commented on September 7, 2024

@gecong The Pascal Titan X is the top consumer-level card from NVIDIA; they also have the Tesla P40 and Tesla P100 GPUs, which are designed to be more reliable in a server or datacenter environment.

The Tesla P40 and Pascal Titan X are essentially the same card: both are based on the GP102 processor, but the Pascal Titan X has only 28 of its 30 streaming multiprocessors (SMs) enabled while the P40 has all 30 enabled; the P40 also has a slightly lower base clock rate. I haven't run benchmarks, but I'd expect the P40 to be about 5% faster than a Pascal Titan X.
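Back-of-the-envelope from those numbers: 30 / 28 ≈ 1.07, so the P40 has about 7% more SM throughput; subtract a little for its lower base clock and you land near the ~5% estimate.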

The Tesla P100 is quite different; it is based on the GP100 processor and has HBM2 memory. The P100 should perform about the same as a P40 for single-precision operations (which is what we usually use in deep learning), but it has drastically better double- and half-precision floating point performance compared to the P40 and Pascal Titan X.

The bottom line is that the Tesla cards would be only marginally faster than a Pascal Titan X for fast-neural-style, so you are best off with a Pascal Titan X.
