Comments (9)

jcjohnson commented on September 7, 2024

Deep neural nets take a lot of resources to train, and training on a laptop CPU will be slow. Based on my benchmarks at https://github.com/jcjohnson/cnn-benchmarks, a top-of-the-line Pascal Titan X GPU is about 50x to 75x faster than a pair of 8-core server CPUs; your MacBook Pro CPU is either dual-core or quad-core, and in either case much slower than server CPUs. Training a style transfer model takes about 4-6 hours on a Pascal Titan X, so 30 days to train means you are 120x to 180x slower than a Pascal Titan X, which sounds about right.
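(For the arithmetic: 30 days is about 720 hours of training; 720 / 6 = 120 and 720 / 4 = 180, which is where the 120x to 180x range comes from.)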

Large values of -style_image_size do not affect the training speed. Training with smaller content images will affect the training speed, but to do that you'll need to rerun make_style_dataset.py with smaller values of -height and -width; I haven't tried it, but I'd expect reasonable results if you set these to 128.
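A minimal sketch of rebuilding the dataset at that size (the -height and -width flags are the ones named above; the directory and output flags follow the repo's training docs, and the single-dash spelling here follows the comment above, so check both against your checkout of scripts/make_style_dataset.py; paths are placeholders):

python scripts/make_style_dataset.py \
  -train_dir path/to/train/images \
  -val_dir path/to/val/images \
  -output_file data/style-dataset-128.h5 \
  -height 128 -width 128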

40,000 iterations might not be necessary, but it varies a lot from style to style and it's nearly impossible to predict how many iterations are necessary without trying it. For most styles I'd expect you to need at least 10,000 iterations to start getting reasonable results.

To speed up training you can also try modifying the model architecture. The default architecture is pretty big; try adding the flag -arch c9s1-16,d32,d64,R64,R64,R64,R64,R64,u32,u16,c9s1-3 to train.lua; this will use only half the number of filters per layer, which will be faster to train.
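As a rough sketch, the full training command might look like this (a hedged example: -h5_file and -style_image follow the repo's training docs, and the paths are placeholders):

th train.lua \
  -h5_file path/to/dataset.h5 \
  -style_image path/to/style.jpg \
  -arch c9s1-16,d32,d64,R64,R64,R64,R64,R64,u32,u16,c9s1-3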

gecong commented on September 7, 2024

Thanks.

"-arch c9s1-16,d32,d64,R64,R64,R64,R64,R64,u32,u16,c9s1-3" is the instance normalization, correct?

jcjohnson commented on September 7, 2024

Instance norm is controlled with the -use_instance_norm flag; the default value is 1, which turns on instance norm, so you don't need to add any explicit flags to use it.
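So, concretely, both of these are valid (a minimal sketch using only the flag named above; all other flags omitted):

th train.lua -use_instance_norm 1 ...  # the default: instance norm on
th train.lua -use_instance_norm 0 ...  # instance norm off, as in the models from the paper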

gecong commented on September 7, 2024

Do you mean that the "-arch c9s1-16,d32,d64,R64,R64,R64,R64,R64,u32,u16,c9s1-3" model is even smaller than the instance normalization model?

jcjohnson commented on September 7, 2024

Any architecture can be used with or without instance normalization.

The models from the paper do not use instance normalization and their architecture is

-arch c9s1-32,d64,d128,R128,R128,R128,R128,R128,u64,u32,c9s1-3

The pretrained models with instance normalization use

-arch c9s1-16,d32,d64,R64,R64,R64,R64,R64,u32,u16,c9s1-3

which uses half as many filters per layer as the models from the paper.
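For anyone puzzling over the -arch strings, here is my reading of the notation (a hedged legend; the parser in train.lua is authoritative):

c9s1-32  9x9 conv, stride 1, 32 filters
d64      downsampling conv (stride 2), 64 filters
R64      residual block with 64 filters
u64      upsampling, then conv to 64 filters
c9s1-3   final 9x9 conv, stride 1, back to 3 (RGB) channels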

gecong commented on September 7, 2024

I saw that NVIDIA has a more expensive Tesla card. Is that better than the Pascal Titan X?

lcb931023 commented on September 7, 2024

@gecong If you are referring to the P100, it should be faster than the Pascal Titan X, though it's only available as part of the NVIDIA DGX-1 supercomputer, priced at $129,000 :)

lcb931023 commented on September 7, 2024

"Training a style transfer model takes about 4-6 hours on a Pascal Titan X"

@jcjohnson This is good to hear. I was worried when looking at the readme because I couldn't find any info on speed.

jcjohnson commented on September 7, 2024

@gecong The Pascal Titan X is the top consumer-level card from NVIDIA; they also have the Tesla P40 and Tesla P100 GPUs, which are designed to be more reliable in a server or datacenter environment.

The Tesla P40 and Pascal Titan X are essentially the same card: both are based on the GP102 processor, but the Pascal Titan X has only 28 of its 30 streaming multiprocessors (SMs) enabled while the P40 has all 30 enabled; the P40 also has a slightly lower base clock rate. I haven't run benchmarks, but I'd expect the P40 to be about 5% faster than a Pascal Titan X.
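Back-of-the-envelope from those numbers: 30 / 28 ≈ 1.07, so the P40 has about 7% more SM throughput; subtract a little for its lower base clock and you land near the ~5% estimate.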

The Tesla P100 is quite different; it is based on the GP100 processor and has HBM2 memory. The P100 should perform about the same as a P40 for single-precision operations (which is what we usually use in deep learning), but it has drastically better double- and half-precision floating point performance compared to the P40 and Pascal Titan X.

The bottom line is that the Tesla cards would be only marginally faster than a Pascal Titan X for fast-neural-style, so you are best off with a Pascal Titan X.
