
Comments (11)

JackKelly commented on August 29, 2024

The latest code (as of the evening of Fri 2021-05-14) can get ~50 it/s with only 24 GB of RAM usage, without NWP loading (with 4 workers).

from predict_pv_yield.

JackKelly commented on August 29, 2024

NWPDataInMem.get_sample takes about 70 ms per sample, so with 8 samples per batch it takes over half a second per batch. That's probably the issue.

The interpolation (even linear) takes a while. Replacing the linear interpolation with ffill decreases the runtime of NWPDataInMem.get_sample from 70 ms to 15 ms, and increases the training speed from about 5 it/s to 15 it/s.
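The swap from linear interpolation to forward-fill can be sketched in plain pandas (illustrative data and names, not the actual predict_pv_yield code):

```python
import numpy as np
import pandas as pd

# Hypothetical hourly NWP-like series; the real data is a large spatial field.
hourly = pd.Series(
    np.arange(24.0),
    index=pd.date_range("2021-05-14", periods=24, freq="h"),
)
five_min = pd.date_range(hourly.index[0], hourly.index[-1], freq="5min")

# Linear interpolation: smooth in time, but costly on big spatial fields.
linear = hourly.reindex(five_min).interpolate(method="time")

# Forward-fill: just repeat the most recent hourly value; far cheaper.
ffilled = hourly.reindex(five_min, method="ffill")

print(linear.iloc[6], ffilled.iloc[6])  # at 00:30: 0.5 (linear) vs 0.0 (ffill)
```

The trade-off is accuracy: ffill produces a step function rather than a smooth ramp between hourly values.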


JackKelly commented on August 29, 2024

hmmm, maybe the issue is that get_nwp_example resamples the entire NWP field (a big image!). Some options to speed it up:

  1. Resample to 5-minutely in NWPDataLoader.load_single_chunk().
  2. Only resample the data we need. E.g. NWPDataInMem.get_sample() could return hourly data, from start.floor('h') to end.ceil('h'), and it'd then be up to the Transform to resample, after selecting what we need. I like this idea.
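Option 2 can be sketched roughly like this, with hypothetical function names standing in for get_sample() and the Transform (assuming the NWP field is an hourly xarray.DataArray):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical in-memory hourly NWP field (dims and names are illustrative).
nwp = xr.DataArray(
    np.random.rand(48, 16, 16),
    dims=("time", "y", "x"),
    coords={"time": pd.date_range("2021-05-14", periods=48, freq="h")},
)

def get_sample(start: pd.Timestamp, end: pd.Timestamp) -> xr.DataArray:
    # Return hourly data spanning the sample window; no resampling here.
    return nwp.sel(time=slice(start.floor("h"), end.ceil("h")))

def transform(sample: xr.DataArray) -> xr.DataArray:
    # Resample only the (small) selected sample to 5-minutely, via ffill.
    five_min = pd.date_range(
        sample.time.values[0], sample.time.values[-1], freq="5min"
    )
    return sample.sel(time=five_min, method="ffill")

sample = get_sample(pd.Timestamp("2021-05-14 03:10"), pd.Timestamp("2021-05-14 06:40"))
out = transform(sample)
```

The point is that the expensive resample now touches only the few hours a sample needs, not the whole field.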


JackKelly commented on August 29, 2024

I've implemented option 2, and it's helped a lot! NWPDataInMem.get_sample() now takes only 7.26 ms, and the system trains at 25 it/s, with GPU usage hovering around 15%.


JackKelly commented on August 29, 2024

More things to try:

  • Limit the spatial extent of the NWP imagery. DONE: reduces the size of nwp_in_mem to 14 MB (from 37 MB), and reduces the runtime of get_sample to 5.78 ms (from 7.26 ms). Doesn't seem to speed up training much, or reduce memory usage during training much (with 5 workers, uses 53 GB RAM and does about 20 it/s).
  • Run get_sample() from the 3 AsyncDataLoaders in parallel. Try both threads and processes. Thoughts: we can't spawn child processes from daemonic worker processes, and multiple threads may not help because get_sample() is CPU-bound.
  • Use a VM with more RAM, then add more workers. (10 workers, 8-bit NWP, 32-bit PV uses 78 GB and gets about 30 it/s, with GPU usage maxing at 22%. 12 workers = 33 it/s, 99.5 GB RAM.)
  • Use a minimal data type for NWP (uint8 for temperature). DONE: reduces the size of nwp_in_mem to 3.6 MB (from 14 MB) and reduces the runtime of nwp_in_mem.get_sample() to 4.6 ms (from 5.78 ms). Uses about 44 GB RAM during training with 5 workers.
  • Try again without NWP data, to measure memory usage, training speed (it/s), and GPU usage. DONE: without NWP, and with 12 workers, uses 71 GB RAM, achieves 71 it/s, and a max GPU utilisation of 40%.
  • Is the PV data using lots of memory? If so, use a minimal data type for PV? Share data between processes?!
  • Try loading a complete batch at once.
  • Try using different processes for each data source: we can't spawn child processes from daemonic worker processes!
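The uint8 packing for a bounded field like temperature can be sketched as follows; the scale and offset here are illustrative assumptions, not the values used in the real code:

```python
import numpy as np

# Assumed plausible temperature range in degrees C (illustrative only).
T_MIN, T_MAX = -50.0, 60.0

def encode_uint8(temperature: np.ndarray) -> np.ndarray:
    # Map [T_MIN, T_MAX] onto [0, 255] and store as one byte per value.
    scaled = (temperature - T_MIN) / (T_MAX - T_MIN)
    return np.clip(np.round(scaled * 255), 0, 255).astype(np.uint8)

def decode_uint8(packed: np.ndarray) -> np.ndarray:
    # Invert the mapping; quantisation error is at most half a step (~0.22 C).
    return packed.astype(np.float32) / 255 * (T_MAX - T_MIN) + T_MIN

temps = np.array([-10.0, 0.0, 15.5, 30.0])
packed = encode_uint8(temps)     # 4 bytes instead of 32
restored = decode_uint8(packed)  # within ~0.22 C of the original
```

This cuts memory 8x versus float64 at the cost of a fixed quantisation error, which is usually negligible relative to NWP forecast error.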


JackKelly commented on August 29, 2024

So, we know that including NWP data slows training down by a factor of more than 2x.

get_sample takes 4.3 ms for NWP and 1.13 ms for sat data. So maybe we need to speed up get_sample for NWP?


JackKelly commented on August 29, 2024

Profiling each line in get_nwp_example:

0.179 ms: date_range
1.686 ms: nwp.sel(init_time=target_times_hourly, method='ffill')
0.157 ms: init_time_future
0.043 ms: init_times[target_times_hourly > t0_hourly]
0.216 ms: steps = target_times_hourly - init_times
0.360 ms: init_time_indexer
0.103 ms: step_indexer
1.526 ms: nwp.sel(init_time=init_time_indexer, step=step_indexer)
CPU times: user 7.57 ms, sys: 0 ns, total: 7.57 ms
Wall time: 6.46 ms
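For context, here is a rough reconstruction of the selection logic being profiled, on made-up data and shapes. The two .sel() calls are the ones that dominate above; the index arithmetic in between is cheap:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical NWP array indexed by forecast init_time and lead-time step.
init_times = pd.date_range("2021-05-14", periods=8, freq="6h")
steps = pd.timedelta_range("0h", "12h", freq="1h")
nwp = xr.DataArray(
    np.random.rand(len(init_times), len(steps), 8, 8),
    dims=("init_time", "step", "y", "x"),
    coords={"init_time": init_times, "step": steps},
)

target_times = pd.date_range("2021-05-14 03:00", periods=4, freq="1h")

# First .sel(): most recent init_time at or before each target time.
selected_inits = nwp.sel(init_time=target_times, method="ffill").init_time.values
wanted_steps = target_times - pd.DatetimeIndex(selected_inits)

# Second .sel(): vectorised pointwise selection, one (init_time, step)
# pair per target time.
init_indexer = xr.DataArray(selected_inits, dims="target_time")
step_indexer = xr.DataArray(wanted_steps, dims="target_time")
sample = nwp.sel(init_time=init_indexer, step=step_indexer)
```

Sharing the "target_time" dimension between the two indexer DataArrays is what makes xarray select pointwise pairs rather than the full outer product.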


JackKelly commented on August 29, 2024

oooh... looks like it's possible to significantly speed up the selection based on 'step' by first transposing so that 'step' is the first dimension. This gets the runtime down to 1.73 ms if we always use the first init_time. Need to see if this speed-up holds when using multiple init times based on t0.


JackKelly commented on August 29, 2024

Nope, it doesn't look like transposing gives us the same performance increase when selecting multiple init times.

But, better news: I noticed that, when using NWPs, the code is almost constantly loading from disk when min_n_samples_per_disk_load = 1000 and max_n_samples_per_disk_load = 2000. Increasing these to 4,000 and 8,000, respectively, gets us up to 50 it/s after 30,000 iterations (yay!) with NWPs and 12 workers.

To really speed things up, I think we perhaps need to re-create the NWP Zarr, so the data is stored more efficiently on disk (#26).


JackKelly commented on August 29, 2024

Swapping back to the 'old', more thorough way of getting NWPs gives us 47.8 it/s.


JackKelly commented on August 29, 2024

We can't launch sub-processes from the worker processes: daemonic child processes aren't allowed to have children :)

