
Comments (3)

ShiqiYang2022 commented on September 2, 2024

Tag Nano, Re this https://github.com/gslab-econ/blp-instruments/issues/201#issuecomment-1888211730:

I implemented the solution you suggested in 1b6333c. The memory usage improved only marginally, because the exponent values in these lines are not large enough to be the bottleneck. This indicates that evaluating the exponential function in Matlab is memory-costly even when the exponents are small, since the cost is driven by the size of the array rather than the magnitude of its entries. Hence, my conclusion is that after 1b6333c we have already optimized this calculation, and there is little room to further reduce its memory usage.
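To make the point concrete, here is a minimal standalone illustration (not project code) of why the magnitude of the exponents does not matter for memory:

```matlab
% Minimal illustration, not project code: exp() produces a full output array
% the same size as its input, so its memory cost is driven by the array's
% dimensions, not by how small the exponent values are.
X = -1e-3 * rand(2e4, 1e3);   % small exponent values; ~160 MB of doubles
E = exp(X);                   % allocates another ~160 MB for the result
whos X E                      % both arrays occupy the same amount of memory
```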


ShiqiYang2022 commented on September 2, 2024

Tag Jesse Nano, I am summarizing what we have learned for each goal of https://github.com/gslab-econ/blp-instruments/issues/201#issue-2018943946:

Per the goals of this issue:

  1. Rerun /analysis_cluster/ again to further test our changes in #195. See here.
  2. Revise the code to profile only the 3 RCNL estimations in https://github.com/gslab-econ/blp-instruments/issues/195#issuecomment-1823738708. See here.
  3. Understand where we experienced the peak demand for computing resources (memory).
    Conclusion: Memory peaks when --
    (1) the splitapply() function is used to calculate derivatives related to the nonlinear parameters (D_nonlin);
    (2) matrix inverses are calculated -- specifically, when we calculate derivatives in rcnl_der() and the derivative of the utility in rcnl_ddelta(), both of which involve matrix inversion (a schematic sketch of this pattern follows the list);
    (3) occasionally, at some odd spots where I cannot explain why so much memory is consumed.
  4. Understand where and why we experienced the longest compute time.
    Conclusion: The splitapply() function consumes most of the total time (75%) per the RCNL profiling results.
  5. Determine the peak demand for memory.
    I used the job-monitoring tools in the SLURM system to export the memory each job demanded, and cleaned the outputs into joined_jobs.csv on Dropbox. The code is in the /issue/joboverview/ folder. The file covers all jobs in the full run of Nov. 2023 and can also serve as a benchmark for future runs.
    Conclusion: Per the analysis of joined_jobs.csv (a summary sketch also follows the list), peak memory shows a positive relationship only with market size. For market sizes [7500, 10000, 12500], the average peak memory is [10, 13, 16] GB.
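To make point 3 concrete, below is a schematic, hypothetical sketch of the pattern in (1)-(2). It is not the actual rcnl_der()/rcnl_ddelta() code; all variable names and the per-market kernel are placeholders, with a logit-style share Jacobian standing in for the real matrix.

```matlab
% Hypothetical sketch, not project code: splitapply() calls a kernel once per
% market, and inside each call a dense per-market matrix is built and a
% linear system is solved. Memory peaks while these matrices are alive.
market_id = repelem((1:50)', 200);             % 50 markets, 200 products each
s         = rand(numel(market_id), 1) / 250;   % placeholder shares (each market sums to < 1)
G         = findgroups(market_id);

per_market = @(sm) sum((diag(sm) - sm * sm') \ sm);  % logit-style Jacobian as a stand-in
D_nonlin   = splitapply(per_market, s, G);           % one kernel call per market
```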
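For point 5, the summary behind the market-size numbers is roughly the following; the column names here (market_size, peak_memory_gb) are assumptions, not checked against the actual joined_jobs.csv.

```matlab
% Rough illustration of the point-5 summary; column names are assumed.
T = readtable('joined_jobs.csv');
peak_by_size = groupsummary(T, 'market_size', 'mean', 'peak_memory_gb');
disp(peak_by_size)   % market_size 7500 / 10000 / 12500 -> roughly 10 / 13 / 16 GB on average
```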

To provide a high-level illustration of (3)-(5), please find the figure attached. Below is the monitoring record of job 10000_markets_default_tol_ic_RCNL23_0_0_1. This job ran from 1:28 PM to 4:33 PM two weeks ago (x-axis); I took a snapshot of its memory usage (y-axis) every second and plotted the time trend. The blue arrows indicate where the splitapply() function is running; in each loop, after the D_nonlin calculation finishes, the job shifts to the matrix-inverse calculation. The matrix-inverse calculation consumes somewhat less (but still a lot of) memory and is time-efficient, so we observe a quick dive-and-recover pattern after each horizontal stretch.

[Figure: memory-usage time trend for job 10000_markets_default_tol_ic_RCNL23_0_0_1 (RCNL)]

There were many hurdles to profiling and monitoring these jobs simply because they are so heavy, and I had to use multiple tools to figure out what each job looks like. I have (somewhat reluctantly) kept this post concise and focused on the key information, but if any part needs more clarification, just let me know!


ShiqiYang2022 commented on September 2, 2024

Tag Jesse Nano, here are the potential areas for improvement for the next steps in #205 from my side:

Memory

  1. For the splitapply() function, I think there is room to reduce memory, since splitapply() is only doing linear calculations and element-wise multiplication of matrices; intuitively it should cost less memory than the matrix-inverse functions. My sense is that splitapply() stores intermediate results as we loop over every i, resulting in the large memory usage (a loop-based alternative is sketched under Time below).

https://github.com/gslab-econ/blp-instruments/blob/8b9669a414e114208f09b542a651ef1e788ee769/analysis_cluster/code/rcnl_demand/rcnl_ddelta.m#L44

  2. For calculating matrix inverses, I do not think there is currently room for improvement without a fundamental modification of the code. I asked Christopher Conlon about reducing memory when inverting matrices during NBER IO, and he suggested moving those steps onto GPUs, as he feels it is efficient to let the GPU specialize in that kind of computation (a sketch follows this list).

  3. One obvious improvement is to request memory based on market size instead of uniformly requesting 20GB. I implemented this in [Commit].

  4. One question from my side is: what is the marginal benefit of reducing memory? Per Comment, the benefit of reducing memory usage is to lower the memory-per-CPU ratio from 20GB to 8GB, so that we could run up to 2.5 times as many jobs simultaneously. Note that if we want to cut total memory usage significantly, we would need to cut memory usage in both splitapply() and the matrix inversions, since their memory usage is similar and both are currently large, so the cost of reducing memory might be somewhat high. I suggest we decide whether to cut memory further after we revise the DGP and estimation process in #205.
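For point 2, here is a minimal sketch of what moving the linear algebra onto the GPU could look like (it requires the Parallel Computing Toolbox and a supported GPU; A and b are placeholders, not project variables):

```matlab
% Minimal GPU sketch, not project code: solve on the GPU with backslash
% instead of forming an explicit inverse on the CPU.
A  = rand(2000) + 2000 * eye(2000);   % placeholder, diagonally dominant matrix
b  = rand(2000, 1);
Ag = gpuArray(A);                     % transfer to the device once
bg = gpuArray(b);
xg = Ag \ bg;                         % dense solve runs on the GPU
x  = gather(xg);                      % bring the result back to host memory
```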

Time

Per [Comment], we spend most of the time in the splitapply() function, and I think it has room to speed up. Per public discussions (example here), splitapply() is slow when there are many groups. It looks promising to cut down the time cost of splitapply() in #205, and I propose to implement that (a sketch of common alternatives is below).
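As a starting point, here is a hedged sketch of two commonly suggested alternatives; variable names are placeholders, and per_market stands in for whatever kernel we apply to each market.

```matlab
% Sketch of alternatives to splitapply() when there are many groups.
G = findgroups(market_id);

% (a) When the per-group computation reduces to sums/means, accumarray
%     avoids the per-group function-call overhead entirely.
group_sums  = accumarray(G, s);             % within-market sums
group_means = accumarray(G, s, [], @mean);  % within-market means

% (b) For a general kernel, sort by group once and loop over contiguous
%     blocks instead of repeatedly indexing with G == g.
[Gs, order] = sort(G);
edges = [0; find(diff(Gs)); numel(Gs)];
out   = zeros(numel(edges) - 1, 1);
for k = 1:numel(edges) - 1
    idx    = order(edges(k) + 1 : edges(k + 1));   % rows belonging to market k
    out(k) = per_market(s(idx));                   % placeholder per-market kernel
end
```

Note that variant (b) also speaks to the intermediate-storage concern in the Memory section above, since only one market's block is in memory at a time.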

