
Comments (15)

alzio2607 commented on June 8, 2024

Welcome! From the write-up of leaves, it looks like it focuses on the prediction part. I'll look into it though. Thanks!

from lightgbm.

alzio2607 commented on June 8, 2024

@jameslamb we dug deep into this and have concluded that this issue is the result of an interaction between the Golang garbage collector and the threads created by the C API.

Here are our findings:

-- First, we wrote a pure C++ script that loaded and released the model via the C API repeatedly. The RSS was constant throughout the run. We noticed that the process creates as many threads as there are cores available on the machine.

-- Second, since the LightGBM C API uses OpenMP to manage threads, we fixed the number of threads to 8 by setting the env var OMP_NUM_THREADS. The RSS was still stable, and the process created exactly the specified number of threads and did not spawn more.

[Screenshot 2024-04-22 at 9:19:42 PM]

We believe that the C API keeps a certain amount of buffer memory attached to each thread in order to reuse it when required, which is the correct thing to do as an optimization.

These two experiments ruled out the possibility that the C API is mismanaging memory.

-- Third, we loaded the same model via a Golang script. We force-triggered GC after every load/release cycle and again fixed the number of threads for the process to 8 using OMP_NUM_THREADS=8. This is where things got interesting.
When the GC ran, it transiently parked ("zombied") the live threads, and the C API had to request a new set of threads from the OS in order to load the model. This increased the total number of threads in the process. Also worth noting: whenever the number of threads remained unchanged after a cycle, the RSS was stable as well.
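The skeleton of that third experiment, sketched in pure Go (the actual cgo load/release calls are elided as comments, and the function name is ours; runtime.GC() is the force trigger we used):

```go
package main

import (
	"fmt"
	"runtime"
)

// forceGCCycles runs n load/release cycles, force-triggering a
// garbage collection after each one, and returns how many
// collections actually ran during the loop.
func forceGCCycles(n int) uint32 {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	for i := 0; i < n; i++ {
		// loadModel()    // cgo: LGBM_BoosterLoadModelFromString (elided)
		// releaseModel() // cgo: LGBM_BoosterFree (elided)
		runtime.GC() // the force trigger; this is what parks the OpenMP threads
	}
	runtime.ReadMemStats(&after)
	return after.NumGC - before.NumGC
}

func main() {
	fmt.Println("forced collections:", forceGCCycles(5))
}
```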

[Screenshot 2024-04-22 at 9:30:12 PM]

The zombied threads still held on to their buffer memory, hence the gradual increase in RSS. From the following image, it's clear that only 8 threads are in use; the rest are just holding memory.

[Screenshot 2024-04-22 at 9:29:52 PM]


-- Fourth, we stopped force-triggering GC and switched off automatic GC by calling debug.SetGCPercent(-1) in the Golang code. The result was a stable RSS with the Golang code!
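A minimal sketch of that fourth setup, assuming the standard runtime/debug API (the load/release cycles are elided and the helper name is ours):

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// withGCDisabled runs fn with automatic garbage collection switched
// off via debug.SetGCPercent(-1), then restores the previous setting.
// It reports whether GC was actually off while fn ran.
func withGCDisabled(fn func()) bool {
	prev := debug.SetGCPercent(-1)
	fn()
	// SetGCPercent returns the setting that was in effect until now (-1).
	return debug.SetGCPercent(prev) == -1
}

func main() {
	off := withGCDisabled(func() {
		// ... model load/release cycles would run here; with GC off,
		// the OpenMP worker threads are never parked and RSS stays flat ...
	})
	fmt.Println("GC disabled during cycles:", off)
}
```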

[Screenshot 2024-04-22 at 9:48:39 PM]

Conclusion: Golang's garbage collection is somehow interfering with the C API's thread management. The C code loses control of its threads, which hold buffer memory; when those threads are zombied, that memory is stranded and contributes to RSS.

Let me know if this issue should still be a part of this repo. I am heading over to the Golang repo anyway to highlight this and find a way to resolve it.

alzio2607 commented on June 8, 2024

@jameslamb hi. We did try the threadless version and saw an improvement initially, but over time the memory swells up just like in the threaded version.

However, we seem to have solved the issue. We schedule our model refreshes in a goroutine, so all the calls to the C API are made inside this goroutine. We think the C API allocates some buffers on the native OS thread underlying this goroutine; during GC and the sleeps we add between model refreshes, this native thread is yielded, and so are its buffers (hence the zombied memory).

We used functionality provided by Golang's runtime package, runtime.LockOSThread(), to permanently map the model-loader goroutine to a particular native thread.
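The pinning pattern, sketched (the function name runPinned is ours, and the cgo refresh calls are elided):

```go
package main

import (
	"fmt"
	"runtime"
)

// runPinned executes fn on a goroutine that is wired to a single OS
// thread for its whole lifetime -- the pattern we used for the
// model-loader goroutine.
func runPinned(fn func()) {
	done := make(chan struct{})
	go func() {
		// Pin this goroutine to its current OS thread. The Go
		// scheduler will no longer migrate it (or run other
		// goroutines on this thread), so any thread-local state the
		// C side keeps -- OpenMP buffers included -- stays with us.
		runtime.LockOSThread()
		defer runtime.UnlockOSThread()
		fn()
		close(done)
	}()
	<-done
}

func main() {
	runPinned(func() {
		// ... cgo calls to load/refresh the model go here ...
	})
	fmt.Println("model refresh ran on a locked OS thread")
}
```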

We are seeing improvements and a much flatter RSS graph now.
[Screenshot 2024-04-26 at 12:28:26 PM]

Would just like to ask you to confirm whether we're correct in assuming that the library does some allocations thread-locally.

jameslamb commented on June 8, 2024

Thanks for using LightGBM.

Is it possible to upgrade to a newer version of LightGBM? v3.1.1 is 3.5 years old at this point (link), so you're missing out on 3.5 years of fixes and changes, including some related to memory efficiency and fixed memory leaks.

alzio2607 commented on June 8, 2024

Thank you for the prompt reply @jameslamb. I would really like to upgrade the LightGBM version, but working in a production environment, the switch is not easy for us: our training backend uses the same LightGBM version, and we'd need to do a whole lot of testing to make the switch. Moreover, the same version works fine in a Java environment, so I'd really like to figure out why Golang is causing an issue.

However, I'll try using the latest version and report back the results.

jameslamb commented on June 8, 2024

Thanks! Excellent report by the way, this was really interesting to read.

I have one other theory. I know very little about Go, but looking at the code you provided, it seems like you are loading the model into a string and then passing that string into LightGBM... but never freeing the string.

modelString = getModelAsString() 

(code link)

Your ReleaseMemory() method leaves that modelString unmodified.

func ReleaseMemory() {
	res := int(C.LGBM_BoosterFree(predictor))

	if res == -1 {
		panic("LightGBM C_API : Failed to release the memory of the LightGBM model")
	}
}

Could you try freeing the string?

Alternatively, if the model file is available in the container's local filesystem, you can skip the step of reading it up into a string and just directly load it from a file, with LGBM_BoosterCreateFromModelFile(). That was available in v3.1.1.

https://github.com/microsoft/LightGBM/blob/v3.1.1/include/LightGBM/c_api.h#L421-L423

alzio2607 commented on June 8, 2024

@jameslamb Yes, I am loading the model string as a global variable, but I do that once, at initialization, and my base (initial) RSS is measured after that point. So in the output, Initial: 1524 refers to the RSS value after the model string has been loaded once. I knew about the LGBM_BoosterCreateFromModelFile() method, but loaded from a string to emulate exactly what we do in production.

Also, I just tried the latest version and I am seeing the same behavior. So it could be about how things are handled on the Golang side. Is there any documentation on how to properly write a wrapper for the LightGBM C API?

jameslamb commented on June 8, 2024

Is there any documentation on how to properly write a wrapper for the LightGBM C API?

There is not really, sorry. That feature request is tracked in #6261.

There are some suggestions in this comment: #6261 (comment).

Also, I just tried the latest version and I am seeing the same behavior. So it could be about how things are handled on the Golang side.

That's helpful, thank you!

Every release of LightGBM is tested with valgrind and leak sanitizer, and I'm fairly sure that those tests cover code paths that use LGBM_BoosterLoadModelFromString(). I'd expect those tests to catch any significant memory leaks.

So yes, based on what you've shared so far, I strongly suspect this is about some interaction between Go's memory management and how your Go code is calling into LightGBM, not a leak from LightGBM itself. But I don't know that for sure.

We can leave this open for a bit, maybe someone else will be able to provide other theories.

alzio2607 commented on June 8, 2024

Okay, sure. It would be really helpful if someone could shed light on the Go side of the implementation.

jameslamb commented on June 8, 2024

We don't have any Go code in this repo or anyone doing active Go development (as far as I know).

Maybe https://github.com/dmitryikh/leaves can help.

If you do find a solution, please do come back and post it here. And thanks again for the EXCELLENT write-up and reproducible example!

jameslamb commented on June 8, 2024

I see you've posted a related question over on Stack Overflow: https://stackoverflow.com/questions/78347190/is-this-the-correct-way-to-use-cgo-to-call-c-code-via-go

Totally fine, since it's not exactly the same as the one you posted here.

I'm linking it here so anyone finding this GitHub issue from search will also find that related Stack Overflow question. I recommend you link back to this issue from your Stack Overflow post, for the same reason.

jameslamb commented on June 8, 2024

Oh actually it looks like you posted two separate Stack Overflow posts about this same topic?

https://stackoverflow.com/questions/78347190/is-this-the-correct-way-to-use-cgo-to-call-c-code-via-go

https://stackoverflow.com/questions/78343407/how-to-use-cgo-properly-to-avoid-memory-leaks-working-with-lightgbm-c-api-from

Please don't do this. Spreading the same conversation across many places, especially without linking between them, means different people might spend time answering your question in different places without knowing that others are discussing it elsewhere. That makes it harder for people facing the same problem to benefit from the conversation via search engines, and risks wasting the time of some of the people involved.

Since it seems like https://stackoverflow.com/questions/78343407/how-to-use-cgo-properly-to-avoid-memory-leaks-working-with-lightgbm-c-api-from is getting answers, I think you should close https://stackoverflow.com/questions/78347190/is-this-the-correct-way-to-use-cgo-to-call-c-code-via-go and link to that.

alzio2607 commented on June 8, 2024

Hey. Yes, I raised the issue on Stack Overflow as well. Since very few people using LightGBM are working in Go, I wanted to reach a broader audience for a better chance at a resolution. I did go overboard with creating two posts, though; I've removed the second one.

jameslamb commented on June 8, 2024

Wow, interesting!!!

Thank you for this excellent write-up. It'll take me some time to think through it carefully, but I do have one piece of advice to share that might be useful.

You can build LightGBM with OpenMP support completely turned off like this:

cmake -B build -S . -DUSE_OPENMP=OFF
cmake --build build --target _lightgbm

When using LightGBM to generate predictions, OpenMP is helpful for parallelizing over rows in the input data. It's also helpful in parallelizing the work of initially loading the model from a text file.

But I don't believe OpenMP actually speeds up anything in LightGBM for single-row predictions. So if your service is only generating predictions on a single row at a time, and you're finding that some interaction with LightGBM's use of OpenMP is leading to a memory leak, I recommend using a version of LightGBM built without OpenMP support.

That'll prevent that initial pool of threads being allocated, and it'll make the behavior of the program invariant to settings like the OMP_NUM_THREADS environment variable.

jameslamb commented on June 8, 2024

Ah interesting, thanks for sharing that! I personally am not familiar with Go or how its garbage collection works so I can't offer any feedback on that approach. Glad to hear you've found changes to stabilize your applications though!

Would just like to ask you to confirm whether we're correct in assuming that the library does some allocations thread-locally.

I'm not certain, but I think the answer is "yes". I believe the std::vectors in this block (which runs during some prediction codepaths) will be thread-local allocations:

if (objective_function_ != nullptr) {
  #pragma omp parallel for num_threads(OMP_NUM_THREADS()) schedule(static)
  for (data_size_t i = 0; i < num_data; ++i) {
    std::vector<double> tree_pred(num_tree_per_iteration_);
    for (int j = 0; j < num_tree_per_iteration_; ++j) {
      tree_pred[j] = raw_scores[j * num_data + i];
    }
    std::vector<double> tmp_result(num_class_);
    objective_function_->ConvertOutput(tree_pred.data(), tmp_result.data());
    for (int j = 0; j < num_class_; ++j) {
      out_result[j * num_data + i] = static_cast<double>(tmp_result[j]);
    }
  }
}

Those are never explicitly freed in LightGBM's code... LightGBM is just declaring them on the stack and trusting they'll be cleaned up automatically.
