Comments (8)
I did some more research on this topic. Generally, it's not a bug, it's a feature.
Our intention in DALI was to virtually never deallocate GPU memory (it's freed only after the process exits). The reason is that we keep a pool of GPU memory, shared by all DALI Pipelines within a given process, and creating subsequent Pipeline objects is much cheaper when the memory is already allocated. The peaks marked by arrows in the image above are something unwanted in this behaviour - they come from external libraries, where we cannot control the memory allocation.
That being said, I believe the use-case presented above is a valid one and a legitimate reason to introduce the possibility of actually freeing the GPU memory. We'll introduce such a possibility in DALI and DALI Backend. I'll be posting status updates here.
from dali_backend.
@nrgsy ,
We did not add it to DALI Backend; however, I believe the required functionality already exists in DALI, so I'll create a PR adding it. Thank you for bringing attention to this.
@nrgsy , @appearancefnp ,
The PR is merged. You can expect the feature in the next Triton release.
Hi @appearancefnp ,
thanks for reaching out. I'm not sure I understand correctly what you'd like to do, but are `images`, `images_2` and `images_3` supposed to form a batch?
In DALI, batches are implicit. That means that such a DALI pipeline:
```python
from nvidia.dali import fn, pipeline_def, types


@pipeline_def(
    batch_size=3,
    num_threads=1,
    device_id=0,
    output_dtype=[types.FLOAT],
    output_ndim=[4],  # Dimensions of the image, not including the batch dimension
)
def decode_pipeline():
    images = fn.external_source(device="cpu", name="input_0", dtype=types.UINT8, ndim=1)
    images = fn.experimental.decoders.image(
        images,
        device="mixed",
        dtype=types.UINT16,
    )
    images = fn.transpose(images, perm=[2, 0, 1])
    images = fn.cast(images, dtype=types.FLOAT)
    image_max_value = fn.reductions.max(images)
    # set_normalization_value is defined elsewhere in the original snippet
    normalization_value = set_normalization_value(image_max_value)
    images /= normalization_value
    return images
```
already works on a batch of 3 images. Having `images_2` and `images_3` very likely increases the memory consumption and is not necessary for batch processing.
Also, please correct me if I'm wrong, but the TIFFs you're working with - `3x5000x10000x3x2` (size of uint16) - sum up to about 1.8 GB of data per batch (I assumed that the `3` at the beginning of the shape is the batch dimension)? If so, after adding some additional memory for the `fn.transpose`, the amount of memory looks legit. If you remove `images_2` and `images_3` from the pipeline, the memory usage should drop to about 2.3 GB.
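As a rough back-of-envelope check of these numbers (shapes taken from this thread; note that the uint16 batch alone comes to about 0.9 GB, while the float32 copy produced by `fn.cast` is about 1.8 GB):

```python
import numpy as np

# Batch shape discussed above: 3 TIFFs (batch dim), 5000x10000 pixels, 3 channels.
batch_shape = (3, 5000, 10000, 3)
n_elements = int(np.prod(batch_shape))

uint16_bytes = n_elements * np.dtype(np.uint16).itemsize    # decoded output
float32_bytes = n_elements * np.dtype(np.float32).itemsize  # after fn.cast to FLOAT

print(f"uint16 batch:  {uint16_bytes / 1e9:.1f} GB")   # 0.9 GB
print(f"float32 batch: {float32_bytes / 1e9:.1f} GB")  # 1.8 GB
```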
Lastly, about the loading/unloading/memory consumption. DALI uses a lazy-allocation model. That means that when the DALI pipeline is fed with data, DALI first tries to handle the input with already allocated memory. If whatever DALI has already allocated is not enough, DALI allocates additional memory. Naturally, this process grows asymptotically and plateaus at the amount of memory required to handle the biggest possible batch. For example, if my dataset contains images of various sizes, but the largest one is `1920x1080x3` (uint8), then for `batch_size=7` a simple DALI decoding pipeline will plateau at about 43 MB.
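That plateau can be sanity-checked with simple arithmetic (an estimate of the decoded uint8 output only; this is an assumption, not an exact model of DALI's allocator):

```python
# Rough estimate of the plateau described above: the largest image in the
# dataset times the batch size.
largest_image_bytes = 1920 * 1080 * 3  # HWC layout, uint8 -> 1 byte per element
batch_size = 7

plateau_bytes = largest_image_bytes * batch_size
print(f"{plateau_bytes / 1e6:.1f} MB")  # ~43.5 MB
```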
Unloading DALI correctly frees the allocated memory, but as an optimization, when a given DALI pipeline is loaded again, it will allocate the same amount of memory it freed before. Since allocations are among the most expensive operations, this helps avoid the warmup phase after unloading/loading a DALI pipeline. Is that OK with you? Or would your use-case require starting the warmup from scratch?
Hopefully my explanation here was clear. If you have any other questions, don't hesitate to ask :)
Thanks for the reply!
I know the pipeline looks weird, but my model input consists of three RGB images. It's a strange way to do it, but it's currently designed this way. That is not the problem I want to address right now, though.
The problem is that after unloading the models, I would want to free the memory.
After unloading DALI models, the Triton Inference Server keeps the memory and does not release it. If I could unload it completely and load it again, that would be great. The model warmup is not a problem :) The GPU memory is the problem! For my use case, I want to free the GPU memory from the inference server and use it elsewhere. So yes, I want the warmup from scratch :)
I see. This actually might be a bug; I'd need to check it out. Thanks for reporting and for the repro. I'll be posting status updates here.
Cheers!
@szalpal Thanks for the updates! I know it's expensive to reallocate GPU memory in terms of time, but if it's an optional configuration setting, that would be great!
Cheers!
> That being said, I believe the use-case presented above is a valid one and a legitimate reason to introduce the possibility of actually freeing the GPU memory. We'll introduce such a possibility in DALI and DALI Backend. I'll be posting status updates here.
@szalpal, curious if this feature ever got added? My team ran into this issue recently and thought it was a bug. We were creating and destroying CUDA shared memory regions many times sequentially in the same process, and saw GPU memory usage increase until we ran out of memory. This did not happen before our switch to dali_backend (we use dali_backend for image preprocessing, which was previously done before writing the image to shared GPU memory). Our proposed fix is to avoid creating and destroying shared memory many times in the same process, but it would be good to know if there is a way to avoid the increasing memory usage and instead warm up from scratch.