Comments (8)

szalpal commented on July 23, 2024

I did some more research on this topic. Generally, it's not a bug, it's a feature.

Our intention in DALI was not to deallocate GPU memory, virtually ever (it is freed only after the process exits). The reason is that we keep a pool of GPU memory, shared by all DALI Pipelines within a given process, and creating subsequent Pipeline objects is much cheaper when the memory is already allocated. The peaks marked by arrows in the image above are actually something unwanted in this behaviour - they come from external libraries, where we cannot control the memory allocation.

That being said, I believe the use-case presented above is a valid one and a legitimate reason to introduce the possibility of actually freeing the GPU memory. We'll introduce such a possibility in DALI and DALI Backend. I'll be posting status updates here.

szalpal commented on July 23, 2024

@nrgsy ,

We did not add it to DALI Backend; however, I believe the required functionality already exists in DALI, so I'll create a PR adding it. Thank you for bringing attention to this.
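
For the curious, the DALI-side piece I mean is the ability to shrink DALI's memory pool. A minimal sketch, assuming the ReleaseUnusedMemory helper that recent DALI versions expose in nvidia.dali.backend:

import numpy as np
from nvidia.dali import backend, fn, pipeline_def, types

@pipeline_def(batch_size=1, num_threads=1, device_id=0)
def tiny_pipeline():
    data = fn.external_source(device="cpu", name="input_0", dtype=types.UINT8, ndim=1)
    return data.gpu()  # move to GPU so the pool actually allocates device memory

pipe = tiny_pipeline()
pipe.build()
pipe.feed_input("input_0", [np.zeros(100_000_000, dtype=np.uint8)])
pipe.run()

del pipe  # the pipeline's buffers go back to DALI's pool, not to the driver
backend.ReleaseUnusedMemory()  # assumed helper: returns unused pool blocks to the driver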

szalpal commented on July 23, 2024

@nrgsy , @appearancefnp ,

The PR is merged. You can expect the feature in the next Triton release.

szalpal commented on July 23, 2024

Hi @appearancefnp ,

thanks for reaching out. I'm not sure I understand correctly what you'd like to do, but are images, images_2 and images_3 supposed to form a batch?

In DALI, batches are implicit. That means that a DALI pipeline like this one:

from nvidia.dali import fn, pipeline_def, types

@pipeline_def(
    batch_size=3,
    num_threads=1,
    device_id=0,
    output_dtype=[types.FLOAT],
    output_ndim=[3],  # dimensions of a single image (CHW), not including the batch dimension
)
def decode_pipeline():
    images = fn.external_source(device="cpu", name="input_0", dtype=types.UINT8, ndim=1)
    images = fn.experimental.decoders.image(
        images,
        device="mixed",
        dtype=types.UINT16,
    )

    images = fn.transpose(images, perm=[2, 0, 1])  # HWC -> CHW
    images = fn.cast(images, dtype=types.FLOAT)
    image_max_value = fn.reductions.max(images)
    # set_normalization_value is a user-defined helper (not shown here)
    normalization_value = set_normalization_value(image_max_value)
    images /= normalization_value
    return images

already works on a batch of 3 images. Having images_2 and images_3 very likely increases the memory consumption and is not necessary for batch processing.
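
To illustrate, a minimal standalone sketch of feeding the pipeline above a batch of 3 encoded images (file paths are placeholders; in Triton the data would arrive via the "input_0" request input instead):

import numpy as np

pipe = decode_pipeline()  # the pipeline defined above
pipe.build()

# One encoded buffer per sample; the list of 3 buffers is the batch.
batch = [
    np.fromfile(path, dtype=np.uint8)
    for path in ["img_0.tiff", "img_1.tiff", "img_2.tiff"]  # placeholder paths
]
pipe.feed_input("input_0", batch)
(images,) = pipe.run()  # a single TensorListGPU holding all 3 processed images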

Also, please correct me if I'm wrong, but the TIFFs you're working with - 3x5000x10000x3x2 (size of uint16) - sum up to about 1.8GB of data per batch (I assumed that the 3 at the beginning of the shape is the batch dimension)? If so, after adding some additional memory for the fn.transpose, the amount of memory looks legitimate. If you remove images_2 and images_3 from the pipeline, the memory consumption should drop to about 2.3GB.

Lastly, about the loading/unloading and memory consumption. DALI uses a lazy-allocation model. That means that when a DALI pipeline is fed with data, DALI first tries to handle the input with the memory it has already allocated; if that is not enough, DALI allocates more. Naturally, memory usage grows asymptotically and plateaus at the size required to handle the largest possible batch. For example, if my dataset contains images of various sizes, but the largest one is 1920x1080x3 (uint8), then for batch_size=7 a simple DALI decoding pipeline will plateau at about 43MB.
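
That plateau figure is just the size of the largest decoded batch; a quick back-of-the-envelope check in Python:

# 7 samples of 1920x1080x3, one byte per value (uint8)
plateau_bytes = 7 * 1920 * 1080 * 3
print(f"{plateau_bytes / 1e6:.1f} MB")  # -> 43.5 MB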

Unloading DALI correctly frees the allocated memory, but as an optimization, when a DALI pipeline is loaded again it allocates the same amount of memory it freed before. Since allocations are among the most expensive operations, this helps avoid a warmup phase after unloading/loading a DALI pipeline. Is that OK with you? Or would your use-case require starting the warmup from scratch?

Hopefully my explanation here was clear. If you have any other questions, don't hesitate to ask :)

appearancefnp commented on July 23, 2024

Thanks for the reply!

I know the pipeline looks weird, but my model input consists of three RGB images. It's an odd way to do it, but that is how it is currently designed. This is not the problem I want to address right now, though.

The problem is that after unloading the models, I want to free the memory.

[screenshot: GPU memory usage stays allocated after the DALI models are unloaded]

After unloading the DALI models, the Triton Inference Server keeps the memory and does not release it. If I could unload it completely and load it again, that would be great. The model warmup is not a problem :) The GPU memory is the problem! For my use case, I want to free the GPU memory from the inference server and use it elsewhere. So yes, I want the warmup from scratch :)

szalpal commented on July 23, 2024

I see. This actually might be a bug, I'd need to check it out. Thanks for the report and the repro. I'll be posting status updates here.

Cheers!

appearancefnp commented on July 23, 2024

@szalpal Thanks for the updates! I know it's expensive to reallocate GPU memory in terms of time, but if it's an optional configuration setting, that would be great!

Cheers!

nrgsy commented on July 23, 2024

> That being said, I believe the use-case presented above is a valid one and a legitimate reason to introduce the possibility of actually freeing the GPU memory. We'll introduce such a possibility in DALI and DALI Backend. I'll be posting status updates here.

@szalpal, curious whether this feature ever got added? My team ran into this issue recently and thought it was a bug. We were creating and destroying CUDA shared memory regions many times sequentially in the same process, and saw GPU memory usage increase until we ran out of memory. This did not happen before our switch to dali_backend (we use dali_backend for image preprocessing, which was previously done before writing the image to shared GPU memory). Our proposed fix is to avoid creating and destroying shared memory many times in the same process, but it would be good to know whether there is a way to avoid the growing memory usage and instead warm up from scratch.
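
For context, a minimal sketch of the kind of create/destroy loop we were running, using the tritonclient CUDA shared-memory utilities (region name, size, and URL are illustrative):

import tritonclient.http as httpclient
import tritonclient.utils.cuda_shared_memory as cudashm

client = httpclient.InferenceServerClient(url="localhost:8000")  # placeholder URL
byte_size = 256 * 1024 * 1024  # illustrative region size

for _ in range(1000):
    # Create a CUDA shared-memory region on GPU 0 and register it with Triton.
    handle = cudashm.create_shared_memory_region("region", byte_size, 0)
    client.register_cuda_shared_memory(
        "region", cudashm.get_raw_handle(handle), 0, byte_size
    )

    # ... copy an image into the region and run inference here ...

    # Tear the region down again; with dali_backend in the ensemble,
    # GPU memory usage kept growing across iterations anyway.
    client.unregister_cuda_shared_memory("region")
    cudashm.destroy_shared_memory_region(handle)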
