Comments (8)

szalpal commented on July 23, 2024

I did some more research on this topic. Generally, it's not a bug, it's a feature.

Our intention in DALI was not to deallocate GPU memory, virtually ever (it is freed only after the process exits). The reason is that we keep a pool of GPU memory, shared by all DALI Pipelines within a given process, and creating subsequent Pipeline objects is much cheaper when the memory is already allocated. The peaks marked by arrows in the image above are actually something unwanted in this behaviour - they come from external libraries, where we cannot control the memory allocation.

That being said, I believe the use-case presented above is a valid one and a legitimate reason to introduce the possibility of actually freeing the GPU memory. We'll introduce such a possibility in DALI and DALI Backend. I'll be posting status updates here.

szalpal commented on July 23, 2024

@nrgsy ,

We did not add it to DALI Backend; however, I believe the required functionality already exists in DALI, so I'll create a PR adding it. Thank you for bringing attention to this.
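
For the curious, the DALI-side piece I mean is the ability to shrink DALI's memory pool. A minimal sketch, assuming the ReleaseUnusedMemory helper that recent DALI versions expose in nvidia.dali.backend:

import numpy as np
from nvidia.dali import backend, fn, pipeline_def, types

@pipeline_def(batch_size=1, num_threads=1, device_id=0)
def tiny_pipeline():
    data = fn.external_source(device="cpu", name="input_0", dtype=types.UINT8, ndim=1)
    return data.gpu()  # move to GPU so the pool actually allocates device memory

pipe = tiny_pipeline()
pipe.build()
pipe.feed_input("input_0", [np.zeros(100_000_000, dtype=np.uint8)])
pipe.run()

del pipe  # the pipeline's buffers go back to DALI's pool, not to the driver
backend.ReleaseUnusedMemory()  # assumed helper: returns unused pool blocks to the driver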

szalpal commented on July 23, 2024

@nrgsy , @appearancefnp ,

The PR is merged. You can expect the feature in the next Triton release.

szalpal commented on July 23, 2024

Hi @appearancefnp ,

thanks for reaching out. I'm not sure I understand correctly what you'd like to do, but are images, images_2 and images_3 supposed to form a batch?

In DALI, batches are implicit. That means that a DALI pipeline like this one:

from nvidia.dali import fn, pipeline_def, types

@pipeline_def(
    batch_size=3,
    num_threads=1,
    device_id=0,
    output_dtype=[types.FLOAT],
    output_ndim=[3],  # dimensions of a single image (CHW), not including the batch dimension
)
def decode_pipeline():
    images = fn.external_source(device="cpu", name="input_0", dtype=types.UINT8, ndim=1)
    images = fn.experimental.decoders.image(
        images,
        device="mixed",
        dtype=types.UINT16,
    )

    images = fn.transpose(images, perm=[2, 0, 1])  # HWC -> CHW
    images = fn.cast(images, dtype=types.FLOAT)
    image_max_value = fn.reductions.max(images)
    # set_normalization_value is a user-defined helper (not shown here)
    normalization_value = set_normalization_value(image_max_value)
    images /= normalization_value
    return images

already works on a batch of 3 images. Having images_2 and images_3 very likely increases the memory consumption and is not necessary for batch processing.
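
To illustrate, a minimal standalone sketch of feeding the pipeline above a batch of 3 encoded images (file paths are placeholders; in Triton the data would arrive via the "input_0" request input instead):

import numpy as np

pipe = decode_pipeline()  # the pipeline defined above
pipe.build()

# One encoded buffer per sample; the list of 3 buffers is the batch.
batch = [
    np.fromfile(path, dtype=np.uint8)
    for path in ["img_0.tiff", "img_1.tiff", "img_2.tiff"]  # placeholder paths
]
pipe.feed_input("input_0", batch)
(images,) = pipe.run()  # a single TensorListGPU holding all 3 processed images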

Also, please correct me if I'm wrong, but the TIFFs you're working with - 3x5000x10000x3x2 (size of uint16) - sum up to about 1.8GB of data per batch (I assumed that the 3 at the beginning of the shape is the batch dimension)? If so, after adding some additional memory for the fn.transpose, the amount of memory looks legitimate. If you remove images_2 and images_3 from the pipeline, the memory consumption should drop to about 2.3GB.

Lastly, about the loading/unloading and memory consumption. DALI uses a lazy-allocation model. That means that when a DALI pipeline is fed with data, DALI first tries to handle the input with the memory it has already allocated; if that is not enough, DALI allocates more. Naturally, memory usage grows asymptotically and plateaus at the size required to handle the largest possible batch. For example, if my dataset contains images of various sizes, but the largest one is 1920x1080x3 (uint8), then for batch_size=7 a simple DALI decoding pipeline will plateau at about 43MB.
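
That plateau figure is just the size of the largest decoded batch; a quick back-of-the-envelope check in Python:

# 7 samples of 1920x1080x3, one byte per value (uint8)
plateau_bytes = 7 * 1920 * 1080 * 3
print(f"{plateau_bytes / 1e6:.1f} MB")  # -> 43.5 MB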

Unloading DALI correctly frees the allocated memory, but as an optimization, when a DALI pipeline is loaded again it allocates the same amount of memory it freed before. Since allocations are among the most expensive operations, this helps avoid a warmup phase after unloading/loading a DALI pipeline. Is that OK with you? Or would your use-case require starting the warmup from scratch?

Hopefully my explanation here was clear. If you have any other questions, don't hesitate to ask :)

appearancefnp commented on July 23, 2024

Thanks for the reply!

I know the pipeline looks weird, but my model input consists of three RGB images. It's an odd way to do it, but that is how it is currently designed. This is not the problem I want to address right now, though.

The problem is that after unloading the models, I want to free the memory.

[screenshot: GPU memory usage stays allocated after the DALI models are unloaded]

After unloading the DALI models, the Triton Inference Server keeps the memory and does not release it. If I could unload it completely and load it again, that would be great. The model warmup is not a problem :) The GPU memory is the problem! For my use case, I want to free the GPU memory from the inference server and use it elsewhere. So yes, I want the warmup from scratch :)

szalpal commented on July 23, 2024

I see. This actually might be a bug, I'd need to check it out. Thanks for the report and the repro. I'll be posting status updates here.

Cheers!

appearancefnp commented on July 23, 2024

@szalpal Thanks for the updates! I know it's expensive to reallocate GPU memory in terms of time, but if it's an optional configuration setting, that would be great!

Cheers!

nrgsy commented on July 23, 2024

> That being said, I believe the use-case presented above is a valid one and a legitimate reason to introduce the possibility of actually freeing the GPU memory. We'll introduce such a possibility in DALI and DALI Backend. I'll be posting status updates here.

@szalpal, curious whether this feature ever got added? My team ran into this issue recently and thought it was a bug. We were creating and destroying CUDA shared memory regions many times sequentially in the same process, and saw GPU memory usage increase until we ran out of memory. This did not happen before our switch to dali_backend (we use dali_backend for image preprocessing, which was previously done before writing the image to shared GPU memory). Our proposed fix is to avoid creating and destroying shared memory many times in the same process, but it would be good to know whether there is a way to avoid the growing memory usage and instead warm up from scratch.
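
For context, a minimal sketch of the kind of create/destroy loop we were running, using the tritonclient CUDA shared-memory utilities (region name, size, and URL are illustrative):

import tritonclient.http as httpclient
import tritonclient.utils.cuda_shared_memory as cudashm

client = httpclient.InferenceServerClient(url="localhost:8000")  # placeholder URL
byte_size = 256 * 1024 * 1024  # illustrative region size

for _ in range(1000):
    # Create a CUDA shared-memory region on GPU 0 and register it with Triton.
    handle = cudashm.create_shared_memory_region("region", byte_size, 0)
    client.register_cuda_shared_memory(
        "region", cudashm.get_raw_handle(handle), 0, byte_size
    )

    # ... copy an image into the region and run inference here ...

    # Tear the region down again; with dali_backend in the ensemble,
    # GPU memory usage kept growing across iterations anyway.
    client.unregister_cuda_shared_memory("region")
    cudashm.destroy_shared_memory_region(handle)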
