
Comments (7)

banasraf commented on June 29, 2024

@fversaci
Hey. This approach should be easier to achieve. We support a similar scenario with the video input (a single input file results in multiple output batches). This will require using the decoupled model (docs). Let me experiment a bit to see what needs to be adjusted to make it work in this case.
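For reference, decoupled execution is enabled in a model's config.pbtxt via the transaction policy (whether any other settings are needed for this particular model is an assumption):

```
# config.pbtxt fragment: allow the model to send zero or more responses
# per request, in any order (required for one-input-to-many-batches).
model_transaction_policy {
  decoupled: true
}
```

Clients then need to use a streaming-capable protocol (e.g. gRPC streaming) to receive the multiple responses.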

from dali_backend.

fversaci commented on June 29, 2024

Hi all,
I wanted to provide an update on our use case. Since there is currently no general prefeeding available for the DALI backend in Triton, we have implemented internal prefetching in our plugin. We take the original batch we receive from Triton (e.g., bs=4096), split it into mini-batches (e.g., bs=256), and apply prefetching to these mini-batches.
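The splitting-plus-prefetching idea can be sketched as follows. This is a hypothetical illustration, not the plugin's actual code: it splits a large batch into mini-batches and keeps a bounded number of them "in flight" via a background thread, standing in for the plugin's internal prefetching of Cassandra fetches.

```python
import queue
import threading

def split_batch(batch, mini_bs):
    """Split one large batch into consecutive mini-batches of size mini_bs."""
    return [batch[i:i + mini_bs] for i in range(0, len(batch), mini_bs)]

def prefetch(mini_batches, depth=2):
    """Yield mini-batches while a background thread keeps up to `depth` ready."""
    q = queue.Queue(maxsize=depth)
    sentinel = object()

    def producer():
        for mb in mini_batches:
            q.put(mb)  # the real plugin would start the data fetch here
        q.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while (item := q.get()) is not sentinel:
        yield item

mega_batch = list(range(4096))        # e.g. 4096 UUIDs received from Triton
minis = split_batch(mega_batch, 256)  # 16 mini-batches of 256
processed = sum(len(m) for m in prefetch(minis))
```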

If anyone is experiencing a similar issue, our code is available in this repository.


JanuszL commented on June 29, 2024

@fversaci - very good work. Thank you for sharing the results.


fversaci commented on June 29, 2024

The code is now in the dev branch, along with some (minimal) documentation:
https://github.com/fversaci/cassandra-dali-plugin/tree/dev/examples/triton


banasraf commented on June 29, 2024

Hey @fversaci
Unfortunately, there is currently no way of prefeeding data to inputs in the DALI backend. Internally, we assume that we don't process upcoming requests until we have sent responses for all the previous ones.

We can lift that limitation, and we would like to do so, because it might improve performance in various scenarios. However, this will require a significant rework of the backend, so it's hard to predict when we will be able to tackle it.

If you haven't experimented with this already, you might want to check the performance when you increase the number of model instances (docs). Maybe higher parallelism would help to hide the cost of fetching the data.
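The number of model instances is controlled by the instance_group section of the model's config.pbtxt; the values below are illustrative, not a recommendation for this model:

```
# config.pbtxt fragment: run two copies of the model per GPU so that
# one instance can fetch data while another is executing.
instance_group [
  {
    count: 2
    kind: KIND_GPU
  }
]
```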


fversaci commented on June 29, 2024

Hi @banasraf,

Thank you for the information (and for your helpfulness in general). I will definitely try increasing the number of model instances to see how it improves the throughput.

Regarding the issue with prefeeding Triton-DALI pipelines, I have been considering a temporary solution, while it's still not possible to prefeed them. We could provide a mega-batch (e.g., 1024 UUIDs) to the pipeline and our module could then split it into mini-batches (e.g., 8 mini-batches of size 128), and handle the prefeeding internally.

However, our current code implementing this approach is not functioning properly, since Triton expects to receive a single batch of the same size as the input batch:

E1026 13:09:54.096342 959 dali_model_instance.cc:40] Cannot split a shape list with 128 samples to list shapes of total 1024 samples.

Do you think this issue is easier to fix compared to the general prefeeding problem? In other words, can Triton-DALI handle multi-part answers to queries?

To see or test our code:

git clone https://github.com/fversaci/cassandra-dali-plugin.git -b triton
cd cassandra-dali-plugin
docker build -t cassandra-dali-plugin -f Dockerfile.triton .   # this might take some time
docker run --cap-add=sys_admin --rm -it --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --name cass-dali cassandra-dali-plugin
# within the container
./start-and-fill-db.sh
./start-triton.sh   # don't close the container
# new shell within the host
docker exec -ti cass-dali fish
# within the container
python3 client-triton.py


fversaci commented on June 29, 2024

Hi @banasraf,

Do you have any updates on adapting the decoupled model to our specific use case?

Meanwhile I have modified our code so that:

  1. It now has three client implementations to play with: client-http-triton.py, client-grpc-triton.py, client-grpc-stream-triton.py
  2. The model produces a reduced output instead of the full tensors. This means that the bottleneck during testing is no longer the Python clients, but rather the Triton server pipeline. As a result, the throughput is much higher than before.
  3. I set the default max_batch_size in models/dali_cassandra/config.pbtxt to 256, which matches the size offered by the clients. When changing max_batch_size to, e.g., 512, the CassandraTriton plugin automatically splits the large batches into smaller ones, which causes this error to be produced:
     Cannot split a shape list with 256 samples to list shapes of total 512 samples.
  4. The plugin now logs the input size of each batch it receives and the current status of its internal prefetching mechanism.

Thanks!

