Comments (7)
@fversaci
Hey. This approach should be easier to achieve. We support a similar scenario with the video input (a single input file results in multiple output batches). This will require using the decoupled model (docs). Let me experiment a bit to see what needs to be adjusted to make it work in this case.
from dali_backend.
Hi all,
I wanted to provide an update on our use case. Since there is currently no general prefeeding available for the DALI backend in Triton, we have implemented internal prefetching in our plugin. We take the original batch we receive from Triton (e.g., bs=4096), split it into mini-batches (e.g., bs=256), and apply prefetching to these mini-batches.
If anyone is experiencing a similar issue, our code is available in this repository.
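To illustrate the idea, here is a minimal, self-contained sketch of this kind of internal prefetching (the names `split_batch`, `run_with_prefetch`, and the `fetch` callback are illustrative, not the plugin's actual API): the large batch from Triton is split into mini-batches, and a fixed number of fetches is kept in flight while results are consumed in order.

```python
from concurrent.futures import ThreadPoolExecutor

def split_batch(batch, mini_bs):
    """Split one large Triton batch into mini-batches of size mini_bs."""
    return [batch[i:i + mini_bs] for i in range(0, len(batch), mini_bs)]

def run_with_prefetch(batch, mini_bs=256, depth=2, fetch=lambda mb: mb):
    """Process mini-batches while keeping up to `depth` fetches in flight."""
    minis = split_batch(batch, mini_bs)
    results = []
    with ThreadPoolExecutor(max_workers=depth) as pool:
        # start the first `depth` fetches up front
        futures = [pool.submit(fetch, mb) for mb in minis[:depth]]
        for i in range(len(minis)):
            # consume the oldest in-flight mini-batch...
            results.append(futures.pop(0).result())
            # ...and immediately enqueue the next one, if any remain
            nxt = i + depth
            if nxt < len(minis):
                futures.append(pool.submit(fetch, minis[nxt]))
    return results
```

With a batch of 4096 UUIDs and `mini_bs=256`, this yields 16 mini-batches, with the fetch of mini-batch N+1 overlapping the processing of mini-batch N.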
@fversaci - very good work. Thank you for sharing your results.
The code is now in the dev branch, along with some (minimal) documentation:
https://github.com/fversaci/cassandra-dali-plugin/tree/dev/examples/triton
Hey @fversaci
Unfortunately, there is currently no way of prefeeding data to inputs in the DALI backend. Internally, we assume that we don't process upcoming requests until we have sent responses for all the previous ones.
We can lift that limitation, and we would like to do so, because it might improve performance in various scenarios. However, this will require a significant rework of the backend, so it's hard to predict when we will be able to tackle it.
If you haven't experimented with this already, you might want to check the performance when you increase the number of model instances (docs). Maybe higher parallelism would help to hide the cost of fetching the data.
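For reference, the number of model instances is set via the `instance_group` field in the model's `config.pbtxt`; a minimal sketch (the `count` value is just an example to tune):

```
instance_group [
  {
    count: 2
    kind: KIND_GPU
  }
]
```

Each instance can process a different request in parallel, which may help hide data-fetching latency at the cost of extra GPU memory.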
Hi @banasraf,
Thank you for the information (and your availability in general). I will definitely try increasing the number of model instances to see how it improves the throughput.
Regarding the issue of prefeeding Triton-DALI pipelines, I have been considering a temporary workaround while prefeeding is still not possible. We could provide a mega-batch (e.g., 1024 UUIDs) to the pipeline, and our module could then split it into mini-batches (e.g., 8 mini-batches of size 128) and handle the prefeeding internally.
However, our current code implementing this approach is not functioning properly, since Triton expects to receive a single batch of the same size as the input batch:
E1026 13:09:54.096342 959 dali_model_instance.cc:40] Cannot split a shape list with 128 samples to list shapes of total 1024 samples.
Do you think this issue is easier to fix compared to the general prefeeding problem? In other words, can Triton-DALI handle multi-part answers to queries?
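For context, multi-response (decoupled) execution, where a model may send zero, one, or many responses per request, is enabled in Triton through the model transaction policy in `config.pbtxt`:

```
model_transaction_policy {
  decoupled: true
}
```

Note that decoupled models require a client that can receive multiple responses, such as the gRPC streaming API.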
To see or test our code:

```shell
git clone https://github.com/fversaci/cassandra-dali-plugin.git -b triton
cd cassandra-dali-plugin
docker build -t cassandra-dali-plugin -f Dockerfile.triton .  # this might take some time
docker run --cap-add=sys_admin --rm -it --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 --name cass-dali cassandra-dali-plugin
# within the container
./start-and-fill-db.sh
./start-triton.sh  # don't close the container
# in a new shell on the host
docker exec -ti cass-dali fish
# within the container
python3 client-triton.py
```
Hi @banasraf,
Do you have any updates on adapting the decoupled model to our specific use case?
Meanwhile, I have modified our code so that:
- It now has three client implementations to play with: client-http-triton.py, client-grpc-triton.py, and client-grpc-stream-triton.py.
- The model produces a reduced output instead of the full tensors. This means that the bottleneck during testing is no longer in the Python clients, but rather in the Triton server pipeline. As a result, the throughput is much higher than before.
- I set the default max_batch_size in models/dali_cassandra/config.pbtxt to 256, which matches the batch size offered by the clients. When changing max_batch_size to, e.g., 512, the CassandraTriton plugin automatically splits the large batches into smaller ones, which produces this error: "Cannot split a shape list with 256 samples to list shapes of total 512 samples."
- The plugin now logs the input size of each batch it receives and the current status of its internal prefetching mechanism.
Thanks!