opendatahub-io / caikit-tgis-serving
License: Apache License 2.0
When deploying an LLM using the new Caikit+TGIS architecture introduced with #107, the TGIS container (i.e., transformer-container) fails to start if the cluster has FIPS cryptography enabled.
These are the 2 errors I got in the container logs:
There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory.
fips.c(145): OpenSSL internal error, assertion failed: FATAL FIPS SELFTEST FAILURE
Note: TRANSFORMERS_CACHE is actually set in the ServingRuntime.
This was found on an OpenShift 4.13.18 cluster with RHODS 2.1.2 (aka 1.32.2) and KServe 0.11 installed.
Caikit has released new versions of the libraries. We need to update the dependencies and validate that the image is correct.
The KServe install failed with the following error:
[perfci@f23-h33-000-6018r ~]$ oc -n redhat-ods-operator describe subs/rhods-operator
...
Conditions:
Message: constraints not satisfiable: no operators found from catalog rhods-catalog in namespace openshift-marketplace referenced by subscription rhods-operator, subscription rhods-operator exists
Reason: ConstraintsNotSatisfiable
Status: True
Type: ResolutionFailed
Last Transition Time: 2023-08-22T19:06:09Z
Message: targeted catalogsource openshift-marketplace/rhods-catalog missing
Reason: UnhealthyCatalogSourceFound
Status: True
Type: CatalogSourcesUnhealthy
Last Updated: 2023-08-22T19:06:09Z
Based on [this Slack thread](https://redhat-internal.slack.com/archives/C05742W6F7T/p1692610378820169), the knative-serving configuration needs to be updated to use the mTLS: true option in the SMCP.
With this issue, we need to update:
Remaining demos are:
Add those to documentation.
Users may need to set particular TGI(S) parameters when using the Caikit+TGIS runtime on KServe. An example is the model timeout parameter, which may need to be tweaked based on the model size.
We should document the procedure in our docs; a sketch of one possible approach follows below.
Additionally, in a future UI effort, this option should be exposed in the user interface.
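A minimal sketch of one way a runtime parameter could be injected today, assuming the demo ServingRuntime is named caikit-tgis-runtime; the container index and the EXAMPLE_TGIS_PARAM variable are purely illustrative, not confirmed TGIS settings:

# Illustrative only: add an env var to one of the ServingRuntime's containers.
# Point the container index at the TGIS container and use the real parameter name for your TGIS release.
oc patch servingruntime caikit-tgis-runtime -n <namespace> --type=json -p '[
  {"op": "add", "path": "/spec/containers/0/env", "value": [{"name": "EXAMPLE_TGIS_PARAM", "value": "600"}]}
]'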
As part of my automated scale test, I observe that the InferenceService sometimes reports as Loaded, but calls to the gRPC endpoint return errors.
Examples:
<command>
set -o pipefail;
i=0;
GRPCURL_DATA=$(cat "subprojects/llm-load-test/openorca-subset-006.json" | jq .dataset[$i].input )
grpcurl -insecure -d "$GRPCURL_DATA" -H "mm-model-id: flan-t5-small-caikit" u0-m7-predictor-watsonx-serving-scale-test-u0.apps.psap-watsonx-dgxa100.perf.lab.eng.bos.redhat.com:443 caikit.runtime.Nlp.NlpService/TextGenerationTaskPredict
</command>
<stderr> ERROR:
<stderr> Code: Unavailable
<stderr> Message: connections to all backends failing; last error: UNKNOWN: ipv4:127.0.0.1:8033: Failed to connect to remote host: Connection refused
<command>
set -o pipefail;
set -e;
dest=/mnt/logs/016__watsonx_serving__validate_model_all/u0-m6/answers.json
queries=/mnt/logs/016__watsonx_serving__validate_model_all/u0-m6/questions.json
rm -f "$dest" "$queries"
for i in $(seq 10); do
GRPCURL_DATA=$(cat "subprojects/llm-load-test/openorca-subset-006.json" | jq .dataset[$i].input )
echo $GRPCURL_DATA >> "$queries"
grpcurl -insecure -d "$GRPCURL_DATA" -H "mm-model-id: flan-t5-small-caikit" u0-m6-predictor-watsonx-serving-scale-test-u0.apps.psap-watsonx-dgxa100.perf.lab.eng.bos.redhat.com:443 caikit.runtime.Nlp.NlpService/TextGenerationTaskPredict >> "$dest"
echo "Call $i/10 passed"
done
</command>
<stdout> Call 1/10 passed
<stdout> Call 2/10 passed
<stdout> Call 3/10 passed
<stdout> Call 4/10 passed
<stdout> Call 5/10 passed
<stdout> Call 6/10 passed
<stdout> Call 7/10 passed
<stdout> Call 8/10 passed
<stdout> Call 9/10 passed
<stderr> ERROR:
<stderr> Code: Unavailable
<stderr> Message: error reading from server: EOF
Versions
NAME DISPLAY VERSION REPLACES PHASE
jaeger-operator.v1.47.1-5 Red Hat OpenShift distributed tracing platform 1.47.1-5 jaeger-operator.v1.47.0-2-0.1696814090.p Succeeded
kiali-operator.v1.65.9 Kiali Operator 1.65.9 kiali-operator.v1.65.8 Succeeded
rhods-operator.2.3.0 Red Hat OpenShift Data Science 2.3.0 rhods-operator.2.2.0 Succeeded
serverless-operator.v1.30.1 Red Hat OpenShift Serverless 1.30.1 serverless-operator.v1.30.0 Succeeded
servicemeshoperator.v2.4.4 Red Hat OpenShift Service Mesh 2.4.4-0 servicemeshoperator.v2.4.3 Succeeded
quay.io/opendatahub/text-generation-inference@sha256:0e3d00961fed95a8f8b12ed7ce50305acbbfe37ee33d37e81ba9e7ed71c73b69
quay.io/opendatahub/caikit-tgis-serving@sha256:ed920d21a4ba24643c725a96b762b114b50f580e6fee198f7ccd0bc73a95a6ab
See caikit/caikit#654, version bump in #210
"As a preview of this effort, we are pleased to announce that there is a new operator for Istio on Red Hat OpenShift for developer preview and early feedback. This new operator - temporarily called the “Sail Operator” (more on this below) will be the foundation for OpenShift Service Mesh 3."
https://cloud.redhat.com/blog/introducing-a-new-operator-for-istio-on-openshift
In caikit>=0.15.0, there's a new entrypoint, python -m caikit.runtime, that allows the server to run the HTTP server, the gRPC server, or both in parallel. This should be a simple update to the entrypoint in start-serving.sh.
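A minimal sketch of what the updated entrypoint in start-serving.sh could look like (the real script's environment setup is omitted):

#!/bin/bash
# Sketch only: start the unified Caikit runtime, which can serve gRPC, HTTP, or both depending on config
exec python -m caikit.runtime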
Issue to track the attached PR that enhances the quickstart script.
Add github action to build/test the image, possibly using the docker-compose smoke test proposed in #112
The previous Caikit+TGIS image has to be split into separate containers, but they will reside in the same SR/pod.
Add more ServingRuntime examples with:
Anything I'm forgetting, @Xaenalt ?
Currently the caikit install instructions for the t5/flan demo have you use specific namespaces. But namespaces are cluster-scoped resources, so their names must be unique: two users in the same cluster cannot both create a namespace named minio.
It would be good if the instructions let you specify the namespaces ahead of time (via bash env vars) for minio and the other components, and then used those vars, for example myminio and mydemo.
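A sketch of how the demo could be parameterized (the variable names are just examples):

# Example only: pick the namespace names once, then reuse them in every command of the demo
export MINIO_NS=${MINIO_NS:-myminio}
export DEMO_NS=${DEMO_NS:-mydemo}
oc new-project "$MINIO_NS"
oc new-project "$DEMO_NS"
# ...then pass "-n $MINIO_NS" / "-n $DEMO_NS" to the existing oc apply commands instead of hard-coded names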
While using the ServingRuntime definition from https://github.com/opendatahub-io/caikit-tgis-serving/pull/131/files#diff-94e62eddc4f3b075ea6c7d9eb86d45728d2c9ebb3c00ae43fd81863ccb6c01f9, which relies on REST calls (HTTP port 8080), I'm facing issues getting the model's answers.
The query returns an empty response. These are 2 examples of REST calls I tried:
curl -d '{"model_id": "<model_name>","inputs": "At what temperature does water boil?"}' -insecure <ksvc_url>:8080/api/v1/task/text-generation
curl --json '{"model_id": "<model_name>","inputs": "At what temperature does water boil?"}' -insecure <ksvc_url>:8080/api/v1/task/text-generation
I also tried getting the cluster CA secret and including it in the curl call, like this:
oc get secret -n openshift-ingress router-certs-default -o json | jq '.data."tls.crt"' | sed 's/"//g' | base64 -d > <filename>.crt
curl --json '{"model_id": "<model_name>","inputs": "At what temperature does water boil?"}' -insecure <ksvc_url>:8080/api/v1/task/text-generation --cacert <filename>.crt
Is there anything wrong with the way I'm performing the call? Please note that the same ServingRuntime, set to use the gRPC port, works just fine.
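One detail worth double-checking: curl has no single-dash -insecure option (it is parsed as a bundle of short flags), so the calls above may not be skipping certificate validation at all. A sketch with the long-form flag:

# curl needs --insecure (or -k); a single-dash -insecure is not the certificate-bypass flag
curl --insecure --json '{"model_id": "<model_name>", "inputs": "At what temperature does water boil?"}' \
  <ksvc_url>:8080/api/v1/task/text-generation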
After following the instructions to deploy and access Metrics on RHODS 1.32 v2 RC7 (brew.registry.redhat.io/rh-osbs/iib:568805), according to
https://github.com/opendatahub-io/caikit-tgis-serving/blob/main/demo/kserve/metrics.md,
which involves applying 2 configmaps into a test namespace (for example TEST_NS=watsonx),
the configmaps were created in the test namespace:
$▶ oc describe configmap/cluster-monitoring-config -n ${TEST_NS}
Name: cluster-monitoring-config
Namespace: watsonx
Labels: <none>
Annotations: <none>
Data
====
config.yaml:
----
enableUserWorkload: true
BinaryData
====
Events: <none>
$▶ oc describe configmap/user-workload-monitoring-config -n ${TEST_NS}
Name: user-workload-monitoring-config
Namespace: watsonx
Labels: <none>
Annotations: <none>
Data
====
config.yaml:
----
prometheus:
logLevel: debug
retention: 15d #Change as needed
BinaryData
====
Events: <none>
But the expected metrics for caikit, tgi, or istio were not observed.
Looking at the default namespace openshift-monitoring, we can see that the original configmap was not changed.
Apparently, the expected metrics do show up if the default configmaps (cluster-monitoring-config in openshift-monitoring and the one in openshift-user-workload-monitoring) are updated instead.
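A hedged sketch of applying the provided manifests into the namespaces the platform actually reads them from (assuming uwm-cm-enable.yaml defines cluster-monitoring-config and uwm-cm-conf.yaml defines user-workload-monitoring-config, as their names suggest):

# cluster-monitoring-config is only read from openshift-monitoring;
# user-workload-monitoring-config only from openshift-user-workload-monitoring
oc apply -f custom-manifests/metrics/uwm-cm-enable.yaml -n openshift-monitoring
oc apply -f custom-manifests/metrics/uwm-cm-conf.yaml -n openshift-user-workload-monitoring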
Related feature: opendatahub-io/caikit#3
At the moment, the scripts only support RHODS / preview RHODS.
However, the new ODH operator v2.1 is out, so we need to enhance the scripts to support OpenDataHub.
The script for ODH supports ODH operator 1.9, which uses the v1alpha1 API. From ODH 1.10, it will use the v1 API like RHODS, so we need to remove this part.
https://github.com/opendatahub-io/caikit-tgis-serving/pull/84/files#r1331736041
The SMMR update part of the scripts is not needed anymore, because reconciliation to control the SMMR was added to odh-model-controller.
grpcurl has an --insecure parameter that bypasses certificate validation. I'm now trying to make this work with the grpc library in Python. The Python implementation does not allow bypassing certificate validation for TLS encryption (grpcurl is written in Go, for which the bypass is implemented, therefore it works there).
So to get it to work, you have to export the SSL certificate and use it when defining the channel, like this:
import grpc

# Use the exported server certificate as the channel's root CA
with open('certificate.pem', 'rb') as f:
    creds = grpc.ssl_channel_credentials(f.read())
server_address = inference_server_url
channel = grpc.secure_channel(server_address, creds)
This works on some servers, but not on others, where you get this error:
_MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses; last error: UNKNOWN: ipv4:52.87.25.239:443: Peer name caikit-example-isvc-predictor-kserve-demo.apps.aisrhods-dell.bj30.p1.openshiftapps.com is not in peer certificate"
debug_error_string = "UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: ipv4:52.87.25.239:443: Peer name caikit-example-isvc-predictor-kserve-demo.apps.aisrhods-dell.bj30.p1.openshiftapps.com is not in peer certificate {grpc_status:14, created_time:"2023-10-09T16:09:55.378527928+00:00"}"
>
The self-signed certificate format is identical in both cases (only the CN or Organization changes, obviously), and the installation of the Caikit+TGIS stack is identical as far as we can tell.
So to solve the issue it's either:
(grpcurl is not an option.)
After Caikit-TGIS image splitting, the goal is:
Based on watsonx requirements, we should make at least these metrics available:
However, users won't find metrics with the same names, and some of them need to be computed by combination. Examples:
tgi_batch_inference_count - tgi_batch_inference_success
sum(container_memory_working_set_bytes{pod='<isvc_predictor_pod_name>',namespace='<isvc_namespace>',container=''}) BY (pod, namespace), plus adding the time-period syntax
Moreover, there are additional metrics that deserve to be documented, like tgi_request_generated_tokens_count.
Not all service mesh resources are being removed by the uninstall script. See https://docs.openshift.com/container-platform/4.13/service_mesh/v2x/removing-ossm.html for the list of resources to remove.
With a fresh cluster, the scripts/doc are not working because the SMCP is not running properly, with this message:
- lastTransitionTime: '2023-10-20T12:54:10Z'
message: >-
Dependency "Jaeger CRD" is missing: error: no matches for kind "Jaeger"
in version "jaegertracing.io/v1"
reason: DependencyMissingError
status: 'False'
type: Reconciled
- lastTransitionTime: '2023-10-20T12:54:10Z'
message: >-
Dependency "Jaeger CRD" is missing: error: no matches for kind "Jaeger"
in version "jaegertracing.io/v1"
reason: DependencyMissingError
status: 'False'
type: Ready
I am not sure if it is a ServiceMesh issue or not, but it blocked the KServe installation, so we need to add Jaeger as a prerequisite. By the way, we removed Jaeger based on this confirmation message:
The Kiali and Jaeger Tracing operators are not required to be installed
@bartoszmajsak do you have any idea?
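For reference, a hedged sketch of installing the Jaeger operator as a prerequisite; the package name, channel, and namespace below are assumptions and should be checked against the operator catalog:

# Assumption: the Red Hat distributed tracing (Jaeger) operator is installed cluster-wide before the SMCP
oc apply -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: jaeger-product
  namespace: openshift-operators
spec:
  channel: stable
  name: jaeger-product
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF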
At the moment, there is no readiness probe. The gRPC health probe would be a good way to check readiness:
https://github.com/grpc-ecosystem/grpc-health-probe/
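A rough sketch of how it could be wired, assuming the Caikit runtime registers the standard grpc.health.v1 service and that 8085 is its gRPC port (adjust to the port configured in the ServingRuntime):

# Assumption: gRPC port 8085 and the standard health service are exposed by the caikit runtime
grpc_health_probe -addr=localhost:8085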
Developing and testing of the caikit-tgis-serving component on an Apple laptop (Intel and ARM chipsets) does not seem to be supported by this project.
We need to find a way to allow developers using Apple hardware to make meaningful contributions to the project. To that end, we should:
An initial pass at the problem has been discussed in #171.
There is a synchronization issue at the launch of the Pod with the current images:
The Pod is Ready:
flan-t5-small-gpu-predictor-00001-deployment-6768c548d8-8btqc 4/4 Running 0 41s
The model is Loaded in the inference service:
modelStatus:
  copies:
    failedCopies: 0
    totalCopies: 1
  states:
    activeModelState: Loaded
    targetModelState: Loaded
HOST=...
METHOD=caikit.runtime.Nlp.NlpService/TextGenerationTaskPredict
while true; do
GRPCURL_DATA=$(echo "{'max_new_tokens': 25, 'min_new_tokens': 25, 'text': 'At what temperature does liquid Nitrogen boil?'}" | sed "s/'/\"/g")
grpcurl -insecure -d "$GRPCURL_DATA" -H "mm-model-id: flan-t5-small-caikit" $HOST $METHOD
sleep 1
done
ERROR:
Code: Internal
Message: Unhandled exception during prediction
ERROR:
Code: Internal
Message: Unhandled exception during prediction
ERROR:
Code: Internal
Message: Unhandled exception during prediction
ERROR:
Code: Internal
Message: Unhandled exception during prediction
ERROR:
Code: Internal
Message: Unhandled exception during prediction
ERROR:
Code: Internal
Message: Unhandled exception during prediction
ERROR:
Code: Internal
Message: Unhandled exception during prediction
{
"generated_text": "74 degrees F.C., a temperature of 74 degrees F.C., a temperature of ",
"generated_tokens": "25",
"finish_reason": "MAX_TOKENS",
"producer_id": {
"name": "Text Generation",
"version": "0.1.0"
},
"input_token_count": "10"
}
In the transformer-container logs, we can see this error:
{"channel": "GP-SERVICR-I", "exception": null, "level": "warning", "log_code": "<RUN49049070W>", "message": "<_InactiveRpcError of RPC that terminated with:
\tstatus = StatusCode.UNAVAILABLE
\tdetails = \"failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:8033: Failed to connect to remote host: Connection refused\"
\tdebug_error_string = \"UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:8033: Failed to connect to remote host: Connection refused {created_time:\"2023-10-24T11:48:51.016344787+00:00\", grpc_status:14}\"
>", "model_id": "flan-t5-small-caikit", "num_indent": 0, "stack_trace": "Traceback (most recent call last):
File \"/caikit/lib/python3.9/site-packages/caikit/runtime/servicers/global_predict_servicer.py\", line 283, in _handle_predict_exceptions
yield
File \"/caikit/lib/python3.9/site-packages/caikit/runtime/servicers/global_predict_servicer.py\", line 260, in predict_model
response = work.do()
File \"/caikit/lib/python3.9/site-packages/caikit/runtime/work_management/abortable_action.py\", line 118, in do
return self.__work_thread.get_or_throw()
File \"/caikit/lib/python3.9/site-packages/caikit/core/toolkit/destroyable_thread.py\", line 188, in get_or_throw
raise self.__runnable_exception
File \"/caikit/lib/python3.9/site-packages/caikit/core/toolkit/destroyable_thread.py\", line 124, in run
self.__runnable_result = self.runnable_func(
File \"/caikit/lib/python3.9/site-packages/caikit_nlp/modules/text_generation/text_generation_tgis.py\", line 237, in run
return self.tgis_generation_client.unary_generate(
File \"/caikit/lib/python3.9/site-packages/caikit_nlp/toolkit/text_generation/tgis_utils.py\", line 315, in unary_generate
batch_response = self.tgis_client.Generate(request)
File \"/caikit/lib64/python3.9/site-packages/grpc/_channel.py\", line 1161, in __call__
return _end_unary_response_blocking(state, call, False, None)
File \"/caikit/lib64/python3.9/site-packages/grpc/_channel.py\", line 1004, in _end_unary_response_blocking
raise _InactiveRpcError(state) # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
\tstatus = StatusCode.UNAVAILABLE
\tdetails = \"failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:8033: Failed to connect to remote host: Connection refused\"
\tdebug_error_string = \"UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: ipv4:127.0.0.1:8033: Failed to connect to remote host: Connection refused {created_time:\"2023-10-24T11:48:51.016344787+00:00\", grpc_status:14}\"
>
", "thread_id": 140123215742720, "timestamp": "2023-10-24T11:48:51.017178"}
quay.io/opendatahub/text-generation-inference@sha256:0e3d00961fed95a8f8b12ed7ce50305acbbfe37ee33d37e81ba9e7ed71c73b69
quay.io/opendatahub/caikit-tgis-serving@sha256:adb8d1153b900e304fbcc934189c68cffea035d4b82848446c72c3d5554ee0ca
caikit_tgit_config.yaml.log
inference_service.yaml.log
serving_runtime.yaml.log
The new documentation should have sections that answer the following questions:
In https://github.com/opendatahub-io/caikit-tgis-serving/blob/main/demo/kserve/metrics.md, it is mentioned to use -n $TEST_NS when applying:
custom-manifests/metrics/uwm-cm-enable.yaml and custom-manifests/metrics/uwm-cm-conf.yaml
However, since the namespace is already defined in the yamls, this fails with:
$▶ oc apply -f ./custom-manifests/metrics/uwm-cm-enable.yaml -n $TEST_NS
error: the namespace from the provided object "openshift-monitoring" does not match the namespace "kserve-demo".
You must pass '--namespace=openshift-monitoring' to perform this operation.
Please update the yamls and remove the namespace item so that the above command works.
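Until the yamls are updated, a workaround is to pass the namespace each object already declares, as the error message suggests:

# Workaround sketch: apply into the namespace declared in the manifest (and similarly for uwm-cm-conf.yaml)
oc apply -f ./custom-manifests/metrics/uwm-cm-enable.yaml -n openshift-monitoring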
Related feature: opendatahub-io/caikit#3
Currently, the July 28th caikit-nlp (the one used in pr-25) is not able to load models, failing with errors of the following form:
{"channel": "TXT_GEN", "exception": null, "level": "error", "log_code": "<NLP51672289E>", "message": "exception raised: ValueError('value check failed: Cannot run model /opt/models/flan-t5-small-caikit/artifacts with TGIS locally since it has no base artifacts')", "num_indent": 0, "thread_id": 139931863193344, "timestamp": "2023-07-28T18:55:16.932746"}
{"channel": "MODEL-LOADER", "exception": null, "level": "error", "log_code": "<RUN62912924E>", "message": "load failed when processing path: /opt/models/flan-t5-small-caikit with error: ValueError('value check failed: Cannot run model /opt/models/flan-t5-small-caikit/artifacts with TGIS locally since it has no base artifacts')", "model_id": "flan-t5-small-caikit", "num_indent": 0, "thread_id": 139931863193344, "timestamp": "2023-07-28T18:55:16.933255"}
This seems to have something to do with the Caikit TGIS local backend.
In order to solve typical issues, it would be a good idea to start documenting this in FAQ style.
This doc will help users solve their issues by themselves.
Using the latest version of https://github.com/opendatahub-io/caikit-tgis-serving/blob/main/demo/kserve/scripts/generate-wildcard-certs.sh, the script fails with the following error:
+ openssl req -x509 -newkey rsa:2048 -sha256 -days 3560 -nodes -subj '/CN=<value>' -extensions san -config <configpath> -CA <cert_path> -CAkey <key_path> -keyout <keyout_path> -out <out_path>
req: Unrecognized flag CA
req: Use -help for summary.
The cause was an outdated OpenSSL package installed on the system. It could happen to anyone using the script, hence it would be good to explicitly state the minimum requirements.
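One concrete check worth adding to the docs: the -CA/-CAkey options of openssl req appear only in OpenSSL 3.x, so verifying the installed release up front avoids this failure:

# Verify the installed OpenSSL release before running the script; 3.x is needed for "req -CA"
openssl version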
When trying to load a model in a Pod running with a memory limit too low, the out-of-memory error message is swallowed by TGIS and hard to troubleshoot (in addition to Caikit swallowing the TGIS error):
2023-09-26T09:40:45.259993Z INFO text_generation_launcher: Starting shard 0
Shard 0: supports_causal_lm = False, supports_seq2seq_lm = True
2023-09-26T09:40:55.279072Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-09-26T09:40:57.571196Z ERROR text_generation_launcher: Shard 0 failed to start:
2023-09-26T09:40:57.571219Z INFO text_generation_launcher: Shutting down shards
{"channel": "TGISPROC", "exception": null, "level": "error", "log_code": "<MTS11752287E>", "message": "exception raised: RuntimeError('TGIS failed to boot up with the model. See logs for details')", "num_indent": 0, "thread_id": 140590947739392, "timestamp": "2023-09-26T09:40:59.288074"}
While troubleshooting it, I observed that even the TGIS return code does not reflect the OOM error, although my attempts confirmed that not giving enough memory was the cause of the load failure:
sh-4.4$ text-generation-launcher --num-shard 1 --model-name /mnt/models/flan-t5-large/artifacts/ --port 3000;
2023-09-26T11:42:33.150862Z INFO text_generation_launcher: Launcher args: Args { model_name: "/mnt/models/flan-t5-large/artifacts/", revision: None, deployment_framework: "hf_transformers", dtype: None, dtype_str: Some("float16"), num_shard: Some(1), max_concurrent_requests: 150, max_sequence_length: 4096, max_new_tokens: 1024, max_batch_size: 256, max_batch_weight: Some(47458400), max_prefill_weight: None, max_waiting_tokens: 24, port: 3000, grpc_port: 8033, shard_uds_path: "/tmp/text-generation-server", master_addr: "localhost", master_port: 29500, json_output: false, tls_cert_path: None, tls_key_path: None, tls_client_ca_cert_path: None, output_special_tokens: false, cuda_process_memory_fraction: 1.0 }
2023-09-26T11:42:33.151097Z INFO text_generation_launcher: Starting shard 0
Shard 0: supports_causal_lm = False, supports_seq2seq_lm = True
2023-09-26T11:42:43.180572Z INFO text_generation_launcher: Waiting for shard 0 to be ready...
2023-09-26T11:42:50.384697Z ERROR text_generation_launcher: Shard 0 failed to start:
2023-09-26T11:42:50.384723Z INFO text_generation_launcher: Shutting down shards
sh-4.4$ echo $?
1
The HuggingFace PR was merged and the odh caikit-nlp repo was updated. As a result, the caikit-tgis-serving runtime needs a new image built with the latest source and library.
In order to query a model using the Caikit+TGIS runtime, we must pass the model_id parameter in the HTTP payload (or the mm-model-id header for gRPC).
However, it can have any value: as long as the endpoint is correct, the model responds.
In the following screenshot you can see 3 calls:
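Independent of the screenshot, a minimal reproduction sketch (placeholders as in the other examples; the model_id value is deliberately bogus):

# Sketch: the call still returns a generated answer even though model_id is not the deployed model's name
curl --insecure --json '{"model_id": "any-arbitrary-value", "inputs": "At what temperature does water boil?"}' \
  <ksvc_url>:8080/api/v1/task/text-generation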
Some resources of Knative are not being removed by the uninstall script. See https://docs.openshift.com/container-platform/4.8/serverless/install/removing-openshift-serverless.html#serverless-deleting-crds_removing-openshift-serverless for more details.
When I create a ServingRuntime+InferenceService with some incorrect parameters, Caikit cannot load the model.
{"channel": "MODEL-LOADER", "exception": null, "level": "error", "log_code": "<RUN62912924E>", "message": "load failed when processing path: /mnt/models/flan-t5-small-caikit with error: RuntimeError('TGIS failed to boot up with the model. See logs for details')", "model_id": "flan-t5-small-caikit", "num_indent": 0, "thread_id": 140660900353792, "timestamp": "2023-09-21T19:39:45.781105"}
This part is expected. However, the InferenceService still shows the model as Loaded, which is unexpected:
modelStatus:
  copies:
    failedCopies: 0
    totalCopies: 1
  states:
    activeModelState: Loaded
    targetModelState: Loaded
  transitionStatus: UpToDate
From the requirements doc:
For example, the upstream version will be updated and need to incorporate in RHODS as appropriate without impacting deployed models. A new RHODS release must not break model serving functionality.
Caikit standalone image/SR needs to be created.
Several steps required:
The current version of caikit-tgis-serving exposes some gRPC function arguments (and services) in random order. Two examples:
The critical part of it (the random argument order) is already solved in caikit/caikit-nlp#237 on main. A PR is open to fix the service method order.
The caikit-nlp git ref should be updated before publishing the next release of caikit-tgis-serving.
Per a comment in openshift-ci troubleshooting, it's possible our build/push workflow isn't quite correct: Thread.
It's not 100% clear to me, but it seems like the mirror job also builds, or at least waits for the build; the comment from the team there makes it unclear.
It's also a good chance to review the entirety of our test workflows with the openshift-ci team, and use the 'request consultation' option to do an overall review of our openshift-ci jobs for better maintainability in the future.
OpenDataHub Operator v2.1 changed the API.
The DataScienceCluster resource at [this address](https://github.com/opendatahub-io/caikit-tgis-serving/blob/main/demo/kserve/custom-manifests/opendatahub/kserve-dsc.yaml) doesn't work with the latest RC update (true/false vs Managed/Removed).
Due to this change, the manifests need to be updated.
#107 added a caikit image which relies on a separate tgis container. This also includes an example setup of the ServingRuntime / InferenceService manifest that can be deployed and tested, but the documentation is missing and/or outdated.