Comments (12)
Looks like you are using KServe serverless mode, which uses Knative.
Knative always tries to resolve image tags to digests, which is an operation that requires access to the registry (reference: https://knative.dev/docs/serving/tag-resolution/)
Thus, you may want to try using the digest of your image in the InferenceService, instead of 0.9.0-gpu.
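For example, pinning by digest would look roughly like this (a sketch; the sha256 value is a placeholder for your image's real digest, which you can get with docker inspect --format='{{index .RepoDigests 0}}' yurkoff/torchserve-kfs:0.9.0-gpu):
spec:
  predictor:
    containers:
      - name: kserve-container
        # Digest-pinned image: Knative does not need registry access
        # to resolve the tag, since the digest is already given.
        image: docker.io/yurkoff/torchserve-kfs@sha256:<digest>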
Hi, @israel-hdez, @spolti! Thanks a lot! This works for me!
I edited the ConfigMap config-deployment
microk8s kubectl edit configmap config-deployment -n knative-serving
by adding the following line:
registries-skipping-tag-resolving: "kind.local,ko.local,dev.local,index.docker.io"
and the local image was successfully applied in InferenceService.
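For reference, the edited ConfigMap ends up looking roughly like this (a sketch; unrelated keys omitted):
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-deployment
  namespace: knative-serving
data:
  # Registries listed here are exempt from Knative's tag-to-digest resolution.
  registries-skipping-tag-resolving: "kind.local,ko.local,dev.local,index.docker.io"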
Hi, I never used microk8s before, but there are a few things that might be causing it:
First, shouldn't you use the complete image name instead of just yurkoff/torchserve-kfs:0.9.0-gpu?
Secondly, this looks strange:
"https://index.docker.io/v2 /"
notice the space before the last /.
You might need to investigate why this API address has an extra space at the end.
Hello!
Thanks for the answer.
There is no space there; apparently it was copied incorrectly from Linux. I tried using the full name (docker.io/yurkoff/torchserve-kfs:0.9.0-gpu) too.
Revision "llm-predictor-00001" failed with message: Unable to fetch image "docker.io/yurkoff/torchserve-kfs:0.9.0-gpu": failed to resolve image to digest: Get "https://index.docker.io/v2/": read tcp 10.1.22.219:40004->54.236.113.205:443: read: connection reset by peer.
Interestingly, Kubeflow itself deploys automatically from the local images, but the InferenceService does not.
you might need to do this in your isvc: https://kserve.github.io/website/0.11/modelserving/v1beta1/custom/custom_model/#deploy-the-rest-custom-serving-runtime-on-kserve
Using the SHA digest might be helpful as well.
The PodSpec is exposed inline in the isvc, so any PodSpec field is available, as in the linked example.
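In isvc terms, something like this (an untested sketch, reusing your image name):
spec:
  predictor:
    containers:
      - name: kserve-container
        image: docker.io/yurkoff/torchserve-kfs:0.9.0-gpu
        # Any PodSpec container field can go here, for example:
        imagePullPolicy: IfNotPresent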
I don't quite understand what exactly I need to do. I built the image with Docker. It downloads and deploys successfully in a cluster with Internet access. From that cluster I export the image as a tar file and import it into the cluster without Internet access. For some reason the InferenceService thinks the image does not exist and tries to download it, while a Deployment using the same image considers it present.
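Concretely, roughly these commands (the tar filename is arbitrary; microk8s uses containerd, hence ctr):
docker save yurkoff/torchserve-kfs:0.9.0-gpu -o torchserve-kfs.tar
# on the offline cluster:
microk8s ctr images import torchserve-kfs.tar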
See the InferenceService structure from the link I sent you. imagePullPolicy is a property of the container, which goes under the containers field.
Sorry, but I didn't find any mention of imagePullPolicy in the link provided. However, this parameter is in the description of V1beta1TorchServeSpec.
I tried setting up a local registry. I pushed my image yurkoff/torchserve-kfs:0.9.0-gpu there, but I get the following error:
Message: Revision "llm-predictor-00001" failed with message: Unable to fetch image "127.0.0.1:32000/yurkoff/torchserve-kfs:0.9.0-gpu": failed to resolve image to digest: Get "https://127.0.0.1:32000/v2/": dial tcp 127.0.0.1:32000: connect: connection refused; Get "http://127.0.0.1:32000/v2/": dial tcp 127.0.0.1:32000: connect: connection refused.
This is despite the registry being available:
curl -v http://127.0.0.1:32000/v2/
* Trying 127.0.0.1:32000...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 32000 (#0)
> GET /v2/ HTTP/1.1
> Host: 127.0.0.1:32000
> User-Agent: curl/7.68.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Length: 2
< Content-Type: application/json; charset=utf-8
< Docker-Distribution-Api-Version: registry/2.0
< X-Content-Type-Options: nosniff
< Date: Thu, 16 May 2024 11:03:13 GMT
<
{}
* Connection #0 to host 127.0.0.1 left intact
curl -v http://127.0.0.1:32000/v2/_catalog
* Trying 127.0.0.1:32000...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 32000 (#0)
> GET /v2/_catalog HTTP/1.1
> Host: 127.0.0.1:32000
> User-Agent: curl/7.68.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Content-Type: application/json; charset=utf-8
< Docker-Distribution-Api-Version: registry/2.0
< X-Content-Type-Options: nosniff
< Date: Thu, 16 May 2024 12:00:45 GMT
< Content-Length: 44
<
{"repositories":["yurkoff/torchserve-kfs"]}
* Connection #0 to host 127.0.0.1 left intact
I can't understand what information the InferenceService wants to fetch from outside if everything is available locally.
Hi, what I meant was to use this structure:
spec:
  predictor:
    containers:
      - name: kserve-container
        image: xxx
        ports: xxx
or you can define it in your custom Serving Runtime as well.
Hi, @spolti !
I tried this, same result.
My YAML file:
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: "llm"
  namespace: "kubeflow-megaputer"
spec:
  predictor:
    containers:
      - name: kserve-container
        image: "yurkoff/torchserve-kfs:0.9.0-gpu"
        imagePullPolicy: IfNotPresent
        # storageUri: pvc://torchserve-claim/llm
        env:
          - name: STORAGE_URI
            value: pvc://torchserve-claim/llm
        resources:
          requests:
            cpu: "2"
            memory: 16Gi
            nvidia.com/gpu: "1"
          limits:
            cpu: "4"
            memory: 24Gi
            nvidia.com/gpu: "1"
Nice @israel-hdez, didn't spot it :D