
Comments (10)

sivanantha321 commented on September 24, 2024

@VikasAbhishek Can you post the response of http://${Host}:${Port}/v1/models


VikasAbhishek commented on September 24, 2024

@VikasAbhishek Can you post the response of http://${Host}:${Port}/v1/models

* Trying 127.0.0.1:8080...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /v1/models HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.68.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 404 Not Found
< date: Mon, 13 May 2024 06:49:27 GMT
< server: istio-envoy
< content-length: 0
<
* Connection #0 to host localhost left intact


sivanantha321 commented on September 24, 2024

Have you added the Host header? Your curl above sends Host: localhost:8080, which the istio gateway (note server: istio-envoy in the 404 response) will not route to the InferenceService.
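For reference, a minimal sketch of such a check (the InferenceService name, namespace, and the port-forward to localhost:8080 are assumptions based on the curl output above):

# Resolve the external hostname of the InferenceService from its status.
SERVICE_HOSTNAME=$(kubectl get inferenceservice <name> -n <namespace> -o jsonpath='{.status.url}' | cut -d "/" -f 3)
# Pass it explicitly, since the gateway routes requests on the Host header.
curl -v -H "Host: ${SERVICE_HOSTNAME}" http://localhost:8080/v1/models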


VikasAbhishek commented on September 24, 2024

[comment body not captured in this mirror]

VikasAbhishek commented on September 24, 2024

[comment body not captured in this mirror]

sivanantha321 commented on September 24, 2024

@VikasAbhishek The response is not visible in your comment. In any case, the /v1/models response lets you verify that the model is ready and shows the model name; use that name in your inference request. If the response is empty, the model is likely not loaded; in that case, please check the model server logs.
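As an illustrative sketch (hostname and model name are placeholders; the endpoints are the v1 ones quoted above):

# List the served models; the names returned are the ones to use for inference.
curl -H "Host: ${SERVICE_HOSTNAME}" http://localhost:8080/v1/models
# Check a single model; a response like {"name": "<model-name>", "ready": true} means it is loaded.
curl -H "Host: ${SERVICE_HOSTNAME}" http://localhost:8080/v1/models/<model-name>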


VikasAbhishek commented on September 24, 2024

[comment body not captured in this mirror]

sivanantha321 commented on September 24, 2024

@VikasAbhishek As mentioned earlier, the model is not loaded. Please verify the model server logs and the storage-initializer logs.
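For example (a sketch; pod name and namespace are placeholders, the container names are the KServe defaults):

# Model server logs from the predictor pod:
kubectl logs <predictor-pod> -n <namespace> -c kserve-container
# The storage initializer runs as an init container on the same pod:
kubectl logs <predictor-pod> -n <namespace> -c storage-initializer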


fschlz commented on September 24, 2024

I have a similar issue using an InferenceService on Azure AKS with an MLflow-tracked model.

This is the structure of my model directory:

└── model
    ├── MLmodel
    ├── conda.yaml
    ├── metadata
    │   ├── MLmodel
    │   ├── conda.yaml
    │   ├── python_env.yaml
    │   └── requirements.txt
    ├── model-settings.json
    ├── model.pkl
    ├── python_env.yaml
    └── requirements.txt

I followed the Azure guide here and the MLflow guide here, but I cannot seem to get the InferenceService to deploy correctly.

This is my model.yml:

apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "wine-classifier"
  namespace: "mlflow-kserve-test"
spec:
  predictor:
    serviceAccountName: sa
    model:
      modelFormat:
        name: mlflow
      protocolVersion: v2
      storageUri: "https://{SA}.blob.core.windows.net/azureml/ExperimentRun/dcid.{RUNID}/model"

I tried using storageUri: "https://{SA}.blob.core.windows.net/azureml/ExperimentRun/dcid.{RUNID}/model/model.pkl", but then the service doesn't start properly because it cannot build the environment.

Here are the log outputs from the kserve-container:

Environment tarball not found at '/mnt/models/environment.tar.gz'
Environment not found at './envs/environment'
2024-06-05 14:36:17,236 [mlserver.parallel] DEBUG - Starting response processing loop...
2024-06-05 14:36:17,238 [mlserver.rest] INFO - HTTP server running on http://0.0.0.0:8080
INFO:     Started server process [1]
INFO:     Waiting for application startup.
2024-06-05 14:36:17,267 [mlserver.metrics] INFO - Metrics server running on http://0.0.0.0:8082
2024-06-05 14:36:17,267 [mlserver.metrics] INFO - Prometheus scraping endpoint can be accessed on http://0.0.0.0:8082/metrics
INFO:     Started server process [1]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
2024-06-05 14:36:18,568 [mlserver.grpc] INFO - gRPC server running on http://0.0.0.0:9000
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
INFO:     Uvicorn running on http://0.0.0.0:8082 (Press CTRL+C to quit)
2024/06/05 14:36:19 WARNING mlflow.pyfunc: Detected one or more mismatches between the model's dependencies and the current Python environment:
- mlflow (current: 2.3.1, required: mlflow==2.12.2)
- cloudpickle (current: 2.2.1, required: cloudpickle==3.0.0)
- numpy (current: 1.23.5, required: numpy==1.24.4)
- packaging (current: 23.1, required: packaging==23.2)
- psutil (current: uninstalled, required: psutil==5.9.8)
- pyyaml (current: 6.0, required: pyyaml==6.0.1)
- scikit-learn (current: 1.2.2, required: scikit-learn==1.3.2)
- scipy (current: 1.9.1, required: scipy==1.10.1)
To fix the mismatches, call `mlflow.pyfunc.get_model_dependencies(model_uri)` to fetch the model's environment and install dependencies using the resulting environment file.
2024-06-05 14:36:19,791 [mlserver] INFO - Couldn't load model 'wine-classifier'. Model will be removed from registry.
2024-06-05 14:36:19,791 [mlserver.parallel] ERROR - An error occurred processing a model update of type 'Load'.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/mlserver/parallel/worker.py", line 158, in _process_model_update
    await self._model_registry.load(model_settings)
  File "/opt/conda/lib/python3.8/site-packages/mlserver/registry.py", line 293, in load
    return await self._models[model_settings.name].load(model_settings)
  File "/opt/conda/lib/python3.8/site-packages/mlserver/registry.py", line 148, in load
    await self._load_model(new_model)
  File "/opt/conda/lib/python3.8/site-packages/mlserver/registry.py", line 165, in _load_model
    model.ready = await model.load()
  File "/opt/conda/lib/python3.8/site-packages/mlserver_mlflow/runtime.py", line 155, in load
    self._model = mlflow.pyfunc.load_model(model_uri)
  File "/opt/conda/lib/python3.8/site-packages/mlflow/pyfunc/__init__.py", line 582, in load_model
    model_meta = Model.load(os.path.join(local_path, MLMODEL_FILE_NAME))
  File "/opt/conda/lib/python3.8/site-packages/mlflow/models/model.py", line 468, in load
    return cls.from_dict(yaml.safe_load(f.read()))
  File "/opt/conda/lib/python3.8/site-packages/mlflow/models/model.py", line 478, in from_dict
    model_dict["signature"] = ModelSignature.from_dict(model_dict["signature"])
  File "/opt/conda/lib/python3.8/site-packages/mlflow/models/signature.py", line 83, in from_dict
    inputs = Schema.from_json(signature_dict["inputs"])
  File "/opt/conda/lib/python3.8/site-packages/mlflow/types/schema.py", line 360, in from_json
    return cls([read_input(x) for x in json.loads(json_str)])
  File "/opt/conda/lib/python3.8/site-packages/mlflow/types/schema.py", line 360, in <listcomp>
    return cls([read_input(x) for x in json.loads(json_str)])
  File "/opt/conda/lib/python3.8/site-packages/mlflow/types/schema.py", line 358, in read_input
    return TensorSpec.from_json_dict(**x) if x["type"] == "tensor" else ColSpec(**x)
TypeError: __init__() got an unexpected keyword argument 'required'
2024-06-05 14:36:19,793 [mlserver] INFO - Couldn't load model 'wine-classifier'. Model will be removed from registry.
2024-06-05 14:36:19,795 [mlserver.parallel] ERROR - An error occurred processing a model update of type 'Unload'.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/mlserver/parallel/worker.py", line 160, in _process_model_update
    await self._model_registry.unload_version(
  File "/opt/conda/lib/python3.8/site-packages/mlserver/registry.py", line 302, in unload_version
    await model_registry.unload_version(version)
  File "/opt/conda/lib/python3.8/site-packages/mlserver/registry.py", line 201, in unload_version
    model = await self.get_model(version)
  File "/opt/conda/lib/python3.8/site-packages/mlserver/registry.py", line 237, in get_model
    raise ModelNotFound(self._name, version)
mlserver.errors.ModelNotFound: Model wine-classifier not found
2024-06-05 14:36:19,796 [mlserver] ERROR - Some of the models failed to load during startup!
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/mlserver/server.py", line 125, in start
    await asyncio.gather(
  File "/opt/conda/lib/python3.8/site-packages/mlserver/registry.py", line 293, in load
    return await self._models[model_settings.name].load(model_settings)
  File "/opt/conda/lib/python3.8/site-packages/mlserver/registry.py", line 148, in load
    await self._load_model(new_model)
  File "/opt/conda/lib/python3.8/site-packages/mlserver/registry.py", line 161, in _load_model
    model = await callback(model)
  File "/opt/conda/lib/python3.8/site-packages/mlserver/parallel/registry.py", line 152, in load_model
    loaded = await pool.load_model(model)
  File "/opt/conda/lib/python3.8/site-packages/mlserver/parallel/pool.py", line 74, in load_model
    await self._dispatcher.dispatch_update(load_message)
  File "/opt/conda/lib/python3.8/site-packages/mlserver/parallel/dispatcher.py", line 123, in dispatch_update
    return await asyncio.gather(
  File "/opt/conda/lib/python3.8/site-packages/mlserver/parallel/dispatcher.py", line 138, in _dispatch_update
    return await self._dispatch(worker_update)
  File "/opt/conda/lib/python3.8/site-packages/mlserver/parallel/dispatcher.py", line 146, in _dispatch
    return await self._wait_response(internal_id)
  File "/opt/conda/lib/python3.8/site-packages/mlserver/parallel/dispatcher.py", line 152, in _wait_response
    inference_response = await async_response
mlserver.parallel.errors.WorkerError: builtins.TypeError: __init__() got an unexpected keyword argument 'required'
2024-06-05 14:36:19,796 [mlserver.parallel] INFO - Waiting for shutdown of default inference pool...
2024-06-05 14:36:19,997 [mlserver.parallel] INFO - Shutdown of default inference pool complete
2024-06-05 14:36:19,997 [mlserver.grpc] INFO - Waiting for gRPC server shutdown
2024-06-05 14:36:20,001 [mlserver.grpc] INFO - gRPC server shutdown complete
INFO:     Shutting down
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
INFO:     Finished server process [1]
INFO:     Application shutdown complete.
INFO:     Finished server process [1]

It manages to create the environment, but cannot load the model.

Calling mlflow.pyfunc.load_model(model_uri) locally loads the model, and testing with mlserver start . was also successful.
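(A sketch of those local checks, assuming the model directory layout shown above:)

# Load the model directly with mlflow; this worked locally.
python -c "import mlflow.pyfunc; print(mlflow.pyfunc.load_model('./model'))"
# Serve it with MLServer from the directory that contains model-settings.json.
cd model && mlserver start .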

Deploying the tracked model to real-time endpoints in Azure Machine Learning also works, but I need an alternative for deploying on-prem.

Any help would be much appreciated.


dr3s commented on September 24, 2024

I have a similar issue. I'm pretty sure it's due to the model's dependencies not loading, but the error is less than helpful. In my case I'm using a private PyPI index, so I suspect that's what is failing.
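(If it helps, a sketch for inspecting the pip index configuration from inside the serving container; pod name, namespace, and the default container name are assumptions:)

# Show any pip configuration visible to the model server process.
kubectl exec <predictor-pod> -n <namespace> -c kserve-container -- pip config list
# Check for index-related environment variables such as PIP_INDEX_URL.
kubectl exec <predictor-pod> -n <namespace> -c kserve-container -- env | grep -i -e pip -e index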

