Comments (33)
@CeliaGMqrz any update on this?
from charts.
@iamhritik290799 I am doing an ugly hack to work around this for now. You can figure out what your tracking args are by ssh-ing into a running tracking pod and running ps aux | cat. Thanks for troubleshooting this.
tracking:
  command: [ "/bin/sh", "-c" ]
  args:
    - >
      unset MLFLOW_S3_ENDPOINT_URL;
      mlflow server --host=0.0.0.0 --port=5000 --app-name=basic-auth
      --serve-artifacts --artifacts-destination=s3://$YOUR_BUCKET
      --backend-store-uri=postgresql://postgres:$(MLFLOW_DATABASE_PASSWORD)@$YOUR_DB_HOST:5432/mlflow_db;
@andresbono I've raised a PR to mitigate this ENV issue.
Hi @carrodher
I have checked, and after removing the MLFLOW_S3_ENDPOINT_URL env var from the deployment it works fine: I am able to load artifacts in my MLflow experiments.
The MLflow documentation also suggests unsetting the MLFLOW_S3_ENDPOINT_URL env var on the client system, but in our case removing it from the deployment did the trick.
However, in the bitnami/mlflow Helm chart, the tracking deployment template has no parameter to exclude this env var when it is not required. PFBR.
We have the same use case as @iamhritik290799 (artifacts saved in S3, no custom endpoint, AWS service user) and the suggested solution also worked for us.
@aaj-synth Alas, this solution doesn't work for us: if we remove externalS3.host from values.yaml, we get the error message "No Artifacts Recorded. Use the log artifact APIs to store file outputs from MLflow runs" in the Artifacts tab of the web interface. So far, only the solution suggested by @iamhritik290799 has solved the issue for us.
Thanks @iamhritik290799! Great work.
The only thing that still baffles me is that I am 100% certain I tried including the regional code before. But maybe there was something else misconfigured at that time. Which I guess shows that it is good to challenge your own assumptions.
Edit: the original issue description also included the region, so apparently it was broken at some point but maybe got fixed in MLflow itself? Especially since, from an AWS S3 use-case perspective, nothing relevant has changed in the chart (as far as I can tell), as I already mentioned in #23959 (comment). Super weird.
Edit edit: OK, the initial bug description also contained the bucket name in the host, so maybe we really just never tested it with "just" the regional endpoint. Well, hopefully the updated value description in my PR will eliminate any remaining question marks that dummies like myself could have in the future, and we can finally close this issue once and for all :)
Hi,
Looking at the issue, it is not clear to me whether the problem is related to the Bitnami packaging of MLflow or to some issue with S3 inside the MLflow code itself. Did you check with the upstream developers?
I checked with them, but they said there may be some issue in the Bitnami MLflow image that causes this error when trying to access your artifacts from the S3 bucket.
BTW, these are the args in our mlflow container:
containers:
  - args:
      - server
      - --backend-store-uri=postgresql://admin:$(MLFLOW_DATABASE_PASSWORD)@rds-instance-endpoint.rds.amazonaws.com:5432/mlflow
      - --artifacts-destination=s3://mlflow-artifacts
      - --serve-artifacts
      - --host=0.0.0.0
      - --port=5000
      - --app-name=basic-auth
The issue may not be directly related to the Bitnami container image or Helm chart, but rather to how the application is being utilized or configured in your specific environment.
Having said that, if you think that's not the case and are interested in contributing a solution, we welcome you to create a pull request. The Bitnami team is excited to review your submission and offer feedback. You can find the contributing guidelines here.
Your contribution will greatly benefit the community. Feel free to reach out if you have any questions or need assistance.
If you have any questions about the application itself, customizing its content, or questions about technology and infrastructure usage, we highly recommend that you refer to the forums and user guides provided by the project responsible for the application or technology.
With that said, we'll keep this ticket open until the stale bot automatically closes it, in case someone from the community contributes valuable insights.
Team, have you made any changes to the Helm chart to customize env variables for the mlflow deployment?
Even before we make any changes in the Helm chart, I think we should clarify in which specific cases or scenarios setting the MLFLOW_S3_ENDPOINT_URL env var for the tracking component is required. @iamhritik290799, @act-mreeves, can you help clarify that?
BTW @iamhritik290799, just to confirm, the screenshot you shared that suggests to unset the env-var comes from this documentation page, right? https://mlflow.org/docs/2.10.2/tracking/artifacts-stores.html#setting-bucket-region
@andresbono My specific issue, which requires me to unset MLFLOW_S3_ENDPOINT_URL, is when using IRSA (IAM Roles for Service Accounts) and an AWS S3 bucket.
You are 100% correct that I have not exhaustively tested if and when this env var IS required.
What I think I am seeing is that only the bucket name is needed in this scenario; these are the relevant arguments given to the mlflow binary: --serve-artifacts --artifacts-destination=s3://my-mlflow-bucket.
tracking:
  serviceAccount:
    annotations:
      eks.amazonaws.com/role-arn: arn:aws:iam::1234567890:role/my-mlflow-s3-role
externalS3:
  useCredentialsInSecret: false
  protocol: "https"
  host: "my-mlflow-bucket.s3.us-east-1.amazonaws.com"
  bucket: "my-mlflow-bucket"
  serveArtifacts: true
In a nutshell, I think that if you use external S3 (per mlflow/mlflow#9523 (comment)) you leave MLFLOW_S3_ENDPOINT_URL unset, whereas if you use MinIO (which is the default in this Helm chart) you do use MLFLOW_S3_ENDPOINT_URL.
There is a lot of discussion here too: mlflow/mlflow#7104. I think @Gekko0114 would have more domain knowledge to explain what is going on here.
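To summarize the behaviour being described, here is a minimal sketch (the helper is hypothetical, not MLflow's actual code) of how the effective S3 endpoint ends up being chosen:

```python
def resolve_s3_endpoint(env: dict, region: str = "us-east-1") -> str:
    """Hypothetical sketch: which S3 endpoint the tracking server ends up using.

    MLflow's boto3 client honours MLFLOW_S3_ENDPOINT_URL when it is set
    (needed for MinIO and other S3-compatible stores); when it is unset,
    boto3 falls back to the default regional AWS endpoint.
    """
    custom = env.get("MLFLOW_S3_ENDPOINT_URL")
    if custom:
        # MinIO / custom S3-compatible backend
        return custom
    # Plain AWS S3 (e.g. with IRSA): let boto3 pick the regional endpoint
    return f"https://s3.{region}.amazonaws.com"


# MinIO (the chart default): the env var points at the in-cluster service
assert resolve_s3_endpoint(
    {"MLFLOW_S3_ENDPOINT_URL": "http://mlflow-minio:9000"}
) == "http://mlflow-minio:9000"

# AWS S3 with IRSA: env var unset, the regional default is used
assert resolve_s3_endpoint({}, region="eu-central-1") == "https://s3.eu-central-1.amazonaws.com"
```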
@act-mreeves correct. Additionally, in my setup, I only require the "--artifacts-destination" argument, which I can define in the pod args section. However, I do not need the default "MLFLOW_S3_ENDPOINT_URL" environment variable that comes with the Bitnami Helm chart.
@andresbono My request is to either remove this ENV variable, which is currently set by default in the Helm chart, or make it optional rather than mandatory.
Hi @andresbono, any update on this?
Thank you for all the additional information you provided. Based on that, I think the best option is to remove the environment variable from the deployment. When needed in some specific scenarios, users can always add it via tracking.extraEnvVars.
Would you like to send a PR addressing the change? Thank you!!
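For example, a user who does still need the variable (say, for an external MinIO; the endpoint below is just a placeholder) could then set something like:

```yaml
tracking:
  extraEnvVars:
    - name: MLFLOW_S3_ENDPOINT_URL
      value: "http://minio.example.svc.cluster.local:9000"  # placeholder endpoint
```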
Another solution is not to set externalS3.host when using IRSA. I tried it and it worked perfectly.
Okay, that's weird. I'm on chart version 1.0.2, using IRSA to provide access to the S3 bucket and the RDS instance, and for me simply removing externalS3.host did the trick. In the pod definition I cannot see the env var anymore.
Environment:
  BITNAMI_DEBUG: false
  MLFLOW_DATABASE_PASSWORD: <redacted>
  AWS_STS_REGIONAL_ENDPOINTS: regional
  AWS_DEFAULT_REGION: <redacted>
  AWS_REGION: <redacted>
  AWS_ROLE_ARN: <redacted>
  AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
A team member will review the PR soon. Thank you so much for your contribution.
Has someone already verified whether this actually fixes the issue?
We have now tried to remove @act-mreeves's workaround and update to the v1.0.3 helm chart (I believe the fix is already included there?) but we are now back to experiencing issues with artifact access
Has someone already verified whether this actually fixes the issue?
We have now tried to remove @act-mreeves's workaround and update to the v1.0.3 helm chart (I believe the fix is already included there?) but we are now back to experiencing issues with artifact access
Nvm, figured it out :)
FYI, #26462 may look like a regression of this issue, but it shouldn't be. Please check the comments in the PR for more information. TL;DR:
- ✅ externalS3.host=mlflow-bucket-name.s3.eu-central-1.amazonaws.com
- ✅ externalS3.host=s3.eu-central-1.amazonaws.com
- ✅ externalS3.host=s3.amazonaws.com
@andresbono cannot confirm; after upgrading to the latest release we get this error yet again!
As stated before, MLFLOW_S3_ENDPOINT_URL should not be set when using AWS S3, see also: mlflow/mlflow#9523 (comment)
The change in #26462 causes MLFLOW_S3_ENDPOINT_URL to be set yet again when using AWS S3. We have now had to resort back to the initial workaround described in #23959 (comment).
Can we get this issue re-opened, please?
Hi @Jasper-Ben, could you share the value you are passing for externalS3.host? See #23959 (comment). You can redact it, I'm just interested in the format.
I don't know if you had a chance to check the comments of #26462, but we did some extensive testing and it worked for all the test cases, given the proper values were passed.
Hi @Jasper-Ben, could you share the value you are passing for externalS3.host? See #23959 (comment). You can redact it, I'm just interested in the format. I don't know if you had a chance to check the comments of #26462, but we did some extensive testing and it worked for all the test cases, given the proper values were passed.
Yes, I have read the comments, and we are using s3.amazonaws.com as the host.
What did these tests include? The initial connection test to S3 works (it did before the initial fix as well). The issue only appears when you actually try to access the artifacts of a job.
If that helps, this is our terraform config (without the workaround):
resource "helm_release" "k8s_mlflow" {
  name       = local.release_name
  namespace  = kubernetes_namespace.mlflow.metadata[0].name
  repository = "https://charts.bitnami.com/bitnami"
  chart      = "mlflow"
  version    = "1.4.22"

  values = [
    file("${path.module}/helm_values.yaml"),
    yamlencode(var.extra_helm_configuration),
    yamlencode({
      commonLabels = local.common_labels_k8s
      tracking = {
        auth = {
          username = var.tracking_username
          password = var.tracking_password
        }
        persistence = {
          enabled = false
        }
      },
      run = {
        persistence = {
          enabled = false
        }
      },
      externalS3 = {
        host            = "s3.amazonaws.com"
        bucket          = local.artifact_bucket # this is the plain bucket name
        accessKeyID     = aws_iam_access_key.s3_access.id
        accessKeySecret = aws_iam_access_key.s3_access.secret
      },
      externalDatabase = {
        host                      = kubernetes_manifest.postgres.manifest.metadata.name
        user                      = keys(kubernetes_manifest.postgres.manifest.spec.users).0
        existingSecret            = "${local.postgres_user}.${local.postgres_name}.credentials.postgresql.acid.zalan.do"
        existingSecretPasswordKey = "password"
        database                  = "${keys(kubernetes_manifest.postgres.manifest.spec.databases).0}?sslmode=require"
        authDatabase              = "${keys(kubernetes_manifest.postgres.manifest.spec.databases).1}?sslmode=require"
      }
    })
  ]
}
Also, we have the following additional helm values:
minio:
  enabled: false
tracking:
  nodeAffinityPreset:
    type: hard
    key: node.kubernetes.io/lifecycle
    values:
      - normal
  service:
    type: "ClusterIP"
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
  resources:
    requests:
      cpu: 2
      memory: 6Gi
    limits:
      cpu: 3
      memory: 10Gi
  auth:
    enabled: true
run:
  nodeAffinityPreset:
    type: hard
    key: node.kubernetes.io/lifecycle
    values:
      - normal
  source:
    type: "configmap"
postgresql:
  enabled: false
We have been using s3.amazonaws.com from the beginning, btw, and we have also tried the regional endpoint. Neither works.
Also, I do not see how #26462 would fix anything compared to the pre-#25294 code.
#25294 caused the MLFLOW_S3_ENDPOINT_URL variable to be set only when the internal MinIO setup is used, which fixed it for AWS S3 users but broke it for other external S3-compatible storage solutions.
From an AWS S3 perspective, #26462 basically reverted the previous change, except that it now sets the MLFLOW_S3_ENDPOINT_URL variable on the following condition ("hidden" behind the include):
{{- if or .Values.minio.enabled .Values.externalS3.host -}}
which of course will always evaluate to true for the AWS S3 use case, since we also need to set externalS3.host for MLflow to be configured to use S3 at all, and thus the MLFLOW_S3_ENDPOINT_URL variable is set again.
So basically we went full circle on this issue: it has been "fixed" for one use case while breaking it for another (2x).
Maybe the addressing-style stuff from #26462 (comment) fixes things (I haven't fully understood/tested that yet), but just setting s3.amazonaws.com does not.
Thank you @Jasper-Ben.
What did these tests include? The initial connection test to s3 works (has been before the initial fix as well). The issue only appears when you try to actually access the artifacts of a job.
You can find what we tested here: #26462 (comment) (unfold Scenario 2). I specifically tested the access to job artifacts, see the screenshot. When I tested it with the externalS3.host=s3.amazonaws.com value, it worked fine. There should be some relevant difference between my testing scenario and yours.
I share your concern about going in circles on this issue. My assumption was that setting the proper externalS3.host was enough; that is why merging #26462 made sense.
Maybe the addressing style stuff from #26462 (comment) fixes things (haven't fully understood / tested that yet)
Please, try that and let us know about any other update you may have.
Just to reiterate the status quo:
I set up a second test instance using the exact same configuration as mentioned in #23959 (comment).
The important bits:
- externalS3.host is set to s3.amazonaws.com
- externalS3.bucket is set to a plain bucket name
I then used the following example project to create an experiment:
import mlflow
from mlflow.models import infer_signature

import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

mlflow.set_tracking_uri(uri="<MLFLOW_URI>")

# Load the Iris dataset
X, y = datasets.load_iris(return_X_y=True)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Define the model hyperparameters
params = {
    "solver": "lbfgs",
    "max_iter": 1000,
    "multi_class": "auto",
    "random_state": 8888,
}

# Train the model
lr = LogisticRegression(**params)
lr.fit(X_train, y_train)

# Predict on the test set
y_pred = lr.predict(X_test)

# Calculate metrics
accuracy = accuracy_score(y_test, y_pred)

# Create a new MLflow Experiment
mlflow.set_experiment("MLflow Quickstart")

# Start an MLflow run
with mlflow.start_run():
    # Log the hyperparameters
    mlflow.log_params(params)

    # Log the loss metric
    mlflow.log_metric("accuracy", accuracy)

    # Set a tag that we can use to remind ourselves what this run was for
    mlflow.set_tag("Training Info", "Basic LR model for iris data")

    # Infer the model signature
    signature = infer_signature(X_train, lr.predict(X_train))

    # Log the model
    model_info = mlflow.sklearn.log_model(
        sk_model=lr,
        artifact_path="iris_model",
        signature=signature,
        input_example=X_train,
        registered_model_name="tracking-quickstart",
    )
(basically steps 3 and 4 from https://mlflow.org/docs/latest/getting-started/intro-quickstart/index.html)
This will cause the following error while trying to upload the artifacts to AWS S3:
mlflow boto3.exceptions.S3UploadFailedError: Failed to upload /tmp/tmpl8tifcwt/input_example.json to <BUCKET_NAME>/2/d4780a6c6c50403fab62785c7a08d8db/artifacts/iris_model/input_example.json: An error occurred (PermanentRedirect) when calling the PutObject operation: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.
So I was able to reproduce the issue on a fresh setup. I will now experiment with the addressing style.
I changed the addressing style to path, as suggested in #26462 (comment). Still, I get the same error message, so that does not seem to help.
Pinging @frittentheke for visibility.
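For reference, the two boto3 addressing styles only change where the bucket name sits in the request URL; a rough sketch (hypothetical helper, not the boto3 API):

```python
def s3_object_url(bucket: str, key: str, region: str, style: str = "virtual") -> str:
    """Build an S3 object URL in either of boto3's two addressing styles."""
    if style == "path":
        # Path style: the bucket is part of the URL path
        return f"https://s3.{region}.amazonaws.com/{bucket}/{key}"
    # Virtual-hosted style (the default): the bucket is part of the hostname
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"


assert (s3_object_url("my-bucket", "a.txt", "eu-central-1", style="path")
        == "https://s3.eu-central-1.amazonaws.com/my-bucket/a.txt")
assert (s3_object_url("my-bucket", "a.txt", "eu-central-1")
        == "https://my-bucket.s3.eu-central-1.amazonaws.com/a.txt")
```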
The tracking pod looks like this now:
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2024-08-21T13:48:36Z"
  generateName: iris-devops-mlflow-test-tracking-ffb6fd9f-
  labels:
    app.kubernetes.io/component: tracking
    app.kubernetes.io/instance: iris-devops-mlflow-test
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: mlflow
    app.kubernetes.io/part-of: mlflow
    app.kubernetes.io/version: 2.15.1
    generator: Terraform
    helm.sh/chart: mlflow-1.4.22
    pod-template-hash: ffb6fd9f
  name: iris-devops-mlflow-test-tracking-ffb6fd9f-kqblg
  namespace: mlflow-test
  ownerReferences:
    - apiVersion: apps/v1
      blockOwnerDeletion: true
      controller: true
      kind: ReplicaSet
      name: iris-devops-mlflow-test-tracking-ffb6fd9f
      uid: 0a5ca0e4-48c7-4007-8d37-1edc0156792c
  resourceVersion: "755350757"
  uid: 709a958f-f5ee-4450-b172-e147e05153a3
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node.kubernetes.io/lifecycle
                operator: In
                values:
                  - normal
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - podAffinityTerm:
            labelSelector:
              matchLabels:
                app.kubernetes.io/component: mlflow
                app.kubernetes.io/instance: iris-devops-mlflow-test
                app.kubernetes.io/name: mlflow
            topologyKey: kubernetes.io/hostname
          weight: 1
  automountServiceAccountToken: false
  containers:
    - args:
        - server
        - --backend-store-uri=postgresql://mlflow:$(MLFLOW_DATABASE_PASSWORD)@iris-devops-mlflow-test-postgres:5432/mlflow?sslmode=require
        - --artifacts-destination=s3://<BUCKET_NAME>
        - --serve-artifacts
        - --host=0.0.0.0
        - --port=5000
        - --expose-prometheus=/bitnami/mlflow/metrics
        - --app-name=basic-auth
      command:
        - mlflow
      env:
        - name: BITNAMI_DEBUG
          value: "false"
        - name: MLFLOW_DATABASE_PASSWORD
          valueFrom:
            secretKeyRef:
              key: password
              name: mlflow.iris-devops-mlflow-test-postgres.credentials.postgresql.acid.zalan.do
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              key: root-user
              name: iris-devops-mlflow-test-externals3
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              key: root-password
              name: iris-devops-mlflow-test-externals3
        - name: MLFLOW_S3_ENDPOINT_URL
          value: https://s3.amazonaws.com:443
        - name: MLFLOW_BOTO_CLIENT_ADDRESSING_STYLE
          value: path
      image: docker.io/bitnami/mlflow:2.15.1-debian-12-r0
      imagePullPolicy: IfNotPresent
      livenessProbe:
        exec:
          command:
            - pgrep
            - -f
            - mlflow.server
        failureThreshold: 5
        initialDelaySeconds: 5
        periodSeconds: 10
        successThreshold: 1
        timeoutSeconds: 5
      name: mlflow
      ports:
        - containerPort: 5000
          name: http
          protocol: TCP
      readinessProbe:
        failureThreshold: 5
        initialDelaySeconds: 5
        periodSeconds: 10
        successThreshold: 1
        tcpSocket:
          port: http
        timeoutSeconds: 5
      resources:
        limits:
          cpu: "3"
          memory: 10Gi
        requests:
          cpu: "2"
          memory: 6Gi
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
        privileged: false
        readOnlyRootFilesystem: true
        runAsGroup: 1001
        runAsNonRoot: true
        runAsUser: 1001
        seLinuxOptions: {}
        seccompProfile:
          type: RuntimeDefault
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
        - mountPath: /tmp
          name: tmp
        - mountPath: /app/mlruns
          name: mlruns
        - mountPath: /app/mlartifacts
          name: mlartifacts
        - mountPath: /bitnami/mlflow-basic-auth/basic_auth.ini
          name: rendered-basic-auth
          subPath: basic_auth.ini
        - mountPath: /bitnami/mlflow
          name: data
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  initContainers:
    - command:
        - bash
        - -ec
        - |
          #!/bin/bash
          retry_while() {
            local -r cmd="${1:?cmd is missing}"
            local -r retries="${2:-12}"
            local -r sleep_time="${3:-5}"
            local return_value=1
            read -r -a command <<< "$cmd"
            for ((i = 1 ; i <= retries ; i+=1 )); do
              "${command[@]}" && return_value=0 && break
              sleep "$sleep_time"
            done
            return $return_value
          }
          check_host() {
            local -r host="${1:-?missing host}"
            local -r port="${2:-?missing port}"
            if wait-for-port --timeout=5 --host=${host} --state=inuse $port ; then
              return 0
            else
              return 1
            fi
          }
          echo "Checking connection to iris-devops-mlflow-test-postgres:5432"
          if ! retry_while "check_host iris-devops-mlflow-test-postgres 5432"; then
            echo "Connection error"
            exit 1
          fi
          echo "Connection success"
          exit 0
      image: docker.io/bitnami/os-shell:12-debian-12-r27
      imagePullPolicy: IfNotPresent
      name: wait-for-database
      resources: {}
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
        privileged: false
        readOnlyRootFilesystem: true
        runAsGroup: 1001
        runAsNonRoot: true
        runAsUser: 1001
        seLinuxOptions: {}
        seccompProfile:
          type: RuntimeDefault
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
        - mountPath: /tmp
          name: tmp
    - command:
        - bash
        - -ec
        - |
          #!/bin/bash
          cp /bitnami/mlflow-basic-auth/basic_auth.ini /bitnami/rendered-basic-auth/basic_auth.ini
      image: docker.io/bitnami/mlflow:2.15.1-debian-12-r0
      imagePullPolicy: IfNotPresent
      name: get-default-auth-conf
      resources: {}
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
        privileged: false
        readOnlyRootFilesystem: true
        runAsGroup: 1001
        runAsNonRoot: true
        runAsUser: 1001
        seLinuxOptions: {}
        seccompProfile:
          type: RuntimeDefault
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
        - mountPath: /tmp
          name: tmp
        - mountPath: /bitnami/rendered-basic-auth
          name: rendered-basic-auth
    - command:
        - bash
        - -ec
        - |
          #!/bin/bash
          # First render the overrides
          render-template /bitnami/basic-auth-overrides/*.ini > /tmp/rendered-overrides.ini
          # Loop through the ini overrides and apply it to the final basic_auth.ini
          # read the file line by line
          while IFS='=' read -r key value
          do
            # remove leading and trailing spaces from key and value
            key="$(echo $key | tr -d " ")"
            value="$(echo $value | tr -d " ")"
            ini-file set -s mlflow -k "$key" -v "$value" /bitnami/rendered-basic-auth/basic_auth.ini
          done < "/tmp/rendered-overrides.ini"
          # Remove temporary files
          rm /tmp/rendered-overrides.ini
      env:
        - name: MLFLOW_DATABASE_PASSWORD
          valueFrom:
            secretKeyRef:
              key: password
              name: mlflow.iris-devops-mlflow-test-postgres.credentials.postgresql.acid.zalan.do
        - name: MLFLOW_DATABASE_AUTH_URI
          value: postgresql://mlflow:$(MLFLOW_DATABASE_PASSWORD)@iris-devops-mlflow-test-postgres:5432/mlflow_auth?sslmode=require
        - name: MLFLOW_TRACKING_USERNAME
          valueFrom:
            secretKeyRef:
              key: admin-user
              name: iris-devops-mlflow-test-tracking
        - name: MLFLOW_TRACKING_PASSWORD
          valueFrom:
            secretKeyRef:
              key: admin-password
              name: iris-devops-mlflow-test-tracking
        - name: MLFLOW_BOTO_CLIENT_ADDRESSING_STYLE
          value: path
      image: docker.io/bitnami/os-shell:12-debian-12-r27
      imagePullPolicy: IfNotPresent
      name: render-auth-conf
      resources: {}
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
        privileged: false
        readOnlyRootFilesystem: true
        runAsGroup: 1001
        runAsNonRoot: true
        runAsUser: 1001
        seLinuxOptions: {}
        seccompProfile:
          type: RuntimeDefault
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
        - mountPath: /tmp
          name: tmp
        - mountPath: /bitnami/basic-auth-overrides
          name: basic-auth-overrides
        - mountPath: /bitnami/rendered-basic-auth
          name: rendered-basic-auth
    - args:
        - -m
        - mlflow.server.auth
        - db
        - upgrade
        - --url
        - postgresql://mlflow:$(MLFLOW_DATABASE_PASSWORD)@iris-devops-mlflow-test-postgres:5432/mlflow_auth?sslmode=require
      command:
        - python
      env:
        - name: MLFLOW_DATABASE_PASSWORD
          valueFrom:
            secretKeyRef:
              key: password
              name: mlflow.iris-devops-mlflow-test-postgres.credentials.postgresql.acid.zalan.do
      image: docker.io/bitnami/mlflow:2.15.1-debian-12-r0
      imagePullPolicy: IfNotPresent
      name: upgrade-db-auth
      resources: {}
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
        privileged: false
        readOnlyRootFilesystem: true
        runAsGroup: 1001
        runAsNonRoot: true
        runAsUser: 1001
        seLinuxOptions: {}
        seccompProfile:
          type: RuntimeDefault
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
        - mountPath: /tmp
          name: tmp
    - command:
        - bash
        - -ec
        - |
          #!/bin/bash
          retry_while() {
            local -r cmd="${1:?cmd is missing}"
            local -r retries="${2:-12}"
            local -r sleep_time="${3:-5}"
            local return_value=1
            read -r -a command <<< "$cmd"
            for ((i = 1 ; i <= retries ; i+=1 )); do
              "${command[@]}" && return_value=0 && break
              sleep "$sleep_time"
            done
            return $return_value
          }
          check_host() {
            local -r host="${1:-?missing host}"
            local -r port="${2:-?missing port}"
            if wait-for-port --timeout=5 --host=${host} --state=inuse $port ; then
              return 0
            else
              return 1
            fi
          }
          echo "Checking connection to s3.amazonaws.com:443"
          if ! retry_while "check_host s3.amazonaws.com 443"; then
            echo "Connection error"
            exit 1
          fi
          echo "Connection success"
          exit 0
      image: 693612562064.dkr.ecr.eu-central-1.amazonaws.com/docker.io/bitnami/os-shell:12-debian-12-r27
      imagePullPolicy: IfNotPresent
      name: wait-for-s3
      resources: {}
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop:
            - ALL
        privileged: false
        readOnlyRootFilesystem: true
        runAsGroup: 1001
        runAsNonRoot: true
        runAsUser: 1001
        seLinuxOptions: {}
        seccompProfile:
          type: RuntimeDefault
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
        - mountPath: /tmp
          name: tmp
  nodeName: ip-10-208-18-75.eu-central-1.compute.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1001
    fsGroupChangePolicy: Always
  serviceAccount: iris-devops-mlflow-test-tracking
  serviceAccountName: iris-devops-mlflow-test-tracking
  terminationGracePeriodSeconds: 30
  tolerations:
    - effect: NoExecute
      key: node.kubernetes.io/not-ready
      operator: Exists
      tolerationSeconds: 300
    - effect: NoExecute
      key: node.kubernetes.io/unreachable
      operator: Exists
      tolerationSeconds: 300
  volumes:
    - emptyDir: {}
      name: tmp
    - emptyDir: {}
      name: mlruns
    - emptyDir: {}
      name: mlartifacts
    - configMap:
        defaultMode: 420
        name: iris-devops-mlflow-test-tracking-auth-overrides
      name: basic-auth-overrides
    - emptyDir: {}
      name: rendered-basic-auth
    - emptyDir: {}
      name: data
mlflow boto3.exceptions.S3UploadFailedError: Failed to upload /tmp/tmpl8tifcwt/input_example.json to <BUCKET_NAME>/2/d4780a6c6c50403fab62785c7a08d8db/artifacts/iris_model/input_example.json: An error occurred (PermanentRedirect) when calling the PutObject operation: The bucket you are attempting to access must be addressed using the specified endpoint. Please send all future requests to this endpoint.
I suppose the bucket resides in some region and AWS does not like you to continue using the global S3 hostname.
See e.g. thoughtbot/paperclip#2151 on how setting the endpoint to the correct regional endpoint fixes things.
See https://docs.aws.amazon.com/general/latest/gr/s3.html#s3_region for list of endpoints.
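A toy sketch of that behaviour (the helper below is purely illustrative, not an AWS API):

```python
def global_endpoint_put_status(bucket_region: str) -> int:
    """Illustrative only: the HTTP status a PutObject against the legacy
    global endpoint s3.amazonaws.com effectively gets for a given bucket."""
    # Buckets in us-east-1 are served directly by the global hostname;
    # buckets in any other region answer 301 PermanentRedirect, pointing
    # the client at the regional endpoint.
    return 200 if bucket_region == "us-east-1" else 301


assert global_endpoint_put_status("us-east-1") == 200
assert global_endpoint_put_status("eu-central-1") == 301  # the failing case above
```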
I figured it out (also thanks to @frittentheke).
It works when using a regional endpoint, regardless of the addressing style.
Turns out that the HTTP endpoint is always region-specific, in contrast to the s3 endpoint (https://<bucket_name>.s3.eu-central-1.amazonaws.com vs s3://<bucket_name>). (Yes, that is f****n confusing.) I probably knew that at some point, but the information was purged from my brain, so I had to rediscover it. When the MLFLOW_S3_ENDPOINT_URL environment variable is set, MLflow uses the HTTP endpoint.
So the reason @andresbono's test with the host set to s3.amazonaws.com succeeded is that he happened to test with a bucket deployed in us-east-1, which AWS defaults to for HTTP endpoints when no region-specific endpoint is used (see: https://stackoverflow.com/questions/51611874/access-amazon-s3-bucket-without-region-end-point/51612461#51612461). We use a bucket in eu-central-1, which is why just setting externalS3.host=s3.amazonaws.com breaks for us.
So basically the fix here is to always use the regional endpoint; everything else will just cause confusion. What I would do: delete/update comment #23959 (comment) to only include the regional endpoint as ✅, and update the example at https://github.com/bitnami/charts/blob/main/bitnami/mlflow/README.md?plain=1#L451C74-L451C83 to use a regional endpoint, with a note to pick the appropriate regional endpoint from https://docs.aws.amazon.com/general/latest/gr/s3.html#s3_region. For the latter I will create a PR.
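To make the takeaway concrete, a values snippet along these lines (bucket name and region are placeholders; pick the endpoint matching your bucket's region) should work:

```yaml
externalS3:
  # Regional HTTP endpoint -- must match the region the bucket lives in
  host: "s3.eu-central-1.amazonaws.com"
  # Plain bucket name: no region, no hostname
  bucket: "my-mlflow-bucket"
```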
Also, maybe someone else could verify/reproduce my findings, just in case?