Comments (18)
Hi @leelavg ,
Here is our storage pool configuration:
apiVersion: kadalu-operator.storage/v1alpha1
kind: KadaluStorage
metadata:
  creationTimestamp: "2021-07-20T19:14:50Z"
  generation: 1
  name: storage-pool
  namespace: kadalu
  resourceVersion: "81568311"
  uid: ********-****-****-****-**************
spec:
  kadalu_format: native
  pvReclaimPolicy: delete
  storage:
  - device: /dev/mapper/sdb_crypt
    node: node01p
  - device: /dev/mapper/sdb_crypt
    node: node02p
  - device: /dev/mapper/sdb_crypt
    node: node03p
  type: Replica3
  volume_id: ********-****-****-****-**************
We are using device mode.
from kadalu.
Hello,
Do you have any idea about the issue?
Regards
Hello,
I've upgraded my cluster to Kadalu 1.2.0 and the problem still persists :(
I've also tested with other Kubernetes versions (1.23.8 and 1.27.5).
Regards
Hello,
I've recorded a session in which we reproduce the issue. This may bring some clarity.
This was tested on Kubernetes 1.27.5 and Kadalu 0.8.15, but we have reproduced the issue on all versions of Kadalu up to 1.2.0.
I'm very surprised by this issue, as it seems very critical. Until now it didn't impact us; maybe we didn't have use cases where it could really create application-level problems. It appeared when we migrated a specific application's storage from Heketi to Kadalu, where we have many replicas (more than 10) and the application relies heavily on files being synced.
I'm wondering if the issue is more a Kadalu issue or a glusterfs issue.
If anyone is able to reproduce this in a test environment, it would already be a great answer as it would confirm that the issue does not come from a specific setup from us.
Thank you
Sometimes, the other pods do not have the latest content.
Do pods actually see the same latest content when they are on the same node?
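To make that comparison concrete, here is a minimal sketch of the check (file names and directory layout are assumptions, not from the thread): a POSIX-shell helper that compares checksums of the same file across several directories, where each directory stands in for the PVC mount as seen from one pod. On a real cluster you would gather each checksum with `kubectl exec <pod> -- md5sum <file>` instead of reading local directories.

```shell
# Compare copies of one file across several directories, each directory
# standing in for the PVC mount as seen from one pod/replica.
# Prints CONSISTENT when all copies have the same checksum, DIVERGED otherwise.
compare_checksums() {
    file=$1; shift
    unique=$(for dir in "$@"; do
        md5sum "$dir/$file"
    done | cut -d' ' -f1 | sort -u | wc -l)
    [ "$unique" -eq 1 ] && echo CONSISTENT || echo DIVERGED
}
```

Running this once per pod pair (same node vs. different nodes) would answer whether the staleness correlates with node placement.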
- Unfortunately, I'm unable to reproduce the scenario
- The commands and the corresponding descriptions are a little off in the cast; however, the scenario remains the same
- ref
# k version --short
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Client Version: v1.27.4
Kustomize Version: v5.0.1
Server Version: v1.27.6+b49f9d1
# curl -sL https://github.com/kadalu/kadalu/releases/download/1.2.0/kadalu-operator-openshift.yaml | sed 's/"no"/"yes"/' | k apply -f -
customresourcedefinition.apiextensions.k8s.io/kadalustorages.kadalu-operator.storage unchanged
namespace/kadalu created
serviceaccount/kadalu-operator created
serviceaccount/kadalu-csi-nodeplugin created
serviceaccount/kadalu-csi-provisioner created
serviceaccount/kadalu-server-sa created
clusterrole.rbac.authorization.k8s.io/kadalu-operator created
clusterrole.rbac.authorization.k8s.io/kadalu-csi-external-attacher created
clusterrole.rbac.authorization.k8s.io/kadalu-csi-external-provisioner created
clusterrole.rbac.authorization.k8s.io/kadalu-csi-external-resizer created
clusterrolebinding.rbac.authorization.k8s.io/kadalu-operator created
clusterrolebinding.rbac.authorization.k8s.io/kadalu-csi-external-attacher created
clusterrolebinding.rbac.authorization.k8s.io/kadalu-csi-external-provisioner created
clusterrolebinding.rbac.authorization.k8s.io/kadalu-csi-external-resizer created
role.rbac.authorization.k8s.io/kadalu-operator created
rolebinding.rbac.authorization.k8s.io/kadalu-operator created
Warning: would violate PodSecurity "restricted:v1.24": privileged (container "kadalu-operator" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "kadalu-operator" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "kadalu-operator" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "kadalu-operator" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "kadalu-operator" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
deployment.apps/operator created
securitycontextconstraints.security.openshift.io/kadalu-scc created
# curl -sL https://github.com/kadalu/kadalu/releases/download/1.2.0/csi-nodeplugin-openshift.yaml | sed 's/"no"/"yes"/' | k apply -f -
clusterrole.rbac.authorization.k8s.io/kadalu-csi-nodeplugin created
clusterrolebinding.rbac.authorization.k8s.io/kadalu-csi-nodeplugin created
daemonset.apps/kadalu-csi-nodeplugin created
# k get kds pool -ojsonpath='{.spec}' | jq
{
  "pvReclaimPolicy": "delete",
  "single_pv_per_pool": false,
  "storage": [
    {
      "device": "/dev/loop0",
      "node": "ip-10-0-16-104.us-east-2.compute.internal"
    },
    {
      "device": "/dev/loop0",
      "node": "ip-10-0-50-78.us-east-2.compute.internal"
    },
    {
      "device": "/dev/loop0",
      "node": "ip-10-0-78-34.us-east-2.compute.internal"
    }
  ],
  "type": "Replica3"
}
# cat /tmp/dep.yaml
# -*- mode: yaml -*-
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pv-io
spec:
  storageClassName: kadalu.pool
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Mi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: io-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: io-app
  template:
    metadata:
      labels:
        app: io-app
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - io-app
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: io-pod
        image: docker.io/kadalu/test-io:devel
        imagePullPolicy: IfNotPresent
        command: ['sh', '-c', 'echo "Ready!" && /usr/bin/tail -f /dev/null']
        volumeMounts:
        - mountPath: '/mnt/alpha'
          name: csivol
        livenessProbe:
          exec:
            command:
            - 'sh'
            - '-ec'
            - 'df'
          initialDelaySeconds: 3
          periodSeconds: 3
      volumes:
      - name: csivol
        persistentVolumeClaim:
          claimName: pv-io
# k get po -owide -lapp
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
io-app-78c4f8b784-4rjql 1/1 Running 0 79m 10.129.2.12 ip-10-0-16-104.us-east-2.compute.internal <none> <none>
io-app-78c4f8b784-5x9bw 1/1 Running 0 79m 10.128.2.20 ip-10-0-78-34.us-east-2.compute.internal <none> <none>
io-app-78c4f8b784-jl8np 1/1 Running 0 79m 10.131.0.23 ip-10-0-50-78.us-east-2.compute.internal <none> <none>
Screencast_13_11_23_10.01.46_AM_IST.mp4
Closing thoughts:
- We would have received more reports if the core technology, i.e. replication, weren't working as expected. Still, I don't write off your bug; at the same time, unless I see it, I can't believe it.
- Your bug report is missing the KadaluStorage CR; might it be that you are using path mode 🤔?
- Anyway, I can take another look if you can provide a coded reproducer image that I could just deploy.
I encountered the same issue. Here is my storage pool configuration:
apiVersion: kadalu-operator.storage/v1alpha1
kind: KadaluStorage
metadata:
  name: project-kadalu
spec:
  type: Replica3
  storage:
    - node: openstack-hdp01  # node name as shown in kubectl get nodes
      path: /data01/k8s-pv/project-kadalu
    - node: openstack-hdp02
      path: /data01/k8s-pv/project-kadalu
    - node: openstack-hdp03
      path: /data01/k8s-pv/project-kadalu
I have three pods simultaneously mounting the same PVC. When I modified a file, it was updated on the server immediately, but the content of the file in one of the pods did not get updated, while the content in the other two pods was new.
> path: /data01/k8s-pv/project-kadalu

Alright, the issue seems to be hit in path mode.
@couloum / @benderbandolero do you concur?
I switched to using GlusterFS storage and encountered the same issue, so it seems to be a problem related to GlusterFS.
Hi @Yitozu, do you have more information on the test you performed? Did you switch to GlusterFS without Kadalu? Which version of GlusterFS are you using?
I find it surprising that this kind of issue can appear in GlusterFS, as replication consistency is the basic feature expected from a distributed FS. We should see whether we need to open an issue directly against GlusterFS.
@couloum "We previously used k8s version 1.18, paired with the 'gluster/gluster-centos:latest' image. However, due to GlusterFS not supporting CSI (Container Storage Interface), we migrated to Kadalu. For confidentiality reasons, I cannot provide the original scenario, but I can roughly describe the reproduction steps: We have a pod named 'browser' used for file management, which mounts 'project-pvc.' Subsequently, a Spark program continuously overwrites a specific file within 'project-pvc.' After multiple operations, inconsistencies arose in the content of the same file among the 'browser' pod, the Spark driver pod, and the Spark executor pod, all of which simultaneously mounted 'project-pvc.'"
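The failure mode described here can be pinned down as a violated invariant: once a reader has seen version N of the repeatedly overwritten file, it should never afterwards read an older version. Below is a minimal local sketch of that invariant check (the file layout, version markers, and function names are illustrative assumptions, not the actual Spark workload); on a cluster, the writer and the readers would be separate pods sharing the PVC mount.

```python
import os

def write_version(path: str, version: int) -> None:
    """Overwrite the file with a version marker, like a writer repeatedly
    overwriting one file on the shared volume."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(version))
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename: readers never see a partial file

def read_version(path: str) -> int:
    with open(path) as f:
        return int(f.read())

def check_monotonic(path: str, last_seen: int) -> int:
    """A reader's consistency check: the version must never go backwards.
    Returns the new high-water mark; raises if the mount served stale data."""
    v = read_version(path)
    if v < last_seen:
        raise RuntimeError(f"stale read: saw v{v} after v{last_seen}")
    return v
```

Running a writer loop in one pod and this check in each of the other pods would turn "inconsistencies arose" into a deterministic pass/fail signal.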
I am seeing a similar issue as well. Any further findings or root cause?
@amarts if possible could you pls comment?