
Comments (18)

couloum avatar couloum commented on June 4, 2024 1

Hi @leelavg ,
Here is our storage pool configuration:

apiVersion: kadalu-operator.storage/v1alpha1
kind: KadaluStorage
metadata:
  creationTimestamp: "2021-07-20T19:14:50Z"
  generation: 1
  name: storage-pool
  namespace: kadalu
  resourceVersion: "81568311"
  uid: ********-****-****-****-**************
spec:
  kadalu_format: native
  pvReclaimPolicy: delete
  storage:
  - device: /dev/mapper/sdb_crypt
    node: node01p
  - device: /dev/mapper/sdb_crypt
    node: node02p
  - device: /dev/mapper/sdb_crypt
    node: node03p
  type: Replica3
  volume_id: ********-****-****-****-**************

We are using device mode.

from kadalu.

benderbandolero avatar benderbandolero commented on June 4, 2024

Hello,

Do you have any idea what is causing the issue?

Regards


benderbandolero avatar benderbandolero commented on June 4, 2024

Hello,

I've upgraded my cluster to Kadalu 1.2.0 and the problem still persists :(
I've also tested with other Kubernetes versions (1.23.8 and 1.27.5).

Regards


couloum avatar couloum commented on June 4, 2024

Hello,

I've recorded a session in which we reproduce the issue. Hopefully it brings some clarity.

Kadalu issue

This was tested on Kubernetes 1.27.5 and Kadalu 0.8.15, but we reproduced the issue on all versions of Kadalu up to 1.2.0.

I'm very surprised by this issue, as it seems critical. Until now it hadn't impacted us; maybe we didn't have use cases where it could cause real application problems. It appeared when we migrated a specific application's storage from Heketi to Kadalu. That application runs many replicas (more than 10) and relies heavily on files being in sync.

I'm wondering whether this is a Kadalu issue or a GlusterFS issue.

If anyone can reproduce this in a test environment, that would already be a great help, as it would confirm that the issue is not specific to our setup.

Thank you


liyuntao avatar liyuntao commented on June 4, 2024

Hi @benderbandolero

Sometimes, the other pods do not have the latest content.

Do the pods actually see the same, latest content when they are on the same node?


leelavg avatar leelavg commented on June 4, 2024
  • Unfortunately, I'm unable to reproduce the scenario.
  • The commands and their descriptions are a little off in the cast; however, the scenario remains the same.
  • ref
# k version --short
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Client Version: v1.27.4
Kustomize Version: v5.0.1
Server Version: v1.27.6+b49f9d1

# curl -sL https://github.com/kadalu/kadalu/releases/download/1.2.0/kadalu-operator-openshift.yaml | sed 's/"no"/"yes"/' | k apply -f -
customresourcedefinition.apiextensions.k8s.io/kadalustorages.kadalu-operator.storage unchanged
namespace/kadalu created
serviceaccount/kadalu-operator created
serviceaccount/kadalu-csi-nodeplugin created
serviceaccount/kadalu-csi-provisioner created
serviceaccount/kadalu-server-sa created
clusterrole.rbac.authorization.k8s.io/kadalu-operator created
clusterrole.rbac.authorization.k8s.io/kadalu-csi-external-attacher created
clusterrole.rbac.authorization.k8s.io/kadalu-csi-external-provisioner created
clusterrole.rbac.authorization.k8s.io/kadalu-csi-external-resizer created
clusterrolebinding.rbac.authorization.k8s.io/kadalu-operator created
clusterrolebinding.rbac.authorization.k8s.io/kadalu-csi-external-attacher created
clusterrolebinding.rbac.authorization.k8s.io/kadalu-csi-external-provisioner created
clusterrolebinding.rbac.authorization.k8s.io/kadalu-csi-external-resizer created
role.rbac.authorization.k8s.io/kadalu-operator created
rolebinding.rbac.authorization.k8s.io/kadalu-operator created
Warning: would violate PodSecurity "restricted:v1.24": privileged (container "kadalu-operator" must not set securityContext.privileged=true), allowPrivilegeEscalation != false (container "kadalu-operator" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "kadalu-operator" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "kadalu-operator" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "kadalu-operator" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
deployment.apps/operator created
securitycontextconstraints.security.openshift.io/kadalu-scc created

# curl -sL https://github.com/kadalu/kadalu/releases/download/1.2.0/csi-nodeplugin-openshift.yaml | sed 's/"no"/"yes"/' | k apply -f -                        
clusterrole.rbac.authorization.k8s.io/kadalu-csi-nodeplugin created
clusterrolebinding.rbac.authorization.k8s.io/kadalu-csi-nodeplugin created
daemonset.apps/kadalu-csi-nodeplugin created
# k get kds pool -ojsonpath='{.spec}' | jq
{
  "pvReclaimPolicy": "delete",
  "single_pv_per_pool": false,
  "storage": [
    {
      "device": "/dev/loop0",
      "node": "ip-10-0-16-104.us-east-2.compute.internal"
    },
    {
      "device": "/dev/loop0",
      "node": "ip-10-0-50-78.us-east-2.compute.internal"
    },
    {
      "device": "/dev/loop0",
      "node": "ip-10-0-78-34.us-east-2.compute.internal"
    }
  ],
  "type": "Replica3"
}
# cat /tmp/dep.yaml
# -*- mode: yaml -*-
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pv-io
spec:
  storageClassName: kadalu.pool
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Mi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: io-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: io-app
  template:
    metadata:
      labels:
        app: io-app
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - io-app
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: io-pod
        image: docker.io/kadalu/test-io:devel
        imagePullPolicy: IfNotPresent
        command: ['sh', '-c', 'echo "Ready!" && /usr/bin/tail -f /dev/null']
        volumeMounts:
        - mountPath: '/mnt/alpha'
          name: csivol
        livenessProbe:
          exec:
            command:
              - 'sh'
              - '-ec'
              - 'df'
          initialDelaySeconds: 3
          periodSeconds: 3
      volumes:
      - name: csivol
        persistentVolumeClaim:
          claimName: pv-io
# k get po -owide -lapp
NAME                      READY   STATUS    RESTARTS   AGE   IP            NODE                                        NOMINATED NODE   READINESS GATES
io-app-78c4f8b784-4rjql   1/1     Running   0          79m   10.129.2.12   ip-10-0-16-104.us-east-2.compute.internal   <none>           <none>
io-app-78c4f8b784-5x9bw   1/1     Running   0          79m   10.128.2.20   ip-10-0-78-34.us-east-2.compute.internal    <none>           <none>
io-app-78c4f8b784-jl8np   1/1     Running   0          79m   10.131.0.23   ip-10-0-50-78.us-east-2.compute.internal    <none>           <none>
Screencast_13_11_23_10.01.46_AM_IST.mp4

Closing thoughts:

  1. We would have received more reports if the core technology, i.e. replication, weren't working as expected. Still, I'm not writing off your bug; at the same time, I can't confirm it until I see it myself.
  2. Your bug report is missing the KadaluStorage CR; might you be using path mode 🤔?
  3. In any case, I can take another look if you provide a coded reproducer image that I can simply deploy.
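
A coded reproducer of the kind asked for in point 3 could be quite small. The following is only a sketch under assumptions, not something posted in the thread: the function names and the probe-file idea are hypothetical, and it assumes one pod writes an increasing version number to a file on the shared volume while the other pods sample what they can read.

```python
import os
import time


def write_version(path: str, version: int) -> None:
    """Overwrite the probe file in place and force the data out to storage."""
    with open(path, "w") as f:
        f.write(f"{version}\n")
        f.flush()
        os.fsync(f.fileno())


def read_version(path: str) -> int:
    """Return the version visible through this mount, or -1 if none can be read.

    -1 covers a missing file and the brief window in which the writer has
    truncated the file but not yet written the new number.
    """
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return -1


def run_writer(path: str, count: int, interval: float = 1.0) -> None:
    """Write versions 1..count, pausing `interval` seconds between writes."""
    for version in range(1, count + 1):
        write_version(path, version)
        time.sleep(interval)


def run_reader(path: str, samples: int, interval: float = 1.0) -> list:
    """Sample the visible version `samples` times and return what was seen."""
    seen = []
    for _ in range(samples):
        seen.append(read_version(path))
        time.sleep(interval)
    return seen
```

With the deployment above, one pod might call `run_writer("/mnt/alpha/probe.txt", count=600)` while the others call `run_reader` on the same path (the file name `probe.txt` is an assumption). A reader whose samples stop advancing while the other readers keep up would match the behaviour described in this issue.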


Yitozu avatar Yitozu commented on June 4, 2024

I encountered the same issue. Here is my storage pool configuration:

apiVersion: kadalu-operator.storage/v1alpha1
kind: KadaluStorage
metadata:
  name: project-kadalu
spec:
  type: Replica3
  storage:
  - node: openstack-hdp01 # node name as shown in kubectl get nodes
    path: /data01/k8s-pv/project-kadalu
  - node: openstack-hdp02
    path: /data01/k8s-pv/project-kadalu
  - node: openstack-hdp03
    path: /data01/k8s-pv/project-kadalu

I have three pods mounting the same PVC simultaneously. When I modified a file, it was updated on the server right away, but one of the pods kept showing the old content while the other two showed the new content.
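
A report like this is easier to verify when the copies seen by each pod are compared by checksum rather than by eye. The sketch below is an illustration only: the function names are hypothetical, and it assumes the file as seen by each pod is available under a separate local path (for example, copied out of each pod).

```python
import hashlib
from collections import defaultdict


def file_digest(path: str) -> str:
    """Return the SHA-256 hex digest of the file at `path`, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def group_by_content(paths: list) -> dict:
    """Map each distinct digest to the list of paths that have that content."""
    groups = defaultdict(list)
    for p in paths:
        groups[file_digest(p)].append(p)
    return dict(groups)


def stale_copies(paths: list, reference: str) -> list:
    """Return the paths whose content differs from the reference copy."""
    ref = file_digest(reference)
    return [p for p in paths if file_digest(p) != ref]
```

If `group_by_content` returns more than one group for copies taken at the same moment, the pods disagree about the file, and `stale_copies` with the server-side copy as `reference` names the ones that are behind.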


leelavg avatar leelavg commented on June 4, 2024

path: /data01/k8s-pv/project-kadalu


Yitozu avatar Yitozu commented on June 4, 2024

I switched to using GlusterFS storage and encountered the same issue, so it seems to be a problem related to GlusterFS.


couloum avatar couloum commented on June 4, 2024

Hi @Yitozu, do you have more information on the test you performed? Did you switch to GlusterFS without Kadalu? Which version of GlusterFS are you using?
I find it surprising that this kind of issue can appear in GlusterFS, since this is the most basic feature expected of a distributed filesystem. We should consider whether to open an issue directly with GlusterFS.


Yitozu avatar Yitozu commented on June 4, 2024

@couloum We previously used Kubernetes 1.18 with the 'gluster/gluster-centos:latest' image. However, because GlusterFS does not support CSI (Container Storage Interface), we migrated to Kadalu. For confidentiality reasons I cannot share the original scenario, but I can roughly describe the reproduction steps. We have a pod named 'browser', used for file management, which mounts 'project-pvc'. A Spark program then continuously overwrites a specific file within 'project-pvc'. After multiple overwrites, the content of that file became inconsistent among the 'browser' pod, the Spark driver pod, and the Spark executor pod, all of which had 'project-pvc' mounted at the same time.


rajtupakula avatar rajtupakula commented on June 4, 2024

I am seeing a similar issue as well. Any further findings or RC?


leelavg avatar leelavg commented on June 4, 2024

@amarts, if possible, could you please comment?

