
Comments (3)

derekbit commented on May 28, 2024

The issue can be reproduced as follows:

  • Create a RWX volume.
  • Create a workload A using the volume. Keeping workload A running ensures the share-manager pod keeps running.
  • Repeatedly attach and detach workload B using the same volume. The memory usage of nfs-ganesha (cat /proc/<PID of nfs-ganesha>/status | grep VmRSS) increases over time.
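
The measurement in the last step can be sketched as a small helper. This is only an illustration, assuming it runs inside the share-manager pod where the nfs-ganesha /proc entry is visible; the function names vmrss_kb and sample_vmrss are hypothetical and not part of Longhorn or nfs-ganesha.

```shell
#!/bin/bash

# Print the VmRSS value (in kB) from a /proc/<PID>/status-style file.
vmrss_kb() {
    local status_file
    status_file="$1"
    awk '/^VmRSS:/ {print $2}' "$status_file"
}

# Print "<unix timestamp> <VmRSS in kB>" for a PID, <count> times,
# sleeping <interval> seconds (default 2) between samples.
sample_vmrss() {
    local pid count interval i
    pid="$1"
    count="$2"
    interval="${3:-2}"
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s %s\n' "$(date +%s)" "$(vmrss_kb "/proc/$pid/status")"
        i=$((i + 1))
        sleep "$interval"
    done
}
```

For example, `sample_vmrss <PID of nfs-ganesha> 30 2 > vmrss.log` would record 30 samples, one every 2 seconds, while workload B is attached and detached. A value that keeps rising across samples is what the steps above describe.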


longhorn-io-github-bot commented on May 28, 2024

Pre Ready-For-Testing Checklist

  • Where are the reproduce steps/test steps documented?
    The reproduce steps/test steps are:
    1. Create a 3-node cluster.
    2. Create the first workload with a RWX volume using https://github.com/longhorn/longhorn/blob/master/examples/rwx/rwx-nginx-deployment.yaml
    3. Create a second workload with the same RWX volume.
    4. Scale the second workload down and back up, repeating 100 times.
    5. Find the PID of nfs-ganesha in the share-manager pod with ps aux.
    6. Observe the VmRSS of nfs-ganesha in the share-manager pod with cat /proc/<nfs-ganesha PID>/status | grep VmRSS
    7. VmRSS in LH v1.6.1 is significantly larger than the value after applying the fix.
  • Does the PR include the explanation for the fix or the feature?

  • Has the backend code been merged (Manager, Engine, Instance Manager, BackupStore etc) (including backport-needed/*)?
    The PRs are at:
    longhorn/nfs-ganesha#13
    longhorn/longhorn-share-manager#204

  • Which areas/issues this PR might have potential impacts on?
    Area: RWX volume, memory leak, upstream
    Issues


roger-ryao commented on May 28, 2024

Verified on v1.6.x-head 20240507

The test steps
#8394 (comment)

  1. Create first workload with a RWX volume by https://github.com/longhorn/longhorn/blob/master/examples/rwx/rwx-nginx-deployment.yaml
  2. Scale up the replicas to 3.
  3. Check if 3 workloads are in the "Running" state.
  4. Scale down the replicas to 1.
  5. Check if one workload is in the "Running" state.
    We can test steps 2-5 using the following shell script.
deployment_rwx_test.sh
#!/bin/bash

# Deployment to scale and the kubeconfig used to reach the cluster
DEPLOYMENT_NAME="rwx-test"
KUBECONFIG="/home/ryao/Desktop/note/longhorn-tool/ryao-161.yaml"

for ((i=1; i<=100; i++)); do
    # Scale deployment up to 3 replicas
    kubectl --kubeconfig="$KUBECONFIG" scale deployment "$DEPLOYMENT_NAME" --replicas=3

    # Wait for the deployment to have 3 ready replicas
    until [[ "$(kubectl --kubeconfig="$KUBECONFIG" get deployment "$DEPLOYMENT_NAME" -o=jsonpath='{.status.readyReplicas}')" == "3" ]]; do
        ready_replicas=$(kubectl --kubeconfig="$KUBECONFIG" get deployment "$DEPLOYMENT_NAME" -o=jsonpath='{.status.readyReplicas}')
        echo "Iteration #$i: $DEPLOYMENT_NAME has $ready_replicas ready replicas"
        sleep 1
    done

    # Check if all 3 pods are in the "Running" state
    while [[ "$(kubectl --kubeconfig="$KUBECONFIG" get pods -l=app="$DEPLOYMENT_NAME" -o=jsonpath='{.items[*].status.phase}')" != "Running Running Running" ]]; do
        echo "Not all pods are in the 'Running' state. Waiting..."
        sleep 5
    done

    # Scale deployment down to 1 replica
    kubectl --kubeconfig="$KUBECONFIG" scale deployment "$DEPLOYMENT_NAME" --replicas=1

    # Wait for the deployment to have 1 ready replica
    until [[ "$(kubectl --kubeconfig="$KUBECONFIG" get deployment "$DEPLOYMENT_NAME" -o=jsonpath='{.status.readyReplicas}')" == "1" ]]; do
        ready_replicas=$(kubectl --kubeconfig="$KUBECONFIG" get deployment "$DEPLOYMENT_NAME" -o=jsonpath='{.status.readyReplicas}')
        echo "Iteration #$i: $DEPLOYMENT_NAME has $ready_replicas ready replicas"
        sleep 1
    done

    # Check that the single remaining pod is in the "Running" state
    while [[ "$(kubectl --kubeconfig="$KUBECONFIG" get pods -l=app="$DEPLOYMENT_NAME" -o=jsonpath='{.items[*].status.phase}')" != "Running" ]]; do
        echo "Not all pods are in the 'Running' state. Waiting..."
        sleep 5
    done
done
  6. Find the PID of the nfs-ganesha in the share-manager pod by ps aux
  7. Observe the VmRSS of nfs-ganesha in the share-manager pod by cat /proc/<nfs-ganesha PID>/status | grep VmRSS
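
As an alternative to reading the PID off ps aux by hand, the lookup can be sketched by scanning /proc. This is a rough illustration only: find_pid_by_comm is a hypothetical helper, and the process name ganesha.nfsd is an assumption that should be checked against the ps aux output in the share-manager pod.

```shell
#!/bin/bash

# Print the first PID whose /proc/<PID>/comm equals the given name.
# The optional second argument overrides the proc root (default /proc).
# Returns 1 if no process matches.
find_pid_by_comm() {
    local name proc_root comm_file pid_dir
    name="$1"
    proc_root="${2:-/proc}"
    for comm_file in "$proc_root"/[0-9]*/comm; do
        # Skip unreadable entries and the unexpanded glob when nothing matches
        [ -r "$comm_file" ] || continue
        if [ "$(cat "$comm_file")" = "$name" ]; then
            pid_dir=${comm_file%/comm}
            echo "${pid_dir##*/}"
            return 0
        fi
    done
    return 1
}
```

Inside the pod, something like `pid=$(find_pid_by_comm ganesha.nfsd) && grep VmRSS "/proc/$pid/status"` would then cover both steps in one line, assuming the process name matches.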

Result: Passed

  1. We were also able to reproduce this issue on v1.6.1.
  2. After executing the script, the output for v1.6.1 is as follows:
Every 2.0s: cat /proc/29/status | grep VmRSS                     share-manager-pvc-119d403e-ae17-4f4f-aa7f-06e7bf40fca2: Tue May  7 09:54:38 2024

VmRSS:     47192 kB

For v1.6.x-head, the output is as follows:

Every 2.0s: cat /proc/29/status | grep VmRSS                    share-manager-pvc-f22c2fdf-330e-4c22-aea2-45a10c570cbf: Tue May  7 10:09:11 2024

VmRSS:     41604 kB
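
To put the two readings side by side, the difference can be worked out with a short helper. This is a sketch using only the two VmRSS values reported above; rss_delta is a hypothetical name, and the two samples come from different share-manager pods, so the result is indicative rather than a precise measure of the leak.

```shell
#!/bin/bash

# Print the absolute and relative drop between two VmRSS readings in kB.
rss_delta() {
    local before_kb after_kb
    before_kb="$1"
    after_kb="$2"
    awk -v b="$before_kb" -v a="$after_kb" \
        'BEGIN { printf "%d kB (%.1f%%)\n", b - a, (b - a) * 100 / b }'
}

# v1.6.1 reading first, v1.6.x-head reading second
rss_delta 47192 41604   # prints "5588 kB (11.8%)"
```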
