
Comments (8)

junshiguo avatar junshiguo commented on June 19, 2024

And here is the values yaml file content to create the cluster.


# This scenario deploys a cluster with
# 3 core members and 2 read replicas.
acceptLicenseAgreement: "yes"
neo4jPassword: "passw0rd"

useAPOC: "true"

core:
  standalone: false
  numberOfServers: 3
  persistentVolume:
    enabled: true
    mountPath: /data
    size: 10Gi
    storageClass: "faster"

readReplica:
  numberOfServers: 2
  persistentVolume:
    enabled: true
    mountPath: /data
    size: 10Gi
    storageClass: "faster"

tolerations: []

podLabels:
  clusterLabel: "affinity-test"

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: clusterLabel
          operator: In
          values:
          - affinity-test
      topologyKey: "kubernetes.io/hostname"

from neo4j-helm.

moxious avatar moxious commented on June 19, 2024

Actually I think this error may rather be caused by a DNS caching change that was made in the last release. One other user had a problem with it, so I've removed it and already made the change here: 3e1fbe2 -- but it's not in a release yet.

Can you clone the repo and try installing directly from the repo instead of the helm release URL, and see if that fixes it?

Some other points about your YML:

  • useAPOC was deprecated. See the user guide for more details, but now you would specify it like this: plugins: "[\"apoc\"]".
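For reference, the newer setting replaces the useAPOC flag in the values file. A minimal sketch of the relevant fragment, based on the syntax quoted above:

```yaml
# Replaces the deprecated useAPOC flag. The value is a JSON
# array of plugin names, encoded as a YAML string.
plugins: "[\"apoc\"]"
```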

If you can verify the master branch works without the DNS caching bit, I'll cut a new release to make sure that's out.


junshiguo avatar junshiguo commented on June 19, 2024

Hi @moxious , using the master branch does not solve the issue. The exception log is the same.

The readiness check failure only happens when affinity is configured. After removing the affinity section from my YML file, I could create the cluster and pass the readiness check. Could you check what DNS-related check would be added if affinity/anti-affinity is enabled?


moxious avatar moxious commented on June 19, 2024

@junshiguo I still suspect this isn't related to anti-affinity rules, because of the actual errors you're reporting in your debug.log. I've found that the actual cluster formation is flaky on my side, and I've opened a branch to change the approach slightly. Can you check out the branch issue-60-readiness and try it? I think this may resolve it; let me explain why.

You'll recall that in order to support rolling upgrades and scaling cores up, we "over-provisioned" the discovery services. Normally you have 3 cores (discovery services 0, 1, 2), but we created extras: 3 and 4. The reason your cluster is failing to form is that your core members can't contact host 3 or 4.

The way the discovery works under the covers is that Neo4j sends an API request to kubernetes asking it to list the services, and uses that as a connection basis. It gets 5 entries (instead of 3) because we over-provisioned the services on purpose. It fails to connect, and cluster formation fails. This is unfortunate and probably shouldn't be the case, because I separately told the cluster it could form with 3 members, and that setting is still in place. I'm taking that up as a technical point with the internal clustering team to see what they think about proper behavior there.

Now, the work-around that I put on this branch is that discovery will only ever work against the first 3 endpoints. In this way, it doesn't matter if you have a service 3 & 4 that don't resolve, because they won't be consulted. And yet -- they are still there, so you can still scale core members when you need. If you do scale, member 3 and 4 will still discover the cluster via talking to 0, 1, and 2.
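Conceptually, the work-around pins discovery to the first three endpoints. In neo4j.conf terms, that is roughly equivalent to a fixed member list; a sketch, where the service names and port are illustrative of the chart's naming rather than copied from it:

```
# Hypothetical illustration: only the first three discovery
# services are consulted, regardless of over-provisioning.
causal_clustering.initial_discovery_members=neo4j-discovery-0:5000,neo4j-discovery-1:5000,neo4j-discovery-2:5000
```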

The odd part about this, which is why it was hard to spot, is that the cluster formation sometimes happens and sometimes doesn't, depending on the order of operations in kubernetes.


junshiguo avatar junshiguo commented on June 19, 2024

@moxious Thanks for the detailed explanation. The work-around branch works; the cluster starts successfully.

I did some more tests today. The readiness check fails when I use self-defined labels (in my example, clusterLabel=affinity-test) as the labelSelector-matchExpressions-key.
When I use app.kubernetes.io/component as the selector key, the cluster comes up normally as well. Hope this info helps.

Below fails without work-around:

podLabels:
  clusterLabel: "affinity-test"

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: clusterLabel
          operator: In
          values:
          - affinity-test
      topologyKey: "kubernetes.io/hostname"

Below works without work-around:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app.kubernetes.io/component
          operator: In
          values:
          - core
      topologyKey: "kubernetes.io/hostname"


moxious avatar moxious commented on June 19, 2024

@junshiguo I have tested this both ways, and locally the anti-affinity labels work both with and without the custom pod labels. I'd recommend making your matchExpressions more precise so that they not only match on core but also on the chart name or instance name, but other than that it should be working well.
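A more precise selector might also match on the Helm release via the standard app.kubernetes.io/instance label, so two unrelated deployments of the chart don't repel each other. A sketch, assuming the chart applies that standard label; my-release is a placeholder for your release name:

```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app.kubernetes.io/component
          operator: In
          values:
          - core
        # Hypothetical extra clause: restrict to one Helm release
        - key: app.kubernetes.io/instance
          operator: In
          values:
          - my-release
      topologyKey: "kubernetes.io/hostname"
```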

The test that I'm running with, though, includes the branch fix that I provided. Can you try again using the latest "master" (where that branch fix has been merged)? If you still have formation issues, then we might need to see the 3 debug.log files to figure out what's going on.

Custom labels are merely added to the existing labels, and you should be able to specify whatever you want for custom labels (and also use them in pod anti-affinity rules) without affecting cluster formation. The only exceptions would be if your custom labels try to overwrite an existing label we use, or possibly if your anti-affinity rules end up making the cluster unschedulable.
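To illustrate the merging behavior described above, a core pod's metadata would carry both the chart's built-in labels and the custom one; the built-in label values shown here are illustrative, not copied from the chart's templates:

```yaml
metadata:
  labels:
    app.kubernetes.io/name: neo4j        # set by the chart (illustrative)
    app.kubernetes.io/component: core    # set by the chart (illustrative)
    clusterLabel: "affinity-test"        # merged in from podLabels
```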


junshiguo avatar junshiguo commented on June 19, 2024

Thanks @moxious. The latest master branch works well, with no formation issues.


moxious avatar moxious commented on June 19, 2024

In this case, I'm going to close this issue for now, but we can open additional ones if needed later.

