
Comments (36)

subnetmarco commented on May 22, 2024

From an implementation perspective this could be a cleaner and easier design:

  • Every time an invalidation happens, the invalidation is stored in a dedicated invalidations table in the datastore, in the following format:
INSERT INTO invalidations (id_to_invalidate, type_to_invalidate, created_at) VALUES (263af28e-72b7-402f-c0f1-506a70c420e6, 'plugins', now())
  • Asynchronously, every node checks the invalidations table and stores the time of the check in memory in a variable like last_check_at. Every n seconds it checks again for all the invalidations created in the meantime (where created_at > last_check_at), executes those invalidations, and updates last_check_at again.
  • When a new node starts, last_check_at is set to the time the node was started, so that new nodes will only execute the newer invalidations and not the older ones.
  • The table can have a TTL set to an appropriate value, like one hour or one day, so that the table doesn't grow too much.
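
Roughly, the per-node loop could look like the sketch below (an illustration only, not Kong code: it assumes the invalidations table above, the Python cassandra-driver, and a hypothetical local_cache/apply_invalidation pair standing in for the node's in-memory cache).

    import time
    from datetime import datetime, timezone
    from cassandra.cluster import Cluster

    POLL_INTERVAL_SECONDS = 5                         # the "every n seconds" above

    session = Cluster(["127.0.0.1"]).connect("kong")  # keyspace name is an assumption
    local_cache = {}                                  # stand-in for the node's in-memory cache

    def apply_invalidation(entity_type, entity_id):
        # Hypothetical helper: drop the cached entry so it is reloaded on next use.
        local_cache.pop((entity_type, entity_id), None)

    def poll_invalidations():
        # A new node starts with last_check_at = boot time, so older events are skipped.
        last_check_at = datetime.now(timezone.utc)
        while True:
            time.sleep(POLL_INTERVAL_SECONDS)
            rows = session.execute(
                "SELECT id_to_invalidate, type_to_invalidate, created_at "
                "FROM invalidations WHERE created_at > %s ALLOW FILTERING",
                (last_check_at,))
            for row in rows:
                apply_invalidation(row.type_to_invalidate, row.id_to_invalidate)
            last_check_at = datetime.now(timezone.utc)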


subnetmarco commented on May 22, 2024

The first step for building invalidations has been merged with #42. It implements a mechanism for time-based cache expiration.

To fully implement invalidations, the time-based expiration should be removed in favor of an application-based invalidation.


subnetmarco commented on May 22, 2024

Following up on this - do you see any problems with the proposed solution (the invalidation table)? Invalidations could be stored in the table for a week and then automatically expire leveraging Cassandra TTLs.

Advantages:

  • Not having to introduce yet another layer (like Serfdom or Consul).
  • Not introducing cluster awareness to Kong now would also mean keeping it very simple to scale up and down (cluster awareness would mean extending the CLI with cluster-join, cluster-remove and cluster-info commands, and also executing those operations every time a machine is added or removed).

Disadvantages:

  • Kong needs a job that queries for the latest invalidations every second. Not necessarily expensive, but not particularly elegant.
  • The invalidations need to expire at some point, because we don't want that table to keep growing indefinitely. A TTL of a week or a few days could be set, with the assumption that if a machine has a network problem lasting longer than the TTL, that machine is dead and should be restarted. This avoids data inconsistencies.
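
For the expiry, Cassandra's per-row TTL could do the cleanup automatically. A minimal sketch (column names from the example above; the keyspace and the one-week retention are assumptions):

    import uuid
    from cassandra.cluster import Cluster

    WEEK_IN_SECONDS = 7 * 24 * 60 * 60   # proposed retention window

    session = Cluster(["127.0.0.1"]).connect("kong")
    # USING TTL makes Cassandra delete the row by itself once the window elapses,
    # so the invalidations table never grows indefinitely.
    session.execute(
        "INSERT INTO invalidations (id_to_invalidate, type_to_invalidate, created_at) "
        "VALUES (%s, %s, toTimestamp(now())) USING TTL " + str(WEEK_IN_SECONDS),
        (uuid.UUID("263af28e-72b7-402f-c0f1-506a70c420e6"), "plugins"))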


thibaultcha commented on May 22, 2024

Really against this. It looks very clumsy and uses technologies that are not built for this kind of job (Cassandra, queried every second... this feels like manually doing something clustered databases are supposed to handle, and that is not a good sign...). Sadly, I think our only real option is to use something actually built for that kind of job, and step away from Cassandra.


ahmadnassri commented on May 22, 2024

if you're leaning towards clustering Kong nodes, then we might as well ditch the database layer, just rely on local memory and local storage on each node, and implement a data-syncing algorithm...

otherwise, the low-hanging fruit is to keep the database-provided clustering and do selective caching based on entity type (api vs consumer vs plugin)

another approach that might be simpler and more robust is a separate process that talks to the database (on a recurring interval) on behalf of Kong and just updates in-memory objects that Kong can access (a shared memory space)
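
a rough sketch of that last idea (all names hypothetical; in Kong this would sit next to nginx rather than in a Python thread): one refresher owns all the database traffic, and the request path only ever reads the in-memory snapshot

    import threading
    import time

    REFRESH_INTERVAL_SECONDS = 5

    snapshot = {"apis": {}, "consumers": {}, "plugins": {}}  # read by the proxy path
    snapshot_lock = threading.Lock()

    def load_all_from_datastore():
        # Hypothetical loader: in practice one bulk query per entity type.
        return {"apis": {}, "consumers": {}, "plugins": {}}

    def refresher():
        global snapshot
        while True:
            fresh = load_all_from_datastore()
            with snapshot_lock:
                snapshot = fresh  # atomic swap; readers never see a half-built state
            time.sleep(REFRESH_INTERVAL_SECONDS)

    threading.Thread(target=refresher, daemon=True).start()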


thibaultcha commented on May 22, 2024

as well ditch the database layer and just rely on local memory and local storage of each node, and implement data syncing algorithm

👍. Which is why I suggest using already existing solutions like service directories (@thefosk quoted Consul).


subnetmarco commented on May 22, 2024

another approach that might be simpler and more robust, is a separate process that talks to the database (on a recurring intervals) on behalf of Kong, and just updates in-memory objects that Kong can access.

This would be the solution proposed above.

The only problem with having Kong query Cassandra every n seconds is that as the Kong cluster grows, more and more connections are sent to Cassandra. This becomes a problem when the number of Kong machines grows very big (100+ nodes, but then Cassandra can be scaled too).

To effectively solve the problem we need a good implementation of a gossip protocol, so that, for example, when one machine receives the HTTP request to update an API, we can communicate this change to every other node. So we don't need to replace our database; we need a new feature on top of our database. If we decide to go down this route, then something like Serfdom that can live on the machine where Kong is running would be ideal, because from a user's perspective there wouldn't be anything else to set up, scale or maintain.


ahmadnassri commented on May 22, 2024

The only problem with having Kong querying Cassandra every n seconds, is that as the Kong cluster grows, more and more connections are being sent to Cassandra. This becomes a problem when the number of Kong machines grows very big (100+ nodes, but then Cassandra can be scaled too).

correct, however you won't be querying with every request; you'll just be querying to update the Kong shared memory space, meaning you can make one big query to get all the data every time (in theory) and just update what's needed


thibaultcha commented on May 22, 2024

However, loading all the records from Cassandra into nginx's Lua memory zone could be an issue if there are too many records in Cassandra. This approach also feels like reinventing the wheel ("implement a data syncing algorithm", like @ahmadnassri said) for a complex problem (we don't have solid knowledge of distribution algorithms at Mashape, or am I overthinking this?) that is already solved.


subnetmarco commented on May 22, 2024

Not only that, nginx's memory zone is a simple key=value store with no indexes or support for complex queries. We would need to drop any query other than (find, delete, update, *)_by_id.
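
To illustrate the constraint (a toy sketch with a plain dict standing in for the shared memory zone): anything beyond *_by_id means maintaining index keys by hand.

    kv = {}  # stand-in for nginx's shared key=value zone

    def insert_plugin(plugin):
        kv["plugins:" + plugin["id"]] = plugin              # find_by_id works naturally
        index_key = "plugins_by_api:" + plugin["api_id"]    # manual "index" for one query shape
        kv.setdefault(index_key, []).append(plugin["id"])

    def find_plugin_by_id(plugin_id):
        return kv.get("plugins:" + plugin_id)

    def find_plugins_by_api(api_id):
        # Every non-id query needs its own hand-maintained index key like this one.
        return [kv["plugins:" + pid] for pid in kv.get("plugins_by_api:" + api_id, [])]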


subnetmarco commented on May 22, 2024

As well ditch the database layer and just rely on local memory and local storage of each node, and implement data syncing algorithm.

The idea is nice, not sure how feasible it is though, as @thibaultcha said.

Let's say that we have multiple Kong nodes connected through a gossip protocol, and each node has a local datastore (let's say Cassandra, since we already have the DAO for it). We could have a detached Cassandra node on each Kong node, in a clustered Kong environment, where basically Cassandra only stores the data and the gossip protocol takes care of the propagation (effectively replacing Cassandra's own clustering and propagation).

Kong would ship with an embedded nginx + Cassandra + Serfdom configuration, no external DBs. Each node would have its own Cassandra (or any other datastore). We would still be able to make complex queries since the underlying datastore would be Cassandra.


thibaultcha commented on May 22, 2024

I don't understand the role of Cassandra in your latest comment @thefosk.


thibaultcha commented on May 22, 2024

Also, does Serfdom implement a storage solution? I think we are talking about two different approaches here:

  • the one you described: a database, Kong, and a new component responsible for telling each Kong node when to reload data.
  • the one I suggest: leave the gossiping to a tool that already does it and also acts as a database, aka a service directory, like etcd or Consul, and use Cassandra (or anything else) only for cold storage. Then our DAO becomes a serializer that loads data from Cassandra, or anything else (even a configuration file like #528), into the service directory. That leaves the job of distribution to tools that already solve this problem.


subnetmarco commented on May 22, 2024

@thibaultcha Cassandra would only be the storage medium, living standalone on the machine. Basically you could replace Cassandra with SQLite and the role would be the same. Serfdom and its gossip protocol implementation would tell each node what data to store locally in Cassandra, without having a centralized datastore.

The reason why I said Cassandra is just convenience: we already have the DAO for it.


subnetmarco commented on May 22, 2024

the one you described: a database, Kong, and a new component responsible for telling each Kong node when to reload data.

There is a variant of this, and it's the last option I was suggesting.

Kong, Serfdom and Cassandra all living on the same machine. We have 4 nodes? Then 4 Kong, 4 Cassandra, 4 Serfdom instances.

Each Cassandra node lives on its own, without knowing anything about the other Cassandra instances. It's only used as a powerful local storage medium (like SQLite was in the beginning), leaving all the propagation work to Serfdom.

Cassandra would only be used for convenience here, since we already have everything working for it. We could use another datastore, but we can't use a simple key=value store with no indexes like Consul, etcd, or the in-memory nginx cache.


subnetmarco commented on May 22, 2024

leave the gossiping to a tool that already does that, so a database, aka a service directory, like etcd or Consul, use Cassandra (or anything else) only for cold storage. Then our DAO becomes a serializer to load data in Cassandra, or anything else (even a configuration file like #528) to the service directory

Maybe we are suggesting the same thing with different terminology.


ahmadnassri commented on May 22, 2024

(we don't have solid knowledge on distribution algorithms at Mashape, or am I overthinking this?)

yeah, and I don't think it's a viable approach anyway; we're not in the business of building databases.


thibaultcha commented on May 22, 2024

Yeah that's my point


ahmadnassri commented on May 22, 2024

Each Cassandra node lives on its own, without knowing anything about the other Cassandra instances. It's only used as a powerful local storage medium (like SQLite was in the beginning), leaving all the propagation work to Serfdom.

why not simply bundle Cassandra within the Kong builds (exactly like nginx) and just hide it away from the end user? the Kong <-> Cassandra connection would become local with little latency, and the clustering logic would be outsourced to Cassandra's internals.

no need for caching, no need for invalidation.


subnetmarco commented on May 22, 2024

Okay, we do agree that we need some sort of cluster awareness, and that we need something capable of supporting a gossip protocol.

I would like to better understand one thing. @thibaultcha, as you know, tools like Consul or etcd support a very simple implementation of a key-value store. I don't think we will be able to get rid of Cassandra, because we still want to be able to do complex queries (and increments). I +1 the idea of introducing a new tool, but I don't believe we can get rid of a real database.

@thibaultcha @ahmadnassri - thoughts?


thibaultcha commented on May 22, 2024

why not just simply couple Cassandra within Kong builds (exactly like nginx) and just hide it away from the end user, the Kong <-> Cassandra connection would become local and with little latency and the clustering logic is outsourced to Cassandra's internals.

We can't have Cassandra run on each node and use nginx's resources, nor expect users to have the resources for that.

@thefosk My idea was just to put API routing and plugins in it, and keep Cassandra. Data is retrieved from Cassandra and serialized on boot, so that we have different families of keys (for APIs, for plugins, etc.). But we wouldn't have a cache for any other entities, like the ones used by plugins (API keys, etc.). I agree it is not ideal either. We are pretty much stuck on this issue, tbh.


subnetmarco commented on May 22, 2024

We can't have Cassandra run on each node and use nginx's resources, nor expect users to have the resources for that.

I agree.

So, since we can't make too many changes all at once, I want to proceed step by step and see where this leads us. My idea is to:

  • Integrate Kong with Serfdom
  • Every time the admin API is invoked in a way that changes something, the node that processes the HTTP request will also tell Serfdom to send an invalidation event to the other nodes. The other nodes will invalidate the data only when they receive such an event (as opposed to using time-based expiration); see the sketch below.
  • Introducing Serfdom means introducing cluster awareness (otherwise Serfdom doesn't know where to send the events). In this first version, the CLI needs to be updated with functions to:
    • Join a node to a cluster
    • Remove a node from a cluster
    • Check the cluster status
    • (something else that I am missing?)

This will not change the foundations much (where the datastore is located, for example), but it will achieve the invalidation goal on top of Serfdom's gossip protocol.

Serfdom will be shipped in the distributions (like dnsmasq), so nothing will change in the installation procedure. Serfdom opens a couple of ports, which the user will need to make sure are properly firewalled.
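
As a sketch of what the node handling the admin API call could do (the event name, the payload format and the use of the serf CLI here are assumptions, not a final design):

    import json
    import subprocess

    def broadcast_invalidation(entity_type, entity_id):
        # Fired by the node that handled the admin API change; Serfdom's gossip layer
        # then delivers the custom event to every other member of the cluster.
        payload = json.dumps({"type": entity_type, "id": entity_id})
        subprocess.run(["serf", "event", "invalidate", payload], check=True)

    def cluster_join(existing_member_address):
        # What a `kong cluster-join` command could wrap.
        subprocess.run(["serf", "join", existing_member_address], check=True)

    def cluster_status():
        # What a `kong cluster-info` command could wrap.
        return subprocess.run(["serf", "members"], check=True,
                              capture_output=True, text=True).stdout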

Thoughts?


ahmadnassri commented on May 22, 2024

I like the idea, couple of thoughts:

  • License: need to verify we're able to package and ship Serfdom alongside Kong: Mozilla Public License, version 2.0
  • Footprint: how many more resources would Serfdom require?
  • Debugging and Support: since this is a completely separate entity (and much bigger than dnsmasq), I worry about supporting it, so we also have to think of ways of incorporating its debug dumps and logging into Kong's own.


thibaultcha commented on May 22, 2024

  • Need to evaluate any alternatives to Serfdom and consider them.
  • Need to figure out how to make Kong and Serfdom communicate with each other.


ahmadnassri commented on May 22, 2024

Need to evaluate any alternative to Serfdom and consider them.

this should probably be the priority, to make sure we have picked the right tool and not just gone on hype.


subnetmarco commented on May 22, 2024

License: need to verify we're able to package and ship Serfdom alongside Kong: Mozilla Public License, version 2.0

Can you investigate this? @ahmadnassri

Footprint: how many more resources would Serfdom require?

Not much. A couple of ports open, and that would be pretty much it.

Debugging and Support: since this is a completely separate entity (and much bigger than dnsmasq) I worry about supporting it, so we have to also think of ways to incorporating debug dumps and logging into Kong's own.

It's okay. I am not too worried about this - we are using pretty basic functionality among the features it provides, nothing crazy.

Need to evaluate any alternative to Serfdom and consider them.

Yes - to date Serfdom is my best option. Happy to discuss other options. When I was looking for a solution to this problem, Serfdom was my pick because it is decentralized (no centralized servers to set up, like Apache Zookeeper requires) and very straightforward to use. Also, it supports every platform. (Consul itself is also built on top of Serfdom for cluster awareness.)

Need to figure out how to make Kong and Serfdom communicate between each other.

Serfdom will trigger a script every time an event is received. The script can then do whatever we want: start an HTTP request (my first idea), TCP, etc. Invalidation at our scale could happen over HTTP in my opinion, because it's not going to be too intensive.
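
For example, the handler script could be as small as the sketch below (the DELETE route on the admin port is hypothetical, not an existing Kong endpoint). Serfdom invokes handlers with the event name in environment variables and the payload on stdin.

    #!/usr/bin/env python
    import json
    import os
    import sys
    import urllib.request

    # Only react to our custom "invalidate" user event.
    if os.environ.get("SERF_USER_EVENT") == "invalidate":
        payload = json.loads(sys.stdin.read())
        # Tell the local Kong node to drop the entry from its in-memory cache.
        request = urllib.request.Request(
            "http://127.0.0.1:8001/cache/%s/%s" % (payload["type"], payload["id"]),
            method="DELETE")
        urllib.request.urlopen(request)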


ahmadnassri commented on May 22, 2024

Can you investigate this? @ahmadnassri

best suited for a lawyer ... my go-to source for understanding licenses is: http://choosealicense.com/licenses/mpl-2.0/

seems permissive, but I would still consult an expert.

Not much. A couple of ports open, and that would be pretty much it.

I meant in terms of memory, hard disk and CPU usage :)


subnetmarco commented on May 22, 2024

I meant in terms of memory, hard disk and CPU usage

Negligible.


sonicaghi commented on May 22, 2024

@thefosk ping Glaser for legal. Can Apache 2.0 wrap an MPL 2.0 component?


subnetmarco commented on May 22, 2024

Just did. No red flags in the licenses.


subnetmarco commented on May 22, 2024

Since we reached an agreement on Serfdom for now and there are no licensing issues, closing in favor of #651, which covers the technical integration aspects.


sonicaghi commented on May 22, 2024

@thefosk have you looked at this? https://github.com/airbnb/synapse


subnetmarco commented on May 22, 2024

That serves a different purpose @sinzone


hutchic commented on May 22, 2024

already using serf - why not go all the way and make consul a prerequisite for Kong? It would allow us to rely on local memory and local storage on each node without reinventing a wheel already invented elsewhere


subnetmarco commented on May 22, 2024

@hutchic because Consul is a centralized dependency and we don't want to introduce any more dependencies to Kong besides the datastore.


hutchic commented on May 22, 2024

sure, I mean consul is no more centralized than serf (https://www.consul.io/intro/vs/serf.html), but I can understand wanting to avoid adding more dependencies. Forever growing the number of queries to psql doesn't seem like it'll scale past a certain point, so we might need distributed in-memory storage some day regardless

