
Comments (14)

dongfuye commented on August 16, 2024

I have not considered keeping the old configurations before, so I will take some time to design it. Maybe I will reply to this tomorrow.

from nebulex.

cabol commented on August 16, 2024

Hey 👋 !! First of all, thanks for the proposal, I agree the current GC strategy is quite confusing and prone to unexpected behavior if you don't configure it properly, so yeah, it would be great if we can improve it.

I have a couple of doubts though. Regarding generation_max_time, you mention "If the items of our local cache have an expire time, then you can set generation_max_time to the expire time", but entries in the cache may have different expiration times, which I think makes this very complicated; perhaps I misunderstood it?

Currently, the size/memory check works by using an inverse-proportional backoff. The idea is: the lower the size/memory, the greater the timeout (if there is still enough space, we don't want to check the limits that often). On the other hand, if the size/memory is reaching the limit, we want the timeout to fire more often, so that memory can be released faster. For that reason, you can configure gc_cleanup_min_timeout and gc_cleanup_max_timeout; with those values, along with max_size and allocated_memory, an inverse linear function is built to calculate the next timeout. This is basically how the GC works for the size/memory check. Something like:

[Chart: gc_size_memory_check, showing the cleanup timeout as an inverse linear function of the cache size/memory]
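That inverse linear interpolation could be sketched roughly like this (a hypothetical Elixir sketch with assumed names, not Nebulex's actual implementation):

```elixir
# Hypothetical sketch: the cleanup timeout shrinks linearly from
# gc_cleanup_max_timeout down to gc_cleanup_min_timeout as the cache
# size approaches max_size (the same idea applies to memory).
defmodule CleanupTimeout do
  def next(size, max_size, min_timeout, max_timeout) do
    # Fraction of the limit already used, clamped to [0, 1].
    ratio = min(size / max_size, 1.0)

    # Inverse linear interpolation: empty cache -> max_timeout,
    # full cache -> min_timeout.
    round(max_timeout - ratio * (max_timeout - min_timeout))
  end
end

# CleanupTimeout.next(0, 1_000, 10_000, 600_000)   #=> 600_000 (far from the limit)
# CleanupTimeout.next(900, 1_000, 10_000, 600_000) #=> 69_000  (close to the limit)
```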

Based on this, my other question is: how will we ensure the size or memory is checked more often when we are reaching the limit and less often when we are far from it? Perhaps, as you explain, this is a totally different strategy? But how will we ensure the size/memory is checked as soon as we reach the limit? According to what I see in the chart, the max_time and 2 * max_time are analogous to gc_cleanup_min_timeout and gc_cleanup_max_timeout, don't you think?

Anyway, let me know your thoughts. I'm glad to discuss it so we can improve the current logic, that would be great. Thanks!!

from nebulex.

dongfuye commented on August 16, 2024

Thank you for your response. I am glad to discuss such an important strategy on this great project with you.

The above proposal did not cover the strategy for checking size and memory.

For the cache generation, there are two backends: ets and shards. The cost of getting the memory and size of an ets table is quite cheap, so even checking them every second is OK.

If the backend is shards, the cost of checking them becomes expensive as the cluster grows. For a cluster of 16 nodes, each check on a single node results in 15 remote calls, and each check of the cluster overall results in 16 * 15 = 240 remote calls. For a cluster of 100 nodes, each check of the cluster overall results in 100 * 99 ~= 10 K remote calls.
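The arithmetic above is simply n * (n - 1) remote calls per cluster-wide check, under the assumption (corrected later in this thread: :shards is actually local and makes no remote calls) that every node queries every other node:

```elixir
# Remote calls for one full cluster-wide check if each of n nodes
# queried the other n - 1 nodes.
calls = fn n -> n * (n - 1) end

calls.(16)   #=> 240
calls.(100)  #=> 9_900 (~10 K)
```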

In my experience, most clusters are small, probably fewer than 16 nodes, so a checking interval of 3 seconds may be OK. If the cluster becomes larger, then the checking interval should be specified by the user, maybe 2 minutes or more.

Maybe a configuration option named gc_check_interval can be introduced. The default value would be 3 seconds, and the user would only need to set it for large clusters.

In a large cluster, there may be problems even if a large gc_check_interval is set. If the items are distributed uniformly across the shards and are accessed uniformly, then all nodes may reach the limit simultaneously and run their checks at the same time, causing a tremendous number of remote calls. So in the implementation of the size and memory check, randomizing the checking time is important.
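Randomizing the check time could be as simple as adding jitter when scheduling the next check; a minimal sketch (hypothetical names, not an actual Nebulex API):

```elixir
# Hypothetical sketch: schedule the next size/memory check with up to
# 10% random jitter so nodes don't all check at the same instant.
defmodule JitteredCheck do
  def schedule(interval_ms) do
    # :rand.uniform/1 returns 1..n, so the jitter is always positive.
    jitter = :rand.uniform(max(div(interval_ms, 10), 1))
    Process.send_after(self(), :check_limits, interval_ms + jitter)
  end
end
```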

In my opinion, reducing extra traffic at peak time may be more important than reducing extra traffic at non-peak time.


cabol commented on August 16, 2024

Ok, I think I'm understanding the strategy a bit more, but there are still some things I need to understand 😅 . Let me first clarify a few things that can simplify the problem a lot.

For the cache generation, there are two backends: ets and shards. The cost of getting the memory and size of an ets table is quite cheap, so even checking them every second is OK.

Agreed, even checking the size/memory every second is OK.

If the backend is shards, the cost of checking them becomes expensive as the cluster grows. For a cluster of 16 nodes, each check on a single node results in 15 remote calls, and each check of the cluster overall results in 16 * 15 = 240 remote calls. For a cluster of 100 nodes, each check of the cluster overall results in 100 * 99 ~= 10 K remote calls.

Nebulex uses :shards but only locally, so there are no network calls at all. The cost of checking the size here is proportional to the number of partitions (ETS partitions), but there are no network calls, so I wouldn't be too worried about it. I'm mentioning this because it may simplify the problem: we shouldn't think about remote calls, because this all happens locally; it is for the local adapter. Now, if you have a partitioned cache with the partitioned adapter, there are network calls, but they are not related to the GC. The GC works locally with its local adapter, which means that in a partitioned cache each partition has its own GC; they may run at the same time, but they are independent and make no remote calls. Let me know if that is clear.

So, having said that, if I understand correctly: instead of having min and max timeouts for the size/memory check (gc_cleanup_min_timeout and gc_cleanup_max_timeout) and calculating the actual cleanup timeout using an inverse linear function (which I agree is confusing and maybe complex), the strategy consists of having a fixed timeout (let's call it gc_cleanup_timeout) that will check whether the size and/or memory have exceeded the limit, but only for the current/newest generation, and if so, a new generation is created. Is that right?
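The fixed-timeout strategy described here could be sketched as a simple periodic process (a hypothetical GenServer sketch with assumed names, not Nebulex's actual code):

```elixir
# Hypothetical sketch of the fixed-timeout strategy: every
# gc_cleanup_timeout ms, inspect only the newest generation and roll
# over to a new generation if the size limit was exceeded.
defmodule GenerationSweeper do
  use GenServer

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  def init(opts) do
    state = %{
      timeout: Keyword.fetch!(opts, :gc_cleanup_timeout),
      max_size: Keyword.fetch!(opts, :generation_max_size)
    }

    schedule(state.timeout)
    {:ok, state}
  end

  def handle_info(:cleanup, state) do
    # Only the current/newest generation is checked.
    if newest_generation_size() > state.max_size do
      create_new_generation()
    end

    schedule(state.timeout)
    {:noreply, state}
  end

  defp schedule(timeout), do: Process.send_after(self(), :cleanup, timeout)

  # Placeholders for the backend-specific calls
  # (e.g. :ets.info(tab, :size) for the ets backend).
  defp newest_generation_size, do: 0
  defp create_new_generation, do: :ok
end
```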


dongfuye commented on August 16, 2024

the strategy consists of having a fixed timeout (let's call it gc_cleanup_timeout) that will check whether the size and/or memory have exceeded the limit, but only for the current/newest generation, and if so, a new generation is created. Is that right?

Yes! Exactly!


cabol commented on August 16, 2024

Ok, that makes sense. I suggest introducing new config options :generation_cleanup_timeout and :generation_max_size (and maybe also :generation_allocated_memory), and when :generation_cleanup_timeout is set, it overrides :gc_cleanup_min_timeout and :gc_cleanup_max_timeout. That way, we don't remove existing functionality or cause breaking changes; instead, by adding new functionality we can get feedback and maybe deprecate the old options later. It would be nice to take measurements to compare both strategies, especially in terms of memory efficiency. Overall, I like the idea; as you said, it is simple: just focus on the current/newest generation, which means we configure the max size (or max memory) per generation, which makes this easier. Let me know what you think.


dongfuye commented on August 16, 2024

The configuration proposal:

The new version of the configuration for Nebulex should be as simple as possible. The user should only need to set one option, for example generation_max_size.

The following are the options for the new strategy:

  • generation_max_size (will be implemented; the tests need this): If the number of items held by the newest generation exceeds this value, then a new generation will be created. No default value.
  • generation_allocated_memory (will be implemented; my work needs this): If the memory used by the newest generation exceeds this value, then a new generation will be created. No default value.
  • generation_max_time (may be implemented later): If the newest generation exceeds this time limit, then a new generation will be created. No default value.
  • generation_cleanup_timeout: The interval for checking whether the newest generation has exceeded its limits. Default value is 3 seconds.

If any of generation_max_time, generation_max_size, and generation_allocated_memory is set, then the new strategy takes effect and all the old options are ignored. If none of these 3 options is set, then the old strategy is used as normal.

This design is compatible with the old options without any modification, and it also keeps the configuration for the new strategy as simple as possible. If the old strategy is deprecated and eventually removed, then we can enforce that at least one of the 3 options is required.
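Under this proposal, a cache configuration might look like the following (hypothetical app/cache names and values; the option names are the ones proposed above):

```elixir
# Hypothetical config sketch: setting any of the generation_* limit
# options would activate the new strategy.
config :my_app, MyApp.Cache,
  generation_max_size: 1_000_000,             # max items per generation
  generation_allocated_memory: 2_000_000_000, # max bytes per generation
  generation_cleanup_timeout: :timer.seconds(3)
```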


cabol commented on August 16, 2024

Ok, some thoughts and doubts:

generation_max_time (may be implemented later): If the newest generation exceeds this time limit, then a new generation will be created. No default value.

What is the difference between :generation_max_time and the current :gc_interval? As far as I understand, and based on your explanation "If the newest generation exceeds this time limit, then a new generation will be created", it does the same thing, so I still don't get that part.

If any of generation_max_time, generation_max_size, and generation_allocated_memory is set, then the new strategy takes effect and all the old options are ignored. If none of these 3 options is set, then the old strategy is used as normal.

I wouldn't say ALL options; let's better say ALL options except :gc_interval, because this one is fundamental for the GC, and I don't see the difference from the :generation_max_time option you described. Am I missing something?


dongfuye commented on August 16, 2024

The internal behavior is exactly the same. My suggestion of this new option is based on the following reasons:

  • For the new strategy, the name :generation_max_time may be easier to understand. The other options for the new strategy are named generation_*, and a new option named :generation_max_time follows that convention.
  • Existing code already sets :gc_interval to 12 hours by default. This value may not be appropriate for some applications, so letting it default to nil in the new strategy may be a good choice.
  • It may be confusing for users to understand that :gc_interval affects both the old and the new strategy.


cabol commented on August 16, 2024

Ok, that makes sense, but I'd rename :generation_max_time to :generation_start_timeout to make it even clearer (indicating this is the time, in milliseconds, that the GC waits before starting/creating a new generation and deleting the oldest one).


dongfuye commented on August 16, 2024

OK, I agree to rename :generation_max_time to :generation_start_timeout.

I am currently on vacation until next weekend, and then I'll implement this new strategy and make a PR.


cabol commented on August 16, 2024

I am currently on vacation until next weekend, and then I'll implement this new strategy and make a PR.

Sure, no rush, take your time, thank you very much for this, looking forward to seeing that PR 👍


cabol commented on August 16, 2024

@dongfuye whenever you have some time check this out: https://github.com/nebulex-project/nebulex_local. It is the new local adapter for Nebulex v3, and I think it covers what we have discussed here. Let me know your thoughts about it.


dongfuye commented on August 16, 2024

whenever you have some time check this out: https://github.com/nebulex-project/nebulex_local. It is the new local adapter for Nebulex v3, and I think it covers what we have discussed here. Let me know your thoughts about it.

Great! It implements this perfectly!

