
ZIO Cache

ZIO Cache is a library that makes it easy to optimize the performance of our application by caching values.


Introduction

Sometimes our applications perform overlapping work, or receive duplicate requests for it. Assume we are writing a service that handles incoming requests: we don't want to repeat the work for duplicate requests. Using ZIO Cache we can make our application more performant by avoiding duplicated work.

Some key features of ZIO Cache:

  • Compositionality — If we want our applications to be compositional, different parts of our application may do overlapping work. ZIO Cache lets us keep the benefits of compositionality while adding caching.

  • Unification of Synchronous and Asynchronous Caches — Defining a cache compositionally in terms of a lookup function unifies synchronous and asynchronous caches: the lookup function can compute the value either synchronously or asynchronously.

  • Deep ZIO Integration — ZIO Cache is a ZIO-native solution, so it supports concurrent lookups, failure, and interruption without losing any of the power of ZIO.

  • Caching Policy — Using a caching policy, ZIO Cache can determine when values should or may be removed from the cache, which gives us a lot of flexibility if we want to build something more complex and custom. A caching policy has two parts, which together define the whole policy:

    • Priority (Optional Removal) — When the cache is running out of space, the priority defines the order in which existing values may be removed to make room.

    • Evict (Mandatory Removal) — Regardless of space, existing values must be removed when they are no longer valid, for example because they no longer satisfy business requirements (e.g., an entry is too old). Eviction is a function that determines whether an entry is valid based on the entry and the current time.

  • Composable Caching Policies — We can define much more complicated caching policies out of much simpler ones.

  • Cache/Entry Statistics — ZIO Cache maintains statistics such as entries, memory size, hits, misses, loads, evictions, and total load time. We can use them to see how our cache is doing and decide where to change our caching policy to improve these metrics.

How to Define a Cache?

A cache is defined in terms of a lookup function that describes how to compute the value associated with a key if a value is not already in the cache.

import zio._

trait Lookup[-Key, -Environment, +Error, +Value] {
  def lookup(key: Key): ZIO[Environment, Error, Value]
}

The lookup function takes a key of type Key and returns a ZIO effect that requires an environment of type Environment and can fail with an error of type Error or succeed with a value of type Value. Because the lookup function returns a ZIO effect it can describe both synchronous and asynchronous workflows.
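As a quick illustration, here is a lookup built from a hypothetical data source (UserProfile and fetchFromDatabase are made up for this sketch; Lookup wraps a plain function from keys to ZIO effects):

```scala
import zio._
import zio.cache.Lookup

// Hypothetical domain type and data source, for illustration only.
final case class UserProfile(id: Int, name: String)

def fetchFromDatabase(id: Int): Task[UserProfile] =
  ZIO.succeed(UserProfile(id, s"user-$id")) // stand-in for a real query

// Lookup simply wraps a function Key => ZIO[Environment, Error, Value].
val profileLookup: Lookup[Int, Any, Throwable, UserProfile] =
  Lookup(fetchFromDatabase)
```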

We construct a cache using a lookup function as well as a maximum size and a time to live.

trait Cache[-Key, +Error, +Value] {
  def get(k: Key): IO[Error, Value]
}

object Cache {

  def make[Key, Environment, Error, Value](
    capacity: Int,
    timeToLive: Duration,
    lookup: Lookup[Key, Environment, Error, Value]
  ): ZIO[Environment, Nothing, Cache[Key, Error, Value]] =
    ???
}

Once we have created a cache the most idiomatic way to work with it is the get operator. The get operator will return the current value in the cache if it exists or else compute a new value, put it in the cache, and return it.

If multiple concurrent processes get the value at the same time the value will only be computed once, with all of the other processes receiving the computed value as soon as it is available. All of this will be done using ZIO's fiber based concurrency model without ever blocking any underlying operating system threads.

Installation

In order to use this library, we need to add the following line in our build.sbt file:

libraryDependencies += "dev.zio" %% "zio-cache" % "0.2.2"

Example

In this example, we are calling timeConsumingEffect three times in parallel with the same key. The ZIO Cache runs this effect only once. So the concurrent lookups will suspend until the value being computed is available:

import zio._
import zio.cache.{Cache, Lookup}

object ZIOCacheExample extends ZIOAppDefault {
  def timeConsumingEffect(key: String) =
    ZIO.sleep(5.seconds).as(key.hashCode)

  def run =
    for {
      cache <- Cache.make(
        capacity = 100,
        timeToLive = Duration.Infinity,
        lookup = Lookup(timeConsumingEffect)
      )
      result <- cache
        .get("key1")
        .zipPar(cache.get("key1"))
        .zipPar(cache.get("key1"))
      _ <- ZIO.debug(
        s"Result of parallel execution of three effects with the same key: $result"
      )

      hits <- cache.cacheStats.map(_.hits)
      misses <- cache.cacheStats.map(_.misses)
      _ <- ZIO.debug(s"Number of cache hits: $hits")
      _ <- ZIO.debug(s"Number of cache misses: $misses")
    } yield ()

}

The output of this program should be as follows:

Result of parallel execution of three effects with the same key: ((3288498,3288498),3288498)
Number of cache hits: 2
Number of cache misses: 1

Resources

  • Compositional Caching by Adam Fraser (December 2020) — In this talk, Adam will introduce ZIO Cache, a new library in the ZIO ecosystem that provides a drop-in caching solution for ZIO applications. We will see how ZIO's support for asynchrony and concurrency lets us implement a cache in terms of a single lookup function, and how we get many other things, such as typed errors and compositional caching policies, for free. See how easy it can be to add caching to your ZIO application!

Documentation

Learn more on the ZIO Cache homepage!

Contributing

For the general guidelines, see ZIO contributor's guide.

Code of Conduct

See the Code of Conduct

Support

Come chat with us on Discord.

License

See the LICENSE file.


zio-cache's Issues

Optional jitter for the cache ttl

It would be nice to have a jitter parameter so if a number of keys is getting queried continuously, the periodic re-fetching of them spreads out a bit.
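Independent of any cache API, jittering a TTL could be sketched like this (jitteredTtl is a hypothetical helper, not part of zio-cache):

```scala
import java.time.Duration
import scala.util.Random

// Hypothetical helper: perturb a base TTL by up to +/- `factor` (e.g. 0.1 = 10%),
// so entries loaded at the same time do not all expire at the same time.
def jitteredTtl(base: Duration, factor: Double, random: Random): Duration = {
  val delta = (random.nextDouble() * 2.0 - 1.0) * factor // uniform in [-factor, factor)
  Duration.ofNanos((base.toNanos * (1.0 + delta)).toLong)
}
```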

Allow batch lookups

Many HTTP APIs have a batch endpoint. This allows multiple values to be requested with a single HTTP call.

This doesn't work well with ZIO-cache right now, as there is no way to look up multiple values at once.

HTTP endpoints are prime targets for caching, since network overhead is usually significant. So I think support for this use case would be a great addition to ZIO-cache.

I'm not sure what the best interface for this would be, but ideally, it would:

  • Use cached values for keys already present in the cache;
  • Call the user-defined batch function with the remaining keys (if any);
  • Add new entries to the cache;
  • Enforce at the type level that the user-defined batch function returns a value for every key.
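One possible shape for such an interface (entirely hypothetical, not part of zio-cache today) could mirror Lookup but operate on sets of keys. Using Map in the result forces the caller to account for each key, although it cannot fully enforce completeness at compile time:

```scala
import zio._

// Hypothetical batch lookup: given the keys that were not already cached,
// return a value for every one of them in a single call.
trait BatchLookup[Key, Environment, Error, Value] {
  def lookupAll(keys: Set[Key]): ZIO[Environment, Error, Map[Key, Value]]
}

// A cache built from it could expose a batch variant of get:
trait BatchCache[Key, Error, Value] {
  def getAll(keys: Set[Key]): IO[Error, Map[Key, Value]]
}
```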

Ability to transform value

Much like there is a constructor to transform the key, it would be great to have one to transform the value. The use case I have right now is an http call where the ttl is controlled by the Cache-Control header in the response, but the value stored in the cache is the decoded body. Does that make sense?

def makeUltimate[In, Key, Environment, Error, Result, Value](
    capacity: Int,
    lookup: Lookup[In, Environment, Error, Result]
  )(
    timeToLive: Exit[Error, Value] => Duration,
    keyBy: In => Key,
    valueFrom: Result => ZIO[Environment, Error, Value]
  )(implicit trace: Trace): URIO[Environment, Cache[In, Error, Value]]

Home stretch: add a conditional lookup function, so that in my use case instead of doing a blanket GET request again it would add the If-Modified-Since header.

Ability to clear the cache

Motivation

Although it's not common, some special circumstances might require clearing the entire cache. We should allow it if desired.

Considerations

We might go one step further and allow a user-provided predicate to clear entries conditionally.

Delete Lookup?

It doesn't seem to provide much right now and feels more like boilerplate rather than a useful abstraction. Is there a plan to add other types of lookups?

Complex Coding ?

Hi John,

why your coding style is super complex ? can you please change your coding style to something meaningful and less complex.

Thanks

Refined result from `get`

Is it possible to tweak the signature of get to know when the result I get has been calculated by my query or another request that happened to arrive before mine?
We need it to refine our retry policy client-side.

Ability to preload the cache at start time

Motivation

Sometimes it's desired to pre-warm the cache with entries that are known to be frequently used, as an optimization strategy. For instance, an e-commerce web site might want to load the top 100 most popular items right from the start.

Considerations

  1. We'll probably need a separate lookup function to retrieve these entries and populate the cache.
  2. It's probably a good idea to stagger the provided TTL for each entry so that we can avoid the situation where these hotspot entries all expire at the same time and subsequent requests can trigger an avalanche of retrieval.
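Staggering the TTLs could be as simple as spreading them evenly over a window (spreadTtl is a hypothetical helper, not part of zio-cache):

```scala
import java.time.Duration

// Hypothetical helper: give the i-th of n preloaded entries a TTL spread
// evenly across [base, base + window], so they do not all expire at once.
def spreadTtl(base: Duration, window: Duration, index: Int, total: Int): Duration =
  base.plus(Duration.ofNanos(window.toNanos * index / math.max(total - 1, 1)))
```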

Enhance cache lookup

Hi!

While working with zio-cache I ran into a problem where Key alone does not provide enough information
to compute the cached Value.

As a motivating example let's assume that some request contains the user ID and some other data extracted
from a session cookie.

case class CookieData(data:String)
case class Request(userId:Int, cookieData:CookieData)

trait UserSessionDataComputationService {
  type UserSessionData
  def expensiveUserSessionDataComputation(request:Request):UIO[UserSessionData]
}

A cache lookup can trigger an expensiveUserSessionDataComputation call to compute the cached value.

With the current version of zio-cache, we can set Key to Int,
Value to UserSessionData and
Environment to UserSessionDataComputationService.
To run the effect and construct the cache we must provide a UserSessionDataComputationService once.
However, we cannot access the Request instance.
We cannot solve the problem by making Request part of the environment,
because the request would only be set once during cache creation instead of cache lookup.

One solution is to update the interface of Cache's get,lookupValue and refresh methods to return
a ZIO[Environment,Error,Value] instead of IO[Error,Value].
In the example given above Environment is set to Request only.
UserSessionDataComputationService can be provided as input for a cache layer.

This branch shows a possible implementation with a demo app:
https://github.com/landlockedsurfer/zio-cache/commits/lookup-environment

Another solution is to project out the key from a given input. A new type variable Input is introduced
and a keyByInput function passed on cache creation to extract the key from the given input.

This branch shows a possible implementation for this solution including a demo app:
https://github.com/landlockedsurfer/zio-cache/commits/key-by-input

What do you think?

Kind regards,
Manfred

Test fails when executed in IntelliJ

When tests are executed using IntelliJ, tests fail with the following output (varies with each run).

  • CacheSpec

    • cacheStats
      Test failed after 4 iterations with input: 13
      Original input before shrinking was: 296005039
      • 47 was not equal to 49
      hits == 49L
      hits = 47
      at /home/ravi/projects/zio-cache/zio-cache/shared/src/test/scala/zio/cache/CacheSpec.scala:22

      Test failed after 4 iterations with input: 13
      Original input before shrinking was: 296005039
      • 53 was not equal to 51
      misses == 51L
      misses = 53
      at /home/ravi/projects/zio-cache/zio-cache/shared/src/test/scala/zio/cache/CacheSpec.scala:23

    • invalidate
    • invalidateAll
    • lookup
      • sequential
      • concurrent
      • capacity
    • refresh method
      • should update the cache with a new value
      • should update the cache with a new value even if the last get or refresh failed
      • should get the value if the key doesn't exist in the cache
    • size
      Ran 10 tests in 4 s 263 ms: 9 succeeded, 0 ignored, 1 failed

Process finished with exit code 1

Refactor Evict

  1. Pull out of CachePolicy
  2. Simplify so it cannot look at time or EntryStats
  3. Pass it to cache constructor (Cache#make)

Separately, move "ttl" concerns as expirationTime member of EntryStats (#6 alternative idea).

Add hooks for auditing

It should be possible to audit a cache to figure out why values are retained or expired (and when, etc.).

Ability to add callback/hook to expiry events

Motivation

For some business need, we might be interested in when an entry expires and gets evicted from the cache. We should provide some mechanism (such as callbacks) that users can tap into for such events.

Considerations

If cache reset is added, perhaps we should provide a hook for that event too.

Alternative cost/weight per entry

Use case: a cache of document collections. Each entry can have a small or large number of documents, which can themselves vary in size. Eviction should happen because the memory heap is too full; so if there is 500 MB 'free' then I can either allow insertion of many small document collections or perhaps just one big collection.

Currently the 'size' of the cache is just the amount of entries (weight fixed to 1), I would like to allow the lookup function to provide arbitrary weights (float/double).
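A possible (entirely hypothetical) extension would let the lookup report a weight alongside each value, with the capacity interpreted as a total weight budget rather than an entry count:

```scala
import zio._

// Hypothetical: the lookup returns the value together with its weight,
// and the cache evicts entries until the sum of weights fits under a budget.
final case class Weighted[Value](value: Value, weight: Double)

trait WeightedLookup[Key, Environment, Error, Value] {
  def lookup(key: Key): ZIO[Environment, Error, Weighted[Value]]
}

// A constructor could then take a maximum total weight instead of a capacity:
// def makeWeighted(maxWeight: Double, ttl: Duration, lookup: WeightedLookup[...])
```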

Add operators to Lookup

  • includeKeys - A predicate on keys that should ALWAYS be cached
  • excludeKeys - A predicate on keys that should NEVER be cached
  • Combining lookup functions?
    • orElse (fallback)
    • race (first success)
  • Unary operators
    • onSuccess(v => ZIO(...))
    • onFailure(e => ZIO(...))

val lookup2 = lookup.includeKeys(List("SPECIAL_KEY1", "SPECIAL_KEY2") contains _).excludeKeys(_ == "SPECIAL_KEY3")

Call CachingPolicy#evict more often

A goal should be that if all entries should expire after 1 hour, then if the cache is left alone for a sufficient amount of time, eventually, it contains no entries.

Provide more stats in `CacheStats`

Motivation

Currently we provide hits and misses stats. Cache count should be a useful addition to the stats.

Considerations

  1. Adding another LongAdder for the count should suffice.
  2. Maybe for completeness, we can include capacity in the stats even though the value is user-provided.
  3. Not sure this is applicable to zio.internal.MutableConcurrentQueue - perhaps the current allocation size of the underlying data structure (if it varies/grows) is also a good stat to report.

Ability to set expiry time from lookup function

I need to set expiry time for each cached item from the lookup function.
In my use case I request an auth token from a server and the response contains the token as well as its expiry time.
Because of this I need some way to set the expiry time for each item when adding it to the cache.
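One way this could look is a constructor whose TTL is computed from each lookup result rather than fixed up front. Newer versions of zio-cache expose Cache.makeWith with roughly this shape; verify against the version you are using. Token and fetchToken below are hypothetical:

```scala
import zio._
import zio.cache.{Cache, Lookup}

// Hypothetical auth response carrying its own expiry.
final case class Token(value: String, expiresIn: Duration)

def fetchToken(clientId: String): Task[Token] =
  ZIO.succeed(Token("secret", 30.minutes)) // stand-in for the real auth call

// Sketch: derive the time to live from each lookup's result. A failed
// lookup gets a zero TTL here, so errors are not retained.
val cache: ZIO[Any, Nothing, Cache[String, Throwable, Token]] =
  Cache.makeWith(capacity = 100, lookup = Lookup(fetchToken)) {
    case Exit.Success(token) => token.expiresIn
    case Exit.Failure(_)     => Duration.Zero
  }
```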

Ability to iterate/query items in the cache

Motivation

Currently there is no easy way to iterate through or query against the items in the cache. There might be cases where you would like to do that (debugging comes to mind).

Considerations

Maybe we can provide either an iterator or a query/filter interface to users for this purpose. However, we'll need to take potential performance impact and data consistency into account:

  1. Any query only represents a snapshot of the cache at a particular moment.
  2. We probably shouldn't keep additional copies of data to hold this snapshot just for this purpose

Ability to trigger a lookup call deliberately

Motivation

Currently get is the only way to trigger a lookup call, which may or may not happen depending on whether the target entry resides in the cache. However, there are times when we want to:

  1. refresh a cache entry to its most up-to-date value from our persistence store (it could have changed since the last retrieval)
  2. we simply want to extend the TTL of an entry by repopulating it
  3. we want both 1) and 2)

At the moment, we would have to invalidate the entry first then get it again. This is probably not the best way to handle it. For example, a popular item is being requested constantly. If we evict it first then fetch it, during the fetch, we could receive tons of requests for this item. Even though we can handle a Thundering Herd situation, we should avoid it in the first place.

The proposal is that we can trigger an update, which runs in the background. Upon a successful retrieval, we will update the entry with the new value. During the time of retrieval, all incoming requests are served right away without delay.
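Cache#refresh (exercised in the test output elsewhere on this page) already computes a new value without first evicting the old one, so readers keep being served while the fetch runs. A background repopulation loop for a known hot key might be sketched on top of it (keepFresh is a hypothetical helper):

```scala
import zio._
import zio.cache.Cache

// Sketch: periodically refresh a hot key on a daemon fiber; failed
// refreshes are ignored so the loop keeps running.
def keepFresh[Key, Error, Value](
  cache: Cache[Key, Error, Value],
  key: Key,
  every: Duration
): UIO[Fiber.Runtime[Nothing, Long]] =
  cache.refresh(key).ignore.repeat(Schedule.spaced(every)).forkDaemon
```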

Option to turn off caching errors

Motivation

Currently we cache errors when lookup function calls fail. While this can be beneficial (e.g. as a countermeasure to malicious attacks that send bogus requests for non-existing entities), there are times when such behavior is not desired. We should give users the option to opt out.

Add memory estimation

Maybe we create an Estimator[Value] that can estimate the size of a value, which can be passed into Cache.make.

Ability to add cache entries directly, bypassing the `Lookup` function

Motivation

In some scenarios, we might want to add a value to the cache directly without calling the lookup function. An example of such business logic is we have some special values that don't exist in the database (from which the lookup function retrieves values), nonetheless we want to serve these special values by injecting them directly into the cache.

Considerations

The decision to bypass lookup is most likely determined by some external conditions. Currently the Lookup function has the following signature:

def lookup(key: Key): ZIO[Environment, Error, Value]

And this function is provided upfront during the construction of the cache, when the conditions to bypass might not be available.

We'd probably either need to express the conditions and the value to add through Environment, or need to add additional (optional) parameters to the lookup function.

Ability to set an initial size in addition to `capacity`?

Motivation

Depending on the underlying data structure, initializing it to a known size that fits user's use pattern might be a good optimization.

Considerations

If I'm not mistaken, zio.internal.MutableConcurrentQueue is currently used as the underlying data structure. I'm not familiar with the characteristics or the implementation of this data structure. Perhaps this ticket won't be applicable or useful to MutableConcurrentQueue.

Compat with ZIO 2.1.0-RC2

When upgrading an app that uses zio-cache to ZIO 2.1.0-RC2, I get:

    Exception in thread "zio-fiber-131478979" java.lang.NoSuchMethodError: 'zio.internal.MutableConcurrentQueue zio.internal.MutableConcurrentQueue$.unbounded()'
    	at zio.cache.Cache$CacheState$.initial(Cache.scala:369)

Would it be possible to have an RC release or something that supports ZIO 2.1? Thanks!
