
django-cache-memoize's Introduction

django-cache-memoize

  • License: MPL 2.0

Django utility for a memoization decorator that uses the Django cache framework.

For supported Python and Django versions, check out the tox.ini file.

Key Features

  • Memoized function calls can be invalidated.
  • Works with non-trivial arguments and keyword arguments.
  • Insight into cache hits and cache misses with a callback.
  • Ability to use as a "guard" for repeated execution when storing the function result isn't important or needed.

Installation

pip install django-cache-memoize

Usage

# Import the decorator
from cache_memoize import cache_memoize

# Attach decorator to cacheable function with a timeout of 100 seconds.
@cache_memoize(100)
def expensive_function(start, end):
    return random.randint(start, end)

# Just a regular Django view
def myview(request):
    # If you run this view repeatedly you'll get the same
    # output every time for 100 seconds.
    return http.HttpResponse(str(expensive_function(0, 100)))

The caching uses Django's default cache framework. Ultimately, it calls django.core.cache.cache.set(cache_key, function_out, expiration), so it won't work if your function returns something that can't be pickled and cached.

Under the hood, this relies on Django's simple, low-level cache API, which you can also use directly to store objects in the cache with any level of granularity you like. You can cache any Python object that can be pickled safely: strings, dictionaries, lists of model objects, and so forth. (Most common Python objects can be pickled; refer to the Python documentation for more information about pickling.)
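If you're unsure whether a return value is cacheable, you can check it with the standard pickle module directly. This is a minimal sketch (the is_cacheable helper is hypothetical, not part of django-cache-memoize):

```python
import pickle

def is_cacheable(value):
    """Return True if the value can be pickled, i.e. stored by a cache backend."""
    try:
        pickle.dumps(value)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

print(is_cacheable({"name": "Peter", "ids": [1, 2, 3]}))  # True: plain data pickles fine
print(is_cacheable(lambda x: x))  # False: lambdas can't be pickled
```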

See documentation.

Example Usage

This blog post: How to use django-cache-memoize

It covers the same ground as the Usage example above but in a little more detail. In particular, it shows the code before and after adding django-cache-memoize.

Advanced Usage

args_rewrite

Internally the decorator rewrites every argument and keyword argument of the function it wraps into a concatenated string. The first thing you might want to do is help the decorator rewrite the arguments into something more suitable as a cache key string. Suppose, for example, you have instances of a class whose __str__ method doesn't return a unique value:

class Record(models.Model):
    name = models.CharField(max_length=100)
    lastname = models.CharField(max_length=100)
    friends = models.ManyToManyField(SomeOtherModel)

    def __str__(self):
        return self.name

# Example use:
>>> record = Record.objects.create(name='Peter', lastname='Bengtsson')
>>> print(record)
Peter
>>> record2 = Record.objects.create(name='Peter', lastname='Different')
>>> print(record2)
Peter

This is a contrived example, but it shows that the str() conversion of certain arguments isn't safe as a cache key. In that case you can pass in a callable as args_rewrite. It receives the same positional and keyword arguments as the function you're decorating. Here's an example implementation:

from cache_memoize import cache_memoize

def count_friends_args_rewrite(record):
    # The 'id' is always unique. Use that instead of the default __str__
    return record.id

@cache_memoize(100, args_rewrite=count_friends_args_rewrite)
def count_friends(record):
    # Assume this is an expensive function whose result can be memoized.
    return record.friends.all().count()

prefix

By default the prefix becomes the name of the function. Consider:

from cache_memoize import cache_memoize

@cache_memoize(10, prefix='randomness')
def function1():
    return random.random()

@cache_memoize(10, prefix='randomness')
def function2():  # different name, same arguments, same functionality
    return random.random()

# Example use
>>> function1()
0.39403406043780986
>>> function1()
0.39403406043780986
>>> # ^ repeated of course
>>> function2()
0.39403406043780986
>>> # ^ because the prefix was forcibly the same, the cache key is the same

hit_callable

If set, this function gets called with the original arguments and keyword arguments whenever the cache finds and returns a hit. For example, suppose you want to tell your statsd server every time there's a cache hit.

from cache_memoize import cache_memoize

def _cache_hit(user, **kwargs):
    statsdthing.incr(f'cachehit:{user.id}', 1)

@cache_memoize(10, hit_callable=_cache_hit)
def calculate_tax(user, tax=0.1):
    return ...

miss_callable

Exactly the same functionality as hit_callable, except that it gets called on a cache miss.

store_result

This is useful if you have a function that you want to make sure only gets called once per timeout expiration, but you don't actually care much about the return value. Perhaps you know the function returns something that would quickly fill up your memcached, or something that can't be pickled. In that case you can set store_result to False. This is equivalent to your function returning True.

from cache_memoize import cache_memoize

@cache_memoize(1000, store_result=False)
def send_tax_returns(user):
    # something something time consuming
    ...
    return some_non_pickleable_thing

def myview(request):
    # Hit this view as much as you like; the 'send_tax_returns' function
    # won't be called more than once every 1000 seconds.
    send_tax_returns(request.user)

cache_exceptions

This is useful if you have a function that can raise an exception as a valid result. If the decorated function raises any of the specified exceptions, the exception is cached and raised as normal. Subsequent cached calls immediately re-raise the exception without executing the function. cache_exceptions accepts an Exception class or a tuple of Exception classes.

This option allows you to cache said exceptions like any other result. Only exceptions raised from the classes listed in cache_exceptions are cached; all others propagate immediately.

>>> from cache_memoize import cache_memoize

>>> class InvalidParameter(Exception):
...     pass

>>> @cache_memoize(1000, cache_exceptions=(InvalidParameter, ))
... def run_calculations(parameter):
...     # something something time consuming
...     raise InvalidParameter

>>> run_calculations(1)
Traceback (most recent call last):
...
InvalidParameter

# run_calculations will now raise InvalidParameter immediately
# without running the expensive calculation
>>> run_calculations(1)
Traceback (most recent call last):
...
InvalidParameter

cache_alias

The cache_alias argument allows you to use a cache other than the default.

# Given settings like:
# CACHES = {
#     'default': {...},
#     'other': {...},
# }

@cache_memoize(1000, cache_alias='other')
def myfunc(start, end):
    return random.random()

Cache invalidation

When you want to "undo" some caching, you simply call .invalidate on the function with the same arguments you originally called it with.

from cache_memoize import cache_memoize

@cache_memoize(10)
def expensive_function(start, end):
    return random.randint(start, end)

>>> expensive_function(1, 100)
65
>>> expensive_function(1, 100)
65
>>> expensive_function(100, 200)
121
>>> expensive_function.invalidate(1, 100)
>>> expensive_function(1, 100)
89
>>> expensive_function(100, 200)
121

An alternative way of doing the same thing is to pass the keyword argument _refresh=True. Like this:

# Continuing from the code block above
>>> expensive_function(100, 200)
121
>>> expensive_function(100, 200, _refresh=True)
177
>>> expensive_function(100, 200)
177

There is no way to clear more than one cache key at a time. In the example above you had to know the original arguments to invalidate the cache; there is no way to search for all cache keys that match a certain pattern.

Compatibility

  • Python 3.8, 3.9, 3.10 & 3.11
  • Django 3.2, 4.1 & 4.2

Check out the tox.ini file for the most up-to-date compatibility matrix as covered by the tests.

Prior Art

History

Mozilla Symbol Server is written in Django. It's a web service that sits between C++ debuggers and AWS S3. It shuffles symbol files in and out of AWS S3. Symbol files are for C++ (and other compiled languages) what sourcemaps are for JavaScript.

This service gets a LOT of traffic. The download traffic (proxying requests for symbols in S3) runs at about 40 requests per second. Due to the nature of the application, most of these GETs result in a 404 Not Found, but instead of asking AWS S3 for every single file, these lookups are cached in a carefully tuned Redis setup. This Redis cache is also connected to the part of the code that uploads new files.

New uploads arrive as zip file bundles from Mozilla's build systems, at a rate of about 600MB every minute, each containing on average about 100 files. When a new upload comes in we need to quickly find out if it exists in S3, and this gets cached since the same files are often repeated across uploads. But when a file does get uploaded into S3 we need to quickly and confidently invalidate any local caches. That way you get to keep a really aggressive cache without any stale periods.

This is the use case django-cache-memoize was built for and tested in. It was originally written for Python 3.6 and Django 1.11, but when extracted it was made compatible with Python 2.7 and with Django as far back as 1.8.

django-cache-memoize is also used in SongSear.ch to cache short queries in the autocomplete search input. All autocomplete is done by Elasticsearch, which is amazingly fast, but not as fast as memcached.

"Competition"

There is already django-memoize by Thomas Vavrys. It too is a memoization decorator for Django, and it also uses the default cache framework for storage. It uses inspect on the decorated function to build a cache key.

In benchmarks running both django-memoize and django-cache-memoize I found django-cache-memoize to be ~4 times faster on average.

Another key difference is that django-cache-memoize uses str() where django-memoize uses repr(), which means that with certain mutable objects (e.g. class instances) as arguments the caching will not work. For example, this does not work in django-memoize:

from memoize import memoize

@memoize(60)
def count_user_groups(user):
    return user.groups.all().count()

def myview(request):
    # this will never be memoized
    print(count_user_groups(request.user))

However, this works...

from cache_memoize import cache_memoize

@cache_memoize(60)
def count_user_groups(user):
    return user.groups.all().count()

def myview(request):
    # this *will* work as expected
    print(count_user_groups(request.user))
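The underlying reason can be illustrated with plain Python objects: the default repr() embeds the memory address, which differs for every instance, while an explicit __str__ can be stable across instances. (A sketch; the Thing class below is a hypothetical stand-in for a model instance.)

```python
class Thing:
    """Hypothetical stand-in for a model instance with a stable __str__."""
    def __init__(self, pk):
        self.pk = pk

    def __str__(self):
        return str(self.pk)
    # No __repr__ defined, so the default one includes the memory address.

a = Thing(1)
b = Thing(1)
print(str(a) == str(b))    # True: same cache key both times
print(repr(a) == repr(b))  # False: different addresses -> different cache keys
```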

Development

The most basic thing is to clone the repo and run:

pip install -e ".[dev]"
tox

Code style is all Black

All code has to be formatted with Black. The best tool for checking this is therapist, since it can run all the checks, fix issues, and make sure linting passes before you git commit. This project also uses flake8 to check other things Black can't check.

To check linting with tox use:

tox -e lint-py36

To install the therapist pre-commit hook simply run:

therapist install

When you run therapist run, it will only check the files you've touched. To run it for all files use:

therapist run --use-tracked-files

And to fix all/any issues run:

therapist run --use-tracked-files --fix

django-cache-memoize's People

Contributors

akx, albinlindskog, benspaulding, benweatherman, danielquinn, dmytrolitvinov, geekfish, hugovk, irtazaakram, jaumebecks, kri-k, lamby, milind-shakya-sp, peterbe, pysilver, unho, usamasadiq, uxio0, ypcrumble


django-cache-memoize's Issues

Update in bulk

I have a situation where I am caching records from remote API get requests. I would like to run a background task that fetches a list in bulk and primes the cache in bulk.

It would be useful if there were a helper for this, like .invalidate.

Perhaps cache_memoized_function.set(args, result)? And potentially cache_memoized_function.set_in_bulk([(args1, result1), (args2, result2), ..., (argsN, resultN)])

Right now to do this I need to hit the list API, parse the results, and rerun the memoized function, so I have to fetch the records twice.

Release the package with Django 3.2 support

Hey, we are working on upgrading the edX codebase to Django 3.2. This package is used in the codebase, but it hasn't yet been released with Django 3.2 support. Kindly consider releasing it with Django 3.2 support.
Thanks

"Dog-piling" prevention support

Under very heavy load, when the cached version of a value expires or is missing, there may be sufficient concurrency that multiple workers all attempt to evaluate the target function simultaneously. This could be "problematic".

Not sure about the best solution here, perhaps some very loose coupling with various locking mechanisms via a path.to.lock.method?
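One possible mitigation (a sketch only, not part of the library) is a lock around the cache fill, so that only one caller computes the value while the others wait and reuse it. Within a single process that could look like:

```python
import functools
import threading

def single_flight(func):
    """Naive dog-pile guard: one thread computes per argument tuple; others wait and reuse."""
    lock = threading.Lock()
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        if args in cache:          # fast path: already computed
            return cache[args]
        with lock:
            if args not in cache:  # re-check after acquiring the lock
                cache[args] = func(*args)
        return cache[args]

    return wrapper

calls = []

@single_flight
def expensive(x):
    calls.append(x)
    return x * 2

expensive(21)
expensive(21)
print(len(calls))  # 1: the function body only ran once
```

Across multiple worker processes the lock itself would have to live in shared storage, e.g. using the cache backend's atomic add() as a crude distributed lock.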

Arguments containing colons can be mis-cached

Since the default key construction uses a colon (:) to separate raw string values, string arguments with colons in them can lead to mis-caching.

(If an argument is user input, this could theoretically be a security issue, allowing a user to access the "wrong" object from the cache.)

POC – the assertion fails since only one call was made: the cache key is a:b:c:bla for both invocations.

def test_colons():
    calls_made = []
    @cache_memoize(10)
    def fun(a, b, k="bla"):
        calls_made.append((a, b, k))
        return (a, b, k)

    fun('a:b', 'c')
    fun('a', 'b:c')
    assert len(calls_made) == 2
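One common mitigation (again a sketch, not necessarily how the library resolves this) is to length-prefix or hash each argument separately, so that argument boundaries can't shift:

```python
import hashlib

def make_key(*args):
    """Build a cache key where ('a:b', 'c') and ('a', 'b:c') can't collide."""
    h = hashlib.md5()
    for arg in args:
        part = str(arg).encode("utf-8")
        h.update(len(part).to_bytes(4, "big"))  # length-prefix each part
        h.update(part)
    return h.hexdigest()

print(make_key("a:b", "c") != make_key("a", "b:c"))  # True: no collision
```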

Also tag releases on Git

When packaging a Python package for NixOS we prefer to run the tests, to make sure our packaging works and to have a smoke test to notice when things go wrong.

The PyPI sdist package does not include tests, which is very reasonable. We would like to fetch the latest version directly from Git, which includes the tests. However, there are no tags to pinpoint the exact commits that were used for the PyPI releases.

Please add tags, if you can.

Feature Request: extra key components

Sometimes we know that the cache state should depend on some external data, like a dependency's last-update datetime. It would be nice if extra key components were supported, so one could pass something like:

@cache_memoize(60, extra=('foo', obj.pk, foo.unix_timestamp))
def foo(*args, **kwargs):
    return 42

This also solves #35

Does invalidate() return the new cached result?

Your README seems to imply that the invalidate() call doesn't return the new value:

>>> expensive_function(100, 200)
121
>>> expensive_function.invalidate(1, 200)
>>> expensive_function(1, 100)
89

… however a call with _refresh=True does:

>>> expensive_function(100, 200)
121
>>> expensive_function(100, 200, _refresh=True)
177
>>> expensive_function(100, 200)
177

Is this correct?

Cache miss each time if used on class instance method

Looks like the self param is being taken into account.
A workaround that works is using an empty args_rewrite list:

class MyClass(object):

    @cache_memoize(30, args_rewrite=lambda x: [])
    def get_data(self):
      #....

Incomplete sdist

Hi,

in order to properly ship the package with Debian, could you please build a complete sdist tarball, including:

  • license
  • docs
  • tests
  • pytest and tox config

Once done, please upload a (post-)release sdist to PyPI as well.

Thanks,
Nik

Issue new release on PyPI

Can a new release be issued on PyPI? The get_cache_key method is very useful for testing/debugging.

Pickle error unsupported

I use Python 3.8 with Django 3.0.1

and I get the following error:
unsupported pickle protocol: 5

File "C:\Users\U779\django_projects\8bit-apps\venv\lib\site-packages\cache_memoize\__init__.py", line 131, in inner
    result = cache.get(cache_key, MARKER)
File "C:\Users\U779\django_projects\8bit-apps\venv\lib\site-packages\django\core\cache\backends\db.py", line 51, in get
    return self.get_many([key], version).get(key, default)
File "C:\Users\U779\django_projects\8bit-apps\venv\lib\site-packages\django\core\cache\backends\db.py", line 92, in get_many
    value = pickle.loads(base64.b64decode(value.encode()))

Why _refresh and invalidate doesn't work?

I'm using this package in Django, like this:

class A(models.Model):
    @cache_memoize(300)
    def b(self, arg1='', arg2=None):
         return ...

And clear cache like this:

self.b(_refresh=True)
self.b.invalidate(self)

Neither of these two methods clears method b's cache.

tox runs no tests

Not sure how or when this started (the tox.ini was cloned from somewhere) but running tox -e py36-django21 (for example) doesn't actually run the tests. It just manages to run python --version.

Also the docs don't need to be built for every version.

Use func.__qualname__ instead of func.__name__

By using func.__qualname__ instead of func.__name__ we can apply the decorator to methods and get unique keys that include the class name.

For example, if we have two classes with a method of the same name and we apply @cache_memoize to both of them, the same cache key will be used unless we explicitly set prefix. IMHO doing as suggested is a safer default.

Just noticed that for Python 2 this would require adding some dependency like https://pypi.org/project/qualname/
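The difference is easy to see with two plain classes:

```python
class A:
    def get_data(self):
        return "a"

class B:
    def get_data(self):
        return "b"

# __name__ collides across classes; __qualname__ includes the class name.
print(A.get_data.__name__)      # 'get_data'
print(B.get_data.__name__)      # 'get_data'
print(A.get_data.__qualname__)  # 'A.get_data'
print(B.get_data.__qualname__)  # 'B.get_data'
```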

Add a license

Hey there,

I would like to use your package in a project but I noticed you don't have a license file.
would you consider adding a license to your repo to make it clear what the license terms are?

MIT is an often suggested choice if you aren't sure. :-)
See: https://choosealicense.com/

Thanks!

Automatically (with help) invalidate cache when code changes

Sometimes a change in the code should trigger invalidation of the cache. The idea is to add an optional version parameter to cache_memoize, defaulting to None. The version, if not None, would be added to the cache key.

We could use it like this:

@cache_memoize(60)
def f(...):

Then we fix a bug in f and write:

@cache_memoize(60, version=1)

Another bug fixed (so many 🐛 🪲):

@cache_memoize(60, version=2) 

So, when the fixes are deployed the cache will be invalidated, instead of returning a wrong cached value.

Clearing the cache from a different process

I use Celery to do something which takes a long time, and this function changes the original model, so I must clear the original cache when the function is done. But I found that calling the cached method with _refresh=True in the Celery process doesn't clear the cache for the main process. How do I clear this cache?
