geewee / django-auto-prefetching Goto Github PK

View Code? Open in Web Editor NEW

226.0 226.0 18.0 420 KB

Automatic prefetching for Django

License: MIT License

Python 100.00%

django-auto-prefetching's People

Contributors

Stargazers

Watchers

Forkers

lexaf pmoorani tetsumote doorlist oyeyipo sousapedro11 kmcswiney landcaster eo2875 paulzhousz hemache chenrb siteshen bopo sgarcialaguna-mms ranjitc jaspersui orcarex

django-auto-prefetching's Issues

Additional unnecessary prefetching occurs in some edge cases

First of all thanks for this awesome package.

I noticed an issue whereby if your serializer does not make use of any Foreign Key relations in the underlying Django model, and those relations are not nullable, they will ALL be prefetched. This could cause a performance issue for people who want to use a serializer that does not render those relations as they will be prefetched even though they are never going to be serialized.

The root cause appears to be the behaviour specified by the Django docs:

There may be some situations where you wish to call select_related() with a lot of related objects, or where you don’t know all of the relations. In these cases it is possible to call select_related() with no arguments. This will follow all non-null foreign keys it can find - nullable foreign keys must be specified. This is not recommended in most cases as it is likely to make the underlying query more complex, and return more data, than is actually needed.

For example if we had this set of models and serializer:

class MyRelation(models.Model):
  pass

class MyModel(models.Model):
  textfield: models.TextField()
  relation = models.ForeignKey(MyRelation, on_delete=models.CASCADE)

class MySerializer(serializers.ModelSerializer):
  class Meta:
    model = MyModel
    fields = ('textfield',)

The link from MyModel to MyRelation is not specified for use in the serializer, but this package will prefetch it anyway. The reason is because in this section of code:

def prefetch(queryset, serializer: Type[ModelSerializer]):
    select_related, prefetch_related = _prefetch(serializer)
    try:
        return queryset.select_related(*select_related).prefetch_related(
            *prefetch_related
        )

In this scenario the select_related evaluates to an empty set, and as a result, queryset.select_related thinks you want it to automatically prefetch any non-nullable relationships.

I was going to submit a fix for this like so:

def prefetch(queryset, serializer: Type[ModelSerializer]):
    select_related, prefetch_related = _prefetch(serializer)
    try:        
        if select_related:
            queryset = queryset.select_related(*select_related)
        if prefetch_related:
            queryset = queryset.prefetch_related(*prefetch_related)
        return queryset

After re-running the tests, two tests fail:

FAIL: test_reverse_foreign_key_lookups (test_project.tests.tests.TestOneToMany)
AssertionError: 2 != 1 : 2 queries executed, 1 expected
Captured queries were:
1. SELECT "test_project_childb"."id", "test_project_childb"."childB_text", "test_project_childb"."parent_id" FROM "test_project_childb"
2. SELECT "test_project_toplevel"."id", "test_project_toplevel"."top_level_text" FROM "test_project_toplevel" WHERE "test_project_toplevel"."id" = 1 LIMIT 21

FAIL: test_it_prefetches_when_using_dotted_property_access (test_project.tests.tests.TestOneToMany)
AssertionError: 2 != 1 : 2 queries executed, 1 expected
Captured queries were:
1. SELECT "test_project_childb"."id", "test_project_childb"."childB_text", "test_project_childb"."parent_id" FROM "test_project_childb"
2. SELECT "test_project_toplevel"."id", "test_project_toplevel"."top_level_text" FROM "test_project_toplevel" WHERE "test_project_toplevel"."id" = 1 LIMIT 21

Both tests are related to the use of the ChildBSerializerWithDottedPropertyAccess. The failures are not directly caused by my change, but rather, they were previously passing for the wrong reason: the relations were only being prefetched by default after the package calls select_related with an empty set rather than with the name of the actual relation associated with that dotted property.

As a result I decided not to submit a PR and instead ask whether this was a known issue or intended behaviour. It seems like the dotted property access never actually worked as it was intended and the tests associated with this behaviour were really just passing by accident. But by fixing this the overzealous pre-fetching issue, we expose the issue with these dotted properties.

dotted notation to traversel in serializer gives AttributeError

dotted notation to traverse attributes are not working

user_platforms = UserPlatformSerializer(source='user.user_platforms', read_only=True, many=True)

AttributeError: Cannot find 'user.user_platforms' on Artist object, 'user.user_platforms__platform__logo' is an invalid parameter to prefetch_related()

function based view support

Hi, i want to ask you something. Related the issue on #18, does this works in function based view? like, using method_decorator or something like that?

Runtime error thrown if you forget to add many_true to a Serializer that needs it.

If you forget to add many=True to a serializer that has the reverse side of the ForeignKey we calculate the wrong prefetch_related fields and we get an error. We can't catch this error early because it's only thrown when the queryset is evaluated

RestQL integration/support

This looks really cool.

My team has been using and contributing to RestQL for several months and we built the majority of the implementation around auto eager loading/mapping definitions, from the view level.

We definitely considered trying to do auto eager in the way you have but certainly ran into the issue of SerializerMethodField and for that reason we opted not to implement that.

All that said, having the option to have this mixin potentially play well with django-restql may be advantageous for projects that don't have as many complex serializers, don't want to manage the mapping definition for those serializers (it's explicit but needs to be maintained by the team), but still want the option of GraphQL-like data fetching (and other niceties provided by RestQL).

I haven't used this library: it was listed on the most recent Django newsletter (congrats!). But it seems like these two projects have a lot in common (again, at this time I haven't tried out this project or looked at its source), and either combining or testing out integrating them could be very advantageous.

Curious for your thoughts!

BUG: get_serializer_class not supported

When using the AutoPrefetchViewSetMixin errors get thrown for fields which are not on the original serializer_class (?). We have a GenericViewSet which has a main model and actions which return different serializers (basically nested info). When using the Mixin, the following error appears:

b'{"message":"Invalid field name(s) given in select_related: \'gateway\', \'unit_id\', \'protocol\', \'type\', \'manufacturer\', \'conf\'. Choices are: address_id, type_id, parent_id, created_by, company_id","status":500,"error":true,"detail":""}'

Where the select_related fields are the actual fields of the serializer being used and the Choices are the fields on the main model/ serializer_class for this ViewSet.

FEATURE REQUEST: user can customize prefetch fields in ViewSet

Reference Implementation:

class AutoPrefetchViewSetMixin:
    auto_prefetch_excluded_fields = set()
    auto_prefetch_extra_select_fields = set()
    auto_prefetch_extra_prefetch_fields = set()

    def get_auto_prefetch_excluded_fields(self):
        return self.auto_prefetch_excluded_fields

    def get_auto_prefetch_extra_select_fields(self):
        return self.auto_prefetch_extra_select_fields

    def get_auto_prefetch_extra_prefetch_fields(self):
        return self.auto_prefetch_extra_prefetch_fields

    def get_queryset(self):
        serializer = self.get_serializer()
        qs = super().get_queryset()

        kwargs = {
            "excluded_fields": self.get_auto_prefetch_excluded_fields(),
            "extra_select_fiedls": self.get_auto_prefetch_extra_select_fields(),
            "extra_prefetch_fields": self.get_auto_prefetch_extra_prefetch_fields(),
        }
        return prefetch(qs, serializer, **kwargs)


def prefetch(
    queryset,
    serializer: Type[ModelSerializer],
    *,
    excluded_fields=set(),
    extra_select_fields=set(),
    extra_prefetch_fields=set(),
):
    select_related, prefetch_related = _prefetch(serializer)
    select_related = select_related + extra_select_fields - excluded_fields
    prefetch_related = prefetch_related + extra_prefetch_fields - excluded_fields
    try:
        if select_related:
            queryset = queryset.select_related(*select_related)
        if prefetch_related:
            queryset = queryset.prefetch_related(*prefetch_related)
        return queryset
    except FieldError as e:
        raise ValueError(
            f"Calculated wrong field in select_related. Do you have a nested serializer for a ForeignKey where "
            f"you've forgotten to specify many=True? Original error: {e}"
        )

Prefetch when returning nested object from POST

I've seen an issue where if a large nested object is POSTed to the server, the WritableNestedSerializer returns the instance it created from create() and then that object is fetched without select_related or prefetch_related applied to it.

Using AutoPrefetchViewSetMixin, the GET requests are all prefetched correctly but responses from POSTs are not. Is this something within scope for this project or is there some other kind of work around?

FieldError using HyperlinkedIdentityField

Exception:

FieldError at /api/web_content/engine_configs/
Invalid field name(s) given in select_related: '*'. Choices are: (none)

select_related is "*" in prefetch:

Workaround: ignore HyperlinkedIdentityField fields:

    for name, field_instance in fields:
        field_type_name = field_instance.__class__.__name__
        logger.debug(
            f'{" " * indentation} Field "{name}", type: {field_type_name}, src: "{field_instance.source}"'
        )

        # We potentially need to recurse deeper
        if isinstance(field_instance, (BaseSerializer, RelatedField, ManyRelatedField)) \
                and not isinstance(field_instance, HyperlinkedIdentityField):
            # ...

Serializer:

class EngineUpdateSerializer(HispaSerializerMixin, serializers.ModelSerializer):
    url = serializers.HyperlinkedIdentityField(view_name='web_contents:engineupdate-detail')
    config = SimpleEngineConfigSerializer()
    first_update = SimpleEngineUpdateSerializer()
    parent = SimpleEngineUpdateSerializer()

    class Meta:
        model = EngineUpdate

Add CircleCI Status Badge

https://circleci.com/docs/2.0/status-badges/

Duplicate LEFT Join

In my serializer I have 6 same FK fields:

field = CustomSerializer(
        source="field", read_only=True
    )
field = CustomSerializer(
    source="field2", read_only=True
)

And django_auto_prefetching.prefetch(queryset, self.get_serializer_class()) generate 2 LEFT JOIN, that cause:

django.db.utils.OperationalError: (1116, 'Too many tables; MariaDB can only use 61 tables in a join')

Can we prevent duplicate JOIN by default?

How to use your package?

I am having performance issues with drf. I feel like your library might help, but I can't figure out how to use it, or if it pertains. All the things I try result in no impact. Please see the stackoverflow question.

I'd appreciate any thoughts.

Why PrimaryKeyRelatedField is not in IGNORED_FIELD_TYPES?

I think PrimaryKeyRelatedField also doesn't need to apply prefetchin.
The code of to_representation in PrimaryKeyRelatedField is:

    def to_representation(self, value):
        if self.pk_field is not None:
            return self.pk_field.to_representation(value.pk)
        return value.pk

The pk_field field must be a simple type (not related, i think) and it is also accessible directly in the main query without applying a join.

I'm having big overloads because the PrimaryKeyRelatedField add a lot of unnecessary joins.

What do you think about it?? I can send a PR.
In case you don't want to add that change, maybe it would be good to make it configurable. By default it would stay as it is to avoid risking other projects that use the library

Thanks!

Opinion on djangorestframework-serializer-extensions

I see djangorestframework-serializer-extensions provides similar functionality with its extensions_auto_optimize = True (documented here). Are there any relevant differences to your knowledge?

https://github.com/evenicoulddoit/django-rest-framework-serializer-extensions

Shouldn't we check the field only if it's inherited from ModelSerializer instead of BaseSerializer?

Hi, thank you for releasing this amazing work, I found it's really a package with great potential.

I tried to adapt my current project to this package, and I found that we are currently checking the field_instance as below:

# __init__.py, Line 144

        # We potentially need to recurse deeper
        if isinstance(
                field_instance, (BaseSerializer, RelatedField, ManyRelatedField)
        ) and ...

IMHO, should the BaseSerializer be replaced with ModelSerializer? Because the serializers.Serializer is also inherited from BaseSerializer which is not the target we want to check if any select_related and prefetch_related fields should be included.

I know we can exclude the fields by specifying them with auto_prefetch_excluded_fields but I'm just curious about the use cases, for me, the serializers.Serializer is often used as the request body validator and it's also can be used to generate the API schema with drf-spectacular (it's also an awesome package that generates the OpenAPI schema automatically btw).

Overwriting `get_queryset` method

Hi,

I have a problem using your mixin when my viewset is having their own custom get_queryset method. Its simple does not work... Short example:

class ArtistViewSet(AutoPrefetchViewSetMixin, viewsets.ModelViewSet):
  def get_queryset(self):
    return Artist.objects.filter(owner=self.request.user)

in this case, your mixin method get_queryset is not invoked at all...

Is there a way around to make this working?
Thanks

Allow for generic views

Does this only work with modelviewsets? or can it work with Generic views
https://www.django-rest-framework.org/api-guide/generic-views/