Comments (7)

pquentin avatar pquentin commented on June 13, 2024

Hello! Thanks for your report and sorry that upgrading gives you an error. I'm actually not sure why. Here's how I could help you:

from elasticsearch-py.

qfliu avatar qfliu commented on June 13, 2024

Thanks for the reply. I was upgrading the es python client so that I can inject headers at runtime.

"How easy is it for you to try other versions of the client between 7.1.0 and 7.17.17? In particular, 7.9, 7.15 and 7.16 brought big changes."

  • I can definitely try other versions. I previously tried es 7.8.0 with es-py 7.8.0 and got the same error.

"7.1.0 did not have an async client, how were you using AsyncElasticsearch? Maybe with https://github.com/elastic/elasticsearch-py-async?"

  • Correct, I was using elasticsearch-py-async with 7.1.0. I see it is now deprecated and that async support has become native in elasticsearch-py.

"Do you have a code sample to reproduce this issue?"
We wrap the es functions with some of our own logic, and here is our test code. We also use thrift calls. Basically, it issues a simple search request, which passes fine, and then uses the scroll_id from that first call to fetch the next part.

response_1 = await self._call_service(
    'scrollVideos', T.ScrollVideosRequest(
        metadata=self.request_metadata,
        query=T.SearchVideosQuery(
            exclude_filters=[T.SearchVideosFilter(platforms=['youtube'])]
        ),
        scroll_size=999,
    )
)

self.assertEqual(response_1.total, 5)
self.assertIsInstance(response_1.scroll_id, str)
self.assertEqual(len(response_1.videos), 5)

response_2 = await self._call_service(
    'scrollVideos', T.ScrollVideosRequest(
        metadata=self.request_metadata,
        scroll_id=response_1.scroll_id,
    )
)

"Can you please enable logging and report back? This will tell us exactly what the client is sending to the Elasticsearch server. https://elasticsearch-py.readthedocs.io/en/v7.17.10/index.html?highlight=logging#logging"

  • I will try to look into this. I was working on getting more logging out of es docker.

pquentin avatar pquentin commented on June 13, 2024

I tried to reproduce, without success.

import asyncio
import sys
from datetime import datetime

import elasticsearch
import elasticsearch.helpers

es = elasticsearch.Elasticsearch("http://localhost:9200")
print(es.info())
print(sys.version_info)
print(elasticsearch.__version__)


def gendata():
    for i in range(10_000):
        yield {
            "_index": "test-index",
            "author": "kimchy",
            "text": "Elasticsearch: cool. bonsai cool.",
            "timestamp": datetime.now(),
            "id": i,
        }


elasticsearch.helpers.bulk(es, gendata())


async def scroll():
    async_es = elasticsearch.AsyncElasticsearch()

    resp = await async_es.search(
        index="test-index",
        query={"match_all": {}},
        scroll="1m",
        size=100,
    )
    assert len(resp["hits"]["hits"]) == 100

    resp2 = await async_es.scroll(
        scroll_id=resp["_scroll_id"],
        scroll="1m",
    )
    assert len(resp2["hits"]["hits"]) == 100

    await async_es.close()


asyncio.run(scroll())

pquentin avatar pquentin commented on June 13, 2024

By the way, I would suggest if possible to use the scan helper or the point in time API, which is recommended over scrolling but does not have a helper in the client yet.
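
For illustration, a scan-style loop follows roughly this pattern: run the first search with a scroll timeout, keep requesting pages until one comes back empty, then clear the scroll. The sketch below is a minimal version of that pattern, not the helper's actual implementation. `FakeScrollClient` and `scan_all` are hypothetical names, and the fake client is an in-memory stand-in so the example runs without a server.

```python
import asyncio


class FakeScrollClient:
    """Hypothetical in-memory stand-in for an async client (not part of elasticsearch-py)."""

    def __init__(self, docs, page_size):
        self._docs = docs
        self._page_size = page_size
        self._offset = 0
        self.cleared = []  # scroll IDs passed to clear_scroll

    def _page(self):
        # Return the next slice of documents in a search-like response shape.
        hits = self._docs[self._offset:self._offset + self._page_size]
        self._offset += len(hits)
        return {"_scroll_id": "fake-scroll-id", "hits": {"hits": hits}}

    async def search(self, **kwargs):
        self._offset = 0
        return self._page()

    async def scroll(self, scroll_id, scroll):
        return self._page()

    async def clear_scroll(self, body=None, scroll_id=None):
        self.cleared.append(scroll_id)


async def scan_all(client, **search_kwargs):
    """Collect every hit: the first page comes from search, later pages from scroll."""
    resp = await client.search(scroll="1m", **search_kwargs)
    scroll_id = resp["_scroll_id"]
    collected = []
    try:
        while resp["hits"]["hits"]:
            collected.extend(resp["hits"]["hits"])
            resp = await client.scroll(scroll_id=scroll_id, scroll="1m")
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Pass scroll_id by keyword so it cannot be bound to another parameter.
        await client.clear_scroll(scroll_id=scroll_id)
    return collected


client = FakeScrollClient(docs=list(range(250)), page_size=100)
results = asyncio.run(scan_all(client, index="test-index"))
print(len(results))
```

With a real client, the same loop would apply to `AsyncElasticsearch`, though the helper takes care of this bookkeeping for you.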

qfliu avatar qfliu commented on June 13, 2024

"By the way, I would suggest if possible to use the scan helper or the point in time API, which is recommended over scrolling but does not have a helper in the client yet."

Thanks for the suggestion. Just to clarify:

  • Scan does not return any results on the first search call, while a scroll request does. Is this correct?
  • What does it mean that it "does not have a helper in the client yet"?

qfliu avatar qfliu commented on June 13, 2024

Was the API changed so that a scroll request with a scroll_id that returns 0 hits now raises a 400 illegal_argument_exception?

qfliu avatar qfliu commented on June 13, 2024

Finally, I found the error.

It was actually not the scroll request causing it but the clear_scroll request. My bad :p

We were on es 7.0.1 before and are now trying to move to 7.17.9. We were calling clear_scroll as:

await es_client.clear_scroll(es_response['_scroll_id'])

This breaks because the parameter order changed between the two versions, so a scroll ID passed positionally is now bound to body instead of scroll_id:
v7.0.1:

def clear_scroll(self, scroll_id=None, body=None, params=None):

v7.17.9:

def clear_scroll(self, body=None, scroll_id=None, params=None, headers=None):

From looking at the GitHub repo, the change was first introduced in v7.5.0. It may have happened earlier, but I cannot find the tags for 7.2.x to 7.4.x.
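
To make the effect concrete, here is a minimal sketch that copies only the two signatures quoted above into plain stand-in functions (without `self`) and reports how the arguments get bound. The function names and the `sid` value are hypothetical, and nothing here calls the real client.

```python
# Stand-ins that mirror only the quoted signatures; they are not the real
# client methods and simply report which parameter received the value.
def clear_scroll_v7_0_1(scroll_id=None, body=None, params=None):
    return {"scroll_id": scroll_id, "body": body}


def clear_scroll_v7_17_9(body=None, scroll_id=None, params=None, headers=None):
    return {"scroll_id": scroll_id, "body": body}


sid = "example-scroll-id"  # hypothetical scroll ID

# Old signature: the first positional argument is scroll_id.
old_positional = clear_scroll_v7_0_1(sid)

# New signature: the same positional call binds the value to body instead.
new_positional = clear_scroll_v7_17_9(sid)

# Passing by keyword binds correctly under either signature.
new_keyword = clear_scroll_v7_17_9(scroll_id=sid)

print(old_positional)
print(new_positional)
print(new_keyword)
```

Under these signatures, calling `await es_client.clear_scroll(scroll_id=es_response['_scroll_id'])` with a keyword argument does not depend on the parameter order.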
