es-delete-by-query's Issues
Remove connection from function
It's better to take the es connection as an argument rather than connecting inside the function (if we use it as a library, we probably already have a connection in our code).
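A minimal sketch of what this could look like, assuming the caller passes in an already-constructed elasticsearch-py client and that the client accepts the 1.x to 6.x style keyword arguments (`doc_type`, `body`, `scroll`). The per-document `delete` call and the `page_size` parameter are illustrative assumptions, not the project's actual implementation:

```python
def delete_by_query(es_connection, index, doc_type, query, scroll="2m", page_size=500):
    """Delete every document matching `query` using a client owned by the caller.

    `es_connection` is any object exposing search/scroll/delete the way an
    elasticsearch-py client does; nothing here opens a connection itself.
    """
    page = es_connection.search(
        index=index, doc_type=doc_type, body=query, scroll=scroll, size=page_size
    )
    deleted = 0
    while page["hits"]["hits"]:
        for hit in page["hits"]["hits"]:
            # Assumption: one delete request per hit keeps the sketch short;
            # a bulk request would be the usual choice for large result sets.
            es_connection.delete(index=hit["_index"], doc_type=hit["_type"], id=hit["_id"])
            deleted += 1
        page = es_connection.scroll(scroll_id=page["_scroll_id"], scroll=scroll)
    return deleted
```

Because the client is injected, the command-line script can keep building its own connection while library users reuse theirs, and the function can be exercised in tests with a stand-in object.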
Implement docopt instead of argparse
Docopt is much more convenient than plain argparse and, in my opinion, worth implementing.
Compatibility with Elasticsearch v1.x server
In order to make this work with Elasticsearch 1.4.2, I had to make a few edits. Just documenting here for other onlookers or to add this information to the readme in a PR.
Thanks for this great project! ❤️
Sniff options not supported
The sniff options defined at cli_delete_by_query.py#L75-L80 aren't supported in Elasticsearch 1.x.
Related issue: elastic/elasticsearch-py#358
Solution: Comment out those sniff options.
Error:
Traceback (most recent call last):
File "cli_delete_by_query.py", line 80, in <module>
sniffer_timeout=60
File "/home/deployer/es-delete-by-query/venv/lib/python3.4/site-packages/elasticsearch/client/__init__.py", line 188, in __init__
self.transport = transport_class(_normalize_hosts(hosts), **kwargs)
File "/home/deployer/es-delete-by-query/venv/lib/python3.4/site-packages/elasticsearch/transport.py", line 122, in __init__
self.sniff_hosts(True)
File "/home/deployer/es-delete-by-query/venv/lib/python3.4/site-packages/elasticsearch/transport.py", line 237, in sniff_hosts
hosts = list(filter(None, (self._get_host_info(n) for n in node_info)))
File "/home/deployer/es-delete-by-query/venv/lib/python3.4/site-packages/elasticsearch/transport.py", line 237, in <genexpr>
hosts = list(filter(None, (self._get_host_info(n) for n in node_info)))
File "/home/deployer/es-delete-by-query/venv/lib/python3.4/site-packages/elasticsearch/transport.py", line 221, in _get_host_info
host['port'] = int(host['port'])
ValueError: invalid literal for int() with base 10: '9200]'
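Rather than commenting the options out by hand, the workaround above could be made switchable. The sketch below is hypothetical: `build_client_kwargs` and `enable_sniffing` are illustrative names, and only `sniffer_timeout=60` is confirmed by the traceback. `sniff_on_start` and `sniff_on_connection_fail` are real elasticsearch-py options, but it is an assumption that they are the ones set in the script.

```python
def build_client_kwargs(hosts, enable_sniffing=False):
    """Build keyword arguments for the Elasticsearch client constructor.

    Sniffing is off by default so the client never tries to parse the node
    addresses returned by an Elasticsearch 1.x server.
    """
    kwargs = {"hosts": hosts}
    if enable_sniffing:
        # Assumed option set; only sniffer_timeout=60 appears in the traceback.
        kwargs.update(
            sniff_on_start=True,
            sniff_on_connection_fail=True,
            sniffer_timeout=60,
        )
    return kwargs


# Usage (requires the elasticsearch package):
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch(**build_client_kwargs(["localhost:9200"]))
```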
Incompatible scroll/scan search APIs
Related issue: I was clued in by this comment, which mentions the equivalent scroll/scan API call for an Elasticsearch v1.x server: elastic/elasticsearch-net#326 (comment)
API docs for reference: https://www.elastic.co/guide/en/elasticsearch/reference/1.4/search-request-scroll.html
Solution: Use v1.x of the elasticsearch Python library: pip install elasticsearch==1.9.0
https://pypi.org/project/elasticsearch/1.9.0/ specifies "For Elasticsearch 1.0 and later, use the major version 1 (1.x.y) of the library."
For reference, I was able to use elasticsearch==6.2.0, which is what the current requirements.txt specifies, against an Elasticsearch 2.3.1 server without issues.
Error:
update_validated - ERROR - Elasticsearch error: ElasticsearchIllegalArgumentException[Failed to decode scrollId]; nested: IOException[Bad Base64 input character decimal 123 in array position 0];
Traceback (most recent call last):
File "cli_delete_by_query.py", line 99, in <module>
query=delete_query
File "/home/deployer/es-delete-by-query/delete_by_query.py", line 92, in delete_by_query
raise ex
File "/home/deployer/es-delete-by-query/delete_by_query.py", line 68, in delete_by_query
page = es.scroll(scroll_id=sid, scroll='2m')
File "/home/deployer/es-delete-by-query/venv/lib/python3.4/site-packages/elasticsearch/client/utils.py", line 76, in _wrapped
return func(*args, params=params, **kwargs)
File "/home/deployer/es-delete-by-query/venv/lib/python3.4/site-packages/elasticsearch/client/__init__.py", line 1011, in scroll
params=params, body=body)
File "/home/deployer/es-delete-by-query/venv/lib/python3.4/site-packages/elasticsearch/transport.py", line 314, in perform_request
status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)
File "/home/deployer/es-delete-by-query/venv/lib/python3.4/site-packages/elasticsearch/connection/http_urllib3.py", line 180, in perform_request
self._raise_error(response.status, raw_data)
File "/home/deployer/es-delete-by-query/venv/lib/python3.4/site-packages/elasticsearch/connection/base.py", line 125, in _raise_error
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
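The version-matching advice quoted from PyPI can be turned into a small helper that derives a pip requirement from the server version. This is only a sketch: `client_requirement` is a hypothetical name, and matching major versions is the library's general recommendation rather than a hard rule (as noted above, a 6.2.0 client worked against a 2.3.1 server).

```python
def client_requirement(server_version):
    """Return a pip requirement pinning the client to the server's major version."""
    major = int(server_version.split(".")[0])
    return "elasticsearch>={0}.0.0,<{1}.0.0".format(major, major + 1)


# For the 1.4.2 server in this issue the pin allows any 1.x.y client,
# including the 1.9.0 release used in the solution above.
requirement = client_requirement("1.4.2")
```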
Change structure and names
The function name should be delete_by_query or delete_docs_by_query.
Move the function into a separate library file so that it can be used both as a command-line utility (current code) and as a library called programmatically:
from delete_by_query import delete_by_query
delete_by_query(es_connection, index, doc_type, query)
Replace prints with logging
It is better to use a logger (http://docs.python-guide.org/en/latest/writing/logging/) instead of print statements.
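As a rough sketch of the change, a module-level logger from the standard `logging` package could replace the print calls. The logger name, the `configure_logging` and `report_page` helpers, and the message text are illustrative assumptions. The format string is chosen to resemble the "name - LEVEL - message" error line quoted in the compatibility issue above.

```python
import logging

# Module-level logger; library users can attach their own handlers to it.
logger = logging.getLogger("delete_by_query")


def configure_logging(level=logging.INFO):
    """Attach a console handler; intended for the command-line entry point only."""
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(name)s - %(levelname)s - %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(level)


def report_page(deleted_so_far, page_size):
    """Log progress where the script previously would have printed it."""
    logger.info("Deleted %d documents (page of %d)", deleted_so_far, page_size)
```

Keeping handler setup in the command-line script, rather than in the library module, lets programmatic callers decide where log output goes.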