elastic / curator
Curator: Tending your Elasticsearch indices
License: Other
I run an ELK stack and keep the last 30 days' worth of indices open for searching with Kibana. I automatically close any indices older than 30 days using Curator, which works fine.
However, I'm looking to build in automation for keeping on top of disk space use and I've found that when I run
/usr/local/bin/curator --host localhost --prefix logstash- -C space -g 200
it seems to only take into account the currently open indices, which implies it will run into issues as I approach the 200G limit I've set, since the open indices alone will never be that large.
Is this intended behaviour, or does something need altering so that closed indices are also accounted for when running a space-based cleanup?
Quick Question:
Does this script have to be run nightly (since Logstash creates a new index every day), or do I just run it once?
For instance, here is the command to run:
curator.py --host my-elasticsearch -c 30 -d 90
Thanks
Some of my indices are named foo-%Y.%m
and curator doesn't work on them.
Some logging use cases benefit from not having to compute what is a "daily" or "weekly" or whatever window of indices.
We could have curator, with its knowledge of index ages, allow users to manage aliases pointing at defined ranges of time. For example, we could configure a 'weekly' alias pointing always to the last 7 days of indices or a 'yesterday' alias pointing at, well, yesterday.
Not sure how much value this provides because it's not something I hear requested frequently.
Curator currently looks for 1.0.0 as a max version. Do we want to have to increase this for every minor release?
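One alternative is to gate on a half-open major-version range rather than an exact maximum release, so minor releases never require a code change. A minimal sketch (`version_ok` is a hypothetical helper, not curator's actual code):

```python
# Hypothetical sketch: accept any 1.x release by comparing version
# tuples against a half-open [min, max) range, instead of pinning a
# single maximum release like 1.0.0.
def version_ok(version_tuple, min_version=(1, 0, 0), max_version=(2, 0, 0)):
    return min_version <= version_tuple < max_version
```

With this, only a new major version of Elasticsearch would force an update to the check.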
It would be quite convenient if one could simply pip install curator.
If there's interest I could work up a pull request to do that.
(Note however that https://pypi.python.org/pypi/curator already exists and is an image gallery tool)
I've created a pull request for a trivial logic fix for dry-run when deleting based on space usage.
With snapshots being part of the new features of ES 1.0 it makes sense to have Curator be able to capture snapshots and save to a designated target (S3, for example). Bonus points if we get it to delete the old index after a successful snapshot/backup.
I've installed curator 0.6.1 via pip onto CentOS 6, and I'm seeing a problem where the time_unit check for hours is incorrect in find_expired_indices: it expects 'hourly' while the argument passed to the script is 'hours'.
required_parts = 4 if time_unit == 'hourly' else 3
It's correct in GitHub, and it would be nice if you could update the script in pip as well.
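The mismatch described above boils down to a one-word fix; a minimal sketch (`required_parts` as a function is illustrative, the script uses an inline expression):

```python
# Hypothetical sketch of the fix: the CLI passes time_unit='hours',
# so the parts check should compare against 'hours', not 'hourly'.
def required_parts(time_unit):
    return 4 if time_unit == 'hours' else 3
```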
Update CHANGELOG and do a little logging tweak.
We should be able to act on the previous day's index as soon as the day rolls over. The current settings disallow this:
$ date
Tue Apr 15 11:12:10 CDT 2014
$ ./curator.py -d 1 --dry-run
…
2014-04-15T11:12:14.167 INFO main:369 Deleting indices older than 1 days...
2014-04-15T11:12:14.172 INFO index_loop:294 Would have attempted deleting index logstash-2014.04.13 because it is 1 day, 0:00:00 older than the calculated cutoff.
2014-04-15T11:12:14.172 INFO find_expired_indices:212 logstash-2014.04.14 is 0:00:00 above the cutoff.
2014-04-15T11:12:14.173 INFO find_expired_indices:212 logstash-2014.04.15 is 1 day, 0:00:00 above the cutoff.
…
logstash-2014.04.14 is 0:00:00 above the cutoff
also suggests that it should be able to be acted on.
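The request could be met with an inclusive comparison; a minimal sketch, assuming the cutoff is taken from midnight of the current day (`is_actionable` is a hypothetical helper, not curator's actual code):

```python
from datetime import datetime, timedelta

# Hypothetical sketch: treat an index dated exactly at the cutoff as
# actionable (compare inclusively), so logstash-2014.04.14 qualifies
# for -d 1 as soon as 2014-04-15 begins.
def is_actionable(index_date, older_than_days, now):
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    cutoff = midnight - timedelta(days=older_than_days)
    return index_date <= cutoff
```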
I installed the latest version (1.0.0) with pip. No matter which curator command I run, I get these errors:
Traceback (most recent call last):
File "/usr/local/lib/python2.6/dist-packages/logging/__init__.py", line 723, in emit
msg = self.format(record)
File "/usr/local/lib/python2.6/dist-packages/logging/__init__.py", line 609, in format
return fmt.format(record)
File "/usr/local/lib/python2.6/dist-packages/logging/__init__.py", line 402, in format
s = self._fmt % record.__dict__
KeyError: 'funcName'
Ubuntu 10.04 LTS, Python 2.6.5.
Thank you.
curator.py
Line 250 reads:
object_list = get_snaplist(client, repository)
But should read:
object_list = get_snaplist(client, repository, prefix=prefix)
http://www.elasticsearch.org/guide/reference/api/admin-indices-open-close/
Closed indexes incur basically no run-time resource usage and reads/writes to them should fail.
Closing an index before deleting it allows you to make it unsearchable before you finally purge it. This can be useful in panicked situations where you might need that data you thought you deleted.
Scenario: Want 90 days of logs.
This gives you a 30 day window to go "oops!" and recover anything closed if you need it but otherwise only consumes disk space (but not other resources) until deleted.
Hello guys! Thanks for your work.
I was thinking about a new feature.
If you need to re-import some data with Logstash because something was lost, you may want to remove the old data first and then re-import it. As a feature, could you add a source/origin date from which the -d interval is applied, instead of always using datetime.utcnow()? It could be a new command-line parameter.
Thanks for reading.
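A minimal sketch of the requested behavior, assuming a hypothetical reference-date parameter (`cutoff` is illustrative, not curator's actual code):

```python
from datetime import datetime, timedelta

# Hypothetical sketch: compute the expiry cutoff from a user-supplied
# reference date rather than always from datetime.utcnow().
def cutoff(older_than_days, reference=None):
    base = reference or datetime.utcnow()
    return base - timedelta(days=older_than_days)
```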
I was planning to schedule deletion of old indices. Unfortunately, my index names do not include any separator (e.g. myindex_20140624), and an error is raised by the split call at line 288:
parts = unprefixed_object_name.split(separator)
Would it be possible to parse the date part of an index using a regexp instead of using a split?
Thanks in advance!
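A minimal sketch of such a regex-based approach (`index_date_parts` and the default pattern are illustrative assumptions, not curator's API):

```python
import re

# Hypothetical sketch: pull the date out of an index name that has no
# separator (e.g. myindex_20140624) using a regex instead of str.split().
def index_date_parts(name, pattern=r'(\d{4})(\d{2})(\d{2})$'):
    match = re.search(pattern, name)
    return tuple(int(g) for g in match.groups()) if match else None
```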
Hello,
I use Elasticsearch 1.0.0.RC2 on Debian Wheezy from the deb package available on the Elasticsearch site.
When I use curator I get the following error:
xlogerais@virt-cha-xman ~ $ curator --host localhost -d5 -c2
2014-02-28T23:59:44.865 INFO main:332 Job starting...
2014-02-28T23:59:44.866 INFO _new_conn:257 Starting new HTTP connection (1): localhost
2014-02-28T23:59:44.871 INFO log_request_success:49 GET http://localhost:9200/ [status:200 request:0.005s]
Traceback (most recent call last):
File "/usr/local/bin/curator", line 9, in <module>
load_entry_point('elasticsearch-curator==0.6.2', 'console_scripts', 'curator')()
File "/usr/local/lib/python2.7/dist-packages/curator/curator.py", line 344, in main
version_number = get_version(client)
File "/usr/local/lib/python2.7/dist-packages/curator/curator.py", line 148, in get_version
return tuple(map(int, version.split('.')))
ValueError: invalid literal for int() with base 10: 'RC2'
It seems that the naming Elasticsearch uses for its release candidates causes problems for curator.
Thanks.
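A tolerant parser could keep only the leading numeric components of the version string; a minimal sketch (`parse_version` is hypothetical, not curator's actual get_version):

```python
# Hypothetical sketch: parse a version string by consuming dotted
# components until the first non-numeric one, so '1.0.0.RC2' yields
# (1, 0, 0) instead of raising ValueError.
def parse_version(version_string):
    parts = []
    for piece in version_string.split('.'):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)
```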
It should be possible to go in and delete individual documents by tag - or the non-existence of a tag. This would allow you to delete unimportant documents in your collection.
Could also be useful to allow the movement of "important" docs to a different index.
Hi there,
this is more of an inclusion request than an issue.
Here are my modifications to include ssl+url_prefix in the Elasticsearch() call; I also added optparse backward compatibility for old (but still used) systems with Python 2.6.
Thanks again for this useful script :)
help message looks like this now :
$ ./curator.py --help
Usage: curator.py [options]
Curator for Elasticsearch indices. Can delete (by space or time), close,
disable bloom filters and optimize (forceMerge) your indices.
Options:
--version show program's version number and exit
-h, --help show this help message and exit
--host=HOST Elasticsearch host. Default: localhost
--url_prefix=URL_PREFIX
Elasticsearch http url prefix. Default: none
--port=PORT Elasticsearch port. Default: 9200
--ssl Connect to Elasticsearch through SSL. Default: false
-t TIMEOUT, --timeout=TIMEOUT
Elasticsearch timeout. Default: 30
-p PREFIX, --prefix=PREFIX
Prefix for the indices. Indices that do not have this
prefix are skipped. Default: logstash-
-s SEPARATOR, --separator=SEPARATOR
Time unit separator. Default: .
-C CURATION_STYLE, --curation-style=CURATION_STYLE
Curate indices by [time, space] Default: time
-T TIME_UNIT, --time-unit=TIME_UNIT
Unit of time to reckon by: [days, hours] Default: days
-d DELETE_OLDER, --delete=DELETE_OLDER
Delete indices older than n TIME_UNITs.
-c CLOSE_OLDER, --close=CLOSE_OLDER
Close indices older than n TIME_UNITs.
-b BLOOM_OLDER, --bloom=BLOOM_OLDER
Disable bloom filter for indices older than n
TIME_UNITs.
-g DISK_SPACE, --disk-space=DISK_SPACE
Delete indices beyond n GIGABYTES.
--max_num_segments=MAX_NUM_SEGMENTS
Maximum number of segments, post-optimize. Default: 2
-o OPTIMIZE, --optimize=OPTIMIZE
Optimize (Lucene forceMerge) indices older than n
TIME_UNITs. Must increase timeout to stay connected
throughout optimize operation, recommend no less than
3600.
-n, --dry-run If true, does not perform any changes to the
Elasticsearch indices.
-D, --debug Debug mode
-l LOG_FILE, --logfile=LOG_FILE
log file
Examples highlighting the new features:
$ curator.py --host foo.bar --port 443 --ssl --url_prefix "backend" --prefix "windows-" --delete 31 -n
2014-01-27T13:22:42.000 INFO main:333 Job starting...
2014-01-27T13:22:42.000 INFO main:352 Deleting indices older than 31 days...
2014-01-27T13:22:42.001 INFO _new_conn:635 Starting new HTTPS connection (1): foo.bar
2014-01-27T13:22:42.025 INFO log_request_success:49 GET http://foo.bar:443/backend/_settings [status:200 request:0.024s]
[...]
2014-01-27T13:22:42.122 INFO find_expired_indices:196 windows-2014.01.27 is 32 days, 0:00:00 above the cutoff.
2014-01-27T13:22:42.123 INFO index_loop:309 DELETE index operations completed.
2014-01-27T13:22:42.123 INFO main:372 Done in 0:00:00.124427.
Please find the following patch,
Cheers!
diff --git a/curator.py b/curator.py
index 11c8272..0dd3ff8 100755
--- a/curator.py
+++ b/curator.py
@@ -34,7 +34,6 @@
import sys
import time
import logging
-import argparse
from datetime import timedelta, datetime
import elasticsearch
@@ -55,12 +54,21 @@ logger = logging.getLogger(__name__)
def make_parser():
""" Creates an ArgumentParser to parse the command line options. """
- parser = argparse.ArgumentParser(description='Curator for Elasticsearch indices. Can delete (by space or time), close, disable bloom filters and optimize (forceMerge) your indices.')
-
- parser.add_argument('-v', '--version', action='version', version='%(prog)s '+__version__)
-
+ help_desc = 'Curator for Elasticsearch indices. Can delete (by space or time), close, disable bloom filters and optimize (forceMerge) your indices.'
+ try:
+ import argparse
+ parser = argparse.ArgumentParser(description=help_desc)
+ parser.add_argument('-v', '--version', action='version', version='%(prog)s '+__version__)
+ except ImportError:
+ import optparse
+ parser = optparse.OptionParser(description=help_desc, version='%prog '+ __version__)
+ parser.parse_args_orig = parser.parse_args
+ parser.parse_args = lambda: parser.parse_args_orig()[0]
+ parser.add_argument = parser.add_option
parser.add_argument('--host', help='Elasticsearch host. Default: localhost', default='localhost')
+ parser.add_argument('--url_prefix', help='Elasticsearch http url prefix. Default: none', default='')
parser.add_argument('--port', help='Elasticsearch port. Default: 9200', default=9200, type=int)
+ parser.add_argument('--ssl', help='Connect to Elasticsearch through SSL. Default: false', action='store_true', default=False)
parser.add_argument('-t', '--timeout', help='Elasticsearch timeout. Default: 30', default=30, type=int)
parser.add_argument('-p', '--prefix', help='Prefix for the indices. Indices that do not have this prefix are skipped. Default: logstash-', default='logstash-')
@@ -332,8 +340,7 @@ def main():
logger.error('Malformed arguments: {0}'.format(';'.join(check_args)))
parser.print_help()
return
-
- client = elasticsearch.Elasticsearch('{0}:{1}'.format(arguments.host, arguments.port), timeout=arguments.timeout)
+ client = elasticsearch.Elasticsearch(host=arguments.host, port=arguments.port, url_prefix=arguments.url_prefix, timeout=arguments.timeout, use_ssl=arguments.ssl)
# Delete by space first
if arguments.disk_space:
I occasionally run into an issue where something creates a bunch of future indices (the time being set incorrectly on a host is the typical reason).
Since I trust the timestamps I receive (if I don't trust them, reindexing becomes impossible), I'd like to be able to delete the future-dated indices with curator. Doing it by hand is not much fun.
Hi - it would be really cool if curator were able to pick which indices to perform its magic on based on a partial index name.
for example
logstash-YYYY.MM.dd - keep these for 30 days
custom-app-YYYY.MM.dd - keep these for only 5 days
Cheers
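A minimal sketch of per-prefix retention, assuming a hypothetical mapping from index-name prefix to days kept (`RETENTION_DAYS` and `days_to_keep` are illustrative, not curator's API):

```python
# Hypothetical sketch: per-prefix retention periods, matching the
# longest configured prefix first.
RETENTION_DAYS = {'logstash-': 30, 'custom-app-': 5}

def days_to_keep(index_name, default=30):
    for prefix in sorted(RETENTION_DAYS, key=len, reverse=True):
        if index_name.startswith(prefix):
            return RETENTION_DAYS[prefix]
    return default
```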
I installed curator with sudo pip install elasticsearch-curator on an AWS EC2 instance. After the installation completed, I ran "curator --help" and got the error below.
Is anything wrong with my setup?
Traceback (most recent call last):
File "/usr/bin/curator", line 5, in <module>
from pkg_resources import load_entry_point
File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 2655, in <module>
working_set.require(__requires__)
File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 648, in require
needed = self.resolve(parse_requirements(requirements))
File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 546, in resolve
raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: elasticsearch>=1.0.0,<2.0.0
Using python 3.4
and curator-script.py 1.1.2
on Windows Server 2012
Command:
curator show --show-snapshots --repository "Kibana_Repository"
Result:
Traceback (most recent call last):
File "C:\Python34\Scripts\curator-script.py", line 9, in <module>
load_entry_point('elasticsearch-curator==1.1.2', 'console_scripts', 'curator')()
File "C:\Python34\lib\site-packages\curator\curator.py", line 643, in main
stream=open(arguments.log_file, 'a') if arguments.log_file else sys.stderr)
FileNotFoundError: [Errno 2] No such file or directory: '/dev/null'
It would be nice to add a confirmation prompt before actually deleting indices: when the delete command is invoked, print the list of indices identified for deletion and let the end user confirm with Y/N before proceeding (just in case the older-than value was mistyped).
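A minimal sketch of such a prompt (`confirm_delete` is a hypothetical helper, not curator's actual code):

```python
# Hypothetical sketch: list the candidate indices and require an
# explicit 'y' before proceeding with deletion. The prompt function is
# injectable so the behavior can be tested without a real terminal.
def confirm_delete(indices, ask=input):
    print('The following indices will be DELETED:')
    for name in indices:
        print('  ' + name)
    return ask('Proceed? [y/N] ').strip().lower() == 'y'
```

Defaulting to "no" on anything but an explicit 'y' keeps a mistyped answer from deleting data.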
In the haste to fix the oversight in 1.2.1, this file did not get the change.
Hi,
I'm experiencing problems with curator running as a cron job.
Cron sends an email as soon as a job writes to stderr.
So far so good - but since curator seems to write its log to stderr when no logfile is defined, cron sends an email every time the job runs.
I think this might be a bug?
Here are the code parts where stderr is used:
https://github.com/elasticsearch/curator/search?q=stderr&ref=cmdform
kind regards
Dominik
Proposed roughly from a discussion in #logstash IRC -
Create a future index with dynamic settings, such as doing shard allocation based on some computed value, etc.
Hi all,
I'm trying to use curator with my ES cluster, but whichever options I use, it always says:
2014-04-24T12:00:56.665 ERROR find_expired_indices:201 Could not find a valid timestamp from the index: [name of my indices]
My indices are named like this: collectd-logstash-DD.MM.YYYY
I've tried these options:
curator --host localhost -p collectd-logstash- -b 2 -c 4
but no luck.
We don't have Elasticsearch 1.0. We have Elasticsearch 0.9x, and as per the README, it's not compatible:
Expected Elasticsearch version range > 1.0.0 < 2.0.0
ERROR: Incompatible with version 0.90.3 of Elasticsearch. Exiting.
I tried to do a:
pip install elasticsearch-curator==0.6
but it ends up installing curator 1.0
I also tried doing a:
yolk -V elasticsearch-curator
And it only reports back:
elasticsearch-curator 1.0.0
Is there a way to install curator 0.6 via pip?
(Sorry if this a pip newbie question...)
Curator would be a great place to implement time-based shard allocation for indices.
Example use case:
After n days, set index tags so that indices move to Elasticsearch instances with larger, slower disks.
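A minimal sketch of this idea using Elasticsearch's shard-allocation filtering settings (`allocation_settings`, the tag names, and the threshold are illustrative assumptions):

```python
# Hypothetical sketch: after a threshold age, retag an index so its
# shards migrate to nodes with larger, slower disks, via shard
# allocation filtering (index.routing.allocation.require.*).
def allocation_settings(age_days, threshold_days=7):
    tag = 'cold' if age_days >= threshold_days else 'hot'
    return {'index.routing.allocation.require.tag': tag}

# Applying it would look something like this with an elasticsearch-py
# client (commented out here since it needs a live cluster):
# client.indices.put_settings(index='logstash-2014.05.01',
#                             body=allocation_settings(10))
```

This assumes the target nodes carry a matching `node.tag` attribute in their configuration.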
The error message received is:
File "logstash_index_cleaner.py", line 63, in get_index_epoch
return time.mktime([int(part) for part in year_month_day_optionalhour] + [0, 0, 0, 0, 0])
TypeError: Tuple or struct_time argument required
The command used is:
python logstash_index_cleaner.py -d 14
This is on a 64-bit Windows 2012 server with no internet access, so I'm not sure whether it has to do with how Windows represents the date format or whether it depends on internet connectivity.
Hi,
Would it be possible to extend curator to delete not only specific indices like "logstash-2014.05.05" and "logstash-2014.05.06", but also specific types like "logstash-2014.05.05/syslog" or "logstash-2014.05.05/apache"? This would allow managing different lifecycles for different types.
Hi,
Thanks for the wonderful tool. I'm trying to delete indices matching a pattern with this command:
curator -p .marvel- -d 0
Unfortunately, this doesn't work. It seems to suggest that it would work in the documentation without a prefix, i.e.,
curator --host blah -d 0
Could curator support prefixes?
Thank you.
-=david=-
I suggest checking the index time using a time format, not string splitting.
What do you think?
Something like this:
curator delete [-p prefix] [-f time_format]
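A minimal sketch of the strptime-based approach being suggested (`index_time` and its defaults are illustrative assumptions, not curator's API):

```python
from datetime import datetime

# Hypothetical sketch: recover an index's date by applying a
# strftime-style format to the unprefixed name, instead of splitting
# on a separator character.
def index_time(name, prefix='logstash-', time_format='%Y.%m.%d'):
    return datetime.strptime(name[len(prefix):], time_format)
```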
Traceback (most recent call last):
File "/usr/bin/curator", line 9, in <module>
load_entry_point('elasticsearch-curator==1.0.0', 'console_scripts', 'curator')()
File "/usr/lib/python2.6/site-packages/curator/curator.py", line 345, in main
version_number = get_version(client)
File "/usr/lib/python2.6/site-packages/curator/curator.py", line 163, in get_version
version = client.info()['version']['number']
TypeError: string indices must be integers
elasticsearch 1.1.1
When running curator on python 2.6 I get the following issue:
Traceback (most recent call last):
File "/usr/bin/curator", line 9, in <module>
load_entry_point('elasticsearch-curator==1.1.0', 'console_scripts', 'curator')()
File "/usr/lib/python2.6/site-packages/curator/curator.py", line 668, in main
logging.debug("argdict = {}".format(argdict))
ValueError: zero length field name in format
It's simple enough to patch by adding a '0' inside the braces in the script. Afterwards it works like a champ.
Why not upgrade Python? My version of yum is tied to Python 2.6, so upgrading would break my OS.
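For reference, the one-character difference: Python 2.6's str.format() does not support auto-numbered fields, so '{}' must become '{0}'.

```python
argdict = {'dry_run': True}
# Explicitly numbered field: works on Python 2.6 and later.
message = "argdict = {0}".format(argdict)
# Auto-numbered field: raises "ValueError: zero length field name in
# format" on Python 2.6.
# message = "argdict = {}".format(argdict)
```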
Not sure what's going on yet, but closed indices are not recognized by the 1.0 API calls and so are not deleted. This behavior apparently did not exist with the 0.4.4 API calls for ES versions < 1.0.
$ ls -1 | grep logstash
logstash-2014.02.13
logstash-2014.02.15
logstash-2014.02.16
logstash-2014.02.17
logstash-2014.02.18
logstash-2014.02.19
logstash-2014.02.20
logstash-2014.02.21
logstash-2014.02.22
$ curator --host blackbox -D -d 6
2014-02-22T13:54:49.193 INFO main:333 Job starting...
2014-02-22T13:54:49.194 INFO _new_conn:257 Starting new HTTP connection (1): blackbox
2014-02-22T13:54:49.195 DEBUG _make_request:374 Setting read timeout to 30
2014-02-22T13:54:49.197 DEBUG _make_request:414 "GET / HTTP/1.1" 200 297
2014-02-22T13:54:49.198 INFO log_request_success:49 GET http://blackbox:9200/ [status:200 request:0.004s]
2014-02-22T13:54:49.198 DEBUG log_request_success:51 > None
2014-02-22T13:54:49.198 DEBUG log_request_success:52 < {
"status" : 200,
"name" : "Blackbox",
"version" : {
"number" : "1.0.0",
"build_hash" : "a46900e9c72c0a623d71b54016357d5f94c8ea32",
"build_timestamp" : "2014-02-12T16:18:34Z",
"build_snapshot" : false,
"lucene_version" : "4.6"
},
"tagline" : "You Know, for Search"
}
2014-02-22T13:54:49.198 DEBUG main:346 Detected Elasticsearch version 1.0.0
2014-02-22T13:54:49.198 INFO main:359 Deleting indices older than 6 days...
2014-02-22T13:54:49.199 DEBUG _make_request:374 Setting read timeout to 30
2014-02-22T13:54:49.202 DEBUG _make_request:414 "GET /logstash-*/_settings HTTP/1.1" 200 1543
2014-02-22T13:54:49.202 INFO log_request_success:49 GET http://blackbox:9200/logstash-*/_settings [status:200 request:0.003s]
2014-02-22T13:54:49.202 DEBUG log_request_success:51 > None
2014-02-22T13:54:49.202 DEBUG log_request_success:52 < {"logstash-2014.02.21":{"settings":{"index":{"uuid":"GMamNSN-TmKodXvgYkktmg","number_of_replicas":"1","number_of_shards":"5","refresh_interval":"5s","version":{"created":"1000099"}}}},"logstash-2014.02.18":{"settings":{"index":{"codec":{"bloom":{"load":"false"}},"uuid":"s9V9b2tIRxyanZJ4s0P5vQ","number_of_replicas":"1","analysis":{"analyzer":{"default":{"type":"standard","stopwords":"_none_"}}},"number_of_shards":"5","refresh_interval":"5s","version":{"created":"901199"}}}},"logstash-2014.02.19":{"settings":{"index":{"codec":{"bloom":{"load":"false"}},"uuid":"2k3v2gl2RROXYcD3vcMslQ","number_of_replicas":"1","analysis":{"analyzer":{"default":{"type":"standard","stopwords":"_none_"}}},"number_of_shards":"5","refresh_interval":"5s","version":{"created":"901199"}}}},"logstash-2014.02.20":{"settings":{"index":{"codec":{"bloom":{"load":"false"}},"uuid":"E6IlpHOqQauIbKMC0QjqEQ","number_of_replicas":"1","analysis":{"analyzer":{"default":{"type":"standard","stopwords":"_none_"}}},"number_of_shards":"5","refresh_interval":"5s","version":{"created":"901199"}}}},"logstash-2014.02.17":{"settings":{"index":{"codec":{"bloom":{"load":"false"}},"uuid":"eaWwVnnuQ-eoWoyu5Dyl4Q","number_of_replicas":"1","analysis":{"analyzer":{"default":{"type":"standard","stopwords":"_none_"}}},"number_of_shards":"5","refresh_interval":"5s","version":{"created":"901199"}}}},"logstash-2014.02.22":{"settings":{"index":{"uuid":"JU64q1s0TaWkO1hFsOaqkA","number_of_replicas":"1","number_of_shards":"5","refresh_interval":"5s","version":{"created":"1000099"}}}}}
2014-02-22T13:54:49.205 INFO find_expired_indices:209 logstash-2014.02.17 is 1 day, 0:00:00 above the cutoff.
2014-02-22T13:54:49.206 INFO find_expired_indices:209 logstash-2014.02.18 is 2 days, 0:00:00 above the cutoff.
2014-02-22T13:54:49.206 INFO find_expired_indices:209 logstash-2014.02.19 is 3 days, 0:00:00 above the cutoff.
2014-02-22T13:54:49.206 INFO find_expired_indices:209 logstash-2014.02.20 is 4 days, 0:00:00 above the cutoff.
2014-02-22T13:54:49.206 INFO find_expired_indices:209 logstash-2014.02.21 is 5 days, 0:00:00 above the cutoff.
2014-02-22T13:54:49.206 INFO find_expired_indices:209 logstash-2014.02.22 is 6 days, 0:00:00 above the cutoff.
2014-02-22T13:54:49.206 INFO index_loop:309 DELETE index operations completed.
2014-02-22T13:54:49.206 INFO main:379 Done in 0:00:00.015639.
As you can see, it clearly does not see the closed indices. We will need to correct something to fix this before officially releasing curator 1.0
The default is 2, and no matter what setting is specified on the command line, curator optimizes to 2 segments.
Hi,
I can't run the latest master branch with Elasticsearch version 1.0.1. I installed the latest version by cloning the master branch and then running setup.py install, as suggested in a comment on this blog post.
This is the output:
python setup.py install
running install
running bdist_egg
running egg_info
writing requirements to elasticsearch_curator.egg-info/requires.txt
writing elasticsearch_curator.egg-info/PKG-INFO
writing top-level names to elasticsearch_curator.egg-info/top_level.txt
writing dependency_links to elasticsearch_curator.egg-info/dependency_links.txt
writing entry points to elasticsearch_curator.egg-info/entry_points.txt
reading manifest file 'elasticsearch_curator.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files matching '__pycache__' found under directory '*'
warning: no previously-included files matching '*.py[co]' found under directory '*'
writing manifest file 'elasticsearch_curator.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/curator
copying build/lib/curator/curator.py -> build/bdist.linux-x86_64/egg/curator
copying build/lib/curator/__init__.py -> build/bdist.linux-x86_64/egg/curator
byte-compiling build/bdist.linux-x86_64/egg/curator/curator.py to curator.pyc
byte-compiling build/bdist.linux-x86_64/egg/curator/__init__.py to __init__.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying elasticsearch_curator.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying elasticsearch_curator.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying elasticsearch_curator.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying elasticsearch_curator.egg-info/entry_points.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying elasticsearch_curator.egg-info/requires.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying elasticsearch_curator.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
zip_safe flag not set; analyzing archive contents...
creating 'dist/elasticsearch_curator-1.0.0_dev-py2.6.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing elasticsearch_curator-1.0.0_dev-py2.6.egg
creating /usr/lib/python2.6/site-packages/elasticsearch_curator-1.0.0_dev-py2.6.egg
Extracting elasticsearch_curator-1.0.0_dev-py2.6.egg to /usr/lib/python2.6/site-packages
Adding elasticsearch-curator 1.0.0-dev to easy-install.pth file
Installing curator script to /usr/bin
Installed /usr/lib/python2.6/site-packages/elasticsearch_curator-1.0.0_dev-py2.6.egg
Processing dependencies for elasticsearch-curator==1.0.0-dev
Searching for elasticsearch>=1.0.0,<2.0.0
Reading http://pypi.python.org/simple/elasticsearch/
Best match: elasticsearch 1.0.0
Downloading https://pypi.python.org/packages/source/e/elasticsearch/elasticsearch-1.0.0.tar.gz#md5=ac087d3f7a704b2c45079e7e25b56b9f
Processing elasticsearch-1.0.0.tar.gz
Running elasticsearch-1.0.0/setup.py -q bdist_egg --dist-dir /tmp/easy_install-T98TOX/elasticsearch-1.0.0/egg-dist-tmp-1CKF3Z
error: /tmp/easy_install-T98TOX/elasticsearch-1.0.0/README.rst: No such file or directory
As you can see, the install quits with an error.
If I try to install the script with pip install . instead, it seems to work.
Unpacking /root/curator
Running setup.py (path:/tmp/pip-Zn9sKP-build/setup.py) egg_info for package from file:///root/curator
warning: no previously-included files matching '__pycache__' found under directory '*'
warning: no previously-included files matching '*.py[co]' found under directory '*'
Downloading/unpacking elasticsearch>=1.0.0,<2.0.0 (from elasticsearch-curator==1.0.0-dev)
Downloading elasticsearch-1.0.0-py2.py3-none-any.whl (47kB): 47kB downloaded
Downloading/unpacking urllib3>=1.5,<2.0 (from elasticsearch>=1.0.0,<2.0.0->elasticsearch-curator==1.0.0-dev)
Downloading urllib3-1.7.1.tar.gz (67kB): 67kB downloaded
Running setup.py (path:/tmp/pip_build_root/urllib3/setup.py) egg_info for package urllib3
Installing collected packages: elasticsearch, elasticsearch-curator, urllib3
Running setup.py install for elasticsearch-curator
warning: no previously-included files matching '__pycache__' found under directory '*'
warning: no previously-included files matching '*.py[co]' found under directory '*'
Installing curator script to /usr/bin
Running setup.py install for urllib3
Successfully installed elasticsearch elasticsearch-curator urllib3
Cleaning up...
But now it shows the following error when I execute the curator command.
Traceback (most recent call last):
File "/usr/bin/curator", line 5, in <module>
from pkg_resources import load_entry_point
File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 2655, in <module>
working_set.require(__requires__)
File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 648, in require
needed = self.resolve(parse_requirements(requirements))
File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 546, in resolve
raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: elasticsearch>=1.0.0,<2.0.0
Last but not least the installed elasticsearch version:
curl -XGET 'localhost:9200'
{
"status" : 200,
"name" : "Lucky Luke",
"version" : {
"number" : "1.0.1",
"build_hash" : "5c03844e1978e5cc924dab2a423dc63ce881c42b",
"build_timestamp" : "2014-02-25T15:52:53Z",
"build_snapshot" : false,
"lucene_version" : "4.6"
},
"tagline" : "You Know, for Search"
}
Installing elasticsearch-curator (as a dependency via pip) on Jenkins CI fails because setup.py uses the environment variable BUILD_NUMBER, which Jenkins sets by default.
The workaround is to unset BUILD_NUMBER before the pip install step, but I think a better solution would be updating setup.py to check for a more specific environment variable (e.g. CURATOR_BUILD_NUMBER).
Thanks
Per @jordansissel's comment:
@@ -96,7 +96,7 @@ def make_parser():
     parser_allocation.set_defaults(func=command_loop)
     parser_allocation.add_argument('-p', '--prefix', help='Prefix for the indices. Indices that do not have this prefix are skipped. Default: logstash-', default=DEFAULT_ARGS['prefix'])
     parser_allocation.add_argument('--timestring', help="Python strftime string to match your index definition, e.g. 2014.07.15 would be %%Y.%%m.%%d", type=str, default=None)
-    parser_allocation.add_argument('-T', '--time-unit', dest='time_unit', action='store', help='Unit of time to reckon by: [hours|days|weeks] Default: days', default=DEFAULT_ARGS['time_unit'], type=str)
+    parser_allocation.add_argument('-T', '--time-unit', dest='time_unit', action='store', help='Unit of time to reckon by: [hours|days|weeks|months] Default: days', default=DEFAULT_ARGS['time_unit'], type=str)
Don't worry about this right now, but long term, it's probably worth refactoring some of these "shared" arguments into a single method:
add_common_arguments(parser_allocation)

def add_common_arguments(parser):
    parser.add_argument('-T'...)
Curator 0.6.1 - installed with pip
Elasticsearch version 0.90.9
When trying to run an optimize task, curator fails immediately:
# /usr/local/bin/curator --host localhost --port 9201 -t 3600 -o 2 --max_num_segments 1
2014-02-13T14:09:17.747 INFO main:325 Job starting...
2014-02-13T14:09:17.747 INFO main:360 Optimizing indices older than 2 days...
2014-02-13T14:09:17.748 INFO _new_conn:257 Starting new HTTP connection (1): localhost
2014-02-13T14:09:17.761 INFO log_request_success:49 GET http://localhost:9201/_settings [status:200 request:0.013s]
2014-02-13T14:09:17.765 INFO index_loop:290 Attempting to optimize index logstash-2014.01.15 because it is 27 days, 0:00:00 older than cutoff.
2014-02-13T14:09:17.766 WARNING log_request_fail:68 GET /_cluster/state/metadata/logstash-2014.01.15 [status:400 request:0.001s]
2014-02-13T14:09:17.766 INFO log_request_fail:70 > None
Talking to @untergeek in IRC, he mentioned that curator doesn't yet support weekly indices, which I use in an app that isn't Logstash but uses the same index-name formatting. It would be great to see this feature added!
I tried to grasp in a few seconds what curator is really for, and only once I found the blog entry http://www.elasticsearch.org/blog/elasticsearch-curator-version-1-1-0-released/ did I barely get it.
The current README.md and the start of the wiki IMHO fail to summarize in one or two sentences what the tool is good for.
All it currently says is "Have time-series indices in Elasticsearch? This is the tool for you!", but that doesn't give anything away.
I want to add an expire-logs command to a cron job and run it every day.
python logstash_index_cleaner.py --host localhost --port 9200 -p logstash- -d 30 --keep-open-days 14
Running the above command fails because the tool doesn't check the status of an index before it tries to close the index.
python logstash_index_cleaner.py --host localhost --port 9200 -p logstash- -d 30 --keep-open-days 14
2014-01-13T23:34:20+0000 INFO main:178 Job starting...
2014-01-13T23:34:20+0000 INFO main:214 Index CLOSE operations commencing...
2014-01-13T23:34:20+0000 INFO main:225 Closing indices older than 14 days.
2014-01-13T23:34:20+0000 INFO _new_conn:257 Starting new HTTP connection (1): localhost
2014-01-13T23:34:20+0000 INFO log_request_success:49 GET http://localhost:9200/_settings [status:200 request:0.015s]
2014-01-13T23:34:20+0000 INFO main:238 Attempting to close index logstash-2013.12.06 because it is 25 days, 0:34:20.625264 older than cutoff.
2014-01-13T23:34:20+0000 WARNING log_request_fail:68 POST /logstash-2013.12.06/_close [status:500 request:0.011s]
2014-01-13T23:34:20+0000 INFO log_request_fail:70 > None
Traceback (most recent call last):
File "logstash_index_cleaner.py", line 255, in <module>
main()
File "logstash_index_cleaner.py", line 241, in main
do_operation = IndicesClient.close(index_name)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/utils.py", line 70, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/client/indices.py", line 105, in close
params=params)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/transport.py", line 269, in perform_request
status, headers, raw_data = connection.perform_request(method, url, params, body, ignore=ignore)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/http_urllib3.py", line 55, in perform_request
self._raise_error(response.status, raw_data)
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/base.py", line 83, in _raise_error
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.TransportError: TransportError(500, u'RemoteTransportException[[savvy_es_chef_master_1][inet[/10.35.91.184:9300]][indices/close]]; nested: NullPointerException; ')
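One way to avoid the 500 above (and the 400 from the optimize run earlier) is to consult the cluster state metadata before acting on an index. A minimal sketch, assuming the metadata shape returned by GET /_cluster/state/metadata (i.e. client.cluster.state(metric='metadata') in elasticsearch-py); the helper name is mine:

```python
def is_index_closed(cluster_state, index_name):
    """Return True if the index is marked 'close' in cluster state metadata.

    cluster_state is the dict returned by the cluster state API with the
    'metadata' metric; closed indices carry state 'close', open ones 'open'.
    """
    index_meta = cluster_state['metadata']['indices'].get(index_name, {})
    return index_meta.get('state') == 'close'
```

The close loop could then skip (or merely log) indices for which this returns True instead of sending a second close and crashing on the resulting TransportError.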
--delete-older-than DELETE_OLDER_THAN
Delete snapshots older than n TIME_UNITs.
$ curator snapshot --repository example --prefix test --delete-older-than 30
...
2014-07-29 23:25:35,894 ERROR Could not find a valid timestamp for test_snapshot with timestring %Y.%m.%d
...
I would expect this to delete snapshots older than 30 days with the given prefix. Instead, find_expired_data always uses find_index_time to check the timestring in the index name, but snapshot names do not necessarily correspond to index names. It would make more sense to use the end_time on the snapshots themselves.
I can submit a patch if you would like, although it's not clear to me whether this is broken or just misleading.
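Filtering on the snapshot's own end_time rather than a timestring could look roughly like this. This is a hypothetical sketch, not curator's actual find_expired_data; it assumes the 'snapshots' list returned by GET /_snapshot/&lt;repo&gt;/_all, where each entry carries 'snapshot' (the name) and 'end_time_in_millis':

```python
from datetime import datetime, timedelta

def expired_snapshots(snapshots, older_than_days, now=None):
    """Yield names of snapshots whose own end_time is past the cutoff.

    'now' is injectable for testing; it defaults to the current UTC time.
    """
    cutoff = (now or datetime.utcnow()) - timedelta(days=older_than_days)
    for snap in snapshots:
        # end_time_in_millis is an epoch timestamp in milliseconds.
        end = datetime.utcfromtimestamp(snap['end_time_in_millis'] / 1000.0)
        if end < cutoff:
            yield snap['snapshot']
```

Because this never looks at the snapshot name, it works regardless of whether the name matches any index timestring.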
Hey there!
CentOS 6.5, EPEL repo installed via rpm and pip installed via EPEL.
Running the following command results in the error.
[root@logstash expire-logs]# python logstash_index_cleaner.py --host my-elasticsearch -d 14
2014-01-10T00:53:43+0000 INFO main:178 Job starting...
Traceback (most recent call last):
File "logstash_index_cleaner.py", line 255, in <module>
main()
File "logstash_index_cleaner.py", line 182, in main
h = logging.NullHandler()
AttributeError: 'module' object has no attribute 'NullHandler'
Multiple iterations of the command result in the same error. If I can provide any further information, please let me know!
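logging.NullHandler was only added in Python 2.7, and CentOS 6.5 ships Python 2.6 by default, which explains the AttributeError. A guarded fallback (a common shim, not necessarily how the script should fix it) avoids the crash on both versions:

```python
import logging

# logging.NullHandler exists from Python 2.7 onward; on 2.6 fall back
# to a minimal local equivalent that discards every record.
try:
    NullHandler = logging.NullHandler
except AttributeError:
    class NullHandler(logging.Handler):
        def emit(self, record):
            pass

handler = NullHandler()
logging.getLogger('elasticsearch').addHandler(handler)
```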
$ curator show --show-indices
Traceback (most recent call last):
File "/usr/bin/curator", line 9, in <module>
load_entry_point('elasticsearch-curator==1.2.1', 'console_scripts', 'curator')()
File "/usr/lib/python2.6/site-packages/curator/curator.py", line 704, in main
if arguments.timestring:
AttributeError: 'Namespace' object has no attribute 'timestring'
Related to the new timestring and time-unit arguments missing from parser_show while always being checked on arguments. I'm not sure whether you'd prefer to add those args under parser_show, or to work around it by setting the defaults differently.
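The "setting the defaults differently" option can be sketched with plain argparse: a top-level set_defaults (or a getattr fallback at the point of use) keeps shared code from tripping over an attribute a subcommand never defined. Subcommand and flag names here are illustrative, not curator's real parser:

```python
import argparse

parser = argparse.ArgumentParser()
sub = parser.add_subparsers(dest='command')
show = sub.add_parser('show')
show.add_argument('--show-indices', action='store_true')

# Ensure the attribute exists on the Namespace for every subcommand,
# even ones (like 'show') that never define --timestring.
parser.set_defaults(timestring=None)

arguments = parser.parse_args(['show', '--show-indices'])
# Belt and braces: getattr also guards against a missing attribute.
timestring = getattr(arguments, 'timestring', None)
```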
Hi,
I'm unable to find any Copyright information. I would like to close waja-archive/elasticsearch-curator#1
Thanks, Jan.
I'd like to see the ability for users to register their own actions to be run at given thresholds. Example API (rough draft):
@curator.action('close')
def close_index(client, index_name):
    client.indices.close(index=index_name)
Which can then be used via the CLI by specifying the 'close' action type.
The problem is how we do the discovery. We could ask users to wrap their code like this:
#!/usr/bin/env python
import curator
# ... actions here
if __name__ == '__main__':
    curator.main()
or maybe just provide an env variable with a list of python modules to be loaded before the run:
CURATOR_ACTIONS='myapp.curator_actions' curator.py -....
This way curator itself wouldn't have to support every operation people might wish to run; it would instead focus on selecting the indices for those actions and dispatching the calls to them.