Comments (3)
@wiibaa, thanks for migrating these. I know you're aware of these details, but I'm going to comment here for the benefit of other readers.
TTL is a bad idea for time-series data, as indices can grow to billions of documents per day. If a document-level TTL is set on any document in the index, the entire index will be scanned every 60 seconds (a configurable default) to look for documents that have a TTL set and to check whether they have expired. This is a tremendous amount of overhead just for reading, not even deleting.
Even using a delete_by_query via cron is problematic because of how it affects segment sizing and allocation. It makes for very uneven segment merges, which put strain on your indexing and search operations. In addition, deleting documents in Elasticsearch does not result in immediate deletion. From the book Elasticsearch: The Definitive Guide:
Internally, Elasticsearch has marked the old document as deleted…The old version of the document doesn’t disappear immediately, although you won’t be able to access it. Elasticsearch cleans up deleted documents in the background as you continue to index more data.
Pruning documents by TTL or by query is necessary in certain environments, but time-series data (like logs) should almost never be handled in this way, in my opinion. Your Elasticsearch environment will be far better served by splitting the data into separate indices and dropping them with DELETE calls (or using curator).
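To illustrate the index-per-day approach, here is a minimal Python sketch of how a cron job might pick which daily indices to drop. It assumes indices follow the `logstash-YYYY.MM.dd` naming pattern; the function name `expired_indices`, the `keep_days` parameter, and the sample index names are hypothetical. The sketch only prints the DELETE requests it would issue rather than calling Elasticsearch.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, today, keep_days, prefix="logstash-"):
    """Return the daily indices whose date falls before the retention cutoff.

    Assumes index names look like '<prefix>YYYY.MM.dd' (an assumption of
    this sketch); anything else is left alone.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not one of the time-series indices
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date, so do not touch this index
        if day < cutoff:
            expired.append(name)
    return sorted(expired)


# Hypothetical index listing; in practice this would come from the cluster.
names = [
    "logstash-2014.01.01",
    "logstash-2014.01.02",
    "logstash-2014.01.09",
    "kibana-int",
]

for name in expired_indices(names, today=date(2014, 1, 10), keep_days=7):
    # Dropping a whole index is a single request, with no per-document scan.
    print("DELETE /" + name)
```

Each printed line corresponds to one index-level DELETE call, so retention is enforced once per day per index instead of by scanning every document.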
With this said, if you insist on doing event-level TTL, there's nothing to prevent you from adding a TTL to individual events in Logstash by adding a field called _ttl with a string value that Elasticsearch will recognize, such as "1d" for one day or "1w" for one week. See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-ttl
mutate {
  add_field => { "_ttl" => "1d" }
}
Because this is fully supported now, thanks to the Logstash 1.2 schema change, I'm going to close this issue with all caveats mentioned. Feel free to re-open it if you believe it necessary.
from logstash-output-elasticsearch.
@untergeek I had a rough idea, but it is always good to hear the details from the experts.
A valuable clue. Thank you!