
Comments (3)

untergeek commented on July 21, 2024

@wiibaa, thanks for migrating these. I know you're aware of these details, but I'm going to comment here for the benefit of other readers.

TTL is a bad idea for time-series data because indices can grow to billions of documents per day. If a document-level TTL is set on any document in the index, the entire index will be scanned every 60 seconds (the configurable default interval) to find documents with a TTL and check whether they have expired. This is a tremendous amount of overhead just for reading, before any deleting happens.

Even using a delete_by_query via cron is problematic because of how it affects segment sizing and allocation. It makes for very uneven segment merges, which puts strain on your indexing and search operations. In addition, deleting documents in Elasticsearch does not result in immediate deletion. From the book Elasticsearch: The Definitive Guide:

Internally, Elasticsearch has marked the old document as deleted…The old version of the document doesn’t disappear immediately, although you won’t be able to access it. Elasticsearch cleans up deleted documents in the background as you continue to index more data.

Pruning documents by TTL or by query is necessary in certain environments, but time-series data (like logs) should almost never be handled in this way, in my opinion. Your Elasticsearch environment will be far better served by splitting the data into separate time-based indices and dropping whole indices with DELETE calls (or using curator).
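To illustrate the index-level approach, here is a minimal Python sketch of the selection step: given daily indices named in the Logstash 1.2+ default pattern ("logstash-YYYY.MM.DD"), find the ones older than a retention window. The function name and the retention logic are illustrative assumptions, not part of curator or Elasticsearch; each returned index would then be removed with a single DELETE call (or curator would do the equivalent for you).

```python
from datetime import datetime, timedelta

def indices_to_drop(index_names, retention_days, today=None):
    """Return daily time-series indices older than the retention window.

    Assumes indices are named like 'logstash-YYYY.MM.DD' (the Logstash
    1.2+ default); names that don't match are left alone. This is an
    illustrative sketch, not a curator or Elasticsearch API.
    """
    today = today or datetime.utcnow()
    cutoff = today - timedelta(days=retention_days)
    stale = []
    for name in index_names:
        _, _, datepart = name.rpartition("-")
        try:
            day = datetime.strptime(datepart, "%Y.%m.%d")
        except ValueError:
            continue  # not a daily time-series index; skip it
        if day < cutoff:
            stale.append(name)
    return stale
```

Deleting a handful of whole indices this way is a cheap metadata operation, whereas delete_by_query has to mark millions of individual documents as deleted and wait for merges to reclaim the space.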

With this said, if you insist on doing event-level TTL, there's nothing to prevent users from adding a TTL to individual events in Logstash by adding a field called _ttl with a string value that Elasticsearch will recognize such as "1d" for one day, or "1w" for one week. See http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-ttl

  mutate {
    add_field => { "_ttl" => "1d" }
  }

Because this is fully supported now, thanks to the Logstash 1.2 schema change, I'm going to close this issue with all the caveats mentioned. Feel free to re-open if you believe it necessary.

from logstash-output-elasticsearch.

wiibaa commented on July 21, 2024

@untergeek I had a rough idea, but it is always good to hear the details from the experts.


wols commented on July 21, 2024

A valuable clue. Thank you!

