Comments (12)

anilreddyb avatar anilreddyb commented on September 22, 2024

Thanos compactor config:
image

from thanos.

yeya24 avatar yeya24 commented on September 22, 2024

Error is

level=error ts=2024-01-24T13:15:57.569235075Z caller=compact.go:499 msg="retriable error" err="compaction: group 0@1151584605916149957: download block 01HFT1FJBC0FRECWZB25NHY7AT: copy object to file: write /data/compact/0@1151584605916149957/01HFT1FJBC0FRECWZB25NHY7AT/chunks/000012: no space left on device"

Please allocate more space for the compactor pod.

anilreddyb avatar anilreddyb commented on September 22, 2024

We are seeing that only 6 GB of disk space is left:
image
How much disk space do we need to maintain with the standard process?

anilreddyb avatar anilreddyb commented on September 22, 2024

The minTime and maxTime inside meta.json are from November 2023. All the latest logs are being processed and the data is also available in the S3 bucket, but for some reason the compactor tries to download old data (November 2023). We are not sure why it is downloading it.
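
One way to check which block is being fetched: Thanos block IDs are ULIDs, and the first 10 characters of a ULID encode its creation time in milliseconds (Crockford base32). The sketch below decodes the ID from the error log. The function names are illustrative and not part of Thanos, and the result is the time the block was created, not the time range of the samples inside it.

```python
from datetime import datetime, timezone

# Crockford base32 alphabet used by ULIDs (no I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_timestamp_ms(ulid: str) -> int:
    """Return the creation time (Unix ms) encoded in the first 10 chars of a ULID."""
    if len(ulid) != 26:
        raise ValueError("a ULID has exactly 26 characters")
    value = 0
    for ch in ulid[:10].upper():
        # str.index raises ValueError on characters outside the alphabet.
        value = value * 32 + CROCKFORD.index(ch)
    return value


def ulid_created_at(ulid: str) -> datetime:
    """Return the ULID creation time as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ulid_timestamp_ms(ulid) / 1000, tz=timezone.utc)


if __name__ == "__main__":
    # Block ID taken from the "no space left on device" error above.
    print(ulid_created_at("01HFT1FJBC0FRECWZB25NHY7AT").isoformat())
```

For the ID in the error this gives a creation time in November 2023, which matches the time range seen in meta.json.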

douglascamata avatar douglascamata commented on September 22, 2024

How much disk space do we need to maintain with the standard process?

This is impossible to predict and also depends on your configuration. General guidance is:

  • Do not try to have unlimited retention.
  • Ensure your Compactor is always working and not halted or simply stuck (e.g. due to a low CPU limit).

In cases like this, simply give it more disk. Data deletion is the very last step in Compactor's algorithm.
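
As a rough illustration of why the requirement depends on your data: the Compactor downloads the source blocks of a group to local disk and builds the compacted block there before uploading it, so the scratch space needed is on the order of the source blocks plus the output. The sketch below assumes the worst case where the output is as large as all inputs combined. That factor and both helper names are assumptions for illustration, not Thanos constants or APIs.

```python
import shutil


def required_scratch_bytes(source_block_sizes: list[int], output_factor: float = 1.0) -> int:
    """Estimate the local disk needed to compact one group.

    Assumption: all source blocks are on disk at the same time as the
    output block, and the output is at most `output_factor` times the
    combined input size.
    """
    total_input = sum(source_block_sizes)
    return int(total_input * (1.0 + output_factor))


def has_enough_scratch_space(data_dir: str, source_block_sizes: list[int]) -> bool:
    """Compare the estimate with the free space on the volume holding data_dir."""
    free = shutil.disk_usage(data_dir).free
    return free >= required_scratch_bytes(source_block_sizes)
```

With 6 GB free, for example, `has_enough_scratch_space("/data", sizes)` would return False as soon as the source blocks of one group add up to more than 3 GB under this estimate.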

anilreddyb avatar anilreddyb commented on September 22, 2024

@douglascamata, thanks for the response. A follow-up question: when you say unlimited retention, what exactly do you mean? Below is our current configuration; does it seem fine?
image

retentionResolutionRaw: 30d
retentionResolution5m: 30d
retentionResolution1h: 10y

We are also seeing the errors below in the compactor logs. What exactly do they signify, and could they be due to a wrong retention value in the configuration?

level=warn ts=2024-01-24T13:38:07.872259646Z caller=objstore.go:386 group="0@{cluster="", env="uat", prometheus="observability/kube-prometheus-stack-prometheus", prometheus_replica="prometheus-kube-prometheus-stack-prometheus-2"}" groupKey=0@1151584605916149957 msg="failed to remove file on partial dir download error" file=/data/compact/0@1151584605916149957/01HFT1FJBC0FRECWZB25NHY7AT err="remove /data/compact/0@1151584605916149957/01HFT1FJBC0FRECWZB25NHY7AT: directory not empty"
level=error ts=2024-01-24T13:38:07.872381278Z caller=compact.go:499 msg="retriable error" err="compaction: group 0@1151584605916149957: download block 01HFT1FJBC0FRECWZB25NHY7AT: copy object to file: write /data/compact/0@1151584605916149957/01HFT1FJBC0FRECWZB25NHY7AT/chunks/000012: no space left on device"
level=warn ts=2024-01-24T14:00:17.88467112Z caller=objstore.go:386 group="0@{cluster="dev-test", env="uat", prometheus="observability/kube-prometheus-stack-prometheus", prometheus_replica="prometheus-kube-prometheus-stack-prometheus-2"}" groupKey=0@1151584605916149957 msg="failed to remove file on partial dir download error" file=/data/compact/0@1151584605916149957/01HFT1FJBC0FRECWZB25NHY7AT err="remove /data/compact/0@1151584605916149957/01HFT1FJBC0FRECWZB25NHY7AT: directory not empty"
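
A small sketch for triaging lines like these: it pulls the block ID out of the error text and flags whether the underlying cause is a full disk. The regular expression and the function name `triage` are illustrative assumptions that match the log format shown above; this is not a Thanos tool.

```python
import re
from typing import Optional

# A ULID is 26 characters from the Crockford base32 alphabet (no I, L, O, U).
BLOCK_RE = re.compile(r"download block ([0-9A-HJKMNP-TV-Z]{26})")


def triage(line: str) -> Optional[dict]:
    """Return the block ID and a disk-full flag for a download error, else None."""
    match = BLOCK_RE.search(line)
    if match is None:
        return None
    return {
        "block": match.group(1),
        "disk_full": "no space left on device" in line,
    }
```

Only the `level=error` line matches. The `level=warn` lines about the non-empty directory are reported while cleaning up the partially downloaded block, so they return None here.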

douglascamata avatar douglascamata commented on September 22, 2024

@anilreddyb with 10 years of retention on 1h-downsampled metrics you will have problems in your system. Keep in mind that the Compactor has to be "aware" of literally all the blocks you have in your object storage. The Compactor (and Store Gateway) are often listing all blocks, checking out their meta files, checking for markers (other metadata) stored as files, etc. Now imagine the number of requests that 10 years of blocks will generate. Factor in that some providers will charge you based on the number of API requests...

Otherwise, focusing on your logs and the fact that you have no disk space, that might be the reason for the other failures. I recommend scaling your Compactor down to 0 replicas, cleaning up your PVC, and restarting it.

anilreddyb avatar anilreddyb commented on September 22, 2024

I executed the commands below, deleting the previous data. After some time, a new folder was generated, and the data within it was associated with the date range of October 12th to 16th. My inquiry is why the data specifically references the month of October. It's important to note that this data is already present in the S3 bucket. Since the deletion of the old data, the /data directory is now at 100% free, eliminating any disk space concerns. Therefore, it seems the issue is unrelated to disk space.

thanos tools bucket retention --objstore.config-file=/conf/objstore.yml
thanos tools bucket cleanup --delete-delay=0s --objstore.config-file=/conf/objstore.yml

image
Is there a way to investigate why the data consistently points to the month of October, even after manually deleting the old data multiple times?

douglascamata avatar douglascamata commented on September 22, 2024

why the data consistently points to the month of October

I don't understand the question. What do you mean by "data consistently points"?

anilreddyb avatar anilreddyb commented on September 22, 2024

@douglascamata,
I removed the old data within the /data/compact/ folder. Subsequently, new folders are generated. When I examine the meta.json file's minTime and maxTime, it displays a timestamp from the month of October. My question is, why does it indicate October when the file was created today? The timestamp should reflect the current date and month if the file was generated today.
If there's a retention period of the last 30 days, the meta.json file should ideally show a timestamp within this timeframe. If it indicates October instead, there may be a configuration issue with the retention policy or data management process that needs to be addressed.

retentionResolutionRaw: 30d
retentionResolution5m: 30d
retentionResolution1h: 10y

douglascamata avatar douglascamata commented on September 22, 2024

There should be one meta.json file per block you have in object storage, never only one (unless you only have 1 block, of course). And you are still keeping 10y of 1h-resolution data.

There's no issue that we are aware of with retention policy or data management.
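
To illustrate how the three retention settings interact, the sketch below models a per-resolution retention check: a block becomes eligible for deletion once its maxTime is older than the retention configured for its resolution. It is a simplified model and not Thanos code. The duration parser handles only single-unit values such as `30d` or `10y` and assumes a year is 365 days, and the example dates are taken from this thread.

```python
from datetime import datetime, timedelta, timezone

# Retention from the configuration above, keyed by block resolution in ms
# (0 = raw, 300000 = 5m, 3600000 = 1h).
RETENTION = {0: "30d", 300_000: "30d", 3_600_000: "10y"}

UNIT_DAYS = {"d": 1, "w": 7, "y": 365}  # assumption: 1y = 365d


def parse_retention(value: str) -> timedelta:
    """Parse a single-unit duration such as '30d' or '10y'."""
    number, unit = int(value[:-1]), value[-1]
    return timedelta(days=number * UNIT_DAYS[unit])


def is_expired(max_time: datetime, resolution_ms: int, now: datetime) -> bool:
    """True if a block with this maxTime and resolution is past its retention."""
    return now > max_time + parse_retention(RETENTION[resolution_ms])


now = datetime(2024, 1, 24, tzinfo=timezone.utc)
october_block = datetime(2023, 10, 16, tzinfo=timezone.utc)
for resolution in RETENTION:
    print(resolution, is_expired(october_block, resolution, now))
```

Under this model a raw or 5m block ending on October 16th is past its 30d retention by late January, while a 1h block covering the same days is kept for 10 years. That may be one reason October time ranges still show up in freshly created local directories.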

douglascamata avatar douglascamata commented on September 22, 2024

You need to let the Compactor run and monitor its metrics to see whether it's working. Looking at the filesystem without a deep understanding of how the Compactor works will only confuse you.
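
As a starting point for that monitoring, the sketch below parses a Prometheus text-format scrape and reads a few Compactor metrics. The metric names `thanos_compact_halted`, `thanos_compact_iterations_total` and `thanos_compact_group_compactions_failures_total` are quoted from memory and should be verified against the `/metrics` output of your Thanos version. The sample scrape text, its label, and the health heuristic are made up for illustration.

```python
from collections import defaultdict

WATCHED = (
    "thanos_compact_halted",
    "thanos_compact_iterations_total",
    "thanos_compact_group_compactions_failures_total",
)


def sum_metrics(exposition: str) -> dict:
    """Sum sample values per watched metric name, ignoring labels and comments.

    Assumption: samples carry no trailing timestamp, so the value is the
    last space-separated field on the line.
    """
    totals = defaultdict(float)
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]
        if name in WATCHED:
            totals[name] += float(value)
    return dict(totals)


def looks_healthy(totals: dict) -> bool:
    """Heuristic: not halted and at least one iteration completed."""
    return (
        totals.get("thanos_compact_halted", 0.0) == 0.0
        and totals.get("thanos_compact_iterations_total", 0.0) > 0.0
    )


# Made-up scrape text for illustration only.
SAMPLE = """\
# TYPE thanos_compact_halted gauge
thanos_compact_halted 0
thanos_compact_iterations_total 12
thanos_compact_group_compactions_failures_total{resolution="0"} 3
"""

print(sum_metrics(SAMPLE), looks_healthy(sum_metrics(SAMPLE)))
```

A failure counter that keeps growing while the iteration counter stands still would be a hint that the Compactor is stuck retrying, as in the "retriable error" log lines earlier in the thread.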
