Comments (12)
from thanos.
The error is:
level=error ts=2024-01-24T13:15:57.569235075Z caller=compact.go:499 msg="retriable error" err="compaction: group 0@1151584605916149957: download block 01HFT1FJBC0FRECWZB25NHY7AT: copy object to file: write /data/compact/0@1151584605916149957/01HFT1FJBC0FRECWZB25NHY7AT/chunks/000012: no space left on device"
Please allocate more space for the compactor pod.
from thanos.
We see that only 6 GB of free space is left on the disk.
How much disk space do we need to maintain with the standard process?
from thanos.
The minTime and maxTime inside meta.json are from November 2023. All the latest data is being processed and is also available in the S3 bucket, but for some reason the Compactor tries to download old data (November 2023). We are not sure why it is downloading it.
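For anyone reading along: minTime and maxTime in a block's meta.json describe the time range of the samples stored in that block, as Unix timestamps in milliseconds. They do not record when the file was written or downloaded. Below is a minimal Python sketch for converting them to dates. The sample values are made up for illustration, and `block_time_range` is a hypothetical helper name, not part of Thanos.

```python
import json
from datetime import datetime, timezone


def block_time_range(meta_json_text):
    """Return (min_time, max_time) of a block as UTC datetimes."""
    meta = json.loads(meta_json_text)

    def to_utc(ms):
        # meta.json stores times as Unix milliseconds.
        return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

    return to_utc(meta["minTime"]), to_utc(meta["maxTime"])


# Hypothetical meta.json excerpt; real files contain more fields.
sample = '{"minTime": 1699920000000, "maxTime": 1699927200000}'
start, end = block_time_range(sample)
print(start.isoformat(), "->", end.isoformat())
```

A block whose range falls in November 2023 will show November 2023 here no matter when the Compactor downloaded it.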
from thanos.
> How much disk space do we need to maintain with the standard process?
This is impossible to predict and also depends on your configuration. General guidance is:
- Do not try to have unlimited retention.
- Ensure your Compactor is always working and not halted or simply stuck (e.g. due to a low CPU limit).
In cases like this, simply give it more disk. Data deletion is the very last step in Compactor's algorithm.
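As a rough illustration of why "more disk" is the usual answer: since deletion happens last, the source blocks of a compaction group and the newly written output block sit on disk at the same time. The Python sketch below is only a back-of-the-envelope estimate under that assumption. The function names, the 20% headroom, and the block sizes are all hypothetical, and the output block is pessimistically assumed to be as large as the sum of its sources.

```python
import shutil


def estimate_scratch_bytes(source_block_sizes, headroom=0.2):
    """Rough peak disk usage for compacting one group of blocks.

    Assumption: every source block is on disk together with the output
    block, and the output is no larger than the sum of the sources.
    """
    sources = sum(source_block_sizes)
    worst_case_output = sources
    return int((sources + worst_case_output) * (1 + headroom))


def has_room(path, needed_bytes):
    """Check whether the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= needed_bytes


GiB = 1024 ** 3
# Hypothetical sizes of the blocks in the largest compaction group.
needed = estimate_scratch_bytes([40 * GiB, 35 * GiB, 25 * GiB])
print(f"need roughly {needed / GiB:.0f} GiB of scratch space")
```

On the Compactor pod, the check would be run against the data directory, for example `has_room("/data", needed)`.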
from thanos.
@douglascamata, thanks for the response. A follow-up question: when you say unlimited retention, what exactly do you mean? Below is our current configuration. Does it look fine?
retentionResolutionRaw: 30d
retentionResolution5m: 30d
retentionResolution1h: 10y
We are also seeing the errors below in the Compactor logs. What exactly do they signify, and could they be caused by a wrong retention setting in the configuration?
level=warn ts=2024-01-24T13:38:07.872259646Z caller=objstore.go:386 group="0@{cluster="", env="uat", prometheus="observability/kube-prometheus-stack-prometheus", prometheus_replica="prometheus-kube-prometheus-stack-prometheus-2"}" groupKey=0@1151584605916149957 msg="failed to remove file on partial dir download error" file=/data/compact/0@1151584605916149957/01HFT1FJBC0FRECWZB25NHY7AT err="remove /data/compact/0@1151584605916149957/01HFT1FJBC0FRECWZB25NHY7AT: directory not empty"
level=error ts=2024-01-24T13:38:07.872381278Z caller=compact.go:499 msg="retriable error" err="compaction: group 0@1151584605916149957: download block 01HFT1FJBC0FRECWZB25NHY7AT: copy object to file: write /data/compact/0@1151584605916149957/01HFT1FJBC0FRECWZB25NHY7AT/chunks/000012: no space left on device"
level=warn ts=2024-01-24T14:00:17.88467112Z caller=objstore.go:386 group="0@{cluster="dev-test", env="uat", prometheus="observability/kube-prometheus-stack-prometheus", prometheus_replica="prometheus-kube-prometheus-stack-prometheus-2"}" groupKey=0@1151584605916149957 msg="failed to remove file on partial dir download error" file=/data/compact/0@1151584605916149957/01HFT1FJBC0FRECWZB25NHY7AT err="remove /data/compact/0@1151584605916149957/01HFT1FJBC0FRECWZB25NHY7AT: directory not empty"
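The block ID in these log lines is a ULID, and the first ten characters of a ULID encode its creation time in milliseconds (Crockford base32). Decoding it gives a hint about how old the block being downloaded is. This is a minimal Python sketch, and `ulid_time` is a hypothetical helper name:

```python
from datetime import datetime, timezone

# Crockford base32 alphabet used by ULIDs (no I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_time(ulid):
    """Decode the 48-bit millisecond timestamp from a ULID's first 10 chars."""
    ms = 0
    for ch in ulid[:10]:
        ms = ms * 32 + CROCKFORD.index(ch)
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


print(ulid_time("01HFT1FJBC0FRECWZB25NHY7AT").date())  # 2023-11-21
```

So the block in the error message was created in November 2023, which matches the November time range reported from its meta.json.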
from thanos.
@anilreddyb with 10 years of retention on 1h-downsampled metrics you will have problems in your system. Keep in mind that the Compactor has to be "aware" of literally all the blocks you have in your object storage. The Compactor (and Store Gateway) are often listing all blocks, checking their meta files, checking for markers (other metadata) stored as files, etc. Now imagine the number of requests that 10 years of blocks will generate. Factor in that some providers will charge you based on the number of API requests...
Otherwise, focusing on your logs and the fact that you have no disk space, that might be the reason for the other failures. I recommend scaling your Compactor down to 0 replicas, cleaning up your PVC, and restarting it.
from thanos.
I executed the commands below, deleting the previous data. After some time, a new folder was generated, and the data within it was associated with the date range of October 12th to 16th. My inquiry is why the data specifically references the month of October. It's important to note that this data is already present in the S3 bucket. Since the deletion of the old data, the /data directory is now at 100% free, eliminating any disk space concerns. Therefore, it seems the issue is unrelated to disk space.
thanos tools bucket retention --objstore.config-file=/conf/objstore.yml
thanos tools bucket cleanup --delete-delay=0s --objstore.config-file=/conf/objstore.yml
from thanos.
> why the data consistently points to the month of October
I don't understand the question. What do you mean by "data consistently points"?
from thanos.
@douglascamata,
I removed the old data within the /data/compact/ folder. Subsequently, new folders were generated. When I examine the minTime and maxTime in the meta.json file, they show timestamps from October. My question is, why does it indicate October when the file was created today? The timestamp should reflect the current date and month if the file was generated today.
With a retention period of 30 days, the meta.json file should ideally show a timestamp within this timeframe. If it indicates October instead, there may be a configuration issue with the retention policy or data management process that needs to be addressed.
retentionResolutionRaw: 30d
retentionResolution5m: 30d
retentionResolution1h: 10y
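To make the effect of these three settings concrete, here is a small Python sketch of a per-resolution retention check. It assumes that a block is eligible for deletion once its maxTime is older than the retention configured for its downsampling resolution (0 for raw, 300000 ms for 5m, 3600000 ms for 1h). `is_expired` is a hypothetical helper, and "10y" is approximated as 3650 days.

```python
from datetime import datetime, timedelta, timezone

# Retention per downsampling resolution in milliseconds,
# mirroring the three settings quoted above.
RETENTION = {
    0: timedelta(days=30),             # raw
    300_000: timedelta(days=30),       # 5m
    3_600_000: timedelta(days=3650),   # 1h, "10y" approximated
}


def is_expired(resolution_ms, max_time_ms, now):
    """True if the block's newest sample is older than its retention."""
    max_time = datetime.fromtimestamp(max_time_ms / 1000, tz=timezone.utc)
    return max_time < now - RETENTION[resolution_ms]


now = datetime(2024, 1, 24, tzinfo=timezone.utc)
october = int(datetime(2023, 10, 16, tzinfo=timezone.utc).timestamp() * 1000)
print(is_expired(0, october, now))          # raw block from October
print(is_expired(3_600_000, october, now))  # 1h block from October
```

Under these assumptions a raw October block is past its 30d retention while a 1h October block is kept for years. Because deletion is described earlier in the thread as the very last step of the Compactor's algorithm, a Compactor that keeps failing before that step can still see and download blocks that are already past retention.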
from thanos.
There should be one meta.json file per block you have in object storage, never only one (unless you only have 1 block, of course). And you are still keeping 10y of 1h-resolution data.
There's no issue that we are aware of with retention policy or data management.
from thanos.
You need to let the Compactor run and monitor its metrics to see whether it's working. Looking at the filesystem without a deep understanding of how the Compactor works will only confuse you.
from thanos.