Comments (12)
Re Work Item: I don't mean an Issue in the bug sense (where there is some problem), but a Feature Request. When you create an Issue, you can choose the type. Feature Request means you want something added to the technology. The template explains what you need to provide.
from service-fabric-observer.
Hi Mark,
This feature isn't actually ready for prime time. There is an active work item tracking this (internal) and I will let you know when it comes to fruition. I think the documentation is ahead of reality here. Even in my local tests, I am not getting the results I expect.
This means that there is nothing FO can provide in the near term. The data you are getting back is not correct (it doesn't seem like 79 bytes is realistic for your stateful service replicas...). Sorry for confusing you. Let's hold off on this for now until I get back to you. Feel free to leave this Issue open in the interim.
from service-fabric-observer.
Thanks for the report. That is indeed a doc bug.
There is currently no disk-related monitoring done by AppObserver.
I will correct the documentation.
from service-fabric-observer.
Fixed the documentation to reflect reality. Thanks again for catching that.
In terms of adding disk monitoring capability to AppObserver, feel free to create a Work Item and it will be looked into. What disk IO metrics do you want to measure and apply thresholds to?
from service-fabric-observer.
Hey, not quite sure what you mean by creating a work item. Do you want a separate issue? I'm mostly interested in disk consumption on a per app / service basis, so that we can have a way to track how much disk space each service consumes and how it changes over time. I don't know how practical that would be to do though.
from service-fabric-observer.
Hi,
So, that would be something like tracking (on Windows) WriteTransferCount, which is the number of bytes written to disk by a process. It's what you see in Task Manager, Details view, for a process when you add the "I/O write bytes" column. Implementation-wise, that is easy to add to AppObserver, but users would need to supply a Warning threshold to enable it and it is unclear to me if users know what constitutes misbehavior. So, maybe the service writes data to logs or some other file(s) and this could amount to GBs of data. What constitutes too much? That would be left to the user to decide, but observers only monitor resources that have thresholds specified, so you could just use a really large value to limit Warning noise or if you know that your service is supposed to manage the disk space it consumes, you could warn when it eats 10GB or something, which could signal that your disk cleanup code is failing. Again, this would be up to the user.
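The threshold behavior described above can be sketched roughly as follows. This is only an illustration, not FabricObserver's actual code: the function name `evaluate_write_bytes` and the "Ok"/"Warning" strings are hypothetical. The only behavior taken from the comment is that a metric with no threshold supplied is not monitored at all.

```python
from typing import Optional

GB = 1024 ** 3  # assumption: binary gigabytes


def evaluate_write_bytes(write_bytes: int, warning_threshold: Optional[int]) -> Optional[str]:
    """Hypothetical check: return None when no threshold is supplied
    (the metric is not monitored), otherwise "Warning" if the value
    exceeds the threshold and "Ok" if it does not."""
    if warning_threshold is None or warning_threshold <= 0:
        return None
    return "Warning" if write_bytes > warning_threshold else "Ok"


print(evaluate_write_bytes(12 * GB, None))     # no threshold, so not monitored
print(evaluate_write_bytes(12 * GB, 10 * GB))  # above the 10 GB threshold
print(evaluate_write_bytes(2 * GB, 10 * GB))   # below the 10 GB threshold
```

Choosing a very large threshold, as suggested above, simply means the "Warning" branch is rarely or never reached.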
from service-fabric-observer.
That makes sense, thanks. I'm not sure I/O Write Bytes would be useful, as it looks to me like that value never goes down. So it's not a representation of the current disk space utilisation of a process, but of how much it has written to disk in its lifetime (which for very long-lived processes is going to end up being huge).
The scenario we have is a lot of stateful services that co-locate their state on disk with the code, and it's this "state" disk consumption that would be interesting to track, but I'm not sure whether a metric exists that makes this easy.
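One rough way to approximate current (rather than lifetime) disk consumption is to sum the sizes of the files under a service's directory. The sketch below assumes you already know the directory where a service keeps its state. The function name is hypothetical, and nothing here is part of FabricObserver or Service Fabric.

```python
import os


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # Files can disappear or be locked while walking; skip them.
                continue
    return total
```

Sampling this value on a schedule would show how consumption changes over time, at the cost of walking the directory tree on every sample.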
from service-fabric-observer.
Yeah, you are right. That won't really help.
I am not sure what performance counter would help you here. Are you trying to measure how much replicated state exists on disk?
from service-fabric-observer.
Yes, if possible, with ideally a breakdown per app or service.
from service-fabric-observer.
This information is actually available via TStore SF perfcounters.
I have not had a chance to experiment with this yet, however. You can open Performance Monitor, go to SF counters, look under TStore. Disk Size and Item Count are the droids you're looking for, particularly Disk Size.
from service-fabric-observer.
Hi Charles. Thanks for this, it looks interesting. I found this page that describes the counters: https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-reliable-services-diagnostics
Item Count = The number of items in the store.
Disk Size = The total disk size, in bytes, of checkpoint files for the store.
I used Perfmon to look at them on some of my clusters. They obviously aren't present on nodes that run stateless services, but I did find them on my nodes that have stateful services. However, it seemed a bit strange that the value was the same for every instance (79 in this case); I was expecting it to vary between services. It was also hard to work out which instance belonged to which service, which I assume can be determined from the ID it returns.
If you are interested in exploring this I'd be keen to see how it could be implemented in AppObserver so that it more easily allows you to see the values per service/app.
Oh, I also found they weren't present on my clusters that are still on SF 9.0, but were on the ones running SF 9.1. Do you know whether these counters are new in SF 9.1?
from service-fabric-observer.
I think the issue here is unrelated to the counter implementation, which is fine. It is more a matter of understanding how the counter actually works. I verified that the results are accurate. However, there is something to keep in mind here:
A checkpoint will be initiated when the specified threshold (CheckPointThresholdInMB) is reached, that is, when log usage exceeds this threshold. At that point, the counter will return a non-zero value (one greater than CheckPointThresholdInMB expressed in bytes). The default value for this setting is 50 MB. You can do a local experiment and lower the value for your stateful service (only for testing, mind you; do not use a small value in production).
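To make the arithmetic concrete, here is a small sketch of the behavior as described above. It assumes 1 MB = 1024 × 1024 bytes and simply models "no checkpoint until log usage exceeds the threshold". The function names are hypothetical, and this is not how Service Fabric implements it.

```python
DEFAULT_CHECKPOINT_THRESHOLD_MB = 50  # default value stated above


def threshold_bytes(threshold_mb: int = DEFAULT_CHECKPOINT_THRESHOLD_MB) -> int:
    """Convert the threshold from MB to bytes (assumes binary megabytes)."""
    return threshold_mb * 1024 * 1024


def checkpoint_expected(log_usage_bytes: int,
                        threshold_mb: int = DEFAULT_CHECKPOINT_THRESHOLD_MB) -> bool:
    """True once log usage exceeds the threshold, i.e. when a checkpoint
    would be initiated according to the description above."""
    return log_usage_bytes > threshold_bytes(threshold_mb)


print(threshold_bytes())                             # 52428800
print(checkpoint_expected(10 * 1024 * 1024))         # False with the 50 MB default
print(checkpoint_expected(10 * 1024 * 1024, 5))      # True with a 5 MB test threshold
```

Under this model, a service whose log usage stays below the default threshold would not yet have triggered a checkpoint, which is why lowering the value in a local test makes the counter easier to observe.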
So, the counter is not a problem. It was just understanding what is going on that took some time (plus I frankly haven't had much time to revisit this and when I did I talked to a dev on the SF data team to clear this up).
Note that there is still a work item in progress to work on the overall feature, including performance improvements, better documentation of what the data means, etc. Also, querying the counters from C# code (via PerformanceCounter class) does not work. So, that needs to be sorted out before FO can do any monitoring/reporting for this.
from service-fabric-observer.
Related Issues (20)
- With support for .Net 3.1 ending in 6 months, can it be migrated to .Net 6 (LTS) HOT 6
- [WORK ITEM] Docs
- [BUG] net6 branch: fix typo in README.md HOT 1
- [BUG] Fabric Observer is unable to get private working set for process with long name HOT 7
- [BUG] Deployed fabricobserver failed to start HOT 2
- [BUG] FO can put itself into Warning with an Infinite TTL Health Report.
- [BUG] Linux: FabricSystemObserver does not report results for some services.
- [FEATURE REQUEST] Add RG Monitoring support (Memory). Windows-only.
- [FEATURE REQUEST] Add Private Bytes monitoring to AppObserver. Windows-only.
- [FEATURE REQUEST] Enable Dump on Warning for Windows user service processes.
- [FEATURE REQUEST] Add support to AppObserver for monitoring multiple code packages.
- [BUG] AppObserver does not monitor GuestExecutable services.
- [BUG] ClusterObserver not building HOT 2
- [BUG]: Concurrent monitoring: CPU data deletion mistiming can lead to no results. HOT 2
- [BUG] log directory calculation in FabricObserver.Extensibility/Utilities/DataTableFileLogger.cs, ConfigureLogger is faulty HOT 4
- [BUG] Allocated File Handles metric no longer being logged by AppObserver in FabricObserver 3.2.8 HOT 2
- [BUG] Child process monitoring does not work when parent procs have only one descendant.
- Informing the customer when a plugin is incorrect[BUG]
- [FEATURE REQUEST] Make telemetry parameters configurable in application parameters HOT 15