Overall Health Metric of a System

Check this


Curious about outages, distributed systems, DevOps, backend, and SRE-related stuff.

Prometheus is one of the most widely used tools for monitoring system metrics in many companies. The tool itself exposes almost 130 different metrics, but we don't need all of them every time we run a monitoring task. Some of those metrics describe its TSDB, yet apparently there aren't enough of them to reveal much about the state of the TSDB. So the point is that anything exposing a set of metrics should also expose an overall health metric.

This resolves two issues for SREs.

  • There is no need to search through all the metrics for one piece of info that we expect to be there, and no need to run many grep searches for keywords like error, fail, or corrupt. No more hunting for those particular 8 TSDB metrics out of 130; a single overall health metric for the TSDB should be enough.

  • Whenever a new update lands in a system, its metrics change accordingly. This becomes an issue for the folks who track metrics on a day-to-day basis, because they have to add the new metrics to the dashboard and rebuild it properly. If something goes wrong or breaks along the way, that becomes one more challenge in the chain.

So how should this work? The overall metric should cover the basics of most of the underlying metrics and give an average health of the system. That should be enough for whoever is watching Prometheus to decide which area they need to dig into further. The overall metric should also be applied both to sub-systems and to the global system, to avoid unnecessary chaos in the team.
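As a rough illustration of the idea, here is a minimal Python sketch that averages a few per-check scores into one health value per sub-system and one for the global system, then prints them in the Prometheus text exposition format. Everything named here is an assumption made for the example: the metric name `system_health_score`, the `scope` label, the `ratio_ok` scoring rule, and the choice to treat "no data" as healthy. It is not something Prometheus ships with.

```python
from statistics import mean


def ratio_ok(bad, total):
    """Score in [0, 1]: 1.0 when nothing failed, lower as failures grow.

    Hypothetical scoring rule for counters such as failed vs. total operations.
    """
    if total <= 0:
        return 1.0  # assumption: no traffic is treated as healthy
    return max(0.0, 1.0 - bad / total)


def subsystem_health(scores):
    """Average the per-check scores of one sub-system."""
    return mean(scores) if scores else 1.0  # assumption: no checks means healthy


def overall_health(subsystems):
    """Take {name: [scores]} and return (per-sub-system health, global health)."""
    per_sub = {name: subsystem_health(s) for name, s in subsystems.items()}
    global_score = mean(list(per_sub.values())) if per_sub else 1.0
    return per_sub, global_score


def render(per_sub, global_score):
    """Render the scores as gauge samples in the Prometheus text format."""
    lines = [
        "# HELP system_health_score Averaged health, 0 (bad) to 1 (good).",
        "# TYPE system_health_score gauge",
    ]
    for name in sorted(per_sub):
        lines.append(f'system_health_score{{scope="{name}"}} {per_sub[name]:.3f}')
    lines.append(f'system_health_score{{scope="global"}} {global_score:.3f}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative inputs only: 2 failed operations out of 100, one passing
    # check, and an alerting sub-system that is half healthy.
    checks = {
        "tsdb": [ratio_ok(2, 100), 1.0],
        "alerting": [0.5],
    }
    per_sub, global_score = overall_health(checks)
    print(render(per_sub, global_score), end="")
```

A plain average is the simplest choice and matches the "average health" wording above. A weighted average, or taking the minimum score so that one broken check dominates, would be reasonable alternatives depending on how alarming a single failure should be.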

Suppose you have a fairly large system with log capturing and alerting mechanisms at both the sub-system level and the global level. You can then add a webhook to the logging system that tracks the last time a message arrived from each sub-system, along with its priority level. A Prometheus metric can be attached to that timestamp for future use.

Thanks for reading up to here.

source - HaveGeneralHealthMetric
