What's the best way to monitor SAM uptime and health status? I'm after some sort of service status dashboard with a single "worst status" indicator that I could put on the SAM home page and set up an alert for.
The reason I ask is that we had a couple of scary things happen recently. One, SolarWinds.DataProcessor.exe started crashing after we P2V'd our SolarWinds SAM server and joined it to the domain, yet the web console looked good, with no indication of any problem. After a while we noticed that the "Last Database Update" was stale instead of "now" or "10 seconds ago":
...and then I found myself talking to the monitor, "so, you mean, you CRASHED, and crashed hard, stopped working completely, yet everything is green and happy?"
So SAM was aware that all monitoring stopped dead in its tracks yet displayed no alerts other than the reminder to renew maintenance... I'd prefer all nodes and monitors to turn apoplectically red to get my attention, not stay green like nothing happened. Am I wrong in that?
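To make concrete what I'd expect, here is a minimal sketch of the kind of staleness check I have in mind. It is not a SolarWinds API: the function name, the five-minute threshold, and the status strings are all my own assumptions, and the timestamps are hard-coded stand-ins for whatever the console shows as "Last Database Update".

```python
from datetime import datetime, timedelta, timezone

# Hypothetical threshold: how old the last database update may be before
# polling is treated as dead. This is my assumption, not a SolarWinds setting.
STALE_AFTER = timedelta(minutes=5)


def update_status(last_update, now, stale_after=STALE_AFTER):
    """Return "critical" if the last database update is too old, else "up"."""
    age = now - last_update
    return "critical" if age > stale_after else "up"


# Made-up timestamps standing in for the "Last Database Update" value.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
fresh = update_status(now - timedelta(seconds=10), now)
stale = update_status(now - timedelta(hours=2), now)
print(fresh, stale)
```

With something like this, a two-hour-old update would be flagged as critical rather than leaving everything green.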
Two, our alerting stopped working: no NetPerfMon logging, no emails, no errors or warnings in the Windows logs, nothing. Sure, the web console would display signs of failure if the views were set up right, yet when alerts silently stop working, it's a systemic failure of a mission-critical and fairly expensive component. Then alerting started again on its own after everyone had gone home, but only partially: some alerts work, some don't. I totally understand that an app may not always be self-aware enough to know it has failed, yet when it does know, and when its main mission is to alert if something is wrong, shouldn't it alert by default?
Is there a simple way to add SAM health status to a view, ideally any view, and specifically, to alert if SAM is aware of any abnormal conditions of its own?
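For clarity, this is roughly what I mean by a single "worst status" indicator, as a small sketch. The check names and the `Status` levels are hypothetical placeholders of mine, not anything SAM exposes; the point is only that the overall status is the most severe of the individual self-checks, and that having no checks at all counts as a failure.

```python
from enum import IntEnum


class Status(IntEnum):
    """Severity levels, ordered so that a higher value is worse."""
    UP = 0
    WARNING = 1
    CRITICAL = 2


def worst_status(checks):
    """Return the most severe status; an empty set of checks is CRITICAL."""
    return max(checks.values(), default=Status.CRITICAL)


# Hypothetical self-checks; the names are illustrative only.
checks = {
    "data_processor_running": Status.UP,
    "database_update_fresh": Status.CRITICAL,
    "alerting_engine_running": Status.WARNING,
}
overall = worst_status(checks)
print(overall.name)
```

A single value like `overall` is what I'd want to show on a view and alert on.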