A log monitor that watches a file for certain strings missed one of the strings it was configured to catch. Here's the timeline of events showing why it didn't capture the error:
- The monitor was set up to tail the log file every 5 minutes and capture everything written to the log since its last run. This is by design with a vended application we use for monitoring.
- The monitor ran at 12:59 AM and didn't find any errors.
- The error came in at 1:00:59 AM with the string "LOG EXCEPTION".
- The log file rolled over because it had hit its size limit.
- The monitor ran at 1:04 AM and tailed the file again, but the error was now in the rolled file.
- At 10:00 AM, the business user reported the problem.
Gotcha! Clearly we missed this in our thinking when we set up the monitor. We've now configured our monitoring to always look at the last two log files. Since the files don't grow too quickly, that should suffice (given the 5-minute interval).
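For anyone rolling their own check outside a monitoring product, here's a rough sketch of the same idea in Python: scan the newest two files that match a pattern instead of only the live one. This is not how our vended tool does it. The function name `scan_recent_logs`, the `app.log*` naming pattern, and the use of modification time to decide which files are "newest" are all assumptions made up for illustration.

```python
import glob
import os


def scan_recent_logs(pattern, needle, max_files=2):
    """Return (path, line) pairs for lines containing `needle`.

    Only the `max_files` most recently modified files matching
    `pattern` are scanned. Both the glob pattern and the use of
    mtime to rank files are assumptions for this sketch.
    """
    # Newest files first, then keep only the ones we care about.
    paths = sorted(glob.glob(pattern), key=os.path.getmtime, reverse=True)
    hits = []
    for path in paths[:max_files]:
        # errors="replace" keeps a stray bad byte from aborting the scan.
        with open(path, "r", errors="replace") as fh:
            for line in fh:
                if needle in line:
                    hits.append((path, line.rstrip("\n")))
    return hits
```

Calling `scan_recent_logs("/var/log/myapp/app.log*", "LOG EXCEPTION")` (a hypothetical path) would pick up the error even if it had already moved to the rolled file. Note that this sketch rescans whole files on every run, so the same line would be reported more than once unless you track what you've already seen.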
So if you use Sitescope, keep this in mind. Don't get caught with your pants down. I'm sure by now I've lost everyone who uses Sitescope for monitoring (they're now checking they don't have similar gaps). :-)
Cheers!