Monitor: Unable to Process Windows Event Log

    Question

  • Our biggest source of alert noise right now is the "Unable to Process Windows Event Log" alert generated by the HealthService. It is way ahead of every other alert in volume, though not in repeat count.

    The problem is that I don't see any real errors behind these events. I see the unhealthy event being generated, then a minute or so later the healthy event, and the monitor keeps flipping back and forth. I haven't found any genuinely corrupt event logs on these servers yet. It's mostly the same handful of servers generating these alerts, and maybe a few of them do have corrupt event logs, but how do I find the ones with real issues?

    Any insight into this monitor and how I can evaluate or tune it? 
    Thursday, April 7, 2011 1:02 PM

Answers

  • The problem here is that the monitor does notice a corrupt event log, but Windows servers have more than one event log. When the next event log is read without problems, the monitor is (incorrectly) reset to healthy. The next time the agent tries to read the corrupt event log, the monitor changes state again until another good event log is read, and so on.

    I can't remember whether the event log name is shown in Health Explorer. If it isn't, just clear all event logs on the server generating the alert.
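
    The flapping described above can be illustrated with a small simulation. This is only a sketch of the behaviour as described in this post, not the actual management pack logic: the function names, the `per_log_reset` switch, and the list of logs are all made up for illustration.

```python
def simulate_monitor(logs, corrupt, passes=2, per_log_reset=False):
    """Simulate an agent reading each log in turn (hypothetical model).

    per_log_reset=False: any successful read resets the monitor to Healthy,
    which is the behaviour described in this thread.
    per_log_reset=True: the monitor is Healthy only when no log is failing.
    """
    failing = set()
    state = "Healthy"
    history = []
    for _ in range(passes):
        for log in logs:
            if log in corrupt:
                failing.add(log)
                state = "Unhealthy"
            elif per_log_reset:
                failing.discard(log)
                state = "Unhealthy" if failing else "Healthy"
            else:
                # A good read of a *different* log clears the bad state.
                state = "Healthy"
            history.append((log, state))
    return history


def count_flips(history):
    """Count how many times the monitor state changed."""
    states = [state for _, state in history]
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


logs = ["Application", "System", "Security"]   # illustrative log names
generic = simulate_monitor(logs, corrupt={"System"})
specific = simulate_monitor(logs, corrupt={"System"}, per_log_reset=True)
print(count_flips(generic), count_flips(specific))
```

    With a generic reset the state changes on nearly every read around the corrupt log, while a per-log reset changes state once and stays unhealthy until that log is fixed.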


    Regards,
    Marc Klaver
    http://jama00.wordpress.com/
    • Marked as answer by Vivian Xing Friday, April 15, 2011 8:18 AM
    Thursday, April 7, 2011 4:49 PM
  • Hi,

    Please check if information in the following link will help:

    http://www.eggheadcafe.com/software/aspnet/33484310/how-to-fix-scom-falsly-detecting-corrupted-system-event-log.aspx


    • Marked as answer by Vivian Xing Friday, April 15, 2011 8:18 AM
    Friday, April 8, 2011 6:51 AM
  • The name of the corrupt event log is given in the alert in the OpsMgr event log; I believe it can be seen in the OpsMgr console as well. The reason you get so many alerts is that the healthy event is triggered by a general "good" event rather than one specific to the event log that had the problem, and that general event occurs about every minute.

    We stumbled on this as well during a support call with Microsoft about something else. The engineer was going to investigate and escalate it, so I hope it gets fixed in a future release of the core MPs. For now there is not much you can do other than check this alert regularly (create a closed-alert view for it) and clear the corrupted event logs as soon as possible. I have always found a corrupt event log, so in my opinion the unhealthy state is correct.
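
    One way to separate servers with a real problem from pure noise is to tally which event log each alert names, per server. The sketch below assumes you have already exported the alerts into (server, log name) pairs; the export step, the record layout, the `min_hits` threshold, and the sample values are all hypothetical and not taken from any OpsMgr API.

```python
from collections import Counter, defaultdict


def tally_named_logs(records):
    """Count, per server, how often each event log is named in an alert."""
    tally = defaultdict(Counter)
    for server, log_name in records:
        tally[server][log_name] += 1
    return tally


def repeat_offenders(records, min_hits=3):
    """Return (server, log_name, hits) for logs named at least min_hits times."""
    return sorted(
        (server, log_name, hits)
        for server, counts in tally_named_logs(records).items()
        for log_name, hits in counts.items()
        if hits >= min_hits
    )


# Made-up sample records, for illustration only.
records = [("SRV01", "System")] * 4 + [
    ("SRV01", "Application"),
    ("SRV02", "Application"),
]
print(repeat_offenders(records))
```

    A log that is named again and again on the same server is the first candidate to inspect and clear.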


    Rob Korving
    http://jama00.wordpress.com/
    • Marked as answer by Vivian Xing Friday, April 15, 2011 8:18 AM
    Friday, April 8, 2011 7:45 AM
