Event Correlation


  • Hi everyone,

    I'm looking for some help from the Service Manager group on whether Event Correlation, from an ITIL perspective, can be implemented with little to no effort.   Is there a partner with a turn-key solution, or a known consultant who has built event correlation comparable to BMC's solution?

    BMC can demonstrate the use-case scenarios below, but System Center does not provide out-of-the-box Event Correlation for them.  I don't know how much resource effort it would take if we had to develop this in-house.   I do not believe this is a SCOM task, since we want to play it safe on alert accuracy and let each technology layer's MP generate all of its alerts.   I do not believe this is a job for SCORCH either, since the data to be analyzed is centralized in the SCSM CMDB.   The event correlation analysis would need to take into account multiple conditions (date and time, alert, configuration task, change control) and consolidate Incident or Problem views (presentation) of the possible root cause.   I would think this is more of a SQL/SQL view customization on the SCSM database, but I do not know how much customization is allowed at the SQL level.

    Three scenarios:

    • You have a pool of 1000 virtual machines for your IaaS, but you are not monitoring the technology layers above the OS. You need to ensure a fast MTTR: can the Event Correlation technology pinpoint the root cause so that Tier 1 can triage to the correct fabric team (SERVER, STORAGE, or NETWORK)?
    • You have a switch that hosts 48 servers, and the switch fails. The 48 server alerts-to-incidents would storm the CMDB. Is there a way for Event Correlation to collapse the 48 server alert/incident tickets into a single group in the view and create an additional high-priority incident ticket stating that the root cause was the switch, so that it is looked at first rather than the server alerts?   This would prevent clutter in the incident view and pinpoint the root cause.
    • A change control successfully places computers into maintenance mode and implements a package, but the package is now causing unexpected restarts or reboots of the servers, and this is not witnessed until after maintenance mode has been removed.   Here the event correlation would pinpoint the RFC event as the cause, and a roll-back of that change would be the remediation.
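
    To make the second and third scenarios concrete, here is a rough Python sketch of one way the correlation logic could look. It is only an illustration and not an SCSM, SCOM, or BMC API: the Alert and Change classes, the collapse_by_parent and suspect_changes functions, the parent_of topology map, and the 5-minute and 24-hour windows are all hypothetical names and values chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    ci: str              # configuration item that raised the alert (hypothetical field)
    raised: datetime     # when the alert was raised


@dataclass
class Change:
    rfc: str             # change request identifier (hypothetical field)
    cis: set             # configuration items touched by the change
    completed: datetime  # when maintenance mode was lifted


def collapse_by_parent(alerts, parent_of, window=timedelta(minutes=5)):
    """Scenario 2: group child alerts under a parent CI that alerted within `window`.

    `parent_of` maps a child CI to its upstream CI (e.g. server -> switch).
    Only the latest alert time per CI is kept, which is enough for a sketch.
    """
    alerted_at = {a.ci: a.raised for a in alerts}
    groups = defaultdict(list)
    for a in alerts:
        parent = parent_of.get(a.ci)
        if parent in alerted_at and abs(a.raised - alerted_at[parent]) <= window:
            groups[parent].append(a.ci)
    return dict(groups)


def suspect_changes(alert, changes, lookback=timedelta(hours=24)):
    """Scenario 3: RFCs that touched the alerting CI and completed
    no more than `lookback` before the alert was raised."""
    return [
        c.rfc
        for c in changes
        if alert.ci in c.cis
        and timedelta(0) <= alert.raised - c.completed <= lookback
    ]


if __name__ == "__main__":
    t0 = datetime(2013, 9, 17, 20, 0)
    parent_of = {f"server{i:02d}": "switch01" for i in range(1, 49)}
    alerts = [Alert("switch01", t0)]
    alerts += [Alert(ci, t0 + timedelta(seconds=30)) for ci in parent_of]
    groups = collapse_by_parent(alerts, parent_of)
    # One group keyed by the switch, holding the 48 server alerts.
    print(len(groups["switch01"]))
```

    In this sketch the switch becomes the single high-priority root-cause candidate with the 48 server alerts collapsed beneath it, and suspect_changes returns the RFCs worth reviewing for roll-back. Where the topology map and the change records would actually come from in the CMDB is exactly the part I am unsure about.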

    Thanks in advance,

    David Dellanno

    Tuesday, September 17, 2013 8:16 PM