Monitors That Haven't Reset

  • Question

  • Hello - hope I'm posting to the correct area.

     

    We have recently re-installed SCE to the same server after doing a complete un-install.  Overall, the second installation has gone well.  At this time, the only items that are giving me fits revolve around the Management Server showing as being in a critical state.

     

    There are two different monitors under Entity Health - Availability that are showing as failed.  The first is the Data Warehouse Configuration Synchronization Data Reader State - server (Data Warehouse Synchronization Server), and the second is the SDK Service - Database Connectivity - server (Root Management Server).  Both have been showing a critical state for the past several days, back to about the time that work was being done on installing SCE and making modifications to the SQL instance.

     

    The server has been reset various times since then, and seems to be functioning.  Since we are still in the initial deployment phase, I have tried to export the Operations Manager after putting the server in Maintenance Mode and letting it come back out of Maintenance Mode, and the state of the errors does not change.

     

    I believe that both aspects are functioning correctly now, but I cannot seem to find a way to get the current state to clear and check itself again.  Is there a way to reset this monitoring on the SCE server itself?  Other monitors, like the Health Service Handle Count Threshold, seem to go from OK to Critical and back to OK, which is expected.

     

    Thanks in advance.

     

    Jim

    Thursday, August 9, 2007 4:44 PM

All replies

  • Jim,

    Have you gone to the Monitoring page and under Active Alerts selected Close Alert for the alerts that you mention? Do they continue to reappear in the alert list?

    I had a similar issue immediately after installing SCE where I got a number of Critical Alerts for some of my SQL related services on my management server. Once they were cleared they never appeared again. It seems to be due to the monitor checking the state faster than the time that the service requires to start.

    HTH,

    Tim

    Thursday, August 9, 2007 9:15 PM
  • Hi Tim -

     

    Thanks for your suggestion.  I did not find any active alerts for this server, even when making the Show Alerts For time parameter greater than the amount of time that the server has been up.  The monitoring state was set that way because the rule was seeing a "no database connection" error, and not finding a follow-up "database connection established" acknowledgement in the Operations Manager events.
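
    To illustrate the behaviour described above, here is a minimal Python sketch of a two-state, event-driven monitor. It is not how SCE is implemented; the class, the event names, and the state labels are hypothetical and chosen only for illustration. It shows why a missed recovery event can leave the state stuck at Critical even though the underlying service has recovered.

```python
from dataclasses import dataclass

# Hypothetical event names; the real event IDs are not given in this thread.
FAILURE_EVENT = "no_database_connection"
RECOVERY_EVENT = "database_connection_established"


@dataclass
class TwoStateEventMonitor:
    """Toy monitor whose state changes only when a matching event arrives."""

    state: str = "Healthy"

    def observe(self, event: str) -> None:
        if event == FAILURE_EVENT:
            self.state = "Critical"
        elif event == RECOVERY_EVENT:
            self.state = "Healthy"
        # Any other event (for example a reboot) leaves the state untouched.


monitor = TwoStateEventMonitor()
for event in [FAILURE_EVENT, "server_restarted", "maintenance_mode_ended"]:
    monitor.observe(event)
stuck_state = monitor.state  # still "Critical": no recovery event was seen

monitor.observe(RECOVERY_EVENT)
recovered_state = monitor.state
```

    In this toy model, restarts and Maintenance Mode do not change the state, which matches what I was seeing on the server.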

     

    After reading your post and finding no active alert to close, I found a task called Flush Health Service State and Cache.  I'm not even sure where I was when I found that option, but it seems to become exposed when a computer is in a faulted state.  Executing that routine always produces an error, since the task never records properly that it completed because of the flush process.  The end result is that Health Monitor now sees the current state of the server and finds no errors.

     

    Thanks for your response - hope this will help someone else in the future.

     

    Jim

     

    Friday, August 10, 2007 6:08 PM
  • Hi,

    I have a similar problem - all alerts removed and still a critical state on the managed server.

    How are you supposed to clear the state?
    I have tried Reset Health in the Health Explorer with no success, and I cannot find any "Flush Health Service State and Cache" task.

    I have even tried putting the server in maintenance mode (although I've read you should not do that), but it did not help clear the state, nor could I find any Flush task.

    Another note: It's quite strange that on the Computers view the Health State is OK, but on the Monitoring view the state is critical.

    Besides the above state problem, there are some related problems according to the "Operations Manager" log in the Event Viewer.

    The SQL Server Express option was selected during installation, and the first event (the bottom one) is strange, as it complains about not being able to log in. (These events came after a restart of the server.)

    Any suggestions or ideas about what has gone wrong?

    Thanks


    -------------------------
    Warning:
    Summary: 1 rule(s)/monitor(s) failed and got unloaded, 1 of them reached the failure limit that prevents automatic reload. Management group "COMPILER12_MG". This is summary only event, please see other events with descriptions of unloaded rule(s)/monitor(s).

    ---
    Warning:
    Enforce GPO Link for SCE : The script 'SCE_Enforce_GPO_Link' failed to get GPMC Object ''.

    ---
    Error:

    Failed to create process due to error '0x80070003 : The system cannot find the path specified.
    ', this workflow will be unloaded.

    Command executed:    "C:\WINDOWS\system32\windowspowershell\v1.0\powershell.exe" . '"C:\Program Files\System Center Essentials 2007\Health Service State\Monitoring Host Temporary Files 7\4297\MPUpdate.ps1"' -MPVersionFileUrl:'http://go.microsoft.com/fwlink/?LinkId=57329'
    Working Directory:    C:\Program Files\System Center Essentials 2007\Health Service State\Monitoring Host Temporary Files 7\4297\

    One or more workflows were affected by this. 

    Workflow name: Microsoft.SystemCenter.CheckForManagementPackUpdates
    Instance name: compiler12.blabla.com
    Instance ID: {AE4C3BD9-81B6-C78A-A9B4-A2299553B4E8}
    Management group: COMPILER12_MG

    ---
    Warning:

    In PerfDataSource, could not find counter OpsMgr DW Synchronization Module, Data Items/sec, All Instances in Snapshot. Unable to submit Performance value. Module will not be unloaded.

    One or more workflows were affected by this. 

    Workflow name: Microsoft.SystemCenter.DataWarehouse.CollectionRule.Performance.Synchronization.DataItemsPerSecond
    Instance name: compiler12.blabla.com
    Instance ID: {AE4C3BD9-81B6-C78A-A9B4-A2299553B4E8}
    Management group: COMPILER12_MG

    ---
    Warning:

    In PerfDataSource, could not find counter OpsMgr DW Synchronization Module, Avg. Batch Size, All Instances in Snapshot. Unable to submit Performance value. Module will not be unloaded.

    One or more workflows were affected by this. 

    Workflow name: Microsoft.SystemCenter.DataWarehouse.CollectionRule.Performance.Synchronization.AvgBatchSize
    Instance name: compiler12.blabla.com
    Instance ID: {AE4C3BD9-81B6-C78A-A9B4-A2299553B4E8}
    Management group: COMPILER12_MG

    ---
    Warning:

    In PerfDataSource, could not find counter WSUS: Server Web Methods, Web method exceptions,  in Snapshot. Unable to submit Performance value. Module will not be unloaded.

    One or more workflows were affected by this. 

    Workflow name: Microsoft.Windows.Server.UpdateServices.3.Server.ServerWebMethodExceptions.Collection
    Instance name: WSUS
    Instance ID: {8C26CE44-2083-AE00-D780-340ACC6664AE}
    Management group: COMPILER12_MG

    ---
    Error:
    A database exception was thrown in the Operations Manager SDK service. Exception Message: Cannot open database "OperationsManager" requested by the login. The login failed.
    Login failed for user 'NT AUTHORITY\SYSTEM'.
    Friday, August 24, 2007 10:38 AM
  • Hi Jonas,

     

    To find the "Flush Health Service State and Cache" task, follow these steps.

     

    1. Run SCE console and navigate to Monitoring -> Computers.

     

    2. In the Computers section, locate the line for the related computer. If it is the SCE server, double-click its icon in the "Management Server" column. If it is not the SCE server, double-click its icon in the "Agent" column.  (Note: Pay close attention to where you double-click. If everything works, a new "State" dialog should open.)

     

    3. In the newly opened "State" dialog, right-click the related computer's item in the "Name" column, and then choose "Health Service Tasks". You will find the "Flush Health Service State and Cache" task in the list.
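
    If the console task cannot be located, a manual equivalent that is commonly described for Operations Manager is to stop the health service, move the "Health Service State" folder aside, and start the service again. The Python sketch below only illustrates that idea and is not an official procedure: the service short name "HealthService" and the helper names are assumptions, and the folder path is the one shown in the event text posted earlier in this thread. It defaults to a dry run that prints the steps without changing anything.

```python
import subprocess
from pathlib import Path, PureWindowsPath

# Assumption: "HealthService" is the short name of the health service.
SERVICE_NAME = "HealthService"
# Folder path as it appears in the event text posted in this thread.
STATE_DIR = PureWindowsPath(
    r"C:\Program Files\System Center Essentials 2007\Health Service State"
)


def manual_flush_plan(state_dir=STATE_DIR, service=SERVICE_NAME):
    """Return the steps as (action, argument) pairs without running anything."""
    backup_dir = state_dir.with_name(state_dir.name + ".old")
    return [
        ("stop_service", service),
        ("rename_folder", (str(state_dir), str(backup_dir))),
        ("start_service", service),
    ]


def run_plan(plan, dry_run=True):
    """Print each step; execute it only when dry_run is False (Windows only)."""
    for action, arg in plan:
        print(action, arg)
        if dry_run:
            continue
        if action == "stop_service":
            subprocess.run(["net", "stop", arg], check=True)
        elif action == "start_service":
            subprocess.run(["net", "start", arg], check=True)
        elif action == "rename_folder":
            Path(arg[0]).rename(arg[1])


plan = manual_flush_plan()
run_plan(plan)  # dry run: prints the three steps only
```

    Renaming the folder rather than deleting it keeps the old state available in case it needs to be inspected later.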

     

    As for the different Health State shown in the Computers panel and the Monitoring panel, the two panels draw on different sources: the Computers panel uses inventory data, while the Monitoring panel uses real-time monitoring data.

     

    We'd recommend that you start a new thread for the other issues you mentioned. We generally keep each thread to one topic, because that makes it easier for other community members to participate in the discussion and to find specific answers in the future.

     

    Also, please post events with their full details, as in the example below, by using the "Copies the details of the event to the Clipboard" button. That will be more helpful.

     

    Event Type: Error

    Event Source: OpsMgr Connector

    Event Category: None

    Event ID: 20002

    Date:  7/31/2007

    Time:  5:15:46 PM

    User:  N/A

    Computer: FILE-OS3-02

    Description:

    A device at IP xx.xx.xx.xx:8712 attempted to connect but could not be authenticated, and was rejected.

    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
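
    Events copied in this format are also easy to process with a script. As a small Python sketch (the function name is hypothetical, and it assumes the "Key: value" layout shown above, followed by a free-text Description section), the fields can be split out like this:

```python
SAMPLE = """\
Event Type: Error
Event Source: OpsMgr Connector
Event Category: None
Event ID: 20002
Date:  7/31/2007
Time:  5:15:46 PM
User:  N/A
Computer: FILE-OS3-02
Description:
A device at IP xx.xx.xx.xx:8712 attempted to connect but could not be authenticated, and was rejected.
"""


def parse_event_details(text: str) -> dict:
    """Split clipboard-style event text into header fields and a description."""
    lines = [line.strip() for line in text.splitlines()]
    fields = {}
    description_start = None
    for index, line in enumerate(lines):
        if line == "Description:":
            description_start = index + 1
            break
        if ":" in line:
            # Split on the first colon only, so times such as 5:15:46 survive.
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    if description_start is not None:
        # Everything after "Description:" is free text; join non-empty lines.
        fields["Description"] = " ".join(
            line for line in lines[description_start:] if line
        )
    return fields


event = parse_event_details(SAMPLE)
print(event["Event Source"], event["Event ID"])
```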

     

    Thanks.

     

                                              

    Sincerely,

    Yog Li

    Microsoft Online Community Support

     

    Monday, August 27, 2007 10:46 AM
  • The "Flush Health Service State and Cache" task worked.

     

    Thanks

    Wednesday, August 29, 2007 7:11 AM