After discovering a UNIX server in SCOM, the server shows a Healthy icon in grey

  • Question

  • Hi all,

    I discovered a UNIX server in SCOM using the console, with root credentials.

    After the agent installed, the UNIX server appears in the console, but its icon shows Healthy in grey.

    No alert was generated for this either.

    Port 1270 is open and I am able to ping the server.

    I also tried resetting and recalculating the health, but no luck.

    I can't find anything in the Event Viewer logs either.

    Can anyone please help me with this?


    AD


    • Edited by AD_SC Thursday, June 6, 2019 11:54 AM
    Thursday, June 6, 2019 11:49 AM

All replies

  • Hello,

    Start by checking the logs for any clues. I suggest you refer to the troubleshooting documentation below:
    Troubleshooting monitoring of UNIX and Linux computers

    Kevin Holman has also written a thorough blog post about monitoring UNIX/Linux agents with SCOM:
    Monitoring UNIX/Linux with OpsMgr 2016


    Check whether you have missed anything and make sure you have completed each step.

    Best regards,
    Leon


    Blog: https://thesystemcenterblog.com

    Thursday, June 6, 2019 12:05 PM
  • Hi,

    Which Operations Manager version are we using? Operations Manager 2012 R2 with an older update rollup may have this problem; in that case we need to upgrade to UR9 or later (the latest update rollup for 2012 R2 is UR14).

    If we are not using 2012 R2: since we have already checked ping, port 1270, and the Secure Shell port (default 22), we can try restarting the UNIX agent to diagnose the issue:

    Pesky UNIX/Linux SCOM Agents (Gray State) – RETURN CODE: 1
    http://www.systemcentercentral.com/pesky-unixlinux-scom-agents-gray-state-return-code-1/
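    A successful ping only shows that ICMP gets through; it says nothing about TCP 1270 (the agent's WS-Management port) or TCP 22 (SSH, used to deploy the agent). The minimal Python sketch below, assuming Python 3 is available on the machine you test from (normally the management server that monitors the agent), simply attempts a TCP connection to each port. The function names and the AGENT_PORTS mapping are illustrative helpers, not part of SCOM.

```python
import socket

# Ports discussed in this thread: 1270 is the SCOM UNIX/Linux agent port,
# 22 is the default SSH port used for agent deployment.
AGENT_PORTS = {"agent (WS-Management)": 1270, "SSH": 22}


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_agent_ports(host: str, ports: dict = AGENT_PORTS) -> dict:
    """Map each labelled port to whether it accepted a TCP connection."""
    return {label: tcp_port_open(host, port) for label, port in ports.items()}
```

    For example, check_agent_ports("unixserver01") (a hypothetical host name) returns a dict such as {"agent (WS-Management)": True, "SSH": True} when both ports accept connections. A False for 1270 points at a firewall or a stopped agent rather than at SCOM itself.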

    If we have access to the data warehouse database (OperationsManagerDW), we can run the following query to get the reason the agent is gray:

    USE OperationsManagerDW
    -- List health service outages that have not ended yet (the gray state),
    -- with the outage start, its age in days, and the reason code and text.
    SELECT
        ME.Path,
        HSO.StartDateTime AS OutageStartDateTime,
        DATEDIFF (DD, HSO.StartDateTime, GETUTCDATE()) AS OutageDays,
        HSO.ReasonCode,
        DS.Name AS ReasonString
    FROM  vManagedEntity AS ME INNER JOIN
        vHealthServiceOutage AS HSO ON HSO.ManagedEntityRowId = ME.ManagedEntityRowId INNER JOIN
        -- Match the reason code to its string resource: drop the last dotted
        -- segment of the resource name, then drop the common prefix.
        vStringResource AS SR ON HSO.ReasonCode = 
        REPLACE(LEFT(SR.StringResourceSystemName, LEN(SR.StringResourceSystemName)
        - CHARINDEX('.', REVERSE(SR.StringResourceSystemName))), 
        'System.Availability.StateData.Reasons.', '') INNER JOIN
        vDisplayString AS DS ON DS.ElementGuid = SR.StringResourceGuid
    WHERE (HSO.EndDateTime IS NULL)  -- only outages that are still open
        AND (SR.StringResourceSystemName LIKE 'System.Availability.StateData.Reasons.[0-9]%')
        AND DS.LanguageCode = 'ENU'  -- English display strings
    ORDER BY OutageStartDateTime
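    To make the join and filter conditions easier to follow, here is a rough Python mirror of the query's string logic, assuming a case-insensitive database collation for the LIKE filter. It also shows that DATEDIFF(DD, start, end) counts calendar-day boundaries crossed, so an outage that started at 23:59 is already "1 day" old at 00:01. Because the WHERE clause keeps only rows whose EndDateTime is NULL, a server with no open outage recorded in the data warehouse will not appear at all. The function names are hypothetical helpers.

```python
import re
from datetime import datetime

PREFIX = "System.Availability.StateData.Reasons."


def reason_code_from_resource(name: str) -> str:
    """Mirror REPLACE(LEFT(n, LEN(n) - CHARINDEX('.', REVERSE(n))), PREFIX, '')."""
    # CHARINDEX('.', REVERSE(n)) is the position of the last '.' counted from
    # the end, so LEFT(...) keeps everything before that final dot. With no dot,
    # CHARINDEX returns 0 and the whole name is kept; rsplit behaves the same.
    head = name.rsplit(".", 1)[0]
    # REPLACE then removes the common prefix, leaving only the reason code.
    return head.replace(PREFIX, "")


def matches_reason_filter(name: str) -> bool:
    """Mirror LIKE 'System.Availability.StateData.Reasons.[0-9]%'.

    Assumes a case-insensitive collation, hence re.IGNORECASE.
    """
    pattern = re.escape(PREFIX) + "[0-9]"
    return re.match(pattern, name, flags=re.IGNORECASE) is not None


def outage_days(start: datetime, now: datetime) -> int:
    """Mirror DATEDIFF(DD, start, now): day boundaries crossed, not 24h spans."""
    return (now.date() - start.date()).days
```

    For instance, with a hypothetical resource name such as System.Availability.StateData.Reasons.17.Text, reason_code_from_resource returns "17", which is the value the query compares against HSO.ReasonCode.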

    [Screenshot: query result in a lab running SCOM 1807 with SQL Server 2016]

    Hope the above information helps.

    Regards,

    Alex Zhu
    -----------------------------------------------
    Please remember to mark the replies as answers if they help.
    If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com.
    Friday, June 7, 2019 3:15 AM
  • Hi Alex,

    I am using SCOM 2016.

    I executed this query, but the UNIX server I am having the issue with does not appear in the results.

    The problem is that I am not getting any events in Event Viewer, nor any alerts.

    I also tried restarting the SCOM agent on the UNIX server, but no luck.

    I also tried killing the process and restarting the SCOM agent, but again no luck.
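    For reference, the agent is normally restarted with the scxadmin tool, run as root on the UNIX server. The Python sketch below wraps it, assuming the default install path /opt/microsoft/scx/bin/tools/scxadmin (confirm it on your system). The wrapper functions are hypothetical helpers, and the run parameter exists only so the logic can be exercised without a real agent.

```python
import subprocess

# Assumed default location of the agent's admin tool; confirm it on your server.
SCXADMIN = "/opt/microsoft/scx/bin/tools/scxadmin"
ALLOWED_ACTIONS = ("-status", "-restart")


def scxadmin(action: str, run=subprocess.run) -> subprocess.CompletedProcess:
    """Run scxadmin with one action flag and capture its output (needs root)."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported scxadmin action: {action}")
    return run([SCXADMIN, action], capture_output=True, text=True, check=False)


def restart_and_report(run=subprocess.run) -> str:
    """Restart the agent; on success return the status output, else an error summary."""
    restart = scxadmin("-restart", run)
    if restart.returncode != 0:
        return f"restart failed (exit {restart.returncode}): {restart.stderr.strip()}"
    return scxadmin("-status", run).stdout.strip()
```

    Checking the status right after the restart shows whether the agent processes actually came back up, which is more informative than the restart command alone.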



    AD


    • Edited by AD_SC Friday, June 7, 2019 3:49 AM
    Friday, June 7, 2019 3:40 AM
  • Hi,

    Thank you very much for getting back so quickly. If all the above actions fail, we may consider reinstalling the agent on the UNIX server.

    Remember to run something like the following before reinstalling the agent, to remove the old certificate:

    rm -rf /etc/opt/microsoft/scx/*
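    Linux paths are case-sensitive, and the agent's default configuration directory is /etc/opt/microsoft/scx (lowercase). Under that assumption, a more cautious Python equivalent of the rm -rf command lists what would be deleted before deleting anything. clear_directory is a hypothetical helper; like the shell glob *, it skips names that start with a dot.

```python
import shutil
from pathlib import Path
from typing import List

# Assumed default SCX configuration directory (holds the agent certificate).
SCX_ETC = Path("/etc/opt/microsoft/scx")


def clear_directory(path: Path, dry_run: bool = True) -> List[Path]:
    """Remove everything inside `path`, like `rm -rf path/*`.

    As with the shell glob `*`, entries whose names start with '.' are skipped.
    With dry_run=True (the default) nothing is deleted; the would-be targets
    are only returned.
    """
    targets = sorted(p for p in path.iterdir() if not p.name.startswith("."))
    if not dry_run:
        for p in targets:
            if p.is_dir() and not p.is_symlink():
                shutil.rmtree(p)  # real directory: delete recursively
            else:
                p.unlink()  # file or symlink: remove the entry itself
    return targets
```

    Run clear_directory(SCX_ETC) first to review the list, then clear_directory(SCX_ETC, dry_run=False) as root to actually delete.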

    Hope the above information helps.

    Regards,

    Alex Zhu
    Friday, June 7, 2019 6:53 AM
  • HI Alex, 

    I also tried uninstalling and reinstalling, but no luck :-(

    First I uninstalled the agent from the SCOM console using root credentials, and after 30 minutes I installed it again from the console with root credentials.

    I also tried manually uninstalling the SCOM agent on the UNIX server and installing it again from the console, but all in vain.

    These are the commands I used:

    Log on as the root user, and uninstall the agent by typing

    rpm -e scx

     

    To verify that the package is uninstalled, type

    rpm -q scx
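    The same verification can be scripted. This minimal sketch, assuming an RPM-based distribution as in the commands above, relies on rpm -q exiting with status 0 only when the package is installed. installed_version is a hypothetical helper name, and run is injectable so the logic can be tested without rpm.

```python
import subprocess
from typing import Optional


def installed_version(package: str = "scx", run=subprocess.run) -> Optional[str]:
    """Return the string `rpm -q <package>` prints if installed, else None."""
    result = run(["rpm", "-q", package], capture_output=True, text=True, check=False)
    # A non-zero exit status means rpm did not find the package.
    return result.stdout.strip() if result.returncode == 0 else None
```

    After rpm -e scx, installed_version() should return None; after a reinstall from the console it should return the package string again.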



    AD


    • Edited by AD_SC Friday, June 7, 2019 9:02 AM
    Friday, June 7, 2019 9:01 AM
  • Can someone please help me with this?

    AD

    Monday, June 10, 2019 5:31 AM
  • Here’s a pretty detailed troubleshooting guide that you could try:

    Deploying and Troubleshooting SCOM on Unix/Linux machines


    Blog: https://thesystemcenterblog.com

    Monday, June 10, 2019 5:47 AM