All AIX SCOM agents lost contact with SCOM server

    Question

  • Dear all,

     

    All our SCOM agents on AIX suddenly lost contact with the SCOM server.  Other platforms such as Red Hat and Solaris are not affected.

    We get two types of alerts: "Heartbeat failed" and "Access Denied Error".

    We have tried restarting and reinstalling the agent, but it still doesn't work.

    We are using Cross Platform Update 2.

    We confirmed that the UNIX action account and password are correct, because the same account is used to contact Red Hat and Solaris too.  If they were wrong, the other platforms would have the same issue.

    Any ideas?

     

    Here are some logs.

    auth.log

    Jan 18 15:56:55 asihkdodb02 auth|security:debug cimservera PAM: pam_authenticate: error Authentication failed

    Jan 18 15:57:44 asihkdodb02 auth|security:debug last message repeated 6 times

    Jan 18 15:58:14 asihkdodb02 auth|security:debug cimservera PAM: pam_authenticate: error Authentication failed

    Jan 18 15:59:44 asihkdodb02 auth|security:debug last message repeated 3 times

    Jan 18 16:00:14 asihkdodb02 auth|security:debug cimservera PAM: pam_authenticate: error Authentication failed

    Jan 18 16:01:55 asihkdodb02 auth|security:debug last message repeated 14 times

    Jan 18 16:02:14 asihkdodb02 auth|security:debug cimservera PAM: pam_authenticate: error Authentication failed

    Jan 18 16:03:44 asihkdodb02 auth|security:debug last message repeated 3 times

    Jan 18 16:04:14 asihkdodb02 auth|security:debug cimservera PAM: pam_authenticate: error Authentication failed

    Jan 18 16:04:44 asihkdodb02 auth|security:debug cimservera PAM: pam_authenticate: error Authentication failed

     

     

    scxcimd.log

    02/02/2011-17:39:38 INFO    cimserver: Authentication failed for user=scom_usr.

    02/02/2011-17:40:08 INFO    cimserver: Authentication failed for user=scom_usr.

    02/02/2011-17:40:38 INFO    cimserver: Authentication failed for user=scom_usr.

    02/02/2011-17:41:08 INFO    cimserver: Authentication failed for user=scom_usr.

    02/02/2011-17:41:38 INFO    cimserver: Authentication failed for user=scom_usr.

    02/02/2011-17:42:08 INFO    cimserver: Authentication failed for user=scom_usr.

    02/02/2011-17:42:38 INFO    cimserver: Authentication failed for user=scom_usr.

    02/02/2011-17:43:08 INFO    cimserver: Authentication failed for user=scom_usr.

    02/02/2011-17:43:38 INFO    cimserver: Authentication failed for user=scom_usr.

    02/02/2011-17:44:07 INFO    cimserver: Authentication failed for user=scom_usr.


    Thanks.

    -Wilson

    • Moved by WadeWe Wednesday, October 26, 2011 8:47 PM New forum (From:Cross Platform Solutions for System Center)
    Monday, February 07, 2011 9:49 AM

All replies

  • Wilson,

    What does your /etc/pam.conf file look like on the AIX systems? It should have the following in it, usually at the very bottom.

    # The configuration of scx is generated by the scx installer.
    scx    auth    required        /usr/lib/security/pam_aix
    scx    account required        /usr/lib/security/pam_aix
    # End of section generated by the scx installer.

    Regards,

    -Steve

    Monday, February 07, 2011 3:56 PM
    Moderator
  • Hi Steve,

    They are there.  We have these lines in pam.conf.  The AIX agents worked fine for many months, then suddenly stopped working all at once.

    # The configuration of scx is generated by the scx installer.

    scx    auth    required        /usr/lib/security/pam_aix

    scx    account required        /usr/lib/security/pam_aix

    # End of section generated by the scx installer.

     

    Is it possible that it's a problem with PAM on the AIX side?  How can I confirm it?

     

    And I can run the winrm command on the RMS, so it sounds like the validation is fine.

     

    C:\Documents and Settings\xxxxx>winrm e http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_Agent?__cimnamespace=root/scx -r:https://xxxx:1270 -u:scom_usr -p:xxxxx -auth:basic -encoding:utf-8

    SCX_Agent

        Architecture = powerpc

        BuildDate = 2010-01-25T00:00:00Z

        BuildNumber = 258

        Caption = SCX Agent meta-information

        Description = Labeled_Build - 20100125

        ElementName = null

        HealthState = null

        Hostname = as2hkdapp06.asia.ccb.com

        InstallDate = 2010-02-23T11:07:08Z

        KitVersionString = 1.0.4-258

        MajorVersion = 1

        MinActiveLogSeverityThreshold = INFO

        MinorVersion = 0

        Name = scx

        OSAlias = AIX

        OSName = AIX

        OSType = AIX

        OSVersion = 6.1

        OperationalStatus = null

        RevisionNumber = 4

        Status = null

        StatusDescriptions = null

        UnameArchitecture = powerpc

        VersionString = 1.0.4-258

     

     

    -Wilson

    Tuesday, February 08, 2011 1:23 AM
  •  

    I heard the UNIX team mention that they have a Tivoli agent on AIX too. I'm not sure whether it is related; this is for information only.

    -Wilson

    Tuesday, February 08, 2011 1:56 AM
  • Wilson,

    Your PAM settings are good, as the winrm command would have failed if that were the problem. I doubt the Tivoli agents are causing issues either: if the winrm command works, I cannot see why OM would not.

    My next suggestion would be to look at your Run As accounts. Has something changed in the Unix action account? Is scom_usr added to the Unix Computer Class? Look at this link and verify that your Run As accounts are set up properly. http://technet.microsoft.com/en-us/library/dd788981.aspx

    Has the password been reset for scom_usr on the AIX systems but not on your other UNIX/Linux platforms? This would explain why the Run As accounts still work for your other systems but fail on AIX. Based on the log errors you sent, it appears to be a password issue, and the Run As accounts would be the only place OM gets this information.

    -Steve

    • Marked as answer by Wilson010 Tuesday, February 08, 2011 2:33 PM
    Tuesday, February 08, 2011 1:28 PM
    Moderator
  • Hi Steve,

    Thanks for the information.

    Yes, you are right.  Something was wrong with the Run As accounts.  I double-checked all the account settings and found a group with a dynamic inclusion rule defined to include all the AIX servers!!!  We hadn't updated this group for a long time (it was created in the UAT testing phase at the very beginning), but recently we did!!!

    This group is also bound to a UNIX Run As account, but with a different password.  That's why all the AIX servers DIED together.

    After removing this rule, everything works fine again.

    Thanks a lot.

     

    -Wilson

     

     

    Tuesday, February 08, 2011 2:32 PM
  • Hi steve,

    Thanks for your information,

    We are installing SCOM 2012 on an AIX 6.1 server. SCOM 2012 is installed and the certificate is signed, but in the end we get an "Access denied" error.

    Error message as:

         Discovery was not successful

       Computer: Server name

       Message: Invalid credentials

       Details:The agent responded to the request but the WSM connect failed due to : Access is Denied


    The credentials are correct; we are trying with the root user. The entries in /etc/pam.conf were also added automatically. It shows:

    # The configuration of scx is generated by the scx installer.
    scx    auth    required        /usr/lib/security/pam_aix
    scx    account required        /usr/lib/security/pam_aix
    # End of section generated by the scx installer.

    Please help with this.

    -jagadeesan




    • Edited by Jagadeesan K Wednesday, July 24, 2013 5:38 PM missing words
    Wednesday, July 24, 2013 5:34 PM
  • Is the agent actually installed on the AIX server? If so see if you can connect via the following winrm command from the SCOM server via a command prompt window.

     winrm e http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_Agent?__cimnamespace=root/scx -r:https://xxxx:1270 -u:root -p:xxxxx -auth:basic -encoding:utf-8

    If the agent is not installed, it appears you have permission problems with the account you are using. Verify that the account can ssh into the AIX server. Note that when installing the agent, the account must have root access: either use the root account or use an account with sudo.

    Regards,

    -Steve


    Wednesday, July 24, 2013 5:47 PM
    Moderator
    Thanks Steve,

          I tried winrm in a command prompt on the SCOM server, and I am getting the following error message:

    WSManFault

       Message = The server Certificate on the destination computer <XXXX:1270> has the following errors:

    The SSL Certificate could not be checked for the revocation. The server used to check for revocation might be unreachable.

    The SSL Certificate contains a common name <CN> that does not match the hostname

    -

    Error number: -2147012721 0x80072F8F

    A security error occurred

    Regards

    Jagadeesan

    Wednesday, July 24, 2013 6:14 PM
  • Bad certificate. Your hostname for the AIX server does not match what is in your DNS that SCOM is using.

    On the AIX server run the following command:

    openssl x509 -in /etc/opt/microsoft/scx/ssl/scx.pem -text

    Look at the 'Subject:' line. It should look similar to this but with your settings.

    Subject: DC=com, DC=abc, CN=aixserver, CN=aixserver.abc.com

    If you do an nslookup from the SCOM server, is this what it resolves to? My guess is that it is not, and you need to fix it.
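
    The comparison described above can be sketched in shell. This is a minimal sketch, not part of the agent tooling: subject_cns and cn_matches are hypothetical helper names, and the sample Subject string is the illustrative one from this post. It splits the Subject line on commas, keeps the CN values, and checks whether the name given to SCOM is one of them.

```shell
#!/bin/sh
# Sketch only: subject_cns and cn_matches are hypothetical helper names.

# Print each CN value found in a certificate Subject line, one per line.
subject_cns() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -n -e 's/ *$//' -e 's/^.*CN *= *//p'
}

# Succeed if the given name equals one of the CN values (case-insensitive).
cn_matches() {
    subject_cns "$1" | grep -i -x -F -q -- "$2"
}

# Sample Subject line in the format shown above (illustrative values).
subject='Subject: DC=com, DC=abc, CN=aixserver, CN=aixserver.abc.com'

# List the CN values the certificate carries.
subject_cns "$subject"

# Compare candidate names; aixserver-cluster.abc.com is a made-up example
# of a name that resolves in DNS but is not in the certificate.
for name in aixserver.abc.com aixserver-cluster.abc.com; do
    if cn_matches "$subject" "$name"; then
        echo "$name: matches a CN in the certificate"
    else
        echo "$name: no matching CN"
    fi
done
```

    On the AIX server the Subject line would come from the openssl x509 -text output, and the second argument would be whatever name is typed into the discovery wizard or after -r:https:// in the winrm command.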

    Regards,

    -Steve

    Wednesday, July 24, 2013 6:22 PM
    Moderator
    I checked the DNS entries. They are the same on the AIX server and on the SCOM server.

    Regards

    Jagadeesan

    Wednesday, July 24, 2013 6:27 PM
  • Please run the openssl command and verify. The error is specifically stating that your CN does not match the hostname.

    See this link for details - http://social.technet.microsoft.com/wiki/contents/articles/4966.troubleshooting-unixlinux-agent-discovery-in-system-center-2012-operations-manager.aspx#A

    -Steve


    Wednesday, July 24, 2013 6:30 PM
    Moderator
    I upgraded openssl and openssh on the AIX server, but the issue is still the same.

    Could you please tell me which commands and settings I need to check for openssl on the AIX server and on the SCOM server?

    Note: Out of 13 AIX servers, we have configured SCOM 2012 on 12. Only one server has this access denied issue.

    Regards

    Jagadeesan

    Wednesday, July 24, 2013 7:25 PM
  • I don't believe there is anything wrong with OpenSSL or OpenSSH. You just need to get the certificate configured properly based on the error you posted originally.

    Can you run the openssl command I sent in an earlier post on the AIX server?

    "openssl x509 -in /etc/opt/microsoft/scx/ssl/scx.pem -text"

    This should print out the certificate for the AIX agent. Does this work?

    -Steve

    Wednesday, July 24, 2013 7:36 PM
    Moderator
    Yes, I ran the command you gave for the certificate. Please see the output below:


    root@XXXX:/root $openssl x509 -in /etc/opt/microsoft/scx/ssl/scx.pem -text
    Certificate:
        Data:
            Version: 1 (0x0)
            Serial Number: 1 (0x1)
            Signature Algorithm: sha1WithRSAEncryption
            Issuer: CN=SCX-Certificate/title=<<i remove this title>>, DC=SCOM2012
            Validity
                Not Before: Jul 23 07:38:18 2012 GMT
                Not After : Jul 24 06:03:10 2023 GMT
            Subject: DC=local, DC=steelsa, CN=XXXX, CN=XXXX.steelsa.local
            Subject Public Key Info:
                Public Key Algorithm: rsaEncryption
                RSA Public Key: (2048 bit)
                    Modulus (2048 bit):
                        << i remove the text>>
                    Exponent: 65537 (0x10001)
            X509v3 extensions:
                X509v3 Extended Key Usage:
                    TLS Web Server Authentication
        Signature Algorithm: sha1WithRSAEncryption
            << i remove the text>>
    -----BEGIN CERTIFICATE-----
    << i remove the textt>>
    -----END CERTIFICATE-----

    Regards

    Jagadeesan

    Wednesday, July 24, 2013 7:44 PM
  • And XXXX.steelsa.local (where XXXX is the actual hostname) is what an nslookup returns from the SCOM server?

    -Steve

    Wednesday, July 24, 2013 7:51 PM
    Moderator
  • Hostname XXXX is a duplicate, but both the SCOM and AIX servers have the same hostname, which I checked with nslookup. I am sure about the hostname.

    Regards

    Jagadeesan K

    Wednesday, July 24, 2013 7:55 PM
  • Not sure what you mean by "Hostname XXXX is a duplicate", but try the following:

    On the AIX server run

    /opt/microsoft/scx/bin/tools/scxsslconfig -h <hostname of AIX server in DNS> -d <domain name of AIX server in DNS> -f -v

    Once complete, try the winrm command I sent originally and see if it works. If it works, run discovery again on the AIX server from SCOM.

    -Steve

    Wednesday, July 24, 2013 8:02 PM
    Moderator
  • When posting, I changed the actual hostname to XXXX; the actual hostname is drrapp_ps.

    This AIX server is a cluster node with two IPs and two hostnames:

          Cluster hostname  : drrapp

          Physical hostname : drrapp_ps

    We are trying with the physical IP and hostname (drrapp_ps).

    Please see the output of the command you gave:

    root@drrapp:/root $/opt/microsoft/scx/bin/tools/scxsslconfig -h drrapp_ps -d steelsa.local -f -v         
    Setting debugMode=true
    Generated hostname:   "drrapp" (eGethostname)
    Generated domainname: "steelsa.local" (eEtcResolvConf)

    Original Host Name:     drrapp_ps
    Original Domain Name:   steelsa.local
    Start Days:    -365
    End Days:      7300
    Cert Length:   2048
    Target Path:   /etc/opt/microsoft/scx/ssl

    Generating certificate with hostname="drrapp_ps", domainname="steelsa.local"

    Not using punycode because library not found, reason given: '   0509-022 Cannot load module .   0509-026 System error: A file or directory in the path name does not exist.'.
    Converting string in raw form, 'steelsa.local'.
    Domain name, after processing:steelsa.local
    Generated certificate
    return code = 0
    root@drrapp:/root $

    After this I tried the winrm command on the SCOM server, but I am getting the same error:

    WSManFault

       Message = The server Certificate on the destination computer <XXXX:1270> has the following errors:

    The SSL Certificate could not be checked for the revocation. The server used to check for revocation might be unreachable.

    The SSL Certificate contains a common name <CN> that does not match the hostname

    -

    Error number: -2147012721 0x80072F8F

    A security error occurred

    After that I tried discovery again, but I still get the same "Access denied" error.

    Regards

    Jagadeesan

    Wednesday, July 24, 2013 8:18 PM
  • What name are you putting in the discovery wizard when you go to discover it, drrapp.steelsa.local or drrapp_ps.steelsa.local? The certificate is being generated for drrapp_ps.steelsa.local so this is what you need to use in the discovery wizard and also in the winrm command.

    Is this what you tried and you are still getting the CN error?

    -Steve

    Wednesday, July 24, 2013 8:28 PM
    Moderator
  • In the discovery wizard I am using the IP address of drrapp_ps.steelsa.local. I have not used the hostname there.

    Regards

    Jagadeesan

    Wednesday, July 24, 2013 8:33 PM
  • Now I tried with the hostname drrapp_ps.steelsa.local in the discovery wizard and also in the winrm command.

    The output of the discovery wizard and the winrm command is:

    Discovery wizard :

                  Access is denied (same as previous)

    Winrm command :

    WSManFault

       Message = Access is denied

    Error number: -2147024891 0x80070005

    Access is denied

    Now the CN error is gone, but I get the "Access is denied" error.

    Regards

    Jagadeesan

    Wednesday, July 24, 2013 8:55 PM
  • What user are you trying to use when connecting? Can this user ssh into the system without issues?

    -Steve

    Wednesday, July 24, 2013 8:58 PM
    Moderator
  • I am using the root user. Yes, root can ssh into the system with no issues.

    Regards

    Jagadeesan

    Wednesday, July 24, 2013 9:17 PM
  • What does your /etc/pam.conf file look like on the AIX systems? It should have the following in it, usually at the very bottom.

    # The configuration of scx is generated by the scx installer.
    scx    auth    required        /usr/lib/security/pam_aix
    scx    account required        /usr/lib/security/pam_aix
    # End of section generated by the scx installer.

    -Steve

    Wednesday, July 24, 2013 9:20 PM
    Moderator
  • Yes, the same entries are there, as you showed above.

    Output of cat /etc/pam.conf:

    # The configuration of scx is generated by the scx installer.
    scx    auth    required        /usr/lib/security/pam_aix
    scx    account required        /usr/lib/security/pam_aix
    # End of section generated by the scx installer.

    Regards

    Jagadeesan

    Wednesday, July 24, 2013 9:24 PM
  • Anything in the /var/log/syslog.log as to why access is denied? The agent runs on port 1270, is this port open?

    -Steve

    Wednesday, July 24, 2013 9:32 PM
    Moderator
  • The ports are all open. I am getting the following errors in /var/adm/messages on the AIX server:

    drrapp auth|security:alert cimservera PAM: open_module: module /usr/lib/security/pam_aix writable by group

    Jul 25 03:17:06 drrapp auth|security:err|error cimservera PAM: load_modules: can not open module /usr/lib/security/pam_aix

    Jul 25 03:17:06 drrapp auth|security:alert cimservera PAM: open_module: module /usr/lib/security/pam_aix writable by group

    Jul 25 03:17:06 drrapp auth|security:err|error cimservera PAM: load_modules: can not open module /usr/lib/security/pam_aix

    Regards

    Jagadeesan

    Wednesday, July 24, 2013 9:51 PM
  • Sounds like your problem lies in the /usr/lib/security/pam_aix module. Does it exist and what are the permissions on it? I'd compare it to a working AIX server.
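
    The "writable by group" alert in the log above suggests that PAM refuses to load a module whose mode allows group (or other) write access. That reading is an assumption drawn from the log text, not something confirmed in this thread. A minimal sketch of such a check, assuming a POSIX shell; check_not_group_or_world_writable is a hypothetical helper name, and the scratch file exists only so the sketch runs on any host:

```shell
#!/bin/sh
# Sketch only: report whether a file is writable by group or others.
# Returns 0 if it is not, 1 if it is, 2 if the file does not exist.
check_not_group_or_world_writable() {
    [ -e "$1" ] || return 2
    # ls -L follows symlinks; the first 10 characters are the mode string,
    # e.g. "-r-xr-xr-x". Character 6 is group write, character 9 is other write.
    mode=$(ls -ldL "$1" | cut -c1-10)
    case "$mode" in
        ?????w*|????????w*) return 1 ;;
    esac
    return 0
}

# Demonstration on a scratch file.
scratch="${TMPDIR:-/tmp}/pam_perm_demo.$$"
: > "$scratch"
chmod 555 "$scratch"
check_not_group_or_world_writable "$scratch" && echo "mode 555: not group/world writable"
chmod 775 "$scratch"
check_not_group_or_world_writable "$scratch" || echo "mode 775: writable by group or others"
rm -f "$scratch"

# On the affected server the call would be (path taken from the log above):
# check_not_group_or_world_writable /usr/lib/security/pam_aix
```

    If the check fails on the affected server but passes on a working one, that would point at the file mode rather than the credentials. Loosening the permissions makes this check fail rather than pass.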

    -Steve


    Wednesday, July 24, 2013 10:01 PM
    Moderator
  • I gave full access to the /usr/lib/security/pam_aix module and the related PAM files, but this did not solve it.

    Regards

    Jagadeesan

    Wednesday, July 24, 2013 10:14 PM
  • Hi steve,

    Tuesday, July 30, 2013 6:50 AM
  • Hi steve,

             The issue is still not resolved. Please help us. We have been working on this same issue for the last 3 weeks but have not found a solution.

    -Jagadeesan K 

    Tuesday, July 30, 2013 7:08 AM
  • Jagadeesan,

    Please open a ticket with Microsoft support. At this point we are going to need access to the system so we can determine what the issue is.

    Regards,

    -Steve

    Tuesday, July 30, 2013 11:55 AM
    Moderator