Issue with only some RHEL servers and installing SCOM agent - certificate error

  • Question

  • Hello to all - we recently rebuilt our SCOM environment using 2012 R2 RU3. We have successfully deployed the SCOM agent to all of our Windows servers and most of our Linux servers with no major issues. However, three Linux servers running RHEL 6.5 all show exactly the same behavior: they discover fine, but when you start the installation, you immediately get a "Certificate signing operation was not successful." error.

    Below are the details of the error. If anyone could provide some advice, I would appreciate it.

    Task invocation failed with error code -2130771918. Error message was: The SCXCertWriteAction module encountered a DoProcess exception. The workflow "Microsoft.Unix.Agent.GetCert.Task" has been unloaded.


    Module: SCXCertWriteAction

    Location: DoProcess

    Exception type: ScxCertLibException

    Exception message: Unable to create certificate context
    ; {ASN1 bad tag value met.
    }

    Additional data: Sudo path: /usr/bin/


    Management group: Ours

    Workflow name: Microsoft.Unix.Agent.GetCert.Task

    Object name: Unix Linux Monitoring Resource Pool

    Object ID: {B1698091-E324-6C57-79B4-2BBF46ED2952}

    Wednesday, May 28, 2014 6:12 PM

Answers

  • Ended up figuring out what was going on here. Not sure if it was the same issue the OP had, but it really isn't documented anywhere, so it's worth noting in case someone else sees something similar.

    When SCOM performs discovery of the Unix agents, it essentially attempts a connection to port 1270 from the management server. Our firewalls had 1270 open, but they were filtering out the RST packet that was being returned from the Unix box. As such, SCOM only saw the SYN/ACK and assumed that something was listening on the port. Instead of deploying the client, it attempted to sign the certificate. Since the client was not present, no cert existed, and the install failed. Once we put a rule in place to allow that packet to be returned to the management server, deployments worked fine.

    I doubt it happens much, but in high-security environments, I could see something like this causing problems for others.
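
One way to take the firewall out of the picture is to check on the Unix/Linux host itself whether anything is listening on port 1270. The following is a minimal sketch, assuming `netstat -an` is available on the host; `port_is_listening` is a hypothetical helper name, not part of SCOM or the agent.

```shell
#!/bin/sh
# Hypothetical helper: read netstat-style output on stdin and report whether
# it shows a TCP listener on the given port. Accepts both "addr:1270"
# (Linux) and "addr.1270" (Solaris) notation.
port_is_listening() {
    port="$1"
    if grep -E "[.:]${port}[[:space:]].*LISTEN" >/dev/null; then
        echo "listening"
    else
        echo "not listening"
    fi
}

# Typical use on the Unix/Linux host:
#   netstat -an | port_is_listening 1270
```

If this reports "not listening" while the wizard still only offers to sign the certificate, something between the management server and the host may be answering the connection attempt on the host's behalf.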


    Nathan Gau MCT, MCITP, MCTS, CEH, CISSP

    Thursday, August 7, 2014 1:25 PM

All replies

  • Were these 3 Linux systems discovered properly prior to the rebuild? If so, try deleting the old certificates and rediscovering them. On the Linux system run the following command and then run the Discovery Wizard again.

    rm -rf /etc/opt/microsoft/scx/ssl/*

    If that does not help, try to manually create the certificates and see what gets returned. On the Linux system, run the following command (the agent should already be installed from a prior discovery attempt) and, if it succeeds, run the Discovery Wizard again.

    /opt/microsoft/scx/bin/tools/scxsslconfig -f -v

    If the scxsslconfig command fails post the results.
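
Before deleting or regenerating anything, it can help to confirm whether a certificate is present at all. This is a small sketch only; `scx_cert_status` is a hypothetical helper, and the default directory and the `scx.pem` file name are the ones quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper: report whether an agent certificate exists in the
# given directory (default: the path used in this thread).
scx_cert_status() {
    dir="${1:-/etc/opt/microsoft/scx/ssl}"
    if [ -f "$dir/scx.pem" ]; then
        echo "present"
        # Show subject and validity dates when openssl is installed.
        if command -v openssl >/dev/null 2>&1; then
            openssl x509 -noout -subject -dates -in "$dir/scx.pem"
        fi
    else
        echo "missing"
    fi
}

# Typical use on the agent host:
#   scx_cert_status
```

A result of "missing" together with no /opt/microsoft/scx directory suggests the agent was never installed, so there is nothing for the signing step to work with.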

    Regards,

    -Steve

    Wednesday, May 28, 2014 8:20 PM
    Moderator
  • I'm curious if anyone found an answer to this.

    I'm experiencing the exact same symptoms and output on a Solaris 10 sparc system.  Discovery works.  I've verified I can SSL to the system from the SCOM server and port 1270 is open between them as well.  Working with my UNIX team, they can see that the scom account is making SUDO commands.  The directory Steve mentioned in the above reply has not been created on this system.  This is a fresh install, the SCOM client has never been there.  There is a certificate on the management server which has been imported into SCOM as well.  If anyone figured this out, I'm kind of curious what it was.


    Nathan Gau MCT, MCITP, MCTS, CEH, CISSP

    Thursday, June 12, 2014 2:24 PM
  • Nathan,

    You stated they can see the SUDO commands. Did they configure SUDO properly for the OM agent?

    http://social.technet.microsoft.com/wiki/contents/articles/7375.configuring-sudo-elevation-for-unix-and-linux-monitoring-with-system-center-2012-operations-manager.aspx

    What happens if you run the [/opt/microsoft/scx/bin/tools/scxsslconfig -f -v] command on the agent?

    Regards,

    -Steve

    Thursday, June 12, 2014 2:32 PM
    Moderator
  • Hi Steve,

    Apologies for the long response.  Yes, this is the article that we used to configure it.  Our environment is Solaris 10 SPARC and we used the configuration file for Solaris 10 (minus the x86 piece) for SCOM 2012 SP1, which is what we are running.  I've triple-checked the sudoers file as well.  There was one mistake in it originally, but correcting it did not fix the problem.

    Also of interest, when I deploy via the wizard, when the screen comes up to manage the client, it simply says to sign the certificate and manage the client.  In our lab environment (where we were able to get this to work) it always says to install the agent.  I connected to the Solaris system and saw no evidence of the agent being installed as the agent's microsoft directories were non-existent.  We've since tried it manually and this failed as well.  My Unix admin ended up having to make the following additions to the sudoers file:

    monuser ALL=(root) NOPASSWD: /usr/sbin/pkgadd

    monuser ALL=(root) NOPASSWD: /usr/sbin/pkginfo -l

    Should note, he doesn't consider this to be a permanent solution, as this essentially gives the monuser account rights to install anything.  This allowed for the install and managing the client. 

    One other thing of note: in that file, the Unix admin wanted to know why there is only one quote in this statement.  He thinks that may be a part of the issue and isn't used to seeing a sudoers file set up in that manner.  He said the exit was a bit different as well.

    monuser ALL=(root) NOPASSWD: /usr/bin/sh -c echo -e "mail=*/usr/sbin/pkgadd -a /tmp/scx-monuser/scx -n -d /tmp/scx-monuser/scx-1.[0-9].[0-9]-[0-9][0-9][0-9].solaris.1[0-1].sparc.pkg MSFTscx;*exit ?EC


    Nathan Gau MCT, MCITP, MCTS, CEH, CISSP


    Wednesday, June 18, 2014 5:12 PM
  • bumping this... still seeing problems.  Just curious if anyone has any ideas here.  I've turned on verbose logging and I'm not seeing much in the logs.  It's like SCOM thinks the agent is there, even though it's not. 

    


    Nathan Gau MCT, MCITP, MCTS, CEH, CISSP

    Tuesday, July 1, 2014 4:03 PM
  • The only difference I can see between a successful and an unsuccessful installation is the contents of the SCXWSManProbAction log.  I've attached the version of a failure below.  For a successful run, the elevtype has a value of 2 instead of 0 that is shown in the picture.  As well, a DoInit process is immediately kicked off in a successful install that is missing from this log. 

    Can anyone tell me why this might be happening?  Or perhaps can anyone tell me where there would be additional log data I can review (particularly on the UNIX box).  I can say for certain that none of the Microsoft/scx items were created.  When the install kicks off, it's like it thinks it's there and attempts to immediately sign the cert.

    Adding contents of the SCXWSManProbAction

    


    Nathan Gau MCT, MCITP, MCTS, CEH, CISSP

    Tuesday, July 1, 2014 4:56 PM
  • Nathan,

    Sorry, meant to get to this and forgot.

    This is correct -

    "monuser ALL=(root) NOPASSWD: /usr/bin/sh -c echo -e "mail=*/usr/sbin/pkgadd -a /tmp/scx-monuser/scx -n -d /tmp/scx-monuser/scx-1.[0-9].[0-9]-[0-9][0-9][0-9].solaris.1[0-1].sparc.pkg MSFTscx;*exit ?EC"

    The single quote gets handled by the *, which tells it to match anything up to the /usr/sbin/pkgadd. This does work in our lab environments, so I doubt it's the issue, but you can add the following to the /etc/sudoers file and see what SUDO is complaining about.

    Defaults logfile=/var/log/sudo.log

    When you say this - "Also of interest, when I deploy via the wizard, when the screen comes up to manage the client, it simply says to sign the certificate and manage the client"

    If running SCOM 2012 or SCOM 2012 SP1, can you do a 'ps -ef | grep scx' and verify there are no old agent processes still running on this system?

    If running SCOM 2012 R2, do a 'ps -ef | grep omi' 

    There are additional things we can check but it would be best if you open a support case with MS and someone can work directly with you to help resolve the issue.

    Regards,

    -Steve

    Tuesday, July 1, 2014 4:57 PM
    Moderator
  • Logged in with the service account, my output from ps -ef | grep scx is as follows:

    monuser 18132  2922  0 11:29:19 pts/3   0:00 grep scx

    Not quite sure what that means


    Nathan Gau MCT, MCITP, MCTS, CEH, CISSP

    Wednesday, July 2, 2014 3:35 PM
  • Just means it found your grep command as a process running. You can run it as follows and it will not show it - 'ps -ef | grep scx | grep -v grep' but it does not look like there are any running agent processes so this rules that out.
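
The same filter can be wrapped so that it returns a count, which makes the "no agent processes" case explicit. This is a sketch only; `count_agent_processes` is a hypothetical name, and it simply applies the `grep ... | grep -v grep` idea from the reply above to `ps -ef` output read from stdin.

```shell
#!/bin/sh
# Hypothetical helper: count lines of `ps -ef` output (stdin) that mention
# the given name, ignoring the grep command itself.
# $1: process name fragment, e.g. "scx" (2012 / 2012 SP1) or "omi" (2012 R2)
count_agent_processes() {
    grep -- "$1" | grep -v grep | wc -l | tr -d ' '
}

# Typical use on the agent host:
#   ps -ef | count_agent_processes scx
```

A count of 0 means no matching agent process is running, which is the situation shown in the output above.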

    Based off everything you posted it looks like the Discovery Wizard thinks there is an agent already installed on this system. Are you sure SCOM is resolving the hostname of this system properly and it's not actually trying to connect to another system in your environment?

    Regards,

    -Steve

    Wednesday, July 2, 2014 3:48 PM
    Moderator
  • Steve, I am having exactly the same issue.  At first something went wrong, so I removed the microsoft folder from /opt/, /var/opt/, and /etc/opt/.  I then found one stale scx entry in the output of "ps -ef|grep scx" and removed it.  Now when I try to install, it fails at the signing step with the same error mentioned above ("Task invocation......").  I verified the host name by pinging the CN, FQDN, and IP address, by ping with the -a option, and by nslookup.  It is still failing.

    Any help please..

    Thanks,

    Satya

    Saturday, July 5, 2014 7:17 AM
  • Satya,

    Run the following on the agent and see what is returned for the CN.

     openssl x509 -in /etc/opt/microsoft/scx/ssl/scx.pem -text | grep CN
    

    If successful compare what is returned to what is in your DNS server SCOM uses to resolve the hostname. Do they match?
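
The comparison can also be scripted. The sketch below assumes the subject is read with `openssl x509 -noout -subject` and that the CN values appear as `CN=...` or `CN = ...` separated by commas or slashes (the layout differs between OpenSSL versions); `cert_cns` and `cn_matches` are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: print every CN value found in an openssl subject
# line read from stdin, one per line.
cert_cns() {
    sed 's/^[[:space:]]*[Ss]ubject[=:] *//' | tr ',/' '\n\n' |
        sed -n 's/^ *CN *= *\(.*[^ ]\) *$/\1/p'
}

# Hypothetical helper: succeed if any CN on stdin equals the expected
# host name (case-insensitive, whole-line match).
# $1: the name the management server resolves for this agent
cn_matches() {
    cert_cns | grep -i -x -F -- "$1" >/dev/null
}

# Typical use on the agent host (host1.example.com is a placeholder):
#   openssl x509 -noout -subject -in /etc/opt/microsoft/scx/ssl/scx.pem |
#       cn_matches host1.example.com && echo "CN matches"
```

If no CN matches the name that DNS returns to the management server, regenerating the certificate with scxsslconfig, as suggested earlier in the thread, is the next thing to try.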

    Regards,

    -Steve

    Monday, July 7, 2014 2:39 PM
    Moderator
  • Steve,

    I removed everything from /opt; /var/opt/; /etc/opt/.  After that there was a zombie process, which was also removed.  Since then, when trying to install the agent, it fails at "Signing" with the error below.

    Task invocation failed with error code -2130771918. Error message was: The SCXCertWriteAction module encountered a DoProcess exception. The workflow "Microsoft.Unix.Agent.GetCert.Task" has been unloaded.

    Module: SCXCertWriteAction
    Location: DoProcess
    Exception type: ScxCertLibException
    Exception message: Unable to create certificate context
    ; {ASN1 bad tag value met.
    }
    Additional data: Sudo path: /usr/bin/
    Management group: <Removed>
    Workflow name: Microsoft.Unix.Agent.GetCert.Task
    Object name: Unix/ Linux Monitoring Resource Pool
    Object ID: <Removed>

    And when I checked, no microsoft folder is being created in any of those opt locations.

    -Satya

    Monday, July 7, 2014 3:23 PM
  • Certificate signing error -2130771918

    Task invocation failed with error code -2130771918. Error message was: The SCXCertWriteAction module encountered a DoProcess exception. The workflow "Microsoft.Unix.Agent.GetCert.Task" has been unloaded.

    Module: SCXCertWriteAction
    Location: DoProcess
    Exception type: ScxCertLibException
    Exception message: Unable to open root store : { Access is denied. }

    Possible Causes

    • The Management Server's default action account does not have the necessary privileges (administrator) to open the root certificate store.

    Resolutions

    • Set the Action Account for the Management Server(s) as a local administrator account
    • Configure a local administrator account in the Run As Profile: Certificate Signing Profile

    If the Certificate Signing Profile is configured, the action account associated in that profile will be used. If not, it will fall back to the default Action Account.

    -Steve

    Monday, July 7, 2014 3:39 PM
    Moderator
  • I see the action account in local administrators group on one MS, does it need to be in local admin group of 2nd MS also?  Please let me know.

    -Satya

    Monday, July 7, 2014 3:51 PM
  • yes - SCOM needs access to the certificate store on each MS.

    -Steve

    Monday, July 7, 2014 3:54 PM
    Moderator
  • Added action account to local admin group on MS participating in Unix/Linux Monitoring Resource Pool.  Restarted services (Microsoft Monitoring agent, Data access and Configuration) and tried again.  But failed.

    Above in your reply you mentioned:

    Exception message: Unable to open root store : { Access is denied. }

    But my error is actually the one below. Is that different?

    Exception message: Unable to create certificate context

    -Satya

    Monday, July 7, 2014 4:09 PM
  • Still sounds like an access error. I'd open a ticket with MS so someone can work directly with you, as troubleshooting it over the forums is not resolving the issue.

    Regards,

    -Steve

    Monday, July 7, 2014 6:39 PM
    Moderator
  • Hi,

    You can install the agent manually. You just need to install the agent, generate the certificate, and then manually sign the certificate using scxcertconfig.exe. Then place the certificates on both the SCOM management server and the agent machine.

    Regards,

    DR

    Wednesday, July 9, 2014 1:59 PM
  • Nathan,

    Thanks for posting the solution to your issue. I'm sure others may run across this and this is very helpful.

    Regards,

    -Steve

    Thursday, August 7, 2014 2:42 PM
    Moderator
  • Hi Steve,

    I've got the same error when trying to tool the server from the SCOM Console, but even when I try to install the agent manually, it fails, stating that HISTFILE is a read-only variable. When running the package with --help during the manual install, I can see the error occurring on line 45 of /root/.profile.security. Here is what that file contains between line 42 and line 46:

    touch $histfile
    chmod 600 $histfile

    export HISTFILE=$histfile
    export HISTSIZE=255

    I believe that on Line 43 when it changes the rights on HISTFILE to 600, it causes the issue. 

    I'm still awaiting feedback from our Unix admins on the matter, but it seems for now my issue is locally on that O/S. We run RHEL 7.1 on the server we are trying to tool.

    EDIT: here is the initial error when executing the package:

    [username@server /tmp]$ sudo rpm scx-1.5.1-256.rhel.7.x64.rpm --install
    basename: extra operand ‘tty’
    Try 'basename --help' for more information.
    /root/.profile.security: line 45: HISTFILE: readonly variable
    warning: %post(scx-1.5.1-256.el7.x86_64) scriptlet failed, exit status 1
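
For what it's worth, "readonly variable" is the message a shell prints when a variable that has already been marked readonly is assigned again, which points at the export on line 45 rather than the chmod on line 43. The sketch below reproduces that behavior in a child shell; the paths are placeholders, and where HISTFILE was made readonly on this server is not known from the thread.

```shell
#!/bin/sh
# Reproduce the failure in a child shell so the readonly flag does not
# leak into the calling shell. Paths are placeholders.
readonly_demo() {
    sh -c 'readonly HISTFILE=/tmp/example_history
           export HISTFILE=/tmp/other_history' 2>/dev/null
}

if readonly_demo; then
    echo "assignment succeeded"
else
    echo "assignment failed"
fi
```

Searching the profile scripts that root sources (for example with `grep -n readonly` on each of them) may show where the variable is locked before line 45 runs.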

    Best Regards


    • Edited by AnDuToit Tuesday, January 22, 2019 9:35 AM
    Tuesday, January 22, 2019 9:21 AM
  • A different reason for the same message.

    A Linux client had been removed unsuccessfully, with errors, so re-installation raised a very similar issue.

    We realized that the cause was two packages that had not been uninstalled, named omi.x86_64 and auoms.x86_64.

    yum remove made us happier 8-). Removing both packages was the solution; the client reinstalled successfully and now it works!
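
A quick way to look for such leftovers before reinstalling is to filter the installed package list. This is a sketch only; `leftover_agent_packages` is a hypothetical helper, and the package names are the two from this reply plus the scx agent package seen earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper: from a list of package names on stdin (one per line,
# e.g. from `rpm -qa`), print those that belong to the agent stack.
# Matches "omi-1.x...", "omi.x86_64" or a bare "omi", but not e.g. "omit-...".
leftover_agent_packages() {
    grep -E '^(scx|omi|auoms)([-.]|$)'
}

# Typical use on the Linux host:
#   rpm -qa | leftover_agent_packages
```

Anything this prints after an uninstall is a candidate for removal with yum remove before the next install attempt.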

    Monday, February 4, 2019 9:16 AM