July 2017 updates broke OAuth app configuration replication to secondaries in WID farm

  • Question

  • ADFS: Server 2016 x3
    WAP: Server 2016 x3
    FBL: 2016

    I have some OIDC Web API and native/server applications configured; some are for test purposes and others are deployments of LOB applications or services like Kubernetes. I'm seeing this issue across all of these OAuth 2.0 based trusts.

    As an example, I have a standard application group configured: a web application that issues role claims for the user, and a server application with a client secret that has rights to a few scopes like "allatclaims, openid, profile".

    When I authenticate for this app against my primary ADFS server I have no problems: authentication is fine, and role claims are sent and processed fine by the client application (an ASP.NET Core MVC application). However, when I try the same thing against either of the secondaries, none of the extra claims come through. The user is still authenticated, the id_token and access_token are issued and the app is able to work, but no role claims are present.
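
    A quick way to compare what each node issues is to decode the token payload and look for the role claims. The following is a minimal sketch, not part of the actual app: it does not verify the signature, `make_unsigned_token` only builds a fake token for demonstration, and the `role` claim name is an assumption that depends on how the issuance rules map the claim.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    payload_b64 = parts[1]
    # base64url segments omit padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def role_claims(claims: dict, claim_type: str = "role") -> list:
    """Return role values as a list; "role" is an assumed claim name."""
    value = claims.get(claim_type, [])
    return value if isinstance(value, list) else [value]


def make_unsigned_token(claims: dict) -> str:
    """Hypothetical helper: build a fake, unsigned token for demonstration."""
    def b64(obj) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return ".".join([b64({"alg": "none"}), b64(claims), "sig"])
```

    Calling `role_claims(decode_jwt_payload(token))` on a token from each node returns an empty list when no role claims are present, which matches what I see from the secondaries.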

    As this only started happening after the July Windows Updates I tried a few things:

    • Rebooted all servers
    • Validated that all servers were fully patched to the same level
    • Switched secondaries to look at themselves then back at the primary (to force refresh of all trusts via replication)
    • Re-added secondaries to farm (even removed old WID DBs), so effectively a fresh replication of all info from the primary

    Two of the ADFS servers live in one datacenter. They talk to the same domain controllers and have the same user rights and the same service account; everything matches, but no luck. I thought that maybe I could find out what was replicated to one of the secondaries by switching it to primary and booting up the ADFS console.

    When I did this I discovered via the console that none of my application groups had apps listed inside them!? Then I discovered that all my OAuth apps, which should be sitting in Application Groups, were showing up as Relying Party Trusts. What the!! This is how OAuth trusts worked in ADFS 3.0 if I recall correctly: you have the trust and then you set up the client via PS.

    So the July 2017 patches are doing something weird to these types of apps when they are replicated in a WID farm from a primary to a secondary. As far as I can tell... anyone else seen this? Or able to replicate this? Anyone keen to give it a bash? Shouldn't be hard, set up a native app or web api inside an application group, replicate it to a secondary and turn the secondary to a primary to see what it has.

    Update: Rolling back the update on the secondary servers has no effect, the issue remains. Going to look at rolling back the primary on a clone.

    Update: Reverting FBL for the secondaries, then raising them again does not make a difference either. So it looks like the issue might lie with patches sitting on the Primary, will have to roll them out and see what happens.

    Update: Rolling back the July patches has no effect, so now I'm confused. The issue remains, which could simply be because the patch makes changes to the ADFS config that don't get reverted.
    Wednesday, July 26, 2017 6:48 AM

Answers

  • This has now been resolved as of KB4038801. The relevant note:

    • Addressed issue where the Windows Internal Database (WID) on Windows Server 2016 AD FS servers fails to synchronize some settings because of a foreign key constraint. These settings include the ApplicationGroupId columns from IdentityServerPolicy.Scopes and IdentityServerPolicy.Clients tables. The synchronization failure can cause different claim, claim provider, and application experiences between primary and secondary AD FS servers. Also, if you move the WID primary role to a secondary node, you cannot manage application groups using the AD FS management user interface.

    I would add to this that after applying the KB to the ADFS servers in the farm, I had to actually re-add the secondary nodes by overwriting the config on them with Add-AdfsFarmNode. This pulled a fresh copy of the database from the primary server.


    Monday, October 2, 2017 8:37 AM

All replies

  • We are having exactly the same secondary ADFS server authentication issues, occurring after applying the July updates. Restoring the ADFS primary and secondary servers from backup didn't solve it. We had to temporarily shut down the secondary ADFS server to stop the authentication errors.

    What is the status in your farm? Did you find any cause for this?




    • Edited by niconl Monday, July 31, 2017 7:15 PM
    Monday, July 31, 2017 9:50 AM
  • Not yet, no. I'm in the same boat as you: secondary WID farm members don't process extra claims. It is as if they are acting as ADFS 3 compatible nodes even though they are set to the ADFS 2016 FBL. I'm kind of glad (sorry!) to hear someone else has this. I'll go ahead and submit it to Microsoft through some channel; if you have support options I would raise it with them too.

    Hmm, although... you say that you restored your primary ADFS server and your secondary, and the issue was still present even though neither had the July patches? Or did you restore the primary, apply patches, then restore the secondary? I'd appreciate any detail you can provide on your scenario.

    Tuesday, August 1, 2017 3:27 AM
  • Our issue is probably a different one, ADFS version 2012R2.

    After having applied the July patches some users could not connect to Office365 anymore.
    We restored both primary and secondary ADFS servers from backup to undo the patches.
    This didn't fix the connection errors. 
    It appeared that clients connecting through the secondary ADFS server were getting the connection errors.

    The clients connecting through the secondary server directly get an HTTP/1.1 502 Connection Failed when browsing to https://<FQDN Of Our Federation Service>/adfs/ls/idpinitiatedsignon.aspx
    And a connection to 'FQDN Of Our Federation Service' failure. Error: TimedOut (0x274c). 
    System.Net.Sockets.SocketException A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond '<FQDN Of Our Federation Service> IPv4 address':443

    • Edited by niconl Tuesday, August 1, 2017 3:15 PM
    Tuesday, August 1, 2017 3:15 PM
  • Our issue was NLB related. As soon as the second ADFS server came online, the NLB IPv4 address stopped replying over port 443. This is solved now, after clearing the ARP table in the switches.
    • Edited by niconl Thursday, August 3, 2017 6:38 AM
    Wednesday, August 2, 2017 10:09 PM
  • Further investigations found that backing up the ADFS primary WID with the Rapid Restore Tool and restoring it to another node resolved the issue for that other node, but of course this just left us with multiple primaries. Anyway some further digging revealed the following DB differences between a primary and secondary node.

    [AdfsConfigurationV3].[Clients].ApplicationGroupId
    Primary WID: Populated with GUIDs
    Secondary WID: NULL

    [AdfsConfigurationV3].[MetadataSources].PublishedPolicyContents
    Primary WID: Populated with XML data
    Secondary WID: NULL

    [AdfsConfigurationV3].[Scopes].ApplicationGroupId
    Primary WID: Populated with GUIDs
    Secondary WID: NULL

    This issue persists regardless of which 2016 server is set to primary, or even if the data is restored directly into a totally new primary server via a backup. Any secondaries fail to receive this data during replication. I'm just about to apply the September 2017 CU to the farm to see if the issue is resolved; otherwise I will raise it with MS Premier and report back (in case anyone else is in the same boat).
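
    The comparison above can be expressed as a small check over rows exported from each node. This is only a sketch: the row dictionaries, the `ClientId` key column and the function name are hypothetical, and how the rows are exported from the WID is left out.

```python
def null_on_secondary(primary_rows, secondary_rows, key, columns):
    """Return (key value, column) pairs populated on the primary but None on the secondary.

    Rows are plain dicts; `key` names the column used to match rows
    between the two nodes (an assumption, e.g. "ClientId").
    """
    secondary_by_key = {row[key]: row for row in secondary_rows}
    missing = []
    for row in primary_rows:
        other = secondary_by_key.get(row[key])
        if other is None:
            # Row was not replicated at all; not the case described here.
            continue
        for col in columns:
            if row.get(col) is not None and other.get(col) is None:
                missing.append((row[key], col))
    return missing


# Made-up sample rows that mimic the Clients table difference.
primary = [{"ClientId": "app1", "ApplicationGroupId": "group-guid-1"}]
secondary = [{"ClientId": "app1", "ApplicationGroupId": None}]
print(null_on_secondary(primary, secondary, "ClientId", ["ApplicationGroupId"]))
```

    Any pair it reports is a column that made it to the primary but arrived as NULL on the secondary.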

    Friday, September 22, 2017 2:45 AM