none
SQL deadlocks after upgrading to MIM 2016 RRS feed

  • Question

  • I've upgraded my development FIM 2010 R2 environment to MIM 2016 (4.3.2195) on new virtual servers (same underlying hardware).  The OS is Server 2012 R2 and SQL is 2014 Standard (12.0.4213.0). I'm getting hundreds of failed-modification-via-web-services errors when exporting to the FIM connector which look to be due to deadlocks:

    Stack Trace: Microsoft.ResourceManagement.WebServices.Exceptions.UnwillingToPerformException: Other ---> System.Data.SqlClient.SqlException: Reraised Error 50000, Level 13, State 1, Procedure ReRaiseException, Line 37, Message: Reraised Error 50000, Level 13, State 1, Procedure ReRaiseException, Line 37, Message: Reraised Error 1205, Level 13, State 51, Procedure GenerateRequestOutput, Line 1148, Message: Transaction (Process ID 110) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
       at System.Data.SqlClient.SqlConnection.OnError

    I've worked with my DBA and we've tweaked a few things on the SQL side (including adding RAM and moving disk to new SAN storage), but it doesn't seem to have helped much at all.  My FIM 2010 setup seemed to handle largish volumes of updates like this (3000+) relatively error-free.  I'm hesitant to upgrade my production environment without resolving this, since that volume of change does occur there periodically as well.  Is there tuning I can do to reduce or eliminate this problem?  What additional information about my setup would be helpful?  Most of the person updates that are failing look to be minor and not really different from those that appear to be succeeding.

    -Robert
    UW-River Falls

    Thursday, June 16, 2016 3:42 PM

Answers

  • Robert,

    I have seen this sort of error occur when you have two workflows triggered at the same time trying to make changes to the same object. I suggest you look at the failed requests and see what they were trying to do. See what MPR kicked them off and what other WFs it may have kicked off, even what other subsequent MPRs and workflows.

    You may have to redesign how you do some of these things.

    I know it shouldn't have changed since April but when you have a race condition changing the track (new VMs, and OSes) can cause the race condition to surface.


    David Lundell, Get your copy of FIM Best Practices Volume 1 http://blog.ilmbestpractices.com/2010/08/book-is-here-fim-best-practices-volume.html

    Friday, June 24, 2016 11:04 PM
  • An embarrassing misconfiguration was the source of our problems; The database compability level was set to 2014. This should be left alone with the default value (2008 level). All our deadlocks disappeared when this was corrected.
    • Marked as answer by Robert Rust Wednesday, February 8, 2017 4:32 PM
    Wednesday, November 23, 2016 2:03 PM

All replies

  • Hi,

    just a blind arrow.

    I think error is not related with DB capacity.

    Please try once steps that are generally followed for 'stopped server error'.

    (1)Start all services sequential and then do MIM service delta import and delta synch...

    -Mann

    Friday, June 17, 2016 8:31 AM
  • Mann,

    I've tried that as well as using a full import/full sync.  While the export is likely triggering set membership changes and probably a workflow, none of that has changed since I ran a similar export in early April on FIM 2010R2.

    -Robert

    Friday, June 17, 2016 1:29 PM
  • I had the blues with these deadlocks. My solution is to install a new FIMService and populate it again.

    1. Export ALL objects. Policy, Portal and ALL custom objects. You will need to modify the ExportPolicy.ps1. Export the schema also, naturally.

    2. install fresh new  FIMService database.

    3. Import the objects. You will need to work a bit on the matching criteria in the SyncPolicy.ps1.

    In this way you rule out the internal corruptions. The export / import can't carry it. Since I already have 2 clients dealing with the same problem, please let me know how your case is going.


    GH


    • Edited by Guy Horn Friday, June 24, 2016 1:01 PM
    Friday, June 24, 2016 1:00 PM
  • Robert,

    I have seen this sort of error occur when you have two workflows triggered at the same time trying to make changes to the same object. I suggest you look at the failed requests and see what they were trying to do. See what MPR kicked them off and what other WFs it may have kicked off, even what other subsequent MPRs and workflows.

    You may have to redesign how you do some of these things.

    I know it shouldn't have changed since April but when you have a race condition changing the track (new VMs, and OSes) can cause the race condition to surface.


    David Lundell, Get your copy of FIM Best Practices Volume 1 http://blog.ilmbestpractices.com/2010/08/book-is-here-fim-best-practices-volume.html

    Friday, June 24, 2016 11:04 PM
  • David,

    There is only one request showing as failed in the portal, but it's a msidmCompositeType request so it contains 600+ attribute changes for various objects.  One third of the request is simply marking sync rules as "Applied".

    -Robert

    Monday, June 27, 2016 3:37 PM
  • Robert,

    I find regularly rebuilding indexes offline helps especially if there are many long-running queries. When you browse your MIM portal, do you run into lots of timeouts?

    There are some good articles around optimizing FIM performance that I would like to share with you:

    https://technet.microsoft.com/en-us/library/ff608274(v=ws.10).aspx

    https://blog.css-security.com/blog/optimizing-fim-performance


    Did my post help? Please use "Vote As Helpful", "Mark as answer" or "Propose as answer". Thank you!

    Thursday, July 7, 2016 7:41 PM
  • Tried this approach on a MIM SP1 lab installation at one of my customers with no success. First we saw a lot of SQL deadlock errors after after we upgraded their FIM2010 R2 lab installation to MIM (Without SP1). We can recreate the deadlocks by only running a simple workflow manipulating a bool attribute on a few selected users - some workflow instances completes fine, but most of them deadlocks...

    We then tried to upgrade the same installation further to MIM SP1 with same results - a lot of deadlocks by triggering the same test workflow. 

    So the last couple of days and nights I have been reinstalling av brand new MIM SP1 installation in their test environment and have done a painfully slow full export/import of the schema, policy and all other objects from the broken MIM installation to the new one. First everything worked beautifully - but after some testing this started to generate deadlocks to.

    Versions that i'm testing on:

    Windows Server 2012 R2
    SQL Server 2014 SP2 CU2 (12.0.5522.0)
    MIM 2016 With SP1 (4.4.1237)


    • Edited by Fredrik Melby Friday, November 18, 2016 1:23 PM Typos
    Friday, November 18, 2016 1:22 PM
  • An embarrassing misconfiguration was the source of our problems; The database compability level was set to 2014. This should be left alone with the default value (2008 level). All our deadlocks disappeared when this was corrected.
    • Marked as answer by Robert Rust Wednesday, February 8, 2017 4:32 PM
    Wednesday, November 23, 2016 2:03 PM
  • Were you able to just change the setting (stopping/restarting appropriate services of course) and then it worked? I've currently got the databases at 2014.

    • Edited by Robert Rust Wednesday, November 23, 2016 9:50 PM
    Wednesday, November 23, 2016 9:49 PM
  • Fredrik

    I would recommend opening a case as we have seen in rare cases issue with Boolean attribute on FIM MA export causing deadlock . To confirm it is the same and appropriate fix  , Also a deadlock graph would help

    Wednesday, November 30, 2016 2:24 PM
    Moderator
  • I did switch compatibility level to 2008 and my deadlock errors went away.  I've run a few sync cycles with new data and haven't encountered any. I've had a case open with Microsoft on this problem since the end of June and they have never suggested this workaround, though I fear that they may consider it a resolution without identifying the root cause.

    -Robert

    Monday, December 5, 2016 7:33 PM
  • Hello Robert,

    From what you described and the errors you are seeing, I am fairly certain this is the same issue I ran into a few times. I recently opened a case with Microsoft about the issue you are seeing and got it bugged. A fix should be coming out in the future, but the temporary solution from Microsoft was to replace the default fim_generaterequestoutput stored SPROC with a new one from Microsoft. The new stored SPROC essentially allows SQL to retry on a deadlocked task. If you still have your case open, bring this to their attention and they may discover it is the same issue and give you the updated SPROC. The only caveat is that you will have to make sure to replace the default SPROC with the updated one when you apply any FIM/MIM updates.

    Thanks.

     
    Wednesday, December 7, 2016 6:18 AM
  • Jordan we would need to see the deadlock graph to see if it matches the issue you reported as your was related to a Boolean update
    Wednesday, December 7, 2016 1:33 PM
    Moderator
  • You are correct David, sorry if I jumped the gun a little bit. Hopefully Robert Rust can get those SQL deadlock graph to you guys to identify the issue.

    Thanks.

    Wednesday, December 7, 2016 11:02 PM
  • I thought I'd post a quick update to this.  My system is performing fine with the database(s) in SQL 2008 compatibility mode and I've done my production system upgrade as well (using 2008 compatibility). I never got an answer from Microsoft as to why that was necessary or whether I'd ever be able to run in a newer mode. I'm sorry I don't have a deadlock graph available. :(

    -Robert

    Wednesday, February 8, 2017 4:35 PM
  • Hi David, this problem where not related (directly) to the FIM MA. The deadlocks we saw, could be triggered by changes made by MPRs/Workflows in the service - and changes made in the service during FIM MA export (who again triggered MPRs/Workflows for further changes). The deadlocks did occur on changes to both bool, strings and number attributes (That I tested..).

    In SQL server 2014 they have made some changes around parallel execution of queries/procedures. This information corresponds very nicely with the deadlocks we saw while the database was in 2014 version mode: FIM procedures (fim_generaterequestoutput) in the database tried to update the same table at the same time.


    Tuesday, May 2, 2017 11:57 AM