Kerberos Troubles

Question
-
This is more of a comment than a question, but I would love to get people's feedback on the issue...
Virtually every customer that I have deployed MIIS into has issues with Kerberos. It always bubbles up when the unicodePwd attribute of the Active Directory MA does not properly set the password for new AD Accounts. In fortunate situations, I am able to resolve the issue by trusting the MIIS computer for delegation, but more often than not, Kerberos issues are much more complicated than that.
My question is: other than the 5-million-page documents Microsoft provides for troubleshooting Kerberos, does anyone have a simplified hit-list of things to check when unicodePwd does not work?
Wednesday, July 26, 2006 2:45 AM
All replies
-
Richard,
AD issues in general used to be a serious concern for us on any AD or Exchange migration, let alone an MIIS implementation. That was until we made an "AD Health Check" mandatory before any AD-dependent engagement. Ensynch's AD Health Check is based on the one PSS/ROSS developed and offers (for free) to certain Enterprise-level customers (talk to your TAM). We extended ours to be more deliverable-friendly so we can demonstrate value and provide basic documentation on the existing AD deployment.
Since making the ADHC mandatory we no longer experience any project delays (outside of immediate outages) due to AD inconsistencies. In my time I have seen AD in some pretty poor states, but I would say replication issues are the most common. Kerberos time inconsistencies are typically lower on the list of "symptoms" in my experience. In other words, if you take a broad look at all of the symptoms and chase them down individually, you'll have much more work on your hands than if you troubleshoot replication and schema inconsistencies holistically. It's my experience that if you can solve the really big replication issues, most of the other little problems will resolve themselves, because they were really symptoms of a much larger problem.
So, my recommendation is to first establish that replication is not broken. If you can find the cause of a replication or NC replication failure then you can typically resolve any Kerberos problems. On rare occasions (back in the W2k pre-SP4 days) I would need to reset the secure channel of the DC to resolve replication "Access Denied" issues (those are some of the worst ones to deal with).
As for tools - start with DCDIAG, REPLMON, and the like. You need to isolate the DCs that are not replicating and then verify the DNS records (DNSLINT is a good start) and that you have proper connectivity (no intermediate firewalls blocking any interesting ports).
Wednesday, July 26, 2006 2:30 PM
-
Thank you for your response, Brad! I agree with a mandatory ADHC drill, especially with our Simple Sign-On offerings that absolutely require a "Healthy AD". Unfortunately, there does not seem to be the "developer-friendly" approach to this that I was hoping for.
Thursday, July 27, 2006 7:22 PM
-
Yesterday I read through MIIS Series 1.4 and in one of the chapters the following is recommended:
3. Increase the default Kerberos version 5 authentication protocol time-out value on the MIIS 2003 with SP1 server by adding the registry parameter KdcWaitTime to the following registry key and setting the time-out value to 30 seconds. This time-out value must be increased from the default of 5 seconds to ensure that you do not experience Kerberos protocol time-out issues caused by network latency.
   1. Start Registry Editor.
   2. Under the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters key, create a REG_DWORD value named KdcWaitTime and set its value to 30 (seconds).
   3. Restart the MIIS 2003 with SP1 server for the changes to take effect.
See: http://www.microsoft.com/technet/security/topics/identitymanagement/idmanage/P2Ident_4.mspx?mfr=true
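As a rough sketch of the registry change described in the steps above (this is not part of the MIIS guide), the Python snippet below renders the text of a .reg file that sets KdcWaitTime to 30 under the Kerberos Parameters key. The function name build_reg_file is made up for illustration, and the script only builds the text; it does not touch the registry. Try any such file in a test environment before importing it.

```python
# Hypothetical helper: renders .reg file text for the Kerberos client
# parameters key. It only builds a string; nothing is written to the registry.
KERBEROS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control"
    r"\Lsa\Kerberos\Parameters"
)


def build_reg_file(values):
    """Return .reg text that sets each name in `values` as a REG_DWORD."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{KERBEROS_KEY}]"]
    for name, number in values.items():
        if not 0 <= number <= 0xFFFFFFFF:
            raise ValueError(f"{name} does not fit in a REG_DWORD")
        # .reg files store DWORD values as eight hexadecimal digits.
        lines.append(f'"{name}"=dword:{number:08x}')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    # 30 seconds, the value recommended in the quoted step.
    print(build_reg_file({"KdcWaitTime": 30}))
```

Saving the printed text as a .reg file and importing it would be one way to apply the same value on several servers; a restart is still needed, as the last step says.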
In the Password Management section of the same series, there is a note about time skew in the Troubleshooting chapter (table 6.1). It says:
"Make sure that the SPN set for the MIIS 2003 with SP1 service account is correct."
and "Check the time skew between the domain controller and the MIIS server. For Kerberos protocol authentication, this time skew should by default not be more than five minutes."
See: http://www.microsoft.com/technet/security/topics/identitymanagement/idmanage/p2pass_5.mspx?mfr=true
and http://technet2.microsoft.com/WindowsServer/en/library/6ee8470e-a0e8-40b2-a84f-dbec6bcbd8621033.mspx?mfr=true
Friday, July 28, 2006 7:04 AM
-
Thank you, Danny! I missed this as part of the SP1 update. I went back to one of my old stand-alone VMs that I could never get this working on, installed MIIS SP1, changed this registry setting, and then unicodePwd started to work. Admittedly, I'm not sure if it was an SP1 fix or this registry setting, but nonetheless, it works!
Friday, July 28, 2006 2:58 PM
-
That is strange - I would expect the Default Domain Policy to override this, but it may not be covered by that policy the way the main domain-wide Kerberos settings are.
I've never had to mess with the SPN of the service account, and time skew will affect more than just MIIS.
Saturday, July 29, 2006 2:33 AM
-
Ok, I was able to confirm that KdcWaitTime is not part of Group Policy, but if you want to use Group Policy to set this value (and some other useful Kerberos values for troubleshooting) then you can use an ADM template I've created. Now, the caveat is that using an ADM template is effectively the same as importing a .REG file, since it tattoos the registry (removing the policy does not remove the setting).
Here is the ADM template file. Just copy this code into "kerberos.adm" and import it into your Group Policy by right-clicking the Administrative Templates node in the GP and selecting the "Add/Remove Templates" option:
; Placeholder category shown by older policy editors (ADM version 2 or lower)
#if version <= 2
CLASS MACHINE
CATEGORY !!GPEOnly
    POLICY !!GPEOnlyPolicy
        KEYNAME "Software\Policies"
        PART !!GPEOnly_Tip1 TEXT
        END PART
        PART !!GPEOnly_Tip2 TEXT
        END PART
    END POLICY
END CATEGORY
#endif
; Kerberos client parameters; every policy below writes under the same key
#if version >= 3
CLASS MACHINE
CATEGORY !!KRB_PARAMS
    KEYNAME "SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters"
    POLICY !!SET_MAXPACKETSIZE
        EXPLAIN !!MAXPACKETSIZE_HELP
        PART !!MAXPACKETSIZE NUMERIC REQUIRED
            VALUENAME "MaxPacketSize"
            MIN 1 MAX 2000 DEFAULT 1465
        END PART
        PART !!MAXPACKETSIZE_TIP TEXT
        END PART
    END POLICY
    POLICY !!LOGLEVEL
        EXPLAIN !!LOGLEVEL_HELP
        VALUENAME "LogLevel"
    END POLICY
    POLICY !!SET_KDCWAITTIME
        EXPLAIN !!KDCWAITTIME_HELP
        PART !!KDCWAITTIME NUMERIC REQUIRED
            VALUENAME "KdcWaitTime"
            MIN 1 MAX 300 DEFAULT 10
        END PART
    END POLICY
    POLICY !!SET_KDCBACKOFFTIME
        EXPLAIN !!KDCBACKOFFTIME_HELP
        PART !!KDCBACKOFFTIME NUMERIC REQUIRED
            VALUENAME "KdcBackoffTime"
            MIN 1 MAX 300 DEFAULT 10
        END PART
    END POLICY
    POLICY !!SET_KDCSENDRETRIES
        EXPLAIN !!KDCSENDRETRIES_HELP
        PART !!KDCSENDRETRIES NUMERIC REQUIRED
            VALUENAME "KdcSendRetries"
            MIN 0 MAX 99 DEFAULT 3
        END PART
    END POLICY
END CATEGORY
#endif
[strings]
GPEOnly_Tip1="The kerberos.adm file is for Windows 2000 and later only"
GPEOnly_Tip2="None of its policies will be displayed here"
GPEOnly="GP Only"
GPEOnlyPolicy="Kerberos.ADM"
KRB_PARAMS="Kerberos Parameters"
SET_MAXPACKETSIZE="Set MaxPacketSize"
MAXPACKETSIZE_HELP="The Windows 2000/2003 Kerberos Authentication package is the default in Windows 2000/2003. It coexists with challenge/response (NTLM) and is used in instances in which both a client and server can negotiate Kerberos. Request for Comments (RFC) 1510 states that when a client contacts the Key Distribution Center (KDC), it should send a User Datagram Protocol (UDP) datagram to port 88 at the KDC's IP address. The KDC should respond with a reply datagram to the sending port at the sender's IP address.\n\nWindows 2000/2003, by default, uses UDP when the data can be fit in packets under 2,000 bytes. Any data above this value uses TCP to carry the packets.\n\nNOTE:\nIn Windows 2000 the default value is 2,000 bytes.\nIn Windows 2003 the default value is 1,465 bytes."
MAXPACKETSIZE="Bytes:"
MAXPACKETSIZE_TIP="Range is from 1 to 2000. Use 1 to force Kerberos to use TCP."
LOGLEVEL="Kerberos Event Logging"
LOGLEVEL_HELP="Windows offers the capability of tracing detailed Kerberos events through the event log mechanism. You can use this information when you troubleshoot Kerberos. All Kerberos errors are logged to the System log."
SET_KDCWAITTIME="KDCWaitTime"
KDCWAITTIME_HELP="This value is the time Windows waits for a response from a KDC."
KDCWAITTIME="Seconds:"
SET_KDCBACKOFFTIME="KDCBackoffTime"
KDCBACKOFFTIME_HELP="This value is the time between successive calls to the KDC if the previous call failed."
KDCBACKOFFTIME="Seconds:"
SET_KDCSENDRETRIES="KDCSendRetries"
KDCSENDRETRIES_HELP="This value is the number of times that a client will try to contact a KDC."
KDCSENDRETRIES="Retries:"
Disclaimer: Always validate these changes in a test environment - this is provided as-is with no guarantees!
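To illustrate the MaxPacketSize behaviour described in the template's help text, here is a small Python sketch. The function kerberos_transport is hypothetical and only models the size threshold; it is not how Windows implements the choice. The help text says "under" and "above" the threshold, so treating a message of exactly the threshold size as TCP is an assumption.

```python
def kerberos_transport(message_bytes, max_packet_size=1465):
    """Return "UDP" or "TCP" for a Kerberos message of the given size.

    The default of 1465 is the Windows 2003 value from the template;
    Windows 2000 would use 2000.
    """
    if max_packet_size < 1:
        raise ValueError("MaxPacketSize must be at least 1")
    # Assumption: messages strictly smaller than the threshold use UDP,
    # everything else falls back to TCP.
    return "UDP" if message_bytes < max_packet_size else "TCP"


if __name__ == "__main__":
    for size in (500, 1800):
        print(size, kerberos_transport(size), kerberos_transport(size, 1))
```

With a threshold of 1, no real message is small enough for UDP, which is why that setting forces TCP.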
I have used the "MaxPacketSize" entry to troubleshoot networks that were inadvertently filtering UDP across WAN links, which kills Kerberos. Setting this value to 1 forces Kerberos to use TCP; if that solves the authentication issues, it confirms a UDP problem.
Monday, July 31, 2006 5:32 PM
-
Ok, I've posted some additional information on my blog concerning this:
http://idchaos.blogspot.com/2006/07/miis-solution-troubleshooting-kerberos.html
Monday, July 31, 2006 10:56 PM
-
Brad, thank you for providing such a thorough answer! I will put this to use the next time I encounter issues.
Wednesday, August 2, 2006 9:00 PM