Windows Server 2008 DC Slow Boot After Power Failure

Question
I've just discovered a problem with our AD servers. We have had two long power failures in two weeks, and we are told they still haven't found the cause, so we are expecting more. Joy.
Situation: 3 DCs, 11 application servers.
DC1 - Windows 2008 Standard x64: DC, DNS (AD Integrated), DHCP (60% Scope), IAS (Wireless Auth)
DC2 - Windows 2008 Standard x64: DC, DNS (AD Integrated), DHCP (40% Scope), IAS (Wireless Auth)
DC3 - Windows 2008 Enterprise x64: DC, DNS (AD Integrated), Certificate Services
All systems on the network shut down properly after the power failure, before the UPS turned off. When the power was restored, the UPS started up, and all of the machines started up. The initial problem was with the application servers that have services running under AD accounts. While investigating this, I found that the 3 DCs are taking up to 10 minutes to boot.
I am fairly certain that this is a DNS-related issue. Looking through the event logs, nearly every warning points to a failure to contact a logon server or to resolve a name.
Initially I thought this was due to all 3 servers starting up together and 'conflicting' whilst trying to do an update. I then tried booting a single server to see if the issue went away, but it didn't. It still took up to 10 minutes to load, mainly hanging on the 'Applying Computer Settings' page. After this first server had booted, the other two booted in normal time (around 2 minutes from cold). They also had no errors in the logs during this boot.
DC1
IP: 10.28.120.21
DNS 1: 10.28.120.22
DNS 2: 10.28.120.21
DC2
IP: 10.28.120.22
DNS 1: 10.28.120.23
DNS 2: 10.28.120.22
DC3
IP: 10.28.120.23
DNS 1: 10.28.120.21
DNS 2: 10.28.120.23
NetBIOS is disabled on every machine in the network, and we have no WINS servers.
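To make the dependency in these settings explicit, here is a minimal Python sketch (an illustration, not something from the thread) that follows each DC's preferred DNS server to the DC owning that address. The dictionary just transcribes the IPs listed above, and `preferred_dns_cycle` is a hypothetical helper name. With these settings the preferred servers form a loop, so every DC depends first on another DC for name resolution:

```python
# DNS client settings for each DC, transcribed from the list above.
# Index 0 is the preferred DNS server, index 1 the alternate.
dcs = {
    "DC1": {"ip": "10.28.120.21", "dns": ["10.28.120.22", "10.28.120.21"]},
    "DC2": {"ip": "10.28.120.22", "dns": ["10.28.120.23", "10.28.120.22"]},
    "DC3": {"ip": "10.28.120.23", "dns": ["10.28.120.21", "10.28.120.23"]},
}


def preferred_dns_cycle(dcs):
    """Follow each DC's preferred DNS server to the DC that owns that IP.

    Returns the first loop found, e.g. ["DC1", "DC2", "DC3", "DC1"], or None
    if no chain of preferred servers loops back among the DCs.
    """
    by_ip = {cfg["ip"]: name for name, cfg in dcs.items()}
    for start in dcs:
        path = [start]
        current = start
        while True:
            nxt = by_ip.get(dcs[current]["dns"][0])
            if nxt is None or nxt == current:
                # Preferred server is not a DC, or the DC points to itself.
                break
            if nxt in path:
                return path[path.index(nxt):] + [nxt]
            path.append(nxt)
            current = nxt
    return None


cycle = preferred_dns_cycle(dcs)
print(" -> ".join(cycle) if cycle else "no loop")  # DC1 -> DC2 -> DC3 -> DC1
```

A loop like this means that whichever DC is booted alone, its preferred DNS server is still powered off.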
The servers that we use for the DCs are HP ProLiant DL360 G4p. They use Broadcom NICs configured as a team. The drivers have been updated to the latest version, but this hasn't made a difference. Windows is also fully up to date.
Does anyone know of a way to speed up the boot process? I see a bit of a chicken-and-egg situation with AD-integrated DNS, where one can't start without the other. I'm not sure of a way round it, though, other than going back to a non-integrated DNS solution.
One more thought: if I enabled LMHOSTS lookup or NetBIOS on each of the three DCs and configured them with the names and IP addresses of all three servers, would that get round the problem? I am guessing not, as AD requires DNS, but maybe someone could clarify?
Thanks!
Saturday, May 22, 2010 2:20 PM
Answers
There are pros and cons to pointing DCs to themselves for DNS versus pointing them to other DCs. In your case, having the servers point to each other for DNS is a problem when they are all powered up at the same time, as you experienced. If they point to themselves, they should boot faster, but you will see errors in the event log about AD complaining that the DNS zone was not loaded. In this scenario, if you cannot control the boot order of the systems, there is not much you can do about the long wait time.
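The trade-off described above can be sketched as a simplified Python model (an illustration added here, not a Windows API or tool). `preferred_offline` lists which running DCs have a preferred DNS server on a machine that is not yet up; such a DC has to wait for queries to the preferred server to time out before the alternate is tried. `self_first` is a hypothetical rewrite in which each DC prefers its own IP and uses the next DC as the alternate. The model deliberately ignores whether the local DNS service has actually loaded the AD-integrated zone, which is the caveat raised above.

```python
# Current DNS client settings (preferred first), transcribed from the question.
current = {
    "DC1": {"ip": "10.28.120.21", "dns": ["10.28.120.22", "10.28.120.21"]},
    "DC2": {"ip": "10.28.120.22", "dns": ["10.28.120.23", "10.28.120.22"]},
    "DC3": {"ip": "10.28.120.23", "dns": ["10.28.120.21", "10.28.120.23"]},
}


def self_first(dcs):
    """Hypothetical rewrite: each DC prefers its own IP, with the next DC as alternate."""
    names = list(dcs)
    return {
        name: {
            "ip": dcs[name]["ip"],
            "dns": [dcs[name]["ip"], dcs[names[(i + 1) % len(names)]]["ip"]],
        }
        for i, name in enumerate(names)
    }


def preferred_offline(dcs, running):
    """Sorted names of DCs in `running` whose preferred DNS server is not running."""
    up_ips = {dcs[name]["ip"] for name in running}
    return sorted(name for name in running if dcs[name]["dns"][0] not in up_ips)


# Booting DC1 on its own, as in the single-server test from the question.
print(preferred_offline(current, {"DC1"}))              # ['DC1']
print(preferred_offline(self_first(current), {"DC1"}))  # []
```

The sketch only computes the lists; any real change would be made in each DC's NIC (team) DNS settings, and it would trade the boot delay for the zone-not-loaded events mentioned above.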
I don't think your LMHOSTS suggestion would help, and it adds another component to manage as you add or remove DCs in your environment.
Your focus should be on having uninterrupted power. This situation should not be a common occurrence in your datacenter. I understand that may be easier said than done. However, the power is the root cause.
Visit my blog: anITKB.com, an IT Knowledge Base.
- Proposed as answer by Meinolf Weber Sunday, May 23, 2010 11:18 AM
- Marked as answer by Craig Tolley Monday, May 24, 2010 8:06 AM
Saturday, May 22, 2010 2:59 PM
All replies
Thanks Jorge.
It would be great to have enough UPS capacity to keep one DC running for up to 6 hours (which is around the length of our outage).
I will try to adjust our shutdown times to keep at least one DC running for as long as possible, to help alleviate these issues in the future.
Thanks for your response.
Monday, May 24, 2010 8:09 AM