Weird server behavior: server will not allow RDP, local login, or remote access to the management console
Monday, March 11, 2013 2:00 PM
I have 3 offices, each with an AD server. Originally, server FS held all of the FSMO roles. It also ran DNS and certificate services. DNS replicates via DFS, as does SYSVOL. About 3 weeks ago, FS died suddenly, with no way to restore it. I forcibly seized all of the roles from FS to server D. D held the roles while I rebuilt FS on new hardware. I gave it the same name, joined it back to the domain, and added its original roles back.
I have since added a third server, DC, and it holds all of the roles now. The roles transferred successfully and without errors.
I now have very strange errors on FS and D. I have rebuilt WMI with the wbem command and reset IP using the netsh command.
Here's what happens. I can reboot either D or FS. When I do, everything works OK, with only some minor complaints in the event logs. At this point, standard communication tests succeed: RPC works, PortQry is happy, and I can log in remotely or locally with no problem. After a while, anywhere from 4 minutes to 4 hours to 72 hours (never more than 72 hours), D and FS go into a state where they cannot communicate with each other. RPC failures begin to occur, and then I cannot log in to either server, locally or remotely, and AD communications fail. Clients can still log in to network shares and print. DNS works for clients.
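One way to pin down when the servers stop talking is to log a periodic port check from a machine that stays healthy (the new DC, for example). The sketch below is only an illustration and not something the original poster ran: the host names D and FS come from the thread, but the port list (135 RPC endpoint mapper, 389 LDAP, 445 SMB, 3389 RDP) and the function names `check_port` and `check_hosts` are assumptions made for the example.

```python
import socket
from datetime import datetime

# Assumed ports of interest; adjust to whatever your environment actually uses.
PORTS = {135: "RPC endpoint mapper", 389: "LDAP", 445: "SMB", 3389: "RDP"}


def check_port(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_hosts(hosts, ports=PORTS, timeout=3.0):
    """Return one timestamped status line per host/port pair."""
    lines = []
    for host in hosts:
        for port, label in ports.items():
            state = "open" if check_port(host, port, timeout) else "FAILED"
            stamp = datetime.now().isoformat(timespec="seconds")
            lines.append(f"{stamp} {host}:{port} ({label}) {state}")
    return lines
```

Calling `check_hosts(["D", "FS"])` every few minutes from a scheduled task and appending the lines to a file would show which port stops answering first and roughly when. A TCP connect only proves the port is listening, so it complements rather than replaces PortQry and the event logs.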
This never, not even once, has happened to the new server DC.
As soon as I reboot (I have to hard reset the server each time), the issues go away for a little while. I would normally suspect a firewall, a blocked port, etc., but I do not believe that to be the case here. My next guess would be a network stack issue, but I am not sure about that, since it works and then stops...
I need some help troubleshooting this. Any ideas? I suspect that it may be related to AD, security, etc... or bad metadata in AD because of the forced move.
Or, and this is what I think may actually be at the root of the issues: DFS does not seem to work well in this environment. Since we are using DFS for server-to-server AD updates, some part of it may be broken and affecting DNS and AD, after which other services time out and become unhappy.
Additional information: the backups seem to stall at "scanning system files", so they are also "timing out". I have removed, rebooted and reinstalled the server backup software. Still not working.
On the FS server, I also get SPN errors, access denied when trying to create the WRM instance when restarting the server or service.
- Edited by David M Nuvo Monday, March 11, 2013 3:55 PM
Monday, March 11, 2013 7:37 PM
Looks like you have a nightmare scenario on your hands... but let's try to help you.
I found some interesting things from what you wrote:
1. Did you give the new FS the same name as the old one, which was also a DC? Did you clean up the AD metadata first? In this case I would recommend giving the new server a different name... this could be part of the problem.
2. DFS: have you checked replication between nodes? DCDIAG results? REPADMIN status? See the following links to verify replication between DCs: http://technet.microsoft.com/en-us/library/cc811551%28v=ws.10%29.aspx, http://technet.microsoft.com/en-us/library/cc949120%28v=ws.10%29.aspx
3. You also mentioned Certificate Services... did you have the root CA there? Did you save the private key so you can rebuild it?
Just some questions to get a better overview...
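Since the tools only work for a while after a reboot, it may help to capture their output to files on a schedule, so there is something to compare once the servers go quiet. The following is a minimal sketch under stated assumptions: `dcdiag /v /c /e`, `dcdiag /test:dns`, `repadmin /replsummary` and `repadmin /showrepl` are real switches, but which of them matter for this problem is a guess, and the `capture` helper and file naming are hypothetical.

```python
import subprocess
from datetime import datetime
from pathlib import Path

# Real dcdiag/repadmin switches; the selection itself is an assumption.
COMMANDS = {
    "dcdiag_verbose": ["dcdiag", "/v", "/c", "/e"],
    "dcdiag_dns": ["dcdiag", "/test:dns"],
    "repadmin_summary": ["repadmin", "/replsummary"],
    "repadmin_showrepl": ["repadmin", "/showrepl"],
}


def capture(commands, out_dir, timeout=600):
    """Run each command and write its output to <out_dir>/<name>_<timestamp>.txt."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    written = []
    for name, argv in commands.items():
        try:
            result = subprocess.run(
                argv, capture_output=True, text=True,
                errors="replace", timeout=timeout,
            )
            body = result.stdout + result.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A hung or missing tool is itself useful evidence, so record it.
            body = f"could not run {argv}: {exc}\n"
        path = out_dir / f"{name}_{stamp}.txt"
        path.write_text(body, encoding="utf-8")
        written.append(path)
    return written
```

Running `capture(COMMANDS, r"C:\diag")` on each DC right after a reboot and again at intervals gives a before-and-after record; the output directory here is only an example path.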
MCP | Microsoft Certified Professional
MCTS 70-640 | Microsoft Certified Technology Specialist: Windows Server 2008 Active Directory, Configuration
MCTS 70-642 | Microsoft Certified Technology Specialist: Windows Server 2008 Network Infrastructure, Configuration
MCTS 70-680 | Microsoft Certified Technology Specialist: Windows 7, Configuration
Monday, March 11, 2013 7:48 PM
I currently cannot rename FS. Too many legacy client machines point to this machine by name.
Repadmin works fine after a reboot. Once communications go south, no tools work. Syncing right after a reboot works fine among all DCs. All tools report everything as fine. I have run some dcdiag tests, but I am not too familiar with them. What would you recommend?
Yes, FS and D are both AD servers. Yes, FS was a root-level enterprise CA. I just reissued new certs to all the existing servers. Seemed to work OK.
Which dcdiag tests would you suggest running?
Yeah, it is a real pain in the arse. I had to reboot both servers at lunch today. Once the communication is lost, people at the office cannot log into the domain, and they lose access to shared printers and drives. Only a reboot fixes the issue, and that is only temporary.
We have GFI AV on the servers. Maybe that has something to do with it? There is no firewall in the AV product.
Tuesday, March 12, 2013 10:12 AM
Could you not rename FS and add a DNS name for the old FS name that points to the new one? That would allow your legacy clients to still reach it under the old name.
- Marked As Answer by Cheers ZHANG, Microsoft Contingent Staff, Moderator Tuesday, March 19, 2013 8:09 AM
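If the rename-plus-alias route suggested above were taken, a quick check that the old name resolves to the same address as the renamed server could look like the sketch below. It is only an illustration: the helper names `resolve` and `alias_matches` are hypothetical, and the new server name used in the usage note is made up, since no new name appears in the thread.

```python
import socket


def resolve(name):
    """Return the set of IPv4 addresses a name resolves to (empty if it does not resolve)."""
    try:
        return set(socket.gethostbyname_ex(name)[2])
    except OSError:
        return set()


def alias_matches(alias, target):
    """True if alias resolves and shares at least one address with target."""
    alias_ips = resolve(alias)
    return bool(alias_ips) and bool(alias_ips & resolve(target))
```

For example, `alias_matches("FS", "FS-NEW")` would be run from a legacy client, where FS-NEW stands in for whatever the renamed server is called. Name resolution is only part of it; whether file shares accept connections under the alias is a separate question this check does not answer.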
Monday, April 01, 2013 3:37 PM
Turns out that removing our antivirus, GFI Vipre, fixed it. Now all is OK with the universe. FYI.