Where do I start when troubleshooting File Replication issues on domain controllers?

    Question

  • I recently (a couple of weeks ago) became part of a small, two-person IT team.  Looking at Event Viewer, we seem to have several replication issues among the DCs.  The entire network is on one subnet, with three domain controllers in total.  Server C holds all FSMO roles, all three are global catalog servers, all run Active Directory-integrated DNS, and the domain functional level is Windows 2000 native.

    Server A:  NTDS Replication Warning 2089, no DNS errors, File Replication warning regarding replication from Server C to Server A (event viewer is full of this error!)

    Server B: DNS Event Viewer is FULL of Warning Event 5501 (bad packets), File Replication Service Warning 13508 regarding replication from Server C to Server B.

    Server C: Directory Service errors 1864, 2042 (2042 pertains to Server A).  Intermittent File Replication Service errors 13568 (journal wrap errors) listed on 2/23, 3/6 and 5/12. 

    DNS forwarders set up on Server A only, none showing on B or C.

    DCDiag tests:  All three DCs failed systemlog, Servers A and B failed frsevent, and two Replications tests passed even though the output shows replication errors between Server A and Server C and tombstone lifetime warnings.

    I believe that an old dc may have not been demoted properly.  Not sure exactly where to start in cleaning up these problems!  Any direction would be most appreciated! 

    Thanks,

    K. Nelson

    Tuesday, June 12, 2012 10:03 PM

Answers

  • Image**** the OS is W2K3, sp2
    Pakapp**** the OS is W2K8, sp2 and it is a virtual server
    Paksher**** the OS is W2K, sp4

    Which one is the PDC Emulator? Run a netdom query fsmo to see.

    And what is the VM host? If Hyper-V, you must *partially* disable the host's time service. If VMware, you must fully disable the host's time service. Read the VMware KBs regarding this issue in the link below.

    Virtualizing Domain Controllers and the Windows Time Service
    Published by acefekay on Aug 23, 2011 at 1:15 AM
    http://msmvps.com/blogs/acefekay/archive/2011/08/23/virtualizing-domain-controllers-and-the-windows-time-service.aspx

    and

    Running Domain Controllers in Hyper-V
    http://technet.microsoft.com/en-us/library/d2cae85b-41ac-497f-8cd1-5fbaa6740ffe(v=ws.10)#bkmk1_planning_to_virtualize_domain_controllers

    .

    And I hope you've never used the host's snapshot feature and restored a DC from a snapshot! Never do that!!!!

    .

    One final concern that may or may not factor in –
    Could we have some type of intermittent connectivity issue that is not evident?  I ask because occasionally during the day, I will have problems remotely accessing the Pakapp** dc.  I immediately ping the server, which responds fine.  Usually a minute or two later, I can access remotely again.  Am I wrong to assume that just because the nic status shows that it’s been connected 39 days, it is therefore always connected?  Should this be a concern and if so, how would you advise checking for connectivity issues?

    Sounds like it's hardware related - either a bad NIC, wire or switch port.

    Maybe this could be the cause of your issues.

    .

    Finally, I ran dcdiag /v on all the dc’s this afternoon and the only one showing a couple failures is Paksher****, the one with the failing drives.  Just to add some more comedy to the situation, I made a visit to the server room this morning to physically take a look and discovered that 2 of the 3 power supplies are also dead! 

    You have a lot on your plate!

    .


    Ace Fekay
    MVP, MCT, MCITP EA, MCTS Windows 2008/R2, Exchange 2007 & Exchange 2010, Exchange 2010 EA, MCSE & MCSA 2003/2000, MCSA Messaging 2003
    Microsoft Certified Trainer
    Microsoft MVP - Directory Services
    Complete List of Technical Blogs: http://www.delawarecountycomputerconsulting.com/technicalblogs.php

    This post is provided AS-IS with no warranties or guarantees and confers no rights.


    Wednesday, June 20, 2012 11:58 PM

All replies

  • For starters, if you have a polluted directory service, see:
    http://blogs.dirteam.com/blogs/paulbergson/archive/2009/06/09/active-directory-cleanup-the-most-common-question-i-see.aspx

    I have not run across 2089 before, but it appears to be an informational warning about an NTDS backup:
    http://eventid.net/display.asp?eventid=2089&source=

    Are there any communication problems between the two DCs?
    http://eventid.net/display.asp?eventid=13508&eventno=349&source=NtFrs&phase=1

    The bad packets have nothing to do with AD; could this be any type of hardware issue?  The 5501 message should be posted in the General Windows forum.

    2042 appears to be an Exchange/Appletalk issue.  I would suggest you repost in the Exchange forum.

    1864 shows you have replication issues and tombstone issues.  I would suggest you consider just demoting and re-promoting C, but you first need to learn why you lost connectivity.  Once that is resolved, re-promote.  If you can't demote, try dcpromo /forceremoval.

    --
    Paul Bergson
    MVP - Directory Services
    MCITP: Enterprise Administrator
    MCTS, MCT, MCSE, MCSA, Security+, BS CSci
    2008, Vista, 2003, 2000 (Early Achiever), NT4
    http://blogs.dirteam.com/blogs/paulbergson  Twitter @pbbergs
    Please no e-mails, any questions should be posted in the NewsGroup. This posting is provided "AS IS" with no warranties, and confers no rights.

    Wednesday, June 13, 2012 1:49 AM
    Moderator
  • Hi,

    All the above links will help you get started. As you can see, there are replication errors cropping up, and we need to find out why replication is failing.

    Are all of them in a single site or in different sites?

    Download the PortQry UI and make sure none of the ports are blocked between the DCs. Refer to http://geekswithblogs.net/TSCustomiser/archive/2007/05/09/112357.aspx

    Once you have confirmed all ports are open, try to replicate again with repadmin /syncall. Make sure you do this in an elevated command prompt.

    Check the errors.

    Out of the 3 servers, I believe all are active? Are you able to open their sysvol folders from the Run command? Type \\servername\ and from there you can browse.

    In terms of DNS, all DCs should point to themselves as the primary DNS server and to the other DCs as secondary DNS servers in TCP/IP properties.


    Regards, Mohan R Sr. Administrator - Server Support


    Wednesday, June 13, 2012 3:58 AM
  • And just to add, uninstall any AV software to ensure that its network protection features are not blocking necessary traffic. I had a customer with that issue about 18 months ago. Darn AV stopped replication cold.

    .

    Server Engineer suggested PortQry to determine any blocked ports. Just to add, choose the "Domain & Trusts" option when you run it between all the DCs, so you will be running it 6 times: A to B & C, B to A & C, C to A & B.
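
    To keep track of those six runs, here is a minimal Python sketch that lists every source-to-target pair for three DCs. The labels A, B and C are the placeholders used in this thread, not real host names, and the script only prints a checklist; it does not call PortQry.

```python
from itertools import permutations

def dc_pairs(dcs):
    """Return every ordered (source, target) pair of distinct DCs."""
    return list(permutations(dcs, 2))

# Placeholder labels from the thread; substitute your own server names.
pairs = dc_pairs(["A", "B", "C"])
for source, target in pairs:
    print(f"Run PortQry on {source} against {target}")
```

    Three DCs give six ordered pairs, because a port can be open in one direction and blocked in the other.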

    .

    I totally agree with Paul about "C" with the 1864. FYI, just to add, you may have to run a metadata cleanup to remove its reference after the /forceremoval (link below).

    .

    For any old DCs that you believe haven't been demoted properly, they need to be cleaned out of the AD database. However, none of your event log errors, based on your post, indicate that any other DCs are failing to replicate. More info on the cleanup process is in the following link. I would just follow the procedure to see if there are any. If not, just cancel it.

    Complete Step by Step Guideline to Remove an Orphaned Domain controller (including seizing FSMOs, running a metadata cleanup, cleanup DNS (Nameserver tab), AD Sites (old DC references), and more)
    Published by Ace Fekay, MCT, MVP DS on Oct 5, 2010 at 12:14 AM
    http://msmvps.com/blogs/acefekay/archive/2010/10/05/complete-step-by-step-to-remove-an-orphaned-domain-controller.aspx

    .

    And just to double check, let's make sure there are no duplicate AD integrated zones in the AD database:

    Using ADSI Edit to Resolve Conflicting or Duplicate AD Integrated DNS zones
    Published by acefekay on Sep 2, 2009 at 2:34 PM
    http://msmvps.com/blogs/acefekay/archive/2009/09/02/using-adsi-edit-to-resolve-conflicting-or-duplicate-ad-integrated-dns-zones.aspx

    .

    .

    Are the DCs in one Site or spread among multiple locations?


    Ace Fekay


    Wednesday, June 13, 2012 4:40 AM
  • Thank you all so very much for pointing me in the right direction!  


    Event 2089 was related to Active Directory partition back up. Our Symantec product had stopped working correctly and the other IT person is working on resolving that issue.

    The DNS event viewer is full of Event 5501 (bad packets). I assumed that since it showed up in the DNS logs it was an issue.  There are Event 590 warnings in the Application log, and Event 1 error messages in the System log pertaining to the battery.  From what I can tell, the battery either needs to be reconditioned or replaced, and the firmware on the server needs upgrading.  I was going to put this off until the replication problems were fixed. However, it makes sense that these could have something to do with the issues at hand, so I will add those tasks to my priority list.

    Event 2042 - We don't use Exchange, and we only have one Mac on the network.  This is the exact message:
    Log Name:      Directory Service
    Source:        Microsoft-Windows-ActiveDirectory_DomainService
    Date:          6/12/2012 12:55:27 PM
    Event ID:      2042
    Task Category: Replication
    Level:         Error
    Keywords:      Classic
    User:          ANONYMOUS LOGON
    Computer:      {SERVERC}.{domain}.com
    Description:
    It has been too long since this machine last replicated with the named source machine. The time between replications with this source has exceeded the tombstone lifetime. Replication has been stopped with this source.

     The reason that replication is not allowed to continue is that the two DCs may contain lingering objects.  Objects that have been deleted and garbage collected from an Active Directory Domain Services partition but still exist in the writable partitions of other DCs in the same domain, or read-only partitions of global catalog servers in other domains in the forest are known as "lingering objects".  If the local destination DC was allowed to replicate with the source DC, these potential lingering object would be recreated in the local Active Directory Domain Services database.

    Time of last successful replication:
    2011-12-19 06:46:08
    Invocation ID of source directory server:
    3ec5a3bc-1691-40d2-9a64-17387be83b95
    Name of source directory server:
    b41adbb2-c739-47ad-bb98-e1216516bf35._msdcs.***.com
    Tombstone lifetime (days):
    60

    The replication operation has failed.


    User Action:
      The action plan to recover from this error can be found at http://support.microsoft.com/?id=314282.

     If both the source and destination DCs are Windows Server 2003 DCs, then install the support tools included on the installation CD.  To see which objects would be deleted without actually performing the deletion run "repadmin /removelingeringobjects <Source DC> <Destination DC DSA GUID> <NC> /ADVISORY_MODE". The eventlogs on the source DC will enumerate all lingering objects.  To remove lingering objects from a source domain controller run "repadmin /removelingeringobjects <Source DC> <Destination DC DSA GUID> <NC>".

     If either source or destination DC is a Windows 2000 Server DC, then more information on how to remove lingering objects on the source DC can be found at http://support.microsoft.com/?id=314282 or from your Microsoft support personnel.

     If you need Active Directory Domain Services replication to function immediately at all costs and don't have time to remove lingering objects, enable replication by setting the following registry key to a non-zero value:

    Registry Key:
    HKLM\System\CurrentControlSet\Services\NTDS\Parameters\Allow Replication With Divergent and Corrupt Partner

     Replication errors between DCs sharing a common partition can prevent user and compter acounts, trust relationships, their passwords, security groups, security group memberships and other Active Directory Domain Services configuration data to vary between DCs, affecting the ability to log on, find objects of interest and perform other critical operations. These inconsistencies are resolved once replication errors are resolved.  DCs that fail to inbound replicate deleted objects within tombstone lifetime number of days will remain inconsistent until lingering objects are manually removed by an administrator from each local DC.  Additionally, replication may continue to be blocked after this registry key is set, depending on whether lingering objects are located immediately.


    Alternate User Action:

    Force demote or reinstall the DC(s) that were disconnected.

    Event Xml:
    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
      <System>
        <Provider Name="Microsoft-Windows-ActiveDirectory_DomainService" Guid="{0e8478c5-3605-4e8c-8497-1e730c959516}" EventSourceName="NTDS Replication" />
        <EventID Qualifiers="49152">2042</EventID>
        <Version>0</Version>
        <Level>2</Level>
        <Task>5</Task>
        <Opcode>0</Opcode>
        <Keywords>0x8080000000000000</Keywords>
        <TimeCreated SystemTime="2012-06-12T17:55:27.162605400Z" />
        <EventRecordID>13035</EventRecordID>
        <Correlation />
        <Execution ProcessID="640" ThreadID="1032" />
        <Channel>Directory Service</Channel>
        <Computer>SERVERC.{DOMAIN}.com</Computer>
        <Security UserID="S-1-5-7" />
      </System>
      <EventData>
        <Data>2011-12-19 06:46:08</Data>
        <Data>3ec5a3bc-1691-40d2-9a64-17387be83b95</Data>
        <Data>b41adbb2-c739-47ad-bb98-e1216516bf35._msdcs.paksher.com</Data>
        <Data>60</Data>
        <Data>Allow Replication With Divergent and Corrupt Partner</Data>
        <Data>System\CurrentControlSet\Services\NTDS\Parameters</Data>
      </EventData>
    </Event>
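
    As a sanity check on why this event fired, here is a minimal Python sketch that reads the two timestamps and the tombstone lifetime from a trimmed copy of the Event XML above and compares the gap. It assumes the "last successful replication" value is in UTC, which the event text does not state; the function name is made up for this example.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Trimmed copy of the Event XML above (only the fields used here).
EVENT_XML = """
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <EventID Qualifiers="49152">2042</EventID>
    <TimeCreated SystemTime="2012-06-12T17:55:27.162605400Z" />
  </System>
  <EventData>
    <Data>2011-12-19 06:46:08</Data>
    <Data>3ec5a3bc-1691-40d2-9a64-17387be83b95</Data>
    <Data>b41adbb2-c739-47ad-bb98-e1216516bf35._msdcs.paksher.com</Data>
    <Data>60</Data>
  </EventData>
</Event>
"""

def replication_gap(event_xml):
    """Return (whole days since last successful replication, tombstone lifetime in days)."""
    root = ET.fromstring(event_xml)
    data = [d.text for d in root.findall("e:EventData/e:Data", NS)]
    # Assumption: the "last successful replication" value is in UTC.
    last_ok = datetime.strptime(data[0], "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    stamp = root.find("e:System/e:TimeCreated", NS).get("SystemTime")
    # Keep whole seconds only; the event carries more fractional digits than strptime accepts.
    logged = datetime.strptime(stamp[:19], "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return (logged - last_ok).days, int(data[3])

gap_days, tombstone_days = replication_gap(EVENT_XML)
print(f"{gap_days} days since last replication; tombstone lifetime is {tombstone_days} days")
print("Past tombstone lifetime" if gap_days > tombstone_days else "Within tombstone lifetime")
```

    With the values in this event the gap is far beyond the 60-day tombstone lifetime, which matches the message that replication with this source was stopped.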

    Mohan, all of the domain controllers are in one single site.  Yes, I can access and open the sysvol share of all three from a run command.  I will check the IP addresses in each server today.

    I will start implementing all of your suggestions over the next day or two and will post an update either at the end of this week or first thing next week.  Thanks again for all of the help!

                 
    • Edited by KatNelson Wednesday, June 13, 2012 1:54 PM
    Wednesday, June 13, 2012 1:53 PM
  • After working on this last week, it appears that we have cleaned up a majority of the issues.  Replication appears to be working, and we are no longer seeing any of the "tombstone" messages.  I ran FRSDIAG tests on all dc's this morning with the following results - I will work on troubleshooting these today unless anyone has something specific they can share regarding the results:

    RESULTS FROM SERVER A:
    FAILED repadmin /showreps
    Default-First-Site-Name\SERVERA
    DSA Options : IS_GC
    objectGuid  : b8a3f55a-42a8-4c00-81e3-e36648501aaa
    invocationID: c959caf9-3458-44f9-9eeb-2c916502c348

    DsBindWithCred to 192.168.1.254 failed with status 87 (0x57):


    RESULTS FROM SERVER C:
    FAILED repadmin /showreps
    Default-First-Site-Name\SERVERC
    DSA Options : IS_GC
    objectGuid  : b41adbb2-c739-47ad-bb98-e1216516bf35
    invocationID: 3ec5a3bc-1691-40d2-9a64-17387be83b95

    DsBindWithCred to 192.168.1.247 failed with status 87 (0x57):
    <SndCsMain:                     4336:   877: S0: 15:38:46> :SR: Cmd 01444270, CxtG 795eab1d, WS ERROR_RETRY, To   SERVERA.(DOMAIN).com Len:  (366) [SndFail - rpc call]
    <SndCsMain:                     4336:   904: S0: 15:38:46> :SR: Cmd 01444270, CxtG 795eab1d, WS ERROR_RETRY, To   SERVERA.(DOMAIN).com Len:  (366) [SndFail - Send Penalty]
    <SndCsMain:                     2660:   877: S0: 15:39:04> :SR: Cmd 01447340, CxtG 82c79f29, WS ERROR_RETRY, To   SERVERA.(DOMAIN).com Len:  (366) [SndFail - rpc call]

    RESULTS FROM SERVER B:

    Default-First-Site-Name\SERVERB
    DSA Options : IS_GC
    objectGuid  : c81f2f5c-934f-4e7a-b346-ad12c49089d5
    invocationID: c81f2f5c-934f-4e7a-b346-ad12c49089d5

    DsBindWithCred to 192.168.1.101 failed with status 87 (0x57):

    <SndCsMain:                     5436:   895: S0: 15:32:48> :SR: Cmd 01620cd0, CxtG 49e1c4de, WS ERROR_RETRY, To   SERVERA.(DOMAIN).com Len:  (370) [SndFail - Send Penalty]
    <SndCsMain:                     5436:   868: S0: 15:37:49> :SR: Cmd 01620be8, CxtG 49e1c4de, WS ERROR_RETRY, To   SERVERA.(DOMAIN).com Len:  (370) [SndFail - rpc call]
    <SndCsMain:                     5436:   895: S0: 15:37:49> :SR: Cmd 01620be8, CxtG 49e1c4de, WS ERROR_RETRY, To   SERVERA.(DOMAIN).com Len:  (370) [SndFail - Send Penalty]

    I also ran DCDIAG with the /DnsAll /e /v switches and the only tests that failed were the systemlog tests (with last week's errors from before we did our cleanup).

    The only other area that remains a problem is our firewall.  When I run the Port Query tool, these issues remain:

    SERVER A TO SERVER C
    UDP port 389 (unknown service): LISTENING or FILTERED
    UDP port 88 (kerberos service): LISTENING or FILTERED
    portqry.exe -n 192.168.1.247 -e 88 -p BOTH exits with return code 0x00000002
    UDP port 137 (netbios-ns service): LISTENING or FILTERED
    UDP port 138 (netbios-dgm service): LISTENING or FILTERED
    TCP port 42 (nameserver service): NOT LISTENING
    portqry.exe -n 192.168.1.247 -e 42 -p TCP exits with return code 0x00000001

    SERVER A TO SERVER B
    UDP port 389 (unknown service): LISTENING or FILTERED
    UDP port 88 (kerberos service): LISTENING or FILTERED
    portqry.exe -n 192.168.1.101 -e 88 -p BOTH exits with return code 0x00000002
    UDP port 137 (netbios-ns service): LISTENING or FILTERED
    UDP port 138 (netbios-dgm service): LISTENING or FILTERED
    portqry.exe -n 192.168.1.101 -e 138 -p UDP exits with return code 0x00000002


    SERVER B TO SERVER A
    UDP port 389 (unknown service): LISTENING or FILTERED
    UDP port 88 (kerberos service): LISTENING or FILTERED
    UDP port 137 (netbios-ns service): LISTENING or FILTERED
    UDP port 138 (netbios-dgm service): LISTENING or FILTERED
    TCP port 42 (nameserver service): NOT LISTENING
    portqry.exe -n 192.168.1.254 -e 42 -p TCP exits with return code 0x00000001.


    SERVER B TO SERVER C
    UDP port 389 (unknown service): LISTENING or FILTERED
    UDP port 88 (kerberos service): LISTENING or FILTERED
    UDP port 137 (netbios-ns service): LISTENING or FILTERED
    UDP port 138 (netbios-dgm service): LISTENING or FILTERED
    TCP port 42 (nameserver service): NOT LISTENING
    portqry.exe -n 192.168.1.247 -e 42 -p TCP exits with return code 0x00000001.


    SERVER C TO SERVER B
    UDP port 389 (unknown service): LISTENING or FILTERED
    UDP port 88 (kerberos service): LISTENING or FILTERED
    portqry.exe -n 192.168.1.254 -e 88 -p BOTH exits with return code 0x00000002
    UDP port 137 (netbios-ns service): LISTENING or FILTERED
    UDP port 138 (netbios-dgm service): LISTENING or FILTERED
    portqry.exe -n 192.168.1.254 -e 138 -p UDP exits with return code 0x00000002
    TCP port 42 (nameserver service): NOT LISTENING
    portqry.exe -n 192.168.1.254 -e 42 -p TCP exits with return code 0x00000001.

    SERVER C TO SERVER A
    UDP port 389 (unknown service): LISTENING or FILTERED
    UDP port 88 (kerberos service): LISTENING or FILTERED
    portqry.exe -n 192.168.1.101 -e 88 -p BOTH exits with return code 0x00000002
    UDP port 137 (netbios-ns service): LISTENING or FILTERED
    UDP port 138 (netbios-dgm service): LISTENING or FILTERED
    portqry.exe -n 192.168.1.101 -e 138 -p UDP exits with return code 0x00000002

    I think we will have to search the Watchguard forums for help in configuring those port settings on our Firebox firewall, as we grew more confused every time we tried to work on them (weren't sure if the rules were incoming, outgoing, etc.)

    I am still seeing event 5501 bad packet errors, and I will do some more research on that issue. 

    Friday night I re-conditioned the battery that was showing errors on Server B. (AFAMGT, event 1 & VxSvc_Perc2Pro event 590).  Event viewer is still throwing both of these errors, so we may need to invest in a new battery.

    If anyone has any suggestions, fire away!  Otherwise, I will keep busy today researching the errors currently at hand.  Thank you all!

    Monday, June 18, 2012 4:10 PM
  • For those PortQry blocks, you're OK there. Here's more info on that:

    "At times you may see errors such as The RPC server is unavailable or There are no more endpoints available from the endpoint mapper ..."
    Also, if you get return code 0x00000002 or 0x00000001, it may simply mean that PortQry checked the UDP port while the service is listening on TCP. Quoted from the blog in the following link...
    "[...] If you get a LISTENING or FILTERED response, check and see whether we are checking TCP or UDP, most likely it was attempting to use UDP and this would be a normal response as UDP is connectionless. An example of this would be if you query port 88 for Kerberos against a DC and use the following syntax:
    Portqry -n server1 -e 88 -p both [...]"
    Using PortQry for Troubleshooting, by the DS Team [MSFT]
    http://blogs.technet.com/b/askds/archive/2009/01/22/using-portqry-for-troubleshooting.aspx
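
    To make the return codes in the posted output easier to read, here is a minimal Python sketch that decodes the "exits with return code" lines. The code-to-meaning table follows Microsoft's PortQry documentation (0, 1, 2); the regular expression and the function name are assumptions written for this thread's output format only.

```python
import re

# PortQry exit codes as documented by Microsoft:
# 0 = LISTENING, 1 = NOT LISTENING, 2 = LISTENING or FILTERED.
RETURN_CODES = {0: "LISTENING", 1: "NOT LISTENING", 2: "LISTENING or FILTERED"}

LINE = re.compile(r"-e (\d+) -p (\w+) exits with return code (0x[0-9A-Fa-f]+)")

def classify(line):
    """Return (port, protocol, meaning) for one 'exits with return code' line, else None."""
    match = LINE.search(line)
    if match is None:
        return None
    port, protocol, code = match.groups()
    return int(port), protocol, RETURN_CODES.get(int(code, 16), "unknown")

# Two lines copied from the output posted earlier in the thread.
sample = [
    "portqry.exe -n 192.168.1.247 -e 88 -p BOTH exits with return code 0x00000002",
    "portqry.exe -n 192.168.1.247 -e 42 -p TCP exits with return code 0x00000001",
]
results = [classify(line) for line in sample]
for item in results:
    print(item)
```

    TCP port 42 is the WINS replication port, so NOT LISTENING there is what you would expect on a DC that does not run WINS.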

    .

    Those repadmin results still indicate there is a replication issue, so I am not sure what you mean that replication was fixed, unless I missed something?

    Any new event log errors on the DCs besides what's been posted?

    .

    Run dcdiag /v and post the results to your free Skydrive or other sharing site, and provide us a link.

    .

    Also post an unedited ipconfig /all from those three DCs, please.

    .

    I'm not sure why we're messing with the Watchguard rules. Are you saying there are rules set for the VPN tunnel connections between locations? Those should be left wide open, with no restrictions across the whole range of TCP & UDP ports, 1 - 65535.

    .


    Ace Fekay

    Monday, June 18, 2012 7:47 PM
  • After a busy morning, I was able to finish putting all the reports together.  Here is the skydrive link:

    http://sdrv.ms/LC3z60

    I think I included everything you had asked for.  Thanks again!

    K. Nelson

    Tuesday, June 19, 2012 5:18 PM
  • Thanks for posting that.

    Kind of a busy day for me today. There are numerous errors, some of which look like hard drive problems? That can cause this, too. Here is what I see so far glancing through the dcdiag, and this is only part of it...

    .

    FRS is not running on PAKAPP1.paksher.com. - Is it running? Check services, please

    .

    Event Type: Warning
    Event Source: DNS
    Event Category: None
    Event ID: 5501
    Date:  6/18/2012
    Time:  7:38:14 AM
    User:  N/A
    Computer: PAKSHER2
    Description:
    The DNS server encountered a bad packet from 75.98.29.1.  Packet processing leads beyond packet length.

    Are you using a Forwarder? If so, what forwarder are you using?

    .

             [Replications Check,PAKAPP1] A recent replication attempt failed:
                From IMAGE2 to PAKAPP1

    Have you disabled all firewalls and uninstalled any AV apps on all DCs?

    .

    Event Type: Error
    Event Source: KDC
    Event Category: None
    Event ID: 27
    Date:  6/17/2012
    Time:  12:43:21 PM
    User:  N/A
    Computer: IMAGE2
    Description:
    While processing a TGS request for the target server krbtgt/PAKSHER.COM, the account PAK-EXTRMGR$@PAKSHER.COM did not have a suitable key for generating a Kerberos ticket (the missing key has an ID of 8). The requested etypes were 18.  The accounts available etypes were 23  -133  -128  3  1.

    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

    You can ignore this error if the client (PAK-EXTRMGR) is authenticating. It indicates mismatched encryption types due to differences between the client OS and the AD version. More info:
    http://eventid.net/display-eventid-27-source-KDC-eventno-5627-phase-1.htm
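
    For anyone puzzled by the numbers in that event, here is a minimal Python sketch that names the standard encryption types and shows there is no overlap between what was requested and what the account has keys for. The names come from the IANA Kerberos encryption type registry; the negative values are Microsoft-specific and are deliberately left unnamed here.

```python
# Kerberos encryption type numbers from the IANA registry (RFC 3961, 3962, 4757).
# The negative values in the event are Microsoft-specific and left unnamed in this sketch.
ETYPE_NAMES = {
    1: "des-cbc-crc",
    3: "des-cbc-md5",
    18: "aes256-cts-hmac-sha1-96",
    23: "rc4-hmac",
}

def describe(etypes):
    """Map etype numbers to names, marking anything not in the table above."""
    return [ETYPE_NAMES.get(e, f"unnamed ({e})") for e in etypes]

requested = [18]                    # "The requested etypes were 18"
available = [23, -133, -128, 3, 1]  # "The accounts available etypes were ..."
common = sorted(set(requested) & set(available))

print("requested:", describe(requested))
print("available:", describe(available))
print("common:", common if common else "none")
```

    The client asked for AES256 only, while the account has no AES key, which is the mismatch the event is reporting.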

    .

    As for your DNS settings in the NICs, I would only specify two DNS servers, the first being a partner, the second itself. The third may never get to be used due to the client side resolver time out process waiting for a reply and waiting for the other two to time out, so they're superfluous.

    .

    There are more...  As I get some time, I will peruse through it, unless someone else beats me to it and can take a look to see what I missed.

    .


    Ace Fekay

    Tuesday, June 19, 2012 6:02 PM
  • I forgot to add, here's more on the drive controller errors on PAKSHER2:
    http://eventid.net/display-eventid-590-source-VxSvc_Perc2Pro-eventno-5319-phase-1.htm

    .

    Application Logs
    Event Type: Warning
    Event Source: VxSvc_Perc2Pro
    Event Category: None
    Event ID: 590
    Date:  6/18/2012
    Time:  8:04:48 AM
    User:  N/A
    Computer: PAKSHER2
    Description:
    PERC 3/Di Controller 0 , On Array Disk 0:3 the failure prediction threshold exceeded due to test-No action needed.

    Event Type: Warning
    Event Source: VxSvc_Perc2Pro
    Event Category: None
    Event ID: 590
    Date:  6/18/2012
    Time:  8:04:42 AM
    User:  N/A
    Computer: PAKSHER2
    Description:
    PERC 3/Di Controller 0 , On Array Disk 0:2 the failure prediction threshold exceeded due to test-No action needed.

    Event Type: Warning
    Event Source: VxSvc_Perc2Pro
    Event Category: None
    Event ID: 590
    Date:  6/18/2012
    Time:  8:04:34 AM
    User:  N/A
    Computer: PAKSHER2
    Description:
    PERC 3/Di Controller 0 , On Array Disk 0:1 the failure prediction threshold exceeded due to test-No action needed.


    Ace Fekay

    Tuesday, June 19, 2012 8:07 PM
  • Thanks again for taking time to respond!

    File Replication Services is running on PAKAPP1 and is set to automatic startup type.

    Forwarders:
    Image 2 lists 
    "All other DNS domains" and "Do not use recursion for this domain" is selected

    Pakapp1 lists IP 206.255.244.169 (long01cpe45-170.tx.cablelynx.com)
    {The other IT person believes she got this from our ISP}
    and has "Use root hints if no forwarders are available" selected

    Paksher2 has no forwarders enabled


    Firewalls:
    "Somehow" the firewall was turned back on on Pakapp1 and Image2 even though we had checked this last week....I disabled them both today in services.
    From what I have read, Windows 2000 Server does not have a built-in firewall.  Apparently you have to use TCP/IP filtering. I checked these settings on the NIC and it is not enabled, and "Permit All" is set on the TCP ports, UDP ports, and IP Protocols.  IPSEC is not being used.  If there are any other settings I need to be made aware of, please let me know, as this is the only setting I have checked.

    A/V
    No A/V on Image2
    Pakapp1 - had Malwarebytes only, uninstalled last week
    Paksher2 - uninstalled Malwarebytes last week (but had not rebooted machine yet - will do this tonight)
    Paksher2 also has Symantec Live Update installed, which was disabled last week, still showing disabled.

    DNS settings on NICs - I have read so many articles and they all seem to say something different! I read several that advise configurations in the way you advised, so I changed them this afternoon accordingly.

    I had a remote session with Dell this afternoon regarding the drive errors on Paksher2. We could not proceed with any in-depth investigation because the software he installed needed the server to reboot in order to function (and we did not want to shut down during company hours). I plan on staying late tonight and working on this. The technician is certain that 3 of the 5 drives are failing, if not already failed.  I am currently trying to talk the other IT member into demoting this server, setting up a replacement and decommissioning this one as soon as practical. She actually has a brand new spare server sitting in the server room, so this seems like fate to me!

    Thanks again for your guidance. I sure appreciate your help!

    K. Nelson

    Tuesday, June 19, 2012 9:23 PM
  • I would uninstall AV until you get past this. Sometimes they have some sort of lingering service or feature still running.

    .

    As for Forwarders, you need to be consistent on all DNS servers.

    .

    Good luck with the failed drive!

    .

    What OS are your DCs? Which OS will you be installing on the new one?

    .


    Ace Fekay

    Tuesday, June 19, 2012 9:30 PM
  • Okay.  I have all firewalls disabled and all A/V removed.  I was too busy putting out fires today to make any changes to the DNS forwarders. You wrote:
    "As for Forwarders, you need to be consistent on all DNS servers."
    To clarify, I need to go enable forwarders on all the DC's, and use the same IP addresses, correct?  I'm planning on using the Google IP 8.8.8.8 & 8.8.4.4
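
    To show what "consistent" means here, a minimal Python sketch that compares the forwarder lists posted earlier in this thread and then applies one planned list everywhere. Image2's entry showed no forwarder IP, so it is treated as empty (an assumption), and both function names are made up for this example. Nothing here talks to a DNS server.

```python
# Forwarder lists as reported earlier in this thread. Image2's entry showed no
# forwarder IP, so it is treated as empty here (an assumption).
forwarders = {
    "Image2": [],
    "Pakapp1": ["206.255.244.169"],
    "Paksher2": [],
}

def forwarders_consistent(config):
    """True when every DNS server has the same set of forwarders."""
    distinct = {frozenset(ips) for ips in config.values()}
    return len(distinct) <= 1

def apply_everywhere(config, ips):
    """Return a new mapping in which every server uses the same forwarder list."""
    return {server: list(ips) for server in config}

print("current setup consistent:", forwarders_consistent(forwarders))
planned = apply_everywhere(forwarders, ["8.8.8.8", "8.8.4.4"])
print("planned setup consistent:", forwarders_consistent(planned))
```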
    I'm not sure I'm clearly understanding what I'm reading out on the web about the "do not use recursion for this domain" setting on the Forwarders tab.  We currently have this selected - from what I'm reading, most admins don't advise this.  If I de-select this option, is there a chance it would cause me even more problems?  
    As far as the fires that I was busy working on, we had multiple password issues with users today.  Before I came on board, the other IT person had enabled a maximum password age of 42 days on the default domain policy.  She had not enabled any setting for “Prompt user to change password before expiration”, so I assumed everyone’s password was just expiring normally, and I went in and enabled the default of 14 days so they would be notified.  However, she mentioned that many users were saying that it was too soon to be seeing this again, as they had just changed their password a week or two ago.  We were able to solve most access/network issues by having the user press CTRL+ALT+DEL and change their password.
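
    The arithmetic behind those two settings, as a minimal Python sketch: with a 42-day maximum age and a 14-day warning window, the first prompt should appear 28 days after a password change. The example date is hypothetical and the function name is made up for this example.

```python
from datetime import date, timedelta

MAX_PASSWORD_AGE = timedelta(days=42)  # maximum password age from the domain policy
WARNING_WINDOW = timedelta(days=14)    # "Prompt user to change password before expiration"

def password_timeline(last_changed):
    """Return (date of first expiry prompt, expiry date) for a given change date."""
    expires = last_changed + MAX_PASSWORD_AGE
    first_warning = expires - WARNING_WINDOW
    return first_warning, expires

# Hypothetical change date, not taken from the thread.
changed = date(2012, 6, 6)
first_warning, expires = password_timeline(changed)
print("first prompt:", first_warning)
print("expires:", expires)
print("days until first prompt:", (first_warning - changed).days)
```

    So a user who really changed their password only a week or two ago should not be prompted yet, which suggests those reports are worth a closer look.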

    As far as event logs, these were the only ones that stood out, in my opinion:
    Directory Service
    Event Type: Information
    Event Source: NTDS KCC
    Event Category: Knowledge Consistency Checker 
    Event ID: 1404
    Date: 6/18/2012
    Time: 1:42:38 PM
    User: NT AUTHORITY\ANONYMOUS LOGON
    Computer: IMAGE***
    Description:
    The local domain controller is now the intersite topology generator and 
    has assumed responsibility for generating and maintaining intersite 
    replication topologies for this site.
    For more information, see Help and 
    Support Center at http://go.microsoft.com/fwlink/events.asp.
    I found this entry out on the web, so I assume all is ok:
    "There is one domain controller for every site that generates the connection objects between sites. That role, known as the inter-site topology generator (ISTG), has moved from one domain controller to another. This is an expected occurrence.  If this message is logged frequently, there might be intermittent communication between the members of the site."

    Log Name:      System
    Source:        TermDD
    Date:          6/18/2012 11:30:17 AM
    Event ID:      56
    Task Category: None
    Level:         Error
    Keywords:      Classic
    User:          N/A
    Computer:      PAKAPP**.paksher.com
    Description:  The Terminal Server security layer detected an error in the protocol stream and has disconnected the client.
    Event Xml:
    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
      <System>
        <Provider Name="TermDD" />
        <EventID Qualifiers="49162">56</EventID>
        <Level>2</Level>
        <Task>0</Task>
        <Keywords>0x80000000000000</Keywords>
        <TimeCreated SystemTime="2012-06-18T16:30:17.253637100Z" />
        <EventRecordID>44472</EventRecordID>
        <Channel>System</Channel>
        <Computer>PAKAPP**.paksher.com</Computer>
        <Security />
      </System>
      <EventData>
        <Data>\Device\Termdd</Data>
        
    <Binary>00000400010000000000000038000AC00000000038000AC0000000000000000
    0000000000000000032000AD0</Binary>  ???

    Log Name:      Directory Service
    Source:        Microsoft-Windows-ActiveDirectory_DomainService
    Date:          6/19/2012 3:45:55 PM
    Event ID:      1104
    Task Category: Knowledge Consistency Checker
    Level:         Information
    Keywords:      Classic
    User:          ANONYMOUS LOGON
    Computer:      PAKAPP*.paksher.com
    Description:
    The Knowledge Consistency Checker (KCC) successfully terminated the  following change notifications. 
     Directory partition:
    CN=Configuration,DC=paksher,DC=com 
    Destination network address:
    b41adbb2-c739-47ad-bb98-e1216516bf35._msdcs.paksher.com 
    Destination directory service (if available):
    CN=NTDS Settings,CN=IMAGE*,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=paksher,DC=com 
     This event can occur if either this directory service or the  destination directory service has been moved to another site.
    Event Xml:
    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
     <System>
        <Provider Name="Microsoft-Windows-ActiveDirectory_DomainService" Guid="{0e8478c5-3605-4e8c-8497-1e730c959516}" EventSourceName="NTDS KCC" />
        <EventID Qualifiers="16384">1104</EventID>
        <Version>0</Version>
        <Level>4</Level>
        <Task>1</Task>
        <Opcode>0</Opcode>
        <Keywords>0x8080000000000000</Keywords>
        <TimeCreated SystemTime="2012-06-19T20:45:55.493435700Z" />
        <EventRecordID>13112</EventRecordID>
        <Correlation />
        <Execution ProcessID="640" ThreadID="1540" />
        <Channel>Directory Service</Channel>
        <Computer>PAKAPP***.paksher.com</Computer>
        <Security UserID="S-1-5-7" />
      </System>
      <EventData>
        <Data>CN=Configuration,DC=paksher,DC=com</Data>
        <Data>b41adbb2-c739-47ad-bb98-e1216516bf35._msdcs.paksher.com</Data>
        <Data>CN=NTDS Settings,CN=IMAGE**,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=paksher,DC=com</Data>
      </EventData>
    </Event>

    Log Name:      Directory Service
    Source:        Microsoft-Windows-ActiveDirectory_DomainService
    Date:          6/19/2012 12:25:54 PM
    Event ID:      1863
    Task Category: Replication
    Level:         Error
    Keywords:      Classic
    User:          ANONYMOUS LOGON
    Computer:      PAKAPP***.paksher.com
    Description:
    This is the replication status for the following directory partition on  this directory server. 
     Directory partition:
    DC=ForestDnsZones,DC=paksher,DC=com 
     This directory server has not received replication information from a  number of directory servers within the configured latency interval. 
     Latency Interval (Hours): 
    24 
    Number of directory servers in all sites:

    Number of directory servers in this site:

     The latency interval can be modified with the following registry key. 
     Registry Key: 
    HKLM\System\CurrentControlSet\Services\NTDS\Parameters\Replicator  latency error interval (hours) 
     To identify the directory servers by name, use the dcdiag.exe tool. 
    You can also use the support tool repadmin.exe to display the  replication latencies of the directory servers.   The command is  "repadmin /showvector /latency <partition-dn>".
    Event Xml:
    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
      <System>
        <Provider Name="Microsoft-Windows-ActiveDirectory_DomainService"  Guid="{0e8478c5-3605-4e8c-8497-1e730c959516}" EventSourceName="NTDS  Replication" />
        <EventID Qualifiers="49152">1863</EventID>
        <Version>0</Version>
        <Level>2</Level>
        <Task>5</Task>
        <Opcode>0</Opcode>
        <Keywords>0x8080000000000000</Keywords>
        <TimeCreated SystemTime="2012-06-19T17:25:54.622181900Z" />
        <EventRecordID>13106</EventRecordID>
        <Correlation />
        <Execution ProcessID="640" ThreadID="836" />
        <Channel>Directory Service</Channel>
        <Computer>PAKAPP***.paksher.com</Computer>
        <Security UserID="S-1-5-7" />
      </System>
      <EventData>
        <Data>DC=ForestDnsZones,DC=paksher,DC=com</Data>
        <Data>1</Data>
        <Data>1</Data>
        <Data>24</Data>
        <Data>System\CurrentControlSet\Services\NTDS\Parameters</Data>
      </EventData>
    </Event>

    Log Name:      Directory Service
    Source:        Microsoft-Windows-ActiveDirectory_DomainService
    Date:          6/19/2012 12:25:54 PM
    Event ID:      1863
    Task Category: Replication
    Level:         Error
    Keywords:      Classic
    User:          ANONYMOUS LOGON
    Computer:      PAKAPP***.paksher.com
    Description:
    This is the replication status for the following directory partition on  this directory server. 
     Directory partition:
    DC=DomainDnsZones,DC=paksher,DC=com 
     This directory server has not received replication information from a  number of directory servers within the configured latency interval. 
     Latency Interval (Hours): 
    24 
    Number of directory servers in all sites:

    Number of directory servers in this site:

    The latency interval can be modified with the following registry key. 
    Registry Key: 
    HKLM\System\CurrentControlSet\Services\NTDS\Parameters\Replicator  latency error interval (hours) 
     To identify the directory servers by name, use the dcdiag.exe tool. 
    You can also use the support tool repadmin.exe to display the  replication latencies of the directory servers.   The command is  "repadmin /showvector /latency <partition-dn>".
    Event Xml:
    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
      <System>
        <Provider Name="Microsoft-Windows-ActiveDirectory_DomainService"  Guid="{0e8478c5-3605-4e8c-8497-1e730c959516}" EventSourceName="NTDS  Replication" />
        <EventID Qualifiers="49152">1863</EventID>
        <Version>0</Version>
        <Level>2</Level>
        <Task>5</Task>
        <Opcode>0</Opcode>
        <Keywords>0x8080000000000000</Keywords>
        <TimeCreated SystemTime="2012-06-19T17:25:54.622181900Z" />
        <EventRecordID>13105</EventRecordID>
        <Correlation />
        <Execution ProcessID="640" ThreadID="836" />
        <Channel>Directory Service</Channel>
        <Computer>PAKAPP***.paksher.com</Computer>
        <Security UserID="S-1-5-7" />
      </System>
      <EventData>
        <Data>DC=DomainDnsZones,DC=paksher,DC=com</Data>
        <Data>1</Data>
        <Data>1</Data>
        <Data>24</Data>
        <Data>System\CurrentControlSet\Services\NTDS\Parameters</Data>
      </EventData>
    </Event>

    Log Name:      Directory Service
    Source:        Microsoft-Windows-ActiveDirectory_DomainService
    Date:          6/19/2012 12:25:54 PM
    Event ID:      1863
    Task Category: Replication
    Level:         Error
    Keywords:      Classic
    User:          ANONYMOUS LOGON
    Computer:      PAKAPP***.paksher.com
    Description:
    This is the replication status for the following directory partition on  this directory server. 
     Directory partition:
    CN=Schema,CN=Configuration,DC=paksher,DC=com 
    This directory server has not received replication information from a  number of directory servers within the configured latency interval. 
     Latency Interval (Hours): 
    24 
    Number of directory servers in all sites:

    Number of directory servers in this site:


    The latency interval can be modified with the following registry key. 
     Registry Key: 
    HKLM\System\CurrentControlSet\Services\NTDS\Parameters\Replicator  latency error interval (hours) 
     To identify the directory servers by name, use the dcdiag.exe tool. 
    You can also use the support tool repadmin.exe to display the  replication latencies of the directory servers.   The command is  "repadmin /showvector /latency <partition-dn>".
    Event Xml:
    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
      <System>
        <Provider Name="Microsoft-Windows-ActiveDirectory_DomainService"  Guid="{0e8478c5-3605-4e8c-8497-1e730c959516}" EventSourceName="NTDS  Replication" />
        <EventID Qualifiers="49152">1863</EventID>
        <Version>0</Version>
        <Level>2</Level>
        <Task>5</Task>
        <Opcode>0</Opcode>
        <Keywords>0x8080000000000000</Keywords>
        <TimeCreated SystemTime="2012-06-19T17:25:54.622181900Z" />
        <EventRecordID>13104</EventRecordID>
        <Correlation />
        <Execution ProcessID="640" ThreadID="836" />
        <Channel>Directory Service</Channel>
        <Computer>PAKAPP***.paksher.com</Computer>
        <Security UserID="S-1-5-7" />
      </System>
      <EventData>
        <Data>CN=Schema,CN=Configuration,DC=paksher,DC=com</Data>
        <Data>1</Data>
        <Data>1</Data>
        <Data>24</Data>
        <Data>System\CurrentControlSet\Services\NTDS\Parameters</Data>
      </EventData>
    </Event>

    Log Name:      Directory Service
    Source:        Microsoft-Windows-ActiveDirectory_DomainService
    Date:          6/19/2012 12:25:54 PM
    Event ID:      1863
    Task Category: Replication
    Level:         Error
    Keywords:      Classic
    User:          ANONYMOUS LOGON
    Computer:      PAKAPP***.paksher.com
    Description:
    This is the replication status for the following directory partition on  this directory server. 
     Directory partition:
    DC=paksher,DC=com 
     This directory server has not received replication information from a  number of directory servers within the configured latency interval. 
     Latency Interval (Hours): 
    24 
    Number of directory servers in all sites:

    Number of directory servers in this site:

     
    The latency interval can be modified with the following registry key. 
     Registry Key: 
    HKLM\System\CurrentControlSet\Services\NTDS\Parameters\Replicator  latency error interval (hours) 
     To identify the directory servers by name, use the dcdiag.exe tool. 
    You can also use the support tool repadmin.exe to display the  replication latencies of the directory servers.   The command is  "repadmin /showvector /latency <partition-dn>".
    Event Xml:
    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
      <System>
        <Provider Name="Microsoft-Windows-ActiveDirectory_DomainService"  Guid="{0e8478c5-3605-4e8c-8497-1e730c959516}" EventSourceName="NTDS  Replication" />
        <EventID Qualifiers="49152">1863</EventID>
        <Version>0</Version>
        <Level>2</Level>
        <Task>5</Task>
        <Opcode>0</Opcode>
        <Keywords>0x8080000000000000</Keywords>
        <TimeCreated SystemTime="2012-06-19T17:25:54.622181900Z" />
        <EventRecordID>13103</EventRecordID>
        <Correlation />
        <Execution ProcessID="640" ThreadID="836" />
        <Channel>Directory Service</Channel>
        <Computer>PAKAPP***.paksher.com</Computer>
        <Security UserID="S-1-5-7" />
      </System>
      <EventData>
        <Data>DC=paksher,DC=com</Data>
        <Data>1</Data>
        <Data>1</Data>
        <Data>24</Data>
        <Data>System\CurrentControlSet\Services\NTDS\Parameters</Data>
      </EventData>
    </Event>


    Because one of the users could not get out to the internet first thing this morning, I was asked to go change the TCP/IP settings on the DC nics back to pointing to themselves (and using another partner dc as the secondary).  Once everything is functioning correctly, I’ll make another attempt to get these changed as you instructed (pointing to another partner dc first, then to themselves as secondary).

    Image**** the OS is W2K3, sp2
    Pakapp**** the OS is W2K8, sp2 and it is a virtual server
    Paksher**** the OS is W2K, sp4

    One final concern that may or may not factor in – 
    Could we have some type of intermittent connectivity issue that is not evident?  I ask because occasionally during the day, I will have problems remotely accessing the Pakapp** dc.  I immediately ping the server, which responds fine.  Usually a minute or two later, I can access remotely again.  Am I wrong to assume that just because the nic status shows that it’s been connected 39 days, it is therefore always connected?  Should this be a concern and if so, how would you advise checking for connectivity issues?

    Finally, I ran dcdiag /v on all the dc’s this afternoon and the only one showing a couple failures is Paksher****, the one with the failing drives.  Just to add some more comedy to the situation, I made a visit to the server room this morning to physically take a look and discovered that 2 of the 3 power supplies are also dead!  

    Anyway, I posted the dcdiag tests out on skydrive:

    http://sdrv.ms/KUq8Uf

    Starting to feel like I’m banging my head against a brick wall about now…. Thanks again!

    K. Nelson


    Wednesday, June 20, 2012 10:01 PM
  • What IP does the DC GUID shown as the Destination network address below (b41adbb2-c739-47ad-bb98-e1216516bf35._msdcs.paksher.com) resolve to?

    Log Name:      Directory Service
    Source:        Microsoft-Windows-ActiveDirectory_DomainService
    Date:          6/19/2012 3:45:55 PM
    Event ID:      1104
    Task Category: Knowledge Consistency Checker
    Level:         Information
    Keywords:      Classic
    User:          ANONYMOUS LOGON
    Computer:      PAKAPP*.paksher.com
    Description:
    The Knowledge Consistency Checker (KCC) successfully terminated the  following change notifications.
    Directory partition:
    CN=Configuration,DC=paksher,DC=com
    Destination network address:
    b41adbb2-c739-47ad-bb98-e1216516bf35._msdcs.paksher.com
    Destination directory service (if available):
    CN=NTDS Settings,CN=IMAGE*,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=paksher,DC=com
    This event can occur if either this directory service or the  destination directory service has been moved to another site.

    .

    .

    EDNS0 support and the DNS server you will be choosing as a Forwarder:

    Being consistent means configuring ALL DNS servers with a forwarder, and using the same forwarder on each. And btw - Google's and OpenDNS do NOT support EDNS0, so additional problems may occur when you can't resolve certain names. I suggest anything else that supports EDNS0, such as 4.2.2.2. If you don't like that one, no problem, just test whatever DNS server you want to use with the following test so you know it supports it:

    Here's a quick command to test if there's an EDNS0 restriction in your firewall:
    nslookup -type=TXT rs.dns-oarc.net

    Or if you want to test a specific DNS server for EDNS0 support, whether an internal or external DNS server, use the following method:

    c:\>nslookup
    > server 4.2.2.2    <---- change the IP to whatever DNS server you want to test for EDNS0 support
    > set q=txt
    > rs.dns-oarc.net

    Look for the part in the response that says, "...DNS reply size limit is at least xxxx." The xxxx is the reply size it will support. If it's 512 or under, then either your firewall is blocking EDNS0, or the Forwarder you are using is not allowing (or not configured to use) EDNS0.
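
    If you end up testing several candidate forwarders, a small script can pull the number out of the saved nslookup output for you. This is only a rough sketch: it assumes the response contains the "DNS reply size limit is at least N" text described above, the function names are made up for illustration, and the 512-byte cutoff is the classic non-EDNS UDP limit rather than anything specific to this test.

```python
import re

# 512 bytes is the classic DNS-over-UDP limit without EDNS0.
CLASSIC_UDP_LIMIT = 512

def reply_size_limits(nslookup_output):
    """Return every 'reply size limit' value found in the nslookup text."""
    pattern = r"DNS reply size limit is at least (\d+)"
    return [int(n) for n in re.findall(pattern, nslookup_output)]

def edns0_looks_ok(nslookup_output):
    """True if every reported limit exceeds 512, False if any does not,
    None if the expected text was not found at all."""
    sizes = reply_size_limits(nslookup_output)
    if not sizes:
        return None
    return min(sizes) > CLASSIC_UDP_LIMIT

if __name__ == "__main__":
    # Hypothetical sample line; paste your own nslookup output instead.
    sample = '"192.0.2.10 DNS reply size limit is at least 4096"'
    print(reply_size_limits(sample), edns0_looks_ok(sample))
```

    A result of None just means the expected text was not in the output, for example because the query itself failed.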

    .

    .

    And leave that recursion setting at its default. It's fine.

    .


    Ace Fekay

    Wednesday, June 20, 2012 11:44 PM
  • Image**** the OS is W2K3, sp2
    Pakapp**** the OS is W2K8, sp2 and it is a virtual server
    Paksher**** the OS is W2K, sp4

    Which one is the PDC Emulator? Run a netdom query fsmo to see.

    And what is the VM host? If Hyper-V, you must *partially* disable the host's time service. If VMware, you must fully disable the host's time service. Read the VMware KBs regarding this issue in the link below.

    Virtualizing Domain Controllers and the Windows Time Service
    Published by acefekay on Aug 23, 2011 at 1:15 AM
    http://msmvps.com/blogs/acefekay/archive/2011/08/23/virtualizing-domain-controllers-and-the-windows-time-service.aspx

    and

    Running Domain Controllers in Hyper-V
    http://technet.microsoft.com/en-us/library/d2cae85b-41ac-497f-8cd1-5fbaa6740ffe(v=ws.10)#bkmk1_planning_to_virtualize_domain_controllers

    .

    And I hope you've never used the host's snapshot feature and restored a DC from a snapshot! Never do that!!!!

    .

    One final concern that may or may not factor in –
    Could we have some type of intermittent connectivity issue that is not evident?  I ask because occasionally during the day, I will have problems remotely accessing the Pakapp** dc.  I immediately ping the server, which responds fine.  Usually a minute or two later, I can access remotely again.  Am I wrong to assume that just because the nic status shows that it’s been connected 39 days, it is therefore always connected?  Should this be a concern and if so, how would you advise checking for connectivity issues?

    Sounds like it's hardware related - either a bad NIC, wire or switch port.

    Maybe this could be the cause of your issues.
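
    One rough way to catch an intermittent problem like this is to log when the remote-access port stops answering while ping still works. The sketch below is an assumption-laden example, not a diagnosis: it assumes "remotely accessing" means Remote Desktop on the default TCP port 3389, and the function names, interval and check count are made up for illustration.

```python
import socket
import time

def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def watch(host, port=3389, interval=30, checks=10):
    """Probe the port repeatedly and return timestamps of failed attempts."""
    failures = []
    for _ in range(checks):
        if not tcp_port_open(host, port):
            failures.append(time.strftime("%Y-%m-%d %H:%M:%S"))
        time.sleep(interval)
    return failures

if __name__ == "__main__":
    # "dc.example.com" is a placeholder; use the DC's real name or IP.
    print(watch("dc.example.com", checks=3, interval=5))
```

    If the failure timestamps line up with errors in the System log or with the switch port's error counters, that points toward the NIC, cable or port rather than the OS.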

    .

    Finally, I ran dcdiag /v on all the dc’s this afternoon and the only one showing a couple failures is Paksher****, the one with the failing drives.  Just to add some more comedy to the situation, I made a visit to the server room this morning to physically take a look and discovered that 2 of the 3 power supplies are also dead! 

    You have a lot on your plate!

    .


    Ace Fekay

    Wednesday, June 20, 2012 11:58 PM
  • "What IP is that DC GUID I have bolded and underlined below?
    b41adbb2-c739-47ad-bb98-e1216516bf35._msdcs.paksher.com"
         That is the IMAGE*** server, with an IP address of 192.168.1.247

    PAKAPP*** holds all the FSMO roles

    I just found out yesterday that the Image*** server is also a virtual server.
    The host is VMware ESXi 4.0 for both virtual servers.
    We will try to get the VMWare time service disabled asap.

    (Thankfully, the other IT person says that she did not use the VMWare snapshot feature to restore the fallen DC…)

    Thank you for pointing out the EDNS0 information and the tools to test it!
    I added 4.2.2.2 and 4.2.2.3 as Forwarders on all of the DCs. I enabled the box to "Use root hints if no forwarders are available" on PAKAPP** (W2K8, they've changed the wording again, so hopefully I am correct).  I tested the ISP forwarder we had been using and it was being restricted!  After I changed these settings, the DNS event 5501 that had been filling our logs stopped.  Thank you!!!

    I ran another set of dcdiag  /v tests on each server late afternoon yesterday, and there were no errors (except for system log on Paksher**, which were older issues).

    I ran another set of dcdiag tests this morning, and now Paksher*** is failing both the RidManager and MachineAccount tests again!

    Starting test: RidManager
             * Available RID Pool for the Domain is 3107 to 1073741823
             * PAKAPP**.paksher.com is the RID Master
             * DsBind with RID Master was successful
             ldap_search_sW subtree of DC=paksher,DC=com for sam account failed with
     234: More data is available.
             No rids allocated -- please check eventlog.
             ......................... PAKSHER** failed test RidManager
          Starting test: MachineAccount
             ldap_search_sW failed with 234: More data is available.
             ldap_search_sW subtree of DC=paksher,DC=com for sam account failed with
     234: More data is available.
             ldap_search_sW subtree of DC=paksher,DC=com for sam account failed with
     234: More data is available.
             ......................... PAKSHER** failed test MachineAccount
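
    For long dcdiag /v logs, the failures can be pulled out with a few lines of script. This is a minimal sketch that assumes the summary lines look like the "failed test" lines above (a row of dots, the server name, then "failed test" and the test name); the function name is made up for illustration.

```python
import re

# Matches summary lines such as:
#   ......................... SERVER failed test RidManager
FAILED_LINE = re.compile(
    r"^[ \t]*\.{5,}[ \t]+(\S+) failed test (\S+)", re.MULTILINE
)

def failed_tests(dcdiag_output):
    """Return (server, test) pairs for every failed test in the log text."""
    return [(m.group(1), m.group(2)) for m in FAILED_LINE.finditer(dcdiag_output)]

if __name__ == "__main__":
    # Trimmed sample based on the output quoted above.
    sample = """
         ......................... PAKSHER** failed test RidManager
      Starting test: MachineAccount
         ......................... PAKSHER** failed test MachineAccount
    """
    for server, test in failed_tests(sample):
        print(server, test)
```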


    The other IT person’s husband gave us a spare server yesterday with W2K3.  Hopefully we can get that configured in the next week or two so I can at least demote the problem server (Paksher***) before it fails completely and gives us a whole new set of problems.  In reading some of the material in your blogs, we realized that we do indeed have our PDC role on a virtual server; I plan on transferring all the FSMO roles to a physical server once we have one of the new servers up and promoted.  Would it matter if all of the FSMO roles were handled by an older OS (W2K3)?  (The virtuals are the only servers in which we have W2K8 installed).  

    I’m going to try to do some research today on the sam account errors that are showing in the Paksher** dcdiag test.  

    The other IT person thinks that it is unlikely that we are having connectivity issues with PAKAPP*** - because the Image*** server is located on the same virtual host and it has no issues…

    Thanks again for all of the helpful documentation, suggestions, and for taking the time to post them!
    Friday, June 22, 2012 3:07 PM
  • Glad to hear I was helpful and the DNS errors are gone. Did you re-run the EDNS0 test after putting the Forwarders in place? The results should come back as 4096 instead of 490 & 512.

    .

    And yep, time to demote Paksher**. Run a netdom query fsmo on it, and run the same on another DC. Do the two agree on who's holding the roles?
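
    If you save the netdom query fsmo output from each DC to a text file, the comparison can be scripted. The sketch below assumes each role line is a label followed by two or more spaces and then the holder's name; it compares holders by position because the label wording may differ between netdom versions (an assumption worth checking on your own servers). The function names and sample hostnames are hypothetical.

```python
import re

def fsmo_holders(netdom_output):
    """Return the role holders, in listed order, from netdom query fsmo text."""
    holders = []
    for line in netdom_output.splitlines():
        # Role lines look like "<label><2+ spaces><holder>"; other lines are skipped.
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) == 2:
            holders.append(parts[1].lower())
    return holders

def dcs_agree(output_a, output_b):
    """True only if both outputs list five holders and the lists match."""
    a, b = fsmo_holders(output_a), fsmo_holders(output_b)
    return len(a) == 5 and a == b

if __name__ == "__main__":
    # Hypothetical sample; paste the real output from each DC instead.
    sample = """Schema master               dc1.example.com
Domain naming master        dc1.example.com
PDC                         dc1.example.com
RID pool manager            dc1.example.com
Infrastructure master       dc1.example.com
The command completed successfully."""
    print(fsmo_holders(sample), dcs_agree(sample, sample))
```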

    .

    And if Paksher** won't cleanly demote, you'll have to force demote it (dcpromo /forceremoval). If you do have to do this, then you must run a metadata cleanup and seize the FSMO roles to another DC. I previously posted my blog on the steps, but for your convenience, here it is again:

    Complete Step by Step Guideline to Remove an Orphaned Domain controller (including seizing FSMOs, running a metadata cleanup, cleanup DNS (Nameserver tab), AD Sites (old DC references), and more)
    Published by Ace Fekay, MCT, MVP DS on Oct 5, 2010 at 12:14 AM
    http://msmvps.com/blogs/acefekay/archive/2010/10/05/complete-step-by-step-to-remove-an-orphaned-domain-controller.aspx

    .


    Ace Fekay

    Friday, June 22, 2012 4:14 PM
  • Yep, ran the EDNS0 test on the IP addresses that I used for the forwarders and they did report back "DNS reply size limit is at least 4096". 

    I could not get the netdom query fsmo to work on the Paksher** dc.  I was able to see that it correctly reported that Pakapp* holds all 5 fsmo roles.  Ran the netdom query fsmo on Image** dc and it also reports Pakapp*  as holding all 5 fsmo roles.  

    All of the event logs look okay.  We will get to work on the problem DC so we can get it demoted as soon as possible.  

    Thanks again for all of your guidance!  

    K. Nelson


    Monday, June 25, 2012 4:19 PM