Bad Copy After Complete Network Move

    Question

  • Exchange 2013 CU10 in a 2-member DAG with one 2016 Edge server. This was all working fine before the move. We moved offices recently, and this past Saturday we moved the whole network (servers, switches, routers, etc.). The servers use round-robin DNS for "redundancy". These are Hyper-V VMs.

    The DAG has a replication network (joined by a single switch) that both servers plug into. So one NIC on each server is dedicated to replication only, and one NIC on each server is dedicated to the "trusted" (LAN) network.

    The passive member is in the Disconnected and resynchronizing state with Copy queue length: 333

    Test-ReplicationHealth | fl shows:

    Identity         :
    IsValid          : True
    ObjectState      : New

    RunspaceId       : 8b26d431-37dc-4b01-9146-2355784f1ad4
    Server           : SRVNAME
    Check            : ClusterNetwork
    CheckDescription : Checks that the networks are healthy.
    Result           : *FAILED*
    Error            : Subnet '192.168.123.0/24' on network 'MapiDagNetwork' is not Up.  Current state is 'Misconfigured'.
                       Subnet '192.168.123.0/24' on network 'MapiDagNetwork' is not Up.  Current state is 'Misconfigured'.
                       Subnet 'fe80::/64' on network 'MapiDagNetwork' is not Up.  Current state is 'Misconfigured'.
                       Subnet 'fe80::/64' on network 'MapiDagNetwork' is not Up.  Current state is 'Misconfigured'.

    RunspaceId       : 8b26d431-37dc-4b01-9146-2355784f1ad4
    Server           : SRVNAME
    Check            : DatabaseRedundancy
    CheckDescription : Verifies that databases have sufficient redundancy. If this check fails, it means that some
                       databases are at risk of losing data.
    Result           : *FAILED*
    Error            : There were database redundancy check failures for database 'DAGNAME' that may be lowering
                       its redundancy and putting the database at risk of data loss. Redundancy Count: 1. Expected
                       Redundancy Count: 2. Detailed error(s):


                               passive SRVNAME:
                               Passive database copy 'DAGNAME\srvname' has an unhealthy status:
                       DisconnectedAndResynchronizing. [SuspendComment: None specified.] [ErrorMessage: The Microsoft
                       Exchange Replication service was unable to perform an incremental reseed of database copy 'DAGNAME\EXCHANGE' due to a network error. The database copy status will be set to Disconnected.
                       Error A timeout occurred while communicating with server 'SRVNAME'. Error: "A connection could not
                       be completed within 5 seconds."
                       ].


    Identity         :
    IsValid          : True
    ObjectState      : New

    RunspaceId       : 8b26d431-37dc-4b01-9146-2355784f1ad4
    Server           : SRVNAME
    Check            : DatabaseAvailability
    CheckDescription : Verifies that databases have sufficient availability. If this check fails, it means that some
                       databases are at risk of losing service.
    Result           : *FAILED*
    Error            : There were database availability check failures for database 'Mailbox Database' that may be
                       lowering its availability. Availability Count: 1. Expected Availability Count: 2. Detailed
                       error(s):


                               exchange:
                               Database copy 'DAGNAME' on server 'exchange' has a copy queue length of 334 logs,
                       which is higher than the maximum allowed copy queue length of 100. If you need to activate this
                       database copy, you can use the Move-ActiveMailboxDatabase cmdlet with the -SkipLagChecks and
                       -MountDialOverride parameters to forcibly activate the database with some data loss. If the
                       database does not automatically mount after running Move-ActiveMailboxDatabase successfully, use
                       the Mount-Database cmdlet to mount the database.

    In ECP, the ReplicationDAGNetwork shows both interfaces as up. One of the errors above says the server could not communicate with the other member. There isn't a firewall between the two (I have turned off Windows Firewall, and there is no AV firewall). I have matched the LAN NICs and the replication NICs in Hyper-V Manager to the physical NICs on the servers, so I am fairly confident that the right NICs are plugged into the right switches, and ECP says the interfaces are up. Any ideas?

    Monday, March 20, 2017 12:59 PM

All replies

  • The first thing I would do is run Set-DatabaseAvailabilityGroup YourDagName -DiscoverNetworks to make sure the DAG has detected the new DAG networks.

    Also check with Get-DatabaseAvailabilityGroupNetwork to make sure both new subnets are showing up, and clean up any old subnets that are still listed.

    Last but not least, if it is not an IP-less DAG, make sure you have assigned an IP address from the new production subnet to the DAG.
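
The three checks above could be run roughly as follows. This is only a sketch: it assumes the Exchange Management Shell on a DAG member, and 'DAG1' is a placeholder name, not the poster's actual DAG.

```powershell
# Assumption: run from the Exchange Management Shell.
# 'DAG1' is a placeholder; substitute your own DAG name.
$dag = 'DAG1'

# Ask the DAG to re-enumerate the cluster networks and rebuild its DAG networks.
Set-DatabaseAvailabilityGroup -Identity $dag -DiscoverNetworks

# List every DAG network with its subnets and per-interface state,
# so stale or misconfigured subnets are easy to spot.
Get-DatabaseAvailabilityGroupNetwork -Identity $dag |
    Format-List Name, Subnets, Interfaces, ReplicationEnabled, MapiAccessEnabled

# Show the IP address(es) currently assigned to the DAG itself.
Get-DatabaseAvailabilityGroup -Identity $dag |
    Format-List Name, DatabaseAvailabilityGroupIpv4Addresses
```

If a subnet appears under the wrong DAG network, or an old subnet is still listed, that would be the place to start cleaning up.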


    Amit Tank | Blog: exchangeshare.wordpress.com

    Monday, March 20, 2017 8:21 PM
    Moderator
  • MapiDAGNetwork says Misconfigured. The networks and corresponding IPs did NOT change. ReplicationDAGNetwork says Up. The DAG is NOT IP-less. I have the replication part fixed, but now the cluster does not come up.
    Tuesday, March 21, 2017 2:05 AM
  • How are the Cluster Networks doing in the Windows Failover Cluster MMC? Up or Down? Try restarting the Cluster and Replication services and see if the Cluster Networks come up...

    BTW which version of Windows is that?
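
For anyone who prefers the shell to the MMC, the same checks might look like the sketch below. It assumes an elevated PowerShell session on a DAG member with the FailoverClusters module installed. Restarting the Cluster service on a node that hosts active databases can trigger a failover, so it is probably safer to try the passive member first.

```powershell
# Assumption: elevated PowerShell on a DAG member with the
# FailoverClusters module available.
Import-Module FailoverClusters

# Current state of the nodes, cluster networks and their interfaces.
Get-ClusterNode | Format-Table Name, State
Get-ClusterNetwork | Format-Table Name, State, Role, Address
Get-ClusterNetworkInterface | Format-Table Node, Network, Name, State

# Restart the Cluster service and the Exchange Replication service.
Restart-Service -Name ClusSvc
Restart-Service -Name MSExchangeRepl

# Re-check whether the cluster networks came back up.
Get-ClusterNetwork | Format-Table Name, State, Role, Address

# Report the Windows version of this server.
(Get-WmiObject -Class Win32_OperatingSystem).Caption
```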


    Amit Tank | Blog: exchangeshare.wordpress.com


    Tuesday, March 21, 2017 3:05 AM
    Moderator