Exchange 2010 DAG Seeding Error

    Question

  • This weekend I set up a DAG between two Exchange 2010 servers in two sites. I am having a problem completing the seed. I am getting:

     

    Error:

    A source-side operation failed. Error An error occurred while performing the seed operation. Error: An error occurred while communicating with server 'mail02'. Error: Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.. [Database: Exchange DB, Server: DRML.mydomain.com]

     

    An error occurred while communicating with server 'mail02'. Error: Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

     Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

     

    I notice that the PAM is assigned to the passive Exchange server. From what I have read, it should remain with the active copy. If this is correct, how do I change it?

    I have a 90GB database and a 10MB connection on both ends of this multi-site, single AD domain. Connectivity is good, and when the seed is running I am utilizing about 0.5% of the network bandwidth. Can someone point me toward some things I can troubleshoot? And are there any acceptable alternatives for getting the database over there?
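
    As a rough sanity check on those numbers, the best-case seed time can be worked out from the database size and the link speed. This is only a sketch: it assumes the "10MB" connection is a 10 Mbit/s link, and it ignores protocol overhead and any logs generated while the seed runs. The variable names are illustrative.

    ```powershell
    # Best-case transfer time for a database over a WAN link (no overhead assumed).
    $dbBytes        = 90GB          # PowerShell literal: 90 * 1024^3 bytes
    $linkBitsPerSec = 10 * 1000000  # assumption: 10 Mbit/s link

    $seconds = ($dbBytes * 8) / $linkBitsPerSec
    $hours   = [math]::Round($seconds / 3600, 1)

    "Best case: $hours hours at full line rate"
    ```

    Under those assumptions the result is roughly 21.5 hours, so a seed that runs for many hours is not by itself a sign of failure.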

     

    Thanks,

    SJMP

    Monday, January 10, 2011 1:45 PM

All replies

  • Besides PAM being assigned to the passive exchange server, I ran a dependency report on the DAG cluster.

    'IP Address: 192.168.5.193' has no required dependencies.
    'IP Address: Address on Cluster Network 2' has no required dependencies.
    'Name: DAG' dependencies are 'IP Address: 192.168.5.193' or 'IP Address: Address on Cluster Network 2'.
    'Cluster Name' required dependencies are IP Address.

    5.193 is from the passive subnet. Should I assign a static IP, and what about the active subnet?

    Thanks,

    Monday, January 10, 2011 5:04 PM
  • The PAM has nothing to do with this.

    Does Test-ReplicationHealth warn about anything not being correct with the networks?
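
    A minimal sketch of that check from the Exchange Management Shell. The server name DRML is taken from this thread; the string comparison on Result is an assumption about how the result value renders in a remote session.

    ```powershell
    # Show only the replication health checks that did not pass on the target server.
    Test-ReplicationHealth -Identity DRML |
        Where-Object { "$($_.Result)" -notlike '*Passed*' } |
        Format-List Server, Check, Result, Error

    # List the DAG networks and whether each one is enabled for replication.
    Get-DatabaseAvailabilityGroupNetwork |
        Format-Table Name, Subnets, ReplicationEnabled, MapiAccessEnabled -AutoSize
    ```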

    Monday, January 10, 2011 5:22 PM
  • Everything comes back as Passed. 

    The only error I get is event 4113, stating that the copy is either seeding or does not have a healthy copy, which is expected.

    • Proposed as answer by S Chris Wednesday, March 07, 2012 6:31 AM
    Monday, January 10, 2011 5:31 PM
  • Try running these two commands on both DAG members, and then restart the replay service on both nodes.

    netsh int tcp set global autotuninglevel=restricted

    netsh int tcp set global autotuninglevel=normal

    Something could be going wrong with the W2K8R2 network throttling feature and it needs a swift kick in the pants.
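
    Sketched as one sequence, assuming an elevated PowerShell prompt on each DAG member. The "replay service" is taken here to mean the Microsoft Exchange Replication service, whose service name is MSExchangeRepl.

    ```powershell
    # Record the current TCP global settings before changing anything.
    netsh int tcp show global

    # Toggle receive-window auto-tuning, as suggested above.
    netsh int tcp set global autotuninglevel=restricted
    netsh int tcp set global autotuninglevel=normal

    # Restart the Microsoft Exchange Replication service on this node.
    Restart-Service -Name MSExchangeRepl
    ```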

    Monday, January 10, 2011 5:40 PM
  • I started another copy attempt before reading your instructions above. I will do this if it fails again. It has been running for 4 hours and is about 10% through.

    Could you please tell me whether Error 4113 is normal during the seeding process? I understand it is a health check error, but it states:

    The Microsoft Exchange Replication service has suspended database copy 'Exchange DB' as part of the initial seeding operation.

    <Data>Passive copy 'Exchange DB\DRML' is not in a good state. Status: Seeding.

    Is it typical for these errors to be logged during the seeding process?

     

     

    Monday, January 10, 2011 9:01 PM
  • Hi,

    No, that's not correct behaviour. Please run Get-MailboxDatabaseCopyStatus -Server DRML and post the detailed status of the Exchange DB copy.

    Additionally, you can try the below steps to check whether the copy status turns to Healthy.

    1. Restart Replication Service on server “EXCHMBS2”.
    2. Disable IPv6
    3. Disable TCP chimney and RSS from all the three nodes.
    http://support.microsoft.com/default.aspx?scid=kb;EN-US;951037
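
    For step 3, a sketch of the netsh commands described in the linked KB article, run from an elevated prompt on each node. Treat it as an illustration: confirm the current values first, and check the NIC driver's own offload settings separately.

    ```powershell
    # Show current values, including Chimney Offload State and Receive-Side Scaling State.
    netsh int tcp show global

    # Disable TCP Chimney Offload and Receive Side Scaling.
    netsh int tcp set global chimney=disabled
    netsh int tcp set global rss=disabled
    ```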

    Thanks

    Allen


    Allen Song
    Tuesday, January 11, 2011 6:17 AM
  • Allen,

    The database is still replicating: 20 hours in and 60% complete. IPv6 is disabled (I will check the registry when the copy completes). I will also check TCP chimney and RSS after the copy, regardless of success or failure.

    Should have an update by end of the day. 

    Thanks

    SJMP

     

    RunspaceId                       : e6890e59-f1d7-4de3-90c4-b4bad8c2c0a9
    Identity                         : Exchange DB\DRML
    Name                             : Exchange DB\DRML
    DatabaseName                     : Exchange DB
    Status                           : Seeding
    MailboxServer                    : DRML
    ActiveDatabaseCopy               : mail02
    ActivationSuspended              : True
    ActionInitiator                  : Service
    ErrorMessage                     :
    ErrorEventId                     :
    ExtendedErrorInfo                :
    SuspendComment                   : The Microsoft Exchange Replication service has suspended database copy 'Exchange DB' as part of the initial seeding operation.
    SinglePageRestore                : 0
    ContentIndexState                : Failed
    ContentIndexErrorMessage         : Catalog is dismounted externally for database {d7656a2f-9ea3-4f37-90ab-5caba262565b}.
    CopyQueueLength                  : 110607
    ReplayQueueLength                : 0
    LatestAvailableLogTime           :
    LastCopyNotificationedLogTime    :
    LastCopiedLogTime                :
    LastInspectedLogTime             :
    LastReplayedLogTime              :
    LastLogGenerated                 : 110607
    LastLogCopyNotified              : 0
    LastLogCopied                    : 0
    LastLogInspected                 : 0
    LastLogReplayed                  : 0
    LogsReplayedSinceInstanceStart   : 0
    LogsCopiedSinceInstanceStart     : 0
    LatestFullBackupTime             :
    LatestIncrementalBackupTime      :
    LatestDifferentialBackupTime     :
    LatestCopyBackupTime             :
    SnapshotBackup                   :
    SnapshotLatestFullBackup         :
    SnapshotLatestIncrementalBackup  :
    SnapshotLatestDifferentialBackup :
    SnapshotLatestCopyBackup         :
    LogReplayQueueIncreasing         : False
    LogCopyQueueIncreasing           : False
    OutstandingDumpsterRequests      : {}
    OutgoingConnections              :
    IncomingLogCopyingNetwork        :
    SeedingNetwork                   :
    ActiveCopy                       : False
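
    The same information can be narrowed to the fields that matter while a seed is in progress. A sketch, using the server name from this thread; the property names are those shown in the output above.

    ```powershell
    # Summarize seeding progress indicators for every copy hosted on DRML.
    Get-MailboxDatabaseCopyStatus -Server DRML |
        Format-List Name, Status, CopyQueueLength, ReplayQueueLength,
                    ContentIndexState, SeedingNetwork, ErrorMessage
    ```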

    Tuesday, January 11, 2011 1:29 PM
  • Given the size of the database, this could be moving along at the correct pace. Do you know how many MB/s of network activity is going on between the two servers?

    The only way to know for sure whether there's a problem with the seeding is to copy a very large file (a few gigabytes in size) using robocopy or xcopy, and also eseutil /y (http://blogs.technet.com/b/askperf/archive/2007/05/08/slow-large-file-copy-issues.aspx). If those can go faster than the seed, then something is up with the Replay service. If those can't copy faster than the seed, then the issue is the network and possibly Windows.
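
    A sketch of that comparison, using Copy-Item in place of robocopy or xcopy to keep the example short. The file path and share below are hypothetical placeholders; any multi-gigabyte file will do.

    ```powershell
    # Hypothetical paths: substitute a real multi-GB test file and a share on the target.
    $source = 'D:\Temp\testfile.bin'
    $target = '\\DRML\D$\Temp\testfile.bin'

    # Time a plain file copy across the WAN link.
    $elapsed = Measure-Command { Copy-Item -Path $source -Destination $target }

    # Convert to megabits per second so it can be compared with the link speed.
    $bits = (Get-Item $source).Length * 8
    $mbps = [math]::Round($bits / $elapsed.TotalSeconds / 1000000, 2)
    "Copied at $mbps Mbit/s in $([math]::Round($elapsed.TotalMinutes, 1)) minutes"
    ```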

    Tuesday, January 11, 2011 5:07 PM
  • OK, I think I found the problem. The DB copy ran for 40 hours and got to 99%. The error below points to 192.168.2.2, which is the first Exchange 2010 server in my domain. The mailboxes and PFDB have been migrated, and production is pointing to 2.10 (the production mailbox server). All mail flow is also set to 2.x. I am not sure why mail02 or DRML (the seed source and target) would need to communicate with 2.2.

    The only reason 2.2 is still online is that I have not decommissioned it yet. This server is still taking priority over 2.10 for CAS requests. If I power down this server, clients cannot access Outlook, OWA, and so on. But I am not relying on it for production otherwise. Should I just uninstall the CAS role from this server and try to run the copy again?

     

    A source-side operation failed. Error An error occurred while performing the seed operation. Error: Failed to open a log truncation context to source server 'mail.mydomain.com'. Hresult: 0xfffffae7. Error: The database was either not found or was not replicated.. [Database: Exchange DB, Server: DRML. mydomain.com]

    Failed to open a log truncation context to source server 'mail02. mydomain.com'. Hresult: 0xfffffae7. Error: The database was either not found or was not replicated.

    Click here for help... http://technet.microsoft.com/en-US/library/ms.exch.err.default(EXCHG.140).aspx?v=14.1.267.0&t=exchgf1&e=ms.exch.err.Ex4543D9

    Warning:

    The cmdlet extension agent with the index 0 has thrown an exception in OnComplete(). The exception is: System.Net.WebException: Unable to connect to the remote server ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.2.2:443
       at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
       at System.Net.ServicePoint.ConnectSocketInternal(Boolean connectFailure, Socket s4, Socket s6, Socket& socket, IPAddress& address, ConnectSocketState state, IAsyncResult asyncResult, Int32 timeout, Exception& exception)
       --- End of inner exception stack trace ---
       at System.Net.HttpWebRequest.GetRequestStream(TransportContext& context)
       at System.Net.HttpWebRequest.GetRequestStream()
       at System.Web.Services.Protocols.SoapHttpClientProtocol.Invoke(String methodName, Object[] parameters)
       at Microsoft.Exchange.SoapWebClient.CustomSoapHttpClientProtocol.<>c__DisplayClass4.<Invoke>b__3()
       at Microsoft.Exchange.SoapWebClient.HttpAuthenticator.NetworkServiceHttpAuthenticator.AuthenticateAndExecute[T](SoapHttpClientProtocol client, AuthenticateAndExecuteHandler`1 handler)
       at Microsoft.Exchange.SoapWebClient.SoapHttpClientAuthenticator.AuthenticateAndExecute[T](SoapHttpClientProtocol client, AuthenticateAndExecuteHandler`1 handler)
       at Microsoft.Exchange.SoapWebClient.EWS.ExchangeServiceBinding.FindFolder(FindFolderType FindFolder1)
       at Microsoft.Exchange.ProvisioningAgent.MailboxLoggerFactory.EwsMailer.GetAdminAuditLogsFolder(ADUser adUser)
       at Microsoft.Exchange.ProvisioningAgent.MailboxLoggerFactory.EwsMailer..ctor(OrganizationId organizationId, ADUser adUser, ExchangePrincipal principal)
       at Microsoft.Exchange.ProvisioningAgent.MailboxLoggerFactory.Create(OrganizationId organizationId, ADUser mailbox, ExchangePrincipal principal)
       at Microsoft.Exchange.ProvisioningAgent.AdminLogAgentClassFactory.ConfigWrapper.get_MailboxLogger()
       at Microsoft.Exchange.ProvisioningAgent.AdminLogProvisioningHandler.OnComplete(Boolean succeeded, Exception e)
       at Microsoft.Exchange.Provisioning.ProvisioningLayer.OnComplete(Task task, Boolean succeeded, Exception exception)

     

     

    Exchange Management Shell command attempted:

    Add-MailboxDatabaseCopy -Identity 'Exchange DB' -MailboxServer 'DRML' -ActivationPreference '2'

     

    Elapsed Time: 00:00:46

     

     

    Wednesday, January 12, 2011 9:52 PM
  • The reason why 192.168.2.2 is involved would be because of Remote PowerShell.

    I've only seen that error before when someone runs the Exchange PowerShell module inside of a local PowerShell instance, and then the cmdlet tries to do something that depends on remote PowerShell and dies.

    So it's possible the copy finished and went to healthy.

    Wednesday, January 12, 2011 10:13 PM
  • The copy never finished. During the copy I see the 89GB file in the temp-seeding folder. Once it fails, the EDB is gone and the disk reclaims the 89GB of free space.

    I am running PowerShell locally on the production mail server (2.10); DRML is on 5.x.

    Wednesday, January 12, 2011 10:16 PM
  • I recreated the DAG from scratch. Once I added the two servers, it dismounted the production database and would not let me mount it.

    The error was "Failed to open Active Manager Key. Error 2." I removed the servers from the DAG and mounted the store. I also saw this error:

     

    Log Name:      Application
    Source:        MSExchangeRepl
    Date:          1/12/2011 7:19:18 PM
    Event ID:      4082
    Task Category: Service
    Level:         Error
    Keywords:      Classic
    User:          N/A
    Computer:      MAIL02.mydomain.com
    Description:
    The replication network manager encountered an error while monitoring events. Error: Microsoft.Exchange.Cluster.Replay.AmClusterApiException: An Active Manager operation failed. Error: An error occurred while attempting a cluster operation. Error: Cluster API '"OpenCluster(MAIL02.mydomain.com) failed with 0x6d9. Error: There are no more endpoints available from the endpoint mapper"' failed.. ---> System.ComponentModel.Win32Exception: There are no more endpoints available from the endpoint mapper
       --- End of inner exception stack trace ---
       at Microsoft.Exchange.Cluster.Replay.NetworkManager.DriveMapRefresh()
       at Microsoft.Exchange.Cluster.Replay.NetworkManager.TryDriveMapRefresh()
    Event Xml:
    <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
      <System>
        <Provider Name="MSExchangeRepl" />
        <EventID Qualifiers="49156">4082</EventID>
        <Level>2</Level>
        <Task>1</Task>
        <Keywords>0x80000000000000</Keywords>
        <TimeCreated SystemTime="2011-01-13T00:19:18.000000000Z" />
        <EventRecordID>13403</EventRecordID>
        <Channel>Application</Channel>
        <Computer>MAIL02.mydomain.com</Computer>
        <Security />
      </System>
      <EventData>
        <Data>Microsoft.Exchange.Cluster.Replay.AmClusterApiException: An Active Manager operation failed. Error: An error occurred while attempting a cluster operation. Error: Cluster API '"OpenCluster(MAIL02.mydomain.com) failed with 0x6d9. Error: There are no more endpoints available from the endpoint mapper"' failed.. ---&gt; System.ComponentModel.Win32Exception: There are no more endpoints available from the endpoint mapper
       --- End of inner exception stack trace ---
       at Microsoft.Exchange.Cluster.Replay.NetworkManager.DriveMapRefresh()
       at Microsoft.Exchange.Cluster.Replay.NetworkManager.TryDriveMapRefresh()</Data>
      </EventData>
    </Event>

     

    Thursday, January 13, 2011 12:26 AM
  • Hi,

    From the results that you posted from Get-MailboxDatabaseCopyStatus -Server DRML, we can see that a huge number of logs are waiting to be copied from the active copy to the passive copy.

    For this issue, that means there are corrupted log files on the active server. We therefore need to temporarily dismount the active database, move the transaction logs to another folder, and mount the database again, which creates a new series of transaction log files. Then reseed the database.

    Thanks

    Allen


    Allen Song
    • Proposed as answer by heyko Monday, February 06, 2012 5:42 PM
    Thursday, January 13, 2011 6:48 AM
  • Allen,

    Where are the logs? I currently have only 514 on the production mail server. I run a full backup nightly, which flushes the logs.

    I checked on the old Exchange server, 2.2 (which I believe is causing the problem). This server is taking priority as the main Exchange 2010 server. Why is my DAG seed even looking at this server? It is not a member of the DAG.

    The cmdlet extension agent with the index 0 has thrown an exception in OnComplete(). The exception is: System.Net.WebException: Unable to connect to the remote server ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.2.2:443

    2.2 should not respond to this at all. The seed and the DAG are between 2.10 and 5.13.

     

     

    Thursday, January 13, 2011 2:03 PM
  • Remote PowerShell is running as a service on the CAS server. When you run Exchange PowerShell locally, you get those errors.

    Thursday, January 13, 2011 3:05 PM
  • jader3rd, forgive my ignorance on this issue, but I don't understand the correlation.

    Why is a connection to 2.2 being attempted for the DAG process?

    I ran the database copy from EMC on 2.10. When I open PowerShell on 2.10, it connects locally. And if it is causing my DAG seed to fail, how can I rectify it?

    Thanks

    SJMP

    Thursday, January 13, 2011 3:11 PM
  • If you're opening Exchange PowerShell through the icon that was installed with Exchange, it's using Remote PowerShell. All commands are sent to, and processed by, a CAS server with Remote PowerShell.

    I don't think this is causing the Seed to fail.

    Do the other commands issued in the same window fail in the same way?
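
    One way to see which server the shell is actually talking to is to list the open remote sessions. A sketch, run in the same Exchange Management Shell window:

    ```powershell
    # The Exchange Management Shell holds a remote session to one server;
    # ComputerName shows which server is processing the cmdlets.
    Get-PSSession | Format-Table ComputerName, ConfigurationName, State -AutoSize
    ```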

    • Marked as answer by Allen Song Tuesday, January 25, 2011 2:38 AM
    Thursday, January 13, 2011 3:29 PM
  • I have a similar case: the database copy often drops its connection.

    Solutions to try:
    1. Resume the copy with the command Resume-MailboxDatabaseCopy -Identity db1\ex1
    2. Reseed the copy.
    3. If none of the above works, restore the database and migrate the user mailboxes.
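
    Steps 1 and 2 sketched in the Exchange Management Shell, using the database and server names from the original question as placeholders. -DeleteExistingFiles removes whatever files exist at the target copy's location before reseeding, so it should only ever be pointed at the passive copy.

    ```powershell
    $copy = 'Exchange DB\DRML'   # <DatabaseName>\<ServerHostingTheCopy>

    # Step 1: try to resume the suspended copy and see whether it turns Healthy.
    Resume-MailboxDatabaseCopy -Identity $copy
    Get-MailboxDatabaseCopyStatus -Identity $copy | Format-List Name, Status, CopyQueueLength

    # Step 2: if it stays Failed or Suspended, suspend it and reseed from the active copy.
    Suspend-MailboxDatabaseCopy -Identity $copy -Confirm:$false
    Update-MailboxDatabaseCopy -Identity $copy -DeleteExistingFiles
    ```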


    Communicating, following, thinking, improving, sharing, enjoying!
    • Proposed as answer by login.lu Sunday, October 23, 2011 3:36 PM
    Sunday, October 23, 2011 3:36 PM
  • Allen Song wrote:

    From the results that you posted from Get-MailboxDatabaseCopyStatus -Server DRML, we can see that a huge number of logs are waiting to be copied from the active copy to the passive copy.

    For this issue, that means there are corrupted log files on the active server. We therefore need to temporarily dismount the active database, move the transaction logs to another folder, and mount the database again, which creates a new series of transaction log files. Then reseed the database.

    Had a similar issue; this fix worked for me. Thanks!

    Thursday, December 01, 2011 9:52 PM
  • jader3rd,

    I had the same reseed problem (connection forcibly closed).

    I disabled autotuninglevel, and it solved my problem:

    netsh int tcp set global autotuninglevel=disabled

    Thx

    Thursday, December 15, 2011 10:01 AM
  • Actually, just "re-seeding" fixes it for me - but first and foremost, this should NOT happen (at least not in my case) - I am simply saying "create copy" - period - that's it.

    I find it so odd that it seems that we need to do all this TWEAKING to the TCP stack and these netsh commands (chimney and so forth). If so, then those things NEED TO BE INCLUDED IN THE MICROSOFT DOCS! Or a subsequent fix - and I am on 2010 Exchange SP2, so it is NOT in a fix thus far!

     

    Error:
    A source-side operation failed. Error An error occurred while performing the seed operation.
    Error: Failed to open a log truncation context to source server 'MYBOX1.mydomain.org'.
    Hresult: 0xfffffae7. Error: The database was either not found or was not replicated..
    [Database: myuser, Server: MYBOX2.mydomain.org]

    Failed to open a log truncation context to source server 'MYBOX1.mydomain.org'. Hresult: 0xfffffae7.
    Error: The database was either not found or was not replicated.
    Click here for help... http://technet.microsoft.com/en-US/library/ms.exch.err.default(EXCHG.141).aspx?v=14.2.247.1&t=exchgf1&e=ms.exch.err.Ex4543D9

    Exchange Management Shell command attempted:
    Add-MailboxDatabaseCopy -Identity 'myuser' -MailboxServer 'MYBOX2' -ActivationPreference '2'

    Elapsed Time: 00:00:13

    And then it gets put into Suspended status.

    jader3rd wrote:
    Try running these two commands on both Dag members, and then restarting the replay service on both nodes.
    netsh int tcp set global autotuninglevel=restricted
    netsh int tcp set global autotuninglevel=normal
    Something could be going wrong with the W2K8R2 network throttling feature and it needs a swift kick in the pants.

    [Note: I have NOT yet tried jader3rd's suggestion]

    WeetA wrote:
    i had the same reseed problem (connection forcibly closed)
    I disabled autotuninglevel. It solved my problem
    netsh int tcp set global autotuninglevel=disabled

    [Note: WeetA's suggestion did NOT solve my problem]


    Allen Song wrote:

    No, that's not correct behaviour. Now please try to run Get-MailboxDatabaseCopyStatus -server DRML
    to post the detailed status of the Exchange DB.

    Additionally, you can try the below steps to check whether the copy status turns to Healthy.

    1. Restart Replication Service on server “EXCHMBS2”.
    2. Disable IPv6
    3. Disable TCP chimney and RSS from all the three nodes.
    http://support.microsoft.com/default.aspx?scid=kb;EN-US;951037


    [Note: I have NOT tried Allen Song's suggestion yet]


    tnjman
    Tuesday, December 27, 2011 9:49 PM
  • Also, I don't see ANY service called a "replay service;" can you please specify the exact service name? Is that a 'short name' or 'nickname' that you are using for the service?

    i.e., is it something like Microsoft Exchange Replay service?


    tnjman
    Tuesday, December 27, 2011 9:56 PM
  • Update: Actually, in some cases, just clicking "Resume Copy" on the suspended copy, under Database Management, seems to fix it and lets it finish copying.

    Still, it's a BUG that needs to be fixed, or documented as to what settings we need to have in place to prevent this.


    tnjman
    Tuesday, December 27, 2011 9:59 PM
  • Allen Song wrote:

    From the results that you posted from Get-MailboxDatabaseCopyStatus -Server DRML, we can see that a huge number of logs are waiting to be copied from the active copy to the passive copy.

    For this issue, that means there are corrupted log files on the active server. We therefore need to temporarily dismount the active database, move the transaction logs to another folder, and mount the database again, which creates a new series of transaction log files. Then reseed the database.

    Hi Allen,

    Thanks for this post. Dismounting the database, deleting the log files and the database copy, then mounting the database and adding a new copy works for me.

    Thanks!

     

    Monday, February 06, 2012 5:44 PM
  • Hi

    If you run into mailbox database log corruption, you can check in the following order:
    1. C:\Program Files\Microsoft\Exchange Server\V14\Bin>eseutil /mk "c:\Mailbox\MB01\E00.chk" | find /i "checkpoint"
    2. C:\Program Files\Microsoft\Exchange Server\V14\Bin>eseutil /mh "c:\Mailbox\MB01\MB01.edb"
    3. On the mailbox server, delete the faulty .log and .chk files for the database, and then mount the database.


    Communicating, following, thinking, improving, sharing, enjoying!

    Tuesday, February 07, 2012 1:53 AM
  • This is dangerous. Do not simply delete transaction logs from the file system except in specific circumstances.

    In this case you have not checked for clean shutdown - how do you know you will not lose data as a result of this?

    There are also no checks to see whether the logs are actually corrupt. Run eseutil /ml against the log file prefix (for example, E00) to scan all logs.

    And once you do this, IMMEDIATELY run a full backup, as the log sequence is now broken.
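
    Those checks sketched with eseutil, reusing the example paths from the post being discussed (c:\Mailbox\MB01 and the log prefix E00 are placeholders). The database must be dismounted for /mh to read the header, and eseutil is assumed to be on the path, as it is in the Exchange Bin folder.

    ```powershell
    # Placeholder paths taken from the example above.
    Set-Location 'c:\Mailbox\MB01'

    # 1. Confirm the dismounted database is in Clean Shutdown state.
    eseutil /mh 'c:\Mailbox\MB01\MB01.edb' | Select-String 'State:'

    # 2. Verify the integrity of every log file with the E00 prefix in this folder.
    eseutil /ml E00
    ```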


    Cheers, Rhoderick


    Tuesday, February 07, 2012 2:14 AM
  • Read the error message carefully:

    The cmdlet extension agent with the index 0 has thrown an exception in OnComplete(). The exception is: System.Net.WebException: Unable to connect to the remote server ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.2.2:443

    How about you see what cmdlet extension agent that is? 

    Get-CmdletExtensionAgent | select name, priority | sort priority

    I'm going to assume that this is your first Exchange server and has arbitration mailboxes etc?

    Also I saw a comment that when you shut this off you lose access to mailboxes. Have you changed the RPCClientAccessServer parameter on the mailbox databases to a different RPC Client Access endpoint? 
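
    A sketch of those checks in the Exchange Management Shell. The stack trace names the admin audit log provisioning handler, so the audit log configuration is included as well; that link is an inference from the trace, not something confirmed in the thread.

    ```powershell
    # Which cmdlet extension agents are enabled, in priority order?
    Get-CmdletExtensionAgent | Sort-Object Priority |
        Format-Table Name, Priority, Enabled -AutoSize

    # Where do the arbitration mailboxes live?
    Get-Mailbox -Arbitration | Format-Table Name, Database, ServerName -AutoSize

    # Which RPC Client Access endpoint does each mailbox database point to?
    Get-MailboxDatabase | Format-Table Name, Server, RpcClientAccessServer -AutoSize

    # Current admin audit logging configuration.
    Get-AdminAuditLogConfig | Format-List AdminAuditLog*
    ```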


    Cheers, Rhoderick

    Tuesday, February 07, 2012 2:29 AM
  • I find the DAG some of the worst software that i have ever had the pleasure of dealing with. Right up there next to SCCM and other microsoft products that have an atrocious design. The development team behind exchange 2010 should be lined up and shot.
    • Edited by Johnmclain Wednesday, March 26, 2014 1:10 PM
    Wednesday, March 26, 2014 1:10 PM