Content Index problems after Exchange 2016 CU1 in DAG environment

  • Question

  • Please help:

    Four Exchange 2016 servers in a DAG running the RTM release, had no problems with indexing.

    First two Exchange 2016 servers upgraded to CU1 completed the process ok and had no problems with indexing.

    On the third Exchange 2016 server (passive copies only), the upgrade to CU1 left the content indexes of all databases in the Unknown state.

    Troubleshooting so far includes:

    update-mailboxdatabasecopy [db\server] -catalogonly (with and without -force)

    Stop ...Search and ...Search Host Controller services, remove content index folder on disk, restart services.

    Create ContentSubmitters group in AD giving full control to administrators and network service.

    Remove database copy then re-add database copy - database seeding succeeds, becomes healthy passive database, but index seeding does not succeed, never becomes healthy. Tried update...catalogonly again of course, still doesn't work.

    Remove all passive copies from server, (reboot) remove server from DAG, (reboot) remove Exchange 2016 CU1 from server, (reboot) reinstall Exchange 2016 CU1, (reboot) reconfigure virtual directories, connectors, etc, (reboot) seed passive database copy - database seeding succeeds, becomes healthy passive database, but index seeding does not succeed, never becomes healthy. Tried update...catalogonly again of course, still doesn't work.
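
    For anyone following along, the index reset and catalog-only reseed described above can be sketched roughly as follows. This is a minimal sketch, not the exact commands that were run. `DB1`, `MBX1` and `MBX3` are placeholder names, and it assumes the catalog folder sits next to the .edb file with a name that starts with the database GUID. It renames the folder instead of removing it, so the old catalog can be put back. Run it in the Exchange Management Shell on the affected server.

    ```powershell
    # Placeholder names (assumptions): DB1 = database, MBX3 = affected server, MBX1 = server with a healthy index
    $db = Get-MailboxDatabase -Identity 'DB1'
    $dbFolder = Split-Path -Path $db.EdbFilePath.PathName -Parent

    # Stop the Search and Search Host Controller services
    Stop-Service -Name MSExchangeFastSearch, HostControllerService -Force

    # Find the catalog folder (assumed to start with the database GUID) and
    # rename it rather than deleting it, so it can be restored if needed
    Get-ChildItem -Path $dbFolder -Directory |
        Where-Object { $_.Name -like "$($db.Guid)*" } |
        Rename-Item -NewName { "$($_.Name).old" }

    # Restart the services
    Start-Service -Name HostControllerService, MSExchangeFastSearch

    # Reseed only the content index catalog from a server with a healthy index
    Update-MailboxDatabaseCopy -Identity 'DB1\MBX3' -CatalogOnly -SourceServer 'MBX1'
    ```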

    The index state starts out as Unknown; after update...catalogonly it goes to FailedAndSuspended, with a ContentIndexErrorMessage of "The content index is corrupted" and a ContentIndexErrorCode of 19.
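
    The state and error fields mentioned here can be read per database copy with something like the following. `MBX3` is a placeholder for the affected server, not the real name.

    ```powershell
    # MBX3 is a placeholder for the affected server (assumption)
    Get-MailboxDatabaseCopyStatus -Server 'MBX3' |
        Format-List Name, Status, ContentIndexState, ContentIndexErrorMessage, ContentIndexErrorCode
    ```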

    Error from update...catalogonly in yellow font is:
    Warning: Seeding of content index catalog for database [db] failed. Please verify that the Microsoft Search (Exchange) and the Host Controller service for Exchange services are running and try the operation again.
    Error: The seeding operation failed. Error: An error occurred while performing the seed operation.
    Error: An error occurred while updating the search catalog files from server [server] to [server].
    Error: A transient exception from Exchange Search was encountered.
    Error: -1.

    Here are some errors from the app log:

    4999 error:

    Watson report about to be sent for process id: 9824, with parameters: E12IIS, c-RTL-AMD64, 16.01.0396.030, NodeRunner#IndexNode1, M.C.S.FastServer.Managed, M.C.S.F.Plugin.Start, S.R.InteropServices.SEHException, e136, 16.01.0396.030.
    ErrorReportingEnabled: True

    =======

    1009 warning:

    The description for Event ID 1009 from source MSExchangeFastSearch cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

    If the event originated on another computer, the display information had to be saved with the event.

    The following information was included with the event:

    db16k
    Microsoft.Exchange.Search.Core.Abstraction.OperationFailedException: The component operation has failed. ---> Microsoft.Exchange.Search.Engine.FeedingSkippedException: "Feeding was skipped for 'd49548c0-2dc5-43d7-9b9f-bf0689f7c661 (db16k)' due to the state 'Unknown', error code: 'Unknown'."
       at Microsoft.Exchange.Search.Engine.SearchFeedingController.InternalExecutionStart()
       at Microsoft.Exchange.Search.Core.Common.Executable.InternalExecutionStart(Object state)
       --- End of inner exception stack trace ---
       at Microsoft.Exchange.Search.Core.Common.Executable.EndExecute(IAsyncResult asyncResult)
       at Microsoft.Exchange.Search.Engine.SyncRootController.ExecuteComplete(IAsyncResult asyncResult)

    the message resource is present but the message is not found in the string/message table

    =======

    1009 warning #2:

    The description for Event ID 1009 from source MSExchangeFastSearch cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

    If the event originated on another computer, the display information had to be saved with the event.

    The following information was included with the event:

    db16k
    Microsoft.Exchange.Search.Core.Abstraction.OperationFailedException: The component operation has failed. ---> Microsoft.Exchange.Search.Engine.FeedingSkippedException: "Feeding was skipped for 'd49548c0-2dc5-43d7-9b9f-bf0689f7c661 (db16k)' due to the state 'Unknown', error code: 'Unknown'."
       at Microsoft.Exchange.Search.Engine.SearchFeedingController.InternalExecutionStart()
       at Microsoft.Exchange.Search.Core.Common.Executable.InternalExecutionStart(Object state)
       --- End of inner exception stack trace ---
       at Microsoft.Exchange.Search.Core.Common.Executable.EndExecute(IAsyncResult asyncResult)
       at Microsoft.Exchange.Search.Engine.SyncRootController.ExecuteComplete(IAsyncResult asyncResult)

    the message resource is present but the message is not found in the string/message table

    ==========

    1006 warning:

    The description for Event ID 1006 from source MSExchangeFastSearch cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

    If the event originated on another computer, the display information had to be saved with the event.

    The following information was included with the event:

    System.ServiceModel.CommunicationException: The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue. Local socket timeout was '00:01:00'. ---> System.IO.IOException: The read operation failed, see inner exception. ---> System.ServiceModel.CommunicationException: The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue. Local socket timeout was '00:01:00'. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
       at System.Net.Sockets.Socket.Receive(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags)
       at System.ServiceModel.Channels.SocketConnection.ReadCore(Byte[] buffer, Int32 offset, Int32 size, TimeSpan timeout, Boolean closing)
       --- End of inner exception stack trace ---
       at System.ServiceModel.Channels.SocketConnection.ReadCore(Byte[] buffer, Int32 offset, Int32 size, TimeSpan timeout, Boolean closing)
       at System.ServiceModel.Channels.SocketConnection.Read(Byte[] buffer, Int32 offset, Int32 size, TimeSpan timeout)
       at System.ServiceModel.Channels.ConnectionStream.Read(Byte[] buffer, Int32 offset, Int32 count)
       at System.Net.FixedSizeReader.ReadPacket(Byte[] buffer, Int32 offset, Int32 count)
       at System.Net.Security.NegotiateStream.StartFrameHeader(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
       at System.Net.Security.NegotiateStream.StartReading(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
       at System.Net.Security.NegotiateStream.ProcessRead(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
       --- End of inner exception stack trace ---
       at System.Net.Security.NegotiateStream.ProcessRead(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)
       at System.Net.Security.NegotiateStream.Read(Byte[] buffer, Int32 offset, Int32 count)
       at System.ServiceModel.Channels.StreamConnection.Read(Byte[] buffer, Int32 offset, Int32 size, TimeSpan timeout)
       --- End of inner exception stack trace ---

    Server stack trace:
       at System.ServiceModel.Channels.StreamConnection.Read(Byte[] buffer, Int32 offset, Int32 size, TimeSpan timeout)
       at System.ServiceModel.Channels.SessionConnectionReader.Receive(TimeSpan timeout)
       at System.ServiceModel.Channels.SynchronizedMessageSource.Receive(TimeSpan timeout)
       at System.ServiceModel.Channels.TransportDuplexSessionChannel.Receive(TimeSpan timeout)
       at System.ServiceModel.Channels.TransportDuplexSessionChannel.TryReceive(TimeSpan timeout, Message& message)
       at System.ServiceModel.Dispatcher.DuplexChannelBinder.Request(Message message, TimeSpan timeout)
       at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout)
       at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)
       at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)

    Exception rethrown at [0]:
       at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
       at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
       at Microsoft.Ceres.InteractionEngine.Services.ProcessingEngine.IProcessingEngine.ExecuteSearchFlow(String flowName, IEnumerable`1 inputData)
       at Microsoft.Exchange.Search.OperatorSchema.PagingImsFlowExecutor.<>c__DisplayClass1e.<ExecuteSearchFlow>b__1c(IProcessingEngineChannel proxy)
       at Microsoft.Exchange.Search.OperatorSchema.PagingImsFlowExecutor.ExecuteServiceCall(IProcessingEngineChannel& serviceProxy, Action`1 call, Int32 retryCount)
       at Microsoft.Exchange.Search.OperatorSchema.PagingImsFlowExecutor.ExecuteSearchFlow(String flowName, Dictionary`2 inputData)
       at Microsoft.Exchange.Search.OperatorSchema.PagingImsFlowExecutor.<ExecuteInternal>d__26.MoveNext()
       at Microsoft.Exchange.Search.OperatorSchema.PagingImsFlowExecutor.<Execute>d__5.MoveNext()
       at Microsoft.Exchange.Search.Fast.ExchangeQueryExecutor.RunUnderExceptionHandler[T](Func`1 call, IDiagnosticsSession session, String flowName)

    the message resource is present but the message is not found in the string/message table

    =========

    also this error: 

    Watson report about to be sent for process id: 14396, with parameters: E12IIS, c-RTL-AMD64, 16.01.0396.030, NodeRunner#IndexNode1, M.C.S.FastServer.Managed, M.C.S.F.Plugin.Start, S.R.InteropServices.SEHException, e136, 16.01.0396.030.
    ErrorReportingEnabled: True 
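
    If it helps anyone compare, events like the ones quoted above can be pulled from the Application log with a filter along these lines. This is only a sketch: the 24-hour window is an arbitrary assumption, and the IDs are simply the ones listed in this post.

    ```powershell
    # Pull the last 24 hours (assumed window) of the event IDs quoted above
    Get-WinEvent -FilterHashtable @{
        LogName   = 'Application'
        Id        = 4999, 1006, 1009
        StartTime = (Get-Date).AddDays(-1)
    } -ErrorAction SilentlyContinue |
        Sort-Object TimeCreated |
        Format-List TimeCreated, ProviderName, Id, LevelDisplayName, Message
    ```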


    Bill Coulter


    • Edited by Barebodkin Monday, March 28, 2016 12:03 PM additional app log error
    Saturday, March 26, 2016 12:02 PM

Answers

  • The newly built physical server with twin six-core processors (Xeon X5650) works perfectly. 

    Diagnosing exactly why CU1 content indexing is incompatible with older servers is above my pay grade. I can say, in response to ishmael.whale's post below, that the old reliable 2950s were recently built from scratch with Windows Server 2012 R2 and fully updated through Windows Update, so it would be a mistake to imagine they're using 8-year-old NIC drivers. But thank you for taking the time to consider our situation and provide your thoughts.

    I have not tried building VMs on the older hardware. Cain indicates that might very well work. I'm going to retire and replace them. They have been running dozens of different versions of Exchange flawlessly for almost a decade. They've seen newer servers come and go but they kept humming along. It's a shame, but it's a good reason to buy another new server and kick things up another notch.

    Thanks all. 


    Bill Coulter

    • Marked as answer by Barebodkin Wednesday, March 30, 2016 1:21 PM
    Wednesday, March 30, 2016 1:21 PM

All replies

  • Hi,

    Try stopping the Exchange search services and removing the index folder. After the services restart, the database will be re-indexed. This is the standard procedure for broken indexes.


    Regards From: Exchange Online | Windows Administrator's Area

    Saturday, March 26, 2016 1:18 PM
    Moderator
  • Thanks for your reply but as specifically described in the case, that was done. It didn't help.

    Bill Coulter

    Saturday, March 26, 2016 1:21 PM
  • Hi,

    On the third Exchange 2016 server, how much free space is there on the disks?

    Generally, the database volume needs free space equal to at least 10 percent of the database size.

    Best Regards.



    Lynn-Li
    TechNet Community Support

    Monday, March 28, 2016 7:43 AM
    Moderator
    Free space on the third and fourth Exchange servers, both of which have the same problem with indexes, is ample. The C: drives have over 400 GB free, the log volume has about 250 GB free, and the database volumes have about 1 TB free or more.

    The first and second servers differ from the third and fourth in hardware only; the software is the same. One other difference I can think of has to do with the order in which everything was created, vis-a-vis circular logging. In a DAG, second database copies must be created before circular logging is enabled, then circular logging is enabled and third and subsequent copies are created. I'm going to try deselecting circular logging temporarily, but it's a shot in the dark.



    Bill Coulter

    Monday, March 28, 2016 12:18 PM
  • I had a similar problem with Exchange 2013 and one of the CUs. The issue ended up being that I had too many items in the C:\windows\temp directory. Can you try clearing out C:\windows\temp and restarting the search services?


    Monday, March 28, 2016 12:53 PM
  • Hi Bill,

    I have hit this same issue after upgrading from 2016 RTM to 2016 CU1, this was a single database without circular logging.

    The same issue happened after rebuilding the OS from scratch and installing 2016 CU1 clean... the one thing in common was both were on older hardware (~8 year old x3650).

    I am building a new virtual machine now to rule out hardware being the cause, will keep you updated.

    Cain.

    Tuesday, March 29, 2016 2:44 AM
  • Cain:

    The servers which don't like CU1 are older hardware in my case also. They are physical, not virtual: Dell 2950s with twin dual-core processors (Xeon 5160s) and 32 GB RAM, which is more than adequate for a tertiary role.

    I'm building a newer physical server (dual six-core, 96GB) from scratch to see if it can join the DAG and successfully make database copies with indexes. The ones which worked successfully with CU1 are a dual six-core with 96GB and a dual 8-core with 128GB. 

    Hinte:

    There was a lot of stuff in c:\windows\temp but clearing it all out and restarting did not help the indexing. Thanks for your reply just the same. 


    Bill Coulter

    Tuesday, March 29, 2016 11:36 AM
  • I'm still not sure what's going on here. 2016 CU1 worked perfectly as a VM, but all 4 attempts on my old x3650 (dual Xeon E5420) resulted in unhealthy indexes, even for the default database.

    Going to take this as a hint that we should be running this on newer hardware.

    Wednesday, March 30, 2016 12:13 AM
  • Just a thought, but the text above mentions a socket error, and you call out the commonality of old hardware. Have you tried upgrading the NIC drivers?
    Wednesday, March 30, 2016 9:58 AM
  • Wanted to add our experiences on this thread as well to help anyone searching this issue.

    This would certainly appear to be a CPU issue.

    We have built several DAG clusters and standalone Exchange servers on Hyper-V VMs hosted on Intel Xeon 54xx, 55xx and 56xx CPUs.

    On a VM built on Xeon 54xx CPUs, Exchange 2016 CU1 has a meltdown, whether standalone or in a DAG: the indexing service simply fails to work, crashing NodeRunner and producing all the errors noted above. Reinstalling the indexing service doesn't seem to help either. However, build a vanilla Exchange 2016 box on the same hardware and you will have no issues with indexing.

    We have had a case open since March with Microsoft. I do rather suspect this will be fixed in CU2 rather than patched mid-cycle.
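
    To see quickly which CPU generation each server is running on, something like this should do. It is a rough sketch: the server names are placeholders, and it assumes WinRM/CIM access to each machine.

    ```powershell
    # Placeholder server names (assumption); replace with your DAG members
    $servers = 'MBX1', 'MBX2', 'MBX3', 'MBX4'

    # Requires WinRM/CIM access to each server
    Get-CimInstance -ClassName Win32_Processor -ComputerName $servers |
        Select-Object PSComputerName, Name, NumberOfCores |
        Sort-Object PSComputerName
    ```

    The Name column shows the processor model string, which can be compared against the Xeon families mentioned above.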


    Russell

    Tuesday, May 31, 2016 11:26 AM
  • Hello,

    For information, I'm facing the same issue, but my Exchange servers (3 servers) are VMs running on the same host model.

    The search index appears as "Healthy", but search results are limited to the end-user cache. The event log is full of MSExchangeFastSearch errors like:

    Event 1010

    Microsoft.Exchange.Search.Fast.PerformingFastOperationException: An Exception was received during a FAST operation. ---> System.ServiceModel.CommunicationException: The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue. Local socket timeout was '00:01:00'. ---> System.IO.IOException: The write operation failed, see inner exception. ---> System.ServiceModel.CommunicationException: The socket connection was aborted. This could be caused by an error processing your message or a receive timeout being exceeded by the remote host, or an underlying network resource issue. Local socket timeout was '00:01:00'. ---> System.Net.Sockets.SocketException: An existing connection was forcibly closed by the remote host
       at System.Net.Sockets.Socket.Send(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags)
       at System.ServiceModel.Channels.SocketConnection.Write(Byte[] buffer, Int32 offset, Int32 size, Boolean immediate, TimeSpan timeout)

    I tried rebuilding the cache and reinstalling Fast Search (with and without the ContentSubmitters group), without any success.

    I will try to move one of my VMs to another host model to compare.





    • Edited by JordanOH Wednesday, June 8, 2016 2:58 PM
    Wednesday, June 8, 2016 2:45 PM
  • Hi there

    I too am seeing the same issue, but all of mine are VMs on Hyper-V 2012 R2 Datacenter hosts (HP DL380 G7s).

    Each Exchange 2016 CU1 VM is:

    • Generation 2 VM
    • 4 vCPUs
    • 24 GB RAM
    • C: drive, 40 GB, thin provisioned, for the OS etc.
    • D: drive, 40 GB, thin provisioned, for applications (like Exchange)
    • M: drive is a 300 GB pass-through disk on a local HP SAS disk in the DL380 G7, used for the databases (same across all 3 Hyper-V hosts) and formatted as ReFS within the Exchange VMs
    • All VMs run Windows Server 2012 R2 Standard with every single Windows update known to the planet.
    • Identical builds.
    • Exchange 2016 CU1

    The first 2 have no issues: DBs are replicating fine, failover is working, all good.

    The 3rd VM is moaning about:

    Event ID 1009: "MSExchangeFastSearch cannot be found"

    Event ID 1008: "The description for Event ID 1008 from source MSExchangeFastSearch cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer."

    Event ID 4401: Microsoft Exchange Server Locator Service failed to find active server for database '1f152d2c-27df-45e1-ab2e-73ff42593997'. Error: An Active Manager operation failed. Error: Invalid Active Manager configuration. Error: Rootkey is not accessbile. (Error=An error occurred while attempting a cluster operation. Error: Cluster API failed: "GetClusterKey failed with 0x46. Error: The remote server has been paused or is in the process of being started")

    I've also noticed in the System log that the cluster service on this 3rd VM is terminating all the time:

    "The Cluster Service service terminated unexpectedly.  It has done this 13 time(s).  The following corrective action will be taken in 60000 milliseconds: Restart the service."

    and this Cluster HandShake issue:

    "Security Handshake between local and remote endpoints '192.168.60.74:~3343~ -> 192.168.50.73:~56968~' did not complete in '40' seconds, node terminating the connection"

    Edit: I'm going to bump up the crosssite heartbeat cluster times and see if that helps.

    The ContentSubmitters group exists in AD (we have had it since Exchange 2013). The AD forest and domain are at the 2012 R2 functional level.

    Exchange 2013 was fine, no issues and Exchange 2016 RTM was also good, but these are not in-place upgrades of those Exchange 2016 RTM boxes, these are fresh Virtual Machines with clean Exchange 2016 CU1 installed.

    I'm getting disconnects and failed DB copies on this 3rd VM, the first 2 are still replicating fine.



    • Edited by Andyhud Sunday, June 12, 2016 9:27 AM
    Sunday, June 12, 2016 9:25 AM
  • I have faced this issue too, but this helped.
    Sunday, June 12, 2016 11:10 AM
  • Hi there TanaKim

    What did you do to resolve?

    Sunday, June 12, 2016 11:29 AM
  • OK, had a call from Microsoft today

    It is a recognised bug and is due to be fixed in CU2 which should be out before the end of the month

    • Proposed as answer by russgs Tuesday, June 21, 2016 2:57 PM
    Tuesday, June 21, 2016 2:57 PM
  • Thanks for the info Russell, interesting.

    Mine are E5646 CPUs...

    Let's hope CU2 sorts it.

    Andy



    Tuesday, June 21, 2016 3:11 PM
  • Thanks, Russell, for your feedback.



    • Edited by JordanOH Tuesday, June 21, 2016 3:15 PM
    Tuesday, June 21, 2016 3:15 PM
  • Hey Andy,

    I'm not sure your issue is the same as the one in this thread, but I wouldn't be surprised if this is fixed in CU2!

    R

    Tuesday, June 21, 2016 3:36 PM
  • CU2 is now out: https://support.microsoft.com/en-gb/kb/3135742

    It specifically mentions the fix discussed, so I will test and advise.
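
    After upgrading, a quick way to confirm which build each server is actually running is to list the version column. This is only a sketch: it shows whatever build is installed and does not check for any particular CU number.

    ```powershell
    # Lists every Exchange server in the organization with its installed build
    Get-ExchangeServer | Format-Table Name, ServerRole, AdminDisplayVersion -AutoSize
    ```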

    Wednesday, June 22, 2016 3:14 PM
  • OK, can confirm CU2 resolves this
    • Proposed as answer by russgs Thursday, June 23, 2016 8:02 AM
    Thursday, June 23, 2016 8:02 AM
  • Yes, corrected with CU2.

    We had the same problem with a Dell M600.

    Friday, June 24, 2016 11:07 AM
  • I'm still getting one or two DAG members (all on 2016 CU2) with disconnected copies or just "ServiceDown".

    It's only the ones in a separate AD site (different subnet). What's weird is that we never had the problem on Ex2013.

    Event ID 1009: (on the mailbox server with the problem in the separate AD site) - App Log

    "The indexing of mailbox database XXXXXX encountered an unexpected exception. Error details: Microsoft.Exchange.Search.Core.Abstraction.OperationFailedException: The component operation has failed. ---> Microsoft.Exchange.Search.Core.Abstraction.OperationFailedException: Failed to establish admin RPC connection. ---> Microsoft.Mapi.MapiExceptionNetworkError: MapiExceptionNetworkError: Unable to make admin interface connection to server. (hr=0x80040115, ec=-2147221227)
    Diagnostic context:"

    Event 2058: (App log)

    The Microsoft Exchange Replication service was unable to perform an incremental reseed of database copy 'DB1\MbxSrv1' due to a network error. The database copy status will be set to Disconnected. Error Microsoft.Exchange.Cluster.Replay.NetworkCommunicationException: An error occurred while communicating with server 'MbxSrv2'. Error: Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. ---> System.IO.IOException: Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

    Event ID 2153: (again on the mailbox server that's having problems) - App log

    The log copier was unable to communicate with server 'MbxSrv2.mydomain.com'. The copy of database 'DB2\MbxSrv1' is in a disconnected state. The communication error was: An error occurred while communicating with server 'MbxSrv2'. Error: Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. The copier will automatically retry after a short delay.

    I have also seen the cluster service timing out and stopping every now and again.

    Funny, the 2 Mbx servers in the other AD site are fine.

    Event ID 7031: Sys Log

    The Cluster Service service terminated unexpectedly.  It has done this 37 time(s).  The following corrective action will be taken in 60000 milliseconds: Restart the service.

    Obviously network has been checked, all looks ok. Also bumped up CrossSiteDelay and Threshold.

    Can telnet from/to all servers on Port TCP 64327 (Default Dag Repl port)
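
    For anyone wanting to repeat these checks, here is a rough sketch. It assumes the FailoverClusters module is available on the DAG member. As far as I know, on Windows Server 2012 R2 the cross-subnet heartbeat properties are named CrossSubnetDelay and CrossSubnetThreshold, so those are the names used here. `MbxSrv2` is the server name from the events above.

    ```powershell
    # Run on a DAG member; requires the FailoverClusters module
    Import-Module FailoverClusters

    # Show the current heartbeat delay and threshold settings
    Get-Cluster |
        Format-List Name, SameSubnetDelay, SameSubnetThreshold, CrossSubnetDelay, CrossSubnetThreshold

    # Test the default DAG replication port (64327) against the other member
    Test-NetConnection -ComputerName 'MbxSrv2' -Port 64327
    ```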


    • Edited by Andyhud007 Thursday, July 14, 2016 7:25 AM
    Thursday, July 14, 2016 7:16 AM
  • We also have the "Index Catalog Phenomenon" in our freshly installed Exchange 2016 DAG environment.

    Creating new (primary) databases works fine and ends with a working catalog index. But every new passive copy has its index in the "failed" state. After some time, Exchange 2016 performs an "auto reseed" and the index changes from "failed" to "healthy".

    It is impossible to repair the index on the passive copy: neither "update-mailboxdatabasecopy -catalogonly" nor the other tricks work. We just have to wait for the auto reseed.

    I can see the usual error messages in the event log (1010 etc.), but I can't find any solution for the index problem.

    Monday, September 26, 2016 9:36 AM
  • We had the same problem with a 2-node DAG and a newly created database after Ex2016 CU4 was applied. We tried several things, like reseeding the catalog or stopping the search services, without any success. The database remained in status Failed or FailedAndSuspended.

    Next I tried to Change the ActivationPreference. (https://social.technet.microsoft.com/Forums/exchange/en-US/92125bd0-b012-48b2-b7cd-5a41ee6d42d2/exchange-2013-cu3-contentindex-failedandsuspended-event-1009-msexchangefastsearch-the-database?forum=exchangesvradmin)

    After this the status changed immediately to crawling and a few minutes later to Healthy.
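
    A minimal sketch of that workaround, for reference. `DB1`, `MBX1` and `MBX2` are placeholder names, and which preference value to assign is an assumption: the point is only that the copy with the failed index gets a different ActivationPreference than before.

    ```powershell
    # Placeholder names (assumptions): DB1 has copies on MBX1 and MBX2
    # Show the current activation preference of each copy
    Get-MailboxDatabase -Identity 'DB1' | Format-List Name, ActivationPreference

    # Give the copy with the failed index a different preference value
    Set-MailboxDatabaseCopy -Identity 'DB1\MBX2' -ActivationPreference 1

    # Check whether the index state moves to Crawling and then Healthy
    Get-MailboxDatabaseCopyStatus -Identity 'DB1\MBX2' | Format-List Name, ContentIndexState
    ```

    The preference can be set back to its original value afterwards with the same cmdlet.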

    Thursday, December 22, 2016 3:18 PM
  • That's a quite interesting solution approach. Good to know that there is a solution (workaround) available.

    Thomas Stensitzki - MCSM, MCM, MCSE, MCSA, MCITP - Blog: http://justcantgetenough.granikos.eu/

    Wednesday, January 11, 2017 9:31 AM
  • Thank you, great workaround!

    Same problem... fixed by change of activation preference

    then...

    update-mailboxdatabasecopy DB\SVR -catalogonly  [this hadn't worked previously]

    Thursday, February 9, 2017 10:32 AM
  • Thanks for sharing, Tobias. This has to be a bug, surely?

    I'm still on Ex2016 CU3 but am going to update to CU4 next week. Let's see if the problem reoccurs!

    Cheers

    Andy



    Thursday, February 9, 2017 10:49 AM
  • We are seeing this as well on multiple CU4 exchange orgs.

    Thank you for the workaround of changing activation preference which then allows the update-mailboxdatabasecopy -catalogonly -deleteexistingfiles to properly update the index.
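
    Spelled out, the reseed step after the preference change would look roughly like this. `DB1`, `MBX1` and `MBX2` are placeholders, and -SourceServer is optional: it is included only to show how to pick a copy with a healthy index as the source.

    ```powershell
    # Placeholder names (assumptions): reseed the catalog of DB1 on MBX2 from MBX1,
    # discarding any existing catalog files on the target
    Update-MailboxDatabaseCopy -Identity 'DB1\MBX2' -CatalogOnly -DeleteExistingFiles -SourceServer 'MBX1'
    ```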

    Sunday, February 12, 2017 3:57 AM
  • We ran into this problem too, with Exchange 2013 (DAG with 3 nodes).

    After CU14, indexing for the MPF database didn't work. With CU15, everything went back to normal.

    But now we have one month (between CU14 and CU15) of new objects in the MPF database that are not detected and not shown in Outlook search results.

    Is it possible to reindex the missing month? Update-MailboxDatabaseCopy -Identity [DB_NAME] -CatalogOnly didn't work.

    Wednesday, February 15, 2017 3:34 PM