noderunner.exe high CPU usage

    Question

  • Hi all,

    In our environment we run Exchange 2013 CU15, with the Exchange servers in a DAG for high availability. All servers in the DAG are sized identically in terms of hardware resources (CPU, RAM, JBOD disks).

    Recently we hit an issue on one of the servers in that DAG: CPU usage reaches 100% whenever I mount any database on that server.

    After a database is mounted, noderunner.exe is at the top of the CPU consumers, fluctuating between 60% and 99%. In most cases the high CPU usage causes the mounted database to fail over automatically to its partner server, and the following error is logged in the Managed Availability logs (Mailboxdatabasefailureitems):

    Managed Availability  - Mailboxdatabasefailureitems : 

    Event ID : 1 

    Failure Item (Namespace=Store, Tag=HungStoreWorker, Database=CN=test)

    ==================

    Application Logs : 

    Event ID : 165

    Source : ExchangeStoreDB

    Task Category : Database recovery

    ====================

    As part of troubleshooting I took the following steps:

    1. While a database was mounted and CPU usage was high, I stopped the indexing services and checked the results, but CPU usage did not return to normal.

    2. We then suspected a problem with the catalog files on that server, so we completely reseeded the catalog for all database copies on the server. The problem persisted whenever we mounted any database copy on that server.

    3. We also reseeded all the database files. Again, the problem persisted whenever we mounted any database on that server.

    Note: As far as I can tell, this issue is not tied to a specific database, since it occurs when we mount any database copy on that server.

    Please share your views and suggestions on how to investigate this further on my end.


    Thanks & Regards S.Nithyanandham

    Wednesday, April 26, 2017 9:44 AM

Answers

  • Hello all,

    We have found the fix for this issue. It turned out that a faulty memory module on the server was causing the problem. After we replaced the faulty module, the server returned to normal operation.

    Thanks a lot to everyone for your support.


    Thanks & Regards S.Nithyanandham

    Monday, May 22, 2017 2:57 PM

All replies

  • Hello all,

    Can someone please shed some light on this case?


    Thanks & Regards S.Nithyanandham

    Wednesday, April 26, 2017 1:35 PM
  • Hello all,

    If you have any suggestions, please let me know the steps.


    Thanks & Regards S.Nithyanandham

    Thursday, April 27, 2017 8:31 AM
  • Hello,

    As we know, noderunner.exe is used by the new Exchange search engine (FAST) to provide more robust indexing and searching. If you stop the Microsoft Exchange Search Host Controller service, the noderunner.exe processes stop as well.
    For more details, refer to the "About the NodeRunner.exe process" section.

    You can therefore run the command below to check the state of search indexing in the DAG:
    Get-MailboxDatabase | Get-MailboxDatabaseCopyStatus

    If the content index is in a Failed and Suspended state and still does not work after you reseed the search catalog with CatalogOnly, try stopping the search services and manually removing the content index catalog files under %ExchangeInstallPath%\Mailbox\<name of mailbox database>_Catalog\<GUID>12.1.Single.

    Best Regards,

    Allen Wang



    Thursday, April 27, 2017 2:06 PM
    Moderator
  • Hello Allen,

    Thanks for your reply.

    In my case noderunner.exe is not consuming excessive memory; it is consuming excessive CPU. The content index status is also Healthy for all mailbox database copies, not Failed and Suspended.

    Note: To troubleshoot this issue, I have already reseeded the catalog for all database copies on that server. At the moment all database copies and content indexes on that server are healthy, and CPU usage fluctuates between 20% and 30%. However, CPU spikes to 100%, with noderunner.exe as the top consumer, whenever we mount or activate any mailbox database copy on that server. Once CPU reaches 100%, the system waits for some time and then automatically fails the mounted database over to its partner server in the DAG.

    Please share your views and suggestions on how to proceed with this case.


    Thanks & Regards S.Nithyanandham


    • Edited by Nithyanandham Thursday, April 27, 2017 2:55 PM Added some more info.
    Thursday, April 27, 2017 2:52 PM
  • Are you by chance running EWS Managed API on that box? We had it running on one of my DAG nodes and it caused the same symptoms you described. I uninstalled it and everything went back to normal.

    My Blog: http://exchangeitup.blogspot.com My Twitter: http://twitter.com/ExchangeITup

    Thursday, April 27, 2017 4:07 PM
  • Hello Stacey Branham,

    On the server I can see only Microsoft Unified Communications Managed API 4.0, Runtime installed, not the EWS Managed API.

    Please share your views and suggestions.


    Thanks & Regards S.Nithyanandham

    Friday, April 28, 2017 8:16 AM
  • Perhaps your server needs more CPU resources.

    Ed Crowley MVP "There are seldom good technological solutions to behavioral problems."
    Celebrating 20 years of providing Exchange peer support!

    Saturday, April 29, 2017 2:21 AM
    Moderator
  • Hello Ed,

    Thanks for your reply. All servers in that DAG are sized with identical hardware resources, and the remaining servers in the DAG are functioning properly without any issues.

    I also wanted to share the error message below. Please have a look and share your views and suggestions.

    Error Message : 

    Application Logs : 

    Event ID : 165

    Source : ExchangeStoreDB

    Task Category : Database recovery

    At '5/1/2017 12:12:06 PM' the Exchange store database 'testDB01' copy on this server timed out on periodic status check. For more details about the failure, consult the Event log on the server for other storage and "ExchangeStoreDb" events. The passive database copy will retry to mount as passive.


    Thanks & Regards S.Nithyanandham

    Tuesday, May 02, 2017 6:30 AM
  • That could indicate issues with storage performance.  You might want to start PerfMon, and look at the database and transaction log volumes' Logical Disk > Avg. Disk sec/Read and Avg. Disk sec/Write counters.  Make sure that you're seeing averages of 10ms and below, with peaks of no more than 100ms.
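
    A minimal sketch of that check, assuming the counter samples have already been exported as plain numbers in seconds (the unit PerfMon reports for these latency counters). The function name check_latency and the sample values are hypothetical; only the 10 ms average and 100 ms peak limits come from the suggestion above.

```python
from statistics import mean

AVG_LIMIT_MS = 10.0    # limit from the suggestion above: averages of 10 ms and below
PEAK_LIMIT_MS = 100.0  # limit from the suggestion above: peaks of no more than 100 ms


def check_latency(samples_sec):
    """Compare disk latency samples (in seconds) against the average and peak limits."""
    if not samples_sec:
        raise ValueError("no samples")
    # Convert seconds to milliseconds so the values line up with the limits.
    samples_ms = [s * 1000.0 for s in samples_sec]
    avg_ms = mean(samples_ms)
    peak_ms = max(samples_ms)
    return {
        "avg_ms": avg_ms,
        "peak_ms": peak_ms,
        "avg_ok": avg_ms <= AVG_LIMIT_MS,
        "peak_ok": peak_ms <= PEAK_LIMIT_MS,
    }


# Hypothetical sample sets for illustration, not measurements from this thread.
healthy = check_latency([0.004, 0.006, 0.005, 0.020])
slow = check_latency([0.015, 0.030, 0.150, 0.012])
```

    Running the same check separately for the read and write counters of each database and log volume keeps a single slow disk from being hidden by the others.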

    Ed Crowley MVP "There are seldom good technological solutions to behavioral problems."
    Celebrating 20 years of providing Exchange peer support!

    Wednesday, May 03, 2017 12:38 AM
    Moderator
  • Hello Ed,

    Thanks a lot for your response.

    All the HDDs are directly attached to the server (i.e. a JBOD configuration); we are not using external storage for the database and log volumes on that server, and this setup is common to all servers in the DAG.

    One odd thing I have noticed is that the alert above appears only on the problematic server and not on the other servers in the same DAG.

    So please let me know: can I use the same procedure to check all the attached HDDs on that server?


    Thanks & Regards S.Nithyanandham

    Wednesday, May 03, 2017 8:43 AM
  • The suggestion I posted is valid.

    Ed Crowley MVP "There are seldom good technological solutions to behavioral problems."
    Celebrating 20 years of providing Exchange peer support!

    Wednesday, May 03, 2017 10:16 PM
    Moderator
  • Hello Ed,

    Thanks for your reply. Let me check the PerfMon counters for the attached disks, and I will come back with an update.


    Thanks & Regards S.Nithyanandham

    Thursday, May 04, 2017 1:30 PM