Both Distributed Cache SharePoint Service and AppFabric Caching Service stop

  • Question

  • Both Distributed Cache SharePoint Service and AppFabric Caching Service stop on our web server in our brand new SharePoint farm:  1 web, 1 app, 1 search and 1 sql server.  I turned off dist cache service on all servers except the web server.  The AppFabric Caching Service and the distributed cache service were running for some time on this web server but recently I started seeing Event 1000, 2016, 6398.  Now the AppFabric service and the SharePoint distributed cache service are stopped. 

    Please note we also have a two server development (web/sql) farm which does not have this error.  Also, while troubleshooting this, the system complained that dist cache was on two servers but not running so I removed the dist cache using powershell on those two servers where I didn't want it to run but of course left it on the web server. 

    There are a lot of blogs with instructions to remedy this situation, but they seem largely unsuccessful.  So far I have only tried stopping the SharePoint Timer service, clearing the cache, removing the dist cache, and adding the dist cache back to this server.  When I try to add or start the Distributed Cache I keep getting these errors on the SharePoint 2013 server:

    Event 1000
    Faulting application name: DistributedCacheService.exe, version: 1.0.4632.0, time stamp: 0x4eafeccf
    Faulting module name: KERNELBASE.dll, version: 6.2.9200.16451, time stamp: 0x50988aa6
    Exception code: 0xe0434352
    Fault offset: 0x000000000003811c
    Faulting process id: 0xc10
    Faulting application start time: 0x01cf17daa0d089ca
    Faulting application path: C:\Program Files\AppFabric 1.1 for Windows Server\DistributedCacheService.exe
    Faulting module path: C:\windows\system32\KERNELBASE.dll
    Report Id: 4ee610b7-83ce-11e3-9406-0050569b2903
    Faulting package full name:

    Event 1026
    Application: DistributedCacheService.exe
    Framework Version: v4.0.30319
    Description: The process was terminated due to an unhandled exception.
    Exception Info: Microsoft.ApplicationServer.Caching.DataCacheException
    Stack:
       at Microsoft.ApplicationServer.Caching.VelocityWindowsService.ThrowCallback(System.Object)
       at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
       at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
       at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
       at System.Threading.ThreadPoolWorkQueue.Dispatch()

    A clear set of instructions on how to proceed from square one would be appreciated.
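
    For anyone collecting the same evidence, the events quoted above can be pulled from the Application log in one query. This is a minimal sketch, not part of the original post: the 24-hour window is an arbitrary assumption, and it presumes all three event IDs are written to the Application log on the affected server.

```powershell
# Sketch: list recent Application-log events with the IDs mentioned in this thread.
# The 24-hour window is an assumption; widen or narrow it as needed.
$filter = @{
    LogName   = 'Application'
    Id        = 1000, 1026, 6398
    StartTime = (Get-Date).AddDays(-1)
}

# Get-WinEvent raises an error when nothing matches, so silence that case.
Get-WinEvent -FilterHashtable $filter -ErrorAction SilentlyContinue |
    Sort-Object TimeCreated |
    Select-Object TimeCreated, Id, ProviderName, Message |
    Format-List
```

    Running it on each server in the farm makes it easier to see which machines are still logging the errors.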

    • Edited by kcsptech Thursday, January 23, 2014 7:39 PM typo
    Thursday, January 23, 2014 7:38 PM

All replies

  • try this:

    http://wscheema.com/blog/Lists/Posts/Post.aspx?ID=9


    Please remember to mark your question as answered and vote helpful if this solves/helps your problem.

    Thanks -WS MCITP (SharePoint 2010, 2013) Blog: http://wscheema.com/blog

    Friday, January 24, 2014 5:05 AM
  • Steps that I used to fix the AppFabric Caching Service and Distributed Cache SharePoint Service stopping. Event 6398 still remains.

    0)  Logged in as FARM account and made sure it was in the Administrators group
    1)  Applied a cumulative update for the AppFabric (KB Article 2716015) and then worked on the steps to start services
    1.5)  Edited C:\Program Files\AppFabric 1.1 for Windows Server\DistributedCacheService.exe.config.  I made two changes to this file:  updated the account to the farm account and changed the name to the fully qualified DNS name
    2) Created a WindowsAppFabricAllowedUsers local group and added the farm and service accounts
    3) Checked all permissions on the databases to make sure all app pool accounts had SPDataAccess privileges
    4) Made sure all groups like IIS_IUSRS, and WPG groups had all app pool accounts in them
    5) Started the cachehost:  Start-CacheHost -HostName <servername> -CachePort 22233
    6) Removed the DistributedCacheServiceInstance and then added it again.
    7)  KEY STEP - I avoided starting the Distributed Cache Service using Central Admin because it always gets stuck on "Starting" but never actually starts.  I used this set of commands instead:

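    # Display name of the Distributed Cache service instance
    $instanceName = "SPDistributedCacheService Name=AppFabricCachingService"
    # Find that instance on the local server
    $serviceInstance = Get-SPServiceInstance | ? {($_.service.tostring()) -eq $instanceName -and ($_.server.name) -eq $env:computername}
    # Provision (start) it directly instead of going through Central Admin
    $serviceInstance.Provision()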

    8)  Did some checks using the commands

    Get-CacheHost

    Get-CacheClusterHealth

    Get-AFCacheHostConfiguration -ComputerName <servername> -CachePort "22233"

    9)  But I still have the problem with Event 6398 every 5 minutes or so:

    The Execute method of job definition Microsoft.Office.Server.UserProfiles.LMTRepopulationJob (ID fe8eabc9-de7e-4a10-ad6d-10dc35b7e62b) threw an exception. More information is included below.

    Unexpected exception in FeedCacheService.IsRepopulationNeeded: Unable to create a DataCache. SPDistributedCache is probably down..
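
    The checks in step 8 can be combined with a look at how SharePoint itself reports the service instance. This is a rough sketch, assuming it is run from an elevated SharePoint 2013 Management Shell on a cache host; the instance name string is the one used in step 7, and the Use-CacheCluster call is included because Get-CacheHost typically needs the session connected to the cluster first.

```powershell
# Load the SharePoint snap-in if this is a plain PowerShell session.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Show on which servers SharePoint has a Distributed Cache instance and its status.
# The name below is the same string used in the provisioning commands in step 7.
$instanceName = "SPDistributedCacheService Name=AppFabricCachingService"
Get-SPServiceInstance |
    Where-Object { $_.Service.ToString() -eq $instanceName } |
    Select-Object @{ Name = 'Server'; Expression = { $_.Server.Name } }, Status

# Connect this session to the cache cluster, then list the cache hosts.
Use-CacheCluster
Get-CacheHost
```

    If the SharePoint status and the Get-CacheHost service status disagree for the same server, that mismatch is worth resolving before chasing the timer job errors.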




    • Edited by kcsptech Friday, January 24, 2014 7:42 PM added missing info
    • Marked as answer by kcsptech Monday, January 27, 2014 7:59 PM
    • Unmarked as answer by kcsptech Monday, January 27, 2014 11:43 PM
    Friday, January 24, 2014 7:02 PM
  • Is your server name or service account name longer than 15 characters?

    Trevor Seward

    This post is my own opinion and does not necessarily reflect the opinion or view of Microsoft, its employees, or other MVPs.

    Friday, January 24, 2014 7:07 PM
  • Ditto what Trevor asked. Also, what is the output when you run the PowerShell cmdlet Get-CacheHost?

    @AndrewBillings//MCSA,MCSE www.andrewjbillings.com

    Friday, January 24, 2014 7:34 PM
  • UPDATE after running over the weekend:

    • The service account names are under 15 chars but with the domain prepended they are indeed over 15 characters. 
    • We have 1 web server, 1 app server (Central Admin server), and 1 search server.  When building the farm I read that distributed cache is only relevant on the web server, so I stopped the distributed cache service on the app and search servers, where it was already running, although AppFabric was unconfigured and disabled on both.
    • The dist cache appeared to be running properly on the web server but after an unknown period of time I noticed Event 1000, 1023, and 6398 related to this problem on all three servers.

    ON THE WEB SERVER Event 1000 and 1026 disappeared immediately after I rebuilt the cache system (see steps above), but Event 6398 continued for 12 hours and has now been gone for many days.

    ON THE APP SERVER the same two Event 6398 entries are still being logged:

    1. The Execute method of job definition Microsoft.Office.Server.UserProfiles.UserProfileImportJob (ID 0bbb97a7-3d03-41d7-aebc-2a5d8f9bd658) threw an exception. More information is included below.  Generic failure
    2. The Execute method of job definition Microsoft.Office.Server.UserProfiles.LMTRepopulationJob (ID fe8eabc9-de7e-4a10-ad6d-10dc35b7e62b) threw an exception. More information is included below.  Unexpected exception in FeedCacheService.IsRepopulationNeeded: Unable to create a DataCache. SPDistributedCache is probably down..

    ON THE SEARCH SERVER Event 6398 stopped showing up in the event viewer after I rebuilt the cache on the web server

    Results of the Get-CacheClusterHealth run on any server:

    Cluster health statistics
    =========================

    HostName = webservername.domain.com
    -------------------------

        NamedCache = DistributedActivityFeedCache_3abfa55a-c818-4b4f-a894-c98b51ee8f4e
            Healthy               = 9.09
            UnderReconfiguration  = 0.00
            NotPrimary            = 0.00
            InadequateSecondaries = 0.00
            Throttled             = 0.00

        NamedCache = DistributedActivityFeedLMTCache_3abfa55a-c818-4b4f-a894-c98b51ee8f4e
            Healthy               = 9.09
            UnderReconfiguration  = 0.00
            NotPrimary            = 0.00
            InadequateSecondaries = 0.00
            Throttled             = 0.00

        NamedCache = DistributedLogonTokenCache_3abfa55a-c818-4b4f-a894-c98b51ee8f4e
            Healthy               = 9.09
            UnderReconfiguration  = 0.00
            NotPrimary            = 0.00
            InadequateSecondaries = 0.00
            Throttled             = 0.00

        NamedCache = DistributedBouncerCache_3abfa55a-c818-4b4f-a894-c98b51ee8f4e
            Healthy               = 9.09
            UnderReconfiguration  = 0.00
            NotPrimary            = 0.00
            InadequateSecondaries = 0.00
            Throttled             = 0.00

        NamedCache = DistributedDefaultCache_3abfa55a-c818-4b4f-a894-c98b51ee8f4e
            Healthy               = 9.09
            UnderReconfiguration  = 0.00
            NotPrimary            = 0.00
            InadequateSecondaries = 0.00
            Throttled             = 0.00

        NamedCache = DistributedSecurityTrimmingCache_3abfa55a-c818-4b4f-a894-c98b51ee8f4e
            Healthy               = 9.09
            UnderReconfiguration  = 0.00
            NotPrimary            = 0.00
            InadequateSecondaries = 0.00
            Throttled             = 0.00

        NamedCache = default
            Healthy               = 9.09
            UnderReconfiguration  = 0.00
            NotPrimary            = 0.00
            InadequateSecondaries = 0.00
            Throttled             = 0.00

        NamedCache = DistributedAccessCache_3abfa55a-c818-4b4f-a894-c98b51ee8f4e
            Healthy               = 9.09
            UnderReconfiguration  = 0.00
            NotPrimary            = 0.00
            InadequateSecondaries = 0.00
            Throttled             = 0.00

        NamedCache = DistributedViewStateCache_3abfa55a-c818-4b4f-a894-c98b51ee8f4e
            Healthy               = 9.09
            UnderReconfiguration  = 0.00
            NotPrimary            = 0.00
            InadequateSecondaries = 0.00
            Throttled             = 0.00

        NamedCache = DistributedSearchCache_3abfa55a-c818-4b4f-a894-c98b51ee8f4e
            Healthy               = 9.09
            UnderReconfiguration  = 0.00
            NotPrimary            = 0.00
            InadequateSecondaries = 0.00
            Throttled             = 0.00

        NamedCache = DistributedServerToAppServerAccessTokenCache_3abfa55a-c818-4b4f-a894-c98b51ee8f4e
            Healthy               = 9.09
            UnderReconfiguration  = 0.00
            NotPrimary            = 0.00
            InadequateSecondaries = 0.00
            Throttled             = 0.00


    Unallocated named cache fractions
    ---------------------------------

    • Edited by kcsptech Tuesday, January 28, 2014 12:10 AM clarification
    Friday, January 24, 2014 7:40 PM
  • How did you originally stop the cache service on the APP server? From the sounds of it, this is the only server still experiencing this error, correct? Please confirm that you followed the documentation provided here for stopping this service: http://technet.microsoft.com/en-us/library/jj219613.aspx
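
    For reference, a graceful removal on a server that should no longer host the cache usually looks something like the sketch below. It assumes an elevated SharePoint 2013 Management Shell on that server and uses only the standard SharePoint cmdlets; check it against the linked article rather than treating it as the article's exact procedure.

```powershell
# Run on the server that should stop hosting the Distributed Cache.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Stop the local instance gracefully so cached items can move to other cache hosts, if any.
Stop-SPDistributedCacheServiceInstance -Graceful

# Unregister the local server from the cache cluster.
Remove-SPDistributedCacheServiceInstance
```

    Stopping the service from Central Admin or the Services console instead can leave the server registered in the cluster, which may explain lingering errors.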

    Regards,

    Andrew J Billings

    Portal Systems Engineer//MCSA,MCSE

    Blog: http://www.andrewjbillings.com

    Tuesday, January 28, 2014 4:22 AM
  • I am going through a similar situation.  With regard to the HostName being longer than 15 characters: what issues will that raise?  Does the 15 character limit include the entire HostName.Domain.com, or just the name?
    Thursday, March 5, 2015 5:30 PM
  • I have seen cases where, if the hostname in hostname.domain.com is longer than 15 characters, the App Fabric Caching service fails to start in a brand new SharePoint 2013 environment. You can modify the App Fabric cache cluster config with PowerShell to get the service to start, but if this is a brand new environment you may just want to consider a server name of 15 characters or fewer.

    To modify the cache cluster config, use the following steps:

    1. Run this PowerShell command to export the current App Fabric configuration: Export-CacheClusterConfig -File c:\CurrentClusterConfig.xml
    2. Modify the cachehostname entry to use the actual server name instead of the shortened NetBIOS name. For example: cacheHostName="AppFabricCachingService" name="SPDEV13LongServer"
    3. Run the following PowerShell command to import the updated config with the long server name: Import-CacheClusterConfig -File c:\CurrentClusterConfig.xml
    4. Double check your work by running the PowerShell cmdlet Get-CacheHost
    5. Reprovision the Distributed Cache on the server with the long name using the following PowerShell commands:
    Remove-SPDistributedCacheServiceInstance
    Add-SPDistributedCacheServiceInstance

    After these steps, the App Fabric caching service should start on a server whose name is longer than 15 characters.

    You will also want to check that these 2 commands match up or else your User Profile Sync service may not start up. 

    $farm = Get-SPFarm
    Write-Host "SharePoint Machine Names:" $farm.Servers
    # Parentheses are needed so the expression is evaluated instead of printed literally
    Write-Host "System Machine Name:" ([System.Environment]::MachineName)
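
    A related check for the 15-character question raised in this thread is to compare the NetBIOS name with the DNS host name on the server itself. The sketch below is illustrative only and uses nothing SharePoint-specific; the variable names are made up for the example.

```powershell
# NetBIOS name: limited to 15 characters, so a longer host name is truncated here.
$netbiosName = $env:COMPUTERNAME

# DNS host name: not subject to the 15-character NetBIOS limit.
$dnsHostName = [System.Net.Dns]::GetHostName()

Write-Host ("NetBIOS name : {0} ({1} characters)" -f $netbiosName, $netbiosName.Length)
Write-Host ("DNS host name: {0} ({1} characters)" -f $dnsHostName, $dnsHostName.Length)

# Flag the case discussed above, where the two names can no longer match.
if ($dnsHostName.Length -gt 15) {
    Write-Host "Host name exceeds 15 characters; the NetBIOS name is truncated."
}
```

    The limit applies to the host portion only, not to the full hostname.domain.com.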


    Regards,

    Andrew J Billings

    Portal Systems Engineer//MCSA,MCSE

    Blog: http://www.andrewjbillings.com

    Monday, March 9, 2015 3:29 PM