Both Distributed Cache SharePoint Service and AppFabric Caching Service stop

Question
Both the Distributed Cache SharePoint service and the AppFabric Caching Service stop on the web server in our brand-new SharePoint farm: 1 web, 1 app, 1 search, and 1 SQL server. I turned off the Distributed Cache service on all servers except the web server. The AppFabric Caching Service and the Distributed Cache service ran for some time on this web server, but recently I started seeing Events 1000, 1026, and 6398. Now both the AppFabric service and the SharePoint Distributed Cache service are stopped.
Please note that we also have a two-server development farm (web/SQL) that does not have this error. Also, while I was troubleshooting, the system complained that Distributed Cache was on two servers but not running, so I removed it with PowerShell from those two servers (where I didn't want it to run) and, of course, left it on the web server.
There are a lot of blogs with instructions for remedying this situation, but they seem largely unsuccessful. So far I have only tried stopping the SharePoint Timer service, clearing the cache, removing the Distributed Cache, and adding the Distributed Cache back to this server. When I try to add or start the Distributed Cache, I keep getting these errors on the SharePoint 2013 server:
Event 1000
Faulting application name: DistributedCacheService.exe, version: 1.0.4632.0, time stamp: 0x4eafeccf
Faulting module name: KERNELBASE.dll, version: 6.2.9200.16451, time stamp: 0x50988aa6
Exception code: 0xe0434352
Fault offset: 0x000000000003811c
Faulting process id: 0xc10
Faulting application start time: 0x01cf17daa0d089ca
Faulting application path: C:\Program Files\AppFabric 1.1 for Windows Server\DistributedCacheService.exe
Faulting module path: C:\windows\system32\KERNELBASE.dll
Report Id: 4ee610b7-83ce-11e3-9406-0050569b2903
Faulting package full name:

Event 1026
Application: DistributedCacheService.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: Microsoft.ApplicationServer.Caching.DataCacheException
Stack:
at Microsoft.ApplicationServer.Caching.VelocityWindowsService.ThrowCallback(System.Object)
at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
at System.Threading.ThreadPoolWorkQueue.Dispatch()

A clear set of instructions on how to proceed from square one would be appreciated.
- Edited by kcsptech Thursday, January 23, 2014 7:39 PM typo
Thursday, January 23, 2014 7:38 PM
All replies
try this:
http://wscheema.com/blog/Lists/Posts/Post.aspx?ID=9
Thanks,
-WS
MCITP (SharePoint 2010, 2013)
Blog: http://wscheema.com/blog
Friday, January 24, 2014 5:05 AM
Steps I used to fix the AppFabric Caching Service and the Distributed Cache SharePoint service stopping (Event 6398 still remains):
0) Logged in as the farm account and made sure it was in the Administrators group.
1) Applied the cumulative update for AppFabric (KB article 2716015), then worked through the steps to start the services.
1.5) Edited C:\Program Files\AppFabric 1.1 for Windows Server\DistributedCacheService.exe.config. I made two changes to this file: updated the account to the farm account and changed the name to the fully qualified DNS name.
2) Created a WindowsAppFabricAllowedUsers local group and added the farm and service accounts to it.
3) Checked all permissions on the databases to make sure all app pool accounts had SPDataAccess privileges.
4) Made sure groups such as IIS_IUSRS and the WPG groups contained all app pool accounts.
5) Started the cache host: Start-CacheHost -Computername <servername> -CachePort 22233
6) Removed the DistributedCacheServiceInstance and then added it again.
7) KEY STEP: I avoided starting the Distributed Cache service from Central Administration, because it always gets stuck on "Starting" and never actually starts. I used this set of commands instead:
$instanceName = "SPDistributedCacheService Name=AppFabricCachingService"
$serviceInstance = Get-SPServiceInstance | ? {($_.service.tostring()) -eq $instanceName -and ($_.server.name) -eq $env:computername}
$serviceInstance.Provision()
8) Did some checks using these commands:
Get-CacheHost
Get-CacheClusterHealth
Get-AFCacheHostConfiguration -ComputerName <servername> -CachePort "22233"
9) But I still get Event 6398 every 5 minutes or so:
The Execute method of job definition Microsoft.Office.Server.UserProfiles.LMTRepopulationJob (ID fe8eabc9-de7e-4a10-ad6d-10dc35b7e62b) threw an exception. More information is included below.
Unexpected exception in FeedCacheService.IsRepopulationNeeded: Unable to create a DataCache. SPDistributedCache is probably down..
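For anyone who wants to repeat steps 6 to 8 from a script, here is a minimal sketch, assuming an elevated SharePoint 2013 Management Shell running as the farm account on the one server that should host the cache. The function names `Test-IsLocalCacheInstance` and `Reset-LocalDistributedCache` are hypothetical; the instance-name filter is the one from step 7, and the "only provision if not Online" guard is an addition, not part of the steps above.

```powershell
# Sketch: steps 6-8 above for the local server only.
# Test-IsLocalCacheInstance and Reset-LocalDistributedCache are hypothetical names.

function Test-IsLocalCacheInstance {
    # True when a service instance is the Distributed Cache instance used in
    # step 7 and is hosted on the given computer.
    param(
        [Parameter(Mandatory)] $Instance,
        [Parameter(Mandatory)] [string] $ComputerName,
        [string] $InstanceName = 'SPDistributedCacheService Name=AppFabricCachingService'
    )
    ("$($Instance.Service)" -eq $InstanceName) -and ($Instance.Server.Name -eq $ComputerName)
}

function Reset-LocalDistributedCache {
    Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

    # Step 6: remove the local Distributed Cache instance and add it back.
    Remove-SPDistributedCacheServiceInstance
    Add-SPDistributedCacheServiceInstance

    # Step 7: provision from PowerShell rather than Central Administration,
    # but only if the instance did not come online on its own.
    $instance = Get-SPServiceInstance | Where-Object {
        Test-IsLocalCacheInstance -Instance $_ -ComputerName $env:COMPUTERNAME
    }
    if ($instance -and $instance.Status -ne 'Online') {
        $instance.Provision()
    }

    # Step 8: health checks against the local cache cluster.
    Use-CacheCluster
    Get-CacheHost
    Get-CacheClusterHealth
}
```

Run `Reset-LocalDistributedCache` on the web server only. It simply reproduces the steps above, which on their own did not clear Event 6398.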
Friday, January 24, 2014 7:02 PM
Ditto what Trevor asked. Also, what is the output when you run the PowerShell cmdlet Get-CacheHost?
@AndrewBillings//MCSA,MCSE www.andrewjbillings.com
Friday, January 24, 2014 7:34 PM
UPDATE after running over the weekend:
- The service account names are under 15 characters, but with the domain prepended they are indeed over 15 characters.
- We have 1 web server, 1 app server (the Central Administration server), and 1 search server. When building the farm, I read that Distributed Cache is only relevant on the web server, so I stopped the Distributed Cache service on the app and search servers only. It was already running on those servers, but AppFabric was unconfigured and disabled there.
- The Distributed Cache appeared to be running properly on the web server, but after an unknown period of time I noticed Events 1000, 1026, and 6398 related to this problem on all three servers.
ON THE WEB SERVER, Events 1000 and 1026 disappeared immediately after I rebuilt the cache system (see steps above), but Event 6398 continued for 12 hours and has now been gone for many days.
ON THE APP SERVER, the same two Event 6398 errors are still being logged:
- The Execute method of job definition Microsoft.Office.Server.UserProfiles.UserProfileImportJob (ID 0bbb97a7-3d03-41d7-aebc-2a5d8f9bd658) threw an exception. More information is included below. Generic failure
- The Execute method of job definition Microsoft.Office.Server.UserProfiles.LMTRepopulationJob (ID fe8eabc9-de7e-4a10-ad6d-10dc35b7e62b) threw an exception. More information is included below. Unexpected exception in FeedCacheService.IsRepopulationNeeded: Unable to create a DataCache. SPDistributedCache is probably down..
ON THE SEARCH SERVER, Event 6398 stopped showing up in the Event Viewer after I rebuilt the cache on the web server.
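Because the app server still reports "SPDistributedCache is probably down" while the web server looks healthy, one way to narrow it down is to list where the farm thinks Distributed Cache instances live and what state each one is in. A sketch, assuming the instances report the TypeName "Distributed Cache" (worth confirming first with `Get-SPServiceInstance | Select-Object TypeName -Unique`); both function names are hypothetical.

```powershell
# Sketch: list every Distributed Cache service instance the farm knows about.
# Select-DistributedCacheInstance and Get-FarmDistributedCacheStatus are hypothetical names.

function Select-DistributedCacheInstance {
    # Keeps only Distributed Cache entries and reports server name and status.
    param([Parameter(ValueFromPipeline)] $Instance)
    process {
        if ($Instance.TypeName -eq 'Distributed Cache') {
            [pscustomobject]@{
                Server = $Instance.Server.Name
                Status = "$($Instance.Status)"
            }
        }
    }
}

function Get-FarmDistributedCacheStatus {
    Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
    # Without -Server, Get-SPServiceInstance returns instances for all farm servers.
    Get-SPServiceInstance | Select-DistributedCacheInstance
}
```

An app or search server entry that is still listed in a state other than Online may point to an instance that was stopped but never removed.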
Results of the Get-CacheClusterHealth run on any server:
Cluster health statistics
=========================

HostName = webservername.domain.com
-------------------------

NamedCache = DistributedActivityFeedCache_3abfa55a-c818-4b4f-a894-c98b51ee8f4e
Healthy = 9.09
UnderReconfiguration = 0.00
NotPrimary = 0.00
InadequateSecondaries = 0.00
Throttled = 0.00

NamedCache = DistributedActivityFeedLMTCache_3abfa55a-c818-4b4f-a894-c98b51ee8f4e
Healthy = 9.09
UnderReconfiguration = 0.00
NotPrimary = 0.00
InadequateSecondaries = 0.00
Throttled = 0.00

NamedCache = DistributedLogonTokenCache_3abfa55a-c818-4b4f-a894-c98b51ee8f4e
Healthy = 9.09
UnderReconfiguration = 0.00
NotPrimary = 0.00
InadequateSecondaries = 0.00
Throttled = 0.00

NamedCache = DistributedBouncerCache_3abfa55a-c818-4b4f-a894-c98b51ee8f4e
Healthy = 9.09
UnderReconfiguration = 0.00
NotPrimary = 0.00
InadequateSecondaries = 0.00
Throttled = 0.00

NamedCache = DistributedDefaultCache_3abfa55a-c818-4b4f-a894-c98b51ee8f4e
Healthy = 9.09
UnderReconfiguration = 0.00
NotPrimary = 0.00
InadequateSecondaries = 0.00
Throttled = 0.00

NamedCache = DistributedSecurityTrimmingCache_3abfa55a-c818-4b4f-a894-c98b51ee8f4e
Healthy = 9.09
UnderReconfiguration = 0.00
NotPrimary = 0.00
InadequateSecondaries = 0.00
Throttled = 0.00

NamedCache = default
Healthy = 9.09
UnderReconfiguration = 0.00
NotPrimary = 0.00
InadequateSecondaries = 0.00
Throttled = 0.00

NamedCache = DistributedAccessCache_3abfa55a-c818-4b4f-a894-c98b51ee8f4e
Healthy = 9.09
UnderReconfiguration = 0.00
NotPrimary = 0.00
InadequateSecondaries = 0.00
Throttled = 0.00

NamedCache = DistributedViewStateCache_3abfa55a-c818-4b4f-a894-c98b51ee8f4e
Healthy = 9.09
UnderReconfiguration = 0.00
NotPrimary = 0.00
InadequateSecondaries = 0.00
Throttled = 0.00

NamedCache = DistributedSearchCache_3abfa55a-c818-4b4f-a894-c98b51ee8f4e
Healthy = 9.09
UnderReconfiguration = 0.00
NotPrimary = 0.00
InadequateSecondaries = 0.00
Throttled = 0.00

NamedCache = DistributedServerToAppServerAccessTokenCache_3abfa55a-c818-4b4f-a894-c98b51ee8f4e
Healthy = 9.09
UnderReconfiguration = 0.00
NotPrimary = 0.00
InadequateSecondaries = 0.00
Throttled = 0.00

Unallocated named cache fractions
----------------------------------

- Edited by kcsptech Tuesday, January 28, 2014 12:10 AM clarification
Friday, January 24, 2014 7:40 PM
How did you originally stop the cache service on the APP server? From the sounds of it, this is the only server still experiencing this error, correct? Please confirm that you followed the documentation provided here for stopping this service: http://technet.microsoft.com/en-us/library/jj219613.aspx
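For reference, here is a sketch of stopping and removing Distributed Cache on a server that should not host it (the app and search servers in this farm). It assumes the `Stop-SPDistributedCacheServiceInstance -Graceful` switch is available on your build (not every SharePoint 2013 update level has it; the TechNet article above also describes an `Unprovision()` route). It uses a hypothetical list of intended cache hosts as a guard so it does nothing on the web server. Both function names are hypothetical.

```powershell
# Sketch: stop and remove Distributed Cache on a server that should not host it.
# Run in an elevated SharePoint 2013 Management Shell as the farm account.
# Test-ShouldHostCache and Remove-LocalDistributedCache are hypothetical names.

function Test-ShouldHostCache {
    # True when ComputerName appears in CacheHosts. Compares only the host label
    # (text before the first dot), case-insensitively, so "WEB01" matches
    # "web01.contoso.com".
    param(
        [Parameter(Mandatory)][string]$ComputerName,
        [string[]]$CacheHosts = @()
    )
    $short = ($ComputerName -split '\.')[0]
    [bool]($CacheHosts | Where-Object { ($_ -split '\.')[0] -eq $short })
}

function Remove-LocalDistributedCache {
    param([Parameter(Mandatory)][string[]]$CacheHosts)
    Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

    if (Test-ShouldHostCache -ComputerName $env:COMPUTERNAME -CacheHosts $CacheHosts) {
        Write-Warning "$env:COMPUTERNAME is an intended cache host; nothing removed."
        return
    }
    # Graceful stop asks the service to hand off cached items before shutting down.
    Stop-SPDistributedCacheServiceInstance -Graceful
    Remove-SPDistributedCacheServiceInstance
}
```

On the app server this would be called as `Remove-LocalDistributedCache -CacheHosts 'WEB01'`, where WEB01 is a placeholder for the real web server name. On a server where the cache service never came up, the graceful stop may report an error.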
Regards,
Andrew J Billings
Portal Systems Engineer//MCSA,MCSE
Blog: http://www.andrewjbillings.com
Tuesday, January 28, 2014 4:22 AM
I am going through a similar situation. With regard to the host name being longer than 15 characters, what issues will that raise? Does the 15-character limit include the entire HostName.Domain.com, or just the name?
Thursday, March 5, 2015 5:30 PM
I have seen cases where, if the host name in hostname.domain.com is longer than 15 characters, the AppFabric Caching Service fails to start in a brand-new SharePoint 2013 environment. You can modify the AppFabric cache cluster configuration with PowerShell to get the service to start, but if this is a brand-new environment, you may want to consider using a server name of 15 characters or fewer.
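To put the answer about what counts toward the limit into code: per the reply, only the host label (the part before the first dot) matters, and 15 characters is the NetBIOS name limit. A minimal sketch with a hypothetical helper name that compares the DNS host name with the NetBIOS name Windows reports:

```powershell
# Sketch: does this server's host name exceed the 15-character NetBIOS limit?
# Test-HostNameExceedsNetBios is a hypothetical helper name.

function Test-HostNameExceedsNetBios {
    # Only the host label (text before the first dot) is counted;
    # the domain suffix does not count toward the limit.
    param([Parameter(Mandatory)][string]$HostName)
    ($HostName -split '\.')[0].Length -gt 15
}

# Compare the DNS host name with the NetBIOS name Windows reports.
$dnsHostName = [System.Net.Dns]::GetHostName()
[pscustomobject]@{
    DnsHostName    = $dnsHostName
    NetBiosName    = $env:COMPUTERNAME
    ExceedsNetBios = (Test-HostNameExceedsNetBios -HostName $dnsHostName)
}
```

When ExceedsNetBios is True, NetBiosName will be a truncated form of DnsHostName, which is the mismatch the configuration edit below works around.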
To modify the cache cluster configuration, use the following steps:
- Run this PowerShell command to export the current AppFabric configuration: Export-CacheClusterConfig -File c:\CurrentClusterConfig.xml
- In the host entry, change the name attribute from the shortened NetBIOS name to the actual server name.
- For example: cacheHostName="AppFabricCachingService" name="SPDEV13LongServer"
- Run the following PowerShell command to import the updated configuration with the long server name: Import-CacheClusterConfig -File c:\CurrentClusterConfig.xml
- Double-check your work by running the PowerShell cmdlet Get-CacheHost.
- Reprovision the Distributed Cache on the server with the long name using the following PowerShell command:
Add-SPDistributedCacheServiceInstance
After these steps, the AppFabric Caching Service should be running (on a server whose name has more than 15 characters).
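The export, edit, and import steps can also be scripted. A sketch under these assumptions: the exported XML has `<host>` elements carrying `name` and `cacheHostName` attributes (as in the example entry above), the old name is the truncated NetBIOS name, and the cluster must be stopped before `Import-CacheClusterConfig` (check the AppFabric cmdlet help before relying on that). `Set-CacheHostName` and `Update-CacheClusterHostName` are hypothetical names.

```powershell
# Sketch: scripted version of the export / edit / import steps above.
# Set-CacheHostName and Update-CacheClusterHostName are hypothetical names.

function Set-CacheHostName {
    # Renames the name attribute of matching <host> entries in place and returns
    # how many entries changed. Assumes <host> elements with name and
    # cacheHostName attributes.
    param(
        [Parameter(Mandatory)][xml]$Config,
        [Parameter(Mandatory)][string]$OldName,
        [Parameter(Mandatory)][string]$NewName
    )
    $changed = 0
    foreach ($node in $Config.SelectNodes("//host[@cacheHostName='AppFabricCachingService']")) {
        if ($node.GetAttribute('name') -eq $OldName) {
            $node.SetAttribute('name', $NewName)
            $changed++
        }
    }
    $changed
}

function Update-CacheClusterHostName {
    param(
        [Parameter(Mandatory)][string]$OldName,
        [Parameter(Mandatory)][string]$NewName,
        [string]$Path = 'C:\CurrentClusterConfig.xml'
    )
    Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
    Use-CacheCluster
    Export-CacheClusterConfig -File $Path

    $doc = [xml](Get-Content -Raw -Path $Path)
    if ((Set-CacheHostName -Config $doc -OldName $OldName -NewName $NewName) -eq 0) {
        throw "No AppFabricCachingService host named '$OldName' in $Path"
    }
    $doc.Save($Path)

    # Assumption to verify: the cluster must be stopped before an import.
    Stop-CacheCluster
    Import-CacheClusterConfig -File $Path
    Get-CacheHost
}
```

For example, `Update-CacheClusterHostName -OldName SPDEV13LONGSERV -NewName SPDEV13LongServer` (SPDEV13LONGSERV is a hypothetical 15-character truncation), followed by the `Add-SPDistributedCacheServiceInstance` reprovisioning step.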
After modifying the cache cluster configuration, also check that the output of these two commands matches, or else your User Profile Sync service may not start:
$farm = Get-SPFarm
Write-Host "SharePoint Machine Names:" $farm.Servers
Write-Host "System Machine Name:" ([System.Environment]::MachineName)
Regards,
Andrew J Billings
Portal Systems Engineer//MCSA,MCSE
Blog: http://www.andrewjbillings.com
Monday, March 9, 2015 3:29 PM
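The name comparison in the last reply can be turned into a check that warns when the machine name is missing from the farm's server list. A sketch; the function names are hypothetical, and it compares names exactly (case-insensitively), so a farm entry holding the full long name will not match a truncated NetBIOS machine name, which is the mismatch the reply warns about.

```powershell
# Sketch: warn when this machine's name is not in the farm's server list.
# Test-MachineNameInFarm and Compare-FarmMachineName are hypothetical names.

function Test-MachineNameInFarm {
    # Exact, case-insensitive match; a long name in the farm list will not
    # match a truncated NetBIOS machine name.
    param(
        [string[]]$FarmServerNames = @(),
        [Parameter(Mandatory)][string]$MachineName
    )
    $FarmServerNames -contains $MachineName
}

function Compare-FarmMachineName {
    Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
    $names = @((Get-SPFarm).Servers | ForEach-Object { $_.Name })
    $machine = [System.Environment]::MachineName
    if (Test-MachineNameInFarm -FarmServerNames $names -MachineName $machine) {
        Write-Host "OK: '$machine' is registered in the farm."
    } else {
        Write-Warning "'$machine' is not in the farm server list: $($names -join ', ')"
    }
}
```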