FSIS host controller service getting stopped frequently on Windows 2008 machine

  • Question

  • We have six Windows hosts: one is the admin node for FSIS and the other five run IMS instances. Recently we have observed that the FSIS host controller service stops on many of these hosts almost every day, sometimes 5-6 times a day. The Event Viewer shows few errors apart from a message saying that the FSIS host controller service terminated unexpectedly. Below are messages from the host controller service log on one of the hosts. I enabled debug mode today and am waiting for the debug logs to give more information.

    2013-01-29T20:07:55.3371746-08:00 Warning [11] WcfReplicationManager - Exception in replication handler, backing off for 300 seconds
    System.ServiceModel.EndpointNotFoundException: Could not connect to net.tcp://srch-fsis-001-p.cisco.com:17001/ceres/hostcontroller/nettcp. The connection attempt lasted for a time span of 00:00:00.9961320. TCP error code 10061: No connection could be made because the target machine actively refused it 72.163.42.208:17001.

    Server stack trace:
       at System.ServiceModel.Channels.SocketConnectionInitiator.Connect(Uri uri, TimeSpan timeout)
       at System.ServiceModel.Channels.BufferedConnectionInitiator.Connect(Uri uri, TimeSpan timeout)
       at System.ServiceModel.Channels.ConnectionPoolHelper.EstablishConnection(TimeSpan timeout)
       at System.ServiceModel.Channels.ClientFramingDuplexSessionChannel.OnOpen(TimeSpan timeout)
       at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
       at System.ServiceModel.Channels.ServiceChannel.OnOpen(TimeSpan timeout)
       at System.ServiceModel.Channels.CommunicationObject.Open(TimeSpan timeout)
       at System.ServiceModel.Channels.ServiceChannel.CallOnceManager.CallOnce(TimeSpan timeout, CallOnceManager cascade)
       at System.ServiceModel.Channels.ServiceChannel.EnsureOpened(TimeSpan timeout)
       at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout)
       at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)
       at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)

    Exception rethrown at [0]:
       at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
       at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
       at Microsoft.Ceres.HostController.WcfTypes.IRepositoryVersionManagement.GetRepositoryGuid()
       at Microsoft.Ceres.HostController.WcfServer.WcfReplicationManager.get_PrimaryRepositoryGuid()
       at Microsoft.Ceres.HostController.Repository.Replication.ReplicationManager.ReplicationHandler()
    System.Net.Sockets.SocketException: No connection could be made because the target machine actively refused it 72.163.42.208:17001
       at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
       at System.Net.Sockets.Socket.Connect(EndPoint remoteEP)
       at System.ServiceModel.Channels.SocketConnectionInitiator.Connect(Uri uri, TimeSpan timeout)
    2013-01-30T07:55:05.1432766-08:00 Info    [5] ServiceRunner - Starting host controller service
    2013-01-30T07:55:05.1921066-08:00 Info    [5] ServiceRunner - Machine Configuration:
        Machine Name          : SRCH-FSIS-006-P
        Operating System      : Microsoft Windows NT 6.1.7601 Service Pack 1
        Runtime Version       : v2.0.50727

    2013-01-30T07:55:05.2057790-08:00 Info    [5] ServiceRunner - Service Configuration:
        Log Level             : Info
        Log Max FileSize      : 1 GB
        Log Max Log Volume    : 100 GB
        Log Retention Period  : 40320 minutes
        Log Rotate Daily      : True
        Log Rotate Hour of Day: 0
        EventLog Source       : FSIS Host Controller
        EventLog Level        : Info
        Runtimes              : C:\Program Files\FAST Search Server 2010 for Internet Sites\Runtime\
        Repository Path       : C:\Users\fsis.gen\AppData\Local\FSIS\Repository\
        Replicated From       : net.tcp://srch-fsis-001-p.cisco.com:17001/ceres/hostcontroller/nettcp
        Node Path             : C:\Users\fsis.gen\AppData\Local\FSIS\Nodes\
        Node Port Allocation  : Start = 17002, End = 17199, Block Size = 20
        Node Memory Limit     : Soft = 70%, Hard = 80% of installed physical memory
        On Shutdown           : Teminate all nodes
    2013-01-30T07:55:05.4196544-08:00 Info    [5] RepositoryManager - Initializing repository
    2013-01-30T07:55:05.7399792-08:00 Info    [5] RepositoryManager - Initializing repository journal
    2013-01-30T07:55:05.9519014-08:00 Info    [5] HostController - 16383 MB physical memory installed
    2013-01-30T07:55:05.9538546-08:00 Info    [5] HostController - 13106 MB physical memory available for nodes
    2013-01-30T07:55:06.1716364-08:00 Info    [5] ServiceRunner - Repository being replicated from the primary repository at net.tcp://srch-fsis-001-p.cisco.com:17001/ceres/hostcontroller/nettcp
    2013-01-30T07:55:06.4118800-08:00 Info    [5] ServiceRunner - Service available at http://srch-fsis-006-p:17000/ceres/hostcontroller
    2013-01-30T07:55:06.4128566-08:00 Info    [5] ServiceRunner - Service available at net.tcp://srch-fsis-006-p:17001/ceres/hostcontroller
    2013-01-30T07:55:06.4206694-08:00 Info    [5] WcfReplicationManager - Replication manager started
    2013-01-30T10:05:08.2641940-08:00 Warning [12] HostController - Failed to determine if the node running [System = Fsis, Node = InteractionEngineNode42]
    System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.
       at System.RuntimeMethodHandle._InvokeMethodFast(Object target, Object[] arguments, SignatureStruct& sig, MethodAttributes methodAttributes, RuntimeTypeHandle typeOwner)
       at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean skipVisibilityChecks)
       at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
       at System.RuntimeType.InvokeMember(String name, BindingFlags bindingFlags, Binder binder, Object target, Object[] providedArgs, ParameterModifier[] modifiers, CultureInfo culture, String[] namedParams)
       at Microsoft.Ceres.HostController.Controller.RuntimeProxy.InvokeControllerMethod(String method, Object[] args)
       at Microsoft.Ceres.HostController.Controller.RuntimeProxy.IsNodeOperating(String nodeRoot)
       at Microsoft.Ceres.HostController.Controller.RuntimeProxy.IsNodeOperating(String nodeRoot)
       at Microsoft.Ceres.HostController.Controller.HostController.IsNodeOperating(String systemName, String nodeName)
    System.ArgumentException: A node is not deployed at C:\Users\fsis.gen\AppData\Local\FSIS\Nodes\Fsis\InteractionEngineNode42
       at Microsoft.Ceres.CoreServices.Deployment.DeploymentController.ValidateDeployed(String nodeRoot)
       at Microsoft.Ceres.CoreServices.Deployment.DeploymentController.IsNodeOperating(String nodeRoot)

    Wednesday, January 30, 2013 10:35 PM

All replies

  • Hello,

    The answer is in the error message:

    2013-01-29T20:07:55.3371746-08:00 Warning [11] WcfReplicationManager - Exception in replication handler, backing off for 300 seconds
    System.ServiceModel.EndpointNotFoundException: Could not connect to net.tcp://srch-fsis-001-p.cisco.com:17001/ceres/hostcontroller/nettcp. The connection attempt lasted for a time span of 00:00:00.9961320. TCP error code 10061: No connection could be made because the target machine actively refused it 72.163.42.208:17001

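    Verify that this server can reach 72.163.42.208 (srch-fsis-001-p, the primary repository host it replicates from) on port 17001. Error 10061 ("actively refused") usually means the target host answered but nothing was listening on that port, or a firewall rejected the connection, so also check that the host controller service is running on that server.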

    You can test this with telnet:

    telnet 72.163.42.208 17001
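
    If the telnet client is not installed on the host, the same check can be scripted. The following is a minimal sketch in Python, assuming Python is available on the machine; the function name port_is_open is made up for this example and is not part of FSIS or any Microsoft tooling. It only attempts a TCP connection and reports whether it succeeded.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it does the same job as
    `telnet host port` but returns a boolean instead of opening a session.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused (error 10061 on Windows), timeouts,
        # and unreachable hosts.
        return False
```

    Calling port_is_open("72.163.42.208", 17001) from the affected host returns False when the connection is refused, as in the log above, and True once the host controller on the primary node is accepting connections again.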

    You can check the documentation for the list of ports used by SharePoint at http://technet.microsoft.com/en-us/library/cc262849.aspx

    Cheers

    Thursday, July 3, 2014 3:57 PM