JDBCConnector - Publisher0 has been inactive for more than 300 seconds

    Question

  • Hi FAST gurus,

    We are indexing a SQL Server database in a Test environment using the jdbcconnector.bat command:

    .\jdbcconnector.bat start -f ..\etc\SQL_Test_Config.xml

    The indexing works, and results are available - however we get the following error/timeout/issue as part of the process:

    <*snip*>

    ...

    10:55:10,229 INFO  [JDBCAdapter] The time taken to execute SQL is : 0m 0s
    10:55:15,697 INFO  [JDBCAdapter] Running postSQL
    10:55:15,697 INFO  [JDBCAdapter] Finished running postSQL
    10:55:15,697 INFO  [JDBCAdapter] Closed JDBC connection
    10:55:17,244 INFO  [CCTKDocumentFeeder] Publisher instance: 0 published: 1000 (142.85715 docs/sec)
    10:55:17,260 INFO  [CCTKDocumentFeeder] Publisher 0: Last document in queue has been read
    10:55:17,260 INFO  [CCTKDocumentFeeder] Publisher 0: Shutting down: com.fastsearch.esp.cctk.publishers.CCTKDocumentFeeder
    10:55:17,291 INFO  [CCTKDocumentFeeder] Waiting for FAST Search feeder to complete.....
    10:56:50,197 INFO  [JDBCConnector] All tasks done. Telling publishers to stop waiting for items....
    11:00:19,822 WARN  [JDBCConnector] Thread : Publisher0 has been inactive for more than 300 seconds.
    11:05:29,822 WARN  [JDBCConnector] Thread : Publisher0 has been inactive for more than 300 seconds.

    If we press CTRL-C and stop the batch file, everything works fine. But I'm wondering what else is not being done because of this. I assume it is waiting for a response from the Publisher components but never receives one? There are no other warnings or errors generated as part of the process.

    We have FAST with SP1 and the August 2011 CU installed. FAST is configured across two servers with the following deployment.xml file:

    <?xml version="1.0" encoding="utf-8" ?>
    <deployment comment="2 node FAST Search farm configuration" xmlns="http://www.microsoft.com/enterprisesearch">
      <instanceid>FASTSearchMultiNode</instanceid>
      <connector-databaseconnectionstring>
        <![CDATA[jdbc:sqlserver://sql.test.local\sql01:1433;DatabaseName=FASTSearchAdminDatabase]]>
      </connector-databaseconnectionstring>
      <host name="fast01.test.local">
        <admin />
        <indexing-dispatcher />
        <content-distributor />
        <webanalyzer server="true" link-processing="true" lookup-db="true" />
        <document-processor processes="4" />
        <query />
        <searchengine row="0" column="1" />
      </host>
      <host name="fast02.test.local">
        <document-processor processes="4" />
        <content-distributor />
        <query />
        <searchengine row="0" column="0" />
      </host>
      <searchcluster>
        <row id="0" index="primary" search="true" />
      </searchcluster>
    </deployment>

     


    Monday, December 05, 2011 12:13 AM

All replies

  • Use BCS; going forward, BCS is going to be the way for FS4SP.
    Sriram S
    Wednesday, December 07, 2011 12:03 PM
  • Use BCS; going forward, BCS is going to be the way for FS4SP.
    Sriram S

    I would be very surprised if that was the case - and disappointed. BCS is useful for generic items, but it lacks the processing power of using FAST directly, and it limits flexibility if you want to do something more advanced like customise the indexing process. Using BCS would limit you to specific content collections in FAST, whereas I can feed a couple of million SQL Server rows of data into a specific content collection and then query that using FASTML directly via the SharePoint web services (being able to specify the content collection as well as use FASTML to tweak the query). And can FAST index BCS data, or do you have to use the SharePoint Query Service App to do this (like People search)?

    Using the JDBC connector also allows me to utilise additional queries to detect changes in the underlying data. BCS might be great for smaller sets of content, but my thoughts are that it would suffer when you are getting up towards millions of rows of data to index.


    At the risk of derailing the thread - what makes you say that BCS is going to be the way forward for FS4SP? Have there been any Microsoft announcements on this?
    Wednesday, December 07, 2011 9:49 PM
  • My 2c on this issue.

    I think BCS will move forward with more functionality in the next release, and maybe we'll see the introduction of CTS/IMS for SharePoint as well.

    As for BCS, you are limiting yourself in how you view it. If you code a custom BCS connector and set the right crawler impact rules, I suspect you will be able to more or less match the indexing speed of the JDBC connector. There is not that much difference between sending content to the content distributor from the JDBC connector and from the FAST Content SSA. And with a custom connector you can get change detection in as well.

    Of course, you would need custom code... but maybe we'll get a more advanced database connector built for BCS with SharePoint in the future.

    And having one or more content collections is not that big of an issue. A content collection is only a meta tag on the content anyway, and it is useful to have one collection per crawler framework. BCS crawls into the "sp" collection, which is just fine.

    On a side note... BA-Insight has built a BCS connector for databases, which is extremely FAST, with no issues with millions of rows at all.

    Regards,
    Mikael Svenson 


    Search Enthusiast - SharePoint MVP/WCF4/ASP.Net4
    http://techmikael.blogspot.com/
    Thursday, December 08, 2011 7:46 AM
  • I echo Mikael's view. I came to know that Microsoft internally uses the BCS framework to build connectors for Lotus Notes and other content sources. I have implemented many connectors using the BCS framework and didn't face many issues. The only disadvantage is that we need to write code to build these connectors.
    Sriram S
    Thursday, December 08, 2011 5:24 PM
  • Hi guys,

     

    Thanks for your responses, but if I am reading correctly, BCS will meet and/or exceed the capabilities of the JDBC Connector in the next version of SharePoint (or a major update). Which isn't right now. So that seems to imply the best functional option for indexing large SQL Server databases via FAST, without having to write custom code or develop BCS connectors (i.e. OOTB), is to use the JDBC Connector.

    Which still leaves me with my original problem - has anyone seen an issue where the Publisher times out replying to the JDBC Connector feeder:

    11:05:29,822 WARN [JDBCConnector] Thread : Publisher0 has been inactive for more than 300 seconds.

     


    Thursday, December 08, 2011 9:57 PM
  • I have the same issue, but with the Notes Connector. I have applied the August 2011 patch and tried increasing the MAXHEAP setting for Java, because it looked like that's where it was hitting a limit at 1 GB, but no luck.

    17:22:09,244 WARN  [NotesConnector] Thread : Publisher0 has been inactive for more than 300 seconds.

    17:26:39,254 WARN  [NotesConnector] Thread : Adapter for task: 0 has been inactive for more than 300 seconds.

    Tuesday, December 13, 2011 6:50 AM
  • Hi David,

    I ran into this as well, and I found something very weird happening.

    I'm trying to index several databases, and only one throws this timeout.

    I did the following: set AdapterThrottleSleepMS to 300 milliseconds.
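
    For reference, this is roughly how that setting would look in the connector configuration file. This is only a sketch: it assumes AdapterThrottleSleepMS is expressed as a regular <parameter> element like the other connector settings in jdbctemplate.xml, and the description text is just my paraphrase of what the post describes:

        <parameter name="AdapterThrottleSleepMS" type="string">
            <description>
                <![CDATA[Milliseconds the adapter sleeps between operations (throttling)]]>
            </description>
            <value>300</value>
        </parameter>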

    When one crawl got stuck, I ran another crawl on a different database, and it "released" the stuck one.

    Have no idea what's happening.

     

    Amir

    Wednesday, January 04, 2012 8:43 AM
  • Thanks, Amir. I gave this a try but it didn't work for me. At the moment I'm going to assume that it has something to do with trying to crawl these databases locally. Hopefully this problem will disappear when these databases are sitting on a Domino server.
    Wednesday, January 11, 2012 5:14 AM
  • Hello,

    I was experiencing the same problem with the JDBC connector. After a while I found out that I had provided a non-unique column for the JDBCPrimaryKeyField (I was joining a few tables in the select query). So I just provided a unique combination of columns to be the primary key for the result set.
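
    To illustrate the idea with a sketch (the table and column names are hypothetical, and the name of the select-statement parameter is an assumption - only JDBCPrimaryKeyField is taken from the post): concatenate the joined keys into a single unique alias in the select, then point JDBCPrimaryKeyField at that alias:

        <parameter name="JDBCSQL" type="string">
            <description>
                <![CDATA[Select statement; DocKey combines two join keys into one unique value]]>
            </description>
            <value>
                <![CDATA[
                SELECT CAST(o.OrderID AS varchar(20)) + '-' +
                       CAST(i.ItemNo AS varchar(10)) AS DocKey,
                       o.Title, i.Body
                FROM Orders o
                JOIN Items i ON i.OrderID = o.OrderID
                ]]>
            </value>
        </parameter>
        <parameter name="JDBCPrimaryKeyField" type="string">
            <value>DocKey</value>
        </parameter>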

    BR,

    Karol

    Monday, February 27, 2012 12:46 PM
  • OK, so after some exhaustive troubleshooting with MS support, there were a number of issues:

    1. Java JRE Update 29 is *broken*. Update 29 includes a new "feature" that encrypts the database connection, and SQL Server 2008 R2 does not cooperate with this at all. Update 27 and Update 30 work. If you have Update 29, upgrade or downgrade; you will see connection timeouts to SQL Server on this update if you are running SQL Server 2008 R2.

    2. The default JDBCConnector.bat settings are not great for large databases (SQL Server or otherwise). Because you are extracting documents quicker than the document processors can handle them, the buffer fills up quickly and speed drops away. It was common for us to start at around 100 docs/second, then drop to 70, then 45, and so on down to 2 by the end! You can change settings such as:

                    <parameter name="MaxDocsInBatch" type="string" >
                                    <description>
                                                    <![CDATA[Maximum number of docs in a batch]]>
                                    </description>
                                    <value>100</value>
                    </parameter>
                    <parameter name="MaxBatchSize(KB)" type="string" >
                                    <description>
                                                    <![CDATA[Maximum batch size]]>
                                    </description>
                                    <value>1000</value>
                    </parameter>
                    <parameter name="MaxActiveDocuments(MB)" type="string" >
                                    <description>
                                                    <![CDATA[Maximum size of active docs]]>
                                    </description>
                                    <value>5</value>
                    </parameter>
                    <parameter name="ThreadSampleInterval" type="string" >
                                    <description>
                                                    <![CDATA[Interval to check for secured callbacks]]>
                                    </description>
                                    <value>30</value>
                    </parameter>

    to modify how much information is sent through in each batch. For example, changing MaxDocsInBatch from 100 to 1000, MaxActiveDocuments(MB) from 5 to 20, and ThreadSampleInterval from 30 to 1 sped up feeding and processing significantly (see the sketch after this list).

    3. Check the SQL you are using as well. It turned out that our 800K-row database actually only had 150K rows, but the query being used was not written correctly. Fixing this changed our row count to the expected 150K.

    4. Check the number of document processors in your farm. We have two indexing servers (2 columns) with 4 document processors on each. The recommendation is no more than 2 processors per CPU (we have 4 CPUs, so we could probably change this to 6). You need to carefully monitor performance on your servers though (a deployment.xml sketch of this change also follows the list).

    5. Use the "rc -r" command to generate a detailed report of stats in the environment. Not for the faint of heart, but if you really want to examine what is happening under the hood, this is the one you need.
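
    As mentioned in item 2, the tuned block would look roughly like this (the same parameter elements as quoted above, with only the values changed; descriptions omitted for brevity):

        <parameter name="MaxDocsInBatch" type="string">
            <value>1000</value>
        </parameter>
        <parameter name="MaxActiveDocuments(MB)" type="string">
            <value>20</value>
        </parameter>
        <parameter name="ThreadSampleInterval" type="string">
            <value>1</value>
        </parameter>

    And the item 4 change is just the processes attribute on the document-processor elements in the deployment.xml from the original post, e.g.:

        <document-processor processes="6" />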

    HTH

    Monday, February 27, 2012 1:02 PM
  • So I am doing some tests on my local VM, which has SQL Server, SharePoint 2010 and FS4SP installed. Since FS4SP doesn't have a good way of indexing XML files like the FileTraverser did, I am kind of stuck using the JDBC Connector. In my situation we are indexing fewer than 60,000 rows from a simple SQL table, and I have noticed it does it in batches of 1000 documents. I am looking for some increase in speed; we are averaging 8 docs/sec, which is pretty pathetic. Any suggestions on getting better performance? I would also like an idea of how long it should take to index 60k docs; in a production, stage or dev environment with a separate indexing server I would think 5 to 10 minutes max. I could get the content in an XML format as well, but that means building a custom pipeline like the one outlined here:

    http://blogs.msdn.com/b/sharepointdev/archive/2012/01/25/creating-a-custom-xml-indexing-connector-for-fast-search-server-2010-for-sharepoint.aspx

    but I have not tried this yet.

    I am not a fan of using BCS either; I would think we would get better performance from the connectors.

    Smitty.

    Tuesday, March 20, 2012 12:20 PM
  • Maybe a little late to the party, but we are experiencing a similar issue connecting to SQL Azure - it connects initially, but after retrieving around 3K documents the connection is lost.

    We have the latest JRE 1.7 and no major issues with processing rate. I wanted to experiment with the connector settings, e.g. TimeOut, but I can't seem to find the "MaxDocsInBatch", "MaxBatchSize" etc. parameters in the default jdbctemplate.xml (nor in JDBCConnector.bat, for that matter).

    Do they exist in the FS4SP JDBC connector?

    Thank you,

    Mike

    Friday, April 13, 2012 10:31 PM
  • You may want to check this article:

    http://social.msdn.microsoft.com/Forums/en-US/ssdsgetstarted/thread/849bad83-0656-4a8e-b98c-6917c4a5d52f/

    Based on the above article, with the SU1 update the "Idle timeout" has been increased to 30 minutes.

    In regards to the parameters, the August CU fixed this issue:

    http://support.microsoft.com/kb/2553040

    After applying the August FAST CU, the system allows you to add these configuration parameters manually to the configuration file:

    FASTSearchSubmit/MaxBatchSize(KB),
    FASTSearchSubmit/MaxDocsInBatch,
    FASTSearchSubmit/MaxActiveDocuments(MB),
    and ConnectorExecution/ThreadSampleInterval

    The connector will read the config file and, if a value does not exist, it will add the default value; otherwise it will use the value read from the file.
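
    For anyone (like Mike above) who can't find these elements in jdbctemplate.xml, this is a sketch of what you would add after the August CU. The <parameter> form and the default values follow the sample quoted earlier in this thread; the exact section nesting behind the FASTSearchSubmit and ConnectorExecution paths is an assumption and may differ in your file:

        <!-- FASTSearchSubmit group -->
        <parameter name="MaxBatchSize(KB)" type="string">
            <value>1000</value>
        </parameter>
        <parameter name="MaxDocsInBatch" type="string">
            <value>100</value>
        </parameter>
        <parameter name="MaxActiveDocuments(MB)" type="string">
            <value>5</value>
        </parameter>
        <!-- ConnectorExecution group -->
        <parameter name="ThreadSampleInterval" type="string">
            <value>30</value>
        </parameter>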

    HTH

    Tuesday, May 01, 2012 9:10 PM
  • Hello,

    I was experiencing the same problem with the JDBC connector. After a while I found out that I had provided a non-unique column for the JDBCPrimaryKeyField (I was joining a few tables in the select query). So I just provided a unique combination of columns to be the primary key for the result set.

    BR,

    Karol


    Thanks a lot. This reminded me that my primary key column wasn't set correctly.
    Monday, May 21, 2012 9:58 PM