locked
Full text search of .pdf files in a file table. RRS feed

  • Question

  • I have installed the Adobe iFilter 11 64 bit and set the path to the bin folder. I still cannot find any text from the pdf files. I suspect I am missing something trivial because I don't find much when I Bing for this so it must not be a common problem. Here is the code.

    --	Adobe iFilter 11 64 bit is installed
    --	The Path variable is set to the bin folder for the Adobe iFilter.
    --	SQL Developer version 64 bit on both Windows 7 and Windows 8.
    USE master;
    GO
    DROP DATABASE FileTableStudy;
    GO
    CREATE DATABASE FileTableStudy
    ON PRIMARY
    	(	 NAME = N'FileTableStudy'
    		,FILENAME = N'E:\SQLServerData\SQL2012\Engine\FileTableStudy.mdf' 
    		,SIZE = 4096KB 
    		,FILEGROWTH = 1024KB
    	)
    	,FILEGROUP FileTableStudyFileTable  CONTAINS FILESTREAM
    	(	 NAME = FileTableStudyFileTable
    		,FILENAME = 'E:\SQLServerData\FileTableStudy'
    	)
    	LOG ON 
    	(	 NAME = N'FileTableStudy_log'
    		,FILENAME = N'D:\SQLServerLogs\SQL2012\FileTableStudy_log.ldf' 
    	)
    	WITH FILESTREAM
    	(	 NON_TRANSACTED_ACCESS = FULL
    		,DIRECTORY_NAME = N'FileTableStudyFiles'
    	);
    GO
    
    USE FileTableStudy;
    GO
    
    DROP TABLE dbo.Magazine;
    GO
    
    CREATE TABLE dbo.Magazine AS FILETABLE
    WITH	(	 FileTable_Directory = 'MagazineStore'
    			,FileTable_Collate_Filename = database_default
    		);
    GO
    
    CREATE FULLTEXT CATALOG MagazineFullTextCatablog AS DEFAULT;
    GO
    
    --EXEC sp_fulltext_service 'load_os_resources', 1;
    --EXEC sp_fulltext_service 'verify_signature', 0;
    --EXEC sp_fulltext_service 'restart_all_fdhosts';
    --EXEC sp_fulltext_service 'update_languages'; 
    --EXEC sp_help_fulltext_system_components 'filter'; 
    --RECONFIGURE WITH OVERRIDE;
    SELECT	 document_type
    		,path 
    FROM sys.fulltext_document_types 
    WHERE document_type = '.pdf';
    SELECT *
    FROM sys.fulltext_document_types
    ORDER BY document_type;
    
    DROP  FULLTEXT INDEX ON dbo.Magazine;
    GO
    
    SELECT TOP 1  indexes.name	IndexName
    FROM sys.indexes
    JOIN sys.tables		ON indexes.object_id = tables.object_id
    					AND tables.name = 'Magazine'
    JOIN sys.schemas	ON tables.schema_id = schemas.schema_id
    					AND schemas.name = 'dbo'
    WHERE indexes.is_unique = 1
     AND indexes.name LIKE 'PK__%';
    GO
    
    --	Drag documents to folder.
    CREATE FULLTEXT INDEX ON dbo.Magazine
    	(	file_stream TYPE COLUMN file_type)
    	KEY INDEX [PK__Magazine__5A5B77D541728F3E];
    GO
    --	Wait for index to build
    SELECT	 DATEDIFF(ss, crawl_start_date, crawl_end_date) IndexBuildSeconds
    		,*
    FROM sys.fulltext_indexes
    
    --ALTER FULLTEXT INDEX ON dbo.Magazine START UPDATE POPULATION;
     
    SELECT * 
    FROM dbo.Magazine
    WHERE file_type = 'pdf';
    
    SELECT * 
    FROM dbo.Magazine
    WHERE FREETEXT(*,'new core licensing')
    AND file_type = 'pdf';
    
    SELECT * 
    FROM dbo.Magazine
    WHERE CONTAINS(*, N'"Microsoft"')
    AND file_type = 'pdf';
    
    SELECT *
    FROM sys.fulltext_catalogs;
    
    SELECT *
    FROM sys.fulltext_indexes;
    
    SELECT *
    FROM sys.fulltext_index_columns;
    
    SELECT *
    FROM sys.fulltext_index_catalog_usages; 
    

    Thanks for any help.


    Tom G.

    • Moved by Kalman Toth Saturday, March 30, 2013 2:21 PM Not t-sql
    Saturday, March 30, 2013 2:06 PM

Answers

  • Hi Tom,
     
    That is the error that you get when the FTS service can’t load the filter, or things aren’t configured correctly. And you’re right to look, leaving out the bin directory from environment is the main reason this occurs.
     
    I just checked the configuration on my machine (its working there) and have Adobe IFilter9 registered. A few things to try:
    1. All the configuration steps you have commented out in the script have already been done? And .pdf is mapped to the filter?
    2. You’re running a 64-bit version of SQL Server? (bitness of the filter needs to machine the SQL Server)
    3. SQL Server (it’s actually the related-FTS FTHOST.exe) has access to the filter file (security-wise)?
     
    Those are the only things I can think of. Perhaps, if you can find the older version of the filter and nothing else works, you can try that. But that’s an unlikely fix too.
     
    Hope this helps,
    Cheers,
    Bob
    • Marked as answer by Tom Groszko Monday, April 1, 2013 3:19 PM
    Sunday, March 31, 2013 11:19 PM

All replies

  • Saturday, March 30, 2013 2:24 PM
  • I have seen that.

    Thanks for taking the time to look at my problem.


    Tom G.

    Saturday, March 30, 2013 3:34 PM
  • Well, I have it working on my servers, but don't remember this problem:

    http://www.sqlservercentral.com/Forums/Topic779285-391-1.aspx

    On another front, I did have problems with Microsoft Office and Adobe PDF IFilter (I now use FoxIT instead) with Apartment threading.  You can pursue the links in this thread: http://social.technet.microsoft.com/Forums/en-US/sqlsearch/thread/992b33ed-6dba-44c1-8311-50bad3baf76d

    Or try out FoxIT to see if the problem goes away and determine if it is worth the money to you.

    RLF

    Sunday, March 31, 2013 1:34 AM
  • Check the fulltext log files for errors. If there’s the filter isn’t working, there should be errors in those files.
     
    Cheers, Bob
    • Proposed as answer by Kalman Toth Sunday, March 31, 2013 9:41 PM
    Sunday, March 31, 2013 8:04 AM
  • I had found one of those, that for the pointer to the second.

    Happy Easter


    Tom G.

    Sunday, March 31, 2013 12:35 PM
  • Once I found the log files id did find some errors in them.

    2013-03-31 15:32:42.10 spid33s     Warning: No appropriate filter was found during full-text index population for table or indexed view '[FileTableStudy001].[dbo].[Magazine]' (table or indexed view ID '245575913', database ID '22'), full-text key value '/11092072547012.248776084320333.2588084073/'. Some columns of the row were not indexed.

    So the filter is not working but why?

    I can verify that the bin directory is in the Path variable with

    DECLARE	 @RegistryValue		NVARCHAR(4000)
    		,@RC		INT
    		,@Hive		NVARCHAR(128)	=	N'HKEY_LOCAL_MACHINE'
    		,@Key		NVARCHAR(255)	=	N'SYSTEM\CurrentControlSet\Control\Session Manager\Environment'
    		,@Item		NVARCHAR(128)	=	N'Path';
    EXEC @RC = master.dbo.xp_regread	@Hive
                                        ,@Key
                                        ,@Item
                                        ,@RegistryValue OUTPUT
                                        ,N'Not Used'
    SELECT @RegistryValue RegistryValue;
    

    It is still telling me that it completed successfully, no documents failed?

    2013-03-31 15:35:22.95 spid35s     Informational: Full-text Full population completed for table or indexed view '[FileTableStudy001].[dbo].[Magazine]' (table or indexed view ID '245575913', database ID '22'). Number of documents processed: 5. Number of documents failed: 0. Number of documents that will be retried: 0.

    Happy Easter.


    Tom G.

    Sunday, March 31, 2013 8:10 PM
  • Hi Tom,
     
    That is the error that you get when the FTS service can’t load the filter, or things aren’t configured correctly. And you’re right to look, leaving out the bin directory from environment is the main reason this occurs.
     
    I just checked the configuration on my machine (its working there) and have Adobe IFilter9 registered. A few things to try:
    1. All the configuration steps you have commented out in the script have already been done? And .pdf is mapped to the filter?
    2. You’re running a 64-bit version of SQL Server? (bitness of the filter needs to machine the SQL Server)
    3. SQL Server (it’s actually the related-FTS FTHOST.exe) has access to the filter file (security-wise)?
     
    Those are the only things I can think of. Perhaps, if you can find the older version of the filter and nothing else works, you can try that. But that’s an unlikely fix too.
     
    Hope this helps,
    Cheers,
    Bob
    • Marked as answer by Tom Groszko Monday, April 1, 2013 3:19 PM
    Sunday, March 31, 2013 11:19 PM
  • Those commented out things have been run several times. It is 64 bit SQL Server and the Adobe iFilter is 64bit 11. I have confirmed the bin directory is in the path folder both by eye ball, T-SQL code as posted and the .NET Environment show it using C#.

    I went to the Adobe iFilter bin folder and granted full text launcher and SQL2012 read and execute access to the folder.

    I dropped the index, recreated it and waited for the crawl to be finished.

    Still the same results.

    Thanks for all of your suggestions.

    Happy Easter.


    Tom G.

    Monday, April 1, 2013 12:51 AM
  • Your "unlikely fix" fixed it. With great doubt I remove iFilter 11 and installed iFilter 9. All worked as I wanted it to. I then removed iFilter 9 and put 11 back. It did not work. Put 9 back again and all worked as expected.

    My conclusion is that Adobe iFilter v11 does not work with SQL2012.

    Thanks for your suggestions.


    Tom G.

    Monday, April 1, 2013 3:19 PM
  • Thanks for the feedback, Tom. I have an unprovisioned (for FTS) VM with SQL2012 I can use to repro your behavior, and if I can repro it, report as a bug. Or maybe do some more testing with the 11 filter. If you want to report it first, post the Connect bug number here and I’ll confirm.
     
    Cheers, Bob
    Tuesday, April 2, 2013 8:31 PM
  • Since 9 works and 11 does not I suspect this is not a candidate for a connect item. It is an Adobe problem not a Microsoft problem. I could not find on the Adobe site any way to report a bug. You may certainly report it to Microsoft if you want.


    Tom G.

    Wednesday, April 3, 2013 2:07 PM
  • Hello,

    We believe we have figured this out.  It looks like it has to do with the length of the default folder location for the Adobe iFilter.

    I was able to reproduce the issue and the following resolved it for me.  See if this resolves it for you all as well.

    Here is how to get Adobe Version 11 PDF filter to work.

     1 . If you haven’t already, run the following in SQL Server:

    Sp_fulltext_service ‘Load_os_resources’, 1

    Go

    --you might also need to run:  sp_fulltext_service ‘Verify_signature’,0  --This is used to validate trusted iFilters. 0 disables it. So use with caution.

    --go

    2. Stop SQL Server.  (Make sure FDHost.exe stops)

    3.   Uninstall the Adobe ifilter (because it defaulted to having spaces or the folder name is too long).

    4.   Reinstall the Adobe iFilter and when it prompts for where to install it, change it to: C:\Program Files\Adobe\PDFiFilter

    5.  Once the installation finishes, go the computer’s Environment variables. Add the following to the PATH.

    C:\Program Files\Adobe\PDFiFilter\BIN

    NOTE: it must include the BIN folder

    NOTE: If you had the OLD location that included spaces, remove it from the path environment variable.

    6. Start SQL Server

    7.  IF you had an existing Full-text index on PDFs, drop the full-text index and recreate it.

    8. You should now get results when you run sys.dm_fts_index_keywords('db','tblname')  --Note: Change db to be the actual database name and tblname to be the actual table name.

     Give this a try and see if this fixes yours. 

    Sincerely,

    Rob Beene, MSFT

    Monday, October 13, 2014 11:11 PM
    Answerer
  • Thanks I will give it a try.

    Tom G.

    Monday, October 13, 2014 11:33 PM
  • Rob Been,

    You are a true god! Thanks for the explanation! I got it working now :)

    Best regards,

    Jouke van Rossum

    Friday, November 20, 2015 11:03 AM