Answered by:
Full text search of .pdf files in a file table.

Question
-
I have installed the Adobe iFilter 11 64 bit and set the path to the bin folder. I still cannot find any text from the pdf files. I suspect I am missing something trivial because I don't find much when I Bing for this so it must not be a common problem. Here is the code.
-- Adobe iFilter 11 64 bit is installed -- The Path variable is set to the bin folder for the Adobe iFilter. -- SQL Developer version 64 bit on both Windows 7 and Windows 8. USE master; GO DROP DATABASE FileTableStudy; GO CREATE DATABASE FileTableStudy ON PRIMARY ( NAME = N'FileTableStudy' ,FILENAME = N'E:\SQLServerData\SQL2012\Engine\FileTableStudy.mdf' ,SIZE = 4096KB ,FILEGROWTH = 1024KB ) ,FILEGROUP FileTableStudyFileTable CONTAINS FILESTREAM ( NAME = FileTableStudyFileTable ,FILENAME = 'E:\SQLServerData\FileTableStudy' ) LOG ON ( NAME = N'FileTableStudy_log' ,FILENAME = N'D:\SQLServerLogs\SQL2012\FileTableStudy_log.ldf' ) WITH FILESTREAM ( NON_TRANSACTED_ACCESS = FULL ,DIRECTORY_NAME = N'FileTableStudyFiles' ); GO USE FileTableStudy; GO DROP TABLE dbo.Magazine; GO CREATE TABLE dbo.Magazine AS FILETABLE WITH ( FileTable_Directory = 'MagazineStore' ,FileTable_Collate_Filename = database_default ); GO CREATE FULLTEXT CATALOG MagazineFullTextCatablog AS DEFAULT; GO --EXEC sp_fulltext_service 'load_os_resources', 1; --EXEC sp_fulltext_service 'verify_signature', 0; --EXEC sp_fulltext_service 'restart_all_fdhosts'; --EXEC sp_fulltext_service 'update_languages'; --EXEC sp_help_fulltext_system_components 'filter'; --RECONFIGURE WITH OVERRIDE; SELECT document_type ,path FROM sys.fulltext_document_types WHERE document_type = '.pdf'; SELECT * FROM sys.fulltext_document_types ORDER BY document_type; DROP FULLTEXT INDEX ON dbo.Magazine; GO SELECT TOP 1 indexes.name IndexName FROM sys.indexes JOIN sys.tables ON indexes.object_id = tables.object_id AND tables.name = 'Magazine' JOIN sys.schemas ON tables.schema_id = schemas.schema_id AND schemas.name = 'dbo' WHERE indexes.is_unique = 1 AND indexes.name LIKE 'PK__%'; GO -- Drag documents to folder. CREATE FULLTEXT INDEX ON dbo.Magazine ( file_stream TYPE COLUMN file_type) KEY INDEX [PK__Magazine__5A5B77D541728F3E]; GO -- Wait for index to build SELECT DATEDIFF(ss, crawl_start_date, crawl_end_date) IndexBuildSeconds ,* FROM sys.fulltext_indexes --ALTER FULLTEXT INDEX ON dbo.Magazine START UPDATE POPULATION; SELECT * FROM dbo.Magazine WHERE file_type = 'pdf'; SELECT * FROM dbo.Magazine WHERE FREETEXT(*,'new core licensing') AND file_type = 'pdf'; SELECT * FROM dbo.Magazine WHERE CONTAINS(*, N'"Microsoft"') AND file_type = 'pdf'; SELECT * FROM sys.fulltext_catalogs; SELECT * FROM sys.fulltext_indexes; SELECT * FROM sys.fulltext_index_columns; SELECT * FROM sys.fulltext_index_catalog_usages;
Thanks for any help.
Tom G.
- Moved by Kalman Toth Saturday, March 30, 2013 2:21 PM Not t-sql
Saturday, March 30, 2013 2:06 PM
Answers
-
Hi Tom,That is the error that you get when the FTS service can’t load the filter, or things aren’t configured correctly. And you’re right to look, leaving out the bin directory from environment is the main reason this occurs.I just checked the configuration on my machine (its working there) and have Adobe IFilter9 registered. A few things to try:1. All the configuration steps you have commented out in the script have already been done? And .pdf is mapped to the filter?2. You’re running a 64-bit version of SQL Server? (bitness of the filter needs to machine the SQL Server)3. SQL Server (it’s actually the related-FTS FTHOST.exe) has access to the filter file (security-wise)?Those are the only things I can think of. Perhaps, if you can find the older version of the filter and nothing else works, you can try that. But that’s an unlikely fix too.Hope this helps,Cheers,Bob
- Marked as answer by Tom Groszko Monday, April 1, 2013 3:19 PM
Sunday, March 31, 2013 11:19 PM
All replies
-
BOL page: Configure and Manage Filters for Search
Kalman Toth Database & OLAP Architect sqlusa.com
New Book / Kindle: SQL Programming & Database Design Using Microsoft SQL Server 2012Saturday, March 30, 2013 2:24 PM -
I have seen that.
Thanks for taking the time to look at my problem.
Tom G.
Saturday, March 30, 2013 3:34 PM -
Well, I have it working on my servers, but don't remember this problem:
http://www.sqlservercentral.com/Forums/Topic779285-391-1.aspx
On another front, I did have problems with Microsoft Office and Adobe PDF IFilter (I now use FoxIT instead) with Apartment threading. You can pursue the links in this thread: http://social.technet.microsoft.com/Forums/en-US/sqlsearch/thread/992b33ed-6dba-44c1-8311-50bad3baf76d
Or try out FoxIT to see if the problem goes away and determine if it is worth the money to you.
RLF
Sunday, March 31, 2013 1:34 AM -
Check the fulltext log files for errors. If there’s the filter isn’t working, there should be errors in those files.Cheers, Bob
- Proposed as answer by Kalman Toth Sunday, March 31, 2013 9:41 PM
Sunday, March 31, 2013 8:04 AM -
I had found one of those, that for the pointer to the second.
Happy Easter
Tom G.
Sunday, March 31, 2013 12:35 PM -
Once I found the log files id did find some errors in them.
2013-03-31 15:32:42.10 spid33s Warning: No appropriate filter was found during full-text index population for table or indexed view '[FileTableStudy001].[dbo].[Magazine]' (table or indexed view ID '245575913', database ID '22'), full-text key value '/11092072547012.248776084320333.2588084073/'. Some columns of the row were not indexed.
So the filter is not working but why?
I can verify that the bin directory is in the Path variable with
DECLARE @RegistryValue NVARCHAR(4000) ,@RC INT ,@Hive NVARCHAR(128) = N'HKEY_LOCAL_MACHINE' ,@Key NVARCHAR(255) = N'SYSTEM\CurrentControlSet\Control\Session Manager\Environment' ,@Item NVARCHAR(128) = N'Path'; EXEC @RC = master.dbo.xp_regread @Hive ,@Key ,@Item ,@RegistryValue OUTPUT ,N'Not Used' SELECT @RegistryValue RegistryValue;
It is still telling me that it completed successfully, no documents failed?
2013-03-31 15:35:22.95 spid35s Informational: Full-text Full population completed for table or indexed view '[FileTableStudy001].[dbo].[Magazine]' (table or indexed view ID '245575913', database ID '22'). Number of documents processed: 5. Number of documents failed: 0. Number of documents that will be retried: 0.
Happy Easter.
Tom G.
Sunday, March 31, 2013 8:10 PM -
Hi Tom,That is the error that you get when the FTS service can’t load the filter, or things aren’t configured correctly. And you’re right to look, leaving out the bin directory from environment is the main reason this occurs.I just checked the configuration on my machine (its working there) and have Adobe IFilter9 registered. A few things to try:1. All the configuration steps you have commented out in the script have already been done? And .pdf is mapped to the filter?2. You’re running a 64-bit version of SQL Server? (bitness of the filter needs to machine the SQL Server)3. SQL Server (it’s actually the related-FTS FTHOST.exe) has access to the filter file (security-wise)?Those are the only things I can think of. Perhaps, if you can find the older version of the filter and nothing else works, you can try that. But that’s an unlikely fix too.Hope this helps,Cheers,Bob
- Marked as answer by Tom Groszko Monday, April 1, 2013 3:19 PM
Sunday, March 31, 2013 11:19 PM -
Those commented out things have been run several times. It is 64 bit SQL Server and the Adobe iFilter is 64bit 11. I have confirmed the bin directory is in the path folder both by eye ball, T-SQL code as posted and the .NET Environment show it using C#.
I went to the Adobe iFilter bin folder and granted full text launcher and SQL2012 read and execute access to the folder.
I dropped the index, recreated it and waited for the crawl to be finished.
Still the same results.
Thanks for all of your suggestions.
Happy Easter.
Tom G.
Monday, April 1, 2013 12:51 AM -
Your "unlikely fix" fixed it. With great doubt I remove iFilter 11 and installed iFilter 9. All worked as I wanted it to. I then removed iFilter 9 and put 11 back. It did not work. Put 9 back again and all worked as expected.
My conclusion is that Adobe iFilter v11 does not work with SQL2012.
Thanks for your suggestions.
Tom G.
Monday, April 1, 2013 3:19 PM -
Thanks for the feedback, Tom. I have an unprovisioned (for FTS) VM with SQL2012 I can use to repro your behavior, and if I can repro it, report as a bug. Or maybe do some more testing with the 11 filter. If you want to report it first, post the Connect bug number here and I’ll confirm.Cheers, BobTuesday, April 2, 2013 8:31 PM
-
Since 9 works and 11 does not I suspect this is not a candidate for a connect item. It is an Adobe problem not a Microsoft problem. I could not find on the Adobe site any way to report a bug. You may certainly report it to Microsoft if you want.
Tom G.
Wednesday, April 3, 2013 2:07 PM -
Hello,
We believe we have figured this out. It looks like it has to do with the length of the default folder location for the Adobe iFilter.
I was able to reproduce the issue and the following resolved it for me. See if this resolves it for you all as well.
Here is how to get Adobe Version 11 PDF filter to work.
1 . If you haven’t already, run the following in SQL Server:
Sp_fulltext_service ‘Load_os_resources’, 1
Go
--you might also need to run: sp_fulltext_service ‘Verify_signature’,0 --This is used to validate trusted iFilters. 0 disables it. So use with caution.
--go
2. Stop SQL Server. (Make sure FDHost.exe stops)
3. Uninstall the Adobe ifilter (because it defaulted to having spaces or the folder name is too long).
4. Reinstall the Adobe iFilter and when it prompts for where to install it, change it to: C:\Program Files\Adobe\PDFiFilter
5. Once the installation finishes, go the computer’s Environment variables. Add the following to the PATH.
C:\Program Files\Adobe\PDFiFilter\BIN
NOTE: it must include the BIN folder
NOTE: If you had the OLD location that included spaces, remove it from the path environment variable.
6. Start SQL Server
7. IF you had an existing Full-text index on PDFs, drop the full-text index and recreate it.
8. You should now get results when you run sys.dm_fts_index_keywords('db','tblname') --Note: Change db to be the actual database name and tblname to be the actual table name.
Give this a try and see if this fixes yours.
Sincerely,
Rob Beene, MSFT
Monday, October 13, 2014 11:11 PMAnswerer -
Thanks I will give it a try.
Tom G.
Monday, October 13, 2014 11:33 PM -
Rob Been,
You are a true god! Thanks for the explanation! I got it working now :)
Best regards,
Jouke van Rossum
Friday, November 20, 2015 11:03 AM