FAST Search Pipeline Extensibility, Python, and Access Denied Error on "Mapped Drive"

  • Question

  • I've created an extensibility stage in the FAST Search pipeline that invokes a process written in Python.  When the document is fed in with docpush and the local path "C:\Users\spfs_fastsearch\AppData\LocalLow\testfolder\testdoc.pdf" is passed to the process during crawling, the Python code works as designed: the file is opened and the rest of the code runs.  However, when the path to the same file is passed to the same code by the same method as the UNC path "\\servername\testfolder\testdoc.pdf", an "Access Denied" error occurs.
    This code works as designed from the Python shell with both the local path "C:\Users\spfs_fastsearch\AppData\LocalLow\testfolder\testdoc.pdf" and the UNC path "\\servername\testfolder\testdoc.pdf"; it only fails when invoked through the pipeline extensibility functionality as described above.

    It has to be a permissions or security setting somewhere in the pipeline, but I am not sure where that would be.  Any assistance in this matter would be greatly appreciated, as I have tried everything I can think of.  Thanks.
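    For reference, a rough sketch of the shape of the stage involved (simplified, not the exact code): the sandbox invokes the configured command with the %(input)s and %(output)s file paths, and the input XML carries the crawled properties configured for the stage.  The crawled property name "DocumentPath" below is just a placeholder.

        # Rough sketch of a FAST pipeline extensibility stage in Python.
        # Invoked as:  python stage.py <input-xml> <output-xml>
        import sys
        import xml.etree.ElementTree as ET

        def read_document_path(input_file):
            """Return the crawled property that carries the document path."""
            tree = ET.parse(input_file)
            for prop in tree.getroot().iter('CrawledProperty'):
                if prop.get('propertyName') == 'DocumentPath':  # placeholder name
                    return prop.text
            return None

        def main():
            input_file, output_file = sys.argv[1], sys.argv[2]
            path = read_document_path(input_file)
            if path:
                # Opens fine for C:\Users\...\AppData\LocalLow\..., but fails
                # with "Access Denied" for \\servername\... inside the sandbox.
                with open(path, 'rb') as f:
                    data = f.read()
                # ... process data ...
            # Emit an (empty) output document so the stage completes cleanly.
            with open(output_file, 'w') as out:
                out.write('<Document/>')

        if __name__ == '__main__':
            main()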





    • Edited by BradleyW Thursday, August 23, 2012 5:43 PM
    Thursday, August 23, 2012 5:39 PM

Answers

  • I have read what you wrote above, Amir, and it is an option, but I found a different way to pass data between the pipeline and Python: RabbitMQ.  Set up the RabbitMQ service on the FAST Search server, integrate an executable into the pipeline that pulls the document path from the crawled document and sends it to the RabbitMQ service, and have the Python script retrieve the path from the queue and start its processing on the file.  Works like a dream!  Thanks for suggesting the web service option; RabbitMQ covers our needs at this time.
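    For the rough shape of this hand-off, a simplified sketch using the pika client is below.  The queue name and connection details are placeholders, and the pipeline-side publisher could just as well be a small compiled executable rather than Python.

        # Sketch of the RabbitMQ hand-off described above (pika client).
        # Queue name and host are placeholders; error handling is omitted.
        import pika

        QUEUE = 'fast-doc-paths'  # assumed queue name

        def publish_path(doc_path):
            """Pipeline side: enqueue the path of the document being crawled."""
            conn = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
            channel = conn.channel()
            channel.queue_declare(queue=QUEUE, durable=True)
            channel.basic_publish(exchange='', routing_key=QUEUE, body=doc_path)
            conn.close()

        def consume_paths(process_file):
            """Standalone script: process each document path as it arrives."""
            conn = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
            channel = conn.channel()
            channel.queue_declare(queue=QUEUE, durable=True)

            def on_message(ch, method, properties, body):
                process_file(body.decode('utf-8'))
                ch.basic_ack(delivery_tag=method.delivery_tag)

            channel.basic_consume(queue=QUEUE, on_message_callback=on_message)
            channel.start_consuming()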


    • Marked as answer by BradleyW Friday, September 7, 2012 1:00 PM
    • Edited by BradleyW Friday, September 7, 2012 1:04 PM
    Friday, September 7, 2012 1:00 PM

All replies

  • The pipeline in FAST Search Server runs in a sandboxed environment. Despite the fact that it is executed as the FAST user (you can see this e.g. in Task Manager or Process Explorer), you can only read and write data that is

    1. configured as crawled properties in the extensibility configuration
    2. residing in this one specific folder you are using: "%USERHOME%/<fastservice-user>/AppData/LocalLow"

    Also, connecting to any external service is limited by the sandbox. For instance, you will be able to reach an MS SQL Server using SQL Server authentication, but integrated Windows authentication (SSPI) won't work at all.

    To my knowledge, the only remaining option for you would be to put a Python module into the FAST pipeline itself, which is a bit more of a customization step compared to the custom extensibility stage.
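    As a rough illustration of the second restriction (a sketch only, assuming the stage runs under the service account's own profile), a stage can normalize an incoming path and check it against the account's LocalLow folder before trying to open it; a UNC path such as \\servername\... will not pass this check even when it points at the same physical folder:

        # Sketch: accept only paths under the account's AppData\LocalLow folder.
        import os

        def is_under_locallow(path):
            locallow = os.path.join(os.environ['USERPROFILE'], 'AppData', 'LocalLow')
            full = os.path.abspath(os.path.normpath(path))
            # A \\servername\... path never normalizes to the local LocalLow
            # folder, so it is rejected here, much as the sandbox rejects it.
            return full.lower().startswith(locallow.lower() + os.sep)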

    Friday, August 24, 2012 6:13 AM
  • Thanks for getting back to me.  As my original posting states, and your second point requires, the file being read is in the "%USERHOME%/<fastservice-user>/AppData/LocalLow" folder; it is, however, being reached through a share exposed as \\servername\AppData\LocalLow.  Why would this make a difference when it is effectively a mapped drive to the same location?  And within the pipeline, is the "LocalLow rule" still a requirement, or could the file be read from anywhere?  Thanks.

    Thursday, August 30, 2012 7:23 PM
  • This is due to the sandbox implementation of the pipeline. To my knowledge, there is no way to get around this; you can only read and write within this one specific folder and below.

    Friday, August 31, 2012 6:09 AM
  • Hi Guys,

    There is a way to work around the issue of the sandbox isolation.

    If you create an anonymous web service, the pipeline extension can call it.

    Just let the web service identify/authenticate to the file system (or any other system), grab the information, and pass it back to the pipeline for further processing; or let the web service do the processing itself and return only the result, thus transferring less information.

    The only concern is the security of an anonymous web service, but it can be run on the FAST server and blocked from all external communication by configuring the firewall accordingly.

    I've done this many times, and it works perfectly.
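    A minimal sketch of the call from the pipeline side, assuming a hypothetical anonymous endpoint at http://localhost:8008/process and a JSON payload (Python 3 urllib shown; adjust to whatever your service actually expects):

        # Sketch: hand the UNC path to a local anonymous HTTP service that has
        # the rights to read the share, and get the result of processing back.
        # The URL and JSON shape are placeholders, not part of any product API.
        import json
        import urllib.request

        def process_via_service(unc_path):
            payload = json.dumps({'path': unc_path}).encode('utf-8')
            req = urllib.request.Request(
                'http://localhost:8008/process',
                data=payload,
                headers={'Content-Type': 'application/json'},
            )
            with urllib.request.urlopen(req, timeout=30) as resp:
                return json.loads(resp.read().decode('utf-8'))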

    Amir

    Thursday, September 6, 2012 11:58 AM
  • I have read what you wrote above, Amir, and it is an option, but I found a different way to pass data between the pipeline and Python: RabbitMQ.  Set up the RabbitMQ service on the FAST Search server, integrate an executable into the pipeline that pulls the document path from the crawled document and sends it to the RabbitMQ service, and have the Python script retrieve the path from the queue and start its processing on the file.  Works like a dream!  Thanks for suggesting the web service option; RabbitMQ covers our needs at this time.


    • Marked as answer by BradleyW Friday, September 7, 2012 1:00 PM
    • Edited by BradleyW Friday, September 7, 2012 1:04 PM
    Friday, September 7, 2012 1:00 PM
  • Hi Bradley,

    It's a very interesting approach, which I will investigate further.

    Thank you very much for sharing.

    Amir

    Sunday, September 9, 2012 6:53 AM
  • Are you looking for a solution to integrate metadata in a database with documents in a file system? Are you just testing possibilities, or are you looking for a solution?
    Tuesday, September 11, 2012 12:00 AM
  • I was looking for a solution to run some external processing on each document at the time it is being crawled.  Hope this answers your question.

    Friday, September 14, 2012 1:56 PM