How to crawl complex URL sites such as PHP sites?

    Question

  • Hi all,

           I am working with FAST Search for SharePoint 2010. I tried to crawl sites with complex URLs (e.g. complex PHP sites). In the crawl rules, I enabled both crawling complex URLs and crawling the entire site with https. The crawl reports one success and no errors, which means only the home page URL is being crawled and the complex URLs are not. The crawler is not navigating from the home page to the complex URL pages. I also set the depth to unlimited, but the result is the same. Please help me crawl the entire complex URL PHP site. Thanks.

    Regards,

    Sharath Kumar

     

    Friday, December 30, 2011 11:51 AM

All replies

  • SharePoint should crawl any of the URLs on the site provided that the home page on the site has links to the other pages on the site via navigation (or whatever).
    Corey Roth - SharePoint Server MVP blog: www.dotnetmafia.com twitter: @coreyroth
    Friday, December 30, 2011 3:36 PM
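
One way to test the point above is to look at which links are actually present as plain anchor tags in the HTML the server returns for the home page. The following is a minimal sketch, assuming a crawler that discovers pages only through `<a href>` links in the served markup; that is an assumption for illustration, not documented FAST Search behavior. The sample page and the `extract_links` helper are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links the way a crawler would.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return absolute URLs for every <a href> found in the HTML."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical home page: one plain anchor and one script-only "link".
sample_html = """
<html><body>
  <a href="thread.php?threadid=12345&amp;sort=date">Thread</a>
  <span onclick="location.href='thread.php?threadid=67890'">Script-only</span>
</body></html>
"""

links = extract_links(sample_html, "http://www.somesites.com/forums/")
print(links)
```

If the list comes back empty or much shorter than expected for the real home page, the navigation is probably built by script or forms rather than anchors, which would explain a crawl that stops after one page.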
  • Thanks for your reply. What you said is correct, but in my case it is not happening. I tried the following:

    a) In the crawl rules, we checked both options: crawl complex URLs and crawl as HTTP pages.

    b) We set the depth to unlimited and tried again.

    Other sites are getting crawled; only the site with PHP scripts is not. Only its home page is crawled, and I get one success and zero errors. Please help.


    Sharath Kumar R
    Monday, January 02, 2012 6:52 AM
  • Did you try to set it with PowerShell?

    Set-SPEnterpriseSearchExtendedConnectorProperty -SearchApplication $searchApp -Identity ExtensionsToFilter -Value ";ascx;asp;aspx;htm;html;jhtml;jsp;php;"

    Official KB from Microsoft: http://support.microsoft.com/kb/2550268

    Hope it helps, Gokan


    Founder of SharePoint CookBook: http://www.GokanOzcifci.be
    Microsoft Certified Technology Specialist: SharePoint 2010, Configuring
    Microsoft Certified Personal
    Monday, January 02, 2012 10:53 AM
  • Hi,

    Thanks for your reply. I have already tried the PowerShell commands as well, but I am still getting the same problem.

    On that site, there are no noindex or nofollow meta tags, and robots.txt is disabled.

    Can you suggest some other methods to crawl the complex URL PHP site?

     


    Sharath Kumar R
    Monday, January 02, 2012 12:23 PM
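
The two checks mentioned above (robots meta tags and robots.txt) can be verified offline against a saved copy of the page and the robots.txt file. This is a minimal sketch using only the Python standard library; the helper names, the sample inputs, and the `ExampleCrawler` user agent are hypothetical, since the thread does not state the user agent the crawler sends.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def robots_meta_directives(html):
    """Return the set of robots meta directives found in the HTML."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


def allowed_by_robots_txt(robots_txt, user_agent, url):
    """Return True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical inputs for illustration only.
page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
rules = "User-agent: *\nDisallow: /forums/\n"

print(robots_meta_directives(page))
print(allowed_by_robots_txt(
    rules, "ExampleCrawler",
    "http://www.somesites.com/forums/thread.php?threadid=12345"))
```

An empty directive set and a `True` result for the deep URLs would support the statement that neither mechanism is blocking the crawl.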
  • Hi Kumar,

    Please check this site: http://walisystemsinc.com/sharepoint/art/phpsearch/Searching_PHP_Sites_With_SharePoint_2010.htm

    Kr, Gokan


    Tuesday, January 03, 2012 3:26 PM
  • Hi,

         Thanks for the help. I already tried everything described in the link you gave, http://walisystemsinc.com/sharepoint/art/phpsearch/Searching_PHP_Sites_With_SharePoint_2010.htm, but it is still not crawling the complex URLs that are dynamically generated. I am not sure whether SharePoint crawls dynamic complex URLs at all. If it does, please tell me what needs to be done to crawl them.

     


    Sharath Kumar R
    Tuesday, January 03, 2012 8:45 PM
  • Hi, what do you mean by "complex URLs which are dynamically generated"? Kr, Gokan
    Tuesday, January 03, 2012 10:21 PM
  • Hi Gokan,

        A dynamic URL means that when we click any link on the home page, the URL of that link is generated automatically. That URL is not physically present on any server.

    For example, let's look at three URLs:

       http://www.somesites.com/forums/thread.php?threadid=12345&sort=date
       http://www.somesites.com/forums/thread.php?threadid=67890&sort=date
       http://www.somesites.com/forums/thread.php?threadid=13579&sort=date

    All three of these URLs point to three different pages. But if the search engine purges the information after the first offending character, the question mark (?), now all three pages look the same:

       http://www.somesites.com/forums/thread.php
       http://www.somesites.com/forums/thread.php
       http://www.somesites.com/forums/thread.php


    Now, you don't have unique pages, and consequently, the duplicate URLs won't be indexed.
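
The collapse described above can be reproduced with the three example URLs. This is a minimal sketch; `strip_query` is a hypothetical helper that mimics a crawler discarding everything from the question mark onward, not a description of what SharePoint actually does.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Drop the query string and fragment, keeping scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


urls = [
    "http://www.somesites.com/forums/thread.php?threadid=12345&sort=date",
    "http://www.somesites.com/forums/thread.php?threadid=67890&sort=date",
    "http://www.somesites.com/forums/thread.php?threadid=13579&sort=date",
]

stripped = [strip_query(u) for u in urls]

print(len(set(urls)))      # 3 distinct URLs with the query string kept
print(len(set(stripped)))  # 1 distinct URL once the query string is dropped
```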

    For further details, please refer to these links:

    http://www.webconfs.com/dynamic-urls-vs-static-urls-article-3.php

    http://www.seo-consultant-services.co.uk/static-html-vs-dynamic-urls.html

    Please let me know whether SharePoint crawls dynamic complex URLs, and if so, how.


    Sharath Kumar R
    Wednesday, January 04, 2012 7:18 AM
  •  
    1. Check that the search services are running.
    2. Check the service identity user: its password should be set to never expire, and it should be a domain user.
    3. Run psconfig as a last resort.
    4. Check the timer job history.
    5. Check the log files for more details (ULS logs, PowerShell commands, SQL commands).

    Cheers

    Kamal

    Tuesday, February 28, 2012 2:13 AM