FAST Search presents the same file as "Duplicated"

  • Question

  • Hi All,

        We are facing an intermittent issue with FAST Search results. Some files are shown as duplicates, but each pair turns out to be the same file (in the same location). It seems that FAST is creating duplicate index entries for these files.

        Is this a known issue? Is there any suggestion on how to investigate this further or solve the problem?

        We have 2 test environments: one at our customer site and another "in-company". It seems that the problem occurs more often on the customer site than here.

    Thanks a lot!

    Sergio

    Monday, November 5, 2012 2:54 PM

All replies

  • The following article discusses how to trim duplicates.

    http://msdn.microsoft.com/en-us/library/ff521593.aspx

    Please check whether case-sensitive crawl rules or global case preservation have been enabled by any chance. The article below discusses this:

    http://blogs.msdn.com/b/enterprisesearch/archive/2010/07/09/crawling-case-sensitive-repositories-using-sharepoint-server-2010.aspx

    If neither case-sensitive crawl rules nor global case preservation is applied, check whether the duplicates have the same internal ID.
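Once the result rows have been exported, the two checks above (case variants of a URL, and matching internal IDs) can be sketched roughly as follows. This is only an illustration in Python: the row layout, the `url` and `internalid` keys, the sample values, and both function names are assumptions, not something FAST Search returns in this form.

```python
from collections import defaultdict


def group_duplicates(results):
    """Group rows by case-folded URL and keep only groups with more than one row."""
    groups = defaultdict(list)
    for row in results:
        groups[row["url"].casefold()].append(row)
    return {url: rows for url, rows in groups.items() if len(rows) > 1}


def describe_group(rows):
    """Summarise what distinguishes the rows of one duplicate group."""
    return {
        # True when the URLs differ only in letter case.
        "case_variants": len({row["url"] for row in rows}) > 1,
        # Number of different internal IDs behind the same URL.
        "distinct_internal_ids": len({row["internalid"] for row in rows}),
    }


# Hypothetical exported rows; the values are made up for the example.
sample = [
    {"url": "http://server/docs/Report.docx", "internalid": "id-1"},
    {"url": "http://server/docs/Report.docx", "internalid": "id-2"},
    {"url": "http://server/docs/Plan.docx", "internalid": "id-3"},
    {"url": "http://server/docs/plan.docx", "internalid": "id-4"},
    {"url": "http://server/docs/Notes.docx", "internalid": "id-5"},
]

for url, rows in group_duplicates(sample).items():
    print(url, describe_group(rows))
```

A group with `case_variants` set points towards case-sensitive crawling, while a group with one URL spelling and several internal IDs suggests the same item was indexed more than once.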

    Srini Dutta | Sr Escalation Engineer | CTS

    Monday, November 5, 2012 7:16 PM
  • Hi Srini,

         I have checked, and we have no case-sensitive crawl rules or global case preservation. How can I check this internal ID?

         The files that are being duplicated are exactly the same files. Another thing I was able to check is that we have SP1 installed on our machine. Could this be a possible cause of the issue in the other environment?

    Thanks a lot!

    Sergio

    Monday, November 5, 2012 7:39 PM
  • Tuesday, November 6, 2012 12:05 AM
  • Hi Srini,

        Using the reference above, I was able to check the internal IDs of the duplicates. They have different internalid and contentid values.

        Is there any way of solving this?

    Thanks!


    Sergio

    Wednesday, November 7, 2012 3:37 PM
  • Hi Sergio,

    The following TechNet article discusses how to remove URLs from search results.

    http://technet.microsoft.com/en-us/library/ff191226.aspx

    You may want to apply the steps in the following sections, which deal with duplicate results:

    Remove the Item ID from the content collection
    Create a crawl rule to exclude duplicates.

    Since these 2 results appear to be coming from different internal IDs, you may want to remove one of them and create an exclusion rule to prevent it from being crawled again.

    If multiple URLs show this behavior, you may want to open a support ticket with the technical support team.

    Wednesday, November 14, 2012 6:48 PM
  • Hi,

    It's hard to say if it's known or not. But the built-in checksum used for duplicate removal is... how should I say it... sub-par (http://social.technet.microsoft.com/Forums/en-US/fastsharepoint/thread/025e643c-03d8-463c-87d9-ec57db8e8237).

    I have a sample in my book (p.339) on how to use more than the first 1024 characters to build the CRC. However, as the CRC is 32-bit, there is a 50 percent probability that you will get a collision after just 77,000 items and a 99 percent probability that you will get a collision when you reach 200,000 indexed items, even though the items are completely different. It's still better than the built-in one.
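For reference, the collision figures above follow from the birthday approximation p ≈ 1 - exp(-n(n-1) / (2 · 2^32)). The sketch below checks them, and also illustrates why a checksum built from only the first 1024 characters cannot tell apart documents that differ further down. It uses Python's `zlib.crc32` as a stand-in; that is an assumption for illustration only, not the checksum FAST actually computes, and the function names are hypothetical.

```python
import math
import zlib

PREFIX_LENGTH = 1024  # prefix size mentioned in the post above


def prefix_crc(text, length=PREFIX_LENGTH):
    """CRC-32 over only the first `length` characters (illustrative stand-in)."""
    return zlib.crc32(text[:length].encode("utf-8"))


def full_crc(text):
    """CRC-32 over the whole text."""
    return zlib.crc32(text.encode("utf-8"))


def collision_probability(items, bits=32):
    """Birthday approximation of the chance that at least two of `items`
    uniformly distributed `bits`-bit checksums are equal."""
    buckets = 2 ** bits
    return 1.0 - math.exp(-items * (items - 1) / (2.0 * buckets))


# Two hypothetical documents that share their first 1024 characters.
doc_a = "x" * PREFIX_LENGTH + "A"
doc_b = "x" * PREFIX_LENGTH + "B"

print(prefix_crc(doc_a) == prefix_crc(doc_b))  # True: the prefixes are identical
print(full_crc(doc_a) == full_crc(doc_b))      # False: CRC-32 always detects a one-byte change

for n in (77_000, 200_000):
    print(n, round(collision_probability(n), 3))
```

Both printed probabilities land close to the 50 and 99 percent quoted above, which is why even a full-text 32-bit CRC starts producing false duplicates at fairly modest index sizes.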

    Thanks,
    Mikael Svenson



    Search Enthusiast - SharePoint MVP/MCT/MCPD - If you find an answer useful, please up-vote it.
    http://techmikael.blogspot.com/
    Author of Working with FAST Search Server 2010 for SharePoint

    Monday, November 19, 2012 9:07 PM
  • We're facing the same issue with "strange" case-sensitive duplicates:

    1) No crawl rules are associated with this content source.

    2) The global value of $searchApp.GetProperty("CaseSensitiveCrawling") is false (the untouched default).

    Please see the screenshot below, which shows the expanded duplicates. Any ideas what went wrong?

     
    Friday, November 23, 2012 6:42 AM
  • I am facing exactly the same problem in our environment (FAST Search 2010 + SharePoint 2010).

    One thing I discovered is that this behaviour is associated with new solution deployments. In our environment, only files that were uploaded before the new solution deployment were duplicated; the new ones behave normally.

    After some tests, I identified this scenario:

    1. Deploy a new Solution over the old one
    2. Run a FAST Content Index Reset (in Central Administration)
    3. Run a FAST Content Full Crawl (in Central Administration)
    4. Result: all files (or some files) are duplicated

    In order to resolve this situation, we run the command: "Clear-FASTSearchContentCollection sp" after each new deployment.

    Has anyone been through this situation?

    Thursday, December 6, 2012 1:49 PM
  • Hi,

    I am facing a similar problem. I am getting duplicate results with the same contentid in the production environment. I faced the same issue on my local environment earlier and resolved it by clearing the content collection. As far as I know (please correct me if I am wrong), this issue is related to an inconsistency in the environment caused by regenerating the FAST Search certificate after it expired. Is there any way to resolve this issue without resetting the index or clearing the content collection?

    Thanks,

    Sandip

    Friday, December 14, 2012 1:45 PM