Crawl rules ignored

All replies

  • Sorry. I solved it myself. I needed to include whitespace in my regex:
    file://vtweb50/publications/*/[a-z0-9%20/ ]+


    \Martin
    Wednesday, March 2, 2011 1:05 PM
  • Your regexp doesn't look right, even though it matches.

    Seems to me you want something like: file://vtweb50/publications/.*?/([\w\d/]+)|(%20)+

    A star by itself is greedy, and you might want it to match only the first folder name? a-z won't match uppercase. And because %20 sits inside the character group, it is read as the three separate characters %, 2 and 0, so the group will also match %02, 2%0, %a0 etc.

    What exactly are you trying to skip?

    -m


    Search Enthusiast - MCTS SharePoint/WCF4/ASP.Net4
    http://techmikael.blogspot.com/ - http://www.comperiosearch.com/
    Wednesday, March 2, 2011 2:55 PM
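
The point about %20 inside a character group can be checked with a minimal sketch. It uses Python's re module as a stand-in, which is an assumption: SharePoint's crawl-rule matcher may behave differently. The helper fully_matches, the second pattern and the sample strings are hypothetical and not from the thread.

```python
import re

# Character group as posted: %, 2 and 0 are three independent members.
CHAR_CLASS = re.compile(r"[a-z0-9%20/ ]+")

# Hypothetical alternative: %20 is kept as one literal token, and
# uppercase letters are allowed as well.
LITERAL_TOKEN = re.compile(r"(?:[a-zA-Z0-9/ ]|%20)+")


def fully_matches(pattern, text):
    """Return True when the pattern matches the whole string."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["my%20folder", "%02", "2%0", "%a0", "My Folder"]:
        print(sample,
              fully_matches(CHAR_CLASS, sample),
              fully_matches(LITERAL_TOKEN, sample))
```

Under these assumptions the posted group accepts %02, 2%0 and %a0 but rejects "My Folder" because of the uppercase letters, while the token version does the opposite.
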
  • Hi Mikael

    Good point.

    I'm crawling a fileshare, and want to remove all folders under vtweb50/publications from the search results.

    I did a little recon and concluded that folder names in that path include a-z, A-Z, 0-9, / (if there are subfolders) and spaces (the %20 is probably not needed anymore, since the whitespace worked).

    With a little more study, it seems that I can use

    file://vtweb50/publications/[\d\w\s/]+

    Would you say that's correct?


    \Martin
    Wednesday, March 2, 2011 3:03 PM
  • Or maybe I should find a regex that matches anything but a period.

    Would file://vtweb50/publications[^\.] work?


    \Martin
    Wednesday, March 2, 2011 3:05 PM
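
The "anything but a period" idea can be tried out in a small sketch. Two assumptions apply: Python's re module stands in for SharePoint's matcher, and a rule must match the whole URL. The second pattern, the helper matches_whole_url and the sample paths are hypothetical.

```python
import re

# Pattern as posted: the negated group matches exactly ONE character
# that is not a period, directly after "publications".
AS_POSTED = re.compile(r"file://vtweb50/publications[^\.]")

# Hypothetical variant: a slash, then one or more non-period characters.
REPEATED = re.compile(r"file://vtweb50/publications/[^.]+")


def matches_whole_url(pattern, url):
    """Return True when the pattern matches the entire URL."""
    return pattern.fullmatch(url) is not None


if __name__ == "__main__":
    base = "file://vtweb50/publications/"
    for path in ["", "docs", "docs/sub folder", "docs/report.pdf", "report.pdf"]:
        url = base + path
        print(url,
              matches_whole_url(AS_POSTED, url),
              matches_whole_url(REPEATED, url))
```

Under these assumptions the posted pattern only covers the publications folder itself, while the repeated variant covers any path below it that contains no period. A folder with a period in its name would be missed, and a file without an extension would be caught.
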
  • Or just file://vtweb50/publications/* without regex. That will remove everything starting with that path.

    -m


    Wednesday, March 2, 2011 6:23 PM
  • The point is that I only want to remove folders, not the files in the fileshare.

    When I try [\d\w\s] or [^\.], SharePoint converts every \ to / when I press OK. That can't be by design... is that a bug?


    \Martin
    Thursday, March 3, 2011 6:52 AM
  • Ok, so you want all files in the folder file://vtweb50/publications, but not any folders?

    Using this include regexp will include all files in the folder, and skip all folders:

    file://vtweb50/publications/[-a-zA-Z0-9%_]+[.][-a-zA-Z0-9%_]+
    

    It seems \w, \d and the other regexp escape operators are not supported. The character groups should cover all upper and lower case letters, numbers, encoded characters, hyphen and underscore.

    You might have to add an exclude rule after the include with file://vtweb50/publications/*, if the first one doesn't exclude everything else, as the rules fall through.

    -m


    • Marked as answer by Martin Rossen Thursday, March 3, 2011 9:24 AM
    Thursday, March 3, 2011 7:44 AM
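
The include-then-exclude ordering from the accepted answer can be sketched as a first-match rule list. This is an illustration under stated assumptions, not SharePoint's implementation: Python's re module stands in for the crawl-rule matcher, a rule must match the whole URL, the wildcard rule file://vtweb50/publications/* is rewritten as a regexp, and URLs that hit no rule are included by default. The function decide and the sample URLs are hypothetical.

```python
import re

# Rules in evaluation order: the include regexp from the answer first,
# then the catch-all exclude for everything else under the folder.
RULES = [
    ("include", re.compile(
        r"file://vtweb50/publications/[-a-zA-Z0-9%_]+[.][-a-zA-Z0-9%_]+")),
    ("exclude", re.compile(r"file://vtweb50/publications/.*")),
]


def decide(url, rules=RULES, default="include"):
    """Return the action of the first rule that matches the whole URL."""
    for action, pattern in rules:
        if pattern.fullmatch(url):
            return action
    return default  # assumed behaviour when no rule applies


if __name__ == "__main__":
    for url in [
        "file://vtweb50/publications/report.pdf",
        "file://vtweb50/publications/annual%20report.pdf",
        "file://vtweb50/publications/archive",
        "file://vtweb50/publications/archive/report.pdf",
    ]:
        print(url, decide(url))
```

Under these assumptions, files directly in publications are included and folders are excluded. Files inside subfolders are excluded too, because / is not in the character group, and so are names with an unencoded space or more than one period.
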