How to crawl metadata from RSS Feed or Sitemap file With Enterprise Crawler

  • Question

  • Hi,

    With FAST ESP 5.3 and the Enterprise Crawler, how can I get metadata from sitemap.xml into the FAST pipeline?

    In all my tests, I have never seen the extra_data attribute.

    Thanks


    • Edited by Guillaume BARBERY Friday, October 18, 2013 7:27 PM update to explain the issue
    Thursday, October 17, 2013 6:11 PM

All replies

  • Sitemaps are supported in crawler version 6.6 or newer, which applies to ESP 5.3. This includes storing/indexing metadata from sitemaps.
    When 'use_sitemaps' is enabled in the crawler configuration, the crawler detects and parses sitemaps. The crawler supports sitemap and sitemap index files as defined by the specification at http://www.sitemaps.org/protocol.php, and it supports storing/indexing metadata from sitemaps.
    You can enable doctrace with debug to see the metadata that gets extracted during document processing. Adding metadata to sitemap.xml itself is outside the crawler, so you may have to refer to the sitemap link above.
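
    For reference, the standard per-URL fields defined by the sitemaps.org protocol are loc, lastmod, changefreq and priority. Below is a minimal Python sketch of reading them from a sitemap, independent of the Enterprise Crawler. The function name parse_sitemap and the sample URL are illustrative assumptions, and the sketch does not show how the crawler itself maps these fields to pipeline attributes.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> documents.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Standard child elements of <url> in the protocol.
STANDARD_FIELDS = ("loc", "lastmod", "changefreq", "priority")


def parse_sitemap(xml_text):
    """Return one dict per <url> entry, holding the standard fields present.

    Hypothetical helper for illustration only; not part of FAST ESP.
    """
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall(f"{{{SITEMAP_NS}}}url"):
        entry = {}
        for field in STANDARD_FIELDS:
            value = url.findtext(f"{{{SITEMAP_NS}}}{field}")
            if value is not None:
                entry[field] = value.strip()
        entries.append(entry)
    return entries


# Sample sitemap with made-up values, following the protocol layout.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2013-10-18</lastmod>
    <changefreq>daily</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>"""

if __name__ == "__main__":
    for entry in parse_sitemap(SAMPLE):
        print(entry)
```

    Comparing this list of fields with what doctrace reports for a crawled URL is one way to check which sitemap values actually reach document processing.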

    Friday, October 18, 2013 3:19 PM
  • Hi and thanks for your answer

    I have the exact same problem with RSS feeds: it seems that I am not able to get the extra_data from the crawler.

    I've put some spy stages in the pipelines, and none of them ever shows me the extra_data attribute.

    Is there anything I need to do to enable this attribute?

    Moreover, http://www.sitemaps.org/protocol.php doesn't provide any information regarding additional metadata that could be crawled by FAST. Do I need to use a custom namespace?

    Do you have a sitemap example with additional metadata?

    Thanks

    Friday, October 18, 2013 4:00 PM