Hyperlinks in HTML table

  • Question

  • When importing data from a table on a web page, is there any way to include the href attribute of a hyperlink in the output? For example, suppose you have an HTML table like this:

    <table>
    <tr>
      <td><a href="url1">Row 1, Column 1</a></td>
      <td>Row 1, Column 2</td>
    </tr>
    <tr>
      <td><a href="url2">Row 2, Column 1</a></td>
      <td>Row 2, Column 2</td>
    </tr>
    </table>

    I would like to be able to access the href URLs (url1 and url2 in the example above) as text from inside Power Query.

    Monday, July 22, 2013 6:25 PM

Answers

  • Hi David,

    Currently there isn't a straightforward way of extracting the list of links from an HTML page. The "easiest" way of doing that today would be importing the HTML source file as text and defining some transformations to extract the href attributes (maybe based on split column by delimiter operations, etc.). This can be very painful.

    This is recurring feedback, so in one of our future updates we will enable the ability to get a list of all links within a page, similar to how we offer the list of all tables today.

    Thanks,
    M.

    Monday, July 22, 2013 7:29 PM
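
The workaround described in the answer (read the page as text, then pull out the href attributes) can be illustrated outside Power Query. What follows is only a rough Python sketch of that idea, not Power Query code and not anything the product offers. It assumes the page source is already available as a string, and the names `TableLinkParser` and `extract_table_links` are hypothetical helpers made up for this example. It uses the standard library's `html.parser` instead of delimiter splitting, which tends to be less fragile.

```python
from html.parser import HTMLParser


class TableLinkParser(HTMLParser):
    """Collects (href, link text) pairs for anchors found inside <td> cells.

    Hypothetical helper for illustration; not part of Power Query.
    """

    def __init__(self):
        super().__init__()
        self.links = []        # finished (href, text) pairs
        self._in_cell = False  # True while inside a <td> element
        self._href = None      # href of the anchor currently open, if any
        self._text = []        # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <A> and <a> look the same here.
        if tag == "td":
            self._in_cell = True
        elif tag == "a" and self._in_cell:
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an anchor that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None
        elif tag == "td":
            self._in_cell = False


def extract_table_links(html_text):
    """Return a list of (href, link text) tuples for links in table cells."""
    parser = TableLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.links


# The table from the question, used as sample input.
sample = """
<table>
<tr>
  <td><a href="url1">Row 1, Column 1</A></td>
  <td>Row 1, Column 2</td>
</tr>
<tr>
  <td><a href="url2">Row 2, Column 1</A></td>
  <td>Row 2, Column 2</td>
</tr>
</table>
"""

print(extract_table_links(sample))
# prints [('url1', 'Row 1, Column 1'), ('url2', 'Row 2, Column 1')]
```

The resulting pairs could then be saved (for example as CSV) and loaded as an ordinary table, with the link text serving as a key to join the URLs back onto the rows imported from the page.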

All replies

  • Same question from me: extracting attributes, and filtering elements by their attributes. Thanks anyway, Miguel.

    Saturday, September 7, 2013 8:59 AM