Pdf.Tables inconsistent table names and table structure

  • General discussion

  • Hello,

    Every day I receive an updated PDF report that contains tables.

    The PDF arrives by email, and a Flow saves it to my OneDrive.

    In Power BI Power Query, I use Folder.Files() to select the PDFs from the last 5 days and extract the data.
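
    The file selection could be sketched roughly as follows, assuming "the last 5 days" is judged by the file's modification date (the post does not say which date is used). The helper name KeepRecentPdfs, the sample rows, and the fixed reference date are hypothetical; the column names are the ones Folder.Files() returns.

    ```powerquery
    let
        // Hypothetical helper: keep PDF rows modified within the last `days` days up to `today`.
        KeepRecentPdfs = (files as table, today as date, days as number) as table =>
            Table.SelectRows(
                files,
                each Text.Lower([Extension]) = ".pdf"
                    and DateTime.Date([Date modified]) > Date.AddDays(today, -days)
            ),

        // Stand-in for the output of Folder.Files(...), reduced to the columns used here.
        SampleFiles = #table(
            {"Name", "Extension", "Date modified"},
            {
                {"report_0214.pdf", ".pdf", #datetime(2020, 2, 14, 6, 0, 0)},
                {"report_0205.pdf", ".pdf", #datetime(2020, 2, 5, 6, 0, 0)},
                {"notes.txt", ".txt", #datetime(2020, 2, 14, 7, 0, 0)}
            }
        ),
        // With real data, pass DateTime.Date(DateTime.LocalNow()) as `today`.
        Recent = KeepRecentPdfs(SampleFiles, #date(2020, 2, 14), 5)
    in
        Recent
    ```

    With the sample rows, only report_0214.pdf is kept: the other PDF is older than the cutoff and the text file has the wrong extension.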

    I extract data from some of the tables with the Power Query Pdf.Tables() function. (This function is still missing in Excel Power Query!)

    I have encountered 2 problems so far:

    1. The name of the table is not always the same from day to day.

    The headers in the table have consistent names, so I first look up the table name by expanding all tables of the source, then search for a header name and store the matching table name in a variable.

    2. The table contains the same data every day, but sometimes the columns are not split correctly.

    In my case, there are some columns at the end that I do not use, which gives me some margin for my solution.
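
    The header lookup from problem 1 might look something like the sketch below. It assumes the header text sits in the first data row of each table (that is, headers have not been promoted) and that the navigation table has the Id, Name, Kind and Data columns that Pdf.Tables() returns. The helper name FindTableByHeader, the header text "Order No" and the sample tables are hypothetical.

    ```powerquery
    let
        // Hypothetical helper: return the Data of the first table whose first row contains headerText.
        FindTableByHeader = (nav as table, headerText as text) as table =>
            let
                Matches = Table.SelectRows(
                    nav,
                    each [Data] is table
                        and Table.RowCount([Data]) > 0
                        and List.Contains(Record.FieldValues(Table.First([Data])), headerText)
                )
            in
                if Table.IsEmpty(Matches) then
                    error "No table contains the header: " & headerText
                else
                    Matches{0}[Data],

        // Stand-in for Pdf.Tables(File.Contents(...)); the table names differ from day to day.
        Sample = #table(
            {"Id", "Name", "Kind", "Data"},
            {
                {"Table001", "Table001 (Page 1)", "Table",
                    #table({"Column1", "Column2"}, {{"Summary", "Total"}, {"A", "1"}})},
                {"Table007", "Table007 (Page 2)", "Table",
                    #table({"Column1", "Column2"}, {{"Order No", "Amount"}, {"1001", "25.00"}})}
            }
        ),
        Result = FindTableByHeader(Sample, "Order No")
    in
        Result
    ```

    Selecting by content rather than by Id or Name means the query does not depend on whichever table name the extraction assigns on a given day.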

    My solution:

    First, make sure that Table.PromoteHeaders() and Table.TransformColumnTypes() are not among the automatically created steps. All column headers are then Column1, Column2, etc.

    Create a new column that concatenates all columns into one. I apply Text.PadEnd(columnx,30) to every column so that I can later split some columns based on the number of characters.

    Then remove all columns except the new concatenated one.

    Split the new column back into the columns needed for the report, based on the number of characters or on delimiters.
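
    The steps above can be sketched as follows. This is only an illustration under assumptions: the sample table, the split positions, and the output column names OrderNo, Item and Amount are hypothetical, and every field is assumed to be shorter than the pad width of 30 (Text.PadEnd does not truncate longer values).

    ```powerquery
    let
        Width = 30,
        // Stand-in for one extracted table without promoted headers or typed columns.
        Source = #table(
            {"Column1", "Column2", "Column3"},
            {{"1001", "Widget", "25.00"}, {"1002", "Gadget", "7.50"}}
        ),
        // Concatenate all columns of each row, padding every value to a fixed width.
        // Nulls become empty text; other values are converted with Text.From.
        AddCombined = Table.AddColumn(
            Source,
            "Combined",
            each Text.Combine(
                List.Transform(
                    Record.FieldValues(_),
                    (v) => Text.PadEnd(if v = null then "" else Text.From(v), Width)
                )
            ),
            type text
        ),
        // Remove all other columns but the concatenated one.
        OnlyCombined = Table.SelectColumns(AddCombined, {"Combined"}),
        // Split back on fixed character positions (illustrative offsets).
        Split = Table.SplitColumn(
            OnlyCombined,
            "Combined",
            Splitter.SplitTextByPositions({0, Width, 2 * Width}),
            {"OrderNo", "Item", "Amount"}
        ),
        // Strip the padding again.
        Trimmed = Table.TransformColumns(Split, {}, Text.Trim)
    in
        Trimmed
    ```

    Using Record.FieldValues(_) avoids hardcoding the column names in the concatenation step, so the query keeps working if the extraction produces more or fewer columns; only the split positions still need to be adjusted by hand.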

    This works fine for now, but there is some hardcoding involved and I would like to make the query more flexible so it handles all variations that can occur in the table name and table columns.

    Does anyone have experience with this issue?

    And please vote and nag Microsoft to include the Pdf.Tables() function in Excel!

    Regards,

    Dirk.

    Friday, February 14, 2020 2:35 PM

All replies

  • I have been using the Pdf.Tables() function in Excel queries even though the documentation in the M function list has not been updated. I assume this is a different implementation from the "connector" mentioned by guyhunkin here.

    elc




    • Edited by GeneQuery2 Monday, May 18, 2020 5:26 PM added link
    Monday, May 18, 2020 5:09 PM