Importing data from 1000 PDFs using Power Query - Starting Point - Individual Excel files or one big Excel file? RRS feed

  • Question

  • I am just getting started using power query and I have 1000 7 page PDFs which have the same structure with different information (employee applications - same fields in each - name, date of application, qualifications, etc. with different responses). I'm just starting the process of determining how to do this. My solution was to convert them all to individual Excel files, open one in Power Query, clean it up so that there is just one row per application, and then run this query on the folder for all of them. I'm not sure how to get them to merge into one spreadsheet if I do it this way... if I'm using range to determine where the different fields are, I don't think it will work if I append them all... pointers on which way to begin to structure this? (One big excel file with all applications appended, or individual Excel files? It may be faster for me to get all the PDFs into one PDF and then export to Excel versus manually saving each to Excel, but I don't know what will be better for the portion of the work in Power Query. Any help is much appreciated!!

    Thank you!

    Using Excel 2010 with Power Query Add-in.

    Tuesday, May 24, 2016 4:26 PM


  • Hi there. Just to clarify: you're converting each of the 1k PDFs to an Excel file, and then attempting to import those via Power query?

    If that's the case, do the resulting "Excel files that used to be PDFs" have the same structure? From what you said above, it almost sounds like they end up with the same fields stored in different locations in each Excel file.

    If you could provide more details, examples, etc. of what the source data looks like (prior to being imported via Power Query) that would be helpful.


    Wednesday, May 25, 2016 6:45 PM