locked
unstructured excel workbook: PowerQuery RRS feed

  • Question

  • Hi guys,

        I have inherited a 10 year collection of unstructured excell workbooks, where there are not tables etc. defines, but all of them have the same format, and calculations (i.e same cells have similar values/context across all workbooks). I have been able to import all of them using power query, using the from file -> from folder option. 

       However, from this workbook, I am interested only is a few salient cells, which will help me generate some useful summary. So, I have the cell references for them , which are sprinkled across sheets : A1:85 in sheet 1, D45:F45 in sheet 2 etc. 

       I am interested in getting some advice on how to pick out only the cells that I need using power query. Since this is a legacy workbook, I will not be able to make any changes to the workbook, only to "get and transform" data while import.

       My question is, is there a M -code / PowerQuery way I can pick and choose what I want to accumulate?

       I did run a search on this forum, and although I found lot of results dealing with named ranges etc, I do not think they address this problem specifically.

       I would be happy to hear what the forum thinks.

       Thanks,

        Sid.

    Thursday, May 18, 2017 3:17 PM

Answers

All replies

  • Hi Sid. Have you tried accessing the sheet you're interested, and then deleting the unneeded columns and/or rows?

    Ehren

    Tuesday, May 30, 2017 5:46 PM
  • My general approach here would be:

    1. Use the query editor to build the query with one specific workbook (we'll call said query "ParseWorkbook" for this example)
    2. Convert the query to a function in the Advanced Editor, insert a line above 'let' (the first line) like so:
      (filepath as text) =>
      let
         <...>
      in
         <...>
    3. Replace the line
         Source = Excel.Workbook("some hard-coded file path"),
      with (exactly as below)
         Source = Excel.Workbook(filepath),
    4. Build another query that reads in the contents of the folder containing all the workbooks
    5. Filter the column(s) 'Extension' and if necessary 'Name' to include only the relevant files.
    6. Add a custom column that uses this as the formula:
         = ParseWorkbook([Folder Path] & [Name])
    7. Continue building out your query, mostly likely by combining the tables or lists or records - whatever your function returns - into a monolithic table/list

    Unfortunately, your question is so general that I can't get more specific than this, but this should give you a good start.


    • Edited by Chris Dutch Wednesday, July 19, 2017 4:30 PM
    Wednesday, July 19, 2017 4:29 PM