locked
Import Files from Folder: Same structure of files however New Columns introduced that must be added RRS feed

  • Question

  • Hello Experts,

    I have over 300 Excel files that contain ~800 rows each of data and 30 columns.  The structure of the file is stable and consistent and will not change.  The columns have titles which include dates (mm/dd/yyyy) and calendar weeks (CW 02, CW 03, etc...) along with other text titles.  The dates are in the date format and the calendar weeks are in text format with a separation between CW & the week number.  Each file shows the planning requirements snapshot for the next weeks.  The files are from the MRP system and show the planning numbers for products for the given date/CW in the respective column.

    The issue is the following:
    When the new file comes in, I need to shift the contents of the new file to match the header titles and then any new columns that are not represented in the last file will need to be added.  Example below but the images further down (that I can't post right now but will when I can) will show a better idea of the issue.

    File Import #1 - Probably my Sample File when setting up the template in PQ

    Part#  CW01  CW02  CW03 CW04 CW05
    Part1   100     200     100    200    100

    File Import #2 - same number of headers but the data is for some different weeks

    Part#  CW02  CW03  CW04 CW05 CW06
    Part1   200     100    200    100    200

    As you can see, If I use File Import #1 as my Sample File, the headers are created for CW 01 - CW 05.  If I import File #2, I need the CW to be shifted on the new file to match the Sample file headers and then create a new Column for CW 06 to deposit the data.

    After PQ Sort - what I'd like to see as a result

    Part#  CW01  CW02  CW03 CW04 CW05  CW06 <-new column added
    Part1   100     200     100    200    100     null
    Part1   null     200     100    200    100    200

    As I do this import throughout the year, I will have the history of every part, for every import that I process.  
    What I'm requesting help with is the following:

    • After establishing the sample file on the 1st file, how to make the subsequent files line up with the correct CW headers
    • Create a new columns to extend the table to place the new CW data into throughout the year (every growing)
    • Change the CW titles to dates.weekofyear instead of text

    I know this is a lot and I really appreciate your time, if this is even possible.  I've made images but this forum isn't allowing me to post at this time.  If you want to see them, please let me know.  After I get access to post images, I'll update the question if the forum will allow it.

    Thank you,

    Sidney


    Tuesday, December 17, 2019 7:34 PM

Answers

  • Hi Sydney

    The following does what you want (except if I missed something important). A few comments:

    • In your querying file you seem to show 2 different expected results where 1 seem to be sorted, not the other. This wasn't really clear to me so nothing is sorted anywhere. If you need to, add steps to query Result and you should be fine
    • Again 2 different columns names (FP vs Part#) between expected results. I kept "FP"
    • In your desired output [FP] show rows with null values. Not sure this is what you want ultimately but as I wasn't sure I didn't filtered them out...
    • What makes your consolidation a bit tricky is: your data are not formatted as Excel Tables. No problem but the sheet name to combine is different in each file. Fortunately you only have 1 sheet per file so this was manageable
    • Re. [FileName] I had to assume there is always find "CW" somewhere in the middle (easy to change) of the actual file name
    • For readability and maintenance (changes/additions) reasons I split the process in 2 queries and 2 functions

    Query Result code (if you don't need to Sort and/or add steps after combining file you don't need that one, just use query CombineSources):

    let
        Source = CombineSources
    in
        Source

    Query CombineSources code:

    let
        Source = Folder.Files(PathToFilesToCombines),
        FilteredOutHiddenFiles = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
        SelectedRequiredColumns = Table.SelectColumns(FilteredOutHiddenFiles,{"Content", "Name"}),
        RenamedNameColumn = Table.RenameColumns(SelectedRequiredColumns,{{"Name", "FileName"}}),
        TransformedFileName = Table.TransformColumns(RenamedNameColumn,
            {"FileName", (state)=> "CW" & Text.BeforeDelimiter(Text.AfterDelimiter(state," CW"),".",{0, RelativePosition.FromEnd})}),
    AddedSheetContent = Table.AddColumn(TransformedFileName, "SheetContent", each fnGetSheetContent([Content]), Table.Type), TransformedSheetContent = Table.AddColumn(AddedSheetContent, "SheetTable", each fnTranformSheetContent([SheetContent]), Table.Type), SelectedReqColumns = Table.SelectColumns(TransformedSheetContent,{"FileName","SheetTable"}), ColumnsToExpand = List.Union(List.Transform(SelectedReqColumns[SheetTable], (state)=> Table.ColumnNames(state))), ExpandedTables = Table.ExpandTableColumn(SelectedReqColumns,"SheetTable",ColumnsToExpand), columnNames = Table.ColumnNames(ExpandedTables), numColumns = List.Difference(columnNames,{"FileName","FP"}), numTypes = List.Repeat({Number.Type},List.Count(numColumns)), zipTransform = List.Zip({columnNames,List.Combine({{Text.Type},{Text.Type},numTypes})}), ChangedTypes = Table.TransformColumnTypes(ExpandedTables,zipTransform) in ChangedTypes


    Function fnGetSheetContent code:

    (binSource as binary) as table =>
    let
        ExcelContent = Excel.Workbook(binSource, null, true),
        SheetName = Table.FirstValue(Table.SelectRows(ExcelContent, each ([Kind]="Sheet"))),
        SheetContent = ExcelContent{[Item=SheetName,Kind="Sheet"]}[Data]
    in
        SheetContent


    Function fnTranformSheetContent code

    (inTable as table) as table =>
    let
        Source = inTable,
        PromotedHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
        TrimmedColumnNames = Table.TransformColumnNames(PromotedHeaders,Text.Trim),
        ColumnNames = Table.ColumnNames(TrimmedColumnNames),
            posBACKLOG = List.PositionOf(ColumnNames,"BACKLOG"),
            posRESTWK = List.PositionOf(ColumnNames,"REST WK"),
            dateColumns = List.Skip(List.FirstN(ColumnNames,posRESTWK),posBACKLOG+1),
            cwColumns = List.Select(ColumnNames, each Text.StartsWith(_,"CW")),
            reqColumns = {"FP","BACKLOG"} & dateColumns & {"REST WK"} & cwColumns,
        SelectedReqColumns = Table.SelectColumns(TrimmedColumnNames,reqColumns),
            dayColumns = List.Transform(
                List.Positions(dateColumns), (i)=> "Day " & Text.From(i +1)
                ),
            zipRename = List.Zip({dateColumns,dayColumns}),
        RenamedColumns = Table.RenameColumns(SelectedReqColumns,zipRename)
    in
        RenamedColumns

    Corresponding workbook avail. here (don't forget to change PathToFilesToCombines in query CombineSource) and if you want to avoid hard-coding PathToFilesToCombines in the query, there are n examples on this forum...

    If this solves your problem please Mark as answer (can help others with similar scenario). Any question re. the above let me know


    • Edited by Lz._ Monday, December 23, 2019 8:21 AM adjust code CombineSource
    • Proposed as answer by Lz._ Thursday, January 9, 2020 6:04 PM
    • Marked as answer by SidJenkins Monday, February 3, 2020 5:45 PM
    Sunday, December 22, 2019 10:11 AM

All replies

  • Hi Sydney

    1st thing confusing: The columns have titles which include dates (mm/dd/yyyy) and calendar weeks (CW 02, CW 03, etc...). Then later you show tables (File import) with [Part#] & [CWnn] only
    Where are the columns with dates (mm/dd/yyyy)???

    In the meantime...

    In your scenario, from one source file to another, you have a varying number of columns. In such case you cannot easily use (+adapt) the standard import wizard. As you've observed, if File1 has columns CW01-CW05, File2 has CW01-CW06 and you select File1 as sample file then column CW06 from File2 is ignored - whatever the sources of your files are (Db, XLSX, CSV…) the problematic remains the same
    In a nutshell you have to build your own solution/code. Your situation is very similar to Combining Files From Folder in Power Query where you'll find samples for diff. type of sources

    Other thing that is unclear: Change the CW titles to dates.weekofyear instead of text

    When should this change happen in your process (probably after consolidating the sources files but confirm please)? More importantly, where do we get the info to calc. the WeekOfYear (this Q. probably relate to #1 re. mm/dd/yyyy but not sure)

    Easiest way to clarify everything would be: ZIP 2 dummy source files + a workbook showing the expected result. Then upload & share that ZIP file on OneDrive, GoogleDrive… and post the link here

    Tuesday, December 17, 2019 10:07 PM
  • Hi,

    I'd like to see the pictures/files and the transform function that is called that leads you to: "As you can see, If I use File Import #1 as my Sample File, the headers are created for CW 01 - CW 05."

    On that function, if I understood you correctly, you could remove "columns = #" parameter and it would work as expected

    Wednesday, December 18, 2019 7:36 AM
  • Hello,

    I'm still not verified by the Forum administrator and therefore cannot send the images.  If you know a trick for them to verify my account, it would be appreciated.  I've looking in my email folders (all of them) and do not see any attempt from them to contact me for verification.

    Anyway, the images that I can send will show the date format and the other columns with CWs.  I will send you the link in a few minutes to give you the file access.

    Thank you for your help both Lz._ & S.Risemann

    Talk soon

    Friday, December 20, 2019 7:36 PM
  •                     

    Hello All,

    https://www.dropbox.com/s/pc1qz3y3hqn1zed/Forum.zip?dl=0

    Above is the link requested.  You will find the source files in the folder along with the Process file & .pptx file for more information.  Please let me know if you need anything else.

    Thank you,
    Sidney


    Friday, December 20, 2019 10:41 PM
  • Hi Sydney

    The following does what you want (except if I missed something important). A few comments:

    • In your querying file you seem to show 2 different expected results where 1 seem to be sorted, not the other. This wasn't really clear to me so nothing is sorted anywhere. If you need to, add steps to query Result and you should be fine
    • Again 2 different columns names (FP vs Part#) between expected results. I kept "FP"
    • In your desired output [FP] show rows with null values. Not sure this is what you want ultimately but as I wasn't sure I didn't filtered them out...
    • What makes your consolidation a bit tricky is: your data are not formatted as Excel Tables. No problem but the sheet name to combine is different in each file. Fortunately you only have 1 sheet per file so this was manageable
    • Re. [FileName] I had to assume there is always find "CW" somewhere in the middle (easy to change) of the actual file name
    • For readability and maintenance (changes/additions) reasons I split the process in 2 queries and 2 functions

    Query Result code (if you don't need to Sort and/or add steps after combining file you don't need that one, just use query CombineSources):

    let
        Source = CombineSources
    in
        Source

    Query CombineSources code:

    let
        Source = Folder.Files(PathToFilesToCombines),
        FilteredOutHiddenFiles = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
        SelectedRequiredColumns = Table.SelectColumns(FilteredOutHiddenFiles,{"Content", "Name"}),
        RenamedNameColumn = Table.RenameColumns(SelectedRequiredColumns,{{"Name", "FileName"}}),
        TransformedFileName = Table.TransformColumns(RenamedNameColumn,
            {"FileName", (state)=> "CW" & Text.BeforeDelimiter(Text.AfterDelimiter(state," CW"),".",{0, RelativePosition.FromEnd})}),
    AddedSheetContent = Table.AddColumn(TransformedFileName, "SheetContent", each fnGetSheetContent([Content]), Table.Type), TransformedSheetContent = Table.AddColumn(AddedSheetContent, "SheetTable", each fnTranformSheetContent([SheetContent]), Table.Type), SelectedReqColumns = Table.SelectColumns(TransformedSheetContent,{"FileName","SheetTable"}), ColumnsToExpand = List.Union(List.Transform(SelectedReqColumns[SheetTable], (state)=> Table.ColumnNames(state))), ExpandedTables = Table.ExpandTableColumn(SelectedReqColumns,"SheetTable",ColumnsToExpand), columnNames = Table.ColumnNames(ExpandedTables), numColumns = List.Difference(columnNames,{"FileName","FP"}), numTypes = List.Repeat({Number.Type},List.Count(numColumns)), zipTransform = List.Zip({columnNames,List.Combine({{Text.Type},{Text.Type},numTypes})}), ChangedTypes = Table.TransformColumnTypes(ExpandedTables,zipTransform) in ChangedTypes


    Function fnGetSheetContent code:

    (binSource as binary) as table =>
    let
        ExcelContent = Excel.Workbook(binSource, null, true),
        SheetName = Table.FirstValue(Table.SelectRows(ExcelContent, each ([Kind]="Sheet"))),
        SheetContent = ExcelContent{[Item=SheetName,Kind="Sheet"]}[Data]
    in
        SheetContent


    Function fnTranformSheetContent code

    (inTable as table) as table =>
    let
        Source = inTable,
        PromotedHeaders = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
        TrimmedColumnNames = Table.TransformColumnNames(PromotedHeaders,Text.Trim),
        ColumnNames = Table.ColumnNames(TrimmedColumnNames),
            posBACKLOG = List.PositionOf(ColumnNames,"BACKLOG"),
            posRESTWK = List.PositionOf(ColumnNames,"REST WK"),
            dateColumns = List.Skip(List.FirstN(ColumnNames,posRESTWK),posBACKLOG+1),
            cwColumns = List.Select(ColumnNames, each Text.StartsWith(_,"CW")),
            reqColumns = {"FP","BACKLOG"} & dateColumns & {"REST WK"} & cwColumns,
        SelectedReqColumns = Table.SelectColumns(TrimmedColumnNames,reqColumns),
            dayColumns = List.Transform(
                List.Positions(dateColumns), (i)=> "Day " & Text.From(i +1)
                ),
            zipRename = List.Zip({dateColumns,dayColumns}),
        RenamedColumns = Table.RenameColumns(SelectedReqColumns,zipRename)
    in
        RenamedColumns

    Corresponding workbook avail. here (don't forget to change PathToFilesToCombines in query CombineSource) and if you want to avoid hard-coding PathToFilesToCombines in the query, there are n examples on this forum...

    If this solves your problem please Mark as answer (can help others with similar scenario). Any question re. the above let me know


    • Edited by Lz._ Monday, December 23, 2019 8:21 AM adjust code CombineSource
    • Proposed as answer by Lz._ Thursday, January 9, 2020 6:04 PM
    • Marked as answer by SidJenkins Monday, February 3, 2020 5:45 PM
    Sunday, December 22, 2019 10:11 AM
  • Hi Sydney

    Did you have an opportunity to test the above proposal? Did it work as expected? If so please use the Mark as answer hyperlink at the bottom otherwise let me know what does work. Thanks

    Wednesday, January 1, 2020 5:08 AM
  • Hello,

    I marked as answer.  I learned so much from your post and I really appreciate your time to work this out.  Of course i had to change just a couple of things since I gave a hypothetical approach but I'm very happy.  Thank you again.

    Sidney

    Monday, February 3, 2020 5:46 PM
  • Hi Sydney

    Glad I could help & Thanks for posting back (TBH ad > 1 month with no news I wasn't expecting anymore)

    Monday, February 3, 2020 8:19 PM