Unpivot is losing columns when parsing a folder of CSVs with a dynamic number of columns

  • Question

  • The scenario: I receive a weekly report of processed items for a number of different programs that vary each cycle. One week there might be three, the next five. I can successfully parse either type with a single-source query, but when I convert that into a function it loses the last column as it parses each file.

    Here is the single-source routine:

    let
        Source = Table.FromColumns({Lines.FromBinary(File.Contents("C:\Users\gartman\Desktop\Jan Conference\Westshore FD622 Folder\Special Processing\Long File Names\_dcotnaspd1.som.ad.state.mi.us_DCHFilenet_FD622_batchuploadfolders_fnedmsdocimport_img_filesForDMS_18194_ac_MDCH_e622_op_1407949191_20140904_20140905_summary.csv"),null,null,1252)}),
        #"Split Column by COMMA" = Table.SplitColumn(Source,"Column1",Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv),{"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10"}),
        #"Removed Top 3 Rows" = Table.Skip(#"Split Column by COMMA",3),
        #"Replaced Value" = Table.ReplaceValue(#"Removed Top 3 Rows","","0",Replacer.ReplaceValue,{"Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9"}),
        #"Promoted Headers" = Table.PromoteHeaders(#"Replaced Value"),
        #"Delete 2 Next Rows" = Table.Skip(#"Promoted Headers",2),
        #"Add Headers for First 4 Rows" = Table.RenameColumns(#"Delete 2 Next Rows",{{"", "Year"}, {"_1", "MIP"}, {"Section", "Type"}, {"_2", "Cost Center"}}),
        #"Remove row with no YEAR" = Table.SelectRows(#"Add Headers for First 4 Rows", each ([Year] <> "")),
        #"Unpivot to Normalize" = Table.UnpivotOtherColumns(#"Remove row with no YEAR", {"Year", "MIP", "Type", "Cost Center"}, "Attribute", "Value"),
        #"Changed Value column to Decimal" = Table.TransformColumnTypes(#"Unpivot to Normalize",{{"Value", type number}}),
        #"Removed Value Column Errors" = Table.RemoveRowsWithErrors(#"Changed Value column to Decimal", {"Value"}),
        #"Remove Empty/Zero Values" = Table.SelectRows(#"Removed Value Column Errors", each ([Value] <> null and [Value] <> 0))
    in
        #"Remove Empty/Zero Values"

    Which becomes this function:

    (FD622Parsed as text) as table=>
    let
        Source = Csv.Document(File.Contents (FD622Parsed),[Delimiter=",",Encoding=1252]),
        #"Split Column by COMMA" = Table.SplitColumn(Source,"Column1",Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv),{"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10", "Column1.11"}),
        #"Removed Top 3 Rows" = Table.Skip(#"Split Column by COMMA",3),
        #"Replaced Value" = Table.ReplaceValue(#"Removed Top 3 Rows","","0",Replacer.ReplaceValue,{"Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10", "Column1.11"}),
        #"Promoted Headers" = Table.PromoteHeaders(#"Replaced Value"),
        #"Delete 2 Next Rows" = Table.Skip(#"Promoted Headers",2),
        #"Add Headers for First 4 Rows" = Table.RenameColumns(#"Delete 2 Next Rows",{{"", "Year"}, {"_1", "MIP"}, {"Section", "Type"}, {"_2", "Cost Center"}}),
        #"Remove row with no YEAR" = Table.SelectRows(#"Add Headers for First 4 Rows", each ([Year] <> "")),
        #"Unpivot to Normalize" = Table.UnpivotOtherColumns(#"Remove row with no YEAR", {"Year", "MIP", "Type", "Cost Center"}, "Attribute", "Value"),
        #"Changed Value column to Decimal" = Table.TransformColumnTypes(#"Unpivot to Normalize",{{"Value", type number}}),
        #"Removed Value Column Errors" = Table.RemoveRowsWithErrors(#"Changed Value column to Decimal", {"Value"}),
        #"Remove Empty/Zero Values" = Table.SelectRows(#"Removed Value Column Errors", each ([Value] <> null and [Value] <> 0))
    in
        #"Remove Empty/Zero Values"

    The problem file is only partially parsed and is missing its last column; all the other data comes through without issue. Things seem to get messed up at the EXPAND step in this routine:

    let
        Source = Folder.Files("C:\Users\gartman\Desktop\Jan Conference\WS Test\shortNames"),
        #"Get folder file path and file names" = Table.RemoveColumns(Source,{"Extension", "Date accessed", "Date modified", "Date created", "Attributes", "Content"}),
        #"Use ParseUnpivot to get shaped tables of each file found" = Table.AddColumn(#"Get folder file path and file names", "parseFD622s", each ParseUnpivot([Folder Path]&[Name])),
        #"Expanded parseFD622s1" = Table.ExpandTableColumn(#"Use ParseUnpivot to get shaped tables of each file found", "parseFD622s", {"Year", "MIP", "Type", "Cost Center", "Attribute", "Value"}, {"Year", "MIP", "Type", "Cost Center", "Attribute", "Value"}),
        #"Remove unneeded columns 1st pass" = Table.RemoveColumns(#"Expanded parseFD622s1",{"Folder Path"}),
        #"Parse the file names by underscores" = Table.SplitColumn(#"Remove unneeded columns 1st pass","Name",Splitter.SplitTextByDelimiter("_", QuoteStyle.Csv),{"Name.1", "Name.2", "Name.3", "Name.4", "Name.5", "Name.6", "Name.7", "Name.8", "Name.9", "Name.10", "Name.11", "Name.12", "Name.13", "Name.14", "Name.15", "Name.16", "Name.17"}),
        #"Removed Columns" = Table.RemoveColumns(#"Parse the file names by underscores",{"Name.1", "Name.2", "Name.7", "Name.8", "Name.9", "Name.10", "Name.11", "Name.12", "Name.13", "Name.14", "Name.15", "Name.16", "Name.17"}),
        #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Name.3", "Patient Type"}, {"Name.4", "Billing NPI"}, {"Name.5", "Cycle Date"}, {"Name.6", "Issue Date"}, {"Attribute", "Program"}, {"Value", "Amount"}}),
        #"Filtered Rows" = Table.SelectRows(#"Renamed Columns", each true),
        #"Changed Type" = Table.TransformColumnTypes(#"Filtered Rows",{{"Patient Type", type text}, {"Billing NPI", type text}, {"Cycle Date", type text}, {"Issue Date", type text}, {"Year", type text}, {"MIP", type text}, {"Type", type text}, {"Cost Center", type text}, {"Program", type text}}),
        #"Uppercased Text" = Table.TransformColumns(#"Changed Type",{{"Patient Type", Text.Upper}}),
        #"Changed Type1" = Table.TransformColumnTypes(#"Uppercased Text",{{"Amount", type number}})
    in
        #"Changed Type1"

    Originally, the source files had blanks in many of the columns. Before setting headers, I force all blanks to zero so that every column has a value when unpivoted. Later I remove the zero values, since they are of no consequence.
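
    For illustration, the blank-to-zero / unpivot / drop-zeros pattern looks roughly like this on a toy table (the column names here are made up, not the real report layout):

    let
        // Toy table standing in for one parsed report
        Demo = #table({"Year", "ProgA", "ProgB"}, {{"2014", "5", ""}, {"2015", "", "7"}}),
        // Force blanks to zero so every program column has a value
        Zeroed = Table.ReplaceValue(Demo, "", "0", Replacer.ReplaceValue, {"ProgA", "ProgB"}),
        // Unpivot everything except the key column(s)
        Unpivoted = Table.UnpivotOtherColumns(Zeroed, {"Year"}, "Attribute", "Value"),
        Typed = Table.TransformColumnTypes(Unpivoted, {{"Value", type number}}),
        // Zero rows carry no information, so drop them
        NoZeros = Table.SelectRows(Typed, each [Value] <> null and [Value] <> 0)
    in
        NoZeros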

    Is there some unknown limit to unpivot that I am missing?

    It is frustrating, since the single-source query works just fine, but I need to pull info from the file name to fully set up each row.

    Any help will be appreciated.

    Wednesday, January 27, 2016 2:24 PM

Answers

  • To me the #"Split Column by COMMA" step in the function looks useless. It does its job in the single-source routine (a very strange way to open a CSV document, by the way), but in the function the opening of the CSV file already looks straightforward (step: Source), so there is no need for the second step. Maybe this causes the error.
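
    For illustration, the start of the function without the redundant split might look roughly like this (an untested sketch; note that Csv.Document names its columns "Column1", "Column2", ..., so the later hardcoded "Column1.x" references would have to change accordingly):

    (FD622Parsed as text) as table =>
    let
        // Csv.Document already splits on the delimiter, so no separate split step is needed
        Source = Csv.Document(File.Contents(FD622Parsed), [Delimiter = ",", Encoding = 1252, QuoteStyle = QuoteStyle.Csv]),
        #"Removed Top 3 Rows" = Table.Skip(Source, 3)
        // ... remaining steps as in the original function, but on "Column5", "Column6", ... instead of "Column1.5", "Column1.6", ...
    in
        #"Removed Top 3 Rows"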

    As a general rule when working with data that has a varying number of columns: watch the code for hardcoded column names like "Column1.1". Those steps will not adjust automatically. So it is best to unpivot (other columns) as soon as possible, then do the necessary transformations row-wise (i.e. instead of renaming columns, you just replace values in the Attribute column of the unpivoted table), before pivoting back as the last step. That way there will be no hardcoded column names anywhere.
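
    Roughly, that pattern looks like this on a toy table (all names below are only placeholders):

    let
        // Toy source whose number of program columns could change from file to file
        Source = #table({"Key", "Prog A", "Prog B"}, {{"2014", 1, 2}, {"2015", 3, 4}}),
        // Unpivot as early as possible so later steps never reference the variable columns by name
        Unpivoted = Table.UnpivotOtherColumns(Source, {"Key"}, "Attribute", "Value"),
        // Do the "column" work row-wise, e.g. rename by replacing values in the Attribute column
        Renamed = Table.ReplaceValue(Unpivoted, "Prog A", "Program A", Replacer.ReplaceValue, {"Attribute"}),
        // Pivot back as the very last step, only if the wide layout is still needed
        Pivoted = Table.Pivot(Renamed, List.Distinct(Renamed[Attribute]), "Attribute", "Value")
    in
        Pivoted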


    Imke Feldmann TheBIccountant.com

    Wednesday, January 27, 2016 3:08 PM
    Moderator

All replies

  • No luck yet.

    The reason I load text and then split is that the file will not parse properly otherwise; I believe the header rows are messing up the real columns below.
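
    For reference, carrying that text-then-split approach into the function would just swap the hardcoded path for the parameter, roughly like this (sketch only, not what I ended up using):

    (FD622Parsed as text) as table =>
    let
        // Read the file as raw lines, exactly as in the working single-source query
        Source = Table.FromColumns({Lines.FromBinary(File.Contents(FD622Parsed), null, null, 1252)}),
        // Then split on commas as before
        #"Split Column by COMMA" = Table.SplitColumn(Source, "Column1", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), {"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10", "Column1.11"})
        // ... remaining steps unchanged from the single-source routine
    in
        #"Split Column by COMMA"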

    To work around the issue, I created a special handling process: I move the problem file out of the main folder into a special processing folder, manually process it into a single-file table, and copy and append the result to an Excel table of manual items. I then append that to the main data result to get a combined list.
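
    The final step is then just appending the two results, something like this (the query names are placeholders for my actual queries):

    let
        // Main folder query result plus the manually maintained Excel table
        Combined = Table.Combine({MainFolderResult, ManualItemsFromExcel})
    in
        Combined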

    Bummer, but there is only so much time and other things to do. Plus 2013 is coming, and I expect maintenance issues.

    Thanks for your advice.

    Wednesday, February 17, 2016 1:23 PM