Bizarre parse when applied to folder

  • Question

  • This one is driving me crazy.

I created a query to parse a weekly file:

    let
        Source = Table.FromColumns({Lines.FromBinary(File.Contents(GetValue("current")), null, null, 1252)}),
        #"Split Column by COMMA" = Table.SplitColumn(Source,"Column1",Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv),{"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10"}),
        #"Removed Top 3 Rows" = Table.Skip(#"Split Column by COMMA",3),
        #"Set Data Headers to this row" = Table.PromoteHeaders(#"Removed Top 3 Rows"),
        #"Delete 2 Next Rows" = Table.Skip(#"Set Data Headers to this row",2),
        #"Add Headers for First 4 Rows" = Table.RenameColumns(#"Delete 2 Next Rows",{{"", "Year"}, {"_1", "MIP"}, {"Section", "Type"}, {"_2", "Cost Center"}}),
        #"Filtered Rows" = Table.SelectRows(#"Add Headers for First 4 Rows", each ([Year] <> "" and [Year] <> "Outpatient Program Summary")),
        #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Filtered Rows", {"Year", "MIP", "Type", "Cost Center"}, "Attribute", "Value"),
        #"Renamed Columns" = Table.RenameColumns(#"Unpivoted Other Columns",{{"Attribute", "Program"}, {"Value", "Amount"}}),
        #"Filtered Rows1" = Table.SelectRows(#"Renamed Columns", each ([Amount] <> "" and [Amount] <> "0")),
        #"Changed Type" = Table.TransformColumnTypes(#"Filtered Rows1",{{"Amount", Currency.Type}})
    in
        #"Changed Type"

This generates the expected result.
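
    For reference, GetValue is a helper not shown in the thread; a common pattern for this kind of helper reads a named parameter from an Excel table. A minimal sketch, assuming a workbook table called "Parameters" with "Parameter" and "Value" columns (both names are assumptions):

    (ParameterName as text) =>
    let
        // the parameter table lives in the current workbook
        ParamTable = Excel.CurrentWorkbook(){[Name="Parameters"]}[Content],
        // find the row for the requested parameter and return its stored value
        Match = Table.SelectRows(ParamTable, each [Parameter] = ParameterName),
        Result = Match{0}[Value]
    in
        Result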

I converted the routine to a function:

    (FD622Parsed as text) as table=>
    
    let
        Source = Csv.Document(File.Contents (FD622Parsed),[Delimiter=",",Encoding=1252]),
    
        //Source = Table.FromColumns({Lines.FromBinary(File.Contents(GetValue("current")), null, null, 1252)}),
    
        #"Split Column by COMMA" = Table.SplitColumn(Source,"Column1",Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv),{"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10"}),
        #"Removed Top 3 Rows" = Table.Skip(#"Split Column by COMMA",3),
        #"Set Data Headers to this row" = Table.PromoteHeaders(#"Removed Top 3 Rows"),
        #"Delete 2 Next Rows" = Table.Skip(#"Set Data Headers to this row",2),
        #"Add Headers for First 4 Rows" = Table.RenameColumns(#"Delete 2 Next Rows",{{"", "Year"}, {"_1", "MIP"}, {"Section", "Type"}, {"_2", "Cost Center"}}),
        #"Filtered Rows" = Table.SelectRows(#"Add Headers for First 4 Rows", each ([Year] <> "" and [Year] <> "Outpatient Program Summary")),
        #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Filtered Rows", {"Year", "MIP", "Type", "Cost Center"}, "Attribute", "Value"),
        #"Renamed Columns" = Table.RenameColumns(#"Unpivoted Other Columns",{{"Attribute", "Program"}, {"Value", "Amount"}}),
        #"Filtered Rows1" = Table.SelectRows(#"Renamed Columns", each ([Amount] <> "" and [Amount] <> "0")),
        #"Changed Type" = Table.TransformColumnTypes(#"Filtered Rows1",{{"Amount", Currency.Type}})
    in
        #"Changed Type"

Then I made up a routine to process a folder of files:

    let
        Source = Folder.Files(GetValue("SNFfolder")),
        #"Get folder file path and file names" = Table.RemoveColumns(Source,{"Extension", "Date accessed", "Date modified", "Date created", "Attributes", "Content"}),
        #"Added Custom" = Table.AddColumn(#"Get folder file path and file names", "table", each fParseOP([Folder Path]&[Name]))
    in
        #"Added Custom"

And this is the column list it generates when I go to expand the table; unexpected extra columns appear alongside the ones I defined.

So, to work around it, I had to parse each of the 52 files manually and append them to a table. Each one parsed without issue using the first query.

    Ideas?

I have a similar setup for a matching set of files with the same general format (IP versus OP), and they all parse automatically without issue. The bottom seven columns are from the source, but there are others that are ignored and new ones added. Since all the files parsed fine one at a time, the stray headers and errant items did not appear in those 52 individual results :-(

    Monday, July 11, 2016 6:16 PM

Answers

  • Making some progress. I now have all the columns processing into the dataset; before, some columns would be left out of the unpivot. I also changed the source expressions to match. Here is the new parse function.

    (FD622Parsed as text) as table=>
    
    let
        //Source = Csv.Document(File.Contents (FD622Parsed),[Delimiter=",",Encoding=1252]),
        
        Source = Table.FromColumns({Lines.FromBinary(File.Contents(FD622Parsed),null,null,1252)}),
    
        //Source = Table.FromColumns({Lines.FromBinary(File.Contents(GetValue("current")),null,null,1252)}),
    
        #"Split Column by COMMA" = Table.SplitColumn(Source,"Column1",Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv),{"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10"}),
        #"Removed Top 3 Rows" = Table.Skip(#"Split Column by COMMA",3),
        #"Set Data Headers to this row" = Table.PromoteHeaders(#"Removed Top 3 Rows"),
        #"Delete 2 Next Rows" = Table.Skip(#"Set Data Headers to this row",2),
        #"Add Headers for First 4 Rows" = Table.RenameColumns(#"Delete 2 Next Rows",{{"", "Year"}, {"_1", "MIP"}, {"Section", "Type"}, {"_2", "Cost Center"}}),
        #"Remove row with no YEAR" = Table.SelectRows(#"Add Headers for First 4 Rows", each ([Year] <> "")),
        #"Unpivot to Normalize" = Table.UnpivotOtherColumns(#"Remove row with no YEAR", {"Year", "MIP", "Type", "Cost Center"}, "Attribute", "Value"),
        #"Changed Value column to Decimal" = Table.TransformColumnTypes(#"Unpivot to Normalize",{{"Value", type number}}),
        #"Removed Value Column Errors" = Table.RemoveRowsWithErrors(#"Changed Value column to Decimal", {"Value"}),
        #"Remove Empty/Zero Values" = Table.SelectRows(#"Removed Value Column Errors", each ([Value] <> null and [Value] <> 0))
    in
        #"Remove Empty/Zero Values"

    I then revised the consolidate routine to call it.

    //    This routine will retrieve the directory details of all the files in the SNFfolder and put them into a list.
    //    Then it will parse each file using the fParse function and bring in the detail rows from those files.
    //    Next it will extract key common data from the structured file name and add those columns to the resulting final table.
    
    
    let
        Source = Folder.Files(GetValue("SNFfolder")),
        #"Get folder file path and file names" = Table.RemoveColumns(Source,{"Extension", "Date accessed", "Date modified", "Date created", "Attributes", "Content"}),
        #"Added Custom" = Table.AddColumn(#"Get folder file path and file names", "OP", each fParse([Folder Path]&[Name])),
        #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Folder Path"}),
        #"Split Column by Delimiter" = Table.SplitColumn(#"Removed Columns","Name",Splitter.SplitTextByDelimiter("_", QuoteStyle.Csv),{"Name.1", "Name.2", "Name.3", "Name.4", "Name.5", "Name.6", "Name.7"}),
        #"Expanded OP" = Table.ExpandTableColumn(#"Split Column by Delimiter", "OP", {"Year", "MIP", "Type", "Cost Center", "Attribute", "Value"}, {"Year", "MIP", "Type", "Cost Center", "Attribute", "Value"}),
        #"Renamed Columns1" = Table.RenameColumns(#"Expanded OP",{{"Attribute", "Program"}, {"Value", "Amount"}, {"Name.3", "Patient Type"}, {"Name.4", "Billing NPI"}, {"Name.5", "Cycle Date"}, {"Name.6", "Issue Date"}}),
        #"Removed Columns1" = Table.RemoveColumns(#"Renamed Columns1",{"Name.1", "Name.2", "Name.7"}),
        #"Uppercased Text" = Table.TransformColumns(#"Removed Columns1",{{"Patient Type", Text.Upper}}),
        #"Changed Type" = Table.TransformColumnTypes(#"Uppercased Text",{{"Amount", Currency.Type}})
    in
        #"Changed Type"

I now combine this result with the manual additions (which I used for the hand-processed files), using this routine.

    // This routine will combine the direct parsed data with the manual additions and move it to the workbook and model.
    
    
    let
        Source = Table.Combine({#"FD622 Consolidate", manualAdditions}),
        #"Appended Query" = Table.Combine({Source,manualAdditions}),
        #"Filtered Rows" = Table.SelectRows(#"Appended Query", each ([Patient Type] <> null)),
        #"Removed Alternate Rows" = Table.AlternateRows(#"Filtered Rows",1,1,1)
    in
        #"Removed Alternate Rows"
I ran into a new problem, but it was easy to work around: each item in the main consolidated table was duplicated. I removed alternate rows as shown, but I only have 104 files in the source folder and they all have unique file names. I am flummoxed by this development, but am pleased that I won't have to process 52 files by hand eight more times for our other units.

    • Marked as answer by geraldartman Wednesday, July 13, 2016 6:24 PM
    Wednesday, July 13, 2016 5:59 PM

All replies

  • You're using a different Source-expression in your function compared to the first query. Could this be the reason?

    Imke Feldmann TheBIccountant.com

    Monday, July 11, 2016 7:10 PM
    Moderator
  • Not that I know of. The parameter in the first points to one file, which I used to build the desired query routine. The second points to the list of files in the folder. Are you talking about "Table.FromColumns({Lines.FromBinary" compared to "Source = Csv.Document(File.Contents (FD622Parsed),[Delimiter=",",Encoding=1252]),"?

When I use the same queries on a folder of similar files, they all parse and consolidate.

    Would you suggest a matching syntax?

    Tuesday, July 12, 2016 5:55 PM
  • I'm not sure if I understand you correctly here, but I'd expect the issue to be with the delimiters. For some reason, some files end up with the header columns not identified correctly. For a start, you could check which files have values in the "ColumnX"-columns, what has happened to their headers, and how their content is split across the columns.
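
    A sketch of that check (it assumes the stray column is literally named "Column9"; substitute whichever ColumnX shows up in the expand list):

    let
        Source = Folder.Files(GetValue("SNFfolder")),
        // parse each file with the existing function
        Parsed = Table.AddColumn(Source, "table", each fParseOP([Folder Path]&[Name])),
        // keep only the files whose parsed table contains the stray column with actual values in it
        Flagged = Table.SelectRows(Parsed, each Table.HasColumns([table], "Column9") and List.NonNullCount(Table.Column([table], "Column9")) > 0)
    in
        Flagged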

    Imke Feldmann TheBIccountant.com

    Tuesday, July 12, 2016 6:56 PM
    Moderator
  • Imke, thanks for the comments, but I am not connecting the dots here.

I have run each file in the folder individually without issue. I would have expected the type of problem you describe to have arisen during that process.

    Wednesday, July 13, 2016 11:28 AM
  • Sorry for the confusion, but I'm still not sure whether you're using the same commands when you run the files individually and via the function.

    What irritates me is the "Split Column by COMMA"-step in your function. It makes sense in your first query:

    let
        Source = Table.FromColumns({Lines.FromBinary(File.Contents(GetValue("current")), null, null, 1252)}),
        #"Split Column by COMMA" = Table.SplitColumn(Source,"Column1",Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv),{"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10"}),

    as you're opening your csv-file as a text file where all columns will be brought into one. Then you need to split them up, using "," as the delimiter.

But in your function, you switched to opening your csv-files as csv-files, providing the separator sign there already. That step alone splits the content into 10 columns, so the "Split Column by COMMA"-step is not necessary. Actually, this would be the reason for the problem: you've also defined 10 column names that would (additionally) be created, no matter what.
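
    To illustrate the difference, a minimal sketch (the file path is a placeholder):

    let
        // Csv.Document already returns the file split into columns:
        AsCsv = Csv.Document(File.Contents("C:\sample\FD622.csv"), [Delimiter=",", Encoding=1252]),
        // Lines.FromBinary returns a single text column, so a split step is still required:
        AsText = Table.FromColumns({Lines.FromBinary(File.Contents("C:\sample\FD622.csv"), null, null, 1252)}),
        Split = Table.SplitColumn(AsText, "Column1", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv))
    in
        Split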


    Imke Feldmann TheBIccountant.com


    Wednesday, July 13, 2016 2:16 PM
    Moderator
  • In order to identify the foul egg(s) in your csv-files, I'd recommend the following:

    let
         Source = Folder.Files(GetValue("SNFfolder")),
         #"Added Custom" = Table.AddColumn(Source, "Custom", each Csv.Document([Content], [Delimiter=",",Encoding=1252)),
         #"Added Custom1" = Table.AddColumn(#"Added Custom", "NoOfCols", each Table.ColumnCount([Custom]))
    in
         #"Added Custom1"
    
    

    This will return a count of columns of the parsed results. Check the outliers.

If this doesn't work (no outliers), it could be an issue with the number of rows to delete. Instead of hardcoding a number to delete, you could filter out rows with blanks (in the 1st column?), depending of course on your data structure.
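
    For example, a sketch of a step that could replace the hardcoded Table.Skip calls in the parse function (it assumes it runs right after the #"Split Column by COMMA" step, whose first column is "Column1.1"):

        // drop rows whose first cell is blank instead of skipping a fixed number of rows
        #"Removed Blank Rows" = Table.SelectRows(#"Split Column by COMMA", each [Column1.1] <> null and [Column1.1] <> ""),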


    Imke Feldmann TheBIccountant.com

    Wednesday, July 13, 2016 2:56 PM
    Moderator
  • That's good news.

This is very static code (containing many hardcoded parameters), but if it works, fine.

In your last query, you're adding the manualAdditions twice: first in the Source step and then in #"Appended Query". Couldn't you just skip that step?
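
    For example, a sketch with the redundant append removed (the alternate-rows workaround should then no longer be needed):

    let
        // manualAdditions is appended once; no second Table.Combine required
        Source = Table.Combine({#"FD622 Consolidate", manualAdditions}),
        #"Filtered Rows" = Table.SelectRows(Source, each ([Patient Type] <> null))
    in
        #"Filtered Rows"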


    Imke Feldmann TheBIccountant.com

    Wednesday, July 13, 2016 6:06 PM
    Moderator
  • Your comment caught me off guard, since there were no manual additions (the table was empty). The result was a duplicate of the parsed data. I removed the second append and, voila, success. I guess appending works in reverse, adding the first table instead of the last one, or perhaps appending an empty table has something to do with it.

I understand the brute force of the routines. I'm still studying the M language and will be more elegant in the future, but the job needs to be accomplished, one way or the other.

    Wednesday, July 13, 2016 6:24 PM
  • Glad to see that following my first suggestion not to use different Source-expressions helped solve your problem.

    Imke Feldmann TheBIccountant.com

    Wednesday, July 13, 2016 7:50 PM
    Moderator