none
Power Query - split column by variable field lengths - account for null values RRS feed

  • Question

  • This is in reference to Stack Overflow Power Query question and answer which provides a nice solution to split character delimited text file into columns based on column character count widths.

    This solution doesn't account for nulls.  

    When null value is encountered it gives an error at one of the columns to right. I can't quite say exactly what is happening. The error is

    An error occurred in the ‘SplitText’ query. Expression.Error: The 'count' argument is out of range.


    The code for split function is:

    let
    SplitText = (text, lengths) => 
    let
    LengthsCount = List.Count(lengths),
    // Keep track of the index in the lengths list and the position in the text to take the next characters from. Use this information to get the next segment and put it into a list.
    Split = List.Generate(() => {0, 0}, each _{0} < LengthsCount, each {_{0} + 1, _{1} + lengths{_{0}}}, each Text.Range(text, _{1}, lengths{_{0}}))
    in
    Split,
    // Convert the list to a record to 
    ListToRecord = (text, lengths) => 
    let
    List = SplitText(text, lengths),
    Record = Record.FromList(List, List.Transform({1 .. List.Count(List)}, each Number.ToText(_)))
    in
    Record
    in
    ListToRecord

    Then, in your table, add a custom column that uses this formula:

    each SplitText([Column1], {4, 2, 5, 3})

    How to modify this to account for nulls?

    I am using Excel 2010 64 bit and Power Query Version: 2.29.4217.1861




    • Edited by sitrucp Friday, April 29, 2016 12:45 AM
    Friday, April 29, 2016 12:22 AM

Answers

  • Hi Davie,

    I preffer different approach to the problem but if you want to use these functions, try to change the step "Split"

    Split = List.Generate(() => {0, 0}, each _{0} < LengthsCount, each {_{0} + 1, _{1} + lengths{_{0}}}, each try Text.Range(text, _{1}, lengths{_{0}}) otherwise null)
    Regards

    • Marked as answer by sitrucp Friday, April 29, 2016 4:17 PM
    Friday, April 29, 2016 9:41 AM

All replies

  • Hi Davie,

    I preffer different approach to the problem but if you want to use these functions, try to change the step "Split"

    Split = List.Generate(() => {0, 0}, each _{0} < LengthsCount, each {_{0} + 1, _{1} + lengths{_{0}}}, each try Text.Range(text, _{1}, lengths{_{0}}) otherwise null)
    Regards

    • Marked as answer by sitrucp Friday, April 29, 2016 4:17 PM
    Friday, April 29, 2016 9:41 AM
  • This does the job. I tried 'try' but not everywhere or in correct syntax.

    What is another approach? I am interested and others landing here will be too : ) 

    Thank you!

    Friday, April 29, 2016 4:17 PM
  • Hi Davie,

    Try this code below for 50 000 rows (or more) and look the difference.

    let
        ListOfLengths = {3,4,3,5,2},
        ListPosition = List.Positions(ListOfLengths),
        Positions = {0} & List.Transform(ListPosition, each List.Sum(List.FirstN(ListOfLengths, _+1)) ),
        ColNames = List.Transform({1..List.Count(ListOfLengths)}, each Text.From(_)),
        Source = Table.FromRows({{"aaaAAAACCC"},{null},{"yyyXXXX111GGGGG"},{"TryThisOne:-)"}}, {"Data"}),
        ChType = Table.TransformColumnTypes(Source,{{"Data", type text}}),
        DuplicateColumn = Table.DuplicateColumn(ChType, "Data", "Data — copy"),
        SplitCol = Table.SplitColumn(DuplicateColumn,"Data — copy",Splitter.SplitTextByPositions(Positions),ColNames)
    in
        SplitCol

    Of Course you have to change the "Source" step.

    Additionally, I think that Table.Range should be replaced by Table.Middle in your "SpliText" function (You can see the difference if you change - for a while -  source step in your code to

    Source = Table.FromRows({{"aaaAAAACCC"},{null},{"yyyXXXX111GGGGG"},{"TryThisOne:-)"}}, {"Data"}),

    See the last row.;-)

    Regards

    Saturday, April 30, 2016 12:28 PM
  • Wow that is great. 

    Lots of interesting ways to use the List functions.

    Very instructive!

    I have seen, and copy/paste used, the "_" in M code. What does it mean though? Is it correct to call it a parameter?

    Saturday, April 30, 2016 8:45 PM