Code to parse XML file is stuck with large files power query RRS feed

  • Question

  • Hello to all,

    I have the following code to tabulate a XML file and relates the parents nodes number with each text and their attributes. The code it works with the sample XMl I show below, but is stuck for over more than 20 minutes and never finishes with the real file that has about 22 MB.

    Somebody could help me to improve this code please.

        Source = Xml.Tables(File.Contents("<Path to input.xml")),
        #"Expanded Table" = Table.ExpandTableColumn(Source, "Table", {"Attribute:index", "row"}, {"index", "row"}),
        #"Changed Type" = Table.TransformColumnTypes(#"Expanded Table",{{"index", Int64.Type}}),
        #"Expanded row" = Table.ExpandTableColumn(#"Changed Type", "row", {"column"}, {"column"}),
        #"Added Index" = Table.AddIndexColumn(#"Expanded row", "row_index", 0, 1),
        #"Added Custom" = Table.AddColumn(#"Added Index", "row", (C) => if C[column] = null then null else Table.RowCount(Table.SelectRows(#"Added Index", each [index] = C[index] and [row_index] <= C[row_index]))-1, Int64.Type),
        #"Expanded column" = Table.ExpandTableColumn(#"Added Custom", "column", {"text"}, {"text"}),
        #"Expanded text" = Table.ExpandTableColumn(#"Expanded column", "text", {"Element:Text", "Attribute:fontName", "Attribute:fontSize", "Attribute:x", "Attribute:y", "Attribute:width", "Attribute:height", "Attribute:fontStyle"}, {"Text", "fontName", "fontSize", "x", "y", "width", "height", "fontStyle"}),
        #"Added Index1" = Table.AddIndexColumn(#"Expanded text", "col_index", 0, 1),
        #"Added Custom1" = Table.AddColumn(#"Added Index1", "column", (C) => if C[row] = null then null else Table.RowCount(Table.SelectRows(#"Added Index1", each [index] = C[index] and [row] = C[row] and [col_index] <= C[col_index]))-1, Int64.Type),
        #"Removed Other Columns" = Table.SelectColumns(#"Added Custom1",{"index", "Text", "fontName", "fontSize", "x", "y", "width", "height", "row", "column", "fontStyle"}),
        #"Reordered Columns" = Table.ReorderColumns(#"Removed Other Columns",{"index", "row", "column", "Text", "fontName", "fontSize", "fontStyle", "x", "y", "width", "height"}),
        #"Changed Type1" = Table.TransformColumnTypes(#"Reordered Columns",{{"fontName", type text}, {"fontSize", type number}, {"x", type number}, {"y", type number}, {"width", type number}, {"height", type number}, {"Text", type text}, {"fontStyle", type text}})
        #"Changed Type1"

    This is the XML file:

    <?xml version="1.0" encoding="UTF-8"?>
        <page index="0"/>
        <page index="1">
            <row><column><text fontName="Arial" fontSize="12.0" x="121.10" y="83.42" width="71.04" height="12.00">achene –(e.g., strawberry)</text></column></row>
            <row><column><text fontName="Arial" fontSize="12.0" x="121.10" y="124.82" width="101.07" height="12.00">capsule – (e.g., Brazil nut)</text></column></row>
            <row><column><text fontName="Arial" fontSize="12.0" x="121.10" y="207.65" width="140.31" height="12.00">caryopsis – (e.g., wheat)</text></column></row>
        <page index="2">
            <row><column><text fontName="Arial" fontSize="12.0" x="85.10" y="69.62" width="24.36" height="12.00">cypsela – (e.g., dandelion)</text></column></row>
        <page index="3"/>
        <page index="4">
            <row><column><text fontName="Arial" fontSize="12.0" fontStyle="Bold" x="276.29" y="239.45" width="95.42" height="12.00">fibrous drupe – (e.g., coconut, walnut)</text></column></row>
            <row><column><text fontName="Arial" fontSize="12.0" x="121.10" y="266.81" width="229.57" height="12.00">follicle – (e.g., magnolia)</text></column>
                 <column><text fontName="Arial" fontSize="12.0" x="353.94" y="266.81" width="155.71" height="12.00">legume – (e.g., bean, pea, peanut)</text></column></row>
            <row><column><text fontName="Arial" fontSize="12.0" x="85.10" y="294.41" width="165.10" height="12.00">loment – a type of indehiscent legume</text></column>
                 <column><text fontName="Arial" fontSize="12.0" x="253.43" y="294.41" width="14.39" height="12.00">nut – (e.g., beech, hazelnut, oak acorn)</text></column>
                 <column><text fontName="Arial" fontSize="12.0" x="271.04" y="294.41" width="255.64" height="12.00">samara – (e.g., ash, elm, maple key)</text></column></row>
            <row><column><text fontName="Arial" fontSize="12.0" x="85.10" y="501.43" width="432.97" height="12.00">schizocarp – (e.g., carrot seed)</text></column></row>
        <page index="5">
            <row><column><text fontName="Arial" fontSize="12.0" x="85.10" y="69.62" width="363.44" height="12.00">silicle – (e.g., shepherd's purse)</text></column></row>
            <row><column><text fontName="Arial" fontSize="12.0" x="85.10" y="83.42" width="382.36" height="12.00">utricle – (e.g., strawberry)</text></column></row>
        <page index="6"/>

    • Edited by cgkmal Friday, May 31, 2019 10:38 PM
    Friday, May 31, 2019 10:27 PM


All replies