none
repeatable data import and query process - XML/CSV imports RRS feed

  • Question

  • Hi

    I am looking for advice regarding using excel, power query and powerbi to create repeatable data import and query processes.       For example currently I have several XML files in a directory and weekly would add more.

    What is a good process to follow to ensure that I can repeat-ably import and extract the data I want, save that data and weekly add to it with new data.

    Doing an import isn't a problem, I understand the tool to an extent its the process and since the XML files contain multiple tables in them the query is time consuming and I would want to make that easy and reliable for all future updates.

    About me, I am an excel guy who can get a lot done, have used SQL and SQL server 2008 R2(that is not a current restriction its what I learned on), MySQL and some python, but I am new to ETL.

    Looking forward to your advice.

    thanks

    Sayth

    For example here is my current query(working)

    let    Source = Xml.Tables(File.Contents("C:\Users\sayth\Repos\XML\20150815RHIL0.xml")),    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Attribute:id", Int64.Type}, {"Attribute:barriertrial", Int64.Type}, {"Attribute:venue", type text}, {"Attribute:date", type datetime}, {"Attribute:gearchanges", Int64.Type}, {"Attribute:stewardsreport", Int64.Type}, {"Attribute:gearlist", Int64.Type}, {"Attribute:racebook", Int64.Type}, {"Attribute:postracestewards", Int64.Type}, {"Attribute:meetingtype", type text}, {"Attribute:rail", type text}, {"Attribute:weather", type text}, {"Attribute:trackcondition", type text}, {"Attribute:nomsdeadline", type datetime}, {"Attribute:weightsdeadline", type datetime}, {"Attribute:acceptdeadline", type datetime}, {"Attribute:jockeydeadline", type datetime}}),    #"Expanded race" = Table.ExpandTableColumn(#"Changed Type", "race", {"condition", "nomination", "Attribute:id", "Attribute:number", "Attribute:nomnumber", "Attribute:division", "Attribute:name", "Attribute:mediumname", "Attribute:shortname", "Attribute:stage", "Attribute:distance", "Attribute:minweight", "Attribute:raisedweight", "Attribute:class", "Attribute:age", "Attribute:grade", "Attribute:weightcondition", "Attribute:trophy", "Attribute:owner", "Attribute:trainer", "Attribute:jockey", "Attribute:strapper", "Attribute:totalprize", "Attribute:first", "Attribute:second", "Attribute:third", "Attribute:fourth", "Attribute:fifth", "Attribute:time", "Attribute:bonustype", "Attribute:nomsfee", "Attribute:acceptfee", "Attribute:trackcondition", "Attribute:timingmethod", "Attribute:fastesttime", "Attribute:sectionaltime", "Attribute:formavailable", "Attribute:racebookprize"}, {"race.condition", "race.nomination", "race.Attribute:id", "race.Attribute:number", "race.Attribute:nomnumber", "race.Attribute:division", "race.Attribute:name", "race.Attribute:mediumname", "race.Attribute:shortname", "race.Attribute:stage", "race.Attribute:distance", "race.Attribute:minweight", "race.Attribute:raisedweight", "race.Attribute:class", "race.Attribute:age", "race.Attribute:grade", "race.Attribute:weightcondition", "race.Attribute:trophy", "race.Attribute:owner", "race.Attribute:trainer", "race.Attribute:jockey", "race.Attribute:strapper", "race.Attribute:totalprize", "race.Attribute:first", "race.Attribute:second", "race.Attribute:third", "race.Attribute:fourth", "race.Attribute:fifth", "race.Attribute:time", "race.Attribute:bonustype", "race.Attribute:nomsfee", "race.Attribute:acceptfee", "race.Attribute:trackcondition", "race.Attribute:timingmethod", "race.Attribute:fastesttime", "race.Attribute:sectionaltime", "race.Attribute:formavailable", "race.Attribute:racebookprize"}),    #"Expanded race.nomination" = Table.ExpandTableColumn(#"Expanded race", "race.nomination", {"Attribute:age", "Attribute:barrier", "Attribute:blinkers", "Attribute:career", "Attribute:deadtrack", "Attribute:decimalmargin", "Attribute:finished", "Attribute:firstup", "Attribute:goodtrack", "Attribute:heavytrack", "Attribute:horse", "Attribute:id", "Attribute:number", "Attribute:pricestarting", "Attribute:rating", "Attribute:regnumber", "Attribute:rsbtrainername", "Attribute:sex", "Attribute:slowtrack", "Attribute:thisdistance", "Attribute:thistrack", "Attribute:trainernumber", "Attribute:weight"}, {"race.nomination.Attribute:age", "race.nomination.Attribute:barrier", "race.nomination.Attribute:blinkers", "race.nomination.Attribute:career", "race.nomination.Attribute:deadtrack", "race.nomination.Attribute:decimalmargin", "race.nomination.Attribute:finished", "race.nomination.Attribute:firstup", "race.nomination.Attribute:goodtrack", "race.nomination.Attribute:heavytrack", "race.nomination.Attribute:horse", "race.nomination.Attribute:id", "race.nomination.Attribute:number", "race.nomination.Attribute:pricestarting", "race.nomination.Attribute:rating", "race.nomination.Attribute:regnumber", "race.nomination.Attribute:rsbtrainername", "race.nomination.Attribute:sex", "race.nomination.Attribute:slowtrack", "race.nomination.Attribute:thisdistance", "race.nomination.Attribute:thistrack", "race.nomination.Attribute:trainernumber", "race.nomination.Attribute:weight"}),    #"Expanded club" = Table.ExpandTableColumn(#"Expanded race.nomination", "club", {"Attribute:abbrevname"}, {"club.Attribute:abbrevname"}),    #"Removed Columns" = Table.RemoveColumns(#"Expanded club",{"race.nomination.Attribute:regnumber", "race.nomination.Attribute:slowtrack", "race.Attribute:trophy", "race.Attribute:owner", "race.Attribute:trainer", "race.Attribute:jockey", "race.Attribute:strapper", "race.Attribute:totalprize", "race.Attribute:weightcondition", "race.Attribute:first", "race.Attribute:second", "race.Attribute:third", "race.Attribute:fourth", "race.Attribute:fifth", "race.Attribute:bonustype", "race.Attribute:nomsfee", "race.Attribute:acceptfee", "race.Attribute:timingmethod", "race.Attribute:formavailable", "race.Attribute:racebookprize", "Attribute:barriertrial", "Attribute:gearchanges", "Attribute:stewardsreport", "Attribute:gearlist", "Attribute:racebook", "Attribute:postracestewards", "club.Attribute:abbrevname", "Attribute:nomsdeadline", "Attribute:weightsdeadline", "Attribute:acceptdeadline", "Attribute:jockeydeadline", "race.nomination.Attribute:deadtrack"}),    #"Split Column by Delimiter" = Table.SplitColumn(#"Removed Columns","race.nomination.Attribute:pricestarting",Splitter.SplitTextByDelimiter("F", QuoteStyle.Csv),{"race.nomination.Attribute:pricestarting.1", "race.nomination.Attribute:pricestarting.2"}),    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"race.nomination.Attribute:pricestarting.1", type text}, {"race.nomination.Attribute:pricestarting.2", type text}, {"race.Attribute:id", Int64.Type}, {"race.Attribute:number", Int64.Type}, {"race.Attribute:nomnumber", Int64.Type}, {"race.Attribute:division", Int64.Type}, {"race.Attribute:name", type text}, {"race.Attribute:mediumname", type text}, {"race.Attribute:shortname", type text}, {"race.Attribute:stage", type text}, {"race.Attribute:distance", Int64.Type}, {"race.Attribute:minweight", Int64.Type}, {"race.Attribute:raisedweight", Int64.Type}, {"race.Attribute:class", type text}, {"race.Attribute:age", type text}, {"race.Attribute:grade", Int64.Type}, {"race.Attribute:time", type datetime}, {"race.Attribute:trackcondition", type text}, {"race.Attribute:fastesttime", type text}, {"race.Attribute:sectionaltime", type text}}),    #"Removed Columns1" = Table.RemoveColumns(#"Changed Type1",{"race.nomination.Attribute:pricestarting.2"}),    #"Changed Type2" = Table.TransformColumnTypes(#"Removed Columns1",{{"race.nomination.Attribute:pricestarting.1", Currency.Type}}),    #"Split Column by Delimiter1" = Table.SplitColumn(#"Changed Type2","race.nomination.Attribute:firstup",Splitter.SplitTextByDelimiter("-", QuoteStyle.Csv),{"race.nomination.Attribute:firstup.1", "race.nomination.Attribute:firstup.2", "race.nomination.Attribute:firstup.3", "race.nomination.Attribute:firstup.4"}),    #"Changed Type3" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"race.nomination.Attribute:firstup.1", Int64.Type}, {"race.nomination.Attribute:firstup.2", Int64.Type}, {"race.nomination.Attribute:firstup.3", Int64.Type}, {"race.nomination.Attribute:firstup.4", type text}}),    #"Split Column by Delimiter2" = Table.SplitColumn(#"Changed Type3","race.nomination.Attribute:firstup.4",Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv),{"race.nomination.Attribute:firstup.4.1", "race.nomination.Attribute:firstup.4.2"}),    #"Changed Type4" = Table.TransformColumnTypes(#"Split Column by Delimiter2",{{"race.nomination.Attribute:firstup.4.1", Int64.Type}, {"race.nomination.Attribute:firstup.4.2", Int64.Type}}),    #"Removed Columns2" = Table.RemoveColumns(#"Changed Type4",{"race.nomination.Attribute:firstup.4.1"}),    #"Changed Type5" = Table.TransformColumnTypes(#"Removed Columns2",{{"race.nomination.Attribute:firstup.4.2", Currency.Type}}),    #"Removed Columns3" = Table.RemoveColumns(#"Changed Type5",{"race.nomination.Attribute:heavytrack"}),    #"Split Column by Delimiter3" = Table.SplitColumn(#"Removed Columns3","race.nomination.Attribute:goodtrack",Splitter.SplitTextByDelimiter("-", QuoteStyle.Csv),{"race.nomination.Attribute:goodtrack.1", "race.nomination.Attribute:goodtrack.2", "race.nomination.Attribute:goodtrack.3", "race.nomination.Attribute:goodtrack.4"}),    #"Changed Type6" = Table.TransformColumnTypes(#"Split Column by Delimiter3",{{"race.nomination.Attribute:goodtrack.1", Int64.Type}, {"race.nomination.Attribute:goodtrack.2", Int64.Type}, {"race.nomination.Attribute:goodtrack.3", Int64.Type}, {"race.nomination.Attribute:goodtrack.4", type text}}),    #"Split Column by Delimiter4" = Table.SplitColumn(#"Changed Type6","race.nomination.Attribute:goodtrack.4",Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv),{"race.nomination.Attribute:goodtrack.4.1", "race.nomination.Attribute:goodtrack.4.2"}),    #"Changed Type7" = Table.TransformColumnTypes(#"Split Column by Delimiter4",{{"race.nomination.Attribute:goodtrack.4.1", Int64.Type}, {"race.nomination.Attribute:goodtrack.4.2", Int64.Type}}),    #"Removed Columns4" = Table.RemoveColumns(#"Changed Type7",{"race.nomination.Attribute:goodtrack.4.1"}),    #"Changed Type8" = Table.TransformColumnTypes(#"Removed Columns4",{{"race.nomination.Attribute:goodtrack.4.2", Currency.Type}}),    #"Removed Columns5" = Table.RemoveColumns(#"Changed Type8",{"Attribute:date", "Attribute:meetingtype", "Attribute:rail", "race.Attribute:grade", "race.Attribute:name", "race.Attribute:mediumname", "race.Attribute:nomnumber", "race.Attribute:division"}),    #"Split Column by Delimiter5" = Table.SplitColumn(#"Removed Columns5","race.nomination.Attribute:thisdistance",Splitter.SplitTextByDelimiter("-", QuoteStyle.Csv),{"race.nomination.Attribute:thisdistance.1", "race.nomination.Attribute:thisdistance.2", "race.nomination.Attribute:thisdistance.3", "race.nomination.Attribute:thisdistance.4"}),    #"Changed Type9" = Table.TransformColumnTypes(#"Split Column by Delimiter5",{{"race.nomination.Attribute:thisdistance.1", Int64.Type}, {"race.nomination.Attribute:thisdistance.2", Int64.Type}, {"race.nomination.Attribute:thisdistance.3", Int64.Type}, {"race.nomination.Attribute:thisdistance.4", type text}}),    #"Split Column by Delimiter6" = Table.SplitColumn(#"Changed Type9","race.nomination.Attribute:thisdistance.4",Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv),{"race.nomination.Attribute:thisdistance.4.1", "race.nomination.Attribute:thisdistance.4.2"}),    #"Changed Type10" = Table.TransformColumnTypes(#"Split Column by Delimiter6",{{"race.nomination.Attribute:thisdistance.4.1", Int64.Type}, {"race.nomination.Attribute:thisdistance.4.2", Int64.Type}}),    #"Removed Columns6" = Table.RemoveColumns(#"Changed Type10",{"race.nomination.Attribute:thisdistance.4.1"}),    #"Changed Type11" = Table.TransformColumnTypes(#"Removed Columns6",{{"race.nomination.Attribute:thisdistance.4.2", Currency.Type}}),    #"Split Column by Delimiter7" = Table.SplitColumn(#"Changed Type11","race.nomination.Attribute:thistrack",Splitter.SplitTextByDelimiter("-", QuoteStyle.Csv),{"race.nomination.Attribute:thistrack.1", "race.nomination.Attribute:thistrack.2", "race.nomination.Attribute:thistrack.3", "race.nomination.Attribute:thistrack.4"}),    #"Changed Type12" = Table.TransformColumnTypes(#"Split Column by Delimiter7",{{"race.nomination.Attribute:thistrack.1", Int64.Type}, {"race.nomination.Attribute:thistrack.2", Int64.Type}, {"race.nomination.Attribute:thistrack.3", Int64.Type}, {"race.nomination.Attribute:thistrack.4", type text}}),    #"Split Column by Delimiter8" = Table.SplitColumn(#"Changed Type12","race.nomination.Attribute:thistrack.4",Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv),{"race.nomination.Attribute:thistrack.4.1", "race.nomination.Attribute:thistrack.4.2"}),    #"Changed Type13" = Table.TransformColumnTypes(#"Split Column by Delimiter8",{{"race.nomination.Attribute:thistrack.4.1", Int64.Type}, {"race.nomination.Attribute:thistrack.4.2", Int64.Type}}),    #"Removed Columns7" = Table.RemoveColumns(#"Changed Type13",{"race.nomination.Attribute:thistrack.4.1"}),    #"Changed Type14" = Table.TransformColumnTypes(#"Removed Columns7",{{"race.nomination.Attribute:thistrack.4.2", Currency.Type}}),    #"Split Column by Delimiter9" = Table.SplitColumn(#"Changed Type14","race.nomination.Attribute:career",Splitter.SplitTextByDelimiter("-", QuoteStyle.Csv),{"race.nomination.Attribute:career.1", "race.nomination.Attribute:career.2", "race.nomination.Attribute:career.3", "race.nomination.Attribute:career.4"}),    #"Changed Type15" = Table.TransformColumnTypes(#"Split Column by Delimiter9",{{"race.nomination.Attribute:career.1", Int64.Type}, {"race.nomination.Attribute:career.2", Int64.Type}, {"race.nomination.Attribute:career.3", Int64.Type}, {"race.nomination.Attribute:career.4", type text}}),    #"Split Column by Delimiter10" = Table.SplitColumn(#"Changed Type15","race.nomination.Attribute:career.4",Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv),{"race.nomination.Attribute:career.4.1", "race.nomination.Attribute:career.4.2"}),    #"Changed Type16" = Table.TransformColumnTypes(#"Split Column by Delimiter10",{{"race.nomination.Attribute:career.4.1", Int64.Type}, {"race.nomination.Attribute:career.4.2", Int64.Type}}),    #"Removed Columns8" = Table.RemoveColumns(#"Changed Type16",{"race.nomination.Attribute:career.4.1"}),    #"Changed Type17" = Table.TransformColumnTypes(#"Removed Columns8",{{"race.nomination.Attribute:career.4.2", Currency.Type}, {"race.nomination.Attribute:decimalmargin", type number}}),    #"Removed Columns9" = Table.RemoveColumns(#"Changed Type17",{"race.condition"}),    #"Reordered Columns" = Table.ReorderColumns(#"Removed Columns9",{"race.Attribute:number", "race.Attribute:id", "race.nomination.Attribute:age", "race.nomination.Attribute:barrier", "race.nomination.Attribute:blinkers", "race.nomination.Attribute:career.1", "race.nomination.Attribute:career.2", "race.nomination.Attribute:career.3", "race.nomination.Attribute:career.4.2", "race.nomination.Attribute:decimalmargin", "race.nomination.Attribute:finished", "race.nomination.Attribute:firstup.1", "race.nomination.Attribute:firstup.2", "race.nomination.Attribute:firstup.3", "race.nomination.Attribute:firstup.4.2", "race.nomination.Attribute:goodtrack.1", "race.nomination.Attribute:goodtrack.2", "race.nomination.Attribute:goodtrack.3", "race.nomination.Attribute:goodtrack.4.2", "race.nomination.Attribute:horse", "race.nomination.Attribute:id", "race.nomination.Attribute:number", "race.nomination.Attribute:pricestarting.1", "race.nomination.Attribute:rating", "race.nomination.Attribute:rsbtrainername", "race.nomination.Attribute:sex", "race.nomination.Attribute:thisdistance.1", "race.nomination.Attribute:thisdistance.2", "race.nomination.Attribute:thisdistance.3", "race.nomination.Attribute:thisdistance.4.2", "race.nomination.Attribute:thistrack.1", "race.nomination.Attribute:thistrack.2", "race.nomination.Attribute:thistrack.3", "race.nomination.Attribute:thistrack.4.2", "race.nomination.Attribute:trainernumber", "race.nomination.Attribute:weight", "race.Attribute:shortname", "race.Attribute:stage", "race.Attribute:distance", "race.Attribute:minweight", "race.Attribute:raisedweight", "race.Attribute:class", "race.Attribute:age", "race.Attribute:time", "race.Attribute:trackcondition", "race.Attribute:fastesttime", "race.Attribute:sectionaltime", "Attribute:id", "Attribute:venue", "Attribute:weather", "Attribute:trackcondition"})in    #"Reordered Columns"


    Taking my C# further


    • Edited by SaythJ Sunday, August 16, 2015 2:05 AM
    Sunday, August 16, 2015 1:25 AM

Answers

  • It sounds like what you are looking for is an Append query, you could create an Append query and give it the same M-code that you have above and just change the source file to your new file. You might look at using the Folder option so that you can process multiple files in a single query.
    Monday, August 17, 2015 2:44 PM