none
Import XML file from Exchange mailbox RRS feed

  • Question

  • Hello,

    I want to import the data from XML files that arrive as an attachment of emails in an Exchange Mailbox.

    I connect the Exchange Mailbox and edit the data:

    - select column Attachments (=table)

    - remove other columns

    - expand the Attachments column table

    - the XML data is in the column AttachmentContent as Binary

    - Drill down on the Binary

    - Selecting 1 of the items in the binary gives me some data, sometimes with other tables in it to drill down further, but I can't seem to import all data from all mails at once? (+/-900 mails and counting)

    - Is there an easier way to work around? or where can I find detailed info on how to use this Binary data/XML?

    Thanks in advance for a quick respons.

    Dirk.

    Sunday, May 7, 2017 2:01 PM

Answers

  • Hi Dirk. You could try replacing the absolute indexes (e.g. {0}, {1}, etc.) with record-based indexes. For example:

    Table0{[Name="Foo"]}

    This means, "Return the row from Table0 that has a Name field equal to 'Foo'".

    Ehren

    Friday, May 19, 2017 11:29 PM
    Owner

All replies

  • Hi,

    if your main problem is extracting the XML files from the message, you can use EWS Managed API to save the attachment to disk and then just process the XML file. Then just import the XML files.


    Evgenij Smirnov

    I work @ msg services ag, Berlin -> http://www.msg-services.de
    I blog (in German) @ http://it-pro-berlin.de
    my stuff in PSGallery --> https://www.powershellgallery.com/profiles/it-pro-berlin.de/
    Exchange User Group, Berlin -> http://exusg.de
    Windows Server User Group, Berlin -> http://www.winsvr-berlin.de
    Mark Minasi Technical Forum, reloaded -> http://newforum.minasi.com

    Sunday, May 7, 2017 6:36 PM
  • Hi,

    When I save the attachment to disk and extract the XML with PowerQuery & I select all items, I get 15 queries that I can dump in 15 worksheets, but there is no clear link on each sheet to link the data.

    I'm trying to import the XML with more control over the data & prefer to get the data 1 one record line / XML.

    I will look into your proposal, but still prefer a direct from Exchange solution.

    Best Regards,

    Dirk Verliefden.

    Monday, May 8, 2017 4:36 AM
  • Hi Dirk. You should be able to drill down on one of the example attachments (as you described above), and then turn this into an M function that you can apply to all the attachments in the original query.

    Here's an example of how to do this kind of thing (against Excel files, but a similar approach should work in your scenario):

    https://blog.crossjoin.co.uk/2016/05/15/creating-m-functions-from-parameterised-queries-in-power-bi/

    Ehren

    Monday, May 8, 2017 8:20 PM
    Owner
  • Yesterday I got a new laptop with Excel 2016 installed.

    The way Get & Transform imports XML data from the Attachement Binary is different from 2013!

    In 2013 you had to drill down to a named space field level before you could load the table.

    In 2016 you can select at higher levels and import all named spaces at once.

    I need some time to work out the details, but this looks a lot more promising then before.

    Rgds,

    Dirk.

    Friday, May 12, 2017 2:10 PM
  • Ok, let us know if you hit any snags.

    Ehren

    Monday, May 15, 2017 4:53 PM
    Owner
  • Hi,

    I found some info on how PQ is handling binary info in PQ, but still no info found on how to treat the data if it is data from an XML. I must admit that my knowledge of XML is limited.

    I can extract a part of the XML or into several sheets per subdivision of data, but if I want to convert the info into 1 record, I need a key to link the datalines. The XML filename is a unique key. How can I add a column with the filename?

    Below the XML (company data replaced by x's or 9's):
    <?xml version="1.0" encoding="UTF-8"?>
    -<ns2:C_EAD_VAL xmlns:ns4="urn:publicid:-:EC:DGTAXUD:EMCS:PHASE3:IE801:V1.76" xmlns:ns3="urn:publicid:-:EC:DGTAXUD:EMCS:PHASE3:TMS:V1.76" xmlns:ns2="http://emcs.minfin.fgov.be/xsd/MSAMessage" xmlns="http://emcs.minfin.fgov.be/xsd/Context">
    -<ns2:Context>
    -<NameValue>
    <name>Operation</name>
    <value>forwardAAD</value>
    </NameValue>
    -<NameValue>
    <name>autorisationholder</name>
    <value>BE9X999999999</value>
    </NameValue>
    -<NameValue>
    <name>storageplace</name>
    <value>BE9X999999999</value>
    </NameValue>
    </ns2:Context>
    -<ns4:IE801>
    -<ns4:Header>
    <ns3:MessageSender>NDEA.BE</ns3:MessageSender>
    <ns3:MessageRecipient>0999999999</ns3:MessageRecipient>
    <ns3:DateOfPreparation>2017-05-01</ns3:DateOfPreparation>
    <ns3:TimeOfPreparation>04:00:29.013</ns3:TimeOfPreparation>
    <ns3:MessageIdentifier>999999999-xxxx-9999-XXXX-999999999999</ns3:MessageIdentifier>
    </ns4:Header>
    -<ns4:Body>
    -<ns4:EADContainer>
    -<ns4:ConsigneeTrader language="nl">
    <ns4:Traderid>BE9X999999999</ns4:Traderid>
    <ns4:TraderName>XXXXXXXXXXX</ns4:TraderName>
    <ns4:StreetName>XXXXXXXXXXX/ns4:StreetName>
    <ns4:Postcode>9999</ns4:Postcode>
    <ns4:City>XXXXXXXXXXX</ns4:City>
    </ns4:ConsigneeTrader>
    -<ns4:ExciseMovementEad>
    <ns4:AdministrativeReferenceCode>99BE99999999999999999</ns4:AdministrativeReferenceCode>
    <ns4:DateAndTimeOfValidationOfEad>2017-05-01T04:00:29.013</ns4:DateAndTimeOfValidationOfEad>
    </ns4:ExciseMovementEad>
    -<ns4:ConsignorTrader language="nl">
    <ns4:TraderExciseNumber>XXXXXXXXXXX</ns4:TraderExciseNumber>
    <ns4:TraderName>XXXXXXXXXXX</ns4:TraderName>
    <ns4:StreetName>XXXXXXXXXXX</ns4:StreetName>
    <ns4:Postcode>9999</ns4:Postcode>
    <ns4:City>XXXXXXXXXXX</ns4:City>
    </ns4:ConsignorTrader>
    -<ns4:PlaceOfDispatchTrader language="nl">
    <ns4:ReferenceOfTaxWarehouse>XXXXXXXXXXX</ns4:ReferenceOfTaxWarehouse>
    <ns4:TraderName>XXXXXXXXXXX</ns4:TraderName>
    <ns4:StreetName>XXXXXXXXXXX</ns4:StreetName>
    <ns4:Postcode>9999</ns4:Postcode>
    <ns4:City>XXXXXXXXXXX</ns4:City>
    </ns4:PlaceOfDispatchTrader>
    -<ns4:DeliveryPlaceTrader language="nl">
    <ns4:Traderid>BEXXXXXXXXXXX</ns4:Traderid>
    <ns4:TraderName>XXXXXXXXXXX</ns4:TraderName>
    <ns4:StreetName>XXXXXXXXXXX</ns4:StreetName>
    <ns4:Postcode>9999</ns4:Postcode>
    <ns4:City>XXXXXXXXXXX</ns4:City>
    </ns4:DeliveryPlaceTrader>
    -<ns4:CompetentAuthorityDispatchOffice>
    <ns4:ReferenceNumber>BEXXXXXXXXXXX</ns4:ReferenceNumber>
    </ns4:CompetentAuthorityDispatchOffice>
    -<ns4:FirstTransporterTrader language="nl">
    <ns4:TraderName>XXXXXXXXXXX</ns4:TraderName>
    <ns4:StreetName>XXXXXXXXXXX</ns4:StreetName>
    <ns4:Postcode>9999</ns4:Postcode>
    <ns4:City>XXXXXXXXXXX</ns4:City>
    </ns4:FirstTransporterTrader>
    -<ns4:Ead>
    <ns4:LocalReferenceNumber>GLXXXXXXXXXXX</ns4:LocalReferenceNumber>
    <ns4:InvoiceNumber>WEEK</ns4:InvoiceNumber>
    <ns4:InvoiceDate>2017-04-30</ns4:InvoiceDate>
    <ns4:OriginTypeCode>1</ns4:OriginTypeCode>
    <ns4:DateOfDispatch>2017-04-30</ns4:DateOfDispatch>
    <ns4:TimeOfDispatch>23:59:59.000</ns4:TimeOfDispatch>
    </ns4:Ead>
    -<ns4:HeaderEad>
    <ns4:SequenceNumber>1</ns4:SequenceNumber>
    <ns4:DateAndTimeOfUpdateValidation>2017-05-01T04:00:29.013</ns4:DateAndTimeOfUpdateValidation>
    <ns4:DestinationTypeCode>1</ns4:DestinationTypeCode>
    <ns4:JourneyTime>D07</ns4:JourneyTime>
    <ns4:TransportArrangement>2</ns4:TransportArrangement>
    </ns4:HeaderEad>
    -<ns4:TransportMode>
    <ns4:TransportModeCode>3</ns4:TransportModeCode>
    </ns4:TransportMode>
    -<ns4:MovementGuarantee>
    <ns4:GuarantorTypeCode>1</ns4:GuarantorTypeCode>
    </ns4:MovementGuarantee>
    -<ns4:BodyEad>
    <ns4:BodyRecordUniqueReference>1</ns4:BodyRecordUniqueReference>
    <ns4:ExciseProductCode>E999</ns4:ExciseProductCode>
    <ns4:CnCode>99999999</ns4:CnCode>
    <ns4:Quantity>000000</ns4:Quantity>
    <ns4:GrossWeight>000000</ns4:GrossWeight>
    <ns4:NetWeight>000000</ns4:NetWeight>
    <ns4:CommercialDescription language="nl">xxxxxx</ns4:CommercialDescription>
    -<ns4:Package>
    <ns4:KindOfPackages>VQ</ns4:KindOfPackages>
    </ns4:Package>
    </ns4:BodyEad>
    -<ns4:TransportDetails>
    <ns4:TransportUnitCode>2</ns4:TransportUnitCode>
    <ns4:IdentityOfTransportUnits>Trekker</ns4:IdentityOfTransportUnits>
    <ns4:ComplementaryInformation language="nl">XXXXXXXXXXX</ns4:ComplementaryInformation>
    </ns4:TransportDetails>
    </ns4:EADContainer>
    </ns4:Body>
    </ns4:IE801>
    </ns2:C_EAD_VAL>

    Wednesday, May 17, 2017 6:16 PM
  • Can you just trim the source columns down to the unique identifier (the file name) and the xml binary data (AttachmentContent), and then add a new custom column with the following formula?

    Xml.Tables([XmlBinaryColumn])

    Then you might be able to expand this new column to extract what you need for each row.

    Ehren

    Wednesday, May 17, 2017 7:12 PM
    Owner
  • Ehren,

    Thanks for the reply but this is not working for me.

    The XML are sent by the government and are about excise duty transactions of goods that we or our suppliers have entered in a European Excise Duty registration system.
    There are different type of messages, but even within the type of message 'IE801' I have noticed that the messages are not always equal. There may be more or less subtables and in the subtables there may be more or less fields defined. The information I want to obtain from these messages are in all IE801 messages because these are the minimal required fields.
    Because not all subtables & fields are equal, the processing of the Binary (email attachment) goes well for all messages that are equal to the 'the sample file' but may return null when there is a shift in the fields.

    Is there no option to address directly the subtables and fields by name?

    This is the M code in the 'Transform Sample File from Mail' ('Mail' is my query to start with):

    let
        Source = Xml.Tables(#"Sample File Parameter1"),
        Table1 = Source{1}[Table],
        Table0 = Table1{0}[Table],
        Table2 = Table0{1}[Table],
        Table3 = Table2{0}[Table]
    in
        Table3

    Instead of referring to the records with {1}, is it not possible to refer to :

    Table1= {-<ns4:BodyEad>}[<ns4:ExciseProductCode>E999</ns4:ExciseProductCode>, <ns4:CnCode>99999999</ns4:CnCode>, < ns4:Quantity>000000</ns4:Quantity>, < ns4:GrossWeight>000000</ns4:GrossWeight> ]

    ?

    Best regards,

    Dirk

    Friday, May 19, 2017 12:13 PM
  • Hi Dirk. You could try replacing the absolute indexes (e.g. {0}, {1}, etc.) with record-based indexes. For example:

    Table0{[Name="Foo"]}

    This means, "Return the row from Table0 that has a Name field equal to 'Foo'".

    Ehren

    Friday, May 19, 2017 11:29 PM
    Owner
  • Hi Ehren,

    I think I cracked/hacked it.

    Since not all XML attachments have the same format I had to dive into the M language and make some adjustments on how Power Query treats the binary import:

    - not all messages have the same number of subtables in the 'container'

    - I looked up an XML that has all possible subtables and downloaded this to a network drive

    - Power Query creates extra files (queries & function) when importing from binary:

    --'Sample File Parameter'

    --'Sample File'

    --'Transform Sample File from <your query>'

    --function 'Transform File from <your query>'

    -<your query>

    1/ 'Sample file Parameter' : no changes

    2/ 'Sample File' I edited and replaced the M code to a reference to the XML om my drive:

    let
        Source = File.Contents("N:\APPS2\SUPPLY\Projects\Admin2016\Accijns\Sample File\IE801_1_17BEJ2HG0811002BE8Y37.xml")
    in
        Source

    3/ 'Transform Sample File from <your query>' : I edited the 'Transform Sample File from <your query>':

    // not all M-code lines are included below because repetitive work -> //... indicates lines deleted

    // Source = refer to the Sample File

    let
        Source = Xml.Tables(#"Sample File Parameter1"),

    // Create a variable that holds the table of every subtable
    // Drill down to the last level of the structure: EADContainer
    // Name = urn:publicid:-:EC:DGTAXUD:EMCS:PHASE3:IE801:V1.76
        Table = Source{[Name = "urn:publicid:-:EC:DGTAXUD:EMCS:PHASE3:IE801:V1.76"]}[Table],
    // Name = IE801
        Table1 = Table{[Name = "IE801"]}[Table],
    // Name = Body
        Table2 = Table1{[Name = "Body"]}[Table],
    // Name = EADContainer
       T_EADContainer = Table2{[Name="EADContainer"]}[Table],

    // Create a variable of each subtable in EADContainer
    // added 'try .. otherwise' in case the subtable does not exist in the XML processed

    T_HeaderEad = try T_EADContainer{[Name="HeaderEad"]}[Table] otherwise null,
    T_BodyEad = try T_EADContainer{[Name="BodyEad"]}[Table] otherwise null,
    //...
    T_DeliveryPlaceTrader= try T_EADContainer{[Name="DeliveryPlaceTrader"]}[Table] otherwise null,
    T_Ead= try T_EADContainer{[Name="Ead"]}[Table] otherwise null,
    // ...

    // Start of creation of the record line
    // First take the first table that needs a pivot to make it 1 record line
    Table0 = Source{[Name="Context"]}[Table],
        Table6 = Table0{0}[Table],
        Table7 = Table6{0}[Table],
        #"Changed Type" = Table.TransformColumnTypes(Table7,{{"name", type text}, {"value", type text}}),
        #"Pivoted Column" = Table.Pivot(#"Changed Type", List.Distinct(#"Changed Type"[name]), "name", "value"),

    // For each subtable add a column to this record using the variables
        #"Added Custom" = Table.AddColumn(#"Pivoted Column", "ConsigneeTrader", each T_ConsigneeTrader),
        #"Added Custom1" = Table.AddColumn(#"Added Custom", "ExciseMovementEad", each T_ExciseMovementEad),
    //...
          #"Added Custom7" = Table.AddColumn(#"Added Custom6", "Ead", each T_Ead),
        #"Added Custom8" = Table.AddColumn(#"Added Custom7", "HeaderEad", each T_HeaderEad),
        #"Added Custom11" = Table.AddColumn(#"Added Custom10", "BodyEad", each T_BodyEad),
    //...

    // Expand all the tables in the added columns (or select the desired fields)
        #"Expanded ConsigneeTrader" = Table.ExpandTableColumn(#"Added Custom12", "ConsigneeTrader", {"Traderid", "TraderName", "StreetName", "Postcode", "City", "Attribute:language"}, {"ConsigneeTrader.Traderid", "ConsigneeTrader.TraderName", "ConsigneeTrader.Attribute:language"}),
        #"Expanded ExciseMovementEad" = Table.ExpandTableColumn(#"Expanded ConsigneeTrader", "ExciseMovementEad", {"AdministrativeReferenceCode", "DateAndTimeOfValidationOfEad"}, {"ExciseMovementEad.AdministrativeReferenceCode", "ExciseMovementEad.DateAndTimeOfValidationOfEad"}),
    //....
        #"Expanded Ead" = Table.ExpandTableColumn(#"Expanded FirstTransporterTrader", "Ead", {"LocalReferenceNumber", "InvoiceNumber", "InvoiceDate", "OriginTypeCode", "DateOfDispatch", "TimeOfDispatch"}, {"Ead.LocalReferenceNumber", "Ead.InvoiceNumber", "Ead.InvoiceDate", "Ead.OriginTypeCode", "Ead.DateOfDispatch", "Ead.TimeOfDispatch"}),
        #"Expanded HeaderEad" = Table.ExpandTableColumn(#"Expanded Ead", "HeaderEad", {"SequenceNumber", "DateAndTimeOfUpdateValidation", "DestinationTypeCode", "JourneyTime", "TransportArrangement"}, {"HeaderEad.SequenceNumber", "HeaderEad.DateAndTimeOfUpdateValidation", "HeaderEad.DestinationTypeCode", "HeaderEad.JourneyTime", "HeaderEad.TransportArrangement"}),
    //...
        #"Expanded BodyEad" = Table.ExpandTableColumn(#"Expanded MovementGuarantee", "BodyEad", {"BodyRecordUniqueReference", "ExciseProductCode", "CnCode", "Quantity", "GrossWeight", "NetWeight", "CommercialDescription", "Package"}, {"BodyEad.BodyRecordUniqueReference", "BodyEad.ExciseProductCode", "BodyEad.CnCode", "BodyEad.Quantity", "BodyEad.GrossWeight", "BodyEad.NetWeight", "BodyEad.CommercialDescription", "BodyEad.Package"}),

    //...

    // Adjust types where needed

        #"Changed Type1" = Table.TransformColumnTypes(#"Expanded TransportDetails.ComplementaryInformation",{{"ConsigneeTrader.Traderid", type text}, //...

    4/ function 'Transform File from <your query>' : no changes

    5/ <your query> : invokes the function - no changes

    In some subtables it is possible that there are multiple records. This results in multiple record lines in the output but these can be processed (delete duplicates or grouping or leave them for processing in the pivot tables).

    I did not have the time to learn how XML works but I managed to import the data needed.

    I would prefer to make this data driven instead of the XML on the network drive and also finetune this query so that it can cope with future changes in the XML data structure (based on version info).

    All suggestions to improve my query are welcome, or Microsoft can build an improved XML handler into PQ? :-)

    I managed to process the 1638 emails in my mailbox, it takes a little while, but it works.

    Best regards, dirk

    Tuesday, May 30, 2017 7:31 AM