Is there a way, maybe via R, that makes it possible to process extremely large XML files in Power BI Desktop?

  • Question

  • Is there a way, maybe via R, that makes it possible to process extremely large XML files in Power BI Desktop?

    2.89 GB (3,106,797,097 bytes) and larger.

    This will soon become a problem once the ISO/PC 295 Audit Data Collection standard, which addresses the interoperability issue between various accounting software packages, is finalized.

    When I open this 2.89 GB (3,106,797,097 bytes) file in Power Query as text, it works fine. When I open it with Xml.Tables, it takes far too much time.


    Sorry, the file is confidential.
    Wednesday, October 25, 2017 1:54 PM

Answers

  • Hi Robert,

    Xml.Tables doesn’t currently accept a user schema and so it tries to guess the output schema. This (along with the flattening process) makes it very memory-intensive and basically unsuitable for large documents. Xml.Document should work better for large documents at the expense of having to write code yourself (a minimal sketch follows this reply) – and of course, depending on the transformations you want to do, those could again be memory-intensive.

    I'd recommend you create a request on Power BI Ideas. If you can drum up support for it (e.g. from other folks who will be needing to process the ISO format you mentioned), that will help increase its priority internally.

    Ehren

    Wednesday, November 1, 2017 5:25 PM
    Owner
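
    The following is a minimal sketch of the Xml.Document approach described in the reply above, not a definitive implementation. The file path and the "JournalEntry" element name are placeholders; they would need to be replaced with the real file location and the element names used by the ISO/PC 295 structure.

        let
            // Hypothetical path; Xml.Document parses the XML into a hierarchical table
            // instead of guessing and flattening a schema for the whole document.
            Source  = Xml.Document(File.Contents("C:\Data\audit.xml")),

            // The root row's Value column holds a nested table of the root element's children.
            Root    = Source{0}[Value],

            // "JournalEntry" is a placeholder element name for the records of interest.
            Entries = Table.SelectRows(Root, each [Name] = "JournalEntry"),

            // Keep only the columns that are actually needed rather than expanding everything.
            Trimmed = Table.SelectColumns(Entries, {"Name", "Value"})
        in
            Trimmed

    The point of navigating the hierarchy by hand is that only the elements you select get expanded, whereas Xml.Tables tries to infer and flatten a schema for the entire file.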

All replies

  • Ehren, in the meantime, is there a way via R?
    Thursday, November 2, 2017 7:02 PM
  • Ehren, is there maybe another alternative?
    Thursday, November 2, 2017 7:09 PM
  • Regarding R, I actually have no idea. Perhaps someone on the Power BI Community site would know.

    Maybe you could pre-process the file somehow, using Lines.FromBinary and some custom M to break it into more manageable chunks (see the sketch below)? Lines.FromBinary will also stream, which should help with performance. However, if the file can't really be broken up, then this approach won't work.

    Ehren

    Thursday, November 2, 2017 8:48 PM
    Owner
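
    As a rough illustration of the Lines.FromBinary pre-processing idea mentioned above, here is a minimal sketch. The file path and the "<Amount>" tag are placeholders, and it assumes each element of interest sits on its own line; if the XML is not laid out that way, this line-based filtering won't find the values.

        let
            // Hypothetical path; Lines.FromBinary streams the file as a list of text lines,
            // which avoids loading the whole 3 GB document into memory at once.
            Source   = Lines.FromBinary(File.Contents("C:\Data\audit.xml"), null, null, TextEncoding.Utf8),

            // Crude pre-processing: keep only the lines containing the placeholder element.
            Filtered = List.Select(Source, each Text.Contains(_, "<Amount>")),

            // Pull the text between the opening and closing tags out of each surviving line.
            Values   = List.Transform(Filtered, each Text.BetweenDelimiters(_, "<Amount>", "</Amount>")),

            // Turn the result into a one-column table for further shaping.
            Result   = Table.FromList(Values, Splitter.SplitByNothing(), {"Amount"})
        in
            Result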