none
cross apply xml query performs exponentially worse as xml document grows

    Question

  • What I Have
    I have a variable size XML document that needs to be parsed on MSSQL 2008 R2 that looks like this:

    <data item_id_type="1" cfgid="{4F5BBD5E-72ED-4201-B741-F6C8CC89D8EB}" has_data_event="False">
      <item name="1">
        <field id="{EA032B25-19F1-4C1B-BDDE-3113542D13A5}" type="2">0.506543009706267</field>
        <field id="{71014ACB-571B-4C72-9C9B-05458B11335F}" type="2">-0.79500402346138</field>
        <field id="{740C36E9-1988-413E-A1D5-B3E5B4405B45}" type="2">0.0152649050024924</field>
      </item>
      <item name="2">
        <field id="{EA032B25-19F1-4C1B-BDDE-3113542D13A5}" type="2">0.366096802804087</field>
        <field id="{71014ACB-571B-4C72-9C9B-05458B11335F}" type="2">-0.386642801354842</field>
        <field id="{740C36E9-1988-413E-A1D5-B3E5B4405B45}" type="2">0.031671174184115</field>
      </item>
    </data>

    What I Want
    I need to transform it into a regular table type dataset that looks like this:

    What Works
    This "cross apply" query creates the desired output:

    create table #temp (x xml)
    
    insert into #temp (x)
    values ('
      <data item_id_type="1" cfgid="{4F5BBD5E-72ED-4201-B741-F6C8CC89D8EB}" has_data_event="False">
        <item name="1">
          <field id="{EA032B25-19F1-4C1B-BDDE-3113542D13A5}" type="2">0.506543009706267</field>
          <field id="{71014ACB-571B-4C72-9C9B-05458B11335F}" type="2">-0.79500402346138</field>
          <field id="{740C36E9-1988-413E-A1D5-B3E5B4405B45}" type="2">0.0152649050024924</field>
        </item>
        <item name="2">
          <field id="{EA032B25-19F1-4C1B-BDDE-3113542D13A5}" type="2">0.366096802804087</field>
          <field id="{71014ACB-571B-4C72-9C9B-05458B11335F}" type="2">-0.386642801354842</field>
          <field id="{740C36E9-1988-413E-A1D5-B3E5B4405B45}" type="2">0.031671174184115</field>
        </item>
        <item name="3">
          <field id="{EA032B25-19F1-4C1B-BDDE-3113542D13A5}" type="2">0.883962036959074</field>
          <field id="{71014ACB-571B-4C72-9C9B-05458B11335F}" type="2">-0.781459993268713</field>
          <field id="{740C36E9-1988-413E-A1D5-B3E5B4405B45}" type="2">0.228442351572923</field>
        </item>
      </data>
    ')
    
    select c.value('(../@name)','varchar(5)') as item_name
          ,c.value('(@id)','uniqueidentifier') as field_id
          ,c.value('(@type)','int') as field_type
          ,c.value('(.)','nvarchar(15)') as field_value
    from   #temp cross apply
           #temp.x.nodes('/data/item/field') as y(c)
    
    drop table #temp

    Problem
    When there are a few hundred (or fewer) "<item>" elements in the XML, the query performs just fine.  However, when there are 1,000 "<item>" elements, it takes 24 seconds to finish returning the rows in SSMS.  When there are 6,500 "<item>" elements, it takes about 20 minutes to run the "cross apply" query.  We could have 10-20,000 "<item>" elements.

    Questions
    What makes the "cross apply" query perform so poorly on this simple XML document, and perform exponentially slower as the dataset grows?

    Is there a more efficient way to transform the XML document into the tabular dataset (in SQL)?        

    Friday, June 13, 2014 3:55 AM

Answers

  • Adding an XML index did the trick.  Now the 6,500 records that took 20 minutes to run takes < 4 seconds.

        create table #temp (id int primary key, x xml)
        create primary xml index idx_x on #temp (x)
        
        insert into #temp (id, x)
        values (1, '
        <data item_id_type="1" cfgid="{4F5BBD5E-72ED-4201-B741-F6C8CC89D8EB}" has_data_event="False">
          <item name="1">
            <field id="{EA032B25-19F1-4C1B-BDDE-3113542D13A5}" type="2">0.506543009706267</field>
            <field id="{71014ACB-571B-4C72-9C9B-05458B11335F}" type="2">-0.79500402346138</field>
            <field id="{740C36E9-1988-413E-A1D5-B3E5B4405B45}" type="2">0.0152649050024924</field>
          </item>
          <item name="2">
            <field id="{EA032B25-19F1-4C1B-BDDE-3113542D13A5}" type="2">0.366096802804087</field>
            <field id="{71014ACB-571B-4C72-9C9B-05458B11335F}" type="2">-0.386642801354842</field>
            <field id="{740C36E9-1988-413E-A1D5-B3E5B4405B45}" type="2">0.031671174184115</field>
          </item>
          <item name="3">
            <field id="{EA032B25-19F1-4C1B-BDDE-3113542D13A5}" type="2">0.883962036959074</field>
            <field id="{71014ACB-571B-4C72-9C9B-05458B11335F}" type="2">-0.781459993268713</field>
            <field id="{740C36E9-1988-413E-A1D5-B3E5B4405B45}" type="2">0.228442351572923</field>
          </item>
        </data>
        ')
        
        select c.value('(../@name)','varchar(5)') as item_name
              ,c.value('(@id)','uniqueidentifier') as field_id
              ,c.value('(@type)','int') as field_type
              ,c.value('(.)','nvarchar(15)') as field_value
        from   #temp cross apply
               #temp.x.nodes('/data/item/field') as y(c)
        
        drop table #temp



    • Edited by jimvern Friday, June 13, 2014 8:04 AM
    • Marked as answer by Naomi NModerator Friday, June 13, 2014 3:24 PM
    Friday, June 13, 2014 4:48 AM