Example of how to create a lookup table from XML file

Отвечено Example of how to create a lookup table from XML file

  • Thursday, January 03, 2013 12:11 AM
     
     

    I needed to QUICKLY set up a table from XML containing US states and their abbreviations. I thought I would quickly find an example (with data) for a task that has been done several thousand times... but - nothing. I'm sure it's out there, but I didnt see it...probably buried on page 100 of search.

    Anyway, Like most developers I am way too lazy to type it in myself. After finding a text list of state abbreviations, I performed a simple search replacement to get a file like this:

    <root>

      <row abbr = "AK" name = "Alaska" />

      <row abbr = "AZ" name = "Arizona" />

      <row abbr = "AR" name = "Arkansas" />

      <row abbr = "CA" name = "California" />

    I like examples for a quickly understanding of how to get things done. See the simple example below. (You will be interested in MS documentaion on OPENXML to understand its arguments, and maybe XPATH.  Or just copy and paste this to a new query window, and voila, get a new States table!)

    declare @doc xml

    set @doc =

    '

    <root>

      <row abbr = "AK" name = "Alaska" />

      <row abbr = "AZ" name = "Arizona" />

      <row abbr = "AR" name = "Arkansas" />

      <row abbr = "CA" name = "California" />

      <row abbr = "CO" name = "Colorado" />

      <row abbr = "CT" name = "Connecticut" />

      <row abbr = "DC" name = "District of Columbia" />

      <row abbr = "DE" name = "Delaware" />

      <row abbr = "FL" name = "Florida" />

      <row abbr = "GA" name = "Georgia" />

      <row abbr = "HI" name = "Hawaii" />

      <row abbr = "ID" name = "Idaho" />

      <row abbr = "IL" name = "Illinois" />

      <row abbr = "IN" name = "Indiana" />

      <row abbr = "IA" name = "Iowa" />

      <row abbr = "KS" name = "Kansas" />

      <row abbr = "KY" name = "Kentucky" />

      <row abbr = "LA" name = "Louisiana" />

      <row abbr = "ME" name = "Maine" />

      <row abbr = "MD" name = "Maryland" />

      <row abbr = "MA" name = "Massachusetts" />

      <row abbr = "MI" name = "Michigan" />

      <row abbr = "MN" name = "Minnesota" />

      <row abbr = "MS" name = "Mississippi" />

      <row abbr = "MO" name = "Missouri" />

      <row abbr = "MT" name = "Montana" />

      <row abbr = "NE" name = "Nebraska" />

      <row abbr = "NV" name = "Nevada" />

      <row abbr = "NH" name = "New Hampshire" />

      <row abbr = "NJ" name = "New Jersey" />

      <row abbr = "NM" name = "New Mexico" />

      <row abbr = "NY" name = "New York" />

      <row abbr = "NC" name = "North Carolina" />

      <row abbr = "ND" name = "North Dakota" />

      <row abbr = "OH" name = "Ohio" />

      <row abbr = "OK" name = "Oklahoma" />

      <row abbr = "OR" name = "Oregon" />

      <row abbr = "PA" name = "Pennsylvania" />

      <row abbr = "RI" name = "Rhode Island" />

      <row abbr = "SC" name = "South Carolina" />

      <row abbr = "SD" name = "South Dakota" />

      <row abbr = "TN" name = "Tennessee" />

      <row abbr = "TX" name = "Texas" />

      <row abbr = "UT" name = "Utah" />

      <row abbr = "VT" name = "Vermont" />

      <row abbr = "VA" name = "Virginia" />

      <row abbr = "WA" name = "Washington" />

      <row abbr = "WV" name = "West Virginia" />

      <row abbr = "WI" name = "Wisconsin" />

      <row abbr = "WY" name = "Wyoming" />

    </root>'

    declare @docHandle int;

    exec sp_xml_prepareDocument @docHandle output, @doc

    -- drop states

    select * into States from openxml(@docHandle,'/root/row',1)

          WITH (abbr varchar(2) '@abbr', name varchar(20) '@name')

    -- release to resolve internal memory issues

    exec sp_xml_removedocument @docHandle

    Your countries state names (or any other lookup data) may of course require different column widths. I started with a larger varchar column for name. Then determined the optimal length needed for the states name column. I'm sure you have done this before, but as a handy reference:

    SELECT top 1

          name, max(len(ltrim(name)))

      FROM States group by name order by max(len(ltrim(name))) DESC

    Hope this helps...


    • Edited by DBC_Austin Monday, January 07, 2013 4:51 PM
    •  

All Replies

  • Thursday, January 03, 2013 12:22 AM
     
     Answered Has Code

    You have not included sp_xml_removedocument in your example.  You must always call sp_xml_removedocument when using OPENXML and sp_xml_preparedocument.  Otherwise this can lead to memory issues.  Please correct your example if you intend it as a note for others.

    Using SELECT INTO is fine for knocking up quick tables, but they won't have indexes, a primary key to prevent duplicates and you don't control nullability or the length of the datatypes.  That's why it's better to use CREATE TABLE and INSERT instead.

    From SQL 2005 onwards, you can also do the above using the xml datatype and its methods ( eg .query, .value, .nodes ) which I prefer as it doesn't have the memory issues OPENXML can have, eg

    DECLARE @xml XML
     
    SET @xml = '<root>
      <row abbr="AK" name="Alaska" />
      <row abbr="AZ" name="Arizona" />
      <row abbr="AR" name="Arkansas" />
      <row abbr="CA" name="California" />
      <row abbr="CO" name="Colorado" />
      <row abbr="CT" name="Connecticut" />
      <row abbr="DC" name="District of Columbia" />
      <row abbr="DE" name="Delaware" />
      <row abbr="FL" name="Florida" />
      <row abbr="GA" name="Georgia" />
      <row abbr="HI" name="Hawaii" />
      <row abbr="ID" name="Idaho" />
      <row abbr="IL" name="Illinois" />
      <row abbr="IN" name="Indiana" />
      <row abbr="IA" name="Iowa" />
      <row abbr="KS" name="Kansas" />
      <row abbr="KY" name="Kentucky" />
      <row abbr="LA" name="Louisiana" />
      <row abbr="ME" name="Maine" />
      <row abbr="MD" name="Maryland" />
      <row abbr="MA" name="Massachusetts" />
      <row abbr="MI" name="Michigan" />
      <row abbr="MN" name="Minnesota" />
      <row abbr="MS" name="Mississippi" />
      <row abbr="MO" name="Missouri" />
      <row abbr="MT" name="Montana" />
      <row abbr="NE" name="Nebraska" />
      <row abbr="NV" name="Nevada" />
      <row abbr="NH" name="New Hampshire" />
      <row abbr="NJ" name="New Jersey" />
      <row abbr="NM" name="New Mexico" />
      <row abbr="NY" name="New York" />
      <row abbr="NC" name="North Carolina" />
      <row abbr="ND" name="North Dakota" />
      <row abbr="OH" name="Ohio" />
      <row abbr="OK" name="Oklahoma" />
      <row abbr="OR" name="Oregon" />
      <row abbr="PA" name="Pennsylvania" />
      <row abbr="RI" name="Rhode Island" />
      <row abbr="SC" name="South Carolina" />
      <row abbr="SD" name="South Dakota" />
      <row abbr="TN" name="Tennessee" />
      <row abbr="TX" name="Texas" />
      <row abbr="UT" name="Utah" />
      <row abbr="VT" name="Vermont" />
      <row abbr="VA" name="Virginia" />
      <row abbr="WA" name="Washington" />
      <row abbr="WV" name="West Virginia" />
      <row abbr="WI" name="Wisconsin" />
      <row abbr="WY" name="Wyoming" />
    </root>'
    
    INSERT INTO ...
    SELECT r.c.value('@abbr', 'CHAR(2)'), r.c.value('@name', 'VARCHAR(50)')
    FROM @xml.nodes('root/row') r(c)

  • Monday, January 07, 2013 5:38 PM
     
     Answered

    You have not included sp_xml_removedocument in your example.  You must always call sp_xml_removedocument when using OPENXML and sp_xml_preparedocument. 

    Thanks - yes - need to use sp_xml_removedocument. I've used that with no problems - missed that on copy - thanks.

    As far as your comments against using select into, I think you are incorrect in suggesting an index for a 50 row table. Any index on very small tables will be ignored and SQL Server will perform table scan.

    I usually (almost always) use  int identity as surrogate keys. There are practical application-related reasons for doing so. I dont want to get in a religious discussion of natural vs surrogate keys here, but a US states table is the poster child for using natural key of state name. Here's why:

     1) US state names are going to be unique. State names are natural keys. It would take an act of congress to add a row into this table. 

    2) semantically, any key other than a name for a US state table doesnt make sense, and could be confusing to casual user of table if they mistaken considered the surrogate key to have meaning.

    I think you missed the point of the article - I said I was trying to be quick! This technique is useful for development phase of work as long as one is free from the contraints/standards that may be necessary for large distributed efforts.

    Regardless of the project size - during development I think many will find that SELECT INTO is a just fine. Nullability and other changes can easily be made to a table created by SELECT INTO.

    BTW - you are mistaken about controlling the length of data types using SELECT INTO.

    select

    cast(1 as varchar(255)) [One] , cast('2' as int ) [Two] into example

    • Proposed As Answer by Kr_Mehta101 Monday, January 07, 2013 8:27 PM
    • Marked As Answer by Iric WenModerator Friday, January 11, 2013 2:35 AM
    •  
  • Saturday, April 27, 2013 5:39 PM
     
     
    Thanks this is a good example and it help me in my project.
  • Sunday, April 28, 2013 9:02 PM
     
     

    Why not just load the XML into a text file and use your editor to turn it into a single INSERT INTO statement with a long VALUES() clause?  Lookup tables are simple -- one or more columns for the parameters and one output. The data types are known, etc. 


    --CELKO-- Books in Celko Series for Morgan-Kaufmann Publishing: Analytics and OLAP in SQL / Data and Databases: Concepts in Practice Data / Measurements and Standards in SQL SQL for Smarties / SQL Programming Style / SQL Puzzles and Answers / Thinking in Sets / Trees and Hierarchies in SQL