none
complex xml RRS feed

  • Question

  • I have gone through this with vb and now powershell.  I am trying to do this without writing a bunch of granular vb code.

    I am trying to read a complex xmlFile (with a schema doc) into tables and then upload those tables into SQL.

    When importing to a dataset the table names are off (and all the datarows appear in one table only).

    The schema tested valid and the xml tested valid against the schema.......see the end for the results

    Schema:

    <?xml version="1.0" encoding="utf-8"?>
    <xs:schema xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    attributeFormDefault="unqualified"
    elementFormDefault="qualified"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
      <xs:element name="products">
        <xs:complexType>
          <xs:sequence>
            <xs:element minOccurs="0" name="books">
              <xs:complexType>
                <xs:sequence>
                  <xs:element minOccurs="0" maxOccurs="unbounded" name="item">
                    <xs:complexType>
                      <xs:sequence>
                        <xs:element minOccurs="0" name="title" type="xs:string" />
                        <xs:element minOccurs="0" name="pageCount" type="xs:string" />
                        <xs:element minOccurs="0" name="price" type="xs:integer" />
                      </xs:sequence>
                    </xs:complexType>
                  </xs:element>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
            <xs:element minOccurs="0" name="fruits">
              <xs:complexType>
                <xs:sequence>
                  <xs:element minOccurs="0" name="item">
                    <xs:complexType>
                      <xs:sequence>
                        <xs:element minOccurs="0" name="type" type="xs:string" />
                        <xs:element minOccurs="0" name="color" type="xs:string" />
                        <xs:element minOccurs="0" name="price" type="xs:string" />
                      </xs:sequence>
                    </xs:complexType>
                  </xs:element>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

    My XML

    <?xml version="1.0" encoding="utf-8"?>
    <!-- Created with Liquid XML 2014 Starter Edition (Trial) 12.2.8.5459 (http://www.liquid-technologies.com) -->
    <products xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="C:\temp\xml\mySchematest.xsd">
        <books>
            <item>
                <title>myFirstBook</title>
                <pageCount>300</pageCount>
                <price>55</price>
            </item>
            <item>
                <title>mySecondBook</title>
                <pageCount>1000</pageCount>
                <price>100</price>
            </item>
        </books>
        <fruits>
            <item>
                <type>orange</type>
                <color>orange like</color>
                <price>5</price>
            </item>
         </fruits>
    </products>

    my PowerShell

    clear-host
    
    $dsXml = new-object "System.Data.DataSet"
    $dsXml.ReadXmlSchema("C:\temp\xml\mySchemaTest.xsd")
    $dsXml.ReadXml("c:\temp\xml\myXmlTest.xml", 'ReadSchema')
    
    
    Write-Host "Table names in dataset"
    Write-Host "======================"
    foreach ($table in $dsXml.Tables)
    {
    	Write-Host $table.TableName
    }
    

    results

    ReadSchema
    Table names in dataset
    ======================
    books
    item
    fruits

    I expected only 2 tables to be created, each with their attributes.
    Instead I got the table "item" with all the attribute names and data

    the tables books and fruits simply have one column called books_id or fruit_id with one value of 0 in each.

    even if I just read the schema I get 3 table names....

    am I supposed to create a granular reader?

    help, please!

    CS

    Wednesday, January 28, 2015 12:19 AM

Answers


  • Both books and fruits contained objects referred to as "item".  Instead they should have referred to "book" and "fruit".

    When this was done several tables were created including "book" and "fruit" and the columns were populated accordingly.

    I am not saying that is right but I am saying that it created the separate tables as desired (both had a common column name called "price" that was not linked.

    Basically, it worked.

    I think I can change the schema to have multiple types and assign those types to books and fruits so they can share the same subtype info but have not tested that.

    If an XML wiz knows better please update this thread.  Here is the final XML, the schema was similarly changed:

    <?xml version="1.0" encoding="utf-8"?>
    <products xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="C:\temp\xml\mySchematest.xsd">
        <books>
            <book>
                <title>myFirstBook</title>
                <pageCount>300</pageCount>
                <price>55</price>
            </book>
            <book>
                <title>mySecondBook</title>
                <pageCount>1000</pageCount>
                <price>100</price>
            </book>
        </books>
        <fruits>
            <fruit>
                <type>orange</type>
                <color>orange like</color>
                <price>5</price>
            </fruit>
         </fruits>
    </products>



    • Edited by CountryStyle Wednesday, January 28, 2015 11:04 AM
    • Marked as answer by CountryStyle Thursday, January 29, 2015 1:03 AM
    Wednesday, January 28, 2015 5:00 AM

All replies

  • You should really beposting this kind of issue into the developers forums.  The schema declares a relation. It will create the tabels.

    ¯\_(ツ)_/¯

    Wednesday, January 28, 2015 1:09 AM
  • YOu are getting the relations and tables but the schema is not correct.


    ¯\_(ツ)_/¯

    Wednesday, January 28, 2015 1:15 AM
  • Schema and data do not even vaguely match.


    ¯\_(ツ)_/¯

    Wednesday, January 28, 2015 1:21 AM
  • Schema and data do not even vaguely match.


    ¯\_(ツ)_/¯

    The tools (liquid XML) say the XML was well formed and the XML also was valid for the schema.

    The schema was valid too (which is the part being tested against powershell)


    • Edited by CountryStyle Wednesday, January 28, 2015 1:32 AM
    Wednesday, January 28, 2015 1:30 AM
  • Schema and data do not even vaguely match.


    ¯\_(ツ)_/¯

    Why are you saying they don't match?

    The string datatype is fine.....the powerShell is reading the schema only for this test.

    Wednesday, January 28, 2015 1:33 AM
  • I see product, fruits, and item  int he schema and books  and fruits inthe XML.  Isee no keys in either andno fields match the schema declaration.


    ¯\_(ツ)_/¯

    Wednesday, January 28, 2015 1:53 AM
  • I see product, fruits, and item  int he schema and books  and fruits inthe XML.  Isee no keys in either andno fields match the schema declaration.


    ¯\_(ツ)_/¯

    The datatypes are irrelevant, string datatype is a catchall.  Just think of possible values for those feilds that you think are int being "a lot" "bundles" "prices" or some dollar format.

    Aside from the datatypes do you see anything that may be amiss?

    Besides....I never even got to the xml part....the schema can load on its own.

    • Edited by CountryStyle Wednesday, January 28, 2015 1:56 AM
    Wednesday, January 28, 2015 1:55 AM
  • YOu declare two tables of item in an implied relation. You get two tables and a relaionship table (detail) called item. What binds fruit to book? Is it just a table of stuff that has mutiople external relations?  What is it bound on?

    type,color, price => title,page, price

    Are they related by price?


    ¯\_(ツ)_/¯

    Wednesday, January 28, 2015 2:01 AM
  • fruits and books.  They are both products in a store....that is it.  Other than that they are mutually exclusive.

    Are you saying I need 2 different XML documents?


    • Edited by CountryStyle Wednesday, January 28, 2015 2:13 AM
    Wednesday, January 28, 2015 2:12 AM
  • I will take out the common word "item" and see what happens.....I don't see why a link must be made.
    Wednesday, January 28, 2015 2:20 AM
  • I am pretty sure you need to have separate XML data files as the schem can reference many structures noit jsu the one used.

    You are loafding them as if they are related tables.  THe dataset loader is trying to create relations.

    See: $dsXml.Relations

    PS C:\scripts> $dsXml.Relations


    ChildColumns        : {books_Id}
    ChildTable          : {0, 0, }
    DataSet             : System.Data.DataSet
    ParentColumns       : {books_Id}
    ParentTable         : {0}
    RelationName        : books_item
    Nested              : True
    ParentKeyConstraint : Constraint1
    ChildKeyConstraint  : books_item
    ExtendedProperties  : {}

    ChildColumns        : {fruits_Id}
    ChildTable          : {0, 0, }
    DataSet             : System.Data.DataSet
    ParentColumns       : {fruits_Id}
    ParentTable         : {0}
    RelationName        : fruits_item
    Nested              : True
    ParentKeyConstraint : Constraint1
    ChildKeyConstraint  : fruits_item
    ExtendedProperties  : {}

    Nothing matches.  How are they related?


    ¯\_(ツ)_/¯

    Wednesday, January 28, 2015 2:21 AM
  • THe schema says you have a dataset relationship called "Products".  It has two tables that calim to describe what a "product" is. 

    ¯\_(ツ)_/¯

    Wednesday, January 28, 2015 2:24 AM
  • I think this is what you are trying to do.  REad it carefuy to see why your schema deosn't work the way you think it does.

    https://msdn.microsoft.com/en-us/library/bfdchewb(v=vs.110).aspx

    Remember that this is a Microsoft schema reference that is used to build schemas compatible with Microsoft datasets and data types.  It is not general purpose.

    Also note that none of this has anything to do with scripting. 


    ¯\_(ツ)_/¯

    Wednesday, January 28, 2015 2:44 AM
  • I changed "item" to "itemId" for fruits and it craeted 1 more table (for 4) and the data was separated correctly.

    So....it looks like I will have to read the xml file in via a query and process each type (book or fruit) separately in their own separate data sets.

    Hopefully won't have to do granular XML work.

    thanks....will look at the other methods.

    cs


    • Edited by CountryStyle Wednesday, January 28, 2015 2:45 AM
    Wednesday, January 28, 2015 2:44 AM
  • No - you have to understandwhat the Microsft "Relationa;" XML dataset schema is intended to be used for.

    Here is how we define a relationship.

    <xs:element name="Order">
      <xs:complexType>
         <xs:sequence>        
           <xs:element name="OrderDetail" />
               <xs:complexType>             
               </xs:complexType>
         </xs:sequence>
      </xs:complexType>
    </xs:element>
    

    Notice we define an OrderDetail to be related to an "Order"

    In you schema you have declared two things.  "apples" and "book".  (fruits).  THey say you can't mix apples and ooks for a reason.  They are not related.  THe title of teh MMS schea is the "MSData Relational Schema".  It is not  general purpose schema.

    Like your earlier issues with loading rows, you need to understand what is happening under the covers.  Read teh MS docs and it will become clear.


    ¯\_(ツ)_/¯

    Wednesday, January 28, 2015 2:52 AM
  • I do know the items can be mixed in the same XML document.....apples and books can be mixed together.  The items are related at the root.

    The XML can be queried with xPath....previously I was going through XML node by node and pulling out children based on matches.

    Wednesday, January 28, 2015 3:02 AM
  • I was trying to figure out how to show you what is happening.  I removed the incorrect schema load which helps (I just noticed you had a schema reference in the XML file)

    Here is what the table looks like:

    PS C:\scripts> $ds.Tables[1]
    
    
    title     : myFirstBook
    pageCount : 300
    price     : 55
    books_Id  : 0
    type      :
    color     :
    fruits_Id :
    
    title     : mySecondBook
    pageCount : 1000
    price     : 100
    books_Id  : 0
    type      :
    color     :
    fruits_Id :
    
    title     :
    pageCount :
    price     : 5
    books_Id  :
    type      : orange
    color     : orange like
    fruits_Id : 0
    
    

    Notice how your schema merges the two tables on an implied relation.  This is similar to a cross-join in SQL.  It is not what you want.


    ¯\_(ツ)_/¯

    Wednesday, January 28, 2015 3:09 AM
  • Maybe a better statement would bethat it is like a denormalized view based on an inferred relationship.


    ¯\_(ツ)_/¯

    Wednesday, January 28, 2015 3:09 AM
  • I do know the items can be mixed in the same XML document.....apples and books can be mixed together.  The items are related at the root.

    The XML can be queried with xPath....previously I was going through XML node by node and pulling out children based on matches.


    All true but totally unrelated to schemas.

    ¯\_(ツ)_/¯

    Wednesday, January 28, 2015 3:11 AM

  • Both books and fruits contained objects referred to as "item".  Instead they should have referred to "book" and "fruit".

    When this was done several tables were created including "book" and "fruit" and the columns were populated accordingly.

    I am not saying that is right but I am saying that it created the separate tables as desired (both had a common column name called "price" that was not linked.

    Basically, it worked.

    I think I can change the schema to have multiple types and assign those types to books and fruits so they can share the same subtype info but have not tested that.

    If an XML wiz knows better please update this thread.  Here is the final XML, the schema was similarly changed:

    <?xml version="1.0" encoding="utf-8"?>
    <products xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="C:\temp\xml\mySchematest.xsd">
        <books>
            <book>
                <title>myFirstBook</title>
                <pageCount>300</pageCount>
                <price>55</price>
            </book>
            <book>
                <title>mySecondBook</title>
                <pageCount>1000</pageCount>
                <price>100</price>
            </book>
        </books>
        <fruits>
            <fruit>
                <type>orange</type>
                <color>orange like</color>
                <price>5</price>
            </fruit>
         </fruits>
    </products>



    • Edited by CountryStyle Wednesday, January 28, 2015 11:04 AM
    • Marked as answer by CountryStyle Thursday, January 29, 2015 1:03 AM
    Wednesday, January 28, 2015 5:00 AM
  • Yes but look at what you havenow:

    PS C:\scripts> $dsXml = new-object "System.Data.DataSet"
    PS C:\scripts> $dsXml.ReadXmlSchema("C:\test\Testschema.xsd")
    PS C:\scripts> $dsXml.ReadXml("c:\test\Test.xml", 'ReadSchema')
    ReadSchema
    PS C:\scripts> $dsXml.Tables|select tablename

    TableName
    ---------
    books
    item  <<<=====
    fruits

    You still have a broken relation.  You need to either changethe schemaor place the data in separate files.

    Read the links I posted to get an understanding what a data relation in an MS dataset is.


    ¯\_(ツ)_/¯

    Wednesday, January 28, 2015 2:07 PM
  • Broken relation is gone......look at the XML....the XSD has the same changes.

    I also think the relationship with item can be broken using an update to the XSD (as opposed to changing the XML to suit a new XSD)

    ReadSchema
    Table names in dataset
    ======================
    store
    books
    book
    fruits
    fruit

    The item relationship gone......the datareader does not think they are the same thing.

    cs

    Wednesday, January 28, 2015 2:42 PM
  • Yes - that was nmmy point.  You needed to changethe XSD to accomodate either  no relation or different tables. 

    OF course you didn't post the updated schema so I did not knowyou had done that.

    Glad you sorted it.


    ¯\_(ツ)_/¯

    Wednesday, January 28, 2015 4:53 PM