Extract XML data from multiple files in multiple folders

  • Question

  • Hello,

    I am very new to Powershell and I have been forever trying to do what seems to be a fairly easy task for Powershell.  

    I have a location with multiple directories/folders, each containing more directories/folders and files.  I want to find all of the 'files.xml' files that reside in each folder/directory, extract a particular section of the xml data from each one, and send those results to a file.

    I have been successful so far in returning exactly what I need from a single file using this script:

    [xml]$xmlData = Get-Content "m:\VABeach\hillsvirginiabea1962unse\*files.xml"
    $xmlData.files.ChildNodes.Item("4")

    but what I really want is to extract the same data from each one of the 'files.xml' files in every folder.

    I am trying with this script:

    [xml]$xmlData = Get-ChildItem "m:\VABeach\*" -Recurse -Include *files.xml -force
    $xmlData.files.ChildNodes.Item("4")

    I receive the following error:

    You cannot call a method on a null-valued expression.
    At C:\Users\kbucher\PowerShell\XmlDataXtractor_Nprogress.ps1:2 char:31
    + $xmlData.files.ChildNodes.Item <<<< ("4")
        + CategoryInfo          : InvalidOperation: (Item:String) [], RuntimeExcep 
       tion
        + FullyQualifiedErrorId : InvokeMethodOnNull

    I have read what the error is and what it is trying to tell me but I am not getting it.  I have tried using Select-Object and the Select-Xml from examples I found here but then I get other errors that I do not know how to fix.  

    I have tried so many different things that the only thing I have really figured out is that I know just enough to get myself into trouble but not enough to get myself out!

    Could someone who knows what they are doing please help me get this over with?! :)  I know the script doesn't include sending the results to a file only because that is really the least of my worries right now...

     

    Sunday, March 1, 2015 7:03 PM

Answers

  • You need to be more clear about what you are trying to do.

    We need a short sample of the XML file and the script you are actually trying to run but only the part that is failing.  Shrink the script to only the lines that cause the failure.

    I think you may misunderstand what Item is on an XML object.  It cannot be used as an element tag name or there will be a conflict.

    Example:

    PS C:\scripts> $xml=[xml]'<root><itemx>Something</itemx></root>'
    PS C:\scripts> $xml
    
    root
    ----
    root
    
    
    PS C:\scripts> $xml.root
    
    itemx
    -----
    Something
    
    
    PS C:\scripts> $xml.root.itemx
    Something
    PS C:\scripts> $xml.root.item
    
    
    IsSettable          : False
    IsGettable          : True
    OverloadDefinitions : {System.Xml.XmlElement Item(string name) {get;}, System.Xml.XmlElement Item(string localname, string ns) {get;}}
    TypeNameOfValue     : System.Xml.XmlElement
    MemberType          : ParameterizedProperty
    Value               : System.Xml.XmlElement Item(string name) {get;}, System.Xml.XmlElement Item(string localname, string ns) {get;}
    Name                : Item
    IsInstance          : True
    
    
    
    PS C:\scripts> $xml.root.item(0)
    PS C:\scripts> $xml.root.item(1)
    PS C:\scripts> $xml.root.item('itemx')
    
    #text
    -----
    Something
    
    
    PS C:\scripts>
    
    

    Now this:

    PS C:\scripts> $xml=[xml]'<root><item>Something</item></root>'
    PS C:\scripts> $xml.root
    
    
    Name            : root
    LocalName       : root
    NamespaceURI    :
    Prefix          :
    NodeType        : Element
    ParentNode      : #document
    OwnerDocument   : #document
    IsEmpty         : False
    Attributes      : {}
    HasAttributes   : False
    SchemaInfo      : System.Xml.XmlName
    InnerXml        : <item>Something</item>
    InnerText       : Something
    NextSibling     :
    PreviousSibling :
    Value           :
    ChildNodes      : {item}
    FirstChild      : item
    LastChild       : item
    HasChildNodes   : True
    IsReadOnly      : False
    OuterXml        : <root><item>Something</item></root>
    BaseURI         :
    
    
    
    PS C:\scripts> $xml.root.item
    Something
    PS C:\scripts> $xml.root.item(1)
    PS C:\scripts> $xml.root.item('4')
    PS C:\scripts>

    Now look at this:

    PS C:\scripts> $xml=[xml]'<root><item>3 Something</item><item>2 Something</item><item>1 Something</item></root>'
    PS C:\scripts> $xml.root
    
    
    Name            : root
    LocalName       : root
    NamespaceURI    :
    Prefix          :
    NodeType        : Element
    ParentNode      : #document
    OwnerDocument   : #document
    IsEmpty         : False
    Attributes      : {}
    HasAttributes   : False
    SchemaInfo      : System.Xml.XmlName
    InnerXml        : <item>3 Something</item><item>2 Something</item><item>1 Something</item>
    InnerText       : 3 Something2 Something1 Something
    NextSibling     :
    PreviousSibling :
    Value           :
    ChildNodes      : {item, item, item}
    FirstChild      : item
    LastChild       : item
    HasChildNodes   : True
    IsReadOnly      : False
    OuterXml        : <root><item>3 Something</item><item>2 Something</item><item>1 Something</item></root>
    BaseURI         :
    
    
    
    PS C:\scripts> $xml.root.item
    3 Something
    2 Something
    1 Something
    PS C:\scripts> $xml.root.item(1)
    PS C:\scripts>

    You cannot use the Item() method if there is a node named "item", and you cannot index into the children with it.  Item() takes an element tag name, not a position.
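
    For a lookup by position, the node list is the thing to index rather than the element itself. Here is a minimal sketch, assuming the same three-item document as above. ChildNodes is an XmlNodeList, whose Item() takes an integer position, and the XPath item[2] is the 1-based equivalent. The variable names are only for illustration.

```powershell
$xml = [xml]'<root><item>3 Something</item><item>2 Something</item><item>1 Something</item></root>'

# XmlNodeList.Item() takes a zero-based position, so 1 is the second child.
$second = $xml.root.ChildNodes.Item(1)
$second.InnerText

# Last child by position: Count - 1.
$last = $xml.root.ChildNodes.Item($xml.root.ChildNodes.Count - 1)
$last.InnerText

# XPath positions are 1-based, so item[2] selects the same node as Item(1).
$viaXPath = $xml.root.SelectSingleNode('item[2]')
$viaXPath.InnerText
```

    This is also why ChildNodes.Item("4") can return a node: there the call goes to the node list, and the "4" is treated as a position.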


    ¯\_(ツ)_/¯

    • Marked as answer by Kelly B2 Sunday, March 1, 2015 9:55 PM
    Sunday, March 1, 2015 7:33 PM
  • To get all nodes named item at any level we would do this:

    PS C:\scripts> $xml=[xml]'<root><item>3 Something</item><item>2 Something</item><item>1 Something</item></root>'
    PS C:\scripts> $xml.SelectNodes('//item')
    
    #text
    -----
    3 Something
    2 Something
    1 Something


    ¯\_(ツ)_/¯

    • Marked as answer by Kelly B2 Sunday, March 1, 2015 9:55 PM
    Sunday, March 1, 2015 7:36 PM
  • Now run this and see what happens:

    $xml=[xml]@'
    <root>
    	<big>
    		<item>Something BIG 1</item><item>Something BIG 2</item><item>Something BIG 3</item>
    	</big>
    	<small>
    		<item>3 Something small </item><item>2 Something small</item><item>1 Something small</item>
    	</small>
    </root>
    '@
    $xml.SelectNodes('//item')
    $xml.SelectNodes('//big/item')

    # we can also do this:
    $xml.SelectNodes('*/big/item[contains(text(),"3")]')
    $xml.SelectNodes('/root/*/item[contains(text(),"3")]')

    Start here: http://www.w3schools.com/xml/default.asp

    Then here: http://www.w3schools.com/xml/xml_xpath.asp


    ¯\_(ツ)_/¯



    • Edited by jrv Sunday, March 1, 2015 7:44 PM
    • Marked as answer by Kelly B2 Sunday, March 1, 2015 9:55 PM
    Sunday, March 1, 2015 7:41 PM
  • The fast way to get the last node (warning: the result may not be predictable across successive loads):

    $xml.files.file[-1]

    The "-1" is PowerShell for the last element of an array.
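
    One caveat worth sketching: the [-1] index relies on $xml.files.file being an array, which is only the case when there are several file elements. Wrapping the expression in @() is a common way to get an array either way. The file names below are made up for illustration.

```powershell
$many = [xml]'<files><file name="a_scandata.xml"/><file name="a_jp2.zip"/></files>'
$one  = [xml]'<files><file name="b_jp2.zip"/></files>'

# @() forces an array even when there is only a single <file> element.
$lastOfMany = @($many.files.file)[-1]
$lastOfOne  = @($one.files.file)[-1]

# GetAttribute() reads the name attribute of whichever node came back.
$lastOfMany.GetAttribute('name')
$lastOfOne.GetAttribute('name')
```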


    ¯\_(ツ)_/¯

    • Marked as answer by Kelly B2 Sunday, March 1, 2015 9:56 PM
    Sunday, March 1, 2015 9:35 PM

All replies

  • The clumsier way is like this:

    PS C:\scripts>  $xml | Select-Xml -XPath '/root/*/item[contains(text(),"3")]'|%{$_.Node}
    
    #text
    -----
    Something BIG 3
    3 Something small
    
    
    Note that it takes more typing to get the same answer.
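
    Where Select-Xml does earn its keep is with files: it can read from a path directly, and each result carries the path it came from. Below is a rough sketch that writes a throwaway file to the temp folder. The file name, its contents, and the XPath are assumptions for illustration, not your real data.

```powershell
# Write a small sample file (hypothetical content) so the sketch runs anywhere.
$path = Join-Path ([System.IO.Path]::GetTempPath()) ('demo_' + [guid]::NewGuid().ToString('N') + '_files.xml')
$sample = '<files><file name="a_scandata.xml"><format>Scandata</format></file>' +
          '<file name="a_jp2.zip"><format>Single Page Processed JP2 ZIP</format></file></files>'
Set-Content -Path $path -Value $sample

# Select-Xml reads and parses the file itself; no [xml] cast is needed.
$hits = Select-Xml -Path $path -XPath '/files/file[format="Scandata"]'

# Each hit exposes the matching node (.Node) and the source file (.Path).
$hits | ForEach-Object { '{0}: {1}' -f $_.Path, $_.Node.GetAttribute('name') }
```

    Since -Path accepts more than one path, the same call can be pointed at a list of files.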


    ¯\_(ツ)_/¯

    Sunday, March 1, 2015 7:46 PM
  • Thank you so much for such a quick reply and I apologize in advance for my lack of understanding, and ability.  I only first heard of Powershell about 5 days ago :( 

    Here is a Sample of the files.xml document that you requested previously (section I need - the last one)

    
    <files>
    <file name="almobile1986polkdirectory_metasource.xml" source="original">
    <format>MARC Source</format>
    <mtime>1405428300</mtime>
    <size>240</size>
    <md5>e5725f49c145557ca976684a9ed1721c</md5>
    <crc32>eba58b7c</crc32>
    <sha1>71558e773490130f6576fb697f0a94f909e68240</sha1>
    </file>
    <file name="almobile1986polkdirectory_scandata.xml" source="original">
    <format>Scandata</format>
    <mtime>1406830899</mtime>
    <size>2331798</size>
    <md5>1061c1778fec0cbfee768055299d200b</md5>
    <crc32>b3b63df0</crc32>
    <sha1>d5ef5e6c21190f1716dc015f903c09330deb1b72</sha1>
    </file>
    <file name="almobile1986polkdirectory_orig_jp2.tar" source="original">
    <format>Single Page Original JP2 Tar</format>
    <mtime>1406830969</mtime>
    <size>2101442560</size>
    <md5>8b34b9a5e6996cc981c93e642e437fcf</md5>
    <crc32>a193c4b3</crc32>
    <sha1>80796aee4b31ffe31522d658d412c456a4a4bb9c</sha1>
    <private>true</private>
    </file>
    <file name="almobile1986polkdirectory.gif" source="derivative">
    <format>Animated GIF</format>
    <original>almobile1986polkdirectory_jp2.zip</original>
    <mtime>1406858053</mtime>
    <size>256107</size>
    <md5>29ed5af0eb38f87b87ec14327ddb8e06</md5>
    <crc32>f3ae1f28</crc32>
    <sha1>e3a01ac5e024b9cb16e1ae0defdbf5c8f266e8c0</sha1>
    </file>
    <file name="almobile1986polkdirectory_jp2.zip" source="derivative">
    <format>Single Page Processed JP2 ZIP</format>
    <original>almobile1986polkdirectory_orig_jp2.tar</original>
    <mtime>1406857732</mtime>
    <size>1974874595</size>
    <md5>fe9af9a3e2b0ac33885e535204148918</md5>
    <crc32>f476482b</crc32>
    <sha1>794e4bbeed26340f8deec35ccb7491b5c8011295</sha1>
    <private>true</private>
    </file>

    This is the part of the code that was causing the error referenced in the original posting:

    $xmlData.files.ChildNodes.Item("4")

    I have also tried

    $xmlData.SelectNodes('//files') 

    You cannot call a method on a null-valued expression.
    At C:\Users\kbucher\PowerShell\XmlDataXtractor_Nprogress.ps1:3 char:21
    + $xmlData.SelectNodes <<<< ('//files')
        + CategoryInfo          : InvalidOperation: (SelectNodes:String) [], Runti 
        meException
        + FullyQualifiedErrorId : InvokeMethodOnNull

    and even

    $xmlData.SelectSingleNode('//files')

    I received basically the same error for all of them.

    I have been to both links you provided (along with a million others) and I am apparently still not grasping the concept.  Can you explain why it works on a single folder that contains the file and extracts exactly what I want with $xmlData.files.ChildNodes.Item("4")  yet the same thing won't work for multiple folders when the files.xml files are exactly the same?

    Thanks again!


    • Edited by Kelly B2 Sunday, March 1, 2015 8:47 PM
    Sunday, March 1, 2015 8:46 PM
  • First, this has nothing to do with PowerShell. It is about XML.  Here is what your XML should look like to be usable:
    <?xml version="1.0" ?>
    <files>
    	<file name="almobile1986polkdirectory_metasource.xml" source="original">
    		<format>MARC Source</format>
    		<mtime>1405428300</mtime>
    		<size>240</size>
    		<md5>e5725f49c145557ca976684a9ed1721c</md5>
    		<crc32>eba58b7c</crc32>
    		<sha1>71558e773490130f6576fb697f0a94f909e68240</sha1>
    	</file>
    	<file name="almobile1986polkdirectory_scandata.xml" source="original">
    		<format>Scandata</format>
    		<mtime>1406830899</mtime>
    		<size>2331798</size>
    		<md5>1061c1778fec0cbfee768055299d200b</md5>
    		<crc32>b3b63df0</crc32>
    		<sha1>d5ef5e6c21190f1716dc015f903c09330deb1b72</sha1>
    	</file>
    	<file name="almobile1986polkdirectory_orig_jp2.tar" source="original">
    		<format>Single Page Original JP2 Tar</format>
    		<mtime>1406830969</mtime>
    		<size>2101442560</size>
    		<md5>8b34b9a5e6996cc981c93e642e437fcf</md5>
    		<crc32>a193c4b3</crc32>
    		<sha1>80796aee4b31ffe31522d658d412c456a4a4bb9c</sha1>
    		<private>true</private>
    	</file>
    	<file name="almobile1986polkdirectory.gif" source="derivative">
    		<format>Animated GIF</format>
    		<original>almobile1986polkdirectory_jp2.zip</original>
    		<mtime>1406858053</mtime>
    		<size>256107</size>
    		<md5>29ed5af0eb38f87b87ec14327ddb8e06</md5>
    		<crc32>f3ae1f28</crc32>
    		<sha1>e3a01ac5e024b9cb16e1ae0defdbf5c8f266e8c0</sha1>
    	</file>
    	<file name="almobile1986polkdirectory_jp2.zip" source="derivative">
    		<format>Single Page Processed JP2 ZIP</format>
    		<original>almobile1986polkdirectory_orig_jp2.tar</original>
    		<mtime>1406857732</mtime>
    		<size>1974874595</size>
    		<md5>fe9af9a3e2b0ac33885e535204148918</md5>
    		<crc32>f476482b</crc32>
    		<sha1>794e4bbeed26340f8deec35ccb7491b5c8011295</sha1>
    		<private>true</private>
    	</file>
    </files>
    


    ¯\_(ツ)_/¯

    Sunday, March 1, 2015 9:28 PM
  • Do not rely on node order in these files.  Nothing guarantees that whatever produces them writes the file entries in the same order every time, so the "last node" is not really predictable.

    The best way is to query by value and choose a useful value.  Here is an example:

    $xml=[xml]@'
    <?xml version="1.0" ?>
    <files>
    	<file name="almobile1986polkdirectory_metasource.xml" source="original">
    		<format>MARC Source</format>
    		<mtime>1405428300</mtime>
    		<size>240</size>
    		<md5>e5725f49c145557ca976684a9ed1721c</md5>
    		<crc32>eba58b7c</crc32>
    		<sha1>71558e773490130f6576fb697f0a94f909e68240</sha1>
    	</file>
    	<file name="almobile1986polkdirectory_scandata.xml" source="original">
    		<format>Scandata</format>
    		<mtime>1406830899</mtime>
    		<size>2331798</size>
    		<md5>1061c1778fec0cbfee768055299d200b</md5>
    		<crc32>b3b63df0</crc32>
    		<sha1>d5ef5e6c21190f1716dc015f903c09330deb1b72</sha1>
    	</file>
    	<file name="almobile1986polkdirectory_orig_jp2.tar" source="original">
    		<format>Single Page Original JP2 Tar</format>
    		<mtime>1406830969</mtime>
    		<size>2101442560</size>
    		<md5>8b34b9a5e6996cc981c93e642e437fcf</md5>
    		<crc32>a193c4b3</crc32>
    		<sha1>80796aee4b31ffe31522d658d412c456a4a4bb9c</sha1>
    		<private>true</private>
    	</file>
    	<file name="almobile1986polkdirectory.gif" source="derivative">
    		<format>Animated GIF</format>
    		<original>almobile1986polkdirectory_jp2.zip</original>
    		<mtime>1406858053</mtime>
    		<size>256107</size>
    		<md5>29ed5af0eb38f87b87ec14327ddb8e06</md5>
    		<crc32>f3ae1f28</crc32>
    		<sha1>e3a01ac5e024b9cb16e1ae0defdbf5c8f266e8c0</sha1>
    	</file>
    	<file name="almobile1986polkdirectory_jp2.zip" source="derivative">
    		<format>Single Page Processed JP2 ZIP</format>
    		<original>almobile1986polkdirectory_orig_jp2.tar</original>
    		<mtime>1406857732</mtime>
    		<size>1974874595</size>
    		<md5>fe9af9a3e2b0ac33885e535204148918</md5>
    		<crc32>f476482b</crc32>
    		<sha1>794e4bbeed26340f8deec35ccb7491b5c8011295</sha1>
    		<private>true</private>
    	</file>
    </files>
    '@
    $fileName='almobile1986polkdirectory_jp2.zip'
    $xml.SelectSingleNode("//file[@name='$filename']")
    

    Of course you can load the XML from a file and do the same thing.
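
    As a sketch of how that could look across many folders: Get-ChildItem returns file objects, not file contents, so the result cannot be cast to [xml] in one go. Each file has to be read and parsed on its own. The demo tree, the "_jp2.zip" match, and the results.csv name below are all assumptions for illustration; with real data $root would be your own top folder.

```powershell
# Build a small throwaway tree so the sketch runs anywhere (hypothetical data).
$root = Join-Path ([System.IO.Path]::GetTempPath()) ('xmldemo_' + [guid]::NewGuid().ToString('N'))
$template = '<files><file name="{0}_scandata.xml"><format>Scandata</format></file>' +
            '<file name="{0}_jp2.zip"><format>Single Page Processed JP2 ZIP</format></file></files>'
foreach ($book in 'bookA', 'bookB') {
    $dir = Join-Path $root $book
    New-Item -ItemType Directory -Path $dir -Force | Out-Null
    Set-Content -Path (Join-Path $dir ($book + '_files.xml')) -Value ($template -f $book)
}

# Find every *files.xml, parse each one separately, and pull out one <file> node.
$results = Get-ChildItem -Path $root -Recurse -Filter '*files.xml' |
    Sort-Object FullName |
    ForEach-Object {
        $doc  = [xml](Get-Content -LiteralPath $_.FullName)
        $node = $doc.SelectSingleNode('//file[contains(@name, "_jp2.zip")]')
        if ($node) {
            New-Object PSObject -Property @{
                Source = $_.FullName
                Name   = $node.GetAttribute('name')
                Format = $node.SelectSingleNode('format').InnerText
            }
        }
    }

# Send the results to a file.
$results | Export-Csv -Path (Join-Path $root 'results.csv') -NoTypeInformation
```

    Querying by value inside the loop keeps the result independent of where the node happens to sit in each file.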


    ¯\_(ツ)_/¯

    Sunday, March 1, 2015 9:32 PM
  • I am playing with everything you provided now to see if the lights will finally pop on :)

    My xml document does look like the sample you posted but (since I am new here) I was unable to post an image/screenshot, so I had to copy and paste.  I think that's what may have messed it up...

    Thanks again for your expertise, it is VERY much appreciated!


    Sunday, March 1, 2015 9:45 PM
  • If you read the info on the two websites I posted it will all become clear.  You only need to read about three pages each.

    ¯\_(ツ)_/¯

    Sunday, March 1, 2015 10:12 PM