How to check if certain XML attribute exists within separate equally named XML nodes?

  • Question

  • Hello,

    I am trying to loop through multiple XML files and check if certain attribute exists and, depending on the result, produce an appropriate output.

    All XML files have an identical structure, and I am extracting information that is stored inside the <input> elements. Here is a sample of the XML content from one of the files, called "lex_csv_millennium.xml":

    	<input name="mainIn" className="com.abc.feed.csv.CsvFileInput">
    		<property name="forceCsvVersion">1.9</property>
    		<property name="allowedExchangesList">XEEE</property>
    		<archiveOnly>true</archiveOnly>
    
    		<filePattern>
    			<marketCode>millennium_lex_b</marketCode>
    			<format>zip</format>
    		</filePattern>
    	</input>
    
    	<input name="ztypeIn" className="com.cde.feed.csv.TarCsvFileInput">
    		<property name="ignoreExtraFieldValues">true</property>
    		<property name="columns">ZTYPE</property>
    				
    		<filePattern>
    			<marketCode>millennium_as_b</marketCode>
    			<format>csv</format>
    		</filePattern>
    	</input>


    So, given the above XML, my script checks whether element <property name="forceCsvVersion"> exists within <input> node. The challenge that I am facing here is that in this sample XML there are 2 <input> nodes - and my script only evaluates the first <input> node and ignores the rest. My code:

    Get-ChildItem -Path 'C:\ps_scripts\configs\*csv*' -Recurse |
    ForEach-Object {
        $xml_file = $_ 
        $content = [xml](Get-Content $xml_file)
        
        $myObject = [PSCustomObject]@{
            BaseName        = $xml_file.BaseName
            forceCsvVersion = $content.SelectNodes('//input[@className="com.abc.feed.csv.CsvFileInput" or
                                                   @className="com.cde.feed.csv.TarCsvFileInput"]/property[@name="forceCsvVersion"]').count -as [bool]
        }
    
        $overlayType = $null
        If ($myObject.forceCsvVersion -eq "True") {
            $overlayType = "New Overlay"
        }
        else {
            $overlayType = "Old Overlay"
        }
    
        $myObject.Basename + " -- " + $overlayType
        
    }


    And the output that is being produced is this:

    ams_csv_kasbank -- Old Overlay
    amt_csv_jpmorgan -- New Overlay
    lex_csv_millennium -- New Overlay


    As seen above, the script found attribute "forceCsvVersion" within the first <input> node of file lex_csv_millennium and did not check the second <input> node.

    Could you please advise how to check whether a certain attribute exists within each of multiple <input> nodes?

    Desired output would be:

    ams_csv_kasbank -- Old Overlay
    amt_csv_jpmorgan -- New Overlay
    lex_csv_millennium -- New Overlay
    lex_csv_millennium -- Old Overlay




    • Edited by Kamokoba Sunday, October 11, 2020 11:43 PM
    Sunday, October 11, 2020 11:41 PM

All replies

  • Here is how to retrieve a specific item.

    $xml.SelectSingleNode('//input/property[@name="forceCsvVersion"]')

    See the following for instructions:

    XPath Tutorial (w3schools.com)



    \_(ツ)_/

    Monday, October 12, 2020 12:37 AM
  • I modified my script so that it uses foreach to loop through every single <input> element within a file:

       


    Get-ChildItem -Path 'C:\ps_scripts\configs\*csv*' -Recurse |
    ForEach-Object {
        $xml_file = $_ 
    
        $content = [xml](Get-Content $xml_file)
        $nodes = $content.SelectNodes("//input")
    
        foreach ($node in $nodes) {
            $forceCsvVersion = $node.SelectNodes('//input[@className="com.abc.feed.csv.CsvFileInput" or
                                                    @className="com.cde.feed.csv.TarCsvFileInput"]/property[@name="forceCsvVersion"]').count -as [bool]
            $xml_file.BaseName + ' -- ' + $forceCsvVersion
        }
    
    }

    However, the output that I get is:

    ams_csv_kasbank -- True
    amt_csv_jpmorgan -- False
    lex_csv_millennium -- True
    lex_csv_millennium -- True

    Instead of the expected:

    ams_csv_kasbank -- True
    amt_csv_jpmorgan -- False
    lex_csv_millennium -- True
    lex_csv_millennium -- False

    It's obvious that file lex_csv_millennium contains 2 <input> elements, but only 1 of those elements actually contains <property name="forceCsvVersion">, so I don't understand why both iterations return True.
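
    A likely explanation, sketched under assumptions: an XPath expression that begins with // is evaluated from the document root even when it is called on an individual node, so every iteration of the loop searches the whole file again. A path relative to the current node (property[...] or ./property[...]) inspects only that node's children. The <root> wrapper and the trimmed-down sample below are illustrative and not taken from the real files.

    ```powershell
    # Illustrative sample: the <root> wrapper is an assumption, the real root element is not shown.
    $content = [xml]'<root>
      <input name="mainIn" className="com.abc.feed.csv.CsvFileInput">
        <property name="forceCsvVersion">1.9</property>
      </input>
      <input name="ztypeIn" className="com.cde.feed.csv.TarCsvFileInput">
        <property name="columns">ZTYPE</property>
      </input>
    </root>'

    $results = foreach ($node in $content.SelectNodes('//input')) {
        # Absolute path: starts at the document root, ignores which <input> we are on.
        $absolute = $node.SelectNodes('//property[@name="forceCsvVersion"]').Count -gt 0
        # Relative path: only <property> children of the current <input>.
        $relative = $node.SelectNodes('property[@name="forceCsvVersion"]').Count -gt 0

        [PSCustomObject]@{
            Input    = $node.GetAttribute('name')
            Absolute = $absolute
            Relative = $relative
        }
    }
    $results
    ```

    For this sample, the Absolute column is True for both inputs, while Relative is True only for mainIn.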

    Monday, October 12, 2020 3:27 PM
  • There is no need to loop anything.  Just query the XML for the existence.  The code I posted does exactly that as you requested.

    Without some clear understanding of PowerShell and XML this will be hard for you to understand.  Test one file until you see what is happening.


    \_(ツ)_/

    Monday, October 12, 2020 6:13 PM
  • I do need to loop as there are multiple <input> elements in a file.

    $xml.SelectSingleNode('//input/property[@name="forceCsvVersion"]') will simply retrieve only the first occurrence of an <input> element that happens to contain the "forceCsvVersion" attribute. If there are other <input> elements, they will be ignored if there is no looping mechanism, regardless of whether they contain the "forceCsvVersion" attribute or not.

    Monday, October 12, 2020 7:22 PM
  • I do need to loop as there are multiple <input> elements in a file.

    $xml.SelectSingleNode('//input/property[@name="forceCsvVersion"]') will simply retrieve only the first occurrence of an <input> element that happens to contain the "forceCsvVersion" attribute. If there are other <input> elements, they will be ignored if there is no looping mechanism, regardless of whether they contain the "forceCsvVersion" attribute or not.

    Your question claims you are only looking for the existence of one in each file.  If you want all that match the criteria then the query will return all matching nodes as a collection.

    To match all nodes use this:

    $xml.SelectNodes('//input/property[@name="forceCsvVersion"]') 

    Again - you need to learn XML and PowerShell in order to understand what I am telling you.  Of course you may have badly stated your question.  There is no way we can determine that.  We can only understand your question as asked.


    \_(ツ)_/


    • Edited by jrv Monday, October 12, 2020 10:20 PM
    Monday, October 12, 2020 10:16 PM
  • Yes, I'm looking for the existence of "forceCsvVersion" attribute within each <input> element in a file, no matter how many <input> elements might be present in a given file. Based on the findings (True or False), my script will perform other operations.

    So, in the earlier mentioned file lex_csv_millennium.xml, there are 2 <input> elements but only 1 of them contains attribute "forceCsvVersion" (within <property> element):

    	<input name="mainIn" className="com.abc.feed.csv.CsvFileInput">
    		<property name="forceCsvVersion">1.9</property>
    		<property name="allowedExchangesList">XEEE</property>
    		<archiveOnly>true</archiveOnly>
    
    		<filePattern>
    			<marketCode>millennium_lex_b</marketCode>
    			<format>zip</format>
    		</filePattern>
    	</input>
    
    	<input name="ztypeIn" className="com.cde.feed.csv.TarCsvFileInput">
    		<property name="ignoreExtraFieldValues">true</property>
    		<property name="columns">ZTYPE</property>
    				
    		<filePattern>
    			<marketCode>millennium_as_b</marketCode>
    			<format>csv</format>
    		</filePattern>
    	</input>

    So the first thing that I do is loop through all XML files in a directory and check how many <input> elements each of the files contains:

    Get-ChildItem -Path 'C:\ps_scripts\configs\*csv*' -Recurse |
    ForEach-Object {
        $xml_file = $_ 
    
        $content = [xml](Get-Content $xml_file)
        $nodes = $content.SelectNodes("//input")
        $xml_file.BaseName + ' -- ' + $nodes
    
    }


    The output that I get after running the script above is this:

    ams_csv_jpmorgan -- input
    ams_csv_kasbank -- input
    ams_csv_societegenerale -- input
    lex_csv_millennium -- input input

    As you can see, the file lex_csv_millennium has 2 "input" elements, in contrast to all the other sample files, which only have 1 "input".

    So, now I need to loop through each <input> element within each file and check whether there is "forceCsvVersion" attribute or not.

    In order to do this, I add a foreach loop so that each <input> element within a file is checked and determine if "forceCsvVersion" attribute exists:

    Get-ChildItem -Path 'C:\ps_scripts\configs\*csv*' -Recurse |
    ForEach-Object {
        $xml_file = $_ 
    
        $content = [xml](Get-Content $xml_file)
        $nodes = $content.SelectNodes("//input")
    	
        foreach ($node in $nodes) {
            $forceCsvVersion = $node.SelectSingleNode('//property[@name="forceCsvVersion"]').count -as [bool]
            $xml_file.BaseName + ' -- ' + $forceCsvVersion
            
        }
    
    }


    What I expect the foreach loop to perform is to iterate through every single occurrence of <input> element and evaluate whether "forceCsvVersion" exists (True or False). So I would expect the following output:

    ams_csv_jpmorgan -- True
    ams_csv_kasbank -- False
    lex_csv_millennium -- True
    lex_csv_millennium -- False

    However, the actual output is:

    ams_csv_jpmorgan -- True
    ams_csv_kasbank -- False
    lex_csv_millennium -- True
    lex_csv_millennium -- True

    So the script says that it found attribute "forceCsvVersion" in both <input> elements, while in reality only 1 of the <input> elements in file "lex_csv_millennium.xml" contains "forceCsvVersion".
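
    A minimal sketch of a loop that would produce the per-input result described above, assuming the only change needed is to make the inner XPath relative to the current <input> node instead of starting it with //. The function name Get-OverlayType, its parameters and the <root> wrapper in the sample are hypothetical; the overlay labels are the ones from the original post.

    ```powershell
    # Hypothetical helper: emits one "<BaseName> -- <overlay>" line per <input> element.
    function Get-OverlayType {
        param(
            [xml]$Content,
            [string]$BaseName
        )

        foreach ($node in $Content.SelectNodes('//input')) {
            # No leading // here, so only children of the current <input> are examined.
            $match = $node.SelectSingleNode('property[@name="forceCsvVersion"]')

            # SelectSingleNode returns $null when nothing matches.
            $overlayType = if ($null -ne $match) { 'New Overlay' } else { 'Old Overlay' }

            "$BaseName -- $overlayType"
        }
    }

    # Illustrative sample: the <root> wrapper is an assumption.
    $sample = [xml]'<root>
      <input name="mainIn" className="com.abc.feed.csv.CsvFileInput">
        <property name="forceCsvVersion">1.9</property>
      </input>
      <input name="ztypeIn" className="com.cde.feed.csv.TarCsvFileInput">
        <property name="columns">ZTYPE</property>
      </input>
    </root>'

    $lines = Get-OverlayType -Content $sample -BaseName 'lex_csv_millennium'
    $lines
    ```

    Inside the file loop, the same function could be called with $content and $xml_file.BaseName.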


    Tuesday, October 13, 2020 12:10 PM
  • The code I posted does exactly what you are asking without using a loop.  It finds all matching nodes.  If none are found the logical value of the code is "false"

    if($xml.SelectNodes('//input/property[@name="forceCsvVersion"]')){
        # true - one or more nodes found.
    }else{
        # false - no matches found
    }
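
    A sketch of the same file-level test with the node count compared explicitly, so the outcome does not depend on how PowerShell converts a node list to a Boolean. The two inline documents are made up for illustration; this answers "does any <input> in the file have the property", not "which <input> has it".

    ```powershell
    # Two illustrative documents: one with the property, one without.
    $withMatch = [xml]'<root><input><property name="forceCsvVersion">1.9</property></input></root>'
    $noMatch   = [xml]'<root><input><property name="columns">ZTYPE</property></input></root>'

    $xpath = '//input/property[@name="forceCsvVersion"]'

    $flags = foreach ($xml in $withMatch, $noMatch) {
        # Count is 0 when nothing matches, so the comparison yields a plain Boolean.
        $xml.SelectNodes($xpath).Count -gt 0
    }
    $flags
    ```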

    I think you don't understand what a node is or how XML works.  You are making bad assumptions.  

    You say you just want to know if any node in the file matches but you then look at all nodes.  Why?  Either you are asking the wrong question or you are not understanding how PowerShell, XML and programming work.


    \_(ツ)_/


    • Edited by jrv Tuesday, October 13, 2020 12:52 PM
    Tuesday, October 13, 2020 12:50 PM
  • "You say you just want to know if any node in the file matches but you then look at all nodes"

    Now, this is getting ridiculous.

    First of all, I never said I want to know if any node in the file contains attribute "forceCsvVersion". I think I emphasized numerous times that the only nodes that I care about are <input> nodes. 

    Secondly, take a closer look at your own sentence:

    "You say you just want to know if any node in the file matches but you then look at all nodes. Why?"

    If my intention was to know if any node in the file matches, shouldn't I look at all the nodes in the file?

    FYI: any_node_in_file = all_nodes_in_file

    Either you intentionally misinterpret my questions or you are simply struggling with expressing your thoughts clearly.


    • Edited by Kamokoba Tuesday, October 13, 2020 6:12 PM
    Tuesday, October 13, 2020 6:11 PM
  • Why?  If any node matches the query will evaluate to true.  Please take some time to learn PowerShell as this would then be obvious.

    \_(ツ)_/

    Tuesday, October 13, 2020 11:18 PM
  • Take some time to test your own proposed solution and you'll find out that it doesn't work in this particular case.

    In the meantime, I've found a neat workaround for this problem, thank you.

    Wednesday, October 14, 2020 11:30 AM
  • Take some time to test your own proposed solution and you'll find out that it doesn't work in this particular case.

    In the meantime, I've found a neat workaround for this problem, thank you.

    I did test it and it does work to accomplish what you asked.  The issue is either that you are not asking the question you think or you have not given correct info and example.

    Here is a demo that shows that any node that meets your criteria works as I stated.

    PS C:\scripts> $xml = [xml]@'
    >> <root>
    >> <input name="mainIn" className="com.abc.feed.csv.CsvFileInput">
    >> <property name="forceCsvVersion">1.9</property>
    >> <property name="allowedExchangesList">XEEE</property>
    >> <archiveOnly>true</archiveOnly>
    >>
    >> <filePattern>
    >> <marketCode>millennium_lex_b</marketCode>
    >> <format>zip</format>
    >> </filePattern>
    >> </input>
    >> <input name="ztypeIn" className="com.cde.feed.csv.TarCsvFileInput">
    >> <property name="ignoreExtraFieldValues">true</property>
    >> <property name="columns">ZTYPE</property>
    >>
    >> <filePattern>
    >> <marketCode>millennium_as_b</marketCode>
    >> <format>csv</format>
    >> </filePattern>
    >> </input>
    >> </root>
    >> '@
    PS C:\scripts> $xml.SelectSingleNode('//input/property[@name="forceCsvVersion"]')
    
    name            #text
    ----            -----
    forceCsvVersion 1.9
    
    
    PS C:\scripts>

    This also demonstrates the same:

    PS C:\scripts> if($xml.SelectSingleNode('//input/property[@name="forceCsvVersion"]')){
    >>    $true
    >> }else{
    >>     $false
    >> }
    True
    PS C:\scripts>

    Don't worry about this as most non-trained in programming and XML have issues understanding what is happening.  Explaining it is very challenging for us as lack of training and experience makes the issues hard to comprehend.

    I recommend taking the time to learn PowerShell to a full level of competence before trying to tackle this level of technical challenge.


    \_(ツ)_/

    Wednesday, October 14, 2020 5:32 PM

  • Your so-called solution works only as a standalone script which is pretty useless. I think I provided my script and explained what I wanted to achieve in great detail, yet you still push your canned answer and try so hard to sound condescending. 

    If you were so proficient in PowerShell you could easily spot what was wrong in my code, but you didn't - so maybe you're not so good at PowerShell and basic XML as you imagine? No wonder if you spend most of the time lurking in this forum and shoving canned answers instead of writing actual code.

    Hint: the problem in my modified script that I posted below the original post was with $forceCsvVersion definition within foreach loop - after deleting certain part on that line the script is working as expected. But of course you won't get what was wrong with that line. 

    Friday, October 16, 2020 5:40 PM
  • My example is not the complete answer it is how to get the information you asked for.  It is up to you to learn the technology and to learn PowerShell.

    You could also spend some time learning how to ask correct technical questions, as we can only respond to what you write and not to what you think.


    \_(ツ)_/

    Friday, October 16, 2020 6:42 PM
  • WRONG! Your example does not get the information I was asking for. My code already gets this information (in a much more elegant way) with .count -as [bool] part.

    Aside from mediocre PowerShell skills you also have pretty poor English text interpretation capabilities as well. SAD!

    Sunday, October 18, 2020 10:17 AM
  • WRONG! Your example does not get the information I was asking for. My code already gets this information (in a much more elegant way) with .count -as [bool] part.

    Aside from mediocre PowerShell skills you also have pretty poor English text interpretation capabilities as well. SAD!

    Without some example of the code you claim so we can evaluate if you really have a better method your claims are not reasonable.  Continued insults will not help your case.


    \_(ツ)_/

    Sunday, October 18, 2020 11:38 AM
  • From my original post (make sure to move slider all the way to the right):

        $myObject = [PSCustomObject]@{
            BaseName        = $xml_file.BaseName
            forceCsvVersion = $content.SelectNodes('//input[@className="com.abc.feed.csv.CsvFileInput" or
                                                   @className="com.cde.feed.csv.TarCsvFileInput"]/property[@name="forceCsvVersion"]').count -as [bool]
        }

    You're welcome

    Monday, October 19, 2020 9:09 PM
    So why is that any different from what I posted?  It is actually a much more complicated way to arrive at the same end.

    One easy simplification is this:

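    # at beginning of script
    # The XPath must still end with the property test; otherwise it only checks that a matching <input> exists.
    $xpath = '//input[@className="com.abc.feed.csv.CsvFileInput" or @className="com.cde.feed.csv.TarCsvFileInput"]/property[@name="forceCsvVersion"]'
    
    #other code
    
    $myObject = [PSCustomObject]@{
        BaseName        = $xml_file.BaseName
        # Cast the match count (0 becomes False) rather than the node list itself.
        forceCsvVersion = [bool]$content.SelectNodes($xpath).Count
    }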

    Also note that I was not rewriting your script - I was just showing you how to use the XPath in a way that is better adapted to the task. I was also trying to get you to understand that you do not need to enumerate every node as the XPath already does that.
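
    For the per-input variant discussed earlier in the thread, one possible arrangement is to let one XPath pick the <input> elements of interest and a second, relative XPath test each of them. This is only a sketch: the variable names, the <root> wrapper and the third input with className com.xyz.Other are invented for illustration.

    ```powershell
    # Selects only <input> elements with one of the two class names from the thread.
    $inputXPath    = '//input[@className="com.abc.feed.csv.CsvFileInput" or @className="com.cde.feed.csv.TarCsvFileInput"]'
    # Relative path: evaluated against each selected <input>, not the whole document.
    $propertyXPath = 'property[@name="forceCsvVersion"]'

    # Illustrative sample; the third <input> is hypothetical and should be filtered out.
    $content = [xml]'<root>
      <input name="mainIn" className="com.abc.feed.csv.CsvFileInput">
        <property name="forceCsvVersion">1.9</property>
      </input>
      <input name="ztypeIn" className="com.cde.feed.csv.TarCsvFileInput">
        <property name="columns">ZTYPE</property>
      </input>
      <input name="otherIn" className="com.xyz.Other">
        <property name="forceCsvVersion">2.0</property>
      </input>
    </root>'

    $rows = foreach ($node in $content.SelectNodes($inputXPath)) {
        [PSCustomObject]@{
            InputName       = $node.GetAttribute('name')
            forceCsvVersion = $node.SelectNodes($propertyXPath).Count -gt 0
        }
    }
    $rows
    ```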

    I was trying to get you to see the task of programming in a more holistic way as that simplifies design.  You also subtly shifted the question as you proceeded which caused a disconnect between your view and my intended hint.  This is what I was referring to that non-programmers do that makes the task and the understanding a challenge when trying to acquire and understand how to solve problems.

    Anyway - good luck and keep studying the process and learn the formalities of coding as they are extremely useful going forward.


    \_(ツ)_/


    • Edited by jrv Tuesday, October 20, 2020 12:10 AM
    Monday, October 19, 2020 11:53 PM