Hey, Scripting Guy! I have a file that I need to parse. It is an XML file, and it is very ugly. Who in the world ever thought that an XML file would be better than a plain CSV file? Unfortunately, Microsoft seems to be enamored with XML, and I cannot get away from it. Surely, there must be a better way to work with XML than to go cross-eyed looking at all those slashes and angle brackets! Can you throw a fellow a bone?
— TS
Hello TS,
Microsoft Scripting Guy Ed Wilson here. It is Friday in Charlotte, North Carolina, in the United States. There was frost on the lawn this morning, and I was tempted to make a mug of hot chocolate and forego my usual pot of English Breakfast tea, but I prefer real hot chocolate (not the powdered stuff) and I did not feel like fooling with the double boiler on a cold November morning. (I use milk, cream, whole vanilla beans, cinnamon sticks, almonds, hazelnuts, honey, and chunks of Mexican chocolate when I make hot chocolate so it is a big production.) So I ate my omelet while sipping my hot tea and checking my e-mail on my Windows Mobile 6.5 smart phone, and then headed up to my office primed and ready for a great day. It is after all no-meetings Friday.
TS, I am sincerely sorry that XML files upset you (my feelings were similar when I first saw XML). If you were here, I would fix you a cup of my special “Scripting Guy Hot Chocolate.” We would sit down, and I would show you that with Windows PowerShell 2.0, XML files need not be a hassle. Because you are not here, I suggest you make yourself some hot chocolate, and open up Windows PowerShell 2.0, and we’ll work through this together.
One thing that makes working with XML documents easier is having the right tool. If you double-click the XML file, the default association is Internet Explorer. This view lets you open and close nodes by clicking the minus signs on the left side of the code, as seen here:
Using Internet Explorer to view an XML file is better than using Notepad. The color highlighting makes it easier to find nodes, and you can hide nodes you are not interested in seeing by clicking the minus (–) symbol to the left of the node. There is a better tool to use; it is the XML Notepad. The XML Notepad is freely downloadable from the Microsoft Download Center. Using the XML Notepad, you can easily figure out the structure of your XML file and find the names of the various nodes. This is shown here:
The books.xml file that is loaded into XML Notepad, seen in the previous image, is included in various software development kits (SDKs), but you do not need to download and install an entire SDK just to gain access to this rather small XML file. You can easily create it by copying it from MSDN, pasting it into Notepad, and saving it as books.xml (You can use normal Notepad or XML Notepad to do this, whichever one you prefer to use.) It is useful to have a well-formed XML file upon which to practice your techniques because if you run into problems, you will at least know the XML file is working properly.
The easiest way to read an XML file is to use the Select-XML cmdlet. You can use the –path parameter to point to an XML file, and the –Xpath parameter to provide the query. The command and associated results are shown here:
PS C:> Select-Xml -Path C:fsobooks.xml -XPath “//title”
Node Path Pattern
—- —- ——-
title C:fsobooks.xml //title
title C:fsobooks.xml //title
title C:fsobooks.xml //title
title C:fsobooks.xml //title
title C:fsobooks.xml //title
title C:fsobooks.xml //title
title C:fsobooks.xml //title
title C:fsobooks.xml //title
title C:fsobooks.xml //title
title C:fsobooks.xml //title
title C:fsobooks.xml //title
title C:fsobooks.xml //title
One of the problems with the output above is that it is useless. If you examine the output, it lists title, the file name, and the search criteria; however, it provides no detail about the title, which is the information we were really interested in obtaining in the first place. To obtain the actual title of the book, you can pipe the results to the Select-Object cmdlet. However, before doing that, you need to know which kind of object you are working with, so that you will know which properties are available to select. To do this, you use the Get-Member cmdlet and choose the property MemberType. The Get-Member command and the resulting properties of the SelectXMLInfo object are in this code list:
PS C:> Select-Xml -Path C:fsobooks.xml -XPath “//title” | Get-Member -MemberType property
TypeName: Microsoft.PowerShell.Commands.SelectXmlInfo
Name MemberType Definition
—- ———- ———-
Node Property System.Xml.XmlNode Node {get;set;}
Path Property System.String Path {get;set;}
Pattern Property System.String Pattern {get;set;}
The SelectXMLInfo object only exposes three properties. From our earlier output, TS, you know that the path refers to the path to the XML file, and the pattern refers to the XPath query; therefore, the node must be the title information. You can use the Select-Object cmdlet to expand the node property. The revised Windows PowerShell command is contained in the code here:
PS C:> Select-Xml -Path C:fsobooks.xml -XPath “//title” | Select-Object -ExpandProperty node
#text
—–
XML Developer’s Guide
Midnight Rain
Maeve Ascendant
Oberon’s Legacy
The Sundered Grail
Lover Birds
Splish Splash
Creepy Crawlies
Paradox Lost
Microsoft .NET: The Programming Bible
MSXML3: A Comprehensive Guide
Visual Studio 7: A Comprehensive Guide
TS, you might be wondering what the XPath parameter is all about. XPath is a way of querying XML files. There are numerous books written about XPath, and hundreds of pages on MSDN, but a good starting point is to take a look a look at the syntax documentation on MSDN. From this page, you can see that the //title query I used told the Select-Xml cmdlet to look for the use of title anywhere within the XML document. If you want to return the name of the root node, you can use the “/*” syntax seen in the listing here:
PS C:> Select-Xml -Path C:fsobooks.xml -XPath “/*”
Node Path Pattern
—- —- ——-
catalog C:fsobooks.xml /*
To find a book with the id attribute that is equal to bk101, you use the // to find all of the books, and then use a set of square brackets to indicate you are looking for an attribute that is equal to a certain value. This command is in the code that follows.
Keep in mind that XML values and attribute names are case sensitive. For example, book must begin with a small “b”, and “id” is lower case, as in “bk101.”
PS C:> Select-Xml -Path C:fsobooks.xml -XPath “//book[@id = ‘bk101’]”
Node Path Pattern
—- —- ——-
book C:fsobooks.xml //book[@id = ‘bk101’]
PS C:>
Well, TS, as you can see, working with XML files does not need to be complicated. Join us tomorrow as XML Week continues.
If you want to know exactly what we will be looking at tomorrow, follow us on Twitter or Facebook. If you have any questions, send e-mail to us at scripter@microsoft.com or post them on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.
Ed Wilson and Craig Liebendorfer, Scripting Guys
0 comments