October 13th, 2008

Hey, Scripting Guy! How Can I Read a CSV File?

Hey, Scripting Guy! Question

Hey, Scripting Guy! I am new to scripting, but I have been tasked with creating a script anyway. What I need to do is to read in a CSV file and then perform an operation on each line in the file. I do not know what the correct cmdlet is or what action I need to use. Can you point me in the right direction?

– DF

SpacerHey, Scripting Guy! Answer

Hi DF,

Point you in the right direction? Sure. As one of my old university professors once said, “Go west, young man.” (No, I am not from the nineteenth century. He was quoting someone else.) Because you asked about a cmdlet, it sounds as if you are attempting to use a Windows PowerShell script. The nice thing about Windows PowerShell is that there is a cmdlet called Import-CSV that will automatically read a CSV (comma separated value) file. This is much easier to do than what we needed to do with VBScript, which would have required creating an instance of the FileSystemObject, iterating through it, using split to break the line into an array and then work with each element. There are numerous examples of using this technique on the Script Center.

To use the Import-CSV cmdlet you need to first give it the path to the text file. We next use the ForEach-Object cmdlet to walk through the array of text and then we access the values via the name of the column headers. While we are hanging out inside the ForEach loop we can do anything we wish with the values we retrieve from the CSV file. This technique is illustrated in the ReadCsvDoWmi.ps1 script where we read a text file that contains a listing of WMI classes and a listing of computer names. We take the values from the CSV file and run a bunch of WMI queries. The ReadCsvDoWmi.ps1 script is seen here:

$path = "c:\fso\Classes.csv"
Import-csv -path $path | 
foreach-object `
{ 
  Get-WmiObject -class $_.class -computername $_.computername 
}

The CSV file we are going to use in the ReadCSVDoWmi.ps1 script is seen in Figure 1. As you can see, the first row of the file is the column headers which in this case define two values: the Class and the ComputerName. The values we have in the class column: WIN32_Bios, WIN32_Bus, WIN32_UsbHub, and WIN32_Share are just a few of the more than 1,000 WMI classes contained in a standard installation of any modern version of Microsoft Windows.

Incidentally, if you are wondering why I chose these classes, it is because they are some of the shorter named WMI classes in the default root\cimv2 WMI namespace and therefore they are easy to type. Here is the PowerShell command I used to find these WMI classes with short names:

Get-WmiObject -List | Where-Object { $_.name.length -le 11 -AND $_.name -notmatch '^cim' }

To make the script easy to test on a single computer, I used various names for the local computer:localhost, loopback, and the IP address for loopback, 127.0.0.1. Without further ado (we talked about ADO last week anyway), here is this masterpiece of lazy ingenuity called a CSV file:

Figure 1 The CSV file used in the ReadCsvDoWmi.ps1 script

Figure 1  The CSV file used in the ReadCsvDoWmi.ps1 script

 

Let’s take an in-depth look at the ReadCsvDoWmi.ps1 script. It begins with a straightforward value assignment to a variable. All variables in Windows PowerShell begin with the dollar sign (which makes them really inexpensive outside the USA). The advantage of this convention is that whenever you see something with a dollar sign in front of it, guess what? It is a variable. One of the things I used to run into when teaching VBScript was that beginning students often had a problem differentiating between a language statement, program ID, and a variable. Why? Well, first of all, when you use the Hungarian naming standard, all your variable names look like weird programming things. No one would deliberately create a variable named objfso, would they? All the variables I learned in algebra were things like “x.” In Windows PowerShell, we avoid that confusion—all variables begin with a dollar sign. So, we create a variable named $path and guess what? We assign to it the path to a CSV file. Here is the line of code that does this:

$path = "c:\fso\Classes.csv"

Now we want to read the CSV file. To do this we use the Import-CSV cmdlet while specifying the path to the CSV file. We end the line of code by using the pipeline (“|”) character. The pipeline in Windows PowerShell is used to pass data from the CSV file to the next command. This line of code is seen here:

Import-csv -path $path |

The advantage of using the pipeline is that it is faster, is more efficient, and uses less memory. This is very important when you are reading files that are larger than our classes.csv file. The traditional approach to our problem of reading a CSV file would be the ReadCsvDoWmiDoNotUsePipeline.ps1 script that is seen following this paragraph. Note that in this script, we read the contents of the CSV file and store the results in the $csv variable. This means we cannot proceed to the next line of code until we have completed reading the entire contents of the CSV file. This also means the entire contents of the CSV file are stored in memory. When we have completed that operation, we proceed to iterate through it in much the same fashion as our original script:

ReadCsvDoWmiDoNotUsePipeline.ps1

$path = "c:\fso\Classes.csv"
$csv = Import-csv -path $path
foreach($line in $csv)
{ 
  Get-WmiObject -class $line.class -computername $line.computername 
}

But by using the pipeline, we immediately begin operating on the contents of the CSV file as they are dragged from the hard disk. This means the script will be able to complete faster, and it also means the script will use less memory during the processing of a large CSV file. In the case of our small CSV file, the “bad” ReadCsvDoWmiDoNotUsePipeline.ps1 script takes more time to run than the script that uses the pipeline. But not very much! This is seen here:

Figure 2 Relative times in milliseconds to run the two scripts

Figure 2  Relative times in milliseconds to run the two scripts

 

Before you start thinking: “Wow, this is cool! I can use this as a bullet item on my review: Modified the ReadCsvDoWmiDoNotUsePipeline.ps1 script and obtained a 6 percent increase in efficiency.” Keep in mind that you should not rely on millisecond results from the Measure-Command cmdlet in Windows PowerShell. It simply is not accurate. If you run the two scripts in question with our specific CSV file, you will obtain as many different results as the number of times you engage in the exercise.

We take the pipelined results from Import-CSV and send them to the Foreach-Object cmdlet. We use the line continuation character (the backtick, grave accent, backward apostrophe—whatever you want to call the silly thing) to allow us to continue the command to the next line. I am doing this because I want to line up my curly brackets (makes the code easier to read). Here is the line in question:

foreach-object `

The next line is a curly bracket as seen here (I get paid by the line):

{

Now we have the option of doing whatever we wish to the values we have obtained from the CSV file. In this example we are performing WMI queries by querying specific WMI classes on various computers. To do this we use the Get-WmiObject cmdlet. One thing you will notice is that I made the column heads in the CSV file the same as the name of the parameters I was planning on using with the Get-WmiObject cmdlet. This makes the code easy to read, and is something I emphasize in my PowerShell Best Practices book.

Get-WmiObject -class $_.class -computername $_.computername

Followed by another curly bracket (you knew it would be coming):

}

DF, I hope this helps you get started reading CSV files with Windows PowerShell. Additionally, the technique of storing computer names and WMI classes in a CSV file and quickly walking through them to make the queries is a pretty cool idea that could be easily expanded upon.

Ed Wilson and Craig Liebendorfer, Scripting Guys

Author

0 comments

Discussion are closed.