Use PowerShell to Normalize Names Before Data Import

Doctor Scripto

Summary: Use Windows PowerShell to normalize names before importing data.

Microsoft Scripting Guy, Ed Wilson, is here. I have been reading Jane Austen Northanger Abbey this week in the evenings. I really love the way Jane Austen develops her characters. I also like some of the ways the characters express themselves. For example, “Give me but a little cheerful company…”

Indeed. I like to hang out with people who are cheerful, and who have a good attitude. It rubs off. 

This is also how Windows PowerShell is. For example, rather than complaining about the person who designed a poor database—who did not implement strict data input controls, and rather, provided a blank unencumbered text box for data input, I can be cheerful and glad that I have Windows PowerShell to help me out. Indeed, Windows PowerShell makes quick work of text manipulation.

As you will recall from my recent post, Use PowerShell to Read Munged Data, the names in the data import file are in poor shape. The first names and last names are in a single field. In some cases, the names are first name first and last name last. In other cases, they are reversed. Some are uppercase with lowercase, and others are all lowercase. It seems that just about every way of inputting the data has been used to create the file.

In the following image, I can see the different ways that the names appear:

Image of spreadsheet

The first thing I do is import the CSV file and store the created custom objects in the $datain variable. This is shown here:

$datain = import-csv C:\DataIn\DataDump.csv

I now use a Foreach command to walk through the collection of custom objects:

Foreach ($d in $datain)

    {

By looking at my data, I can see that if I have a comma, the name is Last name, First name. First name Last name does not use a comma. So I use the If statement to look for a comma. If there is a comma, I enter a script block to fix the name. This is shown here:

If($d.name -match  ',')

        { 

The easiest way to title cap the name (that is, the first letter is capitalized and the remaining letters in the name are lowercase) is to use the ToTitleCase method from the TextInfo class from the CultureInfo class that is found in the System.Globalization namespace.

Luckily, I can gain access to this by using the Get-Culture cmdlet and accessing the TextInfo property. (Accessing the TextInfo property returns a TextInfo object, and that gives me the ToTitleCase method.) I call this method to title cap the name. I then split the name at the comma (and create two elements in an array). Then I trim the elements to remove any leading or trailing spaces. This line is shown here:

$name = (Get-Culture).TextInfo.ToTitleCase($d.name).Split(',').trim() 

Now I take my elements, plug them into the Lname and Fname fields of a custom object, and I am done with the record. This is shown here:

[PSCustomObject]@{

            Lname = $name[0]

            Fname = $name[1] }}

If the name field does not include a comma, it means the name is First name, Last name. I could also title cap the names if I need to, but for this example, I leave that out. I split the field into two elements of an array, and plug them into custom properties to make a new object.

Note  If not specified, the Split method splits at a blank space.

Here is the code to split the First name, Last name entries:

ELSE {

         $name = $d.Name.Split().trim()

          [PSCustomObject]@{

            Fname = $name[0]

            Lname = $name[1] } }

The complete FixName.ps1 script is shown here:

$datain = import-csv C:\DataIn\DataDump.csv

Foreach ($d in $datain)

    {

      If($d.name -match  ',')

        {

          $name = (Get-Culture).textinfo.ToTitleCase($d.name).Split(',').trim()

          [PSCustomObject]@{

            Lname = $name[0]

            Fname = $name[1] }}

        ELSE {

         $name = $d.Name.Split().trim()

          [PSCustomObject]@{

            Fname = $name[0]

            Lname = $name[1] } }

      } 

When I run it, the following output appears:

Image of command output

That is all there is to using Windows PowerShell to fix user names before importing data. Data Manipulation Week will continue tomorrow when I will talk about fixing the address.

I invite you to follow me on Twitter and Facebook. If you have any questions, send email to me at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.

Ed Wilson, Microsoft Scripting Guy

0 comments

Discussion is closed.

Feedback usabilla icon