Hey, Scripting Guy! Is it possible to use Windows PowerShell to groom a text file by deleting extra returns and line feeds it may contain? Here is the deal. I have a file that we need to download each day. This file contains updated prices for commodities used in our manufacturing operation. We use the updated commodity pricing to determine our base cost for profit and loss in our general ledger application (g/l app). A back office application automatically sets our quarterly price and volume discount points based upon the information from the g/l app. Needless to say, this file is mission critical for us.
In the past, when we tried to automate the downloading and updating process, there were several “incidents.” As a result, our management is skittish about anything related to these applications. The problem is that right now this is a time-consuming manual process. If someone goes on vacation, we have to train a replacement. It is a lot of stress and I would love to automate it using Windows PowerShell. Help!
— TB
Hello TB,
Microsoft Scripting Guy Ed Wilson here. It is so hot and steamy outside that the crickets are hibernating. The dragonflies cannot fly because of the high humidity that saturates their thin wings, making them hang limply like a kite left outside during a spring shower. The birds sit in the shade pleading for a breeze. The rabbits wander aimlessly around the backyard vainly seeking a single green blade of grass for a meal.
Late July in Charlotte offers special opportunities and unique challenges for outdoor activities. Most people seem to shuttle from one air-conditioned spot to another with little thought or attention given to the intermediate scenery. I feel as if I would like to take a nap in the afternoon like I did when I was in Spain. The following photo is one I took in Mijas in Southern Spain. Mijas is a really cool old town with narrow streets and lots of interesting shops.
TB, a text file such as the one shown in the following image has a number of returns and line feeds in it. These are seen in between the paragraphs and at the end of the text file. When importing the data from the text file into a database, each line could introduce a blank record into the database and cause the problems you experienced with your application.
To remove the blank lines in the text file involves several steps.
- Rename the text file.
- Read the text file.
- Find lines with text while ignoring blank lines.
- Remove the blank lines from the text.
- Write the revised text to a new file.
- Save the new file with the name of the original text file.
The reason for the roundabout procedure is that, after the text file is opened, a lock will remain on the file that prevents overwriting the file with new content. This is shown in the following image.
To rename the text file, use the Rename-Item cmdlet. You use the path parameter to specify the original file, and the newname parameter for the new name. One thing that is important to keep in mind is that the Rename-Item cmdlet does not have the ability to move a file in the rename operation. If you need to move a file, use the Move-Item cmdlet. Because you cannot move a file with the Rename-Item cmdlet, the inclusion of the path in the newname parameter is redundant. The use of the parameter names themselves is also not required. You can also use the ren alias if you wish. The short version of the command is shown here:
PS C:\> ren C:\fso\text1.txt text1.bak
PS C:\>
The complete command with parameter names is shown here:
Rename-Item -Path C:\fso\Text1.txt -NewName C:\fso\text1.bak
After the backup copy of the file has been created, use the Select-String cmdlet to look for Word characters. This regular expression pattern is documented on MSDN. The easiest way to open a text file and parse the contents is to use the Select-String cmdlet. We talked about using the Select-String cmdlet in yesterday’s Hey, Scripting Guy! Blog. The Select-String cmdlet can be used to search for a wildcard character pattern or for a regular expression pattern. As shown here, the Select-String cmdlet returns both the matching line of text, the file name, and the line number that contains the match:
PS C:\> Select-String -Pattern "\w" -Path C:\fso\Text1.bak
fso\Text1.bak:1:This is a paragraph. This is a paragraph. This is a paragraph. This
is a paragraph.
fso\Text1.bak:2:This is a paragraph. This is a paragraph. This is a paragraph. This
is a paragraph.
fso\Text1.bak:3:This is a paragraph. This is a paragraph. This is a paragraph. This
is a paragraph.
fso\Text1.bak:4:This is a paragraph. This is a paragraph. This is a paragraph. This
is a paragraph.
fso\Text1.bak:6:This is a paragraph. This is a paragraph. This is a paragraph. This
is a paragraph.
<results truncated>
After the matching text is retrieved, use the ForEach-Object cmdlet to choose the matching line of text, and pipe the results to the new file. The complete command is shown here:
PS C:\> Select-String -Pattern "\w" -Path C:\fso\Text1.bak | ForEach-Object { $_.line
} | Set-Content -Path C:\fso\Text1.txt
PS C:\>
After the command has run, the newly created file will contain text only. This is shown in the following image
TB, that is all there is to using Windows PowerShell to remove extra returns and line feeds from a text file. String Week will continue tomorrow when we will talk about selecting specific columns of data from a tab-separated file.
We would love you to follow us on Twitter and Facebook. If you have any questions, send email to us at scripter@microsoft.com, or post your questions on the Official Scripting Guys Forum. See you tomorrow. Until then, peace.
Ed Wilson and Craig Liebendorfer, Scripting Guys
0 comments