April 5th, 2010

Hey, Scripting Guy! How Can I Search a Microsoft Excel Workbook for Links to Other Workbooks?

Bookmark and Share 

  Hey, Scripting Guy! Question

Hey, Scripting Guy! I have a user in our finance department that asked me today about using a Windows PowerShell script to identify Microsoft Excel workbooks that have links inside them to other Microsoft Excel workbooks. Frankly, I am not exactly sure what she is talking about, but I do think it is great that she asked me about Windows PowerShell. Can you help me out? I do not want this user to get all depressed the first time she attempts to do something with Windows PowerShell.

— SV

 

Hey, Scripting Guy! AnswerHello SV,

Microsoft Scripting Guy Ed Wilson here. It has been many years since I created Microsoft Excel workbooks that contained links to other workbooks, or more accurately, links to spreadsheets in other workbooks. I believe I know the reason your user is asking for such a script, and that is because according to the Microsoft Excel online Help, there is no automated method of detecting such links. But if one of the workbooks becomes moved, renamed, or deleted, the workbook containing the link will not work properly. This is shown in the following image.

Image of broken workbook

 

The link to a spreadsheet in another Microsoft Excel workbook looks something like this: =’C:fso[cdRoms.xlsx]Sheet1′!$A$1. Therefore, it will be possible to write a Windows PowerShell script that will search a workbook for formulas that match that pattern. The Get-LinkedExcelSheets.ps1 script is an example of a script that does this. The complete Get-LinkedExcelSheets.ps1 script is shown here.

Get-LinkedExcelSheets.ps1

$path = “C:fso”
$excelSheets = Get-Childitem -Path $path -Include *.xls,*.xlsx -Recurse
$excel = New-Object -comobject Excel.Application
$excel.visible = $false

foreach($excelSheet in $excelSheets)
{
 $workbook = $excel.Workbooks.Open($excelSheet)
 “There are $($workbook.Sheets.count) sheets in $excelSheet”

 For($i = 1 ; $i -le $workbook.Sheets.count ; $i++)
 {
  $worksheet = $workbook.sheets.item($i)
  “`tLooking for links on $($worksheet.name) worksheet”
  $rowMax = ($worksheet.usedRange.rows).count
  $columnMax = ($worksheet.usedRange.columns).count
  For($row = 1 ; $row -le $rowMax ; $row ++)
  {
   For($column = 1 ; $column -le $columnMax ; $column ++)
    {
     [string]$formula = $workSheet.cells.item($row,$column).formula
     if($formula -match “w?:\w*\[w*.xlsw?]”) {“`t`t$($formula)”}
    } #end for $column
   } #end for $row
  $worksheet = $rowmax = $columnMax = $row = $column = $formula = $null
 } #end for
 $workbook.saved = $true
 $workbook.close()
} #end foreach
 
 $excel.quit()
 $excel = $null
 [gc]::collect()
 [gc]::WaitForPendingFinalizers()

For more information about using Windows PowerShell with the Microsoft Excel automation model, refer to the Microsoft Excel articles on the Hey, Scripting Guy! Blog.

To illustrate working with linked Microsoft Excel spreadsheets, I created a couple of Microsoft Excel workbooks containing spreadsheets with links to other spreadsheets in other workbooks. This is shown in the following image.

Image of Microsoft workbook with links

 

The first thing to do in the Get-LinkedExcelSheets.ps1 script is to assign the path to the folder that contains Microsoft Excel workbooks. The path is assigned to the $path variable. Next, use the Get-ChildItem cmdlet to retrieve all of the Microsoft Excel spreadsheets. The include parameter includes all of the legacy XLS workbooks as well as the new XLSX workbooks. The recurse parameter is required to make the Get-ChildItem cmdlet return all files that match the search criteria in the folder. This section of the script is shown here:

$path = “C:fso”
$excelSheets = Get-Childitem -Path $path -Include *.xls,*.xlsx -Recurse

Now it is time to create an instance of the Excel.Application COM object. To do this, use the New-Object cmdlet and specify the name of the COM object. The application object is the main object that is used when working with Microsoft Excel automation. Because there are possibly many Microsoft Excel workbooks in the folder, it is not desirable to have the Microsoft Excel application appearing and disappearing during the running of the script. Therefore, the visible property from the Microsoft Excel application object is set to $false. This is shown here:

$excel = New-Object -comobject Excel.Application
$excel.visible = $false

When the Excel.Application object has been created, and the collection of Microsoft Excel workbooks has been generated, it is time to walk through the collection. To do this, use the ForEach statement as shown here:

foreach($excelSheet in $excelSheets)
{

Inside the ForEach statement, use the Open method from the Workbooks collection of the Excel.Application object. Pass the path that is stored in the $excelSheet variable to the Open method, and store the returned workbook object in the $workbook variable as shown here:

 $workbook = $excel.Workbooks.Open($excelSheet)

The sheets object, which is obtained from the workbook object, contains the count property that can be used to determine how many spreadsheets are contained in a Microsoft Excel workbook. This information is displayed in the Windows PowerShell console. A subexpression $() is required to prevent the COM object from unraveling inside the double quotation marks. The subexpression forces the value of the expression to be processed before the string is displayed. This is shown here:

 “There are $($workbook.Sheets.count) sheets in $excelSheet”

Now it is time to walk through the collection of spreadsheets. To do this, the For statement is used as shown here:

 For($i = 1 ; $i -le $workbook.Sheets.count ; $i++)

The item method from the sheets collection is used to retrieve a specific spreadsheet. Because the For statement will continue to operate once for each of the total number of spreadsheets in the collection, the $i variable is used to keep track of where the operation is within the collection. The worksheet object is stored in the $worksheet variable as shown here:

 {
  $worksheet = $workbook.sheets.item($i)

Next, a status message is displayed on the Windows PowerShell console. The `t character is a special character that causes the text to tab over one stop. A subexpression is used to retrieve the name of the worksheet via the name property. This is shown here:

  “`tLooking for links on $($worksheet.name) worksheet”

Now the number of rows and columns that are used within the spreadsheet are calculated via the count property of the range object. The number of rows and the number of columns are assigned to variables as shown here:

  $rowMax = ($worksheet.usedRange.rows).count
  $columnMax = ($worksheet.usedRange.columns).count

For each used row and each used column, the value of the formula property of the cell is retrieved and stored in the $formula variable. The [string] type is used to ensure the formula is returned as a string so that a regular expression pattern match can be used to look for linked spreadsheets. This is shown here:

  For($row = 1 ; $row -le $rowMax ; $row ++)
  {
   For($column = 1 ; $column -le $columnMax ; $column ++)
    {
     [string]$formula = $workSheet.cells.item($row,$column).formula

The If statement is used to see if the formula stored in the $formula variable matches a particular regular expression pattern. The regular expression pattern is not particularly complicated, but it is a bit cumbersome. The regular expression pattern is shown here:

w?:\w*\[w*.xlsw?]

A useful breakdown of the pattern is shown in the following table.

Regular expression

Meaning

w

a word character [a-zA-Z_0-9]

?

one instance of preceding character

:

literal character

escapes a backslash

the backslash character

w

a word character [a-zA-Z_0-9]

Author

0 comments

Discussion are closed.

Feedback