Convert specific table of excel sheet to JSON using PowerShell

Sonam Rastogi

There is an excellent script on GitHub that helps to convert a full Excel sheet to JSON format using PowerShell. The script expects the table to be at the start of the sheet; that is, to have the first header in the A1 cell.

I had a little different requirement. I had to convert a specific table among various tables available within a sheet in an Excel file as shown in image below.

Screenshot of an Excel sheet showing a table in the middle of a sheet instead of at the start

Our requirement is to read Class 6 student’s data. In the above screenshot, there are multiple sheets within the Excel workbook. There are multiple tables like Class 1, Class 2, and so on inside the Science sheet.

As our requirement is to only read Class 6 student’s data from Science sheet, let’s look closely at how the data is available in the Excel sheet.

  • The name of the class is at row 44.
  • The column headers are on row 45.
  • The data starts from row 46.

Note


The tables can be at any location (any column and any row) within the sheet. The only fixed identifier is ClassName which is Class 6 in this example.

Steps to follow

Follow these steps to see how you can read Class 6 data from Science sheet:

Handle input parameters.

The script accepts 3 parameters:

  • $InputFileFullPath – This is path of the input Excel file.
  • $SubjectName – This is name of the sheet inside the Excel file.
  • $ClassName – This is name of the table within the Excel sheet.
$InputFileFullPath = 'C:\Data\ABCDSchool.xlsx' 
$SubjectName = 'Science' 
$ClassName = 'Class 6'

Open the Excel file and read the Science sheet.

$excelApplication = New-Object -ComObject Excel.Application
$excelApplication.DisplayAlerts = $false
$Workbook = $excelApplication.Workbooks.Open($InputFileFullPath)
$sheet = $Workbook.Sheets | Where-Object { $_.Name -eq $SubjectName }
if (-not $sheet) { throw "Could not find subject '$SubjectName' in the workbook" }

Grab the Class 6 table within the Science sheet to work with.

# Find the cell where Class name is mentioned
$found = $sheet.Cells.Find($ClassName) #find the cell where Class name is mentioned
$beginAddress = $Found.Address(0,0,1,1).Split("!")[1]
$beginRowAddress = $beginAddress.Substring(1,2)
$startHeaderRowNumber = [int]$beginRowAddress + 1 #header row starts 1 row after the class name 
$startDataRowNumber = $startHeaderRowNumber + 1 #student data row starts 1 rows after header row
$beginColumnAddress = $beginAddress.Substring(0,1)
$startColumnHeaderNumber = [BYTE][CHAR]$beginColumnAddress - 65 + 1 #ASCII number of column

Extract the header column names (Logical Seat Location, Actual Seat Location, LAN Port #, Monitor Cable Port, Student Name, Student#, and Room Type)

$Headers          = @{}
$numberOfColumns  = 0
$foundHeaderValue = $true
while ($foundHeaderValue -eq $true) {
    $headerCellValue = $sheet.Cells.Item($startHeaderRowNumber, $numberOfColumns+$startColumnHeaderNumber).Text 
    if ($headerCellValue.Trim().Length -eq 0) {
        $foundHeaderValue = $false
    } else {
        $numberOfColumns++
        if($Headers.ContainsValue($headerCellValue))
        {
            #do not add any duplicate column again.
        }
        else
        {            
            $Headers.$numberOfColumns = $headerCellValue
        }
    }
}

Extract the data (Class 6 student information rows).

$rowNumber = $startDataRowNumber
$finish = $false
while($finish -eq $false)
{
    if ($rowNumber -gt 1) {
        $result = @{}        
        foreach ($columnNumber in $Headers.GetEnumerator()) {
            $columnName = $columnNumber.Value
            $cellValue = $sheet.Cells.Item($rowNumber, $columnNumber.Name+($startColumnHeaderNumber-1)).Value2 # student data row, student data column number
            if($cellValue -eq $null)
            {
                $finish = $true
                break;
            }
            $result.Add($columnName.Trim(),$cellValue.Trim())
        }
        if($finish -eq $false)
        {
            $result.Add("RowNumber",$rowNumber) #adding excel sheet row number for validation        
            $results += $result
            $rowNumber++
        }
    }
}

Create the JSON file and close the Excel file.

$inputFileName = Split-Path $InputFileFullPath -leaf
$jsonOutputFileName = "$($inputFileName.Split(".")[0])-$SubjectName-$ClassName.json"
$jsonOutputFileFullPath = [System.IO.Path]::GetFullPath($jsonOutputFileName) #Output file name will be "ABCDSchool-Science-Class 6.json" 
Write-Host "Converting sheet '$SubjectName' to '$jsonOutputFileFullPath'"
$ignoreOutput = $results | ConvertTo-Json | Out-File -Encoding ASCII -FilePath $jsonOutputFileFullPath
$ignoreOutput = $excelApplication.Workbooks.Close()
$ignoreOutput = [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($excelApplication)

Putting it all together

The full code goes like this:

param (
    # Excel name
    [Parameter(Mandatory=$true)]
    [string]$InputFileFullPath,
    # Sheet name
    [Parameter(Mandatory=$true)]
    [string]$SubjectName,
    # Identifier for the table
    [Parameter(Mandatory=$true)]
    [string]$ClassName
)

#region Open Excel file
$excelApplication = New-Object -ComObject Excel.Application
$excelApplication.DisplayAlerts = $false
$Workbook = $excelApplication.Workbooks.Open($InputFileFullPath)

# Find sheet
$sheet = $Workbook.Sheets | Where-Object { $_.Name -eq $SubjectName }

if (-not $sheet) {
    throw "Could not find subject '$SubjectName' in the workbook"
}
#endregion Open Excel file

#region Grab the table within sheet to work with
# Find the cell where Class name is mentioned
$found           = $sheet.Cells.Find($ClassName)
$beginAddress    = $Found.Address(0, 0, 1, 1).Split('!')[1]
$beginRowAddress = $beginAddress.Substring(1, 2)
# Header row starts 1 row after the class name
$startHeaderRowNumber = [int]$beginRowAddress + 2
# Student data row starts 1 row after header row
$startDataRowNumber = $startHeaderRowNumber + 1
$beginColumnAddress = $beginAddress.Substring(0,1)
# ASCII number of column
$startColumnHeaderNumber = [BYTE][CHAR]$beginColumnAddress - 65 + 1
#endregion Grab the table within sheet to work with

#region Extract Header Columns Name
$Headers          = @{}
$numberOfColumns  = 0
$foundHeaderValue = $true

while ($foundHeaderValue -eq $true) {
    $headerCellValue = $sheet.Cells.Item(
        $startHeaderRowNumber,
        ($numberOfColumns + $startColumnHeaderNumber)
    ).Text

    if ($headerCellValue.Trim().Length -eq 0) {
        $foundHeaderValue = $false
    } else {
        $numberOfColumns++
        if ($Headers.ContainsValue($headerCellValue)) {
            # Do not add any duplicate column again.
        } else {
            $Headers.$numberOfColumns = $headerCellValue
        }
    }
}
#endregion Extract Header Columns Name

#region Extract Student Information Rows
$results   = @()
$rowNumber = $startDataRowNumber
$finish    = $false

while ($finish -eq $false) {
    if ($rowNumber -gt 1) {
        $result = @{}

        foreach ($columnNumber in $Headers.GetEnumerator()) {
            $columnName = $columnNumber.Value
            # Student data row, student data column number
            $cellValue = $sheet.Cells.Item(
                $rowNumber,
                ($columnNumber.Name + ($startColumnHeaderNumber - 1))
            ).Value2

            if ($cellValue -eq $null) {
                $finish = $true
                break
            }

            $result.Add($columnName.Trim(),$cellValue.Trim())
        }

        if ($finish -eq $false) {
            # Adding Excel sheet row number for validation
            $result.Add("RowNumber",$rowNumber)
            $results += $result
            $rowNumber++
        }
    }
}
#endregion Extract Student Information Rows

#region Create JSON file and close Excel file
$inputFileName = Split-Path $InputFileFullPath -leaf
$inputFileName = $inputFileName.Split('.')[0]
# Output file name will be "ABCDSchool-Science-Class 6.json"
$jsonOutputFileName     = "$inputFileName-$SubjectName-$ClassName.json"
$jsonOutputFileFullPath = [System.IO.Path]::GetFullPath($jsonOutputFileName)

Write-Host "Converting sheet '$SubjectName' to '$jsonOutputFileFullPath'"

$null = $results |
    ConvertTo-Json |
    Out-File -Encoding ASCII -FilePath $jsonOutputFileFullPath
$null = $excelApplication.Workbooks.Close()
$null = [System.Runtime.InteropServices.Marshal]::ReleaseComObject(
    $excelApplication
)
#endregion Create JSON file and close Excel file

The output JSON file will look like below:

[
    {
        "Room Type":  "Standard",
        "RowNumber":  46,
        "Student Name":  "Alex",
        "Student#":  "RL45",
        "LAN Port #":  "LAN Port 7-8",
        "Logical Seat Location":  "SL 11",
        "Actual Seat Location":  "Seat43",
        "Monitor Cable Port":  "C-D"
    },
    {
        "Room Type":  "Standard",
        "RowNumber":  47,
        "Student Name":  "Alex",
        "Student#":  "RL45",
        "LAN Port #":  "LAN Port 5-6",
        "Logical Seat Location":  "SL 11",
        "Actual Seat Location":  "Seat43",
        "Monitor Cable Port":  "A-B"
    },
    {
        "Room Type":  "Standard",
        "RowNumber":  48,
        "Student Name":  "John",
        "Student#":  "RL47",
        "LAN Port #":  "LAN Port 3-4",
        "Logical Seat Location":  "SL 11",
        "Actual Seat Location":  "Seat43",
        "Monitor Cable Port":  "C-D"
    },
    {
        "Room Type":  "Standard",
        "RowNumber":  49,
        "Student Name":  "John",
        "Student#":  "RL47",
        "LAN Port #":  "LAN Port 1-2",
        "Logical Seat Location":  "SL 11",
        "Actual Seat Location":  "Seat43",
        "Monitor Cable Port":  "A-B"
    },
    {
        "Room Type":  "Standard",
        "RowNumber":  50,
        "Student Name":  "Victor",
        "Student#":  "RL35",
        "LAN Port #":  "LAN Port 7-8",
        "Logical Seat Location":  "SL 10",
        "Actual Seat Location":  "Seat33",
        "Monitor Cable Port":  "C-D"
    },
    {
        "Room Type":  "Standard",
        "RowNumber":  51,
        "Student Name":  "Victor",
        "Student#":  "RL35",
        "LAN Port #":  "LAN Port 5-6",
        "Logical Seat Location":  "SL 10",
        "Actual Seat Location":  "Seat33",
        "Monitor Cable Port":  "A-B"
    },
    {
        "Room Type":  "Standard",
        "RowNumber":  52,
        "Student Name":  "Honey",
        "Student#":  "RL42",
        "LAN Port #":  "LAN Port 3-4",
        "Logical Seat Location":  "SL 10",
        "Actual Seat Location":  "Seat33",
        "Monitor Cable Port":  "C-D"
    }
]

Feel free to drop your feedback and inputs on this page. Till then, Happy Scripting!!!

Read more about PowerShell-Community

4 comments

Discussion is closed. Login to edit/delete existing comments.

  • Mark Pflug 0

    Nice article! I came across a link to this on twitter, and as the author of some data processing libraries, I’m always on the lookout for some challenges that people encounter when dealing with Excel or CSV files. As an exercise, I wrote a C# console application using my libraries to do this same conversion. I’m pretty happy with the result, which can be viewed here: https://gist.github.com/MarkPflug/93936172c7ef32eef5080ec64c540748

    • Sonam RastogiMicrosoft employee 0

      Hi Mark, I am glad you liked it. 🙂

  • KOUJI MATSUNAGA 0

    Great Job

    • Sonam RastogiMicrosoft employee 0

      Thanks 🙂

Feedback usabilla icon