How do I retrieve an extremely large range of cells from Excel if the clipboard functions all time out?

Raymond Chen

We’ve been looking at what happens when a program is slow to generate delay-rendered clipboard data, and what you can do to give the program more time. But we got distracted by the question and lost sight of the problem.

The original customer problem was extracting a large range of cells, and their solution was to launch Excel, programmatically select a bunch of cells, and then copy them to the clipboard, so that another program could copy them out.

If you look at it, you’ll realize that this is using a global solution to a local problem. This automated process is using the shared clipboard as a way to transfer data between processes, which means that it will damage whatever was on the clipboard previously. And the user can’t use the clipboard while this automated process is running.

My guess is that the automated process is running on a dedicated computer with no interactive user present, so these problems are deemed inconsequential. But it also means that you want to run two copies of this automation, you need two computers!

I think a better way to solve this problem is to use the Excel object model to load the spreadsheet, select the range, and then copy the values from the range. You can then take those values and put them into your own data structures to do whatever it was that you wanted to do.

using Microsoft.Office.Interop;

var excel = new Excel.Application();
excel.Workbooks.Open("awesome.xls", ReadOnly: true);
var sheet = excel.ActiveWorkbook.ActiveSheet;
var cellA1 = sheet.Cells[1, 1];
Console.WriteLine(cellA1.Text);
excel.Quit();

The advantage of this approach is that you are no longer subject to the whims of the clipboard. You are talking directly to Excel to get the data you want.

Another option is to use the object model to copy the desired range to a new worksheet, and then save the worksheet as CSV (say), and then parse the CSV file, which will be much easier to parse than a full spreadsheet.

Bonus chatter: I suspect their system came together when somebody wanted to automate a tedious business process and cobbled together something that “worked on my machine”, back when the spreadsheet was relatively small, like, maybe a few thousand rows. This lunch-hour hack became popular and eventually became part of the company’s critical workflow. Now that the spreadsheet has grown from a few thousand rows to hundreds of thousands of rows, the “quick and dirty hack” doesn’t quite hold up, and the customer will have to invest in a solution that has better scalability.

10 comments

Discussion is closed. Login to edit/delete existing comments.

  • Valts Sondors 0

    Should be also mentioned: there are nowadays plenty of sturdy open-source libraries out there that can deal with Excel files. That’s both more efficient than spinning up a full Excel process in the background, and cheaper.

    • Dan Bugglin 0

      Yup, the solution Raymond posted is the traditional one but requires Excel installed on the current PC. Plus IIRC the ActiveX object you add to your project is specific to that version (or at least sets a minimum version of Office, I forget). And Intellisense is fairly useless when trying to figure out how things work IIRC.

      I have used OpenXML which will work with the new format *.???x files, though it’s very low level (basically you’re working one level above the underlying raw XML files). ClosedXML is built on top of OpenXML and looks much more user friendly though I have not used it yet.

    • David Wolff 0

      You can even do this in VBScript (with just about the same number of lines of code).

    • Raymond ChenMicrosoft employee 0

      True. I was assuming the customer really needed full Excel, say, because the file contains macros to drive a database connection and fetch the 300,000 rows of data. After all, if the data were static, they could just “Save As: CSV” and read it out that way.

    • Dmitry 0

      The disadvantage is that custom solutions based on 3rd party libraries might become incapable of handling files of newer versions due to differences and quirks of newer Excel implementation vs custom one. Plus, the .xls format is still a thing for many reasons.

      • Mike Morrison 0

        Open-source libraries can be updated to match any future changes in the Office XML file formats. .XSLX and .XLSM also replace the crusty old XLS format nearly everywhere.

    • Alejandro Carrazzoni 0

      Back at my previous job, there was an issue where Devexpress generated corrupt .xls files that Excel could open fine, but open-source libraries couldn’t. The fix was to open the .xls file in Excel using the object model as shown here and save it as CSV, then parse the CSV.

  • Neil Rashbrook 0

    Maybe this business process is so old it was written in a language that didn’t support COM and instead worked by sending keystrokes to Excel? (At least the export to CSV option is still doable this way, I guess.)

  • Erik Fjeldstrom 0

    For data transmission, there’s also good old crufty DDE. Many fun times were had communicating with Excel over that!

  • Ismo Salonen 0

    I once did something like this: via COM add vba code which writes the data to clipboard as xml and then posts message to the waiting app after it is ready.
    No timeout but the problem with two simultabeous programs rinning is still present ( but still theoretical)-
    it could use some other medium than clipboard for data transfer but no problems in 20 years 🙂

Feedback usabilla icon