What’s up with the `CF_SYLK` and `CF_DIF` clipboard formats?

Raymond Chen

A customer asked, “Please tell us about the CF_SYLK and CF_DIF clipboard formats and if you have information on the specific format to use.”

Okay, let’s start with the first part of the question: What are these things?

The Data Interchange Format is a file format introduced by VisiCalc. VisiCalc was the first electronic spreadsheet for personal computers, a landmark product which expanded the appeal of personal computers into the business world. It was arguably the first “killer app”, the program so compelling that it by itself served as justification for buying a computer. In 1983, VisiCalc ended up being overshadowed by Lotus 1-2-3 and quickly faded into history.

The SYLK format is a file format introduced by Microsoft Multiplan, one of the company’s early forays into integrated productivity software suites. Multiplan didn’t last for long, but it lasted long enough for Windows to add a clipboard format for it. Somewhat ironically, the Microsoft product that replaced Multiplan was Excel, which also served as one of the “killer apps” for that new operating system called Windows.

Now the second part of the question: Which one should we use?

That’s easy: Neither.

There’s no point adding support for these clipboard formats. You’d be interoperating with programs that nobody uses any more.

I asked why the customer was interested in these data formats in particular. After all, there are thousands of data formats out there. Why are they so interested in learning about DIF and SYLK? The customer seems to be running around looking for things, finding out what they are, and then deciding whether those things will help their program.

I can understand reaching DIF when starting with the requirement to support interoperability with VisiCalc: “We need to interoperate with VisiCalc. What data format does it use? Oh, DIF, thanks.”

But starting with DIF and asking what it is used for is going about things backward. Start with product requirements, and then identify the things that will help you achieve those requirements. “Will the DIF data format help us achieve the goal of X?” is a reasonable question. But “Tell me about the DIF data format” is an open-ended question that takes a lot of time to research, and it’s pointless to make someone do all that research only to say, “Oh, then I’m not interested.”

It’s like saying, “The documentation says that if I have a document written in the Hittite language, I can tag it with CF_HITTITE. Where can I find documentation on the Hittite language?”

Why are you writing a document in Hittite? The last native speaker of Hittite died over 3000 years ago. There may be some academic scholars who can read Hittite, but if your intent is to communicate with those scholars, you probably should use a language they are more fluent in. If you’re going to send the document to, say, the Cuneiform Studies department at the University of Chicago, then you probably want to write your document in English.

What is your use case that led you to want to add DIF support? Do you have any customers who need DIF support? The last native speaker of DIF died in 1983.


16 comments


Ian Boyd March 10, 2020

One reason to support DIF and SYLK is that I want to be a good developer and pay my programmer taxes.

If I’m trying to make data available to other applications through an `IDataObject`, and I already support:

– HTML
– Text
– Unicode Text
– PNG
– Bitmap handle

If it’s trivial enough for me to include other formats that might help out another application, I’d like to do that:

– CSV
– DIF
– Sylk
– FileDescriptor
– FileContents
– RenPrivateAppointment
– Gif
– Jfif
John Styles March 2, 2020

I used to like DIF files. I was a bit sad when CSV files displaced them as the lowest common denominator for this sort of data interchange.
I seem to recall that the slightly weird format was so that it would work with the file IO limitations of some very primitive versions of BASIC (specifically if I recall correctly they would get upset if you were expecting a string delimited by “” and got a number or vice versa).
John Styles March 2, 2020

Amusingly
a) Sylk is also the brand name of a product you may not want to search for
b) within living memory (well into the 2010s, though not in Office 365 ProPlus, which is what I’m using), opening a CSV file (with the CSV suffix) whose first line began with ID would give you a weird message about SYLK files.
Neil Rashbrook February 28, 2020

Again, not the clipboard format, but for a project at a company that I used to work for, we needed to be able to generate a file that would automatically open in Excel for the user. As I recall, CSV wasn’t quite automatic enough to keep the user happy, so we ended up creating the file in (I think) SYLK format.
Piotr Siódmak February 26, 2020

Both are documented in this not-so-long list of standard clipboard formats: https://docs.microsoft.com/en-us/windows/win32/dataxchg/standard-clipboard-formats
Emphasis on “documented” and “standard”.
Both have the sparsest descriptions in the list. It looks like the customer was going through this list while implementing clipboard support and had a valid request for more information on the parts that are not well described, but worded it poorly.
- Raymond Chen Author February 26, 2020
  
  What sort of additional description would have helped the customer? Should the documentation say something like “This format is obsolete”? Who’s to say that it’s obsolete? Windows itself never used the format, but some spreadsheet programs still do (probably only reluctantly).
  - Daniel Bishop March 2, 2020
    
    How about something like “Data Interchange Format, a spreadsheet format introduced by Software Arts’ VisiCalc”? Key words being “spreadsheet” and “VisiCalc” (probably a more recognizable name than “Software Arts”).
    
    I’d then expect developers browsing the list to have one of three reactions:
    
    1. My program has nothing to do with spreadsheets, so I’ll ignore it.
    2. VisiCalc? Now, there’s a name I haven’t heard in years! Surely nobody uses that format anymore.
    3. Well, I am working with spreadsheets. Is this “DIF” a format I need to support? Is it compatible with Excel?
    
    And if the customer happened to fall into category #1 or #2, they wouldn’t bother asking you about it.
    
  - Piotr Siódmak February 27, 2020
    
    Not sure about the specific customer, I’m just giving the benefit of the doubt here.
    The main issue, I believe, is that a vendor/application specific format has a constant value in the documentation instead of being in an application-specific range (CF_PRIVATEFIRST…CF_PRIVATELAST ?), so it must be important or a special case, right? Adding that it’s a spreadsheet could help. When implementing clipboard support you want to cover all relevant formats: I want the user to be able to paste images into my application. Let’s see… CF_BITMAP probably, CF_DIB could be, CF_DIBV5 it’s a bitmap so yes I guess, CF_DIF what’s that? Data Interchange Format, but what data? Text data? Pixels? Sound? I’d better ask Microsoft for clarification just to be sure.
    
  - Mike Morrison February 28, 2020
    
    Trawling the documentation for clipboard data formats to support seems like a bad idea. The clipboard formats to support should follow from the purpose of the app. If it’s to work with audio files, then you’d probably want to support CF_WAVE, maybe CF_RIFF, and anything you want to pass along in the CF_PRIVATE range. If support for the DIF and SYLK formats didn’t arise from the development of the program (or from its requirements documents), then why bother querying Microsoft for information on those formats?
  - Erik Fjeldstrom February 27, 2020
    
    Doing a bit of searching generally solves the problem for me: Wikipedia has a fairly comprehensive article on both DIF and SYLK where it’s pretty clear what the formats are. There are cases where the search results are ambiguous or non-existent, but I’d probably check on Stack Exchange or similar before spending one of my limited Microsoft support incidents.
Marshall Wells February 26, 2020

Or maybe it’s an issue of poor naming and poor documentation rather than silly customers doing things backwards.
“Data Interchange Format” and “Symbolic Link format” sound pretty useful when it comes to clipboard operations. I don’t know that I’d have identified them as decades-dead spreadsheet formats either…
Tom Ballard February 26, 2020

In order to leave this comment I had to answer a disturbing question as “Yes”

That said,

https://outflank.nl/blog/2019/10/30/abusing-the-sylk-file-format/
- Matteo Italia February 27, 2020
  
  In less malicious news, I also heard about DIF being used to this day as a trivially generable format, still readable by pretty much every spreadsheet software out there.
- Ian Yates February 26, 2020
  
  Great post! Thanks for sharing it.
John Elliott February 26, 2020

I suppose it’s possible the client wondered what their program ought to do if it was asked to paste from the clipboard and encountered data in one of those formats.
- Mike Morrison February 26, 2020
  
  If the program encounters data in these formats and neither it nor the programmers know anything about these formats, then the program should do nothing. That’s what I would expect of well-written programs when they encounter data in a format that they don’t understand. Better that than copying/interpreting/parsing/mangling data in an unknown format. Things go quickly from bad to worse once you start down that road.
  
  On the other hand, if the program genuinely needs to copy data from the clipboard in those formats, then study up on those formats and write the program to properly handle them.
  
  As stated, the customer wants data about the formats first and will only then decide whether they need to implement code to handle them. “Let me drink from the firehose of info and figure it out for myself.” That may be fine for the programmer, but it’s not good for the person who responds to such a request, for the reasons that Raymond stated. I’ve been the person to respond to such requests (though not for Microsoft) and the time spent gathering the info is much better spent elsewhere.
  
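The advice that runs through the post and the thread (decide which formats your program needs, and do nothing with the rest) can be sketched in a few lines. This is only an illustration, not how any particular program does it: the `pick_format` helper and the `SUPPORTED` list are hypothetical names made up for this example, and the code works on plain lists of format numbers rather than calling the real clipboard API. The numeric values are the ones `winuser.h` assigns to the standard formats.

```python
from typing import Optional, Sequence

# Numeric values of a few standard clipboard formats (from winuser.h).
CF_TEXT = 1
CF_SYLK = 4
CF_DIF = 5
CF_UNICODETEXT = 13

# Hypothetical: the formats this imaginary text editor understands, best first.
# The list comes from what the program is for, not from trawling the documentation.
SUPPORTED = [CF_UNICODETEXT, CF_TEXT]


def pick_format(available: Sequence[int], preferred: Sequence[int]) -> Optional[int]:
    """Return the first entry of `preferred` that the clipboard offers, else None."""
    offered = set(available)
    for fmt in preferred:
        if fmt in offered:
            return fmt
    # Nothing we understand is on the clipboard, so the caller should do nothing.
    return None


# Only spreadsheet formats are offered: there is nothing to paste.
print(pick_format([CF_SYLK, CF_DIF], SUPPORTED))   # None
# Plain text is offered alongside DIF: take the text and ignore the rest.
print(pick_format([CF_DIF, CF_TEXT], SUPPORTED))   # 1
```

Under this sketch, adding DIF or SYLK support would mean adding them to `SUPPORTED` and writing a parser, which is worth doing only if a requirement actually calls for it.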