Culture data shouldn’t be considered stable

Shawn Steele

I thought I’d start off with a topic I’ve discussed before on my old blog.  It comes up every once in a while, so it doesn’t hurt to have a reminder and update.

“Culture data shouldn’t be considered stable”

Computers like to manipulate data, but eventually that data needs to be presented in a form that a person can easily understand.  Culture (aka Locale, Region, or Language) information is what lets programs present data in a human-readable form: 3/12/21 vs 12-3-2021, March vs März vs marzo, or 1,21 vs 1.21.  Values often need to be displayed in the form that actual humans expect for their culture.
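To make the variation concrete, here is a small Python sketch that renders one date and one number under three hand-written pattern tables. The `CULTURES` table and the `render()` helper are hypothetical, loosely modeled on common en-US, de-DE, and es-ES conventions. Real applications should get this data from the platform (NLS, ICU, or .Net) rather than hard-code it, precisely because it can change.

```python
from datetime import date

# Hypothetical, hand-written culture data for illustration only. Real data
# comes from the platform (NLS, ICU, ...) and can change between versions.
CULTURES = {
    "en-US": {"date": "{m}/{d}/{y}", "months": {3: "March"}, "decimal": "."},
    "de-DE": {"date": "{d:02}.{m:02}.{y}", "months": {3: "März"}, "decimal": ","},
    "es-ES": {"date": "{d:02}/{m:02}/{y}", "months": {3: "marzo"}, "decimal": ","},
}


def render(culture: str, day: date, value: float) -> tuple[str, str, str]:
    """Format a date, its month name, and a number per the (hypothetical) culture."""
    c = CULTURES[culture]
    short_date = c["date"].format(d=day.day, m=day.month, y=day.year)
    month_name = c["months"][day.month]  # only March is filled in, for brevity
    number = f"{value:.2f}".replace(".", c["decimal"])
    return short_date, month_name, number


for name in CULTURES:
    print(name, render(name, date(2021, 3, 12), 1.21))
```

The same `date(2021, 3, 12)` comes out as `3/12/2021`, `12.03.2021`, and `12/03/2021`, and the same number as `1.21` or `1,21`, depending only on which table is in effect.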

It is fairly easy for a developer to learn and understand that users in some regions have different expectations.  Preference for m/d/y or d/m/y date formats is probably one of the first localization issues most developers discover.  What is less obvious is that the preference may not be the same next week or next year.  Worse, there are less obvious cultural variations, and those can change as well.

Some people might learn about a cultural change, such as a move of Daylight Saving Time or the revaluation of a currency.  Those might be newsworthy enough to notice, particularly if it’s a culture of interest to the developer.  Such events can make it seem that these changes are rare, or even purely historical.  It is easy to miss the impact of cultural data changes on modern applications, particularly those with a worldwide audience.

Cultural Expectations

Locale/Culture data represents a cultural, regional, admin and/or user preference for cultural expectations.  Applications should NOT make any assumptions that rely on this data being stable.  This premise holds regardless of the operating system or platform the application is using.  .Net relies on the OS data.  Windows typically relies on “NLS” (National Language Support) data that has been collected over years.  Some applications use ICU (International Components for Unicode).  Most other operating systems base their support on a version of ICU tailored to their business needs.

Right off the bat, that paragraph should make it obvious that these components and applications get their data from different sources, so the information one piece of software provides may differ from what another platform or application provides.  What is less obvious is that this data can also change over time.

Reasons Cultural Data Changes

There are many reasons culture data can change.  Here are a few:

  • The most obvious reason is that there is a bug in the data that was corrected.  (Believe it or not, platforms make mistakes ;-))  In this case our users (and yours too) want culturally correct data, so we have to fix the bug even if it breaks existing applications.
  • Another reason is that cultural preferences can change.  There are lots of ways this can happen, but it does happen:
    • Global awareness, cross-cultural exchange, the changing role of computers, and so forth can all affect a cultural preference.
    • International treaties, trade, etc. can change values.  The adoption of the Euro changed many countries’ currency symbols to €.
    • National or regional regulations can impact these values too.
    • Preferred spelling of words can change over time.
    • Preferred date formats and similar conventions can shift to reduce ambiguity or to better match neighboring countries.
    • For many folks it is hard to imagine this stuff changing; one of the most visible examples that people may have encountered is changing preferences for Daylight Saving Time.
  • Multiple preferences may exist for a culture.  The preferred best choice can then change over time.
  • Some data changes only periodically, so users feel like it is stable in the moment, even though it was always expected to shift.
    • Early developers used a convention of 2-digit years.  Around the year 2000, “Y2K” made it obvious that 4-digit years were needed in many cases.  Twenty years later, people are shifting back to 2-digit abbreviations.
    • The Japanese Calendar adds Eras, which is typically a generational event, so the current era seems stable in the moment.  But events like the addition of the Reiwa era remind users and developers that the perceived stability was always going to change.
  • Users could have overridden some values, like date or time formats.  Some platforms allow requesting locale data without these user overrides; however, we recommend that applications respect user preferences, since those reflect what the user has asked for.  Apps shouldn’t be second-guessing the user’s cultural needs.
  • Users or administrators could have created a replacement culture, replacing common default values for a culture with company-specific, region-specific, or other variations of the standard data.
    • Some cultures may have preferences that vary depending on the setting.  A business might have a more formal form than an Internet Café.
    • An enterprise may require a specific date format or time format for the entire organization.
    • One obvious case is a 12 hour or 24 hour clock preference in locales where either can be used.
  • There could be differing versions of the same custom culture, or a culture that is custom on one machine and a Windows-only culture on another machine.
  • Data could originate on different machines, devices, platforms or architectures.  Even when they use the same source for cultural information, those systems could be on different revisions.
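The two-digit-year bullet hides a rule that every parser has to pick: which century does “21” belong to?  As one concrete data point, Python’s `datetime.strptime` maps `%y` values 69 through 99 to 1969 through 1999 and 00 through 68 to 2000 through 2068.  Other platforms use different cutoffs, and on Windows the cutoff is a user-adjustable regional setting, so the same two-digit string can legitimately resolve to different years.  The `year_from_two_digits()` helper below is a hypothetical wrapper for illustration.

```python
from datetime import datetime


def year_from_two_digits(yy: str) -> int:
    """Resolve a two-digit year using Python's fixed strptime pivot.

    Python maps 00-68 to 2000-2068 and 69-99 to 1969-1999; other systems
    (and users) may choose a different cutoff, so treat the result as a guess.
    """
    return datetime.strptime(yy, "%y").year


for yy in ("21", "68", "69"):
    print(yy, "->", year_from_two_digits(yy))
```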

Pitfalls of Changing Culture Information

This topic probably wouldn’t be as interesting if there weren’t some serious traps that apps can fall into when they don’t consider the shifting nature of culture data.

For example, a common operation is to format a string with a particular date format.  Then the app may want to try to parse that string value later, returning the original date.  What happens if the machine changed, if the framework version changed (newer data), if the platform changed, if a custom culture was modified, or even if the user just changes their preference from M/D/Y to D/M/Y?

Apps that persist data in a human-readable form and try to recover it in a machine format later are at risk.  The form that is useful for a person is typically more ambiguous than something a computer would consume.
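Here is a minimal Python sketch of that trap.  The helpers `save_for_display()` and `load_from_display()` are hypothetical; the date pattern stands in for the user’s current culture preference, which changes between the save and the load.  Note that the first parse does not fail: it silently returns the wrong date.

```python
from datetime import date, datetime


def save_for_display(day: date, pattern: str) -> str:
    # Human-oriented text, formatted with the user's preference at save time.
    return day.strftime(pattern)


def load_from_display(text: str, pattern: str) -> date:
    # Parsed with whatever the preference is now, which may have changed.
    return datetime.strptime(text, pattern).date()


# Saved while the user preferred M/D/Y ...
stored = save_for_display(date(2021, 3, 12), "%m/%d/%y")
# ... loaded after they switched to D/M/Y: no error, just the wrong date.
recovered = load_from_display(stored, "%d/%m/%y")

# A different date fails loudly instead, because there is no month 25.
late_march = save_for_display(date(2021, 3, 25), "%m/%d/%y")
try:
    load_from_display(late_march, "%d/%m/%y")
    late_march_failed = False
except ValueError:
    late_march_failed = True

print(stored, "->", recovered, "| late March failed:", late_march_failed)
```

Both failure modes come from treating a string formatted for a person as if it were a machine format.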

Avoiding the Trap of Changing Information

There are some patterns that developers can use to avoid difficulty with changing cultural preferences.

Remember the Audience

Machines need unambiguous representations of data.  Humans need data formatted in the manner they are accustomed to.

Techniques for Machines

Oftentimes machine data is stored in a well-defined binary form.  Other times it’s exchanged through XML or JSON formats.  The key point is to ensure that the data is stored in a well-defined and consistent format, often a standards-based one, like the ISO 8601 date formats.
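As one example of a well-defined format, here is a Python sketch that stores timestamps as ISO 8601 text using only the standard library’s `isoformat()` and `fromisoformat()`.  The `to_machine()` and `from_machine()` names are hypothetical, and normalizing everything to UTC is a design choice of this sketch, not a requirement of ISO 8601; it keeps the stored instant unambiguous and free of any culture data.

```python
from datetime import datetime, timedelta, timezone


def to_machine(moment: datetime) -> str:
    # Require an explicit offset so the stored instant is unambiguous.
    if moment.tzinfo is None:
        raise ValueError("naive datetime: attach a time zone before storing")
    # Normalize to UTC, then emit ISO 8601 text such as 2021-03-12T14:30:00+00:00.
    return moment.astimezone(timezone.utc).isoformat()


def from_machine(text: str) -> datetime:
    # fromisoformat() parses the layout that isoformat() produces.
    return datetime.fromisoformat(text)


utc_plus_9 = timezone(timedelta(hours=9))
local = datetime(2021, 3, 12, 23, 30, tzinfo=utc_plus_9)
wire = to_machine(local)
print(wire, from_machine(wire) == local)
```

Because every value uses the same fixed layout and offset, the strings also sort chronologically, which a culture-formatted date like `3/12/2021` does not.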

Particular care should be taken when creating new protocols and storage mechanisms.  It is unfortunate to allow a dependency on a linguistic locale and then find that the behavior shifts over time.

Machine-compatible data should be processed in a non-linguistic manner for consistency.  For string formatting in .Net and Windows, an “Invariant” Culture (Locale) is available.  The expectation is that CultureInfo.InvariantCulture provides stable formatting over time.  (However, even InvariantCulture can shift for comparisons, i.e. collation.)
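Python has no CultureInfo, but a similar split exists: `repr()`, `float()`, and format specs such as `.2f` ignore the process locale, while locale-sensitive behavior lives in the `locale` module and the `n` format type.  The sketch below uses that split as a stand-in for InvariantCulture.  `to_invariant()`, `from_invariant()`, and `to_display()` are hypothetical helpers, and `to_display()` takes its separators as arguments where a real app would take them from the user’s culture data.

```python
def to_invariant(value: float) -> str:
    # repr() never consults the locale and round-trips: float(repr(x)) == x.
    return repr(value)


def from_invariant(text: str) -> float:
    # float() only accepts '.' as the decimal separator, whatever the locale.
    return float(text)


def to_display(value: float, decimal_sep: str, group_sep: str) -> str:
    # Presentation only: never parse this string back into a number.
    text = f"{value:,.2f}"  # locale-independent "1,234.50" layout
    # Swap both separators in a single pass so they cannot clobber each other.
    return text.translate(str.maketrans({",": group_sep, ".": decimal_sep}))


amount = 1234.5
stored = to_invariant(amount)
print(stored, to_display(amount, ",", "."), to_display(amount, ".", ","))
```

Note that `from_invariant("1,21")` raises `ValueError` rather than guessing which separator was meant, which is the behavior you want from a machine format.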

I prefer formats that are explicit and can be handled through something like a simple sscanf() call rather than a more complex parser, though even sscanf needs to run under the “C” locale to avoid problems like variations of decimal separators.
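Python has no sscanf(), but a fully anchored regular expression gives the same “explicit format, no guessing” behavior.  This sketch assumes a hypothetical `YYYY-MM-DD` wire format; `parse_machine_date()` and `is_machine_date()` are illustrative names.  It uses `[0-9]` rather than `\d` because, for `str` patterns, Python’s `\d` also matches non-ASCII decimal digits, which is exactly the kind of leniency a machine format should avoid.

```python
import re
from datetime import date

# Exactly YYYY-MM-DD with ASCII digits only (\d would also accept e.g. Arabic-Indic digits).
_MACHINE_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def parse_machine_date(text: str) -> date:
    match = _MACHINE_DATE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a YYYY-MM-DD date: {text!r}")
    year, month, day = (int(group) for group in match.groups())
    return date(year, month, day)  # still rejects impossible dates like 2021-02-30


def is_machine_date(text: str) -> bool:
    try:
        parse_machine_date(text)
        return True
    except ValueError:
        return False


for candidate in ("2021-03-12", "3/12/21", "2021-02-30"):
    print(candidate, is_machine_date(candidate))
```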

Whether snapping to an existing standard or creating your own data type, the key point for machine readable data is to ensure that the format is explicitly defined and consistent.

Making Humans Happy

Of course you can’t have both “correct” display for the current user and perfect round-tripping if the culture data changes.  The earlier machine techniques help prevent data corruption, but may not be great for humans.

The key point for human presentation is to recognize when the application’s context has moved on from data storage to presentation.  The time to use culture-specific behavior to satisfy the user’s expectations is when finally presenting data to the user.

After making the data pretty and formatting it for human presentation, the application should recognize that the human-formatted data is no longer appropriate for machine processing and interchange.  If a bunch of subsidiaries are collecting data and sending it up to their headquarters, they should likely transmit the machine-readable data rather than the human-formatted reports, particularly if those subsidiaries are in different locales with different expectations.
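A minimal Python sketch of that split, with hypothetical names throughout: each subsidiary sends JSON with ISO 8601 dates and plain JSON numbers, headquarters parses that without any culture data, and only `present()` applies a viewer’s preferences, passed in here as arguments.  The record layout (`day`, `amount`) is an assumption for illustration.

```python
import json
from datetime import date


def subsidiary_report(rows: list[tuple[date, float]]) -> str:
    # Machine form: ISO 8601 dates and JSON numbers, the same on every machine.
    return json.dumps([{"day": day.isoformat(), "amount": amount} for day, amount in rows])


def headquarters_load(payload: str) -> list[tuple[date, float]]:
    # Parsing needs no locale, so it is unaffected by culture data changes.
    return [(date.fromisoformat(r["day"]), r["amount"]) for r in json.loads(payload)]


def present(rows: list[tuple[date, float]], date_pattern: str, decimal_sep: str) -> list[str]:
    # Last step only: format for one viewer's (hypothetical) preferences.
    lines = []
    for day, amount in rows:
        number = f"{amount:.2f}".replace(".", decimal_sep)
        lines.append(f"{day.strftime(date_pattern)} {number}")
    return lines


rows = [(date(2021, 3, 12), 1.21), (date(2021, 3, 25), 1234.5)]
payload = subsidiary_report(rows)
for line in present(headquarters_load(payload), "%d.%m.%Y", ","):
    print(line)
```

Each headquarters viewer can get a different `present()` output from the same payload, and the payload never has to be re-parsed from a report.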

Conclusion

Now we’re back to the beginning:  “Culture data shouldn’t be considered stable.”  By keeping the context in mind, we prevent errors when exchanging and interpreting computerized data.  Go ahead and show real people the pretty data they expect, but make sure the machines on the back end get the orderly versions they need.


Comments

    Hakan Fostok

    Thank you very much, from the bottom of my heart.
    This topic has bitten me multiple times, and I learned that the hard, harsh way.
    If I had read this earlier in my career, it would have been much better.
    This article articulated, loudly and in a very nice way, the ideas I arrived at after falling into these traps multiple times.
    I hope a lot of developers read this and avoid the same mistakes.