Some people have noticed that you can paste examples out of Word documents directly into a PowerShell session. Given all of the typographic tricks that Word does, this is actually much harder than it sounds. Here’s what we do. There’s a piece of code in the interpreter that takes each of the possible characters and maps it to the canonical representation for that character. So an em-dash ([char] 0x2014) or an en-dash ([char] 0x2013) becomes a simple dash ([char] 0x002D). There are also predicate functions that return true if the character is a single quote, a double quote, or a dash. The code is (approximately):
```csharp
// Wrapper class added only so the snippet compiles; the class name is illustrative.
internal static class QuoteAndDashChars
{
    public const char enDash = (char)0x2013;
    public const char emDash = (char)0x2014;
    public const char horizontalBar = (char)0x2015;
    // left single quotation mark
    public const char quoteSingleLeft = (char)0x2018;
    // right single quotation mark
    public const char quoteSingleRight = (char)0x2019;
    // single low-9 quotation mark
    public const char quoteSingleBase = (char)0x201a;
    // single high-reversed-9 quotation mark
    public const char quoteReversed = (char)0x201b;
    // left double quotation mark
    public const char quoteDoubleLeft = (char)0x201c;
    // right double quotation mark
    public const char quoteDoubleRight = (char)0x201d;
    // double low-9 quotation mark (the opening double quote used in German)
    public const char quoteLowDoubleLeft = (char)0x201E;

    // True for the ASCII hyphen-minus and for the en dash, em dash and horizontal bar.
    public static bool IsDash(char c)
    {
        return (c == enDash || c == emDash || c == horizontalBar ||
                c == '-');
    }

    // True for the ASCII apostrophe and for the typographic single quotes.
    public static bool IsSingleQuote(char c)
    {
        return (c == quoteSingleLeft || c == quoteSingleRight ||
                c == quoteSingleBase || c == quoteReversed || c == '\'');
    }

    // True for the ASCII double quote and for the typographic double quotes.
    public static bool IsDoubleQuote(char c)
    {
        return (c == '"' || c == quoteDoubleLeft ||
                c == quoteDoubleRight || c == quoteLowDoubleLeft);
    }

    public static bool IsQuote(char c)
    {
        return (IsSingleQuote(c) || IsDoubleQuote(c));
    }
}
```
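The predicates above only classify characters; the mapping step itself isn’t shown. Here is a minimal sketch of what that mapping could look like, assuming it simply turns every recognized dash into an ASCII hyphen-minus and every typographic quote into its ASCII counterpart. PasteNormalizer, Canonicalize and Normalize are hypothetical names, not part of the PowerShell interpreter, and the predicates are repeated so the block compiles on its own.

```csharp
using System;
using System.Text;

// Hypothetical sketch, not the interpreter's actual code: maps the dashes and
// quotes recognized by IsDash/IsSingleQuote/IsDoubleQuote onto plain ASCII.
public static class PasteNormalizer
{
    // Hyphen-minus, en dash, em dash, horizontal bar.
    public static bool IsDash(char c) =>
        c == '-' || c == '\u2013' || c == '\u2014' || c == '\u2015';

    // Apostrophe plus left, right, low-9 and high-reversed-9 single quotes.
    public static bool IsSingleQuote(char c) =>
        c == '\'' || c == '\u2018' || c == '\u2019' || c == '\u201a' || c == '\u201b';

    // ASCII double quote plus left, right and low-9 double quotes.
    public static bool IsDoubleQuote(char c) =>
        c == '"' || c == '\u201c' || c == '\u201d' || c == '\u201e';

    // Returns the canonical (ASCII) form of a single character;
    // characters that aren't dashes or quotes are returned unchanged.
    public static char Canonicalize(char c)
    {
        if (IsDash(c)) return '-';
        if (IsSingleQuote(c)) return '\'';
        if (IsDoubleQuote(c)) return '"';
        return c;
    }

    // Applies Canonicalize to every character of a pasted string.
    public static string Normalize(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            sb.Append(Canonicalize(c));
        }
        return sb.ToString();
    }
}
```

For example, Normalize("Get-ChildItem \u2013Recurse \u201cC:\\temp\u201d") returns the string Get-ChildItem -Recurse "C:\temp". Note that rewriting a whole string like this also changes dashes and quotes that sit inside string literals, so a real tokenizer would more plausibly consult the predicates only where it expects an operator, a parameter name or a string delimiter.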
Of course it’s not just Word that we want to support. We want to provide reasonable support for arbitrary applications (within the limitations of the console host, for now), so if anyone sees anything we missed, please let me know.
Now, for the trivia folks in the audience who want to know what an en is, here’s the definition from Encarta:
em dash (plural em dash·es), noun. Definition: long dash: in printing, a dash that is one em long
en dash (plural en dash·es), noun. Definition: dash one en long: in printing, a dash that is one en in length
en [ en ] (plural ens), noun. Definition: measure of printing width: a measure of printing width, half that of an em
em [ em ] (plural ems), noun. Definition:
1. variable measure of type: a unit of measurement of print size, equal to the point size of the typeface being used
2. printing: same as pica
[Late 18th century. Representing pronunciation of m because the letter is about this width]
-bruce
Bruce Payette
PowerShell Technical Lead
PSMDTAG:FAQ: Can I cut-n-paste examples from WORD documents?
PSMDTAG:PARSER: (em dash, en dash, dash) handling