{"id":1461,"date":"2014-10-31T12:36:35","date_gmt":"2014-10-31T12:36:35","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/powershell\/2014\/10\/31\/convertfrom-string-example-based-text-parsing\/"},"modified":"2024-02-22T08:56:56","modified_gmt":"2024-02-22T16:56:56","slug":"convertfrom-string-example-based-text-parsing","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/powershell\/convertfrom-string-example-based-text-parsing\/","title":{"rendered":"ConvertFrom-String: Example-based text parsing"},"content":{"rendered":"<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<h2 style=\"margin: 12pt 0in 0pt; line-height: 17pt; list-style-type: disc;\"><span style=\"font-size: 16pt;\">Intro<\/span><\/h2>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">I\u2019m sure most of you are familiar with the powerful tools for text parsing available in PowerShell.\u00a0A <a href=\"https:\/\/www.youtube.com\/watch?v=Hkzd8spCfCU&amp;index=5&amp;list=PLfeA8kIs7Coehjg9cB6foPjBojLHYQGb_\">presentation<\/a> at the PowerShell Summit a couple of weeks ago provides a good overview of these tools and mentions a new PowerShell cmdlet, ConvertFrom-String, that was introduced in <a href=\"http:\/\/go.microsoft.com\/fwlink\/?LinkId=398175\"><span lang=\"EN\">Windows Management Framework 5.0 Preview September 2014<\/span><\/a>.\u00a0ConvertFrom-String lets you parse a file by providing a template that contains examples of the desired output data, rather than by writing a (potentially complex) script.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h2 style=\"margin: 12pt 0in 0pt; line-height: 17pt; list-style-type: disc;\"><span style=\"font-size: 16pt;\">A Simple Example<\/span><\/h2>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">The namesAndCities.input.txt file attached to this post contains simple names together 
with cities, and namesAndCities.namesOnly.template.txt copies the first two records and wraps their names in template markup:<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-family: Courier New; font-size: small;\">{Name*:Craig Trudeau} <\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-family: Courier New; font-size: small;\">Buffalo, NY<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-family: Courier New; font-size: small;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-family: Courier New; font-size: small;\">{Name*:Merle Baldridge} <\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-family: Courier New; font-size: small;\">Baltimore, MD<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 12pt;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">This defines the extraction of all names from the file. (In this case a single example would have worked because the name lines and the city lines are formatted differently, but in general it is better to supply two examples so that FlashExtract, the engine behind ConvertFrom-String, has more context to learn from.) 
Now let\u2019s run it:<\/span><\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\">\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000; font-family: 'Courier New'; font-size: small;\">gc .\\namesAndCities.input.txt | ConvertFrom-String -templateFile .\\namesAndCities.namesOnly.template.txt <\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"color: #000000; font-family: 'Courier New'; font-size: small;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000;\"><span style=\"font-family: Courier New;\"><span style=\"font-size: small;\">ExtentText\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Name<\/span><\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000;\"><span style=\"font-family: Courier New;\"><span style=\"font-size: small;\">----------\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ----<\/span><\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000;\"><span style=\"font-family: Courier New;\"><span style=\"font-size: small;\">Craig Trudeau &#8230;\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Craig Trudeau<\/span><\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000;\"><span style=\"font-family: Courier New;\"><span style=\"font-size: small;\">Merle Baldridge 
&#8230;\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Merle Baldridge<\/span><\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000;\"><span style=\"font-family: Courier New;\"><span style=\"font-size: small;\">Vicente Saul &#8230;\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Vicente Saul<\/span><\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000;\"><span style=\"font-family: Courier New;\"><span style=\"font-size: small;\">Lydia Parsons &#8230;\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Lydia Parsons<\/span><\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000;\"><span style=\"font-family: Courier New;\"><span style=\"font-size: small;\">Cheryl Booth &#8230;\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Cheryl Booth<\/span><\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000;\"><span style=\"font-family: Courier New;\"><span style=\"font-size: small;\">Shannon Holland &#8230;\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Shannon Holland<\/span><\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000;\"><span style=\"font-family: Courier New;\"><span style=\"font-size: small;\">Libby Stevens &#8230;\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Libby Stevens<\/span><\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p 
class=\"MsoNormal\"><span style=\"color: #000000;\"><span style=\"font-family: 'Courier New'; font-size: small;\">Thomas Donnelly &#8230;\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Thomas Donnelly<\/span><span style=\"font-size: 9pt;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 12pt; color: #000000;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">The rest of this post describes how ConvertFrom-String works and develops a more full-featured address file and templates for these addresses. I\u2019ll also describe some ways to figure out what to change when you don\u2019t get the results you want.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h2 style=\"margin: 12pt 0in 0pt; line-height: 17pt; list-style-type: disc;\"><span style=\"font-size: 16pt;\">How ConvertFrom-String Works<\/span><\/h2>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">ConvertFrom-String is built on top of <a href=\"http:\/\/research.microsoft.com\/en-us\/um\/people\/sumitg\/pubs\/pldi14-flashextract.pdf\">FlashExtract<\/a>, a program-synthesis technology developed by Microsoft Research. FlashExtract uses an improved version of the substring-extraction techniques that were developed in <a href=\"http:\/\/research.microsoft.com\/en-us\/um\/people\/sumitg\/pubs\/popl11-synthesis.pdf\">Flash Fill<\/a>, which <a href=\"http:\/\/research.microsoft.com\/en-us\/um\/people\/sumitg\/flashfill.html\">ships in Excel 2013<\/a>.\u00a0 In Flash Fill, those substrings are extracted from one or more source strings and combined into a target string. 
FlashExtract, by contrast, learns substring-extraction programs that partition the file top-down into regions that are either nested or completely non-overlapping, and then extract the contents of some subset of those regions as the desired output strings. In ConvertFrom-String, the regions are defined by examples in the template markup, and the substrings that FlashExtract extracts become the values of properties on a sequence of output objects.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">The program synthesis in FlashExtract is based on analyzing the substrings surrounding the beginning and end of each example region and generating programs that are combinations of various primitive string operations, such as regular expressions. For each region, it finds the set of these programs that are consistent with all examples for that region and ranks them. The best-ranked sub-programs for all the regions are then combined into the final FlashExtract program.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h2 style=\"margin: 12pt 0in 0pt; line-height: 17pt; list-style-type: disc;\"><span style=\"font-size: 16pt;\">Defining a Structure and Fields<\/span><\/h2>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">Now let\u2019s look at a more realistic address file and template. 
Here are the first two examples in addresses.PersonInfo.template.txt:<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-family: Courier New; font-size: small;\">{PersonInfo*:{Name:Craig Trudeau} <\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-family: Courier New; font-size: small;\">{Address:{Street:4567 Main St NE}<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-family: Courier New; font-size: small;\">{[string]City:Buffalo}, {State:NY} {Zip:98052}}<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-family: Courier New; font-size: small;\">{Phone:(425) 555-0100}}<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-family: Courier New; font-size: small;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-family: Courier New; font-size: small;\">{PersonInfo*:{Name:Merle Baldridge} <\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-family: Courier New; font-size: small;\">{Address:{Street:1234 First Ave}<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-family: Courier New; font-size: small;\">{City:Baltimore}, {State:MD} {Zip:98101}}<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span 
style=\"font-family: Courier New; font-size: small;\">{Phone:(425) 555-0101}}<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 12pt;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">In this template we define a hierarchy: the PersonInfo structure at the highest level and, within it, the individual fields of the structure, including a nested Address structure.\u00a0 In more detail, we are defining:<\/span><\/p>\n<p>&nbsp;<\/p>\n<ul>\n<li>\n<div class=\"MsoListParagraphCxSpFirst\" style=\"margin: 0in 0in 0pt 0.5in; line-height: 12pt; text-indent: -0.25in; list-style-type: disc;\"><span style=\"font-size: 11pt;\">Examples of regions that define a sequence of structures named PersonInfo.\u00a0 The \u2018*\u2019 suffix defines a sequence within its parent region. In this case, the parent region is the entire file.<\/span><\/div>\n<\/li>\n<li>\n<div class=\"MsoListParagraphCxSpFirst\" style=\"margin: 0in 0in 0pt 0.5in; line-height: 12pt; text-indent: -0.25in; list-style-type: disc;\"><span style=\"font-size: 11pt;\">An example of a region that defines a non-sequence (there is no \u2018*\u2019 suffix) leaf value Name within the PersonInfo structure.<\/span><\/div>\n<\/li>\n<li>\n<div class=\"MsoListParagraphCxSpFirst\" style=\"margin: 0in 0in 0pt 0.5in; line-height: 12pt; text-indent: -0.25in; list-style-type: disc;\"><span style=\"font-size: 11pt;\">An example of a region that defines a non-sequence structure Address within the PersonInfo structure, along with the fields of Address. If all addresses in the file had exactly the same format, only the first PersonInfo definition would be necessary.\u00a0However, the second Address.Street does not have the \u201cNE\u201d suffix. 
If we did not define the Address structure and its Street field in the second example, the learned program would recognize only streets that have such a suffix. By supplying the second example, we tell FlashExtract to be more flexible in its extraction of Street. (In this particular example, simply defining the second PersonInfo structure, even without its Address.Street definition, results in a correct program.\u00a0However, it is safest to provide the definition for the field, so that future changes in other areas, such as ranking, will continue to give the desired results.)<\/span><\/div>\n<\/li>\n<li>\n<div class=\"MsoListParagraphCxSpFirst\" style=\"margin: 0in 0in 0pt 0.5in; line-height: 12pt; text-indent: -0.25in; list-style-type: disc;\"><span style=\"font-size: 11pt;\">Notice the [string] type cast on City.\u00a0 This is the default, so it is merely illustrative here. As in PowerShell, you can specify a type cast to any .NET type.\u00a0 For example, to use Sort-Object on a field with integers, define it with [int] so it is sorted numerically rather than as text.<\/span><\/div>\n<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h2 style=\"margin: 12pt 0in 0pt; line-height: 17pt; list-style-type: disc;\"><span style=\"font-size: 16pt;\">Debugging ConvertFrom-String<\/span><\/h2>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">Now let\u2019s look at a program that FlashExtract might generate for these two examples.\u00a0 To do so, pass the -Debug parameter to the ConvertFrom-String cmdlet (\u201ccfs\u201d is the alias for ConvertFrom-String):<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000; font-family: 'Courier New'; font-size: small;\">gc .\\addresses.input.txt | cfs -templateFile .\\addresses.PersonInfo.template.txt -Debug <\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: 
disc;\"><span style=\"font-size: 12pt; color: #000000;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><b><span style=\"font-size: 11pt;\">Important Note: The -Debug output shown here is specific to the current preview version of ConvertFrom-String and will change in later versions. And as always in PowerShell, <\/span><\/b><a href=\"http:\/\/blogs.msdn.com\/b\/powershell\/archive\/2008\/09\/04\/text-output-is-not-a-contract.aspx\"><b><span style=\"font-size: 11pt;\">text output is not a contract.<\/span><\/b><\/a><b><\/b><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">Running this gives us eight programs, one for each property (structure or field) in the template: PersonInfo, Name, Address, Street, City, State, Zip, and Phone.\u00a0 Here are the first two:<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000; font-family: 'Courier New'; font-size: small;\">DEBUG: Property: PersonInfo<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000; font-family: 'Courier New'; font-size: small;\">Program: EndSSL(ESPL((StartsWith(Left parenthesis(\\(), Number([0-9]+(\\,[0-9]{3})*(\\.[0-9]+)?), Right parenthesis(\\)))): 0, 1, &#8230;: \u03b5&#8230;\u03b5, 0)Line Separator([ \\t]*((\\r)?\\n)<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000; font-family: 'Courier New'; font-size: small;\">)&#8230;Camel Case(\\p{Lu}(\\p{Ll})+), WhiteSpace(( )+), Camel Case(\\p{Lu}(\\p{Ll})+), -1)<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000; font-family: 'Courier New'; font-size: small;\">-------------------------------------------------<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000; font-family: 'Courier New'; font-size: small;\">Property: 
Name<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000; font-family: 'Courier New'; font-size: small;\">Program: ESSL((EndsWith(Camel Case(\\p{Lu}(\\p{Ll})+), WhiteSpace(( )+), Camel Case(\\p{Lu}(\\p{Ll})+))): 0, 1, &#8230;: \u03b5&#8230;\u03b5, 1 + \u03b5&#8230;\u03b5, 0) <\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000; font-family: 'Courier New'; font-size: small;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 12pt; color: #000000;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">Before we dive into the details, here\u2019s a high-level view of what\u2019s happening. FlashExtract first learns how to recognize the start and end positions of the PersonInfo structure examples.\u00a0Then it evaluates the subfield examples within each of those structure examples to learn those subfields\u2019 boundaries.\u00a0In this case, we have two examples of the Address structure, one within each of the two PersonInfo examples.\u00a0 For each Address example, FlashExtract learns programs to recognize the start and end positions of that Address example within its parent PersonInfo example, then combines these to create a single substring-recognition program that satisfies both Address examples. 
In the same way, FlashExtract learns a substring-recognition program for each of the fields Street, City, State, and Zip within Address, and for the fields Name and Phone within PersonInfo.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">Now let\u2019s look in more detail at the -Debug output, starting with PersonInfo.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">A line sequence is a subset of the lines in the file that match certain criteria. A position may be either a constant or a location in a string where the substrings on either side of that location match certain regular expressions.\u00a0 In this example, for the multiline PersonInfo region, FlashExtract first learned the line sequence that identifies the end positions, then learned a function that, for each line in this sequence, backs up to find the corresponding start position. (FlashExtract can also work in the other direction, first learning the sequence of start positions and then a function that moves forward to find each end position.)\u00a0 In the above program, EndSSL is the function that drives this process, ESPL defines the ending-position line sequence, and the expression that follows the ESPL term maps each ending position back to its start position. 
So the PersonInfo program breaks out as:<\/span><\/p>\n<p>&nbsp;<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">EndSSL(\n   ESPL((StartsWith(\/*area code*\/)):\n           0, 1, ...:   \/\/ This represents a filter that accepts all\n                        \/\/ matching lines (it starts at the first position\n                        \/\/ and increments by one).\n           e...e, 0     \/\/ The end position is the last position in\n                        \/\/ the line.\n        )\n    Line Separator()... \/\/ Find the start position by looking for\n                        \/\/ a line start\n       Camel Case(), WhiteSpace(), Camel Case(),  \/\/ that is followed\n                        \/\/ by two \"names\" separated by whitespace;\n                        \/\/ the start position is at the beginning of\n                        \/\/ the first \"name\".\n       -1               \/\/ Move backward from the end position to\n                        \/\/ find the match.\n    )\n<\/code><\/pre>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\">\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">Notice that in the comments above, \u201cname\u201d is in quotes. So far our examples assume that a name consists of an initial uppercase letter followed by lowercase letters (sometimes called \u201cproper case\u201d). 
We\u2019ll see later that this is not always correct.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">As mentioned above, this output will change, but to help you diagnose problems in the meantime, here is a list of the sequence-generating functions you might see in the current version:<\/span><\/p>\n<p>&nbsp;<\/p>\n<p style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><b><span style=\"font-size: 10pt;\">ESSL<\/span><\/b><span style=\"font-size: 10pt;\">: This returns a sequence of (single-line) substrings by finding a sequence of lines and extracting a substring from each line.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><b><span style=\"font-size: 10pt;\">EndSSL<\/span><\/b><span style=\"font-size: 10pt;\">: This returns a sequence of (possibly multiline) substrings by finding a sequence of ending positions, and for each ending position, finding the starting position.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><b><span style=\"font-size: 10pt;\">StartSSL<\/span><\/b><span style=\"font-size: 10pt;\">: This returns a sequence of (possibly multiline) substrings by finding a sequence of starting positions, and for each starting position, finding the ending position.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><b><span style=\"font-size: 10pt;\">ESPL<\/span><\/b><span style=\"font-size: 10pt;\">: This returns a sequence of positions by finding a sequence of lines, and for each line, finding a position within it.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><b><span style=\"font-size: 10pt;\">SPL:<\/span><\/b><span 
style=\"font-size: 10pt;\"> This returns a sequence of positions. It takes four parameters: re1, re2, init, incr. SPL finds all positions whose left side matches regex re1 and whose right side matches regex re2. From this sequence it selects every incr\u2019th item, starting at index init.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 12pt;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">Why does FlashExtract break the file into lines?\u00a0 Consider the following template:<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 10pt;\">$123 one two three four {CapitalLetters*:ABC} five six seven eight<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 10pt;\">123 put four words here DEF and another four here<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 10pt;\">123 2001 2002 2003 2004 GHI 2005 2006 2007 2008<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 10pt;\">$123 eleven twelve thirteen fourteen {CapitalLetters*:JKL} fifteen sixteen seventeen eighteen<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 12pt;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">Here we want to capture CapitalLetters only if the line starts with $. 
However, the start and end positions of CapitalLetters can be learned from a much shorter span of context (\u201cextract an all-capital sequence that is between two lower-case alphabetical sequences\u201d), and that rule on its own would mistakenly capture the DEF line as well.\u00a0 By splitting the file into lines, FlashExtract can combine two short-range rules: one that selects lines (those that start with $) and one that selects positions within each selected line.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">Now that we\u2019ve got the outer structure, let\u2019s look at the Name field inside it.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000; font-family: 'Courier New'; font-size: small;\">Property: Name<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000; font-family: 'Courier New'; font-size: small;\">Program: ESSL((EndsWith(Camel Case(\\p{Lu}(\\p{Ll})+), WhiteSpace(( )+), Camel Case(\\p{Lu}(\\p{Ll})+))): 0, 1, &#8230;: \u03b5&#8230;\u03b5, 1 + \u03b5&#8230;\u03b5, 0)<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 12pt; color: #000000;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">This breaks out as:<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 12pt;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">ESSL((EndsWith(\/*&#8220;name&#8221;, whitespace, &#8220;name&#8221;*\/)): \/\/ find lines that end with this pattern<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: 
disc;\"><span style=\"font-size: 11pt;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span><span style=\"font-size: 11pt;\">0, 1, &#8230;: \/\/ accept all matching lines<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span><span style=\"font-size: 11pt;\">e&#8230;e, 1\u00a0\u00a0 \/\/ For each matching line, the start position is the first occurrence of an empty string<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span><span style=\"font-size: 11pt;\">+\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \/\/ (separate start and end positions)<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span><span style=\"font-size: 11pt;\">e&#8230;e, 0)\u00a0 \/\/ and the end position is the end of the line<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 12pt;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">Now let\u2019s see this in action.\u00a0 Add Format-Table to the command line.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000;\"><span style=\"font-family: Courier New;\"><span style=\"font-size: 
PersonInfo">
small;\">PersonInfo<\/span><\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000;\"><span style=\"font-family: Courier New;\"><span style=\"font-size: small;\">----------<\/span><\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000;\"><span style=\"font-family: Courier New;\"><span style=\"font-size: small;\">{@{ExtentText=Craig Trudeau &#8230;<\/span><\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000;\"><span style=\"font-family: Courier New;\"><span style=\"font-size: small;\">{@{ExtentText=Merle Baldridge &#8230;<\/span><\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000;\"><span style=\"font-family: Courier New;\"><span style=\"font-size: small;\">{@{ExtentText=Los Angeles, CA 98102&#8230;<\/span><\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000;\"><span style=\"font-family: Courier New;\"><span style=\"font-size: small;\">{@{ExtentText=Randolph LaBelle&#8230;<\/span><\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000;\"><span style=\"font-family: Courier New;\"><span style=\"font-size: small;\">{@{ExtentText=Lydia Parsons &#8230;<\/span><\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000;\"><span style=\"font-family: Courier New;\"><span style=\"font-size: small;\">{@{ExtentText=Cheryl Booth &#8230;<\/span><\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000;\"><span style=\"font-family: 
<span">
Courier New;\"><span style=\"font-size: small;\">{@{ExtentText=Shannon Holland &#8230;<\/span><\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000;\"><span style=\"font-family: Courier New;\"><span style=\"font-size: small;\">{@{ExtentText=San Diego, CA 98107&#8230;<\/span><\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000;\"><span style=\"font-family: Courier New;\"><span style=\"font-size: small;\">{@{ExtentText=Hannah McStorey&#8230;<\/span><\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000;\"><span style=\"font-family: 'Courier New'; font-size: small;\">{@{ExtentText=Thomas Donnelly &#8230;<\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 12pt; color: #000000;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">Notice that we have \u201cLos Angeles\u201d and \u201cSan Diego\u201d where we expect names.\u00a0 Like the names, these city names are two capitalized words separated by a space, so they match the beginning position program for PersonInfo.\u00a0 Let\u2019s provide another example that identifies \u201cLos Angeles\u201d as a City:<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">{PersonInfo*:Vicente Saul <\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">{Address:2345 Second Ave SE<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">{City:Los Angeles}, CA 98102}<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 
0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">(425) 555-0102}<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 12pt;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">Because City is the only field that needs another example, we only need to provide its direct hierarchy (PersonInfo and Address); we don\u2019t need Name, Street, State, etc.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">Now the Format-Table output looks good.\u00a0 Let\u2019s dig in a bit more with Format-List, which reveals a couple of incorrect names:<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000; font-family: 'Courier New'; font-size: small;\">PersonInfo : {@{ExtentText=Randolph LaBelle<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000;\"><span style=\"font-family: Courier New;\"><span style=\"font-size: small;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 3456 Third Ave<\/span><\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000;\"><span style=\"font-family: Courier New;\"><span style=\"font-size: small;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Fargo, ND 98103<\/span><\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000;\"><span style=\"font-family: Courier New;\"><span style=\"font-size: small;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (425) 555-0183; Name=3456 Third Ave; Phone=(425) 555-0183}}<\/span><\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: 
#000000; font-family: 'Courier New'; font-size: small;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000; font-family: 'Courier New'; font-size: small;\">PersonInfo : {@{ExtentText=Hannah McStorey<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000;\"><span style=\"font-family: Courier New;\"><span style=\"font-size: small;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 8901 Pine St<\/span><\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000;\"><span style=\"font-family: Courier New;\"><span style=\"font-size: small;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Portland, OR 98108<\/span><\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000;\"><span style=\"font-family: Courier New;\"><span style=\"font-size: small;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (425) 555-0108; Name=8901 Pine St; Address=; Phone=(425) 555-0108}}<\/span><\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"font-size: 9pt; color: #000000;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 12pt; color: #000000;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">As mentioned earlier, learning that each word of a name has an uppercase letter only at its beginning is not always correct: LaBelle and McStorey also have one in the middle.\u00a0 We\u2019ll add one more example, again defining only the fields necessary to resolve the ambiguity:<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 12pt;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" 
style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">{PersonInfo*:{Name:Randolph LaBelle}<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">3456 Third Ave<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">Fargo, ND 98103<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">(425) 555-0183}<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 12pt;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">With this, we can see that the full output with Format-Custom (or your favorite formatting command) is correct.\u00a0 In fact, we can now remove the example for Merle Baldridge because the example we added for Randolph LaBelle has no suffix on the street.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">Now let\u2019s see how we can write the examples even more easily.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h1 style=\"margin: 12pt 0in 0pt; line-height: 17pt; list-style-type: disc;\"><span style=\"font-size: 16pt;\">Defining Implicit Structures<\/span><\/h2>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">Above, we defined the PersonInfo structure explicitly, with a name and boundaries.\u00a0 This is not always necessary.\u00a0 FlashExtract can often 
infer the boundaries of a parent structure if the first subfield of that structure is defined as a sequence.\u00a0 In this case, FlashExtract learns an implicit region for the parent structure that extends from the beginning of one instance of the first subfield to just before the beginning of the next instance.\u00a0 The attached file addresses.ImplicitStruct.template.txt modifies our example above to illustrate this:<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 12pt;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">{Name*:Craig Trudeau} <\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">{Address:{Street:4567 Main St NE}<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">{City:Buffalo}, {State:NY} {Zip:98052}}<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">{Phone:(425) 555-0100}<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 12pt;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">{Name*:Vicente Saul} <\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">2345 Second Ave SE<\/span><\/p>\n<p>&nbsp;<\/p>\n<p 
class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">{!Name*:Los Angeles}, CA 98102<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">(425) 555-0102<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 12pt;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">{Name*:Randolph LaBelle}<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">3456 Third Ave<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">Fargo, ND 98103<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">(425) 555-0183<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 12pt;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">Now there is no PersonInfo field defined, and Name has become a sequence (with the \u2018*\u2019 suffix). This also illustrates another aspect of field definitions, a negative example.\u00a0Without the {!Name*:Los Angeles} definition, \u201cLos Angeles\u201d matches a line-starting expression that will extract it as a name. 
As before, we need Randolph LaBelle as an example of a capital letter inside a name.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">The only difference between this template\u2019s -Debug output and that of the final PersonInfo template is that there is no PersonInfo property here, and Name has a different program:<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000; font-family: 'Courier New'; font-size: small;\">DEBUG: Property: Name<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\"><span style=\"color: #000000; font-family: 'Courier New'; font-size: small;\">Program: ESSL((SucceedingStartsWith(Number([0-9]+(\\,[0-9]{3})*(\\.[0-9]+)?), WhiteSpace(( )+), Camel Case(\\p{Lu}(\\p{Ll})+))): 0, 1, &#8230;: \u03b5&#8230;\u03b5, 1 + \u03b5&#8230;\u03b5, 0)<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 12pt; color: #000000;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">Our negative example for \u201cLos Angeles\u201d changed the best-ranked program from one that looks for a line starting with a Camel Case word to one that looks for a line whose next line starts with a number, whitespace, and a Camel Case word (here, the street address).<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 11pt;\">Although these examples don\u2019t show it, it is possible to define sequences within any parent structure.\u00a0This includes defining nested sequences within an implicit parent structure:<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 
11pt;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span>{Name*:\u2026}<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 {Phone*:\u2026}<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 {Phone*:\u2026}<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 {Name*:\u2026}<\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNoSpacing\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 12pt;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">However, it is often easier to remove ambiguity by defining explicit regions. 
This is particularly important when there may be zero instances of a child sequence within a parent sequence.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h1 style=\"margin: 12pt 0in 0pt; line-height: 17pt; list-style-type: disc;\"><span style=\"font-size: 16pt;\">Syntax Summary<\/span><\/h1>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">As you can probably piece together from the examples above and the release notes, the syntax of a template field specification is:<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span><span style=\"font-family: Courier New;\"><span style=\"font-size: small;\">{[<i>optional-typecast<\/i>]<i>name<\/i><i>sequence-spec<\/i><\/span><\/span><span style=\"font-size: 11pt;\"><span style=\"font-family: Courier New; font-size: small;\">:<i>example-value<\/i>}<\/span> <\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">The field specification is enclosed in curly braces. If the data itself contains curly braces, they must be escaped by adding a \u2018\\\u2019 before them (and any \u2018\\\u2019 characters already in the data must be escaped by doubling them).\u00a0 <i>Optional-typecast<\/i> is the usual PowerShell type-cast syntax, a .NET type within square brackets (such as [string] or [int]).\u00a0 <i>Name<\/i> is the name of the field.\u00a0 <i>Sequence-spec<\/i> is \u201c*\u201d if the field will be a sequence (i.e. 
will have multiple instances) within its parent, else empty.\u00a0 All of these are markup added around the actual data in the template file.\u00a0\u201cExample-value\u201d is the actual data.\u00a0This data starts with the character immediately after the \u201c:\u201d following <i>name<\/i> and ends with the character immediately preceding the field\u2019s closing \u201c}\u201d, including all whitespace.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">As illustrated above, fields may be nested; this is done by creating a field specification within the \u201cexample-value\u201d of the enclosing field.\u00a0 Prefixing a field name with \u2018!\u2019, as in {!Name*:Los Angeles} above, turns the field specification into a negative example: text that should not be extracted for that field.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 8pt; line-height: 12pt; list-style-type: disc;\"><span style=\"font-size: 11pt;\">ConvertFrom-String does not recognize regular expressions in the field value; it interprets them as literal strings.\u00a0 For example, <a href=\"http:\/\/www.lazywinadmin.com\/2014\/09\/powershell-convertfrom-string-and.html\">this LazyWinAdmin post<\/a> provides a very nice example of using ConvertFrom-String with the output of netstat.\u00a0 However, the use of \u201c{State:\\s}\u201d works not because it is recognized as a regular expression, but because it tries to match against the literal string \u201c\\s\u201d, which happens to result in FlashExtract selecting a program that returns a blank field.\u00a0 Using \u201c{State:\\q}\u201d or \u201c{State:#}\u201d works as well.\u00a0 The latter, because it contains no alphabetic character, learns a program that avoids a problem in the template that (as of this writing) converts \u201cTIME_WAIT\u201d to \u201cWAIT\u201d; this could also be fixed by adding examples of State with the \u201c_\u201d character.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h1 style=\"margin: 12pt 0in 0pt; line-height: 17pt; list-style-type: disc;\"><span style=\"font-size: 
Common Problems">
16pt;\">Common Problems<\/span><\/h1>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 10pt;\">Many problems are due to over-learning:<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoListParagraphCxSpFirst\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; text-indent: -0.25in; list-style-type: disc;\"><span style=\"font-size: 10pt;\">\u00b7<\/span><span style=\"line-height: normal;\"><span style=\"font-size: 7pt;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span><\/span><span style=\"font-size: 10pt;\">Letter case.\u00a0 Sometimes a word will begin with a lowercase letter when all your examples begin with uppercase, or, as above, an uppercase letter will appear in the middle of a word.\u00a0 There may also be special characters such as underscores, apostrophes, or hyphens.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoListParagraphCxSpLast\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; text-indent: -0.25in; list-style-type: disc;\"><span style=\"font-size: 10pt;\">\u00b7<\/span><span style=\"line-height: normal;\"><span style=\"font-size: 7pt;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span><\/span><span style=\"font-size: 10pt;\">A program may learn to search for only a specific number of characters, or for a specific string, if all the examples are the same in certain areas. For example, I recently parsed a file that used spaces for alignment; both my examples had one digit followed by 9 spaces, so the learned program missed lines where 2 digits were followed by 8 spaces.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 12pt;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 10pt;\">Over-learning can be fixed by adding diverse examples to relax the restrictions. 
In other cases, as with \u201cLos Angeles\u201d above, the learned program may be too lenient and you may need to specify a negative example.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 12pt;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 10pt;\">Other things to look for:<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoListParagraphCxSpFirst\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; text-indent: -0.25in; list-style-type: disc;\"><span style=\"font-size: 10pt;\">\u00b7<\/span><span style=\"line-height: normal;\"><span style=\"font-size: 7pt;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span><\/span><span style=\"font-size: 10pt;\">Be sure that spaces in the template file match those in the data file.\u00a0 The original file may contain trailing spaces or spaces between fields, and these can affect what FlashExtract learns.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoListParagraphCxSpLast\" style=\"margin: 0in 0in 0pt 0.5in; line-height: normal; text-indent: -0.25in; list-style-type: disc;\"><span style=\"font-size: 10pt;\">\u00b7<\/span><span style=\"line-height: normal;\"><span style=\"font-size: 7pt;\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <\/span><\/span><span style=\"font-size: 10pt;\">If you are having trouble getting the correct region boundaries with implicit regions, consider adding explicit regions.<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 12pt;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 10pt;\">We would love to know how ConvertFrom-String meets your needs or where it does 
not do what you expect.\u00a0 Please give it a workout and send your examples, problems, suggestions, and other feedback to psdmfb--at--microsoft.com. Happy parsing!<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 12pt;\">\u00a0<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 10pt;\">Ted Hart [MSFT]<\/span><\/p>\n<p>&nbsp;<\/p>\n<p class=\"MsoNormal\" style=\"margin: 0in 0in 0pt; line-height: normal; list-style-type: disc;\"><span style=\"font-size: 10pt;\">Microsoft Research<\/span><\/p>\n<p>&nbsp;<\/p>\n<div id=\"scid:fb3a1972-4489-4e52-abe7-25a00bb07fdf:05a906b6-09cd-4122-97b2-3101533ea958\" class=\"wlWriterEditableSmartContent\" style=\"margin: 0px; padding: 0px; float: none;\">\n<p>Download: <a href=\"https:\/\/web.archive.org\/web\/20190110114705\/https:\/\/msdnshared.blob.core.windows.net\/media\/MSDNBlogsFS\/prod.evol.blogs.msdn.com\/CommunityServer.Blogs.Components.WeblogFiles\/00\/00\/00\/63\/74\/metablogapi\/1780.ConvertFrom-String-Examples_0AF1A296.zip\" target=\"_blank\" rel=\"noopener\">ConvertFrom-String Examples<\/a><\/p>\n<\/div>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>&nbsp; &nbsp; Intro &nbsp; I\u2019m sure most of you are familiar with the powerful tools for text parsing available in PowerShell.\u00a0A presentation at the PowerShell Summit a couple of weeks ago provides a good overview of these and mentions a new Powershell cmdlet, ConvertFrom-String, that was introduced in Windows Management Framework 5.0 Preview September 2014.\u00a0ConvertFrom-String 
[&hellip;]<\/p>\n","protected":false},"author":600,"featured_media":13641,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-1461","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-powershell"],"acf":[],"blog_post_summary":"<p>&nbsp; &nbsp; Intro &nbsp; I\u2019m sure most of you are familiar with the powerful tools for text parsing available in PowerShell.\u00a0A presentation at the PowerShell Summit a couple of weeks ago provides a good overview of these and mentions a new Powershell cmdlet, ConvertFrom-String, that was introduced in Windows Management Framework 5.0 Preview September 2014.\u00a0ConvertFrom-String [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/powershell\/wp-json\/wp\/v2\/posts\/1461","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/powershell\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/powershell\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/powershell\/wp-json\/wp\/v2\/users\/600"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/powershell\/wp-json\/wp\/v2\/comments?post=1461"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/powershell\/wp-json\/wp\/v2\/posts\/1461\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/powershell\/wp-json\/wp\/v2\/media\/13641"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/powershell\/wp-json\/wp\/v2\/media?parent=1461"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/powershell\/wp-json\/wp\/v2\/categories?post=1461"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/powershell\/wp-json\/
wp\/v2\/tags?post=1461"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}