{"id":66233,"date":"2006-10-18T13:29:00","date_gmt":"2006-10-18T13:29:00","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/2006\/10\/18\/how-can-i-split-a-string-only-on-specific-instances-of-a-character\/"},"modified":"2006-10-18T13:29:00","modified_gmt":"2006-10-18T13:29:00","slug":"how-can-i-split-a-string-only-on-specific-instances-of-a-character","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/scripting\/how-can-i-split-a-string-only-on-specific-instances-of-a-character\/","title":{"rendered":"How Can I Split a String Only on Specific Instances of a Character?"},"content":{"rendered":"<p><IMG class=\"nearGraphic\" title=\"Hey, Scripting Guy! Question\" border=\"0\" alt=\"Hey, Scripting Guy! Question\" align=\"left\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/q-for-powertip.jpg\" width=\"34\" height=\"34\"> \n<P>Hey, Scripting Guy! I have a series of string values that I need to convert to an array, splitting the value on the ampersand (&amp;). However, if the ampersand is followed by <B>amp <\/B>(in other words, if the string is <B>&amp;amp<\/B>) I <I>don\u2019t<\/I> want to split the string at that point. How do I do that?<BR><BR>&#8212; AK<\/P><IMG border=\"0\" alt=\"Spacer\" src=\"https:\/\/devblogs.microsoft.com\/scripting\/wp-content\/uploads\/sites\/29\/2019\/05\/spacer.gif\" width=\"5\" height=\"5\"><IMG class=\"nearGraphic\" title=\"Hey, Scripting Guy! Answer\" border=\"0\" alt=\"Hey, Scripting Guy! 
Answer\" align=\"left\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/a-for-powertip.jpg\" width=\"34\" height=\"34\"> \n<P>Hey, AK. We apologize for taking so long to post the answer to your question. We actually answered this a long time ago, but every time we sent it in for editing the Scripting Editor rejected it; the whole thing sounded so crazy she assumed we\u2019d started drinking. But that\u2019s not true; we actually started drinking a little over 6 years ago.<\/P>\n<TABLE id=\"EED\" class=\"dataTable\" cellSpacing=\"0\" cellPadding=\"0\">\n<THEAD><\/THEAD>\n<TBODY>\n<TR class=\"record\" vAlign=\"top\">\n<TD>\n<P><B>Note<\/B>. Why, yes, as a matter of fact the Scripting Guy who writes this column <I>did<\/I> start working at Microsoft a little over 6 years ago. How did you know?<\/P>\n<P><I>Editor\u2019s Note: <\/I><I>Keep in mind that by \u201cdrinking\u201d we\u2019re not necessarily referring to alcoholic beverages. One too many espresso shots have been known to have adverse effects on some of the Scripting Guys. Although so has one too few shots. It\u2019s a delicate balance around here sometimes.<\/I><\/P><\/TD><\/TR><\/TBODY><\/TABLE>\n<DIV class=\"dataTableBottomMargin\"><\/DIV>\n<P>Admittedly, this <I>is<\/I> a somewhat crazy question. So let\u2019s see if we can explain the situation a little better. Based on your email we\u2019re assuming you have a string that looks something like this:<\/P>\n<P>aaa&amp;bbb&amp;amp;ccc&amp;aamp;ddd<\/P>\n<P>What you\u2019d like to do is convert this string into an array, splitting the string on the ampersand. 
<I>However<\/I> \u2013 and this is an important however \u2013 you don\u2019t want to split on any ampersand that\u2019s followed by the letters <B>amp<\/B>. In other words, you want to split the string only at the following locations:<\/P>\n<P>aaa<B>&amp;<\/B>bbb&amp;amp;ccc<B>&amp;<\/B>aamp;ddd<\/P>\n<P>In the end, that would give you an array with the following elements:<\/P>\n<P>aaa<BR>bbb&amp;amp;ccc<BR>aamp;ddd<\/P>\n<P>It\u2019s crazy, but we think we understand what you want to do. Granted, we\u2019re not totally sure <I>why<\/I> you want to do it. But, then again, that\u2019s not really any of our business, is it?<\/P>\n<P>Actually, we\u2019ve encountered similar situations before, particularly in Microsoft Word. On more than one occasion we\u2019ve been given documents that have a paragraph return at the end of each line. We need to replace those paragraph returns with blank spaces. However, we can\u2019t just go out and replace every single paragraph return in the document; if we did that then we\u2019d replace the legitimate paragraph returns (that is, the ones between paragraphs) as well. To solve that problem in Microsoft Word we used an approach very similar to the approach we\u2019re going to use today, which is why we considered ourselves at least semi-qualified to answer this question.<\/P>\n<P>Oh, good point: what <I>is<\/I> the approach we\u2019re going to use today? 
This:<\/P><PRE class=\"codeSample\">strText = &quot;aaa&amp;bbb&amp;amp;ccc&amp;aamp;ddd&quot;\n\n' Hide each &amp;amp behind a placeholder that appears nowhere else in the text\nstrText = Replace(strText, &quot;&amp;amp&quot;, &quot;@@@@&quot;)\n\n' Only the ampersands we want to split on remain\narrText = Split(strText, &quot;&amp;&quot;)\n\n' Restore &amp;amp in every element, including the last one\nFor i = 0 To UBound(arrText)\n    arrText(i) = Replace(arrText(i), &quot;@@@@&quot;, &quot;&amp;amp&quot;)\nNext\n\nFor Each strItem in arrText\n    Wscript.Echo strItem\nNext\n<\/PRE>\n<P>Trust us; it\u2019s <I>supposed<\/I> to look like that.<\/P>\n<P>Before the rest of you start accusing us of drinking (as his long-suffering family will attest, the Scripting Guy who writes this column is too much of a cheapskate to be a drinker <I>[yes, those lattes get expensive]<\/I>) let\u2019s explain what we\u2019re doing here. In the first line of code we take our string value and assign it to a variable named strText. In other words, everything starts out pretty easy. But then it gets a little weird:<\/P><PRE class=\"codeSample\">strText = Replace(strText, &quot;&amp;amp&quot;, &quot;@@@@&quot;)\n<\/PRE>\n<P>What\u2019s the point of <I>that<\/I> line of code? Well, as we know, we want to split the line at each and every ampersand \u2026 provided, of course, that the ampersand isn\u2019t followed by the letters <I>amp<\/I>. That means that the value <I>&amp;amp<\/I> is a problem: it has an ampersand, which means we ought to split the line at that point. However, that ampersand is followed by the letters <I>amp<\/I>, which means we <I>shouldn\u2019t<\/I> split the line at that point. Obviously what we need to do is find each ampersand, then check to see if the character is followed by the letters <I>amp<\/I>.<\/P>\n<P>But, to tell you the truth, that sounded way too hard. (It can be done, but \u2026.) And so we decided to use logic instead. If the value <I>&amp;amp<\/I> is causing us problems then the logical thing to do is to get rid of that value. 
That\u2019s why we used VBScript\u2019s <B>Replace<\/B> function to replace all instances of <I>&amp;amp<\/I> with <I>@@@@<\/I>. <\/P>\n<P>Before you ask, no, we didn\u2019t <I>have<\/I> to use @@@@; we could have used any string of characters that doesn\u2019t appear elsewhere in the text. Our text doesn\u2019t include the string <I>!!!!<\/I>, which means we could have used that value instead. However, our text <I>does<\/I> include the value <I>bbb<\/I>, which means that <I>bbb<\/I> would be a poor choice as a replacement value. Why? Hopefully that will become clear in just a moment.<\/P>\n<P>So what have we gained by doing this? What we\u2019ve gained is this: the value of strText is now equal to the following, with <I>@@@@<\/I> replacing any instances of <I>&amp;amp<\/I>:<\/P>\n<P>aaa&amp;bbb<B>@@@@<\/B>;ccc&amp;aamp;ddd<\/P>\n<P>That might not look like that big of a deal, but it is: after all, with the problem value <I>&amp;amp<\/I> temporarily removed we\u2019re free to split our string on all the remaining ampersands, something we do here:<\/P><PRE class=\"codeSample\">arrText = Split(strText, &quot;&amp;&quot;)\n<\/PRE>\n<P>In turn, that gives us an array consisting of the following items (the Split function discards the ampersands it splits on):<\/P><PRE class=\"codeSample\">aaa\nbbb@@@@;ccc\naamp;ddd\n<\/PRE>\n<P>See? That\u2019s pretty good; in fact, if it wasn\u2019t for the value <I>@@@@<\/I> stuck in the middle of line 2 it would be perfect. But that\u2019s OK; to paraphrase an age-old parental threat, we brought <I>@@@@<\/I> into this world and we can take <I>@@@@<\/I> out of this world (or at least out of our array). In order to restore the original text we set up a For Next loop that loops through each item in the array. For each of those items we use the Replace function to replace any instances of <I>@@@@<\/I> with \u2013 you guessed it \u2013 <I>&amp;amp<\/I>. 
That\u2019s what this block of code is for:<\/P><PRE class=\"codeSample\">For i = 0 To UBound(arrText)\n    arrText(i) = Replace(arrText(i), &quot;@@@@&quot;, &quot;&amp;amp&quot;)\nNext\n<\/PRE>\n<P>To make a long story short, early on the value <I>&amp;amp<\/I> was causing us problems; therefore, we temporarily removed that value from the string. All we\u2019re doing now is restoring <I>&amp;amp<\/I> to its rightful place. (Note that the loop runs all the way to UBound(arrText), the index of the last item, so a placeholder hiding in the final element gets restored, too.) Make sense?<\/P>\n<P>In fact, if you now echo back all the values in the array (something we do in the last block of code) you get back this:<\/P><PRE class=\"codeSample\">aaa\nbbb&amp;amp;ccc\naamp;ddd\n<\/PRE>\n<P>Which, believe it or not, is exactly what we wanted to get back.<\/P>\n<P>Whew. You know, we could really use a drink right about now.<\/P>\n<P>Of <I>water<\/I>. Sheesh.<\/P><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hey, Scripting Guy! I have a series of string values that I need to convert to an array, splitting the value on the ampersand (&amp;). However, if the ampersand is followed by amp (in other words, if the string is &amp;amp) I don\u2019t want to split the string at that point. How do I do [&hellip;]<\/p>\n","protected":false},"author":595,"featured_media":87096,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[3,4,21,14,5],"class_list":["post-66233","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-scripting","tag-scripting-guy","tag-scripting-techniques","tag-string-manipulation","tag-text-files","tag-vbscript"],"acf":[],"blog_post_summary":"<p>Hey, Scripting Guy! I have a series of string values that I need to convert to an array, splitting the value on the ampersand (&amp;). However, if the ampersand is followed by amp (in other words, if the string is &amp;amp) I don\u2019t want to split the string at that point. 
How do I do [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/66233","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/users\/595"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/comments?post=66233"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/66233\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media\/87096"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media?parent=66233"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/categories?post=66233"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/tags?post=66233"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}