{"id":56273,"date":"2008-02-05T22:39:00","date_gmt":"2008-02-05T22:39:00","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/heyscriptingguy\/2008\/02\/05\/hey-scripting-guy-how-can-i-count-the-number-of-sentences-and-paragraphs-in-an-office-word-document\/"},"modified":"2008-02-05T22:39:00","modified_gmt":"2008-02-05T22:39:00","slug":"hey-scripting-guy-how-can-i-count-the-number-of-sentences-and-paragraphs-in-an-office-word-document","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/scripting\/hey-scripting-guy-how-can-i-count-the-number-of-sentences-and-paragraphs-in-an-office-word-document\/","title":{"rendered":"Hey, Scripting Guy! How Can I Count the Number of Sentences and Paragraphs in an Office Word Document?"},"content":{"rendered":"<p><img decoding=\"async\" class=\"nearGraphic\" title=\"Hey, Scripting Guy! Question\" height=\"34\" alt=\"Hey, Scripting Guy! Question\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/q-for-powertip.jpg\" width=\"34\" align=\"left\" border=\"0\" \/> <\/p>\n<p>Hey, Scripting Guy! I just read <a href=\"http:\/\/www.microsoft.com\/technet\/scriptcenter\/resources\/officetips\/mar05\/tips0324.mspx\"><b>your article<\/b><\/a> about getting Word documents statistics and would like to know if there is any way of getting the number of sentences and paragraphs per document.<\/p>\n<p>&#8212; RLP<\/p>\n<p><img decoding=\"async\" height=\"5\" alt=\"Spacer\" src=\"https:\/\/devblogs.microsoft.com\/scripting\/wp-content\/uploads\/sites\/29\/2019\/05\/spacer.gif\" width=\"5\" border=\"0\" \/><img decoding=\"async\" class=\"nearGraphic\" title=\"Hey, Scripting Guy! Answer\" height=\"34\" alt=\"Hey, Scripting Guy! 
Answer\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/29\/2019\/02\/a-for-powertip.jpg\" width=\"34\" align=\"left\" border=\"0\" \/><a href=\"http:\/\/go.microsoft.com\/fwlink\/?linkid=68779&amp;clcid=0x409\"><img decoding=\"async\" class=\"farGraphic\" title=\"Script Center\" height=\"288\" alt=\"Script Center\" src=\"http:\/\/img.microsoft.com\/library\/media\/1033\/technet\/images\/scriptcenter\/ad.jpg\" width=\"120\" align=\"right\" border=\"0\" \/><\/a> <\/p>\n<p>Hey, RLP. To be perfectly honest, now that the Super Bowl is over the Scripting Guy who writes this column is more relieved than he is happy. The Scripting Guy who writes this column is not a New York Giants fan, and under normal circumstances he would never <i>ever<\/i> root for the Giants. <\/p>\n<table class=\"dataTable\" id=\"EFD\" cellSpacing=\"0\" cellPadding=\"0\">\n<thead><\/thead>\n<tbody>\n<tr class=\"record\" vAlign=\"top\">\n<td class=\"\">\n<p><b>Note<\/b>. But suppose alien invaders showed up on Earth, challenged the Giants to a football game, and said they would destroy the planet if the Giants lost; wouldn\u2019t the Scripting Guy who writes this column root for the Giants if the very fate of the Earth was at stake?<\/p>\n<p>You know what? That\u2019s probably not going to happen, so maybe we shouldn\u2019t even speculate on that.<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div class=\"dataTableBottomMargin\"><\/div>\n<p>As many of you know, however, this year\u2019s Super Bowl was not played under normal circumstances; instead, the New England Patriots were on the brink of being declared the greatest football team \u2013 heck, the greatest sports team \u2013 heck, the greatest collection of human beings who ever lived or ever will live. The <i>New England<\/i><i> Patriots<\/i>? Say it isn\u2019t so! The truth is, the Scripting Guy who writes this column just couldn\u2019t deal with that prospect. 
Therefore, with desperate times calling for desperate measures, he bit his lip, held his nose, and rooted for the New York Giants. <\/p>\n<p>And, surprisingly enough, the Giants actually won. Which figures. The teams that the Scripting Guy who writes this column really, truly roots for \u2013 the Washington Huskies, the Seattle Mariners, the Seattle Seahawks \u2013 <i>never<\/i> seem to win. But the team he roots for merely as the lesser of two evils, well, they win.<\/p>\n<p>But, still, anyone but the Patriots, right?<\/p>\n<p>Well, no, not the Dallas Cowboys; no way. The Oakland Raiders? Heavens no! And, no, not the New York Jets; we don\u2019t really like the Jets. And definitely not \u2026. <\/p>\n<p>At any rate, it was an exciting game, and a lot of fun to watch. As for the rest of the day\u2019s festivities, the Scripting Guy who writes this column (as always) skipped the pregame shows; he made dinner during the halftime show (although he <i>does<\/i> like Tom Petty and the Heartbreakers); and he had a lot of trouble figuring out the meaning behind most of the &#8230; innovative \u2026 commercials that were telecast during the game. In fact, if anyone can explain why watching a dog slurp from a water bowl for 30 seconds makes you want to rush out and buy Gatorade, well, <a href=\"mailto:scripter@microsoft.com\"><b>drop us a line<\/b><\/a> and let us know.<\/p>\n<p>While we wait for that explanation we might as well see if we can solve RLP\u2019s problem. (We have a feeling it\u2019s going to be a while before anyone can come up with anything.) Before we show you any code, however, we need to note that today\u2019s task is more difficult than you might think; in fact, depending on your point of view, it might be downright impossible. 
For example, how many sentences would you say are in the following paragraph:<\/p>\n<table class=\"\" cellSpacing=\"0\" cellPadding=\"0\" border=\"0\">\n<tbody>\n<tr>\n<td class=\"listBullet\" vAlign=\"top\">\u2022<\/td>\n<td class=\"listItem\">\n<p>The highlight of the evening was an appearance by Dr. Ken Myer.<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Most people would say there\u2019s just one sentence here. However, Microsoft Word is going to insist that there are <i>two<\/i> sentences. Why? Because in Word\u2019s view a sentence is any run of text that ends with a punctuation mark (like a period, question mark, or exclamation mark) followed by a blank space or paragraph return. Consequently, Word thinks the preceding paragraph contains two sentences:<\/p>\n<table class=\"\" cellSpacing=\"0\" cellPadding=\"0\" border=\"0\">\n<tbody>\n<tr>\n<td class=\"listBullet\" vAlign=\"top\">\u2022<\/td>\n<td class=\"listItem\">\n<p>The highlight of the evening was an appearance by Dr. <\/p>\n<\/td>\n<\/tr>\n<tr>\n<td class=\"listBullet\" vAlign=\"top\">\u2022<\/td>\n<td class=\"listItem\">\n<p>Ken Myer.<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>To be honest, there\u2019s no good way to work around this problem, at least not until computers fully understand English. That means that Word (or any custom regular expression you try to come up with) is almost always going to overestimate the number of sentences in a document. That\u2019s something you\u2019ll just have to learn to live with.<\/p>\n<table class=\"dataTable\" id=\"E2E\" cellSpacing=\"0\" cellPadding=\"0\">\n<thead><\/thead>\n<tbody>\n<tr class=\"record\" vAlign=\"top\">\n<td class=\"\">\n<p class=\"lastInCell\"><b>Note<\/b>. One thing you <i>could<\/i> do is run a few tests using typical Word documents. You might find that, for your documents, Word consistently says there are 5% more sentences than there really are. 
In that case, you could add some code that automatically makes that adjustment when reporting back the number of sentences in a document. But, needless to say, that\u2019s up to you.<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div class=\"dataTableBottomMargin\"><\/div>\n<p>The point is, sentences can pose a bit of a problem. Paragraphs can also pose a problem, albeit a completely different one. For example, how many paragraphs do you see in the following selection, with the underscore indicating each time we pressed the ENTER key:<\/p>\n<p>Paragraph 1._<br \/>_<br \/>Paragraph 2._<br \/>_<br \/>Paragraph 3._<\/p>\n<p>As you might have guessed, the answer is this: it depends. If you use the <b>ComputeStatistics<\/b> method to calculate the number of paragraphs (like we did in our <a href=\"http:\/\/www.microsoft.com\/technet\/scriptcenter\/resources\/officetips\/mar05\/tips0324.mspx\"><b>original article<\/b><\/a>), Word will tell you that there are three paragraphs here. If you use the <b>Paragraphs<\/b> collection, however (which we\u2019re going to use today) then Word will tell you that there are <i>five<\/i> paragraphs in this document. Why? Because in that case Word is simply counting the number of times you pressed the ENTER key. How did we insert a blank line between paragraphs? That\u2019s right: we hit the ENTER key. In fact, we hit the ENTER key five times, which is why the Paragraphs collection contains five items. So which of these two values \u2013 three or five \u2013 is correct? That really depends both on you and on the nature of your documents. <\/p>\n<p>But you know what? There\u2019s no reason why you can\u2019t use both approaches in your script. In fact, why don\u2019t we do just that? 
Why <i>don\u2019t<\/i> we use both paragraph-counting methods in our script:<\/p>\n<pre class=\"codeSample\">Const wdStatisticParagraphs = 4\n\nSet objWord = CreateObject(\"Word.Application\")\nobjWord.Visible = True\nSet objDoc = objWord.Documents.Open(\"C:\\Scripts\\Test.doc\")\n\nWscript.Echo \"Paragraphs (text-only): \" &amp; objDoc.ComputeStatistics(wdStatisticParagraphs)\nWscript.Echo \"Paragraphs (including blank lines): \" &amp; objDoc.Paragraphs.Count\nWscript.Echo \"Sentences: \" &amp; objDoc.Sentences.Count<\/pre>\n<p>As you can see, we start things off by defining a constant named wdStatisticParagraphs and setting the value to 4; that tells Word which kind of statistic we want it to compute. After defining the constant we create an instance of the <b>Word.Application<\/b> object; set the <b>Visible<\/b> property to True (just so we can see our instance of Word on screen); and then use this line of code to open the document C:\\Scripts\\Test.doc:<\/p>\n<pre class=\"codeSample\">Set objDoc = objWord.Documents.Open(\"C:\\Scripts\\Test.doc\")<\/pre>\n<p>To make it a little easier for you to follow along at home, here\u2019s what Test.doc looks like:<\/p>\n<p>The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.<br \/>The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.<br \/>The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog. The quick brown fox jumps over the lazy dog.<\/p>\n<p>By the way, here\u2019s a cool little Word trick. 
Create a new, blank document, type the following, and then press ENTER:<\/p>\n<pre class=\"codeSample\">=rand()<\/pre>\n<p>When you do that, Word will add the preceding text into your document. (In Word 2007 the text will be different, but the function still works.) In other words, <b>=rand()<\/b> will add a bunch of sample text into your document, giving you a practice document that actually has some text in it. Would you like a document that has 8 paragraphs, and would you like each of those paragraphs to have 3 sentences in it? Then type the following and press ENTER:<\/p>\n<pre class=\"codeSample\">=rand(8,3)<\/pre>\n<p>And you thought all we did was write scripts. The truth is, every now and then we actually know something that <i>doesn\u2019t<\/i> involve scripting.<\/p>\n<p>And yes, that usually <i>is<\/i> something about as important as knowing how to insert sample text into a Word document.<\/p>\n<p>After we\u2019ve opened our document we\u2019re ready to calculate the number of paragraphs and sentences. To count only the paragraphs that actually contain text we use this line of code:<\/p>\n<pre class=\"codeSample\">Wscript.Echo \"Paragraphs (text-only): \" &amp; objDoc.ComputeStatistics(wdStatisticParagraphs)<\/pre>\n<p>In this case, the script is going to tell us that the document contains 3 paragraphs; that\u2019s because we have three paragraphs that actually contain text. To count the number of times we hit the ENTER key, we simply report back the value of the Paragraphs collection\u2019s <b>Count<\/b> property, a property that tells us the number of items in the collection:<\/p>\n<pre class=\"codeSample\">Wscript.Echo \"Paragraphs (including blank lines): \" &amp; objDoc.Paragraphs.Count<\/pre>\n<p>This time around Word will tell us that the document contains <i>4<\/i> paragraphs; that\u2019s because the blank line following paragraph 3 is considered a paragraph. 
Finally, we can use the Count property of the <b>Sentences<\/b> collection to determine the number of sentences in the document:<\/p>\n<pre class=\"codeSample\">Wscript.Echo \"Sentences: \" &amp; objDoc.Sentences.Count<\/pre>\n<p>Because this is a very straightforward little document (i.e., it doesn\u2019t contain abbreviations or any other misleading punctuation marks) Word correctly tells us that the document contains 15 sentences. Like we said, depending on the nature of your document Word won\u2019t always be able to tell you exactly how many sentences there are. In this case, it hit the nail right on the head. In other cases \u2026.<\/p>\n<p>That\u2019s about the best we can do, RLP. It\u2019s far from perfect, but we don\u2019t really know of any foolproof way to get at this information. But, remember, the important thing isn\u2019t whether or not you can count the sentences in a Word document with 100% accuracy. The important thing is that the New England Patriots lost the Super Bowl.<\/p>\n<p>We\u2019ll just try to ignore the fact that, if the Patriots lost, that must mean that the Giants won. <\/p>\n<table class=\"dataTable\" id=\"ECAAC\" cellSpacing=\"0\" cellPadding=\"0\">\n<thead><\/thead>\n<tbody>\n<tr class=\"record\" vAlign=\"top\">\n<td class=\"\">\n<p class=\"lastInCell\"><b>Note<\/b>. Of course, now that the Super Bowl is over there\u2019s only one big game left: the <a href=\"http:\/\/www.microsoft.com\/technet\/scriptcenter\/funzone\/games\/default.mspx\"><b>2008 Winter Scripting Games<\/b><\/a>. Remember, the Games start on Friday, February 15<sup>th<\/sup>. Whatever you do, don\u2019t miss them; after all, nobody wants to see the Giants win the Scripting Games, too.<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n","protected":false},"excerpt":{"rendered":"<p>Hey, Scripting Guy! I just read your article about getting Word documents statistics and would like to know if there is any way of getting the number of sentences and paragraphs per document. 
&#8212; RLP Hey, RLP. To be perfectly honest, now that the Super Bowl is over the Scripting Guy who writes this column [&hellip;]<\/p>\n","protected":false},"author":595,"featured_media":87096,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[84,49,3,5],"class_list":["post-56273","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-scripting","tag-microsoft-word","tag-office","tag-scripting-guy","tag-vbscript"],"acf":[],"blog_post_summary":"<p>Hey, Scripting Guy! I just read your article about getting Word documents statistics and would like to know if there is any way of getting the number of sentences and paragraphs per document. &#8212; RLP Hey, RLP. To be perfectly honest, now that the Super Bowl is over the Scripting Guy who writes this column [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/56273","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/users\/595"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/comments?post=56273"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/posts\/56273\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media\/87096"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/media?parent=56273"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/categories?post=562
73"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/scripting\/wp-json\/wp\/v2\/tags?post=56273"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}