{"id":9803,"date":"2011-08-25T07:00:00","date_gmt":"2011-08-25T07:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/2011\/08\/25\/stupid-command-line-trick-counting-the-number-of-lines-in-stdin\/"},"modified":"2011-08-25T07:00:00","modified_gmt":"2011-08-25T07:00:00","slug":"stupid-command-line-trick-counting-the-number-of-lines-in-stdin","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20110825-00\/?p=9803","title":{"rendered":"Stupid command-line trick: Counting the number of lines in stdin"},"content":{"rendered":"<p>\nOn unix, you can use <code>wc -l<\/code> to count the number of lines\nin stdin.\nWindows doesn&#8217;t come with <code>wc<\/code>,\nbut there&#8217;s a sneaky way to count the number of lines anyway:\n<\/p>\n<pre>\nsome-command-that-generates-output | find \/c \/v \"\"\n<\/pre>\n<p>\nIt is a special quirk of the <code>find<\/code> command\nthat the null string is treated as never matching.\nThe <code>\/v<\/code> flag reverses the sense of the test,\nso now it matches everything.\nAnd the <code>\/c<\/code> flag returns the count.\n<\/p>\n<p>\nIt&#8217;s pretty convoluted, but it does work.\n<\/p>\n<p>\n(Remember, I provide the occasional tip on\nbatch file programming as a public service to those forced to endure it,\nnot as an endorsement of batch file programming.)\n<\/p>\n<p>\nNow come da history:\nWhy does the <code>find<\/code> command say that a null string matches\nnothing?\nMathematically, the null string is a substring of every string,\nso it should be that if you search for the null string, it matches\neverything.\nThe reason dates back to the original MS-DOS\nversion of <code>find.exe<\/code>,\nwhich according to the comments appears to have been written\nin 1982.\nAnd back then, pretty much all of MS-DOS was written in assembly\nlanguage.\n(If you look at your old MS-DOS floppies, you&#8217;ll find that\n<code>find.exe<\/code> is under 7KB in size.)\nHere is the relevant code, though I&#8217;ve done some editing to get rid of\ndistractions like DBCS support.\n<\/p>\n<pre>\n        mov     dx,st_length            ;length of the string arg.\n        dec     dx                      ;adjust for later use\n        mov     di, line_buffer\nlop:\n        inc     dx\n        mov     si,offset st_buffer     ;pointer to beg. of string argument\ncomp_next_char:\n        lodsb\n        cmp     al,byte ptr [di]\n        jnz     no_match\n        dec     dx\n        jz      a_matchk                ; no chars left: a match!\n        call    next_char               ; updates di\n        jc      no_match                ; end of line reached\n        jmp     comp_next_char          ; loop if chars left in arg.\n<\/pre>\n<p>\nIf you&#8217;re rusty on your 8086 assembly language,\nhere&#8217;s how it goes in pseudocode:\n<\/p>\n<pre>\n int dx = st_length - 1;\n char *di = line_buffer;\nlop:\n dx++;\n char *si = st_buffer;\ncomp_next_char:\n char al = *si++;\n if (al != *di) goto no_match;\n if (--dx == 0) goto a_matchk;\n if (!next_char(&amp;di)) goto no_match;\n goto comp_next_char;\n<\/pre>\n<p>\nIn sort-of-C, the code looks like this:\n<\/p>\n<pre>\n int l = st_length - 1;\n char *line = line_buffer;\n l++;\n char *string = st_buffer;\n while (*string++ == *line &amp;&amp; --l &amp;&amp; next_char(&amp;line)) {}\n<\/pre>\n<p>\nThe weird <code>-&nbsp;1<\/code> followed by <code>l++<\/code> is an artifact\nof code that I deleted, which needed the decremented value.\nIf you prefer, you can look at the code this way:\n<\/p>\n<pre>\n int l = st_length;\n char *line = line_buffer;\n char *string = st_buffer;\n while (*string++ == *line &amp;&amp; --l &amp;&amp; next_char(&amp;line)) {}\n<\/pre>\n<p>\nNotice that if the string length is zero, there is an integer\nunderflow, and we end up reading off the end of the buffers.\nThe comparison loop does stop, because we eventually\nhit bytes that don&#8217;t match.\n(No virtual memory here, so there is no page fault when you\nrun off the end of a buffer; you just keep going and reading\nfrom other parts of your data segment.)\n<\/p>\n<p>\nIn other words, due to an integer underflow bug, a string of length zero\nwas treated as if it were a string of length 65536, which doesn&#8217;t\nmatch anywhere in the file.\n<\/p>\n<p>\nThis bug couldn&#8217;t be fixed,\nbecause by the time you got around to\ntrying, there were already people who discovered this behavior\nand wrote batch files that relied on it.\nThe bug became a feature.\n<\/p>\n<p>\nThe integer underflow was fixed, but the code is careful\nto treat null strings as never matching, in order to preserve\nexisting behavior.<\/p>\n<p>\n<b>Exercise<\/b>: Why is the loop label called <code>lop<\/code>\ninstead of <code>loop<\/code>?<\/p>\n","protected":false},"excerpt":{"rendered":"<p>On unix, you can use wc -l to count the number of lines in stdin. Windows doesn&#8217;t come with wc, but there&#8217;s a sneaky way to count the number of lines anyway: some-command-that-generates-output | find \/c \/v &#8220;&#8221; It is a special quirk of the find command that the null string is treated as never [&hellip;]<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[25,104],"class_list":["post-9803","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-code","tag-tipssupport"],"acf":[],"blog_post_summary":"<p>On unix, you can use wc -l to count the number of lines in stdin. Windows doesn&#8217;t come with wc, but there&#8217;s a sneaky way to count the number of lines anyway: some-command-that-generates-output | find \/c \/v &#8220;&#8221; It is a special quirk of the find command that the null string is treated as never [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/9803","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=9803"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/9803\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=9803"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=9803"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=9803"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}