{"id":37843,"date":"2004-09-16T07:00:00","date_gmt":"2004-09-16T07:00:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/oldnewthing\/2004\/09\/16\/a-visual-history-of-spam-and-virus-email\/"},"modified":"2004-09-16T07:00:00","modified_gmt":"2004-09-16T07:00:00","slug":"a-visual-history-of-spam-and-virus-email","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/oldnewthing\/20040916-00\/?p=37843","title":{"rendered":"A visual history of spam (and virus) email"},"content":{"rendered":"<p><P>\nI have kept every single piece of spam and virus email since mid-1997.\nOccasionally, it comes in handy, for example, to add a\n<A HREF=\"http:\/\/www.paulgraham.com\/spam.html\">\nna&iuml;ve Bayesian spam filter<\/A> to my custom-written email filter.\nAnd occasionally I use it to build a chart of spam and virus email.\n<\/P>\n<P>\nThe following chart plots every single piece of spam and virus email\nthat arrived at my work email address since April 1997.\nBlue dots are spam and red dots are email viruses.\nThe horizontal axis is time, and the vertical axis is size of mail\n(on a <strong>logarithmic<\/strong> scale).\nDarker dots represent more messages.\n(Messages larger than 1MB have been treated as if they were 1MB.)\n<\/P>\n<P>\nNote that this chart is not scientific.  Only mail which makes it past\nthe corporate spam and virus filters shows up on the chart.\n<\/P>\n<P>\nWhy does so much spam and virus mail get through the filters?\nBecause corporate mail filters cannot take the risk of accidentally\nclassifying valid business email as spam.  Consequently, the filters\nhave to make sure to remove something only if they have extremely high\nconfidence that the message is unwanted.\n<\/P>\n<P>\nOkay, enough dawdling.  
Let&#8217;s see the chart.\n<\/P>\n<IMG SRC=\"http:\/\/www.gotdotnet.com\/team\/raymondc\/0409.spam.png\" WIDTH=\"800\" HEIGHT=\"640\" BORDER=\"1\">\n<P>\nOverall statistics and extrema:\n<\/P>\n<UL>\n<LI>First message in chart: April 22, 1997.\n<LI>Last message in chart: September 10, 2004.\n<LI>Smallest message: 372 bytes, received March 11, 1998.\n<PRE>\nFrom: 15841.\nTo: 15841.\nSubject: About your account&#8230;\nContent-Type: text\/plain; charset=ISO-8859-1\nContent-Transfer-Encoding: 7bit\n\nP\n<\/PRE>\n<LI>Largest message: 1,406,967 bytes, received January 8, 2004.\n    HTML mail with a lot of text including 41 large images.\n    A slightly smaller version was received the previous day.\n    (I guess they figured that their first version wasn&#8217;t big\n    enough, so they sent out an updated version the next day.)\n<LI>Single worst spam day by volume: January 8, 2004. That one monster\n    message sealed the deal.\n<LI>Single worst spam day by number of messages:\n    August 22, 2002.  67 pieces of spam.  
The vertical blue line.\n<LI>Single worst virus day: August 24, 2003.\n    This is the winner both by volume (1.7MB) and by number (49).\n    The red splotch.\n<LI>Totals: 227.6MB of spam in roughly 19,000 messages.\n    61.8MB of viruses in roughly 3500 messages.\n<\/UL>\n<P>\nThings you can see on the chart:\n<\/P>\n<UL>\n<LI>\nSpam went ballistic starting in 2002.\nYou could see it growing in 2001, but 2002 was when it really took off.\n<LI>\nVertical blue lines are &#8220;bad spam days&#8221;.\nVertical red lines are &#8220;bad virus days&#8221;.\n<LI>\nHorizontal red lines let you watch the lifetime of a particular email virus.\n(This works only for viruses with a fixed-size payload.\nViruses with variable-size payload are smeared vertically.)\n<LI>\nThe big red splotch in August 2003 around the 100K mark is the Sobig virus.\n<LI>\nThe horizontal line in 2004 that wanders around\nthe 2K mark is the Netsky virus.\n<LI>\nFor most of this time, the company policy on\nspam filtering was not to filter it out at all,\nbecause all the filters they tried had too high a false-positive rate.\n(I.e., they were rejecting too many valid messages as spam.)\nYou can see that in late 2003, the blue dot density diminished\nconsiderably.\nThat&#8217;s when mail administrators found a filter\nwhose false-positive rate was low enough to be acceptable.\n<\/UL>\n<P>\nAs a comparison, here&#8217;s the same chart based on email received\nat one of my inactive personal email addresses.\n<\/P>\n<IMG SRC=\"http:\/\/www.gotdotnet.com\/team\/raymondc\/0409.spam2.png\" WIDTH=\"438\" HEIGHT=\"640\" BORDER=\"1\">\n<\/P>\n<P>\nThis particular email address has been inactive since 1995;\nall the mail it gets is therefore from harvesting done prior to 1995.\n(That&#8217;s why you don&#8217;t see any red dots: None of my friends have this address\nin their address book since it is inactive.)\nThe graph doesn&#8217;t go back as far because\nI didn&#8217;t start saving spam from this address until 
late 2000.\n<\/P>\n<P>\nOverall statistics and extrema:\n<\/P>\n<UL>\n<LI>First message in chart: September 2, 2000.\n<LI>Last message in chart: September 10, 2004.\n<LI>Smallest message: 256 bytes, received July 24, 2004.\n<PRE>\nReceived: from dhcp065-025-005-032.neo.rr.com ([65.25.5.32]) by &#8230;\n         Sat, 24 Jul 2004 12:30:35 -0700\nX-Message-Info: 10\n<\/PRE>\n<LI>Largest message: 3,661,900 bytes, received April 11, 2003.\n    Mail with four large bitmap attachments, each of which is\n    a Windows screenshot of Word with a document open, each\n    bitmap showing a different page of the document.\n    Perhaps one of the most inefficient ways of distributing a four-page\n    document.\n<LI>Single worst spam day by volume: April 11, 2003.\n    Again, the monster message drowns out the competition.\n<LI>Single worst spam day by number of messages:\n    October 3, 2003. 74 pieces of spam.\n<LI>Totals: 237MB of spam in roughly 35,000 messages.\n<\/UL>\n<P>\nI cannot explain the mysterious &#8220;quiet period&#8221; at the beginning\nof 2004.  Perhaps my ISP instituted a filter for a while?\nPerhaps I didn&#8217;t log on often enough to pick up my spam and it\nexpired on the server? 
I don&#8217;t know.\n<\/P>\n<P>\nOne theory is that the lull was due to uncertainty created by the\nCAN-SPAM Act, which took effect on January 1, 2004.\nI don&#8217;t buy this theory since there was no significant corresponding\nlull at my other email account, and follow-up reports indicate\nthat CAN-SPAM was widely disregarded.\n<A HREF=\"http:\/\/www.washingtonpost.com\/ac2\/wp-dyn\/A29136-2004Jun9\">\nEven in its heyday, compliance was only 3%<\/A>.\n<\/P>\n<P>\nCuriously, the trend in spam size for this particular account is\nthat it has been going <STRONG>down<\/STRONG> since 2002.\nIn the previous chart, you could see a clear upward trend since 1997.\nMy theory is that since this second dataset is more focused on current\ntrends, it missed out on the growth trend in the late 1990s\nand instead is seeing the shift in spam from text to &lt;IMG&gt; tags.\n<\/P><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I have kept every single piece of spam and virus email since mid-1997. Occasionally, it comes in handy, for example, to add a na&iuml;ve Bayesian spam filter to my custom-written email filter. And occasionally I use it to build a chart of spam and virus email. The following chart plots every single piece of spam and [&hellip;]<\/p>\n","protected":false},"author":1069,"featured_media":111744,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[26],"class_list":["post-37843","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-oldnewthing","tag-other"],"acf":[],"blog_post_summary":"<p>I have kept every single piece of spam and virus email since mid-1997. Occasionally, it comes in handy, for example, to add a na&iuml;ve Bayesian spam filter to my custom-written email filter. And occasionally I use it to build a chart of spam and virus email. 
The following chart plots every single piece of spam and [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/37843","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/users\/1069"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/comments?post=37843"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/posts\/37843\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media\/111744"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/media?parent=37843"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/categories?post=37843"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/oldnewthing\/wp-json\/wp\/v2\/tags?post=37843"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}