A little program to look for files with inconsistent line endings

Raymond Chen

I wrote this little program to look for files with inconsistent line endings. Maybe you’ll find it useful. Probably not, but I’m posting it anyway.

using System;
using System.Collections.Generic;
using System.IO;

class Program
{
    static IEnumerable<FileInfo> EnumerateFiles(string dir)
    {
        var info = new DirectoryInfo(dir);
        foreach (var f in info.EnumerateFileSystemInfos(
                           "*.*", SearchOption.TopDirectoryOnly))
        {
            if (f.Attributes.HasFlag(FileAttributes.Hidden))
            {
                continue;
            }

            if (f.Attributes.HasFlag(FileAttributes.Directory))
            {
                // Invariant lowercasing so the match does not depend on the current culture.
                switch (f.Name.ToLowerInvariant())
                {
                    case "bin":
                    case "obj":
                        continue;
                }

                foreach (var inner in EnumerateFiles(f.FullName))
                {
                    yield return inner;
                }
            }
            else
            {
                yield return (FileInfo)f;
            }
        }
    }

    // Starting in the current directory, enumerate files
    // (see EnumerateFiles for criteria), and report what
    // type of line ending each file uses.

    static void Main()
    {
        foreach (var f in EnumerateFiles("."))
        {
            // Skip obvious binary files.
            switch (f.Extension.ToLowerInvariant())
            {
                case ".png":
                case ".jpg":
                case ".gif":
                case ".wmv":
                    continue;
            }

            int line = 0; // total number of lines found
            int cr = 0;   // number of lines that end in CR
            int crlf = 0; // number of lines that end in CRLF
            int lf = 0;   // number of lines that end in LF

            var stream = new FileStream(
                     f.FullName, FileMode.Open, FileAccess.Read);
            using (var br = new BinaryReader(stream))
            {
                // Slurp the entire file into memory.
                var bytes = br.ReadBytes((int)f.Length);
                for (int i = 0; i < bytes.Length; i++)
                {
                    if (bytes[i] == '\r')
                    {
                        if (i + 1 < bytes.Length &&
                            bytes[i+1] == '\n')
                        {
                            line++;
                            crlf++;
                            i++;
                        }
                        else
                        {
                            line++;
                            cr++;
                        }
                    }
                    else if (bytes[i] == '\n')
                    {
                        lf++;
                        line++;
                    }
                }
            }

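            if (line == 0)
            {
                // No line terminators at all; don't misreport the file as CR.
                Console.WriteLine("{0}, {1}", f.FullName, "None");
            }
            else if (cr == line)
            {
                Console.WriteLine("{0}, {1}", f.FullName, "CR");
            }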
            else if (crlf == line)
            {
                Console.WriteLine("{0}, {1}", f.FullName, "CRLF");
            }
            else if (lf == line)
            {
                Console.WriteLine("{0}, {1}", f.FullName, "LF");
            }
            else
            {
                Console.WriteLine("{0}, {1}, {2}, {3}, {4}",
                              f.FullName, "Mixed", cr, lf, crlf);
            }
        }
    }
}

The EnumerateFiles method recursively enumerates the contents of the directory, but skips over hidden files, hidden directories, and directories named bin or obj.
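The skip rules can also be written as a standalone predicate. This is only a sketch: the DirectoryFilter class and its IsSkippedName and ShouldSkip methods are hypothetical names, not part of the program above. It compares names with StringComparison.OrdinalIgnoreCase rather than lowercasing them first, which avoids allocating a temporary string and does not depend on the current culture.

```csharp
using System;
using System.IO;

static class DirectoryFilter
{
    // Hypothetical helper: true if a directory with this name is skipped.
    // The names "bin" and "obj" are the ones the program above skips.
    public static bool IsSkippedName(string name)
    {
        return string.Equals(name, "bin", StringComparison.OrdinalIgnoreCase)
            || string.Equals(name, "obj", StringComparison.OrdinalIgnoreCase);
    }

    // Hypothetical helper: hidden entries are always skipped;
    // directories are also skipped when their name is on the list.
    public static bool ShouldSkip(FileSystemInfo entry)
    {
        if (entry.Attributes.HasFlag(FileAttributes.Hidden))
        {
            return true;
        }

        return entry.Attributes.HasFlag(FileAttributes.Directory)
            && IsSkippedName(entry.Name);
    }
}
```

With a predicate like this, the enumeration loop would start with a single `if (DirectoryFilter.ShouldSkip(f)) continue;` before deciding whether to recurse.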

The main program takes the files enumerated by EnumerateFiles, ignores certain known binary file types, and for the remaining files, counts the number of lines and how many of them use each line terminator.
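If the list of ignored extensions grows, a set lookup may be easier to maintain than a switch. The following is a minimal sketch, assuming a hypothetical ExtensionFilter class; the four extensions are the ones from the program above, and the case-insensitive comparer stands in for the lowercasing step.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ExtensionFilter
{
    // Extensions treated as binary; the comparer makes ".PNG" match ".png".
    static readonly HashSet<string> BinaryExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            ".png", ".jpg", ".gif", ".wmv"
        };

    // Hypothetical helper: true if the path ends in a known binary extension.
    public static bool IsKnownBinary(string path)
    {
        return BinaryExtensions.Contains(Path.GetExtension(path));
    }
}
```

The same shape works for an allow list: fill the set with the text extensions you care about and skip every file whose extension is not in it.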

If the file’s lines all end the same way, then that line terminator is reported with the file name. Otherwise, the file is reported as Mixed and the number of lines of each type is reported.
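The classification rule is easier to test when it is pulled out into a function that takes the file's bytes. This is a sketch, not the program's actual structure: the LineEndings class and Classify method are hypothetical names. A file with no line terminators at all has zero lines of every type, so it needs its own case; the sketch labels it None, which is an assumed label.

```csharp
using System;

static class LineEndings
{
    // Hypothetical helper: classify the line endings in a byte buffer.
    // Returns "None", "CR", "CRLF", "LF", or "Mixed, <cr>, <lf>, <crlf>".
    public static string Classify(byte[] bytes)
    {
        int cr = 0, lf = 0, crlf = 0;

        for (int i = 0; i < bytes.Length; i++)
        {
            if (bytes[i] == (byte)'\r')
            {
                if (i + 1 < bytes.Length && bytes[i + 1] == (byte)'\n')
                {
                    crlf++;
                    i++; // consume the LF half of the CRLF pair
                }
                else
                {
                    cr++;
                }
            }
            else if (bytes[i] == (byte)'\n')
            {
                lf++;
            }
        }

        int line = cr + lf + crlf;
        if (line == 0) return "None"; // must be checked before the others
        if (cr == line) return "CR";
        if (crlf == line) return "CRLF";
        if (lf == line) return "LF";
        return string.Format("Mixed, {0}, {1}, {2}", cr, lf, crlf);
    }
}
```

The order of the checks matters: when line is zero, all three counters equal it, so whichever terminator is tested first would otherwise claim the file.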

I use this little program when chasing down line terminator inconsistencies. Maybe that’s not something you have to deal with, in which case lucky you.

12 comments

  • Henry Skoglund

    Hi, found a bug: files without any line terminators are reported as being CR terminated.
    Also, as @Michael Liu says above, you can simplify using ReadAllBytes. Using the old trustworthy Split method to simplify further could give something like this:

    // Decode the raw bytes, then count each terminator by splitting on it.
    string text = System.Text.Encoding.Default.GetString(System.IO.File.ReadAllBytes(f.FullName));
    // Array.Length is used instead of Count() so no System.Linq import is needed.
    int crlf = text.Split(new[] { "\r\n" }, StringSplitOptions.None).Length - 1;
    int cr = text.Split('\r').Length - 1 - crlf; // bare CRs only
    int lf = text.Split('\n').Length - 1 - crlf; // bare LFs only
    if (0 == crlf + cr + lf)
        Console.WriteLine("{0}, {1}", f.FullName, "None");
    else if (0 == cr + lf)
        Console.WriteLine("{0}, {1}", f.FullName, "CRLF");
    else if (0 == crlf + lf)
        Console.WriteLine("{0}, {1}", f.FullName, "CR");
    else if (0 == crlf + cr)
        Console.WriteLine("{0}, {1}", f.FullName, "LF");
    else
        Console.WriteLine("{0}, {1}, {2}, {3}, {4}",
                             f.FullName, "Mixed", cr, lf, crlf);
    
  • cheong00

    I wonder, shouldn’t a whitelist approach be used (i.e., only visit the file extensions you want to check for different line endings)?
    There are who knows how many file types on the disk that are not text files and will just produce unwanted noise.

  • Paulo Morgado

    Although very convenient to look at, a switch statement that requires the creation of a string is not the best practice. Using Equals with a StringComparison would be better.