Chinh Do

Removing Excess Whitespace from a String

9th March 2012

I was looking for the most efficient way to remove excess whitespace from a string and wrote the following benchmark. Can you guess which algorithm is faster?

using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

const int iterations = 200000;
const string expr = " Hello    world! Why    are so    many spaces?  Testing One   two three    four    five.";

// Remove excess space using Regex
// (\s matches any whitespace, so runs of tabs or newlines are collapsed too)
var doRegex = new Action(() =>
{
    for (int i = 0; i < iterations; i++)
    {
        var newStr = Regex.Replace(expr, @"\s{2,}", " ");
    }
});

// Remove excess space using Split/Join
// (splits on the space character only; leading and trailing spaces are dropped)
var doSplit = new Action(() =>
{
    for (int i = 0; i < iterations; i++)
    {
        var newStr = String.Join(" ", expr.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
    }
});

// Time an action, print the elapsed milliseconds, and return them
var benchMark = new Func<string, Action, long>((name, a) =>
{
    var sw = Stopwatch.StartNew();
    a();
    sw.Stop();
    Console.WriteLine(name + ": " + sw.ElapsedMilliseconds);
    return sw.ElapsedMilliseconds;
});

// Warming up
Console.WriteLine("Warming up.");
doRegex();
doSplit();

// Run benchmark
long regexElapsed = benchMark("Regex", doRegex);
long splitElapsed = benchMark("Split", doSplit);

On my PC, the Split method is about 7.5 times faster than Regex.
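
One caveat when comparing the two: they do not produce exactly the same output. Here is a minimal sketch to illustrate the difference. The class and method names (WhitespaceDemo, CollapseWithRegex, CollapseWithSplit) are made up for this example and are not part of the benchmark above; the method bodies reuse the same two expressions.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class WhitespaceDemo
{
    // Same expression as the Regex loop: runs of 2+ whitespace chars become one space.
    public static string CollapseWithRegex(string s)
    {
        return Regex.Replace(s, @"\s{2,}", " ");
    }

    // Same expression as the Split loop: split on ' ', drop empties, rejoin.
    public static string CollapseWithSplit(string s)
    {
        return String.Join(" ", s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
    }

    public static void Main()
    {
        const string padded = " Hello    world!  ";
        // Regex keeps one leading and one trailing space.
        Console.WriteLine("[" + CollapseWithRegex(padded) + "]"); // [ Hello world! ]
        // Split/Join trims both ends as a side effect.
        Console.WriteLine("[" + CollapseWithSplit(padded) + "]"); // [Hello world!]

        const string tabbed = "a\t\tb";
        // Regex treats tabs as whitespace; Split on ' ' leaves them untouched.
        Console.WriteLine(CollapseWithRegex(tabbed) == "a b");     // True
        Console.WriteLine(CollapseWithSplit(tabbed) == "a\t\tb");  // True
    }
}
```

So if the input can contain tabs or newlines, or if leading and trailing spaces matter, the faster method is not necessarily a drop-in replacement for the other.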

This entry was posted on Friday, March 9th, 2012 at 9:09 pm and is filed under Dotnet/.NET - C#, Programming.

There are currently 2 responses to “Removing Excess Whitespace from a String”

  1. On March 17th, 2012, Minh Le said:

    I could not understand your code at line 22. In my opinion it should be something like this:

    var benchMark = new Func((name, a) => {…});

    Could you please explain yours?

    Also, about the result: mine is not as impressive as yours. I got 2098 ms for Regex and 1196 ms for the Split method. What do you think affects the result?

    I have a Core i7-2720QM running at 2.2 GHz and 8 GB of RAM.

    Thanks.

  2. On March 17th, 2012, Chinh Do said:

    Hi Minh: Good catch on line 22… I think that was some type of copy/paste error. I have fixed the article.

    Your i7-2720QM CPU is only maybe 25% slower than my i7-2600K at 3.4 GHz. I was actually running the code inside a VMware virtual machine, but I guess that doesn’t slow things down much. Maybe you had other things running on your PC when you ran the code?

    Chinh
