Removing Excess Whitespace from a String

I was looking for the most efficient way to remove excess whitespace from a string, so I wrote the following benchmark. Can you guess which algorithm is faster?

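// Namespaces needed by the snippet below
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

const int iterations = 200000;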
const string expr = " Hello    world! Why    are so    many spaces?  Testing One   two three    four    five.";

// Remove excess whitespace using Regex
// (\s matches tabs and newlines too, not only spaces)
var doRegex = new Action(() =>
{
    for (int i = 0; i < iterations; i++)
    {
        var newStr = Regex.Replace(expr, @"\s{2,}", " ");
    }
});


// Remove excess spaces using Split/Join
// (splits on ' ' only, and also drops leading and trailing spaces)
var doSplit = new Action(() =>
{
    for (int i = 0; i < iterations; i++)
    {
        var newStr = String.Join(" ", expr.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));
    }
});

// Time an action, print the elapsed milliseconds, and return them
var benchMark = new Func<string, Action, long>((name, a) =>
{
    var sw = Stopwatch.StartNew();
    a();
    sw.Stop();
    Console.WriteLine(name + ": " + sw.ElapsedMilliseconds);
    return sw.ElapsedMilliseconds;
});

// Warming up
Console.WriteLine("Warming up.");
doRegex();
doSplit();

// Run benchmark
long regexElapsed = benchMark("Regex", doRegex);
long splitElapsed = benchMark("Split", doSplit);

On my PC, the Split method is about 7.5 times faster than Regex.
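
One thing to keep in mind when comparing the timings: the two variants do not return exactly the same string. The sketch below is only an illustration of that difference. The helper names (CollapseWithRegex, CollapseWithSplit, CollapseAnyWhitespace) and the sample string are made up for this example, and the third helper is an alternative that was not part of the benchmark above. It assumes a C# version that supports top-level statements.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample: one leading space, a run of spaces, two tabs, one trailing space.
const string sample = " Hello    world!\t\tTabs  here ";

// Same pattern as the benchmark: only runs of two or more whitespace characters
// are replaced, so a single leading or trailing space is left alone.
static string CollapseWithRegex(string s) =>
    Regex.Replace(s, @"\s{2,}", " ");

// Same call as the benchmark: splits on ' ' only, so tabs are kept,
// while leading and trailing spaces disappear.
static string CollapseWithSplit(string s) =>
    string.Join(" ", s.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));

// Alternative (not benchmarked in the article): an empty separator array
// makes Split treat every whitespace character as a delimiter.
static string CollapseAnyWhitespace(string s) =>
    string.Join(" ", s.Split(Array.Empty<char>(), StringSplitOptions.RemoveEmptyEntries));

// Brackets make leading and trailing spaces visible in the console.
Console.WriteLine("[" + CollapseWithRegex(sample) + "]");     // [ Hello world! Tabs here ]
Console.WriteLine("[" + CollapseWithSplit(sample) + "]");     // tabs between "world!" and "Tabs" survive
Console.WriteLine("[" + CollapseAnyWhitespace(sample) + "]"); // [Hello world! Tabs here]
```

If the input can contain tabs or newlines, or if leading and trailing spaces matter, it is worth picking the variant whose output you actually want before looking at the speed.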

4 Replies to “Removing Excess Whitespace from a String”

  1. I could not understand your code at line 22. In my opinion, it should be something like this:

    var benchMark = new Func((name, a) => {…});

    Could you please explain yours?

    Also, about the result: mine is not as impressive as yours. I got 2098 for the regex method and 1196 for the split method. What do you think affects the result?

    I have a Core i7-2720QM running at 2.2 GHz with 8 GB of RAM.

    Thanks.

  2. Hi Minh: Good catch on line 22… I think that was some type of copy/paste error. I have fixed the article.

    Your i7-2720QM CPU is only maybe 25% slower than my i7-2600K at 3.4 GHz. I was actually running the code inside a VMware virtual machine, but I guess that doesn’t slow things down much. Maybe you had some other things running on your PC when you ran the code?

    Chinh

  3. So many compliments, and so nicely put; I was not expecting that many. But are you sure this sentence is correct: “Reading Dominique is proof that one can spit, rail, vilify, revile, and drag through the mud without ever giving in to gesticulation.” The implied subject of “reading” must be the same as that of “one can…”, since nothing specifies who the pronoun “one” refers to.
