Detecting Blank Images with C#

Greetings, visitor from the year 2020! You can get the latest optimized working source code for this, including a version that does not use unsafe code, from my GitHub repo here. Thanks for visiting.

Recently I needed a way to find blank images among a large batch of images. I had tens of thousands of images to work with, so I came up with this C# function to tell me whether an image is blank.

The basic idea behind this function is that blank images will have highly uniform pixel values throughout the whole image. To measure the degree of uniformity (or variability), the function calculates the standard deviation of all pixel values. An image is determined to be blank if the standard deviation falls below a certain threshold.
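As a minimal sketch of that idea on plain numbers (no image loading), the snippet below computes the population standard deviation of a list of pixel values and compares it with a threshold. The names UniformitySketch, StdDev and LooksBlank are made up for this illustration, and the threshold is whatever the caller chooses.

```csharp
using System;
using System.Collections.Generic;

public static class UniformitySketch
{
    // Population standard deviation of a sequence of pixel values.
    public static double StdDev(IReadOnlyList<int> values)
    {
        if (values.Count == 0) return 0;

        double sum = 0;
        foreach (int v in values) sum += v;
        double mean = sum / values.Count;

        double squaredDiffs = 0;
        foreach (int v in values) squaredDiffs += (v - mean) * (v - mean);

        return Math.Sqrt(squaredDiffs / values.Count);
    }

    // Hypothetical helper: "blank" means the spread is below the caller's threshold.
    public static bool LooksBlank(IReadOnlyList<int> values, double threshold)
    {
        return StdDev(values) < threshold;
    }
}
```

A perfectly uniform list such as { 200, 200, 200, 200 } has a standard deviation of 0, while a list that alternates between 0 and 255 has a standard deviation of 127.5.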

Here’s the code. To compile it, the project that contains this code must have “Allow Unsafe Code” checked. The code uses types from the System.Drawing and System.Drawing.Imaging namespaces.

public static bool IsBlank(string imageFileName)
{
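    double stdDev = GetStdDev(imageFileName);
    // Tune this threshold on your own images. Each pixel value is red + green + blue
    // (0 to 765), so the result stays well under 1000 and a threshold of 100000
    // treats every image as blank.
    return stdDev < 100000;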
}

/// <summary>
/// Get the standard deviation of pixel values.
/// </summary>
/// <param name="imageFileName">Name of the image file.</param>
/// <returns>Standard deviation.</returns>
public static double GetStdDev(string imageFileName)
{
    double sum = 0, sumOfSquares = 0;
    long count = 0;

    using (Bitmap b = new Bitmap(imageFileName))
    {
        // Lock the pixels as 24bpp so each pixel is exactly 3 bytes (blue, green, red).
        // GDI+ converts from the file's own pixel format where it can.
        BitmapData bmData = b.LockBits(new Rectangle(0, 0, b.Width, b.Height), ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);
        try
        {
            unsafe
            {
                byte* p = (byte*)(void*)bmData.Scan0;
                // Rows are padded to a 4-byte boundary; skip the padding after each row.
                int nOffset = bmData.Stride - b.Width * 3;
                for (int y = 0; y < b.Height; ++y)
                {
                    for (int x = 0; x < b.Width; ++x)
                    {
                        int pixelValue = p[0] + p[1] + p[2]; // blue + green + red
                        sum += pixelValue;
                        sumOfSquares += (double)pixelValue * pixelValue;
                        count++;

                        p += 3;
                    }
                    p += nOffset;
                }
            }
        }
        finally
        {
            b.UnlockBits(bmData);
        }
    }

    // Population standard deviation: sqrt(mean of squares - square of mean).
    double mean = sum / count;
    double variance = Math.Max(0.0, sumOfSquares / count - mean * mean);
    return Math.Sqrt(variance);
}
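
To run a check like this over a whole folder, something along these lines may work. It is only a sketch: BlankImageScan and FindBlankImages are made-up names, and the blank test is passed in as a delegate so that IsBlank (or any other check) can be plugged in. Files that cause the check to throw are reported and skipped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BlankImageScan
{
    // Returns the paths of all files in 'folder' matching 'searchPattern'
    // for which the supplied isBlank check returns true.
    public static List<string> FindBlankImages(string folder, string searchPattern, Func<string, bool> isBlank)
    {
        var blanks = new List<string>();
        foreach (string path in Directory.EnumerateFiles(folder, searchPattern))
        {
            try
            {
                if (isBlank(path)) blanks.Add(path);
            }
            catch (Exception ex)
            {
                // A file that cannot be checked is reported and skipped.
                Console.Error.WriteLine("Skipping " + path + ": " + ex.Message);
            }
        }
        return blanks;
    }
}
```

A call might look like BlankImageScan.FindBlankImages(folder, "*.jpg", IsBlank), where folder is the directory holding the images and IsBlank is the function above.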

Chinh Do

Comments

  • Geetesh: My guess is that some of the scanned images have scanning artifacts in them that cause the code to think they are not blank. You can try to increase the standard deviation threshold. Change the 100000 number to something bigger.

  • Seems on certain images I get the pointer error even with Timex code.

    I've added try catch with continue inside:
    try
    {
        count++;

        byte blue = p[0];
        byte green = p[1];
        byte red = p[2];

        int pixelValue = Color.FromArgb(0, red, green, blue).ToArgb();
        total += pixelValue;
        double avg = total / count;
        totalVariance += Math.Pow(pixelValue - avg, 2);
        stdDev = Math.Sqrt(totalVariance / count);
        p += bytesPerPixel;
    }
    catch
    {
        continue;
    }

    This seems to have fixed the issue, but I still don't understand why some images throw an error.

    Times: What do you mean by changing the hex to dec? All functions rely on a byte, not a decimal value. Can you repost the decimal version?

    Thanks

  • There is an error when working with BMPs that have fewer than 8 bits per pixel.

    i.e.:
    byte bytesPerPixel = (byte)(bitsPerPixel / 8);

    If bitsPerPixel < 8, then bytesPerPixel = 0

    and p += bytesPerPixel; is p += 0;

    I tried a workaround by doing p++ every 8 loops, but that didn't work (for a 1 bpp BMP).
    I'm a programming n00b. Any ideas?

  • thanks for your reply chinh do.
    I already did that work around actually..

    I'll keep my eye on this page in case anyone else has a more efficient solution.
    :)

  • I'm trying to convert this code to run under VB.net. I'm having a problem with the following line. byte* p = (byte*)(void*)Scan0;

    Has anyone converted this code to VB?

    Do I have to use the "Pointer" method or can the code be written without using a Pointer?

    Thank you for your Help!
    Phil

  • Hi Chinh

    need your suggestion as to how to put the code that you have given into a project?
    I want to do something similar, and scan through a folder, containing thousands of images, and get as output the filename of a blank image.

  • Hi,

    I have a scanned document with multiple pages, and every alternate page is blank.
    When I convert each page to an image and run the above code, IsBlank still returns "false".

    How do I overcome this issue?

    Is the tolerance the same for all .tiff files, or does it vary for each page?

    For a given multi-page scanned document, how do I find the tolerance value?

    Any help is highly appreciated.

  • I use this algorithm in some of my programs, but with a few changes. One of the changes I did to the algorithm was on this line:

    int pixelValue = Color.FromArgb(0, red, green, blue).ToArgb();

    This is not very wise, because the ToArgb() method returns a 32-bit integer in the form AARRGGBB, so differences among the pixels on the red channel count more toward the calculated standard deviation than differences on the green channel, and far more than differences on the blue channel. This makes it difficult to set a threshold for the standard deviation, because the algorithm may return quite different values for images that are near the blank/not-blank point.

    A more meaningful value to the pixelValue variable is:

    int pixelValue = (red + green + blue) / 3;

    or

    double pixelValue = (red + green + blue) / 3.0;

    This way, differences in the 3 channels carry the same weight in the result. The algorithm's output also becomes much easier to interpret, because pixel values now range from 0 to 255. Setting a threshold with this modification has proven much easier. The algorithm usually returns a value close to 1 on blank images and above 8 on non-blank images. I am currently using 2 as the threshold, with a near perfect accuracy rate.

    I know this means we're interpreting the image as if it were in grayscale, but in my experience it makes no difference.
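
The weighting issue raised in the last comment can be checked with a small sketch. It assumes only that ToArgb() with alpha 0 yields the bit layout 0x00RRGGBB, which is reproduced here with shifts so the snippet does not depend on System.Drawing. PixelValueSketch, Packed and ChannelAverage are made-up names for this illustration.

```csharp
using System;

public static class PixelValueSketch
{
    // Same bit layout as Color.FromArgb(0, red, green, blue).ToArgb(): 0x00RRGGBB.
    public static int Packed(byte red, byte green, byte blue)
    {
        return (red << 16) | (green << 8) | blue;
    }

    // Equal-weight alternative: the mean of the three channels, from 0 to 255.
    public static double ChannelAverage(byte red, byte green, byte blue)
    {
        return (red + green + blue) / 3.0;
    }
}
```

With the packed value, raising red by one step moves the pixel value by 65536, while raising blue by one step moves it by 1. With the channel average, both changes move the value by the same amount (one third).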
