Detecting Blank Images with C#

Recently I needed a way to find blank images among a large batch of images. I had tens of thousands of images to work with, so I came up with this C# function to tell me whether an image is blank.

The basic idea behind this function is that blank images will have highly uniform pixel values throughout the whole image. To measure the degree of uniformity (or variability), the function calculates the standard deviation of all pixel values. An image is determined to be blank if the standard deviation falls below a certain threshold.

Here’s the code. To compile it, the project in which this code resides must have “Allow Unsafe Code” checked, and the file needs using directives for System, System.Drawing, and System.Drawing.Imaging.

public static bool IsBlank(string imageFileName)
{
    double stdDev = GetStdDev(imageFileName);
    return stdDev < 100000;
}
 
/// <summary>
/// Get the standard deviation of pixel values.
/// </summary>
/// <param name="imageFileName">Name of the image file.</param>
/// <returns>Standard deviation.</returns>
public static double GetStdDev(string imageFileName)
{
    double total = 0, totalVariance = 0;
    int count = 0;
    double stdDev = 0;
 
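    // First get all the bytes.
    // Note: the pointer arithmetic below assumes a 24bpp image (3 bytes per pixel: blue, green, red).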
    using (Bitmap b = new Bitmap(imageFileName))
    {
        BitmapData bmData = b.LockBits(new Rectangle(0, 0, b.Width, b.Height), ImageLockMode.ReadOnly, b.PixelFormat);
        int stride = bmData.Stride;
        IntPtr Scan0 = bmData.Scan0;
        unsafe
        {
            byte* p = (byte*)(void*)Scan0;
            int nOffset = stride - b.Width * 3;
            for (int y = 0; y < b.Height; ++y)
            {
                for (int x = 0; x < b.Width; ++x)
                {
                    count++;
 
                    byte blue = p[0];                            
                    byte green = p[1];
                    byte red = p[2];
 
                    int pixelValue = Color.FromArgb(0, red, green, blue).ToArgb();
                    total += pixelValue;
                    double avg = total / count;
                    totalVariance += Math.Pow(pixelValue - avg, 2);
                    stdDev = Math.Sqrt(totalVariance / count);
 
                    p += 3;
                }
                p += nOffset;
            }
        }
 
        b.UnlockBits(bmData);
    }
 
    return stdDev;
}

42 Replies to “Detecting Blank Images with C#”

  1. The above code gives the error “Attempted to read or write protected memory. This is often an indication that other memory is corrupt.” at the line byte blue = p[0];. Please advise how to resolve it.

  2. Thank you very much for sharing this function. But it only works for 24-bit images. When a 1-bit bitmap is checked with this function, it throws the exception the 4th floor mentioned (“Attempted to read or write protected memory. This is often an indication that other memory is corrupt.” at the line byte blue = p[0];).

  3. Vitro: You mean the idea behind the algorithm? The general idea is that images are made up of pixels of different values. Blank images would then have pixels that have very similar values. Real images would have pixel values that are spread out all over the spectrum. For example, in a blank/white image, all pixels would have the value #FFFFFF.

    In statistics, Standard Deviation is used to measure the variability or dispersion of a population… so that’s what is used to calculate the similarity or dispersion value of each image. Hope this helps.

  4. What is the reason you chose 3 as your constant pointer advance interval? I believe virtro was on to something when he mentioned it worked ok for 24-bit images.

    In my case, I am actually trying to use this algorithm to analyze a bunch of DB BLOBs of diagram images, to see if they are blank or not. For me, they happen to be 8-bit JPEGs. So I had the same error “Attempted to read or write protected memory. This is often an indication that other memory is corrupt.”

    So, it looks like you need to [1] smartly use the LCD for the current bitmap format (note that they can be larger than 32bpp, even up to 256bpp) for your p interval or [2] just use a constant 2 since all formats will be multiples of 2.

    HTH

  5. CORRECTION:

    I goofed. I overlooked that there actually is a 1bpp format, as I believe virtro mentioned. In that case, you will have to advance p by 1.

    TIP: You can interrogate the pixel format via BitmapData.PixelFormat or Bitmap.PixelFormat. So, I thought of adding some code to choose an LCD based on the actual pixel format for the given image.

    Cheers
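
    A rough sketch of why a fixed 3-byte step fails on other formats, assuming the usual GDI+ layout in which each scan line is padded to a multiple of 4 bytes. The names BitmapLayout, RowStride, and BufferSize are made up for illustration; in real code, BitmapData.Stride already reports the row size.

    ```csharp
    using System;

    public static class BitmapLayout
    {
        // Bytes in one scan line: the packed pixel bits, rounded up to a multiple of 4 bytes.
        public static int RowStride(int width, int bitsPerPixel)
        {
            return ((width * bitsPerPixel + 31) / 32) * 4;
        }

        // Total bytes in the pixel buffer for an image of the given size and bit depth.
        public static int BufferSize(int width, int height, int bitsPerPixel)
        {
            return RowStride(width, bitsPerPixel) * height;
        }
    }
    ```

    For a 100 x 100 image, RowStride gives 300 bytes per row at 24bpp, 100 at 8bpp, and 16 at 1bpp. The loop in the post reads 300 bytes per row regardless of format, so on a 1bpp image it runs past the end of a 1,600-byte buffer, which is consistent with the “protected memory” error reported above.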

  6. Here is what I came up with:

    [code]
    /// <summary>
    /// Gets whether or not a given Bitmap is blank.
    /// </summary>
    /// <param name="bitmap">The instance of the Bitmap for this method extension.</param>
    /// <returns>Returns true if the given Bitmap is blank; otherwise returns false.</returns>
    public static bool IsBlank(this Bitmap bitmap) {
    double stdDev = GetStdDev(bitmap);
    int tolerance = 100000;
    return stdDev < tolerance;
    }

    /// <summary>
    /// Gets the bits per pixel (bpp) for the given Bitmap.
    /// </summary>
    /// <param name="bitmap">The instance of the Bitmap for this method extension.</param>
    /// <returns>Returns a byte representing the bpp for the Bitmap.</returns>
    internal static byte GetBitsPerPixel(this Bitmap bitmap) {
    byte bpp = 0x1;

    //return Regex.Match(Regex.Match(bitmap.PixelFormat.ToString(), @"\dbpp").Value, @"\d+").Value;
    switch (bitmap.PixelFormat) {
    case PixelFormat.Format1bppIndexed:
    bpp = 0x1;
    break;
    case PixelFormat.Format4bppIndexed:
    bpp = 0x4;
    break;
    case PixelFormat.Format8bppIndexed:
    bpp = 0x8;
    break;
    case PixelFormat.Format16bppArgb1555:
    case PixelFormat.Format16bppGrayScale:
    case PixelFormat.Format16bppRgb555:
    case PixelFormat.Format16bppRgb565:
    bpp = 0x16;
    break;
    case PixelFormat.Format24bppRgb:
    bpp = 0x24;
    break;
    case PixelFormat.Canonical:
    case PixelFormat.Format32bppArgb:
    case PixelFormat.Format32bppPArgb:
    case PixelFormat.Format32bppRgb:
    bpp = 0x32;
    break;
    case PixelFormat.Format48bppRgb:
    bpp = 0x48;
    break;
    case PixelFormat.Format64bppArgb:
    case PixelFormat.Format64bppPArgb:
    bpp = 0x64;
    break;
    }
    return bpp;
    }

    /// <summary>
    /// Get the standard deviation of pixel values.
    /// </summary>
    /// <param name="bitmap">The instance of the Bitmap for this method extension.</param>
    /// <returns>Returns the standard deviation of pixel population of the Bitmap.</returns>
    public static double GetStdDev(this Bitmap bitmap) {
    double total = 0;
    double totalVariance = 0;
    int count = 0;
    double stdDev = 0;

    // First get all the bytes
    BitmapData bmData = bitmap.LockBits(new Rectangle(0, 0, bitmap.Width, bitmap.Height), ImageLockMode.ReadOnly, bitmap.PixelFormat);
    int stride = bmData.Stride;
    IntPtr Scan0 = bmData.Scan0;

    byte bitsPerPixel = GetBitsPerPixel(bitmap);
    byte bytesPerPixel = (byte)(bitsPerPixel / 8);

    unsafe {
    byte* p = (byte*)(void*)Scan0;
    int nOffset = stride - bitmap.Width * bytesPerPixel;
    for (int y = 0; y < bitmap.Height; ++y) {
    for (int x = 0; x < bitmap.Width; ++x) {
    count++;

    byte blue = p[0];
    byte green = p[1];
    byte red = p[2];

    int pixelValue = Color.FromArgb(0, red, green, blue).ToArgb();
    total += pixelValue;
    double avg = total / count;
    totalVariance += Math.Pow(pixelValue - avg, 2);
    stdDev = Math.Sqrt(totalVariance / count);
    p += bytesPerPixel;
    }
    p += nOffset;
    }
    }
    bitmap.UnlockBits(bmData);

    return stdDev;
    }
    [/code]

    Note: I wrote this in C# 3.0, so these are extension methods, that way they appear as helper methods for the Bitmap type, like so:

    [code]
    using (Bitmap bitmap = new Bitmap(@"someimage.bmp")) {
    byte bpp = bitmap.GetBitsPerPixel();
    double stddev = bitmap.GetStdDev();
    bool isBlank = bitmap.IsBlank();
    }
    [/code]

    In any event, I believe this should work for non-indexed bitmaps at least. I’m not sure about true indexed bitmaps.

    Try it out and see if it works.

    Chinh Do: You still get most of the credit, though! 😀 Much appreciated.

  7. Oops, argh! I should have proofed before posting. The switch case statements should have decimal values, not hex. Or, convert them to the correct hex value (e.g. 24 = 0x18 etc.). Apologies.
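
    To make the correction concrete, here is a small sketch of what the hex literals do to the pointer step. HexVersusDecimal and BytesPerPixel are hypothetical names that only mirror the bitsPerPixel / 8 division from the code above.

    ```csharp
    using System;

    public static class HexVersusDecimal
    {
        // Whole bytes per pixel for a bit depth, using the same integer division as the
        // code above. Depths below 8 come out as 0.
        public static int BytesPerPixel(int bitsPerPixel)
        {
            return bitsPerPixel / 8;
        }

        // The value the code above assigns for a 24bpp image (hex literal, 36 in decimal).
        public const int AsPosted = 0x24;

        // The intended value, written in decimal and in its actual hex form.
        public const int Intended = 24;
        public const int IntendedHex = 0x18;
    }
    ```

    BytesPerPixel(HexVersusDecimal.AsPosted) is 4, so the loop would step 4 bytes per pixel through a 3-byte-per-pixel image, while BytesPerPixel(HexVersusDecimal.Intended) is 3. As far as I know, System.Drawing also exposes Image.GetPixelFormatSize(bitmap.PixelFormat), which returns the bit depth directly and would avoid the hand-written switch.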

  8. Chinh Do, no, thank you! I didn’t feel like digging in to try to figure out how to do that kind of thing. You did all the grunt work!

  9. I believe you have an error in calculating your variance.
    You can compute the variance only after you know your average.
    You should go over the pixels again after you know the average and compute the squared differences.
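
    A minimal sketch of the two-pass calculation described here, written against a plain array of 0 to 255 values rather than a Bitmap so that it stands alone. TwoPassStats and StdDev are illustrative names, not part of the original post.

    ```csharp
    using System;

    public static class TwoPassStats
    {
        // Population standard deviation: the first pass computes the mean,
        // the second pass sums the squared differences from that mean.
        public static double StdDev(byte[] values)
        {
            if (values == null || values.Length == 0) return 0.0;

            double sum = 0;
            foreach (byte v in values) sum += v;
            double mean = sum / values.Length;

            double squaredDiffs = 0;
            foreach (byte v in values)
            {
                double d = v - mean;
                squaredDiffs += d * d;
            }
            return Math.Sqrt(squaredDiffs / values.Length);
        }
    }
    ```

    A perfectly uniform array gives 0, and an array alternating between 0 and 255 gives 127.5, the largest value possible for 8-bit data.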

  10. Roshan: I am not aware of something like this in C++, but that doesn’t mean that it doesn’t exist. I am sure you can translate the code to C++. Any C++ expert out there want to help us out?

    Steve:

    Thanks for your note and I think you are right. My algorithm does not produce a standard deviation number in the textbook definition. I think I used this modified “running” standard deviation algorithm to allow for this optimization: once the “running” standard deviation exceeds a certain threshold, I can short circuit the process and exit the loop. I do remember using this optimization but I guess I took it out at the end to keep the published code simple.

    Chinh
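
    One way to keep the early exit while still ending up with a textbook standard deviation is Welford’s online algorithm. This is a sketch of that substitute technique, not the code I originally used; RunningStats and ExceedsThreshold are hypothetical names. It relies on the running sum of squared differences never decreasing, so once that sum passes threshold² times the total pixel count, the final result is guaranteed to be above the threshold.

    ```csharp
    using System;

    public static class RunningStats
    {
        // Returns true as soon as the final population standard deviation is guaranteed
        // to exceed the threshold. valuesRead reports how many values were examined.
        public static bool ExceedsThreshold(byte[] values, double threshold, out int valuesRead)
        {
            valuesRead = 0;
            if (values == null || values.Length == 0) return false;

            double mean = 0;   // running mean of the values seen so far
            double m2 = 0;     // running sum of squared differences from the mean
            double limit = threshold * threshold * values.Length;

            for (int i = 0; i < values.Length; i++)
            {
                double x = values[i];
                double delta = x - mean;
                mean += delta / (i + 1);
                m2 += delta * (x - mean);   // Welford update; each increment is >= 0
                valuesRead = i + 1;
                if (m2 > limit) return true; // short circuit: the rest cannot lower m2
            }
            return false;
        }
    }
    ```

    With a threshold of 2, an array alternating between 0 and 255 is rejected after only two values, while a uniform array is read to the end and reported as not exceeding the threshold.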

  11. Thanks for such a nice piece of code.

    I am new to C# and need your help, Chinh Do or Timex.
    Some of the scanned images (pages that are blank on both sides) get tested as blank, while others are reported as not blank even though they are blank.
    Can you please help with it?
    If possible, can you please give me some explanation of the code that you or Timex have given?

  12. Geetesh: My guess is that some of the scanned images have scanning artifacts in them that cause the code to think they are not blank. You can try increasing the standard deviation threshold. Change the 100000 number to something bigger.

  13. It seems that on certain images I get the pointer error even with Timex’s code.

    I’ve added try catch with continue inside:
    try
    {
    count++;

    byte blue = p[0];
    byte green = p[1];
    byte red = p[2];

    int pixelValue = Color.FromArgb(0, red, green, blue).ToArgb();
    total += pixelValue;
    double avg = total / count;
    totalVariance += Math.Pow(pixelValue - avg, 2);
    stdDev = Math.Sqrt(totalVariance / count);
    p += bytesPerPixel;
    }
    catch
    {
    continue;
    }

    This seems to have fixed the issue, but I still don’t understand why some images throw an error.

    Timex: What do you mean by changing the hex to dec? All the functions rely on a byte, not a decimal value. Can you repost the decimal version?

    Thanks

  14. There is an error when working with BMPs that have fewer than 8 bits per pixel.

    i.e.:
    byte bytesPerPixel = (byte)(bitsPerPixel / 8);

    If bitsPerPixel < 8, then bytesPerPixel = 0

    and p += bytesPerPixel; becomes p += 0;

    I tried a workaround by doing p++ every 8 loops, but that didn’t work (for a 1bpp BMP).
    I’m a programming n00b. Any ideas?
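
    For 1bpp data the individual bits have to be unpacked, because eight pixels share each byte; stepping the pointer every 8 loops presumably fails because the loop body still reads three whole bytes as blue, green, and red. A sketch under the assumption that the leftmost pixel sits in the highest bit of each byte (the layout GDI+ uses for Format1bppIndexed, as far as I know). OneBitRows and its methods are illustrative names.

    ```csharp
    using System;

    public static class OneBitRows
    {
        // Reads pixel x from one packed 1bpp scan line.
        // Byte x / 8 holds the pixel; bit 7 of that byte is the leftmost of its 8 pixels.
        public static int GetPixelBit(byte[] row, int x)
        {
            return (row[x >> 3] >> (7 - (x & 7))) & 1;
        }

        // Counts how many of the first `width` pixels in the row have their bit set.
        public static int CountSetPixels(byte[] row, int width)
        {
            int count = 0;
            for (int x = 0; x < width; x++)
            {
                count += GetPixelBit(row, x);
            }
            return count;
        }
    }
    ```

    Whether a set bit means white or black depends on the image’s palette, so a blank check would compare the count of set pixels against both 0 and the total pixel count rather than assume one of them.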

  15. Thanks for your reply, Chinh Do.
    I already did that workaround, actually.

    I’ll keep my eye on this page in case anyone else has a more efficient solution.
    🙂

  16. I’m trying to convert this code to run under VB.net. I’m having a problem with the following line. byte* p = (byte*)(void*)Scan0;

    Has anyone converted this code to VB?

    Do I have to use the “Pointer” method or can the code be written without using a Pointer?

    Thank you for your Help!
    Phil

  17. Hi Chinh

    I need your suggestion on how to put the code that you have given into a project.
    I want to do something similar, and scan through a folder, containing thousands of images, and get as output the filename of a blank image.
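
    A possible starting point for the folder scan, assuming the blank test is passed in as a delegate (for example the IsBlank(string) method from the post). BlankImageScanner and FindBlankImages are hypothetical names; files that cannot be read are reported on the error stream and skipped.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.IO;

    public static class BlankImageScanner
    {
        // Returns the paths of all files in `folder` matching `searchPattern`
        // for which the supplied isBlank test returns true.
        public static List<string> FindBlankImages(string folder, string searchPattern, Func<string, bool> isBlank)
        {
            var blanks = new List<string>();
            foreach (string path in Directory.EnumerateFiles(folder, searchPattern))
            {
                try
                {
                    if (isBlank(path)) blanks.Add(path);
                }
                catch (Exception ex)
                {
                    // A corrupt or unsupported file should not stop the whole batch.
                    Console.Error.WriteLine("Skipped " + path + ": " + ex.Message);
                }
            }
            return blanks;
        }
    }
    ```

    In a console project, a call such as FindBlankImages(@"C:\scans", "*.jpg", IsBlank) followed by a loop that writes each returned path with Console.WriteLine would list the blank files; the folder path here is only an example.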

  18. Hi,

    I have a scanned document with multiple pages, and every alternate page is blank.
    When I convert each page to an image and run the above code, IsBlank still returns false.

    How do I overcome this issue?

    Is the tolerance the same for all .tiff files, or does it vary for each page?

    For a given multi-page scanned document, how do I find the tolerance value?

    Any help is highly appreciated.

  19. I use this algorithm in some of my programs, but with a few changes. One of the changes I made to the algorithm was on this line:

    int pixelValue = Color.FromArgb(0, red, green, blue).ToArgb();

    This is not very wise, because since the ToArgb() method will return a 32-bit integer in the form AARRGGBB, differences among the pixels on the red channel will account more on the calculated standard deviation than differences on the green channel, and even more than on the blue channel. This leads to some difficulty in setting a threshold for the standard deviation value of the images, because the algorithm may return quite different values for images which are near the blank/not-blank point.

    A more meaningful value to the pixelValue variable is:

    int pixelValue = (red + green + blue) / 3;

    or

    double pixelValue = (red + green + blue) / 3.0;

    This way, differences in the 3 channels are accounted for with the same weight in the result. Also, the algorithm’s output becomes much more comprehensible: the pixel values range from 0 to 255, so the standard deviation can be at most 127.5. Setting a threshold with this modification to the algorithm has proven much easier. The algorithm usually returns a value close to 1 on blank images and above 8 on non-blank images. I am currently using 2 for the threshold, with a near perfect accuracy rate.

    I know this means we’re interpreting the image as if it were in grayscale, but in my experience it makes no difference.
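
    A sketch of this channel-averaged variant, written against a raw pixel buffer so it has no pointer code. It assumes the rows are `stride` bytes long with pixels stored blue, green, red (as in the original post), for example bytes copied out of a locked bitmap with Marshal.Copy. GrayStdDev and Compute are illustrative names.

    ```csharp
    using System;

    public static class GrayStdDev
    {
        // Average of the three colour channels of the pixel starting at index i.
        private static double Gray(byte[] buffer, int i)
        {
            return (buffer[i] + buffer[i + 1] + buffer[i + 2]) / 3.0;
        }

        // Two-pass population standard deviation of the per-pixel channel average.
        // Padding bytes at the end of each row are never read.
        public static double Compute(byte[] buffer, int width, int height, int stride, int bytesPerPixel)
        {
            if (bytesPerPixel < 3) throw new ArgumentException("Expected at least 3 bytes per pixel.");
            long count = (long)width * height;
            if (count == 0) return 0.0;

            double sum = 0;
            for (int y = 0; y < height; y++)
                for (int x = 0; x < width; x++)
                    sum += Gray(buffer, y * stride + x * bytesPerPixel);
            double mean = sum / count;

            double squaredDiffs = 0;
            for (int y = 0; y < height; y++)
                for (int x = 0; x < width; x++)
                {
                    double d = Gray(buffer, y * stride + x * bytesPerPixel) - mean;
                    squaredDiffs += d * d;
                }
            return Math.Sqrt(squaredDiffs / count);
        }
    }
    ```

    The result can then be compared against a small threshold such as the 2 mentioned above.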

  20. Hi Leonardo: That makes a lot of sense. Thanks for sharing the info. I guess we can even calculate Standard Deviation for each color channel individually and make sure each of them is below the threshold.
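
    The per-channel idea could look roughly like the sketch below. It assumes a raw buffer whose rows are `stride` bytes long with pixels stored blue, green, red; ChannelStdDev, Compute, and IsBlank are hypothetical names, and the threshold is whatever value works for your images.

    ```csharp
    using System;

    public static class ChannelStdDev
    {
        // Returns the population standard deviation of each channel, in B, G, R order.
        public static double[] Compute(byte[] buffer, int width, int height, int stride, int bytesPerPixel)
        {
            if (bytesPerPixel < 3) throw new ArgumentException("Expected at least 3 bytes per pixel.");
            var result = new double[3];
            long count = (long)width * height;
            if (count == 0) return result;

            // First pass: mean of each channel.
            var mean = new double[3];
            for (int y = 0; y < height; y++)
                for (int x = 0; x < width; x++)
                    for (int c = 0; c < 3; c++)
                        mean[c] += buffer[y * stride + x * bytesPerPixel + c];
            for (int c = 0; c < 3; c++) mean[c] /= count;

            // Second pass: squared differences from each channel's mean.
            for (int y = 0; y < height; y++)
                for (int x = 0; x < width; x++)
                    for (int c = 0; c < 3; c++)
                    {
                        double d = buffer[y * stride + x * bytesPerPixel + c] - mean[c];
                        result[c] += d * d;
                    }
            for (int c = 0; c < 3; c++) result[c] = Math.Sqrt(result[c] / count);
            return result;
        }

        // Blank only if every channel stays below the threshold.
        public static bool IsBlank(byte[] buffer, int width, int height, int stride, int bytesPerPixel, double threshold)
        {
            foreach (double s in Compute(buffer, width, height, stride, bytesPerPixel))
                if (s >= threshold) return false;
            return true;
        }
    }
    ```

    Unlike the averaged version, this flags an image in which only one channel varies, even if that variation is diluted when the three channels are averaged.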

  21. I understand that this code works on bitmaps. Does anyone have a hint whether there is something similar for compressed files (JPEG, JPEG2000) that works without decompressing the image? Thx, Jan

  22. How awesome is this function? I am pulling hundreds of website snapshots. This will spot the blank ones and I can flag the listing.
    I just have to work out all-black or color screens. The other issue is to spot “navigation cancelled.”
    Thanks for posting !!!!!

  23. Adi: Sorry, I have not gotten around to taking everyone’s feedback from this post and creating an updated method. You will have to take my starting code and incorporate the additions/changes from the comments in this post. Chinh

  24. “On May 26th, 2009, Timex said:
    Here is what I come up with: …”

    In my program, your code doesn’t work to detect white pages.
    All pages give a number over 100000.
