Programming Archives - Page 9 of 19

Memory Leak Fix for Transactional File Manager

December 24, 2008
Chinh Do
Dotnet/.NET - C#, Programming
0 Comments

I found a memory leak in my Transactional File Manager, so here’s version 1.0.1 with a fix for the leak. Click here to download.

Update (6/8/2010) – This project is now in CodePlex. URL: http://transactionalfilemgr.codeplex.com/

Detecting Blank Images with C#

September 10, 2008
Chinh Do
Dotnet/.NET - C#, Programming
44 Comments

Greetings, visitor from the year 2020! You can get the latest optimized working source code for this, including a version that does not use unsafe code, from my GitHub repo here. Thanks for visiting.

Recently I needed a way to find blank images among a large batch of images. I had tens of thousands of images to work with, so I came up with this C# function to tell me whether an image is blank.

The basic idea behind this function is that blank images will have highly uniform pixel values throughout the whole image. To measure the degree of uniformity (or variability), the function calculates the standard deviation of all pixel values. An image is determined to be blank if the standard deviation falls below a certain threshold.

Here’s the code. In order to compile, the project in which this code resides must have “Allow Unsafe Code” checked.

// Requires: using System; using System.Drawing; using System.Drawing.Imaging;

public static bool IsBlank(string imageFileName)
{
    // Note: each pixel value is red + green + blue (0 to 765), so the standard
    // deviation can never exceed 765 and this threshold always passes.
    // Tune the threshold against known blank and non-blank samples.
    double stdDev = GetStdDev(imageFileName);
    return stdDev < 100000;
}

/// <summary>
/// Get the standard deviation of pixel values.
/// </summary>
/// <param name="imageFileName">Name of the image file.</param>
/// <returns>Standard deviation.</returns>
public static double GetStdDev(string imageFileName)
{
    double sum = 0, sumOfSquares = 0;
    long count = 0;

    using (Bitmap b = new Bitmap(imageFileName))
    {
        // Lock as 24bpp so the 3-bytes-per-pixel walk below is valid for any source format.
        BitmapData bmData = b.LockBits(new Rectangle(0, 0, b.Width, b.Height), ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);
        try
        {
            int stride = bmData.Stride;
            unsafe
            {
                byte* p = (byte*)(void*)bmData.Scan0;
                int nOffset = stride - b.Width * 3; // padding bytes at the end of each row
                for (int y = 0; y < b.Height; ++y)
                {
                    for (int x = 0; x < b.Width; ++x)
                    {
                        // 24bpp pixels are stored in blue, green, red order.
                        int pixelValue = p[0] + p[1] + p[2];
                        sum += pixelValue;
                        sumOfSquares += (double)pixelValue * pixelValue;
                        count++;

                        p += 3;
                    }
                    p += nOffset;
                }
            }
        }
        finally
        {
            b.UnlockBits(bmData);
        }
    }

    if (count == 0) return 0;

    // Population variance is E[x^2] - mean^2, clamped at zero against rounding error.
    double mean = sum / count;
    double variance = Math.Max(0, sumOfSquares / count - mean * mean);
    return Math.Sqrt(variance);
}
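
For anyone who cannot turn on unsafe code, here is a minimal sketch of the same measurement over a managed byte array. It is not the version from my repo. It assumes you have already copied the locked 24bpp pixel data into a buffer (Marshal.Copy from Scan0 is one way to do that), and the class name BlankImageSketch and the method name StdDevOfPixelSums are made up for this illustration. It computes the population standard deviation from a running sum and sum of squares.

```csharp
using System;

public static class BlankImageSketch
{
    // Hypothetical helper: population standard deviation of (blue + green + red)
    // over a 24bpp pixel buffer. 'stride' is the number of bytes per row,
    // including any padding at the end of the row.
    public static double StdDevOfPixelSums(byte[] pixels, int width, int height, int stride)
    {
        double sum = 0, sumOfSquares = 0;
        long count = 0;

        for (int y = 0; y < height; y++)
        {
            int row = y * stride; // byte offset of the first pixel in this row
            for (int x = 0; x < width; x++)
            {
                int i = row + x * 3;
                int value = pixels[i] + pixels[i + 1] + pixels[i + 2];
                sum += value;
                sumOfSquares += (double)value * value;
                count++;
            }
        }

        if (count == 0) return 0;

        // Variance is E[x^2] - mean^2, clamped at zero against rounding error.
        double mean = sum / count;
        return Math.Sqrt(Math.Max(0, sumOfSquares / count - mean * mean));
    }
}
```

A perfectly uniform buffer gives 0. A 2x1 image with one black and one white pixel gives 382.5, which is also the largest value this measure can produce, so any useful threshold has to sit well below that.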

Greatest Hits

August 30, 2008
Chinh Do
Dotnet/.NET - C#, Links, Programming
0 Comments

According to Google Analytics, these are my most popular posts:

Web Scraping, HTML/XML Parsing, and Firebug’s Copy XPath Feature

August 29, 2008
Chinh Do
Dotnet/.NET - C#, Programming, Software/tools, Technology, Tips
6 Comments

If you do any web scraping (also known as web data mining, extracting, harvesting), you are probably familiar with the main steps: navigate to page, retrieve HTML, parse HTML, extract desired elements, repeat. I’ve found the SgmlReader library to be very useful for this purpose. SgmlReader turns your HTML into XML. Once you have the XML, it’s fairly easy to use built-in classes such as XmlDocument, XmlTextReader, and XPathNavigator to parse and extract the data you want.
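
As a rough sketch of the extraction step, the helper below loads XML into an XmlDocument and pulls out the text of the first node that matches an XPath expression. It assumes the HTML has already been converted to well-formed XML (the SgmlReader step is left out), and the names XPathExtractSketch and ExtractText are invented for this example.

```csharp
using System;
using System.Xml;

public static class XPathExtractSketch
{
    // Hypothetical helper: returns the trimmed text of the first node matching
    // 'xpath', or null when nothing matches.
    public static string ExtractText(string xml, string xpath)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml); // throws XmlException if the input is not well-formed

        XmlNode node = doc.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText.Trim();
    }
}
```

Given the XML <html><body><table id="quote"><tr><td>ACME</td><td>12.34</td></tr></table></body></html>, calling ExtractText with //table[@id='quote']/tr/td[2] returns "12.34".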

Now to the labor-intensive part: before your program can make sense of the XML, you have to manually analyze the HTML/XML first. Your program won’t know jack about how to extract that stock price until you tell it exactly where the stock price is, typically in the form of an XPath expression. My process of getting that XPath expression goes something like this:

1. Scroll to/find the desired element in the XML editor.
2. Does the element have unique attributes that can be used?

a – If yes, code an XPath statement with a filter on the attribute value. Example: //Table[@id="searchResultTable"].
b – If no, code an absolute XPath expression. Example: /html/body/div[4]/pre[2]/font[7]/table[2]/tr[5]/td[2]/table[1]/tr[2]/td[5]/span.

Step 2b is where it gets very labor-intensive and boring, especially for a big web page with many levels of nesting. The Visual Studio 2005 XML Editor and ReSharper have a couple of features that I find useful for this:

– Visual Studio’s Format Document (Edit/Advanced/Format Document) command formats the XML with nice indentation and makes it a lot easier to look at.

– With ReSharper, you can press Ctrl-[ to go to the start of the current element, or if you are already at the start, go to the parent element.

Even with the above tools, it’s still a painful and error-prone exercise. Luckily for us, Firebug has the perfect feature for this: Copy XPath. To use it, open your HTML/XML document, open the Firebug pane (Tools/Firebug/Open Firebug), navigate to the desired element, right click on it and choose “Copy XPath”.

You should now have this XPath expression in the clipboard, ready to be pasted into your web scraper application: “/html/body/div[2]/table/tr/td[2]/table”.

A feature that I would love to have is the ability to generate an alternate XPath expression using “id” predicates, such as this: “//Table[@id="searchResultTable"]”. With web pages that are not under your control, you want to minimize the chance that changes on the pages impact your code. Absolute XPath expressions are vulnerable to any change on the page that alters the order and/or nesting of elements. On the other hand, XPath expressions using an “id” predicate are less likely to be impacted by layout changes because in HTML, element IDs are supposed to be unique. No matter where your element is on the page, if it has the same ID, you should still be able to get to it by looking up the ID. Hmm… this sounds like a good idea for a Visual Studio Add-in.
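
Here is a minimal sketch of what such a generator could look like. It walks up from an element until it finds an ancestor (or the element itself) that carries an “id” attribute and anchors the expression there, falling back to an absolute path when there is none. The class and method names are hypothetical, and it assumes a document without XML namespaces and id values that contain no single quotes.

```csharp
using System;
using System.Xml;

public static class IdXPathSketch
{
    // Hypothetical helper: builds an XPath for 'element' anchored at the nearest
    // ancestor-or-self that has an "id" attribute. If no such ancestor exists,
    // the result is an absolute path from the document root.
    public static string BuildXPath(XmlElement element)
    {
        string path = "";
        XmlNode current = element;

        while (current is XmlElement)
        {
            XmlElement e = (XmlElement)current;
            string id = e.GetAttribute("id"); // empty string when the attribute is absent
            if (id.Length > 0)
            {
                // Assumption: the id value contains no single quote.
                return "//" + e.Name + "[@id='" + id + "']" + path;
            }

            path = "/" + e.Name + "[" + PositionAmongSameNameSiblings(e) + "]" + path;
            current = e.ParentNode;
        }

        return path;
    }

    // 1-based position among siblings with the same element name,
    // which is what name[n] means in XPath.
    private static int PositionAmongSameNameSiblings(XmlElement e)
    {
        int position = 1;
        for (XmlNode s = e.PreviousSibling; s != null; s = s.PreviousSibling)
        {
            if (s.NodeType == XmlNodeType.Element && s.Name == e.Name) position++;
        }
        return position;
    }
}
```

For the second cell of a row inside a table with id “searchResultTable”, this produces //table[@id='searchResultTable']/tr[1]/td[2], which keeps working if the table is moved elsewhere on the page.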

Interesting Finds – August 27, 2008

August 27, 2008
Chinh Do
Dotnet/.NET - C#, PowerShell, Programming
1 Comment

If you are a subscriber to my blog, you may have noticed that I have not been posting any more “Finds of the Week” in the last 2 months. Well, I was a little busy with the month-long Euro 2008 tournament in June, plus a couple of new games (Crysis and Medieval Total War II). Finally the Olympics in August finished me off.

I am going to turn this series into a periodic (as in longer than weekly :-)) Interesting Finds series from now on.

Oh, if you want to know… Crysis is ok. Very good graphics and requires a hot rod box but gameplay is just ok. I am more into realistic squad-based shooters. Medieval 2 is very addictive.

.NET, C#

I can’t believe I didn’t know about ThreadStaticAttribute. While searching for more information on it, I ran across this interesting article on MSDN Magazine: Scope<T> and More (Stephen Toub) that talked about the use of ThreadStatic in System.Transactions.
10 Tools Which I Left After Using VSTS 2008. By joycsharp.
NDepend – deep code metrics. By John-Daniel Trask.

Programming, General

Windows Vista and Server 2008 bring us Transactional NTFS. Find out more with Enhance Your Apps With File System Transactions. Jason Olson, MSDN.
10 reasons why SQL Server 2008 is going to rock. By Angry Hacker.

PowerShell

I am finding more and more things I can do with PowerShell every day. The other day I had to “touch” a file… two lines is what it takes:

PS C:\Users\Cdo\AppData\Local\Temp> $f = ls testFile.txt
PS C:\Users\Cdo\AppData\Local\Temp> $f.LastWriteTime = new-object System.DateTime 2007,12,31
PS C:\Users\Cdo\AppData\Local\Temp> ls testFile.txt


    Directory: Microsoft.PowerShell.Core\FileSystem::C:\Users\Cdo\AppData\Local\Temp


Mode                LastWriteTime     Length Name
----                -------------     ------ ----
-a---        12/31/2007  12:00 AM          8 testFile.txt


PS C:\Users\Cdo\AppData\Local\Temp>

Something Different

Euro 2008 Official Match Ball – special production film. Yes, I was a little obsessed with Euro 2008.