
Optimizing/Building Your Baseline Win XP Virtual PC Image

I am getting ready to check out all the neat tools in Scott Hanselman's 2007 Ultimate Developer and Power Users Tool List for Windows. It's a comprehensive list covering just about everything a developer or power user may ever need, and there are quite a few tools on it I have not used before.

Since I don't necessarily want to install all of these tools on my everyday Windows installation, I am preparing a baseline Virtual PC image to install them into. My experience with Windows has taught me to keep it as clean as possible.

I found a great guide from Dan's Archive on optimizing your baseline Windows XP virtual image. It took me about 30 minutes to work through the guide and build my optimized XP image (not counting the time needed to install Windows XP Pro). If anyone is listening out there, I think this would be a good candidate for another "tool".

With my baseline image ready, now I can make a backup copy of it and install away!! I feel like a kid at Christmas. Notepad2, Notepad++, Lutz Reflector, SlickRun, FireBug, ZoomIt, WinSnap, CodeRush, Refactor, FolderShare… so many new toys to play with, so little time.


It’s OK to Be Lazy

At least when it comes to instantiating objects.

Even in today’s environment, when the typical amount of RAM on each server is in the gigabytes, it’s still wise to pay attention to memory usage. As a developer or architect, you need to be aware of the trade-offs between eager instantiation and lazy instantiation. Yes, it’s rather pointless to consider an Int16 versus an Int32 for a variable if it’s just going to be created and used a few times in the lifetime of your application. However, if that same variable is instantiated thousands of times or more, then the potential improvement in either memory usage or performance (whichever is more important to you) is definitely worth a look.

Eager/Lazy Instantiation Defined

With eager instantiation, the object is created as soon as possible:

Example – Eager Instantiation

public class Customer
{
    // eager instantiation
    private Address homeAddress = new Address();
    public Address HomeAddress
    {
        get
        {
             return homeAddress;
        }
    }
}

With lazy instantiation, the object is created as late as possible:

Example – Lazy Instantiation

public class Customer
{
    private Address homeAddress;
    public Address HomeAddress
    {
        get
        {
            // Create homeAddress if it's not already created
            if (homeAddress == null)
            {
                homeAddress = new Address();
            }
            return homeAddress;
        }
    }
}
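
To make the timing difference concrete, here is a small usage sketch showing what each version of Customer above does at runtime:

Customer customer = new Customer();
// Eager version: an Address has already been allocated at this point
//     (the field initializer ran during construction).
// Lazy version: no Address has been allocated yet.

Address address = customer.HomeAddress;
// Lazy version: the Address is allocated here, on first access.
// Eager version: the existing Address is simply returned.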

Eager/lazy instantiation also applies to classes, singletons, etc. The principles and potential advantages/disadvantages are similar. For this article, I am only discussing the instantiation of class members.

CPU Cycles vs. Memory Usage

Eager vs. lazy instantiation is the classic performance/memory trade-off. With eager instantiation, you gain some performance improvement at the cost of system memory. Exactly what kind of performance/memory trade-off are we talking about? The answer depends mostly on the objects themselves:

  • How many instances of the parent object do you need?
  • What is the memory footprint of the member object?
  • How much time does it take to instantiate the member object?
  • How often will the parent object/member object be accessed?

Calculating the Memory Footprint of an Object

According to my own experiments (using DevPartner Studio and .NET Memory Profiler), each reference-type object (class) has a minimum memory footprint of 12 bytes. To calculate the total memory footprint of a reference-type object, add up the memory used by its members. To get the exact footprint you would also need to account for alignment "boundaries" (padding), but for our purposes that's probably not important.

The memory footprint of an object can be closely approximated using the following table (from MSDN Magazine):

Type             Managed Size in Bytes
System.Boolean   1
System.Byte      1
System.Char      2
System.Decimal   16
System.Double    8
System.Single    4
System.Int16     2
System.Int32     4
System.Int64     8
System.SByte     1
System.UInt16    2
System.UInt32    4
System.UInt64    8

Using the example Customer class above, let's say that each Address object takes up 1 KB, and my application frequently needs to instantiate up to 10,000 Customer objects. Just by creating 10,000 Customer objects, we would need about 10 megabytes of memory for the addresses alone. Now let's say that the HomeAddress member is only needed when the user drills down into the details of a Customer; in that case, lazy instantiation on HomeAddress offers a potential saving of nearly 10 megabytes.
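
If you want to sanity-check numbers like these empirically without a profiler, one rough approach is to allocate a large batch of objects and measure the before/after difference with GC.GetTotalMemory. A minimal sketch, assuming the Address class from the example above (this is an approximation, not an exact measurement):

using System;

public static class FootprintEstimator
{
    // Rough per-instance managed size of Address. The holding array is
    // allocated before the first measurement so its own memory cancels
    // out of the before/after delta.
    public static long EstimateAddressBytes(int count)
    {
        Address[] instances = new Address[count]; // keeps references alive
        long before = GC.GetTotalMemory(true);    // force a full collection
        for (int i = 0; i < count; i++)
        {
            instances[i] = new Address();
        }
        long after = GC.GetTotalMemory(true);
        GC.KeepAlive(instances);
        return (after - before) / count;
    }
}

Dividing the delta by the instance count averages out allocator noise; the result should land close to the 12-byte minimum plus the member sizes from the table above.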

Memory Usage Can Also Impact Performance

Another important consideration with .NET managed code is garbage collection. Memory usage has a hidden performance cost in the form of the work the garbage collector must do to reclaim it: the more memory you allocate and throw away, the more CPU cycles the garbage collector burns.
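
A quick, unscientific way to observe this effect, again assuming the Address class from the earlier examples, is to watch the garbage collector's generation-0 collection count while allocating throwaway objects:

// Count how many gen-0 collections a burst of throwaway allocations triggers.
int gen0Before = GC.CollectionCount(0);
for (int i = 0; i < 1000000; i++)
{
    Address temp = new Address(); // created and immediately discarded
}
int gen0After = GC.CollectionCount(0);
Console.WriteLine("Gen-0 collections: " + (gen0After - gen0Before));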

Recommendations

  • Pay closer attention to classes that get instantiated multiple times, such as Orders, OrderItems, etc.
  • For light-weight objects, or if you are not sure, use lazy instantiation.
  • If a member object is only used some of the time, use lazy instantiation.

Additional Reading

Rediscover the Lost Art of Memory Optimization in Your Managed Code

A New Way to Measure Lines of Code

Is Lines of Code a good way to measure programmer output?

Background

First, some background: several studies (Sackman, Erikson, and Grant – 1968; Curtis – 1981) have shown that there are large variations in productivity levels between the best and worst programmers. While the numbers from the studies are controversial, I tend to agree with the basic premise that a super programmer can significantly outperform the average programmer. In my real-world projects, I estimate that variations have ranged as high as 5:1.

As a manager or technical lead of a project, it’s important to have a good idea of how productive your programmers are. With a good idea of productivity levels, you can make better estimates for time and resources, and you can manage the individual developers better. Knowing that Programmer A has relatively lower productivity than his teammates, you can assign him smaller features and save the more complex ones for more productive/better programmers. Or, in the case of the negative-productivity programmer, you can identify him quickly and react appropriately instead of letting him continue to negatively impact your project.

So, is Lines of Code (LOC) per Day by itself a good way to measure productivity? I think the answer is a resounding no for many reasons:

  • A good programmer is able to implement the same feature with much less code than the average programmer.
  • Code quality is not taken into account. If you can write a thousand lines of code more than the average programmer, but your code is twice as buggy, that’s not really desirable.
  • Deleting and changing code, activities associated with important tasks such as refactoring and bug-fixing, are not counted, or are even counted negatively.

A New Method to Measure LOC

If LOC is not a good way to measure productivity, why am I writing about it? Because it’s still a good metric to have at your disposal, if you use it correctly, carefully, and in conjunction with other data. I also propose a revised method to calculate LOC that can better correlate with productivity. This “new-and-improved” LOC, in conjunction with other data (such as a Tech Lead’s intimate knowledge of his programmers’ style, efficiency, and skill level), may allow us to gain a better picture of programmer productivity.

The traditional way of calculating LOC has always been to count the lines of source code added. There are variations, such as not counting comments or counting statements instead of lines, but the general concept is the same: only lines or statements of code that are added are counted. The problems with the old method are:

  • Buggy code counts as much as correct code.
  • Deleting or changing code is not counted, even though that is often exactly what you do when refactoring or fixing bugs.
  • Optimizing a 20,000-line module down to 10,000 lines actually counts against your LOC.

At a conceptual level, my new method to calculate LOC (let's call it "Lines of Correct Code" or LOCC) only counts correct lines of code, plus code that is deleted or changed. Short of reviewing each line of code manually, how does a program know if a line of code is correct? My answer: if it remains in the code base at the end of the product cycle, then for our purpose, it is "correct" code.

Algorithm for Counting Lines of Correct Code

Below is the proposed algorithm for calculating LOCC. It should be possible to automate every one of the steps described here using a modern source control system; a simplified sketch in code follows the list.

  • Analyze the source code at the end of the product cycle and keep a snapshot of the code that exists at that point. This is our baseline "correct" code.
  • Go back to the beginning of the project cycle and examine each check-in. For each check-in, count the lines of code that are added or changed and remain until the end. Lines of code that are deleted are also counted.
  • Auto-generated code is not counted, or is weighted appropriately (after all, some work is involved).
  • Duplicate files are counted only once. In many applications, some files are mirrored ("shared" in SourceSafe-speak) in multiple locations. It's only fair to count these files a single time.
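
To illustrate, here is a highly simplified sketch of the counting step. The CheckIn type and the exact-text line matching are hypothetical placeholders; a real implementation would pull this data from the source control history and would need a smarter way to track a line's identity across revisions:

using System;
using System.Collections.Generic;

public class CheckIn
{
    public string Author;
    public List<string> AddedOrChangedLines = new List<string>();
    public int DeletedLineCount;
}

public static class LoccCalculator
{
    // Returns LOCC totals per author, given the check-in history and the
    // set of lines present in the final ("correct") snapshot.
    public static Dictionary<string, int> Calculate(
        List<CheckIn> history, HashSet<string> finalSnapshotLines)
    {
        Dictionary<string, int> locc = new Dictionary<string, int>();
        foreach (CheckIn checkIn in history)
        {
            int surviving = 0;
            foreach (string line in checkIn.AddedOrChangedLines)
            {
                // Naive survival test: exact-text match against the final
                // snapshot. Duplicate lines make this inexact in practice.
                if (finalSnapshotLines.Contains(line))
                {
                    surviving++;
                }
            }
            // Deleted lines also count, per the algorithm above.
            int total = surviving + checkIn.DeletedLineCount;
            int current;
            locc.TryGetValue(checkIn.Author, out current);
            locc[checkIn.Author] = current + total;
        }
        return locc;
    }
}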

Ways to Use Lines of Correct Code

Here are a few ways I am planning to use LOCC in my projects:

  • Look at the LOCC per day (week/month) of the same developer over time.
  • Compare the LOCC per day between different programmers of equal efficiency and skill level.
  • Compare the total LOCC between different projects to get an idea of their relative size.
  • Correlate the LOCC of a programmer against his/her bug rate.
  • If a programmer writes code that is often deleted or changed later on, try to find out why.

Tell me what you think. Is this LOCC metric something that you would consider using in your project? I am writing a utility to calculate LOCC automatically from SourceSafe and if there’s sufficient interest, I will consider making it available.

I found a bug in IE7

Looks like I found a bug in IE7. Sometimes the tab title “sticks” and stays the same regardless of which page/site you navigate to.

I was reviewing a blog post I had just made, so the tab title correctly said “Chinh Do”:

[Screenshot: the IE7 tab title reads "Chinh Do"]

I then clicked on a link to navigate to Technorati, which normally has a title of “Technorati: Home”, but here the tab title still said “Chinh Do”.

[Screenshot: the Technorati page loaded, but the tab title still reads "Chinh Do"]

I tried clicking on various links to navigate to different web sites; the title remained "Chinh Do".
