
Splitting a Generic List<T> into Multiple Chunks

Greetings, visitor from the year 2020! You can get the latest code for this from my GitHub repo here. Thanks for visiting.

“Chunking” is the technique of breaking a large amount of work into smaller, more manageable parts. Here are a few reasons I can think of why you might want to chunk, especially in a batch process that has to handle a large number of items:

  • Manage/minimize peak memory requirement.
  • During failures, the entire process can resume at the last failure point, instead of all the way from the beginning.
  • Take advantage of multiple processors/cores (by having multiple threads, each processing a small chunk).

Here’s a helper method to quickly split a List<T> into chunks:

/// <summary>
/// Splits a <see cref="List{T}"/> into multiple chunks.
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="list">The list to be chunked.</param>
/// <param name="chunkSize">The size of each chunk.</param>
/// <returns>A list of chunks.</returns>
public static List<List<T>> SplitIntoChunks<T>(List<T> list, int chunkSize)
{
    if (chunkSize <= 0)
    {
        throw new ArgumentException("chunkSize must be greater than 0.");
    }

    List<List<T>> retVal = new List<List<T>>();
    int index = 0;
    while (index < list.Count)
    {
        int count = list.Count - index > chunkSize ? chunkSize : list.Count - index;
        retVal.Add(list.GetRange(index, count));

        index += chunkSize;
    }

    return retVal;
}
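
As a quick sanity check, here is a minimal sketch of how the helper might be called. The class name ChunkDemo and the sample input (the numbers 1 through 10) are made up for illustration, and the helper is repeated in condensed form only so that the sample compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkDemo
{
    // Same logic as the helper above, condensed so the sample is self-contained.
    public static List<List<T>> SplitIntoChunks<T>(List<T> list, int chunkSize)
    {
        if (chunkSize <= 0)
        {
            throw new ArgumentException("chunkSize must be greater than 0.");
        }

        List<List<T>> retVal = new List<List<T>>();
        for (int index = 0; index < list.Count; index += chunkSize)
        {
            // The last chunk may be shorter than chunkSize.
            int count = Math.Min(chunkSize, list.Count - index);
            retVal.Add(list.GetRange(index, count));
        }

        return retVal;
    }

    public static void Main()
    {
        // Hypothetical input: the numbers 1 through 10.
        List<int> numbers = Enumerable.Range(1, 10).ToList();

        List<List<int>> chunks = SplitIntoChunks(numbers, 3);

        // Prints four lines: "1, 2, 3", "4, 5, 6", "7, 8, 9" and "10".
        foreach (List<int> chunk in chunks)
        {
            Console.WriteLine(string.Join(", ", chunk));
        }

        // GetRange copies, so the source list still holds all 10 items.
        Console.WriteLine(numbers.Count);
    }
}
```

Each chunk is an independent copy, so changing a chunk afterwards does not affect the source list.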
 

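If you want to be more memory-efficient at the cost of readability, the second version below moves the items out of the big list and into the small chunks, so the big list and the chunks do not hold all the items at the same time. Keep in mind that removing from the front of a List<T> shifts the remaining items on every pass and does not shrink the list's capacity, so this version gets slower as the list grows: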

 

/// <summary>
/// Breaks a <see cref="List{T}"/> into multiple chunks. The <paramref name="list"/> is cleared out and the items are moved
/// into the returned chunks.
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="list">The list to be chunked.</param>
/// <param name="chunkSize">The size of each chunk.</param>
/// <returns>A list of chunks.</returns>
public static List<List<T>> BreakIntoChunks<T>(List<T> list, int chunkSize)
{
    if (chunkSize <= 0)
    {
        throw new ArgumentException("chunkSize must be greater than 0.");
    }

    List<List<T>> retVal = new List<List<T>>();

    while (list.Count > 0)
    {
        int count = list.Count > chunkSize ? chunkSize : list.Count;
        retVal.Add(list.GetRange(0, count));
        list.RemoveRange(0, count);
    }

    return retVal;
}
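
The side effect on the source list is easy to miss, so here is a small sketch that makes it visible. The class name BreakDemo and the sample strings are hypothetical, and the method is repeated in condensed form only so that the sample compiles on its own.

```csharp
using System;
using System.Collections.Generic;

public static class BreakDemo
{
    // Same logic as BreakIntoChunks above, condensed so the sample is self-contained.
    public static List<List<T>> BreakIntoChunks<T>(List<T> list, int chunkSize)
    {
        if (chunkSize <= 0)
        {
            throw new ArgumentException("chunkSize must be greater than 0.");
        }

        List<List<T>> retVal = new List<List<T>>();
        while (list.Count > 0)
        {
            int count = Math.Min(chunkSize, list.Count);
            retVal.Add(list.GetRange(0, count)); // copy the first items into a chunk
            list.RemoveRange(0, count);          // then drop them from the source
        }

        return retVal;
    }

    public static void Main()
    {
        // Hypothetical input.
        List<string> names = new List<string> { "a", "b", "c", "d", "e" };

        List<List<string>> chunks = BreakIntoChunks(names, 2);

        Console.WriteLine(chunks.Count); // 3 chunks: [a, b], [c, d], [e]
        Console.WriteLine(names.Count);  // 0, because the items were moved out

        // The emptied list keeps its old capacity; if it stays alive,
        // TrimExcess asks it to release the unused space.
        names.TrimExcess();
    }
}
```

Callers that still need the original list afterwards should use the first version instead.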

Finds of the Week – April 6, 2008

Programming

.NET/C#

Windows Mobile/Pocket PC

Gaming

Something a Little Different

Finds of the Week – March 30, 2008

Programming

.NET/C#

PowerShell

Finds of the Week – March 23, 2008

Programming

C#

Something a Little Different

Finds of the Week – March 16, 2008

Programming

C#.NET

Software & Tools

Gadgets

  • I got myself a ThinkPad X60 Tablet PC last week, so naturally I’m writing this article on it :-). The Tablet PC Team Blog is a good starting point for Tablet PC tips. The combination of a Tablet PC and Microsoft OneNote is very nice. I am surprised at how good the text recognition engine in Windows Vista is.

     Tablet PC Writing Pad

Misc

Something a Little Different

Finds of the Week, March 9, 2008

Programming

C#/.NET

Software and Tools

Something a Little Different

Finds of the Week – March 2, 2008

Programming

C#.NET

.NET Tips & Tricks

  • Did you know you can give threads any names you want (MSDN)? The names are extremely useful when it comes time to debug:

    Thread names and debugging

  • System.IO.Directory.CreateDirectory will create all directories and subdirectories as specified by the path parameter. No need to write code to create each directory in the chain. Just do this:
    Directory.CreateDirectory(@"c:\MyApp\Env\Dev");
  • System.IO.Path.GetDirectoryName returns the directory name from a fully qualified file name.
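
The three tips above can be sketched together in one small program. The class and method names, the thread name "ChunkWorker-1", the temp folder prefix "MyApp-" and the file name "settings.config" are made up for illustration; the sample works under the system temp folder so it does not touch c:\MyApp.

```csharp
using System;
using System.IO;
using System.Threading;

public static class TipsDemo
{
    // Starts a named thread and returns the name the thread sees for itself.
    public static string RunNamedThread(string name)
    {
        string seen = null;
        Thread worker = new Thread(() => { seen = Thread.CurrentThread.Name; });
        worker.Name = name; // this name also shows up in the debugger's thread list
        worker.Start();
        worker.Join();
        return seen;
    }

    // Creates root/Env/Dev in one call and returns the full nested path.
    public static string CreateNested(string root)
    {
        string nested = Path.Combine(root, "Env", "Dev");
        Directory.CreateDirectory(nested); // creates every missing level
        return nested;
    }

    public static void Main()
    {
        Console.WriteLine(RunNamedThread("ChunkWorker-1")); // ChunkWorker-1

        // Hypothetical scratch location under the temp folder.
        string root = Path.Combine(Path.GetTempPath(), "MyApp-" + Guid.NewGuid().ToString("N"));
        string nested = CreateNested(root);
        Console.WriteLine(Directory.Exists(nested)); // True

        // GetDirectoryName strips the file name and leaves the containing directory.
        string file = Path.Combine(nested, "settings.config");
        string dir = Path.GetDirectoryName(file);
        Console.WriteLine(Path.GetFileName(dir)); // Dev

        Directory.Delete(root, true); // clean up the scratch folders
    }
}
```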

PowerShell

  • Round-robin game scheduling algorithm in PowerShell. By Scott Hanselman. Check out my C# 2.0 algorithm in the comments section.
  • Mitch Denny wrote How To: Host the PowerShell Runtime.
  • PowerShell’s array expression syntax @(…) allows you to force a scalar return value to be wrapped in an array, if it’s not already in an array. I learned about this the hard way while trying to figure out why Get-ChildItem sometimes returns an array and sometimes a scalar. Bruce Payette wrote more about it here.

Windows Mobile / Pocket PC

  • I needed a way to stream music and other media to my Windows Mobile phone (Samsung SCH-i760) and all the PCs around the house. Orb seems to be the answer. I’ve only had it running for a few days but it seems to be working great. I can stream music and photos (have not tested videos yet) to any PC in the house or anywhere on the net. I can also listen to my entire music library on my i760 phone anytime, anywhere through Verizon Wireless’s unlimited (with a catch… not to exceed 5GB) EVDO connection.

    Orb Mycast

  • I am a Google Mobile guy, but Yahoo! Go for Windows Mobile also looks very cool. I downloaded it to my Samsung SCH-i760 a few days ago. I am still checking it out but here are a few things I like:
    • Nice and responsive interface.
    • Built-in RSS Reader.
    • Street and satellite maps.

      Here are a few screenshots:

      Yahoo! Go

      Yahoo! Go

      Yahoo! Go Weather 

Software and Tools

  • You can configure Notepad++ to always use spaces for tabs/indentation. The option is a little hidden. It’s in Settings/Preferences/MISC, under Tab Setting:

    Notepad++ tab to spaces setting

Something a Little Different

Try/Catch Blocks Can Hurt Performance

Over at Programmers Heaven.com, there’s an interesting article on the potential performance impact of try/catch blocks (sorry, there’s no author information on the post, so I couldn’t tell who wrote it). The article concluded that the average cost of a try/catch block is essentially nothing, and that .NET/C# programmers should not think twice about using try/catch blocks.

The author is right that a try/catch block by itself has essentially zero cost. However, like most coding performance issues, exceptions and try/catch blocks do not have real performance implications until exceptions are actually thrown and caught in some type of loop. Something like this will do the job:

Dictionary<int, int> numbers = new Dictionary<int, int>();
Stopwatch w = new Stopwatch();
w.Start();
int notFound = 0;
for (int i = 1; i <= 1000000; i++)
{
    try
    {
        // The dictionary is empty, so every lookup throws KeyNotFoundException.
        int value = numbers[i];
    }
    catch (KeyNotFoundException)
    {
        notFound++;
    }
}

w.Stop();
Console.WriteLine(notFound);
Console.WriteLine("Elapsed: " + w.ElapsedMilliseconds + ".");

In the block of code above, I am trying to find the number of integers from 1 to 1,000,000 that are not in the numbers dictionary. One way to do it is to try to access the dictionary item by key. Since the Dictionary class will throw a KeyNotFoundException if the key is not found, that’s how I am going to know whether each value is in the dictionary or not.

Well, let’s just see how long that code takes to run. On my virtual PC it took … hold on a sec, it’s still running… still waiting… not quite there yet… finally: 101031 milliseconds (101 seconds).

If you have any doubt, this is the type of try/catch block or exception handling they advise against. 🙂

The same logic, implemented correctly without a try/catch block, took only 10 milliseconds. No, that’s not a typo: 10 milliseconds. That is only about 10,000 times faster.

Here’s the correct code:

Dictionary<int, int> numbers = new Dictionary<int, int>();

Stopwatch w = new Stopwatch();
w.Start();

int notFound = 0;
for (int i = 1; i <= 1000000; i++)
{
    if (!numbers.ContainsKey(i))
    {
        notFound++;
    }
}

w.Stop();
Console.WriteLine(notFound);
Console.WriteLine("Elapsed: " + w.ElapsedMilliseconds + ".");
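
ContainsKey is enough when only the count matters. When the stored value is needed as well, as in the first version's int value = numbers[i], Dictionary.TryGetValue does the check and the read in a single lookup without throwing. The sketch below is a minimal illustration, not a benchmark; the class name LookupDemo, the method CountMissing and the sample data are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class LookupDemo
{
    // Counts the keys in [1, upperBound] that are missing from the dictionary
    // and sums the values of the keys that are present.
    public static int CountMissing(Dictionary<int, int> numbers, int upperBound, out long sumOfFound)
    {
        int notFound = 0;
        sumOfFound = 0;

        for (int i = 1; i <= upperBound; i++)
        {
            int value;
            if (numbers.TryGetValue(i, out value))
            {
                sumOfFound += value; // key exists: value was filled in by TryGetValue
            }
            else
            {
                notFound++;          // key missing: no exception is thrown
            }
        }

        return notFound;
    }

    public static void Main()
    {
        // Hypothetical sample data: only keys 2 and 4 are present.
        Dictionary<int, int> numbers = new Dictionary<int, int> { { 2, 20 }, { 4, 40 } };

        long sum;
        int missing = CountMissing(numbers, 5, out sum);

        Console.WriteLine(missing); // 3 (keys 1, 3 and 5)
        Console.WriteLine(sum);     // 60
    }
}
```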

So, do consider the performance impact when using exceptions and try/catch blocks. Avoid using exception handling to implement normal program flow. Here are some links on exception handling best practices in .NET:

Finds of the Week – February 24, 2008

.NET Programming

WCF

Something Different

  • Learning to Smoke. It’s not permitted. It pisses people off. It makes you puke. It confuses you, and it brings clarity. It makes you an outcast, and it helps you meet wonderful strangers. Lessons from a man who did the unthinkable.