Chinh Do

Splitting a Generic List<T> into Multiple Chunks

15th May 2008

“Chunking” is the technique of breaking a large amount of work into smaller, more manageable parts. Here are a few reasons you might want to chunk, especially in a batch process that has to handle a large number of items:

  • Manage or minimize peak memory requirements.
  • After a failure, the process can resume from the last failure point instead of starting over from the beginning.
  • Take advantage of multiple processors/cores (by having multiple threads, each processing a small chunk).
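
To illustrate the last point, here is a minimal sketch of handing chunks to multiple threads. It assumes .NET 4 or later for Parallel.ForEach (which did not exist when this post was written), and the names ParallelChunkDemo, ProcessChunk and ProcessAll are hypothetical, made up for this example. The chunks are built by hand so the sketch stands on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelChunkDemo
{
    // Hypothetical per-chunk work: here it simply sums the numbers in the chunk.
    public static long ProcessChunk(List<int> chunk)
    {
        long sum = 0;
        foreach (int n in chunk)
        {
            sum += n;
        }
        return sum;
    }

    // Processes the chunks in parallel and combines the partial results.
    public static long ProcessAll(List<List<int>> chunks)
    {
        long total = 0;
        Parallel.ForEach(chunks, chunk =>
        {
            long partial = ProcessChunk(chunk);
            // Several threads update total, so the addition must be atomic.
            Interlocked.Add(ref total, partial);
        });
        return total;
    }

    public static void Main()
    {
        // Three hand-built chunks stand in for the output of a chunking helper.
        List<List<int>> chunks = new List<List<int>>
        {
            new List<int> { 1, 2, 3 },
            new List<int> { 4, 5, 6 },
            new List<int> { 7 }
        };

        Console.WriteLine(ProcessAll(chunks)); // 28
    }
}
```

Each chunk is only read by the thread that processes it, so the only shared state that needs protection is the running total.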

Here’s a helper method to quickly split a List<T> into chunks:

/// <summary>
/// Splits a <see cref="List{T}"/> into multiple chunks.
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="list">The list to be chunked.</param>
/// <param name="chunkSize">The size of each chunk.</param>
/// <returns>A list of chunks.</returns>
public static List<List<T>> SplitIntoChunks<T>(List<T> list, int chunkSize)
{
    if (chunkSize <= 0)
    {
        throw new ArgumentException("chunkSize must be greater than 0.");
    }

    List<List<T>> retVal = new List<List<T>>();
    int index = 0;
    while (index < list.Count)
    {
        int count = list.Count - index > chunkSize ? chunkSize : list.Count - index;
        retVal.Add(list.GetRange(index, count));

        index += chunkSize;
    }

    return retVal;
}
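
As a quick usage sketch, splitting seven numbers into chunks of three should give two full chunks and one partial chunk. The wrapper class ChunkDemo and its Main method are hypothetical, and the helper is repeated in compact form (same behavior as above) only so the example compiles on its own. Joining a List<int> with string.Join assumes .NET 4 or later.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkDemo
{
    // Same behavior as SplitIntoChunks above, repeated so this sketch is self-contained.
    public static List<List<T>> SplitIntoChunks<T>(List<T> list, int chunkSize)
    {
        if (chunkSize <= 0)
        {
            throw new ArgumentException("chunkSize must be greater than 0.");
        }

        List<List<T>> retVal = new List<List<T>>();
        for (int index = 0; index < list.Count; index += chunkSize)
        {
            // The last chunk may hold fewer than chunkSize items.
            int count = Math.Min(chunkSize, list.Count - index);
            retVal.Add(list.GetRange(index, count));
        }

        return retVal;
    }

    public static void Main()
    {
        List<int> numbers = new List<int> { 1, 2, 3, 4, 5, 6, 7 };
        List<List<int>> chunks = SplitIntoChunks(numbers, 3);

        foreach (List<int> chunk in chunks)
        {
            Console.WriteLine(string.Join(", ", chunk));
        }
        // Prints:
        // 1, 2, 3
        // 4, 5, 6
        // 7
    }
}
```

The original list is left untouched, because GetRange returns a shallow copy of each range.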
 

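If you would rather not hold the items in both the big list and the chunks at the same time, the second version below moves the items out of the big list as it builds each chunk, at some cost to readability. Keep in mind that it empties the list you pass in, and that each RemoveRange(0, count) call shifts the remaining items toward the front of the list, so it does more copying than the first version:
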
/// <summary>
/// Breaks a <see cref="List{T}"/> into multiple chunks. The <paramref name="list"/> is cleared out and the items are moved
/// into the returned chunks.
/// </summary>
/// <typeparam name="T"></typeparam>
/// <param name="list">The list to be chunked.</param>
/// <param name="chunkSize">The size of each chunk.</param>
/// <returns>A list of chunks.</returns>
public static List<List<T>> BreakIntoChunks<T>(List<T> list, int chunkSize)
{
    if (chunkSize <= 0)
    {
        throw new ArgumentException("chunkSize must be greater than 0.");
    }

    List<List<T>> retVal = new List<List<T>>();

    while (list.Count > 0)
    {
        int count = list.Count > chunkSize ? chunkSize : list.Count;
        retVal.Add(list.GetRange(0, count));
        list.RemoveRange(0, count);
    }

    return retVal;
}
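
The main thing to watch for with this version is that the caller's list is empty afterwards. The sketch below shows that side effect. The wrapper class BreakDemo, its Main method and the sample data are hypothetical, and the helper is repeated only so the example compiles on its own.

```csharp
using System;
using System.Collections.Generic;

public static class BreakDemo
{
    // Same logic as BreakIntoChunks above, repeated so this sketch is self-contained.
    public static List<List<T>> BreakIntoChunks<T>(List<T> list, int chunkSize)
    {
        if (chunkSize <= 0)
        {
            throw new ArgumentException("chunkSize must be greater than 0.");
        }

        List<List<T>> retVal = new List<List<T>>();

        while (list.Count > 0)
        {
            int count = list.Count > chunkSize ? chunkSize : list.Count;
            retVal.Add(list.GetRange(0, count)); // copy the first 'count' items
            list.RemoveRange(0, count);          // then drop them from the source
        }

        return retVal;
    }

    public static void Main()
    {
        List<string> items = new List<string> { "a", "b", "c", "d", "e" };
        List<List<string>> chunks = BreakIntoChunks(items, 2);

        Console.WriteLine(chunks.Count);  // 3
        Console.WriteLine(chunks[2][0]);  // e
        Console.WriteLine(items.Count);   // 0 (the input list has been emptied)
    }
}
```

If the caller still needs the original items afterwards, one option is to pass in a copy, for example new List<string>(items).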

This entry was posted on Thursday, May 15th, 2008 at 11:07 pm and is filed under Dotnet/.NET - C#, Programming.

There are currently 3 responses to “Splitting a Generic List<T> into Multiple Chunks”

  1. On May 16th, 2008, Dew Drop - May 16, 2008 | Alvin Ashcraft's Morning Dew said:

    [...] Splitting a Generic List<T> Into Multiple Chunks (Chinh Do) [...]

  2. On June 3rd, 2008, Anjo said:

    (list.Count - index) gives the number of remaining elements, including the element at index. So it doesn’t have to be greater than chunkSize, but rather greater than or equal.

    I think this line:

    int count = list.Count - index > chunkSize ? chunkSize : list.Count - index;

    should be:

    int count = list.Count - index >= chunkSize ? chunkSize : list.Count - index;

    Thanks for the post – helped me with a problem I was having.

  3. On June 3rd, 2008, Chinh Do said:

    Hi Anjo: I checked my code again and it does work as expected (I did have a pretty comprehensive unit test for it). However, you have very sharp eyes and your version also works just fine. When list.Count - index == chunkSize, both branches of the conditional expression give the same value.

    Thanks for the comment. It was a good brain exercise.

    Chinh
