This is the first in a series of posts giving bite-sized tidbits around async for people not familiar with Task or async, but who want to take advantage of it without having to think too much about it.
For this first post, I wanted to talk about async method granularity and how it affects concurrency.
Just as an example, let’s say you have some existing code that downloads data from 3 different URLs. In regular/synchronous code, it’s six of one, half a dozen of the other whether your method handles a collection of items or just one item at a time. Either way, you’re going to foreach over the collection and handle each item. First the ‘takes a collection’ version, then the ‘one item at a time’ version:
using System;
using System.Collections.Generic;
using System.Net;

namespace ItsBigItsHeavyItsWood
{
    class Program
    {
        static void Main()
        {
            var urlsToDownload = new[]
            {
                "http://www.google.com/",
                "http://www.microsoft.com/",
                "http://www.apple.com/",
            };
            DownloadUrls(urlsToDownload);
            Console.ReadLine();
        }

        private static void DownloadUrls(IEnumerable<string> urlsToDownload)
        {
            foreach (var url in urlsToDownload)
            {
                var client = new WebClient();
                Console.WriteLine("Starting to download url {0}", url);
                var contents = client.DownloadData(url);
                Console.WriteLine("Downloaded {0} bytes", contents.Length);
            }
        }
    }
}
using System;
using System.Net;

namespace ItsBigItsHeavyItsWood
{
    class Program
    {
        static void Main()
        {
            var urlsToDownload = new[]
            {
                "http://www.google.com/",
                "http://www.microsoft.com/",
                "http://www.apple.com/",
            };
            foreach (var url in urlsToDownload)
            {
                DownloadUrl(url);
            }
            Console.ReadLine();
        }

        private static void DownloadUrl(string url)
        {
            var client = new WebClient();
            Console.WriteLine("Starting to download url {0}", url);
            var contents = client.DownloadData(url);
            Console.WriteLine("Downloaded {0} bytes", contents.Length);
        }
    }
}
If you try to switch this kind of code over to async by making the ‘simple’ changes (WebClient -> HttpClient, then await its download method), then it makes a big difference which one you end up with.
Changing the ‘takes a collection’ version would end up as:
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;

namespace ItsBigItsHeavyItsWood
{
    class Program
    {
        static void Main()
        {
            var urlsToDownload = new[]
            {
                "http://www.google.com/",
                "http://www.microsoft.com/",
                "http://www.apple.com/",
            };
            DownloadUrls(urlsToDownload);
            Console.ReadLine();
        }

        private static async void DownloadUrls(IEnumerable<string> urlsToDownload)
        {
            foreach (var url in urlsToDownload)
            {
                var client = new HttpClient();
                Console.WriteLine("Starting to download url {0}", url);
                var contents = await client.GetByteArrayAsync(url);
                Console.WriteLine("Downloaded {0} bytes", contents.Length);
            }
        }
    }
}
(Yes, you would probably want to change the return type to Task, but I’m trying to minimize the diff for this example)
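In case you’re curious, that change is tiny; here’s a sketch of the same method returning Task (the only additions are the return type and a using for System.Threading.Tasks):

// Same method as above, but returning Task so that callers can await it
// (or call Wait() on it) instead of firing and forgetting.
// Requires "using System.Threading.Tasks;".
private static async Task DownloadUrls(IEnumerable<string> urlsToDownload)
{
    foreach (var url in urlsToDownload)
    {
        var client = new HttpClient();
        Console.WriteLine("Starting to download url {0}", url);
        var contents = await client.GetByteArrayAsync(url);
        Console.WriteLine("Downloaded {0} bytes", contents.Length);
    }
}

Note that returning Task doesn’t change the behavior discussed next; it only gives the caller a way to observe when the method finishes.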
With this kind of change, you might think that you’d start getting parallel downloads, but you’d be wrong.
Fiddler shows what’s actually going on: the requests are still serialized.
Trying to explain ‘why’ is tricky without delving too much into Task or await, but I think (at least at the moment) the minimal conceptual bit is looking at the calling method, specifically that ‘await means returning back to the caller’.
(Yes, there’s lots more to it than that, but trying to simplify as much as I can without losing too much fidelity)
So with this kind of method, when we hit the await, the caller can continue to do work, but (in this example) there’s nothing left for it to do.
More importantly, the later web requests don’t start until later iterations of the foreach loop, since the ‘await’ leaves the method ‘stuck’ waiting on the current web request before it can continue.
The ‘fix’ for this (assuming your intent is concurrency/parallelization of the http requests) is to move the boundary such that when the ‘await’ in the method does the ‘return to caller’, there’s actually more work (specifically, queuing more http requests) that the caller can do.
using System;
using System.Net.Http;

namespace ItsBigItsHeavyItsWood
{
    class Program
    {
        static void Main()
        {
            var urlsToDownload = new[]
            {
                "http://www.google.com/",
                "http://www.microsoft.com/",
                "http://www.apple.com/",
            };
            foreach (var url in urlsToDownload)
            {
                DownloadUrl(url);
            }
            // ReadLine keeps the process alive while the fire-and-forget
            // async void downloads finish in the background.
            Console.ReadLine();
        }

        private static async void DownloadUrl(string url)
        {
            var client = new HttpClient();
            Console.WriteLine("Starting to download url {0}", url);
            var contents = await client.GetByteArrayAsync(url);
            Console.WriteLine("Downloaded {0} bytes", contents.Length);
        }
    }
}
With the method like this, we’re doing the foreach at the caller instead (Main in this case). This means that once we hit that ‘await’ keyword and the caller gets to continue, we actually have something meaningful for it to do: calling DownloadUrl again and starting another request!
This simple change in where we do the foreach results in the actual intended concurrency happening, as Fiddler shows us.
The lesson? When writing async methods, accept parameters at the granularity of your intended parallelization. If you want different URLs to be processed in parallel, make sure your async method accepts a single URL and not a collection of them.
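One of the options I’m glossing over: if you also want to know when all of the downloads have finished (which the fire-and-forget async void version can’t tell you), you can combine this lesson with Task. Here’s a minimal sketch, assuming a Task-returning variant of the method (I’ve named it DownloadUrlAsync here; the name is just my convention):

using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

namespace ItsBigItsHeavyItsWood
{
    class Program
    {
        static void Main()
        {
            var urlsToDownload = new[]
            {
                "http://www.google.com/",
                "http://www.microsoft.com/",
                "http://www.apple.com/",
            };

            // Start all of the downloads; each call returns as soon as it
            // hits its await, so the requests run concurrently.
            var downloadTasks = urlsToDownload.Select(DownloadUrlAsync).ToArray();

            // Block until every download has finished.
            Task.WhenAll(downloadTasks).Wait();
        }

        private static async Task DownloadUrlAsync(string url)
        {
            var client = new HttpClient();
            Console.WriteLine("Starting to download url {0}", url);
            var contents = await client.GetByteArrayAsync(url);
            Console.WriteLine("Downloaded {0} bytes", contents.Length);
        }
    }
}

Task.WhenAll hands back a single Task that completes once every download has, so Main can block on it with Wait() instead of needing the Console.ReadLine.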
NOTE: Yes, I’m glossing over lots of details and other options, but I’m trying not to complicate an already-too-long post.