Async lesson 1 of N: keep your async methods granular for better concurrency

This is the first in a series of posts offering bite-sized tips on async for people who aren’t familiar with Task or async/await but want to take advantage of them without having to think too much about the details. :)

For this first post, I wanted to talk about async method granularity and how it affects concurrency.

Just as an example, let’s say you have some existing code that downloads data from 3 different URLs. In regular/synchronous code, it’s six of one, half a dozen of the other whether your methods handle a collection of items or just one item at a time. Either way, you’re going to foreach over the items and handle each one:

    using System;
    using System.Collections.Generic;
    using System.Net;

    namespace ItsBigItsHeavyItsWood
    {
        class Program
        {
            static void Main()
            {
                var urlsToDownload = new[]
                {
                    "http://www.google.com/",
                    "http://www.microsoft.com/",
                    "http://www.apple.com/",
                };
                DownloadUrls(urlsToDownload);
                Console.ReadLine();
            }

            private static void DownloadUrls(IEnumerable<string> urlsToDownload)
            {
                foreach (var url in urlsToDownload)
                {
                    var client = new WebClient();
                    Console.WriteLine("Starting to download url {0}", url);
                    var contents = client.DownloadData(url);
                    Console.WriteLine("Downloaded {0} bytes", contents.Length);
                }
            }
        }
    }

Or, with the method handling one URL at a time:

    using System;
    using System.Net;

    namespace ItsBigItsHeavyItsWood
    {
        class Program
        {
            static void Main()
            {
                var urlsToDownload = new[]
                {
                    "http://www.google.com/",
                    "http://www.microsoft.com/",
                    "http://www.apple.com/",
                };
                // The loop lives in the caller; the method handles a single URL.
                foreach (var url in urlsToDownload)
                {
                    DownloadUrl(url);
                }
                Console.ReadLine();
            }

            private static void DownloadUrl(string url)
            {
                var client = new WebClient();
                Console.WriteLine("Starting to download url {0}", url);
                var contents = client.DownloadData(url);
                Console.WriteLine("Downloaded {0} bytes", contents.Length);
            }
        }
    }

If you switch this kind of code over to async by making the ‘simple’ changes (WebClient –> HttpClient, then await its download method), it makes a big difference which of the two shapes you end up with.

Changing the ‘takes a collection’ version would end up as:

    using System;
    using System.Collections.Generic;
    using System.Net.Http;

    namespace ItsBigItsHeavyItsWood
    {
        class Program
        {
            static void Main()
            {
                var urlsToDownload = new[]
                {
                    "http://www.google.com/",
                    "http://www.microsoft.com/",
                    "http://www.apple.com/",
                };
                DownloadUrls(urlsToDownload);
                Console.ReadLine();
            }

            // async void is kept only to minimize the diff (see the note below).
            private static async void DownloadUrls(IEnumerable<string> urlsToDownload)
            {
                foreach (var url in urlsToDownload)
                {
                    var client = new HttpClient();
                    Console.WriteLine("Starting to download url {0}", url);
                    // The loop cannot move on to the next URL until this await completes.
                    var contents = await client.GetByteArrayAsync(url);
                    Console.WriteLine("Downloaded {0} bytes", contents.Length);
                }
            }
        }
    }

(Yes, you would probably want to change the return type to Task, but I’m trying to minimize the diff for this example)

With this kind of change, you might think that you’d start getting parallel downloads, but you’d be wrong. :)

Fiddler shows what’s going on: the requests are still serialized.

[Image: Fiddler capture of the serialized requests]

Explaining ‘why’ is tricky without delving too far into Task or await, but I think (at least at the moment) the minimal concept you need is to look at the calling method and remember that ‘await means returning to the caller’.

(Yes, there’s lots more to it than that, but I’m trying to simplify as much as I can without losing too much fidelity.)

So with this kind of method, when we hit the await, the caller can continue to do work, but (in this example) there’s nothing for it to do.

More importantly, the later web requests don’t start until later iterations of the foreach loop, since the ‘await’ leaves the method ‘stuck’ waiting for the current web request before it can continue.
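
To see that ordering without Fiddler or a network, here is a rough sketch. FakeDownloadAsync and the single-letter ‘URLs’ are made up for illustration (a Task.Delay stands in for the real request), so treat it as a toy model of the loop above rather than the real thing:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class SerializedAwaitSketch
{
    // Hypothetical stand-in for an HTTP request: completes after a short delay.
    static async Task FakeDownloadAsync(string url, List<string> log)
    {
        log.Add("start " + url);
        await Task.Delay(100);
        log.Add("done " + url);
    }

    // The 'takes a collection' shape: the await sits inside the loop,
    // so each iteration finishes before the next one begins.
    public static async Task<List<string>> DownloadAllAsync(IEnumerable<string> urls)
    {
        var log = new List<string>();
        foreach (var url in urls)
        {
            await FakeDownloadAsync(url, log);
        }
        return log;
    }

    static void Main()
    {
        // Blocking here is fine for a console sketch; there is no UI thread to freeze.
        var log = DownloadAllAsync(new[] { "a", "b", "c" }).GetAwaiter().GetResult();
        Console.WriteLine(string.Join(", ", log));
        // prints: start a, done a, start b, done b, start c, done c
    }
}
```

Every ‘start’ is preceded by the previous ‘done’, which is the same serialized pattern Fiddler shows for the real requests.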

The ‘fix’ for this (assuming your intent is concurrency/parallelization of the http requests) is to move the boundary such that when the ‘await’ in the method does the ‘return to caller’, there’s actually more work (specifically, queuing more http requests) that the caller can do.

    using System;
    using System.Net.Http;

    namespace ItsBigItsHeavyItsWood
    {
        class Program
        {
            static void Main()
            {
                var urlsToDownload = new[]
                {
                    "http://www.google.com/",
                    "http://www.microsoft.com/",
                    "http://www.apple.com/",
                };
                // Each call returns at its first await, so the loop
                // immediately goes on to start the next request.
                foreach (var url in urlsToDownload)
                {
                    DownloadUrl(url);
                }
                Console.ReadLine();
            }

            // Still async void to keep the diff small; Task is the better return type.
            private static async void DownloadUrl(string url)
            {
                var client = new HttpClient();
                Console.WriteLine("Starting to download url {0}", url);
                var contents = await client.GetByteArrayAsync(url);
                Console.WriteLine("Downloaded {0} bytes", contents.Length);
            }
        }
    }

With the method like this, we’re doing the foreach at the caller instead (Main in this case).  This means that once we hit that ‘await’ keyword and the caller gets to continue, we actually have something meaningful we can do (calling DownloadUrl again and starting another request!)

This simple change in where we do the foreach results in the intended concurrency actually happening, as Fiddler shows us:

[Image: Fiddler capture of the concurrent requests]

The lesson? When making async methods, try to accept parameters at the granularity of your intended parallelization. If you want different URLs to be processed in parallel, make sure your async method accepts a single URL rather than a collection of them.
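
One of the options I’m glossing over: if the single-URL method returns Task instead of void, the caller can also find out when everything has finished. The sketch below is one possible shape, not the code from this post. DownloadUrlAsync fakes the request with Task.Delay and a made-up byte count so it runs offline, and Task.WhenAll is my choice for waiting on the batch:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

class GranularAsyncSketch
{
    // Hypothetical stand-in for HttpClient.GetByteArrayAsync: waits 200 ms,
    // then reports a fake byte count, so the sketch needs no network.
    public static async Task<int> DownloadUrlAsync(string url)
    {
        Console.WriteLine("Starting to download url {0}", url);
        await Task.Delay(200);
        return url.Length;
    }

    public static async Task<int[]> DownloadAllAsync(IEnumerable<string> urls)
    {
        // ToList forces every call to run now, so all requests are started
        // before we await any of them.
        List<Task<int>> pending = urls.Select(url => DownloadUrlAsync(url)).ToList();
        return await Task.WhenAll(pending);
    }

    static void Main()
    {
        var urls = new[]
        {
            "http://www.google.com/",
            "http://www.microsoft.com/",
            "http://www.apple.com/",
        };
        var timer = Stopwatch.StartNew();
        int[] sizes = DownloadAllAsync(urls).GetAwaiter().GetResult();
        timer.Stop();

        // Task.WhenAll returns results in the same order as the input tasks.
        for (int i = 0; i < urls.Length; i++)
        {
            Console.WriteLine("Downloaded {0} bytes from {1}", sizes[i], urls[i]);
        }
        // Expect roughly one delay's worth of time, not three back to back.
        Console.WriteLine("Elapsed: {0} ms", timer.ElapsedMilliseconds);
    }
}
```

The granularity is the same as in the fixed version above (one URL per async call); the only difference is that the caller holds on to the tasks instead of dropping them.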

NOTE: Yes, I’m glossing over lots of details and other options, but I’m trying not to complicate an already-too-long post. :)

C# 5 async/await: the oncoming flood of confusion

Caveat #1: First and foremost, let me state that I love, Love, LOVE the feature.  It’s a massive improvement in the language and is the key to enabling a change in the app landscape towards async-by-default, which in turn will mean massive gains in UI responsiveness and server scalability.  It’s HUGE.

Caveat #2: this kind of issue has arguably already happened in C# with iterator blocks/IEnumerable<T>/yield (for instance, how deferred execution bites people), and I’m really just trying to point out that the same is going to happen here, but to a larger extent.

With that said, there are basically two buckets of people that I think about as using this feature to consume async APIs in the future.

Bucket #1 – People that are coming from the world of using async APIs already, but were doing so without the benefit of async/await

In this bucket you get lots of people that:

  • are likely to know what APM, EAP, and TAP refer to
  • may or may not have created their own async APIs, but are familiar with consuming them
  • due to the growth curve, are most likely to be familiar with the EAP pattern of a FooAsync() method and a FooCompleted event
  • when writing EAP-type code, are probably as likely to write the FooCompleted handler as a separate method as to write it inline as an anonymous method or a lambda
  • are likely already familiar with issues with thread affinity (typically via updating the UI on the ‘UI thread’)
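
For anyone who hasn’t written EAP-style code, here is roughly what that FooAsync()/FooCompleted shape looks like. FakeDownloader and DownloadCompletedEventArgs are invented for this sketch (a real EAP class would do actual work instead of sleeping); only the overall pattern is the point:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical event args carrying the result of the fake download.
class DownloadCompletedEventArgs : EventArgs
{
    public DownloadCompletedEventArgs(int byteCount) { ByteCount = byteCount; }
    public int ByteCount { get; private set; }
}

// Hypothetical EAP-style class: a FooAsync() method plus a FooCompleted event.
class FakeDownloader
{
    public event EventHandler<DownloadCompletedEventArgs> DownloadCompleted;

    public void DownloadAsync(string url)
    {
        // Simulate the work on a thread-pool thread, then raise the event.
        Task.Run(() =>
        {
            Thread.Sleep(50);
            var handler = DownloadCompleted;
            if (handler != null)
            {
                handler(this, new DownloadCompletedEventArgs(url.Length));
            }
        });
    }
}

class EapSketch
{
    static readonly ManualResetEventSlim Done = new ManualResetEventSlim();

    static void Main()
    {
        var downloader = new FakeDownloader();
        // Handler as a separate method; a lambda would work equally well:
        // downloader.DownloadCompleted += (sender, e) => { ... };
        downloader.DownloadCompleted += OnDownloadCompleted;
        downloader.DownloadAsync("http://www.example.com/");
        Console.WriteLine("Main keeps running while the download is pending");
        Done.Wait(); // keep the process alive until the event fires
    }

    static void OnDownloadCompleted(object sender, DownloadCompletedEventArgs e)
    {
        Console.WriteLine("Downloaded {0} bytes", e.ByteCount);
        Done.Set();
    }
}
```

Note how the ‘what happens next’ logic lives in a different method from the call that started the work; that split is the ceremony async/await removes.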

These people (myself included) are typically happy with async/await because of what it enables for code readability and cleanliness.  Now you can have async code that you can read and (more importantly) reason about without getting lost in the details/ceremony of the pattern(s) used to implement it.  It also brings back the ability to use things like try/catch/finally and using blocks the way you did in synchronous code, so you no longer have the headache of making sure things get disposed in both the success and failure cases (effectively doing ‘finally’ manually).  Yes, there are patterns/techniques/libraries for improving this story, but none ‘out of the box’ and none (IMHO) to this extent.  The TPL itself does a good job of improving the code for such things, but there’s only so much it can do as a library.
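
As a small illustration of that last point about try/catch and using, here is a sketch in which an awaited operation fails inside a using block. TrackedResource and FailingOperationAsync are made-up names; the failure is simulated with a Task.Delay followed by a throw:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical resource that just reports when it gets disposed.
class TrackedResource : IDisposable
{
    public void Dispose()
    {
        Console.WriteLine("resource disposed");
    }
}

class AsyncCleanupSketch
{
    // Hypothetical async operation that fails after a short delay.
    static async Task FailingOperationAsync()
    {
        await Task.Delay(50);
        throw new InvalidOperationException("simulated failure");
    }

    public static async Task<string> RunAsync()
    {
        try
        {
            // The using block disposes the resource even though the failure
            // happens after the await, exactly as it would in synchronous code.
            using (var resource = new TrackedResource())
            {
                await FailingOperationAsync();
                return "succeeded";
            }
        }
        catch (InvalidOperationException ex)
        {
            // The exception surfaces at the await and is caught like any other.
            return "caught: " + ex.Message;
        }
    }

    static void Main()
    {
        Console.WriteLine(RunAsync().GetAwaiter().GetResult());
        // prints:
        // resource disposed
        // caught: simulated failure
    }
}
```

Doing the same with EAP callbacks means remembering to dispose in both the success and the error branch of the completed handler.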

NOTE: this isn’t necessarily saying these are ‘better’ developers, as in many cases some of them were just forced into this bucket (for instance, Silverlight forcing async APIs), just that they’re at least somewhat familiar with the terrain of the world that async/await lives in.

Bucket #2 – People that aren’t familiar with using async APIs, and are either ‘forced’ (Silverlight/WP7/Metro) into async or see async/await as the way to get rid of their UI blocking or concurrency issues ‘for free’

For people that might normally avoid async APIs (when synchronous versions are available), looking at async/await code is surprisingly familiar.  Gone are the things like lambdas and event hookups and the like.  Gone even are the usages of Task or Task<T> and associated things like TaskFactory, AggregateException, etc., which are great if you’re coming from APM/EAP, but still more complicated and less obvious than ‘regular’ synchronous code.

One big ‘gotcha’ here is that ‘await’ doesn’t do what you might think it does based on the name.  When you read some code that does var foo = await something.GetFooAsync(); without any additional context, the tendency will be to think “oh, cool, it calls this async method, but then waits for it to complete, assigns the result to foo, and we can go back to our normal/regular/synchronous code”.  IOW, ‘await’ can be interpreted as something that just bridges the gap between async and sync APIs – using await just lets you call async APIs but still keep your original code.  Yay!  Now I can keep writing code the way I always have and not have to deal with all the ugly code!  When the compiler tells them that using ‘await’ means they have to mark the containing method as ‘async’, they (or a ‘quick fix’ / ReSharper / CodeRush / whatever) will do so and they’ll move on.

Of course, those familiar with the feature know that’s not what happens at all.  Unfortunately it’s hard to explain (and even harder for people to know what to search for in the first place), due to the fact that the above (flawed) logic is actually correct within the scope of that method.  When looking at and reasoning about that method, you can think of it that way (that’s what this enables, after all).  The problem, of course, is that the place where the runtime behavior has changed is in the caller.  Because of this, when looking at demos and snippets of code, it’s much harder to reason about runtime behavior if you’re not familiar with at least the concept that the rest of the method becomes a continuation on a task and the method returns.
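
A tiny sketch makes the ‘method returns, the rest becomes a continuation’ idea visible. The names here (DoWorkAsync, the log list) are mine, and Task.Delay stands in for any real async call:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class ReturnToCallerSketch
{
    static async Task DoWorkAsync(List<string> log)
    {
        log.Add("DoWorkAsync: before await");
        await Task.Delay(100); // control goes back to the caller here
        log.Add("DoWorkAsync: after await"); // runs later, as a continuation
    }

    public static List<string> Run()
    {
        var log = new List<string>();
        Task pending = DoWorkAsync(log);
        // We are back in the caller, but the method has not finished yet.
        log.Add("caller: got control back, completed = " + pending.IsCompleted);
        pending.GetAwaiter().GetResult(); // block until the continuation has run
        return log;
    }

    static void Main()
    {
        foreach (var line in Run())
        {
            Console.WriteLine(line);
        }
        // prints:
        // DoWorkAsync: before await
        // caller: got control back, completed = False
        // DoWorkAsync: after await
    }
}
```

Inside DoWorkAsync the two log lines read as if they run back to back; it is only from the caller’s side that you can see the gap between them.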

Worst of all, if you thought deferred execution was hard to track down when it bit you (even though, once you realized what was happening, the fix was just a ToArray/ToList/whatever), you’re in for a whole new world of difficulty here.

Much like the iterator block support (and IEnumerable in general), the deferred execution is great (memory savings in particular), but if you’re either not aware or not thinking about it at the right time, it’ll bite you.
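
In case the deferred-execution comparison isn’t familiar, here is a minimal sketch of the iterator-block version of the problem. The Numbers method and its call counter are hypothetical, made up purely for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class DeferredExecutionSketch
{
    public static int CallCount;

    // Iterator block: nothing in here runs until someone enumerates the result.
    public static IEnumerable<int> Numbers()
    {
        CallCount++; // runs once per enumeration, not once per call
        yield return 1;
        yield return 2;
    }

    static void Main()
    {
        var numbers = Numbers();
        Console.WriteLine(CallCount); // prints 0: calling the method ran none of its body

        numbers.Sum();
        numbers.Sum();
        Console.WriteLine(CallCount); // prints 2: each enumeration re-ran the body

        var cached = Numbers().ToList(); // enumerate once and keep the results
        cached.Sum();
        cached.Sum();
        Console.WriteLine(CallCount); // prints 3: the list is not re-evaluated
    }
}
```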

What to do?

In an ideal world, everyone using async/await will have a chance to learn how it works and get familiar enough with it so that when they use it, they’re aware ahead of time of the potential gotchas.  In particular, Jon Skeet’s EduAsync series is a great place to start.

In the real world, lots of people will be using it as the ‘least code’ and ‘simplest’ approach to using async APIs that they’re forced to use (as Metro shows us, forcing async in your API surface area is a trend that’s going to continue).

So, as I help people (friends, co-workers, StackOverflow users, whatever) with their pain points around async/await due to how it works, especially issues that seem to crop up often, I’ll try to make (short, hopefully!) posts about the issue with hopefully helpful guidance on what you can do to try to minimize the chances of being bitten by it.  Something akin to ‘async/await for the rest of us’ or ‘async/await cliff notes’ I guess.

I have no idea how many or how often they’ll happen, but hopefully at least one will help someone at some point. :)