Asynchronous

Limiting asynchrony with task parallelism

When I started re-writ­ing this web­site, I wanted to make good use of my multi-core CPU. Gen­er­at­ing hun­dreds of pages using XSL trans­forms and plenty of pre-pro­cess­ing in C#, there's a lot of par­al­lelism to be had.

I began by using the TPL's data par­al­lelism fea­tures: mainly Parallel.​ForEach and Parallel.​Invoke. These are super easy to use, and made an im­me­di­ate huge dif­fer­ence.

Then the Vi­sual Stu­dio 11 de­vel­oper pre­view came out, and I felt com­pelled to make use of its new async fea­tures. This meant ditch­ing the Par­al­lel meth­ods all to­gether and writ­ing for task par­al­lelism.

There are still parts of the .NET Frame­work which don't sup­port async, and XML is one of them. Be­cause I'm read­ing rel­a­tively small doc­u­ments, I was able to work around these lim­i­ta­tions by asyn­chro­nously fill­ing a Mem­o­ryS­tream from a file and feed­ing the Mem­o­ryS­tream to the XML classes:

Task<FileStream> OpenReadAsync(string fileName)
{
    return Task.Factory.StartNew(state =>
        new FileStream((string)state, FileMode.Open, FileAccess.Read,
                       FileShare.Read, 4096, true), fileName);
}

async Task<XmlReader> CreateXmlReader(string fileName,
                                      XmlReaderSettings settings = null)
{
    MemoryStream ms = new MemoryStream();
    
    using (FileStream fs = await OpenReadAsync(fileName))
    {
        await fs.CopyToAsync(ms);
    }

    ms.Position = 0;
    return XmlReader.Create(ms, settings, fileName);
}

But I had one more prob­lem to solve. For ef­fi­ciency, Parallel.​ForEach par­ti­tions its items into ranges which will be op­er­ated on con­cur­rently. A side ef­fect of this that I was re­ly­ing on was that only so many I/O op­er­a­tions would be able to hap­pen at once. In my new code I'm sim­ply launch­ing all these tasks at once rather than par­ti­tion­ing—this ab­solutely killed per­for­mance as po­ten­tially hun­dreds of con­cur­rent I/Os caused my disk to seek like crazy.

What I ended up doing here was cre­at­ing a ticket sys­tem which can be used to allow only a lim­ited num­ber of I/Os to hap­pen con­cur­rently: es­sen­tially a safe task-based sem­a­phore.

sealed class AsyncLimiter
{
    public AsyncLimiter(int max);
    public Task<IDisposable> Lock();
}

The full im­ple­men­ta­tion is avail­able in Sub­ver­sion and under a 2-clause BSD li­cense. Using it is very sim­ple:

AsyncLimiter limiter = new AsyncLimiter(4);

async Task<FileStream> OpenReadAsync(string fileName)
{
    using (IDisposable limiterlock = await limiter.Lock())
    {
        return await Task.Factory.StartNew(state =>
            new FileStream((string)state, FileMode.Open, FileAccess.Read,
                           FileShare.Read, 4096, true), fileName);
    }
}

async Task<XmlReader> CreateXmlReader(string fileName,
                                      XmlReaderSettings settings = null)
{
    MemoryStream ms = new MemoryStream();

    using (FileStream fs = await OpenReadAsync(fileName))
    using (IDisposable limiterlock = await limiter.Lock())
    {
        await fs.CopyToAsync(ms);
    }

    ms.Position = 0;
    return XmlReader.Create(ms, settings, fileName);
}

When the lock gets dis­posed, it'll let the next op­er­a­tion in line progress. This was sim­ple to im­ple­ment ef­fi­ciently using In­ter­locked meth­ods and a Con­cur­ren­tQueue.

Some op­er­a­tions—file open­ing and ex­is­tence test­ing, di­rec­tory cre­ation, etc.—have no asyn­chro­nous ana­log. For these there is no good so­lu­tion, so I sim­ply wrapped them in a task as in the OpenReadAsync ex­am­ple above. They're rare enough that it hasn't been a prob­lem.

The end re­sult? Ac­tu­ally about 50% bet­ter per­for­mance than using the Par­al­lel meth­ods. When all the files are in cache, I'm able to gen­er­ate this en­tire web­site from scratch in about 0.7 sec­onds.

Asynchronous page faults

With I/O, we’ve got some choices:

  1. Syn­chro­nous, copy­ing from OS cache ( fread). This is the sim­plest form of I/O, but isn’t very scal­able.
  2. Syn­chro­nous, read­ing di­rectly from OS cache (mem­ory map­ping). This is wicked fast and ef­fi­cient once mem­ory is filled, but aside from some cases with read-​ahead, your threads will still block with page faults.
  3. Asyn­chro­nous, copy­ing from OS cache ( ReadFile). Much more scal­able than fread, but each read still in­volves du­pli­cat­ing data from the OS cache into your buffer. Fine if you’re read­ing some data only to mod­ify the buffer in place, but still not very great when you’re treat­ing it as read only (such as to send over a socket).
  4. Asyn­chro­nous, main­tain­ing your own cache ( FILE_FLAG_NO_BUFFERING). More scal­able still than Read­File, but you need to do your own caching and it’s not shared with other processes.

Note that there’s one im­por­tant choice miss­ing: mem­ory map­ping with asyn­chro­nous page faults. As far as I know there are no op­er­at­ing sys­tems that ac­tu­ally offer this—it’s kind of a dream fea­ture of mine. There are two APIs that will help sup­port this:

HANDLE CreateMemoryManager();
BOOL MakeResident(HANDLE, LPVOID, SIZE_T, LPOVERLAPPED);

CreateMemoryManager opens a han­dle to the Win­dows mem­ory man­ager, and MakeResident will fill the pages you spec­ify (re­turn­ing true for syn­chro­nous com­ple­tion, false for error/async like every­thing else). The best of both worlds: fast, easy ac­cess through mem­ory, a full asyn­chro­nous work­flow, and shared cache usage. This would be es­pe­cially use­ful on mod­ern CPUs that offer gi­gan­tic ad­dress spaces.

The mem­ory man­ager al­ready has sim­i­lar func­tion­al­ity in there some­where, so it might not be dif­fi­cult to pull into user-​mode. Just an ed­u­cated guess. Maybe it’d be ter­ri­bly dif­fi­cult. Dream fea­ture!

Is C# the Boost of C-family languages?

For all the cons of giv­ing a sin­gle en­tity con­trol over C#, one pro is that it gives the lan­guage an un­matched agility to try new things in the C fam­ily of lan­guages. LINQ—both its lan­guage in­te­gra­tion and its back­ing APIs—is an in­cred­i­bly pow­er­ful tool for query­ing and trans­form­ing data with very con­cise code. I re­ally can’t ex­press how much I’ve come to love it.

The new async sup­port an­nounced at PDC10 is ba­si­cally the holy grail of async cod­ing, let­ting you focus on what your task is and not how you’re going to im­ple­ment a com­plex async code path for it. It’s an old idea that many async coders have come up with, but, as far as I know, has never been suc­cess­fully im­ple­mented sim­ply be­cause it re­quired too much lan­guage sup­port.

The lack of peer re­view and stan­dards com­mit­tee for .​NET shows—there’s a pretty high rate of turnover as Mi­crosoft tries to iron down the right way to tackle prob­lems, and it re­sults in a very large li­brary with lots of re­dun­dant func­tion­al­ity. As much as this might hurt .​NET, I’m start­ing to view C# as a sort of Boost for the C lan­guage fam­ily. Some great ideas are get­ting real-​world use, and if other lan­guages even­tu­ally feel the need to get some­thing sim­i­lar, they will have a bounty of ex­pe­ri­ence to pull from.

C++, at least, is a ter­ri­fy­ingly com­plex lan­guage. Get­ting new fea­tures into it is an up­hill bat­tle, even when they ad­dress a prob­lem that every­one is frus­trated with. Get­ting com­plex new fea­tures like these into it would be a very long process, with a lot of ar­gu­ing and years of delay. Any extra in­cu­ba­tion time we can give them is a plus.