February 2012 Archives

Limiting asynchrony with task parallelism

When I started rewriting this website, I wanted to make good use of my multi-core CPU. With hundreds of pages being generated using XSL transforms and plenty of pre-processing in C#, there's a lot of parallelism to be had.

I began by using the TPL's data parallelism features: mainly Parallel.ForEach and Parallel.Invoke. These are super easy to use, and they made an immediate, huge difference.
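
In rough terms, that version looked something like this (pages, GeneratePage, and the other names here are stand-ins for illustration, not the generator's actual code):

// Let the TPL partition the pages across cores.
Parallel.ForEach(pages, page => GeneratePage(page));

// Run a fixed set of independent steps concurrently.
Parallel.Invoke(
    () => CopyStaticFiles(),
    () => BuildSiteMap());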

Then the Visual Studio 11 developer preview came out, and I felt compelled to make use of its new async features. This meant ditching the Parallel methods altogether and writing for task parallelism.

There are still parts of the .NET Framework which don't support async, and XML is one of them. Because I'm reading relatively small documents, I was able to work around these limitations by asynchronously filling a MemoryStream from a file and feeding the MemoryStream to the XML classes:

Task<FileStream> OpenReadAsync(string fileName)
{
    // Opening a FileStream can block on disk access, so do it on a worker thread.
    // The final "true" argument enables useAsync so later reads complete asynchronously.
    return Task.Factory.StartNew(state =>
        new FileStream((string)state, FileMode.Open, FileAccess.Read,
                       FileShare.Read, 4096, true), fileName);
}

async Task<XmlReader> CreateXmlReader(string fileName,
                                      XmlReaderSettings settings = null)
{
    MemoryStream ms = new MemoryStream();
    
    using (FileStream fs = await OpenReadAsync(fileName))
    {
        await fs.CopyToAsync(ms);
    }

    ms.Position = 0;
    return XmlReader.Create(ms, settings, fileName);
}

But I had one more problem to solve. For efficiency, Parallel.ForEach partitions its items into ranges which will be operated on concurrently. A side effect of this, which I was relying on, was that only so many I/O operations could happen at once. In my new code I'm simply launching all these tasks at once rather than partitioning, and this absolutely killed performance: potentially hundreds of concurrent I/Os caused my disk to seek like crazy.

What I ended up doing here was creating a ticket system that allows only a limited number of I/Os to happen concurrently: essentially a safe, task-based semaphore.

sealed class AsyncLimiter
{
    public AsyncLimiter(int max);
    public Task<IDisposable> Lock();
}

The full implementation is available in Subversion and under a 2-clause BSD license. Using it is very simple:

AsyncLimiter limiter = new AsyncLimiter(4);

async Task<FileStream> OpenReadAsync(string fileName)
{
    using (IDisposable limiterlock = await limiter.Lock())
    {
        return await Task.Factory.StartNew(state =>
            new FileStream((string)state, FileMode.Open, FileAccess.Read,
                           FileShare.Read, 4096, true), fileName);
    }
}

async Task<XmlReader> CreateXmlReader(string fileName,
                                      XmlReaderSettings settings = null)
{
    MemoryStream ms = new MemoryStream();

    using (FileStream fs = await OpenReadAsync(fileName))
    using (IDisposable limiterlock = await limiter.Lock())
    {
        await fs.CopyToAsync(ms);
    }

    ms.Position = 0;
    return XmlReader.Create(ms, settings, fileName);
}

When the lock gets disposed, it'll let the next operation in line progress. This was simple to implement efficiently using Interlocked methods and a ConcurrentQueue.
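
The repository has the real code, but here's a minimal sketch of one way those two pieces can fit together; this is an illustration of the approach, not the actual implementation:

// A sketch only (uses System.Collections.Concurrent, System.Threading, and
// System.Threading.Tasks). "remaining" counts free slots and goes negative
// while operations are waiting; the queue holds requests that couldn't be
// satisfied immediately.
sealed class AsyncLimiter
{
    readonly ConcurrentQueue<TaskCompletionSource<IDisposable>> waiters =
        new ConcurrentQueue<TaskCompletionSource<IDisposable>>();
    int remaining;

    public AsyncLimiter(int max)
    {
        remaining = max;
    }

    public Task<IDisposable> Lock()
    {
        if (Interlocked.Decrement(ref remaining) >= 0)
        {
            // A slot was free, so hand back a completed ticket right away.
            var done = new TaskCompletionSource<IDisposable>();
            done.SetResult(new Ticket(this));
            return done.Task;
        }

        // Every slot is in use; queue up and wait for someone to release one.
        var waiter = new TaskCompletionSource<IDisposable>();
        waiters.Enqueue(waiter);
        return waiter.Task;
    }

    void Release()
    {
        if (Interlocked.Increment(ref remaining) <= 0)
        {
            // A waiter has already decremented the counter but may not have
            // enqueued its request yet, so spin until it appears.
            TaskCompletionSource<IDisposable> next;
            var spin = new SpinWait();
            while (!waiters.TryDequeue(out next))
                spin.SpinOnce();
            next.SetResult(new Ticket(this));
        }
    }

    sealed class Ticket : IDisposable
    {
        readonly AsyncLimiter owner;
        int released;

        public Ticket(AsyncLimiter owner) { this.owner = owner; }

        public void Dispose()
        {
            // Release only once, even if Dispose gets called twice.
            if (Interlocked.Exchange(ref released, 1) == 0)
                owner.Release();
        }
    }
}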

Some operations, like file opening, existence testing, and directory creation, have no asynchronous analog. For these there is no good solution, so I simply wrapped them in a task, as in the OpenReadAsync example above. They're rare enough that it hasn't been a problem.
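
An existence test, for example, can be wrapped the same way (ExistsAsync is just a made-up name following the same pattern, not something from the real code):

Task<bool> ExistsAsync(string fileName)
{
    // File.Exists has no async counterpart, so push it onto a worker thread.
    return Task.Factory.StartNew(state => File.Exists((string)state), fileName);
}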

The end result? Actually about 50% better performance than using the Parallel methods. When all the files are in cache, I'm able to generate this entire website from scratch in about 0.7 seconds.

Everything old is new again

I moved to WordPress when I got sick of the limitations of an XSLT-based site. I've now moved back to an XSLT-based site due to WordPress' heavy SQL usage: SourceForge's SQL servers are pretty overloaded, and it was making this site run pretty slowly.

It took me a while to finish the transition, and there are still a few things I need to iron out, but I think it's ready enough for prime time. It's been a great transition, though: I finally got to rewrite that old theme using clean HTML and CSS, and all the limitations I hated in my old XSLT-based site are gone as well, albeit with a good deal more pre-processing involved.

One of the things I've changed is how images are scaled: when there isn't enough screen space (for example, when the window is narrowed), they'll all shrink to fit. This was important to me because I've grown to rely on Windows 7's snapping feature, and sites need to keep working when they only get half the screen. This actually revealed a bug in Google Chrome, and perhaps other WebKit-based browsers, so hopefully that gets fixed soon.

Another thing I've started trying is sizing floating images based on megapixels instead of simply a maximum width/height. This was simple to do and improves aesthetics by ensuring no image appears abnormally large compared to the others. So far I like the results.
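
The math behind it is nothing fancy; a rough sketch of the idea (not the site's actual pre-processing code, and SizeByPixelBudget is a made-up name) looks something like this:

// Scale an image down so its area stays under a fixed pixel budget while
// preserving its aspect ratio.
static void SizeByPixelBudget(int srcWidth, int srcHeight, double maxPixels,
                              out int width, out int height)
{
    double scale = Math.Min(1.0, Math.Sqrt(maxPixels / ((double)srcWidth * srcHeight)));
    width = (int)Math.Round(srcWidth * scale);
    height = (int)Math.Round(srcHeight * scale);
}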

Now that I'm mostly done with this, I should be able to write a lot more. Those promised resampling articles are coming, I swear!