March 2008 Archives

Life in Windows 2008

Given that I got a free copy of Windows Server 2008 at the big launch, I thought I’d try it out on my desktop. It and Vista are basically the same OS, just with different artificial limits. Following Vijayshinva Karnure’s directions, I had a fully functional desktop in no time.

I’ve only run into a couple of minor issues. Half-Life 2 doesn’t seem to have a proper codec to play its intro video, and the “Windows Live Messenger Download” link in the start menu takes you to an installer that refuses to run on a server OS.

The internationalization test

When choosing a development framework there are many things to look for, but the first thing I notice is solid internationalization support.

Does that seem weird to anyone? That is exactly the point. Internationalization is extremely hard to get right, and is usually the last thing on anyone’s mind. If a framework offers fully integrated internationalization, there is a good chance it will be well thought out in other areas too, and be worth my time to use.

GCC 4.3, C++0x preview

GCC 4.3 came out a couple of weeks ago, and I finally got time to give its experimental C++0x support a go. Specifically, I was interested in two features: variadic templates and rvalue references.

There is one prime example of what these two features are awesome for: perfect forwarding. Take a memory pool. You might have something like this:

class pool {
public:
   void* alloc();

   template<typename T>
   T* construct() { return new(alloc()) T; }
};

But that is hardly satisfactory. What if you want to construct T with an argument?

class pool {
public:
   void* alloc();

   template<typename T>
   T* construct() { return new(alloc()) T; }

   template<typename T, typename ArgT>
   T* construct(const ArgT &arg) { return new(alloc()) T(arg); }
};

So we add a new function to handle passing an argument. Better, but still not great. What if you want to pass multiple arguments?

C++ has very few problems that can’t be worked around in a relatively straightforward way. Unfortunately, this is one of them. The current solution most library developers employ involves some really nasty preprocessor hacks to generate separate functions for 1, 2, 3, up to maybe 10 or 15 arguments. So, how do we solve this?
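The hack in question looks roughly like this. This is a sketch of the pattern, not any particular library’s code, with a malloc-backed alloc() and a made-up point type standing in so the example is self-contained:

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Sketch of the pre-C++0x workaround: one overload per argument count,
// usually stamped out by preprocessor macros up to some fixed limit.
// The malloc-backed alloc() is a stand-in for a real pool allocator.
class pool {
public:
   void* alloc() { return std::malloc(64); }   // stand-in allocator

   template<typename T>
   T* construct() { return new(alloc()) T; }

   template<typename T, typename A1>
   T* construct(const A1 &a1) { return new(alloc()) T(a1); }

   template<typename T, typename A1, typename A2>
   T* construct(const A1 &a1, const A2 &a2) { return new(alloc()) T(a1, a2); }

   // ...and so on, up to whatever limit the library author chose.
};

// Example type to construct with two arguments.
struct point {
   int x, y;
   point(int x_, int y_) : x(x_), y(y_) {}
};
```

Every new maximum argument count means another copy-pasted (or macro-expanded) overload, which is exactly the mess variadic templates remove.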

Enter variadic templates, the new C++ feature built specifically to solve this. Here is an updated pool class that takes any number of arguments:

class pool {
public:
   void* alloc();

   template<typename T, typename... Args>
   T* construct(const Args&... args) { return new(alloc()) T(args...); }
};

Pretty simple! Those ellipses will expand into zero or more arguments. Great, we’re almost there. But we still have a problem: what happens if the constructor for T takes some arguments as non-const references? This construct function will try to pass them as const references, resulting in a compile error. We can’t have it pass args as non-const references either, because then passing an rvalue, such as a temporary, would generate another compile error: rvalues can only be bound to const references.
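A minimal sketch of the two failure modes (the counter type and helper names are made up for illustration): a const-reference forwarder can’t reach a constructor that wants a mutable reference, and a non-const forwarder rejects temporaries:

```cpp
#include <cassert>

// 'counter' wants a non-const int& so it can write back through it.
struct counter {
   int n;
   explicit counter(int &slot) : n(slot) { slot = -1; }
};

template<typename T, typename ArgT>
T make_const_ref(const ArgT &arg) { return T(arg); }   // forwards by const ref

template<typename T, typename ArgT>
T make_ref(ArgT &arg) { return T(arg); }               // forwards by non-const ref

// Uncomment either line to see the errors described above:
// counter a = make_const_ref<counter>(n);  // error: ctor needs int&, arg is const int&
// counter b = make_ref<counter>(7);        // error: rvalue 7 can't bind to int&
```

Only the lvalue-through-non-const-reference path works; neither single signature covers every caller.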

This is where the second part of our pool upgrades comes in: rvalue references.

class pool {
public:
   void* alloc();

   template<typename T, typename... Args>
   T* construct(Args&&... args) {
      return new(alloc()) T(std::forward<Args>(args)...);
   }
};

We’ve finally got our solution. That double-reference looking thing is the new syntax for rvalue references. This construct implements perfect forwarding: calling construct<foo>(a, b, c, d) will behave exactly as if we had called the constructor directly via new(alloc()) foo(a, b, c, d).
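As a quick sanity check, here is the forwarding construct() made self-contained, with a malloc-backed alloc() standing in for a real pool allocator and a made-up record type to build:

```cpp
#include <cassert>
#include <cstdlib>
#include <new>
#include <string>
#include <utility>

// The variadic, forwarding construct() from above. The malloc-backed
// alloc() is a stand-in so the sketch compiles on its own.
class pool {
public:
   void* alloc() { return std::malloc(256); }   // stand-in allocator

   template<typename T, typename... Args>
   T* construct(Args&&... args) {
      return new(alloc()) T(std::forward<Args>(args)...);
   }
};

// Example type taking a mix of argument kinds.
struct record {
   std::string name;
   int id;
   record(const std::string &n, int i) : name(n), id(i) {}
};
```

Temporaries and named variables can now be mixed freely in a single call, such as construct<record>(std::string("widget"), 7), and each argument reaches the constructor exactly as the caller wrote it.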

This works because each type in Args is deduced separately, and reference collapsing lets Args&& bind to lvalues and rvalues alike, so each argument resolves to a reference or const reference as needed. One problem I have yet to figure out how to solve is a constructor where you know the type you want, and want to accept any reference type:

struct foo {
   foo(const bar &b) : m_b(b) {}
   foo(bar &&b) : m_b(std::move(b)) {}

   bar m_b;
};

I don’t care if b is an lvalue or rvalue reference: I just want the construction of m_b to be as efficient as possible, so that it can use move semantics when you pass it an rvalue. So far the only way I can find to do it is with two separate constructors, which could mean a lot of code duplication on something less trivial than this example.

Scalability isn’t everything

In the beginning, you write threaded apps in blissful ignorance of scalability. That’s usually okay; most apps don’t need it, but sooner or later you will come across a problem that demands it. With enough searching, you will come across lock-free algorithms: tricky to get right, but promising fantastic scalability if you do.

Even trickier, though, is knowing when not to use them. Lock-free algorithms come with a price: although they are indeed very scalable, their performance can be much worse than that of a well-designed single-threaded algorithm. Do a little benchmarking and you might find something surprising: the performance hit can actually be so large that a simple locked single-threaded algorithm with no scalability will give better overall performance than a 100% scalable lock-free version.

This is more common than you might think. Take a queue. A single-threaded version will typically have very minimal memory overhead: maybe a pointer for every n objects. A lock-free version will need two pointers for every object (or one, if you use a GC). How much that overhead matters depends greatly on what your object is. If your object is large, a lock-free queue will probably be a better choice. But if your object is small, say one or two pointers, the overhead can be enough that cache misses will significantly affect your application.
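The difference is easy to see in typical node layouts. These are sketches of the usual shapes, not the actual queues I measured: a chunked single-threaded queue amortizes one link pointer over N elements, while a node-per-element lock-free queue pays its pointer overhead on every item:

```cpp
#include <cassert>
#include <cstddef>

// Chunked single-threaded queue: one link pointer per N elements.
template<typename T, unsigned N>
struct chunk {
   T items[N];
   chunk* next;        // amortized over N elements
};

// Typical node-per-element lock-free queue node: a link per item, plus
// a second pointer if nodes are tracked for safe memory reclamation
// (the one you can drop if a GC handles reclamation for you).
template<typename T>
struct lf_node {
   T item;
   lf_node* next;      // per-element link
   lf_node* free_next; // per-element reclamation link
};
```

For a small T, those two extra pointers per element can double or triple the memory touched per operation, which is where the cache misses come from.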

I recently had to tackle this problem. My application needed a queue of small objects, and on a modern quad-core CPU the cache misses were hurting performance so much that, although a lock-free queue did have near 100% scalability, the overall operation was completing 165% faster with a locked queue with zero scalability.

The next best thing is to combine the best of both worlds: design a queue with low overhead and medium scalability. Using a reader-writer lock with a combination of lock-free operations, I came up with a queue that only needs to take a full lock once every 32 or 64 operations. The result? Scalability 5% lower than a lock-free queue, with overall performance 210% better.

OK, I’ll admit it: I cheated, somewhat. Lock-free algorithms are good for more than just scalability. They also offer immunity to nasty effects like deadlock, livelock, and priority inversion. In my case I wasn’t in a situation to worry about these, but you might be. The lesson here is to know your situation and decide carefully, and don’t trust what others tell you: always try things yourself and profile.

Digging into TR1

Channel 9 has an interview with Stephan T. Lavavej of the Visual C++ team showing off some of the new features in TR1, the new draft standard library additions for C++0x. While the interview will probably have nothing new for competent C++ developers, Stephan does go into good detail explaining the unique strengths of C++ that newbies may not immediately see.

If you’ve thought of C++ but have been scared off by its complexity, or have been tempted by newer garbage-collected languages like C#, this will probably be a good eye-opener.