C++, Page 2
Optimizing exceptions
You might often hear about exceptions being slow. For this reason they are usually shunned in the embedded space, and sometimes even for regular desktop/server programming. What makes them slow? When one is thrown, the runtime has to search through the call stack for exception handlers.
I guess I don’t understand this line of thinking. For one, exceptions are meant for exceptional situations: things you don’t expect to happen under normal operation. Code that uses exceptions will run just as fast as code without them (or maybe even faster) until you throw one. These exceptional situations are truly rare, so I usually don’t care if they do happen to run slower.
A compiler can actually use exceptions to optimize your code. Consider this inefficient (but typical) pseudo-C:
    int dosomething(void)
    {
        /* do something A */
        if(err) return -1;

        /* do something B */
        if(err) {
            /* cleanup previous work A */
            return -1;
        }

        /* do something C */
        if(err) {
            /* cleanup previous work B */
            /* cleanup previous work A */
            return -1;
        }

        return 0;
    }
Or even this more efficient (yes boys and girls, goto actually has a good use case in C, get over it) pseudo-C:
    int dosomething(void)
    {
        /* do something A */
        if(err) return -1;

        /* do something B */
        if(err) goto err1;

        /* do something C */
        if(err) goto err2;

        return 0;

    err2:
        /* cleanup previous work B */
    err1:
        /* cleanup previous work A */
        return -1;
    }
Why are these bad? Cache locality. In the first example, you have error handling code inline with your regular code. The second is slightly better, with the cleanup moved off to the end of the function. Ideally the code you run will all be compacted into as few cache lines as possible, and handling errors this way wastes significant space on cleanup code that in the large majority of cases won’t be run.
But with exceptions, the compiler is free to take all the cleanup code in your entire app and shove it into a single separate area of code. All the normal code that you expect to run can be compact and closer together. Of course, this will make exceptions run slower. If your code is heavy on throwing exceptions (which would probably be an abuse), it will probably cause a significant overall slowdown. But if they are used correctly, for exceptional situations, then the common case will be improved cache usage and therefore faster code.
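For comparison, here is a rough sketch of how the same three-step function might look with exceptions and destructors doing the cleanup. It is written against C++11, and the `step` class, the `events` log, and the `fail_at` parameter are all invented for illustration. Where the compiler actually places the unwinding code is up to the implementation; the sketch only shows that the source no longer interleaves cleanup with the normal path.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical log so the order of work and cleanup can be observed.
std::vector<std::string> events;

// Hypothetical unit of work: the constructor does the work (and may throw),
// the destructor undoes it unless the work was committed.
class step {
public:
    step(const std::string &name, bool fail) : m_name(name), m_committed(false) {
        if (fail) throw std::runtime_error("step " + name + " failed");
        events.push_back("do " + m_name);
    }
    ~step() {
        if (!m_committed) events.push_back("undo " + m_name);
    }
    void commit() { m_committed = true; }

    step(const step &) = delete;
    step &operator=(const step &) = delete;

private:
    std::string m_name;
    bool m_committed;
};

// fail_at selects which step fails (1 = A, 2 = B, 3 = C); 0 means success.
void dosomething(int fail_at) {
    assert(fail_at >= 0 && fail_at <= 3);

    step a("A", fail_at == 1);
    step b("B", fail_at == 2);
    step c("C", fail_at == 3);

    // Only reached when nothing threw: keep the results of all three steps.
    a.commit();
    b.commit();
    c.commit();
}
```

If step C throws, the destructors of `b` and then `a` run during unwinding, which is the same order as the `err2:`/`err1:` labels in the goto version. If step A throws, nothing has been done yet and nothing is undone.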
Visual C++ 2008 Feature Pack is now available
The Visual C++ 2008 Feature Pack I talked about before is finished and ready for download. This includes the bulk of the TR1 updates (sadly, still no cstdint) and some major MFC updates.
GCC 4.3, C++0x preview
GCC 4.3 came out a couple weeks ago, and I finally got time to give its experimental C++0x support a go. Specifically, I was interested in two features of it: variadic templates and rvalue references.
There is one prime example of what these two features are awesome for: perfect forwarding. Take a memory pool. You might have something like this:
    class pool
    {
      void* alloc();

      template<typename T>
      T* construct()
      {
        return new(alloc()) T;
      }
    };
But that is hardly satisfactory. What if you want to construct T with an argument?
    class pool
    {
      void* alloc();

      template<typename T>
      T* construct()
      {
        return new(alloc()) T;
      }

      template<typename T, typename ArgT>
      T* construct(const ArgT &arg)
      {
        return new(alloc()) T(arg);
      }
    };
So we add a new function to handle passing an arg. Better, but still not very great. What if you want to pass it multiple arguments?
C++ has very few problems that can’t be worked around in a relatively straightforward way. Unfortunately, this is one of those problems. The current solution most library developers employ involves some really nasty preprocessor hacks to generate separate functions for 1, 2, 3, up to maybe 10 or 15 arguments. So, how do we solve this?
Enter variadic templates, the new C++ feature built specifically to solve this. Here is an updated pool class that takes any number of arguments:
    class pool
    {
      void* alloc();

      template<typename T, typename... Args>
      T* construct(const Args&... args)
      {
        return new(alloc()) T(args...);
      }
    };
Pretty simple! Those ellipses will expand into zero or more args. Great – we’re almost there. But we still have a problem here: what happens if the constructor for T takes some arguments as non-const references? This construct function will try to pass them as const references, resulting in a compile error. We can’t have it pass args as non-const references, because then if you pass it an rvalue, such as a temporary, it will generate another compile error, as rvalues can only be bound to const references.
This is where the second part of our pool upgrade comes in: rvalue references.
    class pool
    {
      void* alloc();

      template<typename T, typename... Args>
      T* construct(Args&&... args)
      {
        return new(alloc()) T(std::forward<Args>(args)...);
      }
    };
We’ve finally got our solution. That double-reference looking thing is the new syntax for rvalue references. This construct implements perfect forwarding: calling construct<foo>(a, b, c, d) will behave exactly as if we had called the constructor directly via new(alloc()) foo(a, b, c, d).
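To see the forwarding version work end to end, here is a minimal self-contained sketch written against C++11, the standard that C++0x eventually became. Several things are assumptions made for the example and are not from the pool above: `alloc` takes a size and simply calls `malloc`, there is a `destroy` helper, and `widget` is a made-up type whose constructor mixes a non-const reference, a const reference, and an rvalue reference.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>
#include <string>
#include <utility>

// Hypothetical minimal pool. alloc() just wraps malloc to keep the sketch
// short; a real pool would hand out blocks from a larger arena.
class pool {
public:
    void *alloc(std::size_t size) {
        void *p = std::malloc(size);
        if (!p) throw std::bad_alloc();
        return p;
    }

    // Perfect forwarding: each argument keeps its constness and its
    // lvalue/rvalue category on the way to T's constructor.
    template<typename T, typename... Args>
    T *construct(Args&&... args) {
        void *mem = alloc(sizeof(T));
        try {
            return new(mem) T(std::forward<Args>(args)...);
        } catch (...) {
            std::free(mem);  // do not leak the block if T's constructor throws
            throw;
        }
    }

    template<typename T>
    void destroy(T *p) {
        assert(p != nullptr);
        p->~T();
        std::free(p);
    }
};

// Hypothetical type with one parameter of each reference kind.
struct widget {
    widget(int &counter, const std::string &name, std::string &&tag)
        : m_name(name), m_tag(std::move(tag)) {
        ++counter;  // only possible because counter arrives as a non-const reference
    }
    std::string m_name;
    std::string m_tag;
};
```

A call such as `p.construct<widget>(counter, name, std::string("temp"))` deduces `Args` as `int&`, `std::string&`, and `std::string`, so `counter` is incremented in place, `name` is copied, and the temporary is moved from.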
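This works because each type in Args is deduced from the argument: it resolves to an lvalue reference (const or non-const) when you pass an lvalue, and to a plain type when you pass an rvalue, so Args&& always ends up as the right kind of reference. One problem I have yet to figure out how to solve is a constructor where you know the type you want, and want to accept any reference type: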
    struct foo
    {
      foo(const bar &b) : m_b(b) {}
      foo(bar &&b) : m_b(std::move(b)) {}

      bar m_b;
    };
I don’t care if b is an lvalue or rvalue reference: I just want the construction of m_b to be as efficient as possible so that it can use move semantics when you pass it an rvalue. So far the only way I can find to do it is with two separate constructors, which could mean a lot of code duplication on something less trivial than this example.
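One alternative that is often suggested for this situation, and which is not what the code above does, is to take the parameter by value and move it into the member. The sketch below assumes C++11 and uses a made-up `bar` that counts its copies and moves (`foo_by_value` and `by_value_demo` are also invented names). It costs one extra move compared to the two-constructor version, so whether it counts as "as efficient as possible" depends on how cheap moving a `bar` is.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for bar that counts how it gets copied or moved.
struct bar {
    static int copies;
    static int moves;
    bar() {}
    bar(const bar &) { ++copies; }
    bar(bar &&) noexcept { ++moves; }
};
int bar::copies = 0;
int bar::moves = 0;

struct foo_by_value {
    // Single constructor: the caller's argument is copied or moved into b,
    // then b is moved into the member.
    explicit foo_by_value(bar b) : m_b(std::move(b)) {}
    bar m_b;
};

// Shows the copy/move counts for an lvalue and for an explicitly moved value.
void by_value_demo() {
    bar x;

    bar::copies = bar::moves = 0;
    foo_by_value from_lvalue(x);              // copy into b, move into m_b
    assert(bar::copies == 1 && bar::moves == 1);

    bar::copies = bar::moves = 0;
    foo_by_value from_rvalue(std::move(x));   // move into b, move into m_b
    assert(bar::copies == 0 && bar::moves == 2);
}
```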
Digging into TR1
Channel 9 has an interview with Stephan T. Lavavej of the Visual C++ team showing off some of the new features in TR1, the new draft standard library additions for C++0x. While the interview will probably have nothing new for competent C++ developers, Stephan does go into good detail explaining the unique strengths of C++ that newbies may not immediately see.
If you’ve thought of C++ but have been scared by its complexity, or have been tempted by newer garbage collected languages like C#, this will probably be a good eye opener.
C++ sucks less than you think it does
C++ seems to be the quickest language to get a knee-jerk “it sucks!” reaction out of people. The common reasons they list:
- It’s ugly, hard to read, and unmaintainable.
- It’s easy to get memory leaks – computers are fast enough, use Java or another language with garbage collection!
- It results in larger, bloated executables.
C++ is a very powerful, very complex language. Being a multi-paradigm (procedural, functional, object-oriented, and metaprogramming) language that implores the coder to always use the best tool for the job, C++ forces you to think differently than other popular languages, and it will take the average coder years of working with it before they start to get truly good at it.
C++ does have its flaws. Some are fixable, some aren’t. Most of what is fixable is being addressed in a new standard due some time next year (2009).
Its biggest problem is that newcomers tend to only see its complexity and syntax, not its power. The primary reason for this is education. In introductory and advanced courses, students are taught an underwhelming amount of templates without any of the useful practices that can make them so epically powerful. The interesting parts of the standard library, such as iterators and functional programming, are put aside in favor of object-oriented design and basic data structures which, while useful, are only a small part of what C++ is about. RAII, an easy zero-cost design pattern that makes resource leaks almost impossible, is virtually untaught.
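For anyone who hasn’t run into RAII, here is a minimal sketch of the idea, written against C++11. The `handle` class and the `open_handles` and `total_opened` counters are made up for the example; in real code the constructor and destructor would wrap something like a file, a lock, or a socket.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical counters standing in for a real resource such as a file or lock.
int open_handles = 0;
int total_opened = 0;

// RAII: acquire in the constructor, release in the destructor.
class handle {
public:
    handle() { ++open_handles; ++total_opened; }
    ~handle() {
        assert(open_handles > 0);
        --open_handles;
    }

    // Non-copyable, so the resource has exactly one owner.
    handle(const handle &) = delete;
    handle &operator=(const handle &) = delete;
};

// The handle is released on every path out of this function, whether it
// returns normally or throws. There is no explicit cleanup code to forget.
void risky(bool fail) {
    handle h;
    if (fail) throw std::runtime_error("failed while holding the handle");
}
```

Calling `risky(true)` inside a try block leaves `open_handles` at zero once the exception has been caught, exactly as a normal return does.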
C++ does tend to produce larger executables. This is a trade-off: do you want smaller executables or faster code? Whereas in C you might use callbacks for something (as shown by the qsort and bsearch functions in the standard library) and produce less code, in C++ everything is specialized with a template that gives improved performance. This follows C++’s “don’t pay for what you don’t use” philosophy.
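The qsort comparison can be made concrete with a small sketch, assuming C++11. The helper names (`compare_ints`, `less_int`, `sort_c`, `sort_cpp`) are invented for the example. Whether the compiler actually inlines the comparison in the `std::sort` case depends on the compiler and its settings, so treat the comments as describing what it is allowed to do.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// C style: one compiled qsort serves every type, and each comparison goes
// through a function pointer on untyped memory.
int compare_ints(const void *a, const void *b) {
    int x = *static_cast<const int *>(a);
    int y = *static_cast<const int *>(b);
    return (x > y) - (x < y);
}

void sort_c(std::vector<int> &v) {
    if (!v.empty())
        std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
    assert(std::is_sorted(v.begin(), v.end()));
}

// C++ style: std::sort is instantiated separately for this iterator and
// comparison type. That means more generated code, but the comparison is
// visible to the compiler and can be inlined.
struct less_int {
    bool operator()(int a, int b) const { return a < b; }
};

void sort_cpp(std::vector<int> &v) {
    std::sort(v.begin(), v.end(), less_int());
    assert(std::is_sorted(v.begin(), v.end()));
}
```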
Don’t get me wrong, C++ is not the right tool for all jobs. But among all the languages I have used, C++ stands out from the crowd as one that is almost never a bad choice, and a lot of times is the best choice. It might take longer to get good at than most other languages, but once you’ve got it down its power is hard to match.
Visual C++ 2008 Feature Pack beta
It’s here! A beta of the promised TR1 library update has been put up for download.
Included in the pack is an update to MFC that adds a number of Office-style controls. Wish they’d put these out as plain Win32 controls, because I’ve got no intention of using MFC!
C++0x work progressing
A bit delayed, but I just found the results of the October 2007 C++ meeting. In it, they voted in several really nice things for the C++0x draft:
- nullptr – no more using 0 or NULL and getting int/pointer overload issues. (N2431)
- Atomic library – comes with several classes and utility functions for working with memory ordering and lock-free algorithms. (N2427)
- Threading library – basic threading, mutexes, and condition variables. (N2447)
- Unicode literals – UTF-8 and UTF-16 string literals. (N2442)
- Unicode codecvt facets – UTF-8 and UTF-16 codecvt facets for reading Unicode streams. (N2401)
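The int/pointer overload issue that nullptr fixes is easy to show with a small sketch, assuming a compiler that implements the feature as it ended up in C++11. The `which` overloads and `nullptr_demo` are made up for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical overload pair: one takes an int, the other a pointer.
std::string which(int) { return "int"; }
std::string which(const char *) { return "pointer"; }

void nullptr_demo() {
    // A literal 0 is an int first and a null pointer only by conversion,
    // so the int overload wins even if the caller meant "no string".
    assert(which(0) == "int");

    // nullptr converts to any pointer type but never to int, so the
    // pointer overload is chosen.
    assert(which(nullptr) == "pointer");
}
```

`which(NULL)` is left out on purpose: depending on how the implementation defines NULL, it either picks the int overload or fails to compile as ambiguous, which is exactly the problem.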
Visual Studio 2008 released, TR1 support coming
Anyone following Visual Studio 2008 will know that although it offers a plethora of new features for the managed world, there was little focus on the unmanaged side of things. Now that it is finally out the door, I guess it’s a good time to look at what few new features are there for us unmanaged C++ coders.
- Improved standards conformance with support for friend templates, an uncommon but powerful C++ feature.
- Intrinsic support for SSSE3, SSE4.x, and SSE4a. These are modern vector instructions (SSE4a literally just came out with AMD’s Phenom processors!) that anyone interested in writing high‐performance code will want to be familiar with.
- Intrinsic support for the CMPXCHG16B instruction. This instruction is essential when writing many lock-free algorithms for the x64 platform. I’ve been lobbying to have it added for a long time, so I’m especially happy to finally see it. Unfortunately, the generated code in Beta 2 was very sub‐optimal (considering the instruction is typically used in very tight loops) so I may end up using assembly anyway! I’m anxious to see if it is improved in RTM.
- Improved optimizer, with support for inlining transcendental functions and scheduling for the latest CPUs.
- Linker options updated for Vista – ability to specify UAC and address space randomization properties. For some reason, still no support for DPI independence so we’ll end up writing manifests anyway.
- Multi-threaded compilation, so we can finally use those quad-core CPUs that are coming out to reduce our compile times.
Not much, huh? That’s because Microsoft was running under the assumption that people would flock to C# and only use unmanaged C++ to maintain "legacy" code. Perhaps the best news so far is that they’ve finally realized their mistake. Although they didn’t have time to put things into VC++ 2008, they have re-committed to unmanaged code for the next version, and in the meantime made a small separate announcement that they will be bringing VC++ 2008 users a mostly complete TR1 implementation update in early January.