January 2008 Archives
strncpy is not your friend
Hang around IRC long enough and you will find someone heralding the use of strncpy
for writing secure code. A lot of the time they are just repeating what others have said, and can’t even tell you what strncpy really does. strncpy is a problem for two reasons:
- It silently truncates data. When, in all of your experience coding, has silent truncation been acceptable behavior? Replacing one bug (a buffer overflow) with silent truncation is not a fix; it just hides the problem.
- strncpy does not do what you think it does. It was not made for security: if the source string does not fit, it fills the destination buffer to the very last character and never adds a null terminator. So once again, you replace a buffer overflow with another bug (the sketch below demonstrates both failure modes).
Bugs happen. Sometimes we build sanity checks into programs to combat unknown ones before they become a problem. But strncpy is not a sanity check or a security feature: using it instead of resizing the buffer to accommodate the data, or outright rejecting the data if it is too big, is a bug.
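A minimal sketch of both failure modes, plus a saner alternative (the buffer size and strings here are arbitrary):

#include <cstdio>
#include <cstring>

int main() {
    char buf[8];
    const char *src = "a very long string";

    // strncpy copies at most sizeof buf characters. The source doesn't
    // fit, so buf gets filled completely with NO null terminator:
    // reading it as a string would be undefined behavior.
    std::strncpy(buf, src, sizeof buf);

    // The usual "fix" is to terminate by hand, which silently truncates:
    buf[sizeof buf - 1] = '\0';
    std::printf("\"%s\"\n", buf);   // prints "a very ": data quietly lost

    // Saner: check the length and outright reject oversized data.
    if (std::strlen(src) >= sizeof buf) {
        std::fprintf(stderr, "input too long, rejecting\n");
        return 1;
    }
    std::strcpy(buf, src);          // safe: we know it fits
    return 0;
}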
C++ sucks less than you think it does
C++ seems to be the quickest language to get a knee-jerk “it sucks!” reaction out of people. The common reasons they list:
- It’s ugly, hard to read, and unmaintainable.
- It’s easy to get memory leaks – computers are fast enough, use Java or another language with garbage collection!
- It results in larger, bloated executables.
C++ is a very powerful, very complex language. As a multi-paradigm language (procedural, functional, object-oriented, and meta-programming) that encourages the coder to always use the best tool for the job, C++ forces you to think differently than other popular languages, and it will take the average coder years of working with it before they start to get truly good at it.
C++ does have its flaws. Some are fixable, some aren’t. Most of what is fixable is being addressed in a new standard due some time next year (2009).
Its biggest problem is that newcomers tend to see only its complexity and syntax, not its power. The primary reason for this is education. In introductory and advanced courses alike, students are taught an underwhelming amount about templates and none of the practices that make them so epically powerful. The interesting parts of the standard library, such as iterators and functional programming, are put aside in favor of object-oriented design and basic data structures, which, while useful, are only a small part of what C++ is about. RAII, an easy zero-cost design pattern that makes resource leaks almost impossible, is virtually untaught.
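As a quick illustration, here is a minimal RAII sketch; the file class is an invention for this example, not a standard one:

#include <cstdio>

// A bare-bones RAII wrapper around a FILE*: the resource is acquired in
// the constructor and released in the destructor, so every exit path
// (normal return, early return, or exception) closes the file.
class file {
    FILE *fp;
    file(const file&);            // not copyable: two owners would double-close
    file& operator=(const file&);
public:
    explicit file(const char *path) : fp(std::fopen(path, "r")) {}
    ~file() { if (fp) std::fclose(fp); }
    FILE *get() const { return fp; }
};

void print_first_char(const char *path) {
    file f(path);                 // acquired here
    if (!f.get())
        return;                   // early return: the destructor still runs
    std::printf("%c\n", std::fgetc(f.get()));
}                                 // released here, on every path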
C++ does tend to produce larger executables. This is a trade-off: do you want smaller executables or faster code? Whereas in C you might use callbacks to share one piece of code (as the qsort and bsearch functions in the standard library do), in C++ everything is specialized with a template, which gives improved performance at the cost of more generated code. This follows C++’s “don’t pay for what you don’t use” philosophy.
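Sorting is the classic illustration of this trade-off; a quick sketch of both approaches:

#include <algorithm>
#include <cstdlib>

// The C way: one copy of qsort in the binary, but every comparison goes
// through an indirect call via the function pointer.
int cmp_int(const void *a, const void *b) {
    int x = *static_cast<const int*>(a);
    int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

void sort_c(int *v, std::size_t n) {
    std::qsort(v, n, sizeof *v, cmp_int);
}

// The C++ way: std::sort is instantiated per element type, producing more
// code, but the comparison is inlined with no call overhead at all.
void sort_cpp(int *v, std::size_t n) {
    std::sort(v, v + n);          // uses operator< directly
}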
Don’t get me wrong: C++ is not the right tool for all jobs. But among all the languages I have used, C++ stands out from the crowd as one that is almost never a bad choice, and a lot of the time is the best choice. It might take longer to get good at than most other languages, but once you’ve got it down, its power is hard to match.
Visual C++ 2008 Feature Pack beta
It’s here! A beta of the promised TR1 library update has been put up for download.
Included in the pack is an update to MFC that adds a number of Office-style controls. I wish they’d put these out as plain Win32 controls, because I’ve got no intention of using MFC!
Writing a good parser
From a simple binary protocol used over sockets to a complex XML document, many applications depend on parsing. Why, then, do the great majority of parsers out there just plain suck?
I’ve come to believe a good parser should have two properties:
- It should be completely decoupled from any I/O. The same parser should work on a file, a pipe, a socket, or straight memory. This not only gives you flexibility but also makes test cases simpler: protocol handler acting up? Just write a test case that reads from a file instead.
- Other than perhaps memory allocation, a parser should never block. I’ve seen several parsers accomplish the first requirement with read/write callbacks, but that makes the parser block on I/O. Should you ever need to use the parser in a scalable environment, this just won’t do: a good server has only one or two threads per CPU, and blocking on I/O doesn’t allow that.
Several parsers meet the first requirement, but almost none meet the second: they expect their input to come in without a break in execution. So how do you accomplish this? Lately I’ve been employing this fairly simple design:
/* A chunk of raw input; chunks form a singly linked queue. */
struct buffer {
    struct buffer *next;
    char *buf;
    size_t len;
};

struct state; /* filled in by the specialized parser functions */

struct parser {
    struct buffer *input;     /* queue of unparsed buffers */
    struct buffer *lastinput; /* tail of the queue, for fast appends */
    struct buffer *output;    /* stack of spent buffers */
    int (*func)(struct parser *p, struct state *s); /* current specialized parser */
};

enum { CONTINUE, NEEDMORE, GOTFOO, GOTBAR };

int parse(struct parser *p, struct state *s)
{
    int ret;

    /* Run specialized parsers until one needs more input
       or produces a complete object. */
    while ((ret = p->func(p, s)) == CONTINUE);
    return ret;
}
The idea should be easy to understand:
- Add buffer(s) to input queue.
- If parse() returns NEEDMORE, add more input to the queue and call it again.
- If parse() returns GOTFOO or GOTBAR, state is filled with data.
- The function pointer is continually updated with a parser specialized for the current data: in this case, a FOO or a BAR, or even just bits and pieces of a FOO or a BAR. It returns CONTINUE if parse() should just call a new function pointer.
- As the parser function pointers eat at the input queue, put used buffers into the output stack.
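To make the flow concrete, here is a hypothetical specialized parser for this design. It assumes a FOO is a fixed four-byte record and that struct state accumulates the bytes collected so far; both, like the function name, are inventions for the example, and maintenance of lastinput is omitted for brevity:

#include <string.h>

/* Hypothetical: assume a FOO is a fixed four-byte record. */
struct state {
    char foo[4];
    size_t have;               /* bytes of the record collected so far */
};

static int parse_foo(struct parser *p, struct state *s)
{
    while (s->have < sizeof s->foo) {
        struct buffer *b = p->input;
        size_t n;

        if (!b)
            return NEEDMORE;   /* never block: ask the caller for more */

        n = sizeof s->foo - s->have;
        if (n > b->len)
            n = b->len;
        memcpy(s->foo + s->have, b->buf, n);
        s->have += n;
        b->buf += n;
        b->len -= n;

        if (b->len == 0) {     /* spent: move it to the output stack */
            p->input = b->next;
            b->next = p->output;
            p->output = b;
        }
    }
    return GOTFOO;             /* s->foo now holds a complete record */
}

A multi-step object would be parsed the same way: each step sets p->func to the next specialized function and returns CONTINUE.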
Other than meeting my two requirements above, the best thing about this design? It doesn’t sacrifice cleanliness, and it won’t cause your code size to increase.