January 2008 Archives

strncpy is not your friend

Being in IRC, every so often you will find someone heralding the use of strncpy for writing secure code. A lot of the time they are just going off what others have said, and can’t even tell you what strncpy really does. strncpy is a problem for two reasons:

Bugs happen. Sometimes we build sanity checks into programs to combat unknown ones before they become a problem. But strncpy is not a sanity check or a security feature. Using it instead of resizing a buffer to accommodate the data, or outright rejecting the data when it gets too big, is itself a bug.
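To make the "reject it" option concrete, here is a minimal sketch in C. The helper name copy_or_reject and its return convention are made up for illustration; they are not from any library. For contrast, strncpy(dst, src, n) silently truncates, and when src holds n or more characters it leaves dst without a terminating '\0'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a standard function): copy src into dst only
 * if the whole string fits, terminator included.
 * Returns 0 on success, -1 if src is too long; dst is untouched on failure. */
int copy_or_reject(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && src != NULL);

    size_t len = strlen(src);
    if (len >= dstsize)
        return -1;              /* too big: report it instead of truncating */

    memcpy(dst, src, len + 1);  /* +1 copies the terminating '\0' as well */
    return 0;
}
```

The caller has to look at the return value and decide what happens next: grow the buffer, or refuse the input. Either way the oversized data becomes a visible event rather than a quietly chopped string.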

C++ sucks less than you think it does

C++ seems to be the quickest language to get a knee-jerk “it sucks!” reaction out of people. The common reasons they list:

C++ is a very powerful, very complex language. Being a multi-paradigm language (procedural, functional, object-oriented, and metaprogramming) that implores the coder to always use the best tool for the job, C++ forces you to think differently than other popular languages do, and it will take the average coder years of working with it before they start to get truly good at it.

C++ does have its flaws. Some are fixable, some aren’t. Most of what is fixable is being addressed in a new standard due some time next year (2009).

Its biggest problem is that newcomers tend to only see its complexity and syntax, not its power. The primary reason for this is education. In introductory and advanced courses, students are taught an underwhelming amount about templates, without any of the useful practices that can make them so epically powerful. The interesting parts of the standard library, such as iterators and functional programming, are put aside in favor of object-oriented design and basic data structures which, while useful, are only a small part of what C++ is about. RAII, an easy zero-cost design pattern that makes resource leaks almost impossible, is virtually untaught.

C++ does tend to produce larger executables. This is a trade-off: do you want smaller executables or faster code? Whereas in C you might use callbacks for something (as the qsort and bsearch functions in the standard library do) and produce less code, in C++ the equivalent is typically a template specialized for each type it is used with, which gives the compiler more room to optimize. This follows C++’s “don’t pay for what you don’t use” philosophy.
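For reference, the callback style mentioned above looks roughly like this in C. The names compare_ints and sort_ints are invented for the sketch; only qsort itself comes from the standard library. One copy of qsort serves every element type, at the price of an indirect call per comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Comparison callback in the shape qsort expects: negative, zero, or
 * positive depending on how *a orders relative to *b. */
int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);   /* avoids the overflow that x - y could cause */
}

/* Hypothetical convenience wrapper: sort an int array in ascending order. */
void sort_ints(int *values, size_t count)
{
    assert(values != NULL || count == 0);
    if (count > 0)
        qsort(values, count, sizeof values[0], compare_ints);
}
```

The library only ever sees untyped bytes plus a function pointer, which is why the generated code stays small and why the comparison usually cannot be inlined.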

Don’t get me wrong, C++ is not the right tool for all jobs. But among all the languages I have used, C++ stands out from the crowd as one that is almost never a bad choice, and a lot of times is the best choice. It might take longer to get good at than most other languages, but once you’ve got it down its power is hard to match.

Visual C++ 2008 Feature Pack beta

It’s here! A beta of the promised TR1 library update has been put up for download.

Included in the pack is an update to MFC that adds a number of Office-style controls. I wish they’d put these out as plain Win32 controls, because I’ve got no intention of using MFC!

Writing a good parser

From a simple binary protocol used over sockets to a complex XML document, many applications depend on parsing. Why, then, do the great majority of parsers out there just plain suck?

I’ve come to believe a good parser should have two properties:

Several parsers meet the first requirement, but almost none meet the second: they expect their input to come in without a break in execution. So how do you accomplish this? Lately I’ve been employing this fairly simple design:

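#include <stddef.h>  /* size_t */

/* One chunk of input; chunks are chained together. */
struct buffer {
  struct buffer *next;
  char *buf;
  size_t len;
};

/* Result filled in when a FOO or BAR completes. Its layout is not shown
   here; this forward declaration is an assumption so the code compiles. */
struct state;

struct parser {
  struct buffer *input;      /* head of the input queue */
  struct buffer *lastinput;  /* tail of the input queue */
  struct buffer *output;     /* stack of used buffers */
  int (*func)(struct parser *, struct state *);  /* current sub-parser */
};

enum {
  CONTINUE,  /* call the (possibly updated) function pointer again */
  NEEDMORE,  /* the input queue ran dry */
  GOTFOO,
  GOTBAR
};

int parse(struct parser *p, struct state *s) {
  int ret;
  while((ret = p->func(p, s)) == CONTINUE);

  return ret;
}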

The idea should be easy to understand:

  1. Add buffer(s) to input queue.
  2. If parse() returns NEEDMORE, add more input to the queue and call it again.
  3. If parse() returns GOTFOO or GOTBAR, state is filled with data.
  4. The function pointer is continually updated with a parser specialized for the current data: in this case, a FOO or a BAR, or even just bits and pieces of a FOO or a BAR. It returns CONTINUE if parse() should just call a new function pointer.
  5. As the parser function pointers eat at the input queue, put used buffers into the output stack.

Other than meeting my two requirements above, the best thing about this design? It doesn’t sacrifice cleanliness, and it won’t cause your code size to increase.
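To see the steps above working end to end, here is a small self-contained sketch. Everything beyond the structs, the enum and the parse() loop is assumed for illustration: the toy wire format ('F' plus one digit is a FOO, any other tag byte plus one digit is treated as a BAR), the contents of struct state, and the helpers add_input, next_byte and parser_init. Buffers are owned by the caller, and consuming input advances buf and shrinks len in place.

```c
#include <assert.h>
#include <stddef.h>

struct buffer { struct buffer *next; char *buf; size_t len; };
struct state  { int value; };          /* assumed payload: one digit */
struct parser {
    struct buffer *input, *lastinput, *output;
    int (*func)(struct parser *, struct state *);
};
enum { CONTINUE, NEEDMORE, GOTFOO, GOTBAR };

/* Step 1: append a caller-owned buffer to the input queue. */
void add_input(struct parser *p, struct buffer *b)
{
    assert(p != NULL && b != NULL);
    b->next = NULL;
    if (p->lastinput) p->lastinput->next = b; else p->input = b;
    p->lastinput = b;
}

/* Step 5: fetch one byte, pushing drained buffers onto the output stack.
 * Returns 0 when the queue holds no more data. */
static int next_byte(struct parser *p, char *out)
{
    while (p->input && p->input->len == 0) {
        struct buffer *used = p->input;
        p->input = used->next;
        if (!p->input) p->lastinput = NULL;
        used->next = p->output;
        p->output = used;
    }
    if (!p->input) return 0;
    *out = *p->input->buf++;
    p->input->len--;
    return 1;
}

static int parse_tag(struct parser *p, struct state *s);

/* Sub-parsers for the digit that follows a tag (step 4). */
static int parse_foo(struct parser *p, struct state *s)
{
    char c;
    if (!next_byte(p, &c)) return NEEDMORE;
    s->value = c - '0';
    p->func = parse_tag;               /* next message starts with a tag */
    return GOTFOO;
}

static int parse_bar(struct parser *p, struct state *s)
{
    char c;
    if (!next_byte(p, &c)) return NEEDMORE;
    s->value = c - '0';
    p->func = parse_tag;
    return GOTBAR;
}

/* Reads the tag byte and switches to the matching sub-parser. */
static int parse_tag(struct parser *p, struct state *s)
{
    char c;
    (void)s;
    if (!next_byte(p, &c)) return NEEDMORE;
    p->func = (c == 'F') ? parse_foo : parse_bar;  /* no error handling */
    return CONTINUE;
}

void parser_init(struct parser *p)
{
    p->input = p->lastinput = p->output = NULL;
    p->func = parse_tag;
}

int parse(struct parser *p, struct state *s)
{
    int ret;
    while ((ret = p->func(p, s)) == CONTINUE)
        ;
    return ret;
}
```

Because the function pointer survives between calls, a message split across buffers needs no special handling: feed "F", get NEEDMORE, feed "7B", get GOTFOO with value 7, call parse() again and get NEEDMORE for the half-finished BAR. A real parser would also want an error return for unknown tags and non-digit payloads, which this sketch leaves out.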