C

Is C# the Boost of C-family languages?

For all the cons of giving a single entity control over C#, one pro is that it gives the language an unmatched agility to try new things in the C family of languages. LINQ, both its language integration and its backing APIs, is an incredibly powerful tool for querying and transforming data with very concise code. I really can’t express how much I’ve come to love it.

The new async support announced at PDC10 is basically the holy grail of async coding, letting you focus on what your task is and not on how you’re going to implement a complex async code path for it. It’s an old idea that many async coders have come up with but that, as far as I know, has never been successfully implemented, simply because it required too much language support.

The lack of peer review and of a standards committee for .NET shows: there’s a pretty high rate of turnover as Microsoft tries to nail down the right way to tackle problems, and it results in a very large library with lots of redundant functionality. As much as this might hurt .NET, I’m starting to view C# as a sort of Boost for the C language family. Some great ideas are getting real-world use, and if other languages eventually feel the need to get something similar, they will have a bounty of experience to pull from.

C++, at least, is a terrifyingly complex language. Getting new features into it is an uphill battle, even when they address a problem that everyone is frustrated with. Getting complex new features like these into it would be a very long process, with a lot of arguing and years of delay. Any extra incubation time we can give them is a plus.

strncpy is not your friend

Spend enough time on IRC and every so often you will find someone heralding strncpy as the way to write secure code. A lot of the time they are just going off what others have said, and can’t even tell you what strncpy really does. strncpy is a problem for two reasons:

Bugs happen. Sometimes we build sanity checks into programs to catch unknown ones before they become a problem. But strncpy is not a sanity check or a security feature: using it instead of resizing a buffer to accommodate the data, or outright rejecting the data if it gets too big, is a bug.
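As a minimal sketch of the "reject the data" alternative mentioned above: copy_checked is a hypothetical helper, not part of the standard library or of the original post. It is motivated by one documented strncpy behaviour: when the source holds n or more characters, strncpy(dst, src, n) writes no terminating NUL and simply drops the rest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (an assumption for illustration): copies src into dst
   only if the whole string, including its terminating NUL, fits in dst_size
   bytes. On failure dst is left untouched and false is returned. */
bool copy_checked(char *dst, size_t dst_size, const char *src) {
  assert(dst != NULL && src != NULL);

  size_t len = strlen(src);
  if (len >= dst_size)
    return false; /* too big: reject instead of silently truncating */

  memcpy(dst, src, len + 1); /* +1 copies the terminating NUL as well */
  return true;
}
```

The caller has to look at the return value and decide what to do with oversized input (grow the buffer, report an error), which is the decision that truncation hides.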

Writing a good parser

From a simple binary protocol used over sockets to a complex XML document, many applications depend on parsing. Why, then, do the great majority of parsers out there just plain suck?

I’ve come to believe a good parser should have two properties:

Several parsers meet the first requirement, but almost none meet the second: they expect their input to come in without a break in execution. So how do you accomplish this? Lately I’ve been employing this fairly simple design:

#include <stddef.h> /* size_t */

/* One chunk of input; chunks are linked into a list. */
struct buffer {
  struct buffer *next;
  char *buf;
  size_t len;
};

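struct state; /* filled in by the parser functions; defined by the application */

struct parser {
  struct buffer *input;     /* head of the queue of unparsed buffers */
  struct buffer *lastinput; /* tail of the input queue */
  struct buffer *output;    /* stack of buffers that have been used up */
  int (*func)(struct parser *, struct state *); /* parser for the current data */
};

enum {
  CONTINUE, /* call the (possibly updated) function pointer again */
  NEEDMORE, /* input ran out; add more and call parse() again */
  GOTFOO,   /* state holds a FOO */
  GOTBAR    /* state holds a BAR */
};

int parse(struct parser *p, struct state *s) {
  int ret;
  while ((ret = p->func(p, s)) == CONTINUE)
    ;

  return ret;
}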

The idea should be easy to understand:

  1. Add buffer(s) to the input queue.
  2. If parse() returns NEEDMORE, add more input to the queue and call it again.
  3. If parse() returns GOTFOO or GOTBAR, state is filled with data.
  4. The function pointer is continually updated with a parser specialized for the current data: in this case, a FOO or a BAR, or even just bits and pieces of a FOO or a BAR. A parser function returns CONTINUE if parse() should just call the new function pointer.
  5. As the parser function pointers eat at the input queue, put used buffers onto the output stack.

Other than meeting my two requirements above, the best thing about this design? It doesn’t sacrifice cleanliness, and it won’t cause your code size to increase.
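The steps above can be sketched with a toy format. Everything specific here is an assumption made for illustration, not part of the original design: a record is a tag byte ('F' or 'B') followed by one payload byte, struct state holds both, each buffer carries an extra pos field recording how much of it has been consumed, and BADINPUT, parser_init, parser_feed and next_byte are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct buffer {
  struct buffer *next;
  const char *buf;
  size_t len;
  size_t pos; /* assumption: bytes already consumed from this buffer */
};

struct state { char tag; char payload; }; /* hypothetical result record */

struct parser;
typedef int parse_fn(struct parser *, struct state *);

struct parser {
  struct buffer *input, *lastinput, *output;
  parse_fn *func;
};

enum { CONTINUE, NEEDMORE, GOTFOO, GOTBAR, BADINPUT /* assumed extra code */ };

static int parse_tag(struct parser *p, struct state *s);
static int parse_payload(struct parser *p, struct state *s);

void parser_init(struct parser *p) {
  p->input = p->lastinput = p->output = NULL;
  p->func = parse_tag;
}

/* Step 1: append a buffer to the tail of the input queue. */
void parser_feed(struct parser *p, struct buffer *b) {
  b->next = NULL;
  b->pos = 0;
  if (p->lastinput) p->lastinput->next = b; else p->input = b;
  p->lastinput = b;
}

/* Takes one byte from the queue. Step 5: buffers found to be used up are
   pushed onto the output stack. Returns 0 when no input is left. */
static int next_byte(struct parser *p, char *out) {
  while (p->input && p->input->pos == p->input->len) {
    struct buffer *used = p->input;
    p->input = used->next;
    if (!p->input) p->lastinput = NULL;
    used->next = p->output;
    p->output = used;
  }
  if (!p->input) return 0;
  *out = p->input->buf[p->input->pos++];
  return 1;
}

/* Step 4: each function handles one piece and picks its successor. */
static int parse_tag(struct parser *p, struct state *s) {
  char c;
  if (!next_byte(p, &c)) return NEEDMORE;
  if (c != 'F' && c != 'B') return BADINPUT;
  s->tag = c;
  p->func = parse_payload;
  return CONTINUE;
}

static int parse_payload(struct parser *p, struct state *s) {
  char c;
  if (!next_byte(p, &c)) return NEEDMORE;
  s->payload = c;
  p->func = parse_tag; /* the next record starts with a tag again */
  return s->tag == 'F' ? GOTFOO : GOTBAR;
}

/* Steps 2 and 3: run until a result is ready or the input runs dry. */
int parse(struct parser *p, struct state *s) {
  int ret;
  assert(p->func != NULL);
  while ((ret = p->func(p, s)) == CONTINUE)
    ;
  return ret;
}
```

Because the position in the grammar lives in p->func and the position in the data lives in the queue, a record split across two buffers needs no special handling: parse() returns NEEDMORE after the tag, the caller feeds the next buffer, and the following call resumes in parse_payload.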