C – int64.org

C

Is C# the Boost of C-family languages?

Posted on October 28, 2010

For all the cons of giving a single entity control over C#, one pro is that it gives the language an unmatched agility to try new things in the C family of languages. LINQ—both its language integration and its backing APIs—is an incredibly powerful tool for querying and transforming data with very concise code. I really can’t express how much I’ve come to love it.

The new async support announced at PDC10 is basically the holy grail of async coding, letting you focus on what your task is and not how you’re going to implement a complex async code path for it. It’s an old idea that many async coders have come up with, but as far as I know it has never been implemented successfully, simply because it requires too much language support.

The lack of peer review and of a standards committee for .NET shows: there’s a pretty high rate of turnover as Microsoft tries to nail down the right way to tackle problems, and it results in a very large library with lots of redundant functionality. As much as this might hurt .NET, I’m starting to view C# as a sort of Boost for the C language family. Some great ideas are getting real-world use, and if other languages eventually feel the need to get something similar, they will have a bounty of experience to pull from.

C++, at least, is a terrifyingly complex language. Getting new features into it is an uphill battle, even when they address a problem that everyone is frustrated with. Getting complex new features like these into it would be a very long process, with a lot of arguing and years of delay. Any extra incubation time we can give them is a plus.

strncpy is not your friend

Posted on January 20, 2008

Spend enough time on IRC and every so often you will find someone heralding the use of strncpy for writing secure code. A lot of the time they are just going off what others have said, and can’t even tell you what strncpy really does. strncpy is a problem for two reasons:

It silently truncates data. When, in all of your experience coding, has silent truncation been acceptable behavior? Replacing one bug (a buffer overflow) with silent truncation is not a fix, it’s just hiding the problem.
strncpy does not do what you think it does. It was not made for security: if the source is at least as long as the buffer, it fills the buffer right up to the last character and never adds a null terminator! So once again, you replace a buffer overflow with another bug.
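The second point is easy to check. The sketch below uses a hypothetical helper, copy_is_terminated (not part of any library), that runs strncpy and then looks for a null terminator inside the destination buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper for illustration: copy src into the n-byte buffer dst
   with strncpy, then report whether dst holds a null terminator anywhere
   in its n bytes. */
static bool copy_is_terminated(char *dst, size_t n, const char *src) {
  strncpy(dst, src, n);
  return memchr(dst, '\0', n) != NULL;
}
```

With a 4-byte buffer, "abc" fits and the result is terminated. "abcd" fills all four bytes and leaves no terminator, so a later strlen or printf on that buffer would read past its end.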

Bugs happen. Sometimes we build sanity checks into programs to combat unknown ones before they become a problem. But strncpy is not a sanity check or a security feature. Using it instead of resizing the buffer to accommodate the data, or outright rejecting the data when it gets too big, is a bug.
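As one possible shape for the "reject it" option, here is a minimal sketch. The name copy_checked and its return convention are assumptions made for illustration, not a standard function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst only if the whole string, including
   its terminator, fits in dstsize bytes. Returns 0 on success. Returns -1 and
   leaves dst untouched if the data is too big, so the caller has to decide
   what to do instead of silently losing data. */
static int copy_checked(char *dst, size_t dstsize, const char *src) {
  size_t len = strlen(src);
  if (len >= dstsize)
    return -1;
  memcpy(dst, src, len + 1); /* len + 1 copies the terminator too */
  return 0;
}
```

The failure is explicit: the caller can grow the buffer and retry, or report an error, but truncated data never goes unnoticed.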

Writing a good parser

Posted on January 02, 2008

From a simple binary protocol used over sockets to a complex XML document, many applications depend on parsing. Why, then, do the great majority of parsers out there just plain suck?

I’ve come to believe a good parser should have two properties:

It should be completely decoupled from any I/O. The same parser should work on a file, a pipe, a socket, or from straight memory. This not only gets you flexibility but the ability to make test cases simpler: protocol handler acting up? Just write a test case that reads it from a file instead.
Other than perhaps memory allocation, a parser should never block. To get the I/O independence described above, several parsers I’ve seen use read/write callbacks, which makes the parser block on I/O. Should you ever need to use the parser in a scalable environment, this just won’t do: a good server runs only one or two threads per CPU, and a parser that blocks on I/O makes that impossible.
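To make the two properties concrete, here is a deliberately tiny sketch of a push-style interface. The line_counter type and line_counter_feed function are hypothetical names for illustration. The caller does all the I/O and hands over whatever bytes it has, and the parser processes them and returns immediately.

```c
#include <stddef.h>

/* Hypothetical push-style parser: it only ever sees bytes handed to it, so
   it works the same for a file, a pipe, a socket, or plain memory. */
struct line_counter {
  size_t lines;   /* complete '\n'-terminated lines seen so far */
  size_t partial; /* bytes in the current, still unfinished line */
};

/* Consume len bytes and return. This never reads, waits, or blocks, and the
   state carried in lc lets a line be split across any number of calls. */
static void line_counter_feed(struct line_counter *lc,
                              const char *data, size_t len) {
  for (size_t i = 0; i < len; i++) {
    if (data[i] == '\n') {
      lc->lines++;
      lc->partial = 0;
    } else {
      lc->partial++;
    }
  }
}
```

A test case can feed it string literals in awkward chunk sizes, with no sockets or files involved.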

Several parsers meet the first requirement, but almost none meet the second: they expect their input to come in without a break in execution. So how do you accomplish this? Lately I’ve been employing this fairly simple design:

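#include <stddef.h> /* size_t */

struct state; /* receives the parsed data; defined by the application */

struct buffer {
  struct buffer *next;
  char *buf;
  size_t len;
};

struct parser {
  struct buffer *input;     /* head of the input queue */
  struct buffer *lastinput; /* tail of the input queue */
  struct buffer *output;    /* stack of used buffers */
  int (*func)(struct parser*, struct state*); /* current parsing step */
};

enum {
  CONTINUE,
  NEEDMORE,
  GOTFOO,
  GOTBAR
};

int parse(struct parser *p, struct state *s) {
  int ret;
  /* run steps until one needs more input or produces a result */
  while((ret = p->func(p, s)) == CONTINUE);

  return ret;
}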

The idea should be easy to understand:

Add buffer(s) to input queue.
If parse() returns NEEDMORE, add more input to the queue and call it again.
If parse() returns GOTFOO or GOTBAR, state is filled with data.
The function pointer is continually updated with a parser specialized for the current data: in this case, a FOO or a BAR, or even just bits and pieces of a FOO or a BAR. It returns CONTINUE if parse() should just call a new function pointer.
As the parser function pointers eat at the input queue, put used buffers into the output stack.
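The steps above can be sketched end to end. Everything beyond the structs, the enum, and parse() is an assumption made for illustration: the toy protocol (a tag byte, 'F' for a FOO and anything else for a BAR, followed by one payload byte), the contents of struct state, and the helpers add_input and next_byte.

```c
#include <stddef.h>

struct state { char payload; }; /* hypothetical: one payload byte */

struct buffer { struct buffer *next; char *buf; size_t len; };

struct parser {
  struct buffer *input, *lastinput, *output;
  int (*func)(struct parser*, struct state*);
};

enum { CONTINUE, NEEDMORE, GOTFOO, GOTBAR };

/* Append a buffer to the tail of the input queue. */
static void add_input(struct parser *p, struct buffer *b) {
  b->next = NULL;
  if (p->lastinput) p->lastinput->next = b; else p->input = b;
  p->lastinput = b;
}

/* Take one byte from the queue, moving used-up buffers onto the output
   stack. Returns 0 when no input is left. For brevity this advances
   buf/len in place; real code would track a separate read offset. */
static int next_byte(struct parser *p, char *out) {
  while (p->input && p->input->len == 0) {
    struct buffer *used = p->input;
    p->input = used->next;
    if (!p->input) p->lastinput = NULL;
    used->next = p->output;
    p->output = used;
  }
  if (!p->input) return 0;
  *out = *p->input->buf++;
  p->input->len--;
  return 1;
}

static int parse_tag(struct parser *p, struct state *s);

static int parse_foo(struct parser *p, struct state *s) {
  if (!next_byte(p, &s->payload)) return NEEDMORE; /* resume here later */
  p->func = parse_tag;
  return GOTFOO;
}

static int parse_bar(struct parser *p, struct state *s) {
  if (!next_byte(p, &s->payload)) return NEEDMORE;
  p->func = parse_tag;
  return GOTBAR;
}

/* Read the tag, then hand over to the step specialized for that message. */
static int parse_tag(struct parser *p, struct state *s) {
  char tag;
  (void)s;
  if (!next_byte(p, &tag)) return NEEDMORE;
  p->func = (tag == 'F') ? parse_foo : parse_bar;
  return CONTINUE;
}

int parse(struct parser *p, struct state *s) {
  int ret;
  while ((ret = p->func(p, s)) == CONTINUE);
  return ret;
}
```

To use it, zero-initialize a struct parser, set func to parse_tag, and call parse() after each add_input(). Because the current step lives in the function pointer, a message split across two buffers picks up exactly where it stopped once more input arrives.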

Other than meeting my two requirements above, the best thing about this design? It doesn’t sacrifice cleanliness, and it won’t cause your code size to increase.
