SIMD

Optimizing IP range searching in PeerGuardian

I was work­ing on some­thing com­pletely dif­fer­ent last night, when an el­e­gant idea came to mind on how to sig­nif­i­cantly speed up Peer­Guardian’s IP search­ing. It’s funny how an idea can just pop into the mind about a prob­lem that hasn’t been thought of in a long time.

Right now Peer­Guardian uses a bi­nary search to match IPs. This is al­ready pretty ef­fi­cient, run­ning in ⌈log 2 N⌉—so for 100,000 IP ranges, about 16 tests need to be done. This has the ad­di­tional ad­van­tage of hav­ing no mem­ory over­head.

My idea is to use a struc­ture sim­i­lar to a B+tree, pack­ing as many IPs and branch point­ers into a cache line as pos­si­ble. On today’s ar­chi­tec­tures, a cache line is typ­i­cally 64 bytes, so 8 IPs and 9 branch point­ers would fit on each node, mak­ing it only need to read about ⌈log 9 N⌉ nodes to find a match. So in order to find a match in 100,000 IP ranges, only about 5 nodes would need to be read.

CPUs al­ways read and cache data in blocks (a cache line), so an al­go­rithm that keeps this in mind to min­i­mize mem­ory reads and max­i­mize cache usage should be in­cred­i­bly fast. Even though this in­tro­duces sig­nif­i­cant over­head for branch point­ers (about 2x the stor­age would be re­quired), it should be far more ef­fi­cient over­all.

But this al­go­rithm im­proves in an­other way too: branch­ing. I’m talk­ing in terms of branch in­struc­tions, not the branch point­ers men­tioned above. The fewer branches code takes, the faster a su­per­scalar or pipelined CPU will be able to run your code. For this al­go­rithm, an en­tire node could be processed (that is, com­par­ing the IPs and de­ter­min­ing which branch node to go into) with zero branches using in­te­ger SSE2 (PCMPGTD, PMOVM­SKB), and bit-scan for­ward (BSF).

I can’t be sure how much of a speed dif­fer­ence this would make until I code it up, but I bet it would be at least 200% faster. I’ve been too busy to work on Peer­Guardian for quite a while, so I don’t know if this will ever make it into PG. We’re look­ing for a new coder with more time on their hands.