PeerGuardian

Optimizing IP range searching in PeerGuardian

I was work­ing on some­thing com­pletely dif­fer­ent last night, when an el­e­gant idea came to mind on how to sig­nif­i­cantly speed up Peer­Guardian’s IP search­ing. It’s funny how an idea can just pop into the mind about a prob­lem that hasn’t been thought of in a long time.

Right now Peer­Guardian uses a bi­nary search to match IPs. This is al­ready pretty ef­fi­cient, run­ning in ⌈log 2 N⌉—so for 100,000 IP ranges, about 16 tests need to be done. This has the ad­di­tional ad­van­tage of hav­ing no mem­ory over­head.

My idea is to use a struc­ture sim­i­lar to a B+tree, pack­ing as many IPs and branch point­ers into a cache line as pos­si­ble. On today’s ar­chi­tec­tures, a cache line is typ­i­cally 64 bytes, so 8 IPs and 9 branch point­ers would fit on each node, mak­ing it only need to read about ⌈log 9 N⌉ nodes to find a match. So in order to find a match in 100,000 IP ranges, only about 5 nodes would need to be read.

CPUs al­ways read and cache data in blocks (a cache line), so an al­go­rithm that keeps this in mind to min­i­mize mem­ory reads and max­i­mize cache usage should be in­cred­i­bly fast. Even though this in­tro­duces sig­nif­i­cant over­head for branch point­ers (about 2x the stor­age would be re­quired), it should be far more ef­fi­cient over­all.

But this al­go­rithm im­proves in an­other way too: branch­ing. I’m talk­ing in terms of branch in­struc­tions, not the branch point­ers men­tioned above. The fewer branches code takes, the faster a su­per­scalar or pipelined CPU will be able to run your code. For this al­go­rithm, an en­tire node could be processed (that is, com­par­ing the IPs and de­ter­min­ing which branch node to go into) with zero branches using in­te­ger SSE2 (PCMPGTD, PMOVM­SKB), and bit-scan for­ward (BSF).

I can’t be sure how much of a speed dif­fer­ence this would make until I code it up, but I bet it would be at least 200% faster. I’ve been too busy to work on Peer­Guardian for quite a while, so I don’t know if this will ever make it into PG. We’re look­ing for a new coder with more time on their hands.

Working with cryptography; turns out it’s not so simple

While com­ing up with a new list for­mat for Peer­Guardian 3, I de­cided it should have built in dig­i­tal sig­na­tures, so every­one get­ting lists can ver­ify the in­tegrity and who the list came from.

Al­though I’ve used crypto sys­tems like GPG be­fore and un­der­stood the ba­sics of it, I’d never im­ple­mented one my­self. So after much re­search, I de­cided on LibTom­Crypt due to its sim­ple API, stel­lar doc­u­men­ta­tion, and sup­port for mod­ern al­go­rithms like AES and ECC. Being en­tirely in the pub­lic do­main is a good perk, too.

The first it­er­a­tion is a very basic pub­lic key sys­tem. After fur­ther read­ing, I’ve de­cided it would be use­ful to im­ple­ment a full pub­lic key in­fra­struc­ture – that is, signed keys and pos­si­bil­ity of re­vo­ca­tion. This al­lows Phoenix Labs (or any­one else) to sign other pub­lic keys to ver­ify they’re legit and trust­wor­thy, and later re­voke the key if some­thing hap­pens with it (such as the pri­vate key being leaked).

All in all, it’s turn­ing out to be a lot more work than I ex­pected, but I don’t mind – it’s some­thing new and in­ter­est­ing, which seems to hap­pen less and less these days.