Little wheels

I’ve been weirdly depressed since finishing my thesis. Okay, not since finishing my thesis, since getting the first round of results from qcap. qcap is supposed to be a slick little library to parse network data. It’s supposed to chow down on packets captured from the network card, voraciously chewing through the billions of bytes flying across your average small gateway. It’s the proof of concept from my thesis. And it’s slow.

qcap doesn’t really “chow”. It more nibbles. Delicately. With one pinky held high, it takes delicate little bites from network captures. It takes the time to enjoy the flavour of whatever network stack sent each packet. I had hoped that I had built the parsing equivalent of a fat little kid who would gobble down anything you sat in front of it. Instead, I appear to have built an aristocratic epicurean sloth. I was hoping that the parser would be able to parse a few megs of network data per second, but it looks like it can only handle a few kilobytes at best.

So, with only two weeks until my defense, the wheels are starting to spin up again. I’m looking at the parser design, and I’m trying to see what I could do better. I took a generalized approach, that causes the parser clone itself every time it hits an ambiguous region. I think that I can replace that with an LL(k) parser that will only drop into generalized mode when it absolutely has to; which should dramitically reduce the cost of processing each byte. Grammars in my original parser were interpreted, existing as a structure in memory that was used as an input to the parser function along with each fresh byte; I’m pretty sure that I know how to generate code instead.

I would love to implement the new parser, but there’s no chance I can do that before my defense. After my defense, I don’t know if I will have time.

2 Responses to “Little wheels”

  1. 2006.Aug.11 @ 08:58

    Remember – before spending all this time implmenting a new parser, profile to see that the parser is the bottleneck.

  • 2006.Aug.11 @ 11:25

    Yup, it’s definitely in the parser. I’ve been doing a fair amount of profiling with gprof. I don’t have a bottleneck that can be reduced/removed without an architectural change. :-(

  • Reply

    You can use these HTML tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

    If your website is claim enabled, it will be notified that you have posted here.