Thoughts on Networking: November 2010

Tuesday, November 30, 2010

Thinking about thinking about.

This is just a short post on a reflection I had today. Dave Ripplinger presented an interesting paper (as yet unpublished) that the IDeA lab is presenting at an upcoming conference (I didn't catch which one). In short, it framed improving wireless performance (goodput) as a mathematical optimization problem, and it seemed pretty promising. It made me reflect a little on how we, as computer scientists, sometimes just want to barrel forward without thinking about how we're thinking about the problem. They seem to be getting some real gains in performance by thinking about the problems of wireless routing and interference differently than a network engineer typically would. So often we come up with a model for some real-world behavior, but we don't always check whether there are other, equally valid models that would give us a more readily visible solution to the problems we aim to solve.

Spam in time

Regarding this paper on spamming botnets again: there was a section on detecting future spam, yet in the final discussion, extending AutoRE to real-time detection is left for future work. I think that the real value of this algorithm will be seen in how well it does on live, new data. The initial results are promising, so why not wait for the next conference and present the results of real-time detection? It could be that there are some political/financial reasons for the paper being presented as it was, but it does feel like we don't quite get to see the exciting conclusion.

Spam spam spam spam...

This is a paper about using regular expression URL matching to help detect email spam, especially spam campaigns. While they have a really interesting tool, I worry about the haste with which they dismiss some false positive sources:

Legitimate emails sent by a big company advertising a product or event could also be bursty. But they will be unlikely sent from hosts spanning more than a few ASes. One false positive case could be email flash crowd, where people forward each other a few popular URL links. We expect such events to be very rare. In our experience of using three months of data and the source AS threshold of 20, we did not encounter a single such event.

Three months of data from a single email provider seems like a small sample from which to conclude that they will not mislabel some sort of spontaneous/viral non-spam URL. The lack of "flash crowd" data could lead to real problems for the email provider's customers should they begin to participate in such an event; the results of this study give us no confidence that such an event would be distinguished from spam. Additionally, I wonder about the assumption that large companies will span only a few ASes. Does that hold for CDNs? Global companies? I think that time and trends may need to be considered more here.
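To make the worry concrete, here is a minimal sketch of a source-AS threshold check. This is not AutoRE itself: the function name, the `(url_key, source_as)` input shape, and the "at least 20" reading of the threshold are all my own assumptions for illustration. It only shows that a viral link forwarded from many networks would trip the same test as a botnet campaign.

```python
from collections import defaultdict

# Threshold value quoted from the paper; treating it as "at least 20
# distinct ASes" is an assumption made for this sketch.
AS_THRESHOLD = 20


def flag_distributed_urls(observations, as_threshold=AS_THRESHOLD):
    """Return the URL keys seen from at least `as_threshold` distinct ASes.

    `observations` is an iterable of (url_key, source_as) pairs. Both
    fields are hypothetical stand-ins for whatever the real system logs.
    """
    ases_by_url = defaultdict(set)
    for url_key, source_as in observations:
        ases_by_url[url_key].add(source_as)
    return {url for url, ases in ases_by_url.items() if len(ases) >= as_threshold}


# A company newsletter sent from 3 ASes, and a viral link forwarded by
# people in 25 different ASes (made-up data).
observations = [("newsletter", 100 + i % 3) for i in range(50)]
observations += [("viral-link", 200 + i) for i in range(25)]

flagged = flag_distributed_urls(observations)
print(flagged)
```

On this made-up data the newsletter passes, but the flash-crowd link is flagged purely because of how widely its senders are spread, which is exactly the case the paper says it never observed.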

Coding!

This paper was pretty cool! The basic idea is encoding packets for a bunch of destinations by XORing with packets that the clients at those destinations have sent. By doing so, wireless traffic congestion can be decreased pretty significantly, and there may be some possible multicast options to be explored there. Only one problem arises that I can think of: secure data.

It's already not very difficult to create a packet-sniffing device, so one wonders how secure data could be transferred over such an encoded network. I don't think this is a dealbreaker for typical Web surfing, video streaming (well, with piracy, maybe it is), or other low-security applications, but for something like banking networks, military applications (battlefield networks come to mind), or any other network containing sensitive data, it seems like broadcasting all packets mixed together could have some serious implications. With coding, it seems that there is a good chance that every node could reasonably figure out every other node's packet contents simply by overhearing what they send and receiving the coded response. This is a problem with wireless networks already, but it feels like there are even more variables when we actually rely on overhearing in the network.
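A minimal sketch of the idea, assuming two equal-length packets and a single relay (the packet contents and function name are made up, and the paper's actual scheme has more machinery than this). It shows both the saving, one broadcast instead of two, and the concern above: anyone who overheard one original packet can decode the other.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings together."""
    if len(a) != len(b):
        raise ValueError("packets must be the same length (a real system would pad)")
    return bytes(x ^ y for x, y in zip(a, b))


# Alice and Bob each send a packet to the other through a relay.
alice_pkt = b"hello bob!"
bob_pkt = b"hi, alice!"

# The relay broadcasts one coded packet instead of forwarding two.
coded = xor_bytes(alice_pkt, bob_pkt)

# Each endpoint XORs the broadcast with the packet it sent itself.
bob_receives = xor_bytes(coded, bob_pkt)      # recovers Alice's packet
alice_receives = xor_bytes(coded, alice_pkt)  # recovers Bob's packet

# An eavesdropper who overheard Alice's original transmission can do
# exactly what Alice does and recover Bob's packet.
eavesdropper_recovers = xor_bytes(coded, alice_pkt)

print(bob_receives, alice_receives, eavesdropper_recovers)
```

The XOR adds no secrecy of its own, so any confidentiality would still have to come from encrypting the payloads before they are coded.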

Well, that sounded like I'm coming down hard on this XOR coding. I do like it; I just have to wonder about its interaction with secure applications.

Monday, November 29, 2010

A wireless routing idea

Wireless routing seems to suffer from something akin to Heisenberg's Uncertainty Principle: the more one tries to measure or use a wireless path, the less reliable that path appears. This can lead to switching back and forth between the optimal path and a less optimal one.

A simplistic idea (and probably flawed, but it's a start) is to use more channels for wireless networking: separate the data channel from a status-information channel so that a node in a wireless network can report its status without affecting the current data flows. Granted, it would still need to account for interference, but at least we could remove one source of entropy in a proposed routing protocol.

Wireless networking and backwards compatibility

I suspect that wireless networking requires a more radical break from traditional thinking about networking protocols than has generally been proposed. Problems with routing and packet loss may stem from the fact that most of our thinking revolves around how we've solved problems in a wired, reliable environment. In logic, undermining the axioms an argument is based on casts the entire argument into doubt; in networking, changing the physical layer probably means we have to revisit every layer above that.

The sensible thing to do seems to be to start fresh from the bottom up and see what ends up making sense. Instead, however, we seem to be trying to kludge TCP, IP, etc. into something that will work for a physical layer that their designs never considered. I think wireless networking is going to require something entirely new, and interoperability with wired networks may just have to be achieved with new translator components. Any thoughts?

On reproducing results

One of the hallmarks of a valid scientific study is reproducible results. At the beginning of this month, a team of three of us attempted to recreate the results of this paper. While we were able to get pretty close, the task was made a little more difficult by not knowing what the parameters of their simulation were.

When a paper is published, there must always be some talk of methodology, and understandably, the details are left out so that the writers can get to the heart of their findings. This is appropriate, but I believe that a more-or-less obligatory appendix should be included, specifying exactly how the simulations or experiments being reported on were run, with an eye toward reproducing results. This might consist of a table of parameters and their values, or code, or perhaps simply a repository link where researchers could obtain and inspect code and documentation relating to the paper.
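As a rough sketch of what such an appendix could be generated from, a simulation script could dump its own parameters alongside its results. Everything here is hypothetical: the function name, the parameter names, and the values are invented for illustration and are not taken from the paper we tried to reproduce.

```python
import json


def parameter_manifest(params: dict, seed: int) -> str:
    """Serialize a run's parameters to a stable, diff-friendly JSON string."""
    return json.dumps({"seed": seed, "parameters": params}, indent=2, sort_keys=True)


# Made-up parameters for an imaginary wireless simulation.
manifest = parameter_manifest(
    {"num_nodes": 50, "tx_range_m": 250, "duration_s": 900},
    seed=42,
)
print(manifest)
```

Sorting the keys keeps the output identical from run to run, so two manifests can be compared with an ordinary diff.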

Perhaps it's just my own impatience, but I feel like it shouldn't be the work of an experiment itself to reproduce the results of a previously-performed experiment.