Wednesday, December 8, 2010

An experience with OMNeT++

Documentation in open-source projects is often a problem. In my latest experience with OMNeT++, an open-source network simulator, I have been trying to install the simulator along with the INET and MiXiM extensions on Ubuntu. To install, one would expect to read the INSTALL file, and it does indeed contain installation instructions. However, after I had spent hours failing to get the extensions to work, someone pointed out that the INET installation MiXiM depends on is not the standard one, but a custom branch of the INET project that is mentioned in a separate README file in the MiXiM archive. Argh.

On a related note, it looks like it will take some work to get INET to simulate wireless interference well. Open-source takes work, I suppose!

Another blow to end-to-end

NetFence attempts to prevent DoS attacks by making the network itself slightly more intelligent. The routers gain the ability to inspect and police sender traffic, as well as perform some security operations, in an effort to minimize the effect of DoS attacks. This is another violation of the end-to-end principle that seems to be beneficial. In fact, it raises the question of whether DoS is a direct consequence of the end-to-end principle.

A distributed DoS attack relies on a large number of hosts bombarding the target with constant requests. If the end-to-end principle is to be followed, then the responsibility for handling this attack must be placed entirely on the target. Since handling it involves inspecting a large amount of data and deciding what to do with it, the right approach seems to be a distributed, parallelized one. The natural way to accomplish this is with something like NetFence; we have to violate the end-to-end principle in order to handle this attack well. Again, it seems that end-to-end is a useful simplification, but a sometimes dangerous and obsolete one.
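To make "routers that police sender traffic" a bit more concrete, here is a minimal sketch of per-sender policing at an edge router using a token bucket. This is my own illustration, not NetFence's actual mechanism; the class names, the rate, and the burst size are all hypothetical.

```python
class TokenBucket:
    """Allows `rate` packets per second on average, with bursts up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now):
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class EdgePolicer:
    """Hypothetical edge router that keeps one bucket per sender."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.buckets = {}

    def admit(self, sender, now):
        if sender not in self.buckets:
            self.buckets[sender] = TokenBucket(self.rate, self.burst)
        return self.buckets[sender].allow(now)


policer = EdgePolicer(rate=10, burst=5)

# A polite sender: one packet per second for five seconds.
polite = [policer.admit("polite", float(t)) for t in range(5)]

# A flooder: 100 packets within one second.
flood_admitted = sum(policer.admit("flooder", i * 0.01) for i in range(100))
```

The polite sender never notices the policer, while most of the flooder's packets are dropped before they ever reach the target. The work of filtering is spread across whichever routers sit next to the senders, which is exactly the part that end-to-end would have pushed onto the victim.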

Failure to success conversion

ASTUTE was presented at SIGCOMM 2010. What struck me about the paper is that, judging from the introduction, the authors were trying to compete with Kalman filter-based and wavelet-based anomaly detection. They seemed to fail to compete with them directly, but they turned their failure to detect as much as Kalman into a success by observing that ASTUTE detects different anomalous behavior than Kalman does, and that the two are more successful working in tandem. The lesson to be learned is that research doesn't have to reach the conclusions hoped for to be useful.

Tuesday, November 30, 2010

Thinking about thinking about.

This is just a short post on some reflection I had today. Dave Ripplinger presented an interesting paper (as yet unpublished) that the IDeA lab is presenting at an upcoming conference (I didn't catch which one). In short, it framed improving wireless performance (goodput) as a mathematically based optimization problem, and it seemed pretty promising. It made me reflect a little on how we, as computer scientists, sometimes just want to barrel forward without thinking about how we're thinking about the problem. They seem to be getting some real gains in performance by thinking about the problems of wireless routing and interference differently than a network engineer typically would. So often we come up with a model for certain real-world behavior, but we don't always check whether there are other, equally valid models that would make a solution to our problem more readily visible.

Spam in time

Regarding this paper on spamming botnets again: there was a section on detecting future spam, yet in the final discussion, extension of AutoRE to real-time detection is left for future work. I think that the real value of this algorithm will be seen in how well it does on live, new data. The initial results are promising, so why not wait for the next conference and present on results of real-time detection? It could be that there are some political/financial reasons for the paper being presented as it was, but it does feel like we don't quite get to see the exciting conclusion.

Spam spam spam spam...

This is a paper about using regular expression URL matching to help detect email spam, especially spam campaigns. While they have a really interesting tool, I worry about the haste with which they dismiss some false positive sources:

"Legitimate emails sent by a big company advertising a product or event could also be bursty. But they will be unlikely sent from hosts spanning more than a few ASes. One false positive case could be email flash crowd, where people forward each other a few popular URL links. We expect such events to be very rare. In our experience of using three months of data and the source AS threshold of 20, we did not encounter a single such event."

Three months of data from a single email provider seems like a small sample set to conclude that they will not mislabel some sort of spontaneous/viral non-spam URL. The lack of "flash crowd" data could lead to real problems with the email providers' customers if they should begin to participate in such an event; we have no confidence from the results of this study that such an event would be differentiated. Additionally, I wonder about the large-companies-will-span-few-ASes assumption. Does that hold for CDNs? Global companies? I think that time and trends may need to be considered more here.
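To see why the flash crowd case worries me, here is a rough sketch of just the source-AS test from the quoted passage. It is not the paper's algorithm: it ignores burstiness and the regular expression generation entirely, the function name and the data are made up, and I am assuming the threshold of 20 means "at least 20 distinct ASes".

```python
from collections import defaultdict

AS_THRESHOLD = 20  # the source AS threshold mentioned in the quoted passage


def flag_distributed_urls(emails, as_threshold=AS_THRESHOLD):
    """Return the URLs whose senders span at least `as_threshold` distinct ASes.

    `emails` is an iterable of (url, source_as) pairs.
    """
    ases_by_url = defaultdict(set)
    for url, source_as in emails:
        ases_by_url[url].add(source_as)
    return {url for url, ases in ases_by_url.items() if len(ases) >= as_threshold}


# Synthetic data, purely for illustration.
emails = []
emails += [("http://spam.example/buy", asn) for asn in range(100, 125)]      # botnet, 25 ASes
emails += [("http://bigco.example/sale", asn) for asn in (7, 8)] * 50        # company mailing, 2 ASes
emails += [("http://funny.example/video", asn) for asn in range(300, 330)]   # flash crowd, 30 ASes

flagged = flag_distributed_urls(emails)
```

With this test alone, the company mailing passes, but the viral video link is flagged right alongside the botnet campaign. The paper's other signals may well separate the two; my point is only that three months without a flash crowd does not show that they do.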

Coding!

This paper was pretty cool! The basic idea is encoding packets for a bunch of destinations by XORing with packets that the clients at those destinations have sent. By doing so, wireless traffic congestion can be decreased pretty significantly, and there may be some possible multicast options to be explored there. Only one problem arises that I can think of: secure data.
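Here is a small sketch of the XOR idea as I understand it, for the simplest case of two clients exchanging packets through a relay. The names and payloads are mine, and a real implementation would also have to deal with headers, packets of different lengths, and knowing who holds which packets.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length packets byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


# Alice and Bob each send the relay a packet meant for the other.
alice_pkt = b"hello bob"
bob_pkt = b"hi alice!"

# Instead of forwarding two packets, the relay broadcasts one coded packet.
coded = xor_bytes(alice_pkt, bob_pkt)

# Each client XORs the broadcast with the packet it sent itself.
received_by_bob = xor_bytes(coded, bob_pkt)      # recovers Alice's packet
received_by_alice = xor_bytes(coded, alice_pkt)  # recovers Bob's packet
```

The relay delivers two packets with one transmission, which is where the savings in wireless traffic come from.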

It's already not very difficult to create a packet-sniffing device, so one wonders how secure data could be transferred over such an encoded network. I don't think this is a dealbreaker for typical Web surfing, video streaming (well, with piracy, maybe it is), or other low-security applications, but for something like banking networks, military applications (battlefield networks come to mind), or any other network carrying sensitive data, it seems like broadcasting all packets mixed together could have some serious implications. With coding, it seems that there is a good chance that any node could figure out every other node's packet contents simply by overhearing what they send and receiving the coded response. This is a problem with wireless networks already, but it feels like there are even more variables when we actually rely on overhearing in the network.
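A toy version of the worry, assuming payloads travel unencrypted and the eavesdropper sits within radio range of one of the senders (the payloads are invented):

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length packets byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


alice_pkt = b"PIN=1234"
bob_pkt = b"BAL=9876"

# The relay broadcasts the XOR of the two packets.
coded = xor_bytes(alice_pkt, bob_pkt)

# The eavesdropper overheard Alice's original transmission to the relay...
overheard = [alice_pkt]

# ...so the coded broadcast hands over Bob's packet as well.
recovered = [xor_bytes(coded, pkt) for pkt in overheard]
```

The eavesdropper never had to hear Bob directly. Overhearing one side of the exchange plus the broadcast is enough to get both.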

Well, that sounded like I'm coming down hard on this XOR coding. I do like it; I just have to wonder about its interaction with secure applications.