Wednesday, December 8, 2010

An experience with Omnet++

Documentation in open-source projects is often a problem. In my latest experiences with Omnet++, an open-source network simulator, I have been trying to install the simulator as well as INET and MiXiM extensions on an Ubuntu platform. In order to install, one would expect to read the INSTALL file. Indeed, this does seem to have instructions for installation. However, after hours of not being able to get extensions to work, it was pointed out that the INET installation MiXiM depends on is not the standard, but a custom branch of the INET project referred to in another README file in the MiXiM archive. Argh.

On a related note, it looks like it will take some work to get INET to simulate wireless interference well. Open-source takes work, I suppose!

Another blow to end-to-end

NetFence attempts to prevent DoS attacks by modifying the network to allow it to be slightly more intelligent. The routers gain the ability to inspect and police sender traffic as well as perform some security operations in an effort to minimize the effect of DoS attacks. This is another violation of the end-to-end principle that seems to be beneficial. In fact, it raises the question of whether DoS is a direct consequence of the end-to-end principle.

DoS, at least in its distributed form, relies on using a large number of hosts to bombard the target with constant requests. If the end-to-end principle is to be followed, then the responsibility for handling this attack must be placed entirely on the target of the attack. Since this involves inspecting and deciding what to do with a large amount of data, it seems the right approach would be a distributed/parallelized approach. The natural way to accomplish this is with an approach like NetFence; we have to violate the end-to-end principle in order to be able to handle this attack very well. Again, it seems that end-to-end is a useful simplification, but a sometimes dangerous and obsolete one.

Failure to success conversion

ASTUTE was presented at SIGCOMM 2010. What struck me about the paper is that, judging from the introduction, the authors were trying to compete with Kalman filter-based and wavelet-based anomaly detection. They seemed to fail to compete with them directly, but they turned their failure to detect as much as Kalman into a success by observing that ASTUTE detects different anomalous behavior than Kalman, and that working in tandem, the two are more successful. The lesson to be learned is that research doesn't have to reach the conclusions hoped for to be useful.

Tuesday, November 30, 2010

Thinking about thinking about.

This is just a short post on some reflection I had today. Dave Ripplinger presented an interesting paper (as yet unpublished) that the IDeA lab is presenting at an upcoming conference (I didn't catch which one). In short, it framed improving wireless performance (goodput) as a mathematically-based optimization problem, and it seemed pretty promising. It made me reflect a little on how we, as computer scientists, sometimes just want to barrel forward without thinking about how we're thinking about the problem. They seem to be getting some real gains in performance by thinking about the problems of wireless routing and interference differently than a network engineer typically would. So often we will come up with a model for certain real-world behavior, but we don't always seem to check and see if there are other, equally valid models that would present us with some more readily visible solution to the problems we aim to solve.

Spam in time

Regarding this paper on spamming botnets again: there was a section on detecting future spam, yet in the final discussion, extension of AutoRE to real-time detection is left for future work. I think that the real value of this algorithm will be seen in how well it does on live, new data. The initial results are promising, so why not wait for the next conference and present on results of real-time detection? It could be that there are some political/financial reasons for the paper being presented as it was, but it does feel like we don't quite get to see the exciting conclusion.

Spam spam spam spam...

This is a paper about using regular expression URL matching to help detect email spam, especially spam campaigns. While they have a really interesting tool, I worry about the haste with which they dismiss some false positive sources:

"Legitimate emails sent by a big company advertising a product or event could also be bursty. But they will be unlikely sent from hosts spanning more than a few ASes. One false positive case could be email flash crowd, where people forward each other a few popular URL links. We expect such events to be very rare. In our experience of using three months of data and the source AS threshold of 20, we did not encounter a single such event."

Three months of data from a single email provider seems like a small sample set to conclude that they will not mislabel some sort of spontaneous/viral non-spam URL. The lack of "flash crowd" data could lead to real problems with the email providers' customers if they should begin to participate in such an event; we have no confidence from the results of this study that such an event would be differentiated. Additionally, I wonder about the large-companies-will-span-few-ASes assumption. Does that hold for CDNs? Global companies? I think that time and trends may need to be considered more here.
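
To make the worry concrete, here is a minimal sketch of only the source-AS criterion from the quote. It is not the paper's algorithm (which also builds regular expression signatures and looks at burstiness); the (url, source_asn) input format, the function name, and the use of >= for the comparison are all assumptions made for illustration. Only the threshold value of 20 comes from the quote.

```python
from collections import defaultdict

# Threshold value taken from the quoted passage; treating it as
# "at least 20 distinct ASes" is an assumption.
AS_THRESHOLD = 20


def urls_exceeding_as_threshold(observations, threshold=AS_THRESHOLD):
    """Return the URLs seen from at least `threshold` distinct source ASes.

    observations: iterable of (url, source_asn) pairs (hypothetical format).
    """
    ases_by_url = defaultdict(set)
    for url, asn in observations:
        ases_by_url[url].add(asn)
    return {url for url, ases in ases_by_url.items() if len(ases) >= threshold}
```

Under this criterion alone, a popular link forwarded by people in 25 different ASes is indistinguishable from a spam campaign sent from 25 ASes, which is exactly the flash-crowd case the authors did not observe in their data.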

Coding!

This paper was pretty cool! The basic idea is encoding packets for a bunch of destinations by XORing with packets that the clients at those destinations have sent. By doing so, wireless traffic congestion can be decreased pretty significantly, and there may be some possible multicast options to be explored there. Only one problem arises that I can think of: secure data.

It's already not very difficult to create a packet-sniffing device, so one wonders how secure data could be transferred over such an encoded network. I don't think this is a dealbreaker for typical Web surfing, video streaming (well, with piracy, maybe it is), or other low-security applications, but for something like banking networks, military application (battlefield networks come to mind), or any other network containing sensitive data, it seems like broadcasting all packets mixed together could have some serious implications. With coding, it seems that there is a good chance that every node could reasonably figure out every other node's packet contents simply by overhearing what they send and receiving the response. This is a problem with wireless networks already, but it feels like there are even more variables when we actually rely on overhearing in the network.

Well, that sounded like I'm coming down hard on this XOR coding. I do like it; just have to wonder about its interaction with secure applications.
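
For anyone who wants to see the XOR trick and the overhearing concern side by side, here is a minimal sketch. The packet contents, the names, and the zero-padding of unequal lengths are assumptions for illustration; the real system has to deal with headers, scheduling, and deciding which packets to combine.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two packets byte by byte.

    The shorter packet is zero-padded (an assumption; a real scheme
    would also need to carry the original lengths).
    """
    n = max(len(a), len(b))
    a = a.ljust(n, b"\x00")
    b = b.ljust(n, b"\x00")
    return bytes(x ^ y for x, y in zip(a, b))


# Alice and Bob each send a packet through a relay (hypothetical payloads).
alice_pkt = b"hello bob"
bob_pkt = b"hi alice!"

# The relay broadcasts one coded packet instead of forwarding two.
coded = xor_bytes(alice_pkt, bob_pkt)

# Each endpoint recovers the other's packet using the one it sent.
bob_receives = xor_bytes(coded, bob_pkt)
alice_receives = xor_bytes(coded, alice_pkt)

# A third node that merely overheard Alice's original transmission
# can decode Bob's packet from the same broadcast.
eavesdropper_gets = xor_bytes(coded, alice_pkt)
```

The last line is the point of the security question: decoding requires nothing more than having overheard one of the original packets, so payload confidentiality would have to come from encryption above this layer.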

Monday, November 29, 2010

A wireless routing idea

Wireless routing seems to suffer from something akin to Heisenberg's Uncertainty Principle: the more one tries to measure or use a wireless path, the less reliable that path appears. This can lead to switching back and forth between the optimal path and a less optimal one.

A simplistic idea (and probably flawed, but it's a start) is to use more channels for wireless networking: separate the data and status-reporting channels such that a node in a wireless network can report its status without affecting the current data flows. Granted, it would still need to account for interference, but at least we could remove one source of entropy in a proposed routing protocol.

Wireless networking and backwards compatibility

I suspect that wireless networking requires a more radical break from traditional thinking about networking protocols than has generally been proposed. Problems with routing and packet loss may stem from the fact that most of our thinking revolves around how we've solved problems in a wired, reliable environment. In logic, undermining the axioms an argument is based on casts the entire argument into doubt; in networking, changing the physical layer probably means we have to revisit every layer above that.

The sensible thing to do seems to be to start fresh from the bottom up and see what ends up making sense. Instead, however, we seem to be trying to kludge TCP, IP, etc. into something that will work for a physical layer that it just hadn't considered in its design. I think wireless networking is going to require something entirely new, and interoperability with wired networks may just have to be achieved with new translator components. Any thoughts?

On reproducing results

One of the hallmarks of a valid scientific study is reproducible results. At the beginning of this month, a team of three of us attempted to recreate the results of this paper. While we were able to get pretty close, the task was made a little more difficult by not knowing what the parameters of their simulation were.

When a paper is published, there must always be some talk of methodology, and understandably, the details are left out so that the writers can get to the heart of their findings. This is appropriate, but I believe that a more-or-less obligatory appendix should be included, specifying exactly how the simulations or experiments being reported on were run, with an eye toward reproducing results. This might consist of a table of parameters and their values, or code, or perhaps simply a repository link where researchers could obtain and inspect code and documentation relating to the paper.

Perhaps it's just my own impatience, but I feel like it shouldn't be the work of an experiment itself to reproduce the results of a previously-performed experiment.

Saturday, October 30, 2010

Non-static networks

Randy Buck wrote this article for our discussion of the network layer in CS660. It was a really exciting discussion about how relaxing the requirement for finding the shortest path in a network could result in huge benefits in routing table size and routing efficiency (ironic, isn't it?). The downside, however, is that the benefits only work in theory on a static network, which the Internet very much is not -- nodes come and go all the time. I've been trying to think of some way around that problem, but nothing has panned out so far. Here's what I have thought of and the problems therewith:

1. Central registration server. No. Just no. That just becomes the bottleneck in the network, and everything still has to register or time out with it. Why did I even think of this one?

2. Assign nodes with physical location-based names, and have those locations pre-assigned for an entire namespace. Traffic could then try and hit a certain address, and they'd either get a response from it if something exists at that address, or they wouldn't. The problems here would still lie in trying to get something more human-readable, as well as with mobile devices. Perhaps we should only map routers/access points on the network?

3. Whatever somebody else puts here. It seems like a daunting task, and in an area of networking that I'm just not as up-to-date on.

Saturday, October 23, 2010

Ownership of the Pipe

Connected to in-class discussion of network neutrality, it was discovered that pretty much everyone there felt good about the idea of letting the consumer/business/other customer own the pipe going to their house/building, then allowing them to choose the ISP they would connect to at some central routing box. This seems like a great way to encourage competition without incurring the overhead and other potential problems associated with regulation. The problem is that certain largish companies have already paid at least a significant portion of the cost to put infrastructure in, and they should be allowed to profit from that. So how do we resolve these ideals? I'm not sure of the answer, but here are a few ideas that were postulated:

1. After a certain amount of time (which has perhaps already passed), the companies have made back their investment plus a reasonable profit, so ownership should pass to someone else (preferably the building owners).

I see a few problems with this approach. For one, it feels a little too close to the principle of nationalization for my liking. Secondly, I don't think it would fly with my understanding of the way things work in the US. Since the companies are losing property, they should be compensated equitably for their loss. That leads me to solution #2:

2. Some plan could be established whereby customers could purchase the pipe going to their house.

Again, I'm not sure about all the details, but I like the principle of this one a little more. The company can be compensated for the effort they exerted building the infrastructure, but either directly or indirectly (perhaps via the municipality), the customer could gain control of their "last mile" pipe and foster competition. Questions I have about this approach are what happens if an ISP doesn't want to sell their pipe, and how will the pricing work?

3. A competing infrastructure could be established.

This seems wasteful both in terms of redundant infrastructure and in tax money already spent subsidizing the existing infrastructure, but it may be the only way to fairly guarantee competition. It might be useful for cities to use their resources to build the infrastructure in an ISP-neutral way, but the cities large enough to afford it also have enough bureaucracy to not be able to do it well.

Anyone else have some thoughts? I'd love to see more competition in ISPs one day.

Thursday, October 21, 2010

Network neutrality and panic

Reading Douglas A. Hass' 2008 paper on network neutrality was an interesting experience for me. I've never been a big fan of government regulation anyway (I see it as a last resort), but the paper made some interesting arguments regarding history: there isn't any to justify the need for net neutrality.

I had never thought about AOL in the context of them being the very monopolistic entity that proponents of net neutrality regulation fear, but this is due to the fact that they failed as that entity! At least up to this point, ISPs that have gotten "too big for their britches" have had huge losses in their customer base (quite a few million in AOL's case), indicating that competition is alive, well, and keeping ISPs from abusing their customers. Until that changes for some reason, I don't see any need to incur the monetary and other costs of increased government regulation on the Internet.


Thursday, October 14, 2010

More thoughts on end-to-end

I was reading this paper (actually, the version at the top of that page) and noticed how my previous comments on the end-to-end principle seem to be justified in a paper touting the need for end-to-end congestion control and TCP friendliness. A major observation here is that with the Internet as large and diverse as it is (and this was written in 1999), we cannot simply trust that everyone on the network will play nice. A quote from the conclusions in the paper:

"We have argued in this paper on the need for end-to-end congestion control, and further, on the need for mechanisms in the network to detect and restrict unresponsive or high-bandwidth best-effort flows in times of congestion. Such mechanisms would provide a incentive in support of end-to-end congestion control for best-effort traffic."

A major portion of this paper recommends the use of network mechanisms to restrict non-compliant (TCP-unfriendly) traffic on the network, thus encouraging others to play fair. At the same time, it is also mentioned how providers could allow premium rates (amazing how this has all played out) that would follow different rules. My point is simply this: whether or not we intended to, we have already violated the end-to-end principle out of necessity. I'm not sure that's a bad thing, as long as we do it in a straightforward and fair way - yes, there will be added complexity in the network, but so long as we have something akin to a standard API for other layers to interact with, this could be a strength.

Saturday, October 9, 2010

End of End-to-End

The original objectives for the Internet included an "end-to-end" principle for simplicity, and perhaps therefore for reliability. I think that keeping a system simple has an incredible amount of value in it, but I think that it is possible to take principles that keep one system simple to an extreme, with the result of an unintentionally complex system. We may have reached this point with our networks.

This paper is an example of using a bit in packet headers that only routers set in order to improve congestion control. Through this and other papers, it seemed pretty clear to me that network performance was improved drastically by the addition of a little reporting logic at the router level. This, however, violates the end-to-end principle. So much effort has been put in to do research on improving the prediction of network congestion that we have more or less ignored the fact that we can just find out how bad it is at any given time.

Improving the network is more costly than changing a protocol on client machines, but I've noticed that it happens anyway. People want faster speeds, bigger pipes -- why not use router congestion control as well? Let's not forget that the end-to-end principle is a means to the end of simpler, more robust networks. If it begins to hamper that end, perhaps we should re-evaluate our devotion to that principle.
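
As a rough illustration of how little logic the router-side bit needs, here is a minimal sketch. It is not the scheme from the paper: the instantaneous queue-length threshold, the class and function names, and the halve-or-add-one sender response are all assumptions chosen to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int
    congestion_bit: bool = False  # only routers ever set this


class Router:
    """Toy FIFO router that marks packets instead of guessing at congestion."""

    def __init__(self, mark_threshold: int):
        # Hypothetical parameter: queue length at which marking starts.
        self.mark_threshold = mark_threshold
        self.queue = []

    def enqueue(self, pkt: Packet) -> None:
        # The router knows its own queue, so it simply reports it.
        if len(self.queue) >= self.mark_threshold:
            pkt.congestion_bit = True
        self.queue.append(pkt)

    def dequeue(self) -> Packet:
        return self.queue.pop(0)


def next_window(window: int, congestion_bit: bool) -> int:
    """Sender reaction (assumed): halve on a marked packet, else grow by one."""
    return max(1, window // 2) if congestion_bit else window + 1
```

The endpoints still make the decisions about how fast to send; the network only contributes a one-bit observation that the endpoints would otherwise have to infer from loss or delay.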

Thursday, October 7, 2010

Fairness and Adoption

We discussed a point in my networking class that I found interesting -- transport protocols in development are extremely concerned with being fair to TCP. I was thinking about whether fairness is necessary or not, and I came to the conclusion that some degree of fairness certainly is -- a network will lose users quickly if transport protocols are so aggressive that one user takes up most/all of the traffic at any given point. Real-time applications, at the very least, would die, and the network would be perceived as unreliable (unless one user consistently got the whole network; then he'd be pretty happy). On the other hand, if a new protocol is fair with other implementations of itself, but not with TCP, it could encourage migrating to the new protocol (hopefully with some benefits connected thereto!). So what's the appropriate level of fairness?

The problem, as I see it, is most visible on large networks with non-technical people, like the Internet. Users who haven't upgraded their OS of choice for whatever reason may find that their TCP connections are wholly unreliable compared to what they used to be, and complain to the network administrators (or ISPs in this case). Enough complaints, enough lost business, and suddenly the new protocol might find itself prohibited or hobbled in some way on the network so that current customers will be happy. Then again, they might simply force their customers to upgrade the protocol on the client side. My question then becomes, "is it appropriate to force users to do a certain thing that they did not agree to on their own machines?" My gut says no, but short of running side-by-side networks, is there some compromise that would solve this issue? I don't yet know.

Saturday, October 2, 2010

Peer selection is interesting.

I'm quite glad that I chose to review a paper on peer selection for our last round of applications papers. Peer selection seems to be an area I can really sink my teeth into – it bears similarity to more general algorithmic optimization problems that I enjoy solving. As I seek out possible master's thesis topics, I find myself a bit drawn to the application layer in networking. Prior work in custom-criterion tree searching may prove helpful here.

PlanetLab status histories

PlanetLab is a cool idea – a bunch of interested parties contribute machines to a planet-wide network, allowing them all to request a slice and perform larger-scale experiments than they would otherwise be able to do. One problem that I've noticed as we have tried to experiment with a slice is that there are really no guarantees of availability – our slice was assigned up to a little over a thousand nodes, and we selected a subset of somewhere around 130 nodes. While testing our experiment scripts, we found that a number of those nodes, while reported available and working, did not respond to our SSH requests. Kind of a pain – we want to hit an exact number for this experiment, so knowing which nodes we can rely on seems pretty critical. I would love to have more information on a history of reliability of nodes on PlanetLab rather than the current “responding” sort of status indicator they show. Maybe they do it already, but I haven't seen it, and if not, it doesn't seem like keeping a history of statuses should be too difficult.
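
Something along these lines is what I have in mind, as a minimal sketch. The class, its method names, and the fixed-size probe window are hypothetical and not part of any PlanetLab tool; the probe itself (for example, whether an SSH connection succeeded) is left to the caller.

```python
from collections import defaultdict, deque


class NodeHistory:
    """Keep the last `window` probe results per node and rank nodes by them."""

    def __init__(self, window: int = 100):
        # Each node gets a bounded history; old probes fall off the front.
        self._probes = defaultdict(lambda: deque(maxlen=window))

    def record(self, node: str, responded: bool) -> None:
        self._probes[node].append(responded)

    def availability(self, node: str) -> float:
        """Fraction of recorded probes the node answered (0.0 if never probed)."""
        probes = self._probes.get(node)
        if not probes:
            return 0.0
        return sum(probes) / len(probes)

    def most_reliable(self, count: int) -> list:
        """Return up to `count` nodes, best availability first, ties by name."""
        ranked = sorted(self._probes, key=lambda n: (-self.availability(n), n))
        return ranked[:count]
```

With a record like this, picking an exact number of nodes for an experiment becomes a matter of asking for the top N by history rather than trusting a single "responding" flag.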

Friday, September 24, 2010

Workshops vs. papers

Maybe it's bad in research, but I find that I'm a practically-minded guy. I find that I prefer terse papers to some of these 12-14 page ordeals I find in SIGCOMM, etc. Not that they aren't useful, but sometimes it seems like they just take forever to get to the point, requiring a Herculean effort of concentration to see what the problem and their solution are.

My suggestion would be to collectively let go of the stigma of short papers - if somebody did something cool in one page, then let them do it in one page! It could just be that network researchers aren't good at English communication, but I believe that's something that can be learned. We just need to have the drive to. Then again, looking at my own blog posts, maybe I need to worry about the beam in my own eye!

I read this paper for my Applications paper in our networking class, and I was struck by the difference in approach that applications research could take compared with what our last area, architecture, could do. They were able to create an add-on for a popular BitTorrent client that would allow them to use real peers for their research, giving them a large test base practically overnight and giving the peers the benefits of the research instantly. What a great situation to do research in! If only there were some way for architecture to be able to get "buy-in" from network users so easily...

Saturday, September 18, 2010

DNS for people

We read a lot of papers that want to change the way naming on the Internet works. Often the goal seems to be to allow users to connect just to another user, no matter where they are. I think it would be interesting to try some kind of mapping from unique identifiers of devices (MAC addresses, perhaps) to a user entity on the network, such that packets for that user are broadcast to all registered devices on the network. This may have been done (I'm still without Internet access as I write this), but it is something to think about. Registration would require a centralized or semi-centralized solution (super-peers) because flooding the network with requests just gets to be too much for most networks. Target devices could respond with either “don't talk to me, I'm not in use” or with the expected responses to a given packet.

The next question is whether this is more or less efficient than our current DNS setup. That will require research. I find, however, that with services like Google Voice, there does appear to be a demand for such a user location service (location on the network, at least – it might be nice to prevent system abuse by stalkers).
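
To pin the idea down a little, here is a minimal sketch of a single registration point. Everything in it is hypothetical: the class, the method names, the string responses, and the choice to model "in use" as a flag held by the registry rather than something each device answers for itself.

```python
from collections import defaultdict


class UserRegistry:
    """Map a user entity to the devices (by MAC address) registered for it."""

    def __init__(self):
        # user -> {mac_address: in_use flag}
        self._devices = defaultdict(dict)

    def register(self, user: str, mac: str, in_use: bool = True) -> None:
        self._devices[user][mac] = in_use

    def set_in_use(self, user: str, mac: str, in_use: bool) -> None:
        if mac not in self._devices.get(user, {}):
            raise KeyError(f"{mac} is not registered for {user}")
        self._devices[user][mac] = in_use

    def deliver(self, user: str, payload: str) -> dict:
        """Send `payload` to every device of `user` and collect the replies."""
        replies = {}
        for mac, in_use in self._devices.get(user, {}).items():
            # Idle devices decline; active ones give the expected response.
            replies[mac] = f"ack:{payload}" if in_use else "not in use"
        return replies
```

A sender addresses the user, not a machine; whichever device is active answers, and the rest decline.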

Peer-to-peer networking

Does anyone know if a distributed (peer-to-peer) file system has been proposed as a form of file version control? It seems like that would be a nice way to have both redundant backup as well as reduce reliance on a trunk server. I'd research this more, but I'm cut off from Internet access at the moment, so I'll just describe what I think would be cool at a high level:

Versioned files could be stored and tracked via Gnutella-like super peers, none of which would need to contain the entire trunk, but they COULD have redundancy if it was desired. Deltas seem like they'd be trivial to include in the hierarchy, and thus you'd have a full versioning system distributed across your network.

Saturday, September 11, 2010

What usage should Internet architecture focus on?

Internet architecture research aims to improve the infrastructure of the Internet, but with respect to what? I think most would say that we should meet the most common needs, both current and foreseen, but on thinking about what those needs are, it seems that even that is a difficult question.

Accesses to given services may be a good measure, but improving the network's ability to serve small content fast may harm services that transfer large amounts of data, such as video or possibly gaming. Well-intentioned efforts in enabling easier network research may cause difficulties for those pursuing network security. The ever-present problem of network architecture research is, then, the inherent alteration of the status quo for every endeavor, field, or application built on the current architecture. While we can (and, I think, should) try and minimize the impact that architecture changes would have on current usage, the fact of the matter is that to do anything very meaningful with such a fundamental change as architecture, adaptations will be needed at every level of the current architecture. This may be why changes to the way we use the Internet happen incrementally - too much resistance to universally disruptive change. Testing issues make network architecture research difficult; implementation issues threaten to make it prohibitively impractical.

Thursday, September 9, 2010

Side-by-side network testing

I was reading a whitepaper on an idea called OpenFlow. In short, it's a proposed specification to allow campus networks to run both traditional and experimental architecture side-by-side, allowing researchers to experiment with real data in real networks while not disrupting normal users'...well, usage. They focused on experiments with traffic controlled by one researcher, but I believe this could be expanded.

It looks like the OpenFlow switch could be configured to replicate traffic on the production network. If this is the case (and the traffic in question is not sensitive/whatever other privacy concerns there be are addressed), it seems like it shouldn't be too difficult to set up a replica network with virtual users. A few more powerful machines (each representing a number of actual and potential users) could be dedicated to simulating the same traffic that occurs on the production network and thus help determine how an experimental protocol or architecture idea would perform in the real world. What could be a better test than real data?

Saturday, September 4, 2010

Briefly on Internet Architecture Research

I've been thinking about the motivation for Internet architecture research, and why it seems that no for-profit companies are terribly interested in it right now (at least not compared with NSF, etc.). Most of us tend not to replace the car until it's becoming pretty clear that it needs replacing; parts break (frequently) or it fails to meet changing needs (the curse of the minivan for young families). I think that at the moment, our current Internet architecture tends to meet the needs of users reliably, and thus few entities are willing to fund research into changing that working architecture.

We probably shouldn't condemn anybody for not wanting to fix what ain't broke, but I am glad that there is research going on in a seemingly unwanted area. At some point, there will be breakage (I'm thinking due to problems of scale) or changing needs, which I'm not going to even try to predict. We're already working on the problems we can foresee, and I believe that the experience and knowledge gained will be useful in helping us cope with new and unforeseen demands as well. For instance, we have discovered that testing new network architecture ideas on a large scale is difficult -- networks that big tend to get annoyingly useful to somebody who will complain if the experiment fails and the network is down for a while. Thus we have research in the style of DONA, using techniques that will allow us to use parts of existing architecture to make any necessary transition as painless as possible.

In summation, I quote the Heavy Weapons Guy: "Good job, everyone!"

Thursday, September 2, 2010

DONA

Having read "A Data-Oriented (and Beyond) Network Architecture" from SIGCOMM 2007, I am both intrigued and skeptical about the idea. Intrigued because I think this could be an interesting way to route data over a network and avoid connection loss, but a bit skeptical because I fear the increased complexity for the user may prevent its adoption. Granted, these may be somewhat naive thoughts, but here they are anyway:

From the article: "Today, however, the vast majority of Internet usage is data
retrieval and service access, where the user cares about content
and is oblivious to location. That is, the user knows that she wants
headlines from CNN, or videos from YouTube, or access to her
bank account, but does not know or care on which machine the
desired data or service resides." This is true; the users, by and large, just want to get at the content they're interested in, wherever it resides. One could probably extend this mindset and say that the users just want to have an easy way to get to content.

The problem I see is briefly addressed later in the paper:
"...How will users learn these
flat, long, and user-unfriendly names? We expect that users will
learn these flat names through a variety of external mechanisms that
the user trusts... Users won’t,
of course, remember the flat names directly, but will have their own
private namespace of human-readable names, which map onto these
global and flat names... While such flat names are harder
to use than today’s DNS names, they offer the advantage that the
mappings between private human-readable names and flat names
will be free to reflect evolving social structures rather than being
tied, as DNS names are, to a fixed administrative structure."

If the user is required to put more effort into managing oft-frequented sites, I believe there would be some more resistance to the use of this system, at least until something that sounds similar to DNS is implemented - a large, standard mapping of human-readable names to flat names. The problem there is that we're back to the very problem that DONA was trying to solve! I believe that as an architectural improvement, DONA could be quite valuable, but I fear that their apparent desire to replace DNS could not be realized because a less technically-sophisticated public would demand convenience. I am reminded of "Why Johnny Can't Encrypt," by Whitten and Tygar, where a user study found that PGP encryption was not widely adopted because of increased end-user complexity. Users, in the end, want networks to function with a minimum of input from their end, and I believe there would be a great deal of resistance to a change that increased the effort required of those users.

[As an aside: I really hate when people switch the misperceived-as-sexist-but-grammatically-gender-neutral "he" for the most-assuredly-sexist-but-in-the-nonstandard-way "she." Maybe I'll discuss that sometime later.]