Saturday, December 4, 2010

The research paper Spamming Botnets: Signatures and Characteristics describes a forensic method for analyzing email server traffic to recognize spam originating from botnets and to detect botnet-infected hosts. While, as of the paper's publication, no effort had been made to create a real-time version of the algorithm, the uses of such a tool are very intriguing. Some possible uses include:

  • An email service could better screen spam email as it arrives. The researchers found that the email patterns of botnet spam campaigns were highly correlated with recent campaigns occurring within the previous month (see the sketch after this list).
  • Researchers or whitehat hackers could use the information to discover botnet control servers and help bring them down faster.
  • An ISP could learn which of the hosts it provides service to are under botnet influence. It could contact the affected customers directly about the issue or add firewall rules to disrupt the communication between infected hosts and their control servers.
  • An email service could contact a host suspected of botnet infection and warn its owner that the host recently sent a fishy email. The owner could then confirm with the email service that the email was in fact legitimate or otherwise take steps to purge his or her machine of botnet influence.
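
Although the paper itself stops at forensic analysis, a real-time version of the first idea above might look roughly like the toy sketch below. This is purely my own illustration, not the authors' system: the campaign signatures, domains, and thirty-day window are all made up, and the hard part in practice is deriving good signatures automatically.

```python
# Toy illustration (not the paper's system): screening incoming mail against
# URL signatures learned from spam campaigns seen within the last month.
import re
from datetime import datetime, timedelta

# Hypothetical campaign signatures; a real system would derive these automatically.
recent_campaigns = [
    {"signature": re.compile(r"http://pharma-deal\d+\.example\.com/\w+"),
     "last_seen": datetime(2010, 11, 20)},
    {"signature": re.compile(r"http://cheap-watch\.example\.net/offer\?id=\d+"),
     "last_seen": datetime(2010, 11, 28)},
]

def looks_like_botnet_spam(urls, now, window=timedelta(days=30)):
    """Flag a message if any of its URLs matches a campaign seen recently."""
    for campaign in recent_campaigns:
        if now - campaign["last_seen"] > window:
            continue  # stale campaign, ignore it
        if any(campaign["signature"].match(u) for u in urls):
            return True
    return False

print(looks_like_botnet_spam(
    ["http://pharma-deal42.example.com/buy"], datetime(2010, 12, 4)))  # True
```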

Of all the uses of such a botnet detector, the latter two seem the most interesting because they would allow ISPs and other services to address the botnet problem from the victims' end in a legal and ethical manner. The increase in accurate spam filtering and the reconnaissance information that could be provided to security researchers also seem very valuable.

Thursday, December 2, 2010

The virtue of ASTUTE

ASTUTE is a very recently published automatic detection method for sensing anomalous traffic over a network link, based on the assumption that, in normal traffic, flows are independent and the net change across all flows over time is close to zero. The authors of ASTUTE show that the method is significantly better than previous ones at discovering deviant traffic that involves many small flows, whereas previous methods perform better at finding anomalies caused by a few large flows. From my initial reading, ASTUTE seems especially suited to discovering a variety of network issues, including misconfigured or misbehaving applications, saturated links, problems with BGP routing, offline servers and orphaned clients, and so on. This seems to be because ASTUTE looks at TCP traffic at a very high level, makes certain assumptions about how TCP flows behave in aggregate, and does little to analyze individual flows. ASTUTE's strengths appear to make it an excellent tool for trying to understand network traffic at a very high level over time.
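
To make that core assumption concrete, here is a rough sketch of the kind of equilibrium check I understand ASTUTE to perform between two consecutive time bins. It is only my simplification; the paper's actual statistic, binning, and thresholds differ, and the per-flow deltas below are invented.

```python
# Rough sketch of the equilibrium check I understand ASTUTE to rely on --
# a simplification, not the authors' exact statistic or thresholds.
import math

def equilibrium_score(deltas):
    """deltas: per-flow change in volume between two consecutive time bins."""
    f = len(deltas)
    mean = sum(deltas) / f
    var = sum((d - mean) ** 2 for d in deltas) / f
    if var == 0:
        return 0.0
    # Under the "many independent flows, near-zero net change" assumption this
    # score stays small; a large value suggests many flows changed together.
    return math.sqrt(f) * mean / math.sqrt(var)

normal = [+3, -2, +1, -4, +2, -1, +1, 0, -1, +1]    # changes roughly cancel out
anomaly = [+5, +4, +6, +5, +7, +4, +5, +6, +5, +4]  # many small flows shift together

print(equilibrium_score(normal))    # small magnitude -> looks like equilibrium
print(equilibrium_score(anomaly))   # large magnitude -> flagged as anomalous
```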

On the other hand, ASTUTE seems less well equipped to serve as a security measure for detecting malicious flows. First of all, bots in a botnet are often spread out across multiple networks, so most of the collective botnet's activity is unlikely to be visible on any single monitored link. Second, most malicious software, with the exception of port scanners, doesn't do anything that would look abnormal at the TCP flow level. For instance, how would the communication of nodes in a botnet look different from hosts interacting with a video game server? How would distributed file transfer between infected hosts be distinguished from a legitimate peer-to-peer file transfer? As a final example, how would a single-host application like a keylogger appear different from a chat or VoIP client that works through an intermediate server?

Overall, ASTUTE seems well positioned to assist a network administrator in understanding many types of traffic in his network, even if not the malicious kinds.

Thursday, November 11, 2010

At the end of our discussion in class on XORs in The Air: Practical Wireless Network Coding, I began to think about the feasibility of a commercial implementation of the idea behind the article. This article proposes applying network coding techniques to wireless networking to opportunistically increase the amount of data that can be sent in one transmission. It holds great potential because transmission time is a scarce resource that can't be scaled up with added hardware. Furthermore, use of the technique is orthogonal to many other strategies for improving network efficiency. The idea involves nodes listening to and temporarily recording all packets they can hear, including those not meant for them. Then, a transmitting node can XOR new packets with previously transmitted ones to send more than one packet simultaneously. Listeners can XOR the transmitted packet with packets they have stored in memory to obtain their desired payload.
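
A tiny sketch of the XOR trick follows, in my own simplified form; the node names and payloads are made up, and the real protocol handles many details I ignore here, such as unequal packet lengths and tracking which neighbors have overheard what.

```python
# Toy sketch of the XOR trick from "XORs in the Air" -- my own simplification,
# not the paper's implementation.
# Relay R holds packet A (destined for node 2) and packet B (destined for node 1).
# Node 1 previously overheard A; node 2 previously overheard B.
def xor_bytes(x: bytes, y: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(x, y))

packet_a = b"payload for node 2 "
packet_b = b"payload for node 1!"          # same length for simplicity

coded = xor_bytes(packet_a, packet_b)      # R broadcasts A XOR B once

# Each receiver XORs the coded packet with the packet it already overheard.
recovered_at_node_1 = xor_bytes(coded, packet_a)   # node 1 overheard A, wants B
recovered_at_node_2 = xor_bytes(coded, packet_b)   # node 2 overheard B, wants A

assert recovered_at_node_1 == packet_b
assert recovered_at_node_2 == packet_a
```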

The technique faces several important technical hurdles on the path to commercial success. The main issue seems to be that all nodes participating in a network that uses network coding require custom software in their network stack. This leads to significant deployment challenges; specifically, standard 802.11 nodes can't join the network unaided. This problem could be mitigated by requiring only infrastructure nodes in multi-hop networks to use network coding, but doing so would also limit the opportunities for optimization by network coding. Second, the benefits of network coding vary widely with factors such as network topology and current traffic and congestion levels, so it may be difficult to predict how much the average network would benefit. Third, the technique increases the complexity of the network software stack, as nodes must now store extra network-specific state in the form of past packets. Lastly, network coding software might have a significant negative impact on node power consumption, as nodes are now required to listen to every packet that is broadcast in the network.

In light of the above hurdles, I believe the potential benefits of network coding as presented in the paper are not worth the costs. That is not to say the paper is not successful; it combined novel ideas and inspired a host of exciting new research in wireless networking. However, more work would need to be done to integrate network coding successfully into live wireless networks.

Overcoming the Challenges of Wireless Networking

Wireless networking poses unique challenges when compared to traditional networking. Two issues in particular are at the heart of these difficulties: every act of communication involves broadcasting, and packet loss is much more frequent and dynamic. The broadcast nature of wireless networks means that the medium of communication, air, is precious, as it must be shared by everyone. Also, it is very difficult for nodes to know whether or not adjacent nodes are available for communication, since an adjacent node may be busy communicating with a third node that the first node can't detect. Lastly, the packet loss characteristics of wireless communication, in combination with TCP, can lead to uneven resource availability and even outright starvation, especially with multi-hop communication.

That said, researchers have applied their creative intellects to making the best of a fairly harsh networking environment. For instance, one paper proposed a change to the core of a multi-hop network so that nodes work together to enforce fairness constraints at each hop. While the algorithm could slightly reduce overall bandwidth utilization, it significantly reduced resource starvation and uneven resource usage between nodes. In another paper, a group of researchers proposed encoding multiple packets into a single transmission by XORing them together. This method requires nodes to remember recent packets they have heard, as well as track other neighbor state, but could yield as much as a 4x improvement in bandwidth usage. Lastly, we learned of a proposed technique that allows nodes to cooperate with one another in sending packets across a multi-hop network. When a node sends a batch of packets, any node that overhears them can help carry them across the network, so that even if the intended next hop fails to hear some packets, nearby neighbors can pass them along on its behalf.
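
As a toy illustration of that last idea (my own simplification with a made-up topology, not code from any of the papers), the overhearing node closest to the destination can take over forwarding:

```python
# Toy sketch of opportunistic batch forwarding: whichever node that actually
# heard a packet is closest to the destination passes it on.

# Hypothetical topology: hop count from each node to the destination D.
hops_to_dest = {"A": 3, "B": 2, "C": 1, "D": 0}

def choose_forwarder(heard_by):
    """Among the nodes that overheard the packet, pick the one nearest D."""
    return min(heard_by, key=lambda node: hops_to_dest[node])

# The intended next hop B missed the packet, but C overheard it and is
# closer to D anyway, so C forwards it on B's behalf.
print(choose_forwarder(["A", "C"]))   # -> "C"
```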

These innovative solutions showed me just how difficult improving wireless networking and extending its use cases can be. At the same time, they showed me that difficult problems can provide fertile ground for creative thinkers to apply themselves.

Saturday, October 30, 2010

NIRA is a new inter-domain routing architecture that allows a sender to encode the route a packet should take to arrive at its destination. Currently a sender can only encode the destination of a packet; the routers that carry the packet to its destination decide the path it should take at each hop. User choice in packet routing could provide the infrastructure needed to offer several new services on the Internet. Some new services I was able to think of include:
  • ISPs could allow users to choose between multiple backbone providers and switch between providers at will. This could lead to increased competition among backbone providers as users become able to adjust their Internet plans dynamically based on price, quality of service, and other factors.
  • ISPs could specialize in different QoS offerings for voice, video, gaming, and other uses of the Internet for which standard TCP connections provide a less-than-optimal user experience. Users could then purchase standard Internet service over one route but purchase a different route for specific applications like video and voice conferencing or online games.
  • Route choice could provide incentives to help ISPs transition to pay-as-you-go services. ISPs are currently struggling to remain profitable under the unlimited monthly service model due to the uneven distribution of actual Internet usage among users. By allowing users to purchase from multiple providers offering different QoS levels for different applications, route choice may acclimate users to the idea of paying directly for the services that they use.
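
To make the route-choice idea above concrete, here is a toy sketch in the spirit of NIRA. It is only my own illustration: the provider names are invented, and the real protocol encodes a chosen route compactly in a hierarchical IPv6 source/destination address pair rather than carrying explicit provider lists.

```python
# Toy illustration of user route choice in the spirit of NIRA -- my own
# simplification, not the protocol itself.

# Hypothetical provider chains ("up-graphs") from the user and from the
# destination up to the Internet core.
user_up_paths = [
    ["user", "local-isp-1", "backbone-A"],
    ["user", "local-isp-2", "backbone-B"],
]
dest_down_paths = [
    ["backbone-A", "dest-isp", "dest"],
    ["backbone-B", "dest-isp", "dest"],
]

def candidate_routes():
    """Join every up-path with every down-path that meets it at the core."""
    for up in user_up_paths:
        for down in dest_down_paths:
            if up[-1] == down[0]:           # the two halves share a core provider
                yield up + down[1:]

for route in candidate_routes():
    print(" -> ".join(route))
# The user (or software acting on their behalf) would then pick one route,
# e.g. by price or quality of service, instead of leaving it to each router.
```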

For me, the largest barrier to a technology such as NIRA succeeding in the current world is usability. Right now users don't have to think at all about the route their packets take through the Internet. If a new technology such as NIRA forces this choice on end users, then it must offer benefits significant enough to offset the new burden it places on them. More likely, existing technologies like browsers and browser plugins would be augmented to make the choice intelligently for users so that they don't have to think about it beyond, possibly, an initial setup.

It's hard to tell from a superficial analysis how beneficial route choice would be for the growth of the Internet and for society as a whole, but it is an interesting feature to ponder new possibilities with.

Tuesday, October 26, 2010

Architecture Design Trade-offs

Most new networking designs have trade-offs; they offer new features and improvements but bring with them their own unique costs and caveats. It is little wonder, then, that agreement cannot be reached on how best to evolve the architecture of the Internet. The research paper on NIRA, a new inter-domain routing architecture, is one example of a proposed Internet upgrade. The proposal is elegant, relies on existing ideas and deployments like IPv6 as much as possible, and offers features that could be useful for society. Specifically, the protocol aims to give users the ability to choose among the available routes between a source and a destination. As a side effect, it would also exhibit improved scalability relative to the existing inter-domain routing protocol, BGP.

Of course, NIRA is not without potential flaws. First, it allocates IPv6 addresses in a specific way, which limits other potential uses for the same address space. Second, its protocol for communicating topology changes and user up-graphs has a per-message overhead linear in the number of domain-level links in a user's up-graph. Third, the network state a user must maintain is theoretically exponential in the depth of the user's up-graph. And last of all, if NRLS updates, analogous to DNS updates, are frequent, they could also cause scalability issues as the Internet grows.
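
To get a feel for the third concern, here is a back-of-the-envelope calculation of my own; the numbers are invented and say nothing about typical real-world up-graphs, but they show how quickly the state could grow if every domain in a d-level up-graph had k providers.

```python
# My own toy calculation, not from the NIRA paper: route-state blow-up when
# each domain has k providers and the user's up-graph is d levels deep.
for k in (2, 3):
    for d in (2, 4, 6):
        print(f"k={k} providers per domain, depth {d}: ~{k ** d} uphill routes")
```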

On one hand, the authors argue effectively that all of the above potential issues are unlikely to be realized in practice. On the other hand, the protocols of the original Internet also met their most important design goals, but at the same time have been stretched in ways their original authors never imagined. Is NIRA's added value important enough to integrate in the next Internet? Would its design weaknesses eventually become serious problems? As with the current Internet architecture, only time will tell!

Saturday, October 23, 2010

The Lawmaker's Dilemma

Reading economic papers on network neutrality and discussing the issues in class made me wonder how lawmakers manage to get anything done. The issues are deep and complex, and they span multiple fields, including technology, economics, and politics. As a computer scientist I have a fairly deep understanding of the technical issues behind net neutrality and, from that perspective, of the effects such regulation would have on software companies and network providers, but I admit ignorance of most of the economic arguments brought to bear by Yoo, Lessig, and others. Interestingly, in one Berkeley paper I read, I found significant fault with several of the technical claims and with how they were interpreted in the debate.

A lawmaker with potentially limited understanding of the issues, or, more importantly, the ramifications of his or her decisions, seems to have no way of rendering a correct verdict given the conflicting views presented by different schools of thought on net neutrality. If the intellectuals can't come to a solid agreement, should lawmakers be expected to succeed where the experts have failed? To me, this foray into politics further strengthened my position that a government should seek to be as minimal as possible while ensuring the rights of the people. Net neutrality is too complex, the consequences of legislation too difficult to predict. Let a good strong economic engine flesh out the issues before executive action is taken.