Saturday, December 4, 2010

The research paper Spamming Botnets: Signatures and Characteristics describes a forensic method for analyzing email server traffic to recognize spam originating from botnets and to detect botnet-infected hosts. While no effort had been made as of the paper's publishing to create a real-time version of the algorithm, the uses of such a tool are very intriguing. Some possible uses include:

  • An email service could better screen spam email as it arrives. The researchers found that the email patterns of botnet spam campaigns were highly correlated with those of campaigns occurring within the previous month.
  • Researchers or whitehat hackers could use the information to discover botnet control servers and help bring them down faster.
  • An ISP could learn which hosts it provides services to that are under botnet influence. It could contact its customers directly about the issue or otherwise add firewall rules to disrupt the communication between infected hosts and their control servers.
  • An email service could contact a host suspected of being infected by a botnet and warn its owner that the host recently sent a fishy email. The user could then confirm with the email service that the email was in fact legitimate, or else take steps to purge his or her machine of botnet influence.

Of all the uses of such a botnet detector, the latter two seem the most interesting because they allow ISPs and other services to potentially deal with the botnet problem from the victims' side in a legal and ethical manner. The improvement in spam-filtering accuracy and the reconnaissance information that could be provided to security researchers also seem very valuable.
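To make the first use above a bit more concrete, here is a rough sketch, in the spirit of the paper but not its actual signature-generation framework, of how an email service might group incoming mail by the URLs it advertises and flag groups whose senders are spread across many networks. The record fields and thresholds are made up for illustration.

    from collections import defaultdict

    def flag_campaigns(messages, min_msgs=50, min_networks=20):
        """Toy botnet-campaign detector (illustrative only).

        messages: iterable of dicts like {"url_domain": ..., "sender_ip": ...}.
        A group of messages advertising the same URL domain is flagged when it
        is large and its senders span many distinct /24 networks, a crude
        stand-in for "distributed botnet origin".
        """
        groups = defaultdict(list)
        for msg in messages:
            groups[msg["url_domain"]].append(msg["sender_ip"])

        suspicious = []
        for domain, ips in groups.items():
            networks = {ip.rsplit(".", 1)[0] for ip in ips}  # crude /24 grouping
            if len(ips) >= min_msgs and len(networks) >= min_networks:
                suspicious.append((domain, len(ips), len(networks)))
        return suspicious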

Thursday, December 2, 2010

The virtue of ASTUTE

ASTUTE is a recently published automatic detection method for sensing anomalous traffic over a network link, based on the assumption that, in normal traffic, flows are independent and the net change across all flows over time is close to zero. The authors of ASTUTE show that the method is significantly better than previous ones at discovering deviant traffic that involves many small flows, whereas previous methods perform better at finding anomalies resulting from a few large flows. From my initial reading, ASTUTE seems especially suited to discovering a variety of network issues, including misconfigured or misbehaving applications, saturated links, problems with BGP routing, offline servers and orphaned clients, and so on. This seems to be because ASTUTE looks at TCP traffic at a very high level, makes certain assumptions about how TCP flows behave in aggregate, and does little to analyze individual flows. ASTUTE's strengths appear to make it an excellent tool for trying to understand network traffic at a very high level over time.
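My paraphrase of that core assumption can be written as a simple check: if flows are independent and in equilibrium, the mean change in per-flow volume between two adjacent time bins should be close to zero relative to its spread. The sketch below is only my own rendering of that idea; the paper defines the actual statistic and its threshold.

    import math

    def equilibrium_violated(bin_a, bin_b, threshold=3.0):
        """Crude check of the "many independent flows should cancel out" idea.

        bin_a, bin_b: dicts mapping flow id -> bytes seen in two adjacent time
        bins. Returns True when the aggregate per-flow change strays too far
        from zero to be explained by independent flows. This is a paraphrase
        of ASTUTE's assumption, not its exact statistic.
        """
        flows = set(bin_a) | set(bin_b)
        deltas = [bin_b.get(f, 0) - bin_a.get(f, 0) for f in flows]
        n = len(deltas)
        if n < 2:
            return False
        mean = sum(deltas) / n
        var = sum((d - mean) ** 2 for d in deltas) / (n - 1)
        if var == 0:
            return mean != 0
        score = mean * math.sqrt(n) / math.sqrt(var)
        return abs(score) > threshold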

On the other hand, ASTUTE seems less well equipped to serve as a security measure for detecting malicious flows. First of all, bots in a botnet are often spread out across multiple networks, so most of the collective botnet's activity is unlikely to be visible at any single monitored link. Second, most malicious software, with the exception of port scanners, doesn't do anything that would look abnormal at the TCP flow level. For instance, how would the communication of botnet nodes look different from hosts interacting with a video game server? How would distributed file transfer between infected hosts be distinguished from a legitimate peer-to-peer file transfer? As a final example, how would a single-host application like a keylogger appear different from a chat or VOIP client that works through an intermediate server?

Overall, ASTUTE seems well positioned to assist a network administrator in understanding many types of traffic in his network, even if not the malicious kinds.

Thursday, November 11, 2010

At the end of our discussion in class on XORs in The Air: Practical Wireless Network Coding, I began to think about the feasibility of a commercial implementation of the idea behind the article. The article proposes applying network coding techniques to wireless networking to opportunistically increase the amount of data that can be sent in one transmission. It holds great potential because transmission time is a scarce resource that can't be scaled up with added hardware. Furthermore, use of the technique is orthogonal to many other strategies for improving network efficiency. The idea involves nodes listening to and temporarily recording all packets they can hear, including those not meant for them. A transmitting node can then XOR new packets with previously transmitted ones to send more than one packet simultaneously, and listeners can XOR the transmitted packet with packets they have stored in memory to recover their desired payload.
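A toy example of the decoding step, assuming two neighbors have each overheard the packet destined for the other: the relay XORs the two payloads into a single transmission, and each receiver XORs away the packet it already knows.

    def xor_bytes(a: bytes, b: bytes) -> bytes:
        # Assumes equal-length payloads; real schemes pad the shorter packet.
        return bytes(x ^ y for x, y in zip(a, b))

    # Packet p1 is destined for node B, packet p2 for node A. Through
    # opportunistic listening, A has already overheard p1 and B has p2.
    p1 = b"payload meant for B"
    p2 = b"payload meant for A"

    coded = xor_bytes(p1, p2)              # the relay sends one coded packet

    recovered_at_a = xor_bytes(coded, p1)  # A removes what it already has -> p2
    recovered_at_b = xor_bytes(coded, p2)  # B removes what it already has -> p1

    assert recovered_at_a == p2 and recovered_at_b == p1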

The technique has several important technical hurdles to overcome on the way to commercial success. The main issue seems to be that all nodes participating in a network that uses network coding require custom software in their network stack. This leads to significant deployment challenges; specifically, standard 802.11 nodes can't join the network unaided. This problem could be mitigated by requiring only infrastructure nodes in multi-hop networks to use network coding, but that would also limit the opportunities for optimization by network coding. Second, the benefits of network coding vary widely with factors such as network topology and current traffic and congestion levels, so it may be difficult to predict how much an average network would benefit. Third, the technique increases the complexity of the network software stack, as nodes must now store extra network-specific state in the form of past packets. Lastly, network coding software might have a significant negative impact on node power consumption, as nodes are now required to listen to every packet that is broadcast in the network.

In light of the above hurdles, I believe the potential benefits of network coding as presented in the paper do not seem to be worth the costs. That is not to say the paper is not successful; it combined novel ideas and inspired a host of exciting new research in wireless networking. However, more work would need to be done to integrate network coding successfully into live wireless networks.

Overcoming the Challenges of Wireless Networking

Wireless networking poses unique challenges when compared to traditional networking. Two issues in particular are at the heart of these difficulties: every act of communication involves broadcasting, and packet loss is much more frequent and dynamic. The broadcast nature of wireless networks means that the medium of communication, air, is precious, as it must be shared by everyone. Also, it is very difficult for nodes to know whether or not adjacent nodes are available for communication, since an adjacent node may be busy communicating with a third node that the first node can't detect. Lastly, the packet loss characteristics of wireless communication in combination with TCP can lead to uneven resource availability and even full starvation, especially with multi-hop communication.

That said, researchers have applied their creative intellects to making the best of a fairly harsh networking environment. For instance, one paper proposed a change to the core of multi-hop networks that causes nodes to work together to enforce fairness constraints at each hop. While the algorithm could slightly reduce overall bandwidth utilization, it significantly reduced resource starvation and uneven resource usage between nodes. In another paper, a group of researchers proposed encoding multiple packets into a single transmission by XORing them together. This method requires nodes to remember recent packets they have heard, as well as track other neighbor state, but could result in as much as a 4x improvement in bandwidth usage. Lastly, we learned of a proposed technique that allows nodes to cooperate with one another in sending packets across a multi-hop network. When a node sends a batch of packets, any listener that hears them can help move them toward their destination, so that even if the intended next hop fails to hear some packets, nearby neighbors can forward them on its behalf.

These innovative solutions showed me just how difficult improving wireless networking and extending its use cases can be. At the same time, they showed me that difficult problems can provide fertile ground for creative thinkers to apply themselves.

Saturday, October 30, 2010

NIRA is a new inter-domain routing architecture that allows a sender to encode the route a packet should take to arrive at its destination. Currently a sender can only encode the destination of a packet; the routers that carry the packet to its destination decide the path it should take at each hop. User choice in packet routing could provide the infrastructure needed to offer several new services on the Internet. Some new services I was able to think of include:
  • ISPs could allow users to choose between multiple backbone providers and switch between providers at will. This could lead to increased competition among backbone providers as users become able to adjust their Internet plans dynamically based on price, quality of service, and other factors.
  • ISPs could specialize in different QOS's related to voice, video, gaming, and other uses of the Internet for which standard TCP connections provide a less-than-optimal user experience. Users could then purchase standard Internet services using one route but purchase a different route for specific applications like video and voice conferencing or online games.
  • Route choice could provide incentives to help ISPs transition to pay-as-you-go services. ISPs are struggling to remain profitable under the current unlimited monthly service model due to the uneven distribution of actual Internet usage among users. By allowing users to purchase from multiple different providers offering different QOS levels for different applications, users may become more acclimated to the idea of paying directly for the services that they use.

For me, the largest barrier to a technology such as NIRA succeeding in the current world is usability. Right now users don't have to think at all about the route their packets take through the Internet. If a new technology such as NIRA forces this choice on end users, then it must offer benefits significant enough to offset the new burden it places on them. More likely, existing technologies like browsers and browser plugins would be augmented to make the choice intelligently for users so that they don't have to think about it beyond perhaps an initial setup.

It's hard to tell from a superficial analysis how beneficial route choice would be for the growth of the Internet and for society as a whole, but it is an interesting feature to ponder new possibilities with.

Tuesday, October 26, 2010

Architecture Design Trade-offs

Most new networking designs have trade-offs; they offer new features and improvements but bring with them their own unique costs and caveats. It is little wonder, then, that agreement cannot be reached on how best to evolve the architecture of the Internet. The research paper on NIRA, a new inter-domain routing architecture, is one example of a proposed Internet upgrade. The proposal is elegant, relies on existing ideas and deployments like IPv6 as much as possible, and offers features that could be useful for society. Specifically, the protocol aims to give users the ability to choose among the available routes between a source and a destination. As a side effect, it also exhibits improved scalability relative to BGP, the existing inter-domain routing protocol.

Of course, NIRA is not without potential flaws. First, it allocates IPv6 addresses in a specific way, limiting other potential uses for the same address space. Second, its protocol for communicating topology changes and user up-graphs has a per-message overhead linear in the number of domain-level links in a user's up-graph. Third, the network state a user must maintain is theoretically exponential in the depth of the user's up-graph. And last of all, if NRLS updates, analogous to DNS updates, are frequent, they could also create scalability issues as the Internet grows.
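A toy calculation of why that third point matters: in an idealized hierarchy where every domain has p providers and a user sits d levels below the core, the number of distinct up-paths the user might need to know about grows like p to the d. This is just arithmetic on a made-up hierarchy, not NIRA's actual data structures.

    def up_path_count(providers_per_domain: int, depth: int) -> int:
        """Candidate up-paths from a user in an idealized provider hierarchy
        where every domain at every level has the same number of providers."""
        return providers_per_domain ** depth

    # e.g. 3 providers per level, 4 levels below the core -> 81 up-paths
    print(up_path_count(3, 4))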

On one hand, the authors argue effectively that all of the above potential issues are unlikely to be realized in practice. On the other hand, the protocols of the original Internet also met their most important design goals, but at the same time have been stretched in ways their original authors never imagined. Is NIRA's added value important enough to integrate in the next Internet? Would its design weaknesses eventually become serious problems? As with the current Internet architecture, only time will tell!

Saturday, October 23, 2010

The Lawmaker's Dilemma

Reading economic papers on network neutrality and discussing the issues in class made me wonder how lawmakers manage to get anything done. The issues are deep and complex, and they span multiple fields, including technology, economics, and politics. As a computer scientist I have a fairly deep understanding of the technical issues behind net neutrality and, from that perspective, of the effects such regulation would have on software companies and network providers, but I admit ignorance of most of the economic arguments brought to bear by Yoo, Lessig, and others. Interestingly, in one Berkeley paper I read, I found significant fault with several of the technical claims and with how they were interpreted in the debate.

A lawmaker with potentially limited understanding of the issues, or, more importantly, the ramifications of his or her decisions, seems to have no way of rendering a correct verdict given the conflicting views presented by different schools of thought on net neutrality. If the intellectuals can't come to a solid agreement, should lawmakers be expected to succeed where the experts have failed? To me, this foray into politics further strengthened my position that a government should seek to be as minimal as possible while ensuring the rights of the people. Net neutrality is too complex, the consequences of legislation too difficult to predict. Let a good strong economic engine flesh out the issues before executive action is taken.

Thursday, October 21, 2010

More than Two Sides to Network Neutrality

After reading two different economics papers on net neutrality, I learned to my surprise that there are more than two sides to the network neutrality debate! From these two papers alone I saw three major themes in the debate, one favoring a free market and two involving very different kinds of regulation. The free market argument, led by Christopher Yoo at Vanderbilt, asserted that the currently proposed neutrality regulations would cause more harm than good by limiting the freedom of network providers to innovate, to differentiate themselves, and to absorb the congestion costs of the Internet. In effect this would hinder them from competing with one another and would reduce the incentive for companies to invest in network infrastructure. Furthermore, according to Yoo, price discrimination based on type of service requested would be the optimal strategy to control Internet usage in a way that benefits both providers and consumers.

On the opposite end, neutrality regulation advocates, as exemplified by Lawrence Lessig at Stanford, hold a strong opinion about how the Internet should work: it should be a "dumb" pipe used to transport data between endpoints. As with other utilities such as highways and electricity, there should be no discrimination based on purpose or type of use, source, or destination, as such discrimination would greatly reduce the overall value of the Internet to society for the benefit only of the network providers. In order to prevent any such future abuses of network infrastructure, the government should intercede now by outlawing this behavior.

Lastly, Douglas Hass argued in 2007 that an analysis of the history of the Internet reveals that control over its direction has always been dictated primarily not by any national or local ISPs but by the purchasing power of users themselves. He gives a few high-profile examples of attempts, perceived or real, at violating net neutrality and the backlash responses that have kept the Internet mostly in line. Furthermore, because of how rapidly Internet technology has evolved, both at the infrastructure and application level, Hass believes we should not presume at this time to know what is best for the Internet in terms of neutrality. Instead, he advocates regulations that enforce transparency in the Internet services customers purchase, including transparency about any kind of traffic discrimination a network provider might perform. This would allow users to make informed decisions and continue to drive the evolution of the Internet through their purchasing power.

Of all these opinions, I currently favor the last, although I now have a greater appreciation for the complexity of the issue. My biggest concern with the transparency approach is that it will be very difficult to define and enforce the regulations needed to ensure ISPs reveal all that they should about their network policies. However, if such transparency could be enforced, it alone would meaningfully regulate companies' behavior with respect to neutrality, simply because their actions would be placed in the open for public scrutiny. Also, in the worst case, such regulation would reveal any glaring violations of net neutrality as they occur, so that if more regulation is eventually needed it can be applied quickly and effectively.

As much as I admire the noble ideas of network neutrality, I also believe that the government already has its hands full regulating our selfish behavior as companies and as citizens. If unverifiable economic theory is the only proof of needed regulation, then perhaps it can wait until stronger proof is brought to light.

Sunday, October 17, 2010

BGP is well known for its faults. To me, it is interesting that the protocol is so different from TCP/IP, a protocol pair that has managed to scale so well as the Internet has grown. Using the deficiencies of BGP described in the paper "HLP: A Next Generation Inter-domain Routing Protocol", here are several ways in which the two protocol families differ.

- Scalability. TCP has scaled well as the number of users on the Internet has increased because it keeps little to no intermediate state of interest to other connections. TCP has had a more difficult time scaling as available bandwidth and connection latency have both increased, but sender-only additions have been made to address this issue. BGP, on the other hand, doesn't scale as well. BGP routers must maintain state that grows linearly with the size of the Internet, and the number of globally visible updates needed to synchronize this state also grows linearly with the Internet's size.

- Convergence. TCP converges on one thing only: packet drops. Even though packet drops are related to a single piece of intermediate state, router queue size, the drop events themselves are local in scope, which greatly simplifies the transmission rate convergence process. BGP, on the other hand, as mentioned above, maintains a significant amount of global shared state. Furthermore, sender-side additions to TCP have greatly increased its ability to converge more rapidly to a healthy transmission rate, whereas no such improvements have been made to BGP.

- Fault isolation. Because of its initial use for military communication, the TCP/IP stack was designed to continue to operate in the face of local failures through the means of storing state on the ends. BGP, on the other hand, frequently requires global state updates to handle local faults.

Overall, the primary difference between the two protocol families is that one keeps all of its state on the ends and the other keeps complex state at the core of the Internet. While this analysis may not lead to improvements in BGP or inter-domain protocols, it is interesting to compare the two types of protocols to help understand the origins of BGP's current flaws.
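To make the convergence contrast above concrete, here is a minimal simulation of the kind of local convergence TCP relies on: two additive-increase/multiplicative-decrease senders sharing a single drop signal drift toward an even split without ever exchanging state. The parameters are arbitrary.

    def aimd_share(w1=1.0, w2=20.0, capacity=24.0, rounds=200,
                   increase=1.0, decrease=0.5):
        """Two AIMD flows on one link: both back off whenever the sum of
        their windows exceeds capacity (a stand-in for a packet drop)."""
        for _ in range(rounds):
            w1 += increase
            w2 += increase
            if w1 + w2 > capacity:   # congestion: both flows see the drop
                w1 *= decrease
                w2 *= decrease
        return w1, w2

    print(aimd_share())  # the two windows end up roughly equal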

Practical Routing Research

A while ago I shared a single Internet connection among three different apartments in a complex. However, I found that most of the time I received only a small fraction, less than 1/100th, of the advertised bandwidth for the link. We eventually figured out that one of the three apartments was doing something to consistently steal the majority of the bandwidth. I had no idea how they were managing to do so, but I decided to research whether there were any fair-share enforcement policies I could turn on in the home router we were using. Unfortunately, I wasn't able to find anything, and I couldn't tell whether I simply didn't have the right search terms or whether home routers in general don't support such a feature. It seemed like an easy thing to do: simply maintain some address- or connection-based state and ensure that all connections receive an equal share of the available bandwidth.

Having had the above experience, I was very happy to read the research paper "Promoting the Use of End-to-End Congestion Control in the Internet". Although the main point of the paper is to introduce new mechanisms for inducing end-to-end congestion control through incentives, the article also discusses the core principles behind fair sharing on the Internet. As the paper explains, there are two ways a connection can obtain more than its fair share of bandwidth. First, it can ignore the rules of congestion control; an example is any application that communicates through UDP. Second, the application can open multiple connections to perform the work of one connection. For instance, if all three apartments were using the network simultaneously but the application in one apartment opened 10 connections at once, then that apartment would obtain about 83% of the available bandwidth. Some routers do implement per-flow scheduling using techniques such as weighted round-robin, but these are only guaranteed to work on a per-hop basis.
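The arithmetic behind that 83% figure, and what per-host rather than per-connection accounting would give instead, fits in a few lines; this assumes the router splits capacity evenly over whatever unit it tracks.

    def shares(connections_per_host):
        """Fraction of the link each host gets when capacity is split evenly
        per *connection* versus evenly per *host*."""
        total = sum(connections_per_host.values())
        per_conn = {h: c / total for h, c in connections_per_host.items()}
        per_host = {h: 1 / len(connections_per_host) for h in connections_per_host}
        return per_conn, per_host

    apartments = {"apt1": 10, "apt2": 1, "apt3": 1}
    per_conn, per_host = shares(apartments)
    print(per_conn["apt1"])  # ~0.83: the greedy apartment wins per connection
    print(per_host["apt1"])  # ~0.33: per-host fair queueing would even it out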

With the understanding I gained from the paper I am now better equipped to handle a similar problem in the future. I now have the background needed to choose the appropriate search terms to figure out whether my router supports the relevant features, and I would be better prepared to analyze packet traces on the network to determine the cause of the drop in bandwidth for two of the three apartments. While learning these things may not have been the main objective of reading the paper, it was still very enjoyable to connect what I was learning in class with problems I've had in real life.

Monday, October 11, 2010

Renaissance of TCPs

Will we one day reach a point where TCP algorithms are so varied and specific that TCP streams could flexibly adapt to connection-specific conditions and applications could decide what kind of TCP protocol they want and when? After learning in class about the variety of modifications and additions proposed to TCP over the past decade, it almost seems natural to evolve TCP in that direction.

For instance, certain algorithms like STCP or HSTCP were designed to handle a specific scenario, such as efficient bandwidth utilization on long fat links, but were shown to perform unfairly in more standard usage scenarios. In a more dynamic TCP environment, a TCP implementation could adapt itself to measured conditions like high RTT and low congestion and switch congestion avoidance algorithms entirely, completely transparently to the client application.

Another use case of dynamic TCP could be application-driven TCP. As an example, the authors of one paper we read claimed that the current implementation of TCP is not optimized for P2P applications, and we read of another algorithm designed specifically for the data center. If applications were allowed to select per-stream protocols, they could choose the algorithm that best suits their own needs and the data they are sending. As a further possibility, an application could, when installed, install a new version of TCP that it prefers in some or all scenarios.
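Something close to per-stream selection already exists on Linux today: a socket can request a specific congestion control module. The sketch below assumes a Linux host where the TCP_CONGESTION socket option (value 13) and the requested module are available; availability varies by kernel and configuration.

    import socket

    TCP_CONGESTION = getattr(socket, "TCP_CONGESTION", 13)  # 13 on Linux

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Ask the kernel to use a particular congestion control algorithm
        # for this one stream (the module must be loaded and permitted).
        s.setsockopt(socket.IPPROTO_TCP, TCP_CONGESTION, b"reno")
        print(s.getsockopt(socket.IPPROTO_TCP, TCP_CONGESTION, 16))
    except OSError as exc:
        print("congestion control module not available on this host:", exc)
    finally:
        s.close()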

This flexibility in something as ubiquitous as TCP raises many issues related to fairness, correctness of implementations, and the efficiency of dynamic algorithms that generate metrics an application can respond to. But it also opens up the possibility of a much more flexible transport layer that adapts to the needs of individual connections or applications, which may help improve overall connection efficiency on the Internet.

Sunday, October 10, 2010

Compound Ideas

The paper A Compound TCP Approach for High-speed and Long Distance Networks describes a novel TCP algorithm that utilizes high-speed, long-delay networks more efficiently while maintaining inter- and intra-protocol fairness. For me, the most interesting part of the paper is the build-up from the "previous work" section to the section describing the new algorithm itself. Through these two sections one learns that the paper's novel contribution is essentially a combination of existing ideas in new ways. More specifically, compound TCP combines HSTCP, an aggressive loss-based congestion control algorithm, with delay-based information very similar to that used by the Vegas and FAST TCP algorithms. Even more interestingly, CTCP isn't the first to combine the two approaches; TCP Africa attempted a similar strategy. Yet, despite the apparent simplicity of CTCP relative to previous work, it improved on existing approaches enough to become a standard option available in new versions of Windows. To me, this paper serves as an excellent example of incremental evolution in the research community.
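A much-simplified rendering of the compound idea, and not the paper's actual update rules: keep a loss-based window and a delay-based window, grow the delay-based part only while the measured queueing delay stays low, and send at the sum of the two. The constants below are placeholders.

    def compound_step(cwnd, dwnd, base_rtt, rtt, loss,
                      gamma=0.01, alpha=2.0, beta=0.5):
        """One illustrative update of a compound-style sender.

        cwnd: loss-based window, dwnd: delay-based window (both in packets).
        Queueing delay is estimated as rtt - base_rtt (seconds). This is a
        simplification of the compound idea, not CTCP's exact rules.
        """
        if loss:
            cwnd *= beta                       # classic multiplicative decrease
            dwnd = 0.0                         # give back the scalability component
        else:
            cwnd += 1.0 / (cwnd + dwnd)        # roughly one packet per RTT overall
            queueing_delay = rtt - base_rtt
            if queueing_delay < gamma:
                dwnd += alpha                  # path looks empty: grow aggressively
            else:
                dwnd = max(0.0, dwnd - alpha)  # queue is building: back off
        return cwnd, dwnd                      # the send window is cwnd + dwnd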

Monday, October 4, 2010

The Past and Future of Internet Research

As I've spent time studying pieces of the Internet research literature, I've been surprised that the majority of it has been based on empirical observation, and in the infancy of the Internet, observation on live systems. For instance, congestion avoidance algorithms came about after an episode that caused a portion of the network to drop its throughput one thousand-fold. Van Jacobson, a noted researcher whose segment of the network was affected, decided to analyze and fix the problem through a hands-on debugging session. Although it appears that over time people have begun to use simulation software to test hypotheses more rigorously before deployment, these tests are still inherently observation-based.

This style of research contrasts greatly with that of Feng and Vanichpun on TCP Vegas. In their 2003 research paper they construct a mathematical model of Reno and Vegas connections competing with one another for bandwidth. With this model they could predict the probability that a particular packet would be dropped, the average window size and queue size of a connection, and most importantly, the ratio of the throughputs of the two connections. Using this model they were able to tweak Vegas parameters so that the two connection types would be fair to one another, something no one had accomplished previously. The empirical experiments that followed simply served to verify the correctness of the model. Had the researchers not performed this rigorous analysis, they would have had only brute-force methods at their disposal for discovering an optimal parameter set for TCP Vegas.

Because of this paper I believe that continued progress in Internet architecture research will become increasingly reliant on constructing such models and performing such analyses. Distributed, concurrent, autonomous systems are inherently complex, and the Internet itself continues to grow in complexity over time. Discovering models that allow us to predict and understand aspects of this system more thoroughly will enable us to discover ideas we would otherwise be unable to find.

Sunday, October 3, 2010

TCP Algorithms and Test Suites

The story of TCP Reno and TCP Vegas could easily serve as a poster story for the test-driven development camp. TCP Reno implemented an at-the-time new technique, called fast recovery, for quickly recovering from transient network congestion. However, after people had already been convinced to implement and deploy TCP Reno on hosts around the world, researchers discovered a flaw in Reno that resulted in significantly poorer performance than TCP Tahoe, its predecessor, when a consecutive sequence of packets was lost.

TCP Vegas, on the other hand, suffered from a related but different problem. The authors of Vegas claimed serious performance increases for the Internet if their new algorithms were implemented and deployed. Unfortunately, because Vegas was so different in its approach, many people had a difficult time believing that these performance enhancements would be observable in a production system. Furthermore, many worried that Vegas would cause currently unforeseen problems with the Internet if deployed and might not play well with existing TCP implementations.


Both of these stories might have had a happier ending if there were a large suite of test cases that any new TCP algorithm or enhancement could be tested against. These test descriptions could be run on simulation software, partially simulated environments, or real hardware. The tests should be broad enough to cover all of the common traffic patterns found on the Internet as well as the important edge cases an algorithm must handle to be sufficiently robust. Furthermore, tests could be used to determine how a new TCP algorithm interacts with existing deployed solutions. Lastly, the test suite would provide a common ground for researchers to communicate with. Not only could researchers verify the validity of their new ideas for themselves and others, but if more questions or concerns arose, the test suite could simply be extended with new tests exploring them.
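A sketch of what such shared test descriptions might look like: each scenario names a traffic pattern and the properties a candidate algorithm must satisfy, and a harness could replay the same descriptions against a simulator or a testbed. Every name, field, and threshold here is hypothetical.

    # Hypothetical, declarative TCP test scenarios. A harness would map each
    # scenario onto a simulator run (or a testbed) and check its assertions.
    SCENARIOS = [
        {
            "name": "consecutive_losses_in_one_window",
            "topology": "single_bottleneck",
            "events": [{"type": "drop_burst", "packets": 4}],
            "assert": {"goodput_vs_tahoe": (1.0, float("inf"))},  # no regression
        },
        {
            "name": "coexistence_with_reno",
            "topology": "shared_bottleneck",
            "competing_flows": ["reno"],
            "assert": {"throughput_ratio_vs_reno": (0.5, 2.0)},  # rough fairness band
        },
    ]

    def run_all(candidate_algorithm, backend):
        """backend: any object exposing run(algorithm, scenario) -> metrics dict."""
        for scenario in SCENARIOS:
            metrics = backend.run(candidate_algorithm, scenario)
            for key, (lo, hi) in scenario["assert"].items():
                assert lo <= metrics[key] <= hi, (scenario["name"], key, metrics[key])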

Having a suite of tests has proved invaluable for many core infrastructure technologies such as compilers. Applying the same philosophy to what is quite possibly the most important infrastructure in the world should lead to increased productivity in researching new ways of improving TCP.

Saturday, September 25, 2010

A Conjecture on Bandwidth Estimation

After discussing in class a couple of peer-selection algorithms for enhanced peer-to-peer data transfer performance, along with a couple of techniques related to video broadcasting and incremental peer optimization, I asked myself how much the Internet would benefit if a fast, accurate estimate of the theoretical maximum throughput, minimum latency, and expected data loss rate could be obtained for any arbitrary connection between two hosts. Some of the benefits I imagined include:
  • More efficient overlays, either via IP multicast or peer-to-peer applications, for multicasting applications.
  • Better match-making for online games. Users that are near each other could be more likely to be matched together to improve latency and performance.
  • Easier estimates for expected bandwidth and latency based on the theoretical values and the current load on both hosts.
  • Better TCP utilization. Instead of a TCP connection gradually raising its transfer rate and then halving the rate when packets are dropped, a smarter algorithm might attempt to maintain a steady transfer state around a computed expected bandwidth and latency. In addition, dynamic ACK timeout values could be derived to avoid premature packet re-transfer.
  • Better user warnings and errors for applications with strict latency/bandwidth requirements like voice/video conferencing and video streaming. Before the application even begins to transfer data, warnings could be displayed to the user that the potential or expected bandwidth or latency of the connection is worse than the service requires to operate smoothly.
I do not know what infrastructure changes would be required to enable such a feature on the Internet, or how much trust could be placed in nodes to provide honest answers to bandwidth and latency queries, but it seems that such a feature would have a positive impact on current Internet applications.
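As a small illustration of the TCP point in the list above: with trustworthy estimates of path bandwidth and round-trip time, a sender could size its window near the bandwidth-delay product and derive a retransmission timeout from measured RTT variation instead of probing blindly. The numbers below are placeholders.

    def window_and_rto(bandwidth_bps, rtt_s, rtt_var_s, mss=1460):
        # Bandwidth-delay product: how much data keeps the pipe full.
        bdp_bytes = bandwidth_bps / 8 * rtt_s
        window_packets = max(1, int(bdp_bytes / mss))
        # Timeout in the usual style: smoothed RTT plus a multiple of its variation.
        rto = rtt_s + 4 * rtt_var_s
        return window_packets, rto

    # e.g. a 50 Mbit/s path with 40 ms RTT and 5 ms RTT variation
    print(window_and_rto(50e6, 0.040, 0.005))  # ~171 packets, 0.06 s timeout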

Internet Video Broadcast and Peer-to-Peer selection

In our Internet architecture class this week we presented and discussed several papers related to peer-to-peer networking. I focused in particular on the article "Opportunities and Challenges of Peer-to-Peer Internet Video Broadcast", which discussed the current state of the art in peer-to-peer video broadcasting. The two example techniques for video broadcast described in this paper both attempted to incrementally optimize the broadcast overlay. In other words, peers periodically attempted to change the peers they shared data with to improve the flow of data in the system. Also, both methods appeared to make these peer-switching decisions by having peers track the performance history of the other peers they interact with.

At the same time, we also discussed another paper on a peer-selection heuristic for improving the performance of BitTorrent downloads. This method relies on the use of CDNs. When a node joins the system, it contacts a CDN-distributed site and records the chain of redirects it follows to retrieve the cached content. Then, when choosing peers initially, it compares these redirects against those of other nodes to select peers that appear to be served by the same or nearby CDN servers. The use of this technique resulted in significant performance gains.
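A sketch of that heuristic as I understood it: each peer records the CDN edge servers it gets redirected to and prefers peers whose sets overlap most with its own. The similarity measure and names below are my own choices, not necessarily the paper's.

    def redirect_similarity(mine: set, theirs: set) -> float:
        """Jaccard overlap of the CDN edge servers two peers were sent to."""
        if not mine or not theirs:
            return 0.0
        return len(mine & theirs) / len(mine | theirs)

    def rank_peers(my_redirects, candidate_redirects):
        """candidate_redirects: dict of peer id -> set of observed edge servers."""
        return sorted(candidate_redirects,
                      key=lambda p: redirect_similarity(my_redirects,
                                                        candidate_redirects[p]),
                      reverse=True)

    my_edges = {"edge-nyc-1", "edge-nyc-2"}
    peers = {"p1": {"edge-nyc-1", "edge-nyc-3"}, "p2": {"edge-lax-1"}}
    print(rank_peers(my_edges, peers))  # p1 first: it shares an edge server with us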

Having studied both papers, it seems natural to ask why no one has tried applying a similar technique to improving the performance of peer-to-peer video broadcasts. That's when I started to see one of the values of collaborating and sharing information in a class or lab setting. By sharing what we know, we can synthesize new ideas from existing ones much more easily. Most ideas seem simple in hindsight, but the process of discovering them is often much more complicated, and sharing research knowledge in a lab setting appears to be an effective way to spark innovation.

Thursday, September 16, 2010

Why distributed data stores matter

After our recent discussion on Chord, a distributed hash table (DHT) implementation, I began to try to connect the ideas Chord presented with all of the other popular persistent storage technologies that have been advertising themselves recently, such as the NoSQL solutions, as I wanted to know how these ideas are related. Chord's decentralized approach to data lookup allows it to be robustly fault-tolerant as nodes enter and leave the system. While most companies that use such data stores don't have to worry about autonomous nodes entering and leaving at will, they do need an easy way to add additional nodes to their systems as their business scales upward. Also, their data storage systems, consisting of large clusters of commodity servers, must handle node failures gracefully, which means both redundant data storage and system stabilization algorithms are required. Lastly, many companies need to perform distributed computations on large data sets, which is made possible by the distributed nature of these data stores.
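The placement rule at the heart of a Chord-style DHT is compact enough to sketch: hash both node names and keys onto the same identifier ring, and store each key at its successor, the first node clockwise from the key's identifier. Real Chord adds finger tables for O(log n) lookups; this brute-force version shows only the placement rule.

    import hashlib
    from bisect import bisect_left

    M = 2 ** 32  # identifier space for the toy ring (Chord itself uses 160-bit SHA-1 ids)

    def ident(name: str) -> int:
        return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % M

    class Ring:
        def __init__(self, node_names):
            self.ids = sorted(ident(n) for n in node_names)
            self.by_id = {ident(n): n for n in node_names}

        def successor(self, key: str) -> str:
            """Node responsible for key: first node id >= hash(key), wrapping around."""
            i = bisect_left(self.ids, ident(key))
            return self.by_id[self.ids[i % len(self.ids)]]

    ring = Ring(["node-a", "node-b", "node-c", "node-d"])
    print(ring.successor("some-object-key"))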

A few interesting commercial and open-source implementations of distributed databases are the Apache Cassandra project, Google's proprietary BigTable, and the Hadoop-compatible Hypertable. While Cassandra, based on Dynamo, uses a DHT internally, BigTable and Hypertable both use a different data storage format. Wikipedia describes BigTable as a "sparse, distributed multi-dimensional sorted map".

While Chord itself may not have caught on in large commercial organizations like Yahoo, Google, or Facebook, it would be interesting to learn how much DHT's like Chord have influenced the design of the next generation of persistent data stores, and what problems they focus on trying to solve compared to Chord.

Analyzing Internet Packet Flow

The paper End-to-End Internet Packet Dynamics provides a classic example of an Internet measurement study. Based on my reading of the paper and our discussion of it in class, I thought of some interesting questions and directions that might be useful to explore.

  1. What if it were possible to actually determine the exact link that formed the bottleneck in a connection? I don't think this is possible with the current architecture, but if it were, routers might be able to update costs temporarily for a particular TCP connection. This would require a small amount of transient state, but could lead to more optimized data transfer and less packet queueing if an alternative route could be discovered that had a higher bottleneck rate.
  2. The paper concludes that certain pathological scenarios, like out-of-order packet delivery and data and ACK packet loss, are highly connection-specific or specific to the current level of traffic. One interesting avenue of exploration might be to analyze the connections that generated the disproportionate number of pathological cases to determine their cause. If they are only congestion-specific, for example, then some transient router state to detect such cases might prove valuable in reducing the number of retransmissions that occur.
  3. The authors hypothesize that 16-bit TCP checksums are insufficient for the large amount of data that is currently transferred over the Internet, because the chance of a corrupted packet being accepted as valid is too high. This claim is hard to believe 15 years later, as we still use the same size checksums and our data transfer rates and Internet usage have only increased. Even so, it might be valuable to mathematically analyze certain pathological scenarios, like the probability of a file transfer of a given size containing corrupted data, and then empirically evaluate that probability over different connections.
As our teacher told us, this paper is seminal in its field, so it is likely that all of these questions have been explored already, but I find them interesting to think about nonetheless, and perhaps useful in understanding how the Internet currently works.
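For question 3, a back-of-the-envelope version of the analysis is straightforward: a 16-bit checksum lets a randomly corrupted segment slip through with probability about 1/65536, so the expected number of accepted-but-corrupt segments grows linearly with the number of corrupted segments a transfer sees. The corruption rate below is an arbitrary placeholder, not a measured value.

    def expected_undetected(total_segments, corruption_rate, checksum_bits=16):
        """Expected number of corrupted segments that pass the checksum, assuming a
        randomly corrupted segment is accepted with probability 2**-checksum_bits."""
        corrupted = total_segments * corruption_rate
        return corrupted / (2 ** checksum_bits)

    # e.g. a 10 GB transfer in 1460-byte segments, with 1 in 10^5 segments corrupted
    segments = 10e9 / 1460
    print(expected_undetected(segments, 1e-5))  # ~0.001 undetected bad segments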


Monday, September 13, 2010

The Permissiveness of Active Architectures

After studying two research papers on active architectures in my Internet research class, I wonder why these researchers did not settle on proposing a plugin-based architecture instead. Relatively few systems exist that allow arbitrary code to be executed. Google App Engine, Google NaCl, the JVM, and the CLR are all good examples of such systems, and they provide facilities for the controlled, secure execution of arbitrary code. However, all of these examples are platforms that were built explicitly as Turing-complete machines to run any application, trusted or untrusted. I do not believe that a network architecture should have the same kind of flexibility as a programming language runtime. A network need not execute untrusted code, and its scope is much more limited than that of a general language runtime. In proposing such an architecture, these researchers almost seem to be placing the interests of a few minority parties over those of the public at large.

Instead of proposing an architecture where any user can execute arbitrary code, why not propose one with a plugin system? Such a design would place control and responsibility of the code that gets executed on the network back in the hands of the network owners and operators themselves, without sacrificing generality. Such a system could allow autonomous systems to advertise which plugins they support, and BGP-like protocols could take such information into account when routing a packet that requires a specific plugin.
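A sketch of the routing side of this idea: each AS advertises the plugins it supports, and a path is usable for a packet requiring a plugin only if every AS on the path supports it. The data structures and names below are hypothetical.

    # Hypothetical plugin advertisements: AS -> set of supported plugin names.
    SUPPORT = {
        "AS100": {"multicast-v2", "ecn-plus"},
        "AS200": {"multicast-v2"},
        "AS300": set(),
    }

    def path_supports(path, required_plugin):
        """True if every AS hop on the path has the plugin installed."""
        return all(required_plugin in SUPPORT.get(asn, set()) for asn in path)

    def pick_path(candidate_paths, required_plugin):
        """Prefer the shortest candidate path whose hops all support the plugin."""
        usable = [p for p in candidate_paths if path_supports(p, required_plugin)]
        return min(usable, key=len) if usable else None

    paths = [["AS100", "AS300"], ["AS100", "AS200"]]
    print(pick_path(paths, "multicast-v2"))  # ['AS100', 'AS200']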

While such a design would require manual intervention by ISPs to install a new network feature, the number of security and performance issues would drop dramatically and those that remained would be much simpler. Also, if a plugin provided significant benefits to customers or ISPs, the utility of the plugin itself should be enough incentive for organizations to install it on their existing hardware. Lastly, not all of the Internet needs to have a plugin installed for it to be used, only enough that a path can be found between a source and destination where each AS hop supports the plugin.

A plugin architecture can make the Internet just as dynamic as an active architecture, but through much safer and less radical means.

Sunday, September 12, 2010

Measuring the Utility of the Internet

The paper Modeling the Adoption of Network Architectures attempts to define a simple, user utility-based model for measuring the migration from an existing network architecture to a new one. The model is based primarily on standalone utility, the benefits a single user or organization receives from using the Internet, and network effects, the benefits a user or organization receives because other users and organizations use the same network architecture. However, the authors lament that they have no way of determining real-world values for any of their model parameters. This prompted me to think about the value that the current Internet provides to different groups of people and to ask whether it is even possible to assign an accurate numeric value to something as complex as user utility.
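To make the shape of the model concrete for myself: each user's utility is a standalone term plus a network-effect term that grows with the fraction of users already on the same architecture, and users switch when the new architecture's utility, minus a switching cost, beats the old one. The parameter values below are made up; they are exactly the numbers the authors say they cannot pin down.

    def utility(standalone, network_coeff, adoption_fraction):
        # Standalone benefit plus a network effect proportional to adoption.
        return standalone + network_coeff * adoption_fraction

    def simulate_adoption(steps=50, new_frac=0.01, switching_cost=0.2,
                          old=(1.0, 1.0), new=(1.2, 1.0)):
        """Crude best-response dynamics between an old and a new architecture.
        old/new are (standalone utility, network-effect coefficient) pairs."""
        for _ in range(steps):
            u_old = utility(old[0], old[1], 1.0 - new_frac)
            u_new = utility(new[0], new[1], new_frac)
            # Each step a small cohort re-evaluates and moves toward the better side.
            if u_new - switching_cost > u_old:
                new_frac = min(1.0, new_frac + 0.05)
            else:
                new_frac = max(0.0, new_frac - 0.05)
        return new_frac

    print(simulate_adoption())  # with these made-up numbers the new architecture stalls at 0.0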

At its core, the Internet provides a protocol, IP, for hosts to send messages to each other, and two protocols, TCP and UDP, for sending arbitrary-sized chunks of data between hosts. To hardware manufacturers, this provides value in that if their equipment can communicate with an IP-aware router, then it can communicate with the Internet. To software developers, on the other hand, it means that they can rely on underlying TCP/UDP operating system calls to write network-aware applications. For specialized applications like web servers, even higher-level abstractions exist for HTTP-based communication, and higher abstractions still attempt to hide the HTTP request/response cycle. At their core, they are all built on TCP/IP.

Most organizations, however, will use the Internet through a variety of applications running on a variety of hardware devices. Their internal networks will also have to respond to a multitude of disparate needs. Our university campus, for instance, most likely has many kinds of "middlebox" network technologies like web filters, caches, and network activity monitors to detect malware. And, of course, the users in an organization can differ vastly in their needs from one another. Some users may use the Internet primarily for basic communication: email, chat, and web conferencing. Home users, on the other hand, may use any of a number of applications ranging from streaming video to casual Flash games. As a final interesting note, the standalone utility of the Internet for any user is constantly changing as new applications are developed and the infrastructure evolves to provide more bandwidth.

While the Internet itself provides only one network effect, the ability for any connected, addressable host to communicate with any other connected, addressable host, there are a host of applications that also take advantage of network effects. Social media, for instance, notoriously benefit from increased numbers of users, as does any other communication tool such as email. E-commerce businesses like Amazon.com also see increased revenue with more Internet adoption. As a last example, there has recently been increased effort poured into collaboration tools like Google Office, Google Wave, and Etherpad, all of which provide a limited amount of network-effect-based benefit, in that users really only benefit from networking with a few others at a time.

After considering not only the large number of different ways users can benefit from a network architecture but also the different ways in which individual users may use such a broad and general tool, it begins to seem very difficult to deduce a general numeric value to assign to "user utility" for all network architectures. Perhaps the biggest value of this research, though, comes simply from the fact that it may cause others to consider what will be required to make breaking changes to the Internet.

Sunday, September 5, 2010

I Told Him We Already Got One!

Reading about one of the many proposals for a brand new Internet recently reminded me of the famous Monty Python scene where King Arthur offers to let the Master of the French Guards join him in his quest for the Holy Grail, and the guards respond by saying they've already got one. It felt to me like those proposing a new Internet architecture were doing much the same by inviting others to join them in a quest for a holy grail, except that the invitees really do already have a grail that works nicely. This issue is not unique to the Internet. Those on the fringes of programming language research, for example, have claimed that the current crop of programming languages is insufficient for the parallel programming needs of the future, and yet the industry still churns out code well enough using existing tools. And, of course, one need not look far beyond the field of computing to see scenarios where change is resisted because the status quo is good enough.

In the paper I read, "A Data-Oriented (and Beyond) Network Architecture", the authors propose layering a name-based anycast protocol atop IP in order to more properly fulfill three current primary needs that the original Internet did not foresee: persistence, availability, and authenticity. It is interesting to compare how the architecture proposed in this paper meets these needs with how the current Internet handles them.

First is name-based persistence. The authors believe that as long as the data or service underlying a particular name is available, a user should be able to access it using that name. Currently, this is handled through the DNS system. If the resource underlying a URL moves, the distributed DNS system must be updated as to its new location. Intermediate measures like redirects are needed until all DNS caches have been updated. In contrast, using the anycast protocol described in the paper, a moved resource will most likely still remain in the same AS in which it originated, and since routing at the naming level occurs between AS's, no updating of the route handlers will need to occur most of the time. However, updates will still need to occur if inter-AS movement occurs, and the authors also appear to ignore the issues of reliability and failover, which suggests that a new TCP-like service would need to be built on the named routing layer to maintain functionality similar to the existing Internet.

Secondly, the proposed named routing system encourages the replication of data to multiple nodes for both high availability and low latency access. The routing system is designed to handle the ambiguity of multiple locations for a named piece of data by allowing route handlers to choose the current "best" route to the named entity. Currently, the Internet provides this type of service through the use of content distribution networks like Akamai. From my limited understanding on the matter, the routing layer already allows for multiple locations for a single IP address, enabling these systems to route a client to the nearest copy of data made available by the CDN instead of the location of the original data or service. While I have been told that routing table sizes are growing at an alarming rate, I do not know at what point this issue will have a visible impact on the level of service provided by the current Internet infrastructure.

Lastly, the named routing system integrates an authentication scheme through the use of asymmetric encryption keys, allowing anyone to verify that the server of a service or piece of data is authorized to serve that content. This same feature is currently provided through SSL certificates verified by a third party such as VeriSign. Since I am not an expert in security, I do not know how difficult authentication attacks against the current system are relative to the same attacks on the system proposed in the paper, but the proposed system is likely inherently more secure and easier to implement than the one the current Internet uses.
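A sketch of how an asymmetric-key scheme can be bound to names so that anyone can verify the server: if the name itself embeds a hash of the owner's public key, a client can check that the key presented alongside the data matches the name and then verify the data's signature with that key. The naming convention below is my own simplification of that general idea, and the signature check itself is left to a real cryptography library.

    import hashlib

    def principal(pubkey_bytes: bytes) -> str:
        # The "principal" part of a self-certifying name: a hash of the owner's key.
        return hashlib.sha256(pubkey_bytes).hexdigest()[:16]

    def make_name(pubkey_bytes: bytes, label: str) -> str:
        return f"{principal(pubkey_bytes)}:{label}"

    def key_matches_name(name: str, presented_pubkey: bytes) -> bool:
        """Step 1 of verification: the presented key must hash to the name's
        principal. Step 2 (not shown) would verify the data's signature with
        that key using an actual signature library."""
        return name.split(":", 1)[0] == principal(presented_pubkey)

    owner_key = b"...owner public key bytes..."   # placeholder key material
    name = make_name(owner_key, "video/lecture-03")
    print(key_matches_name(name, owner_key))       # True
    print(key_matches_name(name, b"someone else")) # False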

Again, the new Internet system described in the paper was conceived to handle, more correctly, needs that the current Internet is meeting but was never designed to handle. At what point will existing solutions be sufficiently worse than such a system that upgrading begins to be seen as cost-effective? For now, the Internet, decades old as it is, already seems to handle all of these needs effectively enough.

Software Design Principles and the Building of the Internet

The article "The Design Philosophy of the DARPA Internet Protocols" impressed me as it described a very organic process for building a successful large software system. It struck me as a very healthy way to build a software product and one which many of the modern software development methods share much in common with. It also shows that good engineering principles have existed for many years, even though we still flounder in our attempts to codify an apply them consistently.

The first principle the article exemplifies is that of properly defining and prioritizing requirements. The builders of the Internet knew exactly what their client, the government, needed, and knew exactly the order of importance of the requested features. Prioritization of features is especially important when the cost of implementing each feature is weighed during the initial design and implementation phases. For instance, if a feature is of relatively low importance to a customer but would require disproportionately significant engineering effort to implement, it can be safely discarded even if it is higher in priority than other features that are much easier to integrate into the system.

In relation to the first principle, I was also impressed that the designers of the Internet did not attempt to generate a cohesive design that would fulfill the majority of their customer requirements at once. Instead, they focused on the most important features first and then iteratively modified their design as they reconsidered how to extend the existing system to fulfill new requirements. As an example, TCP was originally written as the common protocol by which the primary government requirements of robust, reliable communication between disparate networks would be met. However, when the system was analyzed in light of the need for multiple types of service, the Internet's authors subsequently split TCP into TCP and IP and added UDP as an alternative transport-layer protocol. This organic growth process is much easier for a development team to cope with, and it also allows the team to show progress and obtain feedback faster by delivering working software to their clients sooner.

Finally, the architecture of the Internet is a great example of effective distributed system design, and it shows how one of the most fundamental problems of software design, composing reusable software components, was handled. Evidence of this problem's inherent complexity abounds, from module frameworks like OSGi, to dependency injection systems like Spring, to the fundamental methods of composition found in differing software paradigms like object-oriented and functional programming. The design of the Internet has been described as effectively applying the "end-to-end" principle to meet its objectives within a layered architecture. Value lies in studying both this generic principle and its specific manifestation in the TCP/IP stack in order to learn more about how to compose software components when building a larger system.