This is my personal blog. The views expressed on these pages are mine alone and not those of my employer.

Friday, January 09, 2004

There is a WikiWiki with a good introductory page on distributed hash tables (DHTs) at http://www.infoanarchy.org/wiki/wiki.pl?Distributed_Hash_Table
This page gives an overview and then provides further references to research projects that have implemented different kinds of DHTs, such as Chord, Pastry, Kademlia, etc.

As an example of how powerful DHTs can be as a generic naming substrate, see the open-source project named The Circle at http://thecircle.org.au/ . This application uses a DHT for several uses: to share files, to send instant messages and IRC-style chatting, and for putting together a personalized news service.

Here are some other things to see. I run a project named P2P Sockets that includes a simple, non-secure distributed DNS. While it doesn't currently use a DHT (it uses another open-source project named JXTA as its P2P substrate), I plan on transitioning to a DHT in the near-future.

The original paper concerning storing DNSSec records in Chord is available at http://www.pdos.lcs.mit.edu/chord/papers/ddns.pdf

There has been a great deal of activity both at the grass-roots, open source level and at the academic level in these systems the last few years. While I don't believe technical solutions can always solve social and political issues, I do believe that an alternative technical approach to the DNS can help ameliorate these problems. P2P/DHT-based approaches might point the way to such a solution.

There are several significant research issues that must be resolved before this is possible, though. These are latency, DoS-style attacks on such a substrate, reliability of the naming records, and how to achieve secure name bindings while also ensuring human-friendly names.

The first issue, latency, seems to be disappearing as newer DHT algorithms are developed. The second issue is DoS-style attacks. If we go with a First Come/First Served (FCFS) system for handing out naming bindings, which removes DNS-style registrars from the loop, then assailants can programatically exhaust the namespace by simply grabbing names. While a FCFS system is attractive because it removes the need to have gatekeepers handing out names, it does open this problem. One way to solve this is to retain DNS registrars who sign but do not store DNS records; DNS records are stored in the P2P substrate using the DNSSec standard. If a DNS registrar detects that another peer is attempting to DoS it, it can cut it off. Of course, this doesn't protect against distributed DoS attacks, where many peers in the network might be compromised and requesting names. We might have to introduce some "friction" into the system, such as money or hashcash (i.e. clients have to provide a proof that they ran some computationally-heavy algorithm).

The third issue is reliability of the naming records. Chord has its own solution to this problem, as does the OceanStore team. This is a difficult problem without an elegant solution at this point. A good paper comparing some P2P replication schemes is "Erasure Coding vs. Replication: A Quantitative Comparison".

The final issue is to achieve secure name bindings while also ensuring human-friendly names. The problem with achieving these two goals in a distributed, peer-to-peer system is succinctly explained in a position paper by an open-source programmer nicknamed Zooko. This paper is titled "Names: Decentralized, Secure, Human-Meaningful: Choose Two" . Some P2P projects have decided to simply abandon human-friendly names, instead going with secure pointers instead. This is the approach the Freenet project has taken.

Unfortunately, these secure pointers are incomprehensible to ordinary computer users. I think there is value in a global namespace that is human-friendly, such as the current DNS. While I believe that Zooko is correct in identifying that you can't achieve complete decentralization, complete security, and human-friendly names at the same time, I do feel that it is possible to have both security and human-friendly names with a partially decentralized system. The question then becomes how much can we decentralize while still retaining the other two aspects. Perhaps we will be able to decentralize the portions of DNS that are capital intensive, such as storing records or acting as root servers.


I am attracted to the DHT idea, and P2P systems in general, for several reasons. One is that I see that there are many applications which have high value/high demand by users, such as social networking applications or instant messaging applications, but which have low market potential. For example, most users love instant messaging, but they are either not willing to pay anything or only a nominal amount. The amount they are willing to pay is not enough to sustain the high capital costs it takes to build out a huge network for such applications, which means that the apps either disappear and go out of business or they converge into a monopoly or oligopoly that can sustain themselves with low profit margins. Both are undesirable in my opinion. My interest in P2P applications are that they can co-create the network they need without investing any capital costs.

Most distributed applications come down to naming: how do I resolve a friendly human name, such as a web address like "www.cnn.com ", an email address like "bradneuberg@yahoo.com", or a phone number like "510-938-2341", into some other kind of unfriendly identifer or resource to get the job done, such as an IP address "169.254.151.22" or a 128-bit public key. Traditionally we have depended either on centralized naming systems, such as LDAP, or semi-decentralized systems, such as the DNS. I am interested in the research question of how we can build a generic, mostly peer-to-peer naming system that uses human-friendly names in a secure way.

I believe that some of the social and political issues that the current DNS suffers from are due to its technological underpinnings. Because there is a set of root servers that are both very expensive to maintain and which also can give undue influence to one entity, we have to create a set of governing institutions to balance them out. While these governing institutions also exist for other purposes, what would happen if we could reduce the capital and transactions costs of such a network to close to zero? This would mean that we wouldn't need businesses to maintain these networks, since the networks would be automatically co-created by P2P applications. For example, take a look at Skype, which is a P2P VoIP application, which is trying to do something very similar. If you don't need to maintain a giant, expensive white pages server then you can significantly reduce the cost you offer your end-users. Further, the naming mesh should be able to scope itself into seperate namespaces, which means that seperate applications could experiment with different naming schemes without stepping on each others toes. No longer would application programmers have to go through committees to gain permission to experiment with new ideas.

Many people think DHTs might have the possibility of being a unified P2P architecture, a single structure that provides several different network services; the IRIS project is predicated on this belief. A unified P2P DHT-based architecture is attractive because it means programmers only have to understand and implement a single network substrate in order to get many different services. For example, an open-source project named The Circle implements a Chord-like DHT and uses it in several different ways:


* Share files
* Send instant messages and chat IRC-style
* Put together your own personalized, trust based news service


It does this by using its Chord DHT as a generalized architecture that doesn't need to be changed based on particular uses.


I would like to add NAT/Firewall routing and detection to the mix. The research question is: is it possible to add NAT/Firewall routing into DHTs/overlay networks such that you get them "for free" or for very little extra effort?

Well, as a first quick stab at the problem, imagine this. Most DHT type systems involve your requests being mediated by other peers, as they examine the hash of what you are looking for, and either re-route that request in a certain direction or satisfy it because they are handling that portion of the DHT space. So by default all peers are being mediated, which is a base requirement for NAT or firewall issues, since those peers can't automatically receive incomming requests from outside peers and need to be mediated for certain transactions. Imagine further that your particular DHT scheme requires that you maintain a constant connection with the other peers who are "closest" to you in the DHT space, which makes sense for performance reasons because you will be getting and sending requests to these folks on a large basis. If this connection were made to be HTTP, then we have again solved another piece of the puzzle since many firewalls allow HTTP connections out; plus, many of the JXTA-type NAT schemes require that the NATed peer maintain an HTTP connection to another peer, who mediates all inbound requests to the NATed peer. Further, imagine if we somehow encode whether a peer is NATed or firewalled into its hash address, and choose our DHT scheme to somehow "stripe" NATed/Firewalled peers against non-NATed/non-firewalled peers (since NATed/Firewalled peers must have non-NATed/non-firewalled peers be their intermediaries). Now imagine that we are a NATed/firewalled peer, sitting in our location in the DHT address space in such a way that we are automatically being mediated by our closest neighbor, who is non-NATed/non-Firewalled. We maintain a connection to our closest neighbors using HTTP. We send out a request containing the hash value of some item to retrieve as well as our own peer's hash ID within the space; this value is routed through the DHT until it reaches a peer who can satisfy it, at which time that peer pulls out the requesting peer's hash ID and sends back the response. Since our DHT was set-up to stripe NATed and non-NATed peers together, as this responding request comes back through the mesh it will automatically go to the non-NATed peer, who will then send it back to us.

This page is powered by Blogger. Isn't yours?

Subscribe to Posts [Atom]