Good Vibrations
02 Jun 2009
Just a few days ago Google launched Wave. The demo is fun to watch. The technology seems quite impressive, even at this early stage. I went through all the documents, and here are my impressions.
First of all, it is obvious that Google Wave is still at an early stage of development. The clearest signs of that are in the architecture and its documentation. The terminology is inconsistent and often quite confusing:
- It is not clear what the difference is between a wavelet and a wavelet copy. For example, there is the statement "... local wavelets are those created at that provider ...". Local wavelets are in fact wavelet copies, so can they be "created" at a provider? Or can only a wavelet be "created", with the copy being just a side effect of that?
- Local wavelet: is it local to the client? Local to the provider? It is a wavelet copy after all, so it should be called "local wavelet copy" or "provider-local wavelet copy". Or is a "local wavelet" really a "wavelet", and only a remote wavelet is a wavelet copy?
- What does "processing a wavelet operation" mean? Is it changing the state of a wavelet? Or of a wavelet copy?
- The "frontend" component mentioned in the federation description is not mentioned in the "client-server protocol" document, although the federation document refers to that document for more details on it.
- The protocol specification does not define what "WSP" means.
- The developer's guide mentions a conceptual hierarchy: wave-wavelet-blip-document. The other documents (almost) do not mention blips at all. This terminology or consistency problem needs to be cleaned up.
Some documentation sections are far from clear. For example, the sentence "In the same way a user can submit operations to a remote wavelet, namely by letting the federation proxy connect to the remote federation proxy and submit the operation to its wave server." should obviously read "... letting the federation proxy connect to the remote federation gateway ...". Or not? The preceding text states that a proxy connects to a gateway; there is no mention of a proxy connecting to a proxy or a gateway connecting to a gateway. However, the next paragraph also describes gateways connecting to each other. This needs to be cleaned up or clarified. Conceptual sequence diagrams would help a lot.
The protocol and architecture description needs more pictures. Many more pictures. I would suggest creating figures to illustrate at least the following concepts:
- Architecture big picture, showing all the high-level system components and illustrating their roles and interactions. This is almost mandatory for any architectural description. I'm surprised it is missing.
- Some kind of deployment diagram: how do the wavelet store, wave server, federation proxy and federation gateway relate to each other? Are they under the same organizational control (site) or not? This is important for understanding whether the federation is really a federation.
- Sequence diagrams that illustrate the basic communication exchanges. The single attempt at a sequence diagram that I found is not really sufficient to describe a massively parallel, distributed, federated, real-time, open and insert-your-favorite-buzzword-here system.
Apart from the formal inconsistencies and the difficulty of understanding the architecture, there are deeper concerns. The architecture seems to be problematic in a few aspects.
The Google Wave architecture does not adhere to architectural best practice: it is not minimal. Robots are described as communicating with Wave over HTTP/JSON-RPC (the robot acts as the server), the client apparently communicates over HTTP (as an AJAX application?), while the wave federation protocol is described as XMPP-based. Why do we need so many protocols? Is there any reason why the robot protocol and the client-server protocol need to be different? The non-minimalist approach can be seen in the OT operations as well. The antidocumentelementstart and endantidocumentelementstart operations seem redundant to me. If they are not redundant, their existence should be explained in the architectural documents.
I am a bit worried about Google Wave scalability. Persistent queues are used in federation gateways. This may mean too much state to maintain, too many I/O operations and too many context switches in the implementation. It may scale to several hundred interconnected nodes, but scalability to Internet scale is questionable. A similar concern applies to the use of digital signatures for authenticating wavelet operations, which may be too expensive. Even though hash trees are used, I wonder how this could scale with millions of users writing in real time. It would be nice to have empirical data on the scalability of these mechanisms before going on with the prototype, especially considering that these mechanisms determine some of the basic properties of the system and the protocols.
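Hash trees help because one signature over the tree root covers a whole batch of operations, while each operation can still be checked on its own with a proof whose size grows only logarithmically with the batch. The sketch below is a generic binary Merkle tree over SHA-256, assuming operations are already serialized to bytes; it is not the construction from the Wave documents or the Kissner and Laurie draft, and the signing step itself is left out.

```python
from __future__ import annotations

import hashlib


def _leaf(data: bytes) -> bytes:
    # Domain-separate leaves from inner nodes so one cannot pose as the other.
    return hashlib.sha256(b"\x00" + data).digest()


def _node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def _next_level(level: list[bytes]) -> list[bytes]:
    nxt = [_node(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
    if len(level) % 2 == 1:
        nxt.append(level[-1])  # promote an unpaired node unchanged
    return nxt


def merkle_root(ops: list[bytes]) -> bytes:
    """Root hash of a batch; only this value would need a signature."""
    if not ops:
        raise ValueError("empty batch")
    level = [_leaf(op) for op in ops]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def inclusion_proof(ops: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag says the sibling is on the left."""
    level = [_leaf(op) for op in ops]
    proof = []
    while len(level) > 1:
        sibling = index ^ 1
        if sibling < len(level):
            proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify(op: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    node = _leaf(op)
    for sibling, sibling_is_left in proof:
        node = _node(sibling, node) if sibling_is_left else _node(node, sibling)
    return node == root
```

A receiver that trusts the signed root can check any single operation with `verify(op, inclusion_proof(ops, i), root)`. Even with batching, though, every batch still costs one public-key signature at the sender and one verification at every receiver, so the scalability question above remains open.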
The documents do not mention failure cases. When designing a distributed system of this scale, failure cases are as important as positive use cases. How will the system be affected if one of the wavelet-hosting servers is not available? What happens if the master server for a wave goes down? And can the system work reliably over Internet links with high latency and low reliability?
There is also the question of trust infrastructure. Trust infrastructure is not considered in the Google documents or in the paper draft by Kissner and Laurie. The XMPP specification (RFC 3920) also pushes trust infrastructure outside the scope of the specification. I can feel that a TLS/X.509/DNS combination is somehow (almost silently) assumed. But for Wave to be used as a ubiquitous system, such an infrastructure must exist and be universally available. Will CAs offer Wave (XMPP) certificates? Which CAs will Google accept? Could that lead to monopolization? How much will such a certificate cost? Will it not become a kind of ransom that a site must pay to be able to participate?
Wave is changing paradigms. People can no longer take back what they have released. Even if someone deletes part of a document, the deleted part can be seen in playback. While this "permanent memory" has been there almost since the beginning of the Internet, it was never before real-time. How could we take information back from a wave? Imagine you accidentally type your password into the wave instead of the password input box. It will always be visible. OK, I could change my password, but what about an unfortunate copy-and-paste of a credit card number?
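To see why deleting text does not help, consider a toy model of playback: the server keeps an append-only log of operations, and playback simply replays that log. The sketch assumes plain-text `Insert` and `Delete` operations (hypothetical names); Wave's real document operations are richer, but presumably the principle is similar.

```python
from dataclasses import dataclass
from typing import Iterator, List, Union


@dataclass(frozen=True)
class Insert:
    pos: int
    text: str


@dataclass(frozen=True)
class Delete:
    pos: int
    length: int


Op = Union[Insert, Delete]


def apply(doc: str, op: Op) -> str:
    if isinstance(op, Insert):
        return doc[:op.pos] + op.text + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + op.length:]


def playback(log: List[Op]) -> Iterator[str]:
    """Yield every document version in order, starting from the empty document."""
    doc = ""
    yield doc
    for op in log:
        doc = apply(doc, op)
        yield doc


# Hypothetical history: a password typed into the wave, then deleted.
log = [Insert(0, "my password is hunter2"), Delete(0, 22), Insert(0, "oops")]
versions = list(playback(log))
```

The final version is just `"oops"`, but the second version still contains the password, and anyone who can replay the log can see it. Really removing it would mean rewriting the log itself, which is hard to reconcile with signed operations that have already been federated to other servers.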
But the worst architectural deficiencies of Google Wave go even deeper: Wave is not aligned with WWW architecture and the specific nature of user identities is not considered.
Let's set aside for a while all the deficiencies of the WWW architecture itself and agree that, for better or worse, the WWW architecture is still useful. According to the WWW architecture, agents should provide URIs as identifiers for resources. Waves, wavelets, blips and documents can definitely be regarded as resources, yet the Wave architecture does not assign URIs to them. The Wave specification uses QNames (XML namespaces) a lot, but it does not provide a QName-to-URI mapping as it should. Some problems of the Wave architecture are caused by XMPP, such as violation of URI opacity and URI reuse. The very nature of Wave goes against REST: REST assumes stateless interactions, while Wave is inherently stateful. I don't blame Wave for all those problems. The WWW architecture, XMPP and REST can be guilty as well (and they are). But I would expect a discussion of these problems in the Wave architectural description, along with the reasoning behind the decisions that were taken while designing Wave.
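To illustrate why the QName-to-URI mapping has to be specified rather than left implicit: one common convention (the one RDF uses) is to concatenate the namespace URI and the local name. The sketch below assumes that convention and the ElementTree-style `{namespace}local` notation; the function name and example namespaces are hypothetical.

```python
def clark_to_uri(clark: str) -> str:
    """Map '{namespace}local' to a URI by plain concatenation (RDF-style)."""
    if not clark.startswith("{") or "}" not in clark:
        raise ValueError(f"not a namespaced name: {clark!r}")
    namespace, local = clark[1:].split("}", 1)
    return namespace + local


# Works as expected when the namespace ends with '#' or '/':
blip_uri = clark_to_uri("{http://example.org/wave#}blip")

# ...but two different QNames can collapse into the same URI:
a = clark_to_uri("{http://example.org/ns}ab")
b = clark_to_uri("{http://example.org/n}sab")
```

Here `blip_uri` is `http://example.org/wave#blip`, while `a` and `b` are both `http://example.org/nsab` even though they come from different namespaces. That ambiguity is exactly the kind of detail an architectural description should settle.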
Wave does very little to consider user identities. The demo seems to use only a simple drag-and-drop from a contact list. But how will these contact lists be constructed and maintained? All the documents seem to assume the use of email-like identifiers for users. Will these be global? With all the unfortunate consequences? Could Wave avoid linking user activities at different sites? Does it support pseudonyms, pair-wise identifiers, user privacy controls, anonymous groups, or anything else that can support user control over their personal data? How does Wave plan to defend against spam and phishing? Or do they expect that these will not apply to Wave? That would be really naïve.
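Pair-wise identifiers are one established way to prevent such linking: the identity provider gives each site a different but stable identifier for the same user, so two sites cannot correlate their records by comparing identifiers. A minimal sketch, assuming an HMAC-SHA256 over the user and site names keyed with a secret held only by the provider; the function name, separator and truncation length are illustrative choices, not part of any Wave specification.

```python
import hashlib
import hmac


def pairwise_id(provider_secret: bytes, user: str, site: str) -> str:
    """Stable per-(user, site) identifier; unlinkable across sites without the secret."""
    # A NUL separator keeps ("ab", "c") and ("a", "bc") from producing the same input.
    message = f"{user}\x00{site}".encode("utf-8")
    return hmac.new(provider_secret, message, hashlib.sha256).hexdigest()[:32]


secret = b"provider-only secret"  # hypothetical; never shared with sites
id_at_a = pairwise_id(secret, "alice@example.org", "wave.site-a.example")
id_at_b = pairwise_id(secret, "alice@example.org", "wave.site-b.example")
```

Each site sees a consistent identifier for the user, but `id_at_a` and `id_at_b` look unrelated. The catch is that the provider now has to sit in the middle of every identifier lookup, which is a design decision the Wave documents would need to discuss.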
My last concern is about Wave maintainability. Creating a system is only half of the success: the second half is keeping it operational and efficient. How could Wave handle a change of the participants' domain names? Would they lose all their old waves and drop off their friends' contact lists? Domain names are human-readable, and they do change occasionally. Can Wave handle changes in network topology? For example, moving wave servers here and there without service interruptions? Merging two servers, or splitting a server into several boxes? Into several organizations? Mergers, acquisitions and re-orgs happen all the time ...
Even though I may be giving the impression that I do not like Wave, that is not true. Quite the contrary. I had fun watching the demo, I like the idea, and I think the overall concept is good. However, creating a useful, reliable, real-time, Internet-scale system is a major challenge. The Wave team is obviously up to that challenge, but there is still a long way to go. I would conclude that it is critical for the basic architectural concepts of Wave to be sorted out as soon as possible, especially the alignment with the Web architecture and the concerns related to user identities. Wave technology is undoubtedly attractive, and it will most probably be very successful. However, it could be really harmful for the whole Internet if Wave were deployed in this form, with all its architectural deficiencies. If that happened, the whole system would fall apart in a few years or decades, and it would block further innovation in this area.