However, after spending quite some time working on large distributed applications and gaining a greater appreciation of REST and messaging patterns, I feel that much of what typical web applications want to accomplish with WebSockets (or with socket-like abstractions) is perhaps better solved by other means.
HTTP streaming and Server-Sent Events
WebSockets aren’t the only game in town when it comes to efficiently pushing data to browsers. Consuming an HTTP response of indefinite length with XMLHttpRequest or EventSource/SSE is possible in all modern browsers with minimal hackery. Sure, this is only useful for sending data from the server to the browser, but for a typical application this is the only non-obvious problem that needs solving. Few developers encounter problems sending data from the browser to the server.
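To make the server side of this concrete, here is a minimal Python sketch of how one event might be serialized in the `text/event-stream` format that EventSource consumes. The function name `format_sse_event` is hypothetical. The field names (`id`, `event`, `data`) and the blank-line terminator come from the Server-Sent Events format itself.

```python
import re
from typing import Optional


def format_sse_event(data: str, event: Optional[str] = None,
                     event_id: Optional[str] = None) -> str:
    """Serialize one event in the text/event-stream format read by EventSource.

    Assumes `event` and `event_id` are single-line strings.
    """
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Each payload line becomes its own "data:" field; the browser joins them
    # back together with "\n". CR, LF and CRLF all count as line breaks in the
    # format, so the payload is split on any of them.
    for line in re.split(r"\r\n|\r|\n", data):
        lines.append(f"data: {line}")
    # A blank line marks the end of the event.
    return "\n".join(lines) + "\n\n"
```

A server would write a sequence of these chunks into a response with `Content-Type: text/event-stream`. A line beginning with a colon (for example `: keep-alive`) is a comment that EventSource ignores, which makes it a cheap heartbeat.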
Arnout Kazemier (“WebSuckets”, Primus) recently made this point about EventSource: “It’s there. It’s really cool. It has automatic reconnect. It works through proxies […]. And it’s streaming. So why are we not using it?” Good question.
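The automatic reconnect works because the browser remembers the last `id:` field it received and, when it reconnects, sends that value back in a `Last-Event-ID` request header so the server can resume where it left off; a `retry:` field adjusts the reconnection delay. Below is a simplified Python sketch of those parsing rules, written for illustration rather than taken from any browser. The `SSEParser` and `SSEEvent` names are hypothetical, and the sketch assumes each chunk it is fed ends on a line boundary.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SSEEvent:
    type: str
    data: str
    last_event_id: str


class SSEParser:
    """Simplified text/event-stream parser (illustrative, not a browser's code)."""

    def __init__(self) -> None:
        self.last_event_id = ""  # what a reconnect would send as Last-Event-ID
        self.retry_ms: Optional[int] = None  # reconnection delay, if the server set one
        self._id_buffer = ""
        self._event_type = ""
        self._data: List[str] = []

    def feed(self, text: str) -> List[SSEEvent]:
        """Parse text that ends on a line boundary; return completed events."""
        events = []
        for line in re.split(r"\r\n|\r|\n", text):
            event = self._process_line(line)
            if event is not None:
                events.append(event)
        return events

    def _process_line(self, line: str) -> Optional[SSEEvent]:
        if line == "":
            return self._dispatch()  # a blank line ends the current event
        if line.startswith(":"):
            return None  # comment line, often used as a keep-alive
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is dropped
        if name == "data":
            self._data.append(value)
        elif name == "event":
            self._event_type = value
        elif name == "id" and "\0" not in value:
            self._id_buffer = value
        elif name == "retry" and re.fullmatch(r"[0-9]+", value):
            self.retry_ms = int(value)
        # Fields only update state; unknown field names are ignored.
        return None

    def _dispatch(self) -> Optional[SSEEvent]:
        # The last event ID persists until the server sends a new id field.
        self.last_event_id = self._id_buffer
        data, event_type = self._data, self._event_type
        self._data, self._event_type = [], ""
        if not data:
            return None  # no data lines: nothing is delivered
        return SSEEvent(event_type or "message", "\n".join(data), self.last_event_id)
```

Because the ID outlives individual events, a client that drops its connection mid-stream can reconnect and tell the server exactly which event it saw last.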
Long-polling gets a bad rap. In the early days of the realtime web, long-polling was used mostly as an underlying hack for emulating streams. I was active in the XMPP community back when BOSH was being developed, and indeed we looked at it as a giant hack. All we wanted was a streaming protocol that worked in browsers, and so a solution was mashed over HTTP. Many later stream-emulation efforts follow the same tradition.
The thing is, it’s possible to use long-polling in non-hacky ways. FriendFeed has an intuitive long-polling API, as does WebhookInbox. Livefyre serves its millions of users with a long-polling API. RealCrowd recently wrote about their RESTful Realtime approach. What’s key about all of these services is that none of them uses hacks to emulate streams. When clients make requests to these APIs, what they are really doing is attempting a data synchronization. Since realtime push is often used with the goal of synchronizing data, it can be argued that these kinds of long-polling APIs more accurately reflect what the client is actually trying to accomplish.
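The data-synchronization style of long-polling can be sketched with a cursor: the client asks for everything after position N, and the server either answers immediately or holds the request open until something changes or a timeout passes. Here is a minimal asyncio sketch under that assumption. `ChangeLog`, `append`, and `poll` are hypothetical names, not the API of any of the services above.

```python
import asyncio
from typing import Any, List, Tuple


class ChangeLog:
    """Hypothetical change feed behind a long-polling sync endpoint.

    Create it inside a running event loop (some older Python versions bind
    asyncio primitives to the loop that is current at construction time).
    """

    def __init__(self) -> None:
        self._items: List[Any] = []
        self._changed = asyncio.Condition()

    async def append(self, item: Any) -> None:
        async with self._changed:
            self._items.append(item)
            self._changed.notify_all()  # wake every held long-poll

    async def poll(self, since: int, timeout: float) -> Tuple[List[Any], int]:
        """Return (items after cursor `since`, new cursor).

        Answers immediately if the client is behind; otherwise holds the
        request open until a change arrives or `timeout` seconds pass.
        """
        async with self._changed:
            try:
                await asyncio.wait_for(
                    self._changed.wait_for(lambda: len(self._items) > since),
                    timeout,
                )
            except asyncio.TimeoutError:
                pass  # nothing new: reply with an empty list and the same cursor
            return self._items[since:], len(self._items)
```

An HTTP handler for a hypothetical `GET /changes?since=N` route would call `poll()` and return the items together with the new cursor, and the client would then poll again with that cursor. Because every response carries a cursor, a dropped connection loses nothing: the next request simply resumes from the last cursor the client saw.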
The main criticism of long-polling is the need for the client to repeatedly hit the server. However, even with a WebSocket you’ll almost certainly want some kind of periodic heartbeat/ping in order to detect network failures. Provided long-polling is used with HTTP persistent connections, so that each new poll reuses an open connection instead of setting up a fresh one, the network traffic comes out about the same as with a WebSocket.
Let me guess: you thought long-polling was legacy junk kept around to support users of old versions of IE? On the contrary, long-polling is great for many reasons that have nothing to do with supporting legacy browsers.
Maybe I’ve spent too long drinking the ZeroMQ kool-aid, but my feeling is that network applications are easier to design in terms of messaging patterns than in terms of bidirectional streams. What’s a messaging pattern? Perhaps the most common example is the request-response pattern, as this is how HTTP traditionally works. Ever notice that when you make an AJAX request from the browser, you don’t have to think about TCP? The underlying complexity is abstracted away. Maybe a new TCP connection was made for the request. Maybe a persistent connection was reused. Maybe the request or response spanned multiple IP packets. These things don’t matter to the application, which only concerns itself with sending requests and receiving responses. Socket libraries for the browser tend to reintroduce the notion of “connections” to applications, but this is a step backwards. Connections are often an unimportant detail.
I’d go as far as to say that web applications (or any similar thin clients) really only need request-response and publish-subscribe messaging patterns to get the job done. Any bidirectional stream layer in this context is useful only inasmuch as it can facilitate these patterns. Since request-response is already possible with conventional HTTP alone, it is only for publish-subscribe that we need alternatives.
The federated web generally works with these two patterns as well. Most interactions between servers occur as conventional request-response HTTP, and HTTP callbacks (aka Webhooks) are often used in a publish-subscribe fashion for realtime updates. Browsers can’t receive callbacks the way servers can; they must reach out to publishers in order to receive anything. Even so, the publish-subscribe pattern still applies.
Note that I’m not saying everything we ever do must be based on HTTP. I’m a long-time contributor to the XMPP standard and I’ll be the first to preach about alternative protocols. It just turns out that mapping certain messaging patterns to HTTP is more workable than developers may realize.
I tend to agree with Anant Narayanan (of Firebase), who says that most developers using realtime messaging in the browser are really trying to solve synchronization problems. His argument is that WebSockets are a primitive that most applications shouldn’t need to use directly, and that developers would be much better off with an API based around data sync that potentially uses WebSockets under the hood.
With this in mind, why do we (as a community) try so hard to emulate streams on browsers that don’t support WebSockets? If we’re going to go through the trouble of hacking something over HTTP, I say we just skip to the end and emulate the interface we actually want, such as publish-subscribe or data sync, rather than emulating a stream which we’ll inevitably be layering something else on top of.
I created the Fanout PubSub Protocol with this philosophy in mind. It is a clean publish-subscribe protocol based on long-polling, with no intermediate connection-oriented bidirectional streaming layer. This makes it lighter and less complex than similar publish-subscribe protocols running over stream emulation (e.g. Meteor DDP running over SockJS in long-polling mode), while, with few exceptions, performing nearly as well as anything running over a native WebSocket.
Of course, publish-subscribe messaging is still considered a primitive in the system Anant describes. That’s okay. Data synchronization can be built on top of this pattern without an emulated stream underneath, and it will be more efficient for it.
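As an illustration of sync built on publish-subscribe, here is a minimal Python sketch of a client-side replica that applies published deltas. It assumes, hypothetically, that each delta carries a sequence number. If the client notices a gap, it recovers with an ordinary request-response fetch of a snapshot rather than relying on a stream to guarantee delivery. `Replica`, `apply`, and `load_snapshot` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Replica:
    """Client-side copy of a keyed dataset kept in sync from published deltas.

    Hypothetical message shape: each delta has a version number that increases
    by one per change, a key, and a new value (None meaning delete).
    """
    version: int = 0
    data: Dict[str, str] = field(default_factory=dict)

    def apply(self, version: int, key: str, value: Optional[str]) -> bool:
        """Apply one delta; return False if a gap means a resync is needed."""
        if version <= self.version:
            return True  # duplicate or stale delivery: already applied, ignore
        if version != self.version + 1:
            return False  # missed a message: fetch a snapshot with a plain GET
        if value is None:
            self.data.pop(key, None)
        else:
            self.data[key] = value
        self.version = version
        return True

    def load_snapshot(self, version: int, data: Dict[str, str]) -> None:
        """Replace local state with a snapshot fetched via request-response."""
        self.version = version
        self.data = dict(data)
```

The design choice here is that publish-subscribe only has to be fast, not perfectly reliable: duplicates are harmless and gaps are repaired with a normal HTTP request, so no emulated stream is needed to keep the replica correct.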
Maybe you do need a WebSocket
Yes, there are certain instances in which a WebSocket provides significant benefit over the status quo.
The client needs to send data to the server rapidly. Theoretically one might be able to use XMLHttpRequest to upload an indefinite stream of data, but I’d be surprised if browsers and servers supported this well. Note that if the server has lots of data to send rapidly to the client, then a WebSocket isn’t strictly needed, as rapid transmissions are possible with EventSource.
Stateful connection protocol. Having inbound and outbound data ride over different transports can sometimes be problematic. Maybe you have some kind of sticky sessions and need a way for a client to always send data to the same server that it is receiving data from. I’ll admit there are efficiencies to be gained by this approach, even if the web has gotten by this long without needing it.
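For comparison, one common way to get sticky behavior over plain HTTP is to route every request from a given client by a hash of a client identifier, so that its long-polls and its POSTs land on the same server. Here is a minimal Python sketch using rendezvous (highest-random-weight) hashing. `pick_backend` and the client and backend names are hypothetical; in practice this logic would usually live in a load balancer.

```python
import hashlib
from typing import Sequence


def pick_backend(client_id: str, backends: Sequence[str]) -> str:
    """Choose a backend for a client via rendezvous (highest-random-weight) hashing.

    The same client_id always maps to the same backend for a given set of
    backends, regardless of the order in which they are listed.
    """
    if not backends:
        raise ValueError("no backends available")

    def score(backend: str) -> int:
        # Deterministic pseudo-random weight for this (client, backend) pair.
        digest = hashlib.sha256(f"{client_id}|{backend}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(backends, key=score)
```

Rendezvous hashing has the useful property that removing a server only moves the clients that were assigned to that server; everyone else keeps their existing mapping.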
Communication is naturally streaming. Maybe you’re trying to implement a special protocol within the browser, and this protocol is intended to run over a bidirectional stream. An example of this would be XMPP over WebSockets. In this case using an emulated stream (e.g. SockJS) in place of a WebSocket is justifiable, as a stream is the most natural transport for XMPP. A REST binding for XMPP would be pretty cool, though!
Know your options and understand the technology. For many folks, “realtime” is synonymous with WebSockets and anything else is branded as legacy. In my opinion, very few applications absolutely need a WebSocket, and so the associated complexity may be unnecessary. Think about the problem you are trying to solve, think in terms of messaging patterns, and think about data synchronization. You may find a clean, HTTP-based approach to be more suitable.