Since I’ve been a long-time contributor to XMPP, people who know me are generally surprised to discover that Fanout Cloud is not centrally powered by an XMPP server in some way. XMPP is often touted as a solution for realtime communication among servers in a cluster. The suggestion is not without merit, and many services operate this way today, but the reality is that many applications have greater messaging needs than your typical XMPP software stack provides. I’m talking about real queuing patterns: worker balancing, reliability, flow control, “job” queues, and so on.

This is not to say XMPP is in any way deficient. It’s an interface protocol that does exactly what it was designed to do: enable the exchange of structured data using a logical addressing scheme (i.e. JIDs as opposed to physical IP addresses). There is enormous power in this, and in fact XMPP could be used as the foundational protocol of a message queuing solution. For better or for worse, though, all current message queuing solutions are based on other protocols (typically solution-specific or proprietary ones). I doubt this is the result of any real intent by the developers of these messaging solutions; it’s just the way history happened to unfold. Is there any reason the AMQP protocol couldn’t have been a layer on top of XMPP? I suspect not, though I’ll admit I haven’t given it any real thought. That said, even if XMPP were used as the basis of a queuing solution, it still wouldn’t change the fact that XMPP itself is not a queuing solution. It’s a data transport, something of a sibling to HTTP in the network stack. The fact that Subversion is built on HTTP doesn’t mean HTTP itself is a version control solution, right? HTTP and XMPP each do what they do, and you can build amazing things with either of them.

There are good reasons to involve XMPP in your project. Maybe you just want a simple way to get messages around because you’re using BOSH. Maybe you want to interface with IM applications or federate to remote domains. My advice, though, is to avoid backing yourself into building a message queuing system once your requirements increase. This is the challenge I faced during the early years of being CTO at Livefyre. We had many independent components running on different machines, accessible via JIDs. This worked great for inter-domain communication, and it was all part of my master plan of growing Livefyre into a set of internet-wide federated clusters. The approach is not terribly different from the way other federated networks have been attempted, such as Google Wave or Buddycloud. However, simply being able to get messages around was not enough to meet the functional needs inside of each individual cluster. Within a cluster, there were important background tasks that needed to be handled reliably. Posting a comment should add the comment to the database, push live updates to users, check for spam, send email notifications (or append to a digest email to be sent later), etc. This chain of events needed to be delegated across workers and to survive crashes and restarts. What we really needed was a job queue.

Initially we developed our own job queue, using MySQL for durable storage and XMPP for communication. While this got the “job” done, it was clear that maintaining this reinvented wheel was a time suck. By mid-2011 we had completely transitioned to Celery for job queuing. At first, this was troubling. I’m a standards guy. It was good peace of mind to know that we could swap out the Tigase XMPP server at any time for alternative XMPP software. We never did, but the point is that we could have, just as Apache could be swapped in for Nginx if we needed to. No lock-in. The switch to Celery was crucial for engineering efficiency, though, and worth trading away whatever kind of protocol purity I was hanging on to. Eventually, I grew to fully embrace this change. Along the way, we introduced other tools whose future substitution would probably be difficult (e.g. Redis), but they were all so useful that I gave up on my idea that standards on the internal network are sacred. Having spent a decade working on standards, I had it in my very being to force them everywhere. Now I believe that the standardness of how things get around within a cluster is not terribly important. It is an implementation-specific detail. The much greater need for standards is in the way the cluster communicates with the outside world. Standards at the edge.
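To make the shift concrete, here is a minimal sketch of the kind of comment-processing pipeline described above, expressed as Celery tasks. The task names, broker URL, and stubbed-out bodies are my own illustration, not Livefyre’s actual code; the point is simply that the broker holds each job durably and any available worker can pick it up.

```python
# Hypothetical Celery pipeline for handling a newly posted comment.
# Task names and the broker URL are illustrative only.
from celery import Celery, chain

app = Celery('comments', broker='amqp://localhost')

@app.task(acks_late=True)   # if a worker dies mid-task, the job is re-delivered
def store_comment(comment):
    # write the comment to the database (stubbed out)
    return comment

@app.task
def push_live_update(comment):
    # push a live update out to connected users (stubbed out)
    return comment

@app.task
def check_spam(comment):
    # run the comment through a spam filter (stubbed out)
    return comment

@app.task
def send_notifications(comment):
    # send email notifications, or append to a digest for later (stubbed out)
    return comment

def on_comment_posted(comment):
    # Queue the whole chain; each step runs on whichever worker picks it up,
    # and jobs survive crashes and restarts because the broker holds them
    # until they are acknowledged.
    chain(store_comment.s(comment),
          push_live_update.s(),
          check_spam.s(),
          send_notifications.s()).delay()
```

Compared with a hand-rolled MySQL-plus-XMPP queue, things like acknowledgements, retries, and worker balancing come along with the tool rather than needing to be maintained in-house.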

Just look at how we tend to evaluate the use of standards by other companies. We care most when those standards interface outward in some way, and the more openly they do so, the better. Google using XMPP to interface with remote IM servers? Very interesting. Facebook using XMPP to interface with third-party IM clients? Interesting, but not quite as interesting as Google federating. Yahoo using XMPP internally to power a realtime website? Totally uninteresting. My advice is to fight the standards fight where it matters most, and when it comes to your internal implementation, choose the best tool for the job. If standards work best, then great; you’ll be more agile for it. But if they don’t work best, then they don’t.

Unfortunately, since Livefyre’s federation feature was never fully completed, we reached a point where XMPP was no longer used to accomplish any task in production. Thus, we completely removed support for the protocol to avoid the burden of maintaining unused functionality. At least this was still the case as of September of last year. I do hope someday the company embraces more open standards and leads the charge for federation among commenting services. An idea ahead of its time perhaps.

When I set out to build Fanout, I designed the architecture with this same practical mentality. ZeroMQ is used for internal communication among server nodes, which gives us the queuing features we need and makes certain scalability and availability aspects easier, since there is no central message broker. Sure, ZeroMQ has only one implementation (and its protocol even breaks in the next version), but nobody outside of Fanout has to know or care about this. Fanout has a first-class XMPP interface to the outside world, on par with its HTTP interface. I suppose I cannot say Fanout is “powered by XMPP” then, a term I’d used in the past when describing Livefyre. But what does “powered by XMPP” really mean anyway? That’s like saying your cluster is “powered by HTTP”. It’s kind of a meaningless statement when you think about it. The cluster is powered by implementations, not protocols, and how much does it matter that a stock XMPP server literally be used as the fundamental message bus? It doesn’t. What’s important is that XMPP be served to the outside world at all, and indeed Fanout deploys Tigase to its edge nodes. I believe this is the ideal way to build amazing, highly available, and scalable services that federate with XMPP, without compromise.
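For a sense of what brokerless internal messaging looks like, here is a minimal PUSH/PULL sketch using pyzmq. The socket pattern, endpoints, and message format are assumptions chosen for illustration; they are not a description of Fanout’s actual internal wiring.

```python
# Illustrative brokerless messaging with ZeroMQ: nodes connect directly to
# each other, with no central broker in the middle. Endpoints and the
# message format are hypothetical.
import zmq

def run_worker(bind_addr="tcp://*:5557"):
    # Each worker node binds a PULL socket and handles whatever arrives.
    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.PULL)
    sock.bind(bind_addr)
    while True:
        msg = sock.recv_json()
        print("handling", msg)

def send_to_cluster(peer_endpoints, msg):
    # A PUSH socket connected to several peers load-balances messages
    # across them, giving simple worker balancing without a broker.
    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.PUSH)
    for ep in peer_endpoints:
        sock.connect(ep)
    sock.send_json(msg)

# Example:
# send_to_cluster(["tcp://node1:5557", "tcp://node2:5557"],
#                 {"channel": "test", "data": "hello"})
```

Because every node talks directly to its peers, there is no single broker process to scale or fail over, which is the scalability and availability benefit mentioned above.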