Wafting bits in your general direction

Standards at the edge

Posted by justin on October 17th, 2012. Filed under Fanout

As I’ve been a long-time contributor to XMPP, people who know me are generally surprised to discover that Fanout.io is not in some way centrally powered by an XMPP server. XMPP is often touted as a solution for realtime communication among servers in a cluster. This suggestion has merit, and many services operate this way today, but the reality is that many applications need more from messaging than a typical XMPP software stack provides. I’m talking about real queuing patterns, such as worker balancing, reliability, flow control, “job” queues, etc.

This is not to say XMPP is in any way deficient. It’s an interface protocol that does exactly what it was designed to do: enable the exchange of structured data using a logical addressing scheme (i.e. JIDs as opposed to physical IP addresses). There is enormous power in this, and in fact XMPP could be utilized as the foundational protocol of a message queuing solution. For better or worse, though, all current message queuing solutions are based on other protocols (typically solution-specific or proprietary protocols). I doubt this is the result of any real intent by the developers of these messaging solutions; it’s just the way history happened to unfold. Is there any reason the AMQP protocol couldn’t have just been a layer on top of XMPP? I suspect probably not, though I’ll admit I haven’t given it any real thought. That said, even if XMPP were used as the basis of a queuing solution, it still wouldn’t change the fact that XMPP itself is not a queuing solution. It’s a data transport, somewhat of a sibling to HTTP in the network stack. The fact that Subversion is founded on HTTP doesn’t mean HTTP itself is a version control solution, right? HTTP and XMPP each do what they do, and you can build amazing things with either of them.

There are good reasons to involve XMPP in your project. Maybe you just want a simple way to get messages around because you’re using BOSH. Maybe you want to interface with IM applications or federate to remote domains. My advice, though, is to not end up building a message queuing system once your requirements increase. This is the challenge I faced during the early years of being CTO at Livefyre. We had many independent components running on different machines, accessible via JIDs. This worked great for inter-domain communication, and it was all part of my master plan of growing Livefyre into a set of internet-wide federated clusters. The approach is not terribly different from the way other federated networks have been attempted, such as Google Wave or Buddycloud. However, simply being able to get messages around was not enough to meet the functional needs inside of each individual cluster. Within a cluster, there were important background tasks that needed to be reliably handled. Posting a comment should add the comment to the database, push live updates to users, check the comment for spam, send email notifications (or append to a digest email to be sent later), etc. This chain of events needed to be delegated across worker objects and be able to survive crashes and restarts. What we really needed was a job queue.
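To make "survive crashes and restarts" concrete, here is a minimal sketch of a durable job table. It is not Livefyre's code: it uses Python's built-in sqlite3 in place of a real database server, and the step names, table layout, and function names are all made up for illustration.

```python
import sqlite3

# Hypothetical step names for the comment-posting chain described above.
STEPS = ["store_comment", "push_live_update", "check_spam", "send_email"]


def open_queue(path=":memory:"):
    """Open (or create) the durable job table. Use a file path to persist it."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY,"
        " comment_id INTEGER NOT NULL,"
        " step TEXT NOT NULL,"
        " done INTEGER NOT NULL DEFAULT 0)"
    )
    db.commit()
    return db


def enqueue_comment(db, comment_id):
    """Record every step for a comment before any work starts."""
    db.executemany(
        "INSERT INTO jobs (comment_id, step) VALUES (?, ?)",
        [(comment_id, step) for step in STEPS],
    )
    db.commit()


def run_pending(db, handler, limit=None):
    """Run unfinished jobs in insertion order and return the steps completed.

    A job is marked done only after its handler returns. If the handler
    raises (or the process dies), the row stays at done = 0, so the next
    call picks up where this one stopped.
    """
    rows = db.execute(
        "SELECT id, comment_id, step FROM jobs WHERE done = 0 ORDER BY id"
    ).fetchall()
    completed = []
    for job_id, comment_id, step in rows[:limit]:
        handler(comment_id, step)
        db.execute("UPDATE jobs SET done = 1 WHERE id = ?", (job_id,))
        db.commit()
        completed.append(step)
    return completed
```

Calling `enqueue_comment(db, 42)` and then `run_pending(db, handler)` works through the four steps in order. If the handler fails partway, a later `run_pending` call resumes with the first unfinished step. A crash between the handler returning and the update being committed would cause that step to run again, so handlers in a design like this should be safe to repeat.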

Initially we developed our own job queue using MySQL for durable storage and XMPP for communication. While this got the “job” done, it was clear that maintaining this reinvented wheel was a time suck. By mid-2011 we completely transitioned to Celery for job queuing. At first, this was troubling. I’m a standards guy. It was good peace of mind to know that we could substitute our use of the Tigase XMPP server at any time with alternative XMPP software. We never did this, but the point is that we could have. It’s just like how Apache could be substituted for Nginx if we needed to. No lock-in. The switch to Celery was crucial for engineering efficiency though, and worth trading away whatever kind of protocol purity I was hanging on to. Eventually, I grew to fully embrace this change. Along the way, we introduced other tools whose future substitution would probably be difficult (e.g. Redis), but they were all just so useful that I gave up on my idea that standards on the internal network are sacred. Having spent a decade in standards, it was in my very being to force them everywhere. Now I believe that the standardness of how things get around within a cluster is not terribly important. It is an implementation-specific detail. The much greater need for standards is in the way the cluster communicates with the outside world. Standards at the edge.

Just look at how we tend to evaluate the use of standards by other companies. We care most when those standards interface outward in some way, and the more openly they do so the better. Google using XMPP to interface with remote IM servers? Very interesting. Facebook using XMPP to interface with third-party IM clients? Interesting, but not quite as interesting as Google federating. Yahoo using XMPP internally to power a realtime website? Totally uninteresting. My advice is to fight the standards fight where it matters most, and when it comes to your internal implementation choose the best tool for the job. If standards work best, then great. You’ll be more agile for it. But if they don’t work best, then they don’t.

Unfortunately, since Livefyre’s federation feature was never fully completed, we reached a point where XMPP was no longer used to accomplish any task in production. Thus, we completely removed support for the protocol to avoid the burden of maintaining unused functionality. At least this was still the case as of September of last year. I do hope someday the company embraces more open standards and leads the charge for federation among commenting services. An idea ahead of its time perhaps.

When I set out to build Fanout, I designed the architecture with this same practical mentality. ZeroMQ is used for internal communication among server nodes, which gives us the queuing features we need and makes certain scalability/availability aspects easier as there is no central message broker. Sure, ZeroMQ only has one implementation (and even protocol breakage in the next version), but nobody outside of Fanout has to know or care about this. Fanout has a first-class XMPP interface to the outside world, on par with its HTTP interface. I suppose I cannot say Fanout is “powered by XMPP” then, a term I’d used in the past when describing Livefyre. But what does “powered by XMPP” really mean anyway? That’s like saying your cluster is “powered by HTTP”. It’s kind of a meaningless statement when you think about it. The cluster is powered by implementations, not protocols, and how much does it matter that a stock XMPP server literally be used as the fundamental message bus? It doesn’t. What’s important is that XMPP be served to the outside world at all, and indeed Fanout deploys Tigase to its edge nodes. I believe this is the ideal way to build amazing, highly available, and scalable services that federate to XMPP, without compromise.


9 comments
dwd

Open standards are about interop, not implementation - so the most important thing is that anyone can interop with your service using known standards.

That said, XMPP can, and often does, make for a great middleware service as well, and this use-case is important too, as it gives you that ability to side-step vendor lock-in which is vital for agility. Assuming you're in Portland next week, we should sit down - I'd say in a corner, but let's be brave and sit in the middle of the room - and chat about whether there's something we can do here.

aung

I am a beginner in XMPP development so please bear with me if I don't make sense. I'm assuming that XMPP already has some sort of internal message queue. I'm designing a system that involves different modules implemented as XMPP components and communicating with each other via XMPP. I've thought about using RabbitMQ, but thought I'd try using XMPP as the message queue itself.

ben moderator

There is still hope.

justin moderator

@dwd Yup, I'll be there. This would be a great topic at the summit.

justin moderator

@aung "Message queuing" as a concept refers to a number of queuing/delivery patterns. If you just need to get messages around to independent, singular components then you may be fine with an out of the box XMPP server. But, say for example that you want to have two instances of the same component for load balancing purposes. In that case you'd want a way for a sender of a message to not have to care about which of the two instances receives the message. Some logic in the delivery layer would ensure messages are evenly distributed between the two. Maybe if both instances are too busy to handle more messages then you want some backpressure to stop or block transmission at the sender side.
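As a rough illustration of that pattern (not XMPP, and not any particular broker), here is a sketch using only Python's standard library. Two worker threads pull from one bounded queue, so the sender never picks a worker, and a full queue pushes back on the sender. The function names and the queue size are arbitrary choices for the example.

```python
import queue
import threading


def _work(jobs, name, results):
    """Worker loop: take whatever message comes next from the shared queue."""
    while True:
        message = jobs.get()
        if message is None:              # sentinel: shut this worker down
            jobs.task_done()
            return
        results.append((name, message))  # record which worker handled it
        jobs.task_done()


def start_workers(jobs, names, results):
    """Start one thread per worker name; all of them share the same queue."""
    threads = []
    for name in names:
        t = threading.Thread(target=_work, args=(jobs, name, results), daemon=True)
        t.start()
        threads.append(t)
    return threads


def send(jobs, message, timeout=None):
    """Enqueue a message without naming a worker.

    Blocks while the queue is full. Returns False if it is still full
    after `timeout` seconds, which is the backpressure signal.
    """
    try:
        jobs.put(message, timeout=timeout)
        return True
    except queue.Full:
        return False
```

With `jobs = queue.Queue(maxsize=2)`, a third `send(jobs, "c", timeout=0)` returns False until a worker has drained something. Note that a shared queue hands work to whichever worker is free next, which balances load but does not guarantee a strictly even split.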

You can certainly write code for this kind of behavior over XMPP. I initially wrote some custom "virtual JID" routing code within the Tigase server such that senders could send to a JID that represented one or more potential worker components. The router would then forward to a more specific worker JID. If a worker was down or unavailable, the router would respond with errors such that senders could hold and retry. This is all fine and good, but my point is that something like RabbitMQ already handles these kinds of behaviors, and so if these behaviors are needed then a better use of time would be to just use what is available rather than writing a bunch of code.
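For a sense of what such routing involves, here is a small sketch of the virtual JID idea in plain Python. It is not the Tigase code mentioned above: the class, the exception, and the example JIDs are hypothetical, and an exception stands in for the error reply a real router would send back to the sender.

```python
class NoWorkerAvailable(Exception):
    """Stands in for the error a router would return so the sender can retry."""


class VirtualJidRouter:
    """Maps a virtual JID to a pool of worker JIDs and picks one round-robin."""

    def __init__(self):
        self._pools = {}         # virtual JID -> list of worker JIDs
        self._cursor = {}        # virtual JID -> index to try first next time
        self._available = set()  # worker JIDs currently able to take messages

    def register(self, virtual_jid, worker_jid):
        """Add a worker to the pool behind a virtual JID and mark it available."""
        self._pools.setdefault(virtual_jid, []).append(worker_jid)
        self._available.add(worker_jid)

    def set_available(self, worker_jid, available):
        """Mark a worker as up or down."""
        if available:
            self._available.add(worker_jid)
        else:
            self._available.discard(worker_jid)

    def route(self, virtual_jid):
        """Return the next available worker JID, skipping workers that are down."""
        pool = self._pools.get(virtual_jid, [])
        start = self._cursor.get(virtual_jid, 0)
        for offset in range(len(pool)):
            index = (start + offset) % len(pool)
            if pool[index] in self._available:
                self._cursor[virtual_jid] = (index + 1) % len(pool)
                return pool[index]
        raise NoWorkerAvailable(virtual_jid)
```

A sender would address `spam@workers.example` (a made-up virtual JID), the router would forward to something like `spam-1@workers.example`, and a `NoWorkerAvailable` would tell the sender to hold the message and try again later. Even this toy version leaves out retries, persistence, and flow control, which is the kind of code an existing broker saves you from writing.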

ben moderator

@aung My understanding is that XMPP, being only a protocol, does not provide any sort of internal message queue. Out of the box XMPP servers like ejabberd or Tigase may use message queues internally (as may your apps), but nothing about the protocol specification itself involves a message queue implementation.

justin moderator

@ben Anything in the works that you can talk about?

aung

@ben @aung Thanks for the clarification. I suppose I confused the XMPP protocol with the actual implementation. I'm using ejabberd for the server.

ben moderator

@justin et al: Having worked in Engineering and now Product there for some time now, I can say that Livefyre is always and increasingly committed to serving as a platform for creative, conversational development. Just this week we released an example of what is possible with one of the newer APIs StreamHub provides, the Heat API that answers the question "Where are people talking?" - http://gobengo.github.com/fyre-hottest-tiled/

This generation of (XML-less) social web standards is still only beginning to develop. ActivityStrea.ms, PuSH(+JSON), OAuth, Salmon, Webfinger, RDFa, and more were extremely nascent a year and a half ago when @justin and I spent so many late nights researching, and they still have a long way to go. Semantics are unclear, specifications are still being drafted, and implementations still widely differ.

But we have been watching closely, learning a lot along the way. And it's exciting. Standards are important. So are reliability, efficiency, maintainability, user experience, and customer satisfaction. I can promise that Livefyre will keep all of these in mind as StreamHub grows and develops.

This was a great post.