Friday, May 23, 2008

Comments on the Twitter Outage discussions

Given Twitter's popularity, the outages of the last couple of days have generated a lot of commentary across the blogosphere and the tech news communities. The article at Tech Crunch is typical of the attempts to characterise the issues, and it really annoys me.

The source that I spoke to also commented on how ill-prepared the Twitter team were and are for their current and future challenges. The small team contains a handful of engineers, with only a person or two committed to infrastructure and architecture. He goes on to point out that at Digg the team for network and systems alone is bigger than the total engineering team at Twitter, and that at Digg they are lead by well-known “A-list rockstars”.

[From Twitter At Scale: Will It Work?]

This type of journalism really irks me. It's very difficult for anyone to form a reasonable opinion from this kind of reporting, because it rests entirely on hearsay, with no way of validating the comments. Critically, the "expert source" is not named, nor quoted in any way that gives the reader any hope of checking the opinion. If a source is not prepared to be named and to stand up as an expert, so that the rest of us can judge their credentials, we really have no option but to disregard the information entirely. Neither the quality of the opinion nor the accuracy of the assessment can be determined.

To discuss the ideas from the unnamed source: it's easy to think that all of a company's problems can be solved by "Big Engineering", but often that simply adds complexity to the situation. Twitter already seems to have a project under way to improve the site. Adding people to a development team halfway through a project that is obviously already in train just does not work. Twitter themselves describe a process that is already under way:

Essentially what has been happening is that we've been trying to make changes in order to improve the long-term reliability of the service. Those changes have introduced instability in the short-term, however. We need to be able to make these kinds of changes and do so without affecting the service. That's our goal and what we're working toward.

[From May 20: Twitter Downtime]

As often as not, a small but really focussed team can do wonders. Just because an application handles a large-scale problem doesn't mean the team has to be big.

Tech Crunch goes on to talk about Ruby on Rails.

The problems at Twitter are often attributed to their use of RubyOnRails, a web development framework. Twitter is almost certainly the largest site running on Rails, so fans of the framework and its developers have been quick to deflect the criticism and point it back at the engineers at Twitter. Utilizing a framework that has never conquered large-scale territory must certainly add to the risk and work required to find a solution. As an out-of-the box framework, Rails certainly doesn’t lend itself to large-scale application development, but was a big part of the reason why Twitter could experiment and release early.

Rails has enabled Twitter to prototype quickly, to quickly launch and then to easily iterate with new features. But the old adage of “Good, Fast, Cheap - pick two” certainly applies; and Rails would do itself no harm by conceding that it isn’t a platform that can compete with Java or C when it comes to intensive tasks. Twitter is at a cross-roads as an application and Rails has served its purpose very well to date, but you are unlikely to see a computational cluster built with Ruby at Apache any time soon.

[From Twitter At Scale: Will It Work?]

This comment is simply misleading. Apache is a web server, often used in conjunction with the other systems that run sites of this nature; I'm sure Apache is running somewhere in Twitter's setup too. The Apache website is simply where you get the software. Whether we are likely to see a computational cluster built with Ruby at Apache any time soon has no bearing whatsoever on whether Twitter is well engineered.

In conclusion, I have no idea whether the issues at Twitter are the result of questionable engineering decisions. The problem, however, is that the FUD from sites like Tech Crunch (which I would expect to have at least made an effort to research this properly) doesn't tell me anything more about the situation.
