AWS EBS Outage at US-EAST-1

I am the CTO of a startup running on AWS, and this morning we suffered an EBS outage in the US-EAST-1 region.

About 2 weeks ago we moved one of our services to AWS following an outage on a dedicated server. At the time we had no hot backup and were down for too many hours, even though our (well-known) provider had told us by email just a few weeks earlier to expect a 1 hour recovery time.

For our new setup I provisioned 2 servers with MySQL Master-Slave replication, one on the East coast, the other one on the West coast. We switched over before the setup was entirely finalized for monetary reasons (that's the life of a startup)...

Early this morning our master database, hosted on EBS in US-EAST-1, stopped responding. I could log in to the servers, but the database on EBS was no longer processing requests. There was no way to recover the service, so I decided to switch over to the slave server in US-WEST-1.

It was much harder than anticipated because our setup and disaster recovery plan were not fully in place, but none of our customers have reported their service down. We were supposed to do our first drill over the weekend; instead we got a live opportunity to test the new setup, and it was very stressful given the nature of this service.

We had proper documentation that was crucial for the fast and full recovery.

The AWS management console worked flawlessly during the incident, although we could not take snapshots on the East coast because the EBS volumes were not reachable.

Overall I am very happy with the end result, as it rewarded a lot of hard work. Cloud or not, AWS is neither better nor worse than other solutions from a reliability standpoint. But at least AWS makes it somewhat easy to set up servers across different regions, whereas the other provider required that we have both our servers in the same rack!

The ability to start new instances almost instantly also proved useful in this outage. So overall I can say that Cloud Computing helped us from an availability standpoint.

Startups have to build for failures: hardware, network, power, data centers, but also human error (a drop table replicated to the slave server, oops) and vandalism (deleted snapshots and everything else, #@!t).

Reducing Authorization Latency

Authorizing user access to resources adds latency because the authorization needs to be completed before resource access is performed.

This Authorization Latency increases with the complexity of the authorization scheme. In most cases authorization requires database access, which introduces significant latency to every request on a server, even if the authorization database runs on a different server from the "business" database.

A complex authorization scheme would typically require several joins. The following SQL query retrieves the list of resources and associated roles a user has in a system where authorization rules define roles for user groups on resource groups: 

SELECT groups.resource_id, rules.role_id
FROM       authorization_rules as rules
INNER JOIN resources
ON resources.id = rules.resource_id
INNER JOIN resource_groups     as groups
ON groups.id = resources.group_id
INNER JOIN user_groups_users   as users
ON users.user_group_id = rules.user_group_id
WHERE ( users.user_id = $user_id )

The above authorization request joining four tables is likely to take more time than many "business" queries. Despite this complexity, this model does not allow defining groups of groups for users and resources.

Authorization requests succeed 99.999% of the time because, in a properly designed system, users are authorized to perform what they request. Waiting for the authorization before doing any work is therefore unnecessary latency most of the time.

The proposed alternative is to perform authorization in parallel with the execution of the request. After both have completed, the server checks the result of the authorization to either forward or drop the response.

User authentication should probably still be performed before everything else to prevent denial of service attacks. Authenticated users are much less likely to attempt unauthorized access than non-authenticated users.

An implementation of parallel authorization should be straightforward for read requests. The authorization spawns a thread or coroutine before the request is executed to fetch the response:

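// start the authorization in a separate thread or coroutine
thread_id = user.authorized( resource, "read" )
// fetch the response while the authorization is running
response = resource.read()
// wait for the authorization result before releasing the response
if not thread_id.join().authorized {
  throw( "Not Authorized" )
}
return response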
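As a minimal runnable sketch of the pseudocode above, here is one way this could look in Python with a thread pool. The names PERMISSIONS, DOCUMENTS, is_authorized and read_resource are hypothetical stand-ins for the authorization database and the "business" database, not part of any real system:

```python
from concurrent.futures import ThreadPoolExecutor

EXECUTOR = ThreadPoolExecutor(max_workers=4)

# Hypothetical stand-ins for the authorization store and the "business" store.
PERMISSIONS = {("alice", "report-1", "read")}
DOCUMENTS = {"report-1": "Q1 numbers"}


class NotAuthorized(Exception):
    """Raised when the authorization check fails."""


def is_authorized(user, resource_id, action):
    # Stands in for the slow multi-join authorization query.
    return (user, resource_id, action) in PERMISSIONS


def read_resource(resource_id):
    # Stands in for the "business" query.
    return DOCUMENTS[resource_id]


def read_with_parallel_authorization(user, resource_id):
    # Start the authorization first so it runs while the read executes.
    authorization = EXECUTOR.submit(is_authorized, user, resource_id, "read")
    response = read_resource(resource_id)
    # Release the response only once the authorization result is known.
    if not authorization.result():
        raise NotAuthorized(f"{user} may not read {resource_id}")
    return response


print(read_with_parallel_authorization("alice", "report-1"))  # prints: Q1 numbers
```

The response is fetched in every case but only leaves the function after the authorization has succeeded, so the total latency is roughly the slower of the two operations rather than their sum.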

A more complex read scenario is where a number of records could be returned from the database but the user only has read access to some of them. The authorization then returns a list of authorized resource identifiers, and the results must be filtered to keep only the authorized records:

thread_id = user.authorized_resources( resource_type, "read" )
responses = resources.read()
for id in thread_id.join().authorized_resources {
  if ( responses[ id ] ) {
    yield responses[ id ]
  }
}

If the ratio of the number of responses returned by the database to the number of authorized records is too high, unused records could significantly increase latency and one should consider falling back to authorizing first then reading only authorized resources.
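The filtering and the fallback described above can be sketched as follows. This is only an illustration under assumptions: RECORDS and READABLE are hypothetical in-memory data, MAX_RATIO is an arbitrary threshold, and expected_ratio is assumed to be an estimate the caller already has (for instance from previous requests), since the real ratio is only known after the fact:

```python
from concurrent.futures import ThreadPoolExecutor

EXECUTOR = ThreadPoolExecutor(max_workers=4)

# Hypothetical data: every record of one type, and the IDs each user may read.
RECORDS = {1: "invoice A", 2: "invoice B", 3: "invoice C", 4: "invoice D"}
READABLE = {"alice": {1, 3}, "bob": {2}}

MAX_RATIO = 3.0  # assumed threshold of returned records per authorized record


def authorized_ids(user):
    # Stands in for the authorization query returning readable resource IDs.
    return READABLE.get(user, set())


def read_all():
    # Stands in for the unfiltered "business" query.
    return dict(RECORDS)


def read_only(ids):
    # Stands in for a "business" query restricted to the given IDs.
    return {i: RECORDS[i] for i in ids if i in RECORDS}


def read_authorized(user, expected_ratio):
    """Return only the records that `user` is allowed to read."""
    if expected_ratio > MAX_RATIO:
        # Fallback: authorize first, then read only the authorized records.
        return read_only(authorized_ids(user))
    # Parallel path: fetch the authorized IDs while reading every record.
    pending = EXECUTOR.submit(authorized_ids, user)
    records = read_all()
    allowed = pending.result()
    return {i: body for i, body in records.items() if i in allowed}


print(read_authorized("alice", expected_ratio=2.0))
```

Both paths return the same records; the threshold only decides whether reading unused records is cheaper than waiting for the authorization.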

On update requests, non-authorized updates must be undone. Using a relational database, a transaction can be used, performing the update, committing if the authorization succeeds, or rolling back the transaction if it fails:

thread_id = user.authorized( resource, "update" )
try {
  begin_transaction()
  response = resource.update()
  if not thread_id.join().authorized {
    request.user.disable() // to prevent further abuse by this user
    throw( "Not Authorized" )
  }
} catch( e ) {
  rollback_transaction()
  re_throw( e )
}
commit_transaction()
return response
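The same flow can be sketched with Python's built-in sqlite3 module. The documents table, the PERMISSIONS set and the helper names are assumptions made for this example, and disabling the offending user is left out to keep it short:

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

EXECUTOR = ThreadPoolExecutor(max_workers=4)
PERMISSIONS = {("alice", 1, "update")}  # hypothetical authorization data


class NotAuthorized(Exception):
    """Raised when the authorization check fails."""


def is_authorized(user, resource_id, action):
    # Stands in for the slow authorization query; it does not touch sqlite.
    return (user, resource_id, action) in PERMISSIONS


def make_database():
    # Hypothetical "business" database with a single table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute("INSERT INTO documents VALUES (1, 'draft')")
    conn.commit()
    return conn


def body_of(conn, resource_id):
    row = conn.execute(
        "SELECT body FROM documents WHERE id = ?", (resource_id,)
    ).fetchone()
    return row[0]


def update_with_parallel_authorization(conn, user, resource_id, body):
    authorization = EXECUTOR.submit(is_authorized, user, resource_id, "update")
    try:
        # sqlite3 implicitly opens a transaction before this UPDATE.
        conn.execute(
            "UPDATE documents SET body = ? WHERE id = ?", (body, resource_id)
        )
        if not authorization.result():
            raise NotAuthorized(f"{user} may not update {resource_id}")
    except Exception:
        conn.rollback()  # undo the uncommitted update
        raise
    conn.commit()  # keep the update only after authorization succeeded


conn = make_database()
update_with_parallel_authorization(conn, "alice", 1, "final")
print(body_of(conn, 1))  # prints: final
```

The update is never visible to other connections before the commit, which is what makes the relational case comparatively simple.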

Implementing this on a non-transactional system, such as a NoSQL database, would require at least a versioning system where each update in the database creates a new revision that can be undone:

thread_id = user.authorized( resource, "update" )
response = resource.update()
if not thread_id.join().authorized {
  response.undo()
  request.user.disable() // to prevent further abuse by this user
  throw( "Not Authorized" )
}
return response
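To make the versioning idea concrete, here is a toy sketch in Python. VersionedStore is a hypothetical in-memory class written for this example, not the API of any real NoSQL database, and its undo only handles the simple case where no later revision has been written:

```python
from concurrent.futures import ThreadPoolExecutor

EXECUTOR = ThreadPoolExecutor(max_workers=4)
PERMISSIONS = {("alice", "doc-1", "update")}  # hypothetical authorization data
DISABLED_USERS = set()


class NotAuthorized(Exception):
    """Raised when the authorization check fails."""


class VersionedStore:
    """Toy non-transactional store that keeps every revision of each key."""

    def __init__(self):
        self._revisions = {}  # key -> list of values, oldest first

    def update(self, key, value):
        history = self._revisions.setdefault(key, [])
        history.append(value)
        return len(history) - 1  # number of the revision just written

    def undo(self, key, revision):
        history = self._revisions[key]
        if revision != len(history) - 1:
            # A real system would need conflict handling here.
            raise RuntimeError("a later revision exists")
        history.pop()

    def read(self, key):
        history = self._revisions.get(key)
        return history[-1] if history else None


def is_authorized(user, key, action):
    return (user, key, action) in PERMISSIONS


def update_with_parallel_authorization(store, user, key, value):
    authorization = EXECUTOR.submit(is_authorized, user, key, "update")
    revision = store.update(key, value)
    if not authorization.result():
        store.undo(key, revision)  # drop the unauthorized revision
        DISABLED_USERS.add(user)   # to prevent further abuse by this user
        raise NotAuthorized(f"{user} may not update {key}")
    return revision


store = VersionedStore()
update_with_parallel_authorization(store, "alice", "doc-1", "v1")
print(store.read("doc-1"))  # prints: v1
```

Unlike the transactional version, the unauthorized revision is briefly visible to other readers between the update and the undo, so a real implementation would also have to hide unconfirmed revisions.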

There are a few other issues not discussed here, regarding the retrieval of unauthorized documents before these are removed; see my first comment below.

Gods' Logic Game

Stephen Wolfram has an intuition, shared by many, that our complex universe is defined by a set of very simple rules. He is looking for this set of rules, telling us that it is now almost embarrassing not to look for them.

One of my theories for the universe is that there could be a game going on between "gods" or creators. The rules are the following:
- Each creator defines a set of universe-generating rules
- Start all universes at the same time
- Watch each universe evolve
- The creators have no right to tweak things, they can only watch, i.e. they are not allowed to perform miracles
- The first universe that generates a creature capable of understanding its universe foundation rules wins.

Once a universe understands itself, the understanding creatures can play the game.

Based on this idea there would be multiple universes out there but all of these universes would share one thing: logic. Logic is the only thing that is shared by all universes.

This type of universe can answer all spiritual questions:

Why does suffering exist? Because what we consider 'bad' things, such as death, are necessary - i.e. for evolution to happen.

Why don't miracles exist? Because it would be cheating. A creator that does not need to perform miracles is much more clever than a creator that needs miracles to tweak a badly created universe.

Please, challenge me, ask me a question that cannot be answered by this theory.

Internet Explorer 9 to Support SVG, Much Faster JavaScript.

This is great news for SVG enthusiasts. The following SVG demo will now work in the vast majority of web browsers including IE9:
http://ie.microsoft.com/testdrive/Graphics/35SVG--oids/Default.xhtml

I, for one, have been waiting for SVG support in IE for many years. Recently I was worried that IE9 might come with Canvas support but no SVG. Just the opposite is happening, and this makes my day.

While many lament the lack of Canvas support, I believe that SVG support is more important because SVG can do everything that Canvas does, with the additional benefit of being fully integrated in the DOM. In particular, this enables DOM events, which are a mess to implement with Canvas.

The faster JavaScript engine, now on par with Google Chrome and Safari, in combination with SVG support, will allow the development of the most sophisticated web applications - aka Rich Internet Applications - without requiring plugins such as Flash or Silverlight.

Good support of CSS3 is also expected, which will make the life of web designers significantly less painful.

My only regret is the lack of Windows XP support, which will be a problem considering the very large XP user base that will not upgrade and will continue to live for at least 5 years.

Web Standards Evolution - VML, SVG, Canvas, CSS, XML, JSON, and JavaScript.

This post stemmed from an unpublished reply (comments are closed) to a very good article from Mark Pilgrim on the Overton Window applied to web standards and the W3C in particular.

<digression class="philosophical">Standards organizations, and standards, just like any system in our known universe, have to die someday. When their time has come, they allow the birth of their next evolutionary iteration, just as supernovas are dying stars that allow the creation of the atoms that eventually allow life to emerge, and as the death of the dinosaurs allowed mammals to thrive. We humans will someday give birth to our evolutionary child, be it intelligent machines, to allow the universe to acquire an enhanced understanding of itself - which is all the universe cares about in the end.</digression>

Let's start by not complaining too much about the current state of the W3C or other web standards. What's relevant is to understand the limitations of all the things out there and get ready for the evolutionary changes ahead.

Regarding SVG and Canvas, I believe we need both (a format language and an API) at this point although there is a big overlap between the two. We might need a third standard to unite them both or split them in 3 parts: one common and 2 extensions to account for the specificities of both. This could come in the form of:
  1. A rendering and behavior specification for vector graphics integrated in the DOM with events on individual vectors (unlike Canvas, which requires scanning all objects to bind events to elements). This behavior would be accessible to HTML, CSS, JavaScript, etc. through their respective syntaxes.
  2. A language format for vector images (à la SVG/VML)
  3. An API for vector libraries (such as drawing or charting libraries like Ico).
Even if such a standard does not emerge anytime soon, web browser developers could adopt the above architecture to implement both SVG (plus VML in IE) and Canvas. An underlying rendering and behavior engine could very well be a common backend to SVG, VML, Canvas, you name it, and of course the freaking CSS3 rounded corners. A nice side effect of this architecture would be to provide events to Canvas elements.

In any case, one must recognize that the real problem with SVG from the start has always been its denied and angry VML father [Luke, ... I am your father]. Over the last year and a half, the excellent Raphael JavaScript graphic library has managed to reconcile VML and SVG while sharing some similarities with Canvas (being an API vs a format). Raphael and Canvas are therefore evolutionary children of the standard war.

CSS is still not a real language with variables, let alone objects. Where is CSS going? It is currently dying supernova-style, giving birth to a myriad of never-to-be-released modules, missing programmability altogether, and in total ignorance of encapsulation concepts. Instead of premature componentization, we first need to see the big picture of where we want the web to go. Here is my proposition for an evolutionary change: Drop CSS for a JavaScript DSL.

What's the big deal about rounded corners? Do we need another iteration of CSS just for that? What if Apple drops rounded corners next week for some other box shape? Will we have CSS4 to handle that too? In 2022?

We need to add vector graphics in the toolbox of style designers so that they stop relying on raster images which are only slightly ok at carrying approximate representations of our chaotic world. I say "slightly" because if the world is chaotic as it is, then we don't need compression, not even fractal compression, we need fractal shapes.

How about the XML standard and its agile evolutionary child, JSON? I could have used JSON for the above digression, but XML provides a more awkward style that is more appropriate for emphasized off-topic content. JSON is more human readable, and fits better in the current JavaScript-bound web.

What's ahead for JavaScript? Hold your breath, ... drums, ... and the winner is: JavaScript, for now. Why? Because, for now, JavaScript does the job once one realizes it is a really good evolutionary child of class-based object-oriented languages (such as C++ and Java) with easy to use (I would even say intuitive) closures. There's another reason of course: our life in the browser is for now limited to a single language. Although that will have to change someday, for now JavaScript's only flaw is its performance in IE, which should mostly be fixed in IE9.

Drop CSS for a JavaScript DSL for Styling and more.

CSS is a great language to separate structure from style concerns. Yet CSS lacks most of the features found in modern languages such as variables, objects, functions, etc. At the very minimum we'd like to have variables to keep CSS classes by selector instead of splitting them by property.

Most designers will use JavaScript whenever complex dynamic styling effects are desired or to circumvent incomplete CSS browser support and incompatibilities. But JavaScript breaks away from the declarative syntax that makes CSS compact and easy to learn.

So the idea is to create a DSL (Domain Specific Language) in JavaScript to supplement (or replace) CSS, making styling possible in a first-class language. An implementation could contain compatibility hacks behind the scenes, yielding cleaner, cross-browser and enhanced CSS. There are already a number of CSS DSLs in Ruby, Python and even a MooTools class (http://revnode.com/oss/css/), which are all designed for these reasons.

This may not be for every web site because some people browse the web with JavaScript disabled but if you are developing a web application, you can count on JavaScript being enabled 100% of the time.

The styling DSL would remain mostly declarative and very close to CSS. Here is a basic example:

// home.css.js
// requires css.js
(function () { // optional, avoids global variable pollution and associated side effects
  var content_width = 1024;
  var bg_color = "#fff";

  var style = new CSS();

  style.add( '#content', {
    position: 'absolute',
    top: '0px',
    // lengths need an explicit unit; document.viewport comes from prototype.js
    left: ( document.viewport.getWidth() - content_width ) + 'px',
    width: content_width + 'px',
    height: '800px',
    background: bg_color,
    margin: '5px',
    padding: '1em'
  } );

  style.add( 'table', {
    borderCollapse: 'collapse' // camelized name: a hyphenated key must be quoted in JavaScript
  } );
})();

This may not be the most sophisticated CSS DSL ever designed, but a basic proof of concept can be implemented in 8 lines using prototype.js:

// css.js
var CSS = Class.create( {
  add : function( selector, style_definition ) {
    // apply the style to all matching elements once the page has loaded
    Event.observe( window, 'load', function() {
      $$( selector ).invoke( 'setStyle', style_definition );
    } );
  }
} );

The nice thing about this approach is that it can be done today, with no need to wait for CSS3 or CSS4 to be implemented in all major browsers.
However, this does not work automatically for DOM elements created after the page has loaded or for elements whose class is changed dynamically. For this to work, one would need to override Prototype's Element.addClassName() and Element.removeClassName(). A more complete implementation would also override Element.classNames(), Element.hasClassName(), Element.toggleClassName(), $() and $$(), so that these methods become aware of the classes created with css.js.

SVG in Internet Explorer, SVG Web and Raphaël

SVG, the Scalable Vector Graphics standard, has been held hostage by its lack of support in Internet Explorer for the 6 years since SVG 1.1 became a W3C recommendation.

Google is now fixing this with SVG Web, a JavaScript library that emulates SVG but still requires the Flash plugin.

The other option is to use Raphaël, an outstanding JavaScript dynamic vector graphic library that relies on SVG in SVG-capable web browsers and VML in IE. Raphaël, which is currently at version 0.8.6, is growing fast as the solution for plugin-free vector graphics on the web. I use Raphaël for dynamic charts in my current project.

Google Wave - A Revolutionary Collaboration System

Google Wave is a clever combination of email, wiki, blog, instant messaging, and versioning.

It is not just an amazing tool; it is also an open system, enabling competing implementations to communicate freely in real time.

Finally, this is a good showcase of what can be done with HTML 5. There is no longer any limit to the types of applications that can be developed on the web.

Tim Berners-Lee on Linked Data

Tim Berners-Lee, the inventor of HTTP, HTML and URLs, tells us about the genesis of the World-Wide-Web. Then Tim reminds us that there is still very little data available on the web. From there he explains that we need data relationships, what he calls Linked Data.

Wikipedia contains a lot of data. DBpedia extracts that data out of the text and makes it available as Linked Data.

Governments around the world hold a lot of data; once this data becomes available, new services can emerge.

Lots of data is also in social networks, although it is still not liberated from services, i.e. it is not available as Linked Data.