Wednesday, April 14, 2010

On Web Applications, Web Architecture And Resource Identifiers

1 On Web Applications, Web Architecture And Resource Identifiers

1.1 Background

As we evolve from a Web of documents (Web 1.0) to a Web of applications (Web 2.0) and eventually Toward 2^W --- Beyond Web 2.0, key underpinnings of Web Architecture such as resource identifiers require careful re-examination. As a member of the W3C's Technical Architecture Group, I have been trying to define Web Architecture in the context of Web applications; a necessary first step toward that goal is to analyze how complex Web applications are implemented on the Web of today.

This article deliberately avoids abstract issues such as Resource vs Representation or URIs vs URLs, and instead focuses on more practical considerations such as:

  1. What is a URI and what can the user expect to do with it?
  2. When dereferencing a URI, what pieces of software does one need to have to retrieve a useful representation of that resource?
  3. Here, useful is defined from the perspective of the end-user. Thus, given a URI to a piece of media on the Web, relevant metadata is necessary but not sufficient to be useful - the user needs to be able to retrieve and play the media stream as well.

1.2 Case Study: BBC iPlayer And BBC Backstage

The British Broadcasting Corporation (BBC) provides streaming access to a large amount of radio and television content via a Web application called BBC iPlayer. In addition, BBC Backstage provides a rich data-oriented API to the underlying dataset in the form of linked data. Program schedules can also be downloaded in a number of presentation-independent formats such as XML, JSON and YAML. The remaining sections in this article detail what can (and cannot) be done with the information that is readily available from BBC iPlayer and BBC Backstage. In the process, we observe some design patterns (and anti-patterns) found on today's Web, and their effect on building richer Web applications from Web parts.

1.3 BBC iPlayer

Using the BBC iPlayer Web application requires:

  1. A modern script-enabled browser such as Chrome, Firefox, Safari, or IE.
  2. Browser plugins for media playback, such as Realplayer or Windows Media.
  3. The Adobe Flash plugin for translating playback links on the BBC iPlayer page to their corresponding Realplayer or Windows Media resources.
  4. Appropriate media player plugins based on the user's platform, e.g., Realplayer or Windows Media.

The Web application as implemented provides a rich, interactive visual interface that is sub-optimal for use from other programs.

1.4 BBC Backstage

Given the triple (radio-station, outlet, date), e.g.:

 (radio4, fm, 2010/04/14)

one can retrieve an XML representation of the program schedule using the URL:

 http://www.bbc.co.uk/radio4/programmes/schedules/fm/2010/04/14.xml

as documented on the BBC Backstage site. Alternative serializations such as JSON or YAML can be retrieved by appropriately replacing the .xml extension.
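The mapping from the triple to the schedule URL is mechanical, so it can be sketched in a few lines of Python. The function name and the set of accepted extensions below are illustrative assumptions based only on the URL pattern shown above; this is not part of any BBC-provided library.

```python
from datetime import date

# URL pattern taken from the example above; the {day:...} field formats the
# date as YYYY/MM/DD.
SCHEDULE_URL = (
    "http://www.bbc.co.uk/{station}/programmes/schedules/"
    "{outlet}/{day:%Y/%m/%d}.{ext}"
)

# Serializations mentioned in the article (assumed to map to these extensions).
KNOWN_EXTENSIONS = ("xml", "json", "yaml")


def schedule_url(station: str, outlet: str, day: date, ext: str = "xml") -> str:
    """Build the schedule URL for a (radio-station, outlet, date) triple."""
    if ext not in KNOWN_EXTENSIONS:
        raise ValueError(f"unsupported serialization: {ext!r}")
    return SCHEDULE_URL.format(station=station, outlet=outlet, day=day, ext=ext)


if __name__ == "__main__":
    print(schedule_url("radio4", "fm", date(2010, 4, 14)))
    # http://www.bbc.co.uk/radio4/programmes/schedules/fm/2010/04/14.xml
```

Passing ext="json" or ext="yaml" yields the alternative serializations by swapping only the extension.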

The retrieved schedule contains detailed metadata for each program that is broadcast, including a programme id (pid) that is used throughout the data store.

The BBC Backstage API assigns a persistent URI to each program of the form:

 http://www.bbc.co.uk/iplayer/episode/<pid>

When retrieved, this persistent URI redirects appropriately to the BBC iPlayer page for that program. Note that the media streams for most programs are only available for a week.
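As a rough sketch of how a program might go from a schedule to persistent URIs, the Python below pulls every pid out of an XML document and substitutes it into the pattern above. The sample XML is a hypothetical, heavily trimmed stand-in: the assumption that pids appear as pid elements, and the surrounding element names, are illustrative and should be checked against a real schedule.

```python
import xml.etree.ElementTree as ET

# Persistent URI pattern from the article.
EPISODE_URL = "http://www.bbc.co.uk/iplayer/episode/{pid}"

# Hypothetical sample; the real schedule layout is an assumption here.
SAMPLE_SCHEDULE = """\
<schedule>
  <broadcast>
    <programme type="episode">
      <pid>b00rw6hf</pid>
      <title>Midnight News</title>
    </programme>
  </broadcast>
</schedule>
"""


def episode_urls(schedule_xml: str) -> list[str]:
    """Return one persistent iPlayer URI per pid element in the document."""
    root = ET.fromstring(schedule_xml)
    return [
        EPISODE_URL.format(pid=element.text.strip())
        for element in root.iter("pid")
        if element.text and element.text.strip()
    ]


if __name__ == "__main__":
    for url in episode_urls(SAMPLE_SCHEDULE):
        print(url)
    # http://www.bbc.co.uk/iplayer/episode/b00rw6hf
```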

As an example of the above, you can retrieve Midnight News from BBC Radio4 for April 14, 2010 by opening:

 http://www.bbc.co.uk/iplayer/episode/b00rw6hf

On the surface, this URL appears to satisfy many of the expectations that users might have:

  1. Plays the relevant media when handed to a Web browser.
  2. Can be bookmarked for later use (modulo the 1 week limit on archived media).
  3. Can be passed around via email?

The final item above exposes some of the problems with the current implementation. Recall the set of pre-requisites for the BBC iPlayer Web application enumerated earlier; all of these apply to anyone who receives the URI generated above.

1.5 How It Works At Present

It is instructive to turn on HTTP Request/Response tracking in the browser when opening the URL http://www.bbc.co.uk/iplayer/episode/b00rw6hf. Here is a brief summary of some of the steps that the browser performs:

  1. Receives an HTTP Response with content-type text/html.
  2. The body of this response is an HTML document that in turn loads a number of JS libraries.
  3. An embed tag in the retrieved HTML page invokes the Flash (shockwave) plugin.
  4. The embedded shockwave player receives several mostly undocumented parameters that pass in details of the enclosing environment.
  5. Once these steps have completed, the browser is automatically redirected to http://www.bbc.co.uk/iplayer/console/b00rw6hf, i.e., the earlier URI is transformed by replacing episode with console.
  6. The HTTP conversation continues, and the browser is eventually sent to http://www.bbc.co.uk/mediaselector/4/mtis/stream/b00rw6g2 which resolves to the realplayer .ram file: http://www.bbc.co.uk//iplayer/aod/playlists/2g/6w/r0/0b/RadioBridge_intl_2300_bbc_radio_fourfm.ram.

Thus, the recipient of the Midnight News URL would need to implement all of the above transforms (or have access to software that does those computations) in order to effectively consume the media stream that was addressed by the URL.
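To give a sense of what implementing even one of these transforms involves, here is a minimal Python sketch of the rewrite in step 5, where episode is replaced with console. It covers only that single step; the later hops to the mediaselector URL and the .ram file depend on undocumented server-side behavior and are not reproduced. The function name is an illustrative choice.

```python
from urllib.parse import urlsplit, urlunsplit


def console_url(episode_url: str) -> str:
    """Rewrite /iplayer/episode/<pid> to /iplayer/console/<pid> (step 5)."""
    parts = urlsplit(episode_url)
    segments = parts.path.split("/")
    # Expected path layout: ['', 'iplayer', 'episode', '<pid>']
    if len(segments) < 4 or segments[:3] != ["", "iplayer", "episode"]:
        raise ValueError(f"not an iPlayer episode URL: {episode_url!r}")
    segments[2] = "console"
    return urlunsplit(parts._replace(path="/".join(segments)))


if __name__ == "__main__":
    print(console_url("http://www.bbc.co.uk/iplayer/episode/b00rw6hf"))
    # http://www.bbc.co.uk/iplayer/console/b00rw6hf
```

Rewriting only the path segment, rather than doing a blind string replace, avoids touching a pid or host name that happens to contain the word episode.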

1.6 Observations

  1. Web applications have gotten more complicated than they need to be: notice the multiple redundant layers between Flash, JS, HTML, and the complex interplay that results during the HTTP conversation between client and server.
  2. Such complex interplay within multiple layers makes RESTful APIs difficult to achieve.
  3. It is possible that the underlying media stream URLs are being intentionally obfuscated. It's hard to imagine anyone wanting to voluntarily inflict the pain inherent in steps 1..6 without a valid reason.
  4. The obfuscation scheme makes it effectively impossible (on the surface) for interfaces other than the BBC iPlayer Web application to play the media.
  5. Note the qualifier "on the surface" in the above. As a testament to the robustness of the architecture of the Web, steps 1..6 can be hidden in a computational blackbox that surfaces a reliable URI that can be emailed.
  6. As an implementation of the above, see this IPlayer Convertor found on the Web.
  7. In addition to providing a simple HTML form that takes a pid and performs the translation that happens during the client/server HTTP conversation, that site offers a persistent URL given a pid.
  8. What's more, the persistent URL offered up by this convertor is guessable given the pid - this in its turn then becomes a RESTful API for accessing BBC media streams given a pid.
  9. Thus, for the BBC Midnight News episode in question, the iPlayer convertor above serves up http://www.iplayerconverter.co.uk//pid/b00rw6hf/r/stream.aspx.
  10. Notice that replacing %s in
    http://www.iplayerconverter.co.uk/pid/%s/r/stream.aspx
    
    with a pid yields a persistent URL that can be handed off directly to a media player, where:
    1. The media player supports the codec in use.
    2. The media player supports the underlying streaming protocol, rtsp in this case.
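The substitution in item 10 is small enough to show directly. The Python sketch below fills the %s slot with a pid; the function name and the alphanumeric check on the pid are assumptions added for illustration, and whether the resulting URL still resolves depends on the third-party site.

```python
# Template from item 10 above; %s is the slot for the programme id.
STREAM_TEMPLATE = "http://www.iplayerconverter.co.uk/pid/%s/r/stream.aspx"


def stream_url(pid: str) -> str:
    """Fill the converter URL template with a programme id."""
    # Assumption: pids are plain alphanumeric tokens such as 'b00rw6hf'.
    if not pid.isalnum():
        raise ValueError(f"unexpected pid: {pid!r}")
    return STREAM_TEMPLATE % pid


if __name__ == "__main__":
    print(stream_url("b00rw6hf"))
    # http://www.iplayerconverter.co.uk/pid/b00rw6hf/r/stream.aspx
```

The printed URL is what would be handed to a media player that meets the two conditions listed above.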

2 Conclusion

So to conclude, let's ask the original question:

  1. Given a URL, what can a user expect to be able to do with it, after having dereferenced the URI?
  2. How does the user discover what software bits he needs in order to consume the received HTTP Response?
  3. In Web 1.0 (Web of documents) the answer was simple --- the HTTP Response header Content-Type specified the media type, which in turn specified what the recipient needed to understand.
  4. A recipient who only understands mime-type text/html in this example is likely to flee screaming in terror if he makes the mistake of doing Show Source.
  5. We all acknowledge that Show Source helped Web 1.0 succeed.
  6. Q: What is the equivalent of Show Source that will help us collectively take the Web to the next level?

Author: T.V Raman <raman@google.com>

Date: 2010-04-14 Wed
