Modality (February 14, 2005)

I was thinking about the MVC (Model-View-Controller) design pattern while looking at an application with notifications. The application is mainly accessed through a complicated GUI client. Notifications are triggered by data-bound conditions, and when a notification fires, either colors on the screen change or a popup window opens. But that's about it. There is no way to configure a notification other than through a dialog in the client, and a notification has no outputs other than screen coloring or a popup window. What interests me is how one can design for multi-modal input and output, and what advantages this offers.

Let's say we participate in a discussion group on the web. We normally go to a specific website and check the discussion forum for new postings. We can also get a daily email of all new postings that day, or an email whenever a new posting is made. We may also ask to "monitor" a specific thread for activity: for example, to receive an email when someone responds to a question we've posted.

A long time ago, the standard mechanism for tracking and participating in ongoing discussions was the mailing list. One didn't "monitor" a mailing list (though you could probably write a tool to help you); rather, one received an email for each posting as it was submitted (or a daily "digest" of the postings that day). With the advent of the web, the archive of the discussions could be posted to a web page, with a topic list, archive search, and various thread-navigation tools. Later, "forums" software became available, where one could not only view active postings, but could submit, search, filter, monitor and so on, all within a browser.

The trick is that all of these "modalities" for accessing discussions share (or could share) the same underlying processing mechanisms. We store "threads", "topics", and "messages", and track the "participants" who are active in the discussion, along with their email addresses, etc. What differs between the email-only mailing list and the web-based discussion board is the mode in which we participate: email, web forum, web-based mailing archive, etc. We could also have command-line access, a "rich client" GUI application, integration with our GUI email client, and so on. What stays the same are the underlying features and, possibly, a common "discussion server" that manages them. What differs are the tools that we use to follow and participate in the discussion.
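To make the "same core, different modes" point concrete, here is a minimal sketch in Java. The class names (`Message`, `DiscussionServer`, `Modes`) are hypothetical, not taken from any real forum package. The core only stores threads and messages; a mailing-list digest and a forum-style topic list are then just two different renderings of the same data, and neither one required a change to the core.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical core types: they know about threads and messages, nothing about email or HTML.
final class Message {
    final String author;
    final String body;

    Message(String author, String body) {
        this.author = author;
        this.body = body;
    }
}

final class DiscussionServer {
    // Insertion-ordered so that every mode lists threads in the same order.
    private final Map<String, List<Message>> threads = new LinkedHashMap<>();

    void post(String thread, String author, String body) {
        threads.computeIfAbsent(thread, t -> new ArrayList<>()).add(new Message(author, body));
    }

    List<String> threadTitles() {
        return new ArrayList<>(threads.keySet());
    }

    List<Message> messages(String thread) {
        return threads.getOrDefault(thread, List.of());
    }
}

// Two "modalities" built on the same core; adding either one leaves DiscussionServer untouched.
final class Modes {
    // Mailing-list style: a plain-text digest of every message.
    static String digest(DiscussionServer server) {
        StringBuilder sb = new StringBuilder();
        for (String thread : server.threadTitles()) {
            for (Message m : server.messages(thread)) {
                sb.append('[').append(thread).append("] ")
                  .append(m.author).append(": ").append(m.body).append('\n');
            }
        }
        return sb.toString();
    }

    // Forum style: a topic list with a message count per thread.
    static String topicList(DiscussionServer server) {
        StringBuilder sb = new StringBuilder();
        for (String thread : server.threadTitles()) {
            sb.append(thread).append(": ").append(server.messages(thread).size()).append(" message(s)\n");
        }
        return sb.toString();
    }
}
```

A third mode (say, a command-line listing) would be one more static method or class reading the same `DiscussionServer`.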

What I suggest is that applications that are built to support multi-modal use are more flexible, and will have a longer lifespan, than those built around a single mode of access. They are more flexible because the users can choose which mode of access is most convenient for them at any given time. They should have a longer lifespan (active use of the software package), because as new modes are "invented", the new mode can (hopefully) just be strapped on to the existing software package without requiring a complete rewrite.

So, for example, if we build a discussion server, why not allow input through various interfaces (direct email, REST technologies, SOAP, RMI) and output through various channels (email, messaging servers, streams, XML)? We don't even need an "uber-API" that forces all our users to access over SOAP, for example--what we need are input and output Adaptors that allow different inputs and outputs to work with the same discussion server functionality.
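One way to read the Adaptor idea, sketched in Java under deliberately simplified assumptions: each input adaptor translates its own wire format into the same `PostCommand`, and each output adaptor renders the same posts in its own format. The names `PostCommand`, `InputAdaptor`, `OutputAdaptor`, and the toy email and query-string formats are all hypothetical; this is not a real mail, SOAP, or REST API, and the email parser assumes the headers end at the first blank line.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical command understood by the core, whatever channel it arrived on.
final class PostCommand {
    final String thread;
    final String body;
    PostCommand(String thread, String body) { this.thread = thread; this.body = body; }
}

interface InputAdaptor { PostCommand parse(String raw); }

interface OutputAdaptor { String render(List<PostCommand> posts); }

// Toy email input: headers, a blank line, then the body; the Subject names the thread.
final class EmailInput implements InputAdaptor {
    public PostCommand parse(String raw) {
        int split = raw.indexOf("\n\n");
        if (split < 0) throw new IllegalArgumentException("no blank line after headers");
        String subject = "";
        for (String line : raw.substring(0, split).split("\n")) {
            if (line.startsWith("Subject: ")) subject = line.substring("Subject: ".length()).trim();
        }
        return new PostCommand(subject, raw.substring(split + 2));
    }
}

// Toy REST-style input: a form-encoded body such as "thread=...&body=...".
final class QueryStringInput implements InputAdaptor {
    public PostCommand parse(String raw) {
        Map<String, String> params = new HashMap<>();
        for (String pair : raw.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) continue; // ignore malformed pairs in this sketch
            params.put(decode(pair.substring(0, eq)), decode(pair.substring(eq + 1)));
        }
        return new PostCommand(params.getOrDefault("thread", ""), params.getOrDefault("body", ""));
    }

    private static String decode(String s) { return URLDecoder.decode(s, StandardCharsets.UTF_8); }
}

final class PlainTextOutput implements OutputAdaptor {
    public String render(List<PostCommand> posts) {
        StringBuilder sb = new StringBuilder();
        for (PostCommand p : posts) sb.append(p.thread).append(": ").append(p.body).append('\n');
        return sb.toString();
    }
}

final class XmlOutput implements OutputAdaptor {
    public String render(List<PostCommand> posts) {
        StringBuilder sb = new StringBuilder("<posts>");
        for (PostCommand p : posts) {
            sb.append("<post thread=\"").append(escape(p.thread)).append("\">")
              .append(escape(p.body)).append("</post>");
        }
        return sb.append("</posts>").toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

The core never sees an email header or a query string; adding, say, an instant-messaging adaptor means writing one more class against the same two interfaces.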

I brought up MVC because when we think about MVC, we often think about writing a new user interface for some model that has not changed, or about changing the model without affecting our existing user interfaces. But the concept of a "view" in MVC is not limited to GUIs, of course. Fundamentally we have a separation of our core functionality from our input to, and output from, those functions. If we build to support that separation, we allow our users (and other developers) more room to play and experiment with new models for accessing and contributing to the underlying process.

For example, what I really want are different input modes to look up words on dictionary.com or LEO. I also want their output in XML. If I had both of those, I could easily write a plugin for jEdit, or for Firefox, to look up words, using whatever UI catches my fancy. Currently, I have to adapt on my side to their input format (which is HTTP/forms) and output format (which is HTML). The output format is a particular problem, because if I parse and scrape their HTML for results, any small change in their HTML layout can break all my routines. If I had multiple input formats (for requesting a word-lookup) and output formats (say, HTML, XML, CSV), I could write all sorts of clients to work with the underlying feature (looking up definitions and translations).
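To show why structured output matters on the client side, here is a sketch that assumes a purely hypothetical XML response; the element names `lookup`, `entry`, and `translation` are invented for illustration and are not a real dictionary.com or LEO format. The client finds translations by element name using the JDK's built-in DOM parser instead of by position in an HTML layout.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Client side of a hypothetical XML lookup response, e.g.
// <lookup word="Haus"><entry><translation>house</translation></entry></lookup>
final class LookupClient {
    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed lookup response", e);
        }
    }

    // The word the response is about, read from the root element's attribute.
    static String word(String xml) {
        return parse(xml).getDocumentElement().getAttribute("word");
    }

    // Every translation, found by element name rather than by position in a page layout.
    static List<String> translations(String xml) {
        NodeList nodes = parse(xml).getElementsByTagName("translation");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }
}
```

If the provider later wraps entries in extra elements or adds new ones, this client keeps working as long as `translation` keeps its meaning, which is exactly the stability an HTML scraper lacks.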

Worse, if people start writing HTML screen-scrapers to pull definitions out of a web page returned by dict.leo.org, and if those users rely on that output format, then dict.leo.org will have more trouble upgrading to a newer, improved (or just different) layout, because the change will break all those screen-scrapers.

In some cases we have a financial problem: there is a non-trivial cost to building and maintaining web-based services, and many of the websites I want information from pay for themselves through advertising. If they give me a way to access the data without requiring me to see click-through ads, then the website no longer automatically gets advertising revenue when I visit. If I don't display the ads in my client, I am basically getting the information for free. It may be true that information *wants* to be free, but given that there is a cost to writing dictionaries and the like, it won't always be free. So, for some of these information sources, we are tied to using their website because the website owners have no other reasonable way to make a living than website-based advertising.

But I still believe this is a noble goal. The GUI application I was looking at today supports one mechanism for configuring notifications (a dialog), and two ways of delivering them (on-screen coloring and popup dialogs). If the notification subsystem were opened up, I could write my own configuration tools for notifications (for example, from the command line, from macros, or through email), and I could receive notifications through different channels, say, email, IRC, or instant messaging. And I think that would be a good thing. Of course, there will be limits on the flexibility of both the input and the output, but that's OK with me. What I really want is just some more flexibility than what I'm offered right now.
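Here is a rough sketch of what "opening up" such a notification subsystem might look like, assuming a hypothetical design in which a notification is a threshold rule on a named data field. `NotificationChannel`, `RecordingChannel`, `Rule`, `Notifier`, and the one-line configuration syntax are all invented for illustration. Configuration arrives as plain text, so the same line could come from a command line, a macro, or an email body; delivery goes through named channels, so email or IRC are just more implementations of one interface.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical output channel: screen coloring, popups, email, IRC, IM would each implement this.
interface NotificationChannel {
    void deliver(String message);
}

// Stand-in channel that records messages instead of sending them (e.g. in place of real email).
final class RecordingChannel implements NotificationChannel {
    final List<String> sent = new ArrayList<>();
    public void deliver(String message) { sent.add(message); }
}

// A data-bound condition: fire when a named field rises above a threshold.
final class Rule {
    final String field;
    final double threshold;
    final List<String> channels;

    Rule(String field, double threshold, List<String> channels) {
        this.field = field;
        this.threshold = threshold;
        this.channels = channels;
    }
}

final class Notifier {
    private final Map<String, NotificationChannel> channels = new HashMap<>();
    private final List<Rule> rules = new ArrayList<>();

    void registerChannel(String name, NotificationChannel channel) {
        channels.put(name, channel);
    }

    // Text configuration in a hypothetical syntax: "<field> > <threshold> via <ch1>,<ch2>"
    void configure(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 5 || !parts[1].equals(">") || !parts[3].equals("via")) {
            throw new IllegalArgumentException("expected: <field> > <threshold> via <channels>");
        }
        rules.add(new Rule(parts[0], Double.parseDouble(parts[2]), List.of(parts[4].split(","))));
    }

    // Called by the core whenever a field's value changes; unknown channel names are skipped.
    void onUpdate(String field, double value) {
        for (Rule r : rules) {
            if (r.field.equals(field) && value > r.threshold) {
                for (String name : r.channels) {
                    NotificationChannel c = channels.get(name);
                    if (c != null) c.deliver(field + " = " + value + " exceeds " + r.threshold);
                }
            }
        }
    }
}
```

The existing dialog and popups could sit on top of the same `Notifier` as just one more configuration source and one more channel.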

So how can we get started? An easy first step, for websites that don't depend on advertising revenue, would be to offer all discussion forums in an XML format at a secondary URL. The other half would be an option on the HTTP requests used to navigate the forum, asking for the output as XML. Then publish the XML schema or DTD. For our own applications, why not throw in command-line access in all cases, if it makes sense at all? And if it makes sense from the command line, how about an Ant task as well?
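A small sketch of that request option, assuming a hypothetical `format=xml` query parameter (the parameter name, the example host, and the XML element names are inventions, not an existing forum's API). The same `Format` choice is also read from a command-line flag, so the web path and the command-line path share one rendering routine.

```java
import java.net.URI;

// Hypothetical representations a forum page could be served in.
enum Format { HTML, XML }

final class FormatSelector {
    // Looks for a hypothetical "format=xml" query parameter; anything else means HTML,
    // so existing browser links keep working unchanged.
    static Format fromUri(URI uri) {
        String query = uri.getQuery();
        if (query != null) {
            for (String pair : query.split("&")) {
                if (pair.equalsIgnoreCase("format=xml")) {
                    return Format.XML;
                }
            }
        }
        return Format.HTML;
    }

    // The same choice made from a command line, e.g. "forum show 42 --format=xml".
    static Format fromArgs(String[] args) {
        for (String arg : args) {
            if (arg.equalsIgnoreCase("--format=xml")) {
                return Format.XML;
            }
        }
        return Format.HTML;
    }

    // One message rendered in either format; the XML element names are invented and
    // are exactly what a published schema or DTD would pin down.
    static String render(Format format, String id, String subject) {
        switch (format) {
            case XML:
                return "<message id=\"" + escape(id) + "\"><subject>" + escape(subject) + "</subject></message>";
            default:
                return "<html><body><h1>" + escape(subject) + "</h1></body></html>";
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Defaulting to HTML is the design choice that keeps the change cheap: browsers see nothing new, and only clients that ask for XML get it.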

Last, I know there is another problem, which is that varying input and output mechanisms only make sense if the fundamental semantics of the calls don't change. If I query a discussion board for a message, I give a message identifier and get a single message in return. But can I retrieve all messages for a given month? For small data sets, such as the one behind a log file reader, this might make sense. But most servers would balk if too many users tried to download that much data at the same time. This means that, given the nature of the data set, there will be limits on how flexible we can make our input or output modes. But we can do better than we are doing now.
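One common way to live with such limits is paging: the client may ask for a whole month, but the server caps how much it returns per request and tells the client where to continue. The sketch below assumes hypothetical names (`Page`, `MessageArchive`) and an arbitrary cap of 50; a real server would pick its own limit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical result page: a slice of messages plus where to continue.
final class Page {
    final List<String> items;
    final int nextOffset; // -1 when there is nothing more to fetch

    Page(List<String> items, int nextOffset) {
        this.items = items;
        this.nextOffset = nextOffset;
    }
}

final class MessageArchive {
    static final int MAX_PAGE_SIZE = 50; // assumed server-side cap

    private final List<String> messages; // one month's messages, in order (stand-in data)

    MessageArchive(List<String> messages) {
        this.messages = new ArrayList<>(messages);
    }

    // The requested size is honored only up to the cap, whatever the input mode asked for.
    Page fetch(int offset, int requestedSize) {
        if (offset < 0 || requestedSize <= 0) throw new IllegalArgumentException("bad range");
        int size = Math.min(requestedSize, MAX_PAGE_SIZE);
        int from = Math.min(offset, messages.size());
        int to = Math.min(from + size, messages.size());
        int next = to < messages.size() ? to : -1;
        return new Page(List.copyOf(messages.subList(from, to)), next);
    }
}
```

Because the cap lives in the core rather than in any one adaptor, the email, web, and command-line modes all inherit the same limit.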

You can comment on this on the jRoller website, the host for the blog entry above.