Utility Servers in the JVM (February 15, 2005)

Utilities Servers: I've been thinking more about Nailgun, the server which hosts small Java programs to be run from the command line. One question is: why can't we (or why don't we) use Java more often for common utility tasks? Think of all the command-line utilities written as binary executables or as script files. If we didn't have to pay the JVM's startup time, would we be any better off using Java for these purposes?

Let's assume that, automagically, we could launch any Java utility with zero startup cost. Nailgun is one approach--the runnable classes are loaded by a server, which keeps them in memory (though it could probably swap out the least-used ones). Execution is requested through a small command-line client written in C. Since the server is started once and remains resident, we just absorb the cost of launching the C executable (minimal), plus the (local) network call to request that the class be executed (Nailgun uses a small custom protocol for this). Anyway, let's suppose that, using Nailgun or a similar approach, we could reduce the cost of execution to something close to the cost of running the same utility written as a binary executable. I admit this is just a theoretical possibility, and I won't comment on how realistic it is.

I'm going to call Nailgun one implementation of a "Utilities Server": a server process that runs small "hosted utility" commands on demand for a local client, with requests sent over TCP/IP. A Utilities Server is a memory-resident, multi-threaded server that executes in a JVM and can run any utility that is written in JVM bytecode (or in a language that can be compiled to it).
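To make the idea concrete, here is a minimal sketch of such a server in Java. It is not Nailgun's code or protocol: the `HostedUtility` interface, the one-line "name arg1 arg2 ..." request format, and the class names are all made up for illustration, and a real server would also need to forward stdin, exit codes, and the working directory.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical contract for a hosted utility (an assumption, not Nailgun's API).
interface HostedUtility {
    void run(String[] args, PrintStream out);
}

class MiniUtilitiesServer {
    // Utilities stay resident in memory, keyed by the name clients ask for.
    private final Map<String, HostedUtility> utilities = new ConcurrentHashMap<>();

    void register(String name, HostedUtility utility) {
        utilities.put(name, utility);
    }

    // Handles one request line of the made-up form "name arg1 arg2 ...".
    void dispatch(String requestLine, PrintStream out) {
        String[] parts = requestLine.trim().split("\\s+");
        HostedUtility utility = utilities.get(parts[0]);
        if (utility == null) {
            out.println("unknown utility: " + parts[0]);
            return;
        }
        utility.run(Arrays.copyOfRange(parts, 1, parts.length), out);
    }

    // Convenience for trying a request without opening a socket.
    String dispatchToString(String requestLine) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer, true);
        dispatch(requestLine, out);
        out.flush();
        return buffer.toString();
    }

    // Accept loop bound to the loopback address only; one thread per client.
    void serve(int port) throws IOException {
        try (ServerSocket server = new ServerSocket(port, 50, InetAddress.getLoopbackAddress())) {
            while (true) {
                Socket client = server.accept();
                new Thread(() -> handle(client)).start();
            }
        }
    }

    // Reads a single request line, runs the utility, then closes the connection.
    private void handle(Socket client) {
        try (Socket c = client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(c.getInputStream(), StandardCharsets.UTF_8));
             PrintStream out = new PrintStream(c.getOutputStream(), true, "UTF-8")) {
            String line = in.readLine();
            if (line != null) {
                dispatch(line, out);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) throws IOException {
        MiniUtilitiesServer server = new MiniUtilitiesServer();
        server.register("echo", (a, out) -> out.println(String.join(" ", a)));
        if (args.length > 0) {
            server.serve(Integer.parseInt(args[0])); // port chosen by the caller
        } else {
            // No port given: run one request in-process as a quick demo.
            System.out.print(server.dispatchToString("echo hello from the JVM"));
        }
    }
}
```

The client side would then be anything that can open a local socket and write one line, which is roughly the role the small C executable plays in the setup described above.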

So, some advantages:

  1. Lots of APIs: We have access to all the Java APIs, from encryption to image manipulation.
  2. Dynamic Loading: We can reload our utilities at any time--use a custom classloader to check the file's timestamp, or force a flush/reload on demand.
  3. Scripted or Compiled: We can run any languages the JVM supports (like BeanShell, Jython, JRuby) or compiled Java classes.
  4. Low-overhead IPC: We can pipe data between different hosted utilities using direct, in-memory data streams without returning to the OS in-between.
  5. Security Manager: We can define one or more security managers to control exactly what our hosted utilities can do. We can also use reflection on loading a class to check for disallowed package access (e.g. uses of java.net.*).
  6. Swappable Implementations: Using dynamic classloading, we could swap in different implementations of the same API, or similar, but different APIs for a given purpose: use different XML parsers on demand, or switch between GNU/Perl/Java regexp packages.
  7. Colorful RMI: Remote method invocation, however you like it: Java RMI, CORBA, SOAP, HTTP/REST, whichever.
  8. Auto-tuned Performance: If we are using a self-optimizing VM like Sun's HotSpot, our utilities will gain speed as they are executed more often.
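As a rough illustration of the low-overhead IPC point, two hosted utilities can be chained inside one JVM with the standard `PipedInputStream`/`PipedOutputStream` pair. The `StreamUtility` interface and the `perLine` helper below are assumptions made for this sketch; nothing here comes from Nailgun.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.function.Function;

// Hypothetical shape for a stream-oriented hosted utility.
interface StreamUtility {
    void run(InputStream in, OutputStream out) throws IOException;
}

class InMemoryPipe {
    // Runs the equivalent of "first | second" without leaving the JVM.
    static String pipe(StreamUtility first, StreamUtility second, String input) {
        try {
            PipedOutputStream pipeOut = new PipedOutputStream();
            PipedInputStream pipeIn = new PipedInputStream(pipeOut);
            ByteArrayOutputStream result = new ByteArrayOutputStream();

            // The producer needs its own thread: piped streams block when the buffer is full.
            Thread producer = new Thread(() -> {
                try (OutputStream out = pipeOut) { // closing signals end-of-stream downstream
                    first.run(new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8)), out);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            producer.start();
            second.run(pipeIn, result);
            producer.join();
            return new String(result.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Builds a line-by-line utility; returning null from fn drops the line.
    static StreamUtility perLine(Function<String, String> fn) {
        return (in, out) -> {
            BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
            String line;
            while ((line = reader.readLine()) != null) {
                String mapped = fn.apply(line);
                if (mapped != null) {
                    out.write((mapped + "\n").getBytes(StandardCharsets.UTF_8));
                }
            }
            out.flush();
        };
    }

    public static void main(String[] args) {
        StreamUtility upper = perLine(s -> s.toUpperCase(Locale.ROOT));
        StreamUtility onlyErrors = perLine(s -> s.contains("ERROR") ? s : null);
        System.out.print(pipe(upper, onlyErrors, "error: disk\ninfo: ok\nError: net\n"));
    }
}
```

Whether this is actually cheaper than an OS pipe between two native processes would have to be measured; the sketch only shows that the plumbing is available.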

Those are just off the top of my head. Now, some downsides:

  1. Piping: The common practice of chaining command-line utilities together with the shell's pipe operator becomes a little weird here, as any invocation of a hosted VM utility has to route through our remote protocol. Will it be much slower because of this?
  2. Invocation: Need an easy way to invoke utilities without having to write a wrapper script to do this for us. Nailgun, for example, has a command-line executable called ng, which takes as an argument the hosted utility to run. Would be nice to avoid this and use the utilities as if they were, for example, Bash scripts.
  3. Opacity: Using a hosted VM adds a layer of indirection--where is our utility loaded from? How do we edit it and load a new version? What if two versions are available? For command-line utilities, these (my guess) would all be stored together in a directory of binaries or of scripts; in Java, these can come from a directory, a Jar, a ZIP file, etc.
  4. Security Confusion: Execution of scripts or binaries from the command-line is normally controlled through file access permissions. Java has its own security mechanisms--can get confusing if we have to mix-and-match these, as changing permissions for one set of utilities is orthogonal to changing them in the other.
  5. Management Confusion: As with security/execution permissions, we have all sorts of tools for monitoring, backgrounding and controlling processes that won't work using a separate utilities server.
  6. Bad Behavior: How do we stop or kill utilities that are chewing up lots of memory or CPU cycles? With normal utilities, we just kill the process. How often would we need to kill the whole utilities server?
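On the bad-behavior question, the closest thing the JVM offers to killing a process is cancelling a task and interrupting its thread. The sketch below, with a hypothetical `UtilityWatchdog` class, runs a utility with a time limit using `java.util.concurrent`. Note the catch: interruption is cooperative, so a utility that never checks for it keeps running, and at that point restarting the whole server may be the only option.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs a hosted utility with a time limit.
class UtilityWatchdog {
    // Daemon threads, so a stuck utility cannot keep the JVM alive on shutdown.
    private final ExecutorService pool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });

    // Returns true if the utility ended on its own, false if we had to cancel it.
    boolean runWithTimeout(Runnable utility, long timeoutMillis) {
        Future<?> future = pool.submit(utility);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Sends an interrupt; only utilities that react to interrupts will stop.
            future.cancel(true);
            return false;
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            // The utility threw an exception: it ended by itself, nothing to kill.
            return true;
        }
    }

    public static void main(String[] args) {
        UtilityWatchdog watchdog = new UtilityWatchdog();
        Runnable sleepy = () -> {
            try {
                Thread.sleep(10_000); // stands in for a long-running utility
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // cooperate: stop when asked
            }
        };
        System.out.println("quick finished: " + watchdog.runWithTimeout(() -> { }, 1_000));
        System.out.println("sleepy finished: " + watchdog.runWithTimeout(sleepy, 100));
    }
}
```

This covers runaway CPU time only in the cooperative case, and it does nothing about memory: all hosted utilities share one heap, so one greedy utility can still starve the rest.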

Still, what I'm thinking is that this idea of running a "utilities server" is, overall, a pretty good idea. I like the fact that I could download a signed/trusted Jar with a bunch of these utilities written in Java or in a JVM language, and have them available as needed. I like that, given the OO patterns that allow us to write adaptors and plug-in implementations of APIs, I can change how my utilities work at runtime. And, given the huge amount of code available in the FOSS/Java world, the range of functionality we have available is pretty awesome. Glad that Nailgun is giving us a model for how to do this.

You can comment on this on the JRoller website, the host for the blog entry above.