Dynamic Language Integration in the JVM (January 5, 2005)

I've been interested to see the different languages built on top of the JVM. If you do a web search for "jvm languages", you'll find a whole slew of them listed, some of which are still in active maintenance. One newish one, Groovy, has been submitted as a standard dynamic language for Java. But I think the current approaches to executing these from within a Java program are too clumsy.

There will be some confusion over terminology here. When I refer to "dynamic" languages, I mean programming languages that a) must or can be interpreted to run and b) offer a relaxed language syntax compared to Java. That's a little bit broad, but some examples will clarify: JavaScript (running in or outside the JVM), Python/Jython, BeanShell, Groovy. They are "dynamic" (as opposed to "static") in that the compiler doesn't require you to declare all types before you use them, doesn't check whether method invocations will actually invoke anything, etc. If you make a mistake in that regard, it is a runtime error, not a compile-time error. I understand there are many subtle variations regarding dynamic versus static (versus latent) typing, and so on. Do some web searching for more detailed discussions.

What I am interested in here is how to use these in the JVM, and in particular, how we might speed up development on some types of projects that involve Java. The current problem with Java is that because of static typing and compiler checks, Java development requires more effort up front (during design and coding), generally slowing down the overall programming project. On the other hand, all these compiler checks reduce some types of errors (which ones, how many, and how usefully are all up for discussion). But you just have to look at the proliferation of dynamic languages, and the vast number of scripts and programs written in them, to see how much people like a relaxed compiler. As far as I can see, they do. A lot.

This blog isn't about dynamic languages per se, but about current problems integrating them in a project that also uses Java. The question is whether we can't have a clearer, more directed path towards integration, such that I can start coding and prototyping in a dynamic language, then re-code where necessary in Java, running both alongside each other all the while.

Now, for dynamic languages that can run in the JVM--of which there are many--it turns out we can do this. The problem is that you can't do it cleanly: you instantiate a sort of Command object, give it some contextual information, then execute the Command (that's a generalization, but a common approach).

An Example: Scripts in jEdit

The jEdit text editor, for instance, lets you write scripts in BeanShell, a dynamic language. I have a script that takes the current buffer in jEdit (say, a text file), parses it, and creates a stub of a Java interface. jEdit invokes the script with some contextual information: a reference to the buffer, the text area, the jEdit instance, etc. These appear as variables within the script. You don't declare or initialize them; they are ready to use. So "buffer" and "textarea" and "view" are all variables that I can use in any BeanShell script within jEdit. When jEdit invokes my script, it pushes these references into a sort of "context" object, which it then provides as a parameter to the script. But the invocation of the script is indirect. jEdit has no idea what object my script will return (if any), and doesn't know what functions are available in my script, what parameters those take, or what the parameter types are. It sets up a call, and invokes it.

So, instead of (within a jEdit code block)

      InterfaceParser parser = new InterfaceParser();
      String interfaceStub = parser.parseBuffer(buffer, textarea, view);

it would look something like this (this is just a mock-up, not what jEdit really does *laziness*)

      import bsh.Interpreter;

      Interpreter i = new Interpreter(); // Construct an interpreter
      i.set("textarea", textarea); // Expose host objects to the script as variables
      i.set("buffer", buffer);
      i.set("view", view);
      // Eval the script and get the result. source() returns Object,
      // so the cast to String is only checked at runtime.
      String interfaceStub = (String) i.source("interface_parser.bsh");

From what I can see, this is a fairly common approach to integrating dynamic languages within Java. The Java compiler checks the definition of Interpreter, but can't dig down to see what the script itself is doing. If you muck up the parameter types or the return type, runtime exception for you, buddy. Which kind of sucks.
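
To make the failure mode concrete, here is a minimal sketch that does not use BeanShell at all. It assumes a hypothetical stand-in where a "script" is just a function that reads named variables from a context map and returns an Object (written with lambdas from later Java versions, for brevity). The class and method names are made up for illustration. The point is only that both scripts compile, and the mismatch shows up when the host casts the result.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a script interpreter: a "script" is a function
// that reads named variables from a context map and returns Object.
class ContextInvocationDemo {

    // Runs the script against the context, much as an interpreter's
    // eval/source call would.
    static Object run(Function<Map<String, Object>, Object> script,
                      Map<String, Object> context) {
        return script.apply(context);
    }

    // Returns "ok" if the script's result can be used as a String,
    // or the name of the exception the host hits otherwise.
    static String tryAsString(Function<Map<String, Object>, Object> script,
                              Map<String, Object> context) {
        try {
            String result = (String) run(script, context);
            return "ok";
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        Map<String, Object> context = new HashMap<>();
        context.put("buffer", "public class Foo {}");

        // This script honours the unwritten contract and returns a String.
        Function<Map<String, Object>, Object> good =
                ctx -> "interface for: " + ctx.get("buffer");

        // This one returns an Integer instead. The compiler accepts both.
        Function<Map<String, Object>, Object> bad =
                ctx -> ((String) ctx.get("buffer")).length();

        System.out.println(tryAsString(good, context)); // prints "ok"
        System.out.println(tryAsString(bad, context));  // prints "ClassCastException"
    }
}
```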

What We Need

What we need, instead, is to be able to treat our dynamic languages as first-class citizens of the Java community. If we can instantiate and invoke dynamic language methods and classes as if they were Java classes and methods, we could start by writing our apps in our favorite dynamic language, then replace parts of them with Java as necessary. This could speed up early development and prototyping, while offering a straightforward migration path to a full-Java application, or at least, the best of both worlds.

An Approach

So, one approach to this is to actually build Java class definitions out of our scripts. Some dynamic languages offer this: you run a pre-compile on your scripts, which outputs Java classes, then you can import, instantiate, extend and otherwise reference them just like Java classes. That is not bad, but it requires, first, a special pre-compile (which adds time if I am editing both the script and the Java), and also, it requires the language to have a clear, non-ambiguous mapping to the Java language, as opposed to a clear, non-ambiguous mapping to JVM bytecode. So, that approach works, and I think the Bistro programming language uses it.

Another approach is to satisfy the compiler. IMO, what the Java compiler wants is to verify the existence and the structure of classes we reference. The class has to be locate-able by a ClassLoader, must have the methods and constructor we are invoking (and they must be accessible), must implement the interfaces we reference, etc. My impression is that, without a complete revision to the Java specification, this should be possible.
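
As a rough illustration of the kind of structural facts involved, the sketch below asks the same questions at runtime using plain reflection: can the class be located, does it have a public no-argument constructor, and does it have a public method with the requested name and parameter types? This is not how the Java compiler actually performs its checks, and the class name StructuralCheck is made up for this example.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;

// Illustrative only: answers, via reflection, the structural questions
// a compiler would need answered about a referenced class.
class StructuralCheck {

    // True if the named class can be located, has a public no-arg
    // constructor, and has a public method with the given signature.
    static boolean isInvocable(String className, String methodName,
                               Class<?>... paramTypes) {
        try {
            Class<?> type = Class.forName(className);
            // getConstructor/getMethod only find public members and throw
            // NoSuchMethodException when nothing matches.
            Constructor<?> ctor = type.getConstructor();
            Method method = type.getMethod(methodName, paramTypes);
            return ctor != null && method != null;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // StringBuilder has a public no-arg constructor and append(String).
        System.out.println(isInvocable("java.lang.StringBuilder", "append", String.class));      // true
        // It has no parseBuffer(String) method.
        System.out.println(isInvocable("java.lang.StringBuilder", "parseBuffer", String.class)); // false
        // This class cannot be located at all.
        System.out.println(isInvocable("no.such.Type", "append", String.class));                 // false
    }
}
```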

My proposal is that when you reference a script, it has an unambiguous name and location so that a special DynamicLangClassLoader (DLC) can find it. There could be special naming conventions, but basically, there would be some mapping between directories and packages, and between a script name and a class name, as with Java files themselves. Our DLC is a special type of ClassLoader that our Java compiler uses to test the structural information (class, implements/extends, methods, etc.) of the script we are referencing. The compiler would use a DLC only for scripts in certain packages. We identify that a particular package is "managed" by our DLC: scripts.patrick.bsh.* would be "managed" by our BSHDynamicLangClassLoader. When a reference is made to a class in a "managed" package, the regular compiler ClassLoader defers to the registered DLC, and asks it to verify the requested invocation.
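
The lookup half of this could be sketched as follows. Everything here is hypothetical: the ManagedPackages class, the idea of registering loaders by name, and the particular convention of turning a CamelCase class name into a snake_case file name are assumptions for illustration, not part of any existing compiler or BeanShell API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical registry of "managed" packages and the assumed mapping
// from a class name to a script file on the classpath.
class ManagedPackages {

    // Maps a managed package prefix to the name of the loader responsible for it.
    private final Map<String, String> loadersByPrefix = new LinkedHashMap<>();

    void register(String packagePrefix, String loaderName) {
        loadersByPrefix.put(packagePrefix, loaderName);
    }

    // Returns the loader managing the class's package, or null if the
    // class should be resolved by the ordinary class loader.
    String loaderFor(String className) {
        for (Map.Entry<String, String> e : loadersByPrefix.entrySet()) {
            if (className.startsWith(e.getKey() + ".")) {
                return e.getValue();
            }
        }
        return null;
    }

    // Assumed convention: package -> directory, CamelCase class name ->
    // snake_case file name, plus a language-specific extension.
    static String scriptPathFor(String className, String extension) {
        int lastDot = className.lastIndexOf('.');
        String pkg = lastDot < 0 ? "" : className.substring(0, lastDot);
        String simple = className.substring(lastDot + 1);
        StringBuilder file = new StringBuilder();
        for (int i = 0; i < simple.length(); i++) {
            char c = simple.charAt(i);
            if (Character.isUpperCase(c) && i > 0) {
                file.append('_');
            }
            file.append(Character.toLowerCase(c));
        }
        String dir = pkg.isEmpty() ? "" : pkg.replace('.', '/') + "/";
        return dir + file + "." + extension;
    }

    public static void main(String[] args) {
        ManagedPackages registry = new ManagedPackages();
        registry.register("scripts.patrick.bsh", "BSHDynamicLangClassLoader");

        String name = "scripts.patrick.bsh.InterfaceParser";
        System.out.println(registry.loaderFor(name));               // BSHDynamicLangClassLoader
        System.out.println(scriptPathFor(name, "bsh"));             // scripts/patrick/bsh/interface_parser.bsh
        System.out.println(registry.loaderFor("java.lang.String")); // null
    }
}
```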

Our DLC is thus a sort of Adapter between two different subsystems--the Java language subsystem and the dynamic language subsystem. A DLC basically checks the script and reports back to the Java compiler whether the script can actually be invoked as the Java program is requesting. The DLC could do this through an interpreter, or maybe could use some reflection on our dynamic language bytecode to verify. The Java compiler shouldn't care either way.

The only restriction is that we must be able to interpret the script as a non-ambiguous Java class. There has to be a Class definition, function prototypes, etc. Method parameters and return values must be valid Java classes, or must themselves be dynamic language constructs which, at some point, resolve to valid Java classes. Ditto for interfaces and superclasses.

Looking again at the example I gave above,

      import scripts.patrick.bsh.InterfaceParser;

      InterfaceParser parser = new InterfaceParser();
      String interfaceStub = parser.parseBuffer(buffer, textarea, view);

The scripts.patrick.bsh package would be managed by the BeanShell DLC. It would look in the classpath (for example), under /scripts/patrick/bsh, for a file named "interface_parser.bsh". It would load up that script, and verify that there was a parseBuffer() method that took three Java class arguments (for buffer, textarea and view) and returned a String. It would report back to the Java compiler that, yes, these existed. And the compiler would go along its merry way.
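
Here is one way the verification step might look, as a sketch only. It assumes the DLC has already extracted each script method's name, parameter types and return type into a small descriptor (the hypothetical MethodInfo class below). How that extraction happens is left open, and primitive types and overload resolution are ignored. CharSequence and Object stand in for jEdit's own types, which aren't available here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the structural check a DLC might perform.
class ScriptSignatureCheck {

    // What the DLC is assumed to have extracted from a script for one method.
    static final class MethodInfo {
        final String name;
        final Class<?> returnType;
        final List<Class<?>> paramTypes;

        MethodInfo(String name, Class<?> returnType, Class<?>... paramTypes) {
            this.name = name;
            this.returnType = returnType;
            this.paramTypes = Arrays.asList(paramTypes);
        }
    }

    // True if some declared method accepts the given argument types and its
    // result can be assigned to the type the Java caller expects.
    static boolean canInvoke(List<MethodInfo> declared, String name,
                             Class<?> expectedReturn, Class<?>... argTypes) {
        for (MethodInfo m : declared) {
            if (!m.name.equals(name) || m.paramTypes.size() != argTypes.length) {
                continue;
            }
            if (!expectedReturn.isAssignableFrom(m.returnType)) {
                continue;
            }
            boolean argsOk = true;
            for (int i = 0; i < argTypes.length; i++) {
                if (!m.paramTypes.get(i).isAssignableFrom(argTypes[i])) {
                    argsOk = false;
                    break;
                }
            }
            if (argsOk) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in for what the DLC found in interface_parser.bsh.
        List<MethodInfo> declared = Arrays.asList(
                new MethodInfo("parseBuffer", String.class,
                        CharSequence.class, Object.class, Object.class));

        // The Java caller passes (String, Object, Object) and expects a String.
        System.out.println(canInvoke(declared, "parseBuffer", String.class,
                String.class, Object.class, Object.class));  // true
        // The same call, but the caller expects an Integer back.
        System.out.println(canInvoke(declared, "parseBuffer", Integer.class,
                String.class, Object.class, Object.class));  // false
    }
}
```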

And, the DLC could be used by the JVM at runtime to convert a script file to bytecode if it wasn't already in bytecode.

The Catch

The catch to this is that, first, our DLC must be able to unambiguously satisfy the Java compiler. The script must be executable as if it were a Java language class. A second catch is that, if there were bugs in the interpreter, or if the script was changed between compilation and runtime, we'd get a runtime exception, and probably not a very friendly one at that.


I think this is doable within the framework of the Java Language Specification without introducing some nightmarish and long new specification process.

You can comment on this on the jRoller website, the host for the blog entry above.