Unit-testing Performance (February 17, 2005)

Performance regressions: I was wondering this morning if it might make sense to write unit tests to verify the performance of specific operations. The basic idea is that we have a unit test that prepares the run and executes a certain operation; a wrapper test then executes this performance test a specific number of times. Timing is captured in milliseconds for total, average, min and max execution time.

The unit test for a single execution of the process controls the inputs to the operation that are not environment-specific--e.g. input data and parameters. We can treat those as fixed. What the performance harness does is capture the timings in a standard way and then compare them to pre-defined performance statistics--the expected cost of the run.
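To make the wrapper idea concrete, here is a minimal sketch in Java of a harness that runs an operation a fixed number of times and records total, average, min and max in milliseconds. The names TimingHarness, Timings and measure are made up for illustration; they do not come from JUnit or any other library.

```java
import java.util.LongSummaryStatistics;

// Hypothetical harness; class and method names are illustrative only.
final class TimingHarness {

    /** Timings for one batch of runs, in milliseconds. */
    static final class Timings {
        final int runs;
        final long totalMs;
        final long minMs;
        final long maxMs;
        final double averageMs;

        Timings(LongSummaryStatistics stats) {
            this.runs = (int) stats.getCount();
            this.totalMs = stats.getSum();
            this.minMs = stats.getMin();
            this.maxMs = stats.getMax();
            this.averageMs = stats.getAverage();
        }
    }

    /** Runs the operation the given number of times and summarizes the timings. */
    static Timings measure(Runnable operation, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        LongSummaryStatistics stats = new LongSummaryStatistics();
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime(); // nanoTime is meant for measuring elapsed time
            operation.run();
            stats.accept((System.nanoTime() - start) / 1_000_000L);
        }
        return new Timings(stats);
    }
}
```

A test would wrap its single-execution body in a Runnable and call something like TimingHarness.measure(body, 20). The first few runs on a JVM tend to be slower while code is still being compiled, so a real harness would probably want some warm-up runs that are not counted.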

The issue would be, how do we control for environment-specific factors? There is memory available, CPU speed, operating system (and version), cache/swap allocated to the OS, JVM parameters, and so on. My idea is that these values are isolated into a performance profile. When setting up the test for the first time, you create a base profile by running the test in a "record" mode. The record mode captures all the environment details available to the JVM, which includes, as it turns out, things like the operating system, the JVM version, amount of memory assigned to the JVM, and so on. We might even capture the workstation's name, if it has one. For this configuration, the profile is written out (with all those details), and the timings recorded under that profile. When running the actual test, the harness looks for a matching profile, compares the test results with the expected performance, and reports deviations.
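As a rough sketch of what "record" mode might capture, the class below collects a handful of environment details through standard JVM calls (System.getProperty, Runtime, InetAddress) into a sorted map. The class name EnvironmentProfile and the map keys jvm.maxMemoryBytes, cpu.count and host.name are my own invention; which details actually belong in a profile is exactly the open question.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical profile recorder; the class name and custom keys are illustrative.
final class EnvironmentProfile {

    /** Captures the environment details visible to the running JVM. */
    static Map<String, String> capture() {
        Map<String, String> profile = new TreeMap<>(); // sorted, so output is stable
        String[] keys = {"os.name", "os.version", "os.arch", "java.version", "java.vm.name"};
        for (String key : keys) {
            profile.put(key, System.getProperty(key, "unknown"));
        }
        Runtime runtime = Runtime.getRuntime();
        profile.put("jvm.maxMemoryBytes", Long.toString(runtime.maxMemory()));
        profile.put("cpu.count", Integer.toString(runtime.availableProcessors()));
        profile.put("host.name", hostName());
        return profile;
    }

    /** Returns the workstation's name, or "unknown" if it cannot be resolved. */
    private static String hostName() {
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "unknown";
        }
    }
}
```

Matching a profile could then be as simple as comparing the captured map with the recorded one using equals, though an exact match is probably too strict in practice (a minor JVM update would start a new profile).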

Of course, we don't want to match on exact timings, since there will always be factors we can't control. But let's say we have "deviation ranges" which are allowed--10-15% longer is counted as a warning, > 15% is counted as an error. 10-15% less is counted as a bonus, but if the test run takes < 20% of the expected time, that would count as a warning--namely, that something suspicious was going on (perhaps we had accidentally turned off a processing step).
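Here is one way the deviation ranges could be written down, as a small Java sketch. DeviationCheck and Verdict are hypothetical names. The ranges above leave two cases open, so the code makes two assumptions: anything within 10% either way counts as OK, and any speed-up of 10% or more counts as a bonus as long as the run still takes at least 20% of the expected time.

```java
// Hypothetical classifier for the deviation ranges; names are illustrative.
final class DeviationCheck {

    enum Verdict { OK, BONUS, WARNING, ERROR }

    /** Compares an actual timing against the expected timing from the profile. */
    static Verdict classify(double expectedMs, double actualMs) {
        if (expectedMs <= 0) {
            throw new IllegalArgumentException("expectedMs must be positive");
        }
        double ratio = actualMs / expectedMs;
        if (ratio < 0.20) {
            return Verdict.WARNING; // suspiciously fast: under 20% of the expected time
        }
        if (ratio > 1.15) {
            return Verdict.ERROR;   // more than 15% longer
        }
        if (ratio >= 1.10) {
            return Verdict.WARNING; // 10-15% longer
        }
        if (ratio <= 0.90) {
            return Verdict.BONUS;   // assumption: any non-suspicious speed-up of 10% or more
        }
        return Verdict.OK;          // assumption: within 10% either way is fine
    }
}
```

With an expected time of 100 ms, a run of 112 ms would come back as a WARNING, 120 ms as an ERROR, 88 ms as a BONUS, and 10 ms as a WARNING again.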

In the Flying Saucer XHTML/CSS renderer project, we have a "brutal" test we run manually, where we load and render the entire text of Hamlet. While many smaller pages render in less than a second, Hamlet is guaranteed to take much longer, up to half a minute or more. But this test must be run manually and eyeballed to see what degradation, or improvement, we've found.

Note that, for Flying Saucer, we'd also need to control the configuration file we load the application with--in our current configuration, you can choose which XML parser to use when loading pages into memory.

So what interests me is how accurate and reliable this would actually be. Which seems to come down to: how accurate can I make the profile? Even running on exactly the same machine, there are many factors that could cause the JVM itself to run more slowly--for example, having a different set of applications running at the same time, or background services active. On the other hand--isn't it worth a try? Getting an idea--without extra effort--of how much a set of changes to the codebase has affected performance seems like it's worth it.

You can comment on this on the jRoller website, the host for the blog entry above.