More Benchmarking in Node.js and V8
Tuesday, 14th August 2012, 12:19
I may have mentioned that I didn't like the way heavy frameworks (and the blind captains that sail in them) had begun to infect the Node.js stratosphere, and that I might consider making my own. Well the first stage is to look closely at the core design decisions and then benchmark them to make sure they really are a fast way to do things.
My plan is this, create something as easy to use as Connect, because that really is a nice framework. But to make it as fast as possible, moving the goal posts away from people who are trying to turn Node.js into .NET, and back towards the clean and mean approach which is what it should all be about.
Basic Design
The built in modules of Node.js should be treated as sacrosanct, the internals may change, but the guys who made Node.js know what they are doing far more than I do, so I trust them to continuously improve how things work and make them faster.
Routing should be the starting point for everything, and it should route to multiple stacks of middleware. At the moment, with Connect, on my websites a connection comes in and several things are done across all of them. Thing is, they don't all need everything, so why do it!
If a request for a static item comes in, say as an image or favourite icon, this should be sent to a stack that does nothing more than pipes the image out. A webpage request that requires validation using cookies, this should never even pass through parts of a stack designed for serving static requests, why waste the time? But it should go through a stack which deals with cookies, but not waste time in one that worries about form data.
And likewise, any page where form data is submitted, this should go through a stack which deals with that, probably taking advantage of formidable. Or some alternative if you prefer.
But the basic design philosophy should be routing is king, calculate a route as fast as possible, and then send it to the most efficient stack for that request.
Routing Methods
I can think of a few methods to test URIs for routing purposes, since this is a fundamental feature it needs to be the fastest. The most obvious is a straight string match, slightly more complicated versions can involve regular expressions which could be slow. Possibly not as slow as the latter would be an associative array based method, but my gut feeling is it won't be as quick as a straight string match technique.
However I think it is the job of a framework/library designer to encourage best practise, so by deliberately not supporting something which is bad, should mean I don't turn into Microsoft, though it would certainly be nice to have their money.
So, I'm going to test a number of approaches and see what the damage is. And who knows, if the performance of some are better for certain circumstances than others, perhaps a choice should be offered. One may be ideal for a short list of routes, another may be far better if the list is huge.
Basic Benchmarking Script
When we are setting up, time is cheap, so we can safely assume that creating an array of routes and sorting them is free. There is also no reason why we couldn't offer some manual weighting system to speed it up further, so lets test a sensible system.
Our standard list of possible routes for this test will be as follows:
/service/delete
/service/get
/service/put
/service
/css/
/images/
/script/
/
And we'll test it with the following URIs:
/service/delete/28
/service/put/44
/service/get/a/much/really/long/route
/service/view
/css/default.css
/images/snookie.jpg
/script/jquery.js
/roadtonowhere
/
Our basic benchmarking script is going to look like this:
var runtimes = 1000000;
var routes = [
{ path: "/service/delete", destination: doNothing },
{ path: "/service/get", destination: doNothing },
{ path: "/service/put", destination: doNothing },
{ path: "/service", destination: doNothing },
{ path: "/css/", destination: doNothing },
{ path: "/images/", destination: doNothing },
{ path: "/script/", destination: doNothing },
{ path: "/", destination: doNothing }
];
var uris = [
"/service/delete/28",
"/service/put/44",
"/service/get/a/much/really/long/route",
"/service/view",
"/css/default.css",
"/images/snookie.jpg",
"/script/jquery.js",
"/roadtonowhere",
"/"
];
console.log("Runs: " + runtimes);
for (var u = 0; u < uris.length; u++)
timeRoute(runtimes, uris[u], ROUTING_FUNCTION_WE_WILL_TEST);
function timeRoute(runtimes, strRoute, fncRouter)
{
var dtStart = new Date();
for (var runcount = 0; runcount < runtimes; runcount++)
fncRouter(strRoute);
var dtEnd = new Date();
while (strRoute.length < 20)
strRoute += " ";
console.log("Route: " + strRoute + "Total Time: " + (dtEnd - dtStart) + "ms, Avg Time: " + (dtEnd - dtStart) / runtimes);
}
function doNothing()
{
var x = Math.random();
return x;
}
This should be very easy to follow, and I've tried to keep it as close to how we'll actually do things in the framework by including a destination function in the routing array.
Simple String Match
Here is our most simple case scenario, literally going through and matching strings, firing a call off that represents the correct route.
function testStringMatch(strRoute)
{
for (var p = 0; p < routes.length; p++)
{
if (routes[p].path == strRoute.substr(0, routes[p].path.length))
{
routes[p].destination();
return;
}
}
}
And here are our results, run a few times until they settled into a repeating pattern:
Route: /service/delete/28 Total Time: 128ms, Avg Time: 0.000128
Route: /service/put/44 Total Time: 220ms, Avg Time: 0.00022
Route: /service/get/a/much/really/long/route Total Time: 192ms, Avg Time: 0.000192
Route: /service/view Total Time: 281ms, Avg Time: 0.000281
Route: /css/default.css Total Time: 410ms, Avg Time: 0.00041
Route: /images/snookie.jpg Total Time: 420ms, Avg Time: 0.00042
Route: /script/jquery.js Total Time: 543ms, Avg Time: 0.000543
Route: /roadtonowhere Total Time: 450ms, Avg Time: 0.00045
Route: / Total Time: 255ms, Avg Time: 0.000255
This is as you would expect, the earliest route matches are quicker, and the further down the array it must go the longer it takes. Before you ask, I did try optimising things a tiny bit by storing routes[x].path.length as routes[x].len but it made no difference at all, which shows you how good at times V8 is at optimising.
Regular Expression String Match
Using almost identical matches to the ones above, with the only stipulation that we demand the start of a line, regular expression matching doesn't at first appear as costly as all that.
Here is our new routes array and test function:
var regexroutes = [
{ regex: /^\/service\/delete/, destination: doNothing },
{ regex: /^\/service\/get/, destination: doNothing },
{ regex: /^\/service\/put/, destination: doNothing },
{ regex: /^\/service\//, destination: doNothing },
{ regex: /^\/css\//, destination: doNothing },
{ regex: /^\/images\//, destination: doNothing },
{ regex: /^\/script\//, destination: doNothing },
{ regex: /^\//, destination: doNothing }
];
function testRegExStringMatch(strRoute)
{
for (var p = 0; p < regexroutes.length; p++)
{
if (strRoute.match(regexroutes[p].regex))
{
regexroutes[p].destination();
return;
}
}
}
And here are our results:
Runs: 1000000
Route: /service/delete/28 Total Time: 140ms, Avg Time: 0.00014
Route: /service/put/44 Total Time: 341ms, Avg Time: 0.000341
Route: /service/get/a/much/really/long/route Total Time: 215ms, Avg Time: 0.000215
Route: /service/view Total Time: 332ms, Avg Time: 0.000332
Route: /css/default.css Total Time: 375ms, Avg Time: 0.000375
Route: /images/snookie.jpg Total Time: 493ms, Avg Time: 0.000493
Route: /script/jquery.js Total Time: 505ms, Avg Time: 0.000505
Route: /roadtonowhere Total Time: 587ms, Avg Time: 0.000587
Route: / Total Time: 518ms, Avg Time: 0.000518
For simple matching, V8's regex engine is clearly pretty efficient. This isn't to say you couldn't write an awful one that won't make it cry, I'm sure you could. But using sensible ones seems to not be that costly.
Using more complicated RegExs, like the following:
var regexroutes = [
{ regex: /^\/service\/delete\/[0-9]+$/, destination: doNothing },
{ regex: /^\/service\/get/, destination: doNothing },
{ regex: /^\/service\/put\/[0-9]+/, destination: doNothing },
{ regex: /^\/service\//, destination: doNothing },
{ regex: /^\/css\/[a-z]+\.css/, destination: doNothing },
{ regex: /^\/images\/[a-z]+\.jpg/, destination: doNothing },
{ regex: /^\/script\//, destination: doNothing },
{ regex: /^\/$/, destination: doNothing }
];
We get results which are still pretty solid:
Runs: 1000000
Route: /service/delete/28 Total Time: 155ms, Avg Time: 0.000155
Route: /service/put/44 Total Time: 266ms, Avg Time: 0.000266
Route: /service/get/a/much/really/long/route Total Time: 226ms, Avg Time: 0.000226
Route: /service/view Total Time: 402ms, Avg Time: 0.000402
Route: /css/default.css Total Time: 378ms, Avg Time: 0.000378
Route: /images/snookie.jpg Total Time: 442ms, Avg Time: 0.000442
Route: /script/jquery.js Total Time: 588ms, Avg Time: 0.000588
Route: /roadtonowhere Total Time: 488ms, Avg Time: 0.000488
Route: / Total Time: 630ms, Avg Time: 0.00063
Whilst for very short routes like the root home page, string matching is fast, there is no doubt that the cost of using regexps for routing is quite low for many cases.
Associative Array Matching
Now for something a bit way out there, I really am not expecting this to be as fast at all, but let's experiment with it anyway and find out.
Here is our new array and test function:
var routesobj = [
{ "/service/delete": doNothing },
{ "/service/get": doNothing },
{ "/service/put": doNothing },
{ "/service": doNothing },
{ "/css/": doNothing },
{ "/images/": doNothing },
{ "/script/": doNothing },
{ "/": doNothing }
];
function testAssociativeMatch(strRoute)
{
while (strRoute.length)
{
if (routesobj[strRoute])
{
routesobj[strRoute]();
return;
}
strRoute = strRoute.substring(0, strRoute.lastIndexOf("/"));
}
}
And needless to say, the longer the route the slower it is:
Runs: 1000000
Route: /service/delete/28 Total Time: 1584ms, Avg Time: 0.001584
Route: /service/put/44 Total Time: 1606ms, Avg Time: 0.001606
Route: /service/get/a/much/really/long/route Total Time: 4328ms, Avg Time: 0.004328
Route: /service/view Total Time: 990ms, Avg Time: 0.00099
Route: /css/default.css Total Time: 1005ms, Avg Time: 0.001005
Route: /images/snookie.jpg Total Time: 1106ms, Avg Time: 0.001106
Route: /script/jquery.js Total Time: 1031ms, Avg Time: 0.001031
Route: /roadtonowhere Total Time: 357ms, Avg Time: 0.000357
Route: / Total Time: 334ms, Avg Time: 0.000334
Maybe we can optimise things a bit, but perhaps this way lacks any sort of clean matching process without manipulating strings.
Tree Route Matching
This is another way out idea, that involves an object based tree hierarchy, so a URI like /service/delete/28 becomes obj.service.delete.28. Again really not expecting this to be any good at all, but for the investment time in finding out, it seems silly to not try.
However this is not easy to do, and requires abusing eval(). To make matters worse the most efficient way removes options when it comes to what you can and can't match. Here is my quick attempt at it:
var routestree = {
service: { delete: doNothing, get: doNothing, put: doNothing },
css: doNothing,
images: doNothing,
script: doNothing
};
function testAssociativeMatch(strRoute)
{
strRoute = "routestree" + strRoute.replace(/\//g, ".");
while (strRoute.length)
{
try
{
eval("if(" + strRoute + ") " + strRoute + "();");
}
catch (error)
{
}
strRoute = strRoute.substring(0, strRoute.lastIndexOf("."));
}
}
A nightmare as you can see. We have to catch errors which eats time, and manipulate the string which eats time too. This is more of a lesson in stuff you can do, rather than stuff you should, and it would need all sorts of sanitising of input or you'd be in a world of hurt. But what the hell, let's benchmark it.
Runs: 1000000
Route: /service/delete/28 Total Time: 101297ms, Avg Time: 0.101297
...CTRL-C
As you can see, no way hosey.
Conclusion
Regular expressions are the way to go when it comes to routing. Sure someone could make an awful one, but with a bit of guidance (or more accurately just making sure the examples are clear and efficient) that shouldn't be too much of an issue.
RegExs.... I CHOOSE JOOOOOOOOO!
But at the same time, people might want straight matching, so worth offering that as an option too.