XSLT, node.js 0.10 and a Fun Two Days of Native Modules and Memory Leaks
Thursday, 25th April 2013, 17:14
What a fun week it's been! Let's start near the beginning, with the final jump of upgrading to node.js 0.10 that I took this week across the various sites I run. First job of the day: upgrade my test server and see what broke. And what did break? Well, a few modules, so I carefully updated them one by one, with fewer and fewer things broken each time. Then I got to the first major hurdle: the XSLT module I've relied on for a while.
See, the approach I've been taking to building bespoke CMSs quickly has been to use XML and XSLT templates for controlling content. Sure, you could take the WordPress route of letting anyone do anything they want with a page, often including hackers... Seriously, using WordPress for your website these days is like having a Yahoo or Gmail account for your business contact address. Spend a bit more money and get something done properly!
Oh, and if you still build websites using a LAMP stack, I feel for you. PHP is awful and deserves to die. I have nothing against MySQL other than a historical dislike for its lack of standards and late support for true relational structures. Heck, I can even understand why someone would run Apache these days, even though it is an archaic old dragon with a dreadful config file format in comparison to something sleek and modern like nginx.
But PHP has always been the Visual Basic of the web development world: apart from being free and having a low entry bar for budding programmers, it is awful. The flat scripting model is outdated for modern needs, and the language is one of the worst, most inconsistent piles of garbage I've ever come across. Sure, people have done good things with it, but plenty of great artists have done good things with total garbage. It's still garbage.
Where Was I? Oh Yes node.js Upgrade
So anyway, there is little point having a designer spend ages making a website look good, if an over-eager user then turns up and starts using strange font sizes, or even worse creating a huge mess with the layout, and either thinking it looks better when it just looks unprofessional, or in even more frequent cases not even noticing what they did.
The answer here is to restrict what they can do: don't let them control the formatting. XSLT templates are a handy way to do this, even if XSLT itself is dreadful and seems almost deliberately designed to be impossible to read and follow, even with syntax highlighting. But it will do.
The CMS sites I build have two templates for each item of content, which is sometimes a generic item like a paragraph and other times a whole detailed collection of content. One XSLT template is used to present the XML data users enter as an HTML edit form, the other is used to display the XML as final HTML content. Converting the submitted edit form into XML is a trivial task.
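To give a feel for why the form-to-XML step is trivial, here is a minimal sketch. It is written in C++ to match the native module discussed later, although the same logic carries over to JavaScript. The names escapeXml, formToXml and FormFields are made up for illustration and are not taken from any of my sites, and it assumes the element names come from the template author rather than the user, so only the values get escaped.

```cpp
#include <string>
#include <utility>
#include <vector>

// Escape the five characters that are special in XML text and attributes.
std::string escapeXml(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        switch (c) {
            case '&':  out += "&amp;";  break;
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += c;        break;
        }
    }
    return out;
}

// Hypothetical representation of a submitted form:
// (element name, submitted value) pairs in field order.
using FormFields = std::vector<std::pair<std::string, std::string>>;

// Build a flat XML fragment: one child element per form field.
// Assumption: rootName and the field names are valid XML names.
std::string formToXml(const std::string& rootName, const FormFields& fields) {
    std::string xml = "<" + rootName + ">";
    for (const auto& field : fields) {
        xml += "<" + field.first + ">";
        xml += escapeXml(field.second);
        xml += "</" + field.first + ">";
    }
    xml += "</" + rootName + ">";
    return xml;
}
```

Calling formToXml with a root name of "paragraph" and a single "title" field gives one title element wrapped in a paragraph element, with any markup the user typed into the field escaped rather than passed through to the display template.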
For XML, node.js offers a number of modules, but for XSLT I could only ever find one, and that wasn't even listed on the third-party modules page; I only discovered it through StackOverflow via Google.
Node 0.10 broke it, and the author seemed a tad AFK. So what to do? Well, it actually didn't look that complicated a module, and I'd never made a native C/C++ module for node.js, so I figured I'd throw a day at it and see if I could make one. Thus libxsltjs was born.
It turns out that V8 has been made as complicated and confusing to muck around with as MAME, for the very same reasons. Templates and bad developer documentation! Don't get me wrong, I'm not against templates at all, but they do introduce a level of obfuscation that macros also like to add.
Using node_xslt and the far-too-sparse-for-a-company-the-size-of-Google V8 documentation as references, I slowly began to decode the macros used in the former and the few examples in the latter, and in the end everything was working fine.
With the new module created, I replaced connect with fastworks.js on all my remaining sites, because yet another connect update had caused something to break, which was a big motivation for writing my own framework in the first place.
PostgreSQL Starts Imploding
Alarms began to go off a few days after the upgrade, and they all pointed to the database server. The logs were spamming "FATAL: the database system is in recovery mode" at me, which after some googling seemed to indicate a hardware issue. Hardware issues are really not what any system admin wants to read about, but a reboot of the box appeared to fix things.
At least for a day. Then the same thing happened again, so this wasn't a one-off; this was a serious new issue. After a bit more digging I discovered that the reason PostgreSQL was going into recovery mode related to an interesting feature that CentOS, like other Linux systems, has for low memory situations.
Basically, when the box is running out of memory, the kernel's out-of-memory (OOM) killer starts to look around for something to KILL. And it was choosing a PostgreSQL process, which made PostgreSQL panic and think it had crashed. That resulted in a distrust of the database integrity, which kicked off recovery mode.
So now knowing the cause of the implosion was a low memory condition, and that this was occurring at least once every 24 hours, all the signs were pointing to a dreaded memory leak. And it didn't take me long to find a culprit, as I watched node processes grow in memory usage higher and higher as time went on.
Boot the Test Server!
Starting up a node app on my test server, I used siege to throw some connections at it. Just 1000 of those caused the resident memory usage to rise from its initial 30 MB to well over a hundred, and after 3000 or so it was in the 250 MB zone. Leaving it for a few minutes showed that it was never going down.
I suspected I knew what the cause of this memory leak was too, so I disabled all XML and XSLT calls on the test site and ran the siege again. This time there was no significant growth at all. It was definitely my new node module. :(
But I had a good idea why. I'd sort of assumed that when I passed an object to JavaScript, V8 would free it itself, but it doesn't if the object wasn't created by the V8 library. Why would it? How would it know my module didn't need it anymore, let alone the right way of freeing it?
So, even more googling, and I discovered I have to create a persistent handle, which is what V8 uses to reference objects that are passed to JavaScript but never cleared up by the garbage collector. Then you have to mark it as a weak handle and provide a callback function for when it is no longer referenced in JS.
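The ownership problem is easier to see outside V8. What follows is only an analogy in plain standard C++, not the V8 API and not the code from my module: a std::shared_ptr with a custom deleter plays the role of the handle, and the deleter plays the role of the callback that runs once nothing references the object any more. NativeDoc, nativeDocCreate, nativeDocFree, wrapDoc and liveDocs are all hypothetical names, standing in for a library-owned document and its matching free function.

```cpp
#include <memory>

// Stand-in for a resource allocated by a native library (hypothetical).
struct NativeDoc {
    int id;
};

// Number of documents created but not yet freed, so a leak is visible.
static int g_liveDocs = 0;

// Stand-in for the library's allocation function.
NativeDoc* nativeDocCreate(int id) {
    ++g_liveDocs;
    return new NativeDoc{id};
}

// Stand-in for the library's matching free function. Nothing calls this
// automatically; whoever owns the pointer has to arrange for it.
void nativeDocFree(NativeDoc* doc) {
    --g_liveDocs;
    delete doc;
}

// Tie the free function to the lifetime of the references: the deleter
// runs exactly once, when the last shared_ptr copy goes away. This is the
// same idea as registering a callback for "no longer referenced".
std::shared_ptr<NativeDoc> wrapDoc(NativeDoc* doc) {
    return std::shared_ptr<NativeDoc>(doc, nativeDocFree);
}

int liveDocs() { return g_liveDocs; }
```

A raw pointer from nativeDocCreate that is handed around and never passed to nativeDocFree is exactly the leak I had: the count only ever goes up. Wrapped with wrapDoc, the count drops back as soon as the last reference is released, without the caller needing to know how the thing is freed.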
What kind of stupid way of doing things is that? A weak persistent handle? Surely there were better ways to approach that whole issue? It looks very much like something which evolved rather than was planned.
Getting the thing to compile was a nightmare, and the documentation on it is almost Facebook-level awful. Eventually I had to take inspiration from the node.js source code itself on how to do things, but finally I managed it and the memory leak was cured.
Just as well, since as things stood I was restarting 10 or so node processes every 30 minutes just to keep the memory level low! There is nothing like a constant reminder that you HAVE to fix something to put unnecessary pressure on you to fix it. :)