Making PDFs from HTML on your webapp in CentOS
Thursday, 29th November 2012, 14:00
So this week, I had to do something I'd not done before, but which sounded easy enough. I've been working on a website for online courses which needed a way of taking orders either by PayPal or by requesting an invoice. If you chose the former, then PayPal would handle the invoice, but if you chose the latter then the site needed to create an invoice as a PDF and email it to the accountant, who would then be responsible for the rest.
Stupid PayPal Buttons Are Stupid
The first problem was with the PayPal button, and something as trivial as setting the amount. Whilst not a patch on how awful Facebook's documentation was last time I messed with that, it's still annoying that a company the size of PayPal can fail to mention things.
If you want to create a basic Pay Now button via their system, that is all very easy. And the resulting HTML is a simple piece of form code that contains a few hidden fields. So far, so good. Their website has a list of optional variables you can add to this form, including what they do and when you can use them.
The return and cancel_return variables work nicely enough for the purposes of just returning the user to your website, and you can change the name of the item (or course, in this case) they are buying with the item_name field. Again, all hunky-dory.
But specifying how much with the amount field? That bit drove me completely mad for ages! The documentation says of this field:
The price or amount of the product, service, or contribution, not including shipping, handling, or tax. If this variable is omitted from Buy Now or Donate buttons, buyers enter their own amount at the time of payment.
- Required for Add to Cart buttons
- Optional for Buy Now and Donate buttons
- Not used with Subscribe or Buy Gift Certificate buttons
Now I'm using a Buy Now button, so why did it put the price at £0 every time I clicked through? After an awful lot of googling and playing around, it turned out to be a simple thing PayPal neglects to mention. When you create the button, it includes the cmd field set to _s-xclick, for which the docs say:
What they don't say on top of that is that this makes PayPal ignore the amount field entirely and use whatever price you entered when creating the button on their website. One of the advantages of using this over _xclick is that it also hides things like the Merchant ID and email address. But the price of that is too high, so changing it to _xclick and manually adding the business field with the payment email is a fudge that seems to work fine.
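For illustration, here is a rough sketch of how the form could be generated from node.js once you switch to _xclick. The function name buildBuyNowForm, its option names, and the example values are all made up for this post; the hidden field names (cmd, business, item_name, amount, currency_code, return, cancel_return) are the standard PayPal HTML variables. Treat it as a starting point under those assumptions, not as PayPal's recommended approach.

```javascript
// Minimal sketch: build a PayPal Buy Now form that honours a custom amount.
// buildBuyNowForm and its option names are hypothetical, not part of any PayPal SDK.

// Escape text so it is safe inside an HTML attribute value.
function escapeHtml(value) {
    return String(value)
        .replace(/&/g, "&amp;")
        .replace(/"/g, "&quot;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;");
}

function buildBuyNowForm(options) {
    var fields = {
        cmd: "_xclick",                 // not _s-xclick, so the amount below is respected
        business: options.business,     // payment email (visible in the page source)
        item_name: options.itemName,
        amount: Number(options.amount).toFixed(2),
        currency_code: options.currency,
        "return": options.returnUrl,
        cancel_return: options.cancelUrl
    };
    var html = '<form action="https://www.paypal.com/cgi-bin/webscr" method="post">\n';
    Object.keys(fields).forEach(function (name) {
        html += '  <input type="hidden" name="' + name +
                '" value="' + escapeHtml(fields[name]) + '">\n';
    });
    html += '  <input type="submit" value="Pay Now">\n</form>';
    return html;
}

// Placeholder values for illustration only.
var exampleForm = buildBuyNowForm({
    business: "accounts@example.com",
    itemName: "Example Course",
    amount: 49.5,
    currency: "GBP",
    returnUrl: "https://example.com/thanks",
    cancelUrl: "https://example.com/cancelled"
});
console.log(exampleForm);
```

The trade-off is the one described above: the business email ends up in the page source, which is exactly what the hosted _s-xclick button hides.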
Stupid PDF Creation is Stupid
With the above out of the way, next up was making a PDF file. The easiest way to do this, you'd think, would be to use HTML to create the invoice and then run a utility to convert it to PDF. This should be really easy. Surely there are loads of tools that do this?
Well, a quick search of the node.js repository for a module as a starting point narrowed the options down to just one, node-wkhtml. Before I even found out this didn't work properly, I had to install wkhtmltopdf somehow. To say alarm bells ring whenever I have to install something on Linux and it talks about "patched versions" of libraries and compilation instructions that don't quite match the usual ./configure && make install route is an understatement.
And very valid those bells turned out to be. First I tried installing the latest pre-compiled static binary, which showed the commands on --help just fine, but actually using it to create a PDF caused a segmentation fault. So I tried an older, less release-candidatey version. Same thing.
The next step whenever this happens is to compile it from source, but following the instructions on the Google Code page resulted in compilation errors. Scouring the comments for suggestions turned up a few other things to try, which ultimately just gave different errors.
And then I stumbled upon a blog post by DakDad, which lists the very simple steps for installing a precompiled set of Qt libraries and wkhtml executables via a repository. Finally I could at least get the command line working. Now back to node.js to get it going there.
Note: Make sure after installing the wkhtml package you re-edit the amberdms repo file and set all the enabled lines to 0, otherwise next time you run a yum update you may have all sorts of conflicts.
I soon realised, especially after reading the source, that node-wkhtml was never going to work for me. Whether it only works with a version of wkhtmltopdf that I don't have, I don't know. But it doesn't exactly do much anyway, and certainly not enough to warrant being a module.
From looking at the version I had installed, piping the HTML to it via stdin is not possible, so I had to take the route of creating a temporary file and then running it on that. Here is how my code for it goes:
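var fs = require("fs");
var childproc = require("child_process");

// web_basedir is assumed to be defined elsewhere as the root directory of the webapp
var strHtml = "<p>This is my HTML invoice</p>";
var intInvoiceNumber = 1;
var strHtmlFilename = web_basedir + "/invoices/" + intInvoiceNumber + ".html";
var strPDFFilename = web_basedir + "/invoices/" + intInvoiceNumber + ".pdf";

// Write the HTML to a temporary file, as this wkhtmltopdf build can't read it from stdin
fs.writeFile(strHtmlFilename, strHtml, function(error) {
    if (!error)
    {
        var pdf = childproc.spawn("wkhtmltopdf", [ "--forms", strHtmlFilename, strPDFFilename ]);
        // Without an "error" listener, a missing binary would crash the process
        pdf.on("error", function (spawnError) {
            console.log(spawnError);
        });
        pdf.on("exit", function (code, signal) {
            if (code != 0)
            {
                // Throw error
            }
            else
            {
                // Remove the temporary HTML file; newer node.js versions require the callback
                fs.unlink(strHtmlFilename, function (unlinkError) {
                    if (unlinkError) console.log(unlinkError);
                });
                // Do something with the PDF
            }
        });
    }
    else
    {
        console.log(error);
    }
});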
And voila, I get a reasonable PDF which is good enough for our purposes. But what a frustrating route to get there.
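If you want to reuse this in more than one place, the same temporary-file approach can be wrapped in a function with a callback. The following is only a sketch under my own assumptions: htmlToPdf and its parameter names are invented for this post, the command is passed in so you can point it at whichever wkhtmltopdf binary actually works on your box, and I've left out extra flags such as --forms. It also reports the signal when the process is killed, which is how a segmentation fault like the one I hit earlier would show up.

```javascript
var fs = require("fs");
var childproc = require("child_process");

// Hypothetical helper: writes strHtml to strHtmlFilename, converts it to
// strPDFFilename, then calls callback(error, strPDFFilename).
// strCommand is a parameter so a different binary path can be supplied.
function htmlToPdf(strCommand, strHtml, strHtmlFilename, strPDFFilename, callback) {
    var blnDone = false;

    function finish(error) {
        if (blnDone) { return; }    // guard in case both "error" and "exit" fire
        blnDone = true;
        // Always try to remove the temporary HTML file, ignoring cleanup failures
        fs.unlink(strHtmlFilename, function () {
            callback(error, error ? null : strPDFFilename);
        });
    }

    fs.writeFile(strHtmlFilename, strHtml, function (writeError) {
        if (writeError) { return callback(writeError, null); }
        var pdf = childproc.spawn(strCommand, [ strHtmlFilename, strPDFFilename ]);
        pdf.on("error", finish);    // e.g. the binary is not installed
        pdf.on("exit", function (code, signal) {
            if (signal) {
                // A segmentation fault is reported here as signal "SIGSEGV"
                finish(new Error(strCommand + " was killed by " + signal));
            } else if (code !== 0) {
                finish(new Error(strCommand + " exited with code " + code));
            } else {
                finish(null);
            }
        });
    });
}

// Example call (paths are placeholders):
// htmlToPdf("wkhtmltopdf", "<p>Invoice</p>", "/tmp/1.html", "/tmp/1.pdf",
//     function (error, strFile) { if (error) { console.log(error); } });
```

Passing the error to a callback rather than throwing means the caller decides what to do when the conversion fails, such as emailing the accountant a plain-text fallback.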