Scripting in Node.js AKA How to Watch for Olympic Tickets Using a Script
Tuesday, 7th August 2012, 23:37
So, in the first Olympic Tickets ballot I think I applied for maybe £500 worth of tickets, expecting to get a pair if I was lucky. And I was unlucky, I got nothing. Then maybe a few months before the games they released more tickets, and I snapped up a pair for the mixed doubles quarter finals in badminton.
I was happy with that, at least I got to go to something. Although the irony of being in the East End of London a few stops from where the Olympic Park is, and having to traipse across town to the West side to visit Wembley, wasn't lost on me. And all of Team GB had been knocked out by then so no home talent to cheer for, but the two supporters from Thailand more than made up for that.
Then literally the evening after the badminton trip, at 11pm, my brother managed to get some tickets for the women's hockey the next day, in the park. So that was great, a chance to see all of that and enjoy some hockey. Still no Team GB, but it was nice to get to the main park itself.
So now we are hoping to get tickets for the Athletics this Thursday, and I came rather close today after refreshing the website a lot, but no bananas. And since refreshing webpages is the order of the day, I thought... this is a job for a Node script.
Why should I refresh the page every time when I can get my computer to do it for me?
The Script
Rather than post the entire thing in one go, I'll post it broken up into pieces so you can see how the general principle works.
var http = require("http");
var colour = require("coloured");
colour.extendString();
Just two requires for this script, the built in http module and the coloured one I mentioned already in my Essential Node.js Modules list from the other day.
Now we set up some global variables, normally an evil thing, but for small scripts this sort of thing is fine and saves a lot of messing about.
var intInterval = 10000;
var reUnavailableMatch = /Tickets to this session are currently unavailable/i;
var strAppVersion = "0.1";
var strAppName = "Olympic Tickets Availability Alerter";
The first is the rough interval at which we will check the webpage, in milliseconds. I think every 10 seconds is plenty. We also have the name of the app and version, just because I find putting these in strings at the start a historically sensible approach.
We also create a regular expression which matches something on the page that only appears if there are no tickets available. We only use this in one place, but we do use it more than once, and it's good practice not to create a new regexp every single time we use it if it's the same.
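To make the check concrete, here is a small sketch of how that regular expression behaves against two page fragments. The helper name ticketsMightBeAvailable and both sample fragments are made up for illustration; they are not taken from the real ticketing site.

```javascript
// Compile the pattern once and reuse it for every check.
var reUnavailableMatch = /Tickets to this session are currently unavailable/i;

// Hypothetical helper: true when the "unavailable" notice is NOT on the page.
function ticketsMightBeAvailable(pageData) {
    return !reUnavailableMatch.test(pageData);
}

// Made-up page fragments, purely for illustration.
var soldOutPage = "<p>Tickets to this session are currently unavailable.</p>";
var otherPage = "<p>Select the number of tickets you require.</p>";

console.log(ticketsMightBeAvailable(soldOutPage)); // false
console.log(ticketsMightBeAvailable(otherPage));   // true
```

Because the pattern has no g flag, calling test() repeatedly on the same regexp object carries no state between calls, which is what makes reusing one instance safe here.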
var strUrl = process.argv[2];
var strName = process.argv[3];
var intErrors = 0;
var intGets = 0;
Our script will accept two parameters, the URL it will be pulling, and an optional friendly name. The reason for the latter is if you have several scripts running at the same time and want to know quickly which alert is being triggered.
We also will keep track of how many times we pull the webpage, and how many errors we get. For no other reason than it gives us something useful we can spam to the screen every five minutes just to reassure us the thing hasn't crashed.
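If you are wondering why the indexes start at 2: in Node, process.argv[0] is the path to the node executable and process.argv[1] is the path to the script, so your own arguments come after those. A rough sketch of the same lookup, using a hypothetical parseArgs helper and an example.com URL that are purely illustrative:

```javascript
// Hypothetical helper: pick the URL and optional name out of an argv-style array.
// argv[0] is the node executable and argv[1] is the script path,
// so the user's own arguments start at index 2.
function parseArgs(argv) {
    return {
        url: argv[2],
        name: argv[3]
    };
}

var withName = parseArgs(["node", "olympic_alerter.js", "http://example.com/event", "Athletics"]);
console.log(withName.url);  // http://example.com/event
console.log(withName.name); // Athletics

// The friendly name is optional, so it is simply undefined when left out.
var withoutName = parseArgs(["node", "olympic_alerter.js", "http://example.com/event"]);
console.log(withoutName.name); // undefined
```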
function urlScan()
{
    intGets++;
    http.get(strUrl, function(result) {
        var pageData = "";
        result.on("data", function (chunk) {
            pageData += chunk;
        });
        result.on("end", function() {
            if (reUnavailableMatch.exec(pageData) == null)
                ringAling();
            else
                setTimeout(urlScan, intInterval);
        });
    }).on("error", function(error) {
        console.log("Webpage error: ".green().bold() + error.message.bold());
        intErrors++;
        setTimeout(urlScan, intInterval);
    });
}
urlScan() is our main worker function. It increases our get counter, and then attempts to pull the webpage. On an error it logs that to the console, increases our error counter and then sets a timeout to pull the page again.
On success it adds two listener functions to the EventEmitter for the response: "data" builds a string containing the returned webpage, whilst "end" is called once it is complete. When we have the whole page, we check it against our regex to see if our tickets unavailable string exists, and if so we set a timeout to pull the page again.
If we don't get a match, we can assume that this page has a good chance of containing tickets, so we call our alerter function.
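One caveat with that assumption: anything lacking the "unavailable" text counts as a hit, and that would include an error page or a redirect. A minimal sketch of a slightly stricter check, assuming you also pass in the response's status code (result.statusCode on the object http.get hands you). The decide helper and its "alert" / "retry" return values are hypothetical, not part of the original script.

```javascript
var reUnavailableMatch = /Tickets to this session are currently unavailable/i;

// Hypothetical helper: decide what to do once a whole response has arrived.
// Returns "alert" when tickets may be available, otherwise "retry".
function decide(statusCode, pageData) {
    // Anything other than a plain 200 is not the real event page,
    // so treat it as "try again later" rather than sounding the alarm.
    if (statusCode !== 200)
        return "retry";
    // The sold-out notice is present, so keep polling.
    if (reUnavailableMatch.test(pageData))
        return "retry";
    return "alert";
}

console.log(decide(200, "Tickets to this session are currently unavailable")); // retry
console.log(decide(200, "Select the number of tickets you require"));          // alert
console.log(decide(503, "Service temporarily down"));                          // retry
```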
setTimeout Good setInterval Bad
Just a quick note on these two functions. As a general rule, you should never ever use setInterval(), unless you have a completely most excellent reason for doing so. This is because doing so means you have to absolutely guarantee that the work kicked off by the previous tick has finished before the next one fires, or you'll start to get an ever increasing backlog which will grind everything to a halt.
A much safer thing to do is use setTimeout(), and then call it again just before your function exits. This pretty much guarantees you'll never get a backlog, and unless you are relying on something to happen exactly every so many milliseconds, is perfectly adequate. And actually, if you really do require precision, setInterval() can't give you it either, because JavaScript is a single-threaded environment, so there is no guarantee that your function can be called exactly on time anyway.
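As a rough sketch of that pattern, here is a tiny self-rescheduling loop. The pollForever name, the done callback and the swappable schedule parameter are all my own assumptions for illustration; the schedule parameter only exists so the demo can run without waiting on real timers.

```javascript
// Hypothetical helper: run `task`, and only once it reports completion
// schedule the next run. `schedule` defaults to setTimeout.
function pollForever(task, delayMs, schedule) {
    schedule = schedule || setTimeout;
    function runOnce() {
        task(function done() {
            // The next run is queued only after this one has finished,
            // so runs can never pile up behind each other.
            schedule(runOnce, delayMs);
        });
    }
    runOnce();
}

// Demo with a fake scheduler that records callbacks instead of waiting.
var pending = [];
var runs = 0;
function fakeSchedule(fn, ms) {
    pending.push({ fn: fn, ms: ms });
}

pollForever(function (done) { runs++; done(); }, 10000, fakeSchedule);
console.log(runs, pending.length); // 1 1

// Pretend the timeout fired: the task runs again and queues one more follow-up.
pending.shift().fn();
console.log(runs, pending.length); // 2 1
```

However long the task takes, there is only ever one follow-up waiting at a time, which is the whole point of preferring setTimeout() here.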
Back to the Scripting...
Now our alarm function.
function ringAling()
{
    if (strName)
        console.log("\u0007Tickets available for ".cyan().bold() + strName.yellow().bold());
    else
        console.log("\u0007Tickets available!".cyan().bold());
    setTimeout(ringAling, 500);
}
Nothing special here, apart from the strange \u0007 thing. What is that, I hear you ask? Well, it is the ASCII bell character (code 7), which will cause your console (even if you are logged in via an SSH terminal) to beep.
We will beep twice a second, until you quit the script with CTRL-C, and spam text up the screen just in case you can't hear the beeping over Def Leppard or something.
function showStats()
{
    var strMonitorLine = "Monitoring ".red().bold();
    if (strName)
        strMonitorLine += " " + strName;
    if (intGets)
        strMonitorLine += " Checked " + intGets + " times (" + intErrors + " errors)";
    console.log(strMonitorLine);
    setTimeout(showStats, 1000 * 60 * 5);
}
Just so we don't panic that our script has stalled, every 5 minutes we'll spam a line with some info on it. That's our showStats() function in a nutshell.
And finally, the last bit that kicks it all off:
console.log("\n" + strAppName.cyan().bold() + " " + strAppVersion.yellow().bold() + "\n\n");
if (strUrl)
{
    showStats();
    urlScan();
}
else
{
    console.log("Usage: ".yellow().bold() + "olert http://www.tickets.london2012.com/eventdetails?id=EVENTID [\"Event Name\"]\n".bold());
}
Here we spam the name of the app and the version to the console, then check to see if a URL was passed. If it was, we kick off the timeouts to show the stats and pull the webpage; if it wasn't, we show a help line, after which the script will gracefully exit because the event queue is empty.
With that in mind, the usage is something like this:
node olympic_alerter.js http://www.tickets.london2012.com/eventdetails?id=0000455AC9A2095C "Thursday Evening Athletics"
Bear in mind the URL you need to pass is the one that you can only click through to when you go to the Search Events page, find the correct event (if it is available) and click the See Tickets button. Something which is alas not there if there are none to see. :/ You also want an account, and make sure you stay logged in by regularly visiting the My Account page too.
I never said it was going to be easy. :(
Other Applications
I've actually used a similar sort of system to this for the DVD Reviewer Bargain Watcher. It pretty much goes through a database list and uses it to pull webpages, then runs regexps against them to retrieve the current prices.
Can be a pain when the sites it supports change their layouts, as I have to fix the regexps, but that seems to happen rarely these days.
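For a flavour of what that looks like, here is a minimal sketch of pulling a price out of a page with a regexp capture. The pattern, the extractPricePence helper and the sample markup are all invented for illustration and are not the actual Bargain Watcher code.

```javascript
// Hypothetical pattern: a pound sign followed by pounds and two digits of pence.
var rePrice = /£(\d+)\.(\d{2})/;

// Returns the first price found, in pence, or null when there is none.
function extractPricePence(pageData) {
    var match = rePrice.exec(pageData);
    if (match === null)
        return null;
    return parseInt(match[1], 10) * 100 + parseInt(match[2], 10);
}

// Made-up markup; a real site's layout will differ (and change).
var samplePage = '<span class="price">£12.99</span>';
console.log(extractPricePence(samplePage));            // 1299
console.log(extractPricePence("<p>Out of stock</p>")); // null
```

Returning null when nothing matches is handy, since a sudden run of nulls is usually the first sign that a site has changed its layout.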