Highbase currently consists of several shell scripts and some C code. It's actually a good project (talk about self promotion) that hasn't reached a stable release yet for just two reasons:
- It hasn't been tested enough in production environments
- I've been amazingly busy over the last few years. Lots of work, and lots of parenting in the last two and a half years in particular.
Still, it's real close to a release candidate, and I will continue testing and fixing the main branch. However, I believe writing a new version in Erlang has many advantages:
- Erlang was designed from the ground up with reliability in mind, among other things. (It's quite popular now because of its multi-core friendliness, but that's not really useful for a project like this.) So it's only natural to use it to develop a high availability solution (you know, the "best tool for the job" mentality).
- Except for the MySQL service verification (I wouldn't want to go with ODBC), and perhaps gratuitous ARP (I'm not sure at this point), rewriting in Erlang would remove all other dependencies on third-party packages. In general, it would make the solution more maintainable and hence more reliable (see the first point).
I already wrote a first draft in Erlang about a year ago, but this was when I was just learning the language, so I didn't take advantage of its best features for reliability. Actually, the problem was that I wrote my version using 'pure' Erlang, dismissing the OTP part. Erlang as a language is OK (it has everything you need to write reliable, concurrent software). Erlang/OTP is a powerful thing, in the sense that it shields you, as a programmer, from the low-level chores of distributed, concurrent and fault-tolerant programming. You just have to focus on your program logic, and by following some conventions (Behaviours, which for OO people could loosely be related to Interfaces and Base Classes) you get a lot of extra functionality for free.
This new rewrite will be done using the gen_server behaviour, together with events. All of this comes for free with OTP.
Here's a state diagram with an initial draft of the slave routine (done with Visual Paradigm):
Without reading a line of code, here you can get a grasp of the power of OTP. The slave routine is, in its normal state, just waiting for an event. So far, I've identified the following events:
- service down
- link down (Erlang link down, which would probably be due to the master node being down)
- shutdown request
In the first case, just as today, we first attempt to restart the service. If the restart fails, we first do a failover (i.e., a shutdown of the master node; this is the same algorithm executed on our current code branch) and then go for a takeover.
In the second case, if the node is down, it's just a takeover. If the node is up, we do a takeover with ARP spoofing, because we assume there's something weird with the other node. This part of the algorithm can be improved (we could do another verification, go back to the waiting for event state for N times, etc., this is just a draft).
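To make the two cases above a bit more concrete, here is a minimal sketch of the decision table as a pure Erlang function. This is not Highbase code: the module name, the event atoms and the action atoms are all hypothetical names chosen for illustration, and the handling of a shutdown request is my assumption, since the draft doesn't define it yet.

```erlang
-module(slave_plan).
-export([plan/2]).

%% plan(Event, Observation) -> [Action]
%% Returns the ordered list of actions the slave would take.
%% All atoms here are hypothetical names, not part of Highbase.

%% Service down: always try a restart first.
plan(service_down, restart_ok) ->
    [restart_service];
%% If the restart fails: failover (shut down the master), then takeover.
plan(service_down, restart_failed) ->
    [restart_service, failover, takeover];
%% Link down and the master node is really down: plain takeover.
plan(link_down, master_down) ->
    [takeover];
%% Link down but the master node is up: something is weird over there.
plan(link_down, master_up) ->
    [takeover_with_arp_spoofing];
%% Assumption: a shutdown request simply stops the slave routine.
plan(shutdown_request, _Observation) ->
    [stop].
```

For example, slave_plan:plan(link_down, master_up) yields the single action takeover_with_arp_spoofing. Keeping the decision separate from the side effects would also make the algorithm easy to test without touching a real MySQL service.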
Still, one of the improvements over the current version is that I don't have to handle any loops. OTP handles that for me; all I have to do is write the callback functions (waiting_for_event, restarting_service, etc.). Less code = less chance of errors.
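As a rough illustration of the "no loops" point, here is a minimal gen_server sketch. It is only a skeleton under my own assumptions: the module name, the notify/1 helper, the restart_ok event and the taking_over state are hypothetical, and the state names from the diagram are modelled here as plain atoms held in the server state. The {nodedown, Node} message is what a process receives after calling erlang:monitor_node(Node, true); the sketch does not set up that monitor itself.

```erlang
-module(slave_sketch).
-behaviour(gen_server).

-export([start_link/0, notify/1]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Hypothetical helper: deliver an event to the slave process.
notify(Event) ->
    gen_server:cast(?MODULE, {event, Event}).

%% The server state is just the name of the current state.
init([]) ->
    {ok, waiting_for_event}.

handle_call(current_state, _From, State) ->
    {reply, State, State};
handle_call(_Request, _From, State) ->
    {reply, {error, unknown_request}, State}.

%% No receive loop anywhere: OTP calls us once per event.
handle_cast({event, service_down}, waiting_for_event) ->
    {noreply, restarting_service};
handle_cast({event, restart_ok}, restarting_service) ->
    {noreply, waiting_for_event};
handle_cast({event, shutdown_request}, State) ->
    {stop, normal, State};
handle_cast(_Other, State) ->
    {noreply, State}.

%% Arrives if the process monitors the master node (not done here).
handle_info({nodedown, _Node}, waiting_for_event) ->
    {noreply, taking_over};
handle_info(_Msg, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

After slave_sketch:start_link(), calling slave_sketch:notify(service_down) moves the process from waiting_for_event to restarting_service, which can be checked with gen_server:call(slave_sketch, current_state). The real restart, failover and takeover work would go inside these callbacks.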
I'm currently reviewing the design of the new algorithm. I hope to have a draft in two weeks (this is a slow week for me: my daughter is starting preschool next week, and hence my house is kinda crazy).
I'm also overdue with regular Highbase releases so I'll try to get up to date with both trees.
The current Erlang tree is mysql-ha-erl (legacy name, from before the trademark issues). I'll surely be creating a new tree, highbase-erl, during the next few days. Official release news will be broadcast here from the official site.