Many cloud PBX services are critical for customers. Extended downtime is unacceptable, and if a server fails, a replacement should be ready quickly.
The best way to handle fail-over is still to run the PBX as a virtual machine on a bare-metal host. This way the whole PBX can fail over to another host. Today this can happen within milliseconds, and existing calls and registrations can stay up. Controlling where the VM runs ensures that the real-time requirements of the PBX are met. This setup is the gold standard for redundancy, and it does not require any additional features from the PBX.
A slightly less sophisticated but still reasonable way to achieve redundancy, with downtime in the range of a few minutes, is an active/passive setup. In this setup there is a primary server and a backup server. The primary server is usually well equipped and well connected. The secondary server is used only when the primary is not available, and only for a relatively short time. A program needs to take care of activating the secondary server when the primary stops working. What this robot does is essentially the following:
- Monitor whether the primary server is available. This can be done by polling the web server of the PBX, tolerating a couple of failed polls before acting.
- If the primary server becomes unavailable, check whether the rest of the Internet seems to be reachable. This step is necessary because otherwise, if the secondary server itself lost its connectivity, it would mistake that for a fail-over event.
- When a fail-over event is detected, the secondary PBX needs to reload the configuration and start accepting registrations. It also starts registering the SIP trunks.
- It should also trigger an ActionURL that can be used to redirect DNS records, so that users are directed to the secondary PBX.
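The steps above can be sketched as a small monitoring loop. This is a minimal Python sketch under stated assumptions, not the PBX's built-in implementation: `http_ok`, `FailoverMonitor` and `run_forever` are hypothetical names, the primary is assumed to be healthy when its web server answers with a 2xx or 3xx status, and the `activate` callback stands in for whatever reloads the configuration, accepts registrations, registers the trunks and calls the ActionURL.

```python
import http.client
import time
import urllib.request
from typing import Callable, Sequence

Probe = Callable[[], bool]


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Hypothetical probe: True if a GET to url answers with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException, ValueError):
        # URLError/HTTPError and timeouts are OSError subclasses;
        # ValueError covers malformed URLs. All of them count as "down".
        return False


class FailoverMonitor:
    """Hypothetical helper deciding when the secondary PBX should take over."""

    def __init__(self, primary_ok: Probe, reference_probes: Sequence[Probe],
                 activate: Callable[[], object], max_failures: int = 3) -> None:
        self.primary_ok = primary_ok              # e.g. poll the primary's web server
        self.reference_probes = reference_probes  # independent sites to test our own uplink
        self.activate = activate                  # reload config, registrations, trunks, ActionURL
        self.max_failures = max_failures          # consecutive failed polls before acting
        self.failures = 0
        self.active = False

    def poll(self) -> bool:
        """Run one monitoring step and return True once the secondary is active."""
        if self.active:
            # No automatic fail-back: returning to the primary is a manual step.
            return True
        if self.primary_ok():
            self.failures = 0
            return False
        self.failures += 1
        if self.failures < self.max_failures:
            return False  # tolerate a couple of failed polls
        if not any(probe() for probe in self.reference_probes):
            # Nothing else is reachable either, so our own link is the likely problem.
            return False
        self.active = True
        self.activate()
        return True


def run_forever(monitor: FailoverMonitor, interval: float = 10.0) -> None:
    """Poll every `interval` seconds until the secondary has taken over."""
    while not monitor.poll():
        time.sleep(interval)
```

A secondary could, for instance, build the monitor with `lambda: http_ok("https://pbx1.example.com/")` as the primary probe and a few probes for unrelated public sites (both URLs are placeholders). Keeping the probes and the activation step as plain callables lets the decision logic be tested without a network, and the monitor deliberately never fails back on its own.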
The configuration information for the fail-over can be stored in a separate file outside the working directory of the PBX. This makes it possible to keep the content of the working directory exactly the same on both servers, using file system synchronization software such as Dropbox or ActiveSync.
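The following is a minimal sketch of such a separate file, assuming a JSON format. The path `/etc/pbx-failover.json`, the key names and the `FailoverConfig`, `load_failover_config` and `is_outside` helpers are all hypothetical and do not reflect the PBX's actual configuration format. The point it illustrates is that the server's role and the fail-over parameters live outside the synchronized working directory.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import List

# Hypothetical location, deliberately outside the PBX working directory.
DEFAULT_CONFIG = Path("/etc/pbx-failover.json")


@dataclass
class FailoverConfig:
    role: str                  # "primary" or "secondary"
    primary_url: str           # web server of the primary PBX to poll
    reference_urls: List[str]  # independent sites used to check our own uplink
    action_url: str            # ActionURL that redirects the DNS records
    max_failures: int = 3      # failed polls tolerated before failing over


def load_failover_config(path: Path = DEFAULT_CONFIG) -> FailoverConfig:
    """Read the hypothetical JSON fail-over file and validate the role."""
    data = json.loads(path.read_text(encoding="utf-8"))
    cfg = FailoverConfig(
        role=data["role"],
        primary_url=data["primary_url"],
        reference_urls=list(data.get("reference_urls", [])),
        action_url=data.get("action_url", ""),
        max_failures=int(data.get("max_failures", 3)),
    )
    if cfg.role not in ("primary", "secondary"):
        raise ValueError(f"unknown role: {cfg.role!r}")
    return cfg


def is_outside(path: Path, workdir: Path) -> bool:
    """True if path is not inside workdir, so a sync tool will not copy it."""
    p, w = path.resolve(), workdir.resolve()
    return p != w and w not in p.parents
```

Because only this small file differs between the two machines, the synchronized working directory can stay byte-for-byte identical on the primary and the secondary.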
Failing back from the secondary server to the primary should be done manually. The easiest solution is to manually start the service on the primary server.
This feature will be available in version 5.4.1. Anyone interested in trying it out is welcome to do so; we will provide beta images for testing.