By Steve Von Takach
At ACA we have been applying web technologies to a market that has previously been made up of hardware controllers. While you do eventually hit hardware with our system, we are primarily a software stack, and more often than not our software runs on virtual machines or in Docker containers on a shared host. This approach decidedly moves AV into the IT environment, bringing with it all the advantages of a fully managed system.
In that IT environment our approach to disaster recovery has relied on hot backup systems. The database is replicated in real time and a second server is on standby; however, this server is not doing anything. It is similar to having a spare hardware controller lying around, fully patched, up to date and ready to go, in case one fails.
Today we are announcing the imminent release of our high availability building management system that does away with the ‘spare’ completely.
How does it work?
There are three possible configurations:
1. Solo System
This is our traditional system. A single service with an optional database backup and/or a hot spare in a disaster recovery site.
2. Load Balanced
Two master systems, each of which will take over the work of the other if it goes down. Each has a full copy of the database, and the two can be located in separate physical locations as well as on separate networks.
3. Master Slave
A single master system (probably running in your datacenter) coordinates multiple slave systems that can run on separate, isolated networks. By default the slaves perform the work; if a slave goes down, the master takes over its work.
This provides the most resilience to failure, as slaves can be placed on site in the physical buildings. If the wide area network goes down, you’ll still have full local control, with no downtime from a control perspective.
All configuration and management is performed on a master system so you can still manage and monitor your spaces from a single location, including live debugging and system introspection.
Failover windows and recovery times can also be configured to reduce downtime when a hardware failure does occur. By default, if a slave system goes down there is a 20-second failover window before the master takes over, which protects against short-lived network partitions. When the slave comes back up, or is replaced, control is immediately passed back to it. This is the recovery process.
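As a rough illustration of the default behaviour described above, here is a minimal Python sketch. It is not ACA's implementation: the `FailoverMonitor` class, its method names and the use of heartbeats to detect a downed slave are all assumptions made for the example. Only the 20-second default and the immediate hand-back come from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailoverMonitor:
    """Hypothetical tracker for which node controls one slave's workload."""

    failover_window: float = 20.0      # seconds of silence tolerated (default from the text)
    last_seen: Optional[float] = None  # time of the most recent slave heartbeat

    def heartbeat(self, now: float) -> None:
        # Assumption: the slave reports in periodically; record when we last heard from it.
        self.last_seen = now

    def controller(self, now: float) -> str:
        # The slave keeps control while it has been heard from within the window.
        # A short network partition (under 20 seconds) therefore causes no handover.
        if self.last_seen is not None and now - self.last_seen <= self.failover_window:
            return "slave"
        # Otherwise the master takes over the slave's work.
        return "master"


monitor = FailoverMonitor()
monitor.heartbeat(now=0.0)
print(monitor.controller(now=10.0))  # still inside the window
print(monitor.controller(now=21.0))  # window exceeded
monitor.heartbeat(now=30.0)          # slave is back, or has been replaced
print(monitor.controller(now=30.0))  # control returns immediately
```

Because control is decided purely from the time of the last heartbeat, recovery needs no separate step in this sketch: the first heartbeat after an outage hands control straight back to the slave.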
Where hardware is failing intermittently, for example every 15 minutes and causing repeated outages, it is possible to configure recovery windows. If multiple failures are occurring, a slave can be configured not to regain control until a particular day and/or time, so the outage can be investigated without further disruption. The one exception is an outage of the master server, in which case the slave takes back control immediately.
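The recovery-window rule can be summarised as a small decision function. The Python sketch below is illustrative only: `slave_may_resume` and its parameters are hypothetical names, not part of any ACA API, and the real system may express the schedule differently.

```python
from datetime import datetime
from typing import Optional


def slave_may_resume(now: datetime,
                     resume_after: Optional[datetime],
                     master_up: bool) -> bool:
    """Decide whether a recovered slave should take control back.

    resume_after is the hypothetical configured day/time before which the
    slave must not regain control; None means no recovery window is set.
    """
    if not master_up:
        # Master outage overrides the window: the slave takes control immediately.
        return True
    if resume_after is None:
        # No window configured: default behaviour, control passes straight back.
        return True
    # Window configured and master healthy: wait until the scheduled time.
    return now >= resume_after


window = datetime(2024, 1, 8, 9, 0)  # example: hold off until Monday 09:00
print(slave_may_resume(datetime(2024, 1, 6, 14, 0), window, master_up=True))   # False
print(slave_may_resume(datetime(2024, 1, 6, 14, 0), window, master_up=False))  # True
print(slave_may_resume(datetime(2024, 1, 8, 9, 30), window, master_up=True))   # True
```

Checking the master's health first keeps the exception explicit: a recovery window never leaves a space without a controller.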
All of this results in an unprecedented level of deployment flexibility and system management capability. With this, ACA has created the quintessential high availability smart building system.