High availability (active/standby setup)
CTFreak does not include built-in clustering. Being a single Go binary with no JVM or other heavy runtime to warm up, an idle CTFreak instance starts in a few seconds once its config folder and database are reachable, so an active/standby setup covers most high-availability needs without requiring in-process clustering. This page describes how to build such a setup using two servers, so that the service can be switched over to a secondary instance if the primary one becomes unavailable.
Prerequisites
- Two servers, each running its own CTFreak instance.
- A PostgreSQL backend database on each server (see Using PostgreSQL as backend database instead of SQLite). This is required so that data can be replicated between the two servers.
- One of the following licensing setups:
  - A SOVEREIGN Edition license on each server. Offline licenses are tied to a single physical server, so each instance needs its own.
  - A single BUSINESS Edition license, shared between the two instances. This works because only one instance is ever active at a time, and the license key is stored in the PostgreSQL database, which is replicated to the standby server via streaming replication.
Architecture
| | Server 1 (primary) | Server 2 (standby) |
|---|---|---|
| CTFreak instance | active | stopped |
| PostgreSQL database | active, read/write | streaming replica, read-only |
| CTFreak config folder | active | file-level replica |
Two independent replication channels run continuously from Server 1 to Server 2:
- The PostgreSQL database is replicated using native streaming replication.
- The CTFreak config folder is replicated at the file level (using a tool such as rsync, lsyncd, DRBD, or your storage layer’s own replication).
NB: the CTFreak config folder holds the configuration file and the execution log files; the PostgreSQL database stores everything else (tasks, nodes, users, notifiers…), including the metadata of each execution. Both need to be replicated for a complete failover.
Setting up PostgreSQL streaming replication and file-level replication is standard PostgreSQL/OS administration and is not covered here; refer to your PostgreSQL version’s documentation and to the replication tool of your choice.
Monitoring
Two checks are needed to detect a failure on Server 1 and run the failover procedure below:
- `GET {CTFreak instance URL}/isalive` returns HTTP 200 with body `true` when the CTFreak instance is up.
- A simple connectivity check against the PostgreSQL database confirms it is reachable.
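The two checks could be scripted along the following lines. This is a minimal sketch using only the Python standard library, not an official CTFreak tool: the function names, the timeouts, and the example host names in the trailing comment are assumptions made for illustration, and the PostgreSQL check only verifies that the TCP port accepts connections (it does not authenticate or run a query).

```python
import http.client
import socket
import urllib.error
import urllib.request


def is_alive_response(status: int, body: str) -> bool:
    """Interpret an /isalive answer: HTTP 200 with body 'true' means up."""
    return status == 200 and body.strip() == "true"


def ctfreak_is_alive(base_url: str, timeout: float = 5.0) -> bool:
    """Return True only if GET {base_url}/isalive answers 200 with body 'true'."""
    url = base_url.rstrip("/") + "/isalive"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return is_alive_response(resp.status, body)
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Refused connection, timeout, DNS failure, non-2xx status, bad URL:
        # all of these are treated as "not alive".
        return False


def postgres_is_reachable(host: str, port: int = 5432, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the PostgreSQL port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call from a monitoring job (hypothetical host names):
#   ctfreak_is_alive("https://ctfreak.example.internal")
#   postgres_is_reachable("server1.example.internal", 5432)
```

A monitoring system would typically require several consecutive failures before raising an alert, so that a single slow response does not trigger a failover.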
Failover procedure
Case 1: Server 1 or its PostgreSQL database becomes unavailable
- Stop everything on the Server 1 side (the CTFreak instance, if still partially running) and stop both replication processes.
- Promote the PostgreSQL database on Server 2 to read/write (a streaming replica stays read-only until promoted).
- Start the CTFreak instance on Server 2, which becomes the active instance.
- Point clients to Server 2: update the reverse proxy backend, DNS record, or floating IP, depending on how the service is exposed (see Configuring SSL Reverse Proxy).
NB: because the database and the config folder replicate through two independent channels, they can be very slightly out of sync at the moment of failover. In practice, this means the execution logs for the last few executions before the failure may be missing. This is expected and is not a sign of a malfunction.
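The Case 1 steps can be written down as an ordered runbook so that nothing is skipped under pressure. The sketch below only builds and prints that plan; it does not execute anything. The systemd unit names `ctfreak` and `lsyncd`, the data directory path, and the `Step`/`failover_plan` names are assumptions for illustration and must be adapted to your installation. `pg_ctl promote -D <data directory>` is the standard PostgreSQL command for promoting a standby; promoting also ends streaming replication on the standby side.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    server: str       # where the command is meant to run
    description: str  # what the step achieves
    command: str      # shell command for the operator or automation


def failover_plan(pg_data_dir: str,
                  ctfreak_unit: str = "ctfreak",    # assumed unit name
                  file_sync_unit: str = "lsyncd",   # assumed unit name
                  ) -> list[Step]:
    """Return the Case 1 steps in the order they should be carried out."""
    data_dir = shlex.quote(pg_data_dir)
    return [
        Step("Server 1", "Stop the CTFreak instance if still partially running",
             f"systemctl stop {shlex.quote(ctfreak_unit)}"),
        Step("Server 1", "Stop file-level replication of the config folder",
             f"systemctl stop {shlex.quote(file_sync_unit)}"),
        Step("Server 2", "Promote the streaming replica to read/write",
             f"pg_ctl promote -D {data_dir}"),
        Step("Server 2", "Start the CTFreak instance",
             f"systemctl start {shlex.quote(ctfreak_unit)}"),
        Step("Operator", "Point clients to Server 2",
             "# site-specific: reverse proxy backend, DNS record, or floating IP"),
    ]


def format_plan(plan: list[Step]) -> str:
    """Render the plan as numbered lines suitable for a runbook or log."""
    return "\n".join(
        f"{i}. [{s.server}] {s.description}\n   {s.command}"
        for i, s in enumerate(plan, start=1)
    )


# Example (hypothetical data directory):
#   print(format_plan(failover_plan("/var/lib/postgresql/16/main")))
```

If Server 1 is completely unreachable, the two Server 1 steps cannot be run; in that case make sure Server 1 cannot come back online on its own and resume serving before you proceed on Server 2.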
Case 2: the CTFreak instance on Server 1 fails and won’t restart cleanly, while Server 1 and its PostgreSQL database are healthy
Do not fail over to Server 2 in this case. The standby has faithfully replicated the exact same state, so it would likely run into the same problem: replication is not a backup.
Before restoring anything, send the CTFreak instance log file from Server 1 to our support so the cause of the failure can be diagnosed.
Then restore both the CTFreak config folder and the PostgreSQL database from your latest known-good backup, rolling both back to the same point in time before the failure (see Backup and Restore). A cold backup covers both in a single operation; if you back them up separately (for example a scheduled pg_dump/pg_basebackup for the database), make sure to restore them to the same point in time to avoid inconsistencies.
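The choice between the two cases can be summarized as a small decision function. This is an illustrative sketch, not part of CTFreak: the `Action` values and the `decide` signature are made up for this example, and the intermediate "try a restart first" outcome is one reading of "fails and won’t restart cleanly". The inputs are the results of your own monitoring checks.

```python
from enum import Enum


class Action(Enum):
    HEALTHY = "nothing to do"
    FAILOVER = "Case 1: fail over to Server 2"
    RESTART = "try restarting the CTFreak instance on Server 1"
    RESTORE = "Case 2: do not fail over; send logs to support, then restore from backup"


def decide(server1_reachable: bool,
           postgres_reachable: bool,
           ctfreak_alive: bool,
           restart_failed: bool = False) -> Action:
    """Map monitoring results for Server 1 to the recommended action."""
    # Server 1 or its database is gone: the standby is the way out.
    if not server1_reachable or not postgres_reachable:
        return Action.FAILOVER
    # Infrastructure is healthy but the application is down.
    if not ctfreak_alive:
        # The standby holds the same replicated state, so failing over
        # would likely reproduce the problem; restore from backup instead.
        return Action.RESTORE if restart_failed else Action.RESTART
    return Action.HEALTHY
```

For example, `decide(True, True, False, restart_failed=True)` returns `Action.RESTORE`, while `decide(True, False, True)` returns `Action.FAILOVER`.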
Important
- Never start both CTFreak instances against the same PostgreSQL database at the same time. Doing so risks corrupting the database content. Only one instance should ever be active.
- Replication protects you against a server or infrastructure failure, not against application-level corruption or bad data. A hot backup or cold backup is your safety net for that (see Backup and Restore).
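One way to enforce the "only one active instance" rule in automation is a pre-start guard on the standby. The sketch below is an assumption-laden illustration, not a CTFreak feature: the function names are hypothetical, the peer liveness flag is expected to come from your own `/isalive` check against Server 1, and the recovery flag is expected to come from PostgreSQL's `pg_is_in_recovery()` function, for example via `psql -At -c "SELECT pg_is_in_recovery()"`, which prints `t` or `f`.

```python
def parse_pg_bool(output: str) -> bool:
    """Convert psql's unaligned, tuples-only boolean output ('t' or 'f')."""
    value = output.strip()
    if value == "t":
        return True
    if value == "f":
        return False
    raise ValueError(f"unexpected psql boolean output: {output!r}")


def can_start_standby(peer_alive: bool, local_db_in_recovery: bool) -> tuple[bool, str]:
    """Decide whether it is safe to start the CTFreak instance on Server 2."""
    if peer_alive:
        # Server 1's instance still answers: starting a second one would
        # break the one-active-instance rule.
        return False, "CTFreak on Server 1 is still alive"
    if local_db_in_recovery:
        # A streaming replica stays read-only until it has been promoted.
        return False, "PostgreSQL on Server 2 has not been promoted yet"
    return True, "safe to start CTFreak on Server 2"
```

A start script on Server 2 could call `can_start_standby(...)` and abort with the returned message whenever the first element is `False`.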