Operator guide

Advisory locking

How Nosdesk uses Postgres advisory locks so periodic background jobs run once across multiple instances, what that means for scaling out, and how to watch the locks.

Most installs run a single backend container, and none of this matters to them. But Nosdesk is built to run as several instances against one Postgres. Once it does, something has to stop jobs that should run only once from running on every instance at the same time. That something is Postgres advisory locks. There’s no separate coordination service, and nothing for you to configure.

The scheduler runs in every instance

Each backend process has its own periodic scheduler: a small timer loop that fires a job every N minutes. It has no cron expressions, no persistence, and no leader election. With one container, it just ticks. Run three instances and the scheduler ticks in all three, so an unguarded job runs three times per tick.
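To make the per-instance arithmetic concrete, here is a minimal Python sketch of such a timer loop using asyncio. It is not Nosdesk’s scheduler: the `every` helper, its `ticks` cap (added only so the demo terminates), and the sleep-then-fire order are assumptions. Several copies started side by side stand in for several instances.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def every(interval_s: float, job: Callable[[], Awaitable[None]],
                ticks: Optional[int] = None) -> None:
    """Hypothetical per-instance scheduler: fire `job` every `interval_s` seconds.

    No cron, no persistence, no leader election: each process runs its own loop.
    `ticks` caps the number of runs so the demo terminates; None means forever.
    """
    done = 0
    while ticks is None or done < ticks:
        await asyncio.sleep(interval_s)
        await job()
        done += 1


async def simulate(instances: int, ticks: int) -> int:
    """Start one loop per simulated instance and count how often the job ran."""
    runs = 0

    async def job() -> None:
        nonlocal runs
        runs += 1

    await asyncio.gather(*(every(0, job, ticks) for _ in range(instances)))
    return runs


if __name__ == "__main__":
    print(asyncio.run(simulate(instances=3, ticks=2)))  # 6: each tick ran on all three
```

Without a guard, the number of runs scales with the instance count.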

Two strategies keep that safe.

Idempotent jobs just run everywhere

Most periodic jobs are idempotent: a second run finds nothing left to do, so they’re left to run on every instance:

  • Pruning expired sessions and refresh tokens (a DELETE of rows that are already gone the second time around).
  • Provisioning future sync_actions and audit_log partitions (CREATE TABLE IF NOT EXISTS).
  • Retention sweeps for CSP reports, the audit log, and the sync outbox.

Running these on three instances costs three cheap no-ops. Not worth a lock.
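A minimal sketch of why a repeated prune or provisioning step costs nothing, using Python’s built-in sqlite3 as a stand-in for Postgres so it runs anywhere. The sessions and audit_log_next tables, the expires_at column, and the timestamps are hypothetical, and SQLite has no table partitions, so the provisioning step is only a plain CREATE TABLE IF NOT EXISTS. Three sequential calls model three instances hitting the same tick.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory stand-in for Postgres with a hypothetical sessions table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sessions (id INTEGER PRIMARY KEY, expires_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO sessions (expires_at) VALUES (?)", [(100,), (200,), (900,)]
    )
    conn.commit()
    return conn


def prune_expired(conn: sqlite3.Connection, now: int) -> int:
    """Delete expired sessions and return how many rows this run removed."""
    cur = conn.execute("DELETE FROM sessions WHERE expires_at < ?", (now,))
    conn.commit()
    return cur.rowcount


def provision_partition(conn: sqlite3.Connection) -> None:
    """IF NOT EXISTS makes repeat runs no-ops rather than errors (hypothetical table name)."""
    conn.execute("CREATE TABLE IF NOT EXISTS audit_log_next (id INTEGER PRIMARY KEY)")


if __name__ == "__main__":
    conn = make_db()
    # Three instances firing the same tick against one database.
    print([prune_expired(conn, now=500) for _ in range(3)])  # [2, 0, 0]
```

The first run does the work; the other two touch zero rows.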

Single-runner jobs take an advisory lock

A few jobs are wasteful or outright wrong to run at the same time, so they call pg_try_advisory_lock(key) at the top of the tick. It’s non-blocking: one instance gets true and runs the job, the others get false and skip until the next tick. Two jobs use this today:

  • Avatar thumbnail backfill re-encodes and re-uploads every avatar. Run it on every instance and you’ve tripled the upload traffic for an identical result.
  • Microsoft Graph delta sync keeps a per-entity delta token in sync_history. If two instances run it together they race on that token: last writer wins, and the other instance’s progress is silently dropped.

The lock is session-scoped: it’s held on a connection reserved for the duration of the tick and released when the job finishes, even if the job panics. If an instance crashes mid-job, its Postgres session ends and the lock releases on its own, so the next tick on any instance picks the job back up. There’s no stuck lock to clear by hand.
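A sketch of the guarded tick in Python, written against any DB-API connection that uses the %s paramstyle (psycopg is one). pg_try_advisory_lock and pg_advisory_unlock are real Postgres functions; the run_single_runner helper and the way the key and job are passed in are assumptions for illustration, not Nosdesk’s code.

```python
from typing import Any, Callable


def run_single_runner(conn: Any, key: int, job: Callable[[], None]) -> bool:
    """Run `job` only if this session wins the advisory lock for `key`.

    `conn` is a DB-API connection with the %s paramstyle, dedicated to this
    tick so the session-scoped lock lives on it. Returns True if the job ran,
    False if another instance held the lock and this tick was skipped.
    """
    cur = conn.cursor()
    # Non-blocking: Postgres answers true or false immediately.
    cur.execute("SELECT pg_try_advisory_lock(%s)", (key,))
    (acquired,) = cur.fetchone()
    if not acquired:
        return False  # another instance has it; try again next tick
    try:
        job()
        return True
    finally:
        # Runs even if job() raises. If the process dies outright, the session
        # ends and Postgres releases the lock without this call.
        cur.execute("SELECT pg_advisory_unlock(%s)", (key,))
        cur.fetchone()
```

The try/finally plays the role of the release-on-panic path; a hard crash never reaches it, which is why the session ending matters. Keep the connection dedicated to the tick: a pooled connection handed back with the lock still held keeps the lock held.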

At boot: plugin provisioning

When an instance starts, it scans ./plugins for signed zips to install. Two processes coming up together (a rolling restart, or a debug and release build side by side) would race on the same files, so the sweep takes its own advisory lock. It differs from the scheduler jobs in one way: instead of skipping when the lock is busy, it retries a few times with a short backoff. That way a new instance booting while the old one is still shutting down doesn’t skip the sweep and leave a freshly dropped plugin uninstalled.
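The retry-instead-of-skip behaviour could look like this, again as a Python sketch against a %s-paramstyle DB-API connection. The attempt count, the doubling backoff, and the 0.5 s starting delay are assumptions; the guide only says “a few times with a short backoff.” The sleep function is injectable so the loop can be tested without waiting.

```python
import time
from typing import Any, Callable


def acquire_with_retry(conn: Any, key: int, attempts: int = 5,
                       base_delay: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try the advisory lock up to `attempts` times, backing off between tries.

    Hypothetical helper: returns True once this session holds the lock,
    False if every attempt found it busy.
    """
    cur = conn.cursor()
    for attempt in range(attempts):
        cur.execute("SELECT pg_try_advisory_lock(%s)", (key,))
        (acquired,) = cur.fetchone()
        if acquired:
            return True
        if attempt < attempts - 1:
            # Doubling backoff (assumed): 0.5 s, 1 s, 2 s, ...
            sleep(base_delay * (2 ** attempt))
    return False
```

Once it returns True, the caller runs the sweep and then releases the lock with pg_advisory_unlock; False after the last attempt means another process still holds it.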

The same primitive inside a request

Locking isn’t only for background jobs. The transaction-scoped variant, pg_advisory_xact_lock, serialises a few hot operations while they run: merging tickets, applying rule actions to one ticket, installing a plugin, and the first-run admin setup. Unlike pg_try_advisory_lock, it waits for the lock instead of skipping, and the lock releases automatically when the transaction commits or rolls back. Same idea, narrower scope: let Postgres be the lock manager instead of inventing one.
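A Python sketch of the transaction-scoped pattern. pg_advisory_xact_lock is a real Postgres function; the with_xact_lock helper and the callable it runs are hypothetical. The connection must not be in autocommit mode: otherwise each statement is its own transaction, and the lock would be released as soon as the SELECT finished.

```python
from typing import Any, Callable


def with_xact_lock(conn: Any, key: int, work: Callable[[Any], None]) -> None:
    """Hypothetical helper: run `work` while holding a transaction-scoped advisory lock.

    `conn` is a DB-API connection (%s paramstyle) with autocommit off.
    """
    cur = conn.cursor()
    try:
        # Blocks until any other transaction holding `key` commits or rolls back.
        cur.execute("SELECT pg_advisory_xact_lock(%s)", (key,))
        work(cur)
        conn.commit()    # ending the transaction releases the lock
    except BaseException:
        conn.rollback()  # so does rolling back; xact locks have no unlock call
        raise
```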

The other half: sharing work, not serialising it

Advisory locks answer “only one instance should do this.” The opposite need, many instances draining one queue without doubling up, uses a different Postgres feature: SELECT … FOR UPDATE SKIP LOCKED. The outbound-email worker claims batches of rows that way: each instance grabs rows nobody else has locked and skips the rest, so several workers drain the same queue in parallel. It’s worth knowing for the complete picture: advisory locks aren’t the only coordination mechanism; they’re the “run this once” one.
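A sketch of a batch claim in Python. FOR UPDATE SKIP LOCKED is standard Postgres syntax; the outbound_email table, its columns and status values, and the claim_batch helper are hypothetical stand-ins for whatever the worker actually uses. Row locks are held until the commit, so a second worker running the same query concurrently skips these rows and takes the next ones.

```python
from typing import Any, Callable

# Hypothetical table and columns; only the locking clause is the point here.
CLAIM_SQL = """
SELECT id, recipient, subject
FROM outbound_email
WHERE status = 'pending'
ORDER BY id
LIMIT %s
FOR UPDATE SKIP LOCKED
"""


def claim_batch(conn: Any, limit: int, send: Callable[[str, str], None]) -> int:
    """Claim up to `limit` unlocked pending rows, send them, mark them sent."""
    cur = conn.cursor()
    cur.execute(CLAIM_SQL, (limit,))
    rows = cur.fetchall()
    for row_id, recipient, subject in rows:
        send(recipient, subject)
        cur.execute("UPDATE outbound_email SET status = 'sent' WHERE id = %s", (row_id,))
    conn.commit()  # row locks are released here
    return len(rows)
```

Sending inside the transaction keeps the sketch short but holds the row locks for the whole batch, and a crash after sending but before the commit leaves those rows pending, to be sent again.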

What it means for running more than one instance

You can scale Nosdesk horizontally, and the instances coordinate through Postgres alone. There’s no ZooKeeper, etcd, or Redis lock in the path. (Redis is used for rate limiting and the collaboration cache, not for job coordination.)

The one requirement is that every instance points at the same Postgres database. Advisory locks are scoped to a database, so they only serialise instances that share one. Point two backends at two separate databases and you have two independent deployments, not a coordinated pair. Beyond that there’s nothing to set; the default single-container install is already correct the moment you add a second instance.

Watching the locks

Held advisory locks show up in pg_locks:

SELECT classid, objid, objsubid, pid, mode, granted
FROM pg_locks
WHERE locktype = 'advisory';  -- objsubid = 1 marks a single bigint key

A bigint key splits across classid (the high 32 bits) and objid (the low 32 bits), and pg_locks shows both as plain unsigned integers. Nosdesk’s keys are chosen so that, read as ASCII bytes, those halves spell short tags: the high word is Nos for every one, and the low word names the job. MSGD is the Graph delta sync, THMB the thumbnail backfill, PRVX plugin provisioning. So a row whose classid decodes to Nos is one of Nosdesk’s, and the objid tells you which job is holding it.
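Because pg_locks shows classid and objid as numbers, a small decoder helps. This Python sketch splits a bigint key the way Postgres reports it and renders each half as ASCII. The tags come from this guide; the exact byte layout of the three-letter Nos word (which side its padding byte sits on) is an assumption, so the decoder strips NUL padding from either side.

```python
from typing import Optional, Tuple

# Tags listed in this guide; any other Nos-prefixed tag is reported as unknown.
JOBS = {
    "MSGD": "Microsoft Graph delta sync",
    "THMB": "avatar thumbnail backfill",
    "PRVX": "plugin provisioning",
}


def split_key(key: int) -> Tuple[int, int]:
    """Split a signed bigint advisory key into (classid, objid) as pg_locks shows them."""
    unsigned = key & 0xFFFF_FFFF_FFFF_FFFF  # reinterpret as 64 unsigned bits
    return unsigned >> 32, unsigned & 0xFFFF_FFFF


def word_to_tag(word: int) -> str:
    """Render a 32-bit word as ASCII, or as hex if it isn't printable text."""
    raw = word.to_bytes(4, "big").strip(b"\x00")  # assumption: NUL padding
    if raw and all(0x20 <= b < 0x7F for b in raw):
        return raw.decode("ascii")
    return f"0x{word:08x}"


def describe(classid: int, objid: int) -> Optional[str]:
    """Name the Nosdesk job behind a pg_locks row, or None if it isn't Nosdesk's."""
    if word_to_tag(classid) != "Nos":
        return None
    tag = word_to_tag(objid)
    return JOBS.get(tag, f"unknown Nosdesk job {tag}")


if __name__ == "__main__":
    # classid/objid values as they might appear in a pg_locks row.
    print(describe(0x004E6F73, 0x4D534744))  # Microsoft Graph delta sync
```

Feed it the classid and objid columns from the query above; rows that return None belong to something other than Nosdesk.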