Cycle time - the elapsed time from when a ticket is moved to In Progress to when it is merged - is one of the most actionable metrics in a software engineering team. It is also, in most tools that compute it, stale by the time you look at it.
Stale cycle time is worse than no cycle time. If you make a process change on Tuesday and the data reflects it next Monday, you are managing a one-week-lagged view of your team's actual state. The feedback loop is too long to be useful for anything except quarterly retrospectives.
This article describes the technical approach we took to make Aero's cycle time reports update in near real time - meaning within a few seconds of the underlying event - and what changed about how teams used the data once the lag was gone.
The naive approach: polling an aggregate query
The obvious way to compute cycle time is a query: for each closed ticket in the last 30 days, compute (closed_at − in_progress_at) and average by team, sprint, or label. This works. It is straightforward SQL. You can add it to a reporting endpoint and call it from a dashboard widget.
The problem is performance. An aggregate query that scans 30 days of ticket events across a team of 20 developers touches a significant number of rows. Run it on every page load and you have a performance problem. Cache it and you have a staleness problem. The polling interval becomes a tradeoff between freshness and database load that you can never quite get right.
We ran the naive approach for three months in early beta. Our fastest-moving teams were polling for updates at 5-second intervals. The dashboard was visibly slow. The cycle time numbers moved in jumps, not smoothly. Users stopped trusting the numbers.
Postgres triggers and the event queue
The insight that changed our approach was that cycle time doesn't need to be recomputed from scratch on every request. It needs to be updated incrementally every time a relevant event occurs. The relevant events are: a ticket moving to In Progress, a ticket moving to Done, a merge event, a ticket being reassigned.
We added a Postgres trigger on the ticket_events table. Whenever a row is inserted, the trigger fires a function that evaluates whether this event affects a cycle_time_fact record. If it does, the trigger inserts a work item into a small queue table.
A lightweight background worker reads from that queue table. It processes the work item - updating the materialized cycle time record for the affected ticket - and deletes it. The end-to-end latency from event insertion to materialized update is typically under 500ms.
The materialized view and the CTE we're proud of
For reporting, we maintain a materialized cycle_time_summary table that stores pre-aggregated cycle time by team and sprint. It is updated by the same background worker that processes individual ticket updates.
The tricky part is computing cycle time for tickets that are still in progress. A ticket that moved to In Progress six days ago and hasn't closed yet has a cycle time of six days and counting. Excluding in-progress tickets understates your team's current workload. Including them with a computed current duration means the average changes every minute.
The CTE we're proud of handles this with a single pass: it computes the elapsed time for closed tickets using their actual closed_at timestamp, and for open tickets using the current timestamp. The materialized view stores both numbers - closed_ticket_avg and including_open_avg - and the UI shows both, with a visual distinction between completed and in-progress-including metrics.
Keeping the background worker simple
The background worker is intentionally simple: it reads from the queue, processes one item at a time, and deletes it. It does not use a message queue, a job scheduler, or a distributed worker pool. For our current scale - several hundred teams, tens of thousands of tickets - a single Postgres-backed worker running in the same process as the API server is sufficient.
The queue table has a small index on created_at. The worker polls it every 100ms when there are items to process, and backs off to 2-second intervals when the queue is empty. It has been running continuously for eight months without a restart. The highest queue depth we've seen in production is 47 items, during a deployment where two sprints closed simultaneously. It cleared in under three seconds.
The right architecture for a problem is the simplest one that meets the requirements. Real-time cycle time did not require Kafka.
What real-time changed about how teams use the data
When cycle time updates in real time, teams start looking at it during the sprint rather than after it. This changes the behavior the metric captures.
Before real-time updates, cycle time was a retrospective metric. You looked at it on Friday afternoon and discussed what happened during the week. The feedback loop was too long to change anything in the current sprint.
After real-time updates, teams started noticing tickets that had been In Progress for four days without a commit. They started treating a rising cycle time mid-sprint as a signal to ask whether someone was blocked, not a data point to discuss at the retrospective.
The metric didn't change. The latency changed. The latency was the meaningful variable.