We saw a problem that a long continuous aggregate refresh call, which was invoked by the refresh policy, was keeping transaction open (as reported by the pg_stat_activity.backend_xid ) for the whole duration of the run. This manifested as autovacuum not being able to cleanup dead tuples until refresh job completes.
When looking at the API docs of add_continuous_aggregate_policy there is a following note:
Because each
batchis an individual transaction, executing a policy in batches make the data visible for the users before the entire job is executed.
Which can be understood as refresh call doing some trickery to perform multiple transactions in the same call. This contradicts stalled autovacuum behaviour we observed.
Could you please clarify how continuous aggregates refresh interacts with xid horizon PG tracks for every backend. There are different remediations, depending on what it actually does:
- If single refresh job truly makes multiple transactions, and therefore job’s
backend_xidshould be moved up as it finishes batch (which is one bucket by default), then our remediation should be reducing bucket width of the continuous aggregates. - If single refresh job keeps single transaction open for the whole duration (documentation is somewhat inaccurate) , then our remediation should be limiting
max_batches_per_executionto make it releasing transaction more often.
Any insights into these nitty-gritty details are much appreciated.