Hi Timescale Community,
We’re running into a recurring deadlock issue in our TimescaleDB setup and would appreciate any advice or proven solutions.
Background:
We use TimescaleDB for storing raw machine data, with regular INSERTs via Kafka Connect (JDBC Sink). We also have a retention policy in place that uses drop_chunks to remove old data.
Additional Info:
-
TimescaleDB version: 2.21.4
-
PostgreSQL version: 17.7
Problem:
On 15./16.10.2025, we experienced a significant outage due to deadlocks between our regular INSERTs and the Timescale retention job. The issue persisted until we restarted the connector. Here’s a summary of what we observed:
-
Error:
ERROR: deadlock detected Detail: Process 2027782 waits for ShareLock on transaction 158691927; blocked by process 2082871. Process 2082871 waits for ShareLock on transaction 158691928; blocked by process 2027782. Where: while locking tuple (0,4) in relation "dimension_slice" -
Stacktrace:
The deadlock occurs between INSERTs into our hypertable and the retention job (policy_retention) that callsdrop_chunks. Both operations seem to lock Timescale metadata tables, especially_timescaledb_catalog.dimension_slice. -
Impact:
All subsequent transactions failed withcurrent transaction is aborted, commands ignored until end of transaction blockuntil the connector was restarted.
Analysis:
-
The deadlock happens even though
drop_chunksonly deletes old chunks, because both INSERTs and retention jobs need locks on shared catalog tables. -
According to Timescale docs,
drop_chunksrequires exclusive locks, and can fail if there’s concurrent access—even if the data ranges don’t overlap.
Questions for the Community:
-
Has anyone else experienced deadlocks between INSERTs and retention jobs? How did you resolve them?
-
Are there best practices for scheduling retention jobs to avoid these conflicts?
-
Is there a way to make
drop_chunksless aggressive in locking, or to prioritize INSERTs? -
Would switching to custom retention jobs with tuned
lock_timeoutand retries be more robust than the built-in policy? -
Any other monitoring or mitigation strategies you recommend?
Thanks in advance for your help! Any pointers or shared experiences would be greatly appreciated.
Best,
Thomas