Failure burst
Count error events per scope. Emit a single failure.burst signal when the threshold trips, so cascade failures escalate instead of disappearing into the log.
FailureBurstDetector subscribes to error events on the bus and counts them per (scope_kind, scope_id). When a scope’s counter reaches threshold, it publishes one failure.burst event and resets the counter so the burst can re-fire on a subsequent streak.
from entorin.burst import FailureBurstDetector
detector = FailureBurstDetector(bus, threshold=5)
# Detector now subscribes itself to the default error events.
# Wire a handler if you want to act on bursts:
bus.subscribe("failure.burst", lambda e: page_oncall(e.payload))
Default scopes
Six error events are watched out of the box:
| Event | Scope kind | Scope id source |
|---|---|---|
agent.call.error | agent | payload agent_id |
tool.call.error | tool | payload tool_id |
llm.call.error | model | payload model_id |
sandbox.exec.error | sandbox | payload label |
skill.error | skill | payload skill_name |
policy.violation | policy | constant "violation" |
Override or extend by passing extractors= to the constructor. Pass an empty dict to disable defaults entirely.
Payload
{
"scope_kind": "tool",
"scope_id": "fs.read",
"error_count": 5,
"window_size": 5, # mirrors threshold
}
The window_size field mirrors threshold so consumers can report “5 errors in the last 5 attempts” without recomputing it.
Counting model
Window semantics are count-based, not time-based: simpler, deterministic for tests, no clock dependency. Counter resets after each burst emission.