01 Issue
Feature · Closed · Swamp CLI
Assignees: stack72

#312 telemetry: emit child entries for follow-up action method invocations

Opened by keeb · 5/10/2026

Problem

PR #1349 (swamp-club#301) added per-method telemetry: each workflow YAML step that resolves to a model method emits a child cli_invocation entry with parentInvocationId linkage and a workflowContext block.

That change has a documented V1 limitation (see design/workflow.md):

Workflow-step granularity only. Sub-method follow-up calls inside DefaultMethodExecutionService.execute are not captured separately.

Methods can return followUpActions: FollowUpAction[] in their MethodResult. DefaultMethodExecutionService.processFollowUpActions then recursively calls this.execute(definition, followUpMethod, context) for each action (up to DEFAULT_MAX_FOLLOW_UP_DEPTH = 100). These nested invocations do not currently emit telemetry entries.

The effect on telemetry consumers: usage accounting is proportional to workflow YAML step count, not to actual method invocations performed during the run. A workflow whose first step triggers ten follow-up actions records one child entry instead of eleven.
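
The gap can be illustrated with a minimal sketch. Everything here is a simplified stand-in: `SketchExecutionService`, `runWorkflowStep`, and the `method` field on `FollowUpAction` are hypothetical names for illustration, not the actual swamp implementation. Only `followUpActions`, `MethodResult`, and `DEFAULT_MAX_FOLLOW_UP_DEPTH` come from the description above.

```typescript
// Hypothetical, simplified stand-ins for the types named in the issue.
interface FollowUpAction {
  method: string;
}

interface MethodResult {
  followUpActions?: FollowUpAction[];
}

type MethodImpl = () => MethodResult;

const DEFAULT_MAX_FOLLOW_UP_DEPTH = 100;

class SketchExecutionService {
  // Counts every method invocation, including nested follow-ups.
  invocations = 0;
  private readonly methods: Record<string, MethodImpl>;

  constructor(methods: Record<string, MethodImpl>) {
    this.methods = methods;
  }

  execute(method: string, depth: number = 0): void {
    if (depth > DEFAULT_MAX_FOLLOW_UP_DEPTH) {
      throw new Error("follow-up depth limit exceeded");
    }
    const impl = this.methods[method];
    if (impl === undefined) {
      throw new Error(`unknown method: ${method}`);
    }
    this.invocations += 1;
    const result = impl();
    // Follow-ups recurse into execute without emitting any telemetry.
    for (const action of result.followUpActions ?? []) {
      this.execute(action.method, depth + 1);
    }
  }
}

// Stand-in for the workflow layer: one child entry per YAML step.
function runWorkflowStep(
  service: SketchExecutionService,
  method: string,
  childEntries: string[],
): void {
  childEntries.push(method);
  service.execute(method);
}

// A first step that triggers ten follow-up actions.
const tenFollowUps: FollowUpAction[] = Array.from({ length: 10 }, () => ({
  method: "tag",
}));
const service = new SketchExecutionService({
  create: () => ({ followUpActions: tenFollowUps }),
  tag: () => ({}),
});
const childEntries: string[] = [];
runWorkflowStep(service, "create", childEntries);
```

In this sketch `service.invocations` ends at eleven while `childEntries` holds a single entry, which is the one-versus-eleven mismatch described above.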

Proposed

Emit one child cli_invocation entry per follow-up action method invocation, using the same wire shape established in PR #1349:

  • event: "cli_invocation" with properties.invocation matching the redacted swamp model method run <name> <method> shape
  • parentInvocationId set to the immediately enclosing invocation's id (chains through nested depth)
  • workflowContext propagated from the originating workflow step (workflowName, runId, jobName, stepName, modelType, driver) — stepName should remain that of the originating step; the chain of parent ids carries the nesting structure
  • Failure semantics match existing children: post-method_executing failures record real durationMs with status: "error"; pre-method_executing failures (model lookup, vault expression resolution, vary-key validation, env-var validation) synthesize durationMs: 0 with status: "error"
  • Cancellation / timeout / mid-stream throw drains in-flight follow-up invocations as error via the bridge's existing finalize path
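
A rough sketch of the proposed entry shape and parent chaining follows. The field names are the ones listed above, but several details are assumptions made only for illustration: the `id` field, the nesting of `parentInvocationId` and `workflowContext` under `properties`, the `"ok"` status value, and the helper names `childOf` and `preExecutionFailure`.

```typescript
// Field names come from the issue; the `id` field, the nesting under
// `properties`, and the "ok" status value are assumptions for this sketch.
interface WorkflowContext {
  workflowName: string;
  runId: string;
  jobName: string;
  stepName: string;
  modelType: string;
  driver: string;
}

interface InvocationEntry {
  id: string;
  event: "cli_invocation";
  properties: {
    invocation: string;
    parentInvocationId: string;
    workflowContext: WorkflowContext;
    durationMs: number;
    status: "ok" | "error";
  };
}

let nextId = 0;

// Builds a child entry whose parent is the immediately enclosing invocation.
function childOf(
  parentInvocationId: string,
  workflowContext: WorkflowContext,
  invocation: string,
  outcome: { durationMs: number; status: "ok" | "error" },
): InvocationEntry {
  nextId += 1;
  return {
    id: `inv-${nextId}`,
    event: "cli_invocation",
    properties: {
      invocation,
      parentInvocationId,
      // Copied unchanged, so stepName stays that of the originating step.
      workflowContext: { ...workflowContext },
      durationMs: outcome.durationMs,
      status: outcome.status,
    },
  };
}

// A failure before method_executing has no measured duration, so the
// entry is synthesized with durationMs 0 and status "error".
function preExecutionFailure(
  parentInvocationId: string,
  workflowContext: WorkflowContext,
  invocation: string,
): InvocationEntry {
  return childOf(parentInvocationId, workflowContext, invocation, {
    durationMs: 0,
    status: "error",
  });
}
```

Passing one entry's `id` as the `parentInvocationId` of the next is what lets the chain of parent ids carry the nesting structure while `workflowContext` stays fixed.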

Alternatives considered

Roll up follow-up counts onto the parent entry (e.g. a subInvocations: Record<methodName, count> field on the originating step's child).

Rejected: a parent entry is only recorded at the end of the method run. If the method throws partway through processing follow-up actions, the rolled-up counts are either lost (if the parent entry is never recorded) or incomplete (if it is recorded with partial counts). Independent per-invocation entries are emitted at the point of invocation and survive process termination through the existing telemetry queue, so failure-mode fidelity is strictly higher.
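
The difference can be sketched with a plain array standing in for the telemetry queue. Both functions and the string entries are hypothetical, and the sketch models only the case where the parent entry is never recorded.

```typescript
type FollowUp = () => void;

// Alternative: count follow-ups and record one parent entry at the end.
function runWithRollup(followUps: FollowUp[], queue: string[]): void {
  const subInvocations: Record<string, number> = {};
  followUps.forEach((run, index) => {
    run(); // a throw here skips the queue.push below entirely
    subInvocations[`followUp${index}`] = 1;
  });
  queue.push(JSON.stringify({ entry: "parent", subInvocations }));
}

// Proposal: enqueue one entry per follow-up at the point of invocation.
function runWithPerInvocationEntries(
  followUps: FollowUp[],
  queue: string[],
): void {
  followUps.forEach((run, index) => {
    queue.push(`followUp${index}`);
    run();
  });
}

// Four follow-ups, the third of which throws.
const followUps: FollowUp[] = [
  () => {},
  () => {},
  () => {
    throw new Error("follow-up failed");
  },
  () => {},
];

const rollupQueue: string[] = [];
try {
  runWithRollup(followUps, rollupQueue);
} catch {
  // The rolled-up counts are lost along with the parent entry.
}

const perInvocationQueue: string[] = [];
try {
  runWithPerInvocationEntries(followUps, perInvocationQueue);
} catch {
  // The entries enqueued before the throw are still in the queue.
}
```

Here `rollupQueue` stays empty, while `perInvocationQueue` retains the three entries enqueued up to and including the failing follow-up.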

New event type (e.g. follow_up_invocation).

Rejected: PR #1349 deliberately chose additive optional fields on cli_invocation because the swamp-club ingest declares properties: Record<string, unknown> and additive fields ride across with no consumer coordination. A new event type would force consumer changes for a strict subset of the existing event's data.
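
As a small sketch of why additive fields need no consumer coordination: with `properties` typed as `Record<string, unknown>`, a consumer that does not know about a new field keeps working, and one that does can opt in. Both consumer functions below are hypothetical, and placing `parentInvocationId` inside `properties` is an assumption.

```typescript
interface TelemetryEvent {
  event: string;
  properties: Record<string, unknown>;
}

// Hypothetical consumer written before parentInvocationId existed.
function countCliInvocations(events: TelemetryEvent[]): number {
  return events.filter((e) => e.event === "cli_invocation").length;
}

// Hypothetical consumer that opts in to the additive field.
function countChildEntries(events: TelemetryEvent[]): number {
  return events.filter(
    (e) =>
      e.event === "cli_invocation" &&
      typeof e.properties["parentInvocationId"] === "string",
  ).length;
}
```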

Out of scope

Sub-operations inside the body of a single extension model method (e.g. iterations over a per-item loop) are not addressed by this issue. Capturing those requires an opt-in API on MethodContext and is a separate design.

02 Bog Flow

Closed

5/18/2026, 11:58:26 PM

03 Sludge Pulse
stack72 assigned stack72 · 5/18/2026, 11:44:19 PM

stack72 commented 5/18/2026, 11:58:26 PM

Deferring — risk too high for zero current consumers.

Triage and planning confirmed the telemetry gap is real, but no models currently produce followUpActions. The machinery exists in the type system and execution service, but nothing exercises it in practice, so the gap has no observable effect today.

Implementation complexity is substantial relative to the payoff:

  • parentInvocationId chaining through nested follow-up depth requires recordChildInvocation to return (or accept) entry IDs — a return-type change on an existing interface
  • Telemetry sink must be threaded through DefaultMethodExecutionService constructor or executeWorkflow signature into processFollowUpActions
  • Integration testing requires a test-only extension model that returns followUpActions (the shell model doesn't)
  • Non-workflow callers (swamp model method run, serve, scheduled execution) need the sink to remain optional
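
To make the shape of that change concrete, here is a minimal sketch under stated assumptions. `TelemetrySink`, `SketchServiceWithSink`, and `InMemorySink` are hypothetical names, the signature of `recordChildInvocation` is guessed rather than taken from the codebase, and follow-up relationships are reduced to a static map. It shows the two points from the list: the recorder returns an entry id so nested calls can chain, and the sink is optional so callers without one run unchanged.

```typescript
interface TelemetrySink {
  // Returns the new entry's id so nested follow-ups can use it as parent.
  recordChildInvocation(method: string, parentInvocationId: string): string;
}

class SketchServiceWithSink {
  readonly executed: string[] = [];
  private readonly followUps: Record<string, string[]>;
  private readonly sink: TelemetrySink | undefined;

  // The sink is optional so non-workflow callers can omit it.
  constructor(followUps: Record<string, string[]>, sink?: TelemetrySink) {
    this.followUps = followUps;
    this.sink = sink;
  }

  execute(method: string, invocationId?: string): void {
    this.executed.push(method);
    for (const next of this.followUps[method] ?? []) {
      // Record a child entry only when a sink and a parent id are present.
      const childId =
        this.sink !== undefined && invocationId !== undefined
          ? this.sink.recordChildInvocation(next, invocationId)
          : undefined;
      this.execute(next, childId);
    }
  }
}

class InMemorySink implements TelemetrySink {
  readonly entries: Array<{
    id: string;
    method: string;
    parentInvocationId: string;
  }> = [];

  recordChildInvocation(method: string, parentInvocationId: string): string {
    const id = `follow-up-${this.entries.length + 1}`;
    this.entries.push({ id, method, parentInvocationId });
    return id;
  }
}
```

With a map such as `{ create: ["tag"], tag: ["notify"], notify: [] }` and a starting id of `"step-1"`, the sink would receive `tag` with parent `step-1` and then `notify` with the id returned for `tag` as its parent. Constructing the service without a sink runs the same three methods and records nothing.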

Recommend revisiting when the first follow-up-producing model ships and the gap becomes concrete. At that point we'll understand real usage patterns and can design telemetry that fits.
