paths.base: manifest is not honored for workflows: — bundled workflows only resolve from repo root, blocking self-contained subdir layouts (sibling to #459)

Some swamp users currently rely on the community @keeb/ssh extension for programmatic SSH from models and workflows. It supports traditional key-based auth with vault-stored keys, which covers the most common case but leaves several modern SSH topologies unaddressed:

Tailscale SSH: `tailscale ssh user@host` uses tailnet identity (Tailscale ACL + IdP) instead of a static SSH key. This is increasingly common in production environments that have adopted Zero Trust networking. There is no clean way to express "SSH to this host via Tailscale" through @keeb/ssh today.
Bastion / jump-host topologies: `ssh -J <bastion> <target>` is the standard way to reach hosts that aren't directly routable. @keeb/ssh doesn't expose ProxyJump configuration.
Custom ProxyCommand: e.g., AWS SSM Session Manager (`ssh -o ProxyCommand="aws ssm start-session ..."`) lets you SSH to EC2 instances with no inbound SSH port open. No first-class support today.

Because there is no official extension, each team that needs one of these styles either forks @keeb/ssh, writes their own bespoke model that shells out to ssh, or works around it with command/shell. This fragments the ecosystem and produces inconsistent vault handling, error reporting, and audit semantics.

Proposed solution

Provide an official @swamp/ssh extension that exposes a uniform model interface to multiple SSH transport styles. Callers specify the target host and a via (transport) selector; the model handles the underlying invocation.

Sketch of the interface:

// Traditional SSH (parity with @keeb/ssh)
{ via: "key", host: "...", user: "...", vaultKey: "..." }

// Tailscale SSH
{ via: "tailscale", host: "...", user: "..." }

// Bastion / jump host
{ via: "bastion", host: "...", user: "...", bastion: "user@bastion-host", vaultKey: "..." }

// ProxyCommand (e.g. AWS SSM)
{ via: "proxy-command", host: "...", user: "...", proxyCommand: "aws ssm start-session --target i-..." }
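One way to read the sketch above is as a discriminated union keyed on `via`. The following is a minimal illustration, not the @swamp/ssh schema: the `Target` type and the `baseArgv` helper are hypothetical names, and only the field names come from the sketch.

```typescript
// Hypothetical types mirroring the four shapes sketched above.
type Target =
  | { via: "key"; host: string; user: string; vaultKey: string }
  | { via: "tailscale"; host: string; user: string }
  | { via: "bastion"; host: string; user: string; bastion: string; vaultKey: string }
  | { via: "proxy-command"; host: string; user: string; proxyCommand: string };

// Returns the base argv for a target, before any remote command is appended.
// Key material is deliberately absent: under the vault convention it would be
// handed to ssh-agent rather than passed as a path on the command line.
function baseArgv(t: Target): string[] {
  const dest = `${t.user}@${t.host}`;
  switch (t.via) {
    case "key":
      return ["ssh", dest];
    case "tailscale":
      return ["tailscale", "ssh", dest];
    case "bastion":
      return ["ssh", "-J", t.bastion, dest];
    case "proxy-command":
      return ["ssh", "-o", `ProxyCommand=${t.proxyCommand}`, dest];
  }
}
```

Because the switch is exhaustive over `via`, adding a fifth transport to the union would make the compiler flag `baseArgv` until the new case is handled.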

Methods at minimum:

run — run a single command, capture stdout/stderr/exit
script — upload and execute a script
copy — scp/rsync-style file transfer

Vault handling should match the existing swamp convention (read-secret, never write to disk; pipe into ssh-agent or use file-descriptor substitution where the underlying tool requires a path).

Alternatives considered

Continue with @keeb/ssh — community-maintained, no Tailscale or bastion support, no roadmap commitment.
Each consumer wires their own SSH model — duplicated logic, inconsistent vault handling, harder to audit fleet-wide.
Use command/shell everywhere — explicitly discouraged by swamp guidance ("command/shell is for ad-hoc one-off shell commands, NEVER for wrapping CLI tools or building integrations").

Context

Filed in the context of a project that needs a swamp posture model that audits our fleet's SSH configuration; today we'd reach for @keeb/ssh, but it can't carry a Tailscale-SSH connection — so we're either forking it or shelling out, neither of which is a good long-term answer.

02 Bog Flow

Closed

5/20/2026, 5:05:57 PM

03 Sludge Pulse

adam assigned adam 5/18/2026, 10:47:46 PM

adam commented 5/19/2026, 12:57:57 AM

Okay, it's trending toward something like this:

10 hosts: 6 web (4 prod, 2 staging), 2 db (prod, via bastion ProxyJump), 2 edge (prod, Tailscale).

Fleet definition — `fleets/awesome.yaml`

modelType: "@swamp/ssh-fleet"
modelName: awesome
globalArguments:
  name: awesome
  transport:
    kind: ssh
    user: deploy
    identityFile: ~/.ssh/awesome_ed25519
    knownHostsFile: ~/.ssh/awesome_known_hosts
    strictHostKeyChecking: accept-new
    connectTimeoutSec: 10
    controlMaster: { enabled: true, persistSec: 600 }
  defaultParallel: 8
  captureOutput: true
  hosts:
    - name: web-1
      address: web-1.prod.example.com
      tags: [web, prod]
      attrs: { region: us-east-1, role: api }
    - name: web-2
      address: web-2.prod.example.com
      tags: [web, prod]
      attrs: { region: us-east-1, role: api }
    - name: web-3
      address: web-3.prod.example.com
      tags: [web, prod]
      attrs: { region: us-east-1, role: api }
    - name: web-4
      address: web-4.prod.example.com
      tags: [web, prod]
      attrs: { region: us-east-1, role: api }
    - name: web-5
      address: web-5.staging.example.com
      tags: [web, staging]
      attrs: { region: us-east-1, role: api }
    - name: web-6
      address: web-6.staging.example.com
      tags: [web, staging]
      attrs: { region: us-east-1, role: api }
    - name: db-1
      address: 10.0.5.21
      tags: [db, prod]
      attrs: { region: us-east-1, role: postgres }
      transport:
        proxyJump: deploy@bastion.prod.example.com
    - name: db-2
      address: 10.0.5.22
      tags: [db, prod]
      attrs: { region: us-east-1, role: postgres }
      transport:
        proxyJump: deploy@bastion.prod.example.com
    - name: edge-1
      address: edge-1                                    # short tailnet name
      tags: [edge, prod]
      attrs: { region: eu-west-1, role: edge }
      transport: { kind: tailscale, user: deploy }
    - name: edge-2
      address: edge-2
      tags: [edge, prod]
      attrs: { region: eu-west-1, role: edge }
      transport: { kind: tailscale, user: deploy }
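The per-host `transport` blocks above only list the keys that differ from the fleet default. A minimal sketch of how the effective transport could be resolved, assuming a shallow "host keys win" merge; the merge rule, the `Transport` interface, and `effectiveTransport` are assumptions for illustration, not confirmed behavior:

```typescript
// Hypothetical shape; field names follow the YAML above.
interface Transport {
  kind?: "ssh" | "tailscale";
  user?: string;
  identityFile?: string;
  proxyJump?: string;
  connectTimeoutSec?: number;
}

// Assumed rule: per-host keys override, everything else is inherited from the fleet.
function effectiveTransport(fleet: Transport, host: Transport = {}): Transport {
  return { ...fleet, ...host };
}

const fleetDefault: Transport = {
  kind: "ssh",
  user: "deploy",
  identityFile: "~/.ssh/awesome_ed25519",
  connectTimeoutSec: 10,
};

// db-1 stays on ssh and gains a jump host; edge-1 switches kind; web-1 inherits as-is.
const db1 = effectiveTransport(fleetDefault, { proxyJump: "deploy@bastion.prod.example.com" });
const edge1 = effectiveTransport(fleetDefault, { kind: "tailscale", user: "deploy" });
const web1 = effectiveTransport(fleetDefault);

console.log(db1.kind, db1.proxyJump, edge1.kind, web1.proxyJump);
```

Under this assumed rule edge-1 would still inherit ssh-only keys such as `identityFile`; a real implementation would presumably ignore or drop them for the tailscale kind.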

Apply + warm masters

swamp model apply -f fleets/awesome.yaml
swamp model run awesome open --json   # opens CM for the 8 ssh hosts; no-op for edge-1/edge-2

After apply, ten host-* resources exist, each tagged {fleet: awesome}.

1. `exec uptime` on every prod host (mix of ssh + tailscale)

swamp model run awesome exec \
  --arg hosts='"prod" in host.tags' \
  --arg command='uptime' \
  --json

CEL matches web-1..4, db-1..2, edge-1..2 (8 hosts). web-5 / web-6 are staging and are skipped.

Resources written by this single call, one per matched host:

run-exec-web-1   run-exec-web-2   run-exec-web-3   run-exec-web-4
run-exec-db-1    run-exec-db-2
run-exec-edge-1  run-exec-edge-2
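To make the fan-out concrete, here is a small TypeScript stand-in for what the selector does. It is only an illustration: the real selector is evaluated as CEL, and the `run-<method>-<host>` naming rule is inferred from the listing above rather than taken from the extension source.

```typescript
interface Host {
  name: string;
  tags: string[];
}

// The ten hosts from fleets/awesome.yaml, reduced to name and tags.
const hosts: Host[] = [
  ...[1, 2, 3, 4].map((n) => ({ name: `web-${n}`, tags: ["web", "prod"] })),
  ...[5, 6].map((n) => ({ name: `web-${n}`, tags: ["web", "staging"] })),
  ...[1, 2].map((n) => ({ name: `db-${n}`, tags: ["db", "prod"] })),
  ...[1, 2].map((n) => ({ name: `edge-${n}`, tags: ["edge", "prod"] })),
];

// TypeScript equivalent of the CEL expression `"prod" in host.tags`.
const isProd = (host: Host): boolean => host.tags.includes("prod");

// Naming pattern inferred from the resource listing: run-<method>-<host name>.
const resourceName = (method: string, host: Host): string =>
  `run-${method}-${host.name}`;

const written = hosts.filter(isProd).map((h) => resourceName("exec", h));
console.log(written.join("\n"));
```

The same predicate-then-map shape applies to `copy`, with `run-copy-` as the prefix.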

Edge-1's record:

{
  "method": "exec",
  "host": "edge-1",
  "transport": "tailscale",
  "startedAt": "2026-05-18T17:42:11.108Z",
  "finishedAt": "2026-05-18T17:42:11.864Z",
  "durationMs": 756,
  "exitCode": 0,
  "signal": null,
  "stdout": " 17:42:11 up 12 days,  4:11,  0 users,  load average: 0.08, 0.10, 0.09\n",
  "stderr": "",
  "args": { "command": "uptime" },
  "argv": ["tailscale", "ssh", "deploy@edge-1", "--", "uptime"]
}

vs. the ssh equivalent for web-1:

{
  "method": "exec",
  "host": "web-1",
  "transport": "ssh",
  "exitCode": 0,
  "stdout": " 17:42:11 up 6 days, 8:02,  0 users, load average: 0.42, 0.51, 0.58\n",
  "argv": [
    "ssh",
    "-o", "ControlMaster=auto",
    "-o", "ControlPath=/run/user/1000/swamp-ssh/awesome/8f3c…sock",
    "-o", "ControlPersist=600",
    "-i", "/home/adam/.ssh/awesome_ed25519",
    "-p", "22",
    "deploy@web-1.prod.example.com",
    "--", "uptime"
  ]
}
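The ssh argv in that record can be reproduced with a small builder. This is a sketch under assumptions, not the shipped code: `SshExec` and `sshExecArgv` are hypothetical names, and the control socket path is taken as an input rather than derived, since the record only shows it in truncated form.

```typescript
// Hypothetical input shape for one ssh-transport host.
interface SshExec {
  user: string;
  address: string;
  port: number;
  identityFile: string;
  controlPath: string; // master socket path, chosen by the caller
  controlPersistSec: number;
  proxyJump?: string; // set for hosts such as db-1 / db-2
}

// Builds an argv array in the same option order as the web-1 record.
function sshExecArgv(t: SshExec, command: string): string[] {
  const argv = [
    "ssh",
    "-o", "ControlMaster=auto",
    "-o", `ControlPath=${t.controlPath}`,
    "-o", `ControlPersist=${t.controlPersistSec}`,
    "-i", t.identityFile,
    "-p", String(t.port),
  ];
  if (t.proxyJump) argv.push("-J", t.proxyJump);
  argv.push(`${t.user}@${t.address}`, "--", command);
  return argv;
}
```

Passing an argv array to the process spawner avoids local shell quoting, but the command after the destination is still interpreted by the remote user's shell, so it needs the same care as any remote command string.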

2. `copy nginx.conf` to all us-east-1 web hosts

swamp model run awesome copy \
  --arg hosts='"web" in host.tags && host.attrs.region == "us-east-1"' \
  --arg src=./nginx.conf \
  --arg dst=/etc/nginx/nginx.conf \
  --arg direction=to \
  --json

CEL matches all six web-* hosts (prod + staging are both us-east-1). Six per-host resources:

run-copy-web-1  run-copy-web-2  run-copy-web-3
run-copy-web-4  run-copy-web-5  run-copy-web-6

web-1's record (scp reuses the master socket; the argv shows it):

{
  "method": "copy",
  "host": "web-1",
  "transport": "ssh",
  "exitCode": 0,
  "durationMs": 412,
  "args": { "src": "./nginx.conf", "dst": "/etc/nginx/nginx.conf", "direction": "to" },
  "argv": [
    "scp",
    "-o", "ControlPath=/run/user/1000/swamp-ssh/awesome/8f3c…sock",
    "-i", "/home/adam/.ssh/awesome_ed25519",
    "-P", "22",
    "./nginx.conf",
    "deploy@web-1.prod.example.com:/etc/nginx/nginx.conf"
  ]
}
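A comparable sketch for the copy argv. The names `CopyArgs` and `scpArgv` are hypothetical, and the meaning of `direction: from` (remote `src`, local `dst`) is an assumption, since the thread only shows `to`. Note that scp takes the port as uppercase `-P`, unlike ssh's `-p`.

```typescript
type Direction = "to" | "from";

interface CopyArgs {
  src: string;
  dst: string;
  direction: Direction;
}

// Hypothetical description of the remote end and the already-open master socket.
interface ScpRemote {
  user: string;
  address: string;
  port: number;
  identityFile: string;
  controlPath: string; // reuses the master opened for ssh
}

function scpArgv(remote: ScpRemote, args: CopyArgs): string[] {
  const remoteSpec = (path: string): string =>
    `${remote.user}@${remote.address}:${path}`;
  const toRemote = args.direction === "to";
  // Assumption: for "from", src is the remote path and dst the local one.
  const source = toRemote ? args.src : remoteSpec(args.src);
  const target = toRemote ? remoteSpec(args.dst) : args.dst;
  return [
    "scp",
    "-o", `ControlPath=${remote.controlPath}`,
    "-i", remote.identityFile,
    "-P", String(remote.port),
    source,
    target,
  ];
}
```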

Same thing from a workflow

In YAML the selector is just a string — no quoting acrobatics, and swamp's evaluator leaves it alone because there's no ${{ … }} to trigger it:

# workflows/reload-prod-web.yaml
name: reload-prod-web
on: manual
jobs:
  reload:
    steps:
      - name: ship config
        model: awesome
        method: copy
        arguments:
          hosts: '"web" in host.tags && "prod" in host.tags'
          src: ./nginx.conf
          dst: /etc/nginx/nginx.conf
          direction: to

      - name: reload nginx
        model: awesome
        method: exec
        arguments:
          hosts: '"web" in host.tags && "prod" in host.tags'
          command: systemctl reload nginx
          sudo: true

A few selector variants worth knowing

host.transport == "tailscale"                            # only edge-1, edge-2
host.attrs.role == "postgres"                            # only db-1, db-2
"prod" in host.tags && host.transport == "ssh"           # everything prod except the edges
host.name.startsWith("web-") && size(host.tags) > 1      # all web hosts (all have 2 tags)
host.attrs.region != "us-east-1"                         # only the two tailscale edges

adam commented 5/19/2026, 12:58:45 AM

You can override everything on a per-method-run basis, so it all works from the CLI, and you could get by without any static host definitions if you wanted.

bixu commented 5/19/2026, 8:22:06 AM

The YAML pattern here (for the model/workflows) looks great. One thing to note for our use case is that we'd need to support ssh-agent and password auth as well as keyfiles.

evrardjp commented 5/19/2026, 9:47:58 AM

I like this idea of making the transport more flexible.

Just curious, if we make the transport quite flexible, what's the point of tying this to SSH? Would it be bad to make it even more generic?

Many configuration management tools in the past built their own transport mechanisms (with different levels of success) that we could tap into...

If you are running things locally, your transport might simply be local execution. If you run something over RPC (or any non-ssh tunnel), you might want to use that as the transport mechanism. Those should ideally be implementable from a base, reusable transport mechanism, don't you think?

At the same time, a generic transport might become too open-ended, compared to an "ssh transport system with variants" ...

adam commented 5/19/2026, 8:17:11 PM

@bixu - should be no problem to have ssh-agent. password too, I suppose (although I have to admit, I don't know why you would do that - but who am I to judge?)

@evrardjp - I think it might be a little too wacky to move beyond ssh as the transport. But once it exists, it would be easy enough to port the pattern.

bixu commented 5/20/2026, 10:35:12 AM

"I don't know why you would do that" -- we are digging out of the brownest of brownfield systems at $job 😅

adam commented 5/20/2026, 5:05:52 PM

shipped as @swamp/ssh
