Labels: Feature, Shipped, Extensions
Assignees: stack72

#435 Datastore extension benchmark suite for S3 and GCS

Opened by stack72 · 5/24/2026 · Shipped 5/25/2026

Problem

During the Phase 2 datastore overhaul (#379, #434), we ran extensive benchmarks against MinIO (S3) and real GCS to validate performance. These benchmarks were ad hoc: agents ran them manually and pasted the results into conversations. There is no repeatable, automated way to rerun them when either extension changes in the future.

We need a benchmark suite that lives in the swamp-extensions repo and can be run on demand to catch performance regressions and validate improvements.

Proposed Solution

Add a benchmark harness in the swamp-extensions repo that covers both S3 and GCS datastore extensions. The harness should:

  1. Be runnable via a single command (e.g., deno task benchmark:s3, deno task benchmark:gcs)
  2. Support configurable backends (MinIO endpoint for S3, GCS emulator or real bucket for GCS)
  3. Output a structured results table (or JSON) for easy comparison
  4. Seed test data automatically (1000 files across 50 models, ~5MB total)
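The seeding step in item 4 could be planned with a small pure function before any files are written. This is a minimal sketch, not the shipped implementation: the `models/model-NN/file-NNN.bin` path layout, the `SeedFile` type, and the `planSeedFiles` name are all hypothetical, and "~5MB" is assumed to mean 5 MiB.

```typescript
// Hypothetical description of one seeded file; the path layout is an assumption.
interface SeedFile {
  path: string;
  sizeBytes: number;
}

// Spread `fileCount` files round-robin across `modelCount` models so that
// their sizes sum to exactly `totalBytes`.
function planSeedFiles(
  fileCount = 1000,
  modelCount = 50,
  totalBytes = 5 * 1024 * 1024,
): SeedFile[] {
  const baseSize = Math.floor(totalBytes / fileCount);
  // Bytes left over after even division; the first `remainder` files get one extra byte.
  const remainder = totalBytes - baseSize * fileCount;
  const files: SeedFile[] = [];
  for (let i = 0; i < fileCount; i++) {
    const model = String(i % modelCount).padStart(2, "0");
    const index = String(Math.floor(i / modelCount)).padStart(3, "0");
    files.push({
      path: `models/model-${model}/file-${index}.bin`,
      sizeBytes: baseSize + (i < remainder ? 1 : 0),
    });
  }
  return files;
}
```

Keeping the plan separate from the actual writes means the same layout can be reused for both the OLD and NEW repos, so both builds see identical input.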

Benchmark Scenarios

Each scenario measures per-phase timing using the trace env var (SWAMP_S3_SYNC_TRACE=1 / equivalent for GCS):

Push benchmarks

Scenario | What it measures
Push 1000 files (cold) | walk, upload, writeback, total wall; baseline bulk push
Push 1 modified / 1000 | walk, upload, writeback, total wall; scoped push via dirty sidecar (walk should be ~3-5ms)
No-op push | fastpath, total wall; sidecar fast path overhead (should be <10ms for the check)
Push after bulk import (>200 dirty paths) | walk, total wall; verifies dirty cap triggers full walk fallback
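The timing expectations in the push scenarios (a scoped walk of roughly 3-5ms, a fast-path check under 10ms) could be encoded as explicit budgets that the harness checks after each run. The sketch below is an assumption about how that might look; `Budget`, `pushBudgets`, and `checkBudgets` are hypothetical names, and the limits are the approximate figures quoted above rather than agreed thresholds.

```typescript
// Hypothetical budget entry: a phase of a scenario should finish within maxMs.
interface Budget {
  scenario: string;
  phase: string;
  maxMs: number;
}

// Approximate limits taken from the push scenario descriptions (assumed values).
const pushBudgets: Budget[] = [
  { scenario: "Push 1 modified / 1000", phase: "walk", maxMs: 5 },
  { scenario: "No-op push", phase: "fastpath", maxMs: 10 },
];

// measured maps scenario name -> phase name -> milliseconds.
// Returns one message per budget that is missing a measurement or exceeded.
function checkBudgets(
  budgets: Budget[],
  measured: Record<string, Record<string, number>>,
): string[] {
  const failures: string[] = [];
  for (const b of budgets) {
    const ms = measured[b.scenario]?.[b.phase];
    if (ms === undefined) {
      failures.push(`${b.scenario}/${b.phase}: no measurement`);
    } else if (ms > b.maxMs) {
      failures.push(`${b.scenario}/${b.phase}: ${ms}ms exceeds ${b.maxMs}ms`);
    }
  }
  return failures;
}
```

Absolute budgets like these depend on the machine and backend, so they are probably better treated as warnings than as hard failures.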

Pull benchmarks

Scenario | What it measures
Pull 1000 files (cold) | pullIndex, download, total wall; baseline bulk pull
Pull 1 changed / 1000 | walk, download, total wall; incremental pull
No-op pull | fastpath, total wall; sidecar fast path overhead
Scoped pull 1 model / 50 models | index read, walk, download, total; partition file vs monolithic (should be ~3-4x faster)

Writeback benchmarks

Scenario | What it measures
Writeback with partitions (parallel) | writeback phase only; validates partition writes are parallelized

Cross-version comparison

The harness should support running the same scenarios against two different extension builds:

  • OLD: current published extension via swamp extension pull
  • NEW: local build via swamp extension source add

Each runs in a separate repo to avoid interference. Results are printed side-by-side with delta percentages.
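The side-by-side output with delta percentages might be computed along these lines. This is a sketch under stated assumptions: per-phase timings are assumed to be available as a plain phase-to-milliseconds map, and `compareTimings` and `formatComparison` are hypothetical helpers, not part of swamp.

```typescript
// Phase name -> milliseconds for one build (assumed shape).
type PhaseTimings = Record<string, number>;

interface ComparisonRow {
  phase: string;
  oldMs: number;
  newMs: number;
  deltaPct: number; // negative means NEW is faster than OLD
}

// Pair up phases present in both runs and compute the relative change.
function compareTimings(oldRun: PhaseTimings, newRun: PhaseTimings): ComparisonRow[] {
  const rows: ComparisonRow[] = [];
  for (const [phase, oldMs] of Object.entries(oldRun)) {
    const newMs = newRun[phase];
    if (newMs === undefined) continue; // phase not reported by the NEW build
    // A zero baseline has no meaningful percentage change, so report NaN.
    const deltaPct = oldMs === 0 ? NaN : ((newMs - oldMs) / oldMs) * 100;
    rows.push({ phase, oldMs, newMs, deltaPct });
  }
  return rows;
}

// Render the rows as a fixed-width plain-text table.
function formatComparison(rows: ComparisonRow[]): string {
  const header = `${"phase".padEnd(12)}${"OLD ms".padStart(10)}${"NEW ms".padStart(10)}${"delta".padStart(10)}`;
  const lines = rows.map((r) => {
    const sign = r.deltaPct > 0 ? "+" : "";
    const delta = `${sign}${r.deltaPct.toFixed(1)}%`;
    return `${r.phase.padEnd(12)}${r.oldMs.toFixed(1).padStart(10)}${r.newMs.toFixed(1).padStart(10)}${delta.padStart(10)}`;
  });
  return [header, ...lines].join("\n");
}
```

For example, `formatComparison(compareTimings({ walk: 100 }, { walk: 50 }))` would show the walk phase with a negative delta, indicating the NEW build is faster.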

Infrastructure

  • S3: MinIO running locally (docker). The benchmark script should start/stop MinIO automatically or accept a pre-running endpoint.
  • GCS: GCS emulator (e.g., fsouza/fake-gcs-server) or real bucket with credentials. Accept endpoint/bucket via env vars.
  • Isolation: Each benchmark run uses a unique bucket prefix. Cleanup runs automatically after benchmarks complete.
  • Runs: Each scenario should run 3-5 trials and report min/avg/max to reduce noise.
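The isolation and repeated-trial requirements above can be sketched as a few small helpers. The names (`summarizeTrials`, `runTrials`, `uniquePrefix`) and the `bench/` prefix format are assumptions for illustration; `Date.now()` is used only to keep the sketch runtime-agnostic, and a real harness would likely prefer a higher-resolution clock or the per-phase trace output.

```typescript
interface TrialStats {
  min: number;
  avg: number;
  max: number;
  trials: number;
}

// Reduce raw per-trial timings (milliseconds) to min/avg/max.
function summarizeTrials(samplesMs: number[]): TrialStats {
  if (samplesMs.length === 0) {
    throw new Error("at least one trial is required");
  }
  const sum = samplesMs.reduce((a, b) => a + b, 0);
  return {
    min: Math.min(...samplesMs),
    avg: sum / samplesMs.length,
    max: Math.max(...samplesMs),
    trials: samplesMs.length,
  };
}

// Run a scenario several times sequentially and summarize total wall time.
async function runTrials(
  scenario: () => Promise<void>,
  trials = 3,
): Promise<TrialStats> {
  const samples: number[] = [];
  for (let i = 0; i < trials; i++) {
    const start = Date.now();
    await scenario();
    samples.push(Date.now() - start);
  }
  return summarizeTrials(samples);
}

// Build a bucket prefix that is very unlikely to collide between runs
// (timestamp plus a short random suffix; the format is hypothetical).
function uniquePrefix(label: string): string {
  const stamp = Date.now().toString(36);
  const rand = Math.random().toString(36).slice(2, 8);
  return `bench/${label}-${stamp}-${rand}/`;
}
```

Reporting the minimum alongside the average is useful because the minimum is usually the sample least affected by background noise.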

Acceptance Criteria

  • deno task benchmark:s3 runs all scenarios against MinIO and prints results table
  • deno task benchmark:gcs runs all scenarios against GCS emulator and prints results table
  • Cross-version comparison mode works (old vs new extension side-by-side)
  • Results are deterministic enough to detect a 2x regression in any phase
  • Can be run in CI (optional: not required for the initial implementation, but the harness should be CI-friendly)
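The "detect a 2x regression in any phase" criterion could be checked mechanically, which would also make the harness easy to wire into CI later. The following is a sketch only: `findRegressions` is a hypothetical helper, and comparing a single summary value per phase (for instance the minimum over trials) is an assumption rather than something the issue specifies.

```typescript
// Return the phases whose candidate timing is at least `factor` times the
// baseline timing. Inputs map phase name -> milliseconds.
function findRegressions(
  baseline: Record<string, number>,
  candidate: Record<string, number>,
  factor = 2,
): string[] {
  const regressed: string[] = [];
  for (const [phase, baseMs] of Object.entries(baseline)) {
    const candMs = candidate[phase];
    // Skip phases missing from the candidate and zero baselines,
    // where a ratio is not meaningful.
    if (candMs === undefined || baseMs <= 0) continue;
    if (candMs >= baseMs * factor) {
      regressed.push(phase);
    }
  }
  return regressed;
}
```

A CI job could fail when the returned list is non-empty, for example `findRegressions({ walk: 4 }, { walk: 9 })` flags `walk`.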
Bog Flow
Shipped

5/25/2026, 2:24:02 PM

Sludge Pulse
stack72 assigned stack72 · 5/24/2026, 10:06:20 PM
