Databricks
Security Notice
This extension includes AI agent skills that can modify AI assistant behavior. Review the skill files before installing.
Databricks Jobs, DLT pipelines, SQL warehouses, workspace notebooks/files, secrets, Unity Catalog, permissions, DBSQL queries, and Git Repos as Swamp models. Compose Databricks pipelines with non-Databricks resources in Swamp workflows.
v0.18 (2026.05.30.18) - MLflow + UC Model Registry models
Four new models for the ML training-to-deployment lifecycle:
- @mfbaig35r/databricks/mlflow_experiment: create, read, update, set_tag, delete, list, create_or_update. Workspace MLflow experiment lifecycle. Smoke-validated end-to-end on Free.
- @mfbaig35r/databricks/registered_model: create, read, update, delete, list, create_or_update. UC Model Registry (the successor to the legacy workspace registry). Smoke-validated end-to-end on Free.
- @mfbaig35r/databricks/model_version: create (from MLflow run), read, update_alias (production/staging/champion), delete, list. Schema-validated; end-to-end needs an MLflow run with a logged model artifact (run ml-training template first to produce one).
- @mfbaig35r/databricks/model_serving_endpoint: create, read, update_config, delete, list, invoke. PAID DATABRICKS ONLY - schema-validated; end-to-end validation is pending until a paid workspace is available.
ml-training template refactored to use the new models:
- mlflow_experiment.create_or_update at the top (ensures experiment exists before the notebook calls mlflow.start_run)
- registered_model + model_version + update_alias steps after the training job (commented out by default with TODO for the user to enable after a successful run produces a run_id)
- model_serving_endpoint.create step at the end (commented out, flagged paid-only)
SKILL.md updated: ML training section now documents the explicit MLflow lifecycle as Swamp steps instead of "lives inside the notebook."
Total models: 19. Closes the "MLflow models are on the roadmap" item from the v0.17 close-out. Archive grows ~17 KB (4 new models + template + skill updates).
| Argument | Type | Description |
|---|---|---|
| name | string | |
| tasks | array | |
| job_clusters? | array | |
| schedule? | object | |
| tags? | record | |
| timeout_seconds? | number | |
| max_concurrent_runs? | number | |
| queue? | object |
| Argument | Type | Description |
|---|---|---|
| job_ref | string |
| Argument | Type | Description |
|---|---|---|
| job_ref | string |
| Argument | Type | Description |
|---|---|---|
| job_ref | string |
| Argument | Type | Description |
|---|---|---|
| job_ref | string | |
| job_parameters? | record | |
| notebook_params? | record | |
| idempotency_token? | string |
| Argument | Type | Description |
|---|---|---|
| run_id | number | |
| poll_seconds | number | |
| timeout_seconds | number |
| Argument | Type | Description |
|---|---|---|
| run_id | number |
| Argument | Type | Description |
|---|---|---|
| name | string | |
| tasks | array | |
| job_clusters? | array | |
| schedule? | object | |
| tags? | record | |
| timeout_seconds? | number | |
| max_concurrent_runs? | number | |
| queue? | object |
Resources
| Argument | Type | Description |
|---|---|---|
| path | string | |
| content | string | Raw notebook source text (NOT base64) |
| language | enum | |
| overwrite | boolean |
| Argument | Type | Description |
|---|---|---|
| path | string | |
| format | enum |
| Argument | Type | Description |
|---|---|---|
| path | string | |
| recursive | boolean |
Resources
| Argument | Type | Description |
|---|---|---|
| name | string | |
| storage? | string | DBFS/UC volume path for pipeline storage. Optional on Free/serverless. |
| configuration? | record | Spark conf passed into the pipeline runtime |
| catalog? | string | Unity Catalog target catalog (use with target schema) |
| target? | string | Default schema/database for pipeline outputs |
| libraries | array | Notebooks or files that define the pipeline |
| clusters? | array | |
| continuous? | boolean | true = streaming, false = triggered (default false) |
| development? | boolean | true = dev mode (no auto-restart on failure) |
| photon? | boolean | |
| edition? | enum | DLT pricing edition; ignored on Free/serverless |
| channel? | enum | |
| serverless? | boolean | Use serverless compute (required on Databricks Free) |
| Argument | Type | Description |
|---|---|---|
| pipeline_ref | string |
| Argument | Type | Description |
|---|---|---|
| pipeline_ref | string |
| Argument | Type | Description |
|---|---|---|
| pipeline_ref | string |
| Argument | Type | Description |
|---|---|---|
| pipeline_ref | string | |
| full_refresh | boolean | |
| full_refresh_selection? | array | Subset of tables to fully refresh |
| refresh_selection? | array | Subset of tables to incrementally refresh |
| cause? | string |
| Argument | Type | Description |
|---|---|---|
| pipeline_ref | string | |
| update_id | string | |
| poll_seconds | number | |
| timeout_seconds | number |
| Argument | Type | Description |
|---|---|---|
| pipeline_ref | string |
| Argument | Type | Description |
|---|---|---|
| name | string | |
| storage? | string | DBFS/UC volume path for pipeline storage. Optional on Free/serverless. |
| configuration? | record | Spark conf passed into the pipeline runtime |
| catalog? | string | Unity Catalog target catalog (use with target schema) |
| target? | string | Default schema/database for pipeline outputs |
| libraries | array | Notebooks or files that define the pipeline |
| clusters? | array | |
| continuous? | boolean | true = streaming, false = triggered (default false) |
| development? | boolean | true = dev mode (no auto-restart on failure) |
| photon? | boolean | |
| edition? | enum | DLT pricing edition; ignored on Free/serverless |
| channel? | enum | |
| serverless? | boolean | Use serverless compute (required on Databricks Free) |
Resources
| Argument | Type | Description |
|---|---|---|
| name | string | |
| min_num_clusters | number | |
| max_num_clusters | number | |
| auto_stop_mins | number | Minutes idle before auto-stop. 0 disables auto-stop. |
| enable_photon? | boolean | |
| enable_serverless_compute | boolean | Required on Databricks Free (serverless-only). |
| warehouse_type | enum | |
| spot_instance_policy? | enum | |
| channel? | object | |
| tags? | object |
| Argument | Type | Description |
|---|---|---|
| warehouse_ref | string |
| Argument | Type | Description |
|---|---|---|
| name | string | |
| warehouse_id | string |
| Argument | Type | Description |
|---|---|---|
| warehouse_ref | string |
| Argument | Type | Description |
|---|---|---|
| warehouse_ref | string |
| Argument | Type | Description |
|---|---|---|
| warehouse_ref | string |
| Argument | Type | Description |
|---|---|---|
| warehouse_ref | string |
| Argument | Type | Description |
|---|---|---|
| warehouse_ref | string | |
| statement | string | |
| catalog? | string | |
| schema? | string | |
| wait_timeout_seconds | number | 0 = async (returns statement_id, poll with wait_statement). 5-50 = sync wait. |
| on_wait_timeout | enum | |
| row_limit? | number |
| Argument | Type | Description |
|---|---|---|
| statement_id | string | |
| poll_seconds | number | |
| timeout_seconds | number |
| Argument | Type | Description |
|---|---|---|
| statement_id | string |
| Argument | Type | Description |
|---|---|---|
| name | string | |
| min_num_clusters | number | |
| max_num_clusters | number | |
| auto_stop_mins | number | Minutes idle before auto-stop. 0 disables auto-stop. |
| enable_photon? | boolean | |
| enable_serverless_compute | boolean | Required on Databricks Free (serverless-only). |
| warehouse_type | enum | |
| spot_instance_policy? | enum | |
| channel? | object | |
| tags? | object |
Resources
| Argument | Type | Description |
|---|---|---|
| path | string | |
| content | string | Raw file content (NOT base64). UTF-8 text only in v0.6. |
| overwrite | boolean |
| Argument | Type | Description |
|---|---|---|
| path | string |
| Argument | Type | Description |
|---|---|---|
| path | string |
Resources
| Argument | Type | Description |
|---|---|---|
| scope | string | |
| initial_manage_principal? | string | Principal granted MANAGE on the scope (e.g. 'users'). |
| scope_backend_type? | enum | DATABRICKS = workspace-managed (default). AZURE_KEYVAULT only on Azure. |
| Argument | Type | Description |
|---|---|---|
| scope | string | |
| initial_manage_principal? | string | Principal granted MANAGE on the scope (e.g. 'users'). |
| scope_backend_type? | enum | DATABRICKS = workspace-managed (default). AZURE_KEYVAULT only on Azure. |
| Argument | Type | Description |
|---|---|---|
| scope_ref | string |
Resources
| Argument | Type | Description |
|---|---|---|
| scope | string | |
| key | string | |
| string_value | string | The secret value. Pass via CEL vault.get to avoid surfacing the literal: |
| Argument | Type | Description |
|---|---|---|
| scope | string | |
| key | string |
| Argument | Type | Description |
|---|---|---|
| scope | string |
Resources
| Argument | Type | Description |
|---|---|---|
| name | string | |
| comment? | string | |
| properties? | record | |
| storage_root? | string | Managed-storage URI for the catalog (s3://, abfss://, gs://). |
| Argument | Type | Description |
|---|---|---|
| catalog_ref | string |
| Argument | Type | Description |
|---|---|---|
| catalog_ref | string | |
| new_name? | string | |
| comment? | string | |
| owner? | string | |
| properties? | record |
| Argument | Type | Description |
|---|---|---|
| catalog_ref | string | |
| force | boolean |
| Argument | Type | Description |
|---|---|---|
| name | string | |
| comment? | string | |
| properties? | record | |
| storage_root? | string | Managed-storage URI for the catalog (s3://, abfss://, gs://). |
Resources
| Argument | Type | Description |
|---|---|---|
| name | string | Schema name (NOT full_name) |
| catalog_name | string | Parent catalog (e.g. 'workspace' on Free) |
| comment? | string | |
| properties? | record | |
| storage_root? | string | External storage root (managed-storage UC volumes path) |
| Argument | Type | Description |
|---|---|---|
| schema_ref | string |
| Argument | Type | Description |
|---|---|---|
| schema_ref | string | |
| new_name? | string | |
| comment? | string | |
| owner? | string | |
| properties? | record |
| Argument | Type | Description |
|---|---|---|
| schema_ref | string | |
| force | boolean |
| Argument | Type | Description |
|---|---|---|
| catalog_name | string |
| Argument | Type | Description |
|---|---|---|
| name | string | Schema name (NOT full_name) |
| catalog_name | string | Parent catalog (e.g. 'workspace' on Free) |
| comment? | string | |
| properties? | record | |
| storage_root? | string | External storage root (managed-storage UC volumes path) |
Resources
| Argument | Type | Description |
|---|---|---|
| full_name | string |
| Argument | Type | Description |
|---|---|---|
| full_name | string |
| Argument | Type | Description |
|---|---|---|
| catalog_name | string | |
| schema_name | string | |
| max_results? | number |
Resources
| Argument | Type | Description |
|---|---|---|
| name | string | |
| catalog_name | string | |
| schema_name | string | |
| comment? | string | |
| storage_location? | string | Required for EXTERNAL volumes (cloud URI). |
| Argument | Type | Description |
|---|---|---|
| volume_ref | string |
| Argument | Type | Description |
|---|---|---|
| volume_ref | string | |
| new_name? | string | |
| comment? | string | |
| owner? | string |
| Argument | Type | Description |
|---|---|---|
| volume_ref | string |
| Argument | Type | Description |
|---|---|---|
| catalog_name | string | |
| schema_name | string | |
| max_results? | number |
| Argument | Type | Description |
|---|---|---|
| name | string | |
| catalog_name | string | |
| schema_name | string | |
| comment? | string | |
| storage_location? | string | Required for EXTERNAL volumes (cloud URI). |
Resources
| Argument | Type | Description |
|---|---|---|
| object_id | string |
| Argument | Type | Description |
|---|---|---|
| object_id | string | |
| access_control_list | array |
| Argument | Type | Description |
|---|---|---|
| object_id | string | |
| access_control_list | array |
| Argument | Type | Description |
|---|---|---|
| object_id | string | Sample object id; needed because levels are returned per-object |
Resources
| Argument | Type | Description |
|---|---|---|
| full_name | string | Securable identifier: '<catalog>' for catalog, '<catalog>.<schema>' for schema, '<catalog>.<schema>.<table>' for table/volume/function |
| Argument | Type | Description |
|---|---|---|
| full_name | string | Securable identifier: '<catalog>' for catalog, '<catalog>.<schema>' for schema, '<catalog>.<schema>.<table>' for table/volume/function |
| Argument | Type | Description |
|---|---|---|
| full_name | string | |
| changes | array |
Resources
| Argument | Type | Description |
|---|---|---|
| name | string | |
| query | string | SQL text |
| warehouse_id | string | SQL warehouse the query runs against (NOT the data_source_id) |
| description? | string | |
| parent? | string | Folder path in the workspace, e.g. 'folders/<id>' |
| run_as_role? | enum | |
| tags? | array |
| Argument | Type | Description |
|---|---|---|
| query_ref | string |
| Argument | Type | Description |
|---|---|---|
| query_ref | string | |
| name? | string | |
| query? | string | |
| warehouse_id? | string | |
| description? | string | |
| run_as_role? | enum | |
| tags? | array |
| Argument | Type | Description |
|---|---|---|
| query_ref | string |
| Argument | Type | Description |
|---|---|---|
| page_size? | number |
Resources
| Argument | Type | Description |
|---|---|---|
| name | string | Swamp-side handle. Workspace path becomes /Repos/<user>/<name> unless `path` is set. |
| url | string | Git repository URL |
| path? | string | Absolute workspace path. Defaults to /Repos/<current-user>/<name>. |
| branch? | string | Initial branch to check out. Defaults to remote default. |
| Argument | Type | Description |
|---|---|---|
| repo_ref | string |
| Argument | Type | Description |
|---|---|---|
| repo_ref | string | |
| branch? | string | New branch to check out. Triggers a pull. Mutually exclusive with `tag`. |
| tag? | string | Tag to check out. Mutually exclusive with `branch`. |
| Argument | Type | Description |
|---|---|---|
| repo_ref | string |
| Argument | Type | Description |
|---|---|---|
| repo_ref | string |
| Argument | Type | Description |
|---|---|---|
| path_prefix? | string | Filter by workspace path prefix (e.g. /Repos/me) |
| next_page_token? | string |
Resources
| Argument | Type | Description |
|---|---|---|
| name | string | Workspace path, e.g. '/Shared/experiments/churn-model' |
| artifact_location? | string | Where to store run artifacts. Defaults to workspace-managed location. |
| tags? | array |
| Argument | Type | Description |
|---|---|---|
| experiment_ref | string |
| Argument | Type | Description |
|---|---|---|
| experiment_ref | string | |
| new_name? | string |
| Argument | Type | Description |
|---|---|---|
| experiment_ref | string | |
| key | string | |
| value | string |
| Argument | Type | Description |
|---|---|---|
| experiment_ref | string |
| Argument | Type | Description |
|---|---|---|
| filter? | string | MLflow filter syntax, e.g. \ |
| max_results | number | |
| view_type | enum |
| Argument | Type | Description |
|---|---|---|
| name | string | Workspace path, e.g. '/Shared/experiments/churn-model' |
| artifact_location? | string | Where to store run artifacts. Defaults to workspace-managed location. |
| tags? | array |
Resources
| Argument | Type | Description |
|---|---|---|
| name | string | Model name (NOT full_name) |
| catalog_name | string | |
| schema_name | string | |
| comment? | string | |
| storage_location? | string | External storage URI for the model versions. UC-managed if omitted. |
| Argument | Type | Description |
|---|---|---|
| model_ref | string |
| Argument | Type | Description |
|---|---|---|
| model_ref | string | |
| comment? | string | |
| owner? | string |
| Argument | Type | Description |
|---|---|---|
| model_ref | string |
| Argument | Type | Description |
|---|---|---|
| catalog_name | string | |
| schema_name | string | |
| max_results? | number |
| Argument | Type | Description |
|---|---|---|
| name | string | Model name (NOT full_name) |
| catalog_name | string | |
| schema_name | string | |
| comment? | string | |
| storage_location? | string | External storage URI for the model versions. UC-managed if omitted. |
Resources
| Argument | Type | Description |
|---|---|---|
| model_ref | string | Name of the registered_model Swamp resource (uses its full_name) |
| source | string | URI of the MLflow run's model artifact, |
| run_id? | string | Originating MLflow run ID. Recommended for lineage. |
| comment? | string |
| Argument | Type | Description |
|---|---|---|
| model_ref | string | |
| version | number |
| Argument | Type | Description |
|---|---|---|
| model_ref | string | |
| version | number | |
| alias | string | Alias to set, e.g. 'production', 'staging', 'champion' |
| Argument | Type | Description |
|---|---|---|
| model_ref | string | |
| version | number |
| Argument | Type | Description |
|---|---|---|
| model_ref | string | |
| max_results? | number |
Resources
| Argument | Type | Description |
|---|---|---|
| name | string | |
| config | object | |
| tags? | array |
| Argument | Type | Description |
|---|---|---|
| endpoint_ref | string |
| Argument | Type | Description |
|---|---|---|
| endpoint_ref | string | |
| config | object |
| Argument | Type | Description |
|---|---|---|
| endpoint_ref | string |
| Argument | Type | Description |
|---|---|---|
| endpoint_ref | string | |
| dataframe_records? | array | |
| dataframe_split? | object | |
| instances? | array | |
| inputs? | unknown |
Resources
v0.17 (2026.05.30.17) - Phase 4 smoke fixes
Real flags surfaced by Phase 4 smoke test (fresh subagent given a Stripe-charges-ingest prompt with no skill context). Two template fixes, one documentation reinforcement.
- api-ingest-authenticated template: explicit warning comments in both standalone notebook.py and inlined workflow.yaml content pointing out that mode("overwrite") silently wipes incremental data; switch to mode("append") + MERGE INTO if going incremental.
- All four templates: added an optional uc_permissions.update step at the end (commented out, with TODO). SKILL.md item #6 says to ask about downstream consumers, but no template had a structural slot for grants. Now they do.
- agent-templates/README.md: explicit drift-warning section about the notebook.py / inlined-workflow.yaml copy drift.
No skill changes (skill correctly forked the right template and applied disciplines; flags were template content, not skill behavior). No model changes.
Phase 4 verdict: skill loop works end-to-end. Smoke surfaced three template-level improvements without hitting any skill or model bug.
v0.16 (2026.05.30.16) - documentation cross-linking
Documentation-only release. Cross-links the agent-templates library from the discoverable entry points:
- Main README: Examples section now lists agent-templates/ alongside api-ingest/
- examples/api-ingest/README.md: authenticated-API note points at examples/agent-templates/api-ingest-authenticated/
- examples/api-ingest/met-museum/README.md: "for authenticated APIs" section points at the template instead of inline instructions
AGI's README at github.com/mfbaig/artificial-general-ingestion now links back to the file-ingest-from-s3 template as the recommended orchestration story (AGI owns the ingest algorithm; Swamp pack owns declarative composition).
No code, no skill, no template changes. Archive 96.5 KB -> ~98 KB.
v0.15 (2026.05.30.15) - agent template library
Adds four canonical templates under examples/agent-templates/ that the swamp-databricks-author skill points at when generating new workflows:
- file-ingest-from-s3: CSV/Parquet/JSON from S3 to UC Bronze + Silver, with secret_scope-managed AWS creds; production schema detection delegates to AGI's ingest_to_evidence
- api-ingest-authenticated: rate-limited API ingest with Bearer auth and cursor pagination; Stripe customers as the worked example
- dbt-run: scheduled dbt deps + run + test via the repo + dbt_task models; no custom notebook
- ml-training: read training data from UC, train + MLflow autolog + optional predictions write; scikit-learn placeholder ready to swap
Each template ships README + workflow.yaml in the registry archive (the workflow.yaml has the notebook content inlined). The standalone notebook.py for each lives only on GitHub, per the safety analyzer's .ts/.json/.md/.yaml/.yml/.txt allowlist.
SKILL.md updated to cross-reference the templates explicitly. Skill triggers + activation rules unchanged.
Archive grew from 82.4 KB to ~98 KB. Models, code, and bug fixes unchanged from v0.14.
v0.14 (2026.05.30.14) - swamp-databricks-author skill
Adds the @mfbaig35r/databricks/swamp-databricks-author skill: a Claude Code skill that guides agents through generating a runnable Databricks pipeline (notebook + workflow.yaml + smoke test plan) from a task description.
The skill activates on prompts like "create a Databricks notebook that pulls X into UC", "build a Swamp workflow that ingests Y", "ingest <files|api> into Databricks". It encodes the canonical patterns this pack supports: file ingest (delegating schema detection to AGI when available), rate-limited API ingest (following the Met Museum reference), dbt runs via the repo + dbt_task models, and ML training.
Non-negotiable disciplines baked in:
- Smoke test with bounded input before scheduling
- REVIEW CHECKLIST output before commit
- create_or_update for resource steps (idempotent reruns)
- mapInPandas, not rdd.mapPartitions (serverless restriction)
- Bronze raw JSON for schema-drift immunity
- CEL vault.get for credentials; never inline secrets in YAML
Two references shipped alongside:
- api-ingest-patterns.md: cursor / Link / offset pagination and OAuth refresh patterns
- dbt-task.md: dbt_task schema, common command patterns, failure modes, smoke test approach
Archive grew from 72.7 KB to 82.0 KB. The skill and its references ship in the pulled archive alongside the models.
No model code changes from v0.13.
Added 1 skills
v0.13 (2026.05.30.13) - examples actually ship in the archive
Packaging fix. v0.12 added examples/api-ingest/ to the GitHub repo but the manifest's additionalFiles only listed README.md and LICENSE.txt, so the published archive on swamp.club did not include the examples themselves. The README that shipped referenced examples/ paths that did not resolve in pulled copies.
v0.13 lists the publishable example files in additionalFiles:
- examples/api-ingest/README.md
- examples/api-ingest/met-museum/README.md
- examples/api-ingest/met-museum/workflow.yaml
The workflow.yaml is self-contained (notebook Python and Silver SQL are inlined), so running the example from a pulled copy works.
The standalone notebook-bronze.py and silver-typed.sql files stay GitHub-only because Swamp's safety analyzer restricts the archive to .ts .json .md .yaml .yml .txt extensions. The Met README links to GitHub for those two files.
No code changes from v0.12.
v0.12 (2026.05.30.12) - examples/api-ingest reference implementation
Adds a documented reference pattern for the most common Databricks question: "how do I pull data from an external API into a Delta table?"
examples/api-ingest/README.md describes the universal architecture:
- Bronze raw_json + metadata columns (schema-drift immune)
- Silver typed extraction via SQL
- Rate-limited Spark fan-out using mapInPandas (NOT rdd.mapPartitions; serverless rejects RDD ops)
- Token bucket per partition, single Session per partition, retries on 429/5xx only
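The rate-limiting bullets above can be sketched in a few lines of Python. This is a minimal illustration, not the notebook shipped with the example: the `TokenBucket` class, its parameters, and the `should_retry` helper are hypothetical names, and the injectable `clock`/`sleep` arguments exist only so the behavior can be checked without real waiting.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical token bucket: refills `rate` tokens per second up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full
        self._clock = clock
        self._sleep = sleep
        self._last = clock()

    def acquire(self) -> None:
        """Block until one token is available, then consume it."""
        while True:
            now = self._clock()
            # Refill in proportion to elapsed time, capped at capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self._last) * self.rate)
            self._last = now
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return
            # Sleep just long enough for the missing fraction to refill.
            self._sleep((1.0 - self.tokens) / self.rate)


def should_retry(status_code: int) -> bool:
    """Retry only on 429 (rate limited) and 5xx (server errors)."""
    return status_code == 429 or 500 <= status_code <= 599
```

In a per-partition function, one bucket (and one HTTP session) would be created at the top of the function and `acquire()` called before each request, so each partition throttles itself independently.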
examples/api-ingest/met-museum/ is the runnable reference. The Met Museum API is public (no auth), so the workflow runs on Databricks Free out of the box.
- workflow.yaml: full Swamp workflow that ties uc_schema, notebook, job, sql_warehouse, and uc_table together end-to-end.
- notebook-bronze.py: standalone reviewable notebook source.
- silver-typed.sql: SQL transform from Bronze to Silver.
- README.md: Met-specific runtime expectations, limitations.
Validated end-to-end with MAX_OBJECTS=10 on Databricks Free serverless: TERMINATED + SUCCESS in ~30 sec wall, 10 rows in Bronze (100% success), 10 typed rows in Silver including real Met records (e.g. "One-dollar Liberty Head Coin" by James Barton Longacre).
No code changes from v0.11. No new models. Documentation-only release.
v0.11 (2026.05.30.11) - bug fix
Fix: create_or_update tombstone bug. Previously, after a delete,
readResource returned tombstoned data instead of null, so create_or_update
took the PATCH/PUT/reset path against a workspace resource that no
longer existed and got a 404. Affected: job, dlt_pipeline, sql_warehouse,
secret_scope, uc_catalog, uc_schema, uc_volume.
Fix is workspace-first reconciliation: each create_or_update now checks the workspace via GET (or list for secret_scope) before deciding which path to take. New helper existsOnWorkspace in _lib/databricks.ts. Smoke-validated on Free: uc_schema and job delete-then-create_or_update both correctly take create path now.
Side benefit: also handles out-of-band workspace deletes (someone deletes the job via UI, then create_or_update via Swamp correctly recreates instead of failing).
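The workspace-first decision described above can be illustrated with a small sketch. The pack itself is written in TypeScript and the signature of existsOnWorkspace is not shown here, so the Python function below, its name, and the shape of the stored record are all assumptions made for illustration.

```python
from typing import Callable, Optional


def choose_path(stored: Optional[dict],
                exists_on_workspace: Callable[[str], bool]) -> str:
    """Decide which path a create_or_update call should take.

    `stored` stands in for whatever the data layer returns (possibly
    tombstoned data after a delete). `exists_on_workspace` stands in for
    the live lookup (a GET, or a list for secret scopes).
    """
    if stored is None or "id" not in stored:
        return "create"
    # Trust the workspace rather than the stored record: tombstones and
    # out-of-band deletes both leave stale data behind.
    return "update" if exists_on_workspace(stored["id"]) else "create"
```

With a live set of `{"job-1"}`, a stored record for `job-1` takes the update path, while a tombstoned record for a deleted `job-2` falls through to create instead of issuing a PATCH that would 404.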
No breaking changes from v0.10.
v0.10 (2026.05.30.10) - Phase 3 (query + repo)
Two new models close the last two critical gaps from the v0.7 review:
@mfbaig35r/databricks/query (create, read, update, delete, list): manages DBSQL saved queries via /api/2.0/sql/queries. The query_id that this model returns pairs with the job model's sql_task.query.query_id field.
@mfbaig35r/databricks/repo (create, read, update, pull, delete, list): manages Databricks Git Repos via /api/2.0/repos. Real Databricks jobs reference notebooks via repo paths rather than uploading to /Shared/.
pull re-sends the stored branch in the PATCH body (Databricks rejects empty-body PATCH).
Smoke validated end-to-end on Databricks Free serverless:
- query: list (empty), create with SELECT 1 on Starter warehouse, read, list (1 visible), delete
- repo: create against github.com/mfbaig35r/swamp-databricks (public, no Git PAT setup required), read, pull (after fix), delete
No breaking changes from v0.9.
Total models: 15. End-to-end smoke validated since v0.1: job, notebook, dlt_pipeline, sql_warehouse, workspace_file, secret_scope, secret, uc_schema, uc_table, uc_volume, uc_catalog (list+schema-validated create), workspace_permissions, uc_permissions, query, repo. uc_table has no create (use sql_warehouse.run_query).
Added 2 models. updated labels
v0.9 (2026.05.30.9) - Phase 2 (permissions)
Phase 1 (uc_catalog + idempotency):
- New model: @mfbaig35r/databricks/uc_catalog (create, read, update, delete, list, create_or_update). Completes the UC top-down tree.
- New create_or_update method on: job, dlt_pipeline, sql_warehouse, secret_scope, uc_catalog, uc_schema, uc_volume. Reconcile semantics via Swamp data layer.
Phase 2 (permissions):
- New model: @mfbaig35r/databricks/workspace_permissions (get, set, update, list_levels). Workspace-level ACLs for jobs, pipelines, warehouses, notebooks, repos, queries, dashboards, alerts, experiments, registered-models, serving-endpoints, clusters, cluster-policies, instance-pools.
- New model: @mfbaig35r/databricks/uc_permissions (get, get_effective, update). UC grants on catalogs, schemas, tables, volumes, functions, external_locations, storage_credentials, models. Changes-style PATCH (add/remove per principal).
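A changes-style update means sending only the difference per principal rather than the full grant list. The sketch below computes that difference in Python. It assumes each entry carries `principal`, `add`, and `remove` keys, following the shape of Databricks' Unity Catalog permissions PATCH body; the `build_changes` helper and its inputs are hypothetical and are not part of the model.

```python
from typing import Dict, List, Set


def build_changes(current: Dict[str, Set[str]],
                  desired: Dict[str, Set[str]]) -> List[dict]:
    """Diff current vs. desired grants into add/remove entries per principal."""
    changes: List[dict] = []
    for principal in sorted(set(current) | set(desired)):
        have = current.get(principal, set())
        want = desired.get(principal, set())
        entry: dict = {"principal": principal}
        if want - have:
            entry["add"] = sorted(want - have)      # privileges to grant
        if have - want:
            entry["remove"] = sorted(have - want)   # privileges to revoke
        if len(entry) > 1:                          # skip principals with no change
            changes.append(entry)
    return changes
```

The resulting list could be passed as the `changes` argument of an update step. Principals whose grants already match produce no entry, which keeps reruns quiet.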
Smoke validated end-to-end on Free serverless:
- uc_schema.create_or_update fresh create + same-call patch
- workspace_permissions.list_levels (5 levels for sql/warehouses), get on Starter Warehouse, update to grant CAN_VIEW on a job to the users group
- uc_permissions.get on workspace catalog, update to grant USE_SCHEMA on a smoke schema to account users, verify
Tombstone caveat (Phase 1): create_or_update checks Swamp data layer not workspace. After delete, the resource is tombstoned and create_or_update will mistakenly take the update path. Use create() explicitly for delete-then-recreate flows. v0.9 candidate fix: workspace-first lookup.
uc_catalog.create requires storage_root on Free (no default metastore storage); ships as schema-validated only.
No breaking changes from v0.7.
Added 2 models. updated labels
v0.8 (2026.05.30.8) - Phase 1: uc_catalog + idempotency
New model: @mfbaig35r/databricks/uc_catalog (create, read, update, delete, list, create_or_update). Completes the UC top-down tree: catalog -> schema -> table/volume.
New create_or_update method on: job, dlt_pipeline, sql_warehouse, secret_scope, uc_catalog, uc_schema, uc_volume. Reconcile semantics: if a resource with args.name exists in Swamp's data layer, take the update path (PATCH/PUT/reset); otherwise create. Closes the "create errors on second run" gap that blocked real automation.
Tombstone caveat: create_or_update checks Swamp's data layer, not
the workspace. If you call delete and then create_or_update with
the same name, the second call will hit the patch path against a
workspace resource that no longer exists and fail. For now, delete +
recreate flows should use create explicitly. v0.9 may add a
workspace-first lookup variant.
notebook.upload and workspace_file.upload already have idempotency via
overwrite: true. secret.put is already idempotent (Databricks
replaces on put). uc_table has no create endpoint so no
create_or_update.
Smoke validated on Free serverless: catalog list (3 catalogs visible), schema create_or_update fresh-call (create path) + same-call-again (patch path), delete cleanup. uc_catalog.create requires storage_root on Free (no default metastore storage available); ships as schema-validated.
No breaking changes from v0.7.
Added 1, modified 6 models
v0.7 (2026.05.30.7)
Five new models, ten total.
Secrets (workspace-level, distinct from Swamp vault):
- @mfbaig35r/databricks/secret_scope: create, list, delete.
- @mfbaig35r/databricks/secret: put, delete, list (keys only, never values). Secret values pass through to Databricks and are NEVER persisted in Swamp's data layer. Pass values via CEL ${{ vault.get(...) }}.
Unity Catalog:
- @mfbaig35r/databricks/uc_schema: create, read, update, delete, list.
- @mfbaig35r/databricks/uc_table: read, delete, list. Tables are NOT created via this API; use sql_warehouse.run_query or a job notebook task to CREATE TABLE, then this model captures the snapshot.
- @mfbaig35r/databricks/uc_volume: create, read, update, delete, list.
No breaking changes from v0.6.
Added 5 models. updated labels
v0.6 (2026.05.30.6)
New model: @mfbaig35r/databricks/workspace_file. Owns upload, read (export), and delete for workspace files (FILE object type). Distinct from @mfbaig35r/databricks/notebook (which owns NOTEBOOK objects). Use this when a downstream task references a plain file at a workspace path: sql_task.file, spark_python_task.python_file, dbt project files, etc.
sql_task end-to-end validated. Closes the v0.5 gap: workspace_file upload -> job with sql_task.file -> run -> wait_run COMPLETED + SUCCESS on Databricks Free serverless.
upload uses POST /api/2.0/workspace/import with format=AUTO and no language, then calls /api/2.0/workspace/get-status to verify and record the resulting object_type on the resource. Modern workspaces produce FILE; older workspaces may produce NOTEBOOK depending on content, and the resource reflects what was created.
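As a rough illustration of the upload call described above, the following Python sketch builds a request body for /api/2.0/workspace/import with format=AUTO and no language field. The `build_import_payload` helper is hypothetical, and the default of `overwrite=True` is an assumption. The base64 step reflects the REST API's expectation that content arrives encoded, while callers of the model pass raw UTF-8 text.

```python
import base64


def build_import_payload(path: str, content: str, overwrite: bool = True) -> dict:
    """Build a JSON body for POST /api/2.0/workspace/import (sketch)."""
    return {
        "path": path,
        "format": "AUTO",   # let the workspace infer the object type
        # Raw text in, base64 out: the encoding happens here, not in the caller.
        "content": base64.b64encode(content.encode("utf-8")).decode("ascii"),
        "overwrite": overwrite,
        # No "language" key on purpose, matching the behavior described above.
    }
```

A follow-up call to /api/2.0/workspace/get-status on the same path would then report the object_type that was actually created.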
No breaking changes from v0.5.
Added 1 models
v0.5 (2026.05.30.5)
Expanded job task-type schema from 3 types to 10. Added: spark_python_task, spark_jar_task, python_wheel_task, dbt_task, run_job_task, condition_task, for_each_task (Zod recursive via z.lazy). sql_task now also covers dashboard and alert variants alongside query and file. Workflows that use any of these no longer reject at the schema layer.
End-to-end smoke validation in this release covers notebook_task only (matches what the Databricks Free smoke environment can reasonably exercise). Other task types are schema-validated; the Databricks API accepts the schemas at job-create time. If you hit an edge case on a real workload, open an issue at github.com/mfbaig35r/swamp-databricks.
Not yet covered: spark_submit_task, clean_rooms_notebook_task. Add in a later release if there is real demand.
No breaking changes from v0.4.
v0.4 (2026.05.30.4)
New model: @mfbaig35r/databricks/sql_warehouse. Lifecycle (create, read, update, delete, start, stop) plus SQL Statement Execution (run_query, wait_statement, cancel_statement). run_query waits synchronously up to 50 seconds; longer-running statements get a last_statement resource so wait_statement can take over.
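The handoff between run_query and wait_statement can be sketched as below. The field names and state values come from the Databricks SQL Statement Execution API. Mapping the 50-second wait onto wait_timeout with on_wait_timeout set to CONTINUE is an assumption about how the model might do it, and the function names are hypothetical.

```typescript
type StatementState = "PENDING" | "RUNNING" | "SUCCEEDED" | "FAILED" | "CANCELED" | "CLOSED";

// Body for POST /api/2.0/sql/statements: wait up to 50 seconds, then keep
// the statement running server-side instead of canceling it.
function buildExecuteBody(warehouseId: string, statement: string) {
  return {
    warehouse_id: warehouseId,
    statement,
    wait_timeout: "50s",
    on_wait_timeout: "CONTINUE" as const,
  };
}

// If the synchronous wait returns a non-terminal state, a later poll has to take over.
function needsWaitStatement(state: StatementState): boolean {
  return state === "PENDING" || state === "RUNNING";
}
```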
sql_warehouse.adopt method: register an existing workspace warehouse as a Swamp resource without creating a new one. Required pattern on Databricks Free where warehouse quotas are small and the workspace ships with a Starter Warehouse already.
This closes the DLT cleanup gap from v0.3: run_query can execute the DROP TABLE for tables materialized by a deleted pipeline.
No breaking changes from v0.3.
Added 1 model. Updated labels.
v0.3 (2026.05.30.3)
DLT pipeline model validated end-to-end on Databricks Free serverless. The "preview" caveat from v0.2 is removed.
Schema fix: PipelineSettings now requires catalog when serverless: true via a Zod .refine(). Surfaces the Databricks API constraint at validation time instead of as a 400 INVALID_PARAMETER_VALUE on create.
README clarifies that DELETE /api/2.0/pipelines/{id} does NOT drop the Delta tables a pipeline materialized. After deleting a pipeline, tables in the target schema persist; remove them with a DROP TABLE in a SQL warehouse or notebook. A future @mfbaig35r/databricks/sql_warehouse model will make this a Swamp-native operation.
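The serverless/catalog rule from the schema fix can be sketched in plain TypeScript. This is not the extension's Zod schema: the PipelineSettings shape is trimmed and the function name is hypothetical. It shows the kind of cross-field predicate a .refine() would carry.

```typescript
// Trimmed, illustrative settings shape.
interface PipelineSettings {
  name: string;
  serverless?: boolean;
  catalog?: string;
}

// Return a list of validation issues; an empty list means the settings pass.
function validatePipelineSettings(settings: PipelineSettings): string[] {
  const issues: string[] = [];
  if (settings.serverless === true && !settings.catalog) {
    issues.push("catalog is required when serverless is true");
  }
  return issues;
}
```

Checking this locally means the failure is reported before any request is sent, instead of coming back from the API as a 400.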
No breaking changes from v0.2.
v0.2 (2026.05.30.2)
New model: @mfbaig35r/databricks/notebook. Owns upload, read (export), and delete for workspace notebooks. The upload_notebook / delete_notebook methods and 'notebook' resource have moved off the job model into this dedicated model. Workflows that called those methods on the job model in v0.1 must update their model: references to the new model.
New model: @mfbaig35r/databricks/dlt_pipeline. Lifecycle for Delta Live Tables pipelines: create, read, update (full replace via PUT), delete, start_update (POST /pipelines/{id}/updates), wait_update (poll until COMPLETED / FAILED / CANCELED), stop. Libraries support notebook or file references. On Databricks Free serverless, set serverless: true on create. Note: DLT pipeline behavior on Databricks Free has not been validated end-to-end yet; treat as preview until smoke-tested.
Refactor: workspace auth + bearer fetch + global args schema extracted into extensions/models/_lib/databricks.ts. All three models share one auth surface.
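The wait_update behavior described above (poll until COMPLETED / FAILED / CANCELED) can be sketched as a generic polling loop. The getState callback, the options object, and the poll limit are assumptions made for illustration; the extension's actual signature and timeout handling may differ.

```typescript
const TERMINAL_STATES = new Set(["COMPLETED", "FAILED", "CANCELED"]);

function isTerminal(state: string): boolean {
  return TERMINAL_STATES.has(state);
}

// Poll getState until it reports a terminal state, or give up after maxPolls attempts.
async function waitUpdate(
  getState: () => Promise<string>,
  opts: { intervalMs: number; maxPolls: number },
): Promise<string> {
  for (let i = 0; i < opts.maxPolls; i++) {
    const state = await getState();
    if (isTerminal(state)) return state;
    // Sleep between polls so the API is not hammered.
    await new Promise<void>((resolve) => setTimeout(resolve, opts.intervalMs));
  }
  throw new Error(`update did not reach a terminal state after ${opts.maxPolls} polls`);
}
```

Injecting getState keeps the loop testable without a workspace: a fake that returns a scripted sequence of states is enough.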
Breaking from v0.1: upload_notebook / delete_notebook removed from @mfbaig35r/databricks/job. Workflows that used those must switch to @mfbaig35r/databricks/notebook upload / delete.
Added 2 models, modified 1. Updated labels.
v0.1
Initial release. Adds the @mfbaig35r/databricks/job model with create, read, update (full reset), delete, run, wait_run, and cancel_run methods. Auth via PAT (resolved through CEL: vault.get) or OAuth M2M client_credentials. Azure MSI is stubbed.
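The two working auth paths can be sketched as request builders. The token endpoint, grant type, and all-apis scope follow the Databricks OAuth M2M documentation for workspace-level tokens. The function names are hypothetical, and this is not the extension's auth code.

```typescript
// Build the client_credentials token request for a workspace host.
function buildTokenRequest(host: string, clientId: string, clientSecret: string) {
  return {
    url: `${host.replace(/\/+$/, "")}/oidc/v1/token`,
    method: "POST" as const,
    headers: {
      // Client id and secret travel as HTTP Basic credentials.
      Authorization: `Basic ${btoa(`${clientId}:${clientSecret}`)}`,
      "Content-Type": "application/x-www-form-urlencoded",
    },
    body: new URLSearchParams({ grant_type: "client_credentials", scope: "all-apis" }).toString(),
  };
}

// Both a PAT and an OAuth access token end up in the same bearer header.
function bearerHeader(token: string): { Authorization: string } {
  return { Authorization: `Bearer ${token}` };
}
```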
v1 task types validated by Zod: notebook_task, sql_task, pipeline_task. Other Databricks task types (spark_python_task, python_wheel_task, dbt_task, run_job_task, for_each_task, condition_task, spark_jar_task) will reject at the schema layer until added in subsequent releases.
Preview surface (will move): upload_notebook and delete_notebook methods, and the 'notebook' resource, are convenience APIs for smoke-testing the run/wait_run loop end-to-end. They will split into a dedicated @mfbaig35r/databricks/notebook model in v0.2. Workflows that call them by name will need to update the model reference at that point.
Validated end-to-end on Databricks Free (AWS serverless): upload_notebook -> create -> run -> wait_run (TERMINATED+SUCCESS) -> delete -> delete_notebook, zero orphan workspace state.
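The TERMINATED+SUCCESS check at the end of the run/wait_run loop combines two fields of the Jobs API run state. A minimal sketch follows, with a hypothetical function name and outcome labels. Only TERMINATED with result_state SUCCESS is treated as success; every other terminal combination is not.

```typescript
interface RunState {
  life_cycle_state: string;
  result_state?: string;
}

type RunOutcome = "running" | "succeeded" | "not_succeeded";

// Life cycle states after which a run will not change again.
const TERMINAL_LIFE_CYCLE = ["TERMINATED", "SKIPPED", "INTERNAL_ERROR"];

function classifyRun(state: RunState): RunOutcome {
  if (TERMINAL_LIFE_CYCLE.indexOf(state.life_cycle_state) === -1) return "running";
  const ok = state.life_cycle_state === "TERMINATED" && state.result_state === "SUCCESS";
  return ok ? "succeeded" : "not_succeeded";
}
```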