14 encrypted source adapters

Bring your data from where it already lives.

Warehouses, relational databases, object storage, and the HuggingFace Hub: browse, sample, and import in four steps, with credentials vaulted by construction. The same four calls work for every adapter, so the engineer who wires Snowflake on Monday wires Azure Blob on Wednesday the same way.

Every connector runs through an encrypted secret vault, refuses plain-text connections where the driver supports TLS, and sandboxes the test session to a single read-only probe against the source. Import jobs land data under your tenant artefact prefix — cross-tenant reads and writes are structurally impossible.

Connectors guide · Drive a connector from the SDK
Adapters: 14
Families: 4
Vault: Encrypted
Plain secrets: 0

Fourteen adapters across four families.

Same browse / test / import surface across every adapter — no per-source quirks to learn.

Warehouse

Snowflake
BigQuery
Databricks
Redshift
Hive

Relational

Postgres
MySQL
MariaDB
MSSQL
Oracle

Object storage

Amazon S3
Google GCS
Azure Blob

Dataset hub

HuggingFace

Four steps from creds to dataset.

Step 1

Create connector

Enter creds in the UI or POST the config — inline passwords are auto-hoisted to the encrypted vault.

Step 2

Test connection

We open a short-lived test session against the source and surface the exact error if it fails.

Step 3

Browse

Per-connector dispatcher returns up to 500 tables / objects, paged and filtered.

Step 4

Import to dataset

Run an import job — lands in your tenant artefact prefix, ready for Synthesize / ADS / Mock.

# 1. Create a Postgres connector — passwords auto-hoist
POST /v1/client/connectors
{
  "name": "warehouse-prod",
  "type": "postgres",
  "config": {
    "host": "warehouse.acme.internal",
    "database": "analytics",
    "user": "radmah_ro",
    "password": "p@ssw0rd-will-be-vaulted"
  }
}
→ 201 Created   secret_ref=cs_…   no plain secret in run-state

# 2. Test it
POST /v1/client/connectors/{id}/test     → 200 OK

# 3. Browse tables
POST /v1/client/connectors/{id}/browse
{ "schema": "public", "limit": 500 }
→ [
   {"name": "customers", "rows_estimate": 482931},
   {"name": "orders",     "rows_estimate": 1923810},
   …
]

# 4. Import as a dataset
POST /v1/client/connectors/{id}/import
{ "source": "public.customers", "row_limit": 200000,
  "target_dataset_name": "customers-snapshot-2026-04" }
→ 202 Accepted    job_id=imp_…
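
The four requests above can be wrapped in a small client. What follows is a minimal sketch in Python, not the vendor SDK: the class name `ConnectorClient`, the injected `send` transport, and the `id` field on the create response are assumptions made for illustration. Only the paths and JSON bodies come from the calls shown above. The transport here records requests instead of sending them, so the sketch runs without a network or credentials.

```python
from typing import Any, Callable, List, Tuple

# A transport takes (method, path, json_body) and returns the decoded response.
Transport = Callable[[str, str, dict], Any]


class ConnectorClient:
    """Hypothetical wrapper over the create / test / browse / import calls."""

    def __init__(self, send: Transport, prefix: str = "/v1/client/connectors") -> None:
        self._send = send
        self._prefix = prefix

    def create(self, name: str, type_: str, config: dict) -> Any:
        body = {"name": name, "type": type_, "config": config}
        return self._send("POST", self._prefix, body)

    def test(self, connector_id: str) -> Any:
        return self._send("POST", f"{self._prefix}/{connector_id}/test", {})

    def browse(self, connector_id: str, schema: str, limit: int = 500) -> Any:
        body = {"schema": schema, "limit": limit}
        return self._send("POST", f"{self._prefix}/{connector_id}/browse", body)

    def import_dataset(self, connector_id: str, source: str, row_limit: int,
                       target_dataset_name: str) -> Any:
        body = {"source": source, "row_limit": row_limit,
                "target_dataset_name": target_dataset_name}
        return self._send("POST", f"{self._prefix}/{connector_id}/import", body)


def recording_transport(log: List[Tuple[str, str, dict]]) -> Transport:
    """Dry-run transport: records each request and returns a canned response."""
    def send(method: str, path: str, body: dict) -> Any:
        log.append((method, path, body))
        return {"id": "con_example"}  # assumed response shape, for illustration only
    return send


calls: List[Tuple[str, str, dict]] = []
client = ConnectorClient(recording_transport(calls))

created = client.create("warehouse-prod", "postgres",
                        {"host": "warehouse.acme.internal", "database": "analytics",
                         "user": "radmah_ro", "password": "p@ssw0rd-will-be-vaulted"})
connector_id = created["id"]
client.test(connector_id)
client.browse(connector_id, schema="public")
client.import_dataset(connector_id, "public.customers", 200_000,
                      "customers-snapshot-2026-04")
```

Swapping `recording_transport` for a function that issues real HTTP requests, with whatever base URL and authentication the platform requires, would leave the four method calls unchanged.
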
Same four calls — every adapter

One integration pattern. Fourteen data sources. Zero per-source learning curve.

A connector is a four-call commitment: create it, test it, browse it, import from it. The same four calls work whether the source is a petabyte warehouse in Snowflake, a legacy Oracle database behind a VPN, an S3 bucket of Parquet files, or a public dataset on HuggingFace. A junior engineer wires a new source in under an hour — a senior one wires five in an afternoon.

Credentials never live in plain text. POST a password inline and it is moved to an encrypted vault before the connector row is committed; from that moment on every test, browse, and import call resolves the secret through a tenant-scoped read, audit-logged by caller and connector ID, and the value itself is never returned by any API. That is the posture your CISO asked for last quarter, shipped on the default path.

Imports land under your tenant's artefact prefix. Cross-tenant reads and writes are structurally impossible — not a policy promise. Sample-first mode (1 000 rows) lets the data team eyeball the schema before committing to a 200 000-row import that bills real credits.

  • 4-call contract, identical on every source
  • Secrets vaulted on POST, never logged
  • Tenant-scoped artefact landing
  • Sample-first imports before big runs
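
The sample-first pattern can be sketched as two import payloads built from the same source: a 1 000-row sample, then the committed run. This is an illustrative Python sketch under assumptions. It expresses the sample through `row_limit` (the page does not say whether the API has a dedicated sample flag), and the `-sample` dataset-name suffix is hypothetical.

```python
from typing import List

SAMPLE_ROWS = 1_000  # sample size quoted in the text above


def import_plan(source: str, full_row_limit: int, dataset_name: str) -> List[dict]:
    """Return the import payloads to submit, sample first and full run second."""
    if full_row_limit <= SAMPLE_ROWS:
        # The request is no larger than a sample, so a single run is enough.
        return [{"source": source, "row_limit": full_row_limit,
                 "target_dataset_name": dataset_name}]
    return [
        {"source": source, "row_limit": SAMPLE_ROWS,
         "target_dataset_name": f"{dataset_name}-sample"},  # hypothetical naming
        {"source": source, "row_limit": full_row_limit,
         "target_dataset_name": dataset_name},
    ]


plan = import_plan("public.customers", 200_000, "customers-snapshot-2026-04")
```

In practice the second payload would be submitted only after someone has looked at the sampled schema and approved the larger run.
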

Who wires what

Four families, four jobs each team finally gets to delete.

A connector is the last mile of every data project. Done badly, it is the reason integrations slip. Done once, right, across every source — it is what lets the rest of the platform move at the speed the sales team promised.

Warehouses

Snowflake · BigQuery · Databricks · Redshift · Hive

A data-platform team has already paid for a warehouse; the problem is not storage, it is getting a representative sample out of it and into a training or partner environment without a six-week ticket through DataOps. Point a warehouse connector at the right database, pick the right schema, and import the sample you actually want. The run lands under your tenant artefact prefix, so no copy of production data ends up in the downloads folder on someone's laptop.

Technical: role-scoped warehouse credentials vaulted on POST, read-only test mode, up to 500 tables per browse page, sample-first imports (1 000 rows) before the committed big run.
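
With a cap of 500 tables per browse page, listing a large warehouse means walking pages until the source runs out. The Python sketch below shows one way to do that. How the API expresses paging is not stated here, so the `offset` argument and the `fetch_page` callback are assumptions, and the in-memory catalogue stands in for a real browse call.

```python
from typing import Callable, Iterator, List

PAGE_SIZE = 500  # per-page cap quoted above


def browse_all(fetch_page: Callable[[int, int], List[dict]],
               page_size: int = PAGE_SIZE) -> Iterator[dict]:
    """Yield every table by requesting pages until a short page arrives."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            return  # a short (or empty) page means there is nothing left
        offset += len(page)


# Stand-in for the browse call: an in-memory catalogue of 1,203 tables.
catalogue = [{"name": f"table_{i}"} for i in range(1203)]


def fake_fetch(offset: int, limit: int) -> List[dict]:
    return catalogue[offset:offset + limit]


tables = list(browse_all(fake_fetch))
```
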

Relational

Postgres · MySQL · MariaDB · MSSQL · Oracle

Every enterprise has a legacy RDBMS that somehow runs half the business. The integrations engineer's job is rarely the exciting part; it is firewall rules, VPN tickets, rotating service accounts, and explaining to security why the connection string has a password in it. Our relational family removes the last problem completely, because the password is vaulted before the row commits, and it behaves identically across all five dialects, so the engineer does not have to relearn the workflow for every acquisition.

Technical: TLS-required drivers, SELECT 1 test probe, schema browse, parameterised imports, per-call audit-logged vault resolution.

Object storage

Amazon S3 · Google GCS · Azure Blob

If your organisation has already standardised on a lake and a cloud, we do not ask you to change either. The object-storage family imports directly from bucket-and-prefix paths, with the same four-call contract and the same vaulted credentials. Parquet, CSV, JSON-Lines, and TSV all import cleanly; schema inference is first-class, so you do not have to hand-maintain a source-of-truth column list alongside the data.

Technical: IAM-role or access-key auth, multipart-download, per-object streaming, sample-first import before the committed full scan.

Dataset hub

HuggingFace

The fastest path from a published research dataset to a sealed, evidence-chained training run. Paste a dataset ID, pick a split, and import: private datasets are supported with a scoped token, and public datasets work out of the box. The same tenant artefact prefix and the same evidence record apply, so the researcher who just imported the benchmark does not have to re-learn the pipeline to make it enterprise-grade.

Technical: dataset-ID resolution, split-aware browse, token-vaulted imports, split-scoped sampling.

The encrypted secret vault, in plain English.

Encrypting connector secrets is not optional. They are encrypted by construction: the auto-hoist path enforces it even if the caller forgets.

Fernet at rest

Per-tenant key, AES-128-CBC + HMAC-SHA256. No password is ever stored in the connector row.

Auto-hoist on POST

If you accidentally POST a plain password it is moved to the vault before the row is committed.

Scoped use

The vault entry is read only by the connector worker, only when running an explicit test/browse/import.

Audit log

Every secret resolution is recorded with caller, connector ID, and reason, never the value.
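
The four vault properties above can be illustrated with a toy model. This is a sketch under stated assumptions, not the platform's implementation: the `Vault` class, its method names, and the in-memory storage are hypothetical, and the cipher is injected so the example runs on the standard library alone. The demo plugs in base64 as a placeholder, which is not encryption; a real vault would supply an authenticated cipher such as Fernet in its place.

```python
import base64
import uuid
from typing import Callable, Dict, List, Tuple


class Vault:
    """Toy vault: hoists inline passwords and audit-logs every resolve."""

    def __init__(self, encrypt: Callable[[bytes], bytes],
                 decrypt: Callable[[bytes], bytes]) -> None:
        self._encrypt = encrypt
        self._decrypt = decrypt
        self._entries: Dict[Tuple[str, str], bytes] = {}  # keyed by (tenant, ref)
        self.audit_log: List[dict] = []

    def hoist(self, tenant: str, config: dict) -> dict:
        """Return a copy of config with any inline password replaced by a reference."""
        clean = dict(config)
        password = clean.pop("password", None)
        if password is not None:
            ref = "cs_" + uuid.uuid4().hex
            self._entries[(tenant, ref)] = self._encrypt(password.encode())
            clean["secret_ref"] = ref
        return clean

    def resolve(self, tenant: str, ref: str, caller: str,
                connector_id: str, reason: str) -> str:
        # Log who asked and why before the lookup; the value is never logged.
        self.audit_log.append({"tenant": tenant, "caller": caller,
                               "connector_id": connector_id, "reason": reason})
        # The tenant is part of the key, so another tenant's lookup raises KeyError.
        return self._decrypt(self._entries[(tenant, ref)]).decode()


# Placeholder codec so the sketch runs without third-party packages.
# base64 is NOT encryption; substitute a real authenticated cipher here.
vault = Vault(base64.b64encode, base64.b64decode)

row = vault.hoist("tenant-a", {"host": "warehouse.acme.internal",
                               "user": "radmah_ro",
                               "password": "p@ssw0rd-will-be-vaulted"})
secret = vault.resolve("tenant-a", row["secret_ref"], caller="import-worker",
                       connector_id="con_example", reason="import")
```

The connector row that comes back from `hoist` carries only the reference, which is the property the auto-hoist path is described as enforcing.
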

Other protections built in.

TLS-required by default

Postgres, MySQL, MSSQL, and Oracle drivers refuse plain-text connections.

Read-only test mode

Test session uses a single SELECT 1 — never writes to the source.

Tenant artefact prefix

Imported rows land under the calling tenant's artefact path. No cross-tenant write possible.

Sample-first imports

Sample mode (1k rows) lets you eyeball the schema before kicking off a 200k-row import.
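
One way a tenant artefact prefix can be enforced in code is to normalise every target path and reject anything that resolves outside the tenant's root. The Python sketch below illustrates that idea only; the prefix layout `tenants/acme` and the function name `artefact_key` are hypothetical, and the platform's actual mechanism is not described on this page.

```python
import posixpath


def artefact_key(tenant_prefix: str, relative_path: str) -> str:
    """Join a path under the tenant prefix, rejecting anything that escapes it."""
    root = posixpath.normpath(tenant_prefix)
    candidate = posixpath.normpath(posixpath.join(root, relative_path))
    # After normalisation, ".." segments and absolute paths show up as a
    # candidate that no longer sits under the tenant root.
    if candidate != root and not candidate.startswith(root + "/"):
        raise ValueError(f"path escapes tenant prefix: {relative_path!r}")
    return candidate


key = artefact_key("tenants/acme", "datasets/customers-snapshot-2026-04.parquet")
```

A call such as `artefact_key("tenants/acme", "../other/data.parquet")` raises `ValueError` rather than returning a key under another tenant's prefix.
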

Wire one in an afternoon.

Adapter not on the list? We add new connectors on a 2-week SLA when there’s a named customer behind the request — drop us a line and we’ll scope it.

Connectors guide · Request a new adapter