Tabular synthesis

Benchmark-certified at 95.69 %. Five engines. One sealed contract.

Five tabular engines — a flagship engine at benchmark-certified fidelity, plus a parametric engine for small data, a score-based diffusion engine for minority-class preservation, a relational engine for multi-table schemas, and a schema-only fast path — all driven from one sealed contract and all producing the same cryptographic evidence bundle. Verified offline by anyone with the open-source evidence CLI.

Benchmark-certified 95.69 % · 5 engines · 1 sealed contract · Cryptographic evidence bundle · Deterministic byte-for-byte

Anchor benchmark

Benchmark-certified at 95.69 % under an independent QA harness.

Canonical shared-split public benchmark (adult-income dataset, mixed-type columns). The flagship engine at its default configuration reaches 95.69 % overall QA score on the held-out test partition under a third-party QA harness. Per-column similarity, correlation preservation, and privacy metrics are emitted into the sealed utility report artefact on every run.

Reproducible against the same public benchmark using matched train / test splits. Full per-release benchmark certificates are published in the customer dashboard; run instructions available to enterprise customers under NDA.

Download a real 9-artefact bundle
Engine            | KS ↑   | Corr Δ ↓ | QA overall | Runtime
Flagship engine   | strong | strong   | 95.69 %    | ~6 min
Diffusion engine  | strong | strong   | —          | ~11 min
Relational engine | strong | strong   | —          | ~8 min
Parametric engine | strong | strong   | —          | < 1 s
Schema engine     | n/a    | n/a      | —          | < 1 s

Source: internal benchmark harness run with the same seed across engines, reproducible on a single host, with an independent QA pass. The schema engine is schema-only and does not produce distributional metrics by design. Full dataset + split citation on /verify.

Five engines — one contract

The tabular engine lineup.

Each engine targets a distinct regime. The platform routes to the optimal engine based on schema structure, column types, row count, and constraint requirements — or you pick one explicitly via the SDK. Every engine exposes the same sealed contract interface and produces the same 9-artefact BLAKE3 bundle.

Flagship engine

Flagship · Benchmark-certified
Flagship

Benchmark-certified at 95.69 % under an independent QA harness — the default for production datasets.

The flagship tabular engine handles heavy-tailed columns, high-cardinality categoricals, and mixed continuous-discrete joint distributions natively without manual pre-processing. The training, sampling, and aggregate-repair stages are proprietary; every run is driven by your sealed contract and the output is signed into the evidence bundle. Default for any dataset above 100 rows.

  • Production-grade handling of heavy tails, rare categoricals and mixed-type joints
  • Aggregate-repair stage that corrects marginal drift on roll-up tables
  • Deterministic: same seed + same contract → same bytes, every run
  • Every artefact seals into the cryptographic evidence chain
  • Same contract surface as the other four engines — swap without re-wiring

Best for

Mixed-type tabular, heavy-tailed financial / industrial, high-cardinality categoricals, 100 – 10 M rows

Parametric engine

Parametric · Small-data path

Classical parametric path for sub-100-row datasets where training a flow model would overfit.

The parametric engine is the small-data path for datasets where training a flow-matching model is statistically impractical. Per-column marginals are learned independently and stitched back together under a correlation structure estimated from the source. Zero hyperparameter search; runs in under a second on any dataset under 10 K rows. Same contract interface and evidence bundle as every other engine.

  • Per-column marginal preservation with stable-tail handling
  • Pairwise correlation structure reproduced across numeric columns
  • No training step — closed-form fit completes in milliseconds
  • Deterministic: same seed + contract → byte-identical samples
  • Same sealed evidence bundle as the flagship

Best for

Under 100 rows, fully-numeric surveys, quick baselines, audit comparison against the flagship
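The marginal-fit-plus-correlation stitching described above can be sketched as a classical Gaussian copula — a standard small-data technique used here purely for illustration. The `copula_synthesize` helper and its signature are hypothetical, not the SDK's API or the engine's actual algorithm:

```python
import numpy as np
from scipy import stats

def copula_synthesize(columns, n_rows, seed=0):
    # Hypothetical sketch of a classical parametric path: fit each marginal
    # empirically, couple the columns with a Gaussian copula estimated from
    # the source, then sample jointly. Not the vendor's actual algorithm.
    rng = np.random.default_rng(seed)
    names = list(columns)
    data = np.column_stack([columns[c] for c in names])
    n = data.shape[0]
    # Rank-transform each column to normal scores (empirical CDF -> probit).
    scores = stats.norm.ppf(stats.rankdata(data, axis=0) / (n + 1))
    corr = np.corrcoef(scores, rowvar=False)
    # Sample correlated normals, push back through each empirical quantile.
    z = rng.multivariate_normal(np.zeros(len(names)), corr, size=n_rows)
    u = stats.norm.cdf(z)
    return {c: np.quantile(data[:, i], u[:, i]) for i, c in enumerate(names)}
```

Because the whole fit is closed-form, a fixed seed reproduces the samples exactly — the same determinism property the engines advertise.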

Diffusion engine

Score-based · Minority-aware

Score-based denoising diffusion for tabular distributions where minority-class preservation matters most.

A score-based denoising diffusion model adapted for mixed-type tabular data. Continuous columns and categorical columns each travel their own learned forward process; the sampler runs deterministically under a contract seed. Particularly strong on datasets where minority-class preservation is critical — fraud, anomaly, imbalanced classification. Exposed through the same contract surface and the same evidence bundle.

  • Separate learned forward processes for continuous and categorical columns
  • Deterministic sampler under a sealed seed for byte-exact re-runs
  • Column-aware loss weighting balances mixed-type reconstruction
  • Strong behaviour on rare joint events other engines over-smooth
  • Same contract + same evidence bundle as every other engine

Best for

Minority-class-sensitive datasets, fraud / anomaly training sets, imbalanced classification

Relational engine

Relational · Multi-table

Cross-table synthesis that preserves referential integrity across parent–child schemas.

The relational engine synthesises multi-table datasets by walking the foreign-key graph in dependency order and conditioning each child table on the parent records already materialised. Referential integrity is guaranteed by construction — every child row's foreign key points to a parent row that was generated first. Cardinality distributions (one-to-many, many-to-many bridges) are reproduced from the source. Output passes the same constraint-report gate as single-table engines.

  • Foreign-key dependency order — parents first, children in reference order
  • Per-edge cardinality distribution reproduced from the source
  • Referential integrity is 100 % by construction
  • Same contract interface as the single-table engines
  • Same sealed evidence bundle on every run

Best for

Multi-table relational databases, parent-child schemas, bridge / junction tables
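The parents-first walk of the foreign-key graph can be illustrated with Python's standard-library topological sorter. The `fk_map` shape and `generation_order` helper are illustrative assumptions, not the engine's internals:

```python
from graphlib import TopologicalSorter

def generation_order(fk_map):
    # fk_map maps each table to the set of parent tables it references.
    # TopologicalSorter treats those parents as predecessors, so the static
    # order is parents-first; a circular reference raises CycleError.
    return list(TopologicalSorter(fk_map).static_order())

fk_map = {
    "customers": set(),
    "products": set(),
    "orders": {"customers"},
    "order_items": {"orders", "products"},  # bridge table, two parents
}
order = generation_order(fk_map)
```

Materialising tables in this order is what makes referential integrity hold by construction: every foreign key a child row needs already points at a generated parent.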

Schema engine

Schema-only · Smoke & demos

Schema-preserving fast path when you only need the shape — not the distribution.

The schema engine is the fast path: it preserves the foreign-key graph, table cardinalities, and column dtype structure without training per-table models. It generates statistically-plausible values from column-type priors in well under a second. Default for integration testing, UI demos, and any case where referential integrity matters but distributional fidelity does not.

  • Same foreign-key dependency traversal as the relational engine
  • Per-dtype priors — no per-table training step
  • Cardinality per edge estimated from schema metadata only
  • Deterministic seed → same skeleton every time
  • Same sealed evidence bundle as every other engine

Best for

Integration tests, UI demos, schema-level API testing, data-platform smoke runs

SDK

One call. Five engines. Verifiable output.

The SDK is a thin Python client: you hand it a source (S3 path, Snowflake table, CSV upload), a sealed contract, a seed, and the engine selector. The platform compiles the sealed contract, runs the engine, emits the 9-artefact bundle, and blocks your thread until either the quality gates pass or the fail-closed policy aborts.

  • Deterministic: same sealed contract + seed → byte-identical output
  • Fail-closed: DCR < 0.05 or MIA-AUC > 0.65 → job aborts before emitting
  • Offline verifiable: evidence verifier CLI re-runs the BLAKE3 chain locally
  • Engine-agnostic API: swap flagship → diffusion with one string
SDK reference
tabular_synthesize.py
import os

from radmah_sdk import RadMah

client = RadMah(api_key=os.environ["RADMAH_API_KEY"])

# Flagship engine (default) — 95.69 % benchmark-certified fidelity
job = client.synthesize(
    source="s3://acme-prod/raw/customers_2026q1.parquet",
    engine="flagship",              # alternates available in the docs
    rows=1_000_000,
    contract={
        "pk":           ["customer_id"],
        "constraints":  ["balance >= 0", "age BETWEEN 18 AND 120"],
        "privacy":      {"membership_risk_ceiling": "strict"},
    },
    seed=42,                        # sealed contract + seed = byte-identical re-run
)

job.wait()                          # blocks until quality gates pass or fail-closed
print(job.evidence.utility_report)  # {'ks_median': 0.97, 'corr_frobenius': 0.02, ...}

# Verify the 9-artefact BLAKE3 bundle offline
client.verify(job.evidence.bundle_path)
# → BundleVerified(hsfg_seal=..., chain_ok=True, artefact_count=9)

Measured fidelity

Quality metrics — emitted, not claimed.

Every synthetic dataset ships with quantitative fidelity and privacy measurements inside the evidence bundle. No subjective claims — the bundle proves the numbers, and the evidence verifier confirms the chain offline.

Distributional similarity

Strong per-column match between real and synthetic marginals

Per-column distributional similarity is measured between the real and synthetic marginals on every run. The flagship engine has been independently QA-certified at 95.69 % fidelity on a reference benchmark. The full set of metrics lands in the sealed utility report artefact — see /verify for a downloadable bundle.
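Per-column similarity scores of this kind are commonly computed as one minus the two-sample Kolmogorov–Smirnov statistic. This sketch assumes that convention — the `column_similarity` helper is hypothetical, and the platform's exact metric definition lives in the utility report:

```python
import numpy as np
from scipy.stats import ks_2samp

def column_similarity(real, synth):
    # 1 - KS statistic per column: 1.0 means indistinguishable marginals,
    # 0.0 means completely disjoint supports. A common convention, assumed
    # here; not necessarily the platform's exact metric.
    return {c: 1.0 - ks_2samp(real[c], synth[c]).statistic for c in real}
```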

Correlation preservation

Inter-column dependency structure preserved, with a fail-closed gate

The pairwise correlation matrix is compared between the real and synthetic slices to quantify how well inter-column dependency structure is preserved. Every run either passes the correlation gate and ships — or aborts with a fail-closed quality failure. Exact numbers land in the sealed utility report.
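A gate of this shape can be sketched as a Frobenius-norm check on the delta between the two correlation matrices. The threshold value and the `correlation_gate` helper are illustrative assumptions, not the platform's actual gate:

```python
import numpy as np

def correlation_gate(real, synth, max_frobenius=0.2):
    # Frobenius norm of the correlation-matrix delta between the real and
    # synthetic slices; the run ships only when the delta is under the
    # threshold (threshold value is an assumption for illustration).
    delta = np.corrcoef(real, rowvar=False) - np.corrcoef(synth, rowvar=False)
    score = float(np.linalg.norm(delta, "fro"))
    return score, score <= max_frobenius
```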

Privacy risk metrics

Membership-inference, attribute-inference, and disclosure metrics — all emitted

Nearest-neighbour distance distribution, membership-inference resistance, attribute-inference resistance, and disclosure-risk metrics are all measured on the synthetic output against the source and written into the privacy report. Zero PII is guaranteed by construction: synthetic rows are sampled from the learned joint, never copied from source records.
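The nearest-neighbour distance metric (often called distance-to-closest-record, DCR) can be sketched in a few lines. This brute-force broadcasting version is illustrative only — production implementations would use an index structure:

```python
import numpy as np

def distance_to_closest_record(synth, real):
    # Pairwise Euclidean distances via broadcasting, then the minimum per
    # synthetic row. A DCR of exactly 0 flags a verbatim copy of a source
    # record — the failure mode the privacy gate exists to catch.
    d = np.linalg.norm(synth[:, None, :] - real[None, :, :], axis=-1)
    return d.min(axis=1)
```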

Sealed evidence bundle

Multi-artefact, cryptographically chained, verifiable offline

Every run emits a contract snapshot, determinism report, constraint report, utility report, privacy report, run telemetry, engine manifest, artefact index, and a chain seal. The open-source verifier CLI replays the cryptographic chain end-to-end, offline, no network. Third parties can independently confirm every metric.
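The chain-replay idea can be sketched with a standard-library hash. `hashlib.blake2b` stands in here for BLAKE3, which is not in the Python stdlib, and the artefact list and seal format are invented for illustration — the real bundle layout is defined by the verifier CLI:

```python
import hashlib

def seal_chain(artefacts):
    # Hash chain sketch (hashlib.blake2b stands in for BLAKE3): each
    # artefact's digest folds in the previous link, so modifying any
    # artefact changes every later link and the final seal.
    link = b""
    for name, payload in artefacts:
        link = hashlib.blake2b(link + name.encode() + payload).digest()
    return link.hex()

def verify_chain(artefacts, expected_seal):
    # Offline verification is just a replay: no network, no trusted party.
    return seal_chain(artefacts) == expected_seal
```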

Enterprise compliance

Audit-ready from day one.

Cryptographic evidence bundles, privacy risk metrics, and deterministic reproducibility give your compliance team everything a DPIA, ISO 27001 data-sharing control, or SOC 2 audit needs.

Cryptographic evidence bundles

Every synthetic generation produces a signed multi-artefact bundle covering the sealed contract, determinism proof, constraint and utility reports, privacy metrics, run telemetry, engine manifest, artefact index, and release seal. Cryptographic hashes chain every artefact; any modification breaks the chain and the evidence verifier refuses to certify.

GDPR Article 89 alignment

Synthetic data that does not relate to an identified or identifiable natural person falls outside the material scope of GDPR Articles 5–15. Our evidence bundles document the generation process and a full set of privacy-risk metrics — disclosure risk, membership-inference resistance, and attribute-inference resistance — supporting Article 89 research-exemption claims and DPIA submissions.

Audit-ready provenance chain

Evidence bundles provide the traceability, reproducibility, and integrity verification compliance auditors require. Every generation is deterministic — same sealed contract + same seed → byte-identical output on any host, any time. The audit trail is cryptographically immutable.

Zero PII in output by construction

Synthetic records are sampled from the learned joint distribution, never copied from source rows. Privacy reports measure disclosure risk, membership-inference resistance, and attribute-inference resistance for every run. When risk exceeds the enterprise-configurable thresholds, the job fails closed before emitting any synthetic data.

Cross-border-transfer enabler

Synthetic output enables cross-border development and analytics without transferring personal data. Teams in different jurisdictions work on statistically-faithful datasets while source data remains in its sovereign storage. Removes the need for Standard Contractual Clauses on the synthetic artefact.

Deterministic reproducibility

A sealed contract plus a seed produces byte-identical synthetic output on any host, at any time. A cryptographically-strong seed-reproducible RNG with cross-platform deterministic arithmetic guarantees consistency across clusters and cloud providers. The determinism report in the evidence bundle records the exact RNG state at every checkpoint.
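A seed-reproducible, cross-platform RNG check can be sketched with NumPy's counter-based Philox generator. Whether the platform actually uses Philox is an assumption — the pattern (seed in, digest out, compare digests across hosts) is what the determinism report formalises:

```python
import hashlib
import numpy as np

def sample_digest(seed, n):
    # Philox is a counter-based generator with identical output on every
    # platform; hashing the raw sample bytes gives a digest two hosts can
    # compare to prove byte-identical re-runs. Engine choice is an assumption.
    rng = np.random.Generator(np.random.Philox(seed))
    return hashlib.sha256(rng.random(n).tobytes()).hexdigest()
```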

Bring a CSV. Leave with an evidence bundle.

30-minute working session: you upload (or we mock) a representative dataset, we run it through the flagship and one alternate engine, and you keep the signed 9-artefact BLAKE3 bundle plus the utility report. All five engines are available on every plan — Free, Sovereign, and Enterprise — with tier-specific credit allocations.