Benchmark-certified at 95.69 %
Five engines. One sealed contract.
Anchor benchmark
Benchmark-certified at 95.69 % under an independent QA harness.
Canonical shared-split public benchmark (adult-income dataset, mixed-type columns). The flagship engine at its default configuration reaches 95.69 % overall QA score on the held-out test partition under a third-party QA harness. Per-column similarity, correlation preservation, and privacy metrics are emitted into the sealed utility report artefact on every run.
Reproducible against the same public benchmark using matched train / test splits. Full per-release benchmark certificates are published in the customer dashboard; run instructions available to enterprise customers under NDA.
Download a real 9-artefact bundle
Source: internal benchmark harness with the same seed across engines, single-host reproducible, under an independent QA pass. The schema engine is schema-only and does not produce distributional metrics by design. Full dataset + split citation on /verify.
Five engines — one contract
The tabular engine lineup.
Each engine targets a distinct regime. The platform routes to the optimal engine based on schema structure, column types, row count, and constraint requirements — or you pick one explicitly via the SDK. Every engine exposes the same sealed contract interface and produces the same 9-artefact BLAKE3 bundle.
Flagship engine
Flagship · Benchmark-certified
Benchmark-certified at 95.69 % under an independent QA harness — the default for production datasets.
The flagship tabular engine handles heavy-tailed columns, high-cardinality categoricals, and mixed continuous-discrete joint distributions natively without manual pre-processing. The training, sampling, and aggregate-repair stages are proprietary; every run is driven by your sealed contract and the output is signed into the evidence bundle. Default for any dataset above 100 rows.
- Production-grade handling of heavy tails, rare categoricals and mixed-type joints
- Aggregate-repair stage that corrects marginal drift on roll-up tables
- Deterministic: same seed + same contract → same bytes, every run
- Every artefact seals into the cryptographic evidence chain
- Same contract surface as the other four engines — swap without re-wiring
Best for
Mixed-type tabular, heavy-tailed financial / industrial, high-cardinality categoricals, 100 – 10 M rows
Parametric engine
Parametric · Small-data path
Classical parametric path for sub-100-row datasets where training a flow model would overfit.
The parametric engine is the small-data path for datasets where training a flow-matching model is statistically impractical. Per-column marginals are learned independently and stitched back together under a correlation structure estimated from the source. Zero hyperparameter search; runs in under a second on any dataset under 10 K rows. Same contract interface and evidence bundle as every other engine.
- Per-column marginal preservation with stable-tail handling
- Pairwise correlation structure reproduced across numeric columns
- No training step — closed-form fit completes in milliseconds
- Deterministic: same seed + contract → byte-identical samples
- Same sealed evidence bundle as the flagship
Best for
Under 100 rows, fully-numeric surveys, quick baselines, audit comparison against the flagship
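The stitch-marginals-under-a-correlation idea can be illustrated with a classical Gaussian copula: rank each column to normal scores, estimate their correlation, draw correlated normals, and invert through each empirical marginal. This is a generic textbook sketch using numpy and the stdlib, with invented names; it is not the engine's actual closed-form fit.

```python
import numpy as np
from statistics import NormalDist

_nd = NormalDist()

def copula_sample(data: np.ndarray, n: int, seed: int = 42) -> np.ndarray:
    """Gaussian-copula sketch: independent empirical marginals stitched
    back together under the source's normal-score correlation."""
    rng = np.random.default_rng(seed)
    m, d = data.shape
    # Empirical-CDF position of every value, mapped to a normal score.
    ranks = data.argsort(axis=0).argsort(axis=0)
    u = (ranks + 0.5) / m
    z = np.array([[_nd.inv_cdf(v) for v in row] for row in u])
    corr = np.corrcoef(z, rowvar=False)          # pairwise dependence structure
    # Draw correlated normals, then invert each one through its marginal.
    g = rng.multivariate_normal(np.zeros(d), corr, size=n)
    p = np.array([[_nd.cdf(v) for v in row] for row in g])
    return np.array([np.quantile(data[:, j], p[:, j]) for j in range(d)]).T
```

Because the whole fit is closed-form (ranks, one correlation matrix, quantile inversion), there is no training loop, which is why this family of methods stays fast and stable on tiny datasets.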
Diffusion engine
Score-based · Minority-aware
Score-based denoising diffusion for tabular distributions where minority-class preservation matters most.
A score-based denoising diffusion model adapted for mixed-type tabular data. Continuous columns and categorical columns each travel their own learned forward process; the sampler runs deterministically under a contract seed. Particularly strong on datasets where minority-class preservation is critical — fraud, anomaly, imbalanced classification. Exposed through the same contract surface and the same evidence bundle.
- Separate learned forward processes for continuous and categorical columns
- Deterministic sampler under a sealed seed for byte-exact re-runs
- Column-aware loss weighting balances mixed-type reconstruction
- Strong behaviour on rare joint events other engines over-smooth
- Same contract + same evidence bundle as every other engine
Best for
Minority-class-sensitive datasets, fraud / anomaly training sets, imbalanced classification
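The two-track forward process can be pictured with a toy noising step: Gaussian corruption for continuous columns and uniform label resampling (multinomial-diffusion style) for categoricals, both driven by one seed. The function name, the linear schedule, and every parameter below are illustrative assumptions, not the engine's internals.

```python
import numpy as np

def forward_noise_step(x_cont, x_cat, n_classes, t, T, seed):
    """One forward-diffusion step over a mixed-type batch (sketch).
    Continuous columns take Gaussian noise; categorical labels flip to a
    uniform class with probability beta_t."""
    rng = np.random.default_rng(seed)
    beta_t = t / T  # toy linear noise schedule: 0 at t=0, 1 at t=T
    # Continuous track: scale the signal down, mix Gaussian noise in.
    noisy_cont = (np.sqrt(1 - beta_t) * x_cont
                  + np.sqrt(beta_t) * rng.standard_normal(x_cont.shape))
    # Categorical track: each label resamples uniformly w.p. beta_t.
    flip = rng.random(x_cat.shape) < beta_t
    noisy_cat = np.where(flip, rng.integers(0, n_classes, x_cat.shape), x_cat)
    return noisy_cont, noisy_cat
```

Seeding the generator is what makes the claim "deterministic sampler under a contract seed" concrete: the same `(t, T, seed)` triple reproduces the same noise bytes on every run.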
Relational engine
Relational · Multi-table
Cross-table synthesis that preserves referential integrity across parent–child schemas.
The relational engine synthesises multi-table datasets by walking the foreign-key graph in dependency order and conditioning each child table on the parent records already materialised. Referential integrity is guaranteed by construction — every child row's foreign key points to a parent row that was generated first. Cardinality distributions (one-to-many, many-to-many bridges) are reproduced from the source. Output passes the same constraint-report gate as single-table engines.
- Foreign-key dependency order — parents first, children in reference order
- Per-edge cardinality distribution reproduced from the source
- Referential integrity is 100 % by construction
- Same contract interface as the single-table engines
- Same sealed evidence bundle on every run
Best for
Multi-table relational databases, parent-child schemas, bridge / junction tables
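The parents-first walk is, at its core, a topological sort of the foreign-key graph. A minimal stdlib sketch, where `fk_edges` (a name invented here) maps each table to the set of parent tables it references:

```python
from graphlib import TopologicalSorter

def synthesis_order(fk_edges: dict[str, set[str]]) -> list[str]:
    """Return a parents-first generation order for a foreign-key graph.
    TopologicalSorter emits each node only after all of its
    predecessors, so every parent table is materialised before any
    child that references it."""
    return list(TopologicalSorter(fk_edges).static_order())

# Hypothetical schema: orders reference customers; order_items
# reference both orders and products.
order = synthesis_order({
    "customers": set(),
    "products": set(),
    "orders": {"customers"},
    "order_items": {"orders", "products"},
})
```

Generating in this order is what makes referential integrity hold by construction: a child row can only ever pick a foreign key from parent rows that already exist.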
Schema engine
Schema-only · Smoke & demos
Schema-preserving fast path when you only need the shape — not the distribution.
The schema engine is the fast path: it preserves the foreign-key graph, table cardinalities, and column dtype structure without training per-table models. It generates statistically plausible values from column-type priors in well under a second. Default for integration testing, UI demos, and any case where referential integrity matters but distributional fidelity does not.
- Same foreign-key dependency traversal as the relational engine
- Per-dtype priors — no per-table training step
- Cardinality per edge estimated from schema metadata only
- Deterministic seed → same skeleton every time
- Same sealed evidence bundle as every other engine
Best for
Integration tests, UI demos, schema-level API testing, data-platform smoke runs
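The per-dtype-prior idea fits in a few lines: one plausible value generator per column type, fed from a single seeded RNG so the skeleton is reproducible. The dtype names and priors below are invented for illustration; they are not the engine's actual type system.

```python
import random
import string
from datetime import date, timedelta

def skeleton_value(dtype: str, rng: random.Random):
    """Draw one schema-plausible value from a toy per-dtype prior."""
    if dtype == "int":
        return rng.randint(0, 10_000)
    if dtype == "float":
        return round(rng.uniform(0, 1_000), 2)
    if dtype == "bool":
        return rng.random() < 0.5
    if dtype == "date":
        return date(2020, 1, 1) + timedelta(days=rng.randrange(2_000))
    # Fallback: short random string for text-like columns.
    return "".join(rng.choices(string.ascii_lowercase, k=8))

def skeleton_rows(schema: dict[str, str], n: int, seed: int) -> list[dict]:
    """Deterministic seed -> same skeleton rows every time."""
    rng = random.Random(seed)
    return [{col: skeleton_value(t, rng) for col, t in schema.items()}
            for _ in range(n)]
```

No fitting step touches the source rows, which is why this path is safe for smoke tests and fast enough to run inside a CI job.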
SDK
One call. Five engines. Verifiable output.
The SDK is a thin Python client: you hand it a source (S3 path, Snowflake table, CSV upload), a sealed contract, a seed, and the engine selector. The platform compiles the sealed contract, runs the engine, emits the 9-artefact bundle, and blocks your thread until either the quality gates pass or the fail-closed policy aborts.
- Deterministic: same sealed contract + seed → byte-identical output
- Fail-closed: DCR < 0.05 or MIA-AUC > 0.65 → job aborts before emitting
- Offline verifiable: evidence verifier CLI re-runs the BLAKE3 chain locally
- Engine-agnostic API: swap flagship → diffusion with one string
import os

from radmah_sdk import RadMah

client = RadMah(api_key=os.environ["RADMAH_API_KEY"])

# Flagship engine (default) — 95.69 % benchmark-certified fidelity
job = client.synthesize(
    source="s3://acme-prod/raw/customers_2026q1.parquet",
    engine="flagship",  # alternates available in the docs
    rows=1_000_000,
    contract={
        "pk": ["customer_id"],
        "constraints": ["balance >= 0", "age BETWEEN 18 AND 120"],
        "privacy": {"membership_risk_ceiling": "strict"},
    },
    seed=42,  # sealed contract + seed = byte-identical re-run
)

job.wait()  # blocks until quality gates pass or fail-closed
print(job.evidence.utility_report)  # {'ks_median': 0.97, 'corr_frobenius': 0.02, ...}

# Verify the 9-artefact BLAKE3 bundle offline
client.verify(job.evidence.bundle_path)
# → BundleVerified(hsfg_seal=..., chain_ok=True, artefact_count=9)
Measured fidelity
Quality metrics — emitted, not claimed.
Every synthetic dataset ships with quantitative fidelity and privacy measurements inside the evidence bundle. No subjective claims — the bundle proves the numbers and evidence verifier confirms the chain offline.
Distributional similarity
Strong per-column match between real and synthetic marginals
Per-column distributional similarity is measured between the real and synthetic marginals on every run. The flagship engine has been independently QA-certified at 95.69 % fidelity on a reference benchmark. The full number set is emitted into the sealed utility report artefact on every run — see /verify for a downloadable bundle.
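One common way to score a per-column marginal match is one minus the two-sample Kolmogorov–Smirnov statistic, so 1.0 means the synthetic marginal tracks the real one exactly. This is a sketch of that metric family; the platform's exact definition is whatever the sealed utility report emits.

```python
import numpy as np

def ks_similarity(real: np.ndarray, synth: np.ndarray) -> float:
    """1 minus the KS statistic between two numeric samples.
    1.0 is a perfect marginal match; 0.0 means fully disjoint."""
    grid = np.sort(np.concatenate([real, synth]))
    # Empirical CDFs of both samples evaluated on a shared grid.
    cdf_r = np.searchsorted(np.sort(real), grid, side="right") / real.size
    cdf_s = np.searchsorted(np.sort(synth), grid, side="right") / synth.size
    return 1.0 - float(np.max(np.abs(cdf_r - cdf_s)))
```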
Correlation preservation
Inter-column dependency structure preserved, with a fail-closed gate
The pairwise correlation matrix is compared between the real and synthetic slices to quantify how well inter-column dependency structure is preserved. Every run either passes the correlation gate and ships — or aborts with a fail-closed quality failure. Exact numbers land in the sealed utility report.
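A typical correlation-preservation score is the Frobenius norm of the difference between the two Pearson correlation matrices, where 0.0 means the dependency structure is reproduced exactly. A sketch of that comparison (the gate threshold itself is platform-configured, not shown here):

```python
import numpy as np

def corr_frobenius(real: np.ndarray, synth: np.ndarray) -> float:
    """Frobenius distance between real and synthetic Pearson
    correlation matrices; lower is better, 0.0 is a perfect match."""
    cr = np.corrcoef(real, rowvar=False)
    cs = np.corrcoef(synth, rowvar=False)
    return float(np.linalg.norm(cr - cs, ord="fro"))
```

A fail-closed gate then reduces to a single comparison such as `corr_frobenius(real, synth) <= threshold`: ship on pass, abort on fail.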
Privacy risk metrics
Membership-inference, attribute-inference, and disclosure metrics — all emitted
Nearest-neighbour distance distribution, membership-inference resistance, attribute-inference resistance, and disclosure-risk metrics are all measured on the synthetic output against the source and written into the privacy report. Zero PII is guaranteed by construction: synthetic rows are sampled from the learned joint, never copied from source records.
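The nearest-neighbour measurement is often reported as distance-to-closest-record (DCR): for each synthetic row, its distance to the nearest real row. A copied row scores exactly 0, so the low tail of this distribution flags memorisation. A brute-force sketch, assuming numeric columns and Euclidean distance; the platform's privacy report defines its own normalisation.

```python
import numpy as np

def dcr(real: np.ndarray, synth: np.ndarray) -> np.ndarray:
    """Distance-to-closest-record for every synthetic row.
    A value of 0.0 means the row is an exact copy of a source record."""
    # Pairwise differences, shape (n_synth, n_real, n_cols).
    diffs = synth[:, None, :] - real[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)
```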
Sealed evidence bundle
Multi-artefact, cryptographically chained, verifiable offline
Every run emits a contract snapshot, determinism report, constraint report, utility report, privacy report, run telemetry, engine manifest, artefact index, and a chain seal. The open-source verifier CLI replays the cryptographic chain end-to-end, offline, no network. Third parties can independently confirm every metric.
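The chained-hash idea behind the seal can be sketched with the standard library: each link's digest folds in the previous one, so modifying any artefact changes every later link and the final digest no longer matches the recorded seal. BLAKE2b stands in below for the platform's BLAKE3, which is not in the Python stdlib; function names are illustrative.

```python
import hashlib

def seal_chain(artefacts: list[bytes]) -> str:
    """Fold ordered artefact bytes into one chained digest (sketch)."""
    digest = b""
    for blob in artefacts:
        # Link i depends on link i-1, so tampering cascades forward.
        digest = hashlib.blake2b(digest + blob).digest()
    return digest.hex()

def verify_chain(artefacts: list[bytes], seal: str) -> bool:
    """Replay the chain offline and compare against the recorded seal."""
    return seal_chain(artefacts) == seal
```

Replaying this loop needs only the artefact bytes and the seal, which is why verification works offline with no network access.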
Enterprise compliance
Audit-ready from day one.
Cryptographic evidence bundles, privacy risk metrics, and deterministic reproducibility give your compliance team everything a DPIA, ISO 27001 data-sharing control, or SOC 2 audit needs.
Cryptographic evidence bundles
Every synthetic generation produces a signed multi-artefact bundle covering the sealed contract, determinism proof, constraint and utility reports, privacy metrics, run telemetry, engine manifest, artefact index, and release seal. Cryptographic hashes chain every artefact; any modification breaks the chain and the evidence verifier refuses to certify.
GDPR Article 89 alignment
Synthetic data that does not relate to an identified or identifiable natural person falls outside the material scope of GDPR Articles 5–15. Our evidence bundles document the generation process and a full set of privacy-risk metrics — disclosure risk, membership-inference resistance, and attribute-inference resistance — supporting Article 89 research-exemption claims and DPIA submissions.
Audit-ready provenance chain
Evidence bundles provide the traceability, reproducibility, and integrity verification compliance auditors require. Every generation is deterministic — same sealed contract + same seed → byte-identical output on any host, any time. The audit trail is cryptographically immutable.
Zero PII in output by construction
Synthetic records are sampled from the learned joint distribution, never copied from source rows. Privacy reports measure disclosure risk, membership-inference resistance, and attribute-inference resistance for every run. When risk exceeds the enterprise-configurable thresholds, the job fails closed before emitting any synthetic data.
Cross-border-transfer enabler
Synthetic output enables cross-border development and analytics without transferring personal data. Teams in different jurisdictions work on statistically faithful datasets while source data remains in its sovereign storage. Removes the need for Standard Contractual Clauses on the synthetic artefact.
Deterministic reproducibility
A sealed contract plus a seed produces byte-identical synthetic output on any host, at any time. A cryptographically strong, seed-reproducible RNG with cross-platform deterministic arithmetic guarantees consistency across clusters and cloud providers. The determinism report in the evidence bundle records the exact RNG state at every checkpoint.
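Deriving an independent stream per checkpoint from one master seed is a standard way to get host-independent determinism: hash the seed together with a stable checkpoint label, then seed a generator from the digest. The label-hashing scheme below is an assumption for illustration, not the engine's actual RNG.

```python
import hashlib

import numpy as np

def checkpoint_rng(master_seed: int, checkpoint: str) -> np.random.Generator:
    """Derive a per-checkpoint RNG from a master seed (sketch).
    The same (seed, label) pair yields an identical stream on any host
    because the derivation is pure hashing, not platform state."""
    h = hashlib.sha256(f"{master_seed}:{checkpoint}".encode()).digest()
    return np.random.default_rng(int.from_bytes(h[:8], "big"))
```

Recording each checkpoint's derived state, as the determinism report does, lets a verifier replay any single stage without re-running the whole pipeline.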
Bring a CSV. Leave with an evidence bundle.
30-minute working session: you upload (or we mock) a representative dataset, we run it through the flagship and one alternate engine, and you keep the signed 9-artefact BLAKE3 bundle plus the utility report. All five engines are available on every plan — Free, Sovereign, and Enterprise — with tier-specific credit allocations.
Explore the platform
Mock Data
Deterministic schema-driven fabrication when statistical fidelity is not the goal.
Learn more
Autonomous Data Scientist
48-module agent with 43 typed planner tools orchestrates Synthesize across every wired engine with HAGP approval gates.
Learn more
Healthcare FHIR
HL7 FHIR R4 conformant bundles with LOINC, RxNorm, ICD-10-CM.
Learn more
Verify the evidence
Download and validate a real 9-artefact BLAKE3 bundle right now — no account required.
Learn more