The deterministic synthetic-data platform.
Where tabular-only synthesizers stop, we keep going.
Tabular records, HL7 FHIR R4 patient bundles, industrial SCADA on real OT protocols, and pre-labelled MITRE ATT&CK ICS attack datasets — under one evidence chain. Same request, same seed: byte-identical output on any cluster, a year from today.
Built for data, healthcare, and OT/SOC teams whose production records can’t ship to vendors, partners, or research collaborators. Every run is reproducible byte-for-byte and sealed with a cryptographic evidence chain your regulator or auditor can verify offline. Drive the whole platform from a typed Python SDK, an OpenAPI 3.1 REST surface, or plain English — with a visible cost gate before anything expensive runs.
- Tabular fidelity vs real data (independent benchmark): 95.69 %
- Industrial protocols, wire-level encoded to their published specifications: 6
- MITRE ATT&CK for ICS techniques, ground-truth labelled: 67
- CI-tested plant templates across 6 industrial sectors: 67
- FHIR R4 core patient resource types: all 8
- Typed tools wired into the Autonomous Data Scientist (ADS): 43
- When a step fails: the ADS self-heals
- Cryptographic seal (BLAKE3): every run
- Source connectors, encrypted in transit and at rest: 14
- Same prompt + same seed: byte-identical output
Built for four audiences, integrated by construction.
We didn’t glue four products together. The pillars share one sealed-job format, one evidence chain, one tenant model, one connector vault, one agent runtime — so a SCADA run can feed a Synthesize job, an attack mix can be labelled by the same agent that cleans your CSV, and every artefact lands in the same auditable place.
Tabular synthesis at 95.69 % benchmark-certified fidelity — five engines under one contract.
Five different generation engines — one enterprise flagship plus four alternates, including one built specifically for linked tables like customers + orders + payments — all driven through the same sealed request format. Start from a one-line description, feed your own CSV, or let the autonomous agent decide which engine fits your job. Every output joins the same cryptographic receipt chain. The AI Assistant lets a non-engineer drive the whole thing in plain English, with a visible cost estimate before anything expensive runs.
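The relational engine is only described at product level here. The general parent-first cascade pattern it alludes to works like this: generate parent rows, then create each child row from a parent that already exists, so foreign keys can never dangle. Below is a minimal stdlib sketch of that pattern. `generate_cascade`, the table shapes, and the value ranges are hypothetical. This is not RadMah AI's engine.

```python
import random


def generate_cascade(seed: int, n_customers: int, max_orders: int) -> dict:
    """Hypothetical parent-first cascade: customers -> orders -> payments."""
    rng = random.Random(seed)  # one private, seeded generator drives every table
    customers = [{"customer_id": i, "region": rng.choice(["north", "south"])}
                 for i in range(1, n_customers + 1)]
    orders, payments = [], []
    for customer in customers:
        # Children are generated from an existing parent row, so every
        # foreign key points at a key that is already in the parent table.
        for _ in range(rng.randint(0, max_orders)):
            order_id = len(orders) + 1
            amount = round(rng.uniform(5.0, 500.0), 2)
            orders.append({"order_id": order_id,
                           "customer_id": customer["customer_id"],
                           "amount": amount})
            payments.append({"payment_id": len(payments) + 1,
                             "order_id": order_id,
                             "paid": amount})  # each payment settles its order
    return {"customers": customers, "orders": orders, "payments": payments}


tables = generate_cascade(seed=7, n_customers=3, max_orders=2)
```

Because children are created while iterating over parents, integrity holds by construction rather than being repaired afterwards.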
Eight HL7 FHIR R4 resource types, shipped clinical vocabularies, 100 % referential integrity, zero PHI.
Eight kinds of patient record are generated as a single internally consistent bundle: demographics (Patient), visits (Encounter), diagnoses (Condition), lab results (Observation), prescriptions (MedicationRequest), allergies (AllergyIntolerance), procedures (Procedure), and immunisations (Immunization). Every cross-reference inside the bundle points to a record that actually exists there (100 % referential integrity), not to a placeholder patched later. The standard medical code lists (LOINC for lab tests, RxNorm for drugs, ICD-10-CM for diagnoses) ship loaded inside the image under their free licences; SNOMED CT stays bring-your-own-licence. Two validators check every bundle before it ships, our own structural check and a Python-native full R4 conformance check, so nothing malformed can leave. For a clinical-trial sponsor or EHR integrator, this is the PHI-free cohort you can hand to a research partner without a two-month IRB negotiation: cryptographically sealed, byte-reproducible from the same seed a year from today, and accepted by the validators your downstream toolchain already runs. For an FDA 21 CFR Part 11 or HIPAA de-identification reviewer, it is the provenance chain that lets them trace every row back to the request that authorised it.
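One way to get referential integrity "from the start" rather than by patching is to mint every resource id before writing any reference. The sketch below does that for three of the eight resource types. It uses FHIR R4 JSON shapes and `urn:uuid:` references that resolve against each entry's `fullUrl`. LOINC 8867-4 (heart rate) is a real code. `make_bundle`, the UUID namespace choice, and the value ranges are illustrative assumptions, not the platform's generator.

```python
import random
import uuid


def make_bundle(seed: int) -> dict:
    """Hypothetical sketch: 3 of the 8 resource types, ids minted before any reference."""
    rng = random.Random(seed)
    # Deterministic ids: the same seed always yields the same UUIDs.
    pid, eid, oid = (uuid.uuid5(uuid.NAMESPACE_URL, f"example:{seed}:{kind}")
                     for kind in ("patient", "encounter", "observation"))
    patient = {"resourceType": "Patient", "id": str(pid),
               "gender": rng.choice(["female", "male"]),
               "birthDate": f"{rng.randint(1940, 2005)}-01-01"}
    encounter = {"resourceType": "Encounter", "id": str(eid), "status": "finished",
                 "class": {"system": "http://terminology.hl7.org/CodeSystem/v3-ActCode",
                           "code": "AMB"},
                 "subject": {"reference": f"urn:uuid:{pid}"}}
    observation = {"resourceType": "Observation", "id": str(oid), "status": "final",
                   "code": {"coding": [{"system": "http://loinc.org",
                                        "code": "8867-4", "display": "Heart rate"}]},
                   "subject": {"reference": f"urn:uuid:{pid}"},
                   "encounter": {"reference": f"urn:uuid:{eid}"},
                   "valueQuantity": {"value": rng.randint(55, 95), "unit": "/min",
                                     "system": "http://unitsofmeasure.org",
                                     "code": "/min"}}
    # Each entry's fullUrl is the address the urn:uuid references above resolve to.
    return {"resourceType": "Bundle", "type": "collection",
            "entry": [{"fullUrl": f"urn:uuid:{r['id']}", "resource": r}
                      for r in (patient, encounter, observation)]}
```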
Six OT protocols encoded at the binary level of their published specifications, 67 MITRE ATT&CK for ICS techniques, 67 CI-gated plant templates.
The six industrial protocols found across real plants and buildings (Modbus, OPC-UA, BACnet, MQTT, DNP3, IEC 61850), simulated at the exact binary level your equipment sees on the wire. Physics-accurate process models for water-treatment, power, and chemical plants (no canned CSVs). Air-gapped virtual controllers you can run entirely inside your VPC for red-team exercises or operator training. Pre-labelled attack datasets mapped to the public MITRE ATT&CK for ICS framework, with realistic blast-radius modelling (when an attacker hits valve A, what does sensor B read?). Compose the process models, virtual controllers, and attack datasets into a sealed cyber-range your IDS or SOC platform can score against: the same scenarios, scored the same way, in every vendor evaluation.
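For Modbus TCP, "binary level" means the exact byte layout below: a 7-byte MBAP header followed by the protocol data unit. This hedged sketch encodes a response to function 0x03 (Read Holding Registers) following the public Modbus Application Protocol and Modbus TCP specifications. The function name and example register values are hypothetical. The platform's own encoder is not shown on this page.

```python
import struct


def encode_read_holding_registers_response(transaction_id: int, unit_id: int,
                                           registers: list) -> bytes:
    """Modbus TCP response to function 0x03 (Read Holding Registers)."""
    if not 1 <= len(registers) <= 125:
        raise ValueError("a single response carries 1..125 registers")
    # PDU: function code, byte count, then each register as a big-endian uint16.
    pdu = struct.pack(">BB", 0x03, 2 * len(registers))
    pdu += struct.pack(f">{len(registers)}H", *registers)
    # MBAP header: transaction id, protocol id (always 0 for Modbus),
    # length (unit id byte + PDU), unit id.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


# Register 0 holds 555 (0x022B), register 1 holds 0; unit id 17 (0x11).
frame = encode_read_holding_registers_response(1, 17, [0x022B, 0x0000])
print(frame.hex())  # → 000100000007110304022b0000
```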
One typed surface, four entry points.
Four ways to talk to the platform, all on the same contract: the Python SDK, the REST API, the command-line interface, and the plain-English AI Assistant. The Python SDK is fully typed end-to-end (Pydantic v2, async-first), so your IDE can autocomplete it and your CI can contract-test it. The OpenAPI 3.1 REST surface supports idempotency keys (safe retries) and cryptographically signed webhook payloads, so you can prove nobody tampered with them in transit. A first-class command-line interface wraps the same sealed-job contract for shell-scripted pipelines, evidence-bundle inspection, and offline verification. Fourteen pre-built source connectors cover the systems teams actually use (warehouses, object stores, databases, Hugging Face), with secrets auto-vaulted to an encrypted store instead of sitting in config files. A single engineer can wire RadMah AI into a real data plane in one afternoon, and the contract-test runner drops straight into your CI.
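The page says webhook payloads are signed but does not name the scheme. A common pattern, assumed here purely for illustration, is an HMAC-SHA256 over the raw request body with a shared per-endpoint secret, checked in constant time. Unlike a bare hash, an HMAC needs the secret, so someone who edits the body in transit cannot recompute a valid signature. The secret, the function names, and how the signature travels (typically an HTTP header) are hypothetical.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Sender side: HMAC-SHA256 over the exact raw bytes of the payload."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Receiver side: recompute and compare in constant time."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode("ascii"),
                               received_signature.encode("utf-8"))
```

Verify against the raw bytes before parsing the JSON. Re-serialising a parsed payload can change its bytes and break an otherwise valid signature.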
Six surfaces, one evidence chain.
A walk through the real surface: a live SCADA HMI driving real industrial protocols, the sealed ICS attack timeline, the agent chat that drives the whole platform, the typed SDK with its live REPL, a FHIR R4 patient bundle, and the Autonomous Data Scientist's plan card, each backed by the cryptographically chained evidence bundle an auditor opens offline. Every panel below is wired through the same sealed-job format, the same cryptographic chain, and the same tenant vault.
Six industrial tags on real wire protocols.
Six live sensor readings stream from a simulated pump station: discharge pressure, suction pressure, motor temperature, variable-frequency-drive output, flow rate, and an over-speed alarm. The simulator speaks two real industrial standards, Modbus and OPC-UA, at the same 500-messages-per-second rate your actual equipment uses. The values come out of a physics-accurate process model (not canned CSVs), so an IDS, SIEM, or operator-training suite can't tell this stream apart from a real plant. The seal at the bottom is the run's cryptographic receipt: proof that nobody tampered with the stream after it was generated.
For the SOC team that needs a training set and the OT engineer who cannot hand over real plant tapes: this is your repeatable dataset. Every run is byte-identical under the same seed, so your detection rules have a stable target, and every value in the stream is inside the physical envelope your equipment actually operates in.
For the non-technical reader — the plant manager, the procurement lead, the board member — the plain-English read is this: we produce the kind of industrial data your engineering team has always needed but could never ship off-site. The protocols are real so the receiving tool (IDS, SIEM, operator-training suite) cannot tell the difference. The physics is real so nothing in the stream asks your equipment to do the impossible. The seal at the bottom of the panel is proof the receiving party can verify without involving us — that is what makes it distributable to a vendor, a partner, or a regulator without a fresh legal review every time.
- ◆ 67 CI-tested plant templates across 6 industrial sectors
- ◆ 6 protocols — Modbus, OPC-UA, BACnet, MQTT, DNP3, IEC 61850
- ◆ Sealed signals + alarms + commands streams per run
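The "physical envelope" can be made concrete as a hard projection. Each sample is first limited to a maximum change from the previous sample, then clamped into absolute bounds. This minimal sketch assumes only those two constraint types. The bounds and rate below are hypothetical numbers for a discharge-pressure tag, not values from the platform.

```python
def project_trajectory(values: list, lo: float, hi: float, max_step: float) -> list:
    """Rate-limit each sample against the previous projected one, then clamp to [lo, hi]."""
    out = []
    for v in values:
        if out:
            prev = out[-1]
            v = min(max(v, prev - max_step), prev + max_step)  # rate limit
        out.append(min(max(v, lo), hi))  # hard bounds
    return out


# Hypothetical envelope: 0 to 10 bar, at most 0.5 bar change per sample.
print(project_trajectory([2.0, 9.0, 12.0, -1.0], lo=0.0, hi=10.0, max_step=0.5))
# → [2.0, 2.5, 3.0, 2.5]
```

Clamping after rate-limiting never breaks the rate limit. The previous value already lies inside the bounds, so clamping can only pull the new value closer to it.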
Ground-truth attacks, not heuristic guesses.
A six-minute attack window recorded against the SCADA pressure trace, with every attacker move pre-labelled: command injection, operator-view spoofing, parameter tampering, alarm suppression, and controller program modification. Each labelled event carries its exact start and stop time, the public MITRE ATT&CK for ICS technique ID it maps to, and the piece of equipment it targeted, all written into a machine-readable file called truth.ndjson. Your intrusion-detection system no longer needs a human analyst to hand-label the validation set: the ground truth ships with the bundle.
The commercial win: your SOC analyst stops hand-labelling thousands of rows to benchmark a detection rule, and your vendor selection process stops relying on a red-team engagement you can only run once a quarter. Drop the sealed bundle into the IDS under evaluation, compare per-event precision and recall against ground truth, and call the vendor meeting with numbers, not vibes.
For the non-technical reader: think of it as the cyber-security equivalent of a driving-test track — realistic enough that the skill transfers to the real road, safe enough that you can run it every week. Every event on the timeline is labelled in a way a regulator and a vendor both accept, so you can run the same scenario against five different security tools and compare scores apples-to-apples. No more "trust our red team"; the proof is in the file.
- ◆ 9 MITRE ATT&CK ICS classes wired end-to-end
- ◆ pcapng + signals.parquet + truth.ndjson in one sealed bundle
- ◆ Per-event severity + impact classes for regression-testing detection rules
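The per-event precision and recall comparison described above can be sketched as follows. The page does not give the truth.ndjson schema, so the field names (`start`, `end`, `technique`, `target`) and the use of seconds from the window start are assumptions. The matching rule is another assumption: an alert counts if it overlaps a truth event in time and names the same technique. T0855 (Unauthorized Command Message) and T0836 (Modify Parameter) are real ATT&CK for ICS IDs, used here only as sample data.

```python
import json


def load_ndjson(text: str) -> list:
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def overlaps(a: dict, b: dict) -> bool:
    return a["start"] <= b["end"] and b["start"] <= a["end"]


def score(truth: list, alerts: list) -> tuple:
    """Precision over alerts, recall over labelled events."""
    def match(alert: dict, event: dict) -> bool:
        return alert["technique"] == event["technique"] and overlaps(alert, event)

    true_alerts = sum(any(match(a, e) for e in truth) for a in alerts)
    detected = sum(any(match(a, e) for a in alerts) for e in truth)
    precision = true_alerts / len(alerts) if alerts else 0.0
    recall = detected / len(truth) if truth else 0.0
    return precision, recall


truth = load_ndjson(
    '{"start": 60, "end": 95, "technique": "T0855", "target": "PUMP-01"}\n'
    '{"start": 200, "end": 260, "technique": "T0836", "target": "VFD-01"}\n')
alerts = [{"start": 70, "end": 72, "technique": "T0855"},
          {"start": 300, "end": 301, "technique": "T0836"}]
print(score(truth, alerts))  # → (0.5, 0.5)
```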
Plain-English driver, explicit cost gate.
One chat window drives every engine on the platform: quick mock data, full tabular synthesis, virtual factory simulation, attack composition, and the autonomous agent. Before anything expensive runs, the assistant returns a plan card showing the concrete steps it intends to take, the compute class it will use, and a credit estimate in plain numbers. Nothing is charged to your account until you tap Approve. The entire back-and-forth (your request, the plan, the approval, every tool call the agent made, and the sealed output) lands in the same receipt bundle, so an auditor can reconstruct exactly what happened and why.
For a finance or procurement lead worried about AI-agent cost runaway: there is no runaway here. The agent never bills a cent until a human approves the plan, and every penny of credit spend is in the transcript for the auditor. The analyst who drives it does not need to learn the SDK — they describe the job in English, read the plan, and click.
For the non-technical reader: imagine hiring a data-science contractor who shows you their proposed plan, their estimated bill, and waits for your approval before starting. That is the assistant. It writes down what it intends to do, how much it will cost, which tool it will use, and what the deliverable will be — all before spending a credit. A procurement lead can approve runs the way they already approve purchase orders.
- ◆ Soft-cap per turn, hard-cap per project — the agent stops before it overspends
- ◆ Every tool call is typed, versioned, and signed into the run ledger
- ◆ Transcript + plan + approvals are part of the cryptographic hash-sealed bundle
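The exact cap semantics are not spelled out here, so this sketch assumes one reading of them. A plan whose estimate would push project spend past the hard cap is refused outright. An estimate above the per-turn soft cap is flagged so the agent halts and asks. Nothing is charged without an explicit human approval. `CostGate`, `SpendRefused`, and the outcome labels are hypothetical names.

```python
from dataclasses import dataclass


class SpendRefused(Exception):
    """Raised when a plan may not be charged."""


@dataclass
class CostGate:
    soft_cap_per_turn: float      # above this, the plan card is flagged and the agent asks
    hard_cap_per_project: float   # never exceeded, even with approval
    project_spent: float = 0.0

    def review(self, estimate: float) -> str:
        """Classify a plan's credit estimate before anything is charged."""
        if self.project_spent + estimate > self.hard_cap_per_project:
            return "over_hard_cap"
        if estimate > self.soft_cap_per_turn:
            return "over_soft_cap"
        return "within_caps"

    def charge(self, estimate: float, approved: bool) -> None:
        if self.review(estimate) == "over_hard_cap":
            raise SpendRefused("plan would exceed the project hard cap")
        if not approved:
            raise SpendRefused("no credits are charged without human approval")
        self.project_spent += estimate
```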
Python on the left, real-time run events on the right.
A Python SDK that's fully typed end-to-end: your IDE autocompletes every field, your type checker catches every misspelled field name, and your CI can run a contract test before you ship a single line. On the right, a live REPL stream from the same run shows training-loss numbers, quality-gate pass/fail events, seal events, and the final cryptographic verification. The SDK uses exactly the same object shapes as the REST API, so the work you do in a notebook on day one is the work you put into production on day ten, with no rewrite.
For the integration engineer: this is a one-afternoon job, not a three-week sprint. Import the client, configure an API key, hand us a dataset or a connector, and you get a sealed bundle back. The same SDK drives every engine on the platform, so the time you spend learning the Synthesize surface also teaches you Virtual SCADA and the Autonomous Data Scientist.
For the non-technical reader: the SDK is the piece that lets your engineering team automate everything shown on this marketing site — the synthetic-data runs, the industrial simulators, the FHIR bundles, the orchestration — from inside your own CI pipeline, without logging into a dashboard. Every run still produces the same sealed evidence bundle a regulator or an auditor can check offline, so automating the work does not cost you the provenance story.
- ◆ 100% type coverage · async-first primitives
- ◆ Identical object model across SDK and REST surface
- ◆ Contract-test runner drops into your CI out of the box
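The SDK is described as using Pydantic v2. To stay dependency-free, this sketch uses a standard-library dataclass to illustrate the contract-test idea. One object model serialises to the JSON a REST call would carry and parses back to an identical object, and unknown fields are rejected rather than silently dropped. `SynthesizeRequest`, its fields, and the `"flagship"` default are hypothetical, not the real SDK model.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class SynthesizeRequest:
    """Hypothetical request model shared by SDK calls and REST payloads."""
    dataset: str
    rows: int
    seed: int
    engine: str = "flagship"

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "SynthesizeRequest":
        data = json.loads(text)
        unknown = set(data) - {f.name for f in fields(cls)}
        if unknown:  # reject schema drift instead of silently dropping fields
            raise ValueError(f"unknown fields: {sorted(unknown)}")
        return cls(**data)


def test_sdk_and_rest_shapes_agree() -> None:
    request = SynthesizeRequest(dataset="claims.csv", rows=10_000, seed=42)
    assert SynthesizeRequest.from_json(request.to_json()) == request
```

A test runner such as pytest would collect `test_sdk_and_rest_shapes_agree` automatically. A mismatch between the two surfaces then fails CI before it reaches production.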
Eight resources, one bundle, zero PHI.
One synthetic patient with a full medical story — their visit, their diagnosis, their lab results, their prescription, their allergies, their procedures, their vaccinations — all generated as a single internally-consistent bundle. Every cross-reference inside the bundle (which visit produced which lab result, which prescription treated which condition) points to a record that actually exists there; the generator builds consistency in from the start, not as a post-hoc patch. The standard medical code lists — LOINC for lab tests, RxNorm for drugs, and the full US ICD-10-CM catalogue — ship inside the image. Two validators run before any bundle leaves, and no real patient row is ever touched.
For the clinical-trial sponsor, the EHR team, and the health-tech product lead: this is the PHI-free cohort you can actually ship to a partner without a two-month IRB negotiation. Every bundle is cryptographically sealed and reproducible, so a regulator can open it a year from today and get the same file.
- ◆ 8 HL7 FHIR R4 core resources · 100 % referential integrity
- ◆ LOINC subset, RxNorm set, full US ICD-10-CM shipped
- ◆ Two-stage validator · structural + R4 datatype conformance
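Referential integrity in a FHIR Bundle can be checked mechanically. First collect every address an entry can be reached by, then walk each resource for `reference` strings that point nowhere. This sketch handles `fullUrl` and relative `Type/id` references only; contained `#id` references and external URLs are out of scope. `find_dangling_references` is a hypothetical helper, not either of the platform's validators.

```python
def find_dangling_references(bundle: dict) -> list:
    entries = bundle.get("entry", [])
    # Every address an entry can be reached by: its fullUrl and "Type/id".
    known = {e["fullUrl"] for e in entries if "fullUrl" in e}
    known |= {f'{e["resource"]["resourceType"]}/{e["resource"]["id"]}'
              for e in entries if "id" in e.get("resource", {})}
    dangling = []

    def walk(node) -> None:
        # Recurse through nested objects and arrays looking for "reference" strings.
        if isinstance(node, dict):
            ref = node.get("reference")
            if isinstance(ref, str) and ref not in known:
                dangling.append(ref)
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    for entry in entries:
        walk(entry.get("resource", {}))
    return dangling
```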
One plan. Four engines. One sealed bundle.
The autonomous data scientist takes a plain-English request and breaks it down into a sequence of concrete steps — one engine for the customer table, another for the clinical cohort, another for the factory telemetry, a physics-projection step to correct any out-of-envelope values, and a final sealing step that chains every output's cryptographic hash into one verifiable root. The plan card on the right is exactly what the operator sees in production before approving the run: what the agent intends to do, in what order, at what cost.
For the buyer comparing us with a point-tool: this is the surface a point tool can't build without rewriting every engine it doesn't own. Every step is typed, every transition is auditable, every cost is shown to the operator before spend — and the final bundle is the same artefact regardless of which engines the plan used.
- ◆ Planner + executor + self-healer under one typed tool surface
- ◆ Cost gate on every expensive step · no runaway agent spend
- ◆ Transcript + plan + approvals sealed inside every bundle
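The planner/executor/self-healer split can be illustrated with a small executor. It refuses to run an unapproved plan, runs typed steps in order, and gives a repair hook one chance to fix a failing step. Every attempt is written to a transcript. The real self-healer's behaviour is not documented here. `Step`, `Executor`, and the single-retry policy are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    run: Callable[[dict], dict]  # reads shared state, returns outputs to merge into it


@dataclass
class Executor:
    # Hypothetical self-healer hook: given the failed step and its error,
    # return a repaired step, or None to give up.
    repair: Callable[[Step, Exception], Optional[Step]]
    transcript: list = field(default_factory=list)

    def execute(self, plan: list, approved: bool) -> dict:
        if not approved:
            raise PermissionError("plan not approved; nothing was run")
        state: dict = {}
        for step in plan:
            try:
                state.update(step.run(state))
                self.transcript.append(f"ok {step.name}")
            except Exception as exc:
                self.transcript.append(f"failed {step.name}: {exc}")
                fixed = self.repair(step, exc)
                if fixed is None:
                    raise  # unrecoverable: abort the run (fail closed)
                # One repair attempt; if the repaired step also fails, that error propagates.
                state.update(fixed.run(state))
                self.transcript.append(f"healed {step.name}")
        return state
```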
Every job runs through six guarantees.
The same six guarantees apply whether the engine is Mock, Synthesize, Virtual SCADA, ICS Security, or an autonomous agent. They are not premium features you turn on — they are the only path through the platform.
Sealed before run
Every job opens with a sealed job specification that captures what you asked for and how it must be generated. Once committed it cannot be silently mutated; the spec's cryptographic hash is the root of the entire downstream evidence chain.
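A sealed specification can be approximated by hashing a canonical serialisation of the request. Key order and whitespace then cannot change the digest, but any change to the content does. This sketch uses a simplified canonical form: sorted keys and compact separators. That is not the full RFC 8785 JSON Canonicalization Scheme, which also pins number formatting. It also uses the standard library's BLAKE2b as a stand-in for BLAKE3, which needs the third-party `blake3` package. `SealedSpec`, `seal`, and the spec fields are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SealedSpec:
    canonical: bytes   # the exact bytes that were hashed
    digest: str        # root of the downstream evidence chain


def seal(spec: dict) -> SealedSpec:
    # Canonical form: sorted keys, no insignificant whitespace, UTF-8.
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False).encode("utf-8")
    # BLAKE2b (standard library) stands in for BLAKE3 in this sketch.
    return SealedSpec(canonical, hashlib.blake2b(canonical, digest_size=32).hexdigest())


sealed = seal({"engine": "synthesize", "rows": 1000, "seed": 42})
```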
Per-step provenance
As the engine runs, each step's inputs, outputs, parameters, hardware, and timing are written to a per-step proof packet. Any tool-generated code is versioned, and every quality gate is watched in real time — if a gate trips, the failing sub-graph is repaired and the chain is rebuilt without losing what already passed.
Quality fail-closed
Distribution distance, correlation structure, constraint satisfaction, and per-column drift are checked at the end of every generation step. There is no silent degradation: a regression the self-healer cannot repair aborts the run, and the operator sees the exact gate that failed with a recommended fix.
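The page does not say which distance metrics or thresholds the gates use. As one common example of a per-column "distribution distance", this sketch computes the two-sample Kolmogorov-Smirnov statistic with the standard library and fails closed when any column exceeds a threshold. `check_columns`, `QualityGateFailed`, and the threshold are hypothetical.

```python
from bisect import bisect_right


class QualityGateFailed(Exception):
    """Raised instead of letting a degraded dataset through."""


def ks_statistic(a: list, b: list) -> float:
    """Two-sample Kolmogorov-Smirnov D: largest gap between the empirical CDFs."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    sa, sb = sorted(a), sorted(b)
    # The gap can only change at observed values, so checking those points is enough.
    return max(abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb))
               for x in sa + sb)


def check_columns(real: dict, synth: dict, max_ks: float) -> dict:
    report = {col: ks_statistic(real[col], synth[col]) for col in real}
    failed = {col: d for col, d in report.items() if d > max_ks}
    if failed:
        # Fail closed: name the exact gate and column(s) that tripped.
        raise QualityGateFailed(f"KS distance above {max_ks}: {failed}")
    return report
```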
Cryptographically-chained ledger
Each step's IO + parameters are hashed and chained into a tamper-evident ledger. The chain root makes the bundle uniquely identifiable, and any in-place mutation of any artefact downstream is immediately visible to the offline verifier — no trust in the runtime required.
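A tamper-evident ledger can be built by hashing each step record together with the previous hash. Editing any earlier record then changes every later hash, and an offline verifier only needs to recompute the chain from the artefacts. This is a generic hash-chain sketch, not a description of RadMah AI's exact construction. BLAKE2b stands in for BLAKE3, and the record shapes and function names are hypothetical.

```python
import hashlib
import json


def link(prev_hash: str, record: dict) -> str:
    """Hash one step record together with the hash of everything before it."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    # prev_hash is assumed to be a fixed-length hex digest, so prefix + payload is unambiguous.
    return hashlib.blake2b(prev_hash.encode("ascii") + payload, digest_size=32).hexdigest()


def build_chain(spec_digest: str, steps: list) -> list:
    hashes, prev = [], spec_digest
    for record in steps:
        prev = link(prev, record)
        hashes.append(prev)
    return hashes  # hashes[-1] commits to the spec and every step


def verify_chain(spec_digest: str, steps: list, hashes: list) -> bool:
    """Offline check: recompute from the artefacts, no trust in the runtime."""
    return build_chain(spec_digest, steps) == hashes
```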
Signed, portable bundle
The final .tar.zst bundle ships with the contract, the per-step run-log, the quality report, the cryptographic hash manifest, the SBOM of the engine version that produced it, and a plain-English narrative for the auditor. Re-running the contract on a different cluster yields the same dataset hash — byte for byte.
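Byte-identical re-generation depends on pinning every source of nondeterminism, not just the seed. That means a private seeded generator instead of global state, fixed number formatting, fixed line endings, and a fixed text encoding. This sketch shows those controls on a toy CSV. Python itself only guarantees that `random()` reproduces across versions for a given seed, so a real pipeline would also pin the runtime version, as the SBOM in the bundle suggests. `generate_csv` and its columns are hypothetical.

```python
import csv
import hashlib
import io
import random


def generate_csv(seed: int, rows: int) -> bytes:
    rng = random.Random(seed)                        # private generator, no global RNG state
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")    # same line endings on every OS
    writer.writerow(["id", "age", "balance"])
    for i in range(rows):
        # Fixed float formatting so the text form cannot drift.
        writer.writerow([i, rng.randint(18, 90), f"{rng.gauss(1000, 250):.2f}"])
    return buf.getvalue().encode("utf-8")


def dataset_hash(data: bytes) -> str:
    return hashlib.blake2b(data, digest_size=32).hexdigest()


assert dataset_hash(generate_csv(42, 1000)) == dataset_hash(generate_csv(42, 1000))
```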
Tenant-isolated, end to end
Per-tenant Fernet keys at rest, per-tenant artefact prefixes, per-tenant evidence keys, JWT-scoped API surface, ORM-level row filtering. There is no path through the system where data from tenant A can be physically retrieved by tenant B — not in logs, not in caches, not in run state, not in evidence bundles.
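The row-filtering and artefact-prefix part of tenant isolation comes down to one principle: the tenant filter is applied in one place on every read, and callers never supply it. This in-memory sketch illustrates only that principle. Per-tenant Fernet encryption (from the third-party `cryptography` package) and JWT verification are not shown, and the tenant id is assumed to come from an already-verified token. `Store`, `TenantScope`, and the `tenants/<id>/` prefix format are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Row:
    tenant_id: str
    key: str
    value: str


@dataclass
class Store:
    rows: list = field(default_factory=list)

    def scoped(self, tenant_id: str) -> "TenantScope":
        return TenantScope(self, tenant_id)


@dataclass(frozen=True)
class TenantScope:
    store: Store
    tenant_id: str  # assumed to come from an already-verified JWT claim

    def put(self, key: str, value: str) -> str:
        self.store.rows.append(Row(self.tenant_id, key, value))
        return f"tenants/{self.tenant_id}/{key}"  # per-tenant artefact prefix

    def get(self, key: str) -> Optional[str]:
        # The tenant filter is applied on every read, not left to callers.
        for row in self.store.rows:
            if row.tenant_id == self.tenant_id and row.key == key:
                return row.value
        return None
```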
What RadMah AI does, in concrete numbers.
A single platform that covers tabular synthesis, healthcare FHIR, industrial OT, ICS attack data, physics-constrained sampling, agentic orchestration, and cryptographic evidence. Each row below maps to code we ship today.
| Capability | What ships today |
|---|---|
| Tabular synthesis | Benchmark-certified fidelity under an independent QA harness; a flagship engine plus four alternates (including a relational cascade for linked tables) selected through one sealed contract |
| Deterministic re-generation | Sealed contract + seed — byte-identical output across clusters and time |
| Cryptographic evidence per run | Multi-artefact cryptographic bundle by default on every job; offline verifier ships with the SDK |
| Constraint-aware generation | Primary-key / foreign-key, monotonicity, sum, and rate-limit constraints; hard-projection post-processor for trajectories |
| Healthcare FHIR R4 | Core resource types with the standard clinical vocabularies shipped; premium vocabularies BYO; conformance validator at run time; zero PHI |
| Industrial OT simulators | Six OT protocols at binary-spec level (Modbus, OPC-UA, BACnet, MQTT, DNP3, IEC 61850); wire-level packet capture; physics-honest process kernels |
| ICS attack datasets | Comprehensive coverage of the public MITRE ATT&CK ICS framework with per-event ground-truth labels; blast-radius cascade modelling; configurable attack density |
| Autonomous data scientist | Planner + executor + self-healer driving a typed tool surface; cryptographically-chained decision audit; human-approval gate on credit-spending steps |
| Connector secrets | Per-tenant encrypted vault, auto-hoisted from inline config, per-tenant key-encryption-key rotation |
| Deployment options | Managed SaaS (multi-tenant); Enterprise Data Never Leaves (signed container delivery on customer's network, licence-bound distribution) |
Benchmark numbers are reproducible under the same independent QA harness using matched train / test splits. Full per-release benchmark certificates are published inside the authenticated customer console alongside every run's sealed evidence bundle; the evidence-chain page explains how the chain is built and verified offline.
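The constraint types listed in the "Constraint-aware generation" row of the table above are easiest to understand as checks on finished data. These minimal post-hoc validators illustrate what each one means. They are not how the platform enforces constraints during generation, and the function names and invoice example are hypothetical. Rate limits are covered by trajectory projection and are not repeated here.

```python
import math


def primary_key_unique(rows: list, pk: str) -> bool:
    keys = [row[pk] for row in rows]
    return len(keys) == len(set(keys))


def foreign_key_resolves(child: list, fk: str, parent: list, pk: str) -> bool:
    parent_keys = {row[pk] for row in parent}
    return all(row[fk] in parent_keys for row in child)


def monotonic_non_decreasing(values: list) -> bool:
    return all(a <= b for a, b in zip(values, values[1:]))


def sums_to(parts: list, total: float, tol: float = 1e-9) -> bool:
    # fsum avoids accumulated rounding error when adding many floats.
    return math.isclose(math.fsum(parts), total, abs_tol=tol)


invoices = [{"invoice_id": 1, "total": 30.5}]
lines = [{"invoice_id": 1, "amount": 10.0}, {"invoice_id": 1, "amount": 20.5}]
```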
Posture that survives procurement.
The four guarantees below are not toggles in a settings page. They are architectural — turning them off would mean re-writing the platform.
Tenant-isolated end-to-end
Per-tenant Fernet encryption at rest, per-tenant artefact prefixes, ORM-level row filtering, JWT-scoped API.
Spend-bounded by design
Soft caps per turn, hard caps per project; the agent halts and asks before any large spend.
Audit-ready by default
Sealed transcripts and signed bundles are not premium — they are how the platform works.
Replayable across clusters
Re-run the same sealed job spec + seed on any cluster: dataset hash matches byte-for-byte.
Bring a real dataset. We’ll ship a sealed bundle.
30-minute working session: bring (or we mock) a representative dataset and one open question. We drive Mock, Synthesize, the Autonomous Data Scientist, or any combination, end-to-end. You keep the cryptographic hash-sealed evidence bundle, the quality report, and a sandbox API key to keep iterating.