FCA SDEG governance considerations for synthetic data in UK financial services

The Financial Conduct Authority established the Synthetic Data Expert Group (SDEG) in March 2023, bringing together around 20 specialists from financial services, the public sector, technology vendors and consumer groups. SDEG produced two reports: the first, in March 2024, covered six use cases; the second and final report, in August 2025, distilled nine governance principles for firms to consider when building synthetic data programmes. Neither report is FCA guidance. The principles are explicitly non-binding considerations, derived from the AI Ethics literature and existing model risk management practice. UK firms regulated by the PRA and FCA must nonetheless translate these considerations into operational governance, because supervisors expect coherent reasoning when synthetic data appears in model documentation under SS1/23 or in fair value assessment under Consumer Duty.

§ 01 / SDEG setup and timeline

From 2022 Call for Input to August 2025 conclusion

The FCA issued a Call for Input on synthetic data in 2022, asking the market to share use cases, opportunities and concerns. The response was substantial enough to justify a structured forum. The Synthetic Data Expert Group was launched in March 2023 as a specialised sub-group of the FCA's Innovation Advisory Group, with around 20 members spanning banks, insurers, fintech firms, technology vendors, data protection specialists, academics, and consumer representatives. The mandate ran for approximately two years and was explicitly scoped to produce practical insights rather than regulatory guidance or policy.

SDEG produced two outputs. The first, "Using Synthetic Data in Financial Services", was published in March 2024 and examined six use cases across three thematic blocks. The second and final report, "Generating and using synthetic data for models in financial services: governance considerations", was published in August 2025. The second report concludes SDEG's work programme. It builds on the first paper, responds to industry feedback received in the interim, and distils what the group considers to be the nine governance principles most relevant for organisations using synthetic data in financial services.

Status disclaimer. The FCA states explicitly that the SDEG report is not guidance, does not represent recommendations or future policy, does not endorse or condemn the use of synthetic data, and does not imply compliance with UK data protection law. The considerations are non-exhaustive and will develop as synthetic data usage expands across the sector. UK firms should treat the principles as a reasoning frame, not a checklist.

The status disclaimer is consequential. A UK firm that adopts the SDEG principles as a compliance framework risks treating non-binding considerations as if they were FCA expectations, which can lead to over-engineered governance for low-risk use cases or under-engineered governance for use cases the SDEG did not cover. The principles work best as a reasoning frame against which the firm builds its own context-specific operational governance, with reasoning documented for supervisory dialogue.

§ 02 / The nine principles

What SDEG distilled from AI Ethics and MRM

The nine principles SDEG identified are not novel. They draw from established AI Ethics taxonomies and from existing model risk management practice that has informed PRA SS1/23 and other regulatory frameworks. The contribution of SDEG is not the principles themselves but the application to the specific context of synthetic data in financial services, where the data generation step introduces governance considerations absent from traditional model lifecycles.

Principle 01

Accountability

Clear roles and responsibilities across the synthetic data lifecycle, from generation through to model deployment.

Principle 02

Safety

Outcomes from models trained on synthetic data are robust, reliable, and free from foreseeable harms to consumers or markets.

Principle 03

Transparency

Documented decisions on data generation, validation, and use, accessible to relevant stakeholders and supervisors.

Principle 04

Explainability and interpretability

The synthetic data generation process and downstream model behaviour can be explained at the level needed by users and oversight functions.

Principle 05

Security and privacy

Synthetic data protects the confidentiality of source data and is generated with appropriate technical controls preventing re-identification.

Principle 06

Fairness

Synthetic data does not introduce or amplify historical bias present in the source data, and mitigates that bias where feasible, particularly where protected characteristics are affected.

Principle 07

Agency

Where synthetic data influences customer-impacting decisions, individuals retain meaningful ability to challenge or seek review of those decisions.

Principle 08

Suitability

Synthetic data is used only where its quality demonstrably meets the required risk threshold for the intended use case.

Principle 09

Continuous monitoring and improvement

Performance, fairness, and privacy properties of synthetic datasets are monitored over time and updated as conditions change.

Three principles tend to dominate operational discussion: Accountability, Fairness, and Suitability. Accountability requires a named owner across the data lifecycle, often spanning the model risk function, data protection officer, and business sponsor. Fairness requires bias testing methodology that is documented and repeatable. Suitability requires criteria for when synthetic data is appropriate versus when real data should be used despite the privacy cost.

§ 03 / Three lifecycle themes

How SDEG structures the governance question

SDEG organises its considerations around three themes that map to the synthetic data lifecycle. Each theme has its own governance challenges and characteristic failure modes.

Data augmentation and bias mitigation covers use cases where synthetic data is generated to increase the representation of under-represented groups in a training dataset, with the intent of reducing bias in downstream models. The governance question here is whether the augmentation genuinely reduces bias or whether it introduces a different bias, for instance by overweighting synthetic samples whose patterns reflect the generator's assumptions rather than reality. SDEG members noted that there are no universal tests for bias and recommended combining quantitative methods with qualitative subject matter expert judgement.

System testing and model validation is the use case most directly relevant to PRA SS1/23 obligations. Synthetic data can augment or replace real data for training and validation. The governance question is whether the synthetic data is fit for the validation purpose, which depends on whether it preserves the conditional distributions and edge cases on which the model under test should be challenged. SDEG members emphasised that statistical similarity between real and synthetic data is necessary but not sufficient: downstream model performance must be validated against an independent holdout of real data.

Internal and external data sharing covers use cases where synthetic data is shared within an organisation across divisions or externally with vendors, research institutions, or other firms. This is the most legally complex theme, because shared synthetic data must not enable re-identification of individuals in the source data, and the contractual frameworks for sharing must address residual risks. SDEG members noted that the legal status of synthetic data under UK data protection law remains unsettled and has to be assessed case by case, and that the Data (Use and Access) Act 2025 (DUAA 2025) did not specifically resolve the question.

§ 04 / TSTR methodology

Train Synthetic, Test Real

The most specific technical recommendation in the SDEG report is the Train-Synthetic-Test-Real methodology for validating models that use synthetic data in training. Under TSTR, the model is trained on synthetic data but final validation is conducted on an independent holdout of real data. If the model performs comparably on the real holdout to a baseline model trained on real data, the synthetic data is considered fit for the training purpose. If performance degrades materially, the synthetic data has failed to preserve the patterns the model needs.
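The TSTR comparison can be sketched in a few lines. The example below is a minimal illustration, not the SDEG's procedure or any firm's implementation: the data is simulated, `synthesise` is a placeholder generator (independent Gaussians fitted per class), the classifier is a simple nearest-centroid rule, and the `tolerance` of 0.02 is an assumed threshold that a firm would set according to its own risk appetite.

```python
import random
from statistics import mean, stdev


def make_real(rng, n):
    """Toy stand-in for real data: two features and a binary label."""
    rows = []
    for _ in range(n):
        label = int(rng.random() < 0.5)
        centre = 3.0 if label else 0.0
        rows.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    return rows


def synthesise(rng, rows, n):
    """Placeholder generator (assumption): independent Gaussians fitted per class."""
    labels = [y for _, y in rows]
    params = {}
    for cls in set(labels):
        xs = [x for x, y in rows if y == cls]
        params[cls] = [(mean(col), stdev(col)) for col in zip(*xs)]
    out = []
    for _ in range(n):
        cls = rng.choice(labels)  # sampling observed labels keeps the class balance
        out.append((tuple(rng.gauss(m, s) for m, s in params[cls]), cls))
    return out


def fit_centroids(rows):
    """Nearest-centroid 'model': one mean vector per class."""
    centroids = {}
    for cls in {y for _, y in rows}:
        xs = [x for x, y in rows if y == cls]
        centroids[cls] = tuple(mean(col) for col in zip(*xs))
    return centroids


def accuracy(centroids, rows):
    def predict(x):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )
    return sum(predict(x) == y for x, y in rows) / len(rows)


def tstr(seed=0, tolerance=0.02):
    rng = random.Random(seed)
    real = make_real(rng, 3000)
    train_real, holdout_real = real[:2000], real[2000:]  # holdout is never shown to the generator
    synthetic = synthesise(rng, train_real, 2000)
    baseline = accuracy(fit_centroids(train_real), holdout_real)  # train real, test real
    candidate = accuracy(fit_centroids(synthetic), holdout_real)  # train synthetic, test real
    return {
        "baseline": baseline,
        "tstr": candidate,
        "fit_for_purpose": baseline - candidate <= tolerance,
    }


result = tstr()
print(result)
```

The point of the sketch is the shape of the comparison: both models are scored on the same real holdout, and the synthetic data is judged by the gap to the real-data baseline rather than by the synthetic-trained score in isolation.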

TSTR is methodologically sound but operationally demanding. It requires the firm to maintain a real-data holdout under strict access controls, separate from the development environment using synthetic data. The holdout must not be used iteratively to tune the synthetic data generation, which would leak real-data information into the synthesis process and defeat the privacy intent. In practice, this means the holdout is consumed once per model release and must be replenished from current production for subsequent releases.

SDEG members observed that statistical similarity tests, while easier to run, do not provide the same assurance as TSTR. Two synthetic datasets can pass statistical similarity tests while differing materially in downstream model performance, particularly for high-dimensional financial services data where subtle feature interactions matter. The pragmatic recommendation is to use statistical similarity as a fast screening tool but to rely on TSTR for final validation.
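A toy example shows why a similarity screen alone can mislead. The sketch below assumes the screen is a per-column two-sample Kolmogorov-Smirnov statistic, which is one common choice and not one named in the SDEG report. The data is simulated, and `SCREEN_THRESHOLD` is an illustrative cut-off. The synthetic columns match the real marginals closely, yet the relationship between the two features, which a downstream model might depend on, is lost.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        for v in a + b
    )


def pearson(xs, ys):
    """Pearson correlation coefficient, written out to avoid version-specific helpers."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)
n = 2000

# "Real" data: two features with unit variance and a strong interaction.
real_x = [rng.gauss(0, 1) for _ in range(n)]
real_y = [0.8 * x + 0.6 * rng.gauss(0, 1) for x in real_x]

# "Synthetic" data: same marginal distributions, but the features are independent.
syn_x = [rng.gauss(0, 1) for _ in range(n)]
syn_y = [rng.gauss(0, 1) for _ in range(n)]

SCREEN_THRESHOLD = 0.08  # illustrative cut-off, not a recommended value

ks_x = ks_statistic(real_x, syn_x)
ks_y = ks_statistic(real_y, syn_y)
passes_screen = max(ks_x, ks_y) < SCREEN_THRESHOLD

real_corr = pearson(real_x, real_y)
syn_corr = pearson(syn_x, syn_y)

print(f"KS x={ks_x:.3f}, KS y={ks_y:.3f}, passes screen: {passes_screen}")
print(f"correlation real={real_corr:.2f}, synthetic={syn_corr:.2f}")
```

Multivariate similarity tests would catch this particular gap, but the general point stands: a screen checks only the properties it was built to check, while TSTR checks the property that matters for the model.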

§ 05 / Generation phase

Three areas at the data creation step

SDEG identified three governance areas that warrant attention specifically during the generation phase, before downstream model work begins. Early and deliberate consideration of these areas helps firms make informed design decisions and avoid expensive rework.

The first is auditability controls and monitoring. The generation process should produce a complete record of the model used, the parameters chosen, the seed for any stochastic elements, the version of source data consumed, and the quality controls applied. This record forms the lineage backbone for any downstream model that uses the synthetic data. Firms that skip this step at generation time discover the gap during model validation or supervisory dialogue, when the question "where did this dataset come from" cannot be answered authoritatively.
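The record described above can be sketched as a small immutable structure with a content fingerprint. This is an illustration under stated assumptions: the class name `GenerationRecord`, its field names, and the example values are all hypothetical, and a firm's actual lineage schema will be driven by its own model inventory and data catalogue.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class GenerationRecord:
    """Hypothetical lineage record written once per generated dataset."""
    dataset_id: str
    generator_name: str
    generator_version: str
    parameters: dict            # hyperparameters passed to the generator
    seed: int                   # seed for stochastic elements
    source_data_version: str    # version of the source data consumed
    quality_checks: tuple       # names of the quality controls applied

    def fingerprint(self) -> str:
        """Stable SHA-256 over the record, usable as a lineage key downstream."""
        payload = json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


record = GenerationRecord(
    dataset_id="synthetic-retail-credit-001",   # illustrative values throughout
    generator_name="example-generator",
    generator_version="0.1.0",
    parameters={"epochs": 50, "batch_size": 256},
    seed=1234,
    source_data_version="2025-06-30",
    quality_checks=("marginal_similarity", "tstr", "parity_gap"),
)
print(record.fingerprint())
```

Storing the fingerprint alongside any model trained on the dataset gives the validation function a single key with which to answer the provenance question.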

The second is data privacy risk. The generation process must produce a defensible privacy assessment: what is the residual risk of re-identification, how was it measured, and what controls mitigate it. UK GDPR Article 32 obligations apply during the generation process itself, when real source data is consumed by the generator. They apply differently to the synthetic output, which may or may not constitute personal data depending on residual identifiability. The legal analysis is fact-specific.
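One way to put a number on "how was it measured" is a distance-to-closest-record check, sketched below. This is a heuristic commonly used in synthetic data tooling; it is not a test prescribed by the SDEG report and not a legal test of identifiability. The sketch assumes purely numeric features on comparable scales, and the `threshold` value is illustrative.

```python
import math


def distance_to_closest_record(synthetic, source):
    """For each synthetic row, the Euclidean distance to its nearest source row."""
    return [min(math.dist(s, r) for r in source) for s in synthetic]


def share_too_close(synthetic, source, threshold):
    """Fraction of synthetic rows lying within `threshold` of some source row."""
    distances = distance_to_closest_record(synthetic, source)
    return sum(d <= threshold for d in distances) / len(distances)


# Tiny illustrative datasets: the first synthetic row copies a source row exactly,
# the third is a near-copy, and the second sits far from every source row.
source = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
synthetic = [(0.0, 0.0), (3.0, 3.0), (5.1, 5.0)]

share = share_too_close(synthetic, source, threshold=0.2)
print(f"share of synthetic rows within threshold: {share:.2f}")
```

A high share suggests the generator may be memorising source records. A low share does not establish that the output is anonymous; it is one input to the fact-specific analysis.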

The third is bias management. Bias in source data does not disappear in synthetic outputs unless the generation process explicitly accounts for it. Firms must embed fairness testing into the generation step, not defer it to downstream model validation, because by the time the downstream model is validated the bias may be entrenched in a way that is operationally difficult to remove. SDEG members highlighted that combining quantitative bias metrics with qualitative subject matter expert review produces more reliable bias detection than either approach alone.
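A quantitative check at the generation step can be as simple as comparing an outcome-rate gap between groups before and after synthesis. The sketch below uses a demographic parity gap as the metric, which is an assumption made for illustration: SDEG does not prescribe a metric, and the appropriate one depends on the use case. The group labels, counts, and the 0.02 tolerance are invented.

```python
def positive_rate_by_group(rows):
    """rows: iterable of (group, outcome) pairs with outcome in {0, 1}."""
    totals, positives = {}, {}
    for group, outcome in rows:
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + outcome
    return {g: positives[g] / totals[g] for g in totals}


def parity_gap(rows):
    """Difference between the highest and lowest positive-outcome rate across groups."""
    rates = positive_rate_by_group(rows)
    return max(rates.values()) - min(rates.values())


# Invented example: approval outcomes for two groups, before and after synthesis.
source = [("A", 1)] * 60 + [("A", 0)] * 40 + [("B", 1)] * 40 + [("B", 0)] * 60
synthetic = [("A", 1)] * 70 + [("A", 0)] * 30 + [("B", 1)] * 35 + [("B", 0)] * 65

TOLERANCE = 0.02  # illustrative; a firm would set this in policy

source_gap = parity_gap(source)          # 0.60 - 0.40
synthetic_gap = parity_gap(synthetic)    # 0.70 - 0.35
amplified = synthetic_gap > source_gap + TOLERANCE

print(f"source gap={source_gap:.2f}, synthetic gap={synthetic_gap:.2f}, amplified={amplified}")
```

In this invented case the generator has widened the gap, which is the kind of finding that should trigger subject matter expert review before the dataset is released for downstream use.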

§ 06 / Intersection with binding frameworks

PRA SS1/23, Consumer Duty, DUAA

Although SDEG's nine principles are not binding, three binding UK frameworks intersect with SDEG considerations in operationally significant ways. UK firms using synthetic data must therefore translate SDEG considerations into evidence usable under those frameworks.

PRA SS1/23 on model risk management, in force since 17 May 2024, requires firms to govern models throughout the lifecycle, including independent validation under Principle 4. A model trained or validated using synthetic data is in scope of SS1/23 like any other model. The validation function under SS1/23 will examine the synthetic data quality, the generation methodology, the TSTR results, and the audit trail of data lineage. SDEG considerations on Accountability, Suitability, and Continuous monitoring map directly to SS1/23 validation evidence.

The FCA Consumer Duty, in force since 31 July 2023 for open products, requires firms to deliver good outcomes for retail customers across four outcome areas. Synthetic data used in models affecting retail decisions, such as credit decisioning, pricing, or vulnerability detection, must be governed in a way that supports the fair outcomes obligation. SDEG considerations on Fairness, Agency, and Suitability are directly relevant. A firm whose pricing model is trained on synthetic data must be able to demonstrate that the synthesis process did not introduce or amplify bias affecting protected groups, and that the model's behaviour produces fair value across customer segments.

UK GDPR, as amended by DUAA 2025, preserves Article 32 obligations on security of processing. The generation process operates on real personal data and must satisfy Article 32 controls. The synthetic output may or may not be personal data under UK GDPR depending on residual identifiability. SDEG considerations on Security and privacy map to the Article 32 security obligations, and those on Transparency map to the Article 14 transparency obligations.

§ 07 / Honest implementation

What does mature SDEG-informed governance look like

A UK firm that builds its synthetic data governance to genuinely satisfy SDEG considerations and binding regulatory frameworks tends to converge on a recognisable operational pattern. Six elements appear consistently in mature implementations.

First, a named senior owner under the Senior Managers Regime, typically the Chief Risk Officer or Chief Data Officer, with explicit prescribed responsibility for synthetic data governance. Second, a written policy that defines when synthetic data is appropriate, sets thresholds for required validation depth, and specifies escalation paths for proposed novel use cases. Third, a centralised registry of synthetic datasets generated, including provenance metadata, validation results, and approved use cases. Fourth, an independent validation function with explicit terms of reference for assessing synthetic data quality before downstream use. Fifth, an annual review cycle that revisits the policy, the registry, and the validation framework as the technology and regulatory environment evolve. Sixth, board-level reporting on the synthetic data programme as part of broader data and model governance reporting.
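The registry and validation elements above can be sketched as a simple gate that a pipeline consults before a synthetic dataset is used. Everything here is hypothetical: the class names, the fields, the use case labels, and the refusal messages are invented for illustration, and a production registry would sit in the firm's data catalogue or model inventory rather than in memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistryEntry:
    """Hypothetical registry row for one synthetic dataset."""
    dataset_id: str
    approved_use_cases: frozenset
    validation_passed: bool


class SyntheticDataRegistry:
    """In-memory stand-in for a centralised registry of synthetic datasets."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.dataset_id] = entry

    def check_use(self, dataset_id, use_case):
        """Return (allowed, reason) for a proposed use of a dataset."""
        entry = self._entries.get(dataset_id)
        if entry is None:
            return (False, "dataset not registered")
        if not entry.validation_passed:
            return (False, "independent validation not passed")
        if use_case not in entry.approved_use_cases:
            return (False, "use case not approved; escalate under policy")
        return (True, "approved")


registry = SyntheticDataRegistry()
registry.register(
    RegistryEntry(
        dataset_id="synthetic-retail-credit-001",
        approved_use_cases=frozenset({"system_testing", "model_validation"}),
        validation_passed=True,
    )
)

print(registry.check_use("synthetic-retail-credit-001", "model_validation"))
print(registry.check_use("synthetic-retail-credit-001", "external_sharing"))
print(registry.check_use("unknown-dataset", "system_testing"))
```

The design choice worth noting is that an unapproved use case returns an escalation reason rather than a silent refusal, which mirrors the policy element on escalation paths for novel use cases.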

The pattern is not particularly exotic. It mirrors the governance patterns mature firms have for traditional model risk, with adaptations for the specific generation step that synthetic data introduces. The operational cost is moderate; the regulatory and reputational protection is substantial.

§ 08 / What Infundum is designing

CAUSA AI Data Engine for SDEG-informed governance

Infundum's CAUSA AI Data Engine is being designed as a causal multi-table synthesis infrastructure for the financial sector. CAUSA is pre-MVP; what follows describes design intent and architectural direction, not shipped feature set.

For UK firms building synthetic data governance informed by SDEG considerations, CAUSA is designed to address three specific needs. First, principled multi-table fidelity: synthesis is intended to preserve causal dependencies across customer, transaction, product, and outcome tables, supporting the Suitability principle by producing data that meets the quality threshold for credit, fraud, and behavioural model training and validation. Second, audit-grade generation provenance: each generated dataset is intended to carry documented generation parameters, source model, seed, privacy threshold, and bias metrics, mapping to the Accountability and Transparency principles, to the auditability consideration at the generation phase, and to PRA SS1/23 validation evidence requirements. Third, embedded fairness testing: the generation pipeline is intended to produce fairness metrics for protected characteristic proxies as a first-class output, supporting the Fairness principle and Consumer Duty fair value assessment without requiring downstream bias engineering.

Deployment is intended to be self-hosted within the firm's secure perimeter. Production data informs the synthesis pattern but never leaves the security boundary. Specific architecture details are available under NDA during formal evaluation.

§ 09 / Related

Regulatory context

For the binding model risk framework see PRA SS1/23 Model Risk Management: validating multi-table banking models without PII exposure. For UK data protection following DUAA and Consumer Duty fair value assessment see UK DUAA 2025 and FCA Consumer Duty: synthetic data for demonstrable compliance. For the European parallel on data governance see BCBS 239 and ECB RDARR: synthetic data lineage for cross-border European banks.

Conclusion

The FCA Synthetic Data Expert Group has concluded its work programme. The August 2025 final report distils nine governance principles that UK firms should treat as a reasoning frame, not a checklist. The principles are non-binding considerations derived from AI Ethics and model risk management practice, applied to the specific lifecycle stages introduced by synthetic data generation. UK firms using synthetic data in models for credit, fraud, AML, or customer outcomes nonetheless face binding obligations under PRA SS1/23 model risk management, FCA Consumer Duty fair outcomes, and UK GDPR as amended by DUAA 2025. The SDEG considerations translate naturally into evidence usable under these frameworks when implemented in a mature governance pattern: named senior accountability, written policy, centralised registry, independent validation, annual review, and board-level reporting. Causal multi-table synthesis with audit-grade provenance and embedded fairness metrics maps to the SDEG considerations and to the binding frameworks simultaneously. Infundum is designing CAUSA AI Data Engine for precisely this convergence.

Author's note. Thirteen years engineering data infrastructure across European financial services, in four jurisdictions and across the regulatory stack: BCBS 239 lineage, KNF risk reporting, Solvency II data quality, model risk validation. First version of CAUSA completed end of 2024 after 18 months of solo R&D. A. Kordos, Founder, Infundum.
