HTAtlas

Taxonomy of synthetic data in healthcare

Taxonomy · peer-reviewed

A classification system that organizes synthetic healthcare data along three dimensions: the proportion of synthetic versus real data (proportion), the type of data (modality), and how the data has been generated or altered (transformation). It helps researchers and practitioners describe and compare different kinds of synthetic health data.

At a glance

Use when

Classifying synthetic datasets in health research, designing studies using synthetic data, or reporting methods involving synthetic data generation

Avoid when

Working exclusively with real patient data, or when a detailed technical evaluation of data generation algorithms is required

Inputs

Characteristics of synthetic datasets (e.g., origin, generation method, data type)

Outputs

A structured classification of synthetic healthcare data based on proportion, modality, and transformation

How it works

The taxonomy categorizes synthetic healthcare data along three dimensions: data proportion (the extent to which data is synthetic vs. real), data modality (the type of data such as imaging, genomic, or clinical records), and data transformation (the methods used to generate or alter data). This structured classification supports systematic analysis and reporting of synthetic datasets in health research and applications.
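As an illustration only, the three dimensions can be sketched as a small record type. This is not the taxonomy's own schema: the class and field names, the use of a 0 to 1 fraction for proportion, the three proportion labels, and the `OTHER` modality are all assumptions made for the example. The only modality values taken from the text are imaging, genomic, and clinical records, and transformation is kept as free text because the record does not list its categories.

```python
from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    # The three modalities named in the text; OTHER is a hypothetical catch-all.
    IMAGING = "imaging"
    GENOMIC = "genomic"
    CLINICAL_RECORDS = "clinical records"
    OTHER = "other"


@dataclass(frozen=True)
class SyntheticDataRecord:
    """Hypothetical description of one dataset along the three dimensions."""

    name: str
    # Assumption: proportion expressed as the synthetic share, 0.0 to 1.0.
    synthetic_fraction: float
    modality: Modality
    # Free text, since the source does not enumerate transformation categories.
    transformation: str

    def __post_init__(self) -> None:
        if not 0.0 <= self.synthetic_fraction <= 1.0:
            raise ValueError("synthetic_fraction must be between 0.0 and 1.0")

    def proportion_label(self) -> str:
        # Hypothetical labels for illustration, not terms from the taxonomy.
        if self.synthetic_fraction == 0.0:
            return "fully real"
        if self.synthetic_fraction == 1.0:
            return "fully synthetic"
        return "partially synthetic"

    def summary(self) -> str:
        return (
            f"{self.name}: {self.proportion_label()} "
            f"({self.synthetic_fraction:.0%}), "
            f"{self.modality.value}, {self.transformation}"
        )


if __name__ == "__main__":
    record = SyntheticDataRecord(
        name="demo dataset",
        synthetic_fraction=0.5,
        modality=Modality.IMAGING,
        transformation="generative model",
    )
    print(record.summary())
```

Recording all three dimensions in one structure makes it straightforward to report them together in a methods section or to filter a collection of datasets by any one dimension.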

Project: INSAFEDARE
Funding: Horizon Europe
Project status: Ongoing
HTA domains: Aspects Beyond HTA
Technology: Non-specific
Assumptions: Synthetic data can be meaningfully categorized along the dimensions of proportion, modality, and transformation; these categories are relevant and sufficient for characterizing most synthetic healthcare datasets
Strengths: Provides a clear, structured framework for comparing and selecting synthetic datasets; supports transparency and reproducibility in research; applicable across diverse data types and use cases
Limitations: May not capture all nuances of complex data generation processes; does not evaluate data quality or fidelity to real-world distributions
Also known as: Synthetic Data Taxonomy in Healthcare

Beta record. Based on the original catalogue summary; primary-source enrichment pending.