Taxonomy of synthetic data in healthcare
A classification system that organizes synthetic healthcare data based on how much real data is used (proportion), the type of data (modality), and how it has been changed or generated (transformation). It helps researchers and practitioners understand and compare different kinds of synthetic health data.
At a glance
Use when
Classifying synthetic datasets in health research, designing studies using synthetic data, or reporting methods involving synthetic data generation
Avoid when
Working exclusively with real patient data, or when a detailed technical evaluation of data generation algorithms is required
Inputs
Characteristics of synthetic datasets (e.g., origin, generation method, data type)
Outputs
A structured classification of synthetic healthcare data based on proportion, modality, and transformation
How it works
The taxonomy categorizes synthetic healthcare data along three dimensions: data proportion (the extent to which data is synthetic vs. real), data modality (the type of data such as imaging, genomic, or clinical records), and data transformation (the methods used to generate or alter data). This structured classification supports systematic analysis and reporting of synthetic datasets in health research and applications.
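One way to picture the three dimensions is as a small label attached to each dataset. The sketch below is illustrative only and is not part of the taxonomy. The class names, the `synthetic_fraction` field, the three proportion buckets, and the two transformation values are assumptions made for the example. The modality values are only the three examples named above.

```python
from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    # Only the examples named in this record; the taxonomy may define more.
    IMAGING = "imaging"
    GENOMIC = "genomic"
    CLINICAL_RECORDS = "clinical records"


class Transformation(Enum):
    # Hypothetical placeholder values, not categories taken from the taxonomy.
    GENERATED = "generated"
    ALTERED = "altered"


@dataclass(frozen=True)
class SyntheticDataLabel:
    # Assumed encoding: 0.0 means entirely real, 1.0 means entirely synthetic.
    synthetic_fraction: float
    modality: Modality
    transformation: Transformation

    def __post_init__(self):
        if not 0.0 <= self.synthetic_fraction <= 1.0:
            raise ValueError("synthetic_fraction must be between 0 and 1")

    def proportion(self) -> str:
        # Hypothetical three-way bucketing of the proportion dimension.
        if self.synthetic_fraction == 0.0:
            return "real"
        if self.synthetic_fraction == 1.0:
            return "fully synthetic"
        return "partially synthetic"


label = SyntheticDataLabel(0.4, Modality.IMAGING, Transformation.ALTERED)
print(label.proportion(), label.modality.value, label.transformation.value)
```

Because the label is an immutable value, two datasets with the same classification compare equal, which makes it straightforward to group or filter datasets by any of the three dimensions when reporting methods.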
- Project: INSAFEDARE
- Funding: Horizon Europe
- Project status: Ongoing
- HTA domains: Aspects Beyond HTA
- Categories: Data Classification, Synthetic Data
- Technology: Non-specific
- Assumptions: Synthetic data can be meaningfully categorized along the dimensions of proportion, modality, and transformation; these categories are relevant and sufficient for characterizing most synthetic healthcare datasets
- Strengths: Provides a clear, structured framework for comparing and selecting synthetic datasets; supports transparency and reproducibility in research; applicable across diverse data types and use cases
- Limitations: May not capture all nuances of complex data generation processes; does not evaluate data quality or fidelity to real-world distributions
- Also known as: Synthetic Data Taxonomy in Healthcare
Beta record. Based on the original catalogue summary; primary-source enrichment pending.

