LSE Research Online LSE Library Services

Generative data modelling for diverse populations in Africa: insights from South Africa

Simmons, Sally Sonia ORCID: 0000-0002-9126-5922, Hagan Jr, John Elvis and Schack, Thomas (2025) Generative data modelling for diverse populations in Africa: insights from South Africa. Information, 16 (7). ISSN 2078-2489

Text (information-16-00612) - Published Version
Available under License Creative Commons Attribution.

Identification Number: 10.3390/info16070612

Abstract

Studies on the demography and health of racially diverse African populations are scarce, largely because of persistent data challenges. Generative data modelling has emerged as a valuable solution to this problem. This study therefore examined the efficacy of Conditional Tabular GAN (CTGAN), CopulaGAN, and Tabular Variational Autoencoder (TVAE) for generating synthetic but realistic demographic and health data. It used the World Health Organisation Study on global AGEing and adult health (SAGE) Wave 1 South African data (n = 4227). Information missing from SAGE Wave 1, including demographic (e.g., race, age) and health (e.g., hypertension, blood pressure) indicators, was imputed using Generative Adversarial Imputation Nets (GAIN). CopulaGAN, CTGAN, and TVAE, sourced from the sdv 1.24.1 Python library, generated 104,227 synthetic records based on the SAGE data constituents. The outcomes were assessed with similarity and machine learning (XGBoost) augmentation metrics (sourced from the sdmetrics 0.21.0 Python library), including column shapes, overall, and precision ratio scores. Generally, the GAIN imputations resulted in data with no missing information and with properties comparable to the original. CTGAN's overall quality score (89.20%) was above those of CopulaGAN (88.45%) and TVAE (86.50%). These findings underscore the usefulness of generative data modelling in addressing data quality challenges in diverse populations to enhance actionable health research and policy implementation.
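
To make the "column shapes" similarity score mentioned in the abstract more concrete, the following is a minimal pure-Python sketch of the two per-column measures that, as we understand the sdmetrics documentation, feed into it: a Kolmogorov-Smirnov complement for numerical columns and a total variation complement for categorical columns. It is an illustrative re-implementation, not the authors' code or the library's own implementation. The function names and the toy columns are hypothetical and are not drawn from SAGE.

```python
from bisect import bisect_right
from collections import Counter


def tv_complement(real, synthetic):
    """Return 1 minus the total variation distance between two categorical columns."""
    real_freq = Counter(real)
    synth_freq = Counter(synthetic)
    categories = set(real_freq) | set(synth_freq)
    # Counter returns 0 for categories missing from one of the columns.
    distance = 0.5 * sum(
        abs(real_freq[c] / len(real) - synth_freq[c] / len(synthetic))
        for c in categories
    )
    return 1.0 - distance


def ks_complement(real, synthetic):
    """Return 1 minus the two-sample Kolmogorov-Smirnov statistic for numeric columns."""
    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)
    # The largest gap between the two empirical CDFs occurs at an observed value,
    # so it is enough to evaluate both CDFs at every data point.
    statistic = max(
        abs(
            bisect_right(real_sorted, x) / len(real_sorted)
            - bisect_right(synth_sorted, x) / len(synth_sorted)
        )
        for x in real_sorted + synth_sorted
    )
    return 1.0 - statistic


# Hypothetical toy columns for illustration only (not SAGE data).
real_sex = ["F", "F", "M", "M"]
synthetic_sex = ["F", "M", "M", "M"]
real_age = [50, 60, 70, 80]
synthetic_age = [55, 65, 75, 85]

sex_score = tv_complement(real_sex, synthetic_sex)
age_score = ks_complement(real_age, synthetic_age)

# Assumption: the column shapes score is the plain mean of the per-column scores.
column_shapes = (sex_score + age_score) / 2
print(f"sex: {sex_score:.2f}, age: {age_score:.2f}, column shapes: {column_shapes:.2f}")
```

Both measures lie between 0 and 1, with 1 meaning the synthetic column's marginal distribution matches the real one exactly. Percentages such as those reported in the abstract would correspond to scores of this kind multiplied by 100, although the overall quality score also accounts for relationships between column pairs, which this sketch does not cover.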

Item Type: Article
Additional Information: © 2025 by the authors
Divisions: Psychological and Behavioural Science
Social Policy
Subjects: H Social Sciences > HB Economic Theory
Q Science > QA Mathematics > QA76 Computer software
Date Deposited: 05 Aug 2025 08:27
Last Modified: 08 Aug 2025 00:36
URI: http://eprints.lse.ac.uk/id/eprint/129032
