LSE Research Online LSE Library Services

The hidden sexual minorities: machine learning approaches to estimate the sexual minority orientation among Beijing college students

Chen, Yunsong, He, Guangye and Ju, Guodong (2022) The hidden sexual minorities: machine learning approaches to estimate the sexual minority orientation among Beijing college students. Journal of Social Computing, 3 (2). pp. 128-138. ISSN 2688-5255

Text (The_Hidden_Sexual_Minorities_Machine_Learning_Approaches_to_Estimate_the_Sexual_Minority_Orientation_Among_Beijing_College_Students) - Published Version
Available under License Creative Commons Attribution.

Identification Number: 10.23919/JSC.2021.0021

Abstract

Based on the fourth wave of the Beijing College Students Panel Survey (BCSPS), this study aims to provide an accurate estimate of the percentage of potential sexual minorities among Beijing college students by using machine learning methods. Specifically, we employ random forest (RF), an ensemble learning approach for classification and regression, to predict the sexual orientation of respondents who were unwilling to disclose their sexual identity. To overcome the class imbalance arising from the very different sizes of the sexual minority and majority groups, we apply repeated random sub-sampling to the training set: respondents who reported a heterosexual orientation are partitioned into varying numbers of splits, and each split is combined with the respondents who reported a sexual minority orientation. The prediction from the 24-split random forest suggests that 5.71% of youths in Beijing have a sexual minority orientation, almost twice the original estimate of 3.03%. The results are robust to alternative learning methods and covariate sets. The results also suggest that random forest outperforms other learning algorithms, including AdaBoost, Naive Bayes, support vector machine (SVM), and logistic regression, in dealing with missing data, achieving higher accuracy, F1 score, and area under the curve (AUC).
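
The sub-sampling step described in the abstract can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function names `balanced_splits` and `majority_vote`, the toy respondent counts, and the use of a simple majority vote to combine per-split predictions are all assumptions made for the example. In practice, each balanced training set would be passed to a classifier such as a random forest.

```python
import random


def balanced_splits(majority, minority, n_splits, seed=0):
    """Partition `majority` into `n_splits` random, near-equal chunks and
    pair each chunk with the full `minority` set (hypothetical helper)."""
    if not 1 <= n_splits <= len(majority):
        raise ValueError("n_splits must be between 1 and len(majority)")
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    # Striding by n_splits yields chunks whose sizes differ by at most one.
    chunks = [shuffled[i::n_splits] for i in range(n_splits)]
    return [(chunk, list(minority)) for chunk in chunks]


def majority_vote(votes):
    """Combine 0/1 predictions from the per-split models for one respondent.
    Ties resolve to 0. The aggregation rule is an assumption of this sketch."""
    return int(sum(votes) * 2 > len(votes))


# Toy respondent IDs, chosen only for illustration (not the survey's counts).
heterosexual_ids = list(range(480))
minority_ids = list(range(480, 500))

training_sets = balanced_splits(heterosexual_ids, minority_ids, n_splits=24)
for chunk, minority in training_sets[:3]:
    print(len(chunk), "majority rows +", len(minority), "minority rows")
```

With these toy counts, each of the 24 training sets pairs 20 majority-group rows with all 20 minority-group rows, so every model sees balanced classes while the majority group as a whole is still used exactly once across the splits.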

Item Type: Article
Additional Information: © 2020 The Author(s).
Divisions: Social Policy
Subjects: H Social Sciences
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Date Deposited: 06 Sep 2022 10:36
Last Modified: 07 Sep 2022 09:24
URI: http://eprints.lse.ac.uk/id/eprint/116447
