Sparse Common and Distinctive Covariates Logistic Regression: classification method for high-dimensional multiblock data
Author(s) / Creator(s)
Park, Soogeun
Ceulemans, Eva
Van Deun, Katrijn
Abstract / Description
Datasets composed of large sets of variables from multiple sources concerning the same observation units are becoming increasingly widespread. Constructing a classification model for such high-dimensional, multi-block datasets involves several challenges: variable selection, classification of the response variable, and identification of the processes underlying the predictors. These processes are of particular interest with multi-block data because they can be associated either individually with a single data block (distinctive processes) or jointly with multiple blocks (common processes). Many methods have addressed high-dimensional classification for a single block of data. However, the additional challenge of capturing and distinguishing distinctive and joint processes in multi-block data has not received sufficient attention. To this end, we propose Sparse Common and Distinctive Covariates Logistic Regression (SCD-Cov-logR). The method extends principal covariates regression to multi-block settings and combines it with the generalized linear modeling framework, allowing classification of a categorical response while revealing predictive processes that involve single or multiple data blocks. In a simulation study, SCD-Cov-logR outperformed related methods commonly used in the behavioural sciences.
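To make the idea of combining principal covariates regression with logistic regression more concrete, the sketch below evaluates one plausible form of such an objective. It is not the authors' implementation. The function names `block_mask` and `pcov_logr_loss`, the alpha-weighting of a normalized reconstruction term against a mean Bernoulli negative log-likelihood, the absence of an intercept, and the lasso penalty on the weights `W` are all assumptions made for illustration. Common versus distinctive components are encoded here, again as an assumption, by forcing a distinctive component's weights to zero for every block it does not belong to.

```python
import numpy as np


def block_mask(block_sizes, component_blocks):
    """Hypothetical helper: boolean (J x R) mask of weights allowed to be nonzero.

    block_sizes: number of variables in each block.
    component_blocks: for each component, the indices of the blocks it loads on.
    A common component lists several blocks; a distinctive one lists a single block.
    """
    J, R = sum(block_sizes), len(component_blocks)
    starts = np.cumsum([0] + list(block_sizes[:-1]))  # first column index of each block
    mask = np.zeros((J, R), dtype=bool)
    for r, involved in enumerate(component_blocks):
        for k in involved:
            mask[starts[k]:starts[k] + block_sizes[k], r] = True
    return mask


def pcov_logr_loss(blocks, y, W, Px, py, alpha, lam):
    """Hypothetical PCovR-style objective with a logistic (Bernoulli) outcome part.

    blocks: list of (n x J_k) arrays; y: (n,) array of 0/1 labels.
    W: (J x R) weights, Px: (J x R) loadings, py: (R,) regression weights.
    """
    X = np.hstack(blocks)                 # concatenate blocks column-wise
    T = X @ W                             # component scores shared by X and y
    recon = np.sum((X - T @ Px.T) ** 2) / np.sum(X ** 2)  # normalized reconstruction error
    eta = T @ py                          # linear predictor (no intercept, assumption)
    # Mean Bernoulli negative log-likelihood: log(1 + exp(eta)) - y * eta, computed stably
    nll = np.mean(np.logaddexp(0.0, eta) - y * eta)
    penalty = lam * np.sum(np.abs(W))     # lasso penalty encouraging sparse weights
    return alpha * recon + (1.0 - alpha) * nll + penalty


# Usage: two blocks (3 and 2 variables), one common and two distinctive components
rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(20, 3)), rng.normal(size=(20, 2))
y = rng.integers(0, 2, size=20)
mask = block_mask([3, 2], [(0, 1), (0,), (1,)])
W = rng.normal(size=mask.shape) * mask    # zero out weights outside each component's blocks
Px = rng.normal(size=mask.shape)
py = rng.normal(size=3)
print(pcov_logr_loss([X1, X2], y, W, Px, py, alpha=0.5, lam=0.1))
```

Setting `alpha` closer to 1 emphasizes reconstructing the predictor blocks, while values closer to 0 emphasize predicting `y`. Principal covariates regression uses a weighting parameter for this trade-off, but how SCD-Cov-logR weights, scales, or estimates its terms is not stated in this abstract.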
Persistent Identifier
https://hdl.handle.net/20.500.12034/4268
https://doi.org/10.23668/psycharchives.4831
Date of first publication
2021-05-18
Is part of
Research Synthesis & Big Data, 2021, online
Publisher
ZPID (Leibniz Institute for Psychology)
Citation
Park, S., Ceulemans, E., & Van Deun, K. (2021). Sparse Common and Distinctive Covariates Logistic Regression: classification method for high-dimensional multiblock data. ZPID (Leibniz Institute for Psychology). https://doi.org/10.23668/PSYCHARCHIVES.4831
soogeunpark_bigdatapsychology2021.pdf (Adobe PDF, 530.14 KB, MD5: fe72979861b4cd62d3284cd0c464ba2e)
PsychArchives acquisition timestamp
2021-05-14T13:16:47Z
Made available on
2021-05-14T13:16:47Z
Publication status
unknown
Review status
unknown
Persistent Identifier
https://hdl.handle.net/20.500.12034/4268
https://doi.org/10.23668/psycharchives.4831
Language of content
eng
Dewey Decimal Classification number(s)
150
DRO type
conferenceObject
Visible tag(s)
ZPID Conferences and Workshops