Conference Object

Sparse Common and Distinctive Covariates Logistic Regression: classification method for high-dimensional multiblock data

Author(s) / Creator(s)

Park, Soogeun
Ceulemans, Eva
Van Deun, Katrijn

Abstract / Description

Datasets that combine large sets of variables from multiple sources on the same observation units are becoming increasingly common. Constructing a classification model for such high-dimensional, multi-block data involves several challenges: variable selection, classification of the response variable, and identification of the processes underlying the predictors. These processes are of particular interest in multi-block data because they can be associated either with a single data block (distinctive) or jointly with multiple blocks (common). Many methods address high-dimensional classification for a single block of data. However, the additional challenge of capturing and distinguishing distinctive and joint processes in multi-block data has not received sufficient attention. To this end, we propose Sparse Common and Distinctive Covariates Logistic Regression (SCD-Cov-logR). The method extends principal covariates regression to multi-block settings and combines it with the generalized linear modeling framework, allowing classification of a categorical response while revealing predictive processes that involve single or multiple data blocks. In a simulation study, SCD-Cov-logR outperformed related methods commonly used in the behavioural sciences.
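
The abstract gives no formulas, so the following is only an illustrative sketch of how a principal-covariates-type criterion could be combined with a logistic loss and sparsity penalties. All symbols are assumptions introduced here, not notation taken from the record: X = [X_1, ..., X_K] is the concatenation of K data blocks, W holds the component weights (with w_r^{(k)} the weights of component r on the J_k variables of block k), P_X holds the loadings, (p_0, p_y) are the logistic regression coefficients on the components, alpha in [0, 1] balances prediction against reconstruction, and lambda_L, lambda_G are lasso and group-lasso tuning parameters.

```latex
\begin{aligned}
\min_{W,\,P_X,\,p_0,\,p_y}\quad
  & \alpha\,\ell(y;\eta)
    \;+\; (1-\alpha)\,\bigl\lVert X - X W P_X^{\top} \bigr\rVert_F^{2}
    \;+\; \lambda_L \sum_{k=1}^{K}\sum_{r=1}^{R} \bigl\lVert w_r^{(k)} \bigr\rVert_1
    \;+\; \lambda_G \sum_{k=1}^{K}\sum_{r=1}^{R} \sqrt{J_k}\,\bigl\lVert w_r^{(k)} \bigr\rVert_2 , \\[4pt]
\text{where}\quad
  & \eta = p_0\mathbf{1} + X W p_y , \qquad
    \ell(y;\eta) = -\sum_{i=1}^{n} \Bigl[\, y_i \eta_i - \log\bigl(1 + e^{\eta_i}\bigr) \Bigr],
    \qquad y_i \in \{0, 1\}.
\end{aligned}
```

Under these assumptions, a component whose weights are zeroed out for an entire block by the group penalty would be read as distinctive for the remaining blocks, whereas a component with nonzero weights in several blocks would be read as common. The method presented at the conference may differ in its exact criterion and constraints.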

Date of first publication

2021-05-18

Is part of

Research Synthesis & Big Data, 2021, online

Publisher

ZPID (Leibniz Institute for Psychology)

Citation

Park, S., Ceulemans, E., & Van Deun, K. (2021). Sparse Common and Distinctive Covariates Logistic Regression: classification method for high-dimensional multiblock data. ZPID (Leibniz Institute for Psychology). https://doi.org/10.23668/PSYCHARCHIVES.4831
  • PsychArchives acquisition timestamp
    2021-05-14T13:16:47Z
  • Made available on
    2021-05-14T13:16:47Z
  • Publication status
    unknown
  • Review status
    unknown
  • Persistent Identifier
    https://hdl.handle.net/20.500.12034/4268
  • Persistent Identifier
    https://doi.org/10.23668/psycharchives.4831
  • Language of content
    eng
  • Dewey Decimal Classification number(s)
    150
  • DRO type
    conferenceObject
  • Visible tag(s)
    ZPID Conferences and Workshops