Article Accepted Manuscript

Finding Scientific Topics in Continuously Growing Text Corpora

Author(s) / Creator(s)

Bittermann, André
Rieger, Jonas

Abstract / Description

The ever-growing number of research publications demands computational assistance for everyone trying to keep track of scientific processes. Topic modeling has become a popular approach for finding scientific topics in static collections of research papers. However, the reality of continuously growing corpora of scholarly documents poses a major challenge for traditional approaches. We introduce RollingLDA for the ongoing monitoring of research topics; it enables sequential modeling of dynamically growing corpora while keeping the time series derived from the modeled texts consistent over time. We evaluate its capability to detect research topics and present a Shiny App as an easy-to-use interface. In addition, we illustrate usage scenarios for different user groups such as researchers, students, journalists, and policy-makers.
This is the Accepted Manuscript of: Bittermann, A. & Rieger, J. (2022). Finding Scientific Topics in Continuously Growing Text Corpora. In Arman Cohan et al. (Eds.), Proceedings of the Third Workshop on Scholarly Document Processing (pp. 7–18), Gyeongju, Republic of Korea. Association for Computational Linguistics. https://aclanthology.org/2022.sdp-1.2/

Keyword(s)

research topics, topic modeling, topic model update, science trends, topic detection, topic monitoring, research monitoring, PsychTopics, Shiny App, scientometrics, bibliometrics, metascience, meta-psychology, psychological research, Latent Dirichlet Allocation, R Shiny, topic tool, topic visualization, restricted memory, big literature, dynamic corpora, growing corpora, living corpora, scholarly documents, research publications, scientific publications, text mining, big data, hot topics, RollingLDA, ldaPrototype, PSYNDEX, topic model validation, term lifting, topic shifts, topics splits, topic app

Persistent Identifier

https://hdl.handle.net/20.500.12034/7461
https://doi.org/10.23668/psycharchives.8168

Date of first publication

2022-09-09

Journal title

Proceedings of the Third Workshop on Scholarly Document Processing

Page numbers

7-18

Publisher

PsychArchives

Publication status

acceptedVersion

Review status

reviewed

Is version of

https://aclanthology.org/2022.sdp-1.2/

Citation

  • Author(s) / Creator(s)
    Bittermann, André
  • Author(s) / Creator(s)
    Rieger, Jonas
  • PsychArchives acquisition timestamp
    2022-09-09T15:15:13Z
  • Made available on
    2022-09-09T15:15:13Z
  • Date of first publication
    2022-09-09
  • Publication status
    acceptedVersion
    en
  • Review status
    reviewed
    en
  • Persistent Identifier
    https://hdl.handle.net/20.500.12034/7461
  • Persistent Identifier
    https://doi.org/10.23668/psycharchives.8168
  • Language of content
    eng
  • Publisher
    PsychArchives
    en
  • Is version of
    https://aclanthology.org/2022.sdp-1.2/
  • Is related to
    https://www.psycharchives.org/handle/20.500.12034/8467
  • Is related to
    https://www.psycharchives.org/handle/20.500.12034/9037
  • Keyword(s)
    research topics
    topic modeling
    topic model update
    science trends
    topic detection
    topic monitoring
    research monitoring
    PsychTopics
    Shiny App
    scientometrics
    bibliometrics
    metascience
    meta-psychology
    psychological research
    Latent Dirichlet Allocation
    R Shiny
    topic tool
    topic visualization
    restricted memory
    big literature
    dynamic corpora
    growing corpora
    living corpora
    scholarly documents
    research publications
    scientific publications
    text mining
    big data
    hot topics
    RollingLDA
    ldaPrototype
    PSYNDEX
    topic model validation
    term lifting
    topic shifts
    topics splits
    topic app
  • Dewey Decimal Classification number(s)
    150
  • Title
    Finding Scientific Topics in Continuously Growing Text Corpora
    en
  • DRO type
    article
    en
  • Leibniz institute name(s) / abbreviation(s)
    ZPID
  • Journal title
    Proceedings of the Third Workshop on Scholarly Document Processing
    en
  • Page numbers
    7-18
  • Visible tag(s)
    Accepted Manuscript
    en