Finding Scientific Topics in Continuously Growing Text Corpora
Author(s) / Creator(s)
Bittermann, André
Rieger, Jonas
Abstract / Description
The ever-growing amount of research publications demands computational assistance for everyone trying to keep track of scientific processes. Topic modeling has become a popular approach for finding scientific topics in static collections of research papers. However, the reality of continuously growing corpora of scholarly documents poses a major challenge for traditional approaches. We introduce RollingLDA for the ongoing monitoring of research topics, which enables sequential modeling of dynamically growing corpora while keeping the time series derived from the modeled texts consistent over time. We evaluate its capability to detect research topics and present a Shiny App as an easy-to-use interface. In addition, we illustrate usage scenarios for different user groups such as researchers, students, journalists, or policy-makers.
This is the Accepted Manuscript of: Bittermann, A. & Rieger, J. (2022). Finding Scientific Topics in Continuously Growing Text Corpora. In Arman Cohan et al. (Eds.), Proceedings of the Third Workshop on Scholarly Document Processing (pp. 7–18), Gyeongju, Republic of Korea. Association for Computational Linguistics. https://aclanthology.org/2022.sdp-1.2/
Keyword(s)
research topics; topic modeling; topic model update; science trends; topic detection; topic monitoring; research monitoring; PsychTopics; Shiny App; scientometrics; bibliometrics; metascience; meta-psychology; psychological research; Latent Dirichlet Allocation; R Shiny; topic tool; topic visualization; restricted memory; big literature; dynamic corpora; growing corpora; living corpora; scholarly documents; research publications; scientific publications; text mining; big data; hot topics; RollingLDA; ldaPrototype; PSYNDEX; topic model validation; term lifting; topic shifts; topics splits; topic app
Persistent Identifier
Date of first publication
2022-09-09
Journal title
Proceedings of the Third Workshop on Scholarly Document Processing
Page numbers
7-18
Publisher
PsychArchives
Publication status
acceptedVersion
Review status
reviewed
Is version of
Citation
-
Bittermann & Rieger (2022)_Findings Scientific Topics_Preprint.pdf (Adobe PDF, 292.6 KB; MD5: 219a923e8b3ead6eda9f2f1940f14c12)
-
There are no other versions of this object.
-
PsychArchives acquisition timestamp: 2022-09-09T15:15:13Z
-
Made available on: 2022-09-09T15:15:13Z
-
Persistent Identifier: https://hdl.handle.net/20.500.12034/7461
-
Persistent Identifier: https://doi.org/10.23668/psycharchives.8168
-
Language of content: eng
-
Is version of: https://aclanthology.org/2022.sdp-1.2/
-
Is related to: https://www.psycharchives.org/handle/20.500.12034/8467
-
Is related to: https://www.psycharchives.org/handle/20.500.12034/9037
-
Dewey Decimal Classification number(s): 150
-
DRO type: article
-
Leibniz institute name(s) / abbreviation(s): ZPID
-
Visible tag(s): Accepted Manuscript