Reproducible Text Analysis with Topic Modeling

Bittermann, André

Conference Object

Abstract / Description

Topic modeling is a popular text mining method for finding the central topics in large collections of texts. An algorithm identifies groups of words that frequently occur together in the texts; these groups of words are called "topics". Because text collections of any size can be analyzed automatically in this way, topic modeling can be an insightful tool for many text-based applications, such as social media studies or psychotherapy research. Although topic modeling is an "unsupervised machine learning" technique, the analyst still has to make many parameter decisions. These decisions can strongly affect the results, and the model estimation itself partly relies on random numbers, so good documentation and freely available analysis code are crucial for reproducible topic modeling. This introductory demonstration presents the established topic modeling variant "Latent Dirichlet Allocation" and applies it to a freely available dataset. Special emphasis is placed on topic validity and topic reliability, two often overlooked but important model properties. An example shows how transparent and detailed code can make the analysis reproducible. The demonstration also gives a brief introduction to PsychTopics (psychtopics.org), ZPID's open-source tool for exploring psychological research topics and trends. PsychTopics uses a novel topic modeling approach to dynamically identify topics in psychological publications and displays them interactively in an R Shiny app. These are the slides for the topic modeling demonstration in the "Practices of Open Science" lecture series. More information: https://leibniz-psychology.org/en/opensciencelectures/topic-modeling/
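Why random numbers matter for reproducibility can be illustrated with a minimal sketch, assuming plain Python and a tiny hypothetical toy corpus. This is not the analysis code from the slides. It implements standard Latent Dirichlet Allocation estimated by collapsed Gibbs sampling (one common estimation method), and the function name `fit_lda`, its parameters (`n_topics`, `alpha`, `beta`, `n_iter`, `seed`) and their default values are hypothetical choices for illustration. Both the initial topic assignment of every word and every sampling step draw from the random generator, so only a fixed `seed` makes a run repeatable.

```python
import random


def fit_lda(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=None):
    """Hypothetical sketch: LDA via collapsed Gibbs sampling.

    docs: list of tokenized documents (lists of words).
    alpha / beta: Dirichlet priors on document-topic / topic-word distributions.
    Returns the three most frequent words of each topic.
    """
    rng = random.Random(seed)  # private RNG: randomness depends only on `seed`
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    # Count tables: topics per document, words per topic, tokens per topic
    n_dk = [[0] * n_topics for _ in docs]
    n_kw = [[0] * V for _ in range(n_topics)]
    n_k = [0] * n_topics

    # Random initial topic assignment for every token
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Remove the current token from all counts
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments)
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(n_topics)
                ]
                # Random draw of the new topic: this is where the seed matters
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Top three words per topic by count (ties broken alphabetically)
    return [
        [vocab[v] for v in sorted(range(V), key=lambda v: -n_kw[k][v])[:3]]
        for k in range(n_topics)
    ]


# Hypothetical toy corpus: two "therapy" and two "social media" texts
docs = [
    "therapy client session anxiety therapy".split(),
    "anxiety therapy client session".split(),
    "tweet retweet hashtag follower tweet".split(),
    "hashtag follower tweet retweet".split(),
]
run_a = fit_lda(docs, n_topics=2, seed=42)
run_b = fit_lda(docs, n_topics=2, seed=42)
print(run_a == run_b)  # True: same seed, same topics
```

Running `fit_lda` with a different seed, or with `seed=None`, may return the topics in a different order or, on less clear-cut data, different topics altogether. This is why the seed, like the number of topics and the priors, belongs in the shared analysis code.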

Keyword(s)

topic modeling, latent dirichlet allocation, text analysis, text mining

Persistent Identifier

https://doi.org/10.23668/psycharchives.8382

Date of first publication

2022-11-03

Is part of

PTOS, 2022, online

Publisher

PsychArchives

PTOS_Topic Modeling_021122.pdf

Adobe PDF, 10.71 MB

MD5: 64f4bed3cefe2089006b3e51614fd3af

Sharing Level 0 (Public Use) CC-BY-SA 4.0

There are no other versions of this object.

PsychArchives acquisition timestamp

2022-11-03T16:56:28Z
Made available on

2022-11-03T16:56:28Z
Review status

unknown

Persistent Identifier

https://hdl.handle.net/20.500.12034/7665
Language of content

eng
Is related to

https://hdl.handle.net/20.500.12034/8154
Dewey Decimal Classification number(s)

150
DRO type

conferenceObject

Visible tag(s)

ZPID Conferences and Workshops