Title: Usability of web scraping of open-source discussions for identifying key beliefs
Authors: Gordoni, Galit
Steinmetz, Holger
Schmidt, Peter
Issue Date: 28-May-2019
Publisher: ZPID (Leibniz Institute for Psychology Information)
Abstract: Background: Recent years have brought tremendous interest in the collection and use of Big Data. While the discussion initially focused largely on practical and societal issues, researchers have begun to consider the use of Big Data for scientific purposes. In psychology, there is increasing interest in the usability of user-generated data for addressing psychological research questions (Adjerid & Kelley, 2018; Harlow & Oswald, 2016). As a prominent data collection method, web scraping (i.e., an automated tool for finding and extracting data from online sources) has been used in research on eating disorders (Moessner et al., 2018), mental toughness (Gucciardi, 2017), and personality (Farnadi et al., 2016). Common Big Data analytics are frequently exploratory in nature; in contrast, researchers increasingly call for their use in theory-driven research (e.g., Shmueli, 2010). Although web scraping is increasingly applied, it is still unclear whether posts can serve as a valuable data source in theory-driven empirical studies. In this study, we address the lack of knowledge about the usability of user-generated data for investigating research questions concerning people's beliefs (Eagly & Chaiken, 1993). As a relevant theoretical framework that focuses on the fundamental role of beliefs in interventions, we draw on a well-replicated social psychological theory, the Theory of Planned Behavior (TPB; Ajzen, 1991). The theory integrates the cognitive foundation of motivational and decision processes (i.e., the beliefs) with attitudes, perceptions of social legitimization, efficacy, and the feasibility of the behavior in question (Fishbein & Ajzen, 2010). Briefly, the theory claims that deliberate behavior is mainly determined by the intention to perform the behavior.
The intention, in turn, is a function of the attitude toward the behavior (i.e., the perceived attractiveness of the behavior), the subjective norm (i.e., the perceived expectations of important others regarding the behavior), and perceived behavioral control (i.e., the perceived feasibility of, and control over, the behavior). Furthermore, the theory claims that these motivationally relevant factors are based on beliefs about the positive and negative consequences of the behavior, the opinions of specific others, and barriers and facilitators. The TPB serves as a central theoretical framework for understanding and changing behaviors. Since changing beliefs is the essence of intervention approaches, knowledge about potent beliefs concerning the potential benefits, costs, social expectations, barriers, and facilitators of the behavior is not only of theoretical value but also provides the basis for practical efforts to change behaviors (Steinmetz et al., 2016). The initial stage of a TPB-driven study consists of identifying motivationally relevant key beliefs via a qualitative pilot study. While this procedure (Ajzen & Fishbein, 1980; Fishbein & Ajzen, 2010) has proved fruitful for identifying relevant beliefs over decades of TPB research, it has the limitations that the number of respondents is very small and that the approach risks reactive responses. Especially for an unfamiliar behavior, the responses may lack validity and may not capture the beliefs that occur in a natural decision process. In this study, we focus on the potential of open-source discussions to serve as an additional data source that avoids the pitfalls of self-reported answers.
Users' comments are produced by individuals concerned with the consequences of the behavior in question or with the expected difficulties of performing it; they are formulated in a natural setting and are free of potential response biases arising from factors such as interviewer effects, topic complexity, and topic sensitivity.

Objectives: We aim to advance knowledge on the usability of integrating web scraping of online discussions into the initial stage of theory-driven belief studies, in order to identify the key beliefs underlying behaviors of interest.

Research questions: We use the adoption of Big Data in organizations as an illustrative case for testing the following questions:
1. What are the key beliefs concerning Big Data adoption (behavioral, normative, and control beliefs)?
2. Do the key behavioral, normative, and control beliefs concerning Big Data adoption identified in user-generated posts differ from those identified in self-report surveys?

Method: We conducted a web scraping study of discussion boards on Big Data usage in Israel, covering discussions generated between June and August 2018. Discussions appeared mainly following online articles (41%), in social networks (25%), and in forums (19%). The unit of analysis was the complete discussion, from the opening post to the closing one. A total of 353 authentic discussions (i.e., containing at least 2 comments) were scraped, and content analysis was conducted manually on a sample of 148 of them. First, we applied the methodology used for identifying key beliefs in TPB-driven studies (de Leeuw et al., 2015), counting the number of times a given category of comment content appeared across discussions. Second, following Landers et al. (2016), we compared the beliefs found via web scraping with representative surveys of French companies (Raguseo, 2018) and German companies (Commerzbank AG, 2018). These external data sources serve as a base rate for testing the replicability of the key beliefs found in the web scraping data.
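The counting and comparison steps described in the Method can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: it assumes each scraped discussion has already been coded manually into a set of advantage categories, keeps only authentic discussions (at least 2 comments), counts each category at most once per discussion, and expresses counts as a percentage of the discussions that cite any advantage (the denominator used in the Results). Spearman's rank correlation is swapped in here as one possible way to compare frequency ranks across sources; the abstract does not say which statistic, if any, was used. The `Discussion` class, all function names, the category labels, and every number below are hypothetical toy data.

```python
# Hypothetical sketch: count coded advantage categories across scraped discussions
# and compare their frequency ranks with a survey's response distribution.
# All category labels and percentages are invented toy data.
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Discussion:
    comments: list[str]  # comment texts, from the opening post to the closing one
    advantages: set[str] = field(default_factory=set)  # categories coded manually


def is_authentic(d: Discussion) -> bool:
    # Authentic discussions contain at least 2 comments.
    return len(d.comments) >= 2


def advantage_shares(discussions: list[Discussion]) -> dict[str, float]:
    # Percentage of authentic discussions citing each category, out of those that
    # cite at least one advantage; a category is counted once per discussion.
    citing = [d for d in discussions if is_authentic(d) and d.advantages]
    counts = Counter(cat for d in citing for cat in d.advantages)
    return {cat: 100 * n / len(citing) for cat, n in counts.items()}


def average_ranks(values: list[float]) -> list[float]:
    # Rank in descending order (1 = most frequent); tied values share the mean rank.
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    # Spearman's rho as the Pearson correlation of the tie-averaged ranks
    # (undefined if either source has no variation in ranks).
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


discussions = [
    Discussion(["post", "reply"], {"better_decisions", "cost_savings"}),
    Discussion(["post", "reply", "reply"], {"better_decisions"}),
    Discussion(["post", "reply"], {"new_products"}),
    Discussion(["post", "reply"]),               # authentic, but cites no advantage
    Discussion(["post"], {"better_decisions"}),  # single comment: not authentic
]
scraped = advantage_shares(discussions)
survey = {"better_decisions": 60.0, "cost_savings": 45.0, "new_products": 30.0}  # hypothetical

categories = sorted(scraped.keys() & survey.keys())  # categories present in both sources
rho = spearman([scraped[c] for c in categories], [survey[c] for c in categories])
print(max(scraped, key=scraped.get), round(rho, 3))  # prints: better_decisions 0.866
```

Tied frequencies receive averaged ranks, so categories cited equally often are not ordered arbitrarily. With only a handful of shared categories, such a correlation is unstable and is probably best read descriptively, alongside the side-by-side percentages.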
For comparison, we used, for example, the response distribution of the following multiple-response question from the German companies survey (n = 2004), conducted in 2017: “What are the benefits to companies from the systematic use of digital data?”

Results: Initial and descriptive results will be presented. Content analysis resulted in a classification of the 148 discussions into semantic units representing the advantages and disadvantages of Big Data adoption, a list of potential stakeholders, and factors that could impede or facilitate adoption. Initial results show similarity in both the content of beliefs and their frequency ranks across the independent data sources. For example, the most frequently cited advantage in both the German survey and the web scraping data was better decision making (cited by 58% of survey participants and in 41% of the scraped discussions that cited advantages).

Conclusions and expected implications: Drawing on web scraping of open-source discussions, we obtained initial results supporting the usefulness of web scraping as an observational data collection method in the first stages of identifying the key beliefs underlying specific behaviors for theory-driven belief-scale development.

References:
Adjerid, I., & Kelley, K. (2018). Big data in psychology: A framework for research advancement. American Psychologist, 73(7), 899–917.
Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes, 50(2), 179–211.
Ajzen, I., & Fishbein, M. (1980). Understanding attitudes and predicting social behavior. Englewood Cliffs, NJ: Prentice-Hall.
Commerzbank Initiative Unternehmerperspektiven. (2017). The Raw Material of the 21st century: Big Data, Smart Data – Lost Data? Retrieved from
De Leeuw, A., Valois, P., Ajzen, I., & Schmidt, P. (2015). Using the theory of planned behavior to identify key beliefs underlying pro-environmental behavior in high-school students: Implications for educational interventions. Journal of Environmental Psychology, 42, 128–138.
Eagly, A. H., & Chaiken, S. (1993). The psychology of attitudes. Harcourt Brace Jovanovich College Publishers.
Farnadi, G., Sitaraman, G., Sushmita, S., Celli, F., Kosinski, M., Stillwell, D., ... & De Cock, M. (2016). Computational personality recognition in social media. User Modeling and User-Adapted Interaction, 26(2–3), 109–142.
Fishbein, M., & Ajzen, I. (2010). Predicting and changing behavior: The reasoned action approach. Psychology Press.
Gucciardi, D. F. (2017). Mental toughness: Progress and prospects. Current Opinion in Psychology, 16, 17–23.
Harlow, L. L., & Oswald, F. L. (2016). Big data in psychology: Introduction to special issue. Psychological Methods, 21(4), 447–457.
Landers, R. N., Brusso, R. C., Cavanaugh, K. J., & Collmus, A. B. (2016). A primer on theory-driven web scraping: Automatic extraction of big data from the Internet for use in psychological research. Psychological Methods, 21(4), 475–492.
Moessner, M., Feldhege, J., Wolf, M., & Bauer, S. (2018). Analyzing big data in social media: Text and network analyses of an eating disorder forum. International Journal of Eating Disorders, 51(7), 656–667.
Raguseo, E. (2018). Big data technologies: An empirical investigation on their adoption, benefits and risks for companies. International Journal of Information Management, 38(1), 187–195.
Shmueli, G. (2010). To explain or to predict? Statistical Science, 25(3), 289–310.
Steinmetz, H., Knappstein, M., Ajzen, I., Schmidt, P., & Kabst, R. (2016). How effective are behavior change interventions based on the theory of planned behavior? Zeitschrift für Psychologie, 224(3), 216–233.
Citation: Gordoni, G., Steinmetz, H., & Schmidt, P. (2019). Usability of web scraping of open-source discussions for identifying key beliefs. ZPID (Leibniz Institute for Psychology Information).
Appears in Collections: Conference Object

Files in This Item:
File: 1_Gordoni et al_28_5_2019.pdf
Description: Conference Talk
Size: 478 kB
Format: Adobe PDF
License: Public Use, CC-BY-SA 4.0