Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Votes on Twitter: Assessing Candidate Preferences and Topics of Discussion During the 2016 U.S. Presidential Election:

Votes on Twitter: Assessing Candidate Preferences and Topics of Discussion During the 2016 U.S.... Social media offers scholars new and innovative ways of understanding public opinion, including citizens’ prospective votes in elections and referenda. We classify social media users’ preferences over the two U.S. presidential candidates in the 2016 election using Twitter data and explore the topics of conversation among proClinton and proTrump supporters. We take advantage of hashtags that signaled users’ vote preferences to train our machine learning model which employs a novel classifier—a Topic-Based Naive Bayes model—that we demonstrate improves on existing classifiers. Our findings demonstrate that we are able to classify users with a high degree of accuracy and precision. We further explore the similarities and divergences among what proClinton and proTrump users discussed on Twitter. Keywords computer science, user classification, topic modelling, twitter, political science, social science For decades scholars have turned to surveys to understand where random sampling is a strength. At the same time, if public opinion, particularly in the context of citizens’ vote researchers are especially interested in the expressed views choice. Surveys have real advantages; they offer insight on a of an engaged and active audience, posts on sites social host of political attitudes and beliefs, they allow one to media have a particular value. And, it is on social media explore how and why respondents hold certain views, and where scholars can well study the intensity of opinion, as they have been shown to often be valid predictors of election citizens post on issues and ideas that interest them, expressed outcomes. At the same time, surveys are not without limita- in their own way. A second challenge relates to the very tions: for example, the designs are typically static in nature, nature of social media—its infrastructure and affordances. respondents may offer poorly informed or misinformed Sites such as Facebook, for example, allow for long posts responses, or the issues being probed may not correspond to and subposts for discussion. Information conveyed on those citizens truly care about. Even the costs of implemen- Facebook can be useful in a myriad of ways: even revealing tation can be prohibitive in many electoral contexts. users’ ideology (Bond & Messing, 2015). At the same time, Researchers in recent years have recognized the utility of many Facebook users protect the privacy of their posts. Posts assessing public opinion in new and previously unavailable on Twitter, on the other hand, are (most often) public, ways, especially through modern information technologies although character restrictions on the length of tweets mean such as social media. Posts on social media sites are by their that posts will not only be short but also frequently adopt very nature contemporaneous and dynamic, and they reflect unconventional language including abbreviations and an interested and engaged public’s view across a diversity of hashtags that can complicate interpretation. Third, social topics that citizens care about. Social media can open chan- media conversations are text-based and are typically absent a nels for political expression, engagement, and participation readily identifiable signal of vote choice or preference, and (Tucker, Theocharis, Roberts, & Barberá, 2017). thus more challenging to interpret. Of course, analyzing public opinion through the lens of social media presents its own unique set of challenges. First, 1 University of Glasgow, Glasgow, UK scholars have noted that posts on sites such as Facebook, University of South Alabama, Mobile, USA Twitter, and Snapchat are typically unrepresentative of the Corresponding Author: views of the population as a whole (Barberá & Rivero, 2014; Anjie Fang, School of Computing Science, University of Glasgow, Sir Beauchamp, 2016; Burnap, Gibson, Sloan, Southern, & Alwyn Williams Building, Lilybank Gardens, Glasgow G12 8QQ, UK. Williams, 2016) particularly in comparison with surveys Email: a.fang.1@research.gla.ac.uk Creative Commons CC BY: This article is distributed under the terms of the Creative Commons Attribution 4.0 License (http://www.creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage). 2 SAGE Open Here, we take advantage of hashtags that signal vote respective hashtags to train a classifier on the remaining text choice to mitigate some of these concerns, training a classi- of the tweet. Once removed, we employ a set of machine fier based on the content of tweets from users who signified learning classifiers, including our TBNB, to determine their candidate preference via the consistent use of certain whether we can with accuracy and precision classify the vote hashtags. Because hashtags serve as a convenient means for signal, the hashtag. We compare the performance of several users to catalog a post as germane to a given topic and classifiers against one another using standard evaluation because users invoke hashtags often, researchers can rely on metrics, finding TBNB to outperform the others in our train- hashtags to probe conversations and topics of interest. ing data. Given our high degree of success, we then apply our Moreover, certain hashtags can even convey user prefer- trained TBNB classifier to our “unseen data” to understand ences or political attitudes over a given issue or candidate. candidate support across a much wider and indeed massive For example, in the lead up to the 2016 election, a user audience—drawing on commonalities in the content of invoking the hashtag #VoteTrump signals their preference tweets among our labeled hashtag users in our training data for a candidate. Similarly, #AlwaysHillary indicates support and among users in our unseen data to assess overall levels of for Hillary Clinton. candidate support on Twitter. We evaluate the classification The aims of our study are several. We turn to a new data of our unseen data, the out-of-sample performance, with a source, our own collection of approximately 29.5 million Crowdflower study of a subset of Twitter users. We then publicly available tweets related to the 2016 U.S. presiden- move to understanding the topics of discussion surrounding tial election, to assess support for the presidential candidates the 2016 election within the two communities: those sup- on the Twitter platform. We train a machine learning classi- porting Donald Trump and those supporting Hillary Clinton. fier on the tweets of those users who adopted hashtags that Did Clinton supporters and Trump supporters discuss the signaled their support for a particular candidate, and then we same issues? Or did they diverge in their conversations? In apply our classifier to understand the viewers of a much answering these questions, we shed light on the relevant top- larger audience. We validate our classifier with a study for a ics associated with candidate support. As a final virtue, our subset of Twitter users, evaluating the “vote” label of our methodology is flexible and can be translated well to other classifier against Crowdflower workers’ assessment of which electoral contexts—local, state, and other federal elections candidate a given Twitter user preferred. Of course, our goal within the United States and indeed democratic elections and is not to predict the election outcome, as we recognize that referenda in all parts of the world. social media users are not representative of the U.S. voting population, but instead to understand public support for the Context respective candidates on Twitter. Our second and closely related task is to explore the topics of discussion among Our work builds on a rich literature on the utility of Twitter social media users supporting Donald Trump and among in elections, including the ways in which citizens communi- those supporting Hillary Clinton. We look to see what types cate on the platform and the degree to which tweets can be of topics were invoked, and whether we see similar or diver- used to understand vote choice and even electoral outcomes gent issues and themes from these two communities. Thus, (see, for example, Burnap et al., 2016; Jungherr, 2016; we offer a novel means of understanding public conserva- McKelvey, DiGrazia, & Rojas, 2014). In one notable exam- tions during an important election. Taken together, our aim is ple of the latter, when utilizing state-level polling data in to understand the vote preferences of Twitter users and the conjunction with Twitter data, Beauchamp (2016) demon- topic discussions among supporters of the two candidates. strates that Twitter textual features can be deployed effec- Our analysis offers new perspectives on public opinion— tively to understand the dynamics of public opinion and vote candidate support and topics of conversation—in the 2016 choice during the 2012 election cycle. In an overview of the election. literature on social media and elections, Jungherr (2016) To address our twofold research question, we introduce to notes that politicians do look to Twitter as a means of gaug- the social science literature a novel method: a Topic-Based ing citizens’ interest and public opinion. Naive Bayes (TBNB) classifier that integrates Latent The 2016 U.S. presidential election represents a particu- Dirichlet Allocation (LDA) topic models within a Naive larly unique and important context for our study. Both the Bayes classifier framework (Fang, Ounis, Habel, MacDonald, public and scholars alike recognize the novelty of the first & Limsopatham, 2015). We show that the TBNB classifier female major party candidate with a rich political history outperforms others, and it also provides leverage in under- running against a celebrity without prior elected experience standing topics of conversation on Twitter. The application and with a reputation for “telling it like it is.” Indeed, it was of our TBNB proceeds in several steps. We begin by locating clear throughout the summer and fall of 2016 that the two Twitter users who adopted certain hashtags consistently over candidates presented markedly different visions for their time—hashtags that signaled support for either Donald administrations. Not surprising, then, and as we will show, Trump or Hillary Clinton. These users’ tweets represent our the conversations on social media platforms by the commu- ground truth data. From these users’ tweets, we remove the nities of support were numerous and diverse in nature. Fang et al. 3 Tweets covered topics including missing emails, media bias, The use of the hashtags above served as the ground truth the Federal Bureau of Investigation (FBI) and Comey, rac- in the model—the marker assumed to be a valid indicator of ism, border walls, and more. And as we will show, the nature a Twitter user’s preference for independence. With this of discussions and the appearance of topics within a com- ground truth in hand, Fang et al. (2015) implemented a munity evolved over time, with new events and revelations TBNB classification task on the text of the tweets, after triggering new dialogue online. excluding from the tweets the relevant hashtags markers above. The classifier applied LDA to extract discussed topics Twitter User Classification on the 2014 referendum from tweets, and then it leveraged Navies Bayes to construct word probabilities conditional on Our work is focused on understanding users preferences both classes—“Yes” and “No” communities. The authors for the candidates and, importantly, the topics of conversa- demonstrated that they could, with high levels of success, tion within proClinton or proTrump communities. Our identify social media users’ community of support (pro approach both parallels and builds on that of Fang et al. Independence or not) using this approach. (2015) who utilized a TBNB classifier to assess support for Moreover, the successful application of TBNB to the independence in Scotland during the 2014 referendum. To users in the ground truth dataset suggested that one can train give the historical background for that election, on a classifier to assess “Yes” and “No” support a much wider September 18, 2014, voters in Scotland were given the and indeed massive audience. For example, the patterns of opportunity to decide their country’s future—whether they language use for a tweeter who advocated for indepen- wished for Scotland to be independent from the United dence—but often included hashtags of the opposition so as to Kingdom or to remain together with England, Wales, and extend the reach of the tweet or even to troll the opposition— Northern Ireland. The referendum ballot raised the ques- can be used to recognize such a user as a Yes supporter. tion matter-of-factly: “Should Scotland be an independent Similarly, we identify a set of hashtags signaling vote country?” with voters given two straightforward response preference during the 2016 U.S. presidential election, and we options, “Yes” or “No.” then apply the TBNB classifier to assess support on both The goals of the previous study were similarly both to training data and unseen data, and we finally use topic mod- understand social media user’s preferences for Yes or No, eling to extract topics of discussion by proTrump or proClin- and second, to explore the topics of conversation during ton communities. We begin with locating users who the 2014 Independence Referendum among pro and anti- incorporated hashtags into their tweets in a consistent fash- Independence communities. To obtain the ground truth ion over the period leading up to the November election, data, the foundation from which their machine learning which form ground truth labels with the hashtags in Table 1. classifier was built, Fang et al. (2015) relied upon users As one can see from the list below, our chosen hashtags sig- who employed the following hashtags in consistent ways nal support in clear ways—and moreover, the hashtags were over time—hashtags that were interpreted by the research- widely adopted by users during the election to ensure a large ers as definitive signals of vote choice: training dataset. Again to be clear, to be included in the ground truth data- Yes Users (Those Supporting Independence for Scotland): set, users across the 3-month period of analysis leading up to the November 8 election could blend hashtags within either #YesBecause, #YesScotland, #YesScot, #VoteYes the proClinton and proTrump sets above, but they could not ever blend hashtags across these sets. Following Fang et al. No Users (Those Preferring to Remain in the United (2015), after labeling users as proClinton or proTrump, we Kingdom): take advantage of the fact that users tweet additional textual #NoBecause, #BetterTogther, #VoteNo, #NoThanks content beyond hashtags. Our twofold assumption is that there is meaningful textual information conveyed in tweets To be clear, Fang et al. (2015) labeled a social media user (beyond hashtags) that can be used to assess support for a as a “Yes” supporter in the 2014 IndyRef if he or she exclu- given candidate and understand the topics of conversation by sively used one or more of the hashtags in the above “Yes” respective candidate communities, and that the TBNB classi- set during the 2 months leading up to the September referen- fier can learn such patterns and word usages. We thus strip dum. Similarly, if a user used only those hashtags in the “No” the tweets of the hashtags that allowed us to label users as set during the same 2-month period, he or she was labeled as proClinton or proTrump (Table 1) to classify users into pro- a “No” voter. The project excluded those users who at any Clinton and proTrump communities using the textual fea- point during the 2 months leading up to the referendum tures of their tweets. Our results show that we are able to do offered any single tweet that included hashtags in both sets so with a high degree of success. We then apply this classifier Yes and No. Users who blended hashtags or who did not to the larger, unseen data, to determine overall support for incorporate them were left unlabeled. Clinton and Trump on the Twitter platform. 4 SAGE Open Table 1. Hashtags to Label Users. communities. Note that retweets are not included to avoid labeling a user according to someone else’s original content. proClinton proTrump Our labeling method results in 28.1k users in the proClinton #imwithher #trumptrain community who author 245.6k tweets, and 11.6k users in #alwayshillary #alwaystrump the proTrump community who tweet 148.3k times, as seen #strongertogether #votetrump in Table 2. One can see that the proClinton community is #nevertrump #crookedhillary larger than the proTrump one in our training data. #dumptrump #neverhillary #notrump #corrupthillary Unseen data. For our unseen data, we collect tweets in the #antitrump #nohillary 3 months leading up to the 2016 elections—tweets con- taining either keywords or hashtags (or both) that we con- sider election-related. For example, we have tweets with Methodology words or hashtags such as “Trump” or “Hillary” or “Clin- ton” or “debate” or “vote” or “election.” We then collect Figure 1 shows the components of our research design. We all the tweets from all users who authored at least four first collect both our training and our unseen social media 9 tweets that used such hashtags. In total, then, we have data. Next, using the hashtag labeling method, we described 264,518 users with 3,565,336 tweets in our unseen data, as above in section “Twitter User Classification,” we train our shown in Table 2. To be clear, to be included in the unseen candidate community classifier to determine whether a given data, each tweet must include an election-related keyword Twitter user supported Hillary Clinton or Donald Trump, as or hashtag, and each user must have authored at least four described in subsection “Community Classification.” Note such tweets. Our unseen data are of course much larger that our classification task is at the user level, not at the tweet than our training data, given that our training data includes level. In subsection “Crowdflower User Study of Twitter only users who used hashtags consistently and their respec- Users’ Candidate Preferences,” we describe the methodology tive tweets. The candidate preference of Twitter users in for validating the application of our classifier on the unseen our unseen data is what we aim to determine. data through a Crowdflower user study comparing our Next, we explain how we use our training and unseen machine learning classification to Crowdflower worker’s data. As different datasets are used in the following sections, evaluation for a subset of 100 Twitter users. Finally, the we list the usage of the datasets in their respective sections in methodology for extracting the discussed topics in tweets Table 3. The training data is used for training a set of classi- from the proClinton and proTrump communities is discussed fiers as described in subsection “Community Classification,” in subsection “Topic Modeling of the Unseen Data.” and the performance of the classifiers are reported in subsec- tion “Results of classification for the training data,” where we show the TBNB outperforms the others on the training Data Collection data. The subsection “Community Classification” also We begin by collecting a Twitter dataset with a sample of describes the community classification for our unseen data. tweets posted in the United States within a 3-month period We describe the design of our Crowdflower user study that leading up to the election, from August 1 to November 8, speaks to how well our TBNB classifier performs on labeling 2016—election day. This Twitter dataset is crawled using the the candidate preferences of users in our unseen data in sub- Twitter Streaming API by setting a bounding box to cover section “Crowdflower User Study of Twitter Users’ only the area of the United States. We collect roughly 1.5 Candidate Preferences,” thereby assessing the out-of-sample million tweets per day. From this data collection of tweets performance of the classifier. We describe how we conduct and respective users, we divide our data into training and the topic models for the unseen data by proClinton and pro- unseen data. We note that it is possible that tweets posted Trump communities in subsection “Topic Modeling of the from Twitter bot accounts are included in both our training Unseen Data.” Results related to the unseen data are reported and unseen data. in subsection “Vote preferences of unseen Twitter users” showing overall levels of support for the two candidates on Training data. We use the hashtag labeling method described Twitter; subsection “Results of the Crowdflower user study in section “Twitter User Classification” to obtain our train- on the unseen data classification” reports the results from the ing data (i.e., the ground truth data) for the proClinton and Crowdflower study; and subsection “Topics Extracted From proTrump community classification. From the proClinton the Unseen Data, proTrump and proClinton Communities” and proTrump hashtags, we obtain a training dataset con- displays the topics of discussion among proClinton and pro- taining 39,854 users who produce 394,072 tweets, as shown Trump communities. in Table 2. Again, the Twitter users in the training data used We pause here to note that recent attention has been either the proClinton or proTrump hashtags listed in Table 1 drawn to the role of fake news and Twitter bot accounts in and thus can be readily labeled as members of these two influencing public opinion, particularly fake news and bots Fang et al. 5 Figure 1. Components of the analysis. Note. TBNB = Topic-Based Naive Bayes classifier; LDA = Latent Dirichlet allocation. Table 2. Attributes Our Training and Unseen Data. No. of users No. of tweets Dataset proClinton proTrump Total proClinton proTrump Total Training data 28,168 11,686 39,854 245,692 148,380 394,072 Unseen data 264,518 3,565,336 Table 3. The Use of the Training and Unseen Data by Section. Community Classification Datasets Sections Our first, fundamental goal is classification—that is, we wish to understand whether a given Twitter user supported Training data subsections “Community Classification” & Hillary Clinton (and ergo is part of the proClinton commu- “Results of classification for the training data.” nity in our framework) or whether a user supported Donald Unseen data subsections “Community Classification,” “Crowdflower User Study of Twitter Users’ Trump (and thus is part of the proTrump community). One Candidate Preferences,” “Topic Modeling could argue that applying classification algorithms to under- of the Unseen Data,” “Vote preferences stand the vote preferences of Twitter users is unnecessary, of unseen Twitter users,” “Results of the that one could instead look directly at the use of hashtags, Crowdflower user study on the unseen data URLs (Adamic & Glance, 2005), or employ network models classification,” & “Topics Extracted From (M. A. Porter, Mucha, Newman, & Warmbrand, 2005). the Unseen Data, proTrump and proClinton However, most tweets do not contain hashtags or URLs, and Communities.” Twitter users might not have enough followers/followees to construct effective network models. We argue that classifica- tion algorithms improve our understanding of the vote pref- originating from Russia during the 2016 election (Allcott & erences of a large number of Twitter users. Gentzkow, 2017; Guess, Nyhan, & Reifler, 2018; Howard, In computational social science, several classification Woolley, & Calo, 2018; Soroush, Roy, & Aral, 2018; algorithms are often used, among them Decision Tree, Naive Timberg, 2016). To ascertain the presence of Russian bots in Bayes, Support Vector Machine, Neural Networks imple- our analysis, we turn to a list of 2,752 Russian bot accounts mented as C4.5 (Tree), Multinomial Naive Bayes (NB), that were identified by the U.S. House Select Committee on Linear Support Vector Classification (SVM), and Multilayer Intelligence. We then examine how many tweets from Perceptron (MLP) in scikit-learn. Among these, NB, SVM, these accounts are present in our training and unseen datas- and MLP have often been used in text classification (see, for ets. We found none of these Russian bot accounts is present example, Fang et al., 2015; Harrag, Hamdi-Cherif, & in our training data, and a mere 25 tweets from 16 Russian El-Qawasmeh, 2010; Joachims, 1998; Khorsheed & bots are present in our unseen data. Thus, we argue the influ- Al-Thubaity, 2013; McCallum, Nigam, et al., 1998). In addi- ence of these identified bot accounts on our analysis is mini- tion to these classifiers, we also apply the TBNB classifier mal. Our use of a bounding box for our data collection that explained earlier in section “Twitter User Classification.” restricted tweets to accounts within the United States in part For comparison, we deploy a random classifier (RDN), explains why we find so few tweets from these Russian bot which generates classification results (i.e., proTrump or accounts in our data. 6 SAGE Open Table 4. The Classification Results. Recall, F1, and Accuracy (Rijsbergen, 1979). Precision is the fraction of Twitter users correctly labeled among all the Candidate community predicted positive (either proClinton or proTrump) Twitter users, whereas Accuracy is the fraction of correctly classi- proClinton proTrump Accuracy fied Twitter users among all Twitter users. Recall is the frac- RDN tion of Twitter users correctly labeled among all real positive F1 0.582 0.366 Twitter users. F1 represents the harmonic average of Precision 0.703 0.290 0.496 Precision and Recall. Recall 0.496 0.497 Tree F1 0.817 0.639 Crowdflower User Study of Twitter Users’ Precision 0.874 0.567 0.757 Candidate Preferences Recall 0.768 0.733 NB We describe here our Crowdflower user study to evaluate the F1 0.883 0.760 performance of our TBNB classifier on the unseen data. As Precision 0.930 0.689 0.843 we noted in subsection “Data Collection,” our hashtag label- Recall 0.840 0.849 ing method provides the ground truth data for the proClinton/ SVM proTrump classifier. We can (and do, in Table 4) evaluate the F1 0.881 0.747 performance of our classifiers in terms of how effectively Precision 0.916 0.690 0.838 they place users into proClinton and proTrump communities Recall 0.848 0.814 in our training data. However, in the absence of ground truth/ MLP the hashtag labeling method, we cannot evaluate our classi- F1 0.835 0.678 fier’s performance on the unseen data. Therefore, we evalu- Precision 0.897 0.597 0.782 ate the out-of-sample performance of our classifiers by Recall 0.781 0.784 comparing it with judgments made by workers on the TBNB Crowdflower platform. Here, we ask Crowdflower workers F1 0.893 0.753 to determine whether a given Twitter user in our unseen data Precision 0.903 0.734 0.851 supported Hillary Clinton or Donald Trump (or neither) by Recall 0.883 0.772 looking at the content of the user’s tweets, for a random sam- Note. We bold the highest values for reference. RDN = random classifier; ple of 100 Twitter users in our unseen data. Thus, we com- NB = Naive Bayesian classifier; SVM = Support Vector Classification pare the vote classification performance of our classifier to classifier; MLP = Multilayer Perceptron classifier; TBNB = Topic-Based judgments from Crowdflower workers. The interface of this Naive Bayesian classifier. user study is shown in Figure 2. To begin, we randomly select 100 Twitter users from the proClinton) by considering the distribution of classes in the unseen Twitter dataset described in subsection “Data training data. Using multiple classifiers in our analysis Collection.” For each of the 100 selected Twitter users, we allows us to compare and contrast their performance in cat- present crowdsourced workers with at most eight of their egorizing users in our training data into proTrump or pro- respective tweets selected randomly, as seen in the top of Clinton communities, including assessing the utility of our Figure 2. After reading up to eight tweets, a Crowdflower TBNB approach against the others. worker is asked to select whether the given Twitter user sup- We applied steps typical in the preprocessing of text data ports Hillary Clinton or Donald Trump—or if candidate sup- (Grimmer & Stewart, 2013) prior to classification. Steps port cannot be determined, as seen in the lower left of Figure included removing commonly used words that do not help 2. To understand how the workers reach their decision, we improve the classification (i.e., English stop-words). We also also ask them to explain their reasoning through three pro- stemmed the text to root words using a Porter Stemmer (M. vided choices: (a) “Tweets clearly indicate user’s candidate F. Porter, 1997). preference,” (b) “Tweets do not clearly indicate user’s candi- We use the top-ranked 5,000 words in the training data- date preference. But I can figure out the preference by the set as features—the attributes that we rely upon to train the tweets,” (c) “Tweets do not clearly indicate the preference. classifiers for use on the unseen data. Each user is translated This is my balanced choice.” We obtain three independent into TF-IDF vectors for the input of the classifiers. Because judgments of whether each of our 100 Twitter users was pro- we found from our training data that the proTrump commu- Clinton or proTrump, or neither. We report the results of nity was smaller with 11.6k users than the proClinton com- this user study in section 4. munity of 28.2k users in Table 2, we apply oversampling to the proTrump community to avoid class imbalance that may Topic Modeling of the Unseen Data bias the learned classification models. To evaluate the per- formance of our classifiers for each community, we use Our final step is the application of topic modeling to extract three standard metrics in information retrieval: Precision, topics among the tweets within the proClinton and proTrump Fang et al. 7 Figure 2. The user interface of the Crowdflower user study. communities from the unseen data. Here, a topic is a distri- classifier on the unseen data, we are able to classify a much bution over words in a topic model, often represented by the larger group of Twitter users into the two communities: pro- Top n (e.g., n = 10) words according to its distribution. For Clinton and proTrump. Thus, we are able to speak to overall each candidate community, we sample 200k tweets to be support on the Twitter platform, the “Twitter voteshare” for used for topic modeling. In this study, we use time-sensitive the two candidates in the 3 months leading up to the election topic modeling approaches (Fang, MacDonald, Ounis, date. Finally, we show the topics of discussion among the Habel, & Yang, 2017), as they have been shown to be effec- proClinton and proTrump communities. tive for Twitter data and can speak to the dynamics of when topics are invoked over time. The number of topics selected, Performance of the Community Classification known as K, has implications for the coherence of the topics that are extracted (Fang, MacDonald, Ounis and Habel, In this section, we first show the performance of the classi- 2016a): a small K will produce few topics that are difficult to fiers on the training data. We apply our TBNB classifier on interpret, given that they include many different themes and unseen data to assess Twitter users’ candidate preferences. ideas; whereas, a large K will produce more finite topics but We then report the results of our Crowdflower user study, ones that may not differentiate themselves well from one which indicates how well the classifier performed on the another. To select K, we first set K from 10 to 100, with step unseen data, assessing its out-of-sample performance. 10 to obtain topic models with a good quality. To evaluate the coherence and quality of the resulting topics, we use Twitter Results of classification for the training data. Table 4 speaks to coherence metrics developed by Fang, MacDonald, Ounis, the results of our classification task by several classification 25 26 and Habel (2016b). We use the average coherence and algorithms we employ, including our TBNB. As we coherence@n (c@n) to select the appropriate K number to described in subsection “Community Classification,” the yield more coherent, interpretable topics. table compares the performance of the classifier in determin- ing whether a user is proClinton or proTrump based on the textual content of tweets in our training data assessed against Results & Analysis the vote preference as revealed by the consistent use of We offer two sets of results, first related to classification for hashtags in Table 1. both our training data and unseen data in subsection From Table 4, we can see that, with the exception of the “Performance of the Community Classification,” and next random classifier (RDN), all of the classifiers exhibit a related to the topic modeling by proClinton and proTrump strong performance on the F1, Precision, Recall, and communities in subsection “Topics Extracted From the Accuracy metrics. Clearly, Twitter users in the proClinton Unseen Data, proTrump and proClinton Communities.” We and proTrump communities differentiated themselves well first report how successful we are in categorizing Twitter from one another, that the language of their tweets was suf- users in our training data as proClinton or proTrump. We ficiently distinct so as to be able to classify users correctly as show a remarkable degree of success in this task, particularly proClinton and proTrump in ways consistent with their adop- with our TBNB classifier. By subsequently applying the tion of hashtags displayed in Table 1. One can also see from 8 SAGE Open Table 5. Topics Generated From the Training Data. Topic 1 Topic 2 Topic 3 Topic 4 Topic 5 Top 10 ranked words #nevertrump vote America #nevertrump #maga Clinton party #neverhillary #dumptrump #votetrump doesn election crookedhillary #debalwaystrump #wakeupamerica tax gop Obama news #draintheswamp racist win Country media #tcot pay voting #womenwhovotetrump right women #dumptrump candidate White stupid watch care support #trumptrain follow #foxnews new voters People think way campaign nominee America delegate America Top 10 ranked words Topic 6 Topic 7 Topic 8 Topic 9 Topic 10 #demsinphilly #debatenight #dumptrump #vote #crookedhillary #hillaryclinton tonight @cnn #election #voting #demconvention #stoptrump @gop #dealmein #bernieorbust speech watch @debalwaystrump best #stillsanders proud president @msnbc #hillarystrong #demexit woman right @seanhannity #electionday #electionfraud @timkaine like @reince #goparemurderers #revolution history wait @tedcruz #gopendtimes #votersuppression amazing yes @speakerryan #voteblue #dncleak ready woman #rncincle #clintonkaine #nevertrumporhillary Table 6. Findings Our proClinton and proTrump Communities in the Unseen Data. No. of users No. of tweets Dataset proClinton proTrump Total proClinton proTrump Total Unseen data 196,702 67,816 264,518 2,126,276 1,439,060 3,565,336 the table that the TBNB classifier achieves the highest accu- 4, and 9 are proClinton topics, whereas Topics 3, 5, and 10 racy among all the classifiers, 0.851. That is, in 85.1% of are proTrump. The remaining topics in Table 5 do not have instances, the TBNB is able to classify the candidate prefer- clear polarity. This is not to say, however, that Twitter ence of the Twitter user in our ground truth data accurately— users from the two community communicate similarly using the textual information in their tweets—devoid of the about these topics. As the TBNB classifier can distinguish hashtags in Table 1. To be clear, this result demonstrates that the word probability conditioned on both topics and com- for those in our training data, the text of their tweets provides munities, it can capture the different word usage of the two information that can readily identify Twitter users as pro- communities within a topic to classify a Twitter user’s Trump and proClinton, matching the label applied based on community. the social media user’s use of hashtags. As we have argued in section “Twitter User Vote preferences of unseen Twitter users. In applying the Classification,” an added value of the TBNB classifier is trained TBNB classifier on our unseen data, we find that our also the generation of the topics used in the classification unseen tweeters differentiate into 196,702 proClinton users algorithm, which we show in Table 5. The table displays authoring 2,126,276 tweets (10.81 on average), and 67,816 the topics generated from the training data by applying the proTrump users with 1,439,060 tweets (21.98 tweets on standard LDA on the training data in TBNB classifier. The average), as shown in Table 6. The proClinton community listed topics are represented by their Top 10 most frequent across the Twitter platform is much more sizable than the words. For a given Twitter user, the TBNB classifier first proTrump one in terms of the number of users and the num- identifies which topics pertain to the user, and then the ber of election-related tweets, but Trump supporters tweet classifier assigns the community affiliation, either pro- more often on average. Clinton or proTrump, to a Twitter user based on his or her Because each tweet is associated with a time point, we word usages within the related topics. In Table 5, Topics 1, can also examine the dynamics of support, first overall and Fang et al. 9 Figure 3. The number of tweets from proClinton and proTrump communities over time. Table 7. The Performance of Classifiers Compared With Table 7 displays the Cohen’s kappa and accuracy scores Human Judgments. of the classifiers compared with the human judgments among the 76 users with either proClinton or proTrump Classifier Kappa Accuracy labels. All classifiers (with the exception of the random RDN −0.013 0.50 classifier) achieve reasonable accuracy scores. This find- Tree 0.44 0.72 ing suggests that our hashtag labeling method is effective NB 0.60 0.80 in training a valid and reliable classifier. Among the clas- SVM 0.62 0.82 sifiers, we still see that the TBNB classifier has higher MLP 0.58 0.79 kappa and accuracy scores than the others, consistent with TBNB 0.66 0.83 what we saw with the training data. Our user study dem- onstrates that our hashtag labeling method and our TBNB Note. RDN = random classifier; NB = Naive Bayesian classifier; SVM = Support Vector Machine classifier; MLP = Multilayer Perceptron classifier; classifier perform very well overall—that we can distin- TBNB = Topic-Based Naive Bayesian classifier. guish two communities of support, proClinton and proTrump. The remaining 24 users of the 100 randomly selected then by the community. In Figure 3, we show the number of do not have a clear community affiliation according to the tweets that were posted by proClinton and proTrump com- crowdsourced workers. Note that these 24 users can be munities over time. Not only were proClinton tweets more classified as either proClinton or proTrump by our classi- plentiful as we showed above in Table 6, but they were more fier. Thus, it could be that our classifier is in error, which prolific over the entire period of analysis. During the three is entirely plausible, as Twitter users may discuss the elec- televised debates, marked by spikes in the data, we see par- tion—and indeed the use of words in tweets is the crite- ticular activity among the proClinton community. rion to be in our unseen data set—while lacking a vote Results of the Crowdflower user study on the unseen data classi- preference. A second explanation could be that the crowd- fication. Table 7 presents the results of our crowdsourced sourced workers saw an insufficient sample of tweets, and user study examining the performance of our classifier on the in these (up to eight tweets), the vote preferences of the unseen data versus the evaluation of crowdsourced workers. Twitter users were not revealed. Examining additional Among the 100 randomly selected Twitter users in our content may have proven helpful. A third and related unseen data, 76 users are labeled as either proClinton or pro- explanation is that our classifier can correctly identify a Trump according to the online workers. Among these 76 given user as proClinton or proTrump community, but that Twitter users, crowdsourced workers were unanimous for 51 the textual information the classifier relies on to make this (67%)—all three workers agreed that the Twitter user was determination is not immediately discernible to proClinton, or all three agreed the user was proTrump. Con- Crowdflower workers. The classifier uses the top-ranked cerning their explanations for how they determined whether 5,000 words in the training dataset, far more than any a Twitter user was proClinton or proTrump, for 31 users, the Crowdflower worker sees among eight tweets. To illus- workers marked that the “Tweets clearly indicate user’s can- trate by example, #dealmein is found among Topic 9 in didate preference”; for 42 Twitter users, the workers Table 5 as an identifier of the proClinton community. The answered that the “Tweets do not clearly indicate user’s can- hashtag emerged as a result of a tweet by Hillary Clinton didate preference. But I can figure out the preference by the using the phrase “Deal Me In.” Whereas an online worker tweets”; and for three Twitter users, the workers selected that may not have recognized the association with the hashtag the “Tweets do not clearly indicate the preference. This is my and the proClinton community, the classifier was able to balanced choice.” learn it. 10 SAGE Open Figure 4. Topics extracted from proClinton community (Topics 1-6). Discussion of the topics by community. Our figures represent Topics Extracted From the Unseen Data, the 12 most coherent topics from the topic models of the proTrump and proClinton Communities two communities, as evaluated using the aforementioned To understand the topics of discussion among the proTrump topic coherence metrics. For example, Topic 2 in the pro- and proClinton communities, we first apply topic models on Clinton community is the second most interpretable/coher- the tweets of those users who were identified as being part of ent topic within a topic model consisting of K (here, 70) the proClinton or proTrump communities. As mentioned in topics for the proClinton community. We represent each section “Methodology,” we set K with different values. We topic by a word cloud using its Top n words, here approxi- here report on the coherence of the generated topic models mately 20 words for each topic. The size of these words and select the topic models with the best K; in this case, K of indicates how often it is used in the topic discussion. The 70 for proClinton and K of 60 for proTrump, in order ulti- blue or black color, however, is added only to ease interpre- mately to present and analyze the extracted topics. tation. We also include the trend for each topic just below Rather than present all 130 topics across the two commu- the word cloud to highlight at which moments in time that nities, for the purpose of visualization and interpretation, we particular topic was discussed. The red line represents the focus on the Top 12 topics from each community. Figures 4 volume of the related tweets over our period of analysis, and 5 display the 12 most coherent topics among the pro- where the x-axis is the timeline and “S” signals the start Clinton community, and Figures 6 and 7 display topics from date (August 1), numbers “1,” “2,” and “3” denote each proTrump community. To also aid visualization, we display debate, and “E” represents election day. A spike in a trend repeated topic words in only one instance—in the respective suggests that a topic is highly discussed at that particular topic with the highest coherence. point in time. Fang et al. 11 Figure 5. Topics extracted from proClinton community (Topics 7-12). We first present the topics in the proClinton community in find a strong linkage between Trump and racism, with words Figures 4 and 5, and then we turn to Figures 6 and 7 for the such as racism, racist, KKK, bigot, scary included. That such proTrump community. First, it should be noted that the topics a topic would emerge as the single most coherent among the are, to a degree, subject to interpretation. Second, it also 70 topics in the proClinton community speaks to the nature bears noting that where we see the similarity in topics among of the debate on Twitter. Topics 2 and 3 both have linkages to the two communities, that we can conclude that both pro- Russia, with Topic 3 particularly relevant to the email scan- Clinton and proTrump communities discussed a given issue dal including words such as truth, Putin, Foundation, pri- or event. However, the language used and the ways these vate, and emails. Topic 4 continues this theme with references issues and topics were discussed was distinct among Clinton to the FBI, Comey, lies/liar. The trends demonstrate that and Trump supporters. Finally, as a general comment, it Topics 1 through 4 all gain momentum as election day should be noted that there is a relevant dearth of policy- approaches. Topic 5 appears more positive than the previous related topics for each community, perhaps with the excep- ones, with words such as hope, nice, choice, children. Topic tion of matters related to immigration such as the border 6 is particularly relevant to the #vpdebate, including Pence wall. Instead, we see the dominance of themes such as the but also covering the need to release tax returns. Clinton email scandal, corruption, concerns over racism and Turning to the next most coherent topics, Topics 7 prejudice, gender bias, Russia, mass media and media bias, through 12 in Figure 5, we again see a mix of topics with and voter fraud, to name a few. some pertaining more directly to Trump, and others related Beginning with Figure 4, we see a mix of topics associ- more to Clinton. For example, words such as sexual assault, ated more closely with the Trump campaign and those more rape, dangerous, Billy Bush appear in Topics 7 and 8 related closely associated with the Clinton campaign. In Topic 1, we to the allegations against Trump and the Access Hollywood 12 SAGE Open Figure 6. Topics extracted from proTrump community (Topics 1-6). tape. Concerns over unfair trade, middle class, and China Finally, Topics 7 through 12 in the proTrump community appear in Topic 9. Topic 10 through 11 have a mix of more also provide an important lens to understand Trump support positive words associated with the Clinton campaign such on Twitter. Topic 7 invokes the border wall and illegal while as job, hiring, and #ClintonKaine, whereas Topic 12 again also bringing in #wikileaks and the #ClintonFoundation. returns to tackling on Trump campaign pledges with build Topic 8 turns attention to voter fraud, machine, ballots. Topic wall. 9 is an example of a topic that appeared early on in our period Turning to the Top 12 most coherent topics of discus- of analysis but was relatively quiet thereafter, touching on sion among the proTrump community, we find consider- several themes including immigration and the candidate able attention paid to Trump’s opponent. Words such as names and general words such as election, America. Topic 10 foundation, email, Clinton, Comey all appear in Topic 1, has particular relevance to the debates and debate modera- with considerable discussion from the second debate tion (e.g., Chris Wallace, debate). Topic 11 links largely to onward, and then another peak just before election day the Obama administration and concerns over a Supreme when Comey announced that the emails were being exam- Court appointment (e.g., Biden, record, Supreme Court) and ined once more. Topic 2 sees a number of mentions of includes apparent trolling of the former president through @ #CrookedHillary and #NeverHillary along with apparent barackobama. Topic 12 represents another mix of terms such trolling of the opposition with #ImWithHer used. Topic 3 as Democrat, friend, and Deplorables. points to perceived media bias, coverage/covering, left, Among the 12 most coherent topics in each community, propaganda, Obama, and Topic 5 invokes truth. Topic 5 there are also some notable absences. Apart from concern and particularly Topic 6 speak to concerns over foreign, about illegal immigration and the border wall, there are no ISIS, Soros, and muslims. clear topics pertaining to policy decisions or policy-related Fang et al. 13 Figure 7. Topics extracted from proTrump community (Topics 7-12). terms such as taxes/taxes, education, spending, defense, or then trained a set of classifiers and applied them to our even Obamacare—even during the presidential debates unseen data to understand the overall levels of support for when these matters were well discussed. There are also few Trump and Clinton on social media. Finally, we employed terms relevant to polls or battleground states in these topics, topic models to understand the topics of discussion by nor to campaign rallies and key events apart from the debates. communities of support. Taken together, our study has pro- Nor were these topics especially prescient of the ones that vided a novel view of the dynamics of support and discus- have dominated the first part of the Trump presidency, sion—shedding new light on the dynamics of public including the Mueller investigation and fake news, and pol- opinion during the 2016 election. As we have pointed out, icy efforts including a repeal of Obamacare followed by suc- a virtue of the method described here is that it is flexible cess with tax reform legislation. and can be easily extended to other electoral contexts. For example, earlier work employed the methodology for understanding support or opposition to the 2014 Scottish Discussion Independence Referendum on social media. Providing one This article has implemented a novel classification method, can identify a set of hashtags that are frequently used on TBNB, to understand overall levels of support for the 2016 Twitter and readily speak to support or opposition for a U.S. presidential election candidates and the topics of dis- given candidate, political party, referendum—than one can cussion on social media. Our method relied on users who train a classifier and then explore the levels and dynamics used hashtags expressing support for the two candidates in of such support and opposition for a large community of consistent ways over time as our ground truth data. We social media users. 14 SAGE Open Appendix A Table A1. The Words That Are Removed From proClinton Figures (Figures 4-5). Topic index Removed words 2 Trump 3 Trump 4 Email 7 Tax, Trump, hope 8 Trump 9 Support, Trump 10 Look 11 Tweet, Trump 12 Build, expect Note. Table A1 displays words that appeared in multiple topics and were removed for interpretability, as discussed in subsection “Topics Extracted From the Unseen Data, proTrump and proClinton Communities.” The words are retained in Figures 4-5 in the topic with the highest coherence. Table A2. The Words That Are Removed From proTrump Figures (Figures 6-7). Topic index Removed words 4 Lie, cover, destroy 5 Report 7 Email, love 9 Illegal 10 Debate, @realdonaltrump, go 11 @realdonaltrump, muslim, give, think 12 Poll, border, machine, Russian Note. Table A2 displays words that appeared in multiple topics and were removed for interpretability, as discussed in subsection “Topics Extracted From the Unseen Data, proTrump and proClinton Communities.” The words are retained in Figures 6-7 in the topic with the highest coherence. Appendix B Figure B1. Coherence of topic models with different K. (a) topic models from proClinton-related tweets and (b) topic models from proTrump-related tweets. Note. Figure B1 reports the coherence of topic models with the different number of topics K. These topic models are generated from the two communities. We select the best topic number K of 70 for proClinton community and K of 60 for proTrump community. Fang et al. 15 Authors’ Note that mark their candidate choice, or they blend proTrump and proClinton hashtags. We aim to classify and understand the Previous versions of this paper were presented at New York vote preferences of these unseen Twitter users, as described in University’s Social Media and Political Participation (SMaPP) subsection “Data Collection.” Global Project meeting in November 2017 and at the Alabama 10. https://dev.twitter.com Political Science Association’s (AlPSA) annual meeting in March 11. This setting allows us to obtain a sample of roughly 1% of all tweets in the United States (Morstatter, Pfeffer, Liu, & Carley, 2013), including Alaska but not including Hawaii. Acknowledgments 12. The use of the bounding box allows us to obtain tweets that We thank participants at the SMaPP Global meeting especially for are posted within the United States, as here we are fundamen- their helpful comments, particularly Josh Tucker, Jonathan Nagler, tally interested in the views of U.S. Twitter users rather than and Dean Eckles, and the audience at AlPSA for their feedback, users from other parts of the world. These tweets either have especially Thomas Shaw. We also thank the editor, Pablo Barberá, exact geo-locations (i.e., longitude & latitude) or have place and the two anonymous reviewers for their feedback. information (e.g., New York City) identifiable by Twitter. We rely on Twitter’s internal classification process to determine whether the tweet has been posted in the United States or not. Declaration of Conflicting Interests More information is provided at https://developer.twitter.com/ The author(s) declared no potential conflicts of interest with respect en/docs/tutorials/filtering-tweets-by-location.html to the research, authorship, and/or publication of this article. 13. The keyword “Donald” is not selected as we found it intro- duces too much noise, as it is more generic than “Trump” or Funding “Hillary,” for example. We did not want to collect tweets about “Donald Duck,” for example. The author(s) disclosed receipt of the following financial support 14. The used election-related hashtags are #clinton, #trump, #hill- for the research, authorship, and/or publication of this article: Anjie ary, #debatenight, and hashtags in Table 1. Note that the case Fang thanks the LKAS PhD Scholarship at the University of sensitivity of these hashtags is omitted, for example, #TRUMP Glasgow for funding support. is same as #trump. 15. Including users who tweet only one or a few times can intro- Notes duce too much noise into the analysis. 1. See Klašnja, Barberá, Beauchamp, Nagler, and Tucker (2017) 16. See https://democrats-intelligence.house.gov/uploadedfiles/ for an excellent discussion of Twitter for understanding public exhibit_b.pdf for the list. opinion, both its strengths and limitations. 17. http://scikit-learn.org 2. We use only public posts in our analysis. 18. For Multilayer Perceptron (MLP), we set the hidden layer as 3. At the time of the data collection, Twitter posts were limited to 100 and set the topic number K in Topic-Based Naive Bayes 140 characters in length. Since then tweets are permitted to be (TBNB) as 10. 280 characters. 19. Words are ranked by their frequencies in the corpus. 4. Of course, it is possible for users to troll the opposition using 20. https://www.crowdflower.com a given hashtag provocatively—for example, a Clinton sup- 21. Note that we did not disclose the user’s account handle, or porter attempting to engage Trump supporters by including name, nor other identifying information. #VoteTrump in their tweet. But, we argue that such a user 22. We used at most eight tweets to make the task more manage- would be unlikely to only tweet #VoteTrump and would able and feasible. instead be more likely to blend hashtags including ones that 23. The Crowdflower workers are required to spend at least 20 s for signal both support (e.g., #VoteTrump) and opposition to each judgment. Each worker is paid US$0.20 for each judgment. Trump (e.g., #DumpTrump or #NeverTrump) over time. To ensure quality, we prepare a set of test questions, where the 5. Note that Beauchamp (2016) demonstrates that Twitter data community labels of the Twitter users are verified in advance. can be leveraged to understand election dynamics and candi- Crowdflower workers enter the task if they reach 70% accuracy date support. on the test questions. 6. Fang, Ounis, Habel, MacDonald, and Limsopatham (2015) 24. Note that our classifier places users into either a proClinton or validated the labeling method by examining follower networks proTrump community and does not include a third option of among these users, finding that those labeled “Yes” were far neither. more likely to follow politicians from the Scottish National 25. These coherence metrics leverage word embedding models, Party (the SNP) than the others, with the SNP being the party trained using public tweets posted from August 2015 to August strongly in favor of independence. 2016, to capture the semantic similarities of words in extracted 7. Naive Bayes (NB) has the advantages of being used widely for topics, and thus give a coherence score of a topic. text classification (Kim, Han, Rim, & Myaeng, 2006; Zhang 26. Because Tree, NB, Support Vector Classification (SVM), and & Li, 2007), and it can be easily adapted with Latent Dirichlet TBNB classifiers do not have random processes, we do not Allocation (LDA). Both are probabilistic models. conduct a paired t test when comparing their results. Instead, 8. We eliminate retweets from our analysis. we use McNemar’s test (Dietterich, 1998) to see whether two 9. By unseen data, we mean the body of Twitter users who author classifiers perform equally. We find that the TBNB classifier election-related tweets but whose vote preferences have not performs differently from the other classifiers (mid – p < 0.05) been labeled—because they do not use the hashtags in Table 1 according to the McNemar’s test. 16 SAGE Open 27. We show the coherence of the topic models extracted from D. Song, D. Albakour, S. Watt, & J. Tait (Eds.), Proceedings the two candidate communities in Appendix Figure B1. The of the 39th European Conference on Information Retrieval coherence results are consistent with Fang, MacDonald, (pp. 252-265). Aberdeen, UK: Springer International Ounis, and Habel (2016a): the average coherence of a topic Publishing. model decreases when the number of topics increases; how- Fang, A., Ounis, I., Habel, P., MacDonald, C., & Limsopatham, ever, the increasing line of c@10/20/30 in Figure B1 indicates N. (2015). Topic-centric classification of Twitter user’s politi- that the top-ranked topics in a topic model are much easier cal orientation. In R. Baeza-Yates, M. Lalmas, A. Moffat, & B. to understand as K increases. Among proClinton topic mod- Ribeiro-Neto (Eds.), Proceedings of the 38th International ACM els, we found the coherence (c@10/20/30) of topics becomes SIGIR conference on Research and Development in Information stable when K reaches 70, and for proTrump, when K reaches Retrieval (pp. 791-794). Santiago, Chile: Association for 60. Therefore, we present a proClinton model with K = 70 and Computing Machinery. a proTrump model with K = 60. Grimmer, J., & Stewart, B. (2013). Text as data: The promise and 28. The complete list of generated topics is available at https:// pitfalls of automatic content analysis methods for political goo.gl/8ev4Pk. texts. Political Analysis, 21, 267-297. 29. The repeated, removed topic words are listed by the topic Guess, A., Nyhan, B., & Reifler, J. (2018). Selective exposure to where they were removed in Appendix Tables A1 and A2 for misinformation: Evidence from the consumption of fake news during the 2016 U.S. presidential campaign (Working paper). reference. Retrieved from https://www.dartmouth.edu/~nyhan/fake- news-2016.pdf References Harrag, F., Hamdi-Cherif, A., & El-Qawasmeh, E. (2010). Adamic, L. A., & Glance, N. (2005). The political blogosphere Performance of MLP and RBF neural networks on Arabic text and the 2004 U.S. election: Divided they blog. In J. Adibi, categorization using SVD. Neural Network World, 20, 441-459. M. Grobelnik, D. Mladenic, & P. Pantel (Eds.), Proceedings Howard, P. N., Woolley, S., & Calo, R. (2018). Algorithms, bots, of the 3rd international workshop on Link discovery (pp. 36- and political communication in the U.S. 2016 election: The 43). Chicago, IL: Association for Computing Machinery. challenge of automated political communication for election Allcott, H., & Gentzkow, M. (2017). Social media and fake news law and administration. Journal of Information Technology & in the 2016 election. Journal of Economic Perspectives, 31, Politics, 15, 81-93. 211-236. Joachims, T. (1998). Text categorization with support vec- Barberá, P., & Rivero, G. (2014). Twitter and Facebook are not tor machines: Learning with many relevant features. In C. representative of the general population: Political attitudes and Nedellec & C. Rouveirol (Eds.), Proceedings of the 10th demographics of British social media users. Social Science European Conference on Machine Learning (pp. 137-142). Computer Review, 33, 712-729. Chemnitz, Germany: Springer-Verlag. Beauchamp, N. (2016). Predicting and interpolating state-level Jungherr, A. (2016). Twitter use in election campaigns: A system- polls using Twitter textual data. American Journal of Political atic literature review. Journal of Information Technology & Science, 61, 490-503. Politics, 13, 72-91. Bond, R., & Messing, S. (2015). Quantifying social media’s political Khorsheed, M. S., & Al-Thubaity, A. O. (2013). Comparative evalua- space: Estimating ideology from publicly revealed preferences tion of text classification techniques using a large diverse Arabic on Facebook. American Political Science Review, 109, 62-78. dataset. Language Resources and Evaluation, 47, 513-538. Burnap, P., Gibson, R., Sloan, L., Southern, R., & Williams, M. Kim, S.-B., Han, K.-S., Rim, H.-C., & Myaeng, S. H. (2006). (2016). 140 characters to victory? Using Twitter to predict the Some effective techniques for naive Bayes text classification. UK 2015 general election. Electoral Studies, 41, 230-233. IEEE Transactions on Knowledge and Data Engineering, 18, Dietterich, T. G. (1998). Approximate statistical tests for com- 1457-1466. paring supervised classification learning algorithms. Neural Klašnja, M., Barberá, P., Beauchamp, N., Nagler, J., & Tucker, J. Computation, 10, 1895-1923. (2017). Measuring public opinion with social media data. In L. R. Fang, A., MacDonald, C., Ounis, I., & Habel, P. (2016a). Atkeson & R. M. Alvarez (Eds.), The Oxford handbook of polling Examining the coherence of the top ranked tweet topics. In R. and survey methods. Oxford, UK: Oxford University Press. Perego, F. Sebastiani, J. Aslam, I. Ruthven, & J. Zobel (Eds.), McCallum, A., & Nigam, K. (1998). A comparison of event models Proceedings of the 39th International ACM SIGIR conference for naive Bayes text classification. In Proceedings of the 15th on Research and Development in Information Retrieval (pp. AAAI conference workshop on learnning for text categorization 825-828). Pisa, Italy: Association for Computing Machinery. (pp. 41-48). Madison, US: Citeseer. Fang, A., MacDonald, C., Ounis, I., & Habel, P. (2016b). Using McKelvey, K., DiGrazia, J., & Rojas, F. (2014). Twitter publics: word embedding to evaluate the coherence of topics from How online political communities signaled electoral out- Twitter data. In R. Perego, F. Sebastiani, J. Aslam, I. Ruthven, comes in the 2010 U.S. house election. Journal of Information & J. Zobel (Eds.), Proceedings of the 39th International Technology & Politics, 17, 436-450. ACM SIGIR conference on Research and Development in Morstatter, F., Pfeffer, J., Liu, H., & Carley, K. M. (2013). Is the Information Retrieval (pp. 1057-1060). Pisa, Italy: Association sample good enough? Comparing data from Twitter’s stream- for Computing Machinery. ing API with Twitter’s firehose. In E. Kiciman, N. Ellison, B. Fang, A., MacDonald, C., Ounis, I., Habel, P., & Yang, X. (2017). Hogan, P. Resnick, & I. Soboroff (Eds.), Proceedings of the Exploring time-sensitive variational Bayesian inference LDA Seventh International AAAI Conference on Weblogs and Social for social media data. In J. M. Jose, C. Hauff, I. S. Altıngovde, Media (pp. 400-408). Boston, US: AAAI Press. Fang et al. 17 Porter, M. A., Mucha, P. J., Newman, M. E., & Warmbrand, Author Biographies C. M. (2005). A network analysis of committees in the Anjie Fang is a final year PhD student in the School of Computing U.S. house of representatives. Proceedings of the National Science of at the University of Glasgow. His PhD topic is to develop Academy of Sciences of the United States of America, 102, effective computing science approaches, such as topic modelling 7057-7062. and user classification, for analyzing political events on social Porter, M. F. (1997). Readings in information retrieval. Burlington, media platforms. US: Morgan Kaufmann. Philip Habel is an associate professor and chair of the Department Rijsbergen, C. J. V. (1979). Information retrieval (2nd ed.). of Political Science & Criminal Justice at the University of South Newton, MA: Butterworth-Heinemann. Alabama, and affiliate senior research fellow in the School of Soroush, V., Roy, D., & Aral, S. (2018). The spread of true and Political and Social Sciences at the University of Glasgow. His false news online. Science, 359, 1146-1151. areas of research include political communication, public opinion, Timberg, C. (2016, November). Russian propaganda effort helped and computational social science. spread “fake news” during election, experts say. The Washington Post. Retrieved from https://www.washingtonpost.com/business/ Iadh Ounis is professor in the School of Computing Science at the economy/russian-propaganda-effort-helped-spread-fake-news- University of Glasgow. He is also the leader of the Terrier Team during-election-experts-say/2016/11/24/793903b6-8a40-4ca9- and a former deputy director/director of knowledge exchange at the b712-716af66098fe_story.html?noredirect=on&utm_term=. Scottish Informatics & Computer Science Alliance (SICSA). His a7e52ce2d5a0 research concerns developing and evaluating novel large-scale text Tucker, J. A., Theocharis, Y., Roberts, M. E., & Barberá, P. (2017). information retrieval techniques and applications. From liberation to turmoil: Social media and democracy. Journal of Democracy, 28, 46-59. Craig MacDonald is lecturer in Information Retrieval in the Zhang, H., & Li, D. (2007). Naїve Bayes text classifier. In T. Y. School of Computing Science at the University of Glasgow. He is Lin, X. Hu, J. Han, X. Shen, & Z. Li (Eds.), IEEE International the lead developer for the Terrier IR platform. His research in infor- Conference on Granular computing (pp. 708-708). San Jose, mation retrieval includes web, Enterprise, social media and Smart US: IEEE. cities applications. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png SAGE Open SAGE

Votes on Twitter: Assessing Candidate Preferences and Topics of Discussion During the 2016 U.S. Presidential Election:

SAGE Open , Volume 9 (1): 1 – Mar 27, 2019

Loading next page...
 
/lp/sage/votes-on-twitter-assessing-candidate-preferences-and-topics-of-QhnBmFUmgG

References (36)

Publisher
SAGE
Copyright
Copyright © 2022 by SAGE Publications Inc, unless otherwise noted. Manuscript content on this site is licensed under Creative Commons Licenses.
ISSN
2158-2440
eISSN
2158-2440
DOI
10.1177/2158244018791653
Publisher site
See Article on Publisher Site

Abstract

Social media offers scholars new and innovative ways of understanding public opinion, including citizens’ prospective votes in elections and referenda. We classify social media users’ preferences over the two U.S. presidential candidates in the 2016 election using Twitter data and explore the topics of conversation among proClinton and proTrump supporters. We take advantage of hashtags that signaled users’ vote preferences to train our machine learning model which employs a novel classifier—a Topic-Based Naive Bayes model—that we demonstrate improves on existing classifiers. Our findings demonstrate that we are able to classify users with a high degree of accuracy and precision. We further explore the similarities and divergences among what proClinton and proTrump users discussed on Twitter. Keywords computer science, user classification, topic modelling, twitter, political science, social science For decades scholars have turned to surveys to understand where random sampling is a strength. At the same time, if public opinion, particularly in the context of citizens’ vote researchers are especially interested in the expressed views choice. Surveys have real advantages; they offer insight on a of an engaged and active audience, posts on sites social host of political attitudes and beliefs, they allow one to media have a particular value. And, it is on social media explore how and why respondents hold certain views, and where scholars can well study the intensity of opinion, as they have been shown to often be valid predictors of election citizens post on issues and ideas that interest them, expressed outcomes. At the same time, surveys are not without limita- in their own way. A second challenge relates to the very tions: for example, the designs are typically static in nature, nature of social media—its infrastructure and affordances. respondents may offer poorly informed or misinformed Sites such as Facebook, for example, allow for long posts responses, or the issues being probed may not correspond to and subposts for discussion. Information conveyed on those citizens truly care about. Even the costs of implemen- Facebook can be useful in a myriad of ways: even revealing tation can be prohibitive in many electoral contexts. users’ ideology (Bond & Messing, 2015). At the same time, Researchers in recent years have recognized the utility of many Facebook users protect the privacy of their posts. Posts assessing public opinion in new and previously unavailable on Twitter, on the other hand, are (most often) public, ways, especially through modern information technologies although character restrictions on the length of tweets mean such as social media. Posts on social media sites are by their that posts will not only be short but also frequently adopt very nature contemporaneous and dynamic, and they reflect unconventional language including abbreviations and an interested and engaged public’s view across a diversity of hashtags that can complicate interpretation. Third, social topics that citizens care about. Social media can open chan- media conversations are text-based and are typically absent a nels for political expression, engagement, and participation readily identifiable signal of vote choice or preference, and (Tucker, Theocharis, Roberts, & Barberá, 2017). thus more challenging to interpret. Of course, analyzing public opinion through the lens of social media presents its own unique set of challenges. First, 1 University of Glasgow, Glasgow, UK scholars have noted that posts on sites such as Facebook, University of South Alabama, Mobile, USA Twitter, and Snapchat are typically unrepresentative of the Corresponding Author: views of the population as a whole (Barberá & Rivero, 2014; Anjie Fang, School of Computing Science, University of Glasgow, Sir Beauchamp, 2016; Burnap, Gibson, Sloan, Southern, & Alwyn Williams Building, Lilybank Gardens, Glasgow G12 8QQ, UK. Williams, 2016) particularly in comparison with surveys Email: a.fang.1@research.gla.ac.uk Creative Commons CC BY: This article is distributed under the terms of the Creative Commons Attribution 4.0 License (http://www.creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage). 2 SAGE Open Here, we take advantage of hashtags that signal vote respective hashtags to train a classifier on the remaining text choice to mitigate some of these concerns, training a classi- of the tweet. Once removed, we employ a set of machine fier based on the content of tweets from users who signified learning classifiers, including our TBNB, to determine their candidate preference via the consistent use of certain whether we can with accuracy and precision classify the vote hashtags. Because hashtags serve as a convenient means for signal, the hashtag. We compare the performance of several users to catalog a post as germane to a given topic and classifiers against one another using standard evaluation because users invoke hashtags often, researchers can rely on metrics, finding TBNB to outperform the others in our train- hashtags to probe conversations and topics of interest. ing data. Given our high degree of success, we then apply our Moreover, certain hashtags can even convey user prefer- trained TBNB classifier to our “unseen data” to understand ences or political attitudes over a given issue or candidate. candidate support across a much wider and indeed massive For example, in the lead up to the 2016 election, a user audience—drawing on commonalities in the content of invoking the hashtag #VoteTrump signals their preference tweets among our labeled hashtag users in our training data for a candidate. Similarly, #AlwaysHillary indicates support and among users in our unseen data to assess overall levels of for Hillary Clinton. candidate support on Twitter. We evaluate the classification The aims of our study are several. We turn to a new data of our unseen data, the out-of-sample performance, with a source, our own collection of approximately 29.5 million Crowdflower study of a subset of Twitter users. We then publicly available tweets related to the 2016 U.S. presiden- move to understanding the topics of discussion surrounding tial election, to assess support for the presidential candidates the 2016 election within the two communities: those sup- on the Twitter platform. We train a machine learning classi- porting Donald Trump and those supporting Hillary Clinton. fier on the tweets of those users who adopted hashtags that Did Clinton supporters and Trump supporters discuss the signaled their support for a particular candidate, and then we same issues? Or did they diverge in their conversations? In apply our classifier to understand the viewers of a much answering these questions, we shed light on the relevant top- larger audience. We validate our classifier with a study for a ics associated with candidate support. As a final virtue, our subset of Twitter users, evaluating the “vote” label of our methodology is flexible and can be translated well to other classifier against Crowdflower workers’ assessment of which electoral contexts—local, state, and other federal elections candidate a given Twitter user preferred. Of course, our goal within the United States and indeed democratic elections and is not to predict the election outcome, as we recognize that referenda in all parts of the world. social media users are not representative of the U.S. voting population, but instead to understand public support for the Context respective candidates on Twitter. Our second and closely related task is to explore the topics of discussion among Our work builds on a rich literature on the utility of Twitter social media users supporting Donald Trump and among in elections, including the ways in which citizens communi- those supporting Hillary Clinton. We look to see what types cate on the platform and the degree to which tweets can be of topics were invoked, and whether we see similar or diver- used to understand vote choice and even electoral outcomes gent issues and themes from these two communities. Thus, (see, for example, Burnap et al., 2016; Jungherr, 2016; we offer a novel means of understanding public conserva- McKelvey, DiGrazia, & Rojas, 2014). In one notable exam- tions during an important election. Taken together, our aim is ple of the latter, when utilizing state-level polling data in to understand the vote preferences of Twitter users and the conjunction with Twitter data, Beauchamp (2016) demon- topic discussions among supporters of the two candidates. strates that Twitter textual features can be deployed effec- Our analysis offers new perspectives on public opinion— tively to understand the dynamics of public opinion and vote candidate support and topics of conversation—in the 2016 choice during the 2012 election cycle. In an overview of the election. literature on social media and elections, Jungherr (2016) To address our twofold research question, we introduce to notes that politicians do look to Twitter as a means of gaug- the social science literature a novel method: a Topic-Based ing citizens’ interest and public opinion. Naive Bayes (TBNB) classifier that integrates Latent The 2016 U.S. presidential election represents a particu- Dirichlet Allocation (LDA) topic models within a Naive larly unique and important context for our study. Both the Bayes classifier framework (Fang, Ounis, Habel, MacDonald, public and scholars alike recognize the novelty of the first & Limsopatham, 2015). We show that the TBNB classifier female major party candidate with a rich political history outperforms others, and it also provides leverage in under- running against a celebrity without prior elected experience standing topics of conversation on Twitter. The application and with a reputation for “telling it like it is.” Indeed, it was of our TBNB proceeds in several steps. We begin by locating clear throughout the summer and fall of 2016 that the two Twitter users who adopted certain hashtags consistently over candidates presented markedly different visions for their time—hashtags that signaled support for either Donald administrations. Not surprising, then, and as we will show, Trump or Hillary Clinton. These users’ tweets represent our the conversations on social media platforms by the commu- ground truth data. From these users’ tweets, we remove the nities of support were numerous and diverse in nature. Fang et al. 3 Tweets covered topics including missing emails, media bias, The use of the hashtags above served as the ground truth the Federal Bureau of Investigation (FBI) and Comey, rac- in the model—the marker assumed to be a valid indicator of ism, border walls, and more. And as we will show, the nature a Twitter user’s preference for independence. With this of discussions and the appearance of topics within a com- ground truth in hand, Fang et al. (2015) implemented a munity evolved over time, with new events and revelations TBNB classification task on the text of the tweets, after triggering new dialogue online. excluding from the tweets the relevant hashtags markers above. The classifier applied LDA to extract discussed topics Twitter User Classification on the 2014 referendum from tweets, and then it leveraged Navies Bayes to construct word probabilities conditional on Our work is focused on understanding users preferences both classes—“Yes” and “No” communities. The authors for the candidates and, importantly, the topics of conversa- demonstrated that they could, with high levels of success, tion within proClinton or proTrump communities. Our identify social media users’ community of support (pro approach both parallels and builds on that of Fang et al. Independence or not) using this approach. (2015) who utilized a TBNB classifier to assess support for Moreover, the successful application of TBNB to the independence in Scotland during the 2014 referendum. To users in the ground truth dataset suggested that one can train give the historical background for that election, on a classifier to assess “Yes” and “No” support a much wider September 18, 2014, voters in Scotland were given the and indeed massive audience. For example, the patterns of opportunity to decide their country’s future—whether they language use for a tweeter who advocated for indepen- wished for Scotland to be independent from the United dence—but often included hashtags of the opposition so as to Kingdom or to remain together with England, Wales, and extend the reach of the tweet or even to troll the opposition— Northern Ireland. The referendum ballot raised the ques- can be used to recognize such a user as a Yes supporter. tion matter-of-factly: “Should Scotland be an independent Similarly, we identify a set of hashtags signaling vote country?” with voters given two straightforward response preference during the 2016 U.S. presidential election, and we options, “Yes” or “No.” then apply the TBNB classifier to assess support on both The goals of the previous study were similarly both to training data and unseen data, and we finally use topic mod- understand social media user’s preferences for Yes or No, eling to extract topics of discussion by proTrump or proClin- and second, to explore the topics of conversation during ton communities. We begin with locating users who the 2014 Independence Referendum among pro and anti- incorporated hashtags into their tweets in a consistent fash- Independence communities. To obtain the ground truth ion over the period leading up to the November election, data, the foundation from which their machine learning which form ground truth labels with the hashtags in Table 1. classifier was built, Fang et al. (2015) relied upon users As one can see from the list below, our chosen hashtags sig- who employed the following hashtags in consistent ways nal support in clear ways—and moreover, the hashtags were over time—hashtags that were interpreted by the research- widely adopted by users during the election to ensure a large ers as definitive signals of vote choice: training dataset. Again to be clear, to be included in the ground truth data- Yes Users (Those Supporting Independence for Scotland): set, users across the 3-month period of analysis leading up to the November 8 election could blend hashtags within either #YesBecause, #YesScotland, #YesScot, #VoteYes the proClinton and proTrump sets above, but they could not ever blend hashtags across these sets. Following Fang et al. No Users (Those Preferring to Remain in the United (2015), after labeling users as proClinton or proTrump, we Kingdom): take advantage of the fact that users tweet additional textual #NoBecause, #BetterTogther, #VoteNo, #NoThanks content beyond hashtags. Our twofold assumption is that there is meaningful textual information conveyed in tweets To be clear, Fang et al. (2015) labeled a social media user (beyond hashtags) that can be used to assess support for a as a “Yes” supporter in the 2014 IndyRef if he or she exclu- given candidate and understand the topics of conversation by sively used one or more of the hashtags in the above “Yes” respective candidate communities, and that the TBNB classi- set during the 2 months leading up to the September referen- fier can learn such patterns and word usages. We thus strip dum. Similarly, if a user used only those hashtags in the “No” the tweets of the hashtags that allowed us to label users as set during the same 2-month period, he or she was labeled as proClinton or proTrump (Table 1) to classify users into pro- a “No” voter. The project excluded those users who at any Clinton and proTrump communities using the textual fea- point during the 2 months leading up to the referendum tures of their tweets. Our results show that we are able to do offered any single tweet that included hashtags in both sets so with a high degree of success. We then apply this classifier Yes and No. Users who blended hashtags or who did not to the larger, unseen data, to determine overall support for incorporate them were left unlabeled. Clinton and Trump on the Twitter platform. 4 SAGE Open Table 1. Hashtags to Label Users. communities. Note that retweets are not included to avoid labeling a user according to someone else’s original content. proClinton proTrump Our labeling method results in 28.1k users in the proClinton #imwithher #trumptrain community who author 245.6k tweets, and 11.6k users in #alwayshillary #alwaystrump the proTrump community who tweet 148.3k times, as seen #strongertogether #votetrump in Table 2. One can see that the proClinton community is #nevertrump #crookedhillary larger than the proTrump one in our training data. #dumptrump #neverhillary #notrump #corrupthillary Unseen data. For our unseen data, we collect tweets in the #antitrump #nohillary 3 months leading up to the 2016 elections—tweets con- taining either keywords or hashtags (or both) that we con- sider election-related. For example, we have tweets with Methodology words or hashtags such as “Trump” or “Hillary” or “Clin- ton” or “debate” or “vote” or “election.” We then collect Figure 1 shows the components of our research design. We all the tweets from all users who authored at least four first collect both our training and our unseen social media 9 tweets that used such hashtags. In total, then, we have data. Next, using the hashtag labeling method, we described 264,518 users with 3,565,336 tweets in our unseen data, as above in section “Twitter User Classification,” we train our shown in Table 2. To be clear, to be included in the unseen candidate community classifier to determine whether a given data, each tweet must include an election-related keyword Twitter user supported Hillary Clinton or Donald Trump, as or hashtag, and each user must have authored at least four described in subsection “Community Classification.” Note such tweets. Our unseen data are of course much larger that our classification task is at the user level, not at the tweet than our training data, given that our training data includes level. In subsection “Crowdflower User Study of Twitter only users who used hashtags consistently and their respec- Users’ Candidate Preferences,” we describe the methodology tive tweets. The candidate preference of Twitter users in for validating the application of our classifier on the unseen our unseen data is what we aim to determine. data through a Crowdflower user study comparing our Next, we explain how we use our training and unseen machine learning classification to Crowdflower worker’s data. As different datasets are used in the following sections, evaluation for a subset of 100 Twitter users. Finally, the we list the usage of the datasets in their respective sections in methodology for extracting the discussed topics in tweets Table 3. The training data is used for training a set of classi- from the proClinton and proTrump communities is discussed fiers as described in subsection “Community Classification,” in subsection “Topic Modeling of the Unseen Data.” and the performance of the classifiers are reported in subsec- tion “Results of classification for the training data,” where we show the TBNB outperforms the others on the training Data Collection data. The subsection “Community Classification” also We begin by collecting a Twitter dataset with a sample of describes the community classification for our unseen data. tweets posted in the United States within a 3-month period We describe the design of our Crowdflower user study that leading up to the election, from August 1 to November 8, speaks to how well our TBNB classifier performs on labeling 2016—election day. This Twitter dataset is crawled using the the candidate preferences of users in our unseen data in sub- Twitter Streaming API by setting a bounding box to cover section “Crowdflower User Study of Twitter Users’ only the area of the United States. We collect roughly 1.5 Candidate Preferences,” thereby assessing the out-of-sample million tweets per day. From this data collection of tweets performance of the classifier. We describe how we conduct and respective users, we divide our data into training and the topic models for the unseen data by proClinton and pro- unseen data. We note that it is possible that tweets posted Trump communities in subsection “Topic Modeling of the from Twitter bot accounts are included in both our training Unseen Data.” Results related to the unseen data are reported and unseen data. in subsection “Vote preferences of unseen Twitter users” showing overall levels of support for the two candidates on Training data. We use the hashtag labeling method described Twitter; subsection “Results of the Crowdflower user study in section “Twitter User Classification” to obtain our train- on the unseen data classification” reports the results from the ing data (i.e., the ground truth data) for the proClinton and Crowdflower study; and subsection “Topics Extracted From proTrump community classification. From the proClinton the Unseen Data, proTrump and proClinton Communities” and proTrump hashtags, we obtain a training dataset con- displays the topics of discussion among proClinton and pro- taining 39,854 users who produce 394,072 tweets, as shown Trump communities. in Table 2. Again, the Twitter users in the training data used We pause here to note that recent attention has been either the proClinton or proTrump hashtags listed in Table 1 drawn to the role of fake news and Twitter bot accounts in and thus can be readily labeled as members of these two influencing public opinion, particularly fake news and bots Fang et al. 5 Figure 1. Components of the analysis. Note. TBNB = Topic-Based Naive Bayes classifier; LDA = Latent Dirichlet allocation. Table 2. Attributes Our Training and Unseen Data. No. of users No. of tweets Dataset proClinton proTrump Total proClinton proTrump Total Training data 28,168 11,686 39,854 245,692 148,380 394,072 Unseen data 264,518 3,565,336 Table 3. The Use of the Training and Unseen Data by Section. Community Classification Datasets Sections Our first, fundamental goal is classification—that is, we wish to understand whether a given Twitter user supported Training data subsections “Community Classification” & Hillary Clinton (and ergo is part of the proClinton commu- “Results of classification for the training data.” nity in our framework) or whether a user supported Donald Unseen data subsections “Community Classification,” “Crowdflower User Study of Twitter Users’ Trump (and thus is part of the proTrump community). One Candidate Preferences,” “Topic Modeling could argue that applying classification algorithms to under- of the Unseen Data,” “Vote preferences stand the vote preferences of Twitter users is unnecessary, of unseen Twitter users,” “Results of the that one could instead look directly at the use of hashtags, Crowdflower user study on the unseen data URLs (Adamic & Glance, 2005), or employ network models classification,” & “Topics Extracted From (M. A. Porter, Mucha, Newman, & Warmbrand, 2005). the Unseen Data, proTrump and proClinton However, most tweets do not contain hashtags or URLs, and Communities.” Twitter users might not have enough followers/followees to construct effective network models. We argue that classifica- tion algorithms improve our understanding of the vote pref- originating from Russia during the 2016 election (Allcott & erences of a large number of Twitter users. Gentzkow, 2017; Guess, Nyhan, & Reifler, 2018; Howard, In computational social science, several classification Woolley, & Calo, 2018; Soroush, Roy, & Aral, 2018; algorithms are often used, among them Decision Tree, Naive Timberg, 2016). To ascertain the presence of Russian bots in Bayes, Support Vector Machine, Neural Networks imple- our analysis, we turn to a list of 2,752 Russian bot accounts mented as C4.5 (Tree), Multinomial Naive Bayes (NB), that were identified by the U.S. House Select Committee on Linear Support Vector Classification (SVM), and Multilayer Intelligence. We then examine how many tweets from Perceptron (MLP) in scikit-learn. Among these, NB, SVM, these accounts are present in our training and unseen datas- and MLP have often been used in text classification (see, for ets. We found none of these Russian bot accounts is present example, Fang et al., 2015; Harrag, Hamdi-Cherif, & in our training data, and a mere 25 tweets from 16 Russian El-Qawasmeh, 2010; Joachims, 1998; Khorsheed & bots are present in our unseen data. Thus, we argue the influ- Al-Thubaity, 2013; McCallum, Nigam, et al., 1998). In addi- ence of these identified bot accounts on our analysis is mini- tion to these classifiers, we also apply the TBNB classifier mal. Our use of a bounding box for our data collection that explained earlier in section “Twitter User Classification.” restricted tweets to accounts within the United States in part For comparison, we deploy a random classifier (RDN), explains why we find so few tweets from these Russian bot which generates classification results (i.e., proTrump or accounts in our data. 6 SAGE Open Table 4. The Classification Results. Recall, F1, and Accuracy (Rijsbergen, 1979). Precision is the fraction of Twitter users correctly labeled among all the Candidate community predicted positive (either proClinton or proTrump) Twitter users, whereas Accuracy is the fraction of correctly classi- proClinton proTrump Accuracy fied Twitter users among all Twitter users. Recall is the frac- RDN tion of Twitter users correctly labeled among all real positive F1 0.582 0.366 Twitter users. F1 represents the harmonic average of Precision 0.703 0.290 0.496 Precision and Recall. Recall 0.496 0.497 Tree F1 0.817 0.639 Crowdflower User Study of Twitter Users’ Precision 0.874 0.567 0.757 Candidate Preferences Recall 0.768 0.733 NB We describe here our Crowdflower user study to evaluate the F1 0.883 0.760 performance of our TBNB classifier on the unseen data. As Precision 0.930 0.689 0.843 we noted in subsection “Data Collection,” our hashtag label- Recall 0.840 0.849 ing method provides the ground truth data for the proClinton/ SVM proTrump classifier. We can (and do, in Table 4) evaluate the F1 0.881 0.747 performance of our classifiers in terms of how effectively Precision 0.916 0.690 0.838 they place users into proClinton and proTrump communities Recall 0.848 0.814 in our training data. However, in the absence of ground truth/ MLP the hashtag labeling method, we cannot evaluate our classi- F1 0.835 0.678 fier’s performance on the unseen data. Therefore, we evalu- Precision 0.897 0.597 0.782 ate the out-of-sample performance of our classifiers by Recall 0.781 0.784 comparing it with judgments made by workers on the TBNB Crowdflower platform. Here, we ask Crowdflower workers F1 0.893 0.753 to determine whether a given Twitter user in our unseen data Precision 0.903 0.734 0.851 supported Hillary Clinton or Donald Trump (or neither) by Recall 0.883 0.772 looking at the content of the user’s tweets, for a random sam- Note. We bold the highest values for reference. RDN = random classifier; ple of 100 Twitter users in our unseen data. Thus, we com- NB = Naive Bayesian classifier; SVM = Support Vector Classification pare the vote classification performance of our classifier to classifier; MLP = Multilayer Perceptron classifier; TBNB = Topic-Based judgments from Crowdflower workers. The interface of this Naive Bayesian classifier. user study is shown in Figure 2. To begin, we randomly select 100 Twitter users from the proClinton) by considering the distribution of classes in the unseen Twitter dataset described in subsection “Data training data. Using multiple classifiers in our analysis Collection.” For each of the 100 selected Twitter users, we allows us to compare and contrast their performance in cat- present crowdsourced workers with at most eight of their egorizing users in our training data into proTrump or pro- respective tweets selected randomly, as seen in the top of Clinton communities, including assessing the utility of our Figure 2. After reading up to eight tweets, a Crowdflower TBNB approach against the others. worker is asked to select whether the given Twitter user sup- We applied steps typical in the preprocessing of text data ports Hillary Clinton or Donald Trump—or if candidate sup- (Grimmer & Stewart, 2013) prior to classification. Steps port cannot be determined, as seen in the lower left of Figure included removing commonly used words that do not help 2. To understand how the workers reach their decision, we improve the classification (i.e., English stop-words). We also also ask them to explain their reasoning through three pro- stemmed the text to root words using a Porter Stemmer (M. vided choices: (a) “Tweets clearly indicate user’s candidate F. Porter, 1997). preference,” (b) “Tweets do not clearly indicate user’s candi- We use the top-ranked 5,000 words in the training data- date preference. But I can figure out the preference by the set as features—the attributes that we rely upon to train the tweets,” (c) “Tweets do not clearly indicate the preference. classifiers for use on the unseen data. Each user is translated This is my balanced choice.” We obtain three independent into TF-IDF vectors for the input of the classifiers. Because judgments of whether each of our 100 Twitter users was pro- we found from our training data that the proTrump commu- Clinton or proTrump, or neither. We report the results of nity was smaller with 11.6k users than the proClinton com- this user study in section 4. munity of 28.2k users in Table 2, we apply oversampling to the proTrump community to avoid class imbalance that may Topic Modeling of the Unseen Data bias the learned classification models. To evaluate the per- formance of our classifiers for each community, we use Our final step is the application of topic modeling to extract three standard metrics in information retrieval: Precision, topics among the tweets within the proClinton and proTrump Fang et al. 7 Figure 2. The user interface of the Crowdflower user study. communities from the unseen data. Here, a topic is a distri- classifier on the unseen data, we are able to classify a much bution over words in a topic model, often represented by the larger group of Twitter users into the two communities: pro- Top n (e.g., n = 10) words according to its distribution. For Clinton and proTrump. Thus, we are able to speak to overall each candidate community, we sample 200k tweets to be support on the Twitter platform, the “Twitter voteshare” for used for topic modeling. In this study, we use time-sensitive the two candidates in the 3 months leading up to the election topic modeling approaches (Fang, MacDonald, Ounis, date. Finally, we show the topics of discussion among the Habel, & Yang, 2017), as they have been shown to be effec- proClinton and proTrump communities. tive for Twitter data and can speak to the dynamics of when topics are invoked over time. The number of topics selected, Performance of the Community Classification known as K, has implications for the coherence of the topics that are extracted (Fang, MacDonald, Ounis and Habel, In this section, we first show the performance of the classi- 2016a): a small K will produce few topics that are difficult to fiers on the training data. We apply our TBNB classifier on interpret, given that they include many different themes and unseen data to assess Twitter users’ candidate preferences. ideas; whereas, a large K will produce more finite topics but We then report the results of our Crowdflower user study, ones that may not differentiate themselves well from one which indicates how well the classifier performed on the another. To select K, we first set K from 10 to 100, with step unseen data, assessing its out-of-sample performance. 10 to obtain topic models with a good quality. To evaluate the coherence and quality of the resulting topics, we use Twitter Results of classification for the training data. Table 4 speaks to coherence metrics developed by Fang, MacDonald, Ounis, the results of our classification task by several classification 25 26 and Habel (2016b). We use the average coherence and algorithms we employ, including our TBNB. As we coherence@n (c@n) to select the appropriate K number to described in subsection “Community Classification,” the yield more coherent, interpretable topics. table compares the performance of the classifier in determin- ing whether a user is proClinton or proTrump based on the textual content of tweets in our training data assessed against Results & Analysis the vote preference as revealed by the consistent use of We offer two sets of results, first related to classification for hashtags in Table 1. both our training data and unseen data in subsection From Table 4, we can see that, with the exception of the “Performance of the Community Classification,” and next random classifier (RDN), all of the classifiers exhibit a related to the topic modeling by proClinton and proTrump strong performance on the F1, Precision, Recall, and communities in subsection “Topics Extracted From the Accuracy metrics. Clearly, Twitter users in the proClinton Unseen Data, proTrump and proClinton Communities.” We and proTrump communities differentiated themselves well first report how successful we are in categorizing Twitter from one another, that the language of their tweets was suf- users in our training data as proClinton or proTrump. We ficiently distinct so as to be able to classify users correctly as show a remarkable degree of success in this task, particularly proClinton and proTrump in ways consistent with their adop- with our TBNB classifier. By subsequently applying the tion of hashtags displayed in Table 1. One can also see from 8 SAGE Open Table 5. Topics Generated From the Training Data. Topic 1 Topic 2 Topic 3 Topic 4 Topic 5 Top 10 ranked words #nevertrump vote America #nevertrump #maga Clinton party #neverhillary #dumptrump #votetrump doesn election crookedhillary #debalwaystrump #wakeupamerica tax gop Obama news #draintheswamp racist win Country media #tcot pay voting #womenwhovotetrump right women #dumptrump candidate White stupid watch care support #trumptrain follow #foxnews new voters People think way campaign nominee America delegate America Top 10 ranked words Topic 6 Topic 7 Topic 8 Topic 9 Topic 10 #demsinphilly #debatenight #dumptrump #vote #crookedhillary #hillaryclinton tonight @cnn #election #voting #demconvention #stoptrump @gop #dealmein #bernieorbust speech watch @debalwaystrump best #stillsanders proud president @msnbc #hillarystrong #demexit woman right @seanhannity #electionday #electionfraud @timkaine like @reince #goparemurderers #revolution history wait @tedcruz #gopendtimes #votersuppression amazing yes @speakerryan #voteblue #dncleak ready woman #rncincle #clintonkaine #nevertrumporhillary Table 6. Findings Our proClinton and proTrump Communities in the Unseen Data. No. of users No. of tweets Dataset proClinton proTrump Total proClinton proTrump Total Unseen data 196,702 67,816 264,518 2,126,276 1,439,060 3,565,336 the table that the TBNB classifier achieves the highest accu- 4, and 9 are proClinton topics, whereas Topics 3, 5, and 10 racy among all the classifiers, 0.851. That is, in 85.1% of are proTrump. The remaining topics in Table 5 do not have instances, the TBNB is able to classify the candidate prefer- clear polarity. This is not to say, however, that Twitter ence of the Twitter user in our ground truth data accurately— users from the two community communicate similarly using the textual information in their tweets—devoid of the about these topics. As the TBNB classifier can distinguish hashtags in Table 1. To be clear, this result demonstrates that the word probability conditioned on both topics and com- for those in our training data, the text of their tweets provides munities, it can capture the different word usage of the two information that can readily identify Twitter users as pro- communities within a topic to classify a Twitter user’s Trump and proClinton, matching the label applied based on community. the social media user’s use of hashtags. As we have argued in section “Twitter User Vote preferences of unseen Twitter users. In applying the Classification,” an added value of the TBNB classifier is trained TBNB classifier on our unseen data, we find that our also the generation of the topics used in the classification unseen tweeters differentiate into 196,702 proClinton users algorithm, which we show in Table 5. The table displays authoring 2,126,276 tweets (10.81 on average), and 67,816 the topics generated from the training data by applying the proTrump users with 1,439,060 tweets (21.98 tweets on standard LDA on the training data in TBNB classifier. The average), as shown in Table 6. The proClinton community listed topics are represented by their Top 10 most frequent across the Twitter platform is much more sizable than the words. For a given Twitter user, the TBNB classifier first proTrump one in terms of the number of users and the num- identifies which topics pertain to the user, and then the ber of election-related tweets, but Trump supporters tweet classifier assigns the community affiliation, either pro- more often on average. Clinton or proTrump, to a Twitter user based on his or her Because each tweet is associated with a time point, we word usages within the related topics. In Table 5, Topics 1, can also examine the dynamics of support, first overall and Fang et al. 9 Figure 3. The number of tweets from proClinton and proTrump communities over time. Table 7. The Performance of Classifiers Compared With Table 7 displays the Cohen’s kappa and accuracy scores Human Judgments. of the classifiers compared with the human judgments among the 76 users with either proClinton or proTrump Classifier Kappa Accuracy labels. All classifiers (with the exception of the random RDN −0.013 0.50 classifier) achieve reasonable accuracy scores. This find- Tree 0.44 0.72 ing suggests that our hashtag labeling method is effective NB 0.60 0.80 in training a valid and reliable classifier. Among the clas- SVM 0.62 0.82 sifiers, we still see that the TBNB classifier has higher MLP 0.58 0.79 kappa and accuracy scores than the others, consistent with TBNB 0.66 0.83 what we saw with the training data. Our user study dem- onstrates that our hashtag labeling method and our TBNB Note. RDN = random classifier; NB = Naive Bayesian classifier; SVM = Support Vector Machine classifier; MLP = Multilayer Perceptron classifier; classifier perform very well overall—that we can distin- TBNB = Topic-Based Naive Bayesian classifier. guish two communities of support, proClinton and proTrump. The remaining 24 users of the 100 randomly selected then by the community. In Figure 3, we show the number of do not have a clear community affiliation according to the tweets that were posted by proClinton and proTrump com- crowdsourced workers. Note that these 24 users can be munities over time. Not only were proClinton tweets more classified as either proClinton or proTrump by our classi- plentiful as we showed above in Table 6, but they were more fier. Thus, it could be that our classifier is in error, which prolific over the entire period of analysis. During the three is entirely plausible, as Twitter users may discuss the elec- televised debates, marked by spikes in the data, we see par- tion—and indeed the use of words in tweets is the crite- ticular activity among the proClinton community. rion to be in our unseen data set—while lacking a vote Results of the Crowdflower user study on the unseen data classi- preference. A second explanation could be that the crowd- fication. Table 7 presents the results of our crowdsourced sourced workers saw an insufficient sample of tweets, and user study examining the performance of our classifier on the in these (up to eight tweets), the vote preferences of the unseen data versus the evaluation of crowdsourced workers. Twitter users were not revealed. Examining additional Among the 100 randomly selected Twitter users in our content may have proven helpful. A third and related unseen data, 76 users are labeled as either proClinton or pro- explanation is that our classifier can correctly identify a Trump according to the online workers. Among these 76 given user as proClinton or proTrump community, but that Twitter users, crowdsourced workers were unanimous for 51 the textual information the classifier relies on to make this (67%)—all three workers agreed that the Twitter user was determination is not immediately discernible to proClinton, or all three agreed the user was proTrump. Con- Crowdflower workers. The classifier uses the top-ranked cerning their explanations for how they determined whether 5,000 words in the training dataset, far more than any a Twitter user was proClinton or proTrump, for 31 users, the Crowdflower worker sees among eight tweets. To illus- workers marked that the “Tweets clearly indicate user’s can- trate by example, #dealmein is found among Topic 9 in didate preference”; for 42 Twitter users, the workers Table 5 as an identifier of the proClinton community. The answered that the “Tweets do not clearly indicate user’s can- hashtag emerged as a result of a tweet by Hillary Clinton didate preference. But I can figure out the preference by the using the phrase “Deal Me In.” Whereas an online worker tweets”; and for three Twitter users, the workers selected that may not have recognized the association with the hashtag the “Tweets do not clearly indicate the preference. This is my and the proClinton community, the classifier was able to balanced choice.” learn it. 10 SAGE Open Figure 4. Topics extracted from proClinton community (Topics 1-6). Discussion of the topics by community. Our figures represent Topics Extracted From the Unseen Data, the 12 most coherent topics from the topic models of the proTrump and proClinton Communities two communities, as evaluated using the aforementioned To understand the topics of discussion among the proTrump topic coherence metrics. For example, Topic 2 in the pro- and proClinton communities, we first apply topic models on Clinton community is the second most interpretable/coher- the tweets of those users who were identified as being part of ent topic within a topic model consisting of K (here, 70) the proClinton or proTrump communities. As mentioned in topics for the proClinton community. We represent each section “Methodology,” we set K with different values. We topic by a word cloud using its Top n words, here approxi- here report on the coherence of the generated topic models mately 20 words for each topic. The size of these words and select the topic models with the best K; in this case, K of indicates how often it is used in the topic discussion. The 70 for proClinton and K of 60 for proTrump, in order ulti- blue or black color, however, is added only to ease interpre- mately to present and analyze the extracted topics. tation. We also include the trend for each topic just below Rather than present all 130 topics across the two commu- the word cloud to highlight at which moments in time that nities, for the purpose of visualization and interpretation, we particular topic was discussed. The red line represents the focus on the Top 12 topics from each community. Figures 4 volume of the related tweets over our period of analysis, and 5 display the 12 most coherent topics among the pro- where the x-axis is the timeline and “S” signals the start Clinton community, and Figures 6 and 7 display topics from date (August 1), numbers “1,” “2,” and “3” denote each proTrump community. To also aid visualization, we display debate, and “E” represents election day. A spike in a trend repeated topic words in only one instance—in the respective suggests that a topic is highly discussed at that particular topic with the highest coherence. point in time. Fang et al. 11 Figure 5. Topics extracted from proClinton community (Topics 7-12). We first present the topics in the proClinton community in find a strong linkage between Trump and racism, with words Figures 4 and 5, and then we turn to Figures 6 and 7 for the such as racism, racist, KKK, bigot, scary included. That such proTrump community. First, it should be noted that the topics a topic would emerge as the single most coherent among the are, to a degree, subject to interpretation. Second, it also 70 topics in the proClinton community speaks to the nature bears noting that where we see the similarity in topics among of the debate on Twitter. Topics 2 and 3 both have linkages to the two communities, that we can conclude that both pro- Russia, with Topic 3 particularly relevant to the email scan- Clinton and proTrump communities discussed a given issue dal including words such as truth, Putin, Foundation, pri- or event. However, the language used and the ways these vate, and emails. Topic 4 continues this theme with references issues and topics were discussed was distinct among Clinton to the FBI, Comey, lies/liar. The trends demonstrate that and Trump supporters. Finally, as a general comment, it Topics 1 through 4 all gain momentum as election day should be noted that there is a relevant dearth of policy- approaches. Topic 5 appears more positive than the previous related topics for each community, perhaps with the excep- ones, with words such as hope, nice, choice, children. Topic tion of matters related to immigration such as the border 6 is particularly relevant to the #vpdebate, including Pence wall. Instead, we see the dominance of themes such as the but also covering the need to release tax returns. Clinton email scandal, corruption, concerns over racism and Turning to the next most coherent topics, Topics 7 prejudice, gender bias, Russia, mass media and media bias, through 12 in Figure 5, we again see a mix of topics with and voter fraud, to name a few. some pertaining more directly to Trump, and others related Beginning with Figure 4, we see a mix of topics associ- more to Clinton. For example, words such as sexual assault, ated more closely with the Trump campaign and those more rape, dangerous, Billy Bush appear in Topics 7 and 8 related closely associated with the Clinton campaign. In Topic 1, we to the allegations against Trump and the Access Hollywood 12 SAGE Open Figure 6. Topics extracted from proTrump community (Topics 1-6). tape. Concerns over unfair trade, middle class, and China Finally, Topics 7 through 12 in the proTrump community appear in Topic 9. Topic 10 through 11 have a mix of more also provide an important lens to understand Trump support positive words associated with the Clinton campaign such on Twitter. Topic 7 invokes the border wall and illegal while as job, hiring, and #ClintonKaine, whereas Topic 12 again also bringing in #wikileaks and the #ClintonFoundation. returns to tackling on Trump campaign pledges with build Topic 8 turns attention to voter fraud, machine, ballots. Topic wall. 9 is an example of a topic that appeared early on in our period Turning to the Top 12 most coherent topics of discus- of analysis but was relatively quiet thereafter, touching on sion among the proTrump community, we find consider- several themes including immigration and the candidate able attention paid to Trump’s opponent. Words such as names and general words such as election, America. Topic 10 foundation, email, Clinton, Comey all appear in Topic 1, has particular relevance to the debates and debate modera- with considerable discussion from the second debate tion (e.g., Chris Wallace, debate). Topic 11 links largely to onward, and then another peak just before election day the Obama administration and concerns over a Supreme when Comey announced that the emails were being exam- Court appointment (e.g., Biden, record, Supreme Court) and ined once more. Topic 2 sees a number of mentions of includes apparent trolling of the former president through @ #CrookedHillary and #NeverHillary along with apparent barackobama. Topic 12 represents another mix of terms such trolling of the opposition with #ImWithHer used. Topic 3 as Democrat, friend, and Deplorables. points to perceived media bias, coverage/covering, left, Among the 12 most coherent topics in each community, propaganda, Obama, and Topic 5 invokes truth. Topic 5 there are also some notable absences. Apart from concern and particularly Topic 6 speak to concerns over foreign, about illegal immigration and the border wall, there are no ISIS, Soros, and muslims. clear topics pertaining to policy decisions or policy-related Fang et al. 13 Figure 7. Topics extracted from proTrump community (Topics 7-12). terms such as taxes/taxes, education, spending, defense, or then trained a set of classifiers and applied them to our even Obamacare—even during the presidential debates unseen data to understand the overall levels of support for when these matters were well discussed. There are also few Trump and Clinton on social media. Finally, we employed terms relevant to polls or battleground states in these topics, topic models to understand the topics of discussion by nor to campaign rallies and key events apart from the debates. communities of support. Taken together, our study has pro- Nor were these topics especially prescient of the ones that vided a novel view of the dynamics of support and discus- have dominated the first part of the Trump presidency, sion—shedding new light on the dynamics of public including the Mueller investigation and fake news, and pol- opinion during the 2016 election. As we have pointed out, icy efforts including a repeal of Obamacare followed by suc- a virtue of the method described here is that it is flexible cess with tax reform legislation. and can be easily extended to other electoral contexts. For example, earlier work employed the methodology for understanding support or opposition to the 2014 Scottish Discussion Independence Referendum on social media. Providing one This article has implemented a novel classification method, can identify a set of hashtags that are frequently used on TBNB, to understand overall levels of support for the 2016 Twitter and readily speak to support or opposition for a U.S. presidential election candidates and the topics of dis- given candidate, political party, referendum—than one can cussion on social media. Our method relied on users who train a classifier and then explore the levels and dynamics used hashtags expressing support for the two candidates in of such support and opposition for a large community of consistent ways over time as our ground truth data. We social media users. 14 SAGE Open Appendix A Table A1. The Words That Are Removed From proClinton Figures (Figures 4-5). Topic index Removed words 2 Trump 3 Trump 4 Email 7 Tax, Trump, hope 8 Trump 9 Support, Trump 10 Look 11 Tweet, Trump 12 Build, expect Note. Table A1 displays words that appeared in multiple topics and were removed for interpretability, as discussed in subsection “Topics Extracted From the Unseen Data, proTrump and proClinton Communities.” The words are retained in Figures 4-5 in the topic with the highest coherence. Table A2. The Words That Are Removed From proTrump Figures (Figures 6-7). Topic index Removed words 4 Lie, cover, destroy 5 Report 7 Email, love 9 Illegal 10 Debate, @realdonaltrump, go 11 @realdonaltrump, muslim, give, think 12 Poll, border, machine, Russian Note. Table A2 displays words that appeared in multiple topics and were removed for interpretability, as discussed in subsection “Topics Extracted From the Unseen Data, proTrump and proClinton Communities.” The words are retained in Figures 6-7 in the topic with the highest coherence. Appendix B Figure B1. Coherence of topic models with different K. (a) topic models from proClinton-related tweets and (b) topic models from proTrump-related tweets. Note. Figure B1 reports the coherence of topic models with the different number of topics K. These topic models are generated from the two communities. We select the best topic number K of 70 for proClinton community and K of 60 for proTrump community. Fang et al. 15 Authors’ Note that mark their candidate choice, or they blend proTrump and proClinton hashtags. We aim to classify and understand the Previous versions of this paper were presented at New York vote preferences of these unseen Twitter users, as described in University’s Social Media and Political Participation (SMaPP) subsection “Data Collection.” Global Project meeting in November 2017 and at the Alabama 10. https://dev.twitter.com Political Science Association’s (AlPSA) annual meeting in March 11. This setting allows us to obtain a sample of roughly 1% of all tweets in the United States (Morstatter, Pfeffer, Liu, & Carley, 2013), including Alaska but not including Hawaii. Acknowledgments 12. The use of the bounding box allows us to obtain tweets that We thank participants at the SMaPP Global meeting especially for are posted within the United States, as here we are fundamen- their helpful comments, particularly Josh Tucker, Jonathan Nagler, tally interested in the views of U.S. Twitter users rather than and Dean Eckles, and the audience at AlPSA for their feedback, users from other parts of the world. These tweets either have especially Thomas Shaw. We also thank the editor, Pablo Barberá, exact geo-locations (i.e., longitude & latitude) or have place and the two anonymous reviewers for their feedback. information (e.g., New York City) identifiable by Twitter. We rely on Twitter’s internal classification process to determine whether the tweet has been posted in the United States or not. Declaration of Conflicting Interests More information is provided at https://developer.twitter.com/ The author(s) declared no potential conflicts of interest with respect en/docs/tutorials/filtering-tweets-by-location.html to the research, authorship, and/or publication of this article. 13. The keyword “Donald” is not selected as we found it intro- duces too much noise, as it is more generic than “Trump” or Funding “Hillary,” for example. We did not want to collect tweets about “Donald Duck,” for example. The author(s) disclosed receipt of the following financial support 14. The used election-related hashtags are #clinton, #trump, #hill- for the research, authorship, and/or publication of this article: Anjie ary, #debatenight, and hashtags in Table 1. Note that the case Fang thanks the LKAS PhD Scholarship at the University of sensitivity of these hashtags is omitted, for example, #TRUMP Glasgow for funding support. is same as #trump. 15. Including users who tweet only one or a few times can intro- Notes duce too much noise into the analysis. 1. See Klašnja, Barberá, Beauchamp, Nagler, and Tucker (2017) 16. See https://democrats-intelligence.house.gov/uploadedfiles/ for an excellent discussion of Twitter for understanding public exhibit_b.pdf for the list. opinion, both its strengths and limitations. 17. http://scikit-learn.org 2. We use only public posts in our analysis. 18. For Multilayer Perceptron (MLP), we set the hidden layer as 3. At the time of the data collection, Twitter posts were limited to 100 and set the topic number K in Topic-Based Naive Bayes 140 characters in length. Since then tweets are permitted to be (TBNB) as 10. 280 characters. 19. Words are ranked by their frequencies in the corpus. 4. Of course, it is possible for users to troll the opposition using 20. https://www.crowdflower.com a given hashtag provocatively—for example, a Clinton sup- 21. Note that we did not disclose the user’s account handle, or porter attempting to engage Trump supporters by including name, nor other identifying information. #VoteTrump in their tweet. But, we argue that such a user 22. We used at most eight tweets to make the task more manage- would be unlikely to only tweet #VoteTrump and would able and feasible. instead be more likely to blend hashtags including ones that 23. The Crowdflower workers are required to spend at least 20 s for signal both support (e.g., #VoteTrump) and opposition to each judgment. Each worker is paid US$0.20 for each judgment. Trump (e.g., #DumpTrump or #NeverTrump) over time. To ensure quality, we prepare a set of test questions, where the 5. Note that Beauchamp (2016) demonstrates that Twitter data community labels of the Twitter users are verified in advance. can be leveraged to understand election dynamics and candi- Crowdflower workers enter the task if they reach 70% accuracy date support. on the test questions. 6. Fang, Ounis, Habel, MacDonald, and Limsopatham (2015) 24. Note that our classifier places users into either a proClinton or validated the labeling method by examining follower networks proTrump community and does not include a third option of among these users, finding that those labeled “Yes” were far neither. more likely to follow politicians from the Scottish National 25. These coherence metrics leverage word embedding models, Party (the SNP) than the others, with the SNP being the party trained using public tweets posted from August 2015 to August strongly in favor of independence. 2016, to capture the semantic similarities of words in extracted 7. Naive Bayes (NB) has the advantages of being used widely for topics, and thus give a coherence score of a topic. text classification (Kim, Han, Rim, & Myaeng, 2006; Zhang 26. Because Tree, NB, Support Vector Classification (SVM), and & Li, 2007), and it can be easily adapted with Latent Dirichlet TBNB classifiers do not have random processes, we do not Allocation (LDA). Both are probabilistic models. conduct a paired t test when comparing their results. Instead, 8. We eliminate retweets from our analysis. we use McNemar’s test (Dietterich, 1998) to see whether two 9. By unseen data, we mean the body of Twitter users who author classifiers perform equally. We find that the TBNB classifier election-related tweets but whose vote preferences have not performs differently from the other classifiers (mid – p < 0.05) been labeled—because they do not use the hashtags in Table 1 according to the McNemar’s test. 16 SAGE Open 27. We show the coherence of the topic models extracted from D. Song, D. Albakour, S. Watt, & J. Tait (Eds.), Proceedings the two candidate communities in Appendix Figure B1. The of the 39th European Conference on Information Retrieval coherence results are consistent with Fang, MacDonald, (pp. 252-265). Aberdeen, UK: Springer International Ounis, and Habel (2016a): the average coherence of a topic Publishing. model decreases when the number of topics increases; how- Fang, A., Ounis, I., Habel, P., MacDonald, C., & Limsopatham, ever, the increasing line of c@10/20/30 in Figure B1 indicates N. (2015). Topic-centric classification of Twitter user’s politi- that the top-ranked topics in a topic model are much easier cal orientation. In R. Baeza-Yates, M. Lalmas, A. Moffat, & B. to understand as K increases. Among proClinton topic mod- Ribeiro-Neto (Eds.), Proceedings of the 38th International ACM els, we found the coherence (c@10/20/30) of topics becomes SIGIR conference on Research and Development in Information stable when K reaches 70, and for proTrump, when K reaches Retrieval (pp. 791-794). Santiago, Chile: Association for 60. Therefore, we present a proClinton model with K = 70 and Computing Machinery. a proTrump model with K = 60. Grimmer, J., & Stewart, B. (2013). Text as data: The promise and 28. The complete list of generated topics is available at https:// pitfalls of automatic content analysis methods for political goo.gl/8ev4Pk. texts. Political Analysis, 21, 267-297. 29. The repeated, removed topic words are listed by the topic Guess, A., Nyhan, B., & Reifler, J. (2018). Selective exposure to where they were removed in Appendix Tables A1 and A2 for misinformation: Evidence from the consumption of fake news during the 2016 U.S. presidential campaign (Working paper). reference. Retrieved from https://www.dartmouth.edu/~nyhan/fake- news-2016.pdf References Harrag, F., Hamdi-Cherif, A., & El-Qawasmeh, E. (2010). Adamic, L. A., & Glance, N. (2005). The political blogosphere Performance of MLP and RBF neural networks on Arabic text and the 2004 U.S. election: Divided they blog. In J. Adibi, categorization using SVD. Neural Network World, 20, 441-459. M. Grobelnik, D. Mladenic, & P. Pantel (Eds.), Proceedings Howard, P. N., Woolley, S., & Calo, R. (2018). Algorithms, bots, of the 3rd international workshop on Link discovery (pp. 36- and political communication in the U.S. 2016 election: The 43). Chicago, IL: Association for Computing Machinery. challenge of automated political communication for election Allcott, H., & Gentzkow, M. (2017). Social media and fake news law and administration. Journal of Information Technology & in the 2016 election. Journal of Economic Perspectives, 31, Politics, 15, 81-93. 211-236. Joachims, T. (1998). Text categorization with support vec- Barberá, P., & Rivero, G. (2014). Twitter and Facebook are not tor machines: Learning with many relevant features. In C. representative of the general population: Political attitudes and Nedellec & C. Rouveirol (Eds.), Proceedings of the 10th demographics of British social media users. Social Science European Conference on Machine Learning (pp. 137-142). Computer Review, 33, 712-729. Chemnitz, Germany: Springer-Verlag. Beauchamp, N. (2016). Predicting and interpolating state-level Jungherr, A. (2016). Twitter use in election campaigns: A system- polls using Twitter textual data. American Journal of Political atic literature review. Journal of Information Technology & Science, 61, 490-503. Politics, 13, 72-91. Bond, R., & Messing, S. (2015). Quantifying social media’s political Khorsheed, M. S., & Al-Thubaity, A. O. (2013). Comparative evalua- space: Estimating ideology from publicly revealed preferences tion of text classification techniques using a large diverse Arabic on Facebook. American Political Science Review, 109, 62-78. dataset. Language Resources and Evaluation, 47, 513-538. Burnap, P., Gibson, R., Sloan, L., Southern, R., & Williams, M. Kim, S.-B., Han, K.-S., Rim, H.-C., & Myaeng, S. H. (2006). (2016). 140 characters to victory? Using Twitter to predict the Some effective techniques for naive Bayes text classification. UK 2015 general election. Electoral Studies, 41, 230-233. IEEE Transactions on Knowledge and Data Engineering, 18, Dietterich, T. G. (1998). Approximate statistical tests for com- 1457-1466. paring supervised classification learning algorithms. Neural Klašnja, M., Barberá, P., Beauchamp, N., Nagler, J., & Tucker, J. Computation, 10, 1895-1923. (2017). Measuring public opinion with social media data. In L. R. Fang, A., MacDonald, C., Ounis, I., & Habel, P. (2016a). Atkeson & R. M. Alvarez (Eds.), The Oxford handbook of polling Examining the coherence of the top ranked tweet topics. In R. and survey methods. Oxford, UK: Oxford University Press. Perego, F. Sebastiani, J. Aslam, I. Ruthven, & J. Zobel (Eds.), McCallum, A., & Nigam, K. (1998). A comparison of event models Proceedings of the 39th International ACM SIGIR conference for naive Bayes text classification. In Proceedings of the 15th on Research and Development in Information Retrieval (pp. AAAI conference workshop on learnning for text categorization 825-828). Pisa, Italy: Association for Computing Machinery. (pp. 41-48). Madison, US: Citeseer. Fang, A., MacDonald, C., Ounis, I., & Habel, P. (2016b). Using McKelvey, K., DiGrazia, J., & Rojas, F. (2014). Twitter publics: word embedding to evaluate the coherence of topics from How online political communities signaled electoral out- Twitter data. In R. Perego, F. Sebastiani, J. Aslam, I. Ruthven, comes in the 2010 U.S. house election. Journal of Information & J. Zobel (Eds.), Proceedings of the 39th International Technology & Politics, 17, 436-450. ACM SIGIR conference on Research and Development in Morstatter, F., Pfeffer, J., Liu, H., & Carley, K. M. (2013). Is the Information Retrieval (pp. 1057-1060). Pisa, Italy: Association sample good enough? Comparing data from Twitter’s stream- for Computing Machinery. ing API with Twitter’s firehose. In E. Kiciman, N. Ellison, B. Fang, A., MacDonald, C., Ounis, I., Habel, P., & Yang, X. (2017). Hogan, P. Resnick, & I. Soboroff (Eds.), Proceedings of the Exploring time-sensitive variational Bayesian inference LDA Seventh International AAAI Conference on Weblogs and Social for social media data. In J. M. Jose, C. Hauff, I. S. Altıngovde, Media (pp. 400-408). Boston, US: AAAI Press. Fang et al. 17 Porter, M. A., Mucha, P. J., Newman, M. E., & Warmbrand, Author Biographies C. M. (2005). A network analysis of committees in the Anjie Fang is a final year PhD student in the School of Computing U.S. house of representatives. Proceedings of the National Science of at the University of Glasgow. His PhD topic is to develop Academy of Sciences of the United States of America, 102, effective computing science approaches, such as topic modelling 7057-7062. and user classification, for analyzing political events on social Porter, M. F. (1997). Readings in information retrieval. Burlington, media platforms. US: Morgan Kaufmann. Philip Habel is an associate professor and chair of the Department Rijsbergen, C. J. V. (1979). Information retrieval (2nd ed.). of Political Science & Criminal Justice at the University of South Newton, MA: Butterworth-Heinemann. Alabama, and affiliate senior research fellow in the School of Soroush, V., Roy, D., & Aral, S. (2018). The spread of true and Political and Social Sciences at the University of Glasgow. His false news online. Science, 359, 1146-1151. areas of research include political communication, public opinion, Timberg, C. (2016, November). Russian propaganda effort helped and computational social science. spread “fake news” during election, experts say. The Washington Post. Retrieved from https://www.washingtonpost.com/business/ Iadh Ounis is professor in the School of Computing Science at the economy/russian-propaganda-effort-helped-spread-fake-news- University of Glasgow. He is also the leader of the Terrier Team during-election-experts-say/2016/11/24/793903b6-8a40-4ca9- and a former deputy director/director of knowledge exchange at the b712-716af66098fe_story.html?noredirect=on&utm_term=. Scottish Informatics & Computer Science Alliance (SICSA). His a7e52ce2d5a0 research concerns developing and evaluating novel large-scale text Tucker, J. A., Theocharis, Y., Roberts, M. E., & Barberá, P. (2017). information retrieval techniques and applications. From liberation to turmoil: Social media and democracy. Journal of Democracy, 28, 46-59. Craig MacDonald is lecturer in Information Retrieval in the Zhang, H., & Li, D. (2007). Naїve Bayes text classifier. In T. Y. School of Computing Science at the University of Glasgow. He is Lin, X. Hu, J. Han, X. Shen, & Z. Li (Eds.), IEEE International the lead developer for the Terrier IR platform. His research in infor- Conference on Granular computing (pp. 708-708). San Jose, mation retrieval includes web, Enterprise, social media and Smart US: IEEE. cities applications.

Journal

SAGE OpenSAGE

Published: Mar 27, 2019

Keywords: computer science; user classification; topic modelling; twitter; political science; social science

There are no references for this article.