A Pragmatic Account of the Weak Evidence Effect

REPORT

Samuel A. Barnett¹, Thomas L. Griffiths¹,², and Robert D. Hawkins²

¹Department of Computer Science, Princeton University, Princeton, New Jersey
²Department of Psychology, Princeton University, Princeton, New Jersey

Keywords: communication, persuasion, pragmatics, decision-making

ABSTRACT

Language is not only used to transmit neutral information; we often seek to persuade by arguing in favor of a particular view. Persuasion raises a number of challenges for classical accounts of belief updating, as information cannot be taken at face value. How should listeners account for a speaker's "hidden agenda" when incorporating new information? Here, we extend recent probabilistic models of recursive social reasoning to allow for persuasive goals and show that our model provides a pragmatic account for why weakly favorable arguments may backfire, a phenomenon known as the weak evidence effect. Critically, this model predicts a systematic relationship between belief updates and expectations about the information source: weak evidence should only backfire when speakers are expected to act under persuasive goals and prefer the strongest evidence. We introduce a simple experimental paradigm called the Stick Contest to measure the extent to which the weak evidence effect depends on speaker expectations, and show that a pragmatic listener model accounts for the empirical data better than alternative models. Our findings suggest further avenues for rational models of social reasoning to illuminate classical decision-making phenomena.

Citation: Barnett, S. A., Griffiths, T. L., & Hawkins, R. D. (2022). A Pragmatic Account of the Weak Evidence Effect. Open Mind: Discoveries in Cognitive Science, 6, 169–182. DOI: https://doi.org/10.1162/opmi_a_00061

Supplemental Materials: https://doi.org/10.1162/opmi_a_00061

Received: 8 December 2021; Accepted: 18 July 2022

Competing Interests: The authors declare no conflict of interest.

Corresponding Author: Robert D. Hawkins (rdhawkins@princeton.edu)

Copyright: © 2022 Massachusetts Institute of Technology. Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.

"Well, he would [say that], wouldn't he?"
—Mandy Rice-Davies, 1963

INTRODUCTION

Communication is a powerful engine of learning, enabling us to efficiently transmit complex information that would be costly to acquire on our own (Henrich, 2015; Tomasello, 2009). While much of what we know is learned from others, it can also be challenging to know how to incorporate socially transmitted information into our beliefs about the world. Each source is a person with a "hidden agenda" encompassing their own beliefs, desires, and biases, and not all information can be treated the same (Hovland et al., 1953; O'Keefe, 2015). For example, when deciding whether to buy a car, we may weight information differently depending on whether we heard it from a trusted family member or the dealership, as we know the dealership is trying to make a sale. While such reasoning is empirically well-established—even young children are able to discount information from untrustworthy or unknowledgeable individuals (Gweon et al., 2014; Harris et al., 2018; Mills & Landrum, 2016; Poulin-Dubois & Brosseau-Liard, 2016; Sobel & Kushnir, 2013; Wood et al., 2013)—these phenomena have continued to pose a problem for formal models of belief updating, which typically take information at face value.

Recent probabilistic models of social reasoning have provided a mathematical framework for understanding how listeners ought to draw inferences from socially transmitted information.
Rather than treating information as a direct observation of the true state of the world, social reasoning models suggest treating the true state of the world as a latent variable that can be recovered by inverting a generative model of how an intentional agent would share information under different circumstances (Baker et al., 2017; Goodman & Frank, 2016; Goodman & Stuhlmüller, 2013; Hawthorne-Madell & Goodman, 2019; Jara-Ettinger et al., 2016; Vélez & Gweon, 2019; Whalen et al., 2017). These models raise new explanations for classic effects in the judgment and decision-making literature, where behavior is often measured in social or linguistic contexts (Bagassi & Macchi, 2006; Ma et al., 2020; McKenzie & Nelson, 2003; Mosconi & Macchi, 2001; Politzer & Macchi, 2000; Sperber et al., 1995). Consider the weak evidence effect (Fernbach et al., 2011; Lopes, 1987; McKenzie et al., 2002) or boomerang effect (Petty, 2018), a striking case of non-monotonic belief updating where weak evidence in favor of a particular conclusion may backfire and actually reduce an individual's belief in that conclusion. For example, suppose a juror is determining the guilt of a defendant in court. After hearing a prosecutor give a weak argument in support of a guilty verdict—say, calling a single witness with circumstantial evidence—we might expect the juror's beliefs to be shifted only weakly in support of guilt. Instead, the weak evidence effect describes a situation where the prosecutor's argument actually leads to a shift in the opposite direction: the juror may now believe that the defendant is more likely to be innocent.
Importantly, social reasoning mechanisms are not necessarily in conflict with previously proposed mechanisms for the weak evidence effect, such as algorithmic biases in generating alternative hypotheses (Dasgupta et al., 2017; Fernbach et al., 2011), causal reasoning about other non-social attributes of the situation (Bhui & Gershman, 2020), or sequential belief-updating (McKenzie et al., 2002; Trueblood & Busemeyer, 2011). Both social and asocial models are able to account for the basic effect. To find unique predictions that distinguish models with a social component, then, we argue that we must shift focus from the existence of the effect to asking under what conditions it emerges. Social mechanisms lead to unique predictions about these conditions that purely asocial models cannot generate. In particular, if evidence comes from an intentional agent who is expected to present the strongest possible argument in favor of their case, then weak evidence would imply the absence of stronger evidence (Grice, 1975); otherwise weak evidence may be taken more at face value. Thus, a pragmatic account predicts a systematic relationship between a listener's social expectations and the strength of the weak evidence effect: weak evidence should only backfire when the information source is expected to provide the strongest evidence available to them. In this paper, we proceed by first extending recent rational models of communication to equip speakers with persuasive goals (rather than purely informative ones) and present a series of simulations deriving key predictions from our model. We then introduce a simple behavioral paradigm, the Stick Contest, which allows us to elicit a participant's social expectations about the speaker alongside their inferences as listeners.
Based on the speaker expectations, we find that participants cluster into sub-populations of pragmatic listeners or literal listeners, who expect speakers to provide strongly persuasive evidence or informative but neutral evidence, respectively. As predicted by the pragmatic account, only the first group of participants, who expected speakers to provide persuasive evidence, reliably displayed a weak evidence effect in their belief updates. Finally, we use these data to quantitatively compare our model against prior asocial accounts and find that a pragmatic model accounting for these heterogeneous groups is most consistent with the empirical data. Taken together, we suggest that pragmatic reasoning mechanisms are central to explaining belief updating when evidence is presented in social contexts.

Footnote: Harris et al. (2013) present a related model of the faint praise effect, where the omission of any stronger information that a speaker would be expected to know implies that it is more likely to be negative than positive (e.g., "James has very good handwriting."). Importantly, this effect is sensitive to the perceived expertise of the source; no such implication follows for unknowledgeable informants (see also Bonawitz et al., 2011; Gweon et al., 2014; Hsu et al., 2017, for related inferences from omission).

FORMALIZING A PRAGMATIC ACCOUNT OF THE WEAK EVIDENCE EFFECT

To derive precise behavioral predictions, we begin by formalizing the pragmatics of persuasion in a computational model. Specifically, we draw upon recent progress in the Rational Speech Act (RSA) framework (Franke & Jäger, 2016; Goodman & Frank, 2016; Scontras et al., 2018).
This framework instantiates a theory of recursive social inference, whereby listeners do not naively update their beliefs to reflect the information they hear, but explicitly account for the fact that speakers are intentional agents choosing which information to provide (Grice, 1975).

Reasoning about Evidence from Informative Speakers

We begin by defining a pragmatic listener L₁ who is attempting to update their beliefs about the underlying state of the world w (e.g., the guilt or innocence of the defendant), after hearing an utterance u (e.g., an argument provided by the prosecution). According to Bayes' rule, the listener's posterior beliefs about the world P_{L₁}(w | u) may be derived as follows:

P_{L_1}(w \mid u) \propto P_S(u \mid w) \, P(w)    (1)

where P(w) is the listener's prior beliefs about the world and the likelihood P_S(u | w) is derived by imagining what a hypothetical speaker agent would choose to say in different circumstances. This term yields different predictions given different assumptions about the speaker, captured by different speaker utility functions U. In existing RSA models, the speaker is usually assumed to be epistemically informative, choosing utterances that bring the listener's beliefs as close as possible to the true state of the world, as measured by information-theoretic surprisal:

P_S(u \mid w) \propto \exp\{\alpha \, U_{\text{epi}}(u, w)\}    (2)

U_{\text{epi}}(u, w) = \ln P_L(w \mid u)

where the free parameter α ∈ [0, ∞] controls the temperature of the soft-max function and U_epi denotes the utility function of an (epistemically) informative speaker. As α → ∞, the speaker increasingly chooses the single utterance with the highest utility, and as α → 0 the speaker becomes indifferent among utterances.
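As a concrete illustration, the recursion in Equations 1–2 can be sketched in a few lines of Python. This is a minimal toy, not the paper's implementation: the two-state courtroom world, the utterance semantics, and α = 4 are illustrative assumptions, and the speaker's utility is grounded in a face-value interpretation of the evidence (the literal listener formalized in Equation 3).

```python
import math

# Illustrative two-state world and two pieces of evidence (assumptions,
# not the paper's stimuli). Semantics: which states each utterance is true of.
STATES = ["innocent", "guilty"]
SEMANTICS = {
    "weak_evidence":   {"innocent": True, "guilty": True},   # true either way
    "strong_evidence": {"innocent": False, "guilty": True},  # true only if guilty
}
PRIOR = {"innocent": 0.5, "guilty": 0.5}

def literal_listener(u):
    # Take the utterance at face value: renormalize the prior
    # over the states consistent with u.
    scores = {w: SEMANTICS[u][w] * PRIOR[w] for w in STATES}
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}

def speaker(w, alpha=4.0):
    # Equation 2: soft-max over true utterances, with utility
    # U_epi(u, w) = ln P(w | u) under the face-value listener.
    true_utts = [u for u in SEMANTICS if SEMANTICS[u][w]]
    scores = {u: math.exp(alpha * math.log(literal_listener(u)[w]))
              for u in true_utts}
    total = sum(scores.values())
    return {u: s / total for u, s in scores.items()}

def pragmatic_listener(u):
    # Equation 1: invert the speaker model via Bayes' rule.
    scores = {w: speaker(w).get(u, 0.0) * PRIOR[w] for w in STATES}
    total = sum(scores.values())
    return {w: s / total for w, s in scores.items()}
```

Even in this purely informative setting, merely hearing the weak evidence shifts the pragmatic listener toward innocence, since a speaker facing a guilty defendant would most likely have produced the strong evidence instead; the literal listener, by contrast, stays at the prior.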
If this hypothetical speaker, in turn, aimed to be informative to the same listener defined in Equation 1, it would yield an infinite recursion: the RSA framework instead assumes that the recursion is grounded in a base case known as the "literal" listener, L₀, who takes evidence at face value:

P_{L_0}(w \mid u) \propto \delta_{\llbracket u \rrbracket(w)} \, P(w)    (3)

Here, ⟦u⟧ gives the literal semantics of the utterance u, with δ_{⟦u⟧(w)} returning 1 if w is consistent with the state of affairs denoted by u, and 0 (or some very small value) otherwise.

Reasoning about Evidence from Motivated Speakers

The epistemic utility defined in Equation 2 aims only to produce assertions that most effectively lead to true beliefs. Often, however, speakers do not seek to neutrally inform, but to persuade in favor of a particular outcome or "hidden agenda." What is needed to represent such persuasive goals in the RSA framework? We begin by assuming that motivated speakers have a particular goal state w* that they aim to induce in the listener, where w* does not necessarily coincide with the true state of affairs w. This naturally yields a persuasive utility U_pers that aims to persuade the listener to adopt the intended beliefs w*:

U_{\text{pers}}(u, w^*) = \ln P_L(w^* \mid u)    (4)

where we say an utterance u is strictly more persuasive than u′ if and only if U_pers(u, w*) > U_pers(u′, w*) (i.e., when the utterance results in the listener assigning higher probability to the desired state w*).
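To preview how the persuasive utility of Equation 4 produces a weak evidence effect once it is embedded in a truthful speaker and inverted (the combined utility developed in the next section), consider a minimal sketch. The setup is an illustrative assumption, not the paper's task: the world w ∈ {1, 2, 3} is the strongest argument available to the speaker, the speaker may truthfully present any argument u ≤ w, the prior over w is uniform, and the epistemic term is constant across true arguments, so the speaker's choice is driven by β · U_pers alone.

```python
import math

STATES = [1, 2, 3]   # w = strongest argument available (uniform prior; assumption)
GOAL = 3             # w* = "the case is strong", i.e., w == 3

def literal_goal_prob(u):
    # Literal listener: u is true iff u <= w, so renormalize the uniform
    # prior over {w : w >= u} and read off P(w == GOAL | u).
    consistent = [w for w in STATES if w >= u]
    return (GOAL in consistent) / len(consistent)

def speaker(u, w, beta):
    # Motivated speaker: among true arguments u' <= w, choose with weight
    # exp(beta * U_pers) = exp(beta * ln L0(w* | u')).
    if u > w:
        return 0.0
    weights = {v: math.exp(beta * math.log(literal_goal_prob(v)))
               for v in STATES if v <= w}
    return weights[u] / sum(weights.values())

def pragmatic_goal_prob(u, beta):
    # Pragmatic listener: invert the motivated speaker under the uniform prior.
    posterior = {w: speaker(u, w, beta) for w in STATES}
    return posterior[GOAL] / sum(posterior.values())
```

With β = 0 the speaker picks a true argument at random, and a middling argument (u = 2) is weak positive evidence. With β = 5 the listener reasons that a motivated speaker holding a strong argument would have shown it, so the same middling argument backfires, dropping belief in the goal well below the prior of 1/3.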
Following prior extensions of the speaker utility to other non-epistemic goals (e.g., Bohn et al., 2021; Yoon et al., 2018, 2020), we then define a combined utility assuming the speaker aims to jointly fulfill persuasive aims (Equation 4) while remaining consistent with the true world state w (Equation 2):

P_S(u \mid w, w^*) \propto \exp\{\alpha \, U(u, w, w^*)\}    (5)

U(u, w, w^*) = U_{\text{epi}}(u, w) + \beta \, U_{\text{pers}}(u, w^*)    (6)

where β is a parameter controlling the strength of the persuasive goal (we recover the standard epistemic RSA model when β = 0). This motivated speaker forms the foundation for a pragmatic model of the weak evidence effect. A pragmatic listener L₁ who suspects that the utterance was generated by a motivated speaker with non-zero bias β is able to be "skeptical" of the speaker's agenda and discount their evidence accordingly:

P_{L_1}(w \mid u, w^*, \beta) \propto P_S(u \mid w, w^*, \beta) \, P(w)    (7)

To see why this model allows evidence to backfire, note that the probabilities of different utterances are in competition with one another under the speaker model. In the case that w and w* coincide, the speaker is expected to choose an utterance that is strongly supportive of that state; weaker utterances have a lower probability of being chosen. Conversely, if w* deviates from the true state of affairs, stronger utterances in favor of w* will be dispreferred (because they will be false and violate the epistemic term), hence weaker utterances are more likely. In this way, the absence of strong evidence from a speaker who would be highly motivated to show it statistically implies that no such evidence exists.

EXPERIMENT: THE STICK CONTEST

Empirical studies of the weak evidence effect require a cover story to elicit belief judgments and manipulate the strength of evidence.
Typically, this cover story is based on a real-world scenario such as a jury trial (McKenzie et al., 2002) or a public policy debate (Fernbach et al., 2011), where participants are asked to report their belief in a hypothetical state such as the defendant's guilt or the effectiveness of the policy intervention. While these cover stories are naturalistic, they also introduce several complications for evaluating models of belief updating: participants may bring in different baseline expectations based on world knowledge, and the absolute scale of argument strength for verbal statements is often unclear.

To address these concerns, we introduce a simple behavioral paradigm called the Stick Contest (see Figure 1). This game is inspired by a courtroom scenario: two contestants take turns presenting competing evidence to a judge, who must ultimately issue a verdict. Here, however, the verdict concerns the average length of N = 5 sticks, which range from a minimum length of 1″ to a maximum length of 9″. These sticks are hidden from the judge but visible to both contestants, who are each given an opportunity to reveal exactly one stick as evidence for their case. As in a courtroom, each contestant has a clear agenda that is known to the judge: one contestant is rewarded if the judge determines that the average length of the sticks is longer than the midpoint of 5″ (shown as a dotted line in Figure 1), and the other is rewarded if the judge determines that the average length of the sticks is shorter than the midpoint.

Figure 1. In the Stick Contest paradigm, participants are asked to determine whether a set of five hidden sticks is longer or shorter, on average, than a midpoint (dotted line) based on limited evidence from a pair of contestants. In the speaker expectation phase (left), participants were asked which one of the five sticks a given contestant would be most likely to show. In the listener judgment phase (right), participants were presented with a sequence of sticks from each contestant and asked to judge the likelihood that the overall sample is "longer."

Footnote: Coincident with our work, Vignero (2022) has proposed a similar formulation to explain how speakers may stretch the truth of epistemic modals like "possibly" or "probably."

Footnote: Although we formulate the listener's posterior as being conditioned on a known value of β, we can also consider the case in which the listener has a prior distribution over biases and can compute (marginal) posteriors accordingly—refer to Appendix E for details.

This paradigm has several advantages for comparing models of the weak evidence effect. First, unlike verbal statements of evidence, the scale of evidence strength is made explicit and provided as common knowledge to the judge and contestants. The strength of a given piece of evidence is directly proportional to the length of the revealed stick, and these lengths are bounded between the minimum and maximum values. Second, while previous paradigms have operationalized the weak evidence effect in terms of a sequence of belief updates across multiple pieces of evidence (e.g., where the first piece of evidence sets a baseline for the second piece of evidence), common knowledge about the scale allows the weak evidence effect to emerge from a single piece of evidence. This property helps to disentangle the core mechanisms driving the weak evidence effect from those driving order effects (e.g., Trueblood & Busemeyer, 2011).

Participants

We recruited 804 participants from the Prolific crowd-sourcing platform, 723 of whom successfully completed the task and passed attention checks (see Appendix A).
The task took approximately 5 to 7 minutes, and each participant was paid $1.40, for an average hourly rate of $14. We restricted recruitment to the USA, UK, and Canada and balanced recruitment evenly between male and female participants. Participants were not allowed to complete the task on mobile devices or to complete the experiment more than once.

Design and Procedure

The experiment proceeded in two phases: first, a speaker expectation phase, and second, a listener judgment phase (see Figure 1). In the speaker expectation phase, we placed participants in the role of the contestants, gave them an example set of sticks {2, 4, 7, 8, 9}, and asked them which ones they believed each contestant would choose to show, in order of priority. In the listener judgment phase, we placed participants in the role of the judge and presented them with a sequence of observations. After each observation, they used a slider to indicate their belief about the verdict on a scale ranging from 0 ("average is definitely shorter than five inches") to 100 ("average is definitely longer than five inches"). It was stated explicitly that the judge knows that there are exactly five sticks, and that each contestant's incentives are public knowledge. After each phase, we asked participants to explain their response in a free-response box (see Tables S2–S3 for sample responses).

This within-participant design allowed us to examine individual co-variation between the strength of a participant's weak evidence effect in the listener judgment phase and their beliefs about the evidence generation process in the speaker expectation phase. Critically, while the set of candidate sticks in the speaker expectation phase was held constant across all participants for consistency, the strength of evidence we presented in the listener judgment phase was manipulated in a between-subjects design.
The length of the first piece of evidence was chosen from the set {6, 7, 8, 9} when the long-biased contestant went first, and from the set {4, 3, 2, 1} when the short-biased contestant went first, for a total of 4 possible "strength" conditions (measured as the distance of the observation from the midpoint; we assigned more participants to the more theoretically important "weak evidence" conditions, i.e., {4, 6}, to obtain a higher-powered estimate). The order of contestants was counterbalanced across participants and held constant across the speaker and listener phases. Although it was not the focus of the current study, we also presented a second piece of evidence from the other contestant to capture potential order effects (see Appendix B for preliminary analyses).

RESULTS

Behavioral Results

Before quantitatively evaluating our model, we first examine its key qualitative predictions. Do participants exhibit a weak evidence effect in their listener judgments at all, and if so, to what extent is variation in the strength of the effect related to their expectations about the speaker? We focus on each participant's first judgment, provided after the first piece of evidence in the listener phase. This judgment provides the clearest view of the weak evidence effect, as subsequent judgments may be complicated by order effects. We constructed a linear regression model predicting participants' continuous slider responses. We included fixed effects of evidence strength as well as expectations from the speaker phase (coded as a categorical variable, expecting strongest evidence vs. expecting weaker evidence), and their interaction, along with a fixed effect of whether the first contestant was "short"-biased or "long"-biased. Because the design was fully between-participant (i.e., each participant only provided a single slider response as judge), no random effects were supported.

Footnote: An earlier iteration of our experiment only used a long-biased speaker; we report results from this version in Appendix D.

As predicted, we found a significant interaction between speaker expectations and evidence strength, t(718) = 5.2, p < 0.001; see Figure 2. For participants who expected the speaker to provide the strongest evidence (485 participants, or 67% of the sample), weak evidence in favor of the persuasive goal backfired and actually pushed beliefs in the opposite direction, m = 34.7, 95% CI: [32.3, 37.3], p < 0.001. Meanwhile, for participants who expected speakers to "hedge" and not necessarily show the strongest evidence first (238 participants, or 33% of the sample), no weak evidence effect was found (m = 50.1, group difference = −15.4, post-hoc t(367) = −6.3, p < 0.001). We found only a marginally significant asymmetry in slider bias, p = 0.056, with short-biased participants giving slightly larger endorsements (m = 1.6 slider points) across the board.

Figure 2. Individual differences in the weak evidence effect are predicted by pragmatic expectations. Dotted line represents neutral or unchanged beliefs. Error bars are bootstrapped 95% CIs (see Figure S3 for raw distributions).

Model Simulations

The qualitative effect observed in the previous section is consistent with our pragmatic account: weak evidence only backfired for participants who expected speakers to provide the strongest evidence available. In this section we conduct a series of simulations to explicitly examine the conditions under which this effect emerges from our model of recursive social reasoning between a speaker (who selects the evidence) and a listener (who updates their beliefs in light of the evidence). Our task is naturally formalized by defining the possible utterances u ∈ U as the possible lengths of individual sticks the speaker must choose between, the world state w as the true set of sticks, and the persuasive goals w* ∈ {longer, shorter} as a binary proposition corresponding to each speaker's incentive. Because the speaker only has access to true utterances, all utterances have equal epistemic utility (i.e., the speaker must show one of the five actual sticks, which has the epistemic effect of reducing uncertainty about the identity of exactly one stick). Hence, the combined utility (Equation 6) simplifies to the following:

S(u \mid w, w^*, \beta) \propto \exp\{\alpha \beta \ln L(w^* \mid u)\}    (8)

and the persuasive utility of an utterance is monotonic in the stick length (see Appendix C for complete proofs). Note that when β = 0, the pragmatic listener L₁ expects the speaker preferences to be uniform over true evidence, S(u | w, w*, β = 0) = Unif(u), thus reducing to the literal listener L₀. When β → ∞, the pragmatic listener expects the speaker to maximize utility and choose the single strongest piece of evidence.

In our simulations, we present the listener models with different pieces of evidence u ∈ {5, 6, 7, 8, 9, 10} and manipulate β, which represents the degree to which the pragmatic listener L₁ expects the speaker S to be motivated to show data that prefers the target goal state w* = longer (the case for shorter is analogous). We operationalize the size of the weak evidence effect as the decrease in belief for a proposition given positive evidence supporting that proposition. For example, if observing a stick length of 6″ decreased the listener's beliefs that the sample was longer than 5″ from a prior belief of P(longer) = 0.5 to a posterior belief of P(longer | u = 6) = 0.4, then we say the size of the effect is 0.5 − 0.4 = 0.1. First, we observe that when β = 0 (Figure 3A, left-most column), no weak evidence effect is observed: the listener interprets the evidence literally. However, as the perceived bias of the speaker increases, we observe a weak evidence effect emerge for shorter sticks.
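The simulation setup just described can be reproduced in a short script. This is an illustrative sketch under stated assumptions, not the paper's WebPPL implementation: it places a uniform prior over unordered multisets of five stick lengths from 1″ to 9″, fixes α = 1, uses the simplified speaker of Equation 8 with the literal listener as its base, and scores "longer" as a total above 25″ (mean above the 5″ midpoint).

```python
import itertools

LENGTHS = range(1, 10)    # stick lengths 1"-9", as in the experiment
N, MID_TOTAL = 5, 25      # five sticks; "longer" means total > 25 (mean > 5")

# Assumption: uniform prior over unordered multisets of five lengths.
WORLDS = list(itertools.combinations_with_replacement(LENGTHS, N))

def longer(world):
    return sum(world) > MID_TOTAL

PRIOR_LONGER = sum(longer(w) for w in WORLDS) / len(WORLDS)

# Literal listener L0: P(longer | stick u is in the sample).
L0 = {}
for u in LENGTHS:
    consistent = [w for w in WORLDS if u in w]
    L0[u] = sum(longer(w) for w in consistent) / len(consistent)

def speaker(u, world, beta):
    # Equation 8: among the five actual sticks, show u with weight
    # proportional to exp(beta * ln L0(longer | u)), i.e., L0[u] ** beta.
    if u not in world:
        return 0.0
    weight = {s: world.count(s) * L0[s] ** beta for s in set(world)}
    return weight[u] / sum(weight.values())

def pragmatic_longer(u, beta):
    # Equation 7: invert the motivated speaker under the uniform world prior.
    shown = [speaker(u, w, beta) for w in WORLDS]
    total = sum(shown)
    return sum(p for p, w in zip(shown, WORLDS) if longer(w)) / total

# Size of the weak evidence effect for a weakly "long" stick (6"):
# prior belief minus posterior belief; positive values mean backfire.
for beta in [0.0, 1.0, 2.0, 100.0]:
    effect = PRIOR_LONGER - pragmatic_longer(6, beta)
    print(f"beta={beta:>5}: effect size {effect:+.3f}")
```

Consistent with the pattern in Figure 3A, the effect size is negative (the evidence helps) at β = 0 and grows positive as the perceived bias increases: under a heavily biased speaker, a 6″ stick implies that no longer stick was available to show.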
When the perceived bias grows large (e.g., β = 100, right-most column), the weak evidence effect is found over a broad range of evidence: if the listener expects the speaker to show the single strongest piece of evidence available, then even a stick of length 8″ rules out the existence of any stronger evidence, shifting the possible range of sticks in the sample. To further understand this effect, we computed the beliefs of literal (L₀) and pragmatic (L₁) listener models as a function of the evidence they've been shown (Figure 3B). While the literal listener predicts a near-linear shift in beliefs as a function of positive or negative evidence, the pragmatic listener yields a sharper S-shaped curve reflecting more skeptical belief updating.

Figure 3. Model simulations. (A) Our pragmatic listener model predicts a weak evidence effect for a broader range of evidence strengths at higher perceived speaker bias β. The color scale represents the extent to which the listener's posterior beliefs decrease in light of positive evidence, where the black region represents conditions under which no weak evidence effect is predicted. (B) Posterior beliefs of literal and pragmatic listener models as a function of evidence from a long-biased speaker. Horizontal line represents prior beliefs. Error bars are given by 10-fold cross-validation across parameter fits on different subsets of our behavioral data, with average β = 2.03 and response offset o = −0.13 (translating the curve down).

Footnote: For related tasks studying outright lying, see Franke et al. (2020), Oey et al. (2019), Oey and Vul (2021), and Ransom et al. (2017). For a more comprehensive and multidisciplinary overview of varieties of deception and misleading, see Meibauer (2019) and Saul (2012).

Footnote: Because the product α · β is non-zero only if the persuasion weight β is non-zero, these two parameters are redundant in our task. We thus treat their product as a single free parameter, effectively fixing α = 1. It is possible that a near-zero α (e.g., low effort from participants) may make it difficult to empirically detect a non-zero β term in our model comparison below, but this would work against our hypothesis.

Quantitative Model Comparison

Our behavioral results suggest an important role for speaker expectations in explanations of the weak evidence effect, and our simulations reveal how a pragmatic listener model derives this effect from different expectations about speaker bias. In this section, we compare our model against alternative accounts by fitting them to our empirical data (see Appendix E for details).

Fitting the RSA model to behavioral data. We considered several variants of the RSA model, which handled the relationship between the speaker and listener phases in different ways. The simplest variant, which we call the homogeneous model, assumes the entire population of participants is explained by a pragmatic model (z = L₁) with an unknown bias. It is homogeneous because the same model is assumed to be shared across the whole population. The second variant, which we call the heterogeneous model, is a mixture model where we predicted each participant's response as a convex combination of the L₀ and L₁ models with mixture weight p (i.e., marginalizing out latent assignments z_i). In the third variant, which we call the speaker-dependent model, we explicitly fit different mixture weights depending on the participant's response in the speaker expectation phase. Rather than learning a single mixture weight for the entire population, this variant learns independent mixture weights for different sub-groups z_j, defined by the different sticks j that participants chose in the speaker phase.
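The three variants differ only in how the mixture weight over literal (L₀) and pragmatic (L₁) predictions is structured, which can be sketched schematically. The listener predictions and weights below are placeholder values for illustration, not fitted quantities from the paper:

```python
# Placeholder per-trial predictions (probability-of-"longer" judgments)
# from the literal (L0) and pragmatic (L1) listener models.
# These numbers are illustrative stand-ins, not fitted values.
def l0_prediction(evidence):
    return 0.5 + 0.05 * (evidence - 5)       # near-linear in the evidence

def l1_prediction(evidence, strong_cutoff=7):
    # Skeptical updating (a crude step function standing in for the
    # sharper S-shaped curve of the pragmatic listener).
    return 0.35 if evidence < strong_cutoff else 0.85

def homogeneous(evidence):
    # One model (here L1) shared by the whole population.
    return l1_prediction(evidence)

def heterogeneous(evidence, p=0.6):
    # Single population-level mixture: convex combination of L1 and L0.
    return p * l1_prediction(evidence) + (1 - p) * l0_prediction(evidence)

# Illustrative per-subgroup weights, keyed by the stick a participant
# chose in the speaker expectation phase (hypothetical values).
SUBGROUP_WEIGHTS = {9: 0.99, 7: 0.10}

def speaker_dependent(evidence, speaker_choice):
    # Separate mixture weight per sub-group of speaker-phase responses.
    p = SUBGROUP_WEIGHTS.get(speaker_choice, 0.5)
    return p * l1_prediction(evidence) + (1 - p) * l0_prediction(evidence)
```

Under this scheme, participants who chose the strongest stick in the speaker phase are modeled as mostly pragmatic, while those who chose a weaker stick are modeled as mostly literal.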
This model asks whether conditioning on speaker data allows the model to make sufficiently better predictions about the listener data.

Fitting anchor-and-adjust models to empirical data. The most prominent family of asocial models accounting for the weak evidence effect are anchor-and-adjust (AA) models. In these models, individuals compare the strength of new evidence u against a reference point R and adjust their beliefs P(w|u) up or down accordingly:

P(w \mid u) = P(w) + \eta \, (s(u) - R)    (9)

where s(u) is the strength of the evidence and η is an adjustment weight. In the simplest variant (Hogarth & Einhorn, 1992), the reference point and scaling are fixed to a neutral baseline, η = P(w) = 1 − P(w) = .5, and R = 0. In a more complex variant, beliefs are not updated from a neutral baseline but instead relative to a more stringent level known as the argument's "minimum acceptable strength" (MAS; McKenzie et al., 2002), which is treated as a free parameter: R ∼ Unif[−1, 1]. In this case, positive evidence that falls short of R may nonetheless be treated as negative evidence and decrease the listener's beliefs. Although the anchor is typically taken to be a specific earlier observation, it may be interpreted in the single-observation case as the participant's implicit or imagined expectations from the task instructions and cover story. Prior work using anchor-and-adjust models would not predict a relationship between behavior in the speaker phase and in the listener phase. We thus evaluated a homogeneous AA model, a homogeneous MAS model, and a heterogeneous mixture model predicting responses as a convex combination of the two.

Comparison results. We examined several metrics to assess the relative performance of these models. First, as an absolute goodness-of-fit measure, we found the parameters that maximized the model likelihood (see Table 1). As a Bayesian alternative, which penalizes models for added complexity, we also considered a measure using the full posterior, the Watanabe-Akaike (or Widely Applicable) Information Criterion (Gelman et al., 2013; Watanabe, 2013). The WAIC penalizes model flexibility in a way that asymptotically equates to Bayesian leave-one-out (LOO) cross-validation (Acerbi et al., 2018; Gelman et al., 2013), which we also include in the form of the PSIS-LOO measure (PSIS stands for Pareto Smoothed Importance Sampling, a method for stabilizing estimates; Vehtari et al., 2017).

Table 1. Results of the model comparison, including the likelihood achieved by the best-fitting model as well as the WAIC and PSIS-LOO (± standard error), which penalize for model complexity.

Model | Variant           | Likelihood | WAIC        | PSIS-LOO
A&A   | Homogeneous       | −28.1      | 57.7 ± 9.9  | 28.8 ± 9.9
MAS   | Homogeneous       | 8.2        | −13.3 ± 9.6 | −6.6 ± 9.6
      | Heterogeneous     | 8.2        | −11.3 ± 9.5 | −5.6 ± 9.5
RSA   | Homogeneous       | 8.1        | −13.3 ± 9.5 | −6.7 ± 9.5
      | Heterogeneous     | 8.1        | −10.5 ± 9.3 | −5.2 ± 9.3
      | Speaker-dependent | 12.0       | −16.4 ± 9.1 | −9.2 ± 9.1

These comparison criteria (Table 1) suggest that the added complexity of the speaker-dependent RSA model is justified: it outperforms all asocial variants. For this speaker-dependent model, we found a maximum a posteriori (MAP) estimate of β = 2.26, providing strong support for a non-zero persuasive bias term. We found that the pragmatic L₁ model best explained the judgments of participants who expected the strongest evidence to be shown during the speaker phase (mixture weight p̂ = 0.99), while the literal L₀ model best explained the judgments of participants who expected weaker sticks to be shown (mixture weight p̂ = 0.1). Full parameter posteriors are shown in Figure S5.

DISCUSSION

Evidence is not a direct reflection of the world: it comes from somewhere, often from other people.
Yet appropriately accounting for social sources of information has posed a challenge for models of belief updating, even as increasing attention has been given to the role of pragmatic reasoning in classic phenomena. In this paper, we formalized a pragmatic account of the weak evidence effect via a model of recursive social reasoning, where weaker evidence may backfire when the speaker is expected to have a persuasive agenda. This model critically predicts that individual differences in the weak evidence effect should be related to individual differences in how the speaker is expected to select evidence. We evaluated this qualitative prediction using a novel behavioral paradigm—the Stick Contest—and demonstrated through simulations and quantitative model comparisons that our model uniquely captures this source of variance in judgments.

All models were implemented in WebPPL (Goodman & Stuhlmüller, 2014); code for reproducing these analyses is available at https://github.com/s-a-barnett/bayesian-persuasion. We drew 1,000 samples from the posterior via MCMC across four chains, with a burn-in of 7,500 steps and a lag of 100 steps between samples.

Several avenues remain important for future work. First, while we focused on the initial judgment as the purest manifestation of the weak evidence effect, subsequent judgments are consistent with the order effects that have been the central focus of previous accounts (see Appendix B; Anderson, 1981; Davis, 1984; Trueblood & Busemeyer, 2011). Thus, we view our model of social reasoning as capturing an orthogonal aspect of the phenomenon, and further work should explicitly integrate computational-level principles of social reasoning with process-level mechanisms of sequential belief updating.
Second, our model provides a foundation for accounting for related message involvement effects (e.g., emotion, attractiveness of source), presentation effects (e.g., numerical vs. verbal descriptions), and social affiliation effects (i.e., whether the source is in-group) that have been examined in real-world settings of persuasion (e.g., Bohner et al., 2002; Cialdini, 1993; DeBono & Harnish, 1988; Falk & Scholz, 2018; Martire et al., 2014; Park et al., 2007). These settings also involve uncertainty about the scale of possible argument strength, unlike the clearly defined interval of lengths in our paradigm. Third, while the weak evidence effect emerges after a single level of social recursion, it is natural to ask what happens at higher levels: what about a more sophisticated speaker who is aware that weak evidence may lead to such inferences? Our paradigm explicitly informed participants of the speaker bias, but uncertainty about the speaker's hidden agenda may give rise to a strong evidence effect (Perfors et al., 2018), where speakers are motivated to avoid the strongest arguments to appear more neutral (see Appendix E). Based on the self-explanations we elicited (Table S2), it is possible that some participants who expected less strong evidence were reasoning in this way. These individual differences are consistent with prior work reporting heterogeneity in levels of reasoning in other communicative tasks (e.g., Franke & Degen, 2016).

We used a within-participant individual differences design for simplicity and naturalism, but there are also limitations associated with this design choice. For example, it is possible that the group of participants who expected weaker evidence to be shown first could be systematically different from the other group in some way, such as differing levels of inattention or motivation, that explains their behavior on both speaker and listener trials.
We aimed to control for these factors in multiple ways, including strict attention checks (Appendix A) and self-explanations (Tables S2–S3), which suggest a thoughtful rationale for expecting weaker evidence. However, an alternative solution would be to explicitly manipulate social expectations about the speaker in the cover story (e.g., training participants on speakers that tend to show weaker or stronger evidence first). Such a design would license stronger causal inferences, but would also raise new concerns about exactly what is being manipulated. A second limitation of our design is that the speaker phase was always presented before the listener phase. It is already known that the order of these roles may affect participants' reasoning (e.g., Shafto et al., 2014; Sikos et al., 2021), but asocial accounts of the weak evidence effect would not predict any relationship between speaker and listener trials under either order. Hence, we chose the order we thought would minimize confusion about the task; it is not our goal to suggest that social reasoning is spontaneous or mandatory, and we expect that social-pragmatic factors may be more salient in some contexts than others (e.g., when evidence is presented verbally vs. numerically, as in Martire et al., 2014).

Probabilistic models have continually emphasized the importance of the data generating process, distinguishing between assumptions like weak sampling, strong sampling, and pedagogical sampling (Hsu & Griffiths, 2009; Shafto et al., 2014; Tenenbaum, 1999; Tenenbaum & Griffiths, 2001). Our work considers a fourth sampling assumption, rhetorical sampling, where the data are not necessarily generated in the service of pedagogy but rather in the service of persuasive rhetoric.
Critically, although we formalized this account in a recursive Bayesian reasoning framework, insights about rhetorical sampling are also compatible with other frameworks: for example, work in the anchor-and-adjust framework may use similar principles to derive a relationship between information sources and reference points. Such socially sensitive objectives may be particularly key in the context of developing artificial agents that are more closely aligned with human values (Carroll et al., 2019; Hilgard et al., 2021; Irving et al., 2018). As we navigate an information landscape increasingly filled with disinformation from adversarial sources, a heightened sense of skepticism may be rational after all.

ACKNOWLEDGMENTS

This work was supported by grant #62220 from the John Templeton Foundation to TG. RDH is funded by a C.V. Starr Postdoctoral Fellowship and NSF SPRF award #1911835. We are grateful for early contributions by Mark Ho and helpful conversations with other members of the Princeton Computational Cognitive Science Lab, as well as Ryan Adams and members of the Laboratory for Intelligent Probabilistic Systems.

REFERENCES

Acerbi, L., Dokka, K., Angelaki, D. E., & Ma, W. J. (2018). Bayesian comparison of explicit and implicit causal inference strategies in multisensory heading perception. PLOS Computational Biology, 14(7), e1006110. https://doi.org/10.1371/journal.pcbi.1006110, PubMed: 30052625
Anderson, N. H. (1981). Foundations of information integration theory. Academic Press.
Bagassi, M., & Macchi, L. (2006). Pragmatic approach to decision making under uncertainty: The case of the disjunction effect. Thinking & Reasoning, 12(3), 329–350. https://doi.org/10.1080/13546780500375663
Baker, C. L., Jara-Ettinger, J., Saxe, R., & Tenenbaum, J. B. (2017). Rational quantitative attribution of beliefs, desires and percepts in human mentalizing. Nature Human Behaviour, 1(4), 1–10. https://doi.org/10.1038/s41562-017-0064
Bhui, R., & Gershman, S. J. (2020). Paradoxical effects of persuasive messages. Decision, 7(4), 239–258. https://doi.org/10.1037/dec0000123
Bohn, M., Tessler, M. H., Merrick, M., & Frank, M. C. (2021). How young children integrate information sources to infer the meaning of words. Nature Human Behaviour, 5(8), 1046–1054. https://doi.org/10.1038/s41562-021-01145-1, PubMed: 34211148
Bohner, G., Ruder, M., & Erb, H.-P. (2002). When expertise backfires: Contrast and assimilation effects in persuasion. British Journal of Social Psychology, 41(4), 495–519. https://doi.org/10.1348/014466602321149858, PubMed: 12593750
Bonawitz, E., Shafto, P., Gweon, H., Goodman, N. D., Spelke, E., & Schulz, L. (2011). The double-edged sword of pedagogy: Instruction limits spontaneous exploration and discovery. Cognition, 120(3), 322–330. https://doi.org/10.1016/j.cognition.2010.10.001, PubMed: 21216395
Carroll, M., Shah, R., Ho, M. K., Griffiths, T., Seshia, S., Abbeel, P., & Dragan, A. (2019). On the utility of learning about humans for human-AI coordination. In Advances in Neural Information Processing Systems (pp. 5175–5186).
Cialdini, R. B. (1993). Influence: The psychology of persuasion. Morrow.
Dasgupta, I., Schulz, E., & Gershman, S. J. (2017). Where do hypotheses come from? Cognitive Psychology, 96, 1–25. https://doi.org/10.1016/j.cogpsych.2017.05.001, PubMed: 28586634
Davis, J. H. (1984). Order in the courtroom. Psychology and Law, 251–265.
DeBono, K. G., & Harnish, R. J. (1988). Source expertise, source attractiveness, and the processing of persuasive information: A functional approach. Journal of Personality and Social Psychology, 55(4), 541–546. https://doi.org/10.1037/0022-3514.55.4.541
Falk, E., & Scholz, C. (2018). Persuasion, influence, and value: Perspectives from communication and social neuroscience. Annual Review of Psychology, 69(1), 329–356. https://doi.org/10.1146/annurev-psych-122216-011821, PubMed: 28961060
Fernbach, P. M., Darlow, A., & Sloman, S. A. (2011). When good evidence goes bad: The weak evidence effect in judgment and decision-making. Cognition, 119(3), 459–467. https://doi.org/10.1016/j.cognition.2011.01.013, PubMed: 21345428
Franke, M., & Degen, J. (2016). Reasoning in reference games: Individual- vs. population-level probabilistic modeling. PLOS ONE, 11(5), e0154854. https://doi.org/10.1371/journal.pone.0154854, PubMed: 27149675
Franke, M., Dulcinati, G., & Pouscoulous, N. (2020). Strategies of deception: Under-informativity, uninformativity, and lies—Misleading with different kinds of implicature. Topics in Cognitive Science, 12(2), 583–607. https://doi.org/10.1111/tops.12456, PubMed: 31541530
Franke, M., & Jäger, G. (2016). Probabilistic pragmatics, or why Bayes' rule is probably important for pragmatics. Zeitschrift für Sprachwissenschaft, 35(1), 3–44. https://doi.org/10.1515/zfs-2016-0002
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis. CRC Press. https://doi.org/10.1201/b16018
Goodman, N. D., & Frank, M. C. (2016). Pragmatic language interpretation as probabilistic inference. Trends in Cognitive Sciences, 20(11), 818–829. https://doi.org/10.1016/j.tics.2016.08.005, PubMed: 27692852
Goodman, N. D., & Stuhlmüller, A. (2013). Knowledge and implicature: Modeling language understanding as social cognition. Topics in Cognitive Science, 5(1), 173–184. https://doi.org/10.1111/tops.12007, PubMed: 23335578
Goodman, N. D., & Stuhlmüller, A. (2014). The design and implementation of probabilistic programming languages. Retrieved 2020-01-07, from https://dippl.org
Grice, H. P. (1975). Logic and conversation. In P. Cole & J. Morgan (Eds.), Syntax and semantics, speech acts (Vol. 3). Academic Press.
Gweon, H., Pelton, H., Konopka, J. A., & Schulz, L. E. (2014). Sins of omission: Children selectively explore when teachers are under-informative. Cognition, 132(3), 335–341. https://doi.org/10.1016/j.cognition.2014.04.013, PubMed: 24873737
Harris, A., Corner, A., & Hahn, U. (2013). James is polite and punctual (and useless): A Bayesian formalisation of faint praise. Thinking & Reasoning, 19(3), 414–429. https://doi.org/10.1080/13546783.2013.801367
Harris, P., Koenig, M. A., Corriveau, K. H., & Jaswal, V. K. (2018). Cognitive foundations of learning from testimony. Annual Review of Psychology, 69, 251–273. https://doi.org/10.1146/annurev-psych-122216-011710, PubMed: 28793811
Hawthorne-Madell, D., & Goodman, N. D. (2019). Reasoning about social sources to learn from actions and outcomes. Decision, 6(1), 17–60. https://doi.org/10.1037/dec0000088
Henrich, J. (2015). The secret of our success: How culture is driving human evolution, domesticating our species, and making us smarter. Princeton University Press. https://doi.org/10.2307/j.ctvc77f0d
Hilgard, S., Rosenfeld, N., Banaji, M. R., Cao, J., & Parkes, D. (2021). Learning representations by humans, for humans. In M. Meila & T. Zhang (Eds.), Proceedings of the 38th International Conference on Machine Learning (pp. 4227–4238).
Hogarth, R. M., & Einhorn, H. J. (1992). Order effects in belief updating: The belief-adjustment model. Cognitive Psychology, 24(1), 1–55. https://doi.org/10.1016/0010-0285(92)90002-J
Hovland, C. I., Janis, I. L., & Kelley, H. H. (1953). Communication and persuasion. Yale University Press.
Hsu, A., & Griffiths, T. L. (2009). Differential use of implicit negative evidence in generative and discriminative language learning. In Advances in Neural Information Processing Systems 22 (pp. 754–762).
Hsu, A., Horng, A., Griffiths, T. L., & Chater, N. (2017). When absence of evidence is evidence of absence: Rational inferences from absent data. Cognitive Science, 41, 1155–1167. https://doi.org/10.1111/cogs.12356, PubMed: 26946380
Irving, G., Christiano, P. F., & Amodei, D. (2018). AI safety via debate. ArXiv, abs/1805.00899. https://doi.org/10.48550/arXiv.1805.00899
Jara-Ettinger, J., Gweon, H., Schulz, L. E., & Tenenbaum, J. B. (2016). The naïve utility calculus: Computational principles underlying commonsense psychology. Trends in Cognitive Sciences, 20(8), 589–604. https://doi.org/10.1016/j.tics.2016.05.011, PubMed: 27388875
Lopes, L. L. (1987). Procedural debiasing. Acta Psychologica, 64(2), 167–185. https://doi.org/10.1016/0001-6918(87)90005-9
Ma, F., Zeng, D., Xu, F., Compton, B. J., & Heyman, G. D. (2020). Delay of gratification as reputation management. Psychological Science, 31(9), 1174–1182. https://doi.org/10.1177/0956797620939940, PubMed: 32840460
Martire, K. A., Kemp, R. I., Sayle, M., & Newell, B. R. (2014). On the interpretation of likelihood ratios in forensic science evidence: Presentation formats and the weak evidence effect. Forensic Science International, 240, 61–68. https://doi.org/10.1016/j.forsciint.2014.04.005, PubMed: 24814330
McKenzie, C. R. M., Lee, S. M., & Chen, K. K. (2002). When negative evidence increases confidence: Change in belief after hearing two sides of a dispute. Journal of Behavioral Decision Making, 15(1), 1–18. https://doi.org/10.1002/bdm.400
McKenzie, C. R. M., & Nelson, J. D. (2003). What a speaker's choice of frame reveals: Reference points, frame selection, and framing effects. Psychonomic Bulletin & Review, 10(3), 596–602. https://doi.org/10.3758/BF03196520, PubMed: 14620352
Meibauer, J. (2019). The Oxford handbook of lying. Oxford University Press. https://doi.org/10.1093/oxfordhb/9780198736578.001.0001
Mills, C. M., & Landrum, A. R. (2016). Learning who knows what: Children adjust their inquiry to gather information from others. Frontiers in Psychology, 7, 951. https://doi.org/10.3389/fpsyg.2016.00951, PubMed: 27445916
Mosconi, G., & Macchi, L. (2001). The role of pragmatic rules in the conjunction fallacy. Mind & Society, 2(1), 31–57. https://doi.org/10.1007/BF02512074
Oey, L. A., Schachner, A., & Vul, E. (2019). Designing good deception: Recursive theory of mind in lying and lie detection. In Proceedings of the 41st Annual Conference of the Cognitive Science Society (pp. 897–903). https://doi.org/10.31234/osf.io/5s4wc
Oey, L. A., & Vul, E. (2021). Lies are crafted to the audience. In Proceedings of the 43rd Annual Meeting of the Cognitive Science Society (pp. 791–797).
O'Keefe, D. J. (2015). Persuasion: Theory and research. Sage Publications.
Park, H. S., Levine, T. R., Westerman, C. Y. K., Orfgen, T., & Foregger, S. (2007). The effects of argument quality and involvement type on attitude formation and attitude change: A test of dual-process and social judgment predictions. Human Communication Research, 33(1), 81–102. https://doi.org/10.1111/j.1468-2958.2007.00290.x
Perfors, A., Navarro, D., & Shafto, P. (2018). Stronger evidence isn't always better: The role of social inference in evidence selection. In Proceedings of the 40th Annual Conference of the Cognitive Science Society (pp. 864–869).
Petty, R. E. (2018). Attitudes and persuasion: Classic and contemporary approaches. Routledge. https://doi.org/10.4324/9780429502156
Politzer, G., & Macchi, L. (2000). Reasoning and pragmatics. Mind & Society, 1(1), 73–93. https://doi.org/10.1007/BF02512230
Poulin-Dubois, D., & Brosseau-Liard, P. (2016). The developmental origins of selective social learning. Current Directions in Psychological Science, 25(1), 60–64. https://doi.org/10.1177/0963721415613962
Ransom, K., Voorspoels, W., Perfors, A., & Navarro, D. (2017). A cognitive analysis of deception without lying. In Proceedings of the 39th Annual Conference of the Cognitive Science Society (pp. 992–997).
Saul, J. M. (2012). Lying, misleading, and what is said: An exploration in philosophy of language and in ethics. Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199603688.001.0001
Scontras, G., Tessler, M. H., & Franke, M. (2018). Probabilistic language understanding: An introduction to the rational speech act framework. Retrieved 2020-01-07, from https://problang.org
Shafto, P., Goodman, N. D., & Griffiths, T. L. (2014). A rational account of pedagogical reasoning: Teaching by, and learning from, examples. Cognitive Psychology, 71, 55–89. https://doi.org/10.1016/j.cogpsych.2013.12.004, PubMed: 24607849
Sikos, L., Venhuizen, N. J., Drenhaus, H., & Crocker, M. W. (2021). Speak before you listen: Pragmatic reasoning in multi-trial language games. In Proceedings of the 43rd Annual Meeting of the Cognitive Science Society.
Sobel, D. M., & Kushnir, T. (2013). Knowledge matters: How children evaluate the reliability of testimony as a process of rational inference. Psychological Review, 120(4), 779–797. https://doi.org/10.1037/a0034191, PubMed: 24015954
Sperber, D., Cara, F., & Girotto, V. (1995). Relevance theory explains the selection task. Cognition, 57(1), 31–95. https://doi.org/10.1016/0010-0277(95)00666-M, PubMed: 7587018
Tenenbaum, J. B. (1999). Bayesian modeling of human concept learning. In Advances in Neural Information Processing Systems (pp. 59–68).
Tenenbaum, J. B., & Griffiths, T. L. (2001). Generalization, similarity, and Bayesian inference. Behavioral and Brain Sciences, 24(4), 629–640. https://doi.org/10.1017/S0140525X01000061, PubMed: 12048947
Tomasello, M. (2009). The cultural origins of human cognition. Harvard University Press. https://doi.org/10.2307/j.ctvjsf4jc
Trueblood, J. S., & Busemeyer, J. R. (2011). A quantum probability account of order effects in inference. Cognitive Science, 35(8), 1518–1552. https://doi.org/10.1111/j.1551-6709.2011.01197.x, PubMed: 21951058
Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413–1432. https://doi.org/10.1007/s11222-016-9696-4
Vélez, N., & Gweon, H. (2019). Integrating incomplete information with imperfect advice. Topics in Cognitive Science, 11(2), 299–315. https://doi.org/10.1111/tops.12388, PubMed: 30414253
Vignero, L. (2022). Updating on biased probabilistic testimony. Erkenntnis, 1–24. https://doi.org/10.1007/s10670-022-00545-7
Watanabe, S. (2013). A widely applicable Bayesian information criterion. Journal of Machine Learning Research, 14(1), 867–897.
Whalen, A., Griffiths, T. L., & Buchsbaum, D. (2017). Sensitivity to shared information in social learning. Cognitive Science, 42(1), 168–187. https://doi.org/10.1111/cogs.12485, PubMed: 28608488
Wood, L. A., Kendal, R. L., & Flynn, E. G. (2013). Whom do children copy? Model-based biases in social learning. Developmental Review, 33(4), 341–356. https://doi.org/10.1016/j.dr.2013.08.002
Yoon, E. J., MacDonald, K., Asaba, M., Gweon, H., & Frank, M. C. (2018). Balancing informational and social goals in active learning. In Proceedings of the 40th Annual Conference of the Cognitive Science Society (pp. 1218–1223).
Yoon, E. J., Tessler, M. H., Goodman, N. D., & Frank, M. C. (2020). Polite speech emerges from competing social goals. Open Mind, 4, 71–87. https://doi.org/10.1162/opmi_a_00035, PubMed: 33225196

A Pragmatic Account of the Weak Evidence Effect

Publisher
MIT Press
Copyright
© 2022 Massachusetts Institute of Technology. Published under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
eISSN
2470-2986
DOI
10.1162/opmi_a_00061

—Mandy Rice-Davies, 1963

Supplemental Materials: https://doi.org/10.1162/opmi_a_00061
Received: 8 December 2021; Accepted: 18 July 2022
Competing Interests: The authors declare no conflict of interest.
Corresponding Author: Robert D. Hawkins (rdhawkins@princeton.edu)

INTRODUCTION

Communication is a powerful engine of learning, enabling us to efficiently transmit complex information that would be costly to acquire on our own (Henrich, 2015; Tomasello, 2009). While much of what we know is learned from others, it can also be challenging to know how to incorporate socially transmitted information into our beliefs about the world. Each source is a person with a "hidden agenda" encompassing their own beliefs and desires and biases, and not all information can be treated the same (Hovland et al., 1953; O'Keefe, 2015). For example, when deciding whether to buy a car, we may weight information differently depending on whether we heard it from a trusted family member or the dealership, as we know the dealership is trying to make a sale. While such reasoning is empirically well-established—even young children are able to discount information from untrustworthy or unknowledgeable individuals (Gweon et al., 2014; Harris et al., 2018; Mills & Landrum, 2016; Poulin-Dubois & Brosseau-Liard, 2016; Sobel & Kushnir, 2013; Wood et al., 2013)—these phenomena have continued to pose a problem for formal models of belief updating, which typically take information at face value.

Recent probabilistic models of social reasoning have provided a mathematical framework for understanding how listeners ought to draw inferences from socially transmitted information.
Rather than treating information as a direct observation of the true state of the world, social reasoning models suggest treating the true state of the world as a latent variable that can be recovered by inverting a generative model of how an intentional agent would share information under different circumstances (Baker et al., 2017; Goodman & Frank, 2016; Goodman & Stuhlmüller, 2013; Hawthorne-Madell & Goodman, 2019; Jara-Ettinger et al., 2016; Vélez & Gweon, 2019; Whalen et al., 2017). These models raise new explanations for classic effects in the judgment and decision-making literature, where behavior is often measured in social or linguistic contexts (Bagassi & Macchi, 2006; Ma et al., 2020; McKenzie & Nelson, 2003; Mosconi & Macchi, 2001; Politzer & Macchi, 2000; Sperber et al., 1995).

Consider the weak evidence effect (Fernbach et al., 2011; Lopes, 1987; McKenzie et al., 2002) or boomerang effect (Petty, 2018), a striking case of non-monotonic belief updating where weak evidence in favor of a particular conclusion may backfire and actually reduce an individual's belief in that conclusion. For example, suppose a juror is determining the guilt of a defendant in court. After hearing a prosecutor give a weak argument in support of a guilty verdict—say, calling a single witness with circumstantial evidence—we might expect the juror's beliefs to only be shifted weakly in support of guilt. Instead, the weak evidence effect describes a situation where the prosecutor's argument actually leads to a shift in the opposite direction: the juror may now believe that the defendant is more likely to be innocent.
Importantly, social reasoning mechanisms are not necessarily in conflict with previously proposed mechanisms for the weak evidence effect, such as algorithmic biases in generating alternative hypotheses (Dasgupta et al., 2017; Fernbach et al., 2011), causal reasoning about other non-social attributes of the situation (Bhui & Gershman, 2020), or sequential belief updating (McKenzie et al., 2002; Trueblood & Busemeyer, 2011). Both social and asocial models are able to account for the basic effect. To find unique predictions that distinguish models with a social component, then, we argue that we must shift focus from the existence of the effect to asking under what conditions it emerges. Social mechanisms lead to unique predictions about these conditions that purely asocial models cannot generate. In particular, if evidence comes from an intentional agent who is expected to present the strongest possible argument in favor of their case, then weak evidence would imply the absence of stronger evidence (Grice, 1975); otherwise weak evidence may be taken more at face value. Thus, a pragmatic account predicts a systematic relationship between a listener's social expectations and the strength of the weak evidence effect: weak evidence should only backfire when the information source is expected to provide the strongest evidence available to them.

In this paper, we proceed by first extending recent rational models of communication to equip speakers with persuasive goals (rather than purely informative ones) and present a series of simulations deriving key predictions from our model. We then introduce a simple behavioral paradigm, the Stick Contest, which allows us to elicit a participant's social expectations about the speaker alongside their inferences as listeners.
Based on the speaker expectations, we find that participants cluster into sub-populations of pragmatic listeners or literal listeners, who expect speakers to provide strongly persuasive evidence or informative but neutral evidence, respectively. As predicted by the pragmatic account, only the first group of participants, who expected speakers to provide persuasive evidence, reliably displayed a weak evidence effect in their belief updates. Finally, we use these data to quantitatively compare our model against prior asocial accounts and find that a pragmatic model accounting for these heterogeneous groups is most consistent with the empirical data. Taken together, we suggest that pragmatic reasoning mechanisms are central to explaining belief updating when evidence is presented in social contexts.

Harris et al. (2013) presents a related model of the faint praise effect, where the omission of any stronger information that a speaker would be expected to know implies that it is more likely to be negative than positive (e.g., "James has very good handwriting."). Importantly, this effect is sensitive to the perceived expertise of the source; no such implication follows for unknowledgeable informants (see also Bonawitz et al., 2011; Gweon et al., 2014; Hsu et al., 2017, for related inferences from omission).

FORMALIZING A PRAGMATIC ACCOUNT OF THE WEAK EVIDENCE EFFECT

To derive precise behavioral predictions, we begin by formalizing the pragmatics of persuasion in a computational model. Specifically, we draw upon recent progress in the Rational Speech Act (RSA) framework (Franke & Jäger, 2016; Goodman & Frank, 2016; Scontras et al., 2018).
This framework instantiates a theory of recursive social inference, whereby listeners do not naively update their beliefs to reflect the information they hear, but explicitly account for the fact that speakers are intentional agents choosing which information to provide (Grice, 1975).

Reasoning about Evidence from Informative Speakers

We begin by defining a pragmatic listener L who is attempting to update their beliefs about the underlying state of the world w (e.g., the guilt or innocence of the defendant), after hearing an utterance u (e.g., an argument provided by the prosecution). According to Bayes' rule, the listener's posterior beliefs about the world P_L(w|u) may be derived as follows:

P_L(w|u) ∝ P_S(u|w) P(w)   (1)

where P(w) is the listener's prior beliefs about the world and the likelihood P_S(u|w) is derived by imagining what a hypothetical speaker agent would choose to say in different circumstances. This term yields different predictions given different assumptions about the speaker, captured by different speaker utility functions U. In existing RSA models, the speaker is usually assumed to be epistemically informative, choosing utterances that bring the listener's beliefs as close as possible to the true state of the world, as measured by information-theoretic surprisal:

P_S(u|w) ∝ exp{α U_epi(u, w)}   (2)
U_epi(u, w) = ln P_L(w|u)

where the free parameter α ∈ [0, ∞] controls the temperature of the soft-max function and U_epi denotes the utility function of an (epistemically) informative speaker. As α → ∞, the speaker increasingly chooses the single utterance with the highest utility, and as α → 0 the speaker becomes indifferent among utterances.
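The soft-max choice rule in Equation 2 can be illustrated in isolation. The following Python sketch uses made-up utilities for three candidate utterances (the paper's implementation was in WebPPL):

```python
import math

def softmax_choice(utilities, alpha):
    """P(u) proportional to exp(alpha * U(u)): Equation 2's choice rule."""
    weights = [math.exp(alpha * u) for u in utilities]
    z = sum(weights)
    return [w / z for w in weights]

utils = [0.0, -0.7, -1.6]  # strongest, medium, weakest utterance (toy values)

# alpha -> 0: the speaker is indifferent among utterances
print([round(p, 2) for p in softmax_choice(utils, 0.0)])  # [0.33, 0.33, 0.33]

# larger alpha: probability concentrates on the highest-utility utterance
print([round(p, 2) for p in softmax_choice(utils, 4.0)])
```

With α = 4, nearly all probability mass falls on the strongest utterance, matching the limiting behavior described above.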
If this hypothetical speaker, in turn, aimed to be informative to the same listener defined in Equation 1, it would yield an infinite recursion: the RSA framework instead assumes that the recursion is grounded in a base case known as the "literal" listener, L_0, who takes evidence at face value:

    P_L0(w | u) ∝ δ_⟦u⟧(w) P(w)    (3)

Here, ⟦u⟧ gives the literal semantics of the utterance u, with δ_⟦u⟧(w) returning 1 if w is consistent with the state of affairs denoted by u, and 0 (or very small) otherwise.

Reasoning about Evidence from Motivated Speakers

The epistemic utility defined in Equation 2 aims only to produce assertions that most effectively lead to true beliefs. Often, however, speakers do not seek to neutrally inform, but to persuade in favor of a particular outcome or "hidden agenda." What is needed to represent such persuasive goals in the RSA framework? We begin by assuming that motivated speakers have a particular goal state w* that they aim to induce in the listener, where w* does not necessarily coincide with the true state of affairs w. This naturally yields a persuasive utility U_pers that aims to persuade the listener to adopt the intended beliefs w*:

    U_pers(u, w*) = ln P_L(w* | u)    (4)

where we say an utterance u is strictly more persuasive than u′ if and only if U_pers(u | w*) > U_pers(u′ | w*) (i.e., when the utterance results in the listener assigning higher probability to the desired state w*).
Following prior extensions of the speaker utility to other non-epistemic goals (e.g., Bohn et al., 2021; Yoon et al., 2018, 2020), we then define a combined utility assuming the speaker aims to jointly fulfill persuasive aims (Equation 4) while remaining consistent with the true world state w (Equation 2):

    P_S(u | w, w*) ∝ exp{α U(u, w, w*)}    (5)
    U(u, w, w*) = U_epi(u, w) + β U_pers(u, w*)    (6)

where β is a parameter controlling the strength of the persuasive goal (we recover the standard epistemic RSA model when β = 0). This motivated speaker forms the foundation for a pragmatic model of the weak evidence effect. A pragmatic listener L_1 who suspects that the utterance was generated by a motivated speaker with non-zero bias β is able to be "skeptical" of the speaker's agenda and discount their evidence accordingly:

    P_L(w | u, w*, β) ∝ P_S(u | w, w*, β) P(w)    (7)

To see why this model allows evidence to backfire, note that the probabilities of different utterances are in competition with one another under the speaker model. In the case that w and w* coincide, the speaker is expected to choose an utterance that is strongly supportive of that state; weaker utterances have a lower probability of being chosen. Conversely, if w* deviates from the true state of affairs, stronger utterances in favor of w* will be dispreferred (because they will be false and violate the epistemic term), hence weaker utterances are more likely. In this way, the absence of strong evidence from a speaker who would be highly motivated to show it statistically implies that no such evidence exists.

EXPERIMENT: THE STICK CONTEST

Empirical studies of the weak evidence effect require a cover story to elicit belief judgments and manipulate the strength of evidence.
Typically, this cover story is based on a real-world scenario such as a jury trial (McKenzie et al., 2002) or public policy debate (Fernbach et al., 2011), where participants are asked to report their belief in a hypothetical state such as the defendant's guilt or the effectiveness of the policy intervention. While these cover stories are naturalistic, they also introduce several complications for evaluating models of belief updating: participants may bring in different baseline expectations based on world knowledge, and the absolute scalar argument strength of verbal statements is often unclear.

To address these concerns, we introduce a simple behavioral paradigm called the Stick Contest (see Figure 1). This game is inspired by a courtroom scenario: two contestants take turns presenting competing evidence to a judge, who must ultimately issue a verdict. Here, however, the verdict concerns the average length of N = 5 sticks, which range from a minimum length of 1″ to a maximum length of 9″. These sticks are hidden from the judge but visible to both contestants, who are each given an opportunity to reveal exactly one stick as evidence for their case. As in a courtroom, each contestant has a clear agenda that is known to the judge: one contestant is rewarded if the judge determines that the average length of the sticks is longer than the midpoint of 5″ (shown as a dotted line in Figure 1), and the other is rewarded if the judge determines that the average length of the sticks is shorter than the midpoint.

(Footnote: Coincident with our work, Vignero (2022) has proposed a similar formulation to explain how speakers may stretch the truth of epistemic modals like "possibly" or "probably.")

(Footnote: Although we formulate the listener's posterior as being conditioned on a known value of β, we can also consider the case in which the listener has a prior distribution over biases and can compute (marginal) posteriors accordingly; refer to Appendix E for details.)

Figure 1. In the Stick Contest paradigm, participants are asked to determine whether a set of five hidden sticks is longer or shorter, on average, than a midpoint (dotted line) based on limited evidence from a pair of contestants. In the speaker expectation phase (left), participants were asked which one of the five sticks a given contestant would be most likely to show. In the listener judgment phase (right), participants were presented with a sequence of sticks from each contestant and asked to judge the likelihood that the overall sample is "longer."

This paradigm has several advantages for comparing models of the weak evidence effect. First, unlike verbal statements of evidence, the scale of evidence strength is made explicit and provided as common knowledge to the judge and contestants. The strength of a given piece of evidence is directly proportional to the length of the revealed stick, and these lengths are bounded between the minimum and maximum values. Second, while previous paradigms have operationalized the weak evidence effect in terms of a sequence of belief updates across multiple pieces of evidence (e.g., where the first piece of evidence sets a baseline for the second piece of evidence), common knowledge about the scale allows the weak evidence effect to emerge from a single piece of evidence. This property helps to disentangle the core mechanisms driving the weak evidence effect from those driving order effects (e.g., Trueblood & Busemeyer, 2011).

Participants

We recruited 804 participants from the Prolific crowd-sourcing platform, 723 of whom successfully completed the task and passed attention checks (see Appendix A).
The task took approximately 5 to 7 minutes, and each participant was paid $1.40, for an average hourly rate of $14. We restricted recruitment to the USA, UK, and Canada and balanced recruitment evenly between male and female participants. Participants were not allowed to complete the task on mobile or to complete the experiment more than once.

Design and Procedure

The experiment proceeded in two phases: first, a speaker expectation phase, and second, a listener judgment phase (see Figure 1). In the speaker expectation phase, we placed participants in the role of the contestants, gave them an example set of sticks {2, 4, 7, 8, 9}, and asked them which ones they believed each contestant would choose to show, in order of priority. In the listener judgment phase, we placed participants in the role of the judge and presented them with a sequence of observations. After each observation, they used a slider to indicate their belief about the verdict on a scale ranging from 0 ("average is definitely shorter than five inches") to 100 ("average is definitely longer than five inches"). It was stated explicitly that the judge knows that there are exactly five sticks, and that each contestant's incentives are public knowledge. After each phase, we asked participants to explain their response in a free-response box (see Tables S2–S3 for sample responses).

This within-participant design allowed us to examine individual co-variation between the strength of a participant's weak evidence effect in the listener judgment phase and their beliefs about the evidence generation process in the speaker expectation phase. Critically, while the set of candidate sticks in the speaker expectation phase was held constant across all participants for consistency, the strength of evidence we presented in the listener judgment phase was manipulated in a between-subjects design.
The length of the first piece of evidence was chosen from the set {6, 7, 8, 9} when the long-biased contestant went first, and from the set {4, 3, 2, 1} when the short-biased contestant went first, for a total of 4 possible "strength" conditions (measured as the distance of the observation from the midpoint; we assigned more participants to the more theoretically important "weak evidence" condition, i.e., {4, 6}, to obtain a higher-powered estimate). The order of contestants was counterbalanced across participants and held constant across the speaker and listener phases. Although it was not the focus of the current study, we also presented a second piece of evidence from the other contestant to capture potential order effects (see Appendix B for preliminary analyses).

(Footnote: An earlier iteration of our experiment only used a long-biased speaker; we report results from this version in Appendix D.)

RESULTS

Behavioral Results

Before quantitatively evaluating our model, we first examine its key qualitative predictions. Do participants exhibit a weak evidence effect in their listener judgments at all, and if so, to what extent is variation in the strength of the effect related to their expectations about the speaker? We focus on each participant's first judgment, provided after the first piece of evidence in the listener phase. This judgment provides the clearest view of the weak evidence effect, as subsequent judgments may be complicated by order effects. We constructed a linear regression model predicting participants' continuous slider responses. We included fixed effects of evidence strength as well as expectations from the speaker phase (coded as a categorical variable: expecting strongest evidence vs. expecting weaker evidence), and their interaction, along with a fixed effect of whether the first contestant was "short"-biased or "long"-biased. Because the design was fully between-participant (i.e., each participant only provided a single slider response as judge), no random effects were supported.

As predicted, we found a significant interaction between speaker expectations and evidence strength, t(718) = 5.2, p < 0.001; see Figure 2. For participants who expected the speaker to provide the strongest evidence (485 participants, or 67% of the sample), weak evidence in favor of the persuasive goal backfired and actually pushed beliefs in the opposite direction, m = 34.7, 95% CI: [32.3, 37.3], p < 0.001. Meanwhile, for participants who expected speakers to "hedge" and not necessarily show the strongest evidence first (238 participants, or 33% of the sample), no weak evidence effect was found (m = 50.1, group difference = −15.4, post-hoc t(367) = −6.3, p < 0.001). We found only a marginally significant asymmetry in slider bias, p = 0.056, with short-biased participants giving slightly larger endorsements (m = 1.6 slider points) across the board.

Figure 2. Individual differences in the weak evidence effect are predicted by pragmatic expectations. Dotted line represents neutral or unchanged beliefs. Error bars are bootstrapped 95% CIs (see Figure S3 for raw distributions).

Model Simulations

The qualitative effect observed in the previous section is consistent with our pragmatic account: weak evidence only backfired for participants who expected speakers to provide the strongest evidence available. In this section we conduct a series of simulations to explicitly examine the conditions under which this effect emerges from our model of recursive social reasoning between a speaker (who selects the evidence) and a listener (who updates their beliefs in light of the evidence). Our task is naturally formalized by defining the possible utterances u ∈ U as the possible lengths of individual sticks the speaker must choose between, the world state w as the true set of sticks, and the persuasive goals w* ∈ {longer, shorter} as a binary proposition corresponding to each speaker's incentive. Because the speaker only has access to true utterances, all utterances have equal epistemic utility (i.e., the speaker must show one of the five actual sticks, which has the epistemic effect of reducing uncertainty about the identity of exactly one stick). Hence, the combined utility (Equation 6) simplifies to the following:

    S(u | w, w*, β) ∝ exp{αβ ln L_0(w* | u)}    (8)

and the persuasive utility of an utterance is monotonic in the stick length (see Appendix C for complete proofs). Note that when β = 0, the pragmatic listener L_1 expects the speaker preferences to be uniform over true evidence, S(u | w, w*, β = 0) = Unif(u), thus reducing to the literal listener L_0. When β → ∞, the pragmatic listener expects the speaker to maximize utility and choose the single strongest piece of evidence.

In our simulations, we present the listener models with different pieces of evidence u ∈ {5, 6, 7, 8, 9, 10} and manipulate β, which represents the degree to which the pragmatic listener L_1 expects the speaker S to be motivated to show data that prefers the target goal state w* = longer (the case for shorter is analogous). We operationalize the size of the weak evidence effect as the decrease in belief for a proposition given positive evidence supporting that proposition. For example, if observing a stick length of 6″ decreased the listener's beliefs that the sample was longer than 5″ from a prior belief of P(longer) = 0.5 to a posterior belief of P(longer | u = 6) = 0.4, then we say the size of the effect is 0.5 − 0.4 = 0.1. First, we observe that when β = 0 (Figure 3A, left-most column), no weak evidence effect is observed: the listener interprets the evidence literally. However, as the perceived bias of the speaker increases, we observe a weak evidence effect emerge for shorter sticks.
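To make this concrete, here is a minimal, self-contained Python sketch of the simulation (the variable names are ours, not the paper's; we assume a uniform prior over multisets of five sticks and fix α = 1, as in the model fits):

```python
import itertools
import math
from functools import lru_cache

LENGTHS = range(1, 10)   # possible stick lengths, 1" to 9"
N = 5                    # sticks in the hidden set
MID = 5                  # decision midpoint

# All possible hidden sets, as multisets (our simplifying prior: uniform).
WORLDS = list(itertools.combinations_with_replacement(LENGTHS, N))

def longer(w):
    """True if the average stick length exceeds the midpoint."""
    return sum(w) / N > MID

@lru_cache(maxsize=None)
def literal_belief(u):
    """Literal listener L0: P(longer | stick u), taking u at face value."""
    consistent = [w for w in WORLDS if u in w]
    return sum(map(longer, consistent)) / len(consistent)

def speaker(u, w, beta):
    """Motivated speaker (Equation 8): shows a true stick from w,
    soft-maximizing L0's belief in the goal state 'longer'."""
    if u not in w:
        return 0.0
    def score(x):
        return math.exp(beta * math.log(literal_belief(x)))
    total = sum(score(x) for x in w)  # sums over stick tokens, not types
    return sum(score(x) for x in w if x == u) / total

def pragmatic_belief(u, beta):
    """Pragmatic listener L1: P(longer | u), knowing the speaker's bias."""
    likelihood = {w: speaker(u, w, beta) for w in WORLDS}
    total = sum(likelihood.values())
    return sum(p for w, p in likelihood.items() if longer(w)) / total
```

With β = 0 the pragmatic listener takes the evidence roughly at face value, while for a strongly biased speaker a weak 6″ stick shown by the long-biased contestant pushes the belief in "longer" below 0.5, reproducing the weak evidence effect.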
When the perceived bias grows large (e.g., β = 100, right-most column), the weak evidence effect is found over a broad range of evidence: if the listener expects the speaker to show the single strongest piece of evidence available, then even a stick of length 8″ rules out the existence of any stronger evidence, shifting the possible range of sticks in the sample. To further understand this effect, we computed the beliefs of literal (L_0) and pragmatic (L_1) listener models as a function of the evidence they have been shown (Figure 3B). While the literal listener predicts a near-linear shift in beliefs as a function of positive or negative evidence, the pragmatic listener yields a sharper S-shaped curve reflecting more skeptical belief updating.

Figure 3. Model simulations. (A) Our pragmatic listener model predicts a weak evidence effect for a broader range of evidence strengths at higher perceived speaker bias β. The color scale represents the extent to which the listener's posterior beliefs decrease in light of positive evidence, where the black region represents conditions under which no weak evidence effect is predicted. (B) Posterior beliefs of literal and pragmatic listener models as a function of evidence from the long-biased speaker. Horizontal line represents prior beliefs. Error bars are given by 10-fold cross-validation across parameter fits on different subsets of our behavioral data, with average β = 2.03 and response offset o = −0.13 (translating the curve down).

Quantitative Model Comparison

Our behavioral results suggest an important role for speaker expectations in explanations of the weak evidence effect, and our simulations reveal how a pragmatic listener model derives this effect from different expectations about speaker bias. In this section, we compare our model against alternative accounts by fitting them to our empirical data (see Appendix E for details).

(Footnote: For related tasks studying outright lying, see Franke et al. (2020), Oey et al. (2019), Oey and Vul (2021), and Ransom et al. (2017). For a more comprehensive and multidisciplinary overview of varieties of deception and misleading, see Meibauer (2019) and Saul (2012).)

Fitting the RSA model to behavioral data.

We considered several variants of the RSA model, which handled the relationship between the speaker and listener phases in different ways. The simplest variant, which we call the homogeneous model, assumes the entire population of participants is explained by a pragmatic model (z = L_1) with an unknown bias. It is homogeneous because the same model is assumed to be shared across the whole population. The second variant, which we call the heterogeneous model, is a mixture model where we predicted each participant's response as a convex combination of the L_0 and L_1 models with mixture weight p_z (i.e., marginalizing out latent assignments z_i). In the third variant, which we call the speaker-dependent model, we explicitly fit different mixture weights depending on the participant's response in the speaker expectations phase. Rather than learning a single mixture weight for the entire population, this variant learns independent mixture weights for different sub-groups z_j, defined by the different sticks j that participants chose in the speaker phase.

(Footnote: Because the product α · β is non-zero only if the persuasion weight β is non-zero, these two parameters are redundant in our task. We thus treat their product as a single free parameter, effectively fixing α = 1. It is possible that a near-zero α (e.g., low effort from participants) may make it difficult to empirically detect a non-zero β term in our model comparison below, but this would work against our hypothesis.)
This model asks whether conditioning on speaker data allows the model to make sufficiently better predictions about the listener data.

Fitting anchor-and-adjust models to empirical data.

The most prominent family of asocial models accounting for the weak evidence effect are anchor-and-adjust (AA) models. In these models, individuals compare the strength of new evidence u against a reference point R and adjust their beliefs P(w | u) up or down accordingly:

    P(w | u) = P(w) + η (s(u) − R)    (9)

where s(u) is the strength of the evidence, and η is an adjustment weight. In the simplest variant (Hogarth & Einhorn, 1992), the reference point and scaling are fixed to a neutral baseline, η = P(w) = 1 − P(w) = 0.5 and R = 0. In a more complex variant, beliefs are not updated from a neutral baseline but instead relative to a more stringent level known as the argument's "minimum acceptable strength" (MAS; McKenzie et al., 2002), which is treated as a free parameter: R ~ Unif[−1, 1]. In this case, positive evidence that falls short of R may nonetheless be treated as negative evidence and decrease the listener's beliefs. Although the anchor is typically taken to be a specific earlier observation, it may be interpreted in the single-observation case as the participant's implicit or imagined expectations from the task instructions and cover story. Prior work using anchor-and-adjust models would not predict a relationship between behavior in the speaker phase and in the listener phase. We thus evaluated a homogeneous AA model, a homogeneous MAS model, and a heterogeneous mixture model predicting responses as a convex combination of the two.

Table 1. Results of the model comparison, including the likelihood achieved by the best-fitting model as well as the WAIC and PSIS-LOO (± standard error), which penalize for model complexity.

    Model  Variant            Likelihood   WAIC          PSIS-LOO
    A&A    Homogeneous        −28.1         57.7 ± 9.9    28.8 ± 9.9
    MAS    Homogeneous          8.2        −13.3 ± 9.6    −6.6 ± 9.6
    MAS    Heterogeneous        8.2        −11.3 ± 9.5    −5.6 ± 9.5
    RSA    Homogeneous          8.1        −13.3 ± 9.5    −6.7 ± 9.5
    RSA    Heterogeneous        8.1        −10.5 ± 9.3    −5.2 ± 9.3
    RSA    Speaker-dependent   12.0        −16.4 ± 9.1    −9.2 ± 9.1

Comparison results.

We examined several metrics to assess the relative performance of these models. First, as an absolute goodness-of-fit measure, we found the parameters that maximized the model likelihood (see Table 1). As a Bayesian alternative, which penalizes models for added complexity, we also considered a measure using the full posterior, the Watanabe-Akaike (or Widely Applicable) Information Criterion (Gelman et al., 2013; Watanabe, 2013). The WAIC penalizes model flexibility in a way that asymptotically equates to Bayesian leave-one-out (LOO) cross-validation (Acerbi et al., 2018; Gelman et al., 2013), which we also include in the form of the PSIS-LOO measure (PSIS stands for Pareto Smoothed Importance Sampling, a method for stabilizing estimates; Vehtari et al., 2017). These comparison criteria (Table 1) suggest that the added complexity of the speaker-dependent RSA model is justified: it outperforms all asocial variants. For this speaker-dependent model, we found a maximum a posteriori (MAP) estimate of β = 2.26, providing strong support for a non-zero persuasive bias term. We found that the pragmatic L_1 model best explained the judgments of participants who expected the strongest evidence to be shown during the speaker phase (mixture weight p̂ = 0.99), while the literal L_0 model best explained the judgments of participants who expected weaker sticks to be shown (mixture weight p̂ = 0.1). Full parameter posteriors are shown in Figure S5.

DISCUSSION

Evidence is not a direct reflection of the world: it comes from somewhere, often from other people.
Yet appropriately accounting for social sources of information has posed a challenge for models of belief-updating, even as increasing attention has been given to the role of pragmatic reasoning in classic phenomena. In this paper, we formalized a pragmatic account of the weak evidence effect via a model of recursive social reasoning, where weaker evidence may backfire when the speaker is expected to have a persuasive agenda. This model critically predicts that individual differences in the weak evidence effect should be related to individual differences in how the speaker is expected to select evidence. We evaluated this qualitative prediction using a novel behavioral paradigm, the Stick Contest, and demonstrated through simulations and quantitative model comparisons that our model uniquely captures this source of variance in judgments.

(Footnote: All models were implemented in WebPPL (Goodman & Stuhlmüller, 2014); code for reproducing these analyses is available at https://github.com/s-a-barnett/bayesian-persuasion. We drew 1,000 samples from the posterior via MCMC across four chains, with a burn-in of 7,500 steps and a lag of 100 steps between samples.)

Several avenues remain important for future work. First, while we focused on the initial judgment as the purest manifestation of the weak evidence effect, subsequent judgments are consistent with the order effects that have been the central focus of previous accounts (see Appendix B; Anderson, 1981; Davis, 1984; Trueblood & Busemeyer, 2011). Thus, we view our model of social reasoning as capturing an orthogonal aspect of the phenomenon, and further work should explicitly integrate computational-level principles of social reasoning with process-level mechanisms of sequential belief updating.
Second, our model provides a foundation for accounting for related message involvement effects (e.g., emotion, attractiveness of source), presentation effects (e.g., numerical vs. verbal descriptions), and social affiliation effects (i.e., whether the source is in-group) that have been examined in real-world settings of persuasion (e.g., Bohner et al., 2002; Cialdini, 1993; DeBono & Harnish, 1988; Falk & Scholz, 2018; Martire et al., 2014; Park et al., 2007). These settings also involve uncertainty about the scale of possible argument strength, unlike the clearly defined interval of lengths in our paradigm. Third, while the weak evidence effect emerges after a single level of social recursion, it is natural to ask what happens at higher levels: what about a more sophisticated speaker who is aware that weak evidence may lead to such inferences? Our paradigm explicitly informed participants of the speaker bias, but uncertainty about the speaker's hidden agenda may give rise to a strong evidence effect (Perfors et al., 2018), where speakers are motivated to avoid the strongest arguments to appear more neutral (see Appendix E). Based on the self-explanations we elicited (Table S2), it is possible that some participants who expected less strong evidence were reasoning in this way. These individual differences are consistent with prior work reporting heterogeneity in levels of reasoning in other communicative tasks (e.g., Franke & Degen, 2016).

We used a within-participant individual differences design for simplicity and naturalism, but there are also limitations associated with this design choice. For example, it is possible that the group of participants who expected weaker evidence to be shown first could be systematically different from the other group in some way, such as differing levels of inattention or motivation, that explains their behavior on both speaker and listener trials.
We aimed to control for these factors in multiple ways, including strict attention checks (Appendix A) and self-explanations (Tables S2–S3), which suggest a thoughtful rationale for expecting weaker evidence. However, an alternative solution would be to explicitly manipulate social expectations about the speaker in the cover story (e.g., training participants on speakers that tend to show weaker or stronger evidence first). Such a design would license stronger causal inferences, but would also raise new concerns about exactly what is being manipulated. A second limitation of our design is that the speaker phase was always presented before the listener phase. It is already known that the order of these roles may affect participants' reasoning (e.g., Shafto et al., 2014; Sikos et al., 2021), but asocial accounts of the weak evidence effect would not predict any relationship between speaker and listener trials under either order. Hence, we chose the order we thought would minimize confusion about the task; it is not our goal to suggest that social reasoning is spontaneous or mandatory, and we expect that social-pragmatic factors may be more salient in some contexts than others (e.g., when evidence is presented verbally vs. numerically, as in Martire et al., 2014).

Probabilistic models have continually emphasized the importance of the data generating process, distinguishing between assumptions like weak sampling, strong sampling, and pedagogical sampling (Hsu & Griffiths, 2009; Shafto et al., 2014; Tenenbaum, 1999; Tenenbaum & Griffiths, 2001). Our work considers a fourth sampling assumption, rhetorical sampling, where the data are not necessarily generated in the service of pedagogy but rather in the service of persuasive rhetoric.
Critically, although we formalized this account in a recursive Bayesian reasoning framework, insights about rhetorical sampling are also compatible with other frameworks: for example, work in the anchor-and-adjust framework may use similar principles to derive a relationship between information sources and reference points. Such socially sensitive objectives may be particularly key in the context of developing artificial agents that are more closely aligned with human values (Carroll et al., 2019; Hilgard et al., 2021; Irving et al., 2018). As we navigate an information landscape increasingly filled with disinformation from adversarial sources, a heightened sense of skepticism may be rational after all.

ACKNOWLEDGMENTS

This work was supported by grant #62220 from the John Templeton Foundation to TG. RDH is funded by a C.V. Starr Postdoctoral Fellowship and NSF SPRF award #1911835. We are grateful for early contributions by Mark Ho and helpful conversations with other members of the Princeton Computational Cognitive Science Lab, as well as Ryan Adams and members of the Laboratory for Intelligent Probabilistic Systems.

REFERENCES

Acerbi, L., Dokka, K., Angelaki, D. E., & Ma, W. J. (2018). Bayesian comparison of explicit and implicit causal inference strategies in multisensory heading perception. PLOS Computational Biology, 14(7), e1006110. https://doi.org/10.1371/journal.pcbi.1006110, PubMed: 30052625

Anderson, N. H. (1981). Foundations of information integration theory. Academic Press.

Bagassi, M., & Macchi, L. (2006). Pragmatic approach to decision making under uncertainty: The case of the disjunction effect. Thinking & Reasoning, 12(3), 329–350. https://doi.org/10.1080/13546780500375663

Baker, C. L., Jara-Ettinger, J., Saxe, R., & Tenenbaum, J. B. (2017). Rational quantitative attribution of beliefs, desires and percepts in human mentalizing. Nature Human Behaviour, 1(4), 1–10. https://doi.org/10.1038/s41562-017-0064

Bhui, R., & Gershman, S. J. (2020). Paradoxical effects of persuasive messages. Decision, 7(4), 239–258. https://doi.org/10.1037/dec0000123

Bohn, M., Tessler, M. H., Merrick, M., & Frank, M. C. (2021). How young children integrate information sources to infer the meaning of words. Nature Human Behaviour, 5(8), 1046–1054. https://doi.org/10.1038/s41562-021-01145-1, PubMed: 34211148

Bohner, G., Ruder, M., & Erb, H.-P. (2002). When expertise backfires: Contrast and assimilation effects in persuasion. British Journal of Social Psychology, 41(4), 495–519. https://doi.org/10.1348/014466602321149858, PubMed: 12593750

Bonawitz, E., Shafto, P., Gweon, H., Goodman, N. D., Spelke, E., & Schulz, L. (2011). The double-edged sword of pedagogy: Instruction limits spontaneous exploration and discovery. Cognition, 120(3), 322–330. https://doi.org/10.1016/j.cognition.2010.10.001, PubMed: 21216395

Carroll, M., Shah, R., Ho, M. K., Griffiths, T., Seshia, S., Abbeel, P., & Dragan, A. (2019). On the utility of learning about humans for human-AI coordination. In Advances in Neural Information Processing Systems (pp. 5175–5186).

Cialdini, R. B. (1993). Influence: The psychology of persuasion. Morrow.

Dasgupta, I., Schulz, E., & Gershman, S. J. (2017). Where do hypotheses come from? Cognitive Psychology, 96, 1–25. https://doi.org/10.1016/j.cogpsych.2017.05.001, PubMed: 28586634

Davis, J. H. (1984). Order in the courtroom. Psychology and Law, 251–265.

DeBono, K. G., & Harnish, R. J. (1988). Source expertise, source attractiveness, and the processing of persuasive information: A functional approach. Journal of Personality and Social Psychology, 55(4), 541–546. https://doi.org/10.1037/0022-3514.55.4.541

Falk, E., & Scholz, C. (2018). Persuasion, influence, and value: Perspectives from communication and social neuroscience. Annual Review of Psychology, 69(1), 329–356. https://doi.org/10.1146/annurev-psych-122216-011821, PubMed: 28961060

Fernbach, P. M., Darlow, A., & Sloman, S. A. (2011). When good evidence goes bad: The weak evidence effect in judgment and decision-making. Cognition, 119(3), 459–467. https://doi.org/10.1016/j.cognition.2011.01.013, PubMed: 21345428

Franke, M., & Degen, J. (2016). Reasoning in reference games: Individual- vs. population-level probabilistic modeling. PLOS ONE, 11(5), e0154854. https://doi.org/10.1371/journal.pone.0154854, PubMed: 27149675

Franke, M., Dulcinati, G., & Pouscoulous, N. (2020). Strategies of deception: Under-informativity, uninformativity, and lies—Misleading with different kinds of implicature. Topics in Cognitive Science, 12(2), 583–607. https://doi.org/10.1111/tops.12456, PubMed: 31541530

Franke, M., & Jäger, G. (2016). Probabilistic pragmatics, or why Bayes' rule is probably important for pragmatics. Zeitschrift für Sprachwissenschaft, 35(1), 3–44. https://doi.org/10.1515/zfs-2016-0002

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis. CRC Press. https://doi.org/10.1201/b16018

Goodman, N. D., & Frank, M. C. (2016). Pragmatic language interpretation as probabilistic inference. Trends in Cognitive Sciences, 20(11), 818–829. https://doi.org/10.1016/j.tics.2016.08.005, PubMed: 27692852

Goodman, N. D., & Stuhlmüller, A. (2013). Knowledge and implicature: Modeling language understanding as social cognition. Topics in Cognitive Science, 5(1), 173–184. https://doi.org/10.1111/tops.12007, PubMed: 23335578

Goodman, N. D., & Stuhlmüller, A. (2014). The design and implementation of probabilistic programming languages. Retrieved 2020-1-7, from https://dippl.org.

Grice, H. P. (1975). Logic and conversation. In P. Cole & J. Morgan (Eds.), Syntax and semantics, speech acts (Vol. 3). Academic Press.

Gweon, H., Pelton, H., Konopka, J. A., & Schulz, L. E. (2014). Sins of omission: Children selectively explore when teachers are under-informative. Cognition, 132(3), 335–341. https://doi.org/10.1016/j.cognition.2014.04.013, PubMed: 24873737

Harris, A., Corner, A., & Hahn, U. (2013). James is polite and punctual (and useless): A Bayesian formalisation of faint praise. Thinking & Reasoning, 19(3), 414–429. https://doi.org/10.1080/13546783.2013.801367

Harris, P., Koenig, M. A., Corriveau, K. H., & Jaswal, V. K. (2018).

Lopes, L. L. (1987). Procedural debiasing. Acta Psychologica, 64(2), 167–185. https://doi.org/10.1016/0001-6918(87)90005-9

Ma, F., Zeng, D., Xu, F., Compton, B. J., & Heyman, G. D. (2020). Delay of gratification as reputation management. Psychological Science, 31(9), 1174–1182. https://doi.org/10.1177/0956797620939940, PubMed: 32840460

Martire, K. A., Kemp, R. I., Sayle, M., & Newell, B. R. (2014). On the interpretation of likelihood ratios in forensic science evidence: Presentation formats and the weak evidence effect. Forensic Science International, 240, 61–68. https://doi.org/10.1016/j.forsciint.2014.04.005, PubMed: 24814330

McKenzie, C. R. M., Lee, S. M., & Chen, K. K. (2002). When negative evidence increases confidence: Change in belief after hearing two sides of a dispute. Journal of Behavioral Decision Making, 15(1), 1–18. https://doi.org/10.1002/bdm.400

McKenzie, C. R. M., & Nelson, J. D. (2003). What a speaker's choice of frame reveals: Reference points, frame selection, and framing effects. Psychonomic Bulletin & Review, 10(3), 596–602. https://doi.org/10.3758/BF03196520, PubMed: 14620352

Meibauer, J. (2019). The Oxford handbook of lying. Oxford University Press. https://doi.org/10.1093/oxfordhb/9780198736578.001.0001

Mills, C. M., & Landrum, A. R. (2016). Learning who knows what: Children adjust their inquiry to gather information from others. Frontiers in Psychology, 7, 951. https://doi.org/10.3389/fpsyg.2016.00951, PubMed: 27445916

Mosconi, G., & Macchi, L. (2001). The role of pragmatic rules in the conjunction fallacy. Mind & Society, 2(1), 31–57. https://doi
.org/10.1007/BF02512074 Cognitive foundations of learning from testimony. Annual Review Oey, L. A., Schachner, A., & Vul, E. (2019). Designing good decep- of Psychology, 69,251–273. https://doi.org/10.1146/annurev tion: Recursive theory of mind in lying and lie detection. In Pro- -psych-122216-011710, PubMed: 28793811 ceedings of the 41st Annual Conference of the Cognitive Science Hawthorne-Madell, D., & Goodman, N. D. (2019). Reasoning Society (pp. 897–903). https://doi.org/10.31234/osf.io/5s4wc about social sources to learn from actions and outcomes. Deci- Oey, L. A., & Vul, E. (2021). Lies are crafted to the audience. In sion, 6(1), 17–60. https://doi.org/10.1037/dec0000088 Proceedings of the 43rd Annual Meeting of the Cognitive Science Henrich, J. (2015). The secret of our success: How culture is driving Society (pp. 791–797). human evolution, domesticating our species, and making us smarter. O’Keefe, D. J. (2015). Persuasion: Theory and research. Sage Princeton University Press. https://doi.org/10.2307/j.ctvc77f0d Publications. Hilgard, S., Rosenfeld, N., Banaji, M. R., Cao, J., & Parkes, D. Park, H. S., Levine, T. R., Westerman, C. Y. K., Orfgen, T., & Foregger, (2021). Learning representations by humans, for humans. In M. S. (2007). The effects of argument quality and involvement type Meila&T. Zhang(Eds.), Proceedings of the 38th International on attitude formation and attitude change: A test of dual-process Conference on Machine Learning (pp. 4227–4238). and social judgment predictions. Human Communication Hogarth,R.M., &Einhorn,H.J.(1992).Order effectsinbelief Research, 33(1), 81–102. https://doi.org/10.1111/j.1468-2958 updating: The belief-adjustment model. Cognitive Psychology, .2007.00290.x 24(1), 1–55. https://doi.org/10.1016/0010-0285(92)90002-J Perfors, A., Navarro, D., & Shafto, P. (2018). Stronger evidence isn’t Hovland, C. I., Janis, I. L., & Kelley, H. H. (1953). Communication always better: The role of social inference in evidence selection. 
and persuasion. Yale University Press. In Proceedings of the 40th Annual Conference of the Cognitive Hsu, A., & Griffiths, T. L. (2009). Differential use of implicit negative Science Society (pp. 864–869). evidence in generative and discriminative language learning. Petty, R. E. (2018). Attitudes and persuasion: Classic and contem- In Advances in Neural Information Processing Systems 22 porary approaches. Routledge. https://doi.org/10.4324 (pp. 754–762). /9780429502156 Hsu, A., Horng, A., Griffiths, T. L., & Chater, N. (2017). When Politzer, G., & Macchi, L. (2000). Reasoning and pragmatics. Mind absence of evidence is evidence of absence: Rational inferences & Society, 1(1), 73–93. https://doi.org/10.1007/BF02512230 from absent data. Cognitive Science, 41, 1155–1167. https://doi Poulin-Dubois, D., & Brosseau-Liard, P. (2016). The developmental .org/10.1111/cogs.12356, PubMed: 26946380 origins of selective social learning. Current Directions in Psycho- Irving, G., Christiano, P. F., & Amodei, D. (2018). AI safety via logical Science, 25(1), 60–64. https://doi.org/10.1177 debate. ArXiv, abs/1805.00899. https://doi.org/10.48550/arXiv /0963721415613962 .1805.00899 Ransom, K., Voorspoels, W., Perfors, A., & Navarro, D. (2017). A Jara-Ettinger, J., Gweon, H., Schulz, L. E., & Tenenbaum, J. B. cognitive analysis of deception without lying. In Proceedings of (2016). The näıve utility calculus: Computational principles the 39th Annual Conference of the Cognitive Science Society underlying commonsense psychology. Trends in Cognitive (pp. 992–997). Sciences, 20(8), 589–604. https://doi.org/10.1016/j.tics.2016.05 Saul, J. M. (2012). Lying, misleading, and what is said: An explora- .011, PubMed: 27388875 tion in philosophy of language and in ethics. Oxford University OPEN MIND: Discoveries in Cognitive Science 181 A Pragmatic Account of the Weak Evidence Effect Barnett et al. Press. https://doi.org/10.1093/acprof:oso/9780199603688.001 Trueblood, J. S., & Busemeyer, J. R. 
(2011). A quantum probability .0001 account of order effects in inference. Cognitive Science, 35(8), Scontras, G., Tessler, M. H., & Franke, M. (2018). Probabilistic lan- 1518–1552. https://doi.org/10.1111/j.1551-6709.2011.01197.x, guage understanding: An introduction to the rational speech act PubMed: 21951058 framework. Retrieved from https://problang.org, 2020-01-07. Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian Shafto, P., Goodman, N. D., & Griffiths, T. L. (2014). A rational model evaluation using leave-one-out cross-validation and account of pedagogical reasoning: Teaching by, and learning WAIC. Statistics and Computing, 27(5), 1413–1432. https://doi from, examples. Cognitive Psychology, 71,55–89. https://doi .org/10.1007/s11222-016-9696-4 .org/10.1016/j.cogpsych.2013.12.004, PubMed: 24607849 Vélez, N., & Gweon, H. (2019). Integrating incomplete information Sikos, L., Venhuizen, N. J., Drenhaus, H., & Crocker, M. W. (2021). with imperfect advice. Topics in Cognitive Science, 11(2), Speak before you listen: Pragmatic reasoning in multi-trial lan- 299–315. https://doi.org/10.1111/tops.12388,PubMed: 30414253 guage games. In Proceedings of the 43rd Annual Meeting of Vignero, L. (2022). Updating on biased probabilistic testimony. the Cognitive Science Society. Erkenntnis,1–24. https://doi.org/10.1007/s10670-022-00545-7 Sobel, D. M., & Kushnir, T. (2013). Knowledge matters: How chil- Watanabe, S. (2013). A widely applicable Bayesian information cri- dren evaluate the reliability of testimony as a process of rational terion. Journal of Machine Learning Research, 14(1), 867–897. inference. Psychological Review, 120(4), 779–797. https://doi Whalen, A., Griffiths, T. L., & Buchsbaum, D. (2017). Sensitivity to .org/10.1037/a0034191, PubMed: 24015954 shared information in social learning. Cognitive Science, 42(1), Sperber, D., Cara, F., & Girotto, V. (1995). Relevance theory 168–187. 
https://doi.org/10.1111/cogs.12485,PubMed: 28608488 explains the selection task. Cognition, 57(1), 31–95. https://doi Wood, L. A., Kendal, R. L., & Flynn, E. G. (2013). Whom do children .org/10.1016/0010-0277(95)00666-M, PubMed: 7587018 copy? Model-based biasesinsocial learning. Developmental Tenenbaum, J. B. (1999). Bayesian modeling of human concept Review, 33(4), 341–356. https://doi.org/10.1016/j.dr.2013.08.002 learning. In Advances in Neural Information Processing Systems Yoon, E. J., MacDonald, K., Asaba, M., Gweon, H., & Frank, M. C. (pp. 59–68). (2018). Balancing informational and social goals in active learning. Tenenbaum, J. B., & Griffiths, T. L. (2001). Generalization, similarity, In Proceedings of the 40th Annual Conference of the Cognitive and Bayesian inference. Behavioral and Brain Sciences, 24(4), Science Society (pp. 1218–1223). 629–640. https://doi.org/10.1017/S0140525X01000061, Yoon, E. J., Tessler, M. H., Goodman, N. D., & Frank, M. C. (2020). PubMed: 12048947 Polite speech emerges from competing social goals. Open Mind, Tomasello, M. (2009). The cultural origins of human cognition. 4,71–87. https://doi.org/10.1162/opmi_a_00035, PubMed: Harvard University Press. https://doi.org/10.2307/j.ctvjsf4jc 33225196 OPEN MIND: Discoveries in Cognitive Science 182

Journal: Open Mind, MIT Press. Published: Sep 28, 2022.
