Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Accurately measuring willingness to pay for consumer goods: a meta-analysis of the hypothetical bias

Accurately measuring willingness to pay for consumer goods: a meta-analysis of the hypothetical bias Consumers’ willingness to pay (WTP) is highly relevant to managers and academics, and the various direct and indirect methods used to measure it vary in their accuracy, defined as how closely the hypothetically measured WTP (HWTP) matches consumers’ real WTP (RWTP). The difference between HWTP and RWTP is the Bhypothetical bias.^ A prevalent assumption in marketing science is that indirect methods measure WTP more accurately than do direct methods. With a meta-analysis of 77 studies reported in 47 papers and resulting in 115 effect sizes, we test that assumption by assessing the hypothetical bias. The total sample consists of 24,347 included observations for HWTP and 20,656 for RWTP. Moving beyond extant meta-analyses in marketing, we introduce an effect size metric (i.e., response ratio) and a novel analysis method (i.e., multivariate mixed linear model) to analyze the stochastically dependent effect sizes. Our findings are relevant for academic researchers and managers. First, on average, the hypothetical bias is 21%, and this study provides a reference point for the expected magnitude of the hypothetical bias. Second, the deviation primarily depends on the use of a direct or indirect method for measuring HWTP. In contrast with conventional wisdom, indirect methods actually overestimate RWTP significantly stronger than direct methods. Third, the hypothetical bias is greater for higher valued products, specialty goods (cf. other product types), and within-subject designs (cf. between-subject designs), thus a stronger downward adjustment of HWTP values is necessary to reflect consumers’ RWTP. . . . . . . Keywords Willingness to pay Reservation price Pricing Conjoint analysis Measurement accuracy Hypothetical bias . . Meta-analysis Response ratio Stochastically dependent effect sizes Introduction willingness to pay (WTP) is Bthe cornerstone of marketing strategy^ that drives important marketing decisions. First, con- In a state-of-practice study of consumer value assessments, sumers’ WTP is the central input for price response models that Anderson et al. (1992, p. 3) point out that consumers’ inform optimal pricing and promotion decisions. Second, a new product’s introductory price must be carefully chosen, because a Mark Houston and John Hulland served as Special Issue Editors for this poorly considered introductory price can jeopardize the invest- article. ments in its development and threaten innovation failures Electronic supplementary material The online version of this article (Ingenbleek et al. 2013). Not only do companies need to know (https://doi.org/10.1007/s11747-019-00666-6) contains supplementary what consumers are willing to pay early in their product devel- material, which is available to authorized users. opment process, but WTP is also of interest to researchers in marketing and economics who seek to quantify concepts such as * Jonas Schmidt jo.schmidt@uni-muenster.de a product’s value (Steiner et al. 2016). Obtaining accurate mea- sures of consumers’ WTP thus is essential. Tammo H.A.Bijmolt Existing methods for measuring WTP can be assigned to a t.h.a.bijmolt@rug.nl 2 × 2 classification (Miller et al. 2011), according to whether they measure WTP in a hypothetical or real context, with Marketing Center Muenster, University of Muenster, Am direct or indirect measurement methods (see Table 1). First, Stadtgraben 13-15, 48143 Muenster, Germany a hypothetical measure of WTP (HWTP) does not impose any Department of Marketing, Faculty of Economics and Business, financial consequences for participants’ decisions. University of Groningen, Nettelbosje 2, 9747 AE Groningen, The Netherlands Participants just state what they would pay for a product, if 500 J. of the Acad. Mark. Sci. (2020) 48:499–518 Table 1 Classification of Type of measurement methods for measuring WTP Context Direct Indirect Hypothetical � Open questioning � Closed-ended � Conjoint analysis � Choice bracketing procedure Real � Vickrey auction � BDM lottery th � Random n price auction � Incentive-aligned conjoint analysis � English auction � eBay given the opportunity to buy it. In contrast, participants may clear summary of these findings is available, and considering be required to pay their stated WTP in a real context, which the discrepancy between theory and practice, Bthere is a lack provides a real measure of WTP (RWTP). This could for ex- of consensus on the ‘right’ way to measure […]consumer’s ample be in the context of an auction, where the winner in the reservation price^ (Wang et al. 2007, p. 200). Therefore, with end actually has to buy the product. The difference between this study we seek to shed new light on the relative accuracy of RWTP and HWTP is induced by the hypothetical context and alternative methods for measuring consumers’ WTP, and par- is called Bhypothetical bias.^ This hypothetical bias provides a ticularly the accuracy of direct versus indirect methods. We measure of the hypothetical method’s accuracy (Harrison and perform a meta-analysis of existing studies that measure Rutström 2008). In case HWTP is measured with two differ- HWTP and RWTP for the same product or service, which ent methods, the one with the lower hypothetical bias gives a reveals some empirical generalizations regarding accuracy. more accurate estimate of participants’ RWTP, increasing the We also acknowledge the potential influence of other factors estimate’s validity. We conceptualize the hypothetical bias as on the accuracy of WTP measures (Hofstetter et al. 2013; the ratio of HWTP to RWTP. A method yielding an exemplary Sichtmann et al. 2011), such that we anticipate substantial hypothetical bias of 1.5 shows that those participants overstate heterogeneity across extant studies. With a meta-regression, their RWTP for a product by 50% when asked hypothetically. we accordingly identify moderators that might explain this Second, direct methods ask consumers directly for their WTP, heterogeneity in WTP accuracy (Thompson and Sharp 1999; whereas indirect methods require consumers to evaluate, com- van Houwelingen et al. 2002). Our multivariate mixed linear pare, and choose among different product alternatives, and the model enables us to analyze the stochastically dependent ef- price attribute is just one of several characteristics. Then, WTP fect sizes (ESs) explicitly (Gleser and Olkin 2009; Kalaian can be derived from their responses. and Raudenbush 1996), which provides the most accurate Many researchers assume thatdirectmethods create a stronger way to deal with dependent ESs (van den Noortgate et al. hypotheticalbias,becausetheyevokegreaterpriceconsciousness 2013). As an effect size (ES) measure, we use the response (Völckner 2006). In their pricing textbook, Nagle and Müller ratio of HWTP and RWTP (Hedges et al. 1999), such that we (2018) allege that direct questioning Bshould never be accepted obtain the relative deviation of HWTP. To the best of our as a valid methodology. The results of such studies are at best knowledge, no previous meta-analysis in marketing has ap- useless and are potentially highly misleading^ (p. 186). Simon plied a mixed linear model nor a response ratio to measure (2018) takes a similar line, stating, BIt doesn’t make sense to ask ESs. consumers directly for the utility or their WTP, as they aren’table On average, the hypothetical bias is about 21%. In addition, to give a direct and precise estimate. The most important direct methods outperform indirect methods with regard to method to quantify utilities and WTP is the conjoint their accuracy. The meta-regression shows that, compared analysis^ (p. 53). Because indirect methods represent a with direct measurement methods, the hypothetical bias is shopping experience, they are expected to be more ac- considerably higher in indirect measures, by 10 percentage curate for measuring HWTP (Breidert et al. 2006;Leigh et al. 1984;Völckner 2006). Still, practitioners largely Three meta-analyses dealing with the hypothetical bias exist (Carson et al. continue to rely on direct survey methods, which tend to be 1996;List and Gallet 2001;Murphyet al. 2005). However, they focus on easier to implement (Anderson et al. 1992; Hofstetter et al. public goods and their results are of limited use for marketing. In contrast to the existing meta-analyses, we focus on private goods and include several 2013; Steiner and Hendus 2012). private good specific moderators of high interest for marketers. For a more Various studies specify the accuracy of one or more direct detailed discussion of the three existing meta-analyses, please refer to Web or indirect methods by comparing HWTP with RWTP. Yet no Appendix A. J. of the Acad. Mark. Sci. (2020) 48:499–518 501 points in a full model. This finding contradicts the prevailing Direct methods to measure WTP wisdom in academic studies but supports current prac- tices in companies. In addition to the type of measure- Direct measures usually include open questions, such as, ment, value of the product, product type, and type of BWhat is the maximum you would pay for this product?^ subject design have a significant influence on the hypo- Other methods use closed question formats (Völckner 2006) thetical bias. and require participants to state whether they would accept In the next section, we prove an overview of WTP certain prices or not. Still others combine closed and open and its different measurement options. After detailing questions. The choice bracketing procedure starts with several the data collection and coding, we explicate our pro- closed questions, each of which depends on the previous an- posed ES measure, which informs the analysis approach swer. If consumers do not accept the last price of the last we take to deal with stochastically dependent ESs. We closed question, they must answer an open question about present the results and affirm their robustness with multiple how much they would be willing to pay (Wertenbroch and methods. Finally, we conclude by highlighting our theoretical Skiera 2002). contributions, explaining the main managerial implications, In particular, the most widely used direct measures of and outlining some limitations and directions for further RWTP are the Vickrey auction (Vickrey 1961)and the research. Becker-DeGroot-Marschak lottery (BDM) (Becker et al. 1964). In a Vickrey auction, every participant hands in one sealed bid. The highest bidder wins the auction but pays only Willingness to pay the price of the second highest bid; accordingly, these auctions also are called second-price sealed bid auctions. By Definition and classification disentangling the bid and the potential price, no bidding strat- egy is superior to bidding actual WTP. Different adaptions of We take a standard economic view of WTP (or reservation these Vickrey auctions are available, such as the random nth price) and define it as the maximum price a consumer is will- price auction (Shogren et al. 2001), in which participants do ing to pay for a given quantity of a product or a service not know the quantity being sold in the auction upfront. In (Wertenbroch and Skiera 2002). At that price, the con- contrast, a BDM lottery does not require participants to com- sumer is indifferent to buying or not buying, because pete for the product. Instead, participants first state their WTP, WTP reflects the product’s inherent value in monetary and then a price is drawn randomly. If her or his stated WTP is terms. That is, the product and the money have the equal to or more than the drawn price, a participant must buy same value, so spending to obtain a product is the same as the product for the drawn price. If the stated WTP is less than keeping the money. the drawn price, she or he may not buy the product. Similar to the Vickrey auction, the stated WTP does not influence the Hypothetical versus real WTP drawn price and therefore does not determine the final price. Again then, the dominant strategy is to state actual WTP. The first dimension in Table 1 distinguishes between hypo- Not all direct measures of RWTP are theoretically incentive thetical and real contexts, according to whether the measure compatible. For example, in an English auction, the price in- includes a payment obligation or not. Most measures of creases until only one interested buyer is left, who eventually RWTP rely on incentive-compatible methods, which ensure buys the product for the highest announced bid. Every bidder it is the participant’s best option to reveal his or her true WTP. has an incentive to bid up WTP (Rutström 1998), so an Several different incentive-compatible methods are available English auction reveals all bidders’ WTP, except for the win- (Noussair et al. 2004) and have been used in prior empirical ner’s, who stops bidding after the last competitor leaves. studies to measure RWTP. However, all methods that measure Therefore, the English auction is not theoretically incentive RWTP require a finished, sellable version of the product. compatible, yet the mean RWTP obtained tend to be similar Therefore, practitioners regularly turn to HWTP during the to those resulting from incentive-compatible methods (Kagel product development process, before the final product actually et al. 1987). Therefore, we treat studies using an English auc- exists. In addition, measuring RWTP can be difficult and ex- tion as direct measures of RWTP. pensive, for both practitioners and researchers. Therefore, the Finally, the online auction platform eBay can provide accuracy of HWTP methods is of interest to practitioners and a direct measure of RWTP. Unlike a Vickrey auction, academics alike. Because RWTP reflects consumers’ actual the auction format implemented in eBay allows partici- valuation of a product, it provides a clear benchmark for com- pants to bid multiple times, and the auction has a fixed parison with HWTP. We integrate existing empirical evidence endpoint. Although multiple bids from one participant about the accuracy of various direct and indirect methods to imply that not every bid reveals true WTP, the highest and latest bid does provide this information (Ockenfels and Roth measure HWTP. 502 J. of the Acad. Mark. Sci. (2020) 48:499–518 2006). Theoretically then, eBay auctions are not incentive Hypotheses compatible either (Barrot et al. 2010), but the empirical results from eBay and Vickrey auctions are highly comparable We predict that several moderators may affect the hypothetical (Ariely et al. 2005; Bolton and Ockenfels 2014). Schlag bias. In addition, we control for several variables. The (2008) gauges RWTP from eBay by exclusively using the potential moderators constitute four main categories: (1) highest bid from each participant but disregarding the win- methods for measuring WTP, (2) research stimulus, (3) ners’ bid. We include this study in our meta-analysis as an general research design of the study, and (4) the publi- example of a direct method. cation in which the study appeared. The last category only contains control variables. Indirect methods to measure WTP Moderators: HWTP measurement Among the variety of indirect methods to compute WTP (Lusk and Schroeder 2004), the most prominent is choice- Direct methods for measuring HWTP have some theoretical based conjoint (CBC) analysis. Each participant chooses sev- drawbacks compared to indirect methods. First, asking con- eral times among multiple alternative products, including a sumers directly for their HWTP tends to prime them to focus Bno choice^ option that indicates the participant does not like on the price (Breidert et al. 2006), which is unlike a natural any of the offered products. Each product features several shopping experience in which consumers choose among sev- product attributes, and each attribute offers various levels. eral products that vary on multiple attributes. That is, direct To measure WTP, price must be one of the attributes. From methods may cause atypically high price consciousness the collected choices, it is possible to compute individual util- (Völckner 2006). Indirect methods address this drawback by ities for each presented attribute level and, by interpolation, forcing participants to weigh the costs and benefits of different each intermediate value. Ultimately, WTP can be derived ac- alternatives. Second, when asked directly, consumers might cording to the following relationship (Kohli and Mahajan try to answer strategically if they suspect their answers might 1991), which is the most often used approach in the studies influence future retail prices (Jedidi and Jagpal 2009). included in the meta-analysis: Because indirect methods do not prompt participants to state their HWTP directly, strategic answering may be less likely. u þ uðÞ p ≥u ; itj−p i Third, direct statements of HWTP are cognitively challenging, whereas methods that mimic realistic shopping experiences where u is the utility of product t excluding the utility of it∣− p require less cognitive effort (Brown et al. 1996). the price, and u (p) is the utility for a price level p for i Indirect methods for measuring HWTP also have some consumer i. In accordance with Miller et al. (2011)and drawbacks that might influence the hypothetical bias. First, Jedidi and Zhang (2002), we define u as the utility of researchers using a CBC must take care to avoid a number- the Bno choice^ option. The resulting WTP indicates the of-levels effect, especially in pricing studies (Eggers and highest price p that still fulfills the relationship. In their Sattler 2009). To do so, they generally can test only a few web appendix, Miller et al. (2011)provideanumerical different prices, which might decrease accuracy if the limita- example. tion excludes the HWTP of people with higher (lower) In principle, indirect methods provide measures of HWTP, WTP than the highest (lowest) price shown. Second, because the choices and other judgments expressed by the indirect methods assume a linear relationship between participants do not have any financial consequences. Efforts price levels, through their use of linear interpolation to measure RWTP indirectly attempt to insert a downstream (Jedidi and Zhang 2002). mechanism that introduces a binding element (Wlömert and Overall then, measuring HWTP with direct or indirect Eggers 2016). For example, Ding et al. (2005) propose to methods could evoke the hypothetical bias, and extant randomly choose one of the selected alternatives and make evidence is mixed (e.g. Miller et al. 2011), featuring that choice binding. Every choice could be the binding one, arguments for the superiority of both method types. so participants have an incentive to reveal their true Therefore, we formulate two competing hypotheses. preferences throughout the task. Ding (2007) also incorporates the idea of the BDM lottery, proposing that participants could H1a:MeasuringHWTPwithanindirect method leads to take part in a conjoint task, from which it is possible to infer a smaller hypothetical bias compared to direct their WTP for one specific product, according to the methods. person’s choices in the conjoint task. The inferred WTP then enters the BDM lottery subsequently, so par- H1b: Measuring HWTP with a direct method leads to a ticipants have an incentive to reveal their true preferences in smaller hypothetical bias compared to indirect the conjoint task. methods. J. of the Acad. Mark. Sci. (2020) 48:499–518 503 Moderators: research stimulus tend to result in stronger effects (Ariely et al. 2006). Fox and Tversky (1995) identify stronger effects for a within-subject When asked for their HWTP, personal budget constraints do versus between-subject design in the context of ambiguity not exert an effect, because the consumer does not actually aversion; Ariely et al. (2006) similarly find such stronger ef- have to pay any money. However, when measuring RWTP, fects for a within-subject design for a study comparing WTP budget constraints limit the amount that participants may con- and willingness to accept. According to Frederick and tribute (Brown et al. 2003). For low-priced products, this con- Fischhoff (1998), participants in a within-subject design ex- straint should have little influence on the hypothetical bias, press greater WTP differences for small versus large because the RWTP likely falls within this budget. For high- quantities of a product than do those in a between- priced products though, budget constraints likely become subject design. Therefore, more relevant; participants might state HWTP estimates that they could not afford in reality, thereby increasing the hypo- H5: The hypothetical bias is greater for within-subject designs thetical bias. Thus, we hypothesize: compared with between-subject designs. H2: The hypothetical bias is greater for products with a higher Another source of uncertainty pertains to product perfor- mance, and it increases when the consumer can only review value. images (e.g., online) rather than inspect the product itself A classic categorization of consumer goods cites conve- physically (Dimoka et al. 2012). Consequently, many con- nience, shopping, and specialty goods, depending on the sumers test products in a store to reduce their uncertainty amount of search and price comparison effort they require before buying them online (showrooming) (Gensler et al. (Copeland 1923). Consumers engage in more search effort 2017). Similarly, consumers’ uncertainty might be reduced when they have trouble assessing a product’s utility. in a WTP experiment by giving them an opportunity to Hofstetter et al. (2013) in turn show that the hypothetical bias inspect and test the product before bidding. Bushong et al. decreases as people gain means to assess a product’s utility, (2010) show that participants state a higher RWTP when real and in a parallel finding, Sichtmann et al. (2011)show that products, rather than images, have been displayed. As higher product involvement reduces the hypothetical bias. Hofstetter et al. (2013) note, greater uncertainty increases the That is, higher product involvement likely reduces the need hypothetical bias. We hypothesize: for intensive search effort. Therefore, we hypothesize: H6: Giving participants the opportunity to test a product before H3: The hypothetical bias is least for convenience goods, bidding reduces the hypothetical bias. greater for shopping goods, and greatest for specialty goods. Finally, researchers often motivate participation in an experiment by paying some remuneration or providing Consumers face uncertainty about an innovative prod- an initial balance to bid in an auction. Equipping par- uct’s performance and their preferences for it (Hoeffler ticipants with money might change their RWTP, because 2003). According to Sichtmann et al. (2011), stronger they gain an additional budget. They even might con- consumer preferences lower the hypothetical bias. In sider this additional budget like a coupon, which they contrast, greater uncertainty reduces their ability to as- add to their original RWTP. Consumers in general over- sess a product’s utility, which increases the hypothetical state their WTP in hypothetical contexts, so providing a bias (Hofstetter et al. 2013). Finally, Hofstetter et al. participation fee could decrease the hypothetical bias. (2013) show that the perceived innovativeness of a Yet Hensher (2010) criticizes the use of participation product increases the hypothetical bias. Consequently, fees, noting that they can bias participants’ RWTP. H4: The hypothetical bias is greater for innovations compared H7: Providing participants (a) a participation fee or (b) an to established products. initial balance decreases the hypothetical bias. Moderators: research design Collection and coding of studies The research design also might influence the hypothetical bias (List and Gallet 2001;Murphy et al. 2005). In particular, the Collection of studies subject design of an experiment determines the results, in the sense that between-subject designs tend to be more conserva- With our meta-analysis, we aim to generalize empirical find- tive (Charness et al. 2012), whereas within-subject designs ings about the relative accuracy of HWTP measures, so we 504 J. of the Acad. Mark. Sci. (2020) 48:499–518 conducted a search for studies that report ESs of these mea- by using Copeland’s(1923)classificationofconsumer goods sures. We used three inclusion criteria. First, the study had to according to the search and price comparison effort they re- measure consumers’ HWTP and RWTP for the same product quire, as convenience goods, shopping goods, or specialty or service, so that we could determine the hypothetical bias. goods. We use an ordinal scale for product type and therefore Second, the research stimulus had to be private goods or ser- assessed interrater reliability with a two-way mixed, consis- vices. Third, we included only studies that reported the mean tency-based, average-measure intraclass correlation coeffi- and standard deviation (or values that allow us to compute it) cient (ICC) (Hallgren 2012). The resulting ICC of 0.82 is rated of HWTP and RWTP or for which the authors provided these as excellent (Cicchetti 1994); the two independent coders values at our request. agreed on most stimuli. The lack of any substantial To identify relevant studies, we applied a keyword search measurement error indicates no notable influence on in different established online databases (e.g., Science Direct, the statistical power of the subsequent analyses EBSCO) and Google Scholar across all research disciplines (Hallgren 2012). Any inconsistent codes were resolved and years. The keywords included Bwillingness-to-pay,^ through discussion between the two coders. We include Breservation price,^ Bhypothetical bias,^ and Bconjoint product type in the analyses with two dummy variables analysis.^ We also conducted a manual search among leading for shopping and specialty goods, and convenience goods are marketing and economics journals. To reduce the risk of a captured by the intercept. publication bias, we extended our search to the Social In the third category, we consider moderators that deal with Science Research Network, Research Papers in Economics, the research design. The type of experiment HWTP and type of and the Researchgate network, and we checked for relevant experiment RWTP capture whether the studies measure dissertations whose results had not been published in journals. HWTP and RWTP in field or lab experiments, respectively. Moreover, we conducted a cross-reference search to find other Experiments conducted during a lecture or class are designat- studies. We contacted authors of studies that did not report all ed lab experiments. Offline/online HWTP and offline/online relevant values and asked them for any further relevant studies RWTP indicate whether the experiment is conducted online they might have conducted. Ultimately, we identified 77 stud- or offline; the type of subject design reveals if researchers used ies reported in 47 articles, accounting for 117 ESs and total a between- or within-subject design. The moderator opportu- sample sizes of 24,441 for HWTP and 20,766 for RWTP. nity to test indicates whether participants could inspect the product in more detail before bidding. Participation fee and Coding initial balance capture whether participants received money for showing up or for spending in the auction, respectively. As mentioned previously and as indicated by Table 2,we We identify a student sample when the sample consists of exclusively students; mixed samples are coded as not a student classify the moderators into four categories: (1) methods for measuring WTP, (2) research stimulus, (3) general research sample. Methods for measuring RWTP often are not self-ex- design of the study, and (4) the publication in which the study planatory, so researchers introduce them to participants, using appears. In the first category, the main moderator of interest is various types of instruction. We focused on whether incentive the type of measurement HWTP, that is, the direct versus indi- compatibility concepts or the dominant bidding strategy were rect measurement of HWTP. Two other moderators deal with explained, using a moderator introduction of method for RWTP measurement. Type of measurement RWTP similarly RWTP with four values. It equals Bnone^ if the method was distinguishes between direct and indirect measures, whereas not introduced, Bexplanation^ if the method and its character- incentive compatible reflects the incentive compatibility (or istics were explained, Btraining^ if mock auctions or questions not) of the method. designed to understand the mechanism occurred before the The second category of moderators, dealing with the re- focal auction took place or questions were asked, and Bnot search stimulus, includes value, or the mean RWTP for the mentioned^ if the study does not indicate whether the method corresponding product. The experiments in our meta-analysis was introduced. With this nominal scale, we include this mod- span different countries and years, so we converted all values erator by using three dummy variables for explanation, train- into U.S. dollars using the corresponding exchange rates. The ing, and not mentioned, while the none category is captured variable variance ES captures participants’ uncertainty and by the intercept. Finally, we include region. Almost all heterogeneity when evaluating a product. With regard to the the studies were conducted in North America or Europe; products, we checked whether they were described as new to we distinguish North America from Bother countries the consumer or innovations, which enabled us to code the (mostly Europe).^ innovation moderator. The moderator product/service distin- The fourth category of moderators contains publication guishes products and services. Finally, the product type mod- characteristics. We checked whether a study underwent a peer erator requires more subjective judgment. Two independent review process (peer reviewed), reflected a marketing or eco- coders, unaware of the research project, coded product type nomics research domain (discipline), how many citations it J. of the Acad. Mark. Sci. (2020) 48:499–518 505 Table 2 Moderators Category Moderator Values Variables Description WTP measurement Type of measurement HWTP Direct Dummy variable (indirect = 1) Whether HWTP is measured directly or indirectly. Indirect Type of measurement RWTP Direct Dummy variable (indirect = 1) Whether RWTP is measured directly or indirectly. Indirect Incentive compatible No Dummy variable (yes = 1) Whether the method for measuring RWTP Yes is incentive compatible. Research stimulus Value Metric variable The mean RWTP converted into US dollars. Product type Convenience goods Two dummy variables for shopping Classification of respective stimulus based Shopping goods and specialty goods; convenience on an Copeland (1923). goods are captured by the intercept Specialty goods Innovation No Dummy variable (yes = 1) Whether the stimulus is an innovation. Yes Product/service Product Dummy variable (service = 1) Whether the stimulus is a product or a service. Service Variance ES Metric variable The variance of the ES. Research design Type of subject design Between Dummy variable (within = 1) Whether it is a between or a within subject design. Within Opportunity to test No Dummy variable (yes = 1) Whether participants had the chance to test Yes the product before bidding. Participation fee No Dummy variable (yes = 1) Whether participants received a participation fee. Yes Initial balance No Dummy variable (yes = 1) Whether participants received an initial Yes balance for the auction. Type of experiment HWTP Field Dummy variable (lab = 1) Whether HWTP is measured in a field Lab or a lab experiment. Type of experiment RWTP Field Dummy variable (lab = 1) Whether RWTP is measured in a field Lab or a lab experiment. Offline/online HWTP Offline Dummy variable (online = 1) Whether HWTP is measured offline or online. Online Offline/online RWTP Offline Dummy variable (online = 1) Whether RWTP is measured offline or online. Online Student sample No Dummy variable (yes = 1) Whether the sample consists of students only. Yes Introduction of method for RWTP None Three dummy variables for explanation, How the method for measuring RWTP Explanation training, and not mentioned; None was introduced. is captured by the intercept Training Not mentioned Region Other Countries (mostly Europe) Dummy variable (North America = 1) Region where the experiment was conducted. North America 506 J. of the Acad. Mark. Sci. (2020) 48:499–518 had on Google Scholar (citations), and in which year it was published (year). Methodology Effect size To determine the hypothetical bias induced by different methods, we need an ES that represents the difference be- tween obtained values for HWTP and RWTP. When the dif- ferences stem from a comparison of a treatment and a control group, standardized mean differences (SMD) are appropriate measures (e.g. Abraham and Hamilton 2018; Scheibehenne et al. 2010). Specifically, to compute SMD, researchers divide the difference in the means of the treatment and the control group by the standard deviation, which helps to control for differences in the scales of the dependent variables in the experiments. Accordingly, it applies to studies that measure the same outcome on different scales (Borenstein et al. 2009, p. 25). In contrast, the ESs in our meta-analysis rely on the same scale; they differ in their position on the scale, because the products evoke different WTP values. In this case, the standard deviation depends on not only the scale range but also many other relevant factors, so the standard deviation should not be used to standardize the outcomes. In addition, as studies may have used alternate experimental designs, dif- ferent standard deviations could be used across studies, lead- ing to standardized mean differences that are not directly com- parable (Morris and DeShon 2002). Rather than the SMD, we therefore use a response ratio to assess ES, because it depends on the group means only. Specifically, the response ratio is the mean outcome in an experimental group divided by that in a corresponding control group, such that it quantifies the percentage of variation be- tween the experimental and control groups (Hedges et al. 1999). Unlike SMD, the response ratio applies when the out- come is measured on a ratio scale with a natural zero point, such as length or money (Borenstein et al. 2009). Accordingly, the response ratio often assesses ES in meta-analyses in ecol- ogy domains (Koricheva and Gurevitch 2014), for which many outcomes can be measured on ratio scales. To the best of our knowledge though, the response ratio has not been adopted in meta-analyses in marketing yet. However, it is common practice to specify a multiplicative, instead of a linear, model when assessing the effects of marketing instruments on product sales or other outcomes (Leeflang et al. 2015). Hence, it would be a natural option to use an effect measure representing proportion- ate changes, instead of additive changes, when deriving empirical generalizations on marketing subjects like re- sponse effects to mailing campaigns. For our effort, we define the response ratio as Table 2 (continued) Category Moderator Values Variables Description Publication characteristics Peer reviewed No Dummy variable (yes = 1) Whether the study was peer reviewed. Yes Discipline Economics Dummy variable (marketing = 1) Corresponding research discipline Marketing Citations Metric variable Number of citations in Google Scholar Year Metric variable Year the study was published Moderators in italics are control variables J. of the Acad. Mark. Sci. (2020) 48:499–518 507 HWTP avoid dependent ESs (Grewal et al. 2017). However, nested response ratio ¼ ; RWTP data structures and the associated dependent ESs are prominent in marketing research, so Bijmolt and Pieters where μ and μ are the means of a study’s HWTP RWTP (2001) suggest using a three-level model to account for de- corresponding HWTP and RWTP values. pendency, by adding error terms on all levels. In turn, market- For three reasons, we run statistical analyses using the nat- ing researchers started to model dependence stochastically by ural logarithm of the response ratio as the dependent variable. applying multi-level regression models (e.g. Abraham and First, the use of the natural logarithm linearizes the metric, so Hamilton 2018;Artset al. 2011;Babić Rosario et al. 2016; deviations in the numerator and denominator have the same Bijmolt et al. 2005;Edeling andFischer 2016; Edeling and impact (Hedges et al. 1999). Second, the parameters (β)for Himme 2018). However, when additional information about the moderating effects in the meta-regression are easy to in- correlations among the ESs are available, it is most accurate to terpret, as a multiplication factor, by taking the exponent of the model dependence explicitly by incorporating the dependen- estimate (Exp(β)). Most moderators are dummy variables, and cies in the covariance matrix at the within-study level (Gleser a change of the corresponding dummy value results in a and Olkin 2009). In contrast to modeling dependence stochas- change of (Exp(β) − 1) ∗ 100% in the hypothetical bias. tically, the covariances are not estimated but rather are calcu- However, this point should not be taken to mean that the lated on the basis of the provided information. To the best of difference of the hypothetical bias between two conditions our knowledge, this approach has not been applied by meta- of a moderator is Exp(β) − 1 percentage points, because that analyses in marketing previously. value depends on the values of other moderators. Third, the To model stochastic dependence among ESs explicitly, we distribution of the natural logarithm of response ratios is ap- follow Kalaian and Raudenbush (1996)and useamultivariate proximately normally distributed (Hedges et al. 1999). mixed linear model with two levels: a within-studies level and Consequently, we define ES as: a between-studies level. On the former, we estimate a com- plete vector of the corresponding K true ESs, α =(α , μ i 1i HWTP ES ¼ ln : … , α ) , for each study i. However, not every study exam- Ki RWTP ines all possible K ESs, so the vector of ES estimates for study i, ES ¼ðÞ ES ; …; ES ,contains L of the total possible K i 1i L i i Modeling stochastically dependent effect sizes ESs, and by definition, L ≤ K.That is, K equals the maximum explicitly number of dependent ESs in one study (i.e., six in our sample), and every vector ES contains between one and six estimates. Most meta-analyses assume the statistical independence of The first-level model regresses α on ES with an indicator ki i observed ESs, but this assumption only applies to limited variable Z , which equals 1 if ES estimates α and0other- lki li ki cases; often, ESs are stochastically dependent. Two main wise, according to the following linear model: types of dependencies arise between studies and ESs. First, ES ¼ ∑ α Z þ e ; studies can measure and compare several treatments or vari- li ki lki li k¼1 ants of a type of treatment against a common control. In our or in matrix notation, context, for example, a study might measure HWTP with different methods and compare the results to the same ES ¼ Z α þ e : i i i i RWTP, leading to multiple ESs that correlate because they The first-level errors e are assumed to be multivariate nor- share the same RWTP. Treating them as independent would erroneously add RWTP to the analysis twice. This type of mal in their distribution, such that e ~N(0, V ), where V is a i i i K × K covariance matrix for study i, or the multivariate ex- study is called a multiple-treatment study (Gleser and Olkin i i 2009). Second, studies can produce several dependent ESs by tension of the V-known model for the meta-regression. The elements of V must be calculated according to the obtaining more than one measure from each participant. For example, a study might measure HWTP and RWTP for sev- chosen ES measure (see Web Appendix B; Gleser and Olkin 2009; Lajeunesse 2011). In turn, they form the eral products from the same sample. The resulting ESs basis for modeling the dependent ESs appropriately. correlate, because they are based on a common subject. The vector α of a study’strueESisestimated by This scenario represents a multiple-endpoint study weighted least squares, and each observation is weight- (Gleser and Olkin 2009). ed by the inverse of the corresponding covariance ma- There are different approaches for dealing with stochasti- trix (Gleser and Olkin 2009). cally dependent ESs, such as ignoring or avoiding depen- The linear model for the second stage is dence, or else modeling dependence stochastically or explic- itly (Bijmolt and Pieters 2001; van den Noortgate et al. 2013). α ¼ β þ ∑ β X þ u ; ki k0 km mi ki m¼1 In marketing research, it is still common, and also suggested to 508 J. of the Acad. Mark. Sci. (2020) 48:499–518 or in matrix notation symmetric, which indicates the absence of a publication bias. Finally, as the competing H1a and H1b indicate, we do not α ¼ X β þ u ; i i i expect a strong selection mechanism in research or publication processes that would favor significant or high (or low) ESs. where the K ESs become the dependent variable. The resid- Thus, we do not consider publication bias a serious concern uals u are assumed to be K-variate normal with zero average ki for our study. and a covariance matrix τ. Then X reflects the moderator To detect outliers in the data, we checked for extreme ESs variables. By combining both levels, the resulting model is using the boxplot (see Web Appendix D, Figure WA2). We are ES ¼ Z X β þ Z u þ e : i i i i i i especially interested in the moderator type of measurement HWTP, so we computed separate boxplots for the direct and Estimates for τ are based on restricted maximum likeli- indirect measures of HWTP and thereby identified one obser- hood. The analysis uses the metafor package for meta- vation for each measurement type (indirect Kimenju et al. analyses in R (Viechtbauer 2010). 2005; direct Neill et al. 1994) for which the ESs (0.9079; 0.9582) exceeded the upper whisker, defined as the 75% quantile plus 1.5 times the box length. Kimenju et al. (2005) Data screening and descriptive statistics report HWTP ($11.68) values from an indirect method that overestimate RWTP ($94.48) by a factor of eight; we exclud- One of the criticisms of meta-analyses is the risk of publica- ed it from our analyses. Neill et al. (1994) report HWTP tion bias, such that all the included ESs would reflect the non- ($109) that overestimates RWTP ($12) by a factor of nine random sampling procedure. Including unpublished studies when excluding outliers, and it is another outlier in our data- can address this concern; in our sample, 22 of 117 ESs come base. Thus, we excluded two of 117 observations, or less than from unpublished studies, for an unpublished work proportion 5% of the full sample, which is a reasonable range (Cohen of 19%, which favorably compares with other meta-analyses et al. 2003,p.397). pertaining to pricing, such as 10% in Tully and Winer (2014), The remaining 115 ESs represent 77 studies reported by 47 9% in Bijmolt et al. (2005), or 16% in Abraham and Hamilton different articles, with a total sample size of 24,347 for HWTP (2018). The funnel plot for the sample, as depicted in Fig. 1,is and 20,656 for RWTP. Sixteen out of these 115 ESs indicate Fig. 1 Funnel plot Notes: Six ESs with a very high standard error are not included here, to improve readability. A funnel plot with all ESs in Web Appendix C confirms the lack of a publication bias. J. of the Acad. Mark. Sci. (2020) 48:499–518 509 Table 3 Descriptive statistics Mean SD N Mean SD N Mean SD N Mean SD N Type of measurement HWTP Direct Indirect 0.1818 0.1709 85 0.2280 0.2048 30 Type of measurement RWTP Direct Indirect 0.1869 0.1776 106 0.2758 0.2055 9 Incentive compatible No Yes 0.1294 0.1709 24 0.2109 0.1801 91 Product type Convenience Shopping Specialty 0.1954 0.1852 38 0.1339 0.1554 48 0.2911 0.1758 29 Innovation No Yes 0.1760 0.1807 76 0.2287 0.1773 39 Product/service Product Service 0.2482 0.1797 80 0.0696 0.1840 35 Type of subject design Between Within 0.1800 0.1740 42 0.1800 0.1740 73 Opportunity to test No Yes 0.1626 0.1746 75 0.2524 0.1789 40 Participation fee No Yes 0.1400 0.1731 106 0.2747 0.1617 9 Initial balance No Yes 0.1774 0.1662 69 0.3879 0.2365 46 Type of experiment HWTP Field Lab 0.2716 0.1663 42 0.1491 0.1741 73 Type of experiment RWTP Field Lab 0.2743 0.1663 39 0.1526 0.1741 76 Offline/online HWTP Offline Online 0.1888 0.1893 87 0.2096 0.1521 28 Offline/online RWTP Offline Online 0.1880 0.1857 91 0.2159 0.1612 24 Student sample No Yes 0.2635 0.1571 57 0.1254 0.1769 58 Introduction of method for RWTP None Explanation Training Not mentioned 0.1689 0.1670 17 0.1657 0.1863 65 0.3464 0.2096 12 0.2201 0.1144 22 Region Other countries (mostly Europe) North America 0.2678 0.1773 32 0.1653 0.1746 83 Peer reviewed No Yes 510 J. of the Acad. Mark. Sci. (2020) 48:499–518 an underestimation of RWTP, resulting from direct (12) and indirect (4) methods. Table 3 contains an overview of the moderators’ descriptive statistics. Type of measurement HWTP reveals some mean differences between direct (0.1818) and indirect (0.2280) measures, which represents model-free support for H1b. The descriptive statistics of prod- uct type suggest a higher mean ES for specialty goods (0.2911) than convenience (0.1954) or shopping (0.1399) goods, in accordance with H3. With regard to innovation, we find a higher ES mean for innovative (0.2287) compared with non-innovative (0.1760) products, as we predicted in H4. Model-free evidence gathered from the moderators that reflect the research design also supports H5, in that the mean for between-subject designs is lower (0.1800) than that for within-subject designs (0.2798). The descriptive statistics can- not confirm H6 though, because giving participants an oppor- tunity to test a product before stating their WTP increases the ES (0.2525) relatively to no such opportunity (0.1626). We also do not find support for H7 in the model-free evidence, because studies with an initial balance and participation fee report higher ESs than those without. After detecting outliers and before conducting the meta- regressions, we checked for multicollinearity by calculating 1/(2 ∗ df) the generalized variance inflation factor GVIF , which is used when there are dummy regressors from categorical variables; it is comparable to the square root of the variance pffiffiffiffiffiffiffiffi inflation factor VIF for 1 degree of freedom (df = 1) (Fox and Monette 1992). In an iterative procedure, we excluded the 1/(2 ∗ df) moderator with the highest GVIF and reestimated the 1/(2 ∗ df) model repeatedly, until all moderators had a GVIF <2. This cut-off value of 2 has been applied in other disciplines (Pebsworth et al. 2012;Vegaetal. 2010) and is comparable to a VIF cut-off value of 4, within the range of suggested values (i.e., 3–5; Hair Jr et al., 2019, p. 316). Accordingly, we ex- cluded moderators—all control variables that do not appear in any hypotheses—in the following order: type of experiment 1/(2 ∗ df) HWTP (GVIF = 3.4723), offline/online RWTP 1/(2 ∗ df) 1/(2 ∗ df) (GVIF =3.2504), discipline (GVIF =2.2.4791), 1/(2 ∗ df) product/service (GVIF =2.2.3290), and peer reviewed 1/(2 ∗ df) (GVIF =2.0419). Results To address our research questions about the accuracy of WTP measurement methods and the moderators of this perfor- mance, we performed several meta-regressions in which we varied the moderating effects included in the models. First, we ran an analysis without any moderators. Second, we ran a meta-regression with all the moderators that met the multicollinearity criteria. Third, we conducted a stepwise anal- ysis, dropping the non-significant moderators one by one. Table 3 (continued) Mean SD N Mean SD N Mean SD N Mean SD N 0.1843 0.1938 21 0.1960 0.1785 94 Discipline Economics Marketing 0.1194 0.1435 65 0.2907 0.1789 50 Moderators in italics are control variables J. of the Acad. Mark. Sci. (2020) 48:499–518 511 Table 4 Results of full and reduced models Full model Reduced model Estimate EXP (Estimate) Std. Err. p Value Significance Estimate EXP (Estimate) Std. Err. p Value Significance Intercept −2.7030 0.0670 9.4731 0.7754 0.0831 1.0867 0.0500 0.0965 * Type of measurement HWTP (indirect) 0.1027 1.1082 0.0404 0.0110 ** 0.0905 1.0947 0.0382 0.0177 ** Type of measurement RWTP (indirect) −0.0132 0.9869 0.0587 0.8216 Incentive compatible (yes) 0.0488 1.0500 0.0574 0.3951 Value 0.0002 1.0002 0.0001 0.0656 * Product type (shopping) 0.0353 1.0359 0.0445 0.4274 0.0028 1.0028 0.0371 0.9388 Product type (specialty) 0.1615 1.1753 0.0476 0.0007 *** 0.1624 1.1763 0.0393 <.0001 *** Innovation (yes) −0.0004 0.9996 0.0505 0.9944 Variance ES 0.1752 1.1915 0.2527 0.4883 Type of subject design (within) 0.0878 1.0918 0.0439 0.0455 ** Opportunity to test (yes) 0.0139 1.0140 0.0468 0.7658 Participation fee (yes) 0.0522 1.0536 0.0489 0.2858 Initial balance (yes) 0.0978 1.1027 0.0746 0.1896 Type of experiment RWTP (lab) −0.0050 0.9950 0.0471 0.9156 Offline/online HWTP (offline) 0.0904 1.0946 0.0553 0.1019 Student sample (yes) −0.1134 0.8928 0.0446 0.0110 ** −0.1026 0.9025 0.0344 0.0021 *** Introduction of method for RWTP (explanation) 0.0497 1.0510 0.0579 0.3908 0.0671 1.0694 0.0420 0.1095 Introduction of method for RWTP (training) 0.1846 1.2027 0.0762 0.0154 ** 0.2032 1.2253 0.0604 0.0008 *** Introduction of method for RWTP (not mentioned) 0.1299 1.1387 0.0784 0.0974 * 0.1546 1.1672 0.0524 0.0032 *** Region (North America) −0.0765 0.9264 0.0467 0.1013 Citations 0.0001 1.0001 0.0001 0.3300 Year 0.0013 1.0013 0.0047 0.7809 τ 0.0031 0.0047 R 0.7416 0.6083 AICc 45.6093 −23.4892 Significance codes: *** p < 0.01; ** p < 0.05; * p < 0.1 Moderators in italics are control variables 512 J. of the Acad. Mark. Sci. (2020) 48:499–518 Fig. 2 Overestimation of RWTP 50% 40% 30% 20% 10% -2 0% Base scenario Product type Student sample Introduction of Introduction of (specialty) (yes) method for method for -10% RWTP (not RWTP (training) mentioned) Direct measurement of HWTP Indirect measurement of HWTP Notes: The base scenario is as follows: product type = convenience good,introduction of method for RWTP = explanation,student sample = no. The first model, including only the intercept, results in an H7a, or H7b though, because opportunity to test (β =0.0139, estimate (β) of 0.1889 with a standard error (SE) of 0.0183 Exp(β) = 1.0140, SE = 0.0468, p = 0.7658), participation fee and a p value < .0001. The estimate corresponds to an average (β = 0.0522, Exp(β) = 1.0536, SE = 0.0489, p = 0.2858), and hypothetical bias of 20.79% (Exp(0.1889) = 1.2079), meaning initial balance (β = 0.0978, Exp(β) = 1.1027, SE = 0.0746, that on average, HWTP overestimates RWTP by almost 21%. p = 0.1896) do not show significant effects. The analysis with all the moderators that met the Of the control variables, only student sample (β = − multicollinearity threshold produces the estimation results in 0.1134, Exp(β) = 0.8928, SE = 0.0446, p = 0.0110) and intro- Table 4.The type of measurement HWTP has a significant, duction of method for RWTP (training) (β = 0.1846, positive effect (β = 0.1027, Exp(β) = 1.1082, SE = 0.0404, Exp(β) = 1.2027, SE = 0.0762, p = 0.0154) exert significant p = 0.0110), indicating that indirect measures overestimate effects in the full model. If a study only includes students, RWTP more than direct measures do. We reject H1a and con- the hypothetical bias gets smaller by 11%; conducting mock firm H1b. In particular, the ratio of HWTP to RWTP should be auctions before measuring RWTP increases the hypothetical multiplied by 1.1082, resulting in an overestimation by indi- bias by 20%. rect methods of an additional 10.82%. Value has a significant, Finally, we ran analyses in which we iteratively excluded positive effect at the 10% level (β =0.0002, Exp(β) = 1.0002, moderators until all remaining moderators were significant at SE = 0.0001, p = 0.0656), in weak support of H2. The percent- the 5% level. We excluded the moderator with the highest p age overestimation of RWTP by HWTP increases slightly, by value from the full model, reran the analysis, and repeated this an additional 0.02%, with each additional U.S. dollar increase procedure until we had only significant moderators left. We in value. For H3, we find no significant difference in the treated the dummy variables from the nominal/ordinal moder- hypothetical bias between convenience and shopping goods, ators product type and introduction of method for RWTP as yet specialty goods evoke a significantly higher hypothetical belonging together, and we considered these moderators as bias than convenience goods (β = 0.1615, Exp(β) = 1.1753, significant when one of the corresponding dummy variables SE = 0.0476, p < .0001). This finding implies that the hypo- showed a significant effect. The exclusion order was as fol- thetical bias is greater for products that demand extraordinary lows: innovation, type of experiment RWTP, type of measure- search effort, as we predicted in H3. We do not find support ment RWTP, opportunity to test, year, variance ES, incentive for H4, because innovation does not influence the hypothetical compatible, initial balance, citations, participation fee, bias significantly (β = − 0.0004, Exp(β) = 0.9996, SE = region, value, type of subject design,and offline/online 0.0505, p =0.9944). HWTP. The results in Table 4 reconfirm the support for H1b, For moderators from the research design category, we con- because the type of measurement HWTP has a positive, sig- firm the support we previously identified for H5. Measuring nificant effect (β =0.0905, Exp(β) = 1.0947, SE =0.0382, p = HWTP and RWTP using a within-subject design results in a 0.0177), resulting in a multiplication factor of 1.0947. The greater hypothetical bias than does a between-subject design overestimation of RWTP increases considerably for measures (β = 0.0878, Exp(β) = 1.0918, SE = 0.0439, p = 0.0455), such of WTP for specialty goods (β = 0.1624, Exp(β) = 1.1763, that the hypothetical bias increases by an additional 9.18 per- SE = 0.0393, p < .0001), in support of H3. Yet we do not find centage points in this case. We do not find support for H6, support for any other hypotheses in the reduced model. J. of the Acad. Mark. Sci. (2020) 48:499–518 513 Regarding the control variables, student sample (β = − mentioned), region,and peer reviewed have significant 0.1026, Exp(β) = 0.9025, SE = 0.0344, p = 0.0021) again has effects (5% level). The moderators excluded from the a significant effect, and introduction of method for main models due to multicollinearity (product/service, RWTP affects the hypothetical bias significantly. In this type of experiment HWTP, offline/online RWTP,and case, the hypothetical bias increases when the article discipline) do not show significant influences. does not mention any introduction of the method for Next, we estimated two models with all ESs, including the measuring RWTP to participants (β = 0.1546, Exp(β)= two outliers, but varied varied the number of included moder- 1.1672, SE = 0.0524, p = 0.0032) and when the method ators (Models 2 and 3 in Table WA2). The results remain involves mock auctions (β = 0.2032, Exp(β) = 1.2253, similar to our main findings. Perhaps most important, SE = 0.0604, p = 0.0008). the type of measurement HWTP has a significant effect For ease of interpretation, we depict the hypothetical bias on the hypothetical bias, comparable in size to the ef- for different scenarios in Fig. 2. The reduced model provides a fect in the main model. better model fit, according to the corrected Akaike informa- In addition, instead of the multivariate mixed linear model, tion criterion (AICc) (AICc =45.61, AICc we used a random-effects, three-level model, such that the ES full model reduced model- = − 23.49), so we use it as the basis for the simulation. The measures nested within studies with a V-known model at the base scenario depicted in Fig. 2 measures WTP for conve- lowest level (Bijmolt and Pieters 2001; van den Noortgate nience goods, explains the method for measuring RWTP to et al. 2013), which can account for dependence between ob- participants, and does not include solely students. The other servations. We estimated the two main models and the three scenarios are adaptions of the base scenario, where one of the robustness check models with this random-effects three-level three aforementioned characteristics is changed. In the base model (Models 4–8 in Table WA2). Again, the results do not scenario, we predict that direct measurement overestimates change substantially, except for value, which becomes signif- RWTP by 9%, and indirect measurement overestimates it by icant at the 5% level. 19%, so the difference is 10 percentage points. In contrast, for Finally, we tested for possible interaction effects. That is, specialty goods, the overestimation increases to 28% for direct we took all significant moderators from the full model and and to 40% for indirect measures. When using a pure student tested, for each significant moderator, all possible interactions. sample instead of a mixed sample, the predictions are relative- The limited number of observations prevented us from simul- ly accurate. Here, direct measurement even underestimates taneously including all interactions in one model. Therefore, RWTP by 2%, while indirect measurement yields an overes- we first estimated separate models for each of the significant timation of 7%. With respect to how the method for measuring moderators from the full model, after dropping moderators 1/(2 ∗ RWTP is introduced to the participants, not mentioning it in a due to multicollinearity until all moderators had a GVIF df) paper, as well as training the method beforehand increase the < 2. Then, we estimated an additional extension of the full hypothetical bias. While the first option is hardly interpretable, model by adding all significant interactions that emerged from running mock tasks increases the bias to 33% in case of direct the previous interaction models. We next reduced that model and to 46% in case of indirect methods used for measuring until all moderators were significant at a 5% level. The HWTP. resulting model achieved a higher AICc than our main re- duced model. Comparing all full models with interactions, the model with the lowest AICc (Burnham and Anderson 2004) did not feature a significant interaction, indicating that Robustness checks the possible interactions are small and do not affect our results. All of these models are available in Web Appendix F. We ran several additional analyses to check the robustness of the results, which we summarize in Table WA2 in Web Appendix F. To start, we analyzed Model 1 in Table WA2 Discussion pffiffiffiffiffi 1=ðÞ 2*df by applying a cut-off value of GVIF < 10,compa- rable to the often used cut-off value of 10 for the VIF. In this Theoretical contributions case, we did not need to exclude any moderator, but the results do not deviate in their signs or significance levels relatively to Though three meta-analyses discussing the hypothetical bias the main results. Type of measurement HWTP still has a sig- exist (Carson et al. 1996; List and Gallet, 2001;Murphy etal. nificant effect (5% level) on the hypothetical bias. In 2005), this is the first comprehensive study giving marketing addition, value, product type (specialty),and type of managers and scholars advices on how to accurately measure subject design exert significant influences. Among the consumers’ WTP. In contrast to the existing meta-analyses, control variables, introduction of method for RWTP we focus on private goods, instead of on public goods, in- (training), introduction of method for RWTP (not creasing the applicability of our findings within a marketing 514 J. of the Acad. Mark. Sci. (2020) 48:499–518 Table 5 Hypotheses testing results Hypothesis Full model Reduced Robustness model checks H1a Type of measurement HWTP: indirect methods have smaller bias than direct methods H1b Type of measurement HWTP: direct methods have smaller bias than indirect methods ✓✓ ✓ H2 Bias increases with product value ✓✓ H3 Bias is least for convenience goods, greater for shopping goods, greatest for specialty goods ✓✓ ✓ H4 Bias is greater for innovations H5 Bias is greater for within-subject designs than for between-subject designs ✓✓ H6 Opportunity to test a product reduces the bias H7a Participation fee decreases the bias H7b Initial balance decreases the bias context. With a meta-analysis of 115 ESs gathered from 77 In our results related to H2, the p value of the value mod- studies reported in 47 papers, we conclude that HWTP erator is slightly greater than 5% in the full model, such that methods tend to overestimate RWTP considerably, by about the hypothetical bias appears greater for more valuable prod- 21% on average. This hypothetical bias depends on several ucts in percentage terms, though the effect is relatively small. factors, for which we formulated hypotheses (Table 5) and Value does not remain in the reduced model, but the signifi- which we discuss subsequently. cant effect is very consistent across the robustness checks that With respect to the method for measuring HWTP, whether feature the full model (Table 5). Therefore, our results support direct or indirect, across all the different models, we find H2: The hypothetical bias increases if the value of the prod- strong support for H1b, which states that indirect methods ucts to be evaluated increases. This finding is new, in that overestimate HWTP more severely than direct methods. neither existing meta-analyses (Carson et al. 1996;List and This important finding contradicts the prevailing opinion Gallet 2001;Murphy etal. 2005) nor any primary studies have among academic researchers (Breidert et al. 2006) and has examined this moderating effect. not previously been revealed in meta-analyses (Carson et al. We also find support for H3 across all analyzed models. For 1996;Listand Gallet 2001;Murphy et al. 2005). We in turn participants it is harder to evaluate a specialty product’s utility propose several potential mechanisms that could produce this than a convenience product’s utility; specialty goods often surprising finding. First, we consider the concept of coherent feature a higher degree of complexity or are less familiar to arbitrariness, as first introduced by Ariely et al. (2003). People consumers than convenience goods. The greater ability to as- facing many consecutive choices tend to base each decision sess the product’s utility reduces the hypothetical bias on their previous ones, such that they show stable preferences. (Hofstetter et al. 2013), such that our finding of higher over- However, study participants might make their first decision estimation for specialty goods is in line with prior research. more or less randomly. Indirect measures require many, con- Yet we do not find any difference between shopping and con- secutive choices, so coherent arbitrariness could arise when venience goods, prompting us to posit that the hypothetical using these methods to measure WTP. In that sense, the results bias might not be affected by moderate search effort; rather, of indirect measures indicate stable preferences, but they do only products demanding strong search effort increase the not accurately reflect the participants’ actual valuation. hypothetical bias. Existing meta-analyses (Carson et al. Second, participants providing indirect measure responses 1996; List and Gallet 2001;Murphy et al. 2005) include pub- might focus less on the absolute values of an attribute and lic goods and do not distinguish among different types of more on relative values (Drolet et al. 2000). The absolute private goods. By showing that the type of a private good values of the price attribute are key determinants of WTP, so influences the hypothetical bias, we add to an understanding the hypothetical bias might increase if the design of the choice of the hypothetical bias in a marketing context that features alternatives does not include correct price levels. A wide- private goods. spread argument for the greater accuracy of indirect methods With respect to innovation, we find no support for H4, compared with direct methods asserts they mimic a natural because the differences between innovations and existing shopping experience (Breidert et al. 2006); our analysis chal- products are small and not significant. This finding contrasts lenges this claim. with Hofstetter et al.’s(2013) results. Accordingly, we avoid rejecting the claim that methods for measuring HWTP work as well (or as poorly) for innovations as they do Please refer to Web Appendix A for a more detailed discussion of the for existing products. existing meta-analyses. J. of the Acad. Mark. Sci. (2020) 48:499–518 515 A within-subject research design increases the hypothetical using response ratios and thus offer marketing scholars anoth- bias, compared with a between-subject design, as we predict- er ES option to use in their meta-analyses. ed in H5 and in accordance with prior research (Ariely et al. 2006,FoxandTversky 1995, Frederick and Fischhoff 1998). Managerial implications Yet this finding still seems surprising to some extent. When asking a participant for WTP twice (once hypothetically, once This meta-analysis identifies a substantial hypothetical bias of in a real context), the first answer seemingly should serve as 21% on average in measures of WTP. Although hypothetically an anchor for the second, leading to an assimilation expected derived WTP estimates are often the best estimates available, to reduce the hypothetical bias. Instead, two similar questions managers should realize that they generally overestimate con- under different conditions appear to evoke a contrast instead sumers’ RWTP and take that bias into account when using of an assimilation effect, and they produce a greater hypothet- HWTP results to develop a pricing strategy or when setting ical bias. Consequently, when designing marketing experi- an innovation’s launch price. In addition, we detail conditions ments to investigate the hypothetical bias, researchers should in which the bias is larger or smaller, and we provide a brief use a between-subject design to prevent the answers from overview of how extensive the expected biases might become. influencing each other. When researching the influence of In particular, managers should anticipate a greater hypotheti- consumer characteristics on the hypothetical bias though, it cal bias when measuring WTP for products with higher values would be more appropriate to choose a within-subject design or for specialty goods. For example, when measuring HWTP (Hofstetter et al. 2013), though researchers must recognize for specialty goods, direct methods overestimate it by 28% that the hypothetical bias might be overestimated more severe- and indirect methods do so by 40%. These predicted degrees ly in this case. Murphy et al. (2005) also distinguish different of RWTP overestimation should be used to adjust decisions subject designs in their meta-analysis and find a significant based on WTP studies in practice. effect, though they use RWTP instead of the difference be- The study at hand also shows that direct methods result in tween HWTP and RWTP as their dependent variable. In this more accurate estimates of WTP than indirect methods do. sense, our finding of a moderating role of the study design on Therefore, practitioners can resist, or at least consider with the hypothetical bias is new to the literature. some skepticism, the prevalent academic advice to use indirect Our results do not support H6; we do not find differences in methods to measure WTP. In addition to being less accurate, the hypothetical bias when participants have an opportunity indirect methods require more effort and costs (Leigh et al. the test a product before stating their WTP or not. Testing a 1984). However, this recommendation only applies if the mea- product in advance reduces uncertainty about product perfor- surement of HWTP is necessary. If RWTP can be measured mance, and our finding is in contrast with Hofstetter et al.’s with an auction format, that option is preferable, since RWTP (2013) evidence that higher uncertainty increases the hypo- reflects actual WTP, whereas HWTP tends to overestimate it. thetical bias. Note however, that the result by Hofstetter This result also implies an exclusive focus on measuring WTP et al.’s(2013) refers to an effect of a consumer characteristic, for a specific product, such that it disregards some advantages and might be specific to the examined product, namely digital of the disaggregate information provided by indirect methods cameras. Our results are more general across a wide (e.g., demand due to cannibalization, brand switching, or range of product categories and experimental designs. market expansion; Jedidi and Jagpal 2009). In summary, the Furthermore, this result on H6 is in line with our find- key takeaway for managers who might use direct measures of ings for H4; both hypotheses rest on the participants’ HWTP is that the Bquick and dirty solution^ is only quick, not uncertainty about product performance, and we do not dirty—or at least, not more dirty than indirect methods. find support for either of them. Finally, neither a participation fee nor initial balance re- Limitations and research directions duce the hypothetical bias significantly, so we find no support for H7a or H7b. Formally, we can only Bnot reject^ a null This meta-analysis suggests several directions for further re- hypothesis of no moderator effect, but these findings suggest search, some of which are based on the limitations of our that we can dispel fears about influencing WTP results too meta-analysis. First, several recent adaptations of indirect much by offering participation fees or an initial balance. methods seek to improve their accuracy (Gensler et al. 2012, In addition to these theoretical insights on WTP measures, Schlereth and Skiera 2017). These improvements might re- we contribute to marketing literature by showing how to mod- duce the variance in measurement accuracy between direct el stochastically dependent ESs explicitly when the covari- and indirect measurements. These recently developed ances and variances of the observed ESs are known or can methods have not been tested by empirical comparison stud- be computed. Moreover, we use (the log of) the response ies, so we could not include them in our meta-analysis. An ratios as the ES in our meta-analysis, which has not been done extensive comparison of those adaptions, in terms of their previously in marketing. We provide a detailed rationale for effects on the hypothetical bias, would provide researchers 516 J. of the Acad. Mark. Sci. (2020) 48:499–518 and managers more comprehensive insights for choosing the pricing decisions, if it included assessments of different direct right method when measuring WTP. methods for measuring WTP. Second, the prevailing opinion of indirect methods yielding Acknowledgments The authors appreciate helpful comments from Felix a lower hypothetical bias than direct methods bases upon as- Eggers, Manfred Krafft, and Hans Risselada. sumptions concerning individuals’ decision making; though our results are in contrast with this opinion. The underlying mental processes when asked for the WTP through direct or indirect methods are not well understood yet. Investigating Open Access This article is distributed under the terms of the Creative those processes would foster the understanding of differences Commons Attribution 4.0 International License (http:// creativecommons.org/licenses/by/4.0/), which permits unrestricted use, in the hypothetical bias between direct and indirect methods distribution, and reproduction in any medium, provided you give and between other experimental conditions. This would en- appropriate credit to the original author(s) and the source, provide a link able the development of new adaptions minimizing the hypo- to the Creative Commons license, and indicate if changes were made. thetical bias. Third, the hypothetical bias depends on a variety of factors, including individual-level considerations (Hofstetter et al. 2013; Sichtmann et al. 2011), that extend beyond the product References or study level moderators as examined in our meta-regres- sions. Very few studies have investigated these factors, so Abraham, A. T., & Hamilton, R. W. (2018). When does partitioned pric- ing lead to more favorable consumer preferences? Meta-analytic we could not incorporate them in our meta-analysis, though evidence. Journal of Marketing Research, 55(5), 686–703. consumer characteristics likely explain some differences. Anderson, J. C., Jain, D. C., & Chintagunta, P. K. (1992). Customer value Therefore, we call for more research on whether and how assessment in business markets: A state-of-practice study. Journal of individual characteristics influence the hypothetical bias. For Business-to-Business Marketing, 1(1), 3–29. example, a possible explanation for the limited accuracy of Ariely, D., Loewenstein, G., & Prelec, D. (2003). BCoherent arbitrariness^: Stable demand curves without stable preferences. indirect measures could reflect coherent arbitrariness (Ariely The Quarterly Journal of Economics, 118(1), 73–106. et al. 2003). Continued research might examine whether and Ariely, D., Ockenfels, A., & Roth, A. E. (2005). An experimental analysis how coherent arbitrariness affects different consumers, espe- of ending rules in internet auctions. RAND Journal of Economics, cially in the context of CBCs. In addition, our findings on 36(4), 890–907. some product-level factors are new, namely that the hypothet- Ariely, D., Loewenstein, G., & Prelec, D. (2006). Tom sawyer and the construction of value. Journal of Economic Behavior & ical bias is greater for higher valued products and for specialty Organization, 60(1), 1–10. goods. These results could be cross-validated in future exper- Arts, J. W., Frambach, R. T., & Bijmolt, T. H. A. (2011). Generalizations imental studies. on consumer innovation adoption: A meta-analysis on drivers of Fourth, knowing and measuring WTP is crucial for firms intention and behavior. International Journal of Research in Marketing, 28(2), 134–144. operating in business-to-business (B2B) contexts (Anderson Babić Rosario, A., Sotgiu, F., de Valck, K., & Bijmolt, T. H. A. (2016). et al. 1992), yet all ESs in our study are from a business-to- The effect of electronic word of mouth on sales: A meta-analytic consumer context. Because B2B products and services tend to review of platform, product, and metric factors. Journal of be more complex, customers might prefer to identify product Marketing Research, 53(3), 297–318. characteristics and to include them separately when determin- Barrot, C., Albers, S., Skiera, B., & Schäfers, B. (2010). Why second- price sealed-bid auction leads to more realistic price-demand func- ing their WTP in response to an indirect method. However, tions. International Journal of Electronic Commerce, 14(4), 7–38. anecdotal evidence indicates that direct measurement works Becker, G. M., DeGroot, M. H., & Marschak, J. (1964). Measuring utility better for industrial goods than for consumer goods (Dolan by a single-response sequential method. Systems Research and and Simon 1996). Researching the differential accuracy of Behavioral Science, 9(3), 226–232. Bijmolt, T. H. A., & Pieters, R. G. M. (2001). Meta-analysis in marketing the various methods in a B2B context would be espe- when studies contain multiple measurements. Marketing Letters, cially interesting; our study already indicates differences 12(2), 157–169. between convenience and (more complex) specialty Bijmolt, T. H. A., van Heerde, H. J., & Pieters, R. G. M. (2005). New goods. Therefore, we join Lilien (2016)incalling for empirical generalizations on the determinants of price elasticity. more research in B2B marketing, including the measure- Journal of Marketing Research, 42(2), 141–156. Bolton, G. E., & Ockenfels, A. (2014). Does laboratory trading mirror ment of WTP. behavior in real world markets? Fair bargaining and competitive Fifth, the majority of studies included herein used open bidding on eBay. Journal of Economic Behavior & Organization, questioning as the direct method for measuring WTP. In prac- 97,143–154. tice, different direct methods are available (Steiner and Borah, A., Wang, X., & Ryoo, J. H. (2018). Understanding influence of Hendus 2012), yet they rarely have been investigated in aca- marketing thought on practice: An analysis of business journals using textual and latent Dirichlet allocation (LDA) analysis. demic research. Pricing research could increase in managerial Customer Needs and Solutions, 5(3–4), 146–161. relevance (Borah et al. 2018), and help managers make better J. of the Acad. Mark. Sci. (2020) 48:499–518 517 Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. Gensler, S., Neslin, S. A., & Verhoef, P. C. (2017). The showrooming phenomenon: It’s more than just about price. Journal of Interactive (2009). Introduction to meta-analysis. Chichester, United Kingdom: John Wiley & Sons. Marketing, 38,29–43. Breidert, C., Hahsler, M., & Reutterer, T. (2006). A review of methods for Gleser, L. J., & Olkin, I. (2009). Stochastically dependent effect sizes. In measuring willingness-to-pay. Innovative Marketing, 2(4), 8–32. H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The handbook of nd research synthesis and meta-analysis (2 ed., pp. 357–376). New Brown, T. C., Champ, P. A., Bishop, R. C., & McCollum, D. W. (1996). York: Russel Sage Foundation. Which response format reveals the truth about donations to a public good? Land Economics, 72(2), 152–166. Grewal, D., Puccinelli, N., & Monroe, K. B. (2017). Meta-analysis: Integrating accumulated knowledge. Journal of the Academy of Brown, T. C., Ajzen, I., & Hrubes, D. (2003). Further tests of entreaties to Marketing Science, 47(5), 840. avoid hypothetical bias in referendum contingent valuation. Journal of Environmental Economics and Management, 46(2), 353–361. Hair J.F. Jr, Black, W. C., Babin, B. J., & Anderson, R. E. (2019). th Multivariate data analysis (8 ed.). Hampshire, United Kingdom: Burnham, K. P., & Anderson, D. R. (2004). Multimodel inference: Cengage Learning EMEA. Understanding AIC and BIC in model selection. Sociological Methods & Research, 33(2), 261–304. Hallgren, K. A. (2012). Computing inter-rater reliability for observational data: An overview and tutorial. Tutorial in Quantitative Methods for Bushong, B., King, L. M., Camerer, C. F., & Rangel, A. (2010). Psychology, 8(1), 23–34. Pavlovian processes in consumer choice: The physical presence of a good increases willingness-to-pay. American Economic Review, Harrison, G. W., & Rutström, E. E. (2008). Experimental evidence on the 100(4), 1556–1571. existence of hypothetical bias in value elicitation methods. In C. R. Plott & V. L. Smith (Eds.), Handbook of experimental economics Carson, R. T., Flores, N. E., Martin, K. M., & Wright, J. L. (1996). results (Vol. 1, pp. 752–767). Amsterdam, Netherlands: Elsevier. Contingent valuation and revealed preference methodologies: Comparing the estimates for quasi-public goods. Land Economics, Hedges, L. V., Gurevitch, J., & Curtis, P. S. (1999). The meta-analysis of 72(1), 80–99. response ratios in experimental ecology. Ecology, 80(4), 1150–1156. Charness, G., Gneezy, U., & Kuhn, M. A. (2012). Experimental methods: Hensher, D. A. (2010). Hypothetical bias, choice experiments and will- Between-subject and within subject designs. Journal of Economic ingness to pay. Transportation Research Part B: Methodological, Behavior & Organization, 81(1), 1–8. 44(6), 735–752. Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for eval- Hoeffler, S. (2003). Measuring preferences for really new products. uating normed and standardized assessment instruments in psychol- Journal of Marketing Research, 40(4), 406–420. ogy. Psychological Assessment, 6(4), 284–290. Hofstetter, R., Miller, K. M., Krohmer, H., & Zhang, Z. J. (2013). How do Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple consumer characteristics affect the bias in measuring willingness to regression/correlation analysis for the behavioral sciences (3rd ed.). pay for innovative products? Journal of Product Innovation Mahwah, NJ: Lawrence Erlbaum Associates. Management, 30(5), 1042–1053. Copeland, M. T. (1923). Relation of consumers’ buying habits to market- Ingenbleek, P. T. M., Frambach, R. T., & Verhallen, T. M. M. (2013). Best ing methods. Harvard Business Review, 1(3), 282–289. practices for new product pricing: Impact on market performance Dimoka, A., Hong, Y., & Pavlou, P. A. (2012). On product uncertainty in and price level under different conditions. Journal of Product online markets: Theory and evidence. MIS Quarterly, 36(2), 395– Innovation Management, 30(3), 560–573. Jedidi, K., & Jagpal, S. (2009). Willingness to pay: Measurement and Ding, M. (2007). An incentive-aligned mechanism for conjoint analysis. managerial implications. In V. R. Rao (Eds.), Handbook of pricing Journal of Marketing Research, 44(2), 214–223. research in marketing (pp. 37–60). Cheltenham, United Kingdom: Edward Elgar Publishing. Ding, M., Grewal, R., & Liechty, J. (2005). Incentive-aligned conjoint analysis. Journal of Marketing Research, 42(1), 67–82. Jedidi, K., & Zhang, Z. J. (2002). Augmenting conjoint analysis to esti- mate consumer reservation price. Management Science, 48(10), Dolan, R. J., & Simon, H. (1996). Power pricing: how managing price 1350–1368. transforms the bottom line. New York: The Free Press. Kagel, J. H., Harstad, R. M., & Levin, D. (1987). Information impact and Drolet, A., Simonson, I., & Tversky, A. (2000). Indifference curves that allocation rules in auctions with affiliated private values: A labora- travel with the choice set. Marketing Letters, 11(3), 199.209. tory study. Econometrica, 55(6), 1275–1304. Edeling, A., & Fischer, M. (2016). Marketing’s impact on firm value: Kalaian, H. A., & Raudenbush, S. W. (1996). A multivariate mixed linear Generalizations from a meta-analysis. Journal of Marketing model for meta-analysis. Psychological Methods, 1(3), 227–235. Research, 53(4), 515–534. Kimenju, S. C., Morawetz, U. B., & De Groote, H. (2005). Comparing Edeling, A., & Himme, A. (2018). When does market share matter? New contingent valuation method, choice experiments and experimental empirical generalizations from a meta-analysis of the market share– auctions in soliciting consumer preference for maize in Western performance relationship. Journal of Marketing, 82(3), 1–24. Kenya: Preliminary results (Presentation at the African Eggers, F., & Sattler, H. (2009). Hybrid individualized two-level choice- th Econometric Society 10 annual conference on econometric model- based conjoint (HIT-CBC): A new method for measuring preference ing in Africa, Nairobi, Kenya). structures with many attribute levels. International Journal of Kohli, R., & Mahajan, V. (1991). A reservation-price model for optimal Research in Marketing, 26(2), 108–118. pricing of multiattribute products in conjoint analysis. Journal of Fox, J., & Monette, G. (1992). Generalized collinearity diagnostics. Marketing Research, 28(3), 347–354. Journal of the American Statistical Association, 87(417), 178–183. Koricheva, J., & Gurevitch, J. (2014). Uses and misuses of meta-analysis Fox, C. R., & Tversky, A. (1995). Ambiguity aversion and comparative in plant ecology. Journal of Ecology, 102(4), 828–844. ignorance. The Quarterly Journal of Economics, 110(3), 585–603. Lajeunesse, M. J. (2011). On the meta-analysis of response ratios for Frederick, S., & Fischhoff, B. (1998). Scope (in)sensitivity in elicited studies with correlated and multi-group designs. Ecology, 92(11), valuations. Risk Decision and Policy, 3(2), 109–123. 2049–2055. Gensler, S., Hinz, O., Skiera, B., & Theysohn, S. (2012). Willingness-to- Leeflang, P.S.H., Wieringa, J.E., Bijmolt, T.H.A., & Pauwels, K.H. pay estimation with choice-based conjoint analysis: Addressing ex- (2015). Modeling markets; analyzing marketing phenomena and treme response behavior with individually adapted designs. improving marketing decision making. New York, NY: Springer. European Journal of Operational Research, 219(2), 368–378. 518 J. of the Acad. Mark. Sci. (2020) 48:499–518 Leigh, T. W., MacKay, D. B., & Summers, J. O. (1984). Reliability and Shogren, J. F., Margolis, M., Koo, C., & List, J. A. (2001). A random nth- validity of conjoint analysis and self-explicated weights: A compar- price auction. Journal of Economic Behavior & Organization, ison. Journal of Marketing Research, 21(4), 456–462. 46(4), 409–421. Sichtmann, C., Wilken, R., & Diamantopoulos, A. (2011). Estimating Lilien, G. (2016). L. (2016). The b2b knowledge gap. International willingness-to-pay with choice-based conjoint analysis: Can con- JournalofResearch inMarketing,33,543–556. sumer characteristics explain variations in accuracy? British List, J. A., & Gallet, C. A. (2001). What experimental protocol influence JournalofManagement, 22(4), 628–645. disparities between actual and hypothetical stated values? Evidence Simon, H. (2018). Irrationals Verhalten. Interview. Harvard Business from a meta-analysis. Environmental and Resource Economics, Manager, 40(8), 52–54. 20(3), 241–254. Steiner, M., & Hendus, J. (2012). How consumers’ willingness to pay is Lusk, J. L., & Schroeder, T. C. (2004). Are choice experiments incentive measured in practice: An empirical analysis of common approaches’ compatible? A test with quality differentiated beef steaks. American relevance. Retrieved from SSRN: https://ssrn.com/abstract= Journal of Agricultural Economics, 86(2), 467–482. 2025618. Accessed 20 Aug 2018 Miller, K. M., Hofstetter, R., Krohmer, H., & Zhang, Z. J. (2011). How Steiner, M., Eggert, A., Ulaga, W., & Backhaus, K. (2016). Do custom- should consumers’ willingness to pay be measured? An empirical ized service packages impede value capture in industrial markets? comparison of state-of-the-art approaches. Journal of Marketing Journal of the Academy of Marketing Science, 44(2), 151–165. Research, 48(1), 172–184. Thompson, S. G., & Sharp, S. J. (1999). Explaining heterogeneity in Morris, S. B., & DeShon, R. P. (2002). Combining effect size estimates in meta-analysis: A comparison of methods. Statistics in Medicine, meta-analysis with repeated measures and independent-groups de- 18(20), 2693–2708. signs. Psychological Methods, 7(1), 105–125. Tully, S. M., & Winer, R. S. (2014). The role of the beneficiary in will- Murphy, J. J., Allen, P. G., Stevens, T. H., & Weatherhead, D. (2005). A ingness to pay for socially responsible products: a meta-analysis. meta-analysis of hypothetical bias in stated preference valuation. Journal of Retailing, 90(2), 255–274. Environmental and Resource Economics, 30(3), 313–325. van den Noortgate, W., López-López, J. A., Marín-Martínez, F., & Nagle, T. T., & Müller, G. (2018). The strategy and tactics of pricing: A Sánchez-Meca, J. (2013). Three-level meta-analysis of dependent guide to growing more profitably (6th ed.). New York, NY: effect sizes. Behavior Research Methods, 45(2), 576–594. Routledge. van Houwelingen, H. C., Arends, L. R., & Stijnen, T. (2002). Advanced Neill, H. R., Cummings, R. G., Ganderton, P. T., Harrison, G. W., & methods in meta-analysis: Multivariate approach and meta-regres- McGuckin, T. (1994). Hypothetical surveys and real economic com- sion. Statistics in Medicine, 21(4), 589–624. mitments. Land Economics, 70(2), 145–154. Vega, L. A., Koike, F., & Suzuki, M. (2010). Conservation study of Noussair, C., Robin, S., & Ruffieux, B. (2004). Revealing consumers’ myrsine seguinii in Japan: Current distribution explained by past willingness-to-pay: A comparison of the BDM mechanism and the land use and prediction of distribution by land use-planning simula- Vickrey auction. Journal of Economic Psychology, 25(6), 725–741. tion. Ecological Research, 25(6), 1091–1099. Ockenfels, A., & Roth, A. E. (2006). Late and multiple bidding in second Vickrey, W. (1961). Counterspeculation, auctions, and competitive sealed price internet auctions: Theory and evidence concerning different tenders. Journal of Finance, 16(1), 8–37. rules for ending an auction. Games and Economic Behavior, 55(2), Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor 297–320. package. Journal of Statistical Software, 36(3). Völckner, F. (2006). Methoden zur Messung individueller Pebsworth, P. A., MacIntosh, A. J. J., Morgan, H. R., & Huffman, M. A. Zahlungsbereitschaften: Ein Überblick zum State of the Art. (2012). Factors influencing the ranging behavior of chacma baboons Journal für Betriebswirtschaft, 56(1), 33–60. (papio hamadryas ursinus) living in a human-modified habitat. Wang, T., Venkatesh, R., & Chatterjee, R. (2007). Reservation price as a International Journal of Primatology, 33(4), 872–887. range: An incentive-compatible measurement approach. Journal of Rutström, E. E. (1998). Home-grown values and incentive compatible Marketing Research, 44(2), 200–213. auction design. International Journal of Game Theory, 27(3), Wertenbroch, K., & Skiera, B. (2002). Measuring consumers’ willingness 427–441. to pay at the point of purchase. Journal of Marketing Research, Scheibehenne, B., Greifeneder, R., & Todd, P. M. (2010). Can there ever 39(2), 228–241. be too many options? A meta-analytic overview of choice overload. Wlömert, N., & Eggers, F. (2016). Predicting new service adoption with Journal of Consumer Research, 37(3), 409–425. conjoint analysis: External validity of BDM-based incentive-aligned Schlag, N. (2008). Validierung der Conjoint-Analyse zur Prognose von and dual-response choice designs. Marketing Letters, 27(1), 195– Preisreaktionen mithilfe realer Zahlungsbereitschaften. In Lohmar. Germany: Josef Eul Verlag. Schlereth, C., & Skiera, B. (2017). Two new features in discrete choice experiments to improve willingness-to-pay estimation that result in Publisher’snote Springer Nature remains neutral with regard to jurisdic- SDR and SADR: Separated (adaptive) dual response. Management tional claims in published maps and institutional affiliations. Science, 63(3), 829–842. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Journal of the Academy of Marketing Science Springer Journals

Accurately measuring willingness to pay for consumer goods: a meta-analysis of the hypothetical bias

Loading next page...
 
/lp/springer-journals/accurately-measuring-willingness-to-pay-for-consumer-goods-a-meta-R5sIMWNkwI

References (103)

Publisher
Springer Journals
Copyright
Copyright © The Author(s) 2019
Subject
Business and Management; Business and Management, general; Marketing; Social Sciences, general
ISSN
0092-0703
eISSN
1552-7824
DOI
10.1007/s11747-019-00666-6
Publisher site
See Article on Publisher Site

Abstract

Consumers’ willingness to pay (WTP) is highly relevant to managers and academics, and the various direct and indirect methods used to measure it vary in their accuracy, defined as how closely the hypothetically measured WTP (HWTP) matches consumers’ real WTP (RWTP). The difference between HWTP and RWTP is the Bhypothetical bias.^ A prevalent assumption in marketing science is that indirect methods measure WTP more accurately than do direct methods. With a meta-analysis of 77 studies reported in 47 papers and resulting in 115 effect sizes, we test that assumption by assessing the hypothetical bias. The total sample consists of 24,347 included observations for HWTP and 20,656 for RWTP. Moving beyond extant meta-analyses in marketing, we introduce an effect size metric (i.e., response ratio) and a novel analysis method (i.e., multivariate mixed linear model) to analyze the stochastically dependent effect sizes. Our findings are relevant for academic researchers and managers. First, on average, the hypothetical bias is 21%, and this study provides a reference point for the expected magnitude of the hypothetical bias. Second, the deviation primarily depends on the use of a direct or indirect method for measuring HWTP. In contrast with conventional wisdom, indirect methods actually overestimate RWTP significantly stronger than direct methods. Third, the hypothetical bias is greater for higher valued products, specialty goods (cf. other product types), and within-subject designs (cf. between-subject designs), thus a stronger downward adjustment of HWTP values is necessary to reflect consumers’ RWTP. . . . . . . Keywords Willingness to pay Reservation price Pricing Conjoint analysis Measurement accuracy Hypothetical bias . . Meta-analysis Response ratio Stochastically dependent effect sizes Introduction willingness to pay (WTP) is Bthe cornerstone of marketing strategy^ that drives important marketing decisions. First, con- In a state-of-practice study of consumer value assessments, sumers’ WTP is the central input for price response models that Anderson et al. (1992, p. 3) point out that consumers’ inform optimal pricing and promotion decisions. Second, a new product’s introductory price must be carefully chosen, because a Mark Houston and John Hulland served as Special Issue Editors for this poorly considered introductory price can jeopardize the invest- article. ments in its development and threaten innovation failures Electronic supplementary material The online version of this article (Ingenbleek et al. 2013). Not only do companies need to know (https://doi.org/10.1007/s11747-019-00666-6) contains supplementary what consumers are willing to pay early in their product devel- material, which is available to authorized users. opment process, but WTP is also of interest to researchers in marketing and economics who seek to quantify concepts such as * Jonas Schmidt jo.schmidt@uni-muenster.de a product’s value (Steiner et al. 2016). Obtaining accurate mea- sures of consumers’ WTP thus is essential. Tammo H.A.Bijmolt Existing methods for measuring WTP can be assigned to a t.h.a.bijmolt@rug.nl 2 × 2 classification (Miller et al. 2011), according to whether they measure WTP in a hypothetical or real context, with Marketing Center Muenster, University of Muenster, Am direct or indirect measurement methods (see Table 1). First, Stadtgraben 13-15, 48143 Muenster, Germany a hypothetical measure of WTP (HWTP) does not impose any Department of Marketing, Faculty of Economics and Business, financial consequences for participants’ decisions. University of Groningen, Nettelbosje 2, 9747 AE Groningen, The Netherlands Participants just state what they would pay for a product, if 500 J. of the Acad. Mark. Sci. (2020) 48:499–518 Table 1 Classification of Type of measurement methods for measuring WTP Context Direct Indirect Hypothetical � Open questioning � Closed-ended � Conjoint analysis � Choice bracketing procedure Real � Vickrey auction � BDM lottery th � Random n price auction � Incentive-aligned conjoint analysis � English auction � eBay given the opportunity to buy it. In contrast, participants may clear summary of these findings is available, and considering be required to pay their stated WTP in a real context, which the discrepancy between theory and practice, Bthere is a lack provides a real measure of WTP (RWTP). This could for ex- of consensus on the ‘right’ way to measure […]consumer’s ample be in the context of an auction, where the winner in the reservation price^ (Wang et al. 2007, p. 200). Therefore, with end actually has to buy the product. The difference between this study we seek to shed new light on the relative accuracy of RWTP and HWTP is induced by the hypothetical context and alternative methods for measuring consumers’ WTP, and par- is called Bhypothetical bias.^ This hypothetical bias provides a ticularly the accuracy of direct versus indirect methods. We measure of the hypothetical method’s accuracy (Harrison and perform a meta-analysis of existing studies that measure Rutström 2008). In case HWTP is measured with two differ- HWTP and RWTP for the same product or service, which ent methods, the one with the lower hypothetical bias gives a reveals some empirical generalizations regarding accuracy. more accurate estimate of participants’ RWTP, increasing the We also acknowledge the potential influence of other factors estimate’s validity. We conceptualize the hypothetical bias as on the accuracy of WTP measures (Hofstetter et al. 2013; the ratio of HWTP to RWTP. A method yielding an exemplary Sichtmann et al. 2011), such that we anticipate substantial hypothetical bias of 1.5 shows that those participants overstate heterogeneity across extant studies. With a meta-regression, their RWTP for a product by 50% when asked hypothetically. we accordingly identify moderators that might explain this Second, direct methods ask consumers directly for their WTP, heterogeneity in WTP accuracy (Thompson and Sharp 1999; whereas indirect methods require consumers to evaluate, com- van Houwelingen et al. 2002). Our multivariate mixed linear pare, and choose among different product alternatives, and the model enables us to analyze the stochastically dependent ef- price attribute is just one of several characteristics. Then, WTP fect sizes (ESs) explicitly (Gleser and Olkin 2009; Kalaian can be derived from their responses. and Raudenbush 1996), which provides the most accurate Many researchers assume thatdirectmethods create a stronger way to deal with dependent ESs (van den Noortgate et al. hypotheticalbias,becausetheyevokegreaterpriceconsciousness 2013). As an effect size (ES) measure, we use the response (Völckner 2006). In their pricing textbook, Nagle and Müller ratio of HWTP and RWTP (Hedges et al. 1999), such that we (2018) allege that direct questioning Bshould never be accepted obtain the relative deviation of HWTP. To the best of our as a valid methodology. The results of such studies are at best knowledge, no previous meta-analysis in marketing has ap- useless and are potentially highly misleading^ (p. 186). Simon plied a mixed linear model nor a response ratio to measure (2018) takes a similar line, stating, BIt doesn’t make sense to ask ESs. consumers directly for the utility or their WTP, as they aren’table On average, the hypothetical bias is about 21%. In addition, to give a direct and precise estimate. The most important direct methods outperform indirect methods with regard to method to quantify utilities and WTP is the conjoint their accuracy. The meta-regression shows that, compared analysis^ (p. 53). Because indirect methods represent a with direct measurement methods, the hypothetical bias is shopping experience, they are expected to be more ac- considerably higher in indirect measures, by 10 percentage curate for measuring HWTP (Breidert et al. 2006;Leigh et al. 1984;Völckner 2006). Still, practitioners largely Three meta-analyses dealing with the hypothetical bias exist (Carson et al. continue to rely on direct survey methods, which tend to be 1996;List and Gallet 2001;Murphyet al. 2005). However, they focus on easier to implement (Anderson et al. 1992; Hofstetter et al. public goods and their results are of limited use for marketing. In contrast to the existing meta-analyses, we focus on private goods and include several 2013; Steiner and Hendus 2012). private good specific moderators of high interest for marketers. For a more Various studies specify the accuracy of one or more direct detailed discussion of the three existing meta-analyses, please refer to Web or indirect methods by comparing HWTP with RWTP. Yet no Appendix A. J. of the Acad. Mark. Sci. (2020) 48:499–518 501 points in a full model. This finding contradicts the prevailing Direct methods to measure WTP wisdom in academic studies but supports current prac- tices in companies. In addition to the type of measure- Direct measures usually include open questions, such as, ment, value of the product, product type, and type of BWhat is the maximum you would pay for this product?^ subject design have a significant influence on the hypo- Other methods use closed question formats (Völckner 2006) thetical bias. and require participants to state whether they would accept In the next section, we prove an overview of WTP certain prices or not. Still others combine closed and open and its different measurement options. After detailing questions. The choice bracketing procedure starts with several the data collection and coding, we explicate our pro- closed questions, each of which depends on the previous an- posed ES measure, which informs the analysis approach swer. If consumers do not accept the last price of the last we take to deal with stochastically dependent ESs. We closed question, they must answer an open question about present the results and affirm their robustness with multiple how much they would be willing to pay (Wertenbroch and methods. Finally, we conclude by highlighting our theoretical Skiera 2002). contributions, explaining the main managerial implications, In particular, the most widely used direct measures of and outlining some limitations and directions for further RWTP are the Vickrey auction (Vickrey 1961)and the research. Becker-DeGroot-Marschak lottery (BDM) (Becker et al. 1964). In a Vickrey auction, every participant hands in one sealed bid. The highest bidder wins the auction but pays only Willingness to pay the price of the second highest bid; accordingly, these auctions also are called second-price sealed bid auctions. By Definition and classification disentangling the bid and the potential price, no bidding strat- egy is superior to bidding actual WTP. Different adaptions of We take a standard economic view of WTP (or reservation these Vickrey auctions are available, such as the random nth price) and define it as the maximum price a consumer is will- price auction (Shogren et al. 2001), in which participants do ing to pay for a given quantity of a product or a service not know the quantity being sold in the auction upfront. In (Wertenbroch and Skiera 2002). At that price, the con- contrast, a BDM lottery does not require participants to com- sumer is indifferent to buying or not buying, because pete for the product. Instead, participants first state their WTP, WTP reflects the product’s inherent value in monetary and then a price is drawn randomly. If her or his stated WTP is terms. That is, the product and the money have the equal to or more than the drawn price, a participant must buy same value, so spending to obtain a product is the same as the product for the drawn price. If the stated WTP is less than keeping the money. the drawn price, she or he may not buy the product. Similar to the Vickrey auction, the stated WTP does not influence the Hypothetical versus real WTP drawn price and therefore does not determine the final price. Again then, the dominant strategy is to state actual WTP. The first dimension in Table 1 distinguishes between hypo- Not all direct measures of RWTP are theoretically incentive thetical and real contexts, according to whether the measure compatible. For example, in an English auction, the price in- includes a payment obligation or not. Most measures of creases until only one interested buyer is left, who eventually RWTP rely on incentive-compatible methods, which ensure buys the product for the highest announced bid. Every bidder it is the participant’s best option to reveal his or her true WTP. has an incentive to bid up WTP (Rutström 1998), so an Several different incentive-compatible methods are available English auction reveals all bidders’ WTP, except for the win- (Noussair et al. 2004) and have been used in prior empirical ner’s, who stops bidding after the last competitor leaves. studies to measure RWTP. However, all methods that measure Therefore, the English auction is not theoretically incentive RWTP require a finished, sellable version of the product. compatible, yet the mean RWTP obtained tend to be similar Therefore, practitioners regularly turn to HWTP during the to those resulting from incentive-compatible methods (Kagel product development process, before the final product actually et al. 1987). Therefore, we treat studies using an English auc- exists. In addition, measuring RWTP can be difficult and ex- tion as direct measures of RWTP. pensive, for both practitioners and researchers. Therefore, the Finally, the online auction platform eBay can provide accuracy of HWTP methods is of interest to practitioners and a direct measure of RWTP. Unlike a Vickrey auction, academics alike. Because RWTP reflects consumers’ actual the auction format implemented in eBay allows partici- valuation of a product, it provides a clear benchmark for com- pants to bid multiple times, and the auction has a fixed parison with HWTP. We integrate existing empirical evidence endpoint. Although multiple bids from one participant about the accuracy of various direct and indirect methods to imply that not every bid reveals true WTP, the highest and latest bid does provide this information (Ockenfels and Roth measure HWTP. 502 J. of the Acad. Mark. Sci. (2020) 48:499–518 2006). Theoretically then, eBay auctions are not incentive Hypotheses compatible either (Barrot et al. 2010), but the empirical results from eBay and Vickrey auctions are highly comparable We predict that several moderators may affect the hypothetical (Ariely et al. 2005; Bolton and Ockenfels 2014). Schlag bias. In addition, we control for several variables. The (2008) gauges RWTP from eBay by exclusively using the potential moderators constitute four main categories: (1) highest bid from each participant but disregarding the win- methods for measuring WTP, (2) research stimulus, (3) ners’ bid. We include this study in our meta-analysis as an general research design of the study, and (4) the publi- example of a direct method. cation in which the study appeared. The last category only contains control variables. Indirect methods to measure WTP Moderators: HWTP measurement Among the variety of indirect methods to compute WTP (Lusk and Schroeder 2004), the most prominent is choice- Direct methods for measuring HWTP have some theoretical based conjoint (CBC) analysis. Each participant chooses sev- drawbacks compared to indirect methods. First, asking con- eral times among multiple alternative products, including a sumers directly for their HWTP tends to prime them to focus Bno choice^ option that indicates the participant does not like on the price (Breidert et al. 2006), which is unlike a natural any of the offered products. Each product features several shopping experience in which consumers choose among sev- product attributes, and each attribute offers various levels. eral products that vary on multiple attributes. That is, direct To measure WTP, price must be one of the attributes. From methods may cause atypically high price consciousness the collected choices, it is possible to compute individual util- (Völckner 2006). Indirect methods address this drawback by ities for each presented attribute level and, by interpolation, forcing participants to weigh the costs and benefits of different each intermediate value. Ultimately, WTP can be derived ac- alternatives. Second, when asked directly, consumers might cording to the following relationship (Kohli and Mahajan try to answer strategically if they suspect their answers might 1991), which is the most often used approach in the studies influence future retail prices (Jedidi and Jagpal 2009). included in the meta-analysis: Because indirect methods do not prompt participants to state their HWTP directly, strategic answering may be less likely. u þ uðÞ p ≥u ; itj−p i Third, direct statements of HWTP are cognitively challenging, whereas methods that mimic realistic shopping experiences where u is the utility of product t excluding the utility of it∣− p require less cognitive effort (Brown et al. 1996). the price, and u (p) is the utility for a price level p for i Indirect methods for measuring HWTP also have some consumer i. In accordance with Miller et al. (2011)and drawbacks that might influence the hypothetical bias. First, Jedidi and Zhang (2002), we define u as the utility of researchers using a CBC must take care to avoid a number- the Bno choice^ option. The resulting WTP indicates the of-levels effect, especially in pricing studies (Eggers and highest price p that still fulfills the relationship. In their Sattler 2009). To do so, they generally can test only a few web appendix, Miller et al. (2011)provideanumerical different prices, which might decrease accuracy if the limita- example. tion excludes the HWTP of people with higher (lower) In principle, indirect methods provide measures of HWTP, WTP than the highest (lowest) price shown. Second, because the choices and other judgments expressed by the indirect methods assume a linear relationship between participants do not have any financial consequences. Efforts price levels, through their use of linear interpolation to measure RWTP indirectly attempt to insert a downstream (Jedidi and Zhang 2002). mechanism that introduces a binding element (Wlömert and Overall then, measuring HWTP with direct or indirect Eggers 2016). For example, Ding et al. (2005) propose to methods could evoke the hypothetical bias, and extant randomly choose one of the selected alternatives and make evidence is mixed (e.g. Miller et al. 2011), featuring that choice binding. Every choice could be the binding one, arguments for the superiority of both method types. so participants have an incentive to reveal their true Therefore, we formulate two competing hypotheses. preferences throughout the task. Ding (2007) also incorporates the idea of the BDM lottery, proposing that participants could H1a:MeasuringHWTPwithanindirect method leads to take part in a conjoint task, from which it is possible to infer a smaller hypothetical bias compared to direct their WTP for one specific product, according to the methods. person’s choices in the conjoint task. The inferred WTP then enters the BDM lottery subsequently, so par- H1b: Measuring HWTP with a direct method leads to a ticipants have an incentive to reveal their true preferences in smaller hypothetical bias compared to indirect the conjoint task. methods. J. of the Acad. Mark. Sci. (2020) 48:499–518 503 Moderators: research stimulus tend to result in stronger effects (Ariely et al. 2006). Fox and Tversky (1995) identify stronger effects for a within-subject When asked for their HWTP, personal budget constraints do versus between-subject design in the context of ambiguity not exert an effect, because the consumer does not actually aversion; Ariely et al. (2006) similarly find such stronger ef- have to pay any money. However, when measuring RWTP, fects for a within-subject design for a study comparing WTP budget constraints limit the amount that participants may con- and willingness to accept. According to Frederick and tribute (Brown et al. 2003). For low-priced products, this con- Fischhoff (1998), participants in a within-subject design ex- straint should have little influence on the hypothetical bias, press greater WTP differences for small versus large because the RWTP likely falls within this budget. For high- quantities of a product than do those in a between- priced products though, budget constraints likely become subject design. Therefore, more relevant; participants might state HWTP estimates that they could not afford in reality, thereby increasing the hypo- H5: The hypothetical bias is greater for within-subject designs thetical bias. Thus, we hypothesize: compared with between-subject designs. H2: The hypothetical bias is greater for products with a higher Another source of uncertainty pertains to product perfor- mance, and it increases when the consumer can only review value. images (e.g., online) rather than inspect the product itself A classic categorization of consumer goods cites conve- physically (Dimoka et al. 2012). Consequently, many con- nience, shopping, and specialty goods, depending on the sumers test products in a store to reduce their uncertainty amount of search and price comparison effort they require before buying them online (showrooming) (Gensler et al. (Copeland 1923). Consumers engage in more search effort 2017). Similarly, consumers’ uncertainty might be reduced when they have trouble assessing a product’s utility. in a WTP experiment by giving them an opportunity to Hofstetter et al. (2013) in turn show that the hypothetical bias inspect and test the product before bidding. Bushong et al. decreases as people gain means to assess a product’s utility, (2010) show that participants state a higher RWTP when real and in a parallel finding, Sichtmann et al. (2011)show that products, rather than images, have been displayed. As higher product involvement reduces the hypothetical bias. Hofstetter et al. (2013) note, greater uncertainty increases the That is, higher product involvement likely reduces the need hypothetical bias. We hypothesize: for intensive search effort. Therefore, we hypothesize: H6: Giving participants the opportunity to test a product before H3: The hypothetical bias is least for convenience goods, bidding reduces the hypothetical bias. greater for shopping goods, and greatest for specialty goods. Finally, researchers often motivate participation in an experiment by paying some remuneration or providing Consumers face uncertainty about an innovative prod- an initial balance to bid in an auction. Equipping par- uct’s performance and their preferences for it (Hoeffler ticipants with money might change their RWTP, because 2003). According to Sichtmann et al. (2011), stronger they gain an additional budget. They even might con- consumer preferences lower the hypothetical bias. In sider this additional budget like a coupon, which they contrast, greater uncertainty reduces their ability to as- add to their original RWTP. Consumers in general over- sess a product’s utility, which increases the hypothetical state their WTP in hypothetical contexts, so providing a bias (Hofstetter et al. 2013). Finally, Hofstetter et al. participation fee could decrease the hypothetical bias. (2013) show that the perceived innovativeness of a Yet Hensher (2010) criticizes the use of participation product increases the hypothetical bias. Consequently, fees, noting that they can bias participants’ RWTP. H4: The hypothetical bias is greater for innovations compared H7: Providing participants (a) a participation fee or (b) an to established products. initial balance decreases the hypothetical bias. Moderators: research design Collection and coding of studies The research design also might influence the hypothetical bias (List and Gallet 2001;Murphy et al. 2005). In particular, the Collection of studies subject design of an experiment determines the results, in the sense that between-subject designs tend to be more conserva- With our meta-analysis, we aim to generalize empirical find- tive (Charness et al. 2012), whereas within-subject designs ings about the relative accuracy of HWTP measures, so we 504 J. of the Acad. Mark. Sci. (2020) 48:499–518 conducted a search for studies that report ESs of these mea- by using Copeland’s(1923)classificationofconsumer goods sures. We used three inclusion criteria. First, the study had to according to the search and price comparison effort they re- measure consumers’ HWTP and RWTP for the same product quire, as convenience goods, shopping goods, or specialty or service, so that we could determine the hypothetical bias. goods. We use an ordinal scale for product type and therefore Second, the research stimulus had to be private goods or ser- assessed interrater reliability with a two-way mixed, consis- vices. Third, we included only studies that reported the mean tency-based, average-measure intraclass correlation coeffi- and standard deviation (or values that allow us to compute it) cient (ICC) (Hallgren 2012). The resulting ICC of 0.82 is rated of HWTP and RWTP or for which the authors provided these as excellent (Cicchetti 1994); the two independent coders values at our request. agreed on most stimuli. The lack of any substantial To identify relevant studies, we applied a keyword search measurement error indicates no notable influence on in different established online databases (e.g., Science Direct, the statistical power of the subsequent analyses EBSCO) and Google Scholar across all research disciplines (Hallgren 2012). Any inconsistent codes were resolved and years. The keywords included Bwillingness-to-pay,^ through discussion between the two coders. We include Breservation price,^ Bhypothetical bias,^ and Bconjoint product type in the analyses with two dummy variables analysis.^ We also conducted a manual search among leading for shopping and specialty goods, and convenience goods are marketing and economics journals. To reduce the risk of a captured by the intercept. publication bias, we extended our search to the Social In the third category, we consider moderators that deal with Science Research Network, Research Papers in Economics, the research design. The type of experiment HWTP and type of and the Researchgate network, and we checked for relevant experiment RWTP capture whether the studies measure dissertations whose results had not been published in journals. HWTP and RWTP in field or lab experiments, respectively. Moreover, we conducted a cross-reference search to find other Experiments conducted during a lecture or class are designat- studies. We contacted authors of studies that did not report all ed lab experiments. Offline/online HWTP and offline/online relevant values and asked them for any further relevant studies RWTP indicate whether the experiment is conducted online they might have conducted. Ultimately, we identified 77 stud- or offline; the type of subject design reveals if researchers used ies reported in 47 articles, accounting for 117 ESs and total a between- or within-subject design. The moderator opportu- sample sizes of 24,441 for HWTP and 20,766 for RWTP. nity to test indicates whether participants could inspect the product in more detail before bidding. Participation fee and Coding initial balance capture whether participants received money for showing up or for spending in the auction, respectively. As mentioned previously and as indicated by Table 2,we We identify a student sample when the sample consists of exclusively students; mixed samples are coded as not a student classify the moderators into four categories: (1) methods for measuring WTP, (2) research stimulus, (3) general research sample. Methods for measuring RWTP often are not self-ex- design of the study, and (4) the publication in which the study planatory, so researchers introduce them to participants, using appears. In the first category, the main moderator of interest is various types of instruction. We focused on whether incentive the type of measurement HWTP, that is, the direct versus indi- compatibility concepts or the dominant bidding strategy were rect measurement of HWTP. Two other moderators deal with explained, using a moderator introduction of method for RWTP measurement. Type of measurement RWTP similarly RWTP with four values. It equals Bnone^ if the method was distinguishes between direct and indirect measures, whereas not introduced, Bexplanation^ if the method and its character- incentive compatible reflects the incentive compatibility (or istics were explained, Btraining^ if mock auctions or questions not) of the method. designed to understand the mechanism occurred before the The second category of moderators, dealing with the re- focal auction took place or questions were asked, and Bnot search stimulus, includes value, or the mean RWTP for the mentioned^ if the study does not indicate whether the method corresponding product. The experiments in our meta-analysis was introduced. With this nominal scale, we include this mod- span different countries and years, so we converted all values erator by using three dummy variables for explanation, train- into U.S. dollars using the corresponding exchange rates. The ing, and not mentioned, while the none category is captured variable variance ES captures participants’ uncertainty and by the intercept. Finally, we include region. Almost all heterogeneity when evaluating a product. With regard to the the studies were conducted in North America or Europe; products, we checked whether they were described as new to we distinguish North America from Bother countries the consumer or innovations, which enabled us to code the (mostly Europe).^ innovation moderator. The moderator product/service distin- The fourth category of moderators contains publication guishes products and services. Finally, the product type mod- characteristics. We checked whether a study underwent a peer erator requires more subjective judgment. Two independent review process (peer reviewed), reflected a marketing or eco- coders, unaware of the research project, coded product type nomics research domain (discipline), how many citations it J. of the Acad. Mark. Sci. (2020) 48:499–518 505 Table 2 Moderators Category Moderator Values Variables Description WTP measurement Type of measurement HWTP Direct Dummy variable (indirect = 1) Whether HWTP is measured directly or indirectly. Indirect Type of measurement RWTP Direct Dummy variable (indirect = 1) Whether RWTP is measured directly or indirectly. Indirect Incentive compatible No Dummy variable (yes = 1) Whether the method for measuring RWTP Yes is incentive compatible. Research stimulus Value Metric variable The mean RWTP converted into US dollars. Product type Convenience goods Two dummy variables for shopping Classification of respective stimulus based Shopping goods and specialty goods; convenience on an Copeland (1923). goods are captured by the intercept Specialty goods Innovation No Dummy variable (yes = 1) Whether the stimulus is an innovation. Yes Product/service Product Dummy variable (service = 1) Whether the stimulus is a product or a service. Service Variance ES Metric variable The variance of the ES. Research design Type of subject design Between Dummy variable (within = 1) Whether it is a between or a within subject design. Within Opportunity to test No Dummy variable (yes = 1) Whether participants had the chance to test Yes the product before bidding. Participation fee No Dummy variable (yes = 1) Whether participants received a participation fee. Yes Initial balance No Dummy variable (yes = 1) Whether participants received an initial Yes balance for the auction. Type of experiment HWTP Field Dummy variable (lab = 1) Whether HWTP is measured in a field Lab or a lab experiment. Type of experiment RWTP Field Dummy variable (lab = 1) Whether RWTP is measured in a field Lab or a lab experiment. Offline/online HWTP Offline Dummy variable (online = 1) Whether HWTP is measured offline or online. Online Offline/online RWTP Offline Dummy variable (online = 1) Whether RWTP is measured offline or online. Online Student sample No Dummy variable (yes = 1) Whether the sample consists of students only. Yes Introduction of method for RWTP None Three dummy variables for explanation, How the method for measuring RWTP Explanation training, and not mentioned; None was introduced. is captured by the intercept Training Not mentioned Region Other Countries (mostly Europe) Dummy variable (North America = 1) Region where the experiment was conducted. North America 506 J. of the Acad. Mark. Sci. (2020) 48:499–518 had on Google Scholar (citations), and in which year it was published (year). Methodology Effect size To determine the hypothetical bias induced by different methods, we need an ES that represents the difference be- tween obtained values for HWTP and RWTP. When the dif- ferences stem from a comparison of a treatment and a control group, standardized mean differences (SMD) are appropriate measures (e.g. Abraham and Hamilton 2018; Scheibehenne et al. 2010). Specifically, to compute SMD, researchers divide the difference in the means of the treatment and the control group by the standard deviation, which helps to control for differences in the scales of the dependent variables in the experiments. Accordingly, it applies to studies that measure the same outcome on different scales (Borenstein et al. 2009, p. 25). In contrast, the ESs in our meta-analysis rely on the same scale; they differ in their position on the scale, because the products evoke different WTP values. In this case, the standard deviation depends on not only the scale range but also many other relevant factors, so the standard deviation should not be used to standardize the outcomes. In addition, as studies may have used alternate experimental designs, dif- ferent standard deviations could be used across studies, lead- ing to standardized mean differences that are not directly com- parable (Morris and DeShon 2002). Rather than the SMD, we therefore use a response ratio to assess ES, because it depends on the group means only. Specifically, the response ratio is the mean outcome in an experimental group divided by that in a corresponding control group, such that it quantifies the percentage of variation be- tween the experimental and control groups (Hedges et al. 1999). Unlike SMD, the response ratio applies when the out- come is measured on a ratio scale with a natural zero point, such as length or money (Borenstein et al. 2009). Accordingly, the response ratio often assesses ES in meta-analyses in ecol- ogy domains (Koricheva and Gurevitch 2014), for which many outcomes can be measured on ratio scales. To the best of our knowledge though, the response ratio has not been adopted in meta-analyses in marketing yet. However, it is common practice to specify a multiplicative, instead of a linear, model when assessing the effects of marketing instruments on product sales or other outcomes (Leeflang et al. 2015). Hence, it would be a natural option to use an effect measure representing proportion- ate changes, instead of additive changes, when deriving empirical generalizations on marketing subjects like re- sponse effects to mailing campaigns. For our effort, we define the response ratio as Table 2 (continued) Category Moderator Values Variables Description Publication characteristics Peer reviewed No Dummy variable (yes = 1) Whether the study was peer reviewed. Yes Discipline Economics Dummy variable (marketing = 1) Corresponding research discipline Marketing Citations Metric variable Number of citations in Google Scholar Year Metric variable Year the study was published Moderators in italics are control variables J. of the Acad. Mark. Sci. (2020) 48:499–518 507 HWTP avoid dependent ESs (Grewal et al. 2017). However, nested response ratio ¼ ; RWTP data structures and the associated dependent ESs are prominent in marketing research, so Bijmolt and Pieters where μ and μ are the means of a study’s HWTP RWTP (2001) suggest using a three-level model to account for de- corresponding HWTP and RWTP values. pendency, by adding error terms on all levels. In turn, market- For three reasons, we run statistical analyses using the nat- ing researchers started to model dependence stochastically by ural logarithm of the response ratio as the dependent variable. applying multi-level regression models (e.g. Abraham and First, the use of the natural logarithm linearizes the metric, so Hamilton 2018;Artset al. 2011;Babić Rosario et al. 2016; deviations in the numerator and denominator have the same Bijmolt et al. 2005;Edeling andFischer 2016; Edeling and impact (Hedges et al. 1999). Second, the parameters (β)for Himme 2018). However, when additional information about the moderating effects in the meta-regression are easy to in- correlations among the ESs are available, it is most accurate to terpret, as a multiplication factor, by taking the exponent of the model dependence explicitly by incorporating the dependen- estimate (Exp(β)). Most moderators are dummy variables, and cies in the covariance matrix at the within-study level (Gleser a change of the corresponding dummy value results in a and Olkin 2009). In contrast to modeling dependence stochas- change of (Exp(β) − 1) ∗ 100% in the hypothetical bias. tically, the covariances are not estimated but rather are calcu- However, this point should not be taken to mean that the lated on the basis of the provided information. To the best of difference of the hypothetical bias between two conditions our knowledge, this approach has not been applied by meta- of a moderator is Exp(β) − 1 percentage points, because that analyses in marketing previously. value depends on the values of other moderators. Third, the To model stochastic dependence among ESs explicitly, we distribution of the natural logarithm of response ratios is ap- follow Kalaian and Raudenbush (1996)and useamultivariate proximately normally distributed (Hedges et al. 1999). mixed linear model with two levels: a within-studies level and Consequently, we define ES as: a between-studies level. On the former, we estimate a com- plete vector of the corresponding K true ESs, α =(α , μ i 1i HWTP ES ¼ ln : … , α ) , for each study i. However, not every study exam- Ki RWTP ines all possible K ESs, so the vector of ES estimates for study i, ES ¼ðÞ ES ; …; ES ,contains L of the total possible K i 1i L i i Modeling stochastically dependent effect sizes ESs, and by definition, L ≤ K.That is, K equals the maximum explicitly number of dependent ESs in one study (i.e., six in our sample), and every vector ES contains between one and six estimates. Most meta-analyses assume the statistical independence of The first-level model regresses α on ES with an indicator ki i observed ESs, but this assumption only applies to limited variable Z , which equals 1 if ES estimates α and0other- lki li ki cases; often, ESs are stochastically dependent. Two main wise, according to the following linear model: types of dependencies arise between studies and ESs. First, ES ¼ ∑ α Z þ e ; studies can measure and compare several treatments or vari- li ki lki li k¼1 ants of a type of treatment against a common control. In our or in matrix notation, context, for example, a study might measure HWTP with different methods and compare the results to the same ES ¼ Z α þ e : i i i i RWTP, leading to multiple ESs that correlate because they The first-level errors e are assumed to be multivariate nor- share the same RWTP. Treating them as independent would erroneously add RWTP to the analysis twice. This type of mal in their distribution, such that e ~N(0, V ), where V is a i i i K × K covariance matrix for study i, or the multivariate ex- study is called a multiple-treatment study (Gleser and Olkin i i 2009). Second, studies can produce several dependent ESs by tension of the V-known model for the meta-regression. The elements of V must be calculated according to the obtaining more than one measure from each participant. For example, a study might measure HWTP and RWTP for sev- chosen ES measure (see Web Appendix B; Gleser and Olkin 2009; Lajeunesse 2011). In turn, they form the eral products from the same sample. The resulting ESs basis for modeling the dependent ESs appropriately. correlate, because they are based on a common subject. The vector α of a study’strueESisestimated by This scenario represents a multiple-endpoint study weighted least squares, and each observation is weight- (Gleser and Olkin 2009). ed by the inverse of the corresponding covariance ma- There are different approaches for dealing with stochasti- trix (Gleser and Olkin 2009). cally dependent ESs, such as ignoring or avoiding depen- The linear model for the second stage is dence, or else modeling dependence stochastically or explic- itly (Bijmolt and Pieters 2001; van den Noortgate et al. 2013). α ¼ β þ ∑ β X þ u ; ki k0 km mi ki m¼1 In marketing research, it is still common, and also suggested to 508 J. of the Acad. Mark. Sci. (2020) 48:499–518 or in matrix notation symmetric, which indicates the absence of a publication bias. Finally, as the competing H1a and H1b indicate, we do not α ¼ X β þ u ; i i i expect a strong selection mechanism in research or publication processes that would favor significant or high (or low) ESs. where the K ESs become the dependent variable. The resid- Thus, we do not consider publication bias a serious concern uals u are assumed to be K-variate normal with zero average ki for our study. and a covariance matrix τ. Then X reflects the moderator To detect outliers in the data, we checked for extreme ESs variables. By combining both levels, the resulting model is using the boxplot (see Web Appendix D, Figure WA2). We are ES ¼ Z X β þ Z u þ e : i i i i i i especially interested in the moderator type of measurement HWTP, so we computed separate boxplots for the direct and Estimates for τ are based on restricted maximum likeli- indirect measures of HWTP and thereby identified one obser- hood. The analysis uses the metafor package for meta- vation for each measurement type (indirect Kimenju et al. analyses in R (Viechtbauer 2010). 2005; direct Neill et al. 1994) for which the ESs (0.9079; 0.9582) exceeded the upper whisker, defined as the 75% quantile plus 1.5 times the box length. Kimenju et al. (2005) Data screening and descriptive statistics report HWTP ($11.68) values from an indirect method that overestimate RWTP ($94.48) by a factor of eight; we exclud- One of the criticisms of meta-analyses is the risk of publica- ed it from our analyses. Neill et al. (1994) report HWTP tion bias, such that all the included ESs would reflect the non- ($109) that overestimates RWTP ($12) by a factor of nine random sampling procedure. Including unpublished studies when excluding outliers, and it is another outlier in our data- can address this concern; in our sample, 22 of 117 ESs come base. Thus, we excluded two of 117 observations, or less than from unpublished studies, for an unpublished work proportion 5% of the full sample, which is a reasonable range (Cohen of 19%, which favorably compares with other meta-analyses et al. 2003,p.397). pertaining to pricing, such as 10% in Tully and Winer (2014), The remaining 115 ESs represent 77 studies reported by 47 9% in Bijmolt et al. (2005), or 16% in Abraham and Hamilton different articles, with a total sample size of 24,347 for HWTP (2018). The funnel plot for the sample, as depicted in Fig. 1,is and 20,656 for RWTP. Sixteen out of these 115 ESs indicate Fig. 1 Funnel plot Notes: Six ESs with a very high standard error are not included here, to improve readability. A funnel plot with all ESs in Web Appendix C confirms the lack of a publication bias. J. of the Acad. Mark. Sci. (2020) 48:499–518 509 Table 3 Descriptive statistics Mean SD N Mean SD N Mean SD N Mean SD N Type of measurement HWTP Direct Indirect 0.1818 0.1709 85 0.2280 0.2048 30 Type of measurement RWTP Direct Indirect 0.1869 0.1776 106 0.2758 0.2055 9 Incentive compatible No Yes 0.1294 0.1709 24 0.2109 0.1801 91 Product type Convenience Shopping Specialty 0.1954 0.1852 38 0.1339 0.1554 48 0.2911 0.1758 29 Innovation No Yes 0.1760 0.1807 76 0.2287 0.1773 39 Product/service Product Service 0.2482 0.1797 80 0.0696 0.1840 35 Type of subject design Between Within 0.1800 0.1740 42 0.1800 0.1740 73 Opportunity to test No Yes 0.1626 0.1746 75 0.2524 0.1789 40 Participation fee No Yes 0.1400 0.1731 106 0.2747 0.1617 9 Initial balance No Yes 0.1774 0.1662 69 0.3879 0.2365 46 Type of experiment HWTP Field Lab 0.2716 0.1663 42 0.1491 0.1741 73 Type of experiment RWTP Field Lab 0.2743 0.1663 39 0.1526 0.1741 76 Offline/online HWTP Offline Online 0.1888 0.1893 87 0.2096 0.1521 28 Offline/online RWTP Offline Online 0.1880 0.1857 91 0.2159 0.1612 24 Student sample No Yes 0.2635 0.1571 57 0.1254 0.1769 58 Introduction of method for RWTP None Explanation Training Not mentioned 0.1689 0.1670 17 0.1657 0.1863 65 0.3464 0.2096 12 0.2201 0.1144 22 Region Other countries (mostly Europe) North America 0.2678 0.1773 32 0.1653 0.1746 83 Peer reviewed No Yes 510 J. of the Acad. Mark. Sci. (2020) 48:499–518 an underestimation of RWTP, resulting from direct (12) and indirect (4) methods. Table 3 contains an overview of the moderators’ descriptive statistics. Type of measurement HWTP reveals some mean differences between direct (0.1818) and indirect (0.2280) measures, which represents model-free support for H1b. The descriptive statistics of prod- uct type suggest a higher mean ES for specialty goods (0.2911) than convenience (0.1954) or shopping (0.1399) goods, in accordance with H3. With regard to innovation, we find a higher ES mean for innovative (0.2287) compared with non-innovative (0.1760) products, as we predicted in H4. Model-free evidence gathered from the moderators that reflect the research design also supports H5, in that the mean for between-subject designs is lower (0.1800) than that for within-subject designs (0.2798). The descriptive statistics can- not confirm H6 though, because giving participants an oppor- tunity to test a product before stating their WTP increases the ES (0.2525) relatively to no such opportunity (0.1626). We also do not find support for H7 in the model-free evidence, because studies with an initial balance and participation fee report higher ESs than those without. After detecting outliers and before conducting the meta- regressions, we checked for multicollinearity by calculating 1/(2 ∗ df) the generalized variance inflation factor GVIF , which is used when there are dummy regressors from categorical variables; it is comparable to the square root of the variance pffiffiffiffiffiffiffiffi inflation factor VIF for 1 degree of freedom (df = 1) (Fox and Monette 1992). In an iterative procedure, we excluded the 1/(2 ∗ df) moderator with the highest GVIF and reestimated the 1/(2 ∗ df) model repeatedly, until all moderators had a GVIF <2. This cut-off value of 2 has been applied in other disciplines (Pebsworth et al. 2012;Vegaetal. 2010) and is comparable to a VIF cut-off value of 4, within the range of suggested values (i.e., 3–5; Hair Jr et al., 2019, p. 316). Accordingly, we ex- cluded moderators—all control variables that do not appear in any hypotheses—in the following order: type of experiment 1/(2 ∗ df) HWTP (GVIF = 3.4723), offline/online RWTP 1/(2 ∗ df) 1/(2 ∗ df) (GVIF =3.2504), discipline (GVIF =2.2.4791), 1/(2 ∗ df) product/service (GVIF =2.2.3290), and peer reviewed 1/(2 ∗ df) (GVIF =2.0419). Results To address our research questions about the accuracy of WTP measurement methods and the moderators of this perfor- mance, we performed several meta-regressions in which we varied the moderating effects included in the models. First, we ran an analysis without any moderators. Second, we ran a meta-regression with all the moderators that met the multicollinearity criteria. Third, we conducted a stepwise anal- ysis, dropping the non-significant moderators one by one. Table 3 (continued) Mean SD N Mean SD N Mean SD N Mean SD N 0.1843 0.1938 21 0.1960 0.1785 94 Discipline Economics Marketing 0.1194 0.1435 65 0.2907 0.1789 50 Moderators in italics are control variables J. of the Acad. Mark. Sci. (2020) 48:499–518 511 Table 4 Results of full and reduced models Full model Reduced model Estimate EXP (Estimate) Std. Err. p Value Significance Estimate EXP (Estimate) Std. Err. p Value Significance Intercept −2.7030 0.0670 9.4731 0.7754 0.0831 1.0867 0.0500 0.0965 * Type of measurement HWTP (indirect) 0.1027 1.1082 0.0404 0.0110 ** 0.0905 1.0947 0.0382 0.0177 ** Type of measurement RWTP (indirect) −0.0132 0.9869 0.0587 0.8216 Incentive compatible (yes) 0.0488 1.0500 0.0574 0.3951 Value 0.0002 1.0002 0.0001 0.0656 * Product type (shopping) 0.0353 1.0359 0.0445 0.4274 0.0028 1.0028 0.0371 0.9388 Product type (specialty) 0.1615 1.1753 0.0476 0.0007 *** 0.1624 1.1763 0.0393 <.0001 *** Innovation (yes) −0.0004 0.9996 0.0505 0.9944 Variance ES 0.1752 1.1915 0.2527 0.4883 Type of subject design (within) 0.0878 1.0918 0.0439 0.0455 ** Opportunity to test (yes) 0.0139 1.0140 0.0468 0.7658 Participation fee (yes) 0.0522 1.0536 0.0489 0.2858 Initial balance (yes) 0.0978 1.1027 0.0746 0.1896 Type of experiment RWTP (lab) −0.0050 0.9950 0.0471 0.9156 Offline/online HWTP (offline) 0.0904 1.0946 0.0553 0.1019 Student sample (yes) −0.1134 0.8928 0.0446 0.0110 ** −0.1026 0.9025 0.0344 0.0021 *** Introduction of method for RWTP (explanation) 0.0497 1.0510 0.0579 0.3908 0.0671 1.0694 0.0420 0.1095 Introduction of method for RWTP (training) 0.1846 1.2027 0.0762 0.0154 ** 0.2032 1.2253 0.0604 0.0008 *** Introduction of method for RWTP (not mentioned) 0.1299 1.1387 0.0784 0.0974 * 0.1546 1.1672 0.0524 0.0032 *** Region (North America) −0.0765 0.9264 0.0467 0.1013 Citations 0.0001 1.0001 0.0001 0.3300 Year 0.0013 1.0013 0.0047 0.7809 τ 0.0031 0.0047 R 0.7416 0.6083 AICc 45.6093 −23.4892 Significance codes: *** p < 0.01; ** p < 0.05; * p < 0.1 Moderators in italics are control variables 512 J. of the Acad. Mark. Sci. (2020) 48:499–518 Fig. 2 Overestimation of RWTP 50% 40% 30% 20% 10% -2 0% Base scenario Product type Student sample Introduction of Introduction of (specialty) (yes) method for method for -10% RWTP (not RWTP (training) mentioned) Direct measurement of HWTP Indirect measurement of HWTP Notes: The base scenario is as follows: product type = convenience good,introduction of method for RWTP = explanation,student sample = no. The first model, including only the intercept, results in an H7a, or H7b though, because opportunity to test (β =0.0139, estimate (β) of 0.1889 with a standard error (SE) of 0.0183 Exp(β) = 1.0140, SE = 0.0468, p = 0.7658), participation fee and a p value < .0001. The estimate corresponds to an average (β = 0.0522, Exp(β) = 1.0536, SE = 0.0489, p = 0.2858), and hypothetical bias of 20.79% (Exp(0.1889) = 1.2079), meaning initial balance (β = 0.0978, Exp(β) = 1.1027, SE = 0.0746, that on average, HWTP overestimates RWTP by almost 21%. p = 0.1896) do not show significant effects. The analysis with all the moderators that met the Of the control variables, only student sample (β = − multicollinearity threshold produces the estimation results in 0.1134, Exp(β) = 0.8928, SE = 0.0446, p = 0.0110) and intro- Table 4.The type of measurement HWTP has a significant, duction of method for RWTP (training) (β = 0.1846, positive effect (β = 0.1027, Exp(β) = 1.1082, SE = 0.0404, Exp(β) = 1.2027, SE = 0.0762, p = 0.0154) exert significant p = 0.0110), indicating that indirect measures overestimate effects in the full model. If a study only includes students, RWTP more than direct measures do. We reject H1a and con- the hypothetical bias gets smaller by 11%; conducting mock firm H1b. In particular, the ratio of HWTP to RWTP should be auctions before measuring RWTP increases the hypothetical multiplied by 1.1082, resulting in an overestimation by indi- bias by 20%. rect methods of an additional 10.82%. Value has a significant, Finally, we ran analyses in which we iteratively excluded positive effect at the 10% level (β =0.0002, Exp(β) = 1.0002, moderators until all remaining moderators were significant at SE = 0.0001, p = 0.0656), in weak support of H2. The percent- the 5% level. We excluded the moderator with the highest p age overestimation of RWTP by HWTP increases slightly, by value from the full model, reran the analysis, and repeated this an additional 0.02%, with each additional U.S. dollar increase procedure until we had only significant moderators left. We in value. For H3, we find no significant difference in the treated the dummy variables from the nominal/ordinal moder- hypothetical bias between convenience and shopping goods, ators product type and introduction of method for RWTP as yet specialty goods evoke a significantly higher hypothetical belonging together, and we considered these moderators as bias than convenience goods (β = 0.1615, Exp(β) = 1.1753, significant when one of the corresponding dummy variables SE = 0.0476, p < .0001). This finding implies that the hypo- showed a significant effect. The exclusion order was as fol- thetical bias is greater for products that demand extraordinary lows: innovation, type of experiment RWTP, type of measure- search effort, as we predicted in H3. We do not find support ment RWTP, opportunity to test, year, variance ES, incentive for H4, because innovation does not influence the hypothetical compatible, initial balance, citations, participation fee, bias significantly (β = − 0.0004, Exp(β) = 0.9996, SE = region, value, type of subject design,and offline/online 0.0505, p =0.9944). HWTP. The results in Table 4 reconfirm the support for H1b, For moderators from the research design category, we con- because the type of measurement HWTP has a positive, sig- firm the support we previously identified for H5. Measuring nificant effect (β =0.0905, Exp(β) = 1.0947, SE =0.0382, p = HWTP and RWTP using a within-subject design results in a 0.0177), resulting in a multiplication factor of 1.0947. The greater hypothetical bias than does a between-subject design overestimation of RWTP increases considerably for measures (β = 0.0878, Exp(β) = 1.0918, SE = 0.0439, p = 0.0455), such of WTP for specialty goods (β = 0.1624, Exp(β) = 1.1763, that the hypothetical bias increases by an additional 9.18 per- SE = 0.0393, p < .0001), in support of H3. Yet we do not find centage points in this case. We do not find support for H6, support for any other hypotheses in the reduced model. J. of the Acad. Mark. Sci. (2020) 48:499–518 513 Regarding the control variables, student sample (β = − mentioned), region,and peer reviewed have significant 0.1026, Exp(β) = 0.9025, SE = 0.0344, p = 0.0021) again has effects (5% level). The moderators excluded from the a significant effect, and introduction of method for main models due to multicollinearity (product/service, RWTP affects the hypothetical bias significantly. In this type of experiment HWTP, offline/online RWTP,and case, the hypothetical bias increases when the article discipline) do not show significant influences. does not mention any introduction of the method for Next, we estimated two models with all ESs, including the measuring RWTP to participants (β = 0.1546, Exp(β)= two outliers, but varied varied the number of included moder- 1.1672, SE = 0.0524, p = 0.0032) and when the method ators (Models 2 and 3 in Table WA2). The results remain involves mock auctions (β = 0.2032, Exp(β) = 1.2253, similar to our main findings. Perhaps most important, SE = 0.0604, p = 0.0008). the type of measurement HWTP has a significant effect For ease of interpretation, we depict the hypothetical bias on the hypothetical bias, comparable in size to the ef- for different scenarios in Fig. 2. The reduced model provides a fect in the main model. better model fit, according to the corrected Akaike informa- In addition, instead of the multivariate mixed linear model, tion criterion (AICc) (AICc =45.61, AICc we used a random-effects, three-level model, such that the ES full model reduced model- = − 23.49), so we use it as the basis for the simulation. The measures nested within studies with a V-known model at the base scenario depicted in Fig. 2 measures WTP for conve- lowest level (Bijmolt and Pieters 2001; van den Noortgate nience goods, explains the method for measuring RWTP to et al. 2013), which can account for dependence between ob- participants, and does not include solely students. The other servations. We estimated the two main models and the three scenarios are adaptions of the base scenario, where one of the robustness check models with this random-effects three-level three aforementioned characteristics is changed. In the base model (Models 4–8 in Table WA2). Again, the results do not scenario, we predict that direct measurement overestimates change substantially, except for value, which becomes signif- RWTP by 9%, and indirect measurement overestimates it by icant at the 5% level. 19%, so the difference is 10 percentage points. In contrast, for Finally, we tested for possible interaction effects. That is, specialty goods, the overestimation increases to 28% for direct we took all significant moderators from the full model and and to 40% for indirect measures. When using a pure student tested, for each significant moderator, all possible interactions. sample instead of a mixed sample, the predictions are relative- The limited number of observations prevented us from simul- ly accurate. Here, direct measurement even underestimates taneously including all interactions in one model. Therefore, RWTP by 2%, while indirect measurement yields an overes- we first estimated separate models for each of the significant timation of 7%. With respect to how the method for measuring moderators from the full model, after dropping moderators 1/(2 ∗ RWTP is introduced to the participants, not mentioning it in a due to multicollinearity until all moderators had a GVIF df) paper, as well as training the method beforehand increase the < 2. Then, we estimated an additional extension of the full hypothetical bias. While the first option is hardly interpretable, model by adding all significant interactions that emerged from running mock tasks increases the bias to 33% in case of direct the previous interaction models. We next reduced that model and to 46% in case of indirect methods used for measuring until all moderators were significant at a 5% level. The HWTP. resulting model achieved a higher AICc than our main re- duced model. Comparing all full models with interactions, the model with the lowest AICc (Burnham and Anderson 2004) did not feature a significant interaction, indicating that Robustness checks the possible interactions are small and do not affect our results. All of these models are available in Web Appendix F. We ran several additional analyses to check the robustness of the results, which we summarize in Table WA2 in Web Appendix F. To start, we analyzed Model 1 in Table WA2 Discussion pffiffiffiffiffi 1=ðÞ 2*df by applying a cut-off value of GVIF < 10,compa- rable to the often used cut-off value of 10 for the VIF. In this Theoretical contributions case, we did not need to exclude any moderator, but the results do not deviate in their signs or significance levels relatively to Though three meta-analyses discussing the hypothetical bias the main results. Type of measurement HWTP still has a sig- exist (Carson et al. 1996; List and Gallet, 2001;Murphy etal. nificant effect (5% level) on the hypothetical bias. In 2005), this is the first comprehensive study giving marketing addition, value, product type (specialty),and type of managers and scholars advices on how to accurately measure subject design exert significant influences. Among the consumers’ WTP. In contrast to the existing meta-analyses, control variables, introduction of method for RWTP we focus on private goods, instead of on public goods, in- (training), introduction of method for RWTP (not creasing the applicability of our findings within a marketing 514 J. of the Acad. Mark. Sci. (2020) 48:499–518 Table 5 Hypotheses testing results Hypothesis Full model Reduced Robustness model checks H1a Type of measurement HWTP: indirect methods have smaller bias than direct methods H1b Type of measurement HWTP: direct methods have smaller bias than indirect methods ✓✓ ✓ H2 Bias increases with product value ✓✓ H3 Bias is least for convenience goods, greater for shopping goods, greatest for specialty goods ✓✓ ✓ H4 Bias is greater for innovations H5 Bias is greater for within-subject designs than for between-subject designs ✓✓ H6 Opportunity to test a product reduces the bias H7a Participation fee decreases the bias H7b Initial balance decreases the bias context. With a meta-analysis of 115 ESs gathered from 77 In our results related to H2, the p value of the value mod- studies reported in 47 papers, we conclude that HWTP erator is slightly greater than 5% in the full model, such that methods tend to overestimate RWTP considerably, by about the hypothetical bias appears greater for more valuable prod- 21% on average. This hypothetical bias depends on several ucts in percentage terms, though the effect is relatively small. factors, for which we formulated hypotheses (Table 5) and Value does not remain in the reduced model, but the signifi- which we discuss subsequently. cant effect is very consistent across the robustness checks that With respect to the method for measuring HWTP, whether feature the full model (Table 5). Therefore, our results support direct or indirect, across all the different models, we find H2: The hypothetical bias increases if the value of the prod- strong support for H1b, which states that indirect methods ucts to be evaluated increases. This finding is new, in that overestimate HWTP more severely than direct methods. neither existing meta-analyses (Carson et al. 1996;List and This important finding contradicts the prevailing opinion Gallet 2001;Murphy etal. 2005) nor any primary studies have among academic researchers (Breidert et al. 2006) and has examined this moderating effect. not previously been revealed in meta-analyses (Carson et al. We also find support for H3 across all analyzed models. For 1996;Listand Gallet 2001;Murphy et al. 2005). We in turn participants it is harder to evaluate a specialty product’s utility propose several potential mechanisms that could produce this than a convenience product’s utility; specialty goods often surprising finding. First, we consider the concept of coherent feature a higher degree of complexity or are less familiar to arbitrariness, as first introduced by Ariely et al. (2003). People consumers than convenience goods. The greater ability to as- facing many consecutive choices tend to base each decision sess the product’s utility reduces the hypothetical bias on their previous ones, such that they show stable preferences. (Hofstetter et al. 2013), such that our finding of higher over- However, study participants might make their first decision estimation for specialty goods is in line with prior research. more or less randomly. Indirect measures require many, con- Yet we do not find any difference between shopping and con- secutive choices, so coherent arbitrariness could arise when venience goods, prompting us to posit that the hypothetical using these methods to measure WTP. In that sense, the results bias might not be affected by moderate search effort; rather, of indirect measures indicate stable preferences, but they do only products demanding strong search effort increase the not accurately reflect the participants’ actual valuation. hypothetical bias. Existing meta-analyses (Carson et al. Second, participants providing indirect measure responses 1996; List and Gallet 2001;Murphy et al. 2005) include pub- might focus less on the absolute values of an attribute and lic goods and do not distinguish among different types of more on relative values (Drolet et al. 2000). The absolute private goods. By showing that the type of a private good values of the price attribute are key determinants of WTP, so influences the hypothetical bias, we add to an understanding the hypothetical bias might increase if the design of the choice of the hypothetical bias in a marketing context that features alternatives does not include correct price levels. A wide- private goods. spread argument for the greater accuracy of indirect methods With respect to innovation, we find no support for H4, compared with direct methods asserts they mimic a natural because the differences between innovations and existing shopping experience (Breidert et al. 2006); our analysis chal- products are small and not significant. This finding contrasts lenges this claim. with Hofstetter et al.’s(2013) results. Accordingly, we avoid rejecting the claim that methods for measuring HWTP work as well (or as poorly) for innovations as they do Please refer to Web Appendix A for a more detailed discussion of the for existing products. existing meta-analyses. J. of the Acad. Mark. Sci. (2020) 48:499–518 515 A within-subject research design increases the hypothetical using response ratios and thus offer marketing scholars anoth- bias, compared with a between-subject design, as we predict- er ES option to use in their meta-analyses. ed in H5 and in accordance with prior research (Ariely et al. 2006,FoxandTversky 1995, Frederick and Fischhoff 1998). Managerial implications Yet this finding still seems surprising to some extent. When asking a participant for WTP twice (once hypothetically, once This meta-analysis identifies a substantial hypothetical bias of in a real context), the first answer seemingly should serve as 21% on average in measures of WTP. Although hypothetically an anchor for the second, leading to an assimilation expected derived WTP estimates are often the best estimates available, to reduce the hypothetical bias. Instead, two similar questions managers should realize that they generally overestimate con- under different conditions appear to evoke a contrast instead sumers’ RWTP and take that bias into account when using of an assimilation effect, and they produce a greater hypothet- HWTP results to develop a pricing strategy or when setting ical bias. Consequently, when designing marketing experi- an innovation’s launch price. In addition, we detail conditions ments to investigate the hypothetical bias, researchers should in which the bias is larger or smaller, and we provide a brief use a between-subject design to prevent the answers from overview of how extensive the expected biases might become. influencing each other. When researching the influence of In particular, managers should anticipate a greater hypotheti- consumer characteristics on the hypothetical bias though, it cal bias when measuring WTP for products with higher values would be more appropriate to choose a within-subject design or for specialty goods. For example, when measuring HWTP (Hofstetter et al. 2013), though researchers must recognize for specialty goods, direct methods overestimate it by 28% that the hypothetical bias might be overestimated more severe- and indirect methods do so by 40%. These predicted degrees ly in this case. Murphy et al. (2005) also distinguish different of RWTP overestimation should be used to adjust decisions subject designs in their meta-analysis and find a significant based on WTP studies in practice. effect, though they use RWTP instead of the difference be- The study at hand also shows that direct methods result in tween HWTP and RWTP as their dependent variable. In this more accurate estimates of WTP than indirect methods do. sense, our finding of a moderating role of the study design on Therefore, practitioners can resist, or at least consider with the hypothetical bias is new to the literature. some skepticism, the prevalent academic advice to use indirect Our results do not support H6; we do not find differences in methods to measure WTP. In addition to being less accurate, the hypothetical bias when participants have an opportunity indirect methods require more effort and costs (Leigh et al. the test a product before stating their WTP or not. Testing a 1984). However, this recommendation only applies if the mea- product in advance reduces uncertainty about product perfor- surement of HWTP is necessary. If RWTP can be measured mance, and our finding is in contrast with Hofstetter et al.’s with an auction format, that option is preferable, since RWTP (2013) evidence that higher uncertainty increases the hypo- reflects actual WTP, whereas HWTP tends to overestimate it. thetical bias. Note however, that the result by Hofstetter This result also implies an exclusive focus on measuring WTP et al.’s(2013) refers to an effect of a consumer characteristic, for a specific product, such that it disregards some advantages and might be specific to the examined product, namely digital of the disaggregate information provided by indirect methods cameras. Our results are more general across a wide (e.g., demand due to cannibalization, brand switching, or range of product categories and experimental designs. market expansion; Jedidi and Jagpal 2009). In summary, the Furthermore, this result on H6 is in line with our find- key takeaway for managers who might use direct measures of ings for H4; both hypotheses rest on the participants’ HWTP is that the Bquick and dirty solution^ is only quick, not uncertainty about product performance, and we do not dirty—or at least, not more dirty than indirect methods. find support for either of them. Finally, neither a participation fee nor initial balance re- Limitations and research directions duce the hypothetical bias significantly, so we find no support for H7a or H7b. Formally, we can only Bnot reject^ a null This meta-analysis suggests several directions for further re- hypothesis of no moderator effect, but these findings suggest search, some of which are based on the limitations of our that we can dispel fears about influencing WTP results too meta-analysis. First, several recent adaptations of indirect much by offering participation fees or an initial balance. methods seek to improve their accuracy (Gensler et al. 2012, In addition to these theoretical insights on WTP measures, Schlereth and Skiera 2017). These improvements might re- we contribute to marketing literature by showing how to mod- duce the variance in measurement accuracy between direct el stochastically dependent ESs explicitly when the covari- and indirect measurements. These recently developed ances and variances of the observed ESs are known or can methods have not been tested by empirical comparison stud- be computed. Moreover, we use (the log of) the response ies, so we could not include them in our meta-analysis. An ratios as the ES in our meta-analysis, which has not been done extensive comparison of those adaptions, in terms of their previously in marketing. We provide a detailed rationale for effects on the hypothetical bias, would provide researchers 516 J. of the Acad. Mark. Sci. (2020) 48:499–518 and managers more comprehensive insights for choosing the pricing decisions, if it included assessments of different direct right method when measuring WTP. methods for measuring WTP. Second, the prevailing opinion of indirect methods yielding Acknowledgments The authors appreciate helpful comments from Felix a lower hypothetical bias than direct methods bases upon as- Eggers, Manfred Krafft, and Hans Risselada. sumptions concerning individuals’ decision making; though our results are in contrast with this opinion. The underlying mental processes when asked for the WTP through direct or indirect methods are not well understood yet. Investigating Open Access This article is distributed under the terms of the Creative those processes would foster the understanding of differences Commons Attribution 4.0 International License (http:// creativecommons.org/licenses/by/4.0/), which permits unrestricted use, in the hypothetical bias between direct and indirect methods distribution, and reproduction in any medium, provided you give and between other experimental conditions. This would en- appropriate credit to the original author(s) and the source, provide a link able the development of new adaptions minimizing the hypo- to the Creative Commons license, and indicate if changes were made. thetical bias. Third, the hypothetical bias depends on a variety of factors, including individual-level considerations (Hofstetter et al. 2013; Sichtmann et al. 2011), that extend beyond the product References or study level moderators as examined in our meta-regres- sions. Very few studies have investigated these factors, so Abraham, A. T., & Hamilton, R. W. (2018). When does partitioned pric- ing lead to more favorable consumer preferences? Meta-analytic we could not incorporate them in our meta-analysis, though evidence. Journal of Marketing Research, 55(5), 686–703. consumer characteristics likely explain some differences. Anderson, J. C., Jain, D. C., & Chintagunta, P. K. (1992). Customer value Therefore, we call for more research on whether and how assessment in business markets: A state-of-practice study. Journal of individual characteristics influence the hypothetical bias. For Business-to-Business Marketing, 1(1), 3–29. example, a possible explanation for the limited accuracy of Ariely, D., Loewenstein, G., & Prelec, D. (2003). BCoherent arbitrariness^: Stable demand curves without stable preferences. indirect measures could reflect coherent arbitrariness (Ariely The Quarterly Journal of Economics, 118(1), 73–106. et al. 2003). Continued research might examine whether and Ariely, D., Ockenfels, A., & Roth, A. E. (2005). An experimental analysis how coherent arbitrariness affects different consumers, espe- of ending rules in internet auctions. RAND Journal of Economics, cially in the context of CBCs. In addition, our findings on 36(4), 890–907. some product-level factors are new, namely that the hypothet- Ariely, D., Loewenstein, G., & Prelec, D. (2006). Tom sawyer and the construction of value. Journal of Economic Behavior & ical bias is greater for higher valued products and for specialty Organization, 60(1), 1–10. goods. These results could be cross-validated in future exper- Arts, J. W., Frambach, R. T., & Bijmolt, T. H. A. (2011). Generalizations imental studies. on consumer innovation adoption: A meta-analysis on drivers of Fourth, knowing and measuring WTP is crucial for firms intention and behavior. International Journal of Research in Marketing, 28(2), 134–144. operating in business-to-business (B2B) contexts (Anderson Babić Rosario, A., Sotgiu, F., de Valck, K., & Bijmolt, T. H. A. (2016). et al. 1992), yet all ESs in our study are from a business-to- The effect of electronic word of mouth on sales: A meta-analytic consumer context. Because B2B products and services tend to review of platform, product, and metric factors. Journal of be more complex, customers might prefer to identify product Marketing Research, 53(3), 297–318. characteristics and to include them separately when determin- Barrot, C., Albers, S., Skiera, B., & Schäfers, B. (2010). Why second- price sealed-bid auction leads to more realistic price-demand func- ing their WTP in response to an indirect method. However, tions. International Journal of Electronic Commerce, 14(4), 7–38. anecdotal evidence indicates that direct measurement works Becker, G. M., DeGroot, M. H., & Marschak, J. (1964). Measuring utility better for industrial goods than for consumer goods (Dolan by a single-response sequential method. Systems Research and and Simon 1996). Researching the differential accuracy of Behavioral Science, 9(3), 226–232. Bijmolt, T. H. A., & Pieters, R. G. M. (2001). Meta-analysis in marketing the various methods in a B2B context would be espe- when studies contain multiple measurements. Marketing Letters, cially interesting; our study already indicates differences 12(2), 157–169. between convenience and (more complex) specialty Bijmolt, T. H. A., van Heerde, H. J., & Pieters, R. G. M. (2005). New goods. Therefore, we join Lilien (2016)incalling for empirical generalizations on the determinants of price elasticity. more research in B2B marketing, including the measure- Journal of Marketing Research, 42(2), 141–156. Bolton, G. E., & Ockenfels, A. (2014). Does laboratory trading mirror ment of WTP. behavior in real world markets? Fair bargaining and competitive Fifth, the majority of studies included herein used open bidding on eBay. Journal of Economic Behavior & Organization, questioning as the direct method for measuring WTP. In prac- 97,143–154. tice, different direct methods are available (Steiner and Borah, A., Wang, X., & Ryoo, J. H. (2018). Understanding influence of Hendus 2012), yet they rarely have been investigated in aca- marketing thought on practice: An analysis of business journals using textual and latent Dirichlet allocation (LDA) analysis. demic research. Pricing research could increase in managerial Customer Needs and Solutions, 5(3–4), 146–161. relevance (Borah et al. 2018), and help managers make better J. of the Acad. Mark. Sci. (2020) 48:499–518 517 Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. Gensler, S., Neslin, S. A., & Verhoef, P. C. (2017). The showrooming phenomenon: It’s more than just about price. Journal of Interactive (2009). Introduction to meta-analysis. Chichester, United Kingdom: John Wiley & Sons. Marketing, 38,29–43. Breidert, C., Hahsler, M., & Reutterer, T. (2006). A review of methods for Gleser, L. J., & Olkin, I. (2009). Stochastically dependent effect sizes. In measuring willingness-to-pay. Innovative Marketing, 2(4), 8–32. H. Cooper, L. V. Hedges, & J. C. Valentine (Eds.), The handbook of nd research synthesis and meta-analysis (2 ed., pp. 357–376). New Brown, T. C., Champ, P. A., Bishop, R. C., & McCollum, D. W. (1996). York: Russel Sage Foundation. Which response format reveals the truth about donations to a public good? Land Economics, 72(2), 152–166. Grewal, D., Puccinelli, N., & Monroe, K. B. (2017). Meta-analysis: Integrating accumulated knowledge. Journal of the Academy of Brown, T. C., Ajzen, I., & Hrubes, D. (2003). Further tests of entreaties to Marketing Science, 47(5), 840. avoid hypothetical bias in referendum contingent valuation. Journal of Environmental Economics and Management, 46(2), 353–361. Hair J.F. Jr, Black, W. C., Babin, B. J., & Anderson, R. E. (2019). th Multivariate data analysis (8 ed.). Hampshire, United Kingdom: Burnham, K. P., & Anderson, D. R. (2004). Multimodel inference: Cengage Learning EMEA. Understanding AIC and BIC in model selection. Sociological Methods & Research, 33(2), 261–304. Hallgren, K. A. (2012). Computing inter-rater reliability for observational data: An overview and tutorial. Tutorial in Quantitative Methods for Bushong, B., King, L. M., Camerer, C. F., & Rangel, A. (2010). Psychology, 8(1), 23–34. Pavlovian processes in consumer choice: The physical presence of a good increases willingness-to-pay. American Economic Review, Harrison, G. W., & Rutström, E. E. (2008). Experimental evidence on the 100(4), 1556–1571. existence of hypothetical bias in value elicitation methods. In C. R. Plott & V. L. Smith (Eds.), Handbook of experimental economics Carson, R. T., Flores, N. E., Martin, K. M., & Wright, J. L. (1996). results (Vol. 1, pp. 752–767). Amsterdam, Netherlands: Elsevier. Contingent valuation and revealed preference methodologies: Comparing the estimates for quasi-public goods. Land Economics, Hedges, L. V., Gurevitch, J., & Curtis, P. S. (1999). The meta-analysis of 72(1), 80–99. response ratios in experimental ecology. Ecology, 80(4), 1150–1156. Charness, G., Gneezy, U., & Kuhn, M. A. (2012). Experimental methods: Hensher, D. A. (2010). Hypothetical bias, choice experiments and will- Between-subject and within subject designs. Journal of Economic ingness to pay. Transportation Research Part B: Methodological, Behavior & Organization, 81(1), 1–8. 44(6), 735–752. Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for eval- Hoeffler, S. (2003). Measuring preferences for really new products. uating normed and standardized assessment instruments in psychol- Journal of Marketing Research, 40(4), 406–420. ogy. Psychological Assessment, 6(4), 284–290. Hofstetter, R., Miller, K. M., Krohmer, H., & Zhang, Z. J. (2013). How do Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple consumer characteristics affect the bias in measuring willingness to regression/correlation analysis for the behavioral sciences (3rd ed.). pay for innovative products? Journal of Product Innovation Mahwah, NJ: Lawrence Erlbaum Associates. Management, 30(5), 1042–1053. Copeland, M. T. (1923). Relation of consumers’ buying habits to market- Ingenbleek, P. T. M., Frambach, R. T., & Verhallen, T. M. M. (2013). Best ing methods. Harvard Business Review, 1(3), 282–289. practices for new product pricing: Impact on market performance Dimoka, A., Hong, Y., & Pavlou, P. A. (2012). On product uncertainty in and price level under different conditions. Journal of Product online markets: Theory and evidence. MIS Quarterly, 36(2), 395– Innovation Management, 30(3), 560–573. Jedidi, K., & Jagpal, S. (2009). Willingness to pay: Measurement and Ding, M. (2007). An incentive-aligned mechanism for conjoint analysis. managerial implications. In V. R. Rao (Eds.), Handbook of pricing Journal of Marketing Research, 44(2), 214–223. research in marketing (pp. 37–60). Cheltenham, United Kingdom: Edward Elgar Publishing. Ding, M., Grewal, R., & Liechty, J. (2005). Incentive-aligned conjoint analysis. Journal of Marketing Research, 42(1), 67–82. Jedidi, K., & Zhang, Z. J. (2002). Augmenting conjoint analysis to esti- mate consumer reservation price. Management Science, 48(10), Dolan, R. J., & Simon, H. (1996). Power pricing: how managing price 1350–1368. transforms the bottom line. New York: The Free Press. Kagel, J. H., Harstad, R. M., & Levin, D. (1987). Information impact and Drolet, A., Simonson, I., & Tversky, A. (2000). Indifference curves that allocation rules in auctions with affiliated private values: A labora- travel with the choice set. Marketing Letters, 11(3), 199.209. tory study. Econometrica, 55(6), 1275–1304. Edeling, A., & Fischer, M. (2016). Marketing’s impact on firm value: Kalaian, H. A., & Raudenbush, S. W. (1996). A multivariate mixed linear Generalizations from a meta-analysis. Journal of Marketing model for meta-analysis. Psychological Methods, 1(3), 227–235. Research, 53(4), 515–534. Kimenju, S. C., Morawetz, U. B., & De Groote, H. (2005). Comparing Edeling, A., & Himme, A. (2018). When does market share matter? New contingent valuation method, choice experiments and experimental empirical generalizations from a meta-analysis of the market share– auctions in soliciting consumer preference for maize in Western performance relationship. Journal of Marketing, 82(3), 1–24. Kenya: Preliminary results (Presentation at the African Eggers, F., & Sattler, H. (2009). Hybrid individualized two-level choice- th Econometric Society 10 annual conference on econometric model- based conjoint (HIT-CBC): A new method for measuring preference ing in Africa, Nairobi, Kenya). structures with many attribute levels. International Journal of Kohli, R., & Mahajan, V. (1991). A reservation-price model for optimal Research in Marketing, 26(2), 108–118. pricing of multiattribute products in conjoint analysis. Journal of Fox, J., & Monette, G. (1992). Generalized collinearity diagnostics. Marketing Research, 28(3), 347–354. Journal of the American Statistical Association, 87(417), 178–183. Koricheva, J., & Gurevitch, J. (2014). Uses and misuses of meta-analysis Fox, C. R., & Tversky, A. (1995). Ambiguity aversion and comparative in plant ecology. Journal of Ecology, 102(4), 828–844. ignorance. The Quarterly Journal of Economics, 110(3), 585–603. Lajeunesse, M. J. (2011). On the meta-analysis of response ratios for Frederick, S., & Fischhoff, B. (1998). Scope (in)sensitivity in elicited studies with correlated and multi-group designs. Ecology, 92(11), valuations. Risk Decision and Policy, 3(2), 109–123. 2049–2055. Gensler, S., Hinz, O., Skiera, B., & Theysohn, S. (2012). Willingness-to- Leeflang, P.S.H., Wieringa, J.E., Bijmolt, T.H.A., & Pauwels, K.H. pay estimation with choice-based conjoint analysis: Addressing ex- (2015). Modeling markets; analyzing marketing phenomena and treme response behavior with individually adapted designs. improving marketing decision making. New York, NY: Springer. European Journal of Operational Research, 219(2), 368–378. 518 J. of the Acad. Mark. Sci. (2020) 48:499–518 Leigh, T. W., MacKay, D. B., & Summers, J. O. (1984). Reliability and Shogren, J. F., Margolis, M., Koo, C., & List, J. A. (2001). A random nth- validity of conjoint analysis and self-explicated weights: A compar- price auction. Journal of Economic Behavior & Organization, ison. Journal of Marketing Research, 21(4), 456–462. 46(4), 409–421. Sichtmann, C., Wilken, R., & Diamantopoulos, A. (2011). Estimating Lilien, G. (2016). L. (2016). The b2b knowledge gap. International willingness-to-pay with choice-based conjoint analysis: Can con- JournalofResearch inMarketing,33,543–556. sumer characteristics explain variations in accuracy? British List, J. A., & Gallet, C. A. (2001). What experimental protocol influence JournalofManagement, 22(4), 628–645. disparities between actual and hypothetical stated values? Evidence Simon, H. (2018). Irrationals Verhalten. Interview. Harvard Business from a meta-analysis. Environmental and Resource Economics, Manager, 40(8), 52–54. 20(3), 241–254. Steiner, M., & Hendus, J. (2012). How consumers’ willingness to pay is Lusk, J. L., & Schroeder, T. C. (2004). Are choice experiments incentive measured in practice: An empirical analysis of common approaches’ compatible? A test with quality differentiated beef steaks. American relevance. Retrieved from SSRN: https://ssrn.com/abstract= Journal of Agricultural Economics, 86(2), 467–482. 2025618. Accessed 20 Aug 2018 Miller, K. M., Hofstetter, R., Krohmer, H., & Zhang, Z. J. (2011). How Steiner, M., Eggert, A., Ulaga, W., & Backhaus, K. (2016). Do custom- should consumers’ willingness to pay be measured? An empirical ized service packages impede value capture in industrial markets? comparison of state-of-the-art approaches. Journal of Marketing Journal of the Academy of Marketing Science, 44(2), 151–165. Research, 48(1), 172–184. Thompson, S. G., & Sharp, S. J. (1999). Explaining heterogeneity in Morris, S. B., & DeShon, R. P. (2002). Combining effect size estimates in meta-analysis: A comparison of methods. Statistics in Medicine, meta-analysis with repeated measures and independent-groups de- 18(20), 2693–2708. signs. Psychological Methods, 7(1), 105–125. Tully, S. M., & Winer, R. S. (2014). The role of the beneficiary in will- Murphy, J. J., Allen, P. G., Stevens, T. H., & Weatherhead, D. (2005). A ingness to pay for socially responsible products: a meta-analysis. meta-analysis of hypothetical bias in stated preference valuation. Journal of Retailing, 90(2), 255–274. Environmental and Resource Economics, 30(3), 313–325. van den Noortgate, W., López-López, J. A., Marín-Martínez, F., & Nagle, T. T., & Müller, G. (2018). The strategy and tactics of pricing: A Sánchez-Meca, J. (2013). Three-level meta-analysis of dependent guide to growing more profitably (6th ed.). New York, NY: effect sizes. Behavior Research Methods, 45(2), 576–594. Routledge. van Houwelingen, H. C., Arends, L. R., & Stijnen, T. (2002). Advanced Neill, H. R., Cummings, R. G., Ganderton, P. T., Harrison, G. W., & methods in meta-analysis: Multivariate approach and meta-regres- McGuckin, T. (1994). Hypothetical surveys and real economic com- sion. Statistics in Medicine, 21(4), 589–624. mitments. Land Economics, 70(2), 145–154. Vega, L. A., Koike, F., & Suzuki, M. (2010). Conservation study of Noussair, C., Robin, S., & Ruffieux, B. (2004). Revealing consumers’ myrsine seguinii in Japan: Current distribution explained by past willingness-to-pay: A comparison of the BDM mechanism and the land use and prediction of distribution by land use-planning simula- Vickrey auction. Journal of Economic Psychology, 25(6), 725–741. tion. Ecological Research, 25(6), 1091–1099. Ockenfels, A., & Roth, A. E. (2006). Late and multiple bidding in second Vickrey, W. (1961). Counterspeculation, auctions, and competitive sealed price internet auctions: Theory and evidence concerning different tenders. Journal of Finance, 16(1), 8–37. rules for ending an auction. Games and Economic Behavior, 55(2), Viechtbauer, W. (2010). Conducting meta-analyses in R with the metafor 297–320. package. Journal of Statistical Software, 36(3). Völckner, F. (2006). Methoden zur Messung individueller Pebsworth, P. A., MacIntosh, A. J. J., Morgan, H. R., & Huffman, M. A. Zahlungsbereitschaften: Ein Überblick zum State of the Art. (2012). Factors influencing the ranging behavior of chacma baboons Journal für Betriebswirtschaft, 56(1), 33–60. (papio hamadryas ursinus) living in a human-modified habitat. Wang, T., Venkatesh, R., & Chatterjee, R. (2007). Reservation price as a International Journal of Primatology, 33(4), 872–887. range: An incentive-compatible measurement approach. Journal of Rutström, E. E. (1998). Home-grown values and incentive compatible Marketing Research, 44(2), 200–213. auction design. International Journal of Game Theory, 27(3), Wertenbroch, K., & Skiera, B. (2002). Measuring consumers’ willingness 427–441. to pay at the point of purchase. Journal of Marketing Research, Scheibehenne, B., Greifeneder, R., & Todd, P. M. (2010). Can there ever 39(2), 228–241. be too many options? A meta-analytic overview of choice overload. Wlömert, N., & Eggers, F. (2016). Predicting new service adoption with Journal of Consumer Research, 37(3), 409–425. conjoint analysis: External validity of BDM-based incentive-aligned Schlag, N. (2008). Validierung der Conjoint-Analyse zur Prognose von and dual-response choice designs. Marketing Letters, 27(1), 195– Preisreaktionen mithilfe realer Zahlungsbereitschaften. In Lohmar. Germany: Josef Eul Verlag. Schlereth, C., & Skiera, B. (2017). Two new features in discrete choice experiments to improve willingness-to-pay estimation that result in Publisher’snote Springer Nature remains neutral with regard to jurisdic- SDR and SADR: Separated (adaptive) dual response. Management tional claims in published maps and institutional affiliations. Science, 63(3), 829–842.

Journal

Journal of the Academy of Marketing ScienceSpringer Journals

Published: May 7, 2020

There are no references for this article.