Judgment and Decision Making, Vol. 13, No. 2, March 2018, pp. 163-169

Using Tversky’s contrast model to investigate how features of similarity affect judgments of likelihood

Mirta Galesic*   A. Walkyria Goode#   Thomas S. Wallsten$   Kent L. Norman$

The representativeness heuristic suggests that similarity judgments provide a basis for judgments of likelihood. We use Tversky’s (1977) contrast model of similarity to design tests of this underlying mechanism. If similarity is used to judge likelihood, factors that are known to affect similarity should also affect judgments of likelihood. In two experiments, we manipulated two such factors described in the contrast model of similarity: the nature of the task and context effects. In a between-subject design, respondents assessed either similarity of fictive citizens of 15th century Florence, or the likelihood that they belonged to the same family. The factors that affected similarity also affected the likelihood judgments. These results support the assumption that similarity is an important contributor to judgments of likelihood.


Keywords: similarity, dissimilarity, context, probability judgment

1  Introduction

Over 40 years ago Kahneman and Tversky (1972) proposed that judgments of likelihood are sometimes based on representativeness, or the similarity of an event to some class of events. This representativeness heuristic, although often useful, can lead to departures from normative probability rules. Relatively little research has been done on investigating whether manipulations of similarity affect likelihood judgments, as would be expected by the representativeness heuristic. In a classic set of studies, Bar-Hillel (1982) examined the relationship between similarity and likelihood judgments. She used stimuli in which similarity and likelihood did not coincide and found that judgments of likelihood were strongly related to judgments of similarity. Read and Grushka-Cockayne (2011) built on her work and showed that similarity can be used to make accurate judgments of likelihood.

Here we take a somewhat different route and investigate whether manipulating similarity has similar effects on both similarity and likelihood judgments. We use Tversky’s (1977) seminal contrast model of similarity to determine factors that affect similarity judgments, and test whether they also influence judgments of likelihood.

1.1  Contrast model of similarity

Tversky (1977) noted that many items, such as faces, countries, or personalities, are better described in terms of qualitative features than in terms of a small number of quantitative dimensions. Rather than as points on continuous dimensions, such items might be better represented by the presence or absence of specific features. Accordingly, he proposed a model in which objects are represented by sets of features, and similarity judgments depend on how the features match.

Features are usually discrete, often binary variables, but any dimension can also be represented as nested or overlapping sets of discrete features. The observed similarity of object a to object b, S(a,b), is a function of their common features, those that are shared by both a and b, and their distinctive features, those that belong to one but not to the other. The theory then expresses similarity as a linear combination, or a contrast, of the measures of their common and distinctive features:

S(a,b) = θ f(A ∩ B) − α f(A − B) − β f(B − A)

where A and B denote the feature sets of objects a and b, and θ, α, β ≥ 0. This model allows for a variety of similarity relations over the same set of objects, depending on the values of the parameters θ, α, and β. If θ = 1 and α = β = 0, the similarity of the objects is entirely determined by their common features. If, on the other hand, θ = 0 and α = β = 1, the similarity of the objects is determined entirely by their distinctive features. The scale f reflects prominence or salience of different features, determining the contribution of each individual feature to the similarity between objects.¹
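As an illustration, the contrast model can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the scale f is taken to be a simple feature count (every feature equally salient), the default parameter values are arbitrary, and the feature sets assigned to the two citizens are hypothetical rather than items from the study.

```python
def contrast_similarity(a, b, theta=1.0, alpha=0.5, beta=0.5, f=len):
    """Contrast model: S(a,b) = theta*f(A & B) - alpha*f(A - B) - beta*f(B - A).

    a, b  : iterables of features describing the two objects
    f     : salience scale; here assumed to be a plain feature count
    """
    A, B = set(a), set(b)
    return theta * f(A & B) - alpha * f(A - B) - beta * f(B - A)


# Hypothetical feature sets (illustration only, not study materials)
fiorenza = {"honest", "polite", "curious"}
amadora = {"honest", "polite", "shallow"}

# Common and distinctive features both contribute
mixed = contrast_similarity(fiorenza, amadora)
# theta = 1, alpha = beta = 0: only common features matter
common_only = contrast_similarity(fiorenza, amadora, theta=1, alpha=0, beta=0)
# theta = 0, alpha = beta = 1: only distinctive features matter
distinct_only = contrast_similarity(fiorenza, amadora, theta=0, alpha=1, beta=1)

print(mixed, common_only, distinct_only)
```

The two limiting parameter settings correspond to the two special cases described above; any other choice of θ, α, and β mixes the contributions of common and distinctive features.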

By means of the parameters θ, α, and β the contrast model allows people to pay more attention to the objects’ common features when assessing their similarity, and more attention to their distinctive features when assessing their dissimilarity. As a result, a pair of richly described objects, which are likely to share many common and many distinctive features, could be judged to be both more similar and more different than a pair of less richly described objects, which are likely to share fewer common and distinctive features. As an example of such effects of the nature of comparison, Tversky (1977; Tversky & Gati, 1978, Study 1) showed that countries which their respondents described as “more prominent” (such as West and East Germany at the time, or England and Ireland) were judged to be both more similar and more different than countries they described as “less prominent” (such as Ceylon and Nepal, or Pakistan and Mongolia). The results confirm the hypothesis that the relative weight of common and distinctive features varies with the nature of the comparison. Shafir (1993), who extended this paradigm to choices between different options, showed that more richly described (“enriched”) options tend to be both more often chosen and more often rejected than less richly described (“impoverished”) options.
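The following sketch shows how this can happen within the contrast model. It is an illustration under stated assumptions, not a fit to any data: f is a plain feature count, the weights for the similarity and the difference task are arbitrary values chosen so that common features dominate the former and distinctive features the latter, and the feature sets are hypothetical.

```python
def contrast(a, b, theta, alpha, beta):
    """Contrast model with f assumed to be a simple feature count."""
    A, B = set(a), set(b)
    return theta * len(A & B) - alpha * len(A - B) - beta * len(B - A)


def similarity(a, b):
    # Similarity task: common features weighted heavily (assumed weights)
    return contrast(a, b, theta=1.0, alpha=0.25, beta=0.25)


def difference(a, b):
    # Difference task: distinctive features weighted heavily (assumed weights)
    return -contrast(a, b, theta=0.25, alpha=1.0, beta=1.0)


# Hypothetical enriched pair: three common and three distinctive features each
enriched = (
    {"honest", "sincere", "truthful", "curious", "inquisitive", "inquiring"},
    {"honest", "sincere", "truthful", "shallow", "cursory", "superficial"},
)
# Hypothetical impoverished pair: one common and one distinctive feature each
impoverished = ({"honest", "curious"}, {"honest", "shallow"})

print("similarity:", similarity(*enriched), similarity(*impoverished))
print("difference:", difference(*enriched), difference(*impoverished))
```

With these assumed weights the enriched pair scores higher than the impoverished pair on both measures, which is the pattern described above.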

The contrast model is also consistent with context effects on the judged similarity of objects, because the function f is sensitive to the context of comparison. Some features have greater diagnostic value, and affect the judgments of similarity more in certain contexts than in others. In one of the many demonstrations of these context effects, respondents were presented with four countries that naturally formed two clusters (Tversky, 1977; Tversky & Gati, 1978, Study 4). For example, consider the two sets of four countries, (England, Israel, Syria, Iran) and (England, Israel, Syria, France), which differ only in the fourth country in the list. A natural grouping in the first case is Syria and Iran as Muslim countries and England and Israel as non-Muslim countries. A natural grouping in the second case is Israel and Syria as Middle-Eastern countries and England and France as European countries. Accordingly, England and Israel are judged as more similar to each other in the first case than in the second. Note that somewhat similar effects have been found in the choice-set effects literature (e.g., Hsee, 1996), where the presence of an additional option changes the evaluation of the existing options.
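One way to make the country example concrete is to let the salience of a feature depend on how well it splits the set of objects under consideration. The sketch below is a toy illustration only: the weighting rule `diagnostic_weights` is a hypothetical stand-in for diagnostic value (it is not Tversky's definition of f), the feature coding of the countries is a simplification, and the parameter values are arbitrary.

```python
from collections import Counter

# Simplified, hypothetical feature coding of the countries in the example
FEATURES = {
    "England": {"European", "non-Muslim"},
    "France": {"European", "non-Muslim"},
    "Israel": {"Middle-Eastern", "non-Muslim"},
    "Syria": {"Middle-Eastern", "Muslim"},
    "Iran": {"Middle-Eastern", "Muslim"},
}


def diagnostic_weights(context):
    """Assumed rule: a feature weighs most when it splits the context in half."""
    n = len(context)
    counts = Counter(feat for obj in context for feat in FEATURES[obj])
    return {feat: min(k, n - k) / (n / 2) for feat, k in counts.items()}


def similarity(a, b, context, theta=1.0, alpha=0.5, beta=0.5):
    """Contrast model with a context-dependent salience scale f."""
    w = diagnostic_weights(context)

    def f(feats):
        return sum(w[x] for x in feats)

    A, B = FEATURES[a], FEATURES[b]
    return theta * f(A & B) - alpha * f(A - B) - beta * f(B - A)


with_iran = similarity("England", "Israel", ["England", "Israel", "Syria", "Iran"])
with_france = similarity("England", "Israel", ["England", "Israel", "Syria", "France"])
print(with_iran, with_france)
```

Under this assumed rule, the shared feature of England and Israel is most diagnostic when Iran is the fourth country, and their distinctive features are most diagnostic when France is, so the computed similarity of England to Israel is higher in the first set than in the second.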

If the representativeness heuristic for judging likelihood is based on similarity (Kahneman & Tversky, 1972; Kahneman & Frederick, 2002), then it should be sensitive to factors that influence similarity, such as nature of comparison and context effects. To test this idea, we designed two experiments aimed at exploring these factors. In designing the experiments, we closely followed the methodology of the studies described in Tversky (1977) and Tversky and Gati (1978). In what follows, we first describe the methodology shared by both experiments, and then address them separately.

2  General methodology





Figure 1: Description of the study, Experiment 1.



Materials. Both experiments used fictitious citizens of 15th century Florence² identified by Italian names selected from a list of names online.³ An average respondent knows little or nothing about the inhabitants of 15th century Florence, yet the objects (citizens) sound plausible and are suitable for formulating different tasks (see also Koehler, Brenner, Liberman & Tversky, 1996).

As features, we used trait adjectives associated with two broad factors derived from the Big Five Factor Model of Personality. These factors, Agreeableness and Openness to Experience, have high loadings on different higher-order factors (Digman, 1997). We aimed to add or remove traits from the descriptions of citizens without influencing the meaning/connotations of the remaining traits.

Each citizen was described by a number of trait adjectives, taken from Goldberg’s (1990) list of 100 clusters of adjectives, which he derived by a factor analysis of 339 adjectives describing different personality traits (Table 3 in Goldberg, 1990). Average reliability of his clusters was α = .66, and average pair-wise correlation of trait adjectives within the clusters was r = .40. We chose eight positive and eight negative clusters of trait adjectives from each of the two factors mentioned above. Each citizen was described by one or more trait adjectives belonging to the same cluster, for each of the two factors. The Appendix provides a complete list of trait adjectives we used.

Procedure. The questionnaire started with a consent form, and a few questions on respondents’ demographic characteristics and English language skills. The respondents then received a short introduction to the study (Figure 1). The respondents proceeded to answer questions about the experimental items, presented in random order. Respondents had to answer each item before they could continue to the next one.

3  Experiment 1: Nature of comparison


Figure 2: An example of an item in Experiment 1.

3.1  Hypothesis

According to Tversky’s (1977) contrast model, when judging similarity among objects, people tend to weigh their common features more heavily than their distinctive features. This relative weighting is reversed when judging differences among objects. As a result, “enriched” pairs of objects, i.e., those with both more common and more distinctive features, will be judged as both more similar to and more different from each other than will pairs of “impoverished” objects, i.e., those with fewer common and distinctive features. Consequently, if the representativeness heuristic is used to judge likelihoods, pairs of enriched objects should be judged as both more and less likely to belong to the same class than pairs of impoverished objects.

3.2  Method

Respondents. The respondents were recruited either from the pool of the University of Maryland undergraduate psychology students (n = 54, 76% female), or through online advertising to the general public (n = 130, 70% female). To reduce the burden for the latter group, they were asked to complete only a random half of all experimental items. At the end of the study, the students were rewarded by course credit, and the Web respondents were offered a list of potentially interesting links related to judgment, decision-making, and perception. There were no significant differences between the two samples so we report pooled results.

Materials. In this experiment, an item consisted of two pairs of citizens, each described by adjectives associated with the traits of Agreeableness and Openness to Experience. In one pair six adjectives described each member, three focusing on one trait and three on the other. In the other pair only two adjectives described each member, one on each trait. An example of a typical item is shown in Figure 2. Because we used clusters of adjectives describing the same trait, respondents in the enriched conditions arguably received little new information (according to Goldberg’s 1990 analysis described above), but conveyed through a larger number of adjectives.

Pretest. We generated 160 items consisting of two pairs of citizens. In order to equate richness of description with what Tversky called “prominence”, we asked the respondents to select the pair in each item that “stands out more.” The sample included 28 undergraduate students of psychology, tested on computers in our lab, and 57 respondents tested online, recruited through word of mouth. Across items, the average percentage of respondents choosing the enriched pair was 79%. For the main study, we selected 20 items for which agreement was 100%.

Procedure. Respondents were randomized to four experimental groups (see Table 1, rows). Two groups made similarity judgments regarding the pairs of citizens: one group was asked to assess their similarity (“Choose the pair whose members are more similar”), while the other assessed their dissimilarity (“Choose the pair whose members are less similar”). The other two groups assessed the likelihood that the citizens from each of the pairs belong to the same family. Equivalently to the similarity group, one of the groups was told to “Choose the pair whose members are more likely to belong to the same family”, while the other was told to “Choose the pair whose members are less likely to belong to the same family”. Position of the items on the screen was counterbalanced – the enriched pair was put above the impoverished pair for half of the items. Items were randomized for each respondent.
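The assignment and presentation scheme can be sketched as follows. This is a hypothetical reconstruction for illustration: the function name `build_session`, the use of simple (unrestricted) random assignment to groups, and the rule that even-numbered items show the enriched pair on top are all assumptions, not details reported for the study software.

```python
import random

# The four experimental groups: type of judgment and wording of the question
CONDITIONS = [
    ("similarity", "more similar"),
    ("similarity", "less similar"),
    ("likelihood", "more likely to belong to the same family"),
    ("likelihood", "less likely to belong to the same family"),
]


def build_session(n_items=20, seed=None):
    """Return a randomly chosen condition and a randomized item order.

    The enriched pair is placed on top for half of the items
    (assumed here to be the even-numbered ones).
    """
    rng = random.Random(seed)
    condition = rng.choice(CONDITIONS)  # simple random assignment (assumed)
    items = [{"item": i, "enriched_on_top": i % 2 == 0} for i in range(n_items)]
    rng.shuffle(items)  # item order randomized for each respondent
    return condition, items


condition, items = build_session(seed=1)
print(condition)
print([entry["item"] for entry in items])
```

Simple random assignment does not guarantee equal group sizes, so a restricted scheme (for example, blocked assignment) would be an alternative design choice.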

3.3  Results

The percentage of respondents choosing the enriched pair in each of the four conditions is shown in Table 1. If the nature of comparison does not play any role, we would expect the two percentages for each type of judgment to sum to 100. However, the average sums were larger than 100 (t(67)=5.55, p=.001).

Of particular interest to our study are the judgments of likelihood. The percentage of respondents who chose the enriched pair as “more likely to belong to the same family” and those who chose that same pair as “less likely to belong to the same family” summed to 117.8, higher than 100 (t(50)=7.37, p=.001).


Table 1: Results of the experimental manipulation of the nature of comparison (averages across 20 items).
Judgments (“Choose the pair whose members are…”) | Percentage choosing the enriched pair | N
Similarity
“… more similar” | 49 | 35
“… less similar” | 62 | 33
Sum of the two versions | 111 | 68
Likelihood
“… more likely to belong to the same family” | 53 | 30
“… less likely to belong to the same family” | 65 | 21
Sum of the two versions | 118 | 51
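The logic of the test can be restated with a short calculation. If a single criterion governed both wordings, a pair chosen as “more similar” by x% of respondents should be chosen as “less similar” by 100 − x%, so the two percentages would sum to 100. The sketch below uses only the rounded percentages reported in Table 1; the helper name `excess_over_100` is ours.

```python
# Percentage choosing the enriched pair, rounded values from Table 1
table1 = {
    "similarity": {"more": 49, "less": 62},
    "likelihood": {"more": 53, "less": 65},
}


def excess_over_100(more, less):
    """How far the two complementary wordings exceed the expected sum of 100."""
    return (more + less) - 100


excess = {
    judgment: excess_over_100(pct["more"], pct["less"])
    for judgment, pct in table1.items()
}
print(excess)
```

Because the table entries are rounded, the likelihood sum computed here (118) differs slightly from the unrounded value of 117.8 reported in the text.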

4  Experiment 2: Context effects


Figure 3: An example of items in Experiment 2 (Version 1 and Version 2).

4.1  Hypothesis

According to the contrast model, objects can be judged to be more or less similar to each other based on the context in which they appear. In particular, context can alter the salience of certain features by changing the natural clustering of the objects. The same effects should hence be observed in judgments of likelihood if based on similarity.

4.2  Method

Respondents. We used Amazon Mechanical Turk to recruit 63 respondents for the pretest (29% female) and 118 respondents for the main study (43% female). They were all native English speakers and most of them were between 25 and 40 years of age.

Materials. The objects and features used to form the items in this experiment were the same as in Experiment 1. An item consisted of a quadruple of citizens. Each citizen was described with four features. Each quadruple had two versions. Three of the citizens were the same in both versions (a, b, and c), while the fourth differed (p or q). Their features were chosen in such a way that the natural groupings within the quadruple changed when the fourth citizen was changed. In one version, the natural groupings were a with b and c with p, while in the other the natural groupings were a with c and b with q. Hence, we expected that a would be perceived as more similar to b than to c in the presence of p, but would be perceived as more similar to c than to b in the presence of q. An example is shown in Figure 3. Here, Fiorenza is citizen a, Amadora b, Rosa c, Gianina p, and Ottavia q.

Pretest. We generated 20 pairs of items, or 40 quadruples in total. In the pretest (equivalent to the one described in Tversky, 1977; and Tversky & Gati, 1978, study 4), we checked whether the natural groupings of each quadruple were in accord with our expectations. The respondents were asked to divide each quadruple into two most natural pairs. All quadruples were divided in the expected pairs. The average percentage of the respondents who grouped the quadruples as expected was 69% (minimum 57%, maximum 83%).

Procedure. Respondents were randomly divided into four groups and asked questions about the 20 experimental items. Two of the groups assessed the similarity of the citizens, and two the likelihood that the citizens were cousins. For each quadruple, the respondents had to say which of the three citizens (b, c, or p/q) is the most similar to, or the most likely to belong to the same family as, citizen a. Within each type of judgment, one group of respondents got quadruples containing citizen p, while the other group got those same quadruples but with citizen q instead. Citizen a was always positioned at the top of the page, while the order of the other three citizens on the page was counterbalanced. Items were randomized.


Table 2: Results of the context manipulation on mean differences in choice proportions (averages across 20 items).
Judgments (“Which of these three persons is…”) | % | N
Similarity
“… most similar to <person a>?”
% choosing b in presence of p − % choosing b in presence of q | 10 | 31
% choosing c in presence of q − % choosing c in presence of p | 33 | 31
Mean difference | 22 | 62
Likelihood
“… most likely to belong to the same family as <person a>?”
% choosing b in presence of p − % choosing b in presence of q | 12 | 27
% choosing c in presence of q − % choosing c in presence of p | 29 | 29
Mean difference | 21 | 56

4.3  Results

When a quadruple contained citizens a, b, c and p, the expected groupings were a with b and c with p; when it contained a, b, c and q, the expected groupings were a with c and b with q. Consequently, in line with Tversky & Gati (1978), we expected that b would be chosen as the most similar to a more often in the presence of p than of q, and c would be chosen as the most similar to a more often in the presence of q than of p. Accordingly, for likelihood judgments, we expected that b would be chosen as the most likely to belong to the same family as a more often in the presence of p than in the presence of q, and that c would be chosen as the most likely to belong to the same family as a more often in the presence of q than in the presence of p.

As shown in Table 2, the context manipulation affected both the similarity and the likelihood judgments. The difference in the percentage of respondents who chose person b in the presence of p vs. q was significantly different from zero for both types of judgments (for similarity, t(19)=3.42, p=.003; for likelihood, t(19)=5.46, p=.001). The same was the case for the average difference in the percentage of respondents who chose person c in the presence of q vs. p (for similarity, t(19)=12.03, p=.001; for likelihood, t(19)=7.98, p=.001). Average differences were also reliably different from zero (for similarity, t(19)=10.42, p=.001; for likelihood, t(19)=8.13, p=.001).
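The degrees of freedom, t(19), indicate that these tests were computed across the 20 items. A one-sample t statistic of this kind can be sketched as follows. The per-item differences in the example are hypothetical placeholders, not the study data, and the function name `one_sample_t` is ours.

```python
import math
import statistics


def one_sample_t(values, mu0=0.0):
    """One-sample t statistic and degrees of freedom for H0: mean == mu0."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


# HYPOTHETICAL per-item differences in percentage points (one value per item),
# e.g., % choosing b in presence of p minus % choosing b in presence of q
item_diffs = [12, 8, 15, 5, 10, 9, 14, 7, 11, 13,
              6, 10, 12, 9, 8, 11, 13, 10, 7, 10]

t, df = one_sample_t(item_diffs)
print(f"t({df}) = {t:.2f}")
```

A positive t indicates that, averaged over items, the choice percentage shifted in the direction predicted by the context manipulation.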

5  Discussion

We used Tversky’s (1977) contrast model to develop manipulations that are known to affect similarity judgments and tested whether they also influence likelihood judgments. Results of both experiments were in accord with the representativeness heuristic, which holds that judgments of likelihood are affected by the similarity of objects.⁴ One experiment showed that pairs of enriched objects (those defined with more features) were judged to be both more and less similar, as well as both more and less likely to belong to the same class. The other experiment showed that context affects similarity and likelihood judgments in similar ways.

These results provide support for the assumption that judgments of likelihood are based on similarity of objects, in accord with the studies of Bar-Hillel (1982). Our study is novel in that it is the first to test whether manipulating similarity has comparable effects on both judgments of similarity and judgments of likelihood.

Our results are also in line with those of Nilsson, Olsson & Juslin (2005; see also Nilsson, Juslin & Olsson, 2008). They compared three cognitive mechanisms that could underlie probability judgments: (1) representativeness heuristics – modeled as prototype similarity, relative likelihood, or evidential support accumulation; (2) cue-based relative frequency; and (3) exemplar memory accounts. They found that the mechanism based on exemplar memory outperformed other accounts of probability judgments in a range of tasks. The exemplar-based mechanism differed from the other accounts in that it responded to both the similarity of an event to exemplars from other categories, and to the relative frequency of exemplars from other categories.

The idea that manipulating features of objects can affect both similarity and likelihood judgments has been investigated in the feature-based categorization literature (e.g., Sloman, 1993; Smith & Osherson, 1989). Further studies could investigate whether feature-based models of induction could be applied to the types of tasks investigated in this paper, and enable a more precise understanding of the underlying processes. Furthermore, it would be useful to investigate the relationship between similarity and likelihood judgments using different sets of stimuli, going beyond persons and families, and beyond categorization tasks.

Tenenbaum & Griffiths (2001) analyzed the rational basis of representativeness using a Bayesian approach. One of their findings was that similarity-based models can approximate rational Bayesian models with reasonable accuracy but require much simpler computations. This result is in line with the idea that similarity might be used as a heuristic for probability judgments. Our results, showing that similarity and likelihood judgments track each other, seem to be in accord with these suggestions.

Hertwig & Gigerenzer (1999) warned that the way respondents interpret tasks involving probability judgments depends on the overall context. When tasks involve cues that are irrelevant to judgments of likelihood (such as the description of Linda in Tversky & Kahneman, 1983), respondents may use standard conversational norms (Grice, 1989) and infer that the task involves more than simply evaluating mathematical probability of an event. A related issue might be relevant in our study. In our tasks, the information that could have been used to assess the likelihood that any two guests belong to a particular family (i.e., number of families and their members present at the party given at the beginning) was less salient than the information about the similarity of personality traits (given on every page). This is because we tried to emulate the similarity tasks Tversky used as closely as possible. Had we created a context that involved more likelihood cues, we might have obtained different results.

Nilsson et al. (2008) showed that subjective probability depends not only on similarity but also on other factors. Thus, we close by emphasizing that, while we have shown that similarity is an important contributor to likelihood judgments, we are not claiming that the two types of judgments are identical. Clearly, likelihood judgments depend on other factors in addition to similarity, or, might we say, in addition to representativeness.

References

Bar-Hillel, M. (1982). Studies of representativeness. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases (pp. 69–83). Cambridge: Cambridge University Press.

Digman, J. M. (1997). Higher-order factors of the Big Five. Journal of Personality and Social Psychology, 73, 1246–1256.

Gigerenzer, G. (1991). How to make cognitive illusions disappear: Beyond heuristics and biases. European Review of Social Psychology, 2, 83–115.

Gigerenzer, G. (1996). On narrow norms and vague heuristics: A reply to Kahneman and Tversky. Psychological Review, 103, 592–596.

Goldberg, L. (1990). An alternative “description of personality”: The Big-Five factor structure. Journal of Personality and Social Psychology, 59, 1216–1229.

Grice, H. P. (1989). Studies in the Way of Words. Cambridge, MA: Harvard University Press.

Hertwig, R., & Gigerenzer, G. (1999). The “conjunction fallacy” revisited: How intelligent inferences look like reasoning errors. Journal of Behavioral Decision Making, 12, 275–305.

Hsee, C. K. (1996). The evaluability hypothesis: An explanation for preference reversals between joint and separate evaluations of alternatives. Organizational Behavior and Human Decision Processes, 67, 247–257.

Kahneman, D., & Tversky, A. (1972). Subjective probability: A judgment of representativeness. Cognitive Psychology, 3, 430–451.

Kahneman, D., & Tversky, A. (1973). On the psychology of prediction. Psychological Review, 80, 237–251.

Kahneman, D. & Tversky, A. (1996). On the reality of cognitive illusions. Psychological Review, 103, 582–591.

Kahneman, D. & Frederick, S. (2002). Representativeness revisited: Attribute substitution in intuitive judgment. In T. Gilovich, D. Griffin & D. Kahneman (Eds.), Heuristics and biases: The psychology of intuitive judgment (pp. 49–81). New York: Cambridge University Press.

Koehler, D. J., Brenner, L. A., Liberman, V., & Tversky, A. (1996). Confidence and accuracy in trait inference: Judgment by similarity. Acta Psychologica, 92, 33–57.

Nilsson, H., Juslin, P., & Olsson, H. (2008). Exemplars in the mist: The cognitive substrate of the representativeness heuristic. Scandinavian Journal of Psychology, 49, 201–212.

Nilsson, H., Olsson, H., & Juslin, P. (2005). The cognitive substrate of subjective probability. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 600–620.

Padgett, J. & Ansell, C. (1993). Robust action and the rise of the Medici, 1400–1434. American Journal of Sociology, 98, 1259–1319.

Read, D., & Grushka-Cockayne, Y. (2011). The similarity heuristic. Journal of Behavioral Decision Making, 24, 23–46.

Shafir, E. (1993). Choosing versus rejecting: Why some options are both better and worse than others. Memory and Cognition, 21, 546–556.

Shepard, R. N. (1987). Toward a universal law of generalization for psychological science. Science, 237, 1317–1323.

Sloman, S. A. (1993). Feature-based induction. Cognitive Psychology, 25, 231–280.

Smith, E. E., & Osherson, D. N. (1989). Similarity and decision making. In S. Vosniadou & A. A. Ortony (Eds.), Similarity and Analogical Reasoning (pp. 60–75). New York: Cambridge University Press.

Tenenbaum, J. B., & Griffiths, T. L. (2001). The rational basis of representativeness. Proceedings of the 23rd Annual Conference of the Cognitive Science Society.

Torgerson, W. S. (1952). Multidimensional scaling: I. Theory and method. Psychometrika, 17, 401–419.

Tversky, A. & Gati, I. (1978). Studies of similarity. In E. Rosch and B. Lloyd (Eds.), Cognition and Categorization (pp. 79–98). Hillsdale, NJ: Erlbaum.

Tversky, A. & Kahneman, D. (1974). Judgment under uncertainty: heuristics and biases. Science, 185, 1124–1131.

Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327–352.

Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review, 90, 293–315.

Wallsten, T. S. (1983). The theoretical status of judgmental heuristics. In R. W. Scholz (Ed.), Decision making under uncertainty (pp. 21–38). Amsterdam: North Holland Publishing Company.

Appendix: Trait adjectives used as features
Factors & clusters | Trait adjectives (features)¹
Agreeableness
Positive traits
Cooperation | agreeable, cooperative, peaceful
Morality | honest, sincere, truthful
Leniency | lenient, tolerant, forgiving
Courtesy | courteous, polite, tactful
Generosity | benevolent, charitable, generous
Flexibility | adaptable, flexible, obliging
Modesty | humble, modest, unassuming
Warmth | affectionate, compassionate, warm
Negative traits
Belligerence | antagonistic, quarrelsome, combative
Deceit | deceitful, lying, underhanded
Overcriticalness | critical, faultfinding, harsh
Rudeness | impudent, rude, contemptuous
Selfishness | greedy, selfish, self-indulgent
Stubbornness | bullheaded, obstinate, stubborn
Conceit | boastful, conceited, vain
Callousness | cold, callous, aloof
Openness to experience (intellect)
Positive traits
Intellectuality 1 | thoughtful, meditative, philosophical
Intellectuality 2 | intellectual, contemplative, introspective
Depth | complex, deep, profound
Intelligence | bright, intelligent, smart
Creativity 1 | artistic, creative, original
Curiosity | curious, inquisitive, inquiring
Sophistication 1 | cultured, refined, sophisticated
Sophistication 2 | cosmopolitan, worldly, world-wise
Negative traits
Unintellectuality 1² | simple-minded, obtuse, trivial
Unintellectuality 2² | silly, small-minded, one-dimensional
Shallowness | shallow, cursory, superficial
Stupidity | dull, ignorant, brainless
Unimaginativeness 1² | prosaic, arid, conventional
Indifference² | indifferent, numb, apathetic
Unsophistication 1² | coarse, crude, primitive
Unsophistication 2² | provincial, dogmatic, narrow
¹ Trait adjectives marked in italics were the ones used in impoverished conditions in Experiment 1.
² Because Goldberg’s list (1990; in his Table 3) included more positive than negative trait adjectives related to the factor Openness to experience, we added several negative trait adjectives by finding antonyms for the positive trait adjectives within this factor. We used Merriam-Webster’s dictionary available at www.m-w.com.

*
Joint Program in Survey Methodology, University of Maryland; Center for Adaptive Behavior and Cognition, Max Planck Institute for Human Development, Berlin, Germany; and Santa Fe Institute, New Mexico. Email: galesic@santafe.edu.
#
Department of Psychology, University of Maryland; and Escuela Superior Politécnica del Litoral, ESPAE Graduate School of Management, Guayaquil, Ecuador
$
Department of Psychology, University of Maryland

Copyright: © 2018. The authors license this article under the terms of the Creative Commons Attribution 3.0 License.

1
Note that “prominence” has been used to simply describe objects with more features, without any value-related meaning.
2
We were inspired by Padgett and Ansell’s (1993) study of social networks between families living in Florence in the 15th century.
3
4
We also attempted to construct a task that would test the effect of another factor that can influence similarity, namely the directionality of comparison. Here, when object a has more features than object b has, the judged similarity of b to a is greater than that of a to b (Tversky, 1977; Tversky & Gati, 1978). However, using our stimulus set, we were not able to construct items in which one citizen was consistently judged to “stand out more” than the other, and therefore we could not create the conditions of the test.
