Judgment and Decision Making, vol. 1, no. 2, November 2006, pp. 146-152.
It must be awful for them: Perspective and task context affects ratings
for health conditions
Heather P. Lacey¹
Dylan M. Smith
Peter A. Ubel
When survey respondents rate the quality of life (QoL) associated with a
health condition, they must not only evaluate the health condition
itself, but must also interpret the meaning of the rating scale in
order to assign a specific value. The way that respondents approach
this task depends on subjective interpretations, resulting in
inconsistent results across populations and tasks. In particular,
patients and non-patients often give very different ratings to health
conditions, a discrepancy that raises questions about the objectivity
of either group's evaluations. In this study, we found that the
perspective of the raters (i.e., their own current health relative to
the health conditions they rated) influences the way they distinguish
between different health states that vary in severity. Consistent with
prospect theory, a mild and a severe lung disease scenario were rated
quite differently by lung disease patients whose own health falls
between the two scenarios, whereas healthy non-patients, whose own
health was better than both scenarios, rated the two scenarios as much
more similar. In addition, we found that the context of the rating
task influences the way participants distinguish between mild and
severe scenarios. Both patients and non-patients gave less distinct
ratings to the two scenarios when each was presented in isolation than
when they were presented alongside other scenarios that provided
contextual information about the possible range of severity for lung
disease. These results raise continuing concerns about the reliability
and validity of subjective QoL ratings, as these ratings are highly
sensitive to differences between respondent groups and the particulars
of the rating task.
Keywords: quality of life, health state measurement, prospect
theory, medical decision making.
Imagine a patient who suffers from lung disease. She experiences shortness
of breath only during heavy physical activity, such as jogging for
three blocks. On a scale of 0 to 100, what is her quality of life
like? And how does her quality of life compare to that of a more
severely ill patient, someone who suffers shortness of breath even in a
resting state?
Figure 1: Predicted differences in quality of life ratings for mild and
severe lung disease scenarios for patients and non-patients, based on
Kahneman & Tversky's (1979) prospect theory.
A respondent in a health survey may find it extremely difficult to come
up with a rating for a health description like this. Surely the first
person is much healthier than the second, but how much healthier? And
how different would their quality of life be? 20 points? 50? 80?
The specific numbers may seem quite arbitrary.
In order to rate health conditions, survey respondents must not only
evaluate how good or bad a condition is, they must then decide how to
translate that evaluation into a specific value on an unfamiliar rating
scale. Because such tasks are subject to individual interpretation,
the specific values assigned to a given health state may depend on who
is doing the rating and the circumstances of the rating task, leaving
much confusion for researchers and policy makers trying to make sense
of the results.
1.1 Personal perspective in health ratings
The uncertainty of health ratings is evident in the differences
often observed between patients' and non-patients' ratings of health
conditions (Boyd et al., 1990; Brickman et al., 1978; Hurst et al.,
1994; Riis et al., 2005; Sackett & Torrance, 1978; Schulz & Decker,
1985; Sieff et al., 1999; Smith et al., in press). Patients
typically rate their condition higher than non-patients, so
explanations for the discrepancy often focus either on patients
overvaluing their health condition or non-patients undervaluing it.
However, the discrepancy between patients' and non-patients'
ratings may actually reflect more complex perspective differences
than a straightforward under- or over-valuing of health
conditions by either group. Kahneman and Tversky's (1979)
prospect theory suggests that an individual's reference point is
critical in determining how he or she evaluates a given state.
As gains or losses grow more distant from the status quo, each
additional unit of change has a diminishing effect on utility. In the case of health,
small changes in health should produce a relatively steep change
in quality of life (QoL), with proportionally smaller impact from
larger changes in health.
Because patients and non-patients have a different status-quo reference
point, they should have different perceptions of the same health
condition. For a patient suffering from a moderately severe case of
lung disease, a milder case of the same disease would represent a gain
in health generating a steep improvement in QoL, whereas a severe case
of lung disease would represent a loss in health with a steep cost in
QoL. By contrast, for a person in full health with no lung disease,
both mild and severe cases of lung disease would represent a loss in
health. Because increasing losses have a diminishing impact, the initial
drop to the mild case would carry a proportionally larger cost in QoL
than the further drop to the more severe case.
As Figure 1 illustrates, the gain and loss framing and the
diminishing-return characteristic of prospect theory predict that
patients may actually give worse ratings to severe conditions than
non-patients do, and that patients should perceive a greater QoL
difference between mild and severe health conditions than do
non-patients. If so, it may be too simplistic to say that patients
overvalue, or non-patients undervalue, the health condition.
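The logic of Figure 1 can be made concrete with a short numerical sketch. This Python code is only an illustration of the argument, not the authors' analysis. It assumes a power-function value function with exponent 0.88 (a conventional illustrative choice, assumed here), deliberately omits loss aversion because the argument above rests only on gain/loss framing and diminishing sensitivity, and places the two scenarios and both raters' reference points at hypothetical positions on an arbitrary 0 to 100 health scale.

```python
def value(x: float, alpha: float = 0.88) -> float:
    """Prospect-theory-style value of a change x from the reference point.

    Gains and losses both follow a power function with exponent alpha < 1,
    so each additional unit of change matters less (diminishing sensitivity).
    Loss aversion is deliberately left out of this sketch.
    """
    if x >= 0:
        return x ** alpha
    return -((-x) ** alpha)


def perceived_gap(reference: float, mild: float, severe: float) -> float:
    """Difference in subjective value between the mild and severe states."""
    return value(mild - reference) - value(severe - reference)


# Hypothetical health levels on an arbitrary 0-100 scale (illustrative only).
MILD, SEVERE = 80.0, 10.0
PATIENT_REF = 45.0   # patient's own health lies between the two scenarios
HEALTHY_REF = 100.0  # healthy rater is better off than both scenarios

patient_gap = perceived_gap(PATIENT_REF, MILD, SEVERE)
healthy_gap = perceived_gap(HEALTHY_REF, MILD, SEVERE)
print(f"patient gap: {patient_gap:.1f}, healthy gap: {healthy_gap:.1f}")
```

With these assumed values the patient, whose reference point sits between the two scenarios, perceives a larger gap than the healthy rater, because both of the patient's comparisons fall on the steep part of the curve near the reference point. The size of each gap depends entirely on the assumed parameters, so the sketch shows the direction of the argument rather than estimating anything.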
1.2 Context in health ratings
Another issue that may complicate interpretation of health state
evaluations is that ratings may depend on the task context. When
rating single items in isolation with no context about how they compare
to alternatives, respondents tend to give noncommittal ratings
somewhere in the middle of the scale, arguably to leave room on either
side for unknown future items (Haubensak, 1992). However, when
multiple items are rated, respondents tend to spread the items somewhat
evenly across the rating scale (Parducci, 1963), essentially using the
items themselves to impose meaning onto the rating scale. These
strategies suggest that people may be attending more to the relative
position of the items than to the specific values associated with the
scale. The evaluability hypothesis (Hsee et al., 1999) suggests that
respondents may draw heavily on such inter-item comparisons,
particularly when the relevant attributes for judgment are unfamiliar
or difficult to evaluate.
A rating task that presents multiple items simultaneously allows
respondents to take relative positioning into account when assigning
values to each item. Rather than dropping items somewhere in the
middle for lack of more information, respondents can use the relative
comparison between items to decide how to place the items on the scale.
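One way to picture these two strategies is a range-frequency style rule in the spirit of Parducci (1963), in which an item's rating combines its position within the range of items presented with its rank among them. The Python sketch below is a simplified illustration, not a fitted model. The equal weighting (`w = 0.5`), the hypothetical underlying health levels for the five scenarios, and the rule that a lone item is placed at the scale midpoint (standing in for the noncommittal middle ratings described by Haubensak, 1992) are all assumptions.

```python
from typing import List


def range_frequency_ratings(stimuli: List[float], w: float = 0.5,
                            scale_max: float = 100.0) -> List[float]:
    """Rate each stimulus on a 0..scale_max scale from its relative position.

    Range value R: where the stimulus falls between the lowest and highest
    stimuli presented. Frequency value F: its (mid-)rank among the stimuli.
    The rating is the weighted compromise w*R + (1 - w)*F.
    """
    n = len(stimuli)
    lo, hi = min(stimuli), max(stimuli)
    if n == 1 or hi == lo:
        # No context to compare against: assumed noncommittal midpoint rating.
        return [scale_max / 2] * n
    ratings = []
    for s in stimuli:
        r = (s - lo) / (hi - lo)
        below = sum(1 for t in stimuli if t < s)
        ties = sum(1 for t in stimuli if t == s) - 1
        f = (below + ties / 2) / (n - 1)
        ratings.append(scale_max * (w * r + (1 - w) * f))
    return ratings


# Hypothetical health levels for Scenarios A (mildest) to E (most severe).
context = range_frequency_ratings([90, 70, 50, 30, 10])
mild_alone = range_frequency_ratings([90])
severe_alone = range_frequency_ratings([10])
print("context:", context)
print("isolated:", mild_alone, severe_alone)
```

Under these assumptions the five scenarios spread across the whole scale, while either extreme scenario rated alone lands at the midpoint. That is an exaggerated version of the compression of isolated ratings that this study tests for.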
1.3 Testing for perspective and context effects
This study looks at how patients and non-patients rate descriptions of
health conditions that differ in severity. We asked lung disease
patients and healthy non-patients to evaluate the quality of life (QoL)
for several scenarios describing different severity levels in lung
disease, ranging from mild to severe. Based on prospect theory, we
predicted that patients' QoL ratings should not be uniformly higher than
non-patients' ratings for all of the lung disease scenarios. Rather,
we predicted that, because most patients' status quo position lies
between the mildest and most severe scenarios, they should perceive a
wide distinction between these two scenarios. Because non-patients
view both scenarios as a loss, they should perceive a much smaller gap
between them. The difference in ratings between the mild and severe
scenarios should be larger for the patients than for the non-patients.
In addition, this study looks at the effect of multiple-item context on
both patients' and non-patients' ratings. Some of our participants
rated only a single lung disease scenario in isolation, a condition we
called the "No Context" condition because no information was provided
about the relative severity of the scenario compared to other possible
cases. Other participants rated multiple scenarios presented together,
each describing a different level of severity. We term this the
"Context" condition because the task places each scenario within a
broader context that conveys the severity of the scenario relative to
the other scenarios.
We predicted that items rated in the No Context condition should be
grouped closer to the center of the rating scale, with relatively small
differences between the mild and severe scenarios. By contrast, items
rated in the Context condition should receive more distinct ratings,
with a greater difference between mild and severe conditions. We also
predicted a greater effect of the rating context for patients than for
non-patients. By virtue of their own experience, patients should bring
some implicit context to the task that is largely unavailable to
non-patients. Patients are more likely to know something about the
possible range of severity than are non-patients. Even when severity
context is not provided explicitly by the task, we anticipated that
patients would be able to draw on that information and make those
comparisons on their own, attenuating the effect of the explicit
information provided in the Context condition.
Lung disease patients. Patient participants were recruited
from a list of 310 potential participants who met eligibility criteria
based on administrative records of the University of Pennsylvania
Health System. Eligible participants had received a diagnosis of
chronic bronchitis or emphysema (as designated by the ICD-9 codes of
491*, 492*, or 496*) and had been seen more than once in a pulmonary
clinic between January 1, 2001 and January 1, 2002. Potential
participants received the survey in the mail with a cover letter
describing the purpose of the study. No financial incentive was
offered. If patients did not return the survey within 3 weeks, they
were sent another copy of the survey. Of the 310 lung disease patients
identified as potential participants, 10 were deceased, 11 could not be
reached due to incorrect addresses, and 2 stated that they did not have
lung disease. Excluding these patients, the response rate was 55%
(N = 159).
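The 55% response rate follows from removing the deceased, unreachable, and misidentified patients from the denominator. A minimal Python check of that arithmetic, using only the counts reported above (the variable names are ours):

```python
# Counts reported for the patient sample.
identified = 310
deceased, bad_address, no_lung_disease = 10, 11, 2
responded = 159

# Exclude patients who could not, or should not, have been surveyed.
eligible = identified - deceased - bad_address - no_lung_disease
response_rate = responded / eligible
print(f"{responded}/{eligible} = {response_rate:.1%}")  # prints 159/287 = 55.4%
```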
Participants ranged in age from 23 to 90 years
(M = 67.5, SD = 11.3). Most
participants were Caucasian (74%) or African American (23%), with
slightly more females (54%) than males. Years of education ranged
from 8 to 21 (M = 13.6, SD = 3.1). Sixty-five
percent of participants indicated that they had emphysema, 17% had
chronic bronchitis, and 29% had asthma. Patients reported their own
QoL as 56.9, on average (SD = 22.8). In comparing their own
health to our five lung scenarios (Appendix 1), 49.6% rated their own
health as better than the middle scenario, Scenario C, and 50.4% rated
their own health as being as bad or worse than this scenario. Only
11% described their health as "excellent" or "very good," while
38% described it as "good," 34% described it as "fair," and 16%
described it as "poor." None of these self-rated health measures was
significantly related to the outcome variables of interest, the QoL
ratings for the mild or severe lung disease scenario.
Healthy participants. Healthy participants were recruited from
a pool of prospective jurors at the Philadelphia County Courthouse. In
Philadelphia County, prospective jurors are selected from voter
registration and driver's license records. Surveys were distributed to
interested jurors after announcing to all prospective jurors that those
who filled out a survey would receive a candy bar.
Among the prospective jurors, 240 volunteers completed the survey.
Participants were asked in the survey whether they had any personal
experience with lung disease, and only those who indicated no such
experience (N = 196) were included for analysis in this study.
Among these, participants ranged in age from 18 to 83 years
(M = 39.9, SD = 13.1), and were
predominantly Caucasian (50%) or African American (43%), with more
females (69%) than males. Years of education ranged from 9 to 21
(M = 14.4, SD = 2.5).
The patient and non-patient samples were significantly different on
several demographic dimensions. The non-patient group was
significantly younger and more educated than the patient group and
included significantly more women, more African Americans, and fewer
Caucasians than the patient group. However, of these variables only
one, age, was significantly related to one of the outcome variables of
interest, QoL for the mild scenario. The pattern of results was
unchanged when these demographic variables were included as covariates
in analyses comparing patients and non-patients.
2.2 Survey materials and procedures
Survey materials included scenarios describing lung conditions with
different levels of severity (see Appendix 1 for all scenarios). Each
lung condition scenario described the level of activity that would
cause a person with that condition to become short of breath. For
example, the scenario for the most severe lung condition stated, "This
person has a lung condition that causes him to become short of breath
even when in a resting state. In other words, he is short of breath
just sitting in a chair. Occasionally, his shortness of breath
interferes with his sleep." Participants were asked to provide QoL
estimates on a scale from 0 (as bad as death) to 100
(perfect health) for one or more lung disease scenarios.
Participants were randomly assigned to either the Context condition or
the No Context condition. In the Context condition, participants read
and rated five different scenarios, presented in order from least
severe to most severe.² In the
No Context condition, participants read and rated only one scenario,
either the least severe (shortness of breath only after extreme
exertion) or the most severe (shortness of breath in a resting
state), and were provided with no information about other possible
scenarios or the relative severity of the condition. Participants in
the No Context condition were randomly assigned to either the mild or
the severe survey version. Participants first received instructions
for the task and read over their one or five scenarios, then received
the scenario(s) a second time to rate.
Patient participants also received several items addressing their own
health, including (1) Current QoL: patients rated their own
QoL using the same 0 to 100 scale used for scenario ratings;
(2) Current lung disease description: patients saw the
same five lung disease scenarios used as contextual severity
information in the Context condition (see Appendix 1) and were asked
to identify which of the five was most similar to their own lung
condition. Patients selected one of 7 response options (better than
Scenario A; about the same as Scenario A, B, C, D, or E; or worse than
Scenario E); and (3) SF-1 general health evaluation (Ware &
Sherbourne, 1992): patients categorized their own general health as
excellent, very good, good, fair, or poor.
Finally, all participants were asked for demographic information,
including age, gender, race, and educational background. Healthy
participants were also asked whether they had personal experience with
lung disease.
Figure 2: Patients' and non-patients' quality of life ratings for mild
and severe lung disease scenarios, presented alone or in the context of
other scenarios.
3.1 Comparing patients' and non-patients' ratings
We hypothesized that non-patients would distinguish less
between mild and severe scenarios than patients. Consistent with this
hypothesis, the difference in ratings for the mild and severe scenarios
in the No Context condition was only 16 points for healthy
non-patients, versus 29 points for patients. Healthy participants' mean
QoL ratings for the mild (M = 54.9, SD = 17.4) and severe (M =
39.1, SD = 20.1) scenarios were significantly different, t(127) =
4.66, p < .001, as were patients' ratings (M = 70.3, SD =
20.9 for mild, M = 41.6, SD = 25.3 for
severe), t(97) = 6.13, p < .001, but the
effect was significantly larger for patients, F(1, 224) =
5.23, p = .02, η² = .02.
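The effect size reported with the F test can be recovered from the test statistic itself. Assuming the reported value is partial eta squared (our assumption; the text does not say which variant was used), it equals F times the effect degrees of freedom, divided by that product plus the error degrees of freedom. A minimal Python sketch:

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Group x scenario interaction in the No Context condition: F(1, 224) = 5.23.
eta_sq = partial_eta_squared(5.23, 1, 224)
print(f"{eta_sq:.3f}")  # prints 0.023, which rounds to the reported .02
```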
3.2 Comparing Context and No Context rating tasks
We hypothesized that participants would distinguish more between mild
and severe lung disease scenarios in the Context condition, where
multiple scenarios were presented together to provide contextual
information about relative severity. As predicted, the contextual
information increased the difference in ratings between mild and severe
scenarios. Collapsing across the two participant groups, the
difference in ratings increased from 21 points in the No Context
condition to 54 points in the Context condition, t(332) =
7.49, p < .001. Mean QoL ratings for the mild
scenario were significantly higher in the Context condition (M
= 69.89, SD = 23.61) than in the No Context condition
(M = 61.48, SD = 20.39), t(220) = 2.84,
p = .005. Conversely, mean QoL ratings for the severe
scenario were significantly lower in the Context condition (M
= 21.3, SD = 22.87) than in the No Context condition
(M = 53.90, SD = 23.34), t(227) = 7.92, p < .001.
3.3 Comparing context effects for patients and non-patients
We hypothesized that the introduction of context information would
affect non-patients' ratings more than patients' ratings, because
patients should have some implicit context information about their own
disease, even in the No Context condition. Contrary to this
hypothesis, there was a non-significant trend toward a larger
context effect for patients than for non-patients. Figure 2 shows
that, for non-patients, the difference between mild and severe ratings
grew from 16 points in the No Context condition to 45 points in the
Context condition, whereas for patients, the difference grew from
29 points to 67 points, t(332) = 1.01, p = .31. The
effect of context was significant for both patients, t(332) =
5.59, p < .001, and non-patients,
t(332) = 4.98, p < .001.
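The comparison in this section is a difference of differences: each group's context effect is its Context gap minus its No Context gap, and the group comparison contrasts those two effects. A short Python sketch of that arithmetic using the rounded gaps reported above (it reproduces point estimates only, not the t tests, which require the raw data; the dictionary layout is ours):

```python
# Mild-minus-severe rating gaps (QoL points), as reported in the text.
gaps = {
    "non-patients": {"no_context": 16, "context": 45},
    "patients": {"no_context": 29, "context": 67},
}

# Context effect per group: how much the gap widens when context is given.
effects = {group: g["context"] - g["no_context"] for group, g in gaps.items()}

# Interaction contrast: patients' context effect minus non-patients'.
interaction = effects["patients"] - effects["non-patients"]
print(effects, interaction)
```

The 9-point larger widening for patients (38 versus 29 points) is the non-significant trend described above.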
To summarize, we found that patients give more distinct ratings to mild
and severe health state scenarios than do non-patients, consistent with
prospect theory. We also found that both patients and non-patients
give more distinct ratings to mild and severe scenarios when multiple
scenarios are presented together, providing contextual information
about the health state and the range of severity associated with the
condition. Finally, we expected non-patients' ratings to be affected
more by context than patients' ratings, but this prediction did not
bear out. The effect of context was not significantly different for
the two groups, and in fact, there was a non-significant trend toward a
larger effect of context for the patient group.
Because there is no way to objectively measure quality of life,
researchers working to understand how health influences quality of life
are forced to rely on subjective judgments. By their nature, these
judgments are based on personal interpretation, making it difficult to
compare judgments across individuals, across groups, or across
Previous studies have demonstrated that personal health history
influences judgments of health conditions, with patients typically
giving more positive ratings to health conditions than non-patients.
This study provides evidence that this patient vs. non-patient
discrepancy is not unidirectional; lung disease patients in this study
rated severe conditions more negatively than did non-patients.
The way patients and non-patients in this study distinguished between
mild and severe conditions was consistent with prospect theory
(Kahneman & Tversky, 1979). Almost all of the respondents in our
patient group rated their health somewhere between the most severe and
the least severe lung disease. From this perspective, the mild
scenario looks like a dramatic improvement and the severe scenario
looks like a dramatic drop, spreading the two scenarios relatively far
apart on a QoL scale. For non-patients, both conditions are a loss in
health, with the most dramatic cost in QoL associated with the initial
drop to the mild condition, placing the two scenarios relatively close
together on the QoL scale. In the No Context condition, our healthy
participants estimated only a 16-point difference (out of 100) in
quality of life between a patient who experiences shortness of breath
while resting in a chair (an extremely severe degree of lung disease
suffered by only about 5% of our patient population) and a patient
who suffers shortness of breath only after jogging three blocks, a
level of fitness that likely exceeds that of most Americans. Our
patient participants estimated a larger 29-point difference between
these same two scenarios.
This study also explored how ratings are affected by the rating task
itself, specifically, the context in which a scenario is presented.
Hsee and colleagues (1999) found that ratings made in isolation differ
from ratings made alongside other items, particularly when the items
are difficult to evaluate. Multiple items presented together provide
information about possible alternatives, helping raters understand
whether a given item is good or bad, big or small, a lot or a little.
In the case of our lung disease scenarios, the evaluability of lung
disease severity should not have been especially poor. Rather than
using unfamiliar measurement units to describe severity, such as
providing some metric of lung capacity, the scenarios specified
familiar types of physical activity that would cause shortness of
breath. Nevertheless, despite these intuitive descriptions of lung
disease severity, contextual information influenced ratings, arguably by
providing useful information about the range of severity that can be
expected for the disease. Across the two respondent groups, ratings
for the mild and severe conditions were more distinct when made in the
context of multiple scenarios, with a 21-point difference in the No
Context condition and a 54-point difference in the Context condition.
The results of this study did pose one surprise. We anticipated that
non-patients would be more affected by context than patients. If
context helps participants evaluate the conditions by providing
information about the range of alternatives, then patients should be
less affected by this additional information, as their own experience
should provide some information about the range even in the No Context
condition. We found no evidence of an attenuated context effect for
patients. If anything, patients showed a slightly larger effect of
context, though the interaction of group and context was not
significant.
Why were lung disease patients influenced by contextual information as
much as or more than non-patients? One possibility is that, while
patients have a good deal of information about lung disease and the
range of severity associated with it, they may not always access this
information when evaluating the lung disease scenarios. Judgments are
highly influenced by whatever information is most active and accessible
in memory (Tversky & Kahneman, 1973). When patients in the Context
condition were cued to think about the range of severity for lung
disease, this range should have become a highly active feature of the
disease that strongly influenced judgments, whereas severity range
might have been only one of many features that came to mind for
patients in the No Context condition who were not explicitly cued to
think about it.
Over the last several years, the positive psychology movement has
inspired more researchers to investigate the factors that influence
well-being and the mechanisms behind people's remarkable capacity to
adapt to adverse circumstances. A continuing concern about this
research emerges from the subjective nature of the available measures
of happiness, quality of life, and related constructs. Conclusions
about what does or does not influence well-being rely on subjective
self-reports, reports that are often malleable.
In the health domain, another concern arises from the application of
health-related quality of life data to cost-effectiveness analyses.
The discrepancy between patients' and non-patients' QoL ratings has led
to some discussion in the literature as to whether health care analyses
ought to incorporate evaluations made by patients or by the general
public (Boyd et al., 1990; Dolan, 1996; Gold et al., 1996; Ubel,
Loewenstein, & Jepson, 2003). This question is further complicated by
the evidence presented here. The discrepancy in ratings cannot be
easily characterized as an overestimation by patients or an
underestimation by non-patients. Rather, ratings seem to depend on the
relative position of the rater and the health condition in question.
Because both patients' and non-patients' ratings are remarkably
malleable, dramatically influenced by the context in which scenarios
were rated, this study cannot resolve the question of whose ratings
are more accurate or more reliable in evaluating health states.
Rather, this study highlights the difficulty of comparing the two
groups or of drawing conclusions about whose evaluations are more
valid.
These results suggest that researchers should take great care and
consider the details of the rating task when soliciting QoL estimates.
Whether the research goals are a theoretical understanding of
well-being or an applied effort to improve quality of life, we must
exercise caution in making conclusions based on subjective reports.
Boyd, N. F., Sutherland, H. J., Heasman, K. Z., Tritchler, D. L., &
Cummings, B. J. (1990). Whose utilities for decision analysis?
Medical Decision Making, 10, 58-67.
Brickman, P., Coates, D., & Janoff-Bulman, R. (1978). Lottery winners
and accident victims: Is happiness relative? Journal of
Personality & Social Psychology, 36, 917-927.
Dolan, P. (1996). The effect of experience of illness on health state
valuations. Journal of Clinical Epidemiology, 49, 551-564.
Gold, M. R., Siegel, J. E., Russell, L. B., & Weinstein, M. C. (1996).
Cost-effectiveness in health and medicine. New York: Oxford
University Press.
Haubensak, G. (1992). The consistency model: A process for absolute
judgments. Journal of Experimental Psychology: Human
Perception and Performance, 18, 303-309.
Hsee, C. K., Loewenstein, G. F., Blount, S., & Bazerman, M. H. (1999).
Preference reversals between joint and separate evaluations of options:
A review and theoretical analysis. Psychological Bulletin, 125,
576-590.
Hurst, N. P., Jobanputra, P., Hunter, M., Lambert, C. M., Lockhead, A., &
Brown, H. (1994). Validity of EuroQoL--a generic health status
instrument in patients with rheumatoid arthritis. British
Journal of Rheumatology, 33, 655-662.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of
decision under risk. Econometrica, 47, 263-291.
Parducci, A. (1963). Range-frequency compromise in judgment.
Psychological Monographs, 77(2, Whole No. 565).
Riis, J., Loewenstein, G., Baron, J., Jepson, C., Fagerlin, A., & Ubel,
P. A. (2005). Ignorance of hedonic adaptation to hemodialysis: A study
using ecological momentary assessment. Journal of Experimental
Psychology: General, 134, 3-9.
Sackett, D. L., & Torrance, G. W. (1978). The utility of different health
states as perceived by the general public. Journal of Chronic
Diseases, 31, 697-704.
Schulz, R., & Decker, S. (1985). Long-term adjustment to physical
disability: The role of social support, perceived control, and
self-blame. Journal of Personality and Social Psychology, 48,
1162-1172.
Sieff, E. M., Dawes, R. M., & Loewenstein, G. (1999). Anticipated
versus actual reaction to HIV test results. American Journal of
Psychology, 112, 297-311.
Smith, D. M., Sherriff, R. L., Damschroder, L., Loewenstein, G., &
Ubel, P. A. (in press). Former patients give lower utility
ratings for colostomy than do current patients: Evidence for
theory driven recall bias. Health Psychology.
Tversky, A., & Kahneman, D. (1973). Availability: A heuristic for
judging frequency and probability. Cognitive Psychology, 5,
207-232.
Ubel, P. A., Loewenstein, G., & Jepson, C. (2003). Whose quality of
life? A commentary exploring discrepancies between health state
evaluations of patients and the general public. Quality of Life
Research, 12, 599-607.
Ware, J. E., & Sherbourne, C. D. (1992). The MOS 36-item short form health
survey (SF-36). I. Conceptual framework and item selection.
Medical Care, 30, 473-483.
4.1.1 Scenario A
This person has a lung condition that causes him to become short of
breath only after extreme exertion, like jogging 3 blocks, carrying a
heavy basket of laundry up two flights of stairs, or shoveling snow for
4.1.2 Scenario B
This person has a lung condition that causes him to become short of
breath after walking briskly for 2 blocks or walking up one flight of
stairs.
4.1.3 Scenario C
This person has a lung condition that causes him to become short of
breath after walking slowly for 1 block. He must rest while walking up
a flight of stairs.
4.1.4 Scenario D
This person has a lung condition that causes him to become short of
breath after walking across a room. He is unable to walk up stairs.
4.1.5 Scenario E
This person has a lung condition that causes him to become short of
breath even when in a resting state. In other words, he is short of
breath just sitting in a chair. Occasionally, his shortness of breath
interferes with his sleep.
This work was funded by R01HD040789, R01HD038963. Heather
P. Lacey was supported by a HSR&D post-doctoral fellowship from
the Department of Veterans Affairs, and Angela Fagerlin and Dylan
M. Smith were supported by MREP early career development awards
from the Department of Veterans Affairs.
Address: Department of Applied Psychology,
1150 Douglas Pike,
Smithfield, RI 02864,
²Two additional survey variations were
given to additional healthy participants, but due to a limited sample
size for patients, these versions were not given to patient
participants, so patient vs. healthy participant comparisons are not
possible for these conditions. The first of these variations was
similar to the Context condition in that participants read all five
scenarios (see Appendix 1), but different in that participants were only
asked to rate one of the five. This variation was introduced to test
whether any effects of Rating Context could be attributed strictly to
range-frequency scale usage effects, with participants
spreading their ratings evenly across the rating scale (e.g., Parducci,
1963). Results in this condition closely approximated those found in
the Context condition reported in this study, suggesting that the
Rating Context effects occur even when only a single item is rated.
The second variation was a reverse ordering of the scenarios in the
Context condition, with scenarios presented from most severe to least
severe. Presentation order did not affect ratings for any of the
scenarios. Since neither of these manipulations affected the results
for healthy participants, they were omitted from the design in order to
maximize the limited patient sample in other conditions.