Judgment and Decision Making, vol. 1, no. 2, November 2006, pp. 146-152.

It must be awful for them: Perspective and task context affects ratings for health conditions

Heather P. Lacey¹

Angela Fagerlin

George Loewenstein

Dylan M. Smith

Jason Riis

Peter A. Ubel

Abstract

When survey respondents rate the quality of life (QoL) associated with a health condition, they must not only evaluate the health condition itself, but must also interpret the meaning of the rating scale in order to assign a specific value. The way that respondents approach this task depends on subjective interpretations, resulting in inconsistent results across populations and tasks. In particular, patients and non-patients often give very different ratings to health conditions, a discrepancy that raises questions about the objectivity of either group's evaluations. In this study, we found that the perspective of the raters (i.e., their own current health relative to the health conditions they rated) influences the way they distinguish between different health states that vary in severity. Consistent with prospect theory, a mild and a severe lung disease scenario were rated quite differently by lung disease patients, whose own health fell between the two scenarios, whereas healthy non-patients, whose own health was better than both scenarios, rated the two scenarios as much more similar. In addition, we found that the context of the rating task influences the way participants distinguish between mild and severe scenarios. Both patients and non-patients gave less distinct ratings to the two scenarios when each was presented in isolation than when they were presented alongside other scenarios that provided contextual information about the possible range of severity for lung disease. These results raise continuing concerns about the reliability and validity of subjective QoL ratings, as these ratings are highly sensitive to differences between respondent groups and the particulars of the rating task.

Keywords: quality of life, health state measurement, prospect theory, medical decision making.

1 Introduction

Imagine a patient who suffers from lung disease. She suffers shortness of breath only during heavy physical activity, such as jogging for three blocks. On a scale of 0 to 100, what is her quality of life like? And how does her quality of life compare to that of a more severely ill patient, someone who suffers shortness of breath even in a resting state?

Figure 1: Predicted differences in quality of life ratings for mild and severe lung disease scenarios for patients and non-patients, based on Kahneman & Tversky's (1979) prospect theory.

A respondent in a health survey may find it extremely difficult to come up with a rating for a health description like this. Surely the first person is much healthier than the second, but how much healthier? And how different would their quality of life be? 20 points? 50? 80? The specific numbers may seem quite arbitrary.

In order to rate health conditions, survey respondents must not only evaluate how good or bad a condition is, they must then decide how to translate that evaluation into a specific value on an unfamiliar rating scale. Because such tasks are subject to individual interpretation, the specific values assigned to a given health state may depend on who is doing the rating and the circumstances of the rating task, leaving much confusion for researchers and policy makers trying to make sense of the results.

1.1 Personal perspective in health ratings

The uncertainty of health ratings is evident in the differences often observed between patients' and non-patients' ratings of health conditions (Boyd et al., 1990; Brickman et al., 1978; Hurst et al., 1994; Riis et al., 2005; Sackett & Torrance, 1978; Schulz & Decker, 1985; Sieff et al., 1999; Smith et al., in press). Patients typically rate their condition higher than non-patients, so explanations for the discrepancy often focus either on patients overvaluing their health condition or non-patients undervaluing it.

However, the discrepancy between patients' and non-patients' ratings may actually reflect more complex perspective differences than a straightforward under- or over-valuing of health conditions by either group. Kahneman and Tversky's (1979) prospect theory suggests that an individual's reference point is critical in determining how he or she evaluates a given state. As gains or losses become more distant from the status quo, they have a diminishing effect on utility. In the case of health, small changes in health should produce a relatively steep change in quality of life (QoL), with proportionally smaller impact from larger changes in health.

Because patients and non-patients have a different status-quo reference point, they should have different perceptions of the same health condition. For a patient suffering from a moderately severe case of lung disease, a milder case of the same disease would represent a gain in health generating a steep improvement in QoL, whereas a severe case of lung disease would represent a loss in health with a steep cost in QoL. By contrast, for a person in full health with no lung disease, both mild and severe cases of lung disease would represent a loss in health. Because increasing losses have a diminishing impact, the mild case would have a proportionally larger cost in QoL than the more severe case.

As Figure 1 illustrates, the gain and loss framing and the diminishing-return characteristic of prospect theory predict that patients may actually give worse ratings to severe conditions, and that patients should perceive a greater QoL difference between mild and severe health conditions than do non-patients. If so, it may be too simplistic to say that patients overvalue, or non-patients undervalue, the health condition.
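The prediction about perceived differences can be illustrated numerically. The following is a minimal sketch, not the authors' model: it assumes a symmetric power value function with exponent 0.5, and the health levels (100 for full health, 80 for the mild scenario, 50 for a typical patient, 20 for the severe scenario) are hypothetical numbers chosen only for illustration. It compares the gap each group would perceive between the two scenarios, not absolute ratings.

```python
import math

def value(health, reference, alpha=0.5):
    """Reference-dependent value with diminishing sensitivity (0 < alpha < 1).

    Gains and losses are measured from the rater's own reference point.
    The symmetric power form and alpha are illustrative assumptions.
    """
    delta = health - reference
    return math.copysign(abs(delta) ** alpha, delta)

def perceived_gap(mild, severe, reference, alpha=0.5):
    """Difference in value between the mild and the severe scenario."""
    return value(mild, reference, alpha) - value(severe, reference, alpha)

# Hypothetical health levels on a 0-100 scale (assumptions, not study data).
MILD, SEVERE = 80, 20
PATIENT_REF, HEALTHY_REF = 50, 100

patient_gap = perceived_gap(MILD, SEVERE, PATIENT_REF)   # one gain, one loss
healthy_gap = perceived_gap(MILD, SEVERE, HEALTHY_REF)   # two losses

print(f"patient gap: {patient_gap:.2f}")
print(f"healthy gap: {healthy_gap:.2f}")
```

Under these assumptions the patient, whose reference point lies between the two scenarios, perceives a gap more than twice as large as the healthy rater, for whom both scenarios fall on the flattening part of the loss curve.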

1.2 Context in health ratings

Another issue that may complicate interpretation of health state evaluations is that ratings may depend on the task context. When rating a single item in isolation, with no context about how it compares to alternatives, respondents tend to give noncommittal ratings somewhere in the middle of the scale, arguably to leave room on either side for unknown future items (Haubensak, 1992). However, when multiple items are rated, respondents tend to spread the items somewhat evenly across the rating scale (Parducci, 1963), essentially using the items themselves to impose meaning onto the rating scale. These strategies suggest that people may be attending more to the relative position of the items than to the specific values associated with the scale. The evaluability hypothesis (Hsee et al., 1999) suggests that respondents may draw heavily on such inter-item comparisons, particularly when the relevant attributes for judgment are unfamiliar or difficult to evaluate.

A rating task that presents multiple items simultaneously allows respondents to take relative positioning into account when assigning values to each item. Rather than dropping items somewhere in the middle for lack of more information, respondents can use the relative comparison between items to decide how to place the items on the scale.
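The tendency to spread items across the scale can be sketched with the standard form of a range-frequency compromise, in which a rating is a weighted average of an item's position within the range of items and its rank among them. The weight of 0.5, the 0 to 100 scale, and the stimulus values below are assumptions for illustration only; they are not taken from this study.

```python
def range_frequency(values, w=0.5, scale_max=100):
    """Rate each item as a weighted mix of its range and rank position.

    Assumes at least two items with distinct values. w is the weight on
    the range component; (1 - w) is the weight on the rank component.
    """
    lo, hi = min(values), max(values)
    order = sorted(values)
    n = len(values)
    ratings = []
    for v in values:
        range_pos = (v - lo) / (hi - lo)        # position within the range
        rank_pos = order.index(v) / (n - 1)     # position within the ranking
        ratings.append(scale_max * (w * range_pos + (1 - w) * rank_pos))
    return ratings

# Hypothetical item values: three clustered near the bottom, one far above.
stimuli = [0, 1, 2, 10]
ratings = range_frequency(stimuli)
print([round(r, 1) for r in ratings])
```

With these inputs the second item sits at 10% of the range but receives a rating near 22, because its rank pulls it toward an even spacing across the scale.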

1.3 Testing for perspective and context effects

This study looks at how patients and non-patients rate descriptions of health conditions that differ in severity. We asked lung disease patients and healthy non-patients to evaluate the quality of life (QoL) for several scenarios describing different severity levels in lung disease, ranging from mild to severe. Based on prospect theory, we predicted that patients' QoL ratings should not be uniformly higher than non-patients' ratings for all of the lung disease scenarios. Rather, we predicted that, because most patients' status quo position lies between the mildest and most severe scenarios, they should perceive a wide distinction between these two scenarios. Because non-patients view both scenarios as a loss, they should perceive a much smaller gap between them. The difference in ratings between the mild and severe scenarios should be larger for the patients than for the non-patients.

In addition, this study looks at the effect of multiple-item context on both patients' and non-patients' ratings. Some of our participants rated only a single lung disease scenario in isolation, a condition we called the "No Context" condition because no information was provided about the relative severity of the scenario compared to other possible cases. Other participants rated multiple scenarios presented together, each describing a different level of severity. We term this the "Context" condition because the task places each scenario within a broader context that conveys the severity of the scenario relative to other cases.

We predicted that items rated in the No Context condition should be grouped closer to the center of the rating scale, with relatively small differences between the mild and severe scenarios. By contrast, items rated in the Context condition should receive more distinct ratings, with a greater difference between mild and severe conditions. We also predicted a greater effect of the rating context for non-patients than for patients. By virtue of their own experience, patients should bring some implicit context to the task that is largely unavailable to non-patients. Patients are more likely than non-patients to know something about the possible range of severity. Even when severity context is not provided explicitly by the task, we anticipated that patients would be able to draw on that information and make those comparisons on their own, attenuating the effect of the explicit information provided in the Context condition.

2 Method

2.1 Participants

Lung disease patients. Patient participants were recruited from a list of 310 potential participants who met eligibility criteria based on administrative records of the University of Pennsylvania Health System. Eligible participants had received a diagnosis of chronic bronchitis or emphysema (as designated by the ICD-9 codes of 491*, 492*, or 496*) and had been seen more than once in a pulmonary clinic between January 1, 2001 and January 1, 2002. Potential participants received the survey in the mail with a cover letter describing the purpose of the study. No financial incentive was offered. If patients did not return the survey within 3 weeks, they were sent another copy of the survey. Of the 310 lung disease patients identified as potential participants, 10 were deceased, 11 could not be reached due to incorrect addresses, and 2 stated that they did not have lung disease. Excluding these patients, the response rate was 55% (N = 159).
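The response rate follows directly from the counts given above. The short calculation below simply restates that arithmetic; the variable names are ours.

```python
identified = 310                       # potential participants on the list
deceased, bad_address, no_lung_disease = 10, 11, 2
respondents = 159

# Exclude people who could not have taken part before computing the rate.
eligible = identified - (deceased + bad_address + no_lung_disease)
response_rate = respondents / eligible

print(f"eligible = {eligible}, response rate = {response_rate:.1%}")
# eligible = 287, response rate = 55.4%
```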

Participants ranged in age from 23 to 90 years (M = 67.5, SD = 11.3). Most participants were Caucasian (74%) or African American (23%), with slightly more females (54%) than males. Years of education ranged from 8 to 21 (M = 13.6, SD = 3.1). Sixty-five percent of participants indicated that they had emphysema, 17% had chronic bronchitis, and 29% had asthma. Patients reported their own QoL as 56.9, on average (SD = 22.8). In comparing their own health to our five lung scenarios (see Appendix 1), 49.6% rated their own health as better than the middle scenario, Scenario C, and 50.4% rated their own health as being as bad as or worse than this scenario. Only 11% described their health as "excellent" or "very good," while 38% described it as "good," 34% described it as "fair," and 16% described it as "poor." None of these self-rated health measures was significantly related to the outcome variables of interest, the QoL ratings for the mild or severe lung disease scenario.

Healthy participants. Healthy participants were recruited from a pool of prospective jurors at the Philadelphia County Courthouse. In Philadelphia County, prospective jurors are selected from voter registration and driver's license records. Surveys were distributed to interested jurors after announcing to all prospective jurors that those who filled out a survey would receive a candy bar.

Among the prospective jurors, 240 volunteers completed the survey. Participants were asked in the survey whether they had any personal experience with lung disease, and only those who indicated no such experience (N = 196) were included for analysis in this study. Among these, participants ranged in age from 18 to 83 years (M = 39.9, SD = 13.1), and were predominantly Caucasian (50%) or African American (43%), with more females (69%) than males. Years of education ranged from 9 to 21 (M = 14.4, SD = 2.5).

The patient and non-patient samples were significantly different on several demographic dimensions. The non-patient group was significantly younger and more educated than the patient group and included significantly more women, more African Americans, and fewer Caucasians than the patient group. However, of these variables only one, age, was significantly related to one of the outcome variables of interest, QoL for the mild scenario. The pattern of results was unchanged when these demographic variables were included as covariates in analyses comparing patients and non-patients.

2.2 Survey materials and procedures

Survey materials included scenarios describing lung conditions with different levels of severity (see Appendix 1 for all scenarios). Each lung condition scenario described the level of activity that would cause a person with that condition to become short of breath. For example, the scenario for the most severe lung condition stated, "This person has a lung condition that causes him to become short of breath even when in a resting state. In other words, he is short of breath just sitting in a chair. Occasionally, his shortness of breath interferes with his sleep." Participants were asked to provide QoL estimates on a scale from 0 (as bad as death) to 100 (perfect health) for one or more lung disease scenarios.

Participants were randomly assigned to either the Context condition or the No Context condition. In the Context condition, participants read and rated five different scenarios, presented in order from least severe to most severe.² In the No Context condition, participants read and rated only one scenario, either the least severe (shortness of breath only after extreme exertion) or the most severe (shortness of breath in a resting state), and were provided with no information about other possible scenarios or the relative severity of the condition. Participants in the No Context condition were randomly assigned to either the mild or the severe survey version. All participants first received instructions for the task and read through their one or five scenarios, and were then shown the scenario(s) a second time to rate.

Patient participants also received several items addressing their own health, including, 1) Current QoL: patients rated their own QoL using the same 0 to 100 scale used for scenario ratings, 2) Current lung disease description: patients saw the same five lung disease scenarios used as contextual severity information in the Context condition (see Appendix 1), and were asked to identify which of the five was most similar to their own lung condition. Patients selected one of 7 response options (better than Scenario A, about the same as Scenario A, B, C, D, or E, or worse than Scenario E), 3) SF-1 general health evaluation (Ware & Sherbourne, 1992): patients categorized their own general health as excellent, very good, good, fair, or poor.

Finally, all participants were asked for demographic information, including age, gender, race, and educational background. Healthy participants were also asked whether they had personal experience with lung disease.

Figure 2: Patients' and non-patients' quality of life ratings for mild and severe lung disease scenarios, presented alone or in the context of other scenarios.

3 Results

3.1 Comparing patients' and non-patients' ratings

We hypothesized that non-patients would distinguish less between mild and severe scenarios than patients. Consistent with this hypothesis, the difference in ratings for the mild and severe scenarios in the No Context condition was only 16 points for healthy non-patients, versus 29 points for patients. Mean QoL ratings for the mild and severe scenarios were significantly different both for healthy participants (M = 54.9, SD = 17.4 for mild, M = 39.1, SD = 20.1 for severe), t(127) = 4.66, p < .001, and for patients (M = 70.3, SD = 20.9 for mild, M = 41.6, SD = 25.3 for severe), t(97) = 6.13, p < .001, but the effect was significantly larger for patients, F(1, 224) = 5.23, p = .02, η² = .02.
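The point differences and the size of the interaction effect can be checked from the reported summary statistics. This is a minimal sketch that assumes the reported effect size is a partial eta squared, recovered from the F ratio and its degrees of freedom; that reading is our assumption, since the text does not name the statistic.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Effect size recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# No Context means reported above (mild, severe).
mild_healthy, severe_healthy = 54.9, 39.1
mild_patient, severe_patient = 70.3, 41.6

gap_healthy = mild_healthy - severe_healthy
gap_patient = mild_patient - severe_patient
eta_sq = partial_eta_squared(5.23, 1, 224)

print(f"healthy gap: {gap_healthy:.1f} points")
print(f"patient gap: {gap_patient:.1f} points")
print(f"partial eta squared: {eta_sq:.3f}")
```

The gaps round to the 16 and 29 points quoted in the text, and the effect size rounds to .02.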

3.2 Comparing Context and No Context rating tasks

We hypothesized that participants would distinguish more between mild and severe lung disease scenarios in the Context condition, where multiple scenarios were presented together to provide contextual information about relative severity. As predicted, the contextual information increased the difference in ratings between mild and severe scenarios. Collapsing across the two participant groups, the difference in ratings increased from 21 points in the No Context condition to 54 points in the Context condition, t(332) = 7.49, p < .001. Mean QoL ratings for the mild scenario were significantly higher in the Context condition (M = 69.89, SD = 23.61) than in the No Context condition (M = 61.48, SD = 20.39), t(220) = 2.84, p = .005. Conversely, mean QoL ratings for the severe scenario were significantly lower in the Context condition (M = 21.3, SD = 22.87) than in the No Context condition (M = 53.90, SD = 23.34), t(227) = 7.92, p < .001.
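The mild-scenario comparison can be approximately reconstructed from its summary statistics with a pooled-variance t test. The group sizes are not reported for this comparison, so the sketch below assumes two equal groups of 111, chosen only because they match the reported 220 degrees of freedom; the true cell sizes may differ.

```python
import math

def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t statistic with a pooled variance estimate."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

# The two mild-scenario means and SDs reported above.
# n = 111 per group is an assumption, not a reported value.
t_stat, df = pooled_t(69.89, 23.61, 111, 61.48, 20.39, 111)

print(f"t({df}) = {t_stat:.2f}")
```

Under the equal-groups assumption the result agrees with the reported t(220) = 2.84.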

3.3 Comparing context effects for patients and non-patients

We hypothesized that the introduction of context information would affect non-patients' ratings more than patients' ratings, because patients should have some implicit context information about their own disease, even in the No Context condition. Contrary to this hypothesis, there was a non-significant trend toward a larger context effect for patients than for non-patients. Figure 2 shows that, for non-patients, the difference between mild and severe ratings grew from 16 points in the No Context condition to 45 points in the Context condition, whereas for patients, the difference grew from 29 points to 67 points, t(332) = 1.01, p = .31. The effect of context was significant both for patients, t(332) = 5.59, p < .001, and for non-patients, t(332) = 4.98, p < .001.

To summarize, we found that patients give more distinct ratings to mild and severe health state scenarios than do non-patients, consistent with prospect theory. We also found that both patients and non-patients give more distinct ratings to mild and severe scenarios when multiple scenarios are presented together, providing contextual information about the health state and the range of severity associated with the condition. Finally, we expected non-patients' ratings to be affected more by context than patients' ratings, but this prediction was not borne out. The effect of context was not significantly different for the two groups, and in fact, there was a non-significant trend toward a larger effect of context for the patient group.

4 Discussion

Because there is no way to objectively measure quality of life, researchers working to understand how health influences quality of life are forced to rely on subjective judgments. By their nature, these judgments are based on personal interpretation, making it difficult to compare judgments across individuals, across groups, or across different tasks.

Previous studies have demonstrated that personal health history influences judgments of health conditions, with patients typically giving more positive ratings to health conditions than non-patients. This study provides evidence that this patient vs. non-patient discrepancy is not unidirectional; lung disease patients in this study rated severe conditions more negatively than did non-patients.

The way patients and non-patients in this study distinguished between mild and severe conditions was consistent with prospect theory (Kahneman & Tversky, 1979). Almost all of the respondents in our patient group rated their health somewhere between the most severe and the least severe lung disease. From this perspective, the mild scenario looks like a dramatic improvement and the severe scenario looks like a dramatic drop, spreading the two scenarios relatively far apart on a QoL scale. For non-patients, both conditions are a loss in health, with the most dramatic cost in QoL associated with the initial drop to the mild condition, placing the two scenarios relatively close together on the QoL scale. In the No Context condition, our healthy participants estimated only a 16 point difference (out of 100) in quality of life between a patient who experiences shortness of breath while resting in a chair--an extremely severe degree of lung disease suffered by only about 5% of our patient population--and a patient who suffers shortness of breath only after jogging three blocks, a level of fitness that likely exceeds that of most Americans. Our patient participants estimated a larger 29 point difference between these same two scenarios.

This study also explored how ratings are affected by the rating task itself, specifically, the context in which a scenario is presented. Hsee and colleagues (1999) found that ratings made in isolation differ from ratings made alongside other items, particularly when the items are difficult to evaluate. Multiple items presented together provide information about possible alternatives, helping raters understand whether a given item is good or bad, big or small, a lot or a little.

In the case of our lung disease scenarios, the evaluability of lung disease severity should not have been especially poor. Rather than using unfamiliar measurement units to describe severity, such as providing some metric of lung capacity, the scenarios specified familiar types of physical activity that would cause shortness of breath. Nevertheless, contextual information influenced ratings, despite these intuitive descriptions of lung disease severity, arguably providing useful information about the range of severity that can be expected for the disease. Across the two respondent groups, ratings for the mild and severe conditions were more distinct when made in the context of multiple scenarios, with a 21 point difference in the No Context condition and a 54 point difference in the Context condition.

The results of this study did pose one surprise. We anticipated that non-patients would be more affected by context than patients. If context helps participants evaluate the conditions by providing information about the range of alternatives, then patients should be less affected by this additional information, as their own experience should provide some information about the range even in the No Context condition. We found no evidence of an attenuated context effect for patients. If anything, patients showed a slightly larger effect of context, though the interaction of group and context was non-significant.

Why were lung disease patients influenced by contextual information as much as or more than non-patients? One possibility is that, while patients have a good deal of information about lung disease and the range of severity associated with it, they may not always access this information when evaluating the lung disease scenarios. Judgments are highly influenced by whatever information is most active and accessible in memory (Tversky & Kahneman, 1973). When patients in the Context condition were cued to think about the range of severity for lung disease, this range should have become a highly active feature of the disease that strongly influenced judgments, whereas severity range might have been only one of many features that came to mind for patients in the No Context condition who were not explicitly cued to think about it.

4.1 Implications

Over the last several years, the positive psychology movement has inspired more researchers to investigate the factors that influence well-being and the mechanisms behind people's remarkable capacity to adapt to adverse circumstances. A continuing concern about this research emerges from the subjective nature of the available measures of happiness, quality of life, and related constructs. Conclusions about what does or does not influence well-being rely on subjective self-reports, reports that are often malleable.

In the health domain, another concern arises from the application of health-related quality of life data to cost-effectiveness analyses. The discrepancy between patients' and non-patients' QoL ratings has led to some discussion in the literature as to whether health care analyses ought to incorporate evaluations made by patients or by the general public (Boyd et al., 1990; Dolan, 1996; Gold et al., 1996; Ubel, Loewenstein, & Jepson, 2003). This question is further complicated by the evidence presented here. The discrepancy in ratings cannot be easily characterized as an overestimation by patients or an underestimation by non-patients. Rather, ratings seem to depend on the relative position of the rater and the health condition in question. Because both patients' and non-patients' ratings are remarkably malleable, dramatically influenced by the context in which scenarios were rated, this study cannot resolve the question of whose ratings are more accurate or more reliable in evaluating health states. Rather, this study highlights the difficulty of comparing the two groups or of drawing conclusions about whose evaluations are more meaningful.

These results suggest that researchers should take great care and consider the details of the rating task when soliciting QoL estimates. Whether the research goals are a theoretical understanding of well-being or an applied effort to improve quality of life, we must exercise caution in making conclusions based on subjective reports.

References

Boyd, N. F., Sutherland, H. J., Heasman, K. Z., Tritchler, D. L., & Cummings, B. J. (1990). Whose utilities for decision analysis? Medical Decision Making, 10, 58-67.

Brickman, P., Coates, D., & Janoff-Bulman, R. (1978). Lottery winners and accident victims: Is happiness relative? Journal of Personality & Social Psychology, 36, 917-927.

Dolan, P. (1996). The effect of experience of illness on health state valuations. Journal of Clinical Epidemiology, 49, 551-564.

Gold, M. R., Siegel, J. E., Russell, L. B., & Weinstein, M. (1996). Cost-effectiveness in health and medicine. New York: Oxford University Press.

Haubensak, G. (1992). The consistency model: A process for absolute judgments. Journal of Experimental Psychology: Human Perception and Performance, 18, 303-309.

Hurst, N. P., Jobanputra, P., Hunter, M., Lambert, C. M., Lockhead, A., & Brown, H. (1994). Validity of EuroQoL--a generic health status instrument in patients with rheumatoid arthritis. British Journal of Rheumatology, 33, 655-662.

Hsee, C. K., Loewenstein, G. F., Blount, S., & Bazerman, M. H. (1999). Preference reversals between joint and separate evaluations of options: A review and theoretical analysis. Psychological Bulletin, 125, 576-590.

Kahneman, D. & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263-291.

Parducci, A. (1963). Range-frequency compromise in judgment. Psychological Monographs, 77(92, Whole No. 565).

Riis, J., Loewenstein, G., Baron, J., Jepson, C., Fagerlin, A., & Ubel, P. A. (2005). Ignorance of hedonic adaptation to hemodialysis: A study using ecological momentary assessment. Journal of Experimental Psychology: General, 134, 3-9.

Sackett, D. L. & Torrance, G. W. (1978). The utility of different health states as perceived by the general public. Journal of Chronic Diseases, 31, 697-704.

Schulz, R. & Decker, S. (1985). Long-term adjustment to physical disability: the role of social support, perceived control, and self-blame. Journal of Personality and Social Psychology, 48, 1162-1172.

Sieff, E. M., Dawes, R. M., & Loewenstein, G. (1999). Anticipated versus actual reaction to HIV test results. American Journal of Psychology, 112, 297-311.

Smith, D., Sherriff, R. L., Damschroder, L., Loewenstein, G., & Ubel, P. A. (In press). Former patients give lower utility ratings for colostomy than do current patients: Evidence for theory driven recall bias. Health Psychology.

Tversky, A. & Kahneman, D. (1973). Availability: A heuristic for judging frequency and probability. Cognitive Psychology, 5, 207-232.

Ubel, P. A., Loewenstein, G., & Jepson, C. (2003). Whose quality of life? A commentary exploring discrepancies between health state evaluations of patients and the general public. Quality of Life Research, 12, 599-607.

Ware, J. E. & Sherbourne, C. D. (1992). The MOS 36-item short form health survey (SF-36). I. Conceptual framework and item selection. Medical Care, 30, 473-483.

Appendix

Scenario A

This person has a lung condition that causes him to become short of breath only after extreme exertion, like jogging 3 blocks, carrying a heavy basket of laundry up two flights of stairs, or shoveling snow for 20 minutes.

Scenario B

This person has a lung condition that causes him to become short of breath after walking briskly for 2 blocks or walking up one flight of stairs.

Scenario C

This person has a lung condition that causes him to become short of breath after walking slowly for 1 block. He must rest while walking up a flight of stairs.

Scenario D

This person has a lung condition that causes him to become short of breath after walking across a room. He is unable to walk up stairs.

Scenario E

This person has a lung condition that causes him to become short of breath even when in a resting state. In other words, he is short of breath just sitting in a chair. Occasionally, his shortness of breath interferes with his sleep.

Footnotes:

¹ This work was funded by R01HD040789, R01HD038963. Heather P. Lacey was supported by a HSR&D post-doctoral fellowship from the Department of Veterans Affairs, and Angela Fagerlin and Dylan M. Smith were supported by MREP early career development awards from the Department of Veterans Affairs. Address: Department of Applied Psychology, Bryant University, 1150 Douglas Pike, Smithfield, RI 02864. Email: hlacey@bryant.edu

² Two additional survey variations were given to additional healthy participants, but due to a limited sample size for patients, these versions were not given to patient participants, so patient vs. healthy participant comparisons are not possible for these conditions. The first of these variations was similar to the Context condition in that participants read all five scenarios (see Appendix 1), but different in that participants were asked to rate only one of the five. This variation was introduced to test whether any effects of Rating Context could be attributed strictly to range-frequency scale usage effects, with participants spreading their ratings evenly across the rating scale (e.g., Parducci, 1963). Results in this condition closely approximated those found in the Context condition reported in this study, suggesting that the Rating Context effects occur even when only a single item is rated. The second variation was a reverse ordering of the scenarios in the Context condition, with scenarios presented from most severe to least severe. Presentation order did not affect ratings for any of the scenarios. Since neither of these manipulations affected the results for healthy participants, they were omitted from the design in order to maximize the limited patient sample in other conditions.
