Judgment and Decision Making, vol. 5, no. 1 February 2010, pp. 64-71

How do jurors argue with one another?

Joshua Warren*   and Deanna Kuhn
Columbia University

Michael Weinstock
Ben-Gurion University of the Negev

We asked jurors awaiting trial assignment to listen to a recorded synopsis of an authentic criminal trial and to make a choice among 4 verdict possibilities. Each participant juror then deliberated with another juror whose verdict choice differed, as a microcosm of a full jury’s deliberation. Analysis of the transcripts of these deliberations revealed both characteristics general to the sample and characteristics for which variation appeared across participants. Findings were interpreted in terms of a model of juror reasoning as entailing theory-evidence coordination. More frequently than challenging the other’s statements, we found, a juror agreed with and added to or elaborated them. Epistemological stance — whether knowledge was regarded as absolute and certain or subject to interpretation — predicted several characteristics of discourse. Absolutists were less likely to make reference to the verdict criteria in their discourse. Those who did so, as well as those who made frequent reference to the evidence, were more likely to persuade their discourse partners.

Keywords: jurors, discourse, argumentation, epistemology, reasoning.

1  Introduction

Rozin (2009) has argued recently that some phenomena are of such broad social significance that they warrant whatever, even imperfect, light we are able to shed on them. The discourse by means of which jurors reach their verdict decisions arguably falls into this category. Jury deliberation is a component of a democratic legal system traditionally shrouded in secrecy. Even if it were more available to external observers, the complexity of 12 individuals engaged in largely unconstrained dialog about controversial and often intricate matters is so great as to challenge analysis of the process. Unsurprisingly, then, the bulk of jury research has been devoted to an examination of influences on jury outcomes, rather than the deliberation process itself. The research that does exist on the deliberation process typically is concerned with social influence processes, and has little to say about the phenomenon of key interest to us here: the reasoning by means of which jurors influence one another.

In the work reported here, we undertook to gain insight into the dialogic reasoning that occurs during jury deliberation by reducing its complexity, looking at just a microcosm, albeit an arguably authentic one, of the larger process. Jurors awaiting trial assignment were asked to listen to a tape-recorded synopsis constructed from the transcript of an actual trial and to make a verdict judgment. Two jurors whose judgments differed were then asked to deliberate, as they might in a jury room, and to try to reach agreement with one another regarding the proper verdict. In the present article, we examine some of the characteristics of this discourse, including both characteristics general to the sample and characteristics for which variation appeared across participants.

An individual juror’s task, two of us have argued earlier, is one of theory-evidence coordination (Kuhn, Weinstock, & Flaton, 1994). The juror needs to construct multiple theories (story-verdict constellations) that are evaluated against the presented evidence. These theories are compared, and the verdict having the most consistent and least discrepant evidence associated with it is selected. If this characterization is correct, the task requires representations of the evidence, representations of each of the theories (verdict definitions) and a set of mental operations directed toward coordinating the two. A specific prediction we test here is therefore that the nature of representations of both will affect the nature and outcome of jurors’ deliberation.

Individual variation data are consistent with such a model. Kuhn et al. (1994) found that individuals at more advanced epistemological levels (who believe that absolute certainty is not possible, and who also tend to be more highly educated) were more likely to choose more moderate (intermediate) verdicts (2nd degree murder or manslaughter) in their individual verdict choices. Performance was also related to a number of argument skills (e.g., discounting, counterargument, justifying alternative verdict) that require evidence-theory coordination (Kuhn et al., 1994; Weinstock & Cronin, 2003).

In the present article, we wished to explore the utility of this theory-evidence coordination model in shedding light on aspects of the subsequent phase in the jury decision-making process, i.e., from an individual juror’s reasoning and judgment following presentation of the trial evidence to the engagement of jurors in discourse with one another with the objective of reaching a joint decision. Specifically, we pose two research questions. First, if success in an excerpt of discourse between two jurors is defined as one successfully persuading the other that the evidence better fits one verdict than another, successful or productive discourse should contain frequent reference to the individual verdicts and the criteria that define each, as well as to the evidence against which they must be compared. Does jurors’ discourse in fact have these characteristics?

Second, we wished to better understand how the individual differences in juror reasoning identified in earlier work are likely to manifest themselves in juror discourse. Education level has been found a strong predictor of individual juror reasoning (Kuhn et al., 1994) and therefore could be anticipated to affect discourse between jurors as well. However, by itself, education level is a complex, largely opaque variable. What is likely to make the more educated individual a more incisive reasoner in a juror context? Here we investigate level of epistemological understanding as a likely candidate. Research on epistemological understanding (Greene, Azevedo, & Torney-Purta, 2008; Hofer & Pintrich, 1997, 2002; Kuhn, Cheney, & Weinstock, 2000; Moshman, 2008) has identified three broad levels of understanding: absolutist (an absolute, objective truth can be determined), multiplist (subjectivity of interpretation and judgment is recognized and given priority), and evaluativist (subjectivity is recognized but does not preclude evaluation and judgment of conflicting interpretations). The latter levels become more prevalent with increasing age, and more educated individuals tend to fall more often into the latter categories (Hofer & Pintrich, 2002). These different levels of understanding regarding the nature of knowing and certainty, we propose, are implicated in the education difference in juror behavior, and similarly may have implications for the ways in which jurors talk to one another and hence conduct their decision-making task. Specifically, we predict that the absolutist conception of knowledge as certain and objectively knowable works against execution of the theory/evidence coordination entailed in the juror task. 
The multiplist (or relativist) conception of knowledge as entirely subjective is more open to examination from multiple perspectives, yet omits the judgmental component characteristic of the evaluativist conception, in which multiple perspectives can be compared and evaluated in a framework of argument and evidence. Hence we predict juror discourse to become more competent and productive as a function of jurors’ epistemological stance.

2  Method

2.1  Participants

Participants were jurors called for service at the Kings County Supreme Court and Civil Court in Brooklyn, New York, who were invited to participate while awaiting assignment. Their names were called randomly in the same manner that prospective jurors are selected for impaneling. Although unavailable for impaneling for part of a day, participants returned to the pool after their participation for the remainder of their jury duty. Roughly 90% of those solicited agreed to participate.

A total of 70 participants (40 male) constituted the final sample. They ranged in age from 19 to 67 (M=41). Slightly more than half (36) reported having a college or advanced degree.

2.2  Procedure

Each participant listened to 20-minute audiotaped shortened versions of two actual homicide trials, in counterbalanced order across participants. These included examination and cross-examination of witnesses, lawyers’ opening and closing statements, and judge’s introduction to the trial and final instructions. The judge’s instructions and verdict criteria were also presented in writing and remained available for reference. Each case had the same four verdict alternatives: 1st degree murder, 2nd degree murder, manslaughter and self-defense. Following the playing of each tape, the participant was asked to make and justify a verdict choice.

Following a short break, each participating juror was paired with another who had made a different verdict choice for the second of the two trials each had heard. The dyad was instructed to deliberate as would a jury and to attempt to reach an agreed-upon verdict. If they were unable to do so after one half hour of deliberation, the deliberation was terminated. Of the 35 pairs, in 16 the difference between the two individual initial verdict choices was one level, i.e., 1st degree vs. 2nd degree (2 pairs), 2nd degree vs. manslaughter (2 pairs), or manslaughter vs. self-defense (12 pairs). In the remaining 19 pairs the difference was greater: 1st degree vs. manslaughter (7 pairs), 2nd degree vs. self-defense (2 pairs), or 1st degree vs. self-defense (10 pairs).

All participants were also individually administered a version of the Livia task (see Weinstock & Cronin, 2003) to assess epistemological stance, given its anticipated relevance to the juror task. Two discrepant historical accounts of a fictitious war are presented and the respondent is asked about how and whether they can be reconciled. The task allows classification of responses into the three broad categories identified above: absolutist, multiplist, and evaluativist.

3  Results

3.1  Discourse characteristics

Of an initial sample of 35 pairs who engaged in deliberations, 27 pairs were successful in reaching a joint verdict decision. The other 8 indicated at the end of the 30-min period that they had been unable to do so. The pairs’ tape-recorded deliberations were transcribed and the transcripts segmented into idea units. An idea unit consisted of either a question or a claim and any supporting justification accompanying it. The 35 dialogs contained an average of 118.8 units, with a range from 38 to 285. Individuals’ contributions to the dialogs ranged from a low of 40% to a high of 60%. Thus, there were no cases in which one member of the pair did all of the talking.

A version of the scheme developed by Felton and Kuhn (2001) to examine argumentive discourse was used to code the transcripts. An initial sample of 35 transcripts consisted of a total of 4,154 identifiable utterances. Two raters worked together to code 11 transcripts and refine the coding scheme. The raters then independently coded an additional 11 transcripts, including 32.9% of the total utterances. Disagreements were resolved through discussion and consensus. Inter-rater reliability was calculated as a percentage agreement of 80.65% (of 1349 utterances, Cohen’s kappa 0.785). The remaining transcripts were coded by one of the raters. Codes that were assigned to at least 5% of all utterances appear in Appendix 1. Several other rarely used codes were assigned to a total of less than 5% of utterances.
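
Percentage agreement and Cohen's kappa are related through the agreement expected by chance: kappa = (observed - expected) / (1 - expected). The minimal Python sketch below illustrates that computation for two raters' codes. The function name and the toy code sequences are illustrative assumptions only; they are not drawn from the transcripts, and the sketch is not a record of how the reliability figures above were actually calculated.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical codes over the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty code lists")
    n = len(rater_a)
    # Observed agreement: proportion of items given the same code by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over codes of the product of each rater's marginal rates.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical toy data (NOT from the study): six utterances coded by two raters.
codes_a = ["evidence", "add", "agree", "add", "meta", "evidence"]
codes_b = ["evidence", "add", "agree", "critique", "meta", "evidence"]
kappa = cohens_kappa(codes_a, codes_b)
print(f"kappa = {kappa:.3f}")
```

With many code categories, chance agreement is low, so kappa falls only slightly below raw percentage agreement, which is the pattern seen in the values reported above.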

Among the most notable of the results from Appendix 1 are, first of all, that jurors did make frequent (7.9% of utterances) reference to the evidence. They also referred to the verdict categories, although slightly less frequently (7.1%). Rarely (1.6%), however, did they refer explicitly to the criteria that defined each of the verdict categories. Also notable is the fact that they were more likely to add to the other’s statement (8.8%) than to critique or challenge it (6.3%). In this respect, the discourse mode was more collaborative than it was oppositional. Finally, jurors often (17.9%) made meta-level statements about their deliberation, making reference to their own thinking and/or to what the other has contributed or should contribute to the discourse.

3.2  Variation across individuals

We sought to gain further insight from individual differences in performance, in particular in connection to epistemological stance. Assignment of individuals to epistemological level was done as part of earlier work (Kuhn et al., 1994; Kuhn & Weinstock, 2002). Among the larger sample of 173 individual jurors (not all of whom participated in deliberations), a subset of 43 protocols was coded independently and satisfactory reliability was achieved, with 90.7% agreement and a Cohen’s kappa of .88. Among the present sample of 70 jurors, 34 were classified in the absolutist category. Of the remaining 36, 26 were classified in the multiplist category and 10 in the evaluativist category. As reported elsewhere for the larger sample, this classification was related to education level, with 34% of the present sample without college degrees falling into one of the two higher epistemological categories, versus 66% of those with college degrees (χ2=4.72, df=1, p=.030). Samples of responses to the key questions from the Livia task for the three epistemological stances appear in Appendix 2.

We hypothesized that individuals who have progressed beyond the absolutist level of belief in the certainty of knowledge might show greater sensitivity to the role of argument and judgment in the jury task. We therefore examined how the deliberation of those in the two higher epistemological categories differed from that of those in the absolutist category. We report here on differences we found both in the way in which the discourse was conducted and in the discourse strategies employed.

First, the higher-level epistemological group tended to take greater control of the discourse. Of this group, 58.3% (21 of 36 multiplists or evaluativists) made a greater share of the pair’s utterances, versus 35.2% (12 of 34 absolutists; χ2=3.725, p=0.054). Absolutists made an average of 48.4% of their pair’s total dialog utterances, compared to 51.2% from the more advanced group, a significant difference, t(68)=2.789, p=0.007. Furthermore, the more advanced epistemological group exhibited greater control of the discourse in their more frequent use of Meta-Discourse statements (see Appendix 1), an average of 12.8%, vs. 9.7% for absolutists, t(68)=2.157, p=0.035. Nineteen of the 36 (52.8%) in the more advanced epistemological group made at least 13% Meta-Discourse utterances, compared to 8 of 34 (23.5%) absolutists (χ2=6.313, p=0.012).

With respect to quality of discourse, two findings stand out. First, the higher-level epistemological group were more likely to refer to the verdict criteria: 63.8% did so at least once, compared to 32.4% of absolutists (χ2=6.962, p=0.008). Second, the higher-level group were more likely to directly counter their opponent’s claims. Of the 36 jurors in the higher-level epistemological group, 12 (33.3%) showed a high proportion of counter-critique statements (10% or greater of all utterances); in contrast, only 4 (11.8%) of the 34 absolutists did so (χ2=4.613, p=0.032).
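
The group comparisons above can be checked from the reported counts. The Python sketch below assumes the reported statistics are Pearson chi-square tests on 2×2 tables with one degree of freedom and no continuity correction; that is an inference from the figures rather than something stated in the text, and the function names are illustrative.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denom


def chi2_p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    # For df=1, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Greater share of the pair's utterances: 21 of 36 higher-level vs. 12 of 34 absolutists.
share = pearson_chi2_2x2(21, 36 - 21, 12, 34 - 12)
# High proportion of counter-critique: 12 of 36 higher-level vs. 4 of 34 absolutists.
counter = pearson_chi2_2x2(12, 36 - 12, 4, 34 - 4)

for label, stat in [("share of utterances", share), ("counter-critique", counter)]:
    print(f"{label}: chi2={stat:.3f}, p={chi2_p_value_df1(stat):.3f}")
```

Under these assumptions the two statistics come out at approximately 3.725 and 4.613, in line with the values reported in the text.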

These differences are notable when we recall that many of these individuals were paired with someone from the same epistemological grouping. If we consider only those individuals who deliberated with someone in a contrasting epistemological category (18 pairs), the differences are even more striking. Among these 18 pairs, for example, the higher-level epistemological partner made a greater share of the pair’s utterances in 13 of the 18 (72.2%) pairs (binomial probability of chance occurrence=0.0481). Similarly, 13 (72.2%) of the 18 higher-level epistemological partners in mixed epistemology pairs used greater than 10% Meta-Discourse utterances; in contrast, only seven (38.9%) of the 18 absolutists in mixed pairs did so, a significant difference, χ2=4.050, p=0.044.
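
The binomial probability cited above can be reproduced as a one-tailed probability under a chance rate of 0.5, i.e., the probability of 13 or more "successes" in 18 pairs. The Python sketch below assumes that one-tailed reading, which matches the reported figure but is not spelled out in the text; the function name is illustrative.

```python
from math import comb


def binomial_upper_tail(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Higher-level partner made the greater share of utterances in 13 of 18 mixed pairs.
print(f"{binomial_upper_tail(13, 18):.4f}")
```

The same computation applies to the other binomial probabilities in the article, for example `binomial_upper_tail(11, 12)` for the 11 of 12 pairs discussed in Section 3.3.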

Finally, individuals who deliberated with a partner of the same epistemological level tended to make more utterances, averaging 69.62 utterances compared to 49.64 for individuals in mixed epistemology pairs, a significant difference, t(68)=2.863, p=0.006. Individuals in mixed pairs, however, made more Meta-Self utterances (8.2%, vs. 4.9% for individuals in pairs of the same epistemological level), t(68)=2.405, p=0.019. Both findings suggest that the mixed pairs may have been less comfortable and experienced more difficulty communicating with one another. See Appendix 3 for a summary of all effects involving epistemological level.

3.3  Outcome of deliberation

Of the 35 pairs, 26 concluded with one partner convincing the other to change verdict decision. Eight pairs were unable to resolve and ended in deadlock. One pair concluded with a compromise in which both partners changed from their original verdict choices.

Deadlocked jurors showed modest differences from other jurors in use of Meta-D utterances, use of verdict criteria, and use of agree utterances. Deadlocked jurors had a significantly higher proportion of Meta-D utterances, averaging 14.1% compared to 10.4% for the remaining jurors, t(68)=2.197, p=0.031. Deadlocked jurors were also more likely to state the verdict criteria at least once; only 22 (40.7%) of the 54 jurors reaching a verdict stated the verdict criteria, whereas 12 (75%) of the 16 deadlocked jurors explicitly stated the verdict definition at least once (χ2=5.799, p=0.016). Apart from these differences, deadlocked jurors did not differ significantly from other jurors on any variable.

Epistemological classification did not predict whether a pair would deadlock nor which member of the pair would change verdict to accommodate to the partner’s verdict (footnote 1), nor was gender predictive of outcome (footnote 2).

Two other variables, however, were found predictive of outcome. One is reference to verdict criteria. Of the 10 pairs in which both jurors referred to verdict criteria, five (50%) deadlocked. In contrast, only three (12%) of the remaining 25 pairs, in which only one (14 pairs) or neither member of the pair (11 pairs) directly referenced the verdict’s definition, ended in deadlock (χ2=5.85, p=0.016). Furthermore, among the pairs in which only one member referred to verdict criteria and a verdict was reached (12 pairs), in 11 of the 12 (92%) the member who referenced the verdict criteria maintained his or her original verdict and the partner accommodated, an outcome highly unlikely to occur by chance (binomial probability=.003).

Finally, a second variable that predicted deliberation outcome was reference to evidence. The members of each pair who successfully persuaded their partners to change verdict made more frequent reference to evidence than did the member of the pair who accommodated or than did those who deadlocked — an average of 11.1% statements of evidence for those who persuaded their discourse partners, vs. 6.0% statements about evidence for those who accommodated or deadlocked, t(68)=3.315, p=0.001.

4  Discussion

In their study of individual juror reasoning, Kuhn et al. (1994) found that individuals at more advanced epistemological levels, indicated by their belief that absolute certainty is not possible, were more likely to choose intermediate verdicts (2nd degree murder or manslaughter) in their verdict choices. Absolutists, indeed, may be engaged in a significantly different task — identifying the certain truth — than more advanced epistemologists, who accept a degree of uncertainty and seek to find the best match between theory, as defined by the verdict criteria, and evidence.

In the present work, we found that when they engaged in deliberation with another, jurors having a more advanced epistemological stance took greater charge of the discourse. Most important, they were more likely to make reference to the verdict criteria in their discourse — a crucial characteristic since it is these criteria that define each of the verdict theories and are hence central to the theory-evidence coordination task. And those who referred to the verdict criteria, as well as those who made frequent reference to the evidence, we found, were more likely to persuade their discourse partners — unless, that is, their partners also made reference to verdict criteria, in which case the pair was likely to deadlock.

These findings are encouraging, perhaps, in suggesting that jurors’ attention to key substantive criteria in jury decisions — verdict criteria and evidence — has an impact on the process and its outcome. On the other hand, a characteristic for which we found notable individual variation was the kind of discourse strategy an individual used in responding to the discourse partner. Did the individual respond critically, with counterargument, or did he or she simply add to the other’s contribution, in a way that left it unchallenged? Possibly our most important finding is that the latter strategy is overall more prevalent.

Of course, not every statement that is contributed to argumentive discourse warrants challenge or counterargument. Collaborative truth-seeking has its own strengths (Gilbert, 1997; Oaksford, Chater, & Hahn, 2008). Nonetheless, to the extent that the discourse partners in the jury setting do not engage in identifying and then critiquing, as well as supporting, alternative theories, they cannot fulfill their task. The discourse contributions we categorized in the Add category were embellishments or elaborations or sometimes tangents, in relation to the opponent’s preceding contribution. The activity in which our participants took part obviously differs from real jury deliberation in significant ways. Yet to the extent their discourse can be viewed as a microcosm of the discourse that takes place in the jury room, it gives us reason to be concerned that the characteristics of jury room discourse may depart significantly from an ideal model.


Appendix 1: Codes applied to deliberation transcripts

Average frequency
Statement of evidence
Statement of evidence presented in the trial
Example1: “The cop saw, the cop saw, he was 75 feet away, he saw the arm go back, he saw the arm go up and come down, okay.”

Example2: “When he came back he had the gun out.”

Verdict statement
Statement and justification of a verdict choice
Example1: “I basically a lot went with the no self-defense because of the weapon should be out if you’re defending yourself.”

Example2: “And I didn’t think it was manslaughter because there was obviously some planning involved.”

Verdict criteria
Statement of verdict criteria; reference to a verdict’s definition
Example1: “We know that manslaughter is killing without malice.”

Example2: “Murder on the second degree is killing committed with malice, although without deliberate premeditation.”

Other declarative statement or expression of rationale
Example1: “Him hiding the gun, I don’t think necessarily meant for him to kill his father”

Example2: “He would get revenge”

Agreement with other’s immediately preceding utterance
Example1: “Right, that’s true.”

Example2: “Yes, yes, you’re right.”

Addition to or elaboration of other’s immediately preceding utterance
Example sequence:

Partner1: “He beat the kid since he was like…”

Partner2 response: “Since he was two years old, OK” [add]

Partner1 response: “He probably beat the mother too and she just stayed quiet.” [add]

Partner2 response: “Yeah, the mother said he did beat her but she stayed quiet” [add]

Critique or challenge of other’s immediately preceding utterance

Partner1: “Well, he didn’t even know that the father was gonna do anything with the gun.”

Partner2 response: “But he took the gun out and pointed it at him and waved it in his face.” [CtrC]


Partner1: “If he thinks he’d just call him out, he wouldn’t walk with a knife on him.”

Partner2 response: “Well he claims that he always walked with a knife not to keep home.” [CtrC]

Direct question of any sort to the other.
Example1: “Why not first degree, if that’s the case we’re assuming?”

Example2: “If he took out a gun and was just waiting for his father, what did you think he was gonna scare him? Just scare him?”

Meta-statement (or question) about or characterizing the pair’s discourse
Example1: “You understand?”

Example2: “The other page is what you need to look at.”

Example3: “And you really have to convict beyond a reasonable doubt.”



Meta-statement about the self’s ideas or thought processes
Example1: “This I understand completely. I can see why.”

Example2: “I don’t, I don’t know, you know, it’s maybe my-my view could have also been clouded by the fact, you know he was abused his whole life and so now I, you know, I couldn’t quite do the murder one.”

Any statement so minimal, fragmented, or indecipherable as to not be classifiable in any category
Example1: “Right”

Example2: “mm-hmm”

Example3: “Yes”

Example4: “But that was”

Example5: “Really”

An unjustified and preliminary statement of verdict choice
Example: “I chose manslaughter.”

Other (Includes: Hypothetical, Counter-alternative, Co-opt, Repeat, Dismiss, Disconnect)
Rarely appearing codes

Appendix 2: Epistemological stances represented in responses to key questions from Livia interview

Epistemological stance: Absolutist
Could both historians’ accounts of the Fifth Livian War be right? No. Someone would have to be right. It’s not possible that both sides could be. Both sides could be right as to say that they both fought, but there had to be one victor in the fight.
Could anyone be certain of what happened in the Fifth Livian War? Yes. Maybe some witnesses who saw it happen, or some historian that reported or write this thing. They might research some other books and see what happened.

Epistemological stance: Multiplist
Could both historians’ accounts of the Fifth Livian War be right? Yes. It’s their own point of view, I mean they could be right, how they see things; it doesn’t mean it’s wrong.
Could anyone be certain of what happened in the Fifth Livian War? No. Because as history passes through time the accounts change and the truth can be misconstrued. Also one historian is from South Livia and one is from North Livia. So, one could favor their country as opposed to the other.

Epistemological stance: Evaluativist
Could both historians’ accounts of the Fifth Livian War be right? Yes. From their viewpoint they are both probably very accurate. It is just that one is going to emphasize one point from their position more so than the other side is going to emphasize.
Could anyone be certain of what happened in the Fifth Livian War? No. You [could] look into it and do an impartial analysis, possibly. Several people would have to go to both sides and get more information if possible.

Appendix 3: Epistemological effects

By epistemological stance (N=70): Absolutist (n=34) vs. Multiplist/Evaluativist (n=36).

Dominate discussion: 35.2% vs. 58.3%; χ2=3.725, p=0.054
Share of total dialogue: 48.4% vs. 51.2%; t(68)=2.789, p=0.007
Meta-Discourse: 9.7% vs. 12.8%; t(68)=2.157, p=0.035
High Meta-Discourse: 23.5% vs. 52.8%; χ2=6.313, p=0.012
High Counter-Critique: 11.8% vs. 33.3%; χ2=4.613, p=0.032
Refer to Verdict Criteria: 32.4% vs. 63.8%; χ2=6.962, p=0.008

By partners’ epistemological stance (N=70): Mixed Epistemology Pairs (n=36) vs. Same Epistemology Pairs (n=34).

Average individual utterances: 49.64 vs. 69.62; t(68)=2.863, p=0.006
Meta-Self: 8.2% vs. 4.9%; t(68)=2.405, p=0.019

By epistemological stance for mixed epistemology pairs only (n=36): Absolutist (n=18) vs. Multiplist/Evaluativist (n=18).

Dominate: 27.8% vs. 72.2%; binomial probability=0.0481
High Meta-Discourse: 38.9% vs. 72.2%; χ2=4.050, p=0.044

Footnotes

1. Among the 18 pairs in which epistemological classification was mixed, 4 pairs deadlocked (vs. 4 pairs deadlocking in the remaining 17 non-mixed pairs). Nor were there significant differences in non-deadlocking mixed pairs with respect to whether the epistemologically-lower vs. epistemologically-higher juror changed verdicts.
2. Of 16 mixed-gender pairs, 3 pairs deadlocked. In the remaining non-deadlocked, mixed-gender pairs, there was a trend toward male dominance but a statistically nonsignificant one: 9 males and 4 females maintained their initial verdicts while their partners accommodated (chance likelihood p=.133).

