This is an R Markdown document that accompanies the following manuscript:

Phillips, J., Ong, D. C., Surtees, A. D. R., Xin, Y., Williams, S., Saxe, R., & Frank, M. C. (accepted). A second look at automatic false belief representation: reconsidering Kovács, Téglás, and Endress (2010).

We have provided the raw data from all the experiments reported in the manuscript, as well as our analysis code. This document walks through the analyses and allows you to reproduce them. The text is from the paper.

Figure 2: Left-most panel: data from KTE’s Experiment 1, estimated from KTE’s Fig. 2A. For purpose of comparison to other figures, error bars show 95% confidence intervals, rather than the standard error of the mean provided in the original. The next three panels show the pattern of reaction times that are predicted by Automatic ToM, for responding to “ball present” (second panel; identical to KTE’s paradigm), responding to “ball absent” (third panel; tested with our Studies 2–3), and responding to ball present in “occluder” trials, where there is an occluder between the agent and the ball at all times (fourth panel; tested with our Study 4).

Study 1–4 Results

We replicate the critical statistical results of KTE (2010).

We replicated the main statistical comparisons reported by KTE. KTE’s t-tests (p. 1832) are reported along with the equivalent tests for Studies 1a–1c in Table 1. There are four main comparisons of interest. First, participants were faster to detect the ball when both the participant and the agent believed that the ball was present, compared to when neither did (P+A+ < P-A-; Cohen’s d = 0.284, 0.393, 0.654 respectively for Studies 1a,b,c; Cohen’s d = D / s where D is the difference between means, and s the standard deviation of the differences. p’s = 0.04, 0.001, 0.004).

dValues # P+A+ < P-A- #
## [1] 0.2839 0.3930 0.6545
## [1] 0.041798 0.001361 0.003920
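
The effect size defined above (Cohen’s d = D / s for paired differences) can be sketched in a few lines of R. The function name `paired_cohens_d` and the example reaction times are hypothetical, chosen only for illustration; they are not the study data.

```r
# Cohen's d for a paired comparison: mean difference / SD of the differences.
paired_cohens_d <- function(x, y) {
  stopifnot(length(x) == length(y))
  diffs <- x - y
  mean(diffs) / sd(diffs)
}

# Hypothetical per-participant mean RTs (msec) in two conditions.
rt_neither <- c(520, 610, 480, 555, 600, 530)  # e.g. P-A-
rt_both    <- c(500, 590, 470, 560, 570, 515)  # e.g. P+A+

d_example <- paired_cohens_d(rt_neither, rt_both)
tt <- t.test(rt_neither, rt_both, paired = TRUE)

# For paired data, d equals the paired t statistic divided by sqrt(n).
all.equal(d_example, unname(tt$statistic) / sqrt(length(rt_neither)))
```

Because the paired t statistic is the mean difference divided by its standard error, this d and the corresponding t-test always agree in sign and differ only by a factor of sqrt(n).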

Second, participants were also faster when they believed that the ball was present but the agent did not, as compared to when neither they nor the agent believed that it was present (P+A- < P-A-; d’s = 0.611, 0.555, 0.792; all p’s < 0.001). These first two comparisons confirm the expected result that the participant’s belief has an effect on the participant’s reaction time. Specifically, when the participant believes that the ball is present behind the occluder, the participant is faster to detect the ball, as compared to when the participant expects the ball to be absent (and is presumably surprised by the presence of the ball).

dValues # P+A- < P-A- #
## [1] 0.6106 0.5545 0.7921
## [1] 3.921e-05 1.216e-05 7.571e-04

Third, and most importantly, we also replicated the critical result that KTE interpreted as providing evidence for automatic ToM: Participants were faster to respond when the agent believed that the ball was present (and the participant did not), as compared to when neither believed it was present (P-A+ < P-A-; d’s = 0.594, 0.473, 0.422; p < 0.001, p < 0.001, p = 0.05 respectively). Based on this comparison, KTE proposed that the agent’s belief facilitated participants’ detection of the ball.

dValues # P-A+ < P-A- #
## [1] 0.5942 0.4726 0.4219
## [1] 5.896e-05 1.481e-04 5.020e-02

Fourth, we replicated the null result that participants’ reaction times did not differ between the case when only the agent believed that the ball was present, and the case when only the participant had a belief that the ball was present (P-A+ ~ P+A-; d’s = 0.079, 0.062, 0.324; p’s = 0.57, 0.60, 0.13). KTE suggest that both types of beliefs (participant beliefs and agent beliefs) individually facilitate reaction times to the same degree. All the statistical tests that were reported by KTE were replicated in all three studies; this robustness indicates that the effects that KTE reported are highly replicable across different sets of stimuli and different testing environments (online vs. in lab).

# P-A+ ~ P+A- #
## [1] 0.07874 0.06242 0.32384
## [1] 0.5653 0.5980 0.1263

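| Comparison | Study | t-statistic | df | p value | Cohen’s d |
|---|---|---|---|---|---|
| (P-A-) - (P+A+) | KTE | 3.47 | 23 | 0.002 | 0.708 |
| | 1a | 2.09 | 53 | 0.042 | 0.284 |
| | 1b | 3.33 | 71 | 0.001 | 0.393 |
| | 1c | 3.21 | 23 | 0.004 | 0.654 |
| (P-A-) - (P+A-) | KTE | 3.43 | 23 | 0.002 | 0.700 |
| | 1a | 4.49 | 53 | <.001 | 0.611 |
| | 1b | 4.71 | 71 | <.001 | 0.555 |
| | 1c | 3.88 | 23 | <.001 | 0.792 |
| (P-A-) - (P-A+) | KTE | 2.42 | 23 | 0.02 | 0.494 |
| | 1a | 4.37 | 53 | <.001 | 0.594 |
| | 1b | 4.01 | 71 | <.001 | 0.473 |
| | 1c | 2.07 | 23 | 0.05 | 0.422 |
| (P-A+) - (P+A-) | KTE | 0.99 | 23 | 0.33 n.s. | 0.202 |
| | 1a | 0.58 | 53 | 0.57 n.s. | 0.079 |
| | 1b | 0.53 | 71 | 0.60 n.s. | 0.062 |
| | 1c | 1.59 | 23 | 0.13 n.s. | 0.324 |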

Table 1. Direct replication of results of Experiment 1 from KTE (2010) using Studies 1a–1c. The t, df, and p values from KTE were reported in the paper, while Cohen’s d for KTE’s studies were calculated from the t and df values.
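
The conversion from t and df to Cohen’s d mentioned in the caption can be sketched as follows. This assumes the standard relation for a paired t-test, d = t / sqrt(n) with n = df + 1; the function name `d_from_t` is hypothetical. Under that assumption, the t values reported for KTE reproduce the d values listed in Table 1.

```r
# Recover Cohen's d for a paired t-test from t and df,
# using n = df + 1 and d = t / sqrt(n).
d_from_t <- function(t, df) t / sqrt(df + 1)

# t statistics reported by KTE (all with df = 23), in the order of Table 1.
kte_t <- c(3.47, 3.43, 2.42, 0.99)
kte_d <- round(d_from_t(kte_t, df = 23), 3)
kte_d
# 0.708 0.700 0.494 0.202
```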

We observe a crossover interaction that is not consistent with KTE’s theory.

In addition to replicating KTE’s reported results, we also observed a consistent pattern in the reaction times that KTE did not report (Studies 1a–1c, top row of Fig. 3): all three experiments showed a strong crossover interaction. The interaction coefficients for Studies 1a–1c (with 95% CIs) are: 175 [97, 253], 121 [65, 176], 66 [18, 114] msec (p < 0.001, p < 0.001, p = 0.007 respectively). The crossover was caused by relatively slow reaction times on P+A+ trials. If reaction times reflect automatic ToM, participants should be faster to respond to the ball when the agent correctly believes the ball is present than when the agent believes the ball is absent, but we observed the opposite pattern (P+A+ slower than P+A-; d’s = 0.35, 0.20, 0.41; p’s = 0.01, 0.09, 0.06). This crossover interaction is thus not consistent with automatic ToM, and it was not observed in the data that KTE report (Fig. 1). Nevertheless, this crossover interaction was robustly present in all three of our replications (as well as in our subsequent studies, reported below). Hence, although we consistently replicated all of KTE’s reported statistical tests, our data are inconsistent with their theory.

## [1] 172.65  94.03 251.27
## [1] 121.81  65.75 177.87
## [1]  65.71  13.15 118.27
c(Expt1a_pValue, Expt1b_pValue, Expt1c_pValue)
## [1] 1.677e-05 2.057e-05 1.427e-02
dValues # P+A+ slower than P+A- #
## [1] 0.3501 0.2043 0.4082
## [1] 0.01294 0.08738 0.05749
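
To make the notion of a crossover interaction concrete, the sketch below computes a 2 x 2 interaction as a difference of differences between cell means. The function name `interaction_contrast`, the sign convention, and all cell means are hypothetical; the coefficients reported in the text come from a mixed-effects model, not from this simplified calculation.

```r
# 2 x 2 interaction as a difference of differences between cell means.
# Sign convention (an assumption): the effect of agent belief when the
# participant believes the ball is present, minus the same effect when
# the participant believes it is absent.
interaction_contrast <- function(pp, pm, mp, mm) {
  (pp - pm) - (mp - mm)
}

# Hypothetical cell means (msec), NOT the study data:
# pp = P+A+, pm = P+A-, mp = P-A+, mm = P-A-
crossover <- interaction_contrast(pp = 560, pm = 520, mp = 530, mm = 590)
additive  <- interaction_contrast(pp = 500, pm = 520, mp = 570, mm = 590)

crossover  # 100: agent belief has opposite effects in the two rows
additive   # 0: agent belief has the same effect in both rows
```

An additive pattern, in which each belief independently speeds responses, yields a contrast near zero; a crossover yields a large contrast because the agent's belief slows responses in one row and speeds them in the other.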

The crossover interaction is observed regardless of the agent’s beliefs about the presence or absence of the ball.

Further evidence against the interpretation of this pattern in terms of automatic ToM comes from Studies 2a–2b and 3. Recall that the prediction based on a ToM account is that the pattern of RTs across conditions should reverse if participants are instructed to respond to the ball’s absence (or, at the very least, the previous pattern should no longer be observed). In Study 2a and 2b, participants responded to both ball presence and ball absence. The trials of interest are the correct rejections (“CR”), where participants correctly indicate that the ball is absent. In Study 3, participants only responded to the absence of the ball; the results of these studies are shown in Fig. 3.

If reaction times reflect automatic ToM, participants should be faster (or at least not slower) to respond to the absence of the ball when the agent correctly believed the ball was absent (P-A-) than when the agent falsely believed the ball was present (P-A+), as illustrated in Fig. 2 (top row, right panel). Contrary to this prediction, participants were faster to respond to the ball’s absence for P-A+ than for P-A- (P-A+ faster than P-A-; d’s = 0.42, 0.81, 0.66 for Study 2a CR, 2b CR, 3 respectively; all p’s < 0.001). Moreover, we observed exactly the same crossover pattern of reaction times across conditions for responses to the ball’s absence as we did for responding to the ball’s presence (Fig. 3; the interaction b’s for Study 2a CR, 2b CR, and 3 are: 207 [114, 301], 161 [102, 221], 173 [116, 229] msec; all p’s < 0.001).

## [1] 213.5 104.7 322.2
## [1] 152.80  99.05 206.55
## [1] 171.5 115.9 227.0
c(Expt2aCR_pValue, Expt2bCR_pValue, Expt3_pValue)
## [1] 1.190e-04 2.524e-08 1.420e-09
dValues # P-A+ faster than P-A-
## [1] -0.4197 -0.8141 -0.6560
## [1] 1.595e-03 5.792e-04 3.254e-08

We next collapsed across Studies 1–3 and tested whether the CR and “absent” trials produced a different pattern of reaction times than the original “present” trials. The model for this analysis included terms for participant and agent belief, as well as a term for whether the trials were CR/absent trials (and all interactions). Although we found a main effect of CR/absent trials, which were overall slightly slower (b = 68 [39, 97] msec; p < .0001), there were no reliable two- or three-way interactions with CR/absent trials (all b’s < 44 msec, all p’s > .10). In addition, the two-way interaction between participant and agent beliefs that we observed in each individual experiment was still reliable (b = 139 [105, 172], p < .0001). This analysis thus supports the claim that, across studies, there was no statistical difference in the pattern of reaction times across different response criteria (responding “present” or “absent”). This result clearly contradicts the predictions of an automatic ToM account.

# Tests for differences by response type ("Present" vs "Absent") across Studies 1-3
library(lme4) # lmer() comes from lme4

allStudies1to3 = subset(d, d$expt=="1a:Replication1" | d$expt=="1b:Replication2" | d$expt=="2a:2AFC" |
                             d$expt=="2a:2AFC,CR" | d$expt=="3:Absent" | d$expt=="1c:LabReplication"  |
                          d$expt=="2b:Lab2AFC" | d$expt=="2b:Lab2AFC,CR" )

# flag the correct-rejection (CR) and "absent" trials
allStudies1to3$Absent <- allStudies1to3$expt == "2a:2AFC,CR" | allStudies1to3$expt=="2b:Lab2AFC,CR" |
                         allStudies1to3$expt == "3:Absent"
# this tests whether there is a difference in CR/Absent conditions
crmod <- summary(lmer(reactionTime ~ participant*agent*Absent + (participant*agent|workerid), data=allStudies1to3))
# cr coefficient (row 4); columns are 1 = estimate, 2 = standard error, 3 = t value
crCoefficient = c(crmod$coefficients[4,1],
                  crmod$coefficients[4,1] - crmod$coefficients[4,2] * 1.96,
                  crmod$coefficients[4,1] + crmod$coefficients[4,2] * 1.96)

cr_pValue = 2*(1-pnorm(abs(crmod$coefficients[4,3]))) # for the CR/Absent t

# participant x agent interaction (row 5); the CI is built from the standard error
study1to3_interactionCoefficient = c(crmod$coefficients[5,1],
                                     crmod$coefficients[5,1] - crmod$coefficients[5,2] * 1.96,
                                     crmod$coefficients[5,1] + crmod$coefficients[5,2] * 1.96)
study1to3_pValue = 2*(1-pnorm(abs(crmod$coefficients[5,3]))) # for the interaction t
## [1] 69.97 39.98 99.96
## [1] 4.807e-06
## [1] 139.0 126.2 151.9
## [1] 5.453e-11
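
The confidence intervals and p values reported throughout are Wald-style quantities built from a coefficient's estimate and standard error, with a normal approximation to the t statistic. A minimal sketch of that computation follows; the helper name `wald_summary` and the example numbers are hypothetical.

```r
# Wald-style summary for one fixed effect: estimate, 95% CI, two-sided p.
# Assumes a normal approximation to the t statistic.
wald_summary <- function(estimate, se, level = 0.95) {
  z_crit <- qnorm(1 - (1 - level) / 2)   # about 1.96 for a 95% interval
  z <- estimate / se
  c(estimate = estimate,
    lower = estimate - z_crit * se,
    upper = estimate + z_crit * se,
    p = 2 * pnorm(-abs(z)))              # abs() keeps p within [0, 1]
}

# Hypothetical estimate and standard error, for illustration only.
res_neg <- wald_summary(estimate = -30, se = 15)
res_pos <- wald_summary(estimate = 30, se = 15)
res_neg
```

Taking the absolute value of the statistic matters for negative coefficients: without it, `2 * (1 - pnorm(z))` exceeds 1 whenever z is negative.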

The crossover interaction is independent of the agent’s perspective.

As a final check of whether participants’ reaction times reflect automatic encoding of the agent’s belief, we replicated Study 1 with one critical difference in the stimuli: a large wall blocked the agent’s view. In this study, the agent has no perceptual access to the ball; thus the response time should be affected only by the participants’ own belief (compare the theoretical prediction in Fig. 2, last panel, with the data shown in Fig. 3, bottom right panel). Yet, contrary to that prediction, the pattern of reaction times across conditions remained similar to previous experiments, and the crossover interaction was still reliable (interaction b = 109 [49, 169] msec; p < 0.001).

## [1] 113.71  49.54 177.89
## [1] 0.0005148

Figure 3: Mean reaction times by condition and experiment. The crossover interaction is statistically reliable for every experiment and condition (see text for interpretation). Error bars represent 95% confidence intervals of the mean. Lines are displaced slightly along the horizontal axis for clarity. Top row from left to right: Studies 1a, 1b, and 1c (direct replications). Middle row: Study 2a, 2b Hits (respond “present” when ball is present), and 2a Correct Rejections (CRs; respond “absent” when ball is absent). Bottom row: 2b Correct Rejections, Study 3 (respond “absent” when ball is absent), and Study 4 (permanent occluder between agent and ball).

Study 5–8 Results

The crossover interaction is observed only when there is an “attention check” with variable timing.

In Study 5a, the attention check requirement was removed, and the response time pattern became flat, with no crossover interaction (interaction b = 22 [-47, 91] msec; p = 0.53).

## [1]  21.72 -47.41  90.84
## [1] 0.538

Study 5b provided an additional replication of Study 1 and Study 5a in a within-subjects design, to allow for direct statistical comparisons between the two. In two blocks of 24 trials (6 trials per condition, with blocks in a random order across participants), participants were either asked to respond to the attention check or not. In this study, we found a reliable three-way interaction of participant condition, agent condition, and attention check condition (three-way interaction b = 76 [8, 145] msec, p = 0.029). There was still a crossover interaction even when there was no attention check (interaction b = 62 [-16, 140] msec, p = 0.036), but the size of the effect was more than doubled in trials where there was an attention check (b = 140 [88, 192] msec, p < .001). The three-way interaction provides evidence that the magnitude of the crossover observed in Study 1, but not in Study 5a, is driven by the attention check.

## [1]  62.40   4.28 120.51
## [1] 0.03535
## [1] 138.98  86.75 191.21
## [1] 1.835e-07

Summarizing these results, Studies 5a and 5b show that removing the attention check reduces differences in RT across conditions. However, this experiment does not provide conclusive evidence for the role of the attention check; participants might simply have ignored the video display when the attention check was not required, keeping them from encoding either participant or agent beliefs. To address this issue, in Study 6, the attention check was shifted to when the agent returned to the scene, which was at 19s in all conditions. Once again, the pattern of responses was flat (interaction b = 12 [-35, 59] msec; p = 0.62). This study used the exact same stimuli as Studies 1–3, except that the attention check timing was matched across all four conditions, again based on a salient action of the agent. Critically, the characteristic pattern of response times found in Studies 1–3 was absent.

## [1]  12.01 -32.56  56.58
## [1] 0.5973

In sum, Studies 5 and 6 showed that the pattern of responses observed in Studies 1–4 disappeared when the attention check was removed or even when its timing was held constant across all videos, even though the stimuli were the same as those used in Studies 1–3.

The pattern of observed reaction times is a parametric function of the timing of the attention check and is independent of belief condition and even the presence of the agent.

To directly test the attention check hypothesis, we next decoupled the timing of the attention check from the beliefs that the participant and agent would have formed in that condition. To make this possible, we included a light bulb in the videos and instructed participants to press an additional button when the light bulb came on. This event was then used as the attention check instead of the agent’s departure. (As before, all other aspects of the studies remained identical, except where noted.) By replicating the asymmetric attention check pattern in the absence of an agent (Study 7), and by varying the attention check independently of the agent (Study 8), we were able to test for a complete dissociation between attention check timing and belief condition.

In Study 7, we removed the agent entirely but had the light bulb switch on at the times that corresponded to when the agent left the scene in Studies 1–4 (i.e., 10.8s, 13.2s and 16.7s; see Fig. 1). As in Studies 1–4, participants were instructed to press an additional button to indicate that they had been paying attention. Thus, participants were asked to respond at the exact same times in Study 7 as they were in Studies 1–4. We once again observed a crossover interaction (interaction b = 86 [32, 140] msec; p = 0.002), though it was slightly smaller than before. This time, however, the crossover interaction was observed without an agent being present at all! Thus, the results of Study 7 support the hypothesis that the response times observed in Studies 1–3 were independent of agent beliefs, and were plausibly driven by the attention check.

## [1]  90.36  37.40 143.32
## [1] 0.0008256

Study 7 showed that the reaction time difference between conditions can be elicited without an agent but with the corresponding attention check timing. Study 8 goes further by showing that, even when the agent is present, the reaction time effect remains absent if the attention check timing is appropriately controlled. Study 8a crossed the timing of the light bulb flash with the video condition: 3 timings (10.8s, 13.2s, 16.7s) crossed with 8 belief condition videos. Study 8b used 5 evenly spaced timings at which the light bulb switched on (10.9s, 12.9s, 14.9s, 16.9s and 18.9s), again crossed with the 8 videos. As in Study 7, the participant was instructed to press a button when the light bulb flashed. Averaging across attention check timings, there was no crossover interaction in RTs based on belief condition in either study (Fig. 4; Study 8a: interaction b = 5.3 [-38, 48] msec; p = 0.81; Study 8b: interaction b = -32.6 [-67, 2] msec; p = 0.07).

## [1]   6.299 -47.872  60.469
## [1] -31.890 -66.437   2.658
c(Expt8a_pValue, Expt8b_pValue)
## [1] 0.8197 0.0704

To test the effect of attention check timing on subsequent ball-detection RT, controlling for belief condition, we added attention check timing as a continuous predictor variable in our regression model (which, as discussed above, fits separate coefficients for participant and agent beliefs and their interaction). This model showed a reliable linear effect of attention check timing in both studies (coefficient on the attention time = 9.7 [5.5, 13.9] and 12.1 [9.1, 15.1] msec/sec; p’s < 0.001, Fig. 5). The closer to the ball-detection decision the attention check was, the slower the ball-detection decision was. As discussed above, this result is congruent with literature on the psychological refractory period, which suggests that the offset between two reaction-time measurements has systematic effects on the latency of the second measurement.

## [1]  9.660  5.483 13.836
## [1] 12.083  9.052 15.113
c(Expt8aTime_pValue, Expt8bTime_pValue)
## [1] 5.812e-06 5.551e-15
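
The structure of this timing analysis can be sketched on simulated data. Everything below is an illustrative assumption rather than the analysis reported here: the data are simulated, the column name `checkTime` is hypothetical, the true slope of 10 msec/sec is arbitrary, and ordinary least squares (`lm`) stands in for the mixed-effects model used in the paper.

```r
# Simulated data, NOT the study data: RT rises linearly with the time
# (in seconds) at which the attention check occurred.
set.seed(1)
n <- 600
sim <- data.frame(
  participant = sample(c(-0.5, 0.5), n, replace = TRUE),  # participant belief
  agent       = sample(c(-0.5, 0.5), n, replace = TRUE),  # agent belief
  checkTime   = sample(c(10.9, 12.9, 14.9, 16.9, 18.9), n, replace = TRUE)
)
true_slope <- 10  # msec of slowing per second (hypothetical)
sim$reactionTime <- 400 + true_slope * sim$checkTime + rnorm(n, sd = 50)

# Ordinary least squares stand-in for the mixed model: belief terms and
# their interaction, plus attention check timing as a continuous predictor.
fit <- lm(reactionTime ~ participant * agent + checkTime, data = sim)
timing_row <- coef(summary(fit))["checkTime", ]
timing_row
```

The `checkTime` row gives the estimated change in reaction time, in msec, for each additional second of attention check timing, with belief condition held fixed.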

Figure 4: Reaction times by condition and experiment. The crossover interaction was statistically reliable only in Study 5b and Study 7 (see text). Error bars represent 95% confidence intervals. Lines are displaced slightly along the horizontal axis for clarity. Top row, from left to right: Study 5a (attention check was removed), Study 5b trials with attention check removed, Study 5b trials with attention checks, and Study 6 (attention check was moved to the same time for all videos). Bottom row: Study 7 (agent was removed, and participants had to respond to the flash of a light bulb as an attention check), and Study 8a and 8b (agent was present, but participants responded to the flash of a light bulb at different times).