Network structure and dynamics of the mental workspace
- Alexander Schlegel1,
- Peter J. Kohler,
- Sergey V. Fogelson,
- Prescott Alexander,
- Dedeepya Konuthula, and
- Peter Ulric Tse
Edited by Michael S. Gazzaniga, University of California, Santa Barbara, CA, and approved August 28, 2013 (received for review June 11, 2013)
We do not know how the human brain mediates complex and creative behaviors such as artistic, scientific, and mathematical thought. Scholars theorize that these abilities require conscious experience as realized in a widespread neural network, or “mental workspace,” that represents and manipulates images, symbols, and other mental constructs across a variety of domains. Evidence for such a complex, interconnected network has been difficult to produce with current techniques that mainly study brain activity in isolation and are insensitive to distributed informational processes. The present work takes advantage of emerging techniques in network and information analysis to provide empirical support for such a widespread and interconnected information processing network in the brain that supports the manipulation of visual imagery.
The conscious manipulation of mental representations is central to many creative and uniquely human abilities. How does the human brain mediate such flexible mental operations? Here, multivariate pattern analysis of functional MRI data reveals a widespread neural network that performs specific mental manipulations on the contents of visual imagery. Evolving patterns of neural activity within this mental workspace track the sequence of informational transformations carried out by these manipulations. The network switches between distinct connectivity profiles as representations are maintained or manipulated.
Albert Einstein described the elements of his scientific thought as “certain signs and more or less clear images which can be ‘voluntarily’ reproduced or combined” (1). Creative thought in science as well as in other domains such as the visual arts, mathematics, music, and dance requires the capacity to manipulate mental representations flexibly. Cognitive scientists refer to this capacity as a “mental workspace” and suggest that it is a key function of consciousness (2) involving the distribution of information among widespread, specialized subdomains (3).
How does the human brain mediate these flexible mental operations? Behavioral studies of the mental workspace, such as Shepard and Metzler’s work on mental rotation (4), have found that many mental operations closely resemble their corresponding physical operations. This finding supports the view that the mental workspace can simulate the physical world. Recent work in neuroscience has focused on mental representations instead of operations, showing that the contents of visual perception (5), visual imagery (6), and even dreams (7) can be decoded from activity in visual cortex. These results suggest that the same regions that mediate representations in sensory perception also are involved in mental imagery. However, how the mind can manipulate these representations remains unknown. Many studies have found increased activity in frontal and parietal regions associated with a range of high-level cognitive abilities (8, 9) including mental rotation (10), analogical reasoning (11), working memory (12), and fluid intelligence (13). Together, these findings suggest that a frontoparietal network may form the core of the mental workspace. We therefore hypothesized that operations on visual representations in the mental workspace are realized through the coordinated activity of a distributed network of regions that spans at least the frontal, parietal, and occipital cortices. A strong test of this hypothesis would be to ask whether patterns of neural activity in these regions contain information about specific mental operations and whether these patterns evolve over time as mental representations are manipulated.
In the present study, we tested this hypothesis by asking 15 participants to engage in either maintenance or manipulation of visual imagery while we collected functional MRI (fMRI) measurements of their neural activity. As stimuli, we developed 100 abstract parts that could be combined into 2 × 2 figures (Fig. 1 A and C). In a series of trials, participants mentally maintained a set of parts or a whole figure, mentally constructed a set of four parts into a figure, or mentally deconstructed a figure into its four parts (Fig. 1B). Stimuli were presented briefly at the beginning of each trial, followed by a task prompt and a 6-s delay during which the participant performed the indicated mental operation. At the end of the delay, the target output of the operation was presented along with three similar distractors, and the participant indicated the correct target (Fig. 1D). Adjusting the complexity of the stimuli allowed us to equate for task difficulty by maintaining an accuracy of two out of three correct responses for each participant in each of the four conditions (chance would be 1 out of 4 correct; Fig. 1E).
As an initial procedure for selecting regions of interest (ROI) on the fMRI blood oxygenation level-dependent (BOLD) data, we carried out a whole-brain univariate general linear model (GLM) analysis to identify regions in which neural activity levels differed between mental manipulation (construct parts or deconstruct figure) and mental maintenance (maintain parts or maintain figure) conditions. This analysis revealed 11 bilateral cortical and subcortical ROIs (Fig. 2), suggesting that a widespread network mediates the manipulation tasks. All but two of the ROIs showed greater activation in manipulation than in maintenance conditions; the exceptions were the medial temporal lobe (MTL) and medial frontal cortex. In a separate control GLM analysis, we evaluated whether any regions showed differences in activity between the two manipulation conditions. No voxels were significant in this analysis, suggesting that overall activity levels were well matched between the manipulation tasks. We did not see a univariate effect in occipital cortex. This result is expected, because visual stimuli were equated across the four conditions. However, because we hypothesized that visual cortex plays a role in mediating operations on visual imagery, we included an anatomically defined occipital mask in our set of ROIs. Thus we had 12 ROIs to investigate for informational content relevant to the mental operations.
We then attempted to decode the particular mental operations performed by participants based on spatiotemporal patterns of BOLD responses in each of these 12 ROIs. We carried out a multivariate pattern-classification analysis (5) within each ROI. In this analysis, a classifier algorithm first is trained by providing it with a set of BOLD response patterns from the ROI along with the mental operation associated with each pattern. Then a holdout pattern not involved in the training is used to test the classifier. If the classifier can predict above chance the mental operation associated with the holdout pattern, the ROI contains information specific to that particular mental operation and likely is involved in mediating that operation. We carried out two-way classifications in each ROI between construct-parts and deconstruct-figure conditions and between maintain-parts and maintain-figure conditions, with results shown in Fig. 3A. To evaluate the informational content of each ROI in a single analysis, we constructed the model confusion matrix that would be expected for regions that mediated the mental operations (Fig. 3B). A confusion matrix indicates the similarity between patterns from different conditions; if patterns are more similar, the classifier will be more likely to confuse them. In this case, we expected high similarity between patterns from the same condition, moderate similarity when both patterns were from either two manipulation or two maintenance conditions, and low similarity when one pattern was from a manipulation condition and the other was from a maintenance condition. We then carried out correlation analyses between this model and the actual confusion matrix in each ROI derived from four-way classifications among the conditions (Fig. 3C). 
These analyses identified a subset of the ROIs, consisting of occipital cortex, posterior parietal cortex (PPC), precuneus, posterior inferior temporal cortex, dorsolateral prefrontal cortex (DLPFC), and frontal eye fields, in which we could decode the specific mental operations from patterns of neural activity. Additional control analyses confirmed that our results were not affected by ROI size or differences in response times between conditions (Fig. S1 and Table S1).
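The comparison between a model confusion matrix and an observed one can be sketched as follows. This is an illustrative sketch, not the analysis code used in the study. The numeric similarity levels (1.0, 0.5, 0.0), the choice to correlate the full matrix including the diagonal, and the example confusion matrix are all assumptions; only the ordering high > moderate > low comes from the description above.

```python
from math import sqrt

# Condition order used throughout this sketch.
CONDITIONS = ["construct-parts", "deconstruct-figure",
              "maintain-parts", "maintain-figure"]
MANIPULATION = {"construct-parts", "deconstruct-figure"}

# Assumed similarity levels; only their ordering comes from the text.
HIGH, MODERATE, LOW = 1.0, 0.5, 0.0


def model_matrix():
    """Model: same condition > same task type > different task type."""
    rows = []
    for a in CONDITIONS:
        row = []
        for b in CONDITIONS:
            if a == b:
                row.append(HIGH)
            elif (a in MANIPULATION) == (b in MANIPULATION):
                row.append(MODERATE)
            else:
                row.append(LOW)
        rows.append(row)
    return rows


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def flatten(matrix):
    return [v for row in matrix for v in row]


def model_fit(confusion):
    """Correlation between an observed confusion matrix and the model."""
    return pearson(flatten(confusion), flatten(model_matrix()))


# Hypothetical confusion matrix (rows: true condition, columns: predicted),
# expressed as proportions of trials.
example = [
    [0.40, 0.30, 0.15, 0.15],
    [0.30, 0.40, 0.15, 0.15],
    [0.15, 0.15, 0.40, 0.30],
    [0.15, 0.15, 0.30, 0.40],
]
print(round(model_fit(example), 3))
```

A high value of `model_fit` for an ROI would correspond to a confusion pattern that respects both the identity of each condition and the manipulation/maintenance distinction.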
Each of the four operations followed a three-stage temporal sequence in which participants encoded an input into a mental representation, performed a mental operation (construct, deconstruct, or maintain) on that representation, and produced an output mental representation. Each of these stages entailed a unique relationship among the mental states associated with the four conditions (Fig. 4A). For example, the inputs to the construct-parts condition were similar to those of the maintain-parts condition, the operation performed during the construct-parts condition was similar to that of the deconstruct-figure condition, and the outputs from the construct-parts condition were similar to those of the maintain-figure condition. Thus, the relationship among the conditions evolved throughout the trial and provided a means of further exploring the informational content of the mental workspace. To do so, we carried out a four-way classification among the conditions at each time point and correlated the resulting confusion matrices with each of the three model similarity structures in Fig. 4A. High correlation between a confusion matrix and one of the model structures would indicate that a particular region was carrying out the corresponding stage of processing at that time. Fig. 4B shows the time course of correlations with each model in occipital cortex. In Fig. 4C, we report peak correlation times in each of the 12 ROIs. In the four regions with highest classification accuracies in Fig. 3A, correlation peaks progressed from input through operation to output, providing strong evidence that these four areas directly mediated the mental operations as they unfolded over time. It should be noted that the differences between test stimuli could have affected the output correlation time course (orange trace in Fig. 4B) because the output mental representations were similar to the stimuli presented during the test phase. 
Our experimental design did not allow us to evaluate the relative contributions of the output mental representations and of the test stimuli to the output correlation time course.
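The three model similarity structures and the peak-time calculation can be sketched as below. The stage labels follow the task description (what each condition takes as input, what operation it performs, and what it outputs). The binary 1/0 coding of similarity, the helper `mimic`, and the demonstration confusion matrices are assumptions made for illustration and are not taken from the study.

```python
from math import sqrt

# Condition order: construct-parts, deconstruct-figure,
# maintain-parts, maintain-figure.
STAGES = {
    "input": ["parts", "figure", "parts", "figure"],
    "operation": ["manipulate", "manipulate", "maintain", "maintain"],
    "output": ["figure", "parts", "parts", "figure"],
}


def similarity_structure(labels):
    """Assumed binary coding: 1 where two conditions share a label, else 0."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
    return cov / sqrt(var)


def flatten(matrix):
    return [v for row in matrix for v in row]


def stage_correlations(confusion):
    """Correlate one confusion matrix with each of the three structures."""
    return {
        stage: pearson(flatten(confusion),
                       flatten(similarity_structure(labels)))
        for stage, labels in STAGES.items()
    }


def peak_times(confusions_by_tr):
    """For each stage, the TR whose confusion matrix best matches it."""
    by_tr = {tr: stage_correlations(c) for tr, c in confusions_by_tr.items()}
    return {stage: max(by_tr, key=lambda tr: by_tr[tr][stage])
            for stage in STAGES}


def mimic(stage):
    """Hypothetical confusion matrix shaped like one model structure."""
    structure = similarity_structure(STAGES[stage])
    return [[0.1 + 0.3 * v for v in row] for row in structure]


demo = {3: mimic("input"), 5: mimic("operation"), 7: mimic("output")}
print(peak_times(demo))  # {'input': 3, 'operation': 5, 'output': 7}
```

With this binary coding the three structures are mutually uncorrelated, so a confusion matrix that matches one structure well carries no built-in correlation with the other two.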
The above analyses show that a subset of ROIs supports the temporal evolution of information necessary to carry out particular mental operations. However, they do not provide evidence about how these regions communicate within the mental workspace network. We investigated this communication by analyzing patterns of functional connectivity between the ROIs. For each condition, participant, and region, we constructed a time course by concatenating the mean BOLD signal within that region across the participant’s correct-response trials for that condition. We calculated the functional connectivity, defined as the correlation between pairs of time courses, for each condition, participant, and pair of regions (14). This procedure yielded one network-wide pattern of functional connectivity for each condition and participant. A cross-subject classification analysis on these connectivity patterns successfully predicted whether participants mentally manipulated or maintained imagery with 61.7% accuracy [t(14) = 2.4, P = 0.029], thus indicating that patterns of connectivity between the network components changed depending on the operation that participants performed on the contents of their mental imagery. Investigating the weights that the classifier assigned to each pair of regions allowed us to determine which connections were most informative (Fig. 5A). Increases in connectivity between pairs with positive weights drove the classifier toward the manipulation conditions, whereas increases between pairs with negative weights drove it toward the maintenance conditions. Thus, stronger connectivities with the precuneus and with left posterior inferior temporal cortex indicated manipulation conditions, and stronger connectivities primarily with the MTL indicated maintenance conditions. In Fig. 5B, we plot the difference in functional connectivity between conditions. 
During manipulation conditions the precuneus and posterior inferior temporal cortex showed stronger connectivity with several frontal and parietal regions, whereas connectivity between the MTL and many regions became weaker. Thus, our data show not only that a distributed set of regions mediates mental operations but also that these regions communicate in an information-processing network. The network switches between two connectivity profiles depending on whether mental representations are maintained or manipulated.
Our findings reveal a widespread cortical and subcortical network that operates on visual representations in the mental workspace. This network includes four core regions spanning the DLPFC, PPC, posterior precuneus, and occipital cortex that manipulate the contents of visual imagery. Within these regions we decoded and tracked the evolution of mental operations over time. Several other areas showed a difference in BOLD responses between the manipulation and maintenance conditions but without the specificity found in the four core areas. Therefore it is likely that an extended network of regions is involved in the operations. Changes in patterns of connectivity between the mental workspace network’s nodes reveal that the network supports at least two distinct modes of operation, depending on whether mental representations are maintained or manipulated. We discuss each of the identified components of the network below.
Our finding that the DLPFC and PPC directly mediate manipulation of visual imagery is supported by multiple studies suggesting that a network of frontal and parietal areas is involved in many high-level cognitive abilities in humans (10–13). Miller et al. (15) showed that the responses of neurons in DLPFC convey more information about the task relevance of stimuli than about their specific features and that this selectivity for task relevance is maintained over extended durations in the absence of stimulus input. Thus, the DLPFC appears to be part of a network that maintains representations in working memory via attention. Human neuroimaging studies have shown that both the DLPFC and the PPC are activated, regardless of the type of information that is held in working memory (16, 17). Selectivity for task rather than representation distinguishes this system from subsidiary systems that are capable only of maintaining particular classes of information (18). These findings support the view that the frontoparietal network is an executive system that recruits subsidiary systems, as proposed in Baddeley’s (19) model of working memory. Modeling work by O’Reilly and colleagues (8, 9) has shown how prefrontal cortex may be able to self-organize abstract rules flexibly and later apply them to specific representations. This ability is common to many flexible cognitive processes in humans such as analogical reasoning, creativity (11), and fluid intelligence (13). Our data provide empirical support for this model by showing that the DLPFC and PPC mediate not only the maintenance of representations in working memory but also the manipulation of those representations. Thus, these areas may form the core of a system that mediates conscious operations on mental representations, in this case the contents of visual imagery represented at least partially in the occipital cortex.
Several studies have found that the occipital cortex processes information relevant to internally generated visual experience. Harrison and Tong (6) used patterns of activity in early visual cortex to decode the orientation of gratings that participants maintained in working memory. Recently, Horikawa and colleagues (7) decoded the contents of participants’ visual experience during dreaming from patterns in visual cortex. Thus, the visual cortex likely represents the contents of both internally and perceptually generated visual experience. Our results extend these findings to show that mental representations not only are formed but also are operated on in visual cortex. This result may generalize to other sensory domains, so that the brain mediates perceptual processes and operates on the corresponding mental representations in the same regions.
Margulies et al. (20) reported that the precuneus in humans is functionally connected to the lateral frontal, posterior parietal, and occipital cortices. The precuneus is one of the most connected regions of the cortex, suggesting that it may serve as a hub in several cortical networks. In their review, Cavanna and Trimble (21) cite a body of evidence that the precuneus is involved in visuospatial imagery, is relatively larger in humans than in nonhuman primates and other animals, and is one of the last regions to myelinate during development. Consistent with these findings, Vogt and Laureys (22) propose that the precuneus plays a central role in conscious information processing. Extending this work, our data show that the posterior precuneus becomes more functionally connected to the DLPFC, PPC, and occipital cortex when participants manipulate mental visual representations and suggest that it acts as a hub in the mental workspace network.
Our findings reveal that the DLPFC, PPC, posterior precuneus, and occipital cortex are central to the mental workspace. However, several other regions activated during the experimental tasks. Current understanding of these areas’ functions suggests possible roles they could play in mental operations. The cerebellum, long thought to be involved exclusively in motor coordination, now is known to connect strongly to prefrontal and posterior parietal cortices and to mediate attentional processes (23). Posterior regions of the inferotemporal cortex are involved in visual object processing (24). The thalamus is a hub for interaction between cortical areas and may play a critical role in consciousness (25). The MTL is a hub in memory formation and retrieval (26). This role is supported by our finding of stronger functional connectivity between the MTL and other ROIs during maintenance conditions. The frontal and supplementary eye fields play a role in controlling visual attention (27). Recently, Higo et al. (28) showed that the frontal operculum controls attention toward occipito-temporal representations of stimuli held in memory. The medial frontal cortex is a hub in the default mode network that plays a role in self-directed attentional processes (29). Thus, all these regions are likely involved in the mental operations performed by participants.
A significant finding of the present study is that connectivity in the mental workspace network switches between orthogonal modes of operation depending on whether the network maintains or manipulates representations. Although several network components represent information during both tasks, our data show that patterns of network connectivity associated with these tasks differ substantially. Maintenance of representations involves dense, bilateral interconnections across the entire network with the MTL acting as a hub, whereas manipulation of those representations recruits a sparse, slightly left-lateralized network with a hub in the posterior precuneus. Although the MTL hub does not contain specific information about either mental representations or manipulations, the posterior precuneus hub contains information specific to each operation. This finding suggests that these hubs serve distinct functions across the tasks. The MTL appears to bind network components together, whereas the posterior precuneus may exchange information within a sparse core of this network that itself supports manipulation of representations.
Previous studies have not been able to find evidence that the areas we identified play specific roles in manipulating representations. They have shown differences in BOLD or connectivity between maintenance and manipulation in certain areas (30–32) but have not shown that these areas are responsible for the manipulations themselves. An alternative explanation of these findings could be merely that attentional allocation is increased during manipulation as compared with maintenance tasks. In this study we investigated neural activity in two qualitatively distinct types of manipulations. We show that a subset of areas in the mental workspace network contains information specific to particular manipulations. We additionally show that the task-related informational structure of these areas evolves over time in accordance with the manipulations performed. These results provide specific evidence for the particular network components that directly mediate mental operations.
Human cognition is distinguished by the flexibility with which mental representations can be constructed and manipulated to generate novel ideas and actions. Dehaene (2) and others have proposed that this ability is a key role of a global neuronal workspace that in part realizes our conscious experience. Here we have shown that patterns of activity in just such a distributed neuronal network mediate the flexible recombination of mental images. Although the present study was limited to visual imagery, we anticipate that this network is part of a more general workspace in the human brain in which core conscious processes in frontal and parietal areas recruit specialized subdomains for specific mental operations. Understanding the neural basis of this workspace could reveal common processes central to the flexible cognitive abilities that characterize our species.
Materials and Methods
Sixteen participants (six females) age 19–30 y gave informed written consent according to the Institutional Review Board guidelines of Dartmouth College before participating. Data from one participant who could not achieve our task accuracy criterion were discarded before further analysis. Participation consisted of two sessions: an initial behavioral session during which participants practiced the tasks and an fMRI session.
Participants performed four mental operations with the stimuli: They mentally constructed four parts into a figure, deconstructed a figure into four parts, maintained four parts, or maintained a figure. At the start of each trial, both a figure and four unrelated parts were displayed to equate for low-level image properties and attention across tasks. After 2 s, the stimulus disappeared and was replaced for 1 s by a prompt indicating the task to be performed. The participant then had 5 s to perform the operation, during which only a fixation dot appeared. Finally, a test screen appeared in which the target output of the operation was shown along with three distractors that were identical to the target except for a single part. The participant was instructed to indicate the target within 4 s of the test screen’s appearance. The stimulus complexity was updated on each trial so that participants achieved an accuracy of two out of three correct responses in each trial type. See SI Materials and Methods for an extended description of the task.
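One way to hold accuracy near two out of three correct is a weighted up-down staircase, sketched below. The update rule actually used is not described in this section, so the rule, the step sizes, the complexity range, and the simulated observer here are all assumptions chosen for illustration.

```python
import random


def update_complexity(level, correct, step=1.0, lo=1.0, hi=10.0):
    """Assumed weighted up-down rule targeting 2/3 accuracy.

    Complexity rises by `step` after a correct response and falls by
    2 * step after an error. The expected change is zero when
    p * step == (1 - p) * 2 * step, that is, when p = 2/3.
    """
    level = level + step if correct else level - 2.0 * step
    return min(hi, max(lo, level))


def simulate(n_trials=3000, seed=0):
    """Run the staircase on a hypothetical observer; return mean accuracy."""
    rng = random.Random(seed)
    level, n_correct = 5.0, 0
    for _ in range(n_trials):
        # Hypothetical observer: accuracy falls linearly from 1.0 at
        # complexity 1 to four-alternative chance (0.25) at complexity 10.
        p_correct = 1.0 - 0.75 * (level - 1.0) / 9.0
        correct = rng.random() < p_correct
        n_correct += correct
        level = update_complexity(level, correct)
    return n_correct / n_trials


print(round(simulate(), 3))
```

Running a separate staircase of this kind for each trial type would equate difficulty across the four conditions, which is the property the design requires.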
MRI Acquisition and Preprocessing.
Data were collected using a 3.0 T Philips Achieva Intera scanner with a 32-channel SENSE head coil at the Dartmouth Brain Imaging Center. Participants completed 10 functional runs, each consisting of 16 trials interleaved with 10-s blanks. fMRI data were preprocessed using FSL (33), and structural images were processed using the FreeSurfer image analysis suite (34). See SI Materials and Methods for a detailed description of acquisition parameters and preprocessing steps.
ROI Selection Procedure.
A whole-brain GLM analysis was carried out on functional data using the FMRIB Software Library’s FEAT tool. A first-level analysis for each participant used boxcar predictors for each of the four conditions, convolved with a double-gamma hemodynamic response function (HRF). Only trials in which participants made correct responses were considered (∼27 per condition). The results of this analysis were passed to higher-level cross-subject analyses carried out in Montreal Neurological Institute space, in which t contrasts were defined for manipulate > maintain and for manipulate < maintain. Each t-contrast map was cluster thresholded at z ≥ 2.3; clusters then were thresholded at P ≤ 0.05 according to Gaussian Random Field theory (33). This analysis yielded 11 bilateral ROIs that then were transformed back into each participant’s native space for further analysis. An additional occipital ROI was defined anatomically in each participant’s native space using the following cortical masks from FreeSurfer: inferior occipital gyrus and sulcus; middle occipital gyrus and sulci; superior occipital gyrus; cuneus; occipital pole; superior occipital and transverse occipital sulci; and anterior occipital sulcus.
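For readers unfamiliar with boxcar predictors, the sketch below builds one predictor by convolving a boxcar with a double-gamma HRF. It is not FEAT's implementation. The HRF parameters (gamma shapes 6 and 16, undershoot ratio 1/6) are common textbook defaults assumed here, and the boxcar timing (a 5-s operation period starting 3 s into the trial) is one hypothetical choice based on the trial structure described above.

```python
from math import exp, gamma


def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density, zero for non-positive t."""
    if t <= 0:
        return 0.0
    return (t ** (shape - 1) * exp(-t / scale)) / (gamma(shape) * scale ** shape)


def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, ratio=1.0 / 6.0):
    """Double-gamma HRF: a positive response minus a later undershoot.

    Parameter values are assumed defaults, not those used in the study.
    """
    return gamma_pdf(t, peak_shape) - ratio * gamma_pdf(t, under_shape)


def boxcar(onsets, duration, n_samples, dt):
    """1 while any event is on, 0 otherwise, sampled every dt seconds."""
    return [
        1.0 if any(on <= i * dt < on + duration for on in onsets) else 0.0
        for i in range(n_samples)
    ]


def convolve(signal, kernel, dt):
    """Causal discrete convolution, scaled by the sampling interval."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j in range(min(i + 1, len(kernel))):
            acc += signal[i - j] * kernel[j]
        out.append(acc * dt)
    return out


dt = 0.5  # seconds per sample (assumed)
kernel = [double_gamma_hrf(i * dt) for i in range(int(32 / dt))]
predictor = convolve(boxcar([3.0], 5.0, 80, dt), kernel, dt)
peak_time = predictor.index(max(predictor)) * dt
print(peak_time)
```

The predicted response peaks several seconds after the modeled period begins, which is why later analyses shift the time window of interest to account for hemodynamic lag.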
Multivariate Pattern Analysis: Classification.
Multivariate pattern analysis (MVPA) was carried out using PyMVPA (35). Spatiotemporal patterns were constructed for each correct-response trial and ROI using the z-scored BOLD response from TRs 4–6 (the period during which the operation was performed, after shifting by a 4-s estimate of the HRF delay). Classification was carried out in each ROI between construct-parts and deconstruct-figure trials and between maintain-parts and maintain-figure trials using these patterns, a linear support vector machine (SVM) classifier, and leave-one-out cross-validation. Significance of accuracies was evaluated using one-tailed, one-sample t tests compared with chance (50%) and false-discovery rate (FDR) corrected across the 24 comparisons (one for each ROI and classification). A four-way classification also was carried out in each ROI to produce the confusion matrices in Fig. 3C. The correlation between each of these confusion matrices and the model similarity structure was calculated (Fig. 3B), and significance was determined at P ≤ 0.05, FDR corrected across the 12 comparisons (one for each ROI).
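The logic of leave-one-out cross-validated classification can be sketched in a few lines. To keep the example dependency-free, a nearest-centroid classifier is swapped in for the linear SVM, and the patterns are synthetic; the function names, the number of voxels, and the signal strength are hypothetical. The sketch also leaves out one trial at a time, which is an assumption about the cross-validation unit.

```python
import random


def centroid(patterns):
    """Mean pattern across trials."""
    n = len(patterns)
    return [sum(p[i] for p in patterns) / n for i in range(len(patterns[0]))]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_confusion(patterns, labels):
    """Leave-one-trial-out classification with a nearest-centroid rule.

    Returns the class order and a confusion matrix of counts
    (rows: true class, columns: predicted class).
    """
    classes = sorted(set(labels))
    index = {c: k for k, c in enumerate(classes)}
    confusion = [[0] * len(classes) for _ in classes]
    for held in range(len(patterns)):
        train = [(p, l) for i, (p, l) in enumerate(zip(patterns, labels))
                 if i != held]
        cents = {c: centroid([p for p, l in train if l == c]) for c in classes}
        guess = min(classes, key=lambda c: sq_dist(patterns[held], cents[c]))
        confusion[index[labels[held]]][index[guess]] += 1
    return classes, confusion


def accuracy(confusion):
    total = sum(sum(row) for row in confusion)
    return sum(confusion[k][k] for k in range(len(confusion))) / total


rng = random.Random(0)


def synthetic_trial(sign, n_voxels=20, n_informative=5):
    """Hypothetical z-scored pattern: a few voxels carry the condition."""
    return [rng.gauss(sign if v < n_informative else 0.0, 1.0)
            for v in range(n_voxels)]


patterns = ([synthetic_trial(+1.0) for _ in range(27)]
            + [synthetic_trial(-1.0) for _ in range(27)])
labels = ["construct-parts"] * 27 + ["deconstruct-figure"] * 27
classes, confusion = loo_confusion(patterns, labels)
print(classes, confusion, round(accuracy(confusion), 3))
```

The same function handles the four-way case: passing four distinct labels yields a 4 × 4 confusion matrix of the kind compared against the model similarity structure.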
MVPA: Correlation Time Courses.
Four-way classification was carried out for each ROI and at each time point of the trial, here using only spatial patterns of the BOLD signal across all voxels within the ROI. This procedure produced a confusion matrix for each time point and ROI, and these confusion matrices were correlated with each of the model similarity structures in Fig. 4A. The first structure models similarities between the conditions based on whether the input representation is a set of parts or a figure. The second structure models similarities based on the two types of operations carried out, manipulation or maintenance. The third structure models similarities based on the outputs from each condition. For each ROI and model structure, we calculated the time point at which the mean correlation reached a maximum, yielding the table in Fig. 4C. These calculations were restricted to TRs 3–8, representing the pretest portion of the trial shifted by 4 s to account for hemodynamic lag. For each ROI we carried out one-way repeated-measures ANOVA on the peak correlation times to test whether the expected progression from input through operation to output occurred. We performed the analysis on trimmed, jackknifed data as recommended by Miller et al. (36) for latency analyses. In a jackknifed analysis with N subjects, N grand means of the data are calculated, each with one subject left out. The analysis then is performed on these grand means with corrections applied for the jackknife-induced decrease in variance. In the case of noisy estimates, as occur when calculating latencies from single-subject time courses, this procedure provides cleaner results and does not bias estimates of significance. For each ANOVA we defined two orthogonal linear contrasts (input/operation/output: C1 = −1/−1/2; C2 = −1/1/0) to evaluate the temporal order of the peaks. 
We determined that an ROI significantly followed the expected progression if and only if both of these contrasts were significant at P ≤ 0.05 uncorrected.
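The jackknife grand means and the two contrasts can be illustrated as follows. This sketch covers only the leave-one-subject-out averaging, the peak search, and the contrast arithmetic; it omits the trimming step and the correction for the jackknife-induced decrease in variance. The function names and the toy inputs in the usage lines are hypothetical.

```python
# Contrast weights over (input, operation, output) peak times.
C1 = [-1, -1, 2]  # positive when output peaks later than the mean of the others
C2 = [-1, 1, 0]   # positive when operation peaks later than input


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def jackknife_grand_means(subject_courses):
    """Return N mean time courses, each computed with one subject left out."""
    n = len(subject_courses)
    length = len(subject_courses[0])
    totals = [sum(s[t] for s in subject_courses) for t in range(length)]
    return [[(totals[t] - s[t]) / (n - 1) for t in range(length)]
            for s in subject_courses]


def peak_index(course):
    """Index of the maximum of a correlation time course."""
    return max(range(len(course)), key=lambda t: course[t])


def contrast_values(peaks):
    """Apply both contrasts to one (input, operation, output) peak triple."""
    return dot(C1, peaks), dot(C2, peaks)


# The two contrasts are orthogonal and each sums to zero.
print(dot(C1, C2), sum(C1), sum(C2))   # 0 0 0
# Toy peak times (in TRs) that follow the expected progression.
print(contrast_values((3, 4, 6)))      # (5, 1)
```

Both contrast values are positive only when the peaks are ordered input, then operation, then output, which is why the criterion requires both contrasts to be significant.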
The functional connectivity (14), defined as the Fisher’s z-transformed correlation between time courses, was calculated for each participant and condition across all pairs of the 24 unilateral ROIs and using data pooled across all correct trials. This procedure gave a single connectivity pattern for each participant and condition. Unilateral ROIs were used to maximize the potential information in each pattern. We then carried out a cross-subject classification between manipulation and maintenance conditions, using these connectivity patterns and an SVM classifier. The sensitivities shown in Fig. 5A are significantly different from zero in a one-sample t test, corrected for the low variance because of the similarity between folds (36), and thresholded at P ≤ 0.05.
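A connectivity pattern of this kind can be computed as sketched below. The ROI names and time courses are hypothetical, and the clipping of correlations just inside ±1 is an added safeguard (the Fisher transform is undefined at exactly ±1), not a step described in the study. With 24 unilateral ROIs, each pattern contains 24 × 23 / 2 = 276 pairwise values.

```python
from itertools import combinations
from math import atanh, sqrt


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def fisher_z(r, limit=0.999999):
    """Fisher z-transform; r is clipped to keep atanh finite (assumption)."""
    return atanh(max(-limit, min(limit, r)))


def connectivity_pattern(time_courses):
    """Map each ROI pair to the Fisher z of their time-course correlation.

    time_courses: dict of ROI name -> list of mean BOLD values, here
    standing in for the concatenated correct-trial time courses.
    """
    rois = sorted(time_courses)
    return {
        (a, b): fisher_z(pearson(time_courses[a], time_courses[b]))
        for a, b in combinations(rois, 2)
    }


def n_pairs(n_rois):
    """Number of unique ROI pairs."""
    return n_rois * (n_rois - 1) // 2


# Hypothetical time courses for three ROIs.
demo = {
    "roi_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "roi_b": [2.0, 1.0, 4.0, 3.0, 5.0],
    "roi_c": [5.0, 3.0, 4.0, 1.0, 2.0],
}
pattern = connectivity_pattern(demo)
print(n_pairs(24), round(pattern[("roi_a", "roi_b")], 4))
```

One such pattern per participant and condition then serves as a feature vector for the cross-subject classification between manipulation and maintenance.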
We thank Andrei Gorea for his input on the study design. This study was funded by a National Science Foundation Graduate Research Fellowship (to A.S.) and by Templeton Foundation Grant 14316 (to P.U.T.).
- 1 To whom correspondence should be addressed. E-mail: firstname.lastname@example.org.
Author contributions: A.S. and P.J.K. designed research; A.S., P.J.K., P.A., and D.K. performed research; A.S. and P.J.K. contributed new reagents/analytic tools; A.S., S.V.F., P.A., and D.K. analyzed data; and A.S., P.J.K., S.V.F., and P.U.T. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1311149110/-/DCSupplemental.