The Role of the Medial Prefrontal Cortex in Exploring New Options

Post by Deborah Joye

What's the science?

In daily life, we constantly choose whether to exploit a familiar, ongoing plan or to abandon that plan and explore alternative options. But how exactly does our brain resolve this exploitation-exploration dilemma? One view is that the brain resolves it by considering feedback from action outcomes. It has also been suggested that the brain resolves it with predictive encoding. Predictive encoding occurs in perceptual systems such as the visual system, where an observer’s prior beliefs about a scene alter how they perceive it. In the case of the exploitation-exploration dilemma, predictive encoding might mean that how we interpret outcome feedback is shaped by our beliefs about our own behavior. Previous research has revealed that resolving the dilemma involves the medial prefrontal cortex, an important brain region for executive function; however, the exact neural mechanisms that underlie this process remain unknown. This week in Science, Domenech and colleagues demonstrate that the medial prefrontal cortex resolves the exploitation-exploration dilemma through a two-stage predictive encoding process involving both the ventromedial and dorsomedial prefrontal cortex.

How did they do it?

To investigate the exploitation-exploration dilemma, the authors used intracranial electroencephalography to record neural activity in the ventromedial and dorsomedial prefrontal cortex of 6 epilepsy patients. The participants performed a sequential number task that forced them to continuously choose whether to maintain an ongoing action plan or switch toward exploring alternative responses. To determine how brain activity changed specifically when participants switched from “exploit” to “explore”, the authors analyzed neural activity in the ventromedial and dorsomedial prefrontal cortex over successive trials. In “stay” trials, the participants received feedback reinforcing their current action plan and ultimately maintained that plan while using the feedback to slightly adjust their strategy. In “switch” trials, feedback and action outcomes led participants to switch away from their ongoing plan and instead explore new plans in the following trials. Comparing different combinations of these trials allowed the authors to determine how activity in the two regions of the medial prefrontal cortex changed over time.

What did they find?

The authors found that the medial prefrontal cortex resolves the dilemma of exploitation versus exploration through a two-stage process that relies on predictive encoding to proactively alter the significance of incoming information. In the first stage, the ventromedial prefrontal cortex signals either that the ongoing action plan has been reliable and should be exploited, or that it may be unreliable and new plans should be explored. To signal that the ongoing action plan is reliable, the ventromedial prefrontal cortex increases gamma activity: fast brain waves (> 50 Hz) thought to indicate local neural processing. Conversely, to signal that the action plan should be abandoned and alternative plans explored, the ventromedial prefrontal cortex increases activity in the beta range: slower brain waves (13-30 Hz) associated with focused concentration. These activity changes in the ventromedial prefrontal cortex occurred before feedback arrived, rather than in direct response to incoming information. This suggests that the ventromedial prefrontal cortex proactively flags incoming information either as a signal to adjust the current plan through reinforcement learning or as a trigger to explore alternative strategies.

In the second stage of exploitation-exploration resolution, the authors found that the dorsomedial prefrontal cortex responded to action outcomes based on the expectations provided by the ventromedial prefrontal cortex. For example, when the ventromedial prefrontal cortex signals that the current plan should be exploited, the dorsomedial prefrontal cortex also signals reinforcement learning through high gamma activity. However, when the ventromedial prefrontal cortex signals that the plan should be abandoned, the dorsomedial prefrontal cortex reconfigures its activity in the alpha range: slower brain waves (8-12 Hz) associated with the inhibition of information irrelevant to ongoing behavior. The decrease of alpha waves in the dorsomedial prefrontal cortex thus reflects the release of inhibition of alternative action plans. This alpha-band reconfiguration also disrupts the high gamma activity in both the ventromedial and dorsomedial prefrontal cortex that would otherwise signal reinforcement learning. This presumably helps to keep the “keep going with this plan” and “let’s explore other options” signals distinct from one another. Altogether, these two regions of the medial prefrontal cortex work together to predict the functional value of incoming information and make decisions about it.
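For readers who like to see the numbers laid out, here is a minimal Python sketch that maps a frequency to the three brain-wave bands named in this post. The band edges are simply the ranges quoted above (alpha 8-12 Hz, beta 13-30 Hz, gamma above 50 Hz); exact cut-offs vary between studies, and the names `BANDS`, `GAMMA_MIN_HZ`, and `band_for_frequency` are made up for illustration. This is not the authors' analysis code.

```python
from typing import Optional

# Band edges in Hz, copied from the ranges quoted in this post.
# Assumption: these are illustrative; cut-offs differ between studies.
BANDS = {
    "alpha": (8.0, 12.0),   # linked in the post to inhibiting irrelevant information
    "beta": (13.0, 30.0),   # linked in the post to focused concentration
}
GAMMA_MIN_HZ = 50.0         # the post describes gamma as "> 50 Hz"


def band_for_frequency(freq_hz: float) -> Optional[str]:
    """Return the band name for a frequency, or None if the post names no band for it."""
    if freq_hz > GAMMA_MIN_HZ:
        return "gamma"
    for name, (low, high) in BANDS.items():
        if low <= freq_hz <= high:
            return name
    return None


if __name__ == "__main__":
    for f in (10, 20, 40, 80):
        print(f, "Hz ->", band_for_frequency(f))
```

Frequencies that fall between the quoted ranges (for example 40 Hz) return `None` here, because the post does not assign them to a band.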

What's the impact?

This work is the first to demonstrate some of the underlying neural processes that resolve the exploitation-exploration dilemma. The authors reveal that the ventromedial prefrontal cortex can proactively change the functional significance of incoming feedback, adding support to the idea that predictive encoding is not limited to perceptual systems but is also used in executive function systems. The finding that predictive encoding is important in both perceptual and executive systems suggests that it serves as a general mechanism used across the cerebral cortex to process incoming information.

Domenech et al., Neural mechanisms resolving exploitation-exploration dilemmas in the medial prefrontal cortex, Science (2020).