Uncertainty Favors Exploitation While Novelty Drives Exploration

Post by Shireen Parimoo 

The takeaway

The explore-exploit dilemma refers to the competing desires to seek versus avoid novelty and uncertainty. People are more likely to avoid uncertain options in order to maximize reward (exploit bias), except when the option is completely novel (explore bias).

What's the science?

The explore-exploit dilemma is the competition between choosing a known outcome (exploit) and a less well-known outcome that offers the chance to learn more (explore). People vary widely in how they make decisions, especially when outcomes are uncertain. Some people avoid choices with uncertain outcomes (exploit bias), while others seek out uncertainty (explore bias). The tendency to explore new options with unknown outcomes likewise varies across individuals, since novelty inherently carries uncertainty. Computational approaches are useful for modeling behavior because different algorithms embody different hypotheses about the processes that drive it. For example, some algorithms prioritize exploration (trying new options) early on to learn the expected reward values, and later favor exploitation (selecting known options) to maximize reward.
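To make the "explore first, exploit later" idea concrete, here is a minimal sketch of a decaying epsilon-greedy agent on a Bernoulli bandit. It is a generic textbook strategy chosen for illustration, not the model Cockburn and colleagues fit; the function name, reward probabilities, and decay schedule are all hypothetical.

```python
import random

def run_decaying_epsilon_greedy(probs, n_trials=200, seed=0):
    """Play a Bernoulli bandit, exploring often early and rarely later.

    probs: hypothetical reward probability for each arm.
    Returns the exploration rate used on each trial, the per-arm
    choice counts, and the per-arm running mean reward.
    """
    rng = random.Random(seed)
    n_arms = len(probs)
    counts = [0] * n_arms      # how many times each arm was chosen
    values = [0.0] * n_arms    # running mean reward per arm
    epsilons = []
    for t in range(n_trials):
        # Hypothetical schedule: 1.0 on the first trial, shrinking toward 0.
        epsilon = 1.0 / (1.0 + 0.05 * t)
        epsilons.append(epsilon)
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                        # explore: random arm
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit: best estimate
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the chosen arm's sample mean.
        values[arm] += (reward - values[arm]) / counts[arm]
    return epsilons, counts, values
```

A shrinking epsilon is only one way to shift from exploration to exploitation; softmax choice with a rising inverse temperature, or explicit uncertainty bonuses, are common alternatives.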

Prior research has not distinguished between novelty and uncertainty, making it unclear whether they have different effects on value-based decision-making. This week in Neuron, Cockburn and colleagues used computational modeling and neuroimaging techniques to understand how the brain represents novelty and uncertainty to guide behavior.

How did they do it?

Thirty-two young adults completed 20 blocks of a multi-armed bandit task, a reward-based decision-making task, while undergoing fMRI scanning. On each trial, participants selected one of two slot machines that offered monetary rewards with different probabilities, with the long-term goal of accumulating as much reward as possible. Each block contained five unique slot machines, each associated with a fixed reward probability. Three of these machines were familiar, having been encountered in a previous block, while two were novel. Importantly, the reward probability of a familiar machine could change across blocks but remained fixed within a block. The slot machines therefore varied along three dimensions: (1) reward probability, or expected value; (2) novelty, based on the total number of previous exposures; and (3) uncertainty, based on the number of exposures within the current block.
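The block structure described above can be sketched as a small simulation that tracks novelty (lifetime exposures) and uncertainty (within-block exposures) for each machine. This is a hypothetical reconstruction for illustration only: the trial count per block, the uniform draw of reward probabilities, how familiar machines are picked, the rule that an "exposure" is any trial on which a machine is shown, and the all-new first block are all assumptions, and a random chooser stands in for the participant.

```python
import random
from dataclasses import dataclass

@dataclass
class Machine:
    machine_id: int
    reward_prob: float = 0.0
    total_exposures: int = 0   # lifetime exposures: low = novel
    block_exposures: int = 0   # exposures in the current block: low = uncertain

def make_block(pool, rng, n_machines=5, n_familiar=3):
    """Build one block from familiar machines in `pool` plus brand-new ones."""
    familiar = rng.sample(pool, min(n_familiar, len(pool)))
    novel = [Machine(len(pool) + i) for i in range(n_machines - len(familiar))]
    pool.extend(novel)
    block = familiar + novel
    for m in block:
        # Hypothetical: resample each probability at block start,
        # then hold it fixed for the rest of the block.
        m.reward_prob = rng.uniform(0.1, 0.9)
        m.block_exposures = 0
    return block

def run_block(block, n_trials, rng):
    """Offer two machines per trial; a random chooser stands in for the participant."""
    total_reward = 0
    for _ in range(n_trials):
        pair = rng.sample(block, 2)
        chosen = rng.choice(pair)
        for m in pair:  # assumption: being shown counts as an exposure
            m.total_exposures += 1
            m.block_exposures += 1
        total_reward += rng.random() < chosen.reward_prob
    return total_reward

def simulate(n_blocks=20, n_trials=30, seed=1):
    """Run all blocks; record how many never-seen machines each block starts with."""
    rng = random.Random(seed)
    pool, novel_counts = [], []
    for _ in range(n_blocks):
        block = make_block(pool, rng)
        novel_counts.append(sum(m.total_exposures == 0 for m in block))
        run_block(block, n_trials, rng)
    return pool, novel_counts
```

Keeping the two exposure counters separate mirrors the distinction the study draws: a machine can be familiar overall yet still uncertain early in a block, because its reward probability may have changed.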

The authors used logistic regression and computational modeling of the behavioral data to estimate how expected value, novelty, and uncertainty influenced participants’ choices on each trial. In the model, the subjective utility of each slot machine was computed from its expected value and uncertainty, and was dynamically adjusted based on recent reward history and novelty. The authors then identified the brain regions most responsive to (1) the overall subjective utility of the two options, (2) their total reward value, (3) the decision-making process associated with the chosen slot machine, and the chosen machine’s (4) expected value and (5) uncertainty. Finally, to determine how novelty and uncertainty are represented in the brain, they examined patterns of brain activity associated with reward prediction errors when the computational model did and did not include the effect of novelty.
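The logistic-regression step can be illustrated with a minimal sketch on simulated data. It assumes, hypothetically, that the probability of choosing option A over option B is a logistic function of a weighted sum of their differences in expected value, uncertainty, and novelty, with no intercept; the weights and data are invented, and the sketch omits the reward-history updating in the authors' computational model.

```python
import numpy as np

def choice_prob(x, beta):
    """P(choose option A) as a logistic function of weighted A-minus-B differences."""
    return 1.0 / (1.0 + np.exp(-x @ beta))

def fit_logistic(x, y, lr=1.0, n_iter=2000):
    """Maximum-likelihood logistic regression by plain gradient ascent."""
    beta = np.zeros(x.shape[1])
    for _ in range(n_iter):
        p = choice_prob(x, beta)
        beta += lr * x.T @ (y - p) / len(y)  # gradient of the mean log-likelihood
    return beta

def log_likelihood(x, y, beta):
    """Mean Bernoulli log-likelihood of the observed choices."""
    p = np.clip(choice_prob(x, beta), 1e-12, 1 - 1e-12)
    return float(np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Hypothetical simulated data: columns are A-minus-B differences in
# expected value, uncertainty, and novelty, each uniform on [-1, 1].
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(5000, 3))
true_beta = np.array([2.0, -1.0, 1.0])  # value-seeking, uncertainty-averse, novelty-seeking
y = (rng.random(5000) < choice_prob(x, true_beta)).astype(float)

beta_hat = fit_logistic(x, y)
```

In this setup, a negative fitted weight on the uncertainty difference corresponds to uncertainty avoidance (exploit bias), and a positive weight on the novelty difference to novelty seeking (explore bias).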

What did they find?

Participants selected slot machines with a higher expected reward value more often than those with a lower expected value. New machines were chosen over familiar machines when the difference in expected reward was large, whereas the uncertain option was chosen when the difference in expected reward was small (i.e., there was a larger potential payoff from the uncertain machine). The effect of novelty grew as trials progressed, with participants becoming more likely to choose the new machine over time. Uncertainty showed the opposite pattern: participants increasingly avoided the uncertain option over time. The computational model captured this pattern of behavior as well, indicating that people increasingly favor exploitation and avoid uncertainty unless a choice involves a completely novel option.

Several brain regions tracked the subjective utility and total expected reward value of the two choices, including the ventromedial prefrontal cortex (VMPFC), the ventral striatum, and the posterior cingulate cortex. The decision-making process was represented in the VMPFC, while regions of the ventral and dorsal medial PFC were associated with expected reward value and degree of uncertainty, respectively. Paralleling the behavioral results, uncertainty-related VMPFC activity was no longer present when novelty was included in the model, suggesting that novelty suppresses the effect of uncertainty on decision-making.

What's the impact?

This study is the first to isolate the separate contributions of outcome uncertainty and stimulus novelty to value-based decision-making. These findings provide deeper insight into how the brain represents the subjective utility of choices over time and pave the way for future research on how people evaluate choices under different circumstances (e.g., under stress).
