Learning from Your Mistakes: The Role of Dopamine Activity in Prediction Errors

Post by Lincoln Tracy 

What's the science?

Understanding how associative learning occurs in the brain is one of the most important questions in neuroscience. One of the key concepts in associative learning is the prediction error: a mismatch between what we expect to happen and what actually happens. Both humans and animals use prediction errors to learn; the greater the error, the greater the learning. Prediction errors can be computed using the temporal difference method. The ability to map millisecond-by-millisecond changes in the firing activity of dopamine neurons has been a major step forward in understanding prediction errors. However, aspects of prediction errors remain to be fully explored. Previous research has demonstrated that optogenetics can be used to shunt, or attenuate, dopamine neuron activity to prevent learning about a reward when it is delivered. This week in Nature Neuroscience, Maes and colleagues used second-order conditioning and blocking paradigms to determine whether shunting dopamine neuron activity with laser light at the moment a reward-predicting visual cue is presented prevents learning in a similar fashion.
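For readers unfamiliar with temporal difference learning, the core idea can be sketched in a few lines of Python. This is an illustrative toy, not code from the study: the states ("cue", "reward", "end"), the learning rate, and the discount factor are all hypothetical values chosen for the example.

```python
# Toy temporal-difference (TD) learning sketch. The prediction error
# delta = reward + gamma * V(next_state) - V(state) drives learning:
# the larger the error, the larger the update to the value estimate.
# States and parameters here are illustrative, not from Maes et al.

def td_update(V, state, next_state, reward, alpha=0.1, gamma=0.9):
    """Perform one TD(0) update on V[state]; return the prediction error."""
    delta = reward + gamma * V[next_state] - V[state]  # prediction error
    V[state] += alpha * delta                          # learn in proportion to the error
    return delta

# Two-step example: a cue state is followed by a reward state.
V = {"cue": 0.0, "reward": 0.0, "end": 0.0}
for _ in range(50):
    td_update(V, "reward", "end", reward=1.0)  # reward is delivered here
    td_update(V, "cue", "reward", reward=0.0)  # the cue itself is unrewarded

# After training, the cue state has acquired value, so the prediction
# error (the dopamine burst, on the TD account) moves from the time of
# reward to the time of the cue.
```

Because the cue state inherits value from the reward it predicts, a fully trained cue evokes a prediction error of its own, which is exactly the cue-evoked dopamine signal the study manipulates.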

How did they do it?

The authors used rats genetically modified to express Cre recombinase, a bacteria-derived enzyme, under the control of a tyrosine hydroxylase promoter, restricting its expression to dopaminergic neurons. The rats underwent surgery in which a Cre-dependent viral vector carrying halorhodopsin was injected into the ventral tegmental area (VTA) of the brain. Optic fibers were also implanted into the VTA; these would be targeted by the lasers during optogenetic stimulation. The rats were then placed on a food-restricted diet for four weeks before being conditioned to associate a specific visual cue (stimulus A; a flashing light) with a reward (a chocolate-tasting sucrose pellet). After this training period, the rats completed two experiments: a second-order conditioning experiment and a blocking experiment. In both experiments, the percentage of time the rats spent approaching the food port where the pellet was delivered was taken as a measure of how conditioned they had become. The second-order conditioning experiment had two types of trials. In both, the previously conditioned cue (the flashing light) was used to reinforce learning about a novel cue: either a chime (stimulus C) or a siren (stimulus D) was presented after the flashing light. In the C trials, continuous laser light was delivered to the VTA beginning half a second before the presentation of the flashing light, so as to disrupt the dopamine transmission that would normally occur when the reward-predicting cue was presented. In the D trials, the laser light was delivered at a random time point after the flashing light was presented. Following this training, the rats completed probe testing, in which the chime and siren were presented without a reward. The authors then compared the behavioral responses between the two trial types to determine whether disrupting dopaminergic transmission impacted learning.

In the blocking experiment, the conditioned cue (the flashing light) was presented in separate compounds with each of two novel auditory stimuli, a tone (stimulus X) or a click (stimulus Y), and each compound was paired with reward. Normally, under these conditions, the previously conditioned light blocks learning about the relationships between X (or Y) and the reward. The reasoning was that if the conditioned cue carries a prediction of upcoming reward, then disrupting this prediction should prevent the light from blocking learning about X. To test this, the laser light was delivered to the VTA at the onset of the flashing light in the X trials, or at a random time point between trials in the Y trials. Learning about these compounds was compared with learning about a control compound consisting of a non-conditioned steady light and a third auditory cue, a white noise (stimulus Z), which was also paired with a reward. Following this blocking (compound) training, the rats underwent probe testing, in which the X, Y, and Z stimuli were presented alone and without a reward.

What did they find?

Optogenetic manipulation did not alter responding during second-order training. However, during the probe test, the rats responded to stimulus D more frequently than to stimulus C, indicating that attenuating dopaminergic activity at the onset of the reward-predictive cue prevented second-order conditioning to stimulus C. As in the second-order experiment, optogenetic manipulation did not alter responding during blocking training. During the probe test of the blocking experiment, the rats responded more to the control stimulus, Z, than to the blocked stimuli, X and Y. These results confirmed that the conditioned flashing light blocked learning about the novel cues X and Y, but showed that attenuating the dopamine signal to the flashing light did not disrupt its ability to block learning about stimulus X. Together, these findings suggest that the dopamine response to reliable predictors of reward represents a prediction error rather than a prediction of reward.


What's the impact?

This study provides clear evidence that the increases in firing activity of dopaminergic neurons following the presentation of a reward-predicting cue serve as prediction errors that support associative learning, in the same fashion as the previously demonstrated reward-evoked changes in dopaminergic firing. Importantly, these findings suggest a broader role for dopaminergic signaling in driving associative learning than current theories assume.


Maes et al. Causal evidence supporting the proposal that dopamine transients function as temporal difference prediction errors. Nature Neuroscience (2020). Access the original scientific publication here.