Experience Replay is Associated with Efficient Non-local Learning
Post by Andrew Vo
What's the science?
When we make decisions, we often rely on previously learned relationships between actions and their outcomes. Although it is relatively easy to assign value to an action when the action and its outcome are experienced close together in time and space (local), it becomes more challenging when they are further apart (non-local). It has been hypothesized that non-local learning is achieved via a model-based approach such as ‘experience replay’, in which a learned map of the environment is used to link local rewards to non-local actions. However, direct evidence for such a mechanism in humans has yet to be established. This week in Science, Liu and colleagues used a novel decision-making task and non-invasive brain imaging to test how experience replay supports non-local learning.
How did they do it?
To test their hypothesis, the authors designed a decision-making task that separated local and non-local learning. On each trial, the participant was presented with one of three starting arms. Each arm contained two paths, between which the participant would choose. Each path consisted of a sequence of three stimuli followed by a reward outcome. Critically, the two outcomes reachable from each arm were shared across all three starting arms. In this way, participants could use the outcome learned in the current arm (local) to inform and update their choices when later encountering the other two starting arms (non-local), in a model-based approach.
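The shared-outcome structure described above can be sketched in a few lines of Python. The arm, path, and outcome names below are invented for illustration, and the particular path-to-outcome assignments are an assumption, not the paper's actual mapping:

```python
# Illustrative layout of the task: three starting arms, each with two
# paths, and only two reward outcomes shared across all arms.
# (All names and the path-outcome mapping are hypothetical.)
task = {
    "arm_A": {"path_1": "outcome_X", "path_2": "outcome_Y"},
    "arm_B": {"path_1": "outcome_Y", "path_2": "outcome_X"},
    "arm_C": {"path_1": "outcome_X", "path_2": "outcome_Y"},
}

def preferred_path(arm, rewarded_outcome):
    """A model-based learner who knows the map can pick whichever path
    in any arm leads to the currently rewarded outcome."""
    return next(p for p, o in task[arm].items() if o == rewarded_outcome)
```

So a reward discovered behind outcome_X in arm_A (local) immediately tells a model-based learner to choose `preferred_path("arm_B", "outcome_X")` when arm_B appears, even though that arm was never rewarded directly (non-local).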
To measure neural replay, the authors used magnetoencephalography (MEG) to record fast whole-brain activity as participants performed the task. Replay was defined as the reactivation of a sequence at the time of reward receipt, which could occur in either the forward (i.e., from the beginning of a path sequence to the eventual outcome) or backward (i.e., from the outcome at the end of a path sequence towards the beginning) direction. They also examined how neural replay was prioritized for non-local experiences based on their utility for future decisions. This utility was determined by the gain (i.e., how informative the current reward is for improving future choices at that arm) and the need (i.e., how frequently that arm will be visited in the future) of each experience. These two task features were manipulated by changing the reward and arm probabilities, respectively, across trials.
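The gain-and-need decomposition above can be made concrete with a minimal sketch. This is not the authors' actual model; it is a toy illustration in which gain is approximated as the improvement in the best available value at an arm if the update were applied, and need as the arm's visit probability:

```python
# Hedged sketch of utility-based replay prioritization.
# gain: how much updating this path's value would improve choices at the arm.
# need: how often the arm is expected to be visited.
# All numbers and the exact formulas are illustrative assumptions.

def gain(q_values, path, new_value):
    """Crude proxy: change in the best attainable value at this arm
    if the path's value were updated to new_value."""
    best_before = max(q_values)
    q_after = list(q_values)
    q_after[path] = new_value
    return max(q_after) - best_before

def need(visit_prob):
    """Expected future relevance of the arm (here just its visit probability)."""
    return visit_prob

def priority(q_values, path, new_value, visit_prob):
    """Utility of replaying this experience: gain x need."""
    return gain(q_values, path, new_value) * need(visit_prob)

# Example: an arm with paths valued [0.2, 0.5]; a new reward suggests
# path 0 is worth 0.9, and the arm is visited on 30% of trials.
p = priority([0.2, 0.5], path=0, new_value=0.9, visit_prob=0.3)
```

Under this toy scheme, an update that would not change the best choice at an arm (zero gain), or an arm that is never revisited (zero need), yields zero priority, matching the intuition that such experiences are not worth replaying.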
What did they find?
When participants encountered the same starting arm as before, they were more likely to favor the path that was previously rewarded, indicative of direct, model-free learning. This local learning was found to transfer to non-local experiences, as participants would favor the path leading to a previously rewarded outcome even when it was presented in a different starting arm. Using computational modeling, the authors found that values of non-local paths were updated to a similar extent as those of local paths, and learning rates were higher for non-local paths with greater priority (higher gain and need).
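The two kinds of learning reported above can be sketched with simple delta-rule updates. The specific form below, especially scaling the non-local learning rate by priority, is an illustrative assumption consistent with the finding that non-local paths were updated to a similar extent as local ones, with higher learning rates for higher-priority paths; it is not the authors' fitted model:

```python
# Hedged sketch: local (model-free) vs. non-local (model-based) value updates.

def update_local(value, reward, alpha=0.5):
    """Delta-rule update for the path the participant actually took."""
    return value + alpha * (reward - value)

def update_nonlocal(value, reward, priority, alpha=0.5):
    """Update for a path in a different arm that leads to the same outcome.
    Scaling alpha by priority (an assumption) mimics the reported higher
    learning rates for non-local paths with greater gain and need."""
    return value + alpha * priority * (reward - value)

v_local = update_local(0.0, 1.0)             # chosen path moves toward reward
v_far = update_nonlocal(0.0, 1.0, 0.8)       # unchosen arm's path also updated
```

At priority 1, the non-local update matches the local one, capturing the result that non-local values were updated to a similar extent as local values.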
Examining the MEG recordings, the authors observed two types of neural replay after reward receipt: (1) a fast, forward replay that peaked at a 30-ms lag, and (2) a slower, backward replay that peaked at a 160-ms lag. Forward replay was associated with local actions and an increase in ripple-frequency power, whereas backward replay preferentially encoded non-local actions. Computational modeling revealed that only backward replay was associated with efficient non-local learning. Similarly, only backward replay was related to the utility (higher gain and need) of non-local experiences.
What's the impact?
In summary, this study revealed that backward replay serves as a neural mechanism for non-local learning and is prioritized based on utility for future decisions. The results here build upon model-based reinforcement learning theories largely tested in rodents and extend them to human behavior. They contribute to our understanding of how the brain might bridge the gap between direct and indirect experiences to guide our decisions.
Liu et al. Experience replay is associated with efficient nonlocal learning. Science (2021).