Animals exhibit a diverse range of behaviors when exploring new environments and can learn which actions or action sequences yield positive outcomes. Dopamine release in response to rewards is critical for reinforcing the actions that produce them [1-3]. However, it has been challenging to understand how credit is assigned to the precise action that resulted in dopamine release during continuous behavior. We studied this problem using a new self-stimulation model in which specific spontaneous movements trigger optical stimulation of dopaminergic cells. Dopamine self-stimulation rapidly and dynamically changed the structure of overall behavior. Initial stimulations reinforced not only the target action that produced stimulation but also actions similar to the target and actions that occurred seconds before the stimulation. Repeated pairings led to a gradual refinement of behavior toward the target action. Reinforcing action sequences revealed additional temporal dependencies of this refinement. Pairs of actions spontaneously separated by long intervals showed gradual credit assignment, with early refinement of actions closer to the stimulation and later refinement of actions further away. Thus, a retroactive reinforcement mechanism promotes not only reinforcement but also gradual refinement of overall behavior, assigning credit to the specific actions and action sequences that lead to dopamine release.
Additional Details
An unedited version of this manuscript is provided to allow early access to the results. Before final publication, the manuscript will undergo further review. Please note that there may be errors affecting the content, and all legal disclaimers apply.
Topics
Reward-based learning