Wilkinson et al., Brain and Neuroscience Advances

Deficits in reward processing are a central feature of major depressive disorder with patients exhibiting decreased reward learning and altered feedback sensitivity in probabilistic reversal learning tasks. Methods to quantify probabilistic learning in both rodents and humans have been developed, providing translational paradigms for depression research. We have utilised a probabilistic reversal learning task to investigate potential differences between conventional and rapid-acting antidepressants on reward learning and feedback sensitivity. We trained 12 rats in a touchscreen probabilistic reversal learning task before investigating the effect of acute administration of citalopram, venlafaxine, reboxetine, ketamine or scopolamine. Data were also analysed using a Q-learning reinforcement learning model to understand the effects of antidepressant treatment on underlying reward processing parameters. Citalopram administration decreased trials taken to learn the first rule and increased win-stay probability. Reboxetine decreased win-stay behaviour while also decreasing the number of rule changes animals performed in a session. Venlafaxine had no effect. Ketamine and scopolamine both decreased win-stay probability, number of rule changes performed and motivation in the task. Insights from the reinforcement learning model suggested that reboxetine led animals to choose a less optimal strategy, while ketamine decreased the model-free learning rate. These results suggest that reward learning and feedback sensitivity are not differentially modulated by conventional and rapid-acting antidepressant treatment in the probabilistic reversal learning task.

Keywords
Antidepressants, major depressive disorder, probabilistic reversal learning, ketamine, cognitive flexibility, Q-learning, feedback sensitivity, citalopram

Received: 21 November 2019; accepted: 15 January 2020

School of Physiology, Pharmacology and Neuroscience, University of Bristol, Bristol, UK
Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, UK

Corresponding author:
Emma S. J. Robinson, School of Physiology, Pharmacology and Neuroscience, University of Bristol, Biomedical Sciences Building, University Walk, Bristol BS8 1TD, UK.
Email: Emma.S.J.Robinson@bristol.ac.uk

Creative Commons CC BY: This article is distributed under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).

Introduction

Reward learning (RL), the ability of reward to modulate future behaviour, is believed to contribute to the aetiology and treatment of depression (Der-Avakian et al., 2015; Vrieze et al., 2013). Probabilistic reward learning (PLT) and probabilistic reversal learning tasks (PRLT) have been used to study RL and the behavioural response to positive and negative feedback in humans and animals (Bari et al., 2010; Murphy et al., 2003; Slaney et al., 2018). Patients with major depressive disorder (MDD) show increased sensitivity to misleading feedback but respond normally to accurate negative feedback (Murphy et al., 2003; Taylor Tavares et al., 2008). MDD patients and patients in remission also have an impaired ability to integrate reward information over time (Pechtel et al., 2013; Pizzagalli et al., 2008).

Translation of human PLTs into rodent paradigms (Bari et al., 2010; Der-Avakian et al., 2013) provides an opportunity to probe the link between RL and depressive behaviour. Two types of tasks are commonly used: PLTs and PRLTs, the reversal learning version also providing a measure of cognitive flexibility. Utilising a PRLT, Bari et al. (2010) observed that chronic 5-mg/kg citalopram treatment increased positive feedback sensitivity (PFS) while acutely and bidirectionally modulating reversal performance and negative feedback sensitivity (NFS). Drozd et al. (2018) further characterised the PRLT with a range of conventional antidepressants and observed no effects with escitalopram and venlafaxine treatment, although mirtazapine decreased RL performance. The rapid-acting antidepressant, ketamine, has also been investigated in the PRLT with Rychlik et al. (2017) reporting decreased sensitivity to misleading negative feedback following treatment. Another PLT, the Response Bias Probabilistic Reward Task, has also been developed for rodents, and it has been shown to be sensitive to dopaminergic manipulations whereby amphetamine enhanced but pramipexole impaired RL (Der-Avakian et al., 2013). Antidepressants have not yet been assessed in this task.

Modulation of reward-related behaviour has been relatively widely studied in both animal models and humans (Lewis et al., 2019; Robinson and Roiser, 2016; Slaney et al., 2018). In traditional rodent models of anhedonia, chronic stress-induced impairments in reward sensitivity are reversed by chronic but not acute antidepressant treatments (Willner, 2017), while ketamine rapidly reverses these deficits (Yang et al., 2015). Recently, human emotional processing tasks have been translated into methods suitable for non-human species to study reward-related cognitive biases in rodent models (Hales et al., 2014; Robinson and Roiser, 2016). In the affective bias test (ABT), an assay probing how affective biases modulate learning and memory, conventional antidepressant treatment induces a positive bias during learning of new substrate-reward associations but does not ameliorate previously learnt negative biases (Stuart et al., 2013, 2015). Conversely, ketamine treatment was found to block negative biases but have no effect upon new learning. The judgement bias task (JBT) investigates how cognitive biases alter the valuation of ambiguous information. Within the JBT, ketamine rapidly increases optimistic responses towards the ambiguous cue, while conventional antidepressant treatment requires 2 weeks of treatment for an effect (Hales et al., 2017). Taken together, these findings suggest that different underlying neuropsychological processes contribute to reward-related behaviours, and these are differentially modulated in models of depression and in response to delayed versus rapid-acting antidepressants.

In this study, we sought to compare the effects of conventional monoaminergic antidepressants and rapid-acting antidepressants upon behaviour in the PRLT. We tested the conventional antidepressants citalopram, venlafaxine and reboxetine alongside the rapid-acting antidepressants ketamine and scopolamine at doses previously used in the ABT and JBT. Drugs were administered acutely as our primary interest was the effects of the rapid-onset antidepressants upon RL. Some conventional antidepressants have been reported to acutely exacerbate anxiety (Urban et al., 2016); however, they still have acute antidepressant effects (Harmer et al., 2009; Stuart et al., 2013), and acute dosing allows comparison with previous PRLT studies. We additionally analysed data using a Q-learning model to probe parameter changes underlying differences in RL performance. By dissociating the neuropsychological mechanisms underlying differential responses to conventional and rapid-acting antidepressant treatment, this will aid in the development of future antidepressant compounds with both long-lasting efficacy and a rapid onset of action.

Methods

Animals and housing

Twelve male Lister-hooded rats (Harlan, UK) were housed in pairs within enriched laboratory cages (55 × 35 × 21 cm) containing sawdust, paper bedding, red Perspex houses (30 × 17 × 10 cm), cotton rope and cardboard tubes. Sample size was estimated based on previous studies using a similar task and manipulations (Bari et al., 2010). Animals weighed on average 272 g at the start of training and 419 g prior to the commencement of drug study experiments. Rats were kept in temperature-controlled conditions (21 ± 1°C) and under a 12:12 h reverse light–dark cycle (lights off at 08:00 h). Water was available ad libitum in the home cage, but animals were mildly food restricted to no less than 90% of their free-feeding weight matched to a normal growth curve (≈18 g of food per rat/day, laboratory chow (LabDiet, PMI Nutrition International)). All dosing and behavioural testing was carried out in the animals’ active phase between 09:00 and 18:00 h. All experiments were carried out in accordance with local institutional guidelines (University of Bristol Animal Welfare and Ethical Review Board), the UK Animals (Scientific Procedures) Act of 1986 and the European Parliament and Council Directive of 22 September 2010 (2010/63/EU).

Apparatus

Behavioural testing was carried out in touchscreen operant boxes (Med Associates, USA) containing a magazine delivering 45-mg reward pellets (Test Diet, Sandown Scientific, UK), house light, tone generator and infrared touchscreen panel with three windows within which animals could respond. The system was controlled by KLimbic Software (Conclusive Solutions Ltd, UK), and output files were then decoded in a custom MATLAB (MathWorks Inc version R2017a, USA) programme (see ‘Data analysis and output measures’).

Behavioural task

The PRLT was adapted for use on a touchscreen operant system from the original design by Bari et al. (2010) (see Figure 1(a) for a summary of the task design).

Training. Training was conducted in three stages. In stage 1, animals learnt to touch an initiation square presented in the centre window of the screen to receive a single reward pellet for a maximum of 120 trials or 30 min (whichever was reached first). In common with other touchscreen operant tasks, animals typically responded to the screen with their noses (Horner et al., 2013). Once animals reached criterion (completion of all 120 trials within a session for two consecutive sessions, mean time to train: 6.7 ± 0.51 sessions), they progressed onto stage 2. In this phase, animals had to first press the initiation square before then pressing either of two stimuli simultaneously presented in the left and right window to receive reward (max 200 trials or 40 min). Animals were deemed to reach criterion when they completed 80% of trials for two consecutive sessions (mean time to train: 2.25 ± 0.13 sessions). Rats then progressed onto the main spatial probabilistic reversal learning protocol.

Behavioural testing. In the PRLT, animals had to first press an initiation stimulus in the centre of the screen before then choosing to respond to either a left or right spatial stimulus. There was no time limit for the response to the initiation screen, enabling animals to self-pace the task. Stimuli were probabilistically rewarded such that the ‘rich’ stimulus had an 80% chance of reward and the ‘lean’ stimulus had only a 20% chance of reward. Once an animal had selected a stimulus, they were either presented with a reward pellet in the magazine (once animals had retrieved the reward, the initiation screen illuminated and they could begin the next trial) or punished with no reward and a timeout of 5 s with the house light on. If animals did not make a choice of stimuli within 10 s after trial initiation, this was classified as an omission and animals received a timeout of 5 s during which time the house light was illuminated. Following eight consecutive ‘rich’ stimulus choices, the contingencies switched so that the spatial location previously associated with the ‘rich’ stimulus was now associated with the ‘lean’ stimulus and vice versa. Animals were permitted to serially reverse throughout a session (max 200 trials or 40 min). The spatial location of the ‘rich’ stimulus at the start of a session was consistent across sessions and counterbalanced across animals. Changing the location of the rich stimulus at the start of a session did not have any effect on overall task performance (see Supplemental Figure S2). Training was deemed to have been complete when animals’ performance had stabilised in the main output parameters of interest, allowing drug study experiments to commence (no main effect of session over last five sessions of training in rule changes, win-stay probability, lose-shift probability and initiation reaction time, see Supplemental Figure S1).

Figure 1. Overview of experimental protocol. (a) Schematic of all possible routes for a single successful trial in the PRLT task. Probabilities of each outcome are depicted by the width of each arrow. Green arrows represent an animal making an action, while white arrows depict transfer from one stage of a trial to the next. If no response was detected within 10 s of an animal pressing the initiation square, then this was classed as an omission and animals received a 5-s timeout. (b) Details of pharmacological studies, order refers to the sequence in which individual drug studies were carried out.

Experimental design

Acute dose–response studies were conducted in a blinded, within-subject fully counterbalanced design with all animals receiving every dose of drug. Treatment groups were allocated through use of a fully randomised design containing four treatment groups (except for the scopolamine study where three groups were used) with each group having the treatments in a different order. The conventional antidepressants citalopram hydrobromide (selective serotonin reuptake inhibitor (SSRI), 1, 3, 10 mg/kg; t = −30 min; HelloBio, UK), venlafaxine hydrochloride (serotonin–noradrenaline reuptake inhibitor (SNRI), 1, 3, 10 mg/kg; t = −30 min; HelloBio, UK) and reboxetine mesylate hydrate (noradrenaline reuptake inhibitor (NRI), 0.1, 0.3, 1 mg/kg; t = −30 min; Sigma-Aldrich, UK) alongside the rapid-acting antidepressants ketamine hydrochloride (N-methyl-D-aspartate (NMDA) antagonist, 1, 3, 10 mg/kg; t = −60 min; Sigma-Aldrich, UK) and scopolamine hydrobromide (muscarinic antagonist, 0.03, 0.1 mg/kg; t = −60 min; Tocris, UK) were all dissolved in 0.9% saline and administered before the start of testing by intraperitoneal injection using a low-stress technique (Stuart and Robinson, 2015). Drug doses and pre-treatment times were chosen to be clinically relevant and were based upon previous behavioural studies (Bari et al., 2010; Hales et al., 2017; Jones and Higgins, 1995; Stuart et al., 2013). All studies were carried out such that a baseline session always preceded a session when drug was administered, there was at least 2 days between each drug session and all animals completed at least five baseline sessions (minimum one week) between the end of a drug study and the commencement of the next to minimise any carryover effects of treatment. All drug studies were carried out in food-restricted animals, but a test using pre-feeding was not observed to have any effects on the main outcomes measured in the PRLT except for decreasing overall motivation (see Supplemental Figure S3).

Data analysis and output measures

Parameters of interest. Output parameters were calculated to be consistent with previous PRLT studies (Bari et al., 2010; Rychlik et al., 2017). The number of rule changes and the first rule change trial were defined as the number of times an animal was able to successfully switch reward contingencies in a session and the trial at which an animal first achieved criterion for a rule change within a session, respectively. Win-stay behaviour was analysed as a proxy of PFS (how likely animals were to change their behaviour as a function of positive feedback), alongside lose-shift behaviour (how likely animals were to shift responding following negative feedback) which was examined as a proxy of NFS. Win-stay behaviour was defined as the probability upon receiving a reward at a stimulus that the rat would stay at that stimulus for the next trial as opposed to shifting to the opposite stimulus. Conversely, lose-shift behaviour was defined as the probability that following punishment at a stimulus, the rat would switch to the opposite stimulus for the next trial. Win-stay and lose-shift behaviour were additionally subdivided into either true or misleading feedback based upon whether the feedback given matched with the underlying rule of the task at the time. For example, if a rat was punished for selecting the ‘rich’ stimulus, this would be classed as misleading feedback but if it was rewarded for selecting the ‘rich’ stimulus, this would match with the underlying rule of the task and, therefore, be true feedback. Initiation reaction time was defined as the time taken for rats to respond to the presentation of the initiation square in the central window of the screen and was taken as a proxy for motivation to complete a trial.

Statistical analysis. Data were decoded and output measures calculated from KLimbic output files using a custom MATLAB (MathWorks Inc version R2017a, USA) programme before statistical analysis was conducted using SPSS (IBM version 24, USA) and output graphics constructed using GraphPad Prism 7 (GraphPad Software, USA). Sample size was calculated from previous experiments utilising similar behavioural tasks. Outlier exclusion was conducted blind to treatment, and animals that completed fewer than 50 trials were only analysed for the variables: number of rule changes and first rule change trial. Individual drug studies were analysed independently. Each behavioural parameter was analysed using one- or two-factor repeated measures analysis of variance (RM-ANOVA) with the factors treatment (containing each dose as a level) or treatment and feedback type, respectively (for true/misleading feedback analysis). All data were assessed for violations of sphericity using Mauchly’s test and where this was the case, degrees of freedom were adjusted using the Huynh–Feldt correction. Post hoc analysis was conducted using Sidak’s correction. Where data were non-normally distributed (assessed using Kolmogorov–Smirnov and Shapiro–Wilk tests), output variables were evaluated using the Friedman test with post hoc analysis carried out using Bonferroni-corrected Wilcoxon signed rank tests. Dotted lines indicate separate drug studies. A bracket and star(s) over multiple bars indicates a main effect of treatment, while star(s) over a single bar indicates a post hoc significant difference compared to vehicle treatment for that drug study. All data are shown as mean ± standard error of mean (SEM), *p ⩽ 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001.

Modelling

Computational modelling of behavioural responses provides another method for evaluating the mechanism of antidepressant drug action by probing the underlying computational processes occurring in the brain to produce behaviour. The Q-learning model is one of the most widely used models for analysis of reinforcement learning and works using the same input information available to the animals to iterate through each trial and make decisions with the aim of maximising total reward. This computer ‘optimal strategy’ can then be compared with animal behaviour to allow estimation of both absolute RL performance and estimations of underlying RL parameters such as learning rate and softmax temperature. Theoretical accuracy was described as how well animals’ performance matched up to the model-predicted perfect strategy, allowing an estimation of absolute RL efficiency within a session. This is a different measure to observed accuracy which was a direct measure of task performance. β (the inverse temperature of the softmax equation) is related to how deterministic stimulus choices are, with high β values meaning that choices are made towards stimuli with higher estimated values, while low β values essentially mean that choices are random (Grogan et al., 2017).

Data from both training and each drug study were analysed using a Q-learning model (Watkins and Dayan, 1992) adapted from Grogan et al. (2017) and discussed in detail there. Briefly, for each trial, the value (Q) of choosing each stimulus is updated with a proportion of the reward prediction error (RPE), that is, the difference between the reward expected from an action and the reward received. The proportion of the RPE used for updating is controlled by the learning rate (α) parameter. For acute drug studies, data for each individual rat were fit to two models: one containing a single learning rate (Qlearn1) and another containing a dual learning rate for positive and negative information (Qlearn2). The choice of model fit for each animal was made using Bayesian information criteria (BIC; Schwarz, 1978) when fitted to the vehicle data, with the model with the lowest BIC chosen. For every drug study, the single learning rate model (Qlearn1) was the better fitting model (Supplemental Figure S4). Once the model had been fit, these starting parameters were used to individually fit each dose and animal separately to create the output parameters.

Results

Effects of conventional antidepressants

Reboxetine decreased the number of rule changes (Figure 2(a)) within a session (RM-ANOVA, F(3,30) = 3.31, p = 0.033), an overall integrative measure of RL performance. Although there was no main effect of citalopram treatment upon rule changes, there was a tendency for higher rule change performance when the first two doses of citalopram plus vehicle were analysed in isolation (RM-ANOVA, F(2,22) = 2.91, p = 0.076). Citalopram reduced the trials taken to reach the first rule change (Friedman test, χ²(3) = 8.50, p = 0.037, Figure 2(b)), while there was also a trend towards reboxetine increasing the trials taken to reach the first rule change (Friedman test, χ²(3) = 7.02, p = 0.071). PFS, as measured by the proportion of win-stay behaviour (Figure 2(c)), was increased by citalopram treatment (RM-ANOVA, F(3,33) = 3.81, p = 0.019). Surprisingly for an antidepressant, reboxetine decreased win-stay behaviour (RM-ANOVA, F(3,24) = 6.16, p = 0.003). None of the conventional antidepressants tested had any effects upon lose-shift behaviour (Figure 2(d)), a measure of NFS. Both citalopram (RM-ANOVA, F(3,33) = 8.02, p = 0.0004) and reboxetine (RM-ANOVA, F(1.25,12.48) = 11.17, p = 0.004) reduced motivation in the PRLT, as measured by the time taken to initiate a trial (Figure 2(e)). Venlafaxine (one animal excluded from study due to illness) had no effects on any variable measured in the PRLT. Reboxetine but not citalopram or venlafaxine also reduced the number of trials completed in a session (Friedman test, χ²(3) = 18.0, p = 0.0004, Supplemental Figure S5).

Figure 2. Effects of delayed antidepressant administration in the PRLT. (a) Rule changes completed within a session. (b) Trial at which animals first met the criterion for a rule change. (c) Win-stay probability. (d) Lose-shift probability. (e) Initiation reaction time. Dotted lines indicate separate drug studies.

Effects of rapid-acting antidepressants

Both ketamine (RM-ANOVA, F(3,33) = 5.697, p = 0.003) and scopolamine (RM-ANOVA, F(2,22) = 16.23, p < 0.0001) reduced RL performance as measured by the number of rule changes completed in a session (Figure 3(a)). Neither drug had any effect on the trial at which the first rule change was achieved (Figure 3(b)); however, both ketamine (RM-ANOVA, F(1.97,19.66) = 3.928, p = 0.037) and scopolamine (RM-ANOVA, F(2,14) = 8.36, p = 0.004) decreased win-stay behaviour (Figure 3(c)). As seen with the conventional antidepressants, neither rapid-acting antidepressant tested had any effect on lose-shift behaviour (Figure 3(d)). Consistent with citalopram and reboxetine, both ketamine (RM-ANOVA, F(1.08,9.68) = 7.36, p = 0.021) and scopolamine (Friedman test, χ²(2) = 7.75, p = 0.021) also decreased motivation as measured by the time taken to initiate a trial (Figure 3(e)). Both ketamine (Friedman test, χ²(3) = 19.08, p = 0.0003, Supplemental Figure S5) and scopolamine (Friedman test, χ²(2) = 14.6, p = 0.0007, Supplemental Figure S5) additionally reduced trials completed by animals within a session.

Figure 3. Effects of rapid-onset antidepressant administration in the PRLT. (a) Rule changes completed within a session. (b) Trial at which animals first met the criterion for a rule change. (c) Win-stay probability. (d) Lose-shift probability. (e) Initiation reaction time. Dotted lines indicate separate drug studies.

Effects of antidepressant treatment on true or misleading feedback

Recent evidence has suggested that pharmacological treatment can differentially affect the way animals respond to probabilistic rewards depending on whether they agree or disagree with the animals’ expectation of task feedback (Drozd et al., 2018; Rychlik et al., 2017). This has been described as corresponding to either true or misleading feedback. We, therefore, re-analysed win-stay and lose-shift data to observe if acute antidepressant treatment in the PRLT differentially affects responses to the different feedback types. There was an inconsistent response to feedback type between experiments with the only difference found for the win-stay responses between true and misleading positive feedback for the venlafaxine study (two-way ANOVA, F(1,10) = 8.00, p = 0.018, Figure 4(c)). There was a more consistent difference in lose-shift probability between true and misleading negative feedback with venlafaxine (two-way ANOVA, F(1,10) = 32.72, p = 0.0002, Figure 4(d)), reboxetine (two-way ANOVA, F(1,10) = 19.81, p = 0.001, Figure 4(f)) and ketamine studies (two-way ANOVA, F(1,9) = 8.72, p = 0.016, Figure 4(h)) showing a decreased lose-shift probability following misleading feedback compared to true feedback. An interaction between ketamine treatment and feedback type was also observed for animals’ lose-shift response to true and misleading feedback (two-way ANOVA, F(3,27) = 3.565, p = 0.027, Figure 4(h)). Further analysis revealed no effect of ketamine treatment on true lose-shift behaviour but a trend towards decreased sensitivity to NFS following misleading feedback emerged (RM-ANOVA, F(3,27) = 2.69, p = 0.066). Across all studies, no main effects of drug treatment were observed, although trends were observed towards decreased win-stay behaviour following reboxetine (two-way ANOVA, F(3,30) = 2.67, p = 0.066, Figure 4(e)) and scopolamine (two-way ANOVA, F(2,14) = 3.17, p = 0.073, Figure 4(i)) treatment alongside a trend towards decreased lose-shift behaviour following citalopram treatment (two-way ANOVA, F(3,33) = 2.62, p = 0.067, Figure 4(b)).

Figure 4. Effects of antidepressant treatment upon true versus misleading feedback sensitivity. Responses were divided as to whether they met the underlying rule of the task (true feedback) or clashed with the underlying rule (misleading feedback) the animal was required to learn at the time. (a) Citalopram: Win-stay probability. (b) Citalopram: Lose-shift probability. (c) Venlafaxine: Win-stay probability. (d) Venlafaxine: Lose-shift probability. (e) Reboxetine: Win-stay probability. (f) Reboxetine: Lose-shift probability. (g) Ketamine: Win-stay probability. (h) Ketamine: Lose-shift probability. (i) Scopolamine: Win-stay probability. (j) Scopolamine: Lose-shift probability. A bracket and star(s) over multiple bars indicates a main effect of treatment within a feedback type (e.g. misleading lose-shift) for a single drug study. This was only assessed when a significant interaction between treatment and feedback type was observed in the two-way ANOVA for both feedback types combined.

Q-learn modelling of antidepressant treatment in the PRLT

The Q-learning model was first used to analyse training data where the parameter learning rate followed the same relationship as the behavioural outcome rule changes, a measure of overall RL performance (Figure 5(a)). Theoretical accuracy, the accuracy of animals compared to a model-predicted optimal strategy, also followed a close relationship with the overall accuracy of animals in the task (Figure 5(b)), with animals always following a close but non-optimal strategy compared to ideal.

Acute dose–response study data were analysed using the Qlearn1 model due to it being the better fitting model. Both reboxetine (RM-ANOVA, F(3,30) = 3.14, p = 0.04, Figure 5(c)) and scopolamine (RM-ANOVA, F(1.247,8.728) = 8.77, p = 0.013) decreased theoretical accuracy, the degree to which rats’ behaviour deviated from an optimal choice strategy. Ketamine (RM-ANOVA, F(1.31,13.10) = 7.41, p = 0.013, Figure 5(d)) and scopolamine (RM-ANOVA, F(2,14) = 12.36, p = 0.0008) both decreased the learning rate, the degree to which new evidence is used to make decisions as opposed to previously stored information. Softmax β, a measure of how deterministic animals’ stimulus choices are, was increased by citalopram treatment (RM-ANOVA, F(3,33) = 7.24, p = 0.0007, Figure 5(e)). Conversely, stimulus selection was more random (lower β) when animals were treated with reboxetine (RM-ANOVA, F(3,30) = 4.13, p = 0.014).

Figure 5. Reinforcement learning modelling of PRLT behaviour. (a) Correlation between rule changes and model-free learning rate over the first 18 sessions of training in the PRLT. (b) Correlation between absolute accuracy and accuracy compared to a model-predicted perfect strategy in the first 18 sessions of training. (c) Theoretical accuracy. (d) Model-free learning rate. (e) β, the inverse softmax temperature.

Discussion

Conventional antidepressants

Citalopram was the only conventional antidepressant tested exhibiting behavioural effects consistent with its antidepressant role. Treated animals required fewer trials to learn the first probabilistic rule, exhibited increased PFS and deterministic stimulus selection but did not show any increase in rule changes. This lack of effect upon rule changes matches escitalopram data from Drozd et al. (2018) who found no effect upon any parameter measured. However, Bari et al. (2010) observed that acute 5-mg/kg citalopram administration increased rule changes and decreased NFS, while 10-mg/kg chronic administration also increased PFS. Differences in results between the current study, Drozd et al. (2018) and Bari et al. (2010) may be related to animals’ level of performance prior to drug study experiments, with animals in this study and Drozd et al. (2018) performing roughly triple the baseline rule changes compared to Bari et al. (2010).

Reboxetine decreased RL performance and PFS. Noradrenaline has been observed to support choice variability with clonidine-treated monkeys, a manipulation reducing central nervous system (CNS) noradrenaline levels, displaying decreased choice variability in a sequential cost/benefit decision-making task (Jahn et al., 2018). Reboxetine-treated animals in the present study had a decreased β parameter value, a proxy of increased choice variability. In the PRLT, too high a choice variability could impair the ability of animals to persevere at a stimulus for long enough to reverse (Delgado et al., 2011). However, in a fixed reward probability reversal learning task, rats administered with the NET-biased tricyclic antidepressant (TCA) desipramine showed increased reversal learning performance (Seu and Jentsch, 2009). Potentially, increased noradrenaline is detrimental to performance where rules are uncertain but beneficial when the reward contingencies are deterministic.

Consistent with results from Drozd et al. (2018), acute venlafaxine treatment did not change RL or feedback sensitivity. The lack of effect observed in the present study may be because of the mixed serotonergic and noradrenergic transporter affinities of the drug whereby an RL impairment caused by enhanced noradrenergic transmission is balanced by increased serotonergic signalling improving RL ability.

In the ABT, citalopram, venlafaxine and reboxetine all positively bias the valuation of reward during new learning over multiple days (Stuart et al., 2013). However, unless they are dosed chronically, acute conventional antidepressants do not positively bias the interpretation of an ambiguous cue in the JBT (Hales et al., 2017). These results combined with the observations in the present study suggest that conventional antidepressants do not alter absolute RL, rather requiring at least overnight integration of memories to have an effect.

Rapid-acting antidepressants

Both ketamine and scopolamine impaired RL, PFS and motivation. Both drugs have sedative effects at higher doses, a potential cause of their motivational impairments, although it has been observed that motivation and RL are dissociable within the PRLT (Roberts et al., 2019). Rychlik et al. (2017) argued that attenuating animals’ sensitivity to misleading negative feedback may be a mechanism for ketamine’s therapeutic action. We also observed an interaction between ketamine treatment and feedback type with a trend towards decreased misleading NFS. However, interpreting the significance of this finding is difficult when compared with the marked impairments ketamine had upon overall probabilistic learning and PFS. No such interaction was seen in animals treated with scopolamine.

Other studies have also suggested ketamine impairs reward processing. Administration of ketamine to both rats (10 mg/kg) and humans (0.5 mg/kg) has been found to reduce reward anticipation responses in the ventral striatum to either money or food (Francois et al., 2016). In addition, it has been observed that ketamine (32 mg/kg) impairs rats’ ability to assign a motivational value to a reward-predicting cue (Fitzpatrick and Morrow, 2017). Scopolamine’s ability to impair RL has been observed in other tasks, with an acute 0.17-mg/kg dose increasing error rate upon rule reversal in a non-probabilistic reversal learning task in mice (Pelsőczi and Lévay, 2017).

In similar low doses to those tested here, work in our laboratory has previously observed that ketamine has a robust effect on ambiguous cue interpretation and retrieval of memory biases in the ABT at 1 mg/kg (Hales et al., 2017; Stuart et al., 2015). These results combined with observations from the present study seem to suggest that while ketamine administration impairs RL, in assays of affective bias, it modifies biases during retrieval and positively biases interpretation of ambiguous cues. With regard to scopolamine, one could tentatively conclude that scopolamine appears to impair RL; however, more research is needed.

Reinforcement learning model

When fitting behavioural data with the Q-learn model, we observed that the single learning rate Q-learning model fits best, irrespective of drug treatment. This is in contrast to previous PRLT studies in rodents where a dual learning rate model was the best fitting (Alsiö et al., 2019; Noworyta-Sokolowska et al., 2019). This also differs from human data where a dual rate model is again better fitted (Grogan et al., 2017). One possibility for the difference in best fitting model might be due to subtleties in the training for the task resulting in animals within different studies using different strategies to perform the task. The fact that theoretical accuracy was consistently higher than actual task accuracy suggests that there are factors underlying animal performance that are not accounted for in the Q-learning model tested here. Potential solutions to this could include utilising other models such as those employing choice stickiness (Alsiö et al., 2019) or fictitious updating of the non-chosen option (Noworyta-Sokolowska et al., 2019).

Citalopram increased the β parameter, implying choices were more deterministic, while reboxetine led to animals making more random choices. Citalopram’s ability to increase deterministic choice selection is interesting in the context of citalopram acutely increasing anxiety (Urban et al., 2016) and suggests that animals were better able to form and execute strategy to perform the task. Reboxetine and scopolamine both impaired theoretical accuracy, while reboxetine also decreased the β parameter; these impairments are consistent with their effects upon rule changes. Ketamine and scopolamine decreased the learning rate with this effect again mapping onto their impairments upon rule change performance. One recent study found that ketamine administration in humans performing a PRLT containing a risk-based element caused no change in the learning rate but impaired ability to follow an optimal reward strategy (similar to theoretical accuracy, Vinckier et al., 2016). Surprisingly, it has been observed that learning rate is negatively correlated with reward sensitivity and that patients with MDD or high anhedonia have decreased reward sensitivity but no change in the learning rate (Huys et al., 2013). This would suggest that for drugs to have antidepressant efficacy, they cannot increase both the learning rate and reward sensitivity.

this task. This suggests that modulation of RL and feedback sensitivity, as measured in this task, does not appear to be a key
This would mean that the ketamine-mediated decrease mechanism for their therapeutic efficacy. in both the learning rate and PFS is more likely due to a general impairment of cognitive functioning as opposed to a specific Acknowledgements effect on RL. The authors would like to thank Julia Bartlett for assistance in performing experimental procedures and Claire Hales for help with MATLAB analy- sis and computational modelling. Comparison with human literature Human PRLT studies have observed that escitalopram and citalo- Declaration of conflicting interests pram increase both errors made to criterion while achieving the The author(s) declared no potential conflicts of interest with respect to first reversal and misleading NFS (Chamberlain et al., 2006; the research, authorship, and/or publication of this article. Skandali et al., 2018). Tryptophan supplementation, a manipula- tion increasing synaptic serotonin levels, has been also found to Funding have no effect upon reversal learning errors in the PRLT The author(s) disclosed receipt of the following financial support for the (Thirkettle et al., 2019). Interestingly, the SSRI paroxetine has research, authorship, and/or publication of this article: The funding for been found to attenuate the bias that depressed patients have this study was provided by the BBSRC SWBio DTP PhD programme towards preferentially learning from negative feedback com- (grant numbers: BB/J014400/1 and BB/M009122/1) awarded to M.P.W. pared to positive feedback (Moustafa et al., 2013). Atomoxetine, E.S.J.R. has received research funding from Boehringer Ingelheim, Eli a noradrenaline reuptake inhibitor, has also been observed in the Lilly, Pfizer, Small Pharma Ltd. and MSD, but these companies were not human PRLT to have little effect on both misleading negative associated with the data presented in this manuscript. M.P.W. 
has received feedback and errors to criterion (Chamberlain et al., 2006; a simi- funding from UCB unrelated to this research. lar measure to trials to first rule change), similar to the results seen in rodents. There is a lack of human studies examining any ORCID iD of the other drugs tested in the present study. Matthew P. Wilkinson https://orcid.org/0000-0002-6327-2949 Although the translatability of the PRLT is one of its key strengths, there are noticeable differences between how humans and rodents complete the task. Aside from the previously dis- Supplemental material cussed differences in model fitting between species, there are Supplemental material for this article is available online. also marked differences in baseline feedback sensitivity. In response to misleading negative feedback, humans rarely switch References to the opposite stimulus (p(lose-switch) = 0.05, Skandali et al., Alsiö J, Phillips BU, Sala-bayo J, et al. (2019) Dopamine D2-like recep- 2018) compared to rats in this study (p(lose-switch) ≈ 0.65). tor stimulation blocks negative feedback in visual and spatial rever- Humans are also able to dissociate between true and misleading sal learning in the rat: Behavioural and computational evidence. positive feedback (win-stay following misleading positive feed- Psychopharmacology 236(8): 2307–2323. back: p(win-stay) = 0.01 and true positive feedback: p(win- Bari A, Theobald DE, Caprioli D, et al. (2010) Serotonin modulates stay) = 0.86), while rodents in this study were not able to do this. sensitivity to reward and negative feedback in a probabilistic This implies that rodents and humans are potentially using differ- reversal learning task in rats. Neuropsychopharmacology 35(6): ent strategies to complete the task, meaning interpretation of the 1290–1301. results between species might not be straightforward. Chamberlain SR, Muller U, Blackwell AA, et al. 
(2006) Neurochemi- cal modulation of response inhibition and probabilistic learning in humans. Science 311(5762): 861–864. Summary Delgado MR, Phelps EA and Robbins TW (2011) Decision Making, Affect, and Learning: Attention and Performance XXIII. Oxford: These data suggest that within this PRLT, antidepressant treat- Oxford University Press. ment does not consistently modify RL or feedback sensitivity in Der-Avakian A, Barnes S, Markou A, et al. (2015) Translational assess- a manner congruent with the drugs’ antidepressant efficacy or ment of reward and motivational deficits in psychiatric disorders. time course of effects. Citalopram’s ability to improve RL and In: Robbins TW and Sahakian BJ (eds) Translational Neuropsycho- modify PFS in the face of other drugs causing general impair- Pharmacology. Current Topics in Behavioural Neuroscience, vol. 28. ments suggests that the task is potentially sensitive to manipula- pp. 231–262. tions of serotonergic neurotransmission. These results also add Der-Avakian A, D’Souza MS, Pizzagalli DA, et al. (2013) Assessment of reward responsiveness in the response bias probabilistic reward task further weight that motivation and RL are dissociable within the in rats: Implications for cross-species translational research. Trans- PRLT due to the ability of citalopram to improve RL but impair lational Psychiatry 3(8): e297. motivation. Unlike tasks which measure affective biases, this Drozd R, Rychlik M, Fijalkowska A, et al. (2018) Effects of cognitive PRLT did not reveal any differences between the rapid-acting and judgement bias and acute antidepressant treatment on sensitivity to conventional antidepressants which could be related to the tem- feedback and cognitive flexibility in the rat version of the proba- poral differences in their clinical benefits. Evidence from rein- bilistic reversal-learning test. Behavioural Brain Research 359: forcement learning model analysis also suggested that none of 619–629. 
the drugs improved innate reward processing parameters. All the Fitzpatrick CJ and Morrow JD (2017) Subanesthetic ketamine decreases drugs tested are effective antidepressants but, except for citalo- the incentive-motivational value of reward-related cues. Journal of pram, either caused general impairments in RL or had no effect in Psychopharmacology 31(1): 67–74. Wilkinson et al. 11 Francois J, Grimm O, Schwarz AJ, et al. (2016) Ketamine suppresses Roberts BZ, Young JW, He YV, et al. (2019) Oxytocin improves prob- the ventral striatal response to reward anticipation: A cross-species abilistic reversal learning but not effortful motivation in Brown translational neuroimaging study. Neuropsychopharmacology 41(5): Norway rats. Neuropharmacology 150: 15–26. 1386–1394. Robinson ESJ and Roiser JP (2016) Affective biases in humans and Grogan JP, Tsivos D, Smith L, et al. (2017) Effects of dopamine on rein- animals. In: Robbins TW and Sahakian BJ (eds) Translational Neu- forcement learning and consolidation in Parkinson’ s disease. Elife ropsychopharmacology. Cham: Springer, pp. 263–286. 6: e26801. Rychlik M, Bollen E and Rygula R (2017) Ketamine decreases sensitiv- Hales CA, Houghton CJ and Robinson ESJ (2017) Behavioural and com- ity of male rats to misleading negative feedback in a probabilistic putational methods reveal differential effects for how delayed and reversal-learning task. Psychopharmacology 234(4): 613–620. rapid onset antidepressants effect decision making in rats. European Schwarz G (1978) Estimating the dimension of a model. The Annals of Neuropsychopharmacology 27(12): 1268–1280. Statistics 6(2): 461–464. Hales CA, Stuart SA, Anderson MH, et al. (2014) Modelling cognitive Seu E and Jentsch D (2009) Effect of acute and repeated treatment with affective biases in major depressive disorder using rodents. British desipramine or methylphenidate on serial reversal learning in rats. Journal of Pharmacology 171(20): 4524–4538. 
Neuropharmacology 57(7–8): 665–672. Harmer CJ, O’Sullivan U, Favaron E, et al. (2009) Effect of acute Skandali N, Rowe JB, Voon V, et al. (2018) Dissociable effects of acute antidepressant administration on negative affective bias in SSRI (escitalopram) on executive, learning and emotional functions depressed patients. American Journal of Psychiatry 166(10): in healthy humans. Neuropsychopharmacology 43(13): 2645–2651. 1178–1184. Slaney CL, Hales CA and Robinson ESJ (2018) Rat models of reward Horner AE, Heath CJ, Hvoslef-Eide M, et al. (2013) The touchscreen deficits in psychiatric disorders. Current Opinion in Behavioral Sci- operant platform for testing learning and memory in rats and mice. ences 22: 136–142. Nature Protocols 8(10): 1961–1984. Stuart SA and Robinson ESJ (2015) Reducing the stress of drug adminis- Huys QJ, Pizzagalli DA, Bogdan R, et al. (2013) Mapping anhedonia tration: Implications for the 3Rs. Scientific Reports 5: 14288. onto reinforcement learning: A behavioural meta-analysis. Biology Stuart SA, Butler P, Munafò MR, et al. (2013) A translational rodent of Mood & Anxiety Disorders 3(1): 12. assay of affective biases in depression and antidepressant therapy. Jahn CI, Gilardeau S, Varazzani C, et al. (2018) Dual contributions of Neuropsychopharmacology 38(9): 1625–1635. noradrenaline to behavioural flexibility and motivation. Psychophar- Stuart SA, Butler P, Munafò MR, et al. (2015) Distinct neuropsychologi- macology 235(9): 2687–2702. cal mechanisms may explain delayed- versus rapid-onset antidepres- Jones DNC and Higgins GA (1995) Effect of scopolamine on visual sant efficacy. Neuropsychopharmacology 40(9): 2165–2174. attention in rats. Psychopharmacology 120(2): 142–149. Taylor Tavares JV, Clark L, Furey ML, et al. (2008) Neural basis of Lewis LR, Benn A, Dwyer DM, et al. (2019) Affective biases and their abnormal response to negative feedback in unmedicated mood disor- interaction with other reward-related deficits in rodent models of ders. 
NeuroImage 42(3): 1118–1126. psychiatric disorders. Behavioural Brain Research 372: 112051. Thirkettle M, Barker L, Gallagher T, et al. (2019) Dissociable effects Moustafa AA, Petrides G, Abdellatif SM, et al. (2013) Learning from of tryptophan supplementation on negative feedback sensitivity and negative feedback in patients with major depressive disorder is reversal learning. Frontiers in Behavioral Neuroscience 13: 127. attenuated by SSRI antidepressants. Frontiers in Integrative Neuro- Urban DJ, Zhu H, Marcinkiewcz CA, et al. (2016) Elucidation of science 7: 67. the behavioral program and neuronal network encoded by dorsal Murphy FC, Michael A, Robbins TW, et al. (2003) Neuropsychological raphe serotonergic neurons. Neuropsychopharmacology 41(5): impairment in patients with major depressive disorder: The effects 1404–1415. of feedback on task performance. Psychological Medicine 33(3): Vinckier F, Gaillard R, Palminteri S, et al. (2016) Confidence and 455–467. psychosis: A neuro-computational account of contingency learn- Noworyta-Sokolowska K, Kozub A, Jablonska J, et al. (2019) Sensitivity ing disruption by NMDA blockade. Molecular Psychiatry 21(7): to negative and positive feedback as a stable and enduring behav- 946–955. ioural trait in rats. Psychopharmacology 236(8): 2389–2403. Vrieze E, Pizzagalli DA, Demyttenaere K, et al. (2013) Reduced reward Pechtel P, Dutra SJ, Goetz EL, et al. (2013) Blunted reward responsive- learning predicts outcome in major depressive disorder. Biological ness in remitted depression. Journal of Psychiatric Research 47(12): Psychiatry 73(7): 639–645. 1864–1869. Watkins CJCH and Dayan P (1992) Technical note: Q-learning. Machine Pelsőczi P and Lévay G (2017) Effect of scopolamine on mice motor Learning 8: 279–292. activity, lick behavior and reversal learning in the intellicage. Neuro- Willner P (2017) The chronic mild stress (CMS) model of depression: chemical Research 42(12): 3597–3602. History, evaluation and usage. 
Neurobiology of Stress 6: 78–93. Pizzagalli DA, Losifescu D, Hallet LA, et al. (2008) Reduced hedonic Yang C, Shirayama Y, Zhang J, et al. (2015) R-ketamine: A rapid-onset capacity in major depressive disorder: Evidence from a probabilistic and sustained antidepressant without psychotomimetic side effects. reward task. Journal of Psychiatric Research 43(1): 76–87. Translational Psychiatry 5: e632.
Brain and Neuroscience Advances – SAGE
Published: Feb 23, 2020
Keywords: Antidepressants; major depressive disorder; probabilistic reversal learning; ketamine; cognitive flexibility; Q-learning; feedback sensitivity; citalopram