Deep Reinforcement Learning for the Navigation of Neurovascular Catheters

Current Directions in Biomedical Engineering 2019;5(1):5-8
https://doi.org/10.1515/cdbme-2019-0002

Tobias Behr*, Tim Philipp Pusch, Marius Siegfarth, Dominik Hüsener, Tobias Mörschel and Lennart Karstensen

Abstract: Endovascular catheters are necessary for state-of-the-art treatments of life-threatening and time-critical diseases like strokes and heart attacks. Navigating them through the vascular tree is a highly challenging task. We present our preliminary results for the autonomous control of a guidewire through a vessel phantom with the help of Deep Reinforcement Learning. We trained Deep Q-Network (DQN) and Deep Deterministic Policy Gradient (DDPG) agents on a simulated vessel phantom and evaluated the training performance. We also investigated the effect of the two enhancements Hindsight Experience Replay (HER) and Human Demonstration (HD) on the training speed of our agents. The results show that the agents are capable of learning to navigate a guidewire from a random start point in the vessel phantom to a random goal. This is achieved with an average success rate of 86.5% for DQN and 89.6% for DDPG. The use of HER and HD significantly increases the training speed. The results are promising and future research should address more complex vessel phantoms and the use of a combination of guidewire and catheter.

Keywords: Deep Reinforcement Learning, Catheter, Guidewire, Endovascular, DQN, DDPG, Hindsight Experience Replay, Human Demonstration, Stroke, Heart Attack, Neurovascular

*Corresponding author: Tobias Behr, Fraunhofer IPA, Mannheim, Germany, e-mail: tobias.behr@ipa.fraunhofer.de
Tim Philipp Pusch, Marius Siegfarth, Dominik Hüsener, Tobias Mörschel and Lennart Karstensen: Fraunhofer IPA, Mannheim, Germany

Open Access. © 2019 Tobias Behr, Tim Philipp Pusch, Marius Siegfarth, Dominik Hüsener, Tobias Mörschel, Lennart Karstensen, published by De Gruyter. This work is licensed under the Creative Commons Attribution 4.0 License.

1 Introduction

Endovascular catheters are necessary for state-of-the-art treatments of life-threatening and time-critical diseases like strokes and heart attacks. Navigating a set of catheter and guidewire (henceforth simply referred to as catheter) through the vascular tree is a highly challenging task. The dynamics of catheters are difficult to predict or calculate and vascular trees have a high variance in shape. Therefore movements at the proximal end of the catheter translate into unexpected movements at the distal tip due to friction and elastic deformation of catheter and vessel wall. For this reason, conventional control approaches are hardly applicable for catheter navigation. Robotic assistance, to date, is limited to telemanipulation or making the catheter controllable by having a steerable or magnetic tip [1].

In order to overcome the current limitations of catheter control, we propose a closed-loop control system based on Deep Reinforcement Learning (DRL), as shown in Figure 1, for the autonomous navigation of standard passive catheters. The system, once fully developed, will use images from the angiography system to track the catheter position in real time. This position data will be fed into a control algorithm which was trained for the task of navigating catheters. The control algorithm will provide a robotic catheter manipulator with control commands which the manipulator will use to steer the catheter through the vasculature.

Figure 1: Closed loop for autonomous passive catheter navigation.

The angiography system is already the state of the art in modern catheter labs and robotic manipulators are the focus of several research groups. With R-One by Robocath or CorPath by Corindus, a few catheter manipulators are already commercially available. However, both catheter tracking and control algorithms remain to be developed and are the focus of our joint research project in cooperation with Fraunhofer MEVIS. While Fraunhofer MEVIS focuses on the tracking algorithm, we focus on the development of the control algorithm using DRL. This paper aims to discuss the feasibility of this approach by presenting the first results for a DRL-based control system using a passive guidewire in an idealized vascular phantom.
2 Materials and Methods

The experimental setup consists of a test bench, catheter tracking, the control algorithm and a catheter simulation for the generation of training data. Component interaction is implemented using OpenIGTLink [2].

2.1 Control Algorithm

The inputs of the control algorithm are the absolute coordinates of the guidewire tip as well as the target and the rotation of the manipulator. The outputs are either continuous for each degree of freedom (DOF) (rotation and translation) or discrete with eight different movement options. The discrete options represent every possible combination of the two DOF excluding the "no-movement-at-all" option.

The training of the control algorithm is achieved using the Deep Reinforcement Learning (DRL) approach [3]. With this approach, an agent interacts with an environment and learns how to improve its interactions by receiving rewards for its actions, as shown in Figure 2. At the beginning of the training process the agent is completely unaware of its task and of the environment. Its first actions are therefore performed randomly. The rewards the agent receives from the environment are used to optimize the neural net. By updating each neural net parameter with respect to its influence on the quality of the actions, the agent learns to perform good actions and ceases to perform bad ones. Gradually the actions improve and originate more and more from experience instead of randomness.

Figure 2: Training loop for an agent using a neural net trained with Deep Reinforcement Learning.

Table 1 lists the different kinds of rewards an agent can achieve throughout an episode of training. In our setup, the reward per interaction step is either sparse, meaning the agent simply receives -1 for every step and an additional +1 when the goal is reached, or dense, with -0.02 for every step, +0.02 multiplied by the distance from the tip of the catheter to the goal for every step, and an additional +1 when the target is reached. The distance is calculated as the normalized distance along the centreline.

Table 1: Calculation of rewards. Total rewards are calculated as the sum of the rewards of the individual steps.
Sparse reward: -1 for every interaction step; +1 once if the goal is reached.
Dense reward: -0.02 for every interaction step; +0.02 multiplied by the distance between catheter tip and goal, for every interaction step; +1 once if the goal is reached.

For continuous control, we use Deep Deterministic Policy Gradients (DDPG) [3], partially enhanced by Hindsight Experience Replay (HER) [4] and Human Demonstration (HD) [5]. Discrete control is achieved by a Deep Q-Network (DQN) [6] with the extensions of double Q-learning, dueling networks and prioritized experience replay [7]. The implementations by OpenAI Baselines [8] and RLlib [9] are used, respectively. The neural networks consist of three hidden layers of 256 neurons for DDPG and two hidden layers of 128 neurons for DQN.
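To make the action and reward definitions above concrete, the sketch below expresses the eight discrete movement options and the per-step rewards of Table 1 in Python. It is a minimal illustration rather than the authors' implementation: the names are ours, and the dense reward reproduces the sign convention of Table 1 as printed.

```python
from itertools import product

# Hypothetical enumeration of the discrete action space of Sec. 2.1: each DOF
# (translation, rotation) can move backward, stay, or move forward, and the
# "no-movement-at-all" combination is excluded, leaving 3 * 3 - 1 = 8 options.
DISCRETE_ACTIONS = [a for a in product((-1, 0, 1), repeat=2) if a != (0, 0)]
assert len(DISCRETE_ACTIONS) == 8


def sparse_reward(goal_reached: bool) -> float:
    """Sparse reward from Table 1: -1 per interaction step, +1 once on success."""
    return -1.0 + (1.0 if goal_reached else 0.0)


def dense_reward(goal_reached: bool, normalized_distance: float) -> float:
    """Dense reward, implemented literally as printed in Table 1: -0.02 per step,
    +0.02 times the normalized centreline distance between tip and goal per step,
    and +1 once when the goal is reached."""
    return -0.02 + 0.02 * normalized_distance + (1.0 if goal_reached else 0.0)
```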
2.2 Simulation Environment

The movement of the guidewire through the vascular tree is simulated using SOFA [10] with the beam adapter plugin [11]. The walls of the vascular tree are rigid and the friction coefficient between wall and guidewire is 0.1. The lumen is empty, thus no dynamic resistance to the guidewire motion is considered. The simulation routine consists of receiving the velocities for both DOFs, calculating the movement for the appropriate number of steps and sending the catheter position to the control algorithm.

2.3 Catheter Tracking

In the current experimental setup, the guidewire is monitored by a camera mounted above the vessel phantom. The images are processed in real time by a processing pipeline in MeVisLab (MeVis Medical Solutions AG, Bremen). To extract the catheter shape, the camera image is registered to a CT image of the phantom to mask the non-vessel parts of the image. In the vessel part, the guidewire is extracted via threshold segmentation, its centreline is computed and sent to the control algorithm.
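The tracking pipeline itself runs in MeVisLab; purely as an illustration of the steps described above (masking the vessel region obtained by CT registration, threshold segmentation of the guidewire, centreline extraction), a rough equivalent could be sketched with OpenCV and scikit-image. The threshold value, the function name and the assumption of a precomputed, CT-registered vessel mask are ours, not part of the original pipeline.

```python
import cv2
import numpy as np
from skimage.morphology import skeletonize


def extract_guidewire_centreline(camera_image: np.ndarray,
                                 vessel_mask: np.ndarray) -> np.ndarray:
    """Illustrative stand-in for the MeVisLab pipeline of Sec. 2.3.

    camera_image: BGR frame from the camera above the phantom.
    vessel_mask:  binary uint8 mask of the vessel lumen, assumed to be
                  precomputed from the CT registration step.
    Returns the (row, col) pixel coordinates of the guidewire centreline.
    """
    gray = cv2.cvtColor(camera_image, cv2.COLOR_BGR2GRAY)
    # Keep only the vessel region of the camera image.
    gray = cv2.bitwise_and(gray, gray, mask=vessel_mask)
    # Threshold segmentation of the dark guidewire; 60 is a placeholder value
    # that would have to be tuned on the real setup.
    _, wire = cv2.threshold(gray, 60, 255, cv2.THRESH_BINARY_INV)
    wire = cv2.bitwise_and(wire, wire, mask=vessel_mask)
    # Reduce the segmented wire to a one-pixel-wide centreline.
    skeleton = skeletonize(wire > 0)
    return np.column_stack(np.nonzero(skeleton))
```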
2.4 Test Bench

At this stage, we are evaluating the feasibility of DRL for catheter control using only one component of the catheter set, namely the guidewire. Our test bench therefore features an experimental robotic manipulator which can handle the two DOF of a guidewire. The maximum linear travel is 300 mm while the rotation is unlimited. The first idealized vessel phantom, as shown in Figure 3, is made of PMMA with a constant vessel diameter of 10 mm and has a bifurcation followed by a bi- and a trifurcation in one plane. Starting points and targets are chosen randomly. A camera is mounted above the phantom for tracking. The phantom is filled with water for better visibility and less wall friction.

Figure 3: Vessel phantom made of PMMA with a guidewire inserted.

2.5 Experiments

The aim of our experiments was to evaluate whether the use of DRL may be considered a promising approach for the autonomous navigation of passive catheters. We used the simulation environment described in Section 2.2 to train the neural nets and recorded their training performance. The nets were trained with the help of two DRL agents, and their performance was evaluated in an idealized vessel phantom as a first proof of concept. These agents were Deep Q-Networks (DQN) for discrete control commands and Deep Deterministic Policy Gradient (DDPG) for continuous ones. We performed two sets of experiments. The first set was carried out to find possible advantages of one DRL agent over the other. The second set was carried out to evaluate whether the enhancements Hindsight Experience Replay and Human Demonstration improve the training speed of the agents. Only DDPG is used on the test bench and in the second experiment set, because continuous control is preferred to discrete control and the first set shows little difference in the success rates.

3 Results

Figure 4: Success rate of the agents over training episodes.

Figure 4 shows that both agents are capable of learning to navigate a guidewire from a random start point in the vessel phantom to a random goal. This is achieved with an average success rate of 89.6% for DDPG + HER using the sparse reward and 86.5% for DQN + Rainbow using the dense reward. Training is considered finished when the success rate is higher than the average of all the following success rates for the first time; the success rate is measured every 1000 episodes (a minimal sketch of this criterion is given below). DDPG finishes training after about 11000 episodes and DQN after about 5000 episodes. The standard deviation over four training attempts is 4.4% for DDPG and 6.2% for DQN. Both agents use the standard hyperparameters given by OpenAI Baselines and RLlib. After transferring the trained DDPG model to the test bench, we reached the goal 7 out of 10 times. The failed attempts happened at junctions, where exact state data is necessary to navigate reliably. The simulation does not yet include noise and latency.
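The stopping criterion quoted above ("the success rate is higher than the average of all the following success rates for the first time", evaluated every 1000 episodes) can be read as the following post-hoc check on a recorded evaluation curve. This is one plausible interpretation, and the list of success rates in the example is invented purely for illustration.

```python
from statistics import mean
from typing import Optional, Sequence


def training_finished_index(success_rates: Sequence[float]) -> Optional[int]:
    """Return the index of the first evaluation (one per 1000 episodes) whose
    success rate exceeds the mean of all later evaluations, or None if the
    criterion is never met."""
    for i, rate in enumerate(success_rates[:-1]):
        if rate > mean(success_rates[i + 1:]):
            return i
    return None


# Hypothetical evaluation curve, one value per 1000 training episodes.
rates = [0.10, 0.35, 0.62, 0.80, 0.88, 0.86, 0.90, 0.89]
idx = training_finished_index(rates)
if idx is not None:
    print(f"training considered finished after ~{(idx + 1) * 1000} episodes")
```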
Figure 5: Comparison of the training speeds of plain Deep Deterministic Policy Gradient (DDPG), DDPG + Hindsight Experience Replay (HER) and DDPG + HER + Human Demonstration (HD).

Figure 5 shows a comparison of the training speeds of a DDPG model with and without the use of Hindsight Experience Replay (HER) and Human Demonstration (HD). Training for plain DDPG finished after 12.3 hours (15000 episodes) and achieved a success rate of 83.1%. Training of DDPG with HER and with HER + HD both finished after about 9.0 hours (11000 episodes) and achieved success rates of 89.6% and 87.9%, respectively. The use of DDPG + HER + HD led to a significantly faster increase of the success rate during the first hour of training than using just DDPG or DDPG + HER. Training 1000 episodes took 0.82 h on a computer equipped with an AMD Ryzen 7 2700X CPU, a GeForce RTX 2070 GPU and 48 GB of RAM.

4 Discussion

Both DQN and DDPG perform quite similarly in an idealized vessel phantom. DQN trains faster, while DDPG achieves more stable results and requires less domain-specific knowledge for the reward calculation. The results are overall promising given that no case-specific adaptations have yet been made to the baseline agent algorithms. The comparison of training speeds of different enhancements of DDPG suggests that the training speed can be increased significantly using Hindsight Experience Replay and Human Demonstration. Fast training is especially interesting for the future, for which we plan to have a pre-trained agent learn on patient-specific vasculature prior to an intervention. Today the image data of stroke patients often arrives at the Stroke Unit before the patient does, because the patient was first diagnosed in a more remote hospital. This data could be used to train the pre-trained agent on this patient's specific anatomy while waiting for him or her to arrive at the Stroke Unit.

To date, we have been using highly idealized vascular phantoms and just a guidewire to investigate the feasibility of our approach. The results show that baseline agents for DRL are able to achieve promising results in these idealized phantoms using similar informational content as exists in real interventions. Future research should address the limitations mentioned by adapting the agents and the setup to more realistic vascular geometries and the use of both guidewire and catheter.

Author Statement

This work was supported by the Fraunhofer Internal Programs under Grant No. WISA 833 967. The authors state no conflict of interest.

References

[1] H. Rafii-Tari, C. J. Payne, G.-Z. Yang, Current and emerging robot-assisted endovascular catheterization technologies: A review, Ann. Biomed. Eng., vol. 42, no. 4, pp. 697-715, 2014.
[2] J. Tokuda et al., OpenIGTLink: An open network protocol for image-guided therapy environment, Int. J. Med. Robot. Comput. Assist. Surg., vol. 5, no. 4, pp. 423-434, 2009, doi: 10.1002/rcs.274.
[3] T. Lillicrap et al., Continuous control with deep reinforcement learning, arXiv:1509.02971v2, 2015.
[4] M. Andrychowicz et al., Hindsight experience replay, Advances in Neural Information Processing Systems, pp. 5048-5058, 2017.
[5] A. Nair et al., Overcoming exploration in reinforcement learning with demonstrations, arXiv:1709.10089v2, 2018.
[6] V. Mnih et al., Playing Atari with deep reinforcement learning, arXiv:1312.5602, 2013.
[7] M. Hessel et al., Rainbow: Combining improvements in deep reinforcement learning, arXiv:1710.02298, 2017.
[8] OpenAI Baselines, https://github.com/openai/baselines
[9] RLlib, https://ray.readthedocs.io/en/latest/rllib.html
[10] F. Faure et al., SOFA: A multi-model framework for interactive physical simulation, in Soft Tissue Biomechanical Modeling for Computer Assisted Surgery, Springer, Berlin, Heidelberg, 2012, pp. 283-321.
[11] C. Duriez et al., New approaches to catheter navigation for interventional radiology simulation, Computer Aided Surgery, vol. 11, no. 6, pp. 300-3, 2006.

