DE GRUYTER Current Directions in Biomedical Engineering 2022;8(1): 21-24

Jacqueline Ritter, Lennart Karstensen*, Jens Langejürgen, Johannes Hatzl, Franziska Mathis-Ullrich, and Christian Uhl

Quality-dependent Deep Learning for Safe Autonomous Guidewire Navigation

https://doi.org/10.1515/cdbme-2022-0006

*Corresponding author: Lennart Karstensen, Fraunhofer IPA, Theodor-Kutzer-Ufer 1-3, 68167 Mannheim, Germany, e-mail: lennart.karstensen@ipa.fraunhofer.de
Jacqueline Ritter, Jens Langejürgen, Fraunhofer IPA, Mannheim, Germany
Johannes Hatzl, Christian Uhl, Department of Vascular and Endovascular Surgery, University Hospital Heidelberg, Heidelberg, Germany
Lennart Karstensen, Franziska Mathis-Ullrich, Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Karlsruhe, Germany

Open Access. © 2022 The Author(s), published by De Gruyter. This work is licensed under the Creative Commons Attribution 4.0 International License.

Abstract: Cardiovascular diseases are the main cause of death worldwide. State-of-the-art treatment often includes navigating endovascular instruments through the vasculature. Automation of the procedure has received much attention lately and may increase treatment quality and unburden clinicians. However, in order to ensure the patient's safety, the endovascular device needs to be steered carefully through the body. In this work, we present a collection of medical criteria that are considered by physicians during an intervention and that can be evaluated automatically, enabling a highly objective assessment. Additionally, we trained an autonomous controller with deep reinforcement learning to gently navigate within a 2D simulation of an aortic arch. Among others, the controller reduced the maximum and mean contact force applied to the vessel walls by 43% and 29%, respectively.

Keywords: deep reinforcement learning, safety, guidewire navigation, autonomous, machine learning

1 Introduction

One third of all deaths worldwide are caused by cardiovascular diseases. These include in particular the common maladies of heart attack and stroke, which are often treated with endovascular therapy. During an endovascular intervention, a catheter and a guidewire are inserted into an access vessel and steered through the vasculature until the target is reached. In order to navigate the endovascular device, it is rotated and translated at the proximal end. During the procedure the surgeons are visually guided by 2D fluoroscopy images that show the position of the guidewire within the human vasculature.

Fig. 1: Aortic arch model with centerlines showing the target, insertion point, the guidewire, and the sensitive tissue that leads to the heart valve.

Automation of this intervention has attracted much attention recently as it might improve patient safety and operation efficiency, reduce complications for the patients, and decrease fatigue and radiation exposure of the clinicians [1–3].

In order to assess the quality of a minimally invasive endovascular procedure, experienced surgeons use structured grading scales such as the global rating scale (GRS) to ensure objective evaluation. The GRS rates the four main aspects of a procedure on a scale from one to five, i.e. the flow of operation, instrument handling, time and motion, and respect for tissue [4]. If the procedure is executed on a simulator, automatically measured metrics such as procedure time or the contrast volume can be used to evaluate the quality [5]. Rafii-Tari et al. [6] developed a framework to measure the catheter-tissue contact forces as well as operator motion patterns. Additionally, they indicate that a low standard deviation of the translation speed suggests smooth and controlled navigation behavior. In comparison to novices, experienced surgeons achieve a reduced number of translational guidewire movements, smoother motion in general, a shorter total path length of the device tip, and apply less torque and force to the device, which suggests that those criteria are an indicator for higher quality [7], [8].

In research regarding autonomous control of endovascular guidewires, navigation quality is considered only to a limited extent. Zhao et al. [1] train a CNN-based controller with image and force measurements as input. The force data is used to detect a collision between the device tip and obstacles such as plaque on the vessel wall. As a result, the controller then executes an avoiding action, i.e. pulling back the guidewire and rotating it by an angle. The system is trained on three different rigid models and tested on a separate one. The evaluation metrics comprise the target reached, the operating force, and the navigation time. Chi et al. [2] navigate a guidewire through a flexible aortic arch model by training a neural network with imitation and reinforcement learning. Electromagnetic tracking data of the device tip serve as input. The performance is evaluated on two different models and compared to the navigation of an experienced surgeon. The evaluation criteria comprise the path length, the mean and maximal force applied to the vessel walls, and whether the target was reached. Karstensen et al. [3] propose a controller that is trained in a simulation environment and evaluated in an ex-vivo specimen. The coordinates of the guidewire tip are used as input to train the neural network with deep reinforcement learning. Navigation quality is not considered; instead, the controller is evaluated on the number of randomly distributed targets that can be reached within a fixed time span.

The aforementioned works mostly focus on whether the desired target point was reached and do not optimize for safe navigation. However, in order to keep the risk of damage as low as possible, additional criteria need to be taken into consideration. In this work, we propose a collection of criteria that ensures the patient's safety during endovascular guidewire navigation. They can be measured automatically in a simulation environment, resulting in a highly objective assessment of the quality. Additionally, we train an autonomous reinforcement learning controller that is able to reach arbitrary target points within a fixed aortic arch geometry and adheres to our criteria.

2 Methods

2.1 Gentle Navigation

The quality of the navigation process is evaluated with respect to the following criteria:
∘ Mean contact force applied to surrounding tissue.
∘ Maximum contact force applied to surrounding tissue.
∘ Total path length covered by device tip.
∘ Standard deviation of translation velocities.
∘ Number and distance of withdrawals.
∘ Total navigation time.
∘ Forward motion of device tip.
∘ Distance between tip and vessel centerlines.

Forward motion is defined as

$$\text{forward motion} = \frac{\Delta pos_{tip}}{v_{trans} \cdot \Delta t} \tag{1}$$

where $\Delta t$ denotes the duration of one control step, $v_{trans}$ represents the translation velocity, and $\Delta pos_{tip}$ defines the distance between the position of the device tip in the current and the previous control step. If the device tip moves freely, it performs the same translation movement as the manipulation at its base and the value is close to one. Larger values are an indication that forces and torques are built up in the instrument, since the movement at the base is converted into deformation of the instrument instead of a movement of the tip. If the force and torque are released, the instrument snaps back into its rest shape, which increases the risk of perforation.

In order to avoid inaccuracies when computing the distance between the tip and the centerline, we assume that the centerline points are densely sampled. For each control step the distance between the device tip and the closest point on the centerline is computed.

Adhering to the defined criteria ensures a high quality navigation process that minimizes the risk of complications for the patient. Furthermore, all criteria can be evaluated automatically, which makes the evaluation of the procedure highly objective and efficient.

2.2 Aortic Arch Environment

Training and evaluation of the controller are carried out in a 2D simulation environment. The shape of the aortic arch environment and the guidewire are shown in Fig. 1. The guidewire is modeled as a multibody system, in which the elements are connected by damped linear and damped rotational springs. The number of elements varies according to the total length of the guidewire. It is navigated through the aortic arch by translation and rotation at its base. The simulation itself is based on pymunk, a 2D rigid body physics library [9].

2.3 Autonomous Controller

The design of the neural network controller is based on the work of Karstensen et al. [3]. However, we enhance the feedforward architecture using the deep reinforcement learning algorithm soft actor critic (SAC), which is less sensitive to the correct choice of hyperparameters [10]. The output of the actor network consists of the mean and the standard deviation of a normal distribution for every dimension in the action space. The actions, i.e. translation and rotation velocities, are sampled from the corresponding distribution. The observations that serve as input to the neural network controller are defined as the current and last position of the guidewire tip, the action between them, as well as the target position. The targets are randomly distributed within the aortic arch excluding the aorta.

During training the navigation task of steering the guidewire from a starting point to a target is executed for $2 \cdot 10^{6}$ steps. One navigation task is called an episode. An episode is completed either if the target point is reached within a threshold radius of 5 mm or if 200 simulation steps have been performed, which corresponds to 27 seconds in a similar real world scenario.

Two different controllers were trained. The baseline controller is trained with an extended version of the sparse reward proposed in [3], where additionally the pathlength difference between two steps is taken into account. Pathlength is the distance between device tip and target along the centerlines. The reward extension provides the controller with further information about the environment that speeds up the training process when using SAC. The reward that is used to train the baseline controller is denoted as

$$R_{base} = -0.005 - 0.001 \cdot \Delta pathlength + \begin{cases} 1.0 & \text{target reached} \\ 0.0 & \text{else} \end{cases}$$

In order to train a controller that adheres to the criteria stated in Section 2.1, the rewards $R_{force}$ and $R_{valve}$ are added to $R_{base}$. Note that since the criteria depend on each other, it is sufficient to consider a subset for the reward function.

$$R_{force} = -4.93 \cdot 10^{-7} \cdot force$$

$$R_{valve} = \begin{cases} -0.1 & \text{device touches heart valve} \\ 0.0 & \text{else} \end{cases}$$

Here, the reward $R_{force}$ punishes contact forces applied to the vessel wall. Note that its value is zero if no contact force is exerted. $R_{valve}$ penalizes any contact of the device with the part of the aortic arch that leads to the heart valve, which consists of particularly sensitive tissue. The area that models the sensitive tissue leading to the heart valve is highlighted in red in Fig. 1. The overall reward per step is denoted as $R_{quality}$ and combines the rewards described above.

$$R_{quality} = R_{base} + R_{force} + R_{valve}$$

2.4 Evaluation

Every $5 \cdot 10^{[\ldots]}$ steps the success rate of the controller is evaluated for 1000 consecutive episodes. During evaluation, the mean of each distribution is used as an action, thus rendering the controller deterministic. Especially in medical applications the decision of an autonomous system needs to be traceable in order to prevent the controller from executing unforeseen behavior. The success rate is defined as the percentage of evaluation episodes where the controller is able to reach the target.

For each reward set-up the state of the trained controller that reaches the highest success rate is additionally evaluated on the criteria from Section 2.1 for 1000 episodes. Per episode the total path length is divided by the length of the optimal path along the centerlines in order to allow for comparison. The metric is denoted by the path length ratio. Its value should ideally be close to 1.0. The distance of withdrawals, the forward motion of the device tip, and the distance between the tip and the centerlines are averaged over all obtained values. For all other criteria the final value of the episode is used. Note that the simulation focuses on optimizing realistic behavior rather than realistic forces.

3 Results

The learning curve for the two controllers is depicted in Fig. 2. After $1.25 \cdot 10^{6}$ steps the quality controller reaches 85% of the targets. It continues to slowly improve until it reaches a maximum of 95.6% after $1.90 \cdot 10^{6}$ steps. The success rate of the baseline controller rises at a faster rate and reaches 89% after $0.35 \cdot 10^{6}$ steps and a maximum of 96.3% after $1.95 \cdot 10^{6}$ exploration steps.

Fig. 2: Success rate of the baseline and quality controller during the training process.

The evaluation of the criteria from Section 2.1 is summarized in Tab. 1. The quality controller outperforms the baseline for most of the criteria. However, the baseline controller navigates closer to the centerlines, withdraws the guidewire less often and navigates quicker. Fig. 3 shows the trajectories of the two controllers in the aortic arch model aiming to reach the same target. The trajectories of both controllers have a similar length, while the baseline controller applies more contact force overall.

Tab. 1: Evaluation of the controllers on the collection of criteria.

Metric                        Baseline   Quality
Success rate                  96.3%      95.6%
Mean contact force [mN]       6.81       4.83
Max. contact force [mN]       27.84      15.83
Path length ratio             1.09       0.98
Std translation [mm/s]        2.10       1.58
Number withdrawals            5.20       13.59
Withdrawal distance [mm]      16.47      8.50
Total navigation time [s]     6.76       7.07
Forward motion                1.20       1.01
Distance centerlines [mm]     4.49       4.76

Fig. 3: Navigation trajectories from the insertion point to the same target for (a) the baseline controller and (b) the quality controller. Higher contact forces are indicated by a larger radius of the corresponding circle.

4 Discussion & Conclusion

The quality controller navigates the guidewire with a lower mean and maximum contact force, which decreases the risk of vessel damage for the patient. Fig. 3 shows the different behaviors when navigating to the same target. The reduced contact force can be explained by the low standard deviation of the translation velocity, resulting in a smoother and more controlled behavior. Additionally, when touching the wall and thus applying contact force, the quality controller withdraws the device, rotates it and advances it rather than rotating the device while it pushes against the wall. Hence, the guidewire applies less force overall, but requires an increased number of withdrawals and more time to reach the target. Consequently, the forward motion of the quality controller is closer to 1.0, corresponding to less build-up of torque and force, and therefore less snapping.

Despite the good results for the simplified 2D simulation environment, the controller is yet to be transferred to a more realistic 3D environment and the real world, which will increase the complexity of the task and require hyperparameter adjustment. Using our approach the controller learns to navigate the unique aortic arch it is trained on. However, it needs to be able to generalize to unseen aortic arch geometries in order to make real world application possible. Future work should address this problem, e.g., by incorporating recurrent neural networks.

In conclusion, we derived criteria for the quality of guidewire navigation and successfully trained a deep-learning-based controller in a 2D simulation to improve these criteria compared to the state of the art.

Author Statement

This project is funded by the Ministry of Economics, Labor and Tourism Baden-Württemberg within the framework of the Forum Gesundheitsstandort Baden-Württemberg.

References

[1] Zhao Y, Guo S, Wang Y, Cui J, Ma Y, Zeng Y, et al. A CNN-based prototype method of unstructured surgical state perception and navigation for an endovascular surgery robot. Med. Biol. Eng. Comput. 2019;57:1875-1887.
[2] Chi W, Liu J, Rafii-Tari H, Riga CV, Bicknell CD, Yang G. Learning-based endovascular navigation through the use of non-rigid registration for collaborative robotic catheterization. Int. J. Comput. Assist. Radiol. Surg. 2018;13:855-864.
[3] Karstensen L, Ritter J, Hatzl J, Pätz T, Langejürgen J, Uhl C, et al. Learning-Based Autonomous Vascular Guidewire Navigation without Human Demonstration in the Venous System of a Porcine Liver. Int. J. Comput. Assist. Radiol. Surg. Forthcoming 2022.
[4] Reznick R, MacRae H. Teaching surgical skills - changes in the wind. N. Engl. J. Med. 2006;25:2664-2669.
[5] Mazomenos EB, Chang P, Rolls A, Hawkes DJ, Bicknell CD, Vander Poorten E, et al. A survey on the current status and future challenges towards objective skills assessment in endovascular surgery. J. Med. Robot. Res. 2013;0:1-19.
[6] Rafii-Tari H, Payne CJ, Liu J, Riga CV, Bicknell CD, Yang G. Towards Automated Surgical Skill Evaluation of Endovascular Catheterization Tasks based on Force and Motion Signatures. IEEE Int. Conf. Robot. Autom. 2015;1789-1794.
[7] Rafii-Tari H, Payne CJ, Bicknell CD, Kwok K, Cheshire NJW, Riga CV, et al. Objective Assessment of Endovascular Navigation Skills with Force Sensing. Ann. Biomed. Eng. 2017;45:1315-1327.
[8] Rolls AE, Riga CV, Bicknell CD, Stoyanov DV, Shah CV, Van Herzeele I, et al. A Pilot Study of Video-motion Analysis in Endovascular Surgery: Development of Real-time Discriminatory Skill Metrics. Eur. J. Vasc. Endovasc. Surg. 2013;45:509-515.
[9] Blomqvist V. Pymunk: An easy-to-use pythonic rigid body 2d physics library (version 6.2.1). 2007.
[10] Haarnoja T, Zhou A, Hartikainen K, Tucker G, Ha S, Tan J, et al. Soft Actor-Critic Algorithms and Applications. arXiv:1812.05905 2019.
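Supplementary sketch: the forward motion metric of Eq. (1) in Section 2.1 and the reward terms of Section 2.3 can be written down in a few lines of Python. This is a minimal illustration, not the authors' implementation. The numeric constants are taken from the text; all function and parameter names are hypothetical. It assumes that the pathlength difference is the current minus the previous pathlength (so approaching the target yields a positive contribution), that the magnitude of the translation velocity is used in Eq. (1), and that steps without commanded translation are reported as undefined.

```python
import math

# Constants taken from the reward definitions in Section 2.3.
STEP_PENALTY = -0.005
PATHLENGTH_WEIGHT = -0.001
TARGET_BONUS = 1.0
FORCE_WEIGHT = -4.93e-7
VALVE_PENALTY = -0.1


def reward_base(delta_pathlength: float, target_reached: bool) -> float:
    """R_base: step penalty, pathlength term, and sparse target bonus.

    Assumption: delta_pathlength = pathlength(current) - pathlength(previous).
    """
    bonus = TARGET_BONUS if target_reached else 0.0
    return STEP_PENALTY + PATHLENGTH_WEIGHT * delta_pathlength + bonus


def reward_force(force: float) -> float:
    """R_force: penalty proportional to the contact force (zero without contact)."""
    return FORCE_WEIGHT * force


def reward_valve(touches_valve: bool) -> float:
    """R_valve: fixed penalty while the device touches the sensitive valve region."""
    return VALVE_PENALTY if touches_valve else 0.0


def reward_quality(delta_pathlength: float, target_reached: bool,
                   force: float, touches_valve: bool) -> float:
    """R_quality = R_base + R_force + R_valve."""
    return (reward_base(delta_pathlength, target_reached)
            + reward_force(force)
            + reward_valve(touches_valve))


def forward_motion(tip_prev, tip_curr, v_trans: float, dt: float) -> float:
    """Eq. (1): tip displacement divided by the commanded translation v_trans * dt.

    Assumption: the magnitude of v_trans is used, and a step without
    commanded translation returns NaN instead of dividing by zero.
    """
    commanded = abs(v_trans) * dt
    if commanded == 0.0:
        return math.nan
    return math.dist(tip_prev, tip_curr) / commanded


def relative_reduction(baseline: float, quality: float) -> float:
    """Fractional reduction of a metric relative to the baseline value."""
    return (baseline - quality) / baseline


if __name__ == "__main__":
    # Contact force values from Tab. 1 reproduce the reductions quoted in the abstract.
    print(round(100 * relative_reduction(27.84, 15.83)))  # 43
    print(round(100 * relative_reduction(6.81, 4.83)))    # 29
```

A freely moving tip, for example a displacement of 5 mm for a commanded translation of 5 mm, gives a forward motion of 1.0 in this sketch, matching the interpretation given in Section 2.1.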
Current Directions in Biomedical Engineering – de Gruyter
Published: Jul 1, 2022
Keywords: deep reinforcement learning; safety; guidewire navigation; autonomous; machine learning