Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

AlphaPilot: autonomous drone racing

AlphaPilot: autonomous drone racing This paper presents a novel system for autonomous, vision-based drone racing combining learned data abstraction, nonlinear filtering, and time-optimal trajectory planning. The system has successfully been deployed at the first autonomous drone racing world championship: the 2019 AlphaPilot Challenge. Contrary to traditional drone racing systems, which only detect the next gate, our approach makes use of any visible gate and takes advantage of multiple, simultaneous gate detections to compensate for drift in the state estimate and build a global map of the gates. The global map and drift-compensated state estimate allow the drone to navigate through the race course even when the gates are not immediately visible and further enable to plan a near time-optimal path through the race course in real time based on approximate drone dynamics. The proposed system has been demonstrated to successfully guide the drone through tight race courses reaching speeds up to 8m/s and ranked second at the 2019 AlphaPilot Challenge. Keyword Drone racing . Agile flight . Aerial vehicles 1 Introduction fast dynamic obstacles and enhance their maneuverability in confined spaces. Human pilots have shown that drones are 1.1 Motivation capable of flying through complex environments, such as race courses, at breathtaking speeds. However, autonomous Autonomous drones have seen a massive gain in robustness drones are still far from human performance in terms of in recent years and perform an increasingly large set of tasks speed, versatility, and robustness, so that a lot of research across various commercial industries; however, they are still and innovation is needed in order to fill this gap. far from fully exploiting their physical capabilities. Indeed, In order to push the capabilities and performance of most autonomous drones only fly at low speeds near hover autonomous drones, in 2019, Lockheed Martin and the Drone 1,2 conditions in order to be able to robustly sense their environ- Racing League have launched the AlphaPilot Challenge , ment and to have sufficient time to avoid obstacles. Faster an open innovation challenge with a grand prize of $1 million. and more agile flight could not only increase the flight range The goal of the challenge is to develop a fully autonomous of autonomous drones, but also improve their ability to avoid drone that navigates through a race course using machine vision, and which could one day beat the best human pilot. While other autonomous drone races Moon et al. (2017, Video of our approach:https://youtu.be/DGjwm5PZQT8 2019) focus on complex navigation, the AlphaPilot Chal- Talk at RSS 2020:https://youtu.be/k6vGEj1ZZWc lenge pushes the limits in terms of speed and course size to RSS paper:http://rpg.ifi.uzh.ch/docs/RSS20_Foehn.pdf advance the state of the art and enter the domain of human performance. Due to the high speeds at which drones must fly Philipp Foehn, Dario Brescianini, and Elia Kaufmann have contributed in order to beat the best human pilots, the challenging visual equally. environments (e.g., low light, motion blur), and the limited All authors are with the Robotics and Perception Group. computational power of drones, autonomous drone racing B Philipp Foehn https://thedroneracingleague.com/airr/ foehn@ifi.uzh.ch https://www.nytimes.com/2019/03/26/technology/alphapilot-ai- drone-racing.html Dep. of Informatics, Dep. of Neuroinformatics, University of Zurich and ETH, Zurich, Switzerland 123 Autonomous Robots by optic flow is then used to steer the drone towards the detected gate. While this approach is successfully deployed on a real robotic system, the generated control commands do not account for the underactuated system dynamics of the quadrotor, constraining this method to low-speed flight. The approach presented in Kaufmann et al. (2018) also relies on relative gate data but has the advantage that it works even when no gate is visible. In particular, it uses a CNN to directly infer relative gate poses from images and fuse the results with a VIO state estimate. However, the CNN does not perform well when multiple gates are visible as it is frequently the case for drone racing. Assuming knowledge of the platform state and the envi- ronment, there exist many approaches which can reliably generate feasible trajectories with high efficiency. The most Fig. 1 Our AlphaPilot drone waiting on the start podium to prominent line of work exploits the quadrotor’s underactu- autonomously race through the gates ahead ated nature and the resulting differentially-flat output states Mellinger et al. (2012); Mueller et al. (2015), where trajecto- raises fundamental challenges in real-time state estimation, ries are described as polynomials in time. Other approaches perception, planning, and control. additionally incorporate obstacle avoidance Zhou et al. (2019); Gao et al. (2019) or perception constraints Falanga 1.2 Related work et al. (2018); Spasojevic et al. (2020). However, in the context of drone racing, specifically the AlphaPilot Challenge, obsta- Autonomous navigation in indoor or GPS-denied environ- cle avoidance is often not needed, but time-optimal planning ments typically relies on simultaneous localization and is of interest. There exists a handful of approaches for time- mapping (SLAM), often in the form of visual-inertial odom- optimal planning Hehn et al. (2012); Loock et al. (2013); etry (VIO) Cadena et al. (2016). There exists a variety of VIO Ryou et al. (2020); Foehn and Scaramuzza (2020). However, algorithms, e.g., Mourikis and Roumeliotis (2007); Bloesch while Hehn et al. (2012); Loock et al. (2013) are limited to 2D scenarios and only find trajectories between two given et al. (2015); Qin et al. (2018); Forster et al. (2017a), that are based on feature detection and tracking that achieve very states, Ryou et al. (2020) requires simulation and real-world good results in general navigation tasks Delmerico and Scara- data obtained on the track, and the method of Foehn and muzza (2018). However, the performance of these algorithms Scaramuzza (2020) is not applicable due to computational significantly degrades during agile and high-speed flight as constraints. encountered in drone racing. The drone’s high translational and rotational velocities cause large optic flow, making robust 1.3 Contribution feature detection and tracking over sequential images dif- ficult and thus causing substantial drift in the VIO state The approach contributed herein builds upon the work estimate Delmerico et al. (2019). of Kaufmann et al. (2018) and fuses VIO with a robust CNN- To overcome this difficulty, several approaches exploiting based gate corner detection using an extended Kalman filter the structure of drone racing with gates as landmarks have (EKF), achieving high accuracy at little computational cost. been developed, e.g., Li et al. (2019); Jung et al. (2018); The gate corner detections are used as static features to com- Kaufmann et al. (2018), where the drone locates itself rel- pensate for the VIO drift and to align the drone’s flight path ative to gates. In Li et al. (2019), a handcrafted process is precisely with the gates. Contrary to all previous works Li used to extract gate information from images that is then et al. (2019); Jung et al. (2018); Kaufmann et al. (2018), fused with attitude estimates from an inertial measurement which only detect the next gate, our approach makes use of unit (IMU) to compute an attitude reference that guides the any gate detection and even profits from multiple simulta- drone towards the visible gate. While the approach is compu- neous detections to compensate for VIO drift and build a tationally very light-weight, it struggles with scenarios where global gate map. The global map allows the drone to nav- multiple gates are visible and does not allow to employ more igate through the race course even when the gates are not sophisticated planning and control algorithms which, e.g., immediately visible and further enables the usage of sophis- plan several gates ahead. In Jung et al. (2018), a convolu- ticated path planning and control algorithms. In particular, a tional neural network (CNN) is used to retrieve a bounding computationally efficient, sampling-based path planner (see box of the gate and a line-of-sight-based control law aided e.g., LaValle (2006), and references therein) is employed that 123 Autonomous Robots plans near time-optimal paths through multiple gates ahead and is capable of adjusting the path in real time if the global map is updated. This paper extends our previous work Foehn et al. (2020) by including a more detailed elaboration on our gate corner detection in Sect. 4 with an ablation study in Sect. 8.1, further details on the fusion of VIO and gate detection in Sect. 5, and a description of the path parameterization in Sect. 6, com- pleted by an ablation study on the planning horizon length in C z e B Sect. 8.3. e I I e e x y 2 AlphaPilot race format and drone Fig. 2 Illustration of the race drone with its body-fixed coordinate frame B in blue and a camera coordinate frame C in red 2.1 Race format trolled the total thrust f along the drone’s z-axis (see Fig. 2) From more than 400 teams that participated in a series of and the angular velocity, ω = ω ,ω ,ω , in the body-fixed x y z qualification tests including a simulated drone race Guerra coordinate frame B. et al. (2019), the top nine teams were selected to compete in the 2019 AlphaPilot Challenge. The challenge consists 2.3 Drone model of three qualification races and a final championship race at which the six best teams from the qualification races compete Bold lower case and upper case letters will be used to for the grand prize of $1 million. Each race is implemented denote vectors and matrices, respectively. The subscripts in as a time trial competition in which each team is given three p = p − p are used to express a vector from point I I I CB B C attempts to fly through a race course as fast a possible with- C to point B expressed in frame I. Without loss of generality, out competing drones on the course. Taking off from a start I is used to represent the origin of frame I, and B represents podium, the drones have to autonomously navigate through the origin of coordinate frame B. For the sake of readability, a sequence of gates with distinct appearances in the cor- the leading subscript may be omitted if the frame in which rect order and terminate at a designated finish gate. The race the vector is expressed is clear from context. course layout, gate sequence, and position are provided ahead The drone is modelled as a rigid body of mass m with rotor of each race up to approximately ±3 m horizontal uncer- drag proportional to its velocity acting on it Kai et al. (2017). tainty, enforcing teams to come up with solutions that adapt The translational degrees-of-freedom are described by the to the real gate positions. Initially, the race courses were position of its center of mass, p = p , p , p , with B,x B,y B,z planned to have a lap length of approximately 300 m and respect to an inertial frame I as illustrated in Fig. 2.The required the completion up to three laps. However, due to rotational degrees-of-freedom are parametrized using a unit technical difficulties, no race required to complete multiple quaternion, q , where R = R(q ) denotes the rota- IB IB IB laps and the track length at the final championship race was tion matrix mapping a vector from the body-fixed coordinate limited to about 74 m. frame B to the inertial frame I Shuster (1993). A unit quater- nion, q, consists of a scalar q and a vector q ˜ = q , q , q w x y z 2.2 Drone specifications and is defined as q = (q , q ˜ ) Shuster (1993). The drone’s equations of motion are All teams were provided with an identical race drone (Fig. 1) that was approximately 0.7 m in diameter, weighed m p ¨ = R f e − R DR v − m g, (1) IB IB B B z IB 3.4kg, and had a thrust-to-weight ratio of 1.4. The drone was equipped with a NVIDIA Jetson Xavier embedded com- q ˙ = ⊗ q , (2) IB IB puter for interfacing all sensors and actuators and handling all 2 computation for autonomous navigation onboard. The sensor ◦ B suite included two ± 30 forward-facing stereo camera pairs where f and ω are the force and bodyrate inputs, e = (Fig. 2), an IMU, and a downward-facing laser rangefinder (0, 0, 1) is the drone’s z-axis expressed in its body-fixed (LRF). All sensor data were globally time stamped by soft- frame B, D = diag(d , d , 0) is a constant diagonal matrix x y ware upon reception at the onboard computer. Detailed containing the rotor drag coefficients, v = p ˙ denotes the specifications of the available sensors are given in Table 1. drone’s velocity, g is gravity and ⊗ denotes the quaternion The drone was equipped with a flight controller that con- multiplication operator Shuster (1993). The drag coefficients 123 Autonomous Robots Table 1 Sensor specifications Sensor Model Rate Details Cam Leopard imaging IMX 264 60Hz Global shutter, color resolution: 1200 × 720 -4 -3 IMU Bosch BMI088 430Hz Range: ±24g, ±34.5rad/s resolution: 7e g,1e rad/s LRF Garmin LIDAR-Lite v3 120Hz Range: 1–40 m resolution: 0.01 m Sensor Interface Perception State Estimation Planning & Control Drone Interface Camera 1 EKF Gate Path gate map State Detection Camera 2 Planning Estimation trajectory Camera 3 Position Total vehicle state Camera 4 Visual Control Thrust State Inertial IMU attitude Prediction Odometry Attitude Angular Laser Control Velocity Rangefinder Fig. 3 Overview of the system architecture and its main components. All components within a dotted area run in a single thread were identified experimentally to be d = 0.5kg/s and gate detections to align the VIO estimate with the global d = 0.25 kg/s. gate map, i.e., compensates for the VIO drift. Computing the state estimate, in particular interfacing the cameras and running VIO, introduces latency in the order of 130 ms to the system. In order to be able to achieve a high bandwidth of 3 System overview the control system despite large latencies, the vehicle’s state estimate is predicted forward to the vehicle’s current time The system is composed of five functional groups: Sensor using the IMU measurements. interface, perception, state estimation, planning and control, and drone interface (see Fig. 3). In the following, a brief introduction to the functionality of our proposed perception, 3.3 Planning and control state estimation, and planning and control system is given. The global gate map and the latency-compensated state esti- mate of the vehicle are used to plan a near time-optimal path 3.1 Perception through the next N gates starting from the vehicle’s cur- rent state (see Sect. 6). The path is re-planned every time Of the two stereo camera pairs available on the drone, only (i) the vehicle passes through a gate, (ii) the estimate of the the two central forward-facing cameras are used for gate gate map or (iii) the VIO drift are updated significantly, i.e., detection (see Sect. 4) and, in combination with IMU mea- large changes in the gate positions or VIO drift. The path is surements, to run VIO. The advantage is that the amount of image data to be processed is reduced while maintaining a tracked using a cascaded control scheme (see Sect. 7) with an outer proportional-derivative (PD) position control loop very large field of view. Due to its robustness, multi-camera capability and computational efficiency, ROVIO Bloesch and an inner proportional (P) attitude control loop. Finally, the outputs of the control loops, i.e., a total thrust and angular et al. (2015) has been chosen as VIO pipeline. At low speeds, velocity command, are sent to the drone. ROVIO is able to provide an accurate estimate of the quadro- tor vehicle’s pose and velocity relative to its starting position, however, at larger speeds the state estimate suffers from drift. 3.4 Software architecture 3.2 State estimation The NVIDIA Jetson Xavier provides eight CPU cores, how- ever, four cores are used to run the sensor and drone interface. In order to compensate for a drifting VIO estimate, the output The other four cores are used to run the gate detection, VIO, of the gate detection and VIO are fused together with the EKF state estimation, and planning and control, each in a sep- measurements from the downward-facing laser rangefinder arate thread on a separate core. All threads are implemented (LRF) using an EKF (see Sect. 5). The EKF estimates a global asynchronously to run at their own speed, i.e., whenever new map of the gates and, since the gates are stationary, uses the data is available, in order to maximize data throughput and to 123 Autonomous Robots reduce processing latency. The gate detection thread is able 4.1 Stage 1: predicting corner maps and part affinity to process all camera images in real time at 60Hz, whereas fields the VIO thread only achieves approximately 35Hz. In order to deal with the asynchronous nature of the gate detection and In the first detection stage, each input image, I ,is w×h×3 VIO thread and their output, all data is globally time stamped mapped by a neural network into a set of N = 4 corner and integrated in the EKF accordingly. The EKF thread runs maps, C , and N = 4PAFs, E .Pre- w×h×N E w×h×(N ·2) C E every time a new gate or LRF measurement is available. The dicted corner maps as well as PAFs are illustrated in Fig. 4, planning and control thread runs at a fixed rate of 50Hz. To 2nd and 3rd column. The network is trained in a super- achieve this, the planning and control thread includes the vised manner by minimizing the Mean-Squared-Error loss state prediction which compensates for latencies introduced between the network prediction and the ground-truth maps. by the VIO. In the following, ground-truth maps for both map types are explained in detail. 4.1.1 Corner maps For each corner class, j ∈ C , C := { , , , },a j j TL TR BL BR ground-truth corner map, C (s), is represented by a single- 4 Gate detection channel map of the same size as the input image and indicates the existence of a corner of class j at pixel location s in the To correct for drift accumulated by the VIO pipeline, the image. The value at location s ∈ I in C is defined by a gates are used as distinct landmarks for relative localization. Gaussian as In contrast to previous CNN-based approaches to gate detec- tion, we do not infer the relative pose to a gate directly, but ∗ 2 instead segment the four corners of the observed gate in the s − s j 2 C (s) = exp − , (3) input image. These corner segmentations represent the like- lihood of a specific gate corner to be present at a specific pixel coordinate. To represent a value proportional to the likelihood, the maps are trained on Gaussians of the corner ∗ where s denotes the ground truth image position of the near- projections. This allows the detection of an arbitrary amount est corner with class j. The choice of the parameter σ controls of gates, and allows for a more principled inclusion of gate the width of the Gaussian. We use σ = 7 pixel in our imple- measurements in the EKF through the use of reprojection mentation. Gaussians are used to account for small errors in error. Specifically, it exhibits more predictable behavior for the ground-truth corner positions that are provided by hand. partial gate observations and overlapping gates, and allows Ground-truth corner maps are generated for each individual to suppress the impact of Gaussian noise by having multi- gate in the image separately and then aggregated. Aggre- ple measurements relating to the same quantity. Since the gation is performed by taking the pixel-wise maximum of exact shape of the gates is known, detecting a set of charac- the individual corner maps, as this preserves the distinction teristic points per gate allows to constrain the relative pose. between close corners. For the quadratic gates of the AlphaPilot Challenge, these characteristic points are chosen to be the inner corner of the gate border (see Fig. 4, 4th column). However, just detect- 4.1.2 Part affinity fields ing the four corners of all gates is not enough. If just four corners of several gates are extracted, the association of cor- We define a PAF for each of the four possible classes of ners to gates is undefined (see Fig. 4, 3rd row, 2nd column). edges, defined by its two connecting corners as (k, l) ∈ To solve this problem, we additionally train our network to E := {( , ), ( , ), ( , ), ( , )}. For each KL TL TR TR BR BR BL BL TL extract so-called Part Affinity Fields (PAFs), as proposed edge class, (k, l), the ground-truth PAF, E (s), is repre- (k,l) by Cao et al. (2017). These are vector fields, which, in our sented by a two-channel map of the same size as the input case, are defined along the edges of the gates, and point from image and points from corner k to corner l of the same gate, one corner to the next corner of the same gate, see column provided that the given image point s lies within distance d three in Fig. 4. The entire gate detection pipeline consists of such an edge. We use d = 10 pixel in our implementation. of two stages: (1) predicting corner maps and PAFs by the Let G be the set of gates g and S be the set of image (k,l),g neural network, (2) extracting single edge candidates from points that are within distance d of the line connecting the ∗ ∗ the network prediction and assembling them to gates. In the corner points s and s belonging to gate g. Furthermore, let k l ∗ ∗ following, both stages are explained in detail. v be the unit vector pointing from s to s of the same k,l,g k l 123 Autonomous Robots Fig. 4 The gate detection module returns sets of corner points for each corner maps, C (second column), and PAFs, E (third column), are dis- gate in the input image (fourth column) using a two-stage process. In the played in a single image each. While color encodes the corner class for first stage, a neural network transforms an input image, I (first C, it encodes the direction of the 2D vector fields for E. The yellow w×h×3 column), into a set of confidence maps for corners, C (second lines in the bottom of the second column show the six edge candidates of w×h×4 column), and Part Affinity Fields (PAFs) Cao et al. (2017), E the edge class (TL, TR) (the TL corner of the middle gate is below the w×h×(4·2) (third column). In the second stage, the PAFs are used to associate sets detection threshold), see Sect. 4.2. Best viewed in color (Color figure of corner points that belong to the same gate. For visualization, both online) gate. Then, the part affinity field, E (s), is defined as: where s(u) lineraly interpolates between the two corner can- (k,l) didate locations s and s . In practice, S is approximated by k l ∗ uniformly sampling the integrand. v ifs ∈ S , g ∈ G k,l,g (k,l),g E (s) = (4) The line integral S is used as metric to associate corner (k,l) 0 otherwise. candidates to gate detections. The goal is to find the optimal assignment for the set of all possible corner candidates to As in the case of corner maps, PAFs are generated for each gates. As described in Cao et al. (2017), finding this optimal individual gate in the image separately and then aggregated. assignment corresponds to a K -dimensional matching prob- In case a point s lies in S of several gates, the v of (k,l),g k,l,g lem that is known to be NP-Hard West (2001). Following Cao all corresponding gates are averaged. et al. (2017), the problem is simplified by decomposing the matching problem into a set of bipartite matching sub- 4.2 Stage 2: corner association problems. Matching is therefore performed independently for each edge class. Specifically, the following optimization At test time, discrete corner candidates, s , for each corner problem represents the bipartite matching subproblem for class, j ∈ C , are extracted from the predicted corner map edge class (k, l): using non-maximum suppression and thresholding. For each corner class, there might be several corner candidates, due max S = S((s , s )) · z (6) (k,l) k l kl to multiple gates in the image or false positives. These cor- k∈D l∈D k l ner candidates allow the formation of an exhaustive set of s.t. ∀k ∈ D , z ≤ 1 , (7) k kl edge candidates, {(s , s )}, see the yellow lines in Fig. 4. k l l∈D Given the corresponding PAF, E (s), each edge candi- (k,l) date is assigned a score which expresses the agreement of ∀l ∈ D , z ≤ 1 , (8) l kl that candidate with the PAF. This score is given by the line k∈D integral where S is the cumulative matching score and D , D (k,l) k l u=1 denote the set of corner candidates for edge class (k, l).The s − s l k S((s , s )) = E (s(u)) · du, (5) k l (k,l) variable z ∈{0, 1} indicates whether two corner candidates kl s − s u=0 l k 123 Autonomous Robots are connected. Equations (7) and (8) enforce that no two rotational misalignment of the VIO origin frame, V, with edges share the same corner. Above optimization problem respect to the inertial frame, I, represented by p and q , V IV can be solved using the Hungarian method Kuhn (1955), jointly with the gate positions, p , and gate heading, ϕ . IG i i resulting in a set of edge candidates for each edge class (k, l). It can thus correct for an imprecise initial position estimate, With the bipartite matching problems being solved for all VIO drift, and uncertainty in gate positions. The EKF’s state edge classes, the pairwise associations can be extended to space at time t is x = x(t ) with covariance P described k k k k sets of associated edges for each gate. by 4.3 Training data x = p , q , p ,ϕ ,..., p ,ϕ . (9) V IV G IG G IG 0 0 N −1 N −1 The neural network is trained in a supervised fashion using The drone’s corrected pose, p , q , can then be com- a dataset recorded in the real world. Training data is gen- B IB puted from the VIO estimate, p , q , by transforming erated by recording video sequences of gates in 5 different VB VB it from frame V into the frame I using p , q as environments. Each frame is annotated with the corners of V IV all gates visible in the image using the open source image annotation software labelme , which is extended with KLT- p = p + R · p , q = q · q . (10) IV B V VB IB IV VB Tracking for semi-automatic labelling. The resulting dataset used for training consists of 28k images and is split into 24k samples for training and 4k samples for validation. At train- All estimated parameters are expected to be time-invariant ing time, the data is augmented using random rotations of up but subject to noise and drift. This is modelled by a Gaussian to 30 and random changes in brightness, hue, contrast and random walk, simplifying the EKF process update to: saturation. x = x , P = P + t Q, (11) 4.4 Network architecture and deployment k+1 k k+1 k k The network architecture is designed to optimally trade- where Q is the random walk process noise. For each mea- off between computation time and accuracy. By conducting surement z with noise R the predicted apriori estimate, x , a neural network architecture search, the best performing is corrected with measurement function, h(x ), and Kalman architecture for the task is identified. The architecture search gain, K , resulting in the a posteriori estimate, x ,as is limited to variants of U-Net Ronneberger et al. (2015) due to its ability to perform segmentation tasks efficiently −1 − − K = P H H P H + R , k k with a very limited amount of labeled training data. The best k k k k + − − performing architecture is identified as a 5-level U-Net with x = x + K z − h(x ) , k k k k k + − [12, 18, 24, 32, 32] convolutional filters of size [3, 3, 3, 5, 7] P = ( I − K H ) P , (12) k k k k per level and a final additional layer operating on the output of the U-Net containing 12 filters. At each layer, the input with h(x ), the measurement function with Jacobian H . feature map is zero-padded to preserve a constant height However, the filter state includes a rotation quaternion and width throughout the network. As activation function, constrained to unit norm, q  = 1. This is effectively LeakyReLU with α = 0.01 is used. For deployment on the IV an over-parameterization in the filter state space and can Jetson Xavier, the network is ported to TensorRT 5.0.2.6. To lead to poor linearization as well as underestimation of the optimize memory footprint and inference time, inference is covariance. To apply the EKFs linear update step on the performed in half-precision mode (FP16) and batches of two over-parameterized quaternion, it is lifted to its tangent space images of size 592 × 352 are fed to the network. description, similar to Forster et al. (2017b). The quaternion q is composed of a reference quaternion, q , which IV IV ref is adjusted after each update step, and an error quaternion, 5 State estimation q , of which only its vector part, q ˜ , is in the EKF’s V V V V ref ref state space. Therefore we get The nonlinear measurement models of the VIO, gate detec- tion, and laser rangefinder are fused using an EKF Kalman (1960). In order to obtain the best possible pose accuracy 1 − q ˜ · q ˜ V V ref V V ref q = q · q q = relative to the gates, the EKF estimates the translational and IV IV V V V V ref ref ref q ˜ V V ref 3 (13) https://github.com/wkentaro/labelme 123 Autonomous Robots from which we can derive the Jacobian of any measurement where p and R describe a constant transformation BC BC function, h(x), with respect to q by the chain rule as between the drone’s body frame B and camera frame C (see IV Fig. 2). The Jacobian with respect to the EKF’s state space ∂ ∂ ∂ q IV is derived using the chain rule, h(x) = h(x) · (14) ∂ q ˜ ∂ q ∂ q ˜ V V IV V V ref ref ⎡  ⎤ ∂ p (x) ∂ ∂ h (x) −q ˜ Co Gate ij V V ref h (x) = · , (20) Gate ∂ x ∂ p (x) ∂ x ⎣ ⎦ 1−q ˜ ·q ˜ = h(x) ·[q ] Co × V V V V ij IV ref ref ref ∂ q ˜ IV 3×3 where the first term representing the derivative of the pro- (15) jection, and the second term represents the derivative with respect to the state space including gate position and ori- where we arrive at (15)byusing (13)in (14) and use [q ] IV ref entation, and the frame alignment, which can be further to represent the matrix resulting from a lefthand-side multi- decomposed using (14). plication with q . IV ref 5.1.2 Gate correspondences 5.1 Measurement modalities The gate detection (see Fig. 4) provides sets of m measure- All measurements up to the camera frame time t are passed ments, S ={s ,..., s }, corresponding to the Co 0 Co m−1 to the EKF together with the VIO estimate, p and q , i ˆ ˆ ij ij VB,k VB,k with respect to the VIO frame V. Note thate the VIO estimate unknown gate i at known corners j ∈ C . To identify the is assumed to be a constant parameter, not a filter state, which correspondences between a detection set S and the gate G vastly simplifies derivations ad computation, leading to an in our map, we use the square sum of reprojection error. For efficient yet robust filter. this, we first compute the reprojection of all gate corners, s , according to (16) and then compute the square error Co ij 5.1.1 Gate measurements sum between the measurement set, S , and the candidates, s . Finally, the correspondence is established to the gate Co ij Gate measurements consist of the image pixel coordinates, G which minimizes the square error sum, as in s , of a specific gate corner. These corners are denoted Co ij with top left and right, and bottom left and right, as in j ∈ C , j argmin (s − s ) (s − s ) (21) Co Co Co Co ˆ ij ˆ ij ij ij i ∈[0,N −1] C := { , , , } and the gates are enumerated by i ∈ s ∈S j TL TR BL BR Co ˆ i j [0, N − 1]. All gates are of equal width, w, and height, h,so that the corner positions in the gate frame, G , can be written as p = (0, ±w, ±h). The measurement equation can G Co i ij 2 5.1.3 Laser rangefinder measurement be written as the pinhole camera projection Szeliski (2010) of the gate corner into the camera frame. A pinhole camera The drone’s laser rangefinder measures the distance along maps the gate corner point, p , expressed in the camera Co ij the drones negative z-axis to the ground, which is assumed frame, C, into pixel coordinates as to be flat and at a height of 0 m. The measurement equation can be described by 1 f 0 c x x h (x) = s = p , (16) Gate Co ij Co ij 0 f c [ p ] y y Co ij [ p ] [ p + R p ] z z B V IV VB h (x) = = . (22) LRF B B [ R e ] [ R R e ] IB z IV VB z z z where [·] indicates the scalar z-component of a vector, f z x and f are the camera’s focal lengths and c , c is the y x y The Jacobian with respect to the state space is again derived camera’s optical center. The gate corner point, p ,isgiven Co ∂h ∂h LRF LRF ij by and and further simplified using (14). ∂ p ∂ q V IV by p = R p + R p − p , (17) IG Co G i G Co C ij IC i i j 6 Path planning with p and R being the transformation between the iner- IC For the purpose of path planning, the drone is assumed to be a tial frame I and camera frame C, point mass with bounded accelerations as inputs. This simpli- fication allows for the computation of time-optimal motion p = p + R p + R p , (18) IV VB C V VB BC primitives in closed-form and enables the planning of approx- R = R R R , (19) IC IV VB BC imate time-optimal paths through the race course in real time. 123 Autonomous Robots Even though the dynamics of the quadrotor vehicle’s accel- B,x eration cannot be neglected in practice, it is assumed that this p B,y B,z simplification still captures the most relevant dynamics for -5 path planning and that the resulting paths approximate the -10 true time-optimal paths well. In order to facilitate the track- ing of the approximate time-optimal path, polynomials of B,x B,y order four are fitted to the path which yield smoother posi- B,z tion, velocity and acceleration commands, and can therefore -5 be better tracked by the drone. -10 In the following, time-optimal motion primitives based 00.51.01.5 1.96 on the simplified dynamics are first introduced and then a Time [s] path planning strategy based on these motion primitives is Fig. 5 Example time-optimal motion primitive starting from rest at presented. Finally, a method to parameterize the time-optimal the origin to a random final position with non-zero final velocity. The path is introduced. velocities are constrained to ±7.5m/s and the inputs to ±12 m/s .The dotted lines denote the per-axis time-optimal maneuvers 6.1 Time-optimal motion primitive ∗ ∗ ∗ primitive to exist, a new parameter α ∈[0, 1] is introduced The minimum times, T , T and T , required for the vehicle x y z that scales the acceleration bounds, i.e., the applied control to fly from an initial state, consisting of position and veloc- inputs are scaled to αu and αu , respectively. Fig. 5 depicts ity, to a final state while satisfying the simplified dynamics x the position and velocity of an example time-optimal motion p ¨ (t ) = u(t ) with the input acceleration u(t ) being con- primitive. strained to u ≤ u(t ) ≤ u are computed for each axis individually. Without loss of generality, only the x-axis is 6.2 Sampling-based receding horizon path planning considered in the following. Using Pontryagin’s maximum principle Bertsekas (1995), it can be shown that the optimal The objective of the path planner is to find the time-optimal control input is bang-bang in acceleration, i.e., has the form path from the drone’s current state to the final gate, passing through all the gates in the correct order. Since the previously u , 0 ≤ t ≤ t , ∗ x u (t ) = (23) x introduced motion primitive allows the generation of time- ∗ ∗ u , t < t ≤ T , optimal motions between any initial and any final state, the time-optimal path can be planned by concatenating a time- or vice versa with the control input first being u followed by optimal motion primitive starting from the drone’s current u . In order to control the maximum velocity of the vehicle, (simplified) state to the first gate with time-optimal motion e.g., to constrain the solutions to ranges where the simplified primitives that connect the gates in the correct order until the dynamics approximate the true dynamics well or to limit the final gate. This reduces the path planning problem to finding motion blur of the camera images, a velocity constraint of the the drone’s optimal state at each gate such that the total time form v ≤ v (t ) ≤ v can be added, in which case the opti- B B is minimized. To find the optimal path, a sampling-based mal control input has bang-singular-bang solution Maurer strategy is employed where states at each gate are randomly (1977) sampled and the total time is evaluated subsequently. In par- ticular, the position of each sampled state at a specific gate ⎪ u , 0 ≤ t ≤ t , ⎨ x 1 is fixed to the center of the gate and the velocity is sampled ∗ ∗ u (t ) = (24) 0, t < t ≤ t , x uniformly at random such the velocity lies within the con- 1 2 ∗ ∗ u , t < t ≤ T , straints of the motion primitives and the angle between the velocity and the gate normal does not exceed a maximum or vice versa. It is straightforward to verify that there exist angle, ϕ It is trivial to show that as the number of sam- max closed-form solutions for the minimum time, T ,aswellas pled states approaches infinity, the computed path converges the switching times, t , in both cases (23)or (24). to the time-optimal path. Once the minimum time along each axis is computed, In order to solve the problem efficiently, the path planning ∗ ∗ ∗ ∗ the maximum minimum time, T = max(T , T , T ),is problem is interpreted as a shortest path problem. At each x y z computed and motion primitives of the same form as in (23) gate, M different velocities are sampled and the arc length or (24) are computed among the two faster axes but with from each sampled state at the previous gate is set to be equal ∗ ∗ the final time constrained to T such that trajectories along to the duration, T , of the time-optimal motion primitive that each axis end at the same time. In order for such a motion guides the drone from one state to the other. Due to the exis- Velocity [m/s] Position [m] Autonomous Robots tence of a closed-form expression for the minimum time, T , control loop can track setpoint changes perfectly, i.e., without setting up and solving the shortest path problem can be done any dynamics or delay. very efficiently using, e.g., Dijkstra’s algorithm Bertsekas (1995) resulting in the optimal path p (t ). In order to further 7.1 Position control reduce the computational cost, the path is planned in a reced- ing horizon fashion, i.e., the path is only planned through the The position control loop along the inertial z-axis is designed next N gates. such that it responds to position errors p = p − p 6.3 Path parameterization B ,z B ,z B,z err ref in the fashion of a second-order system with time constant Due to the simplifications of the dynamics that were made τ and damping ratio ζ , when computing the motion primitives, the resulting path is pos,z pos,z infeasible with respect to the quadrotor dynamics (1) and 1 2ζ pos,z (2) and thus is impossible to be tracked accurately by the p ¨ = p + p ˙ +¨ p . (31) B,z B ,z B ,z B ,z err err ref τ τ drone. To simplify the tracking of the time-optimal path, the pos,z pos,z path is approximated by fourth order polynomials in time. Similarly, two control loops along the inertial x- and y-axis In particular, the path is divided into multiple segments of are shaped to make the horizontal position errors behave equal arc length. Let t ∈[t , t ) be the time interval of the k k+1 like second-order systems with time constants τ and pos,xy k-th segment. In order to fit the polynomials, p ¯ (t ),tothe damping ratio ζ . Inserting (31) into the translational pos,xy k-th segment of the time-optimal path, we require that the dynamics (1), the total thrust, f , is computed to be initial and final position and velocity are equal to those of the time-optimal path, i.e., [m p ¨ + g + R DR v ] IB B z ref IB f = . (32) ∗ ∗ [R ] p ¯ (t ) = p (t ), p ¯ (t ) = p (t ), (25) IBe k k k+1 k+1 z k k ∗ ∗ ˙ ˙ ¯p (t ) = p ˙ (t ), ¯p (t ) = p ˙ (t ), (26) k k k+1 k+1 k k 7.2 Attitude control and that the positions at t = (t − t ) /2 coincide as well: k+1 k The required acceleration from the position controller deter- mines the orientation of the drone’s z-axis and is used, in t + t t + t k+1 k k+1 k p ¯ = p . (27) k combination with a reference yaw angle, ϕ , to compute ref 2 2 the drone’s reference attitude. The reference yaw angle is chosen such that the drone’s x-axis points towards the refer- The polynomial parameterization p (t ) of the k-th segment ence position 5 m ahead of the current position, i.e., that the is then given by drone looks in the direction it flies. A nonlinear attitude con- troller similar to Brescianini and D’Andrea (2018) is applied 4 3 2 p ¯ (t ) = a s + a s + a s + a s + a , (28) 4,k 3,k 2,k 1,k 0,k that prioritizes the alignment of the drone’s z-axis, which is crucial for its translational dynamics, over the correction of with s = t − t being the relative time since the start of k-th the yaw orientation: segment. The velocity and acceleration required for the drone ⎡ ⎤ to track this polynomial path can be computed by taking the q q − q q w x y z 2sgn(q ) derivatives of (28), yielding −1 ⎣ ⎦ ω = T q q + q q , (33) w y x z att 2 2 q + q w z z 3 2 ¯p (t ) = 4a s + 3a s + 2a s + a , (29) 4,k 3,k 2,k 1,k ¯p (t ) = 12a s + 6a s + 2a . (30) where q , q , q and q are the components of the attitude 4,k 3,k 2,k k w x y z −1 error, q ⊗ q , and where T is a diagonal matrix att IB IB ref containing the per-axis first-order system time constants for 7 Control small attitude errors. This section presents a control strategy to track the near time-optimal path from Sect. 6. The control strategy is based 8 Results on a cascaded control scheme with an outer position control loop and an inner attitude control loop, where the position The proposed system was used to race in the 2019 AlphaPi- control loop is designed under the assumption that the attitude lot championship race. The course at the championship race 123 Autonomous Robots 8m/s p p B VB Final Gate Start Start Gate 1 Gate 1 0 0 6m/s Gate 2 Gate 2 -10 -10 4m/s -20 -20 2m/s Gate 3 Gate 3 Start -30 -30 Gate 4 Gate 4 Gate 1 Final Gate Final Gate 0m/s -10 0 10 20 30 -10 0 10 20 30 04 8 12 p [m] p [m] p [m] x x x Fig. 6 Top view of the planned (left) and executed (center) path at the championship race, and an executed multi-lap path at a testing facility (right). Left: Fastest planned path in color, sub-optimal sampled paths in gray. Center: VIO trajectory as p and corrected estimate as p (Color figure VB B online) consisted of five gates and had a total length of 74 m. A top Table 2 Comparison of different network architectures with respect to intersection over union (IoU), precision (Pre.) and recall (Rec.). view of the race course as well as the results of the path plan- The index in the architecture name denotes the number of levels in ning and the fastest actual flight are depicted in Fig. 6 (left the U-Net. All networks contain one layer per level with kernel sizes and center). With the motion primitive’s maximum veloc- of [3, 3, 5, 7, 7] and [12, 18, 24, 32, 32] filters per level. Architectures ity set to 8 m/s, the drone successfully completed the race labelled with ’L’ contain twice the amount of filters per level. Timings are measured for single input images of size 352 x 592 on a desktop course in a total time of 11.36 s, with only two other teams computer equipped with an NVIDIA RTX 2080 Ti also completing the full race course. The drone flew at an Arch. IoU Pre. Rec. #params Latency [s] average velocity of 6.5m/s and reached the peak velocity of 8m/s multiple times. Note that due to missing ground truth, UNet-5L 0.966 0.997 0.967 613k 0.106 Fig. 6 only shows the estimated and corrected drone position. UNet-5 0.964 0.997 0.918 160k 0.006 The system was further evaluated at a testing facility where UNet-4L 0.948 0.997 0.920 207k 0.085 there was sufficient space for the drone to fly multiple laps UNet-4 0.941 0.989 0.862 58k 0.005 (see Fig. 6, right), albeit the course consisted of only two UNet-3L 0.913 0.991 0.634 82k 0.072 gates. The drone was commanded to pass 4 times through UNet-3 0.905 0.988 0.520 27k 0.005 gate 1 before finishing in the final gate. Although the gates were not visible to the drone for most of the time, the drone successfully managed to fly multiple laps. Thanks to the global gate map and the VIO state estimate, the system was summarizes the performance of different network architec- able to plan and execute paths to gates that are not directly tures. Performance is evaluated quantitatively on a separate visible. By repeatedly seeing either one of the two gates, the test set of 4k images with respect to intersection over union drone was able to compensate for the drift of the VIO state (IoU) and precision/recall for corner predictions. While the estimate, allowing the drone to pass the gates every time IoU score only takes full gate detections into account, the exactly through their center. Note that although seeing gate precision/recall scores are computed for each corner detec- 1 in Fig. 6 (right) at least once was important in order to tion. Based on these results, architecture UNet-5 is selected update the position of the gate in the global map, the VIO for deployment on the real drone due to the low inference drift was also estimated by seeing the final gate. time and high performance. On the test set, this network achieves an IoU score with the human-annotated ground truth The results of the system’s main components are discussed in detail in the following subsections, and a video of the of 96.4%. When only analyzing the predicted corners, the results is attached to the paper. network obtains a precision of 0.997 and a recall of 0.918. Deployment Even in instances of strong changes in illumina- tion, the gate detector was able to accurately identify the gates 8.1 Gate detection in a range of 2 − 17 m. Fig. 4 illustrates the quality of detec- tions during the championship race (1st row) as well as for Architecture search Due to the limited computational bud- cases with multiple gates, represented in the test set (2nd/3rd get of the Jetson Xavier, the network architecture was row). With the network architecture explained in Sect. 4, one designed to maximize detection accuracy while keeping a low simultaneous inference for the left- and right-facing camera inference time. To find such architecture, different variants requires computing 3.86GFLOPS (40kFLOPS per pixel). By of U-Net Ronneberger et al. (2015) are compared. Table 2 implementing the network in TensorRT and performing infer- p [m] p [m] p [m] y Autonomous Robots Table 3 Total flight time vs. N Flight time Computation time gates computation time averaged over 100 runs. The percentage in 19.5935 s 1.66 ms 2.35% parenthesis is the computation 29.2913 s 18.81 ms 26.56% time with respect to the computational time for the full39.2709 s 35.74 ms 50.47% track 49.2667 s 53.00 ms 74.84% 5 (full track) 9.2622 s 70.81 ms 100% CPC Foehn and Scaramuzza (2020) (full track) 6.520 s 4.62 · 10 ms 6524% ence in half-precision mode (FP16), this computation takes shows the flight and computation time of the time-optimal 10.5 ms on the Jetson Xavier and can therefore be performed trajectory generation from Foehn and Scaramuzza (2020), at the camera update rate. which significantly outperforms our approach but is far away from real-time execution with a computation time of 462 s for a single solution. Online replanning would therefore not 8.2 State estimation be possible, and any deviations from the nominal track layout could lead to a crash. Compared to a pure VIO-based solution, the EKF has proven Please also note that the evaluation of our method is per- to significantly improve the accuracy of the state estima- formed in Matlab on a laptop computer, while the final tion relative to the gates. As opposed to the works by Li optimized implementation over N = 3 gates achieved et al. (2019); Jung et al. (2018); Kaufmann et al. (2018), the replanning times of less than 2 ms on the Jetson Xavier and proposed EKF is not constrained to only use the next gate, can thus be done in every control update step. Figure 6 (right) but can work with any gate detection and even profits from shows resulting path and velocity of the drone in a multi-lap multiple detections in one image. Fig. 6 (center) depicts the scenario, where the drone’s velocity was limited to 6 m/s. It flown trajectory estimated by the VIO system as p and the VB can be seen that drone’s velocity is decreased when it has to EKF-corrected trajectory as p (the estimated corrections fly a tight turn due to its limited thrust. are depicted in gray). Accumulated drift clearly leads to a large discrepancy between VIO estimate p and the cor- VB rected estimate p . Towards the end of the track at the two last gates this discrepancy would be large enough to cause the 9 Discussion and conclusion drone to crash into the gate. However, the filter corrects this discrepancy accurately and provides a precise pose estimate The proposed system managed to complete the course at a relative to the gates. Additionally, the imperfect initial pose, velocity of 5 m/s with a success rate of 100% and at 8 m/s in particular the yaw orientation, is corrected by the EKF with a success rate of 60%. At higher speeds, the combi- while flying towards the first gate as visible in the zoomed nation of VIO tracking failures and no visible gates caused section in Fig. 6 (center). the drone to crash after passing the first few gates. This fail- ure could be caught by integrating the gate measurements 8.3 Planning and control directly in a VIO pipeline, tightly coupling all sensor data. Another solution could be a perception-aware path planner Figure 6 (left) shows the nominally planned path for the trading off time-optimality against motion blur and maxi- AlphaPilot championship race, where the coloured line mum gate visibility. depicts the fastest path along all the sampled paths depicted The advantages of the proposed system are (i) a drift-free in gray. In particular, a total of M = 150 different states are state estimate at high speeds, (ii) a global and consistent gate sampled at each gate, with the velocity limited to 8 m/s and map, and (iii) a real-time capable near time-optimal path the angle between the velocity and the gate normal limited planner. However, these advantages could only partially be to ϕ = 30 . During flight, the path is re-planned in a exploited as the races neither included multiple laps, nor had max receding horizon fashion through the next N = 3 gates (see complex segments where the next gates were not directly Fig. 6, center). It was experimentally found that choosing visible. Nevertheless, the system has proven that it can han- N ≥ 3 only has minimal impact of the flight time comapred dle these situations and is able to navigate through complex to planning over all gates, while greatly reducing the com- race courses reaching speeds up to 8 m/s and completing the putational cost. Table 3 presents the trade-offs between total championship race track of 74 m in 11.36 s. flight time and computation cost for different horizon lengths While the 2019 AlphaPilot Challenge pushed the field of N for the track shown in Fig. 6 (left). In addition, Table 3 autonomous drone racing, in particularly in terms of speed, 123 Autonomous Robots autonomous drones are still far away from beating human meet learning for drone racing. IEEE International Conference on Robotics and Automation (ICRA), pp 690–696. pilots. Moreover, the challenge also left open a number of Falanga, Davide., Foehn, Philipp., Lu, Peng., and Scaramuzza, Davide problems, most importantly that the race environment was .(2018). Pampc: Perception-aware model predictive control for partially known and static without competing drones or mov- quadrotors. In: 2018 IEEE/RSJ International Conference on Intel- ing gates. In order for autonomous drones to fly at high speeds ligent Robots and Systems (IROS), pp. 1–8. Foehn, Philipp and Scaramuzza, Davide (2020) CPC: Complemen- outside of controlled or known environments and succeed tary progress constraints for time-optimal quadrotor trajectories. in many more real-world applications, they must be able to Preprint arXiv:2007.06255. handle unknown environments, perceive obstacles and react Foehn, P., Brescianini, D., Kaufmann, E., Cieslewski, T., Gehrig, accordingly. These features are areas of active research and M., Muglikar, M., and Scaramuzza. D.(2020). Alphapilot: Autonomous drone racing. In Robotics: Science and Systems are intended to be included in future versions of the proposed (RSS). 10.15607/RSS.2020.XVI.081. drone racing system. Forster, Christian, Zhang, Zichao, Gassner, Michael, Werlberger, Manuel, & Scaramuzza, Davide. (2017a). SVO: Semidirect visual Acknowledgements This work was supported by the Intel Network on odometry for monocular and multicamera systems. IEEE Transac- Intelligent Systems, the Swiss National Science Foundation (SNSF) tions on Robotics, 33(2), 249–265. https://doi.org/10.1109/TRO. through the National Center of Competence in Research (NCCR) 2016.2623335 Robotics, the SNSF-ERC Starting Grant, and SONY. Forster, Christian, Carlone, Luca, Dellaert, Frank, & Scaramuzza, Davide. (2017b). On-manifold preintegration for real-time visual- Funding Open Access funding provided by Universitàt Zürich. inertial odometry. IEEE Transactions on Robotics, 33(1), 1–21. https://doi.org/10.1109/TRO.2016.2597321 Open Access This article is licensed under a Creative Commons Gao, Fei, William, Wu., Gao, Wenliang, & Shen, Shaojie. (2019). Fly- Attribution 4.0 International License, which permits use, sharing, adap- ing on point clouds: Online trajectory generation and autonomous tation, distribution and reproduction in any medium or format, as navigation for quadrotors in cluttered environments. Journal of long as you give appropriate credit to the original author(s) and the Field Robotics. https://doi.org/10.1002/rob.21842 source, provide a link to the Creative Commons licence, and indi- Guerra, Winter., Tal, Ezra., Murali, Varun., Ryou, Gilhyun., and cate if changes were made. The images or other third party material Karaman, Sertac.(2019). Flightgoggles: Photorealistic sensor sim- in this article are included in the article’s Creative Commons licence, ulation for perception-driven robotics using photogrammetry and unless indicated otherwise in a credit line to the material. If material virtual reality. arXiv:1905.11377. is not included in the article’s Creative Commons licence and your Hehn, M., Ritz, R., & DâAdrea, R., (2012). Performance benchmark- intended use is not permitted by statutory regulation or exceeds the ing of quadrotor systems using time-optimal control. Autonomous permitted use, you will need to obtain permission directly from the copy- Robots,. https://doi.org/10.1007/s10514-012-9282-3. right holder. To view a copy of this licence, visit http://creativecomm Kai, J. M., Allibert, G., Hua, M. D., & Hamel, T. (2017). Nonlinear ons.org/licenses/by/4.0/. feedback control of quadrotors exploiting first-order drag effects. IFAC-PapersOnLine, 50(1), 8189–8195. Kalman, Rudolf E. (1960). A new approach to linear filtering and pre- diction problems. Journal of Basic Engineering, 82(1), 35–45. Kuhn, H. W. (1955). The hungarian method for the assignment problem. References Naval Research Logistics Quarterly, 2(1–2), 83–97. LaValle, S. M. (2006). Planning algorithms. Cambridge Cambridge Bertsekas, Dimitri P. (1995). Dynamic programming and optimal con- Cambridge Cambridge: Cambridge University Press. trol (Vol. 1). MA: Athena scientific Belmont. Li, Shuo., van der Horst, Erik.,Duernay, Philipp., De Wagter, Bloesch, Michael, Omari, Sammy, Hutter, Marco, & Siegwart, Roland. Christophe., and de Croon, Guido.(2019). Visual model-predictive (2015). Robust visual inertial odometry using a direct EKF-based localization for computationally efficient autonomous racing of a approach. 72-gram drone. arXiv:1905.10110. Brescianini, D., & DâAndrea, R., (2018). Tilt-prioritized quadrocopter Loock, W. Van., Pipeleers, G., and Swevers, J.(2013). Time-optimal attitude control. IEEE Transactions on Control Systems Technol- quadrotor flight. In IEEE European Control of Conference (ECC). ogy, 28(2), 376–387. 10.23919/ECC.2013.6669253. Cadena, C., Carlone, L., Carrillo, H., Latif, Y., Scaramuzza, D., Neira, J., Maurer, H. (1977). On optimal control problems with bounded state et al. (2016). Past, present, and future of simultaneous localization variables and control appearing linearly. SIAM Journal on Control and mapping: Toward the robust-perception age. IEEE Transac- and Optimization, 150(3), 345–362. tions on Robotics, 32(6), 1309–1332. Mellinger, D., Michael, N., & Kumar, V. (2012). Trajectory genera- Cao, Zhe., Simon, Tomas., Wei, Shih-En., and Sheikh, Yaser .(2017). tion and control for precise aggressive maneuvers with quadrotors. Realtime multi-person 2d pose estimation using part affinity fields. The International Journal of Robotics Research. https://doi.org/ In: Proceedings of the IEEE Conference on Computer Vision and 10.1177/0278364911434236 Pattern Recognition, pp. 7291–7299. Moon, H., Sun, Y., Baltes, J., & Kim, S. J. (2017). The IROS 2016 Delmerico J, & Scaramuzza D. (2018). A benchmark comparison of competitions. IEEE Robotics Automation Magazine, 24(1), 20– monocular visual-inertial odometry algorithms for flying robots. 29. Moon, H., Martinez-Carranza, J., Cieslewski, T., Faessler, M., Falanga, Delmerico J, Cieslewski T, Rebecq H, Faessler M, & Scaramuzza D. D., Simovic, A., et al. (2019). Challenges and implemented tech- (2019). Are we ready for autonomous drone racing? the UZH-FPV nologies used in autonomous drone racing. Intelligent Service drone racing dataset. Robotics, 12, 137–148. Douglas Brent West. (2001). Introduction to graph theory (Vol. 2). NJ: Mourikis, Anastasios I., and Roumeliotis, S.I.(April 2007). A multi- Prentice hall Upper Saddle River. state constraint Kalman filter for vision-aided inertial navigation. Elia K, M Gehrig, P Foehn, R Ranftl, A Dosovitskiy, V Koltun, and D Scaramuzza. (2018) Beauty and the beast: Optimal methods 123 Autonomous Robots In: IEEE International Conference on Robotics and Automation Sunggoo, J., Sunyou, H., Heemin, S., & Shim, D. H. (2018). Percep- (ICRA), pp. 3565–3572. tion, guidance, and navigation for indoor autonomous drone racing Mueller MW., Hehn M., and D’Andrea R.(2015). A computationally using deep learning. IEEE Robotics and Automation Letters, 3(3), efficient motion primitive for quadrocopter trajectory generation. 2539–2544. https://doi.org/10.1109/LRA.2018.2808368 IEEE Transactions on Robotics. doi: https://doi.org/10.1109/TRO. Szeliski, Richard .(2010). Computer Vision: Algorithms and 2015.2479878. Applications. Texts in Computer Science. Springer. ISBN Qin, Tong, Li, Peiliang, & Shen, Shaojie. (2018). VINS-Mono: A 9781848829343. robust and versatile monocular visual-inertial state estimator. IEEE Zhou, B., Gao, F., Wang, L., Liu, C., & Shen, S. (2019). Robust and Transactions on Robotics, 34(4), 1004–1020. https://doi.org/10. efficient quadrotor trajectory generation for fast autonomous flight. 1109/TRO.2018.2853729 IEEE Robotics and Automation Letters. https://doi.org/10.1109/ Ronneberger, Olaf., Fischer, Philipp., and Brox, Thomas .(2015). U- LRA.2019.2927938 net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical image computing and Publisher’s Note Springer Nature remains neutral with regard to juris- computer-assisted intervention, pp. 234–241. Springer. dictional claims in published maps and institutional affiliations. Ryou, G., Tal, E., and Karaman, S.(2020). Multi-fidelity black-box opti- mization for time-optimal quadrotor maneuvers. Robotics: Science and Systems (RSS). 10.15607/RSS.2020.XVI.032. Shuster, Malcolm D. (1993). Survey of attitude representations. Journal of the Astronautical Sciences, 410(4), 439–517. Spasojevic, Igor, Murali, Varun, & Karaman, Sertac. (2020). Joint fea- ture selection and time optimal path parametrization for high speed vision-aided navigation. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Autonomous Robots Springer Journals

Loading next page...
 
/lp/springer-journals/alphapilot-autonomous-drone-racing-aOC3AoDW9y
Publisher
Springer Journals
Copyright
Copyright © The Author(s) 2021
ISSN
0929-5593
eISSN
1573-7527
DOI
10.1007/s10514-021-10011-y
Publisher site
See Article on Publisher Site

Abstract

This paper presents a novel system for autonomous, vision-based drone racing combining learned data abstraction, nonlinear filtering, and time-optimal trajectory planning. The system has successfully been deployed at the first autonomous drone racing world championship: the 2019 AlphaPilot Challenge. Contrary to traditional drone racing systems, which only detect the next gate, our approach makes use of any visible gate and takes advantage of multiple, simultaneous gate detections to compensate for drift in the state estimate and build a global map of the gates. The global map and drift-compensated state estimate allow the drone to navigate through the race course even when the gates are not immediately visible and further enable to plan a near time-optimal path through the race course in real time based on approximate drone dynamics. The proposed system has been demonstrated to successfully guide the drone through tight race courses reaching speeds up to 8m/s and ranked second at the 2019 AlphaPilot Challenge. Keyword Drone racing . Agile flight . Aerial vehicles 1 Introduction fast dynamic obstacles and enhance their maneuverability in confined spaces. Human pilots have shown that drones are 1.1 Motivation capable of flying through complex environments, such as race courses, at breathtaking speeds. However, autonomous Autonomous drones have seen a massive gain in robustness drones are still far from human performance in terms of in recent years and perform an increasingly large set of tasks speed, versatility, and robustness, so that a lot of research across various commercial industries; however, they are still and innovation is needed in order to fill this gap. far from fully exploiting their physical capabilities. Indeed, In order to push the capabilities and performance of most autonomous drones only fly at low speeds near hover autonomous drones, in 2019, Lockheed Martin and the Drone 1,2 conditions in order to be able to robustly sense their environ- Racing League have launched the AlphaPilot Challenge , ment and to have sufficient time to avoid obstacles. Faster an open innovation challenge with a grand prize of $1 million. and more agile flight could not only increase the flight range The goal of the challenge is to develop a fully autonomous of autonomous drones, but also improve their ability to avoid drone that navigates through a race course using machine vision, and which could one day beat the best human pilot. While other autonomous drone races Moon et al. (2017, Video of our approach:https://youtu.be/DGjwm5PZQT8 2019) focus on complex navigation, the AlphaPilot Chal- Talk at RSS 2020:https://youtu.be/k6vGEj1ZZWc lenge pushes the limits in terms of speed and course size to RSS paper:http://rpg.ifi.uzh.ch/docs/RSS20_Foehn.pdf advance the state of the art and enter the domain of human performance. Due to the high speeds at which drones must fly Philipp Foehn, Dario Brescianini, and Elia Kaufmann have contributed in order to beat the best human pilots, the challenging visual equally. environments (e.g., low light, motion blur), and the limited All authors are with the Robotics and Perception Group. computational power of drones, autonomous drone racing B Philipp Foehn https://thedroneracingleague.com/airr/ foehn@ifi.uzh.ch https://www.nytimes.com/2019/03/26/technology/alphapilot-ai- drone-racing.html Dep. of Informatics, Dep. of Neuroinformatics, University of Zurich and ETH, Zurich, Switzerland 123 Autonomous Robots by optic flow is then used to steer the drone towards the detected gate. While this approach is successfully deployed on a real robotic system, the generated control commands do not account for the underactuated system dynamics of the quadrotor, constraining this method to low-speed flight. The approach presented in Kaufmann et al. (2018) also relies on relative gate data but has the advantage that it works even when no gate is visible. In particular, it uses a CNN to directly infer relative gate poses from images and fuse the results with a VIO state estimate. However, the CNN does not perform well when multiple gates are visible as it is frequently the case for drone racing. Assuming knowledge of the platform state and the envi- ronment, there exist many approaches which can reliably generate feasible trajectories with high efficiency. The most Fig. 1 Our AlphaPilot drone waiting on the start podium to prominent line of work exploits the quadrotor’s underactu- autonomously race through the gates ahead ated nature and the resulting differentially-flat output states Mellinger et al. (2012); Mueller et al. (2015), where trajecto- raises fundamental challenges in real-time state estimation, ries are described as polynomials in time. Other approaches perception, planning, and control. additionally incorporate obstacle avoidance Zhou et al. (2019); Gao et al. (2019) or perception constraints Falanga 1.2 Related work et al. (2018); Spasojevic et al. (2020). However, in the context of drone racing, specifically the AlphaPilot Challenge, obsta- Autonomous navigation in indoor or GPS-denied environ- cle avoidance is often not needed, but time-optimal planning ments typically relies on simultaneous localization and is of interest. There exists a handful of approaches for time- mapping (SLAM), often in the form of visual-inertial odom- optimal planning Hehn et al. (2012); Loock et al. (2013); etry (VIO) Cadena et al. (2016). There exists a variety of VIO Ryou et al. (2020); Foehn and Scaramuzza (2020). However, algorithms, e.g., Mourikis and Roumeliotis (2007); Bloesch while Hehn et al. (2012); Loock et al. (2013) are limited to 2D scenarios and only find trajectories between two given et al. (2015); Qin et al. (2018); Forster et al. (2017a), that are based on feature detection and tracking that achieve very states, Ryou et al. (2020) requires simulation and real-world good results in general navigation tasks Delmerico and Scara- data obtained on the track, and the method of Foehn and muzza (2018). However, the performance of these algorithms Scaramuzza (2020) is not applicable due to computational significantly degrades during agile and high-speed flight as constraints. encountered in drone racing. The drone’s high translational and rotational velocities cause large optic flow, making robust 1.3 Contribution feature detection and tracking over sequential images dif- ficult and thus causing substantial drift in the VIO state The approach contributed herein builds upon the work estimate Delmerico et al. (2019). of Kaufmann et al. (2018) and fuses VIO with a robust CNN- To overcome this difficulty, several approaches exploiting based gate corner detection using an extended Kalman filter the structure of drone racing with gates as landmarks have (EKF), achieving high accuracy at little computational cost. been developed, e.g., Li et al. (2019); Jung et al. (2018); The gate corner detections are used as static features to com- Kaufmann et al. (2018), where the drone locates itself rel- pensate for the VIO drift and to align the drone’s flight path ative to gates. In Li et al. (2019), a handcrafted process is precisely with the gates. Contrary to all previous works Li used to extract gate information from images that is then et al. (2019); Jung et al. (2018); Kaufmann et al. (2018), fused with attitude estimates from an inertial measurement which only detect the next gate, our approach makes use of unit (IMU) to compute an attitude reference that guides the any gate detection and even profits from multiple simulta- drone towards the visible gate. While the approach is compu- neous detections to compensate for VIO drift and build a tationally very light-weight, it struggles with scenarios where global gate map. The global map allows the drone to nav- multiple gates are visible and does not allow to employ more igate through the race course even when the gates are not sophisticated planning and control algorithms which, e.g., immediately visible and further enables the usage of sophis- plan several gates ahead. In Jung et al. (2018), a convolu- ticated path planning and control algorithms. In particular, a tional neural network (CNN) is used to retrieve a bounding computationally efficient, sampling-based path planner (see box of the gate and a line-of-sight-based control law aided e.g., LaValle (2006), and references therein) is employed that 123 Autonomous Robots plans near time-optimal paths through multiple gates ahead and is capable of adjusting the path in real time if the global map is updated. This paper extends our previous work Foehn et al. (2020) by including a more detailed elaboration on our gate corner detection in Sect. 4 with an ablation study in Sect. 8.1, further details on the fusion of VIO and gate detection in Sect. 5, and a description of the path parameterization in Sect. 6, com- pleted by an ablation study on the planning horizon length in C z e B Sect. 8.3. e I I e e x y 2 AlphaPilot race format and drone Fig. 2 Illustration of the race drone with its body-fixed coordinate frame B in blue and a camera coordinate frame C in red 2.1 Race format trolled the total thrust f along the drone’s z-axis (see Fig. 2) From more than 400 teams that participated in a series of and the angular velocity, ω = ω ,ω ,ω , in the body-fixed x y z qualification tests including a simulated drone race Guerra coordinate frame B. et al. (2019), the top nine teams were selected to compete in the 2019 AlphaPilot Challenge. The challenge consists 2.3 Drone model of three qualification races and a final championship race at which the six best teams from the qualification races compete Bold lower case and upper case letters will be used to for the grand prize of $1 million. Each race is implemented denote vectors and matrices, respectively. The subscripts in as a time trial competition in which each team is given three p = p − p are used to express a vector from point I I I CB B C attempts to fly through a race course as fast a possible with- C to point B expressed in frame I. Without loss of generality, out competing drones on the course. Taking off from a start I is used to represent the origin of frame I, and B represents podium, the drones have to autonomously navigate through the origin of coordinate frame B. For the sake of readability, a sequence of gates with distinct appearances in the cor- the leading subscript may be omitted if the frame in which rect order and terminate at a designated finish gate. The race the vector is expressed is clear from context. course layout, gate sequence, and position are provided ahead The drone is modelled as a rigid body of mass m with rotor of each race up to approximately ±3 m horizontal uncer- drag proportional to its velocity acting on it Kai et al. (2017). tainty, enforcing teams to come up with solutions that adapt The translational degrees-of-freedom are described by the to the real gate positions. Initially, the race courses were position of its center of mass, p = p , p , p , with B,x B,y B,z planned to have a lap length of approximately 300 m and respect to an inertial frame I as illustrated in Fig. 2.The required the completion up to three laps. However, due to rotational degrees-of-freedom are parametrized using a unit technical difficulties, no race required to complete multiple quaternion, q , where R = R(q ) denotes the rota- IB IB IB laps and the track length at the final championship race was tion matrix mapping a vector from the body-fixed coordinate limited to about 74 m. frame B to the inertial frame I Shuster (1993). A unit quater- nion, q, consists of a scalar q and a vector q ˜ = q , q , q w x y z 2.2 Drone specifications and is defined as q = (q , q ˜ ) Shuster (1993). The drone’s equations of motion are All teams were provided with an identical race drone (Fig. 1) that was approximately 0.7 m in diameter, weighed m p ¨ = R f e − R DR v − m g, (1) IB IB B B z IB 3.4kg, and had a thrust-to-weight ratio of 1.4. The drone was equipped with a NVIDIA Jetson Xavier embedded com- q ˙ = ⊗ q , (2) IB IB puter for interfacing all sensors and actuators and handling all 2 computation for autonomous navigation onboard. The sensor ◦ B suite included two ± 30 forward-facing stereo camera pairs where f and ω are the force and bodyrate inputs, e = (Fig. 2), an IMU, and a downward-facing laser rangefinder (0, 0, 1) is the drone’s z-axis expressed in its body-fixed (LRF). All sensor data were globally time stamped by soft- frame B, D = diag(d , d , 0) is a constant diagonal matrix x y ware upon reception at the onboard computer. Detailed containing the rotor drag coefficients, v = p ˙ denotes the specifications of the available sensors are given in Table 1. drone’s velocity, g is gravity and ⊗ denotes the quaternion The drone was equipped with a flight controller that con- multiplication operator Shuster (1993). The drag coefficients 123 Autonomous Robots Table 1 Sensor specifications Sensor Model Rate Details Cam Leopard imaging IMX 264 60Hz Global shutter, color resolution: 1200 × 720 -4 -3 IMU Bosch BMI088 430Hz Range: ±24g, ±34.5rad/s resolution: 7e g,1e rad/s LRF Garmin LIDAR-Lite v3 120Hz Range: 1–40 m resolution: 0.01 m Sensor Interface Perception State Estimation Planning & Control Drone Interface Camera 1 EKF Gate Path gate map State Detection Camera 2 Planning Estimation trajectory Camera 3 Position Total vehicle state Camera 4 Visual Control Thrust State Inertial IMU attitude Prediction Odometry Attitude Angular Laser Control Velocity Rangefinder Fig. 3 Overview of the system architecture and its main components. All components within a dotted area run in a single thread were identified experimentally to be d = 0.5kg/s and gate detections to align the VIO estimate with the global d = 0.25 kg/s. gate map, i.e., compensates for the VIO drift. Computing the state estimate, in particular interfacing the cameras and running VIO, introduces latency in the order of 130 ms to the system. In order to be able to achieve a high bandwidth of 3 System overview the control system despite large latencies, the vehicle’s state estimate is predicted forward to the vehicle’s current time The system is composed of five functional groups: Sensor using the IMU measurements. interface, perception, state estimation, planning and control, and drone interface (see Fig. 3). In the following, a brief introduction to the functionality of our proposed perception, 3.3 Planning and control state estimation, and planning and control system is given. The global gate map and the latency-compensated state esti- mate of the vehicle are used to plan a near time-optimal path 3.1 Perception through the next N gates starting from the vehicle’s cur- rent state (see Sect. 6). The path is re-planned every time Of the two stereo camera pairs available on the drone, only (i) the vehicle passes through a gate, (ii) the estimate of the the two central forward-facing cameras are used for gate gate map or (iii) the VIO drift are updated significantly, i.e., detection (see Sect. 4) and, in combination with IMU mea- large changes in the gate positions or VIO drift. The path is surements, to run VIO. The advantage is that the amount of image data to be processed is reduced while maintaining a tracked using a cascaded control scheme (see Sect. 7) with an outer proportional-derivative (PD) position control loop very large field of view. Due to its robustness, multi-camera capability and computational efficiency, ROVIO Bloesch and an inner proportional (P) attitude control loop. Finally, the outputs of the control loops, i.e., a total thrust and angular et al. (2015) has been chosen as VIO pipeline. At low speeds, velocity command, are sent to the drone. ROVIO is able to provide an accurate estimate of the quadro- tor vehicle’s pose and velocity relative to its starting position, however, at larger speeds the state estimate suffers from drift. 3.4 Software architecture 3.2 State estimation The NVIDIA Jetson Xavier provides eight CPU cores, how- ever, four cores are used to run the sensor and drone interface. In order to compensate for a drifting VIO estimate, the output The other four cores are used to run the gate detection, VIO, of the gate detection and VIO are fused together with the EKF state estimation, and planning and control, each in a sep- measurements from the downward-facing laser rangefinder arate thread on a separate core. All threads are implemented (LRF) using an EKF (see Sect. 5). The EKF estimates a global asynchronously to run at their own speed, i.e., whenever new map of the gates and, since the gates are stationary, uses the data is available, in order to maximize data throughput and to 123 Autonomous Robots reduce processing latency. The gate detection thread is able 4.1 Stage 1: predicting corner maps and part affinity to process all camera images in real time at 60Hz, whereas fields the VIO thread only achieves approximately 35Hz. In order to deal with the asynchronous nature of the gate detection and In the first detection stage, each input image, I ,is w×h×3 VIO thread and their output, all data is globally time stamped mapped by a neural network into a set of N = 4 corner and integrated in the EKF accordingly. The EKF thread runs maps, C , and N = 4PAFs, E .Pre- w×h×N E w×h×(N ·2) C E every time a new gate or LRF measurement is available. The dicted corner maps as well as PAFs are illustrated in Fig. 4, planning and control thread runs at a fixed rate of 50Hz. To 2nd and 3rd column. The network is trained in a super- achieve this, the planning and control thread includes the vised manner by minimizing the Mean-Squared-Error loss state prediction which compensates for latencies introduced between the network prediction and the ground-truth maps. by the VIO. In the following, ground-truth maps for both map types are explained in detail. 4.1.1 Corner maps For each corner class, j ∈ C , C := { , , , },a j j TL TR BL BR ground-truth corner map, C (s), is represented by a single- 4 Gate detection channel map of the same size as the input image and indicates the existence of a corner of class j at pixel location s in the To correct for drift accumulated by the VIO pipeline, the image. The value at location s ∈ I in C is defined by a gates are used as distinct landmarks for relative localization. Gaussian as In contrast to previous CNN-based approaches to gate detec- tion, we do not infer the relative pose to a gate directly, but ∗ 2 instead segment the four corners of the observed gate in the s − s j 2 C (s) = exp − , (3) input image. These corner segmentations represent the like- lihood of a specific gate corner to be present at a specific pixel coordinate. To represent a value proportional to the likelihood, the maps are trained on Gaussians of the corner ∗ where s denotes the ground truth image position of the near- projections. This allows the detection of an arbitrary amount est corner with class j. The choice of the parameter σ controls of gates, and allows for a more principled inclusion of gate the width of the Gaussian. We use σ = 7 pixel in our imple- measurements in the EKF through the use of reprojection mentation. Gaussians are used to account for small errors in error. Specifically, it exhibits more predictable behavior for the ground-truth corner positions that are provided by hand. partial gate observations and overlapping gates, and allows Ground-truth corner maps are generated for each individual to suppress the impact of Gaussian noise by having multi- gate in the image separately and then aggregated. Aggre- ple measurements relating to the same quantity. Since the gation is performed by taking the pixel-wise maximum of exact shape of the gates is known, detecting a set of charac- the individual corner maps, as this preserves the distinction teristic points per gate allows to constrain the relative pose. between close corners. For the quadratic gates of the AlphaPilot Challenge, these characteristic points are chosen to be the inner corner of the gate border (see Fig. 4, 4th column). However, just detect- 4.1.2 Part affinity fields ing the four corners of all gates is not enough. If just four corners of several gates are extracted, the association of cor- We define a PAF for each of the four possible classes of ners to gates is undefined (see Fig. 4, 3rd row, 2nd column). edges, defined by its two connecting corners as (k, l) ∈ To solve this problem, we additionally train our network to E := {( , ), ( , ), ( , ), ( , )}. For each KL TL TR TR BR BR BL BL TL extract so-called Part Affinity Fields (PAFs), as proposed edge class, (k, l), the ground-truth PAF, E (s), is repre- (k,l) by Cao et al. (2017). These are vector fields, which, in our sented by a two-channel map of the same size as the input case, are defined along the edges of the gates, and point from image and points from corner k to corner l of the same gate, one corner to the next corner of the same gate, see column provided that the given image point s lies within distance d three in Fig. 4. The entire gate detection pipeline consists of such an edge. We use d = 10 pixel in our implementation. of two stages: (1) predicting corner maps and PAFs by the Let G be the set of gates g and S be the set of image (k,l),g neural network, (2) extracting single edge candidates from points that are within distance d of the line connecting the ∗ ∗ the network prediction and assembling them to gates. In the corner points s and s belonging to gate g. Furthermore, let k l ∗ ∗ following, both stages are explained in detail. v be the unit vector pointing from s to s of the same k,l,g k l 123 Autonomous Robots Fig. 4 The gate detection module returns sets of corner points for each corner maps, C (second column), and PAFs, E (third column), are dis- gate in the input image (fourth column) using a two-stage process. In the played in a single image each. While color encodes the corner class for first stage, a neural network transforms an input image, I (first C, it encodes the direction of the 2D vector fields for E. The yellow w×h×3 column), into a set of confidence maps for corners, C (second lines in the bottom of the second column show the six edge candidates of w×h×4 column), and Part Affinity Fields (PAFs) Cao et al. (2017), E the edge class (TL, TR) (the TL corner of the middle gate is below the w×h×(4·2) (third column). In the second stage, the PAFs are used to associate sets detection threshold), see Sect. 4.2. Best viewed in color (Color figure of corner points that belong to the same gate. For visualization, both online) gate. Then, the part affinity field, E (s), is defined as: where s(u) lineraly interpolates between the two corner can- (k,l) didate locations s and s . In practice, S is approximated by k l ∗ uniformly sampling the integrand. v ifs ∈ S , g ∈ G k,l,g (k,l),g E (s) = (4) The line integral S is used as metric to associate corner (k,l) 0 otherwise. candidates to gate detections. The goal is to find the optimal assignment for the set of all possible corner candidates to As in the case of corner maps, PAFs are generated for each gates. As described in Cao et al. (2017), finding this optimal individual gate in the image separately and then aggregated. assignment corresponds to a K -dimensional matching prob- In case a point s lies in S of several gates, the v of (k,l),g k,l,g lem that is known to be NP-Hard West (2001). Following Cao all corresponding gates are averaged. et al. (2017), the problem is simplified by decomposing the matching problem into a set of bipartite matching sub- 4.2 Stage 2: corner association problems. Matching is therefore performed independently for each edge class. Specifically, the following optimization At test time, discrete corner candidates, s , for each corner problem represents the bipartite matching subproblem for class, j ∈ C , are extracted from the predicted corner map edge class (k, l): using non-maximum suppression and thresholding. For each corner class, there might be several corner candidates, due max S = S((s , s )) · z (6) (k,l) k l kl to multiple gates in the image or false positives. These cor- k∈D l∈D k l ner candidates allow the formation of an exhaustive set of s.t. ∀k ∈ D , z ≤ 1 , (7) k kl edge candidates, {(s , s )}, see the yellow lines in Fig. 4. k l l∈D Given the corresponding PAF, E (s), each edge candi- (k,l) date is assigned a score which expresses the agreement of ∀l ∈ D , z ≤ 1 , (8) l kl that candidate with the PAF. This score is given by the line k∈D integral where S is the cumulative matching score and D , D (k,l) k l u=1 denote the set of corner candidates for edge class (k, l).The s − s l k S((s , s )) = E (s(u)) · du, (5) k l (k,l) variable z ∈{0, 1} indicates whether two corner candidates kl s − s u=0 l k 123 Autonomous Robots are connected. Equations (7) and (8) enforce that no two rotational misalignment of the VIO origin frame, V, with edges share the same corner. Above optimization problem respect to the inertial frame, I, represented by p and q , V IV can be solved using the Hungarian method Kuhn (1955), jointly with the gate positions, p , and gate heading, ϕ . IG i i resulting in a set of edge candidates for each edge class (k, l). It can thus correct for an imprecise initial position estimate, With the bipartite matching problems being solved for all VIO drift, and uncertainty in gate positions. The EKF’s state edge classes, the pairwise associations can be extended to space at time t is x = x(t ) with covariance P described k k k k sets of associated edges for each gate. by 4.3 Training data x = p , q , p ,ϕ ,..., p ,ϕ . (9) V IV G IG G IG 0 0 N −1 N −1 The neural network is trained in a supervised fashion using The drone’s corrected pose, p , q , can then be com- a dataset recorded in the real world. Training data is gen- B IB puted from the VIO estimate, p , q , by transforming erated by recording video sequences of gates in 5 different VB VB it from frame V into the frame I using p , q as environments. Each frame is annotated with the corners of V IV all gates visible in the image using the open source image annotation software labelme , which is extended with KLT- p = p + R · p , q = q · q . (10) IV B V VB IB IV VB Tracking for semi-automatic labelling. The resulting dataset used for training consists of 28k images and is split into 24k samples for training and 4k samples for validation. At train- All estimated parameters are expected to be time-invariant ing time, the data is augmented using random rotations of up but subject to noise and drift. This is modelled by a Gaussian to 30 and random changes in brightness, hue, contrast and random walk, simplifying the EKF process update to: saturation. x = x , P = P + t Q, (11) 4.4 Network architecture and deployment k+1 k k+1 k k The network architecture is designed to optimally trade- where Q is the random walk process noise. For each mea- off between computation time and accuracy. By conducting surement z with noise R the predicted apriori estimate, x , a neural network architecture search, the best performing is corrected with measurement function, h(x ), and Kalman architecture for the task is identified. The architecture search gain, K , resulting in the a posteriori estimate, x ,as is limited to variants of U-Net Ronneberger et al. (2015) due to its ability to perform segmentation tasks efficiently −1 − − K = P H H P H + R , k k with a very limited amount of labeled training data. The best k k k k + − − performing architecture is identified as a 5-level U-Net with x = x + K z − h(x ) , k k k k k + − [12, 18, 24, 32, 32] convolutional filters of size [3, 3, 3, 5, 7] P = ( I − K H ) P , (12) k k k k per level and a final additional layer operating on the output of the U-Net containing 12 filters. At each layer, the input with h(x ), the measurement function with Jacobian H . feature map is zero-padded to preserve a constant height However, the filter state includes a rotation quaternion and width throughout the network. As activation function, constrained to unit norm, q  = 1. This is effectively LeakyReLU with α = 0.01 is used. For deployment on the IV an over-parameterization in the filter state space and can Jetson Xavier, the network is ported to TensorRT 5.0.2.6. To lead to poor linearization as well as underestimation of the optimize memory footprint and inference time, inference is covariance. To apply the EKFs linear update step on the performed in half-precision mode (FP16) and batches of two over-parameterized quaternion, it is lifted to its tangent space images of size 592 × 352 are fed to the network. description, similar to Forster et al. (2017b). The quaternion q is composed of a reference quaternion, q , which IV IV ref is adjusted after each update step, and an error quaternion, 5 State estimation q , of which only its vector part, q ˜ , is in the EKF’s V V V V ref ref state space. Therefore we get The nonlinear measurement models of the VIO, gate detec- tion, and laser rangefinder are fused using an EKF Kalman (1960). In order to obtain the best possible pose accuracy 1 − q ˜ · q ˜ V V ref V V ref q = q · q q = relative to the gates, the EKF estimates the translational and IV IV V V V V ref ref ref q ˜ V V ref 3 (13) https://github.com/wkentaro/labelme 123 Autonomous Robots from which we can derive the Jacobian of any measurement where p and R describe a constant transformation BC BC function, h(x), with respect to q by the chain rule as between the drone’s body frame B and camera frame C (see IV Fig. 2). The Jacobian with respect to the EKF’s state space ∂ ∂ ∂ q IV is derived using the chain rule, h(x) = h(x) · (14) ∂ q ˜ ∂ q ∂ q ˜ V V IV V V ref ref ⎡  ⎤ ∂ p (x) ∂ ∂ h (x) −q ˜ Co Gate ij V V ref h (x) = · , (20) Gate ∂ x ∂ p (x) ∂ x ⎣ ⎦ 1−q ˜ ·q ˜ = h(x) ·[q ] Co × V V V V ij IV ref ref ref ∂ q ˜ IV 3×3 where the first term representing the derivative of the pro- (15) jection, and the second term represents the derivative with respect to the state space including gate position and ori- where we arrive at (15)byusing (13)in (14) and use [q ] IV ref entation, and the frame alignment, which can be further to represent the matrix resulting from a lefthand-side multi- decomposed using (14). plication with q . IV ref 5.1.2 Gate correspondences 5.1 Measurement modalities The gate detection (see Fig. 4) provides sets of m measure- All measurements up to the camera frame time t are passed ments, S ={s ,..., s }, corresponding to the Co 0 Co m−1 to the EKF together with the VIO estimate, p and q , i ˆ ˆ ij ij VB,k VB,k with respect to the VIO frame V. Note thate the VIO estimate unknown gate i at known corners j ∈ C . To identify the is assumed to be a constant parameter, not a filter state, which correspondences between a detection set S and the gate G vastly simplifies derivations ad computation, leading to an in our map, we use the square sum of reprojection error. For efficient yet robust filter. this, we first compute the reprojection of all gate corners, s , according to (16) and then compute the square error Co ij 5.1.1 Gate measurements sum between the measurement set, S , and the candidates, s . Finally, the correspondence is established to the gate Co ij Gate measurements consist of the image pixel coordinates, G which minimizes the square error sum, as in s , of a specific gate corner. These corners are denoted Co ij with top left and right, and bottom left and right, as in j ∈ C , j argmin (s − s ) (s − s ) (21) Co Co Co Co ˆ ij ˆ ij ij ij i ∈[0,N −1] C := { , , , } and the gates are enumerated by i ∈ s ∈S j TL TR BL BR Co ˆ i j [0, N − 1]. All gates are of equal width, w, and height, h,so that the corner positions in the gate frame, G , can be written as p = (0, ±w, ±h). The measurement equation can G Co i ij 2 5.1.3 Laser rangefinder measurement be written as the pinhole camera projection Szeliski (2010) of the gate corner into the camera frame. A pinhole camera The drone’s laser rangefinder measures the distance along maps the gate corner point, p , expressed in the camera Co ij the drones negative z-axis to the ground, which is assumed frame, C, into pixel coordinates as to be flat and at a height of 0 m. The measurement equation can be described by 1 f 0 c x x h (x) = s = p , (16) Gate Co ij Co ij 0 f c [ p ] y y Co ij [ p ] [ p + R p ] z z B V IV VB h (x) = = . (22) LRF B B [ R e ] [ R R e ] IB z IV VB z z z where [·] indicates the scalar z-component of a vector, f z x and f are the camera’s focal lengths and c , c is the y x y The Jacobian with respect to the state space is again derived camera’s optical center. The gate corner point, p ,isgiven Co ∂h ∂h LRF LRF ij by and and further simplified using (14). ∂ p ∂ q V IV by p = R p + R p − p , (17) IG Co G i G Co C ij IC i i j 6 Path planning with p and R being the transformation between the iner- IC For the purpose of path planning, the drone is assumed to be a tial frame I and camera frame C, point mass with bounded accelerations as inputs. This simpli- fication allows for the computation of time-optimal motion p = p + R p + R p , (18) IV VB C V VB BC primitives in closed-form and enables the planning of approx- R = R R R , (19) IC IV VB BC imate time-optimal paths through the race course in real time. 123 Autonomous Robots Even though the dynamics of the quadrotor vehicle’s accel- B,x eration cannot be neglected in practice, it is assumed that this p B,y B,z simplification still captures the most relevant dynamics for -5 path planning and that the resulting paths approximate the -10 true time-optimal paths well. In order to facilitate the track- ing of the approximate time-optimal path, polynomials of B,x B,y order four are fitted to the path which yield smoother posi- B,z tion, velocity and acceleration commands, and can therefore -5 be better tracked by the drone. -10 In the following, time-optimal motion primitives based 00.51.01.5 1.96 on the simplified dynamics are first introduced and then a Time [s] path planning strategy based on these motion primitives is Fig. 5 Example time-optimal motion primitive starting from rest at presented. Finally, a method to parameterize the time-optimal the origin to a random final position with non-zero final velocity. The path is introduced. velocities are constrained to ±7.5m/s and the inputs to ±12 m/s .The dotted lines denote the per-axis time-optimal maneuvers 6.1 Time-optimal motion primitive ∗ ∗ ∗ primitive to exist, a new parameter α ∈[0, 1] is introduced The minimum times, T , T and T , required for the vehicle x y z that scales the acceleration bounds, i.e., the applied control to fly from an initial state, consisting of position and veloc- inputs are scaled to αu and αu , respectively. Fig. 5 depicts ity, to a final state while satisfying the simplified dynamics x the position and velocity of an example time-optimal motion p ¨ (t ) = u(t ) with the input acceleration u(t ) being con- primitive. strained to u ≤ u(t ) ≤ u are computed for each axis individually. Without loss of generality, only the x-axis is 6.2 Sampling-based receding horizon path planning considered in the following. Using Pontryagin’s maximum principle Bertsekas (1995), it can be shown that the optimal The objective of the path planner is to find the time-optimal control input is bang-bang in acceleration, i.e., has the form path from the drone’s current state to the final gate, passing through all the gates in the correct order. Since the previously u , 0 ≤ t ≤ t , ∗ x u (t ) = (23) x introduced motion primitive allows the generation of time- ∗ ∗ u , t < t ≤ T , optimal motions between any initial and any final state, the time-optimal path can be planned by concatenating a time- or vice versa with the control input first being u followed by optimal motion primitive starting from the drone’s current u . In order to control the maximum velocity of the vehicle, (simplified) state to the first gate with time-optimal motion e.g., to constrain the solutions to ranges where the simplified primitives that connect the gates in the correct order until the dynamics approximate the true dynamics well or to limit the final gate. This reduces the path planning problem to finding motion blur of the camera images, a velocity constraint of the the drone’s optimal state at each gate such that the total time form v ≤ v (t ) ≤ v can be added, in which case the opti- B B is minimized. To find the optimal path, a sampling-based mal control input has bang-singular-bang solution Maurer strategy is employed where states at each gate are randomly (1977) sampled and the total time is evaluated subsequently. In par- ticular, the position of each sampled state at a specific gate ⎪ u , 0 ≤ t ≤ t , ⎨ x 1 is fixed to the center of the gate and the velocity is sampled ∗ ∗ u (t ) = (24) 0, t < t ≤ t , x uniformly at random such the velocity lies within the con- 1 2 ∗ ∗ u , t < t ≤ T , straints of the motion primitives and the angle between the velocity and the gate normal does not exceed a maximum or vice versa. It is straightforward to verify that there exist angle, ϕ It is trivial to show that as the number of sam- max closed-form solutions for the minimum time, T ,aswellas pled states approaches infinity, the computed path converges the switching times, t , in both cases (23)or (24). to the time-optimal path. Once the minimum time along each axis is computed, In order to solve the problem efficiently, the path planning ∗ ∗ ∗ ∗ the maximum minimum time, T = max(T , T , T ),is problem is interpreted as a shortest path problem. At each x y z computed and motion primitives of the same form as in (23) gate, M different velocities are sampled and the arc length or (24) are computed among the two faster axes but with from each sampled state at the previous gate is set to be equal ∗ ∗ the final time constrained to T such that trajectories along to the duration, T , of the time-optimal motion primitive that each axis end at the same time. In order for such a motion guides the drone from one state to the other. Due to the exis- Velocity [m/s] Position [m] Autonomous Robots tence of a closed-form expression for the minimum time, T , control loop can track setpoint changes perfectly, i.e., without setting up and solving the shortest path problem can be done any dynamics or delay. very efficiently using, e.g., Dijkstra’s algorithm Bertsekas (1995) resulting in the optimal path p (t ). In order to further 7.1 Position control reduce the computational cost, the path is planned in a reced- ing horizon fashion, i.e., the path is only planned through the The position control loop along the inertial z-axis is designed next N gates. such that it responds to position errors p = p − p 6.3 Path parameterization B ,z B ,z B,z err ref in the fashion of a second-order system with time constant Due to the simplifications of the dynamics that were made τ and damping ratio ζ , when computing the motion primitives, the resulting path is pos,z pos,z infeasible with respect to the quadrotor dynamics (1) and 1 2ζ pos,z (2) and thus is impossible to be tracked accurately by the p ¨ = p + p ˙ +¨ p . (31) B,z B ,z B ,z B ,z err err ref τ τ drone. To simplify the tracking of the time-optimal path, the pos,z pos,z path is approximated by fourth order polynomials in time. Similarly, two control loops along the inertial x- and y-axis In particular, the path is divided into multiple segments of are shaped to make the horizontal position errors behave equal arc length. Let t ∈[t , t ) be the time interval of the k k+1 like second-order systems with time constants τ and pos,xy k-th segment. In order to fit the polynomials, p ¯ (t ),tothe damping ratio ζ . Inserting (31) into the translational pos,xy k-th segment of the time-optimal path, we require that the dynamics (1), the total thrust, f , is computed to be initial and final position and velocity are equal to those of the time-optimal path, i.e., [m p ¨ + g + R DR v ] IB B z ref IB f = . (32) ∗ ∗ [R ] p ¯ (t ) = p (t ), p ¯ (t ) = p (t ), (25) IBe k k k+1 k+1 z k k ∗ ∗ ˙ ˙ ¯p (t ) = p ˙ (t ), ¯p (t ) = p ˙ (t ), (26) k k k+1 k+1 k k 7.2 Attitude control and that the positions at t = (t − t ) /2 coincide as well: k+1 k The required acceleration from the position controller deter- mines the orientation of the drone’s z-axis and is used, in t + t t + t k+1 k k+1 k p ¯ = p . (27) k combination with a reference yaw angle, ϕ , to compute ref 2 2 the drone’s reference attitude. The reference yaw angle is chosen such that the drone’s x-axis points towards the refer- The polynomial parameterization p (t ) of the k-th segment ence position 5 m ahead of the current position, i.e., that the is then given by drone looks in the direction it flies. A nonlinear attitude con- troller similar to Brescianini and D’Andrea (2018) is applied 4 3 2 p ¯ (t ) = a s + a s + a s + a s + a , (28) 4,k 3,k 2,k 1,k 0,k that prioritizes the alignment of the drone’s z-axis, which is crucial for its translational dynamics, over the correction of with s = t − t being the relative time since the start of k-th the yaw orientation: segment. The velocity and acceleration required for the drone ⎡ ⎤ to track this polynomial path can be computed by taking the q q − q q w x y z 2sgn(q ) derivatives of (28), yielding −1 ⎣ ⎦ ω = T q q + q q , (33) w y x z att 2 2 q + q w z z 3 2 ¯p (t ) = 4a s + 3a s + 2a s + a , (29) 4,k 3,k 2,k 1,k ¯p (t ) = 12a s + 6a s + 2a . (30) where q , q , q and q are the components of the attitude 4,k 3,k 2,k k w x y z −1 error, q ⊗ q , and where T is a diagonal matrix att IB IB ref containing the per-axis first-order system time constants for 7 Control small attitude errors. This section presents a control strategy to track the near time-optimal path from Sect. 6. The control strategy is based 8 Results on a cascaded control scheme with an outer position control loop and an inner attitude control loop, where the position The proposed system was used to race in the 2019 AlphaPi- control loop is designed under the assumption that the attitude lot championship race. The course at the championship race 123 Autonomous Robots 8m/s p p B VB Final Gate Start Start Gate 1 Gate 1 0 0 6m/s Gate 2 Gate 2 -10 -10 4m/s -20 -20 2m/s Gate 3 Gate 3 Start -30 -30 Gate 4 Gate 4 Gate 1 Final Gate Final Gate 0m/s -10 0 10 20 30 -10 0 10 20 30 04 8 12 p [m] p [m] p [m] x x x Fig. 6 Top view of the planned (left) and executed (center) path at the championship race, and an executed multi-lap path at a testing facility (right). Left: Fastest planned path in color, sub-optimal sampled paths in gray. Center: VIO trajectory as p and corrected estimate as p (Color figure VB B online) consisted of five gates and had a total length of 74 m. A top Table 2 Comparison of different network architectures with respect to intersection over union (IoU), precision (Pre.) and recall (Rec.). view of the race course as well as the results of the path plan- The index in the architecture name denotes the number of levels in ning and the fastest actual flight are depicted in Fig. 6 (left the U-Net. All networks contain one layer per level with kernel sizes and center). With the motion primitive’s maximum veloc- of [3, 3, 5, 7, 7] and [12, 18, 24, 32, 32] filters per level. Architectures ity set to 8 m/s, the drone successfully completed the race labelled with ’L’ contain twice the amount of filters per level. Timings are measured for single input images of size 352 x 592 on a desktop course in a total time of 11.36 s, with only two other teams computer equipped with an NVIDIA RTX 2080 Ti also completing the full race course. The drone flew at an Arch. IoU Pre. Rec. #params Latency [s] average velocity of 6.5m/s and reached the peak velocity of 8m/s multiple times. Note that due to missing ground truth, UNet-5L 0.966 0.997 0.967 613k 0.106 Fig. 6 only shows the estimated and corrected drone position. UNet-5 0.964 0.997 0.918 160k 0.006 The system was further evaluated at a testing facility where UNet-4L 0.948 0.997 0.920 207k 0.085 there was sufficient space for the drone to fly multiple laps UNet-4 0.941 0.989 0.862 58k 0.005 (see Fig. 6, right), albeit the course consisted of only two UNet-3L 0.913 0.991 0.634 82k 0.072 gates. The drone was commanded to pass 4 times through UNet-3 0.905 0.988 0.520 27k 0.005 gate 1 before finishing in the final gate. Although the gates were not visible to the drone for most of the time, the drone successfully managed to fly multiple laps. Thanks to the global gate map and the VIO state estimate, the system was summarizes the performance of different network architec- able to plan and execute paths to gates that are not directly tures. Performance is evaluated quantitatively on a separate visible. By repeatedly seeing either one of the two gates, the test set of 4k images with respect to intersection over union drone was able to compensate for the drift of the VIO state (IoU) and precision/recall for corner predictions. While the estimate, allowing the drone to pass the gates every time IoU score only takes full gate detections into account, the exactly through their center. Note that although seeing gate precision/recall scores are computed for each corner detec- 1 in Fig. 6 (right) at least once was important in order to tion. Based on these results, architecture UNet-5 is selected update the position of the gate in the global map, the VIO for deployment on the real drone due to the low inference drift was also estimated by seeing the final gate. time and high performance. On the test set, this network achieves an IoU score with the human-annotated ground truth The results of the system’s main components are discussed in detail in the following subsections, and a video of the of 96.4%. When only analyzing the predicted corners, the results is attached to the paper. network obtains a precision of 0.997 and a recall of 0.918. Deployment Even in instances of strong changes in illumina- tion, the gate detector was able to accurately identify the gates 8.1 Gate detection in a range of 2 − 17 m. Fig. 4 illustrates the quality of detec- tions during the championship race (1st row) as well as for Architecture search Due to the limited computational bud- cases with multiple gates, represented in the test set (2nd/3rd get of the Jetson Xavier, the network architecture was row). With the network architecture explained in Sect. 4, one designed to maximize detection accuracy while keeping a low simultaneous inference for the left- and right-facing camera inference time. To find such architecture, different variants requires computing 3.86GFLOPS (40kFLOPS per pixel). By of U-Net Ronneberger et al. (2015) are compared. Table 2 implementing the network in TensorRT and performing infer- p [m] p [m] p [m] y Autonomous Robots Table 3 Total flight time vs. N Flight time Computation time gates computation time averaged over 100 runs. The percentage in 19.5935 s 1.66 ms 2.35% parenthesis is the computation 29.2913 s 18.81 ms 26.56% time with respect to the computational time for the full39.2709 s 35.74 ms 50.47% track 49.2667 s 53.00 ms 74.84% 5 (full track) 9.2622 s 70.81 ms 100% CPC Foehn and Scaramuzza (2020) (full track) 6.520 s 4.62 · 10 ms 6524% ence in half-precision mode (FP16), this computation takes shows the flight and computation time of the time-optimal 10.5 ms on the Jetson Xavier and can therefore be performed trajectory generation from Foehn and Scaramuzza (2020), at the camera update rate. which significantly outperforms our approach but is far away from real-time execution with a computation time of 462 s for a single solution. Online replanning would therefore not 8.2 State estimation be possible, and any deviations from the nominal track layout could lead to a crash. Compared to a pure VIO-based solution, the EKF has proven Please also note that the evaluation of our method is per- to significantly improve the accuracy of the state estima- formed in Matlab on a laptop computer, while the final tion relative to the gates. As opposed to the works by Li optimized implementation over N = 3 gates achieved et al. (2019); Jung et al. (2018); Kaufmann et al. (2018), the replanning times of less than 2 ms on the Jetson Xavier and proposed EKF is not constrained to only use the next gate, can thus be done in every control update step. Figure 6 (right) but can work with any gate detection and even profits from shows resulting path and velocity of the drone in a multi-lap multiple detections in one image. Fig. 6 (center) depicts the scenario, where the drone’s velocity was limited to 6 m/s. It flown trajectory estimated by the VIO system as p and the VB can be seen that drone’s velocity is decreased when it has to EKF-corrected trajectory as p (the estimated corrections fly a tight turn due to its limited thrust. are depicted in gray). Accumulated drift clearly leads to a large discrepancy between VIO estimate p and the cor- VB rected estimate p . Towards the end of the track at the two last gates this discrepancy would be large enough to cause the 9 Discussion and conclusion drone to crash into the gate. However, the filter corrects this discrepancy accurately and provides a precise pose estimate The proposed system managed to complete the course at a relative to the gates. Additionally, the imperfect initial pose, velocity of 5 m/s with a success rate of 100% and at 8 m/s in particular the yaw orientation, is corrected by the EKF with a success rate of 60%. At higher speeds, the combi- while flying towards the first gate as visible in the zoomed nation of VIO tracking failures and no visible gates caused section in Fig. 6 (center). the drone to crash after passing the first few gates. This fail- ure could be caught by integrating the gate measurements 8.3 Planning and control directly in a VIO pipeline, tightly coupling all sensor data. Another solution could be a perception-aware path planner Figure 6 (left) shows the nominally planned path for the trading off time-optimality against motion blur and maxi- AlphaPilot championship race, where the coloured line mum gate visibility. depicts the fastest path along all the sampled paths depicted The advantages of the proposed system are (i) a drift-free in gray. In particular, a total of M = 150 different states are state estimate at high speeds, (ii) a global and consistent gate sampled at each gate, with the velocity limited to 8 m/s and map, and (iii) a real-time capable near time-optimal path the angle between the velocity and the gate normal limited planner. However, these advantages could only partially be to ϕ = 30 . During flight, the path is re-planned in a exploited as the races neither included multiple laps, nor had max receding horizon fashion through the next N = 3 gates (see complex segments where the next gates were not directly Fig. 6, center). It was experimentally found that choosing visible. Nevertheless, the system has proven that it can han- N ≥ 3 only has minimal impact of the flight time comapred dle these situations and is able to navigate through complex to planning over all gates, while greatly reducing the com- race courses reaching speeds up to 8 m/s and completing the putational cost. Table 3 presents the trade-offs between total championship race track of 74 m in 11.36 s. flight time and computation cost for different horizon lengths While the 2019 AlphaPilot Challenge pushed the field of N for the track shown in Fig. 6 (left). In addition, Table 3 autonomous drone racing, in particularly in terms of speed, 123 Autonomous Robots autonomous drones are still far away from beating human meet learning for drone racing. IEEE International Conference on Robotics and Automation (ICRA), pp 690–696. pilots. Moreover, the challenge also left open a number of Falanga, Davide., Foehn, Philipp., Lu, Peng., and Scaramuzza, Davide problems, most importantly that the race environment was .(2018). Pampc: Perception-aware model predictive control for partially known and static without competing drones or mov- quadrotors. In: 2018 IEEE/RSJ International Conference on Intel- ing gates. In order for autonomous drones to fly at high speeds ligent Robots and Systems (IROS), pp. 1–8. Foehn, Philipp and Scaramuzza, Davide (2020) CPC: Complemen- outside of controlled or known environments and succeed tary progress constraints for time-optimal quadrotor trajectories. in many more real-world applications, they must be able to Preprint arXiv:2007.06255. handle unknown environments, perceive obstacles and react Foehn, P., Brescianini, D., Kaufmann, E., Cieslewski, T., Gehrig, accordingly. These features are areas of active research and M., Muglikar, M., and Scaramuzza. D.(2020). Alphapilot: Autonomous drone racing. In Robotics: Science and Systems are intended to be included in future versions of the proposed (RSS). 10.15607/RSS.2020.XVI.081. drone racing system. Forster, Christian, Zhang, Zichao, Gassner, Michael, Werlberger, Manuel, & Scaramuzza, Davide. (2017a). SVO: Semidirect visual Acknowledgements This work was supported by the Intel Network on odometry for monocular and multicamera systems. IEEE Transac- Intelligent Systems, the Swiss National Science Foundation (SNSF) tions on Robotics, 33(2), 249–265. https://doi.org/10.1109/TRO. through the National Center of Competence in Research (NCCR) 2016.2623335 Robotics, the SNSF-ERC Starting Grant, and SONY. Forster, Christian, Carlone, Luca, Dellaert, Frank, & Scaramuzza, Davide. (2017b). On-manifold preintegration for real-time visual- Funding Open Access funding provided by Universitàt Zürich. inertial odometry. IEEE Transactions on Robotics, 33(1), 1–21. https://doi.org/10.1109/TRO.2016.2597321 Open Access This article is licensed under a Creative Commons Gao, Fei, William, Wu., Gao, Wenliang, & Shen, Shaojie. (2019). Fly- Attribution 4.0 International License, which permits use, sharing, adap- ing on point clouds: Online trajectory generation and autonomous tation, distribution and reproduction in any medium or format, as navigation for quadrotors in cluttered environments. Journal of long as you give appropriate credit to the original author(s) and the Field Robotics. https://doi.org/10.1002/rob.21842 source, provide a link to the Creative Commons licence, and indi- Guerra, Winter., Tal, Ezra., Murali, Varun., Ryou, Gilhyun., and cate if changes were made. The images or other third party material Karaman, Sertac.(2019). Flightgoggles: Photorealistic sensor sim- in this article are included in the article’s Creative Commons licence, ulation for perception-driven robotics using photogrammetry and unless indicated otherwise in a credit line to the material. If material virtual reality. arXiv:1905.11377. is not included in the article’s Creative Commons licence and your Hehn, M., Ritz, R., & DâAdrea, R., (2012). Performance benchmark- intended use is not permitted by statutory regulation or exceeds the ing of quadrotor systems using time-optimal control. Autonomous permitted use, you will need to obtain permission directly from the copy- Robots,. https://doi.org/10.1007/s10514-012-9282-3. right holder. To view a copy of this licence, visit http://creativecomm Kai, J. M., Allibert, G., Hua, M. D., & Hamel, T. (2017). Nonlinear ons.org/licenses/by/4.0/. feedback control of quadrotors exploiting first-order drag effects. IFAC-PapersOnLine, 50(1), 8189–8195. Kalman, Rudolf E. (1960). A new approach to linear filtering and pre- diction problems. Journal of Basic Engineering, 82(1), 35–45. Kuhn, H. W. (1955). The hungarian method for the assignment problem. References Naval Research Logistics Quarterly, 2(1–2), 83–97. LaValle, S. M. (2006). Planning algorithms. Cambridge Cambridge Bertsekas, Dimitri P. (1995). Dynamic programming and optimal con- Cambridge Cambridge: Cambridge University Press. trol (Vol. 1). MA: Athena scientific Belmont. Li, Shuo., van der Horst, Erik.,Duernay, Philipp., De Wagter, Bloesch, Michael, Omari, Sammy, Hutter, Marco, & Siegwart, Roland. Christophe., and de Croon, Guido.(2019). Visual model-predictive (2015). Robust visual inertial odometry using a direct EKF-based localization for computationally efficient autonomous racing of a approach. 72-gram drone. arXiv:1905.10110. Brescianini, D., & DâAndrea, R., (2018). Tilt-prioritized quadrocopter Loock, W. Van., Pipeleers, G., and Swevers, J.(2013). Time-optimal attitude control. IEEE Transactions on Control Systems Technol- quadrotor flight. In IEEE European Control of Conference (ECC). ogy, 28(2), 376–387. 10.23919/ECC.2013.6669253. Cadena, C., Carlone, L., Carrillo, H., Latif, Y., Scaramuzza, D., Neira, J., Maurer, H. (1977). On optimal control problems with bounded state et al. (2016). Past, present, and future of simultaneous localization variables and control appearing linearly. SIAM Journal on Control and mapping: Toward the robust-perception age. IEEE Transac- and Optimization, 150(3), 345–362. tions on Robotics, 32(6), 1309–1332. Mellinger, D., Michael, N., & Kumar, V. (2012). Trajectory genera- Cao, Zhe., Simon, Tomas., Wei, Shih-En., and Sheikh, Yaser .(2017). tion and control for precise aggressive maneuvers with quadrotors. Realtime multi-person 2d pose estimation using part affinity fields. The International Journal of Robotics Research. https://doi.org/ In: Proceedings of the IEEE Conference on Computer Vision and 10.1177/0278364911434236 Pattern Recognition, pp. 7291–7299. Moon, H., Sun, Y., Baltes, J., & Kim, S. J. (2017). The IROS 2016 Delmerico J, & Scaramuzza D. (2018). A benchmark comparison of competitions. IEEE Robotics Automation Magazine, 24(1), 20– monocular visual-inertial odometry algorithms for flying robots. 29. Moon, H., Martinez-Carranza, J., Cieslewski, T., Faessler, M., Falanga, Delmerico J, Cieslewski T, Rebecq H, Faessler M, & Scaramuzza D. D., Simovic, A., et al. (2019). Challenges and implemented tech- (2019). Are we ready for autonomous drone racing? the UZH-FPV nologies used in autonomous drone racing. Intelligent Service drone racing dataset. Robotics, 12, 137–148. Douglas Brent West. (2001). Introduction to graph theory (Vol. 2). NJ: Mourikis, Anastasios I., and Roumeliotis, S.I.(April 2007). A multi- Prentice hall Upper Saddle River. state constraint Kalman filter for vision-aided inertial navigation. Elia K, M Gehrig, P Foehn, R Ranftl, A Dosovitskiy, V Koltun, and D Scaramuzza. (2018) Beauty and the beast: Optimal methods 123 Autonomous Robots In: IEEE International Conference on Robotics and Automation Sunggoo, J., Sunyou, H., Heemin, S., & Shim, D. H. (2018). Percep- (ICRA), pp. 3565–3572. tion, guidance, and navigation for indoor autonomous drone racing Mueller MW., Hehn M., and D’Andrea R.(2015). A computationally using deep learning. IEEE Robotics and Automation Letters, 3(3), efficient motion primitive for quadrocopter trajectory generation. 2539–2544. https://doi.org/10.1109/LRA.2018.2808368 IEEE Transactions on Robotics. doi: https://doi.org/10.1109/TRO. Szeliski, Richard .(2010). Computer Vision: Algorithms and 2015.2479878. Applications. Texts in Computer Science. Springer. ISBN Qin, Tong, Li, Peiliang, & Shen, Shaojie. (2018). VINS-Mono: A 9781848829343. robust and versatile monocular visual-inertial state estimator. IEEE Zhou, B., Gao, F., Wang, L., Liu, C., & Shen, S. (2019). Robust and Transactions on Robotics, 34(4), 1004–1020. https://doi.org/10. efficient quadrotor trajectory generation for fast autonomous flight. 1109/TRO.2018.2853729 IEEE Robotics and Automation Letters. https://doi.org/10.1109/ Ronneberger, Olaf., Fischer, Philipp., and Brox, Thomas .(2015). U- LRA.2019.2927938 net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical image computing and Publisher’s Note Springer Nature remains neutral with regard to juris- computer-assisted intervention, pp. 234–241. Springer. dictional claims in published maps and institutional affiliations. Ryou, G., Tal, E., and Karaman, S.(2020). Multi-fidelity black-box opti- mization for time-optimal quadrotor maneuvers. Robotics: Science and Systems (RSS). 10.15607/RSS.2020.XVI.032. Shuster, Malcolm D. (1993). Survey of attitude representations. Journal of the Astronautical Sciences, 410(4), 439–517. Spasojevic, Igor, Murali, Varun, & Karaman, Sertac. (2020). Joint fea- ture selection and time optimal path parametrization for high speed vision-aided navigation.

Journal

Autonomous RobotsSpringer Journals

Published: Oct 19, 2021

Keywords: Drone racing . Agile flight . Aerial vehicles

References