Performance analysis of Volna-OP2 -- massively parallel code for tsunami modelling
Performance analysis of Volna-OP2 -- massively parallel code for tsunami modelling
Giles, Daniel;Kashdan, Eugene;Salmanidou, Dimitra M.;Guillas, Serge;Dias, Frédéric
2020-02-12 00:00:00
The software package Volna-OP2 is a robust and efficient code capable of simulating the complete life cycle of a tsunami whilst harnessing the latest High Performance Computing (HPC) architectures. In this paper, a comprehensive error analysis and scalability study of the GPU version of the code is presented. A novel decomposition of the numerical errors into the dispersion and dissipation components is explored. Most tsunami codes exhibit amplitude smearing and/or phase lagging/leading, so the decomposition shown here is a new approach and novel tool for explaining these occurrences. It is the first time that the errors of a tsunami code have been assessed in this manner. To date, Volna-OP2 has been widely used by the tsunami modelling community. In particular its compu- tational efficiency has allowed various sensitivity analyses and uncertainty quantification studies. Due to the number of simulations required, there is always a trade-off between accuracy and runtime when carrying out these statistical studies. The analysis presented in this paper will guide the user towards an acceptable level of accuracy within a given runtime. Keywords— Tsunami modelling, Finite-volume scheme, Error analysis, GPUs 1 Introduction The software package Volna-OP2 is used for the simulations of tsunami modelling by a number of research groups around the world since its first introduction in 2011 [1]. The driving force for developing the code was a need within the tsunami research community for a solver which was applicable for analysis of realistic tsunami events and aimed to aid operational tsunami research [1, 2]. The code solves the depth-averaged Nonlinear Shallow Water Equations (NSWE) in two horizontal dimensions (x; y) using modern numerical methods for solution of hyperbolic systems. Volna-OP2 can efficiently simulate the complete life of a tsunami from generation induced by bathymetry displacement, propagation and inundation onshore. It can be used for cases of a simplified bathymetry represented by a mathematical formula but also for the complex bathymetry and topography of the examined geographical region. Owing to the use of an unstructured triangular mesh, irregular bathymetric and topographic features can be efficiently captured and represented. The first operational use of Volna-OP2 for a realistic scenario was for the modelling of sliding and tsunami generation in the St. Lawrence estuary in Canada [3]. The code has been used to model varying tsunami- genic episodes [4, 5]; in several cases it has been used in conjunction with statistical modelling to perform comprehensive sensitivity analysis tests and uncertainty quantification [6, 7, 8, 9, 10]. Both the originally developed and the newly parallelised version of Volna-OP2 have been carefully val- idated against well known benchmarks available to the tsunami community [1, 2]. However, in the present paper, we pay particular attention to the accuracy of the new GPU version of the code with special emphasis on dispersion and dissipation errors as well as its computational efficiency on the general purpose GPU cluster. Gaining a deeper understanding of the numerical errors present can inform the user towards more accurate real case simulations. The manuscript is organised as follows: in the next section (2), we describe the mathematical model and numerical schemes implemented in the parallel version of the code. Section (3) is dedicated to an analytic benchmark used to explore the accuracy of the code and it is followed by the analysis of the dispersion and dissipation errors in Section (4). Section (5) discusses application of the code to real cases and the scalabil- ity of its GPU implementation. The paper is wrapped up by concluding remarks and perspective developments in Section (6). arXiv:2002.04889v1 [physics.comp-ph] 12 Feb 2020 2 Mathematical Model and Algorithms 2.1 Nonlinear shallow water equations The shallow water theory is efficiently used to describe the physics of long waves like tsunamis, which are characterised by very large wavelengths in comparison to the depth of the basin over which they propagate. Neglecting dispersion the NSWE yield: @H +r (H~u) = 0; (1) @t @(H~u) g +r (H~u ~u + H I) = gHrh; (2) @t 2 where H = (h + ) is the total water depth, described as the sum of the time-dependent bathymetry h(x; y; t) and the free surface elevation (x; y; t), ~u(u; v) is the fluid velocity in the x and y horizontal directions, I is the identity matrix and g is the acceleration due to gravity. Provided that H > 0 the system is strictly hyperbolic. In the wet/dry transition the system starts to become non-hyperbolic since H = 0 in a dry region. To deal with that an algorithm that solves the shoreline Riemann problem developed by [11] is implemented in the code. 2.2 Spatial discretisation A cell-centered Finite Volume (FV) numerical method is used for the spatial discretization in Volna-OP2 [1]. Other tsunami codes based on the finite volume approach include Tsunami-HySEA [12] and GeoClaw [13]. The numerical flux implemented in a numerical algorithm has to ensure that some standard conservation and consistency properties are satisfied: the fluxes from adjacent control volumes sharing an interface exactly cancel when summed and the numerical flux with identical state arguments reduces to the true flux of the same state. sL sR w* wL wR 0 x Figure 1: The approximate Riemann fan In Volna-OP2 the Harten-Lax-van Leer (HLL) numerical flux was selected to ensure these conditions are met [1]. The HLL approximate Riemann solver was proposed by Harten et al. in 1983 and assumes a two- wave configuration for the exact solution [14]. The wave speed chosen according to [15] yields a very robust approximate Riemann solver. The Riemann solver models two waves that travel with speeds s and s , the L R larger signal velocity is represented by s and the smaller by s ; three states are identified (Fig 1). The R L subscripts R and L are used to represent the right and left cell values respectively. The intermediate state is denoted by w~ , where w~ is a vector of the conserved variables (H; Hu; Hv). The numerical flux function of the scheme can be described by: F for s 0; L L (w~ ; w~ ) := F for s < 0 s ; HLL L R L R F for s < 0: R R 2 ~ where w~ ; w~ are the two interface states and F denotes the true flux at state w~ respectively. The L R L; ;R L; ;R right and left states are known. The intermediate state can be determined by applying the Rankine-Hugoniot conditions twice [1]. It then derives that: ~ ~ s w~ s w~ (F F ) R R L L R L w~ = s s R L ~ ~ s F s F + s s (w~ w~ ) R L L R L R R L F = s s R L In Volna-OP2 the wave speeds are computed as: s = min(u c ; u c ); L nL L s = max(u + c ; u + c ) R nR R where u = ~u ~n , u = ~u ~n and u and c are equal to: nL L LR nR R LR u = (u + u ) + c c ; nL nR L R 1 1 c = (c + c ) (u u ) L R nR nL 2 4 p p where c = gH and c = gH are the gravity wave speeds for the right and left state of the system R R L L respectively and ~n denotes the vector along the shared face between the right and left states. The short- LR coming of the HLL scheme is that it cannot resolve isolated contact discontinuities. It can thus become quite dissipative. 2.3 Temporal discretisation A Strong Stability-Preserving (SSP) method is used in conjunction with a Runge-Kutta method for the temporal discretization in VOLNA. In the current version of the code the optimal second order two stage Runge-Kutta scheme SSP-RK(2,2) is used, with optimal Courant-Friedrichs-Lewy (CFL) condition equal to 1. The scheme is given as follows: (1) (n) (n) w~ = w~ + tL(w~ ) 1 1 1 (n+1) (n) (1) (1) w~ = w~ + w~ + tL(w~ ) 2 2 2 Where L(w~ ) is defined as the finite volume space discretization operator. The stability of the scheme is guaranteed if the CFL condition is satisfied. The Runge-Kutta scheme is very robust, especially in handling discontinuities. However, the scheme is both dissipative and dispersive [16]. Dissipativity causes a leak of energy from the system while dispersion leads to either phase lag or phase lead. A full explanation of these phenomena is given in Section (4). 2.4 Second order extension The classical finite volume schemes are only first order accurate in space, which is insufficient for most modern computational simulations. Simulating a real tsunami case with a first order accurate scheme would require an unfeasible mesh resolution to obtain meaningful results. So in order to yield second order accuracy in space, a reconstruction technique is implemented. One must ensure that the scheme is total variation diminishing (TVD), i.e no artificial maxima and minima are introduced. Thus, within Volna-OP2 a MUSCL (Monotone Upstream-centered Scheme for Conservation Laws) scheme is implemented. The second order spatial accuracy is achieved by reconstructing the conserved variables on the cell interfaces. The reconstruction relies on calcu- lating the gradient of a conserved variable over a cell and projecting the reconstructed value on the interface. Within Volna-OP2, a least squares gradient reconstruction and Barth-Jesperson limiter [17] are implemented. The reconstructed values given below (3) are then used in the numerical flux calculation: w~ (x~ ) = w~ + (rw~ ) (x~ x~ ) (3) f K K K f 0 where w~ is a vector of conserved variables, w~ (x~ ) is the conserved variable evaluated at the interface, is f K the cell specific conserved variable limiter, (rw~ ) is the cell centred gradient and x~ x~ is a vector pointing K f 0 from the cell-centre to the face centre. To avoid large gradients being calculated in the wet/dry region of the domain a threshold depth has been introduced to ensure that the code remains stable. When the depth goes below this threshold the scheme doesn’t carry out a reconstruction. This threshold depth has been set to be H = 10 m. However, it has been found that a conservative value can ensure greater stability, for threshold 5 3 example in Section (3), H = 10 m. For the real case (Section (5)), H = 10 m. At present threshold threshold these values are found through a trial and error approach but more research is required on the optimisation of this threshold depth. 3 2.5 Boundary Conditions - Wall/Solid Boundary For a full explanation on the treatment of boundary conditions the reader is referred to [1]. However, the case of a wall/solid boundary is given here. For all boundary conditions a ghost cell technique is used. This approach allows one to reconstruct the conserved variables on the boundary and thus preserve second order accuracy. Values of the conserved variables on the ghost cells are defined based on the type of boundary condition needed. In the following, cell L is defined to be inside and cell R (ghost cell) is outside of the domain. The boundary of the domain is the common edge between cell L and cell R. For a wall/solid boundary, ~u~n = 0, where ~u is the flow velocity and ~n is the normal vector to the boundary k ? edge. To ensure that this is satisfied, the tangential (~u ) and normal (~u ) velocities for the ghost cell are set to be equal and opposite to the those of the interior cell respectively: ? ? ~u = ~u R L k k ~u = ~u R L 3 Benchmark Test with Analytical Solution The two dimensional case of a radially symmetric paraboloid is implemented following the analytic solution initially proposed by Thacker [18]. This solution is available in the SWASHES (Shallow-Water Analytic Solu- tions for Hydraulic and Environmental Studies) library [19]. The major aim of SWASHES is to aid numerical modellers to validate shallow water equation solvers. The oscillatory motion of the paraboloid is described by a periodic solution in which damping is assumed to be negligible. The morphology of the domain is a paraboloid of revolution given by: z(r) = h 1 ; (4) L L L L 2 2 where r = x + y for each (x; y) 2 [ ; ] [ ; ], where L is the length of the domain, h is the 2 2 2 2 water depth at the central point of the domain when the shoreline elevation is zero and is the horizontal distance from the central point to the shoreline with zero elevation (Fig 2). 4 α r0 h0 0 x L Figure 2: Geometry of the domain used for the analytic solution following [19] The free surface elevation h(r; t) and the velocity components u(x; y; t) and v(x; y; t) are then given by: 2 2 1 A r 1 A h(r; t) = h 1 1 z(r); 2 2 1 A cos(!t) (1 A cos(!t)) 1 !A sin(!t) L u(x; y; t) = x ; 1 A cos(!t) 2 2 1 !A sin(!t) L v(x; y; t) = y ; 1 A cos(!t) 2 2 where ! = 8gh = is the frequency, r is the distance from the central point of the domain to the initial 0 0 2 2 2 2 shoreline location and A = ( r )=( +r ). To model the solution we follow the values proposed in [19], 0 0 where = 1m, r = 0:8m, h = 0:1m, L = 4m, and T = 3(2=!). 0 0 We record the free surface elevation in three positions (x ; y ) = (0; 0)m (centre of the domain), (x ; y ) = 1 1 2 2 (0:5; 0)m, and (x ; y ) = (1; 0)m (shoreline). Fig (3) highlights a top-down view of the bathymetry and the 3 3 locations of the wave gauges. In numerical simulations with Volna-OP2 we model the free surface elevation at the three positions with various spatial resolutions as a function of time up to t = 10s. An analytic solution fin at time t = 0 is chosen as an initial condition and t = 0:45x for the simulations. This was chosen as it was found to be stable for all the mesh resolutions. 5 2.0 0.72 Virtual Gauges 0.63 1.5 0.54 1.0 0.45 0.5 0.36 0.0 0.27 −0.5 0.18 −1.0 0.09 0.00 −1.5 −0.09 −2.0 −2.0 −1.5 −1.0 −0.5 0.0 0.5 1.0 1.5 2.0 X (m) Figure 3: Plot of the bathymetry (parabolic bowl) using the parameters outlined by [19]. The colour coding matches the height of the bathymetry. The locations of the virtual gauges are marked by the black stars. The results of the numerical simulations are shown in (Fig 4) – (Fig 6). Y (m) Bathymetry (m) 0.02 0.01 ΔxΔΔ yΔ0.024m ΔxΔΔ yΔ0.012m ΔxΔΔ yΔ0.006m 0.00 Analytical Solution 0 − .01 0 − .02 0 2 4 6 8 10 Time (s) Figure 4: Comparing numerical and analytic solutions in time at (x ; y ) = (0; 0)m (centre of the domain), 1 1 where the analytic solution – black dashed, x = y = 0:024m – green, x = y = 0:012m – blue, and x = y = 0:006m – red. In the numerical simulations, t = 0:45x. The dashed box is the boundaries of (Fig 7). 0.010 0.005 ΔxΔΔ yΔ0.024m ΔxΔΔ yΔ0.012m 0.000 ΔxΔΔ yΔ0.006m Analytical Solution 0.005 0.010 0 2 4 6 8 10 Time (s) Figure 5: Comparing numerical and analytic solutions in time at (x ; y ) = (0:5; 0)m, where the analytic solution 2 2 – black dashed, x = y = 0:024m – green, x = y = 0:012m – blue, and x = y = 0:006m – red. In the numerical simulations, t = 0:45x. Surface Elevation (m) Surface Elevation (m) 0.01 0.00 ΔxΔΔ yΔ0.024m ΔxΔΔ yΔ0.012m ΔxΔΔ yΔ0.006m 0.01 Analytical Solution 0.02 0.03 0 2 4 6 8 10 Time (s) Figure 6: Comparing numerical and analytic solutions in time at (x ; y ) = (1; 0)m (shoreline), where the analytic 3 3 solution – black dashed, x = y = 0:024m – green, x = y = 0:012m – blue, and x = y = 0:006m – red. In the numerical simulations, t = 0:45x. The shoreline forbids the numerical solutions to go below zero. ΔxΔΔ yΔ0.024m 0.026 ΔxΔΔ yΔ0.012m ΔxΔΔ yΔ0.006m Anal tical Solution 0.024 0.022 0.020 0.018 0.016 8.6 8.7 8.8 8.9 9.0 9.1 9.2 9.3 Time (s) Figure 7: Zoom in on the plot of the gauge at the centre of the domain (x ; y ) = (0; 0)m, one can observe the 1 1 effect of the mesh size on the accuracy. The finest mesh (x = y = 0:006m) yields a representation closest to the surface elevation given by the analytic solution. The discrepancies between the meshes can only be seen by zooming in on the plots (Fig 7). Focusing on the centre of the domain, we plot the absolute difference between the analytic and the numerical free surface elevation over time (Fig 8). As expected, the numerical error is always larger for the coarser meshes. In order to gain an idea on the numerical order of the scheme, a convergence study was carried out. The L norm is calculated at 10s for various mesh resolutions (0.048 – 0.003)m and then plotted versus the characteristic mesh size (Fig 9). Surface Elevation (m) Surface Elevation (m) 0.0007 ΔxΔΔ yΔ0.024m ΔxΔΔ yΔ0.012m 0.0006 ΔxΔΔ yΔ0.006m ΔxΔΔ yΔ0.003m 0.0005 0.0004 0.0003 0.0002 0.0001 0.0000 0 2 4 6 8 10 Time (s) Figure 8: Absolute difference between the analytic and the numerical free surface elevation over time at the centre of the domain (x = y = 0)m for spatial resolution: x = y = 0:003m – yellow line, x = y = 0:006m – red, x = y = 0:012m – blue, x = y = 0:024m – green; t = 0:45x. Slope = 2.13 −0.5 −1.0 −1.5 −2.0 −2.5 −3.0 −2.4 −2.2 −2.0 −1.8 −1.6 −1.4 log(Mesh Size (h)) nd Figure 9: Convergence rate of the L norm. Volna-OP2 with the MUSCL extension is 2 order accurate in space, i.e the error is O(h ). The results shown in (Fig 8) and (Fig 9) highlight that the scheme is second order accurate in space. However, the role of numerical dispersion and dissipation has not been explored and they should be accounted for when running long time tsunami simulations. We come back to this discussion in section (4). To check the temporal discretization error we keep the mesh size constant at x = y = 0:006m and Absolute Difference (m) log(L ) vary the time step by adjusting the constant connecting the spatial and temporal resolutions. We choose three values: t = (0:333; 1; 1:2)x and run the test for a longer time t = 100s. The results of the simulations fin are shown in Figs 10 and 11. Fig 11a Fig 11b 0.02 0.01 Δx=Δy=0.006m (Δt = 0.ΔΔΔΔx) Δx=Δy=0.006m (Δt = 1.0Δx) Δx=Δy=0.006m (Δt = 1.2Δx) 0.00 Analytical Solution 0 − .01 0 − .02 0 20 40 60 80 100 Time (s) Figure 10: Solution at the centre of the domain (x = y = 0m) for various time steps with fixed spatial resolution x = y = 0:006m: the analytic solution – black dash line, t = 0:333x – green, t = x – blue, and t = 1:2x – red. Arrows point towards the areas highlighted in the subplots (Fig 11a and Fig 11b). Surface Ele ation (m) 0.0260 ΔxΔΔ yΔ0.006m (Δ t Δ 0.333Δ x) ΔxΔΔ yΔ0.006m (Δ t Δ 1.0Δ x) 0.0255 ΔxΔΔ yΔ0.006m (Δ t Δ 1.2Δ x) Anal tical Solution 0.0250 0.0245 0.0240 0.0235 0.0230 0.0225 0.0220 8.80 8.85 8.90 8.95 9.00 9.05 9.10 9.15 Time (s) 0.0260 Δx=Δy=0.006m (Δt = 0.333Δx) Δx=Δy=0.006m (Δt = Δ.0Δx) 0.0255 Δx=Δy=0.006m (Δt = Δ.2Δx) Analytical Solution 0.0250 0.0245 0.0240 0.0235 0.0230 0.0225 0.0220 98.50 98.55 98.60 98.65 98.70 98.75 98.80 98.85 Time (s) Figure 11: Zoom in on Fig 10 for two different time windows (Top (a): t = 8:79 9:15s and Bottom (b): t = 98:5 98:86s). The difference between the numerical solution with various time steps is not noticeable in each subplot. However, one can see a damping and phase shifting of the numerical signal when comparing the bottom (11b) and top subplot (11a). Changes in time-step within the stability limits while keeping the same spatial resolution almost do not affect the accuracy of the solution (Fig 10). This is highlighted in Fig 11 where the plots of the numerical solution with various time steps overlap each other. However, comparing the subplots of Fig 11 as time goes on, one observes a damping of the numerical signal and a phase shift. The Runge-Kutta scheme implemented in the code is dissipative and we could expect a leak of the energy from the system, thus we can expect the damping of the numerical signal. Reasons for the phase shift in the signal will be explained in section (4). Overall, the results from this section show that the space discretisation has a strong influence on the solution, while reducing the time step has no visible effect. 4 Error analysis The exact solution of the discretized equations satisfies a PDE which is generally different from the one to be n+1 n solved. The original equation is replaced with the modified equation Au = Bu , or, in other words 1 1 2p 2p+1 X X @! @! @ ! @ ! +L! = 0 becomes +L! = + : 2p 2p+1 2p 2p+1 @t @t @x @x p=1 p=1 The even-order derivatives on the right-hand side produce an amplitude error, or numerical dissipation. The odd-order derivatives on the right-hand side produce a wave-number-dependent phase error known as numer- ical dispersion. In the long time simulations, the numerical behaviour of the scheme largely depends on the role played by the dispersive and dissipative effects also known as ”wiggles” (phase errors) and ”smearing” (amplitude errors) respectively. A negative dispersion coefficient corresponds to phase lagging (i. e. har- monics travel too slowly), while positive dispersion coefficients yield phase leading with spurious oscillations occurring ahead of the wave. According to the Lax-Richtmyer Equivalence Theorem [20], if a scheme has a truncation error of order (p; q) and the scheme is stable, then the difference between the analytic solution and the numerical solution in Surface Ele ation (m) Surface Elevation (m) p q an appropriate norm is of the order (t) + h for all finite time. It has been observed numerically (see Fig 9) that the numerical solution is second order accurate. Taking into account that the time step is proportional to the spatial resolution that we call h, we can write that the total error is of the order O(h ). To analyse the role played by the dissipation and dispersion errors, we rewrite the error as E(t) = C (t)h ; p 1 p=2 where C (t) incorporates all the constants included in the error formula. We choose the three leading terms p 1 of this expansion: 2 3 4 E(t) C (t)h + C (t)h + C (t)h : 1 2 3 The first term in this expansion corresponds to the truncation error (also the leading dissipation error). It is followed by the leading dispersion and the secondary dissipation error terms. We assume that the remaining terms are significantly smaller and can be neglected. Next we go back to the simulations with various spatial resolution discussed in Section (3). For each of the three grids, we have the absolute error as a function of time. If we define the spatial resolution h = 0:024m, two other grids have the resolution h=2 and h=4. The system of equations has the form: C (t) = E (t); k = 1; 2; 3 (5) p 1 k k 1 p (2 ) p=2 and its solution is E (t) 24E (t) + 128E (t) 1 2 3 C (t) = ; 3h 2[E (t) 20E (t) + 64E (t)] 1 2 3 C (t) = ; 8[E (t) 12E (t) + 32E (t)] 1 2 3 C (t) = 3h The plots below show the composition of the total error for each of the grids (Fig 12 – Fig 14) , with a scaled surface elevation overlaid to give an idea on when the errors occur. 0.010 0.005 0.000 0 − .005 0 − .010 Total Error Truncation Error 3rd Order Error 0 − .015 4th Order Error Analytic Solution (Scaled) 0 2 4 6 8 10 Time (s) Figure 12: Decomposition of the absolute error at the centre of the domain (x = y = 0m) for a spatial grid with resolution of x = y = 0:024m and CFL=0.45. The colours correspond to: the truncation error – green, the third order error – red, the fourth order error – blue, the total error – black. The black dashed line is a scaled plot of the surface elevation at the centre of the domain over time. Error in surface ele ation (m) 0.0015 0.0010 0.0005 0.0000 0 − .0005 0 − .0010 Total Error 0 − .0015 Truncation Error 3rd Order Error 4th Order Error 0 − .0020 Analytic Solution (Scaled) 0 2 4 6 8 10 Time (s) Figure 13: Decomposition of the absolute error at the centre of the domain (x = y = 0m) for a spatial grid with resolution of x = y = 0:012m and CFL=0.45. The colours correspond to: the truncation error – green, the third order error – red, the fourth order error – blue, the total error – black. The black dashed line is a scaled plot of the surface elevation at the centre of the domain over time. Total Error 0.0004 Truncation Error 3rd Order Error 4th Order Error 0.0003 Anal tic Solution (Scaled) 0.0002 0.0001 0.0000 0 − .0001 0 − .0002 0 − .0003 0 2 4 6 8 10 Time (s) Figure 14: Decomposition of the absolute error at the centre of the domain (x = y = 0m) for a spatial grid with resolution of x = y = 0:006m and CFL=0.45. The colours correspond to: the truncation error – green, the third order error – red, the fourth order error – blue, the total error – black. The black dashed line is a scaled plot of the surface elevation at the centre of the domain over time. In all the error decompositions, the components tend to cancel each other, which results in the total error being less than the individual components. However, the leading dispersion error is a dominant component for all the total errors. The leading dispersion error exhibits large negative spikes, this points towards phase lagging as the surface elevation changes from negative to positive at the center of the domain. These large th negative spikes in the leading dispersion error coincide with positive spikes in either the truncation or 4 order Error in surface elevation (m) Error in surface ele ation (m) dissipative errors. For finer grids the truncation error (leading dissipation error) plays a dominant role. For the th larger grid the negative spikes are balanced by the 4 order dissipative error (Fig 12). This error analysis is important when considering real cases – see section (5) – as any error could be dominated by either the leading dispersion or dissipation terms. However, this numerical dispersion term could compensate for the fact that physical dispersion is neglected in the nonlinear shallow water equations, as shown by [21]. 5 Real cases In this section we discuss an actual tsunami simulation done with Volna-OP2 running on the general purpose ® ® GPU cluster. The cluster consists of two NVIDIA Tesla V100 cards, with 5,120 CUDA cores, 16GB max memory size each. The domain size is 800 1000km and the physical simulation time is 1 hour. The simulation corresponds to a hypothetical scenario of edge volume collapse (765 km ) at the Rockall Bank Slide Complex. A geophysical study of the event taking into account volumetric, rheological and multiphase collapse considerations has been done under a different framework [22]. Convergence using a simple approach of material with visco-plastic rheology sliding in one go is demonstrated. For this study we have chosen four gauges marked by the red dots on Fig 15 to present the evolution of the tsunami as a function of time in Fig 16. Figure 15: Computational domain used in the simulation and four gauges marked with the red dots. 14 Gauge (a) Gauge (b) ΔxΔΔ yΔ2050m Δx=Δy=20Δ0m ΔxΔΔ yΔ1000m 30 Δx=Δy=1000m ΔxΔΔ yΔ600m Δx=Δy=600m ΔxΔΔ yΔ450m Δx=Δy=4Δ0m 30 20 0 0 1 − 0 1 − 0 2 − 0 3 − 0 2 − 0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Time (hrs) Time (hrs) Gauge (c) Gauge (d) Δx=Δy=20Δ0m Δx=Δy=20Δ0m Δx=Δy=1000m Δx=Δy=1000m 20 Δx=Δy=600m Δx=Δy=600m Δx=Δy=4Δ0m Δx=Δy=4Δ0m Maximum ave amplitude 10 10 1 − 0 1 − 0 2 − 0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Time (hrs) Time (hrs) Figure 16: Results of the numerical simulations at four gauges from left to right on Fig 15, sorted hori- zontally starting from the upper left corner. Each plot includes four tests with different spacial resolution: x = y =(2050m - red line, 1000m - green line, 600m - blue line, 450m - black dashed line) run for 1 hour with CFL=1.0. The gauges coordinates (from left to right, from top to bottom) are: a - (350km, 500km), b - (400km, 450km), c - (500km, 350km), and d - (545.8km, 312.2km). Overall, the numerical simulations with varying spatial discretisation behave similarly. Notable differences can be seen at gauge 4 (Fig 16d), where unresolved bathymetric features of the continental shelf play a role. To investigate the numerical differences we will focus on the output at gauge 3 (Fig 16c). The reason for choosing this gauge is to minimise the effects of these unresolved bathymetric features as the bathymetry between this gauge location and the landslide source is relatively flat. The maximum wave amplitude of the initial tsunami wave at gauge 3 (Fig 16c) has been highlighted above. As there is no true solution to compare with for this real case we will take the simulation results from the finest mesh x = y = 450m as the ground truth. Relative differences between this ground truth and the other simulations are presented in Fig 17 and table 1. When comparing the signals (Fig 17), the coarser meshes exhibit phase lagging and/or damping of the signal, i.e. the maximum tsunami wave arrives later and its amplitude is diminished. This behaviour was highlighted in the previous error analysis – Section 4 – and is thus expected. It should be noted that for cases which utilize non-uniform mesh resolutions the same error analysis findings will hold true. If the mesh is non-uniform (i.e the characteristic length scale of the cells vary across the domain), the analysis addresses the worst possible scenario and scales all cells by the largest for error computations. Thus, one would expect to see similar behaviour regarding numerical dissipation and dispersion. Wave Height (m) Wave Height (m) Wave Height (m) Wave Height (m) 26.5 Δx=Δy=2050m Δx=Δy=Δ000m Δx=Δy=600m 26.0 Δx=Δy=450m 25.5 25.0 δη 24.5 δt 24.0 23.5 0.468 0.470 0.472 0.474 0.476 0.478 0.480 0.482 Time (hrs) Figure 17: Zoom in on the maximum wave amplitudes of the initial tsunami wave at gauge 3 (Fig 16c) simulated using the varying mesh resolutions Resolution [m] t [s] 600 0.08 2.8 1000 0.43 2.8 2050 1.55 14.8 Table 1: Relative differences in wave height and arrival time of initial wave at gauge 3 between the coarser meshes and finest one (Fig 16). = the difference in maximum wave height and t = the time delay between the arrival of the maximum wave. Turning to the performance of Volna-OP2 on the GPU cluster, the table (2) and plot (Fig 18) summarise the runtimes for the various mesh resolutions. One can see that we get a linear speed up of the runtimes. Those interested in the scalability of the code on other HPC architectures are referred to [2]. This analysis of the relative errors (1) and computational efficiency informs the user on what resolution will provide an acceptable level of accuracy within a given time constraint. Resolution, x = y [m] Number of cells Total runtime [s] - 1 GPU Total runtime [s] - 2 GPUs 2050 246,187 140.7 84.3 1000 1,031,014 555.6 296.1 600 2,872,156 1565.8 826.0 450 5,109,001 2782.3 1433.1 Table 2: Volna-OP2 performance on GPU cluster Wave Height (m) 1 Gpu 3.4 2 Gpus 3.2 3.0 2.8 2.6 2.4 2.2 2.0 5.4 5.6 5.8 6.0 6.2 6.4 6.6 log(Number of Cells) Figure 18: Runtimes of the various mesh resolutions with Volna-OP2 on a GPU cluster. 6 Concluding remarks Based on the current study we can conclude that Volna-OP2 is a robust and efficient parallel solver for the NSWE. It is based on the finite volume scheme for spatial integration, implementing a MUSCL reconstruction and using the 2nd order Runge-Kutta scheme for integration in time. The scheme is conditionally stable with experimentally confirmed CFL=1.0. The code can handle complex geometries and simulate real-life cases. The error analysis shows that it scales quadratically with refining the spatial mesh. However, reducing the time-step does not have a visible effect on the error. The solution amplitude decays in time, and one can observe both damping and phase shifts. So there is an energy leak from the system, which is expected due to the non-conservative Runge-Kutta scheme. However, the error can be minimised by reducing the spatial resolution. The phase error is mostly negative, which corresponds to the phase lag and the artificial wiggles behind the wave front. However, it changes sign occasionally and this leads to the phase lead. The real case simulations show the efficiency and scalability of the code run on a GPU cluster. The 1- hour realistic size tsunami model is simulated in 24 mins on the finest (450m resolution, 5:1M cells) mesh using two GPUs. However, the simulations have shown that comparable results can be achieved using a coarser mesh and reduced runtime. Therefore the users should be familiar with code pros and cons before setting up their simulation in order to get the physically meaningful and numerically accurate results. This acceptable level of accuracy and runtime trade off is an important decision when using Volna-OP2 to perform comprehensive sensitivity analysis tests and uncertainty quantification. To conclude the paper we want to mention that Volna-OP2 is still under development. Therefore its analysis and benchmark testing play an important role for the code’s continuous enhancement and improvement. The code is already an important tool for tsunami modelling community. However, we hope that our work will help to its wider adoption and will lead to discussion on the most suitable algorithms and software platforms for realistic tsunami modelling and prediction of its effects. Acknowledgments The Irish Centre of High End Computing (ICHEC) is acknowledged for their computing resources. Daniel Giles is supported by a Government of Ireland Postgraduate Scholarship from the Irish Research Council (GOIPG/2017/68). Eugene Kashdan would like to acknowledge NVIDIA for the donation, within the frame- work of the Professor Partnership Programme, of the Tesla K40 GPU which was used for this research. Serge log(Runtime (s)) Guillas acknowledges support from the Alan Turing Institute project “Uncertainty Quantification of multi- scale and multiphysics computer models: applications to hazard and climate models” as part of the grant EP/N510129/1 made to the Alan Turing Institute by EPSRC. References [1] D. Dutykh, R. Poncet, and F. Dias, The VOLNA code for the numerical modeling of tsunami waves. European Journal of Mechanics – B/Fluids, 30(6), (2011), 598–615. [2] I. Z. Reguly, D. Giles, D. Gopinathan, L. Quivy, J. H. Beck, M. B. Giles, S. Guillas, and F. Dias, The VOLNA-OP2 tsunami code (version 1.5). Geosci. Model Dev, 11(11), (2018), 4621–4635. [3] R. Poncet et al., A Study of the Tsunami Effects of Two Landslides in the St. Lawrence Estuary. D. C Mosher et al. Editors, Submarine Mass Movements and Their Consequences. Springer, Netherlands, (2010), 755–764. [4] F. Dias et al., On the Modelling of Tsunami Generation and Tsunami Inundation. Procedia IUTAM, 10, (2014), 338–355. [5] D. Gopinathan, M. Venugopal, D. Roy, K. Rajendran, S. Guillas, and F. Dias, Uncertainties in the 2004 SumatraAndaman source through nonlinear stochastic inversion of tsunami waves. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 473(2205), (2017):20170353 [6] T. S. Stefanakis et al., Can small islands protect nearby coasts from tsunamis? An active experimental design approach. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 2172(470), (2014), 20140575 [7] J. Beck and S. Guillas, Sequential design with mutual information for computer experiments (MICE): emulation of a tsunami model.. SIAM/ASA Journal on Uncertainty Quantification, 4(1), (2016), 739–766 [8] S. Guillas, A. Sarri, S. J. Day, X. Liu, and F. Dias, Functional emulation of high resolution tsunami modelling over Cascadia.. The Annals of Applied Statistics, 12(4) , (2018) 2023–2053. [9] D. M. Salmanidou, S. Guillas, A. Georgiopoulou, and F. Dias, Statistical emulation of landslide-induced tsunamis at the Rockall Bank, NE Atlantic.. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 473(2200), (2017). [10] X. Liu and S. Guillas, Dimension reduction for Gaussian process emulation: An application to the influence of bathymetry on tsunami heights. SIAM/ASA Journal on Uncertainty Quantification, 5(1), (2017) 787–812. [11] M. Brocchini et al., An efficient solver for nearshore flows based on the WAF method. Coastal Engineer- ing, 43, (2001), 105–129. [12] J.Macias, M.J. Castro, S. Ortega, C. Escalante, and J.M. Gonzalez-Vida, Performance Benchmarking of Tsunami-HySEA Model for NTHMP’s Inundation Mapping Activites. Pure Appl. Geophys, 173, (2017), 3147–3183. [13] M.J. Berger, D.L George, R.J. LeVeque, and K.T. Mandli, The GeoClaw software for depth-average flows with adaptive refinement. Adv. Water Resour., 34, (2011), 1195–1206. [14] A. Harten, P. D. Lax, and B. van Leer, On upstream differencing and Godunov-type schemes for hyper- bolic conservation laws. SIAM Review, 25, (1983), 35–61 [15] J.G. Zhou et al., Numerical solutions of the shallow water equations with discontinuous bed topography. Internat. Journal on Numerical Methods in Fluids, 38, (2002), 769-788. [16] P.J. Van Der Howen and B.P. Sommeijer, Explicit RungeKutta (-Nystrom) methods with reduced phase errors for computing oscillating solutions. SIAM J.Numer.Anal., 24, (1987), 595-617. [17] T.J. Barth and D.C. Jespersen, The design and application of upwind schemes on unstructured meshes. 27th Aerospace Sciences Meeting. (1989) [18] W. C. Thacker, Some exact solutions to the nonlinear shallow water wave equations. Journal of Fluid Mechanics, 107, (1981), 499-508. [19] O. Delestre, et al., SWASHES: a compilation of Shallow-Water analytic solutions for hydraulic and environmental studies. International Journal for Numerical Methods in Fluids, 72, (2013), 269-300. [20] P. D. Lax and R. D. Richtmyer, Survey of the Stability of Linear Finite Difference Equations. Comm. Pure Appl. Math., 9, (1956), 267-293. [21] D. Burwell, E. Tolkova and A. Chawla, Diffusion and dispersion characterization of a numerical tsunami model. Ocean Modelling, 19, (2007), 10 –30. [22] D.M. Salamanidou, A. Georgiopoulou, S. Guillas and F. Dias, Rheological considerations for the mod- elling of submarine sliding at the Rockall Bank, NE Atlantic Ocean. Physics of Fluids, 30, (2018):030705
http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.pngMathematicsarXiv (Cornell University)http://www.deepdyve.com/lp/arxiv-cornell-university/performance-analysis-of-volna-op2-massively-parallel-code-for-tsunami-HtuRCisFTT
Performance analysis of Volna-OP2 -- massively parallel code for tsunami modelling