Statistics, Volume 2020 (2002) – Feb 26, 2020


- ISSN: 2377-2158
- eISSN: ARCH-3347
- DOI: 10.1115/1.4046020

A Visual Sensitivity Analysis for Parameter-Augmented Ensembles of Curves

Ribés, Alejandro
EDF Lab Paris-Saclay, 7 Bd Gaspard Monge, 91120 Palaiseau, France
alejandro.ribes@edf.fr

Pouderoux, Joachim
Kitware, 6 Cours André Philip, 69100 Villeurbanne, France
joachim.pouderoux@kitware.com

Iooss, Bertrand
EDF Lab Chatou, 6 Quai Watier, 78401 Chatou, France
bertrand.iooss@edf.fr

ABSTRACT

Engineers and computational scientists often study the behavior of their simulations by repeated solutions with variations in their parameters, which can be for instance boundary values or initial conditions. Through such simulation ensembles, uncertainty in a solution is studied as a function of the various input parameters. Solutions of numerical simulations are often temporal functions, spatial maps or spatio-temporal outputs. The usual way to deal with such complex outputs is to limit the analysis to several probes in the temporal/spatial domain. This leads to smaller and more tractable ensembles of functional outputs (curves) with their associated input parameters: augmented ensembles of curves. This article describes a system for the interactive exploration and analysis of such augmented ensembles. Descriptive statistics on the functional outputs are performed by Principal Component Analysis projection, kernel density estimation and the computation of Highest Density Regions. This makes possible the calculation of functional quantiles and outliers. Brushing and linking the elements of the system allows in-depth analysis of the ensemble. The system allows for functional descriptive statistics, cluster detection and finally for the realization of a visual sensitivity analysis via cobweb plots. We present two synthetic examples and then validate our approach in an industrial use-case concerning a marine current study using a hydraulic solver.

INTRODUCTION

In this article, simulation refers to the application of computational models to the study and prediction of physical events or the behavior of engineered systems. In this context, the modern usage of simulation tools has improved and grown to a point that has far exceeded many expectations. That remarkable change has come about mainly because of developments in the computational sciences and the rapid advances in computing equipment. Computer models help engineers to forecast the behavior of the system under investigation in conditions that cannot be reproduced in physical experiments (e.g. accidental scenarios), or when physical experiments are theoretically possible but at a very high cost. To improve and have a better hold on these tools, it is crucial to be able to analyze them under the scopes of sensitivity and uncertainty analysis [1-3]. In particular, sensitivity analysis aims at identifying the most influential parameters for a given output of the computer model and at evaluating the effect of uncertainty in each uncertain input variable on the model output [4, 5].

A probabilistic uncertainty study consists of evaluating the computer model on a large statistical sample of model inputs (which follow a joint probability distribution), then analyzing all the results (the model outputs) with specific statistical tools. The result of such a family of runs is called an ensemble, and each individual run is called a member. Ensembles are multivariate, which means that a simulation is run several times with varying parameters. Their members are multidimensional (both in space and time) and multivalued (several quantities such as temperature, pressure or velocity are considered). The usual way to deal with these kinds of outputs (a temporal function, a spatial map or a spatio-temporal output) is to limit the analysis to several probes in the temporal/spatial domain [1-4]. To deal with this problem, taking ideas from the visualization community seems particularly interesting. Indeed, one of its current challenges is how to deal with the multivariate nature of the ensembles [6-8]. Furthermore, uncertainty visualization has long been advocated as one of the top challenges in visualization [9-11].

The goal of this work is to propose methodologies and tools for researchers and engineers performing uncertainty studies by analyzing ensembles. An example of such a strategy can be a hydraulics engineer studying results generated by a multi-run finite-element simulation. In this case, the ensemble could be a fixed 3D mesh for all members and a varying field (temperature, water height, pressure, etc.) that depends on the experimental design used to sample the parameters controlling the simulations. Thus, when the engineer applies a probe on a node of the mesh, she/he obtains not the evolution of a single quantity (temperature, water height, pressure, etc.) over time but another, smaller ensemble of functional outputs or curves. We should then not only deal with an ensemble of functional outputs but also with their associated simulation parameters. We call this kind of data an augmented ensemble of curves.

A non-augmented ensemble of curves already presents a first problem of visual clutter, which is well known in the visualization community. When a large number of curves are superposed on one another, the overall perception of the graphs is lost, and the user cannot analyze the ensemble. As an example, Figure 1 depicts 1,500 curves coming from different runs of the same numerical simulation (from a hydraulics application). When looking at the overall behavior of an ensemble of curves, such as in Figure 1, the first set of basic questions that arise are the following:
- What is the median curve?
- Can we define some confidence interval curves containing most of the curves (as usually done for scalar random variables with the boxplot tool)?
- Can we detect some abnormal curves, in the sense of a strong difference from the majority of the curves (as outliers for scalar variables)?
- Are there some clusters, which correspond to different behaviors of the physical model that generated these outputs?

These questions can be answered by methods found in the recent technical literature by way of Principal Component Analysis (PCA) methods, with a statistical viewpoint [12-14] or with a visualization viewpoint [15-17]. However, for augmented ensembles new challenges arise because a member of such an ensemble consists of a set of input parameters (which drove a numerical simulation) and its associated functional output. First, the interactive exploration needs a methodology able to visually provide, to the analyst, the statistical structure of the curves and the identification of clusters. Second, if the clusters of functional outputs correspond to groups of coherent behaviors of the simulations, is it possible to visually study the relationship between these behaviors and the input parameters? This question implies the realization of a visual sensitivity analysis, which we realize by linking the classical tool of the cobweb plot in sensitivity analysis [4] and the ensemble of curves visualization described before.

The following section lists the main previous works on the subjects covered by this paper. The third section explains the method used for estimating functional quantiles, while the fourth section describes how to perform the visual sensitivity analysis. In the fifth section, applications of the methodology are given on toy examples and an industrial example. The last two sections provide a discussion on software implementation and a conclusion.

BACKGROUND AND RELATED WORK

First of all, our work relates to uncertainty and sensitivity analysis. In particular, global sensitivity analysis is an ensemble of techniques which aim to identify the influential and non-influential inputs on some computer model outputs [4]. More specifically, quantitative global sensitivity analysis deals with a probabilistic representation of the input parameters to consider their overall variation range. Variance-based sensitivity measures, also called Sobol' indices [18], are currently the most popular method for global sensitivity analysis [5]. The principle of Sobol' indices is to decompose the variance of the output, $Y$, of the simulation into fractions, which can be attributed to each of the random model inputs $X_i$ (with $i = 1, \dots, p$ where $p$ is the number of inputs). When $Y$ is a scalar output, these percentages are directly interpreted as measures of sensitivity. However, sensitivity analysis for large scale numerical systems that simulate complex spatial and temporal evolutions remains very challenging because of the treatment of uncertainty [2], the treatment of the functional nature of the output [19,20] and the large volumes of data that could be produced [21]. Our main contribution is the realization of a visual sensitivity analysis, linking the cobweb plot (a classical graphical tool in sensitivity analysis [4]) and the ensemble of curves visualization.

Figure 1. Raw visualization of curves coming from a multi-run hydraulics simulation: 1500 curves of water height evolving over time.

One of the difficulties when visualizing several one-dimensional curves is to avoid visual clutter. An interesting solution is given by [22], which proposes a "curve density estimation" directly in the curves' space. Other visualizations may represent overlaid function graphs as envelopes [23] or semi-transparent graphs [24], and offer brushing techniques to highlight selected subsets of the functions [23-25]. Another approach offers a re-orderable matrix of time series charts [26]. The way these methods deal with visual clutter is very different from our approach, which offers quantified statistical information by calculating quantile bands and outliers.

In our work, we use the extension of the classical boxplot to functions: the functional boxplot proposed in [12,27]. A boxplot for scalar variables summarizes the main information of a data sample: median, first and third quartiles, and an interquantile-based interval which defines the limit of non-outlier data. The first step to build such a boxplot is to rank data thanks to a statistical order or data depth; such an order has first to be defined for functional data [28], which has led to numerous research works in the literature. The concept of functional data depth has been generalized to contours by [15], which displays boxplots for two-dimensional simulation data in weather forecasting and computational fluid dynamics. The so-called band depth, defined by [28], is particularly relevant for the goals of [15]. Band depth is defined on an ensemble of functions: the band depth of each function is the probability that the function lies within the band defined by a random selection of other functions from the distribution. The band depth is computed for each member of the ensemble and can be used, as described in [27], to visualize summary statistics for an ensemble of functions. Our methodology differs from [15] because we do not use data depth: the functional summary statistics are calculated through an alternative method based on a PCA projection. Furthermore, our focus is on augmented ensembles and not on ensembles of contours.

In our method, the functional curves are handled by reducing their dimension via projection, using PCA as in [12] (which is limited to the first two components). Reference [13] introduced this PCA-based approach for visualizing (but non-interactively) functional outputs of computer experiments. Later, [14] extended the technique to selecting and modeling more than two PCA components by advanced statistical techniques. Our choice of using PCA is firstly motivated by our idea to jointly interact with the PCA-plane (defined in section "Projecting on the PCA bivariate plane") in which each function is represented by a 2D point. Thus, in this aspect the method relates to dimensionality reduction techniques but using a human-in-the-loop approach; see [29] for a structured literature review and references on this field. Furthermore, the PCA-plane allows, at the same time, to calculate functional quantiles and to study the multimodal nature of ensembles. We remark that the data-depth-based techniques of [27] do not deal with multimodality.

The PCA technique reduces the data dimension via a linear transformation. In some cases, such a transformation does not work due to the underlying structure of the data (see an engineering example in [30]). Non-linear dimension reduction techniques (non-linear PCA, kernel PCA, Riemannian manifold learning, locally linear embedding, etc., see [31]) can then be used with a certain increase in complexity and computational cost. The pragmatic approach consists in first applying a PCA, and then turning to non-linear methods if the data variability is not well captured by a small number of PCA components. Using non-linear dimension reduction techniques is beyond the scope of this paper and will be studied in future works.

Reference [17] presents a method for computing streamline variability plots. It consists of the transformation, via PCA, of the set of streamlines into a lower dimension space in which clustering can be performed. Clustering by fitting geometric medians and confidence ellipses is performed in PCA-space. Finally, medians and ellipses are transformed back to the domain space and yield the variability plot of the streamline ensemble. Reference [16], by the same authors, applies similar techniques to ensembles of iso-surfaces. Our methodology also uses PCA in order to work in a lower dimension space and then re-project to the original dual space. However, the operations performed in the PCA-space are very different: [16, 17] perform clustering while we calculate HDRs (Highest Density Regions). This operation necessitates an estimate of the empirical density function in the PCA space, which uses kernels, thus avoiding fitting a parametric model (such as ellipses). HDRs have a unique and strict mathematical definition, and are at the core of the method used to calculate a quantitative and non-parametric variability of the data. HDRs can also be used for clustering; in our current implementation they assist the users in this task.

Brushing and linking is extensively used in our system but we have no new contribution in this area; the interactions we use were already described in classical works such as [32,33]. Other works have applied these classical techniques to ensembles of functions, for instance [34] for the investigation of families of data surfaces, [35] to analyze 2D function ensembles in the development process of powertrain systems and [36] for the interactive visual exploration of large 3D scalar ensembles. These references demonstrate the necessity for a flexible visual analysis system that integrates many different linked views for making sense of this complex data. In this context, using brushing and linking together with statistical aggregations, as in [34,36], is appealing; we differ from these works because we do not perform any statistical aggregation (i.e. the computation of statistical moments) and prefer quantile analysis, which does not introduce any hypothesis about the underlying data distribution.

We finally remark that our work is inscribed in the field of visual parameter space analysis. This approach was used by [37] in an interactive system called HyperMoVal that was designed to support model validation. These models relate to the development of car engines, for tasks that require a prediction of results in real time. Other examples of this visual analysis include [38], which combined a sensitivity analysis with a linked multi-dimensional visualization, providing a way to analyze the behavior of an artificial neural network. Reference [39] addresses the problem of parameter-finding in image segmentation by visually guiding the user towards areas that need refinement, in a sparsely sampled parameter space, by placing additional sample points. In a second stage the user navigates through the parameter space in order to determine areas where the response value (goodness of segmentation) is high.

Reference [40] presents a conceptual framework in which six typical analysis tasks can be performed: optimization, partitioning, fitting, outliers, uncertainty and sensitivity. Numerous examples exist of parameter space analysis for optimization, such as [37] or [39]. Our work differs from most references [34-38] in the analysis tasks we focus on (detecting outliers, partitioning and visualizing sensitivities). In fact, our main contribution is the realization of a visual sensitivity study in the context of time-evolving numerical simulations.

ESTIMATING FUNCTIONAL QUANTILES AND OUTLIERS

The estimation of the quantiles and outliers of an ensemble of functions is performed on a plane defined by the first two vectors of its Principal Component Analysis (PCA). The method is divided into 3 main steps:
1. Project the functions into the PCA bivariate plane;
2. Perform the estimation of the Probability Density Function (PDF) on this plane, which allows for the estimation of Highest Density Regions (HDR) whose boundaries are isoprobability contours;
3. Project the HDR boundaries back into the space of curves. The functional quantiles and outliers are then computed.

Projecting on the PCA bivariate plane

PCA is a technique of dimensionality reduction, whose purpose is to represent the source data in a new space of lower dimension. It is mathematically defined as an orthogonal linear transformation, which maps the data to a new coordinate system such that the greatest variance comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate (called the second principal component), and so on. Projecting the curves into the PCA bivariate plane means that only the first and second principal components are kept, thus each curve is represented by a point in a two-dimensional space (the bivariate plane).

There is an underlying condition for PCA reduction: the transformation should keep enough information about the source data, while simplifying the analysis. In this article, we limit our scope to a bivariate plane for simplicity but an extension to larger dimensions is possible. Moreover, our examples have a high explained variance using the first two PCA components (which means that the two-dimensional reduced PCA basis correctly reproduces the overall variability of the ensemble of curves). This explained variance is calculated from the singular values of the covariance matrix $(\sigma_1, \sigma_2, \dots, \sigma_n)$ in the following way:

$$v = \frac{\sigma_1 + \sigma_2}{\sigma_1 + \sigma_2 + \dots + \sigma_n}$$

where $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_n$ are the ordered singular values representing the importance of the variance of each principal component. In our implementations, this quantity is systematically visualized on the diagrams containing the bivariate plane.

As said before, working with three or more PCA components is also possible. We have prepared a prototype that uses an interactive Scatter Plot Matrix view, based on [41], for the interaction with the n-dimensional space of PCA dimensions. However, the real challenge in this case is the computational complexity associated with the estimation of densities in higher dimensions. The work of [14] addresses this problem.

Highest Density Regions method

Once the functional variable has been transformed to a space with fewer components (in our case the bivariate plane), the next step is to estimate the quantiles, which also allows outliers to be detected. Conceptually, two basic operations should be performed to build the quantiles: i) create a density map on the plane and ii) calculate iso-probability curves of this map. To implement these operations we follow the method of Highest Density Regions of [12]. Full mathematical details are given in [12], but the principle of this method is to assimilate observations in the bivariate plane of principal components to the realizations of a random vector with density $f$. By calculating an estimate $\hat{f}$ of the density map, the quantiles can then be computed.

We start from the sample $(X_i)_{i=1,\dots,n}$, which stands for $n$ observations of the vector $X$ of dimension $p = 2$. The following smoothing process is used:

$$\hat{f}(X) = \frac{1}{n} \sum_{i=1}^{n} K_H(X - X_i)$$

where $K_H$ is the Gaussian smoothing kernel, which writes

$$K_H(X) = |H|^{-1/2} K(H^{-1/2} X)$$

with $K(X) = \frac{1}{2\pi} \exp\left(-\frac{1}{2} \langle X, X \rangle\right)$ the "standard" Gaussian kernel and $H$ the matrix containing the smoothing parameters (extension in $p$ dimensions of a smoothing parameter $h$ in dimension 1). Depending on this matrix (diagonal or not), some preferential smoothing directions can be chosen. In our implementation, we first generate a grid covering the bivariate plane, which is initialized to 100x100 but can be customized by means of the user interface. We subsequently apply an isotropic kernel, whose width is automatically initialized by use of the rule of Silverman [42].

Once the estimate of the density map $\hat{f}$ is obtained, the Highest Density Regions method (HDR) gives a description of important statistical information. An HDR is defined as

$$R_\alpha = \{Z : \hat{f}(Z) \geq f_\alpha\}$$

where $\alpha$ is the order of the quantile we choose and the quantile value $f_\alpha$ is such that

$$\int_{R_\alpha} \hat{f}(Z)\, dZ = 1 - \alpha,$$

which defines the region with probability coverage $1 - \alpha$. It means that $f_\alpha$ is such that all points within the region $R_\alpha$ have a higher density estimate than any of the points outside the region, hence the name highest density region. For a density map, the HDRs can be considered as regions bounded by contours, with an expanding coverage as $\alpha$ decreases. In our implementation we compute two HDRs:
- An inner HDR with probability coverage $1 - \alpha = 50\%$, which corresponds to the central interquartile zone;
- An outer HDR whose probability coverage $1 - \alpha$ can be modified interactively via a slider (default value 95%).

We consider that all points excluded from the outer HDR can be outliers. This allows the threshold used to compute the outliers to be changed interactively.
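The chain described in this section (projection on the first two principal components, Gaussian kernel density estimate on a grid, HDR levels, outlier flagging) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names, the toy ensemble and the isotropic Silverman-type bandwidth (mean score standard deviation times n^(-1/6)) are hypothetical choices made for the example.

```python
import numpy as np

def pca_plane(curves):
    """Project curves (n_curves x n_samples) onto their first two principal components."""
    mean = curves.mean(axis=0)
    u, s, vt = np.linalg.svd(curves - mean, full_matrices=False)
    scores = u[:, :2] * s[:2]                        # one 2D point per curve
    # s**2 is proportional to the eigenvalues of the covariance matrix
    explained = (s[:2] ** 2).sum() / (s ** 2).sum()
    return scores, vt[:2], mean, explained

def density_map(scores, grid_size=100):
    """Isotropic Gaussian KDE on a regular grid covering the bivariate plane."""
    n = scores.shape[0]
    # Assumed isotropic Silverman-type rule in dimension 2: h = mean(std) * n**(-1/6)
    h = scores.std(axis=0, ddof=1).mean() * n ** (-1.0 / 6.0)
    gx = np.linspace(scores[:, 0].min() - 3 * h, scores[:, 0].max() + 3 * h, grid_size)
    gy = np.linspace(scores[:, 1].min() - 3 * h, scores[:, 1].max() + 3 * h, grid_size)
    # An isotropic Gaussian kernel factorises, so the grid density is a matrix product
    kx = np.exp(-0.5 * ((gx[:, None] - scores[None, :, 0]) / h) ** 2)
    ky = np.exp(-0.5 * ((gy[:, None] - scores[None, :, 1]) / h) ** 2)
    dens = kx @ ky.T / (n * 2.0 * np.pi * h ** 2)
    return gx, gy, dens, h

def hdr_level(dens, coverage):
    """Density level whose super-level set holds `coverage` of the gridded mass."""
    flat = np.sort(dens.ravel())[::-1]               # densities, highest first
    mass = np.cumsum(flat) / flat.sum()              # renormalised on the grid
    return flat[min(np.searchsorted(mass, coverage), flat.size - 1)]

def density_at(points, scores, h):
    """Evaluate the same KDE at arbitrary points of the plane."""
    d2 = ((points[:, None, :] - scores[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * d2 / h ** 2).sum(axis=1) / (scores.shape[0] * 2.0 * np.pi * h ** 2)

# Hypothetical toy ensemble: sinusoids with random amplitude and offset, plus small noise
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
curves = (rng.normal(1.0, 0.3, (300, 1)) * np.sin(2.0 * np.pi * t)
          + rng.normal(0.0, 0.5, (300, 1))
          + rng.normal(0.0, 0.02, (300, t.size)))

scores, components, mean, explained = pca_plane(curves)
gx, gy, dens, h = density_map(scores)
f50, f95 = hdr_level(dens, 0.50), hdr_level(dens, 0.95)
# Curves whose point lies outside the outer (95%) HDR are flagged as candidate outliers
outliers = np.flatnonzero(density_at(scores, scores, h) < f95)
print(f"explained variance: {explained:.3f}, outliers: {outliers.size} of {curves.shape[0]}")
```

Sorting the gridded densities and accumulating their mass is one simple way to find the level $f_\alpha$; changing the `coverage` argument plays the role of the slider mentioned above.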
Back into the curves space

Once both HDRs are calculated, we would like to see them in the original space of the curves. We recall that a point in the PCA plane corresponds to a curve. However, HDRs represent areas of the PCA plane, whose boundaries are contours. It is then necessary to run an algorithm that converts these contours into their associated functional quantiles. We propose a first exact algorithm that traverses all points of the discretized boundary and chooses the maximum and minimum values in the curves space. This process generates functional quantiles that are not necessarily existing curves of the ensemble.

Figure 2 shows the analysis of the dataset shown in Figure 1: the 50% inter-quantile area is represented in light red, dark red is used for the 95% inter-quantile zone. The median curve (in black) is calculated by finding the point in the bivariate plane that presents the highest value of the density map.

Figure 2. The functional quantiles and the median (black line) of the multi-run hydraulics simulation curves shown in Figure 1.

Information contained in the bivariate plane

The bivariate PCA plane presents some important characteristics worth discussing:
- It provides a data reduction and a visually understandable representation, where each curve is represented by a point;
- The probability density map associated with the bivariate plane allows for the calculation of the median curve, functional quantiles and outliers;
- The density map conveys important information about the modality of the curves dataset. Indeed, statistical multimodality is normally associated with a mixture of unimodal distributions. Each of the underlying modes defines different behaviors of the curves and thus the original data can be divided into clusters. The HDR exposes the mono- or multi-modal nature of the dataset. If an HDR is formed by disjoint areas, the distribution is multimodal.

In the presented method, both the diagram containing the curves (the functional boxplot) and the bivariate PCA plane are jointly visualized by use of a brushing and linking strategy. In our system brushing corresponds to a selection operation. Thus, we offer tools to select individual points or subsets of points in the bivariate plane, which highlights their corresponding curves. The opposite scheme, "select curve, highlight point", is also available. We also set meaningful limits to the exploration by drawing two iso-probability curves. The first contour is fixed at probability 50% while the outer one is controlled by a slider in the user interface. We finally define a blue vs red colormap to help the visual interaction with the plane (blue meaning low probability and red high probability).

LINKING WITH THE INPUT PARAMETERS

The proposed methodology also allows for the study of the augmented ensemble, where each member is a couple consisting of the input parameters and a functional output. Thus, a member is represented as a couple $(p_N, f)$ where $p_N$ is a list of $N$ parameters and $f$ is a function. We constrain our current study to one-dimensional functions $f$ (curves). We also remark that the number $N$ of input parameters is not necessarily small. Current numerical simulations can easily present $N = 50$. The whole ensemble is represented as $(p_N, f)_M$ where $M$ is the number of couples or, equivalently, the number of members in the ensemble. We remark that input parameters and functional outputs share a common index on the ensemble. Thus, it is technically possible to share the same selection strategy between them. Indeed, our linking strategy allows the use of fully coupled diagrams in order to interact, simultaneously, with the input parameters and functional outputs of the multi-run simulations.

One important consequence of this joint visualization is that it allows for what we define as a visual sensitivity analysis. As a matter of fact, multi-run simulations are often used to determine the impact of input parameters on the results of the simulations, which is called a sensitivity analysis [4]. Our system propagates the selections performed on the bivariate plane or on the functional boxplot to the diagrams associated with the input parameters. This allows the exploration of complex input-output relationships.

RESULTS

In this section, we discuss experimental results to demonstrate the utility of the proposed PCA-based functional boxplot, which allows:
1. To study the variance of the curves generated from a multi-run numerical simulation;
2. To detect functional outliers;
3. To identify clusters of curves that correspond to different behaviors of members of the ensemble;
4. To perform a visual sensitivity study.

The discussion starts with two synthetic examples, and then an industrial use-case concerning a marine current study using a hydraulic solver is presented.

Oscillating tangents

Our first synthetic example consists of an ensemble of time-oscillating analytical functions coming from the following equation:

$$y(t) = \operatorname{atan}(X_1)\cos(t) + \operatorname{atan}(X_2)\sin(t)$$

where $X_1$ and $X_2$ are the input parameters and $t$ represents the time, which is regularly sampled in the interval $[0, 2\pi]$. We generate an ensemble of 400 curves by Monte Carlo sampling of both $X_1$ and $X_2$ based on a uniform distribution in the interval $[-7, 7]$.

In Figure 3, we show some results concerning this ensemble of temporal oscillating functions. On the top panel (a) all 400 generated curves are shown. Figure 3 (b) shows the result of a user interaction with the PCA bivariate plane of the 400 curves, where a blue to red colormap is applied. The explained variance is equal to 100%. This surprising result is explained by the fact that the curves of the oscillating tangents function are regular sinusoids only tuned by their amplitude and phase.

Figure 3. Top panel (a) shows 400 curves generated by the temporal oscillating tangents experiment. Bottom panel (b) shows the results of a user interaction where the analyst has selected (pink points) one of the clusters of the PCA plane.

We can see that four clusters appear, indicating a multi-modal structure of the oscillating curves. The analyst has selected one of these clusters; the propagated selection on the curves is then highlighted, and this selection corresponds to variations of the same oscillating mode. This example demonstrates the interest of visualizing and interacting with the PCA bivariate plane in the context of a partitioning task [40]. Understanding a multimodal ensemble of curves is indeed a complex analysis task.
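The oscillating tangents ensemble is easy to regenerate, which gives a quick check of the 100% explained variance. The sketch below assumes the arctangent/cosine/sine form of the equation, 100 time samples (the text does not state how many are used) and an arbitrary random seed; labelling the modes by the signs of $X_1$ and $X_2$ is only one plausible reading of the four clusters (the arctangent saturates near $\pm\pi/2$ over most of $[-7, 7]$), not a procedure taken from the article.

```python
import numpy as np

rng = np.random.default_rng(0)                      # arbitrary seed
t = np.linspace(0.0, 2.0 * np.pi, 100)              # assumed number of time samples
X = rng.uniform(-7.0, 7.0, size=(400, 2))           # Monte Carlo sample of X1, X2

# y(t) = atan(X1) cos(t) + atan(X2) sin(t), one row per ensemble member
curves = np.arctan(X[:, [0]]) * np.cos(t) + np.arctan(X[:, [1]]) * np.sin(t)

# Every curve is a combination of cos(t) and sin(t), so two components suffice
centered = curves - curves.mean(axis=0)
s = np.linalg.svd(centered, compute_uv=False)
explained = (s[:2] ** 2).sum() / (s ** 2).sum()

# Hypothetical mode labels: one group per sign pattern of (X1, X2)
labels = 2 * (X[:, 0] > 0) + (X[:, 1] > 0)
counts = np.bincount(labels, minlength=4)
print(f"explained variance: {explained:.6f}, members per sign pattern: {counts}")
```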
Campbell 1D functions

Our second synthetic example is inspired by [43, 19]. It consists of an ensemble of analytical functions that evolve in time. This dataset is generated by use of the following equation:

$$y(\tau) = 10 + X_1 \exp\left(\frac{-(\tau - 10 X_2)^2}{k_1 X_1^2 + X_3^2}\right) + X_2 X_4 \exp\left(k_2 X_1 \tau\right)$$

where $X_1$, $X_2$, $X_3$ and $X_4$ are the input parameters and $\tau$ is a variable regularly sampled, one by one, in the interval $[-90, 90]$. The quantities $k_1$ and $k_2$ are constants, fixed to 60 and 0.002 respectively. Reference [43] introduced a slightly different version of this function in order to test simple sensitivity analysis tools when model outputs are 1D curves (understanding the role of each of the four inputs on the translation from left to right of the curve, on the shape of the curve peak and on the curve tail behavior). From this, [19] calibrated a function (called Campbell2D) in order to illustrate tools of sensitivity analysis when model outputs are 2D spatial functions (with strong spatial heterogeneities, sharp boundaries, and very different spatial distributions of the output values according to the $X$ values).

We generate an ensemble of 400 curves by Monte Carlo sampling based on a uniform distribution in the interval $[-1, 5]$; the same sampling is used for all $X_i$. In the upper panel (a) of Figure 4, we show all 400 curves generated by the Monte Carlo sampling. At time 80 an event occurs and part of the curves diverge from their original tendency while the others keep their original behavior. This can be easily understood by looking at the functional interquantile areas and at the median curve, presented in the bottom panel (b) of Figure 4. Indeed, the median curve and the 50% interquantile area are not modified by the event, while the upper limit of the 95% interquantile area rises. By looking at this representation, an analyst avoids visual clutter and easily understands that the event at time 80 affected only the evolution in time of the top 25% of the curves. The explained variance by the two PCA components is equal to 97%.
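As an illustration of how the median curve and the interquantile areas can be obtained for such an ensemble, the sketch below generates Campbell-like curves and back-projects the 50% and 95% HDRs into the curves space. Several points are assumptions rather than facts from the article: the exact reading of the modified Campbell equation (written out in the code comment), the random seed, the isotropic Silverman-type bandwidth, and the use of all grid points inside a region instead of its boundary only. The last choice gives the same envelope, because each curve value is a linear function of the plane coordinates and therefore reaches its extremes on the boundary.

```python
import numpy as np

rng = np.random.default_rng(1)                      # arbitrary seed
tau = np.arange(-90.0, 91.0)                        # one-by-one sampling of [-90, 90]
k1, k2 = 60.0, 0.002
X = rng.uniform(-1.0, 5.0, size=(400, 4))           # columns X1..X4
x1, x2, x3, x4 = (X[:, [j]] for j in range(4))
# Assumed reading of the modified Campbell function:
# y = 10 + X1 exp(-(tau - 10 X2)^2 / (k1 X1^2 + X3^2)) + X2 X4 exp(k2 X1 tau)
curves = (10.0 + x1 * np.exp(-(tau - 10.0 * x2) ** 2 / (k1 * x1 ** 2 + x3 ** 2))
          + x2 * x4 * np.exp(k2 * x1 * tau))

# Projection on the PCA bivariate plane
mean = curves.mean(axis=0)
u, s, vt = np.linalg.svd(curves - mean, full_matrices=False)
scores, components = u[:, :2] * s[:2], vt[:2]

# Isotropic Gaussian KDE on a 100x100 grid (assumed Silverman-type bandwidth)
n = scores.shape[0]
h = scores.std(axis=0, ddof=1).mean() * n ** (-1.0 / 6.0)
gx = np.linspace(scores[:, 0].min() - 3 * h, scores[:, 0].max() + 3 * h, 100)
gy = np.linspace(scores[:, 1].min() - 3 * h, scores[:, 1].max() + 3 * h, 100)
kx = np.exp(-0.5 * ((gx[:, None] - scores[None, :, 0]) / h) ** 2)
ky = np.exp(-0.5 * ((gy[:, None] - scores[None, :, 1]) / h) ** 2)
dens = kx @ ky.T / (n * 2.0 * np.pi * h ** 2)

def hdr_mask(coverage):
    """Grid cells belonging to the HDR with the requested probability coverage."""
    flat = np.sort(dens.ravel())[::-1]
    mass = np.cumsum(flat) / flat.sum()
    level = flat[min(np.searchsorted(mass, coverage), flat.size - 1)]
    return dens >= level

def envelope(mask):
    """Back-project every grid point of a region; keep the pointwise min and max."""
    ii, jj = np.nonzero(mask)
    back = mean + np.column_stack([gx[ii], gy[jj]]) @ components
    return back.min(axis=0), back.max(axis=0)

lo50, up50 = envelope(hdr_mask(0.50))
lo95, up95 = envelope(hdr_mask(0.95))
# Median curve: back-projection of the grid point with the highest density
i, j = np.unravel_index(dens.argmax(), dens.shape)
median = mean + np.array([gx[i], gy[j]]) @ components
```

Plotting `lo95`/`up95` and `lo50`/`up50` as filled bands around `median` would give a functional boxplot in the spirit of panel (b) of Figure 4; the numerical values depend on the assumptions listed above.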
Figure 4. Top panel (a) shows 400 curves generated by the modified 1D Campbell function experiment. Bottom panel (b) shows the corresponding median curve and interquantile areas at 50% and 95% probability.

Once it is established that there is a specific group of temporally evolving functions whose behavior is modified by the event, the analyst is interested in knowing if some of the input parameters are responsible for this behavior. It is then possible to perform a visual sensitivity study by selecting, in the functional boxplot diagrams, all the curves ending in the upper part of the 95% interquantile area. Then the system propagates the selection to the diagrams dealing with the input parameters. The result of this operation is shown in Figure 5. In this figure, two interactive linked diagrams are presented: the diagram on the top contains the analysis of the outputs while the bottom parallel coordinates diagram represents the inputs. The interpretation of the diagrams is straightforward. Indeed, it is possible to visually assess the importance of each parameter by looking at the axes of the parallel coordinates diagram. In this case, $X_1$ and $X_2$ present a high degree of concentration of the selection, thus they strongly influence the outputs. Using this simple criterion of "visual dispersion" the parameters can be ordered by importance ($X_1$, $X_2$, $X_4$, $X_3$), which is one of the main objectives of a sensitivity analysis.

Figure 5. Realization of a visual sensitivity study over a synthetically generated ensemble using the modified 1D Campbell function. Two interactive linked diagrams are presented: the diagram on the top contains the analysis of the outputs (the curves), while the bottom parallel coordinates diagram represents the four input parameters of the Campbell function. The analyst has selected a group of curves on the top diagram, and this selection is propagated. We superpose "left bracket" symbols on the parallel coordinates diagram in order to visually reinforce the dispersion of the propagated selection, which is a measure of sensitivity.

In our system, the analyst "asks a question" by using the selection. In this case the question was: "which parameters generated the curves ending with the highest values?". Of course, numerous other questions are possible, based on the selection on the functional boxplot, the bivariate plane or other diagrams linked to the ensemble data. This example demonstrates that our system can perform a visual sensitivity analysis. This kind of fast and informative exploration could be performed before a formal sensitivity analysis, such as the computation of Sobol' indices (see [5] for this methodological point of view).

A hydraulics study-case

Our study-case concerns a maritime model of the Alderney Race (or Raz Blanchard in French), which is a strait that runs between Alderney (UK) and Cap de la Hague (France), a cape at the northwestern tip of the Cotentin peninsula in Normandy. This strait presents one of the fastest marine currents in Europe; the current is intermittent, varying with the tide, and can run up to about twelve knots during equinoctial tides.

A study was performed in order to calibrate a hydrodynamic model, which is typically an involved and difficult process due to the complexity of the flows and their interaction with the shoreline, the seabed and the islands. Thus, it was essential to understand the relationship between the modelling calibration parameters and the simulated state variables which are compared to the observations. A sensitivity analysis using Sobol' indices was a necessary step prior to calibration. In this context, several multi-run studies were performed. In this section we focus on a particular 1,500-run study where five parameters were varied:
- Two coefficients of friction (CF1 and CF2) modeling the interaction with the seabed.
- One "SeaLevel" representing the vertical distance from the surface to the seabed.
- Two parameters for tidal modeling: the tidal range (vertical variation range) and the tidal velocity.

Figure 6 shows the result of two interactions. A functional boxplot containing the analysis of the 1,500 curves is linked to the input parameters, which are represented in a parallel coordinates diagram. In this figure, the analyst explores the relationship between the functional outputs and the parameter "Sea Level". On the top panel (a) of Figure 6, the analyst selects the highest values of "Sea Level" while on the bottom panel (b) the lowest values are selected. By looking at the propagated selections on the functional boxplots (in orange), it is easily understood that "Sea Level" behaves like a vertical offset on the oscillating curves generated by the tide. The analyst thus understands that "Sea Level" strongly influences the simulation results. This is coherent with the formal sensitivity analysis that was also performed. Sobol' indices were computed and they show that the parameter "Sea Level" strongly influences (around 97%) the outputs while the others have little influence. In addition, the information shown in Figure 6 is richer than the scalar Sobol' indices. Sobol' indices reveal the strong influence of the parameter "Sea Level" while Figure 6 underlines the way in which this influence acts (by applying a vertical shift to the tide).

Hydraulics engineers were also interested in using our system to study or verify which parameters do not influence the functional outputs. This step is fundamental for model
reduction where a parameter is taken out of a The maritime model includes Alderney model when it is considered as non-influential. and the tip of the Cotentin peninsula and covers Figure 7 shows the result of selecting the highest an area roughly 55 km x 35 km. The finite element values of CF1 (one of the coefficients of friction of mesh is composed of 17,983 nodes and 35,361 the seabed). We observe that its propagated triangular elements. The mesh size varies from selection on the functional boxplot is visually 100 m, at the shoreline and within the areas of disperse, which indicates that CF1 has no interest, to 1.8 km offshore (western and influence in the behavior of the outputs. This northern sectors of the model). The computations again is coherent in respect to the Sobol’ indices- were performed by the open source fluid based sensitivity analysis. Moreover, physicist dynamics solver TELEMAC [44] performing the study confirmed that CF1 and CF2 (http://www.opentelemac.org/) that generated should be non-influential in this case because the fields such as velocity, pressure, and water height. seabed is too deep for its friction to have an effect We extracted 1,500 curves of this multi-run study on the sea surface. The explained variance by the by use of a probe in one of the nodes of the mesh; two PCA components is equal to 99%. this leads to the curves shown in Figure 1. Propagated Selection Selection (a) High values of “Sea Level” Propagated Selection Selection (b) Low values of “Sea Level” Figure 6. Interactive exploration of the relationship between the functional outputs and the parameter “Sea Level” in a marine hydraulics multi-run study, which shows that this parameter applies a vertical shift to the tide. High Dispersion Selection Figure 7. 
Interactive exploration of the relationship between the functional outputs and “CF1” (coefficient of friction 1) shows that this parameter is not relevant for this study, due to of the high dispersion of the propagated selection. Finally, Figure 8 shows a more subtle SOFTWARE IMPLEMENTATION result. The analyst interacted with the propagated selected curves of the bottom panel (b) of Figure The system described in this article was 6. Our system allows refining selections, then sub- developed by a collaboration between ensembles of the curves with low “Sea Level” visualization scientists and statisticians. The aim is values were selected and a second order or the development of mathematical tools to study indirect effect was observed. Figure 8 illustrates and analyze multi-run simulations, before this second order effect by selecting: integrating the more efficient algorithms in the (a) low “Sea Level” and high “Tidal Range” OpenTURNS software [45]. It was decided to values, design and implement new interactive visual (b) low “Sea Level” and low “Tidal Range” values. analytics methods in OpenTURNS by integrating Comparing the curves (in pink) selected on Figure the new developments into ParaView [46]. 8, we observe two modes of oscillation of the tide: OpenTURNS and ParaView are both integrated in for a fixed “Sea Level” the “Tidal Range” controls the SALOME open-source numerical simulation the amplitude of oscillation of the tide. The platform [47]. existence of this behavior is coherent with the The original idea was to introduce a physics of the problem but it could not be Functional Boxplot view in ParaView in order to observed in the performed Sobol-based sensitivity avoid visual clutter and interactively study the analysis. outliers of an ensemble of curves. The bivariate PCA plane and the High Density Regions (HDR) were seen as a way of augmenting the information of the Functional Box Plot view, which simulations dealing with uncertainty. 
These was an advantage over functional depth methods augmented ensembles are composed by functions like [27-28]. Data depth does not allow to display and their associated parameters. The main data multimodality but only to calculate quantiles contribution of our system is that a visual of functions. In our system, if the structure is sensitivity study becomes possible by jointly multi-modal then the analyst can visually identify analyzing functional outputs and their the clusters, which are disjoint regions of the inner corresponding input parameters. HDR. Furthermore, the bivariate PCA plane could Figure 9 synthesizes the overall be segmented by any automatic clustering methodology. Its principal element is based on algorithm. This is straightforward in our HDR computed on the PCA bivariate plane. This integration in ParaView because a clustering allows the realization of the following tasks: algorithm can be added to its standard Avoid visual clutter by visualizing interquantile visualization pipeline. We positively tested the areas and the median curve; Paraview’s native implementation of k-means. Interactively detect functional outliers; On the other side, interacting among Identify clusters of functions by means of the views introduced problems in the architecture of HDR and PCA-plane. ParaView and, as a consequence of this work, the Combining all these elements with the so-called linked views mechanism was developed. linking of functional outputs to their In this context, other statistical views were also corresponding input parameters allows the implemented. We remark the implementation of realization of a visual sensitivity study. an interactive Scatter Plot Matrix view that is a Two synthetic examples and one version of the work of [41]. industrial use-case have allowed to demonstrate Finally, our system was fully integrated in the potential of the approach which has been ParaView and is available from version 5.0.1. 
This integrated in a software environment based on software being Open-Source, the examples given the ParaView and OpenTURNS platform. Current in this article can easily be reproduced. Indeed, we works turn to extend this method to larger include all data presented in this article as number of components retained in the PCA step supplemental material and to the visual sensitivity analysis of parameter- (https://gitlab.kitware.com/edf/visual-sensitivity- augmented ensembles of spatial fields. Indeed, in analysis-of-curves). a lot of applications, outputs of computer codes are vectors supported by surfaces (see some CONCLUSION examples in [2-3, 19-20]). Future works will also consider non-linear dimensionality reduction We have designed and implemented a techniques [31] in order to replace the PCA one. system allowing the in-depth study of augmented ensembles issued from multi-run numerical High amplitude tide Double Selection (a) Low amplitude tide Double Selection (b) Figure 8. Interactive exploration of a subtle phenomenon involving two parameters: “Sea Level” and “Tidal Range”. The effect of “Tidal Rage” is not the same depending on the value of “Sea Level”. The ensemble of sampled 1D functions is assembled into a matrix PCA -A kernel-based PDF is calculated on the plane of the two first PCA components - A blue to red colormap is applied to the PDF - HDR are estimated - HDR boundaries are isoprobability contours Back projection -The highest density point estimates the median -HDR boundaries correspond to the functional quantiles -Points outside the outer HDR (95%) gives outliers Figure 9. Scheme of the overall PCA-based methodology. [6] Love, A.L., Pang, A., and Kao, D.L., 2005, “Visualizing REFERENCES spatial multivalue data”, IEEE Computer Graphics and Applications, 25(3), pp. 69-79. [1] De Rocquigny, E., Devictor, N., and Tarantola, S. 
[7] Sanyal, J., Zhang, S., Dyer, J., Mercer, A., Amburn, P., (Eds), 2008, Uncertainty in industrial practice: a and Moorhead, R., 2010, “Noodles: A tool for guide to quantitative uncertainty management, visualization of numerical weather model John Wiley & Sons. ensemble uncertainty”, IEEE Transactions on [2] Smith, R.C., 2014, Uncertainty quantification, SIAM. Visualization and Computer Graphics, 16(6), pp. [3] Ghanem, R., Higdon, D., and Owhadi, H. (Eds.), 1421-30. 2017, Handbook of uncertainty quantification, [8] Potter, K., Wilson, A., Bremer, P.T., Williams, D., Springer. Doutriaux, C., Pascucci, V., and Johnson, C.R., [4] Saltelli, A., Chan, K., and Scott, E.M. (Eds.), 2000, 2009, “Ensemble-vis: A framework for the Sensitivity analysis, Wiley. statistical visualization of ensemble data”, In: [5] Iooss, B., and Lemaître, P., 2015, “A review on global Data Mining Workshops, ICDMW'09, IEEE sensitivity analysis methods”, In: Uncertainty International Conference, IEEE, pp. 233-240. Management in Simulation-Optimization of [9] Lodha, S.K., Wilson, C.M., and Sheehan, R.E., 1996, Complex Systems, Meloni, C., and Dellino, G. “LISTEN: sounding uncertainty visualization”, In: (Eds), pp. 101-122, Springer US. Proceedings of the 7th conference on Visualization'96, IEEE Computer Society Press. [10] Pang, A.T., Wittenbrink, C.M., and Lodha, S.K., [21] Terraz, T., Ribés, A., Fournier, Y, Iooss, B., and 1997, “Approaches to uncertainty visualization”, Raffin, B., 2017, “Melissa: Large Scale In Transit The Visual Computer, 13(8), pp. 370-90. Sensitivity Analysis Avoiding Intermediate Files”, [11] Johnson, C.R., and Sanderson, A.R., 2003, “A next In: The International Conference for High step: Visualizing errors and uncertainty”, IEEE Performance Computing, Networking, Storage Computer Graphics and Applications, 23(5), pp. 6- and Analysis (Supercomputing). 10. 
[22] Lampe, O.D., and Hauser, H., 2011, “Curve density [12] Hyndman, R.J., and Shang, H.L., 2010, “Rainbow estimates”, In: Computer Graphics Forum, 30(3), plots, bagplots, and boxplots for functional data”, pp. 633-642, Blackwell Publishing Ltd. Journal of Computational and Graphical Statistics, [23] Hochheiser, H., and Shneiderman, B., 2004, 19(1), pp. 29-45. “Dynamic query tools for time series data sets: [13] Popelin, A.-L., and Iooss, B., 2013, “Visualization timebox widgets for interactive exploration”, tools for uncertainty and sensitivity analyses on Information Visualization, 3(1), pp. 1-8. thermal-hydraulic transients”, In: Proceedings of [24] Konyha, Z., Matkovic, K., Gracanin, D., Jelovic, M., Joint International Conference on and Hauser, H., 2006, “Interactive visual analysis Supercomputing in Nuclear Applications and of families of function graphs”, IEEE Transactions Monte Carlo 2013 (SNA + MC 2013), Paris, France, on Visualization and Computer Graphics, 12(6), October 2013. pp. 1373-85. [14] Nanty, S., Helbert, C., Marrel, A., Pérot, N., and [25] Muigg, P., Kehrer, J., Oeltze, S., Piringer, H., Prieur, C., 2016, “Uncertainty quantification for Doleisch, H., Preim, B., and Hauser, H., 2008, “A functional dependent random variables”, Four‐level Focus+ Context Approach to Comput. Stat. 2016: DOI 10.1007/s00180-016- Interactive Visual Analysis of Temporal Features 0676-0. in Large Scientific Data”, In: Computer Graphics [15] Whitaker, R.T., Mirzargar, M., and Kirby, R.M., Forum, 27(3), pp. 775-782), Blackwell Publishing 2013, “Contour boxplots: A method for Ltd. characterizing uncertainty in feature sets from [26] McLachlan, P., Munzner, T., Koutsofios, E., and simulation ensembles”, IEEE Transactions on North, S., 2008, “LiveRAC: interactive visual Visualization and Computer Graphics, 19(12), pp. exploration of system management time-series 2713-22. 
data”, In: Proceedings of the SIGCHI Conference [16] Ferstl, F., Kanzler, M., Rautenhaus, M., and on Human Factors in Computing Systems, pp. Westermann, R., 2016, “Visual Analysis of Spatial 1483-1492, ACM. Variability and Global Correlations in Ensembles [27] Sun, Y., and Genton, M.G., 2011, “Functional of Iso‐Contours”, In: Computer Graphics Forum, boxplots”, Journal of Computational and 35(3), pp. 221-230. Graphical Statistics, 20(2), pp. 316-34. [28] López-Pintado, S., and Romo, J., 2009, “On the [17] Ferstl, F., Bürger, K., and Westermann, R., 2016, concept of depth for functional data”, Journal of “Streamline variability plots for characterizing the the American Statistical Association, 104(486), uncertainty in vector field ensembles”, IEEE pp. 718-34. Transactions on Visualization and Computer [29] Sacha, D., Zhang, L., Sedlmair, M., Lee, J.A., Graphics, 22(1), pp. 767-776. Peltonen, J., Weiskopf, D., North, S.C., and Keim, [18] Sobol, I.M., 1993, “Sensitivity estimates for D.A., 2017, “Visual interaction with nonlinear mathematical models”, Mathematical dimensionality reduction: A structured literature Modelling and Computational Experiments”, 1(4), analysis”, IEEE transactions on visualization and pp. 407-14. computer graphics, 23(1), pp. 241-50. [19] Marrel, A., Iooss, B., Jullien, M., Laurent, B., and [30] Auder, B., de Crecy, A., Iooss, B., and Marquès, M., Volkova, E., 2011, “Global sensitivity analysis for 2012, “Screening and metamodeling of computer models with spatially dependent output”, experiments with functional outputs. Application Environmetrics, 22, pp. 383-397. to thermal-hydraulic computations”, Reliability [20] Marrel, A., Saint-Geours, N., and De Lozzo, M., Engineering and System Safety, 107, pp. 122-131. 2017, “Sensitivity analysis of spatial and/or [31] Lee, J.A., and Verleysen, M., 2007, Nonlinear temporal phenomena”, In: Handbook of dimensionality reduction, Springer. 
Uncertainty Quantification, Ghanem, R., Higdon, [32] Becker, R.A., and Cleveland, W.S., 1987, “Brushing D., and Owhadi, H. (Eds), Springer. scatterplots”, Technometrics, 29(2), pp.127-142. [33] Keim, D.A., 2002, “Information visualization and [40] Sedlmair, M., Heinzl, C., Bruckner, S., Piringer, H., visual data mining”, IEEE transactions on and Möller, T., 2014, “Visual parameter space Visualization and Computer Graphics, 8(1), pp. 1- analysis: A conceptual framework”, IEEE 8. Transactions on Visualization and Computer [34] Matkovic, K., Gracanin, D., Klarin, B., and Hauser, Graphics, 20(12), pp. 2161-70. H., 2009, “Interactive visual analysis of complex [41] Elmqvist, N., Dragicevic, P., and Fekete, J.D., 2008, scientific data as families of data surfaces”, IEEE “Rolling the dice: Multidimensional visual Transactions on Visualization and Computer exploration using scatterplot matrix navigation”, Graphics, 15(6). IEEE transactions on Visualization and Computer [35] Piringer, H., Pajer, S., Berger, W., and Teichmann, Graphics, 14(6), pp. 1539-148. H., 2012, “Comparative visual analysis of 2d [42] Silverman, B.W., 1981, “Using kernel density function ensembles”, In: Computer Graphics estimates to investigate multimodality”, Journal Forum, 31(3), pp. 1195-1204, Blackwell Publishing of the Royal Statistical Society Series B Ltd. (Methodological), 1, pp 97-9. [36] Demir, I., Dick, C., and Westermann, R., 2014, [43] Campbell K, McKay M.D., and Williams B.J., 2006, “Multi-charts for comparative 3D ensemble “Sensitivity analysis when model outputs are visualization”, IEEE Transactions on Visualization functions”, Reliability Engineering & System and Computer Graphics, 20(12), pp. 2694-703. Safety, 91(10), pp.1468-72. [37] Piringer, H., Berger, W. and Krasser, J., 2010, June. [44] Hervouet,J-M., 2000, “TELEMAC modelling Hypermoval: Interactive visual validation of system: an overview”, Hydrological Processes, regression models for real-time simulation. In 14(13), pp. 
2209-10. Computer Graphics Forum (Vol. 29, No. 3, pp. [45] Baudin, M., Dutfoy, A., Iooss, B., and Popelin, A-L., 983-992). Oxford, UK: Blackwell Publishing Ltd. 2017, “Open TURNS: An industrial software for [38] Theron, R. and De Paz, J.F., 2006, September. uncertainty quantification in simulation”, In: Visual sensitivity analysis for artificial neural Ghanem R, Higdon D, Owhadi, H (Eds). Handbook networks. In International Conference on of Uncertainty Quantification, Springer. Intelligent Data Engineering and Automated [46] Ahrens, J., Geveci, B., and Law, C., 2005, Learning (pp. 191-198). Springer, Berlin, “Paraview: An end-user tool for large data Heidelberg. visualization”, The Visualization Handbook, 717. [39] T. Torsney-Weir et al., "Tuner: Principled [47] Ribés, A., and Bruneton A., 2014, “Visualizing Parameter Finding for Image Segmentation results in the SALOME platform for large Algorithms Using Visual Response Surface numerical simulations: an integration of Exploration," in IEEE Transactions on Visualization ParaView”, In: Large Data Analysis and and Computer Graphics, vol. 17, no. 12, pp. 1892- Visualization (LDAV), 2014 IEEE 4th Symposium 1901, Dec. 2011. on, pp. 119-120, IEEE.
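
The PCA-based methodology summarized in Figure 9, together with the “visual dispersion” criterion read from the parallel coordinates diagrams, can be sketched numerically. The block below is a minimal illustration under stated assumptions, not the ParaView or OpenTURNS implementation. The toy ensemble (a vertical offset, an amplitude and a dummy input), all function names, the Scott-rule bandwidth and the "selected range divided by full range" dispersion ratio are hypothetical choices made for this example only.

```python
import numpy as np


def pca_scores(curves, n_components=2):
    """Project curves (n_curves x n_times) on their first principal components."""
    centered = curves - curves.mean(axis=0)
    # SVD of the centered matrix: rows of vt are the principal directions.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s[:n_components] ** 2).sum() / (s ** 2).sum()
    return scores, explained


def kde_at_samples(points):
    """Gaussian kernel density estimate evaluated at the sample points.

    Assumption: diagonal bandwidth from Scott's rule, chosen for simplicity.
    """
    n, d = points.shape
    h = points.std(axis=0, ddof=1) * n ** (-1.0 / (d + 4))
    z = (points[:, None, :] - points[None, :, :]) / h   # (n, n, d) scaled differences
    kernel = np.exp(-0.5 * (z ** 2).sum(axis=2))
    return kernel.sum(axis=1) / (n * np.prod(h) * (2.0 * np.pi) ** (d / 2.0))


def hdr_summary(scores, levels=(0.5, 0.95)):
    """Median index, HDR membership per coverage level, and outlier indices."""
    density = kde_at_samples(scores)
    median_index = int(np.argmax(density))  # highest-density point estimates the median
    # A point lies in the p-HDR when its density is at least the (1 - p)
    # quantile of the densities evaluated at the sample points.
    inside = {p: density >= np.quantile(density, 1.0 - p) for p in levels}
    outliers = np.flatnonzero(~inside[max(levels)])
    return median_index, inside, outliers


def selection_dispersion(params, selected):
    """Range of the selected runs on each parameter axis, relative to the full range."""
    full_range = params.max(axis=0) - params.min(axis=0)
    sel = params[selected]
    return (sel.max(axis=0) - sel.min(axis=0)) / full_range


# Hypothetical toy ensemble: columns of params are offset, amplitude, dummy.
rng = np.random.default_rng(0)
n_runs = 400
t = np.linspace(0.0, 10.0, 200)
params = rng.uniform(0.0, 1.0, size=(n_runs, 3))
offset, amplitude = params[:, [0]], params[:, [1]]
curves = 5.0 * offset + (1.0 + amplitude) * np.sin(t)   # the dummy input is unused

scores, explained = pca_scores(curves)
median_index, inside, outliers = hdr_summary(scores)

# "Ask a question" with a selection: which runs end with the highest values?
end_values = curves[:, -1]
selected = end_values >= np.quantile(end_values, 0.9)
dispersion = selection_dispersion(params, selected)

print("explained variance:", round(float(explained), 4))
print("functional outliers:", len(outliers))
print("dispersion per parameter:", np.round(dispersion, 2))
```

On this toy ensemble the dispersion ratio should be small for the offset and close to one for the dummy input, which mirrors how a concentrated propagated selection flags an influential parameter while a dispersed one flags a non-influential parameter. Back projection of the HDR boundaries to functional quantile envelopes is left out to keep the sketch short.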

Statistics – arXiv (Cornell University)

**Published:** Feb 26, 2020
