Project On-line Deliverables: D05

Model Calibration and Uncertainty Analysis

Technical Report (FINAL)

     
    Programme name:  ESPRIT
    Domain:  HPCN
    Project acronym:  HITERM
    Contract number:  22723
    Project title:  High-Performance Computing and Networking 
    for Technological and Environmental Risk Management
    Project Deliverable: D05
    Related Work Package:  WP 5
    Type of Deliverable: Technical Report
    Dissemination level: project internal
    Document Author:  Peter Mieth, Susanne Wottrich, Irene Gerharz, Steffen Unger, GMD
    Edited by:  Kurt Fedra, ESS, Thomas Lux, GMD
    Document Version:  2.0 (Final) 
    First Availability:  1998 01 30
    Last Modification:  1999 7 15




EXECUTIVE SUMMARY

This deliverable describes model calibration rules, sensitivity studies and uncertainty analysis, and documents the results.

Model calibration and uncertainty analysis are two important tasks to encourage potential users to use the developed software and to ensure a maximum level of truthfulness. The use of state-of-the-art methods for sensitivity analysis (Automatic Differentiation) allows the quantification of the sensitivity of a model output with regard to its input parameters for a given input data set. This forms a basis for a pre-selection of parameters which have a dominant influence on the model result. The methodology is documented for a selected spill release model.

Monte Carlo simulation techniques are generally used for varying input parameter sets of release submodules. Rather than using a single value, a time dependent source strength probability function serves as the input for the dispersion calculation. Usually, the evaluation of atmospheric and aquatic dispersion models is based on large field experiments and cannot be executed within the framework of the HITERM project. However, a comparison of test cases with observed structures or a comparison of numerical model results with analytical solutions for simple cases was carried out, whenever it was possible.

Furthermore, due to the crucial role of the technical reliability and required execution time of the complete system, the overall performance is documented.





Table of Contents

1 INTRODUCTION 

2 UNCERTAINTY AND SENSITIVITY ANALYSIS 

  • 2.1 Automatic Differentiation for the Example: Application of ADIFOR to Subroutine Evapor.f 
    • 2.1.1 Pre-conditions 
    • 2.1.2 Specifying input 
    • 2.1.3 Invocation of ADIFOR 
    • 2.1.4 Results 
  • 2.2 Monte Carlo Simulation 
    • 2.2.1 The method 
    • 2.2.2 Example: Monte Carlo simulation for an evaporation module 

3 MODEL EVALUATION 

  • 3.1 The Wind Field Model 
  • 3.2 The Lagrangian Particle Model 
    • 3.2.1 Comparison of LPD results with Gaussian solution for a simple case 
    • 3.2.2 Test runs with different numbers of emitted particles 
    • 3.2.3 Dispersion calculation under different meteorological conditions 

4 REFERENCES 

A   APPENDIX 

  • A.1 Original Routine 
  • A.2 Changed Declarations 
  • A.3 Routine created by ADIFOR 




1 INTRODUCTION

Model performance evaluation is realized by a comparison of the model results with the real world. In the case of atmospheric and air pollution models, an enhanced model evaluation is quite difficult. Data from field campaigns together with data from continuous measuring nets has to be compared with simulated data sets. 

This is not within the scope of this project. Moreover, given the possibility to compare simulated with real data, it is hard to distinguish between real model errors and data incorrectness or inadequateness.

A model may be valid (this includes an appropriate modelling of the desired features, a proper selection of the numerical schemes, and a careful implementation), but the outcome can be insufficient due to a lack of appropriate input data. 

A validation requires the usage of a perfect input data set to distinguish between model and data. Additionally, there is a principle validation problem which is still the subject of scientific investigations. The reason can be found in the different nature of point measurements and gridded model results as well as in the stochastic behavior of local turbulence patterns. 

Most of the selected software components already have a broad range of applications, and can therefore be considered as sufficiently evaluated within their application range. This is the case for the wind field model which has been derived from EPA's DWM (Douglas 1990) and for the different release models. Nevertheless, an evaluation of the system is documented.

The model evaluation methodology in the context of the HITERM project focuses on the following features:

  • demonstration of the appropriate changes of the wind field calculation for different meteorological conditions

  • flow description for complex terrain

  • comparison of Lagrangian dispersion results with an analytical solution for a simple case

  • analysing the dispersion calculation for different atmospheric stability parameters

  • evaluation of the interplay of all submodules.

The major source of uncertainties is given by the, in general, sparse and insufficient data structure of emergency cases. Especially the differences caused by unknown release parameters can lead to uncertainties in the concentration calculation of up to one order of magnitude. 

2 UNCERTAINTY AND SENSITIVITY ANALYSIS

One of the most important input parameters for all kind of dispersion models is the time dependent emission rate of the source. This rate is sufficiently known in the case of a continuously operational source, such as the stack of a power plant or an industrial estate. In the case of an accident, this parameter is very rarely known. Very often, a rough approximation of the total release mass is given and, in some cases, this release mass is unknown as well. This requires the incorporation of different source-strength models. But release and spill models for emergency management tasks suffer from a high degree of uncertainty regarding input parameter set. Depending on the sensitivity of the source function, this is the reason for many unrealistic concentration patterns.

Two different methods have been used to determine the uncertainty range of the source module: Monte Carlo Simulation and Automatic Differentiation. Monte Carlo methods are directly included in the model system whereus Automatic Differentiation has been used in the implementation phase.

The validation of models is a central concern of Applied Mathematics. Sensitivity analysis is carried out in order to examine the robustness of numerical results with respect to changes of the model structure, input parameters, algorithms, machine accuracy etc. The effects of perturbations concerning data or computations can be studied by the application of the Monte Carlo method. Accurate sets of data are perturbed by random data corresponding in their distribution and correlation as closely as possible to those of the real data inaccuracy (Überhuber 1995).

Within HITERM the Monte Carlo approach is used to find a source-strength probability function with the help of measured input parameters and user specified uncertainty ranges for these parameters. The computation of the probability function requires thousands of runs of the same code with varying parameters. In the case of a very complex multi-parameter release module this can be very time-consuming. Additionally, the dependencies of the individual parameters are hidden.

A more general approach is used in automatic differentiation. With automatic differentiation, the sensitivity of the result with regard to the individual parameters can be quantitatively specified. This is especially useful for the preselection of parameters which have to be varied with the Monte Carlo method. If a new release module is added, a sensitivity analysis can be executed using automatic differentiation to filter out the most sensitive parameters for a later Monte Carlo simulation. Additionally, the automatic differentiation approach defines which parameters have to be determined with a maximum of accuracy. 

2.1 Automatic Differentiation for the Example: 
Application of ADIFOR to Subroutine Evapor.f 

Automatic differentiation techniques rely on the fact that every function on a computer is executed as a sequence of elementary operations. By applying the chain rule

repeatedly, derivatives correct up to machine accuracy can be computed in a completely mechanical fashion (Bischof 1992).

The Mathematics and Computer Science Division of the Argonne National Laboratory and the Center for Research on Parallel Computation at Rice University in Houston have developed a program for automatic differentiation, called ADIFOR. This program can serve as a tool for analyzing in advance which parameters of a model should be studied more intensively by the means of a Monte Carlo simulation and which parameters can be neglected. To date, ADIFOR allows the computation of first-order derivatives directly from a given, however complicated, FORTRAN code.

The ADIFOR system provides automatic differentiation for programs written in FORTRAN 77. The system requires that the user supplies the FORTRAN source code and indicates the variables that correspond to the independent and dependent variables (Bischof 1994). ADIFOR is based on a source translator paradigm and has been designed mainly for large-scale functions. The computation of first-order derivatives has been fundamentally simplified as the user does not need to care about its accuracy and efficiency (Bischof 1992). Therefore, it is easily possible to examine even complex models with reasonable effort for the influence of each parameter, as long as the models are available as FORTRAN routines. 

For illustration, the simple evaporation model has been chosen to test the features of ADIFOR. It is not possible to describe each single feature of the ADIFOR system in detail within the limits of this report. Please refer to (Bischof 1995) for more detailed information. 

2.1.1 Pre-conditions

The subroutine evapor.f serves as an example to demonstrate how ADIFOR works, and what the user should take into account. It computes the evaporation rate per unit area at time t after the spill of a liquid (see Appendix A.1).

First of all, it is necessary to put the function that includes the variables which are to be differentiated into a subroutine. Moreover, the user has to ensure, that the variables that correspond to the independent and dependent variables are not both part of the same vector. Both of these requirements are fulfilled in this example anyway. In addition, it seemed reasonable in this example to split the parameter rpar, in order to avoid the computation of unnecessary derivatives. The parameter rpar is a vector transferring the initial values of certain variables to the subroutine. 

2.1.2 Specifying Input

Script-file for mandatory options:
AD_PROG  = ntest.cmp
AD_TOP   = evapor
AD_PMAX  = 6
AD_IVARS = upar
AD_DVARS = evap
  • AD_ PROG: The user can define the name of the so-called composition file, which contains a list of files that have to be included when running the original program.

  • AD_ TOP: Name of the routine responsible for evaluating the function that is to be differentiated.

  • AD_ PMAX: Option defining the maximum number of independent variables of the function to be differentiated. It is used as the first dimension of gradient objects for local and global variables.

  • AD_ IVARS: List of the names of variables containing the independent variables of the considered function.

  • AD_ DVARS: List of the names of variables containing the dependent variables of the considered function.

There are many more options available in order to make use, for instance, of sparse data structures (see [Bischof 1994]). 

2.1.3 Invocation of ADIFOR

After all mandatory options have been set, the user can invoke ADIFOR to generate a new subroutine computing the desired derivatives. In this case, the dependence of the evaporation rate of the solar altitude, cloud cover, wind speed, ambient air temperature, pool diameter, and the initial depth of the pool was to be examined. Therefore, the following derivatives had to be computed:

Glancing at the original routine (see Appendix A), it is not difficult to see that a manual differentiation of this function is rather error-prone due to the calculating components of the nested functions. 

After initializing the matrix and allocating the needed memory, the original subroutine can be replaced by the output routine of ADIFOR (see Appendix A.3). It can easily be seen, that the new routine computes the desired derivatives while calculating the evaporation rate. It even makes use of internal structures of the subroutine in order to avoid calculating values more than once. 

2.1.4 Results

Table 1 shows the derivatives of the evaporation rate taken with respect to the variables named in the heading row. 

t evap sa  cloud  ubar10  tempa  diamp  depthp 
15000.0 8.71728  2.80268  -6.14087E-03  1.63893  1.67706E-02  -4.62261E-02  -0.107984 
15000.0  8.71728  -2.80268  -6.14087E-03  1.63893  1.67706E-02  -4.62261E-02  -0.107984 
15000.0  6.26093  -1.19937  -5.52678E-02  1.63803  1.65657E-02  -4.62009E-02  -0.109105 

The variables denote the following parameters:
time elapsed since the spill [s]
evap  evaporation rate [g/m**2s] 
sa  solar altitude [rad] 
cloud  cloud cover [%]
ubar10  wind speed measured at 10 m [m/s] 
tempa  ambient air temperature [K]
diamp  pool diameter or downwind length of pool [m] 
depthp  initial depth of the pool [m].

For the results shown in the first row of Table 1, the initial values of the parameters are given as follows: 
sa = /3, cloud = 10, ubar10 = 2, tempa = 290, diamp = 10, depthp = 2.

For the second row of Table 1, only the value of sa was changed to 2/3. The third row of Table 1 represents the results for sa = 2/3 and cloud = 90. 

On the one hand, the results reflect the influence of each parameter on the evaporation rate with respect to the selected set of parameters. On the other hand, it is possible to examine the interaction between the single influences. The higher the absolute value of a derivative is, the greater influence it has on the calculation of the evaporation rate.

Moreover, it is quite visible that a change of the percentage of the cloud cover diminishes the influence of the solar altitude whereas its own influence increases. ADIFOR is a convenient tool for selecting the model parameters which should be studied more intensively by a Monte Carlo simulation. Moreover, the system provides facilities to examine the interaction between single parameters with reasonable effort.

The results of a single ADIFOR run are only valid for the selected parameter space. But the systems behaviour is very different for a different set of input parameters. Selecting typical sets of input parameters, ADIFOR can give important advise for a preselection of the input parameters which should be varied by a Monte Carlo method, but it is not suitable for replacing the Monte Carlo approach for constructing a time dependent source strength probability function. 

2.2 Monte Carlo Simulation

2.2.1 The method 

To allow a more realistic determination of the release, a Monte Carlo method is used to construct a probability function of the source-strength for different times of the release duration. With this method, all uncertain parameters are varied in a selected range. 

The parallel implementation of the source term models uses a generalized mask to run different types of source term models. The user must specify the number of input parameters which are not precisely known. In addition, an uncertainty range must be given for every parameter. This range can vary in positive or negative direction. 

2.2.2 Example: Monte Carlo simulation for an evaporation module 

The same evaporation subroutine already used as an example for the application of the automatic differentiation technique has been used to demonstrate the Monte Carlo approach (see Appendix A).

Figure 1
Figure 1: Source strength probability function 

A probability function of the source-strength is constructed using millions of runs of a deterministic release model. The input parameters of this model are varied over the given uncertainty range using a random generator. Figure 1 shows an example of a source-strength probability function for evaporating pool release for a given time.

The input parameter variations for this test run were:

  • cloud cover (0 - 100 %)
  • surface wind speed (20 %)
  • ambient air temperature (-20% / +30%)
  • initial depth of the pool (20 %)
  • total release volume (10 %)
  • total release time of the liquid phase (10 %).

The results exhibits an asymmetric distribution whereas neither the source-strength at the mean nor at the maximum of the probability function is equal to the "exact value" (which is the deterministic outcome of the model by neglecting the uncertainties). 

If the release is a complex function of parameters (which is normally the case), the Monte Carlo run is very useful for finding even the most probable values for the source-strength. This is due to the fact that even the most probable values can be considerably different form the exact one.

The mean and the emission rate at the 95th percentile of the time dependent source term probability function are used as an input for the Lagrangian dispersion model. The mean represents the most probable emission rate, whereas the 95th percentile represents a worst case scenario for the emission rate excluding a small probability of 5 percent of higher emission rates. 

3 MODEL EVALUATION

3.1 The Wind Field Model

The diagnostic wind field model is based on the well-established and evaluated DWM (EPA). To allow the user a faster selection of the meteorological condition where only sparse observational data is given, some new parametrization features were added.

Two examples show the typical wind flow in a hypothetical complex terrain. The measured meteorological values at 5 m above ground parameters were:

  • wind speed 3 m/s
  • wind direction 300
  • surface temperature 290 K.

The assumed roughness length was 0.5, a quite typical average value. The numerical grid has a horizontal resolution of 100 m x 100 m. The complex orography is characterized by fairly steep rise to a mountain range in the easterly parts of the domain, some isolated smoother hills and some sharp gashes of valleys between the range. The simulation was carried out for the same measured values but for different hypothetical stability classes.

Figure 2

Figure 2 (above) represents the flow under very stable conditions, occurring quite often during the night. The wind is forced to flow around the sides of the obstacles. Directly upwind of the hill, some of the air is blocked and becomes nearly stagnant. Additionally, there is a visible tendency to drainage winds (valley winds). The underlying background color represents the orography.

Figure 3

In Figure 3 (above), the meteorological condition is very different. An unstable vertical stratification in the lower atmospheric boundary layer (happens often on hot summer days) leads to a considerably different flow despite the same measured wind direction and speed. The wind flows over the mountain range and exhibits its highest values at the top of the summits. Additionally, there is a thermally generated uphill flow, the so-called mountain winds. As demonstrated above, the diagnostic wind model is able to describe the main features of air flow in complex terrain.

3.2 The Lagrangian Particle Model 

3.2.1 Comparison of LPD results with Gaussian solution for a simple case 

Gaussian plume models are most commonly used for modelling point source emissions. In case of an (inert) air pollutant and homogeneous conditions the resulting plume represents an exact solution. The restrictive conditions comprise a wind field with a rigid advective wind which is constant in time and space and an even homogeneous orography. Thus the horizontal and vertical wind fluctuations are constant as well. The emission rate at the point source is permanent and constant.

For preconditions given as described, the concentration dispersion of an arbitrary (inert) air pollutant at ground level is simulated by the LPD model. It is compared to the analytical Gaussian solution calculated for the respective conditions.
A horizontal grid size of 51 x 51 with a resolution of 100 m is given for the meteorological (input) and the output grid. For the vertical resolution of the input the stratification of a labil wind field is applied while for the output equidistant layers of 10 m are chosen.

The inputs for both models are as follows:

Advective wind field:

The standard deviation of the velocity fluctuation and the Lagrangian time-scale are required for the v and w component and the y, z component, respectively. They are set to the following values: 

The coordinates of the emission source are:
coord_x = 2
coord_y = 25
coord_z = 5

For the simulation time of 20 min the emission rate is assumed to be constant with

In the LPD model the number of emitted particles per time step is set to 200. Hence for a time step of 10 s each particle is associated with a mass of .

For the Gaussian solution the air pollutant concentration is calculated by:

with (x,y,z) the position in the three-dimensional coordinate system and H the effective stack height. For the test case the concentration was calculated at H= qz and the level of z = coord_z.

The standard deviations $\sigma_{\mbox{\scriptsize z}},\,
\sigma_{\mbox{\scriptsize z}}$ in the y and z direction, respectively, are obtained from the Taylor Theorem based on Taylor's homogeneous diffusion theory (Taylor 1921, Yamada 1987):

is constructed analogous.

Figure 4 (left) Figure 4 (right)

Figure 4 (above) represents the resulting dispersion of an air pollutant at the end of the simulation time (20 min) under the described conditions and the level of the source height. The Lagrangian dispersion in the right figure shows a good reproduction of the Gaussian solution (left).

The quality of the Lagrangian dispersion is also displayed in the figures below (figures 5 and 6). The diagrams show the alteration of the concentration along an intersecting line for the Gaussian solution and for the LPD dispersion at identical locations.

Figure 5

Figure 5 Figure 5

In Figure 5 (above), a line intersecting parallel to the main wind direction was chosen whereas in Figure 6 (below) the intersecting line is nearly perpendicular to the main dispersion direction.

the diagram in Figure 5 proves the apparent good reproduction of the Gaussian solution by the Lagrangian dispersion calculation along the centerline of the pollutant plume. The further the distance from the centerline (figure 5) and the emission source, the larger becomes the stochastic error in the LPD model. The stochastic effect is due to the principle of the Lagrangian model theory. It can be reduced by enlarging the number of particles emitted per time step (see Section 3.2.2).

Figure 6

Figure 6

Comparing the LPD dispersion and the Gaussian solution, figure 6 indicates that the error effects are relatively small (see also Section 3.2.2). Discussing this phenomenon, it has to be considered that the solution of Gaussian models gives an exact analytical resolution and hence are not subject to any stochastic influence.

3.2.2 Test runs with different numbers of emitted particles

Lagrangian models are based on the determination of trajectories of individual particles. The figures in this section show the dispersion plume for different numbers of emitted particles per time step - from left to right, top to bottom: 50, 100, 200, 300, 400, and 500 particles. The conditions and parameter settings are the same as presented in Section 3.2.1 for the LPD model.

The number of particles emitted per time step influence the quality of the output, as well as the computation time. The more particles are released the clearer is the shape of the plume. The figures also show that with a certain number of particles asymptotic effects appear. These effects are depended on the presently given conditions, like meteorology, orography, etc., and the time parameters (time step, simulation time).

A limiting factor for the number of particles emitted per time step is the computational burden and the available memory. To achieve reasonable results in a reasonable time - which for risk management has to be as short as possible - a compromise has to be made for the appropriate number of particles.

Figure 7

Figure 7

Figure 7

The present section and the results of Section 3.2.1 show that the LPD model represents a good reproduction of the Gaussian plume under convective conditions. It can be assumed that the modelling of pollution dispersion by the LPD model also holds for stable meteorological conditions.

The number of particles emitted should be chosen in the range of 100 to 200 particles per time step. The default value in the LPD model version is set to 200. The smaller the number of particles, the larger is the effect of the stochastical error, and thus the coarser the result (see figures). A larger number of emitted particles often shows a nearly proportional increase in computation time.

3.2.3 Dispersion calculation under different meteorological conditions

For a complex orography the effect of different meteorological conditions on the dispersion calculation by the LPD model is discussed. A point source for the pollutant emission was chosen and the horizontal reference layer set to the source height. The isolines in Figure 8 (below) denote the height above sea level in m. Equal preconditions as for the emission rates, simulation time, time step, etc. are assumed.

Figure 8 (left) Figure 8 (right)

Representative for the tests, two cases have been chosen for display. The left part of Figure 8 shows the dispersion calculated for a slightly unstable wind field with a mixing height of 800 m (stability class 3). In the right of Figure 8, the dispersion in case of slightly stable conditions with a mixing height of 300 m (stability class 5) is displayed.
For unstable meteorological conditions, the area affected by the pollutant is much wider (left part of Figure 8) than for stable case (right part of Figure 8) in the considered layer. The left part also shows that high concentrations of the pollutant occur only near to the emission source due to larger diffusion, causing, e.g., faster rising effects in an unstable wind field.
Under the given stable conditions the pollutants rest more at the ground/source level and perturbation across the main wind direction is less. Therefore the region affected by the pollutant dispersion is smaller but concentrations are higher (right part of Figure 8).

The simulated effects are as expected (also for other than the displayed meteorological conditions) and reproduce known observations gained by field experiments. This implies that the LPD model provides reasonable results for different kinds of meteorological conditions.

In combination with the argumentation in Section 3.2.1 and Section 3.2.2, the conclusion can be drawn that the LPD model represents an appropriate modul for the dispersion calculation for the given requirements.

4 REFERENCES

Bischof et al. (1992) 
ADIFOR-Generating Derivative Codes from Fortran Programs. Scientific Programming, no.1.
Bischof et al. (1994) 
The ADIFOR 2.0 System for the Automatic Differentiation of Fortran 77 Programs.
Bischof et al. (1995) 
ADIFOR 2.0 User's Guide. Mathematics and Computer Science Division of the Argonne National Laboratory, Technical Memorandum No. 192 and Center for Research on Parallel Computation at Rice University in Houston, Technical Report CRPC-95516-S, Argonne and Houston.
Douglas G. S., Kessler R. C., Carr L. (1990) 
User's Manual for the Diagnostic Wind Model. EPA-450/4-90-007C.
Garnatz Th., Haack U., Sander M., Schröder-Preikschat W. (1996) 
Experiences made with the Design and Development of a Message-Passing Kernel for a Dual-Processor-Node Parallel Computer. In Proceedings of the Twenty-Ninth Annual Hawaii International Conference on System Sciences. (Maui, Hawaii, January 3-6, 1996). IEEE Computer Society Press.
Geist A., Beguelin A., Dongarra J., Jiang W., Manchek R., Sunderam V. (1994) 
PVM: Parallel Virtual Machine, A User's Guide and Tutorial for Networked Parallel Computing. The MIT Press, Cambridge, Massachusetts.
Gerharz I., Lux Th., Sydow A. (1997) 
Inclusion of Lagrangian Models in the DYMOS System. In Proceedings of the IMACS World Congress 1997. (Berlin, Germany, August 25-29). W&T, Berlin, 6: 53-58.
Giloi W.K., Brüning U. (1991) 
Architectural Trends in Parallel Supercomputers. In Proceedings of the Second NEC International Symposium on Systems and Computer Architectures. (Tokyo, Japan, August). Nippon Electric Corp.
Taylor G.I. (1921) 
Diffusion by Continuous Movements. London Mathematical Society, 20: 196-211. 
Überhuber C. (1995) 
Computernumerik. Bd. 1, Springer Verlag, Berlin Heidelberg.
Yamada T., Bunker S., Niccum N. (1987) 
Simulation of the ASCOT Brush Creek Data by a Nested-grid, Second Moment Turbulence-closure Model and a Kernel Concentration Estimator. In Proceedings of the AMS 4th Conference on Mountain Meteorology, (Seattle, WA, August 24-28). 175-179.

Copyright 1995-2002 by:   ESS   Environmental Software and Services GmbH AUSTRIA