Statistical Glossary
From Intamap
Mathematical symbols
- s: spatial coordinates s = (s_x, s_y) in a two-dimensional space
- y(s_i): measurement or observed data at location s_i
- z(s_i): state variable at location s_i associated with the observed data
- n: number of observations or samples
- y: vector of sampled observations
- D: region of interest
- m: mean
- E: mathematical expectation (expected value) operator; for instance, E(y(s_i)) is the mean, or first statistical moment, of y(s_i)
- X: matrix of covariates (or environmental predictors/auxiliary variables) at the n observation locations
- h: space lag
- C(h): covariance
- γ(h): variogram
- b, β: regression coefficients (parameters) of the trend variables; b are ‘known’, β are to be estimated
- ε: stochastic error term
- σ(s): error variance; kriging variance
- θ: parameters of the covariance function or variogram
- λ: kriging weights
- Σ: the n × n variance-covariance matrix of the residuals
- c_0: vector of covariances between the residual at the unvisited location and the residuals at the observed locations
Glossary
abc
- Anisotropy:
The spatial correlation structure (e.g. the variogram) of the random field Z depends on direction. See isotropy.
- Bayesian transgaussian kriging:
Assumes that a non-Gaussian spatial random field variable can be transformed into a Gaussian random field by means of the Box-Cox transformation. The Gaussian random field is assumed to have a linear trend function and an unknown isotropic covariance function. The posterior distribution is approximated by simulation.
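The Box-Cox transformation mentioned above maps a positive value y to (y^λ - 1)/λ for λ ≠ 0 and to log(y) for λ = 0. The sketch below shows the transform and its inverse; the parameter is called `lam` here only to avoid confusion with the kriging weights λ, and it is fixed by hand, whereas the Bayesian method treats the transformation parameter as uncertain. The function names are illustrative.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a strictly positive value y with parameter lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

def inv_box_cox(z, lam):
    """Back-transform a Gaussian-scale value z to the original scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)

# Round trip on a skewed positive value for a few parameter choices
for lam in (0.0, 0.5, 1.0):
    z = box_cox(4.0, lam)
    print(lam, z, inv_box_cox(z, lam))
```

Because the transform is monotone, back-transforming a Gaussian-scale mean this way gives the median, not the mean, on the original scale; simulation-based approaches avoid this issue by back-transforming whole realizations.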
- BLUE:
Best linear unbiased estimation of the drift in a universal kriging model.
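Using the symbols defined above (X, y, Σ), the BLUE of the trend coefficients β is the generalized least squares estimate β̂ = (XᵀΣ⁻¹X)⁻¹XᵀΣ⁻¹y. A minimal NumPy sketch, assuming Σ is known; the coordinates, data, and exponential covariance below are hypothetical.

```python
import numpy as np

def gls_drift(X, y, Sigma):
    """Generalized least squares (BLUE) estimate of the drift coefficients beta."""
    Si_X = np.linalg.solve(Sigma, X)       # Sigma^{-1} X
    Si_y = np.linalg.solve(Sigma, y)       # Sigma^{-1} y
    A = X.T @ Si_X                         # X^T Sigma^{-1} X
    beta = np.linalg.solve(A, X.T @ Si_y)
    cov_beta = np.linalg.inv(A)            # covariance matrix of beta-hat
    return beta, cov_beta

# Hypothetical data: linear trend in the x coordinate, exponential covariance
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 1.0], [3.0, 3.0], [4.0, 1.0]])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
X = np.column_stack([np.ones(len(y)), coords[:, 0]])     # intercept + x coordinate
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
Sigma = np.exp(-3.0 * d / 2.0)                            # unit sill, practical range 2
beta, cov_beta = gls_drift(X, y, Sigma)
print(beta)
```

With Σ equal to the identity matrix (uncorrelated residuals) the estimate reduces to ordinary least squares.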
- BLUP:
Best linear unbiased prediction. Put simply, the spatial predictions made using the BLUE kriging formulation. Unbiasedness is ensured by the constraint that the expected estimation error is zero; in ordinary kriging this requires the kriging weights to sum to 1. The best predictor is the one that minimizes the mean squared estimation error (the kriging variance).
- Covariance function:
The expected value of the product of deviations from the mean at any pair of locations s1 and s2 in a random field variable (i.e. E[(Z(s1)-m)(Z(s2)-m)]). For a stationary random field, the covariance function depends only on the lag (separation) between the two locations.
def
- Drift:
General term for describing the non-stationarity of the mean of a random field. The drift may depend on trend variables (also referred to as predictor variables, explanatory variables, or covariates); these are auxiliary environmental variables that have a known or correlative relationship with the observed values of the target variable. A linear model is applied to the drift variables to capture the varying mean, or trend component, observed in the data.
- Exponential variogram model:
This model is approximately linear at short distances and approaches the sill asymptotically, more gradually than the spherical model. In the standardized form below (unit sill, no nugget), the practical range is h = a, since γ(a) = 1 - exp(-3) ≈ 0.95.
$\gamma(h) = 1 - \exp(-3h/a)$
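A short check of the practical-range statement for the standardized exponential model: at h = a the variogram is at about 95% of the sill, and it only gets arbitrarily close to the sill at larger lags. The function name and the range value are illustrative.

```python
import math

def exponential_variogram(h, a):
    """Standardized exponential variogram (unit sill, no nugget) with practical range a."""
    return 1.0 - math.exp(-3.0 * h / a)

a = 100.0  # hypothetical practical range, e.g. in metres
print(exponential_variogram(a, a))        # about 0.95 of the sill
print(exponential_variogram(3 * a, a))    # very close to, but below, the sill
```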
ghi
- Gaussian variogram model:
The model is parabolic near the origin but otherwise similar to the exponential model; in the standardized form below its practical range is also h = a.
$\gamma(h) = 1 - \exp(-3h^2/a^2)$
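The difference near the origin can be seen numerically: for small h the Gaussian model grows roughly like 3h²/a² while the exponential grows roughly like 3h/a, yet both reach the same value at h = a. The function names are illustrative.

```python
import math

def gaussian_variogram(h, a):
    """Standardized Gaussian variogram (unit sill, no nugget) with practical range a."""
    return 1.0 - math.exp(-3.0 * h ** 2 / a ** 2)

def exponential_variogram(h, a):
    """Standardized exponential variogram, for comparison."""
    return 1.0 - math.exp(-3.0 * h / a)

a = 1.0
for h in (0.01, 0.1, 0.5, 1.0):
    # parabolic (Gaussian) versus linear (exponential) behaviour near the origin
    print(h, round(gaussian_variogram(h, a), 5), round(exponential_variogram(h, a), 5))
```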
- Geostatistics:
A field of statistics that has developed methods, theories, and techniques for the analysis of spatially and temporally correlated data, and for the estimation of spatial uncertainty. Models used are based on the theory of regionalized variables (e.g. measures of natural phenomena such as soil type or elevation), which, due to their spatial properties, are considered intermediate between truly random and completely deterministic variables.
- Harmonization:
The generation of consistent and exchangeable information. In monitoring of environmental data across the European Union, it is particularly important to be able to harmonize heterogeneous data from a variety of countries with different collection methods and spatial data holdings. Harmonization strategies may remove constant measurement biases, for example.
- Heterogeneities:
Refers to spatial discontinuities in an area made up of elements that are not of the same type. Heterogeneity is scale-dependent. At large extents and coarse resolutions, a pattern may appear homogeneous, whereas at small extents and finer spatial resolutions, heterogeneity emerges. Heterogeneities can refer to measured differences in spatial structure, composition, or function, and can be the result of measurement biases.
- Interpolation:
Estimation of unobserved data points in a spatial domain from a finite set of measurements.
- Intrinsic random field:
Random field with spatial increments that are second order stationary.
- Inverse distance weighting (IDW):
Simple interpolation method in which each measurement is weighted according to its distance from the estimation location. The IDW weights are normalized to sum to 1.0; the weighting function is commonly an inverse distance-squared function, which in some implementations is scaled relative to the most distant point used in the estimation.
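A minimal sketch of plain inverse distance weighting with power 2, assuming all supplied points are used (no search neighbourhood) and without the scaling to the most distant point that some variants apply. The function and variable names are hypothetical.

```python
import math

def idw(points, values, target, power=2.0):
    """Inverse distance weighted estimate at `target` from (x, y) `points`."""
    weights = []
    for (x, y), v in zip(points, values):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return v  # target coincides with a measurement location
        weights.append(1.0 / d ** power)
    total = sum(weights)
    # dividing by the total normalizes the weights to sum to 1 (a weighted average)
    return sum(w * v for w, v in zip(weights, values)) / total

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
vals = [10.0, 20.0, 30.0]
print(idw(pts, vals, (0.5, 0.5)))
```

Unlike kriging, the weights depend only on distance, not on a fitted variogram, so IDW provides no estimate of the prediction error.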
- Isotropy:
A random field variable Z is isotropic if its correlation structure (variogram) is the same in all directions. Therefore, only the straight-line distance between two points matters, not the orientation of the line segment joining them.
jkl
- Kriging:
Involves applying a general methodology known as best linear unbiased estimation (BLUE) to estimating the form of a regionalized variable in a stationary or intrinsic random field. The method relies on the first two statistical moments of the spatial random field (the mean and the covariance function or variogram).
mno
- Markov random chain:
A sequence of random variables {X_0, X_1, X_2, …} such that at each time t ≥ 0 the next state X_{t+1} is sampled from a distribution P(X_{t+1}|X_t) that depends only on the current state of the chain, X_t. The sequence of random variables is called a Markov random chain, and the probability distribution is called the transition kernel of the chain. There are many ways of constructing such chains. It is generally assumed that the transition kernel does not vary in time.
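For a finite set of states, a time-invariant transition kernel can be written as a matrix P whose row i gives the probabilities of moving from state i to each state. The sketch below simulates such a chain; the two-state kernel is hypothetical. For this kernel the long-run fraction of time spent in state 1 is 0.1 / (0.1 + 0.5) = 1/6.

```python
import random

def simulate_chain(P, x0, n_steps, seed=0):
    """Simulate a time-homogeneous Markov chain with transition matrix P.

    P[i][j] is the probability of moving from state i to state j; the next
    state depends only on the current one (the Markov property).
    """
    rng = random.Random(seed)
    states = [x0]
    for _ in range(n_steps):
        row = P[states[-1]]
        states.append(rng.choices(range(len(row)), weights=row)[0])
    return states

# Hypothetical two-state kernel (e.g. 0 = "dry", 1 = "wet")
P = [[0.9, 0.1],
     [0.5, 0.5]]
chain = simulate_chain(P, x0=0, n_steps=10_000)
print(sum(chain) / len(chain))  # long-run fraction of time in state 1
```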
- Markov random field MRF:
Best described by example. In a ‘nearest neighbour’ Markov random field, the conditional distribution of the value in a given area, given the values in all other areas, depends only on the values in the neighbouring areas.
- Maximum Likelihood:
Estimation procedure that relies on the Gaussian assumption…
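As a minimal sketch of Gaussian maximum likelihood for the covariance parameters θ, the code below evaluates the log-likelihood of y ~ N(Xβ, Σ(θ)) and picks the best practical range a from a small grid, with a unit-sill exponential covariance and the mean re-estimated by GLS for each candidate. The grid search, the fixed sill, and the data are illustrative assumptions; practical software typically uses numerical optimizers and may use restricted ML instead.

```python
import numpy as np

def gaussian_loglik(y, X, beta, Sigma):
    """Log-likelihood of y ~ N(X beta, Sigma) for a Gaussian random field."""
    n = len(y)
    r = y - X @ beta                           # residuals
    L = np.linalg.cholesky(Sigma)              # Sigma = L L^T
    alpha = np.linalg.solve(L, r)              # L^{-1} r, so alpha @ alpha = r^T Sigma^{-1} r
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (n * np.log(2 * np.pi) + log_det + alpha @ alpha)

# Hypothetical data with a constant unknown mean
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.5], [0.5, 2.0]])
y = np.array([0.2, 0.5, 0.1, 0.7, 1.0, 0.3])
X = np.ones((len(y), 1))
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

best = None
for a in (0.5, 1.0, 2.0, 4.0):
    Sigma = np.exp(-3.0 * d / a)                            # unit sill, practical range a
    Si = np.linalg.inv(Sigma)
    beta = np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)      # GLS mean for this a
    ll = gaussian_loglik(y, X, beta, Sigma)
    if best is None or ll > best[1]:
        best = (a, ll)
print(best)
```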
- Measurement biases:
Systematic measurement errors. For example, in the measurement of gamma dose rates, detectors have a self-effect, which is the value reported in a zero-radiation field. Different monitoring techniques are used across the EU (e.g. γ-spectrometry versus γ-probes).
- Optimization criteria:
Criteria used to optimize the design of sample locations to efficiently monitor environmental variables in space and time. Examples of criteria used in literature include:
--> Spatial coverage: samples spaced according to a geometric measure of geographic space or distances between points.
--> Response-surface design: samples selected to provide the best estimates of model parameters and model structure (i.e. feature space).
--> Equal range design: stratification limits are set at equal distances in feature space and samples are selected randomly from each stratum. The sample with the best spatial coverage is selected.
--> K-optimal: K stands for kriging. Optimizes by minimizing the kriging error variance.
--> D-optimal: D stands for determinant. The method minimizes the determinant of the variance-covariance matrix of the estimated regression coefficients (feature space).
--> CP-optimal: CP stands for covariance parameter. Optimizes by minimizing the determinant of the asymptotic covariance matrix of covariance parameter estimates.
--> EK-optimal: EK stands for empirical kriging. Hybrid design criterion which uses kriging, but emphasizes the additional prediction uncertainty incurred due to the estimation of covariance parameters.
--> Bayesian approaches: the criterion is the spatially averaged prediction variance, taking into account the uncertainty in the spatial correlation parameters.
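To illustrate the D-optimal criterion, the sketch below compares two hypothetical designs for a simple linear model with independent errors, for which the variance-covariance matrix of the OLS coefficients is proportional to (XᵀX)⁻¹. The independence assumption and the function names are illustrative; with spatially correlated errors the GLS form (XᵀΣ⁻¹X)⁻¹ would be used instead.

```python
import numpy as np

def d_criterion(X):
    """det((X^T X)^{-1}): generalized variance of the OLS coefficients, up to sigma^2.
    Smaller values indicate a better (D-optimal) design."""
    return 1.0 / np.linalg.det(X.T @ X)

def design_matrix(x):
    # intercept + one covariate (hypothetical linear model y = b0 + b1 * x)
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x])

clustered = [0.4, 0.5, 0.5, 0.6]   # samples bunched in the middle of feature space
spread = [0.0, 0.0, 1.0, 1.0]      # samples at the extremes of feature space
print(d_criterion(design_matrix(clustered)), d_criterion(design_matrix(spread)))
```

Spreading the samples over feature space gives a much smaller determinant, which is why D-optimal designs for a linear trend tend to place samples at the extremes of the covariate range.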
- Ordinary kriging (OK):
Kriging of a stationary process with an unknown mean.
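A minimal sketch of ordinary kriging at a single location, using covariances: the weights λ and a Lagrange multiplier μ solve Σλ + μ1 = c_0 with 1ᵀλ = 1, and the kriging variance is C(0) - λᵀc_0 - μ. The exponential covariance, its parameters, and the data are hypothetical; real applications would use a fitted covariance model and usually a search neighbourhood.

```python
import numpy as np

def ordinary_kriging(coords, y, target, cov):
    """Ordinary kriging prediction, variance, and weights at one target location.

    Solves [[Sigma, 1], [1^T, 0]] [lambda; mu] = [c0; 1], where the last row
    enforces sum(lambda) == 1 (unbiasedness with an unknown constant mean).
    """
    n = len(y)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d0 = np.linalg.norm(coords - target, axis=1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = cov(d)
    A[n, n] = 0.0
    b = np.append(cov(d0), 1.0)
    sol = np.linalg.solve(A, b)
    lam, mu = sol[:n], sol[n]
    pred = lam @ y
    var = cov(0.0) - lam @ cov(d0) - mu        # ordinary kriging variance
    return pred, var, lam

def cov(h, sill=1.0, a=2.0):
    """Hypothetical exponential covariance without nugget, practical range a."""
    return sill * np.exp(-3.0 * np.asarray(h) / a)

coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
pred, var, lam = ordinary_kriging(coords, y, np.array([0.5, 0.5]), cov)
print(pred, var, lam.sum())
```

Without a nugget, ordinary kriging is an exact interpolator: at an observed location it returns the measured value with zero kriging variance.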
pqr
- Range:
The range of a variogram (resp. a covariance function) is the distance lag at which its maximum (resp. minimum) is reached. For a stationary random field, the covariance minimum is zero. For some models (e.g. exponential and Gaussian) the sill is only approached asymptotically, so the range is infinite; a practical range is then defined as the distance at which the variogram reaches 95% of its maximum (the sill).
- Regression kriging:
Equivalent term for universal kriging, but sometimes used more specifically for universal kriging with a pure nugget effect as the residual structure, which leads to an ordinary least squares formulation.
- Residual: Difference between the observed data and the model fit (usually defined as observed - predicted). Residuals provide estimates of the unobservable statistical errors around the expected value of the population. For a least-squares fit that includes a mean (intercept) term, the residuals of the sample sum to zero; this is not true for cross-validation residuals from kriging. The residuals are modelled as a zero-mean random function capturing the erratic fluctuations of the studied phenomenon (i.e. the stochastic error term).
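The zero-sum property can be checked directly on synthetic data: with an intercept term, least-squares residuals sum to zero up to rounding, while a regression through the origin does not have this property. The data below are randomly generated for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=50)
y = 1.5 + 0.8 * x + rng.normal(0, 1, size=50)   # synthetic data with a nonzero intercept

X_with = np.column_stack([np.ones_like(x), x])   # includes an intercept (mean) term
X_without = x[:, None]                            # regression through the origin

for X in (X_with, X_without):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta                      # observed - predicted
    print(residuals.sum())
```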
stu
- Sampling design optimization:
Refers to the general problem of experimental design theory in which the placement of sample points is optimized (by adding, removing, or repositioning points) according to criteria that provide the most accurate estimates for the structure of the processes of interest (e.g. an accurate estimation of the parameters used in the model). The term sampling design optimization is preferred to network design optimization because network optimization typically refers to a specific set of problems in spatial graph theory where the links between points are fixed (e.g. by roads or travel routes).
- Simple kriging (SK):
Kriging of a stationary process with a known mean. The mean is specified separately from the data, and the kriging estimator is a weighted average of the data and the given mean. The estimator uses the covariance function.
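With the symbols defined at the top (Σ, c_0, λ), simple kriging solves Σλ = c_0 and predicts m + λᵀ(y - m), which equals (1 - Σλ_i)·m + λᵀy, i.e. a weighted average of the data and the known mean; the variance is C(0) - λᵀc_0. A minimal 1-D sketch with a hypothetical exponential covariance and hypothetical data.

```python
import numpy as np

def simple_kriging(Sigma, c0, y, m, c00):
    """Simple kriging with known mean m.

    Sigma: n x n covariance matrix of the observations
    c0:    covariances between the target and the observations
    c00:   variance at the target, C(0)
    """
    lam = np.linalg.solve(Sigma, c0)     # kriging weights
    pred = m + lam @ (y - m)             # known mean plus weighted deviations
    var = c00 - lam @ c0                 # simple kriging variance
    return pred, var, lam

# Hypothetical 1-D example: exponential covariance, unit sill, practical range a
x = np.array([0.0, 1.0, 3.0])
y = np.array([2.0, 3.0, 1.0])
a = 2.0
Sigma = np.exp(-3.0 * np.abs(x[:, None] - x[None, :]) / a)
x0 = 0.5
c0 = np.exp(-3.0 * np.abs(x - x0) / a)
pred, var, lam = simple_kriging(Sigma, c0, y, m=2.0, c00=1.0)
print(pred, var, 1.0 - lam.sum())   # last value: implicit weight given to the mean
```

Far from all observations c_0 tends to zero, so the weights vanish and the prediction falls back to the known mean.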
- Spatial random field:
A set of spatially indexed random variables. A spatial field variable is not known everywhere, but must be estimated over a region from sample locations. Because of the scarcity of information, there is no unique solution. Therefore each realization of a spatial field is one out of an ensemble of possibilities, which define all possible solutions to the estimation problem. The ensemble of realizations together with their probabilities make up a random field (also called a random function, or spatial stochastic process).
- Spatial simulated annealing (SSA):
An iterative, combinatorial optimization algorithm in which a sequence of combinations is generated by slightly and randomly changing previous combinations. With each new generation, a quality measure (optimization criterion) is evaluated and compared with the value of the previous combination. The new combination is accepted if the change improves the quality measure. However, to avoid being trapped in a local optimum, the probability of accepting a worsening combination is not zero, analogous to controlled cooling or annealing in condensed matter physics.
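A minimal sketch of spatial simulated annealing for a spatial coverage design in the unit square. The quality measure used here, the mean shortest distance (MSSD) from prediction grid nodes to the nearest sample, is an assumption for illustration; the Metropolis acceptance rule exp(-Δ/T), the geometric cooling schedule, and all parameter values are likewise common choices rather than anything prescribed above.

```python
import math
import random

def mssd(samples, grid):
    """Mean shortest distance from each grid node to its nearest sample (lower is better)."""
    return sum(min(math.dist(g, s) for s in samples) for g in grid) / len(grid)

def spatial_annealing(n_samples, grid, n_iter=2000, t0=0.1, cooling=0.995,
                      max_shift=0.2, seed=1):
    rng = random.Random(seed)
    current = [(rng.random(), rng.random()) for _ in range(n_samples)]
    current_cost = mssd(current, grid)
    best, best_cost = list(current), current_cost
    t = t0
    for _ in range(n_iter):
        # slightly and randomly move one sample, keeping it inside the unit square
        candidate = list(current)
        i = rng.randrange(n_samples)
        x, y = candidate[i]
        candidate[i] = (min(1.0, max(0.0, x + rng.uniform(-max_shift, max_shift))),
                        min(1.0, max(0.0, y + rng.uniform(-max_shift, max_shift))))
        cost = mssd(candidate, grid)
        delta = cost - current_cost
        # always accept improvements; accept worsening moves with probability exp(-delta / t)
        if delta < 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, cost
            if cost < best_cost:
                best, best_cost = list(candidate), cost
        t *= cooling   # lower the temperature so worsening moves become rarer
    return best, best_cost

# 10 x 10 grid of prediction nodes covering the unit square
grid = [((i + 0.5) / 10, (j + 0.5) / 10) for i in range(10) for j in range(10)]
best, cost = spatial_annealing(5, grid, n_iter=1500)
print(round(cost, 3), best)
```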
- Spherical variogram model:
One of the most commonly applied models to fit the variogram. The model is linear at small separation distances and reaches a sill at the range. Range: h=a.
$\gamma(h) = 1.5(h/a) - 0.5(h/a)^3$ for $h \le a$, and $\gamma(h) = 1$ for $h > a$
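A direct translation of the standardized spherical model (unit sill, no nugget). Unlike the exponential and Gaussian models, it reaches the sill exactly at h = a and stays there. The function name is illustrative.

```python
def spherical_variogram(h, a):
    """Standardized spherical variogram (unit sill, no nugget) with range a."""
    if h >= a:
        return 1.0          # sill reached at the range
    r = h / a
    return 1.5 * r - 0.5 * r ** 3

for h in (0.0, 0.25, 0.5, 1.0, 2.0):
    print(h, spherical_variogram(h, a=1.0))
```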
- Stationary random field:
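Random field with a constant mean and a covariance function that depends only on the separation (lag) between pairs of points, not on their absolute locations. If the covariance also depends only on the length of the lag and not on its direction, the field is additionally isotropic.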
- Statistical moment:
A statistical average or other summary of all possible ensemble realizations of a spatial random field, which provides an estimate of the expected value.
--> First moment: the mean function, gives the expected value of a random field variable, Z (i.e. m = E[Z]) at any location.
--> Second moment: the covariance function, gives the expected value of the covariance between any pair of locations s1 and s2 in a random field variable (i.e. E[(Z(s1)-m)(Z(s2)-m)]).
- Stochastic error:
Indeterminate error of measurement or estimation that cannot be known in advance. The distribution of stochastic errors can be characterized by statistical properties (e.g. standard deviation); these errors are generally assumed to be random and normally distributed with zero mean.
- Universal kriging (UK):
Generalization of ordinary kriging to the estimation of a variable mean (depending on the spatial coordinates) combined with the interpolation of an intrinsic residual component. UK is mathematically equivalent to regression kriging, kriging with an external drift, residual kriging, and kriging with a trend (see Hengl et al. 2007, Computers & Geosciences 33: 1301–1315). In UK, however, the trend is modelled as a function of the coordinates within the kriging system, rather than as a function of auxiliary environmental variables.
vwx
- Variogram:
A plot of semivariance by spatial lag. Note that γ(h) is often referred to as the semivariogram, and the term variogram refers to 2γ(h). However, it is now common to refer to γ(h) as the variogram.
--> Sample variogram: half of the average squared difference between the N_k pairs of values within the k-th distance class, or lag h.
$\gamma(h) = \frac{1}{N_k} \sum_{i=1}^{N_k} \frac{[z(s_i) - z(s_i + h)]^2}{2}$
--> Theoretical variogram: a mathematical model (e.g. spherical, exponential, or Gaussian) fitted to the sample variogram.
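A minimal sketch of the sample variogram formula, assuming isotropy (direction is ignored), fixed-width distance classes, and each pair counted once, so that N_k is the number of pairs in class k. It reports the mean separation of the pairs in each class as the lag. The names and the transect data are hypothetical.

```python
import math
from collections import defaultdict

def sample_variogram(points, values, lag_width, max_lag):
    """Isotropic sample variogram grouped into distance classes of width lag_width.

    Returns (mean lag, gamma, N_k) per class, where
    gamma = sum of squared differences / (2 * N_k).
    """
    sq_sums = defaultdict(float)
    lag_sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):          # each pair is counted once
            d = math.dist(points[i], points[j])
            if d > max_lag:
                continue
            k = int(d // lag_width)         # index of the distance class
            sq_sums[k] += (values[i] - values[j]) ** 2
            lag_sums[k] += d
            counts[k] += 1
    return [(lag_sums[k] / counts[k], sq_sums[k] / (2 * counts[k]), counts[k])
            for k in sorted(counts)]

# Hypothetical transect: five points along a line
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
z = [1.0, 1.5, 2.5, 2.0, 3.5]
for lag, gamma, n_pairs in sample_variogram(pts, z, lag_width=1.0, max_lag=3.5):
    print(lag, gamma, n_pairs)
```

A theoretical model such as the spherical or exponential variogram would then be fitted to these (lag, gamma) pairs, often by weighted least squares using N_k as weights.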
