Contents
A Spatial Analysis - defining the problem
  Collecting point data
  Interpolations are always a raster-based GIS
  Nature of a continuous surface model?
  Raster surfaces
  TIN surfaces
  TINs are more efficient at representing three-dimensional surfaces
  A Quick Flash Summary of the Nature of Raster and TIN surfaces
B Creating raster surfaces from points
  Continuous Surfaces are generated from points and from images
  What is a spatial interpolation?
  Why interpolate?
C Deterministic Models
  Inverse Distance Weighting (IDW)
  More on the Inverse Distance Weighting (IDW)
  Natural Neighbourhood Interpolation
  How Natural Neighbourhood Interpolation Works?
  Variations of Natural Neighbourhood Interpolation
  Spline Interpolation
  Spline the Regularized Method
  Spline the Tension Method
  Rectangular Interpolation (not available in ArcGIS extensions)
  Rectangular Interpolation How it Works
  Trends based on Polynomial Interpolations
  The Essence of Polynomial Interpolations
  But the Real World has Valleys and Plateaus
  Visualizing local polynomial interpolations
  Polynomial Analysis: Visualizing radial basis functions
  A Quick Flash Summary of Deterministic Models (Interpolations)
D Statistical techniques using a semi-variogram for developing continuous surface models (Kriging)
  Effectiveness of Kriging
  How Kriging Works?
  More on How Kriging Works
  Kriging Works Similarly to Inverse Distance Weighting
  To make a prediction with Kriging, two tasks are necessary
  Generating a Semivariogram
  Understanding Semivariance
  Semivariance illustrated
  Spatial autocorrelation
  Understanding a semivariogram - the range, sill, and nugget
  The range and sill
  The nugget
  An Omnidirectional Semivariogram
  Modifying Directional Parameters
  Changing the Variogram Model
  Anisotropic Modelling
  Other Kriging Techniques
  A Quick Flash Summary of Geostatistics - Kriging and the Semivariogram
E Developing Triangular Irregular Network (TIN) models for elevation, slope and aspect modelling
  The Essentials of a TIN model
  A TIN model explained in more detail
  Choices when modelling a TIN
F Spatial analysis of categorical data using Neighbourhood Analysis (e.g. generation of soil maps)
  Voronoi Maps Explained
G Which Interpolation methods to use?
  Some interpolation techniques can be automatically applied to certain data types
  Application of Interpolation Techniques Illustrated
  Is interpolation processing speed a factor?
  Is it necessary to over/undershoot the local Min. and Max. values?
  A Quick Flash Summary of the Other Methods of Spatial Analysis
A Spatial Analysis - defining the problem
As botanists we collect information that is discrete at a particular sampling point. This discrete data has to be converted in some way into useful map representations. In order to achieve this we need to interpolate or model our data in a way that accurately predicts the occurrence of similar features in areas where we have not sampled. In essence, from known information we need to extrapolate to unknown areas to produce useful maps.
1 Collecting point data
To illustrate this, we might have rainfall stations across the country. At each station we know what the rainfall is, but between the stations nothing is recorded, yet we still want an estimate of the rainfall at these places. The simplest way to do this is to use various interpolation methods for surface generation.
1 A Quick True or False Question
Is Rainfall a continuous variable? a) True b) False
2 Interpolations are always a raster-based GIS
These interpolations utilize a raster-based GIS (cells) and there are a variety of techniques to derive a prediction for each location. Each of these techniques is underpinned by different assumptions (e.g. whether the data fits a normal distribution).
1 Multiple Choice Question
What is a Normal Distribution?
a) A distribution that is bell-shaped and reflects an underlying variable that has a continuous distribution
b) A distribution of frequency classes
c) A distribution that has been standardized using a transformation
d) A distribution that has a skewness to either the left or the right
e) All of the above
f) None of the above
3 Nature of a continuous surface model?
A surface is a continuous field of values that may vary over an infinite number of points. For example, points in an area on the earth's surface may vary in elevation, proximity to a feature, or concentration of a particular chemical. Any of these values may be represented on the z-axis in a three-dimensional x,y,z coordinate system, so they are often called z-values.
Because a surface contains an infinite number of points, it is impossible to measure and record the z-value at every point. Surface models allow you to store surface information in a GIS. A surface
model approximates a surface by taking a sample of the values at different points on the surface and then interpolating the values between these points.
Surface model of soil chemical composition across an area with points showing where the concentration was sampled.
There are two types of surface models: Rasters and TINs (Triangular Irregular Networks), and obviously a large variety of techniques to develop such surfaces.
Rasters represent a surface as a regular grid of locations with sampled or interpolated values. TINs represent a surface as a set of irregularly located points linked to form a network of triangles
with z-values stored at the nodes.
1 Multiple Choice Question
Which of the following statements about Vector and Raster GIS is incorrect?
a) A Vector GIS is represented by gridded cells
b) A Raster GIS has no spatial reference
c) A Vector GIS is made up of points, lines and polygons
d) A Raster GIS is only useful for modeling
e) A Vector GIS is not as accurate in the calculation of lengths and areas as a Raster GIS
f) All of the above comments are incorrect
4 Raster surfaces
Raster surfaces are usually stored in grid format. A grid consists of a rectangular array of uniformly spaced cells with z-values. The smaller the cells, the greater the locational precision of the grid.
On the left is a higher precision raster model and on the right is a lower precision raster model
You cannot locate individual features - for example, the summit of a mountain - any more precisely than the size of the grid cells.
Rasters are also used to store images and thematic grid data.
1 A Quick True or False Question to Test Yourself
You cannot locate individual features in a Raster GIS because there are no objects in this type of GIS. a) True b) False
5 TIN surfaces
TINs consist of nodes that store z-values, connected by edges to form contiguous, non-overlapping triangular facets. The edges in TINs can be used to capture the position of linear features that play an important role in the surface such as ridgelines or stream courses.
Nodes and edges of a TIN (red and blue, respectively) and nodes, edges, and faces (which are thematically shaded)
1 Multiple Choice Question
TIN in spatial analysis stands for
a) Technically Innovative Networks
b) Triangular Inverted Networks
c) Telemetrically Inverted Networks
d) Technologically Invented Neurons
e) Telemetrically Induced Networks
f) None of the above
6 TINs are more efficient at representing three-dimensional surfaces
Because the nodes can be placed irregularly over the surface, TINs can have a higher resolution in areas where a surface is highly variable or where more detail is desired and a lower resolution in areas that are less variable or of less interest.
The input features used to create a TIN remain in the same position as nodes or edges in the TIN. This allows a TIN to preserve all of the precision of the input data while simultaneously modeling the values between known points. You can include precisely located features on a surface - such as mountain peaks, roads, and streams - by using them as input features to the TIN.
TIN models are less widely available than raster surface models and tend to be more expensive to build and process. The cost of obtaining good source data can be high, and processing TINs tends to be less efficient than processing raster data because of their complex data structure.
TINs are typically used for high precision modeling of smaller areas, such as in engineering applications, where they are useful because they allow calculations of planimetric area, surface area, and volume.
1 Multiple Choice Question
In engineering applications TINs are more efficient than rasters because ...
a) Nodes indicating the apexes of triangles can be placed irregularly
b) Surfaces are represented by faces that can be shaded in different colours to show relief
c) They are much smaller files and therefore are quicker to render than their raster counterparts
d) Since you do not even need to add a texture to the surface, very large surfaces with lots of detail are viewable
e) All of the above are true of TINs
f) There are only two correct answers in Questions A to D
7 A Quick Flash Summary of the Nature of Raster and TIN surfaces
B Creating raster surfaces from points
1 Continuous Surfaces are generated from points and from images
Surfaces of continuous data are usually generated from samples taken at points across the area. For example, the irregularly spaced weather stations in a region can be used to create raster surfaces of temperature or air pressure. The resulting surface is a regular grid of values.
Did you know that satellites are increasingly using multi-spectral and hyper-spectral images that represent climate variables .... Here is a quick test for you to do to revise your knowledge on satellites.
1 True or False?
LANDSAT is a Multi-spectral Sensor a) True b) False
2 True or False?
The LANDSAT sensor has THREE visible bands a) True b) False
3 True or False?
The SPOT panchromatic Sensor has three bands a) True b) False
4 True or False?
The SPOT Multi-spectral Sensor has THREE visible bands, these being RED, GREEN and BLUE. a) True b) False
5 True or False?
The SPOT Multi-spectral Sensor has a higher Resolution than any LANDSAT 7 band. a) True b) False
6 True or False?
LANDSAT 7, which was launched on April 15, 1999, is still operating and sending us images of the earth. a) True b) False
7 True or False?
South Africa is still operating SUNSAT, an imaging satellite built at the University of Stellenbosch's Department of Electrical Engineering. a) True b) False
8 Multiple Response Question
The following is a characteristic of NOAA's AVHRR sensor
a) It has one visible band
b) It can calculate vegetation cover as well as make observations about weather and climate
c) It can provide Normalized Difference Vegetation Index (NDVI) images
d) It takes an image of every part of the earth each day
e) It has an image swath of approximately 1.1 km
f) It has provided good imagery for examining time series change
9 Multiple Response Question
Go onto the Internet and do a search on MODIS and see which of the following is correct for this satellite
a) The MODIS Sensor aboard the TERRA satellite has 36 Bands
b) It can see the roads in Cape Town
c) It can be used for examining Climate Change
d) It takes an image of every part of the earth each day
e) It has a resolution that is as good as LANDSAT's
f) It can provide information about the "Ozone Hole" over Antarctica
10 True or False
Are you enjoying this quiz and did you learn something new? a) True b) False
2 What is a spatial interpolation?
Interpolation predicts values for cells in a raster from a limited number of sample data points. It can be used to predict unknown values for any geographic point data: elevation, rainfall, chemical concentrations, noise levels, and so on.
On the left is a point dataset of known values. On the right is a raster interpolated from these points. Unknown values are predicted with a mathematical formula that uses the values of nearby known points.
In this example the input points happen to fall on cell centers - this is unlikely in practice. One problem with creating rasters by interpolation is that the original information is degraded to some extent - even when a data point falls within a cell, it is not guaranteed that the cell will have exactly the same value.
Interpolation is based on the assumption that spatially distributed objects are spatially correlated; in other words, things that are close together tend to have similar characteristics. For instance, if it is raining on one side of the street, you can predict with a high level of confidence that it is also raining on the other side of the street. You would be less sure if it was raining across town and less confident still about the state of the weather in the neighbouring province.
3 Why interpolate?
Visiting every location in a study area to measure the height, magnitude, or concentration of a phenomenon is usually difficult or expensive. Instead, dispersed sample input point locations can be selected and a predicted value can be assigned to all other locations. Input points can be either randomly, strategically, or regularly spaced points containing height, concentration, or magnitude measurements.
A typical use for point interpolation is to create an elevation surface from a set of sample measurements. Each point represents a location where the elevation has been measured. The values between these input points are predicted by interpolation.
The resulting grid is a prediction of what the elevation is at any location on the actual surface.
There are effectively two types of techniques for generating raster surfaces:
Deterministic Models use a mathematical function to predict unknown values and result in a hard classification of the value of features.
Statistical Techniques produce confidence limits on the accuracy of a prediction but are more difficult to execute since more parameters need to be set.
C Deterministic Models
Deterministic models include Inverse Distance Weighted (IDW), Rectangular, Natural Neighbours, and Spline. You can also develop a trend surface using polynomial functions to create a customized and highly accurate surface.
In contrast to Deterministic Models, Statistical methods are based on statistical models that include autocorrelation (statistical relationships among the measured points). Not only do these techniques have the capability of producing a prediction surface, but they can also provide some measure of the certainty or accuracy of the predictions. Statistical models include Ordinary Kriging, Simple Kriging, and Universal Kriging.
1 Which of the following techniques is not deterministic?
a) Inverse Distance Weighting
b) Spline
c) Rectangular
d) Natural Neighbours
e) Kriging
f) All of the above are deterministic
1 Inverse Distance Weighting (IDW)
The Inverse Distance Weighting interpolator assumes that each input point has a local influence that diminishes with distance. It weights the points closer to the processing cell greater than those further away. A specified number of points, or all points within a specified radius can be used to determine the output value of each location. Use of this method assumes the variable being mapped decreases in influence with distance from its sampled location.
The Inverse Distance Weighting (IDW) algorithm is effectively a moving average interpolator that is usually applied to highly variable data. For certain data types it is possible to return to the collection site and record a new value that is statistically different from the original reading but within the general trend for the area. Examples of this type of data include soil chemistry results, environmental monitoring data, and consumer behaviour observations. It is not desirable to honour local high/low values but rather to look at a moving average of nearby data points and estimate the local trends.
The interpolated surface, estimated using a moving average technique, is less than the local maximum value and greater than the local minimum value.
2 More on the Inverse Distance Weighting (IDW)
The IDW technique calculates a value for each grid node by examining surrounding data points that lie within a user-defined search radius. Some or all of the data points can be used in the interpolation process. The node value is calculated by averaging the weighted sum of all the points. Data points that lie progressively farther from the node influence the computed value far less than those lying closer to the node.
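To make the calculation concrete, here is a minimal sketch of the IDW estimate for a single grid node in Python (the sample coordinates, values, power, and search radius below are hypothetical):

import numpy as np

def idw_node(xy_samples, z_samples, xy_node, power=2.0, radius=500.0):
    # Distances from the grid node to every sample point
    d = np.hypot(*(xy_samples - xy_node).T)
    keep = d <= radius                      # only points inside the search radius
    d, z = d[keep], z_samples[keep]
    if np.any(d == 0):                      # node falls exactly on a sample point
        return z[d == 0][0]
    w = 1.0 / d ** power                    # influence decays with distance
    return np.sum(w * z) / np.sum(w)        # weighted moving average

# Hypothetical rainfall samples and one node to estimate
xy = np.array([[0.0, 0.0], [100.0, 50.0], [200.0, 150.0]])
z = np.array([12.0, 15.0, 9.0])
print(idw_node(xy, z, np.array([80.0, 60.0])))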
A radius is generated around each grid node from which data points are selected to be used in the calculation. Options to control the use of IDW include:
Power
Search Radius
Fixed search radius
Variable Search Radius
Barrier
Power
With IDW you can control the significance of known points upon the interpolated values, based on their distance from the output point. Power is the variable that defines the exponential rate of decay of influence of neighbouring points the farther they lie from the grid node. By defining a high power, more emphasis is placed on the nearest points and the resulting surface will have more detail and be less smoothed. Specifying a lower value will give more influence to points that are farther away, resulting in a smoother surface. Increasing the power will decrease the relative influence of more distant neighbours. Values range between one and ten.
Search Radius
Search Radius defines the maximum size, in map units, of a circular zone centred around each grid node within which point values from the original data set are averaged and weighted according to their distance from the node. The characteristics of the interpolated surface can also be controlled by applying a search radius that is fixed or variable, which limits the number of input points that can be used for calculating each interpolated cell.
Consequently the user can define both the minimum and maximum number of data points averaged within each zone.
Fixed Search Radius
A fixed search radius requires a distance and a minimum number of points. The distance dictates the radius of the circle of the neighbourhood, in map units. The distance of the radius is constant, so for each interpolated cell, the radius of the circle used to find input points is the same. The minimum number of points indicates the minimum number of measured points to use within the neighbourhood. All measured points that fall within the radius will be used in the calculation of each interpolated cell. When there are fewer measured points in the neighbourhood than the specified minimum, the search radius will be increased.
Variable Search Radius
With a variable search radius, the number of points used in calculating the value of each interpolated cell is specified, which makes the radius distance vary for each interpolated cell, depending on how far it has to search around the cell to reach the specified number of input points. Thus, some neighbourhoods can be small and others can be large, depending on the density of the measured points near the interpolated cell. You can also specify a maximum distance, in map units, that the search radius cannot exceed.
Barrier
A line or polygon dataset can be used as a break that limits the search for input sample points. A line can represent a cliff, ridge, or some other interruption in a landscape. Only those input sample points on the same side of the barrier as the current processing cell will be considered. A choice of No Barriers will use all points within the identified radius.
3 Natural Neighbourhood Interpolation
The Natural Neighbour method is a geometric estimation technique that uses natural neighbourhood regions generated around each point in the data set.
Like IDW, this interpolation method is a weighted-average interpolation method. However, instead of finding an interpolated point's value using all of the input points weighted by their distance, Natural Neighbour interpolation creates a Delaunay Triangulation of the input points and selects the closest nodes that form a convex hull around the interpolation point, then weights their values by proportionate area. This method is most appropriate where sample data points are distributed with uneven density. It is a good general-purpose interpolation technique and has the advantage that you do not have to specify parameters such as radius, number of neighbours or weights.
This technique is designed to honour local minimum and maximum values in the point file and can be set to limit overshoots of local high values and undershoots of local low values. The method thereby allows the creation of accurate surface models from data sets that are very sparsely distributed or very linear in spatial distribution.
In the natural neighbourhood the interpolated surface is tightly controlled by the original data points
4 How Natural Neighbourhood Interpolation Works?
Very simply, the Natural Neighbour method makes use of an area-stealing, or area-weighting, technique to determine a new value for every grid node. As shown below, a natural neighbourhood region is first generated for each data point. Then, at every node in the new grid, a new natural neighbourhood region is generated that effectively overlies various portions of the surrounding natural neighbour regions defining each point. The new grid value is calculated as the average of the surrounding point values, proportionally weighted according to the intersecting area of each point.
A display of the natural neighbourhood regions around the point file, as well as the region created around a grid node.
5 Variations of Natural Neighbourhood Interpolation
Three variations of this basic technique are usually incorporated into the Natural Neighbour interpolator.
A graph showing the three variations of the Natural Neighbour Interpolator.
1) Black line represents a Constant Value interpolator in which each grid node takes on the value of the underlying natural neighbourhood region.
2) Mid-grey line represents a Linear Solution, where the grid value is determined by averaging the point values associated with surrounding natural neighbour regions, weighted according to the area that is encompassed by a temporary natural region generated around the grid cell.
3) Light-grey line represents a Slope-based Solution where the grid value is determined by averaging the extrapolated slope of each surrounding natural neighbour region and area weighted as in the Linear Solution. By examining the adjacent points, a determination is made as to whether that point represents a local maximum or minimum value. If such is the case, a slope value of zero is assigned to that value and the surface will therefore honour that point by neither overshooting nor undershooting it.
6 Spline Interpolation
Spline estimates values using a mathematical function that minimizes overall surface curvature, resulting in a smooth surface that passes exactly through the input points.
Conceptually, it is analogous to bending a sheet of rubber to pass through known points while minimizing the total curvature of the surface. It fits a mathematical function to a specified number of nearest input points while passing through the sample points. This method is best for gently varying surfaces, such as elevation, water table heights, or pollution concentrations. There are two spline methods:
7 Spline the Regularized Method
The regularized method creates a smooth, gradually changing surface with values that may lie outside the sample data range.
Applying the regularized Spline methods allows a surface to over- and under-shoot the sample data range
Using a regularized spline, the higher the weights, the smoother the surface. Weights between 0 and 5 are the most suitable, with typical values of 0, 0.001, 0.01, 0.1, and 0.5.
8 Spline the Tension Method
The Tension method tunes the stiffness of the surface according to the character of the modelled phenomenon.
It creates a less-smooth surface with values more closely constrained by the sample data range. For Tension, the higher the weight, the coarser the generated surface. The values entered have to be equal to or greater than zero. The typical values are 0, 1, 5, and 10.
Both the Regularized and Tension spline methods can be further refined by defining the number of points used in the calculation of each interpolated cell. The more input points you specify, the more each cell is influenced by distant points and the smoother the resulting surface.
9 Rectangular Interpolation (not available in ArcGIS extensions)
The Rectangular Interpolation technique is most commonly applied to data that is very regularly and very closely spaced, such as points generated from another gridding application.
The technique creates an interpolation surface that passes through all points without overshooting the maximum values or undershooting the minimum values.
Rectangular Interpolation passes through all points without overshooting or undershooting the sample maximum and minimum values, respectively.
10 Rectangular Interpolation How it Works
The Rectangular Interpolator locates the four nearest data points lying within a circular search zone, one from each quadrant, and connects them with a double linear "rectangular" framework (see below).
An appropriate value is calculated for each node using the slopes of connecting sides of the rectangle. However, in the absence of additional smoothing, linear artefacts are often generated across the surface when working with an irregular data point distribution. For this reason, the rectangular interpolator is most appropriate for interpolating data that is already arrayed in a closely spaced, regular grid format.
A radius is generated around each grid node from which the closest data point in each quadrant is selected to be used in the calculation
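Read as a double linear (bilinear) calculation, the node value follows from the four corner values of that framework. A minimal sketch, assuming the four quadrant points have already been mapped onto a unit rectangle (the corner values and node position below are hypothetical):

def bilinear(z00, z10, z01, z11, u, v):
    # Linear interpolation along x on both rectangle sides, then along y
    return (z00 * (1 - u) * (1 - v) + z10 * u * (1 - v)
            + z01 * (1 - u) * v + z11 * u * v)

# Node a quarter of the way along x and halfway along y of the rectangle
print(bilinear(10.0, 14.0, 12.0, 18.0, 0.25, 0.5))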
11 Trends based on Polynomial Interpolations
Visualizing global polynomial interpolation
There are other solutions for predicting the values for unmeasured locations. Another proposed site for the observation area is on the face of a gently sloping hill.
The face of the hill is a sloping plane. However, the locations of the samples are in slight depressions or on small mounds (local variation). Using the local neighbors to predict a location may over or underestimate because of the influence of depressions and mounds. Further, you may pick up the local variation and may not capture the overall sloping plane (referred to as the trend). The ability to identify and model local structures and surface trends can increase the accuracy of your predicted surface.
12 The Essence of Polynomial Interpolations
To base your prediction on the overriding trend, you can fit a plane between the sample points. A plane is a special case of a family of mathematical formulas called polynomials. You then determine the unknown height from the value on the plane for the prediction location. The plane may be above certain points and
below others.
The goal for interpolation is to minimize error. You can measure the error by subtracting each measured value from its predicted value on the plane, squaring the difference, and adding the results together; the plane that minimizes this sum is the least-squares fit. This process is the theoretical basis for first-order global polynomial interpolation.
A first-order polynomial interpolation based on a simple plane using a least-squares fit.
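A minimal sketch of that first-order fit with numpy, using hypothetical sample points; numpy.linalg.lstsq minimizes exactly the sum of squared differences described above:

import numpy as np

# Hypothetical sample coordinates and measured heights
x = np.array([0.0, 100.0, 200.0, 150.0])
y = np.array([0.0, 50.0, 150.0, 120.0])
z = np.array([5.0, 7.5, 11.0, 10.2])

# Fit the plane z = a*x + b*y + c by least squares
A = np.column_stack([x, y, np.ones_like(x)])
(a, b, c), *_ = np.linalg.lstsq(A, z, rcond=None)

# Predict the height on the plane at an unmeasured location
print(a * 120.0 + b * 80.0 + c)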
13 But the Real World has Valleys and Plateaus
But what if you were trying to fit the plane to a landscape that is a valley? You will have a difficult task obtaining a good surface from a plane. However, if you are allowed one bend in the plane (see image below), you may be able to obtain a better fit (get closer to more values). To allow one bend is the basis for second-order global polynomial interpolation. Two bends in the plane would be a third-order polynomial, and so forth. The bends can occur in both directions, possibly resulting in a "bowl-shaped" surface.
Modellers often work up to "fifth-order" polynomial analysis.
Allowing the plane to bend will provide a better fitting surface.
14 Visualizing local polynomial interpolations
Now what happens if the area slopes, levels off, and then slopes again? Asking you to fit a flat plane through this study site would give poor predictions for the unmeasured values. However, if you are permitted to fit many smaller overlapping planes, and then use the center of each plane as the prediction for each location in the study area, the resulting surface will be more flexible and perhaps more accurate. This is the conceptual basis for local polynomial interpolation.
Applying many smaller, overlapping planes will improve surface prediction in a typical surface that has slopes and plains
15 Polynomial Analysis: Visualizing radial basis functions
Radial basis functions enable you to create a surface that captures global trends and picks up the local variation. This helps in cases where fitting a plane to the sample values will not accurately represent the surface. To create the surface, suppose you have the ability to bend and stretch the predicted surface so that it passes through all of the measured values.
There are many ways you can predict the shape of the surface between the measured points. For example, you can force the surface to form nice curves (thin-plate spline), or you can control how tightly you pull on the edges of the surface (spline with tension). This is the conceptual framework for interpolators based on radial basis functions.
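A minimal sketch of this idea using scipy's RBFInterpolator (assuming scipy >= 1.7 is available; the sample points below are hypothetical). With smoothing=0.0 the thin-plate spline surface passes through all measured values; raising smoothing relaxes that pull, loosely comparable to the tension idea above:

import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 100.0, size=(20, 2))          # hypothetical sample locations
z = np.sin(xy[:, 0] / 20.0) + 0.1 * rng.standard_normal(20)

# Thin-plate spline surface passing exactly through the samples
rbf = RBFInterpolator(xy, z, kernel='thin_plate_spline', smoothing=0.0)

# Predict on a regular grid of unmeasured locations
grid = np.mgrid[0:100:5, 0:100:5].reshape(2, -1).T
print(rbf(grid).shape)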
A radial basis function used for constructing a surface.
16 A Quick Flash Summary of Deterministic Models (Interpolations)
D Statistical techniques using a semi-variogram for developing continuous surface models (Kriging)
Kriging is a geostatistical interpolation technique that considers both the distance and the degree of variation between known data points when estimating values in unknown areas. A kriged estimate is a weighted linear combination of the known sample values around the point to be estimated.
Applied properly, Kriging allows the user to derive weights that result in optimal and unbiased estimates. It attempts to minimize the error variance and set the mean of the prediction errors to zero so that there are no over- or under-estimates. Included with the Kriging routine is the ability to construct a semivariogram of the data which is used to weight nearby sample points when interpolating. It also provides a means for users to understand and model the directional (e.g., north-south, east-west) trends of their data. A unique feature of Kriging is that it provides an estimation of the error at each interpolated point, providing a measure of confidence in the modeled surface and for this reason it is considered to be a statistical technique rather than a deterministic method.
1 Effectiveness of Kriging
The effectiveness of Kriging depends on the correct specification of several parameters that describe the semivariogram and the model of the drift (i.e., how does the mean value change over distance). Because Kriging is a robust interpolator, even a naïve selection of parameters will
provide an estimate comparable to many other grid estimation procedures. The trade-off for estimating the optimal solution for each point by Kriging is computation time. Given the additional trial and error time necessary to select appropriate parameters, Kriging should be applied where best estimates are required, data quality is good, and error estimates are essential.
Three different methods of Kriging interpolation exist; Ordinary Kriging, Simple Kriging, and Universal Kriging.
2 How Kriging Works?
Kriging is a weighted moving average technique, similar in some ways to Inverse Distance Weighting (IDW) interpolation. Comparing the two techniques provides insight to the benefits of Kriging. With IDW each grid node is estimated using sample points which fall within a circular radius. The degree of influence each of these points will have on the calculated value is based upon the weighted distance of each of sample point from the grid node being estimated. In other words, points that are closer to the node will have a greater degree of influence on the calculated value than those that are farther away. The general relationship between the amount of influence a sample point has with respect to its distance is determined by IDW's power (or exponent) setting, graphically represented below.
Decay curves used by IDW interpolation (the exponent value is analogous to the power). Most applications use a power (or exponent) of 2.
3 More on How Kriging Works
The disadvantage of the IDW interpolation technique is that it treats all sample points that fall within the search radius the same way.
For example, if a power (or exponent) of 1 is specified, a linear distance decay function is used to determine the weights for all points that lie within the search radius (see above figure). This same function is also used for all points regardless of their geographic orientation to the node (north, south, etc.) unless a sectored search is implemented. Kriging, on the other hand, can use different weighting functions depending on 1) the distance and orientation of sample points with respect to the node, and 2) the manner in which sample points are clustered.
Unless you develop a sectored search, IDW implements a circular search for averaging values.
4 Kriging Works Similarly to Inverse Distance Weighting
Kriging is similar to IDW in that it weights the surrounding measured values to derive a prediction for an unmeasured location. The general formula for both interpolators is formed as a weighted sum of the data:

Z^(s0) = sum over i = 1 to N of ( lambda_i * Z(si) )

where
Z(si) is the measured value at the ith location;
lambda_i is an unknown weight for the measured value at the ith location;
s0 is the prediction location;
N is the number of measured values.
In IDW, the weight lambda_i depends solely on the distance to the prediction location. However, in Kriging, the weights are based not only on the distance between the measured points and the prediction location but also on the overall spatial arrangement among the measured points. To use the spatial arrangement in the weights, the spatial autocorrelation must be quantified. Thus, in Ordinary Kriging, the weight lambda_i depends on a fitted model to the measured points, the distance to the prediction location, and the spatial relationships among the measured values around the prediction location.
5 To make a prediction with Kriging, two tasks are necessary:
(1) to uncover the dependency rules and
(2) to make the predictions.
To realize these two tasks, Kriging goes through a two-step process:
(1) the creation of variograms and covariance functions to estimate the statistical dependence (called spatial autocorrelation) values, which depends on our model of autocorrelation (fitting a model), and
(2) actually predicting the unknown values (making a prediction). It is because of these two distinct tasks that it has been said that Kriging uses the data twice: the first time to estimate the spatial autocorrelation of the data and the second time to make the predictions.
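These two steps map directly onto geostatistics libraries. A minimal sketch using the pykrige package (an assumption; it is not named in the original text), where fitting the variogram model is step one and executing the grid prediction is step two:

import numpy as np
from pykrige.ok import OrdinaryKriging

# Hypothetical sample points and values
x = np.array([0.0, 100.0, 200.0, 50.0, 150.0])
y = np.array([0.0, 80.0, 40.0, 160.0, 120.0])
z = np.array([12.0, 15.0, 9.0, 14.0, 11.0])

# Step 1: estimate spatial autocorrelation by fitting a variogram model
ok = OrdinaryKriging(x, y, z, variogram_model='spherical')

# Step 2: predict unknown values; ss is the kriging variance (error estimate)
gridx = np.arange(0.0, 201.0, 20.0)
gridy = np.arange(0.0, 201.0, 20.0)
zhat, ss = ok.execute('grid', gridx, gridy)
print(zhat.shape, ss.shape)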
6 Generating a Semivariogram
As mentioned above, Kriging uses a different weighting function depending on both the distance and geographic orientation of the sample point to the node being calculated.
The problem is that it is impossible for a user, at a first glance, to know precisely how a data set varies outward from any one location with respect to distance and direction. There are, however, many techniques available to help determine this, the most popular being a variance analysis.
Example of data that has no variance crosswise but varies greatly along the lengthwise axis of the data set.
7 Understanding Semivariance
Kriging uses a property called the semivariance to express the degree of relationship between points on a surface. The semivariance is simply half the variance of the differences between all possible points spaced a constant distance apart.
The semivariance at a distance d = 0 will be zero, because there are no differences between points that are compared to themselves. However, as points are compared to increasingly distant points, the semivariance increases. At some distance, called the Range, the semivariance will become approximately equal to the variance of the whole surface itself. This is the greatest distance over which the value at a point on the surface is related to the value at another point. The range defines the maximum neighbourhood over which control points should be selected to estimate a grid node, to take advantage of the statistical correlation among the observations.
8 Semivariance illustrated
The image below shows the pairing of one point (the red point) with all other measured locations. This process continues for each measured point.
Often each pair of locations has a unique distance, and there are often many pairs of points. To plot all pairs quickly becomes unmanageable. Instead of plotting each pair, the pairs are grouped into lag bins. For example, compute the average semivariance for all pairs of points that are greater than 40 meters apart but less than 50 meters. The empirical semivariogram is a graph of the averaged semivariogram values on the y-axis and the distance (or lag) on the x-axis (see diagram below).
Relationship between variance among measured points and distance, showing that the more points you use, and hence the further away they are, the greater the variance in the data. This graph is called a semivariogram.
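A minimal sketch of building that binned (empirical) semivariogram, where each bin's value is the average of 0.5 * (zi - zj)^2 over the pairs whose separation falls in the bin (the data and lag width below are hypothetical):

import numpy as np
from scipy.spatial.distance import pdist

def empirical_semivariogram(xy, z, lag_width=10.0, n_lags=10):
    d = pdist(xy)                                        # pairwise separations
    g = 0.5 * pdist(z[:, None], metric='sqeuclidean')    # pairwise semivariances
    edges = np.arange(n_lags + 1) * lag_width
    lags, gammas = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (d >= lo) & (d < hi)
        if in_bin.any():                                 # skip empty lag bins
            lags.append(d[in_bin].mean())
            gammas.append(g[in_bin].mean())
    return np.array(lags), np.array(gammas)

# Hypothetical scattered samples with a mild east-west trend
rng = np.random.default_rng(1)
xy = rng.uniform(0.0, 100.0, size=(50, 2))
z = 0.1 * xy[:, 0] + rng.standard_normal(50)
print(empirical_semivariogram(xy, z))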
9 Spatial autocorrelation
Spatial autocorrelation quantifies a basic principle of geography: things that are closer are more alike than things farther apart.
Thus, pairs of locations that are closer (far left on the x-axis of the semivariogram cloud) should have more similar values (low on the y-axis of the semivariogram cloud). As pairs of locations become farther apart (moving to the right on the x-axis of the semivariogram cloud), they should become more dissimilar and have a higher squared difference (move up on the y-axis of the semivariogram cloud).
10 Understanding a semivariogram-the range, sill, and nugget
As previously discussed, the semivariogram depicts the spatial autocorrelation of the measured sample points. Because of a basic principle of geography (things that are closer are more alike), measured points that are close will generally have a smaller difference squared than those farther apart. Once each pair of locations is plotted (after being binned) a model is fit through them. There are certain characteristics that are commonly used to describe these models.
11 The range and sill
When you look at the model of a semivariogram, you will notice that at a certain distance the model levels out. The distance where the model first flattens out is known as the range.
Sample locations separated by distances closer than the range are spatially autocorrelated, whereas locations farther apart than the range are not.
The value at which the semivariogram model attains the range (the value on the y-axis) is called the sill. The partial sill is the sill minus the nugget (see following section).
12 The nugget
Theoretically, at zero separation distance (i.e., lag = 0), the semivariogram value is zero. However, at an infinitely small separation distance, the semivariogram often exhibits a nugget effect, which is some value greater than zero. If the semivariogram model intercepts the y-axis at 2, then the nugget is 2.
The nugget effect can be attributed to measurement errors or spatial sources of variation at distances smaller than the sampling interval (or both). Measurement error occurs because of the error inherent in measuring devices. Natural phenomena can vary spatially over a range of scales (i.e., micro or macro scales). Variation at micro scales smaller than the sampling distances will appear as part of the nugget effect. Before collecting data, it is important to gain some understanding of the scales of spatial variation that you are interested in.
An example of a real semivariogram is shown below.
13 An Omnidirectional Semivariogram
An example of an omni-directional semivariogram. The undulating black line in the graph is the plot of calculated variances, plotted on the Y-axis, against their corresponding Lag distances on the X-axis. This plot is given the term experimental semivariogram. The jagged nature of the experimental semivariogram makes it unsuitable for use in calculating the kriging weights, so a smooth mathematical function (model) must be fitted to the variogram. The model is shown as the white line in the graph.
Although the strength of Kriging is its ability to account for directional trends of the data, it is possible to analyze variance with respect to distance only and disregard how points are geographically oriented. The above experimental semivariogram is an example of this, called an omni-directional experimental semivariogram. If geographic orientation is important then a directional semivariogram should be calculated such as the one shown in Figure below.
An example of a directional semivariogram. Notice the two experimental semivariograms, one representing points oriented north and south of each other, and the other representing points east to west of each other.
When two or more directions are analyzed, an experimental semivariogram will be generated for each direction. In the above figure, two directions are being investigated and therefore two experimental semivariograms are plotted. Semivariogram experimentation can uncover fundamental information about the data set, i.e., whether the data varies in more than one direction. In more technical terms, semivariogram experimentation can reveal whether the data set is isotropic (varies the same in all directions) or anisotropic (varies differently in different directions), as demonstrated in Figure 3.13.
When investigating these directional trends it is necessary to have tools available to modify parameters such as the directions in which the variances will be calculated. These parameters are discussed in the following section.
14 Modifying Directional Parameters
Up to this point the directional calculation of variance has been discussed as being north-south and east-west. In reality data sets will not have directional variations that are described in these exact directions. Therefore it is necessary to create a model that 'looks' in the direction in which the data is varying. This is done by modifying the number of different directions analyzed, the angle in which they are oriented, and the degree of tolerance that will be afforded to each direction.
In the above diagram two directions are analyzed, represented by the dark and light grey pie
shapes. It is important to note that although the diagram shows four pies, variance analysis is always performed in opposing directions.
When more than one direction is set, the angle to which these sectors will be oriented must be specified. In the above diagram the angles are 0 degrees and 80 degrees. It is unlikely to find data pairs along exactly 0 degrees or 80 degrees orientation, thus it is necessary to define an interval around these exact values for which points will be considered. This interval is known as the tolerance. In the above diagram the 0 degree direction has a tolerance of 45 degrees and the 80 degree direction has a tolerance of 20 degrees. Once the experimental semivariogram has been generated, a model curve can be calculated which closely fits the variogram.
15 Changing the Variogram Model
The variogram models included with Vertical Mapper are Spherical, Exponential, Gaussian, Power, Hole Effect, Quadratic, and Rquadratic (Rational Quadratic). By applying one or more of these models to the different directional semivariograms, the model curve can be adjusted to better represent the variance in the data set. After any of these models are applied, they can be further modified by changing the Sill and Range values. The Range is the greatest distance over which the value at a point on the surface is related to the value at another point. Variance between points that are farther apart than the range does not increase appreciably. Therefore the semivariance curve flattens out to a Sill. Note that not all data sets exhibit this behaviour. In the Figure below, the Sill value is at a variance of 12 200 and the Range value occurs at a distance of 12 000.
The Sill is a variance value that the model curve ideally approaches but does not cross. The Range is the distance value at which the variogram model determines where the Sill begins.
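As a concrete example of one listed model, the Spherical curve rises from the nugget toward the Sill and flattens once the lag distance exceeds the Range. A minimal sketch using the Sill and Range values quoted above (and assuming a zero nugget):

import numpy as np

def spherical_model(h, nugget=0.0, sill=12200.0, rng=12000.0):
    # Below the range the curve rises; beyond it the model stays at the sill
    h = np.asarray(h, dtype=float)
    rising = nugget + (sill - nugget) * (1.5 * h / rng - 0.5 * (h / rng) ** 3)
    return np.where(h < rng, rising, sill)

print(spherical_model([0.0, 6000.0, 12000.0, 20000.0]))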
16 Anisotropic Modelling
It is quite natural for the behaviour of a data set to vary differently in one direction as compared to another. For example, a steeply sloping hill will typically vary in two directions. The first is up and down the hill, where it varies quickly from top to bottom, and the second is across the hill, where it varies more slowly. When this occurs in a data set it is called anisotropy. When performing Anisotropic Modelling the user is essentially guiding the Kriging interpolator to use sample data points that will most accurately reflect the behaviour of the surface. This is achieved by creating additional models for each direction analyzed. When interpolating points oriented in a north-south direction, the Kriging weights can be influenced to use the parameters of one model, while points oriented in an east-west direction will be weighted using a different model.
A two-directional, two-model semivariogram
17 Other Kriging Techniques
Ordinary Kriging
This method assumes that the data set has a stationary variance but also a non-stationary mean value within the search radius. Ordinary Kriging is highly reliable and is recommended for most data sets.
Simple Kriging
This method assumes that the data set has a stationary variance and a stationary mean value and requires the user to enter the mean value.
Universal Kriging
This method represents a true geostatistical approach to interpolating a trend surface of an area. The method involves a two-stage process where the surface representing the drift of the data is
built in the first stage and the residuals for this surface are calculated in the second stage. With Universal Kriging the user can set the polynomial expression used to represent the drift surface. The most general form of this expression is:
F(x, y) = a20 * x^2 + a11 * xy + a02 * y^2 + a10 * x + a01 * y + a00
where a00 is always present but rarely set to zero in advance of the calculation. However, any of the other coefficients can be set to zero. The recommended setting is a first-degree polynomial, which will avoid unpredictable behaviour at the outer margins of the data set.
Block Kriging
Any one of the three Kriging interpolation methods can be applied in one of two forms: Punctual or Block. Punctual Kriging (the default) estimates the value at a given point and is most commonly used. Block Kriging uses the estimate of the average expected value in a given location (such as a "block") around a point. Block Kriging provides better variance estimation and has the effect of smoothing interpolated results.
18 A Quick Flash Summary of Geostatistics - Kriging and the Semivariogram
E Developing Triangular Irregular Network (TIN) models for elevation, slope and aspect modelling
Triangulation is a process of grid generation that is most commonly applied to data that requires no regional averaging, such as elevation readings. The surface created by triangulation passes through (honours) all of the original data points while generating some degree of "overshoot" above local high values and "undershoot" below local low values.
Elevation is an example of point values that are best "surfaced" with a technique that predicts some degree of over- and underestimation. In trying to model a topographic surface from scattered elevation readings, it is not reasonable to assume that data points were collected at the absolute top or bottom of each local rise or depression in the land surface. This is especially true in modeling subsurface layers using elevation readings from bore hole data.
1 The Essentials of a TIN model
Using triangulation, the interpolated surface passes through the original data points; however, peaks and valleys will extend beyond the local maximum and minimum values.
Triangulation involves a process whereby all original data points are connected in space by a network of triangular faces, drawn as equilateral as possible, forming what is referred to as a Triangular Irregular Network (TIN) shown in Figure 3.2. Points are connected based on a nearest neighbour relationship (the Delaunay criterion) which states that a circumcircle drawn around any triangle will not enclose the vertices of any other triangle.
An example of the triangular irregular network (TIN)
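scipy's Delaunay triangulation builds exactly such a network. A minimal sketch with hypothetical elevation points:

import numpy as np
from scipy.spatial import Delaunay

# Hypothetical (x, y) sample locations with elevations stored at the nodes
xy = np.array([[0.0, 0.0], [100.0, 10.0], [40.0, 90.0],
               [120.0, 70.0], [60.0, 40.0]])
z = np.array([15.0, 22.0, 18.0, 30.0, 25.0])

tin = Delaunay(xy)                 # triangles satisfy the Delaunay criterion
print(tin.simplices)               # each row: the three node indices of one facet

# Locate the triangular facet containing a query point
print(tin.find_simplex(np.array([[50.0, 30.0]])))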
2 A TIN model explained in more detail
A smooth grid surface is then fitted to the TIN using a bivariate fifth-order polynomial expression in the X and Y direction for each triangle face. This method guarantees continuity and smoothness of the surface along the sides of each triangle and smoothness of the surface within each triangle. The slope blending algorithm is designed to calculate new slope values for each of the triangle vertices (i.e., each point of the data set) where the influence of adjacent slopes in the blending calculation are weighted according to specified triangle properties.
Five properties of data point geometry and value greatly influence the ability of the slope blending algorithm to control smoothing of the TIN surface. These include:
a) triangle centroid location,
b) triangle aspect ratio,
c) triangle area,
d) angle versus slope of the triangle, and
e) statistically-derived slope of a triangle vertex.
For example, triangles with centroids farther from the vertex being solved have less influence on the slope calculation than triangles whose centroids are closer, or triangles with greater areas have greater influence in the slope calculation than triangles with a smaller area. The end result is a smoothing process that significantly reduces the frequency of angular artifacts, representing remnants of the original TIN facets, in the final gridded surface.
3 Choices when modelling a TIN
Linear Solution which calculates the grid values directly from the TIN surface, therefore no derivative slope solution is applied. The result is a grid surface that exactly duplicates the angular appearance of the TIN.
Fifth Order Solution applies a complex polynomial expression to the calculation of each grid node value. The calculation is based on solving for a number of slope derivatives. The result is a more highly smoothed surface that displays minor angular artifacts from the original TIN.
F Spatial analysis of categorical data using Neighbourhood Analysis (e.g. generation of soil maps)
While the major strength of interpolation lies in its ability to create a continuous grid from non-continuous data points, not all types of data are best represented as a continuously varying surface.
Some types of point data should be mapped as discrete regions within which the values assigned to each point are constant. Point data such as this is referred to as having a natural neighbourhood. Examples include soil type and land use.
A network of Thiessen polygons is generated from the point locations, creating what is called a Voronoi diagram.
1 Voronoi Maps Explained
Computing a natural neighbourhood (Voronoi) diagram for all points yields an excellent measure of point density. By calculating the area of the Thiessen polygon encompassing each point, attaching that area as an attribute to the point, and generating a grid of the new point file through interpolation, a representative density surface grid can be produced.
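A minimal sketch of that density calculation with scipy's Voronoi diagram; for simplicity only the bounded Thiessen polygons are measured, and the sample points are hypothetical:

import numpy as np
from scipy.spatial import Voronoi

def polygon_area(verts):
    # Shoelace formula for a polygon given its ordered vertices
    x, y = verts[:, 0], verts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, 1)) - np.dot(y, np.roll(x, 1)))

pts = np.random.default_rng(2).uniform(0.0, 100.0, size=(30, 2))
vor = Voronoi(pts)

# Thiessen polygon area attached to each point as its density attribute
for i, region_idx in enumerate(vor.point_region):
    region = vor.regions[region_idx]
    if -1 in region or len(region) == 0:   # -1 marks an unbounded edge region
        continue
    print(i, round(polygon_area(vor.vertices[region]), 1))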
The most challenging task about creating a surface through interpolation is choosing the most appropriate technique. All interpolators will create gridded surfaces; however, the result may not properly represent how the data behaves through space. The idea of data behaving through space refers to how the values change from one location to the next. For example, if an elevation surface were created from sample points taken in a mountainous area, it would be necessary to choose a technique that could simulate the severe elevation changes, because this is how this type of data behaves.
G Which Interpolation methods to use?
It is not always easy to understand how data behaves before commencing with the gridding process and therefore it can be difficult to know what technique should be used.
TIN Triangular Irregular Network
NN Natural Neighbour
IDW Inverse Distance Weighting
Kriging
However, there are some questions that can be asked about a data set that will help determine the most appropriate technique. These questions are listed below.
What kind of data is it or what do the data points represent?
1 Some interpolation techniques can be automatically applied to certain data types.
Data Type Possible Interpolation
Elevation TIN, NN
Soil Chemistry IDW, Kriging
Demographic NN, IDW, Kriging
Drive Test NN
How accurate is the data?
Some techniques assume that the value at every data point is an exact value and will honour it when interpolating. Other techniques assume that the value is more representative of an area.
Point Value Accuracy Possible Interpolator
Very Accurate NN, TIN, Rectangular
Not Very Accurate IDW, Kriging
What does the distribution of the points look like?
Some interpolation techniques produce more reasonable surfaces when the distribution of points is truly random. Other techniques work better with point data that is regularly distributed.
2 Application of Interpolation Techniques Illustrated
3 So let's have a look at some typical point data that you generate and work out which interpolator works best.
4 Is interpolation processing speed a factor?
All interpolation techniques have certain factors that will influence the speed of interpolation. Two factors common to all interpolators are the cell size and the number of points. The smaller the cell and/or the more points in the data set, the longer it takes to calculate the surface. However, some interpolators are faster than others.
Interpolator Speed Limiting Factors
TIN Fast None
IDW Fast Search and Display Radius size
Rectangular Very Fast Search Radius size
NN Slow Point distribution
Kriging Slow Number of directions analyzed
5 Is it necessary to over/undershoot the local Min. and Max. values?
Some interpolators allow for overshooting and undershooting the local minimum and maximum values in a data set. This is generally necessary when interpolating elevation surfaces.
Over/Undershoot? Interpolators
Yes TIN, NN
No IDW, Rectangular, Kriging
6 A Quick Flash Summary of the Other Methods of Spatial Analysis