InterpML schema requirements
From Intamap
This page lists the functional and non-functional requirements for the InterpML schema: what the representation should be able to do, and what it should be able to cope with.
Functional requirements - things it can do
Many of these are taken from the R INTAMAP setup page.
- Define an interpolation request:
- Define the observation set to be used (O&M document, maybe from a SOS?)
- Define the domain for which predictions are required
- Define the pre-processing steps to be taken
- Define the interpolation method to be used
- Define the target required (e.g. mean, mean + variance, realisation)
- Define any post-processing desired
- Does this include validation? Hopefully that belongs in another schema.
- Convey the interpolation result to user (with UncertML)?
- Describe the pre-processing method used and the resulting metadata
- Describe the interpolation method used
- Describe the parameters found, e.g. the variogram used; this might include posterior distributions
- Return the interpolation result
JdeJ: I think that all the "define" points should be handled in a WPS XML request and not by InterpML. InterpML should only contain the original data points, the metadata of the interpolation, and the interpolation itself.
What happens if no detail is specified, i.e. the automatic part?
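To make the question about the automatic part concrete, here is a minimal sketch of a request container in Python. Everything in it is an assumption made for illustration: the class name, the field names, the example URLs and the convention that an unset part means "let the service decide" are not part of any agreed InterpML or WPS schema.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class InterpolationRequest:
    """Hypothetical request container; field names are illustrative only."""
    observations: str                          # O&M document or a reference (SOS / URL)
    domain: str                                # GML geometry or a reference to one
    preprocessing: Optional[List[str]] = None  # e.g. ["clustering", "bias_correction"]
    method: Optional[str] = None               # e.g. "autokrige"
    target: Optional[str] = None               # e.g. "mean+variance"
    postprocessing: Optional[str] = None       # e.g. an aggregation method

    def automatic_parts(self) -> List[str]:
        """Names of the parts left unset, which the service would choose itself."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Only the mandatory parts and the target are given; the rest is automatic.
request = InterpolationRequest(
    observations="http://example.org/observations.xml",
    domain="http://example.org/domain.gml",
    target="mean+variance",
)
print(request.automatic_parts())
```

Under this convention the result document would need to report what was actually chosen for each automatic part, which fits the "describe" requirements in the list above.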
In more detail:
Define the observation set to be used (O&M document, maybe from a SOS?)
- Contain an O&M document, or a reference to such (e.g. a SOS or URL) which is the set of observations to be processed. Ideally these will contain an UncertML description of their associated errors. Observations might also contain information about their support.
- JdeJ: Observations should also be passed as GML point geometry and/or GML features (this is a very loose approach and prone to produce parsing errors)
- If we want to do bias correction there must be a grouping mechanism for observations - it is not clear to me what that ought to be. Options are:
- Specific flag on each observation to determine group?
- Separate file to allocate observations (via IDs or location?) to different groups?
- JdeJ: Maybe something like Doc1, Doc2, Doc3 as totally independent documents inside the WPS request XML document, plus a WPS input defining the groups, e.g. Group1=Doc1, Group2=Doc2,Doc3. Again, grouping should only be used if necessary.
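The first two grouping options can be compared with a small Python sketch. The observation records, the `group` key and the allocation table are invented for illustration; the point is only that a per-observation flag and a separate allocation by ID carry the same information.

```python
from collections import defaultdict

# Hypothetical observations; in practice these would come from an O&M document.
observations = [
    {"id": "obs1", "value": 12.1, "group": "networkA"},
    {"id": "obs2", "value": 11.8, "group": "networkA"},
    {"id": "obs3", "value": 14.3, "group": "networkB"},
]


def group_by_flag(obs_list, key="group"):
    """Option 1: each observation carries its own group flag."""
    groups = defaultdict(list)
    for obs in obs_list:
        groups[obs[key]].append(obs["id"])
    return dict(groups)


# Option 2: a separate table (e.g. a separate file) allocates IDs to groups.
allocation = {"obs1": "networkA", "obs2": "networkA", "obs3": "networkB"}


def group_by_allocation(obs_list, table):
    """Option 2: group membership is looked up by observation ID."""
    groups = defaultdict(list)
    for obs in obs_list:
        groups[table[obs["id"]]].append(obs["id"])
    return dict(groups)


print(group_by_flag(observations))
print(group_by_allocation(observations, allocation))
```

The flag keeps everything in one document, while the separate allocation leaves the observation documents untouched, which is closer to the Doc1/Doc2/Doc3 suggestion.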
Define the domain for which predictions are required
- Specify the domain as a GML geometry object. Permissible domains include:
- points - a single point or coverage of points (these should also be used to model line requests and tricky grids)
- polygons - these have a spatial support - again these might be single or in a coverage.
- grids - regular or irregular grids; the request should define clearly what is returned, e.g. grid-box means or punctual means.
JdeJ: (From above) Each document with points could be considered a different domain (even if they share the same geographical properties). The domain approach is very common in the OGC schemas; in the end we would "group" domains. Each domain would be a WPS input.
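As an illustration of a point domain, the sketch below builds a GML `MultiPoint` with Python's standard `xml.etree.ElementTree`. The helper name `multipoint`, the coordinates and the `srsName` value are assumptions; the element names (`MultiPoint`, `pointMember`, `Point`, `pos`) follow GML 3, but the output has not been validated against the GML schemas.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"
ET.register_namespace("gml", GML_NS)


def multipoint(coords, srs_name):
    """Build a gml:MultiPoint element from a list of (x, y) tuples.

    Coordinates are written in the order given; the axis order is whatever
    the chosen spatial reference system prescribes.
    """
    root = ET.Element(f"{{{GML_NS}}}MultiPoint", {"srsName": srs_name})
    for x, y in coords:
        member = ET.SubElement(root, f"{{{GML_NS}}}pointMember")
        point = ET.SubElement(member, f"{{{GML_NS}}}Point")
        pos = ET.SubElement(point, f"{{{GML_NS}}}pos")
        pos.text = f"{x} {y}"
    return root


# Hypothetical prediction locations in a projected coordinate system.
domain = multipoint([(155000.0, 463000.0), (156000.0, 464000.0)], "EPSG:28992")
xml_text = ET.tostring(domain, encoding="unicode")
print(xml_text)
```

A single point would be the same structure with one member, so line requests and awkward grids could be sent as point coverages as suggested above.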
Define the pre-processing steps to be taken: see R INTAMAP setup page
- Do you want to cluster the observations?
- Define the clustering methods to be used
- Represent the resulting clusters in the observations
- Do you want to bias correct the observations (locally or regionally)?
- Define the bias removal method
- Store and transmit the biases identified for each network / region
- Do you want anisotropy detection?
- Define the anisotropy method to be used
- Represent the results: direction and magnitude of anisotropy; clusters used.
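To show why direction and magnitude are enough to represent an anisotropy result, here is a sketch of the usual geometric correction: rotate the coordinates so the major axis lies along x, then stretch the minor axis. The conventions used (angle in degrees counter-clockwise from the x-axis, ratio = major range / minor range) are assumptions; packages differ on both, so the schema would need to state them explicitly.

```python
import math


def to_isotropic(x, y, direction_deg, ratio):
    """Map (x, y) into a space where the field can be treated as isotropic.

    direction_deg: direction of the major axis, counter-clockwise from x (assumed).
    ratio: major range divided by minor range, >= 1 (assumed).
    """
    theta = math.radians(direction_deg)
    # Rotate so that the major axis is aligned with the x-axis.
    x_rot = x * math.cos(theta) + y * math.sin(theta)
    y_rot = -x * math.sin(theta) + y * math.cos(theta)
    # Stretch the minor axis so both axes have the same effective range.
    return x_rot, y_rot * ratio


# A point one unit along the minor axis ends up `ratio` units away.
print(to_isotropic(0.0, 1.0, direction_deg=0.0, ratio=2.0))
```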
Define the interpolation method to be used
- Allow a named method to be selected (e.g. autokrige, projected process kriging, Bayesian kriging)
- Allow specification of details of the method to be used e.g. for projected process kriging the number of active points, for others maybe the number of points to use in the neighbourhood?
- Define the model form to be used:
- Describe the variogram or covariance function form
- Optionally provide parameter values (not re-estimated), initial values (used as a first guess in optimisation) and priors (this could be done with UncertML).
- Return the encoded parameter values or distributions.
- Describe the mean function to be used
- Optionally provide parameter values (not re-estimated), initial values (used as a first guess in optimisation) and priors (this could be done with UncertML).
- Return the encoded parameter values or distributions.
- This could get complex - do we allow covariates or only simple terms in location?
- If we allow covariates, how are these supplied? Honestly this is a can of worms. Maybe someone could attempt this, but automating it is a real nightmare!
How much control do we really want to give here? In the R interface maybe more; for the WPS maybe we should keep it simple?
Structure versus flexibility? Can we define some basic things (variogram / covariance) that will always be there, and put all algorithm-specific details into a string that each method is responsible for parsing?
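One way to picture the "structure versus flexibility" trade-off is the Python sketch below: a small structured core (model form plus parameters marked as fixed, initial or prior) and a free-form string for algorithm-specific options. All names, the `key=value;key=value` string format and the prior notation are assumptions made for illustration; the prior would presumably be an UncertML fragment rather than a plain string.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Parameter:
    role: str                       # "fixed", "initial" or "prior"
    value: Optional[float] = None   # used for "fixed" and "initial"
    prior: Optional[str] = None     # placeholder for an UncertML description


@dataclass
class ModelSpec:
    variogram_form: str                               # always present
    parameters: Dict[str, Parameter] = field(default_factory=dict)
    method_options: str = ""                          # opaque, parsed per method


def parse_options(text):
    """Parse a hypothetical 'key=value;key=value' option string."""
    options = {}
    for item in text.split(";"):
        if item.strip():
            key, _, value = item.partition("=")
            options[key.strip()] = value.strip()
    return options


spec = ModelSpec(
    variogram_form="exponential",
    parameters={
        "nugget": Parameter("fixed", 0.0),            # not re-estimated
        "range": Parameter("initial", 1500.0),        # first guess for optimisation
        "sill": Parameter("prior", prior="gamma(2, 0.5)"),
    },
    method_options="active_points=200; neighbourhood=50",
)

to_estimate = sorted(n for n, p in spec.parameters.items() if p.role != "fixed")
print(to_estimate)
print(parse_options(spec.method_options))
```

The same structure could be returned in the result with the estimated or posterior values filled in, at the cost of each method having to validate its own option string.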
Define the target required (e.g. mean, mean + variance, realisation)
- Specify what is the desired target. Possibilities include:
- Mean, or mean and marginal variance, or mean and covariance
- Probability of exceeding a threshold (define the threshold)
- Marginal pdf (for non-Gaussian approaches)
- Moments (for non-Gaussian approaches)
- Realisations from the posterior (conditional simulations)
These will be encoded in UncertML and will be provided over the domain requested
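For the Gaussian case, the exceedance target follows directly from the mean and marginal variance at each location: P(Z > t) = 1 - Φ((t - μ) / σ). A minimal sketch using only the standard library is shown below; the function name and the treatment of zero variance are assumptions, and non-Gaussian approaches would need the marginal pdf instead.

```python
import math


def exceedance_probability(mean, variance, threshold):
    """P(Z > threshold) for a Gaussian prediction with given mean and variance."""
    if variance <= 0.0:
        # Degenerate case (assumed convention): the prediction is exact.
        return 1.0 if mean > threshold else 0.0
    z = (threshold - mean) / math.sqrt(variance)
    # 1 - Phi(z), written with the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Threshold one standard deviation above the predicted mean.
print(exceedance_probability(mean=10.0, variance=4.0, threshold=12.0))
```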
Define any post-processing desired
- Define the method to be used for aggregation over different supports
I think that is all that is needed here for now?
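As an example of what an aggregation method might do, the sketch below averages point predictions inside a block (polygon or grid box). It assumes the target was "mean and covariance", since the variance of the block mean needs the full covariance between the points: for weights w, the mean is Σ wᵢ μᵢ and the variance is Σᵢ Σⱼ wᵢ wⱼ Cᵢⱼ. The function name and the equal-weight default are illustrative choices.

```python
def aggregate_block(means, cov, weights=None):
    """Mean and variance of a weighted average of correlated point predictions.

    means: predicted means at the points discretising the block.
    cov:   covariance matrix of the predictions (list of lists).
    """
    n = len(means)
    w = weights if weights is not None else [1.0 / n] * n
    block_mean = sum(wi * mi for wi, mi in zip(w, means))
    block_var = sum(w[i] * w[j] * cov[i][j] for i in range(n) for j in range(n))
    return block_mean, block_var


# Two points with unit variance and covariance 0.5.
print(aggregate_block([1.0, 3.0], [[1.0, 0.5], [0.5, 1.0]]))
```

With only marginal variances available, the block variance cannot be recovered, which is one reason the target and post-processing choices interact.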
Convey the interpolation result to user (with UncertML)
This should be an UncertML document, together with the metadata (about the pre-processing and interpolation methods) and the domain information.
JdeJ: This is the objective of InterpML: to return the interpolation values with all the associated metadata. The interpolation type, the pre- and post-processes applied, the semivariogram type, the anisotropy (if found and used) and of course the cross-validation should be contained as the major components of InterpML. The covariance and other statistical values will be inside UncertML, which will extend InterpML.
JdeJ: I think that this approach will create a simpler InterpML with a very specific function(s) and it could be presented to the OGC as a standard.
Describe the interpolation method used, parameters found, and return this with the interpolation result
This should use the same schema as is required to specify the prior knowledge but with the estimated / posterior values included.
Non-functional requirements - things it should be
- reliability:
- usability: the schema should be simple to deploy with other technologies such as GML and should have a well defined naming convention that employs standard geostatistical language throughout.
- JdeJ: It should define Semivariograms and as XML objects
- performance: the schema should be as compact as possible, but not at the price of usability, portability and extensibility which are all more important.
- portability: the schema should be able to be used with other standards.
- JdeJ: I see it more as "InterpML using other standards like GML/O&M/UncertML" rather than "GML/O&M using InterpML", but in the end "the more flexible the better".
- extensibility: the schema should allow extension to include other interpolation methods.
JdeJ: 10000% agreement :)
