How to set up the crossvalidation

From Intamap

Introduction

The crossvalidation service is a WPS service designed to test the interpolation results of the INTAMAP system, generating statistical values of the interpolation errors.

An explanation of cross-validation in statistics can be found here [1]

Basically, the crossvalidation service accepts a dataset and sends it to the INTAMAP system to determine the spatial model that best describes the dataset. It then splits the dataset into a training set and a validation set. The training set is used to interpolate values at the validation locations using the initial spatial model. In the final stage the service compares the true values of the validation set with the ones returned by the interpolation service.
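The split-and-compare step can be sketched in a few lines of Python. This is only an illustration, not the service's actual code: the function names are hypothetical, inverse distance weighting stands in for the INTAMAP interpolation service, and the three error statistics are common choices rather than the exact values the service reports. The sketch uses modern Python 3.

```python
import math
import random


def split_dataset(points, validation_fraction=0.2, seed=0):
    """Shuffle (x, y, value) tuples and split them into training/validation."""
    rng = random.Random(seed)
    shuffled = list(points)
    rng.shuffle(shuffled)
    n_val = max(1, int(round(len(shuffled) * validation_fraction)))
    return shuffled[n_val:], shuffled[:n_val]


def idw_predict(training, x, y, power=2.0):
    """Stand-in interpolator: inverse distance weighting (an assumption)."""
    num = 0.0
    den = 0.0
    for tx, ty, tv in training:
        d = math.hypot(x - tx, y - ty)
        if d == 0.0:
            return tv  # exact hit on a training location
        w = 1.0 / d ** power
        num += w * tv
        den += w
    return num / den


def error_statistics(training, validation, predict=idw_predict):
    """Compare predicted and true values at the validation locations."""
    errors = [predict(training, x, y) - v for x, y, v in validation]
    n = len(errors)
    return {
        "mean_error": sum(errors) / n,
        "mean_absolute_error": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
    }


# Hypothetical dataset: a constant field sampled along a diagonal.
points = [(float(i), float(i), 5.0) for i in range(10)]
training, validation = split_dataset(points)
stats = error_statistics(training, validation)
```

In the real service the call to idw_predict would be replaced by a WPS request to the interpolation server.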

pyWPS

pyWPS is a WPS implementation written in Python. Its main objective is to expose GRASS-GIS tools as web services, but it can provide WPS support for any Python script.

Its webpage can be found here: 1

pyWPS should be installed from the SVN tree, since the SVN version is more up to date and has fewer bugs than the Debian packages. The source code can be fetched with the following SVN command at a common bash prompt:

svn checkout https://svn.wald.intevation.org/svn/pywps/trunk

For more information on SVN please see: SVN and R

The pyWPS requires the following packages:

python2.5 (or 2.4 / 2.6)
python-dev
python-setuptools
python-xml
python-htmltmpl

The first 2 packages are common and easily found in the Debian/Ubuntu/Red Hat/openSUSE repositories. python-htmltmpl is a template engine that can be downloaded from SourceForge: python-htmltmpl download

The package should be decompressed to a temporary folder. To install it, run the setup.py script:

sudo ./setup.py install

This will install the htmltmpl package in your python module folder (normally /usr/lib/python2.6/dist-packages/ or /usr/lib/python2.5/site-packages/ )

To install the pyWPS module, issue the same setup.py command from the trunk folder of the SVN checkout:

sudo ./setup.py install 

If the installation fails while compiling the templates, pyWPS can be installed without template compilation:

sudo ./setup.py install --dry-run

After installation pyWPS will be in the Python module folder. The pyWPS template folder must have read and write permission (the templates are used by python-htmltmpl):

sudo chmod 777 /usr/lib/python2.5/site-packages/pywps/Templates/1_0_0

Note that your base path may be different, depending on your installation and Python paths.

The pyWPS variables and options are set in the pywps.cfg file. This file should be copied from /pywps/etc to /etc/pywps.cfg

pywps.cfg

This file contains the general server parameters and the WPS display information. It uses a basic PARAMETER=VALUE syntax and is divided into sections. Here are some important parameters used in the pyWPS implementation of the remwps2 server:

serveraddress=http://remwps2.jrc.ec.europa.eu/cgi-bin/wps
maxoperations=8
maxfilesize=600mb
tempPath=/tmp
processesPath=/usr/local/apache2/cgi-bin/processes
outputUrl=http://remwps2.jrc.ec.europa.eu/wpsoutputs
outputPath=/usr/local/apache2/htdocs/wpsoutputs
debug=true
logFile=/usr/local/apache2/logs/pywps.log

The maxoperations parameter is the number of concurrent WPS processes that can be accepted and run. If this number is exceeded, the server raises a WPS ServerBusy exception. The value 8 was chosen because this server is a dual quad-core machine (8 processing cores in total, allocating one core per process).

The maxfilesize parameter defines the maximum size of the POST content, and therefore of the XML input sent to the server. This option may be overridden by other limiting parameters in the Apache configuration (for example the body size limits imposed by mod_security).

The tempPath is a folder where the WPS has write permission. When a process is launched, a temporary folder is created there to hold the inputs and outputs being processed. When the process finishes, the contents are moved to /wpsoutputs and integrated into the WPS response, and the temporary folder is erased.

The outputUrl is the URL of the WPS status response, while outputPath is the corresponding location of the output on the server machine. The outputPath should be accessible with read/write permission and declared in the Apache configuration file so that the URL can be reached.

If the option debug is set to true, pyWPS will log the process activity and status changes to the file indicated by the parameter logFile.
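To check that such a file parses as expected, the options can be read with Python's standard configparser module. The sketch below is an assumption-laden illustration, not how pyWPS itself reads the file: the [server] section name, the helper names, and the interpretation of "mb" as 1024*1024 bytes are all assumptions.

```python
import configparser

# Sample text using values from this page; the section name is an assumption.
SAMPLE_CFG = """
[server]
maxoperations=8
maxfilesize=600mb
tempPath=/tmp
processesPath=/usr/local/apache2/cgi-bin/processes
outputPath=/usr/local/apache2/htdocs/wpsoutputs
debug=true
logFile=/usr/local/apache2/logs/pywps.log
"""


def parse_size(text):
    """Convert a size such as '600mb' to bytes (binary units assumed)."""
    units = {"kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}
    text = text.strip().lower()
    for suffix, factor in units.items():
        if text.endswith(suffix):
            return int(float(text[:-len(suffix)]) * factor)
    return int(text)


def load_server_options(cfg_text):
    """Return a few typed options from the (assumed) [server] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    server = parser["server"]
    return {
        "maxoperations": server.getint("maxoperations"),
        "maxfilesize_bytes": parse_size(server.get("maxfilesize")),
        "tempPath": server.get("tempPath"),
        "debug": server.getboolean("debug"),
        "logFile": server.get("logFile"),
    }


options = load_server_options(SAMPLE_CFG)
```

Option names are matched case-insensitively by configparser, so tempPath and temppath refer to the same entry.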

Apache configuration

The wps.py script needs to be in an executable directory (cgi-bin) in the Apache structure. For example, the configuration file (httpd.conf) should look like this:

ScriptAlias /cgi-bin/ "/usr/local/apache2/cgi-bin/"
<Directory "/usr/local/apache2/cgi-bin">
     AllowOverride None
     Options +FollowSymLinks -MultiViews +ExecCGI
     Order allow,deny
     Allow from all
</Directory>

Normally wps.py is located in /usr/local/bin, and /cgi-bin contains only a symbolic link to it.

crossvalidation process

pyWPS implements each process as a Python class inside the pyWPS module. When wps.py is called, the process classes are loaded and handled according to the request. The pywps.cfg file contains the path to the processes folder:

processesPath=/usr/local/apache2/cgi-bin/processes

The processes may be located in any folder, as long as the Python interpreter can access them.

To mark the process folder as a Python module, and to tell wps.py which processes are available to load, it is necessary to create an __init__.py file with the following line:

__all__=["dummyprocess","crossvalidationprocess","ultimatequestionprocess"]

This __all__ list tells Python which components to load. It contains the names of the files that hold the processes (without the .py extension). pyWPS follows the logic of one file, one process. In this example pyWPS will load 3 processes, identified as dummyprocess, crossvalidationprocess and ultimatequestionprocess.
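The role of __all__ can be demonstrated without pyWPS. The following sketch builds a throwaway package in a temporary folder and imports every module named in its __all__ list. The package name demo_processes, the IDENTIFIER attribute and the helper functions are hypothetical and exist only for this illustration; how wps.py actually walks the list is not shown here.

```python
import importlib
import os
import sys
import tempfile

PROCESS_NAMES = ["dummyprocess", "crossvalidationprocess",
                 "ultimatequestionprocess"]


def build_demo_package(root, package_name="demo_processes"):
    """Create a package folder with __init__.py and one file per process."""
    pkg_dir = os.path.join(root, package_name)
    os.mkdir(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as fh:
        fh.write("__all__ = %r\n" % (PROCESS_NAMES,))
    for name in PROCESS_NAMES:
        with open(os.path.join(pkg_dir, name + ".py"), "w") as fh:
            fh.write("IDENTIFIER = %r\n" % name)
    return package_name


def load_processes(package_name):
    """Import the package, then every module listed in its __all__."""
    package = importlib.import_module(package_name)
    return [importlib.import_module(package_name + "." + name)
            for name in package.__all__]


root = tempfile.mkdtemp()
sys.path.insert(0, root)  # make the temporary folder importable
package_name = build_demo_package(root)
importlib.invalidate_caches()
modules = load_processes(package_name)
loaded = [module.IDENTIFIER for module in modules]
```

A file that exists in the folder but is missing from __all__ would simply not be loaded by this loop.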

crossvalidation process requirements

The crossvalidation process is a "WPS proxy". A client submits the data, the parameters and the interpolation server to be used; the process then generates WPS requests containing the training and validation datasets to be interpolated. In the end all results are compiled and a WPS response is generated. The crossvalidation process requires the following components:

  • python ElementTree version 1.2.7 download
  • python cElementTree version 1.0.5 download
  • Rpy2 version 2.0.3 download
  • python2.5-dev for Rpy2
  • R-cran version 2.8.1 (Cairo SVG library / lattice 0.17-22 / automap 1.0-0)


It is important that Rpy2 is properly installed and that Python can connect to R through it. If the following command does not produce an error message, then everything is OK.

python -c "import rpy2;import rpy2.robjects as robjects;R=robjects.r"
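The same check can be wrapped in a small function that reports the failure reason instead of printing a traceback. This is a minimal sketch; the function name is hypothetical and the broad exception handler is a deliberate choice, since a missing R installation may surface as something other than ImportError.

```python
def rpy2_available():
    """Return (ok, message) describing whether rpy2 can reach R."""
    try:
        import rpy2.robjects as robjects
        robjects.r  # touch the embedded R handle, as in the one-liner above
    except Exception as exc:  # ImportError, or a failure to locate R
        return False, "%s: %s" % (type(exc).__name__, exc)
    return True, "rpy2 loaded"


ok, message = rpy2_available()
print(message)
```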


The crossvalidation can be tested either through cgi-bin or from the bash command line. As a bash command:

bash$>/usr/bin/wps.py "service=wps&request=getcapabilities"

cgi-bin in the browser:

http://<your server>/cgi-bin/wps.py?service=wps&request=getcapabilities
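When testing from a script, the same key-value request can be assembled with the standard library. This sketch only builds the URL and does not contact any server; the function name and the localhost address are placeholders, and it uses Python 3's urllib.parse.

```python
from urllib.parse import urlencode


def wps_kvp_url(base_url, request, **extra):
    """Build a WPS key-value-pair request URL for the given operation."""
    params = [("service", "wps"), ("request", request)]
    params.extend(sorted(extra.items()))  # any further parameters
    return base_url + "?" + urlencode(params)


# Placeholder host; replace with your own server.
url = wps_kvp_url("http://localhost/cgi-bin/wps.py", "getcapabilities")
print(url)
```

The resulting URL can then be opened in a browser or passed to any HTTP client.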

If you want to send an XML file with a WPS request, you have to use the cat command:

bash$>cat <WPS request file> | /usr/bin/wps.py


If there are any problems, either in the WPS or in Python, they will be reported as WPS exceptions or between XML comment tags.

One common problem is the loading of R libraries by Rpy2 and the crossvalidation script. One quick solution is to set R_LIBS and R_HOME in the script itself. Around line 88 there is the comment:

#os.environ["R_LIBS"]="/usr/lib/apache2/Rlibs"

To uncomment it, it is enough to remove the "#". This command sets the R_LIBS environment variable to the path where the R libraries are located; this path can change from one installation to another. R_HOME can be set with the same command.

Note that this os.environ will only work after the import of Python's os module (import os).
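The ordering matters: the variables should be in place before rpy2 is imported, because R is started when rpy2.robjects is loaded. A minimal sketch follows; the function name is hypothetical, and any R_HOME value you pass is specific to your installation.

```python
import os


def configure_r_environment(r_libs, r_home=None):
    """Set the R environment variables used when rpy2 starts R."""
    os.environ["R_LIBS"] = r_libs
    if r_home is not None:
        os.environ["R_HOME"] = r_home  # installation specific


# Path taken from the commented line in the script; adjust to your system.
configure_r_environment("/usr/lib/apache2/Rlibs")

# Only after this point should the script run:
# import rpy2.robjects as robjects
```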

Depending on the system and configuration, pyWPS may have trouble locating the process directory or the configuration file. This can be solved with a wrapper script: wps.py is called from inside a script that first defines the environment variables.

For example see: 1
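A wrapper of this kind could look like the sketch below. The variable names PYWPS_CFG and PYWPS_PROCESSES are given from memory of the pyWPS documentation and should be verified against your pyWPS version; the function names and the paths in the comments are placeholders.

```python
import os
import subprocess
import sys


def wrapper_environment(cfg_path, processes_path, base_env=None):
    """Copy the environment and add the pyWPS locations (assumed names)."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYWPS_CFG"] = cfg_path              # assumed variable name
    env["PYWPS_PROCESSES"] = processes_path  # assumed variable name
    return env


def run_wps(wps_script, env, args=None):
    """Run wps.py with the prepared environment and return its exit code."""
    command = [wps_script] + list(sys.argv[1:] if args is None else args)
    return subprocess.call(command, env=env)


# Example use inside the wrapper (paths are placeholders):
# env = wrapper_environment("/etc/pywps.cfg",
#                           "/usr/local/apache2/cgi-bin/processes")
# sys.exit(run_wps("/usr/local/bin/wps.py", env))
```

The wrapper, rather than wps.py itself, would then be placed in (or linked from) the cgi-bin directory.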