Notes on set up and use of the WPS

Extra notes on set up and use of the INTAMAP system

These notes complement the generic page "How to set up and use the WPS".

Tomcat

Tomcat is a very popular Java servlet server that is used to run the INTAMAP system.

The INTAMAP system was developed using Tomcat 5.5, but it is also possible to use Tomcat 6. For installation it is advisable to use the following Debian/Ubuntu repository packages:

Java:
openjdk-6-jre
openjdk-6-jdk

Tomcat:
tomcat5.5
tomcat5.5-admin
tomcat5.5-webapps

After installation, Tomcat is probably available on port 8180 or 8080. To check, it is sufficient to access the URL:

http://localhost:8180/sample/hello.jsp

Normally the Java path is not set, which makes Tomcat report errors. It can easily be set as a bash variable:

prompt> export JAVA_HOME=/usr/lib/jvm/java-6-openjdk

Note that JAVA_HOME points to OpenJDK.

To make this path and the working paths of Tomcat permanent, add the following to the file /etc/bash.bashrc (or to whatever file sets the environment variables when a prompt is started):

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk

export CATALINA_HOME=/usr/share/tomcat5.5

export CATALINA_BASE=$CATALINA_HOME

export CATALINA_TMPDIR=$CATALINA_HOME/temp

An incorrect CATALINA_HOME path will generate "resource not found" errors.
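A quick sanity check of these variables can catch a wrong path before Tomcat is started. The following is a minimal sketch and not part of the INTAMAP distribution: the function name check_env_dir is hypothetical, and it only tests that each variable is set and points to an existing directory.

```shell
#!/bin/sh
# check_env_dir NAME: succeed only if the variable called NAME is set
# and its value is an existing directory. (Hypothetical helper.)
check_env_dir() {
    name=$1
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "$name is not set" >&2
        return 1
    fi
    if [ ! -d "$value" ]; then
        echo "$name points to a missing directory: $value" >&2
        return 1
    fi
    echo "$name ok: $value"
}

# Report on every variable Tomcat relies on; do not stop at the first failure.
for var in JAVA_HOME CATALINA_HOME CATALINA_BASE CATALINA_TMPDIR; do
    check_env_dir "$var" || true
done
```

Running it in a fresh login shell shows whether the exports in /etc/bash.bashrc were actually picked up.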


By default the Tomcat installation doesn't provide the admin role, so access to Tomcat's manager is restricted. It is therefore necessary to change the file /etc/tomcat5.5/tomcat-users.xml to include an admin user.

<tomcat-users>
  <role rolename="manager"/>
  <role rolename="tomcat"/>
  <role rolename="admin"/>
  <role rolename="role1"/>
  <user username="tomcat" password="tomcat" roles="tomcat,admin,manager"/>
  <user username="both" password="tomcat" roles="tomcat,role1"/>
  <user username="role1" password="tomcat" roles="role1"/>
</tomcat-users>

In this example the user tomcat (with password tomcat) is administrator and will be able to upload the intamap.war file.

The startup/shutdown of tomcat5.5 is done by /etc/init.d/tomcat5.5. Sometimes this script may cause problems, and it is best to use the scripts $CATALINA_HOME/bin/startup.sh and $CATALINA_HOME/bin/shutdown.sh to start/stop the service. This will be important for the integration with Apache.

Currently Tomcat is working as localhost:8180. If necessary, the configuration can be changed in the file /etc/tomcat5.5/server.xml

 <Host name="localhost" appBase="webapps"
       unpackWARs="true" autoDeploy="true"
       xmlValidation="false" xmlNamespaceAware="false">

Here appBase is the path below CATALINA_HOME that contains the served applications.

The port is defined in the following section:

<Connector port="8180" maxHttpHeaderSize="8192"
               maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
               enableLookups="false" redirectPort="8443" acceptCount="100"
               connectionTimeout="20000" disableUploadTimeout="true" />

INTAMAP system

The first stage of installation is the creation of the intamap.war file, as described in Setting up the WPS. This requires the package:

ant build

If the Java paths are incorrect or point to an incorrect Java version, ant will neither report anything at the prompt nor raise an error when creating intamap.war.

Deploying intamap.war is straightforward when using the Tomcat administrator page.
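Because ant can fail silently, it is worth checking explicitly that the archive was produced. A minimal sketch, assuming the build writes intamap.war into the current directory (the actual output path depends on the build file and is an assumption here); war_built is a hypothetical helper:

```shell
#!/bin/sh
# war_built FILE: succeed only if FILE exists and is not empty.
war_built() {
    if [ -s "$1" ]; then
        echo "found $1"
    else
        echo "missing or empty: $1 (check JAVA_HOME and the Java version)" >&2
        return 1
    fi
}

# Show which Java the build would pick up, then verify the archive.
echo "JAVA_HOME=${JAVA_HOME:-<unset>}"
war_built intamap.war || true
```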

The intamap system will be installed in /usr/share/tomcat5.5/webapps/intamap , assuming that server.xml wasn't changed and tomcat is installed in the default path.

Inside the intamap system there is the folder /config. It holds 2 files that need to be configured for a correct WPS output:

wpsCapabilitiesSkeleton.xml
wps_config.xml

The first one defines the server information that is shown when the user makes a GetCapabilities or DescribeProcess request. Things like Title, Keywords, ServiceProvider, Language etc. are defined in this file.

The second configuration file, wps_config.xml, is important for the correct status response of the WPS. Here it is necessary to define the URL to the WPS's storage directory, and the hostname and port of the WPS server. If these parameters are incorrect, a user will not be able to check on the status of their process.

This is the most important section of wps_config.xml

<Server hostport="8180" includeDataInputsInResponse="false" 
hostname="remwps2.jrc.ec.europa.eu" computationTimeoutMilliSeconds="5" 
cacheCapabilites="true" webappPath="intamap">

The default wps_config.xml uses webappPath="wps". At the very least, this parameter needs to be changed to webappPath="intamap".

For a proper presentation of the server it is advisable to add a welcome web page to the /intamap folder; index.html will be served by default. If necessary, the default page can be changed (for example to index.php) in the configuration file /intamap/WEB-INF/web.xml
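The change can be made by hand or with sed. A sketch, assuming the default file contains the literal text webappPath="wps"; set_webapp_path is a hypothetical helper, and the path in the usage line assumes the default install location given above.

```shell
#!/bin/sh
# set_webapp_path FILE NAME: rewrite webappPath="wps" to webappPath="NAME".
# Writes to a temporary file first so FILE is never left half-written.
set_webapp_path() {
    file=$1
    name=$2
    tmp="$file.tmp.$$"
    sed "s/webappPath=\"wps\"/webappPath=\"$name\"/" "$file" > "$tmp" &&
        mv "$tmp" "$file"
}

# Usage (path assumes the default install location):
# set_webapp_path /usr/share/tomcat5.5/webapps/intamap/config/wps_config.xml intamap
```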

The file /intamap/WEB-INF/classes/intamap.properties contains the configuration properties to access Rserve (see the section below on Rserve). For debugging, it is necessary to uncomment the line rimage and point it to some folder with write permissions, for example:

rimage=~/debug.img

This will save an R session that contains the dataset sent for interpolation, parameters used and error messages.


R compilation

Normally R-base can be downloaded and installed from the repository, but these distribution packages are not compiled with shared-library support, which is necessary for Rserve and also for Rpy2 (crossvalidation services).

Three important aspects of the R compilation:

  • Use of the flag --enable-R-shlib
  • Use of the internal BLAS library: --with-blas and --with-lapack shouldn't point to other libraries (explanation below)
  • Use of Cairo (it may be necessary to install the Cairo development libraries, libcairo2-dev)

The CAIRO library is only used if the server will also contain the crossvalidation service, for the normal INTAMAP and interpolation service it is not necessary.

Since Rserve runs as a daemon, it needs R to be available as a library, hence the --enable-R-shlib compilation flag.

The use of external libraries like ATLAS will accelerate matrix inversion and calculations by 10x (a good tutorial on the benefits of external optimized libraries can be found in [1] ).

So why is it not recommended to install it? The problem is an optimization conflict when using the copula interpolation method. The spatialCopula package has advanced Fortran optimizations that are incompatible with ATLAS. If your WPS service reports an R error like this:

<ns:Exception exceptionCode="JAVA_RootCause">
<ns:ExceptionText>Error in optim(correlation$params, optimfun2, gr = NULL, margin = margin,  : 
  L-BFGS-B needs finite values of 'fn'</ns:ExceptionText></ns:Exception>

Then your R software is using an external ATLAS/BLAS library that conflicts with spatialCopula.

Therefore the compilation command for R should look like this:

./configure  --enable-R-shlib --with-cairo

The final configure report should contain (more or less) the following output:

Additional capabilities:   PNG, JPEG, iconv, MBCS, NLS, cairo
Options enabled:           shared R library, shared BLAS, R profiling, Java
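Whether an existing R installation was built with shared-library support can be checked by looking for libR.so under the R home directory (on Linux, a build configured with --enable-R-shlib installs it in lib/). A sketch; has_shared_r is a hypothetical helper, and the check only runs if R is on the PATH.

```shell
#!/bin/sh
# has_shared_r RHOME: succeed if RHOME/lib/libR.so exists.
has_shared_r() {
    [ -f "$1/lib/libR.so" ]
}

if command -v R >/dev/null 2>&1; then
    rhome=$(R RHOME)   # prints the R home directory
    if has_shared_r "$rhome"; then
        echo "shared R library found in $rhome/lib"
    else
        echo "no libR.so in $rhome/lib: rebuild R with --enable-R-shlib" >&2
    fi
fi
```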

psgp

The psgp R package provides the PSGP (projected sparse Gaussian process) method of interpolation. This method is highly optimized for data sets that contain a high number of observations (>1000).

The optimizations are based on the internal algorithm but also on the use of the BLAS, LAPACK, FFTW, and IT++ libraries.

The first step should be the installation of the following packages:

libblas-dev
libblas3gf
liblapack-dev
liblapack3gf
libfftw3
libfftw-dev

These libraries have CPU-specific compilation flags for optimization (for example the use of 3gf). It is also possible to download the source code and compile them as shared libraries, which should improve the code optimization for the machine being used. For more information see: BLAS LAPACK FFTW


The source code for the IT++ can be obtained here http://itpp.sourceforge.net/

It is important that IT++ is compiled with LAPACK and FFTW support.

./configure --with-lapack="-llapack-3" --with-fft="-lfftw3"

The final configure output should contain the following:

External libs:
- BLAS ........... : yes
* MKL .......... : no
* ACML ......... : no
* ATLAS ........ : no
- CBLAS .......... : yes
- LAPACK ......... : yes
- FFT ............ : yes
* MKL .......... : no
* ACML ......... : no
* FFTW ......... : yes

Compiler/linker flags/libs/defs:
- CXX ............ : g++
- F77 ............ : g77
- CXXFLAGS ....... : -DASSERT_LEVEL=1 -O3 -fno-exceptions -pipe
- CXXFLAGS_DEBUG . :
- CPPFLAGS ....... :
- LDFLAGS ........ :
- LIBS ........... : -lfftw3 -llapack -lblas

It is better to use a 3.* version of GCC to compile IT++, because the Fortran support and C compilation in GCC 4.* isn't very good.

Instead of compiling, it is possible to download the IT++ packages from the "lenny" and "sid" Debian repositories. The openSUSE science repository also contains RPM versions.

libitpp-dev
libitpp6gf

The psgp package installation will use the above libraries for its compilation. Even if the compilation of psgp is successful, some functions or functionality may still be missing at runtime.

Apache

Tomcat runs on port 8080 or 8180, which can be a problem in systems that have tight security and port control. With Apache it is possible to accept WPS requests on port 80 or 443 and redirect them to Tomcat and the INTAMAP system (and vice versa).

The Apache server should be compiled with the following flags:

./configure --prefix=/usr/local/apache2 --with-mpm=prefork --disable-charset-lite --disable-include \
    --enable-env --enable-setenvif --disable-status --disable-autoindex --disable-asis --disable-negotiation \
    --disable-userdir --enable-alias --with-deflate --enable-log-forensic --enable-logio --enable-unique-id

Flags like --enable-unique-id or --enable-log-forensic are necessary for the mod_security module and for a more verbose logging. The flags --enable-env and --enable-setenvif pose a security threat but they are necessary for the execution of CGI-BIN scripts (for example the python WPS implementation and the crossvalidation service)

Mod Security

The mod_security module is a web application firewall. Basically it is a big regular expression engine that controls the inputs and outputs of Apache and drops/logs anything that may be a security threat.

The mod_security source code and documentation can be found here

http://www.modsecurity.org

Aside from the compilation examples in the mod_security documentation, it is possible to call apxs directly from the folder that contains the source code:

/usr/local/apache/bin/apxs -cia mod_security.c

This will raise an error reporting that the apr library couldn't be found, but running make && make install will compile and install the module.

Some tutorials on the installation and use of mod_security can be found here 1 2

mod_security defines limits for the body size of HTTP requests and for the size of responses. These limits have to be increased to fit the needs of the WPS. Below is an example of a mod_security section in a httpd.conf file:

LoadFile /usr/lib/libxml2.so
LoadModule security2_module modules/mod_security2.so
<IfModule security2_module>
    # Basic configuration options 
    SecRuleEngine On 
    SecRequestBodyAccess On
    #To check the response for some issued command 
    SecResponseBodyAccess On 

     # Handling of file uploads 
     # TODO Choose a folder private to Apache. 
     # SecUploadDir /opt/apache-frontend/tmp/ 
     SecUploadKeepFiles Off 

     # Debug log 
     SecDebugLog /var/log/apache2/modsec_debug.log 
     SecDebugLogLevel 0 

      # Serial audit log 
     SecAuditEngine RelevantOnly 
     SecAuditLogRelevantStatus ^5 
     SecAuditLogParts ABIFHZ 
     SecAuditLogType Serial 
     SecAuditLog /var/log/apache2/modsec_audit.log 

     # Maximum request body size we will 
     # accept for buffering 
     # 128 megas
     SecRequestBodyLimit 134217728 
     
     #Max response size
     SecResponseBodyLimit 134217728 

     # Set server signature fake signature
     SecServerSignature "Microsoft-IIS/3.0"
</IfModule>

The majority of the security rules are defined in external files. For example, modsecurity_crs_35_bad_robots.conf holds the rules for the User-Agents that are accepted/rejected. WPS requests are normally submitted with Wget, which is one of the rejected User-Agents (at least in version 2.5.9).

This rejection is defined in rule id 990011 (around line 29 in the file modsecurity_crs_35_bad_robots.conf):

SecRule REQUEST_HEADERS:User-Agent "(?:\b(?:(?:indy librar|snoop)y|microsoft url control|lynx)\b|mozilla\/2\.0 \(compatible; newt activex; win32\)|w(?:3mirror|get)|download demon|l(?:ibwww|wp)|p(?:avuk|erl)|big brother|autohttp|netants|eCatch|curl)" \
    "chain,phase:2,t:none,t:lowercase,log,auditlog,msg:'Request Indicates an automated program explored the site',id:'990011',tag:'AUTOMATION/MISC',severity:'5'"

To allow wget it is necessary to change the w(?:3mirror|get) into w(?:3mirror)
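The same edit can be scripted. A sketch, assuming the rule file contains the literal text w(?:3mirror|get) as in the rule above; allow_wget is a hypothetical helper, and a backup copy is kept because the pattern may differ in other versions of the rule set.

```shell
#!/bin/sh
# allow_wget FILE: drop "get" from the w(?:3mirror|get) alternative.
# In sed's basic regular expressions ( ) ? and | are literal characters,
# so the pattern below matches the rule text exactly as written.
allow_wget() {
    file=$1
    cp "$file" "$file.bak" &&
        sed 's/w(?:3mirror|get)/w(?:3mirror)/' "$file.bak" > "$file"
}

# Usage: allow_wget modsecurity_crs_35_bad_robots.conf
```

Apache has to be restarted afterwards for the modified rule to take effect.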

mod_jk

Mod_jk is how Apache connects to Tomcat. The connection is made through an open port on the Tomcat system. The port number is normally 8009, and the protocol is identified as AJP13.

For tutorials on the installation and use of mod_jk please read the following documentation/sites 1 2

The INTAMAP system is a normal servlet and uses the normal mod_jk configuration.

An example of the workers.properties file:

worker.list=Worker
worker.Worker.port=8009
worker.Worker.host=localhost
worker.Worker.type=ajp13

Using localhost should be the default option, since the AJP13 port should only be accessible locally and any external traffic should be blocked by iptables.

The use of INTAMAP system will be set on the httpd.conf by defining the mounting point (the tomcat folder that will be served by Apache):

<IfModule mod_jk.c>
    :
    :
    JkWorkersFile /usr/local/apache2/conf/workers.properties
    JkLogFile /usr/local/apache2/log/mod_jk.log
    JkLogLevel error
    :
    :
    JkMount /intamap/* Worker
</IfModule>

Note that the mount points to the Worker defined in the workers.properties file. A single workers file may contain many workers with different configurations.

If everything is ok, a request like this should work:

http://localhost/intamap/WebProcessingService

The server name is defined in the httpd.conf file. If the server name is changed, it may be necessary to recompile mod_jk to avoid "strange problems" concerning paths and URLs.

At this stage, Tomcat can still be accessed directly by indicating its port in the URL:

http://localhost:8180/intamap/WebProcessingService

To close this access, it is sufficient to edit the server.xml file (/etc/tomcat5.5/server.xml) and put the connector definition for the server port inside XML comment tags:

<!--
<Connector port="8180" maxHttpHeaderSize="8192"
               maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
               enableLookups="false" redirectPort="8443" acceptCount="100"
               connectionTimeout="20000" disableUploadTimeout="true" />
-->

This will deactivate port 8180, hiding Tomcat behind Apache.

Even with port 8180 closed, Tomcat will leave a small cat tail behind: it accepts shutdown commands through port 8005. It is better to use iptables to block external access to this port.

The INTAMAP configuration should also be changed to indicate the new port of access to the INTAMAP system. The hostport attribute in wps_config.xml should be changed to 80:

<Server hostport="80" includeDataInputsInResponse="false" 
hostname="remwps2.jrc.ec.europa.eu" computationTimeoutMilliSeconds="5" 
cacheCapabilites="true" webappPath="intamap">

The port number should always be present, otherwise there is the risk that a status response could look like this:

http://remwps2.jrc.ec.eu:/intamap/RetriveResultServelet?id=1238602206897

The stray ":" will be ignored by the majority of internet browsers, but if the URL is used in a programming context, such as cURL or urllib, it is very likely that these libraries will consider it an invalid URL.
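A client script can guard against such a URL before using it. A minimal sketch in shell; has_empty_port is a hypothetical helper that only checks whether the host part ends in a colon with no port number after it.

```shell
#!/bin/sh
# has_empty_port URL: succeed if the authority part ends in ":" with no
# port number, as in http://host:/path.
has_empty_port() {
    rest=${1#*://}        # strip the scheme
    authority=${rest%%/*} # keep everything before the first "/"
    case $authority in
        *:) return 0 ;;
        *)  return 1 ;;
    esac
}

url='http://remwps2.jrc.ec.eu:/intamap/RetriveResultServelet?id=1238602206897'
if has_empty_port "$url"; then
    echo "status URL has an empty port: $url" >&2
fi
```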

Iptables

iptables can be tricky. In the case of the INTAMAP system it is mainly necessary to close the shutdown port of Tomcat to the outside. The server must also be able to make outgoing connections if it hosts the crossvalidation service.

So the iptables rules need to:

  • Accept everything originating from localhost or the server IP
  • Accept everything going to port 80
  • Drop external traffic going to port 8009 (ajp13) and port 8005 (Tomcat shutdown)
  • Close external traffic going to port 6311 (Rserve)

Maybe:

  • Use DNS services
  • Accept input from traffic originated in the server (crossvalidation server)

An iptables script could look like this:

#Flushing table
iptables -F
iptables -X
iptables -Z
iptables -t nat -F

#Default policies
iptables -P INPUT ACCEPT
iptables -P OUTPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -t nat -P PREROUTING ACCEPT
iptables -t nat -P POSTROUTING ACCEPT

#Accept everything from localhost
iptables -A INPUT -s 127.0.0.1 -i lo -j ACCEPT
iptables -A INPUT -s <SERVER IP> -i eth0 -j ACCEPT
#change eth0 to whatever is needed and indicate the server IP

#Accept everything that starts from the server(udp is necessary for the DNS requests)
iptables -A INPUT -p tcp -m state --state ESTABLISHED -j ACCEPT
iptables -A INPUT -p udp -m state --state ESTABLISHED -j ACCEPT

#allow access to HTTP from outside
iptables -A INPUT -s 0.0.0.0/0 -p tcp --dport 80 -j ACCEPT

#Close everything from tomcat
#The reject with tcp-reset will indicate a non existing service
iptables -A INPUT  -s 0.0.0.0/0 -p tcp --dport 8009 -j REJECT --reject-with tcp-reset
iptables -A INPUT  -s 0.0.0.0/0 -p tcp --dport 8005 -j REJECT --reject-with tcp-reset

#Close Rserve
iptables -A INPUT  -s 0.0.0.0/0 -p tcp --dport 6311 -j REJECT --reject-with tcp-reset

#Close everything else
iptables -A INPUT -s 0.0.0.0/0 -p tcp --dport 1:65535 -j DROP
iptables -A INPUT -s 0.0.0.0/0 -p udp --dport 1:65535 -j DROP

This is an example of an iptables script. Different users require different services, for example SSH or SVN, and will need additional rules for them.