Notes on set up and use wps
Extra notes on set up and use of the INTAMAP system
These notes complement the generic page "How to set up and use the WPS".
Tomcat
Tomcat is a very popular Java servlet server that is used to run the INTAMAP system.
The INTAMAP system was developed using Tomcat 5.5, but it is also possible to use Tomcat 6. For installation it is advisable to use the following Debian/Ubuntu repository packages:
Java: openjdk-6-jre openjdk-6-jdk
Tomcat: tomcat5.5 tomcat5.5-admin tomcat5.5-webapps
After installation, Tomcat is usually available on port 8180 or 8080. To check, it is sufficient to access the URL:
http://localhost:8180/sample/hello.jsp
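The same check can be scripted. The following is a minimal sketch, assuming curl is installed and the sample webapp from tomcat5.5-webapps is deployed; the function names tomcat_test_url and find_tomcat_port are hypothetical helpers, not part of Tomcat or INTAMAP.

```shell
# Build the test URL for a given port (host and page taken from the text above).
tomcat_test_url() {
    printf 'http://localhost:%s/sample/hello.jsp\n' "$1"
}

# Try the two usual ports and print the first one that answers with success.
# Returns 1 when neither port serves the sample page.
find_tomcat_port() {
    for ftp_port in 8180 8080; do
        if curl --silent --fail --max-time 5 --output /dev/null \
                "$(tomcat_test_url "$ftp_port")"; then
            echo "$ftp_port"
            return 0
        fi
    done
    return 1
}
```

Running `find_tomcat_port || echo "Tomcat not reachable"` after sourcing the snippet reports the port in use, or a short message when Tomcat does not answer on either port.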
Normally the Java path is not set, which makes Tomcat report errors. It can easily be set as a bash variable:
prompt> export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
Note that the JAVA_HOME points to the openjdk.
To make this path and the working paths of Tomcat permanent, add the following to the file /etc/bash.bashrc (or to whatever file sets the environment variables when a prompt is started):
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk
export CATALINA_HOME=/usr/share/tomcat5.5
export CATALINA_BASE=$CATALINA_HOME
export CATALINA_TMPDIR=$CATALINA_HOME/temp
An incorrect CATALINA_HOME path will generate "resource not found" errors.
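Before starting Tomcat it can help to confirm that each of these variables points to a real directory. The following is a minimal sketch; check_dir_var is a hypothetical helper name, and the check only tests that the directory exists, not that it holds a valid Java or Tomcat installation.

```shell
# Return 0 when the variable whose NAME is given in $1 is set and
# points to an existing directory; print a diagnostic otherwise.
check_dir_var() {
    # Indirect lookup of the variable named in $1 (portable sh form).
    eval "cdv_value=\${$1:-}"
    if [ -z "$cdv_value" ]; then
        echo "$1 is not set" >&2
        return 1
    fi
    if [ ! -d "$cdv_value" ]; then
        echo "$1 points to a missing directory: $cdv_value" >&2
        return 1
    fi
    echo "$1=$cdv_value"
}
```

A typical use is a loop such as `for v in JAVA_HOME CATALINA_HOME CATALINA_BASE CATALINA_TMPDIR; do check_dir_var "$v"; done`, run in a fresh prompt so that the values come from the startup file.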
By default the Tomcat installation does not provide the admin role, so access to Tomcat's manager is restricted. It is therefore necessary to change the file /etc/tomcat5.5/tomcat-users.xml to include an admin user.
<tomcat-users>
  <role rolename="manager"/>
  <role rolename="tomcat"/>
  <role rolename="admin"/>
  <role rolename="role1"/>
  <user username="tomcat" password="tomcat" roles="tomcat,admin,manager"/>
  <user username="both" password="tomcat" roles="tomcat,role1"/>
  <user username="role1" password="tomcat" roles="role1"/>
</tomcat-users>
In this example the user tomcat (with password tomcat) is administrator and will be able to upload the intamap.war file.
The startup/shutdown of Tomcat 5.5 is done with /etc/init.d/tomcat5.5. Sometimes this script causes problems, and it is best to use the scripts $CATALINA_HOME/bin/startup.sh and $CATALINA_HOME/bin/shutdown.sh to start/stop the service. This will be important for the integration with Apache.
Currently Tomcat is working as localhost:8180. If necessary, the configuration can be changed in the file /etc/tomcat5.5/server.xml
<Host name="localhost" appBase="webapps"
unpackWARs="true" autoDeploy="true"
xmlValidation="false" xmlNamespaceAware="false">
Where appBase is the path below CATALINA_HOME that will contain the served applications.
The port is defined in the following section:
<Connector port="8180" maxHttpHeaderSize="8192"
maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
enableLookups="false" redirectPort="8443" acceptCount="100"
connectionTimeout="20000" disableUploadTimeout="true" />
INTAMAP system
The first stage of installation is the creation of the intamap.war file, as described in Setting up the WPS. This requires the package:
ant build
If the Java paths are incorrect or point to an incorrect Java version, ant will not report anything to the prompt, or even raise an error, when creating the intamap.war file.
Deploying the intamap.war file is straightforward when using the Tomcat administrator page.
The intamap system will be installed in /usr/share/tomcat5.5/webapps/intamap , assuming that server.xml wasn't changed and tomcat is installed in the default path.
Inside the intamap system there is the folder /config, which holds 2 files that need to be configured for a correct WPS output:
wpsCapabilitiesSkeleton.xml
wps_config.xml
The first one defines the server information that will be shown when the user makes a GetCapabilities or DescribeProcess request. Things like Title, Keywords, ServiceProvider, Language etc. are defined in this file.
The second configuration file, wps_config.xml, is important for the correct status response of the WPS. Here it is necessary to define the URL to the WPS's storage directory, and the hostname and port of the WPS server. If these parameters are incorrect, users will not be able to check on the status of their process.
This is the most important section of wps_config.xml:
<Server hostport="8180" includeDataInputsInResponse="false" hostname="remwps2.jrc.ec.europa.eu" computationTimeoutMilliSeconds="5" cacheCapabilites="true" webappPath="intamap">
The default wps_config.xml uses webappPath="wps". At the very least, this parameter needs to be changed to webappPath="intamap".
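This change can also be made from the command line. The following is a minimal sketch using sed; set_webapp_path is a hypothetical helper name, and it assumes the new value is a simple name without "/" or "&" characters, since those would need escaping in the sed expression.

```shell
# Rewrite the webappPath attribute in a wps_config.xml file.
# $1: path to wps_config.xml, $2: new webappPath value (e.g. intamap)
set_webapp_path() {
    # Write to a temporary file first so that the original is only
    # replaced when sed succeeds.
    sed "s/webappPath=\"[^\"]*\"/webappPath=\"$2\"/" "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}
```

With the default installation path mentioned above, the call would be `set_webapp_path /usr/share/tomcat5.5/webapps/intamap/config/wps_config.xml intamap`. Keeping a backup copy of the file beforehand is advisable.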
For a proper presentation of the server it is advisable to add a welcome web page to the /intamap folder; index.html will be served by default. If necessary, the default presentation web page can be changed (for example to index.php) in the configuration file /intamap/WEB-INF/web.xml
The file /intamap/WEB-INF/classes/intamap.properties contains the configuration properties to access Rserve (see the section below on Rserve). For debugging it is necessary to uncomment the rimage line and point it to some folder with write permissions, for example:
rimage=~/debug.img
This will save an R session that contains the dataset sent for interpolation, parameters used and error messages.
R compilation
Normally R-base can be downloaded and installed from the repository, BUT these distribution packages are not compiled with shared-lib support, which is necessary for Rserve and also for Rpy2 (crossvalidation services).
Three important aspects of the R compilation:
- Use of the flag --enable-R-shlib
- Use of the internal BLAS library (--with-blas and --with-lapack should not point to other libraries; explanation below)
- Use of CAIRO (it may be necessary to download the CAIRO graphical development libraries (libcairo2-dev))
The CAIRO library is only used if the server will also contain the crossvalidation service, for the normal INTAMAP and interpolation service it is not necessary.
Since Rserve runs as a daemon, it needs R to run as a library, hence the --enable-R-shlib compilation flag.
The use of external libraries like ATLAS can accelerate matrix inversion and related calculations by roughly 10x (a good tutorial on the benefits of external optimized libraries can be found in [1] ).
So why is it not recommended to install it? The problem is an optimization conflict when using the copula interpolation method. The spatialCopula package has advanced Fortran optimizations that are incompatible with ATLAS. If your WPS service reports an R error like this:
<ns:Exception exceptionCode="JAVA_RootCause"> <ns:ExceptionText>Error in optim(correlation$params, optimfun2, gr = NULL, margin = margin, : L-BFGS-B needs finite values of 'fn'</ns:ExceptionText></ns:Exception>
Then your R software is using an external ATLAS/BLAS library that conflicts with spatialCopula.
Therefore the configure command for R should look like this:
./configure --enable-R-shlib --with-cairo
At the end of its report, configure should print (more or less) the following:
Additional capabilities: PNG, JPEG, iconv, MBCS, NLS, cairo
Options enabled: shared R library, shared BLAS, R profiling, Java
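After make and make install, the presence of the shared library can be confirmed on disk. The following is a minimal sketch, assuming a Linux build where the shared library is installed as lib/libR.so under the R home directory; r_has_shlib is a hypothetical helper name.

```shell
# Return 0 when the R installation rooted at $1 contains the shared
# library produced by --enable-R-shlib.
r_has_shlib() {
    [ -f "$1/lib/libR.so" ]
}
```

The R home directory is printed by `R RHOME`, so a typical call is `r_has_shlib "$(R RHOME)" && echo "shared R library found"`. If the file is missing, Rserve will not be able to link against R.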
psgp
The psgp R package provides the PSGP (projected sparse Gaussian process) method of interpolation. This method is highly optimized for data sets that contain a high number of observations (>1000).
The optimizations are based on the internal algorithm but also on the use of the BLAS, LAPACK, FFTW, and IT++ libraries.
The first step should be the installation of the following packages:
libblas-dev libblas3gf liblapack-dev liblapack3gf libfftw3 libfftw-dev
These libraries have CPU-specific compilation flags for optimization (for example the use of 3gf). It is also possible to download the source code and compile them as shared libraries, which should increase the code optimization for the machine being used. For more information see: BLAS LAPACK FFTW
The source code for the IT++ can be obtained here
http://itpp.sourceforge.net/
It is important that IT++ is compiled with LAPACK and FFTW support.
./configure --with-lapack="-llapack-3" --with-fft="-lfftw3"
The final configure output should contain the following:
External libs:
  - BLAS ........... : yes
    * MKL .......... : no
    * ACML ......... : no
    * ATLAS ........ : no
  - CBLAS .......... : yes
  - LAPACK ......... : yes
  - FFT ............ : yes
    * MKL .......... : no
    * ACML ......... : no
    * FFTW ......... : yes

Compiler/linker flags/libs/defs:
  - CXX ............ : g++
  - F77 ............ : g77
  - CXXFLAGS ....... : -DASSERT_LEVEL=1 -O3 -fno-exceptions -pipe
  - CXXFLAGS_DEBUG . :
  - CPPFLAGS ....... :
  - LDFLAGS ........ :
  - LIBS ........... : -lfftw3 -llapack -lblas
It is better to use a 3.* version of GCC to compile IT++, because the Fortran support and C compilation in gcc 4.* is not very good.
Instead of compiling, it is possible to download the IT++ packages from the Debian "lenny" and "sid" repositories. The openSUSE science repository also contains RPM versions.
libitpp-dev libitpp6gf
The psgp package installation will use the above libraries for its compilation. Even if the compilation of psgp is successful, some functions or functionality may still be missing at runtime.
Apache
Tomcat runs on port 8080 or 8180, which can be a problem in systems that have tight security and port control. When using Apache it is possible to accept WPS requests on port 80 or 443 and redirect them to Tomcat and the INTAMAP system (and vice versa).
The Apache server should be compiled with the following flags:
./configure --prefix=/usr/local/apache2 --with-mpm=prefork --disable-charset-lite --disable-include --enable-env --enable-setenvif --disable-status --disable-autoindex --disable-asis --disable-negotiation --disable-userdir --enable-alias --with-deflate --enable-log-forensic --enable-logio --enable-unique-id
Flags like --enable-unique-id or --enable-log-forensic are necessary for the mod_security module and for more verbose logging. The flags --enable-env and --enable-setenvif pose a security threat, but they are necessary for the execution of CGI-BIN scripts (for example the Python WPS implementation and the crossvalidation service).
Mod Security
The mod_security module is a web application firewall. Basically, it is a big regular expression engine that controls the inputs and outputs of Apache and drops/logs anything that may be a security threat.
The mod_security source code and documentation can be found here
Aside from the compilation examples in the mod_security documentation, it is possible to call apxs directly from the folder that contains the source code:
/usr/local/apache/bin/apxs -cia mod_security.c
This will raise an error reporting that the apr library could not be found, but running make && make install will compile and install the module.
Some tutorials on the installation and use of mod_security can be found here: 1 2
mod_security defines limits for the size of HTTP request bodies and of responses. These have to be changed to fit the needs of the WPS, which in practice just means increasing the limits. Below is an example of a mod_security section in an httpd.conf file:
LoadFile /usr/lib/libxml2.so
LoadModule security2_module modules/mod_security2.so
<IfModule security2_module>
# Basic configuration options
SecRuleEngine On
SecRequestBodyAccess On
#To check the response for some issued command
SecResponseBodyAccess On
# Handling of file uploads
# TODO Choose a folder private to Apache.
# SecUploadDir /opt/apache-frontend/tmp/
SecUploadKeepFiles Off
# Debug log
SecDebugLog /var/log/apache2/modsec_debug.log
SecDebugLogLevel 0
# Serial audit log
SecAuditEngine RelevantOnly
SecAuditLogRelevantStatus ^5
SecAuditLogParts ABIFHZ
SecAuditLogType Serial
SecAuditLog /var/log/apache2/modsec_audit.log
# Maximum request body size we will
# accept for buffering
# 128 megas
SecRequestBodyLimit 134217728
#Max response size
SecResponseBodyLimit 134217728
# Set server signature fake signature
SecServerSignature "Microsoft-IIS/3.0"
</IfModule>
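SecRequestBodyLimit and SecResponseBodyLimit are given in bytes, so the "128 megas" in the comment above corresponds to 128 x 1024 x 1024. When choosing a different limit, the value can be computed with shell arithmetic; mb_to_bytes is a hypothetical helper name used only for this illustration.

```shell
# Convert a size in megabytes (1 MB = 1024 * 1024 bytes) to bytes,
# the unit expected by SecRequestBodyLimit and SecResponseBodyLimit.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}
```

For example, `mb_to_bytes 128` prints 134217728, the value used in the configuration above.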
The majority of the security rules are defined in external files. For example, modsecurity_crs_35_bad_robots.conf holds the rules for the User-agents that are accepted/rejected. WPS requests are normally submitted by Wget, which is one of the rejected User-agents (at least in version 2.5.9).
This rejection is defined in rule id 990011 (around line 29 in the file modsecurity_crs_35_bad_robots.conf):
SecRule REQUEST_HEADERS:User-Agent "(?:\b(?:(?:indy librar|snoop)y|microsoft url control|lynx)\b|mozilla\/2\.0 \(compatible; newt activex; win32\)|w(?:3mirror|get)|download demon|l(?:ibwww|wp)|p(?:avuk|erl)|big brother|autohttp|netants|eCatch|curl)" \
"chain,phase:2,t:none,t:lowercase,log,auditlog,msg:'Request Indicates an automated program explored the site',id:'990011',tag:'AUTOMATION/MISC',severity:'5'"
To allow wget it is necessary to change the w(?:3mirror|get) into w(?:3mirror)
mod_jk
Mod_jk is how Apache connects to Tomcat. The connection is made through an open port in the Tomcat system. The port number is normally 8009 and the protocol is identified as AJP13.
For tutorials on the installation and use of mod_jk please read the following documentation/sites 1 2
The INTAMAP system is a normal servlet and it uses the normal mod_jk configuration.
An example of the workers.properties file:
worker.list=Worker
worker.Worker.port=8009
worker.Worker.host=localhost
worker.Worker.type=ajp13
The use of localhost should be the default option, since the AJP13 port should only be accessible locally and any external traffic should be blocked by iptables.
The use of INTAMAP system will be set on the httpd.conf by defining the mounting point (the tomcat folder that will be served by Apache):
<IfModule mod_jk.c>
:
:
JkWorkersFile /usr/local/apache2/conf/workers.properties
JkLogFile /usr/local/apache2/log/mod_jk.log
JkLogLevel error
:
:
JkMount /intamap/ Worker
</IfModule>
Note that the mount points to the Worker defined in the workers.properties file. A single workers file may contain many workers with different configurations.
If everything is ok, a request like this should work:
http://localhost/intamap/WebProcessingService
The server name is defined in the httpd.conf file. If the server name is changed it may be necessary to recompile mod_jk to avoid "strange problems" concerning paths and URLs.
At this stage, Tomcat can still be accessed directly by indicating its port in the URL:
http://localhost:8180/intamap/WebProcessingService
To close this access, it is sufficient to edit the server.xml file (/etc/tomcat5.5/server.xml) and put the connector definition for the server port inside XML comment tags:
<!--
<Connector port="8180" maxHttpHeaderSize="8192"
maxThreads="150" minSpareThreads="25" maxSpareThreads="75"
enableLookups="false" redirectPort="8443" acceptCount="100"
connectionTimeout="20000" disableUploadTimeout="true" />
-->
This will deactivate port 8180, hiding Tomcat behind Apache.
Even with port 8180 closed, Tomcat leaves a small cat tail behind: it accepts shutdown commands through port 8005. It is better to use iptables to block external access to this port.
The INTAMAP configuration should also be changed to indicate the new access port of the INTAMAP system. The hostport attribute in wps_config.xml should be changed to 80:
<Server hostport="80" includeDataInputsInResponse="false" hostname="remwps2.jrc.ec.europa.eu" computationTimeoutMilliSeconds="5" cacheCapabilites="true" webappPath="intamap">
The port number should always be present, otherwise there is the risk that a status response could look like this:
http://remwps2.jrc.ec.eu:/intamap/RetriveResultServelet?id=1238602206897
The dangling ":" will be ignored by the majority of internet browsers, but if the URL is used in a programming context such as cURL or urllib, it is very likely that these libraries will consider it an invalid URL structure.
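A status URL returned by the service can be screened for this problem before it is passed on to other tools. The following is a minimal sketch using grep with an extended regular expression; has_empty_port is a hypothetical helper name, and the pattern assumes a plain hostname or IPv4 address (no user info, no IPv6 literal).

```shell
# Return 0 when the URL in $1 has a ":" after the host that is not
# followed by a port number, as in http://host:/path
has_empty_port() {
    printf '%s\n' "$1" |
        grep -Eq '^[A-Za-z][A-Za-z0-9+.-]*://[^/:]+:(/|$)'
}
```

For instance, `has_empty_port "$status_url" && echo "hostport is missing in wps_config.xml"` gives an early warning, where status_url is whatever variable holds the URL taken from the WPS response.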
Iptables
Iptables can be tricky. In the case of the INTAMAP system it is mainly necessary to close the shutdown port of Tomcat to the outside. The server also needs to be able to make outgoing connections if it hosts the crossvalidation service.
So the iptables configuration needs to:
- Accept everything originating from localhost or the server IP
- Accept everything going to port 80
- Drop external traffic going to port 8009 (ajp13) and port 8005 (tomcat shutdown)
- Close external traffic going to port 6311 (Rserve)
Maybe:
- Use DNS services
- Accept input from traffic originated in the server (crossvalidation server)
An iptables script could look like this:
#Flushing table
iptables -F
iptables -X
iptables -Z
iptables -t nat -F
#Default policies
iptables -P INPUT ACCEPT
iptables -P OUTPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -t nat -P PREROUTING ACCEPT
iptables -t nat -P POSTROUTING ACCEPT
#Accept everything from localhost
iptables -A INPUT -s 127.0.0.1 -i lo -j ACCEPT
#Change eth0 to whatever is needed and indicate the server IP
iptables -A INPUT -s <SERVER IP> -i eth0 -j ACCEPT
#Accept everything that starts from the server (udp is necessary for the DNS requests)
iptables -A INPUT -p tcp -m state --state ESTABLISHED -j ACCEPT
iptables -A INPUT -p udp -m state --state ESTABLISHED -j ACCEPT
#Allow access to HTTP from outside
iptables -A INPUT -s 0.0.0.0/0 -p tcp --dport 80 -j ACCEPT
#Close everything from tomcat
#The reject with tcp-reset will indicate a non existing service
iptables -A INPUT -s 0.0.0.0/0 -p tcp --dport 8009 -j REJECT --reject-with tcp-reset
iptables -A INPUT -s 0.0.0.0/0 -p tcp --dport 8005 -j REJECT --reject-with tcp-reset
#Close Rserve
iptables -A INPUT -s 0.0.0.0/0 -p tcp --dport 6311 -j REJECT --reject-with tcp-reset
#Close everything else
iptables -A INPUT -s 0.0.0.0/0 -p tcp --dport 1:65535 -j DROP
iptables -A INPUT -s 0.0.0.0/0 -p udp --dport 1:65535 -j DROP
This is only an example of an iptables script. Different users require different services, for example SSH or SVN, and the rules have to be extended accordingly.
