Installation Guide: Intel(R) Cluster Checker sensor plugin for Sensor
System for High Performance Computing (SENSYS)

Intel(R) Cluster Checker includes a sensor plugin for Sensor System for High
Performance Computing (SENSYS). The sensor plugin file is mca_sensor_clck.so,
found in the sensys directory within the Intel(R) Cluster Checker installation.
The plugin replaces the 'daemon' capability that was present in Intel(R) Cluster
Checker 3.x. The sensor plugin is made available as a technical preview.

If asynchronous data collection is not needed, this guide can be safely ignored.

SENSYS, the associated PostgreSQL database, and PostgreSQL ODBC drivers must be
installed and set up prior to using the Intel(R) Cluster Checker sensor plugin.
If SENSYS is already installed and set up, skip to the 'Intel(R) Cluster Checker
sensor plugin setup' step. Otherwise, start with the 'PostgreSQL installation
and setup' step.

Contents.
1. PostgreSQL installation and setup
2. SENSYS installation
3. SENSYS database setup
4. SENSYS daemon setup
5. Intel(R) Cluster Checker sensor plugin setup
6. Data aggregation
6.1. Single node providers
6.2. Cluster providers
7. Intel(R) Cluster Checker data analysis


---------------------------------
1. PostgreSQL installation and setup
---------------------------------
Please refer to https://www.postgresql.org/download/ for information on how to
install and set up the PostgreSQL database.

The PostgreSQL server, client and ODBC packages are required for data
aggregation through SENSYS daemons and data analysis through Intel(R) Cluster
Checker. The minimum version of PostgreSQL required is 9.3.
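
The minimum version requirement can be checked from the command line. The
following is a minimal sketch, not part of either product: the function names
are hypothetical, it relies on 'sort -V' from GNU coreutils, and it inspects
only the version of the psql client found in PATH, which may differ from the
version of the database server.

```shell
#!/bin/sh
# Return success if version $1 is greater than or equal to version $2.
# Relies on 'sort -V' (version sort) from GNU coreutils.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Print the version field from output such as "psql (PostgreSQL) 9.6.24".
psql_version() {
    psql --version | awk '{ print $3 }'
}

# Report on the installed client only if psql is available.
if command -v psql >/dev/null 2>&1; then
    if version_at_least "$(psql_version)" 9.3; then
        echo "PostgreSQL client version is sufficient"
    else
        echo "PostgreSQL client is older than 9.3" >&2
    fi
fi
```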

The recommended method to install the required packages is through the package
manager available with the system. Please refer to the PostgreSQL documentation
for instructions on adding the appropriate repositories to install from.

---------------------------
2. SENSYS installation
---------------------------
The sources for SENSYS can be obtained from
https://github.com/intel-ctrlsys/sensys. SENSYS uses autotools, and the steps
for installation are similar to the installation steps for other products that
use autotools. The dependencies for building SENSYS can be found at
https://intel-ctrlsys.github.io/sensys/2-Build-and-Installation-Guide/2.1-Sensys-Build-and-Installation/2.1.06-Build-From-GitHub-Repo.html.


It is recommended that the SENSYS installation directory be a location that is
shared across the cluster.

Steps -

# ./autogen.pl
# ./configure --prefix <sensys_install_directory>\
    --with-postgres=<postgres_install_directory>
# make
# make install

Note that in order to use the Intel(R) Cluster Checker plugin, SENSYS must be
built with shared module support. Shared module support is enabled by default,
and can be requested explicitly by adding the "--enable-shared=yes
--enable-static=no" flags to configure.


Additionally, the "--with-postgres" option is required in order to write to a
database, which is required for analysis with Intel(R) Cluster Checker. This
option should point to the directory where PostgreSQL was installed in step 1,
such as "--with-postgres=/usr/pgsql-9.6".
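
The build steps and the notes above can be combined into one script. The
following is a sketch under stated assumptions: 'run' and 'build_sensys' are
hypothetical helper names, and the DRY_RUN variable is an illustration that
prints each command instead of executing it, so the sequence can be reviewed
before building.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build and install SENSYS from the current source directory.
#   $1 = <sensys_install_directory>
#   $2 = <postgres_install_directory>
build_sensys() {
    run ./autogen.pl &&
    run ./configure --prefix="$1" --with-postgres="$2" \
        --enable-shared=yes --enable-static=no &&
    run make &&
    run make install
}
```

For example, running 'DRY_RUN=1; build_sensys /opt/sensys /usr/pgsql-9.6' from
the SENSYS source directory prints the four commands without running them. The
two paths are examples only.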

---------------------------
3. SENSYS database setup
---------------------------
Please refer to the instructions provided at
https://github.com/intel-ctrlsys/sensys/wiki/2.2.1-Database-Server in order to
prepare the PostgreSQL server for data aggregation through SENSYS.

The server URI, database name, database user and password are required for
SENSYS.

In order to enable the analysis of data aggregated via SENSYS, a database view
named clck_1 must be created. The following command can be used to create the
view.

CREATE VIEW clck_1 AS
SELECT t.event_id as rowid,
       t.Hostname as Hostname,
       EXTRACT(EPOCH FROM t.time_stamp) AS Timestamp,
       t.value_str as Provider,
       t1.value_real AS Duration,
       t2.value_int as Encoding,
       t3.value_int as Exit_status,
       t4.value_str as node_names,
       t5.value_int as num_nodes,
       t6.value_str as OptionID,
       t7.value_str as STDERR,
       t8.value_str as STDOUT,
       t9.value_int as Version,
       t10.value_str as Username,
       EXTRACT(EPOCH FROM t.time_stamp) AS Unique_timestamp
FROM data_sample_raw t
LEFT OUTER JOIN data_sample_raw t1 ON t.event_id = t1.event_id AND
t1.data_item='clck_Duration'
LEFT OUTER JOIN data_sample_raw t2 ON t.event_id = t2.event_id AND
t2.data_item='clck_Encoding'
LEFT OUTER JOIN data_sample_raw t3 ON t.event_id = t3.event_id AND
t3.data_item='clck_Exit_status'
LEFT OUTER JOIN data_sample_raw t4 ON t.event_id = t4.event_id AND
t4.data_item='clck_node_names'
LEFT OUTER JOIN data_sample_raw t5 ON t.event_id = t5.event_id AND
t5.data_item='clck_num_nodes'
LEFT OUTER JOIN data_sample_raw t6 ON t.event_id = t6.event_id AND
t6.data_item='clck_OptionID'
LEFT OUTER JOIN data_sample_raw t7 ON t.event_id = t7.event_id AND
t7.data_item='clck_STDERR'
LEFT OUTER JOIN data_sample_raw t8 ON t.event_id = t8.event_id AND
t8.data_item='clck_STDOUT'
LEFT OUTER JOIN data_sample_raw t9 ON t.event_id = t9.event_id AND
t9.data_item='clck_Version'
LEFT OUTER JOIN data_sample_raw t10 ON t.event_id = t10.event_id AND
t10.data_item='clck_Username'
WHERE t.data_item='clck_provider';
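
After the view has been created, its definition can be inspected through the
standard information_schema catalog. The following is a minimal sketch; the
function name is hypothetical, and the psql invocation in the comment uses the
placeholders for the server, user and database described in this guide.

```shell
#!/bin/sh
# Print a query that lists the columns of the clck_1 view.
clck_view_columns_sql() {
    cat <<'EOF'
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'clck_1'
ORDER BY ordinal_position;
EOF
}

# Usage:
#   clck_view_columns_sql | psql -h <database server name> \
#       -U <database user name> -d <database name>
```

The query should list the 15 columns named in the view definition. PostgreSQL
folds unquoted identifiers to lower case, so the names appear as rowid,
hostname, timestamp and so on.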


---------------------------
4. SENSYS daemon setup
---------------------------
The first step in running SENSYS is to update the config file.

The default location of the config file is
<sensys_install_directory>/etc/openmpi-mca-params.conf. This file should be
edited to contain key value pairs. For a complete listing of key value pairs,
the following command can be run -

# <sensys_install_directory>/bin/orcm-info --param all all

The following key value pairs are recommended -
mca_base_component_show_load_errors = 1
mpi_param_check = 0
orte_abort_timeout = 10
hwloc_base_mem_bind_failure_action = silent
btl_sm_free_list_max = 768
sensor_base_sample_rate = 5
sensor_heartbeat_rate = 10
sensor_base_verbose = 100
sensor_base_log_samples = 1
sst_orcmd_scheduler_reqd = 0
db_print_file=-

The configuration file should also include the details of the PostgreSQL server.
The following configuration options are required -

db_postgres_uri = <database server name>
db_postgres_database = <database name>
db_postgres_user = <database user name>:<database password>

The details for the above configuration options are the results of the actions
performed in step 3 above.
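
When the config file is edited by a script, it helps to replace an existing
key rather than append a duplicate. The following is a minimal sketch of such
a helper. The function name 'set_mca_param' is hypothetical, and the sketch
assumes one "key = value" pair per line, as in the listings above.

```shell
#!/bin/sh
# Set "key = value" in an MCA parameter file, replacing any existing
# line for the same key so that repeated runs do not create duplicates.
#   $1 = config file, $2 = key, $3 = value
set_mca_param() {
    file=$1; key=$2; value=$3
    tmp=$(mktemp) || return 1
    if [ -f "$file" ]; then
        # Keep every line except an earlier assignment of this key.
        grep -v "^[[:space:]]*${key}[[:space:]]*=" "$file" > "$tmp" || true
    fi
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    # Copy the content back instead of moving the temporary file, so
    # the permissions of the configuration file are preserved.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

For example, 'set_mca_param <sensys_install_directory>/etc/openmpi-mca-params.conf
db_postgres_database <database name>' sets the database name.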

Additional details can be found in
https://intel-ctrlsys.github.io/sensys/2-Build-and-Installation-Guide/2.2-Database-Installation/2.2.1-Database-Server.html

The second step is to create a SENSYS site configuration file. This is an XML
file providing information about the head nodes and the compute nodes.

A sample XML file can be found in the etc/orcm-site.xml location under the
source code check out tree.

This file should be placed under <sensys_install_directory>/etc as
'orcm-site.xml'.

The firewall on the nodes must be opened for the ports specified in the
site configuration file.
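
The ports to open can be collected from the site configuration file. The
following sketch assumes that the file stores each port in a <port> element,
one per line, as in the sample file; this assumption should be checked against
the site file in use. The function name is hypothetical, and the firewall-cmd
lines in the comment apply only to systems that use firewalld.

```shell
#!/bin/sh
# List the distinct port numbers found in <port>...</port> elements.
# Assumption: the site file stores each port in a <port> element.
#   $1 = path to orcm-site.xml
site_ports() {
    sed -n 's:.*<port>[[:space:]]*\([0-9][0-9]*\)[[:space:]]*</port>.*:\1:p' "$1" |
        sort -un
}

# Usage on a node that uses firewalld (run as root):
#   for p in $(site_ports <sensys_install_directory>/etc/orcm-site.xml); do
#       firewall-cmd --permanent --add-port="$p/tcp"
#   done
#   firewall-cmd --reload
```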

----------------------------------------------------
5. Intel(R) Cluster Checker sensor plugin setup
----------------------------------------------------

Upon installation of Intel(R) Cluster Checker 2018, the SENSYS sensor plugin
can be found under <clck_install_directory>/sensys.

Prior to running the Intel(R) Cluster Checker sensor plugin, the SENSYS
configuration file must be appended with the following key value pair, written
on a single line -

mca_base_component_path = <sensys_install_directory>/lib/openmpi:<clck_install_directory>/sensys

The directories listed in the 'mca_base_component_path' variable are scanned
by SENSYS for plugins. Consequently, it is important to include both
directories: the one containing the default plugins and the one containing the
Intel(R) Cluster Checker plugin.


Additionally, the following key value pair is also required in the SENSYS
configuration file.

sensor_clck_provider_base_path = <clck_install_directory>/provider

The Intel(R) Cluster Checker sensor plugin scans this directory for
valid provider configuration files. Please refer to the Intel(R) Cluster Checker
documentation for more details.
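
Before starting the daemons, it can be useful to confirm that the two paths
referenced above exist. The following is a minimal sketch; the function name
is hypothetical, and it checks only for the presence of the plugin file and
the provider directory, not for their validity.

```shell
#!/bin/sh
# Return success if the sensor plugin and the provider directory exist.
#   $1 = <clck_install_directory>
check_clck_layout() {
    clck_dir=$1
    if [ ! -f "$clck_dir/sensys/mca_sensor_clck.so" ]; then
        echo "missing plugin: $clck_dir/sensys/mca_sensor_clck.so" >&2
        return 1
    fi
    if [ ! -d "$clck_dir/provider" ]; then
        echo "missing provider directory: $clck_dir/provider" >&2
        return 1
    fi
}
```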


The Intel(R) Cluster Checker sensor plugin dynamically loads certain
libraries that are a part of the base Intel(R) Cluster Checker installation.
As a prerequisite, the LD_LIBRARY_PATH environment variable must be updated
to include the directory containing these libraries prior to the execution of
the SENSYS daemons. This can be achieved through the following command -

$ export LD_LIBRARY_PATH=<clck_install_directory>/lib/intel64:$LD_LIBRARY_PATH
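
If LD_LIBRARY_PATH is unset or empty, the command above leaves a trailing
colon in the value. On glibc-based systems an empty element in this variable
is treated as the current working directory. The following sketch avoids the
empty element; the function name is hypothetical.

```shell
#!/bin/sh
# Prepend a directory to LD_LIBRARY_PATH without leaving an empty
# element when the variable is unset or empty.
#   $1 = directory to prepend
prepend_ld_library_path() {
    LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH
}

# Usage:
#   prepend_ld_library_path <clck_install_directory>/lib/intel64
```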


When the Intel(R) Cluster Checker clck-collect tool is invoked in
asynchronous mode with cluster providers, it writes the output and other
metadata to temporary files. These temporary files are stored in the
directory /tmp/orcm-clck by default. SENSYS attempts to automatically
create this directory with global write permissions when it is first
started.


If a different directory is desired for this purpose, SENSYS's configuration
file needs to be modified to include the following key value pair.

sensor_clck_provider_output_path = <new directory>

Additionally, clck-collect needs to be run with a configuration file which
should include the following tag.

<configuration>
    <collector>
        <async_output_dir>new directory</async_output_dir>
    </collector>
</configuration>

Please refer to section 6.2 of this documentation for instructions on running
clck-collect in asynchronous mode.
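
When a non-default directory is used, the same path should appear in both the
SENSYS configuration file and the clck-collect configuration file. The guide
does not state whether SENSYS creates a non-default directory, so the
following sketch creates it ahead of time. The function name is hypothetical,
and mode 0777 is an assumption standing in for the "global write permissions"
mentioned above.

```shell
#!/bin/sh
# Create the asynchronous output directory and make it writable by all
# users. Assumption: mode 0777 matches "global write permissions".
#   $1 = <new directory>
make_async_output_dir() {
    mkdir -p "$1" && chmod 0777 "$1"
}
```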

--------------
6. Data aggregation
---------------
The SENSYS daemons are launched by executing the 'orcmd' binary under
<sensys_install_directory>/bin.

By default, all sensor plugins are loaded. To run only the Intel(R) Cluster
Checker plugin, the following command line can be used -

# <sensys_install_directory>/bin/orcmd -omca sensor heartbeat,clck

Note that the heartbeat plugin must always be run, irrespective of the other
plugins desired.
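
A launch script can guard against omitting the heartbeat plugin by
normalizing the sensor list before it is passed to orcmd. The following is a
minimal sketch; the function name is hypothetical.

```shell
#!/bin/sh
# Print a comma-separated sensor list that always contains 'heartbeat'.
#   $1 = desired sensor plugins, e.g. "clck"
sensor_list() {
    case ",$1," in
        *,heartbeat,*) echo "$1" ;;
        ,,)            echo "heartbeat" ;;
        *)             echo "heartbeat,$1" ;;
    esac
}

# Usage:
#   orcmd -omca sensor "$(sensor_list clck)"
```

Here 'sensor_list clck' prints heartbeat,clck, and a list that already
contains heartbeat is printed unchanged.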

The daemons parse the SENSYS site configuration XML file to determine their
roles. The aggregator daemons receive data from the compute daemons, and
print the contents to the screen. The compute daemons run the providers
and push the provider output details to the aggregator daemons.

-------------------------
6.1 Single node providers
-------------------------
Providers that need only the node on which they are invoked are executed
periodically by the SENSYS daemon running on that node.

---------------------
6.2 Cluster providers
---------------------
Providers which require multiple nodes to run are not executed directly by the
SENSYS daemons.

The clck-collect tool needs to be run with the '-A' flag in order to run cluster
providers. Please refer to the Intel(R) Cluster Checker documentation for more
details on running clck-collect.

When running clck-collect with the '-A' flag, Intel(R) Cluster Checker will not
attempt to aggregate the data itself. Instead, the data will be dumped to the
output directory configured in step 5.

The SENSYS daemon periodically scans the directory for valid output files. When
valid output files are found, the SENSYS compute daemons read the files, push
the contents to the aggregator daemons and remove the files from the directory.
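
Because the compute daemons remove the files they have read, an empty output
directory indicates that the collected data has been picked up. The following
sketch polls the directory until no regular files remain. The function name
is hypothetical, and the sketch does not distinguish valid output files from
other files, so a file that the daemons do not recognize causes a timeout.

```shell
#!/bin/sh
# Wait until no regular files remain in a directory, polling once per
# second. Returns 1 if files are still present after the timeout.
#   $1 = output directory, $2 = timeout in seconds
wait_for_drain() {
    dir=$1; remaining=$2
    while [ -n "$(find "$dir" -type f 2>/dev/null | head -n 1)" ]; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Usage, after clck-collect has been run with the '-A' flag:
#   wait_for_drain /tmp/orcm-clck 60 || echo "files still pending" >&2
```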


----------------------------------------------
7. Intel(R) Cluster Checker data analysis
----------------------------------------------
In order to use a PostgreSQL database populated by the SENSYS daemons with
Intel(R) Cluster Checker, the following three steps are required -

(a) Creation of $HOME/odbcinst.ini
This ini file should contain the following information


[ODBC Data Sources]
sensys     = PostgreSQL ODBC driver

[sensys]
Driver       = /usr/pgsql-9.4/lib/psqlodbcw.so
Description  = PostgreSQL ODBC driver
FileUsage    = 1
Servername   = <database server URI>
Port         = <database server port>
UserName     = <database user name>:<database password>
Database     = <database name>


In the above file, the Driver entry should point to the location of the ODBC
driver installed in step 1.

(b) Creation of $HOME/odbc.ini
The $HOME/odbc.ini file should be a symbolic link to the $HOME/odbcinst.ini
file created in the previous step.

(c) Modification of Intel(R) Cluster Checker configuration file
Intel(R) Cluster Checker's clck-analyze tool should be run with a
configuration file that includes the following tag -

<configuration>
    <database>
        <database_backend name="sensys" source="sensys">odbc</database_backend>
    </database>
</configuration>

The source attribute value in the <database_backend> tag should match the section
name in the odbcinst.ini file created in step (a).
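
Steps (b) and (c) can be scripted and checked as follows. This is a minimal
sketch with hypothetical function names. It creates the symbolic link and
confirms that a section with the expected name exists in the ini file; it does
not test the database connection itself.

```shell
#!/bin/sh
# List the section names, e.g. "sensys", found in an ini file.
#   $1 = ini file
ini_sections() {
    sed -n 's/^\[\(.*\)\][[:space:]]*$/\1/p' "$1"
}

# Make <home>/odbc.ini a symbolic link to <home>/odbcinst.ini (step b).
#   $1 = home directory
link_odbc_ini() {
    ln -sf "$1/odbcinst.ini" "$1/odbc.ini"
}

# Return success if the ini file has a section named after the source
# attribute used in the <database_backend> tag (step c).
#   $1 = ini file, $2 = source attribute value
has_dsn() {
    ini_sections "$1" | grep -qxF "$2"
}

# Usage:
#   link_odbc_ini "$HOME" && has_dsn "$HOME/odbc.ini" sensys
```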

----------------------------------------------
8. Known Issues
----------------------------------------------
