Monitoring External Systems with Platform Agents

Licensing

Process-based Contracts

All monitoring is included in the license.

License-based Contracts

A UNIX or Microsoft Windows agent can be configured for monitoring only: do not assign any job definition types that run on agents and do not assign any file events. Such process servers do not consume any licenses. On OpenVMS you must assign the DCL job definition type, so license-free monitoring is not available there.

Prerequisites

Monitoring must be enabled.
Each process server that you want to monitor must be configured for monitoring.

Configuration

Process server and queue monitoring is disabled by default for performance reasons. To enable monitoring, set the /configuration/jcs/monitoring/enabled configuration entry to true; then, for each process server you wish to monitor, set the MonitorInterval process server parameter.

Default Monitor Nodes

The platform agent reports CPU busy, IO page rate and disk capacity by default. You can tune how often it does this by changing the MonitorInterval process server parameter. The data is stored in the monitor tree in the following paths:

System.ProcessServer.${PSName}.Performance.Load - By default the number of processes the process server is currently processing or a representation of the load factors as configured.
System.ProcessServer.${PSName}.Performance.LoadThreshold - By default the maximum number of processes allowed to run simultaneously or the maximum load specified on the load factor tab.
System.ProcessServer.${PSName}.Performance.CPUCount - The number of CPUs the system has.
System.ProcessServer.${PSName}.Performance.CPUBusy - The CPU usage on the server.
System.ProcessServer.${PSName}.Performance.PageRate - The amount of memory paging that is taking place.
System.ProcessServer.${PSName}.Performance.NetworkResponseAverage - Average communication overhead with platform agent per transfer, in seconds.
System.ProcessServer.${PSName}.Performance.NetworkResponseMaximum - Maximum communication overhead with platform agent per transfer, in seconds.
System.ProcessServer.${PSName}.Performance.NetworkResponseMinimum - Minimum communication overhead with platform agent per transfer, in seconds.
System.ProcessServer.${PSName}.Performance.NetworkTransferCount - Number of transfers exchanged with platform agent.
System.ProcessServer.${PSName}.Performance.NetworkTransferRate - Volume of network traffic sent and received by platform agent, in bytes per second.
System.ProcessServer.${PSName}.Performance.NetworkUptime - Time since last network error or startup, in seconds.
System.ProcessServer.${PSName}.FileSystem.${FileSystemPath}.Free - The free space on the specific file system.
System.ProcessServer.${PSName}.FileSystem.${FileSystemPath}.Used - The used space on the specific file system.
System.ProcessServer.${PSName}.FileSystem.${FileSystemPath}.Total - The total size of the file system.
System.ProcessServer.${PSName}.FileSystem.${FileSystemPath}.UsedPercentage - Percentage of used space on the file system.
System.ProcessServer.${PSName}.Checks.${Check_Name}.${Monitored value} - Custom checks.
${PSName} - process server name, for example System.
${FileSystemPath} - the path to the local file system, for example C:\\ or /home (SAN file systems may be considered local if, for example, they are mounted via iSCSI).
${Check_Name} - the name of the check or its description, if the latter is set.
${Monitored value} - the name of the value that is monitored; depends on the type of check.
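
As an illustration of how the placeholders expand, the following minimal Python sketch fills in two of the path templates listed above. The ${Metric} placeholder and the function names are hypothetical and only serve the example; how the product itself escapes file system paths inside the tree is not covered here.

```python
from string import Template

# Templates copied from the list above; ${Metric} is a hypothetical
# placeholder standing for the last path element (CPUBusy, Free, ...).
PERFORMANCE = Template("System.ProcessServer.${PSName}.Performance.${Metric}")
FILE_SYSTEM = Template(
    "System.ProcessServer.${PSName}.FileSystem.${FileSystemPath}.${Metric}"
)


def performance_path(ps_name, metric):
    """Expand the Performance template for one process server."""
    return PERFORMANCE.substitute(PSName=ps_name, Metric=metric)


def file_system_path(ps_name, fs_path, metric):
    """Expand the FileSystem template for one local file system."""
    return FILE_SYSTEM.substitute(
        PSName=ps_name, FileSystemPath=fs_path, Metric=metric
    )


print(performance_path("System", "CPUBusy"))
print(file_system_path("System", "/home", "UsedPercentage"))
```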

The Load and LoadThreshold are calculated for all process servers, not just for process servers that include a PlatformAgentService. The LoadFactors for a process server point to a MonitorCheck such as CPUBusy or PageRate. All load factors are added up into a particular load. If the summed load is higher than the maximum allowed by the process server's LoadThreshold attribute the process server will be overloaded. Besides showing this status you can also create programmatic actions by defining a condition that checks the summed load and raises the appropriate events.
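
The overload rule described above can be sketched in a few lines of Python. This is an illustration only: it assumes that each load factor has already been converted into a numeric contribution, and the function names and sample values are hypothetical. How a load factor derives its contribution from a monitor check is configured on the load factor tab and is not modelled here.

```python
def summed_load(contributions):
    """Add up the contributions of all load factors (assumed numeric)."""
    return sum(contributions.values())


def is_overloaded(contributions, load_threshold):
    """A process server is overloaded when the summed load exceeds the threshold."""
    return summed_load(contributions) > load_threshold


# Hypothetical contributions, keyed by the monitor check they point to.
factors = {"CPUBusy": 60, "PageRate": 25}

print(summed_load(factors))        # 85
print(is_overloaded(factors, 80))  # True
print(is_overloaded(factors, 90))  # False
```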

note

The file system statistics are reported for all local disks; network shares are not taken into account.

Network Statistics Logging

The logging is done at least every 24 hours, but usually every hour if there is anything to report, and takes the following form in the platform agent log files:

INFO 2023-11-16 16:34:48,663 CES common.statistics - The agent started 0 job processors in the last 359 minutes, with at most 0 in parallel
INFO 2023-11-16 16:34:48,663 CES common.statistics - Performed 1 HTTP requests in the last 359 minutes, average 0.124s, max 0.124s, min 0.124s
INFO 2023-11-16 16:34:48,663 CES common.statistics - Performed 1087 HTTP requests (scheduler) in the last 359 minutes, average 0.052s, max 0.204s, min 0.030s
INFO 2023-11-16 16:34:48,663 CES common.statistics - Performed 19 file reads in the last 359 minutes, total 25024 bytes
INFO 2023-11-16 16:34:48,663 CES common.statistics - Performed 173947 file writes in the last 359 minutes, total 24063781 bytes
INFO 2023-11-16 16:34:48,663 CES common.statistics - Performed 8 network connections in the last 359 minutes, average 0.010s, max 0.029s, min 0.001s
INFO 2023-11-16 16:34:48,663 CES common.statistics - Performed 12 network name lookups in the last 359 minutes, average 0.013s, max 0.126s, min 0.000s
INFO 2023-11-16 16:34:48,663 CES common.statistics - Performed 7565 network reads in the last 359 minutes, total 890417 bytes
INFO 2023-11-16 16:34:48,663 CES common.statistics - Performed 2948 network writes in the last 359 minutes, total 475673 bytes

The "network connections" statistics (average, max, min) are usually well below one second. In the above, the average response is 10 ms with a worst case of 29 ms. Note that this includes both the pure network latency and the time the network takes to do data transfers. The latter factor is usually negligible, but be careful in cases where large files are sent over the network.

The "network name lookup" statistics show how the customer DNS service is performing. In the above, the spread between minimum and maximum is somewhat larger than for the network connections themselves.

HTTP requests not marked as HTTP requests (scheduler) are requests made to an HTTP service other than the pure agent-to-server communication. No HTTP request failures happened in the above log, so none are reported. Such failures would show up like this:

INFO 2023-11-16 16:34:48,663 CES common.statistics - Performed 1 HTTP requests (failed) in the last 359 minutes, average 30.03s, max 30.03s, min 30.03s

Note that only failed HTTP requests are logged separately, not failed DNS requests.
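
If you want to process these statistics outside the product, the log lines can be parsed with regular expressions. The following Python sketch is based only on the sample lines shown above; the function name and the dictionary keys are hypothetical, and the patterns may need adjusting if your agent version words the messages differently.

```python
import re

# Lines that report response times: "... average 0.052s, max 0.204s, min 0.030s"
TIMED = re.compile(
    r"Performed (?P<count>\d+) (?P<what>.+?) in the last (?P<minutes>\d+) minutes, "
    r"average (?P<avg>[\d.]+)s, max (?P<max>[\d.]+)s, min (?P<min>[\d.]+)s"
)
# Lines that report volumes: "... total 25024 bytes"
VOLUME = re.compile(
    r"Performed (?P<count>\d+) (?P<what>.+?) in the last (?P<minutes>\d+) minutes, "
    r"total (?P<bytes>\d+) bytes"
)


def parse_statistics_line(line):
    """Return a dict for a recognised statistics line, or None otherwise."""
    m = TIMED.search(line)
    if m:
        return {
            "what": m["what"],
            "count": int(m["count"]),
            "minutes": int(m["minutes"]),
            "average_s": float(m["avg"]),
            "max_s": float(m["max"]),
            "min_s": float(m["min"]),
        }
    m = VOLUME.search(line)
    if m:
        return {
            "what": m["what"],
            "count": int(m["count"]),
            "minutes": int(m["minutes"]),
            "total_bytes": int(m["bytes"]),
        }
    return None


sample = (
    "INFO 2023-11-16 16:34:48,663 CES common.statistics - Performed 1087 HTTP "
    "requests (scheduler) in the last 359 minutes, average 0.052s, max 0.204s, "
    "min 0.030s"
)
print(parse_statistics_line(sample))
```

Failed requests can then be found by looking for entries whose "what" value is HTTP requests (failed).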

Check Styles and Platforms

Eventlog - Windows only
Logfile - UNIX, OpenVMS & Windows
Process - UNIX & OpenVMS
Service - Windows only
Socket - UNIX, OpenVMS & Windows

Process Server Checks

A process server with an attached platform agent service can monitor system operation when it is of the UNIX, Microsoft Windows or OpenVMS family type.

You can add the checks on the Checks tab of the Process Server edit dialog. See the Creating Monitor Checks topic for more information.

The monitoring system has three general severity grades (green, yellow and red), and levels from -1 to 100. A level of -1 means disabled, 0 usually means everything is as it should be, and 100 usually means there is a critical problem (red). Values from 50 up to and including 74 translate to yellow, which is meant as a warning.
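
The relation between levels and grades can be sketched as a small Python function. Only the yellow range (50 to 74), the disabled value (-1) and the meaning of 0 and 100 are stated above; treating 1 to 49 as green and 75 to 99 as red is an assumption made for this illustration, and the function name is hypothetical.

```python
def grade(level):
    """Map a severity level (-1..100) to a grade name."""
    if not -1 <= level <= 100:
        raise ValueError("severity level must be between -1 and 100")
    if level == -1:
        return "disabled"
    if level >= 75:
        return "red"     # assumption: everything above the yellow range is red
    if level >= 50:
        return "yellow"  # stated in the text: 50 up to and including 74
    return "green"       # assumption: everything below the yellow range is green


for level in (-1, 0, 50, 74, 75, 100):
    print(level, grade(level))
```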

When you implement a check, set levels and grades so that operators can immediately analyse the situation and react appropriately. You should create at least two checks for everything you want to monitor, one to match the green grade and one to match the red grade. You do this with the Severity and Condition Expression fields.

The fields you can add per check are:

Field	Description
Name	Name for the check.
Description	A description for the check.
Documentation	A comment for the check.
Enabled	When ticked, the check is enabled.
Style	The type of check.
Object Name	The first attribute of the check (compulsory).
Attribute 2	The second attribute (compulsory for Logfile and EventLog).
Poll interval	The interval at which to check.
Severity	The severity of the condition expression.
Condition Expression	An expression that describes a state, for example `=Count > 0`.
Delay Amount	Number of Delay Units to wait before firing the ad hoc alert or submitting the Reaction Process Type process.
Delay Units	The delay units.
Ad Hoc Alert Source	Ad hoc alert source to fire.
Process Definition	Process definition to submit.
Address	Address to be used for the ad hoc alert source or parameter.
Message	Message to be used for the ad hoc alert source or parameter.
Data	Data to be used for the ad hoc alert source or parameter.

Example

You want to make sure that the Oracle database is running.

Name	Value
Description	Check Oracle running.
Documentation	Check that Oracle is running.
Style	Process
Object Name	`ora_orcl`
Attribute 2
Poll interval	3
Severity	0
Condition Expression	`=Count > 10`

Add another check, so that the severity is set to high when fewer than 2 processes are running for Oracle.

Name	Value
Description	Check Oracle Not running.
Documentation	Check that Oracle is not running.
Style	Process
Object Name	`ora_orcl`
Attribute 2
Poll interval	3
Severity	75
Condition Expression	`=Count < 2`

Check if the Oracle Listener is working:

Name	Value
Description	Check Oracle Listener is running.
Documentation	Check that Oracle Listener is running.
Style	Socket
Object Name	`1521`
Poll interval	`5`
Severity	`75`
Condition Expression

The Name is used as an identifier to distinguish checks of the same process server in the log files. It also determines the path of the check in the monitor tree. Depending on the Style, the path will be:

System.ProcessServer.${PSName}.Check.${CheckName}.Count
System.ProcessServer.${PSName}.Check.${CheckName}.Message

The Style can be selected from the drop down box, and is one of Process, Socket, Logfile, Service, Eventlog.

Object Name is always required; what it determines depends on the style.

For the Process (UNIX, OpenVMS) and the Service (Microsoft Windows) styles it contains a pattern using GLOB matching that selects the name of the objects. Matching objects are counted. For OpenVMS the matching record is the process name. For UNIX the matching record is the output of a line of ps -ef or its equivalent. For Microsoft Windows services the matching record is Displayname (Servicename) which means that you can check on both names of the service, if desired.
For Logfile it contains the filename of the logfile that is to be checked.
For Eventlog (Microsoft Windows) it contains the name of the log. Typical values are System and Application, but other Microsoft Windows logs are allowed.
For Socket it contains the service port to be checked. You can specify a port number in decimal or a reference that will be resolved by the agent on the target system.

Attribute 2 is only used for some styles.

It is not used for the Process and Service styles.
For the Logfile and Eventlog styles this contains a pattern using GLOB matching that selects records. The Logfile records are the lines in the file. The Microsoft Windows Eventlog records are the complete message expanded using the locale defined for the agent.
For the Socket style this contains the network address that the socket should be bound to. The default is 0.0.0.0 (all IP addresses of the server).

note

GLOB matching means that you can use * to match any number of characters and ? to match a single character, just as you do at the Microsoft Windows command prompt or in MS-DOS, for example. Use * at the beginning and end of the pattern if you want your pattern to match a particular string somewhere in the record instead of the whole record.
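
The effect of whole-record matching can be tried out with Python's standard fnmatch module, which implements the same * and ? wildcards. This is only an approximation of the agent's behaviour: fnmatch also interprets [...] character classes, and whether the agent matches case-sensitively is not stated here. The function name and the sample ps output are hypothetical.

```python
from fnmatch import fnmatchcase


def count_matches(pattern, records):
    """Count the records that match the GLOB pattern as a whole."""
    return sum(1 for record in records if fnmatchcase(record, pattern))


# Hypothetical lines in the style of ps -ef output.
ps_lines = [
    "oracle    2101     1  0 09:00 ?  00:00:01 ora_pmon_orcl",
    "oracle    2105     1  0 09:00 ?  00:00:03 ora_dbw0_orcl",
    "root       812     1  0 08:55 ?  00:00:00 /usr/sbin/sshd",
]

# Without a leading *, the pattern must match the record from its first character.
print(count_matches("ora_*_orcl", ps_lines))      # 0
# A leading * lets the pattern match at the end of each record.
print(count_matches("*ora_*_orcl", ps_lines))     # 2
# ? matches exactly one character.
print(count_matches("*ora_dbw?_orcl", ps_lines))  # 1
```

The resulting count corresponds to the Count value that a Condition Expression such as `=Count < 2` is evaluated against.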

The Poll Interval is the maximum time between two runs of the check. It is not a fixed interval, because the agent can often evaluate multiple checks of the same style in a single pass over whatever it checks. In such cases the check may be performed more often than set here.

The Severity and Condition Expression are used to create a default condition in the monitor tree. Normally, a condition named Default is created on the monitor check that results from the process server check. This condition sets severity 50 (yellow) and the Condition Expression `=Count < 1` unless you set other values in the process server check. You should not edit the Default condition, because its values will be overwritten with those from the process server check.

If you want to use more complicated conditions than the simple single condition allowed by the Severity and Condition Expression fields you can do so by adding your own Conditions on the MonitorCheck with a name other than Default. As soon as you create such a condition the Default condition will not be updated or recreated.

Examples of valid ProcessServerChecks:

OS Family	Style	Object Name	Attribute 2	Explanation
UNIX	Process	ora_dbwr_		Process matching on UNIX is on the output of ps -ef, so wildcards are needed
VMS	Process	NETACP		Process matching on VMS is purely on the process name, so no wildcards needed
UNIX	Logfile	/var/log/system.log	dhcp:	Log messages written by the DHCP service
UNIX	Socket	21		Check that the FTP service is running
Windows	Service	W32Time		Check that the Windows Time Service is running (by its service name)
Windows	Service	Windows Time		Check that the Windows Time Service is running (by its display name)
