Redwood Documentation

Product Documentation

 


Monitoring External Systems with Platform Agents

Licensing

Process-based Contracts

All monitoring is included in the license.

License-based Contracts

A UNIX or Microsoft Windows agent can be configured for monitoring only by not assigning any job definition types that run on agents and not assigning any file events. Such process servers do not consume any licenses. On OpenVMS, you must assign the DCL job definition type, so free monitoring is not available.

Prerequisites

  • Monitoring must be enabled
  • The process server must be configured for monitoring

Configuration

Process server and queue monitoring is disabled by default for performance reasons. To enable monitoring, set the /configuration/jcs/monitoring/enabled configuration entry to true. Then, for each process server you wish to monitor, set the MonitorInterval process server parameter.

Default Monitor Nodes

The platform agent reports CPU busy, IO page rate, and disk capacity by default. You can tune how often it does this by changing the MonitorInterval process server parameter. The data is stored in the monitor tree in the following paths:

  • System.ProcessServer.${PSName}.Performance.Load - By default the number of processes the process server is currently processing or a representation of the load factors as configured.
  • System.ProcessServer.${PSName}.Performance.LoadThreshold - By default the maximum number of processes allowed to run simultaneously or the maximum load specified on the load factor tab.
  • System.ProcessServer.${PSName}.Performance.CPUCount - The number of CPUs the system has.
  • System.ProcessServer.${PSName}.Performance.CPUBusy - The CPU usage on the server.
  • System.ProcessServer.${PSName}.Performance.PageRate - The amount of memory paging that is taking place.
  • System.ProcessServer.${PSName}.Performance.NetworkResponseAverage - Average communication overhead with platform agent per transfer, in seconds.
  • System.ProcessServer.${PSName}.Performance.NetworkResponseMaximum - Maximum communication overhead with platform agent per transfer, in seconds.
  • System.ProcessServer.${PSName}.Performance.NetworkResponseMinimum - Minimum communication overhead with platform agent per transfer, in seconds.
  • System.ProcessServer.${PSName}.Performance.NetworkTransferCount - Number of transfers exchanged with platform agent.
  • System.ProcessServer.${PSName}.Performance.NetworkTransferRate - Volume of network traffic sent and received by platform agent, in bytes per second.
  • System.ProcessServer.${PSName}.Performance.NetworkUptime - Time since last network error or startup, in seconds.
  • System.ProcessServer.${PSName}.FileSystem.${FileSystemPath}.Free - The free space on the specific file system.
  • System.ProcessServer.${PSName}.FileSystem.${FileSystemPath}.Used - The used space on the specific file system.
  • System.ProcessServer.${PSName}.FileSystem.${FileSystemPath}.Total - The total size of the file system.
  • System.ProcessServer.${PSName}.FileSystem.${FileSystemPath}.UsedPercentage - Percentage of used space on the file system.
  • System.ProcessServer.${PSName}.Checks.${Check_Name}.${Monitored value} - Custom checks.

The placeholders have the following meanings:

  • ${PSName} - The process server name, for example System.
  • ${FileSystemPath} - The path to the local file system, for example C:\\ or /home (SAN file systems may be considered local if, for example, they are mounted via iSCSI).
  • ${Check_Name} - The name of the check or its description, if the latter is set.
  • ${Monitored value} - The name of the check that is performed; depends on the type of check.

The Load and LoadThreshold are calculated for all process servers, not just for process servers that include a PlatformAgentService. The LoadFactors for a process server point to a MonitorCheck such as CPUBusy or PageRate. All load factors are added up into a single load value. If the summed load is higher than the maximum allowed by the process server's LoadThreshold attribute, the process server is considered overloaded. Besides showing this status, you can also trigger programmatic actions by defining a condition that checks the summed load and raises the appropriate events.
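
The overload rule can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: the function names and the sample readings are hypothetical, and the sketch assumes that each load factor contributes its monitored value to the sum without any weighting.

```python
def summed_load(readings):
    """Add up the current values of all load factors (assumed unweighted)."""
    return sum(readings.values())


def is_overloaded(readings, load_threshold):
    """Overloaded means the summed load is strictly higher than LoadThreshold."""
    return summed_load(readings) > load_threshold


# Hypothetical readings for two load factors that point at monitor checks.
sample = {"CPUBusy": 62.0, "PageRate": 15.0}
print(summed_load(sample), is_overloaded(sample, 70))
```

With these sample values the summed load is above a threshold of 70 but not above a threshold of 100, so only the first case would mark the process server as overloaded.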

note

The file system statistics are reported for all local disks; network shares are not taken into account.

Network Statistics Logging

Statistics are logged at least every 24 hours, but usually every hour if there is anything to report, and take the following form in the platform agent log files:

INFO 2023-09-28 16:34:48,663 CES common.statistics - The agent started 0 job processors in the last 359 minutes, with at most 0 in parallel
INFO 2023-09-28 16:34:48,663 CES common.statistics - Performed 1 HTTP requests in the last 359 minutes, average 0.124s, max 0.124s, min 0.124s
INFO 2023-09-28 16:34:48,663 CES common.statistics - Performed 1087 HTTP requests (scheduler) in the last 359 minutes, average 0.052s, max 0.204s, min 0.030s
INFO 2023-09-28 16:34:48,663 CES common.statistics - Performed 19 file reads in the last 359 minutes, total 25024 bytes
INFO 2023-09-28 16:34:48,663 CES common.statistics - Performed 173947 file writes in the last 359 minutes, total 24063781 bytes
INFO 2023-09-28 16:34:48,663 CES common.statistics - Performed 8 network connections in the last 359 minutes, average 0.010s, max 0.029s, min 0.001s
INFO 2023-09-28 16:34:48,663 CES common.statistics - Performed 12 network name lookups in the last 359 minutes, average 0.013s, max 0.126s, min 0.000s
INFO 2023-09-28 16:34:48,663 CES common.statistics - Performed 7565 network reads in the last 359 minutes, total 890417 bytes
INFO 2023-09-28 16:34:48,663 CES common.statistics - Performed 2948 network writes in the last 359 minutes, total 475673 bytes

The "network connections" statistics (average, max, min) are usually well below one second. In the example above, the average response time is 10 ms with a worst case of 29 ms. Note that this includes both the pure network latency and the time the network takes to transfer data. The latter factor is usually negligible, but be careful in cases where large files are sent over the network.
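
If you want to track these timings over time, the timed statistics lines can be extracted from the log with a regular expression. The following Python sketch is based only on the sample lines shown above; the function name and the dictionary keys are hypothetical, and the pattern may need adjusting if your agent version formats the lines differently.

```python
import re

# Matches timed statistics such as "Performed 8 network connections in the
# last 359 minutes, average 0.010s, max 0.029s, min 0.001s".
TIMED = re.compile(
    r"Performed (?P<count>\d+) (?P<kind>.+?) in the last (?P<minutes>\d+) minutes, "
    r"average (?P<avg>[\d.]+)s, max (?P<max>[\d.]+)s, min (?P<min>[\d.]+)s"
)


def parse_timed_statistic(line):
    """Return a dict for a timed statistics line, or None if it does not match."""
    match = TIMED.search(line)
    if match is None:
        return None
    return {
        "kind": match["kind"],
        "count": int(match["count"]),
        "minutes": int(match["minutes"]),
        "average_s": float(match["avg"]),
        "max_s": float(match["max"]),
        "min_s": float(match["min"]),
    }


line = (
    "INFO 2023-09-28 16:34:48,663 CES common.statistics - Performed 8 network "
    "connections in the last 359 minutes, average 0.010s, max 0.029s, min 0.001s"
)
stats = parse_timed_statistic(line)
```

Lines that report totals in bytes, such as the file reads and network writes, carry no timings and are deliberately not matched by this pattern.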

The "network name lookups" statistics show how the customer's DNS service is performing. In the example above, the spread of the lookup times is slightly larger than that of the network connections themselves.

HTTP requests not marked as HTTP requests (scheduler) are requests sent to an HTTP service other than the pure agent-to-server communication. No HTTP requests failed in the log above, so no failures are reported. Failures would show up like this:

INFO 2023-09-28 16:34:48,663 CES common.statistics - Performed 1 HTTP requests (failed) in the last 359 minutes, average 30.03s, max 30.03s, min 30.03s

Note that only failed HTTP requests are logged separately, not failed DNS requests.

Check Styles and Platforms

  • Eventlog - Windows only
  • Logfile - UNIX, OpenVMS & Windows
  • Process - UNIX & OpenVMS
  • Service - Windows only
  • Socket - UNIX, OpenVMS & Windows

Process Server Checks

A process server with an attached platform agent service can monitor system operation when it is of the UNIX, Microsoft Windows or OpenVMS family type.

You can add the checks on the Checks tab of the Process Server edit dialog. See the Creating Monitor Checks topic for more information.

The monitoring system has three general severity grades (green, yellow, and red) and levels from -1 to 100. A level of -1 means disabled, 0 usually means everything is as it should be, and 100 usually means there is a critical problem (red). Levels from 50 through 74 translate to yellow, which is meant as a warning.
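
The mapping from levels to grades can be sketched as follows. Only the yellow band (50 through 74) and the meaning of -1 are stated explicitly in the text; treating 0 through 49 as green and 75 through 100 as red is an assumption of this sketch, and the function name is hypothetical.

```python
def severity_grade(level):
    """Map a severity level (-1 to 100) to a grade name."""
    if not -1 <= level <= 100:
        raise ValueError("severity level must be between -1 and 100")
    if level == -1:
        return "disabled"
    if level >= 75:
        return "red"      # assumption: everything above the yellow band is red
    if level >= 50:
        return "yellow"   # stated in the text: 50 through 74 is yellow
    return "green"        # assumption: 0 through 49 is green


for level in (-1, 0, 50, 74, 75, 100):
    print(level, severity_grade(level))
```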

When you implement a check, set levels and grades so that operators can immediately analyse the situation and react accordingly. You should create at least two checks for everything you want to monitor, one to match the green grade and one to match the red grade. You do this with the Severity and Condition Expression fields.

The fields you can add per check are:

| Field | Description |
|---|---|
| Name | Name for the check. |
| Description | A description for the check. |
| Documentation | A comment for the check. |
| Enabled | When ticked, the check is enabled. |
| Style | The type of check. |
| Object Name | The first attribute of the check (compulsory). |
| Attribute 2 | The second attribute (compulsory for Logfile and Eventlog). |
| Poll interval | The interval at which to check. |
| Severity | The severity of the condition expression. |
| Condition Expression | An expression that describes a state, for example =Count > 0. |
| Delay Amount | Number of Delay Units to wait before firing the ad hoc alert or submitting the Reaction Process Type process. |
| Delay Units | The delay units. |
| Ad Hoc Alert Source | Ad hoc alert source to fire. |
| Process Definition | Process definition to submit. |
| Address | Address to be used for the ad hoc alert source or parameter. |
| Message | Message to be used for the ad hoc alert source or parameter. |
| Data | Data to be used for the ad hoc alert source or parameter. |

Example

You want to make sure that the Oracle database is running.

| Name | Value |
|---|---|
| Description | Check Oracle running. |
| Documentation | Check that Oracle is running. |
| Style | Process |
| Object Name | *ora*_orcl |
| Attribute 2 | |
| Poll interval | 3 |
| Severity | 0 |
| Condition Expression | =Count > 10 |

Add another check so that the severity is set to high when fewer than two Oracle processes are running.

| Name | Value |
|---|---|
| Description | Check Oracle Not running. |
| Documentation | Check that Oracle is not running. |
| Style | Process |
| Object Name | *ora*_orcl |
| Attribute 2 | |
| Poll interval | 3 |
| Severity | 75 |
| Condition Expression | =Count < 2 |

Check if the Oracle Listener is working:

| Name | Value |
|---|---|
| Description | Check Oracle Listener is running. |
| Documentation | Check that Oracle Listener is running. |
| Style | Socket |
| Service | 1521 |
| Poll interval | 5 |
| Severity | 75 |
| Condition Expression | |

The Name is used as an identifier to distinguish checks of the same process server in the log files. It also determines the path of the check in the monitor tree. Depending on the Style, the path will be:

  • System.ProcessServer.${PSName}.Check.${CheckName}.Count
  • System.ProcessServer.${PSName}.Check.${CheckName}.Message

The Style can be selected from the drop down box, and is one of Process, Socket, Logfile, Service, Eventlog.

Object Name is always required; what it determines depends on the style.

  • For the Process (UNIX, OpenVMS) and the Service (Microsoft Windows) styles it contains a pattern using GLOB matching that selects the name of the objects. Matching objects are counted. For OpenVMS the matching record is the process name. For UNIX the matching record is the output of a line of ps -ef or its equivalent. For Microsoft Windows services the matching record is Displayname (Servicename) which means that you can check on both names of the service, if desired.
  • For Logfile it contains the filename of the logfile that is to be checked.
  • For Eventlog (Microsoft Windows) it contains the name of the log. Typical values are System and Application, but other Microsoft Windows logs are allowed.
  • For Socket it contains the service port to be checked. You can specify a port number in decimal or a reference that will be resolved by the agent on the target system.

Attribute 2 is only used for some styles.

  • It is not used for the Process and Service styles.
  • For the Logfile and Eventlog styles this contains a pattern using GLOB matching that selects records. The Logfile records are the lines in the file. The Microsoft Windows Eventlog records are the complete message expanded using the locale defined for the agent.
  • For the Socket style this contains the network address that the socket should be bound to. The default is 0.0.0.0 (all IP addresses of the server).

note

GLOB matching means that you can use * to match any number of characters and ? to match a single character, just as you do at the Microsoft Windows command prompt or in MS-DOS, for example. Use * at the beginning and end of the pattern if you want your pattern to match a particular string somewhere in the record instead of the whole record.
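
To get a feel for how such patterns behave, the following Python sketch counts matching records with the standard fnmatch module. It is an approximation rather than the agent's own matcher: fnmatch also understands [ ] character sets, which are not described here, and the sample records and function name are hypothetical.

```python
from fnmatch import fnmatchcase


def count_matches(records, pattern):
    """Count the records that match a GLOB pattern as a whole."""
    return sum(1 for record in records if fnmatchcase(record, pattern))


# Hypothetical records resembling lines of `ps -ef` output.
records = [
    "oracle  2301     1  0 09:12 ?  00:00:03 ora_pmon_orcl",
    "oracle  2305     1  0 09:12 ?  00:00:01 ora_dbw0_orcl",
    "root     812     1  0 08:55 ?  00:00:00 /usr/sbin/sshd",
]

with_wildcard = count_matches(records, "*ora_*_orcl")
without_wildcard = count_matches(records, "ora_*_orcl")
print(with_wildcard, without_wildcard)
```

The pattern without a leading * has to match from the very start of each record, which begins with the user name here, so it matches nothing. A Condition Expression such as =Count > 0 would then be evaluated against the resulting count.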

The Poll Interval is the maximum time between two runs of the check. It is not a fixed interval, because the agent can often evaluate multiple checks of the same style in a single pass over whatever it checks. In such cases the check may be performed more often than set here.

The Severity and Condition Expression are used to create a default condition in the monitor tree. Normally, a condition named Default is created on the monitor check that results from the process server check. By default, this condition uses severity 50 (yellow) and the Condition Expression =Count < 1, unless you set other values in the process server check. Do not edit the Default condition, because your changes will be overwritten with the values from the process server check.

If you want to use more complicated conditions than the simple single condition allowed by the Severity and Condition Expression fields you can do so by adding your own Conditions on the MonitorCheck with a name other than Default. As soon as you create such a condition the Default condition will not be updated or recreated.

Examples of valid ProcessServerChecks:

| OS Family | Style | Object Name | Attribute 2 | Explanation |
|---|---|---|---|---|
| UNIX | Process | *ora_dbwr_* | | Process matching on UNIX is on the output of ps -ef, so wildcards are needed |
| VMS | Process | NETACP | | Process matching on VMS is purely on the process name, so no wildcards are needed |
| UNIX | Logfile | /var/log/system.log | *dhcp:* | Log messages written by the DHCP service |
| UNIX | Socket | 21 | | Check that the FTP service is running |
| Windows | Service | W32Time | | Check that the Windows Time Service is running (by its service name) |
| Windows | Service | Windows Time | | Check that the Windows Time Service is running (by its display name) |

See Also

  • Monitoring System Performance
  • Monitoring Servers with Platform Process Servers
  • Creating a Monitoring Platform Agent
  • Creating a Monitor Check
  • Creating Custom Monitor Checks
2023 All Rights Reserved