Workload
Queues distribute work across a larger system. If more than one machine can perform a task, you can use a queue to distribute the work among these machines: attach a process server to each machine, and attach the queue to these process servers. Processes are then automatically distributed across the process servers as they start.
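The relationship between a queue and its process servers can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: the class names, the `submit` method, and the round-robin policy are all assumptions, since the text does not say how the queue chooses a process server.

```python
class ProcessServer:
    """Hypothetical stand-in for a process server attached to one machine."""

    def __init__(self, name):
        self.name = name
        self.processes = []

    def start(self, process):
        self.processes.append(process)


class Queue:
    """Hypothetical queue that hands each new process to the next server."""

    def __init__(self, servers):
        self.servers = list(servers)
        self._next = 0  # index of the server that receives the next process

    def submit(self, process):
        # Round-robin is an assumed policy, chosen only to keep the sketch simple.
        server = self.servers[self._next % len(self.servers)]
        self._next += 1
        server.start(process)
        return server.name


ps1, ps2 = ProcessServer("ps1"), ProcessServer("ps2")
queue = Queue([ps1, ps2])
placements = [queue.submit(f"job-{i}") for i in range(4)]
```

With two process servers, the four processes in this sketch alternate between `ps1` and `ps2`.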
The following figure shows a queue and two process servers working normally.
This setup also provides failover for both planned and unplanned down-time.
In a planned scenario (for example, when the operating system must be upgraded), the process server is first shut down gracefully: all running processes are allowed to complete, but no new processes start on it. New processes automatically start on one of the other process servers until the planned work is completed and the process server is restarted. The following figure illustrates a planned down-time scenario.
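The graceful shutdown described above can be modeled roughly as follows. The sketch assumes a process server that tracks whether it accepts new work and which processes are still running. All names (`begin_shutdown`, `is_drained`, `dispatch`) are hypothetical, and picking the first accepting server is a simplification.

```python
class ProcessServer:
    """Hypothetical process server with a graceful-shutdown flag."""

    def __init__(self, name):
        self.name = name
        self.accepting = True   # False once a graceful shutdown has begun
        self.running = set()

    def start(self, process):
        if not self.accepting:
            raise RuntimeError(f"{self.name} is shutting down")
        self.running.add(process)

    def complete(self, process):
        self.running.discard(process)

    def begin_shutdown(self):
        # Running processes are left alone; only new work is refused.
        self.accepting = False

    def restart(self):
        self.accepting = True

    @property
    def is_drained(self):
        # Safe to take the machine down once nothing is running any more.
        return not self.accepting and not self.running


def dispatch(servers, process):
    """Start the process on the first server that still accepts work."""
    for server in servers:
        if server.accepting:
            server.start(process)
            return server.name
    raise RuntimeError("no process server is accepting work")


ps1, ps2 = ProcessServer("ps1"), ProcessServer("ps2")
servers = [ps1, ps2]

first = dispatch(servers, "a")      # lands on ps1
ps1.begin_shutdown()                # planned work is about to begin
drained_early = ps1.is_drained      # "a" is still running on ps1
second = dispatch(servers, "b")     # goes to ps2 instead
ps1.complete("a")
drained = ps1.is_drained            # ps1 can now be taken down
ps1.restart()
third = dispatch(servers, "c")      # ps1 accepts work again
```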
In an unplanned scenario, the process server is unreachable. This can be due to a network failure or a hardware failure.
The following figure illustrates an unplanned, network-related down-time scenario.
In the case of a network failure, processes continue to execute and run to completion on the machine. The process server then attempts to notify the central system that the processes have completed. If the notification fails, the process server keeps retrying it.
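The retry behavior can be illustrated with a small loop. This is a sketch under stated assumptions: the function name, the fixed delay between attempts, and the use of `ConnectionError` to signal an unreachable central system are all hypothetical. The text says only that the notification is retried until it succeeds.

```python
import time


def notify_until_delivered(send, process_id, delay_seconds=5.0, sleep=time.sleep):
    """Call send(process_id) until it succeeds; return the number of attempts.

    `send` is a hypothetical callable that raises ConnectionError while the
    central system is unreachable. The fixed delay is an assumption.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            send(process_id)
            return attempts
        except ConnectionError:
            sleep(delay_seconds)  # wait, then try again


# Simulate a network that fails twice and then recovers.
failures = iter([True, True, False])


def flaky_send(process_id):
    if next(failures):
        raise ConnectionError("central system unreachable")


delays = []  # records each wait instead of actually sleeping
attempts = notify_until_delivered(
    flaky_send, "job-42", delay_seconds=1.0, sleep=delays.append
)
```

Passing `sleep` as a parameter keeps the example fast to run. A real process server would more likely wait between attempts and might lengthen the delay over time.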
The following figure illustrates an unplanned, system-related down-time scenario.
In the case of a machine failure, processes on the machine are set to the UNKNOWN status to indicate that their results may not be reliable (since the machine failed).
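As a rough illustration of this status change, the sketch below marks the processes of a failed machine as UNKNOWN. Only the UNKNOWN status comes from the text. The other status names, the record layout, and the choice to change only processes that were still running are assumptions.

```python
from enum import Enum


class Status(Enum):
    RUNNING = "RUNNING"      # assumed status name
    COMPLETED = "COMPLETED"  # assumed status name
    UNKNOWN = "UNKNOWN"      # status named in the text


def mark_machine_failed(processes, failed_machine):
    """Set running processes on the failed machine to UNKNOWN.

    Returns the ids of the affected processes. Leaving finished processes
    untouched is an assumption of this sketch.
    """
    affected = []
    for proc in processes:
        if proc["machine"] == failed_machine and proc["status"] is Status.RUNNING:
            proc["status"] = Status.UNKNOWN
            affected.append(proc["id"])
    return affected


processes = [
    {"id": 1, "machine": "host-a", "status": Status.RUNNING},
    {"id": 2, "machine": "host-a", "status": Status.COMPLETED},
    {"id": 3, "machine": "host-b", "status": Status.RUNNING},
]
affected = mark_machine_failed(processes, "host-a")
```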