Tymeac Tuning

Tuning is a full-time job for professionals and involves every parameter of the operating system, Java, applications, and network.

This section is not a "how to tune" manual, but a "what does Tymeac do, and therefore what should I look for in tuning" guide. Two parameters applicable to Tymeac are Threads and Wait Lists.

Threads

Tymeac runs on all processors that support Java. The RMI Servers run on all processors that support RMI. However, not all RMI Servers work identically on all platforms for all releases of RMI or operating systems.

Tymeac manages threads. Tymeac handles synchronization, deadlock avoidance, and the blocking and runnable states of threads. Tymeac insulates the Tymeac Client from thread management.

The Queue Maintenance Class (TyQueMaint) or User Classes (TyUserQueues) define the number of threads for each Queue.

Tymeac may start all threads for a Queue at Tymeac start up when indicated. Otherwise, Tymeac instantiates and starts each thread when necessary.

Tymeac differs from the traditional thread pool in its ability to thoroughly control the threading environment.

Three big problems with basic thread pools are:
    1. When to start/terminate a thread.
    2. How to control how many threads are executing at any one time.
    3. How to determine when a thread stalls.

Let's say there are three threads in a pool.

  1. Normally you start all three threads at system startup. The threads remain alive until the system shuts down. Each Java thread maps to an operating system thread. During light activity for the queue, there is no way to reduce the load on the total system.

The other way is to start each thread when you put a request into a queue (lazy initialization) up until the maximum. Each thread reduces the load on the queue until there is no more work, at which time it terminates. While this reduces the thread load on the system, it results in a large thread create/destroy overhead for a heavily used system.

Tymeac solves this problem with lazy initialization and the ability to reuse or selectively terminate threads during light activity.

  2. The threads block until there is work. If there is one request in the queue, then one thread is working and the other two are blocking. When there are three or more requests in the queue, then all threads are working.

    There is no way to control the number of threads working when there is more than one request in the queue. When a thread can process a request in a short time period, then using more than one thread for a light load is a waste of resources. Additionally, since threads compete with each other for cycles, storage etc., unnecessary threads may slow down the overall throughput.

Tymeac solves this problem with the inactive status and new thread thresholds (see below).

  3. When problems arise in a thread (deadlocks, exceptions, errors), the thread stalls. That is, it cannot continue executing without intervention. There is no way to internally determine when a deadlock is present. There is no automatic error recovery and restart. The best one can hope for is that someone notices that the work is not getting done.

Tymeac solves this problem by timing each status in the life of a thread, automatically recovering whenever possible and notifying administrators when there is a hard stall.

While the Thread Pool... Classes in java.util.concurrent address the first problem, they don't handle problems two and three. Additionally, they don't adequately deal with priority queues and thresholds.


A thread waits (blocks) until it is needed (runnable). When necessary, Tymeac gets a new instance of the thread (that is, allocates it) and starts it with start(). Active and inactive are Tymeac thread status codes. Initially, all threads are inactive.

Tymeac first activates a thread when there is work for the Queue.

When a thread has no work it enters a status of Waiting for work.

If a new request comes into the Queue, then Tymeac notifies the first thread with this status that there is work. The thread's status becomes Notified, awaiting execution.

As soon as Java time slices the thread it changes its status to Thread processing. Since every request goes into a Wait List, this newly notified thread may not be the thread to process the request. Any currently working thread in the Queue may fetch the request. In any case, when the thread has no work it re-enters a status of Waiting for work.

When the time for waiting expires (that is, the Wait Time Element in Queue Maintenance, TyQueMaint), the thread sets its status to Inactive. The thread is alive but blocking and no longer participates in scheduling until reactivated as a result of a Threshold exception, below. (The Tymeac Monitor may destroy a thread after its Idle Thread Life time expires.)

See the full life cycle of a thread, below.

Tymeac always reactivates a thread when there is an overflow in the wait lists. An overflow is when the request does not fit into the desired wait list because that wait list is full and the request must go into the next, higher wait list.

The New Thread Thresholds discussion (TyQueMaint) describes the algorithms Tymeac uses to determine when it may reactivate a thread when there is no overflow. What that discussion does not cover is the threshold scan delay.

The threshold scan delay is a two second delay Tymeac uses between starting a new thread and next scanning the wait lists for threshold values. This delay is to avoid unnecessarily starting too many threads when there is a flood of requests. For example:

Let's say the individual threshold percent for a Queue is 60%. Therefore, when the number of pending requests in a wait list exceeds 60%, Tymeac starts a new thread.

Currently the load on the Queue is 50% but then a burst of requests comes in increasing the load well past 60%. Tymeac starts a new thread when the first new request exceeds 60% but does not scan the wait lists again for two seconds. This gives the newly started thread a chance to reduce the load before more threads enter the system.
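
The burst example above can be sketched in Java. This is only an illustration of the idea, not Tymeac's code: the class ThresholdScanner, its fields and the shouldStartThread method are hypothetical names, and the sketch assumes a single wait list whose load is measured as pending entries against a fixed capacity.

```java
// Hypothetical sketch of the threshold scan delay. Class and method names are
// illustrative assumptions, not part of the Tymeac API.
final class ThresholdScanner {
    private final int thresholdPercent;  // individual threshold, for example 60
    private final long scanDelayMillis;  // system-wide scan delay, for example 2000
    private long lastStartMillis;        // time the last new thread was started
    private boolean startedOnce;         // false until the first thread start

    ThresholdScanner(int thresholdPercent, long scanDelayMillis) {
        this.thresholdPercent = thresholdPercent;
        this.scanDelayMillis = scanDelayMillis;
    }

    // Returns true when a new thread should be started for a wait list that
    // holds `pending` of `capacity` entries at time `nowMillis`.
    boolean shouldStartThread(int pending, int capacity, long nowMillis) {
        if (startedOnce && nowMillis - lastStartMillis < scanDelayMillis) {
            return false;  // still inside the scan delay: do not scan again yet
        }
        if (pending * 100L > (long) thresholdPercent * capacity) {
            startedOnce = true;
            lastStartMillis = nowMillis;
            return true;   // load exceeds the threshold: start one thread
        }
        return false;
    }
}
```

With a 60% threshold and a 2000 millisecond delay, a request that pushes the load to 70% starts a thread, while further requests arriving during the next two seconds do not, however high the load climbs in that time.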

The scan delay value is system-wide, not part of each individual Queue, because too many variables make tuning a nightmare, especially across multiple operating systems.
See below, in Wait Lists (fixed number of entries within each Queue).
See also the Monitor (interval of the timed events).


Maximum Active Threads (MAT) gives little warning of a problem. This is not the total number of threads defined within Java but the total number of threads within Java actively processing. MAT is our way of saying the processor is out of gas.

Threads, whether operating system threads or logical threads, compete with each other within a Tymeac Queue, with other threads in Java, and with other processes in the Box. Sometimes adding more threads slows down overall processing because of resource limits (memory, cycles, network, locks, blocking, etc.).

Hence, there is a very close correlation between the number of threads for each Queue and the Maximum Active Threads for the system.

This is the purpose of inactivating a Tymeac thread. While Inactive, and alive, the thread is not looking for work and Java does not time slice the thread. Tymeac Scheduling does not directly select an Inactive thread for a client request. Therefore, the thread uses a minimum of resources and does not compete with other threads.

An example is helpful.

A Queue has three threads. These we will call, #0, #1, and #2.

After Tymeac start up, no threads are active.

When the first request comes in for this Queue, Tymeac places it in a Wait List and activates thread #0.

More, and more, and more requests come in for this Queue. Tymeac places the requests into Wait Lists. Thread #0 processes these requests, without competition from other threads.

As the Wait Lists fill up, Tymeac checks the New Thread Thresholds for this Queue. When the Queue exceeds a threshold, Tymeac activates thread #1.

More, and more, and more requests come in for this Queue. Tymeac places the requests into Wait Lists. Threads #0 and #1 process these requests.

As the Wait Lists become empty and the threads are no longer busy, the threads wait for new work.  The status is Waiting for Work. Waiting threads no longer look for work themselves. They are blocking. These waiting threads never get time sliced until the Tymeac Scheduler notifies them (runnable).

When new requests come in, Tymeac notifies the first thread in the list, thread #0. As long as thread #0 finishes its work before a new request comes in then Tymeac never notifies thread #1. Thread #0 waits and Tymeac notifies it again.

When the Wait Time expires, thread #1 becomes inactive. A threshold exception must take place before thread #1 can participate again.

As more, and more, and more requests come in for this Queue, Tymeac places the requests into Wait Lists. Thread #0 processes these requests, without competition from other threads.

When the Wait Lists are empty and no new requests come in, then thread #0's Wait Time expires and it becomes inactive. When the next request comes in, Tymeac reactivates thread #0.

The Idle Thread Life parameter in the Queue determines when the Tymeac Monitor destroys a thread. Specifying a time of zero means Tymeac ignores this parameter. Otherwise, when the thread remains inactive for "Idle Thread Life" seconds, the Tymeac Monitor, when next it runs, destroys the thread.
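
As a minimal sketch of that rule, the check the Monitor makes could look like the following. The class and method names are hypothetical, and the sketch assumes the Monitor knows how many seconds a thread has been inactive.

```java
// Hypothetical sketch of the Idle Thread Life rule; not Tymeac source code.
final class IdleThreadLife {
    // Returns true when the Monitor should destroy the thread on this run.
    // A configured value of zero turns the check off.
    static boolean shouldDestroy(long idleThreadLifeSeconds, long inactiveSeconds) {
        return idleThreadLifeSeconds > 0 && inactiveSeconds >= idleThreadLifeSeconds;
    }
}
```

Because the check only happens when the Monitor runs, the thread may live somewhat longer than the configured time.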

The Thread Display/Purge/Enable Class, TyQueThd, is available to monitor the status of each Queue's threads.

Playing nice
Why go through all this trouble? It's called playing nice.

Tymeac Functions play nice within Tymeac. Each Tymeac Queue uses the minimum amount of resources and returns the resource as soon as possible. We assume there are other Tymeac clients using different Tymeac Functions. When one client hogs the resources, everyone feels the pain.

Tymeac Functions play nice within the Box. We assume there are other applications executing in the Box besides us.


The Shut Down and On Request Classes write statistics to the Statistics repository. These statistics are useful in analyzing performance:

Number Processed, Number Activated, Number of Waits, Number Notified, Number Instantiated.

Number Processed
The algorithm for thread notification is 'first available', so usage is usually heaviest at the top. However, what may appear as lopsided usage in no way implies that something is wrong. The status of higher order threads may be 'in use' so that lower order threads 'are notified' more often. The application process of some threads may take longer than others. The Number Processed by a thread is, in itself, of little importance.

The sum of all threads' Number Processed is the total requests processed by the Queue.

Number Activated
A Number Activated of zero is significant. A prudent reserve is always wise and business rules change daily. However, after observing a zero usage over time, eliminating some never used threads reduces storage usage.

Number of Waits, Number Notified
The Number of Waits and Number Notified are significant. 

The Number of Waits is the number of time-outs (that is, the wait interval expired with no new work). The thread sets its status to Waiting for work and issues a wait. If Tymeac notifies the thread (Number Notified) at any time before the wait interval expires, then there is no time-out.

When the Number Notified is significantly lower than the Number of Waits, then the Wait Time may be too low.
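
One way to turn this observation into a check is sketched below. The text does not say how large "significantly lower" is, so the factor parameter is an assumption you would choose yourself; the class and method names are also hypothetical.

```java
// Hypothetical heuristic built on the two statistics; not part of Tymeac.
final class WaitTimeCheck {
    // Returns true when Number of Waits exceeds Number Notified by more than
    // `factor` times. The factor is an assumed cut-off, for example 2.0.
    static boolean waitTimeMayBeTooLow(long numberNotified, long numberOfWaits, double factor) {
        return numberOfWaits > 0 && numberNotified * factor < numberOfWaits;
    }
}
```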

Number Instantiated
The Number Instantiated is the total new instances of a thread. This number is only significant when taken together with the interval the Tymeac Server is executing. For example, if the number is 3,000 and the interval for the Tymeac Server is three (3) days, then this is good. However, if the interval is three (3) hours, then the Queue parameter, Idle Thread Life is probably too small.
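
The comparison above is easier to see as a rate. The sketch below simply divides the Number Instantiated by the hours the Tymeac Server has been running; the class and method names are hypothetical.

```java
// Hypothetical helper for reading the Number Instantiated statistic.
final class InstantiationRate {
    // New thread instances per hour of server run time.
    static double perHour(long numberInstantiated, double serverUpHours) {
        if (serverUpHours <= 0) {
            throw new IllegalArgumentException("serverUpHours must be positive");
        }
        return numberInstantiated / serverUpHours;
    }
}
```

For the example in the text, 3,000 instances over three days (72 hours) is roughly 42 per hour, while 3,000 instances over three hours is 1,000 per hour.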

Once again, this is not a "how to tune" manual, but a "what does Tymeac do, and therefore what should I look for in tuning" guide.

The Queue Update Class (TyQueData) is a way to experiment without making permanent (DBMS/User Class) changes. The New Thread Thresholds and Wait Interval are alterable, and changes take effect immediately.


Thread Life Cycle

Tymeac Start-up creates the Queues. Each thread entry within the Queue is initially set with a status of Never Used. Whether there is a physical Java thread attached depends on the start up option in the Queue definition. The Queue Thread Display/Alter indicates the presence of a physical Java thread as 'existent', and the absence of a physical Java thread as 'null'.

When the start up option is to start all threads, Tymeac instantiates all threads in the Queue and calls start() on each. The status remains as Never Used. However, during display of the Queue's threads, the suffix changes from 'null' to 'existent'.

When the thread life option in the Queue definition has a value greater than zero, the Tymeac Monitor destroys the thread when the interval expires. The status remains as Never Used. However, during display of the Queue's threads, the suffix changes from 'existent' to 'null'.

The first request for a Queue activates the first thread in the Queue. If no physical thread exists, Tymeac instantiates and starts the Java thread. The status changes from Never Used to Reactivated, awaiting execution.  This new status is a timed event. If this status remains for an unreasonable amount of time, the Tymeac Monitor flags the thread as a possible problem. Other threads may become active as a result of Wait List overflows or Threshold exceptions.

Work Cycle:

When the thread gets execution time, it changes its status to Thread Processing. The thread is executing code such as fetching a request from a Wait List, preparing to invoke the application method, or processing the output from the application. This new status is a timed event. If this status remains for an unreasonable amount of time, the Tymeac Monitor flags the thread as a possible problem.

If the thread successfully fetches a request from a Wait List (since every request goes into a Wait List, this newly notified thread may not be the thread that processes the request; any currently working thread in the Queue may fetch it):

  • it changes its status to In application class and invokes the Processing Application Class. This new status is a timed event. If this status remains for an unreasonable amount of time, the Tymeac Monitor flags the thread as a possible problem.
  • Upon return from the Processing Application Class, the thread changes its status to Thread Processing. The thread saves the Processing Application's output Object, when necessary.
  • If this is an Asynchronous Request with an Output Agent Queue, and this is the final Queue in the request, then the thread changes its status to Scheduling Output Agent and schedules the Output Agent Queue. This new status is a timed event. If this status remains for an unreasonable amount of time, the Tymeac Monitor flags the thread as a possible problem.
  • Upon return from scheduling the Output Agent Queue, the thread changes its status to Thread Processing and attempts to fetch the next request from a Wait List.

When no work remains in any Wait List, the thread changes its status to Waiting for work. This new status is optionally timed.

If any new work comes in, this thread is immediately available to process that request. The thread changes its status to Notified, awaiting execution. This new status is a timed event. If this status remains for an unreasonable amount of time, the Tymeac Monitor flags the thread as a possible problem. The work cycle, above, repeats.

If no new work comes in and
this Queue is participating in the optional timing of waiting threads and
that time elapses,
then the thread changes its status to Inactive. This is a timeout event. Tymeac keeps statistics on the life cycle of threads.

If new work comes in, this thread may only participate in processing when:

  • There are no other threads working in the Queue,
  • the placement of a request into a Wait List results in an overflow, or
  • a Wait List exceeds a Threshold limit.

When necessary, the new request activates the thread. If no physical thread exists, Tymeac instantiates and starts the Java thread. The status changes to Reactivated, awaiting execution. The above, work cycle, repeats.

If no new work comes in and
the Queue is participating in the optional timing of inactive threads, and
that time elapses
then, when the Tymeac Monitor next runs, it destroys the physical thread.

When the next request comes into this Queue, that request activates the thread. The status changes to Reactivated, awaiting execution. This new status is a timed event. If this status remains for an unreasonable amount of time, the Tymeac Monitor flags the thread as a possible problem. The above, work cycle, repeats.
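
The life cycle described above can be summarized as a small state table. The sketch below is a reading of this section, not Tymeac's source code: the enum ThreadStatus, its constant names and both methods are hypothetical, and only the status names themselves come from the text. The Cancelled and Disabled codes are left out because the Monitor sets them.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of the status codes in this section; not Tymeac source code.
enum ThreadStatus {
    NEVER_USED, REACTIVATED, PROCESSING, IN_APPLICATION,
    SCHEDULING_OUTPUT_AGENT, WAITING_FOR_WORK, NOTIFIED, INACTIVE;

    // Statuses that may follow this one in the normal life cycle described above.
    Set<ThreadStatus> next() {
        switch (this) {
            case NEVER_USED:
            case INACTIVE:
                return EnumSet.of(REACTIVATED);
            case REACTIVATED:
            case NOTIFIED:
            case IN_APPLICATION:
            case SCHEDULING_OUTPUT_AGENT:
                return EnumSet.of(PROCESSING);
            case PROCESSING:
                return EnumSet.of(IN_APPLICATION, SCHEDULING_OUTPUT_AGENT, WAITING_FOR_WORK);
            case WAITING_FOR_WORK:
                return EnumSet.of(NOTIFIED, INACTIVE);
            default:
                throw new AssertionError(this);
        }
    }

    // True for statuses the text calls timed events. Waiting for work is
    // only optionally timed, so it is not included here.
    boolean alwaysTimed() {
        return this == REACTIVATED || this == NOTIFIED || this == PROCESSING
            || this == IN_APPLICATION || this == SCHEDULING_OUTPUT_AGENT;
    }
}
```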

Flagged Threads:

There are two stages of a thread's life cycle that denote a problem. These are status codes of Disabled and Cancelled. The section, "How a thread's status becomes disabled", discusses the Disabled status code.

The Cancelled status code results from an excessive amount of time in a state of "processing" or "about to process". The problem with threads, in any environment, is: What are they doing? Are they alive? To address this problem, Tymeac times certain events. When the thread exceeds one of these time limits, the Tymeac Monitor changes its status to Cancelled. On the next cycle of the Tymeac Monitor, all Cancelled threads are set to Disabled.

These Cancelled or Disabled status codes have no effect on the thread. If a thread is just slow, then it changes its own status according to its processing.
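
The two-stage flagging can be sketched as one function the Monitor applies to each thread on every cycle. The names MonitorPass, Flag and next are hypothetical, and the sketch leaves out the case where a slow thread later overwrites the flag with its own status.

```java
// Hypothetical sketch of the Monitor's two-stage flagging; not Tymeac source code.
final class MonitorPass {
    enum Flag { OK, CANCELLED, DISABLED }

    // One Monitor cycle for one thread. `secondsInStatus` is how long the thread
    // has been in its current timed status; `limitSeconds` is that status's limit.
    static Flag next(Flag current, long secondsInStatus, long limitSeconds) {
        switch (current) {
            case CANCELLED:
                return Flag.DISABLED;  // second stage, on the cycle after Cancelled
            case DISABLED:
                return Flag.DISABLED;  // stays Disabled in this simplified model
            default:
                return secondsInStatus > limitSeconds ? Flag.CANCELLED : Flag.OK;
        }
    }
}
```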

These status codes have an effect on scheduling an Inactive thread. Recall from above, that when new work comes in, the Inactive thread may only participate in processing when:

  • There are no other threads working in the Queue,
  • the placement of a request into a Wait List results in an overflow, or
  • a Wait List exceeds a Threshold limit.

Cancelled or Disabled status code threads are not considered working nor are they considered available to activate. Therefore, they are non-participants in the Queue.
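
The activation rule, including the way Cancelled and Disabled threads drop out of it, can be sketched as follows. This is a simplified model under stated assumptions: the class ActivationRule and its Status enum are hypothetical, and every "working" status is collapsed into a single WORKING value.

```java
import java.util.List;

// Hypothetical sketch of when an Inactive thread may be activated; not Tymeac source code.
final class ActivationRule {
    enum Status { WORKING, INACTIVE, CANCELLED, DISABLED }

    // Returns the index of the Inactive thread to activate for a new request,
    // or -1 when no thread should be activated.
    static int threadToActivate(List<Status> threads, boolean overflow, boolean thresholdExceeded) {
        boolean anyWorking = threads.contains(Status.WORKING);
        if (anyWorking && !overflow && !thresholdExceeded) {
            return -1;  // a working thread is expected to pick up the request
        }
        // Cancelled and Disabled threads are neither working nor available,
        // so only the first Inactive thread qualifies.
        return threads.indexOf(Status.INACTIVE);
    }
}
```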

Consider the following:

A Queue has three threads. These we will call, #0, #1, and #2.

When the first request comes in for this Queue, Tymeac places it in a Wait List and activates thread #0.

The Processing Application Class develops a 'hanging' problem from which it never recovers. It may be waiting for another resource to become available, etc.

Tymeac considers this thread "working". As new requests come in, Tymeac places each request in a Wait List. As long as no overflow occurs and no Threshold limit is exceeded, it is reasonable to assume that thread #0 will process the new request when it finishes with the current one. Therefore, Tymeac does not activate a new thread.

Thread #0 exceeds a time limit. The Tymeac Monitor sets the status to Cancelled. If the request thread #0 is working on is an Asynchronous Request, the Tymeac Monitor places an entry in the Stall Array for this request.

As new requests come in, Tymeac places each request in a Wait List. Since no threads are considered "Working", Tymeac activates thread #1. As long as thread #1 does not 'hang' as well, it processes the requests normally.

On the next cycle of the Tymeac Monitor, it sets the status of thread #0 to Disabled and may send a message to the Notification Queue.

Wait Lists

Each Queue must have at least one Wait List. The Queue Maintenance Class (TyQueMaint) or User Classes (TyUserQueues) define the number of Wait Lists and number of physical and logical entries available in each Wait List. All Wait Lists use FIFO processing. Tymeac Server, when activating a new thread, always puts the request into a Wait List. This is so that the first thread looking for work may process the request rather than making the request wait for the new thread to become active.

The request (TymeacParm) specifies the priority of the request. Tymeac puts it into its respective Wait List. Priority 3 goes into Wait List 3, Priority 1 goes into Wait List 1.

Wait Lists need not be priority lists. Using two or more Wait Lists when the request always specifies priority 1 is an overflow technique. See the discussion in the Queue Maintenance Class (TyQueMaint) for how to use these Wait Lists and the problems of such.

Tymeac attempts to put the request into the Wait List corresponding to its priority. However, when that Wait List is full, Tymeac puts the request into the next higher Wait List. This is a primary overflow. When that Wait List is also full, Tymeac puts the request into the next higher Wait List. This is a secondary overflow. Overflows are handled errors.
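
The placement rule can be sketched as a loop over the Wait Lists, starting at the one that matches the priority. This is an illustration only: the class WaitListPlacement, its Result type and the place method are hypothetical names, every hop past the first is counted as a secondary overflow, and the "none" result for the case where every list is full is an assumption, since the text does not describe that case.

```java
// Hypothetical sketch of Wait List placement with overflow; not Tymeac source code.
final class WaitListPlacement {
    // Outcome of placing one request.
    static final class Result {
        final int waitList;  // 1-based Wait List that took the request, or -1
        final String kind;   // "direct", "primary", "secondary" or "none"

        Result(int waitList, String kind) {
            this.waitList = waitList;
            this.kind = kind;
        }
    }

    // `pending[i]` is the number of requests waiting in Wait List i + 1.
    // All Wait Lists in a Queue share the same fixed `capacity`.
    static Result place(int priority, int[] pending, int capacity) {
        for (int list = priority; list <= pending.length; list++) {
            if (pending[list - 1] < capacity) {
                pending[list - 1]++;
                int hops = list - priority;
                String kind = hops == 0 ? "direct" : hops == 1 ? "primary" : "secondary";
                return new Result(list, kind);
            }
        }
        return new Result(-1, "none");  // assumed outcome when every list is full
    }
}
```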

All Wait Lists for a Queue have a fixed number of entries taken from the Queue Maintenance Class. Using a variable number for each Wait List makes tuning extremely difficult.

Circumstances change day to day. In reality, nobody monitors any system with extreme diligence. Using a variable number of entries for each Wait List means that similar Queues may not reflect similar behavior. When conditions change, the algorithm for determining what is a problem must be unique for the number of entries factor. The fixed number makes the calculation simple, affords an at-a-glance picture with numbers and charts, and a meaningful assumption about similar Queues is possible.

See also the note in the discussion of Disabled Queues.

The Queue Elements Display/Alter Class (TyQueData) permits modifying the number of entries in Wait Lists during execution. This gives you another tool with which to tune performance.

The statistics show the high water mark, primary and secondary overflows at Shut Down and On Request. Each Wait List is listed starting at 1:

Times Used
Just what it says.

Highest Used
High water mark for number of pending requests at any time. When this number is close to the physical number of entries in a Wait List, then it may be time to increase the physical number of slots.

Times Reset
During a back-out as the result of a scheduling failure, time-out or cancel -- the number of requests removed from this Wait List.

Times overflowed primary
When a request does not fit in the desired Wait List, it overflows into the next Wait List. Overflow is a handled error. If this number is high, make the physical number of entries in the Wait List higher.

Times overflowed secondary
When a request does not fit in the desired Wait List, it overflows into the next Wait List. However, when that Wait List is also full, the request overflows into a subsequent Wait List. Overflow is a handled error. If this number is high, then the physical number of entries in the Wait List is much too low.

One may use the statistics to observe a trend before it becomes a problem (high water mark.) One may increase or decrease the number of Wait Lists and/or entries available for Wait Lists.

See also the Queue statistics for balancing the number of logical slots in Wait Lists:

  • Total times reached Overall Threshold
  • Total times reached Individual Threshold
  • Total times reached Weighted Average Threshold

 

 

© 1998 - 2008 Cooperative Software Systems, Inc.  All rights reserved.