Tymeac Tuning
Tuning is a full-time job for professionals and involves every parameter
of the operating system, Java, applications, and network.
This section is not a "how to tune" manual, but a "what does Tymeac do and,
therefore, what should I look for in tuning" guide. Two parameters specific to
Tymeac are Threads and Wait Lists.
Threads
Tymeac runs on all processors that support Java. The RMI Servers run on all
processors that support RMI. However, not all RMI
Servers work identically on all platforms for all releases of RMI or operating
systems.
Tymeac manages threads. Tymeac handles synchronization, deadlock avoidance, and the
blocking/runnable transitions for threads. Tymeac insulates the Tymeac Client from thread
management.
The Queue Maintenance Class (TyQueMaint) or User Classes
(TyUserQueues) define the number of threads for each Queue.
Tymeac may start all threads for a Queue at Tymeac start up when indicated. Otherwise, Tymeac instantiates and starts
each thread when necessary.
Tymeac differs from the traditional thread pool in its ability to thoroughly
control the threading environment.
Three big problems with basic thread pools are:
1. When to start/terminate a thread.
2. How to control how many threads are executing at any one time.
3. How to determine when a thread stalls.
Let's say there are three threads in a pool.
- Normally you start all three threads at system startup. The threads remain alive until
the system shuts down. Each Java thread maps to an operating system thread. During light
activity for the queue, there is no way to reduce the load on the total system.
The other way is to start each thread when you put a request into a queue (lazy
initialization) up until the maximum. Each thread reduces the load on the queue until
there is no more work, at which time it terminates. While this reduces the thread load on
the system, it results in a large thread create/destroy overhead for a heavily used
system.
Tymeac solves this problem with lazy initialization and the ability to reuse or
selectively terminate threads during light activity.
- The threads block until there is work. If there is one request in the
queue, then one thread is working and the other two are blocking. When there are three or
more requests in the queue, then all threads are working.
There is no way to control the number of threads working when there is more than one
request in the queue. When a thread can process a request in a short time period, then
using more than one thread for a light load is a waste of resources. Additionally, since
threads compete with each other for cycles, storage etc., unnecessary threads may slow
down the overall throughput.
Tymeac solves this problem with the inactive status and new thread
thresholds (see below).
- When problems arise in a thread (deadlocks, exceptions, errors), the thread stalls. That
is, it cannot continue executing without intervention. There is no way to internally
determine when a deadlock is present. There is no automatic error recovery and restart.
The best one can hope for is that someone notices that the work is not getting done.
Tymeac solves this problem by timing each status in the life of a thread,
automatically recovering whenever possible and notifying administrators when there is a
hard stall.
While the Thread Pool... Classes in java.util.concurrent address the first
problem, they don't handle problems two and three. Additionally, they don't
adequately deal with priority queues and thresholds.
A thread waits (blocks) until it is needed (runnable). When necessary, Tymeac gets
a new instance of the thread (i.e., allocates it) and starts it. Active and inactive
are Tymeac thread status codes. Initially, all
threads are inactive.
Tymeac first activates a thread when there is
work for the Queue.
When a thread has no work it enters a status of Waiting for work.
If a new request comes into the Queue, then Tymeac notifies the first thread with this
status that there is work. The thread's status becomes Notified, awaiting
execution.
As soon as Java time slices the thread it changes its status to Thread
processing. Since every request goes into a Wait List, this newly notified thread may
not be the thread to process the request. Any currently working thread in the Queue
may fetch the request. In any case, when the thread has no work it re-enters a status of Waiting
for work.
When the time for waiting expires (that is, the Wait Time Element in Queue Maintenance, TyQueMaint), the thread sets its status to Inactive. The
thread is alive but blocking and is no longer participating in scheduling until
reactivated as a result of a Threshold exception, below. (The Tymeac Monitor may
destroy a thread after its Idle Thread Life time
expires.)
See the full life cycle of a thread, below.
Tymeac always reactivates a thread when there is an overflow in the wait lists. An
overflow is when the request does not fit into the desired wait list because that wait
list is full and the request must go into the next, higher wait list.
The New Thread Thresholds section (TyQueMaint) describes the
algorithms Tymeac uses to determine when it may reactivate a thread when there is no
overflow. That section does not, however, cover the threshold scan delay.
The threshold scan delay is a two-second delay that Tymeac observes between
starting a new thread and next scanning the wait lists for threshold values. This delay
avoids unnecessarily starting too many threads when there is a flood of requests. For
example:
Let's say the individual threshold percent for a Queue is 60%. Therefore, when the
number of pending requests in a wait list exceeds 60%, Tymeac starts a new thread.
Currently the load on the Queue is 50% but then a burst of requests comes in increasing
the load well past 60%. Tymeac starts a new thread when the first new request exceeds 60%
but does not scan the wait lists again for two seconds. This gives the newly started
thread a chance to reduce the load before more threads enter the system.
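The scan delay in this example can be sketched in Java as follows. This is only an
illustration under stated assumptions, not Tymeac's implementation: the class
ThresholdScanGate, its method, and the caller-supplied millisecond clock are hypothetical
names. Only the 60% threshold and the two-second delay come from the text above.

```java
// Hypothetical sketch; not Tymeac source. All names are illustrative only.
class ThresholdScanGate {
    private final double thresholdPercent; // e.g. 60.0 for a 60% individual threshold
    private final long scanDelayMillis;    // e.g. 2000 for the two-second scan delay
    private boolean threadStarted = false;
    private long lastStartMillis = 0L;

    ThresholdScanGate(double thresholdPercent, long scanDelayMillis) {
        this.thresholdPercent = thresholdPercent;
        this.scanDelayMillis = scanDelayMillis;
    }

    /** Returns true when this load reading should start a new thread. */
    boolean shouldStartThread(int pending, int capacity, long nowMillis) {
        // Inside the scan delay: do not scan the wait list, whatever the load is.
        if (threadStarted && nowMillis - lastStartMillis < scanDelayMillis) {
            return false;
        }
        double loadPercent = 100.0 * pending / capacity;
        if (loadPercent > thresholdPercent) {
            threadStarted = true;
            lastStartMillis = nowMillis;
            return true;
        }
        return false;
    }
}
```

With a 60% threshold and a 2000 ms delay, a reading of 70% starts a thread, readings
during the following two seconds are ignored, and the first reading after the delay is
evaluated against the threshold again.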
The scan delay value is system wide, not part of each individual Queue. This is because
too many variables can make tuning a nightmare, especially across
multiple operating systems.
See below, in Wait Lists (fixed number of entries within each Queue).
See also the Monitor (interval of the timed events).
Maximum Active Threads
gives little warning of a problem. This is not the total number of threads defined
within Java but the total number of threads within Java actively processing.
MAT (Maximum Active Threads) is our way of saying the processor is out of gas.
Threads, whether operating system threads or logical threads, compete with each other
within a Tymeac Queue, with other threads in Java, and with other processes in the
Box. Sometimes adding more threads slows down overall processing because of resource
limits (memory, cycles, network, locks, blocking, etc.).
Hence, there is a very close correlation between the number of threads for each Queue
and the Maximum Active Threads for the
system.
This is the purpose of inactivating a Tymeac thread. While Inactive, and
alive, the thread is not looking for work and Java does not time slice the
thread. Tymeac Scheduling does not directly select an Inactive thread for a
client request. Therefore, the thread uses a minimum of resources and does not
compete with other threads.
An example is helpful.
A Queue has three threads. These we will call, #0, #1, and #2.
After Tymeac start up, no threads are active.
When the first request comes in for this Queue, Tymeac places it in a Wait List and
activates thread #0.
More, and more, and more requests come in for this Queue. Tymeac places the requests
into Wait Lists. Thread #0 processes these requests, without competition from other
threads.
As the Wait Lists fill up, Tymeac checks the New Thread Thresholds for this
Queue. When the Queue exceeds a threshold, Tymeac activates thread #1.
More, and more, and more requests come in for this Queue. Tymeac places the
requests into Wait Lists. Threads #0 and #1 process these requests.
As the Wait Lists become empty and the threads are no longer busy, the threads wait for
new work. The status is Waiting for Work. Waiting threads no longer look
for work themselves. They are blocking. These waiting threads never get time sliced
until the Tymeac Scheduler notifies them (runnable).
When new requests come in, Tymeac notifies the first thread in the list, thread #0. As
long as thread #0 finishes its work before a new request comes in then Tymeac never
notifies thread #1. Thread #0 waits and Tymeac notifies it again.
When the Wait Time expires, thread #1 becomes inactive. A threshold
exception must take place before thread #1 can participate again.
More, and more, and more requests come in for this Queue. Tymeac places the
requests into Wait Lists. Thread #0 processes these requests, without competition from
other threads.
When the Wait Lists are empty and no new requests come in, then thread #0's Wait Time
expires and it becomes inactive. When the next request comes in, Tymeac
reactivates thread #0.
The Idle Thread Life parameter in the Queue
determines when the Tymeac Monitor destroys a thread. Specifying a time of zero ignores
this parameter. Otherwise, when the thread remains inactive for "Idle
Thread Life" seconds, the Tymeac Monitor, when next it runs, destroys the
thread.
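A minimal sketch of that rule, assuming the Monitor knows how long each thread has been
inactive. The class and method names are hypothetical and are not part of Tymeac.

```java
// Hypothetical sketch; not Tymeac source.
class IdleThreadLife {
    /**
     * Decides whether a Monitor pass should destroy an inactive thread.
     * An Idle Thread Life of zero means the parameter is ignored.
     */
    static boolean shouldDestroy(long idleThreadLifeSeconds, long inactiveSeconds) {
        if (idleThreadLifeSeconds == 0) {
            return false; // parameter ignored: the thread is never destroyed for idleness
        }
        return inactiveSeconds >= idleThreadLifeSeconds;
    }
}
```

Because the check only happens when the Monitor runs, a thread can outlive its Idle
Thread Life by up to one Monitor interval.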
The Thread Display/Purge/Enable Class, TyQueThd, is
available to monitor the status of each Queue's threads.
Playing nice
Why go through all this trouble? It's called playing nice.
Tymeac Functions play nice within Tymeac. Each Tymeac Queue uses the
minimum amount of resources and returns the resource as soon as possible. We
assume there are other Tymeac clients using different Tymeac Functions. When
one client hogs the resources, everyone feels the pain.
Tymeac Functions play nice within the Box. We assume there are other
applications executing in the Box besides us.
The Shut Down and On Request Classes write statistics to the
Statistics repository. These statistics are useful in analyzing performance:
Number Processed, Number Activated, Number of Waits, Number
Notified, Number Instantiated.
Number Processed
The algorithm for thread notification is 'first available', so usage is usually
heaviest at the top. However, what may appear as lopsided usage in no way implies that
something is wrong. The status of higher order threads may be 'in use' so that lower
order threads 'are notified' more often. The application process of some threads may
take longer than others. The Number Processed by a thread in itself is of little
importance.
The sum of all threads' Number Processed is the total requests processed by the
Queue.
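The 'first available' selection can be pictured with the short Java sketch below. It is
an illustration only: the class, the enum, and the array of status values are
hypothetical stand-ins for whatever Tymeac keeps internally.

```java
// Hypothetical sketch; not Tymeac source.
class FirstAvailable {
    enum Status { WAITING_FOR_WORK, THREAD_PROCESSING, INACTIVE }

    /** Returns the index of the first thread waiting for work, or -1 if there is none. */
    static int firstWaiting(Status[] threads) {
        for (int i = 0; i < threads.length; i++) {
            if (threads[i] == Status.WAITING_FOR_WORK) {
                return i;
            }
        }
        return -1;
    }
}
```

Since the scan always starts at index 0, thread #0 is chosen whenever it is waiting,
which is why the counts tend to be heaviest at the top of the list.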
Number Activated
A Number Activated of zero is significant. A prudent reserve is always
wise and business rules change daily. However, after observing a zero usage over
time, eliminating some never used threads reduces storage usage.
Number of Waits, Number Notified
The Number of Waits and Number Notified are significant.
The Number of Waits is the number of time-outs (i.e., the wait interval expired with no new
work). The thread sets its status to Waiting for work, and issues a
wait. If Tymeac notifies the thread (Number Notified) at any time before the wait interval expires, then
there is no time-out.
When the Number Notified is significantly lower than the Number
of Waits, then the Wait Time may be too low.
Number Instantiated
The Number Instantiated is the total number of new instances of a thread. This number
is only significant when taken together with the length of time the Tymeac Server has been executing.
For example, if the number is 3,000 and the Tymeac Server has been running for three (3)
days, then this is good. However, if the Server has been running for three (3) hours, then the Queue
parameter, Idle Thread Life, is probably too small.
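The arithmetic behind that judgment is simply instantiations per unit of time. The
helper below is a hypothetical illustration, not a Tymeac class, and the point at which
a rate becomes "too high" is left to the reader.

```java
// Hypothetical sketch; not Tymeac source.
class InstantiationRate {
    /** Returns thread instantiations per hour of Tymeac Server execution. */
    static double perHour(long numberInstantiated, double serverHours) {
        if (serverHours <= 0) {
            throw new IllegalArgumentException("serverHours must be positive");
        }
        return numberInstantiated / serverHours;
    }
}
```

For the figures above, 3,000 instantiations over three days (72 hours) is about 41.7
per hour, while 3,000 over three hours is 1,000 per hour.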
Once again, this is not a "how to tune" manual, but a "what does Tymeac do and, therefore,
what should I look for in tuning" guide.
The Queue Update Class (TyQueData) is a way to experiment
without making permanent (DBMS/User Class) changes. The New Thread Thresholds and Wait
Interval can be altered, and the changes take effect immediately.
Thread Life Cycle
Tymeac Start-up creates the Queues. Each thread entry within the Queue is
initially set with a status of Never Used. Whether there is a physical Java
thread attached depends on the start up option in the
Queue definition. The Queue Thread Display/Alter indicates the
presence of a physical Java thread as 'existent', and the absence of a physical Java
thread as 'null'.
When the start up option is to start all threads, Tymeac instantiates and starts all
threads in the Queue. The status remains as Never Used. However, during
display of the Queue's threads, the suffix changes from 'null' to 'existent'.
When the thread life option in the Queue definition
has a value greater than zero, the Tymeac Monitor destroys the thread when the interval
expires. The status remains as Never Used. However, during display
of the Queue's threads, the suffix changes from 'existent' to 'null'.
The first request for a Queue activates the first thread in the Queue. If no physical
thread exists, Tymeac instantiates and starts the Java thread. The status changes
from Never Used to Reactivated, awaiting execution. This new status is
a timed event. If this status remains for an unreasonable
amount of time, the Tymeac Monitor flags the thread as a
possible problem. Other threads may become active as a result of Wait List overflows
or Threshold exceptions.
Work Cycle:
When the thread gets execution time, it changes its status to Thread Processing.
The thread is executing code such as fetching a request from a Wait List, preparing to
invoke the application method, or processing the output from the application. This new
status is a timed event. If this status remains for an
unreasonable amount of time, the Tymeac Monitor flags the
thread as a possible problem.
If the thread successfully fetches a request from a Wait List: (Since every
request goes into a Wait List, this newly notified thread may not be the thread to process
the request. Any currently working thread in the Queue may fetch the request.)
- it changes its status to In application class and invokes the Processing
Application Class. This new status is a timed event.
If this status remains for an unreasonable amount of time, the Tymeac Monitor flags the thread as a possible problem.
- Upon return from the Processing Application Class, the thread changes its status to Thread
Processing. The thread saves the Processing Application's output Object, when
necessary.
- If this is an Asynchronous Request with an Output Agent Queue, and this is the final
Queue in the request, then the thread changes its status to Scheduling Output Agent
and schedules the Output Agent Queue. This new status is a timed
event. If this status remains for an unreasonable amount of time, the Tymeac
Monitor flags the thread as a possible problem.
- Upon return from scheduling the Output Agent Queue, the thread changes its status to Thread
Processing and attempts to fetch the next request from a Wait List.
When no work remains in any Wait List, the thread changes its status to Waiting for
work. This new status is optionally timed.
If any new work comes in, this thread is immediately available to process that
request. The thread changes its status to Notified, awaiting execution. This
new status is a timed event. If this status remains for
an unreasonable amount of time, the Tymeac Monitor flags
the thread as a possible problem. The work cycle, above,
repeats.
If no new work comes in, this Queue is participating in the optional timing
of waiting threads, and that time elapses,
then the thread changes its status to Inactive. This is a timeout event. Tymeac
keeps statistics on the life cycle of threads.
If new work comes in, this thread may only participate in processing when:
- There are no other threads working in the Queue,
- the placement of a request into a Wait List results in an overflow, or
- a Wait List exceeds a Threshold limit.
When necessary, the new request activates the thread. If no physical thread exists,
Tymeac instantiates and starts the Java thread. The status changes to Reactivated,
awaiting execution. The work cycle, above, repeats.
If no new work comes in, the Queue is participating in the optional timing of
inactive threads, and that time elapses,
then, when the Tymeac Monitor next runs, it destroys the physical thread.
When the next request comes into this Queue, that request activates the
thread. The status changes to Reactivated, awaiting execution. This new
status is a timed event. If this status remains for an
unreasonable amount of time, the Tymeac Monitor flags the
thread as a possible problem. The work cycle, above, repeats.
Flagged Threads:
There are two stages of a thread's life cycle that denote a problem. These are status
codes of Disabled and Cancelled. The section, "How a thread's status becomes disabled", discusses the Disabled
status code.
The Cancelled status code results from an excessive amount of time in a state of
"processing" or "about to process". The problem with threads, in any
environment, is knowing what they are doing and whether they are still alive. To address this
problem, Tymeac times certain events. When the
thread exceeds one of these time limits, the Tymeac Monitor changes its status to Cancelled. On
the next cycle of the Tymeac Monitor, all Cancelled threads are set to Disabled.
These Cancelled or Disabled status codes have no effect on the
thread. If a thread is just slow, then it changes its own status according to its
processing.
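The two-stage flagging can be sketched in Java as below. This is a simplified
illustration, not Tymeac's Monitor: the class, the ThreadEntry record of status and
timestamp, and the single time limit shared by all timed statuses are assumptions made
for the example. The status names and the choice of which statuses are timed follow the
text above.

```java
import java.util.List;

// Hypothetical sketch; not Tymeac source.
class MonitorSketch {
    enum Status {
        REACTIVATED(true), NOTIFIED(true), THREAD_PROCESSING(true),
        IN_APPLICATION_CLASS(true), SCHEDULING_OUTPUT_AGENT(true),
        WAITING_FOR_WORK(false), INACTIVE(false), CANCELLED(false), DISABLED(false);

        final boolean timed; // true when the Monitor times this status for stalls
        Status(boolean timed) { this.timed = timed; }
    }

    static class ThreadEntry {
        Status status;
        long statusSinceMillis; // when the thread entered its current status
        ThreadEntry(Status status, long statusSinceMillis) {
            this.status = status;
            this.statusSinceMillis = statusSinceMillis;
        }
    }

    /** One Monitor cycle: Cancelled becomes Disabled, overdue timed statuses become Cancelled. */
    static void runCycle(List<ThreadEntry> entries, long nowMillis, long limitMillis) {
        for (ThreadEntry e : entries) {
            if (e.status == Status.CANCELLED) {
                e.status = Status.DISABLED;
            } else if (e.status.timed && nowMillis - e.statusSinceMillis > limitMillis) {
                e.status = Status.CANCELLED;
            }
        }
    }
}
```

A thread that is merely slow would overwrite the Cancelled or Disabled value the next
time it sets its own status, which the sketch models by leaving the status field
writable.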
These status codes have an effect on scheduling an Inactive thread. Recall
from above, that when new work comes in, the Inactive thread may only participate
in processing when:
- There are no other threads working in the Queue,
- the placement of a request into a Wait List results in an overflow, or
- a Wait List exceeds a Threshold limit.
Cancelled or Disabled status code threads are not considered working nor
are they considered available to activate. Therefore, they are non-participants in the
Queue.
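The three conditions, together with the rule that Cancelled and Disabled threads do not
count as working, can be written as one predicate. The sketch below is hypothetical: the
names are invented, and treating every status other than Inactive, Cancelled, and
Disabled as "working" is an assumption made for the example.

```java
import java.util.List;

// Hypothetical sketch; not Tymeac source.
class ActivationRule {
    enum Status { INACTIVE, WAITING_FOR_WORK, THREAD_PROCESSING, IN_APPLICATION_CLASS, CANCELLED, DISABLED }

    /** Assumption: anything not Inactive, Cancelled, or Disabled counts as working. */
    static boolean isWorking(Status s) {
        return s != Status.INACTIVE && s != Status.CANCELLED && s != Status.DISABLED;
    }

    /** True when an Inactive thread may be activated for a new request. */
    static boolean mayActivateInactive(List<Status> otherThreads,
                                       boolean overflow,
                                       boolean thresholdExceeded) {
        boolean anyWorking = false;
        for (Status s : otherThreads) {
            if (isWorking(s)) {
                anyWorking = true;
                break;
            }
        }
        return !anyWorking || overflow || thresholdExceeded;
    }
}
```

If the only other thread in the Queue is Cancelled, the predicate is true even without
an overflow or threshold exception, so a stalled thread does not block the Queue.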
Consider the following:
A Queue has three threads. These we will call, #0, #1, and #2.
When the first request comes in for this Queue, Tymeac places it in a Wait List and
activates thread #0.
The Processing Application Class develops a 'hanging' problem from which it never
recovers. It may be waiting for another resource to become available, etc.
Tymeac considers this thread "working". As new requests come in,
Tymeac places each request in a Wait List. As long as no overflow occurs and no
Threshold limit is exceeded, it is reasonable to assume that thread #0 will process the
new request when it finishes with the current one. Therefore, Tymeac does not
activate a new thread.
Thread #0 exceeds a time limit. The Tymeac Monitor sets the status to Cancelled.
If the request thread #0 is working on is an Asynchronous Request, the Tymeac Monitor
places an entry in the Stall Array for this request.
As new requests come in, Tymeac places each request in a Wait List. Since no
threads are considered "Working", Tymeac activates thread #1. As long as
thread #1 does not 'hang' as well, it processes the requests normally.
On the next cycle of the Tymeac Monitor, it sets the status of thread #0 to Disabled
and may send a message to the Notification Queue.
Wait Lists
Each Queue must have at least one Wait List. The Queue Maintenance Class (TyQueMaint) or User Classes (TyUserQueues)
define the number of Wait Lists and the number of physical and logical entries available in each Wait
List. All Wait Lists use FIFO processing. Tymeac Server, when activating a new
thread, always puts the request into a Wait List. This is so that the first thread looking
for work may process the request rather than making the request wait for the new thread
to become active.
The request (TymeacParm)
specifies the priority of the request. Tymeac puts it into its respective Wait List. Priority 3 goes
into Wait List 3, Priority 1 goes into Wait List 1.
Wait Lists need not be priority lists. Using two or more Wait Lists when the
request always specifies priority 1 is an overflow technique. See the
discussion in the Queue Maintenance Class (TyQueMaint)
for how to use these Wait Lists and the problems of such.
Tymeac attempts to put the request into the Wait List corresponding to its
priority. However, when that Wait List is full, Tymeac puts the request into the next
higher Wait List. This is a primary overflow. When that Wait List is also full,
Tymeac puts the request into the next higher Wait List. This is a secondary overflow.
Overflows are handled errors.
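The placement and overflow rules above can be sketched in Java as follows. This is an
illustration only: the class, the Placement enum, and the use of String requests are
hypothetical, and the REJECTED result for the case where every remaining Wait List is
full is an assumption, since the text does not say what Tymeac does then.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch; not Tymeac source.
class WaitListSketch {
    enum Placement { NORMAL, PRIMARY_OVERFLOW, SECONDARY_OVERFLOW, REJECTED }

    private final List<Queue<String>> waitLists = new ArrayList<>();
    private final int entriesPerList; // the same fixed size for every Wait List

    WaitListSketch(int numberOfLists, int entriesPerList) {
        this.entriesPerList = entriesPerList;
        for (int i = 0; i < numberOfLists; i++) {
            waitLists.add(new ArrayDeque<>()); // FIFO within each Wait List
        }
    }

    /** Places a request; priority is 1-based, so priority 1 targets Wait List 1. */
    Placement put(String request, int priority) {
        if (priority < 1 || priority > waitLists.size()) {
            throw new IllegalArgumentException("priority out of range: " + priority);
        }
        for (int i = priority - 1; i < waitLists.size(); i++) {
            Queue<String> list = waitLists.get(i);
            if (list.size() < entriesPerList) {
                list.add(request);
                int hops = i - (priority - 1); // how many full lists were skipped
                if (hops == 0) {
                    return Placement.NORMAL;
                }
                return hops == 1 ? Placement.PRIMARY_OVERFLOW : Placement.SECONDARY_OVERFLOW;
            }
        }
        return Placement.REJECTED; // assumption: no room in any higher Wait List
    }
}
```

With three Wait Lists of one entry each, four priority 1 requests in a row land as
normal, primary overflow, secondary overflow, and rejected, in that order.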
All Wait Lists for a Queue have a fixed number of entries taken from
the Queue Maintenance Class. Using a variable number for each Wait List makes tuning
extremely difficult.
Circumstances change day to day. In reality, nobody monitors any system with extreme diligence. Using a
variable number of entries for each Wait List means that similar Queues may not reflect
similar behavior. When conditions change, the algorithm for determining what is a
problem must be unique for the number of entries factor. The fixed number makes the
calculation simple, affords an at-a-glance picture with numbers and charts, and a
meaningful assumption about similar Queues is possible.
See also the note in the discussion of Disabled Queues.
The Queue Elements Display/Alter Class (TyQueData) permits
modifying the number of entries in Wait Lists during execution. This gives
you another tool with which to tune performance.
The statistics show the high water mark and the primary and secondary overflows at Shut
Down and On Request. Each Wait List is listed starting at 1:
Times Used
Just what it says.
Highest Used
High water mark for number of pending requests at any time. When this number is
close to the physical number of entries
in a Wait List, then it may be time to increase the physical number of slots.
Times Reset
During a back-out as the result of a scheduling failure, time-out or cancel --
the number of requests removed from this Wait List.
Times overflowed primary
When a request does not fit in the desired Wait List, it overflows into the next
Wait List. Overflow is a handled error. If this number is high, make the
physical number of entries in the Wait
List higher.
Times overflowed secondary
When a request does not fit in the desired Wait List, it overflows into the next
Wait List. However, when that Wait List is also full, the request overflows into
a subsequent Wait List. Overflow is a handled error. If this number is high,
then the physical number of entries in
the Wait List is much too low.
One may use the statistics to observe a trend before it becomes a problem
(high water mark.) One may
increase or decrease the number of Wait Lists and/or entries available for Wait Lists.
See also the Queue statistics for balancing the number of
logical slots in Wait Lists:
- Total times reached Overall Threshold
- Total times reached Individual Threshold
- Total times reached Weighted Average Threshold