A Java? Parallel Failure

Java? SE 8 introduced parallel operations but there were severe performance problems. JDK1.8_40 debuted whack-a-mole resolutions for some of the problems while adding a few of its own. (1400 words)
 

Edward Harned (eh at coopsoft dot com, @ed_harned)
Senior Developer, Cooperative Software Systems, Inc.
March 2015 -- Updated May 2016

This article is part three of a three part series on parallel computing in Java?.

Part one deals with the problems of an academic centric Fork/Join framework in a commercial development arena.
Part two deals with the devastating effects the Fork/Join framework has on parallel Bulk Data Operations.
    Since those articles have become extremely length, there is a PDF consolidation.

This part deals with the failure to improve the performance of parallel operations.

The Problem

Invoke a parallel forEach and you may end up with hundreds or thousands of Fork/Join threads. The Fork/Join framework’s bad design cannot handle long call chains without experiencing stack overflows and out of memory errors so the framework throws threads at the problem. An amateur solution with unintended consequences. Now, after a huge stink in the community about the Fork/Join framework's compensation threads policy (JDK-8056248), the architects finally begin treating the symptoms.

A little background.

A fundamental flaw with the F/J framework is that is wants to use intermediate joins. Intermediate joins don’t work outside of a controlled environment.

The F/J framework compensated for the failure of intermediate joins with continuation threads in Java7?.

The F/J framework compensated for the failure of intermediate joins with stalling threads in Java8?.

The architects pretty much gave up with intermediate joins and created the CountedCompleter class in JDK1.8. However, they could not eliminate joining completely since the central design of the framework is dyadic recursive division.

Whack-a-Mole Solution

Now in JDK1.8_40 the F/J framework partially compensates for creating less compensation threads with excessive spinning for work.

In pre-8u40, when a thread exhausted its own deque of work and the request was not finished, the current thread created a continuation thread to look elsewhere and then the current thread issued a join (wait.) The create-new/wait-for-old was necessary so call stacks would not get too long and threads would not lose the chain of Tasks necessary for retrieving the results and for finding the root when exceptions occurred. (More on this below.)

In 8u40, the current thread looks elsewhere itself and does not create a continuation thread. Unfortunately, when it does not find work it continues looking and looking and looking in a spin loop.

The possible reason is that it believes another thread will spit a Task and the current thread will find that new work. While that sounds good in a research environment, excessive spinning doesn’t always work in a production environment. Tasks may not split anymore. Tasks may be in a compute intensive section or waiting for an outside resource and the current thread will spin and spin and spin uselessly.

Even worse, the current thread looks for Tasks another thread stole from its deque and tries to steal descendent Tasks back (selective work stealing.) A little explanation…

Since this framework cannot have a thread execute Tasks split from a different root (it causes problems if the fetched Task abnormally terminates (as in the example below)), the thread must find only related Tasks. That is, Tasks that were spit from the original Task and stolen by another thread. Therefore, even if the current thread finds pending work in another deque it may have to ignore it and continue looking and looking and looking until it finds a descendent Task.

The architect says this is “OK because the joiner has nothing better to do…”
        (JavaDoc in ForkJoinPool::tryHelpStealer)

Wrong.

In a professional work-sharing, Data Parallel framework, a scheduler loads thread queues so threads keep busy doing useful work and threads only look to steal Tasks when there is truly a dormancy.

Threads without work should wait, either for new work or for the completion of the request (a signal from another thread.) Excessive spinning takes CPU resources away from other users. What we call “not playing nice with others.”

Additionally, the excessive spinning and pre-optimization has resulted in more bugs. Albert Einstein was not wrong when he said, “Make things as simple as possible — but no simpler.”

To prove the point, the Oracle? staff now recommend removing this fix in JDK-8080623

Excessive spinning to restrict excessive thread creation is a failure.

Speed improvements misplaced

With the minor improvements in Fork/Join forEach mechanics (JDK-8029452) and since the framework is not creating all those continuation threads, parallel forEach should be significantly faster than the sequential version. Not so. Both parallel and sequential versions run about the same speed. You can download the source code for a demo that uses two levels of nested parallel processing (NestedParallel.java), below. Eliminating the massive thread creation speeded up one side of the parallel equation but the excessive spinning pulled the speed back down.

When adding a third level of nesting, the excessive thread problem returns vigorously and the parallel version can take longer than the sequential version. You can download the source code for a demo that uses three levels of nested parallel processing (NestedParallel2.java), below.

Speed improvements are a failure.

Exception handling mismanagement

When introducing an exception in the third level of nesting (NestedParallel2.java), both sequential and parallel versions simply cannot handle such an event and the threads loop forever in:
    setExceptionalCompletion()
    recordExceptionalCompletion()
    LockSupport.park()

Failure is what happens when threads loose contact with descendent Tasks (above) and threads execute Tasks from another root while holding the chain from the current root. This is a serious design flaw.

In a professional Data Parallel framework, each request gets its own RequestObject that contains fields necessary to track the request and recover from anomalies. Threads execute Tasks from any request in any order since the root is irrelevant. Threads can find any Task's request since each Task contains a reference to the RequestObject. A simple design works better than a multifarious scheme.

Error recovery is a failure.

The excessive threads problem still is present in classes that use the F/J framework

CountedCompleter:

When a CountedCompleter class issues a join(), it can spew hundreds/thousands of compensation threads. You can download the source code for a demo that uses join() to eject excessive threads in a CountedCompleter (MultiRecurCountedSubmitError.java) below.

ForkJoinPool.managedBlocker:

The F/J framework cannot detect a stall. Therefore, it includes an inner interface that informs the Pool a caller is executing a blocking method. The blocking may be for a few nanoseconds or several seconds. It doesn’t matter. The framework has no code to determine the length of a blocking method. Consequently, the framework’s answer to a blocking method is to throw threads at the problem. You can download the source code for a demo that uses a managed blocker to spew excessive threads (MultiRecurCountedManagedSubmit.java), below.

CompletableFuture:

Although there were many changes that limited excessive thread creation (JDK-8056249), the fixes could do nothing for the join problem. CompletableFuture::get is a blocking call; it needs to wait for the get to complete. The get issues a join() which stalls the F/J framework. Consequently, the framework’s answer to a join method is to throw threads at the problem. You can download the source code for a demo that uses a CompletableFuture to discharge excessive threads (MultiCompletables.java), below.

Phaser:

When using a large number of parties in a tiered Phaser, the framework creates compensation threads while each original thread waits for sub-phasers to arrive. The more parties, the more compensation threads. You can download the source code for a demo that uses a TieredPhaser to disgorge excessive threads (TieredPhaser.java), below.

The fix for the excessive threads problem is a failure.

Conclusion

Since day one, the problem has been a framework designed to appease the academic community endeavoring to perform in a commercial development domain. Every patch, tweak, goose, fudge, dodge, hedge, hack, and massage used to make this framework perform admiringly has failed.

The main reason this framework performs at the speed it does is due to the excessive use of the Unsafe.class to manipulate memory and avoid normal Java? functions like bounds checking (resembling a native ANSI C program.)

Perhaps someday the Java? engineers will stop beating a dead horse, recognize a failed dream when they see it, and embrace a redesign that considers functional correctness over theoretical efficiency.

References

Download the source code for the article here.

Download the PDF consolidation

Part one article — A Java? Fork-Join Calamity
http://coopsoft.com/ar/CalamityArticle.html

Part two article — A Java? Parallel Calamity
http://coopsoft.com/ar/Calamity2Article.html

JDK-8056248: Improve ForkJoin thread throttling behavior
https://bugs.openjdk.java.net/browse/JDK-8056248

JDK-8080623: CPU overhead in FJ due to spinning in awaitWork
https://bugs.openjdk.java.net/browse/JDK-8080623

JDK-8029452 Fork/Join task ForEachOps.ForEachOrderedTask clarifications and minor improvements
https://bugs.openjdk.java.net/browse/JDK-8029452

JDK-8056249: Update CompletableFuture
https://bugs.openjdk.java.net/browse/JDK-8056249

Missed submissions in latest versions of Java ForkJoinPool
http://jsr166-concurrency.10961.n7.nabble.com/Missed-submissions-in-latest-versions-of-Java-ForkJoinPool-td12566.html

Intermediate Joins in Part I article
http://coopsoft.com/ar/CalamityArticle.html#faulty

Intermediate Joins in Part II article
http://coopsoft.com/ar/Calamity2Article.html#join

Counted Completer
http://www.coopsoft.com/ar/Calamity2Article.html#counted

A Java Fork/Join Framework — Doug Lea

Data parallelism - Wikipedia, the free encyclopedia
http://en.wikipedia.org/wiki/Data_parallelism

Stall Article
http://coopsoft.com/ar/StalledArticle.html

Unsafe.class
http://www.coopsoft.com/ar/CalamityArticle.html#complex

About the Author

Edward Harned is a software developer with over thirty years industry experience. He first led projects as an employee in major industries and then worked as an independent consultant. Today, Ed is a senior developer at Cooperative Software Systems, Inc., where, for the last fourteen years, he has used Java? programming to bring parallel solutions to a wide range of tasks.

The female stars cheap hair extensions by real hair extensions and Tang Yin are rushing to dye this hair extensions one by one – “Power Tang (Honey Sugar) color!

? 2011 - 2017  E.P. Harned  All rights reserved