
Java Concurrency In Practice

you actually start timing. On HotSpot, running your program with -XX:+PrintCompilation prints out a message when dynamic compilation runs, so you can verify that this is prior to, rather than during, measured test runs.
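
To make the warm-up idea concrete, here is a minimal sketch (the class and constant names, such as WarmUpSketch and WARMUP_ITERATIONS, are invented for illustration and are not the book's harness). Running it with -XX:+PrintCompilation should show compilation messages during the warm-up loop rather than during the timed loop:

// Sketch of a warm-up phase before timing; identifiers are illustrative only.
public class WarmUpSketch {
    static final int WARMUP_ITERATIONS = 100_000;
    static final int MEASURED_ITERATIONS = 1_000_000;

    public static void main(String[] args) {
        // Warm-up: exercise the code so dynamic compilation happens here,
        // not inside the measured interval.
        long sink = 0;
        for (int i = 0; i < WARMUP_ITERATIONS; i++)
            sink += workUnderTest(i);

        // Measured run: with -XX:+PrintCompilation, no compilation messages
        // should appear in this region.
        long start = System.nanoTime();
        for (int i = 0; i < MEASURED_ITERATIONS; i++)
            sink += workUnderTest(i);
        long elapsed = System.nanoTime() - start;

        // Print the accumulated result so the work cannot be optimized away.
        System.out.println("result=" + sink + ", elapsed ns=" + elapsed);
    }

    // Stand-in for the code actually being benchmarked.
    static long workUnderTest(int i) {
        return (long) i * i;
    }
}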

Running the same test several times in the same JVM instance can be used to validate the testing methodology. The first group of results should be discarded as warm up; seeing inconsistent results in the remaining groups suggests that the test should be examined further to determine why the timing results are not repeatable.
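
One hedged sketch of that methodology (the trial counts and names are arbitrary choices, not the book's TimedPutTakeTest): run several identical groups of trials in one JVM instance, discard the first group as warm-up, and look at the spread of the remaining groups:

// Sketch: repeat the same measurement several times in the same JVM and
// check that the post-warm-up results are reasonably consistent.
public class RepeatedTrialsSketch {
    static final int GROUPS = 5;
    static final int ITERATIONS_PER_GROUP = 1_000_000;

    public static void main(String[] args) {
        long[] timings = new long[GROUPS];
        long sink = 0;
        for (int g = 0; g < GROUPS; g++) {
            long start = System.nanoTime();
            for (int i = 0; i < ITERATIONS_PER_GROUP; i++)
                sink += (long) i * i;              // stand-in for the measured work
            timings[g] = System.nanoTime() - start;
        }
        // Discard timings[0] as warm-up; a large spread among the rest
        // suggests the methodology needs further examination.
        for (int g = 1; g < GROUPS; g++)
            System.out.printf("group %d: %d ns%n", g, timings[g]);
        System.out.println("sink=" + sink);        // keep the work observable
    }
}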

The JVM uses various background threads for housekeeping tasks. When measuring multiple unrelated computationally intensive activities in a single run, it is a good idea to place explicit pauses between the measured trials to give the JVM a chance to catch up with background tasks with minimal interference from measured tasks. (When measuring multiple related activities, however, such as multiple runs of the same test, excluding JVM background tasks in this way may give unrealistically optimistic results.)
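
A minimal sketch of such pauses, assuming two unrelated CPU-bound activities (the trial bodies, the one-second pause, and the explicit System.gc() call are illustrative choices, not prescriptions from the book):

// Sketch: separate unrelated measured activities with explicit pauses so
// JVM housekeeping is less likely to be charged to whichever trial runs next.
public class PausedTrialsSketch {
    public static void main(String[] args) throws InterruptedException {
        Runnable[] unrelatedTrials = {
            () -> burnCpu(200_000_000L),
            () -> burnCpu(400_000_000L)
        };
        for (Runnable trial : unrelatedTrials) {
            long start = System.nanoTime();
            trial.run();
            System.out.println("elapsed ns: " + (System.nanoTime() - start));
            // Give background threads a chance to catch up before the next trial.
            System.gc();
            Thread.sleep(1000);
        }
    }

    static void burnCpu(long iterations) {
        long sink = 0;
        for (long i = 0; i < iterations; i++)
            sink += i;
        if (sink == 42)                     // keep the loop from being elided
            System.out.println();
    }
}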

12.3.3. Unrealistic Sampling of Code Paths

Runtime compilers use profiling information to help optimize the code being compiled. The JVM is permitted to use information specific to the execution in order to produce better code, which means that compiling method M in one program may generate different code than compiling M in another. In some cases, the JVM may make optimizations based on assumptions that may only be true temporarily, and later back them out by invalidating the compiled code if they become untrue.[8]

[8] For example, the JVM can use monomorphic call transformation to convert a virtual method call to a direct method call if no classes currently loaded override that method, but it invalidates the compiled code if a class is subsequently loaded that overrides the method.
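
A minimal illustration of the scenario that footnote describes (the class names are invented, and the optimization itself happens inside the JIT, so the code only shows the shape of the situation):

// While Circle is the only loaded subclass of Shape, the JIT may treat
// shape.area() below as a direct, possibly inlined, call.
abstract class Shape {
    abstract double area();
}

class Circle extends Shape {
    final double r;
    Circle(double r) { this.r = r; }
    double area() { return Math.PI * r * r; }
}

// Square is never used below, so it is not loaded; if it were loaded later
// (for example via reflection), compiled code that assumed the call site was
// monomorphic would have to be invalidated and recompiled.
class Square extends Shape {
    final double s;
    Square(double s) { this.s = s; }
    double area() { return s * s; }
}

public class MonomorphicCallSketch {
    public static void main(String[] args) {
        Shape shape = new Circle(1.0);
        double total = 0;
        for (int i = 0; i < 1_000_000; i++)
            total += shape.area();   // candidate for monomorphic call transformation
        System.out.println(total);
    }
}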

As a result, it is important that your test programs not only adequately approximate the usage patterns of a typical application, but also approximate the set of code paths used by such an application. Otherwise, a dynamic compiler could make special optimizations to a purely single-threaded test program that could not be applied in real applications containing at least occasional parallelism. Therefore, tests of multithreaded performance should normally be mixed with tests of single-threaded performance, even if you want to measure only single-threaded performance. (This issue does not arise in TimedPutTakeTest because even the smallest test case uses two threads.)
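
One way to arrange such a mix, sketched here with invented names rather than the book's harness: drive the same workload at one thread and at several threads from a single JVM run, so the compiler never sees a purely single-threaded program:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: measure the single-threaded and multithreaded configurations in
// the same run; thread counts and loop sizes are illustrative.
public class MixedThreadCountSketch {
    public static void main(String[] args) throws InterruptedException {
        for (int nThreads : new int[] { 1, 2, 4 }) {
            ExecutorService pool = Executors.newFixedThreadPool(nThreads);
            CountDownLatch done = new CountDownLatch(nThreads);
            long start = System.nanoTime();
            for (int t = 0; t < nThreads; t++) {
                pool.execute(() -> {
                    long sink = 0;
                    for (int i = 0; i < 5_000_000; i++)
                        sink += i;
                    if (sink == 42)          // keep the loop live
                        System.out.println();
                    done.countDown();
                });
            }
            done.await();
            System.out.printf("%d thread(s): %d ns%n",
                    nThreads, System.nanoTime() - start);
            pool.shutdown();
        }
    }
}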

12.3.4. Unrealistic Degrees of Contention

Concurrent applications tend to interleave two very different sorts of work: accessing shared data, such as fetching the next task from a shared work queue, and thread-local computation (executing the task, assuming the task itself does not access shared data). Depending on the relative proportions of the two types of work, the application will experience different levels of contention and exhibit different performance and scaling behaviors.

If N threads are fetching tasks from a shared work queue and executing them, and the tasks are compute-intensive and long-running (and do not access shared data very much), there will be almost no contention; throughput is dominated by the availability of CPU resources. On the other hand, if the tasks are very short-lived, there will be a lot of contention for the work queue and throughput is dominated by the cost of synchronization.
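
A hedged sketch of these two regimes (thread counts, task counts, and task lengths are arbitrary knobs chosen for illustration): several worker threads drain one shared queue, doing either a lot or almost no thread-local work per task:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: contention on a shared work queue depends on how long each task
// runs relative to the cost of fetching it from the queue.
public class QueueContentionSketch {
    public static void main(String[] args) throws InterruptedException {
        runTrial(8, 10_000, 100_000);   // long tasks: little queue contention
        runTrial(8, 1_000_000, 1);      // very short tasks: the queue dominates
    }

    static void runTrial(int nThreads, int nTasks, int workPerTask)
            throws InterruptedException {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < nTasks; i++)
            queue.add(workPerTask);

        Thread[] workers = new Thread[nThreads];
        long start = System.nanoTime();
        for (int t = 0; t < nThreads; t++) {
            workers[t] = new Thread(() -> {
                long sink = 0;
                Integer work;
                while ((work = queue.poll()) != null) {   // access to shared data
                    for (int i = 0; i < work; i++)        // thread-local computation
                        sink += i;
                }
                if (sink == 42)
                    System.out.println();
            });
            workers[t].start();
        }
        for (Thread w : workers)
            w.join();
        System.out.printf("%d tasks of size %d: %d ns%n",
                nTasks, workPerTask, System.nanoTime() - start);
    }
}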

To obtain realistic results, concurrent performance tests should try to approximate the thread-local computation done by a typical application in addition to the concurrent coordination under study. If the work done for each task in an application is significantly different in nature or scope from the test program, it is easy to arrive at unwarranted conclusions about where the performance bottlenecks lie. We saw in Section 11.5 that, for lock-based classes such as the synchronized Map implementations, whether access to the lock is mostly contended or mostly uncontended can have a dramatic effect on throughput. The tests in that section do nothing but pound on the Map; even with two threads, all attempts to access the Map are contended. However, if an application did a significant amount of thread-local computation each time it accessed the shared data structure, the contention level might be low enough to offer good performance.
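
A small sketch of that effect (the amounts of local work are arbitrary knobs, not figures from the book): each thread alternates one access to a shared synchronized Map with a configurable amount of thread-local computation, so raising the local work lowers how often threads actually contend for the Map's lock:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Sketch: varying the thread-local work done between accesses to a
// lock-based Map changes the effective contention for its lock.
public class LocalWorkContentionSketch {
    static final Map<Integer, Integer> map =
            Collections.synchronizedMap(new HashMap<Integer, Integer>());

    public static void main(String[] args) throws InterruptedException {
        for (int localWork : new int[] { 0, 1_000, 100_000 })
            runTrial(4, 100_000, localWork);
    }

    static void runTrial(int nThreads, int accessesPerThread, int localWork)
            throws InterruptedException {
        Thread[] threads = new Thread[nThreads];
        long start = System.nanoTime();
        for (int t = 0; t < nThreads; t++) {
            threads[t] = new Thread(() -> {
                long sink = 0;
                for (int i = 0; i < accessesPerThread; i++) {
                    map.put(i, i);                     // shared, lock-protected access
                    for (int j = 0; j < localWork; j++)
                        sink += j;                     // thread-local computation
                }
                if (sink == 42)
                    System.out.println();
            });
            threads[t].start();
        }
        for (Thread th : threads)
            th.join();
        System.out.printf("localWork=%d: %d ns%n",
                localWork, System.nanoTime() - start);
    }
}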

In this regard, TimedPutTakeTest may be a poor model for some applications. Since the worker threads do not do very much, throughput is dominated by coordination overhead, and this is not necessarily the case in all applications that exchange data between producers and consumers via bounded buffers.

12.3.5. Dead Code Elimination

One of the challenges of writing good benchmarks (in any language) is that optimizing compilers are adept at spotting and eliminating dead code: code that has no effect on the outcome. Since benchmarks often don't compute anything, they are an easy target for the optimizer. Most of the time, it is a good thing when the optimizer prunes dead code from a program, but for a benchmark this is a big problem because then you are measuring less execution than you think. If you're lucky, the optimizer will prune away your entire program, and then it will be obvious that your data is bogus. If
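
One common guard against this, shown here as a hedged sketch with invented names (a variant of the usual trick of keeping results observable): fold every computed value into a result, compare that result against something the compiler cannot predict, and print only in the unlikely case they match, so the computation cannot be proved dead:

// Sketch: keep the benchmark's results "live" so the optimizer cannot
// prove the measured computation is dead code.
public class DeadCodeGuardSketch {
    public static void main(String[] args) {
        long start = System.nanoTime();
        int result = 0;
        for (int i = 0; i < 10_000_000; i++)
            result += compute(i);
        long elapsed = System.nanoTime() - start;

        // Comparing against an unpredictable value, and printing something
        // cheap if they happen to match, forces result to be computed.
        if (result == System.nanoTime())
            System.out.print(" ");
        System.out.println("elapsed ns: " + elapsed);
    }

    static int compute(int i) {       // stand-in for the measured work
        return i * 31 + 7;
    }
}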
