The presence, in the loop body, of conditionals whose execution does not depend on the value of the index of the outer loop does not affect the load imbalance resulting from the partitioning schemes described so far; each iteration of the outer loop still performs the same amount of work and, consequently, perfect load balance can be achieved. In the special case where the execution of a conditional depends on a condition involving the index of the outer loop and a constant, then, applying index set splitting [20] prior to partitioning, the conditional may be removed [2].
For instance, let the inequality
bound the index of the outermost, parallel
loop
, and the inequality
correspond to a logical expression evaluated
by a conditional in the loop body. Then, the original loop nest can
be split into three
consecutive loop nests, whose indices take values in the following
intervals respectively:
|
Consider the code shown in Figure 1.a;
applying
(6) and assuming that for the values of L, U, A
it is known at compile-time that
L
A
U, the code shown in Figure
1.b results.
One approach to partition
this code is to partition each loop using any of the two schemes described
in Section 3.1, and then assign the same partition
of each loop to the same processor; for both loops, if the number of
iterations is a multiple of the number of processors,
, then
perfect load balance is achieved. In the general case, let
be the number of loop iterations, and
the amount of work
in the body of each loop, respectively;
assuming that the same partitioning
scheme (either by decreasing or increasing order)
is applied to both loops, then the expected load imbalance
is equal to
Whenever
,
the load imbalance can be reduced by
partitioning one of the loops by decreasing order
and the other one by increasing order. This approach is followed
in the code shown in Figure 1.c.