An Example of Task Graphs: Sweep3d

In order to illustrate the above, consider again Sweep3D (which was mentioned in the Introduction) [13] The main body of the code, that is the subroutine sweep(), consists of a wavefront computation on a 3-dimensional grid of cells. The subroutine computes the flux of neutron particles through each cell along several possible directions (discretized angles) of travel. The angles are grouped into 8 octants corresponding to the 8 diagonals of the cube. Along each angular direction, the flux of each interior cell depends on the fluxes of three neighboring cells. This corresponds to a 3-dimensional pipeline for each angle, with parallelism existing between angles within a single octant. The current version of the code partitions the

and

dimensions of the domain among the processors. To improve the balance between parallel utilization and communication in the pipelines, the code blocks the third (

) dimension and also uses blocks of angles within each octant.

The static task graph for the main body of the code is shown in Figure 1(a). Each node of the graph represents a different task node, where circles correspond to control flow operations, ellipses to communication operations, and rectangles to computation (each rectangle represents a condensed task node, viz., a task node in the static task graph that will be instantiated into several instances of condensed tasks in the condensed dynamic task graph). Solid lines denote those precedence edges of the task graph that are enforced implicitly by intraprocessor control flow while the dotted lines denote those that require interprocessor communication.¹The program uses blocking communication operations (MPI_Send and MPI_Recv), each of which is represented by a single communication task.

Figure 1: Task graphs for Sweep3D. Solid lines depict control-flow; dashed lines depict communication.

$\psfig{figure=adve-sakellariou-fig1a.eps,width=50truemm}$
(a) Static task graph

$\psfig{figure=adve-sakellariou-fig1b.eps,width=90truemm}$
(b) Part of dynamic task graph on a $3\times 3$ processor grid.

Figure 1(b) shows part of the condensed dynamic task graph on a $3\times 3$ processor grid (recall that loops are fully unrolled in the dynamic task graph). The graph corresponds to the last wavefront of octant 2 (top) and the first wavefront of octant 3 (bottom). The number inside each computation task corresponds to the processor executing that task.

Using the condensed form of the dynamic task graph described earlier (see Section 3.4), the size of the dynamic task graph can be kept reasonable. Thus, for the the subroutine sweep() with a

problem size on a $2\times 3$ processor grid, the dynamic task graph would contain over $24 \times 10^6$ tasks, whereas the condensed dynamic task graph has only 3570 tasks.