As noted in Section 2, the dynamic task graph (DTG) is essentially an instantiation of the STG representing a single execution for a particular input. The DTG is an acyclic graph containing no control-flow nodes. The time for instantiating the DTG grows linearly with the number of task instances in the execution of the program, but much less computation per task is usually required for the instantiation than for the actual execution. Instantiating the DTG is an optional step, performed only when it is required for detailed performance prediction.
The information required to instantiate the DTG varies significantly across
programs. For a regular, non-adaptive code, the parallel execution
behavior of the program can usually be determined
directly from the program input
(in which we include the processor configuration parameters).
In such cases, the DTG can be instantiated directly from the
STG once the program input is specified.
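To illustrate why this is possible, for a regular code the size and shape of the DTG can often be written as a closed-form function of the input parameters alone, without executing the program. The following sketch assumes a hypothetical 1-D stencil computation with `steps` timesteps on `p` processors; the particular task and edge counts are illustrative assumptions, not taken from any specific program in this work.

```python
# Hypothetical example: for a regular, non-adaptive 1-D stencil code,
# the DTG size follows directly from the program input (including the
# processor configuration), so the DTG can be instantiated from the STG
# as soon as the input is specified.
def dtg_size(steps, p):
    """Return (task instances, communication edges) for the stencil DTG."""
    tasks = steps * p            # one compute task per processor per timestep
    edges = steps * 2 * (p - 1)  # two boundary exchanges per interior boundary
    return tasks, edges

print(dtg_size(10, 4))  # (40, 60)
```

No intermediate computational results are needed here; the same inputs always yield the same DTG, which is exactly the property that distinguishes the regular case from the adaptive one discussed next.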
In general, and particularly in adaptive codes, the parallel execution
behavior (and therefore the DTG) may depend on intermediate
computational results of the program.
For example, this could happen in a parallel N-body problem if the
communication pattern changed as the positions of the bodies evolved
during the execution of the program.
In the current work, we
focus on the techniques needed to instantiate the DTG in the former
case, i.e., that of regular, non-adaptive codes. These techniques are also
valuable in the latter case, but they must be applied at runtime when the
intermediate values needed are known. The issues to be faced in that case
are briefly discussed later in this section.
There are two main aspects to instantiating the DTG: (1) enumerating the outcomes of all the control flow nodes, effectively by unrolling the DO nodes and resolving the dynamic instances of the branch nodes; and (2) enumerating the dynamic instances of each node and edge in the STG. These are discussed in turn below. Of these, the second step is significantly more challenging in terms of the compile-time techniques required, particularly for sophisticated message passing programs with general computation partitioning strategies and communication patterns.
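The two aspects above can be sketched in miniature. The following Python fragment is a simplified, hypothetical illustration, not the actual STG/DTG representation used in this work: the `STGNode` and `DTGTask` classes, their field names, and the simple chain-structured dependences are all assumptions made for exposition, and branch nodes and communication edges are omitted.

```python
from dataclasses import dataclass, field

@dataclass
class STGNode:
    name: str                  # static task name
    kind: str                  # "task", or "do" for a loop control-flow node
    trip_count: str = ""       # symbolic loop bound, resolved from the input
    body: list = field(default_factory=list)  # subgraph nested in a DO node

@dataclass
class DTGTask:
    name: str    # static name qualified by the resolved loop index
    preds: list  # predecessor instances; acyclic since preds always come first

def instantiate(stg, inputs, preds=None):
    """Instantiate a DTG from a (simplified) STG for a given program input."""
    preds = list(preds or [])
    dtg = []
    for node in stg:
        if node.kind == "do":
            # Aspect (1): resolve the control-flow node by unrolling the
            # loop once its trip count is known from the program input.
            for i in range(inputs[node.trip_count]):
                body = instantiate(node.body, {**inputs, "i": i}, preds)
                dtg.extend(body)
                if body:
                    preds = [body[-1]]
        else:
            # Aspect (2): enumerate the dynamic instances of each static task.
            label = node.name if "i" not in inputs else f"{node.name}[{inputs['i']}]"
            inst = DTGTask(label, preds)
            dtg.append(inst)
            preds = [inst]
    return dtg

stg = [STGNode("init", "task"),
       STGNode("timestep_loop", "do", trip_count="n",
               body=[STGNode("compute", "task"),
                     STGNode("exchange", "task")])]

dtg = instantiate(stg, {"n": 3})
print([t.name for t in dtg])
# ['init', 'compute[0]', 'exchange[0]', 'compute[1]', 'exchange[1]',
#  'compute[2]', 'exchange[2]']
```

In a realistic message passing program the second aspect is far harder than this sketch suggests, since the set of dynamic instances of each node and edge must be derived at compile time from general computation partitioning strategies and communication patterns rather than from a single explicit loop index.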