Improving WCET Evaluation using Linear Relation Analysis

Abstract
The precision of a worst case execution time (WCET) evaluation tool on a given program is highly dependent on how well the tool is able to detect and discard semantically infeasible executions of the program. In this paper, we propose to use the classical abstract interpretation-based method of linear relation analysis to discover and exploit relations between execution paths. For this purpose, we add auxiliary variables (counters) to the program to trace its execution paths. The results are easily incorporated in the classical workflow of a WCET evaluator, when the evaluator is based on the popular implicit path enumeration technique. We use existing tools (a WCET evaluator and a linear relation analyzer) to build and experiment with a prototype implementation of this idea.

Introduction
The computation of a precise and safe approximation of the worst case execution time (WCET) of programs on a given architecture is an important step in the design of hard real-time systems [41].
It is part of the validation of the design, and a prerequisite for task scheduling. In this computation, over-approximation is mainly due to pessimistic abstraction of (1) complex hardware mechanisms (caches, pipeline) and (2) the program semantics (loop bounds, infeasible executions). Taking into account the target execution platform is, by far, the most difficult problem. It has been largely studied in the literature and remarkable tools exist, both in academia [5,27,29] and in industry [40].
In this paper, we specifically address the problem of taking into account the program semantics. The objective is to extract semantic properties that make some executions infeasible, and to exploit these properties in an existing WCET evaluator. It is generally admitted that such properties are easier to analyze on high-level code (e.g., C programs) than on binary code, even if semantic analysis of executable code has been explored [3,4,36]. WCET evaluation is performed on object code in order to be able to take into account the execution architecture. This raises the problem of traceability between the source and the object code.
The most popular approach to evaluate the WCET is called the implicit path enumeration technique (IPET) [28]. A micro-architectural analysis provides an evaluation of the duration of each basic block of the object-code control-flow graph. The WCET is then expressed as the solution of an integer linear programming problem (ILP) where the variables are the number of times each basic block is traversed during an execution. Relations between these variables come from the control-flow graph (flow equations) and from semantic "flow facts", including at least loop bounds. Indeed, each loop in the program should have a constant bound to guarantee that the execution time is finite; such bounds may be provided by the user, or discovered by program analysis.
Hence, the IPET-based evaluation takes into account semantic properties expressed as linear constraints on counters. A natural idea is then to combine it with a semantic analysis devoted to the discovery of invariant linear relations. Polyhedra-based abstract interpretation [2,8,17,20], also called linear relation analysis (LRA), is such an analysis. It is able to associate with each control point of a sequential program a system of linear inequalities (whose set of solutions is a convex polyhedron) satisfied by the numerical variables at this control point in any execution of the program.
Our proposal consists in applying LRA to a copy of the source program enriched with counter variables, and translating the obtained relations between counters into constraints to be added to the ILP. Let us illustrate this proposal with a simple example.

An example
Consider the program fragment of Figure 1.a with its control-flow graph (Fig. 1.b). Let us add counters α, β, γ to the main basic blocks, as done in the instrumented program of Figure 1.c. These counters are initialized to 0 and incremented in their corresponding block. An LRA analysis of this instrumented program automatically discovers that the following relations are satisfied at the end of the program: α = 100 and β + γ ≤ 110. The equality α = 100 gives the exact bound of the loop. More interestingly, β + γ ≤ 110 means that there are at most 10 iterations of the loop where both blocks b_3 and b_5 are executed.
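Since Figure 1 is not reproduced here, the following Python sketch uses a hypothetical loop of the same flavor to show what counter instrumentation looks like and how the two relations can be checked by simulation. The loop body, the variable names `i` and `x`, and the input model are assumptions made for illustration; they are not the program of Figure 1. Random testing only illustrates the relations, whereas LRA proves them for all executions.

```python
import random

def run_instrumented(inputs):
    """Run a hypothetical 100-iteration loop instrumented with three counters."""
    alpha = beta = gamma = 0
    i = x = 0
    while i < 100:
        alpha += 1            # counter of the loop body
        if inputs[i]:         # condition depending on an unknown input
            beta += 1         # counter of the first conditional block
            x += 1
        if x < 10:
            gamma += 1        # counter of the second conditional block
        i += 1
    return alpha, beta, gamma

def check_relations(trials=1000, seed=0):
    """Check alpha == 100 and beta + gamma <= 110 on random input sequences."""
    rng = random.Random(seed)
    for _ in range(trials):
        inputs = [rng.random() < 0.5 for _ in range(100)]
        alpha, beta, gamma = run_instrumented(inputs)
        if alpha != 100 or beta + gamma > 110:
            return False
    return True
```

In this sketch, both conditional blocks can only run in the same iteration while `x` is still below 10, which is what keeps the sum of the two counters close to the number of iterations.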
Assume the object code has the same control structure as the C program, i.e., the basic blocks of their control-flow graphs are in a one-to-one correspondence. The standard WCET evaluation computes pessimistic execution times (say t_i, i = 0..6) of the basic blocks (b_i, i = 0..6), and constructs an ILP where n_i (resp., e_{i,j}) denotes the number of occurrences of the basic block b_i (resp., the edge from b_i to b_j) in an execution of the program: the objective is to maximize the sum of the t_i n_i, subject to the flow equations that relate each n_i to the e_{i,j} of its incoming and outgoing edges. If we are able to maintain the correspondence between basic blocks in the source and the object code, i.e., to associate our counters α, β, γ with the variables of the ILP (n_2, n_3, n_5 respectively), we can add to the ILP the corresponding constraints n_2 = 100 and n_3 + n_5 ≤ 110, which is likely to reduce the maximum value of the objective function.
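The effect of the extra constraint on the objective function can be illustrated with a toy computation. The sketch below is not an ILP solver: it enumerates the values of n_3 and n_5 by brute force, keeps only the contribution of these two blocks to the objective, and assumes that each of them is executed at most 100 times (once per loop iteration). The block times passed as `t3` and `t5` are hypothetical.

```python
def max_cost(t3, t5, with_lra_constraint):
    """Maximize t3*n3 + t5*n5 over 0 <= n3, n5 <= 100 by enumeration.

    When with_lra_constraint is True, the relation n3 + n5 <= 110 is enforced.
    """
    best = 0
    for n3 in range(101):
        for n5 in range(101):
            if with_lra_constraint and n3 + n5 > 110:
                continue
            best = max(best, t3 * n3 + t5 * n5)
    return best

# Hypothetical block times, in cycles.
baseline = max_cost(7, 5, with_lra_constraint=False)
refined = max_cost(7, 5, with_lra_constraint=True)
```

With these values the bound drops from 1200 to 750, because the solver can no longer charge 100 executions to both blocks at once.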

Contents of the paper
In Section 2, we focus on some available tools, and evaluate their semantic awareness on some simple examples. Two recent papers were dedicated to the state of the art related to semantic analyses for WCET estimation and infeasible path detection [1,10]. Section 2.3 presents some more recent publications.
Our proposal consists in combining existing techniques, namely IPET-based WCET analysis and Linear Relation Analysis, recalled in Section 3, together with the specific tools that we used in our implementation. In Section 4, we explain how the counters are added and related to ILP counters thanks to debugging information provided by the compiler. Our implementation of the method is used to validate the approach on two existing benchmarks. We also investigated the robustness of the approach in the presence of compiler optimizations. These experiments are summarized in Section 5. We conclude with a discussion of possible future work.

Existing tools
We have experimented with some existing tools, to evaluate their ability to discover and exploit semantic properties. Four tools have been considered, all of which go through similar steps:
1. extracting a control-flow graph from the object code,
2. performing a set of micro-architectural analyses to obtain execution times for each basic block,
3. using IPET to compute a safe WCET.
We compare these tools with respect to their capabilities to extract semantic properties to cut infeasible paths.

The Chronos Timing Analyzer
Chronos [27] is an academic tool developed at the National University of Singapore. It takes as input a C program, performs limited data-flow analysis at the C source code level to determine loop bounds, and requests the user to provide this information when it fails. The semantic analysis in Chronos uses a pattern-based method to detect infeasible paths [39]. The so-called two-phase technique addresses infeasibility from a conflicting-pairs point of view. In the first phase, an analysis detects some conflicts that capture the fact that two branches cannot be taken along the same path. In the second phase, each conflicting pair relation is encoded into an ILP constraint.

The Swedish Timing Analyzer
SWEET [29] is a research toolbox developed at the Mälardalen Real-Time Research Center (MRTC). The main objective of SWEET is flow analysis, which computes flow facts, i.e., information about loop bounds and infeasible paths in the program. The main technique to discover flow facts is abstract execution [16]. Abstract execution is a form of context-sensitive abstract interpretation, because it uses a symbolic execution to produce context information for each loop iteration and function call. Instead of using the fixpoint engine of abstract interpretation, abstract execution executes the program in the abstract domain, merging the execution paths at certain points in the program. SWEET does not support LRA. It currently implements only the abstract domain of intervals.

AbsInt: The aiT Tool
Developed by AbsInt, aiT is the main industrial product for WCET analysis. It consists of a set of binary executable analyzers, which take the intrinsic cache and pipeline behavior into account. Concerning semantic analysis, aiT uses a value analysis based on intervals [13] to compute safe ranges of values for the program variables. aiT uses this information to determine loop bounds and detect infeasible paths. The approach towards computing loop bounds is not general, but it handles loop patterns. In order to gain precision, aiT pre-processes each loop by transforming its body into a function, in order to expose the iteration contexts. The key element in this transformation is to identify the loop index and to set it as a function parameter. Then, an interval analysis computes the ranges for all the loop variables. The loop transformation is based on loop patterns, which depend on the particularities of the architecture (e.g., parameter order) or on the loop structure (e.g., for-loops, triangular loops, branch conditions). aiT is able to detect infeasible paths using the results of the value analysis, such as conditions made infeasible by the computed intervals.

oRange, the flow fact analyzer of OTAWA
OTAWA [5] is an academic toolbox, developed at IRIT (University of Toulouse), designed as a generic framework to develop static analyses for WCET computation. Although OTAWA implements several approaches to WCET computation, the one based on IPET is the most mature. OTAWA relies on an auxiliary tool, called oRange [9], to compute loop bounds. oRange analyses C code. In a first phase, oRange detects loop indices and constructs a normal form: a symbolic expression of the bound, independent of the call context. In a second phase, by an abstract execution, a syntactic tree is built as a function of a full or partial call context. It combines loop bounds and conditional expressions as numeric or symbolic expressions. Finally, the tree is evaluated in the full context in order to produce a file in the specific flow-facts format FFX [42].

Some experiments
In order to evaluate the capabilities of these tools to detect infeasible paths, we have applied each of them to programs containing various situations of semantic infeasibility. These situations are given in Figure 2. Example 1 is a case where a simple pattern-based method may fail, since constant propagation is needed. Example 2 may be a problem for pattern-based methods for finding iteration numbers, since the apparent index x is modified. Example 3 is our introductory example of §1.1. Example 4 is a fragment of code generated by the SCADE compiler, from a design manipulating arrays. On the one hand, the loops are exited from inside, which complicates the evaluation of iteration numbers. On the other hand, the third loop is unreachable because of some non-trivial arithmetic conditions. In this paper, we propose a method and a tool-chain that are able to discover the infeasible paths of these 4 examples, namely, infeasible paths that depend on a semantic analysis and that may concern distant program points.

Other approaches
An extended state of the art related to semantic analyses for WCET estimation can be found in [1] and a general survey of infeasible path detection in [10]. We complement them with some more recent publications.
Several recent works make use of SMT solvers [23,37]. The idea is to ask the solver if the worst-case path obtained by the ILP solver is feasible. Whenever the path is infeasible, a corresponding constraint is added to the ILP. As in our approach, adding constraints does not always mean that the WCET is refined (2 paths may have the same WCET). In [18], the whole

WCET evaluation with OTAWA
The WCET estimation work-flow (Figure 3) involves a compiler, a Linear Program solver, and two tools from the OTAWA toolbox: oRange and owcet.
Compilation: the source C code is compiled by a third-party tool; for this experiment, we use a cross compiler from the GNU Compiler Collection (arm-elf-gcc 4.4.2), but other compilers can be used, provided that they produce ELF code (Executable and Linkable Format) with debugging information in DWARF format.
oRange is a data-flow analysis tool, dedicated to the discovery of loop bounds. Bounds are stored in the OTAWA flow facts format (FFX).
owcet is the OTAWA command dedicated to the WCET evaluation. The main steps of this tool, not detailed in Figure 3, are:
1. the construction of the control-flow graph (CFG) of the object code; during the construction, and thanks to debugging information, basic blocks (BB) are associated (if possible) with lines in the source program; thanks to this correspondence, the loop bounds computed by oRange are translated into control flow constraints in the CFG. The annotated CFG can be dumped in a file, allowing other tools to exploit it;
2. the micro-architectural analysis, which associates a local WCET estimation with each BB of the CFG;
3. the construction of the Integer Linear Programming (ILP) system; as in the introductory example (§1.1), the resulting system gathers (1) structural constraints (CFG structure), (2) loop bound constraints (from oRange flow facts), and (3) the objective function to be maximized (sum of BB counters weighted by their local WCET).
ILP solver: the ILP system is then solved by a third-party tool; OTAWA integrates and uses LP_SOLVE (any other equivalent tool can be used).

Principles of LRA
Linear Relation Analysis [8] is a classical program analysis, based on abstract interpretation [7]. It is able to discover, at each control point of a sequential program, a conjunction of linear relations (equalities and inequalities) invariantly satisfied by the numerical variables at this point. Classical algorithms are used to propagate linear systems over the statements of the program. Several causes may result in information loss:
1. the analysis safely ignores non-linear expressions in assignments and tests;
2. the analysis performs a convex hull at control path junctions, instead of propagating the disjunction of incoming information. It means that the propagated value is the most precise conjunction of linear relations implied by both incoming systems;
3. to avoid infinite propagation along loops, the classical widening-narrowing method is applied to guess a safe approximation of the limit.
Note that, unlike in symbolic execution [23] or SMT methods [18], loops are not unrolled.
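The widening-narrowing principle can be sketched on a much simpler domain than polyhedra. The Python code below analyzes the loop `x = 0; while (x < 100) x++;` with intervals instead of linear systems: it is an illustration of the iteration strategy under that simplifying assumption, not of the polyhedral operators used by LRA. All function names are hypothetical.

```python
INF = float("inf")

def join(a, b):
    """Least upper bound of two intervals (the interval analogue of a convex hull)."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def widen(old, new):
    """Standard interval widening: any unstable bound is pushed to infinity."""
    lo = old[0] if new[0] >= old[0] else -INF
    hi = old[1] if new[1] <= old[1] else INF
    return (lo, hi)

def body(head):
    """Abstract effect of one iteration of `while (x < 100) x++` on the loop-head interval."""
    lo, hi = head[0], min(head[1], 99)   # filter by the guard x < 100 (integers)
    if lo > hi:
        return None                      # the loop body is unreachable
    return (lo + 1, hi + 1)              # effect of x++

def analyze():
    """Compute an invariant for x at the loop head, without unrolling the loop."""
    init = (0, 0)                        # x = 0 before the loop
    head = init
    while True:                          # ascending iterations with widening
        out = body(head)
        new = init if out is None else join(init, out)
        nxt = widen(head, join(head, new))
        if nxt == head:
            break
        head = nxt
    out = body(head)                     # one narrowing (descending) step
    if out is not None:
        head = join(init, out)
    return head
```

Widening first jumps to the interval [0, +inf), and the descending step then recovers the upper bound 100 from the loop guard, which is the same two-phase behavior that LRA applies to systems of linear inequalities.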

Applying LRA to our example
We do not detail further the techniques applied in LRA, and refer the reader to the bibliography. We just show the main steps of the analysis of our example of Figure 1. Let us consider the control point at the entry of the while loop. The first step of the analysis straightforwardly computes the first iterate: Its propagation through the loop body provides, with a convex hull at the end of the conditional: Now a convex hull with the first iterate gives the second iterate at the entry of the loop: Instead of continuing the iterations, a first widening/narrowing step is performed (using "lookahead widening" [14]), which provides: Now, the "else" branch of the test x<10 becomes feasible, and a second widening/narrowing step is performed, providing:
P. Raymond et al.

which is found invariant after one more propagation. Propagated to the end of the program, it becomes:

LRA and loop bounds
It may happen, as in the previous example, that LRA discovers a bound on a loop counter, thus providing essential information for WCET evaluation. However, finding loop bounds is not our main goal in this work, as the method is intrinsically unable to discover non-linear relations, which drastically limits its capability to find loop bounds. As a matter of fact, in the presence of nested loops, the number of executions of the body of the innermost loop is not linear in the constants of the program. For instance, in the program fragment "for(i=0;i<n;i++){for(j=i;j<n;j++){...}}" the number of executions of the body of the innermost loop is n(n + 1)/2, which cannot be found by LRA.
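The quadratic count can be checked directly by transcribing the nested loop. The sketch below is a plain Python rendering of the fragment quoted above; the function name is hypothetical.

```python
def inner_iterations(n):
    """Count executions of the innermost body of
    for(i=0;i<n;i++){for(j=i;j<n;j++){...}}."""
    count = 0
    for i in range(n):
        for j in range(i, n):
            count += 1
    return count

# The count matches n(n + 1)/2 for small values of n.
matches_formula = all(inner_iterations(n) == n * (n + 1) // 2 for n in range(50))
```

Since the count grows quadratically with n, no single linear relation between the inner counter and n can capture it exactly.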
The LRA method must then be used together with some other method able to bound nested loops. We can use existing tools such as oRange, which comes with OTAWA, or more simply user-given bounds, given as pragmas in the code.
There also exist approaches based on polyhedra manipulation to find loop bounds, such as the one proposed in [38,30]: it consists in building a polyhedral upper approximation P of the iteration domain, i.e., the set of possible valuations of loop counters (in the previous example, P = {(i, j) | 0 ≤ i ≤ j ≤ n - 1}). Under realistic assumptions concerning the determinism of the program, the number of executions of the innermost loop is bounded by the number of integer points in P, and algorithms are available to compute this number. Notice that LRA can be combined with this approach, since it can discover linear invariants reducing the iteration domain, thus improving the precision of the result. Notice also that LRA can deal with parameters (symbolic bounds, like n in our example), an issue specifically addressed by [38].

The PAGAI prototype analyzer
Several tools performing LRA are available ([2, 12, 19, 20], to cite a few). Here, we use the PAGAI prototype analyzer, which implements the basic LRA together with recent improvements like "lookahead widening" [14] and SMT-based "path focusing" [19]. PAGAI analyses LLVM code [24] produced from a C program (thanks to Clang), and is able to return discovered properties at the C level. PAGAI may be used with abstract domains other than general linear systems (like octagons [33]) thanks to the common interface APRON [21].

Adding and tracing counters
These tools are detailed in this section. We illustrate the successive steps of the method by detailing the processing of an example program, called lcdnum.c, extracted from the TacleBench program suite [15]. The main program is given in Figure 5. It calls a function num_to_lcd, the execution time of which is taken into account by OTAWA.

Instrumented program version
The goal of the front-end ("instrumentation", Figure 4, top) is to produce, from the original C code, a reference C program. Some semantics-preserving transformations of the source code are necessary or advisable, in order to use the analyzers properly, and to trace the information between them.
Some transformations are purely lexical, and do not change the program structure: because the standard ELF/DWARF traceability mechanism is line-based, line breaks are introduced to isolate each atomic statement on its own line. Some transformations that modify the control structure are necessary because of the limitations of the analyzers. For instance, a single return statement per function is mandatory for exploiting the results of PAGAI: this unique control point is the place where counter invariants actually express properties on the whole execution of the function. Other transformations are required because of the limitations of both OTAWA and PAGAI: the control structure (CFG) must be statically known, which forbids dynamic computation of program pointers. In particular, "switch/case" statements must be rewritten into a static control structure based on "if" and "goto" statements.
Another transformation is desirable in our case: the current version of PAGAI does not handle inter-procedural analysis. In order to exploit the full capacity of this tool to find invariants, a light-weight solution is to inline function calls at the source level. This transformation is indeed hardly admissible in real life, but it must be seen here as a "trick" to reach our goal (studying the ability of LRA to detect infeasible executions). The front-end produces the reference C code in two flavors.
The reference C code with counters (Figure 4, right) is instrumented with auxiliary counters, in the same manner as in the introductory example (§1.1). The present version introduces a counter for each sequential block in the program control flow. However, some strategy could be used to reduce the number of counters by targeting blocks that are more likely to have an influence [43].
The reference C code without counters (Figure 4, left) is the same code, where all lines related to the counters (declaration, initialization and incrementation) have been commented out. This method ensures a semantic equivalence between the programs analyzed by OTAWA and PAGAI: since they only differ on the side-effect-free local variables, these programs are naturally input/output equivalent. Moreover, at least at the source level, the two programs are also structurally equivalent: a block in the reference C code is executed if and only if the corresponding block (marked with a counter c) is executed in the reference C program with counters. This property becomes false in general at the binary level, since the C compiler may modify the control structure: this well-known problem of traceability is discussed later. An auxiliary file is generated, which contains the mapping between each counter and its corresponding source line in the reference C code.
Example 1. Applied to our example program (Figure 5), our instrumentation front-end calls the C preprocessor, eliminates the multiple returns and switches (only within num_to_lcd, not shown), and produces the reference C programs. The first one (without counters) is shown in Figure 6; the second (not shown) is exactly the same with uncommented lines involving counters. An auxiliary file (not shown) simply lists the pairs "counter/line" (e.g., (cptr_main_1, 144), (cptr_main_2, 147)).
The first version is provided to OTAWA. Loop bounds computation by oRange is optional, which allows us to check if PAGAI is able to find them on its own. OTAWA calls the gcc compiler (here with the -O0 optimization level), builds the CFG of the object code, performs the micro-architectural analysis, and builds the ILP problem.
PAGAI is applied to the second version of the program, and returns the following invariants: The first equation finds the exact loop bound (which may also be found by oRange). The second equation is structural (from the shape of the source CFG, cptr_main_2 and cptr_main_4 are equal). The third property is new, and expresses, in particular, that the function num_to_lcd is called at most 5 times.

Tracing back the counters
The back-end ("ILP constraints translation & merge", Figure 4, bottom) gathers the information coming from OTAWA and PAGAI. Thanks to the counter/C-line mapping provided by the front-end, and the C-line/binary-BB mapping provided by OTAWA (through the ELF/DWARF information), a counter/BB mapping is built. Note that this mapping is partial, and deliberately pessimistic: depending on the compilation process, it may happen that a counter is associated either with zero or with several binary basic blocks. In this case, the counter is simply ignored: only counters that are associated with one single BB are retained.
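The composition of the two mappings, and the rule that discards ambiguous counters, can be sketched as follows. The data layout (plain dictionaries) and the basic block names are assumptions made for illustration; the actual file formats produced by the front-end and by OTAWA are not described here. The counter names and line numbers reuse the pairs quoted in Example 1.

```python
def build_counter_bb_mapping(counter_to_line, line_to_bbs):
    """Compose counter->line and line->BBs, keeping only unambiguous counters.

    A counter is retained only if its source line maps to exactly one
    binary basic block; otherwise it is ignored.
    """
    mapping = {}
    for counter, line in counter_to_line.items():
        bbs = line_to_bbs.get(line, [])
        if len(bbs) == 1:
            mapping[counter] = bbs[0]
    return mapping

# Counter/line pairs as listed by the front-end; the third pair is hypothetical.
counter_to_line = {"cptr_main_1": 144, "cptr_main_2": 147, "cptr_main_3": 150}
# Hypothetical line/BB information, as could be derived from debugging data.
line_to_bbs = {144: ["bb_a"], 147: ["bb_b", "bb_c"]}

retained = build_counter_bb_mapping(counter_to_line, line_to_bbs)
```

Here only `cptr_main_1` is retained: `cptr_main_2` maps to two blocks and `cptr_main_3` to none, so both are dropped, which loses precision but never soundness.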

L I T E S
Example 1 (cont.). Table 2 shows the mapping between counters and blocks that is built by our back-end.
The linear constraints on the retained counters are then translated literally into linear constraints on BBs, and added to the basic ILP system provided by OTAWA.
Example 1 (cont.). The translation of the constraints discovered by PAGAI is the following: x4_main = 10; x6_main = 10; x5_main <= 5; Finally, both systems are solved and the corresponding estimations can be compared.
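The literal translation amounts to renaming variables in each linear constraint, and dropping any constraint that mentions a counter that was not retained. The sketch below assumes a simple in-memory representation (a dictionary of coefficients, a comparison operator and a constant); this representation and the counter/BB pairing used in the example are illustrative assumptions, not the back-end's actual data structures.

```python
def translate_constraint(constraint, counter_to_bb):
    """Rename counters into BB variables; return None if a counter is unmapped."""
    coeffs, op, bound = constraint
    translated = {}
    for counter, coeff in coeffs.items():
        if counter not in counter_to_bb:
            return None                  # constraint cannot be expressed on BBs
        bb = counter_to_bb[counter]
        translated[bb] = translated.get(bb, 0) + coeff
    return (translated, op, bound)

# Assumed pairing between one counter and one ILP variable, for illustration.
counter_to_bb = {"cptr_main_3": "x5_main"}

kept = translate_constraint(({"cptr_main_3": 1}, "<=", 5), counter_to_bb)
dropped = translate_constraint(({"cptr_main_3": 1, "cptr_main_9": 1}, "<=", 7), counter_to_bb)
```

Dropping a constraint as soon as one of its counters is missing is the conservative choice: the ILP keeps fewer constraints, so its optimum can only be larger.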

Traceability and optimization
In our framework, traceability is the ability to relate execution paths in the binary code (bin.CFG) to execution paths in the source code (source CFG).
Some optimizations performed by the compiler may strongly modify the control structure and thus alter traceability: loop unrolling, block replication, out-of-order execution. This is why most of the related works assume no compiler optimization to guarantee a perfect matching between the two CFGs.
However, forbidding optimization is not satisfactory in real-time domains, where execution times have to be predictable, but also short. For a standard compiler like gcc, the observed speed-up between no optimization (-O0 option) and a standard level of optimization (-O1) is around two.
The most satisfactory solution would be a compiler that provides precise traceability even in case of CFG optimization. Some work has been done to design and/or adapt the compilation process for this purpose, for instance [26,31,22].
Unfortunately, off-the-shelf standard compilers such as gcc hardly provide precise and reliable information in case of CFG optimization. The idea is then to use the compiler options in order to forbid (as far as possible) CFG transformations, but still allow other optimizations, in particular those that concern data management.
The gcc compiler proposes numerous options to control optimizations, but there is hardly any comprehensive description of their effects and inter-dependencies. For this experiment, we have empirically defined a customized level (called CO in the sequel). We started from the standard -O1 level, and removed about 20 individual optimizations using the -fno directive (see appendix B). We cannot guarantee that this customized level will preserve the CFG for all programs, but the method is safe: as explained in Section 4.3, a counter (and then a source code line) that is not associated with exactly one basic block of the binary code is simply ignored. As a consequence, the only risk is to lose information that would have made the WCET estimation tighter. Note that this statement supposes that the gcc debugging information is reliable, which is indeed unprovable, but empirically reasonable.

Example 2. When applying the CO method to our running example, we get 100% traceability. As a consequence, the interesting counter property (5-cptr_main_3 >= 0) can still be translated into a BB constraint (x11_main <= 5;), leading to the final result:
Estimation WITHOUT PAGAI: 641
Estimation WITH PAGAI: 421
On this example, we observe that code optimization leads to an initial WCET estimation 2.4x smaller (641 vs 1540). The traceability is preserved and the improvement due to the counter analysis is of the same order (34.3% vs 38.6%).
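The two ratios quoted for the optimized code can be recomputed from the three estimations given in the text (1540, 641 and 421). The helper name below is hypothetical.

```python
def relative_gain(without_lra, with_lra):
    """Relative WCET improvement, in percent, brought by the extra constraints."""
    return 100.0 * (without_lra - with_lra) / without_lra

gain_co = round(relative_gain(641, 421), 1)      # improvement at the CO level
speedup = round(1540 / 641, 1)                   # -O0 estimation vs CO estimation
```

Both values agree with the figures reported above (34.3% and 2.4x).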

Benchmarks
We tested our approach on programs from TacleBench [11], a set of C programs widely used in the WCET community. A first check has been made to retain only purely sequential programs that compile "out of the box": 53 of the 58 applications in TacleBench. For each program, we try to estimate the WCET of all functions appearing in the code, including the top-level one (main). For each function, inner function calls are recursively inlined at the C level (see Section 4.2). Recursive functions are rejected during this step, and not considered for WCET analysis.
Our goal is to study the influence of our counter-based method (Fig. 4) on a classical estimation (Fig. 3). A prerequisite is therefore that a reference estimation exists; hence the programs for which the basic WCET estimation fails are not selected. The OTAWA estimation may fail because of unsupported programming features (pointer arithmetic), or because the analysis does not terminate before a chosen timeout (2 hours).
After this initial selection, 589 functions (out of 639) from the 53 programs of the TacleBench suite are retained.

Experimental setup
The proposed framework as presented in Fig. 4 has numerous parameters (C code instrumentation, linear analysis tuning, compiler optimization, etc.), leading to a combinatorial number of possibilities. For this systematic experiment, we focus only on two kinds of parameters: those that influence the precision of linear analysis, and those that influence the traceability. The other parameters are fixed once and for all as follows:
OTAWA hardware model: our goal is not to bench or "stress" OTAWA in terms of hardware. We only want it to give an initial IPET system in which we will insert flow facts discovered via LRA. In order to maximize the number of test benches for which OTAWA gives an initial ILP in reasonable time, we consider a very simple, cache-free, ARM-based architecture.
Misc. CFG transformations: some CFG transformations are necessary, due to limitations of OTAWA (switch statements not supported) and/or PAGAI (multiple return statements). These transformations are performed using the CIL library [34].
Inlining: because the current version of PAGAI has limited support for inter-procedural analysis, function calls are systematically inlined. This transformation is also implemented using the CIL library. This method improves the precision of the analysis, but makes the analysis much more costly in time and memory.

Figure 7: LRA analysis statistics on 589 functions, for the two relational abstract domains: (a) polyhedra domain, (b) octagon domain.
Loop bounds: as explained in Section 3.2.3, our method is intrinsically unable to bound nested loops, so a complementary method is necessary to find loop bounds. For this purpose, we can use oRange, but it appears that the CFG transformations performed using CIL strongly alter its performance. In order to maximize the size of the benchmark, we thus systematically exploit, when available, the user pragmas given in the source code. Nevertheless, we made a complementary experiment, using neither pragmas nor oRange, in order to identify the cases where LRA is sufficient to bound the execution time.

Lessons learnt
This section presents the lessons learnt from the experiment, by focusing on several points: the ability of the linear analysis to discover "flow facts", and hopefully to enhance the WCET estimation; the influence of the abstract domain on the analysis; the ability of linear analysis to discover loop bounds; and finally the influence of compiler optimizations on traceability.

Linear analysis and flow facts discovery
When traceability allows it, the constraints discovered by linear analysis are directly translated into flow facts giving information on the (im)possible execution paths. These flow facts may be useless if they are redundant with the structural constraints; otherwise they are new facts, giving non-trivial information on the execution paths. However, even new facts can be useless if they do not concern the worst case execution path. A utility has been developed to check whether the facts discovered by LRA are new or not. Each fact is checked by adding its negation to the set of structural constraints: the fact is redundant if and only if the system becomes infeasible. Figure 7 gives statistics on the behavior of the LRA method, for the two relational domains (octagon and polyhedra). Let us focus on the polyhedra case first (a). The PAGAI tool terminates for 509 cases out of 589 (86.4%); for the missing cases (13.6%), it runs out of resources in memory or time. Flow facts are found in 347 cases, and at least one fact is new for 268 of them; finally, new facts lead to a WCET improvement for 76 cases. Statistics are similar for the octagon domain, except that it terminates more often: this explains why the WCET is enhanced more often with octagons, even if this domain is less precise. A possible conclusion is that LRA, when it works, is actually good at finding non-redundant semantic facts (more than half of the time, when it terminates), but that those facts do not necessarily lead to a WCET improvement (about 15% of the termination cases).
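The redundancy test described above (a fact is redundant if and only if the structural constraints together with its negation are infeasible) can be sketched on a toy system. The code below replaces the solver by an exhaustive search over a small bounded integer box, which is an assumption made to keep the sketch self-contained; the variables, constraints and bound are hypothetical.

```python
from itertools import product

def is_redundant(structural, fact, variables, max_value):
    """Return True iff no point satisfies all structural constraints and violates fact.

    The search is restricted to integer points with coordinates in 0..max_value.
    """
    for values in product(range(max_value + 1), repeat=len(variables)):
        point = dict(zip(variables, values))
        if all(c(point) for c in structural) and not fact(point):
            return False                 # counter-example found: the fact is new
    return True

# Toy structural constraints: a loop body executed 10 times, a branch inside it.
variables = ["n_body", "n_then"]
structural = [
    lambda p: p["n_body"] == 10,
    lambda p: p["n_then"] <= p["n_body"],
]

implied = is_redundant(structural, lambda p: p["n_then"] <= 10, variables, 12)
informative = is_redundant(structural, lambda p: p["n_then"] <= 5, variables, 12)
```

The first fact is already implied by the structure and brings nothing, while the second one excludes executions that the structural constraints alone would allow.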

Abstract domains
The main goal of the experiment is to observe the influence of the linear analysis on the WCET estimation. The linear analysis performed by PAGAI is parameterized by the choice of an abstract domain to represent the possible values of the counters. Two domains proposed by PAGAI are relational, and thus are likely to express relations between our counters and the original variables in the programs:
The polyhedra domain is the most precise since it can handle any linear relation, and its algorithmic cost is exponential in the worst case.
The octagon domain handles intervals and bounded pairwise sums or differences. It is less precise, but has a polynomial cost in the worst case: O(n^3) in time, and O(n^2) in space.
To be exhaustive, we also consider the domain box, which handles only intervals. Since this domain is non-relational, it is intrinsically unable to relate our additional counters to the program variables. The flow facts that can be discovered with the box domain are thus limited (basically, counters stuck down to zero, which correspond to dead code).
The WCET estimation is improved by at least one domain for 90 functions. The gain ranges from negligible (0.1%) to interesting (around 10%) or even huge (more than 50%). We limit the comments here to the cases where the enhancement is greater than 0.8%. The detailed results for these 60 cases are given in the appendix (Table 6, page 24), and a selection of typical cases is given in Table 3.
The experiment gives some interesting information:
The interest of the box domain is very limited: it is an indirect way of performing constant propagation and dead code "pruning". Most of the time it gives no improvement (42 out of 60, e.g., md.13, gs.9). However, since it is the cheapest domain, it may give results when other domains fail (6 times, e.g., mp.9).
When both octagons and polyhedra terminate, they often give the same result (34 out of 60 cases, e.g., cr.2, md.13). However, there are some cases (12 out of 60, e.g., gs.9) where the expressiveness of polyhedra is actually useful (constraints involving 3 or more variables, and pairwise relations with non-unit coefficients).
In compliance with the theoretical complexity, octagons may terminate while polyhedra fail (7 cases, e.g., md.14). Nevertheless, there is also one case where octagons fail while polyhedra work (md.5). This is due to the fact that the cost of octagons is almost always cubic in the number of variables, while the exponential cost of polyhedra is rarely reached in practice.
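The difference in expressiveness between the three domains can be made concrete with a small classifier. The sketch below is an illustration only: the encoding of a constraint as a dictionary of integer coefficients and the function name `weakest_domain` are assumptions made for this example, not part of PAGAI.

```python
def weakest_domain(coeffs):
    """Least expressive domain able to state  sum(c_i * x_i) <= k.

    coeffs maps variable names to integer coefficients (hypothetical encoding).
    """
    nonzero = [c for c in coeffs.values() if c != 0]
    if len(nonzero) <= 1:
        # A bound on a single variable is an interval constraint.
        return "box"
    if len(nonzero) == 2 and abs(nonzero[0]) == abs(nonzero[1]):
        # +/-x +/-y <= k (after dividing by the common factor): pairwise
        # sum or difference, which octagons can represent.
        return "octagon"
    # Three or more variables, or two variables with different coefficients.
    return "polyhedra"

# Made-up constraints over counters c1, c2 and program variables i, n.
print(weakest_domain({"c1": 1}))                     # counter bounded alone
print(weakest_domain({"c1": 1, "i": -1}))            # c1 - i <= k
print(weakest_domain({"c1": 1, "c2": 1, "n": -1}))   # c1 + c2 - n <= k
print(weakest_domain({"c1": 1, "n": -2}))            # c1 - 2n <= k
```

The last two cases correspond to the situations mentioned above where only polyhedra help: constraints over three or more variables, and pairwise relations with non-unit coefficients.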

Loop bounds
LRA is intrinsically limited to the discovery of single loop bounds (cf. 3.2.3). We made a complementary experiment to check if and when LRA actually finds such bounds. For this experiment, we only consider the short-list of programs from Table 6 where PAGAI terminates when using a relational domain (octagon or polyhedra); as a matter of fact, using the box domain is irrelevant since it cannot find any loop bound other than 0. For these 54 programs, we have:
computed the loop level, which is the maximal depth of nested loops appearing in the program (0: no loop at all, 1: only single loops, 2 or more: nested loops);
launched our tool using neither oRange nor user pragmas. The LRA is performed twice, with the octagon and the polyhedra domain, and we keep only the best result.
Table 7 (page 25) lists the results; the column "pagai" simply indicates whether the analysis gives a bounded WCET, since the WCET value is, in this case, the same as the one in Table 6.
There are 10 test cases that are loop-free, and thus have no bounds to find. There are 25 programs with only single loops (level=1); these are the cases where PAGAI is supposed to find bounds, and it actually does so in most of the cases (19 out of 25). In fact, PAGAI finds the bounds for all loops that are semantically guarded by a counter condition, that is, for-loops or equivalent. The cases where PAGAI does not find bounds are those where the loop is guarded by a points-to condition (e.g., while (*p++)).
We expected PAGAI not to bound any program with a loop level greater than 1, which is the case except for one program (ex.2). In fact this example is a "false counter-example": the loop depth is syntactically 2, but the inner loop appears in a branch which is never executed. The loop depth is then semantically 1.
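The distinction between counter-guarded and data-guarded loops can be illustrated with two instrumented toy loops. This is a hedged sketch written in Python for uniformity with the other examples; the functions, the auxiliary counter `k` and the list `cells` (standing for the memory scanned by a C loop such as while (*p++)) are hypothetical.

```python
def counted_loop(n):
    """Loop guarded by a counter condition (i < n), like a for-loop."""
    k = 0  # auxiliary counter added by the instrumentation
    i = 0
    while i < n:
        # Linear invariant at the loop head: k == i and k <= n.
        # This is the kind of relation that yields the bound k <= n.
        assert k == i and k <= n
        k += 1
        i += 1
    return k

def sentinel_loop(cells):
    """Loop guarded by the data itself: it stops on the first zero cell."""
    k = 0
    p = 0
    while cells[p] != 0:
        # k == p still holds, but no numerical variable bounds p:
        # the exit depends on memory contents.
        k += 1
        p += 1
    return k

print(counted_loop(5), sentinel_loop([7, 7, 7, 0]))
```

In the first loop the counter is linearly related to a variable that the guard bounds, so a bound on `k` follows; in the second, the relation `k == p` exists but nothing numeric bounds `p`, which matches the cases where no bound is found.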

Optimization level and traceability
The main focus of this work is the influence of linear analysis on the precision of the WCET estimation. Nevertheless, since the analysis is performed at the C level, the problem of the traceability between the C and the binary code must be considered. Forbidding any optimization is not an option in the real-time domain. We argue that a well-chosen set of optimizations can lead to a reasonable compromise between traceability and program speed-up.
For all functions that give some enhancement on the non-optimized code, we ran the experiments using the custom optimization (CO) level defined in 4.4. Since the counter analysis is completely independent of the compilation method, the linear relations found are the same, and the ability to enhance the WCET estimation depends only on traceability.
The detailed results of this experiment are given in the appendix (Table 8, page 26), and a selection of typical cases is given in Table 4. The table gathers the results obtained with the non-optimized binary code O0, and the optimized one CO. For each optimization level, the table gives: the Initial WCET, in CPU cycles, computed by OTAWA; the Best WCET, enhanced thanks to the properties discovered with PAGAI, with some abstract domain; and the corresponding Improvement percentage.
The table also shows the Optimization speed-up, which is the ratio between the initial O0 and the initial CO estimation, i.e., it measures the gain obtained just because of the compilation, before applying the counter method. Finally, for CO compilation, the table gives information on the Traceability: the percentage of counters introduced for LRA at the C level that are actually associated with some basic block at the binary level. Traceability in the O0 mode is not shown in the table as it is always 100%.
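The three derived columns can be written as simple formulas. The sketch below is an illustration with made-up numbers, not values from Table 4; in particular, computing the improvement as the relative decrease from the initial to the best WCET is an assumption, since the text does not spell out the formula.

```python
def improvement_pct(initial_wcet, best_wcet):
    """Relative WCET decrease, in percent (assumed definition)."""
    return 100.0 * (initial_wcet - best_wcet) / initial_wcet

def speed_up(initial_o0, initial_co):
    """Ratio between the initial O0 and the initial CO estimations."""
    return initial_o0 / initial_co

def traceability_pct(mapped_counters, total_counters):
    """Share of C-level counters associated with some binary basic block."""
    return 100.0 * mapped_counters / total_counters

# Made-up example: 1200 cycles at O0, 400 cycles at CO, 360 after adding facts.
print(speed_up(1200, 400))          # gain due to compilation alone
print(improvement_pct(400, 360))    # gain due to the counter method on CO code
print(traceability_pct(9, 10))      # 9 of 10 counters traced to the binary
```

Keeping the speed-up and the improvement separate makes it clear that the latter is measured on top of an already optimized estimation.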
The interesting findings of the experiment are:
Even if the CO level is very limited (a subset of the O1 level, and a fortiori of O2), it generates fairly optimized code: the speed-up is mostly between 2x and 4x.
In most of the cases (53 out of 60) traceability is 100%, and one can observe an enhancement due to LRA similar to the one obtained with O0 code. Note that this improvement is obtained on the CO initial WCET, which is already much smaller than the one obtained for the non-optimized code (e.g., md.13, an.0).

Conclusion and future work
Linear Relation Analysis is a powerful technique to discover invariant linear relations between numerical variables of a program. On the other hand, the classical evaluation of WCET using the Implicit Path Enumeration Technique is based on expressing the WCET as the solution of an Integer Linear Program, the variables of which are counters associated with the basic blocks of the program. So, the idea of adding these counters as auxiliary variables in the program, and using the results of LRA as semantic flow facts to be added to the ILP, is rather natural. Our goal, in this paper, was to conduct a light-weight experiment (by combining existing tools) to evaluate the benefits of the approach. Secondarily, such an experiment raised the question of traceability, since semantic flow facts are discovered on the source program, while the WCET is evaluated on the executable code. The conclusions of this experiment on public benchmarks are manifold:
LRA finds new semantic facts in many examples (46%), but many of these new facts do not influence the evaluated WCET.
The WCET is improved on a significant subset (almost 14%) of the examples, and the improvement is often interesting.
The traceability problems can be safely dealt with, using the debugging information provided by the compiler; this is the case even in the presence of strong compiler optimizations, as long as these optimizations do not modify the control structure of the program too much.
This work could be continued in several directions.
It would be interesting to limit the number of counters, as the cost of LRA can be exponential in the number of variables. Of course, counters which are structurally related by flow equations can be saved, but their cost is low in polyhedra computations (they are linked to each other by equations). An appealing idea would be to introduce counters on the branches of a conditional only when these branches appear to have strongly different execution times, a measure that is roughly available after the micro-architectural analysis [43].
Existing LRA analyzers (like PAGAI) are generally not inter-procedural, which forced us to inline the procedures in our experiments. An inter-procedural version of LRA must be studied to solve this problem. The relational nature of LRA is surely an advantage, since a procedure can be associated with a summary expressed as an input-output relation. Summaries of called procedures can then be used in the caller, in a bottom-up fashion.
Traceability is still a concern, which would benefit from a better cooperation of the compiler [25].

A Experiment Results
The material necessary for reproducing the experiment presented here is freely available at https://gricad-gitlab.univ-grenoble-alpes.fr/verimag/reproducible-research/LRA4w7. The experiment was performed on 589 individual C functions extracted from TACLeBench [11]. An improvement of the WCET estimation is observed for 90 functions (15% of the cases). This section details the results for the 60 cases where the improvement is greater than 0.8%.
Table 5 contains label definitions to ease and shorten the reference to the bench functions: the label (column 1), the source folder in the TACLeBench (column 2), and the function name (column 3).
Table 6 contains the experiment results using the gcc -O0 compilation level. The first column holds the function label, and the second one holds the initial WCET estimation computed by OTAWA. The remaining columns hold information related to the improvement obtained (or not) with linear relation analysis, using 3 different abstract domains: boxes (intervals), octagons and polyhedra. For each domain, the table gives the improvement in number of cycles (∆) and percentage (Imp t), and the time necessary to perform the LRA with PAGAI. Numbers in bold highlight the best improvements among the various methods (box, octagons, polyhedra). Empty cells ('-') mean that the corresponding case triggered the 2 hours timeout set for the experiment.
Table 7 gives information on the ability of PAGAI to discover loop bounds; to obtain this table, the experiments are replayed without the help of any external method (neither oRange nor the user-given pragmas). For each program, the table gives its loop level (maximal depth of nested loops) and indicates whether PAGAI finds a bounded WCET or not.
Finally, Table 8 aims at observing the impact of compiler optimization on WCET estimation in general, and on our method in particular. We consider two optimization levels: the standard -O0 (no optimization at all), and the ad hoc customized -O1 level (designed to limit CFG transformation and maximize traceability). Since the LRA is performed at the C level, the flow facts discovered are the same whatever the optimization level. A lack of improvement in the case of optimized code is then necessarily due to an "imperfect" traceability.
The first group of columns recalls the results obtained with -O0; it only gives the best result, obtained for some abstract domain (refer to Table 6 for details). The second group gives information on the optimized code: the initial WCET estimation given by OTAWA, together with the corresponding speed-up factor, which indicates how much faster the optimized code is compared to the non-optimized one; the best WCET estimation (together with the improvement percentage) obtained using PAGAI; and the traceability ratio, which indicates how many counters introduced by our method are actually associated with some basic block in the binary code. With a traceability of 100%, we expect to observe an improvement percentage of the same order as the one obtained on the non-optimized code. Note that the traceability with the non-optimized code is not given since it is always 100%.

Figure 1
Instrumenting an example program with counters.

Figure 4
Figure 4 shows the proposed workflow for the experiment. It involves two existing components: timing analysis with OTAWA (left) and program analysis with PAGAI (right). Two new tools have been developed to complete the workflow: a front-end (top, Instrumentation), which produces the input for the analyzers (OTAWA and PAGAI), and a back-end (ILP translation & merge), which gathers the results into a more constrained ILP system, and obtains a possibly enhanced WCET estimation.

Figure 8
WCET improvement and analysis time depending on abstract domains (b=box, o=octagons, p=polyhedra).

Some additional work has been done that complements the infeasible path analysis [6,37]; it is not part of the available version of Chronos.

Table 1
Results of tools on programs of Figure 2.

Table 2
Mapping between counters and blocks.

Table 3
Some WCET improvement results (full table page 24).

Table 4
Impact of compiler optimizations on WCET and LRA.

Table 8
Observing the impact of compilation levels on LRA.