Primitive Floats in Coq

Some mathematical proofs involve intensive computations, for instance: the four-color theorem, Hales’ theorem on sphere packing (formerly known as the Kepler conjecture) or interval arithmetic. For numerical computations, ﬂoating-point arithmetic enjoys widespread usage thanks to its eﬃciency, despite the introduction of rounding errors. Formal guarantees can be obtained on ﬂoating-point algorithms based on the IEEE 754 standard, which precisely speciﬁes ﬂoating-point arithmetic and its rounding modes, and a proof assistant such as Coq, that enjoys eﬃcient computation capabilities. Coq oﬀers machine integers, however ﬂoating-point arithmetic still needed to be emulated using these integers. A modiﬁed version of Coq is presented that enables using the machine ﬂoating-point operators. The main obstacles to such an implementation and its soundness are discussed. Benchmarks show potential performance gains of two orders of magnitude.


Motivation
The proof of some mathematical facts can involve a numerical computation in such a way that trusting the proof requires trusting the numerical computation itself.Thus, being able to efficiently perform this kind of proofs inside a proof assistant eventually means that the tool must offer efficient numerical computation capabilities.Floating-point arithmetic is widely used in particular for its efficiency thanks to its hardware implementation.Although it does not generally give exact results, introducing rounding errors, rigorous proofs can still be obtained by bounding the accumulated errors.There is thus a clear interest in providing an efficient and sound access to the processor floating-point operators inside a proof assistant such as Coq.R := 0; for j from 1 to n do for i from 1 to j − 1 do Ri,j := Ai,j − Σ i−1 k=1 R k,i R k,j /Ri,i; end for Rj,j := Mj,j − Σ j−1 k=1 R k,j2 ; end for Figure 1 Cholesky decomposition: given A ∈ R n×n , attempts to compute R such that A = R T R.

Proofs Involving Numerical Computations
We give below a few examples of proofs involving floating-point computations.
As a first example, consider the proof that a given real number a ∈ R is nonnegative.One can exhibit another real number r such that a = r 2 and apply a lemma stating that all squares of real numbers are nonnegative.Typically, one could use the square root √ a.A similar method can be applied to prove that a matrix A ∈ R n×n is positive semidefinite 1 as one can exhibit R such that 2 A = R T R. Such a matrix can be computed using an algorithm called Cholesky decomposition, given in Figure 1.The algorithm succeeds, taking neither square roots of negative numbers nor divisions by zero, whenever A is positive definite 3 .
When executed with floating-point arithmetic, the exact equality A = R T R is lost but it remains possible to bound the accumulated rounding errors in the Cholesky decomposition such that the following theorem holds under mild conditions.Theorem 1 (Corollary 2.4 in [34]).For A ∈ R n×n , defining c := (n+1) 1−2(n+1) tr(A) + 4n (2(n + 1) + max i A i,i ) η, if the floating-point Cholesky decomposition succeeds on A − c I, then A is positive definite.and η are tiny constants given by the floating-point format used.
A formal proof in Coq of this theorem can be found in a previous work [33].Thus, an efficient implementation of floating-point arithmetic inside the proof assistant leads to efficient proofs of matrix positive definiteness.This can have multiple applications, such as proving that polynomials are nonnegative by expressing them as sums of squares [26] which can be used in a proof of the Kepler conjecture [24].Interval arithmetic constitutes another example of proofs involving numerical computations.Sound enclosing intervals can be easily computed in floating-point arithmetic using directed roundings, towards ±∞ for lower or upper bounds.The Coq.Interval library [25] implements interval arithmetic and could benefit from efficient floating-point arithmetic.
More generally, there are many results on rigorous numerical methods [35] that could see efficient formal implementations provided efficient floating-point arithmetic is available inside proof assistants.

Objectives
The Coq proof assistant has built-in support for computation, which can be used within proofs, and recent progress have been done to provide efficient integer computation (relying on 63-bit machine integers).
The overall goal of this work is to implement efficient floating-point computation in Coq, relying directly on machine binary64 floats, instead of emulating floats with pairs of integers.Experimentally, that latter emulation in Coq incurs a slowdown of about three orders of magnitude with respect to an equivalent implementation written in OCaml.

Outline
The article is organized as follows: Section 2 provides the background required to position our approach, from proof-by-reflection to the IEEE 754 standard for floating-point arithmetic to interval arithmetic formalized in Coq.Section 3 is devoted to the implementation itself, with a special focus on the interface that its exposes.Section 4 gathers a discussion on several design choices or technicalities that have been important to carry out the implementation and avoid some pitfalls.Section 5 provides benchmarks to evaluate the performance of the implementation.Section 6 finally gives concluding remarks and perspectives for future work.

Prerequisites and Related Works
In this section, we start by reviewing the two main features that underlie and motivate our work in the Coq proof assistant: Poincaré's principle and the availability of efficient reduction tactics (in Section 2.1).We then give an overview of all notions of floating-point arithmetic that appear necessary to make this paper self-contained (in Section 2.2).We finally summarize the features of two related Coq libraries that are either a prerequisite for our developments (in Section 2.3), or an important building block for a possible extension of this work (in Section 2.4).

Proof by Reflection and Efficient Numerical Computation
In the family of formal proof assistants, the underlying logic of several systems -including Agda, Coq, Lego, and Nuprl [2] -provides a notion of definitional equality that allows one to automatically prove some equalities by a mere computation.This feature is called Poincaré's principle in reference to Poincaré's statement that "a reasoning proving that 2 + 2 = 4 is not a proof in the strict sense, it is a verification" [32, chap.I].Based upon this principle, the so-called proof by reflection methodology has been developed to take advantage of the computational capabilities of the provers and build efficient (semi)-decision procedures [7]: this approach has been successfully applied to various application domains, such as: graph theory, with the formal verification of the four-color theorem in Coq by Gonthier and Werner [14], discrete geometry, with the formal proof of the Kepler conjecture developed in the Flyspeck project [17], Boolean satisfiability, with the verification of SAT traces in Coq [1], satisfiability modulo theories, with the development of the SMTCoq library [13], or global optimization, with the development of the ValidSDP library [26].
To be able to address the verification of increasingly complex proofs relying on this approach, works have been carried out to increase the computational performance of proof assistants, relying on two complementary approaches: (i) implement alternative evaluation engines, such as evaluators based on compilation to bytecode or native code, and (ii) optimized data structures that might be based on machine values and hardware operators.
For example, the Isabelle proof assistant provides (i) several evaluators that can be used within proofs, and allows one to generate Standard ML, OCaml, Haskell, or Scala code, then (ii) libraries of fast machine words (for fixed size or unspecified size) have been developed while ensuring compatibility with all Isabelle's target languages and evaluators [23].

7:4 Primitive Floats in Coq
In this work, we specifically focus on the Coq proof assistant which offers in particular (i) the reduction tactics vm_compute, involving bytecode compilation and evaluation by a virtual machine [15] and native_compute, involving code generation and native OCaml compilation [3], as well as (ii) machine integers, upon which the Bignums library for multipleprecision arithmetic has been developed [16].
Regarding machine integers in Coq, the original implementation by Spiwack [1,39] was based on the so-called retro-knowledge approach, which consisted in developing a reference implementation of 31-bit integer operators in Coq (using lists of bits), then optimizing their evaluation in vm_compute (and later native_compute) by replacing the considered Coq operator on-the-fly with the corresponding hardware operator.The implicit assumption here is that both implementations match.This implementation has been recently replaced with so-called primitive integers 4 [12]: this approach required adding a representation of 63-bit machine integers in the kernel, and has the two-fold benefit of offering efficient operators for all reduction strategies with a compact representation of integers, and making explicit the axioms that specify the primitive operators.
The overall aim of this work is to provide a similar facility for floating-point arithmetic, to be able to compute with primitive floating-point numbers in Coq, instead of emulating floating-point numbers with pairs of integers.
A facility to compute with floating-point numbers for prototyping purposes is available in the PVS proof assistant thanks to the PVSio package [31] but to the best of our knowledge, no proof assistant currently provides support for machine floating-point computations in the scope of proof by reflection.

Floating-point Arithmetic
This section reviews the main concepts of floating-point arithmetic used in the remainder of this paper.The reader interested in more details could find them in reference books [30].
Computing in floating-point arithmetic amounts to performing calculations in what is often called scientific notation with one digit before the dot, a fixed number of digits following it and a power of ten specifying the position of the dot, hence the name floating-point arithmetic.When results do not fit in the required precision, they have to be rounded, e.g., with a precision of five digits, 1.234 • 10 2 + 5.678 • 10 −1 = 1.240 • 10 2 .

IEEE 754 Standard
Implementations of floating-point arithmetic in hardware nowadays adhere to the IEEE 754 standard [19].This standard prescribes sets of floating-point numbers, mostly as subsets of the real numbers field R, binary representations for them, rounding modes and basic arithmetic operators +, −, ×, ÷ and √ • defined as functions giving the same result as the operator in the real field composed with a rounding.
A floating-point format F is a subset of R such that x ∈ F when for some m, e ∈ Z, |m| < β p and e min ≤ e ≤ e max − p.The integer m is called the mantissa of x and e its exponent 5 .The constants β and p are called respectively the radix and precision of the format F while the constants e min and e max define the exponent range of F. Some floating-point values can have multiple representations, e.g., 1230 • 10 2 = 123 • 10 3 .To get a canonical representation, |m| ≥ β p−1 is enforced as soon as |x| ≥ β p−1+emin .In other words, all the space allowed by the precision is used for the mantissa.Mantissas smaller than β p−1 are only used for tiny values x such that β emin ≤ |x| < β p−1+emin , called denormalized numbers.Finally, 0 can get a canonical representation by arbitrary choosing an exponent.

Binary64 Format
The IEEE 754 standard defines multiple formats in radix β = 2 and β = 10 and various precisions.In the remaining of this paper, binary64 will be the only format considered 6 .This is a binary format, i.e. β = 2, offering a precision of p = 53 bits and its minimal and maximal exponents are respectively e min = −1074 and e max = 1024.As its name suggests, this format enjoys a binary representation on 64 bits as follows: sign exponent (11 bits) mantissa (52 bits) The exponent is encoded on 11 bits while the mantissa is encoded as its sign and its absolute value on 52 bits One is used for denormalized numbers, and 0 when the mantissa is 0, the other for special values NaN, and infinities when the mantissa is 0. The two infinities −∞ and +∞ are used to represent values that are too large to fit in the range of representable numbers.Similarly, it is worth noting that due to the sign bit, there are actually two representations of 0, namely −0 and +0.The standard states that these two values should behave as if they were equal for comparison operators =, < and ≤.However, they can be distinguished since 1 ÷ (+0) returns +∞ whereas 1 ÷ (−0) returns −∞.Finally, NaN stands for "Not a Number" and is used when a computation does not have any mathematical meaning, e.g., 0 ÷ 0 or √ −2.NaNs propagate, i.e., any operator on a NaN returns a NaN.Moreover, comparison with a NaN always returns false, in particular both x < y and x ≥ y are false when x is a NaN, as well as8 x = x.Thanks to the mantissa and sign bits, there are actually 2 53 − 2 different NaN values.These payloads can be used to keep track of which error created the special value but they are only partially specified by the standard and are in practice hardware dependent.

Precise Specification of Rounding Modes
From a formal point of view, a key definition introduced by the IEEE 754 standard is the notion of rounding.For a given floating-point format F, a rounding is an increasing function : R → F ∪ {±∞} whose restriction to F is identity, that is: The IEEE 754-2008 standard [19] defines five standard rounding modes:

7:6
Primitive Floats in Coq toward −∞: RD(x) is the largest floating-point number ≤ x; toward +∞: RU(x) is the smallest floating-point number ≥ x; toward zero: RZ(x) is equal to RD(x) if x ≥ 0, and to RU(x) if x ≤ 0; to nearest even: RNE(x) is the floating-point number closest to x.
In case of a tie: the one with an even mantissa; to nearest away from zero: RNA(x) is the floating-point number closest to x.
In case of a tie: the one with the largest mantissa in absolute value.In this work, we will only rely on the RNE rounding, which is the default rounding mode in most floating-point programming environments.See Section 4.1 for a more in depth discussion of this point.
Then, all floating-point operators are required to be correctly rounded, that is to say, they should behave as if they were computed with an infinitely precise mantissa, then rounded according to the specified rounding mode.To be more precise, for a given floating-point format F, operator * : R × R → R, and rounding mode : R → F, a correctly-rounded implementation * of * should verify: The benefits of this definition are two-fold: all floating-point operators that are correctly-rounded (the 2008 revision of the standard requiring this for +, −, ×, ÷, √ •) are fully-specified, which straightforwardly ensures the reproducibility of the results; it allows one to devise floating-point algorithms that directly rely upon this specification, as exemplified in the upcoming Section 2.2.2.

Error Free Transformations
Noticing that the rounding error of a floating-point addition is itself a floating-point number, algorithms such as Fast2Sum [11] and 2Sum [21,28] can compute that exact error, taking advantage of correct rounding.
These two "compensated summation algorithms" fall into the larger class of error-free transformations [22,37] which constitute an essential building block in the development of extended precision floating-point algorithms.

Standard Model
Although precise specifications are known for roundings, hence for basic arithmetic operators, a simpler model is commonly used to prove compound bounds of rounding errors on larger expressions [18].Despite being weaker, this model is more amenable to algebraic proofs, whether pen and paper or mechanized.Called standard model of floating-point arithmetic, it states the following main properties in the absence of overflow where and η are tiny constants depending on the floating-point format10 .As a recent example, the following result is proved in a slightly refined standard model [20].
Theorem 2 (Theorem 4.1 in [20]).For x ∈ F n , denoting ŝ the sum n i=1 x i computed with floating-point arithmetic in any order 11 , assuming no overflow occurs, it satisfies Coq proofs of such results can be performed, and are at the core of the proof of Theorem 1 [33].

The Flocq Library
Flocq [5,6]  With an unbounded exponent range, i.e., without underflow nor overflow.Although unrealistic, this model is attractive for its simplicity and commonly used for error bounds [18].With an exponent range only lower bounded, i.e., with underflow but without overflow.This may still seem unrealistic but overflows can often be studied separately which usually proves much harder for underflows [33].
A binary model of the binary32 and binary64 formats defined in the IEEE 754 standard, with underflows, overflows to infinities, signed zeros and NaNs with payloads.This model is used in the verified C compiler CompCert [4].Along with these models and links between them, the library contains many classical results about roundings, about some error-free transformations as presented in Section 2.2.2, and basic properties of the standard model described in Section 2.2.3.
The library is mainly developed by Sylvie Boldo and Guillaume Melquiond and is available at URL http://flocq.gforge.inria.fr/.

The Coq.Interval Library
Another Coq library could benefit from efficient floating-point arithmetic: Coq.Interval [25], which offers a modular formalization of interval arithmetic.First, module types (a.k.a.signatures) are defined for floating-point and interval operators.Then, several implementations of the floating-point signature are provided, relying on the Flocq library and specifically its model with unbounded exponent range.A generic implementation is provided, as well as a specialized implementation assuming radix 2 and representing mantissa and exponent as pairs of integers from Bignums.Next, a parameterized module implements interval operators where intervals are pairs of floating-point numbers, and related computations are performed using directed roundings, towards −∞ or +∞.Elementary functions such as exp, ln or atan are provided among these interval operators, but correct rounding is not guaranteed (namely, the computed intervals can be overestimated, albeit the containment property always holds and has been formally proved).Finally, tactics interval (decision procedure) and interval_intro (for forward reasoning) are provided to automatically and formally prove inequalities on real-valued expressions.
The library is mainly developed by Guillaume Melquiond and is available at URL http://coq-interval.gforge.inria.fr.ldshiftexp f e returns f × 2 e−shift .This is the reverse of frshiftexp and it is exact except when underflow or overflow occurs, in which case the result is rounded using RNE.
Converts a primitive integer to a floating-point value.Since primitive integers are unsigned 63-bit integers, they do not all fit into the 53-bit mantissas of the binary64 format.Values that do not fit are rounded using RNE .
Finally, two functions compute the successor and predecessor of a floating-point value.They can be used to implement interval arithmetic for instance.
Equipped with this interface, the Coq user can now perform floating-point computations using the processor operators and any of the evaluation mechanisms provided by Coq.

Specification
Although floating-point computations are possible, they remain entirely useless in proofs at this point, since there is no specification of their behavior.We thus need a Coq specification of floating-point arithmetic.First of all, the set of floating-point values itself has to be specified 15 .

7:10 Primitive Floats in Coq
This is similar to the full_float type in the IEEE754.Binary module of the Flocq library except for one point: the sign and payload of NaNs are not modeled here.It is also worth noting that this models much more values than the binary64 format 16 since no bounds on mantissas nor exponents are enforced.This makes for a simple specification.Then, each of the above operators must be specified on this spec_float type.This specification is mostly borrowed 17 from the IEEE754.Binary module of the Flocq library and totals 398 lines in our implementation 18 .We thus only detail the multiplication operator.We first need to define a few characteristics of the binary64 format as seen in Section 2. When |x| ∈ 2 e−1 , 2 e , then fexp e is the exponent used to encode x in the binary64 format.
As seen in Section 2.2.1.2,the floating point multiplication is defined by x ⊗ y = (x × y).When x = m x 2 ex and y = m y 2 ey , then x × y = (m x × m y ) 2 ex+ey and the rounding operator has to remove the extra bits in the mantissa to make this value fit in the format.To this end, we first abstract the bits to remove as two booleans, the rounding bit remembers the first forgotten bit whereas the sticky bit is true when any of the remaining forgotten bits is 1 and false when they are all 0. The function shr_1 then shifts a mantissa one bit to the right, updating the rounding and sticky bits accordingly  In addition to the usual operators, two functions are defined going back and forth from primitive floats to specification floats.Definition Prim2SF : float -> spec_float.Definition SF2Prim : spec_float -> float.
Finally, one needs to establish a link between the primitive operators and the specification.This is done by adding axioms to the system. 19First, to specify the two functions Prim2SF and SF2Prim above, one needs to characterize those values of type spec_float that actually represent a binary64 floating-point number, i.e., values with appropriately bounded mantissa and exponent.Again, this code comes from the Flocq library [5].So equipped, the following three axioms can be stated: Axiom Prim2SF_valid : forall x, valid_binary (Prim2SF x) = true.Axiom SF2Prim_Prim2SF : forall x, SF2Prim (Prim2SF x) = x.Axiom Prim2SF_SF2Prim : forall x, valid_binary x = true -> Prim2SF (SF2Prim x) = x. 19See file theories/Floats/FloatAxioms.v in the implementation.

7:12 Primitive Floats in Coq
These properties allow one to prove that both Prim2SF and SF2Prim are injective and thereby form a bijection between primitive floats and the subset of valid specification floats.
Thus, all of the fifteen operators given in Section 3.1 are linked to their specification by an axiom such as, for the multiplication: Axiom mul_spec : forall x y, Prim2SF (x * y)%float = SFmul (Prim2SF x) (Prim2SF y).
Since the specification is almost identical to the IEEE754.Binary module of Flocq, a link with Flocq is straightforwardly built 20 , establishing a bridge towards real numbers and giving access to all the results already proved in the library.This plays a key role in enabling actual proofs using primitive floating-point computations.Moreover, this enables to gain additional confidence in the above non trivial specification, since Flocq contains correctness theorems basically stating for instance 21 that, except when overflow occurs, SFmul x y is indeed the rounding of the real number x × y.

Implementation
The implementation was submitted to be integrated in Coq through the GitHub pull request https://github.com/coq/coq/pull/9867.
Below is an overview of the size of the development at the time of writing, summarized by sub-components (over the ≈ 3.7 kLoC added).This implementation required the addition of some code in the kernel of Coq.Most of it only consists in wrapping the floating-point operators into the different evaluation mechanisms of Coq and its core, actually dealing with floating-point arithmetic, can be found in the files kernel/float64.ml,kernel/byterun/coq_interp.c and kernel/byterun/coq_-float64.h.Most operators are implemented in C, as required by the vm_compute mechanism, and boil down to calls to the appropriate functions of the C standard library.Thus, no involved algorithmic happens in this added code itself.

Rounding Modes
We implement only one of the five rounding modes defined in the IEEE 754-2008 standard, namely rounding to nearest even (RNE).We argue here that implementing other rounding modes would not only easily be seriously harmful in terms of performance, notwithstanding the potential threat to soundness of the implementation, but also not very useful.
Unfortunately on most common processors, operators with different rounding modes are not implemented using different opcodes but a status flag.Once the flag is set to a particular rounding mode, all subsequent computations are performed with this rounding mode.Changing the rounding mode is then costly as it requires flushing pipelines.
Interval arithmetic constitutes the main use of rounding modes other than RNE we can foresee in a proof assistant.A common solution to the aforementioned performance issue is to set the rounding mode once to +∞ (RU), used to compute upper bounds, and emulate rounding toward −∞ (RD), used to compute lower bounds, by relying on properties like 22 RD(x + y) = − RU((−x) + (−y)).Although a monadic interface could be a reasonable implementation, this remains an imperative programming feature and doesn't integrate well within the functional paradigm offered by Coq.Moreover, if no particular care is taken to avoid or disable them, wild compiler optimizations -assuming that only RNE is used -could easily break the previous property, thus ruining the soundness of the whole approach.
However, interval arithmetic doesn't require precise directed roundings but only overand under-approximations thereof.We thus offer the next_up and next_down functions, computing the successor and predecessor of a floating-point value.Together with rounding to nearest operators, they satisfy the following property, ensuring soundness of interval arithmetic while providing a reasonably precise approximation of directed roundings:

Parsing and Pretty-Printing
Parsing and pretty-printing floating-point values is a non trivial question.We expect the following main property: printing a floating-point value and then reparsing the output of the printing function should give the initial value, i.e., parse • print should be the identity over binary64.It is worth noting that this necessarily implies the injectivity of the printing function.However, we don't require the parsing function to be injective, i.e., we do accept that multiple strings are parsed as the same floating-point value.
A simple solution would be to print an exact hexadecimal representation of the floatingpoint values, with a binary exponent, e.g., "0xcp-3".This fulfills the above requirement.Unfortunately, this is not very user-friendly.A decimal output would be much more human readable, e.g., "1.5" instead of "0xcp-3".
It is known that printing binary64 values using at least 17 significant digits and implementing parsing as a rounding to nearest guarantees the above requirements [30,Table 2.3,p. 44].This is thus the adopted solution.The current version of Coq only offers support for parsing and printing integer constants, so we extended this support 23 to decimal constants using the ubiquitous format integer_part .fractional_part e decimal_exponent , e.g., "1.23e-4". 22The opposite x → −x being exact in floating-point arithmetic (the sign bit is simply flipped). 23See the pull request https://github.com/coq/coq/pull/8764.I T P 2 0 1 9

Soundness
During our development, we identified three main potential threats to soundness: Specification Issues due to a mismatch w.r.t. the implementation would break the soundness.
We hope that taking in extenso our specification from the Flocq library, resulting from a few decades of experience in the field and proving links with other models, mitigates this risk.Moreover, such an error in the specification can only be harmful when the corresponding axiom is used.It is worth noting that all the axioms used in a proved theorem explicitly appear in the result of the Coq command Print Assumptions.
Incompatible Implementations in different evaluation mechanisms (compute, vm_compute or native_compute) or even on different machines could lead to a proof of False by evaluating a same term to different results.For instance, the payload of NaNs is not fully specified by the IEEE 754 standard and different hardwares can produce different NaNs for a same computation.That's why we chose to consider all NaNs as equal and not distinguish them.Thus incompatible implementations at the bit level remain compatible at the logical level.Double roundings due to the x87 on old 32 bits architectures [29] could also be harmful.The OCaml 24 compiler systematically relies on it, forcing us to implement all floating-point operators in C and to use the appropriate compiler flags.A runtime test 25 is eventually added to prevent Coq from running in case of miscompilation.
Another extreme example of implementation discrepancy would be a hardware bug such as the one encountered in the division of the early Pentium processors.
Incorrect Convertibility Test that distinguish two values that shouldn't or vice versa is also a threat.For instance, implementing this test using the equality test on floating-point values (as defined in the IEEE 754 standard) would be wrong as it equates −0 and +0 which should be distinguished since 1 ÷ (−0) = −∞ = 1 ÷ (+0) = +∞.Fortunately enough, this keeps a very simple implementation, with the following OCaml code: A few other, more minor, points appeared during the development.Among them, the fact that primitive integers in Coq are unsigned did require some care 26 .Finally, the way OCaml optimizes arrays 27 of floating-point values 28 did cause a few nasty bugs, although it is unlikely that such bugs could lead to a proof of False as they often yield a mere segmentation fault. 24The implementation language of Coq. 25 See file kernel/float64.ml in the implementation. 26We indeed fixed a few soundness bugs in primitive integers, pertaining with unsigned integers, before they were merged in Coq master development branch (https://github.com/coq/coq/pull/6914). 27 Arrays are used to communicate environments between the OCaml implementation of the kernel and the C implementation of the vm_compute virtual machine. 28This causes other issues in OCaml itself and seems to be a hot topic currently in the OCaml community [9].

Benchmarks
The overall objective of this work is to increase the performance of reflexive tactics involving floating-point arithmetic in Coq.Thus we first measure the performance gain on such a tactic, then evaluate it on its individual floating-point operators.We first present the reference problems under study (Section 5.1), then recap the hardware and software setup for these benchmarks (Section 5.2), and finally give the experimental results (Section 5.3).

Reference Test-suite
We developed a reflexive tactic posdef_check, performing some matrix positive definiteness check along the lines of Theorem 1 introduced in Section 1.1.Its implementation was adapted by reusing building blocks from our previous work on the validsdp tactic for multivariate polynomial positivity [26].This tactic is available in four flavors using vm_compute or native_compute and emulated floats or primitive floats.Emulated floats are a state of the art implementation of floating point arithmetic, based on primitive integers, from the Coq.Interval library whereas primitive floats are our new implementation.
Regarding the test-suite, we generated a set of random positive definite matrices (after fixing a given seed to make the random data reproducible) of size 50×50 up to 400×400.
We perform two kinds of benchmarks on this test-suite: the overall speedup between the versions of posdef_check using emulated vs. primitive floats; and the individual speedup in floating-point operators involved in this tactic.

Hardware/Software Setup
The formalization of the posdef_check tactic relies on a large set of dependencies that takes around one hour to compile.For greater convenience, we devised some Docker images containing the benchmark environment, based on Debian Stretch, opam 2 (the OCaml package manager) and OCaml 4.07.0+flambda.The source code of all benchmarks as well as guidelines to install Docker and run the benchmarks are gathered on GitHub at this URL: https://github.com/validsdp/benchs-primitive-floats/tree/1.0 The use of Docker (a so-called OS-level virtualization system) for these benchmarks yields a number of interesting features, beyond the facility to download and run a pre-built image on different machines: it runs containers in an isolated environment from the host machine, it ensures portability (across OSes such as GNU/Linux, macOS and Windows) and reproducibility, while being more lightweight than traditional virtual machines (VMs).
The experimental results of the upcoming Section 5.3 have been obtained using a Debian GNU/Linux workstation based on a Intel Core i7-7700 CPU clocked at 3.60 GHz, with 16 GB of RAM.All benchmarks have been executed sequentially (namely, without the -j option of make), with a total elapsed time of about 3h35', using the following image: "docker pull registry.gitlab.com/erikmd/docker-coq-primitive-floats/master_compileredge:9_coq-2ac1f46532264bacf2b1d8f5b6ee3659fe0cde67".

Experimental Results
We first measure the execution time of the whole tactic on the test-suite and compare it between emulated floats and primitive floats.The results are displayed in Table 1 for vm_compute and native_compute.Each timing is measured 5 times.The tables indicate the corresponding average and relative error among the 5 samples.One can notice that the obtained speedups are far from the three order of magnitudes separating emulated floats from equivalent OCaml implementations.From the above results, it appears that arithmetic operators constitute most of the computation time with emulated floats (at least 95% with vm_compute) but nothing tells us this is still the case with primitive floats.In fact, with primitive floats, most of the computation time is dedicated to list manipulating functions as our matrices are implemented using lists29 [8].Thus, we would like to get an idea of the time actually devoted to floating-point arithmetic in the total proof time of our reflexive tactic.We use the following simple methodology: replace each arithmetic operator with a version, uselessly, performing the computation twice30 , then subtract the execution time of the original program ("Op" in the tables) to the one of this modified program ("Op×2" in the tables).The obtained time ("Op time" in the tables) corresponds to the time devoted to the considered arithmetic operator.Note that the redundant computations involved in the modified program ("Op×2") could not be implemented with a mere additional let-in such as ...let m1 := mul a b in let m2 := mul a b in m2 because the virtual machine and the OCaml native compiler would optimize away the unused local definition; but doing so and adding an extra function call ...in select m1 m2 with Definition select (a b : F.type) := a. made it possible to use this doubling trick.The results are given in Table 2 for vm_compute and Table 3 for native_compute, in each case both on addition and multiplication 31 .Again, each timing is measured 5 times.It is worth noting that those last results should be taken more as coarse orders of magnitude than precise results.In particular, due to the overhead stemming from the duplication itself of the operators 32 , the speedups are -maybe seriously -underapproximated.Actual speedups could thus be higher than the ones suggested here.

Conclusion and Future Work
We developed a theory of floating-point arithmetic for the Coq proof assistant, composed of primitive implementation of basic arithmetic operators (+, −, ×, ÷, √ •), using the processor floating-point operators in rounding-to-nearest even, as well as successor and predecessor operators that can be used to approximate directed roundings to −∞ or +∞.This implementation is axiomatized under the assumption that the processor complies with the IEEE 754 standard for floating-point arithmetic.Particular care has been taken to make the implementation compatible across the different reduction engines of Coq, and across different hardware, thereby avoiding soundness issues that could be caused, for example, by the semantics of NaN payloads that is under-specified in the IEEE 754 standard.
We evaluated the performance on an implementation -carried out in Gallina, the input language of Coq -of a Cholesky decomposition that underlies a reflexive tactic for matrix positive definiteness, and the experimental results indicate a speedup of two orders of magnitude for arithmetic operators using vm_compute.This is consistent with the performance factor of about three orders of magnitude observed between floating-point arithmetic emulated using primitive integers in Coq and equivalent implementations written in OCaml.Now that primitive floats are available in a proof assistant, multiple future works can be envisioned.The most obvious one would be to adapt the Coq.Interval library to take advantage of primitive floats.Still in this direction, it is known that the successor and predecessor functions, used to approximate directed roundings, can be efficiently implemented using only arithmetic operators [36,38].Such an implementation could enable to remove these functions from the trusted code base.It would also be interesting to look at more elaborate elementary functions such as exp or arctan, relying for example on the CR-libm I T P 2 0 1 9

7:18
Primitive Floats in Coq implementation [10].Finally, in an attempt to improve confidence in the consistency between specification and implementation, and while waiting for a fully formally specified hardware interface, it is worth noting that this consistency is amenable to some intensive automatic testing, although exhaustive testing is out of reach for even unary operators on binary64.

7 .
One can notice that, out the of the 2048 values enabled by the 11 bits of exponent, two are unused when encoding exponents in the range [e min , e max −p] = [−1074, 971].
OCaml and C: 1815 LoC (floats − → kernel : 1070) (vm_compute support: 255) (native_compute support: 355) (parsing and pretty-printing: 85) (Coq checker: 50) Coq specifications: 620 LoC [mostly borrowed from Flocq] Coq proofs: 340 LoC Tests: 800 LoC Sphinx documentation: 115 LoC is a Coq library offering a very generic formalization of floating-point arithmetic.Radix and precision can be fully parameterized and floating-point values are defined, similarly to (1), as a subset of the real numbers R provided in the Coq standard library [27, Chapter 1].More specifically, multiple models are available: Finally, the rounding function first shifts the mantissa, rounds it, shifts the result one bit to the right in case the rounding added an extra bit and handles potential overflows 16spec_float gathers an infinite number of values, whereas binary64 only contains finitely many values.17Exceptfor the specifications of frexp, ldexp, normfr_mantissa, succ and pred which were not yet present in Flocq and which we took the opportunity to add https://gitlab.inria.fr/flocq/flocq/merge_requests/3. 18See file theories/Floats/SpecFloat.v in the implementation.| S754_finite sx mx ex, S754_finite sy my ey => binary_round_aux (xorb sx sy) (Zpos (mx * my)) (ex + ey) end.

Table 1
Proof time for the reflexive tactic posdef_check.

Table 2
Computation time for individual operators with vm_compute.

Table 3
Computation time for individual operators with native_compute.