Acquiring Maps of Interrelated Conjectures on Sharp Bounds

To automate the discovery of conjectures on combinatorial objects, we introduce the concept of a map of sharp bounds on characteristics of combinatorial objects, which provides a set of interrelated sharp bounds for these combinatorial objects. We then describe the Bound Seeker, a CP-based system that gradually acquires maps of conjectures. The system was tested on the search for conjectures on bounds on characteristics of digraphs: it constructs sixteen maps involving 431 conjectures on sharp lower and upper bounds on eight digraph characteristics.


Introduction
Research on conjecture-making systems in the context of discrete mathematics goes back to the late 1950s and the 1980s [8,14,32] and has recently attracted renewed interest [20,21,29,31]. Within CP, some initial research on the generation of implied constraints was done by Charley et al. [11], and the most recent work focuses on model and constraint acquisition [4,7,10,19,27,28] rather than on conjecture making. Within OR, Hansen's AutoGraphiX system [1,17] focuses on finding unrelated bounds using Variable Neighbourhood Search.
Four reasons motivate our work: (i) to highlight that CP can contribute to the automatic discovery of conjectures; (ii) to systematically search for sharp bounds on characteristics of objects that show up in combinatorial problems; (iii) to stress the need for conjecture discovery programs that build up a body of strongly interrelated knowledge rather than unrelated conjectures, as has been the case so far; (iv) to address the fact that bounds are an essential feature of branch-and-bound methods in optimisation but also a weakness of CP [16,22]: the development of sharp bounds that consider several interrelated characteristics is still a manual process [3,6]. Our approach is unique among works on conjecture generation, as the result is not a set but a graph of conjectures, linked by projection (i.e. variable elimination) operators. Our contributions are as follows. We introduce the concept of a map of sharp bounds as a set of interrelated conjectures providing sharp lower and upper bounds wrt the characteristics of a combinatorial object.
For each conjecture on a sharp bound, the map gives some extremal characteristics, i.e. the characteristic values common to all combinatorial objects achieving the bound. By introducing secondary characteristics and by permitting the use of common sub-expressions in a polynomial, as well as simple Boolean and conditional formulae, we tend to produce explainable conjectures. This also reveals unified conjectures across different subsets of characteristics.
We demonstrate the usefulness of CP for acquiring such maps: using digraphs as combinatorial objects, the system produces 431 conjectures distributed over 16 maps obtained from 8 characteristics combined with lower and upper bounds. It retrieves a set of known results, enhances some known bounds, and comes up with new conjectures, some of which we proved to be true. The significance of maps is twofold. Beyond sharp bounds, a map brings together, under the same edifice, the relations between several sharp bounds and the structure of the combinatorial objects reaching each bound. A map can also be used to test the mutual consistency of independently acquired bounds by verifying that one bound can be derived from another bound.
In Sect. 2, we introduce the concept of a map that presents a set of conjectures for sharp bounds and their logical relations. In Sect. 3.1, we provide the workflow of our acquisition system. We introduce, in Sect. 3.2, a parameterised CP conjecture generator. We evaluate the produced conjectures in Sect. 4, discuss related work in Sect. 5, and conclude in Sect. 6.

Conjectures map as a symbolic piece of knowledge
After providing an informal overview of maps of conjectures and a first example of a map, we motivate, define and illustrate the map concept. Then we show how the use of secondary characteristics permits both acquiring formulae that share common sub-expressions and, sometimes, coming up with the same bound for different subsets of input characteristics.
Informal overview of maps. Consider digraphs as an example of combinatorial objects. It is well known that any digraph $G$ satisfies the following invariant: the number of arcs $a$ of $G$ is less than or equal to the square $v^2$ of the number of vertices $v$ of $G$, and the maximum value $v^2$ is only reached when the number of vertices of the smallest connected component of $G$ is equal to $v$, i.e. $G$ consists of a single connected component of $v$ vertices. We are interested in systematically generating such candidate invariants, a.k.a. conjectures, for a richer set of characteristics, e.g. the number $c$ of connected components of $G$, or the number $\underline{c}$ of vertices of the smallest connected component of $G$.
Our conjectures have one of the following forms: (i) sharp bounds of a digraph characteristic wrt other digraph characteristics, e.g. $a \le v^2$, or (ii) implications showing that, when a sharp bound is reached, some characteristics are fixed or functionally determined by some other characteristics, e.g. $a = v^2 \Rightarrow c = 1$. Finally, we are interested in connecting sharp bounds, revealing that the right-hand side of an implication of type (ii) can be used to eliminate a characteristic of a sharp bound and retrieve a sharp bound with one less characteristic. For instance, replacing $\underline{c}$ by $v$ in the sharp bound $a \le \underline{c}^2 + (v - \underline{c})^2$, we retrieve the sharp bound $a \le v^2$. We call these different conjectures, together with the links connecting sharp bounds, a "map".
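As a worked instance of such a link, the elimination of the size of the smallest connected component (written $\underline{c}$ here) can be spelled out step by step. This is only a restatement of the substitution described above, not an additional result.

```latex
\begin{align*}
a &\le \underline{c}^{2} + (v - \underline{c})^{2}
  && \text{sharp bound using } v \text{ and } \underline{c} \\
\underline{c} &= v
  && \text{value of } \underline{c} \text{ when } a = v^{2} \text{ (single component)} \\
a &\le v^{2} + (v - v)^{2} = v^{2}
  && \text{sharp bound using } v \text{ only}
\end{align*}
```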
A first example of a map. As an example of combinatorial objects, we use in this paper digraphs with these characteristics: the number $v$ of vertices, the number $a$ of arcs, the number $c$ (resp. $s$) of connected components (resp. strongly connected components), the number $\underline{c}$ (resp. $\overline{c}$) of vertices of the smallest (resp. largest) connected component, and the number $\underline{s}$ (resp. $\overline{s}$) of vertices of the smallest (resp. largest) strongly connected component. To compare the bounds obtained by the Bound Seeker with the database of invariants of the global constraint catalogue, see Sect.
Figure 1 Map of two sharp bounds on the maximum number of arcs of a digraph.
In this paper, all maps of conjectures are presented in the same way as the map in Fig. 1: (i) the upper left corner of a node gives a node label in black, (ii) the upper right corner provides the parameters used in the sharp bound of this node in red, (iii) a dark label of the form ❶ refers to the sharp bound itself, (iv) a light label of the form ① designates an equation which must hold to reach the sharp bound given in (iii), (v) a brown illustration shows a witness to the sharpness of the bound. Finally, an arrow from a first node to a second node indicates which equation(s) in the second node should be used to substitute some parameters used in the first node's bound to retrieve the bound given in the second node. For space reasons, some large maps, e.g. Fig. 4, may omit the elements (i) and (v).
Motivating and defining the concept of map. We introduce the concept of a map of conjectures as a way to reveal the links between a set of conjectures related to sharp bounds for a characteristic of a combinatorial object. Our goal is to describe conjectures on sharp bounds of characteristics of a combinatorial object, e.g. a digraph or a tree, and to organise these conjectures into a single structure, a map of sharp bounds, which (i) systematically interconnects these conjectures, and which (ii) describes the structure of the combinatorial objects for which the bounds are reached. In the map in Fig. 1, we consider for digraphs three characteristics, $a$, $v$ and $c$, for the number of arcs, of vertices, and of connected components.

▶ Definition 2. Given a finite set of input characteristics $\mathcal{P}$ and an output characteristic $o \notin \mathcal{P}$, a map of sharp upper bounds $M^{o\le}_{\mathcal{P}}$ is defined as a digraph where:
Each node of the map is associated with a subset $P \subseteq \mathcal{P}$ of input characteristics and corresponds to a maximum conjecture of the form $o \le f(P)$. This inequality is tight, i.e. there exist values that can be given to the parameters $P$ in order to reach the equality.
CP 2022

In addition, a node contains maximality conjectures, at most one per characteristic $q$ in the complement of $P$ wrt $\mathcal{P}$, represented by the symbolic equality $q = g_q(P)$, where $g_q$ is a function defined over realisable parameter values of $P$, called a maximum characterisation, which expresses the following property: for any combination of parameters $P$ reaching the maximum $f(P)$, the characteristic $q$ is equal to $g_q(P)$.
Each arc from conjecture $o \le f_i(P_i)$ to conjecture $o \le f_j(P_j)$ corresponds to a projection from a subset $P_i$ of input characteristics to a subset $P_j$ of input characteristics, by eliminating a characteristic $q_{i,j}$, i.e. $P_j = P_i \setminus \{q_{i,j}\}$. The arc is labelled with an equality $q_{i,j} = g_{q_{i,j}}(P_j)$, where $g_{q_{i,j}}(P_j)$ is the value given to $q_{i,j}$ to reach the equality in the conjecture $o \le f_j(P_j)$. The equality $q_{i,j} = g_{q_{i,j}}(P_j)$ is called a maximality conjecture.
In a map, there is a single output characteristic that we bound using the other characteristics, called input characteristics. The output characteristic is the bounded characteristic, while the input characteristics are the bounding characteristics. While the maximum conjecture provides a bound on the output characteristic wrt the characteristics in $P$, the maximality conjectures indicate the values taken by the characteristics not in $P$ when the bound is reached. Similarly to $M^{o\le}_{\mathcal{P}}$, a map $M^{o\ge}_{\mathcal{P}}$ provides a collection of sharp lower bounds as a set of minimum conjectures of the form $o \ge f_j(P_j)$, and a set of minimality conjectures.
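To make the definition concrete, a map can be sketched as a small data structure. The sketch below is only an illustration under our own assumptions: the class names `Node` and `Arc`, the helper `projection_holds`, and the key `cmin` (standing for the size of the smallest connected component) are hypothetical and are not taken from the Bound Seeker. The two nodes encode the bounds $a \le \underline{c}^2 + (v - \underline{c})^2$ and $a \le v^2$ mentioned earlier.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, Iterable

Values = Dict[str, int]  # characteristic name -> value


@dataclass
class Node:
    params: FrozenSet[str]                  # subset P of input characteristics
    bound: Callable[[Values], int]          # f(P) in the conjecture o <= f(P)
    # maximality conjectures q = g_q(P) for characteristics q outside P
    maximality: Dict[str, Callable[[Values], int]] = field(default_factory=dict)


@dataclass
class Arc:
    source: Node                            # node over P_i
    target: Node                            # node over P_j = P_i \ {eliminated}
    eliminated: str                         # characteristic q_{i,j}
    value: Callable[[Values], int]          # g_{q_{i,j}}(P_j)


def projection_holds(arc: Arc, samples: Iterable[Values]) -> bool:
    """Check on sample values that substituting the eliminated
    characteristic in the source bound gives back the target bound."""
    if arc.target.params != arc.source.params - {arc.eliminated}:
        return False
    for sample in samples:
        extended = dict(sample)
        extended[arc.eliminated] = arc.value(sample)
        if arc.source.bound(extended) != arc.target.bound(sample):
            return False
    return True


# Hypothetical encoding of two nodes and the arc that links them.
node_v_cmin = Node(frozenset({"v", "cmin"}),
                   lambda x: x["cmin"] ** 2 + (x["v"] - x["cmin"]) ** 2)
node_v = Node(frozenset({"v"}),
              lambda x: x["v"] ** 2,
              maximality={"cmin": lambda x: x["v"]})
arc = Arc(node_v_cmin, node_v, "cmin", lambda x: x["v"])
```

Checking a projection on sample values is weaker than a symbolic proof, but it is enough to detect an inconsistent pair of bounds.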
Figure 2 Map $M^{a\le}_{\{v,c,\underline{c}\}}$ with the sharp upper bounds ❶, ❷, ❸, ❹ for the number of arcs in a digraph; each node presents an example in brown: given a value for the characteristics attached to the node, a graph reaching the maximum is described as a union of cliques $K_i$ with $i$ vertices, e.g. in node (B), given the assignments $v = 7$ and $\underline{c} = 2$, the digraph with the 2 cliques $K_2$, $K_5$ reaches the maximum 29 for the number $a$ of arcs; cond ? x : y denotes x if condition cond holds, y otherwise.
▶ Example 3 (Extending Ex. 1 to a map of four nodes). Fig. 2 presents Map $M^{a\le}_{\{v,c,\underline{c}\}}$, where we consider the following characteristics of digraphs: as input characteristics, the number $v$ of vertices, the number $c$ of connected components, and the number $\underline{c}$ of vertices of the smallest connected component; as output characteristic, the number $a$ of arcs. In Map $M^{a\le}_{\{v,c,\underline{c}\}}$, there are four nodes, corresponding to the subsets $\{v, c, \underline{c}\}$, $\{v, \underline{c}\}$, $\{v, c\}$ and $\{v\}$, shown in red, whereas the power set of $\{v, c, \underline{c}\}$ contains eight subsets. For the four other subsets, namely $\{c, \underline{c}\}$, $\{c\}$, $\{\underline{c}\}$ and $\emptyset$, no conjecture can be found, as the number of arcs is not bounded from above wrt these characteristics. In the nodes (A), (B), (C) and (D), the items labelled with ❶, ❷, ❸ and ❹ indicate a maximum conjecture wrt the number $a$ of arcs, while the elements marked with ⑤, ⑥, ⑦ and ⑧ show maximality conjectures wrt $c$ and $\underline{c}$. For instance, in Node (B), the maximum conjecture ❷ $a \le \underline{c}^2 + (v - \underline{c})^2$ really means: among all digraphs with $v$ nodes and whose smallest component contains $\underline{c}$ nodes, the digraph with most arcs has exactly $\underline{c}^2 + (v - \underline{c})^2$ arcs. Each arc is labelled with a maximality conjecture giving the value of the characteristic that is eliminated. For instance, from Node (A) to Node (B), the characteristic $c$ that is eliminated from ❶ satisfies the maximality conjecture ⑤: when the maximum number of arcs is reached, the value of $c$ is 1 if $v = \underline{c}$, 2 otherwise.
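The bound ❷ and the maximality conjecture ⑤ can be checked by brute force on small sizes. The sketch below assumes, following the witnesses described for Fig. 2, that the number of arcs is maximised when every connected component is a clique $K_i$ with $i^2$ arcs, so that only the multiset of component sizes matters; the function names and the key `cmin` (for $\underline{c}$) are ours.

```python
def partitions(n, max_part=None):
    """Yield the integer partitions of n as non-increasing tuples."""
    if max_part is None or max_part > n:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(max_part, 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def max_arcs_table(max_v):
    """Map (v, cmin) to (maximum number of arcs, set of component counts
    of the partitions reaching that maximum)."""
    best = {}
    for v in range(1, max_v + 1):
        for sizes in partitions(v):
            key = (v, min(sizes))
            arcs = sum(k * k for k in sizes)   # one clique per component
            count = len(sizes)
            if key not in best or arcs > best[key][0]:
                best[key] = (arcs, {count})
            elif arcs == best[key][0]:
                best[key][1].add(count)
    return best


table = max_arcs_table(12)
for (v, cmin), (arcs, counts) in table.items():
    # maximum conjecture (2): a <= cmin^2 + (v - cmin)^2, reached
    assert arcs == cmin ** 2 + (v - cmin) ** 2
    # maximality conjecture (5): c = 1 if v = cmin, 2 otherwise
    assert counts == {1 if v == cmin else 2}
```

For $v = 7$ and $\underline{c} = 2$ the table gives 29 arcs, which matches the witness $K_2$, $K_5$ of node (B).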
Capturing more bounds with secondary characteristics. As the number of input characteristics grows, the bound formulae can get rather complicated. Consequently, we introduce a set $A$ of auxiliary characteristics to obtain simpler formulae. Examples of such auxiliary characteristics are (i) $c_{>1}$, (ii) $s_{>1}$, and (iii) $c_{\in\{2,3\}}$, which correspond to (i) the number of connected components with more than one vertex, (ii) the number of strongly connected components with more than one vertex, and (iii) the number of connected components with two or three vertices and for which all strongly connected components have only one vertex. Although initially introduced when searching for lower bounds on the number of arcs, such characteristics have proved useful for many other bounds. We introduce the notion of secondary characteristics of the node of a map, which will be illustrated in Ex. 5 and 6.
▶ Definition 4. Given a node of a map that is associated with a subset $P \subseteq \mathcal{P}$ of input characteristics, with an output characteristic $o$, with a maximum conjecture of the form $o \le f(P)$, and with a set of auxiliary characteristics $A$, the set of secondary characteristics of the node is defined as the characteristics of the set $A \cup (\mathcal{P} - P - \{o\})$ which are functionally determined by the set $P$ when $o = f(P)$.
To test that a secondary characteristic is functionally determined by $P$, we check for each generated combination of values for $P$ that the value of the secondary characteristic is unique. This test is performed while generating the dataset used for acquiring conjectures.
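The uniqueness test described above amounts to a functional dependency check over the rows reaching the bound. A minimal sketch follows; the function name and the sample rows are hypothetical and only serve as an illustration.

```python
def is_functionally_determined(rows, inputs, target):
    """Return True if, in `rows`, each combination of values of the
    `inputs` columns is associated with a single value of `target`."""
    seen = {}
    for row in rows:
        key = tuple(row[name] for name in inputs)
        # setdefault stores the first value seen for this key
        if seen.setdefault(key, row[target]) != row[target]:
            return False
    return True


# Hypothetical rows of digraphs reaching a bound: cmax is the size of
# the largest connected component, s the number of strongly connected
# components.
rows = [
    {"v": 2, "cmax": 2, "s": 1},
    {"v": 2, "cmax": 2, "s": 2},
    {"v": 3, "cmax": 3, "s": 1},
]
print(is_functionally_determined(rows, ["v"], "cmax"))  # cmax depends only on v
print(is_functionally_determined(rows, ["v"], "s"))     # s does not
```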
To find bounds that exploit these secondary characteristics, we use a multi-level approach: (i) first, we look for a formula for each secondary characteristic; (ii) then we try to find a sharp bound that may also use the secondary characteristics for which we could find a formula. In both (i) and (ii), a formula can use input characteristics as well as secondary characteristics for which we already found a formula. As a result, we obtain formulae that are easier to interpret, as we can associate a straightforward meaning with the sub-terms that appear in a bound. Ex. 5 illustrates this point.
▶ Example 5 (Bound expressed wrt several secondary characteristics). This example shows the only lower bound found by the Bound Seeker on the number of arcs $a$ of a digraph $G$ wrt the size $\overline{c}$ of its largest connected component and the size $\underline{s}$ of its smallest strongly connected component. We have $\mathcal{P} = \{v, a, c, \underline{c}, \overline{c}, s, \underline{s}, \overline{s}\}$, the bound parameters $P = \{\overline{c}, \underline{s}\}$, the output characteristic $o = a$, and the auxiliary characteristics $\{c, \underline{c}, s, \overline{s}, c_{>1}, s_{>1}\}$ are functionally determined by $\overline{c}$ and $\underline{s}$. The lower bound found by the Bound Seeker is $a \ge s_{>1} - c_{>1} + v$, where $\underline{c} = ((2 \cdot \underline{s} - \overline{c}) \le 0\ ?\ \overline{c} : \underline{s})$, and where a Boolean expression such as $(\underline{s} \ge 2)$ is used as an integer, i.e. either 0 for false or 1 for true. While the main formula $s_{>1} - c_{>1} + v$ is simple, it uses a secondary characteristic $s_{>1}$ which is expressed directly wrt $\overline{c}$ and $\underline{s}$, and two other secondary characteristics $c_{>1}$ and $v$ which mention the two extra secondary characteristics $c$ and $\underline{c}$, for which two formulae involving only $\overline{c}$ and $\underline{s}$ could be found. The occurrence of Boolean expressions reflects slight variations in the structure of witness digraphs, i.e. digraphs reaching a sharp bound, as shown in Table 1.

Within the same map, expressing bounds in terms of secondary characteristics may reveal the same bound formula for several subsets of input characteristics. We observed this phenomenon in the majority of the acquired maps. Ex. 6 illustrates this for the acquired map giving the upper bound on the number of vertices of the largest connected component of a digraph.
▶ Example 6 (Map example illustrating how bounds can be unified by using secondary characteristics). In the appendix, Fig. 4 depicts the maximum and maximality conjectures of the map $M^{\overline{c}\le}_{\{v,c,\underline{c},s,\underline{s},\overline{s}\}}$ found by the Bound Seeker for the upper bound on the size $\overline{c}$ of the largest connected component, with the related links. Note that $v$ needs to be an input characteristic, as otherwise $\overline{c}$ is not bounded from above. Part (A) shows the 16 bounds found when using only the input characteristics: these bounds are defined by 5 maximum conjectures ➊, …, ➎ and 4 maximality conjectures ⑥, …, ⑨. Each link illustrates how a maximum conjecture is projected onto another maximum conjecture via a maximality conjecture. Missing arcs are due to the lack of functional dependencies. For instance, in Part (A), we have no arc from $\{v, s\}$ to $\{v\}$, as the number of strongly connected components $s$ is not functionally determined by the number of vertices $v$ when the sharp bound ❶ is reached, i.e. when $\overline{c} = v$: e.g., for $\overline{c} = v = 2$ we can have both $s = 2$ (the two vertices are linked by an arc in one direction only) and $s = 1$ (the two vertices are linked by arcs in both directions).

Overview of the map acquisition system
Parts (A) and (B) of Fig. 3 give the different phases for generating a map: software components are shown in cyan and labelled with capital letters, while data is displayed in orange.
(A1) Generating data. To learn valid conjectures for any digraph of at most $k$ vertices, we produce all parameter combinations of interest for digraphs up to a maximum number $n$ of vertices. An exhaustive generation of such data is not a problem, as a program is used for this purpose. However, the issue is to select the appropriate value of $k$: not so small that it yields conjectures that are invalid for digraphs with more than $k$ vertices, and not so large, in order to limit the number of generated constraints needed to acquire the conjectures in Phase (B2). To this end, Phase (A1) produces a table $T$ with the characteristic values for digraphs of at most $n$ vertices, in such a way that the size of the table $T$ does not exceed a given memory limit. From this table $T$, Phase (A1) extracts, for each $i$ between 2 and $n$, for each subset of input characteristics $P$ of $\mathcal{P}$, and for each output characteristic $o$, a bound table $T^{o\le}_{P,i}$ based only on the entries of $T$ corresponding to digraphs with at most $i$ vertices. Each row of a bound table represents a feasible combination of values for $P$, with the corresponding bound value for $o$ and the values of the secondary characteristics.

(C) Bound table example
Figure 3 Workflow in the Bound Seeker: (A) data and (B) conjecture acquisition phases; Phase (A1), with a red background, depends on the combinatorial objects we consider (digraphs in our case), while Phases (A2), (A3), …, (B3) are domain independent; (C1) example of an upper-bound table for digraphs of at most 3 vertices with the input characteristics $v$, $c$, the output characteristic $a$, and the secondary characteristic $\overline{c}$ corresponding to the number of vertices of the largest connected component; (C2) digraphs corresponding to each entry of the bound table shown in (C1).
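The extraction of a bound table from the raw table $T$ can be illustrated on very small digraphs. The sketch below replaces the CP model of the paper by a plain exhaustive enumeration, and it assumes that loops are allowed (consistent with $a \le v^2$) and that every counted vertex is incident to at least one arc; both the enumeration and this convention are our own simplifications, and all names are hypothetical.

```python
from itertools import product


def characteristics(n, arcs):
    """Return v, a, c and cmax (largest connected component) of the
    digraph on vertices 0..n-1, or None if a vertex has no arc."""
    touched = {x for arc in arcs for x in arc}
    if len(touched) != n:
        return None
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for i, j in arcs:
        parent[find(i)] = find(j)           # merge the two components
    sizes = {}
    for x in range(n):
        root = find(x)
        sizes[root] = sizes.get(root, 0) + 1
    return {"v": n, "a": len(arcs), "c": len(sizes),
            "cmax": max(sizes.values())}


def raw_table(max_n):
    """Table T: one row per labelled digraph with at most max_n vertices."""
    rows = []
    for n in range(1, max_n + 1):
        pairs = [(i, j) for i in range(n) for j in range(n)]
        for mask in product([0, 1], repeat=len(pairs)):
            arcs = [p for p, bit in zip(pairs, mask) if bit]
            row = characteristics(n, arcs)
            if row is not None:
                rows.append(row)
    return rows


def upper_bound_table(rows, inputs, output, max_v):
    """Bound table: maximum of `output` for each feasible combination
    of `inputs`, restricted to digraphs with at most max_v vertices."""
    best = {}
    for row in rows:
        if row["v"] > max_v:
            continue
        key = tuple(row[p] for p in inputs)
        if key not in best or row[output] > best[key]:
            best[key] = row[output]
    return best


rows = raw_table(3)
table = upper_bound_table(rows, ("v", "c"), "a", 3)
```

A full bound table would also record, for each row, the values of the secondary characteristics on the digraphs reaching the bound; this is omitted here to keep the sketch short.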
Unlike all the next steps, Phase (A1) depends on the type of combinatorial objects for which we generate conjectures. For digraphs, our data generation phase uses a CP model to produce a set of bound tables that is used by the acquisition process. As illustrated in Part (C1) of Fig. 3 …
(A2) Generating metadata. For each bound table, Phase (A2) computes the following metadata:
The minimum/maximum values of each column and the number of distinct values.
The minimal functional dependencies [24] that determine in the table $T^{o\le}_{P,k}$ the output characteristic and the secondary characteristics. Each functional dependency gives a subset of characteristics that functionally determine another characteristic. For instance, in the bound table $T^{a\le}_{\{v,c\},3}$, columns $a$ and $\overline{c}$ are functionally determined by columns $v$ and $c$. But column $a$ is also functionally determined by columns $v$ and $\overline{c}$.
Binary constraints between two distinct columns $i$ and $j$ of the table. In $T^{a\le}_{\{v,c\},3}$ we have for each row that the number of vertices is greater than or equal to the number of connected components, i.e. $v \ge c$, and similarly $v \ge \overline{c}$, $a \ge v$, $a \ge c$, $a \ge \overline{c}$.
Such knowledge is used to focus the search for conjectures: first by selecting promising subsets of input parameters for a formula, and second by providing information that avoids producing meaningless formulae. For instance, we do not generate a formula with a term $\min(v, c)$, as $v \ge c$ is true. The generated metadata is also the input of the next phase.
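The three kinds of metadata can be computed with a few lines of code on a small bound table. The rows below are our own reconstruction of an upper-bound table for digraphs of at most 3 vertices (input $v$, $c$, output $a$, secondary $\overline{c}$ written `cmax`), obtained under the assumption that the maximum is reached by a union of cliques; they are illustrative and are not copied from Fig. 3.

```python
from itertools import combinations

COLUMNS = ("v", "c", "a", "cmax")
# Assumed rows (v, c, a, cmax) of the upper-bound table for at most 3 vertices.
ROWS = [(1, 1, 1, 1), (2, 1, 4, 2), (2, 2, 2, 1),
        (3, 1, 9, 3), (3, 2, 5, 2), (3, 3, 3, 1)]
INDEX = {name: i for i, name in enumerate(COLUMNS)}


def column_stats(name):
    """Minimum, maximum and number of distinct values of a column."""
    values = [row[INDEX[name]] for row in ROWS]
    return min(values), max(values), len(set(values))


def determines(lhs, target):
    """True if the columns `lhs` functionally determine `target`."""
    seen = {}
    for row in ROWS:
        key = tuple(row[INDEX[n]] for n in lhs)
        if seen.setdefault(key, row[INDEX[target]]) != row[INDEX[target]]:
            return False
    return True


def minimal_fds(target):
    """Minimal subsets of the other columns that determine `target`."""
    others = [n for n in COLUMNS if n != target]
    found = []
    for size in range(1, len(others) + 1):
        for lhs in combinations(others, size):
            if any(set(prev) <= set(lhs) for prev in found):
                continue                      # a smaller subset already works
            if determines(lhs, target):
                found.append(lhs)
    return found


def binary_constraints():
    """Pairs (x, y) of distinct columns such that x >= y on every row."""
    return {(x, y) for x in COLUMNS for y in COLUMNS if x != y
            and all(r[INDEX[x]] >= r[INDEX[y]] for r in ROWS)}
```

On these rows, the binary constraints found are exactly $v \ge c$, $v \ge \overline{c}$, $a \ge v$, $a \ge c$ and $a \ge \overline{c}$, and both $\{v, c\}$ and $\{v, \overline{c}\}$ appear among the minimal functional dependencies of $a$.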
(A3) Generating meta metadata to find the relevant size of the training dataset. Based on the information computed by Phase (A2), Phase (A3) determines, for the subset $P$ and the output characteristic $o$, the size $k$ used when searching for conjectures. To select the size $k$ in the datasets $T^{o\le}_{P,i}$ (with $i \in [2, n]$) from which we acquire the conjectures, we operate as follows. As a functional dependency or a binary constraint of a table $T^{o\le}_{P,i}$ may become invalid for a table $T^{o\le}_{P,j}$ with $j > i$, we identify the smallest size $k$ from which the set of minimal functional dependencies and the set of binary constraints of the tables $T^{o\le}_{P,k}, \dots, T^{o\le}_{P,n}$ remain identical. In practice, for space reasons, we generated digraphs with up to $n = 26$ vertices. To avoid overfitting when the number of rows of table $T^{o\le}_{P,k}$ is too small, we select the smallest size corresponding to a table with at least 200 rows: on average, conjectures were produced using digraphs with up to 18 vertices.
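The size selection can be sketched as follows. How the stability criterion and the 200-row threshold are combined is our reading of the text (the smallest size that satisfies both), and the function name, its arguments and the sample data are hypothetical.

```python
def select_size(metadata, row_counts, min_rows=200):
    """Pick the training size k.

    metadata:   dict size -> comparable summary of the minimal functional
                dependencies and binary constraints of the table of that size
    row_counts: dict size -> number of rows of the table of that size
    """
    sizes = sorted(metadata)
    n = sizes[-1]
    # Smallest size from which the metadata stays identical up to n.
    stable_from = n
    for k in reversed(sizes):
        if metadata[k] == metadata[n]:
            stable_from = k
        else:
            break
    # Among stable sizes, the smallest one whose table is large enough.
    for k in sizes:
        if k >= stable_from and row_counts[k] >= min_rows:
            return k
    return n  # fall back to the largest available table


# Hypothetical data: metadata is stable from size 4, but the table of
# size 4 is too small, so size 5 is selected.
metadata = {2: "A", 3: "B", 4: "C", 5: "C", 6: "C"}
row_counts = {2: 3, 3: 50, 4: 150, 5: 250, 6: 400}
k = select_size(metadata, row_counts)
```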
(B1) Generating candidate formulae. This phase generates, for a bound table $T^{o\le}_{P,k}$, partially instantiated candidate formulae to acquire the corresponding maximum and maximality conjectures. Given the parameters $P$, the output characteristic $o$, and the set of secondary characteristics of the selected bound table $T^{o\le}_{P,k}$, Phase (B1) produces on request the next candidate formula to find a conjecture. The set of potential characteristics that the formula may mention, and the formula itself, are restricted by the functional dependencies and the binary constraints that were identified by the metadata generation phase. Table 2 shows some candidate formulae that are successively produced for table $T^{a\le}_{\{v,c\},3}$.
(B2) Generating a CP model linking a parameterised formula with the data. This phase uses a candidate formula generated by Phase (B1) to post an equational constraint for each entry in a bound table $T^{o\le}_{P,k}$, in order to obtain a formula where all input parameters and coefficients are fixed and thus produce a conjecture. Phase (B2) queries Phase (B1) for the next candidate parameterised formula, tries to instantiate it, and asks again for a next candidate formula. To find a value for each coefficient of a candidate formula, we use a constraint model to link a candidate formula with (i) the functional dependencies and binary constraints identified by the metadata generation phase, and (ii) all the bound table entries of the selected size. Many constraints break different types of symmetry and force all sub-terms of a formula to be meaningful. The second column of Table 2 shows, for each candidate formula, the corresponding concrete formula found by the CP model.

(B3) Testing the candidate conjectures. This last phase tests the validity of the conjectures against the largest bound table $T^{o\le}_{P,n}$, i.e. against the largest available generated dataset.
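The role of this final test can be illustrated with a formula that fits a very small bound table but fails on a larger one. In the sketch below, the bound table is a stand-in built from the bound ❷, $a \le \underline{c}^2 + (v - \underline{c})^2$, rather than generated data; the feasibility condition on $(v, \underline{c})$ and all names are our own assumptions.

```python
def bound_table(max_v):
    """Rows (v, cmin, max arcs) for feasible pairs, where cmin is the size
    of the smallest connected component. Assumption: a pair is feasible
    when cmin == v or the remaining vertices can form components of size
    at least cmin, i.e. v >= 2 * cmin."""
    rows = []
    for v in range(1, max_v + 1):
        for cmin in range(1, v + 1):
            if cmin == v or v >= 2 * cmin:
                rows.append((v, cmin, cmin ** 2 + (v - cmin) ** 2))
    return rows


def holds(formula, rows):
    """True if the candidate formula equals the bound on every row."""
    return all(formula(v, cmin) == a for v, cmin, a in rows)


def overfitted(v, cmin):
    # Candidate that happens to fit all digraphs of at most 2 vertices.
    return v * cmin


small, large = bound_table(2), bound_table(6)
print(holds(overfitted, small))   # the candidate fits the small table
print(holds(overfitted, large))   # it is rejected by the larger table
```

The candidate $v \cdot \underline{c}$ matches the three rows of the small table, but the row $v = 3$, $\underline{c} = 1$ of the larger table (bound 5) refutes it, which is the kind of overfitting that a training size chosen too small can cause.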

A constraint approach for acquiring symbolic equations
The search for sharp bounds leads to the identification of equations in which the left-hand side is an output or a secondary characteristic, and the right-hand side is a formula involving input and secondary characteristics. As already noted in the introduction of [9] and in the conclusion of [18], the size of the space of candidate formulae constitutes a major challenge for equation discovery methods. Rather than applying a bottom-up approach that generates formulae of increasing complexity, we adopt the following strategy. As we aim at finding simple formulae, we use three complementary classes of formulae that turned out to appear concomitantly in a map: (1) Boolean formulae involving $k$ arithmetic conditions linked by a single commutative logical operator or by a sum, (2) simple conditional formulae, and (3) formulae over polynomials that can share common sub-expressions. A first attempt to use only polynomials without common sub-expressions missed some formulae, e.g. see Ex. 5, and quite often provided overly complicated formulae, as illustrated in Ex. 8. Based on the metadata introduced in Sect. 3.1, we will present a CP approach for restricting the space of formulae: for space reasons, we focus on polynomials sharing common sub-expressions.

A parameterised candidate formulae generator for Phase (B1)
Formula syntax. All conjectures we generate have the form characteristic op formula, where op is one of the comparison operators ≤, =, ≥, and formula is a formula involving a set of characteristics. Consequently, formulae are described by the following set of simplified grammar rules, where "Small Capitals" indicates a non-terminal symbol, "Roman" denotes a function or a known constant, "Italic" highlights a (digraph) characteristic, and "Bold" denotes an unknown integer constant. Within these rules, polynomial(Params, degree) denotes a polynomial whose maximum degree is fixed (with degree > 0) on a non-empty subset of its potential parameters Params, and the functions geq0(x), geq(x, y), sum_consec(x), cmod(x, y), dmod(x, y) resp. stand for 1 if $x \ge 0$ otherwise 0, 1 if $x \ge y$ otherwise 0, $\frac{x \cdot (x+1)}{2}$, $x - (y \bmod x)$, $x - (x \bmod y)$.
▶ Example 8 (Finding simpler bounds using Boolean and conditional formulae). We illustrate, with an example generated by the system on the lower bound of the number of arcs $a$ wrt the sizes $\underline{c}$ and $\overline{c}$ of the smallest and largest connected components and the size $\overline{s}$ of the largest strongly connected component, how using Boolean and conditional formulae often leads to simpler conjectures. Without using Boolean and conditional formulae, we get $a \ge s_{>1} - c_{>1} + v$ with $s_{>1} = \min(\overline{s} - 1, 1)$, $c_{>1} = \min(\min(\underline{c}, 2), \min(\underline{c}, 2) + \overline{c} - \underline{c} - 1)$, and $v = \min(\underline{c} + \overline{c}, \underline{c} \cdot \overline{c} - \underline{c}^2 + \overline{c})$; enabling Boolean and conditional formulae, we get a simpler bound.
Candidate formulae generator. Since we want to try out a variety of formulae, we create a parameterised candidate formulae generator which, upon backtracking, proposes a new candidate formula with non-fixed coefficients; these are variables for the constants and for the input characteristics that will be used in a candidate formula. In this generator we specify:
The structure of the formula, that is whether we use (1) a Boolean formula, (2) a simple conditional formula, or (3) a formula over polynomials; in this latter case we also specify how many unary and binary terms occur in each polynomial.
The arithmetic functions we may use in the terms.
The complexity of a polynomial, that is its potential maximum degree, its maximum number of non-zero coefficients, and the ranges of its coefficients.
The list of possible combinations of characteristics that the candidate formula can use in its parameters. Such combinations correspond to functional dependencies identified by the metadata generation phase, i.e. Phase (A2).
We use more than one generator to design a formula generation policy where the simplest candidate formulae are tried first.
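The auxiliary functions of the formula syntax, and a "simplest first" enumeration of candidate formulae, can be sketched as follows. The function definitions follow the textual descriptions given with the grammar, assuming positive integer arguments for the two mod-based functions; the generator is a hypothetical illustration of the policy (the actual system is written in SICStus Prolog and relies on backtracking), and its parameter names are ours.

```python
from itertools import combinations


def geq0(x):
    return 1 if x >= 0 else 0


def geq(x, y):
    return 1 if x >= y else 0


def sum_consec(x):
    return x * (x + 1) // 2          # 1 + 2 + ... + x


def cmod(x, y):
    return x - (y % x)               # assumes x > 0


def dmod(x, y):
    return x - (x % y)               # assumes y > 0


def candidate_formulae(characteristics, max_degree=2, max_unary=1):
    """Yield descriptions of parameterised polynomial candidates, ordered
    by increasing degree, then number of unary terms, then number of
    characteristics used. Coefficients are left unfixed at this stage."""
    for degree in range(1, max_degree + 1):
        for n_unary in range(max_unary + 1):
            for size in range(1, len(characteristics) + 1):
                for params in combinations(characteristics, size):
                    yield {"kind": "polynomial", "degree": degree,
                           "unary_terms": n_unary, "params": params}


candidates = list(candidate_formulae(("v", "c")))
```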

Constraint model for acquiring a conjecture for formulae over polynomials for Phase (B2)
Given a candidate formula F (corresponding either to Pol, to PolBinary, or to PolUnary as described in the set of grammar rules in Sect. 3.2.1), for which the set of used parameters is partially determined, and for which the coefficients are not yet fixed, we create a constraint model that relates these unknowns to all rows in a bound table. Our model includes four types of constraints, namely (i) structural constraints on the input and secondary characteristics that will be used in F, (ii) symmetry-breaking constraints, (iii) constraints preventing the generation of formulae in which a term could be simplified, and (iv) equational constraints on each row of a bound table. We describe the model variables, the constraints on the characteristics used in F, the constraints on the unary/binary terms and binary function of F, and the equational constraints on the table entries. The number of variables and constraints of the model is linear wrt the number of table entries, as it is dominated by the equational constraints. For reasons of space, concerning the constraints of types (ii) and (iii), we will only detail the constraints related to the min function.
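Variables used in the model. The variable $B\_O_i$ gives the order of the two arguments of the binary term $B_i$: the arguments are $(C_{B\_IND1_i}, C_{B\_IND2_i})$ when $B\_O_i = 0$, and $(C_{B\_IND2_i}, C_{B\_IND1_i})$ otherwise. When the binary term is commutative, e.g. min, the order of the arguments is irrelevant and $B\_O_i$ will be set to 0 (see constraint (4c) in Table 4), but otherwise, e.g. mod, the order matters.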
Table 3 Variables of the model, where ncu is an abbreviation of the term nc + nu.

Objects | Variables | Comments
Characteristics | $C_j$ | index of the used characteristics
… | … | minimum value of the used characteristics
… | $U\_MAX_i$ | maximum value of the used characteristics
… | $U\_VAL_{i,r}$ | value of term $U_i$ wrt the r-th row and the j-th column (with …
… | $B\_VAL_{i,r}$ | value of term $B_i$ wrt the r-th row, the $B\_IND1_i$-th, and the $B\_IND2_i$-th columns of the table
… | $P\_VAL_{i,r}$ | value of polynomial $P_i$ wrt the r-th row of table $T$

Constraints on the structure of the formula. The upper part of Table 4 lists the constraints (i) specifying which characteristics the formula F uses, i.e. see (1a), (ii) forcing a unary term, a binary term, and a polynomial to use the appropriate number of characteristics, i.e. see (2a), (3a) and (4a), (iii) connecting the characteristics used by the unary and binary terms with the characteristics used in the polynomials and the formula, i.e. see (5a), (6a), and (iv) restricting non-zero coefficients of polynomials, i.e. see (7a), (8a).

Table 4 (Top) Constraints on the structure of a formula F; fd_table is the list of characteristic combinations that may be used by F, created by the candidate formulae generator, while maxz is the maximum number of non-zero coefficients of a polynomial. (Mid) Constraints on a unary term $U_i$ (with $i \in [1, nu]$), where $uf_i$ is the function assigned to $U_i$, and $min_j$ (with $j \in [1, nc]$) is the smallest value of the j-th characteristic. (Bottom) Constraints on a binary term $B_i$ (with $i \in [1, nb]$), where $bf_i$ is the function assigned to $B_i$, and table_unordered is the set of pairs of characteristic indices such that the 1st characteristic is not always smaller, or greater, than the 2nd characteristic; char. is an abbreviation for characteristic.

Constraints | Comments
(1a) table($\langle C_1, \dots, C_c \rangle$, fd_table) | …
… | $P_i$ uses at least one char., or at least one unary or binary term
… | force each unary/binary term to be used by at least 1 polynomial
… | each polynomial has a maximum number of non-zero coefficients
… | get index of used char.
… | get min. value of used char.
… | get max. value of used char.
… | get index of first used char.
(3c) $B\_IND1_i < B\_IND2_i$ | indexes are ordered
(4c) … | fix order of the 2 arguments as min is a commutative function
(5c) … | assign two char. whose values are not ordered

Constraints on unary/binary terms and on a binary function. Within Table 4, constraint (1b) (resp. (1c), (2c)) links the 0-1 variables $U_{i,j}$ (resp. $B_{i,j}$) to the index of the characteristic involved in the term. To avoid generating unary terms of the form min(Characteristic, Cst) which could just be rewritten as Characteristic or as Cst, constraint (4b) restricts the minimum and maximum values of the constant. When using the min function in a binary term, constraint (4c) avoids generating equivalent binary terms whose arguments are permuted. Constraint (5c) prevents generating a binary term when the min could be simplified, e.g. it avoids generating $\min(\underline{c}, \overline{c})$, as the metadata found in Phase (A2) indicates that $\underline{c}$ is always smaller than or equal to $\overline{c}$. Finally, when the candidate formula F is a binary function corresponding to min that uses the polynomials $P_1$ and $P_2$ of degree $d$, we post the lexicographic ordering constraint $\langle M_{1,1}, \dots, M_{1,\binom{n+d}{d}} \rangle <_{lex} \langle M_{2,1}, \dots, M_{2,\binom{n+d}{d}} \rangle$ between the monomial coefficients of $P_1$ and $P_2$. Note that, for space reasons, besides constraints (4b), (4c), and (5c), we omit in Table 4 the symmetry and simplification constraints related to functions that are different from min.
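The effect of the ordering and simplification constraints on min terms can be mimicked outside of a CP solver by filtering pairs of columns. The sketch below computes, for a small bound table, the pairs of characteristics whose values are not ordered, listing each pair once with increasing column indices; it is a plain Python stand-in for what the table_unordered set expresses, and the rows are an assumed upper-bound table for digraphs of at most 3 vertices (with `cmax` for $\overline{c}$).

```python
from itertools import combinations

COLUMNS = ("v", "c", "a", "cmax")
# Assumed rows (v, c, a, cmax) of an upper-bound table for at most 3 vertices.
ROWS = [(1, 1, 1, 1), (2, 1, 4, 2), (2, 2, 2, 1),
        (3, 1, 9, 3), (3, 2, 5, 2), (3, 3, 3, 1)]


def unordered_pairs(rows, columns):
    """Pairs of columns (first index < second index) that are neither
    always <= nor always >= each other: only for these pairs can a
    binary min term not be simplified to one of its arguments."""
    result = []
    for i, j in combinations(range(len(columns)), 2):
        always_le = all(r[i] <= r[j] for r in rows)
        always_ge = all(r[i] >= r[j] for r in rows)
        if not always_le and not always_ge:
            result.append((columns[i], columns[j]))
    return result


# Only min(c, cmax) is a meaningful binary min term on this table;
# e.g. min(v, c) is skipped because v >= c holds on every row.
allowed_min_terms = unordered_pairs(ROWS, COLUMNS)
```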

Equational constraints. For each row $r$ of the bound table $T$ we post some constraints linking the selected characteristics $C_j$ with (i) the value variable $U\_VAL_{i,r}$ of each unary term $U_i$, (ii) the value variable $B\_VAL_{i,r}$ of each binary term $B_i$, and (iii) the value variable $P\_VAL_{i,r}$ of each polynomial $P_i$. For each row $r$ we also post an equality constraint linking the value of the candidate formula F on row $r$ with the corresponding bound value on the same row. Finally, for a binary function min between two polynomials $P_1$ and $P_2$, we impose that for at least one of the entries of the bound table the value of $P_1$ is strictly less than the value of $P_2$ on the same entry, and that the converse applies for another entry of the table. To avoid unnecessarily complex formulae, we minimise the sum of the absolute values of the coefficients of a candidate formula F.
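The combination of equational constraints and of the minimisation objective can be illustrated by an exhaustive search that replaces the CP model. The sketch fits a polynomial of degree 2 in $v$ and $\underline{c}$ (written `cmin`) to a stand-in bound table built from the bound $\underline{c}^2 + (v - \underline{c})^2$; the coefficient range, the monomial list, the feasibility condition on the rows, and all names are our own assumptions.

```python
from itertools import product

# Monomials of degree at most 2 over (v, cmin), with readable labels.
MONOMIALS = [
    ("1", lambda v, c: 1),
    ("v", lambda v, c: v),
    ("cmin", lambda v, c: c),
    ("v^2", lambda v, c: v * v),
    ("v*cmin", lambda v, c: v * c),
    ("cmin^2", lambda v, c: c * c),
]


def bound_rows(max_v):
    """Stand-in bound table: rows (v, cmin, bound) for feasible pairs
    (assumed feasible when cmin == v or v >= 2 * cmin)."""
    return [(v, c, c * c + (v - c) ** 2)
            for v in range(1, max_v + 1)
            for c in range(1, v + 1)
            if c == v or v >= 2 * c]


def fit(rows, coeff_range=range(-2, 3)):
    """Return (cost, coefficients) of the polynomial that satisfies one
    equation per row and minimises the sum of absolute coefficients,
    or None if no polynomial in the coefficient range fits."""
    best = None
    for coeffs in product(coeff_range, repeat=len(MONOMIALS)):
        fits = all(
            sum(k * m(v, c) for k, (_, m) in zip(coeffs, MONOMIALS)) == a
            for v, c, a in rows)                 # equational constraints
        if fits:
            cost = sum(abs(k) for k in coeffs)   # objective to minimise
            if best is None or cost < best[0]:
                best = (cost, coeffs)
    return best


result = fit(bound_rows(6))
```

On this table the search returns the coefficients of $v^2 - 2 \cdot v \cdot \underline{c} + 2 \cdot \underline{c}^2$, i.e. the expanded form of $\underline{c}^2 + (v - \underline{c})^2$. A CP solver explores the same space with propagation instead of enumeration.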

Evaluation of the Bound Seeker
We focus on constructing 16 maps on the lower and upper-bounds of the number of vertices, the number of arcs, the number of connected (resp. strongly connected) components, and their minimum and maximum sizes. The components of the system are written in SICStus Prolog and consist of 10000 lines of code covering the Data Generation, the Metadata Generation, the Meta Metadata Generation, the Candidate Formulae Generation, the CP Model Generation, and the Test phase. The Data Generation phase generates a total of 1944 bound tables (occupying 2 GB) for each maximum number of digraph vertices ranging from 2 to 26; each bound table gives the lower or upper-bound of a characteristic wrt different subsets of input characteristics. We evaluate the Bound Seeker from the following standpoints: (i) the percentage of conjectures that, while acquired from the size selected by the Meta Metadata, still hold for all entries of the largest generated bound tables, i.e. the tables of digraphs containing up to 26 vertices; (ii) the percentage of bounds from the database of invariants in [2] that was retrieved (resp. not found).
Besides the conjectures retrieved from the global constraint catalogue database, we manually proved ten new conjectures. Using WolframAlpha, we also checked the consistency of 105 projections of a sharp bound $B_1$ onto a sharp bound $B_0$ involving one less input characteristic, by substituting in $B_1$ the input characteristic to be eliminated by the expression defined by the corresponding maximality conjecture. As the complexity of a formula increases with the number of input characteristics, we limit our evaluation to at most 3 input characteristics. All experiments to acquire the conjectures for the 16 maps were done using the same system parameters, i.e. none of the components was tuned manually to behave differently depending on the considered map. Out of 350 (resp. 202) combinations of input characteristics for which the Bound Seeker tried to find a sharp lower (resp. upper) bound using only polynomials, it got at least one sharp bound for 279 (resp. 149) combinations of characteristics, as well as 1236 (resp. 975) minimality (resp. maximality) conjectures. Using also Boolean and conditional expressions, it found 3 extra lower bounds and 93 new maximality/minimality conjectures.
Table 5 provides the results for the 16 maps using SICStus 4.6.0 on a 2015 iMac with a 4 GHz Core i7 and 32 GB of memory: for each map, we give the number of formulae found using only polynomials (see col. #P1), then using Boolean, conditional, and polynomial expressions (see col. #B2, #C2 and #P2). Using Boolean and conditional expressions generates 3.8% new formulae compared to using polynomials alone; moreover, 31.07% of the formulae that use polynomials are replaced by simpler formulae that use Boolean or conditional expressions. The time spent is explained by the significant number of candidate formulae tested, which results from the combination of minimal functional dependencies and grammar rules. Moreover, arithmetic constraints like div and mod with multiple occurrences of the same variable are handled poorly by CP solvers. The datasets used in the experiments and the sixteen maps found will be available for download in a technical report.

Evaluation of the acquired conjectures wrt the largest data sets. Of the 3625 conjectures acquired when only using polynomials, we found 5 invalid conjectures when tested against all samples of the largest data set, i.e. all digraphs up to 26 vertices. Of the 3264 conjectures acquired when also using Boolean and conditional expressions, we found 16 invalid conjectures. Note that in this setting the Bound Seeker does not try to find polynomial formulae if it has already found a Boolean or a conditional formula.
Comparing the conjectures found with proved bounds of the constraint catalogue. As shown in Table 6, the Bound Seeker retrieves 66.66% of the bounds of the constraint catalogue, even if the resulting formulae sometimes have a different form: e.g., the upper-bound on the number of arcs a wrt the number of vertices v, connected components c, and strongly connected components s in the catalogue is expressed as a … while the Bound Seeker finds the equivalent inequality a ≤ ⌊ … Unlike the bound given by [2], the bound found by the Bound Seeker defines the size $\underline{c}$ of the smallest connected component, and the size $\overline{s}$ of the largest strongly connected component of those extreme digraphs for which the upper-bound is reached.
An example of a generalised bound found by the Bound Seeker is the lower bound $a \ge ((v - \overline{c}) \le 1 \;?\; \max(v - 1, 1) : v - 2)$, with $v = (\underline{c} = \overline{c} \;?\; \underline{c} : \underline{c} + \overline{c})$, which extends the catalogue bound $\underline{c} \ne \overline{c} \Rightarrow a \ge \underline{c} + \overline{c} - 2 + (\underline{c} = 1)$. An example of a correct bound found by the Bound Seeker that replaces the erroneous catalogue bound (i) $a \ge v - \lfloor \frac{s-1}{2} \rfloor$ is (ii) $a \ge v - c_{\in\{2,3\}}$ with $c_{\in\{2,3\}} = (v = s \;?\; \lfloor \frac{v}{2} \rfloor : \lfloor \frac{s-1}{2} \rfloor)$: for the edge condition $v = s = 2$, (i) returns 2, rather than 1 as (ii) does. Bound (ii) $a \ge v - c_{\in\{2,3\}}$ can be interpreted as follows: to minimise the number of arcs, one has to maximise the number of connected components counted by $c_{\in\{2,3\}}$. The missing bounds of the catalogue are partially explained by the limited complexity of the common subexpressions (see BTerm, UTerm in Sect. 3.2.1) of our polynomials, and by the lack of some secondary characteristics.
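The edge condition v = s = 2 can be checked directly. The sketch below only evaluates the two formulae as printed in the text, with v and s read as plain integers; the function names are hypothetical and no digraph is enumerated.

```python
def catalogue_bound(v, s):
    """Bound (i): a >= v - floor((s - 1) / 2)."""
    return v - (s - 1) // 2


def c_in_2_3(v, s):
    """Secondary characteristic used by bound (ii)."""
    return v // 2 if v == s else (s - 1) // 2


def bound_seeker_bound(v, s):
    """Bound (ii): a >= v - c_in_2_3(v, s)."""
    return v - c_in_2_3(v, s)


print(catalogue_bound(2, 2), bound_seeker_bound(2, 2))  # prints "2 1"
```

The two formulae coincide whenever v differs from s, since only the branch v = s changes the subtracted term.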

Related work
While there exist several discovery programs in the context of mathematics devoted to set theory, number theory, finite algebra and knot theory [12,23,13], only a few systems focus on finding bounds between characteristics of a combinatorial object. The two most notable systems are S. Fajtlowicz's Graffiti program [14] and P. Hansen's AutoGraphiX system [1,17]. The first difference with these systems is that the Bound Seeker attempts to systematically construct a set of sharp bounds on all possible combinations of a set of input characteristics. The second main difference is that the Bound Seeker introduces secondary characteristics and searches for key properties of the extreme combinatorial objects for which the bounds are reached.
In slightly different domains, recent work in CP uses machine learning techniques to estimate the domain boundaries of an objective function [30] of an optimisation problem. Some other work uses CP to extract equations from a spreadsheet [18,25], and some recent work investigates how to integrate integer programming solvers within neural networks [15,26].
The specificity of our approach compared to machine learning and constraint acquisition [7] is twofold: (i) we can generate our input data, but we need to ensure that these data contain the correct values of the sharp bounds we consider, as otherwise we would necessarily obtain wrong maximal conjectures; moreover, maximality conjectures only make sense for sharp bounds; (ii) we have to learn concise conjectures that fit all available data exactly, as minimising an error measure would be irrelevant for acquiring conjectures on sharp bounds.

Conclusion
We introduce a structure that connects a set of sharp bounds. Based on this structure, we propose a constructive approach to acquire a set of interrelated conjectures on sharp bounds. We show the relevance of using a variety of types of formulae, i.e., Boolean, conditional, and polynomial formulae with shared sub-expressions, to acquire simpler conjectures. This work opens a new application domain for CP, namely automated conjecture-making systems. It adds a new line of research to those already reported in a recent survey on machine learning for combinatorial optimisation [5].

Figure 4 Map $M^{c\le}_{\{v,\underline{c},\overline{c},s,\underline{s},\overline{s}\}}$ of upper-bounds of the output characteristic c found by the Bound Seeker, where each dotted node contains, from left to right, a reference to the maximum conjecture ❶, …, ❺, ❶, ❷, possibly a set of maximality conjectures ⑥, …, ⑨, ③, …, ⑦, and the set of input characteristics in red; Part (A) corresponds to the bounds found while only using the input characteristics, and Part (B) refers to the bounds found using also the secondary characteristics.

▶ Example 1.
4.3 of [2], we assume that each vertex of a digraph has at least one incoming or outgoing arc. Fig. 1 illustrates the map concept with a map containing three conjectures labelled ❶, ❷, and ③: two conjectures about the sharp bounds ❶ $a \le (v - (c - 1))^2 + (c - 1)$ and ❷ $a \le v^2$ on the maximum number of arcs $a$ in a digraph $G$ wrt the number of vertices $v$ and the number of connected components $c$ of $G$. The conjecture ③ of node (B) indicates that the bound $v^2$ is reached only when $c = 1$. The arrow going from node (A) to node (B) is labelled by ③, as the bound $v^2$ is obtained by replacing $c$ by 1 in the bound $(v - (c - 1))^2 + (c - 1)$. The leftmost and rightmost parts of Fig. 1 show, in brown, two digraphs achieving these bounds.
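The bound of conjecture ❶ and its projection onto ❷ can be checked on very small digraphs with a brute-force sketch. It is not the Data Generation phase of the Bound Seeker. It assumes that loops are allowed and that a loop counts as an arc incident to its vertex, and the names `weak_components`, `bound_table` and `conjecture_1` are hypothetical. The search is limited to at most 3 vertices, since the number of digraphs grows as 2^(v²).

```python
from itertools import product


def weak_components(n, arcs):
    """Number of connected components, ignoring arc direction (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in arcs:
        parent[find(i)] = find(j)
    return len({find(x) for x in range(n)})


def bound_table(max_v):
    """Map (v, c) to the maximum number of arcs over all digraphs with
    v <= max_v vertices in which every vertex has at least one arc."""
    table = {}
    for v in range(1, max_v + 1):
        pairs = [(i, j) for i in range(v) for j in range(v)]  # loops allowed
        for mask in product((0, 1), repeat=len(pairs)):
            arcs = [p for p, bit in zip(pairs, mask) if bit]
            touched = {x for arc in arcs for x in arc}
            if len(touched) < v:  # some vertex has no incident arc
                continue
            key = (v, weak_components(v, arcs))
            table[key] = max(table.get(key, 0), len(arcs))
    return table


def conjecture_1(v, c):
    """Conjecture 1: a <= (v - (c - 1))^2 + (c - 1)."""
    return (v - (c - 1)) ** 2 + (c - 1)


table = bound_table(3)
print(all(table[key] == conjecture_1(*key) for key in table))  # prints True
print(conjecture_1(3, 1) == 3 ** 2)                            # prints True
```

The second check is the projection labelled by ③: replacing c by 1 in ❶ yields v², the bound of ❷.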
as we have ⑦ c = 1. Part (B) shows the bounds found when also using the secondary characteristics r and c, where r is a secondary characteristic corresponding to v − c · c. We only have 2 maximum conjectures ➊ c ≤ v and ➋ c ≤ r + c, where r and c are defined by the 5 maximality conjectures ③, …, ⑦ shown in Part (B). The natural upper-bound of c is the number of vertices of the digraph (see ➊), unless $\underline{c}$ or $\overline{c}$ are part of the input characteristics (see ➋), which requires considering the feasibility conditions induced by the use of such inputs.
| Candidate formulae generated by Phase (B1) | Formulae found by Phase (B2) |
|---|---|
| polynomial of degree 1 parameterised by $v$ and $c$ to determine $\overline{c}$ | $\overline{c} = v - c + 1$ |
| polynomial of degree 1 parameterised by $v$ and $c$ to determine $a$ | none |
| polynomial of degree 2 parameterised by $v$ and $c$ to determine $a$ | $a = c^2 - 2 \cdot v \cdot c + v^2 - c + 2 \cdot v$ |
| polynomial of degree 1 parameterised by $v$ and $\overline{c}$ to determine $a$ | none |
| polynomial of degree 2 parameterised by $v$ and $\overline{c}$ to determine $a$ | $a = \overline{c}^2 - \overline{c} + v$ |

Table 1
Digraphs minimising the number of arcs for four values of the bound parameters c and s.

, the bound table $T^{a\le}_{\{v,c\},3}$ provides a sharp upper bound of the output characteristic $a$ wrt the input characteristics $v$ and $c$. A bound table may also mention secondary characteristics, e.g. $\overline{c}$ in $T^{a\le}_{\{v,c\},3}$, which are functionally determined by the input characteristics. Each column of the table $T^{a\le}_{\{v,c\},3}$ refers to a characteristic, i.e. $v$, $c$, $a$, $\overline{c}$, while each row corresponds to a combination of parameter values for $v$, $c$ with the associated maximum number of arcs $a$ and the value of the secondary characteristic $\overline{c}$. For each bound table $T^{o\le}_{P,i}$ (with $P \subseteq \mathcal{P}$ and $i \in [2, n]$) with nrows rows, where $T^{o\le}_{P,i}[r, j]$ denotes the value of the r-th row and the j-th column, Phase (A2) calculates the aggregated information $D^{o\le}_{P,i}$ used to select the size k employed when searching for the conjectures of the subset P and the output characteristic o, such as:

Table 2
Examples of candidate formulae and corresponding generated formulae for the bound table $T^{a\le}_{\{v,c\},3}$ in Part (C1) of Fig. 3.
Table 3 introduces the variables used to represent a nonconstant formula F involving at most $n_c$ characteristics (i.e. input and secondary characteristics), $n_u$ unary terms, $n_b$ binary terms, and $n_p$ polynomials, wrt a bound table T of nrows rows. We use $n$ as a shortcut for $n_c + n_u + n_b$. For the binary term $B_i$, the variables $\mathit{B\_IND1}_i$, $\mathit{B\_IND2}_i$, $\mathit{B\_O}_i$ designate a term with the arguments $(C_{\mathit{B\_IND1}_i}, C_{\mathit{B\_IND2}_i})$

Table 5
Number of minimum/maximum and minimality/maximality conjectures found for each of the 16 maps and time in min. using only polynomials (see only Poly), and using Booleans, conditionals and polynomials (see Bool/Cond/Poly).

Table 6
Comparing the conjectures on the bounds found by the Bound Seeker (BS) with the database of invariants of the global constraint catalogue (GCC).