Computing Outside the Box: Average Consensus over Dynamic Networks

Networked systems of autonomous agents, and applications thereof, often rely on the control primitive of average consensus, where the agents are to compute the average of private initial values. To provide reliable services that are easy to deploy, average consensus should continue to operate when the network is subject to frequent and unpredictable change, and should mobilize few computational resources, so that deterministic, low powered, and anonymous agents can partake in the network. In this stringent adversarial context, we investigate the implementation of average consensus by distributed algorithms over networks with bidirectional, but potentially short-lived, communication links. Inspired by convex recurrence rules for multi-agent systems, and the Metropolis average consensus rule in particular, we design a deterministic distributed algorithm that achieves asymptotic average consensus, which we show to operate in polynomial time in a synchronous temporal model. The algorithm is easy to implement, has low space and computational complexity, and is fully distributed, requiring neither symmetry-breaking devices like unique identifiers, nor global control or knowledge of the network. In the fully decentralized model that we adopt, to our knowledge, no other distributed average consensus algorithm has a better temporal complexity. Our approach distinguishes itself from classical convex recurrence rules in that the agents' values may sometimes leave their previous convex hull. As a consequence, our convergence bound requires a subtle analysis, despite the syntactic simplicity of our algorithm.


Asymptotic average consensus
We consider a networked system of $n$ agents (the generic term we use to denote the autonomous nodes of the network), denoted by the integer labels $1, \dots, n$. Agent $i$ begins with an input value $\mu_i \in \mathbb{R}$, and maintains an estimate $x_i(t)$ of an objective. The input represents the agent's private observation of some aspect of its environment, which we assume to be taken arbitrarily from the domain of the problem; for example, the input may be a temperature reading, or the agent's initial position in space or velocity, if it is mobile. The estimate represents some aspect of the environment affected by the agent; depending on the system, it may simply be a local variable in the agent's memory, or it may directly represent some external parameter like the agent's heading or altitude.
Here, we focus on (asymptotic) average consensus, a control primitive widely studied by the distributed control community, where the estimates are made to achieve asymptotic consensus on the average of the input values, that is, to jointly converge towards the same limit $\overline{\mu} := \frac{1}{n} \sum_i \mu_i$. The problem of computing an average is central to many applications in distributed control: let us cite sensor fusion and data aggregation [37,27,36], distributed optimization and machine learning [24,28,26], collective motion [32,30], and more [13,8,12]. More generally, an average consensus primitive can be used to compute the relative frequency of the input values [16], and as such allows for the distributed computation of other statistical measures, for example the mode (the value with the highest support).
We study the problem of designing distributed algorithms for average consensus in the adversarial context of dynamic networks, where the communication links joining the agents change over time. Indeed, average consensus primitives are often needed in inherently dynamic settings that static models fail to adequately describe. For a few examples, let us cite mobile ad-hoc networks, where links change as external factors cause the agents to move in space; autonomous vehicular networks, where agents are in control of their motion; or peer-to-peer networks, where constant arrivals and departures cause the network to reconfigure.
Specifically, we study distributed algorithms in a fully decentralized context: all agents start in the same state, run the same local algorithm, receive no global information about the system, only manipulate local variables, and interact with the system exclusively by exchanging messages with neighboring agents in the instantaneous communication graph. These constraints preclude the use of many standard solutions where the agents receive unique identifiers, where an agent is designated as a leader, or where all agents initially agree on a bound on the network's degree or size. Moreover, we adopt a standard local broadcast communication model, particularly suited to modeling wireless networks, in which agents cast their messages without knowledge of their eventual recipients, and in particular cannot individually address their neighbors.
These conditions make it extremely hard to compute functions of the input values $\mu_1, \dots, \mu_n$: on general fixed directed networks, deterministic distributed algorithms are only capable of computing functions that depend on the set of the input values $\{\mu_1, \dots, \mu_n\}$, but not on their multi-set [17]. In particular, this precludes the distributed computation of the average. Here, we only consider networks with bidirectional communication links. Under this condition, the problem is rather simple if we assume a static communication graph [37,5], in which case we can even deploy efficient solutions [31,28] relying on spectral properties of the underlying graph. The problem is much harder in a dynamic setting, which, for example, forbids the use of such sophisticated spectral techniques.

Contribution
A standard approach to asymptotic consensus has agents regularly adjust their estimates as a convex combination of those of their neighbors [10,33], defined by a convex recurrence rule. We adopt a standard model of synchronized rounds, where this is expressed as a recurrence relation taking the generic form $x_i(t) = \sum_{j \in N_i(t)} a_{ij}(t)\, x_j(t-1)$, where the weights $a_{ij}(t)$ are taken to form a convex combination, and the sum is over an agent's incoming neighbors in the communication graph at round $t$.

While asymptotic consensus is guaranteed as long as the network never permanently splits [22], the estimates do not, in general, converge towards the average $\overline{\mu}$; reaching average consensus usually requires additionally enforcing symmetric weights $a_{ij}(t) = a_{ji}(t)$. Here, we study distributed algorithms for average consensus, i.e., we are interested in devising an algorithm that produces such weights through local computations only, in a fully decentralized manner.
For a simple example, average consensus comes easily by picking the weights $a_{ij}(t) = \frac{1}{n}$ when agents $i \neq j$ are neighbors in round $t$, and $a_{ii}(t) = 1 - \frac{\deg_i(t) - 1}{n}$. This scheme is simple to describe, but getting the agents to use these weights clearly requires getting them to know $n$, which is itself a serious distributed computing problem.
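To make the fixed-weight scheme concrete, here is a minimal Python sketch of one synchronous round, under the assumption that every agent has somehow been told $n$. The function name `uniform_weight_round`, the zero-based agent labels, and the adjacency-list encoding are illustrative choices, not part of the paper.

```python
from fractions import Fraction


def uniform_weight_round(x, neighbors, n):
    """One round of the rule a_ij = 1/n for proper neighbors.

    x[i] is agent i's previous estimate; neighbors[i] lists the agents heard
    by agent i in this round (i itself included, since graphs are reflexive).
    The self-weight 1 - (deg_i - 1)/n is implicit in the incremental form.
    """
    return [
        x[i] + Fraction(1, n) * sum(x[j] - x[i] for j in neighbors[i] if j != i)
        for i in range(len(x))
    ]


# Reflexive, bidirectional path 0 - 1 - 2 (an arbitrary toy topology).
path = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
x0 = [0, 3, 6]
x1 = uniform_weight_round(x0, path, n=3)   # -> [1, 3, 5]
```

The sum of the estimates is unchanged by the round, which is what symmetric weights buy; the catch, as noted above, is the `n` argument.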
We will argue that the Metropolis rule [37], defined by the weights $a_{ij}(t) = \frac{1}{\max(\deg_i(t), \deg_j(t))}$ for any two neighbors $i \neq j$ in round $t$, breaks down over dynamic networks because of similar, albeit subtler, issues. We then propose a symmetric recurrence rule that is implementable over dynamic bidirectional networks, and that we show to produce average consensus over any sufficiently connected network. The issues faced by the Metropolis rule are overcome by making the rule sometimes break convexity, which allows for keeping the average of the estimates constant even though the network changes unpredictably.
The temporal complexity of our distributed algorithm is polynomial, namely with a bound in $O(n^4 \log n)$, whereas the theoretical complexity bound of the Metropolis rule is in $O(n^2 \log n)$ [5]. To the best of our knowledge, this is the first deterministic algorithm that achieves asymptotic average consensus over bidirectional dynamic networks without any centralized input or symmetry-breaking assumptions. We note in passing that there exist randomized algorithms that are efficient in bandwidth and memory and converge in $O(n)$ rounds to a good approximation of the average $\overline{\mu}$ with high probability [6,20,23].
We dub our distributed algorithm MaxMetropolis. Compared to the Metropolis rule, the change that we propose is deceptively simple: in the expression of the Metropolis weights, we replace the degree $\deg_i(t)$ with the value $\overline{\deg}_i(t-1) = \max\{\deg_i(1), \dots, \deg_i(t-1)\}$. However, the resulting rule is no longer convex: the estimates $x_i(t)$ may sometimes leave the convex hull of the set $\{x_1(t-1), \dots, x_n(t-1)\}$, which makes the analysis substantially harder than in the purely convex case. Interestingly, although such "bad", convexity-breaking rounds can happen at an arbitrarily late stage in the execution, we are able to bound the convergence time independently of when bad rounds occur. That is, once our target error threshold has been reached, disagreement in the system can still increase in later bad rounds, but not enough to break the threshold again.

Related works
Average consensus itself is at the center of a large body of works: among many others, let us cite [33,34,8,19,35,37,25,3,13,28,14], and see [26] for a recent overview of the domain. The approach based on doubly stochastic matrices in particular has been studied in depth, notably in [25,29], with an analytical approach that focuses on aspects such as the temporal complexity and tolerance to quantization, whereas we address issues of a distributed nature, in particular the implementation of rules by distributed algorithms. We also note earlier work on random walks by Avin et al., who showed that dynamic networks can present considerable obstacles to mixing, in stark contrast with the well-behaved static case. Although their proposed solution is not directly implementable in our model, as it leverages global information (a bound over $n$), their study nonetheless deeply influenced the current work.
Of interest to our argument, we note that [35] looks for the fixed affine weights that optimize the speed of convergence towards average consensus over a given fixed graph, and finds that the weights can often be negative. Our algorithm is itself able to solve average consensus over dynamic networks precisely because it is sometimes allowed to use negative weights. When compared with our approach, the important difference is that we consider dynamic graphs and focus on distributed implementation of the recurrence rules, while the weights obtained in [35] are given by a centralized optimization problem, and are incompatible with a distributed approach.
A number of strategies aim at speeding up convex recurrence rules over static networks by having the agents learn what amounts to spectral elements of the graph Laplacian [4], and can result in linear-time convergence [31]. As is the case here, these represent distributed methods by which the agents learn structural properties of the communication graph. However, these methods rely on centralized symmetry-breaking crutches like unique identifiers, and their memory and computation footprint is much greater than ours, with agents computing and memorizing, in each round, the kernels of Hankel matrices of dimension $\Theta(n) \times \Theta(n)$. In contrast, our method can be used by anonymous agents, requires $\lceil \log n \rceil$ additional bits of memory and bandwidth, and has a trivial computational overhead.

Mathematical toolbox
Let us fix some notation. If $k$ is a positive integer, we denote by $[k]$ the set $\{1, \dots, k\}$. If a set $S \subset \mathbb{R}$ is non-empty and bounded, we denote its diameter by $\operatorname{diam} S := \max S - \min S$.
A directed graph $G$ is strongly connected when directed paths join any pair of vertices, or simply connected when $G$ is bidirectional.
All graphs that we consider here will be reflexive, bidirectional, and connected graphs of the form $G = ([n], E)$. In such a graph, the vertices linked to some vertex $i$ form its neighborhood $N_i(G)$, and the count of its neighbors is its degree $\deg_i(G) := |N_i(G)|$. By definition, the degree is at most $n$, and in a reflexive graph it is at least 1.
We consistently denote matrices and vectors in bold italic style: upper case for matrices (e.g., $A$) and lower case for vectors (e.g., $u$), with their individual entries in regular italic style (e.g., $A_{ij}$, $u_k$). The shorthand $v^{\mathbb{N}}$ denotes the infinite vector sequence $v(0), v(1), \dots$

The graph G
Given a vector $v \in \mathbb{R}^n$, we write $\operatorname{diam} v$ to mean the diameter of the set $\{v_1, \dots, v_n\}$ of its entries. The diameter constitutes a seminorm over $\mathbb{R}^n$; we call consensus vectors those of null diameter.
A matrix or a vector with non-negative (resp. positive) entries is itself called non-negative (resp. positive). A vector is called stochastic if its entries are non-negative and sum to 1.
A matrix $A$ is stochastic if its rows are all stochastic, which implies $A\mathbf{1} = \mathbf{1}$; any matrix that satisfies the condition $A\mathbf{1} = \mathbf{1}$ will be said to be affine. We say that a matrix $A$ is doubly stochastic when both $A$ and $A^T$ are stochastic.
We denote the mean value of a vector $v \in \mathbb{R}^n$ by $\langle v \rangle := \frac{1}{n} \sum_i v_i$. Doubly stochastic matrices play a central role in the study of average consensus, as multiplying any vector $v$ by a doubly stochastic matrix $A$ preserves its average, that is, $\langle Av \rangle = \langle v \rangle$.
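The averaging property of doubly stochastic matrices is easy to check numerically. The following Python sketch does so on one small hand-picked matrix; the helper names (`mat_vec`, `mean`, `diam`, `is_doubly_stochastic`) and the example values are assumptions made for illustration only.

```python
from fractions import Fraction as F


def mat_vec(A, v):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def mean(v):
    return sum(v) / F(len(v))


def diam(v):
    return max(v) - min(v)


def is_doubly_stochastic(A):
    rows_ok = all(sum(row) == 1 for row in A)
    cols_ok = all(sum(col) == 1 for col in zip(*A))
    non_negative = all(a >= 0 for row in A for a in row)
    return rows_ok and cols_ok and non_negative


A = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 2), F(1, 4), F(1, 4)],
     [F(0),    F(1, 4), F(3, 4)]]
v = [F(1), F(2), F(6)]
Av = mat_vec(A, v)   # -> [3/2, 5/2, 5]
```

Here `mean(Av)` equals `mean(v)` (both are 3), while the diameter shrinks from 5 to 7/2: the matrix contracts disagreement without moving the average.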
For any matrix $A \in \mathbb{R}^{n \times n}$, we can arrange its $n$ eigenvalues $\lambda_1, \dots, \lambda_n$, counted with their algebraic multiplicities, in decreasing order of magnitude: $|\lambda_1| \geqslant |\lambda_2| \geqslant \dots \geqslant |\lambda_n|$. Under this convention, the spectral radius of the matrix $A$ is the quantity $\rho_A := |\lambda_1|$, and its spectral gap is the quantity $\gamma_A := |\lambda_1| - |\lambda_2|$. In particular, a stochastic matrix has a spectral radius of 1, which is itself an eigenvalue for the eigenvector $\mathbf{1}$.

Computing model
We consider a networked system of $n$ agents, denoted $1, 2, \dots, n$. Computation proceeds in synchronized rounds that are communication closed, in the sense that no agent receives messages in round $t$ that are sent in a different round. In each round $t \in \mathbb{N}_{>0}$, each agent $i$ successively
1. broadcasts a single message $m_i(t)$ determined by its state at the beginning of round $t$;
2. receives some messages among $m_1(t), \dots, m_n(t)$;
3. undergoes an internal transition to a new state;
4. produces a round output $x_i(t) \in \mathbb{R}$ and proceeds to round $t + 1$.
The agents receiving agent $i$'s message $m_i(t)$ are unknown to agent $i$ at the time of emission, in step 1. Communications that occur in round $t$ are modeled by a directed graph $G(t) := ([n], E(t))$, called the round $t$ communication graph, which may change from one round to the next. We assume each communication graph $G(t)$ to be reflexive, as an agent always has access to its own messages without delay or transmission loss.
Messages to be sent in step 1 and state transitions in step 3 are determined by a sending function and a transition function, which together define the local algorithm for agent $i$. Collected together, the local algorithms of all agents in the system constitute a distributed algorithm. We posit no a priori global coordination or knowledge of the agents: in particular, we assume no leader, no unique identifiers, and no initial agreement on global parameters such as $n$. An agent's computations only involve its own local variables in memory.
An execution of a distributed algorithm is a sequence of rounds, as defined above, with each agent running the corresponding local algorithm. We assume that all agents start simultaneously in round 1, since the algorithms under our consideration are robust to asynchronous starts, retaining the same time complexity as when the agents start simultaneously. Indeed, asynchronous starts only induce an initial transient period during which the network is disconnected, which cannot affect the convergence and complexity results of algorithms driven by convex recurrence rules.
In any execution of a distributed algorithm, the entire sequence $x^{\mathbb{N}}$ is determined by the input vector $\mu$ and the patterns of communications in each round $t$, i.e., the sequence of communication graphs $\mathbb{G} := (G(t))_{t \geqslant 1}$, called the dynamic communication graph of the execution, and so we write $x^{\mathbb{N}} = x^{\mathbb{N}}(\mathbb{G}, \mu)$. When the dynamic graph $\mathbb{G}$ is understood, we let $N_i(t)$ and $\deg_i(t)$ respectively stand for $N_i(G(t))$ and $\deg_i(G(t))$. As no confusion can arise, we will sometimes identify an agent with its corresponding vertex in the communication graph, and speak of the degree or neighborhood of an agent in a round of an execution.
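The round structure of this computing model can be sketched as a small simulator. The Python below is only an illustration under simplifying assumptions: the function `run_round`, the dictionary-based states, and the toy sending and transition functions are hypothetical, and messages are assumed to be plain numbers so that they can be sorted to hide their senders.

```python
def run_round(states, in_neighbors, send, transition):
    """Simulate one communication-closed round for all agents at once."""
    # Step 1: each agent broadcasts one message computed from its current state.
    messages = [send(state) for state in states]
    new_states = []
    for i, state in enumerate(states):
        # Step 2: agent i receives this round's messages from its in-neighbors
        # (itself included). Sorting erases sender identities: agents are
        # anonymous and only see an unlabeled collection of messages.
        inbox = sorted(messages[j] for j in in_neighbors[i])
        # Step 3: internal transition to a new state.
        new_states.append(transition(state, inbox))
    return new_states


# Toy local algorithm: keep the estimate, record how many messages arrived.
def send(state):
    return state["x"]


def transition(state, inbox):
    return {"x": state["x"], "last_degree": len(inbox)}


graph = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}   # reflexive path 0 - 1 - 2
states = [{"x": v, "last_degree": None} for v in (1.0, 5.0, 9.0)]
states = run_round(states, graph, send, transition)
degrees = [s["last_degree"] for s in states]   # -> [2, 3, 2]
```

Note that an agent only learns its round degree by counting its inbox in step 2, after its own message for that round has already been sent.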

Recurrence rules for consensus
We distinguish local algorithms, as defined above, from the recurrence rules that they implement: the latter are recurrence relations that only describe how the estimates $x_i(t)$ change over time, while the former specify the distributed implementation of such rules in the system, through local interactions. This discrepancy is apparent in the Metropolis rule, whose distributed implementation over dynamic networks is problematic due to its dependence on "knowledge at distance two".

Affine recurrence rules
Here, we focus on algorithmic solutions to the average consensus problem whose executions realize recurrence relations of the general form
$$x_i(t) = \sum_{j \in N_i(t)} a_{ij}(t)\, x_j(t-1), \tag{1}$$
where the time-varying weights $a_{ij}(t)$ satisfy the affine constraint $\sum_{j \in N_i(t)} a_{ij}(t) = 1$ and may depend on the dynamic graph $\mathbb{G}$ and the input values $\mu_1, \dots, \mu_n$. We refer to such relations as affine recurrence rules, and we say that a distributed algorithm implements the rule, insisting again that a distributed algorithm is distinct from the rule it implements.
Because of the constraint $\sum_{j \in N_i(t)} a_{ij}(t) = 1$, the self-weights satisfy $a_{ii}(t) = 1 - \sum_{j \in N_i(t) \setminus \{i\}} a_{ij}(t)$. An affine recurrence rule is thus fully specified by the weights $a_{ij}(t)$ assigned to an agent's proper neighbors $j \neq i$.
The affine rule of Equation (1) is equivalent to the vector equation $x(t) = A(t)\, x(t-1)$, where $A_{ij}(t) = a_{ij}(t)$ when $i$ and $j$ are neighbors in round $t$, and $A_{ij}(t) = 0$ otherwise. The affinity constraint then corresponds to the condition $A(t)\mathbf{1} = \mathbf{1}$.
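As a small illustration of the correspondence between weights and matrices, the following Python sketch assembles the matrix of an affine rule from the weights assigned to proper neighbors and fills in the self-weights from the affine constraint. The names `rule_matrix` and `mat_vec`, and the constant weight $1/3$ used in the example, are assumptions chosen for the demonstration.

```python
from fractions import Fraction


def rule_matrix(neighbors, weight):
    """Matrix A with A[i][j] = weight(i, j) for proper neighbors j of i,
    zero for non-neighbors, and the diagonal fixed by the constraint A1 = 1."""
    n = len(neighbors)
    A = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in neighbors[i]:
            if j != i:
                A[i][j] = Fraction(weight(i, j))
        A[i][i] = 1 - sum(A[i])   # self-weight: 1 minus the off-diagonal weights
    return A


def mat_vec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


path = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}   # reflexive path 0 - 1 - 2
A = rule_matrix(path, lambda i, j: Fraction(1, 3))
x_next = mat_vec(A, [0, 3, 6])   # -> [1, 3, 5]
```

Every row of `A` sums to 1 by construction, whatever the off-diagonal weights are; non-negativity of the diagonal, on the other hand, is not guaranteed.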

Convexity and convergence
We call the rule convex when all weights are non-negative, or equivalently, when all matrices $A(t)$ are stochastic. By and large, the study of affine recurrence rules focuses on that of convex recurrence rules, which guarantee convergence under mild conditions. We recall a standard convergence result, found under various forms in the literature; see for example [7,33,18,22].
▶ Proposition 2. Assume that the weights of Equation (1) admit a uniform positive lower bound $\alpha$: $a_{ij}(t) \geqslant \alpha > 0$ for all $t$, $i$, and $j \in N_i(t)$. Under Assumption 1, the vectors $x(t)$ converge to a consensus vector.
We speak of uniform convexity when such a parameter $\alpha$ exists, and we note that in this case asymptotic consensus is actually ensured by conditions much weaker than Assumption 1: for bidirectional interactions, it is enough that the network never become permanently split [22, Theorem 1].
Note that Proposition 2 says nothing of the value of the consensus; affine recurrence rules for average consensus are typically designed to produce matrices that are doubly stochastic. By enforcing the invariant $\langle x(t) \rangle = \langle x(t-1) \rangle$, this makes the initial average $\overline{\mu}$ the only admissible consensus value.

The convergence time of a single sequence $z^{\mathbb{N}}$, given by $T(\varepsilon; z^{\mathbb{N}}) := \inf\{t \in \mathbb{N} \mid \forall \tau \geqslant t : \operatorname{diam} z(\tau) \leqslant \varepsilon\}$, measures its progress towards asymptotic consensus. For a rule or an algorithm, we consider the more helpful worst-case relative convergence time over a class $C$: for a system of $n$ agents, it is defined by
$$T_C(\varepsilon; n) := \sup_{\mathbb{G} \in C_{|n},\ \mu \in \mathbb{R}^n} T\big(\varepsilon \cdot \operatorname{diam} \mu;\ x^{\mathbb{N}}(\mathbb{G}, \mu)\big), \tag{2}$$
where we drop the class $C$ if it is clear from the context. We recall the following bounds for uniformly convex recurrence rules over the class $B_c$: when all matrices are doubly stochastic, the convergence time is in $O(\alpha^{-1} n^2 \log(n/\varepsilon))$ [25, Theorem 10]. In the common case that $\alpha = \Theta(1/n)$, all rules are known to admit executions that do not converge before $\Omega(n^2 \log(1/\varepsilon))$ rounds over the fixed line graph with $n$ vertices [29, Theorem 6.1].
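The definition of the convergence time requires the diameter to stay below the threshold in all later rounds, not merely to dip below it once. A minimal Python sketch of this definition on a finite recorded trace follows; the function name `convergence_time` is hypothetical, and on a finite prefix the result is only an estimate, since rounds beyond the trace are not observed.

```python
def diam(v):
    return max(v) - min(v)


def convergence_time(trace, eps):
    """Smallest t such that diam(trace[tau]) <= eps for every recorded tau >= t.

    trace[t] is the vector z(t). Returns None when even the last recorded
    vector violates the threshold.
    """
    t = len(trace)
    # Walk backwards from the end while the threshold is respected.
    while t > 0 and diam(trace[t - 1]) <= eps:
        t -= 1
    return t if t < len(trace) else None


# Diameters 4, 1, 3, 1, 0.5: the threshold 1 is met in round 1, broken in
# round 2, and met for good from round 3 on.
trace = [[0, 4], [0, 1], [0, 3], [0, 1], [0, 0.5]]
t_eps = convergence_time(trace, eps=1)   # -> 3
```

The early dip at round 1 does not count, which is exactly why rounds that increase disagreement matter for rules that are not convex.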

Consensus and average consensus rules
The EqualNeighbor rule
The prototypical example of a convex recurrence rule is the EqualNeighbor rule, where an agent assigns equal weights to all its neighbors, itself included:
$$x_i(t) = \frac{1}{\deg_i(t)} \sum_{j \in N_i(t)} x_j(t-1). \tag{3}$$
We can mechanically derive an algorithm implementing the EqualNeighbor rule: in each round $t$, broadcast one's latest estimate $x_i(t-1)$, and pick as new estimate $x_i(t)$ the arithmetic mean of the incoming values. Since $\deg_i(t) \leqslant n$, this rule admits $1/n$ as a parameter of uniform convexity, and for a dynamic graph of $B_c$, Proposition 2 shows that any solution to Equation (3) converges to a consensus vector.
Clearly, the EqualNeighbor rule does not solve the average consensus problem on the entire class $B_c$, as the weights are generally not symmetric, unless each communication graph $G(t)$ is regular, that is, unless all its vertices have the same degree.
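A short Python sketch shows both points at once: the EqualNeighbor rule is trivial to implement, and it drifts away from the average on a non-regular graph. The function name `equal_neighbor_round` and the three-agent path used as input are illustrative assumptions.

```python
from fractions import Fraction


def equal_neighbor_round(x, neighbors):
    """Each agent takes the arithmetic mean of the values it receives
    (its own value included, since the graph is reflexive)."""
    return [
        sum(x[j] for j in neighbors[i]) / Fraction(len(neighbors[i]))
        for i in range(len(x))
    ]


def mean(v):
    return sum(v) / Fraction(len(v))


path = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}   # degrees 2, 3, 2: not regular
x0 = [0, 0, 6]
x1 = equal_neighbor_round(x0, path)   # -> [0, 2, 3]
```

The average moves from 2 to 5/3 in a single round, so whatever consensus value is eventually reached need not be the initial average.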

The Metropolis rule
In [37], Xiao et al. investigate the problem of distributed sensor fusion with the help of an average consensus primitive. For that, they describe the "maximum-degree" rule, parametrized with an integer $N \geqslant 1$, defined by the constant weights $a_{ij}(t) = 1/N$ for any neighbors $i \neq j$ in round $t$.
The authors note that this rule solves average consensus over the class $\bigcup_{n \leqslant N} B_{c|n}$, but remark that implementing this rule hinges on the agents initially agreeing on the bound $N$, embedding an assumption of centralized control. This makes the "maximum-degree" rule inapplicable over truly decentralized systems; indeed, our communication model does not generally allow for the distributed computation of such a bound $N$ [1]. Xiao et al. go on to suggest the alternative rule
$$x_i(t) = x_i(t-1) + \sum_{j \in N_i(t)} \frac{x_j(t-1) - x_i(t-1)}{\max(\deg_i(t), \deg_j(t))}, \tag{4}$$
generally referred to as the Metropolis rule, as it is inspired by the Metropolis-Hastings method [15,21].

Analytically, this rule is appealing, as it was recently shown [5] to display a worst-case convergence time of $O(n^2 \log n)$ over the entire class $B_c$, making it the fastest rule known to us to solve either consensus or average consensus on that class. From a computational perspective, it is argued in [37] that the Metropolis rule is better suited for decentralized systems, as it only leverages "local" knowledge. Indeed, agents can implement this rule knowing only, in each round, their own degrees in the current communication graph and those of their neighbors, compared to the initial agreement over $N \geqslant n$ required of the "maximum-degree" rule.
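For reference, one round of the Metropolis rule can be simulated centrally as in the Python sketch below. It assumes the weights $1/\max(\deg_i(t), \deg_j(t))$ between proper neighbors, with degrees counting the agent itself; the simulator reads all degrees of the current round directly from the graph, which is precisely the knowledge a local algorithm does not have when it sends its message. The name `metropolis_round` is an illustrative choice.

```python
from fractions import Fraction


def metropolis_round(x, neighbors):
    """One Metropolis round, computed with global access to the round graph."""
    deg = {i: len(neighbors[i]) for i in neighbors}   # degrees of THIS round
    return [
        x[i] + sum(Fraction(x[j] - x[i]) / max(deg[i], deg[j])
                   for j in neighbors[i])
        for i in range(len(x))
    ]


path = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}   # reflexive path 0 - 1 - 2
x0 = [0, 0, 6]
x1 = metropolis_round(x0, path)   # -> [0, 2, 4]
```

Unlike with equal weights on the same input, the sum of the estimates is preserved, because each edge carries the same weight in both directions.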
Unfortunately, local algorithms cannot implement the Metropolis rule over dynamic networks. The rule is only "local" in the weak sense that an agent's next estimate $x_i(t)$ depends on information present within distance 2 of agent $i$ in the communication graph $G(t)$, which is not local enough when the network is subject to change.
Indeed, since agent $j \in N_i(t)$ only learns its round $t$ degree $\deg_j(t)$ at the end of round $t$, by counting its incoming messages, it cannot share this information with other agents before the following round. Any distributed implementation of the Metropolis rule would therefore require communication links that evolve at a slow and regular pace; one can imagine a network whose topology can only change once every $k$ rounds, when $t \equiv 0 \bmod k$, e.g., at even rounds.
When the network is subject to unpredictable changes, the situation is even worse: we need to warn all agents, ahead of time, about any upcoming topology change. In effect, this amounts to having a global synchronization signal precede every change in the communication topology. For a topology change in round $t_0$, this differs little from starting an entirely new execution with new input values $\mu'_1 = x_1(t_0 - 1), \dots, \mu'_n = x_n(t_0 - 1)$. To paraphrase, given a sufficiently stable communication network, one "can" implement the Metropolis rule over dynamic networks; however, the execution is fully decentralized only as long as no topology change actually occurs.
We note that, although we have covered the Metropolis rule here, other average consensus rules typically face similar problems, even when expressly designed for dynamic networks. As an example, while the Metropolis rule can be implemented with a two-message protocol (e.g., on a communication graph that changes every other round, and with all agents agreeing on the parity of the round number; see e.g., [9] for a discussion), the rules given in [29, Algorithm 8.2] and [25, Section IV.A] involve a three-message protocol. Their implementation thus requires more network stability, and a stronger agreement, than Metropolis.

The MaxMetropolis algorithm

A symmetric affine rule
Let us briefly recall the idea of the Metropolis-Hastings method [15,21]: given a positive stochastic vector $\pi$, the method turns a stochastic matrix $A$, usually viewed as the transition matrix of a reversible Markov chain, into another stochastic matrix $A'$ with stationary distribution $\pi$, by picking off-diagonal entries as $A'_{ij} = \min\big(A_{ij}, \frac{\pi_j}{\pi_i} A_{ji}\big)$. In the particular case $\pi = \frac{1}{n}\mathbf{1}$, we get the simpler transform $M(-)$, defined entry-wise by:
$$[M(A)]_{ij} = \min(A_{ij}, A_{ji}) \ \text{ for } j \neq i, \qquad [M(A)]_{ii} = 1 - \sum_{k \neq i} \min(A_{ik}, A_{ki}).$$
Let us call this transform the Metropolis-Hastings symmetrization; as an example, the symmetrization of the EqualNeighbor matrix yields the Metropolis matrix. We can make a few remarks: for any matrix $A$, the matrix $M(A)$ is affine and symmetric by construction, and for any $j \neq i$ we have $[M(A)]_{ij} \leqslant A_{ij}$ and therefore $[M(A)]_{ii} \geqslant A_{ii}$. In particular, if the matrix $A$ is stochastic with positive diagonal entries, then so is $M(A)$; if we can use Proposition 2 to establish the convergence of the system $x(t) = A(t)\, x(t-1)$, then necessarily the system $y(t) = M(A(t))\, y(t-1)$ also converges, and achieves average consensus.
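The symmetrization can be sketched in a few lines of Python. The sketch assumes the uniform-target form of the transform, in which each off-diagonal entry becomes $\min(A_{ij}, A_{ji})$ and the diagonal is then set so that every row sums to 1; the function name `symmetrize` is an illustrative choice. It is applied here to the EqualNeighbor matrix of a three-agent path.

```python
from fractions import Fraction as F


def symmetrize(A):
    """Metropolis-Hastings symmetrization for the uniform target distribution."""
    n = len(A)
    M = [[min(A[i][j], A[j][i]) if i != j else F(0) for j in range(n)]
         for i in range(n)]
    for i in range(n):
        M[i][i] = 1 - sum(M[i])   # restore the affine constraint on row i
    return M


# EqualNeighbor matrix of the reflexive path 0 - 1 - 2 (degrees 2, 3, 2).
equal_neighbor = [[F(1, 2), F(1, 2), F(0)],
                  [F(1, 3), F(1, 3), F(1, 3)],
                  [F(0),    F(1, 2), F(1, 2)]]
M = symmetrize(equal_neighbor)
# -> [[2/3, 1/3, 0], [1/3, 1/3, 1/3], [0, 1/3, 2/3]]
```

The off-diagonal entries of the result are $1/\max(\deg_i, \deg_j)$, i.e., the Metropolis weights for this graph.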

Bound learning
To apply the Metropolis-Hastings symmetrization while avoiding the aforementioned limitations of the Metropolis rule, let us temporarily assume that each agent $i \in [n]$ initially knows an upper bound $q_i \geqslant 1$ over its degree throughout the execution, i.e., $q_i \geqslant \deg_i(t)$ for all $t \geqslant 1$.
In this case, an agent may broadcast in each round the pair $\langle q_i, x_i(t-1) \rangle$ to its neighbors, and adjust its estimate as
$$x_i(t) = x_i(t-1) + \sum_{j \in N_i(t)} \frac{x_j(t-1) - x_i(t-1)}{\max(q_j, q_i)}; \tag{6}$$
we easily see that this rule produces symmetric weights ($a_{ij}(t) = a_{ji}(t)$) and has a uniform convexity parameter of $1/\max_i q_i$. For a dynamic graph of $B_c$, any solution $z^{\mathbb{N}}$ of Equation (6) converges to a consensus vector, by Proposition 2, and therefore achieves asymptotic average consensus, since the weights are symmetric. Using e.g., the aforementioned result of [25, Theorem 10], we can show that the convergence time behaves as $O(\max_i q_i \cdot n^2 \log(n/\varepsilon))$, which is polynomial in $n$ when the bounds $q_i$ themselves are.
Obviously, assuming such bounds $q_i$ supposes that the agents have information about the dynamic structure of the network ahead of the execution, which our model explicitly disallows. Instead of assuming such bounds, we next show that we can solve the average consensus problem for the class $B_c$ by making agents learn good bounds over time in a manner consistent with our symmetric and local model.
To this effect, for each agent $i$ we let $\overline{\deg}_i(t) := \max\{\deg_i(1), \dots, \deg_i(t)\}$ for any round $t$. For a dynamic graph in $B_{c|n}$, the value $\overline{\deg}_i(t) \in [2, n]$ is weakly increasing with $t$, and therefore stabilizing: we have $\overline{\deg}_i(t) = \overline{\deg}_i := \max_{\tau \geqslant 1} \deg_i(\tau)$ for all rounds $t$ beyond some round $t^*_i$. Thus, by keeping track of $\overline{\deg}_i(t)$, agent $i$ will eventually hold a bound on its future degrees for the rest of the execution, which may be used to implement Equation (6), not for the whole interval $[1, \infty[$, but on all but finitely many rounds.
Moreover, we have by definition $\overline{\deg}_i(t) \geqslant \deg_i(t)$, so that using $\overline{\deg}_i(t)$ in place of $q_i$ in Equation (6) produces a convex rule, even though $\overline{\deg}_i(t)$ may be smaller than agent $i$'s future degree. Unfortunately, the weights $\frac{1}{\max(\overline{\deg}_i(t), \overline{\deg}_j(t))}$ cannot be computed in a local manner: since $\overline{\deg}_i(t)$ depends on $\deg_i(t)$, the issues of the Metropolis rule apply here as well, as an agent cannot communicate its degree to its neighbors at the time they need the information.
We overcome this obstacle with a small, but crucial adjustment: building the round $t$ weights using the latest known bound $\overline{\deg}_i(t-1)$ in place of $\overline{\deg}_i(t)$ allows us to conform to the stringent locality constraints by sacrificing the convexity of the rule. Specifically, we propose the MaxMetropolis algorithm, given in Algorithm 1, a deterministic distributed algorithm which solves the average consensus problem over the class $B_c$ in polynomial time, by implementing the rule
$$x_i(t) = x_i(t-1) + \sum_{j \in N_i(t)} \frac{x_j(t-1) - x_i(t-1)}{\max(\overline{\deg}_i(t-1), \overline{\deg}_j(t-1))}. \tag{7}$$
Algorithm 1: The MaxMetropolis algorithm, code for agent $i$ (input: $\mu_i \in \mathbb{R}$).
The weights are clearly symmetric, and so any solution to Equation (7) satisfies the invariant $\langle x(t+1) \rangle = \langle x(t) \rangle$. Moreover, by construction, there exists a round $t^*$ after which we have $\overline{\deg}_i(t-1) = \overline{\deg}_i \geqslant \deg_i(t)$; the assumptions of Proposition 2 are then satisfied over the infinite interval $[t^*, \infty[$. Taken together, these observations immediately give us that MaxMetropolis is an average consensus distributed algorithm for the class $B_c$.
On the other hand, in contrast with the Metropolis rule, the MaxMetropolis rule offers no guarantee of convexity: we easily see that if, for example, $\deg_i(t)$ is much larger than $\overline{\deg}_i(t-1)$, then $x_i(t)$ may leave the convex hull of $\{x_j(t-1) \mid j \in N_i(t)\}$, and in fact may even leave the convex hull of $\{x_j(t-1) \mid j \in [n]\}$. Such convexity-breaking rounds can occur late in the execution, and our main analytical difficulty will be to show that these "late bad rounds" cannot introduce too much noise in the system once a given degree of agreement has been reached.
▶ Theorem 3. The MaxMetropolis algorithm solves the average consensus problem in all of its executions over the class $B_c$. For a system of $n$ agents and an error threshold of $\varepsilon > 0$, the convergence time is bounded by $T(\varepsilon; n) = O(n^4 \log(n/\varepsilon))$.
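The following Python sketch simulates the MaxMetropolis rule as described in the text; it is not a transcription of Algorithm 1. It assumes that each agent keeps a running bound `q[i]` on its past degrees, initialized to 2 (the convention used for round 0 in the proof below), broadcasts the pair of its bound and estimate, and updates the bound by counting received messages. The function name and the star-shaped example are illustrative.

```python
from fractions import Fraction


def max_metropolis_round(x, q, neighbors):
    """One round: weights use the bounds known BEFORE the round starts."""
    messages = list(zip(q, x))            # agent i broadcasts (q_i, x_i)
    new_x, new_q = [], []
    for i, (qi, xi) in enumerate(messages):
        inbox = [messages[j] for j in neighbors[i]]   # own message included
        new_x.append(xi + sum(Fraction(xj - xi) / max(qi, qj)
                              for qj, xj in inbox))
        new_q.append(max(qi, len(inbox)))  # degree learned by counting messages
    return new_x, new_q


# Reflexive star: agent 0 hears everyone, agents 1..3 hear only agent 0.
star = {0: [0, 1, 2, 3], 1: [0, 1], 2: [0, 2], 3: [0, 3]}
x0, q0 = [0, 4, 4, 4], [2, 2, 2, 2]
x1, q1 = max_metropolis_round(x0, q0, star)   # x1 -> [6, 2, 2, 2]
x2, q2 = max_metropolis_round(x1, q1, star)   # x2 -> [3, 3, 3, 3]
```

In the first round the center's degree (4) exceeds its bound (2), and its estimate jumps to 6, outside the previous range [0, 4]; the sum of the estimates nonetheless stays at 12, and once the bound has caught up the rule behaves convexly and reaches the average 3.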

Temporal complexity of the MaxMetropolis algorithm
To prove Theorem 3, we need to introduce a few technical results borrowed from [5], where they are given a more general and detailed exposition. In the following, we denote by $\sigma(-)$ the sample standard deviation: $\sigma(x) := \sqrt{\sum_i (x_i - \langle x \rangle)^2}$. The crux of the proof is to dominate $\sigma(x(t))$ with a geometrically decreasing sequence, taking care when handling matrices with possibly negative entries.
▶ Lemma 4. For any vector $v \in \mathbb{R}^n$, we have The inequalities are strict if, and only if, the vectors $v$ and $\mathbf{1}$ are independent.
Proof. Developing the definition of the standard deviation, we have σ(v) = 1 2 i̸ =j (v i − v j ) 2 , which yields the left-hand side inequality. Moreover, without loss of generality we can assume $\langle v \rangle = 0$, in which case $\sigma(v) = \|v\|$; the right-hand side inequality then follows from the classic bounds diam
The following lemma is a restatement of a standard variational characterization of the eigenvalues of the matrix $I - A^T A$; see e.g., [11] for an in-depth treatment of the question.
▶ Lemma 5. Let $A$ denote a doubly stochastic matrix, irreducible and with positive diagonal entries. For any vector $v$, we have in the particular case where $A$ is symmetric, we have σ(Av)
Finally, we will rely on the following spectral bound, given in [25, Lemma 9].
▶ Lemma 6. Let $A$ be a stochastic matrix, with smallest positive entry $\alpha$. If $A$ is symmetric, irreducible, and has positive diagonal entries, then we have
With Lemmas 4-6, we can turn to the proof of Theorem 3. deg i , and

Proof of Theorem 3.
where by convention we set deg i (0) = 2 so that the set K is properly defined.
and x(0) = (µ 1 , . . ., µ n ) is given by the input values of the execution.Equation ( 12), shows that the affine matrix A(t) is symmetric, and thus for any vector v we have ⟨A(t)v⟩ = ⟨v⟩.This is true for all t ⩾ 1, and so ⟨x(t)⟩ = µ is an invariant of the execution.If we show asymptotic consensus, then the consensus value is necessarily the initial average µ.
As a result of the Metropolis-Hastings symmetrization, the diagonal entries of the matrix A(t) satisfy which gives in particular A ii (t) ⩾ 1 /n when t / ∈ K.The vector sequence (x(t)) t⩾t * thus satisfies the assumptions of Proposition 2 for the uniform convexity parameter α = 1 /n, and so x(t) converges to a consensus vector.As already discussed, the limit value is necessarily S A N D 2 0 2 2 10:12 Computing Outside the Box the initial average µ, and the system achieves asymptotic average consensus.This holds for any dynamic graph G ∈ B c and arbitrary input values µ 1 , . . ., µ n , and thus MaxMetropolis is an average consensus algorithm for the class B c .
It remains to show the polynomial convergence bound $T(\varepsilon; n) = O(n^4 \log(n/\varepsilon))$. We start with the remark that the diagonal entry $A_{ii}(t)$ can be negative in a round $t$ during which $\deg_i(t) > \deg_i(t-1)$. Because of this, the estimate $x_i(t)$ might end up outside the range of the previous estimates $\{x_1(t-1), \ldots, x_n(t-1)\}$. As a consequence, rounds $t \in K$ are "bad" rounds, where the system may move away from consensus, delaying the eventual convergence. In the class $B_c$, there is no uniform upper bound on the value of $t^*$, and such convexity-breaking rounds may occur arbitrarily late in the execution. Our challenge is therefore to show that, in finite time, the system reaches a given degree of agreement which cannot be undone in later "bad" rounds. We do this by accounting, from the start, for the total delay that can be accrued in rounds $t \in K$.
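To make the effect of a negative diagonal entry concrete, the following minimal sketch applies one symmetric affine update on a hypothetical star graph with six agents. The matrix, its uniform edge weight 1/2, and the helper names `apply` and `deviation` are illustrative assumptions, not the weights produced by MaxMetropolis; the deviation is taken here as the Euclidean norm of the centered vector.

```python
from math import isclose
from statistics import fmean


def apply(matrix, x):
    """Return the matrix-vector product matrix @ x as a list."""
    return [sum(a * v for a, v in zip(row, x)) for row in matrix]


def deviation(x):
    """Euclidean norm of x minus its mean (assumed form of sigma)."""
    m = fmean(x)
    return sum((v - m) ** 2 for v in x) ** 0.5


# Hypothetical star: agent 0 is the center, agents 1..5 are leaves,
# and every edge carries the same (illustrative) weight 1/2.
n, w = 6, 0.5
A = [[0.0] * n for _ in range(n)]
for leaf in range(1, n):
    A[0][leaf] = A[leaf][0] = w
# Diagonal entries make each row sum to 1 (affine, symmetric matrix).
for i in range(n):
    A[i][i] = 1.0 - sum(A[i][j] for j in range(n) if j != i)

x_prev = [0.0, 1.0, 1.0, 1.0, 1.0, 1.0]
x_next = apply(A, x_prev)

print("center diagonal entry:", A[0][0])
print("previous range:", (min(x_prev), max(x_prev)))
print("new estimates:", x_next)
print("average preserved:", isclose(fmean(x_prev), fmean(x_next)))
print("deviation ratio:", deviation(x_next) / deviation(x_prev))
```

In this toy round the center's estimate jumps to 2.5, above every previous estimate, while the average stays at 5/6 and the deviation doubles.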
We follow the variations of the sample standard deviation $S(t) := \sigma(x(t))$ from one round to the next, depending on whether $t \in K$ or not.

Case $t \notin K$. By Equation (13), the irreducible matrix $A(t)$ has positive diagonal entries, and thus has a positive spectral gap. By Lemma 5, we have

$$S(t) \leq \left(1 - \gamma_{A(t)}\right) S(t-1). \tag{14}$$

Case $t \in K$. Here, the matrix $A(t)$ may have negative diagonal entries. It need not be a stochastic matrix, and indeed its spectral radius $\rho_{A(t)}$ is possibly greater than 1. However, as a symmetric matrix, the matrix $A(t)$ is diagonalizable, and thus we have $\|A(t)v\| \leq \rho_{A(t)} \cdot \|v\|$ for any vector $v$. For the particular case $v = x(t-1) - \mu\mathbf{1}$, for which $A(t)v = x(t) - \mu\mathbf{1}$ since the rows of the affine matrix $A(t)$ sum to 1, this results in

$$S(t) \leq \rho_{A(t)}\, S(t-1). \tag{15}$$

Equation (15) actually holds for all $t \geq 1$, but it is strictly worse than Equation (14) for rounds $t \notin K$.

For any t ∈ K and i ∈ [n], we have
As a consequence, given any error threshold $\varepsilon > 0$, the estimates are contained in a ball of diameter $\varepsilon \cdot \operatorname{diam} \mu$ at the latest in round $t_\varepsilon \leq \delta + \gamma^{-1} \log\left(2^{-2n+1} \varpi^2 \sqrt{n}/\varepsilon\right)$. From Lemma 6, we have $\gamma$ […]

Compared to the $O(n^2 \log(n/\varepsilon))$ convergence time of the Metropolis rule, the latter asymptotic bound is worse by a factor $n \deg_G$. From the proof, we can give a rough analysis of these factors: the factor $n$ represents the delay due to broken convexity, as each agent individually induces a delay of $\log \deg_i$. The factor $\deg_G$ comes from the fact that, whereas the Metropolis rule always selects the best possible off-diagonal weights (that is, the largest ones), the MaxMetropolis rule makes conservative choices so as to allow for a decentralized algorithmic implementation that only breaks convexity finitely many times.
Improvements to the MaxMetropolis approach, based for example on adjusting the parameters $q_i$ downwards in pursuit of faster mixing, must therefore be considered with extreme care, as gains due to larger weights might result in greater delays due to broken convexity.
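For reference, the Metropolis rule used as the baseline above can be sketched as follows. This is a centralized simulation under stated assumptions, not the paper's Algorithm 1: the weight of an edge $\{i, j\}$ is taken to be $1/\max(d_i(t), d_j(t))$, with degrees counting the self-loop of a reflexive graph, and the function name `metropolis_round` and the two example graphs are hypothetical. Note that every weight reads both endpoints' degrees in the current round.

```python
from math import isclose
from statistics import fmean


def metropolis_round(x, edges):
    """One synchronous round of the (assumed) Metropolis rule.

    x     -- list of current estimates, one per agent
    edges -- iterable of pairs (i, j), i != j, of an undirected graph
    Degrees include the self-loop, so an isolated agent has degree 1.
    """
    n = len(x)
    neighbors = [set() for _ in range(n)]
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    degree = [len(nb) + 1 for nb in neighbors]
    # Each agent moves towards its neighbors with symmetric weights
    # 1 / max(degree[i], degree[j]); the remaining weight stays on itself.
    return [
        x[i] + sum((x[j] - x[i]) / max(degree[i], degree[j])
                   for j in neighbors[i])
        for i in range(n)
    ]


# Hypothetical dynamic graph on 4 agents alternating between a path and a star.
path = [(0, 1), (1, 2), (2, 3)]
star = [(0, 1), (0, 2), (0, 3)]

x0 = [0.0, 4.0, 8.0, 20.0]
x = list(x0)
for t in range(1, 201):
    x = metropolis_round(x, path if t % 2 else star)

print("final estimates:", x)
print("initial average:", fmean(x0))
```

Because these weights are symmetric and leave a positive weight on each agent's own value, the estimates in this run stay inside the initial range and approach the initial average 8.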

Conclusion
In this paper, we have presented the MaxMetropolis algorithm, a parsimonious distributed algorithm for average consensus that operates in polynomial time over connected bidirectional dynamic networks, without resorting to any centralized crutch like unique identifiers, a designated leader, or global information on the network. Our solution has many potential uses, given that average consensus primitives underpin many applications studied in distributed control. In contrast with the classic approaches used in this domain, we take an algorithmic stance, grounded in the theory of anonymous computation [1,2,17] and in the algorithmic study of dynamic networks [20]. We argue that the fundamental convex recurrence rule for average consensus, namely, the Metropolis rule, cannot be implemented in a fully distributed and decentralized setting when the network is subject to unpredictable change. Our solution consists in relaxing the convexity constraint, resulting in an affine recurrence rule for average consensus that is algorithmically implementable in any networked multi-agent system with a time-varying communication graph, under the sole constraint of bidirectional links and permanent connectivity.
In the long version of our paper, we will relax the latter assumption and show that $(B \geq 1)$-bounded connectivity, where it is only each matrix product $A(t+B-1) \cdots A(t)$ that is assumed irreducible, only delays our convergence bound by a factor $B$. An open question is whether one can design a fully decentralized average consensus algorithm that does not break the convex hull of the estimates, or whether that is impossible.
We call a network class a set of dynamic graphs; given a class $C$, we denote by $C_{|n}$ the subclass $\{G \in C \mid |G| = n\}$. Here, our investigation will revolve around the class $B_c$ of dynamic graphs of the following sort.

▶ Assumption 1. In each round $t \in \mathbb{N}_{>0}$, the communication graph $G(t)$ is reflexive, bidirectional, and connected.

Algorithm 1. The MaxMetropolis algorithm, code for agent $i$.

Theorem 3. Let us fix a dynamic graph $G \in B_c$ with $n \geq 2$ vertices, and define […]

By definition, each sequence $\deg_i(t)$ is weakly increasing with $t$, and has $\deg_i$ for limit. Since $\deg_i(t) \leq n$, there are at most $\deg_i$ rounds with $\deg_i(t-1) < \deg_i(t)$. The set $K$ is therefore finite, with cardinal $\delta := |K| \leq \sum_i \deg_i$. We let $t^* := \max K + 1$; by construction, in all rounds $t \geq t^*$ we have $\deg_i(t) = \deg_i$. By an immediate induction, we see that, in any execution of the MaxMetropolis algorithm over the dynamic communication graph $G$, the sequence of estimate vectors satisfies the […] $\varpi^2 / 2^{2n}$ […] $\deg_i(t) = \deg_i(t-1)$ when $t \notin K$ […], where $\varpi := \prod_{i \in [n]} \deg_i$. From here, we let $\gamma := \inf_{t \notin K} \gamma_{A(t)}$, and we have

$$\prod_{\tau \leq t} \kappa(\tau) = \prod_{\tau \in [1,t] \cap K} \kappa(\tau) \prod_{\tau \in [1,t] \setminus K} \kappa(\tau) \leq \prod_{\tau \in K} \rho_{A(\tau)} \prod_{\tau \in [1,t] \setminus K} \left(1 - \gamma_{A(\tau)}\right).$$
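The counting argument on the set $K$ can be illustrated with a short sketch. It assumes that $\deg_i(t)$ is the running maximum of agent $i$'s degree up to round $t$, starting from the convention $\deg_i(0) = 2$, and that $K$ collects the rounds in which some $\deg_i(t)$ differs from $\deg_i(t-1)$; the function name `convexity_breaking_rounds` and the degree sequences are hypothetical.

```python
def convexity_breaking_rounds(degrees_per_round, initial=2):
    """Return (K, limits) for a finite prefix of a dynamic graph.

    degrees_per_round[t - 1][i] is the degree of agent i in round t,
    self-loop included. K lists the rounds where some running maximum
    increases; limits holds the running maxima after the last round.
    """
    n = len(degrees_per_round[0])
    running_max = [initial] * n          # convention deg_i(0) = 2
    K = []
    for t, degrees in enumerate(degrees_per_round, start=1):
        updated = [max(m, d) for m, d in zip(running_max, degrees)]
        if updated != running_max:       # some deg_i(t) > deg_i(t - 1)
            K.append(t)
        running_max = updated
    return K, running_max


# Hypothetical 4-agent execution: path, star centered at agent 0,
# path, complete graph, path (degrees count the self-loop).
rounds = [
    [2, 3, 3, 2],
    [4, 2, 2, 2],
    [2, 3, 3, 2],
    [4, 4, 4, 4],
    [2, 3, 3, 2],
]
K, limits = convexity_breaking_rounds(rounds)
t_star = max(K) + 1

print("K =", K)
print("limits =", limits)
print("t* =", t_star)
print("|K| <= sum of limits:", len(K) <= sum(limits))
```

On this prefix the sketch finds $K = \{1, 2, 4\}$ and $t^* = 5$, so $\delta = 3$, well within the bound $\sum_i \deg_i = 16$.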