Our proof is guided by linear programming and duality arguments. The starting point is the observation that choosing the best ﬂip parameters in the ﬂi...

0 downloads 1 Views 524KB Size

No documents

Loading...

arXiv:1810.12980v1 [cs.DS] 30 Oct 2018

Sitan Chen∗

Michelle Delcourt

†

Ankur Moitra‡

Luke Postle

Guillem Perarnau

§

¶

November 1, 2018

Abstract

A well-known conjecture in computer science and statistical physics is that Glauber dynamics on the set of k-colorings of a graph G on n vertices with maximum degree ∆ is rapidly mixing for k ≥ ∆ + 2. In FOCS 1999, Vigoda [Vig99] showed that the ﬂip dynamics (and therefore also Glauber dynamics) is rapidly mixing for any k > 11 6 ∆. It turns out that there is a natural barrier at 11 , below which there is no one-step coupling 6 that is contractive with respect to the Hamming metric, even for the ﬂip dynamics. We use linear programming and duality arguments to fully characterize the obstructions to going beyond 11 6 . These extremal conﬁgurations turn out to be quite brittle, and in this paper we use this to give two proofs that the Glauber dynamics is rapidly mixing for any k ≥ ( 11 6 − ǫ0 )∆ for some absolute constant ǫ0 > 0. This is the ﬁrst improvement to Vigoda’s result that holds for general graphs. Our ﬁrst approach analyzes a variable-length coupling in which these conﬁgurations break apart with high probability before the coupling terminates, and our other approach analyzes a one-step path coupling with a new metric that counts the extremal conﬁgurations. Additionally, our results extend to list coloring, a widely studied generalization of coloring, where the previously best known results required k > 2∆.

∗ EECS, MIT, Cambridge, Massachusetts. [email protected] This work was supported in part by NSF CAREER Award CCF-1453261 and NSF Large CCF-1565235. † School of Mathematics, University of Birmingham, Birmingham, UK. [email protected] Research supported by supported by EPSRC grant EP/P009913/1. ‡ Math & CSAIL, MIT, Cambridge, Massachusetts. [email protected] This work was supported in part by NSF CAREER Award CCF-1453261, NSF Large CCF-1565235, a David and Lucile Packard Fellowship, an Alfred P. Sloan Fellowship, and an ONR Young Investigator Award. § School of Mathematics, University of Birmingham, Birmingham, UK. [email protected] ¶ Combinatorics and Optimization Department, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada. [email protected] Partially supported by NSERC under Discovery Grant No. 2014-06162.

1 1.1

Introduction Background

Here we study the problem of sampling random proper colorings of a bounded degree graph. More precisely, let k be the number of colors and let ∆ be the maximum degree. A long-standing open question is to give an algorithm that works for any k ≥ ∆ + 2, when the space of proper colorings is ﬁrst connected. Despite a long line of intensive investigation [Jer95, SS97, DF03, DFHV13, Hay03, HV03, Mol04, HV05, FV06, FV07], the best known bounds are quite far from the conjecture. In fact, there is a natural Markov chain called the Glauber dynamics that is widely believed to work: in each step, choose a random node and recolor it with a random color not appearing among its neighbors. It is easy to see that its steady state distribution is uniform on all proper k-colorings, again provided that k ≥ ∆ + 2. It is even conjectured that on an n node graph, the mixing time is O(n log n) which would be tight [HS05]. We remark that rapidly mixing Markov chains for sampling random colorings immediately give a fully polynomial randomized approximation scheme (FPRAS) for counting the number of proper colorings. There is also interest in this question in combinatorics [BW02] and in statistical physics, where it corresponds to approximating the partition function of the zero temperature anti-ferromagnetic Potts model [Pot52]. Jerrum [Jer95] gave the ﬁrst signiﬁcant results and showed that when k > 2∆ the Glauber dynamics mixes in time O(n log n). The modern proof of this result is easier and proceeds through path coupling [BD97], whereby it is enough to couple the updates between two colorings σ and τ that diﬀer only at a single node v and show that the expected distance between them is strictly decreasing. Then Jerrum’s bound follows by comparing how often the distance between the colorings decreases (when v is selected and after the update has the same color in both) vs. how often it increases (when a neighbor of v is selected and recolored in one but not the other). This result is closely related to work in the statistical physics community by Salas and Sokal [SS97] on the Dobrushin uniqueness condition. In a breakthrough work, Vigoda [Vig99] gave the ﬁrst algorithm for sampling random colorings that crossed the natural barrier of 2∆. His approach was through a diﬀerent Markov chain that in addition to recoloring single nodes also swaps the colors in larger Kempe components (which are also called alternating components). His chain was a variant of the Wang-Swendsen-Koteck´ y (WSK) algorithm [WSK89]. The key insight is that the bottleneck in Jerrum’s approach — when the neighbors of v all have distinct colors — can be circumvented by ﬂipping larger components. More precisely, when a neighbor of v is recolored in one chain in a way that would have increased the distance, one can instead match it with the ﬂip of a Kempe component of size two in the other chain that keeps the distance the same. But now one needs to couple the ﬂips of larger Kempe components in some manner. Vigoda devised a coupling and a choice of ﬂip parameters that works for any k > 11 6 ∆. His Markov chain mixes in time O(n log n) and one can also connect it to Glauber dynamics and prove an O(n2 ) mixing time under the same conditions. This is still the best known bound for general graphs. Subsequently, there was a ﬂurry of work on getting better bounds for restricted families of graphs. Dyer and Frieze [DF03] considered graphs of maximum degree Ω(log n) and girth Ω(log log n) and proved that the Glauber dynamics mixes rapidly whenever k > α∆ where α is the solution to α = e1/α and numerically α = 1.763 · · · . Their approach was to show that under the uniform distribution on proper colorings, the number of colors missing from the neighborhood of v is roughly k(1 − k1 )∆ with high probability. Results like these were later termed local uniformity properties. They were improved in many directions in terms of reducing the degree and/or the girth requirements to be independent of n [Hay03, Mol04, HV05, FV06, LM06], culminating in two incomparable results.

1

Dyer et al. [DFHV13] showed that Glauber dynamics mixes rapidly whenever k > β∆ where β is the solution to (1 − e−1/β )2 + βe−1/β = 1 and numerically β = 1.489 · · · for girth g ≥ 6 and the degree ∆ being a suﬃciently large constant. Hayes and Vigoda [HV03] showed rapid mixing for any k > (1 + ǫ)∆ for any ǫ > 0 provided that the girth g ≥ 11 and the degree is logarithmic, using an intriguing non-Markovian coupling. ˇ showed that for triangle-free graphs, unless On the hardness side, Galanis et al. [GSV14] NP = RP, it is NP-hard to approximately sample k-colorings for d-regular graphs G when k < d, even when G is triangle-free. There have also been many other algorithmic improvements, but all for special graph families. Through an eigenvalue generalization of the √ Dobrushin condition, Hayes [Hay06] showed that Glauber dynamics mixes rapidly for k > ∆ + c ∆ on planar graphs and graphs of constant treewidth. Berger et al. [BKMP05] showed rapid mixing on graphs of logarithmic cutwidth, which was strengthened by Vardi [Var17] to graphs of logarithmic pathwidth. Some recent papers have ˇ studied settings such as bipartite or random graphs [DFFV06, MS10, EHSV18], where it is possible to mix with fewer colors than the maximum degree. Hayes et al. [HVV15] notably improved the abovementioned result of [Hay06] to show that Glauber dynamics in fact mixes rapidly for planar graphs when k = Ω(∆/ log ∆). [TVVY12] established that the mixing time of Glauber dynamics for sampling colorings of trees undergoes a phase transition at the reconstruction threshold (up to ﬁrst order). Given that there has been progress in so many restricted graph families, it is natural to wonder why there hasn’t been any further progress on the general case since Vigoda’s results.

1.2

Our Results

Our main result is the ﬁrst improvement on randomly sampling colorings on general bounded degree graphs since the FOCS 1999 paper of Vigoda [Vig99]. Speciﬁcally, we prove: Theorem 1.1. The flip dynamics for sampling k-colorings is rapidly mixing with mixing time O(n log n), for any k ≥ ( 11 6 − ǫ0 )∆ where ǫ0 > 0 is an absolute constant that is independent of ∆. As in Vigoda’s paper [Vig99], we obtain the following as implications of our main result1 : Theorem 1.2. The Glauber dynamics for sampling k-colorings is rapidly mixing with mixing time O(n2 ), for any k ≥ ( 11 6 − ǫ0 )∆ where ǫ0 > 0 is the same constant from Theorem 1.1. Theorem 1.3. The k-state zero temperature anti-ferromagnetic Potts model on Zd lies in the disordered phase when k ≥ ( 11 3 − 2ǫ0 )d where ǫ0 > 0 is the same constant from Theorem 1.1. Our proof is guided by linear programming and duality arguments. The starting point is the observation that choosing the best ﬂip parameters in the ﬂip dynamics (i.e. the probability of ﬂipping Kempe components of each possible size), when utilizing Vigoda’s greedy coupling [Vig99], can be cast as a linear program. In this manner, Vigoda’s analysis provides a feasible solution and it is natural to wonder if choosing the ﬂip parameters diﬀerently or ﬂipping larger size components 1

In order to prove rapid mixing of Glauber dynamics, one can use the comparison technique of Diaconis and SaloffCoste [DSC93]. Vigoda [Vig99] directly used the results in [DSC93] to show that the mixing time of the Glauber dynamics is O(n2 log n). It has been observed (see e.g. [FV07]) that τmix (ǫ) = O(n(log n + log ǫ−1 )) for flip dynamics implies mixing time O(n2 ) for Glauber dynamics. This can be shown using spectral bounds on the mixing time for ǫ = 1/n [Sin92] and observing that the spectral gaps of Glauber dynamics and flip dynamics are the same up to a constant factor. The same argument applies to our case, so Theorem 1.1 implies Theorem 1.2. Regarding Theorem 1.3, it is known (see e.g. Lemma 7 of [Vig99]) that it holds provided that under any boundary configuration, the flip dynamics mixes in time O(n log n) on QL the d-dimensional cube of Zd with side length 2L + 1. Theorem 1.4 below implies this is the case for k ≥ (11/6 − ǫ0 )∆ (note that the degree of QL is ∆ = 2d).

2

would lead to an improvement. As it turns out, the answer is no in a strong sense. Not only are Vigoda’s choice of parameters an optimal solution to the linear programming relaxation2 , but we use extremal configurations, which correspond to tight constraints in the linear program, to build a small family of hard instances showing that 11 6 is a natural barrier for a class of analyses. Speciﬁcally, for all n and inﬁnitely many ∆ we can exhibit two graphs G1 and G2 , together with a pair of proper colorings (σi , τi ) for each Gi that diﬀer at a single node, such that the following holds: for any choice of ﬂip parameters and one-step coupling of the ﬂip dynamics, (σ1 , τ1 ) and (σ2 , τ2 ) cannot both contract in Hamming distance under the coupling provided k < 11 6 ∆ (see Construction 3.1 and Lemma 3.3). At this juncture, there are two potential approaches for circumventing the 11 6 barrier: (1) Instead of using a one-step coupling, use a multi-step coupling, still with respect to the Hamming metric. (2) Use a one-step coupling but change the metric. As it turns out, both approaches work and we are able to give two separate proofs of Theorem 1.1. The present work is a merger of [CM18] and [DPP18], and we chose to present the results together because there are parallels between the two proofs.3 The main idea behind both approaches is that any extremal conﬁguration is quite brittle, and if there are few enough extremal conﬁgurations present, there is a choice of ﬂip parameters for which we can go below the 11/6 barrier. To get an idea for just how brittle these conﬁgurations are, suppose v is the unique node of disagreement between two colorings σ and τ . For every color c, there is an extremal conﬁguration which if present in σ, τ would require that in σ, the maximal connected induced subgraph colored only with c and σ(v) and containing v is a tree of size 7 and moreover that v has degree exactly 2 in this tree. There are many possible transitions in the ﬂip dynamics that would break this rigid pattern, for instance, ﬂipping the color of a descendant of v in the tree. In our ﬁrst proof of Theorem 1.1, we run a multi-step coupling which terminates when the Hamming distance between the two colorings changes. We argue that by the time the coupling terminates, extremal conﬁgurations like the one above are likely to break apart. In the above example, regardless of the choice of ﬂip parameters, at any point there is an Ω(1/n) chance of breaking it (Lemma 4.6) and a Θ(1/n) chance the coupling terminates (Lemma 4.2). What remains, and this is the most delicate part of the proof, is to upper bound the probability that conﬁgurations that are not extremal transition to ones that are, and the key point is that this is also O(1/n) (Lemma 4.8). These three bounds together imply that by the end of the coupling, the number of non-extremal conﬁgurations around v will be at least some constant times the number of extremal conﬁgurations in expectation. This is enough to show that for some tuning of the ﬂip dynamics, even if the expected change in Hamming distance after one step of coupling is still positive when k is slightly below 11 6 ∆, the expected change by the time the coupling terminates will be negative. In our second proof of Theorem 1.1, the main idea is essentially the same: we want to win on colorings like G1 and G2 at the expense of losing on the other possible types of colorings. However we accomplish this by changing the metric. More precisely, we choose d to be the Hamming metric dH minus a small correction factor dB that counts the number of extremal conﬁgurations around the node v of disagreement. We show that even when k is slightly less than 11 6 ∆, there is a choice of ﬂip parameters for which the following win-win analysis shows d contracts in expectation in one step. For pairs of colorings with few non-extremal conﬁgurations around v, dH will stay the same 2

Several approximations are made along the way in reaching this linear program, such as restricting to flipping components of size at most 6 and replacing certain infinite sets of constraints with a finite set of stronger constraints. 3 The ǫ0 obtained in Theorem 1.1 is slightly different under the two proofs, but both roughly on the order of 10−5 .

3

in expectation in one step (Corollary 5.1). When most of the conﬁgurations around v are extremal, we show that even though dH does not contract in expectation in one step, dB does (Corollary 5.2). It is here that our proofs bear a sharp resemblance: The proof that dB is contractive revolves around the exact same types of reasoning about lower bounding the probability that an extremal conﬁguration breaks apart, and upper bounding the probability that a non-extremal conﬁguration becomes extremal (Lemma 5.2). While we have chosen to present the technical pieces in both proofs separately because of the subtle diﬀerences in their structure and relevant deﬁnitions4 , we believe that presenting both of them and drawing analogies when possible gives a more complete picture about the diﬀerent ways to circumvent bottlenecks in one-step coupling with respect to the Hamming metric, and what the conceptual relationship is between these diﬀerent techniques. Lastly, we are able to extend our techniques to the problem of sampling list colorings, a natural and well-studied generalization of coloring. Jerrum’s proof for k > 2∆ carries over immediately for list-coloring, and while there have been subsequent works studying this problem in the case of triangle-free graphs when k > 1.763∆ [GMP05, GK07, GKM15], there had been no known improvement on 2∆ for general bounded degree graphs. Given that Vigoda’s algorithm for sampling random colorings with 11 6 ∆ colors has been known for many years, it is somewhat surprising that it (and our subsequent improvements) can be extended to the list-coloring setting in a fairly straightforward manner: Theorem 1.4. The Glauber dynamics for sampling k-list-colorings is rapidly mixing with mixing time O(n2 ), for any k ≥ ( 11 6 − ǫ0 )∆ where ǫ0 > 0 is the same constant from Theorem 1.1. The key step in our proof is to introduce a notion of ﬂip dynamics for list colorings where a Kempe component is ﬂipped only if both colors appear in all lists of vertices of the component; we call such components flippable. While such a chain seems natural in hindsight, we are not aware of it appearing anywhere in the literature. Roughly speaking, introducing this distinction between ﬂippable and un-ﬂippable Kempe components requires introducing additional constraints to Vigoda’s linear program. The point is that none of these extra conﬁgurations are extremal under the ﬂip parameters used to prove Theorem 1.1, so both proofs of Theorem 1.1 carry over to prove Theorem 1.4.

1.3

Further Discussion

Here we survey some previous uses of multi-step coupling, alternative metrics, and linear programming in approximate sampling. In terms of multi-step coupling for sampling colorings, the ﬁrst improvements to Vigoda’s bound for graphs of large degree and girth by Dyer and Frieze [DF03] used a burn-in method. Hayes and Vigoda [HV03] gave a sophisticated method based on looking into the future to prevent singly blocked updates in Jerrum’s maximal coupling. Non-Markovian couplings have also been used to get O(n log n) mixing time — rather than O(n2 ) — for Glauber dynamics with k = (2 − ǫ)∆ colors [DGG+ 01, HV07]. The crux of these last results is to terminate the coupling at a random stopping time corresponding to the ﬁrst point at which something interesting happens in the coupled evolution, e.g. the Hamming distance changing. This variable-length coupling approach is also one of the approaches we take. On the other hand, to our knowledge alternative metrics have found only one application in previous works on sampling colorings, namely in the analysis of the “scan” chain for sampling colorings of bipartite graphs in [BDK06]. That said, path coupling using alternative metrics has 4

See Remark 4.2 for a discussion of some of these differences.

4

found success in other problems in the approximate sampling literature [LV99, BD98, BDK06]. Bordewich et al. [BDK06] gave evidence that the multi-step stopping time-based approach can be captured by one-step coupling with an appropriate metric, though the metric they use to establish this connection is itself based on stopping times. In an orthogonal direction, another take on the question of designing metrics for one-step coupling is given in [HVV15] via the spectral radius of ˇ+ 16] via the Jacobian of the belief propagation operator. the adjacency matrix and in [EHS Lastly, we remark that before Vigoda’s work [Vig99], Bubley et al. [BDG98] used linear programming to show that Glauber dynamics is rapidly mixing with ﬁve colors on graphs with maximum degree three. Their approach however required solving several hundred linear programs, and was subject to “combinatorial explosion” as a function of the degree.

Organization In Section 2 we review the basics of path coupling, brieﬂy summarize Vigoda’s Markov chain and coupling analysis, and recall some basic notions in variable-length coupling. In Section 3, we interpret Vigoda’s one-step coupling analysis as implicitly solving a linear program and identify a family of worst-case neighborhoods which is tight at k > 11 6 ∆ for one-step coupling of the ﬂip dynamics with respect to the Hamming metric. In Section 4, we give a proof of Theorem 1.1 via variable-length coupling. In Section 5, we give a proof via one-step coupling with an alternative metric. In Section 6, we overview how to extend our techniques to prove Theorem 1.4. In Appendix A, we give a self-contained exposition of the details of Vigoda’s coupling analysis. In Appendix B, we provide the set of conﬁgurations in Vigoda’s analysis that are extremal under his choice of ﬂip parameters. In Appendix C and D, we supply proofs of technical lemmas that appeared in Sections 4 and 5 respectively. In Appendix E, we complete the proof of Theorem 1.4 that was sketched in Section 6.

2

Preliminaries

Let N0 denote the set of non-negative integers. In a graph G = (V, E), for vertex v ∈ V deﬁne N (v) to be the set of neighbors of v and ∆(v) to be the degree of vertex v. Given a (proper or improper) coloring σ : V → [k], deﬁne Aσ (v) to be the set of colors available to v, i.e. the set of colors c for which no neighbor of v is colored c. Given a Markov chain with transition probability matrix P on ﬁnite state space Ω and initial state σ (0) , denote the distribution of state σ (t) at time t by P t (σ (0) , ·). Denote the stationary distribution of an ergodic Markov chain by π, let ′

τmix (ǫ) := max min{t : dTV (P t (σ (0) , ·), π) ≤ ǫ ∀t′ ≥ t}, σ(0) ∈Ω

where dTV (·, ·) is the total variation distance, and deﬁne the mixing time τmix to be τmix (1/2e). The state spaces we will be interested in are Ω = [k]V the set of all colorings (or labellings) of G, and Ω∗ the set of proper colorings of G.

2.1

The Flip Dynamics

The Markov chain we use is a variant of the Wang-Swendsen-Koteck´ y (WSK) algorithm [WSK89] studied in [Vig99], which we deﬁne below. In a (proper or improper) coloring σ of a graph G, for vertex v and color c let the Kempe component Sσ (v, c) denote the set of vertices w for which there exists an alternating path between v and w using only the colors c and σ(v). Under this deﬁnition, Sσ (v, σ(v)) = ∅. The motivation for this deﬁnition is that if σ is proper, then if one flips Sσ (v, c), 5

i.e. changes the color of all σ(v)-colored vertices in Sσ (v, c) to c and that of all c-colored vertices in Sσ (v, c) to σ(v), the resulting coloring is still proper. Definition 2.1. Let p = {pα }α∈N0 be a collection of flip parameters. The flip dynamics is a random process generating a sequence of colorings σ (0) , σ (1) , σ (2) , . . . of G where σ (0) is an arbitrary coloring in Ω and σ (t) is generated from σ := σ (t−1) as follows: 1. Select a random vertex v (t) and a random color c(t) . 2. Let α = |Sσ (v (t) , c(t) )| and ﬂip Sσ (v (t) , c(t) ) with probability pα /α. The reason for the pα /α term is that we have a nice equivalent way of formulating the ﬂip dynamics. Let Sσ denote the family of all Kempe components in G under the coloring σ, i.e. all S ⊂ V for which there exist v, c such that S = Sσ (v, c). Here we emphasize that Sσ is a multiset.5 Then, σ (t) is generated from σ := σ (t−1) as follows: 1. Pick any component S (t) ∈ Sσ , each with probability 1/nk. 2. Let α = |S (t) | and ﬂip S (t) with probability pα . Because the ﬂip dynamics embeds Glauber dynamics, it is ergodic on the space of proper colorings for every k ≥ ∆ + 2. As every improper coloring has a positive probability of eventually being transformed into a proper one, the space of proper colorings is the only closed subset of the space of all colorings. The following holds as a consequence: Lemma 2.1 ([Vig99]). The stationary distribution of the flip dynamics is the uniform distribution over proper colorings of G. The WSK algorithm corresponds to the choice of pα = α for all α ∈ N0 . For the purposes of path coupling, it turns out one only needs to ﬂip Kempe components of size at most some absolute constant Nmax (in Vigoda’s chain, Nmax = 6), and this “local” nature of the ﬂip dynamics will simplify the analysis. Henceforth, we will take p0 = 0, p1 = 16 , and 0 ≤ pα+1 ≤ pα ≤ 1 for all α ≥ 1.

2.2

Path Coupling

Coupling is a common way to bound the mixing time of Markov chains. A T -step coupling for a Markov chain with transition matrix P and state space Ω deﬁnes for every initial (σ (0) , τ (0) ) ∈ Ω2 a stochastic process (σ (t) , τ (t) ) such that the distribution of σ (T ) (resp. τ (T ) ) is the same as P T (σ (0) , ·). (resp. P T (τ (0) , ·)). The coupling inequality states that for any starting point σ (0) for the Markov chain, (1) dTV (σ (t) , π) ≤ max P[σ (T ) 6= τ (T ) ]. τ (0)

We will think of T -step couplings as random functions Ω2 → Ω2 , so we will denote them by 7→ (σ (T ) , τ (T ) ), or more succinctly, (σ, τ ) 7→ (σ ′ , τ ′ ). For an initial pair of colorings σ, τ , we say that a coupling γ-contracts for (σ, τ ) for some γ ∈ (0, 1) and metric d on Ω if it satisﬁes (σ (0) , τ (0) )

E[d(σ ′ , τ ′ )] ≤ γ d(σ, τ ).

(2)

If there exist α > 0 and a coupling that (1 − α)-contracts for all (σ, τ ), then one can show that τmix = O(T log(D)/α), where D is the diameter of Ω under d. 5

For each color c not in the neighborhood of v, there exists a distinct component Sσ (v, c) = {v} in Sσ . Note that one must set p1 = 1 because otherwise, by rescaling all flip parameters by a factor of 1/p1 , the mixing time simply scales by a factor of 1/p1 . 6

6

For the rest of this subsection, we specialize our discussion of coupling to the setting of sampling colorings. Fix any Markov chain over Ω whose stationary distribution is the uniform distribution over Ω∗ , e.g. the Glauber or ﬂip dynamics, and denote it by σ 7→ σ ′ . In complicated state spaces like the space of all proper colorings of a graph, it is often tricky to construct couplings that give good bounds on mixing time. Path coupling, introduced in [BD97], is a useful tool for simplifying this process: rather than deﬁne (σ, τ ) 7→ (σ ′ , τ ′ ) for all (σ, τ ) ∈ Ω2 , it is enough to do so for a small subset of initial pairs in Ω2 . This subset is speciﬁed by a pre-metric. Definition 2.2. A pre-metric on Ω is a pair (Γ, ω) where Γ is a connected, undirected graph with vertex set Ω, and ω is a function that assigns positive, real-valued weights to edges στ of Γ such that for every edge στ , ω(στ ) is the minimum weight among all paths between σ and τ . We will often refer to a pair of adjacent colorings σ, τ in Γ as a neighboring coloring pair, denoted by (G, σ, τ ). Where the context is clear, we omit G and refer to neighboring coloring pairs as (σ, τ ). For any σ ˜ , τ˜ ∈ Ω, let Pσ˜ ,˜τ denote the set of simple paths φ = (φ0 , . . . , φs ) in Γ whereP φ0 = σ ˜ and φs = τ˜. The metric d induced by the pre-metric (Γ, ω) is deﬁned by d(˜ σ , τ˜) := minφ∈Pσ˜,˜τ si=1 ω(φi−1 φi ).

Example 2.1. If Γ is the graph with vertex set Ω and edges between colorings σ, τ which differ on exactly one vertex, and ω assigns weight 1 to all edges of Γ, then the metric induced by (Γ, ω) is simply the Hamming distance dH , i.e. the total number of vertices on which two colorings differ.

Theorem 2.1 ([BD97]). Let (Γ, ω) be a pre-metric on Ω where ω takes on values in (0, 1], and let d be the metric it induces. If the Markov chain σ 7→ σ ′ has a coupling (σ, τ ) 7→ (σ ′ , τ ′ ) defined for all neighboring coloring pairs (σ, τ ) that (1 − α)-contracts for some α > 0, then there exists a coupling defined for all pairs of colorings (σ, τ ) ∈ Ω2 which (1 − α)-contracts. Remark 2.1. The reason we need to extend the state space from Ω∗ to Ω is already apparent in the context of path coupling with the Hamming metric. Given two colorings σ, τ for which dH (σ, τ ) = ℓ, there does not necessarily exist a sequence of proper colorings σ = σ0 , σ1 , . . . , σℓ = τ of length ℓ for which (σi , σi+1 ) are neighboring coloring pairs for all 0 ≤ i < ℓ. However, there certainly exist such sequences if we allow the colorings to be improper.

2.3

Variable-Length Coupling

In this section we review the basics of variable-length coupling. This will only be relevant to Section 4 which gives the ﬁrst of our two proofs of Theorem 1.1. Jerrum’s k > 2∆ bound [Jer95] and Vigoda’s k > 11 6 ∆ bound [Vig99] can both be proved via one-step path couplings. Yet there is evidence that multi-step couplings can sometimes be stronger than one-step couplings. As shown in [KR01], there exist Markov chains for some sampling problems where one-step coupling analysis is insuﬃcient. [CKKL99] used multi-step coupling for a tighter analysis of a Markov chain for sampling random permutations, and the celebrated k ≥ (1 + ǫ)∆ result of [HV03] for Ω(log n)-degree graphs uses a multi-step coupling which is constructed by looking into future time steps. There are also several other works that carried out a multi-step coupling analysis by looking at one-step coupling over multiple time steps [DGG+ 01, DGM02, HV07] and obtained slight improvements over Jerrum’s k > 2∆ bound by terminating path coupling of the Glauber dynamics at a random stopping time. In the literature, this is known as variable-length coupling, and this is the approach we take in Section 4, but for the ﬂip dynamics. Definition 2.3. [Deﬁnition 1 in [HV07]] For every initial neighboring coloring pair (σ (0) , τ (0) ), let (σ, τ , Tstop ) be a random variable where Tstop is a nonnegative integer, and σ, τ are sequences 7

of colorings (σ (0) , . . . , σ (Tstop ) ) ∈ ΩTstop and (τ (0) , . . . , τ (Tstop ) ) ∈ ΩTstop respectively. We say that (σ, τ , Tstop ) is a variable-length path coupling if σ, τ are faithful copies of the Markov chain in the following sense. For (σ (0) , τ (0) ) and t ≥ 0, deﬁne the random variables σt , τt via the following experiment: 1) sample (σ, τ , Tstop ), 2) if t ≤ T , deﬁne σt = σ (t) , τt = τ (t) , 3) if t > Tstop , then sample σt and τt from P t−Tstop (σ (Tstop ) , ·) and P t−Tstop (τ (Tstop ) , ·) respectively. We say that σ (resp. τ ) is a faithful copy if for every neighboring coloring pair (σ (0) , τ (0) ) and t ≥ 0, σt and τt deﬁned above are distributed according to P t (σ (0) , ·) and P t (τ (0) , ·) respectively. When Tstop is always equal to some ﬁxed T , this is just the usual notion of T -step path coupling. The following extends the path coupling theorem of [BD97] to variable-length path coupling. Theorem 2.2 (Corollary 4 of [HV07]). Let ǫ > 0. For a variable-length path coupling (σ, τ , Tstop ), let α := 1 − max E[dH (σ (Tstop ) , τ (Tstop ) )], σ(0) ,τ (0)

W :=

max

σ(0) ,τ (0) ,t≤Tstop

dH (σ (t) , τ (t) ),

β := max E[Tstop ]. σ(0) ,τ (0)

If α > 0, then τmix (ǫ) ≤ 2 ⌈2βW/α⌉ · ⌈ln(n/ǫ)/α⌉ .

2.4

Vigoda’s Greedy Coupling

In Appendix A, we give a self-contained review of the one-step path coupling analysis from [Vig99]. We encourage readers unfamiliar with his analysis to simply read the entirety of Appendix A in place of this subsection, as here we only introduce relevant notation and state the parts of his analysis that are essential to our proofs. Fix a neighboring coloring pair (G, σ, τ ). For c ∈ [k], let Uc denote the set of neighbors of v (in either coloring) that are colored c, and let δc = |Uc |. We will sometimes denote the vertices in Uc by {uc1 , . . . , ucδc }; where c is clear from context, we will simply denote these by {u1 , . . . , uδc }. Note that the symmetric diﬀerence D = Sσ ∆Sτ is precisely the Kempe components Sσ (uci , τ (v)) and Sσ (v, c) in σ and the Kempe components Sτ (uci , σ(v)) and Sτ (v, c) in τ , for all colors c appearing in the neighborhood of v and all i ∈ [δc ]. All other Kempe components are shared between σ and τ , so for those, it is enough to use the identity coupling. Note that for colors c 6= σ(v), τ (v) not appearing in N (v), the identity coupling then matches the ﬂip of Sσ (v, c) to that of Sτ (v, c) so that the two colorings of G become identical. So the main concern is how to couple the ﬂips of components in D. It is easy to see that for c 6= σ(v), τ (v), ! ! δc δc [ [ Sσ (uci , τ (v)) ∪ {v}, (3) Sτ (uci , σ(v)) ∪ {v} Sτ (v, c) = Sσ (v, c) = i=1

i=1

Purely for simplicity of exposition, we will assume that the sets Sσ (uci , τ (v)) are distinct as i varies, and likewise for Sτ (uci , σ(v)), and we will only consider c 6= σ(v), τ (v), referring the reader respectively to Remarks A.1 and A.2 in Appendix A for the missing details. We remark that the extra cases of c = σ(v), τ (v) are the primary place where one needs to be careful about the fact that neighboring coloring pairs σ, τ need not be proper. For c such that δc > 0, deﬁne Ac := |Sσ (v, c)|, Bc := |Sτ (v, c)|, aci = |Sτ (uci , σ(v))|, and c bi = |Sσ (uci , τ (v))|. Deﬁne the vectors ac := (aci : i ∈ [δc ]) and bc := (bci : i ∈ [δc ]). We say that a neighboring coloring pair (G, σ, τ ) has configuration (Ac , Bc ; ac , bc ) of size δc for color c. Also deﬁne acmax = maxi aci and denote a maximizing i by icmax . Likewise deﬁne bcmax = maxj bcj 8

c . When the color c is clear from context, we refer to these as and denote a maximizing j by jmax A, B, ai , bi , a, b, amax , imax , bmax , jmax . When c 6= σ(v), τ (v), note that X X Ac = 1 + aci , Bc = 1 + bci . (4) i

i

While Sσ (v, c) and Sτ (v, c) can be quite diﬀerent, we do know that Sσ (v, c) ⊃ Sτ (ui , σ(v)). The idea of Vigoda’s coupling is thus to greedily couple the ﬂips of the largest components, i.e. Sσ (v, c), Sτ (v, c), to the ﬂips of the next largest components, i.e. Sτ (uimax , σ(v)), Sσ (ujmax , τ (v)), and then to couple together as closely as possible the ﬂips of Sσ (ui , τ (v)) and Sτ (ui , σ(v)) for each i ∈ [δc ]. Henceforth, we will refer to this coupling as the greedy coupling. For any conﬁguration (A, B; a, b), deﬁne X H(A, B; a, b) := (A − amax − 1)pA + (B − bmax − 1)pB + (5) ai qi + bi qi′ − min(qi , qi′ ), i

where qi = pai − pA · 1i=imax and qi′ = pbi − pB · 1i=jmax . In [Vig99] it is shown that under this greedy coupling, for c 6= σ(v), τ (v) appearing in the neighborhood of v, kn · E[1Xc · (dH (σ ′ , τ ′ ) − 1)] ≤ H(Ac , Bc ; ac , bc ),

(6)

where Xc is the event that the coupling ﬂips components in Dc in both colorings. Therefore: Lemma 2.2 ([Vig99]). Let (σ, τ ) 7→ (σ ′ , τ ′ ) be the greedy coupling. Then X 1 E[dH (σ ′ , τ ′ ) − 1] ≤ −|{c : δc = 0}| + H(Ac , Bc ; ac , bc ) . nk

(7)

c:δc 6=0

The function H implicitly depends on the choice of ﬂip parameters {pα }, while (Ac , Bc ; ac , bc ) depends on (G, σ, τ ). The remaining analysis in [Vig99] once (7) has been deduced essentially boils down to picking a good set of ﬂip parameters.

3

Linear Programming and Choice of Flip Parameters

The key idea to choose the ﬂip parameters is to cast the problem as an instance of linear programming with variables {pα }α∈N0 to minimize the right-hand side of (7) over all neighboring coloring pairs (G, σ, τ ) where G has maximum degree ∆ and σ, τ are k-colorings. The following gives terminology for quantifying over all such (G, σ, τ ). Definition 3.1. A conﬁguration (A, B; a, b) is realizable if there exists a neighboring coloring pair (G, σ, τ ) and color c such that (A, B; a, b) = (Ac , Bc ; ac , bc ). Vigoda’s remaining analysis can thus be interpreted as solving the following linear program. Linear Program 1. For variables {pα }α∈N0 and λ, minimize λ subject to: p0 = 0 ≤ pα ≤ pα−1 ≤ p1 = 1 for all α ≥ 2, and H(A, B; a, b) ≤ −1 + λ · m for all realizable (A, B; a, b) of size m. There are three minor issues with this linear program: (a) the linear program has an inﬁnite number of variables, (b) it has an inﬁnite number of constraints, and (c) given a, b, it is not immediately obvious how to enumerate all A, B for which (A, B; a, b) is realizable. 9

Vigoda handles (a) by restricting to ﬂips of components of size at most Nmax , i.e. by ﬁxing some small constant Nmax and insisting that pα = 0 ∀α > Nmax .

(8)

We emphasize that this still leaves an inﬁnite number of constraints as m can be unbounded. He handles (b) by shrinking the feasible region via the following observation. P Lemma 3.1 ([Vig99]). H(A, B; a, b) ≤ (A − 2)pA + (B − 2)pB + i (ai pai + bi pbi − min(pai , pbi )).

Whereas f (ui ) are linear functions of pai , pbi , pA , pB , the summands in the upper bound of Lemma 3.1 are simply linear functions of pai , pbi . So we can pick some m∗ ([Vig99] picks m∗ = 3) and replace the inﬁnitely many constraints for which m ≥ m∗ in Linear Program 1 with ﬁnitely many constraints to optimize the right-hand side of Lemma 3.1. Finally, Vigoda implicitly handles (c) as follows. To cover all constraints corresponding to realizable (Ac , Bc ; ac , bc ) with c 6= σ(v), τ (v) include H(A, B; a, b) ≤ −1 + λ · m,

(9)

for all 1 ≤ m < m∗ and (A, B; a, b) for which a, b ∈ {0, 1, . . . , Nmax }m \{(0, 0, . . . 0)} and A, B satisfy (4). As we will discuss in Appendix A.2, to cover all constraints corresponding to c = σ(v) and c = τ (v), it is enough to include (B − bm )pB +

m−1 X i=1

bi pbi ≤ −1 + λ · m,

for all 2 ≤ m < m∗ , 0 ≤ b1 ≤ · · · ≤ bm ≤ Nmax where bm > 0, and B = α · pα ≤ 1, for all α ∈ N0 .

(10) P

i bi ,

as well as (11)

Concretely, we have the following relaxation of Linear Program 1. Linear Program 2. Fix some Nmax ≥ 1 and m∗ ≥ 2. For variables {pα }α∈N0 and λ, and dummy variables x, y, minimize λ subject to the following constraints: p0 = 0 ≤ pα ≤ pα−1 ≤ p1 = 1 for all α ≥ 2, constraint (8), constraint (9) for all a, b ∈ {0, 1, . . . , Nmax }m \{(0, . . . , 0)} with 1 ≤ m < m∗ ∗ and A, B satisfying P (4), constraint (10) for all 2 ≤ m < m and 0 ≤ b1 ≤ · · · ≤ bm ≤ Nmax where bm > 0 and B = i bi , constraint (11), and constraints x ≥ (A − 2)pA

y ≥ a · pa + b · pb − min(pa , pb )

−1 + λ · m∗ ≥ 2x + m∗ · y

(12)

for every A, a, b satisfying 0 ≤ A ≤ 1 + Nmax and 0 ≤ a < b ≤ Nmax . Remark 3.1. Note that we only add in constraints for the upper bound of Lemma 3.1 in the case of m = m∗ (constraint (12)) because the constraints for m = m∗ implies the constraints for m > m∗ . Lemma 3.2. Let λ∗2 be the objective value of Linear Program 2. If λ∗2 ≥ 1 and k > λ∗2 d, then there exist flip parameters {pα }α∈N0 for which E[dH (σ ′ , τ ′ ) − 1] < 0 for all neighboring coloring pairs (G, σ, τ ), where (σ, τ ) 7→ (σ ′ , τ ′ ) is the greedy coupling. 10

Proof. Let {pα }α∈N0 be ﬂip parameters achieving objective value λ∗2 . By (7) and Lemma 3.1, we have that X nk · E[dH (σ ′ , τ ′ ) − 1] ≤ −|{c : δc = 0}| + H(Ac , Bc ; ac , bc ) ≤ −k + λ∗2 · d < 0. c:δc 6=0

In [Vig99], Vigoda shows that for m∗ = 3, Nmax = 6, and the following ﬂip parameters, Linear Program 2 attains a value λ∗2 = 11/6: p1 = 1, p2 = 13/42, p3 = 1/6, p4 = 2/21, p5 = 1/21, p6 = 1/84 and pα = 0 ∀α ≥ 7.

(13)

Not only is this assignment a feasible solution to Linear Program 2, but it happens to be an optimal solution of Linear Programs 1 and 2. We will be interested in which constraints are tight under such optimal solutions, as they will guide us to the conﬁgurations key to our proofs. Definition 3.2. Given a feasible solution p of Linear Program 1 with objective value λ, a conﬁguration (A, B; a, b) of size m is p-extremal (or simply extremal if the ﬂip parameters are clear from the context) if H(A, B; a, b) = −1 + λ · m under the assignation p. Vigoda’s proof in [Vig99] already implicitly gives a collection of six extremal conﬁguration under the assignment (13) (see Observation B.1 in Appendix B). It turns out that among these tight constraints, two of them are already enough to force the objective value of Linear Program 2 to be 11/6. Consider the following linear program obtained by restricting to constraints (9) associated with conﬁgurations (3, 2; (2), (1)) and (7, 3; (3, 3), (1, 1)), which are both realizable. Linear Program 3. For variables {pα }α∈N0 and λ, minimize λ subject to: p0 = 0 ≤ pα ≤ pα−1 ≤ p1 = 1 and p1 + p2 − 2p3 − min(p1 − p2 , p2 − p3 ) ≤ −1 + λ,

2p1 + 5p3 − min(p1 − p3 , p3 − p7 ) ≤ −1 + 2λ.

It is easy to check that Linear Program 3 also has objective value 11/6, and its constraints are a strict subset of those of Linear Programs 1 and 2, from which we conclude that Corollary 3.1. The objective values of Linear Program 1, Linear Program 2 with Nmax ≥ 6 and m∗ = 3, and Linear Program 3 are all equal to 11/6. Corollary 3.1 allows us to exhibit a family C of just two neighboring coloring pairs (G, σ, τ ) for which no one-step coupling, greedy or otherwise, simultaneously contracts with respect to the Hamming metric for k < (11/6)∆ for all (G, σ, τ ) ∈ C. To clarify, this lemma is not used in the proofs of our main result, but provides intuition for the limitations of one-step coupling with respect to the Hamming metric and motivates our two approaches for circumventing them. Construction 3.1. Let G1 be the tree of height 2 rooted at vertex v with ∆ path graphs, each consisting of 2 other vertices, attached to v. Let u1 , . . . , u∆ be the neighbors of v. In colorings σ1 , τ1 , assign two different colors to v, σ1 (v) and τ1 (v), assign each ui the color ci 6= σ1 (v), τ1 (v), and assign the child of ui the color σ1 (v). For ∆ even, let G2 be the tree of height two rooted at a vertex v with exactly ∆ children u1 , . . . , u∆ such that each ui has exactly two children w1i and w2i . In colorings σ2 , τ2 , assign two different colors to v, σ2 (v) and τ2 (v), assign u2j−1 and u2j the color cj 6= σ2 (v), τ2 (v) for j ∈ {1, . . . , ∆/2}, and assign all wℓi the color σ2 (v). Let C ∗ = {(G1 , σ1 , τ1 ), (G2 , σ2 , τ2 )} (see Figure 1). 11

σ1 (v)

σ2 (v)

σ2 (v)

σ2 (v) σ1 (v)

σ2 (v)

σ1 (v)

c2

c1

σ2 (v)

c8

c1

σ2 (v)

c1

c4

σ2 (v)

c3

σ1 (v)

c7

σ1 (v) τ1 (v)

c4

c5

σ1 (v)

σ2 (v)

c2

σ1 (v)

c4

σ2 (v) τ2 (v)

σ2 (v)

c6

σ2 (v)

c2 σ2 (v)

c3

c3

σ2 (v)

σ1 (v) σ2 (v) σ1 (v)

σ2 (v) σ2 (v)

(G1 , σ1 , τ1 )

σ2 (v)

(G2 , σ2 , τ2 )

Figure 1: Neighboring coloring pairs deﬁned in Construction 3.1 for ∆ = 8. Lemma 3.3. If k < (11/6)∆, there exists no choice of flip parameters {pα } and one-step coupling (σ, τ ) 7→ (σ ′ , τ ′ ) for which E[dH (σ ′ , τ ′ ) − 1] < 0 for all (G, σ, τ ) ∈ C ∗ , where C ∗ is defined in Construction 3.1. We defer the proof of this to Appendix B. Lemma 3.3 states that it is impossible to design a one-step coupling analysis of the ﬂip dynamics with the Hamming metric that crosses Vigoda’s 11/6 barrier for general graphs (or even for the family of two trees deﬁned by C ∗ ). There are two natural strategies to overcome this problem: use a multi-step coupling analysis with the Hamming metric, or use a one-step coupling with an alternative metric. In the remainder of the paper we present two independent proofs of Theorem 1.1. In Section 4, we present the multi-step coupling approach to the proof as devised by Chen and Moitra [CM18]. In Section 5, we use the alternative metric approach to the proof as devised by Delcourt, Perarnau and Postle [DPP18]. For the sake of clarity, we defer the proof of some technical lemmas to Appendices C and D.

4

Proof of Theorem 1.1 using Variable-Length Coupling

In this section, we prove Theorem 1.1 by arguing that a suitably deﬁned variable-length coupling contracts with respect to the Hamming metric dH . Recall from Deﬁnition 2.2 that under the premetric inducing the Hamming metric, (G, σ, τ ) is a neighboring coloring pair if σ, τ are colorings of G that diﬀer on exactly one vertex v.

4.1

Modifying the LP

For a color c, the condition that (Ac , Bc ; ac , bc ) for a neighboring coloring pair (G, σ, τ ) be extremal is a very stringent condition on (G, σ, τ ). The hope is that for a suitable notion of “typical,” this condition holds for few colors c for a “typical” neighboring coloring pair. Definition 4.1. Given a neighboring coloring pair (G, σ, τ ) and a color c appearing in the neighborhood of v, then the pair σ, τ is in the state

12

1. Singc if δc = 1 and c 6= σ(v), τ (v)7 2. Badc if (Ac , Bc ; ac , bc ) is either (7, 3; (3, 3), (1, 1)) or (3, 7; (1, 1), (3, 3)) 3. Goodc otherwise. Moreover, deﬁne Nsing (σ, τ ), Nbad (σ, τ ), and Ngood (σ, τ ) to be the number of c for which (G, σ, τ ) is in state Singc , Badc , Goodc respectively. Observation 4.1. Let (G, σ, τ ) be any neighboring coloring pair. For c = σ(v), τ (v), if δc > 0, then σ, τ are in state Goodc . Proof. If δc = 1, then by deﬁnition σ, τ are in state Goodc . If δc ≥ 2, then because Ac = 0 for c = σ(v), τ (v) by Remark A.2, σ, τ must be in state Goodc . Nsing (σ, τ ) can be large even for a “typical” neighboring coloring: consider any (G, σ, τ ) where the neighbors of v form a clique. Indeed, this example is the reason that all existing results on sampling colorings that proceeded [Vig99] needed to at least assume triangle-freeness of G, otherwise the local uniformity properties they leverage simply do not hold. Instead of avoiding state Singc , we want “typical” neighboring coloring pairs to avoid Badc for many c. Speciﬁcally, we want Nbad (σ, τ ) to be at most a constant times Ngood (σ, τ ). Remark 4.1. At this point the reader may be wondering: why can we get away with only analyzing what happens to the extremal configurations of size 2 and not those of size 1? The reason is that neighboring coloring pairs where v is surrounded by many configurations of size 1 are precisely the kinds of examples on which Vigoda’s analysis does particularly well: under Jerrum’s maximal coupling, it is impossible to go beneath k > 2∆ for any one-step coupling of the Glauber dynamics with respect to the Hamming metric, but under Vigoda’s greedy coupling of the flip dynamics, one can perfectly couple the flips of Kempe components in Dc for any c with δc = 1 if Nmax is big enough. The reason Vigoda’s analysis doesn’t get all the way down to k > (1 + ǫ)∆ is simply that if Nmax is too big and the flip parameters too tuned to configurations of size 1, one cannot closely couple flips corresponding to configurations of size at least 2. For this reason, what we really care about is actually the fraction of configurations of size at least 2 around v that are extremal. Consider the following thought experiment. Let C consist of all neighboring coloring pairs such that for every (G, σ, τ ) ∈ C, Nbad (σ, τ ) ≤ γ · Ngood (σ, τ ) (14)

for some absolute constant γ > 0. Suppose k = (11/6 − ǫ)∆ for some small absolute constant ǫ > 0, and our goal is just to pick ﬂip parameters so every pair in C contracts. Observation B.1 and complementary slackness intuitively suggest that this should be possible for small enough ǫ depending only on γ. To get an eﬀective estimate for ǫ, we encode (14) into Linear Program 2: Linear Program 4. Introduce into Linear Program 2 the additional variables λsing , λbad , λgood . In constraints (12) and (10), replace λ with λgood . In the constraints (9) corresponding to configuration (A, B; a, b), replace λ with λsing if m = 1, λbad if (A, B; a, b) is either (7, 3; (3, 3), (1, 1)) or (3, 7; (1, 1), (3, 3)), or λgood otherwise. Lastly, introduce the constraints λ ≥ λsing ,

λ ≥ λgood ,

λ≥

γ 1 · λbad + · λgood . γ+1 γ+1

7 For c 6= σ(v), τ (v), Singc corresponds to the extremal configuration of size 1, as well as all other configurations which satisfy δc = 1. We include these latter configurations just for simplicity of analysis; if we did not do this, it would yield additional improvements upon our main result. Also note that we exclude the cases of c = σ(v), τ (v) from Singc because in those cases, we have that Ac = 0 in which case (Ac , Bc ; ac , bc ) is not extremal.

13

Call this the γ-mixed coupling LP and denote its objective value by λ∗γ . The following is straightforward to prove; see Appendix C.1 for a formal proof. Lemma 4.1. If k > λ∗γ ∆, then under the greedy coupling (σ, τ ) 7→ (σ ′ , τ ′ ), E[dH (σ ′ , τ ′ ) − 1] < 0 for any (G, σ, τ ) ∈ C. Next, we go from the intuition of this thought experiment to a rigorous notion of “typical” neighboring coloring pairs avoiding the state Badc . Having already reduced ﬁnding a coupling for all of C to analyzing the γ-mixed coupling LP, in the sequel we will reduce ﬁnding a coupling for all neighboring coloring pairs to analyzing the γ-mixed coupling LP.

4.2

Variable-Length Coupling

The key idea is that regardless of what neighboring coloring pair (G, σ, τ ) one starts with, the probability that σ ′ , τ ′ derived from one step of greedy coupling has changed in distance is Θ(1/n) (see Lemma 4.2 below). So in expectation, one can run Θ(n) steps of greedy coupling before the two colorings either coalesce to the same coloring or have Hamming distance greater than 1, but by that time the set of colors around v will have changed substantially. This is the main insight of [DGG+ 01, HV07], who leverage it to analyze the Glauber dynamics and slightly improve upon Jerrum’s k ≥ 2d bound under extra girth and degree assumptions. We leverage this insight to analyze the ﬂip dynamics under no extra assumptions. Our variable-length coupling simply runs greedy coupling until the distance between the colorings changes: start with neighboring colorings σ (0) , τ (0) , initialize t = 1, and repeat the following. 1. Run the greedy one-step coupling of Section 2.4 to ﬂip components St in σ (t−1) and St′ in τ (t−1) , producing σ (t) , τ (t) (note that St or St′ might be empty, e.g with probability 1 − pα , a component of size α that is chosen to be ﬂipped is not actually ﬂipped). 2. If dH (σ (t) , τ (t) ) 6= dH (σ (t−1) , τ (t−1) ), terminate and deﬁne Tstop = t. Else, increment t. We call any subsequence of pairs of ﬂips (Si , Si′ ), . . . , (Sj , Sj′ ) a coupling schedule starting from the neighboring coloring pair (G, σ (i−1) , τ (i−1) ). It is easy to see that this satisﬁes the conditions of being a variable-length coupling as in Deﬁnition 2.3. Indeed it is the same coupling as in [HV07], except for the ﬂip dynamics instead of the Glauber dynamics. Note that we can characterize which pairs of ﬂips (St , St′ ) terminate the coupling: at least one of them must belong to the symmetric diﬀerence D deﬁned in Section 2.4. Definition 4.2. Given a neighboring coloring pair (G, σ, τ ), a pair of components S in σ and S ′ in τ is terminating if S = Sσ (v, c) or S ′ = Sτ (v, c), or there exists u ∈ N (v) for which S = Sσ (u, τ (v)) or S ′ = Sτ (u, σ(v)). Note that for any t, St and/or St′ may be the empty set. Moreover, because ﬂips of components outside of D are matched via the identity coupling, if (St , St′ ) is not terminating, then St = St′ . Lemma 4.2. Let components S in σ and S ′ in τ be chosen according to the greedy coupling. Then k−∆−2 k + 2p2 ∆ ≤ P[(S, S ′ ) terminating] ≤ , nk nk where p2 is the flip parameter for components of size 2.

14

Proof. For the lower bound, note that the pair (Sσ (v, c), Sτ (v, c)) is terminating for any c. In particular, for c 6= {σ(v), τ (v)} such that δc = 0, Sσ (v, c) = Sτ (v, c) = {v} — note that while the vertex sets for these components are all {v}, the ﬂips are all distinct as they vary with c. Each such pair of ﬂips has probability mass (1/nk) · p1 = (1/nk), and there are at least k − ∆ − 2 such choices of c, giving the lower bound. We defer the proof of the upper bound to Appendix C.2. Corollary 4.1. maxσ(0) ,τ (0) E[Tstop ] ≤

nk k−∆−2 .

We now give a reduction from analyzing the expected change in distance under our variablelength coupling to proving that the relation (14) from our thought experiment holds in expectation by the end of the coupling. Lemma 4.3. Suppose there exists a constant γ > 0 for which E[Nbad (σ (Tstop −1) , τ (Tstop −1) )] ≤ γ · E[Ngood (σ (Tstop −1) , τ (Tstop −1) )] for any initial neighboring coloring pair (G, σ (0) , τ (0) ). Let λ∗Cγ be the objective value of the Cγ-mixed coupling LP, where C := E[dH (σ (Tstop ) , τ (Tstop ) ) − 1] ≤

−k + λ∗Cγ ∆ k−∆−2

k+2p2 ∆ k−∆−2 .

(15) Then

.

This mainly just follows by linearity of expectation and the calculation done in the proof of Lemma 4.1. The only complication is that the probability that the coupling terminates at any given point is not ﬁxed, but this is ﬁne because it is still the same up to constant factors, which only leads to the loss of a factor of C as deﬁned in the lemma. We defer the full proof to Appendix C.3. Remark 4.2. (15) is one way to say that a typical coloring has few extremal configurations around v. This is slightly different from the analogous notion in the second proof of Theorem 1.1 in Section 5. There, the goal is to show that the number of extremal configurations around v of size 1 or 2 goes down in one step of Vigoda’s greedy coupling. In contrast, we only choose to upper bound the fraction of configurations of size at least 2 at the end of our variable-length coupling which are extremal. That we don’t attempt to analyze how extremal components of size 1 break apart is purely out of technical convenience, and as we discuss in Remark 4.1, still enough to break the 11 6 barrier.

4.3

Few Extremal Configurations When Coupling Terminates

We are left with proving the main technical lemma that (15) holds. Throughout this subsection, assume that k ≥ 1.833∆. Lemma 4.4. Suppose flip parameters {pα }α∈N0 satisfy p0 = 0 ≤ pα ≤ pα−1 ≤ p1 = 1 for all α ≥ 2, 2 ∆) constraint (11), and additionally αpα−2 ≤ 3 for all α ≥ 3. Then for γ := (6k−∆−2)(k+2p 4(k−∆−2)(k−∆−1) , we have that (15) holds for any initial neighboring coloring pair (G, σ (0) , τ (0) ).

Remark 4.3. The additional constraint that αpα−2 ≤ 3 already holds for the solutions to the γ-mixed coupling LP, so we assume it just to obtain better constant factors in our analysis.

15

We ﬁrst make a simple reduction. Fix any initial neighboring coloring pair (G, σ (0) , τ (0) ), and for every color c denote by pbad (c) and pgood (c) the probability that σ (Tstop ) , τ (Tstop ) is in state Badc and Goodc , respectively. By linearity of expectation we have that X X E[Nbad (σ (Tstop ) , τ (Tstop ) )] = pbad (c), E[Ngood (σ (Tstop ) , τ (Tstop ) )] = pgood (c). c

c

Therefore to show Lemma 4.4, it is enough to show the following. Lemma 4.5. pbad (c) ≤ γ · pgood (c) for every color c. This is certainly true for c = σ (0) (v), τ (0) (v), in which case pbad (c) = 0 and pgood (c) = 1 by Observation 4.1. We point out that while the case of c = σ (0) (v), τ (0) (v) is the one for which the fact that our state space includes all colorings, improper and proper, introduces complications in the deﬁnition of the greedy coupling (see Remark A.2), it happens to be the easiest case of Lemma 4.5. So henceforth assume c 6= σ (0) (v), τ (0) (v). We proceed via a fractional matching argument. Take any coupling schedule Σpre = (S1 , S1 ), (S2 , S2 ), . . . , (ST −1 , ST −1 ) consisting of pairs of identical ﬂips, and deﬁne W to be the set of all coupling schedules of the form (S1 , S1 ), (S2 , S2 ), . . . , (ST −1 , ST −1 ), (ST , ST′ ) for (ST , ST′ ) terminating. In other words, W consists of all T -step coupling schedules whose ﬁrst T − 1 steps are ﬁxed to Σpre and which only changes the distance between the colorings in the last step (ST , ST′ ). We will match to the collection of schedules W an (inﬁnite) collection of schedules of the following form. Definition 4.3. Fix S1 , . . . , ST −1 . A coupling schedule Σ∗ starting from the neighboring coloring pair (G, σ (0) , τ (0) ) is satisfying if it is of the form Σ∗ = (S1 , S1 ), . . . , (ST −1 , ST −1 ), (ST∗ , ST∗ ), . . . , (ST∗ ∗ −1 , ST∗ ∗ −1 ), (ST∗ ∗ , ST′∗∗ )

(16)

for (ST∗ ∗ , ST′∗∗ ) terminating, and gives rise to a sequence of colorings (T )

(T )

(T ∗ )

(σ (0) , τ (0) ), (σ (1) , τ (1) ), . . . , (σ (T −1) , τ (T −1) ), (σ∗ , τ∗ ), . . . , (σ∗

(T ∗ )

, τ∗

)

for which (T ∗ −1)

1. σ∗

(t)

(T ∗ −1)

, τ∗

are in state Goodc

(t)

2. σ∗ , τ∗ is not in state Badc for any T ≤ t < T ∗ . In Deﬁnition 4.3, Property 2) ensures that from any satisfying Σ∗ , we can uniquely decode the collection W to which it is being fractionally matched: in Σ∗ , take the last pair of colorings in state Badc , and Σpre is the subsequence of Σ∗ starting from (σ (1) , τ (1) ) and ending at that pair. If we can show for any Σpre that X

Σ∗ satisfying

P[(ST∗ , ST∗ ), . . . , (ST∗ ∗ −1 , ST∗ ∗ −1 ), (ST∗ ∗ , ST′∗∗ )|Σpre ] ≥

1 k + 2p2 ∆ · , γ nk

then this will imply that pbad (c) ≤ γ · pgood (c) because the upper bound of Lemma 4.2 tells us that 2∆ P[(ST , ST′ ) terminating|Σpre ] ≤ k+2p . nk

16

To exhibit such a collection of satisfying coupling schedules Σ∗ , we ﬁrst deﬁne a coarsening of the state space as follows. Starting from the neighboring coloring pair (G, σ (T −1) , τ (T −1) ) which is in state Badc , take any subsequent coupling schedule (ST∗ , ST∗ ), . . . , (ST∗ ∗ , ST′∗∗ )

(17)

with (ST∗ ∗ , ST′∗∗ ) terminating which gives rise to a sequence of pairs of colorings (T )

(T ∗ )

(T )

(σ∗ , τ∗ ), . . . , (σ∗ (t)

(T ∗ )

, τ∗

),

(18)

(t)

where (G, σ∗ , τ∗ ) is a neighboring coloring pair for all t except t = T ∗ . Deﬁne the following auxiliary states. To avoid confusion with the states in Deﬁnition 4.1, we will refer to the auxiliary states deﬁned below as stages. Definition 4.4. Let c be any color, not necessarily one appearing in the neighborhood of v. We (t−1) (t−1) (t) (t) is in state Goodc and the pair of ﬂips , τ∗ say that σ∗ , τ∗ is in stage GoodEndc if σ∗ (t−1) (t−1) (t) (t) is terminating. , τ∗ (S, S ′ ) giving rise to σ∗ , σ∗ from σ∗ (t) (t) We say σ∗ , τ∗ is in stage BadEndc if, intuitively, we choose to quit looking for satisfying (t) (t) (t) (t) (0) (0) coupling schedules among those of which (σ∗ , τ∗ ), . . . , (σ∗ , τ∗ ) is a preﬁx. Formally, σ∗ , τ∗ is in stage BadEndc if least one of the following conditions holds (note that these conditions aren’t necessarily mutually exclusive): (T )

(T )

(i) t = T and the pair of ﬂips (S, S ′ ) giving rise to σ∗ , σ∗ from the initial pair σ (T −1) , τ (T −1) is terminating (i.e. if (S1 , S1 ), . . . , (ST −1 , ST −1 ), (S, S ′ ) ∈ W). (t)

(t)

(ii) t = T and σ∗ , τ∗ is not in state Goodc . (t)

(t−1)

(t−1)

(t)

is in state Goodc but σ∗ , τ∗ is not in state Goodc or stage GoodEndc (this , τ∗ (iii) σ∗ (t) (t) includes the case that c does not appear in the neighborhood of v in σ∗ , τ∗ ). (t−1)

(iv) t > T and σ∗ (t)

(t−1)

, τ∗

is in stage BadEndc .

(t)

If σ∗ , τ∗ is not in stage BadEndc or GoodEndc and is in state Badc (resp. Goodc ), then we say it is also in stage Badc (resp. stage Goodc ). Note that if a sequence of the form (18) contains a pair of colorings in stage GoodEndc , that (T ∗ ) (T ∗ ) (T ∗ ) (T ∗ ) is in stage pair must be σ∗ , τ∗ . Furthermore, given any sequence (18) for which σ∗ , τ∗ GoodEndc with associated coupling schedule (17), note that the corresponding coupling schedule Σ∗ deﬁned in (16) is satisfying, by deﬁnition of stage BadEndc . So it is enough to show that if we start from a neighboring coloring pair (G, σ (T −1) , τ (T −1) ) which is in state Badc and evolve a sequence of pairs of colorings (18) according to the greedy coupling at each step, then (T ∗ )

P[σ∗

(T ∗ )

, τ∗

are in stage GoodEndc |σ (T −1) , τ (T −1) ] ≥

1 k + 2p2 ∆ · . γ nk

(19)

It remains to bound the probabilities of the transitions between the diﬀerent stages of Deﬁnition 4.4 under the ﬂip dynamics and the greedy coupling (see Figure 2 for a depiction of the transitions that can occur). A key point is that these bounds will be independent of the speciﬁc colorings or structure of G. 17

Badc

Goodc

BadEndc

GoodEndc

Figure 2: Possible transitions among stages of Deﬁnition 4.4 (T ∗ )

(T ∗ )

(t)

(t)

to be in stage GoodEndc , σ∗ , τ∗ cannot be in stage Badc for any t ≥ T . In For σ∗ , τ∗ (T −1) (T −1) is in state Badc , the pair of colorings must escape from Badc , τ∗ other words, because σ∗ in the very ﬁrst step of (17) and never return. The following says that this probability of escape is comparable to the total probability mass of W. We defer its proof to Appendix C.4.

Lemma 4.6. Let σ, τ be any neighboring coloring pair in state Badc , and let σ ′ , τ ′ be derived from . one step of greedy coupling. Then P [σ ′ , τ ′ in state Goodc ] ≥ 4(k−∆−1) nk

Once a pair of colorings has escaped from state Badc into state Goodc , at every step it can only stay at Goodc , end at stage GoodEndc , or get absorbed into stage BadEndc . The next two lemmas say that the last two events have probability Ω(1/n) and O(1/n) respectively. Lemma 4.7. Let σ, τ be any neighboring coloring pair in stage Goodc , and let σ ′ , τ ′ be derived from one step of greedy coupling. Then P [σ ′ , τ ′ in stage GoodEndc ] ≥ k−∆−2 nk .

Lemma 4.8. Let σ, τ be any neighboring coloring pair in stage Goodc , and let σ ′ , τ ′ be derived from one step of greedy coupling. Then P [σ ′ , τ ′ in stage BadEndc ] ≤ n5 . Lemma 4.7 follows immediately from Lemma 4.2. Lemma 4.8 is the most technically involved step, and we defer its proof to Appendix C.4. We can now complete the proofs of Lemma 4.4 and Theorem 1.1. Proof of Lemma 4.4. Starting from σ (T −1) , τ (T −1) in stage Badc , by Lemma 4.6, the probability of transitioning to stage Goodc in the very next step is at least 4(k−∆−1) . As shown in Figure 2, once nk nk 5k = k−∆−2 times as we leave stage Badc we never return. From stage Goodc , it is at most n5 · k−∆−2 likely to eventually end up at stage BadEndc as it is to end up at stage GoodEndc , by Lemmas 4.7 k−∆−2 and Lemma 4.8. So the probability of ending in stage GoodEndc is at least 5k+(k−∆−2) , · 4(k−∆−1) nk and we conclude that (19) and consequently Lemma 4.4 hold for γ as deﬁned. Proof of Theorem 1.1. Note that for k > 1.833∆ and p2 < 0.3, γ =

(6k−∆−2)(k+2p2 ∆) 4(k−∆−2)(k−∆−1)

< 7.683410,

k+2p2 ∆ k−∆−2

< 2.920764, so Cγ < 25.597784 as deﬁned in Lemma 4.3. Thus, substituting while C = 25.597784 into the γ parameter for Linear Program 4 and solving numerically8 , we ﬁnd that for pˆ1 = 1, pˆ2 ≈ 0.296706, pˆ3 ≈ 0.166762, pˆ4 ≈ 0.101790, pˆ5 ≈ 0.058475, pˆ6 ≈ 0.025989, pˆα = 0 ∀α ≥ 7, (20) Linear Program 4 attains value λ∗ < 1.833239. So provided k ≥ 1.833239∆, Lemma 4.3 implies k−λ∗ ∆ . that the variable-length coupling is (1 − α)-contractive for absolute constant α := k−∆−2 For k ≥ 1.833239∆, Corollary 4.1 implies that β in the deﬁnition of Theorem 2.2 is at most nk k−∆−2 ≤ 2.21n, so applying Theorem 2.2 with β = 2.21n, and W = 2Nmax + 1 = 13 gives that τmix (ǫ) = O(n log(n/ǫ)). In particular, τmix = O(n log n). 8

Code for solving Linear Program 4 can be found at https://github.com/sitanc/mixedlp.

18

5

Proof of Theorem 1.1 Using an Alternative Metric

This section contains the proof of Theorem 1.1 using an alternative metric as presented in [DPP18]. Lemma 3.3 shows that a one-step coupling analysis using the Hamming metric cannot yield any improvement over 11/6. The ﬁrst step in our proof is to ﬁnd a set of ﬂip parameters that only has two extremal conﬁgurations, up to symmetry, namely the ones used to deﬁne Linear Program 3. We then introduce a new metric that depends on the number of colors in non-extremal conﬁgurations. Analyzing the one-step coupling deﬁned in Appendix A (greedy coupling) with this metric, we obtain a constant improvement over the 11/6 bound.

5.1

Choice of Flip Parameters and Expected Change in Hamming metric

By Corollary 3.1, the objective value of Linear Program 3 is 11/6. It is straightforward to check that λ = 11/6 only if the solution satisﬁes p3 = 1/6 and pα = 0 for all α ≥ 7. Since for any such assignment, the constraints corresponding to (3, 2; (1), (1)) and (7, 3; (3, 3), (1, 1)) are tight, we introduce the following variant of Linear Program 1: Linear Program 5. For variables {pα }α∈N0 and λ, minimize λ subject to the following constraints: p0 = 0 ≤ pα ≤ pα−1 ≤ p1 = 1 for all α ≥ 2, p3 = 1/6, pα = 0 for α ≥ 7 and for all realizable configurations (A, B; a, b) of size m different from (3, 2; (2), (1)), (2, 3; (1), (2)), (7, 3; (3, 3), (1, 1)) and (3, 7; (1, 1), (3, 3)), define a constraint H(A, B; a, b) ≤ −1 + λm. Consider the following reduced linear program with a ﬁnite set of variables and constraints. Linear Program 6. For variables {p1 , . . . , p6 } and λ, minimize λ subject to the following constraints: 0 ≤ pα ≤ pα−1 ≤ p1 = 1 for all α ≥ 2, p3 = 1/6, constraints i(pi − pi+1 ) + j(pj − pj+1 ) − min{pi − pi+1 , pj − pj+1 } ≤ −1 + λ,

(21)

for i ∈ {1, . . . , 6}, j ∈ {2, . . . , 6} with (i, j) 6= (1, 2), and

2p1 + 3p2 − min{p2 − p5 , p3 − p1 } ≤ −1 + 2λ.

One can solve the program using a computer. ˆ = 161 = 1.8295 . . . and an optimal Observation 5.1. The Linear Program 6 has objective value λ 88 solution is given by 1 47 9 2 185 , pˆ3 = , pˆ4 = , pˆ5 = , pˆ6 = . pˆ1 = 1, pˆ2 = 616 6 462 154 77 Lemma 5.1. The assignment p ˆ = {ˆ pα }α∈N0 where pˆα is given by Observation 5.1 for α ∈ [6] and ˆ pˆα = 0 otherwise, forms a feasible solution of Linear Program 5 with objective value λ. We defer the proof of the lemma to Appendix D.1. Observation 5.2. Consider the solution p ˆ of Linear Program 5 given in Observation 5.1. The constraints in Linear Program 1 that are not contained in Linear Program 5 are implied by the conditions p1 = 1, p3 = 1/6 and p7 = 0. Thus, p ˆ is a feasible solution of Linear Program 1 with objective value 11/6. Since the objective value of p ˆ in Linear Program 5 is strictly smaller than 11/6, up to symmetry, there are only two p ˆ-extremal configurations: (3, 2; (1), (1)) and (7, 3; (3, 3), (1, 1)). We conclude that H(A, B; a, b) ≤

(

11 6

ˆ= λ

161 88

for every p ˆ -extremal conﬁguration (A, B; a, b), otherwise. 19

(22)

5.2

Definition of the Alternative Metric

In this section we introduce the alternative metric we will use for the analysis of the one-step coupling, deﬁned using a pre-metric (Γ, ω). Let Γ be the graph with vertex set Ω where two colorings are adjacent if and only if they diﬀer at exactly one vertex. We proceed to deﬁne ω. Fix the assignation of ﬂip parameters p ˆ given in Observation 5.1. Our deﬁnition of the premetric is driven by the fact that extremal conﬁgurations are an obstacle to show contraction of the metric when k < 11∆/6. Deﬁne the following sets of colors, 1 Cσ,τ (v) := {c : (Ac , Bc , ac , bc ) is an extremal conﬁguration for (σ, τ ) of size 1} ,

2 Cσ,τ (v) := {c : (Ac , Bc , ac , bc ) is an extremal conﬁguration for (σ, τ ) of size 2} .

1 (v) ∪ C 2 (v) be the set of colors c such that (σ, τ ) has an extremal conﬁgLet Cσ,τ (v) = Cσ,τ σ,τ 2 (v), there are two neighbors of v with color c. uration for c. Note that for each color c ∈ Cσ,τ 1 (v)| + 2|C 2 (v)|)/∆, that is, the number of neighbors of v that participate in Let γσ,τ (v) = (|Cσ,τ σ,τ extremal conﬁgurations of (σ, τ ) normalised by a factor ∆; thus γσ,τ (v) ≤ 1. Let η ∈ 0, 12 be a suﬃciently small constant to be ﬁxed later. The weight function that we will use for our pre-metric is deﬁned as

ω(σ, τ ) := 1 − η(1 − γσ,τ (v)) .

(23)

dB (˜ σ , τ˜) := dH (˜ σ , τ˜) − d(˜ σ , τ˜) .

(24)

Note that ω(σ, τ ) ∈ [1 − η, 1]. Since η < 12 and γσ,τ (v) ≤ 1, every path containing at least two edges has weight greater than one. So every edge is a minimum weight path, implying that (Γ, ω) is a pre-metric. Let d be the metric on Ω obtained from (Γ, ω) using minimum weight paths in Γ. By Remark 2.1, for every (˜ σ , τ˜) ∈ Ω2 there exists a path between σ ˜ and τ˜ in Γ with dH (˜ σ , τ˜) edges in which every edge has weight at most 1. It follows that d(˜ σ , τ˜) ≤ dH (˜ σ , τ˜). Deﬁne

In general, dB is not a metric, here we will only use that it is non-negative. The contribution of dB will be crucial for the constant improvement over 11 6 in this approach. Given the greedy coupling (σ, τ ) → (σ ′ , τ ′ ) for neighboring coloring pairs (σ, τ ), deﬁne ∇(σ, τ ) := nk E d(σ ′ , τ ′ ) − d(σ, τ ) . (25)

Deﬁne the rescaled contributions to the expected change of dH and dB as ∇H (σ, τ ) := nk E dH (σ ′ , τ ′ ) − 1 , ∇B (σ, τ ) := −nk E dB (σ ′ , τ ′ ) − dB (σ, τ ) ,

and note that ∇(σ, τ ) = ∇H (σ, τ ) + ∇B (σ, τ ). We ﬁrst bound ∇H . Recall that Xc is the event that the coupling ﬂips Kempe components in Dc in both chains. Denote by X the complement of the event ∪c:δc >0 Xc . Deﬁne ∇H (σ, τ, c) := nk E 1Xc · (dH (σ ′ , τ ′ ) − 1) . (26) Note that E 1X · (dH (σ ′ , τ ′ ) − 1) = 0 as we use the identity coupling if X holds, so σ ′ and τ ′ only diﬀer at v. Using (6), Lemma 2.2, and (22), we obtain a bound on the expected change of the Hamming part of the metric in terms of the number on non-extremal conﬁgurations. Corollary 5.1. Let δ =

11 6

−

161 88 .

For every neighboring coloring pair (σ, τ ), we have 11 ∇H (σ, τ ) ≤ − δ (1 − γσ,τ (v)) ∆ − k . 6 20

5.3

Contribution of the Extremal Part of the Metric

In this section we bound ∇B (σ, τ ) from above. Deﬁne the contributions

∇B (σ, τ, c) := −nk E[1Xc · (dB (σ ′ , τ ′ ) − dB (σ, τ ))] , ∇B (σ, τ ) := −nk E[1X · (dB (σ ′ , τ ′ ) − dB (σ, τ ))] .

By equations (23) and (24), since dH (σ, τ ) = 1, we have dB (σ, τ ) = 1 − ω(σ, τ ) = η(1 − γσ,τ (v)). Moreover, dB (σ ′ , τ ′ ) ≥ 0. By the properties of the greedy coupling, ∇B (σ, τ, c) ≤ η(1 − γσ,τ (v))nkP[Xc = 1] ≤ 2η(1 − γσ,τ (v))(δc + 1) , where we have used that the probability of ﬂipping a given Kempe component is at most 1/nk. We can bound the expected change of ∇B as follows X ∇B (σ, τ ) = ∇B (σ, τ ) + ∇B (σ, τ, c) ≤ ∇B (σ, τ ) + 2η(k + ∆)(1 − γσ,τ (v)) . (27) c∈[k]

Let D = (Sσ ∪ Sτ ) \ D denote the set of Kempe components of σ and τ that do not involve vertex v. Note that each component in D is a Kempe component of both σ and τ . An important diﬀerence here as opposed to the analysis of the contribution of ∇H , is that the components in D have an eﬀect on the expected change of ∇B . For a coloring σ and S ∈ Sσ , let σS denote the coloring obtained by ﬂipping S in σ. ForPS ∈ D, since dH (σS , τS ) = 1, we have dB (σS , τS ) = η(1 − γσS ,τS (v)). It follows that, ∇B (σ, τ ) = η S∈D p|S| (γσS ,τS (v) − γσ,τ (v)). For each c ∈ [k] and i ∈ {1, 2} and S ∈ D, let i (v) and c ∈ −i if c ∈ Cσ,τ / CσS ,τS (v), if c ∈ / Cσ,τ (v) and c ∈ Cσi S ,τS (v), i 2 (v) and c ∈ C 1 ξσ,τ (v, c, S) := −1 if c ∈ Cσ,τ σS ,τS (v), 1 if c ∈ Cσ,τ (v) and c ∈ Cσ2S ,τS (v), 1 0 otherwise.

The variable ξσ,τ (v, c, S) can be understood as the contribution of color c to γσS ,τS (v) − γσ,τ (v). For every S ′ ⊆ D, we deﬁne η X p|S|ξσ,τ (v, c, S) , ∇B (σ, τ, c, S ′ ) := ∆ S∈S ′ P and note that ∇B (σ, τ ) = c∈[k] ∇B (σ, τ, c, D) . Next lemma bounds from above the contribution ∇B (σ, τ, c, D) for each c. This is the most technical part of our approach and we defer its proof to Appendix D.2. Lemma 5.2. For every neighboring coloring pair (σ, τ ) and color c, we have: i (v), then ∇ (σ, τ, c, D) ≤ −iη k − 3 ; i) For i ∈ {1, 2}, if c ∈ Cσ,τ B ∆ 2 ii) If c ∈ / Cσ,τ (v), then ∇B (σ, τ, c, D) ≤ 2η 9 + 15k ∆ . The following bound on ∇B follows directly from (27) and Lemma 5.2.

Corollary 5.2. For every neighboring coloring pair (σ, τ ), we have k 3 16k ∇B (σ, τ ) ≤ −η − γσ,τ (v) + 2η 10 + (1 − γσ,τ (v)) . ∆ 2 ∆ 21

5.4

Contraction of the Metric and Proof of Theorems 1.1 and 1.2

We now can show that the metric d contracts in expectation. Theorem 5.1. For the ˆ given in Observation 5.1 there exists ǫ0 , µ > 0 such that flip parameters p − ǫ for every k ≥ 11 ∆ and every neighboring coloring pair (σ, τ ), the greedy coupling satisfies 0 6 ∇(σ, τ ) ≤ −µk .

161 1 δ∆ 1 11 δ Proof. Recall that δ = 11 6 − 88 = 264 , and set η = 53k . Fix ǫ0 = 84000 and note that 6 − 318 ∆ ≤ k ≥ 95 . Using (25) and Corollaries 5.1 and 5.2, it k − µk for some small constant µ > 0. Note that ∆ follows that 11 16k 3 k ∇(σ, τ ) ≤ − δ − 2η 10 + − (1 − γσ,τ (v)) − η γσ,τ (v) ∆ − k 6 ∆ ∆ 2 11 δ ≤ − ∆ − k ≤ −µk . 6 318 We now proceed with the proof of our main result. Proof of Theorem 1.1. Consider the metric d on Ω deﬁned in Section 5.2. By Theorem 5.1, for the ﬂip probabilities p ˆ there exist ǫ0 , µ > 0 such that if k ≥ 11 6 − ǫ0 ∆, then the greedy coupling ′ ′ (σ, τ ) → (σ , τ ) deﬁned on neighbouring coloring pairs (σ, τ ) satisﬁes µ µ d(σ, τ ) . (28) E d(σ ′ , τ ′ ) ≤ d(σ, τ ) − ≤ 1 − n n By Theorem 2.1 with α = µ/n, we can extend the coupling over all (σ, τ ) ∈ Ω2 so (28) is still satisﬁed. As η < 1/2, for σ 6= τ one has d(σ, τ ) ∈ (1/2, n]. We use the coupling bound in (1) together with Markov’s inequality, to obtain for σ (0) ∈ Ω dTV (P t (σ (0) , ·), π) ≤ max P[σ (t) 6= τ (t) ] = max P[d(σ (t) , τ (t) ) ≥ 1/2] ≤ max 2 E[d(σ (t) , τ (t) )] τ (0) ∈Ω

τ (0) ∈Ω

τ (0) ∈Ω

≤ 2(1 − µ/n)t n . It follows that τmix (ǫ) ≤ Cn(log n + log ǫ−1 ), for some absolute constant C > 0.

6

List Colorings

In this section we introduce the notation for list-colorings and give an overview of the proof of Theorem 1.4, deferring the details to Appendix E. A list assignment of G is a function L : V (G) → 2N . An L-coloring is a function σ : V (G) → N such that σ(u) ∈ L(u) for all u ∈ V (G). Usually in the literature list-colorings are assumed to be proper, here we will not require this but distinguish between proper and not necessarily proper list-colorings. We denote by ΩL the set of all L-colorings of G. If |L(u)| = k for all u ∈ V (G), then we say that L is a k-list-assignment and that an L-coloring is a k-list-coloring. The Glauber dynamics for L-colorings is a discrete-time Markov chain (σ (t) ) with state space ΩL where σ (t) is generated from σ := σ (t−1) as follows: 1. Choose v (t) uniformly at random from V (G). 2. For all vertices v 6= v (t) , let σ (t) (v) = σ(v). 22

3. Choose c(t) uniformly at random from L(v (t) ), if c does not appear among the colors in the neighborhood of v (t) then let σ (t) (v (t) ) = c, otherwise let σ (t) (v (t) ) = σ(v (t) ). Although we deﬁne the chain over ΩL , σ (t) will converge to the uniform distribution on proper L-colorings, as in the case of non-list-colorings. Given σ ∈ ΩL , one can deﬁne Kempe components of σ as for colorings and we denote by SσL the multiset of Kempe components S = Sσ (u, c) with u ∈ V (G) and c ∈ L(u). Let σS be the coloring obtained from σ by swapping the colors in S and note that σS is not necessarily an L-coloring as the new color of a vertex might not be in its list. Given S = Sσ (v, c) ∈ SσL , we say that S is flippable if for every u ∈ S we have {σ(v), c} ⊆ L(u). If S is ﬂippable, then σS ∈ ΩL . Let p = {pα }α∈N0 be a collection of flip parameters. The flip dynamics for L-colorings is a random process generating a sequence of colorings σ (0) , σ (1) , σ (2) , · · · of G where σ (0) is an arbitrary coloring in ΩL and σ (t) is generated from σ := σ (t−1) as follows: 1. Choose v (t) uniformly at random from V (G). 2. Choose c(t) uniformly at random from L(v (t) ). 3. Let S = Sσ (v (t) , c(t) ) and α = |S|. If S is ﬂippable, with probability pα /α let σ (t) = σS , otherwise let σ (t) = σ. We prove the analogue of Theorem 1.1 for list-colorings. Theorem 6.1. The flip dynamics for sampling k-list-colorings is rapidly mixing with mixing time O(n log n), for any k ≥ ( 11 6 − ǫ0 )∆ where ǫ0 > 0 is the same constant from Theorem 1.1. The proof of this theorem follows the same strategy as the proof of Theorem 1.1. The main diﬀerence in the analysis of the ﬂip dynamics for list-coloring is that some of the moves that are valid in the non-list case, are not legal here. An important observation is that (4) no longer holds for list-colorings. For instance, it might be the case that Ac = 0 since Sσ (v, c) is not ﬂippable, while some of the aci 6= 0. This leads to a weaker deﬁnition of realizable conﬁguration and produces a linear program whose set of contraints is a superset of the constraints in Linear Program 1. An analysis similar to the one given for c ∈ {σ(v), τ (v)} implies that the new constraints have positive slack for the optimal solutions of the linear program we use, so no new extremal conﬁguration arises. Thus, the analysis of the expected change of the one-step greedy coupling with respect to the Hamming metric is the same as for non-list-colorings in Section 3. The other key observation common to extending both approaches of this work to list colorings is that if S = Sσ (v (t) , c(t) ) has size 1, then S is always ﬂippable since c(t) ∈ L(v (t) ). This is enough to show that the same ﬂip parameters used to prove Theorem 1.1 in either approach also work in the list coloring setting. In Section 4, only size 1 ﬂips are used to lower bound the probabilities of breaking apart extremal conﬁgurations and terminating the coupling, so Lemma 4.6, Lemma 4.7, and the lower bound in Lemma 4.2 still hold. It is also obvious that upper bounds on these events (Lemma 4.8 and the upper bound in Lemma 4.2) still hold, so Section 4 carries over to the list setting. ˆ given by ObservaIn Section 5, the list version of Corollary 5.1 holds for the ﬂip parameters p tion 5.1. As components of size one are the ones giving the negative contribution in Lemma 5.2, the list version of Corollary 5.2 also holds. These two corollaries imply Theorem 6.1 in a similar way as in Section 5.4. We provide a full proof of Theorem 6.1 in Appendix E using the approach of Section 5. This implies Theorem 1.4 in the same way that Theorem 1.1 implies Theorem 1.2.

23

7

Acknowledgments

The authors would like to thank Eric Vigoda for his helpful suggestion regarding the mixing time in Theorem 1.2, as well as the anonymous reviewers for their feedback.

References [BD97]

Russ Bubley and Martin Dyer. Path coupling: A technique for proving rapid mixing in Markov chains. In Proceedings of the 38th Annual Symposium on Foundations of Computer Science (FOCS), 1997., pages 223–231. IEEE, 1997.

[BD98]

Russ Bubley and Martin E Dyer. Faster random generation of linear extensions. In SODA, pages 350–354, 1998.

[BDG98]

Russ Bubley, Martin Dyer, and Catherine Greenhill. Beating the 2∆ bound for approximately counting colourings: a computer-assisted proof of rapid mixing. In 9th Annual Symposium on Discrete Algorithms, ACM–SIAM, New York–Philadelphia, pages 355– 363, 1998.

[BDK06]

Magnus Bordewich, Martin Dyer, and Marek Karpinski. Stopping times, metrics and approximate counting. In International Colloquium on Automata, Languages, and Programming, pages 108–119. Springer, 2006.

[BKMP05] Noam Berger, Claire Kenyon, Elchanan Mossel, and Yuval Peres. Glauber dynamics on trees and hyperbolic graphs. Probability Theory and Related Fields, 131(3):311–340, 2005. [BW02]

Graham R Brightwell and Peter Winkler. Random colorings of a Cayley tree. Contemporary Combinatorics, 10:247–276, 2002.

[CKKL99] Artur Czumaj, Przemka Kanarek, Miroslaw Kutylowski, and Krzyztof Lory´s. Delayed path coupling and generating random permutations via distributed stochastic processes. In Proceedings of the tenth annual ACM-SIAM Symposium on Discrete Algorithms, pages 271–280, 1999. [CM18]

Sitan Chen and Ankur Moitra. Linear programming bounds for randomly sampling colorings. arXiv preprint arXiv:1804.03156, 2018.

[DF03]

Martin Dyer and Alan Frieze. Randomly coloring graphs with lower bounds on girth and maximum degree. Random Structures & Algorithms, 23(2):167–179, 2003.

[DFFV06] Martin Dyer, Abraham D Flaxman, Alan M Frieze, and Eric Vigoda. Randomly coloring sparse random graphs with fewer colors than the maximum degree. Random Structures & Algorithms, 29(4):450–465, 2006. [DFHV13] Martin Dyer, Alan Frieze, Thomas P Hayes, and Eric Vigoda. Randomly coloring constant degree graphs. Random Structures & Algorithms, 43(2):181–200, 2013. [DGG+ 01] Martin Dyer, Leslie Ann Goldberg, Catherine Greenhill, Mark Jerrum, and Michael Mitzenmacher. An extension of path coupling and its application to the Glauber dynamics for graph colorings. SIAM Journal on Computing, 30(6):1962–1975, 2001.

24

[DGM02]

Martin Dyer, Catherine Greenhill, and Mike Molloy. Very rapid mixing of the Glauber dynamics for proper colorings on bounded-degree graphs. Random Structures & Algorithms, 20(1):98–114, 2002.

[DPP18]

Michelle Delcourt, Guillem Perarnau, and Luke Postle. Rapid mixing of Glauber dynamics for colorings below Vigoda’s 11/6 threshold. arXiv preprint arXiv:1804.04025, 2018.

[DSC93]

Persi Diaconis and Laurent Saloﬀ-Coste. Comparison theorems for reversible Markov chains. The Annals of Applied Probability, pages 696–730, 1993.

ˇ+ 16] Charilaos Efthymiou, Thomas P Hayes, Daniel Stefankovic, ˇ [EHS Eric Vigoda, and Yitong Yin. Convergence of MCMC and loopy BP in the tree uniqueness region for the hardcore model. In 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS), pages 704–713. IEEE, 2016. ˇ ˇ [EHSV18] Charilaos Efthymiou, Thomas P Hayes, Daniel Stefankoviˇ c, and Eric Vigoda. Sampling random colorings of sparse random graphs. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1759–1771. SIAM, 2018. [FV06]

Alan Frieze and Juan Vera. On randomly colouring locally sparse graphs. Discrete Mathematics and Theoretical Computer Science, 8, 2006.

[FV07]

Alan Frieze and Eric Vigoda. A survey on the use of Markov chains to randomly sample colourings. Oxford Lecture Series in Mathematics and its Applications, 34:53, 2007.

[GK07]

David Gamarnik and Dmitriy Katz. Correlation decay and deterministic FPTAS for counting list-colorings of a graph. In Proceedings of the eighteenth annual ACM-SIAM Symposium on Discrete Algorithms, pages 1245–1254. Society for Industrial and Applied Mathematics, 2007.

[GKM15]

David Gamarnik, Dmitriy Katz, and Sidhant Misra. Strong spatial mixing of list coloring of graphs. Random Structures & Algorithms, 46(4):599–613, 2015.

[GMP05]

Leslie Ann Goldberg, Russell Martin, and Mike Paterson. Strong spatial mixing with fewer colors for lattice graphs. SIAM Journal on Computing, 35(2):486–517, 2005.

ˇ [GSV14]

ˇ Andreas Galanis, Daniel Stefankoviˇ c, and Eric Vigoda. Inapproximability for antiferromagnetic spin systems in the tree non-uniqueness region. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pages 823–831. ACM, 2014.

[Hay03]

Thomas P Hayes. Randomly coloring graphs of girth at least ﬁve. In Proceedings of the thirty-fifth annual ACM Symposium on Theory of Computing, pages 269–278. ACM, 2003.

[Hay06]

Thomas P Hayes. A simple condition implying rapid mixing of single-site dynamics on spin systems. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2006., pages 39–46. IEEE, 2006.

[HS05]

Thomas P Hayes and Alistair Sinclair. A general lower bound for mixing of single-site dynamics on graphs. In 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2005., pages 511–520. IEEE, 2005. 25

[HV03]

Thomas P Hayes and Eric Vigoda. A non-Markovian coupling for randomly sampling colorings. In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science (FOCS), 2003., pages 618–627. IEEE, 2003.

[HV05]

Tom Hayes and Eric Vigoda. Coupling with the stationary distribution and improved sampling for colorings and independent sets. In Proceedings of the sixteenth annual ACM-SIAM Symposium on Discrete Algorithms, pages 971–979. Society for Industrial and Applied Mathematics, 2005.

[HV07]

Thomas P Hayes and Eric Vigoda. Variable length path coupling. Random Structures & Algorithms, 31(3):251–272, 2007.

[HVV15]

Thomas P Hayes, Juan C Vera, and Eric Vigoda. Randomly coloring planar graphs with fewer colors than the maximum degree. Random Structures & Algorithms, 47(4):731– 759, 2015.

[Jer95]

Mark Jerrum. A very simple algorithm for estimating the number of k-colorings of a low-degree graph. Random Structures & Algorithms, 7(2):157–165, 1995.

[KR01]

VS Anil Kumar and Hariharan Ramesh. Coupling vs. conductance for the JerrumSinclair chain. Random Structures and Algorithms, 18(1):1–17, 2001.

[LM06]

Lap Chi Lau and Michael Molloy. Randomly colouring graphs with girth ﬁve and large maximum degree. In Latin American Symposium on Theoretical Informatics, pages 665–676. Springer, 2006.

[LV99]

Michael Luby and Eric Vigoda. Fast convergence of the Glauber dynamics for sampling independent sets. Random Structures & Algorithms, 15(3-4):229–241, 1999.

[Mol04]

Michael Molloy. The Glauber dynamics on colorings of a graph with high girth and maximum degree. SIAM Journal on Computing, 33(3):721–737, 2004.

[MS10]

Elchanan Mossel and Allan Sly. Gibbs rapidly samples colorings of G(n,d/n). Probability Theory and Related Fields, 148(1-2):37–69, 2010.

[Pot52]

Renfrey Burnard Potts. Some generalized order-disorder transformations. In Mathematical Proceedings of the Cambridge Philosophical Society, volume 48, pages 106–109. Cambridge University Press, 1952.

[Sin92]

Alistair Sinclair. Improved bounds for mixing rates of Markov chains and multicommodity ﬂow. Combinatorics, Probability and Computing, 1(4):351–370, 1992.

[SS97]

Jes´ us Salas and Alan D Sokal. Absence of phase transition for antiferromagnetic Potts models via the Dobrushin uniqueness theorem. Journal of Statistical Physics, 86(34):551–579, 1997.

[TVVY12] Prasad Tetali, Juan C Vera, Eric Vigoda, and Linji Yang. Phase transition for the mixing time of the Glauber dynamics for coloring regular trees. The Annals of Applied Probability, 22(6):2210–2239, 2012. [Var17]

Shai Vardi. Randomly coloring graphs of logarithmically bounded pathwidth. arXiv:1708.02677, 2017.

26

[Vig99]

Eric Vigoda. Improved bounds for sampling colorings. In 40th Annual Symposium on Foundations of Computer Science (FOCS), 1999., pages 51–59. IEEE, 1999.

[WSK89]

Jian-Sheng Wang, Robert H Swendsen, and Roman Koteck` y. Antiferromagnetic Potts models. Physical Review Letters, 63(2):109, 1989.

A

Review of Vigoda’s Greedy Coupling

A.1

The Coupling

In this section we give a self-contained overview of Vigoda’s coupling analysis. For the reader’s convenience, we restate some notation that was already introduced in Section 2.4. Fix a neighboring coloring pair (G, σ, τ ). For c ∈ [k], let Uc denote the set of neighbors of v (in either coloring) that are colored c, and let δc = |Uc |. We will sometimes denote the vertices in Uc by {uc1 , ..., ucδc }; where c is clear from context, we will simply denote these by {u1 , ..., uδc }. Note that the symmetric diﬀerence D = Sσ ∆Sτ is precisely the Kempe components Sσ (uci , τ (v)) and Sσ (v, c) in σ and the Kempe components Sτ (uci , σ(v)) and Sτ (v, c) in τ , for all colors c appearing in the neighborhood of v and all i ∈ [δc ]. All other Kempe components are shared between σ and τ , so for those, it is enough to use the identity coupling. Note that for colors c 6= σ(v), τ (v) not appearing in N (v), the identity coupling then matches the ﬂip of Sσ (v, c) to that of Sτ (v, c) so that the two colorings of G become identical. So the main concern is how to couple the ﬂips of components in D. We can decompose D into ∪c:δc >0 Dc , where the sets Dc are deﬁned as follows: Definition A.1. Let Dc be the set of Kempe components consisting of Sσ (v, c), Sτ (v, c), and all Sσ (u, τ (v)) and Sτ (u, σ(v)) for all u ∈ Uc . Informally, Dc is the subset of D that involves the color c. It is easy to see that for c 6= σ(v), ! δc [ Sτ (uci , σ(v)) ∪ {v}, (29) Sσ (v, c) = i=1

and when c = σ(v), Sσ (v, c), Sτ (u, σ(v)) = ∅ for any u ∈ Uc . Likewise we have that for c 6= τ (v), ! δc [ Sσ (uci , τ (v)) ∪ {v}, (30) Sτ (v, c) = i=1

and when c = τ (v), Sτ (v, c), Sσ (u, τ (v)) = ∅ for u ∈ Uc . The sets Dc are disjoint except possibly the pair Dσ(v) , Dτ (v) , as these both contain (σ(v), τ (v))colored Kempe components, though we defer this point to later. Remark A.1. One subtlety is that there may exist multiple neighbors u′1 , . . . , u′m ∈ N (v) which are colored c but which satisfy Sτ (u′1 , σ(v)) = · · · = Sτ (u′m , σ(v)); to guarantee that the flip of each component is considered exactly once, redefine Sτ (u′i , σ(v)) = ∅ for all 1 < i ≤ m. Handle the components Sσ (u′i , τ (v)) analogously. In [Vig99], Vigoda couples ﬂips of Kempe components within Dc as follows. First we require some notation. For c such that δc > 0, deﬁne Ac := |Sσ (v, c)|, Bc := |Sτ (v, c)|, aci = |Sτ (uci , σ(v))|, and bci = |Sσ (uci , τ (v))|. Deﬁne the vectors ac := (aci : i ∈ [δc ]) and bc := (bci : i ∈ [δc ]). Also deﬁne acmax = maxi aci and denote the maximizing i by icmax . Likewise deﬁne bcmax = maxj bcj and denote 27

c . When it is clear from context that we are just focusing on a generic the maximizing j by jmax color c, we will refer to these as A, B, ai , bi , a, b, amax , imax , bmax , jmax . We say that neighboring coloring pair (G, σ, τ ) has a configuration (Ac , Bc ; ac , bc ) of size δc . Naively we have the bounds X X 1 + amax ≤ A ≤ 1 + ai , 1 + bmax ≤ B ≤ 1 + bi , (31) i

i

and moreover the upper bounds in (4) are equalities when c 6= σ(v), τ (v). Note that Sσ (v, c) and Sτ (v, c) can be quite diﬀerent but Sσ (v, c) ⊃ Sτ (ui , σ(v)) so it is easier to understand the overlap between these two components. Among all choices of i, this overlap is maximized for i = imax , and the idea of Vigoda’s coupling is thus to greedily couple the ﬂips of the biggest components, i.e. Sσ (v, c), Sτ (v, c), to the ﬂips of the next biggest components, i.e. Sτ (uimax , σ(v)), Sσ (ujmax , τ (v)), and then to couple together as closely as possible the ﬂips of Sσ (ui , τ (v)) and Sτ (ui , σ(v)) for each i ∈ [δc ]. Formally, assuming p1 ≥ p2 ≥ · · · we have: 1. Flip Sσ (v, c) and Sτ (uimax , σ(v)) together with probability pA . 2. Flip Sτ (v, c) and Sσ (ujmax , τ (v)) together with probability pB . 3. For i ∈ [δc ], deﬁne

pai − pA pai

if i = imax otherwise

(32)

( pbi − pB = pbi

if i = jmax otherwise

(33)

qi =

qi′

(

Note that qi and qi′ are the remaining probability associated to ﬂips Sτ (ui , σ(v)) and Sσ (ui , τ (v)) respectively. (a) Flip Sτ (ui , σ(v)) and Sσ (ui , τ (v)) together with probability min(qi , qi′ ) (b) Flip only Sτ (ui , σ(v)) together with probability qi − min(qi , qi′ ) (c) Flip only Sσ (ui , τ (v)) together with probability qi′ − min(qi , qi′ )

Coupled moves 1) and 2) change the Hamming distance by at most A−amax −1 and B −bmax −1 respectively (with equality, for example, if G is a tree rooted at v). For any given i ∈ [δc ], coupled move 3a) changes the Hamming distance by ai + bi − 1, where the extra -1 term comes from the fact that Sτ (ui , σ(v)) and Sσ (ui , τ (v)) are of size ai and bi respectively but share vertex ui . On the other hand, coupled moves 3b) and 3c) obviously change the Hamming distance by ai and bi respectively. For a conﬁguration (A, B; a, b), deﬁne H(A, B; a, b) = (A − amax − 1)pA + (B − bmax − 1)pB + where

X

f (ui ),

(34)

i

f (ui ) = ai qi + bi qi′ − min(qi , qi′ )

(35)

The above discussion implies that for c 6= σ(v), τ (v) appearing in the neighborhood of v, kn · E[1Xc · (d(σ ′ , τ ′ ) − 1)] ≤ H(Ac , Bc ; ac , bc ), 28

(36)

where Xc is the random event that the coupling ﬂips components in Dc in both colorings. For c not appearing in the neighborhood of v, the Hamming distance will not change if Kempe components containing the color c are ﬂipped in both colorings, as the coupling is the identity on these components, except if v is ﬂipped to c in both colorings, in which case the Hamming distance decreases by 1. Lastly, we review how the case of c = σ(v), τ (v) and Dσ(v) ∪ Dτ (v) 6= ∅ is handled in [Vig99]. This is the main place where one needs to be careful about the fact that neighboring coloring pairs σ, τ need not be proper. Remark A.2. When c = σ(v), τ (v), we must make sure not to double count flips, as it is possible that Dσ(v) and Dτ (v) share Kempe components. In this remark, suppose Dσ(v) ∩ Dτ (v) 6= ∅. This can only happen if there exist xi , yj ∈ N (v) colored σ(v), τ (v) respectively for which Sσ (v, τ (v)) = Sσ (xi , τ (v)) and Sτ (v, σ(v)) = Sτ (xi , σ(v)). To avoid double counting, Vigoda sets Sσ (v, τ (v)) = Sτ (yj , σ(v)) = ∅ in this case. The bound (6) then holds for both c = σ(v), τ (v). The only difference is that some values among Ac , Bc and the entries of ac , bc will be zero, in which case we take p0 = 0. Specifically, for c = τ (v), we have Ac = 0, Bc = bcmax = 0, and at least one acj is zero, namely the one corresponding P to the component Sτ (yj , σ(v)) = Sτ (v, σ(v)). In this case one can check that H(Ac , Bc ; ac , bc ) = acj pacj , and provided αpα ≤ 1 for all α, this is at most δc − 1. P For c = σ(v), we have Ac = 0, aci = 0 for all i, and Bc = j bcj . Let j ∗ be the index of the unique neighbor uj ∗ of v for which Sτ (v, σ(v)) = Sσ (uj ∗ , σ(v)). Then because Sσ (uj ∗ , σ(v)) contains v, we need to modify the definition of bcmax . Let bcmax = maxj (bcj − Ij=j ∗ ) and denote the maximizing c . Then the lower bound on Bc in (4) still holds, and (6) still holds. Moreover, if δc = 1, j by jmax then E[d(σ ′ , τ ′ ) − 1|Xσ(v) ] = H(Ac , Bc ; ac , bc ) = 0. Henceforth, we will refer to the coupling deﬁned above as the greedy coupling. We can conclude the following, implicit in [Vig99]: Lemma A.1. Let (σ, τ ) 7→ (σ ′ , τ ′ ) be the greedy coupling. Then X 1 −|{c : δc = 0}| + E[d(σ ′ , τ ′ ) − 1] ≤ H(Ac , Bc ; ac , bc ) . nk

(37)

c:δc 6=0

The function H implicitly depends on the choice of ﬂip parameters {pα }, while Ac , Bc , ac , bc depend on (G, σ, τ ). The remaining analysis in [Vig99] once (7) has been deduced essentially boils down to picking a good set of ﬂip parameters.

A.2

Enumerating Realizable Configurations

In Section 3 we raised the issue of enumerating all realizable conﬁgurations in Linear Program 1. In particular, while it was easy to enumerate realizable conﬁgurations (Ac , Bc ; ac , bc ) for which c 6= σ(v), τ (v), we provided without proof two types of constraints ((10) and (11)) that we claimed would handle realizable conﬁgurations for which c = σ(v), τ (v). In this subsection we ﬁll in the details for why these two constraints suﬃce for conﬁgurations with c = σ(v), τ (v). For c = σ(v), by Remark A.2, any realizableP(Ac , Bc ; ac , bc ) satisﬁes H(Ac , Bc ; ac , bc ) = 0 if δc = 1, and satisﬁes Ac = 0, ac = (0, ..., 0), Bc = i bci , and X X bci pbci + bjmax (pbcjmax − pBc ) ≤ (Bc − bcm )pBc + bci pbci H(Ac , Bc , ac , bc ) = (Bc − bcmax − 1)pBc + i6=jmax

if δc > 1. So the relaxed constraint (10) covers all constraints corresponding to c = σ(v). 29

For c = τ (v), we know by Remark A.2 that when c = τ (v), (11) ensures that H(Ac , Bc ; ac , bc ) ≤ δc − 1, and for any λ > 1 (corresponding to k > d, which is the regime we are interested in to begin with), we automatically have that δc − 1 < −1 + λ · δc .

B

Extremal Configurations for Vigoda’s Choice of Flip Parameters and Missing Proofs from Section 3

Observation B.1. Consider the assignment (13) in Linear Program 2 for Nmax = 6 and m∗ = 3. Constraint (11) is tight under the assignment (13) only for α = 1. Among the constraints of the form (9) associated to a realizable configuration (A, B; a, b), up to symmetry, there are six tight constraints: i) m = 1, A − 1 = a1 ∈ {2, 3, 4, 5} and B − 1 = b1 = 1; ii) m = 2, A = a1 + a2 + 1, a1 = a2 ∈ {2, 3}, B = 1 and b1 = b2 = 1. Any other constraints of the form (9) that do not meet these conditions, and all constraints of the form (10) and (12), are not tight under the assignment (13). This can be verified numerically. It follows that Vigoda’s solution has six extremal realizable configurations, up to symmetries. Proof. The tightness of (11) only for α = 1 is obvious. That the other constraints mentioned in the observation have zero slack can be checked by hand. We verify that all other constraints have nonzero slack. Case 1. Constraint (9) for m = 1 We ﬁrst consider realizable (A, B; ac , bc ). It is easy to see that (i − 1)(pi − pi+1 ) ≤ 1/7 with equality if and only if i ∈ {2, 3, 4, 5}, and that i(pi − pi+1 ) ≤ 29/42 with equality if and only if i = 1. Note that for m = 1, H(A, B; a, b) = max (a1 (pa1 − pa1 +1 ) + (b1 − 1)(pb1 − pb1 +1 ), (a1 − 1)(pa1 − pa1 +1 ) + b1 (pb1 − pb1 +1 )) 29 1 5 ≤ + = , 42 7 6 with equality if and only if a1 = 1 and b1 ∈ {2, 3, 4, 5} or b1 = 1 and a1 ∈ {2, 3, 4, 5}. Case 2. Constraint (9) for m = 2 We analyze this case in the same way that Claim 6 of [Vig99] is proved. Assume without loss of generality that pamax − pA ≤ pbmax − pB and a1 ≥ a2 . In [Vig99] it is noted that one may assume that b2 ≥ b1 so that H(A, B; a, b) = (A − 2a1 )pA + (B − 2b2 − 1) + (a1 − 1)pa1 + a2 pa2 + b1 pb1 + b2 pb2 − min(pa2 , pb2 − pB ). Now we proceed by casework on min(pa2 , pb2 − pB ): • pa2 ≤ pb2 − pB : In this case we have H(A, B; a, b) = (a1 − 1)pa1 + (a2 − 1)pa2 + (A − 2a1 )pA + b1 pb1 + b2 pb2 + (B − 2b2 − 1)pB . (38) One can check that (a−1)pa ≤ 1/3 with equality if and only if a = 3. If a1 = 3, (A−2a1 )pA ≤ 0 with equality if and only if a2 = 3 and 6 ≤ A ≤ 7. However (6, 3; (3, 3); (b1 , b2 )) is not 30

realizable. If a1 6= 3, (A − 2a1 )pA > 0 if and only if a1 = a2 and A = a1 + a2 + 1. It turns out that p3 < 2p2 + p5 = 4p3 + p7 = 2/3 and thus (38) is only maximized when a1 = a2 ∈ {2, 3} and A = a1 + a2 + 1. In a similar manner, we can verify that for any ﬁxed A, a1 , a2 , (38) is only maximized when b1 = b2 = 1 and B = 3 and, in such a case, the contribution to the part involving B, b1 , b2 is 2. So, with the given assumptions, the only two conﬁgurations with H(A, B; a, b) = 8/3 are (7, 3; (3, 3); (1, 1)) and (5, 3; (2, 2); (1, 1)), up to . • pa2 > pb2 − pB : In this case we have H(A, B; a, b) = (a1 − 1)pa1 + a2 pa2 + (A − 2a1 )pA + b1 pb1 + (b2 − 1)pb2 + (B − 2b2 )pB . This is symmetric with respect to ﬂipping the roles of (a1 , a2 ) and (b2 , b1 ), and it can be veriﬁed that (a1 − 1)pa1 + a2 pa2 + (A − 2a1 )pA < 4/3, so H(A, B; a, b) < 8/3. Case 3. Constraint (10) For m = 2, the left-hand side of (10) is b1 pb1 +b2 + b1 pb1 + b2 pb2 , which attains its maximum value of p2 + 2 < −1 + 2λ at b1 = b2 = 1. For m > 2, note that (11) implies that the left-hand side of (10) is at most m + 1 < −1 + λ · m provided λ > 5/3, which is certainly the case. Case 4. Constraint (12) One can check that (A − 2)pA ≤ 4/21. And if pa ≤ pb , then a · pa + b · pb − min(pa , pb ) = (a − 1)pa + b · pb . But (a − 1)pa ≤ 1/3 and b · pb ≤ 1. So x∗ = 4/21 and y ∗ = 4/3, and it is clear that −1 + (11/6) · 3 > 2 · x∗ + m∗ · y ∗ for m∗ = 3, so (12) has nonzero slack. Proof of Lemma 3.3. We ﬁrst show there exists no choice of ﬂip parameters for which greedy coupling contracts for all of C ∗ . Let λ = k/d, and suppose to the contrary that 1 ≤ λ < 11/6 and yet there exists a set of ﬂip parameters {pα } for which all pairs of colorings in C ∗ contracted in distance. The expected change in distance for G1 is d d(−1 + λ) · H(3, 2; (2), (1)) = d (p1 + p2 − 2p3 − min(p1 − p2 , p2 − p3 )) < 0 ≤ . nk nk The expected change in distance for G2 is d(−1 + 2λ) d · H(7, 3; (3, 3), (1, 1)) = (d/2) (2p1 + 5p3 − min(p1 − p3 , p3 − p7 )) < 0 ≤ . 2nk 2nk As this implies p1 − p3 ≤ −1 + λ and 2p1 + 4p3 + p7 ≤ −1 + 2λ, using p1 = 1 and p7 ≥ 0 it yields λ ≥ 11/6, a contradiction. Finally, it is straightforward to check that no one-step coupling can do better than the greedy coupling. This is clear for G1 . Indeed, certainly for any component not in D, the coupling should just be the identity. Now for any neighbor u of v with color c, suppose a nonzero amount of probability mass p for the ﬂip of Sτ (u, σ(v)) is matched in the optimal one-step coupling to the ﬂip of a component other than Sσ (v, c). The expected change in distance conditioned on this pair of components being chosen in the coupling is strictly greater than the expected change if that mass p were instead reallocated to the empty ﬂip in σ, contradicting optimality. By symmetry we can show that the ﬂip of Sσ (ui , τ (v)) is coupled only to the empty ﬂip in τ and the ﬂip of Sτ (ui , c). Finally, if not all of the probability mass for the ﬂip of Sσ (v, c) is matched to the ﬂip of Sτ (u, σ(v)), then we can strictly improve the coupling by reallocating that mass to Sτ (u, σ(v)). A similar argument shows that the optimal one-step coupling for G2 is the greedy coupling. 31

C C.1

Missing Proofs from Section 4 Proof of Lemma 4.1

Proof. Denote a minimizing choice of {pα } and λsing , λbad , λgood for the γ-mixed coupling LP by {p∗α } and λ∗sing , λ∗bad , λ∗good . Then for any (G, σ, τ ) ∈ C and the greedy coupling (σ, τ ) 7→ (σ ′ , τ ′ ), observe that X X X H(Ac , Bc ; ac , bc ) H(Ac , Bc ; ac , bc ) + H(Ac , Bc ; ac , bc ) + E[dH (σ ′ , τ ′ ) − 1] ≤ −|{c : δc = 1}| + ≤ −|{c : δc = 1}| +

X

c:σ,τ Goodc

c:σ,τ Badc

c:σ,τ Singc

(−1 + λsing ) +

X

(−1 + 2λbad ) +

(−1 + δc λgood )

c:σ,τ Goodc

c:σ,τ Badc

c:σ,τ Singc

X

= −k + λsing · Nsing (σ, τ ) + 2λbad · Nbad (σ, τ ) +

X

c:σ,τ Goodc

δc · λgood .

(39)

But because δc ≥ 2 for any c 6= σ(v), τ (v) for which σ, τ are in state Goodc , because σ, τ are always in state Goodσ(v) , Goodτ (v) , and because Nsing (σ, τ ) + 2Nbad (σ, τ ) +

X

δc = ∆(v),

c:σ,τ Goodc

we conclude that (39) is a convex combination of the terms 1 γ ∗ ∗ ∗ ∗ ·λ + ·λ ∆(v). −k + λsing · ∆(v), −k + λgood · ∆(v), −k + γ + 1 bad γ + 1 good So we conclude that E[dH (σ ′ , τ ′ ) − 1] ≤ −k + λ∗γ ∆(v) < 0 as long as k > λ∗γ ∆.

C.2

Proof of Upper Bound in Lemma 4.2

Proof. Fix a color c for which δc 6= 0 and some i ∈ [δc ]. For i 6= imax , jmax , the pairs of ﬂips (Sσ (ui , τ (v)), Sτ (ui , σ(v))), (Sσ (ui , τ (v)), ∅), and (∅, Sτ (ui , σ(v))) have probability mass min(pbi , pai ), max(0, pbi − pai ), and max(pai − pbi , 0), for a total of max(pai , pbi ). The remaining pairs of ﬂips have probability masses which depend on whether imax = jmax , as shown in Table 1. From these we can conclude that X X X nk · P[(S, S ′ ) terminating] ≤ max(paci , pbci ) + p1 . pAc + pBc + c:δc >0

c:δc >0,i∈[δc ]

c:δc =0

The sum of the second and third summands is at most k. For the ﬁrst summand, note that when Ac , Bc are nonzero and δc > 0, Ac , Bc ≥ 2, so the ﬁrst summand is at most 2p2 ∆. The desired upper bound follows.

C.3

Proof of Lemma 4.3

Proof. Let λ∗sing , λ∗tree , λ∗good be the values for λsing , λtree , λgood of the minimizer of the Cγ-mixed ′

S,S coupling LP from Deﬁnition 4. For Kempe components S, S ′ in σ, τ respectively, deﬁne Eσ,τ = ′ ′ ′ ′ ′ dH (σ , τ ) − 1, where σ , τ is the pair of colorings obtained by ﬂipping S in σ and S in τ , and let

32

Table 1: Probability masses for some coupled ﬂips Flip in σ

Flip in τ

imax = jmax

Sσ (v, c)

Sτ (uimax , σ(v))

imax 6= jmax

pA

pA

Sτ (v, c)

Sσ (ujmax , τ (v))

pB

pB

Sσ (uimax , τ (v))

Sτ (uimax , σ(v))

(Sσ (ujmax , τ (v))

Sτ (ujmax , σ(v))

min(paimax − pA , pbimax − pB )

min(paimax − pA , pbimax )

(Sσ (uimax , τ (v))

∅

max(0, pbimax − pB − paimax + pA )

max(0, pbimax − paimax + pA )

∅

Sτ (uimax , σ(v))

max(0, paimax − pA − pbimax + pB )

max(0, paimax − pA − pbimax )

Sσ (ujmax , τ (v))

∅

N/A

Sτ (ujmax , σ(v))

max(0, pajmax − pbjmax + pB )

N/A

max(0, pbjmax − pB − pajmax )

max(paimax +pA , pbimax +pB )

max(pajmax + pB , pbjmax ) + max(pbimax + pA , paimax )

∅

min(pbjmax − pB , pajmax )

N/A

Total

′

pS,S be the probability that S, S ′ are ﬂipped in one step of greedy coupling starting from σ(T −1) ,τ (T −1) σ (T −1) , τ (T −1) . Then we have that E[dH (σ (Tstop ) , τ (Tstop ) ) − 1] = where Z(σ (T −1) , τ (T −1) ) :=

X S,S ′

X

T,σ,τ

P[σ (T −1) = σ, τ (T −1) = τ ] · Z(σ (T −1) , τ (T −1) ), ′

(40)

′

I[(S, S ′ ) terminating] · pS,S · EσS,S (T −1) ,τ (T −1) σ(T −1) ,τ (T −1) ′

But note that for v (T ) , c(T ) not terminating, EσS,S (T −1) ,τ (T −1) = 0 because the one-step coupling is just the identity coupling, so Z(σ (T −1) , τ (T −1) ) is just the expected change in distance under one step of greedy coupling on the neighboring coloring pair (G, σ (T −1) , τ (T −1) ). Therefore, for any T ≤ Tstop , Z(σ (T −1) , τ (T −1) ) = E[dH (σ (T ) , τ (T ) ) − 1] 1 (−1 + λ∗sing ) · Nsing (σ (T −1) , τ (T −1) ) + (−1 + 2λ∗bad ) · Nbad (σ (T −1) , τ (T −1) ) ≤ nk X ∗ (−1 + δc · λgood ) − |{c : δc = 0}| + c:σ(T −1) ,τ (T −1) Goodc

=

1 · − k + λ∗sing Nsing (σ (T −1) , τ (T −1) ) + 2λ∗bad Nbad (σ (T −1) , τ (T −1) ) nk X δc · λ∗good . + c:σ(T −1) ,τ (T −1) Goodc

Because Nsing (σ, τ ) + 2Nbad (σ, τ ) +

X

c:σ,τ Goodc

33

δc = ∆(v)

(41)

for all neighboring coloring pairs (G, σ, τ ), we conclude from (40) and (41) that X 1 P[σ (T −1) = σ, τ (T −1) = τ ] −k + λ∗sing ∆sing + λ∗bad ∆bad + λ∗good ∆good E[dH (σ (Tstop ) , τ (Tstop ) )−1] = nk T,σ,τ

(42)

for some ∆sing , ∆bad , ∆good ≥ 0 satisfying ∆sing + ∆bad + ∆good = ∆(v).

(43)

But note that P (T −1) = σ, τ (T −1) = τ ] · N ∆bad bad (σ, τ ) T,σ,τ P[σ =P (T −1) (T −1) ∆good = σ, τ = τ ] · Ngood (σ, τ ) T,σ,τ P[σ ≤C·

E[Nbad (σ (Tstop −1) , τ (Tstop −1) )] E[Ngood (σ (Tstop −1) , τ (Tstop −1) )]

≤ Cγ

(44)

2∆ for C := k+2p k−∆−2 , where the second inequality follows by hypothesis and the ﬁrst inequality follows by the fact that for s ∈ {bad, good}, X E[Ns (σ (Tstop −1) , τ (Tstop −1) )] = P[σ (T −1) = σ, τ (T −1) = τ ] · P[(ST , ST′ ) terminating] · Ns (σ, τ )

T,σ,τ

∈

X k − ∆ − 2 k + 2p2 ∆ , · P[σ (T −1) = σ, τ (T −1) = τ ] · Ns (σ, τ ), nk nk T,σ,τ

where we use the notation x ∈ [a, b] · y to denote the fact that a · y ≤ x ≤ b · y. The ﬁrst step above follows by deﬁnition and the second step follows by Lemma 4.2. Finally, observe that (43) and (44) imply that −k + λ∗sing ∆sing + λ∗bad ∆bad + λ∗good ∆good is a Cγ 1 convex combination of −k + λ∗sing ∆(v), −k + λ∗good ∆(v), and −k + Cγ+1 λ∗bad + Cγ+1 λ∗good ∆(v), so in particular from (42) we get that X −k + λ∗Cγ ∆(v) 1 E[dH (σ (Tstop ) , τ (Tstop ) )−1] ≤ , P[σ (T −1) = σ, τ (T −1) = τ ] (−k+λ∗Cγ ∆(v)) ≤ nk k−∆−2 T,σ,τ

where the ﬁnal step follows from the fact that X P[σ (T −1) = σ, τ (T −1) = τ ] · P[(ST , ST′ ) terminating] = 1 T,σ,τ

and the lower bound of Lemma 4.2.

C.4

Proof of Lemmas from Section 4.3

Proof of Lemma 4.6. Without loss of generality suppose that (Ac , Bc , ac , bc ) = (7, 3, (3, 3), (1, 1)). Let u1 , u2 be the two c-colored neighbors of v, and denote the elements of Sτ (u1 , σ(v)) and Sτ (u2 , σ(v)) by {u1 , w11 , w12 } and {u2 , w21 , w22 } respectively. We know that the vertices {w11 , w12 , w21 , w22 } , the pair of ﬂips (S, S ′ ) chosen under the greedy couare all distinct. With probability n4 · k−∆−1 k pling satisﬁes S = S ′ = Sσ (wji , c′ ) for some i, j ∈ {1, 2} and c′ ∈ Aσ (wji )\{σ(wji )} (note that 34

Aσ (wji )\{σ(wji )} contains neither σ(v) nor c). In this case, the ﬂips are just of vertex wji from color σ(wji ) to a diﬀerent color not already present in its neighborhood, so the neighboring coloring pair σ ′ , τ ′ resulting from the ﬂips is in state Goodc . Proof of Lemma 4.8. First note that in order for σ ′ , τ ′ to be in stage BadEndc given that σ, τ were in stage Goodc , it must be that condition (iii) of Deﬁnition 4.4 holds. Furthermore, the pair of components (S, S ′ ) ﬂipped to get from σ, τ to σ ′ , τ ′ cannot be terminal, so S = S ′ . Suppose that δc > 2. In this case, the probability that σ, τ leave stage Goodc for stage BadEndc is at most the probability that enough c-colored neighbors of v are ﬂipped so that δc becomes at most 2. A Kempe component S outside of Dc and containing at least (δc −2) c-colored neighbors of v must be ﬂipped in both colorings to achieve this, and the probability the greedy coupling chooses any particular such (S, S) is pδc −2 /(nk). The number of such Kempe components is at most δc · (k − 2), δ p ·(k−2) < 3/n. so by a union bound the probability that δc becomes 2 is at most c δc −2 nk On the other hand, if δc < 2, then σ, τ are not in stage Goodc to begin with. So for the rest of the proof, we consider the case of δc = 2. We will proceed by casework on (Ac , Bc , ac , bc ), which we will denote as (A, B, (a1 , a2 ), (b1 , b2 )) for simplicity. Let E denote the event that σ, τ transition to stage BadEndc . Denote the two c-colored neighbors of v by u1 , u2 . We have that E ⊆ E1 ∪ E2 , where E1 is the event that u1 or u2 is ﬂipped in both colorings to a new color, and E2 is the event that u1 or u2 are not ﬂipped but σ, τ nevertheless transition to stage BadEndc . Obviously P[E1 ] ≤ 2/n. We now proceed to bound P[E2 ]. Case 1. If ai > 3 or bi > 3 for some i = 1, 2, then P[E2 ] ≤ n3 . Proof. Without loss of generality, say that a1 > 3. From the vertices of Sτ (u1 , σ(v)) pick out w, w′ 6= u1 such that w, w′ , u1 form a Kempe component. We have that event E2 ⊆ A∪B, where A is the event that all vertices in Sτ (u1 , σ(v))\{u1 , w, w′ } are ﬂipped so that Sτ ′ (u1 , σ(v)) ⊆ {u1 , w, w′ }, and B is the event that w or w′ is ﬂipped and no longer belongs to Sτ ′ (u1 , σ(v)). Obviously P[B] ≤ 2/n. For A, the (a1 − 3) neighbors of u1 , w, w′ in Sτ (u1 , σ(v)) must be ﬂipped at once, which by a union bound occurs with probability at most n1 · (a1 − 3) · pa1 −3 ≤ n1 , where the inequality follows by (11). So P[E2 ] ≤ P[A] + P[B] ≤ 3/n. Case 2. If ai = 0 for some i and b1 , b2 ≤ 3, or if bi = 0 for some i and a1 , a2 ≤ 3, then P [E2 ] ≤ n1 . Proof. Suppose without loss of generality that a1 = 0 and b1 , b2 ≤ 3. By the deﬁnition of the greedy coupling and the fact that c 6= σ(v), τ (v), a1 = 0 if and only if Sτ (u2 , σ(v)) consists of u1 , u2 , w for some σ(v)-colored w ∈ N (u1 ) ∪ N (u2 ). So E2 is a subset of the event that w is ﬂipped to any other color. Thus, P[E2 ] ≤ n1 . Case 3. If 1 ≤ a1 , a2 , b1 , b2 ≤ 3, and if (a1 , a2 ) and (b1 , b2 ) are both not among {(1, 1), (3, 3)}, then 48 P [E2 ] ≤ nk . Proof. Suppose (a1 , a2 ) and (b1 , b2 ) are both not among {(1, 1), (3, 3)}. Then E2 is a subset of the event that the pair of ﬂips (S, S) chosen increases or decreases at least one of a1 , a2 and decreases or increases at least one of b1 , b2 , respectively. But for a ﬂip S to decrease some ai for i ∈ {1, 2}, it must contain a member of Sτ (ui , σ(v)), and for a ﬂip S to increase some bj for j ∈ {1, 2}, it must contain the color c or τ (v). There are at most 3 members of Sτ (ui , σ(v)), so the probability 6 , and by a union bound over the of (S, S) both increasing ai and decreasing bj is at most n3 · k2 = nk 48 . eight diﬀerent choices of i, j, and increasing/decreasing, we conclude that P[E2 ] ≤ nk Case 4. If 1 ≤ a1 , a2 , b1 , b2 ≤ 3 and exactly one of the tuples (a1 , a2 ) and (b1 , b2 ) is among {(1, 1), (3, 3)}, then P [E2 ] ≤ 4∆+48 nk . 35

Proof. Suppose (b1 , b2 ) = (1, 1). E2 ⊆ X ∪ Y, where X is the event that the pair of ﬂips (S, S) chosen increases or decreases some ai and decreases or increases some bi , respectively, and Y is 48 the event that (a1 , a2 ) becomes (3, 3). We already know by Case 3 that P[X ] ≤ nk . Supposing without loss of generality that a1 < 3, the event Y is a subset of the event that a neighbor of a vertex in Sτ (u1 , σ(v)) is ﬂipped to the color c or σ(v). There are at most 2∆ such neighbors, so 2 4∆ 4∆+48 P[Y] ≤ 2∆ n · k = nk , and thus P[E2 ] ≤ nk . Now suppose (b1 , b2 ) = (3, 3). E2 ⊆ X ∪ Z where X is the event deﬁned above and Z is the event that (a1 , a2 ) becomes (1, 1). Supposing without loss of generality that a1 > 1, Z is a subset of the event that one of the members of Sτ (u1 , σ(v)) other than u1 is ﬂipped. There are at most two such vertices, so P[Z] ≤ 2/n and P[E2 ] ≤ 2k+48 nk . Case 5. If 1 ≤ a1 , a2 , b1 , b2 ≤ 3 and (a1 , a2 , b1 , b2 ) = (1, 1, 1, 1), then P [E2 ] ≤

4∆ nk .

Proof. E2 is a subset of the event that one of the neighbors of u1 or u2 is ﬂipped to the color σ(v) 2 4∆ or τ (v), so P[E2 ] ≤ 2∆ n · k = nk . Case 6. If 1 ≤ a1 , a2 , b1 , b2 ≤ 3 and (a1 , a2 , b1 , b2 ) = (3, 3, 3, 3), then P [E2 ] ≤ n2 . Proof. E2 ⊆ S ∪ T , where S (resp. T ) is the event that all σ(v)-colored (resp. τ (v)-colored) neighbors in N (u1 ) ∪ N (u2 ) in τ (resp. σ) are ﬂipped to a diﬀerent color. Consider an arbitrary σ-colored neighbor w of u1 . S is a subset of the event that w is ﬂipped, so P[S] ≤ 1/n. We can bound P[T ] similarly, so P[E2 ] ≤ 2/n. Of the upper bounds on P[E2 ] in all of the above cases, the bound of 3/n from Case 1 is the greatest when k ≥ 1.833∆, completing the proof of Lemma 4.8.

D D.1

Missing Proofs from Section 5 Proof of Lemma 5.1

Proof. We need to prove that for every m and every realizable conﬁguration (A, B; a, b) of size m diﬀerent from the four excluded ones, we have ˆ H(A, B; a, b) ≤ −1 + mλ.

(45)

It is straightforward to check that for every α ∈ {1, . . . , 6} the given assignment satisﬁes αˆ pα ≤ 1, ˆ (α − 1)ˆ pα ≤ 1/3 and (α − 2)ˆ pα ≤ (3λ − 5)/2; we will useP some of these inequalities in the proof. P Let us ﬁrst assume that c 6= σ(v), τ (v), so A = 1 + i ai and B = 1 + i bi . If m = 1, then any realizable conﬁguration has the form (i + 1, j + 1, (i), (j)). Observe that (21) correspond to the constraint H(i+1, j +1, (i), (j)) ≤ −1+λ. We may assume that i, j ≤ 6 and that j 6= 1 as otherwise we obtain weaker constraints, since pˆα = 0 for α ≥ 7. Thus (45) follows for every conﬁguration of size 1 in Linear Program 5, from the constraints in Linear Program 6. If m = 2, as pˆα = 0 for every α ≥ 7, there is a ﬁnite amount of non-trivial realizable nonextremal conﬁgurations (A, B; a, b) of size 2. One can check by computer that for the given values ˆ with ˆ any conﬁguration of size 2 in Linear Program 5 satisﬁes H(A, B; a, b) ≤ −1 + 2λ, of p ˆ and λ, equality if and only if the conﬁguration is (5, 3, (2, 2), (1, 1)) or (3, 5, (1, 1), (2, 2)). Suppose that m = 3. By Lemma 3.1, recall that X H(A, B; a, b) ≤ (A − 2)pA + (B − 2)pB + (ai pai + bi pbi − min{pai , pbi }). i

36

Using the properties of p ˆ , we have ai pai + bi pbi − min{pai , pbi } ≤ 4/3. Thus, X ˆ. H(A, B; a, b) ≤ (A − 2)pA + (B − 2)pB + (ai pai + bi pbi − min{pai , pbi }) ≤ −1 + 3λ i

ˆ > 4/3, we have If m ≥ 4 and since λ ˆ − 5 + 4m/3 ≤ −1 + mλ ˆ. H(A, B; a, b) ≤ 3λ We ﬁnally deal with the case c ∈ {σ(v), τ (v)}. If c = τ (v), Remark A.2 and αpα ≤ 1 implies ˆ If c = σ(v) and m = 1 then Remark 2.2 implies H(A, B; a, b) = H(A, B; a, b) ≤ −1+m ≤ −1+mλ. ˆ ˆ as αpα ≤ 1, (α − 1)pα ≤ 1/3, so 0 ≤ −1 + λ. Finally, if c = σ(v) and m ≥ 2, then (10) holds for λ ˆ H(A, B; a, b) ≤ −1 + mλ. ˆ We conclude that p ˆ is a feasible solution to Linear Program 5 with objective value λ.

D.2

Proof of Lemma 5.2

Proof. Recall that the extremal conﬁgurations for our choice of ﬂip parameters are (3, 2, (2), (1)) and (7, 3, (3, 3), (1, 1)), up to symmetries, and Uc = {u1 , . . . , um }, where m = δc . i (v) for some i ∈ {1, 2}. Consider the sets of components Assume ﬁrst that c ∈ Cσ,τ / CσS ,τS (v)} , S0 := {S ∈ D : c ∈

S2 := {S ∈ D : c ∈ Cσ2S ,τS (v)} .

Note that when i = 1, then for every S ∈ D \ (S0 ∪ S2 ) we have ξσ,τ (v, c, S) ≤ 0; therefore, ∇B (σ, τ, c, D) ≤ ∇B (σ, τ, c, S0 ) + ∇B (σ, τ, c, S2 ) . Note that when i = 2, then for every S ∈ D \ S0 we have ξσ,τ (v, c, S) ≤ 0; therefore, ∇B (σ, τ, c, D) ≤ ∇B (σ, τ, c, S0 ) . We proceed to bound ∇B (σ, τ, c, S0 ) for i ∈ {1, 2}. Without loss of generality, assume that a1 > b1 . Let w ∈ Sτ (u1 , σ(v)) with τ (w) = σ(v); we note that w ∈ / Uc ∪ {v} and that such a ′ ′ vertex always exists as a1 ≥ 2. Choose a color c ∈ [k] with c ∈ / σ(N (w)) ∪ {σ(v), τ (v)}. Let S = Sσ (w, c′ ) ∈ D. As S = {w}, (σS , τS ) has either a (2, 2, (1), (1)) or a (j + 4, 3, (j, 3), (1, 1)) (with j ∈ {1, 2}) conﬁguration for c, i.e. c ∈ / CσS ,τS (v). As there are at least k − ∆ − 2 choices for c′ and as p|S| = p1 = 1, we have η(k − ∆ − 2) ·i. ∇B (σ, τ, c, S0 ) ≤ − ∆ Now we bound ∇B (σ, τ, c, S2 ), provided that i = 1. Let S ∈ S2 , then |S ∩ (N (v) \ {u1 })| ≥ 1 and if u ∈ S ∩ (N (v) \ {u1 }), then σS (u) = c. Thus, S can be described as S = Sσ (u, c) for u ∈ N (v), implying that |S2 | ≤ ∆. Moreover, |S| ≥ 2 as at least two vertices need to change their color to transform an extremal 1-conﬁguration into an extremal 2-conﬁguration. Since p|S| ≤ p2 ≤ 31 and ξσ,τ (v, c, S) = 1, we have η ∇B (σ, τ, c, S2 ) ≤ . 3 From the bounds on ∇B (σ, τ, c, S0 ) and ∇B (σ, τ, c, S2 ) derived above, we obtain that for i ∈ i (v) {1, 2} and c ∈ Cσ,τ η k − 4∆ k 3 3 −2 ∇B (σ, τ, c, D) ≤ − · i ≤ −iη − , ∆ ∆ 2 37

and this proves the ﬁrst statement. To prove the second statement, assume that c ∈ / Cσ,τ (v) and let T := {S ∈ D : c ∈ CσS ,τS (v)}. Again, for every S ∈ D \ T , we have ξσ,τ (v, c, S) ≤ 0. Therefore, ∇B (σ, τ, c, D) ≤ ∇B (σ, τ, c, T ) . Deﬁne UcS := N (v) ∩ (σS )−1 (c) with mS := |UcS | and note that mS ≤ 2. Consider the partition T = T1 ∪ T2 ∪ T3 with T1 := {S ∈ D : Uc \ UcS 6= ∅} ,

T2 := {S ∈ D : UcS \ Uc 6= ∅} \ T1 , T3 := {S ∈ D : UcS = Uc } .

For every S ∈ T3 , if c ∈ Cσ1S ,τS (v), let (AS , B S , (aS1 ), (bS1 )) be the extremal 1-conﬁguration for c in (σS , τS ) and if c ∈ Cσ2S ,τS (v), let (AS , B S , (aS1 , aS2 ), (bS1 , bS2 )) be the extremal 2-conﬁguration for c in (σS , τS ). Recall that (A, B; (a1 , . . . , am ), (b1 , . . . , bm )) denotes the conﬁguration for c in (σ, τ ). As it is non-extremal, there exists x ∈ {a, b} and j ∈ [mS ], such that xj 6= xSj . Note that xSj ≤ 3. Consider the partition T3 = T3+ ∪ T3− with T3+ := {S ∈ T3 : xj > xSj } ,

T3− := {S ∈ T3 : xj < xSj } .

To bound the size of S ′ ∈ {T1 , T3+ } we will proceed as follows. For every S ∈ S ′ , there is a vertex in a Kempe component of either σ or τ that does not belong to the corresponding component in either σS or τS . If there exists R(S ′ ) ⊆ Sσ (v, c) ∪ Sτ (v, c) such that S ∩ R(S ′ ) 6= ∅ for every S ∈ S ′ , then, any S ∈ S ′ can be described as S = Sσ (w, c′ ) for w ∈ R(S ′ ) and c′ ∈ [k], and |S ′ | ≤ |R(S ′ )|k. If S ′ = T1 and S ∈ S ′ , then observe that |S ∩ Uc | = |Uc \ UcS | ≥ max{m − mS , 1}. Let ℓ = min{mS +1, m}. If R(T1 ) = R1 = {u1 , . . . , uℓ }, it follows that |S ∩R1 | ≥ |S ∩Uc |−(m−(mS +1)) ≥ 1 and |T1 | ≤ (mS + 1)k ≤ 3k. If S ′ = T3+ and S ∈ S ′ , recall that xj > xSj and set ϕ = σ if x = b and ϕ = τ if x = a, and let π ∈ {σ, τ } \ {ϕ}. Let R(T3+ ) = R3 be an arbitrary set of xSj vertices in Sϕ (uj , π(v)) \ {uj }. As uj ∈ / R3 , we have S ∩ R3 6= ∅. Since there are 4 choices for the extremal conﬁguration, we have |T3+ | ≤ 4xSj k ≤ 12k.

To bound the size of S ′ ∈ {T2 , T3− } we will proceed as follows. For every S ∈ S ′ , there is a vertex in the neighborhood of a Kempe component of either σ or τ , that belongs to the corresponding component in either σS or τS . If there exists a set N (S ′ ) of neighbors of Sϕ (v, c) such that S ∩ N (S ′ ) 6= ∅ for every S ∈ S ′ , then, any S ∈ S ′ can be described as S = Sϕ (w, c′ ) for w ∈ N (S ′ ) and a unique c′ ∈ {c, π(v)}, and |S ′ | ≤ |N (S ′ )|. If S ′ = T2 and S ∈ S ′ , then let N (T2 ) = N2 = N (v) \ Uc . Clearly S ∩ N2 6= ∅ and |T2 | ≤ ∆. If S ′ = T3− and S ∈ S ′ , recall that xj < xSj and set ϕ = σ if x = b and ϕ = τ if x = a, and let π ∈ {σ, τ } \ {ϕ}. Let N (T3− ) = N3 be the set of neighbors of Sϕ (uj , π(v)), which satisﬁes S ∩ N3 6= ∅. As S ∈ T3− , |S| ≤ xj ∆ ≤ (xSj − 1)∆ ≤ 2∆. Since there are 4 choices for the extremal conﬁguration, we have |T3− | ≤ 8∆. Since p|S| ≤ 1 and ξσ,τ (v, c, S) ≤ 2, we conclude the second statement of the lemma, 2η 15k ∇B (σ, τ, c, D) ≤ ∇B (σ, τ, c, T ) ≤ (3k + 12k + ∆ + 8∆) = 2η 9 + . ∆ ∆ 38

E

Proof of Theorem 6.1

Proof. The proof follows the same lines as the proof of Theorem 1.1 presented in Section 5, although a similar analysis can be done using the multi-step coupling approach presented in Section 4. We will describe the proof strategy, stressing the parts where the argument is diﬀerent for list-coloring and omitting the ones that are straightforward adaptations of the coloring case. Let σ, τ ∈ ΩL that diﬀer only at a vertex v. For the rest of the proof, we will assume c 6= σ(v), τ (v); the case c ∈ {σ(v), τ (v)} produces weaker constraints and can be dealt similarly as in the non-list-coloring case (see Remark A.2). For ϕ ∈ {σ, τ } ⊆ ΩL , π ∈ {σ, τ } \ {ϕ}, c ∈ L(v) and Uc = {u1 , . . . , um } the set of neighbors of v with color c with m = δc , we deﬁne the conﬁgurations L L L (AL , B L , (aL 1 , . . . , am ); (b1 , . . . , bm )) for c in (σ, τ ) as before, with the sole diﬀerence that we also set AL = 0 if Sσ (v, c) is un-ﬂippable, B L = 0 if Sτ (v, c) is un-ﬂippable, aL i = 0 if Sτ (ui , σ(v)) is un-ﬂippable and bL = 0 if S (u , τ (v)) is un-ﬂippable. As c = 6 σ(v), τ (v), if AL 6=, then AL = σ i i L L L L L L L 1 + a1 + · · · + am , and similarly for B . We deﬁne imax , jmax , amax and bmax analogously as before, and note that the latter two can be zero. Deﬁne qi (L) and qi′ (L) as in (32) and (33) for the list version of the parameters. According to this, we use the same deﬁnition of extremal conﬁgurations, metric d on ΩL , dH and dB . Again, for any pair σ ′ , τ ′ ∈ ΩL , we have d(σ ′ , τ ′ ) ≤ dH (σ ′ , τ ′ ), which implies that dB (σ ′ , τ ′ ) ≥ 0. L We use the same coupling as the one deﬁned in Appendix A and deﬁne ∇L , ∇L H and ∇B analogously as for colorings. Fix the ﬂip parameters p ˆ provided in Observation 5.1. We will prove an analogue of Corollary 5.1 to bound ∇L H for list-colorings. As in Section 5.2, we have X ∇L (σ, τ ) = ∇L H H (σ, τ, c) . c∈N

Suppose ﬁrst that m = 0. Then c ∈ L(v) and ∇L H (σ, τ, c) = −1. If m ≥ 1, the list analogues of equations (5) and (6) hold, so

L L L L ∇L H (σ, τ, c) ≤ (A − amax − 1)paL + (B − bmax − 1)pbL X L ′ ′ + (aL i qi (L) + bi qi (L) − min{qi (L), qi (L)}) .

(46)

i∈[m]

We will bound each term ∇L / L(v). H (σ, τ, c) depending on whether c ∈ L(v) or c ∈ ˆ Note that AL = B L = 0, qi (L) = p L If c ∈ / L(v), then it suﬃces to show that ∇L (σ, τ, c) ≤ m λ. ai H L } = {aL , bL } with p . Using (46), ≥ p , d and qi′ (L) = pbL for every i ∈ [m]. Let {cL we obtain dL i i i i cL i

i

i

∇L H (σ, τ, c) ≤ =

X

i∈[m]

X

i∈[m]

L (aL i paL + bi pbL − min{paL , pbL }) i

i

i

i

4 ˆ, ≤ m < mλ + (dL cL i − 1)pdL i pc L i i 3

where we have used that αpα ≤ 1 and (α − 1)pα ≤ 13 . Now assume that c ∈ L(v). We will compare these bounds with the ones we obtained in L L L Section 5.1 by plugging the values of the conﬁguration (AL , B L , (aL 1 , . . . , am ), (b1 , . . . , bm )). Observe L L that there are only two diﬀerences with respect to non-list-colorings; ﬁrst, A and B can be zero, L and second, aL max and bmax can be zero. Recall that p0 = 0. It is important to stress that, L since c ∈ L(v), amax = 0 implies AL = 0, and similarly for B L . Therefore, the only diﬀerence between (46) and the bound obtained from (5), are the cases where either AL = 0 or B L = 0. If 39

L AL = aL max = 0, then the total contribution of this part is zero and analogously for B . Therefore, L L the only interesting case is when A = 0 and amax 6= 0; in this case m ≥ 2. Since AL = 0 and c ∈ L(v), there exists j ∈ [m] such that aL j = 0. Consider the conﬁguration of size m − 1 L L L L L L L (A, B; (aL 1 , . . . , aj−1 , aj+1 , . . . , am ), (b1 , . . . , bj−1 , bj+1 , . . . , bm )) .

(47)

P P L L where A = 1 + i6=j aL i and B = 1 + i6=j bi . Let bmax be the maximum of the bi with i 6= j ˆ is a feasible solution for Linear Program 1 with objective and note that bmax ≤ bL max . Recall that p and a feasible solution of Linear Program 5 with objective value 161 value 11 6 88 . If m ≥ 4, then the conﬁguration of size m − 1 in (47) for c is non-extremal and X L ′ ′ ∇H (σ, τ, c) ≤ (A − aL − 1)p + (B − b − 1)p + aL A max B max i qi + bi qi − min{qi , qi } i6=j

≤

161 (m − 1) − 1 . 88

If 1 ≤ m ≤ 3, then (47) can be extremal and ∇H (σ, τ, c) ≤ (A − aL max − 1)pA + (B − bmax − 1)pB + ≤

161 131 11 (m − 1) − 1 ≤ (m − 1) − . 6 88 132

X i6=j

L ′ ′ aL i qi + bi qi − min{qi , qi }

′ ′ For i = iL max we have qi = pamax − pA and qi (L) = pamax . Moreover, we have qj = 0 and qj (L) ≤ pbL . j

We may assume that bL max 6= 0, as otherwise we have b = bmax = 0 and the contribution of this part is zero, as before. Using these bounds and (46), we obtain that for any such c ∈ [k] X L ′ ′ L L + (aL ∇L (σ, τ, c) ≤ (B − b − 1)p L B i qi (L) + bi qi (L) − min{qi (L), qi (L)}) H max i∈[m]

≤ ∇H (σ, τ, c) − (A −

2aL max

L − 1)pA + (B L − bL max − 1)pB L + bj pbL j

L

≤ ∇H (σ, τ, c) + (A − 1)pA + (B − 2)pB L ≤ ∇H (σ, τ, c) + ≤

+ bL j pbL j

19 12

161 ·m−1, 88

1 1 L where we have used that bL max ≥ 1, A ≥ amax + 1, αpα ≤ 1, (α − 1)pα ≤ 3 and (α − 2)pα ≤ 4 . Thus, L Corollary 5.1 also holds for ∇H . Corollary 5.2 also holds for ∇L B as well, since all the negative contributions on the bound are given by Kempe components S = Sσ (u, c) of size 1, which are always ﬂippable as c ∈ L(u). The positive contributions of the Kempe components is still bounded by the same quantity since, in the worst case, they are all ﬂippable. Using the same ﬂip parameters and reasoning as in the proof of Theorem 5.1, it follows that for every k-list assignment L with k ≥ ( 11 6 − η)∆, ﬂip dynamics for L-colorings satisﬁes τmix (ǫ) = O(n(log n + log ǫ−1 )), concluding the proof of Theorem 6.1.

40