Tomaszewski’s Conjecture

In a 1986 article in The American Mathematical Monthly, Richard Guy posed the following question, attributed to Bogusław Tomaszewski: Consider n real numbers a1, …, an such that a1² + ⋯ + an² = 1. Of the 2ⁿ expressions |a1 ± ⋯ ± an|,

can there be more with value > 1 than with value ≤ 1?

A cursory attempt to find real numbers ai for which more of the absolute signed sums have value > 1 than have value ≤ 1 should be enough to convince you that it is, in fact, impossible. The answer was therefore expected to be no. This claim has since been known as Tomaszewski’s conjecture, and there were many proofs of weaker versions over the years until, finally, in 2020, it was proved by Keller and Klein in the paper Proof of Tomaszewski’s Conjecture on Randomly Signed Sums.

An alternative formulation is in terms of Rademacher sums

\displaystyle Z=a_1X_1+a_2X_2+\cdots+a_nX_n (1)

where X1, …, Xn are independent ‘random signs’. That is, they have the Rademacher distribution ℙ(Xi = 1) = ℙ(Xi = -1) = 1/2. Then, Z has variance Σiai² and each of the 2ⁿ values ±a1 ± ⋯ ± an occurs with equal probability. So, Tomaszewski’s conjecture is the statement that

\displaystyle {\mathbb P}(\lvert Z\rvert\le1)\ge{\mathbb P}(\lvert Z\rvert > 1) (2)

for unit variance Rademacher sums Z. It is usually stated in the equivalent but more convenient form

\displaystyle {\mathbb P}(\lvert Z\rvert\le1)\ge1/2. (3)
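
Inequality (3) is easy to test for any particular coefficient vector by exhaustively enumerating the 2ⁿ sign combinations. Here is a minimal Python sketch (the function name prob_abs_at_most is just an illustrative choice, not anything from Keller and Klein’s paper); a small tolerance guards against floating point ties at the boundary |Z| = 1:

from itertools import product
from math import sqrt

def prob_abs_at_most(a, t=1.0):
    # Exact P(|a1 X1 + ... + an Xn| <= t), enumerating all 2^n sign vectors.
    n = len(a)
    hits = sum(1 for signs in product((-1, 1), repeat=n)
               if abs(sum(s * c for s, c in zip(signs, a))) <= t + 1e-12)
    return hits / 2 ** n

# Normalize some example coefficient vectors to unit variance and check (3).
for a in [(1.0,), (1.0, 1.0), (1.0, 1.0, 1.0), (3.0, 2.0, 1.0, 1.0, 1.0)]:
    norm = sqrt(sum(c * c for c in a))
    a = tuple(c / norm for c in a)
    p = prob_abs_at_most(a)
    print(a, p, p >= 0.5)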

I will discuss Tomaszewski’s conjecture and the ideas central to the proof given by Keller and Klein. I will not give a full derivation here. That would get very tedious, as evidenced both by the length of the quoted paper and by its use of computer assistance. However, I will prove the ‘difficult’ cases, which make use of the tricks essential to Keller and Klein’s proof, with all remaining cases being, in theory, provable by brute force. In particular, I give a reformulation of the inductive stopping time argument that they used. This is a very ingenious trick introduced by Keller and Klein, and describing it is one of the main motivations for this post. Another technique used in the proof is based on the reflection principle, in addition to some tricks discussed in the earlier post on Rademacher concentration inequalities.

To get a feel for Rademacher sums, some simple examples are shown in figure 1. I use the notation a = (a1, …, an) to represent the sequence whose first n terms are the stated values ai, with any remaining terms equal to zero. The plots show the successive partial sums for each sequence of values of the random signs (X1, X2, …), with the dashed lines marking the ±1 levels.

Figure 1: Example Rademacher sums

The examples demonstrate that ℙ(|Z| ≤ 1) can achieve the bound of 1/2 in some cases, and be strictly more in others. The top-left and bottom-right plots show that, for certain coefficients, |Z| has a positive probability of being exactly equal to 1 and, furthermore, the claimed bound fails for ℙ(|Z| < 1). So, the inequality is optimal in a couple of ways. These examples concern a small number of nonzero coefficients. In the other extreme, for a large number of small coefficients, the central limit theorem says that Z is approximately a standard normal and ℙ(|Z| ≤ 1) is close to Φ(1) – Φ(-1) ≈ 0.68.
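
This limit is easy to check numerically. For the equal-coefficient sums, with ak = 1/√n for k ≤ n, the probability can be computed exactly from the binomial distribution, as in the following Python sketch (prob_at_most_one is just an illustrative name):

from math import comb, sqrt

def prob_at_most_one(n):
    # For a = (1,...,1)/sqrt(n), Z = (2K - n)/sqrt(n) with K ~ Binomial(n, 1/2),
    # so P(|Z| <= 1) counts the outcomes k with |2k - n| <= sqrt(n).
    return sum(comb(n, k) for k in range(n + 1) if abs(2 * k - n) <= sqrt(n)) / 2 ** n

# The printed values approach Phi(1) - Phi(-1), roughly 0.68, as n grows.
for n in (1, 2, 5, 10, 100, 1000):
    print(n, prob_at_most_one(n))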

Rademacher Concentration Inequalities


Concentration inequalities place lower bounds on the probability of a random variable being close to a given value. Typically, they state something along the following lines: a variable Z is within a distance x of a value μ with probability at least p,

\displaystyle {\mathbb P}(\lvert Z-\mu\rvert\le x)\ge p. (1)

Although such statements can be made in more general topological spaces, I only consider real valued random variables here. Clearly, (1) is the same as saying that Z is greater than distance x from μ with probability no more than q = 1 – p. We can express concentration inequalities either way round, depending on what is convenient. Also, the inequality signs in expressions such as (1) may or may not be strict. A very simple example is Markov’s inequality,

\displaystyle {\mathbb P}(\lvert Z\rvert\ge x)\le\frac{{\mathbb E}\lvert Z\rvert}{x}.

In the other direction, we also encounter anti-concentration inequalities, which place lower bounds on the probability of a random variable being at least some distance from a specified value, so take the form

\displaystyle {\mathbb P}(\lvert Z-\mu\rvert\ge x)\ge p. (2)

An example is the Paley-Zygmund inequality,

\displaystyle {\mathbb P}(\lvert Z\rvert > x)\ge\frac{({\mathbb E}\lvert Z\rvert-x)^2}{{\mathbb E}[Z^2]}

which holds for all 0 ≤ x ≤ 𝔼|Z|.
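
As a concrete illustration, the following minimal Python sketch (variable names are mine, purely illustrative) compares the Paley-Zygmund lower bound with the exact value of ℙ(|Z| > x) for the Rademacher sum with coefficients (1, 1, 1)/√3:

from itertools import product
from math import sqrt

# Enumerate the exact distribution of Z = a.X for a = (1, 1, 1)/sqrt(3).
a = [1 / sqrt(3)] * 3
outcomes = [sum(s * c for s, c in zip(signs, a))
            for signs in product((-1, 1), repeat=len(a))]
N = len(outcomes)
E_absZ = sum(abs(z) for z in outcomes) / N   # E|Z|
E_Z2 = sum(z * z for z in outcomes) / N      # E[Z^2] = 1

for x in (0.25, 0.5, 0.75):
    exact = sum(1 for z in outcomes if abs(z) > x) / N
    # Paley-Zygmund lower bound, valid for 0 <= x <= E|Z|.
    bound = (E_absZ - x) ** 2 / E_Z2
    print(f"x={x}: exact={exact:.4f}, Paley-Zygmund bound={bound:.4f}")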

While the examples given above of the Markov and Paley-Zygmund inequalities are very general, applying whenever the required moments exist, they are also rather weak. For restricted classes of random variables much stronger bounds can often be obtained. Here, I will be concerned with optimal concentration and anti-concentration bounds for Rademacher sums. Recall that these are of the form

\displaystyle Z=a\cdot X=\sum_{n=1}^\infty a_nX_n

for IID random variables X = (X1, X2, …) with the Rademacher distribution, ℙ(Xn = 1) = ℙ(Xn = -1) = 1/2, where a = (a1, a2, …) is a square-summable real sequence. This sum converges to a limit with zero mean and variance

\displaystyle {\mathbb E}[Z^2]=\lVert a\rVert_2^2=\sum_na_n^2.

I discussed such sums at length in the posts on Rademacher series and the Khintchine inequality, and have been planning this follow-up post ever since. In fact, the L0 Khintchine inequality was effectively the same thing as an anti-concentration bound. It was far from optimal as presented there, and relied on the rather inefficient Paley-Zygmund inequality for the proof. Recently, though, a paper was posted on arXiv claiming to confirm conjectured optimal anti-concentration bounds which I had previously mentioned on mathoverflow. See Tight lower bounds for anti-concentration of Rademacher sums and Tomaszewski’s counterpart problem by Lawrence Hollom and Julien Portier.

While the form of the tight Rademacher concentration and anti-concentration bounds may seem surprising at first, being piecewise constant and jumping between rather arbitrary looking rational values at seemingly arbitrary points, I will explain why this is so. It is actually rather interesting and has been a source of conjectures over the past few decades, some of which have now been proved and some of which remain open. Actually, as I will explain, many tight bounds can be proven in principle by direct computation, although this would be rather numerically intensive to perform in practice. In fact, some recent results — including those of Hollom and Portier mentioned above — were solved with the aid of a computer to perform the numerical legwork.


Anti-Concentration Bounds

Figure 1: Optimal anti-concentration bounds

For a Rademacher sum Z of unit variance, recall from the post on the Khintchine inequality that the anti-concentration bound

\displaystyle {\mathbb P}(\lvert Z\rvert\ge x)\ge(1-x^2)^2/3 (3)

holds for all non-negative x ≤ 1. This followed from applying Paley-Zygmund to Z², giving ℙ(|Z| ≥ x) = ℙ(Z² ≥ x²) ≥ (1 - x²)²/𝔼[Z⁴], together with the simple Khintchine inequality 𝔼[Z⁴] ≤ 3. However, this is sub-optimal and is especially bad in the limit as x increases to 1, where the bound tends to zero whereas, as we will see, the optimal bound remains strictly positive.

In the other direction if, for positive integer n, we choose coefficients a ∈ ℓ² with ak = 1/√n for k ≤ n and zero elsewhere then, by the central limit theorem, Z = a·X tends to a standard normal distribution as n becomes large. Hence

\displaystyle {\mathbb P}(\lvert Z\rvert \ge x)\rightarrow2\Phi(-x)

where Φ is the cumulative normal distribution function. So, any anti-concentration bound must be no more than this.

The optimal anti-concentration bounds had been open conjectures for a while, but are now proved and described by theorem 1 below, as plotted in figure 1. They are given by a piecewise constant function and, as is clear from the plot, lie strictly between the simple Paley-Zygmund bound and the Gaussian probabilities.

Theorem 1 The optimal lower bound p for the inequality ℙ(|Z| ≥ x) ≥ p for Rademacher sums Z of unit variance is,

\displaystyle p=\begin{cases} 1,&{\rm for\ }x=0,\\ 1/2,&{\rm for\ }0 < x\le1/\sqrt7,\\ 29/64,&{\rm for\ }1/\sqrt7 < x \le1/\sqrt5,\\ 3/8,&{\rm for\ }1/\sqrt5 < x\le1/\sqrt3,\\ 1/4,&{\rm for\ }1/\sqrt3 < x\le2/\sqrt6,\\ 7/32,&{\rm for\ }2/\sqrt6 < x\le 1,\\ 0,&{\rm for\ }1 < x. \end{cases}

At first sight, this result might seem a little strange. Why do the optimal bounds take this discrete set of values, and why does it jump at these arbitrary seeming values of x? To answer that, consider the distribution of a Rademacher sum. When all coefficients are small it approximates a standard normal and the anti-concentration probabilities approach those indicated by the ‘Gaussian bound’ in figure 1. However, these are not optimal, and the minimal probabilities are obtained at the opposite extreme with a small number n of relatively large coefficients and the remainder being zero. In this case, the distribution has finite support with probabilities that are multiples of 2⁻ⁿ, and the bound jumps when x passes through the discrete levels.

The values of a ∈ ℓ² for which the stated bounds are achieved are not hard to find. For convenience, I use (a1, a2, …, an) to represent the sequence starting with the stated values and with the remaining terms being zero, ak = 0 for k > n. Also, if c is a numeral then ck will denote this value repeated k times.

Lemma 2 The optimal lower bound stated by theorem 1 for Rademacher sum Z = a·X is achieved with

\displaystyle a=\begin{cases} (1),&{\rm for\ }x=0,\\ (1_2)/\sqrt2,&{\rm for\ }0 < x\le1/\sqrt7,\\ (1_7)/\sqrt7,&{\rm for\ }1/\sqrt7 < x\le1/\sqrt5,\\ (1_5)/\sqrt5,&{\rm for\ }1/\sqrt5 < x\le1/\sqrt3,\\ (1_3)/\sqrt3,&{\rm for\ }1/\sqrt3 < x\le2/\sqrt6,\\ (1_6)/\sqrt6,&{\rm for\ }2/\sqrt6 < x\le1,\\ (1),&{\rm for\ }1 < x.\\ \end{cases}

This is straightforward to verify by simply counting the number of sign values of (X1, …, Xn) for which |a·X| ≥ x and multiplying by 2⁻ⁿ. It does, however, show that it is impossible to do better than theorem 1 so that, if the bounds hold, they must be optimal. Also, as ℙ(|Z| ≥ x) is decreasing in x, to establish the result it is sufficient to show that the bounds hold at the values of x where it jumps. This reduces theorem 1 to the following finite set of inequalities.

Theorem 3 A Rademacher sum Z of unit variance satisfies,

\displaystyle \begin{aligned} &{\mathbb P}(\lvert Z\rvert\ge1/\sqrt7)\ge1/2,\\ &{\mathbb P}(\lvert Z\rvert\ge1/\sqrt5)\ge29/64,\\ &{\mathbb P}(\lvert Z\rvert\ge1/\sqrt3)\ge3/8,\\ &{\mathbb P}(\lvert Z\rvert\ge2/\sqrt6)\ge1/4,\\ &{\mathbb P}(\lvert Z\rvert\ge1)\ge7/32. \end{aligned}

The last of these had been an open conjecture ever since it was mentioned in a 1996 paper by Oleszkiewicz. I asked about the first one in a 2021 mathoverflow question, where I also mentioned the finite set of values x at which the optimal bound jumps, hinting at the full set of inequalities, which was the desired goal. Finally, in 2023, a preprint appeared on arXiv claiming to prove all of these. While I have not completely verified the proof and the computer programs used myself, it does look likely to be correct.
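
Verifying lemma 2, as described above, only requires counting sign combinations. The sketch below (Python; prob_at_least is an illustrative name) performs that count for the coefficient vectors (1n)/√n of lemma 2 and reproduces the bounds of theorem 3:

from itertools import product
from math import sqrt

def prob_at_least(a, x):
    # Exact P(|a.X| >= x): count sign vectors achieving it, divide by 2^n.
    n = len(a)
    hits = sum(1 for signs in product((-1, 1), repeat=n)
               if abs(sum(s * c for s, c in zip(signs, a))) >= x - 1e-12)
    return hits / 2 ** n

# (number n of equal coefficients, threshold x, bound p) from lemma 2 / theorem 3.
cases = [(2, 1 / sqrt(7), 1 / 2),
         (7, 1 / sqrt(5), 29 / 64),
         (5, 1 / sqrt(3), 3 / 8),
         (3, 2 / sqrt(6), 1 / 4),
         (6, 1.0, 7 / 32)]
for n, x, p in cases:
    a = [1 / sqrt(n)] * n
    print(n, prob_at_least(a, x), p)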

Although the bounds given above are all for anti-concentration about 0, a simple trick shows that they will also hold about any real value.

Lemma 4 Suppose that the inequality ℙ(|Z| ≥ x) ≥ p holds for all Rademacher sums Z of unit variance. Then the anti-concentration bound

\displaystyle {\mathbb P}\left(\lvert Z-\mu\rvert\ge x\right)\ge p

also holds about every value μ.

Proof: If Z = a·X and, independently, Y is a Rademacher random variable, then Z + μY is a Rademacher sum of variance 1 + μ². This follows from the fact that it has the same distribution as b·X where b = (μ, a1, a2, …). So,

\displaystyle \begin{aligned} p &\le{\mathbb P}\left(\lvert Z+\mu Y\rvert/\sqrt{1+\mu^2}\ge x\right)\\ &=\frac12{\mathbb P}\left(\lvert Z-\mu\rvert\ge x\sqrt{1+\mu^2}\right)+\frac12{\mathbb P}\left(\lvert Z+\mu\rvert\ge x\sqrt{1+\mu^2}\right) \end{aligned}

By symmetry of the distribution of Z, both probabilities on the right are equal, giving

\displaystyle {\mathbb P}\left(\lvert Z-\mu\rvert\ge x\sqrt{1+\mu^2}\right)\ge p

Since x√(1+μ²) ≥ x, this implies the result. ⬜
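
Lemma 4 can also be sanity-checked numerically. Here is a small Python sketch (illustrative names again) testing the bound ℙ(|Z − μ| ≥ 1/√3) ≥ 3/8 from theorem 3 for randomly chosen unit-variance coefficients and shifts μ:

import random
from itertools import product
from math import sqrt

def prob_shifted_at_least(a, mu, x):
    # Exact P(|a.X - mu| >= x) by enumerating all 2^n sign vectors.
    n = len(a)
    hits = sum(1 for signs in product((-1, 1), repeat=n)
               if abs(sum(s * c for s, c in zip(signs, a)) - mu) >= x - 1e-12)
    return hits / 2 ** n

x, p = 1 / sqrt(3), 3 / 8
for _ in range(5):
    a = [random.random() for _ in range(random.randint(1, 8))]
    norm = sqrt(sum(c * c for c in a))
    a = [c / norm for c in a]
    mu = random.uniform(-2, 2)
    q = prob_shifted_at_least(a, mu, x)
    print(round(mu, 3), q, q >= p)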

You have probably noted already that all of the coefficients described in lemma 2 are of the form (1n)/√n. That is, they are a finite sequence of equal values with the remaining terms being zero. In fact, it has been conjectured that all optimal concentration and anti-concentration bounds (about 0) for Rademacher sums can be attained in this way. This is attributed to Edelman, dating back to private communications in 1991. If true, it would turn the process of finding and proving such optimal bounds into a straightforward calculation but, unfortunately, in 2012, the conjecture was shown to be false by Pinelis for some concentration inequalities.

Before moving on, let’s mention how such bounds can be discovered in the first place. Running a computer simulation for randomly chosen finite sequences of coefficients very quickly converges on the optimal values. As soon as we randomly select values close to those given by lemma 2, we obtain the exact bounds and any further simulations only serve to verify that these hold. Running additional simulations with coefficients chosen randomly in the neighbourhood of points where the bound is attained, and near certain ‘critical’ values where the bound looks close to being broken, further strengthens the belief that they are indeed optimal. At least, this is how I originally found them before asking the mathoverflow question, although this is still far from a proof.
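
For anyone wanting to experiment, the following Python sketch shows the kind of random search described; it is purely illustrative, and certainly not the verification code used in the papers mentioned above. It samples random unit-variance coefficient vectors and tracks the smallest value of ℙ(|Z| ≥ x) found:

import random
from itertools import product
from math import sqrt

def prob_at_least(a, x):
    # Exact P(|a.X| >= x) by enumerating all 2^n sign vectors.
    n = len(a)
    hits = sum(1 for signs in product((-1, 1), repeat=n)
               if abs(sum(s * c for s, c in zip(signs, a))) >= x - 1e-12)
    return hits / 2 ** n

def search_minimum(x, trials=5000, max_n=8):
    # Randomly sample unit-variance coefficient vectors, keeping the minimum probability.
    best_p, best_a = 1.0, None
    for _ in range(trials):
        a = [random.random() for _ in range(random.randint(1, max_n))]
        norm = sqrt(sum(c * c for c in a))
        a = [c / norm for c in a]
        p = prob_at_least(a, x)
        if p < best_p:
            best_p, best_a = p, a
    return best_p, best_a

# For x = 1/sqrt(3) the minimum should approach 3/8, attained near a = (1,1,1,1,1)/sqrt(5).
print(search_minimum(1 / sqrt(3)))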


Concentration Bounds

Figure 2: Optimal concentration bounds
