For a Rademacher sequence and square summable sequence of real numbers , the Khintchine inequality provides upper and lower bounds for the moments of the random variable,

We use for the space of square summable real sequences and

for the associated Banach norm.

Theorem 1 (Khintchine)For each , there exists positive constants such that,

(1)

for all .

Note the similarity to the Burkholder-Davis-Gundy inequality. In fact, the process is a martingale, with time index *N*. Then, the quadratic variation is equal to , and (1) can be regarded as a special case of the BDG inequality — at least, when *p* is greater than one. However, the Khintchine inequality is *much* easier to prove. I will give a proof in a moment but, first, there are some simple observations that can be made. We already know, from the previous post, that the map is an isometry between and . That is,

So, the Khintchine inequality for is trivial, and we can take . Also, it is a simple application of Jensen’s inequality to show that is increasing in *p*, for any nonnegative random variable *Z*. Specifically, for , convexity of the map gives,

Hence,

for all . We immediately see that the right-hand Khintchine inequality holds with for . Similarly, for ,

and we see that the left-hand Khintchine inequality holds with . So, the only non-trivial cases of (1) are the left-hand inequality for and the right-hand one for .

*Proof of Theorem 1:* We start with the right-hand inequality, by making use of inequality (2) from the previous post,

for any fixed positive . Since grows faster than , for any , there exists a positive constant satisfying . Hence,

Dividing by gives

with the constant .

The left-hand Khintchine inequality follows from the right-hand one, by making use of the fact that is log-convex in *p* (see lemma 2 below). Since we have already noted that for , we assume that . Choosing any , log-convexity gives

Rearranging gives the result with . â¬ś

A function is log-convex if is convex or, equivalently,

for all . In the proof above, I made use of log-convexity of the moments. This is equivalent to the statement that moment generating functions are log-convex. For completeness, I include a brief proof of this standard fact.

Lemma 2LetZbe a nonnegative random variable. Then, is log-convex over .

*Proof:* One method is to simply differentiate twice with respect to *p*. Alternatively, for , set and so that . Applying HĂ¶lder’s inequality gives the result,

â¬ś

An alternative way of looking at the Khintchine inequality is that it says that, on the space of Rademacher series, the topologies all coincide. I will use to denote . Over , it is well-known that this is a Banach norm. However, it defines a topology for each , so that convergence of a sequence of random variables to a limit in the topology is equivalent to .

Lemma 3LetVbe a linear subspace of for some . v Then, the and topologies are equivalent onVif and only if there exists positive constants satisfying,

(2)

for all .

*Proof:* First, if (2) holds, then it is immediate that a sequence converges in if and only if it converges in , so we look at the converse. Using proof by contradiction, suppose that there was no constant *C* for which the right-hand inequality holds. Then, there exists a sequence such that . By scaling, we can assume that . However, this gives so, by assumption, we also have convergence in the topology giving the contradiction .

We have shown that the right-hand inequality of (2) holds for some positive constant *C*, and the left hand inequality follows by exchanging the role of *p* and *q*. â¬ś

These ideas extend to the topology, which is just convergence in probability.

Lemma 4LetVbe a linear subspace of for some . Then, the and topologies are equivalent onVif and only if there exists strictly positive constants satisfying,

(3)

for all .

*Proof:* First, it is standard that convergence in in the topology implies convergence in probability. To be explicit, if for a sequence then, for each ,

as required. Conversely, suppose that (3) holds for given and that tends to zero in probability. We need to show that . If this was not the case then, by passing to a subsequence, we would have for some fixed positive *K*. So,

contradicting (3).

Finally, supposing that the and topologies coincide, we just need to show that (3) holds. I use proof by contradiction so, suppose that (3) does not hold for any . Then, there exists a sequence satisfying

By scaling, we can suppose that so that, in particular, the sequence does not tend to zero in . However, for any we have

for . This shows that tends to zero in but not in , contradicting the initial assumption. â¬ś

Statements such as (3) are known as *anti-concentration* inequalities, and bound, from above, the probability that a random variable can be within a given distance of its mean. So, to show the equivalence of the and topologies for Rademacher series, it is only really necessary to prove a single non-trivial anti-concentration inequality.

Lemma 5LetXbe a Rademacher sequence and . Then,

(4)

for all .

*Proof:* Using , the Paley-Zygmund inequality gives

Here, the left-hand Khintchine inequality was used in the final inequality. In fact, we can use , giving (4).

Note that is antisymmetric in flipping the sign of whenever , so has zero expectation. Hence, writing , we obtain

Letting *N* go to infinity, and using convergence in ,

So, as claimed. â¬ś

Combining with lemma 4, this result shows that the and topologies coincide for Rademacher series. Similarly, lemma 3 shows that the Khintchine inequality is equivalent to stating that the topologies coincide for all . Hence, we obtain the following alternative statement of the Khintchine inequalities, which naturally incorporates the version.

Theorem 6The space is contained in for all and, for all , the and topologies are equivalent on this space.

Although this statement unites the Khintchine inequalities for with the corresponding version for , when expressed as explicit quantitative statements as in (1) and (4), they do look rather different. For , the inequality is of the form

for some increasing functions . The left-hand inequality for the case can also be expressed in a similar style.

Lemma 7For a random variableZ, inequality (3) holds if and only if

(5)

for all increasing functions .

*Proof:* If (5) holds, then (3) follows immediately by taking . Conversely, if (3) holds then,

as required. â¬ś

The (left-hand) Khintchine inequality is then expressed by

for some constants and every increasing function . This form also makes clear that the left-hand inequality for all follows from the version. We simply take and .

Although the Khintchine inequality is only concerning Rademacher sequences, it does have implications for much more general sequences. For example, the following consequence does not place any restriction on the distribution of the random variables , which are not even required to be independent. This was central to the construction of the stochastic integral given earlier in my notes, which only required the most basic properties of semimartingales.

Theorem 8Let be a sequence of random variables in such that the set

(6)

is bounded, for some . Then, is in .

*Proof:* Set . Then, if is the uniform probability measure on , we can write

for some which are increasing unbounded functions of . Specifically for , the left-hand Khintchine inequality says that this holds with and . For , lemma 7 says that we can choose however we like, and of the form . We choose to be left-continuous and, by boundedness in probability, such that is bounded over .

In either case, we can take expectations,

and the right hand side is bounded by some constant *K* independently of *n*. Hence, letting *n* go to infinity and applying Fatou’s lemma on the left gives

as required. â¬ś

The conclusion of theorem 8 directly implies that the sequence converges to zero in the topology. It was this consequence which was important in my construction of the stochastic integral.

Corollary 9Let be a sequence of random variables such that the set (6) is bounded, for some . Then, in .

*Proof:* Since theorem 8 says that is in and, in particular, is almost surely finite, we see that almost surely. This implies that in and, for , as , dominated convergence gives in . â¬ś

I only made use of the version of corollary 9 in the construction of the stochastic integral, as this is sufficient for the property of bounded convergence in probability. However, the corollary can also be used in its versions to obtain an theory of stochastic integration. For a semimartingale *X*, we require that

is bounded, for each positive time *t*. The resulting stochastic integral will then satisfy bounded convergence in . This approach dates back to the 1981 paper Stochastic Integration and L^{p}-Theory of Semimartingales by Bichteler, and is used in his book Stochastic Integration with Jumps.

Note that the statement of corollary 9 makes sense in any topological vector space. So, for any such space *V*, we can ask whether, for every sequence such that the set

is bounded, does necessarily tend to zero? It is not difficult to show that this is true in Hilbert spaces and, more generally, in uniformly convex spaces. However, it does not hold in all spaces. Take, for example, , which is the space of bounded real sequences under uniform convergence. Then consider such that for and . The sums all have uniform norm equal to 1, so form a bounded set. However, does not tend to zero.

The spaces for fail many of the usual `nice’ properties that are often required when working with topological vector spaces. For example, is generally not uniformly convex and, for , is not even locally convex. So, corollary 9 gives a fundamental and nontrivial property that spaces *do* satisfy, and which is sufficient to ensure a well behaved theory of stochastic integration.

#### Optimal Khintchine Constants

In the discussion above, I paid no concern to the optimal values of the constants in the Khintchine inequality. All that we were concerned with is that finite and positive constants do exist. However, the inequality is clearly improved if can be made as small as possible, and is as large as possible. In fact, the optimal values of these constants are known. I will not give full proofs here, but the actual values are enlightening, so I will mention them, and prove the `easy’ case of over .

Use to denote the *p*‘th absolute Gaussian moment, which can be computed explicitly in terms of the gamma function. Letting *N* be a standard normal random variable, on some probability space,

It is straightforward (using Jensen’s inequality) to show that is strictly increasing in *p*, and . So, for and for . It turns out that, over , moments of Khintchine series are bounded above by Gaussian moments.

Theorem 10The optimal Khintchine constants are,

(7)

The first thing to note is that, for any *p*, it is always possible to achieve the bounds given in (7) or, at least, approximate them as closely as we like. Specifically, consider with in the following cases.

- If then .
- If then .
- By lemma 6 of the post on Rademacher series, if we let go to zero then .

This shows that, if it can be shown that the Khintchine inequality holds with the constants in (7), then it is optimal. We already noted above that the inequality holds with for and for , so this is optimal. The remaining cases are,

Incidentally, we already showed that in the process of proving lemma 5. A similar method, simply expanding out the power of the series, can be applied for all *even integer* *p*.

Lemma 11For each even integer , the Khintchine inequality holds with .

*Proof:* Writing , expanding the power of the sum gives

(8) |

For the product of the *X*‘s, by collecting together equal terms, we can write

where the are distinct. By symmetry in switching the sign of , if any of the powers are odd, then this will have zero expectation. On the other hand, if all of the powers are even, then it is equal to 1.

In order to simplify relating this sum to , we employ a devious trick. Let be an IID sequence of standard normal random variables defined on some probability space. As above,

will have zero mean if any of the powers are odd and, if they are all even then,

In any case, if we set then taking expectations of (8) gives,

As sums of independent normals are normal, is normal with variance ,

Finally, letting *N* go to infinity and applying Fatou’s lemma on the left hand side gives the result. â¬ś

I extend this result to all *real* . As we can no longer simply expand out the power of the sum, a convexity argument employing Jensen’s inequality will be used.

Lemma 12For each , the function

is convex on the nonnegative reals.

*Proof:* Differentiating twice, and using and , gives

Note that both terms on the right are positive so long as . On the other hand, for , compute the derivative

As , this is nonnegative, so

and substituting into the expression for gives as required. â¬ś

Unfortunately, for , the function in lemma 12 is *not* convex, and a different approach is required. I concentrate on . We will use a similar trick as in lemma 11 whereby we replace the Rademacher random variables by normals. To simplify the argument a bit, rather than replacing them all in one go, here we will replace them one at a time.

Lemma 13For each real , the Khintchine inequality holds with .

*Proof:* Applying lemma 12, and scaling, the function

is convex for any real . Hence, if *X* is a Rademacher random variable and *Y* is standard normal, then and Jensen’s inequality gives

Next, if *S* is any random variable and *X*, *Y* are as above, independently of *S*, then

(9) |

We now consider the finite sum . Replacing the Rademacher random variables one-by-one by IID standard normals , we obtain the sums

If we also consider the same sum with the *n*‘the term excluded,

then, assuming that the series are chosen independent of each other, will be independent of both and . So, applying (9),

As sums of independent normals are normal, is normal with variance . Also, , so we obtain

Letting *N* increase to infinity and applying Fatou’s lemma on the left hand side gives the result. â¬ś

The optimal left-hand inequality for and right-hand inequality for require some further techniques, as the function in lemma 12 is neither convex nor concave. I refer to the paper Ball, Haagerup, and Distribution Functions for these cases, and also to The optimal constants in Khintchine’s inequality for the case 2 < p < 3.

Finally, using the fact that and are both less than one on the range , I note that the optimal left-hand inequality is given by theorem 10 as,

However, it is not immediately obvious which of the two terms in the minimum is smaller. Writing

we see that we should take whenever

By log-convexity, there will be a unique solving and, hence, the expression for can be written more clearly as

Solving numerically gives .

Hi George! You claim that Corollary 9 can be generalized to Hilbert spaces (or more generally, uniformly convex spaces). Do you have a reference of this result? Also, I was wondering if you know about an analogous result for the Bochner space $L^0_H$ of equivalence classes of Hilbert space valued random elements?

Thanks for yet another interesting blog entry!

No reference, this is something I came up with while writing the post. Actually, I did consider if the result holds for all reflexive Banach spaces. Maybe it does, but I am not sure, and did not want to spend too long on what was a side-comment and not the focus of the post. For a Hilbert space it is easy. If the norm of elements of the set is bounded by K, then

and it follows that . Uniformly convex spaces are a little trickier. Consider , where are chosen inductively to maximize (given the values of for k less than n). It can be seen that is non-decreasing and, as it is bounded by assumption, tends to a limit K (we have not used uniform convexity yet…). The uniform convex property can be used to show that for each fixed positive , there exists a such that

whenever , which is a contradiction unless for all large n, showing that it converges to zero.

It extends to Bochner spaces (defined wrt a Hilbert space H), you can generalise the Khintchine inequality to a_n lying in an arbitrary Hilbert space, so the result should still hold.

Indeed you are right! Thanks for this nice proof and your insight into these special cases!

In your second equation line, where you bound the sum by $K^2$, should not the first equality be replaced by an inequality? Since you take the sum over all permutations of +-1, you will get some extra terms. Of course, this will change nothing and your elegant argument still holds.

No, it should be an equality, basically it is the parallelogram law extended to arbitrarily many terms, and is an identity in any Hilbert space.

Thanks for another nice post!

Two minor typos:

(1) Proof of lemma 4: inequality should be reversed before “contradicting (3)”.

(2) Proof of lemma 5: the last inequality should be reversed… and it seems the coefficient before should be 2?

[George: Fixed, thanks!]I’m sorry, but I don’t think the proof to the right hand side of Theorem 1 is right. The term $\Vert a \Vert_2$ is over an exponential term, how does dividing $\lambda^p$ implies the inequality?

I mean, lambda is an arbitrary positive number, so can be replaced by lambda / ||a||_2, in which case you get the required inequality.

Thank you for this post! I just used your Lemma 2 and the proof therein to upper bound unknown moments of a non-negative random variable via an interpolation of the moments that I know. This saved a proof within the last hours before submitting an article đź™‚