For a Rademacher sequence $X=(X_1,X_2,\ldots)$ and square summable sequence of real numbers $a=(a_1,a_2,\ldots)$, the Khintchine inequality provides upper and lower bounds for the moments of the random variable
$\displaystyle \sum_na_nX_n.$
We use $\ell^2$ for the space of square summable real sequences and
$\displaystyle \Vert a\Vert_2=\Big(\sum_na_n^2\Big)^{1/2}$
for the associated Banach norm. For a random variable $Z$ and $0<p<\infty$, $\Vert Z\Vert_p$ denotes the $L^p$-norm $\mathbb{E}\left[|Z|^p\right]^{1/p}$.
Theorem 1 (Khintchine) For each $0<p<\infty$, there exist positive constants $c_p,C_p$ such that,
$\displaystyle c_p\Vert a\Vert_2\le\Big\Vert\sum_na_nX_n\Big\Vert_p\le C_p\Vert a\Vert_2$ (1)
for all $a\in\ell^2$.
Note the similarity to the Burkholder-Davis-Gundy inequality. In fact, the process $M_N=\sum_{n\le N}a_nX_n$ is a martingale, with time index N. Then, the quadratic variation $[M]_\infty$ is equal to $\Vert a\Vert_2^2$, and (1) can be regarded as a special case of the BDG inequality, at least when p is greater than one. However, the Khintchine inequality is much easier to prove. I will give a proof in a moment but, first, there are some simple observations that can be made. We already know, from the previous post, that the map $a\mapsto\sum_na_nX_n$ is an isometry between $\ell^2$ and $L^2$. That is,
$\displaystyle \Big\Vert\sum_na_nX_n\Big\Vert_2=\Vert a\Vert_2.$
So, the Khintchine inequality for $p=2$ is trivial, and we can take $c_2=C_2=1$. Also, it is a simple application of Jensen’s inequality to show that $\Vert Z\Vert_p$ is increasing in p, for any nonnegative random variable Z. Specifically, for $0<p\le q$, convexity of the map $x\mapsto x^{q/p}$ gives,
$\displaystyle \mathbb{E}\left[Z^p\right]^{q/p}\le\mathbb{E}\left[Z^q\right].$
Hence, $\Vert Z\Vert_p\le\Vert Z\Vert_q$ for all $p\le q$. We immediately see that the right-hand Khintchine inequality holds with $C_p=1$ for $p\le2$. Similarly, for $p\ge2$,
$\displaystyle \Big\Vert\sum_na_nX_n\Big\Vert_p\ge\Big\Vert\sum_na_nX_n\Big\Vert_2=\Vert a\Vert_2,$
and we see that the left-hand Khintchine inequality holds with $c_p=1$. So, the only non-trivial cases of (1) are the left-hand inequality for $p<2$ and the right-hand one for $p>2$.
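As a purely illustrative aside (not part of the original argument), the ratio $\Vert\sum_na_nX_n\Vert_p/\Vert a\Vert_2$ is easy to estimate by Monte Carlo; the coefficient vector, sample size and values of p below are arbitrary choices.

```python
# Monte Carlo sketch (illustration only): estimate ||sum_n a_n X_n||_p / ||a||_2
# for a Rademacher sequence X and an arbitrary choice of coefficients a.
import numpy as np

rng = np.random.default_rng(0)
a = np.array([3.0, 1.0, 0.5, 0.25, 0.1])           # arbitrary square summable coefficients
X = rng.choice([-1.0, 1.0], size=(10**6, a.size))   # Rademacher samples
S = X @ a                                            # samples of sum_n a_n X_n
norm_a = np.sqrt(np.sum(a**2))

for p in [0.5, 1.0, 2.0, 4.0, 8.0]:
    lp = np.mean(np.abs(S)**p)**(1.0/p)              # Monte Carlo estimate of ||S||_p
    print(f"p={p:>4}: ||S||_p / ||a||_2 ~ {lp/norm_a:.4f}")
# The ratios are increasing in p (Jensen), equal to 1 at p=2, and remain within
# fixed bounds c_p, C_p, as the Khintchine inequality asserts.
```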
Proof of Theorem 1: We start with the right-hand inequality, by making use of inequality (2) from the previous post,
$\displaystyle \mathbb{E}\left[\cosh\Big(\lambda\sum_na_nX_n\Big)\right]\le e^{\lambda^2\Vert a\Vert_2^2/2},$
for any fixed positive $\lambda$. Since $\cosh x$ grows faster than $|x|^p$, for any $p>0$, there exists a positive constant $K_p$ satisfying $|x|^p\le K_p\cosh x$ for all real x. Hence,
$\displaystyle \lambda^p\,\mathbb{E}\left[\Big|\sum_na_nX_n\Big|^p\right]\le K_p\,\mathbb{E}\left[\cosh\Big(\lambda\sum_na_nX_n\Big)\right]\le K_pe^{\lambda^2\Vert a\Vert_2^2/2}.$
Dividing by $\lambda^p$ and taking $\lambda=1/\Vert a\Vert_2$ gives
$\displaystyle \mathbb{E}\left[\Big|\sum_na_nX_n\Big|^p\right]\le K_p\sqrt{e}\,\Vert a\Vert_2^p,$
with the constant $C_p=(K_p\sqrt{e})^{1/p}$.
The left-hand Khintchine inequality follows from the right-hand one, by making use of the fact that $\mathbb{E}\left[|Z|^p\right]$ is log-convex in p (see lemma 2 below). Since we have already noted that $c_p=1$ works for $p\ge2$, we assume that $p<2$. Choosing any $q>2$ and writing $2=\theta p+(1-\theta)q$ with $\theta=(q-2)/(q-p)\in(0,1)$, log-convexity gives
$\displaystyle \Vert a\Vert_2^2=\mathbb{E}\left[\Big|\sum_na_nX_n\Big|^2\right]\le\mathbb{E}\left[\Big|\sum_na_nX_n\Big|^p\right]^\theta\mathbb{E}\left[\Big|\sum_na_nX_n\Big|^q\right]^{1-\theta}\le\Big\Vert\sum_na_nX_n\Big\Vert_p^{\theta p}\,C_q^{(1-\theta)q}\Vert a\Vert_2^{(1-\theta)q}.$
Rearranging, and noting that $2-(1-\theta)q=\theta p$, gives the result with $c_p=C_q^{-(1-\theta)q/(\theta p)}$. ⬜
A function $f$ is log-convex if $\log f$ is convex or, equivalently,
$\displaystyle f\left(\theta p+(1-\theta)q\right)\le f(p)^\theta f(q)^{1-\theta}$
for all $p,q$ in its domain and all $0\le\theta\le1$. In the proof above, I made use of log-convexity of the moments $\mathbb{E}\left[|Z|^p\right]$ as a function of p. This is equivalent to the statement that moment generating functions are log-convex. For completeness, I include a brief proof of this standard fact.

Lemma 2 Let Z be a nonnegative random variable. Then, $p\mapsto\mathbb{E}\left[Z^p\right]$ is log-convex over $0<p<\infty$.
Proof: One method is to simply differentiate twice with respect to p. Alternatively, for $p=\theta p_0+(1-\theta)p_1$ with $p_0,p_1>0$ and $0<\theta<1$, set $U=Z^{\theta p_0}$ and $V=Z^{(1-\theta)p_1}$ so that $Z^p=UV$. Applying Hölder’s inequality, with exponents $1/\theta$ and $1/(1-\theta)$, gives the result,
$\displaystyle \mathbb{E}\left[Z^p\right]=\mathbb{E}\left[UV\right]\le\mathbb{E}\left[U^{1/\theta}\right]^\theta\mathbb{E}\left[V^{1/(1-\theta)}\right]^{1-\theta}=\mathbb{E}\left[Z^{p_0}\right]^\theta\mathbb{E}\left[Z^{p_1}\right]^{1-\theta}.$
⬜
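As an aside, the interpolation bound from this proof is simple to check numerically; the exponential test distribution, exponents and sample size below are arbitrary choices.

```python
# Numerical check (illustration only) of the log-convexity bound in lemma 2:
# E[Z^p] <= E[Z^p0]^theta * E[Z^p1]^(1-theta) with p = theta*p0 + (1-theta)*p1.
import numpy as np

rng = np.random.default_rng(1)
Z = rng.exponential(scale=2.0, size=10**6)   # arbitrary nonnegative test variable

p0, p1, theta = 0.5, 3.0, 0.3
p = theta*p0 + (1 - theta)*p1
lhs = np.mean(Z**p)
rhs = np.mean(Z**p0)**theta * np.mean(Z**p1)**(1 - theta)
print(f"E[Z^{p:.2f}] = {lhs:.4f} <= {rhs:.4f} (interpolated bound)")
```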
An alternative way of looking at the Khintchine inequality is that it says that, on the space of Rademacher series, the $L^p$ topologies all coincide. Recall that $\Vert Z\Vert_p$ denotes $\mathbb{E}\left[|Z|^p\right]^{1/p}$ and that $L^p$ is the space of random variables for which this is finite. Over $1\le p<\infty$, it is well-known that this is a Banach norm. However, it defines a topology for each $0<p<\infty$, so that convergence of a sequence of random variables $Z_n$ to a limit $Z$ in the $L^p$ topology is equivalent to $\Vert Z_n-Z\Vert_p\to0$.
Lemma 3 Let V be a linear subspace of $L^p\cap L^q$ for some $0<p,q<\infty$. Then, the $L^p$ and $L^q$ topologies are equivalent on V if and only if there exist positive constants $c,C$ satisfying,
$\displaystyle c\Vert Z\Vert_q\le\Vert Z\Vert_p\le C\Vert Z\Vert_q$ (2)
for all $Z\in V$.
Proof: First, if (2) holds, then it is immediate that a sequence in V converges in $L^p$ if and only if it converges in $L^q$, so we look at the converse. Using proof by contradiction, suppose that there is no constant C for which the right-hand inequality holds. Then, there exists a sequence $Z_n\in V$ such that $\Vert Z_n\Vert_p>n\Vert Z_n\Vert_q$. By scaling, we can assume that $\Vert Z_n\Vert_p=1$. However, this gives $\Vert Z_n\Vert_q<1/n\to0$ so, by assumption, we also have convergence in the $L^p$ topology, giving the contradiction $1=\Vert Z_n\Vert_p\to0$.
We have shown that the right-hand inequality of (2) holds for some positive constant C, and the left hand inequality follows by exchanging the roles of p and q. ⬜
These ideas extend to the $L^0$ topology, which is just convergence in probability.

Lemma 4 Let V be a linear subspace of $L^p$ for some $0<p<\infty$. Then, the $L^p$ and $L^0$ topologies are equivalent on V if and only if there exist strictly positive constants $c,\epsilon$ satisfying,
$\displaystyle \mathbb{P}\left(|Z|\ge c\Vert Z\Vert_p\right)\ge\epsilon$ (3)
for all $Z\in V$.
Proof: First, it is standard that convergence in the $L^p$ topology implies convergence in probability. To be explicit, if $\Vert Z_n\Vert_p\to0$ for a sequence $Z_n\in V$ then, for each $K>0$,
$\displaystyle \mathbb{P}\left(|Z_n|\ge K\right)\le K^{-p}\mathbb{E}\left[|Z_n|^p\right]\to0$
as required. Conversely, suppose that (3) holds for given $c,\epsilon$ and that $Z_n\in V$ tends to zero in probability. We need to show that $\Vert Z_n\Vert_p\to0$. If this was not the case then, by passing to a subsequence, we would have $\Vert Z_n\Vert_p\ge K$ for some fixed positive K. So,
$\displaystyle \epsilon\le\mathbb{P}\left(|Z_n|\ge c\Vert Z_n\Vert_p\right)\le\mathbb{P}\left(|Z_n|\ge cK\right)\to0,$
contradicting (3).
Finally, supposing that the $L^p$ and $L^0$ topologies coincide, we just need to show that (3) holds. I use proof by contradiction so, suppose that (3) does not hold for any choice of $c,\epsilon>0$. Then, there exists a sequence $Z_n\in V$ satisfying
$\displaystyle \mathbb{P}\left(|Z_n|\ge n^{-1}\Vert Z_n\Vert_p\right)<n^{-1}.$
By scaling, we can suppose that $\Vert Z_n\Vert_p=1$ so that, in particular, the sequence does not tend to zero in $L^p$. However, for any $K>0$ we have
$\displaystyle \mathbb{P}\left(|Z_n|\ge K\right)\le\mathbb{P}\left(|Z_n|\ge n^{-1}\right)<n^{-1}$
for $n\ge K^{-1}$. This shows that $Z_n$ tends to zero in $L^0$ but not in $L^p$, contradicting the initial assumption. ⬜
Statements such as (3) are known as anti-concentration inequalities, and bound, from above, the probability that a random variable can be within a given distance of its mean. So, to show the equivalence of the $L^p$ and $L^0$ topologies for Rademacher series, it is only really necessary to prove a single non-trivial anti-concentration inequality.

Lemma 5 Let X be a Rademacher sequence and $S=\sum_na_nX_n$. Then,
$\displaystyle \mathbb{P}\left(|S|\ge\tfrac12\Vert a\Vert_2\right)\ge\tfrac{3}{16}$ (4)
for all $a\in\ell^2$.
Proof: Using $\mathbb{E}\left[S^2\right]=\Vert a\Vert_2^2$, the Paley-Zygmund inequality gives
$\displaystyle \mathbb{P}\left(|S|\ge\tfrac12\Vert a\Vert_2\right)=\mathbb{P}\left(S^2\ge\tfrac14\mathbb{E}[S^2]\right)\ge\left(1-\tfrac14\right)^2\frac{\mathbb{E}[S^2]^2}{\mathbb{E}[S^4]}\ge\frac{9}{16}\,C_4^{-4}.$
Here, the right-hand Khintchine inequality with $p=4$ was used in the final inequality. In fact, we can use $C_4=3^{1/4}$, giving (4).
Note that $X_{n_1}X_{n_2}X_{n_3}X_{n_4}$ is antisymmetric in flipping the sign of $X_n$ whenever the index n occurs an odd number of times, so has zero expectation in that case, and equals 1 whenever every index occurs an even number of times. Hence, writing $S_N=\sum_{n=1}^Na_nX_n$, we obtain
$\displaystyle \mathbb{E}\left[S_N^4\right]=\sum_{n_1,n_2,n_3,n_4\le N}a_{n_1}a_{n_2}a_{n_3}a_{n_4}\mathbb{E}\left[X_{n_1}X_{n_2}X_{n_3}X_{n_4}\right]=\sum_{n\le N}a_n^4+3\sum_{m\ne n}a_m^2a_n^2\le3\Big(\sum_{n\le N}a_n^2\Big)^2.$
Letting N go to infinity, and using convergence of $S_N$ in $L^4$,
$\displaystyle \mathbb{E}\left[S^4\right]\le3\Vert a\Vert_2^4.$
So, $C_4\le3^{1/4}$ as claimed. ⬜
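Both parts of this argument can be verified exactly for finitely many coefficients by enumerating all $2^N$ sign patterns; the coefficients in the sketch below are an arbitrary example.

```python
# Exact check (illustration only) of E[S^4] = 3*||a||_2^4 - 2*sum(a_n^4) <= 3*||a||_2^4
# and of P(|S| >= ||a||_2 / 2) >= 3/16, by enumerating all sign patterns.
import itertools
import numpy as np

a = np.array([1.0, 0.7, 0.4, 0.2, 0.1])           # arbitrary coefficients
signs = np.array(list(itertools.product([-1, 1], repeat=a.size)))
S = signs @ a                                      # all 2^N equally likely values of sum_n a_n X_n
norm_a2 = np.sum(a**2)

print(np.mean(S**4), 3*norm_a2**2 - 2*np.sum(a**4))          # the two values agree
print(np.mean(np.abs(S) >= 0.5*np.sqrt(norm_a2)), 3/16)      # first value is >= 3/16
```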
Combining with lemma 4, this result shows that the $L^2$ and $L^0$ topologies coincide for Rademacher series. Similarly, lemma 3 shows that the Khintchine inequality is equivalent to stating that the $L^p$ topologies coincide for all $0<p<\infty$. Hence, we obtain the following alternative statement of the Khintchine inequalities, which naturally incorporates the $L^0$ version.

Theorem 6 The space of Rademacher series, $\left\{\sum_na_nX_n\colon a\in\ell^2\right\}$, is contained in $L^p$ for all $0<p<\infty$ and, for all $0\le p,q<\infty$, the $L^p$ and $L^q$ topologies are equivalent on this space.
Although this statement unites the Khintchine inequalities for $0<p<\infty$ with the corresponding version for $p=0$, when expressed as explicit quantitative statements as in (1) and (4), they do look rather different. For $p>0$, the left-hand inequality is of the form
$\displaystyle \mathbb{E}\left[f\Big(\Big|\sum_na_nX_n\Big|\Big)\right]\ge g\left(\Vert a\Vert_2\right)$
for some increasing functions $f,g$ (namely $f(x)=x^p$ and $g(x)=c_p^px^p$). The left-hand inequality for the $p=0$ case can also be expressed in a similar style.

Lemma 7 For a random variable Z and constants $c,\epsilon>0$, inequality (3) holds if and only if
$\displaystyle \mathbb{E}\left[f\left(|Z|\right)\right]\ge\epsilon f\left(c\Vert Z\Vert_p\right)$ (5)
for all increasing functions $f\colon\mathbb{R}_+\to\mathbb{R}_+$.

Proof: If (5) holds, then (3) follows immediately by taking $f=1_{[c\Vert Z\Vert_p,\infty)}$. Conversely, if (3) holds then,
$\displaystyle \mathbb{E}\left[f(|Z|)\right]\ge\mathbb{E}\left[f(|Z|)1_{\{|Z|\ge c\Vert Z\Vert_p\}}\right]\ge f\left(c\Vert Z\Vert_p\right)\mathbb{P}\left(|Z|\ge c\Vert Z\Vert_p\right)\ge\epsilon f\left(c\Vert Z\Vert_p\right)$
as required. ⬜
The (left-hand) Khintchine inequality is then expressed by
$\displaystyle \mathbb{E}\left[f\Big(\Big|\sum_na_nX_n\Big|\Big)\right]\ge\epsilon f\left(c\Vert a\Vert_2\right)$
for some constants $c,\epsilon>0$ and every increasing function $f\colon\mathbb{R}_+\to\mathbb{R}_+$. This form also makes clear that the $L^p$ left-hand inequality for all $0<p<\infty$ follows from the $L^0$ version. We simply take $f(x)=x^p$ and $c_p=\epsilon^{1/p}c$.
Although the Khintchine inequality only concerns Rademacher sequences, it does have implications for much more general sequences. For example, the following consequence does not place any restriction on the distribution of the random variables $Z_n$, which are not even required to be independent. This was central to the construction of the stochastic integral given earlier in my notes, which only required the most basic properties of semimartingales.

Theorem 8 Let $Z_1,Z_2,\ldots$ be a sequence of random variables in $L^p$ such that the set
$\displaystyle \Big\{\sum_{n=1}^N\epsilon_nZ_n\colon N\in\mathbb{N},\ \epsilon_1,\ldots,\epsilon_N=\pm1\Big\}$ (6)
is $L^p$ bounded (bounded in probability, when $p=0$), for some $0\le p<\infty$. Then, $\left(\sum_nZ_n^2\right)^{1/2}$ is in $L^p$.
Proof: Set $U_N=\left(\sum_{n=1}^NZ_n^2\right)^{1/2}$. Then, if $\mu$ is the uniform probability measure on $\{-1,1\}^N$, the coordinate maps $x\mapsto x_n$ form a Rademacher sequence under $\mu$ and, applying the left-hand Khintchine inequality pointwise on the underlying probability space, we can write
$\displaystyle \int f\Big(\Big|\sum_{n=1}^Nx_nZ_n\Big|\Big)\,d\mu(x)\ge g\left(U_N\right)$
for some functions $f,g\colon\mathbb{R}_+\to\mathbb{R}_+$ which are increasing and unbounded. Specifically, for $p>0$, the left-hand Khintchine inequality says that this holds with $f(x)=x^p$ and $g(x)=c_p^px^p$. For $p=0$, lemma 7 says that we can choose $f$ however we like, and $g$ of the form $g(x)=\epsilon f(cx)$. We choose $f$ to be left-continuous and, by boundedness in probability, such that $\mathbb{E}\left[f(|W|)\right]$ is bounded over $W$ in the set (6) (for example, $f=\sum_m1_{(K_m,\infty)}$ with levels $K_m$ chosen so that $\mathbb{P}(|W|>K_m)\le2^{-m}$ for every $W$ in (6)).
In either case, we can take expectations,
$\displaystyle \mathbb{E}\left[g\left(U_N\right)\right]\le\int\mathbb{E}\left[f\Big(\Big|\sum_{n=1}^Nx_nZ_n\Big|\Big)\right]\,d\mu(x),$
and the right hand side is bounded by some constant K independently of N. Hence, letting N go to infinity and applying Fatou’s lemma on the left gives
$\displaystyle \mathbb{E}\left[g\Big(\Big(\sum_nZ_n^2\Big)^{1/2}\Big)\right]\le K.$
As g is increasing and unbounded, $\left(\sum_nZ_n^2\right)^{1/2}$ is almost surely finite and, when $p>0$, taking $g(x)=c_p^px^p$ shows that it lies in $L^p$, as required. ⬜
The conclusion of theorem 8 directly implies that the sequence $Z_n$ converges to zero in the $L^p$ topology. It was this consequence which was important in my construction of the stochastic integral.

Corollary 9 Let $Z_1,Z_2,\ldots$ be a sequence of random variables such that the set (6) is $L^p$ bounded, for some $0\le p<\infty$. Then, $Z_n\to0$ in $L^p$.

Proof: Since theorem 8 says that $\left(\sum_nZ_n^2\right)^{1/2}$ is in $L^p$ and, in particular, is almost surely finite, we see that $Z_n\to0$ almost surely. This implies that $Z_n\to0$ in $L^0$ and, for $p>0$, as $|Z_n|\le\left(\sum_mZ_m^2\right)^{1/2}\in L^p$, dominated convergence gives $Z_n\to0$ in $L^p$. ⬜
I only made use of the $L^0$ version of corollary 9 in the construction of the stochastic integral, as this is sufficient for the property of bounded convergence in probability. However, the corollary can also be used in its $L^p$ versions to obtain an $L^p$ theory of stochastic integration. For a semimartingale X, we require that the set of elementary integrals
$\displaystyle \Big\{\int_0^t\xi\,dX\colon\xi{\rm\ elementary\ predictable},\ |\xi|\le1\Big\}$
is $L^p$ bounded, for each positive time t. The resulting stochastic integral will then satisfy bounded convergence in $L^p$. This approach dates back to the 1981 paper Stochastic Integration and Lp-Theory of Semimartingales by Bichteler, and is used in his book Stochastic Integration with Jumps.
Note that the statement of corollary 9 makes sense in any topological vector space. So, for any such space V, we can ask whether, for every sequence $Z_n\in V$ such that the set
$\displaystyle \Big\{\sum_{n=1}^N\epsilon_nZ_n\colon N\in\mathbb{N},\ \epsilon_1,\ldots,\epsilon_N=\pm1\Big\}$
is bounded, $Z_n$ necessarily tends to zero. It is not difficult to show that this is true in Hilbert spaces and, more generally, in uniformly convex spaces. However, it does not hold in all spaces. Take, for example, $\ell^\infty$, which is the space of bounded real sequences $x=(x_1,x_2,\ldots)$ under uniform convergence. Then consider $Z_n=(Z_{n1},Z_{n2},\ldots)$ such that $Z_{nk}=1$ for $k=n$ and $Z_{nk}=0$ for $k\ne n$. The sums $\sum_{n=1}^N\epsilon_nZ_n$ all have uniform norm equal to 1, so form a bounded set. However, $\Vert Z_n\Vert_\infty=1$ does not tend to zero.
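The example is easily checked directly; a tiny sketch (with an arbitrary truncation N, purely illustrative):

```python
# The l^infinity example: Z_n is the n'th standard unit vector. Every signed partial
# sum has uniform norm exactly 1, yet ||Z_n|| = 1 does not tend to zero.
import itertools
import numpy as np

N = 6
for eps in itertools.product([-1, 1], repeat=N):
    partial = np.array(eps)        # sum_{n<=N} eps_n Z_n has coordinates eps_1,...,eps_N,0,0,...
    assert np.max(np.abs(partial)) == 1
print("all signed partial sums have uniform norm 1")
```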
The $L^p$ spaces for $0\le p\le1$ fail many of the usual `nice’ properties that are often required when working with topological vector spaces. For example, $L^1$ is generally not uniformly convex and, for $p<1$, $L^p$ is not even locally convex. So, corollary 9 gives a fundamental and nontrivial property that the $L^p$ spaces do satisfy, and which is sufficient to ensure a well behaved theory of stochastic integration.
Optimal Khintchine Constants
In the discussion above, I paid no concern to the optimal values of the constants $c_p,C_p$ in the Khintchine inequality. All that we were concerned with is that finite and positive constants do exist. However, the inequality is clearly improved if $C_p$ can be made as small as possible, and $c_p$ is as large as possible. In fact, the optimal values of these constants are known. I will not give full proofs here, but the actual values are enlightening, so I will mention them, and prove the `easy’ case of $C_p$ over the range $p\ge3$.
Use $\gamma_p$ to denote the p‘th absolute Gaussian moment, normalized by taking its p‘th root, which can be computed explicitly in terms of the gamma function. Letting N be a standard normal random variable, on some probability space,
$\displaystyle \gamma_p=\mathbb{E}\left[|N|^p\right]^{1/p}=\sqrt{2}\left(\frac{\Gamma\left(\frac{p+1}{2}\right)}{\sqrt{\pi}}\right)^{1/p}.$
It is straightforward (using Jensen’s inequality) to show that $\gamma_p$ is strictly increasing in p, and $\gamma_2=1$. So, $\gamma_p\le1$ for $p\le2$ and $\gamma_p\ge1$ for $p\ge2$.
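For illustration, the gamma-function expression for $\gamma_p$ can be evaluated and compared against simulation; the sample size and values of p below are arbitrary choices.

```python
# Evaluate gamma_p = E[|N|^p]^(1/p) = sqrt(2) * (Gamma((p+1)/2)/sqrt(pi))^(1/p)
# and compare with a Monte Carlo estimate (illustration only).
import numpy as np
from scipy.special import gamma as Gamma

def gamma_p(p):
    return np.sqrt(2.0) * (Gamma((p + 1) / 2) / np.sqrt(np.pi)) ** (1.0 / p)

rng = np.random.default_rng(2)
N = rng.standard_normal(10**6)
for p in [0.5, 1.0, 2.0, 3.0, 4.0]:
    mc = np.mean(np.abs(N)**p)**(1.0/p)
    print(f"p={p}: gamma_p={gamma_p(p):.4f}, Monte Carlo={mc:.4f}")
# gamma_p is increasing in p with gamma_2 = 1 and gamma_4 = 3**0.25.
```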
It turns out that, over $p\ge2$, moments of Rademacher series are bounded above by the corresponding Gaussian moments.

Theorem 10 The optimal Khintchine constants are,
$\displaystyle C_p=\max\left(1,\gamma_p\right),\qquad c_p=\min\left(1,\gamma_p,2^{\frac12-\frac1p}\right).$ (7)
The first thing to note is that, for any p, it is always possible to achieve the bounds given in (7) or, at least, approximate them as closely as we like. Specifically, consider $\Vert\sum_na_nX_n\Vert_p/\Vert a\Vert_2$ with $a\in\ell^2$ in the following cases.
- If $a=(1,0,0,\ldots)$ then $\Vert\sum_na_nX_n\Vert_p=\Vert X_1\Vert_p=\Vert a\Vert_2$.
- If $a=(1,1,0,0,\ldots)$ then $\Vert\sum_na_nX_n\Vert_p=\Vert X_1+X_2\Vert_p=2^{\frac12-\frac1p}\Vert a\Vert_2$.
- By lemma 6 of the post on Rademacher series, if we let $\sup_n|a_n|/\Vert a\Vert_2$ go to zero then $\Vert\sum_na_nX_n\Vert_p/\Vert a\Vert_2\to\gamma_p$.
This shows that, if the Khintchine inequality does hold with the constants in (7), then they are optimal. We already noted above that the inequality holds with $C_p=1$ for $p\le2$ and $c_p=1$ for $p\ge2$, so this is optimal. The remaining cases are,
$\displaystyle C_p=\gamma_p\ \ (p\ge2),\qquad c_p=\min\left(\gamma_p,2^{\frac12-\frac1p}\right)\ \ (p\le2).$
Incidentally, we already showed that $C_4=3^{1/4}=\gamma_4$ in the process of proving lemma 5. A similar method, simply expanding out the power of the series, can be applied for all even integer p.
Lemma 11 For each even integer $p\ge2$, the Khintchine inequality holds with $C_p=\gamma_p$.

Proof: Writing $S_N=\sum_{n=1}^Na_nX_n$, expanding the power of the sum gives
$\displaystyle \mathbb{E}\left[S_N^p\right]=\sum_{n_1,\ldots,n_p\le N}a_{n_1}\cdots a_{n_p}\,\mathbb{E}\left[X_{n_1}\cdots X_{n_p}\right].$ (8)
For the product of the X‘s, by collecting together equal terms, we can write
$\displaystyle X_{n_1}X_{n_2}\cdots X_{n_p}=X_{m_1}^{r_1}X_{m_2}^{r_2}\cdots X_{m_k}^{r_k}$
where the $m_1,\ldots,m_k$ are distinct. By symmetry in switching the sign of $X_{m_j}$, if any of the powers $r_j$ are odd, then this will have zero expectation. On the other hand, if all of the powers are even, then it is equal to 1.
In order to simplify relating this sum to the Gaussian moments, we employ a devious trick. Let $Y_1,Y_2,\ldots$ be an IID sequence of standard normal random variables defined on some probability space. As above, $Y_{m_1}^{r_1}\cdots Y_{m_k}^{r_k}$ will have zero mean if any of the powers are odd and, if they are all even then,
$\displaystyle \mathbb{E}\left[Y_{m_1}^{r_1}\cdots Y_{m_k}^{r_k}\right]=\prod_{j=1}^k\mathbb{E}\left[Y_{m_j}^{r_j}\right]=\prod_{j=1}^k\gamma_{r_j}^{r_j}\ge1.$
In any case, if we set $T_N=\sum_{n=1}^Na_nY_n$ then taking expectations of (8) gives,
$\displaystyle \mathbb{E}\left[S_N^p\right]\le\sum_{n_1,\ldots,n_p\le N}a_{n_1}\cdots a_{n_p}\,\mathbb{E}\left[Y_{n_1}\cdots Y_{n_p}\right]=\mathbb{E}\left[T_N^p\right]$
(note that, for the surviving terms, the coefficient $a_{n_1}\cdots a_{n_p}=\prod_ja_{m_j}^{r_j}$ is nonnegative). As sums of independent normals are normal, $T_N$ is normal with variance $\sum_{n\le N}a_n^2$,
$\displaystyle \mathbb{E}\left[S_N^p\right]\le\mathbb{E}\left[T_N^p\right]=\gamma_p^p\Big(\sum_{n\le N}a_n^2\Big)^{p/2}\le\gamma_p^p\Vert a\Vert_2^p.$
Finally, letting N go to infinity and applying Fatou’s lemma on the left hand side gives the result. ⬜
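Again purely for illustration, the even-moment bound can be checked exactly for a finite sum by enumerating sign patterns; the coefficients below are arbitrary.

```python
# Exact check (illustration only) that E[S_N^p] <= gamma_p^p * ||a||_2^p for even p,
# by enumerating all sign patterns of a finite Rademacher sum.
import itertools
import numpy as np
from scipy.special import gamma as Gamma

a = np.array([1.0, 0.8, 0.5, 0.3, 0.2])
signs = np.array(list(itertools.product([-1, 1], repeat=a.size)))
S = signs @ a
norm_a = np.sqrt(np.sum(a**2))

for p in [2, 4, 6, 8]:
    gauss_moment = 2**(p/2) * Gamma((p + 1) / 2) / np.sqrt(np.pi)   # E[|N|^p] = gamma_p^p
    print(p, np.mean(S**p), gauss_moment * norm_a**p)               # left value <= right value
```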
I extend this result to all real $p\ge3$. As we can no longer simply expand out the power of the sum, a convexity argument employing Jensen’s inequality will be used.

Lemma 12 For each $p\ge3$, the function
$\displaystyle f(x)=\left|1+\sqrt{x}\right|^p+\left|1-\sqrt{x}\right|^p$
is convex on the nonnegative reals.
Proof: Write $u=\sqrt{x}$, so that $f(x)=(1+u)^p+|1-u|^p$. Differentiating twice, and using $du/dx=1/(2u)$ and $\frac{d}{du}\left({\rm sgn}(1-u)|1-u|^{p-1}\right)=-(p-1)|1-u|^{p-2}$, gives
$\displaystyle f''(x)=\frac{p}{4u^3}\Big((1+u)^{p-2}\left((p-2)u-1\right)+|1-u|^{p-2}\left((p-2)u+1\right)\Big).$
Note that both terms on the right are nonnegative so long as $(p-2)u\ge1$. On the other hand, for $(p-2)u<1$ (so that, in particular, $u<1$), compute the derivative
$\displaystyle \frac{d}{du}\Big((1-u)^{p-2}\left(1+(p-2)u\right)-(1+u)^{p-2}\left(1-(p-2)u\right)\Big)=(p-1)(p-2)u\left((1+u)^{p-3}-(1-u)^{p-3}\right).$
As $p\ge3$, this is nonnegative, so
$\displaystyle (1-u)^{p-2}\left(1+(p-2)u\right)\ge(1+u)^{p-2}\left(1-(p-2)u\right)$
(with equality at $u=0$) and substituting into the expression for $f''(x)$ gives
$\displaystyle f''(x)\ge0$
as required. ⬜
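As a numerical sanity check (purely illustrative, not a proof), the convexity can be tested via second differences on a grid; the grid and the values of p below are arbitrary choices.

```python
# Numerical check (illustration only) that f(x) = |1+sqrt(x)|^p + |1-sqrt(x)|^p
# has nonnegative second differences on a grid when p >= 3.
import numpy as np

def f(x, p):
    s = np.sqrt(x)
    return np.abs(1 + s)**p + np.abs(1 - s)**p

x = np.linspace(0.0, 9.0, 2001)
for p in [3.0, 3.5, 5.0]:
    y = f(x, p)
    second_diff = y[2:] - 2*y[1:-1] + y[:-2]
    print(p, second_diff.min() >= -1e-9)    # True: convex on the grid, up to rounding
```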
Unfortunately, for $2<p<3$, the function in lemma 12 is not convex, and a different approach is required. I concentrate on the case $p\ge3$ here. We will use a similar trick as in lemma 11 whereby we replace the Rademacher random variables by normals. To simplify the argument a bit, rather than replacing them all in one go, here we will replace them one at a time.

Lemma 13 For each real $p\ge3$, the Khintchine inequality holds with $C_p=\gamma_p$.
Proof: Applying lemma 12, and scaling, the function
$\displaystyle x\mapsto\left|a+\sqrt{x}\right|^p+\left|a-\sqrt{x}\right|^p$
is convex on the nonnegative reals for any real $a$. Hence, if X is a Rademacher random variable and Y is standard normal, then $\mathbb{E}\left[b^2Y^2\right]=b^2$ and Jensen’s inequality gives
$\displaystyle \mathbb{E}\left[|a+bX|^p\right]=\tfrac12\left(|a+b|^p+|a-b|^p\right)\le\tfrac12\,\mathbb{E}\left[\left|a+|b||Y|\right|^p+\left|a-|b||Y|\right|^p\right]=\mathbb{E}\left[|a+bY|^p\right]$
for all real $a,b$, the final equality using the symmetry of Y. Next, if S is any random variable and X, Y are as above, independently of S, then taking expectations conditional on S gives
$\displaystyle \mathbb{E}\left[|S+bX|^p\right]\le\mathbb{E}\left[|S+bY|^p\right].$ (9)
We now consider the finite sum $S_N=\sum_{n=1}^Na_nX_n$. Replacing the Rademacher random variables one-by-one by IID standard normals $Y_1,Y_2,\ldots$, we obtain the sums
$\displaystyle T_n=\sum_{m=1}^na_mY_m+\sum_{m=n+1}^Na_mX_m,\qquad n=0,1,\ldots,N.$
If we also consider the same sum with the n‘th term excluded,
$\displaystyle U_n=\sum_{m=1}^{n-1}a_mY_m+\sum_{m=n+1}^Na_mX_m,$
then, assuming that the sequences X and Y are chosen independent of each other, $U_n$ will be independent of both $X_n$ and $Y_n$. So, applying (9),
$\displaystyle \mathbb{E}\left[|T_{n-1}|^p\right]=\mathbb{E}\left[|U_n+a_nX_n|^p\right]\le\mathbb{E}\left[|U_n+a_nY_n|^p\right]=\mathbb{E}\left[|T_n|^p\right].$
As sums of independent normals are normal, $T_N$ is normal with variance $\sum_{n\le N}a_n^2$. Also, $T_0=S_N$, so we obtain
$\displaystyle \mathbb{E}\left[|S_N|^p\right]\le\mathbb{E}\left[|T_N|^p\right]=\gamma_p^p\Big(\sum_{n\le N}a_n^2\Big)^{p/2}\le\gamma_p^p\Vert a\Vert_2^p.$
Letting N increase to infinity and applying Fatou’s lemma on the left hand side gives the result. ⬜
The optimal left-hand inequality for $0<p<2$ and right-hand inequality for $2<p<3$ require some further techniques, as the function in lemma 12 is neither convex nor concave. I refer to the paper Ball, Haagerup, and Distribution Functions for these cases, and also to The optimal constants in Khintchine’s inequality for the case 2 < p < 3.
Finally, using the fact that $\gamma_p$ and $2^{\frac12-\frac1p}$ are both less than one on the range $0<p<2$, I note that the optimal left-hand inequality is given by theorem 10 as,
$\displaystyle c_p=\min\left(\gamma_p,2^{\frac12-\frac1p}\right),\qquad 0<p<2.$
However, it is not immediately obvious which of the two terms in the minimum is smaller. Writing
$\displaystyle \gamma_p\le2^{\frac12-\frac1p}\iff\mathbb{E}\left[|N|^p\right]\le2^{\frac p2-1},$
we see that we should take $c_p=2^{\frac12-\frac1p}$ whenever
$\displaystyle \mathbb{E}\left[|N|^p\right]\ge2^{\frac p2-1}.$
By log-convexity (lemma 2), there will be a unique $p_0\in(0,2)$ solving
$\displaystyle \mathbb{E}\left[|N|^{p_0}\right]=2^{\frac{p_0}{2}-1}$
and, hence, the expression for $c_p$ can be written more clearly as
$\displaystyle c_p=\begin{cases}2^{\frac12-\frac1p},&0<p\le p_0,\\ \gamma_p,&p_0\le p<2.\end{cases}$
Solving numerically gives $p_0\approx1.847$.
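The value of $p_0$ is easily reproduced numerically; below is a short sketch using SciPy root bracketing (the bracket endpoints are arbitrary choices within (0,2)).

```python
# Numerically solve gamma_p = 2**(1/2 - 1/p) for p in (0,2) (illustration only).
import numpy as np
from scipy.optimize import brentq
from scipy.special import gamma as Gamma

def h(p):
    gamma_p = np.sqrt(2.0) * (Gamma((p + 1) / 2) / np.sqrt(np.pi)) ** (1.0 / p)
    return gamma_p - 2.0 ** (0.5 - 1.0 / p)

p0 = brentq(h, 0.5, 1.99)   # h changes sign on this bracket
print(p0)                    # approximately 1.847
```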
Hi George! You claim that Corollary 9 can be generalized to Hilbert spaces (or more generally, uniformly convex spaces). Do you have a reference of this result? Also, I was wondering if you know about an analogous result for the Bochner space $L^0_H$ of equivalence classes of Hilbert space valued random elements?
Thanks for yet another interesting blog entry!
No reference, this is something I came up with while writing the post. Actually, I did consider if the result holds for all reflexive Banach spaces. Maybe it does, but I am not sure, and did not want to spend too long on what was a side-comment and not the focus of the post. For a Hilbert space it is easy. If the norm of elements of the set
$\displaystyle \Big\{\sum_{n=1}^N\epsilon_nZ_n\colon N\in\mathbb{N},\ \epsilon_n=\pm1\Big\}$
is bounded by K, then
$\displaystyle \sum_{n=1}^N\Vert Z_n\Vert^2=2^{-N}\sum_{\epsilon\in\{\pm1\}^N}\Big\Vert\sum_{n=1}^N\epsilon_nZ_n\Big\Vert^2\le K^2$
and it follows that $\Vert Z_n\Vert\to0$. Uniformly convex spaces are a little trickier. Consider $S_n=\sum_{k=1}^n\epsilon_kZ_k$, where $\epsilon_k\in\{1,-1\}$ are chosen inductively to maximize $\Vert S_n\Vert$ (given the values of $\epsilon_k$ for k less than n). It can be seen that $\Vert S_n\Vert$ is non-decreasing and, as it is bounded by assumption, tends to a limit K (we have not used uniform convexity yet…). The uniform convex property can be used to show that, for each fixed positive $\epsilon$, there exists a $\delta>0$ such that $\Vert S_n\Vert\ge\Vert S_{n-1}\Vert+\delta$ whenever $\Vert Z_n\Vert\ge\epsilon$, which is a contradiction unless $\Vert Z_n\Vert<\epsilon$ for all large n, showing that it converges to zero.
It extends to Bochner spaces (defined wrt a Hilbert space H), you can generalise the Khintchine inequality to a_n lying in an arbitrary Hilbert space, so the result should still hold.
Indeed you are right! Thanks for this nice proof and your insight into these special cases!
In your second equation line, where you bound the sum by $K^2$, should not the first equality be replaced by an inequality? Since you take the sum over all permutations of +-1, you will get some extra terms. Of course, this will change nothing and your elegant argument still holds.
No, it should be an equality, basically it is the parallelogram law extended to arbitrarily many terms, and is an identity in any Hilbert space.
Thanks for another nice post!
Two minor typos:
(1) Proof of lemma 4: inequality should be reversed before “contradicting (3)”.
(2) Proof of lemma 5: the last inequality should be reversed… and it seems the coefficient should be 2?
[George: Fixed, thanks!]
I’m sorry, but I don’t think the proof of the right hand side of Theorem 1 is right. The term $\Vert a \Vert_2$ is over an exponential term, so how does dividing by $\lambda^p$ imply the inequality?
I mean, lambda is an arbitrary positive number, so can be replaced by lambda / ||a||_2, in which case you get the required inequality.
Thank you for this post! I just used your Lemma 2 and the proof therein to upper bound unknown moments of a non-negative random variable via an interpolation of the moments that I know. This saved a proof within the last hours before submitting an article 🙂