Welcome!

Welcome to Absolutely Sure! This is my new blog, and a companion to the already-existing Almost Sure, which is a `random mathematical blog' concentrating on probability theory and stochastic calculus. In contrast, Absolutely Sure will focus on pure mathematics and, more generally, any mathematical content which does not fit into the category of probability theory.

While the potential scope of this new blog is quite wide, encompassing all of mathematics, there are some subjects which I plan to kick off with.

  • The Riemann Zeta function, Dirichlet Series, and L-series.
  • The prime number theorem and Dirichlet’s theorem on primes in an arithmetic progression.
  • The Riemann Hypothesis.
  • p-adic numbers, Valuation Theory, and Adelic numbers.

In particular, I would like to look at approaches to the Riemann hypothesis, which is one of the great unsolved problems of mathematics. It has been solved in the case of zeta functions over function fields, or algebraic varieties over finite fields. We will look at some of the known proofs for function fields, which many researchers have tried to extend to the number field case. In particular, Enrico Bombieri’s proof for the function field case can be understood with just a knowledge of valuation theory, whereas other methods require some algebraic geometry. Although the Riemann hypothesis is unsolved, there are some partial results which we will look at — such as zero-free regions of the zeta function in the critical strip and the proof that a positive proportion of the zeros do lie on the critical line.

Besides the ideas suggested above, there are many different topics which could be covered here.

George Lowther

The Riemann Zeta Function and the Functional Equation

For these initial posts of this blog, I will look at one of the most fascinating objects in mathematics, the Riemann zeta function. This is defined by the infinite sum

\displaystyle  \zeta(s)=\sum_{n=1}^\infty n^{-s}, (1)

which can be shown to be absolutely convergent for all {s\in{\mathbb C}} with {\Re(s) > 1}, and uniformly convergent on each half-plane {\Re(s)\ge\sigma_0} with {\sigma_0 > 1}. It was Bernhard Riemann who showed that it can be analytically continued to the entire complex plane with a single pole at {s=1}, derived a functional equation, and showed how its zeros are closely linked to the distribution of the prime numbers. Riemann’s seminal 1859 paper is still an excellent introduction to this subject; an English translation (On the Number of Prime Numbers less than a Given Quantity) can be found on the website of the Clay Mathematics Institute, which included the conjecture that the `non-trivial’ zeros all lie on the line {\Re(s)=1/2} among its million dollar millennium problems.

In this post I will give a brief introduction to the zeta function and look at its functional equation. In particular, the functional equation can be generalized and reinterpreted as an identity of Mellin transforms, which links the additive Fourier transform on {{\mathbb R}} with the multiplicative Mellin transform on the nonzero reals {{\mathbb R}^*}. The aim is to prove the generalized functional equation and some properties of the zeta function, working from first principles, and discuss at a high level how this relates to the ideas in Tate’s thesis. Some standard complex analysis and Fourier transform theory will be used, but no prior understanding of the Riemann zeta function is assumed.

The zeta function has a long history, going back to the Basel problem which was posed by Pietro Mengoli in 1644. This asked for the exact value of the sum of the reciprocals of the square numbers or, equivalently, the value of {\zeta(2)}. This was eventually solved by Leonhard Euler in 1734, who discovered the famous identity

\displaystyle  \zeta(2)=\frac{\pi^2}{6}.

Euler found the values of {\zeta} at all positive even numbers, although I will not be concerned with this here. More pertinent to the current discussion is the product expression also found by Euler,

\displaystyle  \zeta(s)=\prod_p(1-p^{-s})^{-1}. (2)

The product is taken over all prime numbers {p}, and converges on {\Re(s) > 1}. Proving (2) is straightforward. The formula for summing a geometric series gives

\displaystyle  (1-p^{-s})^{-1}=1+p^{-s}+p^{-2s}+p^{-3s}+\cdots.

Substituting this into (2) and expanding the product gives an infinite sum over terms of the form

\displaystyle  (p_1^{r_1}p_2^{r_2}\cdots p_k^{r_k})^{-s}

for {k\ge0}, primes {p_1 < p_2 < \cdots < p_k}, and integers {r_i > 0}. Using the fact that every positive integer has a unique expression as a product of powers of distinct primes, we see that the Euler product expands as a sum of terms of the form {n^{-s}} as {n} ranges over the positive integers. This is just the right hand side of (1) and shows that the Euler product converges and is equal to {\zeta(s)} whenever the sum (1) is absolutely convergent.
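
As a quick numerical sanity check, not part of the argument, the following Python snippet compares a truncation of the sum (1) with a truncation of the Euler product (2) at {s=2}; both should be close to {\pi^2/6\approx1.6449}. The sieve is hand-rolled to keep the snippet self-contained, and the truncation points are arbitrary.

```python
# Compare a truncated Dirichlet series (1) with a truncated Euler product (2).
def primes_up_to(n):
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = [False] * len(sieve[p * p :: p])
    return [p for p, is_prime in enumerate(sieve) if is_prime]

s = 2.0
dirichlet = sum(n ** -s for n in range(1, 100001))   # sum (1), truncated
euler = 1.0
for p in primes_up_to(1000):                         # product (2), truncated
    euler /= 1 - p ** -s
print(dirichlet, euler)   # both are close to pi^2/6 = 1.6449...
```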

The Euler product provides a link between the zeta function and the prime numbers, with far-reaching consequences. For example, the prime number theorem describing the asymptotic distribution of the prime numbers was originally proved using the Euler product, and the strongest known error terms available for this theorem still rely on the link between the prime numbers and the zeta function given by (2). Euler used the fact that (1) diverges at {s=1} to argue that (2) also diverges at {s=1}. From this, it is immediately deduced that there are infinitely many primes and, more specifically, the reciprocals of the primes sum to infinity.

The Euler product can also be expressed in terms of the logarithm of the zeta function. Using the Taylor series expansion of {\log(1-p^{-s})}, we obtain

\displaystyle  \log\zeta(s)=\sum_p\sum_{k=1}^\infty \frac{p^{-ks}}{k}. (3)

As the terms on the right hand side are bounded by {\lvert n^{-s}\rvert} as {n} runs through the subset of the natural numbers consisting of prime powers, it will be absolutely convergent whenever (1) is. In particular, (3) converges on the half-plane {\Re(s) > 1}. Although the complex logarithm is generally only defined up to integer multiples of {2\pi i}, (3) gives the unique continuous version of {\log\zeta(s)} over {\Re(s) > 1} which takes real values on the real line.

We will first look at the zeta functional equation as described by Riemann. This involves the gamma function defined on {\Re(s) > 0} by the absolutely convergent integral

\displaystyle  \Gamma(s)=\int_0^\infty x^{s-1}e^{-x}\,dx. (4)

This is easily evaluated at {s=1} to get {\Gamma(1)=1}, and an integration by parts gives the functional equation of {\Gamma},

\displaystyle  \Gamma(s+1)=s\Gamma(s).

This can be used to evaluate the gamma function at the positive integers, {\Gamma(n)=(n-1)!}. Also, by expressing {\Gamma(s)} in terms of {\Gamma(s+1)}, it allows us to extend {\Gamma(s)} to a meromorphic function on {\Re(s) > -1} with a single simple pole at {s=0}. Repeatedly applying this idea extends {\Gamma(s)} as a meromorphic function on the entire complex plane with a simple pole at each non-positive integer. Furthermore, it is known that {\Gamma(s)} is non-zero everywhere on {{\mathbb C}}.
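
The continuation procedure just described is simple enough to carry out numerically. Below is a minimal sketch in Python using the mpmath library, where the built-in gamma function serves only as an independent check, and the number of shifts is an arbitrary choice.

```python
from mpmath import mp, mpc, quad, exp, gamma

mp.dps = 20

def gamma_continued(s, shifts=5):
    # Gamma(s) = Gamma(s + shifts) / (s(s+1)...(s+shifts-1)), where the
    # numerator is given by the integral (4), convergent as Re(s + shifts) > 0
    integral = quad(lambda x: x ** (s + shifts - 1) * exp(-x), [0, mp.inf])
    denom = 1
    for k in range(shifts):
        denom *= s + k
    return integral / denom

s = mpc(-1.5, 2.0)                   # Re(s) < 0, outside the domain of (4)
print(gamma_continued(s), gamma(s))  # the two values should agree
```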

The functional equation can now be stated as follows.

Theorem 1 (Riemann) The function {\zeta(s)} defined by (1) uniquely extends to a meromorphic function on {{\mathbb C}} with a single simple pole at {s=1} of residue {1}. Setting

\displaystyle  \Lambda(s)=\pi^{-s/2}\Gamma(s/2)\zeta(s) (5)

this satisfies the identity

\displaystyle  \Lambda(s)=\Lambda(1-s). (6)

Riemann actually gave two independent proofs of this, the first using contour integration and the second using an identity of Jacobi. Many alternative proofs have been discovered since, with Titchmarsh listing seven (The Theory of the Riemann Zeta Function, 1986, second edition). I will not replicate these here, but will show an alternative formulation as an identity of Mellin transforms, from which Riemann’s functional equation (6) follows as a special case.

As an example of the use of the functional equation to derive properties of the zeta function on the left half-plane {\Re(s)\le0}, we evaluate {\zeta(0)}. Using the special value {\Gamma(1/2)=\sqrt{\pi}} and the fact that {\zeta(s)} has a pole of residue {1} at {s=1}, we see that {\Lambda(s)\sim1/(s-1)} as {s} approaches {1}. Similarly, using the fact that the gamma function has a pole of residue {1} at {s=0}, we see that {\Lambda(s)\sim2\zeta(0)/s} as {s} approaches {0}. Putting these limits into the functional equation gives

\displaystyle  \zeta(0)=-1/2.

Similarly, the functional equation expresses the values of {\zeta} at negative odd integers in terms of its values at positive even integers. For example, taking {s=-1},

\displaystyle  \pi^{1/2}\Gamma(-1/2)\zeta(-1)=\pi^{-1}\Gamma(1)\zeta(2).

Plugging in the values {\Gamma(-1/2)=-2\sqrt{\pi}}, {\Gamma(1)=1} and Euler’s value of {\zeta(2)=\pi^2/6},

\displaystyle  \zeta(-1)=-\frac{1}{12}.

Via a process known as zeta function regularization, these special values of {\zeta} are sometimes written as the famous, but rather confusing, expressions

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} &\displaystyle 1+1+1+1+\cdots = -\frac12,\smallskip\\ &\displaystyle 1+2+3+4+\cdots = -\frac1{12}. \end{array}
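
These special values, and the functional equation itself, are easy to check numerically. A short sketch using mpmath's built-in zeta and gamma functions (so this verifies consistency rather than proving anything), at an arbitrary test point:

```python
from mpmath import mp, mpc, zeta, gamma, pi

mp.dps = 25
print(zeta(0), zeta(-1))    # -0.5 and -0.08333... = -1/12

def Lam(s):
    # the completed zeta function (5)
    return pi ** (-s / 2) * gamma(s / 2) * zeta(s)

s = mpc(0.3, 7.1)           # an arbitrary test point
print(Lam(s) - Lam(1 - s))  # ~ 0, as required by (6)
```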

Next, using standard properties of the gamma function, theorem 1 can be used to investigate the zeros of the zeta function. The Euler product implies that {\zeta(s)} has no zeros on {\Re(s) > 1}, as I will show below, and it is well known that the gamma function has no zeros at all. So, {\Lambda(s)} has no zeros or poles anywhere on {\Re(s) > 1}, and the functional equation extends this statement to {\Re(s) < 0}. It follows that, on {\Re(s) < 0}, the zeros of the zeta function must cancel with the poles of {\Gamma(s/2)}, which are at the negative even integers.

On the strip {0\le\Re(s)\le1}, the precise locations of the zeros of {\zeta(s)} are not known. However, as {\Gamma(s/2)} has no poles or zeros on this domain (other than the pole at {s=0}), they must coincide with the zeros of {\Lambda(s)}. From the definition (1) of the zeta function, it satisfies {\zeta(\bar s)=\overline{\zeta(s)}} (using a bar to denote complex conjugation). So, its zeros are preserved by the reflection {s\mapsto\bar s} about the real line. Also, by the functional equation, they are preserved by the map {s\mapsto1-s} in the aforementioned strip. We have arrived at the following.

Lemma 2 The function {\zeta(s)} has zeros at the (strictly) negative even integers. The only remaining zeros lie in the vertical strip {0\le\Re(s)\le1} and are preserved by the maps

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} &\displaystyle s\mapsto \bar s,\\ &\displaystyle s\mapsto 1-s. \end{array} (7)

The zeros at the negative even integers are called the trivial zeros of {\zeta}, with the remaining ones referred to as the non-trivial zeros. The domain {0\le\Re(s)\le1} is known as the critical strip. So, the non-trivial zeros of the Riemann zeta function are precisely those lying in the critical strip, and are the same as the zeros of the function {\Lambda(s)} defined by (5). The vertical line {\Re(s)=\frac12} lying along the center of the critical strip is called the critical line. Then, (7) says that the non-trivial Riemann zeta zeros are symmetric under reflection about both the real line and the critical line. The Riemann hypothesis, as originally conjectured by Riemann in his 1859 paper, states that the non-trivial zeros all lie on the critical line. This, however, remains unknown and is one of the great open problems of mathematics.

We will now move on to the alternative interpretation of the functional equation relating additive Fourier transforms with multiplicative transforms. We will use the following convention for the Fourier transform of a function {f\colon{\mathbb R}\rightarrow{\mathbb C}},

\displaystyle  \hat f(y)=\int_{-\infty}^\infty e^{-2\pi ixy}f(x)\,dx. (8)

For this to make sense, it should at least be required that {f} is integrable. I will restrict to the particularly nice class of Schwartz functions. These are the infinitely differentiable functions from {{\mathbb R}} to {{\mathbb C}} which, along with their derivatives to all orders, vanish faster than polynomially at infinity. That is, {x^mf^{(n)}(x)\rightarrow0} as {\lvert x\rvert\rightarrow\infty}, for all integers {m,n\ge0}. Denote the space of Schwartz functions by {\mathcal S}. Schwartz functions are integrable and it is known that their Fourier transforms are again in {\mathcal{S}}. Then, for any {f\in\mathcal S}, its Fourier transform is inverted by

\displaystyle  f(x)=\int_{-\infty}^\infty e^{2\pi ixy}\hat f(y)\,dy.

I’ll explain now why the Fourier transform (8) is an additive transform of {f}. For each fixed {y}, the map {x\mapsto e^{2\pi ixy}} is a continuous homomorphism from the additive group of real numbers to the multiplicative group of nonzero complex numbers {{\mathbb C}^*},

\displaystyle  e^{2\pi i(x_1+x_2)y}=e^{2\pi ix_1y}e^{2\pi ix_2y}.

So, {x\mapsto e^{2\pi ixy}} is a character of the reals under addition. Furthermore, integration is invariant under additive translation,

\displaystyle  \int_{-\infty}^\infty f(x)\,dx=\int_{-\infty}^\infty f(x+a)\,dx.

That is, the standard (Riemann or Lebesgue) integral is the Haar measure of the additive group of reals, and the Fourier transform (8) is the integral of {f(x)} against additive characters with respect to the additive Haar measure.

The Mellin transform of {f\colon{\mathbb R}\rightarrow{\mathbb C}} is

\displaystyle  M(f,s)=\int_{-\infty}^\infty f(x)\lvert x\rvert^{s-1}\,dx, (9)

which is defined for any {s\in{\mathbb C}} for which the integral is absolutely convergent. (This differs slightly from the usual definition where a lower limit of {0} is used for the integral. See the note on Mellin transforms at the end of this post.) Now, the map {x\mapsto\lvert x\rvert^{s}} is a continuous homomorphism from the multiplicative group of nonzero reals {{\mathbb R}^*} to {{\mathbb C}^*},

\displaystyle  \lvert x_1x_2\rvert^s=\lvert x_1\rvert^s\lvert x_2\rvert^s.

Denoting {d^*x=dx/\lvert x\rvert}, integration with respect to {d^*x} is invariant under multiplicative rescaling by any {a\in{\mathbb R}^*},

\displaystyle  \int_{-\infty}^{\infty} f(ax)\,d^*x=\int_{-\infty}^\infty f(x)\,d^*x.

That is, {\int\cdot\,d^*x} is the multiplicative Haar measure on {{\mathbb R}^*}, and the Mellin transform is the integral of {f(x)} against multiplicative characters with respect to the multiplicative Haar measure. This explains why the Fourier transform (8) is additive and the Mellin transform (9) is multiplicative.

For an explicit example of a Mellin transform, consider {f(x)=e^{-\pi x^2}},

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} \displaystyle M(f,s)&\displaystyle=\int_{-\infty}^\infty \lvert x\rvert^{s-1}e^{-\pi x^2}\,dx\smallskip\\ &\displaystyle=\int_0^\infty \pi^{-s/2}y^{s/2-1}e^{-y}\,dy. \end{array}

Here, the substitution {y=\pi x^2} was used. Comparing with the definition (4) of the gamma function,

\displaystyle  M(f,s)=\pi^{-s/2}\Gamma(s/2). (10)
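
Identity (10) can be confirmed by direct numerical integration; a quick sketch with mpmath, splitting the integral at the singular point {x=0} and using an arbitrary test point:

```python
from mpmath import mp, mpc, quad, exp, gamma, pi, fabs

mp.dps = 20
s = mpc(1.7, 0.6)   # an arbitrary point with Re(s) > 0

# the Mellin transform (9) of the Gaussian, integrated over the real line
mellin = quad(lambda x: exp(-pi * x ** 2) * fabs(x) ** (s - 1),
              [-mp.inf, 0, mp.inf])
print(mellin, pi ** (-s / 2) * gamma(s / 2))  # should agree, as in (10)
```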

For Schwartz functions, the integral defining the Mellin transform is absolutely convergent on the right half-plane {\Re(s) > 0}, and can be analytically continued to the entire complex plane.

Theorem 3 If {f\in\mathcal{S}}, then {M(f,s)} is well defined over {\Re(s) > 0}, and uniquely extends to a meromorphic function on {{\mathbb C}} with only simple poles at the non-positive even numbers {-2n}, with residue

\displaystyle  {\rm Res}({M(f,\cdot)},-2n)=2\frac {f^{(2n)}(0)}{(2n)!}.

I’ll give a proof of theorem 3 below. For now, we will move straight on to the statement of the functional equation relating the Mellin transform of {f} to that of its Fourier transform {\hat f}.

Theorem 4 If {f\in\mathcal{S}} then {M(f,s)\zeta(s)} extends to a meromorphic function with only simple poles at {0} and {1} of residue {-f(0)} and {\hat f(0)} respectively. The functional equation

\displaystyle  M(f,s)\zeta(s)=M(\hat f,1-s)\zeta(1-s) (11)

holds everywhere.

A proof of this will be given further down. It can be shown that the specific function {f(x)=e^{-\pi x^2}} is its own Fourier transform, {\hat f=f}. So, using expression (10) for the Mellin transform {M(f,s)}, we see that Riemann’s functional equation follows directly from (11).
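
The claimed self-duality of the Gaussian is easy to check numerically, evaluating the Fourier integral (8) directly; a minimal sketch using mpmath, with the Gaussian decay controlling the tails:

```python
from mpmath import mp, mpc, quad, exp, pi, mpf

mp.dps = 15
y = mpf("0.8")   # an arbitrary test point

def f(x):
    return exp(-pi * x ** 2)

# the Fourier transform (8), evaluated by quadrature
fhat = quad(lambda x: exp(-2 * pi * mpc(0, 1) * x * y) * f(x),
            [-mp.inf, mp.inf])
print(fhat, f(y))   # agree, so f is its own Fourier transform
```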

Above, we discussed how Riemann’s functional equation allows values of {\zeta(s)} to be determined on the left half-plane {\Re(s)\le0} and restricts the locations of its zeros to be as described in lemma 2. This made use of properties of the gamma function, specifically the locations of its poles and the fact that it has no zeros. These arguments can instead be made using version (11) of the functional equation, and the gamma function need not be referred to at all. For an arbitrary Schwartz function {f}, theorem 3 gives the poles of {M(f,s)}. Also, by choosing {f} with compact support in {{\mathbb R}^*}, {M(f,s)} will be well-defined everywhere by (9) and is analytic. It is also easy to choose {f} such that the Mellin transform does not vanish at any specified point, which is enough to apply the arguments above.

I now briefly consider the relation between the functional equation in the form (11) and the ideas of John Tate’s 1950 thesis. Theorem 4 can be viewed primarily as relating the Mellin transform of the Fourier transform to the Mellin transform of the original function. The zeta function plays more of an ancillary role as a multiplicative factor in this identity. This treatment of the Mellin transform as the primary object of interest was taken much further in Tate’s thesis. Tate refocussed attention from the rational and real number fields (or algebraic number field) to the larger ring of adeles, {\mathbb A}. This is outside of the scope of this post, but the important properties are that, just like the embedding of the rationals inside the reals, {{\mathbb Q}} embeds in {\mathbb A}, and the theory of Fourier and Mellin transforms extends to functions defined on the adeles. In the adelic case, he obtained the functional equation

\displaystyle  M(f,s)=M(\hat f,1-s). (12)

Now, the zeta function does not appear at all! Digging a bit deeper, the ring of adeles over the rational numbers can be expressed as a product of the reals and the ring of `finite’ adeles,

\displaystyle  \mathbb A ={\mathbb R}\times\mathbb A_f.

That is, every element {x} of the adelic ring can be expressed as a pair {(x_\infty,x_f)} consisting of a real number {x_\infty} and a finite adele {x_f}. For a function {f\colon\mathbb A\rightarrow{\mathbb C}} which is a product of the real and finite parts, {f(x)=g(x_\infty)h(x_f)}, the Mellin and Fourier transforms are also products of the transforms of the individual components.

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} &\displaystyle \hat f(x)=\hat g(x_\infty)\hat h(x_f),\smallskip\\ &\displaystyle M(f,s)=M(g,s)M(h,s). \end{array}

Applying this to the adelic functional equation (12),

\displaystyle  M(g,s)M(h,s)=M(\hat g,1-s)M(\hat h,1-s).

Just as the special case {g(x)=e^{-\pi x^2}} led to the gamma factor in Riemann’s functional equation, choosing a particular example for the function {h} on the finite adeles which equals its own Fourier transform leads to the appearance of the zeta function in (5) and (11). This places the gamma term and the zeta term in the functional equation on a roughly equal footing.

We can go further. The finite adeles can be broken down into a restricted product of fields corresponding to each prime number — the {p}-adic numbers,

\displaystyle  \mathbb A_f = {\prod_p}^\prime{\mathbb Q}_p.

If the function {h(x_f)} factors into a product of functions on the {p}-adic components, {h(x_f)=\prod_ph_p(x_p)}, then the Mellin transform commutes with this factorisation,

\displaystyle  M(h,s)=\prod_pM(h_p,s).

For a particular choice of {h}, specifically the indicator function of the integral adeles, this is just the Euler product described above. From this viewpoint, the gamma factor in Riemann’s functional equation, the Mellin transform appearing in (11), the Riemann zeta function, and the {(1-p^{-s})^{-1}} terms in the Euler product, are all just manifestations of factorizations of the Mellin transform on the adeles, which itself satisfies the functional equation (12).

Finally, we note that there is an intimate connection between additive and multiplicative structures pervading the above discussion. The natural numbers, which are generated under addition by the unit element {1}, are also generated under multiplication by the prime numbers. This is reflected in the definition of the zeta function as a sum over the natural numbers (1), which is equivalent to the multiplicative definition given by the Euler product (2). Then, the functional equation (11) ties together the additive Fourier transform over {{\mathbb R}} with the multiplicative Mellin transform over {{\mathbb R}^*}.


Elementary Inequalities

Above, I stated a few results but, now, let’s move on and actually prove a few things regarding the Riemann zeta function. As this post is not assuming any prior understanding of {\zeta(s)}, I start at a very basic level and will derive a few elementary inequalities. By elementary, I mean things which can be proved straight from the definition (1) of the zeta function. These will be rather basic and far from optimal — especially in the critical strip — but are easy to prove and give some understanding of what the zeta function looks like.

First, for any positive real {s}, the function {x\mapsto x^{-s}} is decreasing, giving

\displaystyle  (n+1)^{-s} < x^{-s} < n^{-s}

for any positive integer {n} with {n < x < n +1}. Integrating over {x\ge1} and substituting in the definition of {\zeta(s)} for {s > 1},

\displaystyle  \zeta(s)-1=\sum_{n=1}^\infty(n+1)^{-s} < \int_1^\infty x^{-s}\,dx < \sum_{n=1}^\infty n^{-s}=\zeta(s).

Substituting in the value {1/(s-1)} for the integral gives the following bounds.

Lemma 5 The sum (1) converges absolutely at all real {s > 1}, and satisfies the bound

\displaystyle  \frac1{s-1} < \zeta(s) < \frac s{s-1}. (13)

In particular, {\zeta(s)\sim1/(s-1)} as {s} approaches {1} from above.
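
As a quick sanity check of (13), using mpmath's built-in zeta (so, again, this verifies consistency rather than the proof):

```python
from mpmath import mp, zeta

mp.dps = 15
for s in [1.1, 1.5, 2.0, 5.0]:
    print(1 / (s - 1) < zeta(s) < s / (s - 1))   # True in each case
```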

Moving on to {s\in{\mathbb C}}, we can use the identity {\lvert x^s\rvert=x^\sigma}, where {\sigma} is the real part of {s}, to write,

\displaystyle  \left\lvert\zeta(s)-1\right\rvert\le\sum_{n=2}^\infty\lvert n^{-s}\rvert =\sum_{n=2}^\infty n^{-\sigma}.

Comparing the right hand side with the definition of {\zeta}, we get,

Lemma 6 The sum (1) converges absolutely at all {s\in{\mathbb C}} with {\Re(s) > 1} and satisfies the bound

\displaystyle  \lvert\zeta(s)-1\rvert\le\zeta(\sigma)-1 < \frac1{\sigma-1}

where {\sigma=\Re(s)}.

The right-hand inequality here is just an application of (13). In particular, lemma 6 implies that {\zeta(s)} is uniformly bounded on the half-plane {\Re(s)\ge\sigma_0}, any {\sigma_0 > 1}, with the bound {\sigma_0/(\sigma_0-1)}. It also shows that {\zeta(s)\rightarrow1} uniformly as {\Re(s)\rightarrow\infty}.

Next, the Euler product expansion (2) can be used to show that {\zeta} has no zeros on the open right half-plane {\Re(s) > 1}. Applying the inequality {\lvert 1-x\rvert^{-1} > 1-\lvert x\rvert}, which holds for all {0 < \lvert x\rvert < 1},

\displaystyle  \lvert\zeta(s)\rvert > \prod_p(1-\lvert p^{-s}\rvert)=\prod_p(1-p^{-\sigma})

with {\sigma=\Re(s)}. Noting that the right hand side is just the reciprocal of the Euler product of {\zeta(\sigma)} gives a lower bound.

Lemma 7 The zeta function {\zeta(s)} is nonzero everywhere on the domain {\Re(s) > 1} and satisfies the lower bound,

\displaystyle  \lvert\zeta(s)\rvert > \zeta(\sigma)^{-1} > 1-\sigma^{-1} (14)

with {\sigma=\Re(s)}.

The right-hand inequality here is another direct application of (13). Lemma 7 shows that {\zeta(s)} is uniformly bounded away from zero on the half-plane {\Re(s)\ge\sigma_0}, for any {\sigma_0 > 1}.

Expression (3) for the logarithm of the zeta function can be used to obtain further bounds. On the half-plane {\Re(s)=\sigma > 1}, we use {\lvert p^{-ks}\rvert=p^{-k\sigma}},

\displaystyle  \lvert\log\zeta(s)\rvert\le\sum_p\sum_{k=1}^\infty\frac{p^{-k\sigma}}{k}=\log\zeta(\sigma).

So, we have obtained the following.

Lemma 8 The logarithm of the zeta function over {\Re(s) = \sigma > 1} satisfies the bound

\displaystyle  \lvert\log\zeta(s)\rvert\le\log\zeta(\sigma) < \log(1-\sigma^{-1})^{-1}.

Applying this bound to {-\log\lvert\zeta(s)\rvert} gives (14) as a special case.

Using {\lfloor\cdot\rfloor} to denote the floor function, we can use the equality {\lfloor x\rfloor^{-s}=n^{-s}} for {n\le x < n+1} and each positive integer {n} to rewrite the summation (1) as an integral

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} \displaystyle \zeta(s)&\displaystyle=\int_1^\infty\lfloor x\rfloor^{-s}\,dx\smallskip\\ &\displaystyle=\int_1^\infty x^{-s}\,dx +\int_1^\infty(\lfloor x\rfloor^{-s}-x^{-s})\,dx. \end{array}

Substituting in the value {1/(s-1)} for the first integral on the right hand side gives the following expression for the zeta function,

\displaystyle  \zeta(s)=\frac1{s-1}+\int_1^\infty\left(\lfloor x\rfloor^{-s}-x^{-s}\right)\,dx. (15)

The idea is that the integrand here is small in comparison to {x^{-s}}, so that we can expect it to converge on a larger domain than the sum (1). In fact, as we will show, it is absolutely integrable on {\Re(s) > 0}. As uniform limits of analytic functions are analytic, this will extend {\zeta(s)-1/(s-1)} to an analytic function on this domain.

To bound the integrand in (15), note that {x^{-s}} has the derivative {-sx^{-s-1}} with respect to {x}. Using {\sigma=\Re(s)}, this has norm {\lvert s\rvert x^{-\sigma-1}} and, as it is decreasing in {x}, is bounded above by {\lvert s\rvert\lfloor x\rfloor^{-\sigma-1}}. Hence, the mean value theorem gives the inequality

\displaystyle  \left\lvert\lfloor x\rfloor^{-s}-x^{-s}\right\rvert\le\lvert s\rvert\lfloor x\rfloor^{-\sigma-1}(x-\lfloor x\rfloor),

which will be strict whenever {x} is not an integer. For any positive integer {n}, we have {\lfloor x\rfloor=n} on the interval {[n,n+1)} and,

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} \displaystyle \int_n^{n+1}\left\lvert\lfloor x\rfloor^{-s}-x^{-s}\right\rvert\,dx &\displaystyle < \lvert s\rvert n^{-\sigma-1}\int_n^{n+1}(x-n)\,dx\smallskip\\ &\displaystyle=\frac12\lvert s\rvert n^{-\sigma-1}. \end{array}

Summing over {n} and comparing with the definition of {\zeta(\sigma+1)} gives a finite value, so we have proved the following.

Lemma 9 The zeta function {\zeta(s)} extends to a meromorphic function on {\Re(s) > 0} with a simple pole of residue {1} at {s=1}, and is given by the absolutely convergent integral (15). Furthermore, on this domain, it satisfies the bound

\displaystyle  \left\lvert\zeta(s)-\frac1{s-1}\right\rvert < \frac12\lvert s\rvert\zeta(\sigma+1) < \frac12\lvert s\rvert(1+\sigma^{-1}) (16)

with {\sigma=\Re(s)}.

The final inequality here is yet another application of (13). So, we have a bound for {\zeta(s)} of size {O(\lvert s\rvert)} on the right half-plane {\Re(s)\ge\sigma_0}, for any {\sigma_0 > 0}. This is by no means optimal, and it can be improved to {O(\lvert s\rvert^{1/2})}, with even better bounds given by the — as yet — unproven Lindelöf hypothesis.
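
To see that (15) really does extend the zeta function beyond {\Re(s) > 1}, here is a hedged numerical sketch. Each interval {[n,n+1)} is integrated in closed form via the antiderivative {x^{1-s}/(1-s)}, and the truncation point is an arbitrary choice; the truncation error decays like {N^{-\sigma}}, so only a couple of digits of agreement are expected at this precision.

```python
from mpmath import mp, mpc, zeta

mp.dps = 15
s, N = mpc(0.5, 1.0), 10000   # a point inside the critical strip

total = 1 / (s - 1)
for n in range(1, N):
    # int_n^{n+1} (floor(x)^{-s} - x^{-s}) dx, in closed form
    total += n ** (-s) - ((n + 1) ** (1 - s) - n ** (1 - s)) / (1 - s)
print(total, zeta(s))   # agree up to the truncation error
```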

Applying inequality (16) for real {s} in the interval {0 < s < 1} shows that {\zeta(s)} does not vanish,

\displaystyle  2\zeta(s) < \frac2{s-1}+s(1+s^{-1})=\frac{-1-s^2}{1-s}.

Corollary 10 {\zeta(s) < -1/2}, so is nonzero, on the real line segment {0 < s < 1}.

Interestingly, this bound is optimal, as {\zeta(0)=-1/2}.


Elementary Extension of the Zeta Function

I will now describe an elementary method of analytically continuing the Riemann zeta function to the entire complex plane. Nothing in this section is required for the results discussed above, so it can be skipped if desired. The reason for including it here is to gain an intuitive understanding of why the definition (1) of {\zeta(s)} given on the half-plane {\Re(s) > 1} should continue to all of {{\mathbb C}}, without using any `magic’ formulas such as Poisson summation or the functional equation. Instead, we can use the Euler-Maclaurin formula. Rather than just stating and applying this equation, I will derive it, as it is straightforward to do and gives a better understanding of why the zeta function necessarily extends to the complex plane.

We can apply a similar idea to that which was used to express {\zeta(s)} over {\Re(s) > 0} by identity (15). To do this in more generality, I will look at a sum {\sum_nf(n)} for a smooth function {f}. Some assumptions will be required on {f} in order that the sums and integrals converge, so suppose that its derivatives to all orders are integrable over {[1,\infty)}. This is the case for the Riemann zeta function, where {f(x)=x^{-s}}.

For a differentiable function {u} defined on the interval {[0,1]}, an integration by parts gives

\displaystyle  u(0)=\int\limits_0^1u(x)\,dx+(1+c)(u(0)-u(1))+\int\limits_0^1(x+c)u^\prime(x)\,dx,

for any constant {c}. The idea is to replace {u(x)} by {f(n+x)} in this identity and sum over {n}. In order that {x+c} has average value {0} over the unit interval, we will take {c=-1/2}. So, setting {p_1(x)=x-1/2},

\displaystyle  \sum_{n=1}^\infty f(n)=\int\limits_1^\infty f(x)\,dx+p_1(1)f(1)+\int\limits_1^\infty p_1(\{x\})f^\prime(x)\,dx (17)

with {\{x\}} denoting the fractional part of {x}. That is, {\{x\}=x-n} on the interval {n\le x < n+1}. The hope here is that {f^\prime} is sufficiently smaller than {f}, so that the right-hand integral converges even when the sum on the left diverges.

We take this a step further and express the integral over {f^\prime} as an integral over {f^{\prime\prime}}. Again, consider a function {u} defined on the unit interval and, choosing {p_2} to be the integral of {p_1}, another integration by parts gives

\displaystyle  \int\limits_0^1p_1(x)u(x)\,dx=p_2(1)u(1)-p_2(0)u(0)-\int\limits_0^1p_2(x)u^\prime(x)\,dx.

As {p_1} has zero integral, {p_2(1)=p_2(0)}, so replacing {u(x)} by {f^\prime(x+n)} and summing over {n},

\displaystyle  \int\limits_1^\infty p_1(\{x\})f^\prime(x)\,dx=-p_2(1)f^{\prime}(1)-\int\limits_1^\infty p_2(\{x\})f^{\prime\prime}(x)\,dx.

Again, {p_2} is only defined up to an arbitrary constant, so can be chosen to have zero integral over the unit interval.

We repeat this procedure {r} times and substitute into (17),

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} \displaystyle \sum_{n=1}^\infty f(n)=&\displaystyle\int\limits_1^\infty f(x)\,dx+p_1(1)f(1)-p_2(1)f^\prime(1)+\cdots\smallskip\\ &\displaystyle\quad-(-1)^rp_r(1)f^{(r-1)}(1)-(-1)^r\int\limits_1^\infty p_r(\{x\})f^{(r)}(x)\,dx. \end{array} (18)

Here, {p_{k+1}} is defined as the integral of {p_k} with constant of integration chosen such that it has zero integral over the unit interval,

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} &\displaystyle p_1(x)=x-\frac12,\smallskip\\ &\displaystyle p_{k+1}(x)=\int_0^1yp_k(y)\,dy-\int_x^1p_k(y)\,dy. \end{array}

From this definition, it can be seen that {p_k(x)=B_k(x)/k!} where {B_k(x)} are the Bernoulli polynomials, {p_k(1)=B_k/k!} for Bernoulli numbers {B_k}, and (18) is the Euler-Maclaurin formula.

In particular, the derivatives of {f(x)=x^{-s}} can be computed as

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} \displaystyle f^{(k)}(x)&\displaystyle=(-1)^ks(s+1)\cdots(s+k-1)x^{-s-k}\smallskip\\ &\displaystyle=(-1)^ks^{\overline k}x^{-s-k} \end{array}

with {s^{\overline k}} denoting the rising factorial, which is just a polynomial in {s}.

Taking {f(x)=x^{-s}} for {\Re(s) > 1}, the left hand side of identity (18) is {\zeta(s)} and the first integral on the right is {1/(s-1)}. Applying this to definition (1) of the zeta function,

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} \displaystyle \zeta(s)=&\displaystyle\frac1{s-1}+p_1(1)s^{\overline 0}+p_2(1)s^{\overline 1}+\cdots\smallskip\\ &\displaystyle\quad+p_r(1)s^{\overline{r-1}}-s^{\overline r}\int\limits_1^\infty p_r(\{x\})x^{-s-r}\,dx. \end{array} (19)

As the integral on the right converges absolutely on {\Re(s) > 1-r}, and {r} is an arbitrary positive integer, we have the analytic extension.

Theorem 11 The function {\zeta(s)} defined on {\Re(s) > 1} by (1) continues to a meromorphic function on {{\mathbb C}} with a single simple pole at {s=1} of residue {1}.

Furthermore, the integral on the right of (19) is uniformly bounded over {\Re(s)\ge\alpha}, for any {\alpha > 1-r}, and the {s^{\overline k}} terms are polynomials, so we have the following bound on the growth of the zeta function.

Lemma 12 For every real number {\alpha}, there exists an {A} such that

\displaystyle  \zeta(s)=O\left(\lvert s\rvert^A\right)

over {\Re(s)\ge\alpha}, as {\lvert s\rvert\rightarrow\infty}.

I purposefully did not put in any specific value for {A} here, as the point is that the zeta function is polynomially bounded on each right half-plane and, in any case, sharper values are available from applying the functional equation.
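
To make (19) concrete, here is a sketch implementation using the identification {p_k(x)=B_k(x)/k!} noted above. It relies on mpmath for the Bernoulli polynomials, rising factorials, and quadrature, with the built-in zeta as an independent check; the truncation point of the tail integral is an arbitrary choice.

```python
from mpmath import mp, mpc, quad, bernpoly, factorial, rf, floor, zeta

mp.dps = 15

def zeta_em(s, r=8, X=50):
    # formula (19), valid for Re(s) > 1 - r, with the tail integral
    # truncated at x = X and split at the integers, where p_r({x}) has kinks
    total = 1 / (s - 1)
    for k in range(1, r + 1):
        total += bernpoly(k, 1) / factorial(k) * rf(s, k - 1)
    integrand = lambda x: bernpoly(r, x - floor(x)) / factorial(r) * x ** (-s - r)
    return total - rf(s, r) * quad(integrand, list(range(1, X + 1)))

print(zeta_em(mpc(-2.5, 3.0)), zeta(mpc(-2.5, 3.0)))   # should agree
print(zeta_em(-1))                                     # -1/12, as derived above
```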


Mellin Transforms

We show that the Mellin transform of a Schwartz function {f\in\mathcal S} can be continued from the region {\Re(s) > 0} to the entire complex plane, proving theorem 3. Choosing a positive integer {N}, write

\displaystyle  R_f(x) = f(x)-1_{\{\lvert x\rvert < 1\}}\sum_{n=0}^{N-1} \frac{f^{(n)}(0)}{n!}x^n.

This is bounded, and for {\lvert x\rvert < 1} is just the remainder term in the Taylor polynomial approximation of {f}. By Taylor’s theorem, {R_f(x)=O(\lvert x\rvert^N)} as {x} approaches {0}. So, {R_f(x)\lvert x\rvert^{s-1}} is absolutely integrable on {\Re(s) > -N} and, hence, {M(R_f,s)} is a well-defined analytic function on this domain. The transform of the polynomial term can be computed,

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} \displaystyle M(1_{\{\lvert x\rvert < 1\}}x^n,s) &\displaystyle= \int_{-1}^1 x^n\lvert x\rvert^{s-1}\,dx\smallskip\\ &\displaystyle= (1+(-1)^n)\int_0^1x^{s+n-1}\,dx\smallskip\\ &\displaystyle=\frac{1+(-1)^n}{s+n}. \end{array}

The Mellin transform of {f} is then,

\displaystyle  M(f,s)=M(R_f,s)+2\sum_{n=0}^{N-1}1_{\{n{\rm\ is\ even}\}}\frac{f^{(n)}(0)}{n!}\frac1{s+n}.

This statement holds for {\Re(s) > 0} but, as the right hand side is a well-defined meromorphic function on {\Re(s) > -N}, it extends {M(f,s)} to a meromorphic function on this domain. The poles arise from the {1/(s+n)} terms with the residue stated in theorem 3. By choosing {N} arbitrarily large, we have the extension to the complex plane.
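
As a concrete check of this construction, the sketch below applies it to the Gaussian {f(x)=e^{-\pi x^2}} with {N=4}, evaluating the continued transform at a point with {\Re(s) < 0} and comparing against the closed form (10). Near zero, {R_f} is computed from the Taylor tail of the exponential to avoid cancellation; the switchover point {0.5} is an arbitrary choice.

```python
from mpmath import mp, mpc, quad, exp, gamma, pi, fabs, factorial

mp.dps = 20

def R_f(x):
    # remainder after removing the degree < 4 Taylor polynomial of
    # e^{-pi x^2} on |x| < 1; that polynomial is 1 - pi x^2
    if fabs(x) < 0.5:
        # Taylor tail: e^{-pi x^2} = sum_k (-pi)^k x^{2k} / k!, terms k >= 2
        return sum((-pi) ** k * x ** (2 * k) / factorial(k) for k in range(2, 30))
    r = exp(-pi * x ** 2)
    if fabs(x) < 1:
        r -= 1 - pi * x ** 2
    return r

s = mpc(-1.3, 0.8)   # a point with -4 < Re(s) < 0
M = quad(lambda x: R_f(x) * fabs(x) ** (s - 1), [-mp.inf, -1, 0, 1, mp.inf])
M += 2 / s + 2 * (-pi) / (s + 2)   # pole terms: f(0) = 1, f''(0)/2! = -pi
print(M, pi ** (-s / 2) * gamma(s / 2))   # the continuation of (10)
```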


Poisson Summation

The second proof of the functional equation given by Riemann in his 1859 paper made use of the following identity of Jacobi,

\displaystyle  2\sum_{n=1}^\infty e^{-n^2\pi x}+1=x^{-\frac12}\left(2\sum_{n=1}^\infty e^{-n^2\pi/x}+1\right).

For the Mellin transform version of the functional equation, we make use of the Poisson summation formula. To avoid having to explicitly write limits everywhere, the notation {\sum_n} is used to denote the sum as {n} ranges over the integers {{\mathbb Z}}.

Theorem 13 If {f\in \mathcal S} has Fourier transform {\hat f} then,

\displaystyle  \sum_n f(n)=\sum_n\hat f(n). (20)

Jacobi’s identity is just a special case of this using {f(u)=e^{-u^2\pi x}}. The Poisson summation formula can be proved using Fourier series. The idea is that, for any Schwartz function {f}, we can define a periodic {g\colon{\mathbb R}\rightarrow{\mathbb C}} by

\displaystyle  g(x)=\sum_nf(x+n). (21)

Since {f} and its derivatives vanish rapidly at {\infty}, this sum is uniformly convergent, with smooth limit. Writing out its Fourier expansion,

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} &\displaystyle g(x)=\sum_nc_ne^{2\pi inx}\smallskip\\ &\displaystyle c_n=\int_0^1 g(x)e^{-2\pi i nx}\,dx, \end{array}

the Fourier coefficients can be evaluated,

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} \displaystyle c_n &\displaystyle=\sum_m\int_0^1f(x+m)e^{-2\pi i n x}dx\smallskip\\ &\displaystyle=\sum_m\int_m^{m+1}f(x)e^{-2\pi i n x}\,dx\smallskip\\ &\displaystyle=\int f(x)e^{-2\pi i nx}\,dx=\hat f(n). \end{array}

Setting {x=0} and comparing with (21) then proves theorem 13,

\displaystyle  \sum_nf(n)=g(0)=\sum_n c_n=\sum_n\hat f(n).

In practice, it is convenient to express the Poisson summation formula in a slightly more general way. For each fixed {x\in{\mathbb R}^*}, the Fourier transform of {y\mapsto f(yx)} is equal to {\lvert x\rvert^{-1}\hat f(y/x)} and, putting this in (20), gives the following alternative statement of Poisson summation.

Theorem 14 If {f\in \mathcal S} has Fourier transform {\hat f} then, for any {x\in{\mathbb R}^*},

\displaystyle  \sum_n f(nx)=\frac1{\lvert x\rvert}\sum_n\hat f(n/x). (22)
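
Theorem 14 is easily checked numerically for the self-dual Gaussian {f(x)=e^{-\pi x^2}}, which also recovers Jacobi's identity above; a minimal sketch, with an arbitrary scaling {x}:

```python
from mpmath import mp, mpf, exp, pi, nsum, inf

mp.dps = 25
x = mpf("0.7")   # an arbitrary nonzero scaling

lhs = nsum(lambda n: exp(-pi * (n * x) ** 2), [-inf, inf])
rhs = nsum(lambda n: exp(-pi * (n / x) ** 2), [-inf, inf]) / abs(x)
print(lhs, rhs)   # equal, as in (22)
```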

The Functional Equation

The proof of the functional equation starts with the following identity

\displaystyle  \int f(nx)\lvert x\rvert^s\,d^*x = \lvert n\rvert^{-s}\int f(x)\lvert x\rvert^s\,d^*x.

Here, {n} is any nonzero integer, {f} is a Schwartz function on the reals, {\Re(s) > 0}, and {d^*x=dx/\lvert x\rvert} represents the Haar measure on the multiplicative group {{\mathbb R}^*}. The identity follows simply by replacing {x} with {x/n}. Restricting to {\Re(s) > 1}, we can sum over {n},

\displaystyle  \int \sum_{n\not=0}f(nx)\lvert x\rvert^s\,d^*x = 2M(f,s)\zeta(s). (23)

What we would really like to do here is to simply substitute in (22) for the sum of {f(nx)}, substitute {x} by {x^{-1}} in the integral, and immediately derive the functional equation (11). Unfortunately this leads to divergent sums and integrals. Instead, start by rearranging (22) as

\displaystyle  \sum_{n\not=0}f(nx)=\frac1{\lvert x\rvert}\sum_{n\not=0}\hat f(n/x) + \frac1{\lvert x\rvert}\hat f(0)-f(0).

We will apply this to the integrand in (23), but only over the range with {\lvert x\rvert < 1}.

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} \displaystyle \int\limits_{\lvert x\rvert < 1}\sum_{n\not=0}f(nx)\lvert x\rvert^s d^*x&\displaystyle=\int\limits_{\lvert x\rvert < 1}\left(\sum_{n\not=0}\hat f(n/x)\lvert x\rvert^{s-1}+\hat f(0)\lvert x\rvert^{s-1}-f(0)\lvert x\rvert^s\right)\,d^*x\smallskip\\ &\displaystyle=\int\limits_{\lvert x\rvert > 1}\sum_{n\not=0}\hat f(nx)\lvert x\rvert^{1-s}\,d^*x+2\frac{\hat f(0)}{s-1}-2\frac{f(0)}{s} \end{array}

Here, we substituted {x^{-1}} for {x} in the first term on the right hand side, and used the exact value for the integral in the other two terms. Using this in (23),

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} \displaystyle M(f,s)\zeta(s)=&\displaystyle\frac12\int\limits_{\lvert x\rvert > 1}\sum_{n\not=0}\left(f(nx)\lvert x\rvert^s+\hat f(nx)\lvert x\rvert^{1-s}\right)\,d^*x\smallskip\\ &\displaystyle\qquad+\frac{\hat f(0)}{s-1}-\frac{f(0)}{s}. \end{array} (24)

As {f\in\mathcal S} vanishes faster than any power of {x} at infinity, the sum {\sum_{n\not=0}f(nx)} is absolutely convergent and also vanishes faster than any power of {x}. The same statement holds for {\hat f}, so the integral in (24) is defined for all {s\in{\mathbb C}} and is analytic. This extends {M(f,s)\zeta(s)} to a meromorphic function on the complex plane with poles and residues as stated in theorem 4. Finally, noting that the Fourier transform of {\hat f(x)} is {f(-x)}, the right hand side of (24) is unchanged if {s} is replaced by {1-s} and {f} is replaced by {\hat f}, proving the functional equation (11).
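
Since everything on the right of (24) converges rapidly, the functional equation can be verified numerically. Here is a sketch for the self-dual Gaussian, where {M(f,s)\zeta(s)} should equal {\Lambda(s)} by (5) and (10); the sum and integral are truncated, which is harmless given the {e^{-\pi n^2x^2}} decay, and the test point is arbitrary.

```python
from mpmath import mp, mpc, quad, exp, gamma, zeta, pi

mp.dps = 15
s = mpc(0.5, 2.0)   # an arbitrary test point

def F(x):
    # sum of f(nx) over 0 < |n| <= 10, for f(x) = e^{-pi x^2} = fhat(x)
    return 2 * sum(exp(-pi * (n * x) ** 2) for n in range(1, 11))

# with f = fhat even, the integral in (24) is twice the integral over
# x > 1; the measure d*x contributes the factor 1/x
rhs = quad(lambda x: F(x) * (x ** s + x ** (1 - s)) / x, [1, 10])
rhs += 1 / (s - 1) - 1 / s            # f(0) = fhat(0) = 1
lhs = pi ** (-s / 2) * gamma(s / 2) * zeta(s)
print(lhs, rhs)                       # should agree
```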


A Note on Mellin Transforms

The Mellin transform was defined above as an integral over the real numbers (9), which deviates slightly from the more usual definition as an integral over the positive reals,

\displaystyle  M_{{\mathbb R}^+}(f,s)=\int_0^\infty f(x) x^{s-1}\,dx.

The reason for using the alternative definition is that we were interested in the functional equation relating it to the Fourier transform defined over the reals, so also required the Mellin transform to be defined with the same domain of integration. However, in doing so, we lose some properties, such as the existence of an inversion formula which, for the usual Mellin transform, is

\displaystyle  f(x)=\frac1{2\pi i}\int\limits_{c-i\infty}^{c+i\infty}M_{{\mathbb R}^+}(f,s) x^{-s}\,ds,

for any fixed {c} in the domain where the integral defining the Mellin transform is absolutely convergent.

The transform defined by (9) is unchanged if {f(x)} is replaced by {f(-x)}, so is not one-to-one and cannot be inverted. The best that can be done is to recover the even part of {f},

\displaystyle  f(x)+f(-x)=\frac1{2\pi i}\int\limits_{c-i\infty}^{c+i\infty}M(f,s)\lvert x\rvert^{-s}\,ds.
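
For the Gaussian, where {M(f,s)=\pi^{-s/2}\Gamma(s/2)} by (10), this inversion formula can be checked directly; the integrand decays like {e^{-\pi\lvert t\rvert/4}} along vertical lines, so the quadrature converges quickly. A sketch, with arbitrary choices of {c} and {x}:

```python
from mpmath import mp, mpc, quad, gamma, exp, pi, mpf

mp.dps = 15
c, x = 1, mpf("0.7")   # c = 1 lies in the strip of absolute convergence

def integrand(t):
    s = mpc(c, t)      # parametrize the vertical line s = c + it, ds = i dt
    return pi ** (-s / 2) * gamma(s / 2) * x ** (-s)

val = quad(integrand, [-mp.inf, mp.inf]) / (2 * pi)
print(val, 2 * exp(-pi * x ** 2))   # f(x) + f(-x) = 2 e^{-pi x^2}
```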

An explanation for the non-invertibility of the Mellin transform defined over {{\mathbb R}^*} is that we did not consider the full set of characters. We only looked at characters of the form {x\mapsto\lvert x\rvert^{s}} but, for example, this excludes the function {{\rm sgn}(\cdot)} mapping positive reals to {1} and negative reals to {-1}. More generally, for any {s\in{\mathbb C}} and {\epsilon=\pm1}, a character {\chi_{\epsilon,s}} can be defined by

\displaystyle  \chi_{\epsilon,s}(x)=\begin{cases} \lvert x\rvert^s,&{\rm if\ }\epsilon=1,\\ {\rm sgn}(x)\lvert x\rvert^s,&{\rm if\ }\epsilon=-1. \end{cases} (25)

It is immediate that this is a continuous map from the nonzero reals to {{\mathbb C}^*} satisfying {\chi_{\epsilon,s}(xy)=\chi_{\epsilon,s}(x)\chi_{\epsilon,s}(y)}. In fact, it can be shown that (25) gives the full set of characters on {{\mathbb R}^*}. The characters given by {\epsilon=1}, which we made use of in the discussion above, are precisely those which are trivial on the roots of unity {\{\pm1\}} and are called unramified characters. Those given by {\epsilon=-1} are called ramified.

The Mellin transform with respect to an arbitrary character {\chi\colon{\mathbb R}^*\rightarrow{\mathbb C}^*} is

\displaystyle  M(f,\chi)=\int_{-\infty}^\infty f(x)\chi(x)\,\frac{dx}{\lvert x\rvert}.

The transform defined using the full set of characters can be inverted,

\displaystyle  f(x)=\frac{1}{4\pi i}\int\limits_{c-i\infty}^{c+i\infty}\sum_{\epsilon=\pm1}M(f,\chi_{\epsilon,s})\chi_{\epsilon,s}(x)^{-1}\,ds.

The argument given above, including the proof of the functional equation (11), could have been carried out with the full set of characters. In that case, the zeta function is replaced by

\displaystyle  \zeta(\epsilon,s)=\frac12\sum_{n\not=0}\chi_{\epsilon,s}(n)^{-1}.

For the unramified characters, {\epsilon=1}, this is the usual Riemann zeta function. For ramified characters, {\zeta} equals {0} and the functional equation reduces to the trivial statement {0=0}. So, there was nothing to be gained by including ramified characters in the discussion.

The Gaussian Correlation Inequality

When I first created this blog, the subject of my initial post was the Gaussian correlation conjecture. Using {\mu_n} to denote the standard n-dimensional Gaussian probability measure, the conjecture states that the inequality

\displaystyle  \mu_n(A\cap B)\ge\mu_n(A)\mu_n(B)

holds for all symmetric convex subsets A and B of {{\mathbb R}^n}. By symmetric, we mean symmetric about the origin, so that {-x} is in A if and only if {x} is in A, and similarly for B. The standard Gaussian measure by definition has zero mean and covariance matrix equal to the {n\times n} identity matrix, so that

\displaystyle  d\mu_n(x)=(2\pi)^{-n/2}e^{-\frac12x^Tx}\,dx,

with {dx} denoting the Lebesgue measure on {{\mathbb R}^n}. However, if it holds for the standard Gaussian measure, then the inequality can also be shown to hold for any centered (i.e., zero mean) Gaussian measure.
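
Before moving on, here is a quick Monte Carlo illustration of the inequality in {{\mathbb R}^2}, for two symmetric convex slabs; this is purely illustrative and, of course, proves nothing. The slab widths are arbitrary choices.

```python
import random

random.seed(1)
in_a = in_b = in_ab = 0
trials = 200000
for _ in range(trials):
    x, y = random.gauss(0, 1), random.gauss(0, 1)   # standard Gaussian mu_2
    a = abs(y) < 0.5        # a slab: symmetric about the origin and convex
    b = abs(x + y) < 1.0    # another slab, correlated with the first
    in_a += a
    in_b += b
    in_ab += a and b
print(in_ab / trials, (in_a / trials) * (in_b / trials))  # first >= second
```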

At the time of my original post, the Gaussian correlation conjecture was an unsolved mathematical problem, originally arising in the 1950s and formulated in its modern form in the 1970s. However, in the period since that post, the conjecture has been solved! A proof was published by Thomas Royen in 2014 [7]. This seems to have taken some time to come to the notice of much of the mathematical community. In December 2015, Rafał Latała and Dariusz Matlak published a simplified version of Royen’s proof [4]. Although the original proof by Royen was already simple enough, it did consider a generalisation of the conjecture to a kind of multivariate gamma distribution. The exposition by Latała and Matlak ignores this generality and adds in some intermediate lemmas in order to improve readability and accessibility. Since then, the result has become widely known and, recently, has even been reported in the popular press [10,11]. There is an interesting article on Royen’s discovery of his proof at Quanta Magazine [12], including the background information that Royen was a 67-year-old German retiree who supposedly came up with the idea while brushing his teeth one morning. Dick Lipton and Ken Regan have recently written about the history and eventual solution of the conjecture on their blog [5]. As it has now been shown to be true, I will stop referring to the result as a `conjecture’ and, instead, use the common alternative name — the Gaussian correlation inequality.

In this post, I will describe some equivalent formulations of the Gaussian correlation inequality, or GCI for short, before describing a general method of attacking this problem which has worked for earlier proofs of special cases. I will then describe Royen’s proof and we will see that it uses the same ideas, but with some key differences. Continue reading “The Gaussian Correlation Inequality”

The Projection Theorems

In this post, I introduce the concept of optional and predictable projections of jointly measurable processes. Optional projections of right-continuous processes and predictable projections of left-continuous processes were constructed in earlier posts, with the respective continuity conditions used to define the projection. These are, however, just special cases of the general theory. For arbitrary measurable processes, the projections cannot be expected to satisfy any such pathwise regularity conditions. Instead, we use the measurability criteria that the projections should be, respectively, optional and predictable.

The projection theorems are a relatively straightforward consequence of optional and predictable section. However, due to the difficulty of proving the section theorems, optional and predictable projection is generally considered to be an advanced or hard part of stochastic calculus. Here, I will make use of the section theorems as stated in an earlier post, but leave the proof of those until after developing the theory of projection.

As usual, we work with respect to a complete filtered probability space {(\Omega,\mathcal{F},\{\mathcal{F}_t\}_{t\ge0},{\mathbb P})}, and only consider real-valued processes. Any two processes are considered to be the same if they are equal up to evanescence. The optional projection is then defined (up to evanescence) by the following.

Theorem 1 (Optional Projection) Let X be a measurable process such that {{\mathbb E}[1_{\{\tau < \infty\}}\lvert X_\tau\rvert\;\vert\mathcal{F}_\tau]} is almost surely finite for each stopping time {\tau}. Then, there exists a unique optional process {{}^{\rm o}\!X}, referred to as the optional projection of X, satisfying

\displaystyle  1_{\{\tau < \infty\}}{}^{\rm o}\!X_\tau={\mathbb E}[1_{\{\tau < \infty\}}X_\tau\,\vert\mathcal{F}_\tau] (1)

almost surely, for each stopping time {\tau}.

Predictable projection is defined similarly.

Theorem 2 (Predictable Projection) Let X be a measurable process such that {{\mathbb E}[1_{\{\tau < \infty\}}\lvert X_\tau\rvert\;\vert\mathcal{F}_{\tau-}]} is almost surely finite for each predictable stopping time {\tau}. Then, there exists a unique predictable process {{}^{\rm p}\!X}, referred to as the predictable projection of X, satisfying

\displaystyle  1_{\{\tau < \infty\}}{}^{\rm p}\!X_\tau={\mathbb E}[1_{\{\tau < \infty\}}X_\tau\,\vert\mathcal{F}_{\tau-}] (2)

almost surely, for each predictable stopping time {\tau}.

Continue reading “The Projection Theorems”

Pathwise Regularity of Optional and Predictable Processes

As I have mentioned before in these notes, when working with processes in continuous time, it is important to select a good modification. Typically, this means that we work with processes which are left or right continuous. However, in general, it can be difficult to show that the paths of a process satisfy such pathwise regularity. In this post I show that for optional and predictable processes, the section theorems introduced in the previous post can be used to considerably simplify the situation. Although they are interesting results in their own right, the main application in these notes will be to optional and predictable projection. Once the projections are defined, the results from this post will imply that they preserve certain continuity properties of the process paths.

Suppose, for example, that we have a continuous-time process X which we want to show to be right-continuous. It is certainly necessary that, for any sequence of times {t_n\in{\mathbb R}_+} decreasing to a limit {t}, {X_{t_n}} almost-surely tends to {X_t}. However, even if we can prove this for every possible decreasing sequence {t_n}, it does not follow that X is right-continuous. As a counterexample, if {\tau\colon\Omega\rightarrow{\mathbb R}} is any continuously distributed random time, then the process {X_t=1_{\{t\le \tau\}}} is not right-continuous. However, so long as the distribution of {\tau} has no atoms, X is almost-surely continuous at each fixed time t. It is remarkable, then, that if we generalise to look at sequences of stopping times, then convergence in probability along decreasing sequences of stopping times is enough to guarantee everywhere right-continuity of the process. At least, it is enough so long as we restrict consideration to optional processes.

As usual, we work with respect to a complete filtered probability space {(\Omega,\mathcal{F},\{\mathcal{F}_t\}_{t\ge0},{\mathbb P})}. Two processes are considered to be the same if they are equal up to evanescence, and any pathwise property is said to hold if it holds up to evanescence. That is, a process is right-continuous if and only is it is everywhere right-continuous on a set of probability 1. All processes will be taken to be real-valued, and a process is said to have left (or right) limits if its left (or right) limits exist everywhere, up to evanescence, and are finite.

Theorem 1 Let X be an optional process. Then,

  1. X is right-continuous if and only if {X_{\tau_n}\rightarrow X_\tau} in probability, for each uniformly bounded sequence {\tau_n} of stopping times decreasing to a limit {\tau}.
  2. X has right limits if and only if {X_{\tau_n}} converges in probability, for each uniformly bounded decreasing sequence {\tau_n} of stopping times.
  3. X has left limits if and only if {X_{\tau_n}} converges in probability, for each uniformly bounded increasing sequence {\tau_n} of stopping times.

The `only if’ parts of these statements are immediate, since convergence everywhere trivially implies convergence in probability. The importance of this theorem is in the `if’ directions. That is, it gives sufficient conditions to guarantee that the sample paths satisfy the respective regularity properties.

Note that conditions for left-continuity are absent from the statements of Theorem 1. In fact, left-continuity does not follow from the corresponding property along sequences of stopping times. Consider, for example, a Poisson process, X. This is right-continuous but not left-continuous. However, its jumps occur at totally inaccessible times. This implies that, for any sequence {\tau_n} of stopping times increasing to a finite limit {\tau}, it is true that {X_{\tau_n}} converges almost surely to {X_\tau}. In light of such examples, it is even more remarkable that right-continuity and the existence of left and right limits can be determined by just looking at convergence in probability along monotonic sequences of stopping times. Theorem 1 will be proven below, using the optional section theorem.

For predictable processes, we can restrict attention to predictable stopping times. In this case, we obtain a condition for left-continuity as well as for right-continuity.

Theorem 2 Let X be a predictable process. Then,

  1. X is right-continuous if and only if {X_{\tau_n}\rightarrow X_\tau} in probability, for each uniformly bounded sequence {\tau_n} of predictable stopping times decreasing to a limit {\tau}.
  2. X is left-continuous if and only if {X_{\tau_n}\rightarrow X_\tau} in probability, for each uniformly bounded sequence {\tau_n} of predictable stopping times increasing to a limit {\tau}.
  3. X has right limits if and only if {X_{\tau_n}} converges in probability, for each uniformly bounded decreasing sequence {\tau_n} of predictable stopping times.
  4. X has left limits if and only if {X_{\tau_n}} converges in probability, for each uniformly bounded increasing sequence {\tau_n} of predictable stopping times.

Again, the proof is given below, and relies on the predictable section theorem. Continue reading “Pathwise Regularity of Optional and Predictable Processes”

The Section Theorems

Consider a probability space {(\Omega,\mathcal{F},{\mathbb P})} and a subset S of {{\mathbb R}_+\times\Omega}. The projection {\pi_\Omega(S)} is the set of {\omega\in\Omega} such that there exists a {t\in{\mathbb R}_+} with {(t,\omega)\in S}. We can ask whether there exists a map

\displaystyle  \tau\colon\pi_\Omega(S)\rightarrow{\mathbb R}_+

such that {(\tau(\omega),\omega)\in S}. From the definition of the projection, values of {\tau(\omega)} satisfying this exist for each individual {\omega}. By invoking the axiom of choice, then, we see that functions {\tau} with the required property do exist. However, to be of use for probability theory, it is important that {\tau} should be measurable. Whether or not there are measurable functions with the required properties is a much more difficult problem, and is answered affirmatively by the measurable selection theorem. For the question to have any hope of having a positive answer, we require S to be measurable, so that it lies in the product sigma-algebra {\mathcal{B}({\mathbb R}_+)\otimes\mathcal{F}}, with {\mathcal{B}({\mathbb R}_+)} denoting the Borel sigma-algebra on {{\mathbb R}_+}. Also, less obviously, the underlying probability space should be complete. Throughout this post, {(\Omega,\mathcal{F},{\mathbb P})} will be assumed to be a complete probability space.

It is convenient to extend {\tau} to the whole of {\Omega} by setting {\tau(\omega)=\infty} for {\omega} outside of {\pi_\Omega(S)}. Then, {\tau} is a map to the extended nonnegative reals {\bar{\mathbb R}_+={\mathbb R}_+\cup\{\infty\}} for which {\tau(\omega) < \infty} precisely when {\omega} is in {\pi_\Omega(S)}. Next, the graph of {\tau}, denoted by {[\tau]}, is defined to be the set of {(t,\omega)\in{\mathbb R}_+\times\Omega} with {t=\tau(\omega)}. The property that {(\tau(\omega),\omega)\in S} whenever {\tau(\omega) < \infty} is expressed succinctly by the inclusion {[\tau]\subseteq S}. With this notation, the measurable selection theorem is as follows.

Theorem 1 (Measurable Selection) For any {S\in\mathcal{B}({\mathbb R}_+)\otimes\mathcal{F}}, there exists a measurable {\tau\colon\Omega\rightarrow\bar{\mathbb R}_+} such that {[\tau]\subseteq S} and

\displaystyle  \left\{\tau < \infty\right\}=\pi_\Omega(S). (1)

As noted above, if it wasn’t for the measurability requirement then this theorem would just be a simple application of the axiom of choice. Requiring {\tau} to be measurable, on the other hand, makes the theorem much more difficult to prove. For instance, it would not hold if the underlying probability space was not required to be complete. Note also that, stated as above, measurable selection implies that the projection of S is equal to a measurable set {\{\tau < \infty\}}, so the measurable projection theorem is an immediate corollary. I will leave the proof of Theorem 1 for a later post, together with the proofs of the section theorems stated below.

A closely related problem is the following. Given a measurable space {(X,\mathcal{E})} and a measurable function, {f\colon X\rightarrow\Omega}, does there exist a measurable right-inverse on the image of {f}? This is asking for a measurable function, {g}, from {f(X)} to {X} such that {f(g(\omega))=\omega}. In the case where {(X,\mathcal{E})} is the Borel space {({\mathbb R}_+,\mathcal{B}({\mathbb R}_+))}, Theorem 1 says that it does exist. If S is the graph {\{(t,f(t))\colon t\in{\mathbb R}_+\}} then {\tau} will be the required right-inverse. In fact, as all uncountable Polish spaces are Borel-isomorphic to each other and, hence, to {{\mathbb R}_+}, this result applies whenever {(X,\mathcal{E})} is a Polish space together with its Borel sigma-algebra. Continue reading “The Section Theorems”

Predictable Processes

In contrast to optional processes, the class of predictable processes was used extensively in the development of stochastic integration in these notes. They appeared as integrands in stochastic integrals and then, later on, as compensators and in the Doob-Meyer decomposition. Since they are also central to the theory of predictable section and projection, I will revisit the basic properties of predictable processes now. In particular, any of the collections of sets and processes in the following theorem can equivalently be used to define the predictable sigma-algebra. As usual, we work with respect to a complete filtered probability space {(\Omega,\mathcal{F},\{\mathcal{F}_t\}_{t\in{\mathbb R}_+},{\mathbb P})}. However, completeness is not actually required for the following result. All processes are assumed to be real-valued, or to take values in the extended reals {\bar{\mathbb R}={\mathbb R}\cup\{\pm\infty\}}.

Theorem 1 The following collections of sets and processes each generate the same sigma-algebra on {{\mathbb R}_+\times\Omega}.

  • {[\tau,\infty)} as {\tau} ranges over the predictable stopping times.
  • {Z1_{[\tau,\infty)}} as {\tau} ranges over the predictable stopping times and Z over the {\mathcal{F}_{\tau-}}-measurable random variables.
  • {\{A\times(t,\infty)\colon t\in{\mathbb R}_+,A\in\mathcal{F}_t\}\cup\{A\times\{0\}\colon A\in\mathcal{F}_0\}}.
  • The elementary predictable processes (their general form is recalled just after this list).
  • {(\tau,\infty)} as {\tau} ranges over the stopping times, together with the sets {A\times\{0\}} for {A\in\mathcal{F}_0}.
  • The left-continuous adapted processes.
  • The continuous adapted processes.
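For concreteness, an elementary predictable process can be written in the following form (this matches the convention used earlier in these notes, although exact indexing conventions vary between references), where {n\ge0}, {0\le s_k < u_k} are real numbers, {Z_0} is a bounded {\mathcal{F}_0}-measurable random variable, and each {Z_k} is a bounded {\mathcal{F}_{s_k}}-measurable random variable,

\displaystyle  \xi_t=Z_01_{\{t=0\}}+\sum_{k=1}^nZ_k1_{\{s_k < t\le u_k\}}.

Each term on the right is left-continuous and adapted, which already gives one of the easier inclusions in the proof: the sigma-algebra generated by the elementary predictable processes is contained in the one generated by the left-continuous adapted processes.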
Compare this with the analogous result for sets/processes generating the optional sigma-algebra given in the previous post. The proof of Theorem 1 is given further below. First, recall that the predictable sigma-algebra was previously defined to be generated by the left-continuous adapted processes. However, it can equivalently be defined by any of the collections stated in Theorem 1. To make this clear, I now restate the definition making use of this equivalence.

    Definition 2 The predictable sigma-algebra, {\mathcal{P}}, is the sigma-algebra on {{\mathbb R}_+\times\Omega} generated by any of the collections of sets/processes in Theorem 1.

    A stochastic process is predictable iff it is {\mathcal{P}}-measurable.

    Continue reading “Predictable Processes”

    Optional Processes

    The optional sigma-algebra, {\mathcal{O}}, was defined earlier in these notes as the sigma-algebra generated by the adapted and right-continuous processes. Then, a stochastic process is optional if it is {\mathcal{O}}-measurable. However, beyond the definition, very little use was made of this concept. While right-continuous adapted processes are optional by construction, and were used throughout the development of stochastic calculus, there was no need to make use of the general definition. On the other hand, optional processes are central to the theory of optional section and projection. So, I will now look at such processes in more detail, starting with the following alternative, but equivalent, ways of defining the optional sigma-algebra. Throughout this post we work with respect to a complete filtered probability space {(\Omega,\mathcal{F},\{\mathcal{F}_t\}_{t\in{\mathbb R}_+},{\mathbb P})}, and all stochastic processes will be assumed to be either real-valued or to take values in the extended reals {\bar{\mathbb R}={\mathbb R}\cup\{\pm\infty\}}.

    Theorem 1 The following collections of sets and processes each generate the same sigma-algebra on {{\mathbb R}_+\times\Omega}.

  • {[\tau,\infty)} as {\tau} ranges over the stopping times (a quick check for this collection is given after the list).
  • {Z1_{[\tau,\infty)}} as {\tau} ranges over the stopping times and Z over the {\mathcal{F}_\tau}-measurable random variables.
  • The cadlag adapted processes.
  • The right-continuous adapted processes.
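As a quick check of the easiest of these inclusions: if {\tau} is a stopping time then

\displaystyle  1_{[\tau,\infty)}(t,\omega)=1_{\{\tau(\omega)\le t\}}

is right-continuous in t and, as {\{\tau\le t\}\in\mathcal{F}_t}, it is adapted. So, the sigma-algebra generated by the first collection is contained in that generated by the right-continuous adapted processes.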
The optional sigma-algebra was previously defined to be generated by the right-continuous adapted processes. However, any of the four collections of sets and processes stated in Theorem 1 can equivalently be used, and the definitions given in the literature do vary. So, I will restate the definition making use of this equivalence.

    Definition 2 The optional sigma-algebra, {\mathcal{O}}, is the sigma-algebra on {{\mathbb R}_+\times\Omega} generated by any of the collections of sets/processes in Theorem 1.

    A stochastic process is optional iff it is {\mathcal{O}}-measurable.

    Continue reading “Optional Processes”

    Measurable Projection and the Debut Theorem

I will discuss some of the immediate consequences of the following deceptively simple-looking result.

    Theorem 1 (Measurable Projection) If {(\Omega,\mathcal{F},{\mathbb P})} is a complete probability space and {A\in\mathcal{B}({\mathbb R})\otimes\mathcal{F}} then {\pi_\Omega(A)\in\mathcal{F}}.

The notation {\pi_B} is used to denote the projection from the Cartesian product {A\times B} of sets A and B onto B. That is, {\pi_B((a,b))=b}. As is standard, {\mathcal{B}({\mathbb R})} is the Borel sigma-algebra on the reals, and {\mathcal{A}\otimes\mathcal{B}} denotes the product sigma-algebra.

Theorem 1 seems almost obvious. Projection is a very simple map and we may well expect the projection of, say, a Borel subset of {{\mathbb R}^2} onto {{\mathbb R}} to be Borel. In order to formalise this, we could start by noting that sets of the form {A\times B}, for Borel A and B, have an easily described, and measurable, projection, and that the Borel sigma-algebra is the closure of the collection of such sets under countable unions and under intersections of decreasing sequences of sets. Furthermore, the projection operator commutes with taking the union of sequences of sets. Unfortunately, this method of proof falls down when looking at the limits of decreasing sequences of sets, which do not commute with projection. For example, the sets {S_n=(0,1/n)\times{\mathbb R}\subseteq{\mathbb R}^2} form a decreasing sequence, each of which projects onto the whole of {{\mathbb R}}, but their limit is empty and so has empty projection.
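Explicitly, writing {\pi} for the projection onto the second factor,

\displaystyle  \bigcap_{n=1}^\infty\pi(S_n)={\mathbb R},\qquad\pi\Big(\bigcap_{n=1}^\infty S_n\Big)=\pi(\emptyset)=\emptyset.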

There is an interesting history behind Theorem 1, as mentioned by Gerald Edgar on MathOverflow (1) in answer to the question The most interesting mathematics mistake? In a 1905 paper, Henri Lebesgue asserted that the projection of a Borel subset of the plane onto the line is again a Borel set (Lebesgue, (3), pp 191–192). This was based on the erroneous assumption that projection commutes with the limit of a decreasing sequence of sets. The mistake was spotted, in 1916, by Mikhail Suslin, and led to his investigation of analytic sets and to the beginning of what is now known as descriptive set theory. See Kanamori, (2), for more details. In fact, as was shown by Suslin, projections of Borel sets need not be Borel. So, considering the case where {\Omega={\mathbb R}} and {\mathcal{F}=\mathcal{B}({\mathbb R})}, Theorem 1 is false if the completeness assumption is dropped. I will give a proof of Theorem 1 but, as it is a bit involved, this is left for a later post.

For now, I will state some consequences of the measurable projection theorem which are important to the theory of continuous-time stochastic processes, starting with the following. Throughout this post, the underlying probability space {(\Omega,\mathcal{F},{\mathbb P})} is assumed to be complete, and stochastic processes are taken to be real-valued, or to take values in the extended reals {\bar{\mathbb R}={\mathbb R}\cup\{\pm\infty\}}, with time index ranging over {{\mathbb R}_+}. As a first application, measurable projection allows us to show that the supremum of a jointly measurable process over a Borel set of times is measurable.

    Lemma 2 If X is a jointly measurable process and {S\in\mathcal{B}(\mathbb{R}_+)} then {\sup_{s\in S}X_s} is measurable.

Proof: Set {U=\sup_{s\in S}X_s}. Then, for each real K, {U > K} if and only if {X_s > K} for some {s\in S}. Hence,

    \displaystyle  U^{-1}\left((K,\infty]\right)=\pi_\Omega\left((S\times\Omega)\cap X^{-1}\left((K,\infty]\right)\right).

By the measurable projection theorem, this is in {\mathcal{F}} and, as sets of the form {(K,\infty]} generate the Borel sigma-algebra on {\bar{\mathbb R}}, U is {\mathcal{F}}-measurable. ⬜

Note that, when S is countable, Lemma 2 is elementary, as the supremum of countably many measurable random variables is always measurable; the content of the result is that it applies to uncountable S. Next, the running maximum of a jointly measurable process is again jointly measurable.

    Lemma 3 If X is a jointly measurable process then {X^*_t\equiv\sup_{s\le t}X_s} is also jointly measurable.

    Continue reading “Measurable Projection and the Debut Theorem”

    Predictable Projection For Left-Continuous Processes

In the previous post, I looked at optional projection. Given a, possibly non-adapted, process X, we construct a new, adapted, process Y by taking the expected value of {X_t} conditional on the information available up until time t. I will now concentrate on predictable projection. This is a very similar concept, except that we now condition on the information available strictly before time t.

    It will be assumed, throughout this post, that the underlying filtered probability space {(\Omega,\mathcal{F},\{\mathcal{F}_t\}_{t\in{\mathbb R}_+},{\mathbb P})} satisfies the usual conditions, meaning that it is complete and right-continuous. This is just for convenience, as most of the results stated here extend easily to non-right-continuous filtrations. The sigma-algebra

    \displaystyle  \mathcal{F}_{t-} = \sigma\left(\mathcal{F}_s\colon s < t\right)

    represents the collection of events which are observable before time t and, by convention, we take {\mathcal{F}_{0-}=\mathcal{F}_0}. Then, the conditional expectation of X is written as,

    \displaystyle  Y_t={\mathbb E}[X_t\;\vert\mathcal{F}_{t-}]{\rm\ \ (a.s.)} (1)

By definition, Y is adapted. However, at each time, (1) only defines Y up to a zero probability set. It does not determine the paths of Y, which would require specifying its values simultaneously at the uncountable set of times in {{\mathbb R}_+}. So, (1) does not tell us the distribution of Y at random times, and it is necessary to specify an appropriate version for Y. Predictable projection gives a uniquely defined modification satisfying (1). The full theory of predictable projection for jointly measurable processes requires the predictable section theorem. However, as I demonstrate here, in the case where X is left-continuous, predictable projection can be done by more elementary methods. The statements and most of the proofs in this post will follow very closely those given previously for optional projection. The main difference is that left and right limits are exchanged, predictable stopping times are used in place of general stopping times, and the sigma-algebra {\mathcal{F}_{t-}} is used in place of {\mathcal{F}_t}.

Stochastic processes will be defined up to evanescence, so two processes are considered to be the same if they are equal up to evanescence. In order to apply (1), some integrability requirements need to be imposed. I will use local integrability. Recall that, in these notes, a process X is locally integrable if there exists a sequence of stopping times {\tau_n} increasing to infinity and such that

    \displaystyle  1_{\{\tau_n > 0\}}\sup_{t \le \tau_n}\lvert X_t\rvert (2)

    is integrable. This is a strong enough condition for the conditional expectation (1) to exist, not just at each fixed time, but also whenever t is a stopping time. The main result of this post can now be stated.
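For example (an easy sanity check on the definition, using the deterministic stopping times {\tau_n=n}), any process X such that {\sup_{t\le n}\lvert X_t\rvert} is integrable for each n, and in particular any uniformly bounded process, is locally integrable, since

\displaystyle  1_{\{\tau_n > 0\}}\sup_{t\le\tau_n}\lvert X_t\rvert=\sup_{t\le n}\lvert X_t\rvert

is then integrable.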

    Theorem 1 (Predictable Projection) Let X be a left-continuous and locally integrable process. Then, there exists a unique left-continuous process Y satisfying (1).

    As it is left-continuous, the fact that Y is specified, almost surely, at any time t by (1) means that it is uniquely determined up to evanescence. The main content of Theorem 1 is the existence of Y, and the proof of this is left until later in this post.

    The process defined by Theorem 1 is called the predictable projection of X, and is denoted by {{}^{\rm p}\!X}. So, {{}^{\rm p}\!X} is the unique left-continuous process satisfying

    \displaystyle  {}^{\rm p}\!X_t={\mathbb E}[X_t\;\vert\mathcal{F}_{t-}]{\rm\ \ (a.s.)} (3)

    for all times t. In practice, X will usually not just be left-continuous, but will also have right limits everywhere. That is, it is caglad (“continu à gauche, limites à droite”).

    Theorem 2 Let X be a caglad and locally integrable process. Then, its predictable projection is caglad.

The simplest non-trivial example of predictable projection is where {X_t} is constant in t and equal to an integrable random variable U. Then, {{}^{\rm p}\!X_t=M_{t-}} is given by the left limits of the cadlag martingale {M_t={\mathbb E}[U\;\vert\mathcal{F}_t]}, so {{}^{\rm p}\!X} is easily seen to be a caglad process. Continue reading “Predictable Projection For Left-Continuous Processes”
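As a brief sketch of why this satisfies (3) at any fixed time {t > 0}: choosing a sequence {s_n} increasing strictly to t, the sigma-algebras {\mathcal{F}_{s_n}} increase to {\mathcal{F}_{t-}=\sigma\left(\bigcup_n\mathcal{F}_{s_n}\right)} and, so, Lévy’s upward martingale convergence theorem gives

\displaystyle  M_{t-}=\lim_{n\rightarrow\infty}{\mathbb E}[U\;\vert\mathcal{F}_{s_n}]={\mathbb E}[U\;\vert\mathcal{F}_{t-}]{\rm\ \ (a.s.)}

as required.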