Spitzer’s Formula

Spitzer’s formula is a remarkable result giving the precise joint distribution of the maximum and terminal value of a random walk in terms of the marginal distributions of the process. I have already covered the use of the reflection principle to describe the maximum of Brownian motion, and the same technique can be used for simple symmetric random walks which have a step size of ±1. What is remarkable about Spitzer’s formula is that it applies to random walks with any step distribution.

We consider partial sums

\displaystyle S_n=\sum_{k=1}^n X_k

for an independent identically distributed (IID) sequence of real-valued random variables X1, X2, …. This ranges over index n = 0, 1, … starting at S0 = 0 and has running maximum

\displaystyle R_n=\max_{k=0,1,\ldots,n}S_k.

Spitzer’s theorem is typically stated in terms of characteristic functions, giving the distributions of (Rn, Sn) in terms of the distributions of the positive and negative parts, Sn+ and Sn-, of the random walk.

Theorem 1 (Spitzer) For α, β ∈ ℝ,

\displaystyle \sum_{n=0}^\infty \phi_n(\alpha,\beta)t^n=\exp\left(\sum_{n=1}^\infty w_n(\alpha,\beta)\frac{t^n}{n}\right) (1)

where ϕn, wn are the characteristic functions

\displaystyle \begin{aligned} \phi_n(\alpha,\beta)&={\mathbb E}\left[e^{i\alpha R_n+i\beta(R_n-S_n)}\right],\\ w_n(\alpha,\beta)&={\mathbb E}\left[e^{i\alpha S_n^++i\beta S_n^-}\right]\\ &={\mathbb E}\left[e^{i\alpha S_n^+}\right]+{\mathbb E}\left[e^{i\beta S_n^-}\right]-1. \end{aligned}

As characteristic functions are bounded by 1, the infinite sums in (1) converge for |t|< 1. However, convergence is not really necessary to interpret this formula, since both sides can be considered as formal power series in indeterminate t, with equality meaning that coefficients of powers of t are equated. Comparing powers of t gives

\displaystyle \begin{aligned} \phi_1&=w_1,\\ \phi_2&=\frac{w_1^2}2+\frac{w_2}2,\\ \phi_3&=\frac{w_1^3}6+\frac{w_1w_2}2+\frac{w_3}3, \end{aligned} (2)

and so on.
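
Identities between formal power series such as this are easy to check mechanically. The following minimal Python sketch (my own illustration, assuming sympy is available) expands the right hand side of (1) with the wn treated as symbols, recovering exactly the terms in (2).

```python
# Expand exp(sum_n w_n t^n / n) as a power series in t and read off the
# coefficients phi_n, reproducing (2). The w_n are kept symbolic here.
import sympy as sp

N = 3
t = sp.symbols("t")
w = sp.symbols("w1:%d" % (N + 1))          # symbols w1, w2, w3

rhs = sp.exp(sum(w[n - 1] * t**n / n for n in range(1, N + 1)))
poly = sp.expand(sp.series(rhs, t, 0, N + 1).removeO())

for n in range(1, N + 1):
    print(n, poly.coeff(t, n))
# 1  w1
# 2  w1**2/2 + w2/2
# 3  w1**3/6 + w1*w2/2 + w3/3
```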

Spitzer’s theorem in the form above describes the joint distribution of the nonnegative random variables (Rn, Rn - Sn) in terms of the nonnegative variables (Sn+, Sn-). While this does have a nice symmetry, it is often more convenient to look at the distribution of (Rn, Sn) in terms of (Sn+, Sn), which is achieved by replacing α with α + β and β with -β in (1). This gives a slightly different, but equivalent, version of the theorem.

Theorem 2 (Spitzer) For α, β ∈ ℝ,

\displaystyle \sum_{n=0}^\infty \tilde\phi_n(\alpha,\beta)t^n=\exp\left(\sum_{n=1}^\infty \tilde w_n(\alpha,\beta)\frac{t^n}{n}\right) (3)

where ϕ̃n, w̃n are the characteristic functions

\displaystyle \begin{aligned} \tilde\phi_n(\alpha,\beta)&={\mathbb E}\left[e^{i\alpha R_n+i\beta S_n}\right],\\ \tilde w_n(\alpha,\beta)&={\mathbb E}\left[e^{i\alpha S_n^++i\beta S_n}\right]. \end{aligned}

Taking β = 0 in either (1) or (3) gives the distribution of Rn in terms of Sn+,

\displaystyle \sum_{n=0}^\infty {\mathbb E}\left[e^{i\alpha R_n}\right]t^n=\exp\left(\sum_{n=1}^\infty {\mathbb E}\left[e^{i\alpha S_n^+}\right]\frac{t^n}n\right) (4)

I will give a proof of Spitzer’s theorem below. First, though, let’s look at some consequences, starting with the following strikingly simple result for the expected maximum of a random walk.

Corollary 3 For each n ≥ 0,

\displaystyle {\mathbb E}[R_n]=\sum_{k=1}^n\frac1k{\mathbb E}[S_k^+]. (5)

Proof: As Rn ≥ S1+ = X1+, if Xk+ have infinite mean then both sides of (5) are infinite. On the other hand, if Xk+ have finite mean then so do Sn+ and Rn. Using the fact that the derivative of the characteristic function of an integrable random variable at 0 is just i times its expected value, compute the derivative of (4) at α = 0,

\displaystyle \begin{aligned} \sum_{n=0}^\infty i{\mathbb E}[R_n]t^n &=\exp\left(\sum_{n=1}^\infty \frac{t^n}n\right)\sum_{n=1}^\infty i{\mathbb E}[S_n^+]\frac{t^n}n\\ &=\exp\left(-\log(1-t)\right)\sum_{n=1}^\infty i{\mathbb E}[S_n^+]\frac{t^n}n\\ &=(1-t)^{-1}\sum_{n=1}^\infty i{\mathbb E}[S_n^+]\frac{t^n}n\\ &=(1+t+t^2+\cdots)\sum_{n=1}^\infty i{\mathbb E}[S_n^+]\frac{t^n}n. \end{aligned}

Equating powers of t gives the result. ⬜
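
As a quick sanity check of (5) (my own illustration, not part of the argument), the following snippet compares a Monte Carlo estimate of 𝔼[Rn] with the sum on the right for Gaussian steps with drift; numpy is assumed and the parameter choices are arbitrary.

```python
# Monte Carlo check of E[R_n] = sum_{k<=n} E[S_k^+]/k for Gaussian steps.
import numpy as np

rng = np.random.default_rng(1)
n, samples = 20, 200_000
X = rng.normal(loc=-0.1, scale=1.0, size=(samples, n))      # steps X_1,...,X_n
S = np.cumsum(X, axis=1)                                     # partial sums S_1,...,S_n
R = np.maximum(S.max(axis=1), 0.0)                           # R_n (the maximum includes S_0 = 0)

lhs = R.mean()
rhs = sum(np.maximum(S[:, k - 1], 0.0).mean() / k for k in range(1, n + 1))
print(lhs, rhs)    # the two estimates should agree to Monte Carlo accuracy
```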

The expression for the distribution of Rn in terms of Sn+ might not be entirely intuitive at first glance. Sure, it describes the characteristic functions and, hence, determines the distribution. However, we can describe it more explicitly. As suggested by the evaluation of the first few terms in (2), each ϕn is a convex combination of products of the wn. As is well known, the characteristic function of the sum of random variables is equal to the product of their characteristic functions. Also, if we select a random variable at random from a finite set, then its characteristic function is a convex combination of those of the individual variables with coefficients corresponding to the probabilities in the random choice. So, (3) expresses the distribution of (Rn, Sn) as a random choice of sums of independent copies of (Sk+, Sk).

In fact, expressions such as (1,3) are common in many branches of maths, such as zeta functions associated with curves over finite fields. We have a power series which can be expressed in two different ways,

\displaystyle \sum_{n=0}^\infty a_nt^n = \exp\left(\sum_{n=1}^\infty b_n\frac{t^n}n\right).

The left hand side is the generating function of the sequence an. The right hand side is a kind of zeta function associated with the sequence bn, and is sometimes referred to as the combinatorial zeta function. The logarithmic derivative gives ∑_{n≥1} bn t^{n-1}, which is the generating function of the shifted sequence bn+1.

The exponential formula explicitly describes coefficients an in terms of the bn,

\displaystyle a_n=\frac1{n!}\sum_{\sigma\in {\rm Sym}(n)}b_1^{\sigma_1}b_2^{\sigma_2}\cdots b_n^{\sigma_n}. (6)

Here, σ is a permutation of n elements and σk denotes the number of cycles of length k. Applied to Spitzer’s formula (1), this gives

\displaystyle \phi_n=\frac1{n!}\sum_{\sigma\in{\rm Sym}(n)}w_1^{\sigma_1}w_2^{\sigma_2}\cdots w_n^{\sigma_n}. (7)

As the characteristic function of a sum of independent random variables is the product of their characteristic functions, (7) gives the following construction for the distribution of (Rn, Sn). Let the random permutation σ be uniform on Sym(n). If its cycle lengths are l1, l2, …, lr then the term on the right of (7) corresponding to this permutation is wl1wl2⋯wlr. Hence, choosing random variables Z1, …, Zr which, conditionally on σ, are independent with the same distributions as Sl1, …, Slr, the pair (Rn, Rn - Sn) has the same distribution as

\displaystyle (Z_1^++\cdots+Z_r^+,Z_1^-+\cdots+Z_r^-)

or, equivalently, (Rn, Sn) has the same distribution as

\displaystyle (Z_1^++\cdots+Z_r^+,Z_1+\cdots+Z_r). (8)

This formulation makes it clear that 𝔼[Rn] is the sum of 𝔼[Sk+] weighted by the expected number of cycles of length k. Corollary 3 follows from the fact that the expected number of length k cycles of a random permutation in Sym(n) equals 1/k.
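
The construction (8) is easy to simulate directly. The sketch below (mine, assuming numpy; standard normal steps are an arbitrary choice) samples a uniform permutation, reads off its cycle lengths, builds the pair in (8), and compares a few statistics against the random walk itself.

```python
# Compare the law of (R_n, S_n) with the cycle construction (8).
import numpy as np

rng = np.random.default_rng(2)
n, samples = 10, 50_000

def cycle_lengths(perm):
    seen, lengths = np.zeros(len(perm), dtype=bool), []
    for i in range(len(perm)):
        if not seen[i]:
            j, length = i, 0
            while not seen[j]:
                seen[j] = True
                j, length = perm[j], length + 1
            lengths.append(length)
    return lengths

def from_cycles():
    Z = [rng.normal(size=l).sum() for l in cycle_lengths(rng.permutation(n))]
    return sum(max(z, 0.0) for z in Z), sum(Z)          # distributed as (R_n, S_n)

def from_walk():
    S = np.concatenate(([0.0], np.cumsum(rng.normal(size=n))))
    return S.max(), S[-1]

A = np.array([from_cycles() for _ in range(samples)])
B = np.array([from_walk() for _ in range(samples)])
print(A.mean(axis=0), B.mean(axis=0))                   # means of (R_n, S_n) agree
print(np.corrcoef(A.T)[0, 1], np.corrcoef(B.T)[0, 1])   # as does the dependence
```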

As an example application, consider the probability that the random walk is never positive for the first n steps or, equivalently, Rn = 0.

Lemma 4

\displaystyle \sum_{n=0}^\infty{\mathbb P}(R_n=0)t^n=\exp\left(\sum_{n=1}^\infty{\mathbb P}(S_n\le0)\frac{t^n}{n}\right).

Proof: Using the notation above,

\displaystyle \begin{aligned} {\mathbb P}(R_n=0) &={\mathbb P}(Z_1^++\cdots+Z_r^+=0)\\ &={\mathbb E}[{\mathbb P}(Z_1\le0\vert\;\sigma)\cdots{\mathbb P}(Z_r\le0\vert\;\sigma)]. \end{aligned}

This is using the fact that, conditioned on the random permutation σ, the Zk are independent. As Zk is distributed as Slk, where lk is the length of the kth cycle, and recalling that σk denotes the number of cycles of length k,

\displaystyle {\mathbb P}(R_n=0)={\mathbb E}\left[{\mathbb P}(S_1\le0)^{\sigma_1}{\mathbb P}(S_2\le0)^{\sigma_2}\cdots{\mathbb P}(S_n\le0)^{\sigma_n}\right].

Comparing with the exponential formula (6) gives the result. ⬜
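
Lemma 4 is also easy to test numerically. In the sketch below (my own, assuming numpy; Gaussian steps with drift chosen as an arbitrary example) the probabilities ℙ(Sn ≤ 0) are known explicitly, the exponential is expanded as a power series, and the resulting values of ℙ(Rn = 0) are compared with simulation.

```python
# P(R_n = 0) from lemma 4 versus direct simulation, for N(mu, sigma^2) steps.
import numpy as np
from math import erfc, sqrt

mu, sigma, N = 0.2, 1.0, 8
# P(S_k <= 0) = Phi(-mu*sqrt(k)/sigma) for Gaussian steps
p = [0.5 * erfc((mu * sqrt(k) / sigma) / sqrt(2)) for k in range(1, N + 1)]

# coefficients a_n of exp(sum_k p_k t^k / k), via the recurrence n*a_n = sum_k p_k a_{n-k}
a = [1.0]
for n in range(1, N + 1):
    a.append(sum(p[k - 1] * a[n - k] for k in range(1, n + 1)) / n)

rng = np.random.default_rng(3)
S = np.cumsum(rng.normal(mu, sigma, size=(500_000, N)), axis=1)
sim = [(S[:, :n].max(axis=1) <= 0).mean() for n in range(1, N + 1)]
print(np.round(a[1:], 4))   # P(R_n = 0) from the generating function
print(np.round(sim, 4))     # P(R_n = 0) from simulation
```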

For symmetric distributions, we obtain a simple and surprising result which does not depend on the specific random walk distribution.

Corollary 5 Suppose that the steps Xk of the random walk have a continuous symmetric distribution. Then,

\displaystyle {\mathbb P}(R_n=0)=\frac{(2n)!}{(2^nn!)^2}

for each n ≥ 0.

Proof: As Sk has a symmetric distribution and, by continuity, ℙ(Sk = 0) = 0, we obtain ℙ(Sk ≤ 0) = 1/2. Lemma 4 gives

\displaystyle \begin{aligned} \sum_{n=0}^\infty{\mathbb P}(R_n=0)t^n &=\exp\left(\sum_{n=1}^\infty\frac{t^n}{2n}\right)\\ &=\exp\left(-\frac12\log(1-t)\right)\\ &=(1-t)^{-1/2}\\ &=\sum_{n=0}^\infty\binom{-1/2}{n}(-1)^nt^n\\ &=\sum_{n=0}^\infty\frac{(2n)!}{(2^nn!)^2}t^n. \end{aligned}

The final equality is just plugging in an explicit expression for the binomial coefficients and, equating powers of t, gives the result. ⬜

This result is sometimes known as the Sparre Andersen theorem, and there is a more direct proof which does not go via Spitzer’s formula. See section 2 of the 2012 paper Persistence of iterated partial sums by Dembo, Ding and Gao. The approach given there, though, does not provide a proof of the more general statement of lemma 4. Yet another method of proof is given in a 2010 paper by Satya Majumdar.
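
The distribution-free nature of corollary 5 is striking enough to be worth seeing numerically. The snippet below (my own check, assuming numpy) estimates ℙ(Rn = 0) for two quite different continuous symmetric step distributions and compares with the exact value.

```python
# P(R_n = 0) is the same for any continuous symmetric step distribution.
import numpy as np
from math import comb

rng = np.random.default_rng(4)
n, samples = 6, 400_000
exact = comb(2 * n, n) / 4**n           # (2n)! / (2^n n!)^2

for name, draw in [("normal", lambda: rng.normal(size=(samples, n))),
                   ("cauchy", lambda: rng.standard_cauchy(size=(samples, n)))]:
    S = np.cumsum(draw(), axis=1)
    print(name, (S.max(axis=1) <= 0).mean(), "exact:", exact)
```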

Returning to the discussion of the joint distribution, recall that the steps Xk of the random walk are IID. So, the sum of any number m of these has the same distribution as Sm, and sums over disjoint sets of indices are independent. This means that the terms Zj in (8) can be expressed as sums of the Xk, with k ranging over lj of the indices. It is convenient to let k range over the elements of the corresponding length-lj cycle; the cycles partition {1, …, n}, so the resulting terms Zj have the required distributions and sum to Sn.

Lemma 6 Let σ be a uniformly random permutation in Sym(n), independent of the random walk. Using cyc(σ) to denote its set of cycles and writing

\displaystyle \tilde R_n=\sum_{A\in{\rm cyc}(\sigma)}\left(\sum_{k\in A}X_k\right)^+, (9)

then (R̃n, Sn) has the same distribution as (Rn, Sn).

In Spitzer’s original proof, lemma 6 was proved directly. This implies theorem 1 by applying the logic above in reverse. The proof is combinatorial and a bit tricky though. See the 1956 paper A combinatorial lemma and its application to probability theory by Spitzer. By contrast, the proof I will give below is of a very different nature and drops out from some straightforward algebra.

Instead of using (9) directly, it can be convenient to reorder the random walk increments Xk to be contiguous for each cycle.

Theorem 7 For fixed positive integer n, let

\displaystyle n=\tau_0 > \tau_1 > \cdots > \tau_m=0

be a finite (random) length sequence of integer random variables such that τ0 – τ1, …, τm-1 – τm has the same distribution (up to order) as the cycle lengths of a uniformly random permutation of size n. If these are independent of the random walk Sk, set

\displaystyle \tilde R_n=\sum_{k=1}^m(S_{\tau_{k-1}}-S_{\tau_k})^+.

Then (R̃n, Sn) has the same distribution as (Rn, Sn).

Proof: The set of differences lk = τk-1 – τk is distributed as the set of cycle lengths of a random permutation from Sym(n). By independence, conditioned on these, the increments Sτk-1 – Sτk are independent with the same distributions as Slk. So, the result follows from (8). ⬜

Constructing random times τk with the required property is not difficult. Starting with τ0 = n, for each k ≥ 0 in order:

  1. if τk > 0 then choose τk+1 uniformly over 0, 1, …, τk – 1.
  2. if τk = 0 then set m = k and terminate.

It can be seen that these have the required distribution using the following simple fact: for a random permutation σ ∈ Sym(n), the length of the cycle containing element 1 is uniform from 1 to n. We can choose n – τ1 to be the length of this cycle. Then, deleting the elements of this cycle, we have a random permutation of length τ1 and the remaining cycle lengths are chosen in the same way for this shorter permutation.
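
This recursion is straightforward to implement. The following sketch (mine, assuming numpy; standard normal steps are an arbitrary choice) generates the times τk as described, forms R̃n as in theorem 7, and compares its distribution with that of Rn.

```python
# Theorem 7 in practice: build R~_n from the random times tau_k and compare with R_n.
import numpy as np

rng = np.random.default_rng(5)
n, samples = 15, 100_000

def sample_pair():
    S = np.concatenate(([0.0], np.cumsum(rng.normal(size=n))))   # S_0,...,S_n
    R = S.max()
    tau, R_tilde = n, 0.0
    while tau > 0:
        nxt = rng.integers(0, tau)            # tau_{k+1} uniform on {0,...,tau_k - 1}
        R_tilde += max(S[tau] - S[nxt], 0.0)
        tau = nxt
    return R, R_tilde

pairs = np.array([sample_pair() for _ in range(samples)])
print(pairs.mean(axis=0))                     # E[R_n] and E[R~_n] should agree
print(np.quantile(pairs, [0.5, 0.9], axis=0)) # ...as should the quantiles
```

Note that Rn and R̃n computed from the same path only agree in distribution, not pathwise, so it is their marginal statistics that are compared here.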


Lévy Processes

The continuous-time counterpart of a random walk is a Lévy process. These are stochastic processes satisfying an IID increments property. Specifically, {St}t≥0 is a Lévy process if it has right-continuous sample paths and, for times s, t > 0, the increment St+s – St is independent of {Su: u ≤ t} and has distribution not depending on t. Choosing a time step h > 0, the discrete-time process S̃n = Snh is a random walk with IID increments Xn = Snh – S(n-1)h, and the original Lévy process is recovered by taking h arbitrarily small. Examples include Poisson processes, Cauchy processes and, of course, Brownian motion.

Spitzer’s formula and its consequences follow across to the continuous-time situation, as I will explain now. Throughout this section, St will be a real-valued Lévy process starting from S0 = 0 and its running maximum is denoted

\displaystyle R_t=\sup_{0\le u\le t}S_u.

This section will be a straightforward run through the results already stated above, converting them to the continuous-time setting. I start with the continuous-time version of corollary 3, giving the expected value of the maximum of a Lévy process.

Lemma 8 For all t ≥ 0,

\displaystyle {\mathbb E}[R_t]=\int_0^t{\mathbb E}[S_u^+]\frac{du}{u}.

Proof: By scaling time if necessary, without loss of generality we can take t = 1. Fixing integer n > 0, consider the random walk S̃k = Sk/n. Corollary 3 gives,

\displaystyle {\mathbb E}\left[\sup_{k\le n}S_{k/n}\right]=\sum_{k=1}^n\frac1k{\mathbb E}[S^+_{k/n}].

As n increases to infinity (along powers of 2, say, so that the grids are nested), the term inside the expectation on the left increases to R1 so, by monotone convergence, the left hand side tends to 𝔼[R1]. For the right hand side, set f(u) = 𝔼[Su+]/u to write it as

\displaystyle \sum_{k=1}^nf(k/n)\frac 1n =\int_0^1 f_n(u)\,du

where fn(u) = f(k/n) for (k - 1)/n < u ≤ k/n. Lemma 11 below shows that f(u) is continuous and bounded by a multiple of the decreasing integrable function u^{-1/2}, so long as Su+ is integrable for some u. Hence, by dominated convergence, the right hand side tends to ∫_0^1 f(u)du as required. On the other hand, if Su+ has infinite expectation for all u, then both sides of the equality in the statement of the lemma would be infinite, so it still holds. ⬜

As an example, consider standard Brownian motion. From the post on the reflection principle we know that Rt has the same distribution as |St|, so its expectation is 2𝔼[St+]. Applying lemma 8 instead, scaling invariance of Brownian motion tells us that 𝔼[Su+] is equal to √(u/t) 𝔼[St+] so we obtain

\displaystyle {\mathbb E}[R_t]=\int_0^t {\mathbb E}[S_t^+]\frac{du}{\sqrt{ut}}=2{\mathbb E}[S_t^+]

agreeing with the result using the reflection principle.
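
For a further check of lemma 8 (again my own illustration, assuming numpy), take Brownian motion with drift, where 𝔼[Su+] has the standard closed form for the mean of the positive part of a Gaussian; the integral can then be computed numerically and compared against simulated paths.

```python
# Lemma 8 for Brownian motion with drift mu: integral of E[S_u^+]/u versus E[R_t].
import numpy as np
from math import erfc, exp, pi, sqrt

mu, t = -0.5, 1.0
phi = lambda x: exp(-x * x / 2) / sqrt(2 * pi)                  # standard normal pdf
Phi = lambda x: 0.5 * erfc(-x / sqrt(2))                        # standard normal cdf
ESplus = lambda u: sqrt(u) * phi(mu * sqrt(u)) + mu * u * Phi(mu * sqrt(u))

# trapezoidal approximation of the integral over (0, t]; the integrand is ~ u^{-1/2} near 0
u = np.linspace(1e-6, t, 200_001)
vals = np.array([ESplus(v) / v for v in u])
integral = float(np.sum((vals[1:] + vals[:-1]) * np.diff(u) / 2))

# Monte Carlo estimate of E[R_t] from discretized paths, generated in chunks
rng = np.random.default_rng(6)
steps, paths, chunk, total = 2_000, 20_000, 4_000, 0.0
for _ in range(paths // chunk):
    dW = rng.normal(mu * t / steps, sqrt(t / steps), size=(chunk, steps))
    total += np.maximum(np.cumsum(dW, axis=1).max(axis=1), 0.0).sum()

print(integral, total / paths)   # close, up to discretization bias and Monte Carlo error
```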

The main statement of Spitzer’s formula given by theorem 1 also follows across to the continuous-time situation. The difference is that, rather than generating functions of a sequence, we obtain the Laplace transform of the characteristic functions with respect to the time index. Note that, by continuity in probability of Lévy processes, the characteristic functions used by the theorem are continuous in the time index.

Theorem 9 For any λ > 0 and α, β ∈ ℝ,

\displaystyle \int_0^\infty\phi_t(\alpha,\beta)e^{-\lambda t}\,dt = \lambda^{-1}\exp\left(\int_0^\infty(w_t(\alpha,\beta)-1)e^{-\lambda t}\frac{dt}{t}\right) (10)

where ϕt, wt are the characteristic functions

\displaystyle \begin{aligned} \phi_t(\alpha,\beta)&={\mathbb E}\left[e^{i\alpha R_t+i\beta(R_t-S_t)}\right],\\ w_t(\alpha,\beta)&={\mathbb E}\left[e^{i\alpha S_t^++i\beta S_t^-}\right]. \end{aligned}

Proof: Choose h > 0 and consider the random walk S̃n = Snh with running maximum R̃n. Defining the characteristic function

\displaystyle \tilde\phi_{n,h}(\alpha,\beta)={\mathbb E}\left[e^{i\alpha \tilde R_n+i\beta(\tilde R_n-\tilde S_n)}\right]

and applying theorem 1 (Spitzer’s theorem) gives

\displaystyle \sum_{n=0}^\infty\tilde\phi_{n,h}t^n=\exp\left(\sum_{n=1}^\infty w_{nh}\frac{t^n}{n}\right).

Substituting in t = e^{-λh} and multiplying by h,

\displaystyle \sum_{n=0}^\infty\tilde\phi_{n,h}e^{-\lambda nh}h=\exp\left(\log h+\sum_{n=1}^\infty w_{nh}\frac{e^{-\lambda nh}}{n}\right). (11)

The left hand side can be written as ∫_0^∞ fh(t) dt where fh(t) = ϕ̃n,he^{-λnh} for (n - 1)h < t ≤ nh. We note that this is bounded by the integrable function e^{-λt}. By right-continuity of Lévy process paths and bounded convergence, fh(t) → ϕte^{-λt} as h → 0. So, by dominated convergence, the integral above tends to the left hand side of (10).

Next, look at the right hand side of (11). Using the power series expansion

\displaystyle -\log(1-t)=t+t^2/2+\cdots,

the exponent is

\displaystyle \log(h/(1-e^{-\lambda h})) + \sum_{n=1}^\infty\frac{w_{nh}-1}{n}e^{-\lambda nh}.

The first term tends to -log λ as h → 0. Using g(t) = t^{-1}(wt - 1)e^{-λt}, the summation term can be written as ∫_0^∞ gh(t) dt where gh(t) = g(nh) for (n - 1)h < t ≤ nh. We have gh(t) → g(t) as h → 0, so the result follows from dominated convergence so long as it can be shown that the gh are dominated by an integrable function.

Using the fact that |e^{ix} – 1| is bounded by |x| ∧ 2, this bounds g(t) by a multiple of t^{-1}𝔼[|St| ∧ 2]e^{-λt}. In fact, 𝔼[|St| ∧ 2] is bounded by a multiple of √t, but I leave this simple fact to lemma 12 below. This bounds g(t) by a multiple of t^{-1/2}e^{-λt}, which is integrable and decreasing, so also bounds gh(t) as required.

We have shown that the exponent on the right hand side of (11) tends to ∫_0^∞ g(t) dt - log λ as h → 0, and taking the exponential gives the right hand side of (10). ⬜

Theorem 7 also carries across to continuous time, allowing us to directly construct a random variable whose joint distribution with the terminal value St is the same as that of the maximum Rt. Given an IID sequence U1, U2, U3, … of random variables uniformly distributed on [0, 1], consider the partial products

\displaystyle \tau_0=1,\ \tau_1=U_1,\ \tau_2=U_1U_2,\ \ldots. (12)

The collection of differences {τk-1 – τk} is a random set having what is known as the Poisson-Dirichlet distribution (up to order). This is the limiting distribution of the relative cycle lengths of a random permutation in Sym(n) as n goes to infinity. Scaling by the positive time t, these choices of random times can be used in the theorem below.

Theorem 10 For fixed positive time t, let

\displaystyle t=\tau_0 > \tau_1 > \tau_2 > \cdots > 0

be a decreasing sequence of random times such that t^{-1}(τk-1 - τk) have the Poisson-Dirichlet distribution (up to ordering) independently of S. Set

\displaystyle \tilde R_t=\sum_{k=1}^\infty(S_{\tau_{k-1}}-S_{\tau_k})^+.

Then (R̃t, St) has the same distribution as (Rt, St).

Proof: Scaling the time index if necessary, without loss of generality we assume that t = 1. Fixing positive integer n, consider the random walk S̃k = Sk/n with maximum R̃n = maxk≤n S̃k. Consider an IID sequence of random variables U1, U2, … uniformly distributed over [0, 1] and define times τk by (12). Also define random nonnegative integer times τ̃0 = n and, for each k > 0,

\displaystyle \tilde\tau_k = \lfloor\tilde\tau_{k-1} U_k\rfloor

As each floor ⌊·⌋ can only impact the value of τ̃k/n by less than 1/n, it is clear that these tend to τk in the limit as n goes to infinity.

By theorem 7,

\displaystyle \tilde R'_n=\sum_{k=1}^\infty \left(S_{\tilde\tau_{k-1}/n}-S_{\tilde\tau_k/n}\right)^+

has the same joint distribution with S1 as does R̃n. In the limit as n goes to infinity, R̃n converges to R1 so, to complete the proof, it is sufficient to show that R̃′n converges in probability to

\displaystyle \tilde R_1=\sum_{k=1}^\infty \left(S_{\tau_{k-1}}-S_{\tau_k}\right)^+.

By continuity in probability of Lévy processes, the sum converges term-by-term. Selecting positive integer N, the sum of the first N terms will converge, so it only needs to be shown that, by taking N sufficiently large, the sum of the remaining terms can be made as small as we like. Taking expectations,

\displaystyle \begin{aligned} &{\mathbb E}\left[1\wedge\sum_{k \ge N}(S_{\tilde\tau_{k-1}/n}-S_{\tilde\tau_k/n})^+\right]\\ &\le\sum_{k \ge N}{\mathbb E}\left[1\wedge(S_{\tilde\tau_{k-1}/n}-S_{\tilde\tau_k/n})^+\right]\\ &\le\sum_{k\ge N}c{\mathbb E}\left[\sqrt{\tilde\tau_{k-1}/n-\tilde\tau_{k}/n}\right]\\ &\le\sum_{k\ge N}c{\mathbb E}\left[\sqrt{\tilde\tau_{k-1}/n}\right] \end{aligned}

for some constant c > 0. This is using the fact that 𝔼[1 ∧ |St|] is of order √t, shown in lemma 12 below. As τ̃k-1/n is bounded by the product of the independent random variables Uj over j < k, the expectation is bounded by

\displaystyle c\sum_{k \ge N}{\mathbb E}[\sqrt U_1]^{(k-1)}=c\sum_{k\ge N}(2/3)^{k-1}=2c(2/3)^N.

Choosing N large, this can be made as small as we like, as required. ⬜
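
For standard Brownian motion, theorem 10 can be checked directly since, given the stick lengths, the increments over the intervals (τk, τk-1] are independent centred Gaussians, and the reflection principle tells us the answer. A sketch (mine, assuming numpy):

```python
# Theorem 10 for standard Brownian motion on [0,1]: build R~_1 from the stick-breaking
# times (12) and compare with the reflection-principle law R_1 ~ |N(0,1)|.
import numpy as np

rng = np.random.default_rng(7)
samples, eps = 50_000, 1e-8

def sample_R_tilde():
    tau, total = 1.0, 0.0
    while tau > eps:                          # truncate the almost-surely finite tail
        nxt = tau * rng.uniform()             # tau_k = U_1 * ... * U_k
        inc = rng.normal(0.0, np.sqrt(tau - nxt))   # increment over (tau_k, tau_{k-1}]
        total += max(inc, 0.0)
        tau = nxt
    return total

R_tilde = np.array([sample_R_tilde() for _ in range(samples)])
R = np.abs(rng.normal(size=samples))          # reflection principle
print(R_tilde.mean(), R.mean())               # both near sqrt(2/pi) ~ 0.7979
print(np.quantile(R_tilde, [0.5, 0.9]), np.quantile(R, [0.5, 0.9]))
```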

The proofs above made use of some technical lemmas to bound the terms. As they are a little boring, I relegated them to the end of this section, starting with the following which was required for the proof of lemma 8 and theorem 10.

Lemma 11 The positive part St+ of Lévy process S has finite mean for some time t > 0 if and only if it has finite mean at all times, in which case 𝔼[St+] is continuous in t and

\displaystyle {\mathbb E}[S_t^+]\le{\mathbb E}[R_t]=O(\sqrt{t})

in the limit t → 0.

Proof: Fixing K > 0, decompose S as,

\displaystyle S_t=X_t+Y_t-Z_t

where Y, Z are pure jump processes consisting of the jumps larger than K and less than -K respectively.

\displaystyle \begin{aligned} &Y_t=\sum_{u\le t}1_{\{\Delta S_u > K\}}\Delta S_u,\\ &Z_t=-\sum_{u\le t}1_{\{\Delta S_u < -K\}}\Delta S_u. \end{aligned}

First, look at X. As it is a Lévy process with bounded jumps (by K), it is square integrable, and by independent increments, both 𝔼[Xt] and the variance Var(Xt) are linear in time,

\displaystyle {\mathbb E}[X_t]=at,\ {\rm Var}(X_t)=bt.

Applying Doob’s inequality to the martingale Xt – at,

\displaystyle {\mathbb E}\left[\sup_{s\le t}X_s^2\right]\le {\mathbb E}[2\sup_{s\le t}(X_s-as)^2]+2a^2t^2\le 8bt+2a^2t^2

and, hence,

\displaystyle {\mathbb E}\left[\sup_{s\le t}X_s\right]\le \sqrt{8bt+2a^2t^2}

which is bounded by a multiple of √t over each finite time interval.

Next, look at the nonnegative term Yt. If we suppose that St+ is integrable for a single t > 0 then the positive part of Yt – Zt = St – Xt is bounded by St+ + |Xt| and so is integrable; as Y, Z are independent and nonnegative, it follows that Yt is integrable. By independent increments, Yt is integrable for all t and 𝔼[Yt] = ct for some nonnegative constant c. Hence,

\displaystyle {\mathbb E}[R_t]\le{\mathbb E}\left[\sup_{s\le t}(X_s+Y_s)\right]\le\sqrt{8bt+2a^2t^2}+ct

which is bounded by a multiple of √t over bounded time intervals.

Finally, as Lévy processes are continuous in probability, if tn → t then Stn → St in probability. As St+ is bounded by the increasing integrable process Rt, dominated convergence says that 𝔼[Stn+] → 𝔼[St+] which, therefore, is continuous in t. ⬜

The following bound was required for the proof of theorem 9.

Lemma 12 For any fixed K > 0, 𝔼[|St| ∧ K] is bounded by a multiple of √t.

Proof: First, f(t) = 𝔼[|St| ∧ K] is bounded by K and, hence, is bounded by a multiple of √t away from 0. So, it is sufficient to show that f(t) = O(√t) as t → 0. Then, as in the proof of lemma 11, we strip out the jumps of S which are larger than K in absolute value,

\displaystyle X_t=S_t-\sum_{u\le t}1_{\{\lvert\Delta S_u\rvert > K\}}\Delta S_u.

As the jumps of St greater than K in absolute value arrive according to a Poisson process of some finite rate λ ≥ 0, the first time τ at which such a jump occurs has the exponential distribution with rate parameter λ. Taking expectations of the inequality

\displaystyle \lvert S_t\rvert\wedge K\le \lvert X_t\rvert+K1_{\{\tau \le t\}}.

we obtain

\displaystyle f(t)\le{\mathbb E}[\lvert X_t\rvert]+K(1-e^{-\lambda t}).

As in the proof of lemma 11, 𝔼[|Xt|] = O(√t), so the right hand side of the above inequality is bounded by a multiple of √t as t → 0. ⬜


Proof of Spitzer’s Formula

I give a proof of Spitzer’s formula, based on the treatment in the 1958 paper Spitzer’s formula: A short proof by J.G. Wendel. It is actually rather surprising how such a seemingly complicated result drops almost effortlessly out of some very basic algebra. While this section might look quite long at first glance, most of it consists of simply defining the algebraic structures and elementary manipulations. After that, there is very little real work. The real meat that contributes to the final result is the algebraic properties of the projection operator P stated in lemma 13 below. This lemma is still rather simple though.

Use M to denote the space of finite complex Borel measures on ℝ2, which is a vector space over ℂ. The idea is that the distribution of (Rn, Sn) is in M and needs to be constructed. Note that we construct Sn+1 by adding the independent random variable Xn+1 to Sn and then obtain Rn+1 by taking the maximum of this with Rn,

\displaystyle \begin{aligned} &S_{n+1}=S_n + X_{n+1},\\ &R_{n+1}=R_n\vee S_{n+1}. \end{aligned}

This can be represented by a pair of linear operators on M, which I denote by F and P respectively. If (R, S) are random variables with probability distribution μ ∈ M then Fμ represents the distribution of (R, S + X), where X has the random walk increment distribution (i.e., same as the Xk) and is independent of (R, S). Also, Pμ represents the distribution of (R ∨ S, S). That is, they are linear maps M → M satisfying

\displaystyle \begin{aligned} &\int f(r,s)\,dF\mu(r,s)=\iint f(r,s+x)\,d\mu(r,s)d\lambda_X(x),\\ &\int f(r,s)\,dP\mu(r,s)=\int f(r\vee s,s)\,d\mu(r,s). \end{aligned}

for μ ∈ M and bounded measurable f: ℝ2 → ℂ, where λX denotes the distribution of X. By construction, if μ is the distribution of (Rn, Sn) then Fμ is the distribution of (Rn, Sn+1) and PFμ is the distribution of (Rn+1, Sn+1). That is,

\displaystyle \gamma_{n+1}=PF\gamma_n

where γn ∈ M  is the distribution of (Rn, Sn).

Use δ ∈ M to denote the ‘Dirac delta’ distribution on ℝ2 under which (R, S) satisfy R = S = 0 almost surely. Then, γ0 = δ and, iteratively applying the equation above gives

\displaystyle \gamma_n=(PF)^n\delta. (13)

Next we promote M to an algebra over ℂ by using convolution as the multiplication. If μ, ν ∈ M are the distributions of (R1, S1) and (R2, S2) respectively, and these are independent, then μν is the distribution of (R1 + R2, S1 + S2). Explicitly,

\displaystyle \int f(r,s)\,d(\mu\nu)(r,s)=\iint f(r_1+r_2,s_1+s_2)\,d\mu(r_1,s_1)d\nu(r_2,s_2).

for bounded measurable f: ℝ2 → ℂ. This is a commutative ring with identity δ, δμ = μδ = μ.

It should be clear from the definition that P is a projection, P² = P. However, it also satisfies the following properties, which will be instrumental in proving Spitzer’s formula. These are equivalent to saying that PM and (1 - P)M are subalgebras.

Lemma 13 The operator P satisfies the following identities for any μ, ν ∈ M ,

  1. if Pμ = μ and Pν = ν then P(μν) = μν.
  2. if Pμ = Pν = 0 then P(μν) = 0.

Proof: The statements are equivalent to the identities

\displaystyle \begin{aligned} &P((P\mu)P\nu)=(P\mu)P\nu,\\ &P(((1-P)\mu)(1-P)\nu)=0. \end{aligned}

By linearity it is sufficient to prove these for probability measures, so suppose that (R1, S1) and (R2, S2) are independent with distributions μ and ν respectively. From the definitions

\displaystyle (R,S)\equiv(R_1\vee S_1+R_2\vee S_2, S_1+S_2)

has distribution (Pμ)(Pν). As R ≥ S, applying P to this has no effect, proving the first identity. The second identity can be rearranged as

\displaystyle P(\mu\nu)+(P\mu)P\nu=P((P\mu)\nu)+P(\mu P\nu). (14)

The terms on the left hand side are the distributions of (R, S1 + S2) with R equal to

\displaystyle (R_1+R_2)\vee(S_1+S_2){\rm\ or\ }R_1\vee S_1+R_2\vee S_2

respectively. For the right hand side, R is equal to

\displaystyle (R_1\vee S_1+R_2)\vee(S_1+S_2){\rm\ or\ }(R_1+R_2\vee S_2)\vee(S_1+S_2).

respectively. On {R1 ≥ S1} these are the same pairs of values, so that P(μν) = P((Pμ)ν) and (Pμ)Pν = P(μPν). On {R2 ≥ S2}, they are the same pairs but in different order, so P(μν) = P(μPν) and (Pμ)Pν = P((Pμ)ν). On the remaining case with R1 < S1 and R2 < S2 they all reduce to R = S1 + S2. So, in all cases, (14) holds. ⬜

Curiously, the paper by Wendel does not prove a version of lemma 13. Instead, it merely states that PM and (1 - P)M are subalgebras as if it were an obvious fact. As these properties appear to be the central facts on which the proof is built, and are not obvious to me, I included the statement and proof.

Now, consider the space M [[t]] of power series over M  in indeterminate t. This consists of series

\displaystyle \mu_0+\mu_1t+\mu_2t^2+\mu_3t^3+\cdots

for coefficients μk lying in M . Adding term-by-term and multiplying in the usual way for power series makes this into an algebra, to which the operators F and P can be extended by applying them to the coefficients.

Equation (13) can be expressed by a simple formula for the generating function of the sequence γn.

Lemma 14 The operator 1 – tPF on M [[t]] is invertible and

\displaystyle \sum_{n=0}^\infty \gamma_nt^n=(1-tPF)^{-1}\delta (15)

Proof: It is easily checked that the operator 1 – tPF has inverse

\displaystyle 1+tPF+t^2(PF)^2+t^3(PF)^3+\cdots

As for a standard geometric series, just multiply by 1 – tPF and note that all powers of t cancel, leaving only 1. Applying this inverse to δ and substituting in (13) gives the left hand side of (15). ⬜

Any function f(z) given by a power series with complex coefficients can be defined on tM [[t]]. Here, we just need exponentials and logarithms,

\displaystyle \begin{aligned} &\exp(\mu)=\sum_{n=0}^\infty\frac1{n!}\mu^n,\\ &\log(\delta+\mu)=\sum_{n=1}^\infty\frac{(-1)^{n+1}}{n}\mu^n. \end{aligned}

These are well-defined as long as μ ∈ M [[t]] is a multiple of t, so that nonzero coefficients of any fixed power of t only occur in finitely many terms on the right. They satisfy the usual identities

\displaystyle \begin{aligned} &\exp(\mu)\exp(\nu)=\exp(\mu+\nu),\\ &\exp(\log(\delta+\mu))=\delta+\mu. \end{aligned}

This can be checked by expanding and comparing powers of μ and ν; the coefficients must satisfy the required identities since they hold over the complex numbers.

Now let χ ∈ M  be the distribution of (0, X),

\displaystyle \int f(r,s)\,d\chi(r,s)=\int f(0,x)\,d\lambda_X(x)={\mathbb E}[f(0,X)].

With this definition, the operator F is the same as multiplying by χ.

Proving the following formula for the generating function of γn is now just a matter of simple algebra combined with the properties of P stated in lemma 13. This is essentially an alternative statement of Spitzer’s theorem. In many ways, I actually prefer the formulation given below over the version of Spitzer’s theorem stated in theorems 1 and 2 above, since it directly works with distributions rather than characteristic functions.

Theorem 15

\displaystyle \sum_{n=0}^\infty\gamma_n t^n=\exp\left(-P\log(\delta-\chi t)\right) (16)

Proof: Letting γ* denote the right hand side of (16), this is of the form exp(μ) where Pμ = μ. Expanding it out gives a sum of multiples of powers of μ which, by lemma 13, are invariant under applying P. Hence, Pγ* = γ*.

Now,

\displaystyle \begin{aligned} (\delta-\chi t)\gamma^* &=\exp\left(\log(\delta-\chi t)\right)\exp\left(-P\log(\delta-\chi t)\right)\\ &=\exp\left((1-P)\log(\delta-\chi t)\right). \end{aligned}

The right hand side is of the form exp(μ) for Pμ = 0. Expanding this out gives δ + ν where ν is a sum of multiples of positive powers of μ. It follows from lemma 13 that Pν = 0. Hence,

\displaystyle P((\delta-\chi t)\gamma^*)=P\delta=\delta.

We obtain

\displaystyle \begin{aligned} \delta&=P\gamma^* - tP(\chi\gamma^*)\\ &=\gamma^*-tPF\gamma^*\\ &=(1-tPF)\gamma^* \end{aligned}

Lemma 14 gives

\displaystyle \gamma^*=\sum_{n=0}^\infty \gamma_n t^n

as required. ⬜
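
Theorem 15 can also be verified concretely. In the toy computation below (my own, plain Python), measures on ℤ2 are represented as dictionaries of point masses for a two-point step distribution; the operators P and F (convolution with χ) are a few lines each, and the tn coefficient of exp(-P log(δ - χt)) is compared with γn = (PF)nδ.

```python
# Verify (16) for a two-point step distribution, representing measures on Z^2 as dicts.
from collections import defaultdict

step = {1: 0.3, -1: 0.7}                      # distribution of X (arbitrary choice)
delta = {(0, 0): 1.0}
chi = {(0, x): p for x, p in step.items()}    # distribution of (0, X)

def conv(mu, nu):                             # product in M (convolution)
    out = defaultdict(float)
    for (r1, s1), p in mu.items():
        for (r2, s2), q in nu.items():
            out[(r1 + r2, s1 + s2)] += p * q
    return dict(out)

def P(mu):                                    # (R, S) -> (R v S, S)
    out = defaultdict(float)
    for (r, s), p in mu.items():
        out[(max(r, s), s)] += p
    return dict(out)

def add(mu, nu):
    out = defaultdict(float, mu)
    for k, p in nu.items():
        out[k] += p
    return dict(out)

N = 6
gammas, g = [delta], delta                    # gamma_n = (PF)^n delta, F = conv with chi
for _ in range(N):
    g = P(conv(chi, g))
    gammas.append(g)

chi_pow, Pchi = delta, []                     # P(chi^k) for k = 1,...,N
for _ in range(N):
    chi_pow = conv(chi_pow, chi)
    Pchi.append(P(chi_pow))

a = [delta]                                   # coefficients of exp(sum_k P(chi^k) t^k / k)
for n in range(1, N + 1):                     # via n*a_n = sum_k P(chi^k) * a_{n-k}
    s = {}
    for k in range(1, n + 1):
        s = add(s, conv(Pchi[k - 1], a[n - k]))
    a.append({key: v / n for key, v in s.items()})

for n in range(N + 1):                        # the two computations agree
    keys = set(gammas[n]) | set(a[n])
    print(n, max(abs(gammas[n].get(k, 0.0) - a[n].get(k, 0.0)) for k in keys))
```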

We finish off by expressing the result in terms of characteristic functions to obtain Spitzer’s theorem as stated at the top of the post.

Proof of theorems 1 and 2: As the two statements are equivalent under a change of variables, we just need to prove theorem 2. Fixing reals α, β, define f: ℝ2 → ℂ by f(r, s) = exp(iαr + iβs). Writing μ(f) for the integral of f with respect to measure μ,

\displaystyle \begin{aligned} &\gamma_n(f)={\mathbb E}\left[e^{i\alpha R_n+i\beta S_n}\right]\equiv\tilde\phi_n,\\ &(P\chi^n)(f)={\mathbb E}\left[e^{i\alpha S_n^++i\beta S_n}\right]\equiv\tilde w_n. \end{aligned}

The fact that the characteristic function of the sum of independent random variables is the product of their characteristic functions means that (μν)(f) = μ(f)ν(f) for μ, ν ∈ M or, equivalently, integration of f is a homomorphism from M to ℂ. Applying termwise, this gives a homomorphism from M [[t]] to ℂ[[t]]. Applying (16) to f gives the result.

\displaystyle \begin{aligned} \sum_{n=0}^\infty\tilde\phi_nt^n &=\sum_{n=0}^\infty\gamma_n(f)t^n\\ &=\exp\left(\sum_{n=1}^\infty P\chi^n\frac{t^n}{n}\right)(f)\\ &=\exp\left(\sum_{n=1}^\infty(P\chi^n)(f)\frac{t^n}{n}\right)\\ &=\exp\left(\sum_{n=1}^\infty \tilde w_n\frac{t^n}{n}\right). \end{aligned} ⬜
