The Kolmogorov Continuity Theorem

Figure 1: Fractional Brownian motion with H = 1/4, 1/2, 3/4

One of the common themes throughout the theory of continuous-time stochastic processes is the importance of choosing good versions of processes. Specifying the finite-dimensional distributions of a process is not sufficient to determine its sample paths so, if a continuous modification exists, then it makes sense to work with that. A relatively straightforward criterion ensuring the existence of a continuous version is provided by Kolmogorov’s continuity theorem.

For any positive real number {\gamma}, a map {f\colon E\rightarrow F} between metric spaces E and F is said to be {\gamma}-Hölder continuous if there exists a positive constant C satisfying

\displaystyle  d(f(x),f(y))\le Cd(x,y)^\gamma

for all {x,y\in E}. The smallest value of C satisfying this inequality is known as the {\gamma}-Hölder coefficient of {f}. Hölder continuous functions are always continuous and, at least on bounded spaces, Hölder continuity is a stronger property for larger values of the exponent {\gamma}. So, if E is a bounded metric space and {\alpha\le\beta}, then every {\beta}-Hölder continuous map from E is also {\alpha}-Hölder continuous. In particular, 1-Hölder and Lipschitz continuity are equivalent.
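To make the definition concrete, here is a minimal Python sketch, not part of the original discussion, which estimates the {\gamma}-Hölder coefficient of a real-valued function from a finite sample of points. The name holder_coefficient and the choice of test function are purely illustrative.

import numpy as np

def holder_coefficient(xs, fs, gamma):
    """Illustrative sketch: estimate the gamma-Holder coefficient of a sampled
    real function, i.e. the largest ratio |f(x) - f(y)| / |x - y|^gamma over
    the sampled pairs."""
    xs = np.asarray(xs, dtype=float)
    fs = np.asarray(fs, dtype=float)
    dx = np.abs(xs[:, None] - xs[None, :])   # pairwise |x - y|
    df = np.abs(fs[:, None] - fs[None, :])   # pairwise |f(x) - f(y)|
    mask = dx > 0                            # exclude x == y
    return float(np.max(df[mask] / dx[mask] ** gamma))

# f(x) = sqrt(x) is 1/2-Holder on [0, 1] but not Lipschitz, so the estimate
# stays bounded for gamma = 1/2 and blows up as the grid refines for gamma = 1.
xs = np.linspace(0.0, 1.0, 500)
print(holder_coefficient(xs, np.sqrt(xs), 0.5))  # close to 1
print(holder_coefficient(xs, np.sqrt(xs), 1.0))  # of order 1/sqrt(grid spacing)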

Kolmogorov’s theorem gives simple conditions on the pairwise distributions of a process which guarantee the existence of a continuous modification and, furthermore, states that the sample paths {t\mapsto X_t} are almost surely locally Hölder continuous. That is, they are almost surely Hölder continuous on every bounded interval. To start with, we look at real-valued processes. Throughout this post, we work with respect to a probability space {(\Omega,\mathcal F, {\mathbb P})}. There is no need to assume the existence of any filtration, since filtrations play no part in the results here.

Theorem 1 (Kolmogorov) Let {\{X_t\}_{t\ge0}} be a real-valued stochastic process such that there exist positive constants {\alpha,\beta,C} satisfying

\displaystyle  {\mathbb E}\left[\lvert X_t-X_s\rvert^\alpha\right]\le C\lvert t-s\rvert^{1+\beta},

for all {s,t\ge0}. Then, X has a continuous modification which, with probability one, is locally {\gamma}-Hölder continuous for all {0 < \gamma < \beta/\alpha}.

As an example, consider a standard Brownian motion X. In this case, {X_t-X_s} is a centred normal variable of variance {\lvert t-s\rvert}. Hence,

\displaystyle  {\mathbb E}[\lvert X_t-X_s\rvert^\alpha]={\mathbb E}[\lvert N\rvert^\alpha]\lvert t-s\rvert^{\alpha/2}

for a standard normal N. Theorem 1 can be applied so long as we take {\alpha > 2}. In that case, {\beta=\alpha/2-1} and we see that Brownian motion is locally {\gamma}-Hölder continuous for all {\gamma < 1/2-1/\alpha}. By choosing {\alpha} as large as we like, this demonstrates that Brownian motion is locally {\gamma}-Hölder continuous for all {\gamma < 1/2}. In the other direction, it is not hard to show that it cannot be 1/2-Hölder continuous on any nontrivial interval.
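As a quick sanity check of this moment identity, here is a short Monte Carlo sketch added for illustration; the choice of {\alpha}, the time points, the sample size and the seed are all arbitrary.

import numpy as np

# Illustrative sketch (not from the post): compare the empirical alpha-moment
# of a Brownian increment with E|N|^alpha * |t - s|^(alpha / 2).
rng = np.random.default_rng(0)
alpha, s, t = 4.0, 0.3, 0.8
n_samples = 10**6

# X_t - X_s ~ N(0, t - s) for standard Brownian motion.
increments = rng.normal(0.0, np.sqrt(t - s), size=n_samples)
lhs = np.mean(np.abs(increments) ** alpha)

N = rng.standard_normal(n_samples)
rhs = np.mean(np.abs(N) ** alpha) * abs(t - s) ** (alpha / 2)

print(lhs, rhs)  # both approximately 3 * 0.5^2 = 0.75, since E|N|^4 = 3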

More generally, theorem 1 can be applied to fractional Brownian motion. These are centred Gaussian processes whose finite-dimensional distributions can be defined by the pairwise covariances. I do not show here that these finite-dimensional distributions are well-defined (i.e., that the covariance matrix is positive semidefinite). The point is that once we have constructed the finite-dimensional distributions, Kolmogorov’s theorem ensures the existence of a continuous modification.

Example 1 Fractional Brownian motion, {\{B_t\}_{t\ge0}}, of Hurst parameter H (strictly between 0 and 1), is a centred Gaussian process with {B_0=0} such that {B_t-B_s} has standard deviation {\lvert t-s\rvert^H} for all {s,t\ge0}.

This has a continuous modification which, with probability one, is locally {\gamma}-Hölder continuous for all {\gamma < H}.

As in the example of standard Brownian motion above, which is actually just fractional Brownian motion with Hurst parameter 1/2, we can compute

\displaystyle  {\mathbb E}[\lvert B_t-B_s\rvert^\alpha]={\mathbb E}[\lvert N\rvert^{\alpha}]\lvert t-s\rvert^{\alpha H}

and, so, theorem 1 applies with {\beta=\alpha H-1} whenever {\alpha > 1/H}, giving {\gamma}-Hölder continuity for all {\gamma < H-1/\alpha}. Again, letting {\alpha} go to infinity shows that it holds for all {\gamma < H}, as claimed. In the reverse direction, it is not difficult to show that fractional Brownian motion is not H-Hölder continuous. So, with increasing values of H, the sample paths of fractional Brownian motion become smoother, in a sense. This can be seen visually for the paths shown in figure 1 above.
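For readers who want to reproduce paths like those in figure 1, here is a minimal simulation sketch, not from the post. Since {B_0=0}, polarization gives the covariance {{\mathbb E}[B_sB_t]=\tfrac12(s^{2H}+t^{2H}-\lvert t-s\rvert^{2H})}, and a path can be sampled by a Cholesky factorisation of this covariance matrix on a finite grid. The function name, grid size and jitter term are illustrative choices.

import numpy as np

def fbm_sample(H, n=500, T=1.0, seed=0):
    """Illustrative sketch: sample fractional Brownian motion with Hurst
    parameter H at n points of (0, T], via a Cholesky factorisation of the
    covariance E[B_s B_t] = (s^{2H} + t^{2H} - |t - s|^{2H}) / 2 (B_0 = 0)."""
    t = np.linspace(T / n, T, n)
    s, u = np.meshgrid(t, t)
    cov = 0.5 * (s ** (2 * H) + u ** (2 * H) - np.abs(s - u) ** (2 * H))
    L = np.linalg.cholesky(cov + 1e-12 * np.eye(n))  # tiny jitter for stability
    rng = np.random.default_rng(seed)
    return t, L @ rng.standard_normal(n)

# Larger H gives visibly smoother paths, consistent with local gamma-Holder
# continuity for all gamma < H.
for H in (0.25, 0.5, 0.75):
    t, B = fbm_sample(H)
    print(H, np.max(np.abs(np.diff(B))))  # largest single-step increment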

The continuity theorem can be generalised in a couple of ways. Firstly, the process need not be real-valued but, rather, can take values in a complete metric space. Secondly, the (time) index need not be restricted to be the nonnegative reals, but can be allowed to take values in any subset of {{\mathbb R}^d}.

Theorem 2 Let E be a separable and complete metric space, {S\subseteq{\mathbb R}^d}, and {\{U_x\}_{x\in S}} be a collection of E-valued random variables. If {\alpha,\beta,C} are positive constants satisfying

\displaystyle  {\mathbb E}\left[d(U_x,U_y)^\alpha\right]\le C\lVert x-y\rVert^{d+\beta} (1)

for all {x,y\in S}, then {U_x} has a continuous modification. Furthermore, with probability one, this modification is {\gamma}-Hölder continuous on all bounded sets for all {0 < \gamma < \beta/\alpha}.

Theorem 1 is just the special case of this result where {E={\mathbb R}}, {d=1} and {S={\mathbb R}_+}. The proof is given further down. The requirement for the metric space to be separable, so that it has a countable dense subset, is only really to ensure that {d(U_x,U_y)} are measurable random variables. I was also a bit unclear in the statement of inequality (1) as to the meaning of the norm {\lVert\cdot\rVert} on {{\mathbb R}^d}. We could, for example, use the {L^p}-norm for any {1\le p < \infty}, defined by {\lVert x\rVert_p=(\lvert x_1\rvert^p+\cdots+\lvert x_d\rvert^p)^{1/p}}. Alternatively, the {L^\infty} norm given by {\lVert x\rVert_\infty=\max(\lvert x_1\rvert,\ldots,\lvert x_d\rvert)} can be used. The fact that these are all equivalent,

\displaystyle  \lVert x\rVert_\infty\le\lVert x\rVert_p\le d^{1/p}\lVert x\rVert_\infty,

means that it does not matter which is used. The only difference is in the value of the arbitrary constant C, which does not affect whether the condition of theorem 2 is satisfied.
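A short numerical check of this norm equivalence, added purely for illustration; the dimension, exponent and random vector are arbitrary.

import numpy as np

# Illustrative sketch (not from the post): ||x||_inf <= ||x||_p <= d^(1/p) ||x||_inf.
rng = np.random.default_rng(1)
d, p = 5, 3
x = rng.normal(size=d)

norm_inf = np.max(np.abs(x))
norm_p = np.sum(np.abs(x) ** p) ** (1 / p)

assert norm_inf <= norm_p <= d ** (1 / p) * norm_inf
print(norm_inf, norm_p, d ** (1 / p) * norm_inf)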

Figure 2: A Brownian sheet

As an example application of theorem 2, we can construct generalisations of Brownian motion varying over a multidimensional index set. The 2-dimensional case is called a Brownian sheet, and a sample is plotted in figure 2 above. This can represent a continuous random path, which itself varies randomly over time. Such processes may be used to build models of interest rates where, at any moment in time, we have an entire yield curve representing the interest rates for all maturities, and these also vary randomly over time.

Lemma 3 For each positive integer d, there exists a zero mean Gaussian stochastic process {\{W_t\colon t\in{\mathbb R}_+^d\}} with covariance

\displaystyle  {\mathbb E}[W_sW_t]=\prod_{i=1}^d s_i\wedge t_i,

for all {s,t\in{\mathbb R}_+^d}. This has a continuous modification which, with probability one, is locally {\gamma}-Hölder continuous for all {\gamma < 1/2}.

Proof: I make use of the standard result that, for any (real) inner product space V, we can define a joint normal collection of random variables {X(v)}, over {v\in V}, with zero mean and such that {{\mathbb E}[X(u)X(v)]=\langle u,v\rangle} for all {u,v\in V}. In fact, joint normal variables can be defined for any positive semidefinite covariance matrix, which applies here since,

\displaystyle  \sum_{i,j=1}^nc_ic_j{\mathbb E}[X(v_i)X(v_j)]=\left\langle\sum_{i=1}^nc_iv_i,\sum_{j=1}^nc_jv_j\right\rangle\ge0.

Take V to be {L^2({\mathbb R}_+^d,\lambda)} with {\lambda} being the Lebesgue measure. Define,

\displaystyle  W_t=X(1_{[0,t)})

for all {t\in{\mathbb R}_+^d}, with {[0,t)} denoting the set of all {s\in{\mathbb R}_+^d} with {s_i < t_i} ({i=1,\ldots,d}). We then have,

\displaystyle  \begin{aligned} {\mathbb E}[W_sW_t]&=\int 1_{[0,s)}1_{[0,t)}d\lambda\\ &=\int\cdots\int 1_{\{u_1 < s_1\wedge t_1,\ldots,u_d < s_d\wedge t_d\}}du_1\cdots du_d\\ &=\prod_{i=1}^d s_i\wedge t_i \end{aligned}

as required.

It remains to show that W has a modification with the stated Hölder continuity and, for this, it is sufficient to prove the result for index t restricted to bounded sets of the form {[0,T]^d}, as the full result will follow by letting T increase to infinity. For {s,t\in[0,T]^d},

\displaystyle  \begin{aligned} {\mathbb E}[(W_s-W_t)^2]&= \int\left(1_{[0,s)}-1_{[0,t)}\right)^2d\lambda\\ &\le\sum_{i=1}^d\int \prod_{j\not=i}1_{\{u_j < T\}}1_{\{s_i\wedge t_i\le u_i\le s_i\vee t_i\}}d\lambda(u)\\ &=T^{d-1}\sum_{i=1}^d\lvert t_i-s_i\rvert\\ &=T^{d-1}\lVert t-s\rVert_1. \end{aligned}

Hence, for any fixed {\alpha > 0}, there exists a constant C satisfying

\displaystyle  {\mathbb E}[\lvert W_s-W_t\rvert^\alpha]\le C\lVert t-s\rVert_1^{\alpha/2}.

Theorem 2 can be applied so long as {\alpha > 2d}. In this case, we take {\beta=\alpha/2-d} and see that the continuous modification is {\gamma}-Hölder continuous for all {\gamma < 1/2-d/\alpha}. Letting {\alpha} increase to infinity gives the result. ⬜
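To visualise lemma 3 in the case {d=2}, the following sketch, added here as an illustration and not part of the proof, simulates a Brownian sheet on a regular grid: increments over disjoint grid cells are independent centred Gaussians with variance equal to the Lebesgue measure of the cell, so the values at grid points are cumulative sums of these increments. The function name and grid size are arbitrary.

import numpy as np

def brownian_sheet(n=200, T=1.0, seed=0):
    """Illustrative sketch: simulate the d = 2 process of lemma 3 (a Brownian
    sheet) at the grid points (i*h, j*h) with h = T/n.  Increments over
    disjoint grid cells are independent N(0, h^2) -- variance equal to the
    Lebesgue measure of the cell -- and W at a grid point is the sum of the
    increments of all cells below and to the left of it."""
    h = T / n
    rng = np.random.default_rng(seed)
    increments = rng.normal(0.0, h, size=(n, n))  # std = sqrt(h * h) = h
    W = np.cumsum(np.cumsum(increments, axis=0), axis=1)
    return np.pad(W, ((1, 0), (1, 0)))  # zero boundary: W(0, t) = W(s, 0) = 0

W = brownian_sheet()
print(W.shape, W[-1, -1])  # W(T, T) is a single N(0, T^2) sample here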


Proof of the Continuity Theorem

To show that a process {\{U_x\}_{x\in S}} is Hölder continuous, we need to bound

\displaystyle  \sup\left\{d(U_x,U_y)\colon x,y\in S,\ \lVert x-y\rVert < \epsilon\right\}.

In particular, this should be bounded by a multiple of {\epsilon^\gamma} for small {\epsilon}. This depends on the joint distribution of {d(U_x,U_y)} as x and y vary so, if all that we are given to work with is the inequality (1) for the individual distributions, then it is not easy to obtain a good bound. One, rather extreme, upper bound on a set of nonnegative real numbers is given by their sum. This does at least allow us to make use of the linearity of expectation,

\displaystyle  \begin{aligned} {\mathbb E}\left[\sup_{\lVert x-y\rVert < \epsilon}d(U_x,U_y)^\alpha\right] &\le\sum_{\lVert x-y\rVert < \epsilon}{\mathbb E}\left[d(U_x,U_y)^\alpha\right]\\ &\le\sum_{\lVert x-y\rVert < \epsilon}C\epsilon^{d+\beta}. \end{aligned}

In cases of interest, the set S will be infinite, and the sum on the right hand side will contain infinitely many terms, so will diverge. As it is, this is not much help. However, if we restrict x and y to lie on a regular grid whose spacing is of order {\epsilon}, this idea does lead to useful bounds. Then, combining with the triangle inequality to split {d(U_x,U_y)} as a finite sum of terms like {d(U_{x^\prime},U_{y^\prime})} for pairs {(x^\prime,y^\prime)} lying on such grids, we can obtain reasonable bounds for more general points x and y. Choosing grids of spacing {2^{-n}} for integer n works well. This leads to considering the restriction of {U_x} to dyadic points {x=(x_1,\ldots,x_d)} where each {x_k} is of the form {2^{-n}a} for integer a. As I stated theorem 2 in a rather general form, where S can be any subset of {\mathbb R^d} not necessarily including the dyadic points, this adds a slight complication. However, it is easily resolved by approximating the dyadic grid points by elements of S instead, and we obtain a proof of Hölder continuity on a dense set of points.
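The dyadic approximation scheme can be made concrete with a short sketch, added here as an illustration with a hypothetical helper name: for {x\in[0,1)^d}, the level-k dyadic grid point below x is obtained componentwise as {\lfloor 2^kx\rfloor/2^k}, and these approximations are nested across levels, which is exactly what the chaining argument in the proof of lemma 4 below exploits.

import numpy as np

def dyadic_chain(x, n, m):
    """Illustrative sketch: for x in [0, 1)^d, return the dyadic grid points at
    levels n, ..., m whose cube of side 2^-k contains x; componentwise, the
    level-k point is floor(2^k * x) / 2^k.  Successive points are nested (the
    level-(k+1) point lies in the level-k cube)."""
    x = np.asarray(x, dtype=float)
    return [np.floor(x * 2.0 ** k) / 2.0 ** k for k in range(n, m + 1)]

# Dyadic approximations of a point in [0, 1)^2 at levels 0 to 4.
for level, xk in zip(range(0, 5), dyadic_chain([0.3, 0.71], 0, 4)):
    print(level, xk)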

Lemma 4 Let E be a separable complete metric space, S be a subset of the unit d-cube {[0,1)^d}, and {\{U_x\}_{x\in S}} be a collection of E-valued random variables satisfying (1).

Then, there exists a countable dense subset {\tilde S\subseteq S} such that, with probability one, {x\mapsto U_x} is {\gamma}-Hölder continuous on {\tilde S} for all {\gamma < \beta/\alpha}. Furthermore, for each such {\gamma}, the {\gamma}-Hölder coefficient {C_\gamma} satisfies {{\mathbb E}[C_\gamma^\alpha] < \infty}.

Proof: For each nonnegative integer n, we let {\mathbb D_n} denote the set of dyadic numbers of the form {x=2^{-n}a} for integer {0\le a < 2^n}. Also, let {I_n(x)} be the dyadic interval {[x,x+2^{-n})}.

Moving from 1 dimension to d dimensions, we let {\mathbb D^d_n} be the collection of {x=(x_1,\ldots,x_d)} such that each {x_k} is in {\mathbb D_n}, and write

\displaystyle  I_n(x)=I_n(x_1)\times\cdots\times I_n(x_d)\subseteq[0,1)^d.

We note that, for each n, these sets form a partition of {[0,1)^d}. Let {\mathbb{\tilde D}_n^d} denote the set of {x\in\mathbb D_n^d} such that {I_n(x)} has nonempty intersection with S and, for any such x, choose {\theta_n(x)\in I_n(x)\cap S}. Then, define the finite subset of S,

\displaystyle  S_n=\left\{\theta_n(x)\colon x\in\mathbb{\tilde D}_n^d\right\}.

Note that, whatever the choice of {\theta_n(x)}, it will lie in {I_{n+1}(y)} for some {y\in\mathbb{\tilde D}_{n+1}^d}. Hence, {\theta_{n+1}(y)} can be chosen to equal {\theta_n(x)} and, by doing this, we ensure that {S_n\subseteq S_{n+1}}. We take {\tilde S=\bigcup_{n=0}^\infty S_n}. This is easily seen to be dense in S. Consider {x\in S}, which will be contained in {I_n(y)} for some {y\in\mathbb{\tilde D}_n^d}. Then {\tilde y=\theta_n(y)} is in {\tilde S} and {\lVert x-\tilde y\rVert_\infty < 2^{-n}}. As n can be as large as we like, this shows that {\tilde S} is dense.

Consider the random variables,

\displaystyle  \begin{aligned} &Y_n=\sup\left\{d(U_{\theta_n(x)},U_{\theta_{n+1}(y)})\colon x\in\mathbb{\tilde D}_n^d,\ y\in\mathbb{\tilde D}_{n+1}^d\cap I_n(x)\right\},\\ &Z_n=\sup\left\{d(U_{\theta_n(x)},U_{\theta_n(y)})\colon x,y\in\mathbb{\tilde D}_n^d,\ \lVert x-y\rVert_\infty\le2^{-n}\right\}. \end{aligned}

We note that there are at most {2^{nd}} possible values of {x\in\mathbb{\tilde D}_n^d}. Then, there are at most {2^d} possible values for y in {\mathbb{\tilde D}_{n+1}^d\cap I_n(x)} and, then, {\lVert\theta_n(x)-\theta_{n+1}(y)\rVert_\infty < 2^{-n}}. Similarly, if {y\in\mathbb{\tilde D}_n^d} and {\lVert x-y\rVert_\infty\le2^{-n}}, then {x_k-y_k} is equal to {\pm2^{-n}} or 0, for each {k=1,\ldots,d}. Hence, there are at most {3^d} possible values for y and, then, {\lVert \theta_n(x)-\theta_n(y)\rVert_\infty < 2^{1-n}}. We suppose that inequality (1) holds using the {L^\infty} norm, so that,

\displaystyle  \begin{aligned} {\mathbb E}\left[Y_n^\alpha\right] &\le\sum_{ x\in\mathbb{\tilde D}_n^d,\ y\in\mathbb{\tilde D}_{n+1}^d\cap I_n(x) }{\mathbb E}\left[d(U_{\theta_n(x)},U_{\theta_{n+1}(y)})^\alpha\right]\\ &\le C2^{nd}2^d2^{-n(d+\beta)}=C2^d2^{-n\beta},\\ {\mathbb E}\left[Z_n^\alpha\right] &\le\sum_{x,y\in\mathbb{\tilde D}_n^d,\ \lVert x-y\rVert_\infty\le2^{-n}}{\mathbb E}\left[d(U_{\theta_n(x)},U_{\theta_n(y)})^\alpha\right]\\ &\le C2^{nd}3^d2^{(1-n)(d+\beta)} =C6^d2^{-(n-1)\beta}. \end{aligned}

Consequently, for {\gamma < \beta/\alpha}, if we set {\tilde Y_n=2^{n\gamma}Y_n} and {\tilde Z_n=2^{n\gamma}Z_n} then,

\displaystyle  \begin{aligned} &{\mathbb E}\left[\tilde Y_n^\alpha\right]\le C2^d2^{-n(\beta-\alpha\gamma)},\\ &{\mathbb E}\left[\tilde Z_n^\alpha\right]\le C6^d2^{\beta-n(\beta-\alpha\gamma)}. \end{aligned}

These bounds are geometric in n, so {\sum_n{\mathbb E}[\tilde Y_n^\alpha]} and {\sum_n{\mathbb E}[\tilde Z_n^\alpha]} are finite. For distinct {x,y\in\tilde S}, choose integer {n\ge0} with

\displaystyle  2^{-(n+1)} < \lVert x-y\rVert_\infty\le2^{-n}.

Also, choose {m\ge n} large enough that x and y are both in {S_m}. For each integer {n\le k\le m} choose {x_k,y_k\in\mathbb{D}_k^d} such that {x\in I_k(x_k)} and {y\in I_k(y_k)}. By construction, {\theta_m(x_m)=x} and {x_{k+1}\in I_k(x_k)}, and similarly for y. Furthermore, {\lVert x_n-y_n\rVert_\infty\le2^{-n}}. Hence, by the triangle inequality,

\displaystyle  \begin{aligned} d(U_x,U_y) &\le d(U_{\theta_n(x_n)},U_{\theta_n(y_n)})\\ &\quad +\sum_{k=n}^{m-1}\left(d(U_{\theta_k(x_k)},U_{\theta_{k+1}(x_{k+1})})+d(U_{\theta_k(y_k)},U_{\theta_{k+1}(y_{k+1})})\right)\\ &\le Z_n+2 \sum_{k=n}^{m-1}Y_k\\ &\le2^{-n\gamma}\left(\tilde Z_n+2\sum_{k=n}^\infty2^{-(k-n)\gamma}\tilde Y_k\right)\\ &\le2^\gamma\lVert x-y\rVert_\infty^\gamma\left(\tilde Z_n+2(1-2^{-\gamma})^{-1}\sup_{k\ge n}\tilde Y_k\right). \end{aligned}

Hence, the {\gamma}-Hölder coefficient on {\tilde S} satisfies,

\displaystyle  C_\gamma\le 2^{\gamma}\sup_{n}\tilde Z_n+2^{\gamma+1}(1-2^{-\gamma})^{-1}\sup_{n}\tilde Y_n.

Raising to the power of {\alpha}, using the inequality {(a+b)^\alpha\le2^\alpha(a^\alpha+b^\alpha)} and bounding each supremum by the corresponding sum, gives

\displaystyle  C_\gamma^\alpha \le 2^{\alpha(1+\gamma)}\sum_n\tilde Z_n^\alpha+2^{\alpha(2+\gamma)}(1-2^{-\gamma})^{-\alpha}\sum_n\tilde Y_n^\alpha,

which has finite expectation, as required. ⬜

That is the hard part of the proof out of the way. Constructing the continuous modification over bounded index sets is straightforward.

Lemma 5 Let E be a separable and complete metric space, {S\subseteq{\mathbb R}^d} be bounded, and {\{U_x\}_{x\in S}} be a collection of E-valued random variables satisfying inequality (1). Then {U_x} has a continuous modification. Furthermore, with probability one, this modification is {\gamma}-Hölder continuous for all {0 < \gamma < \beta/\alpha}, and the Hölder coefficient satisfies {{\mathbb E}[C_\gamma^\alpha] < \infty}.

Proof: By translating and scaling, if necessary, we can suppose without loss of generality that S is contained in the unit d-cube {[0,1)^d}. Then, applying lemma 4, there exists a countable dense subset {\tilde S} on which, for all {\gamma < \beta/\alpha}, the {\gamma}-Hölder coefficient {C_\gamma} satisfies {{\mathbb E}[C_\gamma^\alpha] < \infty}. We can let {A\subseteq\Omega} be the event on which {C_\gamma < \infty} for all {\gamma < \beta/\alpha} and, then, choosing any fixed {e\in E}, define the modification,

\displaystyle  \tilde U_x(\omega)=\begin{cases} \lim_{\substack{y\rightarrow x\\ y\in\tilde S}}U_y(\omega),&{\rm if\ }\omega\in A,\\ e,&{\rm otherwise}. \end{cases}

On the event A, {y\mapsto U_y} is uniformly continuous on {\tilde S} and E is complete, so the limit over y exists and defines a {\gamma}-Hölder continuous map on S with coefficient {C_\gamma}. It only remains to show that this is indeed a modification. So, choosing {x\in S} and a sequence {x_n\in\tilde S} tending to {x}, Fatou’s lemma gives,

\displaystyle  \begin{aligned} {\mathbb E}[d(U_x,\tilde U_x)^\alpha] &\le\liminf_{n\rightarrow\infty}{\mathbb E}[d(U_x,U_{x_n})^\alpha]\\ &\le\lim_{n\rightarrow\infty}C\lVert x-x_n\rVert^{d+\beta}\\ &=0. \end{aligned}

Hence {d(U_x,\tilde U_x)=0} and, so, {\tilde U_x=U_x} almost surely. ⬜

Completing the proof by extending to unbounded index sets is now almost a formality.

Proof of Theorem 2: For each positive integer n, let {S_n} be the set of {x\in S} with {\lVert x\rVert\le n}. This is an increasing sequence of bounded subsets of S, which eventually contains any given bounded subset. Lemma 5 provides continuous modifications {\{U^n_x\}_{x\in S_n}} which, furthermore, are {\gamma}-Hölder continuous for each {\gamma < \beta/\alpha}. It is standard that, up to almost-sure equivalence, there can be at most one continuous modification on each set. To be precise, choosing countable dense subsets {\tilde S_n\subseteq S_n}, the set {A_n} of all {\omega\in\Omega} for which {U^n_x(\omega)=U^{n+1}_x(\omega)} over {x\in\tilde S_n} is measurable and has probability one. Furthermore, by continuity, {U^n(\omega)} and {U^{n+1}(\omega)} agree on all of {S_n}, for all {\omega\in A_n}. Hence, {A=\bigcap_nA_n} has probability one and, fixing any {e\in E}, the required global modification is given by

\displaystyle  \tilde U_x(\omega) = \begin{cases} U^n_x(\omega),&{\rm for\ }\omega\in A,{\rm\ and\ }x\in S_n,\\ e,&{\rm for\ }\omega\not\in A. \end{cases}

This is clearly {\gamma}-Hölder continuous on any bounded subset of S, since it either agrees with {U^n} for sufficiently large n or is constant. ⬜
