# The Kolmogorov Continuity Theorem

One of the common themes throughout the theory of continuous-time stochastic processes is the importance of choosing good versions of processes. Specifying the finite-dimensional distributions of a process is not sufficient to determine its sample paths so, if a continuous modification exists, then it makes sense to work with that. A relatively straightforward criterion ensuring the existence of a continuous version is provided by Kolmogorov’s continuity theorem.

For any positive real number ${\gamma}$, a map ${f\colon E\rightarrow F}$ between metric spaces E and F is said to be ${\gamma}$-Hölder continuous if there exists a positive constant C satisfying

$\displaystyle d(f(x),f(y))\le Cd(x,y)^\gamma$

for all ${x,y\in E}$. Hölder continuous functions are always continuous and, at least on bounded spaces, Hölder continuity is a stronger property for larger values of the exponent ${\gamma}$. So, if E is a bounded metric space and ${\alpha\le\beta}$, then every ${\beta}$-Hölder continuous map from E is also ${\alpha}$-Hölder continuous. Note also that 1-Hölder continuity is the same as Lipschitz continuity.
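
To see why boundedness gives this implication, here is a short sketch, writing ${D}$ for the diameter of E (a symbol introduced only for this remark). If f is ${\beta}$-Hölder continuous with constant C and ${\alpha\le\beta}$ then, for all ${x,y\in E}$,

$\displaystyle d(f(x),f(y))\le Cd(x,y)^{\beta}=Cd(x,y)^{\beta-\alpha}d(x,y)^{\alpha}\le CD^{\beta-\alpha}d(x,y)^{\alpha},$

so f is ${\alpha}$-Hölder continuous with constant ${CD^{\beta-\alpha}}$.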

Kolmogorov’s theorem gives simple conditions on the pairwise distributions of a process which guarantee the existence of a continuous modification and, in addition, states that the sample paths ${t\mapsto X_t}$ of this modification are almost surely locally Hölder continuous. That is, they are almost surely Hölder continuous on every bounded interval. To start with, we look at real-valued processes. Throughout this post, we work with respect to a probability space ${(\Omega,\mathcal F, {\mathbb P})}$. There is no need to assume the existence of any filtration, since filtrations play no part in the results here.

Theorem 1 (Kolmogorov) Let ${\{X_t\}_{t\ge0}}$ be a real-valued stochastic process such that there exist positive constants ${\alpha,\beta,C}$ satisfying

$\displaystyle {\mathbb E}\left[\lvert X_t-X_s\rvert^\alpha\right]\le C\lvert t-s\rvert^{1+\beta},$

for all ${s,t\ge0}$. Then, X has a continuous modification which, with probability one, is locally ${\gamma}$-Hölder continuous for all ${0 < \gamma < \beta/\alpha}$.
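
As an illustration (the standard example, not part of the statement above), take X to be a standard Brownian motion W. Then ${W_t-W_s}$ is normal with mean zero and variance ${\lvert t-s\rvert}$ so, for each positive integer n,

$\displaystyle {\mathbb E}\left[\lvert W_t-W_s\rvert^{2n}\right]=(2n-1)!!\,\lvert t-s\rvert^{n}.$

For ${n\ge2}$, the hypothesis of Theorem 1 holds with ${\alpha=2n}$, ${\beta=n-1}$ and ${C=(2n-1)!!}$, giving local ${\gamma}$-Hölder continuity for all ${\gamma < (n-1)/(2n)=1/2-1/(2n)}$. Letting n increase, the continuous modification is locally ${\gamma}$-Hölder continuous for every ${\gamma < 1/2}$.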

# Pathwise Burkholder-Davis-Gundy Inequalities

As covered earlier in my notes, the Burkholder-Davis-Gundy inequality relates the moments of the maximum of a local martingale M with its quadratic variation,

 $\displaystyle c_p^{-1}{\mathbb E}[[M]^{p/2}_\tau]\le{\mathbb E}[\bar M_\tau^p]\le C_p{\mathbb E}[[M]^{p/2}_\tau].$ (1)

Here, ${\bar M_t\equiv\sup_{s\le t}\lvert M_s\rvert}$ is the running maximum, ${[M]}$ is the quadratic variation, ${\tau}$ is a stopping time, and the exponent ${p}$ is a real number greater than or equal to 1. Then, ${c_p}$ and ${C_p}$ are positive constants depending on p, but independent of the choice of local martingale and stopping time. Furthermore, for continuous local martingales, which are the focus of this post, the inequality holds for all ${p > 0}$.

Since the quadratic variation used in my notes, by definition, starts at zero, the BDG inequality also required the local martingale to start at zero. This is not an important restriction, but it can be removed by requiring the quadratic variation to start at ${[M]_0=M_0^2}$. Henceforth, I will assume that this is the case, which means that if we are working with the definition in my notes then we should add ${M_0^2}$ everywhere to the quadratic variation ${[M]}$.

In keeping with the theme of the previous post on Doob’s inequalities, such martingale inequalities should have pathwise versions of the form

 $\displaystyle c_p^{-1}[M]^{p/2}+\int\alpha dM\le\bar M^p\le C_p[M]^{p/2}+\int\beta dM$ (2)

for predictable processes ${\alpha,\beta}$. Inequalities in this form are considerably stronger than (1), since they apply on all sample paths, not just on average. Also, we do not require M to be a local martingale — it is sufficient to be a (continuous) semimartingale. However, in the case where M is a local martingale, the pathwise version (2) does imply the BDG inequality (1), using the fact that stochastic integration preserves the local martingale property.

Lemma 1 Let X and Y be nonnegative increasing measurable processes satisfying ${X\le Y-N}$ for a local (sub)martingale N starting from zero. Then, ${{\mathbb E}[X_\tau]\le{\mathbb E}[Y_\tau]}$ for all stopping times ${\tau}$.

Proof: Let ${\tau_n}$ be an increasing sequence of bounded stopping times increasing to infinity such that the stopped processes ${N^{\tau_n}}$ are submartingales. Then,

$\displaystyle {\mathbb E}[1_{\{\tau_n\ge\tau\}}X_\tau]\le{\mathbb E}[X_{\tau_n\wedge\tau}]\le{\mathbb E}[Y_{\tau_n\wedge\tau}]-{\mathbb E}[N_{\tau_n\wedge\tau}]\le{\mathbb E}[Y_{\tau_n\wedge\tau}]\le{\mathbb E}[Y_\tau].$

Letting n increase to infinity and using monotone convergence on the left hand side gives the result. ⬜

Moving on to the main statements of this post, I will mention that there are actually many different pathwise versions of the BDG inequalities. I opt for the especially simple statements given in Theorem 2 below. See the papers Pathwise Versions of the Burkholder-Davis-Gundy Inequality by Beiglböck and Siorpaes, and Applications of Pathwise Burkholder-Davis-Gundy Inequalities by Siorpaes, for slightly different approaches, although these papers do also effectively contain proofs of (3,4) for the special case of ${r=1/2}$. As usual, I am using ${x\vee y}$ to represent the maximum of two numbers.

Theorem 2 Let X and Y be nonnegative continuous processes with ${X_0=Y_0}$. For any ${0 < r\le1}$ we have,

 $\displaystyle (1-r)\bar X^r\le (3-2r)\bar Y^r+r\int(\bar X\vee\bar Y)^{r-1}d(X-Y)$ (3)

and, if X is increasing, this can be improved to,

 $\displaystyle \bar X^r\le (2-r)\bar Y^r+r\int(\bar X\vee\bar Y)^{r-1}d(X-Y).$ (4)

If ${r\ge1}$ and X is increasing then,

 $\displaystyle \bar X^r\le r^{r\vee 2}\,\bar Y^r+r^2\int(\bar X\vee\bar Y)^{r-1}d(X-Y).$ (5)
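
As a sketch of how Theorem 2 connects to (2), and not necessarily the route taken in the full proof, consider a continuous semimartingale M and take ${X=[M]}$ and ${Y=M^2}$ with ${r=p/2}$ for ${0 < p\le2}$. With the convention ${[M]_0=M_0^2}$, these satisfy ${X_0=Y_0}$, X is increasing, ${\bar Y=\bar M^2}$, and Ito’s formula gives

$\displaystyle M^2=[M]+2\int M\,dM,$

so that ${d(X-Y)=-2M\,dM}$. Substituting into (4),

$\displaystyle [M]^{p/2}\le(2-p/2)\bar M^p-p\int([M]\vee\bar M^2)^{p/2-1}M\,dM,$

which, after rearranging, has the form of the left hand inequality of (2). This ignores the question of whether the integrand is well-defined when ${p < 2}$ and M starts at zero, which needs separate care.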

# Pathwise Martingale Inequalities

Recall Doob’s inequalities, covered earlier in these notes, which bound expectations of functions of the maximum of a martingale in terms of its terminal distribution. Although these are often applied to martingales, they hold true more generally for cadlag submartingales. Here, I use ${\bar X_t\equiv\sup_{s\le t}X_s}$ to denote the running maximum of a process.

Theorem 1 Let X be a nonnegative cadlag submartingale. Then,

• ${{\mathbb P}\left(\bar X_t \ge K\right)\le K^{-1}{\mathbb E}[X_t]}$ for all ${K > 0}$.
• ${\lVert\bar X_t\rVert_p\le (p/(p-1))\lVert X_t\rVert_p}$ for all ${p > 1}$.
• ${{\mathbb E}[\bar X_t]\le(e/(e-1)){\mathbb E}[X_t\log X_t+1]}$.

In particular, if X is a cadlag martingale then ${\lvert X\rvert}$ is a submartingale, so theorem 1 applies with ${\lvert X\rvert}$ in place of X.

We also saw the following much stronger (sub)martingale inequality in the post on the maximum maximum of martingales with known terminal distribution.

Theorem 2 Let X be a cadlag submartingale. Then, for any real K and nonnegative real t,

 $\displaystyle {\mathbb P}(\bar X_t\ge K)\le\inf_{x < K}\frac{{\mathbb E}[(X_t-x)_+]}{K-x}.$ (1)

This is particularly sharp, in the sense that for any distribution for ${X_t}$, there exists a martingale with this terminal distribution for which (1) becomes an equality simultaneously for all values of K. Furthermore, all of the inequalities stated in theorem 1 follow from (1). For example, the first one is obtained by taking ${x=0}$ in (1). The remaining two can also be proved from (1) by integrating over K.
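
To spell out the first of these: if X is nonnegative and ${K > 0}$ then ${x=0}$ is an admissible choice in the infimum, and ${(X_t-0)_+=X_t}$, so (1) gives

$\displaystyle {\mathbb P}(\bar X_t\ge K)\le\frac{{\mathbb E}[(X_t)_+]}{K}=K^{-1}{\mathbb E}[X_t].$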

Note that all of the submartingale inequalities above are of the form

 $\displaystyle {\mathbb E}[F(\bar X_t)]\le{\mathbb E}[G(X_t)]$ (2)

for certain choices of functions ${F,G\colon{\mathbb R}\rightarrow{\mathbb R}^+}$. The aim of this post is to show how they have a more general ‘pathwise’ form,

 $\displaystyle F(\bar X_t)\le G(X_t) - \int_0^t\xi\,dX$ (3)

for some nonnegative predictable process ${\xi}$. It is relatively straightforward to show that (2) follows from (3) by noting that the integral is a submartingale and, hence, has nonnegative expectation. To be rigorous, there are some integrability considerations to deal with, so a proof will be included later in this post.

Inequality (3) is required to hold almost everywhere, and not just in expectation, so is a considerably stronger statement than the standard martingale inequalities. Furthermore, it is not necessary for X to be a submartingale for (3) to make sense, as it holds for all semimartingales. We can go further, and even drop the requirement that X is a semimartingale. As we will see, in the examples covered in this post, ${\xi_t}$ will be of the form ${h(\bar X_{t-})}$ for an increasing right-continuous function ${h\colon{\mathbb R}\rightarrow{\mathbb R}}$, so integration by parts can be used,

 $\displaystyle \int h(\bar X_-)\,dX = h(\bar X)X-h(\bar X_0)X_0 - \int X\,dh(\bar X).$ (4)

The right hand side of (4) is well-defined for any cadlag real-valued process, by using the pathwise Lebesgue–Stieltjes integral with respect to the increasing process ${h(\bar X)}$, so can be used as the definition of ${\int h(\bar X_-)dX}$. In the case where X is a semimartingale, integration by parts ensures that this agrees with the stochastic integral ${\int\xi\,dX}$. Since we now have an interpretation of (3) in a pathwise sense for all cadlag processes X, it is no longer required to suppose that X is a submartingale, a semimartingale, or even to require the existence of an underlying probability space. All that is necessary is for ${t\mapsto X_t}$ to be a cadlag real-valued function. Hence, the martingale inequalities reduce to straightforward results of real analysis which do not require any probability theory and, consequently, are much more general. I state the precise pathwise generalizations of Doob’s inequalities now, leaving the proof until later in the post. As the first inequality of theorem 1 is just the special case of (1) with ${x=0}$, we do not need to explicitly include it here.

Theorem 3 Let X be a cadlag process and t be a nonnegative time.

1. For real ${K > x}$,
 $\displaystyle 1_{\{\bar X_t\ge K\}}\le\frac{(X_t-x)_+}{K-x}-\int_0^t\xi\,dX$ (5)

where ${\xi=(K-x)^{-1}1_{\{\bar X_-\ge K\}}}$.

2. If X is nonnegative and p,q are positive reals with ${p^{-1}+q^{-1}=1}$ then,
 $\displaystyle \bar X_t^p\le q^p X^p_t-\int_0^t\xi dX$ (6)

where ${\xi=pq\bar X_-^{p-1}}$.

3. If X is nonnegative then,
 $\displaystyle \bar X_t\le\frac{e}{e-1}\left( X_t \log X_t +1\right)-\int_0^t\xi\,dX$ (7)

where ${\xi=\frac{e}{e-1}\log(\bar X_-\vee1)}$.
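
Since Theorem 3 is a statement about individual paths, it can be sanity-checked numerically. The following is a minimal sketch, not taken from the post, which tests (5) and (6) on discrete-time paths. A finite sequence is treated as a piecewise-constant cadlag path, so the integral reduces to the sum of ${\xi_k(X_k-X_{k-1})}$ with ${\xi_k}$ evaluated at the running maximum up to step ${k-1}$. The function names, the reflected random walk, and the parameter values are all illustrative assumptions.

```python
import random

def running_max(path):
    """Running maximum of a finite path."""
    out, m = [], float("-inf")
    for v in path:
        m = max(m, v)
        out.append(m)
    return out

def stieltjes_sum(h, path):
    """Discrete analogue of int_0^t h(Xbar_-) dX:
    sum over k >= 1 of h(Xbar_{k-1}) * (X_k - X_{k-1})."""
    xbar = running_max(path)
    return sum(h(xbar[k - 1]) * (path[k] - path[k - 1])
               for k in range(1, len(path)))

def gap_5(path, K, x):
    """Right side minus left side of (5); expected >= 0 for K > x."""
    lhs = 1.0 if max(path) >= K else 0.0
    integral = stieltjes_sum(lambda m: (1.0 if m >= K else 0.0) / (K - x), path)
    return max(path[-1] - x, 0.0) / (K - x) - integral - lhs

def gap_6(path, p):
    """Right side minus left side of (6); expected >= 0 for a
    nonnegative path and p > 1, with q the conjugate exponent."""
    q = p / (p - 1.0)
    integral = stieltjes_sum(lambda m: p * q * m ** (p - 1.0), path)
    return q ** p * path[-1] ** p - integral - max(path) ** p

def random_walk(n, rng):
    """Hypothetical test path: Gaussian steps, reflected to stay nonnegative."""
    path = [rng.uniform(0.0, 1.0)]
    for _ in range(n):
        path.append(abs(path[-1] + rng.gauss(0.0, 1.0)))
    return path

rng = random.Random(0)
paths = [random_walk(50, rng) for _ in range(200)]
worst_5 = min(gap_5(w, K=2.0, x=0.5) for w in paths)
worst_6 = min(gap_6(w, p=2.0) for w in paths)
# Allow a small tolerance for floating-point rounding.
print(worst_5 >= -1e-9, worst_6 >= -1e-9)
```

No martingale property is used anywhere in the check, which is the point: the inequalities are required to hold for every path, so a single violating path would indicate an error in the transcription of the formulas.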

# Semimartingale Local Times

For a stochastic process X taking values in a state space E, its local time at a point ${x\in E}$ is a measure of the time spent at x. For a continuous time stochastic process, we could try and simply compute the Lebesgue measure of the time at the level,

 $\displaystyle L^x_t=\int_0^t1_{\{X_s=x\}}ds.$ (1)

For processes which hit the level ${x}$ and stick there for some time, this makes some sense. However, if X is a standard Brownian motion, it will always give zero, so is not helpful. Even though X will hit every real value infinitely often, continuity of the normal distribution gives ${{\mathbb P}(X_s=x)=0}$ at each positive time, so that ${L^x_t}$ defined by (1) will have zero expectation.

Rather than the indicator function of ${\{X=x\}}$ as in (1), an alternative is to use the Dirac delta function,

 $\displaystyle L^x_t=\int_0^t\delta(X_s-x)\,ds.$ (2)

Unfortunately, the Dirac delta is not a true function, it is a distribution, so (2) is not a well-defined expression. However, if it can be made rigorous, then it does seem to have some of the properties we would want. For example, the expectation ${{\mathbb E}[\delta(X_s-x)]}$ can be interpreted as the probability density of ${X_s}$ evaluated at ${x}$, which has a positive and finite value, so it should lead to positive and finite local times. Equation (2) still relies on the Lebesgue measure over the time index, so will not behave as we may expect under time changes, and will not make sense for processes without a continuous probability density. A better approach is to integrate with respect to the quadratic variation,

 $\displaystyle L^x_t=\int_0^t\delta(X_s-x)d[X]_s$ (3)

which, for Brownian motion, amounts to the same thing. Although (3) is still not a well-defined expression, since it still involves the Dirac delta, the idea is to come up with a definition which amounts to the same thing in spirit. Important properties that it should satisfy are that it is an adapted, continuous and increasing process with increments supported on the set ${\{X=x\}}$,

 $\displaystyle L^x_t=\int_0^t1_{\{X_s=x\}}dL^x_s.$

Local times are a very useful and interesting part of stochastic calculus, and find important applications to excursion theory, stochastic integration and stochastic differential equations. However, I have not covered this subject in my notes, so do this now. Recalling Ito’s lemma for a function ${f(X)}$ of a semimartingale X, this involves a term of the form ${\int f^{\prime\prime}(X)d[X]}$ and, hence, requires ${f}$ to be twice differentiable. If we were to try to apply the Ito formula for functions which are not twice differentiable, then ${f^{\prime\prime}}$ can be understood in terms of distributions, and delta functions can appear, which brings local times into the picture. In the opposite direction, which I take in this post, we can try to generalise Ito’s formula and invert this to give a meaning to (3).
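
As a heuristic illustration of this idea, for a continuous semimartingale X, take ${f(y)=\lvert y-x\rvert}$. In the sense of distributions, ${f^\prime(y)={\rm sgn}(y-x)}$ and ${f^{\prime\prime}(y)=2\delta(y-x)}$, so formally applying Ito’s lemma gives

$\displaystyle \lvert X_t-x\rvert=\lvert X_0-x\rvert+\int_0^t{\rm sgn}(X_s-x)\,dX_s+\int_0^t\delta(X_s-x)\,d[X]_s.$

The final term is exactly expression (3), and every other term is well-defined, so the identity can be rearranged to give a meaning to ${L^x_t}$. This form is usually known as Tanaka’s formula. It is only a sketch: the convention for ${{\rm sgn}}$ at zero, and the normalisation of the local time, vary between references.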

# A Process With Hidden Drift

Consider a stochastic process X of the form

 $\displaystyle X_t=W_t+\int_0^t\xi_sds,$ (1)

for a standard Brownian motion W and predictable process ${\xi}$, defined with respect to a filtered probability space ${(\Omega,\mathcal F,\{\mathcal F_t\}_{t\in{\mathbb R}_+},{\mathbb P})}$. For this to make sense, we must assume that ${\int_0^t\lvert\xi_s\rvert ds}$ is almost surely finite at all times, and I will suppose that ${\mathcal F_\cdot}$ is the filtration generated by W.

The question is whether the drift ${\xi}$ can be backed out from knowledge of the process X alone. As I will show with an example, this is not possible. In fact, in our example, X will itself be a standard Brownian motion, even though the drift ${\xi}$ is non-trivial (that is, ${\int\xi dt}$ is not almost surely zero). In this case X has exactly the same distribution as W, so cannot be distinguished from the driftless case with ${\xi=0}$ by looking at the distribution of X alone.

On the face of it, this seems rather counter-intuitive. By standard semimartingale decomposition, it is known that we can always decompose

 $\displaystyle X=M+A$ (2)

for a unique continuous local martingale M starting from zero, and unique continuous FV process A. By uniqueness, ${M=W}$ and ${A=\int\xi dt}$. This allows us to back out the drift ${\xi}$ and, in particular, if the drift is non-trivial then X cannot be a martingale. However, in the semimartingale decomposition, it is required that M is a martingale with respect to the original filtration ${\mathcal F_\cdot}$. If we do not know the filtration ${\mathcal F_\cdot}$, then it might not be possible to construct decomposition (2) from knowledge of X alone. As mentioned above, we will give an example where X is a standard Brownian motion which, in particular, means that it is a martingale under its natural filtration. By the semimartingale decomposition result, it is not possible for X to be an ${\mathcal F_\cdot}$-martingale. A consequence of this is that the natural filtration of X must be strictly smaller than the natural filtration of W.

The inspiration for this post was a comment by Gabe posing the following question: If we take ${\mathbb F}$ to be the filtration generated by a standard Brownian motion W in ${(\Omega,\mathcal F,{\mathbb P})}$, and we define ${\tilde W_t=W_t+\int_0^t\Theta_udu}$, can we find an ${\mathbb F}$-adapted ${\Theta}$ such that the filtration generated by ${\tilde W}$ is smaller than ${\mathbb F}$? Our example gives an affirmative answer.

# Projection in Discrete Time

It has been some time since my last post, but I am continuing now with the stochastic calculus notes on optional and predictable projection. In this post, I will go through the ideas in the discrete-time situation. All of the main concepts involved in optional and predictable projection are still present in discrete time, but the theory is much simpler. It is only in continuous time that the projection theorems really show their power, so the aim of this post is to motivate the concepts in a simple setting before generalising to the full, continuous-time situation. Ideally, this would have been published before the posts on optional and predictable projection in continuous time, so it is a bit out of sequence.

We consider time running through the discrete index set ${{\mathbb Z}^+=\{0,1,2,\ldots\}}$, and work with respect to a filtered probability space ${(\Omega,\mathcal{F},\{\mathcal{F}_n\}_{n=0,1,\ldots},{\mathbb P})}$. Then, ${\mathcal{F}_n}$ is used to represent the collection of events observable up to and including time n. Stochastic processes will all be real-valued and defined up to almost-sure equivalence. That is, processes X and Y are considered to be the same if ${X_n=Y_n}$ almost surely for each ${n\in{\mathbb Z}^+}$. The projections of a process X are defined as follows.

Definition 1 Let X be a measurable process. Then,

1. the optional projection, ${{}^{\rm o}\!X}$, exists if and only if ${{\mathbb E}[\lvert X_n\rvert\,\vert\mathcal{F}_n]}$ is almost surely finite for each n, in which case
 $\displaystyle {}^{\rm o}\!X_n={\mathbb E}[X_n\,\vert\mathcal{F}_n].$ (1)
2. the predictable projection, ${{}^{\rm p}\!X}$, exists if and only if ${{\mathbb E}[\lvert X_n\rvert\,\vert\mathcal{F}_{n-1}]}$ is almost surely finite for each n, in which case
 $\displaystyle {}^{\rm p}\!X_n={\mathbb E}[X_n\,\vert\mathcal{F}_{n-1}].$ (2)
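
On a finite probability space, these conditional expectations are just averages over the atoms of the sigma-algebras, so Definition 1 can be computed directly. The following is a minimal sketch under assumptions that are mine rather than the post's: the space consists of N fair coin tosses, ${\mathcal{F}_n}$ is generated by the first n tosses, the process is the (non-adapted) total number of heads, and ${\mathcal{F}_{-1}}$ is taken to be ${\mathcal{F}_0}$. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

N = 3  # number of fair coin tosses (illustrative choice)
OMEGA = list(product((0, 1), repeat=N))  # all outcomes, equally likely

def cond_exp(f, n):
    """E[f | F_n] as a dict on OMEGA, where F_n is generated by the
    first n tosses. Each atom is the set of outcomes sharing a prefix."""
    result = {}
    for w in OMEGA:
        atom = [v for v in OMEGA if v[:n] == w[:n]]
        result[w] = sum(Fraction(f(v)) for v in atom) / len(atom)
    return result

def X(n):
    """A non-adapted process: X_n is the total number of heads over
    all N tosses, whatever the value of n."""
    return lambda w: sum(w)

def optional_projection(n):
    """Equation (1): condition X_n on F_n."""
    return cond_exp(X(n), n)

def predictable_projection(n):
    """Equation (2): condition X_n on F_{n-1}.
    Assumption: F_{-1} is taken to be F_0."""
    return cond_exp(X(n), max(n - 1, 0))

w = (1, 0, 1)
for n in range(N + 1):
    print(n, optional_projection(n)[w], predictable_projection(n)[w])
```

Along the outcome `w`, the optional projection is the martingale given by the heads seen so far plus half the number of remaining tosses, and the predictable projection is the same martingale lagged by one step.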

# The Projection Theorems

In this post, I introduce the concept of optional and predictable projections of jointly measurable processes. Optional projections of right-continuous processes and predictable projections of left-continuous processes were constructed in earlier posts, with the respective continuity conditions used to define the projection. These are, however, just special cases of the general theory. For arbitrary measurable processes, the projections cannot be expected to satisfy any such pathwise regularity conditions. Instead, we use the measurability criteria that the projections should be, respectively, optional and predictable.

The projection theorems are a relatively straightforward consequence of optional and predictable section. However, due to the difficulty of proving the section theorems, optional and predictable projection is generally considered to be an advanced or hard part of stochastic calculus. Here, I will make use of the section theorems as stated in an earlier post, but leave the proof of those until after developing the theory of projection.

As usual, we work with respect to a complete filtered probability space ${(\Omega,\mathcal{F},\{\mathcal{F}_t\}_{t\ge0},{\mathbb P})}$, and only consider real-valued processes. Any two processes are considered to be the same if they are equal up to evanescence. The optional projection is then defined (up to evanescence) by the following.

Theorem 1 (Optional Projection) Let X be a measurable process such that ${{\mathbb E}[1_{\{\tau < \infty\}}\lvert X_\tau\rvert\;\vert\mathcal{F}_\tau]}$ is almost surely finite for each stopping time ${\tau}$. Then, there exists a unique optional process ${{}^{\rm o}\!X}$, referred to as the optional projection of X, satisfying

 $\displaystyle 1_{\{\tau < \infty\}}{}^{\rm o}\!X_\tau={\mathbb E}[1_{\{\tau < \infty\}}X_\tau\,\vert\mathcal{F}_\tau]$ (1)

almost surely, for each stopping time ${\tau}$.

Predictable projection is defined similarly.

Theorem 2 (Predictable Projection) Let X be a measurable process such that ${{\mathbb E}[1_{\{\tau < \infty\}}\lvert X_\tau\rvert\;\vert\mathcal{F}_{\tau-}]}$ is almost surely finite for each predictable stopping time ${\tau}$. Then, there exists a unique predictable process ${{}^{\rm p}\!X}$, referred to as the predictable projection of X, satisfying

 $\displaystyle 1_{\{\tau < \infty\}}{}^{\rm p}\!X_\tau={\mathbb E}[1_{\{\tau < \infty\}}X_\tau\,\vert\mathcal{F}_{\tau-}]$ (2)

almost surely, for each predictable stopping time ${\tau}$.

# Pathwise Regularity of Optional and Predictable Processes

As I have mentioned before in these notes, when working with processes in continuous time, it is important to select a good modification. Typically, this means that we work with processes which are left or right continuous. However, in general, it can be difficult to show that the paths of a process satisfy such pathwise regularity. In this post I show that for optional and predictable processes, the section theorems introduced in the previous post can be used to considerably simplify the situation. Although they are interesting results in their own right, the main application in these notes will be to optional and predictable projection. Once the projections are defined, the results from this post will imply that they preserve certain continuity properties of the process paths.

Suppose, for example, that we have a continuous-time process X which we want to show to be right-continuous. It is certainly necessary that, for any sequence of times ${t_n\in{\mathbb R}_+}$ decreasing to a limit ${t}$, ${X_{t_n}}$ almost-surely tends to ${X_t}$. However, even if we can prove this for every possible decreasing sequence ${t_n}$, it does not follow that X is right-continuous. As a counterexample, if ${\tau\colon\Omega\rightarrow{\mathbb R}}$ is any continuously distributed random time, then the process ${X_t=1_{\{t\le \tau\}}}$ is not right-continuous. However, so long as the distribution of ${\tau}$ has no atoms, X is almost-surely continuous at each fixed time t. It is remarkable, then, that if we generalise to look at sequences of stopping times, then convergence in probability along decreasing sequences of stopping times is enough to guarantee everywhere right-continuity of the process. At least, it is enough so long as we restrict consideration to optional processes.
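
To spell out the counterexample: for a fixed time t and times ${t_n}$ decreasing to t, the values ${X_{t_n}}$ and ${X_t}$ differ only on the event ${\{t\le\tau < t_n\}}$, and

$\displaystyle {\mathbb P}(t\le\tau < t_n)\rightarrow{\mathbb P}(\tau=t)=0.$

As these events are decreasing in n, this gives ${X_{t_n}\rightarrow X_t}$ almost surely. On the other hand, at the random time ${\tau}$ we have ${X_\tau=1}$ but ${X_{\tau+1/n}=0}$ for all n, so the path fails to be right-continuous at ${\tau}$. If ${\tau}$ happens to be a bounded nonnegative stopping time (an additional assumption made here for illustration), then ${\tau_n=\tau+1/n}$ is a uniformly bounded sequence of stopping times decreasing to ${\tau}$ along which ${X_{\tau_n}}$ does not converge to ${X_\tau}$.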

As usual, we work with respect to a complete filtered probability space ${(\Omega,\mathcal{F},\{\mathcal{F}_t\}_{t\ge0},{\mathbb P})}$. Two processes are considered to be the same if they are equal up to evanescence, and any pathwise property is said to hold if it holds up to evanescence. That is, a process is right-continuous if and only if it is everywhere right-continuous on a set of probability 1. All processes will be taken to be real-valued, and a process is said to have left (or right) limits if its left (or right) limits exist everywhere, up to evanescence, and are finite.

Theorem 1 Let X be an optional process. Then,

1. X is right-continuous if and only if ${X_{\tau_n}\rightarrow X_\tau}$ in probability, for each uniformly bounded sequence ${\tau_n}$ of stopping times decreasing to a limit ${\tau}$.
2. X has right limits if and only if ${X_{\tau_n}}$ converges in probability, for each uniformly bounded decreasing sequence ${\tau_n}$ of stopping times.
3. X has left limits if and only if ${X_{\tau_n}}$ converges in probability, for each uniformly bounded increasing sequence ${\tau_n}$ of stopping times.

The ‘only if’ parts of these statements are immediate, since convergence everywhere trivially implies convergence in probability. The importance of this theorem is in the ‘if’ directions. That is, it gives sufficient conditions to guarantee that the sample paths satisfy the respective regularity properties.

Note that conditions for left-continuity are absent from the statements of Theorem 1. In fact, left-continuity does not follow from the corresponding property along sequences of stopping times. Consider, for example, a Poisson process, X. This is right-continuous but not left-continuous. However, its jumps occur at totally inaccessible times. This implies that, for any sequence ${\tau_n}$ of stopping times increasing to a finite limit ${\tau}$, it is true that ${X_{\tau_n}}$ converges almost surely to ${X_\tau}$. In light of such examples, it is even more remarkable that right-continuity and the existence of left and right limits can be determined by just looking at convergence in probability along monotonic sequences of stopping times. Theorem 1 will be proven below, using the optional section theorem.

For predictable processes, we can restrict attention to predictable stopping times. In this case, we obtain a condition for left-continuity as well as for right-continuity.

Theorem 2 Let X be a predictable process. Then,

1. X is right-continuous if and only if ${X_{\tau_n}\rightarrow X_\tau}$ in probability, for each uniformly bounded sequence ${\tau_n}$ of predictable stopping times decreasing to a limit ${\tau}$.
2. X is left-continuous if and only if ${X_{\tau_n}\rightarrow X_\tau}$ in probability, for each uniformly bounded sequence ${\tau_n}$ of predictable stopping times increasing to a limit ${\tau}$.
3. X has right limits if and only if ${X_{\tau_n}}$ converges in probability, for each uniformly bounded decreasing sequence ${\tau_n}$ of predictable stopping times.
4. X has left limits if and only if ${X_{\tau_n}}$ converges in probability, for each uniformly bounded increasing sequence ${\tau_n}$ of predictable stopping times.

Again, the proof is given below, and relies on the predictable section theorem.

# Measurable Projection and the Debut Theorem

I will discuss some of the immediate consequences of the following deceptively simple looking result.

Theorem 1 (Measurable Projection) If ${(\Omega,\mathcal{F},{\mathbb P})}$ is a complete probability space and ${A\in\mathcal{B}({\mathbb R})\otimes\mathcal{F}}$ then ${\pi_\Omega(A)\in\mathcal{F}}$.

The notation ${\pi_B}$ is used to denote the projection from the cartesian product ${A\times B}$ of sets A and B onto B. That is, ${\pi_B((a,b)) = b}$. As is standard, ${\mathcal{B}({\mathbb R})}$ is the Borel sigma-algebra on the reals, and ${\mathcal{A}\otimes\mathcal{B}}$ denotes the product of sigma-algebras.

Theorem 1 seems almost obvious. Projection is a very simple map and we may well expect the projection of, say, a Borel subset of ${{\mathbb R}^2}$ onto ${{\mathbb R}}$ to be Borel. In order to formalise this, we could start by noting that sets of the form ${A\times B}$ for Borel A and B have an easily described, and measurable, projection, and the Borel sigma-algebra is the closure of the collection of such sets under countable unions and under intersections of decreasing sequences of sets. Furthermore, the projection operator commutes with taking the union of sequences of sets. Unfortunately, this method of proof falls down when looking at the limit of decreasing sequences of sets, which does not commute with projection. For example, the decreasing sequence of sets ${S_n=(0,1/n)\times{\mathbb R}\subseteq{\mathbb R}^2}$ all project onto the whole of ${{\mathbb R}}$, but their limit is empty and has empty projection.
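
In symbols, writing ${\pi}$ for the projection onto the second coordinate (notation used only for this remark), a point b lies in the projection of a union if and only if ${(a,b)}$ lies in one of the sets for some a, so unions commute with projection. For intersections, only one inclusion holds in general,

$\displaystyle \pi\left(\bigcup\nolimits_nA_n\right)=\bigcup\nolimits_n\pi(A_n),\qquad\pi\left(\bigcap\nolimits_nA_n\right)\subseteq\bigcap\nolimits_n\pi(A_n),$

and the inclusion is strict for the sets ${S_n}$, where the left hand side is empty and the right hand side is all of ${{\mathbb R}}$.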

There is an interesting history behind Theorem 1, as mentioned by Gerald Edgar on MathOverflow (1) in answer to The most interesting mathematics mistake? In a 1905 paper, Henri Lebesgue asserted that the projection of a Borel subset of the plane onto the line is again a Borel set (Lebesgue, (3), pp 191–192). This was based on the erroneous assumption that projection commutes with the limit of a decreasing sequence of sets. The mistake was spotted, in 1916, by Mikhail Suslin, and led to his investigation of analytic sets and to the beginning of the study of what is now known as descriptive set theory. See Kanamori, (2), for more details. In fact, as was shown by Suslin, projections of Borel sets need not be Borel. So, by considering the case where ${\Omega={\mathbb R}}$ and ${\mathcal{F}=\mathcal{B}({\mathbb R})}$, Theorem 1 is false if the completeness assumption is dropped. I will give a proof of Theorem 1 but, as it is a bit involved, this is left for a later post.

For now, I will state some consequences of the measurable projection theorem which are important to the theory of continuous-time stochastic processes, starting with the following. Throughout this post, the underlying probability space ${(\Omega,\mathcal{F},{\mathbb P})}$ is assumed to be complete, and stochastic processes are taken to be real-valued, or take values in the extended reals ${\bar{\mathbb R}={\mathbb R}\cup\{\pm\infty\}}$, with time index ranging over ${{\mathbb R}_+}$. As a first application, measurable projection allows us to show that the supremum of a jointly measurable process is measurable.

Lemma 2 If X is a jointly measurable process and ${S\in\mathcal{B}(\mathbb{R}_+)}$ then ${\sup_{s\in S}X_s}$ is measurable.

Proof: Setting ${U=\sup_{s\in S}X_s}$ then, for each real K, ${U > K}$ if and only if ${X_s > K}$ for some ${s\in S}$. Hence,

$\displaystyle U^{-1}\left((K,\infty]\right)=\pi_\Omega\left((S\times\Omega)\cap X^{-1}\left((K,\infty]\right)\right).$

By the measurable projection theorem, this is in ${\mathcal{F}}$ and, as sets of the form ${(K,\infty]}$ generate the Borel sigma-algebra on ${\mathbb{\bar R}}$, U is ${\mathcal{F}}$-measurable. ⬜

Next, the running maximum of a jointly measurable process is again jointly measurable.

Lemma 3 If X is a jointly measurable process then ${X^*_t\equiv\sup_{s\le t}X_s}$ is also jointly measurable.

# Predictable Projection For Left-Continuous Processes

In the previous post, I looked at optional projection. Given a non-adapted process X we construct a new, adapted, process Y by taking the expected value of ${X_t}$ conditional on the information available up until time t. I will now concentrate on predictable projection. This is a very similar concept, except that we now condition on the information available strictly before time t.

It will be assumed, throughout this post, that the underlying filtered probability space ${(\Omega,\mathcal{F},\{\mathcal{F}_t\}_{t\in{\mathbb R}_+},{\mathbb P})}$ satisfies the usual conditions, meaning that it is complete and right-continuous. This is just for convenience, as most of the results stated here extend easily to non-right-continuous filtrations. The sigma-algebra

$\displaystyle \mathcal{F}_{t-} = \sigma\left(\mathcal{F}_s\colon s < t\right)$

represents the collection of events which are observable before time t and, by convention, we take ${\mathcal{F}_{0-}=\mathcal{F}_0}$. Then, the conditional expectation of X is written as,

 $\displaystyle Y_t={\mathbb E}[X_t\;\vert\mathcal{F}_{t-}]{\rm\ \ (a.s.)}$ (1)

By definition, Y is adapted. However, at each time, (1) only defines Y up to a zero probability set. It does not determine the paths of Y, which requires specifying its values simultaneously at the uncountable set of times in ${{\mathbb R}_+}$. So, (1) does not tell us the distribution of Y at random times, and it is necessary to specify an appropriate version for Y. Predictable projection gives a uniquely defined modification satisfying (1). The full theory of predictable projection for jointly measurable processes requires the predictable section theorem. However, as I demonstrate here, in the case where X is left-continuous, predictable projection can be done by more elementary methods. The statements and most of the proofs in this post will follow very closely those given previously for optional projection. The main difference is that left and right limits are exchanged, predictable stopping times are used in place of general stopping times, and the sigma algebra ${\mathcal{F}_{t-}}$ is used in place of ${\mathcal{F}_t}$.

Stochastic processes will be defined up to evanescence, so two processes are considered to be the same if they are equal up to evanescence. In order to apply (1), some integrability requirements need to be imposed. I will use local integrability. Recall that, in these notes, a process X is locally integrable if there exists a sequence of stopping times ${\tau_n}$ increasing to infinity and such that

 $\displaystyle 1_{\{\tau_n > 0\}}\sup_{t \le \tau_n}\lvert X_t\rvert$ (2)

is integrable. This is a strong enough condition for the conditional expectation (1) to exist, not just at each fixed time, but also whenever t is a stopping time. The main result of this post can now be stated.

Theorem 1 (Predictable Projection) Let X be a left-continuous and locally integrable process. Then, there exists a unique left-continuous process Y satisfying (1).

As it is left-continuous, the fact that Y is specified, almost surely, at any time t by (1) means that it is uniquely determined up to evanescence. The main content of Theorem 1 is the existence of Y, and the proof of this is left until later in this post.

The process defined by Theorem 1 is called the predictable projection of X, and is denoted by ${{}^{\rm p}\!X}$. So, ${{}^{\rm p}\!X}$ is the unique left-continuous process satisfying

 $\displaystyle {}^{\rm p}\!X_t={\mathbb E}[X_t\;\vert\mathcal{F}_{t-}]{\rm\ \ (a.s.)}$ (3)

for all times t. In practice, X will usually not just be left-continuous, but will also have right limits everywhere. That is, it is caglad (“continu à gauche, limites à droite”).

Theorem 2 Let X be a caglad and locally integrable process. Then, its predictable projection is caglad.

The simplest non-trivial example of predictable projection is where ${X_t}$ is constant in t and equal to an integrable random variable U. Then, ${{}^{\rm p}\!X_t=M_{t-}}$ is the left limit of the cadlag martingale ${M_t={\mathbb E}[U\;\vert\mathcal{F}_t]}$, so ${{}^{\rm p}\!X}$ is easily seen to be a caglad process.