# Stochastic Differential Equations

Stochastic differential equations (SDEs) form a large and very important part of the theory of stochastic calculus. Much like ordinary differential equations (ODEs), they describe the behaviour of a dynamical system over infinitesimal time increments, and their solutions show how the system evolves over time. The difference with SDEs is that they include a source of random noise, typically given by a Brownian motion. Since Brownian motion has many pathological properties, such as being everywhere nondifferentiable, classical differential techniques are not well equipped to handle such equations. Standard results regarding the existence and uniqueness of solutions to ODEs do not apply in the stochastic case, and do not even readily describe what it means to solve such a system. I will make some posts explaining how the theory of stochastic calculus applies to systems described by an SDE.

Consider a stochastic differential equation describing the evolution of a real-valued process ${\{X_t\}_{t\ge0}}$,

 $\displaystyle dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt$ (1)

which can be specified along with an initial condition ${X_0=x_0}$. Here, b is the drift, specifying how X moves on average over each small time interval dt, σ is a volatility term giving the amplitude of the random noise, and W is a driving Brownian motion providing the source of the randomness. There are numerous situations where equations such as (1) are used, with applications in physics, finance, filtering theory, and many other areas.

In the case where σ is zero, (1) is just an ordinary differential equation dX/dt = b(X). In the general case, we can informally think of dividing through by dt to give an ODE plus an additional noise term

 $\displaystyle \frac{dX_t}{dt}=b(X_t)+\sigma(X_t)\xi_t.$ (2)

I have set ${\xi_t=dW_t/dt}$, which can be thought of as a process whose values at each time are independent zero-mean random variables. As mentioned above, though, Brownian motion is not differentiable, so this does not exist in the usual sense. While it can be described by a kind of random distribution, even distribution theory is not well equipped to handle such equations, which involve multiplying by the nondifferentiable process ${\sigma(X_t)}$. Instead, (1) can be integrated to obtain

 $\displaystyle X_t=X_0+\int_0^t\sigma(X_s)\,dW_s+\int_0^tb(X_s)\,ds,$ (3)

where the right-hand-side is interpreted using stochastic integration with respect to the semimartingale W. Likewise, X will be a semimartingale, and such solutions are often referred to as diffusions.
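A solution to (3) can be approximated numerically by the Euler–Maruyama scheme, which replaces the stochastic integral by sums over small time steps. Below is a minimal sketch in Python with NumPy; the Ornstein–Uhlenbeck-style choice of coefficients, ${b(x)=-x}$ and ${\sigma(x)=1}$, along with the initial condition and step counts, are purely illustrative.

```python
import numpy as np

def euler_maruyama(b, sigma, x0, T, n, rng):
    """Simulate one path of dX = sigma(X) dW + b(X) dt on [0, T] with n steps."""
    dt = T / n
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(n):
        dW = rng.normal(0.0, np.sqrt(dt))   # Brownian increment over [k*dt, (k+1)*dt]
        x[k + 1] = x[k] + sigma(x[k]) * dW + b(x[k]) * dt
    return x

rng = np.random.default_rng(0)
# Illustrative Ornstein-Uhlenbeck example: drift b(x) = -x, constant volatility 1
path = euler_maruyama(lambda x: -x, lambda x: 1.0, x0=2.0, T=5.0, n=5000, rng=rng)
print(path[0], path[-1])
```

The mean-reverting drift pulls the path back towards zero, so for large T the terminal value is typically of order one rather than near the starting point.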

The differential form (1) can be interpreted as a shorthand for the integral expression (3), which I will do in these notes. It can be generalized to n-dimensional processes by allowing b to take values in ${{\mathbb R}^n}$, ${\sigma(x)}$ to be an n × m matrix, and W to be an m-dimensional Brownian motion. That is, ${W=(W^1,\ldots,W^m)}$ where the ${W^i}$ are independent Brownian motions. I will sometimes write this as

 $\displaystyle dX^i_t=\sigma_{ij}(X_t)\,dW^j_t+b_i(X_t)\,dt$

where the summation convention is being applied, with indices occurring more than once in a single term being implicitly summed over their range (from 1 to m for j, and from 1 to n otherwise).

Unlike ODEs, when dealing with SDEs we need to consider what underlying probability space the solution is defined with respect to. This leads to the existence of different classes of solutions.

• Strong solutions where X can be expressed as a measurable function of the Brownian motion W or, equivalently, X is adapted to the natural filtration of W.
• Weak solutions where X need not be a function of W. Such cases may require additional randomness, so a solution may not exist on the original probability space on which the Brownian motion W is defined. It can be necessary to extend the filtered probability space to construct these solutions.

Likewise, when considering uniqueness of solutions, there are different ways this occurs.

• Pathwise uniqueness where, up to indistinguishability, there is only one solution X. This should hold not just on one specific space supporting a Brownian motion W, but on all such spaces. That is, even weak solutions should be pathwise unique.
• Uniqueness in law where there may be multiple pathwise solutions, but their distribution is uniquely determined by the SDE.

There are various general conditions under which strong solutions and pathwise uniqueness are guaranteed for SDE (1), such as Itô's result for Lipschitz continuous coefficients. I covered this situation in a previous post.

Other than using the SDE (1), such systems can also be described by an associated differential operator. For the n-dimensional case, set ${a(x)=\sigma(x)\sigma(x)^T}$, which is an n × n positive semidefinite matrix. Then, the second order operator L is defined by

 $\displaystyle Lf(x)=\frac12a_{ij}(x)f_{,ij}(x)+b_{i}(x)f_{,i}(x)$

operating on twice continuously differentiable functions ${f\colon{\mathbb R}^n\rightarrow{\mathbb R}}$. Being able to effortlessly switch between descriptions using the SDE (1) and the operator L is a huge benefit when working with such systems. There are several different ways in which the operator can be used to describe a stochastic process, all of which relate to weak solutions and uniqueness in law of the SDE.

Markov Generator: A Markov process is a weak solution to the SDE (1) if its infinitesimal generator is L. That is, if its transition function is ${P_t}$ then,

 $\displaystyle \lim_{t\rightarrow0}t^{-1}(P_tf-f)=Lf$

for suitably regular functions f.
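This limit can be checked by Monte Carlo in the simplest case of standard Brownian motion, where ${Lf=\tfrac12 f''}$. The sketch below takes the illustrative choice ${f(x)=x^2}$, for which ${Lf=1}$ and, in fact, ${(P_tf-f)/t=1}$ exactly for every t; the sample size and evaluation point are arbitrary.

```python
import numpy as np

# For standard Brownian motion the generator is Lf = f''/2, so with f(x) = x^2
# we have Lf = 1, and (P_t f - f)(x)/t should be close to 1 for small t.
rng = np.random.default_rng(1)
x, t, n = 0.7, 0.01, 200_000
samples = x + rng.normal(0.0, np.sqrt(t), size=n)   # draws of X_t started at x
estimate = ((samples**2).mean() - x**2) / t         # Monte Carlo (P_t f - f)/t
print(estimate)  # close to Lf(x) = 1
```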

Backwards Equation: For a function ${f\colon{\mathbb R}^n\times{\mathbb R}^+\rightarrow{\mathbb R}}$, the process ${f(X_t,t)}$ is a local martingale if and only if f solves the partial differential equation (PDE)

 $\displaystyle \frac{\partial f}{\partial t}+Lf=0.$

Consequently, for any time ${t > 0}$ and function ${g\colon{\mathbb R}^n\rightarrow{\mathbb R}}$, if we let f be a solution to the PDE above with boundary condition ${f(x,t)=g(x)}$ then, assuming integrability conditions, the conditional expectations at times ${s < t}$ are

 $\displaystyle {\mathbb E}[g(X_t)\;\vert\mathcal F_s]=f(X_s,s).$

If the conditions are satisfied, this describes a Markov process and gives its transition probabilities, describing the distribution of X and implying uniqueness in law.
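As a concrete check, take X to be standard Brownian motion and ${g(x)=x^2}$. Then ${f(x,s)=x^2+(t-s)}$ solves ${\partial f/\partial s+\tfrac12 f_{xx}=0}$ with the required boundary condition, so we expect ${{\mathbb E}[g(X_t)\,\vert\,X_s]=X_s^2+(t-s)}$. A Monte Carlo sketch, where the times and the value of ${X_s}$ are arbitrary illustrative choices:

```python
import numpy as np

# Backward equation check for X = Brownian motion, g(x) = x^2:
# f(x, s) = x^2 + (t - s) solves f_s + (1/2) f_xx = 0 with f(x, t) = g(x),
# so E[g(X_t) | F_s] should equal f(X_s, s).
rng = np.random.default_rng(2)
s, t, xs = 0.5, 2.0, 1.3                     # illustrative times and value of X_s
incr = rng.normal(0.0, np.sqrt(t - s), size=500_000)
mc = ((xs + incr) ** 2).mean()               # Monte Carlo E[g(X_t) | X_s = xs]
print(mc, xs**2 + (t - s))                   # both close to 3.19
```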

Forward Equation: Assuming that it is sufficiently smooth, the probability density p(t, x) of Xt satisfies the PDE

 $\displaystyle \frac{\partial p}{\partial t}=L^Tp,$

where ${L^T}$ is the transpose of the operator L,

 $\displaystyle L^Tp=\frac12(a_{ij}p)_{,ij}+(b_ip)_{,i}.$

If this PDE has a unique solution for a given initial distribution, then this uniquely determines the distribution of ${X_t}$. So, if unique solutions to the forward equation exist starting from every time and initial distribution, this gives uniqueness in law for X.
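For standard Brownian motion, ${L^Tp=\tfrac12 p_{xx}}$ and the forward equation reduces to the heat equation, which the Gaussian density ${p(t,x)=e^{-x^2/2t}/\sqrt{2\pi t}}$ satisfies. This can be verified numerically with finite differences (the evaluation point and step size below are arbitrary):

```python
import numpy as np

# Forward equation check for standard Brownian motion: the Gaussian density
# p(t, x) = exp(-x^2/(2t)) / sqrt(2*pi*t) should satisfy p_t = (1/2) p_xx.
def p(t, x):
    return np.exp(-x**2 / (2 * t)) / np.sqrt(2 * np.pi * t)

t, x, h = 1.0, 0.6, 1e-4
p_t = (p(t + h, x) - p(t - h, x)) / (2 * h)              # central difference in t
p_xx = (p(t, x + h) - 2 * p(t, x) + p(t, x - h)) / h**2  # central difference in x
print(p_t, 0.5 * p_xx)  # the two values agree to several decimal places
```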

Martingale problem: Any weak solution to SDE (1) satisfies the property that

 $\displaystyle f(X_t)-\int_0^t Lf(X_s)\,ds$

is a local martingale for all twice continuously differentiable functions ${f\colon{\mathbb R}^n\rightarrow{\mathbb R}}$. This approach, which was pioneered by Stroock and Varadhan, has many benefits over the other applications of the operator L described above, since it applies much more generally. We do not need to impose any properties on X a priori, such as being Markov, and, as the test functions f are chosen at will, they automatically satisfy the necessary regularity properties. As well as being a very general way to describe solutions to a stochastic dynamical system, it turns out to be very fruitful. The striking and far-reaching Stroock–Varadhan uniqueness theorem, in particular, guarantees existence and uniqueness in law so long as a is continuous and positive definite and b is locally bounded.

# The Kolmogorov Continuity Theorem

One of the common themes throughout the theory of continuous-time stochastic processes is the importance of choosing good versions of processes. Specifying the finite-dimensional distributions of a process is not sufficient to determine its sample paths so, if a continuous modification exists, then it makes sense to work with that. A relatively straightforward criterion ensuring the existence of a continuous version is provided by Kolmogorov’s continuity theorem.

For any positive real number ${\gamma}$, a map ${f\colon E\rightarrow F}$ between metric spaces E and F is said to be ${\gamma}$-Hölder continuous if there exists a positive constant C satisfying

 $\displaystyle d(f(x),f(y))\le Cd(x,y)^\gamma$

for all ${x,y\in E}$. The smallest value of C satisfying this inequality is known as the ${\gamma}$-Hölder coefficient of ${f}$. Hölder continuous functions are always continuous and, at least on bounded spaces, Hölder continuity becomes a stronger property as the exponent ${\gamma}$ increases. So, if E is a bounded metric space and ${\alpha\le\beta}$, then every ${\beta}$-Hölder continuous map from E is also ${\alpha}$-Hölder continuous. In particular, 1-Hölder continuity and Lipschitz continuity are equivalent.

Kolmogorov’s theorem gives simple conditions on the pairwise distributions of a process which guarantee the existence of a continuous modification but, also, states that the sample paths ${t\mapsto X_t}$ are almost surely locally Hölder continuous. That is, they are almost surely Hölder continuous on every bounded interval. To start with, we look at real-valued processes. Throughout this post, we work with respect to a probability space ${(\Omega,\mathcal F, {\mathbb P})}$. There is no need to assume the existence of any filtration, since filtrations play no part in the results here.

Theorem 1 (Kolmogorov) Let ${\{X_t\}_{t\ge0}}$ be a real-valued stochastic process such that there exist positive constants ${\alpha,\beta,C}$ satisfying

 $\displaystyle {\mathbb E}\left[\lvert X_t-X_s\rvert^\alpha\right]\le C\lvert t-s\vert^{1+\beta},$

for all ${s,t\ge0}$. Then, X has a continuous modification which, with probability one, is locally ${\gamma}$-Hölder continuous for all ${0 < \gamma < \beta/\alpha}$.
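For Brownian motion, the increments satisfy ${{\mathbb E}[\lvert W_t-W_s\rvert^4]=3\lvert t-s\rvert^2}$, so the theorem applies with ${\alpha=4}$ and ${\beta=1}$, giving local ${\gamma}$-Hölder continuity for ${\gamma < 1/4}$; taking higher moments pushes the exponent arbitrarily close to 1/2. A quick Monte Carlo check of this moment identity (the increment length and sample size are arbitrary):

```python
import numpy as np

# Brownian increments satisfy E|W_t - W_s|^4 = 3 |t - s|^2, matching the
# hypothesis of Kolmogorov's theorem with alpha = 4, beta = 1.
rng = np.random.default_rng(3)
dt = 0.2
incr = rng.normal(0.0, np.sqrt(dt), size=1_000_000)  # increments over length dt
moment = (incr ** 4).mean()
print(moment, 3 * dt ** 2)  # Monte Carlo estimate vs exact value 0.12
```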

# Pathwise Burkholder-Davis-Gundy Inequalities

As covered earlier in my notes, the Burkholder-Davis-Gundy inequality relates the moments of the maximum of a local martingale M to those of its quadratic variation,

 $\displaystyle c_p^{-1}{\mathbb E}[[M]^{p/2}_\tau]\le{\mathbb E}[\bar M_\tau^p]\le C_p{\mathbb E}[[M]^{p/2}_\tau].$ (1)

Here, ${\bar M_t\equiv\sup_{s\le t}\lvert M_s\rvert}$ is the running maximum, ${[M]}$ is the quadratic variation, ${\tau}$ is a stopping time, and the exponent ${p}$ is a real number greater than or equal to 1. Then, ${c_p}$ and ${C_p}$ are positive constants depending on p, but independent of the choice of local martingale and stopping time. Furthermore, for continuous local martingales, which are the focus of this post, the inequality holds for all ${p > 0}$.

Since the quadratic variation used in my notes, by definition, starts at zero, the BDG inequality also required the local martingale to start at zero. This is not an important restriction, but it can be removed by requiring the quadratic variation to start at ${[M]_0=M_0^2}$. Henceforth, I will assume that this is the case, which means that if we are working with the definition in my notes then we should add ${M_0^2}$ everywhere to the quadratic variation ${[M]}$.

In keeping with the theme of the previous post on Doob’s inequalities, such martingale inequalities should have pathwise versions of the form

 $\displaystyle c_p^{-1}[M]^{p/2}+\int\alpha dM\le\bar M^p\le C_p[M]^{p/2}+\int\beta dM$ (2)

for predictable processes ${\alpha,\beta}$. Inequalities in this form are considerably stronger than (1), since they apply on all sample paths, not just on average. Also, we do not require M to be a local martingale — it is sufficient to be a (continuous) semimartingale. However, in the case where M is a local martingale, the pathwise version (2) does imply the BDG inequality (1), using the fact that stochastic integration preserves the local martingale property.
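As a quick numerical illustration of (1), take M to be Brownian motion on the unit interval, so that ${[M]_1=1}$. Then ${{\mathbb E}[\bar M_1^2]}$ must lie between ${{\mathbb E}[M_1^2]=1}$ and Doob's upper bound of 4. The path and step counts below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(4)
n_paths, n_steps, p = 10_000, 1_000, 2.0
dW = rng.normal(0.0, np.sqrt(1.0 / n_steps), size=(n_paths, n_steps))
W = np.cumsum(dW, axis=1)               # discretized Brownian paths on [0, 1]
max_abs = np.abs(W).max(axis=1)         # running maximum of |W| at time 1
moment = (max_abs ** p).mean()          # Monte Carlo estimate of E[max|W|^2]
print(moment)  # lies between E[W_1^2] = 1 and Doob's upper bound 4
```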

Lemma 1 Let X and Y be nonnegative increasing measurable processes satisfying ${X\le Y-N}$ for a local (sub)martingale N starting from zero. Then, ${{\mathbb E}[X_\tau]\le{\mathbb E}[Y_\tau]}$ for all stopping times ${\tau}$.

Proof: Let ${\tau_n}$ be a sequence of bounded stopping times increasing to infinity such that the stopped processes ${N^{\tau_n}}$ are submartingales. Then,

$\displaystyle {\mathbb E}[1_{\{\tau_n\ge\tau\}}X_\tau]\le{\mathbb E}[X_{\tau_n\wedge\tau}]\le{\mathbb E}[Y_{\tau_n\wedge\tau}]-{\mathbb E}[N_{\tau_n\wedge\tau}]\le{\mathbb E}[Y_{\tau_n\wedge\tau}]\le{\mathbb E}[Y_\tau].$

Letting n increase to infinity and using monotone convergence on the left hand side gives the result. ⬜

Moving on to the main statements of this post, I will mention that there are actually many different pathwise versions of the BDG inequalities. I opt for the especially simple statements given in Theorem 2 below. See the papers Pathwise Versions of the Burkholder-Davis-Gundy Inequality by Beiglböck and Siorpaes, and Applications of Pathwise Burkholder-Davis-Gundy Inequalities by Siorpaes, for slightly different approaches, although these papers do also effectively contain proofs of (3,4) for the special case of ${r=1/2}$. As usual, I am using ${x\vee y}$ to represent the maximum of two numbers.

Theorem 2 Let X and Y be nonnegative continuous processes with ${X_0=Y_0}$. For any ${0 < r\le1}$ we have,

 $\displaystyle (1-r)\bar X^r\le (3-2r)\bar Y^r+r\int(\bar X\vee\bar Y)^{r-1}d(X-Y)$ (3)

and, if X is increasing, this can be improved to,

 $\displaystyle \bar X^r\le (2-r)\bar Y^r+r\int(\bar X\vee\bar Y)^{r-1}d(X-Y).$ (4)

If ${r\ge1}$ and X is increasing then,

 $\displaystyle \bar X^r\le r^{r\vee 2}\,\bar Y^r+r^2\int(\bar X\vee\bar Y)^{r-1}d(X-Y).$ (5)

# Pathwise Martingale Inequalities

Recall Doob’s inequalities, covered earlier in these notes, which bound expectations of functions of the maximum of a martingale in terms of its terminal distribution. Although these are often applied to martingales, they hold true more generally for cadlag submartingales. Here, I use ${\bar X_t\equiv\sup_{s\le t}X_s}$ to denote the running maximum of a process.

Theorem 1 Let X be a nonnegative cadlag submartingale. Then,

• ${{\mathbb P}\left(\bar X_t \ge K\right)\le K^{-1}{\mathbb E}[X_t]}$ for all ${K > 0}$.
• ${\lVert\bar X_t\rVert_p\le (p/(p-1))\lVert X_t\rVert_p}$ for all ${p > 1}$.
• ${{\mathbb E}[\bar X_t]\le(e/(e-1)){\mathbb E}[X_t\log X_t+1]}$.

In particular, for a cadlag martingale X, ${\lvert X\rvert}$ is a submartingale, so theorem 1 applies with ${\lvert X\rvert}$ in place of X.

We also saw the following much stronger (sub)martingale inequality in the post on the maximum maximum of martingales with known terminal distribution.

Theorem 2 Let X be a cadlag submartingale. Then, for any real K and nonnegative real t,

 $\displaystyle {\mathbb P}(\bar X_t\ge K)\le\inf_{x < K}\frac{{\mathbb E}[(X_t-x)_+]}{K-x}.$ (1)

This is particularly sharp, in the sense that for any distribution for ${X_t}$, there exists a martingale with this terminal distribution for which (1) becomes an equality simultaneously for all values of K. Furthermore, all of the inequalities stated in theorem 1 follow from (1). For example, the first one is obtained by taking ${x=0}$ in (1). The remaining two can also be proved from (1) by integrating over K.

Note that all of the submartingale inequalities above are of the form

 $\displaystyle {\mathbb E}[F(\bar X_t)]\le{\mathbb E}[G(X_t)]$ (2)

for certain choices of functions ${F,G\colon{\mathbb R}\rightarrow{\mathbb R}^+}$. The aim of this post is to show how they have a more general ‘pathwise’ form,

 $\displaystyle F(\bar X_t)\le G(X_t) - \int_0^t\xi\,dX$ (3)

for some nonnegative predictable process ${\xi}$. It is relatively straightforward to show that (2) follows from (3) by noting that the integral is a submartingale and, hence, has nonnegative expectation. To be rigorous, there are some integrability considerations to deal with, so a proof will be included later in this post.

Inequality (3) is required to hold almost everywhere, and not just in expectation, so is a considerably stronger statement than the standard martingale inequalities. Furthermore, it is not necessary for X to be a submartingale for (3) to make sense, as it holds for all semimartingales. We can go further, and even drop the requirement that X is a semimartingale. As we will see, in the examples covered in this post, ${\xi_t}$ will be of the form ${h(\bar X_{t-})}$ for an increasing right-continuous function ${h\colon{\mathbb R}\rightarrow{\mathbb R}}$, so integration by parts can be used,

 $\displaystyle \int h(\bar X_-)\,dX = h(\bar X)X-h(\bar X_0)X_0 - \int X\,dh(\bar X).$ (4)

The right hand side of (4) is well-defined for any cadlag real-valued process, by using the pathwise Lebesgue–Stieltjes integral with respect to the increasing process ${h(\bar X)}$, so can be used as the definition of ${\int h(\bar X_-)dX}$. In the case where X is a semimartingale, integration by parts ensures that this agrees with the stochastic integral ${\int\xi\,dX}$. Since we now have an interpretation of (3) in a pathwise sense for all cadlag processes X, it is no longer required to suppose that X is a submartingale, a semimartingale, or even to require the existence of an underlying probability space. All that is necessary is for ${t\mapsto X_t}$ to be a cadlag real-valued function. Hence, we reduce the martingale inequalities to straightforward results of real analysis which do not require any probability theory and are, consequently, much more general. I state the precise pathwise generalizations of Doob’s inequalities now, leaving the proofs until later in the post. As the first inequality of theorem 1 is just the special case of (1) with ${x=0}$, we do not need to explicitly include it here.
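On discrete paths, the definition via (4) reduces to an exact algebraic identity (Abel summation), since ${h(\bar X)}$ only increases at times where ${X=\bar X}$. A small numerical check, with an arbitrary path and the illustrative increasing function ${h=\arctan}$:

```python
import numpy as np

rng = np.random.default_rng(7)
x = 5.0 + np.cumsum(rng.normal(size=500))    # an arbitrary discrete path X
run_max = np.maximum.accumulate(x)           # the running maximum of X
h = np.arctan                                # an increasing continuous function h

# Left-endpoint sum for the integral of h(running max) dX
lhs = np.sum(h(run_max[:-1]) * np.diff(x))
# Integration-by-parts form: h(max_n) X_n - h(max_0) X_0 - sum X_k * d h(max)_k
rhs = (h(run_max[-1]) * x[-1] - h(run_max[0]) * x[0]
       - np.sum(x[1:] * np.diff(h(run_max))))
print(abs(lhs - rhs))  # zero up to floating-point rounding
```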

Theorem 3 Let X be a cadlag process and t be a nonnegative time.

1. For real ${K > x}$,
 $\displaystyle 1_{\{\bar X_t\ge K\}}\le\frac{(X_t-x)_+}{K-x}-\int_0^t\xi\,dX$ (5)

where ${\xi=(K-x)^{-1}1_{\{\bar X_-\ge K\}}}$.

2. If X is nonnegative and p,q are positive reals with ${p^{-1}+q^{-1}=1}$ then,
 $\displaystyle \bar X_t^p\le q^p X^p_t-\int_0^t\xi dX$ (6)

where ${\xi=pq\bar X_-^{p-1}}$.

3. If X is nonnegative then,
 $\displaystyle \bar X_t\le\frac{e}{e-1}\left( X_t \log X_t +1\right)-\int_0^t\xi\,dX$ (7)

where ${\xi=\frac{e}{e-1}\log(\bar X_-\vee1)}$.
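Since these inequalities hold pathwise for all cadlag processes, they can be checked directly along discrete paths, interpreting the integral as a left-endpoint Stieltjes sum. A numerical sanity check of (6) with ${p=q=2}$, where the nonnegative path below is an arbitrary illustrative example:

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.abs(np.cumsum(rng.normal(size=1000)))    # an arbitrary nonnegative path
run_max = np.maximum.accumulate(x)              # running maximum of the path
# xi = p*q*max^{p-1} with p = q = 2, integrated as a left-endpoint sum
integral = np.sum(4.0 * run_max[:-1] * np.diff(x))
lhs = run_max[-1] ** 2               # \bar X_t^p
rhs = 4.0 * x[-1] ** 2 - integral    # q^p X_t^p - int xi dX
print(lhs <= rhs)  # True
```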

# Semimartingale Local Times

For a stochastic process X taking values in a state space E, its local time at a point ${x\in E}$ is a measure of the time spent at x. For a continuous-time stochastic process, we could try simply computing the Lebesgue measure of the time spent at the level,

 $\displaystyle L^x_t=\int_0^t1_{\{X_s=x\}}ds.$ (1)

For processes which hit the level ${x}$ and stick there for some time, this makes some sense. However, if X is a standard Brownian motion, it will always give zero, so it is not helpful. Even though X will hit every real value infinitely often, continuity of the normal distribution gives ${{\mathbb P}(X_s=x)=0}$ at each positive time, so that ${L^x_t}$ defined by (1) will have zero expectation.

Rather than the indicator function of ${\{X=x\}}$ as in (1), an alternative is to use the Dirac delta function,

 $\displaystyle L^x_t=\int_0^t\delta(X_s-x)\,ds.$ (2)

Unfortunately, the Dirac delta is not a true function, it is a distribution, so (2) is not a well-defined expression. However, if it can be made rigorous, then it does seem to have some of the properties we would want. For example, the expectation ${{\mathbb E}[\delta(X_s-x)]}$ can be interpreted as the probability density of ${X_s}$ evaluated at ${x}$, which has a positive and finite value, so it should lead to positive and finite local times. Equation (2) still relies on the Lebesgue measure over the time index, so will not behave as we may expect under time changes, and will not make sense for processes without a continuous probability density. A better approach is to integrate with respect to the quadratic variation,

 $\displaystyle L^x_t=\int_0^t\delta(X_s-x)d[X]_s$ (3)

which, for Brownian motion, amounts to the same thing. Although (3) is still not a well-defined expression, since it still involves the Dirac delta, the idea is to come up with a definition which amounts to the same thing in spirit. Important properties that it should satisfy are that it is an adapted, continuous and increasing process with increments supported on the set ${\{X=x\}}$,

 $\displaystyle L^x_t=\int_0^t1_{\{X_s=x\}}dL^x_s.$

Local times are a very useful and interesting part of stochastic calculus, and find important applications to excursion theory, stochastic integration and stochastic differential equations. However, I have not covered this subject in my notes, so I will do that now. Recalling Itô's lemma for a function ${f(X)}$ of a semimartingale X, this involves a term of the form ${\int f^{\prime\prime}(X)d[X]}$ and, hence, requires ${f}$ to be twice differentiable. If we try to apply the Itô formula to functions which are not twice differentiable, then ${f^{\prime\prime}}$ can be understood in terms of distributions, and delta functions can appear, which brings local times into the picture. In the opposite direction, which I take in this post, we can generalise Itô's formula and invert this to give a meaning to (3).
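Numerically, (3) can be made concrete for Brownian motion by replacing the delta function with ${(2\epsilon)^{-1}1_{(-\epsilon,\epsilon)}}$, so the local time at zero is approximated by a scaled occupation time. Tanaka's formula gives ${{\mathbb E}[L^0_1]={\mathbb E}\lvert W_1\rvert=\sqrt{2/\pi}}$, which the Monte Carlo sketch below reproduces (path count, step count and ${\epsilon}$ are arbitrary illustrative choices):

```python
import numpy as np

# Local time of Brownian motion at 0 via the occupation-time approximation
# L^0_1 ~ (1/(2*eps)) * Leb{s <= 1 : |W_s| < eps}.  By Tanaka's formula,
# E[L^0_1] = E|W_1| = sqrt(2/pi) ~ 0.7979.
rng = np.random.default_rng(6)
n_paths, n_steps, eps = 2_000, 5_000, 0.02
dt = 1.0 / n_steps
W = np.cumsum(rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps)), axis=1)
occ = (np.abs(W) < eps).sum(axis=1) * dt       # occupation time of (-eps, eps)
estimate = (occ / (2 * eps)).mean()            # Monte Carlo E[L^0_1]
print(estimate, np.sqrt(2 / np.pi))
```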

# A Process With Hidden Drift

Consider a stochastic process X of the form

 $\displaystyle X_t=W_t+\int_0^t\xi_sds,$ (1)

for a standard Brownian motion W and predictable process ${\xi}$, defined with respect to a filtered probability space ${(\Omega,\mathcal F,\{\mathcal F_t\}_{t\in{\mathbb R}_+},{\mathbb P})}$. For this to make sense, we must assume that ${\int_0^t\lvert\xi_s\rvert ds}$ is almost surely finite at all times, and I will suppose that ${\mathcal F_\cdot}$ is the filtration generated by W.

The question is whether the drift ${\xi}$ can be backed out from knowledge of the process X alone. As I will show with an example, this is not possible. In fact, in our example, X will itself be a standard Brownian motion, even though the drift ${\xi}$ is non-trivial (that is, ${\int\xi dt}$ is not almost surely zero). In this case X has exactly the same distribution as W, so cannot be distinguished from the driftless case with ${\xi=0}$ by looking at the distribution of X alone.

On the face of it, this seems rather counter-intuitive. By standard semimartingale decomposition, it is known that we can always decompose

 $\displaystyle X=M+A$ (2)

for a unique continuous local martingale M starting from zero, and unique continuous FV process A. By uniqueness, ${M=W}$ and ${A=\int\xi dt}$. This allows us to back out the drift ${\xi}$ and, in particular, if the drift is non-trivial then X cannot be a martingale. However, in the semimartingale decomposition, it is required that M is a local martingale with respect to the original filtration ${\mathcal F_\cdot}$. If we do not know the filtration ${\mathcal F_\cdot}$, then it might not be possible to construct decomposition (2) from knowledge of X alone. As mentioned above, we will give an example where X is a standard Brownian motion which, in particular, means that it is a martingale under its natural filtration. By the semimartingale decomposition result, it is not possible for X to be an ${\mathcal F_\cdot}$-martingale. A consequence of this is that the natural filtration of X must be strictly smaller than the natural filtration of W.

The inspiration for this post was a comment by Gabe posing the following question: If we take ${\mathbb F}$ to be the filtration generated by a standard Brownian motion W in ${(\Omega,\mathcal F,{\mathbb P})}$, and we define ${\tilde W_t=W_t+\int_0^t\Theta_udu}$, can we find an ${\mathbb F}$-adapted ${\Theta}$ such that the filtration generated by ${\tilde W}$ is smaller than ${\mathbb F}$? Our example gives an affirmative answer.

# Projection in Discrete Time

It has been some time since my last post, but I am continuing now with the stochastic calculus notes on optional and predictable projection. In this post, I will go through the ideas in the discrete-time situation. All of the main concepts involved in optional and predictable projection are still present in discrete time, but the theory is much simpler. It is only in continuous time that the projection theorems really show their power, so the aim of this post is to motivate the concepts in a simple setting before generalising to the full, continuous-time situation. Ideally, this would have been published before the posts on optional and predictable projection in continuous time, so it is a bit out of sequence.

We consider time running through the discrete index set ${{\mathbb Z}^+=\{0,1,2,\ldots\}}$, and work with respect to a filtered probability space ${(\Omega,\mathcal{F},\{\mathcal{F}_n\}_{n=0,1,\ldots},{\mathbb P})}$. Then, ${\mathcal{F}_n}$ is used to represent the collection of events observable up to and including time n. Stochastic processes will all be real-valued and defined up to almost-sure equivalence. That is, processes X and Y are considered to be the same if ${X_n=Y_n}$ almost surely for each ${n\in{\mathbb Z}^+}$. The projections of a process X are defined as follows.

Definition 1 Let X be a measurable process. Then,

1. the optional projection, ${{}^{\rm o}\!X}$, exists if and only if ${{\mathbb E}[\lvert X_n\rvert\,\vert\mathcal{F}_n]}$ is almost surely finite for each n, in which case
 $\displaystyle {}^{\rm o}\!X_n={\mathbb E}[X_n\,\vert\mathcal{F}_n].$ (1)
2. the predictable projection, ${{}^{\rm p}\!X}$, exists if and only if ${{\mathbb E}[\lvert X_n\rvert\,\vert\mathcal{F}_{n-1}]}$ is almost surely finite for each n, in which case
 $\displaystyle {}^{\rm p}\!X_n={\mathbb E}[X_n\,\vert\mathcal{F}_{n-1}].$ (2)
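In discrete time with a finite sample space, these conditional expectations can be computed by brute-force enumeration. A small sketch where the filtration is generated by three fair ±1 coin tosses and the process peeks one step into the future (all choices here are illustrative):

```python
import itertools

# Omega = {-1, 1}^3 with uniform probability; F_n is generated by the first n
# coordinates Z_1, ..., Z_n.
omega_space = list(itertools.product([-1, 1], repeat=3))

def cond_exp(f, n, omega):
    """E[f | F_n] evaluated at omega: average f over outcomes agreeing up to n."""
    matching = [w for w in omega_space if w[:n] == omega[:n]]
    return sum(f(w) for w in matching) / len(matching)

# X_n = Z_1 + ... + Z_{n+1}, which peeks one step into the future
def X(n):
    return lambda w: sum(w[:n + 1])

omega = (1, 1, -1)
opt = cond_exp(X(1), 1, omega)    # optional projection of X at n = 1
pred = cond_exp(X(1), 0, omega)   # predictable projection of X at n = 1
print(opt, pred)  # 1.0 0.0
```

Here the optional projection at n = 1 is ${Z_1+{\mathbb E}[Z_2]=1}$, while the predictable projection is ${{\mathbb E}[Z_1+Z_2]=0}$, matching the output.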

# The Projection Theorems

In this post, I introduce the concept of optional and predictable projections of jointly measurable processes. Optional projections of right-continuous processes and predictable projections of left-continuous processes were constructed in earlier posts, with the respective continuity conditions used to define the projection. These are, however, just special cases of the general theory. For arbitrary measurable processes, the projections cannot be expected to satisfy any such pathwise regularity conditions. Instead, we use the measurability criteria that the projections should be, respectively, optional and predictable.

The projection theorems are a relatively straightforward consequence of optional and predictable section. However, due to the difficulty of proving the section theorems, optional and predictable projection is generally considered to be an advanced or hard part of stochastic calculus. Here, I will make use of the section theorems as stated in an earlier post, but leave the proof of those until after developing the theory of projection.

As usual, we work with respect to a complete filtered probability space ${(\Omega,\mathcal{F},\{\mathcal{F}_t\}_{t\ge0},{\mathbb P})}$, and only consider real-valued processes. Any two processes are considered to be the same if they are equal up to evanescence. The optional projection is then defined (up to evanescence) by the following.

Theorem 1 (Optional Projection) Let X be a measurable process such that ${{\mathbb E}[1_{\{\tau < \infty\}}\lvert X_\tau\rvert\;\vert\mathcal{F}_\tau]}$ is almost surely finite for each stopping time ${\tau}$. Then, there exists a unique optional process ${{}^{\rm o}\!X}$, referred to as the optional projection of X, satisfying

 $\displaystyle 1_{\{\tau < \infty\}}{}^{\rm o}\!X_\tau={\mathbb E}[1_{\{\tau < \infty\}}X_\tau\,\vert\mathcal{F}_\tau]$ (1)

almost surely, for each stopping time ${\tau}$.

Predictable projection is defined similarly.

Theorem 2 (Predictable Projection) Let X be a measurable process such that ${{\mathbb E}[1_{\{\tau < \infty\}}\lvert X_\tau\rvert\;\vert\mathcal{F}_{\tau-}]}$ is almost surely finite for each predictable stopping time ${\tau}$. Then, there exists a unique predictable process ${{}^{\rm p}\!X}$, referred to as the predictable projection of X, satisfying

 $\displaystyle 1_{\{\tau < \infty\}}{}^{\rm p}\!X_\tau={\mathbb E}[1_{\{\tau < \infty\}}X_\tau\,\vert\mathcal{F}_{\tau-}]$ (2)

almost surely, for each predictable stopping time ${\tau}$.

# Pathwise Regularity of Optional and Predictable Processes

As I have mentioned before in these notes, when working with processes in continuous time, it is important to select a good modification. Typically, this means that we work with processes which are left or right continuous. However, in general, it can be difficult to show that the paths of a process satisfy such pathwise regularity. In this post I show that for optional and predictable processes, the section theorems introduced in the previous post can be used to considerably simplify the situation. Although they are interesting results in their own right, the main application in these notes will be to optional and predictable projection. Once the projections are defined, the results from this post will imply that they preserve certain continuity properties of the process paths.

Suppose, for example, that we have a continuous-time process X which we want to show to be right-continuous. It is certainly necessary that, for any sequence of times ${t_n\in{\mathbb R}_+}$ decreasing to a limit ${t}$, ${X_{t_n}}$ almost-surely tends to ${X_t}$. However, even if we can prove this for every possible decreasing sequence ${t_n}$, it does not follow that X is right-continuous. As a counterexample, if ${\tau\colon\Omega\rightarrow{\mathbb R}}$ is any continuously distributed random time, then the process ${X_t=1_{\{t\le \tau\}}}$ is not right-continuous. However, so long as the distribution of ${\tau}$ has no atoms, X is almost-surely continuous at each fixed time t. It is remarkable, then, that if we generalise to look at sequences of stopping times, then convergence in probability along decreasing sequences of stopping times is enough to guarantee everywhere right-continuity of the process. At least, it is enough so long as we restrict consideration to optional processes.

As usual, we work with respect to a complete filtered probability space ${(\Omega,\mathcal{F},\{\mathcal{F}_t\}_{t\ge0},{\mathbb P})}$. Two processes are considered to be the same if they are equal up to evanescence, and any pathwise property is said to hold if it holds up to evanescence. That is, a process is right-continuous if and only if it is everywhere right-continuous on a set of probability 1. All processes will be taken to be real-valued, and a process is said to have left (or right) limits if its left (or right) limits exist everywhere, up to evanescence, and are finite.

Theorem 1 Let X be an optional process. Then,

1. X is right-continuous if and only if ${X_{\tau_n}\rightarrow X_\tau}$ in probability, for each uniformly bounded sequence ${\tau_n}$ of stopping times decreasing to a limit ${\tau}$.
2. X has right limits if and only if ${X_{\tau_n}}$ converges in probability, for each uniformly bounded decreasing sequence ${\tau_n}$ of stopping times.
3. X has left limits if and only if ${X_{\tau_n}}$ converges in probability, for each uniformly bounded increasing sequence ${\tau_n}$ of stopping times.

The ‘only if’ parts of these statements are immediate, since convergence everywhere trivially implies convergence in probability. The importance of this theorem is in the ‘if’ directions. That is, it gives sufficient conditions to guarantee that the sample paths satisfy the respective regularity properties.

Note that conditions for left-continuity are absent from the statements of Theorem 1. In fact, left-continuity does not follow from the corresponding property along sequences of stopping times. Consider, for example, a Poisson process, X. This is right-continuous but not left-continuous. However, its jumps occur at totally inaccessible times. This implies that, for any sequence ${\tau_n}$ of stopping times increasing to a finite limit ${\tau}$, it is true that ${X_{\tau_n}}$ converges almost surely to ${X_\tau}$. In light of such examples, it is even more remarkable that right-continuity and the existence of left and right limits can be determined by just looking at convergence in probability along monotonic sequences of stopping times. Theorem 1 will be proven below, using the optional section theorem.

For predictable processes, we can restrict attention to predictable stopping times. In this case, we obtain a condition for left-continuity as well as for right-continuity.

Theorem 2 Let X be a predictable process. Then,

1. X is right-continuous if and only if ${X_{\tau_n}\rightarrow X_\tau}$ in probability, for each uniformly bounded sequence ${\tau_n}$ of predictable stopping times decreasing to a limit ${\tau}$.
2. X is left-continuous if and only if ${X_{\tau_n}\rightarrow X_\tau}$ in probability, for each uniformly bounded sequence ${\tau_n}$ of predictable stopping times increasing to a limit ${\tau}$.
3. X has right limits if and only if ${X_{\tau_n}}$ converges in probability, for each uniformly bounded decreasing sequence ${\tau_n}$ of predictable stopping times.
4. X has left limits if and only if ${X_{\tau_n}}$ converges in probability, for each uniformly bounded increasing sequence ${\tau_n}$ of predictable stopping times.

Again, the proof is given below, and relies on the predictable section theorem.

# Measurable Projection and the Debut Theorem

I will discuss some of the immediate consequences of the following deceptively simple looking result.

Theorem 1 (Measurable Projection) If ${(\Omega,\mathcal{F},{\mathbb P})}$ is a complete probability space and ${A\in\mathcal{B}({\mathbb R})\otimes\mathcal{F}}$ then ${\pi_\Omega(A)\in\mathcal{F}}$.

The notation ${\pi_B}$ is used to denote the projection from the Cartesian product ${A\times B}$ of sets A and B onto B. That is, ${\pi_B((a,b)) = b}$. As is standard, ${\mathcal{B}({\mathbb R})}$ is the Borel sigma-algebra on the reals, and ${\mathcal{A}\otimes\mathcal{B}}$ denotes the product of sigma-algebras.

Theorem 1 seems almost obvious. Projection is a very simple map and we may well expect the projection of, say, a Borel subset of ${{\mathbb R}^2}$ onto ${{\mathbb R}}$ to be Borel. In order to formalise this, we could start by noting that sets of the form ${A\times B}$ for Borel A and B have an easily described, and measurable, projection, and the Borel sigma-algebra is the closure of the collection of such sets under countable unions and under intersections of decreasing sequences of sets. Furthermore, the projection operator commutes with taking the union of sequences of sets. Unfortunately, this method of proof falls down when looking at the limit of decreasing sequences of sets, which does not commute with projection. For example, each set in the decreasing sequence ${S_n=(0,1/n)\times{\mathbb R}\subseteq{\mathbb R}^2}$ projects onto the whole of ${{\mathbb R}}$, but their limit is empty and has empty projection.
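Written out explicitly, with ${\pi_{\mathbb R}}$ denoting projection onto the second factor, the failure of commutativity in this example is

 $\displaystyle \bigcap_{n}\pi_{\mathbb R}(S_n)=\bigcap_n{\mathbb R}={\mathbb R} \qquad\text{whereas}\qquad \pi_{\mathbb R}\Bigl(\bigcap_n S_n\Bigr)=\pi_{\mathbb R}(\emptyset)=\emptyset.$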

There is an interesting history behind Theorem 1, as mentioned by Gerald Edgar on MathOverflow (1) in answer to “The most interesting mathematics mistake?”. In a 1905 paper, Henri Lebesgue asserted that the projection of a Borel subset of the plane onto the line is again a Borel set (Lebesgue, (3), pp 191–192). This was based on the erroneous assumption that projection commutes with the limit of a decreasing sequence of sets. The mistake was spotted, in 1916, by Mikhail Suslin, and led him to investigate analytic sets and to begin the study of what is now known as descriptive set theory. See Kanamori, (2), for more details. In fact, as was shown by Suslin, projections of Borel sets need not be Borel. So, by considering the case where ${\Omega={\mathbb R}}$ and ${\mathcal{F}=\mathcal{B}({\mathbb R})}$, Theorem 1 is false if the completeness assumption is dropped. I will give a proof of Theorem 1 but, as it is a bit involved, this is left for a later post.

For now, I will state some consequences of the measurable projection theorem which are important to the theory of continuous-time stochastic processes, starting with the following. Throughout this post, the underlying probability space ${(\Omega,\mathcal{F},{\mathbb P})}$ is assumed to be complete, and stochastic processes are taken to be real-valued, or to take values in the extended reals ${\bar{\mathbb R}={\mathbb R}\cup\{\pm\infty\}}$, with time index ranging over ${{\mathbb R}_+}$. For a first application of measurable projection, it allows us to show that the supremum of a jointly measurable process is measurable.

Lemma 2 If X is a jointly measurable process and ${S\in\mathcal{B}(\mathbb{R}_+)}$ then ${\sup_{s\in S}X_s}$ is measurable.

Proof: Setting ${U=\sup_{s\in S}X_s}$ then, for each real K, ${U > K}$ if and only if ${X_s > K}$ for some ${s\in S}$. Hence,

$\displaystyle U^{-1}\left((K,\infty]\right)=\pi_\Omega\left((S\times\Omega)\cap X^{-1}\left((K,\infty]\right)\right).$

By the measurable projection theorem, this is in ${\mathcal{F}}$ and, as sets of the form ${(K,\infty]}$ generate the Borel sigma-algebra on ${\mathbb{\bar R}}$, U is ${\mathcal{F}}$-measurable. ⬜
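Neither Lemma 2 nor the running maximum introduced next involves any computation, but the objects themselves are easy to picture on a discrete time grid. A minimal numerical sketch (the grid, step size, and random-walk path here are illustrative assumptions of mine, standing in for the genuinely uncountable suprema handled by the lemmas):

```python
import numpy as np

rng = np.random.default_rng(0)

# Discretised sample path of a process X on the grid t_k = k * dt.
dt, n = 0.01, 1000
t = dt * np.arange(n)
X = np.cumsum(rng.normal(scale=np.sqrt(dt), size=n))  # random-walk sample path

# Supremum over a Borel time set S, here S = [0.25, 0.5] intersected with the grid,
# as in Lemma 2.
S = (t >= 0.25) & (t <= 0.5)
U = X[S].max()

# Running maximum X*_t = sup_{s <= t} X_s, jointly measurable by the next lemma.
X_star = np.maximum.accumulate(X)

assert U <= X_star[-1]                # a sup over a subset is at most the global max
assert np.all(X_star >= X)            # the running maximum dominates the path
assert np.all(np.diff(X_star) >= 0)   # and is nondecreasing in t
```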

Next, the running maximum of a jointly measurable process is again jointly measurable.

Lemma 3 If X is a jointly measurable process then ${X^*_t\equiv\sup_{s\le t}X_s}$ is also jointly measurable.