On The Integral ∫I(W ≥ 0)dW

In this post I look at the integral X_t = ∫₀^t 1_{W≥0} dW for standard Brownian motion W. This is a particularly interesting example of stochastic integration with connections to local times, option pricing and hedging, and demonstrates behaviour not seen for deterministic integrals that can seem counter-intuitive. For a start, X is a martingale so has zero expectation. To some it might, at first, seem that X is nonnegative and — furthermore — equals W ∨ 0. However, this has positive expectation contradicting the first property. In fact, X can go negative and we can compute its distribution. In a Twitter post, Oswin So asked about this very point, showing some plots demonstrating the behaviour of the integral.

simulation of X — Figure 1: Numerically evaluating ∫¹₀ 1_{W≥0} dW

We can evaluate the integral as X_t = W_t ∨ 0 – 12 L_t⁰ where L_t⁰ is the local time of W at 0. The local time is a continuous increasing process starting from 0, and only increases at times where W = 0. That is, it is constant over intervals on which W is nonzero. The first term, W_t ∨ 0 has probability density p(x) equal to that of a normal density over x > 0 and has a delta function at zero. Subtracting the nonnegative value L⁰_t spreads out the density of this delta function to the left, leading to the odd looking density computed numerically in So’s Twitter post, with a peak just to the left of the origin and dropping instantly to a smaller value on the right. We will compute an exact form for this probability density but, first, let’s look at an intuitive interpretation in the language of option pricing.

Consider a financial asset such as a stock, whose spot price at time t is S_t. We suppose that the price is defined at all times t ≥ 0 and has continuous sample paths. Furthermore, suppose that we can buy and sell at spot any time with no transaction costs. A call option of strike price K and maturity T pays out the cash value (S_T - K)₊ at time T. For simplicity, assume that this is ‘out of the money’ at the initial time, meaning that S₀ ≤ K.

The idea of option hedging is, starting with an initial investment, to trade in the stock in such a way that at maturity T, the value of our trading portfolio is equal to (S_T - K)₊. This synthetically replicates the option. A naive suggestion which is sometimes considered is to hold one unit of stock at all times t for which S_t ≥ K and zero units at all other times.The profit from such a strategy is given by the integral X_T = ∫₀^T 1_{S≥K} dS. If the stock only equals the strike price at finitely many times then this works. If it first hits K at time s and does not drop back below it on interval (s, t) then the profit at t is equal to the amount S_t – K that it has gone up since we purchased it. If it drops back below the strike then we sell at K for zero profit or loss, and this repeats for subsequent times that it exceeds K. So, at time T, we hold one unit of stock if its value is above K for a profit of S_T – K and zero units for zero profit otherwise. This replicates the option payoff.

The idea described works if S_T hits the strike K at a finite set of times,and also if the path of S_t has finite variation, in which case Lebesgue-Stieltjes integration gives X_T = (S_T - K)₊. It cannot work for stock prices though! If it did, then we have a trading strategy which is guaranteed to never lose money but generates profits on the positive probability event that S_T > K. This is arbitrage, generating money with zero risk, which should be impossible.

What goes wrong? First, Brownian motion does not have sample paths with finite variation and will not hit a level finitely often. Instead, if it reaches K then it hits the level uncountably often. As our simple trading strategy would involve buying and selling infinitely often, it is not so easy. Instead, we can approximate by a discrete-time strategy and take the limit. Choosing a finite sequence of times 0 = t₀ < t₁ < ⋯< t_n = T, the discrete approximation is to hold one unit of the asset over the interval (t_i, t_i+1] if S_{t_i} ≥ K and zero units otherwise.

The discrete strategy involves buying one unit of the asset whenever its price reaches K at one of the discrete times and selling whenever it drops back below. This replicates the option payoff, except for the fact then when we buy above K we effectively overpay by amount S_{t_i} – K and, when we sell below K, we lose K – S_{t_i}. This results in some slippage from not being able to execute at the exact level,

$\displaystyle A_T=\sum_{i=1}^{n}1_{\{S_{t_{i-1}} < K\le S_{t_i}{\rm\ or\ }S_{t_{i-1}}\ge K > S_{t_i}\}}\lvert S_{t_i}-K\rvert.$

So, our simple trading strategy generates profit (S_T - K)₊ – A_T, missing the option value by amount A_T. In the limit as n goes to infinity with time step size going to zero, the slippage A_T does not go to zero. For equally spaced times, It can be shown that the number of times that spot crosses K is of order √n, and each of these times generates slippage of order 1/√n on average. So, in the limit, A_T does not vanish and, instead, converges on a positive value equal to half the local time L_T^K.

Figure 2: Naive option hedge with slippage

Figure 2 shows the situation, with the slippage A shown on the same plot (using K as the zero axis, so they are on the same scale). We can just take K = 0 for an asset whose spot price can be positive or negative. Then, with S = W, our integral X_T = ∫₀^T 1_{W≥0} dW is the same as the payoff from the naive option hedge, or (S_T)₊ minus slippage L⁰_T/2.

Now lets turn to a computation of the probability density of X_T = W_T ∨ 0 – L_T⁰/2. By the scaling property of Brownian motion, the distribution of X_T/√T does not depend on T, so we take T = 1 without loss of generality. The first trick to this is to make use of the fact that, if M_t = sup_s≤tW_s is the running maximum then (|W_t|, L_t⁰) has the same joint distribution as (M_t - W_t, M_t). This immediately tells us that L₁⁰ has the same distribution as M₁ which, by the reflection principle, has the same distribution as |W₁|. Using

$\displaystyle \varphi(x)=\frac1{\sqrt{2\pi}}e^{-\frac12x^2}$

for the standard normal density, this shows that the local time L₁⁰ has probability density 2φ(x) over x > 0.

Next, as flipping the sign W does not impact either |W₁| or L₁⁰, sgn(W₁) is independent of these. On the event W₁ < 0 we have X₁ = –L₁⁰/2 which has density 4φ(2x) over x < 0. On the event W₁ > 0, we have X₁ = |W₁|-L₁⁰/2, which has the same distribution as M₁/2 – W₁.

To complete the computation of the probability density of X₁, we need to know the joint distribution of M₁ and W₁, which can be done as described in the post on the reflection principle. The probability that W₁ is in an interval of width δx about a point x and that M₁ > y, for some y > x is, by reflection, equal to the probability that W₁ is in an interval of width δx about the point 2y – x. This has probability φ(2y - x)δx and, by differentiating in y, gives a joint probability density of 2φ′(x - 2y) for (W₁, M₁).

The expectation of f(X₁) for bounded measurable function f can be computed by integrating over this joint probability density.

$\displaystyle \begin{aligned} {\mathbb E}[f(X_1)\vert\;W_1 > 0] &={\mathbb E}[f(M_1/2-W_1)]\\ &=2\int_{-\infty}^\infty\int_{x_+}^\infty f(y/2-x)\varphi'(x-2y)\,dydx\\ &=4\int_{-\infty}^\infty\int_{(-x)\vee(-x/2)}^\infty f(z)\varphi'(-3x-4z)\,dzdx\\ &=4\int_{-\infty}^\infty\int_{(-z)\vee(-2z)}^\infty f(z)\varphi'(-3x-4z)\,dxdz\\ &=\frac43\int_{-\infty}^\infty f(z)\varphi(2z)\,dz+\frac43\int_0^\infty f(z)\varphi(z)\,dz. \end{aligned}$

The substitution z = y/2 – x was applied in the inner integral, and the order of integration switched. The probability density of X₁ conditioned on W₁ > 0 is therefore,

$\displaystyle p_{X_1}(x\vert\; W_1 > 0)=\begin{cases} \frac43\varphi(x),&{\rm for\ }x > 0,\\ \frac43\varphi(2x),&{\rm for\ }x < 0. \end{cases}$

Conditioned on W₁ < 0, we have already shown that the density is 4φ(2x) over x < 0 so, taking the average of these, we obtain

$\displaystyle p_{X_1}(x)=\begin{cases} \frac23\varphi(x),&{\rm for\ }x > 0,\\ \frac83\varphi(2x),&{\rm for\ }x < 0. \end{cases}$

This is plotted in figure 3 below, agreeing with So’s numerical estimation from the Twitter post shown in figure 1 above.

Stochastic Differential Equations

Stochastic differential equations (SDEs) form a large and very important part of the theory of stochastic calculus. Much like ordinary differential equations (ODEs), they describe the behaviour of a dynamical system over infinitesimal time increments, and their solutions show how the system evolves over time. The difference with SDEs is that they include a source of random noise., typically given by a Brownian motion. Since Brownian motion has many pathological properties, such as being everywhere nondifferentiable, classical differential techniques are not well equipped to handle such equations. Standard results regarding the existence and uniqueness of solutions to ODEs do not apply in the stochastic case, and cannot readily describe what it even means to solve such as system. I will make some posts explaining how the theory of stochastic calculus applies to systems described by an SDE.

Consider a stochastic differential equation describing the evolution of a real-valued process {X_t}_t≥0,

$\displaystyle dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt$

(1)

which can be specified along with an initial condition X₀ = x₀. Here, b is the drift specifying how X moves on average across the dt time, σ is a volatility term giving the amplitude of the random noise and W is a driving Brownian motion providing the source of the randomness. There are numerous situations where equations such as (1) are used, with applications in physics, finance, filtering theory, and many other areas.

In the case where σ is zero, (1) is just an ordinary differential equation dX/dt = b(X). In the general case, we can informally think of dividing through by dt to give an ODE plus an additional noise term

$\displaystyle \frac{dX_t}{dt}=b(X_t)+\sigma(X_t)\xi_t.$

(2)

I have set ξ_t = dW_t/dt which can be thought of as a process whose values at each time are independent zero-mean random variables. As mentioned above, though, Brownian motion is not differentiable so this does not exist in the usual sense. While it can be described by a kind of random distribution, even distribution theory is not well-equipped to handle such equations involving multiplying by the nondifferentiable process σ(X_t). Instead, (1) can be integrated to obtain

$\displaystyle X_t=X_0+\int_0^t\sigma(X_s)\,dW_s+\int_0^tb(X_s)\,ds,$

(3)

where the right-hand-side is interpreted using stochastic integration with respect to the semimartingale W. Likewise, X will be a semimartingale, and such solutions are often referred to as diffusions.

The differential form (1) can be interpreted as a shorthand for the integral expression (3), which I will do in these notes. It can be generalized to n-dimensional processes by allowing b to take values in ℝⁿ, σ(x) to be an n × m matrix, and W to be an m-dimensional Brownian motion. That is, W = (W¹, …, W^m) where Wⁱ are independent Brownian motions. I will sometimes write this as

$\displaystyle dX^t_i=\sigma_{ij}(X_t)dW^j_t+b_i(X_t)dt$

where the summation convention is being applied, with subscripts or superscripts occuring more than once in a single term being summed from 1 to n.

Unlike ODEs, when dealing with SDEs we need to consider what underlying probability space the solution is defined with respect to. This leads to the existence of different classes of solutions.

Strong solutions where X can be expressed as a measurable function of the Brownian motion W or, equivalently, X is adapted to its natural filtration.
Weak solutions where X need not be a function of W. Such cases may require additional randomness so may not exist on the probability space with respect to which the Brownian motion W is defined. It can be necessary to extend the filtered probability space to construct these solutions.

Likewise, when considering uniqueness of solutions, there are different ways this occurs.

Pathwise uniqueness where, up to indistinguishability, there is only one solution X. This should hold not just on one specific space containing a Brownian motion W, but on all such spaces. That is, weak solutions should be unique.
Uniqueness in law where there may be multiple pathwise solutions, but their distribution is uniquely determined by the SDE.

There are various general conditions under which strong solutions and pathwise uniqueness are guaranteed for SDE (1) , such as the Itô result for Lipschitz continuous coefficients. I covered this situation in a previous post.

Other than using the SDE (1), such systems can also be described by an associated differential operator. For the n-dimensional case set a(x) = σ(x)σ(x)^T, which is an n × n positive semidefinite matrix. Then, the second order operator L can be defined

$\displaystyle Lf(x)=\frac12a_{ij}(x)f_{,ij}(x)+b_{i}(x)f_{,i}(x)$

operating on twice continuously differentiable functions f: ℝⁿ → ℝ. Being able to effortlessly switch between descriptions using the SDE (1) and the operator L is a huge benefit when working with such systems. There are several different ways in which the operator can be used to describe a stochastic process, all of which relate to weak solutions and uniqueness in law of the SDE.

Markov Generator: A Markov process is a weak solution to the SDE (1) if its infinitesimal generator is L. That is, if the transition function is P_t then,

$\displaystyle \lim_{t\rightarrow0}t^{-1}(P_tf-f)=Lf$

for suitably regular functions f.

Backwards Equation: For a function f: ℝⁿ × ℝ₊ → ℝ, f(t, X_t) is a local martingale if and only if it solves the partial differential equation (PDE)

$\displaystyle \frac{\partial f}{\partial t}+Lf=0.$

Consequently, for any time t > 0 and function g: ℝ^d → ℝ, if we let f be a solution to the PDE above with boundary condition f(x, t) = g(x) then, assuming integrability conditions, the conditional expectations at times s < t are

$\displaystyle {\mathbb E}[g(X_t)\;\vert\mathcal F_s]=f(X_s,s).$

If the conditions are satisfied, this describes a Markov process and gives its transition probabilities, describing the distribution of X and implying uniqueness in law.

Forward Equation: Assuming that it is sufficiently smooth, the probability density p(t, x) of X_t satisfies the PDE

$\displaystyle \frac{\partial p}{\partial t}=L^Tf.$

where L^T is the transpose of operator L

$\displaystyle L^Tp=\frac12(a_{ij}p)_{,ij}+(b_ip)_{,i}.$

If this PDE has a unique solution for given initial distribution, then this uniquely determines the distribution of X_t. So, if unique solutions to the forward equation exist starting at every future time, it gives uniqueness in law for X.

Martingale problem: Any weak solution to SDE (1) satisfies the property that

$\displaystyle f(X_t)-\int_0^t Lf(X_s)\,ds$

is a local martingale for twice continuously differentiable functions f: ℝⁿ → ℝ. This approach, which was pioneered by Stroock and Varadhan, has many benefits over the other applications of operator L described above, since it applies much more generally. We do not need to a-priori impose any properties on X such as being Markov, and as the test functions f are chosen at will, they automatically satisfy the necessary regularity properties. As well as being a very general way to describe solutions to a stochastic dynamical system, it turns out to be very fruitful. The striking and far-reaching Stroock–Varadhan uniqueness theorem, in particular, guarantees existence and uniqueness in law so long as a is continuous and positive definite and b is locally bounded.

The Kolmogorov Continuity Theorem

Fractional BM — Figure 1: Fractional Brownian motion with H = 1/4, 1/2, 3/4

One of the common themes throughout the theory of continuous-time stochastic processes, is the importance of choosing good versions of processes. Specifying the finite distributions of a process is not sufficient to determine its sample paths so, if a continuous modification exists, then it makes sense to work with that. A relatively straightforward criterion ensuring the existence of a continuous version is provided by Kolmogorov’s continuity theorem.

For any positive real number ${\gamma}$ , a map ${f\colon E\rightarrow F}$ between metric spaces E and F is said to be ${\gamma}$ -Hölder continuous if there exists a positive constant C satisfying

$\displaystyle d(f(x),f(y))\le Cd(x,y)^\gamma$

for all ${x,y\in E}$ . The smallest value of C satisfying this inequality is known as the ${\gamma}$ -Hölder coefficient of ${f}$ . Hölder continuous functions are always continuous and, at least on bounded spaces, is a stronger property for larger values of the coefficient ${\gamma}$ . So, if E is a bounded metric space and ${\alpha\le\beta}$ , then every ${\beta}$ -Hölder continuous map from E is also ${\alpha}$ -Hölder continuous. In particular, 1-Hölder and Lipschitz continuity are equivalent.

Kolmogorov’s theorem gives simple conditions on the pairwise distributions of a process which guarantee the existence of a continuous modification but, also, states that the sample paths ${t\mapsto X_t}$ are almost surely locally Hölder continuous. That is, they are almost surely Hölder continuous on every bounded interval. To start with, we look at real-valued processes. Throughout this post, we work with repect to a probability space ${(\Omega,\mathcal F, {\mathbb P})}$ . There is no need to assume the existence of any filtration, since they play no part in the results here

Theorem 1 (Kolmogorov) Let ${\{X_t\}_{t\ge0}}$ be a real-valued stochastic process such that there exists positive constants ${\alpha,\beta,C}$ satisfying

$\displaystyle {\mathbb E}\left[\lvert X_t-X_s\rvert^\alpha\right]\le C\lvert t-s\vert^{1+\beta},$

for all ${s,t\ge0}$ . Then, X has a continuous modification which, with probability one, is locally ${\gamma}$ -Hölder continuous for all ${0 < \gamma < \beta/\alpha}$ .

Continue reading “The Kolmogorov Continuity Theorem” →

Pathwise Burkholder-Davis-Gundy Inequalities

As covered earlier in my notes, the Burkholder-David-Gundy inequality relates the moments of the maximum of a local martingale M with its quadratic variation,

$\displaystyle c_p^{-1}{\mathbb E}[[M]^{p/2}_\tau]\le{\mathbb E}[\bar M_\tau^p]\le C_p{\mathbb E}[[M]^{p/2}_\tau].$

(1)

Here, ${\bar M_t\equiv\sup_{s\le t}\lvert M_s\rvert}$ is the running maximum, ${[M]}$ is the quadratic variation, ${\tau}$ is a stopping time, and the exponent ${p}$ is a real number greater than or equal to 1. Then, ${c_p}$ and ${C_p}$ are positive constants depending on p, but independent of the choice of local martingale and stopping time. Furthermore, for continuous local martingales, which are the focus of this post, the inequality holds for all ${p > 0}$ .

Since the quadratic variation used in my notes, by definition, starts at zero, the BDG inequality also required the local martingale to start at zero. This is not an important restriction, but it can be removed by requiring the quadratic variation to start at ${[M]_0=M_0^2}$ . Henceforth, I will assume that this is the case, which means that if we are working with the definition in my notes then we should add ${M_0^2}$ everywhere to the quadratic variation ${[M]}$ .

In keeping with the theme of the previous post on Doob’s inequalities, such martingale inequalities should have pathwise versions of the form

$\displaystyle c_p^{-1}[M]^{p/2}+\int\alpha dM\le\bar M^p\le C_p[M]^{p/2}+\int\beta dM$

(2)

for predictable processes ${\alpha,\beta}$ . Inequalities in this form are considerably stronger than (1), since they apply on all sample paths, not just on average. Also, we do not require M to be a local martingale — it is sufficient to be a (continuous) semimartingale. However, in the case where M is a local martingale, the pathwise version (2) does imply the BDG inequality (1), using the fact that stochastic integration preserves the local martingale property.

Lemma 1 Let X and Y be nonnegative increasing measurable processes satisfying ${X\le Y-N}$ for a local (sub)martingale N starting from zero. Then, ${{\mathbb E}[X_\tau]\le{\mathbb E}[Y_\tau]}$ for all stopping times ${\tau}$ .

Proof: Let ${\tau_n}$ be an increasing sequence of bounded stopping times increasing to infinity such that the stopped processes ${N^{\tau_n}}$ are submartingales. Then,

$\displaystyle {\mathbb E}[1_{\{\tau_n\ge\tau\}}X_\tau]\le{\mathbb E}[X_{\tau_n\wedge\tau}]={\mathbb E}[Y_{\tau_n\wedge\tau}]-{\mathbb E}[N_{\tau_n\wedge\tau}]\le{\mathbb E}[Y_{\tau_n\wedge\tau}]\le{\mathbb E}[Y_\tau].$

Letting n increase to infinity and using monotone convergence on the left hand side gives the result. ⬜

Moving on to the main statements of this post, I will mention that there are actually many different pathwise versions of the BDG inequalities. I opt for the especially simple statements given in Theorem 2 below. See the papers Pathwise Versions of the Burkholder-Davis Gundy Inequality by Bieglböck and Siorpaes, and Applications of Pathwise Burkholder-Davis-Gundy inequalities by Soirpaes, for slightly different approaches, although these papers do also effectively contain proofs of (3,4) for the special case of ${r=1/2}$ . As usual, I am using ${x\vee y}$ to represent the maximum of two numbers.

Theorem 2 Let X and Y be nonnegative continuous processes with ${X_0=Y_0}$ . For any ${0 < r\le1}$ we have,

$\displaystyle (1-r)\bar X^r\le (3-2r)\bar Y^r+r\int(\bar X\vee\bar Y)^{r-1}d(X-Y)$ (3)

and, if X is increasing, this can be improved to,

$\displaystyle \bar X^r\le (2-r)\bar Y^r+r\int(\bar X\vee\bar Y)^{r-1}d(X-Y).$ (4)

If ${r\ge1}$ and X is increasing then,

$\displaystyle \bar X^r\le r^{r\vee 2}\,\bar Y^r+r^2\int(\bar X\vee\bar Y)^{r-1}d(X-Y).$ (5)

Continue reading “Pathwise Burkholder-Davis-Gundy Inequalities” →

Pathwise Martingale Inequalities

Recall Doob’s inequalities, covered earlier in these notes, which bound expectations of functions of the maximum of a martingale in terms of its terminal distribution. Although these are often applied to martingales, they hold true more generally for cadlag submartingales. Here, I use ${\bar X_t\equiv\sup_{s\le t}X_s}$ to denote the running maximum of a process.

Theorem 1 Let X be a nonnegative cadlag submartingale. Then,

${{\mathbb P}\left(\bar X_t \ge K\right)\le K^{-1}{\mathbb E}[X_t]}$ for all ${K > 0}$ .

${\lVert\bar X_t\rVert_p\le (p/(p-1))\lVert X_t\rVert_p}$ for all ${p > 1}$ .

${{\mathbb E}[\bar X_t]\le(e/(e-1)){\mathbb E}[X_t\log X_t+1]}$ .

In particular, for a cadlag martingale X, then ${\lvert X\rvert}$ is a submartingale, so theorem 1 applies with ${\lvert X\rvert}$ in place of X.

We also saw the following much stronger (sub)martingale inequality in the post on the maximum maximum of martingales with known terminal distribution.

Theorem 2 Let X be a cadlag submartingale. Then, for any real K and nonnegative real t,

$\displaystyle {\mathbb P}(\bar X_t\ge K)\le\inf_{x < K}\frac{{\mathbb E}[(X_t-x)_+]}{K-x}.$ (1)

This is particularly sharp, in the sense that for any distribution for ${X_t}$ , there exists a martingale with this terminal distribution for which (1) becomes an equality simultaneously for all values of K. Furthermore, all of the inequalities stated in theorem 1 follow from (1). For example, the first one is obtained by taking ${x=0}$ in (1). The remaining two can also be proved from (1) by integrating over K.

Note that all of the submartingale inequalities above are of the form

$\displaystyle {\mathbb E}[F(\bar X_t)]\le{\mathbb E}[G(X_t)]$

(2)

for certain choices of functions ${F,G\colon{\mathbb R}\rightarrow{\mathbb R}^+}$ . The aim of this post is to show how they have a more general `pathwise’ form,

$\displaystyle F(\bar X_t)\le G(X_t) - \int_0^t\xi\,dX$

(3)

for some nonnegative predictable process ${\xi}$ . It is relatively straightforward to show that (2) follows from (3) by noting that the integral is a submartingale and, hence, has nonnegative expectation. To be rigorous, there are some integrability considerations to deal with, so a proof will be included later in this post.

Inequality (3) is required to hold almost everywhere, and not just in expectation, so is a considerably stronger statement than the standard martingale inequalities. Furthermore, it is not necessary for X to be a submartingale for (3) to make sense, as it holds for all semimartingales. We can go further, and even drop the requirement that X is a semimartingale. As we will see, in the examples covered in this post, ${\xi_t}$ will be of the form ${h(\bar X_{t-})}$ for an increasing right-continuous function ${h\colon{\mathbb R}\rightarrow{\mathbb R}}$ , so integration by parts can be used,

$\displaystyle \int h(\bar X_-)\,dX = h(\bar X)X-h(\bar X_0)X_0 - \int X\,dh(\bar X).$

(4)

The right hand side of (4) is well-defined for any cadlag real-valued process, by using the pathwise Lebesgue–Stieltjes integral with respect to the increasing process ${h(\bar X)}$ , so can be used as the definition of ${\int h(\bar X_-)dX}$ . In the case where X is a semimartingale, integration by parts ensures that this agrees with the stochastic integral ${\int\xi\,dX}$ . Since we now have an interpretation of (3) in a pathwise sense for all cadlag processes X, it is no longer required to suppose that X is a submartingale, a semimartingale, or even require the existence of an underlying probability space. All that is necessary is for ${t\mapsto X_t}$ to be a cadlag real-valued function. Hence, we reduce the martingale inequalities to straightforward results of real-analysis not requiring any probability theory and, consequently, are much more general. I state the precise pathwise generalizations of Doob’s inequalities now, leaving the proof until later in the post. As the first of inequality of theorem 1 is just the special case of (1) with ${x=0}$ , we do not need to explicitly include this here.

Theorem 3 Let X be a cadlag process and t be a nonnegative time.

For real ${K > x}$ ,

$\displaystyle 1_{\{\bar X_t\ge K\}}\le\frac{(X_t-x)_+}{K-x}-\int_0^t\xi\,dX$ (5)

where ${\xi=(K-x)^{-1}1_{\{\bar X_-\ge K\}}}$ .

If X is nonnegative and p,q are positive reals with ${p^{-1}+q^{-1}=1}$ then,

$\displaystyle \bar X_t^p\le q^p X^p_t-\int_0^t\xi dX$ (6)

where ${\xi=pq\bar X_-^{p-1}}$ .

If X is nonnegative then,

$\displaystyle \bar X_t\le\frac{e}{e-1}\left( X_t \log X_t +1\right)-\int_0^t\xi\,dX$ (7)

where ${\xi=\frac{e}{e-1}\log(\bar X_-\vee1)}$ .

Continue reading “Pathwise Martingale Inequalities” →

Semimartingale Local Times

Figure 1: Brownian motion B with local time L and auxiliary Brownian motion W

For a stochastic process X taking values in a state space E, its local time at a point ${x\in E}$ is a measure of the time spent at x. For a continuous time stochastic process, we could try and simply compute the Lebesgue measure of the time at the level,

$\displaystyle L^x_t=\int_0^t1_{\{X_s=x\}}ds.$

(1)

For processes which hit the level ${x}$ and stick there for some time, this makes some sense. However, if X is a standard Brownian motion, it will always give zero, so is not helpful. Even though X will hit every real value infinitely often, continuity of the normal distribution gives ${{\mathbb P}(X_s=x)=0}$ at each positive time, so that that ${L^x_t}$ defined by (1) will have zero expectation.

Rather than the indicator function of ${\{X=x\}}$ as in (1), an alternative is to use the Dirac delta function,

$\displaystyle L^x_t=\int_0^t\delta(X_s-x)\,ds.$

(2)

Unfortunately, the Dirac delta is not a true function, it is a distribution, so (2) is not a well-defined expression. However, if it can be made rigorous, then it does seem to have some of the properties we would want. For example, the expectation ${{\mathbb E}[\delta(X_s-x)]}$ can be interpreted as the probability density of ${X_s}$ evaluated at ${x}$ , which has a positive and finite value, so it should lead to positive and finite local times. Equation (2) still relies on the Lebesgue measure over the time index, so will not behave as we may expect under time changes, and will not make sense for processes without a continuous probability density. A better approach is to integrate with respect to the quadratic variation,

$\displaystyle L^x_t=\int_0^t\delta(X_s-x)d[X]_s$

(3)

which, for Brownian motion, amounts to the same thing. Although (3) is still not a well-defined expression, since it still involves the Dirac delta, the idea is to come up with a definition which amounts to the same thing in spirit. Important properties that it should satisfy are that it is an adapted, continuous and increasing process with increments supported on the set ${\{X=x\}}$ ,

$\displaystyle L^x_t=\int_0^t1_{\{X_s=x\}}dL^x_s.$

Local times are a very useful and interesting part of stochastic calculus, and finds important applications to excursion theory, stochastic integration and stochastic differential equations. However, I have not covered this subject in my notes, so do this now. Recalling Ito’s lemma for a function ${f(X)}$ of a semimartingale X, this involves a term of the form ${\int f^{\prime\prime}(X)d[X]}$ and, hence, requires ${f}$ to be twice differentiable. If we were to try to apply the Ito formula for functions which are not twice differentiable, then ${f^{\prime\prime}}$ can be understood in terms of distributions, and delta functions can appear, which brings local times into the picture. In the opposite direction, which I take in this post, we can try to generalise Ito’s formula and invert this to give a meaning to (3). Continue reading “Semimartingale Local Times” →

A Process With Hidden Drift

Consider a stochastic process X of the form

$\displaystyle X_t=W_t+\int_0^t\xi_sds,$

(1)

for a standard Brownian motion W and predictable process ${\xi}$ , defined with respect to a filtered probability space ${(\Omega,\mathcal F,\{\mathcal F_t\}_{t\in{\mathbb R}_+},{\mathbb P})}$ . For this to make sense, we must assume that ${\int_0^t\lvert\xi_s\rvert ds}$ is almost surely finite at all times, and I will suppose that ${\mathcal F_\cdot}$ is the filtration generated by W.

The question is whether the drift ${\xi}$ can be backed out from knowledge of the process X alone. As I will show with an example, this is not possible. In fact, in our example, X will itself be a standard Brownian motion, even though the drift ${\xi}$ is non-trivial (that is, ${\int\xi dt}$ is not almost surely zero). In this case X has exactly the same distribution as W, so cannot be distinguished from the driftless case with ${\xi=0}$ by looking at the distribution of X alone.

On the face of it, this seems rather counter-intuitive. By standard semimartingale decomposition, it is known that we can always decompose

$\displaystyle X=M+A$

(2)

for a unique continuous local martingale M starting from zero, and unique continuous FV process A. By uniqueness, ${M=W}$ and ${A=\int\xi dt}$ . This allows us to back out the drift ${\xi}$ and, in particular, if the drift is non-trivial then X cannot be a martingale. However, in the semimartingale decomposition, it is required that M is a martingale with respect to the original filtration ${\mathcal F_\cdot}$ . If we do not know the filtration ${\mathcal F_\cdot}$ , then it might not be possible to construct decomposition (2) from knowledge of X alone. As mentioned above, we will give an example where X is a standard Brownian motion which, in particular, means that it is a martingale under its natural filtration. By the semimartingale decomposition result, it is not possible for X to be an ${\mathcal F_\cdot}$ -martingale. A consequence of this is that the natural filtration of X must be strictly smaller than the natural filtration of W.

The inspiration for this post was a comment by Gabe posing the following question: If we take ${\mathbb F}$ to be the filtration generated by a standard Brownian motion W in ${(\Omega,\mathcal F,{\mathbb P})}$ , and we define ${\tilde W_t=W_t+\int_0^t\Theta_udu}$ , can we find an ${\mathbb F}$ -adapted ${\Theta}$ such that the filtration generated by ${\tilde W}$ is smaller than ${\mathbb F}$ ? Our example gives an affirmative answer. Continue reading “A Process With Hidden Drift” →

Projection in Discrete Time

It has been some time since my last post, but I am continuing now with the stochastic calculus notes on optional and predictable projection. In this post, I will go through the ideas in the discrete-time situation. All of the main concepts involved in optional and predictable projection are still present in discrete time, but the theory is much simpler. It is only really in continuous time that the projection theorems really show their power, so the aim of this post is to motivate the concepts in a simple setting before generalising to the full, continuous-time situation. Ideally, this would have been published before the posts on optional and predictable projection in continuous time, so it is a bit out of sequence.

We consider time running through the discrete index set ${{\mathbb Z}^+=\{0,1,2,\ldots\}}$ , and work with respect to a filtered probability space ${(\Omega,\mathcal{F},\{\mathcal{F}_n\}_{n=0,1,\ldots},{\mathbb P})}$ . Then, ${\mathcal{F}_n}$ is used to represent the collection of events observable up to and including time n. Stochastic processes will all be real-valued and defined up to almost-sure equivalence. That is, processes X and Y are considered to be the same if ${X_n=Y_n}$ almost surely for each ${n\in{\mathbb Z}^+}$ . The projections of a process X are defined as follows.

Definition 1 Let X be a measurable process. Then,

the optional projection, ${{}^{\rm o}\!X}$ , exists if and only if ${{\mathbb E}[\lvert X_n\rvert\,\vert\mathcal{F}_n]}$ is almost surely finite for each n, in which case

$\displaystyle {}^{\rm o}\!X_n={\mathbb E}[X_n\,\vert\mathcal{F}_n].$ (1)

the predictable projection, ${{}^{\rm p}\!X}$ , exists if and only if ${{\mathbb E}[\lvert X_n\rvert\,\vert\mathcal{F}_{n-1}]}$ is almost surely finite for each n, in which case

$\displaystyle {}^{\rm p}\!X_n={\mathbb E}[X_n\,\vert\mathcal{F}_{n-1}].$ (2)

Continue reading “Projection in Discrete Time” →

The Projection Theorems

In this post, I introduce the concept of optional and predictable projections of jointly measurable processes. Optional projections of right-continuous processes and predictable projections of left-continuous processes were constructed in earlier posts, with the respective continuity conditions used to define the projection. These are, however, just special cases of the general theory. For arbitrary measurable processes, the projections cannot be expected to satisfy any such pathwise regularity conditions. Instead, we use the measurability criteria that the projections should be, respectively, optional and predictable.

The projection theorems are a relatively straightforward consequence of optional and predictable section. However, due to the difficulty of proving the section theorems, optional and predictable projection is generally considered to be an advanced or hard part of stochastic calculus. Here, I will make use of the section theorems as stated in an earlier post, but leave the proof of those until after developing the theory of projection.

As usual, we work with respect to a complete filtered probability space ${(\Omega,\mathcal{F},\{\mathcal{F}\}_{t\ge0},{\mathbb P})}$ , and only consider real-valued processes. Any two processes are considered to be the same if they are equal up to evanescence. The optional projection is then defined (up to evanescence) by the following.

Theorem 1 (Optional Projection) Let X be a measurable process such that ${{\mathbb E}[1_{\{\tau < \infty\}}\lvert X_\tau\rvert\;\vert\mathcal{F}_\tau]}$ is almost surely finite for each stopping time ${\tau}$ . Then, there exists a unique optional process ${{}^{\rm o}\!X}$ , referred to as the optional projection of X, satisfying

$\displaystyle 1_{\{\tau < \infty\}}{}^{\rm o}\!X_\tau={\mathbb E}[1_{\{\tau < \infty\}}X_\tau\,\vert\mathcal{F}_\tau]$ (1)

almost surely, for each stopping time ${\tau}$ .

Predictable projection is defined similarly.

Theorem 2 (Predictable Projection) Let X be a measurable process such that ${{\mathbb E}[1_{\{\tau < \infty\}}\lvert X_\tau\rvert\;\vert\mathcal{F}_{\tau-}]}$ is almost surely finite for each predictable stopping time ${\tau}$ . Then, there exists a unique predictable process ${{}^{\rm p}\!X}$ , referred to as the predictable projection of X, satisfying

$\displaystyle 1_{\{\tau < \infty\}}{}^{\rm p}\!X_\tau={\mathbb E}[1_{\{\tau < \infty\}}X_\tau\,\vert\mathcal{F}_{\tau-}]$ (2)

almost surely, for each predictable stopping time ${\tau}$ .

Continue reading “The Projection Theorems” →

Pathwise Regularity of Optional and Predictable Processes

As I have mentioned before in these notes, when working with processes in continuous time, it is important to select a good modification. Typically, this means that we work with processes which are left or right continuous. However, in general, it can be difficult to show that the paths of a process satisfy such pathwise regularity. In this post I show that for optional and predictable processes, the section theorems introduced in the previous post can be used to considerably simplify the situation. Although they are interesting results in their own right, the main application in these notes will be to optional and predictable projection. Once the projections are defined, the results from this post will imply that they preserve certain continuity properties of the process paths.

Suppose, for example, that we have a continuous-time process X which we want to show to be right-continuous. It is certainly necessary that, for any sequence of times ${t_n\in{\mathbb R}_+}$ decreasing to a limit ${t}$ , ${X_{t_n}}$ almost-surely tends to ${X_t}$ . However, even if we can prove this for every possible decreasing sequence ${t_n}$ , it does not follow that X is right-continuous. As a counterexample, if ${\tau\colon\Omega\rightarrow{\mathbb R}}$ is any continuously distributed random time, then the process ${X_t=1_{\{t\le \tau\}}}$ is not right-continuous. However, so long as the distribution of ${\tau}$ has no atoms, X is almost-surely continuous at each fixed time t. It is remarkable, then, that if we generalise to look at sequences of stopping times, then convergence in probability along decreasing sequences of stopping times is enough to guarantee everywhere right-continuity of the process. At least, it is enough so long as we restrict consideration to optional processes.

As usual, we work with respect to a complete filtered probability space ${(\Omega,\mathcal{F},\{\mathcal{F}_t\}_{t\ge0},{\mathbb P})}$ . Two processes are considered to be the same if they are equal up to evanescence, and any pathwise property is said to hold if it holds up to evanescence. That is, a process is right-continuous if and only is it is everywhere right-continuous on a set of probability 1. All processes will be taken to be real-valued, and a process is said to have left (or right) limits if its left (or right) limits exist everywhere, up to evanescence, and are finite.

Theorem 1 Let X be an optional process. Then,

X is right-continuous if and only if ${X_{\tau_n}\rightarrow X_\tau}$ in probability, for each uniformly bounded sequence ${\tau_n}$ of stopping times decreasing to a limit ${\tau}$ .

X has right limits if and only if ${X_{\tau_n}}$ converges in probability, for each uniformly bounded decreasing sequence ${\tau_n}$ of stopping times.

X has left limits if and only if ${X_{\tau_n}}$ converges in probability, for each uniformly bounded increasing sequence ${\tau_n}$ of stopping times.

The `only if’ parts of these statements is immediate, since convergence everywhere trivially implies convergence in probability. The importance of this theorem is in the `if’ directions. That is, it gives sufficient conditions to guarantee that the sample paths satisfy the respective regularity properties.

Note that conditions for left-continuity are absent from the statements of Theorem 1. In fact, left-continuity does not follow from the corresponding property along sequences of stopping times. Consider, for example, a Poisson process, X. This is right-continuous but not left-continuous. However, its jumps occur at totally inaccessible times. This implies that, for any sequence ${\tau_n}$ of stopping times increasing to a finite limit ${\tau}$ , it is true that ${X_{\tau_n}}$ converges almost surely to ${X_\tau}$ . In light of such examples, it is even more remarkable that right-continuity and the existence of left and right limits can be determined by just looking at convergence in probability along monotonic sequences of stopping times. Theorem 1 will be proven below, using the optional section theorem.

For predictable processes, we can restrict attention to predictable stopping times. In this case, we obtain a condition for left-continuity as well as for right-continuity.

Theorem 2 Let X be a predictable process. Then,

X is right-continuous if and only if ${X_{\tau_n}\rightarrow X_\tau}$ in probability, for each uniformly bounded sequence ${\tau_n}$ of predictable stopping times decreasing to a limit ${\tau}$ .

X is left-continuous if and only if ${X_{\tau_n}\rightarrow X_\tau}$ in probability, for each uniformly bounded sequence ${\tau_n}$ of predictable stopping times increasing to a limit ${\tau}$ .

X has right limits if and only if ${X_{\tau_n}}$ converges in probability, for each uniformly bounded decreasing sequence ${\tau_n}$ of predictable stopping times.

X has left limits if and only if ${X_{\tau_n}}$ converges in probability, for each uniformly bounded increasing sequence ${\tau_n}$ of predictable stopping times.

Again, the proof is given below, and relies on the predictable section theorem. Continue reading “Pathwise Regularity of Optional and Predictable Processes” →

	Anonymous on Poisson Processes
	Anonymous on About
	Anonymous on About
	Anonymous on About
	Anonymous on The Projection Theorems
	Anonymous on Feller Processes
	SilverBladeII on Cadlag Modifications
	Anonymous on Spitzer’s Formula
	Anonymous on Spitzer’s Formula
	Anonymous on Brownian Bridges