On The Integral ∫I(W ≥ 0)dW

In this post I look at the integral X_t = ∫₀^t 1_{W≥0} dW for standard Brownian motion W. This is a particularly interesting example of stochastic integration with connections to local times, option pricing and hedging, and it demonstrates behaviour not seen for deterministic integrals that can seem counter-intuitive. For a start, X is a martingale so has zero expectation. At first, it might seem that X is nonnegative and, furthermore, equals W ∨ 0. However, W ∨ 0 has positive expectation, contradicting the martingale property. In fact, X can go negative and we can compute its distribution. In a Twitter post, Oswin So asked about this very point, showing some plots demonstrating the behaviour of the integral.

Figure 1: Numerically evaluating ∫₀¹ 1_{W≥0} dW

We can evaluate the integral as X_t = W_t ∨ 0 – L_t⁰/2 where L_t⁰ is the local time of W at 0. The local time is a continuous increasing process starting from 0, and only increases at times where W = 0. That is, it is constant over intervals on which W is nonzero. The first term, W_t ∨ 0, has probability density p(x) equal to that of a normal density over x > 0 and has a delta function at zero. Subtracting the nonnegative value L⁰_t/2 spreads out the density of this delta function to the left, leading to the odd looking density computed numerically in So’s Twitter post, with a peak just to the left of the origin and dropping instantly to a smaller value on the right. We will compute an exact form for this probability density but, first, let’s look at an intuitive interpretation in the language of option pricing.

Consider a financial asset such as a stock, whose spot price at time t is S_t. We suppose that the price is defined at all times t ≥ 0 and has continuous sample paths. Furthermore, suppose that we can buy and sell at spot at any time with no transaction costs. A call option of strike price K and maturity T pays out the cash value (S_T - K)₊ at time T. For simplicity, assume that this is ‘out of the money’ at the initial time, meaning that S₀ ≤ K.

The idea of option hedging is, starting with an initial investment, to trade in the stock in such a way that at maturity T, the value of our trading portfolio is equal to (S_T - K)₊. This synthetically replicates the option. A naive suggestion which is sometimes considered is to hold one unit of stock at all times t for which S_t ≥ K and zero units at all other times. The profit from such a strategy is given by the integral X_T = ∫₀^T 1_{S≥K} dS. If the stock only equals the strike price at finitely many times then this works. If it first hits K at time s and does not drop back below it on the interval (s, t) then the profit at t is equal to the amount S_t – K that it has gone up since we purchased it. If it drops back below the strike then we sell at K for zero profit or loss, and this repeats for subsequent times that it exceeds K. So, at time T, we hold one unit of stock if its value is above K for a profit of S_T – K and zero units for zero profit otherwise. This replicates the option payoff.

The idea described works if S_t hits the strike K at a finite set of times, and also if the path of S_t has finite variation, in which case Lebesgue-Stieltjes integration gives X_T = (S_T - K)₊. It cannot work for stock prices though! If it did, then we would have a trading strategy which is guaranteed to never lose money but generates profits on the positive probability event that S_T > K. This is arbitrage, generating money with zero risk, which should be impossible.

What goes wrong? First, Brownian motion does not have sample paths with finite variation and will not hit a level finitely often. Instead, if it reaches K then it hits the level uncountably often. As our simple trading strategy would involve buying and selling infinitely often, it is not so easy. Instead, we can approximate by a discrete-time strategy and take the limit. Choosing a finite sequence of times 0 = t₀ < t₁ < ⋯ < t_n = T, the discrete approximation is to hold one unit of the asset over the interval (t_i, t_{i+1}] if S_{t_i} ≥ K and zero units otherwise.

The discrete strategy involves buying one unit of the asset whenever its price reaches K at one of the discrete times and selling whenever it drops back below. This replicates the option payoff, except for the fact that when we buy above K we effectively overpay by amount S_{t_i} – K and, when we sell below K, we lose K – S_{t_i}. This results in some slippage from not being able to execute at the exact level,

$\displaystyle A_T=\sum_{i=1}^{n}1_{\{S_{t_{i-1}} < K\le S_{t_i}{\rm\ or\ }S_{t_{i-1}}\ge K > S_{t_i}\}}\lvert S_{t_i}-K\rvert.$

So, our simple trading strategy generates profit (S_T - K)₊ – A_T, missing the option value by amount A_T. In the limit as n goes to infinity with time step size going to zero, the slippage A_T does not go to zero. For equally spaced times, it can be shown that the number of times that spot crosses K is of order √n, and each of these times generates slippage of order 1/√n on average. So, in the limit, A_T does not vanish and, instead, converges on a positive value equal to half the local time L_T^K.
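The decomposition of the discrete strategy's profit into option payoff minus slippage holds exactly along any sampled path with S₀ ≤ K, which is easy to check numerically. The following is a minimal sketch, assuming NumPy and a discretized Brownian path standing in for the asset price; the helper name `naive_hedge` and the parameter choices are illustrative only.

```python
import numpy as np

def naive_hedge(S, K):
    """Return (profit, slippage) of the discrete naive hedge along path S."""
    S = np.asarray(S, dtype=float)
    prev, curr = S[:-1], S[1:]
    # Hold one unit over (t_{i-1}, t_i] whenever S_{t_{i-1}} >= K.
    profit = np.sum((prev >= K) * (curr - prev))
    # Slippage |S_{t_i} - K| accrues at each sampled crossing of K.
    crossed = ((prev < K) & (curr >= K)) | ((prev >= K) & (curr < K))
    slippage = np.sum(crossed * np.abs(curr - K))
    return profit, slippage

rng = np.random.default_rng(0)
n, T, K = 1000, 1.0, 0.0
# Discretized Brownian path with S_0 = 0 <= K (an assumption of this sketch).
S = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(T / n), n))])
profit, slippage = naive_hedge(S, K)
payoff = max(S[-1] - K, 0.0)
print(profit, payoff - slippage)  # the two numbers agree up to rounding
```

Rerunning with larger n shows the slippage fluctuating from path to path rather than shrinking to zero, in line with the local time limit described above.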

Figure 2: Naive option hedge with slippage

Figure 2 shows the situation, with the slippage A shown on the same plot (using K as the zero axis, so they are on the same scale). We can just take K = 0 for an asset whose spot price can be positive or negative. Then, with S = W, our integral X_T = ∫₀^T 1_{W≥0} dW is the same as the payoff from the naive option hedge, or (S_T)₊ minus slippage L⁰_T/2.

Now let’s turn to a computation of the probability density of X_T = W_T ∨ 0 – L_T⁰/2. By the scaling property of Brownian motion, the distribution of X_T/√T does not depend on T, so we take T = 1 without loss of generality. The first trick to this is to make use of the fact that, if M_t = sup_{s≤t} W_s is the running maximum then (|W_t|, L_t⁰) has the same joint distribution as (M_t - W_t, M_t). This immediately tells us that L₁⁰ has the same distribution as M₁ which, by the reflection principle, has the same distribution as |W₁|. Using

$\displaystyle \varphi(x)=\frac1{\sqrt{2\pi}}e^{-\frac12x^2}$

for the standard normal density, this shows that the local time L₁⁰ has probability density 2φ(x) over x > 0.

Next, as flipping the sign of W does not impact either |W₁| or L₁⁰, sgn(W₁) is independent of these. On the event W₁ < 0 we have X₁ = –L₁⁰/2 which has density 4φ(2x) over x < 0. On the event W₁ > 0, we have X₁ = |W₁| – L₁⁰/2, which has the same distribution as M₁/2 – W₁.

To complete the computation of the probability density of X₁, we need to know the joint distribution of M₁ and W₁, which can be done as described in the post on the reflection principle. The probability that W₁ is in an interval of width δx about a point x and that M₁ > y, for some y > x is, by reflection, equal to the probability that W₁ is in an interval of width δx about the point 2y – x. This has probability φ(2y - x)δx and, by differentiating in y, gives a joint probability density of 2φ′(x - 2y) for (W₁, M₁).

The expectation of f(X₁) for bounded measurable function f can be computed by integrating over this joint probability density.

$\displaystyle \begin{aligned} {\mathbb E}[f(X_1)\vert\;W_1 > 0] &={\mathbb E}[f(M_1/2-W_1)]\\ &=2\int_{-\infty}^\infty\int_{x_+}^\infty f(y/2-x)\varphi'(x-2y)\,dydx\\ &=4\int_{-\infty}^\infty\int_{(-x)\vee(-x/2)}^\infty f(z)\varphi'(-3x-4z)\,dzdx\\ &=4\int_{-\infty}^\infty\int_{(-z)\vee(-2z)}^\infty f(z)\varphi'(-3x-4z)\,dxdz\\ &=\frac43\int_{-\infty}^0 f(z)\varphi(2z)\,dz+\frac43\int_0^\infty f(z)\varphi(z)\,dz. \end{aligned}$

The substitution z = y/2 – x was applied in the inner integral, and the order of integration switched. The probability density of X₁ conditioned on W₁ > 0 is therefore,
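To spell out the final step, the inner integral over x can be evaluated directly. Writing x₀ = (-z) ∨ (-2z) for its lower limit (notation introduced only for this remark), and using the facts that φ is even and vanishes at infinity,

$\displaystyle 4\int_{x_0}^\infty\varphi'(-3x-4z)\,dx=\frac43\varphi(-3x_0-4z)=\begin{cases} \frac43\varphi(z),&{\rm for\ }z > 0{\rm\ (where\ }x_0=-z),\\ \frac43\varphi(2z),&{\rm for\ }z < 0{\rm\ (where\ }x_0=-2z). \end{cases}$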

$\displaystyle p_{X_1}(x\vert\; W_1 > 0)=\begin{cases} \frac43\varphi(x),&{\rm for\ }x > 0,\\ \frac43\varphi(2x),&{\rm for\ }x < 0. \end{cases}$

Conditioned on W₁ < 0, we have already shown that the density is 4φ(2x) over x < 0 so, taking the average of these, we obtain

$\displaystyle p_{X_1}(x)=\begin{cases} \frac23\varphi(x),&{\rm for\ }x > 0,\\ \frac83\varphi(2x),&{\rm for\ }x < 0. \end{cases}$

This is plotted in figure 3 below, agreeing with So’s numerical estimation from the Twitter post shown in figure 1 above.
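As a rough sanity check on this density, the integral can be approximated by Ito sums along simulated paths. The density above gives ℙ(X₁ < 0) = 2/3 and, since X is a martingale, zero mean. The following is a minimal sketch assuming NumPy; the function name `simulate_X1` and the path and step counts are arbitrary illustrative choices, and the discretization introduces some bias.

```python
import numpy as np

def simulate_X1(n_paths, n_steps, seed=0):
    """Ito sums approximating X_1 = int_0^1 1_{W >= 0} dW."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    W = np.zeros(n_paths)
    X = np.zeros(n_paths)
    for _ in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt), n_paths)
        X += (W >= 0) * dW  # integrand evaluated at the left endpoint
        W += dW
    return X

X = simulate_X1(n_paths=20_000, n_steps=1_000)
print("sample mean:", X.mean())          # should be close to 0
print("P(X_1 < 0):", (X < 0).mean())     # should be roughly 2/3
```

A histogram of `X` can then be compared against the piecewise density, with the jump at the origin smoothed slightly by the finite step size.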

Model-Independent Discrete Barrier Adjustments

I continue the investigation of discrete barrier approximations started in an earlier post. The idea is to find good approximations to a continuous barrier condition, while only sampling the process at a discrete set of times. The difference now is that I will look at model independent methods which do not explicitly depend on properties of the underlying process, such as the volatility. This will enable much more generic adjustments which can be applied more easily and more widely. I point out now, the techniques that I will describe here are original research and cannot currently be found in the literature outside of this blog, to the best of my knowledge.

Recall that the problem is to compute the expected value of a function of a stochastic process X,

$\displaystyle V={\mathbb E}\left[f(X_T);\;\sup{}_{t\le T}X_t \ge K\right]$

(1)

which depends on whether or not the process crosses a continuous barrier level K. In many applications, such as with Monte Carlo simulation, we typically only sample X at a discrete set of times 0 < t₁ < t₂ < ⋯< t_n = T. In that case, the continuous barrier is necessarily approximated by a discrete one

$\displaystyle V={\mathbb E}\left[f(X_T);\;\sup{}_{i=1,\ldots,n}X_{t_i}\ge K\right].$

(2)

As we saw, this converges slowly as the number n of sampling times increases, with the error between this and the limiting continuous barrier (1) only going to zero at rate 1/√n.

A barrier adjustment as described in the earlier post is able to improve this convergence rate. If X is a Brownian motion with constant drift μ and positive volatility σ, then the discrete barrier level K is shifted down by an amount βσ√δt where β ≈ 0.5826 is a constant and δt = T/n is the sampling width. We are assuming, for now, that the sampling times are equally spaced. As was seen, using the shifted barrier level in (2) improves the rate of convergence. Although we did not theoretically derive the new convergence rate, numerical experiment suggests that it is close to 1/n.

Another way to express this is to shift the values of X up,

$\displaystyle M_i=X_{t_i}+\beta\sigma\sqrt{\delta t}.$

(3)

Then, (2) is replaced to use these shifted values, which are a proxy for the maximum value of X across each of the intervals (t_{i-1}, t_i),

$\displaystyle V={\mathbb E}\left[f(X_T);\;\sup{}_{i=1,\ldots,n}M_i\ge K\right].$

(4)

As it is equivalent to shifting the level K down, we still obtain the improved rate of convergence.
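The effect of the shift can be illustrated with a small Monte Carlo experiment for driftless Brownian motion with f ≡ 1, where the reflection principle gives the continuous barrier probability in closed form. The following is a sketch only, assuming NumPy; the values of σ, T, K, n and the number of paths are arbitrary choices and `hit_probability` is a hypothetical helper.

```python
import math
import numpy as np

BETA = 0.5826  # constant quoted in the text

def hit_probability(paths, K, shift=0.0):
    """Fraction of sampled paths with max_i (X_{t_i} + shift) >= K."""
    return np.mean(paths.max(axis=1) + shift >= K)

rng = np.random.default_rng(1)
sigma, T, K = 1.0, 1.0, 1.0
n, n_paths = 50, 50_000
dt = T / n
# Each row holds X_{t_1}, ..., X_{t_n} for one path started from 0.
paths = np.cumsum(rng.normal(0.0, sigma * math.sqrt(dt), (n_paths, n)), axis=1)

# Reflection principle: P(sup_{t<=T} X_t >= K) = 2 P(X_T >= K) for driftless X.
exact = math.erfc(K / (sigma * math.sqrt(2.0 * T)))
raw = hit_probability(paths, K)
adjusted = hit_probability(paths, K, shift=BETA * sigma * math.sqrt(dt))
print(exact, raw, adjusted)
```

With these settings the unadjusted estimate sits noticeably below the continuous value, while the adjusted one lands much closer to it.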

This idea is especially useful because of its generality. For non-equally spaced sampling times, the adjustment (3) can still be applied. Now, we just set δt = t_i – t_{i-1} to be the spacing for the specific time, so it depends on the index i. It can also be used for much more general expressions than (1). Any function of X which depends on whether or not it crosses a continuous barrier can potentially make use of the adjustment described. Even if X is an Ito process with time dependent drift and volatility

$\displaystyle dX_t=\sigma_t\,dB_t+\mu_t\,dt,$

(5)

the method can be applied. Now, the volatility in (3) is replaced by an average value across the interval (t_{i-1}, t_i).

The methods above are very useful, but there is a further improvement that can be made. Ideally, we would not have to specify an explicit value of the volatility σ. That is, it should be model independent. There are many reasons why this is desirable. Suppose that we are running a Monte Carlo simulation and generate samples of X at the times t_i. If the simulation only outputs values of X, then this is not sufficient to compute (3). So, it will be necessary to update the program running the simulation to also output the volatility. In some situations this might not be easy. For example, X could be a complicated function of various other processes and, although we could use Ito’s lemma to compute the volatility of X from the other processes, it could be messy. In some situations we might not even have access to the volatility or any method of computing it. For example, the values of X could be computed from historical data. We could be looking at the probability of stock prices crossing a level by looking at historical close fixings, without access to the complete intra-day data. In any case, a model independent discrete barrier adjustment would make applying it much easier.

Removing Volatility Dependence

How can the volatility term be removed from adjustment (3)? One idea is to replace it by an estimator computed from the samples of X, such as

$\displaystyle \hat\sigma^2=\frac1T\sum_{i=1}^n(X_{t_i}-X_{t_{i-1}})^2.$

While this would work, at least for a constant volatility process, it does not meet the requirements. For a general Ito process (5) with stochastic volatility, using an estimator computed over the whole time interval [0, T] may not be a good approximation for the volatility at the time that the barrier is hit. A possible way around this is for the adjustment (3) applied at time t_i to only depend on a volatility estimator computed from samples near the time. This would be possible, although it is not clear what is the best way to select these times. Besides, an important point to note is that we do not need a good estimate of the volatility, since that is not the goal here.
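For reference, the whole-interval estimator above can be sketched in a few lines. This assumes NumPy and that the sample array includes the initial value X_{t_0}; the name `realized_variance` is a hypothetical one chosen for illustration.

```python
import numpy as np

def realized_variance(samples, T):
    """sigma_hat^2 = (1/T) * sum_i (X_{t_i} - X_{t_{i-1}})^2."""
    increments = np.diff(np.asarray(samples, dtype=float))
    return np.sum(increments ** 2) / T

# Four unit-sized moves over a total time T = 4.
print(realized_variance([0.0, 1.0, 0.0, 1.0, 0.0], T=4.0))
```

Restricting `samples` to a window around t_i would give the localized variant mentioned above, with the choice of window left open.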

As explained in the previous post, adjustment (3) works because it corrects for the expected overshoot when the barrier is hit. Specifically, at the first time for which M_i ≥ K, the overshoot is R = X_{t_i} – K. If there was no adjustment then the overshoot is positive and the leading order term in the discrete barrier approximation error is proportional to 𝔼[R]. The positive shift added to X_{t_i} is chosen to compensate for this, giving zero expected overshoot to leading order, and reducing the barrier approximation error. The same applies to any similar adjustment. As long as there is sufficient freedom in choosing M_i, then it should be possible to do it in a way that has zero expected overshoot. Taking this to the extreme, it should be possible to compute the adjustment at time t_i using only the sampled values X_{t_{i-1}} and X_{t_i}.

Consider adjustments of the form

$\displaystyle M_i=\theta(X_{t_{i-1}},X_{t_i})$

for θ: ℝ² → ℝ. By model independence, if this adjustment applies to a process X, then it should equally apply to the shifted and scaled processes X + a and bX for constants a and b > 0. Equivalently, θ satisfies the scaling and translation invariance,

$\displaystyle \begin{aligned} &\theta(x+a,y+a)=\theta(x,y)+a,\\ &\theta(bx,by)=b\theta(x,y). \end{aligned}$

(6)

This restricts the possible forms that θ can take.

Lemma 1 A function θ: ℝ² → ℝ satisfies (6) if and only if

$\displaystyle \theta(x,y)=py+(1-p)x+c\lvert y-x\rvert$

for constants p, c.

Proof: Write θ(0, u) as the sum of its antisymmetric and symmetric parts

$\displaystyle \theta(0,u)=(\theta(0,u)-\theta(0,-u))/2+(\theta(0,u)+\theta(0,-u))/2.$

By scaling invariance, the first term on the right is proportional to u and the second is proportional to |u|. Hence,

$\displaystyle \theta(0,u)=pu+c\lvert u\rvert$

for constants p and c. Using translation invariance,

$\displaystyle \begin{aligned} \theta(x,y) &= x + \theta(0,y-x)\\ &=x + p(y-x)+c\lvert y-x\rvert \end{aligned}$

as required. ⬜

I will therefore only consider adjustments where the maximum of the process across the interval (t_{i-1}, t_i) is replaced by

$\displaystyle M_i=pX_{t_i}+(1-p)X_{t_{i-1}}+c\lvert X_{t_i}-X_{t_{i-1}}\rvert.$

(7)

According to (3), the barrier condition sup_{t≤T} X_t ≥ K is replaced by the discrete approximation max_i M_i ≥ K.

There are various ways in which (7) can be parameterized, but this form is quite intuitive. The term pX_{t_i} + (1 - p)X_{t_{i-1}} is an interpolation of the path of X, and c|X_{t_i} – X_{t_{i-1}}| represents a shift proportional to the sample deviation across the interval replacing the σ√δt term of the simple shift (3). The purpose of this post is to find values for p and c giving a good adjustment, improving convergence of the discrete approximation.
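Adjustment (7) uses nothing but consecutive samples, so it is short to implement. The following is a minimal sketch assuming NumPy and a sample array that includes X_{t_0}; the function names are hypothetical, and the values of p and c used in the example are placeholders rather than recommended constants.

```python
import numpy as np

def adjusted_maxima(samples, p, c):
    """M_i = p X_{t_i} + (1 - p) X_{t_{i-1}} + c |X_{t_i} - X_{t_{i-1}}|."""
    X = np.asarray(samples, dtype=float)
    prev, curr = X[:-1], X[1:]
    return p * curr + (1.0 - p) * prev + c * np.abs(curr - prev)

def barrier_hit(samples, K, p, c):
    """Discrete proxy max_i M_i >= K for the continuous barrier condition."""
    return bool(np.max(adjusted_maxima(samples, p, c)) >= K)

X = np.array([0.0, 0.4, 0.1, 0.9, 0.7])
M = adjusted_maxima(X, p=0.5, c=0.5)  # placeholder parameter values
print(M, barrier_hit(X, K=1.0, p=0.5, c=0.5))
```

Taking p = 1 and c = 0 recovers the unadjusted samples X_{t_i}, and the translation and scaling invariance (6) can be checked by applying the function to X + a and bX.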

The discrete barrier condition M_i ≥ K given by (7) can be satisfied while the process is below the barrier level, giving a negative barrier ‘overshoot’ R = X_{t_i} – K as in figure 2. As we will see, this is vital to obtaining an accurate approximation for the hitting probability.

Discrete Barrier Approximations

It is quite common to consider functions of a continuous-time stochastic process which depend on whether or not it crosses a specified barrier level K. This can involve computing expectations involving a real-valued process X of the form

$\displaystyle V={\mathbb E}\left[f(X_T);\;\sup{}_{t\le T}X_t \ge K\right]$

(1)

for a positive time T and function f: ℝ → ℝ. I am using the notation 𝔼[A;S] to denote the expectation of random variable A restricted to event S, or 𝔼[A1_S].

One example is computing prices of financial derivatives such as barrier options, where T represents the expiration time and f is the payoff at expiry conditional on hitting upper barrier level K. A knock-in call option would have the final payoff f(x) = (x - a)₊ for a contractual strike of a. Knock-out options are similar, except that the payoff is conditioned on not hitting the barrier level. As the sum of knock-in and knock-out options is just an option with no barrier, both cases involve similar calculations.

Alternatively, the barrier can be discrete, meaning that it only involves sampling the process at a finite set of times 0 ≤ t₁ ≤ ⋯ ≤ t_n ≤ T. Then, equation (1) is replaced by

$\displaystyle V={\mathbb E}\left[f(X_T);\;\sup{}_{i=1,\ldots,n}X_{t_i}\ge K\right].$

(2)

Naturally, sampling at a finite set of times will reduce the probability of the barrier being reached and, so, if f is nonnegative then (2) will have a lower value than (1). It should still converge though as n goes to infinity and the sampling times become dense in the interval.

If the underlying process X is Brownian motion or geometric Brownian motion, possibly with a constant drift, then there are exact expressions for computing (1) in terms of integrating f against a normal density. See the post on the reflection principle for more information. However, it is difficult to find exact expressions for the discrete barrier (2) other than integrating over high-dimensional joint normal distributions. So, it can be useful to approximate a discrete barrier with analytic formulas for the continuous barrier. This is the idea used in the classic 1997 paper A Continuity Correction for Discrete Barrier Options by Broadie, Glasserman and Kou (freely available here).
We may want to compute the continuous barrier expectation (1) using Monte Carlo simulation. This is a common method, but involves generating sample paths of the process X at a finite set of times. This means that we are only able to sample at these times so, necessarily, are restricted to discrete barrier calculations as in (2).

I am primarily concerned with the second idea. This is a very general issue, since Monte Carlo simulation is a common technique used in many applications. However, as it only represents sample paths at discrete time points, it necessarily involves discretely approximating continuous barrier levels. You may well ask why we would even want to use Monte Carlo if, as I mentioned above, there are exact expressions in these cases. In answer, such formulas only hold in very restrictive situations where the process X is a Brownian motion or geometric Brownian motion with constant drift. More generally it could be an ‘Ito process’ of the form

$\displaystyle dX_t=\sigma_t\,dB_t+\mu_t\,dt$

(3)

where B is standard Brownian motion. This describes X as a stochastic integral with respect to the predictable integrands σ and μ, which represent the volatility and drift of the process. Strictly speaking, these are ‘linear’ volatility and drift terms, rather than log-linear as used in many financial models applied to nonnegative processes such as stock prices. This is simply the choice made here, since this post is addressing a general mathematical problem of approximating continuous barriers and not restricting to such specific applications.

If the volatility and drift terms in (3) are not constant, then the exact formulas no longer hold. This is true, even if they are deterministic functions of time. In practice, these terms are often stochastic and can be rather general, in which case trying to find exact expressions is an almost hopeless task. Even though I concentrate on the case with constant volatility and drift in any calculations performed here, this is for convenience of exposition. The idea is that, as long as σ is piecewise continuous then, locally, it is well approximated as constant and the techniques discussed here should still apply.

In addition to considering general Ito processes (3), the ideas described here will apply to much more general functions of the process X than stated in (1). In the financial context, this means more general payoffs than simple knock-in or knock-out options. For example, autocallable trades involve a down-and-in put option but, additionally, contain a discrete set of upper barriers which cause the trade to make a final payment and terminate. They may also allow the issuer to early terminate the trade on a discrete set of dates. Furthermore, trades can depend on different assets with separate barriers on each of them, or on the average of a basket of assets, or have different barrier levels in different time periods. The list of possibilities is endless but, the idea is that each continuous barrier inside a complex payoff will be approximated by discretely sampled barrier conditions.

For efficiency, we may also want to approximate a discrete barrier with a large number of sampling times by one with fewer. The methods outlined in the post can also be used for this. In particular, the simple barrier shift described below could be used by taking the difference between the shift computed for the times actually sampled and the one for the required sample times. I do not go into details of this, but mention it now to give an idea of the generality of the technique.

Figure 1: Discrete barrier approximation error

Let’s consider simply approximating a continuous barrier in (1) by the discrete barrier in (2). This will converge as the number of sampling times t_i increases but, the problem is, it converges very slowly. We can get an idea of the order of the error when the sampling times have a δt spacing which, with equally spaced times, is given by δt = T/n. This is as shown in figure 1 above. When the process first hits the continuous barrier level, it will be on average about δt/2 before the next sampling time. If X behaves approximately like a Brownian motion with volatility σ over this interval then it will have about 50% chance of being above K at the next discrete time. On the other hand, it will be below K with about 50% probability, in which case it will drop a distance proportional to σ√δt below on average. This means that if the continuous barrier is hit, there is a probability roughly proportional to σ√δt that the discrete barrier is not hit. So, the error in approximating a continuous barrier (1) by the discrete case (2) is of the order of σ√δt which only tends to zero at rate 1/√n.

Brownian Motion and the Riemann Zeta Function

Intriguingly, various constructions related to Brownian motion result in quantities with moments described by the Riemann zeta function. These distributions appear in integral representations used to extend the zeta function to the entire complex plane, as described in an earlier post. Now, I look at how they also arise from processes constructed from Brownian motion such as Brownian bridges, excursions and meanders.

Recall the definition of the Riemann zeta function as an infinite series

$\displaystyle \zeta(s)=1+2^{-s}+3^{-s}+4^{-s}+\cdots$

which converges for complex argument s with real part greater than one. This has a unique extension to an analytic function on the complex plane outside of a simple pole at s = 1.

Often, it is more convenient to use the Riemann xi function which can be defined as zeta multiplied by a prefactor involving the gamma function,

$\displaystyle \xi(s)=\frac12s(s-1)\pi^{-s/2}\Gamma(s/2)\zeta(s).$

This is an entire function on the complex plane satisfying the functional equation ξ(1 - s) = ξ(s).

It turns out that ξ describes the moments of a probability distribution, according to which a random variable X is positive with moments

$\displaystyle {\mathbb E}[X^s]=2\xi(s),$

(1)

which is well-defined for all complex s. In the post titled The Riemann Zeta Function and Probability Distributions, I denoted this distribution by Ψ, which is a little arbitrary but was the symbol used for its probability density. A related distribution on the positive reals, which we will denote by Φ, is given by the moments

$\displaystyle {\mathbb E}[X^s]=\frac{1-2^{1-s}}{s-1}2\xi(s)$

(2)

which, again, is defined for all complex s.

As standard, complex powers of a positive real x are defined by x^s = e^{s log x}, so (1,2) are equivalent to the moment generating functions of log X, which uniquely determines the distributions. The probability densities and cumulative distribution functions can be given, although I will not do that here since they are already explicitly written out in the earlier post. I will write X ∼ Φ or X ∼ Ψ to mean that random variable X has the respective distribution. As we previously explained, these are closely connected:

If X ∼ Ψ and, independently, Y is uniform on [1, 2], then X/Y ∼ Φ.
If X, Y ∼ Φ are independent then √(X² + Y²) ∼ Ψ.
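The first of these statements can be checked directly against the moments (1) and (2). For X ∼ Ψ and independent Y uniform on [1, 2], and s ≠ 1,

$\displaystyle {\mathbb E}[(X/Y)^s]={\mathbb E}[X^s]\,{\mathbb E}[Y^{-s}]=2\xi(s)\int_1^2y^{-s}\,dy=\frac{1-2^{1-s}}{s-1}2\xi(s),$

which is the moment formula (2) defining Φ.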

The purpose of this post is to describe some constructions involving Brownian bridges, excursions and meanders which naturally involve the Φ and Ψ distributions.

Theorem 1 The following have distribution Φ:

√(2/π) Z where Z = sup_t |B_t| is the absolute maximum of a standard Brownian bridge B.

Z/√(2π) where Z = sup_t B_t is the maximum of a Brownian meander B.

√(2π) Z where Z is the sample standard deviation of a Brownian bridge B,

$\displaystyle Z=\left(\int_0^1(B_t-\bar B)^2\,dt\right)^{\frac12}$

with sample mean B̅ = ∫₀¹B_t dt.

√(π/2) Z where Z is the pathwise Euclidean norm of a 2-dimensional Brownian bridge B = (B¹, B²),

$\displaystyle Z=\left(\int_0^1\lVert B_t\rVert^2\,dt\right)^{\frac12}$

√(πτ/2) where τ = inf{t ≥ 0: ‖B_t‖ = 1} is the first time at which the norm of a 3-dimensional standard Brownian motion B = (B¹, B², B³) hits 1.

The Kolmogorov distribution is, by definition, the distribution of the absolute maximum of a Brownian bridge. So, the first statement of theorem 1 is saying that Φ is just the Kolmogorov distribution scaled by the constant factor √(2/π). Moving on to Ψ:

Theorem 2 The following have distribution Ψ:

√(2/π) Z where Z = sup_t B_t – inf_t B_t is the range of a standard Brownian bridge B.

√(2/π) Z where Z = sup_t B_t is the maximum of a (normalized) Brownian excursion B.

√(π/2) Z where Z is the pathwise Euclidean norm of a 4-dimensional Brownian bridge B = (B¹, B², B³, B⁴),

$\displaystyle Z=\left(\int_0^1\lVert B_t\rVert^2\,dt\right)^{\frac12}.$


The Minimum and Maximum of Brownian motion

If X is standard Brownian motion, what is the distribution of its absolute maximum |X|_t^∗ = sup_{s≤t} |X_s| over a time interval [0, t]? Previously, I looked at how the reflection principle can be used to determine that the maximum X_t^∗ = sup_{s≤t} X_s has the same distribution as |X_t|. This is not the same thing as the maximum of the absolute value though, which is a more difficult quantity to describe. As a first step, |X|_t^∗ is clearly at least as large as X_t^∗ from which it follows that it stochastically dominates |X_t|.

I would like to go further and precisely describe the distribution of |X|_t^∗. What is the probability that it exceeds a fixed positive level a? For this to occur, at least one of the suprema of X and –X must exceed a. Denoting the minimum and maximum by

$\displaystyle \begin{aligned} &X_t^m=\inf_{s\le t}X_s,\\ &X_t^M=\sup_{s\le t}X_s, \end{aligned}$

then |X|_t^∗ is the maximum of X_t^M and –X_t^m. I have switched notation a little here, and am using X^M to denote what was previously written as X^∗. This is just to use similar notation for both the minimum and maximum. Using inclusion-exclusion, the probability that the absolute maximum is greater than a level a is,

$\displaystyle \begin{aligned} {\mathbb P}(\lvert X\rvert_t^* > a)={} & {\mathbb P}(X_t^M > a)+{\mathbb P}(X_t^m < -a)\\ & -{\mathbb P}(X_t^M > a{\rm\ and\ }X_t^m < -a). \end{aligned}$

As X_t^M has the same distribution as |X_t| and, by symmetry, so does –X_t^m, we obtain

$\displaystyle {\mathbb P}(\lvert X\rvert_t^* > a)=4{\mathbb P}(X_t > a)-{\mathbb P}(X_t^M > a{\rm\ and\ }X_t^m < -a).$

This hasn’t really answered the question. All we have done is to re-express the probability in terms of both the minimum and maximum being beyond a level. For large values of a it does, however, give a good approximation. The probability of the Brownian motion reaching a large positive value a and then dropping to the large negative value –a will be vanishingly small, so the final term in the identity above can be neglected. This gives an asymptotic approximation as a tends to infinity,

$\displaystyle \begin{aligned} {\mathbb P}(\lvert X\rvert_t^* > a) &\sim 4{\mathbb P}(X_t > a)\\ &\sim\sqrt{\frac{8t}{\pi a^2}}e^{-\frac{a^2}{2t}}. \end{aligned}$

(1)

The last expression here is just using the fact that X_t is centered Gaussian with variance t and applying a standard approximation for the cumulative normal distribution function.
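The quality of the final expression in (1) as an approximation to 4ℙ(X_t > a) is easy to examine numerically, using the complementary error function for the normal tail. This is a small sketch using only the Python standard library; the function names and the sample values of a are illustrative choices.

```python
import math

def four_tail(a, t):
    """4 P(X_t > a) for X_t centered normal with variance t."""
    return 2.0 * math.erfc(a / math.sqrt(2.0 * t))

def asymptotic(a, t):
    """sqrt(8 t / (pi a^2)) * exp(-a^2 / (2 t))."""
    return math.sqrt(8.0 * t / (math.pi * a * a)) * math.exp(-a * a / (2.0 * t))

t = 1.0
for a in (1.0, 2.0, 4.0, 8.0):
    # The ratio approaches 1 from below as a grows.
    print(a, four_tail(a, t) / asymptotic(a, t))
```

The ratio is well below 1 for a of order √t, which is one way of seeing why the approximation is only useful for large a.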

For small values of a, approximation (1) does not work well at all. We know that the left-hand-side should tend to 1, whereas 4ℙ(X_t > a) will tend to 2, and the final expression diverges. In fact, it can be shown that

$\displaystyle {\mathbb P}(\lvert X\rvert_t^* < a)\sim\frac{4}{\pi}e^{-\frac{t\pi^2}{8a^2}}$

(2)

as a → 0. I gave a direct proof in this math.stackexchange answer. In this post, I will look at how we can compute joint distributions of the minimum, maximum and terminal value of Brownian motion, from which limits such as (2) will follow.

The Brownian Drawdown Process

The drawdown of a stochastic process is the amount that it has dropped since it last hit its maximum value so far. For process X with running maximum X^∗_t = sup_{s≤t} X_s, the drawdown is thus X^∗_t – X_t, which is a nonnegative process. This is as in figure 1 below.

Figure 1: Brownian motion and its drawdown process

The previous post used the reflection principle to show that the maximum of a Brownian motion has the same distribution as its terminal absolute value. That is, X^∗_t and |X_t| are identically distributed.

For a process X started from zero, its maximum and drawdown can be written as X^∗_t – X₀ and X^∗_t – X_t. Reversing the process in time across the interval [0, t] will exchange these values. So, reversing in time and translating so that it still starts from zero will exchange the maximum value and the drawdown. Specifically, write

$\displaystyle Y_s = X_{t-s} - X_t$

for time index 0 ≤ s ≤ t. The maximum of Y is equal to the drawdown of X,

$\displaystyle Y^*_t = X^*_t-X_t.$

If X is standard Brownian motion then so is Y, since the independent normal increments property for Y follows from that of X. As already stated, the maximum Y^∗_t = X^∗_t – X_t has the same distribution as the absolute value |Y_t|= |X_t|. So, the drawdown has the same distribution as the absolute value at each time.

Lemma 1 If X is standard Brownian motion, then X^∗_t – X_t has the same distribution as |X_t| at each time t ≥ 0.
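The time-reversal argument can be checked on simulated paths. The following is only a rough sketch, assuming a simple Euler discretisation of Brownian motion on a uniform grid. The grid sizes, the seed and the variable names are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps, t = 5_000, 500, 1.0
dt = t / n_steps

# Discretised Brownian paths started from zero, one path per row.
increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
X = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(increments, axis=1)], axis=1)

# Drawdown at time t: running maximum minus terminal value.
drawdown = X.max(axis=1) - X[:, -1]

# Time-reversed and translated paths, Y_s = X_{t-s} - X_t.
Y = X[:, ::-1] - X[:, [-1]]

# Pathwise identity: the maximum of Y equals the drawdown of X.
reversal_gap = np.abs(Y.max(axis=1) - drawdown).max()

# Distributional statement of the lemma, compared through sample means.
mean_drawdown = drawdown.mean()
mean_abs_terminal = np.abs(X[:, -1]).mean()
print(reversal_gap, mean_drawdown, mean_abs_terminal, np.sqrt(2 * t / np.pi))
```

The pathwise identity holds exactly on the grid. The two sample means only agree approximately, since the maximum over a discrete grid slightly underestimates the continuous-time maximum.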

Continue reading “The Brownian Drawdown Process” →

The Maximum of Brownian Motion and the Reflection Principle

The distribution of a standard Brownian motion X at a positive time t is, by definition, centered normal with variance t. What can we say about its maximum value up until that time? This is X^∗_t = sup_{s ≤ t} X_s, and is clearly nonnegative and at least as big as X_t. To be more precise, consider the probability that the maximum is greater than a fixed positive value a. Such problems will be familiar to anyone who has looked at pricing of financial derivatives such as barrier options, where the payoff of a trade depends on whether the maximum or minimum of an asset price has crossed a specified barrier level.

This can be computed with the aid of a symmetry argument commonly referred to as the reflection principle. The idea is that, if we reflect the Brownian motion when it first hits a level, then the resulting process is also a Brownian motion. The first time at which X hits level a is τ = inf{t ≥ 0: X_t ≥ a}, which is a stopping time. Reflecting the process about this level at all times after τ gives a new process

Reflected Brownian motion — Figure 1: Reflecting Brownian motion when it hits level a.
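To make the reflection concrete, here is a minimal sketch for a path sampled on a grid. The helper `reflect_at_level` is a hypothetical name of my own. On a grid the path can overshoot the level a, so this sketch reflects about the value at the first grid time where the path is at or above a, which coincides with a in the continuous-time limit.

```python
import numpy as np

def reflect_at_level(path, a):
    """Reflect a discretised path from the first grid time it reaches level a.

    Assumption of this sketch: the reflection is about the path value at the
    hitting index, so that increments after that index are simply negated.
    """
    path = np.asarray(path, dtype=float)
    hit = np.flatnonzero(path >= a)
    if hit.size == 0:
        # The level is never reached, so nothing is reflected.
        return path.copy()
    tau = hit[0]
    reflected = path.copy()
    reflected[tau:] = 2 * path[tau] - path[tau:]
    return reflected

if __name__ == "__main__":
    example = [0.0, 0.5, 1.2, 0.9, 1.5]
    print(reflect_at_level(example, a=1.0))
```

The reflected path agrees with the original up to the hitting index and has the same increments, with flipped sign, afterwards.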

Continue reading “The Maximum of Brownian Motion and the Reflection Principle” →

Brownian Meanders

Having previously looked at Brownian bridges and excursions, I now turn to a third kind of process which can be constructed either as a conditioned Brownian motion or by extracting a segment from Brownian motion sample paths. Specifically, this is the Brownian meander, which is a Brownian motion conditioned to be positive over a unit time interval. Since this requires conditioning on a zero probability event, care must be taken. Instead, it is cleaner to start with an alternative definition by appropriately scaling a segment of a Brownian motion.

For a fixed positive time T, consider the last time σ before T at which a Brownian motion X is equal to zero,

$\displaystyle \sigma=\sup\left\{t\le T\colon X_t=0\right\}.$

(1)

On interval [σ, T], the path of X will start from 0 and then be either strictly positive or strictly negative, and we may as well restrict to the positive case by taking absolute values. Scaling invariance says that c^{-1/2}X_{ct} is itself a standard Brownian motion for any positive constant c. So, scaling the path of X on [σ, T] to the unit interval defines a process

$\displaystyle B_t=(T-\sigma)^{-1/2}\lvert X_{\sigma+t(T-\sigma)}\rvert.$

(2)

over 0 ≤ t ≤ 1. This starts from zero and is strictly positive at all other times.

Brownian meander construction — Figure 2: Constructing a Brownian meander
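The construction (1) and (2) can be imitated on a simulated path. This is only a sketch under my own assumptions: the path is given on a uniform grid starting from zero, its final value is nonzero, and the zero crossing σ is located by linear interpolation between grid points. The function name `brownian_meander` and its parameters are hypothetical.

```python
import numpy as np

def brownian_meander(X, dt, n_out=201):
    """Approximate B from (2) for a path X sampled at times 0, dt, 2*dt, ..., T."""
    X = np.asarray(X, dtype=float)
    times = dt * np.arange(len(X))
    T = times[-1]
    # Last grid index at which the path is at zero or on the opposite side
    # of zero from X_T (this assumes X_T is nonzero).
    i = np.flatnonzero(X * X[-1] <= 0)[-1]
    # Linearly interpolate the last zero of the path between indices i and i + 1.
    sigma = times[i] + dt * abs(X[i]) / (abs(X[i]) + abs(X[i + 1]))
    # |X| on [sigma, T], with the interpolated zero prepended.
    seg_times = np.concatenate([[sigma], times[i + 1:]])
    seg_vals = np.concatenate([[0.0], np.abs(X[i + 1:])])
    # Rescale time to [0, 1] and space by (T - sigma)^(-1/2).
    u = np.linspace(0.0, 1.0, n_out)
    return (T - sigma) ** -0.5 * np.interp(sigma + u * (T - sigma), seg_times, seg_vals)

rng = np.random.default_rng(3)
n_steps, T = 2_000, 1.0
dt = T / n_steps
X = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n_steps))])
B = brownian_meander(X, dt)
print(B[0], B.min(), B[-1])
```

The output starts from zero and is strictly positive afterwards, matching the description of B above up to discretisation error.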

The only remaining ambiguity is in the choice of the fixed time T. By scaling invariance, the law of the process B does not depend on this choice.

Lemma 1 The distribution of B defined by (2) does not depend on the choice of the time T > 0.

Proof: Consider any other fixed positive time T̃, and use the construction above with T̃, σ̃, B̃ in place of T, σ, B respectively. We need to show that B̃ and B have the same distribution. Using the scaling factor S = T̃/T, then X′_t = S^{-1/2}X_{tS} is a standard Brownian motion. Also, σ′= σ̃/S is the last time before T at which X′ is zero. So,

$\displaystyle \tilde B_t=(T-\sigma')^{-1/2}\lvert X'_{\sigma'+t(T-\sigma')}\rvert$

has the same distribution as B. ⬜

This leads to the definition used here for Brownian meanders.

Definition 2 A continuous process {B_t}_{t ∈ [0, 1]} is a Brownian meander if and only if it has the same distribution as (2) for a standard Brownian motion X and fixed time T > 0.

In fact, there are various alternative, but equivalent, ways in which Brownian meanders can be defined and constructed.

As a scaled segment of a Brownian motion before a time T and after it last hits 0. This is definition 2.
As a Brownian motion conditioned on being positive. See theorem 4 below.
As a segment of a Brownian excursion. See lemma 5.
As the path of a standard Brownian motion starting from its minimum, in either the forwards or backwards direction. See theorem 6.
As a Markov process with specified transition probabilities. See theorem 9 below.
As a solution to an SDE. See theorem 12 below.

Continue reading “Brownian Meanders” →

Brownian Excursions

A normalized Brownian excursion is a nonnegative real-valued process with time ranging over the unit interval, and is equal to zero at the start and end time points. It can be constructed from a standard Brownian motion by conditioning on being nonnegative and equal to zero at the end time. We do have to be careful with this definition, since it involves conditioning on a zero probability event. Alternatively, as the name suggests, Brownian excursions can be understood as the excursions of a Brownian motion X away from zero. By continuity, the set of times at which X is nonzero will be open and, hence, can be written as the union of a collection of disjoint (and stochastic) intervals (σ, τ).

In fact, Brownian motion can be reconstructed by simply joining all of its excursions back together. These are independent processes and identically distributed up to scaling. Because of this, understanding the Brownian excursion process can be very useful in the study of Brownian motion. However, there will be infinitely many excursions over finite time periods, so the procedure of joining them together requires some work. This falls under the umbrella of ‘excursion theory’, which is outside the scope of the current post. Here, I will concentrate on the properties of individual excursions.

In order to select a single interval, start by fixing a time T > 0. As X_T is almost surely nonzero, T will be contained inside one such interval (σ, τ). Explicitly,

$\displaystyle \begin{aligned} &\sigma=\sup\left\{t\le T\colon X_t=0\right\},\\ &\tau=\inf\left\{t\ge T\colon X_t=0\right\}, \end{aligned}$

(1)

so that σ < T < τ < ∞ almost surely. The path of X across such an interval is t ↦ X_{σ + t} for time t in the range [0, τ - σ]. As it can be either nonnegative or nonpositive, we restrict to the nonnegative case by taking the absolute value. By scaling invariance, S^{-1/2}X_{tS} is also a standard Brownian motion, for each fixed S > 0. Using a stochastic factor S = τ – σ, the width of the excursion is normalised to obtain a continuous process {B_t}_{t ∈ [0, 1]} given by

$\displaystyle B_t=(\tau-\sigma)^{-1/2}\lvert X_{\sigma+t(\tau-\sigma)}\rvert.$

(2)

By construction, this is strictly positive over 0 < t < 1 and equal to zero at the endpoints t ∈ {0, 1}.

Figure 2: Constructing a Brownian excursion

The only remaining ambiguity is in the choice of the fixed time T.

Lemma 1 The distribution of B defined by (2) does not depend on the choice of the time T > 0.

Proof: This follows from scaling invariance of Brownian motion. Consider any other fixed positive time T̃, and use the construction above with T̃, σ̃, τ̃, B̃ in place of T, σ, τ, B respectively. We need to show that B̃ and B have the same distribution. Using the scaling factor S = T̃/T, then X′_t = S^{-1/2}X_{tS} is a standard Brownian motion. Also, σ′= σ̃/S and τ′= τ̃/S are random times given in the same way as σ and τ, but with the Brownian motion X′ in place of X in (1). So,

$\displaystyle \tilde B_t=(\tau^\prime-\sigma^\prime)^{-1/2}\lvert X^\prime_{\sigma^\prime+t(\tau^\prime-\sigma^\prime)}\rvert$

has the same distribution as B. ⬜

This leads to the definition used here for Brownian excursions.

Definition 2 A continuous process {B_t}_{t ∈ [0, 1]} is a Brownian excursion if and only if it has the same distribution as (2) for a standard Brownian motion X and time T > 0.

In fact, there are various alternative — but equivalent — ways in which Brownian excursions can be defined and constructed.

As a normalized excursion away from zero of a Brownian motion. This is definition 2.
As a normalized excursion away from zero of a Brownian bridge. This is theorem 6.
As a Brownian bridge conditioned on being nonnegative. See theorem 9 below.
As the sample path of a Brownian bridge, translated so that it has minimum value zero at time 0. This is a very interesting and useful method of directly computing excursion sample paths from those of a Brownian bridge. See theorem 12 below, sometimes known as the Vervaat transform.
As a Markov process with specified transition probabilities. See theorem 15 below.
As a transformation of Bessel process paths, see theorem 16 below.
As a Bessel bridge of order 3. This can be represented either as a Bessel process conditioned on hitting zero at time 1, or as the vector norm of a 3-dimensional Brownian bridge. See lemma 17 below.
As a solution to a stochastic differential equation. See theorem 18 below.
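The Vervaat transform mentioned in the list above is simple to try on a simulated path. The sketch below assumes a uniform grid on [0, 1] and builds the bridge by subtracting a linear term from a discretised Brownian motion. The function name `vervaat_transform` and the grid size are my own choices.

```python
import numpy as np

def vervaat_transform(bridge):
    """Cyclically shift a discretised bridge so that it starts at its minimum,
    then translate so that the minimum value is zero.

    Assumes bridge[0] == bridge[-1], so the path can be treated as periodic.
    """
    b = np.asarray(bridge, dtype=float)
    m = int(np.argmin(b[:-1]))          # grid index of the minimum
    shifted = np.roll(b[:-1], -m)       # path restarted from the minimum
    return np.append(shifted, shifted[0]) - b[m]

rng = np.random.default_rng(1)
n = 1_000
t = np.linspace(0.0, 1.0, n + 1)
X = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(1.0 / n), n))])
bridge = X - t * X[-1]                  # Brownian motion minus a linear term
excursion = vervaat_transform(bridge)
print(excursion[0], excursion[-1], excursion.min())
```

The resulting path is nonnegative and equal to zero at both endpoints, as expected of an excursion sample path.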

Continue reading “Brownian Excursions” →

Brownian Bridges

A Brownian bridge can be defined as standard Brownian motion conditioned on hitting zero at a fixed future time T, or as any continuous process with the same distribution as this. Rather than conditioning, a slightly easier approach is to subtract a linear term from the Brownian motion, chosen such that the resulting process hits zero at the time T. This is equivalent, but has the added benefit of being independent of the original Brownian motion at all later times.

Lemma 1 Let X be a standard Brownian motion and ${T > 0}$ be a fixed time. Then, the process

$\displaystyle B_t = X_t - \frac tTX_T$ (1)

over ${0\le t\le T}$ is independent from ${\{X_t\}_{t\ge T}}$ .

Proof: As the processes are joint normal, it is sufficient that there is zero covariance between them. So, for times ${s\le T\le t}$ , we just need to show that ${{\mathbb E}[B_sX_t]}$ is zero. Using the covariance structure ${{\mathbb E}[X_sX_t]=s\wedge t}$ we obtain,

$\displaystyle {\mathbb E}[B_sX_t]={\mathbb E}[X_sX_t]-\frac sT{\mathbb E}[X_TX_t]=s-\frac sTT=0$

as required. ⬜
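The covariance computation in the proof can also be checked by simulation. This is a rough Monte Carlo sketch with arbitrary choices of the times s ≤ T ≤ t, the sample size and the seed, none of which come from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
n, s, T, t = 200_000, 0.5, 1.0, 2.0

# Sample (X_s, X_T, X_t) using independent normal increments.
X_s = rng.normal(0.0, np.sqrt(s), n)
X_T = X_s + rng.normal(0.0, np.sqrt(T - s), n)
X_t = X_T + rng.normal(0.0, np.sqrt(t - T), n)

# B_s as in (1).
B_s = X_s - (s / T) * X_T

cov_bridge = np.mean(B_s * X_t)   # estimates E[B_s X_t], which should be near 0
cov_motion = np.mean(X_s * X_t)   # estimates E[X_s X_t], which should be near s
print(cov_bridge, cov_motion)
```

Zero covariance only implies independence here because the processes are joint normal, as noted in the proof. The simulation checks the covariance, not independence itself.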

This leads us to the definition of a Brownian bridge.

Definition 2 A continuous process ${\{B_t\}_{t\in[0,T]}}$ is a Brownian bridge on the interval ${[0,T]}$ if and only if it has the same distribution as ${X_t-\frac tTX_T}$ for a standard Brownian motion X.

In the case that ${T=1}$ , B is called a standard Brownian bridge.

There are actually many different ways in which Brownian bridges can be defined, which all lead to the same result.

As a Brownian motion minus a linear term so that it hits zero at T. This is definition 2.
As a Brownian motion X scaled as ${tT^{-1/2}X_{T/t-1}}$ . See lemma 9 below.
As a joint normal process with prescribed covariances. See lemma 7 below.
As a Brownian motion conditioned on hitting zero at T. See lemma 14 below.
As a Brownian motion restricted to the times before it last hits zero before a fixed positive time T, and rescaled to fit a fixed time interval. See lemma 15 below.
As a Markov process. See lemma 13 below.
As a solution to a stochastic differential equation with drift term forcing it to hit zero at T. See lemma 18 below.

There are other constructions beyond these, such as in terms of limits of random walks, although I will not cover those in this post. Continue reading “Brownian Bridges” →
