# Stochastic Differential Equations

Stochastic differential equations (SDEs) form a large and very important part of the theory of stochastic calculus. Much like ordinary differential equations (ODEs), they describe the behaviour of a dynamical system over infinitesimal time increments, and their solutions show how the system evolves over time. The difference with SDEs is that they include a source of random noise, typically given by a Brownian motion. Since Brownian motion has many pathological properties, such as being everywhere nondifferentiable, classical differential techniques are not well equipped to handle such equations. Standard results regarding the existence and uniqueness of solutions to ODEs do not apply in the stochastic case, and cannot readily describe what it even means to solve such a system. I will make some posts explaining how the theory of stochastic calculus applies to systems described by an SDE.

Consider a stochastic differential equation describing the evolution of a real-valued process ${\{X_t\}_{t\ge0}}$,

 $\displaystyle dX_t = \sigma(X_t)\,dW_t + b(X_t)\,dt$ (1)

which can be specified along with an initial condition ${X_0=x_0}$. Here, b is the drift specifying how X moves on average over each time increment dt, σ is a volatility term giving the amplitude of the random noise and W is a driving Brownian motion providing the source of the randomness. There are numerous situations where equations such as (1) are used, with applications in physics, finance, filtering theory, and many other areas.

In the case where σ is zero, (1) is just an ordinary differential equation dX/dt = b(X). In the general case, we can informally think of dividing through by dt to give an ODE plus an additional noise term

 $\displaystyle \frac{dX_t}{dt}=b(X_t)+\sigma(X_t)\xi_t.$ (2)

I have set ${\xi_t=dW_t/dt}$ which can be thought of as a process whose values at each time are independent zero-mean random variables. As mentioned above, though, Brownian motion is not differentiable so this does not exist in the usual sense. While it can be described by a kind of random distribution, even distribution theory is not well-equipped to handle such equations involving multiplying by the nondifferentiable process ${\sigma(X_t)}$. Instead, (1) can be integrated to obtain

 $\displaystyle X_t=X_0+\int_0^t\sigma(X_s)\,dW_s+\int_0^tb(X_s)\,ds,$ (3)

where the right-hand-side is interpreted using stochastic integration with respect to the semimartingale W. Likewise, X will be a semimartingale, and such solutions are often referred to as diffusions.
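As a rough numerical illustration of (3), the two integrals can be approximated on a time grid by the Euler-Maruyama scheme, which replaces dW by independent normal increments with variance equal to the time step. This is only a sketch under my own assumptions, not part of the theory above: the helper name `euler_maruyama`, its parameters, and the use of NumPy are all hypothetical choices made for illustration.

```python
import numpy as np

def euler_maruyama(sigma, b, x0, T, n_steps, rng):
    """Approximate a path of dX = sigma(X) dW + b(X) dt on [0, T].

    Hypothetical helper for illustration: sigma and b are scalar functions,
    and rng is a numpy.random.Generator supplying the Brownian increments.
    """
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    # Brownian increments W_{t+dt} - W_t are independent N(0, dt) variables.
    dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    for k in range(n_steps):
        x[k + 1] = x[k] + sigma(x[k]) * dW[k] + b(x[k]) * dt
    return x

rng = np.random.default_rng(0)
# With sigma = 0 the scheme reduces to Euler's method for dX/dt = b(X).
ode_path = euler_maruyama(lambda x: 0.0, lambda x: -x, 1.0, 1.0, 1000, rng)
# A noisy example: mean-reverting drift with constant volatility.
sde_path = euler_maruyama(lambda x: 0.5, lambda x: -x, 1.0, 1.0, 1000, rng)
```

With σ set to zero, the terminal value of `ode_path` should be close to the ODE solution exp(-1), which gives a quick sanity check on the discretization.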

The differential form (1) can be interpreted as a shorthand for the integral expression (3), which I will do in these notes. It can be generalized to n-dimensional processes by allowing b to take values in ${{\mathbb R}^n}$, σ(x) to be an n × m matrix, and W to be an m-dimensional Brownian motion. That is, ${W=(W^1,\ldots,W^m)}$ where the ${W^i}$ are independent Brownian motions. I will sometimes write this as

 $\displaystyle dX^i_t=\sigma_{ij}(X_t)\,dW^j_t+b_i(X_t)\,dt$

where the summation convention is being applied, with subscripts or superscripts occurring more than once in a single term being summed over their range (here, j runs from 1 to m).

Unlike ODEs, when dealing with SDEs we need to consider what underlying probability space the solution is defined with respect to. This leads to the existence of different classes of solutions.

• Strong solutions where X can be expressed as a measurable function of the Brownian motion W or, equivalently, X is adapted to the natural filtration of W.
• Weak solutions where X need not be a function of W. Such cases may require additional randomness so may not exist on the probability space with respect to which the Brownian motion W is defined. It can be necessary to extend the filtered probability space to construct these solutions.

Likewise, when considering uniqueness of solutions, there are different ways this occurs.

• Pathwise uniqueness where, up to indistinguishability, there is only one solution X. This should hold not just on one specific space containing a Brownian motion W, but on all such spaces. That is, weak solutions should be unique.
• Uniqueness in law where there may be multiple pathwise solutions, but their distribution is uniquely determined by the SDE.

There are various general conditions under which strong solutions and pathwise uniqueness are guaranteed for SDE (1), such as the Itô result for Lipschitz continuous coefficients. I covered this situation in a previous post.

Other than using the SDE (1), such systems can also be described by an associated differential operator. For the n-dimensional case set a(x) = σ(x)σ(x)T, which is an n × n positive semidefinite matrix. Then, the second order operator L can be defined

 $\displaystyle Lf(x)=\frac12a_{ij}(x)f_{,ij}(x)+b_{i}(x)f_{,i}(x)$

operating on twice continuously differentiable functions f: ℝn → ℝ. Being able to effortlessly switch between descriptions using the SDE (1) and the operator L is a huge benefit when working with such systems. There are several different ways in which the operator can be used to describe a stochastic process, all of which relate to weak solutions and uniqueness in law of the SDE.
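To make the operator concrete, here is a minimal one-dimensional sketch which approximates Lf(x) = ½a(x)f″(x) + b(x)f′(x) by central finite differences. The function name `generator` and the step size `h` are my own illustrative choices rather than anything from the theory above.

```python
def generator(sigma, b, f, x, h=1e-4):
    """Finite-difference approximation of Lf(x) = a(x) f''(x) / 2 + b(x) f'(x)
    in one dimension, where a(x) = sigma(x)**2. Illustrative helper only."""
    d1 = (f(x + h) - f(x - h)) / (2 * h)            # central difference for f'
    d2 = (f(x + h) - 2 * f(x) + f(x - h)) / h**2    # central difference for f''
    return 0.5 * sigma(x) ** 2 * d2 + b(x) * d1

# Example: sigma(x) = x, b(x) = x and f(x) = x^2 give Lf(x) = x^2 + 2x^2 = 3x^2.
value = generator(lambda x: x, lambda x: x, lambda x: x**2, 2.0)
```

At x = 2 the exact value is 3x² = 12, which the approximation should reproduce up to rounding error.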

Markov Generator: A Markov process is a weak solution to the SDE (1) if its infinitesimal generator is L. That is, if the transition function is Pt then,

 $\displaystyle \lim_{t\rightarrow0}t^{-1}(P_tf-f)=Lf$

for suitably regular functions f.

Backwards Equation: For a function ${f\colon{\mathbb R}^n\times{\mathbb R}_+\rightarrow{\mathbb R}}$, the process ${f(X_t,t)}$ is a local martingale if and only if f solves the partial differential equation (PDE)

 $\displaystyle \frac{\partial f}{\partial t}+Lf=0.$

Consequently, for any time t > 0 and function ${g\colon{\mathbb R}^n\rightarrow{\mathbb R}}$, if we let f be a solution to the PDE above with boundary condition f(x, t) = g(x) then, assuming integrability conditions, the conditional expectations at times s < t are

 $\displaystyle {\mathbb E}[g(X_t)\;\vert\mathcal F_s]=f(X_s,s).$

If the conditions are satisfied, this describes a Markov process and gives its transition probabilities, describing the distribution of X and implying uniqueness in law.
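As a simple check of the backwards equation (a worked example of my own, not part of the general theory), take standard Brownian motion in one dimension, so that σ = 1, b = 0 and Lf is half the second derivative of f in x. For g(x) = x², the function

 $\displaystyle f(x,s)=x^2+(t-s)$

has time derivative -1 and Lf = 1, so it solves the PDE with boundary condition f(x, t) = g(x). The conditional expectation formula then recovers the familiar identity

 $\displaystyle {\mathbb E}[X_t^2\;\vert\mathcal F_s]=X_s^2+(t-s).$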

Forward Equation: Assuming that it is sufficiently smooth, the probability density p(t, x) of Xt satisfies the PDE

 $\displaystyle \frac{\partial p}{\partial t}=L^Tp,$

where ${L^T}$ is the transpose of the operator L,

 $\displaystyle L^Tp=\frac12(a_{ij}p)_{,ij}-(b_ip)_{,i}.$

If this PDE has a unique solution for given initial distribution, then this uniquely determines the distribution of Xt. So, if unique solutions to the forward equation exist starting at every future time, it gives uniqueness in law for X.

Martingale problem: Any weak solution to SDE (1) satisfies the property that

 $\displaystyle f(X_t)-\int_0^t Lf(X_s)\,ds$

is a local martingale for twice continuously differentiable functions ${f\colon{\mathbb R}^n\rightarrow{\mathbb R}}$. This approach, which was pioneered by Stroock and Varadhan, has many benefits over the other applications of operator L described above, since it applies much more generally. We do not need to impose any properties on X a priori, such as being Markov, and as the test functions f are chosen at will, they automatically satisfy the necessary regularity properties. As well as being a very general way to describe solutions to a stochastic dynamical system, it turns out to be very fruitful. The striking and far-reaching Stroock–Varadhan uniqueness theorem, in particular, guarantees existence and uniqueness in law so long as a is continuous and positive definite and b is locally bounded.

# Brownian Motion and the Riemann Zeta Function

Intriguingly, various constructions related to Brownian motion result in quantities with moments described by the Riemann zeta function. These distributions appear in integral representations used to extend the zeta function to the entire complex plane, as described in an earlier post. Now, I look at how they also arise from processes constructed from Brownian motion such as Brownian bridges, excursions and meanders.

Recall the definition of the Riemann zeta function as an infinite series

 $\displaystyle \zeta(s)=1+2^{-s}+3^{-s}+4^{-s}+\cdots$

which converges for complex argument s with real part greater than one. This has a unique extension to an analytic function on the complex plane outside of a simple pole at s = 1.

Often, it is more convenient to use the Riemann xi function which can be defined as zeta multiplied by a prefactor involving the gamma function,

 $\displaystyle \xi(s)=\frac12s(s-1)\pi^{-s/2}\Gamma(s/2)\zeta(s).$

This is an entire function on the complex plane satisfying the functional equation ξ(1 - s) = ξ(s).

It turns out that ξ describes the moments of a probability distribution, according to which a random variable X is positive with moments

 $\displaystyle {\mathbb E}[X^s]=2\xi(s),$ (1)

which is well-defined for all complex s. In the post titled The Riemann Zeta Function and Probability Distributions, I denoted this distribution by Ψ, which is a little arbitrary but was the symbol used for its probability density. A related distribution on the positive reals, which we will denote by Φ, is given by the moments

 $\displaystyle {\mathbb E}[X^s]=\frac{1-2^{1-s}}{s-1}2\xi(s)$ (2)

which, again, is defined for all complex s.
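For instance, (1) can be evaluated by hand at s = 2, using the standard values Γ(1) = 1 and ζ(2) = π²/6 (a short check added here for illustration):

 $\displaystyle {\mathbb E}[X^2]=2\xi(2)=2\cdot\frac12\cdot2\cdot1\cdot\pi^{-1}\Gamma(1)\zeta(2)=\frac{2}{\pi}\cdot\frac{\pi^2}{6}=\frac\pi3.$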

As standard, complex powers of a positive real x are defined by ${x^s=e^{s\log x}}$, so (1,2) are equivalent to the moment generating functions of log X, which uniquely determine the distributions. The probability densities and cumulative distribution functions can be given, although I will not do that here since they are already explicitly written out in the earlier post. I will write X ∼ Φ or X ∼ Ψ to mean that random variable X has the respective distribution. As we previously explained, these are closely connected:

• If X ∼ Ψ and, independently, Y is uniform on [1, 2], then X/Y ∼ Φ.
• If X, Y ∼ Φ are independent then ${\sqrt{X^2+Y^2}\sim\Psi}$.
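
The first of these connections can be checked directly from the moments (a short verification added here for illustration). For Y uniform on [1, 2] and s ≠ 1,

 $\displaystyle {\mathbb E}[Y^{-s}]=\int_1^2y^{-s}\,dy=\frac{1-2^{1-s}}{s-1},$

so, by independence, the moment of X/Y of order s is the product of this with the corresponding moment of X. That is, (1) multiplied by the prefactor appearing in (2).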

The purpose of this post is to describe some constructions involving Brownian bridges, excursions and meanders which naturally involve the Φ and Ψ distributions.

Theorem 1 The following have distribution Φ:

1. ${\sqrt{2/\pi}\,Z}$ where ${Z=\sup_t\lvert B_t\rvert}$ is the absolute maximum of a standard Brownian bridge B.
2. ${Z/\sqrt{2\pi}}$ where ${Z=\sup_tB_t}$ is the maximum of a Brownian meander B.
3. ${\sqrt{2\pi}\,Z}$ where Z is the sample standard deviation of a Brownian bridge B,

 $\displaystyle Z=\left(\int_0^1(B_t-\bar B)^2\,dt\right)^{\frac12}$

with sample mean ${\bar B=\int_0^1B_t\,dt}$.

4. ${\sqrt{\pi/2}\,Z}$ where Z is the pathwise Euclidean norm of a 2-dimensional Brownian bridge ${B=(B^1,B^2)}$,

 $\displaystyle Z=\left(\int_0^1\lVert B_t\rVert^2\,dt\right)^{\frac12}$

5. ${\sqrt{\tau\pi/2}}$ where ${\tau=\inf\{t\ge0\colon\lVert B_t\rVert=1\}}$ is the first time at which the norm of a 3-dimensional standard Brownian motion ${B=(B^1,B^2,B^3)}$ hits 1.

The Kolmogorov distribution is, by definition, the distribution of the absolute maximum of a Brownian bridge. So, the first statement of theorem 1 is saying that Φ is just the Kolmogorov distribution scaled by the constant factor ${\sqrt{2/\pi}}$. Moving on to Ψ:

Theorem 2 The following have distribution Ψ:

1. ${\sqrt{2/\pi}\,Z}$ where ${Z=\sup_tB_t-\inf_tB_t}$ is the range of a standard Brownian bridge B.
2. ${\sqrt{2/\pi}\,Z}$ where ${Z=\sup_tB_t}$ is the maximum of a (normalized) Brownian excursion B.
3. ${\sqrt{\pi/2}\,Z}$ where Z is the pathwise Euclidean norm of a 4-dimensional Brownian bridge ${B=(B^1,B^2,B^3,B^4)}$,

 $\displaystyle Z=\left(\int_0^1\lVert B_t\rVert^2\,dt\right)^{\frac12}.$

# The Minimum and Maximum of Brownian motion

If X is standard Brownian motion, what is the distribution of its absolute maximum ${\lvert X\rvert_t^*=\sup_{s\le t}\lvert X_s\rvert}$ over a time interval [0, t]? Previously, I looked at how the reflection principle can be used to determine that the maximum ${X^*_t=\sup_{s\le t}X_s}$ has the same distribution as ${\lvert X_t\rvert}$. This is not the same thing as the maximum of the absolute value though, which is a more difficult quantity to describe. As a first step, ${\lvert X\rvert^*_t}$ is clearly at least as large as ${X^*_t}$, from which it follows that it stochastically dominates ${\lvert X_t\rvert}$.

I would like to go further and precisely describe the distribution of ${\lvert X\rvert^*_t}$. What is the probability that it exceeds a fixed positive level a? For this to occur, the supremum of either X or -X must exceed a. Denoting the minimum and maximum by

 $\displaystyle \begin{aligned} &X_t^m=\inf_{s\le t}X_s,\\ &X_t^M=\sup_{s\le t}X_s, \end{aligned}$

then ${\lvert X\rvert^*_t}$ is the maximum of ${X^M_t}$ and ${-X^m_t}$. I have switched notation a little here, and am using ${X^M}$ to denote what was previously written as ${X^*}$. This is just to use similar notation for both the minimum and maximum. Using inclusion-exclusion, the probability that the absolute maximum is greater than a level a is,

 $\displaystyle \begin{aligned} {\mathbb P}(\lvert X\rvert_t^* > a)={} & {\mathbb P}(X_t^M > a)+{\mathbb P}(X_t^m < -a)\\ & -{\mathbb P}(X_t^M > a{\rm\ and\ }X_t^m < -a). \end{aligned}$

As ${X^M_t}$ has the same distribution as ${\lvert X_t\rvert}$ and, by symmetry, so does ${-X^m_t}$, we obtain

 $\displaystyle {\mathbb P}(\lvert X\rvert_t^* > a)=4{\mathbb P}(X_t > a)-{\mathbb P}(X_t^M > a{\rm\ and\ }X_t^m < -a).$

This hasn’t really answered the question. All we have done is to re-express the probability in terms of both the minimum and maximum being beyond a level. For large values of a it does, however, give a good approximation. The probability of the Brownian motion reaching a large positive value a and then dropping to the large negative value -a will be vanishingly small, so the final term in the identity above can be neglected. This gives an asymptotic approximation as a tends to infinity,

 $\displaystyle \begin{aligned} {\mathbb P}(\lvert X\rvert_t^* > a) &\sim 4{\mathbb P}(X_t > a)\\ &\sim\sqrt{\frac{8t}{\pi a^2}}e^{-\frac{a^2}{2t}}. \end{aligned}$ (1)

The last expression here is just using the fact that Xt is centered Gaussian with variance t and applying a standard approximation for the cumulative normal distribution function.

For small values of a, approximation (1) does not work well at all. We know that the left-hand-side should tend to 1, whereas 4ℙ(Xt > a) will tend to 2, and the final expression diverges. In fact, it can be shown that

 $\displaystyle {\mathbb P}(\lvert X\rvert_t^* < a)\sim\frac{4}{\pi}e^{-\frac{t\pi^2}{8a^2}}$ (2)

as a → 0. I gave a direct proof in this math.stackexchange answer. In this post, I will look at how we can compute joint distributions of the minimum, maximum and terminal value of Brownian motion, from which limits such as (2) will follow.
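
Both asymptotic regimes can be checked numerically. The sketch below is my own illustration and uses the classical alternating series for the probability that Brownian motion stays inside the interval (-a, a) up to time t, which is a standard result that I am assuming here rather than deriving. The function names and the truncation `n_terms` are hypothetical choices.

```python
import math

def prob_abs_max_below(a, t, n_terms=200):
    """P(sup_{s<=t} |X_s| < a) for standard Brownian motion X, using the
    classical alternating series for the exit time from the interval (-a, a)."""
    total = 0.0
    for n in range(n_terms):
        k = 2 * n + 1
        total += (-1) ** n / k * math.exp(-k * k * math.pi**2 * t / (8 * a * a))
    return 4 / math.pi * total

def normal_tail(a, t):
    """P(X_t > a) for X_t centered normal with variance t."""
    return 0.5 * math.erfc(a / math.sqrt(2 * t))

t = 1.0
# Large a: the exceedance probability is close to 4 P(X_t > a), as in (1).
large_a_exact = 1 - prob_abs_max_below(3.0, t)
large_a_approx = 4 * normal_tail(3.0, t)
# Small a: the probability of staying inside is close to the right side of (2).
small_a_exact = prob_abs_max_below(0.5, t)
small_a_approx = 4 / math.pi * math.exp(-t * math.pi**2 / (8 * 0.5**2))
```

The leading term of the series is exactly the right-hand side of (2), which is why the small a approximation is so accurate.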

# The Brownian Drawdown Process

The drawdown of a stochastic process is the amount that it has dropped since it last hit its maximum value so far. For process X with running maximum ${X^*_t=\sup_{s\le t}X_s}$, the drawdown is thus ${X^*_t-X_t}$, which is a nonnegative process. This is as in figure 1 below.

The previous post used the reflection principle to show that the maximum of a Brownian motion has the same distribution as its terminal absolute value. That is, ${X^*_t}$ and ${\lvert X_t\rvert}$ are identically distributed.

For a process X started from zero, its maximum and drawdown can be written as ${X^*_t-X_0}$ and ${X^*_t-X_t}$. Reversing the process in time across the interval [0, t] will exchange these values. So, reversing in time and translating so that it still starts from zero will exchange the maximum value and the drawdown. Specifically, write

 $\displaystyle Y_s = X_{t-s} - X_t$

for time index 0 ≤ s ≤ t. The maximum of Y is equal to the drawdown of X,

 $\displaystyle Y^*_t = X^*_t-X_t.$

If X is standard Brownian motion then so is Y, since the independent normal increments property for Y follows from that of X. As already stated, the maximum ${Y^*_t=X^*_t-X_t}$ has the same distribution as the absolute value ${\lvert Y_t\rvert=\lvert X_t\rvert}$. So, the drawdown has the same distribution as the absolute value at each time.

Lemma 1 If X is standard Brownian motion, then ${X^*_t-X_t}$ has the same distribution as ${\lvert X_t\rvert}$ at each time t ≥ 0.
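
The time-reversal step in the argument above is a pathwise identity, so it can be checked on a single simulated path. This is a small sketch of my own using a discretized Brownian path; the variable names and grid size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_steps, t = 1000, 1.0
# Discretized Brownian path on [0, t], started from X_0 = 0.
increments = rng.normal(0.0, np.sqrt(t / n_steps), size=n_steps)
X = np.concatenate([[0.0], np.cumsum(increments)])

# Time-reversed and translated path Y_s = X_{t-s} - X_t, which also starts at 0.
Y = X[::-1] - X[-1]

drawdown_X = X.max() - X[-1]   # terminal drawdown of X
maximum_Y = Y.max()            # terminal running maximum of Y
```

The two quantities `drawdown_X` and `maximum_Y` agree on every path, which is the identity used to transfer the reflection principle from the maximum to the drawdown.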

# Brownian Meanders

Having previously looked at Brownian bridges and excursions, I now turn to a third kind of process which can be constructed either as a conditioned Brownian motion or by extracting a segment from Brownian motion sample paths. Specifically, the Brownian meander, which is a Brownian motion conditioned to be positive over a unit time interval. Since this requires conditioning on a zero probability event, care must be taken. Instead, it is cleaner to start with an alternative definition by appropriately scaling a segment of a Brownian motion.

For a fixed positive time T, consider the last time σ before T at which a Brownian motion X is equal to zero,

 $\displaystyle \sigma=\sup\left\{t\le T\colon X_t=0\right\}.$ (1)

On interval [σ, T], the path of X will start from 0 and then be either strictly positive or strictly negative, and we may as well restrict to the positive case by taking absolute values. Scaling invariance says that ${c^{-1/2}X_{ct}}$ is itself a standard Brownian motion for any positive constant c. So, scaling the path of X on [σ, T] to the unit interval defines a process

 $\displaystyle B_t=(T-\sigma)^{-1/2}\lvert X_{\sigma+t(T-\sigma)}\rvert.$ (2)

over 0 ≤ t ≤ 1. This starts from zero and is strictly positive at all other times.
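
The construction (1,2) can be approximated from a simulated path. The following is only a sketch under my own assumptions: the helper name `meander_from_path` is hypothetical, the last zero σ is located by linear interpolation between grid points, and the path is assumed not to end exactly at zero.

```python
import numpy as np

def meander_from_path(x, T, n_out=200):
    """Approximate the process B of (2) from samples x of X on a uniform grid
    over [0, T] with x[0] = 0. Hypothetical helper: the last zero is located
    by linear interpolation, and x[-1] is assumed to be nonzero."""
    times = np.linspace(0.0, T, len(x))
    signs = np.sign(x)
    # Indices k where the path touches or crosses zero between steps k and k+1.
    crossings = np.nonzero(signs[:-1] * signs[1:] <= 0)[0]
    k = crossings[-1]   # nonempty, since x[0] = 0
    # Linearly interpolated location of the last zero, playing the role of sigma.
    frac = x[k] / (x[k] - x[k + 1])
    sigma = times[k] + frac * (times[k + 1] - times[k])
    # Rescale the segment [sigma, T] to the unit interval, as in (2).
    u = np.linspace(0.0, 1.0, n_out + 1)
    segment = np.interp(sigma + u * (T - sigma), times, x)
    return np.abs(segment) / np.sqrt(T - sigma)

rng = np.random.default_rng(2)
T, n_steps = 1.0, 10_000
x = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(T / n_steps), n_steps))])
B = meander_from_path(x, T)
```

The output starts at (approximately) zero and is strictly positive afterwards, matching the description of B, although the discretization means that it is only an approximation to a true meander sample path.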

The only remaining ambiguity is in the choice of the fixed time T. Scaling invariance shows that the law of the process B does not depend on this choice.

Lemma 1 The distribution of B defined by (2) does not depend on the choice of the time T > 0.

Proof: Consider any other fixed positive time ${\tilde T}$, and use the construction above with ${\tilde T,\tilde\sigma,\tilde B}$ in place of T, σ, B respectively. We need to show that ${\tilde B}$ and B have the same distribution. Using the scaling factor ${S=\tilde T/T}$, then ${X^\prime_t=S^{-1/2}X_{tS}}$ is a standard Brownian motion. Also, ${\sigma^\prime=\tilde\sigma/S}$ is the last time before T at which X′ is zero. So,

 $\displaystyle \tilde B_t=(T-\sigma')^{-1/2}\lvert X'_{\sigma'+t(T-\sigma')}\rvert$

has the same distribution as B. ⬜

This leads to the definition used here for Brownian meanders.

Definition 2 A continuous process ${\{B_t\}_{t\in[0,1]}}$ is a Brownian meander if and only if it has the same distribution as (2) for a standard Brownian motion X and fixed time T > 0.

In fact, there are various alternative, but equivalent, ways in which Brownian meanders can be defined and constructed.

• As a scaled segment of a Brownian motion before a time T and after it last hits 0. This is definition 2.
• As a Brownian motion conditioned on being positive. See theorem 4 below.
• As a segment of a Brownian excursion. See lemma 5.
• As the path of a standard Brownian motion starting from its minimum, in either the forwards or backwards direction. See theorem 6.
• As a Markov process with specified transition probabilities. See theorem 9 below.
• As a solution to an SDE. See theorem 12 below.

# Brownian Excursions

A normalized Brownian excursion is a nonnegative real-valued process with time ranging over the unit interval, and is equal to zero at the start and end time points. It can be constructed from a standard Brownian motion by conditioning on being nonnegative and equal to zero at the end time. We do have to be careful with this definition, since it involves conditioning on a zero probability event. Alternatively, as the name suggests, Brownian excursions can be understood as the excursions of a Brownian motion X away from zero. By continuity, the set of times at which X is nonzero will be open and, hence, can be written as the union of a collection of disjoint (and stochastic) intervals (σ, τ).

In fact, Brownian motion can be reconstructed by simply joining all of its excursions back together. These are independent processes and identically distributed up to scaling. Because of this, understanding the Brownian excursion process can be very useful in the study of Brownian motion. However, there will be infinitely many excursions over finite time periods, so the procedure of joining them together requires some work. This falls under the umbrella of ‘excursion theory’, which is outside the scope of the current post. Here, I will concentrate on the properties of individual excursions.

In order to select a single interval, start by fixing a time T > 0. As XT is almost surely nonzero, T will be contained inside one such interval (σ, τ). Explicitly,

 $\displaystyle \begin{aligned} &\sigma=\sup\left\{t\le T\colon X_t=0\right\},\\ &\tau=\inf\left\{t\ge T\colon X_t=0\right\}, \end{aligned}$ (1)

so that σ < T < τ < ∞ almost surely. The path of X across such an interval is ${t\mapsto X_{\sigma+t}}$ for time t in the range [0, τ - σ]. As it can be either nonnegative or nonpositive, we restrict to the nonnegative case by taking the absolute value. By scaling invariance, ${S^{-1/2}X_{tS}}$ is also a standard Brownian motion, for each fixed S > 0. Using a stochastic factor S = τ - σ, the width of the excursion is normalised to obtain a continuous process ${\{B_t\}_{t\in[0,1]}}$ given by

 $\displaystyle B_t=(\tau-\sigma)^{-1/2}\lvert X_{\sigma+t(\tau-\sigma)}\rvert.$ (2)

By construction, this is strictly positive over 0 < t < 1 and equal to zero at the endpoints t ∈ {0, 1}.

The only remaining ambiguity is in the choice of the fixed time T.

Lemma 1 The distribution of B defined by (2) does not depend on the choice of the time T > 0.

Proof: This follows from scaling invariance of Brownian motion. Consider any other fixed positive time ${\tilde T}$, and use the construction above with ${\tilde T,\tilde\sigma,\tilde\tau,\tilde B}$ in place of T, σ, τ, B respectively. We need to show that ${\tilde B}$ and B have the same distribution. Using the scaling factor ${S=\tilde T/T}$, then ${X^\prime_t=S^{-1/2}X_{tS}}$ is a standard Brownian motion. Also, ${\sigma^\prime=\tilde\sigma/S}$ and ${\tau^\prime=\tilde\tau/S}$ are random times given in the same way as σ and τ, but with the Brownian motion X′ in place of X in (1). So,

 $\displaystyle \tilde B_t=(\tau^\prime-\sigma^\prime)^{-1/2}\lvert X^\prime_{\sigma^\prime+t(\tau^\prime-\sigma^\prime)}\rvert$

has the same distribution as B. ⬜

This leads to the definition used here for Brownian excursions.

Definition 2 A continuous process ${\{B_t\}_{t\in[0,1]}}$ is a Brownian excursion if and only if it has the same distribution as (2) for a standard Brownian motion X and time T > 0.

In fact, there are various alternative — but equivalent — ways in which Brownian excursions can be defined and constructed.

• As a normalized excursion away from zero of a Brownian motion. This is definition 2.
• As a normalized excursion away from zero of a Brownian bridge. This is theorem 6.
• As a Brownian bridge conditioned on being nonnegative. See theorem 9 below.
• As the sample path of a Brownian bridge, translated so that it has minimum value zero at time 0. This is a very interesting and useful method of directly computing excursion sample paths from those of a Brownian bridge. See theorem 12 below, sometimes known as the Vervaat transform.
• As a Markov process with specified transition probabilities. See theorem 15 below.
• As a transformation of Bessel process paths, see theorem 16 below.
• As a Bessel bridge of order 3. This can be represented either as a Bessel process conditioned on hitting zero at time 1, or as the vector norm of a 3-dimensional Brownian bridge. See lemma 17 below.
• As a solution to a stochastic differential equation. See theorem 18 below.

# Brownian Bridge Fourier Expansions

Brownian bridges were described in a previous post, along with various different methods by which they can be constructed. Since a Brownian bridge on an interval ${[0,T]}$ is continuous and equal to zero at both endpoints, we can consider extending to the entire real line by partitioning the real numbers into intervals of length T and replicating the path of the process across each of these. This will result in continuous and periodic sample paths, suggesting another method of representing Brownian bridges. That is, by Fourier expansion. As we will see, the Fourier coefficients turn out to be independent normal random variables, giving a useful alternative method of constructing a Brownian bridge.

There are actually a couple of distinct Fourier expansions that can be used, depending on precisely how we consider extending the sample paths to the real line. A particularly simple result is given by the sine series, which I describe first. This is shown for an example Brownian bridge sample path in figure 1 above, which plots the sequence of approximations formed by truncating the series after a small number of terms. This tends uniformly to the sample path, although it is quite slow to converge as should be expected when approximating such a rough path by smooth functions. Also plotted is the series after the first 100 terms, by which time the approximation is quite close to the target. For simplicity, I only consider standard Brownian bridges, which are defined on the unit interval ${[0,1]}$. This does not reduce the generality, since bridges on an interval ${[0,T]}$ can be expressed as scaled versions of standard Brownian bridges.

Theorem 1 A standard Brownian bridge B can be decomposed as

 $\displaystyle B_t=\sum_{n=1}^\infty\frac{\sqrt2Z_n}{\pi n}\sin(\pi nt)$ (1)

over ${0\le t\le1}$, where ${Z_1,Z_2,\ldots}$ is an IID sequence of standard normals. This series converges uniformly in t, both with probability one and in the ${L^p}$ norm for all ${1\le p < \infty}$.
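
As a hedged illustration of how (1) can be used for simulation, the sketch below samples a truncated sine series and checks the variance of the partial sums against t(1 - t), the variance of a standard Brownian bridge at time t. The function names and the truncation levels are my own assumptions.

```python
import numpy as np

def bridge_sine_series(t, n_terms, rng):
    """Sample the partial sum of (1) at times t, using n_terms IID standard
    normal coefficients Z_1, ..., Z_n drawn from the generator rng."""
    n = np.arange(1, n_terms + 1)
    Z = rng.standard_normal(n_terms)
    coeffs = np.sqrt(2.0) * Z / (np.pi * n)
    # Rows index the times t, columns index the frequencies n.
    return np.sin(np.pi * np.outer(t, n)) @ coeffs

def truncated_variance(t, n_terms):
    """Variance of the partial sum at a single time t: sum of 2 sin^2 / (pi n)^2."""
    n = np.arange(1, n_terms + 1)
    return np.sum(2.0 * np.sin(np.pi * n * t) ** 2 / (np.pi * n) ** 2)

rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 101)
path = bridge_sine_series(t, 100, rng)
var_at_03 = truncated_variance(0.3, 20_000)   # compare with 0.3 * 0.7 = 0.21
```

The slow decay of the coefficients, of order 1/n, is what makes the convergence of the truncated series rather slow.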

# Brownian Bridges

A Brownian bridge can be defined as standard Brownian motion conditioned on hitting zero at a fixed future time T, or as any continuous process with the same distribution as this. Rather than conditioning, a slightly easier approach is to subtract a linear term from the Brownian motion, chosen such that the resulting process hits zero at the time T. This is equivalent, but has the added benefit of being independent of the original Brownian motion at all later times.

Lemma 1 Let X be a standard Brownian motion and ${T > 0}$ be a fixed time. Then, the process

 $\displaystyle B_t = X_t - \frac tTX_T$ (1)

over ${0\le t\le T}$ is independent from ${\{X_t\}_{t\ge T}}$.

Proof: As the processes are joint normal, it is sufficient that there is zero covariance between them. So, for times ${s\le T\le t}$, we just need to show that ${{\mathbb E}[B_sX_t]}$ is zero. Using the covariance structure ${{\mathbb E}[X_sX_t]=s\wedge t}$ we obtain,

 $\displaystyle {\mathbb E}[B_sX_t]={\mathbb E}[X_sX_t]-\frac sT{\mathbb E}[X_TX_t]=s-\frac sTT=0$

as required. ⬜

This leads us to the definition of a Brownian bridge.

Definition 2 A continuous process ${\{B_t\}_{t\in[0,T]}}$ is a Brownian bridge on the interval ${[0,T]}$ if and only if it has the same distribution as ${X_t-\frac tTX_T}$ for a standard Brownian motion X.

In the case that ${T=1}$, B is called a standard Brownian bridge.
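
Definition 2 translates directly into a simulation recipe. The following minimal sketch, with a hypothetical helper name `brownian_bridge` and a uniform time grid of my own choosing, builds a discretized Brownian motion and subtracts the linear term.

```python
import numpy as np

def brownian_bridge(T, n_steps, rng):
    """Sample B_t = X_t - (t / T) X_T on a uniform grid over [0, T], where X is
    a discretized standard Brownian motion built from normal increments."""
    t = np.linspace(0.0, T, n_steps + 1)
    dX = rng.normal(0.0, np.sqrt(T / n_steps), size=n_steps)
    X = np.concatenate([[0.0], np.cumsum(dX)])
    return t, X - (t / T) * X[-1]

rng = np.random.default_rng(4)
t, B = brownian_bridge(2.0, 500, rng)
```

By construction, the sampled path is equal to zero at both endpoints of the interval.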

There are actually many different ways in which Brownian bridges can be defined, which all lead to the same result.

• As a Brownian motion minus a linear term so that it hits zero at T. This is definition 2.
• As a Brownian motion X scaled as ${tT^{-1/2}X_{T/t-1}}$. See lemma 9 below.
• As a joint normal process with prescribed covariances. See lemma 7 below.
• As a Brownian motion conditioned on hitting zero at T. See lemma 14 below.
• As a Brownian motion restricted to the times before it last hits zero before a fixed positive time T, and rescaled to fit a fixed time interval. See lemma 15 below.
• As a Markov process. See lemma 13 below.
• As a solution to a stochastic differential equation with drift term forcing it to hit zero at T. See lemma 18 below.

There are other constructions beyond these, such as in terms of limits of random walks, although I will not cover those in this post.

# Independence of Normals

A well known fact about joint normally distributed random variables is that they are independent if and only if their covariance is zero. In one direction, this statement is trivial. Any independent pair of random variables has zero covariance (assuming that they are integrable, so that the covariance has a well-defined value). The strength of the statement is in the other direction. Knowing the value of the covariance does not tell us a lot about the joint distribution so, in the case that they are joint normal, the fact that we can determine independence from this is a rather strong statement.

Theorem 1 A joint normal pair of random variables are independent if and only if their covariance is zero.

Proof: Suppose that X,Y are joint normal, such that ${X\overset d= N(\mu_X,\sigma^2_X)}$ and ${Y\overset d=N(\mu_Y,\sigma_Y^2)}$, and that their covariance is c. Then, the characteristic function of ${(X,Y)}$ can be computed as

 $\displaystyle \begin{aligned} {\mathbb E}\left[e^{iaX+ibY}\right] &=e^{ia\mu_X+ib\mu_Y-\frac12(a^2\sigma_X^2+2abc+b^2\sigma_Y^2)}\\ &=e^{-abc}{\mathbb E}\left[e^{iaX}\right]{\mathbb E}\left[e^{ibY}\right] \end{aligned}$

for all ${(a,b)\in{\mathbb R}^2}$. It is standard that the joint characteristic function of a pair of random variables is equal to the product of their characteristic functions if and only if they are independent which, in this case, corresponds to the covariance c being zero. ⬜

To demonstrate necessity of the joint normality condition, consider the example from the previous post.

Example 1 A pair of standard normal random variables X,Y which have zero covariance, but ${X+Y}$ is not normal.

As their sum is not normal, X and Y cannot be independent. This example was constructed by setting ${Y={\rm sgn}(\lvert X\rvert -K)X}$ for some fixed ${K > 0}$, which is standard normal whenever X is. As explained in the previous post, the intermediate value theorem ensures that there is a unique value for K making the covariance ${{\mathbb E}[XY]}$ equal to zero.
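
The value of K in this example can be found numerically. As a sketch of my own (the function names are hypothetical), note that the covariance equals 1 minus twice the expectation of X² restricted to the event that |X| is less than K, and this restricted expectation has a closed form in terms of the error function and the standard normal density. Bisection then locates the root.

```python
import math

def covariance_xy(K):
    """E[XY] for X standard normal and Y = sgn(|X| - K) X, namely
    1 - 2 E[X^2; |X| < K], with E[X^2; |X| < K] = erf(K/sqrt 2) - 2 K phi(K)."""
    phi = math.exp(-K * K / 2) / math.sqrt(2 * math.pi)   # standard normal density
    inner = math.erf(K / math.sqrt(2)) - 2 * K * phi
    return 1 - 2 * inner

def solve_K(lo=0.0, hi=5.0, tol=1e-12):
    """Bisection for the root of covariance_xy, which decreases from 1 towards -1."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if covariance_xy(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

K = solve_K()
```

The root lies a little above 1.5, so Y agrees with X on the tails and with -X in the central region containing half of the second moment.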

# Multivariate Normal Distributions

I looked at normal random variables in an earlier post but, what does it mean for a sequence of real-valued random variables ${X_1,X_2,\ldots,X_n}$ to be jointly normal? We could simply require each of them to be normal, but this says very little about their joint distribution and is not much help in handling expressions involving more than one of the ${X_i}$ at once. In case that the random variables are independent, the following result is a very useful property of the normal distribution. All random variables in this post will be real-valued, except where stated otherwise, and we assume that they are defined with respect to some underlying probability space ${(\Omega,\mathcal F,{\mathbb P})}$.

Lemma 1 Linear combinations of independent normal random variables are again normal.

Proof: More precisely, if ${X_1,\ldots,X_n}$ is a sequence of independent normal random variables and ${a_1,\ldots,a_n}$ are real numbers, then ${Y=a_1X_1+\cdots+a_nX_n}$ is normal. Let us suppose that ${X_k}$ has mean ${\mu_k}$ and variance ${\sigma_k^2}$. Then, the characteristic function of Y can be computed using the independence property and the characteristic functions of the individual normals,

 $\displaystyle \begin{aligned} {\mathbb E}\left[e^{i\lambda Y}\right] &={\mathbb E}\left[\prod_ke^{i\lambda a_k X_k}\right] =\prod_k{\mathbb E}\left[e^{i\lambda a_k X_k}\right]\\ &=\prod_ke^{-\frac12\lambda^2a_k^2\sigma_k^2+i\lambda a_k\mu_k} =e^{-\frac12\lambda^2\sigma^2+i\lambda\mu} \end{aligned}$

where we have set ${\mu=\sum_ka_k\mu_k}$ and ${\sigma^2=\sum_ka_k^2\sigma_k^2}$. This is the characteristic function of a normal random variable with mean ${\mu}$ and variance ${\sigma^2}$. ⬜
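
The parameters appearing in the proof are easy to compute in practice. This small sketch, with a hypothetical helper name of my own, returns the mean and variance of a linear combination of independent normals.

```python
import numpy as np

def combination_parameters(a, mu, sigma):
    """Mean and variance of Y = sum_k a_k X_k for independent X_k ~ N(mu_k, sigma_k^2)."""
    a, mu, sigma = map(np.asarray, (a, mu, sigma))
    return float(a @ mu), float((a**2) @ (sigma**2))

# Y = X_1 - 2 X_2 with X_1 ~ N(0.5, 1) and X_2 ~ N(1, 9).
mean, var = combination_parameters([1.0, -2.0], [0.5, 1.0], [1.0, 3.0])
```

Here the mean is 0.5 - 2 = -1.5 and the variance is 1 + 4 × 9 = 37.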

The definition of joint normal random variables will include the case of independent normals, so that any linear combination is also normal. We use this result as the defining property for the general multivariate normal case.

Definition 2 A collection ${\{X_i\}_{i\in I}}$ of real-valued random variables is multivariate normal (or joint normal) if and only if all of its finite linear combinations are normal.