# Multivariate Normal Distributions

I looked at normal random variables in an earlier post, but what does it mean for a sequence of real-valued random variables ${X_1,X_2,\ldots,X_n}$ to be jointly normal? We could simply require each of them to be normal, but this says very little about their joint distribution and is not much help in handling expressions involving more than one of the ${X_i}$ at once. In the case where the random variables are independent, the following result is a very useful property of the normal distribution. All random variables in this post will be real-valued, except where stated otherwise, and we assume that they are defined with respect to some underlying probability space ${(\Omega,\mathcal F,{\mathbb P})}$.

Lemma 1 Linear combinations of independent normal random variables are again normal.

Proof: More precisely, if ${X_1,\ldots,X_n}$ is a sequence of independent normal random variables and ${a_1,\ldots,a_n}$ are real numbers, then ${Y=a_1X_1+\cdots+a_nX_n}$ is normal. Let us suppose that ${X_k}$ has mean ${\mu_k}$ and variance ${\sigma_k^2}$. Then, the characteristic function of Y can be computed using the independence property and the characteristic functions of the individual normals,

 \displaystyle \begin{aligned} {\mathbb E}\left[e^{i\lambda Y}\right] &={\mathbb E}\left[\prod_ke^{i\lambda a_k X_k}\right] =\prod_k{\mathbb E}\left[e^{i\lambda a_k X_k}\right]\\ &=\prod_ke^{-\frac12\lambda^2a_k^2\sigma_k^2+i\lambda a_k\mu_k} =e^{-\frac12\lambda^2\sigma^2+i\lambda\mu} \end{aligned}

where we have set ${\mu=\sum_ka_k\mu_k}$ and ${\sigma^2=\sum_ka_k^2\sigma_k^2}$. This is the characteristic function of a normal random variable with mean ${\mu}$ and variance ${\sigma^2}$. ⬜
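The lemma is easy to illustrate numerically. Below is a minimal Monte Carlo sketch in Python (not part of the proof); the coefficients ${a_k}$ and the parameters ${\mu_k,\sigma_k}$ are arbitrary illustrative choices.

```python
import numpy as np

# Monte Carlo check of Lemma 1: Y = sum_k a_k X_k for independent normals
# X_k ~ N(mu_k, sigma_k^2) should have mean sum_k a_k mu_k and variance
# sum_k a_k^2 sigma_k^2.  All parameter values here are arbitrary.
rng = np.random.default_rng(0)
a = np.array([1.0, -2.0, 0.5])
mu = np.array([0.0, 1.0, -1.0])
sigma = np.array([1.0, 0.5, 2.0])

n = 10**6
X = rng.normal(mu, sigma, size=(n, 3))  # each row: independent X_1, X_2, X_3
Y = X @ a                               # the linear combination

print(Y.mean(), a @ mu)              # sample vs theoretical mean (-2.5)
print(Y.var(), a @ (a * sigma**2))   # sample vs theoretical variance (3.0)
```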

The definition of joint normal random variables will include the case of independent normals, so that any linear combination is also normal. We use this result as the defining property for the general multivariate normal case.

Definition 2 A collection ${\{X_i\}_{i\in I}}$ of real-valued random variables is multivariate normal (or joint normal) if and only if all of its finite linear combinations are normal.

# The Riemann Zeta Function and Probability Distributions

The famous Riemann zeta function was first introduced by Riemann in order to describe the distribution of the prime numbers. It is defined by the infinite sum

 \displaystyle \begin{aligned} \zeta(s) &=1+2^{-s}+3^{-s}+4^{-s}+\cdots\\ &=\sum_{n=1}^\infty n^{-s}, \end{aligned} (1)

which is absolutely convergent for all complex s with real part greater than one. One of the first properties of the zeta function is that, as shown by Riemann, it extends to an analytic function on the entire complex plane, other than a simple pole at ${s=1}$. By the theory of analytic continuation this extension is necessarily unique, so the importance of the result lies in showing that an extension exists. One way of doing this is to find an alternative expression for the zeta function which is well defined everywhere. For example, it can be expressed as an absolutely convergent integral, as performed by Riemann himself in his original 1859 paper on the subject. This leads to an explicit expression for the zeta function, scaled by an analytic prefactor, as the integral of ${x^s}$ multiplied by a function of x over the range ${ x > 0}$. In fact, this can be done in a way such that the function of x is a probability density function, and hence expresses the Riemann zeta function over the entire complex plane in terms of the generating function ${{\mathbb E}[X^s]}$ of a positive random variable X. The probability distributions involved here are not the standard ones taught to students of probability theory, so may be new to many people. Although these distributions are intimately related to the Riemann zeta function they also, intriguingly, turn up in seemingly unrelated contexts involving Brownian motion.
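For real ${s > 1}$, the defining sum (1) can be evaluated directly by truncation, as in the short Python sketch below, which compares a partial sum against the classical value ${\zeta(2)=\pi^2/6}$. It also illustrates why continuation is needed: the series itself diverges for ${{\rm Re}\,s\le1}$, so no amount of summing recovers the extension there.

```python
import math

# Truncation of the sum (1), valid only in the region of absolute
# convergence Re(s) > 1.  For real s, the truncation error is of
# order terms**(1 - s).
def zeta_partial(s, terms=10**6):
    return sum(n**-s for n in range(1, terms + 1))

print(zeta_partial(2.0))    # close to pi^2/6
print(math.pi**2 / 6)
```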

In this post, I derive two probability distributions related to the extension of the Riemann zeta function, and describe some of their properties. I also show how they can be constructed as the sum of a sequence of gamma distributed random variables. For motivation, some examples are given of where they show up in apparently unrelated areas of probability theory, although I do not give proofs of these statements here. For more information, see the 2001 paper Probability laws related to the Jacobi theta and Riemann zeta functions, and Brownian excursions by Biane, Pitman, and Yor. Continue reading “The Riemann Zeta Function and Probability Distributions”

# Manipulating the Normal Distribution

The normal (or Gaussian) distribution is ubiquitous throughout probability theory for various reasons, including the central limit theorem, the fact that it is realistic for many practical applications, and because it satisfies nice properties making it amenable to mathematical manipulation. It is, therefore, one of the first continuous distributions that students encounter at school. As such, it is not something that I have spent much time discussing on this blog, which is usually concerned with more advanced topics. However, there are many nice properties and methods that can be performed with normal distributions, greatly simplifying the manipulation of expressions in which it is involved. While it is usually possible to ignore these, and instead just substitute in the density function and manipulate the resulting integrals, that approach can get very messy. So, I will describe some of the basic results and ideas that I use frequently.

Throughout, I assume the existence of an underlying probability space ${(\Omega,\mathcal F,{\mathbb P})}$. Recall that a real-valued random variable X has the standard normal distribution if it has a probability density function given by,

 $\displaystyle \varphi(x)=\frac1{\sqrt{2\pi}}e^{-\frac{x^2}2}.$

For it to function as a probability density, it is necessary that it integrates to one. While it is not obvious that the normalization factor ${1/\sqrt{2\pi}}$ is the correct value for this to be true, it is the one fact that I state here without proof; Wikipedia lists a couple of proofs for reference. By symmetry, ${-X}$ and ${X}$ have the same distribution, so that they have the same mean and, therefore, ${{\mathbb E}[X]=0}$.

The derivative of the density function satisfies the useful identity

 $\displaystyle \varphi^\prime(x)=-x\varphi(x).$ (1)

This allows us to quickly verify that standard normal variables have unit variance, by an application of integration by parts.

 \displaystyle \begin{aligned} {\mathbb E}[X^2] &=\int x^2\varphi(x)dx\\ &= -\int x\varphi^\prime(x)dx\\ &=\int\varphi(x)dx-[x\varphi(x)]_{-\infty}^\infty=1 \end{aligned}
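Both the normalization and the variance computation are easy to confirm numerically. Here is a small quadrature sketch in Python, truncating the integrals to ${[-10,10]}$, where the neglected tails are of order ${e^{-50}}$.

```python
import numpy as np

# Trapezoidal quadrature of the standard normal density phi and of
# x^2 * phi(x): the first integral is the total mass (= 1), the second
# is the variance E[X^2] (= 1, matching the integration by parts above).
x = np.linspace(-10.0, 10.0, 200001)
phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def trapezoid(f, x):
    return float(np.sum((f[1:] + f[:-1]) * np.diff(x) / 2))

print(trapezoid(phi, x))           # total probability, approx 1
print(trapezoid(x**2 * phi, x))    # the variance, approx 1
```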

Another identity satisfied by the normal density function is,

 $\displaystyle \varphi(x+y)=e^{-xy - \frac{y^2}2}\varphi(x)$ (2)

This enables us to prove the following very useful result. In fact, it is difficult to overstate how helpful this result can be. I make use of it frequently when manipulating expressions involving normal variables, as it significantly simplifies the calculations. It is also easy to remember, and simple to derive if needed.

Theorem 1 Let X be standard normal and ${f\colon{\mathbb R}\rightarrow{\mathbb R}_+}$ be measurable. Then, for all ${\lambda\in{\mathbb R}}$,

 \displaystyle \begin{aligned} {\mathbb E}[e^{\lambda X}f(X)] &={\mathbb E}[e^{\lambda X}]{\mathbb E}[f(X+\lambda)]\\ &=e^{\frac{\lambda^2}{2}}{\mathbb E}[f(X+\lambda)]. \end{aligned} (3)
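A Monte Carlo sketch of identity (3) in Python, with the arbitrary choices ${\lambda=0.7}$ and ${f(x)=x^2}$. For this f the right hand side has the exact value ${e^{\lambda^2/2}(1+\lambda^2)}$, since ${{\mathbb E}[(X+\lambda)^2]=1+\lambda^2}$.

```python
import numpy as np

# Check of E[e^{lam X} f(X)] = e^{lam^2/2} E[f(X + lam)] for f(x) = x^2.
rng = np.random.default_rng(1)
lam = 0.7
X = rng.standard_normal(10**6)

lhs = np.mean(np.exp(lam * X) * X**2)
rhs = np.exp(lam**2 / 2) * np.mean((X + lam)**2)
exact = np.exp(lam**2 / 2) * (1 + lam**2)
print(lhs, rhs, exact)   # all agree up to Monte Carlo error
```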

# Brownian Drawdowns

Here, I apply the theory outlined in the previous post to fully describe the drawdown point process of a standard Brownian motion. In fact, as I will show, the drawdowns can all be constructed from independent copies of a single ‘Brownian excursion’ stochastic process. Recall that we start with a continuous stochastic process X, assumed here to be Brownian motion, and define its running maximum as ${M_t=\sup_{s\le t}X_s}$ and drawdown process ${D_t=M_t-X_t}$. This is as in figure 1 above.

Next, ${D^a}$ was defined to be the drawdown ‘excursion’ over the interval at which the maximum process is equal to the value ${a \ge 0}$. Precisely, if we let ${\tau_a}$ be the first time at which X hits level ${a}$ and ${\tau_{a+}}$ be its right limit ${\tau_{a+}=\lim_{b\downarrow a}\tau_b}$ then,

 $\displaystyle D^a_t=D_{(\tau_a+t)\wedge\tau_{a+}}=a-X_{(\tau_a+t)\wedge\tau_{a+}}.$

Next, a random set S is defined as the collection of all nonzero drawdown excursions indexed by the running maximum,

 $\displaystyle S=\left\{(a,D^a)\colon D^a\not=0\right\}.$

The set of drawdown excursions corresponding to the sample path from figure 1 is shown in figure 2 below.

As described in the post on semimartingale local times, the joint distribution of the drawdown and running maximum ${(D,M)}$, of a Brownian motion, is identical to the distribution of its absolute value and local time at zero, ${(\lvert X\rvert,L^0)}$. Hence, the point process consisting of the drawdown excursions indexed by the running maximum, and the absolute value of the excursions from zero indexed by the local time, both have the same distribution. So, the theory described in this post applies equally to the excursions away from zero of a Brownian motion.

Before going further, let’s recap some of the technical details. The excursions lie in the space E of continuous paths ${z\colon{\mathbb R}_+\rightarrow{\mathbb R}}$, on which we define a canonical process Z by sampling the path at each time t, ${Z_t(z)=z_t}$. This space is given the topology of uniform convergence over finite time intervals (compact open topology), which makes it into a Polish space, and whose Borel sigma-algebra ${\mathcal E}$ is equal to the sigma-algebra generated by ${\{Z_t\}_{t\ge0}}$. As shown in the previous post, the counting measure ${\xi(A)=\#(S\cap A)}$ is a random point process on ${({\mathbb R}_+\times E,\mathcal B({\mathbb R}_+)\otimes\mathcal E)}$. In fact, it is a Poisson point process, so its distribution is fully determined by its intensity measure ${\mu={\mathbb E}\xi}$.

Theorem 1 If X is a standard Brownian motion, then the drawdown point process ${\xi}$ is Poisson with intensity measure ${\mu=\lambda\otimes\nu}$ where,

• ${\lambda}$ is the standard Lebesgue measure on ${{\mathbb R}_+}$.
• ${\nu}$ is a sigma-finite measure on E given by
 $\displaystyle \nu(f) = \lim_{\epsilon\rightarrow0}\epsilon^{-1}{\mathbb E}_\epsilon[f(Z^{\sigma})]$ (1)

for all bounded continuous maps ${f\colon E\rightarrow{\mathbb R}}$ which vanish on paths of length less than L (some ${L > 0}$). The limit is taken over ${\epsilon > 0}$, ${{\mathbb E}_\epsilon}$ denotes expectation under the measure with respect to which Z is a Brownian motion started at ${\epsilon}$, and ${\sigma}$ is the first time at which Z hits 0. This measure satisfies the following properties,

• ${\nu}$-almost everywhere, there exists a time ${T > 0}$ such that ${Z > 0}$ on ${(0,T)}$ and ${Z=0}$ everywhere else.
• for each ${t > 0}$, the distribution of ${Z_t}$ has density
 $\displaystyle p_t(z)=z\sqrt{\frac 2{\pi t^3}}e^{-\frac{z^2}{2t}}$ (2)

over the range ${z > 0}$.

• over ${t > 0}$, ${Z_t}$ is Markov, with transition function of a Brownian motion stopped at zero.
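As a numerical aside, note that ${p_t}$ is a density with respect to the sigma-finite measure ${\nu}$, not a probability density: integrating (2) over ${z > 0}$ gives ${\sqrt{2/(\pi t)}}$, which blows up as ${t\rightarrow0}$ and is interpreted as the ${\nu}$-measure of excursions alive at time t. A quick quadrature check in Python:

```python
import numpy as np

# Integrate the entrance density p_t(z) = z sqrt(2/(pi t^3)) e^{-z^2/(2t)}
# over z > 0 and compare with the closed form sqrt(2/(pi t)).
def p(t, z):
    return z * np.sqrt(2 / (np.pi * t**3)) * np.exp(-z**2 / (2 * t))

t = 1.0
z = np.linspace(0.0, 50.0, 500001)
f = p(t, z)
mass = float(np.sum((f[1:] + f[:-1]) * np.diff(z) / 2))  # trapezoid rule
print(mass, np.sqrt(2 / (np.pi * t)))   # both approx 0.7979 at t = 1
```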

# Drawdown Point Processes

For a continuous real-valued stochastic process ${\{X_t\}_{t\ge0}}$ with running maximum ${M_t=\sup_{s\le t}X_s}$, consider its drawdown. This is just the amount that it has dropped since its maximum so far,

 $\displaystyle D_t=M_t-X_t,$

which is a nonnegative process hitting zero whenever the original process visits its running maximum. By looking at each of the individual intervals over which the drawdown is positive, we can break it down into a collection of finite excursions above zero. Furthermore, the running maximum is constant across each of these intervals, so it is natural to index the excursions by this maximum process. By doing so, we obtain a point process. In many cases, it is even a Poisson point process. I look at the drawdown in this post as an example of a point process which is a bit more interesting than the previous example given of the jumps of a cadlag process. By piecing the drawdown excursions back together, it is possible to reconstruct ${D_t}$ from the point process. At least, this can be done so long as the original process does not monotonically increase over any nontrivial intervals, so that there are no intervals with zero drawdown. As the point process indexes the drawdown by the running maximum, we can also reconstruct X as ${X_t=M_t-D_t}$. The drawdown point process therefore gives an alternative description of our original process.
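The decomposition described above is straightforward to carry out on a discrete sample path. The Python sketch below uses a simulated random walk as a stand-in for X and extracts the excursion intervals; the running maximum is constant across each one, since M only increases when the drawdown is zero.

```python
import numpy as np

# Running maximum, drawdown, and drawdown excursions of a sample path.
rng = np.random.default_rng(2)
X = np.concatenate([[0.0], np.cumsum(0.05 * rng.standard_normal(999))])
M = np.maximum.accumulate(X)   # running maximum M_t
D = M - X                      # drawdown D_t = M_t - X_t >= 0

# Maximal runs where D > 0 are the excursions, indexed by the value
# of M over the run.
pos = D > 0
edges = np.diff(pos.astype(int))
starts = np.flatnonzero(edges == 1) + 1
ends = np.flatnonzero(edges == -1) + 1
excursions = [(M[s], D[s:e]) for s, e in zip(starts, ends)]
print(len(excursions))   # number of completed nonzero excursions
```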

See figure 1 for the drawdown of the bitcoin price valued in US dollars between April and December 2020. As it makes more sense for this example, the drawdown is shown as a percent of the running maximum, rather than in dollars. This is equivalent to the approach taken in this post applied to the logarithm of the price return over the period, so that ${X_t=\log(B_t/B_0)}$. It can be noted that, as the price was mostly increasing, the drawdown consists of a relatively large number of small excursions. If, on the other hand, it had declined, then it would have been dominated by a single large drawdown excursion covering most of the time period.

For simplicity, I will suppose that ${X_0=0}$ and that ${M_t}$ tends to infinity as t goes to infinity. Then, for each ${a\ge0}$, define the random time at which the process first hits level ${a}$,

 $\displaystyle \tau_a=\inf\left\{t\ge 0\colon X_t\ge a\right\}.$

By construction, this is finite, increasing, and left-continuous in ${a}$. Consider, also, the right limits ${\tau_{a+}=\lim_{b\downarrow a}\tau_b}$. Each of the excursions on which the drawdown is positive is equal to one of the intervals ${(\tau_a,\tau_{a+})}$. The excursion is defined as a continuous stochastic process ${\{D^a_t\}_{t\ge0}}$ equal to the drawdown starting at time ${\tau_a}$ and stopped at time ${\tau_{a+}}$,

 $\displaystyle D^a_t=D_{(\tau_a+t)\wedge\tau_{a+}}=a-X_{(\tau_a+t)\wedge\tau_{a+}}.$

This is a continuous nonnegative real-valued process, which starts at zero and is equal to zero at all times after ${\tau_{a+}-\tau_a}$. Note that there are uncountably many values of ${a}$ but the associated excursion will be identically zero except for the countably many at which ${\tau_{a+} > \tau_a}$. We will only be interested in these nonzero excursions.

As usual, we work with respect to an underlying probability space ${(\Omega,\mathcal F,{\mathbb P})}$, so that we have one path of the stochastic process X defined for each ${\omega\in\Omega}$. Associated to this is the collection of drawdown excursions indexed by the running maximum.

 $\displaystyle S=\left\{(a,D^a)\colon a\ge0,\ D^a\not=0\right\}.$

As S is defined for each given sample path, it depends on the choice of ${\omega\in\Omega}$, so is a countable random set. The sample paths of the excursions ${D^a}$ lie in the space of continuous functions ${{\mathbb R}_+\rightarrow{\mathbb R}}$, which I denote by E. For each time ${t\ge0}$, I use ${Z_t}$ to denote the value of the path sampled at time t,

 \displaystyle \begin{aligned} &E=\left\{z\colon {\mathbb R}_+\rightarrow{\mathbb R}{\rm\ is\ continuous}\right\},\\ &Z_t\colon E\rightarrow{\mathbb R},\\ & Z_t(z)=z_t. \end{aligned}

Use ${\mathcal E}$ to denote the sigma-algebra on E generated by the collection of maps ${\{Z_t\colon t\ge0\}}$, so that ${(E,\mathcal E)}$ is the measurable space in which the excursion paths lie. It can be seen that ${\mathcal E}$ is the Borel sigma-algebra generated by the open subsets of E, with respect to the topology of compact convergence. That is, the topology of uniform convergence on finite time intervals. As this is a complete separable metric space, it makes ${(E,\mathcal E)}$ into a standard Borel space.

Lemma 1 The set S defines a simple point process ${\xi}$ on ${{\mathbb R}_+\times E}$,

 $\displaystyle \xi(A)=\#(S\cap A)$

for all ${A\in\mathcal B({\mathbb R}_+)\otimes\mathcal E}$.

From the definition of point processes, this simply means that ${\xi(A)}$ is a measurable random variable for each ${A\in \mathcal B({\mathbb R}_+)\otimes\mathcal E}$ and that there exists a sequence ${A_n\in \mathcal B({\mathbb R}_+)\otimes\mathcal E}$ covering ${{\mathbb R}_+\times E}$ such that ${\xi(A_n)}$ are almost surely finite. The set of drawdowns for the point process corresponding to the bitcoin prices in figure 1 is shown in figure 2 below.

# Criteria for Poisson Point Processes

If S is a finite random set in a standard Borel measurable space ${(E,\mathcal E)}$ satisfying the following two properties,

• if ${A,B\in\mathcal E}$ are disjoint, then the sizes of ${S\cap A}$ and ${S\cap B}$ are independent random variables,
• ${{\mathbb P}(x\in S)=0}$ for each ${x\in E}$,

then it is a Poisson point process. That is, the size of ${S\cap A}$ is a Poisson random variable for each ${A\in\mathcal E}$. This justifies the use of Poisson point processes in many different areas of probability and stochastic calculus, and provides a convenient method of showing that point processes are indeed Poisson. If the theorem applies, so that we have a Poisson point process, then we just need to compute the intensity measure to fully determine its distribution. The result above was mentioned in the previous post, but I give a precise statement and proof here. Continue reading “Criteria for Poisson Point Processes”

# Poisson Point Processes

The Poisson distribution models numbers of events that occur in a specific period of time given that, at each instant, whether an event occurs or not is independent of what happens at all other times. Examples which are sometimes cited as candidates for the Poisson distribution include the number of phone calls handled by a telephone exchange on a given day, the number of decays of a radioactive material, and the number of bombs landing in a given area during the London Blitz of 1940-41. The Poisson process counts events which occur according to such distributions.

More generally, the events under consideration need not just happen at specific times, but also at specific locations in a space E. Here, E can represent an actual geometric space in which the events occur, such as the spatial distribution of bombs dropped during the Blitz shown in figure 1, but can also represent other quantities associated with the events. In this example, E could represent the 2-dimensional map of London, or could include both space and time so that ${E=F\times{\mathbb R}}$ where, now, F represents the 2-dimensional map and E is used to record both time and location of the bombs. A Poisson point process is a random set of points in E, such that the number that lie within any measurable subset is Poisson distributed. The aim of this post is to introduce Poisson point processes together with the mathematical machinery to handle such random sets.

The choice of distribution is not arbitrary. Rather, it is a result of the independence of the number of events in each region of the space which leads to the Poisson distribution, much like the central limit theorem leads to the ubiquity of the normal distribution for continuous random variables and of Brownian motion for continuous stochastic processes. A random finite subset S of a reasonably ‘nice’ (standard Borel) space E is a Poisson point process so long as it satisfies the properties,

• If ${A_1,\ldots,A_n}$ are pairwise-disjoint measurable subsets of E, then the sizes of ${S\cap A_1,\ldots,S\cap A_n}$ are independent.
• Individual points of the space each have zero probability of being in S. That is, ${{\mathbb P}(x\in S)=0}$ for each ${x\in E}$.

The proof of this important result will be given in a later post.
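A minimal simulation sketch in Python, with all parameter values arbitrary: the standard construction of a Poisson point process on the unit square draws a Poisson total count and then scatters that many i.i.d. uniform points. Counts in any measurable region A are then Poisson with mean proportional to the area of A, visible below in the near-equality of the sample mean and variance.

```python
import numpy as np

# Poisson point process on [0,1]^2 with intensity lam * Lebesgue measure.
rng = np.random.default_rng(3)
lam = 10.0
trials = 20000

counts_left = np.empty(trials)
for i in range(trials):
    n = rng.poisson(lam)                # total number of points
    pts = rng.uniform(size=(n, 2))      # i.i.d. uniform locations
    counts_left[i] = np.sum(pts[:, 0] < 0.5)   # points in the left half

# Counts in the left half should be Poisson(lam/2): mean = variance = 5.
print(counts_left.mean(), counts_left.var())
```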

We have come across Poisson point processes previously in my stochastic calculus notes. Specifically, suppose that X is a cadlag ${{\mathbb R}^d}$-valued stochastic process with independent increments, and which is continuous in probability. Then, the set of points ${(t,\Delta X_t)}$ over times t for which the jump ${\Delta X}$ is nonzero gives a Poisson point process on ${{\mathbb R}_+\times{\mathbb R}^d}$. See lemma 4 of the post on processes with independent increments, which corresponds precisely to definition 5 given below. Continue reading “Poisson Point Processes”

# Quantum Coin Tossing

Let me ask the following very simple question. Suppose that I toss a pair of identical coins at the same time, then what is the probability of them both coming up heads? There is no catch here; both coins are fair. There are three possible outcomes: both tails, one head and one tail, and both heads. Assuming that it is completely random so that all outcomes are equally likely, then we could argue that each possibility has a one in three chance of occurring, so that the answer to the question is that the probability is 1/3.

Of course, this is wrong! A fair coin has a probability of 1/2 of showing heads and, by independence, standard probability theory says that we should multiply these together for each coin to get the correct answer of ${\frac12\times\frac12=\frac14}$, which can be verified by experiment. Alternatively, we can note that the outcome of one tail and one head, in reality, consists of two equally likely possibilities. Either the first coin can be a head and the second a tail, or vice-versa. So, there are actually four equally likely possible outcomes, only one of which has both coins showing heads, again giving a probability of 1/4. Continue reading “Quantum Coin Tossing”
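The classical calculation is trivial to confirm by simulation; a quick Python sketch:

```python
import numpy as np

# Two independent fair coins: both heads with probability 1/4, not 1/3.
rng = np.random.default_rng(4)
n = 10**6
coin1 = rng.integers(0, 2, n)   # 1 = heads, 0 = tails
coin2 = rng.integers(0, 2, n)
both_heads = np.mean((coin1 == 1) & (coin2 == 1))
print(both_heads)   # approx 0.25
```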

# Local Time Continuity

The local time of a semimartingale at a level x is a continuous increasing process, giving a measure of the amount of time that the process spends at the given level. As the definition involves stochastic integrals, it was only defined up to probability one. This can cause issues if we want to simultaneously consider local times at all levels. As x can be any real number, it can take uncountably many values and, as a union of uncountably many zero probability sets can have positive measure or, even, be unmeasurable, this is not sufficient to determine the entire local time ‘surface’

 $\displaystyle (t,x)\mapsto L^x_t(\omega)$

for almost all ${\omega\in\Omega}$. This is the common issue of choosing good versions of processes. In this case, we already have a continuous version in the time index but, as yet, have not constructed a good version jointly in the time and level. This issue arose in the post on the Ito–Tanaka–Meyer formula, for which we needed to choose a version which is jointly measurable. Although that was sufficient there, joint measurability is still not enough to uniquely determine the full set of local times, up to probability one. The ideal situation is when a version exists which is jointly continuous in both time and level, in which case we should work with this choice. This is always possible for continuous local martingales.

Theorem 1 Let X be a continuous local martingale. Then, the local times

 $\displaystyle (t,x)\mapsto L^x_t$

have a modification which is jointly continuous in x and t. Furthermore, this is almost surely ${\gamma}$-Hölder continuous w.r.t. x, for all ${\gamma < 1/2}$ and over all bounded regions for t.

# The Kolmogorov Continuity Theorem

One of the common themes throughout the theory of continuous-time stochastic processes, is the importance of choosing good versions of processes. Specifying the finite-dimensional distributions of a process is not sufficient to determine its sample paths so, if a continuous modification exists, then it makes sense to work with that. A relatively straightforward criterion ensuring the existence of a continuous version is provided by Kolmogorov’s continuity theorem.

For any positive real number ${\gamma}$, a map ${f\colon E\rightarrow F}$ between metric spaces E and F is said to be ${\gamma}$-Hölder continuous if there exists a positive constant C satisfying

$\displaystyle d(f(x),f(y))\le Cd(x,y)^\gamma$

for all ${x,y\in E}$. Hölder continuous functions are always continuous and, at least on bounded spaces, Hölder continuity becomes a stronger property as the exponent ${\gamma}$ increases. So, if E is a bounded metric space and ${\alpha\le\beta}$, then every ${\beta}$-Hölder continuous map from E is also ${\alpha}$-Hölder continuous. In particular, 1-Hölder and Lipschitz continuity are equivalent.
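A standard example: ${x\mapsto\sqrt x}$ on ${[0,\infty)}$ is ${\frac12}$-Hölder with constant ${C=1}$, since ${(\sqrt x-\sqrt y)^2\le\lvert\sqrt x-\sqrt y\rvert(\sqrt x+\sqrt y)=\lvert x-y\rvert}$, yet it is not Lipschitz near 0. The Python sketch below checks the bound on random pairs.

```python
import numpy as np

# Check |sqrt(x) - sqrt(y)| <= |x - y|^(1/2) on random pairs in [0, 100].
rng = np.random.default_rng(5)
x = rng.uniform(0.0, 100.0, 10**5)
y = rng.uniform(0.0, 100.0, 10**5)
mask = x != y   # avoid 0/0 on (measure zero) coincidences
ratio = (np.abs(np.sqrt(x[mask]) - np.sqrt(y[mask]))
         / np.sqrt(np.abs(x[mask] - y[mask])))
print(ratio.max())   # never exceeds 1, up to rounding
```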

Kolmogorov’s theorem gives simple conditions on the pairwise distributions of a process which guarantee the existence of a continuous modification but, also, states that the sample paths ${t\mapsto X_t}$ are almost surely locally Hölder continuous. That is, they are almost surely Hölder continuous on every bounded interval. To start with, we look at real-valued processes. Throughout this post, we work with respect to a probability space ${(\Omega,\mathcal F, {\mathbb P})}$. There is no need to assume the existence of any filtration, since filtrations play no part in the results here.

Theorem 1 (Kolmogorov) Let ${\{X_t\}_{t\ge0}}$ be a real-valued stochastic process such that there exist positive constants ${\alpha,\beta,C}$ satisfying

$\displaystyle {\mathbb E}\left[\lvert X_t-X_s\rvert^\alpha\right]\le C\lvert t-s\rvert^{1+\beta},$

for all ${s,t\ge0}$. Then, X has a continuous modification which, with probability one, is locally ${\gamma}$-Hölder continuous for all ${0 < \gamma < \beta/\alpha}$.
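For example, Brownian motion has increments ${X_t-X_s\sim N(0,\lvert t-s\rvert)}$, so ${{\mathbb E}[\lvert X_t-X_s\rvert^4]=3\lvert t-s\rvert^2}$ and the theorem applies with ${\alpha=4}$, ${\beta=1}$, giving locally ${\gamma}$-Hölder paths for all ${\gamma < 1/4}$ (taking higher moments pushes the exponent toward the optimal ${1/2}$). A quick Monte Carlo check of the moment bound in Python:

```python
import numpy as np

# Fourth moment of a Brownian increment: E|X_t - X_s|^4 = 3 (t - s)^2,
# since the increment is N(0, t - s) and E[Z^4] = 3 for Z standard normal.
rng = np.random.default_rng(6)
s, t = 0.3, 0.8
inc = rng.normal(0.0, np.sqrt(t - s), 10**6)
print(np.mean(inc**4), 3 * (t - s)**2)   # both approx 0.75
```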