An Unexpected Quartic Solution

Many years ago, while in high school, I tried my hand at solving general cubic and quartic equations. Although there are entirely systematic approaches using Galois theory, that was not something I was familiar with at the time; I had only heard that it was possible. Here, ‘solving’ means finding an expression for the roots of the polynomial in terms of its coefficients, involving the standard arithmetical operations of addition, subtraction, multiplication and division, as well as extracting square roots, cube roots, etc.

The solution for cubics went very well. In class one day, the teacher wrote a specific example of a quartic on the blackboard and proceeded to solve it by reducing it to two easy quadratics. The reason that his example worked so easily is that its coefficients formed a palindrome. That is, they were the same when written in reverse order. As an example, consider the equation,

\displaystyle  x^4+2x^3-x^2+2x+1=0.

If we divide through by {x^2} then, using the identity {x^2+1/x^2=(x+1/x)^2-2} and rearranging, this gives,

\displaystyle  (x+1/x)^2+2(x+1/x)-3=0.

As a quadratic in {x+1/x}, this is easily solved. One solution is {x+1/x=-3}. Multiplying by x and rearranging gives a new quadratic,

\displaystyle  x^2+3x+1=0.

By the standard formula for quadratics, we obtain

\displaystyle  x=(-3\pm\sqrt{5})/2.

It can be checked that this does give two real solutions to the original quartic.
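
As a quick sanity check, not part of the original working: the two roots can be plugged back into the quartic numerically. The snippet below is a minimal sketch using numpy, and both printed values should be zero up to floating point error.

```python
import numpy as np

# Coefficients of the palindromic quartic x^4 + 2x^3 - x^2 + 2x + 1.
quartic = [1, 2, -1, 2, 1]

# The two real roots obtained above from x^2 + 3x + 1 = 0.
roots = [(-3 + np.sqrt(5)) / 2, (-3 - np.sqrt(5)) / 2]

for x in roots:
    # Evaluate the quartic at each candidate root; both values should be ~0.
    print(x, np.polyval(quartic, x))
```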

Now, the approach that I attempted for the general quartic was to apply a substitution in order to simplify it, so that a similar method could be applied. Unfortunately, this resulted in a very messy equation, which seemed to give a sextic. That is, I went from the original fourth order polynomial to what looked like a sixth order one. This was complicating the problem, taking me further from the goal than where I had started. I am not sure why I did not give up at that point, but I continued. Then, something amazing happened. When I computed the coefficients of the sixth, fifth and fourth powers in this sextic, they all vanished! In fact, I had succeeded in reducing the quartic to a cubic, which can be solved. It still seems surprising that such a messy looking expression should cancel out like this, in just the way that was needed. See equation (2) below for what I am talking about. As this was such a surprise at the time, and is still so now, I have decided to write it up in this post. It just demonstrates that, even if something seems hopeless, if you continue regardless then everything might just fall into place. Continue reading “An Unexpected Quartic Solution”

Quantum Entanglement

Quantum entanglement is one of the most striking differences between the behaviour of the universe described by quantum theory, and that given by classical physics. If two physical systems interact then, even if they later separate, their future evolutions can no longer be considered purely in isolation. Any attempt to describe the systems with classical logic leads inevitably to an apparent link between them, where simply observing one instantaneously impacts the state of the other. This effect remains, regardless of how far apart the systems become.

Figure 1: An EPR-Bohm experiment

As it is a very famous quantum phenomenon, a lot has been written about entanglement in both the scientific and popular literature. However, it still seems to be frequently misunderstood, with many surrounding misconceptions. I will attempt to explain the effects of entanglement in as straightforward a way as possible, with some very basic thought experiments. These can be followed without any understanding of what physical processes may be going on underneath. They only involve pressing a button on a box and observing the colour of a light bulb mounted on it. In fact, this is one of the features of quantum entanglement. It does not matter how you describe the physical world, whether you think of things as particles, waves, or whatever. Entanglement is an observable property, independent of how, or even whether, we try to describe the underlying physical processes. Continue reading “Quantum Entanglement”

What’s in a Name?

That which we call a rose, by any other name would smell as sweet.
You may have noticed, if you pay attention to the address bar, that the domain of this site has changed, and it is pretty sweet! We are now almostsuremath.com, not almostsure.wordpress.com.
It would have been sweeter still to be almostsure.com, but the owner of that domain is not willing to give it up. The new URL contains the name of this site, and is descriptive, so is still pretty good.
It is not just a name change though. Ads have gone away! I dug into my pocket and found the £3 a month to get rid of ads and map my own domain here. Also, switching to the new domain opens up possibilities…such as a self-hosted WordPress site, more customization, MathJax, etc.

Is this a Blog?

Well, according to the tagline, it is a “random mathematical blog”, and is hosted on the popular blogging platform, WordPress.com. According to Wikipedia, a blog is a

discussion or informational website published on the World Wide Web consisting of discrete, often informal diary-style text entries (posts).

Most of the posts on this site are not especially discrete, consist of some quite formal maths, and are not at all diary style. According to wix.com, blogs are

regularly updated websites that provide essential insights into a certain topic.

I like to think that this site provides essential insights into stochastic calculus and probability theory, but how regular is regular? Daily? To qualify, it should probably be weekly, at least. Here, I can go a month or so without updating. It has sometimes been significantly longer between updates. I try to write posts containing proper rigorous mathematics, and to explain things as well as I can. The style that I aim for in many of the posts here is not unlike what you might see in published maths papers or in textbooks. These take some time, and I cannot just rush out a post containing detailed proofs and explanations of mathematical theory. It is not my job, just something that I like to do. I like to work through advanced maths subjects, prove results, and get a good understanding of them. It would be possible to post weekly, but the style would not be the same and the mathematical content would be much reduced and at a shallower level, which is not what I would enjoy doing.

So, no, this is not a blog!

I should probably change the tagline. This is a random mathematical website. I would, however, like to change the style of the site a bit. The idea is to keep posting detailed maths articles, as always, but not on a weekly basis, while also adding some short weekly updates: brief notes on mathematical subjects, or on anything related to this website. I’ll see how it goes…

The Khintchine Inequality

For a Rademacher sequence {X=(X_1,X_2,\ldots)} and square summable sequence of real numbers {a=(a_1,a_2,\ldots)}, the Khintchine inequality provides upper and lower bounds for the moments of the random variable,

\displaystyle  a\cdot X=a_1X_1+a_2X_2+\cdots.

We use {\ell^2} for the space of square summable real sequences and

\displaystyle  \lVert a\rVert_2=\left(a_1^2+a_2^2+\cdots\right)^{1/2}

for the associated Banach norm.

Theorem 1 (Khintchine) For each {0 < p < \infty}, there exist positive constants {c_p,C_p} such that,

\displaystyle  c_p\lVert a\rVert_2^p\le{\mathbb E}\left[\lvert a\cdot X\rvert^p\right]\le C_p\lVert a\rVert_2^p, (1)

for all {a\in\ell^2}.
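
As a rough illustration, my own addition rather than anything from the proof: for a truncated weight sequence, the moments can be estimated by Monte Carlo and compared with {\lVert a\rVert_2^p}. For {p=2} the two sides agree exactly, so {c_2=C_2=1}, while for other p the ratio is the sort of constant controlled by {c_p,C_p}. The particular weights, seed and sample sizes below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# A square summable weight sequence, truncated to finitely many terms.
a = 1.0 / np.arange(1.0, 21.0)
norm2 = np.sqrt(np.sum(a ** 2))

# Monte Carlo samples of a.X for a Rademacher sequence X.
n_samples = 200_000
X = rng.choice([-1.0, 1.0], size=(n_samples, a.size))
S = X @ a

for p in (1.0, 2.0, 4.0):
    # Compare E|a.X|^p with ||a||_2^p; for p = 2 they agree exactly.
    moment = np.mean(np.abs(S) ** p)
    print(f"p = {p}: E|a.X|^p ~ {moment:.4f}, ||a||_2^p = {norm2 ** p:.4f}")
```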

Continue reading “The Khintchine Inequality”

Rademacher Series

The Rademacher distribution is probably the simplest nontrivial probability distribution that you can imagine. This is a discrete distribution taking only the two possible values {\{1,-1\}}, each occurring with equal probability. A random variable X has the Rademacher distribution if

\displaystyle  {\mathbb P}(X=1)={\mathbb P}(X=-1)=1/2.

A Rademacher sequence is an IID sequence of Rademacher random variables,

\displaystyle  X = (X_1,X_2,X_3,\ldots).

Recall that the partial sums {S_N=\sum_{n=1}^NX_n} of a Rademacher sequence form a simple random walk. Generalizing a bit, we can scale by a sequence of real weights {a_1,a_2,\ldots}, so that {S_N=\sum_{n=1}^Na_nX_n}. I will concentrate on infinite sums, as N goes to infinity, which clearly include the finite Rademacher sums as the special case where only finitely many weights are nonzero.
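
To make the setup concrete, here is a small simulation sketch, my own addition, with the arbitrary choice of weights {a_n=1/n}. Since these are square summable, the series converges almost surely, so the partial sums settle down as N grows.

```python
import numpy as np

rng = np.random.default_rng(2)

N = 100_000
weights = 1.0 / np.arange(1, N + 1)  # square summable choice a_n = 1/n

# One sample path of the weighted partial sums S_N = sum_{n<=N} a_n X_n.
X = rng.choice([-1.0, 1.0], size=N)  # Rademacher sequence
S = np.cumsum(weights * X)

# With square summable weights the series converges almost surely,
# so the partial sums stabilize for large N.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"S_{n} = {S[n - 1]:+.4f}")
```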

Rademacher series serve as simple prototypes of more general IID series, but also have applications in various areas. Results include concentration and anti-concentration inequalities, and the Khintchine inequality, which imply various properties of {L^p} spaces and of linear maps between them. For example, in my notes constructing the stochastic integral starting from a minimal set of assumptions, the {L^0} version of the Khintchine inequality was required. Rademacher series are also interesting in their own right, and a source of some very simple statements which are nevertheless quite difficult to prove, some of which are still open problems. See, for example, Some explorations on two conjectures about Rademacher sequences by Hu, Lan and Sun. As I would like to look at some of these problems in the blog, I include this post to outline the basic constructions. One intriguing aspect of Rademacher series is the way that they mix discrete, combinatorial features with continuous distributions. On the one hand, by the central limit theorem, Rademacher series can often be approximated well by a Gaussian distribution but, on the other hand, they depend on the discrete set of signs of the individual variables in the sum. Continue reading “Rademacher Series”

Pathwise Burkholder-Davis-Gundy Inequalities

As covered earlier in my notes, the Burkholder-Davis-Gundy inequality relates the moments of the maximum of a local martingale M to its quadratic variation,

\displaystyle  c_p^{-1}{\mathbb E}[[M]^{p/2}_\tau]\le{\mathbb E}[\bar M_\tau^p]\le C_p{\mathbb E}[[M]^{p/2}_\tau]. (1)

Here, {\bar M_t\equiv\sup_{s\le t}\lvert M_s\rvert} is the running maximum, {[M]} is the quadratic variation, {\tau} is a stopping time, and the exponent {p} is a real number greater than or equal to 1. Then, {c_p} and {C_p} are positive constants depending on p, but independent of the choice of local martingale and stopping time. Furthermore, for continuous local martingales, which are the focus of this post, the inequality holds for all {p > 0}.
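
As a rough Monte Carlo illustration of (1), which is my own sketch and not part of the notes: take M to be a standard Brownian motion, so that {[M]_t=t} and {{\mathbb E}[[M]_t^{p/2}]=t^{p/2}} at deterministic times t. By Brownian scaling, the ratio {{\mathbb E}[\bar M_t^p]/t^{p/2}} should not depend on t, consistent with the two-sided bound. The grid size, seed and sample counts below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

n_paths, n_steps = 20_000, 1_000

def max_moment_ratio(t, p):
    """Estimate E[ sup_{s<=t} |B_s|^p ] / t^(p/2) for Brownian motion by Monte Carlo."""
    dt = t / n_steps
    increments = rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps))
    paths = np.cumsum(increments, axis=1)
    running_max = np.max(np.abs(paths), axis=1)
    return np.mean(running_max ** p) / t ** (p / 2)

for p in (1, 2, 4):
    # The ratio E[bar M_t^p] / E[[M]_t^{p/2}] is roughly constant in t,
    # consistent with the two-sided bound (1).
    print(p, [round(max_moment_ratio(t, p), 3) for t in (0.25, 1.0, 4.0)])
```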

Since the quadratic variation used in my notes, by definition, starts at zero, the BDG inequality also required the local martingale to start at zero. This is not an important restriction, but it can be removed by requiring the quadratic variation to start at {[M]_0=M_0^2}. Henceforth, I will assume that this is the case, which means that if we are working with the definition in my notes then we should add {M_0^2} everywhere to the quadratic variation {[M]}.

In keeping with the theme of the previous post on Doob’s inequalities, such martingale inequalities should have pathwise versions of the form

\displaystyle  c_p^{-1}[M]^{p/2}+\int\alpha dM\le\bar M^p\le C_p[M]^{p/2}+\int\beta dM (2)

for predictable processes {\alpha,\beta}. Inequalities in this form are considerably stronger than (1), since they apply on all sample paths, not just on average. Also, we do not require M to be a local martingale — it is sufficient to be a (continuous) semimartingale. However, in the case where M is a local martingale, the pathwise version (2) does imply the BDG inequality (1), using the fact that stochastic integration preserves the local martingale property.

Lemma 1 Let X and Y be nonnegative increasing measurable processes satisfying {X\le Y-N} for a local (sub)martingale N starting from zero. Then, {{\mathbb E}[X_\tau]\le{\mathbb E}[Y_\tau]} for all stopping times {\tau}.

Proof: Let {\tau_n} be a sequence of bounded stopping times increasing to infinity such that the stopped processes {N^{\tau_n}} are submartingales. Then,

\displaystyle  {\mathbb E}[1_{\{\tau_n\ge\tau\}}X_\tau]\le{\mathbb E}[X_{\tau_n\wedge\tau}]\le{\mathbb E}[Y_{\tau_n\wedge\tau}]-{\mathbb E}[N_{\tau_n\wedge\tau}]\le{\mathbb E}[Y_{\tau_n\wedge\tau}]\le{\mathbb E}[Y_\tau].

Letting n increase to infinity and using monotone convergence on the left hand side gives the result. ⬜

Moving on to the main statements of this post, I will mention that there are actually many different pathwise versions of the BDG inequalities. I opt for the especially simple statements given in Theorem 2 below. See the papers Pathwise Versions of the Burkholder-Davis-Gundy Inequality by Beiglböck and Siorpaes, and Applications of Pathwise Burkholder-Davis-Gundy inequalities by Siorpaes, for slightly different approaches, although these papers do also effectively contain proofs of (3) and (4) for the special case of {r=1/2}. As usual, I am using {x\vee y} to represent the maximum of two numbers.

Theorem 2 Let X and Y be nonnegative continuous processes with {X_0=Y_0}. For any {0 < r\le1} we have,

\displaystyle  (1-r)\bar X^r\le (3-2r)\bar Y^r+r\int(\bar X\vee\bar Y)^{r-1}d(X-Y) (3)

and, if X is increasing, this can be improved to,

\displaystyle  \bar X^r\le (2-r)\bar Y^r+r\int(\bar X\vee\bar Y)^{r-1}d(X-Y). (4)

If {r\ge1} and X is increasing then,

\displaystyle  \bar X^r\le r^{r\vee 2}\,\bar Y^r+r^2\int(\bar X\vee\bar Y)^{r-1}d(X-Y). (5)

Continue reading “Pathwise Burkholder-Davis-Gundy Inequalities”

Pathwise Martingale Inequalities

Recall Doob’s inequalities, covered earlier in these notes, which bound expectations of functions of the maximum of a martingale in terms of its terminal distribution. Although these are often applied to martingales, they hold true more generally for cadlag submartingales. Here, I use {\bar X_t\equiv\sup_{s\le t}X_s} to denote the running maximum of a process.

Theorem 1 Let X be a nonnegative cadlag submartingale. Then,

  • {{\mathbb P}\left(\bar X_t \ge K\right)\le K^{-1}{\mathbb E}[X_t]} for all {K > 0}.
  • {\lVert\bar X_t\rVert_p\le (p/(p-1))\lVert X_t\rVert_p} for all {p > 1}.
  • {{\mathbb E}[\bar X_t]\le(e/(e-1)){\mathbb E}[X_t\log X_t+1]}.

In particular, if X is a cadlag martingale then {\lvert X\rvert} is a submartingale, so theorem 1 applies with {\lvert X\rvert} in place of X.

We also saw the following much stronger (sub)martingale inequality in the post on the maximum maximum of martingales with known terminal distribution.

Theorem 2 Let X be a cadlag submartingale. Then, for any real K and nonnegative real t,

\displaystyle  {\mathbb P}(\bar X_t\ge K)\le\inf_{x < K}\frac{{\mathbb E}[(X_t-x)_+]}{K-x}. (1)

This is particularly sharp, in the sense that for any distribution for {X_t}, there exists a martingale with this terminal distribution for which (1) becomes an equality simultaneously for all values of K. Furthermore, all of the inequalities stated in theorem 1 follow from (1). For example, the first one is obtained by taking {x=0} in (1). The remaining two can also be proved from (1) by integrating over K.
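
For a concrete check, entirely my own and not part of the original post: take X to be a standard Brownian motion, which is a martingale and hence a submartingale, estimate both sides of (1) by Monte Carlo at {t=1} and {K=1}, and approximate the infimum over x on a grid. The time discretization slightly underestimates the running maximum, but the inequality should be visible with a clear margin.

```python
import numpy as np

rng = np.random.default_rng(4)

n_paths, n_steps, t, K = 50_000, 2_000, 1.0, 1.0

# Simulate Brownian paths on [0, t], recording terminal values and running maxima.
dW = rng.normal(scale=np.sqrt(t / n_steps), size=(n_paths, n_steps))
B = np.cumsum(dW, axis=1)
B_T = B[:, -1]
B_max = np.maximum(B.max(axis=1), 0.0)  # running max, including the start value B_0 = 0

lhs = np.mean(B_max >= K)

# Approximate inf_{x < K} E[(B_t - x)_+] / (K - x) over a grid of x values.
xs = np.linspace(-3.0, K - 1e-3, 400)
rhs = min(np.mean(np.clip(B_T - x, 0.0, None)) / (K - x) for x in xs)

print(f"P(max >= {K}) ~ {lhs:.4f}  <=  {rhs:.4f}")
```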

Note that all of the submartingale inequalities above are of the form

\displaystyle  {\mathbb E}[F(\bar X_t)]\le{\mathbb E}[G(X_t)] (2)

for certain choices of functions {F,G\colon{\mathbb R}\rightarrow{\mathbb R}^+}. The aim of this post is to show how they have a more general ‘pathwise’ form,

\displaystyle  F(\bar X_t)\le G(X_t) - \int_0^t\xi\,dX (3)

for some nonnegative predictable process {\xi}. It is relatively straightforward to show that (2) follows from (3) by noting that the integral is a submartingale and, hence, has nonnegative expectation. To be rigorous, there are some integrability considerations to deal with, so a proof will be included later in this post.

Inequality (3) is required to hold almost everywhere, and not just in expectation, so is a considerably stronger statement than the standard martingale inequalities. Furthermore, it is not necessary for X to be a submartingale for (3) to make sense, as it holds for all semimartingales. We can go further, and even drop the requirement that X is a semimartingale. As we will see, in the examples covered in this post, {\xi_t} will be of the form {h(\bar X_{t-})} for an increasing right-continuous function {h\colon{\mathbb R}\rightarrow{\mathbb R}}, so integration by parts can be used,

\displaystyle  \int h(\bar X_-)\,dX = h(\bar X)X-h(\bar X_0)X_0 - \int X\,dh(\bar X). (4)

The right hand side of (4) is well-defined for any cadlag real-valued process, by using the pathwise Lebesgue–Stieltjes integral with respect to the increasing process {h(\bar X)}, so can be used as the definition of {\int h(\bar X_-)dX}. In the case where X is a semimartingale, integration by parts ensures that this agrees with the stochastic integral {\int\xi\,dX}. Since we now have an interpretation of (3) in a pathwise sense for all cadlag processes X, it is no longer required to suppose that X is a submartingale or a semimartingale, or even that there is an underlying probability space. All that is necessary is for {t\mapsto X_t} to be a cadlag real-valued function. Hence, we reduce the martingale inequalities to straightforward results of real analysis which require no probability theory and are, consequently, much more general. I state the precise pathwise generalizations of Doob’s inequalities now, leaving the proof until later in the post; a small numerical illustration of the second inequality is given after the theorem. As the first inequality of theorem 1 is just the special case of (1) with {x=0}, we do not need to include it explicitly here.

Theorem 3 Let X be a cadlag process and t be a nonnegative time.

  1. For real {K > x},
    \displaystyle  1_{\{\bar X_t\ge K\}}\le\frac{(X_t-x)_+}{K-x}-\int_0^t\xi\,dX (5)

    where {\xi=(K-x)^{-1}1_{\{\bar X_-\ge K\}}}.

  2. If X is nonnegative and p,q are positive reals with {p^{-1}+q^{-1}=1} then,
    \displaystyle  \bar X_t^p\le q^p X^p_t-\int_0^t\xi dX (6)

    where {\xi=pq\bar X_-^{p-1}}.

  3. If X is nonnegative then,
    \displaystyle  \bar X_t\le\frac{e}{e-1}\left( X_t \log X_t +1\right)-\int_0^t\xi\,dX (7)

    where {\xi=\frac{e}{e-1}\log(\bar X_-\vee1)}.
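
To see the pathwise nature of these bounds concretely, here is a small check of (6), which is my own sketch. Applying it with {p=q=2} to a nonnegative piecewise constant cadlag path, sampled at its jump times, the integral reduces to the finite sum {\sum_k pq\bar X_{t_{k-1}}^{p-1}(X_{t_k}-X_{t_{k-1}})}, and the inequality then holds path by path, with no martingale or other probabilistic assumption on the values. The random paths used below are an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(5)

p = 2.0
q = p / (p - 1.0)  # conjugate exponent; here q = 2

worst_slack = np.inf
for _ in range(10_000):
    # An arbitrary nonnegative path, viewed as a piecewise constant cadlag process
    # sampled at its jump times; no martingale property is assumed.
    x = np.abs(np.cumsum(rng.normal(size=50)))
    running_max = np.maximum.accumulate(x)

    # Integral term: sum of xi_{t_k} * (X_{t_k} - X_{t_k-}),
    # with xi = p * q * (running max just before t_k)^(p-1).
    xi = p * q * running_max[:-1] ** (p - 1.0)
    integral = np.sum(xi * np.diff(x))

    lhs = running_max[-1] ** p
    rhs = q ** p * x[-1] ** p - integral
    worst_slack = min(worst_slack, rhs - lhs)

# The slack stays nonnegative (up to rounding) on every sampled path.
print("minimum of RHS - LHS over sampled paths:", worst_slack)
```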

Continue reading “Pathwise Martingale Inequalities”

Semimartingale Local Times

Figure 1: Brownian motion B with local time L and auxiliary Brownian motion W

For a stochastic process X taking values in a state space E, its local time at a point {x\in E} is a measure of the time spent at x. For a continuous time stochastic process, we could try to simply compute the Lebesgue measure of the time spent at the level,

\displaystyle  L^x_t=\int_0^t1_{\{X_s=x\}}ds. (1)

For processes which hit the level {x} and stick there for some time, this makes some sense. However, if X is a standard Brownian motion, it will always give zero, so is not helpful. Even though X will hit every real value infinitely often, continuity of the normal distribution gives {{\mathbb P}(X_s=x)=0} at each positive time, so that {L^x_t} defined by (1) will have zero expectation.

Rather than the indicator function of {\{X=x\}} as in (1), an alternative is to use the Dirac delta function,

\displaystyle  L^x_t=\int_0^t\delta(X_s-x)\,ds. (2)

Unfortunately, the Dirac delta is not a true function but a distribution, so (2) is not a well-defined expression. However, if it can be made rigorous, then it does seem to have some of the properties we would want. For example, the expectation {{\mathbb E}[\delta(X_s-x)]} can be interpreted as the probability density of {X_s} evaluated at {x}, which has a positive and finite value, so it should lead to positive and finite local times. Equation (2) still relies on the Lebesgue measure over the time index, so will not behave as we may expect under time changes, and will not make sense for processes without a continuous probability density. A better approach is to integrate with respect to the quadratic variation,

\displaystyle  L^x_t=\int_0^t\delta(X_s-x)d[X]_s (3)

which, for Brownian motion, amounts to the same thing. Although (3) is still not a well-defined expression, since it still involves the Dirac delta, the idea is to come up with a definition which amounts to the same thing in spirit. Important properties that it should satisfy are that it is an adapted, continuous and increasing process with increments supported on the set {\{X=x\}},

\displaystyle  L^x_t=\int_0^t1_{\{X_s=x\}}dL^x_s.
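
To get a feel for what such a definition should produce, here is a rough simulation sketch, which is my own addition: for Brownian motion, the quantity {(2\epsilon)^{-1}\int_0^t1_{\{\lvert X_s-x\rvert<\epsilon\}}ds} approximates the local time at x when {\epsilon} is small. As a sanity check, Lévy’s identity says that {L^0_1} has the same law as {\lvert B_1\rvert}, so the sample mean below should come out close to {\sqrt{2/\pi}\approx0.798}. The grid and parameter choices are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)

n_paths, n_steps, t, eps = 2_000, 20_000, 1.0, 0.02
dt = t / n_steps

estimates = np.empty(n_paths)
for i in range(n_paths):
    # One Brownian path on [0, t], sampled on a fine grid.
    path = np.cumsum(rng.normal(scale=np.sqrt(dt), size=n_steps))
    # (1 / 2eps) times the time spent within eps of the level 0.
    estimates[i] = dt * np.count_nonzero(np.abs(path) < eps) / (2 * eps)

# By Levy's identity, L^0_1 has the same law as |B_1|,
# so the mean should be close to sqrt(2/pi) ~ 0.798.
print("mean approximate local time:", estimates.mean())
print("sqrt(2/pi):", np.sqrt(2 / np.pi))
```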

Local times are a very useful and interesting part of stochastic calculus, and find important applications in excursion theory, stochastic integration and stochastic differential equations. However, I have not covered this subject in my notes, so I do that now. Recalling Ito’s lemma for a function {f(X)} of a semimartingale X, this involves a term of the form {\int f^{\prime\prime}(X)d[X]} and, hence, requires {f} to be twice differentiable. If we were to try to apply the Ito formula to functions which are not twice differentiable, then {f^{\prime\prime}} can be understood in terms of distributions, and delta functions can appear, which brings local times into the picture. In the opposite direction, which I take in this post, we can try to generalise Ito’s formula and invert this to give a meaning to (3). Continue reading “Semimartingale Local Times”

A Process With Hidden Drift

Consider a stochastic process X of the form

\displaystyle  X_t=W_t+\int_0^t\xi_sds, (1)

for a standard Brownian motion W and predictable process {\xi}, defined with respect to a filtered probability space {(\Omega,\mathcal F,\{\mathcal F_t\}_{t\in{\mathbb R}_+},{\mathbb P})}. For this to make sense, we must assume that {\int_0^t\lvert\xi_s\rvert ds} is almost surely finite at all times, and I will suppose that {\mathcal F_\cdot} is the filtration generated by W.

The question is whether the drift {\xi} can be backed out from knowledge of the process X alone. As I will show with an example, this is not possible. In fact, in our example, X will itself be a standard Brownian motion, even though the drift {\xi} is non-trivial (that is, {\int\xi dt} is not almost surely zero). In this case X has exactly the same distribution as W, so cannot be distinguished from the driftless case with {\xi=0} by looking at the distribution of X alone.

On the face of it, this seems rather counter-intuitive. By standard semimartingale decomposition, it is known that we can always decompose

\displaystyle  X=M+A (2)

for a unique continuous local martingale M starting from zero, and unique continuous FV process A. By uniqueness, {M=W} and {A=\int\xi dt}. This allows us to back out the drift {\xi} and, in particular, if the drift is non-trivial then X cannot be a martingale. However, in the semimartingale decomposition, it is required that M is a local martingale with respect to the original filtration {\mathcal F_\cdot}. If we do not know the filtration {\mathcal F_\cdot}, then it might not be possible to construct decomposition (2) from knowledge of X alone. As mentioned above, we will give an example where X is a standard Brownian motion which, in particular, means that it is a martingale under its natural filtration. By the semimartingale decomposition result, it is not possible for X to be an {\mathcal F_\cdot}-martingale. A consequence of this is that the natural filtration of X must be strictly smaller than the natural filtration of W.

The inspiration for this post was a comment by Gabe posing the following question: If we take {\mathbb F} to be the filtration generated by a standard Brownian motion W in {(\Omega,\mathcal F,{\mathbb P})}, and we define {\tilde W_t=W_t+\int_0^t\Theta_udu}, can we find an {\mathbb F}-adapted {\Theta} such that the filtration generated by {\tilde W} is smaller than {\mathbb F}? Our example gives an affirmative answer. Continue reading “A Process With Hidden Drift”