Markov Processes

In these notes, the approach taken to stochastic calculus revolves around stochastic integration and the theory of semimartingales. An alternative starting point would be to consider Markov processes. Although I do not take the second approach, all of the special processes considered in the current section are Markov, so it seems like a good idea to introduce the basic definitions and properties now. In fact, all of the special processes considered (Brownian motion, Poisson processes, Lévy processes, Bessel processes) satisfy the much stronger property of being Feller processes, which I will define in the next post.

Intuitively speaking, a process X is Markov if, given its whole past up until some time s, its future behaviour depends only on its state at time s. To make this precise, let us suppose that X takes values in a measurable space {(E,\mathcal{E})} and, to denote the past, let {\mathcal{F}_t} be the sigma-algebra generated by {\{X_s\colon s\le t\}}. The Markov property then says that, for any times {s\le t} and bounded measurable function {f\colon E\rightarrow{\mathbb R}}, the expected value of {f(X_t)} conditional on {\mathcal{F}_s} is a function of {X_s}. Equivalently,

\displaystyle  {\mathbb E}\left[f(X_t)\mid\mathcal{F}_s\right]={\mathbb E}\left[f(X_t)\mid X_s\right] (1)

(almost surely). More generally, this idea makes sense with respect to any filtered probability space {\mathbb{F}=(\Omega,\mathcal{F},\{\mathcal{F}_t\}_{t\ge 0},{\mathbb P})}. A process X is Markov with respect to {\mathbb{F}} if it is adapted and (1) holds for times {s\le t}.
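To see property (1) in action, here is a minimal simulation sketch for a discrete-time, three-state chain (the transition matrix and test function are invented for the example): conditioning {f(X_2)} on the pair {(X_0,X_1)} gives, up to sampling error, a function of {X_1} alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# A discrete-time, three-state illustration of the Markov property (1).
# The transition matrix P and test function f are invented for the example.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.8, 0.1],
              [0.2, 0.4, 0.4]])
f = np.array([1.0, -2.0, 5.0])

def step(x):
    # Draw the next state from the distribution P(x, .) for each entry of x.
    u = rng.random(len(x))[:, None]
    return (u < P[x].cumsum(axis=1)).argmax(axis=1)

n = 500_000
x0 = rng.integers(0, 3, size=n)   # arbitrary initial states
x1 = step(x0)
x2 = step(x1)

# E[f(X_2) | X_0 = x, X_1 = y] estimated for each pair (x, y): within each
# row printed below, the estimates agree (up to sampling error) with (Pf)(y),
# so they depend on y only, as the Markov property asserts.
for y in range(3):
    est = [f[x2[(x0 == x) & (x1 == y)]].mean() for x in range(3)]
    print(y, np.round(est, 2), "exact:", np.round((P @ f)[y], 2))
```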

Continuous-time Markov processes are usually defined in terms of transition functions. These specify how the distribution of {X_t} is determined by its value at an earlier time s. To state the definition of transition functions, it is necessary to introduce the concept of transition probabilities.

Definition 1 A (transition) kernel N on a measurable space {(E,\mathcal{E})} is a map

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} &\displaystyle N\colon E\times\mathcal{E}\rightarrow{\mathbb R}_+\cup\{\infty\},\smallskip\\ &\displaystyle (x,A)\mapsto N(x,A) \end{array}

such that

  1. for each {x\in E}, the map {A\mapsto N(x,A)} is a measure.
  2. for each {A\in\mathcal{E}}, the map {x\mapsto N(x,A)} is measurable.

If, furthermore, {N(x,E)=1} for all {x\in E}, then N is a transition probability.

A transition probability, then, associates to each {x\in E} a probability measure on {(E,\mathcal{E})}. This can be used to describe how the conditional distribution of a process at a time t depends on its value at an earlier time s,

\displaystyle  {\mathbb P}(X_t\in A\mid\mathcal{F}_s)=N(X_s,A).

Given any such kernel N and {x\in E}, we denote the integral of a measurable function {f\colon E\rightarrow{\mathbb R}_+\cup\{\infty\}} with respect to the measure {N(x,\cdot)} by

\displaystyle  Nf(x)\equiv\int f(y)\,N(x,dy).

Then, {Nf} is itself a measurable function from E to {{\mathbb R}_+\cup\{\infty\}}. For the case of an indicator function {f=1_A} of any {A\in\mathcal{E}}, {Nf(x)=N(x,A)} is measurable by definition. This extends to positive linear combinations of such indicator functions (the simple functions) and, then, by monotone convergence, measurability of {Nf} extends to all nonnegative measurable f. In the case where N is a transition probability, {Nf} is well-defined and bounded for all bounded and measurable {f\colon E\rightarrow{\mathbb R}}.
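When E is a finite set, all of this reduces to linear algebra, which can be worth keeping in mind as the simplest case. A minimal sketch (the matrix entries are invented for the example):

```python
import numpy as np

# Definition 1 when E = {0, 1, 2} is finite: a transition probability is
# simply a row-stochastic matrix, each row being the measure N(x, .).
N = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.8, 0.1],
              [0.2, 0.4, 0.4]])
assert np.allclose(N.sum(axis=1), 1.0)   # N(x, E) = 1 for every x

# The action f -> Nf is then a matrix-vector product:
# Nf(x) = int f(y) N(x, dy) = sum_y N[x, y] f(y).
f = np.array([1.0, -2.0, 5.0])
Nf = N @ f
print(Nf)   # again a bounded function on E, as noted above
```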

Two kernels M and N can be combined by first applying N followed by M to get the kernel MN,

\displaystyle  MNf(x)\equiv\int\!\!\int f(z)\,N(y,dz)M(x,dy).

Suppose that M, N are transition probabilities describing how a process X goes from its state at time s to its conditional distribution at time t, and from its state at time t to a distribution at time u respectively, for {s\le t\le u}. Then, the combination MN describes how it transitions from its state at time s to its distribution at time u. This is just the tower rule for conditional expectation,

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} \displaystyle{\mathbb E}[f(X_u)\mid\mathcal{F}_s]&\displaystyle={\mathbb E}\left[{\mathbb E}[f(X_u)\mid\mathcal{F}_t]\;\middle\vert\;\mathcal{F}_s\right]\smallskip\\ &\displaystyle={\mathbb E}\left[Nf(X_t)\mid\mathcal{F}_s\right]=MNf(X_s). \end{array}
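Continuing the finite-state sketch from above (matrices again invented for the example), the composed kernel MN is just the matrix product, and the tower-rule identity becomes associativity of matrix-vector products:

```python
import numpy as np

M = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4]])
N = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.8, 0.1],
              [0.2, 0.4, 0.4]])
f = np.array([1.0, -2.0, 5.0])

MN = M @ N                                 # the combined kernel
assert np.allclose(MN @ f, M @ (N @ f))    # MNf = M(Nf), the tower rule
assert np.allclose(MN.sum(axis=1), 1.0)    # MN is again a transition probability
```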

A Markov process is defined by a collection of transition probabilities {P_{s,t}}, one for each {s\le t}, describing how it goes from its state at time s to a distribution at time t. I only consider the homogeneous case here, meaning that {P_{s,t}} depends only on the size {t-s} of the time increment and not explicitly on the start or end times s, t, so the notation {P_{s,t}} can be replaced by {P_{t-s}}. This is not much of a restriction because, given an inhomogeneous Markov process X, it is always possible to look at its space-time process {(t,X_t)} taking values in {{\mathbb R}_+\times E}, which will be homogeneous Markov.

Definition 2 A homogeneous transition function on {(E,\mathcal{E})} is a collection {\{P_t\}_{t\ge 0}} of transition probabilities on {(E,\mathcal{E})} such that {P_sP_t=P_{s+t}} for all {s,t\ge 0}.

A process X is Markov with transition function {P=\{P_t\}}, with respect to a filtered probability space {(\Omega,\mathcal{F},\{\mathcal{F}_t\}_{t\ge 0},{\mathbb P})}, if it is adapted and

\displaystyle  {\mathbb E}\left[f(X_t)\mid\mathcal{F}_s\right]=P_{t-s}f(X_s)

(almost surely), for all times {s<t}.

The identity {P_{s+t}=P_sP_t} is known as the Chapman-Kolmogorov equation, and is required so that the transition probabilities are consistent with the tower rule for conditional expectations. Stated another way, {\{P_t\}} forms a semigroup under composition.

As an example, standard Brownian motion, B, has the defining property that {B_t-B_s} is normal with mean 0 and variance {t-s}, independently of {\{B_u\colon u\le s\}}, for times {s<t}. Equivalently, it is a Markov process with the transition function

\displaystyle  P_tf(x)=\frac{1}{\sqrt{2\pi t}}\int e^{-\frac{1}{2t}(y-x)^2}f(y)\,dy.
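The semigroup property can be checked numerically for this transition function. The sketch below applies {P_t} by brute-force quadrature on a fixed grid (the grid, test function, times and tolerance are all arbitrary choices for the illustration) and verifies that {P_sP_tf=P_{s+t}f}.

```python
import numpy as np

def P(t, f):
    """The Brownian transition function P_t applied to f, evaluated by
    quadrature on a fixed grid (an illustration only, not a general tool)."""
    def Ptf(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        y = np.linspace(-20.0, 20.0, 2001)
        dens = np.exp(-(y[None, :] - x[:, None]) ** 2 / (2 * t))
        dens /= np.sqrt(2 * np.pi * t)
        return (dens * f(y)[None, :]).sum(axis=1) * (y[1] - y[0])
    return Ptf

f = np.cos                       # a bounded measurable test function
x = np.linspace(-2.0, 2.0, 5)
s, t = 0.3, 0.7

# Chapman-Kolmogorov: applying P_s after P_t agrees with P_{s+t}.
assert np.allclose(P(s, P(t, f))(x), P(s + t, f)(x), atol=1e-5)
```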

In practice, other than for a small number of special processes, it is not possible to write down transition functions explicitly. Instead, the processes are defined as solutions to stochastic differential equations, or the transition function is defined via an infinitesimal generator.

The distribution of a Markov process is determined uniquely by its transition function and initial distribution.

Lemma 3 Suppose that X is a Markov process on {(E,\mathcal{E})} with transition function P such that {X_0} has distribution {\mu}. Then, for any times {0=t_0<t_1<\cdots<t_n} and bounded measurable function {f\colon E^{n+1}\rightarrow{\mathbb R}},

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} &\displaystyle{\mathbb E}[f(X_{t_0},\ldots,X_{t_n})]=\smallskip\\ &\displaystyle\quad\int\!\!\int\!\cdots\!\int f(x_0,\ldots,x_n)\,P_{t_n-t_{n-1}}(x_{n-1},dx_n)\cdots P_{t_1-t_0}(x_0,dx_1)\mu(dx_0). \end{array} (2)

Proof: Let us start by showing that for {n\ge1},

\displaystyle  {\mathbb E}[f(X_{t_0},\ldots,X_{t_n})\mid\mathcal{F}_{t_{n-1}}] = g(X_{t_0},\ldots,X_{t_{n-1}}) (3)

where {g\colon E^n\rightarrow{\mathbb R}} is the measurable function

\displaystyle  g(x_0,\ldots,x_{n-1})=\int f(x_0,\ldots,x_{n-1},y)\,P_{t_n-t_{n-1}}(x_{n-1},dy).

In the case where {f(x_0,\ldots,x_n)} is a product {u(x_0,\ldots,x_{n-1})v(x_n)}, we have

\displaystyle  g(x_0,\ldots,x_{n-1})=u(x_0,\ldots,x_{n-1})P_{t_n-t_{n-1}}v(x_{n-1})

and (3) follows from the Markov property {{\mathbb E}[v(X_{t_n})\mid\mathcal{F}_{t_{n-1}}]=P_{t_n-t_{n-1}}v(X_{t_{n-1}})}. The extension of (3) to arbitrary bounded measurable {f\colon E^{n+1}\rightarrow{\mathbb R}} is now just a standard application of the monotone class theorem.

The proof of (2) is by induction on n. For the case {n=0}, (2) just reduces to the statement that {X_0} has distribution {\mu}. So, suppose that {n\ge1} and that (2) holds with n replaced by n-1. Then, using (3),

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} &\displaystyle{\mathbb E}[f(X_{t_0},\ldots,X_{t_n})]={\mathbb E}[g(X_{t_0},\ldots,X_{t_{n-1}})]\smallskip\\ =&\displaystyle \int\!\!\int\!\cdots\!\int g(x_0,\ldots,x_{n-1})\,P_{t_{n-1}-t_{n-2}}(x_{n-2},dx_{n-1})\cdots P_{t_1-t_0}(x_0,dx_1)\mu(dx_0). \end{array}

Substituting in the expression for g gives the result. ⬜

The finite distributions of a process X taking values in E are defined to be the distributions, in {E^n}, of {(X_{t_1},\ldots,X_{t_n})} for all finite sets of times {t_1,\ldots,t_n}. Expression (2) describes the finite distributions of a Markov process solely in terms of its initial distribution and transition function.
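Expression (2) is also a recipe for simulation: the finite distributions are sampled by drawing each coordinate in turn from the appropriate transition probability. A minimal sketch for Brownian motion (the sampling times and start point are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_at_times(times, x0):
    """Sample (X_{t_0}, ..., X_{t_n}) at the given times, exactly as in (2):
    start from the initial distribution (here a point mass at x0), then draw
    each X_{t_k} from P_{t_k - t_{k-1}}(X_{t_{k-1}}, .), which for Brownian
    motion is normal with mean X_{t_{k-1}} and variance t_k - t_{k-1}."""
    x = [x0]
    for dt in np.diff(times):
        x.append(rng.normal(loc=x[-1], scale=np.sqrt(dt)))
    return np.array(x)

times = np.array([0.0, 0.5, 1.0, 2.0])    # arbitrary sampling times
print(sample_at_times(times, x0=0.0))     # one draw of (X_0, X_0.5, X_1, X_2)
```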

Corollary 4 Suppose that X and Y are Markov processes, each with the same transition function {\{P_t\}}, and such that {X_0} and {Y_0} have the same distribution {\mu} on {(E,\mathcal{E})}. Then, X and Y have the same finite distributions.

It is important to know that Markov processes do indeed exist for any given transition function and initial distribution. This is a consequence of the Kolmogorov extension theorem.

Theorem 5 Let {(E,\mathcal{E})} be a measurable space, and {\Omega=E^{{\mathbb R}_+}} be the space of functions {{\mathbb R}_+\rightarrow E}. Denote its coordinate process by X,

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} &\displaystyle X_t\colon\Omega\rightarrow E,\smallskip\\ &\displaystyle\omega\mapsto X_t(\omega)=\omega(t). \end{array}

Also, let {\mathcal{F}^0} be the sigma-algebra generated by {\{X_t\colon t\in{\mathbb R}_+\}} and, for each {t\ge 0}, let {\mathcal{F}_t^0} be the sigma-algebra generated by {\{X_s\colon s\le t\}}. So, {\{\mathcal{F}^0_t\}} is a filtration on the measurable space {(\Omega,\mathcal{F}^0)} with respect to which X is adapted.

Then, for every transition function {\{P_t\}} and probability distribution {\mu} on E, there is a unique probability measure {{\mathbb P}} on {(\Omega,\mathcal{F}^0)} under which X is a Markov process with transition function {\{P_t\}} and initial distribution {\mu}.

The superscripts '0' just denote the fact that we are using the uncompleted sigma-algebras. Once the probability measure has been defined, it is standard practice to complete the filtration, which does not affect the Markov property.

Proof: For any finite subset {S\subset{\mathbb R}_+}, let {\mathcal{F}_S} denote the sigma-algebra generated by {\{X_t\colon t\in S\}}. If {S=\{t_0,t_1,\ldots,t_n\}} for times {0=t_0<\cdots<t_n}, let {{\mathbb P}_S} denote the probability measure on {(\Omega,\mathcal{F}_S)} given by (2). Note that, if {S^\prime=S\setminus\{t_k\}} for some {1\le k<n} then the identity

\displaystyle  \int\!\int \cdot\,P_{t_{k+1}-t_k}(x_k,dx_{k+1})P_{t_k-t_{k-1}}(x_{k-1},dx_k)=\int\cdot\,P_{t_{k+1}-t_{k-1}}(x_{k-1},dx_{k+1})

shows that, if {f(x)} is independent of {x_k}, expression (2) for {{\mathbb P}_S} reduces to the definition of {{\mathbb P}_{S^\prime}}. Similarly, if {S^\prime=S\setminus\{t_n\}} then the fact that {P_{t_n-t_{n-1}}(x,E)=1} shows that definition (2) reduces to that of {{\mathbb P}_{S^\prime}} whenever {f(x)} does not depend on {x_n}. In either case, {{\mathbb P}_{S^\prime}} is the restriction of {{\mathbb P}_S} to {\mathcal{F}_{S^\prime}}.

By successively removing elements from S, this shows that {{\mathbb P}_{S^\prime}} is the restriction of {{\mathbb P}_S} to {\mathcal{F}_{S^\prime}} for any {S^\prime\subseteq S} with {0\in S^\prime}. The measures {{\mathbb P}_S} for finite subsets {S\subset{\mathbb R}_+} with {0\in S} are therefore consistent, and the Kolmogorov extension theorem implies the existence of a unique measure {{\mathbb P}} on {(\Omega,\mathcal{F}^0)} agreeing with {{\mathbb P}_S} on {\mathcal{F}_S} for all such finite subsets {S\subset{\mathbb R}_+}.

We have now shown the existence and uniqueness of the probability measure {{\mathbb P}} satisfying (2). Only the Markov property remains. So, consider times {s<t} and a bounded measurable function {v\colon E\rightarrow{\mathbb R}}. Also, let Z be the {\mathcal{F}^0_s}-measurable random variable {u(X_{t_0},\ldots,X_{t_{n-1}})} for some times {0=t_0<\cdots<t_{n-1}=s} and bounded measurable {u\colon E^n\rightarrow{\mathbb R}}, and set {t_n=t}. Then, applying (2) to {f(x_0,\ldots,x_n)=u(x_0,\ldots,x_{n-1})v(x_n)} gives

\displaystyle  \setlength\arraycolsep{2pt} \begin{array}{rl} \displaystyle{\mathbb E}[Zv(X_t)] &\displaystyle=\int u(x_0,\ldots,x_{n-1})v(x_n)P_{t_n-t_{n-1}}(x_{n-1},dx_n)\cdots\mu(dx_0)\smallskip\\ &\displaystyle={\mathbb E}\left[ZP_{t_n-t_{n-1}}v(X_{t_{n-1}})\right]={\mathbb E}\left[ZP_{t-s}v(X_s)\right]. \end{array}

By the monotone class theorem, this extends to all bounded and {\mathcal{F}^0_s}-measurable random variables Z, giving {{\mathbb E}[v(X_t)\mid\mathcal{F}^0_s]=P_{t-s}v(X_s)} as required. ⬜

The unique measure with respect to which X is Markov with the given transition function and initial distribution is denoted by {{\mathbb P}_\mu}, and expectation with respect to this measure is denoted by {{\mathbb E}_\mu[\cdot]}. In particular, if {\mu} is a unit point mass at some {x\in E} then we write {{\mathbb P}_x\equiv{\mathbb P}_\mu} and, similarly, {{\mathbb E}_x[\cdot]} for {{\mathbb E}_\mu[\cdot]}. It can be seen that, if Z is a bounded {\mathcal{F}^0}-measurable random variable, the map {x\mapsto{\mathbb E}_x[Z]} is measurable and

\displaystyle  {\mathbb E}_\mu[Z]=\int{\mathbb E}_x[Z]\,\mu(dx)

for an arbitrary probability measure {\mu} on {(E,\mathcal{E})}. In fact, this follows from expression (2).
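A quick Monte Carlo illustration of this disintegration for Brownian motion, with {Z=\max(X_1,0)} and {\mu} standard normal (all of these choices are invented for the example). Under {{\mathbb P}_x}, {X_1} is normal with mean x and unit variance, giving the closed form {{\mathbb E}_x[Z]=x\Phi(x)+\varphi(x)}.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n = 500_000

# E_x[Z] for Z = max(X_1, 0): under P_x, X_1 ~ N(x, 1), so
# E_x[Z] = x * Phi(x) + phi(x).
E_x = lambda x: x * norm.cdf(x) + norm.pdf(x)

x0 = rng.normal(size=n)                                # X_0 ~ mu = N(0, 1)
lhs = np.maximum(x0 + rng.normal(size=n), 0.0).mean()  # direct estimate of E_mu[Z]
rhs = E_x(x0).mean()                                   # integral of E_x[Z] against mu
print(lhs, rhs)   # the two estimates agree up to Monte Carlo error
```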

It is sometimes useful to generalize Definition 2 to sub-Markovian transition functions. These are collections of kernels {\{P_t\}_{t\ge0}} satisfying the identity {P_sP_t=P_{s+t}} as above but, instead of requiring the {P_t} to be transition probabilities, only the inequality {P_t(x,E)\le1} is imposed. Although such transition functions do not have total mass one, they can be represented by Markov processes which are killed at some random time. To do this, we adjoin a new point {\Delta}, called the cemetery or coffin state, to E to form the enlarged state space {E_\Delta=E\cup\{\Delta\}}. The sigma-algebra {\mathcal{E}_\Delta} consists of the sets {A\subseteq E_\Delta} such that {A\setminus\{\Delta\}} is in {\mathcal{E}}. Then, a new (Markovian) transition function {\{P^\Delta_t\}} can be defined on {(E_\Delta,\mathcal{E}_\Delta)} by

\displaystyle  P_t^\Delta f(x)=\begin{cases} P_tf\vert_E(x)+(1-P_t(x,E))f(\Delta),&\textrm{if }x\in E,\\ f(\Delta),&\textrm{if }x=\Delta. \end{cases} (4)

It can be checked that the identity {P_s^\Delta P_t^\Delta=P_{s+t}^\Delta} is satisfied, and that {P^\Delta_t(x,E_\Delta)=1}. Then, {P^\Delta_t} represents a process which, over an interval {[s,t]}, has probability {P^\Delta_{t-s}(X_s,\{\Delta\})=1-P_{t-s}(X_s,E)} of being killed, in which case it jumps to the state {\Delta} and remains there.
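Construction (4) is particularly transparent on a finite state space. A minimal sketch (the sub-Markovian entries are invented for the example): the missing mass {1-P_t(x,E)} in each row is sent to an absorbing cemetery state.

```python
import numpy as np

# A sub-Markovian kernel on a two-point space: row sums are at most one.
P_sub = np.array([[0.6, 0.2],    # row sums 0.8 and 0.9, so the process is
                  [0.3, 0.6]])   # killed with probability 0.2 or 0.1

kill = 1.0 - P_sub.sum(axis=1)   # mass diverted to the cemetery state
P_delta = np.block([
    [P_sub,            kill[:, None]],
    [np.zeros((1, 2)), np.ones((1, 1))],   # Delta is absorbing
])
assert np.allclose(P_delta.sum(axis=1), 1.0)   # now a transition probability
print(P_delta)
```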
