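A very common technique when looking at general stochastic processes is to break them down into separate martingale and drift terms. This is easiest to describe in the discrete time situation. So, suppose that $\{X_n\}_{n=0,1,\ldots}$ is a stochastic process adapted to the discrete-time filtered probability space $(\Omega,\mathcal{F},\{\mathcal{F}_n\}_{n=0,1,\ldots},\mathbb{P})$. If X is integrable, then it is possible to decompose it into the sum of a martingale M and a process A, starting from zero, and such that $A_n$ is $\mathcal{F}_{n-1}$-measurable for each $n\ge1$. That is, A is a predictable process. The martingale condition on M enforces the identity
$$A_n-A_{n-1}=\mathbb{E}[A_n-A_{n-1}\mid\mathcal{F}_{n-1}]=\mathbb{E}[X_n-X_{n-1}\mid\mathcal{F}_{n-1}].$$
So, A is uniquely defined by
$$A_n=\sum_{k=1}^n\mathbb{E}[X_k-X_{k-1}\mid\mathcal{F}_{k-1}]\qquad(1)$$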
and is referred to as the compensator of X. This is just the predictable term in the Doob decomposition described at the start of the previous post.
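As a small illustration of the discrete-time formula, the following sketch computes the compensator of $X_n=S_n^2$, where S is a simple symmetric random walk started at zero. The walk, the function names and the choice of $f(x)=x^2$ are assumptions made purely for this example. Each conditional expectation is an average over the two equally likely next steps, which gives $A_n=n$.

```python
from itertools import product

def square(x):
    return x * x

def discrete_compensator(steps, f):
    """Return [A_0, ..., A_n] along one path of a simple symmetric random walk S,
    where A_k = sum_{j<=k} E[f(S_j) - f(S_{j-1}) | F_{j-1}], as in the sum formula."""
    s, a = 0, 0.0
    values = [a]
    for step in steps:
        # Given the past, the next step is +1 or -1 with probability 1/2 each.
        a += 0.5 * (f(s + 1) + f(s - 1)) - f(s)
        values.append(a)
        s += step
    return values

def mean_of_martingale_part(n, f):
    """Exact E[X_n - A_n], averaging over all 2^n equally likely paths."""
    total = 0.0
    for steps in product((-1, 1), repeat=n):
        total += f(sum(steps)) - discrete_compensator(steps, f)[-1]
    return total / 2 ** n

print(discrete_compensator((1, 1, -1, 1), square))  # [0.0, 1.0, 2.0, 3.0, 4.0]
print(mean_of_martingale_part(6, square))           # 0.0
```

The second printed value is the expectation of the martingale part $X_n-A_n$, which equals its initial value of zero.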
In continuous time, where we work with respect to a complete filtered probability space $(\Omega,\mathcal{F},\{\mathcal{F}_t\}_{t\ge0},\mathbb{P})$, the situation is much more complicated. There is no simple explicit formula such as (1) for the compensator of a process. Instead, it is defined as follows.
Definition 1 The compensator of a cadlag adapted process X is a predictable FV process A, with $A_0=0$, such that $X-A$ is a local martingale.
For an arbitrary process, there is no guarantee that a compensator exists. From the previous post, however, we know exactly when it does. The processes for which a compensator exists are precisely the special semimartingales or, equivalently, the locally integrable semimartingales. Furthermore, if it exists, then the compensator is uniquely defined up to evanescence. Definition 1 is considerably different from equation (1) describing the discrete-time case. However, we will show that, at least for processes with integrable variation, the continuous-time definition does follow from the limit of discrete time compensators calculated along ever finer partitions (see below).
Although we know that compensators exist for all locally integrable semimartingales, the notion is often defined and used specifically for the case of adapted processes with locally integrable variation or, even, just integrable increasing processes. As with all FV processes, these are semimartingales, with stochastic integration for locally bounded integrands coinciding with Lebesgue-Stieltjes integration along the sample paths. As an example, consider a homogeneous Poisson process X with rate $\lambda$. The compensated Poisson process $M_t=X_t-\lambda t$ is a martingale. So, X has compensator $A_t=\lambda t$.
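The Poisson example can be checked by simulation. The sketch below is only a sanity check, not part of the argument: it simulates the number of jumps of a Poisson process from exponential inter-arrival times and averages the compensated value $X_t-\lambda t$, which should be close to zero. The rate, horizon, sample size and function names are arbitrary choices for illustration.

```python
import random

def poisson_count(rate, horizon, rng):
    """Number of jumps on [0, horizon] of a Poisson process with the given rate,
    built from i.i.d. exponential inter-arrival times."""
    count, t = 0, rng.expovariate(rate)
    while t <= horizon:
        count += 1
        t += rng.expovariate(rate)
    return count

def mean_compensated_value(rate, horizon, n_paths, seed=0):
    """Monte Carlo estimate of E[X_t - rate * t] at t = horizon."""
    rng = random.Random(seed)
    total = sum(poisson_count(rate, horizon, rng) - rate * horizon
                for _ in range(n_paths))
    return total / n_paths

# Should be near zero, up to Monte Carlo error of order sqrt(rate * horizon / n_paths).
print(mean_compensated_value(rate=2.0, horizon=3.0, n_paths=20000))
```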
We start by describing the jumps of the compensator, which can be done simply in terms of the jumps of the original process. Recall that the set of jump times $\{t\colon\Delta X_t\ne0\}$ of a cadlag process is contained in the graphs of a sequence of stopping times, each of which is either predictable or totally inaccessible. We, therefore, only need to calculate $\Delta A_\tau$ separately for the cases where $\tau$ is a predictable stopping time and when it is totally inaccessible.
For the remainder of this post, it is assumed that the underlying filtered probability space is complete. Whenever we refer to the compensator of a process X, it will be understood that X is a special semimartingale. Also, the jump $\Delta X_t$ of a process is defined to be zero at time $t=\infty$.
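Lemma 2 Let A be the compensator of a process X. Then, for a stopping time $\tau$,
- $\Delta A_\tau=0$ if $\tau$ is totally inaccessible.
- $\Delta A_\tau=\mathbb{E}[\Delta X_\tau\mid\mathcal{F}_{\tau-}]$ if $\tau$ is predictable.
Proof: As A is a cadlag predictable process, we have $\Delta A_\tau=0$ if $\tau$ is totally inaccessible. Only the second statement remains to be proven.
Suppose that $\tau$ is predictable. By definition of the compensator, the process $M=X-A$ is a local martingale. Then, as $\Delta A_\tau$ is $\mathcal{F}_{\tau-}$-measurable,
$$\mathbb{E}[\Delta X_\tau\mid\mathcal{F}_{\tau-}]=\Delta A_\tau+\mathbb{E}[\Delta M_\tau\mid\mathcal{F}_{\tau-}].$$
It just needs to be shown that $\mathbb{E}[\Delta M_\tau\mid\mathcal{F}_{\tau-}]$ is almost surely zero. By localising, we can suppose without loss of generality that $\sup_t|M_t|$ is integrable. From the classification of predictable stopping times, we know that $N_t=1_{\{t\ge\tau\}}\Delta M_\tau$ is a local martingale. Integrating any bounded predictable process $\xi$ with respect to N gives a local martingale which, as it is dominated in $L^1$, is a proper martingale. By optional sampling,
$$\mathbb{E}[\xi_\tau\Delta M_\tau]=\mathbb{E}\left[\int_0^\infty\xi\,dN\right]=0.$$
As every bounded $\mathcal{F}_{\tau-}$-measurable random variable can be written in the form $\xi_\tau$ for a predictable process $\xi$, this gives $\mathbb{E}[\Delta M_\tau\mid\mathcal{F}_{\tau-}]=0$ as required.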
In particular, this result shows that the compensator of any continuous special semimartingale is itself continuous. We can go further than this, though, and show that all quasi-left-continuous processes have continuous compensators. Recall that a cadlag process X is quasi-left-continuous if $\Delta X_\tau=0$ (almost-surely) for all predictable stopping times $\tau$. This covers many kinds of processes which are commonly studied, such as Feller processes which, in particular, include all Lévy processes. So, even when studying non-continuous processes, it is often still the case that the compensator is continuous.
Corollary 3 If X is quasi-left-continuous then its compensator is a continuous FV process.
Furthermore, if X is increasing, then it is quasi-left-continuous if and only if its compensator is continuous.
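Proof: If A is the compensator of X, then Lemma 2 implies that $\Delta A_\tau=0$ at any totally inaccessible stopping time $\tau$ and, for predictable times $\tau$,
$$\Delta A_\tau=\mathbb{E}[\Delta X_\tau\mid\mathcal{F}_{\tau-}]=0.$$
The final equality uses the fact that X is quasi-left-continuous, so $\Delta X_\tau=0$.
Conversely, suppose that X is increasing with a continuous compensator. Then,
$$\mathbb{E}[\Delta X_\tau\mid\mathcal{F}_{\tau-}]=\Delta A_\tau=0$$
for any predictable stopping time $\tau$. However, as X is increasing, $\Delta X_\tau$ is nonnegative, so $\Delta X_\tau=0$ almost surely.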
The decomposition of special semimartingales given in the previous post can be modified to obtain a unique decomposition for all semimartingales, whether locally integrable or not. This can be done by subtracting out all jumps which are larger than 1 to obtain a locally bounded semimartingale and applying the decomposition to this to obtain equation (2) below. Lemma 2 implies that the jumps of the martingale term M obtained by doing this are uniformly bounded and, hence, M is a locally bounded martingale. Note that, by combining the final two terms on the right hand side of (2) into a single finite variation term, we decompose the semimartingale X into a locally bounded martingale and an FV process, as stated in the Bichteler-Dellacherie theorem.
Lemma 4 Every semimartingale X decomposes uniquely as
$$X_t=M_t+A_t+\sum_{s\le t}1_{\{|\Delta X_s|>1\}}\Delta X_s\qquad(2)$$
where M is a locally bounded martingale and A is a predictable FV process with $A_0=0$.
Furthermore, $|\Delta A|$ is bounded by 1 and, hence, $|\Delta M|$ is bounded by 2.
Proof: As X is cadlag, it can only have finitely many jumps larger than 1 in any finite interval, so
$$V_t=\sum_{s\le t}1_{\{|\Delta X_s|>1\}}\Delta X_s$$
is a well-defined FV process. So, $Y=X-V$ is a semimartingale with jumps bounded by 1. Therefore, Y is locally bounded and, in particular, is locally integrable. Applying the special semimartingale decomposition $Y=M+A$ gives (2). It still remains to show that M is locally bounded.
Now, for any predictable stopping time $\tau$, Lemma 2 gives
$$|\Delta A_\tau|=\left|\mathbb{E}[\Delta Y_\tau\mid\mathcal{F}_{\tau-}]\right|\le1.$$
So $|\Delta A|\le1$ and $|\Delta M|=|\Delta Y-\Delta A|\le2$ as required. In particular, $\Delta M$ is uniformly bounded, so M is a locally bounded martingale.
Recall that, for a locally bounded integrand $\xi$, stochastic integration preserves the properties of being a local martingale, and also, of being a predictable FV process. Consequently, if a process X has compensator A, then $\int\xi\,dX$ has compensator $\int\xi\,dA$. So, taking compensators commutes with stochastic integration. This can be generalised slightly to non-locally-bounded integrands.
Lemma 5 Suppose that X has compensator A and that $\xi$ is a predictable X-integrable process such that $\int\xi\,dX$ is locally integrable. Then, $\xi$ is A-integrable and $\int\xi\,dX$ has compensator $\int\xi\,dA$.
Proof: This is just a restatement of Theorem 3 of the previous post. We can write $X=M+A$ for a local martingale M. Then, $\xi$ is both M-integrable and A-integrable. Furthermore, $\int\xi\,dA$ is a predictable FV process and $\int\xi\,dM$ is a local martingale.
Similarly, taking the compensator of a process commutes with continuous time-changes. A time-change is defined by a set of finite stopping times $\{\tau_t\colon t\ge0\}$ such that $\tau_s\le\tau_t$ whenever $s\le t$, and we say that this defines a continuous time-change if $t\mapsto\tau_t$ is almost-surely continuous. This can be used to transform the filtration into the time-changed filtration $\tilde{\mathcal{F}}_t=\mathcal{F}_{\tau_t}$. Similarly, if X is any stochastic process, then $\tilde X_t=X_{\tau_t}$ is the time-changed process. We say that $\tilde X$ is a continuous time-change of X. If X is progressively measurable then $\tilde X$ will be $\tilde{\mathcal{F}}_\cdot$-adapted.
Lemma 6 Suppose that X has compensator A and that $\{\tau_t\}_{t\ge0}$ is a continuous time-change. Then, $\tilde A_t=A_{\tau_t}$ is the compensator of the time-changed process $\tilde X_t=X_{\tau_t}$, with respect to the filtration $\tilde{\mathcal{F}}_t=\mathcal{F}_{\tau_t}$.
Proof: By definition, $M=X-A$ is a local martingale so, as previously shown, the time changed process $\tilde M_t=M_{\tau_t}$ is a local martingale with respect to $\tilde{\mathcal{F}}_\cdot$.
This shows that $\tilde X-\tilde A$ is an $\tilde{\mathcal{F}}_\cdot$-local martingale. It remains to show that $\tilde A$ is a predictable FV process. First, as the time change is continuous, $\tilde A$ will be cadlag. Furthermore, the variation of $\tilde A$ over an interval $[0,t]$ is equal to the variation of A over $[\tau_0,\tau_t]$, which is almost surely finite. Also, as previously shown, continuous time changes take predictable processes to predictable processes. Therefore, $\tilde A$ is a predictable FV process.
Processes with Locally Integrable Variation
We now specialise to processes with locally integrable variation. To say that a cadlag adapted process X has locally integrable variation means that there is a sequence of stopping times $\tau_n$ increasing to infinity, and such that the variations $\int_0^{\tau_n}|dX|$ are all integrable. This is equivalent to X being a locally integrable FV process.
Lemma 7 Let X be a cadlag adapted process. Then, the following are equivalent.
- X has locally integrable variation.
- X is a locally integrable FV process.
Proof: If X has locally integrable variation then, in particular, it has locally finite variation and must be an FV process. It only needs to be shown that, for FV processes, the property of being locally integrable is equivalent to having locally integrable variation. However, local integrability of a cadlag adapted process is equivalent to the local integrability of its jumps. Also, the variation process $V_t=\int_0^t|dX|$ has jumps $\Delta V=|\Delta X|$. This gives
$$\begin{aligned}X\text{ is locally integrable}&\iff\Delta X\text{ is locally integrable}\\&\iff\Delta V\text{ is locally integrable}\\&\iff V\text{ is locally integrable.}\end{aligned}$$
In particular, as cadlag predictable processes are locally bounded, this means that compensators automatically have locally integrable variation. However, if X has locally integrable variation, then we can go a bit further and bound the expected variation of its compensator.
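Lemma 8 Let X be a cadlag adapted process with locally integrable variation, and let A be its compensator. Then,
$$\mathbb{E}\left[\int_0^\tau|\xi|\,|dA|\right]\le\mathbb{E}\left[\int_0^\tau|\xi|\,|dX|\right]\qquad(3)$$
for all stopping times $\tau$ and predictable processes $\xi$. Furthermore, if the right hand side of (3) is finite then
$$\mathbb{E}\left[\int_0^\tau\xi\,dA\right]=\mathbb{E}\left[\int_0^\tau\xi\,dX\right].\qquad(4)$$
Proof: By applying monotone convergence to (3) and dominated convergence to (4), it is enough to consider the case where $\xi$ is bounded. Furthermore, replacing $\xi$ by $1_{(0,\tau]}\xi$ if necessary, we only need to consider the case with $\tau=\infty$.
Suppose that $\xi$ is bounded, and let $M=X-A$. Then, $\int\xi\,dM$ is a local martingale. So, there exist stopping times $\tau_n$ increasing to infinity such that the stopped processes $\left(\int\xi\,dM\right)^{\tau_n}$ are uniformly integrable martingales, hence have zero expectation, and such that $X^{\tau_n}$ and $A^{\tau_n}$ have integrable variation. Therefore,
$$\mathbb{E}\left[\int_0^{\tau_n}\xi\,dA\right]=\mathbb{E}\left[\int_0^{\tau_n}\xi\,dX\right].\qquad(5)$$
As A is a predictable FV process, there exists a predictable process $\alpha$ with $|\alpha|=1$ such that $\int\alpha\,dA$ is the variation of A. Replacing $\xi$ by $|\xi|\alpha$ in (5) gives
$$\mathbb{E}\left[\int_0^{\tau_n}|\xi|\,|dA|\right]=\mathbb{E}\left[\int_0^{\tau_n}|\xi|\alpha\,dX\right]\le\mathbb{E}\left[\int_0^\infty|\xi|\,|dX|\right].$$
Letting n increase to infinity and using monotone convergence gives (3).
Now, suppose that the right hand side of (3) is finite. Then, $\int\xi\,dM$ is a local martingale with integrable variation, so is a martingale dominated in $L^1$. Therefore, $\mathbb{E}\left[\int_0^\infty\xi\,dM\right]=0$, giving (4).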
We can go even further and restrict to increasing processes. In that case, compensators are themselves increasing, and we get equality between expectations of stochastic integrals with respect to a process and with respect to its compensator. This is sometimes used for the definition of the compensator.
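Lemma 9 Let X be a cadlag, adapted and locally integrable increasing process. Then, its compensator A is also increasing and,
$$\mathbb{E}\left[\int_0^\tau\xi\,dA\right]=\mathbb{E}\left[\int_0^\tau\xi\,dX\right]\qquad(6)$$
for all stopping times $\tau$ and nonnegative predictable processes $\xi$.
Furthermore, the compensator A of X is the unique right-continuous predictable and increasing process with $A_0=0$ which satisfies (6) for all nonnegative predictable $\xi$ and $\tau=\infty$.
Proof: As with all predictable FV processes, A decomposes into the difference of increasing processes, $A=A^+-A^-$, and there is a predictable set S such that $A^-=-\int1_S\,dA$. Choose stopping times $\tau_n$ increasing to infinity such that $X_{\tau_n}$ are integrable. Then, (4) gives
$$\mathbb{E}[A^-_{\tau_n}]=-\mathbb{E}\left[\int_0^{\tau_n}1_S\,dA\right]=-\mathbb{E}\left[\int_0^{\tau_n}1_S\,dX\right]\le0.$$
So, $A^-_{\tau_n}=0$ almost surely. Therefore, $A^-_\infty=0$ and, as it is increasing from 0, $A^-$ is identically zero. So, $A=A^+$ is increasing.
Now, let $\xi$ be a nonnegative predictable process. By monotone convergence, it is enough to prove (6) in the case where $\xi$ is bounded. Then, there exist stopping times $\tau_n$ increasing to infinity such that $X_{\tau_n}$ are integrable. Equation (4) gives
$$\mathbb{E}\left[\int_0^{\tau\wedge\tau_n}\xi\,dA\right]=\mathbb{E}\left[\int_0^{\tau\wedge\tau_n}\xi\,dX\right].$$
Letting n go to infinity and using monotone convergence gives the result.
We now prove the 'furthermore' part of the lemma. It just needs to be shown that if A is right-continuous, predictable and increasing with $A_0=0$, then identity (6) just for the case with $\tau=\infty$ is enough to guarantee that A is the compensator of X. Equivalently, that $X-A$ is a local martingale.
As X is locally integrable, there exists a sequence of stopping times $\tau_n$ increasing to infinity such that $X_{\tau_n}$ is integrable. Then, for any nonnegative predictable $\xi$,
$$\mathbb{E}\left[\int_0^{\tau_n}\xi\,dA\right]=\mathbb{E}\left[\int_0^\infty1_{(0,\tau_n]}\xi\,dA\right]=\mathbb{E}\left[\int_0^\infty1_{(0,\tau_n]}\xi\,dX\right]=\mathbb{E}\left[\int_0^{\tau_n}\xi\,dX\right].$$
In particular, taking $\xi=1$ shows that $A_{\tau_n}$ is integrable and, hence, that the stopped process $(X-A)^{\tau_n}$ is integrable. Then, letting $\xi$ be any nonnegative elementary integrand gives $\mathbb{E}\left[\int_0^\infty\xi\,d(X-A)^{\tau_n}\right]=0$ and, so, $(X-A)^{\tau_n}$ is a martingale. Therefore, $X-A$ is a local martingale as required.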
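The identity of Lemma 9, that integrals of a nonnegative predictable integrand against X and against its compensator have the same expectation, can be illustrated numerically. The following sketch assumes X is a Poisson process of rate $\lambda$ with compensator $\lambda t$ and takes the predictable integrand $\xi_s=X_{s-}$, for which both expectations equal $(\lambda t)^2/2$. For contrast, it also uses the non-predictable integrand $\xi_s=X_s$, for which the identity fails. All names and parameter values are choices made for this example only.

```python
import random

def poisson_jump_times(rate, horizon, rng):
    """Jump times on [0, horizon] of a Poisson process with the given rate."""
    times, t = [], rng.expovariate(rate)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times

def check_lemma9(rate=2.0, horizon=3.0, n_paths=20000, seed=1):
    """Monte Carlo estimates of E[int X_{s-} dX], E[int X_{s-} dA] and E[int X_s dX]."""
    rng = random.Random(seed)
    int_dX = int_dA = int_dX_nonpred = 0.0
    for _ in range(n_paths):
        jumps = poisson_jump_times(rate, horizon, rng)
        n = len(jumps)
        # xi = X_{s-} is predictable; at the k-th jump it equals k - 1.
        int_dX += n * (n - 1) / 2
        # With A_s = rate * s, a jump at time T contributes rate * (horizon - T)
        # to the integral of X_{s-} dA_s over [0, horizon].
        int_dA += rate * sum(horizon - T for T in jumps)
        # xi = X_s is adapted but not predictable; at the k-th jump it equals k.
        int_dX_nonpred += n * (n + 1) / 2
    return int_dX / n_paths, int_dA / n_paths, int_dX_nonpred / n_paths

# The first two estimates should both be close to (rate * horizon)^2 / 2 = 18,
# while the third is larger by about rate * horizon = 6.
print(check_lemma9())
```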
Approximation by the Discrete-Time Compensator
The definition of the compensator of a continuous-time process given by Definition 1 above does appear to be considerably different from the much simpler case of discrete-time compensators given by equation (1). It is natural to ask whether the continuous-time situation does really arise from the discrete-time formula applied in the limit over small time steps. For processes with integrable variation this is indeed the case, although some care does need to be taken with how we take the limit. The idea is to discretise time using a partition, apply (1) to obtain an approximation to the compensator, then take the limit under the appropriate topology as the mesh of the partition goes to zero.
Define a stochastic partition P of $\mathbb{R}_+$ to be a sequence of stopping times
$$0=\tau_0\le\tau_1\le\tau_2\le\cdots\uparrow\infty.$$
The mesh of the partition is denoted by $|P|=\sup_n(\tau_n-\tau_{n-1})$. Then, given an integrable process X we define its compensator along P by
$$A^P_t=\sum_{n=1}^\infty1_{\{t>\tau_{n-1}\}}\mathbb{E}\left[X_{\tau_n}-X_{\tau_{n-1}}\mid\mathcal{F}_{\tau_{n-1}}\right].\qquad(7)$$
Now letting the mesh of P go to zero, the question is whether or not $A^P$ tends to A. The precise answer will depend on the topology in which we take the limit. However, in the quasi-left-continuous case it turns out that convergence occurs uniformly in $L^1$, which is about as strong a mode of convergence as we could have hoped for. There is one further technical point; the mesh $|P|$ is itself a random variable. So, in taking the limit as $|P|$ goes to zero, it is also necessary to state the topology under which $|P|\to0$ is to be understood. In order to obtain a strong result, it is best to use as weak a topology as possible. I use convergence in probability, denoted by $|P|\xrightarrow{\rm P}0$ here.
Theorem 10 Let X be a cadlag adapted process with integrable total variation, and A be its compensator. If X is quasi-left-continuous or, more generally, if A is continuous, then $A^P$ tends uniformly to A in $L^1$. That is,
$$\mathbb{E}\left[\sup_{t\ge0}\left|A^P_t-A_t\right|\right]\to0\quad\text{as }|P|\xrightarrow{\rm P}0.$$
Stated explicitly, this convergence means that for each $\epsilon>0$ there exists a $\delta>0$ such that $\mathbb{E}\left[\sup_t|A^P_t-A_t|\right]<\epsilon$ for all partitions P satisfying $\mathbb{P}(|P|>\delta)<\delta$. Or, in terms of sequences, if $P_n$ is a sequence of partitions with $|P_n|$ tending to zero in probability, then $\sup_t|A^{P_n}_t-A_t|$ tends to zero in $L^1$.
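To see the uniform convergence of Theorem 10 in a concrete case, consider the single-jump process $X_t=1_{\{t\ge T\}}$ where T is exponentially distributed with rate $\lambda$, taken with respect to its natural filtration. Its compensator is the continuous process $A_t=\lambda(t\wedge T)$. The sketch below uses a deterministic partition $t_n=nh$, which is an assumption made for simplicity (the theorem allows stochastic partitions). Along this partition, the discrete-time formula gives increments $1_{\{T>t_{n-1}\}}(1-e^{-\lambda h})$, and the code estimates $\mathbb{E}[\sup_t|A^P_t-A_t|]$ for several mesh sizes.

```python
import math
import random

def sup_error(T, rate, mesh):
    """sup_t |A^P_t - A_t| for X_t = 1{t >= T} along the partition t_n = n * mesh,
    where A_t = rate * min(t, T) is the compensator of X."""
    p = 1.0 - math.exp(-rate * mesh)     # P(T <= t_n | T > t_{n-1})
    n_last = math.ceil(T / mesh)         # interval (t_{n-1}, t_n] containing T
    worst = 0.0
    for n in range(1, n_last + 1):
        approx = n * p                   # value of A^P on (t_{n-1}, t_n]
        left = rate * (n - 1) * mesh     # limit of A at the left end of the interval
        right = rate * min(n * mesh, T)  # A at the right end (A is constant after T)
        # A is monotone and A^P is constant on the interval, so the supremum
        # of the difference is attained at one of the two ends.
        worst = max(worst, abs(approx - left), abs(approx - right))
    return worst

def mean_sup_error(rate, mesh, n_paths=2000, seed=0):
    """Monte Carlo estimate of E[sup_t |A^P_t - A_t|]."""
    rng = random.Random(seed)
    return sum(sup_error(rng.expovariate(rate), rate, mesh)
               for _ in range(n_paths)) / n_paths

for mesh in (0.5, 0.1, 0.02):
    print(mesh, mean_sup_error(1.0, mesh))
```

The estimated error should shrink roughly in proportion to the mesh, consistent with convergence in $L^1$ uniformly in time.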
The proof of Theorem 10 will be given in a moment but, first, let’s consider what happens when A is not continuous. Then Theorem 10 does not apply, and convergence does not occur uniformly in $L^1$. In fact, $A^P_t$ need not converge to $A_t$ in $L^1$ at any positive time. Even if we were to just look at the weaker notion of convergence in probability, the limit still need not exist. As I’ll show using an example in an upcoming post, what can go wrong is that, at a jump time of A, the approximation $A^P$ can randomly overshoot or undershoot the jump by an amount and probability which does not vanish as the mesh goes to zero. In a sense, though, the jump in the approximation will match that of A on average. We can capture this rather weak notion of convergence by the weak topology on $L^1$. A sequence $Z_n$ of integrable random variables is said to converge weakly to the (integrable) limit Z if $\mathbb{E}[Z_nY]\to\mathbb{E}[ZY]$ as n goes to infinity for any bounded random variable Y. As an example demonstrating that weak convergence does not imply convergence in probability, consider a sequence $Z_1,Z_2,\ldots$ of independent random variables, each with the uniform distribution on [-1,1]. This cannot possibly converge to anything in probability but, with respect to the sigma-algebra generated by $\{Z_n\}$, it does converge to zero in the weak topology. If Y is measurable with respect to finitely many of the $Z_n$ then, by independence, $\mathbb{E}[Z_nY]=\mathbb{E}[Z_n]\mathbb{E}[Y]=0$ for all but finitely many n. The set of such Y is dense in $L^1$, from which it follows that $\mathbb{E}[Z_nY]\to0$ as $n\to\infty$ for all integrable Y, so $Z_n\to0$ weakly.
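The example of independent uniform random variables can be explored numerically. In the sketch below, the test variable is $Y=\mathrm{sign}(Z_1)$, which is one arbitrary bounded choice depending on finitely many terms of the sequence. The pairing $\mathbb{E}[Z_nY]$ is $1/2$ for $n=1$ and zero for every later n, while $\mathbb{E}|Z_n-Z_m|=2/3$ for all $n\ne m$, so the sequence is not Cauchy in probability.

```python
import random

def weak_vs_probability(n_samples=50000, seed=0):
    """Monte Carlo estimates of E[Z_1 Y], E[Z_3 Y] and E|Z_3 - Z_2| for independent
    Z_1, Z_2, Z_3 uniform on [-1, 1] and Y = sign(Z_1)."""
    rng = random.Random(seed)
    pair_first = pair_later = gap = 0.0
    for _ in range(n_samples):
        z1, z2, z3 = (rng.uniform(-1.0, 1.0) for _ in range(3))
        y = 1.0 if z1 > 0 else -1.0   # bounded, depends only on Z_1
        pair_first += z1 * y          # E[Z_1 Y] = E|Z_1| = 1/2
        pair_later += z3 * y          # E[Z_n Y] = 0 for n > 1, by independence
        gap += abs(z3 - z2)           # E|Z_n - Z_m| = 2/3 for n != m
    return pair_first / n_samples, pair_later / n_samples, gap / n_samples

print(weak_vs_probability())
```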
Now, it is true that the discrete-time approximations do converge weakly to the compensator at each time.
Theorem 11 Let X be a cadlag adapted process with integrable total variation, and A be its compensator. Then $A^P$ tends to A under the weak topology in $L^1$ at each time. More precisely, for any random time $\tau$,
$$A^P_\tau\to A_\tau$$
weakly in $L^1$ as $|P|\xrightarrow{\rm P}0$.
Stated explicitly, this means that for each uniformly bounded random variable Y and constant $\epsilon>0$, there exists $\delta>0$ such that $\left|\mathbb{E}[Y(A^P_\tau-A_\tau)]\right|<\epsilon$ for all partitions P satisfying $\mathbb{P}(|P|>\delta)<\delta$. Or, in terms of sequences, if $P_n$ is a sequence of partitions with mesh tending to zero in probability and Y is a uniformly bounded random variable, then $\mathbb{E}[YA^{P_n}_\tau]\to\mathbb{E}[YA_\tau]$. Also note that, in Theorem 11, the time $\tau$ is any random time, not necessarily a stopping time.
In some approaches, Theorem 10 or 11, or an equivalent, is used to prove the existence of compensators in the first place. That is, these results are proved without the a-priori assumption that compensators exist. In the treatment given in these notes we have already proved the existence of the compensator by other means, and just need to show that the approximations do indeed converge to the expected limit.
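Before moving on to the proofs of these two theorems, let us note some simple facts concerning the definitions. First, for the sake of brevity, given any process $Z_n$ depending on the discrete parameter n, I will use the notation $\delta Z_n$ to denote the difference $Z_n-Z_{n-1}$. As the process X has integrable variation, the same is true for A (Lemma 8). Therefore $X-A$ is an $L^1$-dominated local martingale, so is a true martingale, and is uniformly integrable. So $\mathbb{E}[X_\tau-X_\sigma\mid\mathcal{F}_\sigma]=\mathbb{E}[A_\tau-A_\sigma\mid\mathcal{F}_\sigma]$ for any stopping times $\sigma\le\tau$. We can rewrite the definition of $A^P$ to express it in terms of the compensator A instead of X. Doing this, equation (7) is replaced by
$$A^P_t=\sum_{n=1}^\infty1_{\{t>\tau_{n-1}\}}\mathbb{E}\left[\delta A_{\tau_n}\mid\mathcal{F}_{\tau_{n-1}}\right].\qquad(8)$$
Equivalently, $A^P$ is left-continuous and is constant over each of the intervals $(\tau_{n-1},\tau_n]$, with $A^P_0=0$ and
$$\delta A^P_{\tau_n}=\mathbb{E}\left[\delta A_{\tau_n}\mid\mathcal{F}_{\tau_{n-1}}\right].\qquad(9)$$
In particular, Jensen’s inequality gives $\mathbb{E}|\delta A^P_{\tau_n}|\le\mathbb{E}|\delta A_{\tau_n}|$. So, summing over n, the expected total variation of $A^P$ is
$$\mathbb{E}\left[\sum_{n=1}^\infty|\delta A^P_{\tau_n}|\right]\le\mathbb{E}\left[\sum_{n=1}^\infty|\delta A_{\tau_n}|\right]\qquad(10)$$
which is bounded by the expected total variation of A.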
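Proof of Theorem 10: Corollary 3 tells us that A is continuous whenever X is quasi-left-continuous. Let us start by considering the case where A has total variation bounded by some positive constant K, and define the process $M_n=A^P_{\tau_n}-A_{\tau_n}$ over nonnegative integer n. This is a discrete-time $\{\mathcal{F}_{\tau_n}\}$-adapted process and (9) tells us that $\mathbb{E}[\delta M_n\mid\mathcal{F}_{\tau_{n-1}}]=0$. So, M is a martingale. Then, we can obtain the following sequence of inequalities,
$$\begin{aligned}\mathbb{E}\left[\sup_nM_n^2\right]&\le4\sup_n\mathbb{E}\left[M_n^2\right]\\&=4\,\mathbb{E}\left[\sum_n(\delta M_n)^2\right]\\&\le8\,\mathbb{E}\left[\sum_n\left((\delta A^P_{\tau_n})^2+(\delta A_{\tau_n})^2\right)\right]\\&\le16\,\mathbb{E}\left[\sum_n(\delta A_{\tau_n})^2\right]\\&\le16K\,\mathbb{E}\left[\sup_n|\delta A_{\tau_n}|\right].\end{aligned}\qquad(11)$$
The first line is Doob’s $L^2$ martingale inequality. The second is the Ito isometry which, in discrete-time, just consists in expanding out $M_n^2$ as $\sum_{j,k\le n}\delta M_j\,\delta M_k$ and then noting that the martingale property implies that $\mathbb{E}[\delta M_j\,\delta M_k]=0$ for all $j\ne k$. The third line is using (9) to expand $\delta M_n$ as the difference of $\delta A^P_{\tau_n}$ and $\delta A_{\tau_n}$, and the Cauchy-Schwarz inequality to bound $(\delta M_n)^2$ by twice the sum of squares. The fourth line is using Jensen’s inequality to move the square inside the conditional expectation. The final line is using $(\delta A_{\tau_n})^2\le|\delta A_{\tau_n}|\sup_m|\delta A_{\tau_m}|$, summing over n and bounding $\sum_n|\delta A_{\tau_n}|$ by K, as was assumed above.
Next, using the fact that $A^P$ is constant on each interval $(\tau_{n-1},\tau_n]$, for t in this interval we have
$$\left|A^P_t-A_t\right|\le\left|A^P_{\tau_n}-A_{\tau_n}\right|+\left|A_{\tau_n}-A_t\right|\le|M_n|+\sup_{|u-s|\le|P|}|A_u-A_s|.$$
If we square, take the supremum over n, and take the expected value of this, then use (11) to bound the $M_n$ term, we obtain the bound
$$\mathbb{E}\left[\sup_t\left(A^P_t-A_t\right)^2\right]\le32K\,\mathbb{E}\left[\sup_{|u-s|\le|P|}|A_u-A_s|\right]+2\,\mathbb{E}\left[\sup_{|u-s|\le|P|}|A_u-A_s|^2\right].\qquad(12)$$
However, as A is continuous with finite total variation, it is uniformly continuous. That is, $\sup_{|u-s|\le h}|A_u-A_s|$ tends to zero as h goes to zero. So, if $P_n$ is a sequence of partitions with mesh going to zero in probability, then $\sup_{|u-s|\le|P_n|}|A_u-A_s|$ tends to zero in probability as n goes to infinity. As A is uniformly bounded by K, dominated convergence implies that the right hand side of (12) tends to zero as $|P|$ goes to zero in probability and, therefore, $A^P\to A$ uniformly in $L^2$ and, hence, in $L^1$. This completes the proof when A has uniformly bounded variation.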
Finally, consider the case where A has integrable total variation $V_\infty=\int_0^\infty|dA|$. Then, for any fixed $K>0$ let $\sigma$ be the first time at which the variation of A hits K. This is a stopping time, and the stopped process $B=A^\sigma$ has variation bounded by K. Letting $C=A-B$, then we can define the compensators $B^P,C^P$ of B and C on the partition P as above. By linearity of the definition, $A^P=B^P+C^P$ and
$$\sup_t\left|A^P_t-A_t\right|\le\sup_t\left|B^P_t-B_t\right|+\sup_t\left|C^P_t-C_t\right|.$$
By the argument above, the first term on the right hand side tends to zero in $L^1$ as the mesh of P goes to zero in probability. The second term is bounded by the total variation of $C^P$ and C, so its expected value is bounded by twice the expected variation of C. If $V_t$ is the variation of A over the interval $[0,t]$ then the total variation of C is $V_\infty-V_\sigma=(V_\infty-K)_+$. Then,
$$\limsup_{|P|\to0}\mathbb{E}\left[\sup_t\left|A^P_t-A_t\right|\right]\le2\,\mathbb{E}\left[(V_\infty-K)_+\right].$$
However, $(V_\infty-K)_+$ is bounded by $V_\infty$ and, as K goes to infinity, then $\sigma$ goes to infinity and $(V_\infty-K)_+$ goes to zero. Dominated convergence shows that the right hand side can be made as small as we like by making K large, so the left hand side is zero as required.
Now that the proof of Theorem 10 is out of the way, we can move on to the case where the compensator is not continuous and give a proof of Theorem 11. In this case, we do not obtain uniform convergence in $L^1$ as discussed above and, instead, only manage weak convergence. One way of proving this is to justify the following sequence of equalities and limit. Letting M be a cadlag version of the process $M_t=\mathbb{E}[Y\mid\mathcal{F}_t]$ and $M^P_t=\sum_n1_{\{\tau_{n-1}<t\le\tau_n\}}M_{\tau_{n-1}}$ be an approximation on the partition P, then, at the terminal time,
$$\mathbb{E}\left[YA^P_\infty\right]=\mathbb{E}\left[\sum_nM_{\tau_{n-1}}\delta A_{\tau_n}\right]=\mathbb{E}\left[\int_0^\infty M^P\,dA\right]\to\mathbb{E}\left[\int_0^\infty M_-\,dA\right]=\mathbb{E}\left[YA_\infty\right].$$
The limit here is just using the fact that $M^P\to M_-$ as the mesh of P goes to zero. So, according to this method, convergence in the weak topology is a consequence of dominated convergence. However, as we already have Theorem 10 giving the result when the compensator is continuous, we can simplify the proof of Theorem 11. It is only necessary to prove the case where A has a single discontinuity at a predictable stopping time, which can then be pieced together with the continuous case to obtain the result.
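Lemma 12 Suppose that $A=U1_{[\sigma,\infty)}$ for a predictable stopping time $\sigma$ and integrable $\mathcal{F}_{\sigma-}$-measurable random variable U. Then, $A^P_\tau\to A_\tau$ weakly in $L^1$ for all random times $\tau$.
Proof: Let us first suppose that $\tau$ is a stopping time, and let Y be any uniformly bounded random variable. Also, suppose for now that the filtration is right-continuous, so that we can take a cadlag version of the martingale $M_t=\mathbb{E}[Y\mid\mathcal{F}_t]$. This is uniformly bounded and, by optional sampling, we have $M_{\tau_n}=\mathbb{E}[Y\mid\mathcal{F}_{\tau_n}]$. Expanding out $A^P_\tau$ using equation (8), which is absolutely convergent in $L^1$ by (10), we obtain,
$$\mathbb{E}\left[YA^P_\tau\right]=\sum_n\mathbb{E}\left[1_{\{\tau>\tau_{n-1}\}}Y\,\mathbb{E}[\delta A_{\tau_n}\mid\mathcal{F}_{\tau_{n-1}}]\right]=\sum_n\mathbb{E}\left[1_{\{\tau>\tau_{n-1}\}}M_{\tau_{n-1}}\delta A_{\tau_n}\right].$$
However, $\delta A_{\tau_n}$ is zero unless $\tau_{n-1}<\sigma\le\tau_n$, in which case it is equal to U. So, letting $\sigma^P$ denote the maximum $\tau_n$ less than $\sigma$ (and taking U to be zero on the event $\sigma=\infty$, which does not affect A),
$$\mathbb{E}\left[YA^P_\tau\right]=\mathbb{E}\left[1_{\{\tau>\sigma^P\}}M_{\sigma^P}U\right].$$
Letting the mesh of P go to zero, $\sigma^P$ tends to $\sigma$ from the left,
$$\mathbb{E}\left[YA^P_\tau\right]\to\mathbb{E}\left[1_{\{\tau\ge\sigma\}}M_{\sigma-}U\right].$$
Now, as $\sigma$ is predictable, there exists a sequence of stopping times $\sigma_m$ strictly increasing to $\sigma$. Then, $M_{\sigma_m}=\mathbb{E}[Y\mid\mathcal{F}_{\sigma_m}]$ tends to $M_{\sigma-}$ but, by Levy’s upwards convergence theorem, it also converges to $\mathbb{E}[Y\mid\mathcal{F}_{\sigma-}]$. So,
$$\mathbb{E}\left[1_{\{\tau\ge\sigma\}}M_{\sigma-}U\right]=\mathbb{E}\left[1_{\{\tau\ge\sigma\}}U\,\mathbb{E}[Y\mid\mathcal{F}_{\sigma-}]\right]=\mathbb{E}\left[1_{\{\tau\ge\sigma\}}UY\right]=\mathbb{E}\left[YA_\tau\right]$$
as required. Here, we have used the fact that U is $\mathcal{F}_{\sigma-}$-measurable, as is the event $\{\tau\ge\sigma\}$. This proves the case where $\tau$ is a stopping time.
Now, suppose that $\tau$ is any random time. Noting that, from the definition, $A^P_\infty-A^P_\tau$ is zero whenever $\tau\ge\sigma$,
$$\mathbb{E}\left[1_{\{\tau\ge\sigma\}}YA^P_\tau\right]=\mathbb{E}\left[1_{\{\tau\ge\sigma\}}YA^P_\infty\right]\to\mathbb{E}\left[1_{\{\tau\ge\sigma\}}YA_\infty\right]=\mathbb{E}\left[YA_\tau\right]\qquad(13)$$
as the mesh of P goes to zero in probability. This limit follows from the argument above applied to the stopping time $\infty$, with $1_{\{\tau\ge\sigma\}}Y$ in place of Y. On the other hand, let $\sigma_m$ be a sequence of stopping times strictly increasing to $\sigma$. Without loss of generality, we can suppose that U and Y are nonnegative. Then, as $A^P$ is increasing,
$$\mathbb{E}\left[1_{\{\tau<\sigma\}}YA^P_\tau\right]\le\mathbb{E}\left[1_{\{\tau\le\sigma_m\}}YA^P_{\sigma_m}\right]+\mathbb{E}\left[1_{\{\sigma_m<\tau<\sigma\}}YA^P_\sigma\right]\to\mathbb{E}\left[1_{\{\tau\le\sigma_m\}}YA_{\sigma_m}\right]+\mathbb{E}\left[1_{\{\sigma_m<\tau<\sigma\}}YA_\sigma\right].$$
Again, this limit follows from the argument above, at the stopping times $\sigma_m$ and $\sigma$. The first term on the right-hand-side is zero, as $A_{\sigma_m}=0$. The second term can be made as small as we like by choosing m large. So, combining this with (13) gives
$$\mathbb{E}\left[YA^P_\tau\right]\to\mathbb{E}\left[YA_\tau\right].$$
This completes the proof of the lemma in the case where the filtration is right-continuous, so that the martingale M has a cadlag version. The only reason why a cadlag version was required was so that the optional sampling result holds at the stopping times $\tau_n$ and $\sigma_m$, and so that the left-limit $M_{\sigma-}$ is well-defined. However, right-continuity of the filtration is not necessary for us to be able to satisfy these properties. Rather than taking a cadlag version of M, there always exists a version with left and right limits everywhere which is right-continuous outside of a fixed countable set. Furthermore, optional sampling still holds for such versions, and the argument above carries through unchanged.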
Combining this lemma with Theorem 10 completes the proof of Theorem 11.
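Proof of Theorem 11: As the compensator A is predictable, there exists a sequence of predictable stopping times $\sigma_n$ such that $\sigma_m\ne\sigma_n$ whenever $m\ne n$ and $\sigma_n<\infty$, and such that $\bigcup_n[\sigma_n]$ contains all the jump times of A. So, we can decompose A into a continuous term $A^c$ plus a sum over its jumps
$$A=A^c+\sum_n\Delta A_{\sigma_n}1_{[\sigma_n,\infty)}.$$
Furthermore, the sum of the variations of these terms is equal to the variation of A, so it converges uniformly in $L^1$. This means that we can calculate the compensator of each of these terms along the partition P, multiply by a uniformly bounded random variable Y, and take expectations to get
$$\mathbb{E}\left[YA^P_\tau\right]=\mathbb{E}\left[Y(A^c)^P_\tau\right]+\sum_n\mathbb{E}\left[Y\left(\Delta A_{\sigma_n}1_{[\sigma_n,\infty)}\right)^P_\tau\right].$$
The term $\mathbb{E}[Y(A^c)^P_\tau]$ converges to $\mathbb{E}[YA^c_\tau]$ as the mesh of the partition goes to zero, by Theorem 10. As A is predictable, so that $\Delta A_{\sigma_n}$ is $\mathcal{F}_{\sigma_n-}$-measurable, Lemma 12 guarantees that the terms inside the sum converge to $\mathbb{E}[Y1_{\{\tau\ge\sigma_n\}}\Delta A_{\sigma_n}]$. Also, equation (10) says that the terms inside the sum are bounded by $\|Y\|_\infty\mathbb{E}|\Delta A_{\sigma_n}|$, which has finite sum. So, dominated convergence allows us to exchange the limit with the summation,
$$\lim_{|P|\to0}\mathbb{E}\left[YA^P_\tau\right]=\mathbb{E}\left[YA^c_\tau\right]+\sum_n\mathbb{E}\left[Y1_{\{\tau\ge\sigma_n\}}\Delta A_{\sigma_n}\right]=\mathbb{E}\left[YA_\tau\right].$$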
16 thoughts on “Compensators”
At the beginning of Lemma 7’s proof and also in the beginning of the preceding post, it is mentioned that a process of Locally Finite Variation is a FV process.
As defined in these notes in the beginning of the “Properties of Stochastic Integral” post, a process is a FV process, if it is càdlàg, adapted with respect to a complete filtered probability space, and such that with probability one it has finite variations over bounded time intervals.
I don’t see how it is obvious that a Locally Finite Variation process is a FV process. Maybe I got it wrong but there is nothing in the Localization procedure that entails this fact. Could you elaborate about this ?
Locally finite variation implies finite variation over every bounded interval. Add in cadlag and adapted, which were assumed, and it is an FV process.
“Locally finite variation implies finite variation over every bounded interval” this is precisely this point that I cannot see.
If $X$ has locally finite variation this means there exists a sequence of stopping times $\tau_n$ increasing almost surely to $\infty$ such that the stopped process $X^{\tau_n}$ has finite variation over bounded intervals, right ?
But why should this property hold true when passing to the limit ?
I think I miss something obvious sorry about that…
Well, if there exists a sequence of random times $\tau_n$ (stopping times or not) which increase to infinity, and $X^{\tau_n}$ has finite variation over the interval [0,t], then X must have finite variation on [0,t]. This follows because we have $\tau_n\ge t$ for large enough n.
I think I got it this time. To be complete, I’ll try to prove the equivalence. The contrapositive proposition is the easiest way to see this for me. So I try to show that X not FV implies X not of Locally Finite Variation.
If $X$ is not FV, then there exists an event A of strictly positive probability over which $X$ is of infinite variation over some interval [0,t]. In that case, $X$ cannot be of Locally Finite Variation, because for any sequence $\tau_n$ of stopping times increasing almost surely to $\infty$ the stopped process $X^{\tau_n}$ as $n\to\infty$ is of infinite variation over [0,t] conditionally on the set A of strictly positive probability.
Yes, that works, although I don’t really think that going to the contrapositive is easier. Just note that, for any finite t, we (almost surely) have $\tau_n\ge t$ for large enough n, and finite variation on $[0,\tau_n]$, so finite variation on [0,t].
I am not sure to get exactly what you mean when you say :
“for any finite t>0, we (almost surely) have $\tau_n\ge t$ for large enough n,”
What about $\tau_n=n\tau$ for $\tau$ an absolutely continuous random variable of full support over $\mathbb{R}_+$ (for example an exponential rv of parameter 1) ?
This sequence is a proper localizing sequence of stopping times as it is increasing almost surely to $\infty$, but for a fixed time t>0, and any n, we don’t have almost surely that $\tau_n\ge t$ as for any n we have $\mathbb{P}(\tau_n<t)>0$.
I think that it’s this point that has disturbed me from the beginning and which is why I had to use the contrapositive proposition to convince myself.
“for almost every ω ∈ Ω there exists an n such that …” is maybe a clearer way of saying it. That is, n depends on ω. This is implied by $\tau_n$ almost surely tending to infinity (it’s equivalent to $\tau_n$ being almost surely unbounded). It can be unclear sometimes exactly what is held fixed and what is dependent on ω, especially when ω is implicit as is usually the case.
Btw, I edited some latex in your post, hope it’s correct now. You have to be especially careful with < and > signs, which can get interpreted as marking HTML tags. The only foolproof way I know is to use the HTML codes &amp;lt; and &amp;gt;.
Now there’s nothing left unclear to me, sorry to be so long to get such elementary details.
Thx for the latex advice and corrections.
I have another elementary question (I’m afraid) to ask about the proof of point 2 on lemma 2.
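There you prove that $\mathbb{E}[\Delta M_\tau\mid\mathcal{F}_{\tau-}]=0$ a.s. for an Integrable Martingale $M$. Right ? [GL: Correct]
Applying localization argument here then means that for a localizing sequence $\tau_n$ of the now only locally integrable, local martingale $M$ we have :
$\mathbb{E}[\Delta M^{\tau_n}_\tau\mid\mathcal{F}_{\tau-}]=0$ a.s. for all $n$, right ? [GL: Correct]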
But letting and noting , we can only conclude that :
This is not the statement to be proven and I miss the step that would gives the final conclusion.
Such an argument would be :
But I can’t justify properly the intertwining of Lim and Expectation operator that gives the conclusion without adding extra assumptions.
Maybe (or probably should I say) I missed something that would trivially lead to the conclusion, so would you please point me that out ?
Rather than taking the limit directly, use the fact that to write,
If the indicator function in the front of this expression is moved inside the conditional expectation, then the identity is just using .
Now, you can take the limit as n goes to infinity, and it doesn’t have to be commuted with the expectation at all. You have . Also, as Y is taken to be 0 at infinity, this also holds if the indicator function is changed to . So, .
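There is something that I glossed over here, and I was intending to add to the end of this post as a note (I'll do this). I take the conditional expectation $\mathbb{E}[\Delta X_\tau\mid\mathcal{F}_{\tau-}]$ without proving that $\Delta X_\tau$ is integrable. In fact, it need not be integrable. For a random variable Z and sub-sigma-algebra $\mathcal{G}$, the conditional expectation $\mathbb{E}[|Z|\mid\mathcal{G}]$ is always defined, although it could be infinite. If this is almost surely finite, then $\mathbb{E}[Z\mid\mathcal{G}]$ is well-defined, by $1_A\mathbb{E}[Z\mid\mathcal{G}]=\mathbb{E}[1_AZ\mid\mathcal{G}]$ for any $A\in\mathcal{G}$ such that $1_AZ$ is integrable. Also, $\mathbb{E}[|Z|\mid\mathcal{G}]$ is almost-surely finite if and only if there is a sequence $A_n\in\mathcal{G}$ with $\mathbb{P}\left(\bigcup_nA_n\right)=1$ and $1_{A_n}Z$ are integrable. From this, you can show that $\mathbb{E}[|\Delta X_\tau|\mid\mathcal{F}_{\tau-}]<\infty$ almost surely, for any locally integrable process X. So, $\mathbb{E}[\Delta X_\tau\mid\mathcal{F}_{\tau-}]$ makes sense. I’ll append a note to this post along those lines.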
Also, I do plan on tidying up these posts and trying to incorporate comments or clarifications but, for now, at least I have your comments at the bottom of the post in case it causes anyone else any confusion. I think the jump in Lemma 2 here is too large for many people to see how this works, so I’m not surprised you picked up on it. Like I said, the original thinking was to add a note concerning conditional expectations at the end and link to it (but, I forgot).
Thanks for this very neat explanation and for the additional elements.
So here your which tends to almost surely by hypothesis.
The worst part here is that I knew this generalisation of conditional expectation. By the way it is named $\sigma$-Integrability with respect to $\mathcal{G}$ in He, Wang, and Yan’s book “Semimartingale Theory and Stochastic Calculus” (first Chapter).
TheBridge: Thanks for the reference for σ-integrability.
Could you please give me a reference on this stuff?
I need a reference book having results like Lemma 9. Any help will be greatly appreciated.
I’m currently studying Markov processes and the concept of compensators always crops up. From the first paragraph, I got the gist of the idea of compensators. But I can’t get the rationale: why do we have to break stochastic processes into martingale and drift terms? What’s the purpose of splitting them up? I’m sorry if this may sound naive. Thank you.
There are lots of reasons. The decomposition of stochastic processes into martingale and finite variation terms turns out to be useful in many situations, and I cannot do this idea justice in a short comment. From the purely theoretical situation for semimartingales, it is useful for integration.
1) The drift term has finite variation, so we can integrate with respect to it. There are also different techniques to integrate with respect to a martingale and these definitions are consistent (for a finite variation martingale, the integrals coincide). So, we can split it up and use the different constructions for the martingale and drift components.
2) The decomposition helps tell you about the distribution. Suppose you want to compute the expectation of f(X_t) for a process X. Only the drift component of f(X) contributes (subject to technical constraints). If you want the expectation of f(X)^2, then the quadratic variation term also contributes, which comes from the martingale term.
3) For SDEs representing physical processes, the drift and martingale components often represent distinct things, with the drift component given by the contribution from the non-stochastic physical laws of the system and the martingale component coming from some external random noise. Consider the Ornstein-Uhlenbeck process representing the noisy motion of particles, with the drift term coming from resistance/friction and the martingale term coming from the random shocks of molecules hitting the particles.
Probably lots of other examples could be given, but the decomposition is useful very often for different reasons.