Learning from Prof. Sergey Fomel’s paper: “Adaptive multiple subtraction using regularized nonstationary regression”

stationary process:


Formally, let \left\{X_t\right\} be a stochastic process and let F_{X}(x_{t_1 + \tau}, \ldots, x_{t_k + \tau}) represent the cumulative distribution function of the joint distribution of \left\{X_t\right\} at times t_1 + \tau, \ldots, t_k + \tau. Then, \left\{X_t\right\} is said to be stationary if, for all k, for all \tau, and for all t_1, \ldots, t_k,

 F_{X}(x_{t_1+\tau} ,\ldots, x_{t_k+\tau}) = F_{X}(x_{t_1},\ldots, x_{t_k}).

Since \tau does not affect F_{X}(\cdot), F_{X} is not a function of time.
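As a toy illustration of this definition (my own example, not from the paper): the ensemble statistics of a stationary AR(1) process settle to fixed values, while those of a random walk drift with time. The coefficients and sizes below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 2000, 400

# Stationary AR(1): x[t] = 0.5*x[t-1] + e[t].
# Its marginal variance settles to 1/(1 - 0.5**2) ≈ 1.33 for all t.
ar = np.zeros((n_paths, n_steps))
for t in range(1, n_steps):
    ar[:, t] = 0.5 * ar[:, t - 1] + rng.normal(size=n_paths)

# Non-stationary random walk: its marginal variance at time t grows like t.
walk = np.cumsum(rng.normal(size=(n_paths, n_steps)), axis=1)

print("AR(1) var at t=100 vs t=399:", ar[:, 100].var(), ar[:, 399].var())
print("walk  var at t=100 vs t=399:", walk[:, 100].var(), walk[:, 399].var())
```

The AR(1) variances agree across time (the distribution is time-invariant), while the random walk's variance roughly quadruples between t=100 and t=399.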


In all cases, the estimation target is a function of the independent variables called the regression function. In regression analysis, it is also of interest to characterize the variation of the dependent variable around the regression function which can be described by a probability distribution.

The performance of regression analysis methods in practice depends on the form of the data generating process, and how it relates to the regression approach being used. Since the true form of the data-generating process is generally not known, regression analysis often depends to some extent on making assumptions about this process. These assumptions are sometimes testable if a sufficient quantity of data is available. Regression models for prediction are often useful even when the assumptions are moderately violated, although they may not perform optimally. However, in many applications, especially with small effects or questions of causality based on observational data, regression methods can give misleading results.

matched filtering:?

Adaptive subtraction: Standard adaptive subtraction methods use the well-known minimum energy criterion, stating that the total energy after optimal multiple attenuation should be minimal.

Adaptive subtraction

The goal of adaptive subtraction is to estimate the non-stationary filters ${\bf f}$ that minimize the objective function 

g({\bf f}) = \Vert {\bf P f} - {\bf d} \Vert^2,

where ${\bf P}$ represents the non-stationary convolution with the multiple model obtained with SRMP (i.e., Chapter [*]) and ${\bf d}$ are the input data. These filters are estimated in a least-squares sense for one shot gather at a time. Note that in practice, a regularization term is usually added in equation ([*]) to enforce smoothness between filters. This strategy is similar to the one used in Chapter [*]. The residual vector ${\bf Pf}-{\bf d}$ contains the estimated primaries.
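A minimal numerical sketch of this least-squares estimation (my own toy setup: a single stationary matching filter, with simple Tikhonov damping standing in for the smoothness regularization; the chapter itself estimates non-stationary filters):

```python
import numpy as np

def matching_filter(multiple, data, nf=11, eps=0.1):
    """Least-squares filter f minimizing ||P f - d||^2 + eps*||f||^2,
    where P is the convolution matrix built from the multiple model."""
    n = len(data)
    # Column k of P is the multiple model delayed by k samples,
    # so (P @ f)[t] = sum_k f[k] * multiple[t - k].
    P = np.zeros((n, nf))
    for k in range(nf):
        P[k:, k] = multiple[:n - k]
    # Damped normal equations (Tikhonov regularization).
    f = np.linalg.solve(P.T @ P + eps * np.eye(nf), P.T @ data)
    return f, data - P @ f   # filter and residual (estimated primaries)

rng = np.random.default_rng(1)
m = rng.normal(size=200)                    # multiple model
true_f = np.array([0.0, 0.8, 0.3])          # unknown distortion of the model
primary = 0.2 * rng.normal(size=200)        # weak "primaries"
d = np.convolve(m, true_f)[:200] + primary  # recorded data
f, prim_est = matching_filter(m, d)
print("estimated leading filter taps:", np.round(f[:3], 2))
```

The recovered taps approximate the true distortion, and the residual {\bf Pf}-{\bf d} is left holding the primaries, which is the minimum-energy criterion in action.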

Multiple subtraction

In this section, the multiple model computed in the preceding section is subtracted from the data with two techniques. The model is obtained after shot interpolation with the sparseness constraint. The first technique is a pattern-based method introduced in Chapter [*] that separates primaries from multiples according to their multivariate spectra. These spectra are approximated with prediction-error filters. The second technique adaptively subtracts the multiple model from the data by estimating non-stationary matching filters (see Chapter [*]).

Shaping regularization: http://www.reproducibility.org/RSF/book/jsg/shape/paper_html/

Least squares sense:

Least Squares Fitting


A mathematical procedure for finding the best-fitting curve to a given set of points by minimizing the sum of the squares of the offsets (“the residuals”) of the points from the curve. The sum of the squares of the offsets is used instead of the offset absolute values because this allows the residuals to be treated as a continuous differentiable quantity. However, because squares of the offsets are used, outlying points can have a disproportionate effect on the fit, a property which may or may not be desirable depending on the problem at hand.
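Both points above (minimizing the sum of squared residuals, and the outsized pull of an outlier) can be seen in a few lines of NumPy, using toy data of my own:

```python
import numpy as np

# Fit y = a*x + b by minimizing the sum of squared residuals.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

A = np.column_stack([x, np.ones_like(x)])   # design matrix [x, 1]
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"slope {a:.2f}, intercept {b:.2f}")  # slope 1.99, intercept 1.04

# A single outlier pulls the fit strongly, because its squared
# residual dominates the objective.
y_out = y.copy(); y_out[4] = 20.0
(a2, b2), *_ = np.linalg.lstsq(A, y_out, rcond=None)
print(f"with outlier: slope {a2:.2f}")      # slope jumps to 4.17
```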

Matched filter:

In signal processing, a matched filter (originally known as a North filter[1]) is obtained by correlating a known signal, or template, with an unknown signal to detect the presence of the template in the unknown signal. This is equivalent to convolving the unknown signal with a conjugated, time-reversed version of the template. The matched filter is the optimal linear filter for maximizing the signal-to-noise ratio (SNR) in the presence of additive stochastic noise. Matched filters are commonly used in radar, in which a known signal is sent out and the reflected signal is examined for components of the outgoing signal. Pulse compression is an example of matched filtering, so called because the impulse response is matched to the input pulse signal. Two-dimensional matched filters are commonly used in image processing, e.g., to improve the SNR of X-ray images. Matched filtering is a demodulation technique that uses LTI filters to maximize the SNR.
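The “correlate a known template against a noisy signal” idea fits in a short NumPy sketch (toy data of my own; the pulse location, amplitude, and noise level are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
template = np.array([1.0, -1.0, 2.0, -1.0, 1.0])  # known pulse shape
signal = rng.normal(size=300)                     # additive noise
signal[120:125] += 3 * template                   # bury the pulse at index 120

# Matched filtering: correlate the observed signal with the template
# (equivalent to convolving with the conjugated, time-reversed template).
out = np.correlate(signal, template, mode="valid")
print("strongest match at index:", int(np.argmax(out)))
```

The correlation output peaks where the template occurs, even though the pulse is not obvious in the raw noisy trace.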

Derivation of the matched filter impulse response[edit]

The following section derives the matched filter for a discrete-time system. The derivation for a continuous-time system is similar, with summations replaced with integrals.

The matched filter is the linear filter, h, that maximizes the output signal-to-noise ratio.

\ y[n] = \sum_{k=-\infty}^{\infty} h[n-k] x[k].

Though we most often express filters as the impulse response of convolution systems, as above (see LTI system theory), it is easiest to think of the matched filter in the context of the inner product, which we will see shortly.

We can derive the linear filter that maximizes output signal-to-noise ratio by invoking a geometric argument. The intuition behind the matched filter relies on correlating the received signal (a vector) with a filter (another vector) that is parallel with the signal, maximizing the inner product. This enhances the signal. When we consider the additive stochastic noise, we have the additional challenge of minimizing the output due to noise by choosing a filter that is orthogonal to the noise.

Let us formally define the problem. We seek a filter, h, such that we maximize the output signal-to-noise ratio, where the output is the inner product of the filter and the observed signal x.

Our observed signal consists of the desirable signal s and additive noise v:

\ x=s+v.\,

Let us define the covariance matrix of the noise, reminding ourselves that this matrix has Hermitian symmetry, a property that will become useful in the derivation:

\ R_v=E\{ vv^\mathrm{H} \}\,

where v^\mathrm{H} denotes the conjugate transpose of v, and E denotes expectation. Let us call our output, y, the inner product of our filter and the observed signal such that

\ y = \sum_{k=-\infty}^{\infty} h^*[k] x[k] = h^\mathrm{H}x = h^\mathrm{H}s + h^\mathrm{H}v = y_s + y_v.

We now define the signal-to-noise ratio, which is our objective function, to be the ratio of the power of the output due to the desired signal to the power of the output due to the noise:

\mathrm{SNR} = \frac{|y_s|^2}{E\{|y_v|^2\}}.

We rewrite the above:

\mathrm{SNR} = \frac{|h^\mathrm{H}s|^2}{E\{|h^\mathrm{H}v|^2\}}.

We wish to maximize this quantity by choosing h. Expanding the denominator of our objective function, we have

\ E\{ |h^\mathrm{H}v|^2 \} = E\{ (h^\mathrm{H}v){(h^\mathrm{H}v)}^\mathrm{H} \} = h^\mathrm{H} E\{vv^\mathrm{H}\} h = h^\mathrm{H}R_vh.\,

Now, our \mathrm{SNR} becomes

\mathrm{SNR} = \frac{ |h^\mathrm{H}s|^2 }{ h^\mathrm{H}R_vh }.

We will rewrite this expression with some matrix manipulation. The reason for this seemingly counterproductive measure will become evident shortly. Exploiting the Hermitian symmetry of the covariance matrix R_v, we can write

\mathrm{SNR} = \frac{ | {(R_v^{1/2}h)}^\mathrm{H} (R_v^{-1/2}s) |^2 }
                  { {(R_v^{1/2}h)}^\mathrm{H} (R_v^{1/2}h) }.

We would like to find an upper bound on this expression. To do so, we first recognize a form of the Cauchy-Schwarz inequality:

\ |a^\mathrm{H}b|^2 \leq (a^\mathrm{H}a)(b^\mathrm{H}b),\,

which is to say that the square of the inner product of two vectors can only be as large as the product of the individual inner products of the vectors. This concept returns to the intuition behind the matched filter: this upper bound is achieved when the two vectors a and b are parallel. We resume our derivation by expressing the upper bound on our \mathrm{SNR} in light of the geometric inequality above:
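A quick numerical check of the inequality and of its equality condition for parallel vectors (arbitrary random complex vectors; `np.vdot(a, b)` computes a^H b):

```python
import numpy as np

rng = np.random.default_rng(3)
a = rng.normal(size=5) + 1j * rng.normal(size=5)
b = rng.normal(size=5) + 1j * rng.normal(size=5)

lhs = abs(np.vdot(a, b)) ** 2                  # |a^H b|^2
rhs = np.vdot(a, a).real * np.vdot(b, b).real  # (a^H a)(b^H b)
print(lhs <= rhs)                              # True: Cauchy-Schwarz

# Equality holds when the vectors are parallel:
b_par = 2.5 * a
print(np.isclose(abs(np.vdot(a, b_par)) ** 2,
                 np.vdot(a, a).real * np.vdot(b_par, b_par).real))  # True
```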

\mathrm{SNR} = \frac{ | {(R_v^{1/2}h)}^\mathrm{H} (R_v^{-1/2}s) |^2 }
                  { {(R_v^{1/2}h)}^\mathrm{H} (R_v^{1/2}h) }
             \leq \frac{ \left[
             			{(R_v^{1/2}h)}^\mathrm{H} (R_v^{1/2}h)
             			{(R_v^{-1/2}s)}^\mathrm{H} (R_v^{-1/2}s)
             		\right] }
                  { {(R_v^{1/2}h)}^\mathrm{H} (R_v^{1/2}h) }.

Our valiant matrix manipulation has now paid off. We see that the expression for our upper bound can be greatly simplified:

\mathrm{SNR} = \frac{ | {(R_v^{1/2}h)}^\mathrm{H} (R_v^{-1/2}s) |^2 }
                  { {(R_v^{1/2}h)}^\mathrm{H} (R_v^{1/2}h) }
             \leq s^\mathrm{H} R_v^{-1} s.

We can achieve this upper bound if we choose

\ R_v^{1/2}h = \alpha R_v^{-1/2}s

where \alpha is an arbitrary real number. To verify this, we plug into our expression for the output \mathrm{SNR}:

\mathrm{SNR} = \frac{ | {(R_v^{1/2}h)}^\mathrm{H} (R_v^{-1/2}s) |^2 }
                  { {(R_v^{1/2}h)}^\mathrm{H} (R_v^{1/2}h) }
           = \frac{ \alpha^2 | {(R_v^{-1/2}s)}^\mathrm{H} (R_v^{-1/2}s) |^2 }
                  { \alpha^2  {(R_v^{-1/2}s)}^\mathrm{H} (R_v^{-1/2}s) }
           = \frac{ | s^\mathrm{H} R_v^{-1} s |^2 }
                  { s^\mathrm{H} R_v^{-1} s }
           = s^\mathrm{H} R_v^{-1} s.

Thus, our optimal matched filter is

\ h = \alpha R_v^{-1}s.

We often choose to normalize the expected value of the power of the filter output due to the noise to unity. That is, we constrain

\ E\{ |y_v|^2 \} = 1.\,

This constraint implies a value of \alpha, for which we can solve:

\ E\{ |y_v|^2 \} = \alpha^2 s^\mathrm{H} R_v^{-1} s = 1,


\ \alpha = \frac{1}{\sqrt{s^\mathrm{H} R_v^{-1} s}},

giving us our normalized filter,

\ h = \frac{1}{\sqrt{s^\mathrm{H} R_v^{-1} s}} R_v^{-1}s.

If we care to write the impulse response of the filter for the convolution system, it is simply the complex conjugate time reversal of h.
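The whole derivation can be checked numerically. Below is a real-valued toy example of my own (so conjugate transposes reduce to transposes): build a colored-noise covariance, form h = \alpha R_v^{-1} s, and verify that it attains the bound s^\mathrm{H} R_v^{-1} s while satisfying the unit-noise-power normalization.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 8
s = rng.normal(size=n)  # known (real) signal template

# Colored noise with exponentially decaying correlations: R_v[i,j] = 0.7^|i-j|.
R_v = np.array([[0.7 ** abs(i - j) for j in range(n)] for i in range(n)])

Rinv_s = np.linalg.solve(R_v, s)     # R_v^{-1} s without forming the inverse
alpha = 1.0 / np.sqrt(s @ Rinv_s)    # unit-noise-power normalization
h = alpha * Rinv_s                   # matched filter h = alpha * R_v^{-1} s

snr = lambda g: (g @ s) ** 2 / (g @ R_v @ g)
bound = s @ Rinv_s                   # derived upper bound s^H R_v^{-1} s
print(np.isclose(snr(h), bound))     # True: the matched filter attains the bound
print(snr(s) <= bound)               # True: a naive filter h = s cannot exceed it
print(np.isclose(h @ R_v @ h, 1.0))  # True: E{|y_v|^2} = 1 holds
```

Note that the SNR of h is invariant to the scale \alpha, which is why \alpha is free to absorb the normalization constraint.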

Though we have derived the matched filter in discrete time, we can extend the concept to continuous-time systems if we replace R_v with the continuous-time autocorrelation function of the noise, assuming a continuous signal s(t), continuous noise v(t), and a continuous filter h(t).