Thursday, December 5, 2013

Cost Functional

Introduction

The proper specification of a cost functional, which is typically defined as a weighted sum of lack of fits to observations and other physical constraints, is essential for obtaining a satisfactory solution of an objective analysis problem (Legler and Navon 1991).

The data I typically deal with in my research are derived from scanning Doppler radars, e.g., measurements from X-band (9.4-GHz) and C-band (5.6-GHz) radars. These radars scan the atmosphere in the so-called plan position indicator (PPI) mode, which is best described by the spherical coordinate system, since each gate location can be represented by the coordinate triplet \((r, \phi, \theta)\). These data are usually then interpolated onto a more natural and intuitive 3-D Cartesian analysis domain, from which numerous geophysical parameters can be estimated, e.g., the 3-D wind field \((u, v, w)\). And this is where a cost functional is often needed.
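As an aside, the geometry behind that gate-to-Cartesian mapping is easy to sketch. Here's a minimal version, assuming \(\phi\) is azimuth measured clockwise from north and \(\theta\) is elevation above the horizon, and neglecting atmospheric refraction and Earth curvature (a proper mapping, e.g., one using the standard 4/3 effective Earth-radius model, accounts for both); the function name is my own:

```python
import numpy as np

def gate_to_cartesian(r, phi, theta):
    """Map a gate at range r (m), azimuth phi (deg, clockwise from north),
    and elevation theta (deg) to Cartesian (x, y, z) relative to the radar."""
    phi_rad, theta_rad = np.deg2rad(phi), np.deg2rad(theta)
    x = r * np.cos(theta_rad) * np.sin(phi_rad)  # eastward distance
    y = r * np.cos(theta_rad) * np.cos(phi_rad)  # northward distance
    z = r * np.sin(theta_rad)                    # height above the radar
    return x, y, z
```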

The Cost Functional

A cost functional \(J\) is typically defined as the sum of independent constraints: one or more rely on observations from one or more instruments, and more often than not there is also some sort of constraint on the smoothness of the analysis. Each individual constraint represents a so-called lack of fit, either to observations or to a priori assumptions. We arrive at our best estimate of the truth when the global minimum of \(J\) has been found. So, for example, if we're attempting to estimate the 3-D wind field, we're looking for the simultaneous satisfaction of

$$
\frac{\partial J}{\partial u} = 0, \; \frac{\partial J}{\partial v} = 0, \; \frac{\partial J}{\partial w} = 0 .
$$

As a result, we are expected to compute the gradient of the cost functional at one point or another. For some constraints this gradient is easily defined analytically, but for others (e.g., smoothness) not so much.
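In practice this means handing \(J\) and its gradient to an iterative minimizer, much like the large-scale conjugate-gradient minimization of Legler and Navon (1991). Here's a minimal, self-contained sketch using SciPy's conjugate-gradient method, with a deliberately trivial cost (lack of fit to some synthetic "observations") so that it actually runs; the grid shape and all names are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical stand-in for the real constraint terms: here J only
# penalizes departure of (u, v, w) from synthetic "observations",
# so the minimizer should recover those fields exactly.
shape = (10, 20, 30)                  # (nz, ny, nx) analysis grid
obs = np.random.rand(3, *shape)       # placeholder "observations"

def j_and_grad(x):
    fields = x.reshape(3, *shape)     # unpack the control variables (u, v, w)
    resid = fields - obs
    J = 0.5 * np.sum(resid**2)        # a single, trivial lack-of-fit term
    return J, resid.ravel()           # J and its gradient w.r.t. (u, v, w)

x0 = np.zeros(obs.size)               # first guess: calm winds
result = minimize(j_and_grad, x0, jac=True, method='CG')
u, v, w = result.x.reshape(3, *shape) # the analyzed wind field
```

So let's focus on the smoothness constraint.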

The Smoothness Constraint

The smoothness constraint can be defined in multiple ways, e.g., using first-order or second-order partial derivatives. For the sake of this post, let's focus on the smoothness constraint defined in Potvin et al. (2012), in particular only the first term of their equation (5). This term looks something like the following

$$
\begin{equation}
J_s = \frac{1}{2} \sum_{i,j,k} \lambda_s
             \left[\left(\frac{\partial^2 u}{\partial x^2}\right)^2 +
                    \left(\frac{\partial^2 u}{\partial y^2}\right)^2 +
                    \left(\frac{\partial^2 v}{\partial x^2}\right)^2 +
                    \left(\frac{\partial^2 v}{\partial y^2}\right)^2
             \right]
\label{Eq:1} 
\end{equation}
$$

The summation runs over all the Cartesian grid points in the analysis domain, and \(\lambda_s\) controls the weight this term carries in the overall analysis.
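Here's a minimal sketch of how equation \(\ref{Eq:1}\) might be evaluated on a single 2-D slab with NumPy, using centered differences on the interior points only (boundary treatment is discussed below); the function name and argument layout are my own choices, not anything prescribed by Potvin et al. (2012):

```python
import numpy as np

def smoothness_cost(u, v, dx, dy, lambda_s):
    """J_s from equation (1) on 2-D arrays u, v indexed [j, i] ~ (y, x)."""
    # centered second differences, e.g. (u[i+1] - 2 u[i] + u[i-1]) / dx^2
    d2u_dx2 = (u[:, 2:] - 2.0 * u[:, 1:-1] + u[:, :-2]) / dx**2
    d2u_dy2 = (u[2:, :] - 2.0 * u[1:-1, :] + u[:-2, :]) / dy**2
    d2v_dx2 = (v[:, 2:] - 2.0 * v[:, 1:-1] + v[:, :-2]) / dx**2
    d2v_dy2 = (v[2:, :] - 2.0 * v[1:-1, :] + v[:-2, :]) / dy**2
    return 0.5 * lambda_s * (np.sum(d2u_dx2**2) + np.sum(d2u_dy2**2) +
                             np.sum(d2v_dx2**2) + np.sum(d2v_dy2**2))
```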

For the sake of simplicity we'll assume \(u\) is only a function of \(x\), i.e., \(u_i = u(x_i)\). The second-order partial derivative can then be approximated by a finite difference,

$$
\begin{equation}
\label{Eq:2}
\left( \frac{\partial^2 u}{\partial x^2} \right)_i^2 = \left( \frac{u_{i+1} - 2 u_i + u_{i-1}}{\Delta x^2} + \mathcal{O}( \Delta x^2) \right)^2
\end{equation}
$$

Equation \(\ref{Eq:2}\) becomes an approximation when the \(\mathcal{O}(\Delta x^2)\) term is dropped. This is referred to as a centered difference scheme with second-order accuracy. Note that a scheme like this can only be used at interior grid points, since it requires information from grid points to both the left and the right. Therefore forward or backward difference schemes must be employed at the grid boundaries.
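In code, that bookkeeping might look like the following sketch for a 1-D field (NumPy is 0-based, so the post's \(i = 1\) boundary is index 0 here); this helper gets reused in the later sketches:

```python
import numpy as np

def second_derivative(u, dx):
    """Approximate d2u/dx2 on the full 1-D grid."""
    d2u = np.empty_like(u)
    d2u[1:-1] = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx**2  # centered, interior
    d2u[0] = (u[2] - 2.0 * u[1] + u[0]) / dx**2           # forward, left edge
    d2u[-1] = (u[-1] - 2.0 * u[-2] + u[-3]) / dx**2       # backward, right edge
    return d2u
```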

Now, since \(J_s\) sums these squared derivatives over all the Cartesian grid points, it actually contains several terms involving \(u_i\). To illustrate this, consider the following,

\( \left( \frac{\partial^2 u}{\partial x^2} \right)_{i-2}^2 \approx \left( \frac{u_{i-1} - 2 u_{i-2} + u_{i-3}}{\Delta x^2} \right)^2 \)

\( \left( \frac{\partial^2 u}{\partial x^2} \right)_{i-1}^2 \approx \left( \frac{u_{i} - 2 u_{i-1} + u_{i-2}}{\Delta x^2} \right)^2 \)

\( \left( \frac{\partial^2 u}{\partial x^2} \right)_{i}^2 \approx \left( \frac{u_{i+1} - 2 u_{i} + u_{i-1}}{\Delta x^2} \right)^2 \)

\( \left( \frac{\partial^2 u}{\partial x^2} \right)_{i+1}^2 \approx \left( \frac{u_{i+2} - 2 u_{i+1} + u_{i}}{\Delta x^2} \right)^2 \)

\( \left( \frac{\partial^2 u}{\partial x^2} \right)_{i+2}^2 \approx \left( \frac{u_{i+3} - 2 u_{i+2} + u_{i+1}}{\Delta x^2} \right)^2 \)

Notice how \(u_i\) manifests itself in three of the above equations. From the finite difference scheme we chose to approximate the partial derivative, we see that the influence of \(u_i\) vanishes once we reach the \(i - 2\) and \(i + 2\) derivatives. Therefore, in general, for a second-order-accurate centered difference scheme, the partial derivative of \(J_s\) with respect to \(u\) at the \(i\)-th grid point is,

$$
\begin{equation}
\label{Eq:3}
\left( \frac{\partial J_s}{\partial u} \right)_i = \frac{\lambda_s}{\Delta x^2} \left[ \left(\frac{\partial^2 u}{\partial x^2}\right)_{i-1} - 2 \left(\frac{\partial^2 u}{\partial x^2}\right)_i + \left(\frac{\partial^2 u}{\partial x^2}\right)_{i+1} \right]
\end{equation}
$$
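Here's equation \(\ref{Eq:3}\) as a NumPy sketch, reusing the second_derivative helper from above. Note it is only exact for grid points far enough from the boundaries that neither their own stencil nor their neighbors' stencils involve the one-sided schemes:

```python
import numpy as np

def smoothness_gradient_interior(u, dx, lambda_s):
    d2u = second_derivative(u, dx)  # helper from the earlier sketch
    grad = np.zeros_like(u)
    # equation (3): (lambda_s / dx^2) * (d2u[i-1] - 2 d2u[i] + d2u[i+1])
    grad[2:-2] = (lambda_s / dx**2) * (d2u[1:-3] - 2.0 * d2u[2:-2] + d2u[3:-1])
    return grad
```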

As stated before, equation \(\ref{Eq:3}\) was derived using the second-order-accurate centered difference scheme we chose at the beginning, with the chain rule used to carry out the differentiation. Fundamentally, equation \(\ref{Eq:3}\) depends strongly on the finite difference schemes chosen to approximate the partial derivatives, so we need to keep track of which schemes are being used where. For example, at grid points on the boundary, say at \(i = 1\), we have to use a forward difference scheme because we only have information about \(u\) in the forward direction. Notice that now we would have,

\( \left( \frac{\partial^2 u}{\partial x^2} \right)_{i=1}^2 \approx \left( \frac{u_{1} - 2 u_{2} + u_{3}}{\Delta x^2} \right)^2 \)

\( \left( \frac{\partial^2 u}{\partial x^2} \right)_{i=2}^2 \approx \left( \frac{u_{3} - 2 u_{2} + u_{1}}{\Delta x^2} \right)^2 \)

\( \left( \frac{\partial^2 u}{\partial x^2} \right)_{i=3}^2 \approx \left( \frac{u_{4} - 2 u_{3} + u_{2}}{\Delta x^2} \right)^2 \)

where a forward difference scheme of \(\mathcal{O}(\Delta x)\) is used at \(i = 1\) and a centered difference scheme of \(\mathcal{O}(\Delta x^2)\) is used for all subsequent derivatives. Notice how \(u_1\) manifests itself in two of the above equations, its influence vanishing once we get past the second grid point \((i \gt 2)\). So now we have a slightly different equation from equation \(\ref{Eq:3}\) to describe the derivative of \(J_s\) with respect to \(u\) at the boundary \(i = 1\),

$$
\begin{equation}
\label{Eq:4}
\left( \frac{\partial J_s}{\partial u} \right)_{i=1} = \frac{\lambda_s}{\Delta x^2} \left[ \left(\frac{\partial^2 u}{\partial x^2}\right)_{i=1} + \left(\frac{\partial^2 u}{\partial x^2}\right)_{i=2} \right]
\end{equation}
$$
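Folding equation \(\ref{Eq:4}\) into the same sketch (again, the post's \(i = 1\) is NumPy index 0); the points just inside each boundary still need their own chain-rule bookkeeping, which I leave out here:

```python
import numpy as np

def smoothness_gradient(u, dx, lambda_s):
    d2u = second_derivative(u, dx)  # helper from the earlier sketch
    grad = np.zeros_like(u)
    # interior points, equation (3)
    grad[2:-2] = (lambda_s / dx**2) * (d2u[1:-3] - 2.0 * d2u[2:-2] + d2u[3:-1])
    # left boundary, equation (4)
    grad[0] = (lambda_s / dx**2) * (d2u[0] + d2u[1])
    return grad
```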


Conclusions

Once again, please keep in mind that equations \(\ref{Eq:3}\) and \(\ref{Eq:4}\) were produced using the particular finite difference schemes we chose to approximate the partial derivatives. The take-home message is that if a constraint cannot be expressed analytically, like the smoothness constraint, which has to be approximated by finite differences, then the derivative of that constraint is highly dependent on those finite difference schemes.
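A practical safeguard follows from this: whenever the schemes change, compare the analytic gradient against a brute-force numerical gradient of \(J_s\) itself. Here's a sketch using the helpers from the earlier sketches; any entries the analytic version got wrong or left out (e.g., the near-boundary points not derived above) show up immediately as nonzero differences:

```python
import numpy as np

def js(u, dx, lambda_s):
    """J_s for the 1-D simplification: 0.5 * lambda_s * sum((d2u/dx2)^2)."""
    return 0.5 * lambda_s * np.sum(second_derivative(u, dx)**2)

def gradient_check(u, dx, lambda_s, eps=1e-6):
    analytic = smoothness_gradient(u, dx, lambda_s)
    numerical = np.zeros_like(u)
    for i in range(u.size):
        up, um = u.copy(), u.copy()
        up[i] += eps                 # perturb one grid point at a time
        um[i] -= eps
        numerical[i] = (js(up, dx, lambda_s) - js(um, dx, lambda_s)) / (2 * eps)
    return analytic - numerical      # nonzero entries flag bookkeeping gaps
```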

References

Legler, D. M., and I. M. Navon, 1991: VARIATM - A Fortran program for objective analysis of pseudostress wind fields using large-scale conjugate-gradient minimization. Computers & Geosciences, 17, 1-21.