
Handbook of Biomedical Image Analysis, Vol. 3
Kybic and Unser
by a finite number of parameters c_j:

g(x) = x + ∑_{j∈J} c_j φ_j(x)        (9.9)
where J is a set of parameter indexes and the φ_j are the corresponding basis functions. This transforms the variational problem into a much easier finite-dimensional minimization problem, for which numerous algorithms exist [59]. Moreover, the restriction of the family G of admissible functions g can already guarantee some useful properties, such as the regularity (smoothness) of the solution.
9.4.4.1 B-Spline Deformation Model
There are various possibilities for the choice of the basis functions φ_j in the deformation model (9.9). These include polynomials [37], harmonic functions [38, 39], radial basis functions [1, 102], and wavelets [64, 65, 103], all of which have been used in registration algorithms before. However, we have again chosen B-splines, basically for the same reasons that led us to choose them to interpolate our images (see Section 9.4.3): their good approximation properties, computational efficiency [96], and scalability, as well as their physical plausibility [95] (cubic B-splines minimize the "strain energy" ‖g″‖² [104, 105]) and the low interdependency of their coefficients thanks to their short support. Their ability to represent affine transformations, including rigid-body motion, is welcome, too. We have also evaluated the alternative wavelet representation of the same B-spline space [106], only to find that the direct B-spline representation was again more efficient [96].
The B-spline deformation model is obtained by substituting a scaled version of the B-spline (or a tensor product thereof) into (9.9):

g(x) = x + ∑_{j∈I_c⊂ℤ^N} c_j β_{n_m}(x/h − j)        (9.10)

where n_m is the degree of the splines used, h is the knot spacing, and the division is taken elementwise. This corresponds to placing the knots on a regular grid over the image. For efficiency reasons, we require the knot spacing h to be integer, which, together with the separability of β_{n_m}(x), implies that the values of β_{n_m}(x) are needed only at a small number of points, (n_m + 1)h per dimension, and can be precalculated.
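As a concrete illustration, here is a minimal 1D sketch of the model (9.10) with cubic splines (n_m = 3). The function names and coefficient layout are our own choices, not the chapter's implementation:

```python
def bspline3(t):
    """Cubic B-spline beta^3(t); support is (-2, 2)."""
    t = abs(t)
    if t < 1.0:
        return 2.0/3.0 - t*t + t*t*t/2.0
    if t < 2.0:
        return (2.0 - t)**3 / 6.0
    return 0.0

def deform_1d(x, c, h):
    """g(x) = x + sum_j c_j * beta^3(x/h - j)   (Eq. 9.10, 1D case).
    With an integer knot spacing h, beta^3(x/h - j) takes only (3+1)*h
    distinct values on the pixel grid, so they can be precalculated."""
    j0 = int(x // h)
    g = x
    for j in range(j0 - 1, j0 + 3):      # only 4 cubic splines overlap x
        if 0 <= j < len(c):
            g += c[j] * bspline3(x / h - j)
    return g
```

With all coefficients zero the warp is the identity, as required by the model.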

Elastic Registration for Biomedical Applications
9.4.5 Existence and Unicity
Since the useful range of the parameters c is naturally compactly constrained (for example, by excluding displacements larger than the image size) and since the criterion E is non-negative and continuous with respect to c (if at least linear interpolation is used for f_t^c), it is clear that E as a function of c has a minimum; i.e., the proposed problem has a solution. However, depending on the images at hand, the solution does not have to be unique, and there can be local minima. Fortunately, this does not pose problems in practice, thanks to a multiresolution approach (Section 9.4.6.2) which smoothes out the images at coarse levels and brings us sufficiently close to the solution at fine resolution levels. The algorithm will find a solution if started within the attraction basin of that solution. The virtual springs (Section 9.4.7) carry a priori information and play the role of a regularization term; extra regularization can be applied if desired [88, 107] (Section 9.4.9).
9.4.6 Optimization Strategy
9.4.6.1 Optimization Algorithm
Recall from (9.7) and (9.10) that we need to minimize a criterion E with respect to a finite number of parameters c. To determine which of the many available algorithms performs best in our context, we tested four local iterative algorithms that can be cast into the following common framework: at each step i, we take the current estimate c(i) and calculate a proposed update Δc(i). If the step is successful, i.e., the criterion decreases, the proposed point is accepted: c(i+1) = c(i) + Δc(i). Otherwise, a more conservative update Δc(i) is calculated, and the test is repeated.
1. Gradient descent with feedback step size adjustment. The update rule is Δc(i) = −µ ∇_c E(c(i)). After a successful step, µ is multiplied by µ_f; otherwise, it is divided by µ_f.⁶

2. Gradient descent with quadratic step size estimation. We choose a step size µ minimizing the following approximation of the criterion around c(i): E(c(i) + x) = E(c(i)) + xᵀ ∇_c E(c(i)) + α‖x‖², where α is identified from the

⁶ We used µ_f = 10 and µ_f = 15.

last two calculated criterion values E. As a fallback strategy, the previous step size is divided by µ_f, as above.
3. Conjugate gradient. This algorithm [59] chooses its descent directions to be mutually conjugate so that moving along one does not spoil the result of the previous optimizations. To work well, the step size µ has to be chosen optimally. Therefore, at each step, we need to run another internal one-dimensional minimization routine to find the optimal µ; this makes it the slowest algorithm in our setting.
4. Marquardt-Levenberg. The most effective algorithm in the sense of the number of iterations was a regularized Newton method inspired by the Marquardt-Levenberg algorithm (ML), as in [98]. We shall examine various approximations of the Hessian matrix ∇²_c E; see Section 9.4.8.1.
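Optimizer 1 above can be sketched in a few lines. This is a minimal illustration, not the chapter's code; the default factors follow the footnote's values (10 and 15), and the stopping rule is our own simplification:

```python
def gd_feedback(E, grad_E, c0, mu=1.0, mu_up=10.0, mu_down=15.0, n_iter=200):
    """Gradient descent with feedback step size adjustment:
    delta_c = -mu * grad E(c).  A successful (criterion-decreasing) step
    is accepted and mu is multiplied by mu_up; a failed step divides mu
    by mu_down and the test is repeated."""
    c = list(c0)
    best = E(c)
    for _ in range(n_iter):
        g = grad_E(c)
        if all(gi == 0.0 for gi in g):       # stationary point reached
            break
        while True:
            trial = [ci - mu * gi for ci, gi in zip(c, g)]
            e = E(trial)
            if e < best:                     # successful step: accept
                c, best = trial, e
                mu *= mu_up
                break
            mu /= mu_down                    # failed step: shrink and retry
            if mu < 1e-12:
                return c
    return c
```

On a simple quadratic criterion this converges quickly despite the aggressive step-size growth, because every accepted step strictly decreases E.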
The choice of the best optimizer is always application-dependent. We observe that the behavior of all optimizers is almost identical at the beginning of the optimization process (see Fig. 9.7). The main factor determining the speed
Figure 9.7: The evolution of the SSD criterion during the first 18 iterations when registering the Lena image, artificially deformed with 2 × 4 × 4 cubic B-spline coefficients and a maximum displacement of about 30 pixels, without multiresolution. The optimizers used were: Marquardt-Levenberg with full Hessian (MLH), Marquardt-Levenberg with only the diagonal of the Hessian taken into account (MLdH), and gradient descent (GD). The deformation was recovered in all cases with an accuracy between 0.1 and 0.01 pixels (see also Section 9.4.10).

is therefore the cost of a single iteration, which was smallest for the gradient descent (GD) algorithm. Among the GD variants we recommend the quadratic step size estimation that outperforms the classical feedback adjustment. One additional pleasant property of the GD algorithm is its tendency to leave uninfluential coefficients intact, unlike the ML algorithm. Consequently, less regularization is needed for the GD algorithm. We choose the GD optimizer for most of our image registration tasks.
When, on the other hand, we work with a small number of parameters, the criterion is smooth, and high precision is needed, the ML algorithm [36] performs the best, as its higher cost per iteration is compensated for by a smaller number of iterations due to the quadratic convergence (Fig. 9.8).
9.4.6.2 Multiresolution
The robustness and efficiency of the algorithm can be significantly improved by a multiresolution approach: The task at hand is first solved at a coarse scale.
Figure 9.8: Comparison of the performance of the gradient descent (GD), conjugate gradient (CG), and Marquardt-Levenberg (ML) optimization algorithms when registering SPECT images with a control grid of 6 × 6 × 6 knots. The graphs give the value of the finest-level SSD criterion (×10⁵) for all successful (i.e., criterion-decreasing) iterations as a function of the execution time in seconds. The abrupt changes are caused by transitions between resolution levels.
Then, the results are propagated to the next finer level and used as a starting guess for solving the task at that level. This procedure is iterated until the finest level is reached.
In our algorithm, multiresolution is used twice. First, we build an image pyramid: a set of gradually reduced versions of the original image [108, 109]. This pyramid is compatible with our image representation (9.8) and is optimal in the L₂ sense (i.e., compatible with the SSD criterion (9.7)), which ensures that the approximation made by substituting the lower-resolution image is the best possible. We reduce images down to a size of 16–32 pixels, which works well in most cases. The coarse versions of the images (half size) are generated using a reduction operator (see Section 9.4.8.4), and coarse-level solutions are extrapolated to finer levels using an expansion operator (cubic spline interpolation).

Second, we use multiresolution for the warping function as well. We start with a deformation g described by very few parameters c_k and a large distance h between knots. After the optimization of the c_k is complete, we halve the distance between knots; the sequence obeys h_{j+1} = h_j / 2. This approximately corresponds to doubling the number of knots in each direction, i.e., quadrupling (in 2D) the number of coefficients c_k. Because of the two-scale spline relation, we can exactly represent the warping function from the old, coarse space in the new, finer space. The process starts with g being the identity.

The global strategy combines the two multiresolution hierarchies by alternately decreasing the scales of the image and of the model.

The consequence of using multiresolution is that the algorithm works best for images and deformations that follow the multiresolution model, i.e., when a low-resolution version is a good approximation of the finer-resolution version.
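The coarse-to-fine strategy can be sketched as a simple driver loop. The helpers `optimize` (the chosen optimizer run at one scale) and `refine` (the two-scale coefficient refinement) are assumptions of this sketch, not the chapter's code:

```python
def register_multiresolution(pyramid, optimize, refine, c0, h0):
    """Alternate the image and model multiresolution ladders: solve at the
    coarsest image first, then re-express the warp on a finer knot grid
    (h -> h/2, exact thanks to the two-scale spline relation) and move to
    the next image level, reusing the solution as the starting guess."""
    c, h = list(c0), h0
    for image in pyramid:            # coarsest image first
        c = optimize(image, c, h)    # solve at the current scale
        if h > 1:
            c = refine(c)            # coefficients on the finer knot grid
            h //= 2                  # h_{j+1} = h_j / 2
    return c, h
```

The loop structure only fixes the order of operations; the quality of the result comes from the exactness of the model refinement and the L₂ optimality of the image pyramid.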
9.4.7 Semi-Automatic Registration
We realize that although the multiresolution approach leads to a very robust registration algorithm, there are cases when it is misled by an apparent similarity of features which do not correspond physically. Therefore, we developed an extension of the algorithm that can use expert hints. The hints come in the form of a set of landmarks and are used to steer the algorithm towards the correct solution. A similar idea has also appeared for non-parametric approaches [110, 111].
The landmark information is incorporated in the automatic process using the concept of virtual springs, tying each pair of corresponding points together. We

augment the data part of the criterion E with a term E_s, corresponding to the potential energy of the springs, and minimize the sum of the two: E_c = E + E_s. The spring term is

E_s = ∑_{i=1}^{S} α_i ‖g(x_i) − z_i‖²        (9.11)
|
where S is the number of springs, the α_i are weighting factors corresponding to their stiffnesses, and x_i and z_i are the landmark positions in the reference and test images, respectively. The spring factors α_i control the influence of the particular landmark pairs.
As an example, we present the registration of an MRI slice from an atlas⁷ with a sample MRI test image.⁸ To identify the same structures in the test image, we register it with the unlabeled version of the atlas. Once the geometric correspondence is established, the structures and their labels from the atlas can be projected onto the test image. The unsupervised registration correctly registers some of the structures but misses others, in particular the skull boundary (see Fig. 9.9). If we now help the algorithm by identifying several landmarks in both images (Fig. 9.10), the semi-automatic version can recover a plausible deformation, even though the landmark information alone (using, e.g., thin-plate splines) would not have been enough [112].
Adding the spring term privileges likely solutions based on our a priori knowledge and makes the problem better posed. The points need not be image-dependent landmarks. For example, anchoring the four corners of the image prevents the solution from degenerating. In this way, the springs play in part the role of a regularization factor.
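The spring term (9.11) is straightforward to evaluate for a given warp. This sketch is our own illustration, with a generic warp function `g`:

```python
def spring_energy(g, x_ref, z_test, alpha):
    """E_s = sum_{i=1..S} alpha_i * ||g(x_i) - z_i||^2   (Eq. 9.11).
    x_ref:  landmark positions x_i in the reference image,
    z_test: corresponding positions z_i in the test image,
    alpha:  per-spring stiffness weights."""
    total = 0.0
    for xi, zi, ai in zip(x_ref, z_test, alpha):
        gx = g(xi)                                   # warped landmark
        total += ai * sum((p - q)**2 for p, q in zip(gx, zi))
    return total
```

Anchoring the image corners, as suggested above, amounts to springs with x_i = z_i at each corner, which pull g towards the identity there.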
9.4.8 Implementation Issues
The purpose of this section is to describe some specific aspects of our implementation. These are mostly independent of the main philosophy of the algorithm but can have a major impact on its performance.
⁷ The atlas is a labeled and annotated collection of images. Courtesy of Harvard Medical School, http://www.med.harvard.edu/AANLIB/home.html
⁸ We use a proton density MR image from the Visible Human project, http://www.meddean.luc.edu/lumen/meded/grossanatomy/cross section/index.html. Prior to registration, the histogram of the test image was matched to that of the reference.

Figure 9.9: The reference MRI proton-density brain slice from the atlas, with (a) and without (b) labels. The sample test slice of a corresponding region (c). The superposition (in red and green) of the two images before (d) and after (e) the registration. The deformation field (f). Cubic splines were used with a knot spacing of h = 32. The image size was 512 × 512 pixels. The difference between the images is only partially corrected by the unsupervised registration. Misalignment of several structures is clearly visible. (Color slide.)

Figure 9.10: The reference (a) and test (b) images with superimposed landmarks (in red). The superimposed images after registration using the semiautomatic algorithm (c) and the deformation field found (d). Corresponding anatomical structures are well identified; the alignment is clearly superior to that in Figure 9.9. (Color slide.)
9.4.8.1 Explicit Derivatives
For the optimization algorithm, we need to calculate the partial derivatives of E, as they form the gradient vector ∇_c E(c(i)) and the Hessian matrix ∇²_c E(c(i)).
Starting from Eq. (9.7), we obtain the first partial derivatives

∂E/∂c_{j,m} = (1/‖I‖) ∑_{i∈I} (∂e_i/∂f_w(i)) · (∂f_t^c(x)/∂x_m)|_{x=g(i)} · (∂g_m(i)/∂c_{j,m})        (9.12)
as well as the second partial derivatives

∂²E/∂c_{j,m}∂c_{k,n} = (1/‖I‖) ∑_{i∈I} [ (∂²e_i/∂f_w(i)²) (∂f_t^c/∂x_m)(∂f_t^c/∂x_n) + (∂e_i/∂f_w(i)) (∂²f_t^c/∂x_m∂x_n) ] (∂g_m(i)/∂c_{j,m})(∂g_n(i)/∂c_{k,n})        (9.13)
From (9.7), defining the SSD criterion, we get ∂e_i/∂f_w(i) = 2( f_w(i) − f_r(i) ) and ∂²e_i/∂f_w(i)² = 2. The derivative of the deformation function (9.10) is simply ∂g_m/∂c_{j,m} = β_{n_m}(x/h − j). The deformation model is linear, and all its second derivatives are therefore zero; that is the reason for the simplicity of (9.13). The partial derivatives of f_t^c in (9.12) and (9.13) can be calculated from (9.8) as a tensor product

∂f_t^c(x)/∂x_m = ∑_{k∈I_b} b_k (βⁿ)′(x_m − k_m) ∏_{l=1, l≠m}^{N} βⁿ(x_l − k_l)

Second-order partial derivatives of f_t^c are obtained in a similar fashion.
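In 1D, the chain rule (9.12) can be sketched as follows. For brevity this illustration interpolates f_t linearly, whereas the chapter's implementation uses the cubic-spline image model (9.8); the function names are our own:

```python
def bspline3(t):
    """Cubic B-spline beta^3(t); support is (-2, 2)."""
    t = abs(t)
    if t < 1.0:
        return 2.0/3.0 - t*t + t*t*t/2.0
    if t < 2.0:
        return (2.0 - t)**3 / 6.0
    return 0.0

def ssd_gradient_1d(fr, ft, c, h):
    """dE/dc_j = (1/||I||) sum_i (de_i/df_w) * (dft/dx at g(i)) * beta^3(i/h - j)
    (Eq. 9.12 in 1D).  fr: reference samples, ft: test samples,
    c: deformation coefficients, h: integer knot spacing."""
    N = len(fr)
    grad = [0.0] * len(c)
    for i in range(N):
        # warped coordinate g(i) = i + sum_j c_j beta^3(i/h - j)  (Eq. 9.10)
        g = i + sum(cj * bspline3(i / h - j) for j, cj in enumerate(c))
        k = min(max(int(g), 0), N - 2)           # linear interpolation cell
        t = g - k
        fw = (1.0 - t) * ft[k] + t * ft[k + 1]   # warped test image f_w(i)
        dft = ft[k + 1] - ft[k]                  # image derivative at g(i)
        de = 2.0 * (fw - fr[i])                  # de_i / df_w(i)
        for j in range(len(c)):
            grad[j] += de * dft * bspline3(i / h - j) / N
    return grad
```

When the two images already match, the residual vanishes and so does the gradient, as expected from (9.12).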
9.4.8.2 Hessian Approximation
Because the evaluation of the Hessian matrix from (9.13) is costly, several modifications have been devised. The Marquardt-Levenberg approximation assumes that the term ∂e_i/∂f_w(i) is negligibly small or that it sums to zero on average. This reduces (9.13) to

∂²E/∂c_{j,m}∂c_{k,n} = (2/‖I‖) ∑_{i∈I} (∂f_t^c/∂x_m)(∂f_t^c/∂x_n) (∂g_m/∂c_{j,m})(∂g_n/∂c_{k,n})        (9.14)
Another simplification is to consider only the diagonal terms ∂²E/∂c_{j,m}². Obviously, this diagonal Hessian approximation only makes sense if the basis functions φ_j do not overlap too much; this is another argument for the B-spline model. Each such approximation makes the evaluation faster at the expense of precision, which may result in slower convergence. Whether it is advantageous to use an approximation depends on many factors, including the size and character of the data.
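A 1D sketch of the approximated Hessian (9.14); the sampled derivative array `dft` and the function name are our own illustrative assumptions. Its diagonal H[j][j] is exactly the diagonal approximation discussed above:

```python
def bspline3(t):
    """Cubic B-spline beta^3(t); support is (-2, 2)."""
    t = abs(t)
    if t < 1.0:
        return 2.0/3.0 - t*t + t*t*t/2.0
    if t < 2.0:
        return (2.0 - t)**3 / 6.0
    return 0.0

def ml_hessian_1d(dft, n_c, h):
    """Marquardt-Levenberg Hessian approximation (Eq. 9.14, 1D case):
    H[j][k] = (2/||I||) sum_i dft[i]^2 * beta^3(i/h - j) * beta^3(i/h - k),
    where dft[i] is the test-image derivative at the warped position g(i).
    Dropping the second-order term of (9.13) keeps H symmetric and
    positive semi-definite."""
    N = len(dft)
    H = [[0.0] * n_c for _ in range(n_c)]
    for i in range(N):
        b = [bspline3(i / h - j) for j in range(n_c)]
        for j in range(n_c):
            if b[j] == 0.0:          # short support: most entries skipped
                continue
            for k in range(n_c):
                H[j][k] += 2.0 * dft[i]**2 * b[j] * b[k] / N
    return H
```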
9.4.8.3 Gradient Calculation
Similarly to the case of evaluating the deformation g, the use of an integer step size h leads to computational savings here, too. The expanded expression for ∂E/∂c_{j,m} can be transformed into a discrete separable convolution

∂E/∂c_{j,m} = ∑_i w(i) b(j·h − i) = ((w ∗ b)↓h)_j

where we have substituted w for the first two factors in (9.12), b(q) = β_{n_m}(−q/h), and ↓h indicates downsampling as defined by this formula, with elementwise multiplication j·h. The convolution kernel b is separable, and the convolution can be calculated as a sequence of N unidimensional convolutions ((w ∗ b₁)↓h₁ ∗ ··· ∗ b_N)↓h_N. Because of the downsampling, calculating one output value at step k consists of a scalar product with a filter b_k of length (n_m + 1)h_k and of shifting this filter by h_k.
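A 1D sketch of this downsampled-convolution evaluation (our own illustration): the kernel has (n_m + 1)h nonzero taps, precomputed once, and each output coefficient is a dot product shifted by h:

```python
def bspline3(t):
    """Cubic B-spline beta^3(t); support is (-2, 2)."""
    t = abs(t)
    if t < 1.0:
        return 2.0/3.0 - t*t + t*t*t/2.0
    if t < 2.0:
        return (2.0 - t)**3 / 6.0
    return 0.0

def grad_conv_1d(w, n_c, h):
    """dE/dc_j = sum_i w(i) * b(j*h - i) = ((w * b) downsampled by h)_j,
    with b(q) = beta^3(-q/h) = beta^3(q/h) (even symmetry); b has
    (3+1)*h nonzero taps, precomputed once for the integer grid."""
    taps = {q: bspline3(q / h) for q in range(-2 * h + 1, 2 * h)}
    out = []
    for j in range(n_c):
        s = 0.0
        for q, bq in taps.items():       # contributing samples: i = j*h + q
            i = j * h + q
            if 0 <= i < len(w):
                s += w[i] * bq
        out.append(s)
    return out
```

The result agrees with the direct (non-convolution) evaluation of the sum, but touches only the 4h samples under each shifted kernel.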
9.4.8.4 Multiresolution Spline Representation
To deploy the multiresolution strategy (see section 9.4.6.2), we need to specify expansion and reduction operators. We will use the same approach for both the deformation model and the image model.
Let us consider here a 1D signal represented in a B-spline space
f(x) = ∑_i c_i βⁿ(x − i)        (9.15)
The expansion operator E yields a twice-expanded version of f which is also a spline:

f_e = E f,   f_e(x) = f(x/2) = ∑_i d_i βⁿ(x − i)        (9.16)
with coefficients d_i given by

d = (c↑2) ∗ uⁿ        (9.17)
where c↑2 denotes the upsampled version of c and uⁿ is a symmetrical binomial filter defined in the z-domain as

Uⁿ(z) = ((1 + z)^{n+1} / 2ⁿ) · z^{−(n+1)/2}        (9.18)
The twice-reduced signal f(2x) cannot, in general, be represented as a spline with knots at the integers. We need to resort to approximation, and we have chosen L₂ optimality, as described in [109]. The reduction operator R yields a projection (denoted P₁) onto the original spline space with step size 1:

f_r = R f,   f_r(x) = P₁ f(2x) = ∑_i e_i βⁿ(x − i)        (9.19)
The spline coefficients e_i are calculated as

e = ((h̊ ∗ c)↓2) ∗ (b^{2n+1})^{−1}        (9.20)

with the prefilter h̊ = b^{2n+1} ∗ uⁿ, where b^{2n+1} is the sequence of sampled values of a B-spline of degree 2n + 1, b^{2n+1}(i) = β^{2n+1}(i). Finally, (b^{2n+1})^{−1} is the inverse filter of b^{2n+1}, and the corresponding convolution can be handled by recursive filtering, as described in [8, 9].
Because R is a projection complementary to E, we have the projection identity RE f = f . Extension of both operators to multiple dimensions is trivial thanks to separability.
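For cubic splines (n = 3), the binomial filter (9.18) is u³ = [1, 4, 6, 4, 1]/8, and the expansion step (9.17) becomes upsampling by two followed by this filter. A small self-contained sketch (the index convention is our own) that realizes the exact identity f_e(x) = f(x/2):

```python
def bspline3(t):
    """Cubic B-spline beta^3(t); support is (-2, 2)."""
    t = abs(t)
    if t < 1.0:
        return 2.0/3.0 - t*t + t*t*t/2.0
    if t < 2.0:
        return (2.0 - t)**3 / 6.0
    return 0.0

def expand(c):
    """d = (c upsampled by 2) * u^3, u^3 = [1,4,6,4,1]/8  (Eqs. 9.17-9.18).
    Returned index j corresponds to the spline centered at j - 2, so that
    f_e(x) = sum_j d[j] * beta^3(x - (j - 2)) equals f(x/2) exactly."""
    u3 = [1/8, 4/8, 6/8, 4/8, 1/8]
    d = [0.0] * (2 * len(c) + len(u3) - 1)
    for i, ci in enumerate(c):
        for k, uk in enumerate(u3):      # kernel centered on the even sample
            d[2 * i + k] += ci * uk
    return d
```

The identity f_e(x) = f(x/2) holds for every x, which is precisely the two-scale spline relation the chapter invokes.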