Chapter 17

Image Analysis, Information Theory and Prosthetic Vision

Luke E. Hallum and Nigel H. Lovell

L.E. Hallum (*)
Graduate School of Biomedical Engineering, University of New South Wales, ANZAC Parade, Sydney 2052, Australia
and
Center for Neural Science, New York University, New York, NY 10003, USA
e-mail: hallum@cns.nyu.edu

Abstract  Recent years have seen markedly improved clinical outcomes in cochlear implantees. This improvement is largely attributed to improvements in speech processing algorithms. In light of these improvements, researchers are prompted to ask, “Could image analysis improve clinical outcomes in retinal implantees?” We discuss our approach to image analysis, microelectronic retinal prostheses, and the perception of low-resolution images, which we believe can be used to help constrain the design of an implant. We hope that our approach, and developments thereof, will ultimately contribute to improved clinical outcomes in retinal implantees.

Abbreviation

APRL Artificial preferred retinal locus

17.1  Introduction

It makes intuitive sense that the cochlear implant involves a speech processor. This processor is typically implemented in programmable microelectronics that the subject wears behind the ear; it lies between the device microphone and the electrode array implanted in the subject’s cochlea. The processor analyzes incoming acoustic signals, deriving parameters that determine the current waveforms injected at each electrode. Recent years have seen markedly improved clinical outcomes in cochlear implantees. This improvement is largely attributed to improvements in speech processing algorithms [9]. In light of these improvements, researchers are prompted to ask, “Could image analysis – analogous to the cochlear implant’s speech processing – improve clinical outcomes in retinal implantees?” This chapter is concerned with image analysis and the perception of low-resolution images.

In this chapter we review two studies of ours [6, 8]. In the first study, prosthetic vision was simulated via low-resolution images that we refer to as “phosphene images.” Normally sighted subjects were required to track a moving target. The phosphene images were rendered using three image analysis schemes, and we examined the effects of each scheme on subjects’ tracking performance. The second study was numerical. We used information theory to quantify the amount of information contained in the phosphene images, and how different image analysis schemes affect this information. This numerical approach ultimately served as a reasonably good model of subjects’ tracking performance in [8]. Further, we used this numerical approach to suggest how image analysis schemes could be further improved.

The two above-mentioned studies help constrain the design of a retinal prosthesis. They directly address the topic of image analysis and its integration with a retinal implant. They effectively ask, “How should an implant process incoming images?” This contribution is taken up in Sect. 17.5. There, we also discuss generalizing the approach to visual tasks beyond fixation, saccade, and pursuit – for example, visual acuity tasks and reading. We begin this chapter by situating image analysis, functionally speaking, with respect to the prosthetic device as a whole, and by describing the experimental framework within which low-resolution perception is investigated.

17.2  Situating Image Analysis

The major components of a retinal prosthesis, as it is envisioned and prototyped by a number of groups [4], are the camera, the image analyzer, the communication link, and the implant, as discussed elsewhere in this volume. Physically, the image analyzer is external to the body and, functionally, lies between the acquisition of high-resolution images of the world (“scenes”) and the generation of signals for transmission to the implanted component of the device (via the communication link). The camera, worn on spectacles by the subject, captures high-resolution spatiotemporal images of the real world. The image analyzer is implemented in programmable microelectronics, a digital signal processor, worn on the body. It processes data captured by the camera, effectively converting scenes to an array of numbers that determine the current waveforms at each electrode. These numbers are delivered to the implant by the communication link. As with the cochlear implant, this link ideally comprises radio-frequency signals transmitted transcutaneously; as opposed to a percutaneous link, a radio-frequency link poses a lesser risk of infection and allows the external unit to be physically uncoupled from the body. The implanted analog electronics decode the incoming radio-frequency signals and drive an array of electrodes. These electrodes are embedded in a flexible substrate that is affixed to the inner layer of the retina. Recent clinical trials involve arrays of 60 electrodes (Argus™ II, http://clinicaltrials.gov/ct2/show/NCT00407602).

Here we will step the reader through the processing stream that generated the phosphene image in Fig. 17.1. First, the high-resolution image (left panel) was filtered.

Fig. 17.1 A basic simulation of prosthetic vision. The high-resolution image (left) is filtered and subsampled. The samples are used to modulate the sizes of luminous phosphenes which, together, comprise the phosphene image (right). Image source: http://pics.psych.stir.ac.uk/

This filtering anti-aliases the resulting phosphene image, and may be used to emphasize salient features of the high-resolution image. Second, the filtered image was sampled at 400 locations, each pertaining to a phosphene. Each of these samples was then quantized to one of 16 levels, and used to modulate the size of its corresponding phosphene. Quantization makes for more accurate simulation since, as with the cochlear implant, the microelectronic retinal prosthesis is likely to elicit only a small number of different percepts at each location in the visual field.
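To make the processing stream concrete, the following Python sketch implements the filter–sample–quantize–render pipeline just described. It is our minimal illustration, not the authors’ implementation: the square 20 × 20 grid (giving 400 phosphenes), the choice of a Gaussian pre-filter, and the rendering of phosphenes as filled discs are all assumptions made for the example.

```python
# Minimal sketch of the filter -> sample -> quantize -> render pipeline.
# Assumptions (not from the chapter): a 20 x 20 phosphene grid (400 samples),
# a Gaussian pre-filter, and phosphenes drawn as filled discs.
import numpy as np
from scipy.ndimage import gaussian_filter

def phosphenize(image, grid=20, levels=16):
    """Convert a high-resolution grayscale image (2-D array in [0, 1])
    into a size-modulated phosphene image."""
    h, w = image.shape
    pitch = h // grid                      # phosphene-to-phosphene spacing (pixels)
    # 1. Pre-filter: anti-alias before subsampling.
    filtered = gaussian_filter(image, sigma=pitch / 3.0)
    # 2. Sample at one location per phosphene.
    ys = np.arange(grid) * pitch + pitch // 2
    xs = np.arange(grid) * pitch + pitch // 2
    samples = filtered[np.ix_(ys, xs)]
    # 3. Quantize each sample to one of `levels` discrete values.
    q = np.floor(samples * levels).clip(0, levels - 1) / (levels - 1)
    # 4. Render: each quantized sample modulates the radius of a luminous disc.
    out = np.zeros_like(image)
    yy, xx = np.mgrid[0:h, 0:w]
    for i, cy in enumerate(ys):
        for j, cx in enumerate(xs):
            r = q[i, j] * pitch / 2.0      # size-modulated radius
            out[(yy - cy) ** 2 + (xx - cx) ** 2 <= r ** 2] = 1.0
    return out
```

Given a 400 × 400-pixel grayscale input, this sketch produces a 20 × 20 array of luminous discs whose radii take one of 16 discrete sizes, in the manner of the right panel of Fig. 17.1.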

17.3  The Experimental Framework

The right-hand panel in Fig. 17.1 shows a phosphene image of a face. This image depicts the sort of vision that the microelectronic retinal prosthesis aims to provide to the otherwise profoundly blind implantee, that is, a relatively small number of discrete, luminous blobs. Phosphene images like this one, and also phosphene images that vary over time, may be used in psychophysical experiments and presented to normally sighted observers. These experiments draw on well-established psychophysical methods in vision research wherein perception is inferred through the measurement of subjects’ behavior. For example, the reading speed of subjects presented with low-resolution, phosphenized text could be measured. In this way, experimenters may use simulation data to predict clinical outcomes in actual implantees. This approach, which we call “visual modeling,” is analogous to “acoustic modeling” of cochlear implants wherein colored noise bands, modulated by speech, or in recent cases by music, are played to normally hearing listeners. These listeners form a more readily accessible cohort than one comprising actual cochlear implantees. The behavior of these normally hearing listeners is then used to investigate improved signal processing strategies and electrode array designs.

Visual modeling is an experimental framework that allows us to test predictions concerning image analysis, microelectronic retinal prostheses, and the perception of low-resolution images (for discussion, see [7]). It has been used extensively in the field since the early work of Cha [2], who argued that “three parameters are important in determining the quality of a pixelized image: the number of pixels, their density, and their range of intensities.” By contrast, we are using the approach to test image analysis schemes.

17.4  Tracking a Low-Resolution Target

Several years ago we wondered whether image analysis could improve the performance of the phosphene image observer [8]. To this end, we conducted a visual modeling experiment that compared the effects of three different image analysis schemes on subjects’ performance. The task involved fixating, saccading to, and pursuing a small, high-contrast target. The first image analysis scheme that we investigated, referred to as scheme Q0, was trivial: images were not preprocessed but simply down-sampled. This scheme allowed for the comparison of Cha’s results [2] with our own. The second scheme (Q1) blurred images using a uniform-intensity filter kernel, that is, the spatial equivalent of a boxcar filter. The spatial width of the kernel was identical to the spatial separation of phosphenes. This scheme allowed for the comparison of our results with the widespread approach to visual modeling that uses uniform-intensity kernels [5, 10]. The third scheme (Q2) involved pre-filtering with a Gaussian kernel. The standard deviation of the kernel was equal to one-third of the separation of phosphenes. We were interested in this kernel owing to the nature of a Gaussian: it is compact in both the spatial and Fourier domains, and is often used to model components in the early visual system of mammals. The standard deviation of this Gaussian, however, was chosen without quantitative reasoning. We hypothesized that scheme Q2 would afford subjects improved performance as compared to Q1.
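For concreteness, the three pre-filters can be sketched as discrete kernels. The code below is our own gloss on the schemes as described above; the pixel pitch of the phosphene array and the truncation of the Gaussian at three standard deviations are assumptions of the sketch.

```python
# Sketch of the three pre-filter kernels, for a phosphene separation of
# `pitch` pixels. Truncating the Gaussian at three standard deviations
# is our assumption.
import numpy as np

def kernel_Q0(pitch):
    """Q0: no pre-filtering; sampling alone (a 1 x 1 identity kernel)."""
    return np.array([[1.0]])

def kernel_Q1(pitch):
    """Q1: uniform-intensity ("boxcar") kernel, width = phosphene separation."""
    k = np.ones((pitch, pitch))
    return k / k.sum()

def kernel_Q2(pitch, sigma_ratio=1 / 3):
    """Q2: Gaussian kernel, sigma = one-third of the phosphene separation."""
    sigma = sigma_ratio * pitch
    half = int(np.ceil(3 * sigma))
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    k = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    return k / k.sum()
```

Each kernel is normalized to unit sum, so filtering preserves mean luminance; convolving a scene with the kernel and then subsampling at the phosphene locations yields the samples described in Sect. 17.2.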

For complete details of the experiment the reader is referred to the original publication [8]. Here we summarize. A computer monitor was viewed from a fixed distance by 20 subjects, each trained for 3 h. A hexagonal array of 23 phosphenes was freely viewed. Phosphenes were separated by approximately 1°, and therefore excited retinal loci separated by approximately 300 µm. Phosphenes were size-modulated (see Fig. 17.1) and of fixed intensity (white). The phosphene array was moveable via a joystick; subjects were required to track a moving target (a small, white square on a black background) using the central phosphene of the array. The target initially appeared in the center of the monitor (for fixation), and after a short, random interval it jumped (eliciting a visuomanual saccade in the subject) and then described a randomly generated, S-shaped course into the monitor’s periphery (eliciting pursuit) at an average velocity of approximately 2°/s. Trials were counterbalanced so as to assess tracking for each of Q0, Q1, and Q2. Note that the setup, whilst good for examining the issues at hand, measured visuomanual behavior. Ultimately, prosthetic visual fixation, saccade, and smooth pursuit will likely involve head movements (of a head-mounted camera) and a stabilized retinal image. Thus we take several of the statistics of the measured behaviors as a model for what will ultimately involve head motion and stabilized retinal images.

Our data showed that, indeed, image analysis had a significant effect on the tracking performance of subjects. After practice, schemes Q1 and Q2 made for superior performance in all tasks (fixation, saccade, and pursuit) as compared to Q0. Scheme Q2 made for improved fixation accuracy of 35.8% (8.3 min of visual arc) as compared to scheme Q1 (as measured by mean deviation from the target), and for improved pursuit accuracy of 6.8% (3.3 min of visual arc). Schemes Q1 and Q2 made for comparable saccade accuracy. These results suggest that image analysis, when functionally integrated with a prosthetic device, can be programmed so as to result in improved visual outcomes in implantees. Furthermore, the results advocate a scheme of Gaussian kernels for fixation- and pursuit-related tasks.

The scanning data from the above-described experiment, that is, the way that subjects moved the phosphene array relative to the moving target, were also of interest. These data made us think about preferred retinal loci, and whether implantees would use some phosphenes in their visual field in preference to others. Hence, we coined the term “artificial preferred retinal locus” (APRL).¹ In the case of scheme Q0, which afforded subjects poor tracking accuracy, subjects adopted nystagmus-like scanning behaviors, that is, vigorous and wide-ranging scanning. This was apparently an attempt to effectively increase the spatial sampling rate of the array and in doing so render the phosphene image more informative as to the moving target’s location. In this case, the associated APRL may be modeled as a bivariate function, centered on the array’s central phosphene. That function is uniform in intensity, covering an area that encompasses most of the 23-phosphene array.

¹ For example, see the way in which sufferers of scotoma develop new preferred retinal loci in the vicinity of the fovea [11].

The scanning associated with scheme Q1 was similar to that of Q2. For those schemes, subjects adopted scanning that had dynamics approximately equal to the target’s motion. In terms of an APRL, the bivariate function was centered on the central phosphene of the array, and was normal with a standard deviation equal to approximately 0.475°. Since tracking for Q2 was more accurate than tracking for Q1, the APRL associated with scheme Q2 was relatively narrow. See Fig. 17.2. We believe that scanning and modeling the APRL have important implications for developing a model of the phosphene observer, which we discuss further below.

Fig. 17.2 Raw and analyzed scanning data from a single trial [8]. (a) Subjects were required to track a small, moving target with a phosphene array. The target’s motion is shown by the thin, solid line. The target was initially stationary at the center of the monitor. The target then stepped (down and right by approximately 2°) before following an S-shaped curve to the monitor periphery. The thick, dotted line shows the subject’s tracking, that is, the location of the center of the phosphene array; each dot shows the position of the array center at successive points in time. (b) The scanning signal (target position minus tracking position) from the trial in (a). The average of this signal across trials and subjects is well modeled by a bivariate normal, indicated by the histograms for this trial
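Concretely, the APRL model for schemes Q1 and Q2 is an isotropic bivariate normal density over scanning offsets, centered on the array’s central phosphene. A minimal sketch, assuming the standard deviation of approximately 0.475° reported above:

```python
# Sketch of the APRL model: an isotropic bivariate normal density over
# scanning offsets (target position minus tracking position), in degrees.
import numpy as np

def aprl_density(dx, dy, sigma=0.475):
    """Probability density of a scanning offset (dx, dy), in degrees,
    under the bivariate normal APRL model."""
    return np.exp(-(dx ** 2 + dy ** 2) / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)

# Example: density at the central phosphene versus one phosphene away (~1 deg).
print(aprl_density(0.0, 0.0))   # peak of the APRL
print(aprl_density(1.0, 0.0))   # ~1 deg away: far less scanning weight
```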

Subsequent to the above-described experiment, we wondered, “Is scheme Q2 somehow optimal?” To address this question, we drew upon techniques in communication theory [6]. Specifically, we used the mutual-information function to measure the amount of information contained in phosphene images, and how that information differed with different image analysis schemes. We used a numerical setup. We presented targets to the phosphene image in a way that was consistent with the scanning behaviors, that is, the APRLs, found in [8]. These mutual-information measurements were then reconciled with the tracking performance. We found that, to an extent, the mutual-information function was a good model of tracking performance.

As a quick aside, the following paragraphs canvass the mutual-information function [1]. The mutual-information function is typically applied to communication channels, like the one shown in Fig. 17.3. There, a time-varying signal, x(t), is input to a noisy channel, Q, and output in modified form as y(t). The mutual-information function can be used to measure the information that y(t) carries about x(t). In other words, after receiving the signal y(t), what is consequently known about the signal x(t)? Mutual information is typically used to assess the nature of the channel, Q.

The mutual-information function is written as follows:

I(p(x(t)); Q) = H(y(t)) − H(y(t) | x(t)).

Fig. 17.3 A simple communication channel. The input signal, x(t), to the noisy channel, Q, results in the output signal, y(t)

The symbols I and H may be read as “information” and “uncertainty,” respectively. Note that information is a function of the probability density that describes the input, p(x(t)). Information is also a function of Q, that is, the nature of the channel. Information is equal to the receiver’s reduction in uncertainty regarding the channel input after having received the channel output. (The term H(y(t) | x(t)) may be read as “the uncertainty in the output given the input”; by the symmetry of mutual information, H(y(t)) − H(y(t) | x(t)) equals H(x(t)) − H(x(t) | y(t)), the reduction in uncertainty about the input.)

To illustrate the use of the mutual-information function, and its interpretation, consider the following example. A discrete random series, X, may take values a, b, c or d. All four values occur independently and with equal probability ( p = 0.25). The series X generates an output series, Y. Let’s suppose that inputs a and b generate output symbol e, and inputs c and d generate output symbol f. That is the nature of the channel, Q, which maps inputs to outputs. Using the mathematical definition of uncertainty, H, we can show that the information, I( p(X ); Q), contained in each output symbol is 1 bit. That is, I( p(X ); Q) = 1 bit/symbol. One bit of information effectively affords the receiver one “yes/no” decision. In other words, 1 bit of information halves his/her uncertainty regarding the input, X. This makes intuitive sense: After receiving the value f, he would consequently know that either c or d were input to the channel. Prior to receiving f, he could do no better than guess that the input was a, b, c or d. Upon receiving f, his uncertainty was halved. For this example, the information carried by an output symbol can be increased if the specificity of the channel, Q, is increased. In other words, it is desirable if the output describes the input with less ambiguity. If inputs a and b generate the output e, whilst c generates f and d generates g, then we can arrive at an average information measure of 1.5 bits/symbol.
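The worked example is easily verified numerically. The following sketch (our code, not the chapter’s) computes I(p(X); Q) = H(Y) − H(Y | X) for a discrete channel specified by an input distribution and a matrix of conditional probabilities:

```python
# Verify the worked example: mutual information over a discrete channel,
# I(p(X); Q) = H(Y) - H(Y|X), with p(X) uniform over inputs a, b, c, d.
import numpy as np

def mutual_information(p_x, Q):
    """p_x: input probabilities, shape (n_inputs,).
    Q: channel matrix; Q[i, j] = P(output j | input i)."""
    p_y = p_x @ Q                                    # output distribution
    H_y = -np.sum(p_y[p_y > 0] * np.log2(p_y[p_y > 0]))
    H_y_given_x = 0.0                                # conditional uncertainty H(Y|X)
    for p_i, row in zip(p_x, Q):
        r = row[row > 0]
        H_y_given_x -= p_i * np.sum(r * np.log2(r))
    return H_y - H_y_given_x

p_x = np.full(4, 0.25)                               # inputs a, b, c, d

# Channel 1: a, b -> e; c, d -> f.
chan_ef = np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]])
print(mutual_information(p_x, chan_ef))              # 1.0 bit/symbol

# Channel 2 (more specific): a, b -> e; c -> f; d -> g.
chan_efg = np.array([[1., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
print(mutual_information(p_x, chan_efg))             # 1.5 bits/symbol
```

Because both example channels are deterministic, H(Y | X) is zero and the information reduces to the output entropy H(Y); the noisy-channel case is handled by the same function.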

The simple communication channel of Fig. 17.3 is analogous to the above-mentioned process that converts high-resolution images of a small, moving target to phosphene images. In that process, the target varies in space, x(s), where s is a vector denoting space. The target is input to the image analysis scheme, Q, which, here, is a spatial filter, not a temporal one. The phosphene image, y(s), is analogous to an output symbol, and is rendered according to the output of the analysis scheme, Q. We developed this analogy in [6] by using the scanning model, that is, the APRL, as p(x(s)). See Fig. 17.4.

Fig. 17.4 The set-up of the numerical experiment involving a seven-phosphene image [6]. The high-resolution image (left) comprises a small, high-contrast target. The image is analyzed by the image analysis scheme (middle). The scheme comprises seven identical, Gaussian filter kernels; the circles indicate the first and second standard deviations of the kernels. Each kernel operates on the image and produces a response. Each response modulates the size of a phosphene comprising the phosphene image (right). In the example shown, the target activates two phosphenes since its location is “seen” by those two phosphenes, but no other phosphenes. For clarity, the phosphenes that are not activated are indicated with dashed outlines

We found that image analysis scheme Q2 imparts more information to the phosphene image than Q1 [6]. Specifically, Q2 imparts approximately 5 bits/image as compared to Q1’s 1 bit/image. This improvement of 4 bits/image is the case when the target is presented to the phosphene image in a spatial pattern corresponding to the scanning behavior found in [8]. Those scanning behaviors were modeled by a bivariate normal with standard deviation equal to 0.25 times the phosphene-to-phosphene spacing. This finding provides some theoretical reasoning for subjects’ accuracy in [8]: Q2 afforded better fixation and pursuit accuracy because it provides 4 bits/image more information to the phosphene image observer.

Furthermore, our numerical experiment suggested that the image analysis scheme Q2 can be improved [6]. Specifically, the prediction is that, by using Gaussian kernels with standard deviation equal to 0.6 times the separation of phosphenes, performance will be further improved. This new scheme would make for approximately 8 bits of information in the phosphene image, assuming that subjects’ scanning behaviors were unchanged. Indeed, the numerical experiment provides a prediction: that an image analysis scheme comprising Gaussian kernels with standard deviation 0.6 times the separation of phosphenes should afford superior tracking as compared to Q2. Experimental work that examines this prediction, and similar predictions, will be the subject of future work.
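The logic behind this prediction can be sketched numerically. The following is our own miniature of the set-up in Fig. 17.4, with several free choices not fixed by the text (the grid of candidate target locations, its extent, and the treatment of the location-to-image mapping as noiseless, so that I = H(Y)); its absolute bit values should not be expected to reproduce those reported in [6], but it shows how information per phosphene image can be computed as a function of kernel width:

```python
# Mutual information between a point target's location and the quantized
# seven-phosphene image, as a function of Gaussian kernel width. A sketch
# of the method only; layout, grid, and quantization choices are ours.
import numpy as np
from collections import defaultdict

SPACING = 1.0                                  # phosphene separation (deg)
angles = np.arange(6) * np.pi / 3              # hexagon around a central phosphene
CENTERS = np.vstack([[0.0, 0.0],
                     np.column_stack([np.cos(angles), np.sin(angles)])]) * SPACING

def bits_per_image(sigma_ratio, levels=16, grid=61, extent=1.5, aprl_ratio=0.25):
    """Information (bits/image) for Gaussian kernels of standard deviation
    sigma_ratio * SPACING, with target locations weighted by a bivariate
    normal APRL prior (standard deviation aprl_ratio * SPACING)."""
    sigma = sigma_ratio * SPACING
    aprl_sigma = aprl_ratio * SPACING
    xs = np.linspace(-extent, extent, grid)
    p_y = defaultdict(float)                   # probability mass per phosphene image
    for x in xs:
        for y in xs:
            prior = np.exp(-(x ** 2 + y ** 2) / (2 * aprl_sigma ** 2))
            # Each kernel "sees" the point target through its Gaussian profile.
            resp = np.exp(-((CENTERS[:, 0] - x) ** 2 + (CENTERS[:, 1] - y) ** 2)
                          / (2 * sigma ** 2))
            symbol = tuple(np.minimum((resp * levels).astype(int), levels - 1))
            p_y[symbol] += prior
    p = np.array(list(p_y.values()))
    p /= p.sum()
    # The location-to-image mapping is noiseless here, so H(Y|X) = 0 and I = H(Y).
    return -np.sum(p * np.log2(p))

for ratio in (1 / 3, 0.6):                     # scheme Q2, and the predicted optimum
    print(f"sigma = {ratio:.2f} x spacing: {bits_per_image(ratio):.2f} bits/image")
```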

How are these measures of information to be interpreted? As discussed above, 1 bit of information affords the phosphene image observer a single “yes/no” decision. For localizing a target, a single bit of information effectively allows the observer to divide the visual field into two halves and ask, “Which half of the visual field contains the target?” Therefore, when considering scheme Q1, affording the phosphene image observer 1 bit of information per image is akin to informing the observer whether the target lies within the left half or the right half of the area covered by the phosphene array. This being the case, we would expect the observer to deviate, on average, from the target by approximately one-quarter of the diameter of the array. In contrast, scheme Q2, affording the phosphene image observer 5 bits of information per image is akin to informing