A Computational Model for Color Perception

M. Ebner

Abstract

Color is not a physical quantity of an object. It cannot be measured. We can only measure reflectance, i.e. the amount of light reflected for each wavelength. Nevertheless, we attach colors to the objects around us. A human observer perceives colors as approximately constant irrespective of the illuminant which is used to illuminate the scene. Colors are a very important cue in everyday life. They can be used to recognize or distinguish different objects. Currently, we do not yet know how the brain arrives at a color constant or approximately color constant descriptor, i.e. what computational processing is actually performed by the brain. What we need is a computational description of color perception in particular and color vision in general. Only if we are able to write down a full computational theory of the visual system will we have understood how the visual system works. With this contribution, a computational model of color perception is presented. This model is much simpler than previous theories. It is able to compute a color constant descriptor even in the presence of spatially varying illuminants. According to this model, the cones respond approximately logarithmically to the irradiance entering the eye. Cells in V1 perform a change of the coordinate system such that colors are represented along a red-green, a blue-yellow and a black-white axis. Cells in V4 compute local space average color using a resistive grid, which is formed by cells in V4. The left and right hemispheres are connected via the corpus callosum. A color constant descriptor, which is presumably used for color-based object recognition, is computed by subtracting local space average color from the cone response within a rotated coordinate system.

KEYWORDS: Color Perception, Computational Theory, Color Constancy, V4

1. Introduction

Color is not a physical quantity which can be measured. Yet we attach it to the objects around us. A human observer perceives colors as approximately constant irrespective of the illuminant which is used to illuminate the scene. Colors are a very important cue in everyday life. We use colors to recognize or distinguish different objects. Some colors, e.g. red, are used to focus attention (ripe fruit) or to communicate important messages, e.g. an immediate danger. Color would not be useful as a signaling mechanism if the perceived color of an object varied with the color of the illuminant. The color of an object would not even stay constant during the course of the day because the color temperature of daylight varies during the day. That is why we need a mechanism for color perception which somehow computes a color constant descriptor from the light which is reflected from an object.

Several theories of color perception have been put forward. However, many are basically phenomenological descriptions of what color vision does, phenomenological in the sense that these theories do not explain how the computations are actually performed by the brain. What we need is a computational description of color perception, i.e. down to the neural level (Ebner 2007c). We have only understood how the visual system works if we are able to write down a full computational theory of this system. With this contribution we provide a computational theory of color perception which can be mapped to what is known about the visual system.
The main contributions of this article are (a) to present a computational model for color constancy, (b) to show how the individual stages of this model are mapped to what is known about the human visual system and (c) to summarize which visual phenomena are explained by this model. We will first provide some background on the theory of color image formation, followed by a brief review of several important color constancy algorithms from the machine vision community. We then show how a local estimate of the illuminant can be computed iteratively using either a grid of processing elements (neurons) or a resistive grid. This estimate is obtained by computing local space average color. Finally, we show how one can apply the concept of the gray world assumption (Buchsbaum 1980), a well known color constancy algorithm, within the context of color shifts (Ebner 2004a). The use of color shifts by the human visual system is supported by psychophysical experiments. Our results lead to a computational theory of color perception.

2. Theory of Color Image Formation

In order to understand how this computational theory works, we first have a look at how a color image is formed. Suppose that we are looking at an object which is illuminated by a light source. The incident light is reflected with varying amounts depending on the wavelength of the incident light. We can measure the reflected light using a measuring device such as a digital or analog camera or a spectrometer. Let $R(\lambda)$ be the percentage of light reflected at wavelength $\lambda$ and let $E(\lambda)$ be the irradiance at wavelength $\lambda$; then the reflected light is given as

$$I(\lambda) = R(\lambda)\, E(\lambda).$$

The reflected light varies with the amount of incident light $E(\lambda)$. If we measure the reflected light over the visible spectrum and also know the irradiance, we can compute the reflectance $R(\lambda)$ for each wavelength $\lambda$. This signature is a physical quantity of the object.

When we look at an object, light which is reflected from the object enters the eye and is measured by the receptors inside the retina. Two types of receptors exist, rods and cones (Dowling 1987). The rods are used when very little light is available. They have a much higher sensitivity than cones (Fain and Dowling 1973). The cones are used in bright light conditions. Three different types of cones can be distinguished which respond to light in the short, middle and long parts of the spectrum (Brown and Wald 1964; Marks et al. 1964). The blue, green and red cone pigments peak at 419 nm, 531 nm, and 559 nm respectively. There appears to be some variance in the sensitivities of the red and green cones.

In order to develop a computational theory of color perception, it pays to take a look at machine vision. Note that in machine vision one usually tries to estimate reflectance, whereas in computational modeling of color perception one tries to replicate how colors appear to a human observer. Even though human color perception correlates with integrated reflectance (McCann et al. 1976), the two problems are quite different. If we take a photograph of a scene, the digital camera measures the energy of the light which is reflected from the objects contained in the scene in three different parts of the spectrum. The energy is measured in the short (blue), middle (green) and long (red) parts of the spectrum. Analog film can also be considered to be a measuring device with which light is measured in three parts of the spectrum. The measured energy depends on the sensitivity of the receptor or sensor.
Let $S_i(\lambda)$ be the sensitivity of sensor $i$ with $i \in \{r,g,b\}$; then the measured energy $Q_i$ is essentially given by

$$Q_i = \int S_i(\lambda)\, I(\lambda)\, d\lambda.$$

The response curves of an artificial sensor are usually modeled to have similar response characteristics as the receptors of the human retina. Once the reflected light is measured using three sets of cones or three types of artificial sensors, the result is a point in a three-dimensional space. We refer to this point as the cone response. The position of this point inside this three-dimensional space varies with the type of illuminant used to illuminate the object.

Suppose that we are looking at a wall which reflects the incident light uniformly across all wavelengths, henceforth referred to as a white wall. If we illuminate the wall using white light, i.e. the irradiance is uniform over all wavelengths, then all of the sensors will respond equally strongly. Suppose that we now illuminate the same wall using light from a candle. A candle emits light mostly in the red and green parts of the spectrum. The sensor covering the red part of the spectrum will respond very strongly, the sensor covering the green part of the spectrum will also respond and the sensor covering the blue part of the spectrum will hardly respond at all. Thus, the same wall will have a completely different color. In the first case, the white wall will appear to be white whereas in the second case the white wall will appear to be yellow or orange. We have just seen that the cone response varies with the type of illuminant used.

In our model, a sensor at position $(x,y)$ is used to measure the energy $Q$ of the reflected light at wavelength $\lambda$. This energy is proportional to the object reflectance $R(x,y,\lambda)$ and is also proportional to the irradiance $E(x,y,\lambda)$ falling onto the object depicted at position $(x,y)$, i.e. we have

$$Q(x,y,\lambda) \propto R(x,y,\lambda)\, E(x,y,\lambda)$$

for wavelength $\lambda$. The response of a sensor is obtained by also taking into account the sensitivity of the sensor and by integrating over all wavelengths to which the sensor responds:

$$Q_i(x,y) = \int S_i(\lambda)\, R(x,y,\lambda)\, E(x,y,\lambda)\, d\lambda.$$

Let us first suppose that the cone response $c_i(x,y)$ depends linearly on the measured energy, i.e. we assume that

$$c_i(x,y) = Q_i(x,y).$$

The cone response actually seems to depend logarithmically on the measured energy. We will get back to that later on.

The cone response, i.e. the measured energy, varies with the type of illuminant used. This is apparent to many amateur photographers all around the world. The images produced by a digital camera sometimes do not show the colors the same way a human observer perceived them. Sometimes the images have a very strong color cast because the automatic white balance did not estimate the color temperature of the illuminant correctly. Professional photographers are well aware of how the illuminant changes the overall look of an image. In the times of analog photography, filters were used to change the color balance (Hedgecoe 2004; Jacobsen et al. 2000). Today, the application of different filters can be simulated by either setting the white balance on the camera or during post-processing. With post-processing, it is possible to change the color balance such that the resulting photograph looks more natural, i.e. as if it were taken under a canonical illuminant with a uniform power distribution. The color observed by a human observer stays remarkably constant irrespective of the illuminant used (Zeki 1993). Similarly, the lightness of an object also appears to be constant (Adelson 2000).
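The white wall example above can be made concrete with a short numerical sketch of the sensor model $Q_i = \int S_i(\lambda) R(\lambda) E(\lambda)\, d\lambda$. The Gaussian sensitivities and the candle-like spectrum below are illustrative assumptions, not measured data; the real cone fundamentals are broader and asymmetric.

```python
import numpy as np

lam = np.linspace(400.0, 700.0, 301)   # visible wavelengths in nm
dlam = lam[1] - lam[0]

def sensitivity(peak, width=40.0):
    # Hypothetical bell-shaped sensitivity S_i(lambda).
    return np.exp(-0.5 * ((lam - peak) / width) ** 2)

# Stand-ins for the blue, green and red pigments peaking at
# 419 nm, 531 nm and 559 nm.
S = {"b": sensitivity(419.0), "g": sensitivity(531.0), "r": sensitivity(559.0)}

R = np.ones_like(lam)   # white wall: uniform reflectance

E_white = np.ones_like(lam)                           # uniform irradiance
E_candle = np.clip((lam - 450.0) / 250.0, 0.0, 1.0)   # crude reddish spectrum

for name, E in (("white light", E_white), ("candle light", E_candle)):
    # Q_i = integral of S_i(lambda) R(lambda) E(lambda) d lambda
    Q = {i: float(np.sum(S[i] * R * E) * dlam) for i in "rgb"}
    print(name, {i: round(Q[i], 1) for i in "rgb"})
```

Under the uniform illuminant all three responses are equal; under the candle-like spectrum the blue response nearly vanishes, which is why the same white wall would appear yellow or orange.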
The ability to perceive colors as approximately constant is called color constancy (Ebner 2007a). Going back to our example of the white wall, a human observer will say that the wall is white even if more light is reflected in the red and green parts of the spectrum than in the blue part of the spectrum. Land (1964; 1974) investigated this phenomenon in great detail.

It is obviously of great interest to develop a computational model of the brain (Koch 1999) in general and of color perception in particular. If we had such a model, we could use it to improve color reproduction in digital photography. We could also use it for automatic object recognition in robotics or computer vision. We need to derive an algorithm which computes a color constant descriptor, i.e. one which remains constant (or at least approximately constant) irrespective of the illuminant used. This algorithm has to be mapped to what is known about the visual system. A number of theoretical models have been proposed for color perception, e.g. (Judd 1940; Richards and Parks 1971; Dufort and Lumsden 1991). However, such psychophysical models of color perception do not explain how or why the color perceived by an observer would depend on either average apparent reflectance or average luminance. Instead of a phenomenological description of color vision we need a computational theory of color vision (Marr 1982) which allows us to simulate color perception computationally.

Since color constancy is very important for the machine vision community and also for the image processing community, quite a large number of color constancy algorithms have been developed. Ebner (2007a) gives a detailed introduction to the field of computational color constancy. Surveys are also given by Maloney (1999) and Agarwal et al. (2006). Land and McCann (1971) developed the Retinex theory. Since its original inception many variants of the Retinex theory have been proposed (Horn 1974; Land 1986; Hurlbert 1986; Blake 1985; Rahman et al. 1999). Buchsbaum (1980) developed the gray world hypothesis. Van de Weijer et al. (2007) generalized the gray world assumption and introduced the gray edge hypothesis. Additional algorithms include gamut constraint methods (Forsyth 1988; Finlayson 1996; Barnard et al. 1997), color cluster rotation (Paulus et al. 1998), comprehensive normalization (Finlayson et al. 1998) as well as the computation of intrinsic images (Finlayson et al. 1998; Tappen et al. 2002). It appears that human color constancy also uses cues from mutual reflections or specular highlights (Hurlbert 1999). Algorithms based on a dichromatic reflection model, which also takes specular reflection into account, have been developed as well (Tominaga 1991; D'Zmura and Lennie 1992; Finlayson and Schaefer 2001; Risson 2003; Ebner and Herrmann 2005).

In order to compute a color constant image from the available data, some assumptions have to be made. Otherwise, the problem of computing the reflectances cannot be solved because we only have three known values (the cone responses) but more than three unknowns for every image pixel. A popular assumption is that the receptors only respond within a very narrow band. If the sensors are assumed to respond to only a single wavelength, then the integration is removed and the model of the response of the sensor is greatly simplified. Another frequently made assumption is that the illuminant is approximately constant within the image.
This is true if the light source is located far away from the objects. In practice, this assumption may be violated. Quite often, we have multiple illuminants, such as sunlight falling through a window together with artificial illuminants within a room which are located near other objects. Several algorithms exist which also work in the presence of spatially varying illuminants (Land and McCann 1971; Land 1986; Faugeras 1979; Parker 1997; Finlayson and Hordley 2001; Barnard et al. 1997; Ebner 2004b). Most color constancy algorithms which have been developed by the image processing community are quite complex. It is not clear whether they can be mapped to what is known about the human visual system. Notable exceptions are the Retinex algorithm of Land and McCann (1971) together with the variants of Horn (1974), Blake (1985) and Rahman et al. (1999), and the parallel algorithms developed by Ebner (2004b; 2007b). All of these algorithms also work in the presence of a spatially varying illuminant. Before we describe our computational model of color perception, we briefly discuss relevant algorithms from the image processing community.

3. Algorithms for Color Constancy

Since the problem of color constancy is of considerable practical interest, quite a large number of color constancy algorithms have been developed. Land (1964) formulated the Retinex theory very early on. According to the Retinex theory, three different receptors are used in the retina which primarily respond to long, middle and short wavelengths. Thus, we have three sets of receptors, each measuring the energy in different parts of the visible spectrum. Land suggested that the sets of receptors process their data independently. At the time, it was not known whether the processing is done in the retina or in the cortex. This question seems to have been resolved (Land et al. 1983): the processing most likely occurs inside the cortex. We will return to this point later.

Land and McCann (1971) probably developed the first computational theory of color constancy. They showed that a color constant descriptor can be computed by multiplying ratios between adjacent photoreceptors across the image. This is called a sequential product. Land and McCann suggest that a color constant descriptor is computed using multiple paths through an image. They assume a logarithmic response of the sensors. In this case, the sequential product turns into a sum. Whenever the sequential product gets larger than zero, it is reset to zero. The results from different paths which pass through the current point are averaged in order to estimate its reflectance. This operation averages the reflectance ratio between the reflectance of the current point and the reflectance of random samples surrounding the point (Brainard and Wandell 1986; Hurlbert 1986). If it is assumed that all paths pass through all of the image pixels, then the algorithm just described is simply a normalization with respect to the response at the brightest location (Brainard and Wandell 1986). Such a normalization with respect to the maximum response of the sensor is also known as the white patch Retinex algorithm. However, as pointed out by McCann (1989), the Land and McCann formulation of the Retinex algorithm does not use an infinite number of iterations and hence its behavior is quite different from the simple white patch Retinex algorithm.

Horn (1974) developed a two-dimensional algorithm operating on a grid. He suggested an iterative method to obtain the re-integrated image data.
He also noted that a resistive grid may be used. This algorithm is shown in Figure 1. Blake (1985) proposed a slight modification of Horn's algorithm: the Laplacian is separated into two gradient operations and the threshold operation is applied in between the two gradient operations. A major drawback of the algorithms of Horn and Blake is that a threshold is required in order to determine whether a change between adjacent sensors is caused by a change in reflectance or by a change of the illuminant. In practice it is very difficult to set this threshold correctly.

Figure 1. Algorithm of Horn. First the logarithm is applied to linear data. Then the Laplacian is computed and a threshold operation is applied. A color constant descriptor is obtained after an integration step using a resistive grid. The algorithm shown is assumed to work on each color channel independently.

Land (1986) suggested an alternative variant of the Retinex theory. He proposed to compute color constant descriptors by first taking the logarithm of the input. Then the logarithm of the average color of the surround is subtracted from the logarithm of the input. The average color of the surround is computed using samples which are distributed around the given point. The density pattern of the samples varies as $1/r^2$, where $r$ is the radius measured from the position of the current point. In other words, the color constant descriptor $o_i$ is computed as

$$o_i(x,y) = \log c_i(x,y) - \log\left( \left( c_i * \tfrac{1}{r^2} \right)(x,y) \right)$$

where $*$ denotes convolution.

Frankle and McCann (1980) extended the Retinex algorithm to work on multiple resolutions. Rahman et al. (1999) used Gaussians of different sizes to compute a series of blurred images and to perform color correction on multiple scales. The method is also used for dynamic range compression. The logarithm is applied to both the blurred images and to the input image. A set of weights may be used to enhance or lower the influence of a particular Gaussian. Instead of a Gaussian, other smoothing kernels may also be used. It is also possible to take the logarithm before applying the Gaussians. This results in the computation of a weighted product instead of a weighted sum.

Many color constancy algorithms known from the literature are based on the gray world assumption or its generalized version, the gray edge hypothesis, in one way or another. Land, in his alternative formulation of the Retinex algorithm, suggested that the logarithm of the average color of the surround is subtracted from the logarithm of the input in order to compute a color constant descriptor. Land (1986) assumes this average is obtained by averaging input from several receptors. The algorithm of Rahman et al. (1999) also requires some form of averaging; the image is blurred using a convolution.

The gray world assumption is due to Buchsbaum (1980). According to Buchsbaum, on average, the world is gray. Buchsbaum worked with overlapping response characteristics of the sensors. However, it is easier to understand how the gray world assumption works by assuming that the sensors are very narrow band, i.e. they respond only to a single wavelength. Let $c_i(x,y)$ be the measured response at position $(x,y)$ of the image for color channel $i$, i.e. wavelength $\lambda_i$. This response is proportional to the reflectance $R_i(x,y)$ at the corresponding object point and the irradiance $L_i(x,y)$ at the corresponding object point:
$$c_i(x,y) = R_i(x,y)\, L_i(x,y).$$

In order to obtain an estimate of the illuminant, Buchsbaum assumed that the illuminant is constant over the entire image. In this case, we have $L_i(x,y) = L_i$, which gives us

$$c_i(x,y) = R_i(x,y)\, L_i.$$

We now see that the illuminant $L_i$ scales the reflectances. Once we have an estimate of the illuminant, we can compute a color constant descriptor by dividing the measured color by this estimate. The estimate of the illuminant is obtained by computing space average color over all image pixels. Global space average color $a = [a_r, a_g, a_b]$ of an image with $n$ pixels is given by

$$a_i = \frac{1}{n} \sum_{x,y} c_i(x,y) = \frac{1}{n} \sum_{x,y} R_i(x,y)\, L_i = L_i \cdot \frac{1}{n} \sum_{x,y} R_i(x,y).$$

In order to solve this equation for $L_i$, we need to make some additional assumptions. We assume that the scene we are looking at contains several differently colored objects. We do not know anything about the objects which will be contained in the image. Therefore, we assume that all possible colors of the objects are equally likely. In other words, we assume that the reflectances are uniformly distributed over the range $[0,1]$. If the image contains a sufficiently large number of differently colored objects, then we can replace the average reflectance by its expected value,

$$E[R_i] = \frac{1}{2}.$$

Once we substitute this into the above equation, we immediately see how the color of the illuminant can be estimated from global space average color: $L_i \approx 2 a_i$. We can obtain a color constant descriptor $o_i(x,y)$ by dividing the cone response by twice the global space average color:

$$o_i(x,y) = \frac{c_i(x,y)}{2 a_i} \approx \frac{R_i(x,y)\, L_i}{L_i} = R_i(x,y).$$

Actually, it is sufficient to simply divide the cone response by global space average color to obtain a color constant descriptor; the constant factor 2 just scales the result. This method is often used by machine vision algorithms assuming linear sensor responses once the illuminant has been estimated.

4. Human Color Perception

As of now, we do not yet know how the human visual system actually computes color constant descriptors. However, it may soon be possible to develop tests which validate or invalidate the different theories. Here, a computational model for color constancy is presented and it is shown how the individual stages of this model are mapped to what is known about the human visual system. Below, we will show how this model explains several different visual phenomena.

We do know that visual area V4 is very important for color perception. Inside V4, cells have been found which respond to a patch of a particular color irrespective of the illuminant used (Zeki 1993; Zeki and Marini 1998). Area V4 can be subdivided into two sub-areas, V4 and V4α (Zeki and Bartels 1999). The difference between these two areas is that V4 seems to have a retinotopic organization whereas area V4α does not. The receptive fields of cells found inside visual area V4 are rather large. Some of these cells may be involved in the computation of some kind of average, i.e. local or global space average color. That would explain why their receptive fields are very large. Ebner (2004a; 2004b; 2006) has shown that local space average color can also be computed iteratively. This average may then be used as a local estimate of the illuminant, as described below. Even though only local connections are required, as we will see below, we need to exchange data between the two hemispheres of the brain in order to compute local space average color across the vertical meridian. This would require that cells along the vertical meridian are connected somehow.
V4 is the first visual area having callosal connections, thus enabling information exchange between the left and right hemispheres. We will see below that local space average color can be used to compute a color constant descriptor. The use of local space average color instead of global space average color has the advantage that a color constant descriptor is also obtained in the presence of a non-uniform, smoothly changing illuminant. Local space average color could be subtracted in V4 from a descriptor which is essentially based on the cone response but is located inside a different, rotated coordinate system. The output would be a color constant descriptor which responds to the color of an object irrespective of the wavelength composition of the light reflected by the object.

Experiments done by Helson (1938) point to the use of color shifts. His experiments have shown that the background has an impact on the perceived color of a gray sample. If the patch is brighter than the background, it appears to have the color of the illuminant. If the patch is darker than the background, then it has the complementary color of the illuminant. This points to the use of color shifts. We will now describe how local space average color can be computed using a grid of processing elements, i.e. neurons.

5. Iterative Computation of Local Space Average Color

Suppose that we are given a grid of processing elements, i.e. neurons. We assume that we have one processing element per image pixel and that each processing element is connected to its nearest neighbors. The neighborhood of each processing element is defined by the set $N(x,y)$ of processing elements connected to the element located at position $(x,y)$ of the image,

$$N(x,y) = \{ (x',y') \mid (x',y') \text{ is a neighbor of element } (x,y) \text{ and } c(x,y) > \theta \}$$

where $\theta$ is 0 or another small value. The task of each processing element is to compute local space average color

$$a(x,y) = [a_r(x,y), a_g(x,y), a_b(x,y)].$$

Suppose that each processing element already has an estimate of local space average color $a(x,y)$. At this point, the value of this estimate is arbitrary. We will see below why the initial estimate may be arbitrary. One then iterates the following update equations:

$$a_i'(x,y) = \frac{1}{|N(x,y)|} \sum_{(x',y') \in N(x,y)} a_i(x',y')$$

$$a_i(x,y) = c_i(x,y)\, p + a_i'(x,y)\, (1-p).$$

A small percentage $p$ is used for the second update operation. The first operation reads local space average color stored in neighboring elements and then averages this data, provided that the measured color is larger than the threshold $\theta$. The second operation takes the color measured at the current element and slowly fades this color into the average; it is simply a weighted average between the measured color and the data averaged from the neighboring elements. The result is a new and better estimate of local space average color. This process converges to local space average color while the original content slowly fades away. The data basically diffuses between neighboring elements. The properties of this diffusion process are determined by the parameter $p$. For a stationary input $c(x,y)$, the result is independent of the initialization of $a(x,y)$. The parameter $p$ defines the extent over which local space average color is computed. If we set $p$ to a small value, then local space average color is computed over an extensive area. If we set $p$ to a relatively large value, local space average color is computed over a very small area.
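A minimal sketch of these two update operations on a grid, assuming a four-neighborhood with clamped borders; the threshold test on the measured color is omitted for brevity and all parameter values are illustrative:

```python
import numpy as np

def local_space_average(c, p=0.01, iterations=2000):
    # c: (H, W, 3) image with one processing element per pixel, each
    # connected to its four nearest neighbors.  p: fade-in weight of
    # the measured color; a small p averages over a large area.
    a = np.zeros_like(c, dtype=float)   # the initial estimate is arbitrary
    for _ in range(iterations):
        # Step 1: average the estimates stored in the neighboring
        # elements (borders are clamped, i.e. a missing neighbor is
        # replaced by the border element itself).
        up = np.roll(a, 1, axis=0);  up[0] = a[0]
        dn = np.roll(a, -1, axis=0); dn[-1] = a[-1]
        lf = np.roll(a, 1, axis=1);  lf[:, 0] = a[:, 0]
        rt = np.roll(a, -1, axis=1); rt[:, -1] = a[:, -1]
        avg = (up + dn + lf + rt) / 4.0
        # Step 2: slowly fade the measured color into the average.
        a = p * c + (1.0 - p) * avg
    return a
```

With p = 0.01 the estimate spreads over a large neighborhood; increasing p shrinks the averaging area, mirroring the behavior of the diffusion process described above.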
The computation performed by averaging data from neighboring elements and then adding some measured data is similar, though not identical, to the convolution of the measured data with an exponential kernel (Ebner 2007a),

$$g(x,y) = k\, e^{-(|x| + |y|)/\sigma}$$

where $k$ is a normalizing constant. The scaling parameter $\sigma$ can be computed from the parameter $p$. If we convolve the input image with an exponential kernel and scaling parameter $\sigma$, we obtain an image which is very similar to the image which is computed by the grid of processing elements and the algorithm described above using the parameter $p$.

When we look at the computations which are performed by the grid of processing elements, we see that relatively simple computational operations are required. In fact, it is possible to use a resistive grid to compute local space average color. Such a resistive grid is shown in Figure 2. Adjacent points of the grid are connected with resistance $R$. The input is fed into the system from the top. An input resistance $R_0$ is used which connects the input and the grid points. The output of the resistive grid is available at the node points. It is simply a spatially smoothed version of the input image. The relationship between the parameter $p$, the input resistance $R_0$ and the resistance $R$ which connects neighboring grid points is given by $R_0/R = (1-p)/(4p)$.

In the model, cells located within V4 are resistively coupled in order to obtain a measure of the illuminant. Cells within V4 of the left and right hemispheres have to be resistively coupled across the vertical meridian. It is assumed that this happens through the corpus callosum. If these connections are severed, local space average color is computed within each hemisphere separately. In this case, the color constant descriptor is computed using only data from one half of the image.

Figure 2. A resistive grid can be used to compute local space average color. The relationship between the parameter p, the input resistance R0 and the resistance R which connects neighboring grid points is given by R0/R = (1-p)/(4p).

Land et al. performed an experiment with a human subject whose callosal connections had been cut. His color perception differed from the color perception of a normal subject. Hence, an intact corpus callosum is required for accurate color perception. This also makes sense algorithmically. The connections across the vertical meridian are required to allow for an exchange of data between the left and right hemispheres of the brain. In the experiment of Land et al. (1983), a subject with cut callosal connections was fixating a purple test region as shown in Figure 3. The subject had to report on the color of the purple test region. Note that the language center is located in the left hemisphere of the brain. The subject with the cut callosal connections perceived the test region as white whereas the normal subject perceived the test region as purple. If the light in the region around the purple test region was attenuated using neutral density filters, then both the normal person and the person with the cut corpus callosum perceived the purple test region as white. The result obtained with the algorithm described here is as described in the experiment carried out by Land et al. (1983).

Figure 3. (a) Color Mondrian normally illuminated. (b) Color Mondrian with attenuated illumination.

6. Color Constancy Based on Local Space Average Color

Ebner (2004a) has shown that the gray world assumption may also be applied locally.
We can use local space average color to estimate the color of the illuminant for each image pixel $(x,y)$:

$$L_i(x,y) \approx 2\, a_i(x,y).$$

This estimate can then be used to compute a color constant descriptor. Since we obtain an estimate of the illuminant locally for each image pixel, we are now able to correct for a spatially varying illuminant provided that the environment is sufficiently diverse. A color constant descriptor $o_i$ can be computed by dividing the measured color by twice local space average color:

$$o_i(x,y) = \frac{c_i(x,y)}{2\, a_i(x,y)} \approx \frac{R_i(x,y)\, L_i(x,y)}{L_i(x,y)} = R_i(x,y).$$

Obviously, the algorithm only works well if the assumptions made by this method are fulfilled. For the derivation of this algorithm, we assumed that a large number of different colors are contained in the scene. Figure 4(a) shows a leaf from a banana plant. The output color is shown in Figure 4(b). The output color was obtained by computing local space average color using an exponential kernel and dividing the measured color by twice local space average color. The output is surprisingly good even though the original image has a strong color bias which should not be removed. Indeed, it appears that human color perception behaves similarly when only an isolated patch is viewed, e.g. through a tube (Land 1974).

Figure 4. (a) Input image showing a leaf from a banana plant. (b) Output image, computed by dividing the measured data by twice local space average color.

Figure 5 shows the results for a natural scene. The image shown in Figure 5(a) was taken with a Canon 10D. The white balance was set to a color temperature of 6500 K. The image is very blueish due to the presence of a blueish illuminant. Before processing, the standard sRGB transform was used to transform the image to a linear RGB color space. The estimated illuminant is shown in Figure 5(b). Figure 5(c) shows the output image. Again, local space average color was computed using an exponential kernel and the measured color was divided by twice local space average color. The kernel extended over one third of the image. We see that the color cast is removed nicely. Figure 5(d)-(f) show the results for a non-uniform illuminant.

Figure 5. (a) Input image showing an office scene. (b) Estimate of the illuminant (local space average color). Local space average color was computed using an exponential kernel which extends over one third of the image. (c) Output image, computed by dividing the measured data by twice local space average color. (d)-(f) Results for an input scene illuminated by a non-uniform illuminant.

Algorithms based on the computation of local space average color have been shown to work very well when evaluated on a large image database containing objects with different reflectance characteristics: matte, specular, metallic or fluorescent (Ebner 2009). The database contained photographs of different scenes, each illuminated by several different illuminants. For each scene, two photographs were randomly chosen and a color constancy algorithm was applied. The task was to automatically match photographs of the same scene which were illuminated by different illuminants. A perfect color constancy algorithm would produce exactly the same output for both illuminants and matching would therefore be quite easy. It turned out that color constancy algorithms based on local space average color performed very well on this task.
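The following sketch estimates the illuminant with a separable recursive filter that approximates the exponential kernel of Section 5 and then divides by twice the local average. The recursive formulation is a stand-in for the resistive grid chosen for brevity, not the exact kernel, and the function names are illustrative:

```python
import numpy as np

def exponential_blur(img, sigma):
    # Recursive exponential smoothing along both image axes; the
    # effective kernel decays roughly like exp(-(|x| + |y|)/sigma).
    alpha = np.exp(-1.0 / sigma)
    out = img.astype(float).copy()
    for axis in (0, 1):
        out = np.moveaxis(out, axis, 0)
        for k in range(1, out.shape[0]):               # forward pass
            out[k] = (1 - alpha) * out[k] + alpha * out[k - 1]
        for k in range(out.shape[0] - 2, -1, -1):      # backward pass
            out[k] = (1 - alpha) * out[k] + alpha * out[k + 1]
        out = np.moveaxis(out, 0, axis)
    return out

def local_gray_world(img, sigma):
    # o_i(x, y) = c_i(x, y) / (2 a_i(x, y)), applied per pixel.
    a = exponential_blur(img, sigma)
    return np.clip(img / (2.0 * a + 1e-6), 0.0, 1.0)
```

For a very large sigma the local average approaches global space average color and the method reduces to the global gray world algorithm of Section 3.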
7. Usage of Color Shifts

Helson performed experiments investigating the color perception of human subjects looking at achromatic samples which are illuminated by colored light (Helson 1938). The color which is perceived depends on the color of the illuminant and also on the color of the background. A bright gray patch located on a gray background, i.e. a background of intermediate uniform reflectance, will appear to have the color of the illuminant. In contrast, a dark gray patch on the same background will have the complementary color of the illuminant. Patches of intermediate reflectance similar to the background will appear achromatic. Obviously, a computational algorithm for color perception should be able to reproduce this behavior.

If we look at the algorithms of Land (1986), Horn (1974, 1986), Moore et al. (1991) and Rahman et al. (1999), we see that these algorithms do not reproduce this behavior. The reason is that these algorithms compute the ratio between the color of the pixel and local space average color. As soon as we compute the ratio of the illuminant to local space average color (assuming narrow band receptors), the color of the illuminant drops out of the equation. Hence, achromatic stimuli will always appear achromatic irrespective of the illuminant used. Ebner (2007a) gives a more extensive discussion of the behavior of these algorithms on the different stimuli. The behavior observed by Helson, that a bright gray patch on a gray background appears to have the color of the illuminant and that a dark gray patch appears to have the complementary color of the illuminant, points to the use of color shifts, i.e. that the average color of the scene is subtracted from the color of the sample.

Ebner (2004a) has developed a color constancy algorithm which is based on color shifts. A color constant descriptor is obtained by subtracting the component of local space average color which is perpendicular to the gray vector from the measured color of each pixel. The gray vector runs from black to white through the color space. This operation moves local space average color onto the gray vector while maintaining the average intensity of local space average color. This is in line with the gray world assumption, which states that on average, the world is gray. If we compute local space average color and this computed color is not located on the gray vector, it needs to be shifted onto the gray vector such that the gray world assumption is fulfilled.

Let $w = \frac{1}{\sqrt{3}} [1,1,1]^T$ be the normalized gray vector. Let $c = [c_r, c_g, c_b]^T$ be the measured color for an image pixel. Let $a = [a_r, a_g, a_b]^T$ be local space average color which is computed for the neighborhood of the same image pixel. First, we compute the component $a_\perp$ of local space average color which is perpendicular to the gray vector. We obtain this component by projecting the vector $a$ onto the gray vector $w$ and subtracting the result from $a$:

$$a_\perp = a - (a \cdot w)\, w.$$

The vector $a_\perp$ points from the gray vector to the local space average color $a$ which was computed for the given image pixel. If we subtract $a_\perp$ from the measured color $c$, we obtain a color constant descriptor $o$:

$$o = c - a_\perp.$$

This operation is illustrated in Figure 6 for two vectors $c$ and $a$. The operation can be simplified by looking at the individual components, i.e. color channels:

$$o_i = c_i - \left( a_i - \tfrac{1}{3}(a_r + a_g + a_b) \right).$$

Writing $\bar{a} = \tfrac{1}{3}(a_r + a_g + a_b)$, we obtain

$$o_i = c_i - a_i + \bar{a}.$$

In other words, local space average color is subtracted from the measured color.
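A minimal sketch of this color shift operation; the numerical example is purely illustrative:

```python
import numpy as np

def color_shift_descriptor(c, a):
    # o = c - a_perp, where a_perp is the component of local space
    # average color a perpendicular to the gray vector w.
    # Component-wise this is o_i = c_i - a_i + abar.
    w = np.ones(3) / np.sqrt(3.0)                 # normalized gray vector
    a_par = (a * w).sum(-1, keepdims=True) * w    # projection of a onto w
    return c - (a - a_par)

# Illustrative numbers only: a dark gray patch under a reddish local
# average receives a cyan-shifted (complementary) descriptor, as in
# Helson's experiments.
c = np.array([0.30, 0.25, 0.25])   # measured patch color
a = np.array([0.60, 0.45, 0.45])   # local space average color
print(color_shift_descriptor(c, a))   # -> [0.2 0.3 0.3]
```

The same function works unchanged on an (H, W, 3) image paired with a per-pixel local space average color, since the projection is computed along the last axis.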
The average intensity $\bar{a}$ of the individual components of local space average color is added in order to maintain the average intensity of the measured color. The result of this operation is that the measured color is shifted along the direction given by the difference between local space average color and its projection onto the gray vector. Local space average color is shifted onto the gray vector. A color cast is removed. This algorithm shows the behavior which was described by Helson.

Figure 6. Local space average color is pushed onto the gray vector by subtracting the component $a_\perp$ from the measured color $c$. The component $a_\perp$ runs perpendicular to the gray vector $w$. The result of this operation is that a color cast is removed from the image while average intensity is maintained.

A bright gray patch on a gray background will have the color of the illuminant because the average color is heavily influenced by the background. It will not appear achromatic but will show some of the color of the illuminant. Similarly, a dark gray patch on a gray background will have the opposite color of the illuminant because the measured color of a point inside the patch is closer to the gray vector than the computed local space average color. Hence, it will be pushed too far and receive the opposite color.

8. A Computational Theory of Color Perception

We now discuss the individual stages of our model shown in Figure 7.

Figure 7. Proposal for a computational model for color constancy. The retinal receptors are assumed to have a logarithmic response. Cells within and up to V1 provide a rotation of the coordinate system. A resistive grid which is probably part of V4 computes local space average color. Local space average color is subtracted from the measured color in order to arrive at a color constant descriptor. For dynamic input, temporal integration also has to be considered.

Processing of visual information starts with the cones inside the retina. The three types of cones, assuming a subject with normal vision, measure the incident light inside the red, green, and blue parts of the spectrum. The model assumes that the response of the receptors is logarithmic. Faugeras (1979) also proposed a logarithmic relationship. Let us assume that the first step in the processing pipeline is the application of a logarithmic or other closely related function. This is the retinal stage. The next stage is a rotation of the coordinate system which occurs up to and including area V1. This stage would be due to the presence of color opponent cells. Color is now described inside a rotated coordinate system with the three axes red-green, blue-yellow and black-white. Within this rotated coordinate system, local space average color would be computed using interconnected neurons. The neurons only have to be connected to other neurons which also compute local space average color in their vicinity. They would simply have to be resistively coupled to neighboring neurons in order to obtain a low-pass filter which essentially computes local space average color. Connections across the corpus callosum could provide for a coupling between the left and right hemispheres of the brain. The resistive grid would either be located inside V4 or in an area providing connections to V4. Gap junctions behave mainly as pure resistors (Herault 1996). Amacrine cells of the mammalian retina have been found to be resistively coupled through gap junctions and to provide a low-pass filter (Veruki and Hartveit 2002).
Galarreta and Hestrin (1999) report electrical coupling between neurons inside the visual cortex. Another way to form a resistive grid would be to use the input resistance of the dendritic branch (Di Maio 2007).

The following derivation is given for simplicity using the RGB color space. However, it also holds for any rotated coordinate system. Given that we now have local space average color at our disposal, we can compute a color constant descriptor $o$ by subtracting local space average color $a$ from the measured color $\log c$ inside the rotated coordinate system,

$$o_i(x,y) = \log c_i(x,y) - a_i(x,y)$$

with

$$\log c_i(x,y) = \log R_i(x,y) + \log L_i(x,y).$$

Assuming that $a_i(x,y) \approx \log L_i(x,y) + \overline{\log R_i}$, where $\overline{\log R_i}$ is the average log-reflectance of the surround, we obtain

$$o_i(x,y) \approx \log R_i(x,y) - \overline{\log R_i},$$

which is independent of the illuminant and only depends on the reflectance $R_i$ of the object. This is the most likely architecture for human color constancy. As the resolution of brain imaging methods increases, it may become possible to test whether one or the other method is used by the visual system. A first step towards verifying or disproving this theory would be to look for networks of resistively coupled neurons.

9. Discussion

In order to fully understand how the human visual system works, we need a computational theory. Theoretical neuroscience still has a long way to go before we can truly say that we have understood what the brain computes (see Carandini et al. 2005). We basically need a description of how the visual information is processed computationally. Given such a description, we would be able to replicate the results in a simulated visual system on a computer. Judd (1940) as well as Richards and Parks (1971), among others, have developed theoretical models for color perception. However, these models are psychophysical models of color perception. It is not clear why perceived color would depend on average apparent reflectance or average luminance. Their models do not describe how the data measured by the receptors inside the retina is mapped to a particular color. They are phenomenological descriptions of color vision. It is clear that we need a computational theory of color vision. This was also suggested by Marr (1982). Smithson (2005) gives an overview of many computational algorithms and also discusses whether cortical mechanisms, i.e. within V4, are involved in color constancy. According to Smithson, color constancy processing starts in the retina, is then enhanced in V1/V2, and finally continues in V4. This view is shared here.

Many different algorithms have been developed in the computer vision community. However, for many of those algorithms, it is not clear how they could be mapped to what is known about the visual system. Given that the brain operates in a highly parallel manner, we should look for a solution which arrives at a color constant descriptor through parallel computations. Among the algorithms which would lend themselves to a biological realization are the original Retinex algorithm of Land and McCann (1971), as well as the variants developed by Land (1986), Horn (1974) and Blake (1985). Algorithms developed by Ebner (2004a, 2004b, 2006) are simpler and more robust than other algorithms and also operate in parallel. In principle, each of these algorithms could be realized using the massively parallel neural architecture of the human visual system. Linnell and Foster (1997) made the suggestion that observers mainly use space average color in order to estimate the color of the illuminant in Mondrian scenes.
However, the color of the highest luminance patch may also be used to a certain extent. It has been argued by McCann (1997) that in the human visual system, color is basically determined by normalizing each color channel independently to the maximum in the field of view. Currently, we do not know how color is processed inside V4. Local space average color could be computed either in space or in time (Hurlbert 1986). Several algorithms require an integration over time, e.g. the algorithms of Horn (1974), Blake (1985) and Ebner (2004a, 2004b, 2006). All that is required for an integration over time are recurrent neurons which are connected to their nearest neighbors, as we have seen above.

Another way to compute local space average color would be to apply Gaussian blur. This could be performed in several stages, where the lowest stage blurs the image only very slightly. The next stage again blurs the output of the first stage, and so on. If this method were implemented, then the neurons would form a hierarchy where the neurons at the first stage have very small receptive fields. As we move up the hierarchy, the size of the receptive fields would increase. The neuron at the top of the hierarchy would receive a very strongly blurred image as input, which would be just the space average color which we need to compute a color constant descriptor. This would be similar to the method suggested by Rahman et al. (1999), who suggested using Gaussian kernels of different scales and performing a color correction at multiple scales.

Another, completely different way to estimate the color of the illuminant would be to use the rods in the periphery of the retina, as suggested by Hurlbert (1986). The rods in the periphery could be used to compute a spatial average over the image boundary. Yet another method is suggested by D'Zmura and Lennie (1986, 1992). They suggested that color constancy might be due to an adaptation mechanism. As the eyes, head and body move, the retina is exposed to different parts of the scene. In this scenario, space average color would be computed in the course of time by averaging the data per receptor as the retina is exposed to different parts of the scene. The perceived color would then depend on how much the measured color differs from this adapted state. However, Land and McCann (1971) performed color constancy experiments with very short exposure times. Their experiments show that the ability to perceive colors as constant does not depend on long exposure times. The phenomenon of color constancy exists even if the image is only perceived for a fraction of a second.

The visual system is a product of natural evolution. The ability to perceive colors as constant definitely provided an evolutionary advantage over individuals without this ability. An interesting question in this respect is which solution would be preferred by natural evolution. It seems likely that natural evolution would use what is already in place and adapt it in order to improve it. In fact, Ebner (2006) has shown that it is possible to evolve a parallel algorithm for color constancy operating on a grid of processing elements. The visual system is highly parallel. Hence, it seems likely that one of the algorithms of Land (1986), Horn (1974) or Ebner (2004a, 2004b) is employed. For these algorithms, only local connections are required. A hierarchy of neurons, which could also be used to compute a blurred image, would require a much larger and possibly unnecessarily large neural architecture.
If a color constancy algorithm could be realized by relatively simple means, why would evolution favor a more complicated architecture?

Let us now have a look at how the different algorithms could be mapped to what is known about the visual system. The algorithm of Horn (1974) first applies the Laplacian operator. We could just use local differencing followed by an averaging operation to construct a Laplacian operator. The output of a Laplacian operator is a color constant descriptor because the response of the receptors is logarithmic (or nearly logarithmic) (Herault 1996). However, it does not describe the color of an object. It does not describe the relative reflectance of a patch compared to other parts of the scene. The next step in the algorithm of Horn is a thresholding operation. The output is then integrated to obtain relative reflectance. Given what we know so far, this integration could be performed inside V4 using a grid of resistively coupled neurons. Livingstone and Hubel (1984) discuss how the first stages of this algorithm, i.e. up to the integration, could be mapped to what is known about the visual system. They assume that the Retinex algorithm is applied inside a transformed coordinate system. The three cone channels red, green, and blue are transformed either to a longitude-latitude spherical polar coordinate system or to a rotated coordinate system. Inside a spherical coordinate system, the radius would denote the dark-light scale, longitude the red-green axis and latitude the blue-yellow axis. Land (1986) also noted that the Retinex algorithm can be applied inside a rotated coordinate system.

The Retinex algorithm of Land and McCann (1971) as well as the two-dimensional variants of Horn (1974) and Blake (1985) require a thresholding operation. Algorithms based on a threshold are usually not very robust. However we set the threshold, we are going to make mistakes. In this case, a change of illuminant could be taken as a change of reflectance and vice versa. The algorithm of Ebner (2004a, 2004b) does not require a threshold operation. In order to map this algorithm to the visual system, we assume that the cone signals (with a logarithmic response) are first transformed into a rotated coordinate system. Inside this rotated coordinate system, the data would then be averaged. The result, local space average color inside a rotated coordinate system, would be subtracted from the measured color which has also been transformed to the rotated coordinate system. Local space average color would be computed using resistively coupled neurons. The architecture of this algorithm is shown in Figure 7.

Livingstone and Hubel (1984) note that cells found inside the blobs of V1 could act as building blocks which contribute to long-range interactions occurring in V4. Neither the model of Horn (1974) nor the model of Ebner (2004a, 2004b) requires long-range interactions (apart from the callosal connections). Only local connections which behave like a resistor between neighboring neurons are required. Experiments investigating the phenomenon of color constancy have shown that the color of a given point depends on the color patches in the surrounding area. The reason is most likely the iterative propagation of data from one cell to the next, i.e. the resistive coupling.
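To summarize, the model of Figure 7 can be sketched end to end in a few lines. The opponent rotation matrix below is one assumed orthonormal choice of red-green, blue-yellow and black-white axes, and the recursive smoothing is a stand-in for the resistive grid; neither is claimed to be the exact transform used by the visual system:

```python
import numpy as np

# One assumed orthonormal choice of opponent axes.
M = np.array([[1/np.sqrt(2), -1/np.sqrt(2),  0.0         ],   # red-green
              [1/np.sqrt(6),  1/np.sqrt(6), -2/np.sqrt(6)],   # blue-yellow
              [1/np.sqrt(3),  1/np.sqrt(3),  1/np.sqrt(3)]])  # black-white

def smooth(x, sigma):
    # Recursive exponential smoothing standing in for the local space
    # averaging performed by the resistive grid (see Section 5).
    alpha = np.exp(-1.0 / sigma)
    for axis in (0, 1):
        x = np.moveaxis(x, axis, 0)
        for k in range(1, x.shape[0]):
            x[k] = (1 - alpha) * x[k] + alpha * x[k - 1]
        for k in range(x.shape[0] - 2, -1, -1):
            x[k] = (1 - alpha) * x[k] + alpha * x[k + 1]
        x = np.moveaxis(x, 0, axis)
    return x

def color_constant_descriptor(img, sigma=32.0, eps=1e-6):
    # img: (H, W, 3) linear RGB in [0, 1].
    log_c = np.log(img + eps)                   # retina: logarithmic response
    rot = np.einsum('ij,hwj->hwi', M, log_c)    # V1: opponent coordinates
    a = smooth(rot.copy(), sigma)               # V4: local space average color
    o = rot - a                                 # subtract the local average
    return np.einsum('ji,hwj->hwi', M, o)       # rotate back (M is orthonormal)
```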
So far, the computational algorithm shown in Figure 7 has been shown to be in line with Helson's (1938) results on the color perception of gray patches illuminated with a non-uniform spectrum (Ebner 2007a). Above, we have illustrated how this algorithm behaves for a simulated observer where the connections of the resistive grid have been cut across the vertical meridian. Thus, the above algorithm is also in line with the behavior of a patient who had his callosal connections cut when viewing an illuminated color Mondrian (Land et al. 1983).

Recently, Werner has shown that color constancy improves when an object moves (Werner 2007). Werner conducted experiments with subjects looking at colored as well as black and white checkerboards. Observers had to judge whether or not an achromatic test patch actually appeared achromatic under different illuminants. Experiments were conducted with a static test patch in front of a static checkerboard pattern and also with a test patch which was moving from right to left. In the latter case, color constancy improved. Ebner (2012a, 2012b) has shown that the above algorithm is able to explain this behavior. The only additional assumption is that there is a slight offset $(dx,dy)$ in the computation of a color constant descriptor $o_i$, i.e. the computation is

$$o_i(x,y) = \log c_i(x,y) - a_i(x+dx, y+dy).$$

Koenderink and van Doorn (1999, 2000) have shown that local disordering of image data may have certain advantages. A small offset is also very likely, given that the brain is a developmental system.

10. Conclusions

Several theories of color vision have been put forward. Most of them are phenomenological descriptions and it is not clear how they could be mapped to what is known about the visual system. What we need is a computational theory of color perception which can also be mapped to what is known about the visual system. Helson's psychophysical experiments point to the use of color shifts. The computational theory presented above is very simple and is based on color shifts. It is very effective at computing color constant descriptors. It is based on the computation of local space average color using a resistive grid. Local space average color is used to obtain an estimate of the illuminant locally for each point of the scene. Since the illuminant is estimated locally, the method is also able to cope with multiple illuminants, i.e. a spatially varying illumination of the scene. The resistive grid is formed through locally resistively coupled neurons. The long-range connections through the corpus callosum are assumed to connect adjacent neurons (adjacent with respect to the receptive fields of the neurons). This provides an exchange of data between the left and right hemispheres of the brain. Some neurons of V4 are assumed to receive input from this resistive grid as well as from V1, which provides the original retinal data within a rotated coordinate system. The receptors of the retina provide a logarithmic response. Local space average color is assumed to be subtracted from the logarithmic retinal response, i.e. all that is needed is a negative coupling between the system computing local space average color and the system providing the input signal. A color constant descriptor is obtained as a result. This computational algorithm is able to explain Helson's results (Helson 1938; Ebner 2007a), the results of Land et al. (1983) on a patient who had his callosal connections cut, and it also explains Werner's results on why color constancy seems to improve for moving objects (Ebner 2012a; Ebner 2012b).
Acknowledgement

John McCann provided very helpful comments to improve this paper.

A Computational Model for Color Perception

Bio-Algorithms and Med-Systems , Volume 8 (4) – Dec 1, 2012

Loading next page...
 
/lp/de-gruyter/a-computational-model-for-color-perception-NpBugw6l1A
Publisher
de Gruyter
Copyright
Copyright © 2012 by the
ISSN
1895-9091
DOI
10.1515/bams-2012-0028
Publisher site
See Article on Publisher Site

Abstract

Color is not a physical quantity of an object. It cannot be measured. We can only measure reflectance, i.e. the amount of light reflected for each wavelength. Nevertheless, we attach colors to the objects around us. A human observer perceives colors as being approximately constant irrespective of the illuminant which is used to illuminate the scene. Colors are a very important cue in everyday life. They can be used to recognize or distinguish different objects. Currently, we do not yet know how the brain arrives at a color constant or approximately color constant descriptor, i.e. what computational processing is actually performed by the brain. What we need is a computational description of color perception in particular and color vision in general. Only if we are able to write down a full computational theory of the visual system then we have understood how the visual system works. With this contribution, a computational model of color perception is presented. This model is much simpler compared to previous theories. It is able to compute a color constant descriptor even in the presence of spatially varying illuminants. According to this model, the cones respond approximately logarithmic to the irradiance entering the eye. Cells in V1 perform a change of the coordinate system such that colors are represented along a red-green, a blue-yellow and a black-white axis. Cells in V4 compute local space average color using a resistive grid. The resistive grid is formed by cells in V4. The left and right hemispheres are connected via the corpus callosum. A color constant descriptor which is presumably used for color based object recognition is computed by subtracting local space average color from the cone response within a rotated coordinate system. KEYWORDS: Color Perception, Computational Theory, Color Constancy, V4 M. Ebner 1. Introduction Color is not a physical quantity which can be measured. Yet we attach it to the objects around us. A human observer perceives colors as being approximately constant irrespective of the illuminant which is used to illuminate the scene. Colors are a very important cue in everyday life. We use colors to recognize or distinguish different objects. Some colors, e.g. red, are used to focus attention (ripe fruit) or to communicate important messages, e.g. an immediate danger. Color would not be useful as a signaling mechanism if the perceived color of an object would vary with the color of the illuminant used. The color of an object would not even stay constant during the course of the day because the color temperature varies during the day. That is why we need a mechanism for color perception which somehow computes a color constant descriptor from the light which is reflected from an object. Several theories for color perception have been put forward. However, many are basically phenomenological descriptions of what color vision does. Phenomenological in a sense that these theories do not explain how the computations are actually performed by the brain. What we need is a computational description of color perception (i.e. down to the neural level) (Ebner 2007c). We have only understood how the visual system works if we are able to write down a full computational theory of this system. With this contribution we provide a computational theory of color perception which can be mapped to what is known about the visual system. 
The main contribution of this article is to (a) present a computational model for color constancy, (b) show how the individual stages of this model map to what is known about the human visual system and (c) summarize which visual phenomena are explained by this model. We will first provide some background on the theory of color image formation, followed by a brief review of several important color constancy algorithms from the machine vision community. We then show how a local estimate of the illuminant can be computed iteratively using either a grid of processing elements (neurons) or a resistive grid. This estimate is obtained by computing local space average color. Finally, we show how one can apply the concept of the gray world assumption (Buchsbaum 1980), a well known color constancy algorithm, within the context of color shifts (Ebner 2004a). The use of color shifts by the human visual system is supported by psychophysical experiments. Our results lead to a computational theory of color perception.

2. Theory of Color Image Formation

In order to understand how this computational theory works, we first have a look at how a color image is formed. Suppose that we are looking at an object which is illuminated by a light source. The incident light is reflected with varying amounts depending on the wavelength of the incident light. We can measure the reflected light using a measuring device such as a digital or analog camera or a spectrometer. Let $R(\lambda)$ be the percentage of the light reflected at wavelength $\lambda$ and let $E(\lambda)$ be the irradiance at wavelength $\lambda$; the reflected light $I(\lambda)$ is then given by

$I(\lambda) = R(\lambda)\, E(\lambda).$

The reflected light varies with the amount of incident light $E(\lambda)$. If we measure the reflected light over the visible spectrum and also know the irradiance, we can compute the reflectance $R(\lambda)$ for each wavelength $\lambda$. This signature is a physical quantity of the object. When we look at an object, light which is reflected from the object enters the eye and is measured by the receptors inside the retina. Two types of receptors exist: rods and cones (Dowling 1987). The rods are used when very little light is available. They have a much higher sensitivity than the cones (Fain and Dowling 1973). The cones are used in bright light conditions. Three different types of cones can be distinguished which respond to light in the short, middle and long parts of the spectrum (Brown and Wald 1964; Marks et al. 1964). The blue, green and red cone pigments peak at 419 nm, 531 nm and 559 nm respectively. There appears to be some variance in the sensitivities of the red and green cones. In order to develop a computational theory for color perception, it pays to take a look at machine vision. Note that in machine vision one usually tries to estimate reflectance, whereas in computational modeling of color perception one tries to replicate how colors appear to a human observer. Even though human color perception correlates with integrated reflectance (McCann et al. 1976), the two problems are quite different. If we take a photograph of a scene, the digital camera measures the energy of the light, which is reflected from the objects contained in the scene, in three different parts of the spectrum: the short (blue), middle (green) and long (red) parts. Analog film can also be considered a measuring device with which light is measured in three parts of the spectrum. The measured energy depends on the sensitivity of the receptor or sensor.
Let $S_i(\lambda)$ be the sensitivity of sensor $i$ with $i \in \{r,g,b\}$; the measured energy $Q_i$ is then essentially given by

$Q_i = \int S_i(\lambda)\, I(\lambda)\, d\lambda.$

The response curves of an artificial sensor are usually modeled to have similar response characteristics as the receptors of the human retina. Once the reflected light is measured using three sets of cones or three types of artificial sensors, the result is a point in a three-dimensional space. We refer to this point as the cone response. The position of this point inside this three-dimensional space varies with the type of illuminant used to illuminate the object. Suppose that we are looking at a wall which reflects the incident light uniformly across all wavelengths, henceforth referred to as a white wall. If we illuminate the wall using white light, i.e. the irradiance is uniform over all wavelengths, then all of the sensors will respond equally strongly. Suppose that we now illuminate the same wall using light from a candle. A candle emits light mostly in the red and green parts of the spectrum. The sensor covering the red part of the spectrum will respond very strongly, the sensor covering the green part of the spectrum will also respond, and the sensor covering the blue part of the spectrum will hardly respond at all. Thus, the same wall will have a completely different color. In the first case, the white wall will appear to be white whereas in the second case the white wall will appear to be yellow or orange. We have just seen that the cone response varies with the type of illuminant used. In our model, a sensor at position $(x,y)$ is used to measure the energy $Q$ of the reflected light at wavelength $\lambda$. This energy is proportional to the object reflectance $R(x,y,\lambda)$ and also proportional to the irradiance $E(x,y,\lambda)$ falling onto the object depicted at position $(x,y)$, i.e. we have

$I(x,y,\lambda) = R(x,y,\lambda)\, E(x,y,\lambda)$

for wavelength $\lambda$. The response of a sensor is obtained by also taking into account the sensitivity of the sensor and by integrating over all wavelengths to which the sensor responds:

$c_i(x,y) = \int S_i(\lambda)\, R(x,y,\lambda)\, E(x,y,\lambda)\, d\lambda.$

Let us first suppose that the cone response $c_i(x,y)$ depends linearly on the measured energy, i.e. we assume that $c_i(x,y) = Q_i(x,y)$. The cone response actually seems to depend logarithmically on the measured energy; we will get back to that later on. The cone response, i.e. the measured energy, varies with the type of illuminant used. This is apparent to many amateur photographers all around the world. The images produced by a digital camera sometimes do not show the colors the same way a human observer perceived them. Sometimes the images have a very strong color cast because the automatic white balance did not estimate the color temperature of the illuminant correctly. Professional photographers are well aware of how the illuminant changes the overall look of an image. In the times of analog photography, filters were used to change the color balance (Hedgecoe 2004; Jacobsen et al. 2000). Today, the application of different filters can be simulated either by setting the white balance on the camera or during post-processing. With post-processing, it is possible to change the color balance such that the resulting photograph looks more natural, i.e. as if it were taken under a canonical illuminant with a uniform power distribution. The color observed by a human observer stays remarkably constant irrespective of the illuminant used (Zeki 1993). Similarly, the lightness of an object also appears to be constant (Adelson 2000).
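To make the image formation model above concrete, the following sketch simulates the white wall example numerically. The Gaussian sensitivity curves and the candle-like spectrum are invented purely for illustration; only the peak wavelengths (419 nm, 531 nm, 559 nm) are taken from the text.

```python
import numpy as np

lam = np.arange(380.0, 781.0, 1.0)   # visible wavelengths in nm
dlam = lam[1] - lam[0]

# Hypothetical Gaussian sensor sensitivities peaking at 419/531/559 nm.
def sensitivity(peak, width=40.0):
    return np.exp(-0.5 * ((lam - peak) / width) ** 2)

S = {"b": sensitivity(419), "g": sensitivity(531), "r": sensitivity(559)}

R_wall = np.ones_like(lam)           # white wall: uniform reflectance

E_white = np.ones_like(lam)          # uniform ("white") illuminant
# Crude stand-in for candle light: very little energy at short wavelengths.
E_candle = np.clip((lam - 380.0) / 400.0, 0.0, None) ** 2

for name, E in [("white", E_white), ("candle", E_candle)]:
    # c_i = integral of S_i(lambda) R(lambda) E(lambda) d lambda
    c = {i: np.sum(S[i] * R_wall * E) * dlam for i in "rgb"}
    n = max(c.values())
    print(name, {i: round(c[i] / n, 2) for i in "rgb"})
```

Under the uniform illuminant all three responses come out roughly equal; under the candle-like spectrum the blue response collapses, so the very same wall produces a very different cone response.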
The ability to perceive colors as approximately constant is called color constancy (Ebner 2007a). Going back to our example of the white wall, a human observer will say that the wall is white even if more light is reflected in the red and green parts of the spectrum than in the blue part. Land (1964; 1974) investigated this phenomenon in great detail. It is obviously of great interest to develop a computational model of the brain (Koch 1999) in general and of color perception in particular. If we had such a model, we could use it to improve color reproduction in digital photography. We could also use it for automatic object recognition in robotics or computer vision. We need to derive an algorithm which computes a descriptor that remains constant (or at least approximately constant) irrespective of the illuminant used. This algorithm has to be mapped to what is known about the visual system. A number of theoretical models have been proposed for color perception, e.g. (Judd 1940; Richards and Parks 1971; Dufort and Lumsden 1991). However, such psychophysical models of color perception do not explain how or why the color perceived by an observer would depend on either average apparent reflectance or average luminance. Instead of a phenomenological description of color vision we need a computational theory of color vision (Marr 1982) which allows us to simulate color perception computationally.

Since color constancy is very important for the machine vision community and also for the image processing community, quite a large number of color constancy algorithms have been developed. Ebner (2007a) gives a detailed introduction to the field of computational color constancy. Surveys are also given by Maloney (1999) and Agarwal et al. (2006). Land and McCann (1971) developed the Retinex theory. Since its original inception, many variants of the Retinex theory have been proposed (Horn 1974; Land 1986; Hurlbert 1986; Blake 1985; Rahman et al. 1999). Buchsbaum (1980) developed the gray world hypothesis. Van de Weijer et al. (2007) generalized the gray world assumption and introduced the gray edge hypothesis. Additional algorithms include gamut constraint methods (Forsyth 1988; Finlayson 1996; Barnard et al. 1997), color cluster rotation (Paulus et al. 1998), comprehensive normalization (Finlayson et al. 1998) as well as the computation of intrinsic images (Finlayson et al. 1998; Tappen et al. 2002). It appears that human color constancy also uses cues from mutual reflections or specular highlights (Hurlbert 1999). Algorithms based on a dichromatic reflection model, which also takes specular reflection into account, have been developed as well (Tominaga 1991; D'Zmura and Lennie 1992; Finlayson and Schaefer 2001; Risson 2003; Ebner and Herrmann 2005).

In order to compute a color constant image from the available data, some assumptions have to be made. Otherwise, the problem of computing the reflectances cannot be solved because we only have three known values (the cone responses) but more than three unknowns for every image pixel. A popular assumption is that the receptors only respond within a very narrow band. If the sensors are assumed to respond to only a single wavelength, then the integration is removed and the model of the response of the sensor is greatly simplified. Another frequently made assumption is that the illuminant is approximately constant within the image.
This is true if the light source is located far away from the objects. In practice, this assumption may be violated. Quite often we have multiple illuminants, such as sunlight falling through a window together with artificial illuminants within a room which are located near other objects. Several algorithms exist which also work in the presence of spatially varying illuminants (Land and McCann 1971; Land 1986; Faugeras 1979; Parker 1997; Finlayson and Hordley 2001; Barnard et al. 1997; Ebner 2004b). Most color constancy algorithms which have been developed by the image processing community are quite complex. It is not clear whether they can be mapped to what is known about the human visual system. Notable exceptions are the Retinex algorithm of Land and McCann (1971), together with the variants of Horn (1974), Blake (1985) and Rahman et al. (1999), and the parallel algorithms developed by Ebner (2004b; 2007b). All of these algorithms also work in the presence of a spatially varying illuminant. Before we describe our computational model of color perception, we briefly discuss relevant algorithms from the image processing community.

3. Algorithms for Color Constancy

Since the problem of color constancy is of considerable practical interest, quite a large number of color constancy algorithms have been developed. Land (1964) formulated the Retinex theory very early on. According to the Retinex theory, three different receptors are used in the retina which primarily respond to long, middle and short wavelengths. Thus, we have three sets of receptors, each measuring the energy in a different part of the visible spectrum. Land suggested that the sets of receptors process their data independently. At the time, it was not known whether the processing is done in the retina or in the cortex. This seems to have been resolved (Land et al. 1983): the processing most likely occurs inside the cortex. We will return to this point later. Land and McCann (1971) probably developed the first computational theory for color constancy. They showed that a color constant descriptor can be computed by multiplying ratios between adjacent photoreceptors across the image. This is called a sequential product. Land and McCann suggest that a color constant descriptor is computed using multiple paths through an image. They assume a logarithmic response of the sensors, in which case the sequential product turns into a sum. Whenever the sequential product gets larger than zero, it is reset to zero. The results from different paths which pass through the current point are averaged in order to estimate its reflectance. This operation averages the reflectance ratio between the reflectance of the current point and the reflectance of random samples surrounding the point (Brainard and Wandell 1986; Hurlbert 1986). If it is assumed that all paths pass through all of the image pixels, then the algorithm just described is simply a normalization with respect to the largest response (Brainard and Wandell 1986). Such a normalization with respect to the maximum response of the sensor is also known as the white patch Retinex algorithm. However, as pointed out by McCann (1989), the Land and McCann formulation of the Retinex algorithm does not use an infinite number of iterations and hence its behavior is quite different from that of the simple white patch Retinex algorithm.
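In this limiting case, the white patch Retinex algorithm amounts to a per-channel normalization. A minimal sketch, assuming a linear RGB image stored in a numpy array:

```python
import numpy as np

def white_patch(image, eps=1e-6):
    """Normalize each channel by its maximum response.

    image: float array of shape (height, width, 3), linear RGB.
    The brightest value in each channel is assumed to correspond
    to a fully reflecting (white) surface patch.
    """
    return image / (image.max(axis=(0, 1)) + eps)
```

As noted above, the full Land and McCann procedure with a finite number of iterations behaves quite differently; this is only the limiting case.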
Horn (1974) developed a two-dimensional algorithm operating on a grid. He suggested an iterative method to obtain the re-integrated image data. He also noted that a resistive grid may be used. This algorithm is shown in Figure 1. Blake (1985) proposed a slight modification of Horn's algorithm: the Laplacian is separated into two gradient operations and the threshold operation is applied in between the two gradient operations. A major drawback of the algorithm of Horn and the algorithm of Blake is that a threshold is required in order to determine whether a change between adjacent sensors is caused by a change in reflectance or by a change of the illuminant. In practice it is very difficult to set this threshold correctly.

Figure 1. Algorithm of Horn. First the logarithm is applied to linear data. Then the Laplacian is computed and a threshold operation is applied. A color constant descriptor is obtained after an integration step using a resistive grid. The algorithm shown is assumed to work on each color channel independently.

Land (1986) suggested an alternative variant of the Retinex theory. He proposed to compute color constant descriptors by first taking the logarithm of the input. Then the logarithm of the average color of the surround is subtracted from the logarithm of the input. The average color of the surround is computed using samples which are distributed around the given point. The density pattern of the samples varies as $1/r^2$, where $r$ is the radius measured from the position of the current point. In other words, the color constant descriptor $o_i$ is computed as

$o_i(x,y) = \log c_i(x,y) - \log\left( (c_i * k)(x,y) \right) \quad \text{with } k \propto 1/r^2,$

where $*$ denotes convolution. Frankle and McCann (1980) extended the Retinex algorithm to work on multiple resolutions. Rahman et al. (1999) used Gaussians of different sizes to compute a series of blurred images and perform color correction on multiple scales; the method is also used for dynamic range compression. The logarithm is applied to both the blurred images and to the input image. A set of weights may be used to enhance or lower the influence of a particular Gaussian. Instead of a Gaussian, other smoothing kernels may also be used. It is also possible to take the logarithm before applying the Gaussians, which results in the computation of a weighted product instead of a weighted sum.
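A sketch of this multi-scale scheme in the spirit of Rahman et al. (1999): at each scale, the logarithm of a blurred image is subtracted from the logarithm of the input, and the results are combined with weights. The particular scales and weights below are arbitrary illustrative choices, not values from the original paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def multiscale_retinex(image, sigmas=(15, 80, 250), weights=None, eps=1e-6):
    """image: float array of shape (height, width, 3), linear RGB."""
    weights = weights or [1.0 / len(sigmas)] * len(sigmas)
    out = np.zeros_like(image)
    for sigma, w in zip(sigmas, weights):
        # Blur each color channel independently at the current scale.
        blurred = np.stack(
            [gaussian_filter(image[..., i], sigma) for i in range(3)],
            axis=-1)
        # log(input) - log(blurred surround), weighted and accumulated.
        out += w * (np.log(image + eps) - np.log(blurred + eps))
    return out
```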
Many color constancy algorithms known from the literature are based on the gray world assumption or its generalized version, the gray edge hypothesis, in one way or another. Land, in his alternative formulation of the Retinex algorithm, suggested that the logarithm of the average color of the surround is subtracted from the logarithm of the input in order to compute a color constant descriptor. Land (1986) assumes this average is obtained by averaging input from several receptors. The algorithm of Rahman et al. (1999) also requires some form of averaging; the image is blurred using a convolution. The gray world assumption is due to Buchsbaum (1980). According to Buchsbaum, on average, the world is gray. Buchsbaum worked with overlapping response characteristics of the sensors. However, it is easier to understand how the gray world assumption works by assuming that the sensors are very narrow band, i.e. they respond only to a single wavelength. Let $c_i(x,y)$ be the measured response at position $(x,y)$ of the image for color channel $i$, i.e. wavelength $\lambda_i$. This response is proportional to the reflectance $R_i(x,y)$ at the corresponding object point and to the irradiance $L_i(x,y)$ at the corresponding object point:

$c_i(x,y) = R_i(x,y)\, L_i(x,y).$

In order to obtain an estimate of the illuminant, Buchsbaum assumed that the illuminant is constant over the entire image. In this case we have $L_i(x,y) = L_i$, which gives us

$c_i(x,y) = R_i(x,y)\, L_i.$

We now see that the illuminant $L_i$ scales the reflectances. Once we have an estimate of the illuminant, we can compute a color constant descriptor by dividing the measured color by this estimate. The estimate of the illuminant is obtained by computing space average color over all image pixels. Global space average color $a$ of an image with $n$ pixels is given by

$a = \frac{1}{n} \sum_{x,y} \left[ c_r(x,y), c_g(x,y), c_b(x,y) \right]^T$

with

$a_i = \frac{1}{n} \sum_{x,y} c_i(x,y) = L_i\, \frac{1}{n} \sum_{x,y} R_i(x,y).$

In order to solve this equation for $L_i$, we need to make some additional assumptions. We assume that the scene we are looking at contains several differently colored objects. We do not know anything about the objects which will be contained in the image. Therefore, we assume that all possible colors of the objects are equally likely, in other words, that the reflectances are uniformly distributed over the range $[0,1]$. If the image contains a sufficiently large number of differently colored objects, then we can replace the average reflectance by its expected value

$E[R_i] = \frac{1}{2}.$

Once we substitute this into the above equation, we immediately see how the color of the illuminant can be estimated from global space average color: $a_i = \frac{1}{2} L_i$, i.e. $L_i = 2 a_i$. We can obtain a color constant descriptor $o_i(x,y)$ by dividing the cone response by twice the global space average color:

$o_i(x,y) = \frac{c_i(x,y)}{2 a_i} = \frac{R_i(x,y)\, L_i}{2 \cdot \frac{1}{2} L_i} = R_i(x,y).$

Actually, it is sufficient to simply divide the cone response by global space average color to obtain a color constant descriptor; the constant factor 2 just scales the result. This method is often used by machine vision algorithms assuming linear sensor responses once the illuminant has been estimated.
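The derivation translates directly into code. A minimal sketch of the global gray world algorithm, assuming linear narrow-band sensor responses:

```python
import numpy as np

def gray_world(image, eps=1e-6):
    """image: float array of shape (height, width, 3), linear RGB.

    Global space average color per channel estimates half the
    illuminant, so dividing by twice the average yields an
    approximately color constant descriptor o_i = c_i / (2 a_i).
    """
    a = image.mean(axis=(0, 1))       # global space average color a_i
    return image / (2.0 * a + eps)
```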
4. Human Color Perception

As of now, we do not yet know how the human visual system actually computes color constant descriptors. However, it may soon be possible to develop tests which validate or invalidate the different theories. Here, a computational model for color constancy is presented and it is shown how the individual stages of this model map to what is known about the human visual system. Below, we will show how this model explains several different visual phenomena. We do know that visual area V4 is very important for color perception. Inside V4, cells have been found which respond to a particular colored patch irrespective of the illuminant used (Zeki 1993; Zeki and Marini 1998). Area V4 can be subdivided into two sub-areas, V4 and V4α (Zeki and Bartels 1999). The difference between these two areas is that V4 seems to have a retinotopic organization whereas area V4α does not. The receptive fields of cells found inside visual area V4 are rather large. Some of these cells may be involved in the computation of some kind of average, i.e. local or global space average color. That would explain why their receptive fields are so large. Ebner (2004a; 2004b; 2006) has shown that local space average color can also be computed iteratively. This average may then be used as a local estimate of the illuminant, as described below. Even though only local connections are required, as we will see below, we need to exchange data between the two hemispheres of the brain in order to compute local space average color across the vertical meridian. This requires that cells along the vertical meridian are connected somehow. V4 is the first visual area having callosal connections, thus enabling information exchange between the left and right hemispheres. We will see below that local space average color can be used to compute a color constant descriptor. The use of local space average color instead of global space average color has the advantage that a color constant descriptor is also obtained in the presence of a non-uniform, smoothly changing illuminant. Local space average color could be subtracted in V4 from a descriptor which is essentially based on the cone response but is located inside a different, rotated coordinate system. The output would be a color constant descriptor which responds to the color of an object irrespective of the wavelength composition of the light reflected by the object. Experiments done by Helson (1938) point to the use of color shifts. His experiments have shown that the background has an impact on the perceived color of a gray sample. If the patch is brighter than the background, it will appear to have the color of the illuminant. If the patch is darker than the background, then it will have the complementary color of the illuminant. This points to the use of color shifts. We will now describe how local space average color can be computed using a grid of processing elements, i.e. neurons.

5. Iterative Computation of Local Space Average Color

Suppose that we are given a grid of processing elements, i.e. neurons. We assume that we have one processing element per image pixel and that each processing element is connected to its nearest neighbors. The neighborhood of each processing element is defined using $N(x,y)$, the set of processing elements connected to the element located at position $(x,y)$ of the image:

$N(x,y) = \{ (x',y') \mid (x',y') \text{ is a neighbor of element } (x,y) \text{ and } c(x,y) > \theta \}$

where $\theta$ is 0 or another small value. The task of each processing element is to compute local space average color

$a(x,y) = [a_r(x,y), a_g(x,y), a_b(x,y)].$

Suppose that each processing element already has an estimate of local space average color $a(x,y)$. At this point, the value of this estimate is arbitrary; we will see below why the initial estimate may be arbitrary. One then iterates the following update equations:

$\bar a_i(x,y) = \frac{1}{|N(x,y)|} \sum_{(x',y') \in N(x,y)} a_i(x',y')$

$a_i(x,y) = (1-p)\, \bar a_i(x,y) + p\, c_i(x,y)$

A small percentage $p$ is used for the second update operation. The first operation reads the local space average color stored in neighboring elements and averages this data, provided that the measured color is larger than a threshold $\theta$, e.g. 0 or another small value. The second operation takes the color measured at the current element and slowly fades this color into the average; it is simply a weighted average between the measured color and the data averaged from the neighboring elements. The result is a new and better estimate of local space average color. This process converges to local space average color while the original content slowly fades away. The data basically diffuses between neighboring elements. The properties of this diffusion process are determined by the parameter $p$. For a stationary input $c(x,y)$, the result is independent of the initialization of $a(x,y)$. The parameter $p$ defines the extent over which local space average color is computed. If we set $p$ to a small value, then local space average color will be computed over an extensive area. If we set $p$ to a relatively large value, local space average color will be computed over a very small area.
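A sketch of this iterative scheme with one processing element per pixel and 4-neighbor averaging. Note that np.roll wraps around at the image border, which a real implementation (or biology) would handle differently; the threshold test on c is omitted for brevity.

```python
import numpy as np

def local_space_average(image, p=0.0005, iterations=2000):
    """Iteratively compute local space average color a(x, y).

    image: float array of shape (height, width, 3).
    p: fade-in factor; a smaller p averages over a larger area.
    The initialization is arbitrary (zeros here); for a stationary
    input the converged result does not depend on it.
    """
    a = np.zeros_like(image)
    for _ in range(iterations):
        # Average the estimates of the four nearest neighbors.
        neighbors = (np.roll(a, 1, 0) + np.roll(a, -1, 0) +
                     np.roll(a, 1, 1) + np.roll(a, -1, 1)) / 4.0
        # Slowly fade the measured color into the average.
        a = (1.0 - p) * neighbors + p * image
    return a
```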
The computation performed by averaging data from neighboring elements and then adding some measured data is similar, though not identical, to the convolution of the measured data with an exponential kernel (Ebner 2007a):

$k(x,y) \propto e^{-(|x|+|y|)/\sigma}.$

The scaling parameter $\sigma$ is determined by the parameter $p$. If we convolve the input image with an exponential kernel and scaling parameter $\sigma$, we obtain an image which is very similar to the image computed by the grid of processing elements using the algorithm described above with parameter $p$. When we look at the computations which are performed by the grid of processing elements, we see that relatively simple computational operations are required. In fact, it is possible to use a resistive grid to compute local space average color. Such a resistive grid is shown in Figure 2. Adjacent points of the grid are connected with resistance $R$. The input is fed into the system from the top. An input resistance $R_0$ is used which connects the input and the grid points. The output of the resistive grid is available at the node points. It is simply a spatially smoothed version of the input image. The relationship between the parameter $p$, the input resistance $R_0$ and the resistance $R$ which connects neighboring grid points is given by $R_0/R = (1-p)/(4p)$.

In the model, cells located within V4 are resistively coupled in order to obtain a measure of the illuminant. Cells within V4 of the left and right hemispheres have to be resistively coupled across the vertical meridian. It is assumed that this happens through the corpus callosum. If these connections are severed, local space average color is computed within each hemisphere separately. In this case, the color constant descriptor is computed using only data from one half of the image.

Figure 2. A resistive grid can be used to compute local space average color. The relationship between the parameter p, the input resistance R0 and the resistance R which connects neighboring grid points is given by R0/R = (1-p)/(4p).

Land et al. performed an experiment with a human subject whose callosal connections had been cut. His color perception differed from the color perception of a normal subject. Hence, an intact corpus callosum is required for accurate color perception. This also makes sense algorithmically: the connections across the vertical meridian are required to allow for an exchange of data between the left and right hemispheres of the brain. In the experiment of Land et al. (1983), a subject with cut callosal connections fixated a purple test region as shown in Figure 3 and had to report on the color of this test region. Note that the language center is located in the left hemisphere of the brain. The subject with the cut callosal connections perceived the test region as white whereas the normal subject perceived the test region as purple. If the light in the region around the purple test region was attenuated using neutral density filters, then both the normal person and the person with the cut corpus callosum perceived the purple test region as white. The algorithm described here reproduces the behavior observed in the experiment of Land et al. (1983).

Figure 3. (a) Color Mondrian normally illuminated. (b) Color Mondrian with attenuated illumination.
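As a practical shortcut, the kernel equivalence noted at the beginning of this section can replace the iteration: convolving each channel with a separable exponential kernel gives nearly the same result as the converged grid. The kernel size and σ below are illustrative choices.

```python
import numpy as np
from scipy.ndimage import convolve1d

def exponential_blur(image, sigma=50.0, radius=None):
    """Approximate local space average color by convolution.

    Convolving rows and columns with exp(-|x|/sigma) yields the
    separable kernel exp(-(|x|+|y|)/sigma) described above.
    image: float array of shape (height, width, 3).
    """
    radius = radius or int(4 * sigma)
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-np.abs(x) / sigma)
    k /= k.sum()                          # preserve overall intensity
    out = convolve1d(image, k, axis=0)    # smooth columns
    return convolve1d(out, k, axis=1)     # then rows
```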
6. Color Constancy Based on Local Space Average Color

Ebner (2004a) has shown that the gray world assumption may also be applied locally. We can use local space average color to estimate the color of the illuminant for each image pixel $(x,y)$:

$L_i(x,y) \approx 2\, a_i(x,y).$

This estimate can then be used to compute a color constant descriptor. Since we obtain an estimate of the illuminant locally for each image pixel, we are now able to correct for a spatially varying illuminant, provided that the environment is sufficiently diverse. A color constant descriptor $o_i$ can be computed by dividing the measured color by twice local space average color:

$o_i(x,y) = \frac{c_i(x,y)}{2\, a_i(x,y)} = \frac{R_i(x,y)\, L_i(x,y)}{2\, a_i(x,y)} \approx R_i(x,y).$

It is obvious that the algorithm only works well if the assumptions made by this method are fulfilled. For the derivation of this algorithm, we assumed that a large number of different colors is contained in the scene. Figure 4(a) shows a leaf from a banana plant. The output color is shown in Figure 4(b). The output color was obtained by computing local space average color using an exponential kernel and dividing the measured color by twice local space average color. The output is surprisingly good even though the original image has a strong color bias which should not be removed. Indeed, it appears that human color perception behaves similarly when only an isolated patch is viewed, e.g. through a tube (Land 1974).

Figure 4. (a) Input image showing a leaf from a banana plant. (b) Output image, computed by dividing the measured data by twice local space average color.

Figure 5 shows the results for a natural scene. The image shown in Figure 5(a) was taken with a Canon 10D. The white balance was set to a color temperature of 6500 K. The image is very blueish due to the presence of a blueish illuminant. Before processing, the standard sRGB transform was used to transform the image to a linear RGB color space. The estimated illuminant is shown in Figure 5(b). Figure 5(c) shows the output image. Again, local space average color was computed using an exponential kernel and the measured color was divided by twice local space average color. The kernel extended over one third of the image. We see that the color cast is removed nicely. Figure 5(d)-(f) show the results for a non-uniform illuminant.

Figure 5. (a) Input image showing an office scene. (b) Estimate of the illuminant (local space average color). Local space average color was computed using an exponential kernel which extends over one third of the image. (c) Output image, computed by dividing the measured data by twice local space average color. (d)-(f) Results for an input scene illuminated by a non-uniform illuminant.

Algorithms based on the computation of local space average color have been shown to work very well when evaluated on a large image database containing objects with different reflectance characteristics: matte, specular, metallic or fluorescent (Ebner 2009). The database contained photographs of different scenes, each illuminated by several different illuminants. For each scene, two photographs were randomly chosen and a color constancy algorithm was applied. The task was to automatically match photographs of the same scene which were illuminated by different illuminants. A perfect color constancy algorithm would produce exactly the same output for both illuminants and matching would therefore be quite easy. It turned out that color constancy algorithms based on local space average color performed very well on this task.
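Putting the estimate to use: a sketch of local color constancy, reusing the exponential_blur function from the previous section, with the kernel extending over roughly one third of the image width as in Figure 5.

```python
def local_gray_world(image, eps=1e-6):
    """Color constant descriptor from a locally estimated illuminant.

    o_i(x, y) = c_i(x, y) / (2 a_i(x, y)); because the estimate is
    per pixel, smoothly varying illuminants are corrected as well.
    """
    # Local space average color; kernel extent of about one third
    # of the image width is an illustrative choice.
    a = exponential_blur(image, sigma=image.shape[1] / 3.0)
    return image / (2.0 * a + eps)
```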
7. Usage of Color Shifts

Helson performed experiments investigating the color perception of human subjects looking at achromatic samples which are illuminated by colored light (Helson 1938). The color which is perceived depends on the color of the illuminant and also on the color of the background. A bright gray patch located on a gray background, i.e. a background of intermediate uniform reflectance, will appear to have the color of the illuminant. In contrast, a dark gray patch on the same background will have the complementary color of the illuminant. Patches of intermediate reflectance similar to the background will appear achromatic. Obviously, a computational algorithm for color perception should be able to reproduce this behavior. If we look at the algorithms of Land (1986), Horn (1974, 1986), Moore et al. (1991) and Rahman et al. (1999), we see that these algorithms do not reproduce this behavior. The reason is that these algorithms compute the ratio between the color of the pixel and local space average color. As soon as we compute the ratio of the illuminant to local space average color (assuming narrow-band receptors), the color of the illuminant drops out of the equation. Hence, achromatic stimuli will always appear to be achromatic irrespective of the illuminant used. Ebner (2007a) gives a more extensive discussion of the behavior of these algorithms on the different stimuli. The behavior observed by Helson, that a bright gray patch on a gray background appears to have the color of the illuminant and that a dark gray patch appears to have the complementary color of the illuminant, points to the use of color shifts, i.e. the average color of the scene is subtracted from the color of the sample.

Ebner (2004a) developed a color constancy algorithm which is based on color shifts. A color constant descriptor is obtained by subtracting, from the measured color of each pixel, the component of local space average color which is perpendicular to the gray vector. The gray vector runs from black to white through the color space. This operation moves local space average color onto the gray vector while maintaining the average intensity of local space average color. This is in line with the gray world assumption, which states that on average the world is gray. If we compute local space average color and this computed color is not located on the gray vector, it needs to be shifted onto the gray vector such that the gray world assumption is fulfilled. Let $w = \frac{1}{\sqrt{3}} [1,1,1]^T$ be the normalized gray vector. Let $c = [c_r, c_g, c_b]^T$ be the measured color of an image pixel. Let $a = [a_r, a_g, a_b]^T$ be the local space average color computed for the neighborhood of the same image pixel. First, we compute the component $a_\perp$ of local space average color which is perpendicular to the gray vector. We obtain this component by projecting the vector $a$ onto the gray vector $w$ and subtracting the result from $a$:

$a_\perp = a - (a \cdot w)\, w.$

The vector $a_\perp$ points from the gray vector to the local space average color $a$ which was computed for the given image pixel. If we subtract $a_\perp$ from the measured color $c$, we obtain a color constant descriptor $o$:

$o = c - a_\perp.$

This operation is illustrated in Figure 6 for two vectors $c$ and $a$. The operation can be simplified by looking at the individual components, i.e. the color channels:

$o_i = c_i - a_i + \frac{1}{3}(a_r + a_g + a_b).$

Writing $\bar a = \frac{1}{3}(a_r + a_g + a_b)$, we obtain

$o_i = c_i - a_i + \bar a.$

In other words, local space average color is subtracted from the measured color, and the average intensity $\bar a$ of the components of local space average color is added back in order to maintain the average intensity of the measured color.
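The projection and shift amount to a few array operations per pixel. A sketch, with c and a as arrays of measured colors and local space average colors:

```python
import numpy as np

def color_shift_descriptor(c, a):
    """o = c - a_perp, where a_perp is the component of local space
    average color perpendicular to the gray vector.

    c, a: float arrays of shape (height, width, 3).
    Per pixel this is equivalent to o_i = c_i - a_i + mean_j(a_j).
    """
    a_mean = a.mean(axis=-1, keepdims=True)   # projection onto gray vector
    a_perp = a - a_mean
    return c - a_perp
```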
The result of this operation is that the measured color is moved in the direction of the difference between local space average color and its projection onto the gray vector. Local space average color is shifted onto the gray vector, and a color cast is removed. This algorithm shows the behavior which was described by Helson.

Figure 6. Local space average color is pushed onto the gray vector by subtracting the component a⊥ from the measured color c. The component a⊥ runs perpendicular to the gray vector w. The result of this operation is that a color cast is removed from the image while the average intensity is maintained.

A bright gray patch on a gray background will have the color of the illuminant because the average color is heavily influenced by the background. It will not appear achromatic but will show some of the color of the illuminant. Similarly, a dark gray patch on a gray background will have the opposite color of the illuminant because the measured color of a point inside the patch is closer to the gray vector than the computed local space average color. Hence, it is pushed too far and receives the opposite color.

8. A Computational Theory of Color Perception

We now discuss the individual stages of our model, shown in Figure 7.

Figure 7. Proposal for a computational model for color constancy. The retinal receptors are assumed to have a logarithmic response. Cells within and up to V1 provide a rotation of the coordinate system. A resistive grid which is probably part of V4 computes local space average color. Local space average color is subtracted from the measured color in order to arrive at a color constant descriptor. For dynamic input, temporal integration also has to be considered.

Processing of visual information starts with the cones inside the retina. The three types of cones, assuming a subject with normal vision, measure the incident light inside the red, green and blue parts of the spectrum. The model assumes that the response of the receptors is logarithmic. Faugeras (1979) also proposed a logarithmic relationship. Let us assume that the first step in the processing pipeline is the application of a logarithmic or other closely related function. This is the retinal stage. The next stage is a rotation of the coordinate system, which occurs up to and including area V1. This stage would be due to the presence of color opponent cells. Color is now described inside a rotated coordinate system with the three axes red-green, blue-yellow and black-white. Within this rotated coordinate system, local space average color would be computed using interconnected neurons. The neurons only have to be connected to other neurons which also compute local space average color in their vicinity. They would simply have to be resistively coupled to neighboring neurons in order to obtain a low-pass filter which essentially computes local space average color. Connections across the corpus callosum could provide for a coupling between the left and right hemispheres of the brain. The resistive grid would either be located inside V4 or in an area providing connections to V4. Gap junctions behave mainly as pure resistors (Herault 1996). Amacrine cells of the mammalian retina have been found to be resistively coupled through gap junctions and to provide a low-pass filter (Veruki and Hartveit 2002).
Galarreta and Hestrin (1999) report electrical coupling between neurons inside the visual cortex. Another way to form a resistive grid would be to use the input resistance of the dendritic branch (Di Maio 2007). The following derivation is given for simplicity using the RGB color space; however, it also holds for any rotated coordinate system. Given that we now have local space average color at our disposal, we can compute a color constant descriptor $o$ by subtracting local space average color $a$ from the measured color $\log c$ inside the rotated coordinate system:

$o_i(x,y) = \log c_i(x,y) - a_i(x,y)$

with

$c_i(x,y) = R_i(x,y)\, L_i(x,y)$

and, for an illuminant that is approximately constant over the averaged neighborhood,

$a_i(x,y) = \langle \log c_i(x,y) \rangle = \langle \log R_i(x,y) \rangle + \log L_i(x,y).$

Assuming that the local average of the log reflectances $\langle \log R_i(x,y) \rangle$ is constant, we obtain

$o_i(x,y) = \log R_i(x,y) - \langle \log R_i(x,y) \rangle,$

which is independent of the illuminant and depends only on the reflectance $R_i$ of the object. This is the most likely architecture for human color constancy. As the resolution of brain imaging methods increases, it may be possible to test whether one or the other method is used by the visual system. The first step towards verifying or disproving this theory would be to look for networks of resistively coupled neurons.
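To summarize the stages of Figure 7 in executable form, here is a sketch of the whole pipeline. The orthonormal opponent matrix below is one plausible choice of black-white, red-green and blue-yellow axes, picked for illustration rather than taken from physiology; the resistive grid is approximated by the exponential_blur sketch from Section 5, and the kernel extent is an arbitrary choice.

```python
import numpy as np

# One possible orthonormal opponent basis; rows are black-white,
# red-green and blue-yellow axes (an illustrative assumption).
M = np.array([[1.0,  1.0,  1.0],
              [1.0, -1.0,  0.0],
              [1.0,  1.0, -2.0]])
M /= np.linalg.norm(M, axis=1, keepdims=True)

def color_constant_descriptor(image, eps=1e-6):
    """Sketch of the model in Figure 7.

    Retina: approximately logarithmic response.
    V1: rotation into an opponent coordinate system.
    V4: subtraction of local space average color, computed here by
    exponential_blur as a stand-in for the resistive grid.
    """
    log_c = np.log(image + eps)       # retinal stage
    opp = log_c @ M.T                 # rotated coordinate system
    a = exponential_blur(opp, sigma=image.shape[1] / 3.0)
    return opp - a                    # o = log c - a
```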
9. Discussion

In order to fully understand how the human visual system works, we need a computational theory. Theoretical neuroscience still has a long way to go before we can truly say that we have understood what the brain computes (see Carandini et al. 2005). We basically need a description of how the visual information is processed computationally. Given such a description, we would be able to replicate the results in a simulated visual system on a computer. Judd (1940) as well as Richards and Parks (1971), among others, developed theoretical models for color perception. However, these models are psychophysical models of color perception. It is not clear why perceived color would depend on either average apparent reflectance or average luminance. Their models do not describe how the data measured by the receptors inside the retina is mapped to a particular color. They are phenomenological descriptions of color vision. It is clear that we need a computational theory of color vision, as was also suggested by Marr (1982). Smithson (2005) gives an overview of many computational algorithms and also discusses whether cortical mechanisms, i.e. within V4, are involved in color constancy. According to Smithson, color constancy processing starts in the retina, is then enhanced in V1/V2, and finally continues in V4. This view is shared here. Many different algorithms have been developed in the computer vision community. However, for many of those algorithms it is not clear how they could be mapped to what is known about the visual system. Given that the brain operates in a highly parallel manner, we should look for a solution which arrives at a color constant descriptor through parallel computations. Among the algorithms which would lend themselves to a biological realization are the original Retinex algorithm of Land and McCann (1971), as well as the variants developed by Land (1986), Horn (1974) and Blake (1985). Algorithms developed by Ebner (2004a, 2004b, 2006) are simpler and more robust than other algorithms and also operate in parallel. In principle, each of these algorithms could be realized using the massively parallel neural architecture of the human visual system. Linnell and Foster (1997) made the suggestion that observers mainly use space average color in order to estimate the color of the illuminant in Mondrian scenes. However, the color of the highest luminance patch may also be used to a certain extent. It has been argued by McCann (1997) that in the human visual system color is basically determined by normalizing each color channel independently to the maximum in the field of view.

Currently, we do not know how color is processed inside V4. Local space average color could be computed either in space or in time (Hurlbert 1986). Several algorithms require an integration over time, e.g. the algorithms of Horn (1974), Blake (1985) and Ebner (2004a, 2004b, 2006). All that is required for an integration over time are recurrent neurons which are connected to their nearest neighbors, as we have seen above. Another way to compute local space average color would be to apply Gaussian blur. This could be performed in several stages, where the lowest stage blurs the image only very slightly, the next stage again blurs the output of the first stage, and so on. If this method were implemented, then the neurons would form a hierarchy where the neurons at the first stage have very small receptive fields. As we move up the hierarchy, the size of the receptive fields would increase. The neuron at the top of the hierarchy would receive a very strongly blurred image as input, which would be just the space average color which we need to compute a color constant descriptor. This would be similar to the method suggested by Rahman et al. (1999), who proposed using Gaussian kernels of different scales and performing a color correction at multiple scales. Another, completely different, way to estimate the color of the illuminant would be to use the rods in the periphery of the retina, as suggested by Hurlbert (1986). The rods in the periphery could be used to compute a spatial average over the image boundary. Yet another method is suggested by D'Zmura and Lennie (1986, 1992). They suggested that color constancy might be due to an adaptation mechanism. As the eyes, head and body move, the retina is exposed to different parts of the scene. In this scenario, space average color would be computed in the course of time by averaging the data per receptor as the retina is exposed to different parts of the scene. The perceived color would then depend on how much the measured color differs from this adapted state. However, Land and McCann (1971) performed color constancy experiments with very short exposure times. Their experiments show that the ability to perceive colors as constant does not depend on long exposure times. The phenomenon of color constancy exists even if the image is only perceived for a fraction of a second.

The visual system is a product of natural evolution. The ability to perceive colors as constant definitely provided an evolutionary advantage compared to individuals without this ability. An interesting question in this respect is which solution would be preferred by natural evolution. It seems likely that natural evolution would use what is already in place and adapt it in order to improve it. In fact, Ebner (2006) has shown that it is possible to evolve a parallel algorithm for color constancy operating on a grid of processing elements. The visual system is highly parallel. Hence, it seems likely that one of the algorithms of Land (1986), Horn (1974) or Ebner (2004a, 2004b) is employed. For these algorithms, only local connections are required. A hierarchy of neurons, which could also be used to compute a blurred image, would require a much larger and possibly unnecessarily large neural architecture.
If a color constancy algorithm can be realized by relatively simple means, why would evolution favor a more complicated architecture? Let us now have a look at how the different algorithms could be mapped to what is known about the visual system. The algorithm of Horn (1974) first applies the Laplacian operator. We could use local differencing followed by an averaging operation to construct a Laplacian operator. The output of the Laplacian operator is a color constant descriptor because the response of the receptors is logarithmic, or nearly logarithmic (Herault 1996). However, it does not describe the color of an object; it does not describe the relative reflectance of a patch compared to other parts of the scene. The next step in the algorithm of Horn is a thresholding operation. The output is then integrated to obtain relative reflectance. Given what we know so far, this integration could be performed inside V4 using a grid of resistively coupled neurons. Livingstone and Hubel (1984) discuss how the first stages of this algorithm, i.e. up to the integration, could be mapped to what is known about the visual system. They assume that the Retinex algorithm is applied inside a transformed coordinate system. The three cone channels red, green and blue are transformed either to a longitude-latitude spherical polar coordinate system or to a rotated coordinate system. Inside a spherical coordinate system, the radius would denote the dark-light scale, longitude the red-green axis and latitude the blue-yellow axis. Land (1986) also noted that the Retinex algorithm can be applied inside a rotated coordinate system. The Retinex algorithm of Land and McCann (1971) as well as the two-dimensional variants of Horn (1974) and Blake (1985) require a thresholding operation. Algorithms based on a threshold are usually not very robust: however we set the threshold, we are going to make mistakes. In this case, a change of illuminant could be taken as a change of reflectance and vice versa. The algorithm of Ebner (2004a, 2004b) does not require a threshold operation. In order to map this algorithm to the visual system, we assume that the cone signals (with a logarithmic response) are first transformed into a rotated coordinate system. Inside this rotated coordinate system, the data would then be averaged. The result, local space average color inside a rotated coordinate system, would be subtracted from the measured color, which has also been transformed to the rotated coordinate system. Local space average color would be computed using resistively coupled neurons. The architecture of this algorithm is shown in Figure 7. Livingstone and Hubel (1984) note that cells found inside the blobs of V1 could act as building blocks which contribute to long-range interactions occurring in V4. Neither the model of Horn (1974) nor the model of Ebner (2004a, 2004b) requires long-range interactions (apart from the callosal connections); only local connections which behave like a resistor between neighboring neurons are required. Experiments investigating the phenomenon of color constancy have shown that the color of a given point depends on the color patches in the surrounding area. This is most likely due to the iterative propagation of data from one cell to the next, i.e. because of the resistive coupling.
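For comparison, a sketch of the Horn-style pipeline discussed above: logarithm, Laplacian, threshold, then iterative reintegration. Jacobi iterations on the discrete Poisson equation are used here, with periodic borders via np.roll purely for brevity; the threshold value and iteration count are arbitrary illustrative choices.

```python
import numpy as np

def horn_lightness(channel, threshold=0.05, iterations=2000, eps=1e-6):
    """One color channel, float array (height, width), linear data."""
    log_c = np.log(channel + eps)

    def laplacian(f):
        return (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
                np.roll(f, 1, 1) + np.roll(f, -1, 1) - 4.0 * f)

    d = laplacian(log_c)
    # Discard small values (attributed to the slowly varying
    # illuminant); keep large values (reflectance edges).
    d[np.abs(d) < threshold] = 0.0

    # Reintegrate: solve laplacian(r) = d by Jacobi iteration.
    r = np.zeros_like(log_c)
    for _ in range(iterations):
        r = (np.roll(r, 1, 0) + np.roll(r, -1, 0) +
             np.roll(r, 1, 1) + np.roll(r, -1, 1) - d) / 4.0
    # Fix the free constant by anchoring the brightest point (white).
    return r - r.max()
```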
So far, the computational algorithm shown in Figure 7 has been shown to be in line with Helson's (1938) results on the color perception of gray patches illuminated with a non-uniform spectrum (Ebner 2007a). Above, we have illustrated how this algorithm behaves for a simulated observer where the connections of the resistive grid have been cut across the vertical meridian. Thus, the above algorithm is also in line with the behavior of a patient who had his callosal connections cut, viewing an illuminated color Mondrian (Land et al. 1983). Recently, Werner has shown that color constancy improves when an object moves (Werner 2007). Werner conducted experiments with subjects looking at colored as well as black-and-white checkerboards. Observers had to judge whether or not an achromatic test patch actually appeared achromatic under different illuminants. Experiments were conducted with a static test patch in front of a static checkerboard pattern and also with a test patch which was moving from right to left. In the latter case, color constancy improved. Ebner (2012a, 2012b) has shown that the above algorithm is able to explain this behavior. The only additional assumption is that there is a slight offset $(dx,dy)$ in the computation of the color constant descriptor $o_i$, i.e. the computation is

$o_i(x,y) = \log c_i(x,y) - a_i(x+dx,\, y+dy).$

Koenderink and van Doorn (1999, 2000) have shown that local disordering of image data may have certain advantages. A small offset is also very likely, given that the brain is a developmental system.

10. Conclusions

Several theories of color vision have been put forward. Most of them are phenomenological descriptions and it is not clear how they could be mapped to what is known about the visual system. What we need is a computational theory of color perception which can also be mapped to what is known about the visual system. Helson's psychophysical experiments point to the use of color shifts. The computational theory presented above is very simple and is based on color shifts. It is very effective at computing color constant descriptors. It is based on the computation of local space average color using a resistive grid. Local space average color is used to obtain an estimate of the illuminant locally for each point of the scene. Since the illuminant is estimated locally, the method is also able to cope with multiple illuminants, i.e. a spatially varying illumination of the scene. The resistive grid is formed through locally resistively coupled neurons. The long-range connections through the corpus callosum are assumed to connect adjacent neurons (adjacent with respect to the receptive fields of the neurons). This provides an exchange of data between the left and right hemispheres of the brain. Some neurons of V4 are assumed to receive input from this resistive grid as well as from V1, which provides the original retinal data within a rotated coordinate system. The receptors of the retina provide a logarithmic response. Local space average color is assumed to be subtracted from the logarithmic retinal response, i.e. all that is needed is a negative coupling between the system computing local space average color and the system providing the input signal. A color constant descriptor is obtained as a result. This computational algorithm is able to explain Helson's results (Helson 1938; Ebner 2007a), the results of Land et al. (1983) on a patient who had his callosal connections cut, and Werner's results on why color constancy seems to improve for moving objects (Ebner 2012a; Ebner 2012b).
Acknowledgement. John McCann provided very helpful comments to improve this paper.

Journal: Bio-Algorithms and Med-Systems (de Gruyter). ISSN: 1895-9091. DOI: 10.1515/bams-2012-0028.
Published: Dec 1, 2012
