The sense of vision plays a crucial role in primates, allowing them to retrieve a representation of the environment that is suitable for performing vital control tasks such as moving within the environment, tracking and manipulating objects, and recognizing them.
 
Any imaging system, such as the eye, a video camera or a telescope, entails a map of the three-dimensional environment onto the two-dimensional surface of an imaging sensor. Images produced by any such map are characterized by a loss of information along one spatial dimension. Despite this loss, humans are extremely efficient at recovering spatial information from images. In the following table, the same data are displayed as a two-dimensional array of positive values, as images are represented in a computer, and as a brightness map, a representation far better suited for interpretation by our visual system.
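
The loss of information along one spatial dimension can be illustrated with a minimal sketch. It assumes an ideal pinhole camera (a model not named in the text above), in which a point (X, Y, Z) maps to the image point (fX/Z, fY/Z); the function name `project` and the parameter `focal_length` are hypothetical and chosen only for this illustration.

```python
def project(point, focal_length=1.0):
    """Project a 3-D point (X, Y, Z) onto the image plane of an
    ideal pinhole camera (assumed model), returning (x, y)."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (focal_length * X / Z, focal_length * Y / Z)


# Two different points on the same viewing ray through the pinhole.
near = (1.0, 2.0, 4.0)
far = tuple(2.0 * c for c in near)  # same direction, twice the distance

print(project(near))  # (0.25, 0.5)
print(project(far))   # (0.25, 0.5)
```

Both points land on the same image location, so the distance along the viewing ray cannot be read off a single image without further assumptions.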

Many of these images convey a vivid impression of the three-dimensional shape of the scene being imaged. This is so because we rely on very strong prior assumptions about the scene. For instance, assuming that the scene contains surfaces with uniform reflectance properties allows one to associate changes in image brightness (shading) with three-dimensional shape. In this sense, shading is a "cue" to three-dimensional shape. Indeed, shading is just one example of the so-called PICTORIAL CUES, that is, measurable properties of an image that, combined with prior assumptions about a scene, contain information about its three-dimensional shape. Other examples include texture, T-junctions, cast shadows and blur.
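
As a rough illustration of how the uniform-reflectance assumption ties brightness to shape, the sketch below uses the Lambertian shading model with a single distant light source. Both the model and the names `lambertian_brightness`, `normal`, `light` and `albedo` are assumptions made for this example, not something specified in the text above.

```python
import math


def lambertian_brightness(normal, light, albedo=1.0):
    """Brightness of a surface patch under the (assumed) Lambertian model:
    albedo times the cosine of the angle between the surface normal and
    the light direction, clamped at zero for patches facing away."""
    dot = sum(n * l for n, l in zip(normal, light))
    n_len = math.sqrt(sum(n * n for n in normal))
    l_len = math.sqrt(sum(l * l for l in light))
    return albedo * max(0.0, dot / (n_len * l_len))


light = (0.0, 0.0, 1.0)  # light arriving along the z axis

# With constant albedo, brightness depends only on surface orientation.
for tilt_deg in (0, 30, 60, 90):
    t = math.radians(tilt_deg)
    normal = (math.sin(t), 0.0, math.cos(t))
    print(tilt_deg, round(lambertian_brightness(normal, light), 3))
```

If the albedo were allowed to vary from patch to patch, the same brightness values could be produced by a flat surface with a painted pattern, which is why the prior assumption matters.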
 

Pictorial cues, however, are intrinsically ambiguous, in that the prior assumptions cannot be validated. For instance, the "blur image" could be interpreted as two squares with similar spatial frequency content placed at different depths, or as a square with high spatial frequency content surrounded by a frame with low spatial frequency content, all at one depth. In this sense pictures are illusions: they are three-dimensional scenes, different from the true scene, that generate the same image. As another example, just by looking at this picture, you could not distinguish Jerry from a cardboard copy of himself (but try clicking on the image ...)
 

CONTROLLABLE CUES
Controllable cues, unlike pictorial ones, are not present in a single image. Rather, they are associated with variations among different images of the same scene.

For instance, when we take images of a scene from a changing viewpoint we have either the "Stereo" cue (2 images) or the "Motion" cue (several images). Stereo and Motion are two manifestations of the so-called Parallax cue, that is, measurable properties of images that are associated with a change of the viewpoint. As another example, consider different images of the same scene taken with different geometry of the imaging device. For instance, the human eye can change the size of the pupil to make the aperture larger or smaller, and the shape of the lens to accommodate points at different depths. Similarly, in a camera one can change the aperture of the lens and its focus. Measurable properties of images that are associated with a change in the geometry of the imaging device are called Accommodation cues.
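
To make the Parallax cue concrete, here is a minimal sketch of the Stereo case. It assumes two identical pinhole cameras with parallel optical axes, displaced by a known baseline along the x axis, so that depth follows from the horizontal shift (disparity) of a point between the two images as Z = f B / d. The setup and the names `disparity_of`, `depth_from_disparity`, `focal_length_px` and `baseline_m` are assumptions introduced only for this example.

```python
def disparity_of(point, focal_length_px, baseline_m):
    """Horizontal shift of a 3-D point between the left and right images
    of an (assumed) parallel pinhole stereo pair."""
    X, _, Z = point
    x_left = focal_length_px * X / Z
    x_right = focal_length_px * (X - baseline_m) / Z
    return x_left - x_right


def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Recover depth from disparity: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return focal_length_px * baseline_m / disparity_px


f_px, B_m = 800.0, 0.1  # illustrative values only

for point in [(0.3, 0.1, 2.0), (0.6, 0.2, 4.0)]:  # same viewing ray from the left camera
    d = disparity_of(point, f_px, B_m)
    print(point, "disparity:", round(d, 3), "depth:", round(depth_from_disparity(d, f_px, B_m), 3))
```

The two points project to the same location in the left image, yet their disparities differ, so the second viewpoint resolves the ambiguity that a single image leaves open.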

The peculiarity of controllable cues is that, unlike pictorial cues, they can provide unambiguous information about the three-dimensional shape and motion of a scene, as we will see later.
 

EXOGENOUS CUES
To complete the picture, one could also consider Exogenous cues, that is, cues associated with the introduction of known structures into the scene, for instance by projecting structured light or shadows onto it.
 
 
