Stereopsis: How we are able to perceive a 3D world through retinal stimulation

Depth Perception:

How we are able to perceive a 3D world

 

W. Williams

 

 

A variety of mechanisms are responsible for reproducing a three-dimensional world from the two-dimensional images projected onto our retinas. Monocular cues allow us to perceive depth through a single eye, relying on psychological inferences, whereas binocular mechanisms utilise both eyes for a more cognitively based perception. The latter is the stronger and more reliable system, and as such has evolved further in predatory creatures, whose sense of depth assists in hunting and chasing. Stereopsis occurs through the overlap of the visual fields, which ranges from almost total (e.g. eagles) to none (e.g. chameleons). Whilst monocular and binocular vision each have their own methods of calculating depth, they also have their difficulties, and work together to provide an optimal and accurate estimation of depth.

 

Leonardo Da Vinci (1452-1519) was among the first to illustrate various monocular cues to create depth in paintings, and was an early pioneer of depth perception. Our awareness and experience of perspective informs us that objects become smaller with distance, which is also inferred from our knowledge of the relative size of objects. Haze can indicate depth through a direct distance to shade relationship, such as mountains becoming lighter in the distance. Superposition, where objects conceal others, indicates depth through knowledge of order, such as figure before ground. Motion parallax is the most effective monocular cue, and describes the relative perceived motion of objects depending on their distance (Helmholtz 1925 as cited in Daw 1995 p43). Gibson et al (1950 as cited in Daw 1995) also highlight the variable of texture, in that coarse textures appear closer and finer ones more distant. These are the most common monocular cues, but there are many others that are offered through individual experience of the environment, such as our knowledge of buildings and sounds and their relationship to distance. The focussing of the lens and subsequent image also provides vital clues, and so monocular cues also rely on physiological information such as the flexing of the lens.

 

Stereopsis relies on the different images of the world that each eye produces, caused by their differing perspectives from approx. 63mm apart. Wheatstone (1838 as cited in Gulick and Lawson 1976) was the main pioneer of stereoscopic theory, listing the principal factors of stereopsis as convergence of the eyes, disparity of images, and size of retinal images. Disparity is the most important principle, and describes the difference between the images seen by each eye, the convergence and processing of which is termed stereopsis. Wheatstone’s basic rule of stereopsis was that ‘when the half-views of a stereogram are exact replicas of the monocular views of a solid object, then binocular combination of the half-views creates an image of the solid object’ (Gulick and Lawson 1976, p 20). Disparity is approximately proportional to the depth difference divided by the square of the viewing distance (Hubel 1995 p213). It therefore increases with the amount of depth, but decreases rapidly with viewing distance. An object lying before a fixated object will produce disparate images either side of the foveas, and appear to be closer. Closer objects have crossed disparities, with further objects being uncrossed (referring to the lines of sight according to the point of fixation). Therefore, if corresponding retinal images fall on the temporal hemispheres, the paths will cross before an object fixed on the fovea, and the object will appear closer. If the images fall on the nasal hemispheres, the object will appear further away (figure 1).
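
The relationship between disparity, depth difference and viewing distance can be checked numerically. The sketch below is illustrative only: it assumes both the fixated point and the second point lie on the midline between the eyes, uses the 63mm eye separation quoted above, and the function names are invented for this example. It compares the exact difference in convergence angle with the approximation (eye separation x depth difference / distance squared).

```python
import math

INTEROCULAR_M = 0.063  # ~63 mm eye separation, as quoted in the text


def vergence_angle(distance_m, interocular_m=INTEROCULAR_M):
    """Angle (radians) between the two lines of sight to a midline point."""
    return 2.0 * math.atan(interocular_m / (2.0 * distance_m))


def exact_disparity(fixation_m, depth_diff_m, interocular_m=INTEROCULAR_M):
    """Disparity (radians) of a midline point lying depth_diff_m nearer than
    the fixation point. Positive = crossed (nearer), negative = uncrossed."""
    return (vergence_angle(fixation_m - depth_diff_m, interocular_m)
            - vergence_angle(fixation_m, interocular_m))


def approx_disparity(fixation_m, depth_diff_m, interocular_m=INTEROCULAR_M):
    """Small-angle approximation: proportional to depth difference and
    inversely proportional to the square of the viewing distance."""
    return interocular_m * depth_diff_m / fixation_m ** 2


def to_arcsec(radians):
    """Convert radians to seconds of arc."""
    return math.degrees(radians) * 3600.0


if __name__ == "__main__":
    # Same 1 cm depth difference viewed from increasing distances.
    for distance in (0.5, 1.0, 2.0, 4.0):
        exact = to_arcsec(exact_disparity(distance, 0.01))
        approx = to_arcsec(approx_disparity(distance, 0.01))
        print(f"{distance:4.1f} m: exact {exact:8.1f}  approx {approx:8.1f} arcsec")
```

Under these assumptions, doubling the viewing distance cuts the approximate disparity to a quarter, which is the rapid fall-off described above.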

 


 

 


Images disparate by more than 2 seconds of arc horizontally will produce a double image (allowing depth perception with 25mm at 25cm, Frisby 1988). Up to these retinal distances the images will appear fused when fixed on the fovea, the depth decided by any minute differences in disparity. When the image becomes double, the sense of depth will be lost stereoscopically to a certain extent, but may still be psychologically inferred through monocular cues or a previous knowledge of the fixated depth. The area in which objects will appear fused and in depth is termed Panum’s area (after Panum, 1868 as cited in Gulick and Lawson 1976) and is logically oval in shape, lying on the horizontal plane of the eyes, allowing for only small differences in vertical disparity. Panum also suggested that disparate images could be fused when falling within retinal areas thought to be 0.052mm, or 15 to 20 cones in diameter. Panum argued that stereoscopic depth perception acts independently from attention, eye movements or psychological processes, and is instead produced by retinal nerve energies, which he termed ‘synergy of the binocular parallax’. Although popular, Panum’s theory leaves no role for experience in depth perception, and is vague and untestable in its unsubstantiated speculation. Ogle (1964 as cited in Bruce et al 1996) suggests that Panum’s area increases towards the periphery, reflecting the larger receptive fields and increasing lack of cones.

 

When the corresponding images of an object are fixed on the fovea, the plane of depth is called the horopter, and lies on a horizontal circular plane, passing through both the eyes and the fixated point (figure 2). 
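
The circular form described above can be illustrated with a little geometry. The following sketch is a simplified model, not a description of the measured horopter: it assumes idealised point eyes 63mm apart, works only in the horizontal plane, and uses invented function names. It builds the circle passing through both eyes and the fixation point (often called the Vieth-Müller circle) and checks that every point on it subtends the same angle at the two eyes as the fixation point does, i.e. has zero disparity relative to fixation.

```python
import math


def horopter_circle(interocular_m, fixation_m):
    """Centre height and radius of the circle through both eyes and the
    fixation point. Eyes sit at (-a/2, 0) and (+a/2, 0); fixation at (0, d)."""
    half = interocular_m / 2.0
    centre_y = (fixation_m ** 2 - half ** 2) / (2.0 * fixation_m)
    radius = fixation_m - centre_y
    return centre_y, radius


def angle_between_lines_of_sight(px, py, interocular_m):
    """Angle (radians) at point (px, py) between the lines to the two eyes."""
    half = interocular_m / 2.0
    lx, ly = -half - px, -py   # vector from the point to the left eye
    rx, ry = half - px, -py    # vector from the point to the right eye
    cross = lx * ry - ly * rx
    dot = lx * rx + ly * ry
    return math.atan2(abs(cross), dot)


if __name__ == "__main__":
    a, d = 0.063, 0.5  # eye separation and fixation distance in metres
    centre_y, radius = horopter_circle(a, d)
    fixation_angle = angle_between_lines_of_sight(0.0, d, a)
    for t in (-0.8, -0.4, 0.0, 0.4, 0.8):
        # Walk along the circle either side of the fixation point.
        px = radius * math.sin(t)
        py = centre_y + radius * math.cos(t)
        diff = angle_between_lines_of_sight(px, py, a) - fixation_angle
        print(f"t={t:+.1f}  disparity relative to fixation: {diff:.2e} rad")
```

Points inside the circle subtend a larger angle (crossed disparity) and points outside it a smaller one (uncrossed disparity), which is why the circle acts as the zero-depth reference in this idealised model.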

 


 

 


The horopter varies with distance, becoming concave towards close objects, and slightly convex for distant objects (Ogle 1964 as cited in Gulick and Lawson 1976). Hubel (1995) describes the horopter more subjectively as ‘neither a plane nor a sphere but depends on our estimations of distance, and consequently on our brains’ (p147). As the horopter itself has no real depth value, it is generally thought that convergence of the eyes provides information with which the brain scales depth (Brewster 1849 as cited in Gulick and Lawson 1976). Helmholtz (1925 as cited in Daw 1995) studied unconscious inferences of stereopsis, allowing for experiences and inferences as the important precursors of stereopsis, the deliberate and conscious mechanisms soon becoming second nature. His famous stereogram, in which a negative image could still be fused with its positive, recognised the overriding importance of contours over the similarity of the objects themselves. This challenged Panum’s earlier emphasis on exact images, instead supporting contours as the physiologically important feature. This was clarified by the work of Ogle (1964) who, emphasising the important role of physiology in stereopsis, stated ‘In every case stereoscopic depth depends on the disparity between the images of identifiable contours’ (Gulick and Lawson 1976 p45).

 

The neurophysiology of stereopsis clarifies the underlying mechanics. It has been discovered through electrode probings of cat striate cortex that almost all of the neurons are activated by both eyes, corresponding to the near total overlap of a cat’s visual field (Bishop, Henry and Coombs 1972 as cited in Pettigrew 1972). Disparity cells have also been found in humans (Hubel and Wiesel 1970 as cited in Bruce et al 1996), and monkeys (Poggio and Poggio 1984 as cited in Bruce et al 1996). The visual cortex is logically the first area in the visual pathway where this ability would occur, as the retinal ganglion cells and geniculate cells are each activated by one eye only, highlighted by the lateral geniculate layering where the eyes are still separated. Binocular cells are complex neurons, receiving their inputs from simple neurons in the adjacent cortical layer. Through measuring the firing rates of neurons in a cat’s striate cortex, Hubel (1995) found that cells were activated by various directional movements of various disparities, normally lying across the vertical midline of the visual overlap. Using the method of electrode probing and retinal stimulation, Poggio and Fischer (1977 as cited in Bruce et al 1996) found the binocular striate cortex cells of monkeys to be divided into four groups. These are:

 

  • tuned excitatory cells, responding to objects at or close to fixation (inhibited by further or closer stimuli)
  • tuned inhibitory cells, responding to a wide range of depths but inhibited by the fixation point
  • near cells, excited by stimuli nearer than the fixation point, inhibited by further depths
  • far cells, excited by stimuli farther than the fixation point, inhibited by closer depths.
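
The four groups listed above can be summarised as a small lookup. This is only a qualitative toy, not a model of real firing rates: the sign convention (positive disparity meaning nearer than fixation), the `tuning_width` parameter and the 'baseline' label for near and far cells at exactly zero disparity are all assumptions made for the sake of the example.

```python
def binocular_cell_responses(disparity, tuning_width=0.1):
    """Qualitative response of the four cell groups to a single stimulus.

    disparity:    positive = nearer than fixation, negative = farther,
                  zero = at the fixation depth (arbitrary units).
    tuning_width: hypothetical half-width of the 'at or close to fixation'
                  zone used by the two tuned cell groups.
    """
    close_to_fixation = abs(disparity) <= tuning_width

    def signed_response(excited_when_positive):
        # Near and far cells care only about the side of fixation.
        if disparity == 0:
            return "baseline"  # assumption: the list does not cover this case
        is_near = disparity > 0
        return "excited" if is_near == excited_when_positive else "inhibited"

    return {
        "tuned excitatory": "excited" if close_to_fixation else "inhibited",
        "tuned inhibitory": "inhibited" if close_to_fixation else "excited",
        "near": signed_response(excited_when_positive=True),
        "far": signed_response(excited_when_positive=False),
    }


if __name__ == "__main__":
    for disparity in (-1.0, 0.0, 1.0):
        print(disparity, binocular_cell_responses(disparity))
```

Reading the output across the four groups shows how a combination of cell types, rather than any single cell, could signal whether a stimulus is at, in front of or behind the fixation point.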

 

The frequency of firing and cell type could distinguish a multitude of depths, much as the cochlea can detect frequency and amplitude of sound. It is still not clear, however, how common the cells are, particularly in other species including humans, whether they occur in special layers, or how they relate to ocular dominance columns. They do however maintain orientation specificity (further supporting the importance of contours first realised by Helmholtz and Ogle), and otherwise act as ordinary upper level complex cells. It is understandable how neurons may be activated following the fusion of identical images, but it remains a mystery how multiple image disparities may be fused accurately.

 

 


 

 


Figure 3 shows a problem that may be perceived in a number of ways. A number of different matchings of the array are possible, all producing different depths and planes of Xs. A row of three Xs may be perceived before or after the array, with half Xs either side, or a row of four Xs lying perpendicular between X1 and X4 (two of which are shown in the figure). The depths of the fused Xs are dependent on the disparities made on the retina and whether the lines of sight are crossed before or after the array. This is the underlying principle of stereograms, invented first in the form of random dots by Julesz (1960 as cited in Frisby 1988) as an aid to stereoscopic research and later developed into a more popular form of visual entertainment by the Magic Eye series. The lack of context and cues within the images (particularly the random dot stereograms of Julesz) highlights the physiological mechanisms of stereopsis as independent of contextual cues and experience. The seemingly random images are ordered into horizontal repetitions, the differing distances between them creating the illusion of depth when fused together. Figure 4 shows an example of stereoptic ambiguity.
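
To make the idea of a random dot stereogram concrete, here is a minimal sketch of how a two-image pair of this kind could be generated. It is an illustration under stated assumptions rather than a reconstruction of the original stimuli: the grid size, the size of the hidden square, the shift and the function name are all arbitrary choices. Each image on its own is random noise; the only depth information is the horizontal shift of a central square between the two images.

```python
import random


def random_dot_stereogram(size=32, square=12, shift=2, seed=0):
    """Return (left, right) grids of 0/1 dots.

    The right image is a copy of the left, except that a central square is
    moved `shift` dots to the left (a crossed disparity, so the square should
    appear in front of the surround when the pair is fused). The strip the
    square uncovers is refilled with fresh random dots.
    """
    rng = random.Random(seed)
    left = [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]
    right = [row[:] for row in left]

    start = (size - square) // 2
    end = start + square
    for y in range(start, end):
        # Copy the square from the left image into its shifted position.
        for x in range(start, end):
            right[y][x - shift] = left[y][x]
        # Fill the uncovered strip at the square's right edge with new dots.
        for x in range(end - shift, end):
            right[y][x] = rng.randint(0, 1)
    return left, right


if __name__ == "__main__":
    left, right = random_dot_stereogram()
    for left_row, right_row in zip(left, right):
        as_text = lambda row: "".join("#" if dot else "." for dot in row)
        print(as_text(left_row), " ", as_text(right_row))
```

Neither printed image contains a visible square, which is the point: the shape exists only in the disparity between the two images.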

 

 


 

 


Julesz (1971 as cited in Bruce et al 1996) suggested that the correspondence problem is solved by a global stereopsis mechanism, which attempts to match as many points as possible. Marr and Poggio (1976 as cited in Marr 1982) developed a successful algorithm to ‘solve’ random dot stereograms. This was based on three constraints: compatibility (the images must match in order to be fused); uniqueness (images cannot be multiple) and continuity (smooth variance of disparity). Whilst offering a model to suggest how the brain may operate, and supporting Julesz’s global mechanism theory, the model lacks ecological validity. For example, images do not always have to match for fusion to occur.
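
The three constraints can be illustrated on a single row of dots. The sketch below is a deliberately simplified, one-pass stand-in and not Marr and Poggio's iterative cooperative algorithm: the function name, the window size and the tie-breaking rule (prefer the smaller disparity) are assumptions made for this example. Compatibility is the test that only identical dots may match, continuity is the pooling of support from neighbouring dots at the same disparity, and uniqueness is the choice of a single disparity per position.

```python
import random


def match_row(left, right, max_disparity=3, window=5):
    """Estimate one disparity per position for a row of 0/1 dots.

    A disparity d at position x means left[x] is matched to right[x - d].
    """
    n = len(left)
    disparities = []
    for x in range(n):
        best_d, best_support = 0, -1
        for d in range(max_disparity + 1):
            support = 0
            # Continuity: neighbours vote for the same disparity.
            for nx in range(x - window, x + window + 1):
                if 0 <= nx < n and 0 <= nx - d < n:
                    # Compatibility: only identical dots can be matched.
                    if left[nx] == right[nx - d]:
                        support += 1
            if support > best_support:
                best_d, best_support = d, support
        # Uniqueness: keep exactly one disparity for this position.
        disparities.append(best_d)
    return disparities


if __name__ == "__main__":
    rng = random.Random(0)
    left = [rng.randint(0, 1) for _ in range(60)]
    right = left[:]
    # Shift the middle third two dots to the left in the right eye's row.
    for x in range(20, 40):
        right[x - 2] = left[x]
    right[38], right[39] = rng.randint(0, 1), rng.randint(0, 1)
    print("".join(str(d) for d in match_row(left, right)))
```

With random dots, a small window can still produce occasional false matches near the edges of the shifted region, which hints at why a purely local rule is not enough and a more global mechanism was proposed.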

 

As images become less identical, the likelihood of fusion also decreases, although mismatched colours can still fuse provided the contours are the same (Helmholtz 1925, Ogle 1965). However, when the contours differ, fusion becomes almost impossible, and the images compete for dominance. Such images are known as rivalrous stimuli, and present an interesting problem for stereopsis. If two images are fused, one of a horizontally orientated grid and the other vertical, the fused image will at first appear to be made of squares, but closer analysis will show that the image is actually broken into random and changing patches of vertical and horizontal lines, competing for dominance (see figure 5).

 


 

 

 


The effect is stronger with colours: a green square fused with a red one will rapidly alternate in colour. It seems that the brain recognises an impossible situation and abandons fusion rather than trying to make sense of it. Pre-stereoptic infants are immune to rivalrous stimuli, presumably perceiving a fused view of the two images (Birch et al 1982 as cited in Daw 1995). During the post-stereoptic period of visual development however, infants will avoid looking at rivalrous stimuli, suggesting the onset of the phenomenon.

 

Stereoscopic acuity in the adult has been found to be only a few seconds of arc, therefore images disparate by this amount will register depth. This is ten times more acute than vernier and grating acuity, where lines must be separated by at least 1-2 minutes of arc to be seen as separate. It would be expected therefore that a certain level of grating acuity would be a prerequisite of stereopsis. Despite this, research shows that during the development of stereopsis and orthotropia, improvement in grating acuity is small. Using adjacent panels of varying depth, Held et al (1980 as cited in Daw 1995) found that infants can discriminate depth at around 16 weeks of age, after which stereoscopic acuity develops from over 60 minutes of arc to less than 1 minute in a few weeks (Birch et al 1982 as cited in Daw 1995). The main correlation with the onset of stereopsis is the segregation of ocular dominance columns, where the signals from the two eyes converge onto visual cortical layers II and III. This suggests that stereopsis emerges with the development of binocular cells, which are likely to be fine tuned by apoptosis (programmed cell death) and neural networking following experience and stimulation.

 

The relationship between monocular and binocular cues to provide depth perception is complex and varied. The notion of pseudoscopic viewing highlights the dependency between the mechanisms. When the disparities of a stereogram are reversed, the depth planes are also reversed. However, if the stereogram is composed of two photos, such as when swapping the photo images in a Wheatstone viewer, the reversed depth is difficult to achieve (Bruce et al 1996). Although physiologically reversed, the perceived effect is overridden by monocular cues, which tell us it is impossible for a wall to be closer than the tree planted in front (superposition). Similarly, when viewing the inside of a hollow mask of a human face, we perceive the mask as convex, like an ordinary face (Gregory 1973 as cited in Bruce et al 1996). Cues such as a protruding nose and the effect of shade assist in this illusion. Hill and Bruce (1993, 1994 as cited in Bruce et al 1996) have also shown experimentally that familiarity of faces and a general preference for convexity tend to favour the illusory face-like interpretation of the hollow mask. Monocular cues have also been shown to compete with stereopsis in other ways. Rogers and Graham (1979 as cited in Bruce et al 1996) have shown that the motion parallax view obtained with one eye can convey a sense of depth as strong as that provided by two eyes. This is logical given that the different views gained from different positions are the basis of stereopsis, although monocularly the time difference hinders the comparison. The 3D perception of the world based on 2D images is a complex and almost flawless process. Based on experience, focussing, inferences, convergence, disparity, psychology and physiology, the brain can compute an accurate interpretation of depth, as it was before it was lost on our retinas.
Contradictory information between the mechanisms can however fool and confuse the system, as shown through stimulus rivalry, and the overriding importance of inferences and contours. The dominance of certain mechanisms poses computational problems that, like the complex process of image matching, have yet to be solved.

 

 

References

 

Bruce, V; Green, P.R; Georgeson, M.A. (1996). Visual Perception: Physiology, Psychology and Ecology (3rd Ed.). Erlbaum: Taylor and Francis.

 

Daw, N.W. (1995). Visual development. New York: Plenum.

 

Frisby, J.P. (1988). Stereopsis, Binocular Perception. In Held, R. (1988). Sensory Systems I: Vision and Visual Systems. Boston: Birkhauser. Pp 57-58.

 

Gulick, W.L; Lawson, R.B. (1976). Human Stereopsis: A Psychophysical Analysis. New York: Oxford University Press.

 

Hubel, D.H. (1995). Eye, Brain and Vision. New York: Scientific American.

 

Marr, D. (1982). Vision. San Francisco: WH Freeman.

 

Pettigrew, J.D. (1972). The Neurophysiology of Binocular Vision. In Recent Progress in Perception. Scientific American. (1972). San Francisco: WH Freeman. Pp 55-66.