Scene Vision: Making Sense of What We See

Kestutis Kveraga
Moshe Bar
Copyright Date: 2014
Published by: MIT Press
Pages: 328
  • Book Info
    Book Description:

    For many years, researchers have studied visual recognition with objects -- single, clean, clear, and isolated objects, presented to subjects at the center of the screen. In our real environment, however, objects do not appear so neatly. Our visual world is a stimulating scenery mess; fragments, colors, occlusions, motions, eye movements, context, and distraction all affect perception. In this volume, pioneering researchers address the visual cognition of scenes from neuroimaging, psychology, modeling, electrophysiology, and computer vision perspectives. Building on past research -- and accepting the challenge of applying what we have learned from the study of object recognition to the visual cognition of scenes -- these leading scholars consider issues of spatial vision, context, rapid perception, emotion, attention, memory, and the neural mechanisms underlying scene representation. Taken together, their contributions offer a snapshot of our current knowledge of how we understand scenes and the visual world around us.

    Contributors: Elissa M. Aminoff, Moshe Bar, Margaret Bradley, Daniel I. Brooks, Marvin M. Chun, Ritendra Datta, Russell A. Epstein, Michèle Fabre-Thorpe, Elena Fedorovskaya, Jack L. Gallant, Helene Intraub, Dhiraj Joshi, Kestutis Kveraga, Peter J. Lang, Jia Li, Xin Lu, Jiebo Luo, Quang-Tuan Luong, George L. Malcolm, Shahin Nasr, Soojin Park, Mary C. Potter, Reza Rajimehr, Dean Sabatinelli, Philippe G. Schyns, David L. Sheinberg, Heida Maria Sigurdardottir, Dustin Stansbury, Simon Thorpe, Roger Tootell, James Z. Wang

    eISBN: 978-0-262-31989-8
    Subjects: Biological Sciences, Technology

Table of Contents

  1. Front Matter
    (pp. i-vi)
  2. Table of Contents
    (pp. vii-viii)
  3. Acknowledgments
    (pp. ix-x)
  4. The Current Scene
    (pp. 1-4)
    Moshe Bar

    For decades, visual recognition has been studied with objects rather than with scenes: single, clean, clear, and isolated objects presented to subjects at the center of the screen. This is the type of display you only see in a laboratory. In our real environment, objects do not appear so neatly. Our visual world is a stimulating scenery mess. Fragments, colors, occlusions, motion, eye movements, context, and distraction all have profound effects on perception. But except for a few brave researchers (with the seminal work of Irving Biederman from the 1960s and 1970s standing out), most have made the implicit or...

  5. 1 Visual Scene Representation: A Spatial-Cognitive Perspective
    (pp. 5-26)
    Helene Intraub

    Traditionally, scene perception has been conceptualized within the modality-centric framework of visual cognition. However, in the world, observers are spatially embedded within the scenes they perceive. Scenes are sampled through eye movements but also through movements of the head and body, guided by expectations about surrounding space. In this chapter, I will address the idea that scene representation is, at its core, a spatio-centric representation that incorporates multiple sources of information: sensory input, but also several sources of top-down information. Boundary extension (false memory beyond the edges of a view; Intraub 2010; Intraub & Richardson, 1989) provides a novel window onto...

  6. 2 More Than Meets the Eye: The Active Selection of Diagnostic Information across Spatial Locations and Scales during Scene Categorization
    (pp. 27-44)
    George L. Malcolm and Philippe G. Schyns

    Imagine flipping through a friend’s holiday photos. In one you see a number of buildings and streets and realize that she had been in a city. But if you knew she was visiting several cities during her trip and wanted to know which particular one this photo was taken in, you might recognize the castle on the hilltop in the background and conclude that your friend went to Edinburgh. Or maybe you are less concerned with the city she visited and more by the weather during the trip. On the same photo you notice that the sky is dark and...

  7. 3 The Constructive Nature of Scene Perception
    (pp. 45-72)
    Soojin Park and Marvin M. Chun

    Humans have the remarkable ability to recognize complex, real-world scenes in a single, brief glance. The gist, the essential meaning of a scene, can be recognized in a fraction of a second. Such recognition is sophisticated, in that people can accurately detect whether an animal is present in a scene or not, what kind of event is occurring in a scene, as well as the scene category, all in as little as 150 ms (Potter, 1976; Schyns & Oliva, 1994; Thorpe, Fize, & Marlot, 1996; VanRullen & Thorpe, 2001). With this remarkable ability, the experience of scene perception feels effortless. It is...

  8. 4 Deconstructing Scene Selectivity in Visual Cortex
    (pp. 73-84)
    Reza Rajimehr, Shahin Nasr and Roger Tootell

    In high-order object-processing areas of the ventral visual pathway, discrete clusters of neurons (“modules”) respond selectively to specific categories of complex images such as faces (Kanwisher, McDermott, & Chun, 1997; Tsao, Freiwald, Knutsen, Mandeville, & Tootell, 2003; Tsao, Moeller, & Freiwald, 2008), places/scenes (Aguirre, Zarahn, & D’Esposito, 1998; Epstein & Kanwisher, 1998), body parts (Downing, Jiang, Shuman, & Kanwisher, 2001; Grossman & Blake, 2002), and word forms (Cohen et al., 2000). On the other hand, stimuli of a common category often also share low-level visual cues, and correspondingly, it has been reported that many neurons in the inferior temporal (IT) cortex (which is the final stage of...

  9. 5 The Neurophysiology of Attention and Object Recognition in Visual Scenes
    (pp. 85-104)
    Daniel I. Brooks, Heida Maria Sigurdardottir and David L. Sheinberg

    If you are like most academics and scholars, you start your day with a cup of coffee. This task (Land & Hayhoe, 2001), among other things, requires you to locate the correct cupboard in the kitchen, open it, search for the can of coffee grounds among the surrounding clutter, and operate the coffee brewer by finding and pushing the correct buttons on the coffee maker. Depending on your view and current goal, the kitchen with its furniture and appliances, the cupboard full of miscellaneous objects, and the coffee brewer with its all too many buttons can be thought of as scenes...

  10. 6 Neural Systems for Visual Scene Recognition
    (pp. 105-134)
    Russell A. Epstein

    If you were to look out my office window at this moment, you would see a campus vista that includes a number of trees, a few academic buildings, and a small green pond. Turning your gaze the other direction, you would see a room with a desk, a bookshelf, a rug, and a couch. Although the objects are of interest in both cases, what you see in each view is more than just a collection of disconnected objects—it is a coherent entity that we colloquially label a “scene.” In this chapter I describe the neural systems involved in the...

  11. 7 Putting Scenes in Context
    (pp. 135-154)
    Elissa M. Aminoff

    Human vision can understand an image of a scene extremely quickly and effortlessly. However, the mechanisms mediating scene understanding are still being explored. This chapter proposes that scene understanding is not derived from a unique, isolated cognitive process but rather is part of a more general mechanism of associative processing. When a person is understanding a scene, it is the collection of associations, meaning the co-occurrence of objects, the spatial relations among these objects, and other statistical regularities associated with scene categories that are processed. The object-to-object relations and spatial relations define the scene and signify a scene category. Framing...

  12. 8 Fast Visual Processing of “In-Context” Objects
    (pp. 155-176)
    M. Fabre-Thorpe

    Isolated objects do not exist in the natural world because they are virtually always embedded in contextual scenes. Despite such complexity, object recognition within scenes appears both effortless and virtually instantaneous for humans, whereas coping with natural scenes is still a major challenge for computer vision systems. Even in categorization tasks with briefly flashed (20 ms) natural scenes, humans can respond with short latencies (250–280 ms) to any exemplar from a wide range of categories (animal, human, vehicles, urban or natural scenes).

    In daily life, online contextual information can facilitate the processing of all objects that can be expected...

  13. 9 Detecting and Remembering Briefly Presented Pictures
    (pp. 177-198)
    Mary C. Potter

    During our waking hours we take a new mental snapshot—a fixation—about three times a second. What do we pick up from each glimpse, and for how long do we remember what we saw? What is the form of our memory representation—visual, conceptual, or both—and does it change over time? One method for addressing these questions in the laboratory is to simulate continual shifts of fixation by using rapid serial visual presentation (RSVP) of sequences of unrelated pictures. When viewers are given a target name such as picnic or smiling couple, they are able to detect a...

  14. 10 Making Sense of Scenes with Spike-Based Processing
    (pp. 199-224)
    Simon Thorpe

    Although the ability of the human visual system to process complex natural scenes is very impressive, the state of the art in computer vision is starting to catch up. Interestingly, the best artificial systems use processing architectures built on simple feedforward mechanisms that look remarkably similar to those used in the primate visual system. However, the procedures used for training these artificial systems are very different from the mechanisms used in biological vision. In this chapter I discuss the possibility that spike-based processing and learning mechanisms may allow future models to combine the remarkable efficiency of the latest computer vision...

  15. 11 A Statistical Modeling Framework for Investigating Visual Scene Processing in the Human Brain
    (pp. 225-240)
    Dustin E. Stansbury and Jack L. Gallant

    An overarching goal of visual neuroscience is to understand how the visual system processes natural scenes. Natural scenes possess statistical structure, and computational models based on this structure have provided numerous insights into the processing mechanisms implemented in the early visual system (Barlow, 1961; Field, 1987; Geisler, Perry, Super, & Gallogly, 2001; Simoncelli & Olshausen, 2001). Despite the success of modeling early visual processing based on natural scene statistics, there are still few studies that take this approach when modeling later stages of visual processing.

    In this chapter we present a simple but powerful framework for developing models of visual processing that...

  16. 12 On Aesthetics and Emotions in Scene Images: A Computational Perspective
    (pp. 241-272)
    Dhiraj Joshi, Ritendra Datta, Elena Fedorovskaya, Xin Lu, Quang-Tuan Luong, James Z. Wang, Jia Li and Jiebo Luo

    In this chapter we discuss the problem of computational inference of aesthetics and emotions from images. We draw inspiration from diverse disciplines such as philosophy, photography, art, and psychology to define and understand the key concepts of aesthetics and emotions. We introduce the primary computational problems that the research community has been striving to solve and the computational framework required for solving them. We also describe data sets available for performing assessment and outline several real-world applications for which research in this domain can be employed. This chapter discusses the contributions of a significant number of research articles that have...

  17. 13 Emotion and Motivation in the Perceptual Processing of Natural Scenes
    (pp. 273-290)
    Margaret M. Bradley, Dean Sabatinelli and Peter J. Lang

    Because human cognition, including perception, has evolved in the context of a fundamental drive to survive, it is useful to consider the role of motivation in perceptual processing. In this chapter, we focus on perceptual processing of natural scenes that humans describe as emotionally arousing—both pleasant and unpleasant scenes. It is proposed that these scenes engage motivational circuits that have evolved in the mammalian brain to promote the survival of individuals and their progeny. Motive circuit activation prompts enhanced perceptual processing and information intake in the service of selecting and implementing effective coping actions in both appetitive and defensive...

  18. 14 Threat Perception in Visual Scenes: Dimensions, Action, and Neural Dynamics
    (pp. 291-306)
    Kestutis Kveraga

    Efficient recognition of threat is necessary for survival. Identifying threats in natural environments can be a difficult task for which humans have evolved a finely tuned visual recognition and action system. Although threat is a type of negative stimulus that typically initiates defensive reactions and mobilizes fight-or-flight systems in the brain, nonthreatening negative stimuli can produce exploratory behavior and engage associative processing. In this chapter I describe a number of studies that have explored visual discrimination of different types of negative stimuli in real-world scene images. I discuss the role of spatial and temporal properties of threat, the neural systems...

  19. Contributors
    (pp. 307-310)
  20. Index
    (pp. 311-312)
  21. [Illustrations]