Figure-Ground Assignment in Natural Images


Occlusion boundaries and junctions provide important cues for inferring three-dimensional scene organization from two-dimensional images. Although several investigators in machine vision have developed algorithms for detecting occlusions and other edges in natural images, relatively few psychophysics or neurophysiology studies have investigated what features are used by the visual system to detect natural occlusions. In this study, we addressed this question using a psychophysical experiment where subjects discriminated image patches containing occlusions from patches containing surfaces. Image patches were drawn from a novel occlusion database containing labeled occlusion boundaries and textured surfaces in a variety of natural scenes. Consistent with related previous work, we found that relatively large image patches were needed to attain reliable performance, suggesting that human subjects integrate complex information over a large spatial region to detect natural occlusions. By defining machine observers using a set of previously studied features measured from natural occlusions and surfaces, we demonstrate that simple features defined at the spatial scale of the image patch are insufficient to account for human performance in the task. To define machine observers using a more biologically plausible multiscale feature set, we trained standard linear and neural network classifiers on the rectified outputs of a Gabor filter bank applied to the image patches. We found that simple linear classifiers could not match human performance, while a neural network classifier combining filter information across location and spatial scale compared well. These results demonstrate the importance of combining a variety of cues defined at multiple spatial scales for detecting natural occlusions.

Keywords: occlusions, natural images, psychophysics, edge detection, neural networks


One useful set of two-dimensional cues for inferring three-dimensional scene organization are the boundaries and junctions formed by the occlusions of distinct surfaces (Guzman, 1969; Nakayama, He, & Shimojo, 1995; Todd, 2004), as illustrated in Figure 1. In natural images, occlusion boundaries are defined by multiple cues, including local texture, color, and luminance differences, all of which are integrated perceptually (McGraw, Whitaker, Badcock, & Skillen, 2003; Rivest & Cavanagh, 1996). Although numerous machine vision studies have developed algorithms for detecting occlusions and junctions in natural images (Hoiem, Efros, & Hebert, 2011; Konishi, Yuille, Coughlin, & Zhu, 2003; Martin, Fowlkes, & Malik, 2004; Perona, 1992), relatively little work in visual psychophysics has directly studied natural occlusion detection (McDermott, 2004) or used natural occlusions in perceptual tasks (Fowlkes, Martin, & Malik, 2007).

Figure 1

Occlusion of one surface by another in depth gives rise to image patches containing occlusion edges (magenta circle) and junctions (cyan and purple circles).

In this study, we investigate the question of what locally available cues are used by human subjects to detect occlusion boundaries in natural scenes. We approach this problem by developing a novel database of natural occlusion boundaries taken from a set of uncompressed calibrated images used in previous research (Arsenault, Yoonessi, & Baker, 2011; Kingdom, Field, & Olmos, 2007; Olmos & Kingdom, 2004). We demonstrate that our database exhibits strong intersubject agreement in the locations of the labeled occlusions, particularly when compared with edges derived from image segmentation databases. In addition, we find that a variety of simple visual features characterized in previous studies (Balboa & Grzywacz, 2000; Fine, MacLeod, & Boynton, 2003; Geisler, Perry, Super, & Gallogly, 2001; Ing, Wilson, & Geisler, 2010; Rajashekar, van der Linde, Bovik, & Cormack, 2007) can be used to distinguish occlusion and surface patches.

Using a simple two-alternative forced choice experiment, we test the ability of human subjects to discriminate local image regions containing either occlusions or single surfaces. In agreement with related work on junction detection (McDermott, 2004) and image classification (Torralba, 2009), we find that subjects require a fairly large image region (32 × 32 pixels) in order to make reliable judgments. Using a quadratic classifier analysis, we find that simple visual features defined on the scale of the whole image patch (i.e., luminance gradients) are insufficient to account for human performance, suggesting that human subjects integrate complex spatial information existing at multiple scales.

We investigated this possibility further by training standard linear and neural network classifiers on the rectified outputs of a set of Gabor filters applied to the occlusion and surface patches. We found that a linear classifier cannot fully account for subject performance since this classifier simply detects low spatial frequency luminance edges. However, a neural network having a moderate number of hidden units compared much better to human performance by combining information from filters across multiple locations and spatial scales. Our analysis demonstrates that only one layer of processing beyond the initial filtering and rectification is needed for reliably detecting natural occlusions. Interpreting the hidden units as implementing “second-order” filters, our results are consistent with previous demonstrations that filter-rectify-filter (FRF) models can detect edges defined by cues other than luminance differences (Baker & Mareschal, 2001; Bergen & Landy, 1991; Graham, 1991; Landy, 1991).

This study complements and extends previous work by quantitatively demonstrating the importance of integrating complex, multiscale spatial information when detecting natural occlusion edges (McDermott, 2004). Furthermore, this work provides the larger vision science community with a novel database of occlusion edges as well as a benchmark dataset of human performance on a standard edge-detection problem studied in machine vision (Konishi et al., 2003; Martin et al., 2004; Zhou & Mel, 2008). Finally, we discuss possible mechanisms for natural occlusion detection and suggest directions for future research.


Image databases

A set of 100 images containing few or no manmade objects were selected from a set of over 850 calibrated uncompressed color images from the McGill Calibrated Color Image Database (Olmos & Kingdom, 2004). Images were selected to have a clear figure-ground organization and plenty of discernible occlusion boundaries. Some representative images are shown in Figure 2 (left column). A group of five paid undergraduate research assistants were instructed to label all of the clearly discernible continuous occlusion boundaries using Adobe Photoshop layers. They were given the following instructions:

Figure 2

Representative images from the occlusion boundary database, together with subject occlusion labelings. Left: Original color images. Middle: Grayscale images with overlaid pixels labeled as occlusions (white lines) and examples of surface regions (magenta...

“Your task is to label the occlusion contours in the given set of 100 images. An occlusion contour is an edge or boundary where one object occludes or blocks the view of another object or region behind it. Label as many contours as you can, but you do not need to label contours that you are unsure of. Make each distinct contour a unique color to help with future analysis. Each contour must be continuous (i.e., one connected piece). Start by labeling contours on the largest and most prominent objects, and work your way down to smaller and less prominent objects. Do not label extremely small contours like blades of grass.”

Students worked independently so their labeling reflected their independent judgment. The lead author (CD) hand-labeled all images as well, so there were six subjects total.

In order to compare the statistics of occlusions with image regions not containing occlusions, a database of “surface” image patches was selected from the same images by the same subjects. “Surfaces” in this context were broadly defined as uniform image regions which do not contain any occlusions, and subjects were not given any explicit guidelines beyond the constraint that the regions they select should be relatively uniform and could not contain any occlusions (which was prevented by our custom-authored software). No constraints were imposed with respect to lighting, curvature, material, shadows or luminance gradients. Therefore, some surface patches contained substantial luminance gradients, for instance patches of zebra skin (Figure 3). Each subject selected 10 surface regions (60 × 60) from each of the 100 images, and for our analyses we extracted image patches of various sizes (8 × 8, 16 × 16, 32 × 32) at random locations from these larger 60 × 60 regions. Example 32 × 32 surface patches are shown in Figure 2 (middle panel), and examples of both surface and occlusion patches are shown in Figure 3.

Figure 3

Examples of 32 × 32 occlusions (left), surfaces (right), and shadow edges not defined by occlusions (bottom).

Quantifying subject consistency

In order to quantify intersubject consistency in the locations of the pixels labeled as occlusions, we applied the precision-recall analysis commonly used in machine vision (Abdou & Pratt, 1979; Martin et al., 2004). In addition, we developed a novel analysis method, which we call the most-conservative subject (MCS) analysis, that controls for the fact that disagreement between subjects in the location of labeled occlusions often arises simply because some subjects label more exhaustively than others.

Precision-recall analysis is often used in machine vision studies of edge detection in order to quantify the trade-off between correctly detecting all edges (recall) and not incorrectly labeling non-edges as edges (precision). Mathematically, precision (P) and recall (R) are defined:

P = tp/(tp + fp),  R = tp/(tp + fn),

where tp, fp, tn, fn are the true and false positives and true and false negatives, respectively. Typically, these quantities are determined by comparing a machine-generated test edgemap E to a ground-truth reference edgemap G derived from hand-annotated images (Martin et al., 2004). Since all of our edgemaps were human generated, we performed a "leave-one-out" analysis in which we compared a test edgemap from each subject to a reference "ground truth" edgemap defined by combining the edgemaps from all of the other remaining subjects. Since our goal for this analysis was simply to compare human performance on two different tasks (occlusion labeling and region labeling), we did not make use of the sophisticated boundary-matching procedures (Goldberg & Kennedy, 1995) used in previous studies to optimize the comparisons between human data and machine performance (Martin et al., 2004). We quantify the overall agreement between the test and reference edgemaps using the weighted harmonic mean of P and R, defined by:

F = PR/(αR + (1 − α)P)

This quantity F is known as an F-measure and was originally developed to quantify the accuracy of document retrieval methods (Rijsbergen, 1979), but it is also applied in machine vision (Abdou & Pratt, 1979; Martin et al., 2004). The parameter α determines the relative weight of precision and recall, and we used α = 0.5 in our analysis.
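To make the computation concrete, the following Python sketch (our illustration; the paper's analyses were done in MATLAB) computes P, R, and the F-measure from a pair of binary edgemaps, using pixel-exact matching without the boundary-matching tolerance discussed in the text:

```python
import numpy as np

def precision_recall_f(test_map, ref_map, alpha=0.5):
    """Precision, recall, and F-measure between two binary edgemaps.

    Pixel-exact matching: no boundary-matching tolerance is applied,
    matching the simplified comparison described in the text.
    """
    test = test_map.astype(bool)
    ref = ref_map.astype(bool)
    tp = np.sum(test & ref)       # edge pixels present in both maps
    fp = np.sum(test & ~ref)      # labeled in test but not in reference
    fn = np.sum(~test & ref)      # reference edge pixels missed by test
    p = tp / (tp + fp) if tp + fp > 0 else 0.0
    r = tp / (tp + fn) if tp + fn > 0 else 0.0
    if p == 0.0 or r == 0.0:
        return p, r, 0.0
    f = 1.0 / (alpha / p + (1.0 - alpha) / r)   # weighted harmonic mean
    return p, r, f
```

With α = 0.5 this reduces to the familiar harmonic mean of precision and recall.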

In addition to the precision-recall analysis, we developed a novel method for quantifying intersubject consistency that minimizes the problems arising from the fact that certain subjects are simply more exhaustive in labeling all possible occlusions than others. We defined the most conservative subject (MCS) for a given image as the subject who had labeled the fewest pixels. Using the MCS labeling, we generate a binary image mask, which is 1 for any pixel within R pixels of an occlusion labeled by the MCS and 0 for all other pixels. Applying this mask to the labeling of each subject yields a "reduced" labeling, which is valid for intersubject comparison since it only includes the most prominent occlusions labeled by all of the subjects. To compare two subjects, we randomly assigned one binary edgemap as the "reference" (Iref) and the other as the "test" (Itest). We used the reference map to define a weighting function fγ(r), applied to all of the pixels in the test map, that quantifies how close each pixel in the test map is to a pixel in the reference map. Mathematically, our index is given by

ζ = (1/Nt) Σ(x,y)∈Itest fγ(r((x,y), Iref)),   (Equation 4)

where r((x,y), Iref) is the distance between (x,y) and the closest pixel in Iref, Nt is the number of pixels in the test set, and the function fγ(r) is defined for 0 ≤ γ < ∞ by

fγ(r) = (1 − r/R)^γ for 0 ≤ r ≤ R, and fγ(r) = 0 for r > R,

and for γ = ∞ by

fγ(r) = 1 if r = 0, and fγ(r) = 0 otherwise,

where R is the radius of the mask, which we set to R = 10 in our analysis. The parameter γ in our weighting function sets the sensitivity of fγ to the distance between the reference labeling and the test labeling. Setting γ = 0 counts the fraction of pixels in the test edgemap that lie inside the mask generated by the reference edgemap, and setting γ = ∞ measures the fraction of pixels in complete agreement between the two edgemaps.
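The index can be sketched in a few lines of Python (our illustration: the power-law form of fγ is a reconstruction consistent with the stated limiting cases at γ = 0 and γ = ∞, and pixel sets are given as coordinate lists):

```python
import numpy as np

def f_gamma(r, gamma, R=10.0):
    """Distance weighting: gamma = 0 credits any pixel within the mask
    radius R; gamma = inf credits only exact agreement (r = 0)."""
    if np.isinf(gamma):
        return 1.0 if r == 0 else 0.0
    return (1.0 - r / R) ** gamma if r <= R else 0.0

def similarity_index(test_pts, ref_pts, gamma, R=10.0):
    """zeta: average weight over test pixels, where each pixel's weight
    depends on its distance to the nearest reference pixel."""
    ref = np.asarray(ref_pts, dtype=float)
    total = 0.0
    for (x, y) in test_pts:
        d = np.sqrt(((ref - [x, y]) ** 2).sum(axis=1)).min()
        total += f_gamma(d, gamma, R)
    return total / len(test_pts)
```

Intermediate values of γ interpolate smoothly between the two limiting behaviors.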

Statistical measurements

Patch extraction and region labeling

Occlusion patches of varying sizes centered on an occlusion boundary were extracted automatically from the database by choosing random pixels labeled as occlusions by a single subject, cycling through the six subjects. Since we only wanted patches containing a single occlusion separating two regions (figure and ground) of roughly equal size, we only accepted a candidate patch when:

  • 1.  The composite occlusion edge map from all subjects consisted of a single, connected piece in the analysis window.
  • 2.  The occlusion contacted the sides of the window at two distinct points and divided the patch into two regions.
  • 3.  Each region comprised at least 35% of the pixels.

Each occlusion patch consisted of two regions of roughly equal size separated by a single boundary, with the central pixel (w/2, w/2) of a patch of size w always being an occlusion (Figure 4a). Note that this procedure actually yields a subset of all possible occlusions, since it excludes T-junctions or occlusions formed by highly convex boundaries. Since the selection of occlusion patches was automated, there were no constraints on the properties of the surfaces on either side of the occlusion with respect to factors like lighting, shadows, reflectance or material. In addition to the occlusion patches, we extracted surface patches at random locations from the set of 60 × 60 surface regions chosen by the subjects. We used 8 × 8, 16 × 16, and 32 × 32 patches for our analyses and psychophysical experiments.
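The acceptance test can be sketched as follows (our Python illustration with our own array conventions; criterion 1, that the composite boundary forms a single connected piece, would require a connected-components check and is omitted here):

```python
import numpy as np

def accept_patch(region_label, boundary):
    """Criteria 2-3 for a w x w candidate occlusion patch.

    region_label: integer array, 1 and 2 for the two surface regions,
    0 on boundary pixels. boundary: boolean array of occlusion pixels.
    Criterion 1 (single connected boundary piece) is assumed to have
    been checked separately.
    """
    w = region_label.shape[0]
    if not boundary[w // 2, w // 2]:   # central pixel must be an occlusion
        return False
    n = region_label.size
    frac1 = np.sum(region_label == 1) / n
    frac2 = np.sum(region_label == 2) / n
    return bool(frac1 >= 0.35 and frac2 >= 0.35)   # roughly equal regions
```

Patches failing any criterion are discarded and a new candidate pixel is drawn.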

Figure 4

Illustration of stimuli used in the psychophysical experiments. (a) Original color and grayscale occlusion edge (top, middle) and its region label (bottom). The region label divides the patch into regions corresponding to the two surfaces (white, gray)...

Grayscale scalar measurements

To obtain grayscale images, we converted the raw images into gamma-corrected RGB images using software accompanying the McGill database (Olmos & Kingdom, 2004). We then mapped the RGB color space to the NTSC color space, obtaining the grayscale luminance I = 0.2989 · R + 0.5870 · G + 0.1140 · B (Acharya & Ray, 2005). From these patches, we measured a variety of visual features that can be used to distinguish occlusion from surface patches. Some of these features (for instance, luminance difference) depend on a region labeling of the image patch, which separates it into regions corresponding to two different surfaces (Figure 4a). Measuring such features directly from surface patches is impossible, however, since surface patches contain only a single surface. Therefore, in order to measure region-labeling-dependent features from the uniform surface patches, we assigned to each surface patch a set of 25 "dummy" region labelings from our database (spanning all possible orientations). The measured value of the feature was then taken as the maximum value over all 25 dummy labelings, which is sensible since all of the visual features were on average larger for occlusions than for uniform surface patches.
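The grayscale conversion itself is a single weighted sum per pixel; a minimal Python sketch (the paper used MATLAB):

```python
import numpy as np

def rgb_to_luminance(rgb):
    """NTSC grayscale luminance from a gamma-corrected RGB image.

    rgb: array of shape (H, W, 3); returns an (H, W) luminance image
    I = 0.2989 R + 0.5870 G + 0.1140 B.
    """
    weights = np.array([0.2989, 0.5870, 0.1140])
    return rgb @ weights
```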

Given a grayscale image patch and a region labeling R = {R1, R2, B} partitioning the patch into regions corresponding to the two surfaces (R1,R2) as well as the set of boundary (B) pixels (Figure 4a), we measured the following visual features taken from the computational vision literature:

G1. Luminance difference Δμ:

Δμ = |μ1 − μ2|,

where μ1, μ2 are the mean luminances in regions R1, R2, respectively.

G2. Contrast difference Δσ:

Δσ = |σ1 − σ2|,

where σ1, σ2 are the contrasts (standard deviations) in regions R1, R2, respectively. Features G1 and G2 were both measured in previous studies on surface segmentation (Fine et al., 2003; Ing et al., 2010).

G3. Boundary luminance gradient GB:

GB = ‖∇I(w/2, w/2)‖ / Ī,

where ∇I(x, y) = [∂I(x, y)/∂x, ∂I(x, y)/∂y]T is the gradient of the image patch evaluated at the central pixel, and Ī is the average intensity of the image patch (Balboa & Grzywacz, 2000).

G4. Oriented energy Eθ:

Eθi = (x · g^e_θi)² + (x · g^o_θi)²,

where x is the image patch in vector form and g^e_θi, g^o_θi are a quadrature-phase pair of Gabor filters at each of Nθ = 8 evenly spaced orientations θi. For patch size w, the filter envelopes had means of w/2 and standard deviations of w/4. Oriented energy has been used in several previous studies as a means of detecting edges (Geisler et al., 2001; Lee & Choe, 2003; Sigman, Cecchi, Gilbert, & Magnasco, 2001).

G5. Global patch contrast ρ:

ρ = σI / Ī,

where σI is the standard deviation of the patch pixel intensities and Ī is their mean. This quantity has been measured in previous studies quantifying statistical differences between fixated and randomly chosen image regions as subjects freely viewed natural images while their eye movements were tracked (Rajashekar et al., 2007; Reinagel & Zador, 1999). Several studies of this kind have suggested that subjects may preferentially fixate edges (Baddeley & Tatler, 2006).

Note that features G3-G5 are measured globally from the entire patch, whereas features G1, G2 are differences between statistics measured from different regions of the patch.
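The region-based features G1 and G2 and the global feature G5 can be sketched together (our Python illustration; the normalization of ρ as standard deviation over mean, i.e., RMS contrast, is our assumption):

```python
import numpy as np

def patch_features(patch, r1_mask, r2_mask):
    """G1, G2, and G5 for a grayscale patch with a region labeling.

    r1_mask, r2_mask: boolean masks for the two surface regions.
    """
    mu1, mu2 = patch[r1_mask].mean(), patch[r2_mask].mean()
    sd1, sd2 = patch[r1_mask].std(), patch[r2_mask].std()
    d_mu = abs(mu1 - mu2)               # G1: luminance difference
    d_sigma = abs(sd1 - sd2)            # G2: contrast difference
    rho = patch.std() / patch.mean()    # G5: global patch contrast
    return d_mu, d_sigma, rho
```

For a surface patch, these functions would be evaluated over the 25 dummy region labelings and the maximum taken, as described above.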

Color scalar measurements

Images were converted from RGB to LMS color space using a MATLAB program, which accompanies the images in the McGill database (Olmos & Kingdom, 2004). We converted the logarithmically transformed LMS images into an Lαβ color space by performing principal components analysis (PCA) on the set of LMS pixel intensities (Fine et al., 2003; Ing et al., 2010). Projections onto the axes of the Lαβ basis represent a color pixel in terms of its overall luminance (L), blue-yellow opponency (α), and red-green opponency (β).
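The PCA construction of the Lαβ basis can be sketched as follows (our Python illustration; identifying the axes with L, α, β in order of decreasing variance is our assumption about the convention):

```python
import numpy as np

def lab_basis_from_log_lms(log_lms):
    """PCA basis from log-LMS pixel values (one pixel per row).

    Returns the basis vectors as rows, ordered by decreasing variance.
    """
    X = log_lms - log_lms.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(X.T))
    order = np.argsort(evals)[::-1]     # largest variance first
    return evecs[:, order].T

def to_lab(log_lms, basis):
    """Project centered log-LMS pixels onto the L, alpha, beta axes."""
    return (log_lms - log_lms.mean(axis=0)) @ basis.T
```

Because natural-image LMS responses are highly correlated, the first axis captures overall luminance and the remaining two capture opponent chromatic variation.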

We measured two additional properties from the LMS color image patches represented in the Lαβ basis:

C1. Blue-Yellow difference Δα:

Δα = |α1 − α2|,

where α1, α2 are the mean values of the B-Y opponency component α in regions R1, R2, respectively.

C2. Red-Green difference Δβ:

Δβ = |β1 − β2|,

where β1, β2 are the mean values of the R-G opponency component β in regions R1, R2, respectively.

These color scalar statistics were motivated by previous work studying human perceptual discrimination of different surfaces (Fine et al., 2003; Ing et al., 2010).

Machine classifiers

Quadratic classifier analysis

In order to study how well various feature subsets measured from the image patches could predict human performance, we made use of a quadratic classifier analysis. The quadratic classifier is a natural choice for quantifying the discriminability of two categories defined by features having multivariate Gaussian distributions (Duda, Hart, & Stork, 2000), and has been used in previous work studying the perceptual discriminability of surfaces (Ing et al., 2010). Assume that we have two categories C1, C2 of stimuli from which we can measure n features u = (u1, u2, … , un)T, and that features measured from each category are Gaussian distributed:

p(u | Ci) = (2π)^(−n/2) |Σi|^(−1/2) exp[−(1/2)(u − μi)T Σi^(−1) (u − μi)],   (Equation 14)

where μi and Σi are the means and covariances of each category. Given a novel observation with feature vector u*, assuming that the two categories are equally likely a priori we evaluate the log-likelihood ratio

L12(u*) = ln p(u* | C1) − ln p(u* | C2),   (Equation 15)

choosing C1 when L12 ≥ 0 and C2 when L12 < 0. In the case of Gaussian distributions for each category as in Equation 14, Equation 15 can be rewritten as

L12(u*) = (1/2)(u* − μ2)T Σ2^(−1) (u* − μ2) − (1/2)(u* − μ1)T Σ1^(−1) (u* − μ1) + (1/2) ln(|Σ2|/|Σ1|).


The task for applying this formalism is to define a set of n features to measure from the set of image patches, and then use these measurements to define the means and covariances of each category in a supervised manner. New image patches that are unlabeled can then be classified using this quadratic classifier.

In our analyses, category C1 was occlusion patches and category C2 was surface patches. We estimated the parameters of the classifiers for each category from the means and covariances of the statistics measured from a set of 2,000 image patches (training set), and we applied these classifiers to the 400-patch subsets of the 1,000 image patches presented to the subjects (test set). This analysis was performed for multiple classifiers defined by different subsets of features, and for image patches of all sizes (8 × 8, 16 × 16, 32 × 32).
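The quadratic classifier described above can be sketched directly (our Python stand-in for the MATLAB analysis; the log-likelihood ratio is computed from the two fitted Gaussians and thresholded at zero):

```python
import numpy as np

def fit_gaussian(samples):
    """Mean and covariance of one category's feature vectors (rows)."""
    return samples.mean(axis=0), np.cov(samples.T)

def log_likelihood_ratio(u, mu1, cov1, mu2, cov2):
    """L12(u): choose category 1 when the returned value is >= 0."""
    def logpdf(x, mu, cov):
        d = x - mu
        return -0.5 * (d @ np.linalg.solve(cov, d)
                       + np.log(np.linalg.det(cov))
                       + len(x) * np.log(2.0 * np.pi))
    return logpdf(u, mu1, cov1) - logpdf(u, mu2, cov2)
```

Training amounts to calling fit_gaussian once per category on the training patches; classification of a new patch is a single call to log_likelihood_ratio.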

SVM classifier analysis

As an additional control, in addition to the quadratic classifier analysis, we trained a Support Vector Machine (SVM) classifier (Cristianini & Shawe-Taylor, 2000) to discriminate occlusions and surfaces using our grayscale visual feature set (G1–G5). The SVM classifier is a standard and well-studied method in machine learning, which achieves good classification results by learning the separating hyperplane that maximizes the margin between two categories (Bishop, 2006). We implemented this analysis using the functions svmclassify.m and svmtrain.m in the MATLAB Bioinformatics Toolbox.
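Since those routines are MATLAB toolbox functions, the idea can be illustrated with a minimal numpy stand-in: a linear SVM trained by subgradient descent on the regularized hinge loss (a sketch, not the paper's implementation):

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    """Minimal linear SVM: subgradient descent on the regularized
    hinge loss. X: (n, d) features; y: labels in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1                  # points violating the margin
        grad_w = lam * w - (y[mask, None] * X[mask]).sum(axis=0) / n
        grad_b = -y[mask].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def svm_predict(X, w, b):
    return np.sign(X @ w + b)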

Multiscale classifier analyses on Gabor filter outputs

One weakness of defining machine classifiers using our set of visual features is that these features are defined on the scale of the entire image patch. This is problematic because it is well known that occlusion edges exist at multiple scales, and that appropriate scale selection and integration across scale is an essential computation for accurate edge detection (Elder & Zucker, 1998; Marr & Hildreth, 1980). Furthermore, it is well known that the neural code at the earliest stages of cortical processing is reasonably well described by a bank of multi-scale filters resembling Gabor functions (Daugman, 1985; Pollen & Ronner, 1983), and that such a basis forms an efficient code for natural scenes (Olshausen & Field, 1996).

In order to define a multiscale feature set resembling the early visual code, we utilized the rectified outputs of a bank of filters learned using Independent Component Analysis (ICA). These filters closely resemble Gabor filters, but have the additional useful property of constituting a maximally independent set of feature dimensions for encoding natural images (Bell & Sejnowski, 1997). Example filters for 16 × 16 image patches are shown in Figure 5a. The outputs of our filter bank were used as inputs to two different standard classifiers: (a) a linear logistic regression classifier and (b) a three-layer neural network classifier (Bishop, 2006) having 4, 16, or 64 hidden units. These classifiers were trained using standard gradient descent methods, and their performance was evaluated on a separate set of validation data not used for training. A schematic illustration of these classifiers is shown in Figure 5b.
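The filter-rectify-classify pipeline can be sketched as follows (our Python illustration: a hand-built Gabor bank stands in for the ICA-learned filters, and the linear logistic classifier is trained by plain gradient descent):

```python
import numpy as np

def gabor(w, freq, theta, phase):
    """A w x w Gabor filter (a stand-in for the ICA-learned filters)."""
    ax = np.arange(w) - (w - 1) / 2.0
    X, Y = np.meshgrid(ax, ax)
    xt = X * np.cos(theta) + Y * np.sin(theta)
    envelope = np.exp(-(X ** 2 + Y ** 2) / (2.0 * (w / 4.0) ** 2))
    return envelope * np.cos(2.0 * np.pi * freq * xt + phase)

def rectified_responses(patch, bank):
    """Half-wave rectified filter outputs: the classifier inputs."""
    r = np.array([np.sum(patch * f) for f in bank])
    return np.concatenate([np.maximum(r, 0.0), np.maximum(-r, 0.0)])

def train_logistic(X, y, lr=0.5, epochs=500):
    """Linear logistic regression by gradient descent (y in {0, 1})."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b
```

Splitting each filter output into separate positive and negative rectified channels mirrors the on/off rectification described in the text; the neural network variant would insert a hidden layer between these responses and the decision.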

Figure 5

Illustration of the multiscale classifier analysis using the outputs of rectified Gabor filters. (a) Gabor functions learned using ICA form an efficient code for natural images, which maximize statistical independence of filter responses. (b) Grayscale...

Experimental paradigm

In the psychophysical experiments, image patches were displayed on a 24-inch Macintosh cinema display (Apple, Inc., Cupertino, CA). Since the RGB images were linearized to correct for camera gamma (Olmos & Kingdom, 2004), we set the display monitor to a gamma of approximately 1 using the Mac Calibration Assistant so that the displayed images would appear as natural as possible. Subjects were seated in a dim room in front of the monitor, and image patches were presented at the center of the screen, scaled to subtend 1.5 degrees of visual angle at the approximately 12-inch (30.5 cm) viewing distance. All stimuli subtended the same visual angle to eliminate confounds between the size of the image on the retina and its pixel dimensions. Previous studies have demonstrated that human performance on a similar task depends only on the number of pixels (McDermott, 2004), so by holding the retinal size constant the only variable is the number of pixels.

Our experimental paradigm is illustrated in Figure 6. Surrounding the image patch was a set of flanking crosshairs whose imaginary intersection defines the “center” of the image patch. In the standard task, the subject decides whether an occlusion boundary passes through the image patch center with a binary (1/0) key-press. Half of the patches were taken from the occlusion boundary database (where all occlusions pass through the patch center) and the other half of the patches were taken from the surface database. Therefore, guessing on the task would yield performance of 50 percent correct. The design of this experiment is similar to a previous study on detecting T-junctions in natural images (McDermott, 2004). To optimize performance, subjects were given as much time as they needed for each patch. Furthermore, since perceptual learning can improve performance (Ing et al., 2010), we provided positive and negative feedback.

Figure 6

Schematic of the two-alternative forced choice experiment. A patch was presented to the subject, who decided whether an occlusion passes through the imaginary intersection of the crosshairs. After the decision, the subject was given positive or negative...

We performed the experiment on two sets of naive subjects having no previous exposure to the images or knowledge of the scientific aims, as well as the lead author (S0), for a total of six subjects (three males, three females). One set of two naive subjects (S1, S2) was allowed to browse grayscale versions of the full-scale images (576 × 768 pixels) prior to the experiment. After 2–3 seconds of viewing, they were shown, superimposed on the images, the union of all subject labelings. This was meant to give the subjects an intuition for what is meant by an occlusion boundary in the context of the full-scale image, and it was inspired by previous work on surface segmentation in which subjects were allowed to preview large full-scale images of foliage from which surface patches were drawn (Ing et al., 2010). A second set of three naive subjects (S3, S4, S5) was not shown the full-scale images beforehand, in order to control for the possibility that this preexposure may have improved task performance.

We used both color and grayscale patches of sizes 8 × 8, 16 × 16, and 32 × 32. In addition to the raw image patches, a set of "texture-removed" image patches was created by averaging the pixels in each of the two regions, thus creating synthetic edges where the only cue was luminance or color contrast. For the grayscale patches, we also considered the effects of removing luminance difference cues. However, it is a much harder problem to create luminance-removed patches as was done in previous studies on surface segmentation (Ing et al., 2010), since simply subtracting the mean luminance in each region of an image patch containing an occlusion often yields a high spatial frequency boundary artifact, which provides a strong edge cue (Arsenault et al., 2011). Therefore, we circumvented this problem by setting a 3-pixel thick region around the boundary to the mean luminance of the entire patch, in effect covering up the boundary. Then we could remove luminance cues by equalizing the mean luminance on each side without creating a boundary artifact, since no boundary was visible. We called this condition "boundary + luminance removed." One issue, however, with comparing these boundary + luminance removed patches to the normal patches is that now two cues are missing (the boundary and the luminance difference), so in order to better assess the combination of texture and luminance information we also created a "boundary removed" condition, which blocks the boundary but does not modify the mean luminance on each side. Illustrative examples of these stimuli are shown in Figure 4b.

All subjects were shown different sets of 400 image patches sampled randomly from a set of 1,000 patches in every condition, with the exception of two subjects in the luminance-only grayscale condition (S0, S2), who were shown the same set of patches in this condition only. Informed consent was obtained from all subjects, and all experimental procedures were approved beforehand by the Case Western Reserve University IRB (Protocol #20101216).

Tests of significance

In order to determine whether or not the performance of human subjects was significantly different in different conditions of the task, we utilized the standard binomial proportion test (Ott, 1993), which relies on a Gaussian approximation to the binomial distribution. This test is well justified in our case because of the large number of stimuli (N = 400) presented in each experimental condition. For a proportion estimate π̂ obtained from N trials, we compute the 1 − α confidence interval as

π̂ ± z(1−α/2) √(π̂(1 − π̂)/N),

where z(1−α/2) is the corresponding quantile of the standard normal distribution.

We use a significance level of α = 0.05 in all of our tests and calculations of confidence intervals.
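These computations are short enough to sketch directly (our Python illustration; z = 1.96 corresponds to α = 0.05):

```python
import math

def binomial_ci(correct, n, z=1.96):
    """Gaussian-approximation confidence interval for a proportion;
    z = 1.96 corresponds to the 95% interval (alpha = 0.05)."""
    p = correct / n
    half = z * math.sqrt(p * (1.0 - p) / n)
    return p - half, p + half

def significantly_different(c1, n1, c2, n2, z=1.96):
    """Two-sample binomial proportion test using a pooled standard
    error, as in the standard Gaussian-approximation test."""
    p1, p2 = c1 / n1, c2 / n2
    pooled = (c1 + c2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return abs(p1 - p2) / se > z
```

For N = 400 trials at 75% correct, the 95% interval spans roughly ±4 percentage points, which is why differences of a few percent between conditions are resolvable.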

In order to evaluate the performance of the machine classifiers we performed a Monte Carlo analysis where the classifiers were evaluated on 200 different sets of 400 image patches randomly chosen from a validation set of image patches. This validation set was distinct from the set of patches used to train the classifier, allowing us to study classifier generalization. We plot 95% confidence intervals around the mean performance of our classifiers.


Case occlusion boundary (COB) database

We developed a novel database of occlusion edges and surface regions not containing any occlusions for use in our perceptual experiments. Representative images from our database are illustrated in Figure 2 (left column). Note the clear figural objects and many occlusions in these images. In the middle column of Figure 2, we see grayscale versions of these images, with the set of all pixels labeled by any subject (logical OR) overlaid in white. The magenta squares show examples of surface regions labeled by the subjects. Finally, the right column shows an overlay plot of the occlusions marked by all subjects, with darker pixels being labeled by more subjects and the lightest gray pixels being labeled by only a single subject. Note the high degree of consistency between the labelings of the multiple subjects. Figure 3 shows some representative examples of occlusion (left) and surface (right) patches from our database. It is important to note that while occlusions may be a major source of edges in natural images, edges may arise from other cues like cast shadows or changes in material properties (Kersten, 2000). The bottom panel of Figure 3 shows examples of patches containing edges defined by shadows rather than by occlusions.

In other annotated edge databases like the Berkeley Segmentation Dataset (BSD) (Martin, Fowlkes, Tal, & Malik, 2001), occlusion edges were labeled indirectly by segmenting the image into regions and then denoting the boundaries of these regions to be edges. We observed that when occlusions are labeled directly, rather than inferred indirectly from region segmentations, fewer pixels are labeled, as shown in Figure 7b, which plots the distribution of the percentage of edge pixels for all images and subjects in our database (red) and for 98 BSD images segmented by six subjects (blue). We find that, averaged across all images and subjects, about 1% of the pixels were labeled as edges in our database, whereas about twice as many pixels were labeled as edges when computing edgemaps from the BSD segmentations (COB median = 0.0084, N = 600; BSD median = 0.0193, N = 588; p < 10−121, Wilcoxon rank-sum).

Figure 7

Labeling occlusions directly marks fewer pixels than inferring occlusions from image segmentations and yields greater agreement between subjects. (a) Top: An image from our database (left) together with the labeling (middle) by the most conservative subject...

We observed a higher level of intersubject consistency in the edgemaps obtained from the COB than in those obtained from the BSD segmentations, which we quantified both with a novel analysis we developed and with a more standard precision-recall analysis (Abdou & Pratt, 1979; Rijsbergen, 1979; Methods, Image database). Figure 7a shows an image from the COB database (top left) together with the edgemap and derived mask for the most conservative subject (MCS), i.e., the subject who labeled the fewest pixels (top middle, right). When the MCS mask is multiplied by the edgemaps of the other subjects, we see reasonably good agreement between all subjects (bottom row). To quantify this over all images, we computed our novel intersubject similarity index ζ defined in Equation 4; Figure 7c shows that ζ is on average larger for our dataset than for the BSD, plotted here as a histogram for γ = 10 (COB median = 0.2890, N = 3500; BSD median = 0.1882, N = 3430; p < 10−202, Wilcoxon rank-sum). Similar results were obtained over a wide range of values of γ (Supplementary Figure S1). In addition to our novel analysis, we also implemented a precision-recall analysis (Abdou & Pratt, 1979; Rijsbergen, 1979) using a “leave-one-out” procedure in which we compared the edges labeled by one subject to a “ground truth” labeling defined by combining the edgemaps of the five other subjects (Methods, Image database). Agreement was quantified using the F-measure (Rijsbergen, 1979), a weighted harmonic mean of precision (not labeling nonedges as edges) and recall (detecting all edges in the image). Figure 7d shows significantly better agreement between edgemaps in the COB database than between those obtained indirectly from the BSD segmentations (p < 10−28).
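The precision-recall comparison can be sketched as follows. The helper below scores binary edgemaps by exact pixel correspondence and combines the held-out subjects with a logical OR; both simplifications are our assumptions (standard benchmarks additionally tolerate small localization errors), and the toy edgemaps are purely illustrative.

```python
import numpy as np

def f_measure(pred, truth, beta=1.0):
    """Precision/recall/F for binary edgemaps, using exact pixel
    correspondence (the tolerance for small localization errors used
    in full benchmarks is omitted in this sketch)."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)
    precision = tp / max(pred.sum(), 1)   # labeled edges that are real
    recall = tp / max(truth.sum(), 1)     # real edges that were found
    if precision + recall == 0:
        return 0.0
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)

# Leave-one-out: compare one subject's edgemap against the union
# (logical OR) of the remaining subjects' edgemaps.
subjects = [np.zeros((8, 8), bool) for _ in range(3)]
for s in subjects:
    s[4, :] = True          # every subject marks row 4
subjects[0][2, 0:4] = True  # subject 0 also marks part of row 2

truth = np.logical_or.reduce(subjects[1:])
score = f_measure(subjects[0], truth)
print(score)
```

Here subject 0 labels 12 pixels, 8 of which appear in the combined ground truth, giving precision 2/3, recall 1, and F = 0.8.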

Visual features measured from occlusion and surface patches

We were interested in determining which locally available visual features human subjects could use to distinguish image patches containing occlusions from those containing surfaces. Toward this end, we measured from both occlusion and surface patches a variety of visual features taken from previous studies of natural image statistics (Balboa & Grzywacz, 2000; Field, 1987; Ing et al., 2010; Lee & Choe, 2003; Rajashekar et al., 2007; Reinagel & Zador, 1999). For patches containing a single occlusion, which can be divided into two regions of roughly equal size, we obtained a region labeling, illustrated in Figure 4a (bottom). The region labeling consists of the sets of pixels corresponding to the two surfaces divided by the boundary (white and gray regions), together with the boundary itself (black line). Using this region labeling, we can measure image properties on either side of the boundary and compute differences in these properties between regions. Surface patches, by definition, comprise a single region, so quantities that depend on a region labeling (like luminance differences) are not directly defined for them. Therefore, to measure the same set of features from the surface patches, we assigned to each patch a set of 25 dummy region labelings (spanning all orientations) and performed the measurements for each dummy labeling. Since all features were, on average, larger for occlusions, for surfaces we took as the measured value of a feature its maximum over all 25 dummy labelings. A full description of the measured features is given in Methods (Statistical measurements).
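The dummy-labeling convention can be sketched for one example feature, the luminance difference Δμ. The orientation sampling and the straight split through the patch center are our assumptions about the geometry, not necessarily the paper's exact implementation.

```python
import numpy as np

def max_luminance_difference(patch, n_orient=25):
    """Measure |mean1 - mean2| under dummy region labelings at
    n_orient orientations through the patch center, and return the
    maximum (the convention used here for surface patches, which
    have no true boundary)."""
    h, w = patch.shape
    y, x = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    best = 0.0
    for theta in np.linspace(0, np.pi, n_orient, endpoint=False):
        # Split the patch by a line through the center at angle theta.
        side = (x - cx) * np.cos(theta) + (y - cy) * np.sin(theta) > 0
        if side.all() or (~side).all():
            continue
        diff = abs(patch[side].mean() - patch[~side].mean())
        best = max(best, diff)
    return best

# For a patch with a vertical step edge, the maximum is reached at
# the vertical dummy labeling and equals the step amplitude.
patch = np.zeros((32, 32))
patch[:, 16:] = 1.0
print(max_luminance_difference(patch))
```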

We compared the power spectra of 32 × 32 occlusion and texture patches (Figure 8, left). On average there is significantly less energy at low spatial frequencies for textures than for occlusions, which is intuitive since many occlusions contain a low spatial frequency luminance edge (Figure 3, left panel). Estimating the spectral exponent by fitting a line to the power spectrum of each patch in log-log coordinates revealed different distributions of exponents for occlusions and textures (Figure 8, right). For textures the median exponent is close to 2, consistent with previous observations (Field, 1987), whereas for occlusions the median exponent is slightly higher (≈2.6). This is consistent with previous analyses of the spectral content of different scene categories, which show that landscape scenes with a prominent horizon (like an ocean view) tend to have more low spatial frequency content (Oliva & Torralba, 2001).
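The exponent estimate amounts to a line fit in log-log coordinates. A minimal sketch on a synthetic radially averaged spectrum with a known exponent:

```python
import numpy as np

def spectral_slope(freqs, power):
    """Least-squares line fit in log-log coordinates; returns the
    exponent k of an assumed P(f) ~ 1/f^k falloff."""
    slope, _ = np.polyfit(np.log(freqs), np.log(power), 1)
    return -slope

# Synthetic radially averaged spectrum with a known exponent of 2.6,
# roughly the median reported for occlusion patches.
f = np.arange(1, 17, dtype=float)   # cycles/patch for a 32x32 patch
P = f ** -2.6
print(spectral_slope(f, P))
```

On real patches the fit is applied to a noisy estimated spectrum, so the recovered exponent scatters around the true falloff rather than matching it exactly as it does here.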

Figure 8

Power spectra of 32 × 32 patches. Left: Median power spectrum of occlusions (blue) and surfaces (green). Thin dashed lines show the 25th and 75th percentiles. Right: Power-spectrum slopes for occlusions (blue) and surfaces (green).

Figure 9 shows that all of the grayscale features (G1–G5; Methods, Statistical measurements) measured from surfaces (green) and occlusions (blue) are well approximated by Gaussians when plotted in logarithmic coordinates. Supplementary Table 1 lists the means and standard deviations of each feature. On average these features are larger for occlusions than for textures, as one might expect, since they explicitly or implicitly measure differences or variability within an image patch, which tend to be larger for occlusion patches. Analyzing the correlation structure of the grayscale features reveals that they are all significantly positively correlated (Supplementary Table 2). Although some of these correlations are unsurprising (like that between luminance difference log Δμ and boundary gradient log GB), many pairs of these positively correlated features can be manipulated independently of each other in artificial images (for instance, global contrast log ρ and contrast difference log Δσ). These results are consistent with previous work demonstrating that in natural images there are often strong conditional dependencies between supposedly independent visual feature dimensions (Fine et al., 2003; Karklin & Lewicki, 2003; Schwartz & Simoncelli, 2001; Zetzsche & Rohrbein, 2001). Supplementary Figure S2 plots all possible bivariate distributions of the logarithms of the grayscale features for 32 × 32 textures and surfaces.
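The log-domain correlation analysis can be illustrated with synthetic features. The variable names echo log ρ and log Δσ from the text, but the generative model and correlation strength here are arbitrary stand-ins, chosen only to mimic the log-normal, positively correlated structure described above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins for two grayscale features: jointly Gaussian
# in log coordinates with a built-in positive correlation.
n = 2000
shared = rng.normal(size=n)
log_rho  = 0.8 * shared + 0.6 * rng.normal(size=n)  # "log global contrast"
log_dsig = 0.8 * shared + 0.6 * rng.normal(size=n)  # "log contrast difference"
rho, dsig = np.exp(log_rho), np.exp(log_dsig)

# Correlate in log coordinates, where the features are ~Gaussian.
r = np.corrcoef(np.log(rho), np.log(dsig))[0, 1]
print(r)
```

With this construction the theoretical correlation of the log features is 0.64, and the sample estimate lands close to it.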

Figure 9

Univariate distributions of grayscale visual features G1-G5 (see Methods) for occlusions (blue) and textures (green). When plotted on a log scale, these distributions are well described by Gaussians and exhibit separation for occlusions and textures.

In addition to grayscale features, we measured color features (C1–C2; Methods, Statistical measurements) by transforming the images into the Lαβ color space (Fine et al., 2003; Ing et al., 2010). Using our region labelings (Figure 4a, bottom), we measured the color parameters log Δα and log Δβ from our occlusions. Supplementary Table 3 lists the means and standard deviations of each feature, and we observe positive correlations (r = 0.43) between their values, similar to previously reported positive correlations between these same features for two nearby patches taken from the same surface (Fine et al., 2003). This finding is interesting because occlusion boundaries by definition separate two or more different surfaces, yet these nominally independent color dimensions remain correlated. The two-dimensional scatterplots in Supplementary Figure S3 show separation between the multivariate distributions defined by these color parameters, suggesting that color contrast provides a potential cue for occlusion edge detection, much as it does for surface segmentation (Fine et al., 2003; Ing et al., 2010).
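A sketch of the color space transform, using one standard published formulation of Lαβ (the RGB→LMS and decorrelation matrices of Ruderman, Cronin, & Chiao, 1998, as tabulated by Reinhard et al., 2001); the paper's exact implementation may differ in detail.

```python
import numpy as np

# RGB -> LMS cone-response matrix (Ruderman et al., 1998 formulation).
RGB2LMS = np.array([[0.3811, 0.5783, 0.0402],
                    [0.1967, 0.7244, 0.0782],
                    [0.0241, 0.1288, 0.8444]])

# Decorrelating transform applied to log cone responses:
# L (achromatic), alpha (blue-yellow), beta (red-green).
DECORR = np.diag([1 / np.sqrt(3), 1 / np.sqrt(6), 1 / np.sqrt(2)]) @ \
         np.array([[1,  1,  1],
                   [1,  1, -2],
                   [1, -1,  0]])

def rgb_to_lab(rgb):
    """rgb: (..., 3) array with positive entries -> (L, alpha, beta)."""
    lms = rgb @ RGB2LMS.T
    return np.log10(lms) @ DECORR.T

# An achromatic pixel maps to alpha ~ 0 and beta ~ 0: the chromatic
# channels carry only color opponency, not luminance.
L, alpha, beta = rgb_to_lab(np.array([0.5, 0.5, 0.5]))
print(alpha, beta)
```

Region-wise differences of α and β under a region labeling then give the Δα and Δβ features described above.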

Human performance on occlusion boundary detection task

Effects of patch size, color and pre-exposure

To determine which local visual features subjects use to discriminate occlusions from surfaces, we designed a simple two-alternative forced choice task, illustrated schematically in Figure 6. An image patch subtending roughly 1.5 degrees of visual angle (30.5 cm viewing distance) was presented to the subject, who decided with a binary (1/0) keypress whether the patch contained an occlusion. Patches were chosen with equal probability from a pre-extracted set of 1,000 occlusions and 1,000 surfaces, so guessing would yield 50% correct. Subjects were allowed to view each patch as long as they needed (1–2 seconds was typical), and, following previous work, auditory feedback was provided to optimize performance (Ing et al., 2010). Six subjects with varying degrees of exposure to the image database performed the task (Methods, Experimental paradigm).

Figure 10a shows the effect of patch size on performance with grayscale image patches for all subjects. Thick lines indicate mean subject performance and thin dashed lines denote 95% confidence intervals. Performance differs significantly across the patch sizes tested (8 × 8, 16 × 16, 32 × 32). For a subset of subjects (S0, S1, S2) we also tested color image patches; results for color patches are shown in Figure 10b (red line) together with the grayscale data from the same subjects (black dashed line). At every patch size tested, performance is significantly better for color patches, which is sensible because color is a potentially informative cue for distinguishing different surfaces, as previous work has shown (Fine et al., 2003; Ing et al., 2010).

Figure 10

Performance of human subjects at the occlusion detection task for 8 × 8, 16 × 16, and 32 × 32 image patches. (a) Subject performance for grayscale image patches. Thin dashed lines denote 95% confidence intervals. Note how performance...

One concern with the interpretation of our results is that the brief pre-exposure (3 seconds) of subjects S1 and S2 to the full-scale (576 × 768) images prior to the first task session (Methods, Experimental paradigm) may have unfairly improved their performance. To control for this possibility, we ran three additional subjects (S3, S4, S5) who had no such pre-exposure, and we found no significant difference in performance between the two groups (Supplementary Figure S4a). We also found that the lead author (S0) was not significantly better at the task than the two subjects (S1, S2) who were only briefly pre-exposed to the images (Supplementary Figure S4b). We therefore conclude that pre-exposure to the full-scale images makes little if any difference in our task.

Effects of luminance, boundary and texture cues

To better understand which cues subjects use in the task, we tested subjects on modified image patches with various cues removed (Methods, Experimental paradigm). To determine the importance of texture, we removed all texture cues by averaging all pixels within each region (“texture removed”), as illustrated in Figure 4b (top). Figure 11a shows that performance is substantially impaired without texture cues for the 16 × 16 and 32 × 32 patch sizes. Note that performance is roughly flat with increasing patch size when all cues but luminance are removed, suggesting that luminance gradients are a fairly “local” cue that is useful even at very small scales. Similar results were obtained for color patches (Figure 12), where the only cues available were luminance and color contrast.

Figure 11

Subject performance for grayscale image patches with various cues removed. Dashed lines indicate the average performance for unaltered image patches, solid lines performance in the cue-removed case. (a) Removal of texture cues significantly impairs subject...

Figure 12

Subject performance for unaltered color image patches (dashed line) and with texture cues removed (solid line).

Removing luminance cues while keeping texture cues intact is more difficult, since simply equalizing the luminance in the two image regions can create a boundary artifact (a high spatial frequency edge along the boundary), as others have noted (Arsenault et al., 2011). To circumvent this problem, we removed the boundary artifact by setting all pixels in a 3-pixel-wide strip containing the boundary to a single uniform value (Figure 4b, bottom). We were then able to equalize the luminance on each side of the patch without creating a boundary artifact (“luminance + boundary removed”). Since the boundary pixels could potentially contain long-range spatial correlation information (Hess & Field, 1999) useful for identifying occlusions, as an additional control we also measured performance on patches where we removed only the boundary (“boundary removed”).
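The two-step manipulation (mask the boundary strip, then equalize region means) can be sketched for the simplified case of a straight vertical boundary; real labeled boundaries are curves, so the actual implementation must follow the labeled boundary pixels rather than a single column.

```python
import numpy as np

def remove_boundary_and_equalize(patch, boundary_col, strip=3):
    """Sketch of the "luminance + boundary removed" manipulation for
    a patch with a vertical boundary at boundary_col: overwrite a
    3-pixel strip containing the boundary with one uniform value,
    then shift each side so the two regions have equal mean
    luminance (no high-frequency artifact is created at the seam)."""
    out = patch.astype(float).copy()
    lo, hi = boundary_col - strip // 2, boundary_col + strip // 2 + 1
    out[:, lo:hi] = out.mean()            # mask the boundary strip
    left, right = out[:, :lo], out[:, hi:]
    target = 0.5 * (left.mean() + right.mean())
    out[:, :lo] += target - left.mean()   # equalize region means
    out[:, hi:] += target - right.mean()
    return out

rng = np.random.default_rng(2)
patch = rng.random((32, 32))
patch[:, 16:] += 0.5                      # luminance step at column 16
eq = remove_boundary_and_equalize(patch, 16)
print(eq[:, :15].mean(), eq[:, 18:].mean())
```

After the manipulation the two regions have identical mean luminance, leaving texture as the only remaining region cue.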

We see from Figure 11c

Different schools of professional and academic thought have recently emerged to address the unprecedented problems of the sprawling megacity. One particular group believes that solutions will emerge from the cultivation of data and vast amounts of statistical research. This activity, sometimes referred to as “datascaping,” reduces the complex problems of megacities to verbal logic that can inform other verbal systems, such as the regulatory statutes, zoning, by-laws, comprehensive plans, and public policy of a city.

The suburban megacity feathers through endless gradations, from city patterns and built systems to nature and bio-morphic systems, forming ONE LANDSCAPE.

Another group, comprising architects, landscape architects, and urban planners, sees the megacity as a design problem. Born of a long and time-honored history of urban design, this notion extends from a conviction that the spatial arrangements of a city and the uses they contain can be designed, altered, or permuted to foster the social and economic relationships of a society and its goals. In contrast to the datascapers, this group largely sees the city as visual and spatial logic—in other words, Architecture.

A third group, interested in neither datascaping nor Architectural conventions, passionately argues that megacities are unprecedented constructs that deserve, if not demand, new and unprecedented methods. The recent developments of Landscape Urbanism and Ecological Urbanism invent new verbal ideas and terminology in concert with the new and unfamiliar design solutions they produce.

Rather than debate which approach is right or better, consider that a unified theory and nomenclature of megacities do not yet exist; this is perhaps a clue that they are not yet accurately understood or characterized. For example, to refer to Rome as a “city” and Los Angeles as a megacity implies that LA is simply a gigantic version of the Roman pattern, which, of course, it isn’t.

Perhaps a productive step would be to characterize the megacity more accurately by its attributes rather than by using nomenclature that is inaccurate or insufficient.


Pulitzer Prize-winning architecture critic Robert Campbell offers a useful assessment of the megacity and its relationship with nature in “Still Steel,” written for Landscape Architecture Magazine.

“For the first time in human history, the entire world both built and un-built is being considered as one continuous landscape. It is a profound way of re-conceiving architecture (landscape) and cities.”

This article explores the suburban megacity and/or mega-region as a landscape that feathers through endless gradations, from city patterns and built systems on the one hand to nature and bio-morphic systems on the other—i.e., ONE LANDSCAPE.

The article begins with a diagnostic of the suburban megacity that maps out a supportive framework for the notion of One Landscape. Density analyses of various cities and urban geographies will be used to reveal pattern characteristics.

Two potential techniques that can intervene in landscape-like patterns follow the diagnostic. The first is based on the notion of “reciprocity between buildings and landscape”, a conceptual device that was loosely utilized by planners and designers in the mid- to late- 20th century. The second is a particular kind of drawing technique that exploits the formal vagueness of megacities and the potential to introduce new qualities within them that unify urban design, landscape, and ecological impulses.


Experts on urbanism extol “density”—the ratio of humans to an area of measurement—as an attribute that “offers hope for the future,” a potential strategy for restructuring the suburban pattern. However, simple questions quickly arise. For example, what is the density goal? At what density does urbanity ignite—i.e., what is a target density? And, by logical extension, would the same density that produces a social and economic network also be sufficient to make energy consumption efficient and economical? Or are these different density thresholds?

And, conversely, at what concentration of building forms and density is the potential for nature and ecologies to exist within a city driven out and replaced by an entirely constructed environment? Simply put, does “density” mean Hong Kong, or is the density of Boulder, Colorado or Savannah, Georgia sufficient, and for what?

An inventory of the density of key world cities is revealing. The density comparisons that follow take into account only the residential population of a city or region and the area it encompasses. For purposes of this analysis, this limitation avoids potential density distortions that are created by surging commuter populations that originate from outside a geography, and which can heighten the urban performance of an area with pulse concentrations.

When considering only its residential population, San Francisco’s density is 27 people to an acre. Given that San Francisco is generally seen as a highly urbane world city, its surprisingly low resident-density is also evidence of the commuter surge delivered by BART (Bay Area Rapid Transit) into the financial and governmental quarters of the city, which stabilizes its urban performance.

The resident-density of Paris is 103 people per acre. At over four times the resident-density of San Francisco, what comes quickly into focus by comparing the two cities is that Paris is an extraordinarily efficient urban pattern, with an abundance of avenues and public spaces. We can infer that it isn’t as reliant on a commuter surge and/or that the weaving of residences with shops and small officing must be exceptionally integrated and fine-grained to sustain a resident-density of over 100 people per acre.

The resident-density of New York City is higher still, at 111 people per acre. According to Professor Kenneth Frampton, the daily commuter surge into Manhattan drives the effective daytime density even higher, with guesstimates falling somewhere between 500 and 1,000 people per acre.

Comparing the density of these world cities—which originated around a historical core or a colonial center, or were hyper-densified by unusual geographical restrictions such as those posed by Manhattan island—with the 20th-century suburban megacities of the North American Sunbelt reveals a shocking, if not alarming, reality.

The average human density of Dallas-Fort Worth (DFW) is 1 person per acre. Unaffected during its rapid expansion by any natural boundaries that might interfere and generate density, DFW has instead materialized as a pattern that undergoes a machine migration every day: residents abandon vast tracts of purely residential geographies to commute to purely “officing” or shopping geographies. Taken together with the public easements established for intercity highways and infrastructure, multiple airports (including the colossal DFW International Airport), and its system of water-harvesting reservoirs, every person living in DFW currently requires one acre of civilization to exist.

While the astonishment of such a land and resource consumption pattern settles in, keep in mind that Atlanta is virtually the same, with 0.97 persons per acre. Indeed, the same analysis applied to virtually all Sunbelt cities—Houston, Austin, Las Vegas, and others—yields a resident-density of approximately one person per acre. Since all these cities were largely constructed with the same kind of engineered pattern—designed to the same parameters of traffic, safety, and turning radii—they are essentially one and the same place. Little wonder that when critics and writers wax about the “lack of place” that typifies these kinds of “Generica” environments, they are stating facts that can be supported quantitatively. Whether it was offered as a critique or simply a statistical fact, architect Rem Koolhaas, during his 2008 lecture for the opening of the Wylie Multi-form Theater in the Dallas Arts District, called Dallas (DFW) the “epicenter of the generic.”

Only Phoenix, with 0.30 humans per acre—essentially one third the density of all the others—distinguishes itself from the monotonous hyper-pattern of the North American suburban megacity, which has produced one landscape built at an average resident-density of one person per acre.


By comparison with hyper-dense cities, the strikingly thin density of the suburban megacity raises a broad spectrum of questions and potential speculations. It provides evidence for why attempts to create nodes of urban concentration and density struggle to succeed. Urban formations are inherently more complex and expensive to design and construct. Costs to achieve them are transferred into the lease and purchasing rates for officing, retail, condos, and apartments. The spike in price point is theoretically offset by the advantages offered by urbanism that include culture, convenience, walkability, safety, and a generally vibrant and satisfying urban environment.

What can be observed with almost documentary evidence is how the thinly densified suburban area around a dense node tends to exert a dissipating effect on the benefit of urbanization by diffusing the amenities of concentrated land uses: cheaper rents and real estate are supported by an endless array of alternative land uses that are equally accessible by motorcar.

The cause-and-effect relationship between density and urbanity may be more complicated than the simple notion that attaining ever-higher densities should always be the objective. For example, several U.S. cities, such as Portland, Oregon; Madison, Wisconsin; Boulder, Colorado; and Savannah, Georgia, frequently top rankings of urban places that are highly desirable to live in. The same density analysis applied to these cities reveals comparably modest figures.

However, it is the counterintuitiveness of the analysis that brings into focus a more poignant revelation about the suburban megacity that may be its most urgent and irreversible characteristic.

Using North Texas as a typical case study region, the 11 counties that make up DFW encompass approximately 7 million acres of civilization for approximately 7 million residents. As a simple thought experiment, consider what would happen if the entire DFW metropolis attempted to densify universally to match the charming, town-like density of Madison, Wisconsin, at 4.7 people per acre. Simple arithmetic reveals that the entire population of Canada would have to move to DFW to inhabit the new and denser city of 36 million people.
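The arithmetic behind the thought experiment, using the round figures quoted in the text:

```latex
\underbrace{7\times 10^{6}\ \text{acres}}_{\text{DFW area}}
\;\times\;
\underbrace{4.7\ \tfrac{\text{people}}{\text{acre}}}_{\text{Madison density}}
\;\approx\; 3.3\times 10^{7}\ \text{people}
```

With these round inputs the required population comes to roughly 33 million; the 36 million figure in the text presumably follows from the precise county acreage. Either way, the city would need several times its current 7 million residents.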

Does this mean that any attempt to urbanize the suburban megacity is fundamentally doomed, an exercise in futility or in romance for a town-like history that cannot be recovered? Has the unbridled growth and horizontal expansion of the North American city made the suburban megacity statistically impossible to retro-densify? Nodes of concentration can certainly exist within the pattern, but even the most modest density objectives, say a Savannah-like density, quickly collide with a statistical reality that cannot be achieved. Even if the denser formations were built, there simply wouldn’t be enough people to occupy the buildings.

This documentary evidence could lead us to conclude that the future will, in fact, be One Landscape, in which nature, either cultivated or “wild,” co-exists with diffuse patterns of civilization that feather across layers of density and nature. To meaningfully design new places, we need design strategies that interchangeably consider nature as architecture and buildings as site elements. One strategy that has embodied such a hypothesis throughout the history of cities and gardens, as well as in the modern age, and that could be useful for the contemporary problem of the suburban megacity, is known as “Reciprocity.”


Webster’s defines reciprocity as “a situation or relationship in which ‘two people or groups’ agree to ‘do something similar’ for each other.” When reciprocity is applied as a design tool for architecture and planning, the phrase “something similar” means the definition of spaces and places of most types, and at most scales, for human use. In extending the metaphor to urban planning and landscape design, Webster’s “two groups” can refer to architectural elements such as columns, walls, volumes, and planes, which can “reciprocate” by design with biomorphic and/or landscape elements such as trees, hedges, bosques, and orchards.

The key to reciprocity is that the mutual design of buildings and landscape elements should be a perceivable characteristic for individuals who inhabit environments or spaces that have been reciprocally conceived. Reciprocity is the result of deliberate, composed relationships that put buildings and landscapes into the “reciprocal” role of defining, mending, correcting, or making a space or place as a shared objective. The product of reciprocity is a continuous landscape in which buildings and nature are spatially woven into a seamless fabric.

A simple and basic example of reciprocity between buildings and landscape can be observed in how two repetitive lines of dots can signify the columns of a trellis or colonnade, or the trees of an alley or tree-lined path. If two such conditions were combined, the cadence of the trellis columns could continue into the cadence of the tree trunks, and vice versa.

The same thinking applies to how a thickened line, drawn in plan view, can signify a building or landscape wall and/or a plant hedge. By further logical extension, a rectangle or volume in plan view can signify a building footprint—a house—or it could signify a bosque of trees, or even a biofilter planted and filled with dense underbrush.

These basic examples demonstrate how reciprocity can produce environments accomplished through the spatial integration of built and biomorphic landscape materials. Creative extrapolations rapidly multiply from these basic examples into a playful, disciplined activity that is rich in possibilities, and thus “The game,” as Shakespeare wrote, “is afoot.”

Traces and built incidents of reciprocity occur throughout history as well as in contemporary buildings and landscapes. While reciprocity has existed as an infrequent occasion for making architecture, gardens, and cities, it could be used more often as a tool to make places and spaces in the diffuse pattern of the suburban megacity.

Two case studies follow, intended to explain and highlight how reciprocity existed in the Renaissance garden of the Villa Gamberaia, as well as in the Nasher Sculpture Center, a 21st-century accomplishment by architect Renzo Piano and landscape architect Peter Walker, FASLA.

Reciprocity in History: The Villa Gamberaia, Settignano, Italy

Situated on a Tuscan ridge near Settignano, Italy, and in the hills above Florence, the Renaissance Villa Gamberaia is a textbook demonstration of how garden spaces can be reciprocally conceived with building and landscape elements. Along with the shifted formal relationships of buildings and plant materials, meanings and perceptions produced by the reciprocal operations also shift, adding richness that is an inspiration for how conceptual and perceptual intentions can co-exist in a place of unprecedented beauty and delight.

In the same way that the overture of an opera proclaims the essential themes of the musical production, the arrival sequence at the Villa Gamberaia announces to the observer that the entire garden will unfold as an interplay between landscape elements that are rendered as building elements, and building elements that are realized with landscape materials.

The foreshadowing role of the arrival sequence begins on the country road that extends a short distance from the town center of Settignano to the villa entrance, and proceeds through a concavely shaped gate into a narrow garden corridor defined by two monumental bay laurel hedges that terminate on the door-less side of the main house. (Image One) The metaphorical meaning of the hall-like garden corridor is eventually revealed in the sequential presentation of the main space of the Villa Gamberaia, which historians often refer to as the “bowling green.” (Image Two)

When examined in plan view, the long and axial bowling green is the dominant spatial figure and the principal element that organizes the entire garden into subsets of other street-like spaces. The main building of the villa, two double-arched arcades, a retaining wall articulated like a building façade, the edge of an equestrian stable, a banister railing, and another bay laurel hedge are arranged to reciprocally form and define the edges of the bowling green.

A freestanding grotto fountain caps one end, giving the alley-like space of the bowling green a kind of metaphorical beginning and origin point. (Image Three) The other end is left open as a belvedere overlook that propels a spectacular view into the Arno valley below.

A third clue is the interaction of the main house with the other dominant object of the garden: a monumental bay laurel hedge planted and trimmed to appear like a fragment of a Roman amphitheater. (Image Four) A plan view of the garden reinforces the reciprocal reversal of meaning, because the hedge amphitheater looks more like an architectural element than the actual main house, which is a simple rectangular block. Returning for a moment to Webster’s definition of reciprocity, what the two different elements are “agreeing to do for each other” is to frame and define a formal water garden between them. It is a space made on one side by a building that simulates a hedge and on the other by a hedge that simulates a historical building fragment, an amphitheater. This pattern of reciprocal operations and reversals in meaning repeats throughout the garden.

When all of these elements are taken together, one realizes that the Villa Gamberaia is a city fragment, where the narrow garden alleys and the bowling green are metaphorical streets and avenues with plants shaped into living facades and building facades that stand in for urban palaces.

Essentially, two places are produced in the same garden. One, in and of the city. The other, outside the city and in a pleasure garden. By traveling outside of Florence to enter a hillside garden, the observer discovers they have been conceptually re-inserted into a city. The concepts and ideals that shift the observer’s interpretation of the environment unfold within a garden that is also exquisitely beautiful and flawlessly integrated into the surrounding landscape.

Contemporary Reciprocity: The Nasher Sculpture Center, Dallas, Texas

Renzo Piano, architect for the Nasher Sculpture Center (Nasher), referred to the design as a contemporary “ruin” that nature has reclaimed as a garden. Where the Villa Gamberaia demonstrates reciprocity using a classical nomenclature of Roman amphitheaters and axial alignments, the Nasher uses a modern, repeating system of parallel alignments of lines and dots that are reciprocally realized as walls, hedges, columns, and trees.

When viewed in plan, the dominant quality of the overall arrangement is its parallel lines, which are variously the interior walls of the museum, the exterior perimeter walls of the sculpture garden, or freestanding hedges that act as spatial dividers and partitions within the overall garden room. Rows of live oak trees (Quercus virginiana) stand parallel with the walls and hedges. These point-lines reciprocally extend the building walls from inside the museum into the sculpture garden, even as they are simultaneously transformed into landscape points that become the live oak rows and a cedar elm orchard.

To heighten interest, some of the point-lines are shifted out of alignment with the building walls, both to adjust for pathways and to involve the observer’s imagination in correcting the misalignment with the mind’s eye. Lines of street trees that lie outside the containment walls of the garden seem typical when viewed as a streetscape. When seen from within the garden, however, and in comparison with other garden elements, they read like more rows of the parallel trees and hedges within the garden, multiplied onto the street edges.

In addition to being a place that was exquisitely conceived and impeccably maintained, the Nasher is a textbook case illustrating that the elements of a building can be seen as reciprocally continuous with the elements of a garden landscape.

The net effect of reciprocal design is the work of the mind: inside can become outside, building turns into landscape, and a wall becomes a hedge or a line of trees. Taken along with the splashing fountains, shadow patterns on the flawless turf, and the unparalleled quality of the sculpture collection, the reciprocal operations heighten curiosity and enlarge any visit to the center.

Reciprocity isn’t the only device available to mend and restructure the diffuse pattern of the suburban megacity. Another tool, an urban application of landscape and building reciprocity as an “architecture of trees” and a potential mending fabric for the fragmentary and misshapen spaces of the contemporary city, was advanced in the late 20th-century writings of Colin Rowe.


Colin Rowe (1920–1999) was an architectural historian, theoretician, and professor of architecture at Cornell University who exerted a significant intellectual influence on world architecture and urbanism in the second half of the twentieth century. His writings and influence revivified the urban design tactics and lessons of the great canonical cities of western civilization such as Rome, Florence, Paris, and London.

As a graphic tool to convey and explore patterns of urban space and form, Rowe and his colleagues and followers frequently relied on a particular drawing convention known as figure / ground, which was both a graphic device and an intellectual summary of an architectural worldview. The highly reductive black-and-white abstractions suited their theoretical interests because the stark contrast intensified the edge and boundary condition between buildings and the voids formed between them. The conclusion of Rowe’s hypothesis is that cities are essentially building solids and the voids between them. In the same way that architectural space is the reality of a building, to paraphrase Frank Lloyd Wright, cities can be reduced to the same essential condition: cities are voids that are deliberately shaped by buildings.

Rowe’s erudite speculations and the figure / ground drawings that represented them influenced world-renowned architects such as James Stirling, Michael Graves, Leon Krier, Rob Krier (Leon’s brother), Alan Chimacoff, Michael Dennis, Fred Koetter, and others, and exerted a revolutionary influence on the curricula of architecture, planning, and landscape programs at Cornell, Syracuse, the University of Virginia, the University of Maryland, and on individuals within the Harvard GSD. Yet the drawing technique also carried with it the effect of editing out any role for nature, landscape, or the circumstantial interference of topography and geography in city form. All cities could be reduced to black-and-white diagrams of solids and voids. Cities that could not be mapped by figure / ground were dismissed in Rowe’s hypothesis as irrelevant, or as anti-cities.

While an entire school of thought formed around the figure / ground-driven view of the “city of (architectural) space,” the same group of academics and practitioners may have overlooked another important lesson that also originated from Rowe’s writing—one that may be an even more provocative offering that could benefit the crisis of the suburban megacity.

While his interests were principally aligned with European planning models, doubts about their relevance and applicability to the diffuse patterns of the suburban metropolis were already unfolding in the American city of the mid-twentieth century. Rowe’s skepticism about the universal relevance of European cities may have been a by-product of his early teaching years at UT-Austin and the expansive Texas landscape he encountered there. He offered the following speculation in an essay he wrote for “The Present Urban Predicament”:

“I would simply like to suggest that the garden may be regarded as both a model of the city; and that the architecture of trees either articulating as parterres as one of these cases or, amplifying a particular condition as in the other, might well provide some kind of palliative for the contemporary predicament and even some kind of paradigm for the future.”

In the same way that Rowe revivified principles of the European city, which remain applicable to dense nodes, downtown centers, and American cities that have grown densely around an originating colonial center, the notion of an “architecture of trees,” and the idea of the garden as a “palliative” and mending fabric for the sprawling, diffuse contemporary city, is an invitation for current generations to extend Rowe’s line of design inquiry and research.

Two projects by Kevin Sloan Studio (of which I am principal and founder), one built and the other unrealized, are case studies that explore an “architecture of trees” and the cohesion it could bring to diffuse building and landscape formations.

Case Study One: An Architecture of Trees at the Sprint World Headquarters Campus

The Sprint World Headquarters Campus in Overland Park, in suburban Kansas City, is an essay on Colin Rowe’s hypothesis of “an architecture of trees.” On 212 acres of former agricultural land, the Kansas City-based Sprint telecommunications company co-located some 13,000 employees within a new campus formation of 21 buildings. While the building design favored a historicist notion of an academic campus in retro brick, the planning idea for the mixed-use corporate center produced seven garden quadrangles that were intended to be a spatial, social, and organizational armature for the entire project.

During the master planning process, the physical size of the quadrangles and the building arrangements that formed them were heavily influenced by an interior space-planning strategy driven by the area needed for a mid-level Sprint executive to supervise their particular group on one continuous floor. Consequently, the typical floor sizes for the office buildings at the Sprint Campus are unusually large—typically 50,000 square feet per floor, and up to 100,000 square feet for exceptionally large corporate divisions.

As a result, the spaces between the buildings were also unusually large and unwieldy for fostering the kind of social interaction between employees that was imagined by the co-location strategy and master plan. The idea to insert an architecture of trees into the seven voids of the quadrangles arose both as a theoretical exploration and one that would also be useful in re-scaling the quadrangles into multiple spaces that would individually be more humane in proportion.

Once within the network of quadrangles, the architecture of trees creates an enveloping effect that rescales the open areas in some places and, in others, removes the buildings from perception entirely. Much like the reciprocal metaphors at the Villa Gamberaia, after passing through the highly densified building formations and entering the quads, one is suddenly presented with a landscape world without any visual perception of a building. In addition to abstracting notions from the Villa Gamberaia, in other situations we used modern notions of transforming arcade and column formations into tree groves and fountain structures.

In reversing the perceptual reality of the Sprint Campus from the buildings to the landscape in the seven quads, one is invited to imagine removing the buildings to leave only the trees, earth forms, and fountains as the architectural reality of the campus.

Case Study Two: A Pecan Farm becomes a City of Trees

This project began as an assignment to lay out the orchards of a pecan farm and support buildings on four square miles of river bottomland along the Neosho River in southeast Kansas. In lieu of a purely agricultural layout, the expansive fabric of trees was re-imagined as a “City of Trees,” extending Rowe’s hypothesis of an “Architecture of Trees.”

To originate the abstracted city form in pecan trees, we used the pattern of an ideal city conceived by the 1st-century Roman architect Vitruvius and multiplied it into an array. The scale of the pattern was determined by two conditions: 1) the ideal spacing of pecan trees for agricultural production, which was 2) multiplied vis-à-vis the Vitruvian pattern across the area of the entire site.

The insertion of the pattern onto the site forced the ideal geometry and the circumstantial form of the river and its attendant cottonwoods to interfere with and modify the design. The project remains unrealized, as the landowner reconsidered the economic potential of hydraulic fracking over pecans.

Reciprocity in Drawing as a Design Tool

Michael Graves (1934–2015) was an American architect who revolutionized modern architecture by repositioning history into contemporary building designs. In addition to his prodigious architectural production and product designs that included teapots, silverware, and other household items, Graves was an accomplished painter and artist. Drawing assumed an essential role in his architectural production, and a particular kind of drawing he referred to as “referential” exploited ambiguities of drawn notations that could be reciprocally interpreted as either a building or a landscape element.

Each of the drawing examples shown above represents different themes, organizational ideas, sets of principles, or even conversations between pieces and fragments that suggest a possible completion or interpretation. The key to these drawings is that the ambiguities remain deliberate, allowing the broadest potential for interpreting what part of the drawing might be the building element and what part the landscape element.

As a demonstration of applying reciprocity as an active part of a landscape or urban design process, Graves’ use of this particular kind of drawing convention may have no equal.

While Graves’ sketches are entirely from his hand, one can easily imagine extending the idea by taking the fragmentary characteristics of an existing site or suburban building arrangement and filling the spaces between with drawn notations that knit, organize, permute, and transform. By keeping these drawn insertions similarly ambiguous, the elements that knit and transform a fragmentation into a composition remain open to endless speculation: they could be additional buildings or landscape devices.

While Graves may have been definitive in his use of this particular drawing convention for design, much more can be done with it, especially in application to the vast problems and occasions of the suburban megacity.


In “Landscape and Memory,” author Simon Schama says that “landscape is the work of the mind.” This elegant and accurate remark clarifies that for landscape to be “landscape,” it must bear the imprint of the human hand, as distinct from nature. Returning to Robert Campbell’s statement that the entire surface of the earth is now being considered as “one continuous landscape,” by logical extension we can move to viewing the entire surface of the earth as touched, directly or indirectly, by the actions of people.

At the poetic level, this notion is compelling and opens up exciting new possibilities for planning, design, and the nature of cities. And at a prosaic level, the statement is less poetry than potential fact, given the threats to the environment that are accumulating from the unmanaged actions of humans.

What is hopeful is not density, but rather how design, as a productive and beneficial human activity, could make incremental progress in reversing and transforming the malevolent nature of current building and planning paradigms into a synthesis of building with nature. Indeed, as Campbell concludes, it is potentially a profound new territory for landscape architecture to explore.

Kevin Sloan
Dallas-Fort Worth

On The Nature of Cities

About the Writer:
Kevin Sloan

Kevin Sloan, ASLA, RLA is a landscape architect, writer and professor. The work of his professional practice, Kevin Sloan Studio in Dallas, Texas, has been nationally and internationally recognized.
