INQUA Working Group on Data-Handling Methods

Newsletter 9: January 1993

SLOTDEEP.EXE: MANUAL CORRELATION USING THE DISSIMILARITY MATRIX

Louis J. Maher

At the end of a paleoecological investigation, the results have to be correlated with other sites to see how the new information ties in with the old. The methods by which correlation is done are extremely varied depending on which part of the geologic column is involved and what materials were studied. Those working in Holocene sediments may not have extinctions or first occurrences to help them, but pollen and diatoms do occur in prodigious numbers, and Carbon-14 is abundant throughout.

The standard pollen zonation schemes of Europe and North America emphasize that the sequential changes in taxon composition in a single core are mirrored in other cores of the region. Before the middle of this century, zone boundaries in cores "dated" the sediment as surely as Carbon-14 does now. Pollen can still provide chronologies for material unsuited to carbon analysis.

Anyone can recognize a sequence of pollen zones when they are pointed out on a diagram, but almost no two workers will agree exactly where one zone ends and another begins. This shows up in terms like "transition zone", "sub-zone", and "telescoped zones" as well as "veteran researcher" and "neophyte". This sort of problem can be explored by numerical analysis with the computer. Where can a sequence be divided so as to yield two adjacent parts that are most alike within themselves and most unlike each other? The zone boundaries are the datable entities, and this explains the efforts to define them.
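
The question of where to divide a sequence can be made concrete with a small sketch. It assumes each sample is a list of taxon proportions and that "alike" is measured by the within-group sum of squared deviations from the group mean. That is one common choice, not necessarily the criterion used by any program named in this article. The function names and the toy core are hypothetical.

```python
def within_ss(samples):
    """Sum of squared deviations of each sample from the group mean."""
    n = len(samples)
    n_taxa = len(samples[0])
    means = [sum(s[k] for s in samples) / n for k in range(n_taxa)]
    return sum((s[k] - means[k]) ** 2 for s in samples for k in range(n_taxa))


def best_split(samples):
    """Return (index, cost); samples[:index] and samples[index:] are the two parts."""
    best = None
    for i in range(1, len(samples)):
        cost = within_ss(samples[:i]) + within_ss(samples[i:])
        if best is None or cost < best[1]:
            best = (i, cost)
    return best


# Toy core (hypothetical): the first three samples are dominated by taxon 1,
# the last three by taxon 2.
core = [
    [0.70, 0.20, 0.10],
    [0.68, 0.22, 0.10],
    [0.72, 0.18, 0.10],
    [0.20, 0.65, 0.15],
    [0.22, 0.63, 0.15],
    [0.18, 0.67, 0.15],
]
index, cost = best_split(core)
print(index, round(cost, 4))
```

With real counts the minimum is usually much less sharp than in this toy core, which is one way to see why workers disagree about where a boundary falls.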

I was first introduced to the idea of using numerical analysis to correlate two sites by "slotting them together" from the work of John Birks (1979) and Alan Gordon (1973, 1980). The ambiguous results I obtained with their FORTRAN program for sequence slotting, SLOTSEQ (Birks, 1979, Appendix 2), led me to develop SLOTSEE, a QuickBASIC program with graphics which shows the user the original diagrams, the slotted result, and a "map" of the dissimilarity matrix the computer algorithm uses to do the slotting. I needed to "see" the data to try to understand what the algorithm was doing, and why it sometimes produced strange results. It was through SLOTSEE that I first met (via e-mail!) Malcolm Clark, who told me how he was addressing the problem of "blocking" (slotting where long sequences from one core are inserted between long sequences of the other) and developing the H-matrix concept. As a result of that contact, I asked him to do two articles for the newsletter (Clark, 1992, and p. 5-10 of this issue) to discuss the problems and their solutions.

I have come to realize there is a paradox involved when correlating by pollen zonation and correlating by slotting. Pollen zone boundaries are potential points of correlation; pollen zones often prevent effective slotting. Malcolm Clark (see p. 7) suggests the slotted route's best "combined path length" (CPL) through a dissimilarity matrix is analogous to a river running in a valley; the slotting path is sure when the valley is deep, and ambiguous when the valley widens to form a lake. In these terms, we can think of pollen zones as matrix lakes. The pollen zone, by definition, is a sediment sequence where the taxon abundances assume characteristic relative values that remain stable for a time. Almost any route through the zone (lake) would yield a very similar CPL. The actual CPL that is shortest might result more from minor fluctuations owing to random counting error than from contemporary vegetation change in the landscape.
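
The idea of a shortest combined path can be sketched in a few lines. The sketch assumes that the CPL is the sum of dissimilarities between neighboring samples in the merged sequence, and it finds the merge with the smallest total by dynamic programming while keeping each core in its own stratigraphic order. It is an illustration only, not a transcription of SLOTSEQ, whose cost function and constraints may differ. The names `slot` and `manhattan` and the toy data are hypothetical.

```python
def manhattan(p, q):
    return sum(abs(x - y) for x, y in zip(p, q))


def slot(a, b, dist=manhattan):
    """Merge non-empty sequences a and b, each kept in order, so that the
    summed dissimilarity between neighbors in the merged sequence is minimal.
    Returns (total, order), where order lists ("A", i) or ("B", j) labels."""
    n, m = len(a), len(b)
    cost, back = {}, {}

    def last_item(i, j, t):
        # t == 0: the last placed sample came from a; t == 1: from b.
        return a[i - 1] if t == 0 else b[j - 1]

    for i in range(n + 1):
        for j in range(m + 1):
            for t in (0, 1):
                if (t == 0 and i == 0) or (t == 1 and j == 0):
                    continue  # no sample of that core has been placed yet
                pi, pj = (i - 1, j) if t == 0 else (i, j - 1)
                if pi == 0 and pj == 0:
                    cost[i, j, t], back[i, j, t] = 0.0, None  # first sample
                    continue
                best, best_prev = float("inf"), None
                for pt in (0, 1):
                    if (pi, pj, pt) in cost:
                        step = dist(last_item(pi, pj, pt), last_item(i, j, t))
                        if cost[pi, pj, pt] + step < best:
                            best, best_prev = cost[pi, pj, pt] + step, (pi, pj, pt)
                cost[i, j, t], back[i, j, t] = best, best_prev

    state = min((k for k in ((n, m, 0), (n, m, 1)) if k in cost), key=cost.get)
    total, order = cost[state], []
    while state is not None:
        i, j, t = state
        order.append(("A", i - 1) if t == 0 else ("B", j - 1))
        state = back[state]
    order.reverse()
    return total, order


# Toy one-taxon "cores" (hypothetical values).
a = [(0.0,), (2.0,), (4.0,)]
b = [(1.0,), (3.0,)]
total, order = slot(a, b)
print(total, order)
```

If a stretch of samples is nearly identical in both cores, many merge orders give almost the same total, which is the lake described above.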

One of the problems with SLOTSEQ and SLOTSEE is that they rank the sediment sample sequence from 1 to n, but they discard the actual depth spacing of the samples in the sediment. It takes time to accumulate sediment, and--lacking independent information to the contrary--samples spaced farther apart should differ more in age than those situated close together. Surely that kind of information is too valuable to ignore. The silhouette diagrams drawn on the screen in SLOTSEE may look like pollen diagrams, but they are quite schematic; the samples are spaced evenly from top to bottom. That they look like pollen diagrams simply reflects the fact that most of us choose a sample interval and stick with it. The matrix map produced by SLOTSEE plots the samples from two sites in correct sequential order, but it also ignores their actual depths and spacing. SLOTSEE does not even read the sample depths from the sites' data files.

I tried to remedy this situation by developing SLOTDEEP. This program is a superset of SLOTSEE. The user has the choice of the same three measures of dissimilarity (Manhattan Metric, Chord Distance, or 1-Spearman Rank). It contains the same Gordon algorithm (Birks, 1979), and it calculates the same results in the automatic mode. However, SLOTDEEP retains information about the sample stratigraphy and plots the pollen samples at their correct depth rather than merely in serial order--hence the title SLOTDEEP. The dissimilarity matrix also retains the samples' depth separation.
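
For readers unfamiliar with the three measures, the sketch below gives textbook-style definitions for two samples expressed as taxon proportions. The exact formulas coded in SLOTSEE and SLOTDEEP are not given in this article, so these definitions (in particular the square-root form of the chord distance) are assumptions, and the function names are hypothetical.

```python
import math


def manhattan(p, q):
    """Sum of absolute differences between taxon proportions."""
    return sum(abs(x - y) for x, y in zip(p, q))


def chord_distance(p, q):
    """Euclidean distance between square-root-transformed proportions."""
    return math.sqrt(sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(p, q)))


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def one_minus_spearman(p, q):
    """1 minus the Spearman rank correlation (undefined if all ranks are tied)."""
    rp, rq = ranks(p), ranks(q)
    n = len(p)
    mp, mq = sum(rp) / n, sum(rq) / n
    cov = sum((x - mp) * (y - mq) for x, y in zip(rp, rq))
    var_p = sum((x - mp) ** 2 for x in rp)
    var_q = sum((y - mq) ** 2 for y in rq)
    return 1.0 - cov / math.sqrt(var_p * var_q)


# Two hypothetical three-taxon samples.
spectrum_a = [0.5, 0.3, 0.2]
spectrum_b = [0.4, 0.4, 0.2]
print(manhattan(spectrum_a, spectrum_b))
print(chord_distance(spectrum_a, spectrum_b))
print(one_minus_spearman(spectrum_a, spectrum_b))
```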

One has the important option of using the dissimilarity matrix to correlate the two diagrams manually. The matrix map displays quantitatively the degree of dissimilarity--depending on the particular measure used--between each sample in one core with each sample in the other. The dissimilarity of the core tops is plotted at the "northwestern" corner of the matrix map, and the core bottoms plot at the "southeastern" corner. The values of low dissimilarity tend to plot in a northwest to southeast trend when comparable cores are graphed (Malcolm Clark's river valley). This presentation of the pollen data differs markedly from the diagrams we normally use in picking zone boundaries; one is less likely to be influenced by the biases often brought to that task.
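
A minimal sketch of such a matrix follows. It assumes the chord distance in its square-root form and keeps only the cells that fall under a chosen cutoff, so that the low-dissimilarity trend can be listed with its depth coordinates. The sample spectra, depths, cutoff, and function names are all hypothetical and are not taken from either site.

```python
import math


def chord_distance(p, q):
    return math.sqrt(sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(p, q)))


def dissimilarity_matrix(principal, subordinate):
    """matrix[i][j] compares principal sample i with subordinate sample j."""
    return [[chord_distance(p, s) for s in subordinate] for p in principal]


def points_below(matrix, principal_depths, subordinate_depths, cutoff):
    """Return (subordinate depth, principal depth, value) for cells under the cutoff."""
    return [
        (subordinate_depths[j], principal_depths[i], value)
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value < cutoff
    ]


# Hypothetical spectra, listed from core top to core bottom.
principal = [[0.8, 0.1, 0.1], [0.5, 0.3, 0.2], [0.1, 0.6, 0.3]]
principal_depths = [0, 16, 32]
subordinate = [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3]]
subordinate_depths = [50, 80]

matrix = dissimilarity_matrix(principal, subordinate)
points = points_below(matrix, principal_depths, subordinate_depths, cutoff=0.1)
for sub_depth, prin_depth, value in points:
    print(sub_depth, prin_depth, round(value, 3))
```

Because each point carries its two depths rather than its two sample numbers, unevenly spaced samples plot at their true stratigraphic positions.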

I will give an example of how SLOTDEEP may be used to correlate two pollen sites from eastern Wisconsin. The sites are separated by 18 km; the pollen counts were done 15 years apart by different analysts.

The Ernst Brothers Quarry Site (Maher, 1970) was sampled in 1965 from a section exposed in a sand pit. Stumps of Picea and Larix rooted in till and outwash gravel had been buried first under pond sediments and later by peat. The sequence extended from an arbitrary "floating zero datum" in the peat to a depth of 370 cm in the gravel. Pollen was recovered from 14 samples in the interval from 50 to 275 cm, extending from the peat to the soil of the lower stump layer. The quarry was excavated and closed by the early 1980's, but at least two carbon dates are available for wood from the lower stump layer: 12,410±100 BP (WIS-347) and 12,500±120 BP (ISGS-75).

The Radtke Lake Site (Webb, 1987) is based on an 835-cm core of lake mud. Seven carbon dates were obtained from the core. The lowest interval from 826-835 cm yielded an age of 11,460 ± 580 BP (GX7893), which was considered to represent 11,290 BP (Webb subtracted 170 years from each date in an attempt to correct for hard-water error).

The first pollen site loaded into SLOTDEEP is considered the subordinate site about which less is known; the better known principal site is loaded second. I will define the short Ernst Brothers Quarry sequence as the subordinate site; only its base is dated. Radtke Lake, with its long record, is the principal site. The pollen sum is composed of 17 anemophilous taxa. Once the *.DAT format data files are read and the chord distance is calculated, the following menu appears:

1. SHOW Diagrams of Original Sites
2. SHOW Correlation suggested by SLOTSEQ
3. SHOW Matrix Map
4.  ** CORRELATE MATRIX MANUALLY **
5.     * SHOW  MANUAL SLOTTING *
6.  SAVE SLOTSEQ results to PRINTER
7.  CHANGE Dissimilarity Coefficient
8.     CHANGE Screen Colors
9.     **CHANGE THE SITES**
E.     EXIT
                  Press 1 - 9, or E
Pressing 1 displays the original diagrams (Fig. 1). The subordinate site is plotted above the principal site. The ticks on the depth scale are in meters. Nine minor taxa are shown combined in the right column to save space, but all 17 taxa contribute to the Chord Distance. Pressing 2 or 3 (and 6 - E) produces essentially the same results as SLOTSEE.
Figure 1
But if you press 4 to manually correlate the sites using the matrix map, you will see the screen shown in Fig. 2. This represents the "exploded" matrix map, which shows, with correct stratigraphic spacing, all points with Chord Distance less than a stipulated value (here, 0.5). Radtke Lake uses the Y axis; the horizontal lines represent core depth in meters increasing from top to bottom. The subordinate Ernst Brothers site is shown on the X axis, and the vertical lines show its depth in meters, increasing from left to right. The cross-hair cursor in Fig. 2 is shown at a depth of 150 cm in Ernst and 400 cm in Radtke. The coordinates of the cursor can always be read at the Cursor Indicator at the bottom right of the screen. The cross-hair can be moved about the matrix with the arrow keys. A shifted arrow key increases the speed of the cursor. The "Home" key moves the cursor to the top samples at the upper left, and the "End" key moves it to the lowermost samples at the bottom right.
Figure 2
The Chord Distances are shown on the screen in spectral colors ranging from white and red (very low dissimilarity; that is, highly similar) through yellow, green, blue, and purple, which indicate progressively less similarity. One moves the cross-hair cursor to the colored points that are judged to correlate best and then presses the F1 key to "set" the point. A copy of the cross-hair is anchored at that position, and the "Point Indicator" at the lower left is incremented by one. When two or more points of correlation are established, a heavy yellow line can be fit to the selected points--either by linear segments (press the F5 key) or by a cubic spline (press the F6 key). Pressing the "S" key shows the route the SLOTSEQ algorithm has selected as best; this may be helpful in selecting the points for manual correlation. Fig. 3 shows the screen with the SLOTSEQ solution indicated by the thin line that steps from the upper left to the lower right.

After 10 points of correlation were manually selected with the F1 key, the "Home" key was pressed to move the cross-hair out of the way to the top position (50 cm in Ernst and 0 cm in Radtke). Pressing the F6 key then used a cubic spline function to connect the correlation points with a heavy line. The line of correlation follows SLOTSEQ's solution rather closely except at the upper and lower parts of the trend. The F5 key will fit the points with linear segments. (You can cycle between the F5 and the F6 keys; the last one you press before choosing the F10 or "Q" key to Quit and return to the menu will be the one the program uses.)
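
The linear-segment fit can be illustrated with a short sketch that converts a subordinate-site depth to a principal-site depth by interpolating between picked correlation points. The picked points below are hypothetical, and the handling of depths outside the picked range (extension of the nearest segment) is an assumption of this sketch, not a description of what SLOTDEEP does. A cubic spline fit would need a spline routine and is not shown.

```python
from bisect import bisect_right


def map_depth(points, depth):
    """Piecewise-linear conversion of a subordinate depth to a principal depth.

    points: at least two (subordinate_depth, principal_depth) pairs with
    strictly increasing subordinate depths.
    Depths outside the picked range follow the nearest segment (assumption).
    """
    xs = [x for x, _ in points]
    k = bisect_right(xs, depth) - 1
    k = max(0, min(k, len(points) - 2))  # clamp to a valid segment
    (x0, y0), (x1, y1) = points[k], points[k + 1]
    return y0 + (y1 - y0) * (depth - x0) / (x1 - x0)


# Hypothetical picks: (depth in subordinate core, depth in principal core), cm.
picks = [(50, 600), (150, 700), (250, 850)]
for d in (50, 100, 200, 275):
    print(d, map_depth(picks, d))
```

Linear segments pass exactly through every picked point but change slope abruptly there; a spline trades that for a smoother curve that can overshoot between widely spaced picks.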

Pressing 5 (Show Manual Slotting) will display the two pollen diagrams slotted together (Fig. 4), which allows you to judge the success of the correlation. The user has the option of saving the results to disk as an ASCII text file in which the two diagrams' depth sequences are plotted side by side, with the subordinate site's samples plotted in the depth units of the principal site. The subordinate site's original depths are used as labels, as suggested in the following abridged file generated from the correlation shown in Figures 3 and 4:

Results of SLOTDEEP.EXE   12-13-1992

Subordinate Site is Ernst Bros, Ozaukee Co, WI
Principal Site is Radtke Lake, WI (Sara Webb)
DC was Chord Distance   Fit was Spline

Subordinate Site's Depths are those correlated to the Principal Site.
  (Its actual depths are shown in brackets.)

Radtke        Ernst
 0
 16
 ...

 592
 608
         610       [ 50 ]
 624
         627.7     [ 80 ]
 637
 640
         641      [ 110 ]
 643
 656
 672
         673      [ 130 ]
 688
         690      [ 150 ]
 704
         706      [ 180 ]
 720
 736
         737      [ 200 ]
         766.3    [ 220 ]
 768
 800
         802      [ 240 ]
 816
         826.3    [ 260 ]
 832
         832      [ 265 ]
         838.4    [ 270 ]
         841.3    [ 272 ]
         846.1    [ 275 ]
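
The layout of the listing above can be reproduced with a short sketch: the principal site's depths and the subordinate site's correlated depths are merged into one depth-ordered column pair, with the subordinate's original depth in brackets. The depths used are a few taken from the listing; the function name and the column widths are assumptions, and this is not SLOTDEEP's own output routine.

```python
def merged_listing(principal_depths, subordinate):
    """Return text lines for a side-by-side listing.

    subordinate: (correlated_depth, original_depth) pairs.
    """
    rows = [(depth, None) for depth in principal_depths]
    rows += [(corr, orig) for corr, orig in subordinate]
    # Sort by depth; at equal depth the principal sample is listed first.
    rows.sort(key=lambda r: (r[0], r[1] is not None))
    lines = []
    for depth, orig in rows:
        if orig is None:
            lines.append(f" {depth:g}")
        else:
            lines.append(f"{'':9}{depth:<10g}[ {orig} ]")
    return lines


lines = merged_listing([592, 608, 624], [(610, 50), (627.7, 80)])
print("\n".join(lines))
```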

Figure 3

Figure 4
The user also has the option of making a work copy of the subordinate site's file wherein the samples' depths are converted to their equivalents in the principal site. Note that if the principal site's samples had been converted to their estimated ages by the use of DEP-AGE.EXE (Maher, 1992), this option would in effect convert the subordinate's depths into age as well.

SLOTDEEP.EXE requires a color monitor and is supplied in versions for either the EGA or VGA graphic screen. Both versions (in self-extracting files with some example data) are available for anonymous ftp from the /pub/inqua directory of ice.geology.wisc.edu. The VGA version is named SLOTDEPV to differentiate it from the EGA version; SLOTDEPV can be renamed SLOTDEEP when it is extracted.

References.

Birks, H.J.B. 1979. Numerical methods for the zonation and correlation of biostratigraphical data. 99-123 + Appendix 2, (15 p; the SLOTSEQ.FOR listing appears on 13-15 of Appendix 2). In Bjorn E. Berglund, Ed. Vol I. General Project Descriptions. Subproject B: Lake and Mire Environments. Project 158: Palaeo-hydrological Changes in the Temperate Zone in the Last 15,000 Years, International Geological Correlation Programme. Lund, Sweden. 143 pp + 2 Appendices.

Clark, Malcolm. 1992. Sequence comparisons and sequence-slotting. INQUA - Commission for the Study of the Holocene, Working Group on Data-Handling Methods Newsletter 8:3-6.

Gordon, A.D. 1980. SLOTSEQ: a FORTRAN IV program for comparing two sequences of observations. Computers and Geosciences 6, 7-20. [The 1980 version differs somewhat from the version listed in Birks (1979) that is used in SLOTSEE.EXE and SLOTDEEP.EXE.]

Gordon, A. D. 1973. A sequence-comparison statistic and algorithm. Biometrika 60, 197-200.

Maher, Louis J., Jr. 1992. Depth-age conversion of pollen data. INQUA - Commission for the Study of the Holocene, Working Group on Data-Handling Methods Newsletter 7:13-17.

Maher, Louis J., Jr. 1970. Two Creeks forest, Valders glaciation, and pollen grains, p. D-1 - D-8. In Black, R. F. et al., Pleistocene geology of Southern Wisconsin. Wisconsin Geological and Natural History Survey Information Circular No. 15, 175 p.

Webb, Sara L. 1987. Beech range extension and vegetation history: pollen stratigraphy of two Wisconsin lakes. Ecology, 68(6):1991-2005.


Copyright © 1993 Louis J. Maher