Calculate Mean Site-Level Trait Distances Across Time Slices

This function computes mean trait distances between co-occurring species at individual sites across time slices. Within each time slice, it first calculates, for every species at every site, the mean distance to the species it co-occurs with, then aggregates these species-site values into a time slice-level mean and variance.

Usage

clade_site_distance(
  df.TS.TE,
  df.occ,
  time.slice,
  dist.trait,
  nearest.taxon,
  group = NULL,
  group.focal.compare = NULL,
  type.comparison = NULL,
  trait = NULL,
  round.digits = 1,
  species = "species",
  TS = "TS",
  TE = "TE",
  Max.age = "Max.age",
  Min.age = "Min.age",
  site = "site"
)

Arguments

df.TS.TE

A data frame containing species temporal and trait data with at least four columns: species names, origination times (TS), extinction times (TE), and trait values. Additional columns may include group assignments.

df.occ

A data frame containing fossil occurrence records with at least four columns: species names, minimum age, maximum age, and site location ID. Each row represents a single occurrence record at a specific site.

time.slice

Numeric. The time interval (in the same units as TS and TE) between consecutive time slices for temporal binning.

dist.trait

A distance matrix object (class dist or matrix) containing pairwise trait distances between species. Row and column names must match species names in df.TS.TE. If NULL, distances will be computed from the trait column using Euclidean distance.
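
As a minimal sketch of how a compatible dist.trait object could be prepared by hand, the snippet below builds a Euclidean distance object from a single trait column with stats::dist(). The data frame and the object names (df_temporal, dist_trait) are illustrative assumptions, not part of the package.

```r
# Illustrative species table (hypothetical values)
df_temporal <- data.frame(
  species = c("sp1", "sp2", "sp3", "sp4"),
  trait = c(1.2, 2.5, 3.1, 4.0),
  stringsAsFactors = FALSE
)

# Name the trait vector by species so the labels carry over to the dist object
trait_values <- setNames(df_temporal$trait, df_temporal$species)
dist_trait <- dist(trait_values, method = "euclidean")

# Check that every label in the distance object matches a species name
stopifnot(all(labels(dist_trait) %in% df_temporal$species))

# A matrix form is also possible, with species as row and column names
dist_matrix <- as.matrix(dist_trait)
```

Either dist_trait or dist_matrix could then be supplied as dist.trait, provided the labels match the species column of df.TS.TE.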

nearest.taxon

Numeric or character. The number of nearest neighbors to consider when calculating mean distances. Use 1 for mean nearest neighbor distance (MNND), or "all" for mean pairwise distance (MPD).
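
To illustrate the difference between the two settings, the sketch below computes per-species distances from a small distance matrix. The helper mean_neighbor_distance() is hypothetical and written only for this illustration; it is not the package's internal code.

```r
# Illustrative trait values for four hypothetical co-occurring species
traits <- c(sp1 = 1.2, sp2 = 2.5, sp3 = 3.1, sp4 = 4.0)
d <- as.matrix(dist(traits))  # Euclidean pairwise distances

# Hypothetical helper: mean distance from each species to its k nearest
# neighbours; k = "all" averages over every other species
mean_neighbor_distance <- function(d, k = "all") {
  sapply(rownames(d), function(sp) {
    others <- sort(d[sp, colnames(d) != sp])  # drop the self-distance
    if (!identical(k, "all")) others <- head(others, k)
    mean(others)
  })
}

mnnd_per_species <- mean_neighbor_distance(d, k = 1)      # nearest neighbour only
mpd_per_species  <- mean_neighbor_distance(d, k = "all")  # all pairwise distances
```

With k = 1 each species contributes only the distance to its closest neighbour, so the values are never larger than the corresponding pairwise means.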

group

Character. The name of the column in df.TS.TE containing group assignments for species (e.g., clade, family). Required if using group.focal.compare. Default is NULL.

group.focal.compare

Character vector of length 2. The first element specifies the focal group and the second specifies the comparison group. If NULL (default), distances are calculated across all species regardless of group membership.

type.comparison

Character. Specifies the type of distance comparison:

"between": Calculate distances only between species from the focal and comparison groups.
"within": Calculate distances only among species within the focal group.
NULL (default): Calculate distances among all species together.
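
The two comparison types can be pictured as different subsets of the same distance matrix. The following sketch assumes a named vector of group labels (groups) and focal group "A" with comparison group "B"; all object names are illustrative and the subsetting is one plausible reading of the options, not the package's internal code.

```r
traits <- c(sp1 = 1.2, sp2 = 2.5, sp3 = 3.1, sp4 = 4.0)
groups <- c(sp1 = "A", sp2 = "A", sp3 = "B", sp4 = "B")
d <- as.matrix(dist(traits))

focal   <- names(groups)[groups == "A"]  # first element of group.focal.compare
compare <- names(groups)[groups == "B"]  # second element of group.focal.compare

# "between": focal species in rows, comparison species in columns
d_between <- d[focal, compare, drop = FALSE]
mean_between <- rowMeans(d_between)

# "within": focal species against each other, self-distances masked out
d_within <- d[focal, focal, drop = FALSE]
diag(d_within) <- NA
mean_within <- rowMeans(d_within, na.rm = TRUE)
```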

trait

Character. The name of the column in df.TS.TE containing trait values. If NULL (default), dist.trait must be provided.

round.digits

Integer. The number of decimal places to round time slice values. Default is 1. This affects temporal binning precision.

species

Character. The name of the column in df.TS.TE and df.occ containing species identifiers. Default is "species".

TS

Character. The name of the column in df.TS.TE containing origination (first appearance) times for each species. Default is "TS".

TE

Character. The name of the column in df.TS.TE containing extinction (last appearance) times for each species. Default is "TE".

Max.age

Character. The name of the column in df.occ containing the maximum (oldest) age estimate for each occurrence record. Default is "Max.age".

Min.age

Character. The name of the column in df.occ containing the minimum (youngest) age estimate for each occurrence record. Default is "Min.age".

site

Character. The name of the column in df.occ containing site location identifiers. Default is "site".

remove.singletons

Logical. Should singleton species (species occurring alone at a site, with no co-occurring species) be excluded from the mean and variance calculations? Default is TRUE. Singletons are assigned a distance of 0 and flagged; when TRUE, these flagged values are left out of the time slice-level aggregation.

Value

A data frame with three columns:

mean.distance: Numeric. The mean trait distance across all species-site combinations within each time slice. It is calculated by first computing the mean distance from each species to its co-occurring species at each site, then averaging these values across all species in the time slice. Returns NA when no species occur in the time slice and NA_singleton when only one species occurs.
var.distance: Numeric. The variance of trait distances across all species-site combinations within each time slice.
time.slice: Numeric. The time point representing each slice, typically the upper (older) boundary of the time bin.

Details

The function performs the following steps:

Creates time slices from maximum TS to minimum TE
Generates regional co-occurrence matrices using aux_matrix_regional_coex()
Determines which species occur at which sites in each time slice
Creates site-based co-occurrence matrices using comp_site_cooccurr()
For each species at each site, calculates mean distances to co-occurring species
Aggregates individual species distances to obtain time slice-level mean and variance
Optionally filters by group membership
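
The first steps can be sketched in base R as follows. This is only an approximation under stated assumptions: slices are assumed to run from the oldest TS to the youngest TE in steps of time.slice, and an occurrence is assumed to belong to a slice when its age range spans the slice age. The package's helper functions are not called here, and the object names are illustrative.

```r
df_temporal <- data.frame(
  species = c("sp1", "sp2", "sp3", "sp4"),
  TS = c(100, 95, 90, 85),
  TE = c(50, 45, 40, 35),
  stringsAsFactors = FALSE
)
df_occurrences <- data.frame(
  species = c("sp1", "sp1", "sp2", "sp3", "sp4", "sp4"),
  Max.age = c(100, 95, 95, 90, 85, 85),
  Min.age = c(90, 85, 85, 80, 75, 75),
  site = c("site1", "site2", "site1", "site1", "site2", "site3"),
  stringsAsFactors = FALSE
)
time.slice <- 10
round.digits <- 1

# Slices from the oldest origination towards the youngest extinction
slices <- round(
  seq(from = max(df_temporal$TS), to = min(df_temporal$TE), by = -time.slice),
  digits = round.digits
)

# Assumed rule: keep occurrences whose age range contains the slice age,
# then list the species recorded at each site
site_composition <- lapply(slices, function(t) {
  occ <- df_occurrences[df_occurrences$Max.age >= t & df_occurrences$Min.age <= t, ]
  split(occ$species, occ$site)
})
names(site_composition) <- slices
```

Each element of site_composition is a list of sites with the species recorded there, which is the information a site-based co-occurrence matrix would encode.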

Distance calculation hierarchy:

Species-site level: For each species at each site, calculate mean distance to co-occurring species (based on nearest.taxon)
Time slice level: Average all species-site distances within each time slice to obtain overall mean and variance
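
A small worked sketch of this two-level hierarchy, for a single time slice with a hypothetical site composition (site_species) and nearest.taxon = "all". The use of the sample variance from stats::var() is an assumption about how the variance is computed.

```r
traits <- c(sp1 = 1.2, sp2 = 2.5, sp3 = 3.1, sp4 = 4.0)
d <- as.matrix(dist(traits))

# Hypothetical composition of two sites within one time slice
site_species <- list(
  site1 = c("sp1", "sp2", "sp3"),
  site2 = c("sp1", "sp4")
)

# Species-site level: mean distance from each species to the others at its site
species_site <- unlist(lapply(site_species, function(spp) {
  sapply(spp, function(sp) mean(d[sp, setdiff(spp, sp)]))
}))

# Time slice level: aggregate all species-site values
slice_mean <- mean(species_site)
slice_var  <- var(species_site)  # sample variance (assumed)
```

Note that sp1 contributes twice, once per site, because the unit of aggregation is the species-site combination rather than the species.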

Special cases:

Singleton species: Species with no co-occurring taxa at a site are assigned distance = 0 and flagged (treatment depends on remove.singletons)
Missing data: Time slices with no occurrences return NA
Group comparisons: When using group.focal.compare, only distances between/within specified groups are computed
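
The effect of singleton handling on the aggregated values can be sketched as below. The table of species-site distances and the helper aggregate_slice() are hypothetical; they only illustrate why zero-distance singletons matter for the time slice mean.

```r
# Hypothetical species-site distances for one time slice;
# sp4 occurs alone at site3, so its distance is 0 and it is flagged
species_site <- data.frame(
  species = c("sp1", "sp2", "sp3", "sp4"),
  site = c("site1", "site1", "site1", "site3"),
  distance = c(1.6, 0.95, 1.25, 0),
  singleton = c(FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical aggregation helper mirroring the remove.singletons option
aggregate_slice <- function(x, remove.singletons = TRUE) {
  if (remove.singletons) x <- x[!x$singleton, ]
  c(mean.distance = mean(x$distance), var.distance = var(x$distance))
}

with_singletons    <- aggregate_slice(species_site, remove.singletons = FALSE)
without_singletons <- aggregate_slice(species_site, remove.singletons = TRUE)
```

Keeping the singleton pulls the mean towards zero and inflates the variance, which is the reason for excluding flagged values by default.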

This function differs from IndivSpec_site_distance() by aggregating to the time slice level rather than returning individual species-level results. For regional-scale (non-site-based) distances, see clade_regional_distance().

Examples

if (FALSE) { # \dontrun{
# Create example fossil data with traits
df_temporal <- data.frame(
  species = c("sp1", "sp2", "sp3", "sp4"),
  TS = c(100, 95, 90, 85),
  TE = c(50, 45, 40, 35),
  trait = c(1.2, 2.5, 3.1, 4.0),
  group = c("A", "A", "B", "B")
)

df_occurrences <- data.frame(
  species = c("sp1", "sp1", "sp2", "sp3", "sp4", "sp4"),
  Max.age = c(100, 95, 95, 90, 85, 85),
  Min.age = c(90, 85, 85, 80, 75, 75),
  site = c("site1", "site2", "site1", "site1", "site2", "site3")
)

# Calculate mean site-level MPD through time
result_mpd <- clade_site_distance(
  df.TS.TE = df_temporal,
  df.occ = df_occurrences,
  time.slice = 10,
  dist.trait = NULL,
  nearest.taxon = "all",
  trait = "trait"
)

# View results
head(result_mpd)

# Plot mean distance through time
plot(result_mpd$time.slice,
     result_mpd$mean.distance,
     type = "l",
     xlab = "Time (Ma)",
     ylab = "Mean Site-Level Trait Distance",
     main = "Mean Trait Distance at Sites Through Time")

# Add a band of +/- 1 standard deviation (square root of the variance)
polygon(c(result_mpd$time.slice, rev(result_mpd$time.slice)),
        c(result_mpd$mean.distance - sqrt(result_mpd$var.distance),
          rev(result_mpd$mean.distance + sqrt(result_mpd$var.distance))),
        col = rgb(0, 0, 1, 0.2), border = NA)

# Calculate MNND at sites
result_mnnd <- clade_site_distance(
  df.TS.TE = df_temporal,
  df.occ = df_occurrences,
  time.slice = 10,
  dist.trait = NULL,
  nearest.taxon = 1,
  trait = "trait"
)

# Calculate distances between groups at sites
result_between <- clade_site_distance(
  df.TS.TE = df_temporal,
  df.occ = df_occurrences,
  time.slice = 10,
  dist.trait = NULL,
  nearest.taxon = "all",
  trait = "trait",
  group = "group",
  group.focal.compare = c("A", "B"),
  type.comparison = "between"
)

# Calculate distances within a single group at sites
result_within <- clade_site_distance(
  df.TS.TE = df_temporal,
  df.occ = df_occurrences,
  time.slice = 10,
  dist.trait = NULL,
  nearest.taxon = "all",
  trait = "trait",
  group = "group",
  group.focal.compare = c("A", "B"),
  type.comparison = "within",
  remove.singletons = TRUE
)
} # }