Calculate Mean Site-Level Trait Distances Across Time Slices
This function computes mean trait distances between co-occurring species at individual sites across time slices. For each time slice, it calculates each species' mean distance to the species it co-occurs with at each site, then aggregates these species-level distances into a time slice-level mean and variance.
Usage
clade_site_distance(
  df.TS.TE,
  df.occ,
  time.slice,
  dist.trait,
  nearest.taxon,
  group = NULL,
  group.focal.compare = NULL,
  type.comparison = NULL,
  trait = NULL,
  round.digits = 1,
  species = "species",
  TS = "TS",
  TE = "TE",
  Max.age = "Max.age",
  Min.age = "Min.age",
  site = "site"
)
Arguments
- df.TS.TE
A data frame containing species temporal and trait data with at least four columns: species names, origination times (TS), extinction times (TE), and trait values. Additional columns may include group assignments.
- df.occ
A data frame containing fossil occurrence records with at least four columns: species names, minimum age, maximum age, and site location ID. Each row represents a single occurrence record at a specific site.
- time.slice
Numeric. The time interval (in the same units as TS and TE) between consecutive time slices for temporal binning.
- dist.trait
A distance matrix object (class dist or matrix) containing pairwise trait distances between species. Row and column names must match species names in df.TS.TE. If NULL, distances will be computed from the trait column using Euclidean distance.
- nearest.taxon
Numeric or character. The number of nearest neighbors to consider when calculating mean distances. Use 1 for mean nearest neighbor distance (MNND), or "all" for mean pairwise distance (MPD).
- group
Character. The name of the column in df.TS.TE containing group assignments for species (e.g., clade, family). Required if using group.focal.compare. Default is NULL.
- group.focal.compare
Character vector of length 2. The first element specifies the focal group and the second specifies the comparison group. If NULL (default), distances are calculated across all species regardless of group membership.
- type.comparison
Character. Specifies the type of distance comparison:
"between": Calculate distances only between species from the focal and comparison groups.
"within": Calculate distances only among species within the focal group.
NULL (default): Calculate distances among all species together.
- trait
Character. The name of the column in df.TS.TE containing trait values. If NULL (default), dist.trait must be provided.
- round.digits
Integer. The number of decimal places to round time slice values. Default is 1. This affects temporal binning precision.
- species
Character. The name of the column in df.TS.TE and df.occ containing species identifiers. Default is "species".
- TS
Character. The name of the column in df.TS.TE containing origination (first appearance) times for each species. Default is "TS".
- TE
Character. The name of the column in df.TS.TE containing extinction (last appearance) times for each species. Default is "TE".
- Max.age
Character. The name of the column in df.occ containing the maximum (oldest) age estimate for each occurrence record. Default is "Max.age".
- Min.age
Character. The name of the column in df.occ containing the minimum (youngest) age estimate for each occurrence record. Default is "Min.age".
- site
Character. The name of the column in df.occ containing site location identifiers. Default is "site".
- remove.singletons
Logical. Should singleton species (species occurring alone at a site with no co-occurring species) be excluded from mean and variance calculations? Default is TRUE. When TRUE, singletons are assigned a distance of 0 but may be excluded from aggregation depending on implementation.
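When a precomputed dist.trait is preferred over the trait column, it can be built with base R. The following is a minimal sketch, assuming a single numeric trait; the object names (traits, trait_values, dist_trait, dist_matrix) are illustrative and not part of the package. It uses stats::dist(), whose default method is Euclidean; whether this reproduces the function's internal computation exactly is an assumption.

```r
# Illustrative trait table; species names must match those in df.TS.TE.
traits <- data.frame(
  species = c("sp1", "sp2", "sp3", "sp4"),
  trait = c(1.2, 2.5, 3.1, 4.0),
  stringsAsFactors = FALSE
)

# Name the trait values by species so the distances carry species labels.
trait_values <- setNames(traits$trait, traits$species)

# Euclidean distance is the default method of stats::dist().
dist_trait <- dist(trait_values)

# The same distances as a square matrix with species row and column names.
dist_matrix <- as.matrix(dist_trait)
```

Either object has the form the dist.trait argument describes (class dist or matrix, labelled by species).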
Value
A data frame with three columns:
- mean.distance
Numeric. The mean trait distance across all species-site combinations within each time slice. This is calculated by first computing the mean distance from each species to its co-occurring species at each site, then averaging these values across all species in the time slice. Returns NA when no species occur in the time slice and NA_singleton when only one species occurs.
- var.distance
Numeric. The variance of trait distances across all species-site combinations within each time slice.
- time.slice
Numeric. The time point representing each slice, typically the upper (older) boundary of the time bin.
Details
The function performs the following steps:
- Creates time slices from maximum TS to minimum TE
- Generates regional co-occurrence matrices using aux_matrix_regional_coex()
- Determines which species occur at which sites in each time slice
- Creates site-based co-occurrence matrices using comp_site_cooccurr()
- For each species at each site, calculates mean distances to co-occurring species
- Aggregates individual species distances to obtain time slice-level mean and variance
- Optionally filters by group membership
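The first steps can be sketched in base R. This is an illustration only, not the package's internal code: the inclusive comparison rules (TS >= slice >= TE for longevity, Max.age >= slice >= Min.age for occurrences) and all object names are assumptions, and the helpers aux_matrix_regional_coex() and comp_site_cooccurr() are not reproduced here.

```r
df_temporal <- data.frame(
  species = c("sp1", "sp2", "sp3", "sp4"),
  TS = c(100, 95, 90, 85),
  TE = c(50, 45, 40, 35),
  stringsAsFactors = FALSE
)
df_occ <- data.frame(
  species = c("sp1", "sp1", "sp2", "sp3", "sp4", "sp4"),
  Max.age = c(100, 95, 95, 90, 85, 85),
  Min.age = c(90, 85, 85, 80, 75, 75),
  site = c("site1", "site2", "site1", "site1", "site2", "site3"),
  stringsAsFactors = FALSE
)
time_slice <- 10
round_digits <- 1

# Time slices from the oldest origination down to the youngest extinction.
slices <- round(
  seq(from = max(df_temporal$TS), to = min(df_temporal$TE), by = -time_slice),
  digits = round_digits
)

# Species-by-slice presence from longevity (assumed rule: TS >= slice >= TE).
extant <- sapply(slices, function(s) df_temporal$TS >= s & df_temporal$TE <= s)
dimnames(extant) <- list(df_temporal$species, as.character(slices))

# Site-by-species presence for one slice
# (assumed rule: Max.age >= slice >= Min.age).
s <- 90
in_slice <- df_occ[df_occ$Max.age >= s & df_occ$Min.age <= s, ]
site_by_species <- table(in_slice$site, in_slice$species) > 0
```

In this toy data, the slice at 90 has three species at site1 and a single species at site2, which is the situation the singleton handling described under Special cases is meant to address.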
Distance calculation hierarchy:
- Species-site level: For each species at each site, calculate mean distance to co-occurring species (based on nearest.taxon)
- Time slice level: Average all species-site distances within each time slice to obtain overall mean and variance
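The two levels can be illustrated for a single site with three co-occurring species. The helper mean_dist_to_neighbours() below is hypothetical (it is not a function of the package), and using the sample variance from stats::var() for the slice-level variance is an assumption.

```r
# Pairwise Euclidean trait distances for three species at one site.
trait_values <- c(sp1 = 1.2, sp2 = 2.5, sp3 = 3.1)
d <- as.matrix(dist(trait_values))

# Hypothetical helper: mean distance from a focal species to its k nearest
# co-occurring species; k = "all" uses every co-occurring species.
mean_dist_to_neighbours <- function(focal, present, d, k = "all") {
  others <- setdiff(present, focal)
  if (length(others) == 0) return(NA_real_)
  dists <- sort(d[focal, others])
  if (!identical(k, "all")) dists <- head(dists, k)
  mean(dists)
}

site1 <- c("sp1", "sp2", "sp3")

# Species-site level: one value per species at the site.
mpd_site1 <- sapply(site1, mean_dist_to_neighbours,
                    present = site1, d = d, k = "all")
mnnd_site1 <- sapply(site1, mean_dist_to_neighbours,
                     present = site1, d = d, k = 1)

# Time slice level: mean and variance over all species-site values
# (only one site here, so the three values above).
slice_mean <- mean(mpd_site1)
slice_var <- var(mpd_site1)
```

With nearest.taxon = "all" each species is compared with every co-occurring species, whereas nearest.taxon = 1 keeps only the closest one, so the MNND-style values can never exceed the MPD-style values for the same species.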
Special cases:
- Singleton species: Species with no co-occurring taxa at a site are assigned distance = 0 and flagged (treatment depends on remove.singletons)
- Missing data: Time slices with no occurrences return NA
- Group comparisons: When using group.focal.compare, only distances between/within specified groups are computed
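The group comparisons and the effect of excluding singletons can be sketched as follows. This is a sketch under assumptions rather than the function's actual code: it assumes that "between" compares each focal-group species only with comparison-group species at the same site, that "within" compares it only with other focal-group species, and that flagged singletons are simply dropped before averaging. All object names are illustrative.

```r
trait_values <- c(sp1 = 1.2, sp2 = 2.5, sp3 = 3.1, sp4 = 4.0)
groups <- c(sp1 = "A", sp2 = "A", sp3 = "B", sp4 = "B")
d <- as.matrix(dist(trait_values))

# Species co-occurring at one site; focal group "A", comparison group "B".
present <- c("sp1", "sp2", "sp3")
focal <- present[groups[present] == "A"]
compare <- present[groups[present] == "B"]

# "between": focal species against comparison-group species only.
between_dist <- sapply(focal, function(sp) mean(d[sp, compare]))

# "within": focal species against the other focal-group species only.
within_dist <- sapply(focal, function(sp) {
  others <- setdiff(focal, sp)
  if (length(others) == 0) NA_real_ else mean(d[sp, others])
})

# Singletons: a species alone at a site gets distance 0 and a flag.
site_dist <- data.frame(
  species = c("sp1", "sp2", "sp3", "sp1"),
  site = c("site1", "site1", "site1", "site2"),
  distance = c(1.6, 0.95, 1.25, 0),
  singleton = c(FALSE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Keeping the zero pulls the slice mean down; dropping it does not.
mean_with_singletons <- mean(site_dist$distance)
mean_without_singletons <- mean(site_dist$distance[!site_dist$singleton])
```

The comparison of the two means shows why the treatment of singletons matters: a zero assigned to a lone species is not an observed trait distance, so including it lowers the slice-level mean.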
This function differs from IndivSpec_site_distance() by aggregating
to the time slice level rather than returning individual species-level results.
For regional-scale (non-site-based) distances, see clade_regional_distance().
Examples
if (FALSE) { # \dontrun{
# Create example fossil data with traits
df_temporal <- data.frame(
  species = c("sp1", "sp2", "sp3", "sp4"),
  TS = c(100, 95, 90, 85),
  TE = c(50, 45, 40, 35),
  trait = c(1.2, 2.5, 3.1, 4.0),
  group = c("A", "A", "B", "B")
)
df_occurrences <- data.frame(
  species = c("sp1", "sp1", "sp2", "sp3", "sp4", "sp4"),
  Max.age = c(100, 95, 95, 90, 85, 85),
  Min.age = c(90, 85, 85, 80, 75, 75),
  site = c("site1", "site2", "site1", "site1", "site2", "site3")
)
# Calculate mean site-level MPD through time
result_mpd <- clade_site_distance(
  df.TS.TE = df_temporal,
  df.occ = df_occurrences,
  time.slice = 10,
  dist.trait = NULL,
  nearest.taxon = "all",
  trait = "trait"
)
# View results
head(result_mpd)
# Plot mean distance through time
plot(result_mpd$time.slice,
     result_mpd$mean.distance,
     type = "l",
     xlab = "Time (Ma)",
     ylab = "Mean Site-Level Trait Distance",
     main = "Mean Trait Distance at Sites Through Time")
# Add a band of +/- one standard deviation (square root of the variance)
polygon(c(result_mpd$time.slice, rev(result_mpd$time.slice)),
        c(result_mpd$mean.distance - sqrt(result_mpd$var.distance),
          rev(result_mpd$mean.distance + sqrt(result_mpd$var.distance))),
        col = rgb(0, 0, 1, 0.2), border = NA)
# Calculate MNND at sites
result_mnnd <- clade_site_distance(
  df.TS.TE = df_temporal,
  df.occ = df_occurrences,
  time.slice = 10,
  dist.trait = NULL,
  nearest.taxon = 1,
  trait = "trait"
)
# Calculate distances between groups at sites
result_between <- clade_site_distance(
  df.TS.TE = df_temporal,
  df.occ = df_occurrences,
  time.slice = 10,
  dist.trait = NULL,
  nearest.taxon = "all",
  trait = "trait",
  group = "group",
  group.focal.compare = c("A", "B"),
  type.comparison = "between"
)
# Calculate distances within a single group at sites
result_within <- clade_site_distance(
  df.TS.TE = df_temporal,
  df.occ = df_occurrences,
  time.slice = 10,
  dist.trait = NULL,
  nearest.taxon = "all",
  trait = "trait",
  group = "group",
  group.focal.compare = c("A", "B"),
  type.comparison = "within",
  remove.singletons = TRUE
)
} # }