Compute Summary Statistics from Fossil Records Across Time Intervals
This function calculates summary statistics for fossil occurrence records grouped into discrete time intervals (bins). For each time interval, it computes the number of occurrence records and the number of unique species (taxonomic richness). The function bins occurrences based on their age estimates and provides both record-level and species-level summaries.
Usage
get_summary_interval(
  df.occ.fossil,
  interval,
  age.occ = "midpoint",
  species = "species",
  Max.age = "Maximum_Age",
  Min.age = "Minimum_Age",
  TS = NULL,
  TE = NULL,
  lat = NULL,
  lng = NULL,
  site = NULL,
  group = NULL,
  trait = NULL,
  ...
)
Arguments
- df.occ.fossil
A data frame containing fossil occurrence records with columns for species names, age estimates (maximum and minimum ages, or midpoint), and optionally spatial coordinates, site information, group assignments, and trait values.
- interval
Numeric vector. A sequence of time bin boundaries in descending order (from oldest to youngest). For example, c(100, 90, 80, 70) creates three time bins: 100-90, 90-80, and 80-70. Must have at least two elements.
- age.occ
Character. The name of the column in df.occ.fossil containing the age estimate used for binning occurrences. Typically "midpoint" for the midpoint between maximum and minimum ages. Default is "midpoint".
- species
Character. The name of the column in df.occ.fossil containing species identifiers. Default is "species".
- Max.age
Character. The name of the column in df.occ.fossil containing the maximum (oldest) age estimate for each occurrence record. Default is "Maximum_Age".
- Min.age
Character. The name of the column in df.occ.fossil containing the minimum (youngest) age estimate for each occurrence record. Default is "Minimum_Age".
- TS
Character. The name of the column containing origination (first appearance) times. Default is NULL. This parameter is retained for future functionality but not currently used in calculations.
- TE
Character. The name of the column containing extinction (last appearance) times. Default is NULL. This parameter is retained for future functionality but not currently used in calculations.
- lat
Character. The name of the column containing latitude coordinates. Default is NULL. If provided, latitude information is retained in output.
- lng
Character. The name of the column containing longitude coordinates. Default is NULL. If provided, longitude information is retained in output.
- site
Character. The name of the column containing site location identifiers. Default is NULL. If provided, site information is retained in output.
- group
Character. The name of the column containing group assignments for species (e.g., clade, family). Default is NULL. If provided, group information is retained in output.
- trait
Character. The name of the column containing trait values for species. Default is NULL. If provided, trait information is retained in output.
- ...
Additional arguments (currently not used but reserved for future extensions).
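As an illustration of how the interval argument maps boundaries to bins, the sketch below derives the bin labels from a boundary vector and enforces the two stated requirements (at least two elements, descending order). The helper make_bin_labels() is hypothetical and not part of the package; the function's own validation may differ.

```r
# Hypothetical helper (not part of the package): turn bin boundaries
# into "upper-lower" labels, one per consecutive pair of boundaries.
make_bin_labels <- function(interval) {
  # At least two boundaries are needed to form one bin
  stopifnot(is.numeric(interval), length(interval) >= 2)
  # Boundaries are expected oldest-first, i.e. strictly decreasing
  stopifnot(all(diff(interval) < 0))
  paste(head(interval, -1), tail(interval, -1), sep = "-")
}

bin_labels <- make_bin_labels(c(100, 90, 80, 70))
print(bin_labels)
# "100-90" "90-80" "80-70"
```

Four boundaries give three bins; in general, n boundaries give n - 1 bins.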
Value
A list containing two data frames:
- df_summary_occurrences
A data frame with all occurrence records augmented with two columns:
time.interval: Character. The time bin identifier (e.g., "100-90")
n.records.bin: Integer. The total number of occurrence records in that time interval
This data frame has one row per occurrence record.
- df_summary_richness
A data frame with unique species per time interval augmented with two columns:
time.interval: Character. The time bin identifier
n.species.bin: Integer. The number of unique species (taxonomic richness) in that time interval
This data frame has one row per unique species per time interval.
Details
The function performs the following steps:
Validates that input is a data frame
Extracts specified columns from the occurrence data
For each time interval (consecutive pairs in interval):
  Filters occurrences where age.occ falls within the interval
  Assigns a time interval label (e.g., "100-90")
Calculates summary statistics:
  n.records.bin: Total number of occurrences per interval
  n.species.bin: Number of unique species per interval
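The steps above can be sketched in base R as follows. This is an illustrative re-implementation written from the description on this page, not the package's source: the function name sketch_summary_interval() and the toy data are assumptions, and only the species and age columns are handled.

```r
# Illustrative re-implementation of the documented steps; the package's
# actual code may differ. Names here are hypothetical.
sketch_summary_interval <- function(df, interval, age.occ = "midpoint",
                                    species = "species") {
  # Step 1: validate the input
  stopifnot(is.data.frame(df), length(interval) >= 2)
  # Step 2: extract the columns used in this sketch
  df <- df[, c(species, age.occ), drop = FALSE]
  # Step 3: filter and label occurrences for each consecutive boundary pair
  binned <- lapply(seq_len(length(interval) - 1), function(i) {
    age <- df[[age.occ]]
    out <- df[age <= interval[i] & age >= interval[i + 1], , drop = FALSE]
    label <- paste(interval[i], interval[i + 1], sep = "-")
    out$time.interval <- rep(label, nrow(out))
    out
  })
  occ <- do.call(rbind, binned)
  # Step 4a: number of records per interval, repeated on every record
  occ$n.records.bin <- as.integer(
    ave(seq_len(nrow(occ)), occ$time.interval, FUN = length)
  )
  # Step 4b: one row per species per interval, then count species per interval
  rich <- unique(occ[, c(species, "time.interval")])
  rich$n.species.bin <- as.integer(
    ave(seq_len(nrow(rich)), rich$time.interval, FUN = length)
  )
  list(df_summary_occurrences = occ, df_summary_richness = rich)
}

# Toy data (hypothetical): sp1 occurs twice in the oldest bin
df_demo <- data.frame(
  species  = c("sp1", "sp1", "sp2", "sp3"),
  midpoint = c(95, 92, 85, 75)
)
res <- sketch_summary_interval(df_demo, interval = c(100, 90, 80, 70))
print(res$df_summary_occurrences)
print(res$df_summary_richness)
```

In the toy data the "100-90" bin holds two records but only one species, which is why the record-level and species-level tables differ in row count.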
Binning logic:
An occurrence is assigned to an interval if: interval[i] >= age.occ >= interval[i+1]
Intervals are labeled using the format "upper-lower" (e.g., "100-90")
Occurrences falling outside all intervals are excluded
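Read literally, the condition above is inclusive at both ends, so an age that sits exactly on a shared boundary satisfies it for both adjacent intervals. The sketch below only evaluates the documented inequality on a few hypothetical ages; it does not call the package, and how the function itself treats boundary ages should be checked against its output.

```r
interval <- c(100, 90, 80, 70)
ages <- c(105, 95, 90, 65)  # hypothetical ages, chosen to hit each case

upper <- head(interval, -1)
lower <- tail(interval, -1)
labels <- paste(upper, lower, sep = "-")

# Rows are ages, columns are bins; TRUE where upper >= age >= lower
in_bin <- outer(ages, upper, "<=") & outer(ages, lower, ">=")
dimnames(in_bin) <- list(as.character(ages), labels)

# Number of bins each age falls into under the documented rule
n_bins_matched <- rowSums(in_bin)
print(in_bin)
print(n_bins_matched)
```

Here 105 and 65 match no bin (excluded), 95 matches "100-90" only, and 90 matches both "100-90" and "90-80". If ages commonly coincide with boundaries, it may be worth checking n.records.bin for double counting.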
The two output data frames serve different purposes:
df_summary_occurrences: Retains all individual occurrence records for detailed analyses (e.g., spatial patterns, within-interval variation)
df_summary_richness: Provides species-level summaries for diversity analyses (e.g., richness through time, turnover rates)
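For richness-through-time analyses it is often convenient to collapse df_summary_richness to one row per interval. The sketch below does this in base R on a hand-built stand-in that follows the documented columns (it is not actual function output), and it orders bins from oldest to youngest by parsing the label, which assumes non-negative boundaries in the "upper-lower" format.

```r
# Hand-built stand-in for results$df_summary_richness (documented columns only)
df_rich <- data.frame(
  species       = c("sp1", "sp2", "sp3", "sp1", "sp2", "sp4"),
  time.interval = c("100-90", "100-90", "90-80", "90-80", "90-80", "80-70"),
  n.species.bin = c(2L, 2L, 3L, 3L, 3L, 1L)
)

# One row per interval
richness_by_bin <- unique(df_rich[, c("time.interval", "n.species.bin")])

# Character labels sort alphabetically ("100-90", "80-70", "90-80"),
# so order by the upper boundary parsed from the label instead
upper_bound <- as.numeric(sub("-.*$", "", richness_by_bin$time.interval))
richness_by_bin <- richness_by_bin[order(upper_bound, decreasing = TRUE), ]
richness_by_bin$time.interval <- factor(
  richness_by_bin$time.interval,
  levels = richness_by_bin$time.interval
)
print(richness_by_bin)
```

Storing time.interval as a factor with explicit levels keeps plots and tables in chronological order.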
Examples
if (FALSE) { # \dontrun{
# Create example fossil occurrence data
df_fossils <- data.frame(
  species = c("sp1", "sp2", "sp3", "sp1", "sp2", "sp4"),
  midpoint = c(95, 92, 88, 85, 82, 78),
  Maximum_Age = c(100, 95, 90, 90, 85, 80),
  Minimum_Age = c(90, 89, 86, 80, 79, 76),
  lat = c(10, 15, 20, 25, 30, 35),
  lng = c(-50, -55, -60, -65, -70, -75),
  site = c("A", "B", "C", "A", "B", "C"),
  group = c("G1", "G1", "G2", "G1", "G1", "G2"),
  trait = c(1.2, 2.5, 3.1, 1.2, 2.5, 4.0)
)
# Define time intervals
time_bins <- c(100, 90, 80, 70)
# Get summary statistics
results <- get_summary_interval(
  df.occ.fossil = df_fossils,
  interval = time_bins,
  age.occ = "midpoint"
)
# View occurrence-level summary
head(results$df_summary_occurrences)
# View species-level summary (richness)
head(results$df_summary_richness)
# Plot richness through time
library(ggplot2)
library(dplyr) # provides %>%, group_by(), summarise(), first(), n_distinct()
richness_summary <- results$df_summary_richness %>%
  group_by(time.interval) %>%
  summarise(richness = first(n.species.bin))
ggplot(richness_summary, aes(x = time.interval, y = richness)) +
  geom_col() +
  labs(x = "Time Interval", y = "Species Richness") +
  theme_minimal()
# Calculate sampling intensity (records per species)
sampling_summary <- results$df_summary_occurrences %>%
  group_by(time.interval) %>%
  summarise(
    total_records = first(n.records.bin),
    n_species = n_distinct(species),
    records_per_species = total_records / n_species
  )
# Use custom column names
df_custom <- data.frame(
  taxon = c("sp1", "sp2", "sp3"),
  age = c(95, 85, 75),
  oldest = c(100, 90, 80),
  youngest = c(90, 80, 70)
)
results_custom <- get_summary_interval(
  df.occ.fossil = df_custom,
  interval = c(100, 90, 80, 70),
  age.occ = "age",
  species = "taxon",
  Max.age = "oldest",
  Min.age = "youngest"
)
} # }