Introduction to SBEARS - Site Based Estimation of Ancestral Range of Species
Intro_sbears.RmdContext
Here we will demonstrate how to fit a model of ancestral area reconstruction called SBEARS (Site Based Estimation of Ancestral Range of Species) (Duarte et al 2025). SBEARS allows you to reconstruct ancestral ranges at finer spatial resolution (grids) without requiring any a priori definition of biogeographic regions. The output provides the spatial distribution of ancestral nodes in a user-friendly matrix format that can be used for other methods in community ecology and macroecology.
In SBEARS, the reconstruction is carried out based on the current
distribution of species across the geographic space of species belonging
to a monophyletic lineage. This information can be organized as an
occurrence/ presence-absence matrix of sites occupied by species. The
reconstruction is performed using a phylogenetic tree through a
stochastic evolutionary model based on a Brownian motion model,
implemented in the function fastAnc of phytools
R package. In this model, each site is treated as a binary categorical
trait representing the presence (state 1) or absence (state 0) of every
species in the site, which is reconstructed at the nodes of a
phylogeny.
This process is repeated for all sites (with each site corresponding to a different trait) and the final product is a matrix describing the probability value for each node to occur in each site of the grid. The probability is standardized to account for variation in the range size of nodes. This is the procedure called Single Site. There is also another option called Dispersal Assembly in which the reconstruction probability also depends on the probability of an ancestor occurring in a site dispersing to a neighboring site, given the geographical distance among sites weighted by a dispersal kernel function.
Next, we will show how these two methods can be used in the
Herodotools package, how to interpret the results of the
calc_sbears function, and how to visualize them.
Data
To perform SBEARS reconstruction, we need a community matrix, the geographical coordinates of these communities, and a phylogeny for the species occurring in the communities. As an example, we will use the same data used in Maestri et al (2019) and Duarte et al (2025) that comprise data on the occurrence and phylogeny of Sigmodontinae rodents. The community matrix corresponds to grids of approximately XX km².
library(Herodotools)
data("comm_sbears")
data("coords_sbears")
data("phy_sbears")Single site algorithm
To perform the single site algorithm, we can use the
calc_sbears function with the argument
method = "single_site".
out_single_site <- calc_sbears(x = comm_sbears,
phy = phy_sbears,
coords = coords_sbears,
method = "single_site",
compute.node.by.sites = TRUE,
make.node.label = TRUE)The calc_sbears function also supports parallel
computation via the package. This is recommended for large community
matrices, as computation time can otherwise be substantial. To run the
function in parallel, the user can proceed as follows:
# setting function to work in parallel according to user settings
library(future)
library(progressr)
# Detect safe max cores
ncores <- future::availableCores() # dynamic
safety_margin <- 1
workers <- max(1, ncores - safety_margin)
plan(multisession, workers = workers)
handlers("txtprogressbar") # terminal progress bar + timing info
out_single_site <- calc_sbears(x = comm_sbears,
phy = phy_sbears,
coords = coords_sbears,
method = "single_site",
compute.node.by.sites = TRUE,
make.node.label = TRUE)
# going back to sequential plan
future::plan(sequential)The function output is a list or a single matrix depending on the
arguments used in the function. If the user sets the argument
compute.node.by.site as TRUE the function returns the
output as a list with two elements:
- `reconstruction`: a matrix with ancestral nodes in rows and sites
in columns. The values represent the occupation probability of each
ancestral node in each site
- `site_node_composition`: a matrix with assemblage in rows and
node names in columns. The values represent the presence of a given
node in a given assemblage based on the occurrence of current species
representing that lineage in the sites.
Dispersal Assembly algorithm
When using sbears with the Dispersal Assembly algorithm, two additional arguments must be specified: w_slope, which corresponds to the slope parameter used in the kernel function, and min_disp_prob, which defines the minimum dispersal probability of a node between two cells in geographic space.
out_disp_assembly <- calc_sbears(x = comm_sbears,
phy = phy_sbears,
coords = coords_sbears,
method = "disp_assembly",
w_slope = 5,
min_disp_prob = 0.8,
compute.node.by.sites = TRUE,
make.node.label = TRUE)The structure of the output for calc_sbears function
with dispersal assembly algorithm is the same as the one with the
single_site algorithm
Visualizing ancestral areas
We can use the objects from calc_sbears function to plot
the ancestral range of ancestral nodes.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)
# dense to long format
long_out_reconstruction <-
as.data.frame(as.table(out_single_site$reconstruction))
# naming data frame
colnames(long_out_reconstruction) <- c("nodes", "site", "prob.occurr")
# joining with coordinates
df_coords_sbears <-
coords_sbears |>
tibble::rownames_to_column("site")
# joining reconstruction with site coordinates
df_reconstruction_coords <-
long_out_reconstruction |>
left_join(df_coords_sbears, by = "site")
# for visualization
coastline <- rnaturalearth::ne_coastline(returnclass = "sf")
map_limits <- list(
x = c(-95, -30),
y = c(-55, 12)
)
theme_htools <- list(
ggplot2::geom_sf(data = coastline, size = 0.4),
ggplot2::coord_sf(xlim = map_limits$x, ylim = map_limits$y),
ggplot2::scale_x_continuous(n.breaks = 2),
ggplot2::scale_y_continuous(n.breaks = 2),
ggplot2::ggtitle(""),
ggplot2::theme_bw(),
ggplot2::guides(fill = ggplot2::guide_colorbar(barheight = ggplot2::unit(2.3, units = "mm"),
direction = "horizontal",
ticks.colour = "grey20",
title.position = "top",
label.position = "bottom",
title.hjust = 0.5)),
ggplot2::theme(
legend.position = "bottom",
plot.margin = unit(c(0.1, 0.1, 0.1, 0.1), "mm"),
text = element_text(size = 8)
)
)
# plotting node 114
plot_Node114 <-
df_reconstruction_coords |>
filter(nodes == "Node114") |>
ggplot2::ggplot() +
ggplot2::geom_raster(ggplot2::aes(x = longitude, y = latitude, fill = prob.occurr)) +
rcartocolor::scale_fill_carto_c(type = "quantitative",
palette = "SunsetDark",
direction = 1) +
ggplot2::labs(fill = "Occurrence probability
Node 114", title = "Node 114") +
theme_htools
# plotting node 180
plot_Node180 <-
df_reconstruction_coords |>
filter(nodes == "Node180") |>
ggplot2::ggplot() +
ggplot2::geom_raster(ggplot2::aes(x = longitude, y = latitude, fill = prob.occurr)) +
rcartocolor::scale_fill_carto_c(type = "quantitative",
palette = "SunsetDark",
direction = 1) +
ggplot2::labs(fill = "Occurrence probability
Node 180", title = "Node 180") +
theme_htools
# plotting node 88
plot_Node88 <-
df_reconstruction_coords |>
filter(nodes == "Node88") |>
ggplot2::ggplot() +
ggplot2::geom_raster(ggplot2::aes(x = longitude, y = latitude, fill = prob.occurr)) +
rcartocolor::scale_fill_carto_c(type = "quantitative",
palette = "SunsetDark",
direction = 1) +
ggplot2::labs(fill = "Occurrence probability
Node 88", title = "Node 88") +
theme_htools
# All plots
l_plot_prob_occ <- list(plot_Node114, plot_Node180, plot_Node88)
patchwork::wrap_plots(l_plot_prob_occ, nrow = 1, axes = "collect") +
patchwork::plot_annotation(title = "Occurrence probability of Nodes in sites")