This workshop was made using R 4.2.1, we suggest you update your R.
Prior to starting this workshop, you should run the following lines of code to install and update the packages needed for these demos.
## Make a list of packages
list_of_packages <- c("ade4","ape", "biomod2",
"caret", "CoordinateCleaner",
"devtools","dismo",
"dplyr", "ecospat",
"ENMeval", "ENMTools",
"fields", "gatoRs",
"ggplot2", "ggspatial",
"gridExtra", "gtools",
"hypervolume", "kuenm",
"lattice", "leaflet",
"magrittr", "maps",
"multcompView", "parsedate",
"phytools", "plyr",
"rangeBuilder", "raster",
"Rcpp", "ridigbio",
"rJava","rgbif",
"sf", "sp",
"spThin","spam",
"spatstat.geom","stringr",
"terra", "tidyr",
"usdm", "utils",
"viridis", "viridisLite")
## Here we identify which packages are not installed and install these for you
new.packages <- list_of_packages[!(list_of_packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
## Check version of packages
### Compare to package version used for demo creation
### This data frame has two columns: (1) package (2) version, indicating which
### version of the package was used to develop this workshop
versiondf <- read.csv("data/setup/setup.csv", stringsAsFactors = FALSE)
### Save your version under a new column named "current_version"
versiondf$current_version <- as.character(do.call(c, lapply(list_of_packages, packageVersion)))
### Compare the version the workshop was developed with to your current version
updatelist <- versiondf[which(versiondf$version != versiondf$current_version), ]
### Update packages with old versions
lapply(as.character(updatelist$packages), install.packages, ask = FALSE)
## Make sure all packages load
## Packages that are FALSE did not load
### If anything prints here, then something went wrong and a package did not install
loaded <- lapply(list_of_packages, require, character.only = TRUE)
list_of_packages[loaded == FALSE]
# Install packages not in CRAN
library(devtools)
install_github('johnbaums/rmaxent')
install_github("marlonecobos/kuenm")
install_github("nataliepatten/gatoRs")
## Check and make sure all github packages load
github_packages <- c("rmaxent", "kuenm", "gatoRs")
github_loaded <- lapply(github_packages, require, character.only = TRUE)
github_packages[github_loaded == FALSE]
tidyverse is a collection of R packages that are used for “everyday data analysis” including ggplot2, dplyr, tidyr, readr, purr, tibble, stringr, forcats. We use dplyr and ggplot2 throughout many of the R scripts.
ridigbio is an R package that queries the iDigBio API.
spocc queries data from GBIF, USGS’ Biodiversity Information Serving Our Nation (BISON), iNaturalist, Berkeley Ecoinformatics Engine, eBird, iDigBio, VertNet, Ocean Biogeographic Information System (OBIS), and Atlas of Living Australia (ALA).
RCurl and rjson are used to query an application programming interface (API), or an online database.
Geospatial packages include sp, raster, maptools, maps, rgdal, RStoolbox, and mapproj.
ggplot2 is used to plot and visualize data. ggbiplot is an add on to ggplot2.
ENMTools, ENMeval, and dismo are specific to ecological niche modeling.
hypervolume and caret have functions related to statistics.
phytools and picante are related to phylogenetics.
factoextra and FactoMiner are used for statistics related to a principal components analysis.
system("xcode-select --install")
spThin and fields do not compile from source!
If you are still having issues installing ggbiplot, you may have to uninstall the package ‘backports’ and reset R.
This exercise requires the package ENMTools. This package requires Java (and the Oracle Java JDK) and that the maxent.jar file is in the folder that contains the dismo package (this package is a dependency of ENMTools). To find out where to put the jar file, run the command:
system.file("java", package="dismo")
Once you have your java and maxent file set, you are ready to go! If you have more issues, find some help here
To download the jar file to the right directory —
devtools::install_github("johnbaums/rmaxent")
rmaxent::get_maxent(version = "latest", quiet = FALSE)
The previous step may mask rgdal! Make sure to reload it!
For rJava, the newest package requires R >= 3.6.0
dismo only requires rJava >= 0.9-7, download 0.9.7
here
If you are experiencing errors with rJava:
## Remove rJava
remove.packages(rJava)
## Update your R Java configuration
system("sudo R CMD javareconf")
## Restart R and check to see if the path matches
Sys.getenv("DYLD_FALLBACK_LIBRARY_PATH")
## Reinstall rJava
install.packages("rJava")
library(rJava)
If you are still having issues with rJava - check to see if you are using Java 12 (this should be included in the error message):
system("java -version")
The initial release of java 12 does not work - install an old version instead. To install an old version, navigate to the Oracle website or use homebrew.
system("brew cask install homebrew/cask-versions/java11")
Restart R and redo the previous steps. Required: uninstall the newer versions of java prior to repeating and restart R after.
## Uninstall Java 12, then repeat the following
## Remove rJava
remove.packages(rJava)
## Update your R Java configuration
system("sudo R CMD javareconf")
## Restart R and check to see if the path matches
Sys.getenv("DYLD_FALLBACK_LIBRARY_PATH")
## Reinstall rJava
install.packages("rJava")
library(rJava)
More debugging: Check out jerrybreak’s comment from 12/30/2021.
If you have additional issues, please let us know!
ML Gaynor.
library(ridigbio)
library(gatoRs)
library(leaflet)
iDigBio_GU <- idig_search_records(rq=list(scientificname="Galax urceolata"))
iDigBio_GU_family <- idig_search_records(rq=list(family="Diapensiaceae"), limit=1000)
What if you want to read in all the points for a family within an extent?
Hint: Use the iDigBio portal to determine the bounding box for your region of interest.
The bounding box delimits the geographic extent.
rq_input <- list("scientificname"=list("type"="exists"),
"family"="Diapensiaceae",
geopoint=list(
type="geo_bounding_box",
top_left=list(lon = -98.16, lat = 48.92),
bottom_right=list(lon = -64.02, lat = 23.06)
)
)
Search using the input you just made
iDigBio_GU_family_USA <- idig_search_records(rq_input, limit=1000)
write.csv(iDigBio_GU, "data/download/iDigBio_GU_20230605.csv",
row.names = FALSE)
write.csv(iDigBio_GU_family, "data/download/iDigBio_GU_family_20230605.csv",
row.names = FALSE)
Shortia_galacifolia <- c("Shortia galacifolia", "Sherwoodia galacifolia")
Galax_urceolata <- c("Galax urceolata", "Galax aphylla")
Pyxidanthera_barbulata <- c("Pyxidanthera barbulata","Pyxidanthera barbulata var. barbulata")
Pyxidanthera_brevifolia <- c("Pyxidanthera brevifolia", "Pyxidanthera barbulata var. brevifolia")
This function downloads records for all names listed from iDigBio, GBIF, and BISON. It keeps specific columns from each database.
gators_download(synonyms.list = Shortia_galacifolia,
write.file = TRUE,
filename = "data/download/raw/Shortia_galacifolia_raw_20230605.csv")
gators_download(synonyms.list = Galax_urceolata,
write.file = TRUE,
filename = "data/download/raw/Galax_urceolata_raw_20230605.csv")
gators_download(synonyms.list = Pyxidanthera_barbulata,
write.file = TRUE,
filename = "data/download/raw/Pyxidanthera_barbulata_raw_20230605.csv")
gators_download(synonyms.list = Pyxidanthera_brevifolia,
write.file = TRUE,
filename = "data/download/raw/Pyxidanthera_brevifolia_raw_20230605.csv")
rawdf <- read.csv("data/download/raw/Shortia_galacifolia_raw_20230605.csv")
What columns are included?
names(rawdf)
## [1] "scientificName" "genus"
## [3] "specificEpithet" "infraspecificEpithet"
## [5] "ID" "occurrenceID"
## [7] "basisOfRecord" "eventDate"
## [9] "year" "month"
## [11] "day" "institutionCode"
## [13] "recordedBy" "country"
## [15] "county" "stateProvince"
## [17] "locality" "latitude"
## [19] "longitude" "coordinateUncertaintyInMeters"
## [21] "informationWithheld" "habitat"
## [23] "aggregator"
Here is a table of the columns returned for each species in gators_download:
How many observations do we start with?
nrow(rawdf)
## [1] 1219
The error message here indicates many points do not have long/lat values (more in 02).
leaflet(rawdf) %>%
addMarkers(label = paste0(rawdf$longitude, ", ", rawdf$latitude)) %>%
addTiles()
ML Gaynor.
library(gatoRs)
library(fields)
library(sf)
library(ggplot2)
library(ggspatial)
library(leaflet)
rawdf <- read.csv("data/download/raw/Shortia_galacifolia_raw_20230605.csv")
How many observations do we start with?
nrow(rawdf)
## [1] 1219
Here we are going to harmonize taxonomy using taxa_clean(). This function has three filter options: exact, fuzzy, or interactive.
Inspect scientific names included in the raw df.
unique(rawdf$scientificName)
## [1] "Shortia galacifolia Torr. & A.Gray"
## [2] "Sherwoodia galacifolia (Torr. & A.Gray) House"
## [3] "Shortia galacifolia"
## [4] "Shortia galacifolia Torr. & A. Gray"
## [5] "Shortia galacifolia Torrey & A. Gray"
## [6] "Shortia galacifolia var. galacifolia"
## [7] "Shortia galacifolia var. brevistyla"
## [8] "Shortia galacifolia var. brevistyla P. A. Davies"
## [9] "Sherwoodia galacifolia"
Create a list of accepted names based on the name column in your data frame
search <- c("Shortia galacifolia", "Sherwoodia galacifolia")
Filter to only include accepted name:
df <- taxa_clean(df = rawdf,
synonyms.list = search,
taxa.filter = "fuzzy",
accepted.name = "Shortia galacifolia")
How many observations do we have now?
nrow(df)
## [1] 1219
Here we remove any records with missing coordinates, impossible coordinates, coordinates at (0,0), and any that are flagged as skewed. We also round the provided latitude and longitude values to a specified number of decimal places.
df <- basic_locality_clean(df = df,
remove.zero = TRUE, # Records at (0,0) are removed
precision = TRUE, # latitude and longitude are rounded
digits = 2, # round to 2 decimal places
remove.skewed = TRUE)
How many observations do we have now?
nrow(df)
## [1] 57
Remove coordinates in cultivated zones, botanical gardens, and
outside our desired range. Here we set interactive = FALSE
,
however you can visualize the outliers by setting
interactive = TRUE
.
df <- process_flagged(df, interactive = FALSE)
How many observations do we have now?
nrow(df)
## [1] 52
Here we identify and remove both (1) specimen duplicates and (2)
aggregator duplicates based on each specimens coordinates, occurrenceID,
and eventDate. To leverage all date information available, set
remove.unparseable = FALSE
to manually populate the year,
month, and day columns. Here, we also confirm all ID (UUID and key) are
unique to remove any within-aggregator duplicates that may accumulate
due to processing errors.
df <- remove_duplicates(df, remove.unparseable = TRUE)
How many observations do we have now?
nrow(df)
## [1] 41
How many more records do you retain when
remove.unparseable = FALSE
?
Maxent will only retain one point per pixel. To make the ecological niche analysis comparable, we will retain only one point per pixel. Note: Default is a raster with 30 arc sec resolution.
Maxent will only retain one point per pixel. To make the ecological niche analysis comparable, we will retain only one point per pixel. Note: Default is a raster with 30 arc sec resolution.
df <- one_point_per_pixel(df)
How many observations do we have now?
nrow(df)
## [1] 21
Reduce the effects of sampling bias using randomization approach.
First, lets assess the current minimum distance among your occurrence records. Here, we calculate minimum nearest neighbor distance in km.
nnDm <- rdist.earth(as.matrix(data.frame(lon = df$long, lat = df$lat)), miles = FALSE, R = NULL)
nnDmin <- do.call(rbind, lapply(1:5, function(i) sort(nnDm[,i])[2]))
min(nnDmin)
## [1] 2.226477
Here the current minimum distance is 2.22 km. Based on literature, we find a 2 meters (or 0.002 km) distance was enough to collect unique genets, so we do not need to thin our points.
When you do need to thin your records, here is a great function to do so!
df <- thin_points(df,
distance = 0.002, # in km
reps = 100)
df_fixed <- st_as_sf(df, coords = c("longitude", "latitude"), crs = 4326)
USA <- borders(database = "usa", colour = "gray50", fill = "gray50")
state <- borders(database = "state", colour = "black", fill = NA)
Here we are using ggplot2 and ggspatial to plot our occurrence records. We are using two basemaps, USA and state. The geom_point function adds our points based on long and lat, and colors them blue. We set the coordinate visualization space with coord_sf. We fixed the x and y labels. Finally, we added a scale and north arrow.
simple_map <-ggplot() +
USA +
state +
geom_sf(df_fixed,
mapping = aes(col = name),
col = "blue") +
coord_sf(xlim = c(min(df$longitude) - 3,
max(df$longitude) + 3),
ylim = c(min(df$latitude) - 3,
max(df$latitude) + 3)) +
xlab("Longitude") +
ylab("Latitude") +
annotation_scale(plot_unit = "km") +
annotation_north_arrow(height = unit(1, "cm"),
width = unit(1, "cm"),
location = "tl")
simple_map
Another fun way to view these points is with leaflet.
leaflet(df_fixed) %>%
addMarkers(label = paste0(df$longitude, ", ", df$latitude)) %>%
addTiles()
write.csv(df, "data/cleaning_demo/Shortia_galacifolia_20230605-cleaned.csv", row.names = FALSE)
## Set up for loop
files <- list.files("data/download/raw", full.names = TRUE)[1:3]
synonymns <- list(Galax_urceolata = c("Galax urceolata",
"Galax urceolata (Poir.) Brummitt",
"Galax urceolata (Poiret) Brummitt",
"Galax urceolaa",
"Galax aphylla L.",
"Galax aphylla"),
Pyxidanthera_barbulata =c("Pyxidanthera barbulata",
"Pyxidanthera barbulata Michx.",
"Pyxidanthera barbulata var. barbulata" ,
"Pyxidenthera barbulata"),
Pyxidanthera_brevifolia = c("Pyxidanthera brevifolia",
"Pyxidanthera brevifolia Wells",
"Pyxidanthera barbulata var. brevifolia (Wells) H.E.Ahles",
"Pyxidanthera barbulata var. brevifolia" ))
## Repeat cleaning steps for the remaining taxa
for(i in 1:3){
df <- read.csv(files[i])
# Taxa clean
search <- synonymns[[i]]
df <- taxa_clean(df = df, synonyms.list = search,
taxa.filter = "exact",
accepted.name = search[1])
# Locality clean
df <- basic_locality_clean(df = df, remove.zero = TRUE,
precision = TRUE, digits = 2,
remove.skewed = TRUE)
df <- process_flagged(df, interactive = FALSE)
# Remove duplicates
df <- remove_duplicates(df, remove.unparseable = TRUE)
# Spatial correct
df <- one_point_per_pixel(df)
df <- thin_points(df, distance = 0.002, reps = 100)
# Save file
outfile <- paste0("data/cleaning_demo/",
gsub(" ", "_", search[1]),
"_20230605-cleaned.csv")
write.csv(df, outfile,
row.names = FALSE)
rm(df, search, outfile)
}
alldf <- list.files("data/cleaning_demo/", full.names = TRUE,
recursive = FALSE, include.dirs = FALSE, pattern = "*.csv")
alldf <- lapply(alldf, read.csv)
alldf <- do.call(rbind, alldf)
Visually inspect records to verify no additional records should be removed.
### Make points spatial
alldf_fixed <- st_as_sf(alldf, coords = c("longitude", "latitude"),
crs = 4326)
### Set basemap
USA <- borders(database = "usa", colour = "gray50", fill = "gray50")
state <- borders(database = "state", colour = "black", fill = NA)
### Plot
all_map <- ggplot() +
USA +
state +
geom_sf(alldf_fixed,
mapping = aes(col = factor(accepted_name))) +
coord_sf(xlim = c(min(alldf$longitude) - 3, max(alldf$longitude) + 3),
ylim = c(min(alldf$latitude) - 3, max(alldf$latitude) + 3)) +
xlab("Longitude") +
ylab("Latitude") +
labs(color = "Scientific name") +
annotation_scale(plot_unit = "km") +
annotation_north_arrow(height = unit(1, "cm"),
width = unit(1, "cm"),
location = "tl")
all_map
alldfchomp <- c()
species <- unique(alldf$accepted_name)
for(i in 1:4){
alldfchomp[[i]] <- data_chomp(alldf[alldf$accepted_name ==species[i], ],
accepted.name = species[i])
}
alldf <- do.call(rbind, alldfchomp)
write.csv(alldf, "data/cleaning_demo/maxent_ready/diapensiaceae_maxentready_20230605.csv", row.names = FALSE)
Create a batch file for GeoLocate. Created by ME Mabry.
Modified by ML Gaynor.
library(dplyr)
library(gatoRs)
rawdf <- read.csv("data/download/raw/Shortia_galacifolia_raw_20230605.csv")
rawdf_GeoRef <- need_to_georeference(rawdf)
rawdf_GeoRef <- rawdf_GeoRef %>%
dplyr::select("locality string" = locality,
country = country,
state = stateProvince,
county = county,
latitude = latitude,
longitude = longitude,
ID = ID,
name = scientificName,
basis = basisOfRecord)
rawdf_GeoRef$'correction status' <- ""
rawdf_GeoRef$precision <- ""
rawdf_GeoRef$'error polygon' <- ""
rawdf_GeoRef$'multiple results' <- ""
rawdf_GeoRef2<- rawdf_GeoRef[,c("locality string", "country", "state", "county", "latitude",
"longitude", "correction status", "precision", "error polygon",
"multiple results", "ID", "name", "basis")]
write.csv(rawdf_GeoRef2,
file = "data/georeferencing/Shortia_galacifolia_Needing_GeoRef_20230605.csv",
row.names = FALSE)
Climate layer processing.
Modified and created by ML Gaynor.
Based on code by Mike Belitz (mbelitz/Odo_SDM_Rproj).
library(raster)
library(gtools)
library(sf)
library(rangeBuilder)
library(dplyr)
library(caret)
library(usdm)
library(dismo)
library(stringr)
Based on code by Mike Belitz (mbelitz/Odo_SDM_Rproj).
source("functions/VIFLayerSelect.R")
biolist <- list.files("data/climate_processing/bioclim/", pattern = "*.tif", full.names = TRUE)
Order list using gtools.
biolist <- mixedsort(sort(biolist))
Load rasters as a stack.
biostack <- raster::stack(biolist)
And fix the name column class and make sure it is a character.
alldf <- read.csv("data/cleaning_demo/maxent_ready/diapensiaceae_maxentready_20230605.csv")
alldf$species <- as.character(alldf$species)
alldfsp <- st_as_sf(alldf, coords = c("longitude", "latitude"), crs = 4326)
Here we will make the projection layers, or the layers which include shared space. First, we have to define the accessible space.
hull <- rangeBuilder::getDynamicAlphaHull(x = alldf,
coordHeaders = c("longitude", "latitude"),
fraction = 1, # min. fraction of records we want included
partCount = 1, # number of polygons allowed
initialAlpha = 20, # initial alpha size, 20m
clipToCoast = "terrestrial",
verbose = TRUE)
plot(hull[[1]], col=rangeBuilder::transparentColor('gray50', 0.5), border = NA)
points(x = alldf$longitude, y = alldf$latitude, cex = 0.5, pch = 3)
Here we take the 80th quantile of the max distance between points
buffDist <- quantile(x = (apply(sf::st_distance(alldfspTrans), 2, FUN = function(x) sort(x)[2])),
probs = 0.80, na.rm = TRUE)
buffDist
## 80%
## 3786.239
buffer_m <- st_buffer(x = hullTrans, dist = buffDist, dissolve = TRUE)
buffer <- st_transform(buffer_m, CRS("+proj=longlat +datum=WGS84"))
plot(buffer, col=rangeBuilder::transparentColor('gray50', 0.5), border = NA)
points(x = alldf$longitude, y = alldf$latitude, cex = 0.5, pch = 3)
path <- "data/climate_processing/all/"
end <- ".asc"
for(i in 1:length(biolist)){
# Subset raster layer
rast <- biostack[[i]]
# Setup file names
name <- names(rast)
out <- paste0(path, name)
outfile <- paste0(out, end)
# Crop and mask
c <- crop(rast, sf::as_Spatial(buffer))
c <- mask(c, sf::as_Spatial(buffer))
# Write raster
writeRaster(c, outfile, format = "ascii", overwrite = TRUE)
}
Warning: Many of these steps are time intensive! Do not run during the workshop, we have already done these for you.
We only want to include layers that are not highly correlated. To assess which layers we will include, we can look at the pearson correlation coefficient among layers.
clippedlist <- list.files("data/climate_processing/all/", pattern = "*.asc", full.names = TRUE)
clippedlist <- mixedsort(sort(clippedlist))
clippedstack <- raster::stack(clippedlist)
Warning: Time Intensive!
corr <- layerStats(clippedstack, 'pearson', na.rm=TRUE)
Isolate only the pearson correlation coefficient and take absolute value.
c <- abs(corr$`pearson correlation coefficient`)
write.csv(c, "data/climate_processing/correlationBioclim.csv", row.names = FALSE)
Highly correlated layers (> |0.80|) can impact the statistical significance of the niche models and therefore must be removed.
envtCor <- mixedsort(sort(findCorrelation(c, cutoff = 0.80, names = TRUE, exact = TRUE)))
envtCor
## [1] "bio_1" "bio_3" "bio_4" "bio_6" "bio_9" "bio_10" "bio_11" "bio_12"
## [9] "bio_13" "bio_16" "bio_17" "bio_19"
Warning: Time Intensive!
VIF can detect for multicollinearity in a set of multiple regression
variables. Run a simple maxent model for every species and calculate the
average permutation contribution.
Loop through each species and save permutation importance in
list.
Warning: This will print warnings even when it works
fine.
set.seed(195)
m <- c()
for(i in 1:length(unique(alldf$species))){
name <- unique(alldf$species)[i]
spp_df <- alldf %>%
dplyr::filter(species == name) %>%
dplyr::select(longitude, latitude)
model <- dismo::maxent(x = clippedstack, p = spp_df,
progress = "text", silent = FALSE)
m[[i]] <- vimportance(model)
}
mc <- do.call(rbind, m)
mc_average <- aggregate(mc[, 2], list(mc$Variables), mean)
mc_average <- mc_average %>%
dplyr::select(Variables = Group.1, permutation.importance = x)
mc1 <- mc_average
Use VIF and the MaxEnt permutation importance to select the best variables for your model. Note, this leads to different layers when the models are rerun without setting seed due to permutations being random.
selectedlayers <- VIF_layerselect(clippedstack, mc_average)
mixedsort(sort(names(selectedlayers)))
Since this can vary per system (despite setting seed), we added this line to keep our files consistent for the workshop
sl <- c("bio_3", "bio_7", "bio_8", "bio_9", "bio_14", "bio_15", "bio_18", "elev")
selectedlayers <- raster::subset(clippedstack, sl)
Move selected layers to “Present_Layers/all/” for use in MaxEnt later.
for(i in 1:length(names(selectedlayers))){
name <- names(selectedlayers)[i]
from <- paste0("data/climate_processing/all/", name, ".asc")
to <- paste0("data/climate_processing/PresentLayers/all/", name, ".asc")
file.copy(from, to,
overwrite = TRUE, recursive = FALSE,
copy.mode = TRUE)
}
for(i in 1:length(unique(alldf$species))){
sname <- unique(alldf$species)[i]
# Subset species from data frame
spp_df <- alldf %>%
dplyr::filter(species == sname)
spp_dfsp <- st_as_sf(spp_df, coords = c("longitude", "latitude"), crs = 4326)
## Create alpha hull
sphull <- rangeBuilder::getDynamicAlphaHull(x = spp_df,
coordHeaders = c("longitude", "latitude"),
fraction = 1, # min. fraction of records we want included
partCount = 1, # number of polygons allowed
initialAlpha = 20, # initial alpha size, 20m
clipToCoast = "terrestrial",
verbose = TRUE)
## Add buffer to hull
### Transform into CRS related to meters with Equal Area Cylindrical projection
sphullTrans <- st_transform(sphull[[1]], CRS("+proj=cea +lat_ts=0 +lon_0"))
spp_dfTrans <- st_transform(spp_dfsp, CRS("+proj=cea +lat_ts=0 +lon_0"))
### Calculate buffer size
#### Here we take the 80th quantile of the max distance between points
spbuffDist <- quantile(x = (apply(sf::st_distance(spp_dfTrans), 2, FUN = function(x) sort(x)[2])),
probs = 0.80, na.rm = TRUE)
### Buffer the hull
spbuffer_m <- st_buffer(x = sphullTrans, dist = spbuffDist, dissolve = TRUE)
spbuffer <- st_transform(spbuffer_m, CRS("+proj=longlat +datum=WGS84"))
### Crop and Mask
spec <- gsub(" ", "_", sname)
path <- paste0("data/climate_processing/PresentLayers/", spec,"/")
end <- ".asc"
for(j in 1:length(names(selectedlayers))){
# Subset raster layer
rast <- selectedlayers[[j]]
# Setup file names
name <- names(rast)
out <- paste0(path, name)
outfile <- paste0(out, end)
# Crop and mask
c <- crop(rast, sf::as_Spatial(spbuffer))
c <- mask(c, sf::as_Spatial(spbuffer))
# Write raster
writeRaster(c, outfile, format = "ascii", overwrite = TRUE)
}
}
Ecological analysis using points. This is based off scripts created
by Hannah Owens and James Watling, and Anthony Melton.
Modified and created by ML Gaynor.
library(gtools)
library(raster)
library(plyr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(multcompView)
library(gridExtra)
library(ecospat)
library(dismo)
library(ade4)
alldf <- read.csv("data/cleaning_demo/maxent_ready/diapensiaceae_maxentready_20230605.csv")
list <- list.files("data/climate_processing/PresentLayers/all", full.names = TRUE, recursive = FALSE)
list <- mixedsort(sort(list))
envtStack <- stack(list)
For each occurence record, extract the value for each bioclim variables using the package raster.
ptExtracted <- raster::extract(envtStack, alldf[2:3])
ptExtracteddf <- as.data.frame(ptExtracted)
ptExtracteddf <- ptExtracteddf %>%
dplyr::mutate(name = as.character(alldf$species),
x = alldf$longitude,
y = alldf$latitude)
ptExtracteddf <- ptExtracteddf %>%
tidyr::drop_na()
data.bioclim <- ptExtracteddf[, 1:8]
data.species <- ptExtracteddf[, 9]
data.pca <- prcomp(data.bioclim, scale. = TRUE)
When you use the command prcomp your loading variables show up as rotational variables. Thanks to a really great answer on stack overflow you can even convert the rotational variable to show the relative contribution.
loadings <- data.pca$rotation
summary(loadings)
## PC1 PC2 PC3 PC4
## Min. :-0.41017 Min. :-0.35451 Min. :-0.51097 Min. :-0.38192
## 1st Qu.:-0.22213 1st Qu.:-0.05133 1st Qu.:-0.26038 1st Qu.:-0.22393
## Median : 0.24618 Median : 0.11569 Median : 0.02413 Median :-0.05375
## Mean : 0.09623 Mean : 0.16167 Mean : 0.02075 Mean : 0.03410
## 3rd Qu.: 0.38700 3rd Qu.: 0.45486 3rd Qu.: 0.34826 3rd Qu.: 0.26963
## Max. : 0.45572 Max. : 0.58432 Max. : 0.53397 Max. : 0.62319
## PC5 PC6 PC7 PC8
## Min. :-0.6295 Min. :-0.24822 Min. :-0.2551 Min. :-0.711115
## 1st Qu.:-0.4145 1st Qu.:-0.17802 1st Qu.:-0.1173 1st Qu.:-0.283120
## Median :-0.2340 Median :-0.11462 Median : 0.1206 Median :-0.190132
## Mean :-0.1698 Mean : 0.08696 Mean : 0.1623 Mean :-0.158904
## 3rd Qu.: 0.1597 3rd Qu.: 0.43387 3rd Qu.: 0.4533 3rd Qu.: 0.008766
## Max. : 0.2532 Max. : 0.61751 Max. : 0.6537 Max. : 0.428816
There are two options to convert the loading to show the relative contribution, they both give the same answer so either can be used.
loadings_relative_A <- t(t(abs(loadings))/rowSums(t(abs(loadings))))*100
loadings_relative_A
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## bio_3 8.591746 18.290740 15.598618 26.50100048 15.693465 2.892773 10.826188
## bio_7 15.106599 3.605906 20.544960 4.56019510 8.196901 6.586901 27.743419
## bio_8 14.713687 1.268164 13.920838 0.01157387 24.676885 22.390720 4.151105
## bio_9 9.541759 24.350510 8.759713 16.03534080 7.639386 25.535124 3.884780
## bio_14 16.784509 5.136389 1.759000 16.24120193 10.151969 6.787089 7.464726
## bio_15 6.003487 25.442930 21.469601 8.28251442 9.927169 9.084507 6.348032
## bio_18 15.380721 6.468852 14.248091 7.35172686 17.912054 10.264436 18.683458
## elev 13.877493 15.436508 3.699179 21.01644654 5.802170 16.458450 20.898292
## PC8
## bio_3 0.2683966
## bio_7 10.5755001
## bio_8 6.4386942
## bio_9 2.3740267
## bio_14 31.8174754
## bio_15 18.6736844
## bio_18 19.1865553
## elev 10.6656673
loadings_relative_B <- sweep(x = abs(loadings),
MARGIN = 2,
STATS = colSums(abs(loadings)), FUN = "/")*100
loadings_relative_B
First, I made a theme to change the background of the plot. Next, I changed the plot margins and the text size.
theme <- theme(panel.background = element_blank(),
panel.border=element_rect(fill=NA),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
strip.background=element_blank(),
axis.ticks=element_line(colour="black"),
plot.margin=unit(c(1,1,1,1),"line"),
axis.text = element_text(size = 12),
legend.text = element_text(size = 12),
legend.title = element_text(size = 12),
text = element_text(size = 12))
pal <- c("#D43F3AFF", "#EEA236FF", "#5CB85CFF", "#46B8DAFF")
Next, use ggbiplot where obs.scale indicates the scale factor to apply to observation, var.scale indicates the scale factor to apply to variables, ellipse as TRUE draws a normal data ellipse for each group, and circle as TRUE draws a correlation circle.
g <- ggbiplot(data.pca, obs.scale = 1, var.scale = 1,
groups = data.species, ellipse = TRUE, circle = TRUE) +
scale_color_manual(name = '', values = pal) +
theme(legend.direction = 'vertical', legend.position = 'right',
legend.text = element_text(size = 12, face = "italic")) +
theme
g
Simple function to run an ANOVA and a post-hoc Tukey-HSD test
stat.test <- function(dataframe, x = "name", y){
bioaov <- aov(as.formula(paste0(y,"~",x)), data = dataframe)
TH <- TukeyHSD(bioaov, "name")
m <- multcompLetters(TH$name[,4])
y.val <- as.numeric(max(dataframe[[y]]) + 1)
groups <- data.frame(groups = m$Letters, name = names(m$Letters), y.val = rep(y.val, 4))
return(groups)
}
bio3aovplot <- ggplot(ptExtracteddf, aes(x = name, y = bio_3)) +
geom_boxplot(aes(fill = name)) +
scale_color_manual(name = '', values = pal) +
geom_text(data = stat.test(dataframe =ptExtracteddf, y = "bio_3"),
mapping = aes(x = name,
y = max(ptExtracteddf["bio_3"]+1),
label = groups),
size = 5, inherit.aes = FALSE) +
theme(axis.text.x = element_text(angle = 90, size = 8, face = 'italic'))
bio3aovplot
variablelist <- colnames(ptExtracteddf)[1:8]
plotlist <- list()
maxstats <- c()
statsout <- c()
for(i in 1:8){
vname <- variablelist[i]
maxstats[i] <- as.numeric(max(ptExtracteddf[[vname]]) + 1)
statsout[[i]] <- stat.test(dataframe = ptExtracteddf, y = vname)
plotlist[[i]] <- ggplot(ptExtracteddf, aes(x = name, y = .data[[vname]])) +
geom_boxplot(aes(fill = name)) +
scale_colour_manual(name = 'Species', values = pal) +
geom_text(data = statsout[[i]],
mapping = aes(x = name,
y = (y.val*1.1),
label = groups),
size = 4, inherit.aes = FALSE) +
scale_x_discrete(labels = c('G', 'Pba','Pbr', 'S')) +
ggtitle(label = vname) +
ylab(vname) +
theme(legend.position = "none") +
ylim(0, (maxstats[i]*1.2))
}
gridExtra::grid.arrange(grobs = plotlist)
bg1 <- randomPoints(mask = envtStack, n = 1000, p = alldf[,3:2])
bg1.env <- raster::extract(envtStack, bg1)
bg1.env <- data.frame(bg1.env)
allpt.bioclim <- rbind(bg1.env, data.bioclim)
pca.env <- dudi.pca(allpt.bioclim,
center = TRUE, # Center by the mean
scannf = FALSE, # Don't plot
nf = 2) # Number of axis to keep
p1.score <- suprow(pca.env, dplyr::filter(ptExtracteddf, name == "Galax urceolata")[, 1:8])$li
p2.score <- suprow(pca.env, dplyr::filter(ptExtracteddf, name == "Pyxidanthera barbulata")[, 1:8])$li
p3.score <- suprow(pca.env, dplyr::filter(ptExtracteddf, name == "Pyxidanthera brevifolia")[, 1:8])$li
p4.score <- suprow(pca.env, dplyr::filter(ptExtracteddf, name == "Shortia galacifolia")[, 1:8])$li
scores.clim <- pca.env$li
plot(scores.clim, pch = 16, asp = 1,
col = adjustcolor(1, alpha.f = 0.2), cex = 2,
xlab = "PC1", ylab = "PC2")
points(p1.score, pch = 18, col = pal[1], cex = 2)
points(p2.score, pch = 18, col = pal[2], cex = 2)
points(p3.score, pch = 18, col = pal[3], cex = 2)
points(p4.score, pch = 18, col = pal[4], cex = 2)
Create occurrence density grids based on the ordination data.
z1 <- ecospat.grid.clim.dyn(scores.clim, scores.clim, p1.score, R = 100)
## Registered S3 methods overwritten by 'adehabitatMA':
## method from
## print.SpatialPixelsDataFrame sp
## print.SpatialPixels sp
z2 <- ecospat.grid.clim.dyn(scores.clim, scores.clim, p2.score, R = 100)
z3 <- ecospat.grid.clim.dyn(scores.clim, scores.clim, p3.score, R = 100)
z4 <- ecospat.grid.clim.dyn(scores.clim, scores.clim, p4.score, R = 100)
zlist <- list(z1, z2, z3, z4)
Schoener’s D ranges from 0 to 1. 0 represents no similarity between niche space. 1 represents completely identical niche space.
overlapD <- matrix(ncol = 2, nrow = 7)
n <- 1
for(i in 1:3){
for(j in 2:4){
if(i != j){
overlapD[n, 1]<- paste0("z", i, "-", "z", j)
overlapD[n, 2]<- ecospat.niche.overlap(zlist[[i]], zlist[[j]], cor = TRUE)$D
n <- n + 1
}
}
}
overlapDdf <- data.frame(overlapD)
overlapDdf
## X1 X2
## 1 z1-z2 0.0890148692326306
## 2 z1-z3 0.00297245638991617
## 3 z1-z4 0.453009449060948
## 4 z2-z3 0.00376518449381513
## 5 z2-z4 0.0739737424962652
## 6 z3-z2 0.00376518449381513
## 7 z3-z4 0.00236297216796688
par(mfrow=c(1,2))
ecospat.plot.niche.dyn(z1, z4, quant=0.25, interest = 1
, title= "Niche Overlap - Z1 top", name.axis1="PC1", name.axis2="PC2")
ecospat.plot.niche.dyn(z1, z4, quant=0.25, interest = 2
, title= "Niche Overlap - Z4 top", name.axis1="PC1", name.axis2="PC2")
Based on Warren et al. 2008 - Are the two niche identical?
Hypothesis test for D, null based on randomization. H1: the niche
overlap is higher than expected by chance (or when randomized).
eq.test <- ecospat.niche.equivalency.test(z1, z4, rep = 10)
ecospat.plot.overlap.test(eq.test, "D", "Equivalency")
Based on Warren et al. 2008 - Are the two niche similar?
Can one species’ niche predict the occurrences of a second species
better than expected by chance?
sim.test <- ecospat.niche.similarity.test(z1, z4, rep = 10, rand.type=2)
ecospat.plot.overlap.test(sim.test, "D", "Similarity")
Ecological Niche Modeling in R.
Modified and created by Anthony Melton and ML Gaynor.
This script is for generating and testing ENMs using ENMEval. Please see the paper describing ENMEval and their vignette.
options(java.parameters = "-Xmx16g") # increase memory that can be used
library(raster)
library(gtools)
library(dplyr)
library(dismo)
library(ENMeval)
library(ggplot2)
library(viridis)
source("functions/ENMevaluation.R")
alldf <- read.csv("data/cleaning_demo/maxent_ready/diapensiaceae_maxentready_20230605.csv")
Galax_urceolata <- dplyr::filter(alldf, species == "Galax urceolata")
Pyxidanthera_barbulata <- dplyr::filter(alldf, species == "Pyxidanthera barbulata")
Pyxidanthera_brevifolia <- dplyr::filter(alldf, species == "Pyxidanthera brevifolia")
Shortia_galacifolia <- dplyr::filter(alldf, species == "Shortia galacifolia")
list <- list.files("data/climate_processing/PresentLayers/all", full.names = TRUE, recursive = FALSE)
list <- mixedsort(sort(list))
allstack <- stack(list)
gstack <- stack(mixedsort(sort(list.files("data/climate_processing/PresentLayers/Galax_urceolata/", full.names = TRUE))))
pbastack <- stack(mixedsort(sort(list.files("data/climate_processing/PresentLayers/Pyxidanthera_barbulata/", full.names = TRUE))))
pbrstack <- stack(mixedsort(sort(list.files("data/climate_processing/PresentLayers/Pyxidanthera_brevifolia/", full.names = TRUE))))
sstack <- stack(mixedsort(sort(list.files("data/climate_processing/PresentLayers/Shortia_galacifolia/", full.names = TRUE))))
projection(allstack) <- "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"
projection(gstack) <- "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"
projection(pbastack) <- "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"
projection(pbrstack) <- "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"
projection(sstack) <- "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"
Warning: Time intensive on Apple M1 chip! Match MaxEnt GUI settings (see word document and powerpoint)
evaldis <- dismo::maxent(x = gstack,
p = Galax_urceolata[, c("longitude", "latitude")],
nbg = 10000,
args = c("projectionlayers=data/climate_processing/PresentLayers/all",
"responsecurves",
"jackknife",
"outputformat=logistic",
"randomseed",
"randomtestpoints=25",
"replicates=5",
"replicatetype=subsample",
"maximumiterations=5000",
"writebackgroundpredictions",
"responsecurvesexponent",
"writeplotdata"),
removeDuplicates = TRUE#,
#path = "data/Ecological_Niche_Modeling/enm_output/Galax_urceolata/"
)
ENMeval will generate multiple models and test them per the specified test method.Two important variables here are the regularization multiplier (RM) value and the feature class (FC). FC will allow for different shapes in response curves (linear, hinge, quadratic, product, and threshold) can be used in the model and RM will influence how many parameters are included in the model.
eval1 <- ENMeval::ENMevaluate(occ = Galax_urceolata[, c("longitude", "latitude")],
env = gstack,
tune.args = list(fc = c("L","Q"), rm = 1:2),
partitions = "block",
n.bg = 10000,
parallel = FALSE,
algorithm = 'maxent.jar',
user.eval = proc)
##
|
| | 0%
|
|================== | 25%
|
|=================================== | 50%
|
|==================================================== | 75%
|
|======================================================================| 100%
Inspect the dismo model based on the html
browseURL(evaldis@html)
Inspect the many models
maps <- eval1@predictions
plot(maps)
mod_overlap <- calc.niche.overlap(eval1@predictions, overlapStat = "D")
##
|
| | 0%
|
|======================= | 33%
|
|=============================================== | 67%
|
|======================================================================| 100%
kable(mod_overlap) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", font_size = 10)) %>%
scroll_box(width = "100%", height = "200px")
fc.L_rm.1 | fc.Q_rm.1 | fc.L_rm.2 | fc.Q_rm.2 | |
---|---|---|---|---|
fc.L_rm.1 | NA | NA | NA | NA |
fc.Q_rm.1 | 0.8951659 | NA | NA | NA |
fc.L_rm.2 | 0.9864079 | 0.8870755 | NA | NA |
fc.Q_rm.2 | 0.9062003 | 0.9801491 | 0.8992666 | NA |
Identify the best model selecting models with the lowest average test omission rate and the highest average validation AUC
results <- eval.results(eval1)
opt.seq <- results %>%
dplyr::filter(or.10p.avg == min(or.10p.avg)) %>%
dplyr::filter(auc.val.avg == max(auc.val.avg))
kable(opt.seq) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", font_size = 10)) %>%
scroll_box(width = "100%", height = "200px")
fc | rm | tune.args | auc.train | cbi.train | auc.diff.avg | auc.diff.sd | auc.val.avg | auc.val.sd | cbi.val.avg | cbi.val.sd | or.10p.avg | or.10p.sd | or.mtp.avg | or.mtp.sd | proc_auc_ratio.avg | proc_auc_ratio.sd | proc_pval.avg | proc_pval.sd | AICc | delta.AICc | w.AIC | ncoef |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
L | 2 | fc.L_rm.2 | 0.8777198 | 0.995 | 0.0647308 | 0.0290278 | 0.8587255 | 0.0598812 | 0.9325 | 0.0859477 | 0.1319444 | 0.0749609 | 0.0023148 | 0.0046296 | 1.925411 | 0.0146166 | 0 | 0 | 42603.12 | 36.77516 | 0 | 8 |
mod.seq <- eval.models(eval1)[[opt.seq$tune.args]]
kable(mod.seq@results) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", font_size = 10)) %>%
scroll_box(width = "100%", height = "200px")
X.Training.samples | 1728.0000 |
Regularized.training.gain | 0.6747 |
Unregularized.training.gain | 0.7039 |
Iterations | 140.0000 |
Training.AUC | 0.8227 |
X.Background.points | 11704.0000 |
bio_14.contribution | 5.9779 |
bio_15.contribution | 7.4804 |
bio_18.contribution | 0.0997 |
bio_3.contribution | 11.5439 |
bio_7.contribution | 2.7500 |
bio_8.contribution | 5.0561 |
bio_9.contribution | 1.3086 |
elev.contribution | 65.7834 |
bio_14.permutation.importance | 0.0000 |
bio_15.permutation.importance | 27.4510 |
bio_18.permutation.importance | 0.0000 |
bio_3.permutation.importance | 22.7601 |
bio_7.permutation.importance | 17.0200 |
bio_8.permutation.importance | 0.2045 |
bio_9.permutation.importance | 1.6658 |
elev.permutation.importance | 30.8986 |
Entropy | 8.6925 |
Prevalence..average.probability.of.presence.over.background.sites. | 0.2961 |
Fixed.cumulative.value.1.cumulative.threshold | 1.0000 |
Fixed.cumulative.value.1.Cloglog.threshold | 0.0751 |
Fixed.cumulative.value.1.area | 0.9052 |
Fixed.cumulative.value.1.training.omission | 0.0185 |
Fixed.cumulative.value.5.cumulative.threshold | 5.0000 |
Fixed.cumulative.value.5.Cloglog.threshold | 0.1327 |
Fixed.cumulative.value.5.area | 0.7183 |
Fixed.cumulative.value.5.training.omission | 0.0284 |
Fixed.cumulative.value.10.cumulative.threshold | 10.0000 |
Fixed.cumulative.value.10.Cloglog.threshold | 0.1883 |
Fixed.cumulative.value.10.area | 0.5726 |
Fixed.cumulative.value.10.training.omission | 0.0723 |
Minimum.training.presence.cumulative.threshold | 0.0161 |
Minimum.training.presence.Cloglog.threshold | 0.0252 |
Minimum.training.presence.area | 0.9964 |
Minimum.training.presence.training.omission | 0.0000 |
X10.percentile.training.presence.cumulative.threshold | 13.9581 |
X10.percentile.training.presence.Cloglog.threshold | 0.2228 |
X10.percentile.training.presence.area | 0.4851 |
X10.percentile.training.presence.training.omission | 0.0995 |
Equal.training.sensitivity.and.specificity.cumulative.threshold | 31.5097 |
Equal.training.sensitivity.and.specificity.Cloglog.threshold | 0.3892 |
Equal.training.sensitivity.and.specificity.area | 0.2257 |
Equal.training.sensitivity.and.specificity.training.omission | 0.2257 |
Maximum.training.sensitivity.plus.specificity.cumulative.threshold | 30.0006 |
Maximum.training.sensitivity.plus.specificity.Cloglog.threshold | 0.3679 |
Maximum.training.sensitivity.plus.specificity.area | 0.2419 |
Maximum.training.sensitivity.plus.specificity.training.omission | 0.2049 |
Balance.training.omission..predicted.area.and.threshold.value.cumulative.threshold | 4.6516 |
Balance.training.omission..predicted.area.and.threshold.value.Cloglog.threshold | 0.1278 |
Balance.training.omission..predicted.area.and.threshold.value.area | 0.7310 |
Balance.training.omission..predicted.area.and.threshold.value.training.omission | 0.0249 |
Equate.entropy.of.thresholded.and.original.distributions.cumulative.threshold | 12.8080 |
Equate.entropy.of.thresholded.and.original.distributions.Cloglog.threshold | 0.2125 |
Equate.entropy.of.thresholded.and.original.distributions.area | 0.5090 |
Equate.entropy.of.thresholded.and.original.distributions.training.omission | 0.0932 |
plot(mod.seq)
dismo::response(mod.seq)
p <- predict(mod.seq, allstack)
# Make into a data frame
p_df <- as.data.frame(p, xy = TRUE)
# Plot
ggplot() +
geom_raster(data = p_df, aes(x = x, y = y, fill = layer)) +
geom_point(data= Galax_urceolata,
mapping = aes(x = longitude, y = latitude),
col='red', cex=0.05) +
coord_quickmap() +
theme_bw() +
scale_fill_gradientn(colours = viridis::viridis(99),
na.value = "black")
save(mod.seq, file = "data/Ecological_Niche_Modeling/enm_output/ENMeval/GalaxENM.rda")
writeRaster(x = p, filename = "data/Ecological_Niche_Modeling/enm_output/ENMeval/GalaxENM.asc",
format = "ascii", NAFlag = "-9999", overwrite = T)
Processing Ecological Niche Models in R.
Script by Anthony Melton and ML Gaynor.
library(raster)
library(gtools)
library(dplyr)
library(ENMTools)
library(ENMeval)
library(ape)
library(RStoolbox)
library(hypervolume)
library(phytools)
library(terra)
The following command generates a binary predicted occurrence map. This was written by Anthony Melton.
source("functions/Functions_AEM.R")
sp1_enm.mx.b <- raster("data/Ecological_Niche_Modeling/enm_output/Galax_urceolata_avg.asc")
sp2_enm.mx.b <- raster("data/Ecological_Niche_Modeling/enm_output/Pyxidanthera_barbulata_avg.asc")
sp3_enm.mx.b <- raster("data/Ecological_Niche_Modeling/enm_output/Pyxidanthera_brevifolia_avg.asc")
sp4_enm.mx.b <- raster("data/Ecological_Niche_Modeling/enm_output/Shortia_galacifolia_avg.asc")
These steps are described in the previous script
system("rm -rf data/climate_processing/PresentLayers/all/maxent.cache/")
list <- list.files("data/climate_processing/PresentLayers/all/",
full.names = TRUE, recursive = FALSE)
list <- mixedsort(sort(list))
allstack <- stack(list)
projection(allstack) <- "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"
Niche breadth is the breadth of environmental factors for a species’ niche, ranging from 0 to 1. When breadth is closer to 1 the more generalist species with wider tolerances. Values closer to 0 indicate a more specialized species. The raster.breadth command in ENMTools measures the smoothness of suitability scores across a projected landscape. The higher the score, the more of the available niche space a species occupies.
## MaxEnt GUI
sp1_breadth <- ENMTools::raster.breadth(terra::rast(sp1_enm.mx.b))
sp1_breadth$B2
## [1] 0.6369572
sp2_breadth <- ENMTools::raster.breadth(terra::rast(sp2_enm.mx.b))
sp2_breadth$B2
## [1] 0.1665568
sp3_breadth <- ENMTools::raster.breadth(terra::rast(sp3_enm.mx.b))
sp3_breadth$B2
## [1] 0.04380163
sp4_breadth <- ENMTools::raster.breadth(terra::rast(sp4_enm.mx.b))
sp4_breadth$B2
## [1] 0.4567
Calculating niche overlap, Schoener’s D, with ENMEval - Schoener’s D ranges from 0 to 1, where 0 represents no similarity between the projections and 1 represents completely identical projections.
enm_stack.b <- stack(sp1_enm.mx.b, sp2_enm.mx.b, sp3_enm.mx.b, sp4_enm.mx.b)
names(enm_stack.b) <- c("Galax urceolata", "Pyxidanthera barbulata", "Pyxidanthera brevifolia","Shortia galacifolia" )
calc.niche.overlap(enm_stack.b, overlapStat = "D")
##
|
| | 0%
|
|======================= | 33%
|
|=============================================== | 67%
|
|======================================================================| 100%
## Galax.urceolata Pyxidanthera.barbulata
## Galax.urceolata NA NA
## Pyxidanthera.barbulata 0.2108513 NA
## Pyxidanthera.brevifolia 0.1287981 0.1614858
## Shortia.galacifolia 0.6447093 0.1228339
## Pyxidanthera.brevifolia Shortia.galacifolia
## Galax.urceolata NA NA
## Pyxidanthera.barbulata NA NA
## Pyxidanthera.brevifolia NA NA
## Shortia.galacifolia 0.03923816 NA
Let’s look at niche overlap over a tree!
tree <- ape::read.tree(file =
"data/Ecological_Niche_Modeling/AOC_Test_Demo/diapensiaceae_subset.tre")
tree <- ape::compute.brlen(phy = tree, method = "grafen")
plot(tree)
tree <- drop.tip(tree, "Cyrilla_racemiflora")
plot(tree)
alldf <- read.csv("data/cleaning_demo/maxent_ready/diapensiaceae_maxentready_20230605.csv")
Galax_urceolata <- dplyr::filter(alldf, species == "Galax urceolata")
Pyxidanthera_barbulata <- dplyr::filter(alldf, species == "Pyxidanthera barbulata")
Pyxidanthera_brevifolia <- dplyr::filter(alldf, species == "Pyxidanthera brevifolia")
Shortia_galacifolia <- dplyr::filter(alldf, species == "Shortia galacifolia")
## Generate species objects for each tree member!
sp1 <- enmtools.species(species.name = "Galax_urceolata",
presence.points = vect(Galax_urceolata[,2:3],
geom = c("longitude", "latitude")))
crs(sp1$presence.points) <- crs("+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs")
sp1$range <- background.raster.buffer(sp1$presence.points,
25000,
mask = terra::rast(allstack))
sp1$background.points = background.points.buffer(points = sp1$presence.points,
radius = 5000,
n = 10000,
mask = terra::rast(allstack[[1]]))
##############################
sp2 <- enmtools.species(species.name = "Pyxidanthera_barbulata",
presence.points = vect(Pyxidanthera_barbulata[,2:3],
geom = c("longitude", "latitude")))
crs(sp2$presence.points) <- crs("+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs")
sp2$range <- background.raster.buffer(sp2$presence.points,
25000,
mask = terra::rast(allstack))
sp2$background.points = background.points.buffer(points = sp2$presence.points,
radius = 5000,
n = 10000,
mask = terra::rast(allstack[[1]]))
#############################
sp3 <- enmtools.species(species.name = "Pyxidanthera_brevifolia",
presence.points = vect(Pyxidanthera_brevifolia[,2:3],
geom = c("longitude", "latitude")))
crs(sp3$presence.points) <- crs("+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs")
sp3$range <- background.raster.buffer(sp3$presence.points,
25000,
mask = terra::rast(allstack))
sp3$background.points = background.points.buffer(points = sp3$presence.points,
radius = 5000,
n = 10000,
mask = terra::rast(allstack[[1]]))
#############################
sp4 <- enmtools.species(species.name = "Shortia_galacifolia",
presence.points = vect(Shortia_galacifolia[,2:3],
geom = c("longitude", "latitude")))
crs(sp4$presence.points) <- crs("+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs")
sp4$range <- background.raster.buffer(sp4$presence.points,
25000,
mask = terra::rast(allstack))
sp4$background.points = background.points.buffer(points = sp4$presence.points,
radius = 5000,
n = 10000,
mask = terra::rast(allstack[[1]]))
clade=enmtools.clade(species = list(sp1, sp2, sp3, sp4), tree = tree)
check.clade(clade)
range.aoc <- enmtools.aoc(clade = clade, nreps = 10,
overlap.source = "range")
summary(range.aoc)
##
##
## Age-Overlap Correlation test
##
## 10 replicates
##
## p values:
## Intercept Age
## 0.2272727 0.2272727
## NULL
glm.aoc <- enmtools.aoc(clade = clade,
env = terra::rast(allstack), nreps = 5,
overlap.source = "glm")
summary(glm.aoc)
##
##
## Age-Overlap Correlation test
##
## 5 replicates
##
## p values:
## Intercept Age
## 0.5 0.5
## NULL
The results here are pretty meaningless since we’re looking at very few species, but it serves the purpose of the demo. For range AOCs, an intercept >0.5 and negative slope are indicative of sympatric species, while an intercept of <0.5 and a positive slope are indicative on non-sympatric speciation. A low intercept and positive slope for niche overlap would indicate niche divergence. ***
This will generate a binary map with any cell that contains a suitability score greater than or equal to the lowest score with an occurrence point as a presence. There are other ways to set the threshold, including percentiles or training statistics.
sp1.dist <- make.binary.map(model = sp1_enm.mx.b, occ.dat = Galax_urceolata)
sp2.dist <- make.binary.map(model = sp2_enm.mx.b, occ.dat = Pyxidanthera_barbulata)
sp3.dist <- make.binary.map(model = sp3_enm.mx.b, occ.dat = Pyxidanthera_brevifolia)
sp4.dist <- make.binary.map(model = sp4_enm.mx.b, occ.dat = Shortia_galacifolia)
par(mfrow = c(2,2), mar = c(1,1,1,1))
plot(sp1.dist)
plot(sp2.dist)
plot(sp3.dist)
plot(sp4.dist)
Next, let’s work on getting some data from the predicted
distributions! Niche space can be thought of as a multi-dimensional
hypervolume. We’re using climatic data in this case, so we’re measuring
the hypervolume of climatic niche space occupied by these species.
Warning: This takes forever!
sp1.hv <- get_hypervolume(binary_projection = sp1.dist, envt = allstack)
sp2.hv <- get_hypervolume(binary_projection = sp2.dist, envt = allstack)
sp3.hv <- get_hypervolume(binary_projection = sp3.dist, envt = allstack)
sp4.hv <- get_hypervolume(binary_projection = sp4.dist, envt = allstack)
hv_set <- hypervolume_set(hv1 = sp1.hv, hv2 = sp2.hv, check.memory = F)
hypervolume_overlap_statistics(hv_set)
plot(hv_set)
get_volume(hv_set)