Botany 2023

00-Setup

Install

This workshop was made using R 4.2.1, we suggest you update your R.

Prior to starting this workshop, you should run the following lines of code to install and update the packages needed for these demos.

## Make a list of packages
list_of_packages <- c("ade4","ape", "biomod2",
                      "caret", "CoordinateCleaner", 
                      "devtools","dismo", 
                      "dplyr", "ecospat", 
                      "ENMeval",  "ENMTools", 
                      "fields",  "gatoRs",  
                      "ggplot2", "ggspatial",
                      "gridExtra",  "gtools", 
                      "hypervolume",  "kuenm", 
                      "lattice", "leaflet",
                      "magrittr",   "maps",   
                      "multcompView", "parsedate", 
                      "phytools", "plyr", 
                      "rangeBuilder", "raster",
                      "Rcpp",  "ridigbio",
                      "rJava","rgbif",
                      "sf",   "sp",
                      "spThin","spam",
                      "spatstat.geom","stringr",
                      "terra", "tidyr", 
                      "usdm",  "utils", 
                      "viridis", "viridisLite")

## Here we identify which packages are not installed and install these for you
new.packages <- list_of_packages[!(list_of_packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)

## Check version of packages
### Compare to package version used for demo creation
### This data frame has two columns: (1) package (2) version, indicating which
### version of the package was used to develop this workshop
versiondf <- read.csv("data/setup/setup.csv", stringsAsFactors = FALSE)
### Save your version under a new column named "current_version"
versiondf$current_version <- as.character(do.call(c, lapply(list_of_packages, packageVersion)))
### Compare the version the workshop was developed with to your current version
updatelist <- versiondf[which(versiondf$version != versiondf$current_version), ]
### Update packages with old versions
lapply(as.character(updatelist$packages), install.packages, ask = FALSE)

## Make sure all packages load
## Packages that are FALSE did not load
### If anything prints here, then something went wrong and a package did not install
loaded <- lapply(list_of_packages, require, character.only = TRUE)
list_of_packages[loaded == FALSE]
# Install packages not in CRAN
library(devtools)
install_github('johnbaums/rmaxent')
install_github("marlonecobos/kuenm")
install_github("nataliepatten/gatoRs")


## Check and make sure all github packages load
github_packages <- c("rmaxent", "kuenm", "gatoRs")
github_loaded <- lapply(github_packages, require, character.only = TRUE)
github_packages[github_loaded == FALSE]

Package information

tidyverse is a collection of R packages that are used for “everyday data analysis” including ggplot2, dplyr, tidyr, readr, purr, tibble, stringr, forcats. We use dplyr and ggplot2 throughout many of the R scripts.

ridigbio is an R package that queries the iDigBio API.

spocc queries data from GBIF, USGS’ Biodiversity Information Serving Our Nation (BISON), iNaturalist, Berkeley Ecoinformatics Engine, eBird, iDigBio, VertNet, Ocean Biogeographic Information System (OBIS), and Atlas of Living Australia (ALA).

RCurl and rjson are used to query an application programming interface (API), or an online database.

Geospatial packages include sp, raster, maptools, maps, rgdal, RStoolbox, and mapproj.

ggplot2 is used to plot and visualize data. ggbiplot is an add on to ggplot2.

ENMTools, ENMeval, and dismo are specific to ecological niche modeling.

hypervolume and caret have functions related to statistics.

phytools and picante are related to phylogenetics.

factoextra and FactoMiner are used for statistics related to a principal components analysis.



Troubleshooting

  • For spatstat make sure xcode is installed on OS.
system("xcode-select --install")
  • spThin and fields do not compile from source!

  • If you are still having issues installing ggbiplot, you may have to uninstall the package ‘backports’ and reset R.



This exercise requires the package ENMTools. This package requires Java (and the Oracle Java JDK) and that the maxent.jar file is in the folder that contains the dismo package (this package is a dependency of ENMTools). To find out where to put the jar file, run the command:

system.file("java", package="dismo")

Once you have your java and maxent file set, you are ready to go! If you have more issues, find some help here

To download the jar file to the right directory —

devtools::install_github("johnbaums/rmaxent")
rmaxent::get_maxent(version = "latest", quiet = FALSE)

The previous step may mask rgdal! Make sure to reload it!

For rJava, the newest package requires R >= 3.6.0
dismo only requires rJava >= 0.9-7, download 0.9.7 here

If you are experiencing errors with rJava:

## Remove rJava
remove.packages(rJava)
## Update your R Java configuration 
system("sudo R CMD javareconf")
## Restart R and check to see if the path matches
Sys.getenv("DYLD_FALLBACK_LIBRARY_PATH")
## Reinstall rJava
install.packages("rJava")
library(rJava)

If you are still having issues with rJava - check to see if you are using Java 12 (this should be included in the error message):

system("java -version")

The initial release of java 12 does not work - install an old version instead. To install an old version, navigate to the Oracle website or use homebrew.

system("brew cask install homebrew/cask-versions/java11")

Restart R and redo the previous steps. Required: uninstall the newer versions of java prior to repeating and restart R after.

## Uninstall Java 12, then repeat the following
## Remove rJava
remove.packages(rJava)
## Update your R Java configuration 
system("sudo R CMD javareconf")
## Restart R and check to see if the path matches
Sys.getenv("DYLD_FALLBACK_LIBRARY_PATH")
## Reinstall rJava
install.packages("rJava")
library(rJava)

More debugging: Check out jerrybreak’s comment from 12/30/2021.

If you have additional issues, please let us know!

01-Download Occurrence Data

ML Gaynor.

Load packages

library(ridigbio) 
library(gatoRs)
library(leaflet)

Data download from iDigBio

Search for the species Galax urceolata.

iDigBio_GU <- idig_search_records(rq=list(scientificname="Galax urceolata"))

Search for the family Diapensiaceae.

iDigBio_GU_family <- idig_search_records(rq=list(family="Diapensiaceae"), limit=1000)

What if you want to read in all the points for a family within an extent?

Hint: Use the iDigBio portal to determine the bounding box for your region of interest.

The bounding box delimits the geographic extent.

rq_input <- list("scientificname"=list("type"="exists"),
                 "family"="Diapensiaceae", 
                 geopoint=list(
                   type="geo_bounding_box",
                   top_left=list(lon = -98.16, lat = 48.92),
                   bottom_right=list(lon = -64.02, lat = 23.06)
                   )
                 )

Search using the input you just made

iDigBio_GU_family_USA <- idig_search_records(rq_input, limit=1000)

Save as csv files

write.csv(iDigBio_GU, "data/download/iDigBio_GU_20230605.csv", 
          row.names = FALSE)
write.csv(iDigBio_GU_family, "data/download/iDigBio_GU_family_20230605.csv", 
          row.names = FALSE)

Data download using gatoRs

Make synonym lists

Shortia_galacifolia <- c("Shortia galacifolia", "Sherwoodia galacifolia")
Galax_urceolata <- c("Galax urceolata", "Galax aphylla")
Pyxidanthera_barbulata <- c("Pyxidanthera barbulata","Pyxidanthera barbulata var. barbulata")
Pyxidanthera_brevifolia <- c("Pyxidanthera brevifolia", "Pyxidanthera barbulata var. brevifolia")

Use the gators_download function

This function downloads records for all names listed from iDigBio, GBIF, and BISON. It keeps specific columns from each database.

gators_download(synonyms.list = Shortia_galacifolia,
                write.file = TRUE, 
                filename = "data/download/raw/Shortia_galacifolia_raw_20230605.csv")
gators_download(synonyms.list = Galax_urceolata, 
              write.file = TRUE, 
              filename = "data/download/raw/Galax_urceolata_raw_20230605.csv")
gators_download(synonyms.list = Pyxidanthera_barbulata, 
              write.file = TRUE, 
              filename = "data/download/raw/Pyxidanthera_barbulata_raw_20230605.csv")
gators_download(synonyms.list = Pyxidanthera_brevifolia, 
              write.file = TRUE, 
              filename = "data/download/raw/Pyxidanthera_brevifolia_raw_20230605.csv")

Quick-look at the downloaded files

Read in downloaded data frame

rawdf <- read.csv("data/download/raw/Shortia_galacifolia_raw_20230605.csv")

Inspect the data frame

What columns are included?

names(rawdf)
##  [1] "scientificName"                "genus"                        
##  [3] "specificEpithet"               "infraspecificEpithet"         
##  [5] "ID"                            "occurrenceID"                 
##  [7] "basisOfRecord"                 "eventDate"                    
##  [9] "year"                          "month"                        
## [11] "day"                           "institutionCode"              
## [13] "recordedBy"                    "country"                      
## [15] "county"                        "stateProvince"                
## [17] "locality"                      "latitude"                     
## [19] "longitude"                     "coordinateUncertaintyInMeters"
## [21] "informationWithheld"           "habitat"                      
## [23] "aggregator"

Here is a table of the columns returned for each species in gators_download:

Column Description
scientificName taxonomy, http://rs.tdwg.org/dwc/terms/scientificName
genus taxonomy, https://dwc.tdwg.org/list/#dwc_genus
specificEpithet taxonomy, https://dwc.tdwg.org/list/#dwc_specificEpithet
infraspecificEpithet taxonomy, https://dwc.tdwg.org/list/#dwc_infraspecificEpithet
ID ID (contains unique IDs defined from GBIF or iDigBio)
occurrenceID collection event, https://dwc.tdwg.org/list/#dwc_occurrenceID
basisOfRecord collection event, https://dwc.tdwg.org/list/#dwc_basisOfRecord
eventDate collection event, https://dwc.tdwg.org/list/#dwc_eventDate
year collection event, https://dwc.tdwg.org/list/#dwc_year
month collection event, https://dwc.tdwg.org/list/#dwc_month
day collection event, https://dwc.tdwg.org/list/#dwc_day
institutionCode storage, https://dwc.tdwg.org/list/#dwc_institutionCode
recordedBy collection event, https://dwc.tdwg.org/list/#dwc_recordedBy
country locality, https://dwc.tdwg.org/list/#dwc_country
county locality, https://dwc.tdwg.org/list/#dwc_county
stateProvince locality, https://dwc.tdwg.org/list/#dwc_stateProvince
locality locality, https://dwc.tdwg.org/list/#dwc_locality
latitude locality, https://dwc.tdwg.org/list/#dwc_decimalLatitude
longitude locality, https://dwc.tdwg.org/list/#dwc_decimalLongitude
coordinateUncertaintyInMeters locality, https://dwc.tdwg.org/list/#dwc_coordinateUncertaintyInMeters
informationWithheld storage, https://dwc.tdwg.org/list/#dwc_informationWithheld
habitat locality, https://dwc.tdwg.org/list/#dwc_habitat
aggregator storage, indicates either GBIF or iDigBio

How many observations do we start with?

nrow(rawdf)
## [1] 1219

Where are these points?

The error message here indicates many points do not have long/lat values (more in 02).

leaflet(rawdf) %>% 
  addMarkers(label = paste0(rawdf$longitude, ", ", rawdf$latitude)) %>% 
  addTiles()

02-Occurrence Data Cleaning

ML Gaynor.

Load packages

library(gatoRs)
library(fields)
library(sf)
library(ggplot2)
library(ggspatial)
library(leaflet)

Read in downloaded data frame

rawdf <- read.csv("data/download/raw/Shortia_galacifolia_raw_20230605.csv")

Cleaning

Inspect the data frame

How many observations do we start with?

nrow(rawdf)
## [1] 1219

1. Resolve taxon names

Here we are going to harmonize taxonomy using taxa_clean(). This function has three filter options: exact, fuzzy, or interactive.

Inspect scientific names included in the raw df.

unique(rawdf$scientificName)
## [1] "Shortia galacifolia Torr. & A.Gray"              
## [2] "Sherwoodia galacifolia (Torr. & A.Gray) House"   
## [3] "Shortia galacifolia"                             
## [4] "Shortia galacifolia Torr. & A. Gray"             
## [5] "Shortia galacifolia Torrey & A. Gray"            
## [6] "Shortia galacifolia var. galacifolia"            
## [7] "Shortia galacifolia var. brevistyla"             
## [8] "Shortia galacifolia var. brevistyla P. A. Davies"
## [9] "Sherwoodia galacifolia"

Create a list of accepted names based on the name column in your data frame

search <-  c("Shortia galacifolia", "Sherwoodia galacifolia")

Filter to only include accepted name:

df <-  taxa_clean(df = rawdf, 
                  synonyms.list = search,
                  taxa.filter = "fuzzy",
                  accepted.name = "Shortia galacifolia")

How many observations do we have now?

nrow(df)
## [1] 1219

2. Clean localities

Here we remove any records with missing coordinates, impossible coordinates, coordinates at (0,0), and any that are flagged as skewed. We also round the provided latitude and longitude values to a specified number of decimal places.

df <- basic_locality_clean(df = df,  
                           remove.zero = TRUE, # Records at (0,0) are removed
                           precision = TRUE, # latitude and longitude are rounded 
                           digits = 2, # round to 2 decimal places
                           remove.skewed = TRUE)

How many observations do we have now?

nrow(df)
## [1] 57

Remove unlikely points

Remove coordinates in cultivated zones, botanical gardens, and outside our desired range. Here we set interactive = FALSE, however you can visualize the outliers by setting interactive = TRUE.

df <- process_flagged(df, interactive = FALSE)

How many observations do we have now?

nrow(df)
## [1] 52

3. Remove duplicates

Here we identify and remove both (1) specimen duplicates and (2) aggregator duplicates based on each specimens coordinates, occurrenceID, and eventDate. To leverage all date information available, set remove.unparseable = FALSE to manually populate the year, month, and day columns. Here, we also confirm all ID (UUID and key) are unique to remove any within-aggregator duplicates that may accumulate due to processing errors.

df <- remove_duplicates(df, remove.unparseable = TRUE)

How many observations do we have now?

nrow(df)
## [1] 41

Bonus question

How many more records do you retain when remove.unparseable = FALSE?

4. Spatial correction

Maxent will only retain one point per pixel. To make the ecological niche analysis comparable, we will retain only one point per pixel. Note: Default is a raster with 30 arc sec resolution.

One point per pixel

Maxent will only retain one point per pixel. To make the ecological niche analysis comparable, we will retain only one point per pixel. Note: Default is a raster with 30 arc sec resolution.

df <- one_point_per_pixel(df)

How many observations do we have now?

nrow(df)
## [1] 21

Spatial thinning

Reduce the effects of sampling bias using randomization approach.

Step 1: What should your minimum distance be?

First, lets assess the current minimum distance among your occurrence records. Here, we calculate minimum nearest neighbor distance in km.

nnDm <- rdist.earth(as.matrix(data.frame(lon = df$long, lat = df$lat)), miles = FALSE, R = NULL)
nnDmin <- do.call(rbind, lapply(1:5, function(i) sort(nnDm[,i])[2]))
min(nnDmin)
## [1] 2.226477

Here the current minimum distance is 2.22 km. Based on literature, we find a 2 meters (or 0.002 km) distance was enough to collect unique genets, so we do not need to thin our points.

Step 2: Thin occurrence records using spThin through gatoRs.

When you do need to thin your records, here is a great function to do so!

df <- thin_points(df, 
                    distance = 0.002, # in km 
                    reps = 100)

5. Plot cleaned records

Make points spatial

df_fixed <- st_as_sf(df, coords = c("longitude", "latitude"), crs = 4326)

Set basemap

USA <- borders(database = "usa", colour = "gray50", fill = "gray50")
state <- borders(database = "state", colour = "black", fill = NA)

Plot

Here we are using ggplot2 and ggspatial to plot our occurrence records. We are using two basemaps, USA and state. The geom_point function adds our points based on long and lat, and colors them blue. We set the coordinate visualization space with coord_sf. We fixed the x and y labels. Finally, we added a scale and north arrow.

simple_map <-ggplot() +
              USA +
              state +
              geom_sf(df_fixed, 
                       mapping = aes(col = name), 
                       col = "blue") +
              coord_sf(xlim = c(min(df$longitude) - 3,
                                max(df$longitude) + 3),
                       ylim = c(min(df$latitude) - 3,
                                max(df$latitude) + 3)) +
              xlab("Longitude") +
              ylab("Latitude") +
              annotation_scale(plot_unit = "km") +
              annotation_north_arrow(height = unit(1, "cm"), 
                                     width = unit(1, "cm"), 
                                     location = "tl")
simple_map

Extra

Another fun way to view these points is with leaflet.

leaflet(df_fixed) %>% 
  addMarkers(label = paste0(df$longitude, ", ", df$latitude)) %>% 
  addTiles()

6. Save cleaned.csv

write.csv(df, "data/cleaning_demo/Shortia_galacifolia_20230605-cleaned.csv", row.names = FALSE)

7. Repeat for all species

## Set up for loop
files <- list.files("data/download/raw", full.names = TRUE)[1:3]
synonymns <- list(Galax_urceolata = c("Galax urceolata", 
                                      "Galax urceolata (Poir.) Brummitt", 
                                      "Galax urceolata (Poiret) Brummitt",
                                      "Galax urceolaa",
                                      "Galax aphylla L.", 
                                      "Galax aphylla"),  
               Pyxidanthera_barbulata =c("Pyxidanthera barbulata", 
                                         "Pyxidanthera barbulata Michx.", 
                                         "Pyxidanthera barbulata var. barbulata" , 
                                         "Pyxidenthera barbulata"), 
               Pyxidanthera_brevifolia = c("Pyxidanthera brevifolia",
                                           "Pyxidanthera brevifolia Wells", 
                                           "Pyxidanthera barbulata var. brevifolia (Wells) H.E.Ahles", 
                                           "Pyxidanthera barbulata var. brevifolia"  ))

## Repeat cleaning steps for the remaining taxa
for(i in 1:3){
    df <- read.csv(files[i])
    # Taxa clean
    search <-  synonymns[[i]]
    df <-  taxa_clean(df = df, synonyms.list = search, 
                      taxa.filter = "exact", 
                      accepted.name = search[1])
    # Locality clean 
    df <- basic_locality_clean(df = df,  remove.zero = TRUE, 
                               precision = TRUE, digits = 2, 
                               remove.skewed = TRUE)
    df <- process_flagged(df, interactive = FALSE)
    # Remove duplicates
    df <- remove_duplicates(df, remove.unparseable = TRUE)
    # Spatial correct
    df <- one_point_per_pixel(df)
    df <- thin_points(df,  distance = 0.002, reps = 100)
    # Save file
    outfile <- paste0("data/cleaning_demo/",
                      gsub(" ", "_", search[1]), 
                      "_20230605-cleaned.csv")
    write.csv(df, outfile, 
              row.names = FALSE)
    rm(df, search, outfile)
}

8. Make maxent ready

Read in all cleaned files

alldf <- list.files("data/cleaning_demo/", full.names = TRUE, 
                    recursive = FALSE, include.dirs = FALSE, pattern = "*.csv")
alldf <- lapply(alldf, read.csv)
alldf <- do.call(rbind, alldf)

Plot all records

Visually inspect records to verify no additional records should be removed.

### Make points spatial 
alldf_fixed <- st_as_sf(alldf, coords = c("longitude", "latitude"), 
                        crs = 4326)

### Set basemap
USA <- borders(database = "usa", colour = "gray50", fill = "gray50")
state <- borders(database = "state", colour = "black", fill = NA)

### Plot 
all_map <- ggplot() +
            USA +
            state +
            geom_sf(alldf_fixed, 
                    mapping = aes(col = factor(accepted_name))) +
            coord_sf(xlim = c(min(alldf$longitude) - 3, max(alldf$longitude) + 3),
                     ylim = c(min(alldf$latitude) - 3, max(alldf$latitude) + 3)) +
            xlab("Longitude") +
            ylab("Latitude") +
            labs(color = "Scientific name") +
            annotation_scale(plot_unit = "km") +
            annotation_north_arrow(height = unit(1, "cm"), 
                                   width = unit(1, "cm"), 
                                   location = "tl")
all_map

Select needed columns

alldfchomp <- c()
species <- unique(alldf$accepted_name)
for(i in 1:4){
  alldfchomp[[i]] <- data_chomp(alldf[alldf$accepted_name ==species[i], ], 
                                accepted.name =  species[i])
}
alldf <- do.call(rbind, alldfchomp)

Save Maxent.csv

write.csv(alldf, "data/cleaning_demo/maxent_ready/diapensiaceae_maxentready_20230605.csv", row.names = FALSE)

03-Georeferencing

Create a batch file for GeoLocate. Created by ME Mabry.
Modified by ML Gaynor.

Load packages

library(dplyr) 
library(gatoRs)

Read in downloaded data frame

rawdf <- read.csv("data/download/raw/Shortia_galacifolia_raw_20230605.csv")

Identify records for geoferencing

rawdf_GeoRef <- need_to_georeference(rawdf)

Format for GeoLocate

Subset needed columns

rawdf_GeoRef <- rawdf_GeoRef %>%
                dplyr::select("locality string" = locality,
                              country = country,
                              state = stateProvince,
                              county = county,
                              latitude = latitude, 
                              longitude = longitude,
                              ID = ID, 
                              name = scientificName, 
                              basis = basisOfRecord)

Add required columns

rawdf_GeoRef$'correction status' <- ""
rawdf_GeoRef$precision <- ""
rawdf_GeoRef$'error polygon' <- ""
rawdf_GeoRef$'multiple results' <- ""

Reorder

rawdf_GeoRef2<- rawdf_GeoRef[,c("locality string", "country", "state", "county", "latitude",
                                "longitude", "correction status", "precision", "error polygon",
                                "multiple results", "ID", "name", "basis")]

Save file for georeferencing

write.csv(rawdf_GeoRef2,
          file = "data/georeferencing/Shortia_galacifolia_Needing_GeoRef_20230605.csv",
          row.names = FALSE)

04-Climate Processing

Climate layer processing.
Modified and created by ML Gaynor.
Based on code by Mike Belitz (mbelitz/Odo_SDM_Rproj).

Load packages

library(raster)
library(gtools)
library(sf)
library(rangeBuilder)
library(dplyr)
library(caret)
library(usdm)
library(dismo)
library(stringr)

Load functions

Based on code by Mike Belitz (mbelitz/Odo_SDM_Rproj).

source("functions/VIFLayerSelect.R")

Load bioclim layers

biolist <- list.files("data/climate_processing/bioclim/", pattern = "*.tif", full.names = TRUE)

Order list using gtools.

biolist <- mixedsort(sort(biolist))

Load rasters as a stack.

biostack <- raster::stack(biolist)

Load occurrence records

And fix the name column class and make sure it is a character.

alldf <- read.csv("data/cleaning_demo/maxent_ready/diapensiaceae_maxentready_20230605.csv")
alldf$species <- as.character(alldf$species)
alldfsp <- st_as_sf(alldf, coords = c("longitude", "latitude"), crs = 4326)

Present layers - all

Here we will make the projection layers, or the layers which include shared space. First, we have to define the accessible space.

Create alpha hull

hull <- rangeBuilder::getDynamicAlphaHull(x = alldf, 
                                          coordHeaders = c("longitude", "latitude"),
                                           fraction = 1, # min. fraction of records we want included
                                           partCount = 1, # number of polygons allowed
                                           initialAlpha = 20, # initial alpha size, 20m
                                           clipToCoast = "terrestrial", 
                                           verbose = TRUE)

Visualize

plot(hull[[1]], col=rangeBuilder::transparentColor('gray50', 0.5), border = NA)
points(x = alldf$longitude, y = alldf$latitude, cex = 0.5, pch = 3)

Add buffer to hull

Calculate buffer size

Here we take the 80th quantile of the max distance between points

buffDist <- quantile(x = (apply(sf::st_distance(alldfspTrans), 2, FUN = function(x) sort(x)[2])), 
                     probs = 0.80, na.rm = TRUE) 
buffDist
##      80% 
## 3786.239
Buffer the hull
buffer_m <- st_buffer(x = hullTrans, dist = buffDist, dissolve = TRUE)
buffer <- st_transform(buffer_m,  CRS("+proj=longlat +datum=WGS84"))
Visualize
plot(buffer, col=rangeBuilder::transparentColor('gray50', 0.5), border = NA)
points(x = alldf$longitude, y = alldf$latitude, cex = 0.5, pch = 3)

Mask and crop bioclim layers

path <- "data/climate_processing/all/"
end <- ".asc"
for(i in 1:length(biolist)){
    # Subset raster layer
    rast <- biostack[[i]]
    # Setup file names
    name <- names(rast)
    out <- paste0(path, name)
    outfile <- paste0(out, end)
    # Crop and mask
    c <- crop(rast, sf::as_Spatial(buffer))
    c <- mask(c, sf::as_Spatial(buffer))
    # Write raster
    writeRaster(c, outfile, format = "ascii", overwrite = TRUE)
}

Select layers for MaxEnt

Warning: Many of these steps are time intensive! Do not run during the workshop, we have already done these for you.

We only want to include layers that are not highly correlated. To assess which layers we will include, we can look at the pearson correlation coefficient among layers.

Stack all layers

clippedlist <- list.files("data/climate_processing/all/", pattern = "*.asc", full.names = TRUE)
clippedlist <- mixedsort(sort(clippedlist))
clippedstack <- raster::stack(clippedlist)

Then calculate the correlation coefficient

Warning: Time Intensive!

corr <- layerStats(clippedstack, 'pearson', na.rm=TRUE)

Isolate only the pearson correlation coefficient and take absolute value.

c <- abs(corr$`pearson correlation coefficient`)

Write file and view in excel

write.csv(c, "data/climate_processing/correlationBioclim.csv", row.names = FALSE)

Highly correlated layers (> |0.80|) can impact the statistical significance of the niche models and therefore must be removed.

Randomly select variables to remove

envtCor <- mixedsort(sort(findCorrelation(c, cutoff = 0.80, names = TRUE, exact = TRUE)))
envtCor
##  [1] "bio_1"  "bio_3"  "bio_4"  "bio_6"  "bio_9"  "bio_10" "bio_11" "bio_12"
##  [9] "bio_13" "bio_16" "bio_17" "bio_19"

Variable inflation factor (VIF)

Warning: Time Intensive!
VIF can detect for multicollinearity in a set of multiple regression variables. Run a simple maxent model for every species and calculate the average permutation contribution.

Run preliminary MaxEnt models

Loop through each species and save permutation importance in list.
Warning: This will print warnings even when it works fine.

set.seed(195)
m <- c()
for(i in  1:length(unique(alldf$species))){
    name <- unique(alldf$species)[i]
    spp_df <-  alldf %>%
               dplyr::filter(species == name) %>%
               dplyr::select(longitude, latitude)
    model <- dismo::maxent(x = clippedstack, p = spp_df, 
                           progress = "text", silent = FALSE) 
    m[[i]] <- vimportance(model)
}

Bind the dataframes

mc <- do.call(rbind, m)

Calculate the mean and rename columns

mc_average <- aggregate(mc[, 2], list(mc$Variables), mean)
mc_average <- mc_average %>%
              dplyr::select(Variables = Group.1, permutation.importance = x)
mc1 <- mc_average

Select Layers

Use VIF and the MaxEnt permutation importance to select the best variables for your model. Note, this leads to different layers when the models are rerun without setting seed due to permutations being random.

selectedlayers <- VIF_layerselect(clippedstack, mc_average)
mixedsort(sort(names(selectedlayers)))
Correct set for workshop purposes

Since this can vary per system (despite setting seed), we added this line to keep our files consistent for the workshop

sl <- c("bio_3", "bio_7", "bio_8", "bio_9", "bio_14", "bio_15", "bio_18", "elev")
selectedlayers <- raster::subset(clippedstack, sl)

Copy selected layers

Move selected layers to “Present_Layers/all/” for use in MaxEnt later.

for(i in 1:length(names(selectedlayers))){
    name <- names(selectedlayers)[i]
    from <- paste0("data/climate_processing/all/", name, ".asc")
    to <- paste0("data/climate_processing/PresentLayers/all/", name, ".asc")
    file.copy(from, to,
              overwrite = TRUE, recursive = FALSE, 
              copy.mode = TRUE)
}

Create Species Training Layers

for(i in 1:length(unique(alldf$species))){
    sname <- unique(alldf$species)[i]
    # Subset species from data frame
    spp_df <-  alldf %>%
               dplyr::filter(species == sname)
    spp_dfsp <- st_as_sf(spp_df, coords = c("longitude", "latitude"), crs = 4326)
    ## Create alpha hull
    sphull <- rangeBuilder::getDynamicAlphaHull(x = spp_df, 
                                                coordHeaders = c("longitude", "latitude"),
                                                fraction = 1, # min. fraction of records we want included
                                                partCount = 1, # number of polygons allowed
                                                initialAlpha = 20, # initial alpha size, 20m
                                                clipToCoast = "terrestrial", 
                                                verbose = TRUE)
    ## Add buffer to hull
    ### Transform into CRS related to meters with Equal Area Cylindrical projection
    sphullTrans <- st_transform(sphull[[1]], CRS("+proj=cea +lat_ts=0 +lon_0")) 
    spp_dfTrans <- st_transform(spp_dfsp, CRS("+proj=cea +lat_ts=0 +lon_0"))
    
    ### Calculate buffer size
    #### Here we take the 80th quantile of the max distance between points
    spbuffDist <- quantile(x = (apply(sf::st_distance(spp_dfTrans), 2, FUN = function(x) sort(x)[2])), 
                         probs = 0.80, na.rm = TRUE) 
    
    ### Buffer the hull
    spbuffer_m <- st_buffer(x = sphullTrans, dist = spbuffDist, dissolve = TRUE)
    spbuffer <- st_transform(spbuffer_m,  CRS("+proj=longlat +datum=WGS84"))
    
    ### Crop and Mask
    spec <- gsub(" ", "_", sname)
    path <- paste0("data/climate_processing/PresentLayers/", spec,"/")
    end <- ".asc"
    for(j in 1:length(names(selectedlayers))){
        # Subset raster layer
        rast <- selectedlayers[[j]]
        # Setup file names
        name <- names(rast)
        out <- paste0(path, name)
        outfile <- paste0(out, end)
        # Crop and mask
        c <- crop(rast, sf::as_Spatial(spbuffer))
        c <- mask(c, sf::as_Spatial(spbuffer))
        # Write raster
        writeRaster(c, outfile, format = "ascii", overwrite = TRUE)
    }
}

05-Point Based

Ecological analysis using points. This is based off scripts created by Hannah Owens and James Watling, and Anthony Melton.
Modified and created by ML Gaynor.

Load packages

library(gtools)
library(raster)
library(plyr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(multcompView)
library(gridExtra)
library(ecospat)
library(dismo)
library(ade4)

Load functions

This function is from vqv/ggbiplot.

source("functions/ggbiplot_copy.R")

Load data file

alldf <- read.csv("data/cleaning_demo/maxent_ready/diapensiaceae_maxentready_20230605.csv")

Raster layers

list <- list.files("data/climate_processing/PresentLayers/all", full.names = TRUE, recursive = FALSE) 
list <- mixedsort(sort(list))
envtStack <- stack(list)

ENM - Realized Niche

Extract value for each point

For each occurence record, extract the value for each bioclim variables using the package raster.

ptExtracted <- raster::extract(envtStack, alldf[2:3])

Convert to data frame

ptExtracteddf <- as.data.frame(ptExtracted)

Add species name

ptExtracteddf <- ptExtracteddf %>%
                 dplyr::mutate(name = as.character(alldf$species), 
                               x = alldf$longitude, 
                               y = alldf$latitude)

Drop any NA

ptExtracteddf <- ptExtracteddf %>% 
                 tidyr::drop_na()

PCA

Create two dataframes.

data.bioclim <- ptExtracteddf[, 1:8]
data.species <- ptExtracteddf[, 9]

Using only the bioclim columns to run the principal components analysis.

data.pca <- prcomp(data.bioclim, scale. = TRUE) 

Understanding the PCA - Optional

When you use the command prcomp your loading variables show up as rotational variables. Thanks to a really great answer on stack overflow you can even convert the rotational variable to show the relative contribution.

loadings <- data.pca$rotation
summary(loadings)
##       PC1                PC2                PC3                PC4          
##  Min.   :-0.41017   Min.   :-0.35451   Min.   :-0.51097   Min.   :-0.38192  
##  1st Qu.:-0.22213   1st Qu.:-0.05133   1st Qu.:-0.26038   1st Qu.:-0.22393  
##  Median : 0.24618   Median : 0.11569   Median : 0.02413   Median :-0.05375  
##  Mean   : 0.09623   Mean   : 0.16167   Mean   : 0.02075   Mean   : 0.03410  
##  3rd Qu.: 0.38700   3rd Qu.: 0.45486   3rd Qu.: 0.34826   3rd Qu.: 0.26963  
##  Max.   : 0.45572   Max.   : 0.58432   Max.   : 0.53397   Max.   : 0.62319  
##       PC5               PC6                PC7               PC8           
##  Min.   :-0.6295   Min.   :-0.24822   Min.   :-0.2551   Min.   :-0.711115  
##  1st Qu.:-0.4145   1st Qu.:-0.17802   1st Qu.:-0.1173   1st Qu.:-0.283120  
##  Median :-0.2340   Median :-0.11462   Median : 0.1206   Median :-0.190132  
##  Mean   :-0.1698   Mean   : 0.08696   Mean   : 0.1623   Mean   :-0.158904  
##  3rd Qu.: 0.1597   3rd Qu.: 0.43387   3rd Qu.: 0.4533   3rd Qu.: 0.008766  
##  Max.   : 0.2532   Max.   : 0.61751   Max.   : 0.6537   Max.   : 0.428816

There are two options to convert the loading to show the relative contribution, they both give the same answer so either can be used.

Option 1
loadings_relative_A <- t(t(abs(loadings))/rowSums(t(abs(loadings))))*100
loadings_relative_A
##              PC1       PC2       PC3         PC4       PC5       PC6       PC7
## bio_3   8.591746 18.290740 15.598618 26.50100048 15.693465  2.892773 10.826188
## bio_7  15.106599  3.605906 20.544960  4.56019510  8.196901  6.586901 27.743419
## bio_8  14.713687  1.268164 13.920838  0.01157387 24.676885 22.390720  4.151105
## bio_9   9.541759 24.350510  8.759713 16.03534080  7.639386 25.535124  3.884780
## bio_14 16.784509  5.136389  1.759000 16.24120193 10.151969  6.787089  7.464726
## bio_15  6.003487 25.442930 21.469601  8.28251442  9.927169  9.084507  6.348032
## bio_18 15.380721  6.468852 14.248091  7.35172686 17.912054 10.264436 18.683458
## elev   13.877493 15.436508  3.699179 21.01644654  5.802170 16.458450 20.898292
##               PC8
## bio_3   0.2683966
## bio_7  10.5755001
## bio_8   6.4386942
## bio_9   2.3740267
## bio_14 31.8174754
## bio_15 18.6736844
## bio_18 19.1865553
## elev   10.6656673
Option 2
loadings_relative_B <- sweep(x = abs(loadings), 
                             MARGIN = 2, 
                             STATS = colSums(abs(loadings)), FUN = "/")*100
loadings_relative_B

Plotting the PCA

Set theme

First, I made a theme to change the background of the plot. Next, I changed the plot margins and the text size.

theme <- theme(panel.background = element_blank(),
               panel.border=element_rect(fill=NA),
               panel.grid.major = element_blank(),
               panel.grid.minor = element_blank(),
               strip.background=element_blank(),
               axis.ticks=element_line(colour="black"),
               plot.margin=unit(c(1,1,1,1),"line"), 
               axis.text = element_text(size = 12), 
               legend.text = element_text(size = 12), 
               legend.title = element_text(size = 12),
               text = element_text(size = 12))
Set color palette
pal <- c("#D43F3AFF", "#EEA236FF", "#5CB85CFF", "#46B8DAFF")
ggbiplot

Next, use ggbiplot where obs.scale indicates the scale factor to apply to observation, var.scale indicates the scale factor to apply to variables, ellipse as TRUE draws a normal data ellipse for each group, and circle as TRUE draws a correlation circle.

g <- ggbiplot(data.pca, obs.scale = 1, var.scale = 1, 
              groups = data.species, ellipse = TRUE, circle = TRUE) +
     scale_color_manual(name = '', values = pal) +
     theme(legend.direction = 'vertical', legend.position = 'right', 
           legend.text = element_text(size = 12, face = "italic")) +
     theme
 
g


ANOVA

Simple function to run an ANOVA and a post-hoc Tukey-HSD test

stat.test <- function(dataframe,  x = "name", y){
    bioaov <- aov(as.formula(paste0(y,"~",x)), data = dataframe) 
    TH <- TukeyHSD(bioaov, "name")
    m <- multcompLetters(TH$name[,4])
    y.val <-  as.numeric(max(dataframe[[y]]) + 1)
    groups <- data.frame(groups = m$Letters, name = names(m$Letters), y.val = rep(y.val, 4))
    return(groups)
}

Run for bio_3 only

bio3aovplot <- ggplot(ptExtracteddf, aes(x = name, y = bio_3)) +
               geom_boxplot(aes(fill = name)) +
               scale_color_manual(name = '', values = pal) +
               geom_text(data = stat.test(dataframe =ptExtracteddf,  y = "bio_3"), 
                        mapping = aes(x = name,
                                      y = max(ptExtracteddf["bio_3"]+1), 
                                      label = groups), 
                        size = 5, inherit.aes = FALSE) +
               theme(axis.text.x = element_text(angle = 90, size = 8, face = 'italic'))
bio3aovplot

Loop through all variables

variablelist <- colnames(ptExtracteddf)[1:8]
plotlist <- list()
maxstats <- c()
statsout <- c()

for(i in 1:8){
  vname <- variablelist[i]
  maxstats[i] <- as.numeric(max(ptExtracteddf[[vname]]) + 1)
  statsout[[i]] <- stat.test(dataframe = ptExtracteddf, y = vname)
  plotlist[[i]] <- ggplot(ptExtracteddf, aes(x = name, y = .data[[vname]])) +
                    geom_boxplot(aes(fill = name)) +
                    scale_colour_manual(name = 'Species', values = pal) +
                    geom_text(data = statsout[[i]], 
                              mapping = aes(x = name,
                                            y = (y.val*1.1), 
                                            label = groups), 
                              size = 4, inherit.aes = FALSE) +
                    scale_x_discrete(labels = c('G', 'Pba','Pbr', 'S')) +
                    ggtitle(label = vname) +
                    ylab(vname) +
                    theme(legend.position = "none") +
                   ylim(0, (maxstats[i]*1.2))
}

gridExtra::grid.arrange(grobs = plotlist)


Ecospat Niche Overlap and Niche Equivalency

Set up background points

bg1 <- randomPoints(mask = envtStack, n = 1000, p = alldf[,3:2])
bg1.env <- raster::extract(envtStack, bg1)
bg1.env <- data.frame(bg1.env)
allpt.bioclim <- rbind(bg1.env, data.bioclim)  

dudi.PCA to reduce variables

pca.env <- dudi.pca(allpt.bioclim,
                    center = TRUE, # Center by the mean
                    scannf = FALSE, # Don't plot
                    nf = 2) # Number of axis to keep 

Pull out scores for each species

p1.score <- suprow(pca.env, dplyr::filter(ptExtracteddf, name == "Galax urceolata")[, 1:8])$li
p2.score <- suprow(pca.env, dplyr::filter(ptExtracteddf, name == "Pyxidanthera barbulata")[, 1:8])$li
p3.score <- suprow(pca.env, dplyr::filter(ptExtracteddf, name == "Pyxidanthera brevifolia")[, 1:8])$li
p4.score <- suprow(pca.env, dplyr::filter(ptExtracteddf, name == "Shortia galacifolia")[, 1:8])$li
scores.clim <- pca.env$li
Visualize
plot(scores.clim, pch = 16, asp = 1,
     col = adjustcolor(1, alpha.f = 0.2), cex = 2,
     xlab = "PC1", ylab = "PC2") 
points(p1.score, pch = 18, col = pal[1], cex = 2)
points(p2.score, pch = 18, col = pal[2], cex = 2)
points(p3.score, pch = 18, col = pal[3], cex = 2)
points(p4.score, pch = 18, col = pal[4], cex = 2)

Kernel density estimates

Create occurrence density grids based on the ordination data.

z1 <- ecospat.grid.clim.dyn(scores.clim, scores.clim, p1.score, R = 100)
## Registered S3 methods overwritten by 'adehabitatMA':
##   method                       from
##   print.SpatialPixelsDataFrame sp  
##   print.SpatialPixels          sp
z2 <- ecospat.grid.clim.dyn(scores.clim, scores.clim, p2.score, R = 100)
z3 <- ecospat.grid.clim.dyn(scores.clim, scores.clim, p3.score, R = 100)
z4 <- ecospat.grid.clim.dyn(scores.clim, scores.clim, p4.score, R = 100)
zlist  <- list(z1, z2, z3, z4)

Niche Overlap

Schoener’s D ranges from 0 to 1. 0 represents no similarity between niche space. 1 represents completely identical niche space.

overlapD <- matrix(ncol = 2, nrow = 7)
n <- 1
for(i in 1:3){
  for(j in 2:4){
    if(i != j){
      overlapD[n, 1]<- paste0("z", i, "-", "z", j)
      overlapD[n, 2]<- ecospat.niche.overlap(zlist[[i]], zlist[[j]], cor = TRUE)$D
      n <- n + 1
    }
  }
}

overlapDdf <- data.frame(overlapD)
overlapDdf
##      X1                  X2
## 1 z1-z2  0.0890148692326306
## 2 z1-z3 0.00297245638991617
## 3 z1-z4   0.453009449060948
## 4 z2-z3 0.00376518449381513
## 5 z2-z4  0.0739737424962652
## 6 z3-z2 0.00376518449381513
## 7 z3-z4 0.00236297216796688
Niche Overlap Visualization
par(mfrow=c(1,2))
ecospat.plot.niche.dyn(z1, z4, quant=0.25, interest = 1
                       , title= "Niche Overlap - Z1 top", name.axis1="PC1", name.axis2="PC2")
ecospat.plot.niche.dyn(z1, z4, quant=0.25, interest = 2
                       , title= "Niche Overlap - Z4 top", name.axis1="PC1", name.axis2="PC2")

Niche Equivalency Test

Based on Warren et al. 2008 - Are the two niche identical?
Hypothesis test for D, null based on randomization. H1: the niche overlap is higher than expected by chance (or when randomized).

eq.test <- ecospat.niche.equivalency.test(z1, z4, rep = 10)
ecospat.plot.overlap.test(eq.test, "D", "Equivalency")

Niche Similarity Test

Based on Warren et al. 2008 - Are the two niche similar?
Can one species’ niche predict the occurrences of a second species better than expected by chance?

sim.test <- ecospat.niche.similarity.test(z1, z4, rep = 10, rand.type=2)
ecospat.plot.overlap.test(sim.test, "D", "Similarity")

06-Ecological Niche Modeling

Ecological Niche Modeling in R.
Modified and created by Anthony Melton and ML Gaynor.

This script is for generating and testing ENMs using ENMEval. Please see the paper describing ENMEval and their vignette.

Set up java memory

options(java.parameters = "-Xmx16g") # increase memory that can be used

Load packages

library(raster)
library(gtools)
library(dplyr)
library(dismo)
library(ENMeval)
library(ggplot2)
library(viridis)

Load function

source("functions/ENMevaluation.R")

Load data file

alldf <- read.csv("data/cleaning_demo/maxent_ready/diapensiaceae_maxentready_20230605.csv")

Subset for each species

Galax_urceolata <- dplyr::filter(alldf, species == "Galax urceolata")
Pyxidanthera_barbulata <- dplyr::filter(alldf, species == "Pyxidanthera barbulata")
Pyxidanthera_brevifolia <- dplyr::filter(alldf, species == "Pyxidanthera brevifolia")
Shortia_galacifolia <- dplyr::filter(alldf, species == "Shortia galacifolia")

Raster layers

list <- list.files("data/climate_processing/PresentLayers/all", full.names = TRUE, recursive = FALSE) 
list <- mixedsort(sort(list))
allstack <- stack(list)

Read in species training layers

gstack <- stack(mixedsort(sort(list.files("data/climate_processing/PresentLayers/Galax_urceolata/", full.names = TRUE))))
pbastack <- stack(mixedsort(sort(list.files("data/climate_processing/PresentLayers/Pyxidanthera_barbulata/", full.names = TRUE))))
pbrstack <- stack(mixedsort(sort(list.files("data/climate_processing/PresentLayers/Pyxidanthera_brevifolia/", full.names = TRUE))))
sstack <- stack(mixedsort(sort(list.files("data/climate_processing/PresentLayers/Shortia_galacifolia/", full.names = TRUE))))

Fix projection

projection(allstack) <- "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"
projection(gstack) <- "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"
projection(pbastack) <- "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"
projection(pbrstack) <- "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"
projection(sstack) <- "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"

Model generation

dismo model generations

Warning: Time intensive on Apple M1 chip! Match MaxEnt GUI settings (see word document and powerpoint)

evaldis <- dismo::maxent(x = gstack,
                         p = Galax_urceolata[, c("longitude", "latitude")],
                         nbg = 10000,
                         args = c("projectionlayers=data/climate_processing/PresentLayers/all",
                                  "responsecurves", 
                                  "jackknife", 
                                  "outputformat=logistic",
                                  "randomseed", 
                                  "randomtestpoints=25", 
                                  "replicates=5", 
                                  "replicatetype=subsample", 
                                  "maximumiterations=5000",
                                  "writebackgroundpredictions",
                                  "responsecurvesexponent",
                                  "writeplotdata"), 
                         removeDuplicates = TRUE#,
                         #path = "data/Ecological_Niche_Modeling/enm_output/Galax_urceolata/"
              )

ENMeval model generation

ENMeval will generate multiple models and test them per the specified test method.Two important variables here are the regularization multiplier (RM) value and the feature class (FC). FC will allow for different shapes in response curves (linear, hinge, quadratic, product, and threshold) can be used in the model and RM will influence how many parameters are included in the model.

eval1 <- ENMeval::ENMevaluate(occ = Galax_urceolata[, c("longitude", "latitude")], 
                              env = gstack,
                              tune.args = list(fc = c("L","Q"), rm = 1:2), 
                              partitions = "block",
                              n.bg = 10000,
                              parallel = FALSE,
                              algorithm = 'maxent.jar', 
                              user.eval = proc)
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |======================================================================| 100%

Model statistics

dismo

Inspect the dismo model based on the html

browseURL(evaldis@html)

ENMeval

Inspect the many models

Visualize

maps <- eval1@predictions
plot(maps)

Look at model overlap

mod_overlap <- calc.niche.overlap(eval1@predictions, overlapStat = "D")
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |======================================================================| 100%
kable(mod_overlap) %>%
   kable_styling(bootstrap_options = c("striped", "hover", "condensed", font_size = 10)) %>%
   scroll_box(width = "100%", height = "200px")
fc.L_rm.1 fc.Q_rm.1 fc.L_rm.2 fc.Q_rm.2
fc.L_rm.1 NA NA NA NA
fc.Q_rm.1 0.8951659 NA NA NA
fc.L_rm.2 0.9864079 0.8870755 NA NA
fc.Q_rm.2 0.9062003 0.9801491 0.8992666 NA

Inspect the results

Identify the best model selecting models with the lowest average test omission rate and the highest average validation AUC

results <- eval.results(eval1)
opt.seq <- results %>% 
            dplyr::filter(or.10p.avg == min(or.10p.avg)) %>% 
            dplyr::filter(auc.val.avg == max(auc.val.avg))
kable(opt.seq)  %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", font_size = 10)) %>%
   scroll_box(width = "100%", height = "200px")
fc rm tune.args auc.train cbi.train auc.diff.avg auc.diff.sd auc.val.avg auc.val.sd cbi.val.avg cbi.val.sd or.10p.avg or.10p.sd or.mtp.avg or.mtp.sd proc_auc_ratio.avg proc_auc_ratio.sd proc_pval.avg proc_pval.sd AICc delta.AICc w.AIC ncoef
L 2 fc.L_rm.2 0.8777198 0.995 0.0647308 0.0290278 0.8587255 0.0598812 0.9325 0.0859477 0.1319444 0.0749609 0.0023148 0.0046296 1.925411 0.0146166 0 0 42603.12 36.77516 0 8

Subset model

mod.seq <- eval.models(eval1)[[opt.seq$tune.args]]

Inspect

kable(mod.seq@results) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", font_size = 10)) %>%
  scroll_box(width = "100%", height = "200px")
X.Training.samples 1728.0000
Regularized.training.gain 0.6747
Unregularized.training.gain 0.7039
Iterations 140.0000
Training.AUC 0.8227
X.Background.points 11704.0000
bio_14.contribution 5.9779
bio_15.contribution 7.4804
bio_18.contribution 0.0997
bio_3.contribution 11.5439
bio_7.contribution 2.7500
bio_8.contribution 5.0561
bio_9.contribution 1.3086
elev.contribution 65.7834
bio_14.permutation.importance 0.0000
bio_15.permutation.importance 27.4510
bio_18.permutation.importance 0.0000
bio_3.permutation.importance 22.7601
bio_7.permutation.importance 17.0200
bio_8.permutation.importance 0.2045
bio_9.permutation.importance 1.6658
elev.permutation.importance 30.8986
Entropy 8.6925
Prevalence..average.probability.of.presence.over.background.sites. 0.2961
Fixed.cumulative.value.1.cumulative.threshold 1.0000
Fixed.cumulative.value.1.Cloglog.threshold 0.0751
Fixed.cumulative.value.1.area 0.9052
Fixed.cumulative.value.1.training.omission 0.0185
Fixed.cumulative.value.5.cumulative.threshold 5.0000
Fixed.cumulative.value.5.Cloglog.threshold 0.1327
Fixed.cumulative.value.5.area 0.7183
Fixed.cumulative.value.5.training.omission 0.0284
Fixed.cumulative.value.10.cumulative.threshold 10.0000
Fixed.cumulative.value.10.Cloglog.threshold 0.1883
Fixed.cumulative.value.10.area 0.5726
Fixed.cumulative.value.10.training.omission 0.0723
Minimum.training.presence.cumulative.threshold 0.0161
Minimum.training.presence.Cloglog.threshold 0.0252
Minimum.training.presence.area 0.9964
Minimum.training.presence.training.omission 0.0000
X10.percentile.training.presence.cumulative.threshold 13.9581
X10.percentile.training.presence.Cloglog.threshold 0.2228
X10.percentile.training.presence.area 0.4851
X10.percentile.training.presence.training.omission 0.0995
Equal.training.sensitivity.and.specificity.cumulative.threshold 31.5097
Equal.training.sensitivity.and.specificity.Cloglog.threshold 0.3892
Equal.training.sensitivity.and.specificity.area 0.2257
Equal.training.sensitivity.and.specificity.training.omission 0.2257
Maximum.training.sensitivity.plus.specificity.cumulative.threshold 30.0006
Maximum.training.sensitivity.plus.specificity.Cloglog.threshold 0.3679
Maximum.training.sensitivity.plus.specificity.area 0.2419
Maximum.training.sensitivity.plus.specificity.training.omission 0.2049
Balance.training.omission..predicted.area.and.threshold.value.cumulative.threshold 4.6516
Balance.training.omission..predicted.area.and.threshold.value.Cloglog.threshold 0.1278
Balance.training.omission..predicted.area.and.threshold.value.area 0.7310
Balance.training.omission..predicted.area.and.threshold.value.training.omission 0.0249
Equate.entropy.of.thresholded.and.original.distributions.cumulative.threshold 12.8080
Equate.entropy.of.thresholded.and.original.distributions.Cloglog.threshold 0.2125
Equate.entropy.of.thresholded.and.original.distributions.area 0.5090
Equate.entropy.of.thresholded.and.original.distributions.training.omission 0.0932

Look at variable contribution

plot(mod.seq)

Look at the response curves

dismo::response(mod.seq)

Project model to allstack

p <- predict(mod.seq, allstack) 

Visualize

# Make into a data frame
p_df <-  as.data.frame(p, xy = TRUE)

# Plot
ggplot() +
  geom_raster(data = p_df, aes(x = x, y = y, fill = layer)) +
  geom_point(data= Galax_urceolata, 
             mapping = aes(x = longitude, y = latitude), 
             col='red', cex=0.05) +
  coord_quickmap() +
  theme_bw() + 
  scale_fill_gradientn(colours = viridis::viridis(99),
                       na.value = "black")


Save outputs

R saved dataset

save(mod.seq, file = "data/Ecological_Niche_Modeling/enm_output/ENMeval/GalaxENM.rda")

Save Raster

writeRaster(x = p, filename = "data/Ecological_Niche_Modeling/enm_output/ENMeval/GalaxENM.asc",
            format = "ascii", NAFlag = "-9999", overwrite = T)

07-ENM Processing

Processing Ecological Niche Models in R.
Script by Anthony Melton and ML Gaynor.

Load Packages

library(raster)
library(gtools)
library(dplyr)
library(ENMTools)
library(ENMeval)
library(ape)
library(RStoolbox)
library(hypervolume)
library(phytools)
library(terra)

Load functions

The following command generates a binary predicted occurrence map. This was written by Anthony Melton.

source("functions/Functions_AEM.R")

Read in models you generated with the MaxEnt GUI into R.

sp1_enm.mx.b <- raster("data/Ecological_Niche_Modeling/enm_output/Galax_urceolata_avg.asc")
sp2_enm.mx.b <- raster("data/Ecological_Niche_Modeling/enm_output/Pyxidanthera_barbulata_avg.asc")
sp3_enm.mx.b <- raster("data/Ecological_Niche_Modeling/enm_output/Pyxidanthera_brevifolia_avg.asc")
sp4_enm.mx.b <- raster("data/Ecological_Niche_Modeling/enm_output/Shortia_galacifolia_avg.asc")

Read in bioclim layers

These steps are described in the previous script

system("rm -rf data/climate_processing/PresentLayers/all/maxent.cache/")
list <- list.files("data/climate_processing/PresentLayers/all/", 
                   full.names = TRUE, recursive = FALSE) 
list <- mixedsort(sort(list))
allstack <- stack(list)
projection(allstack) <- "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"

ENM Breadth

Niche breadth is the breadth of environmental factors for a species’ niche, ranging from 0 to 1. When breadth is closer to 1 the more generalist species with wider tolerances. Values closer to 0 indicate a more specialized species. The raster.breadth command in ENMTools measures the smoothness of suitability scores across a projected landscape. The higher the score, the more of the available niche space a species occupies.

## MaxEnt GUI
sp1_breadth <- ENMTools::raster.breadth(terra::rast(sp1_enm.mx.b))
sp1_breadth$B2
## [1] 0.6369572
sp2_breadth <- ENMTools::raster.breadth(terra::rast(sp2_enm.mx.b))
sp2_breadth$B2
## [1] 0.1665568
sp3_breadth <- ENMTools::raster.breadth(terra::rast(sp3_enm.mx.b))
sp3_breadth$B2
## [1] 0.04380163
sp4_breadth <- ENMTools::raster.breadth(terra::rast(sp4_enm.mx.b))
sp4_breadth$B2
## [1] 0.4567

ENM Overlap

Calculating niche overlap, Schoener’s D, with ENMEval - Schoener’s D ranges from 0 to 1, where 0 represents no similarity between the projections and 1 represents completely identical projections.

Stack the projections and make sure the stack is named.

enm_stack.b <- stack(sp1_enm.mx.b, sp2_enm.mx.b, sp3_enm.mx.b, sp4_enm.mx.b)
names(enm_stack.b) <- c("Galax urceolata", "Pyxidanthera barbulata",  "Pyxidanthera brevifolia","Shortia galacifolia" )

Calculate

calc.niche.overlap(enm_stack.b, overlapStat = "D")
## 
  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |======================================================================| 100%
##                         Galax.urceolata Pyxidanthera.barbulata
## Galax.urceolata                      NA                     NA
## Pyxidanthera.barbulata        0.2108513                     NA
## Pyxidanthera.brevifolia       0.1287981              0.1614858
## Shortia.galacifolia           0.6447093              0.1228339
##                         Pyxidanthera.brevifolia Shortia.galacifolia
## Galax.urceolata                              NA                  NA
## Pyxidanthera.barbulata                       NA                  NA
## Pyxidanthera.brevifolia                      NA                  NA
## Shortia.galacifolia                  0.03923816                  NA

Phylogenetic correlations

Let’s look at niche overlap over a tree!

Load the tree file

tree <- ape::read.tree(file = 
                         "data/Ecological_Niche_Modeling/AOC_Test_Demo/diapensiaceae_subset.tre")
tree <- ape::compute.brlen(phy = tree, method = "grafen")
plot(tree)

Drop the outgroup

tree <- drop.tip(tree, "Cyrilla_racemiflora")
plot(tree)

Load data file

alldf <- read.csv("data/cleaning_demo/maxent_ready/diapensiaceae_maxentready_20230605.csv")

Subset for each species

Galax_urceolata <- dplyr::filter(alldf, species == "Galax urceolata")
Pyxidanthera_barbulata <- dplyr::filter(alldf, species == "Pyxidanthera barbulata")
Pyxidanthera_brevifolia <- dplyr::filter(alldf, species == "Pyxidanthera brevifolia")
Shortia_galacifolia <- dplyr::filter(alldf, species == "Shortia galacifolia")

Generate species objects for each tree member!

## Generate species objects for each tree member!
sp1 <- enmtools.species(species.name = "Galax_urceolata",
                        presence.points = vect(Galax_urceolata[,2:3],
                                               geom = c("longitude", "latitude")))
crs(sp1$presence.points) <- crs("+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs")
sp1$range <- background.raster.buffer(sp1$presence.points, 
                                      25000, 
                                      mask = terra::rast(allstack))
sp1$background.points = background.points.buffer(points = sp1$presence.points, 
                                                 radius = 5000,
                                                 n = 10000, 
                                                 mask =  terra::rast(allstack[[1]]))
##############################
sp2 <- enmtools.species(species.name = "Pyxidanthera_barbulata",
                        presence.points = vect(Pyxidanthera_barbulata[,2:3], 
                                               geom = c("longitude", "latitude")))
crs(sp2$presence.points) <- crs("+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs")
sp2$range <- background.raster.buffer(sp2$presence.points, 
                                      25000, 
                                      mask = terra::rast(allstack))
sp2$background.points = background.points.buffer(points = sp2$presence.points,
                                                 radius = 5000,
                                                 n = 10000, 
                                                 mask =  terra::rast(allstack[[1]]))
#############################
sp3 <- enmtools.species(species.name = "Pyxidanthera_brevifolia",
                        presence.points = vect(Pyxidanthera_brevifolia[,2:3], 
                                               geom = c("longitude", "latitude")))
crs(sp3$presence.points) <- crs("+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs")
sp3$range <- background.raster.buffer(sp3$presence.points,
                                      25000, 
                                      mask =  terra::rast(allstack))
sp3$background.points = background.points.buffer(points = sp3$presence.points,
                                                 radius = 5000, 
                                                 n = 10000, 
                                                 mask =   terra::rast(allstack[[1]]))
#############################
sp4 <- enmtools.species(species.name = "Shortia_galacifolia",
                        presence.points = vect(Shortia_galacifolia[,2:3], 
                                               geom = c("longitude", "latitude")))
crs(sp4$presence.points) <- crs("+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs")
sp4$range <- background.raster.buffer(sp4$presence.points, 
                                      25000, 
                                      mask =  terra::rast(allstack))
sp4$background.points = background.points.buffer(points = sp4$presence.points, 
                                                 radius = 5000,
                                                 n = 10000, 
                                                 mask =   terra::rast(allstack[[1]]))

Create “clade” object with all the species in the tree

clade=enmtools.clade(species = list(sp1, sp2, sp3, sp4), tree = tree)
check.clade(clade)

Age-Range Correlation Test

range.aoc <- enmtools.aoc(clade = clade,  nreps = 10, 
                          overlap.source = "range")
summary(range.aoc)
## 
## 
## Age-Overlap Correlation test
## 
## 10 replicates 
## 
## p values:
## Intercept       Age 
## 0.2272727 0.2272727

## NULL

Age-Overlap Correlation Test

glm.aoc <- enmtools.aoc(clade = clade, 
                        env = terra::rast(allstack), nreps = 5,
                        overlap.source = "glm")
summary(glm.aoc)
## 
## 
## Age-Overlap Correlation test
## 
## 5 replicates 
## 
## p values:
## Intercept       Age 
##       0.5       0.5

## NULL

The results here are pretty meaningless since we’re looking at very few species, but it serves the purpose of the demo. For range AOCs, an intercept >0.5 and negative slope are indicative of sympatric species, while an intercept of <0.5 and a positive slope are indicative on non-sympatric speciation. A low intercept and positive slope for niche overlap would indicate niche divergence. ***

Hypervolume

This will generate a binary map with any cell that contains a suitability score greater than or equal to the lowest score with an occurrence point as a presence. There are other ways to set the threshold, including percentiles or training statistics.

Create the binary maps

sp1.dist <- make.binary.map(model = sp1_enm.mx.b, occ.dat = Galax_urceolata)
sp2.dist <- make.binary.map(model = sp2_enm.mx.b, occ.dat = Pyxidanthera_barbulata)
sp3.dist <- make.binary.map(model = sp3_enm.mx.b, occ.dat = Pyxidanthera_brevifolia)
sp4.dist <- make.binary.map(model = sp4_enm.mx.b, occ.dat = Shortia_galacifolia)

Plot

par(mfrow = c(2,2),  mar = c(1,1,1,1))
plot(sp1.dist)
plot(sp2.dist)
plot(sp3.dist)
plot(sp4.dist)

Hypervolume

Next, let’s work on getting some data from the predicted distributions! Niche space can be thought of as a multi-dimensional hypervolume. We’re using climatic data in this case, so we’re measuring the hypervolume of climatic niche space occupied by these species.
Warning: This takes forever!

sp1.hv <- get_hypervolume(binary_projection = sp1.dist, envt = allstack)
sp2.hv <- get_hypervolume(binary_projection = sp2.dist, envt = allstack)
sp3.hv <- get_hypervolume(binary_projection = sp3.dist, envt = allstack)
sp4.hv <- get_hypervolume(binary_projection = sp4.dist, envt = allstack)

Compare the hypervolumes

hv_set <- hypervolume_set(hv1 = sp1.hv, hv2 = sp2.hv, check.memory = F)
hypervolume_overlap_statistics(hv_set)
plot(hv_set)
get_volume(hv_set)