A method to develop an informed sampling design and absence selection for invasive species

Recording and monitoring invasive species is one critical step in the efforts to manage biological invasions, and with this purpose, ground observations are often used in species distribution models to produce current and potential distribution maps. However, field sampling is susceptible to bias due to factors such as sampling effort, species detectability, environmental heterogeneity, species identification, among others. This is why, combining the knowledge on the ecology of the species with an informed sampling design (e.g. sampling locations and effort) is of paramount importance to accurately record species occurrences.

Furthermore, at the modelling stage, data on the locations where the species is absent is key to accurately tune in the distribution models (SDM). One key assumption in SDMs is that species are at equilibrium with the environment, that is, the species has already occupied all the suitable habitat. That is not the case for invasive species where the distribution depends on the time since invasion and its rate of colonization and establishment, and thus the importance of a careful selection of absences to avoid the areas of suitable habitat not yet colonized by the invader (known as contingent absences) (Lobo et al. 2010).

In this tutorial, we will work with the iSDM R package (CRAN site), developed as part of the DIARS project and presented in Hattab et al. 2017, with the aim of providing tools to improve sampling design and absences selection using environmental factors derived from LiDAR data.

The dataset

This tutorial is divided in two sections:

1. Performing an environmental systemic sampling design

2. Calculating the likelihood of absences in suitable areas

You can download the dataset for this tutorial from HERE. The dataset contains a set of raster files with 5 environmental variables derived from LiDAR and a table of species presence/absence.

Environmental systemic sampling design

Let’s start by installing and loading the libraries that we are going to be using in this tutorial.

install.packages("devtools", "iSDM")

library(iSDM)

library(raster)

Set up the working directory and import the raster files with environmental variables of the study area:

library(raster)
setwd("/home/garzonc/Desktop/DIARS")
files <- list.files("/home/garzonc/Desktop/DIARS/Rasters1/", full.names = TRUE)
envData <- stack(files)
plot(envData)

After importing the raster layers you can use the function eSample to perform an environmental systemic sampling design. Before running the line is important to determine the number of sampling points needed (nExpect), the number of dimensions (maximum three) to be used in the ordination analysis, and the upper (upperLim) and lower (lowerLim) limits of the quantile probability (0 to 1). NOTE: You can change the probabilities and the number of samples to explore the combination that results in a robust sampling design for the study area and the focal species.

library(iSDM)
MySampling <- eSample(envData, nExpect = 50, plot = TRUE, saveShape = TRUE, 
    lowerLim = 0.001, upperLim = 0.999, nf = 3)
## [1] "-----------------Optimization of the 3 D grid size---------------------"
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |===============================================                  |  72%
  |                                                                       
  |=======================================================          |  84%
## [1] "-----------------51 points found---------------------"

You must enable Javascript to view this page properly.

plot(envData[[1]], legend = FALSE, col = "gray")  #Plot the first layer in the environmental variables stack
plot(MySampling[[1]], add = TRUE, col = 2, pch = 19)
legend(x = "bottomright", legend = "Sampling points", col = "red", pch = 19, 
    cex = 1)

Calculating the likelihood of absences in suitable habitat

The function pDLA, from the iSDM package, allows the calculation of the probability of detecting absences at suitable areas (known as contingent absences). The user must provide a table of species presences and absences collected from the field. If absence data is not available, the function can be used with previously generated pseudoabsences. The function is based on the idea that absence data environmentally close (within the suitable habitat) but geographically distant from presences are more likely to be contingent absences.

Let’s start by installing and loading the libraries that will be using in this tutorial:

library(raster)
library(sp)
library(colorRamps)
library(virtualspecies)
library(ggplot2)
# install.packages('devtools')

library(iSDM)

Set the working directory, import the raster files with environmental variables of the study area, and using the package virtual species generate a virtual “invasive” species in a non equilibrium state with the environment (i.e. not all the suitable habitat is occupied) colonizing a specific area of the study site (a tutorial of this package can be found HERE):

library(raster)
setwd("/home/garzonc/Desktop/DIARS")

files <- list.files("/home/garzonc/Desktop/DIARS/Rasters1/", full.names = TRUE)
names(files) = paste("var", 1:5, sep = "")
envData <- stack(files)


library(virtualspecies)
# Virtual species response to environmetal data

potential.dist <- generateRandomSp(raster.stack = envData[[c("var1", "var2", 
    "var5")]], relations = c("gaussian"))

# limit the potential distribution to an specific area with the realized
# distribution (non equilibrium with the environment)
realized.dist <- limitDistribution(x = potential.dist$suitab.raster, area = extent(453000, 
    465500, 6080000, 6095000))

Pres <- coordinates(realized.dist$occupied.area)
P <- as.data.frame(Pres[which(values(realized.dist$occupied.area) > 0.5), ])
Presence <- P[sample(300), ]


Abse <- coordinates(realized.dist$occupied.area)
A <- as.data.frame(Abse[which(values(realized.dist$occupied.area) < 0.5), ])
Absence <- A[sample(300), ]
library(sp)
occData <- as.data.frame(rbind(cbind(Presence, SP = rep(1, 300)), cbind(Absence, 
    SP = rep(0, 300))))
occData <- na.omit(occData)
coordinates(occData) <- ~x + y
proj4string(occData) <- proj4string(envData)

plot(potential.dist$suitab.raster, legend = FALSE)
plot(occData, add = TRUE, pch = 1, cex = 0.1, col = ifelse(occData$SP == 0, 
    "red", "blue"))
legend(x = "bottomright", legend = c("Presence", "Absence"), col = c("blue", 
    "red"), pch = 1, cex = 1)

Now calculate the probability of detecting contingent absences, based on the environmental layers and display the results:

# envData<-stack(list.files('./Rasters/',full.names=TRUE))

likelihood <- pDLA(occData = occData, envData = envData[[c(1:5)]], longlat = FALSE)

likelihood1 <- as.data.frame(cbind(likelihood@coords, likelihood@data[, "PDLA"]))
colnames(likelihood1) <- c("x", "y", "PDLA")

# To plot the resulting map using ggplot2:

library(ggplot2)
envMap <- rasterToPoints(potential.dist$suitab.raster)
Map <- data.frame(envMap)
colnames(Map) <- c("X", "Y", "MAP")
ggplot(data = Map, aes(y = Y, x = X)) + geom_raster(fill = "gray") + geom_point(data = likelihood1, 
    aes(x = x, y = y, color = PDLA)) + theme_bw() + coord_equal() + scale_color_gradient("PDLA", 
    low = "dark green", high = "red") + theme(axis.title.x = element_blank(), 
    axis.title.y = element_blank(), panel.grid.major = element_blank(), panel.grid.minor = element_blank(), 
    legend.position = "right", legend.key = element_blank())

The resulting table contains a column with likelihood of contingent absences.

More information about the iSDM package and the functions presented in this tutorial can be found at HERE or HERE