Using remote sensing to map invasive species

Success in the strategies to manage biological invasions rely in our ability to detect and monitor species distributions. However, the field surveys designed to detect and map invasive species often take enormous logistic efforts, in time and space, and might lead to observational bias (e.g. decreased sampling effort at less accessible areas) which can negatively affect the outcome and its applicability.

Remote sensing provides standardized large-scale spatially and temporally explicit data to map and monitor changes in the environment and the distribution of native and invasive species. It helps field surveys overcome the logistical barrier and facilitates the detection and monitoring of species at multiple temporal and spatial scales. Here, as part of the DIARS project, we present a method to map invasive species using hyperspectral images.

The dataset

In this tutorial we will map the distribution of a invasive bryophyte species (Campylopus introflexus), based on the study presented in Skowronek et al. 2016 at Sylt island (Germany) as part of the DIARS project.

For the purpose of this tutorial and in order to reduce the computer processing time (while using real remote sensing data in the analyzes), we provide an already prepared artificial image (click HERE to read more information on the process to create this image).

Throughout the tutorial you will find this gray boxes. They contain general “best practices” tips that will help you improve the quality of the analysis when applying this framework to your own study case.

The framework
Figure 1. Framework used to map invasive species distribution using remote sensing data.

You can find more information on the software we will be using in this tutorial HERE.

Let’s start!

Setting the R environment

Let’s start by loading the libraries that we are going to be using in this tutorial. To load the packages, it is necessary to first install them using the command install.packages. After installation you can load the packages in R:

# Run this command if the packages are not already installed
# install.packages(c('raster','rgdal','hyperSpec','rgeos','rgrass7'))

# Load the packages
library(raster)
library(rgdal)
library(hyperSpec)
library(rgeos)
library(rgrass7)

Now create a folder where the datasets are going to be located and the outputs and results will be saved. To set the working directory in R to the folder created, execute the following command line, replacing the path to the folder you just created:

setwd("/home/garzonc/Desktop/DIARS")

Data preparation

Three datasets are required to map the species through remote sensing: i) two shapefiles with the x,y locations of the field plots that correspond to the calibration and validation points, ii) a shapefile containing a set of points randomly distributed within the study area that do not overlap with the field plots and will serve as background points and, iii) a hyperspectral image in tiff format. You can download a sample dataset HERE and save them in the working directory you just created.

Remember to write down the resolution, projection and geographic region of all your shape and raster files. Make sure they match!

Setting up the field and remote sensing based-data

The first step in the mapping process is to link the field data with the hyperspectral data, that is, to extract the hyperspectral values (reflectances at different wavelenghts) at each x,y location of the validation, calibration and background points. Let’s start!

To extract the spectral values for calibration, validation and background points we first need to load the hyperspectral image in R. Here we also plot the hyperspectral imagery and the field plots to check that all of the field plots are located within the hyperspectral image.

# Create a RasterStack object with the hyperspectral bands in tiff format
layers <- stack("mosaic_datasetSyltMix.tiff")
# Explore and visualize the bands in the RasterStack
layers
## class       : RasterStack 
## dimensions  : 180, 175, 31500, 248  (nrow, ncol, ncell, nlayers)
## resolution  : 1.8, 1.8  (x, y)
## extent      : 469443.6, 469758.6, 6075848, 6076172  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0 
## names       : mosaic_datasetSyltMix.1, mosaic_datasetSyltMix.2, mosaic_datasetSyltMix.3, mosaic_datasetSyltMix.4, mosaic_datasetSyltMix.5, mosaic_datasetSyltMix.6, mosaic_datasetSyltMix.7, mosaic_datasetSyltMix.8, mosaic_datasetSyltMix.9, mosaic_datasetSyltMix.10, mosaic_datasetSyltMix.11, mosaic_datasetSyltMix.12, mosaic_datasetSyltMix.13, mosaic_datasetSyltMix.14, mosaic_datasetSyltMix.15, ...
plot(layers[[150]], legend = FALSE)  #Plot band 150 

To read the point shapefiles make sure they are located in the current working folder. Initiate the GRASS software from within R, create a directory that will contain the GRASS GIS database for the working session, set the gisBase argument to GRASS binaries and libraries directory path, this should be something like C:\Program Files\GRASS GIS 7.00beta4 if your are using Windows (check HERE for other operating systems) and, set the correct projection, geographic region and spatial resolution before opening any shapefile:

dir.create("grassdata")  #Create a grassdata directory
myGRASS <- "/home/garzonc/src/grass72/grass72_release/dist.x86_64-pc-linux-gnu"  #Set the location of the grass binaries and libraries
myGISDbase <- "/home/garzonc/Documents/grassData"  #Set the directory path to your GISDbase
myLocation <- "DIARS"  #Name the Location
myMapset <- "PERMANENT"  #Name the mapset

# Initiate the GRASS GIS session
initGRASS(myGRASS, home = tempdir(), gisDbase = myGISDbase, location = myLocation, 
    mapset = myMapset, override = TRUE)
## Error in initGRASS(myGRASS, home = tempdir(), gisDbase = myGISDbase, location = myLocation, : could not find function "initGRASS"
# Set the projection
execGRASS("g.proj", flags = c("c"), parameters = list(epsg = 32632))
## Error in execGRASS("g.proj", flags = c("c"), parameters = list(epsg = 32632)): could not find function "execGRASS"
# Set the region and resolution
execGRASS("g.region", parameters = list(n = "6080248.2", s = "6079942.2", e = "464307.5969", 
    w = "463974.5969", res = "1.8"))
## Error in execGRASS("g.region", parameters = list(n = "6080248.2", s = "6079942.2", : could not find function "execGRASS"

Now the R environment is set and you can import the calibration points shapefile using the v.in.ogr GRASS module:

execGRASS("v.in.ogr", flags = c("overwrite", "quiet"), parameters = list(input = "/home/garzonc/Desktop/DIARS/cal_pointsR.shp", 
    output = "cal_pointsR"))
pointdata_cal <- readVECT("cal_pointsR")  #Read the imported vector file
## Exporting 90 features...
##    5%  11%  17%  23%  30%  36%  42%  48%  54%  60%  66%  72%  78%  84%  90%  96% 100%
## v.out.ogr complete. 90 features (Point type) written to <cal_pointsR>
## (SQLite format).
## OGR data source with driver: SQLite 
## Source: "/home/garzonc/Documents/grassData/DIARS/PERMANENT/.tmp/localhost.localdomain/790.0", layer: "cal_pointsR"
## with 90 features
## It has 9 fields
layers <- stack("mosaic_datasetSyltMix.tiff")
plot(layers[[150]], legend = FALSE)  #Plot hyperspectral image band 150
points(pointdata_cal, pch = 3, cex = 0.5)  #Add the calibration points to the plot

Is important to also extract the values for an area around the point (buffer area) to extract the wavelenghts that reflect the plot area and so the buffer must be the same size as the field plots.

# Create a buffer area of width 3 meters diameter around each point in the
# calibration points file
dataBuffer <- gBuffer(pointdata_cal, byid = T, width = 3, capStyle = "SQUARE")

# Plot the outcome
plot(layers[[150]], legend = FALSE)  #Plot band 150 of the hyperspectral image
points(pointdata_cal, pch = 3, cex = 0.3)  #Add the calibration points to the plot
plot(dataBuffer, add = TRUE)  #Add the buffer areas to the plot

Create a table with only the coordinates (x,y) columns

pointdataXY_calB <- pointdata_cal@data[, 5:6]

And extract and calculate the mean value in each buffer area and for each band, and save them as “pixvals”. Then, create the “type” column and finally join all the columns (x,y coordinates, type and pixvals) and edit the row names in the table. Preview the resulting table and save it as a csv file.

# Extract and calculate the mean values
pixExtract <- extract(layers, dataBuffer, method = "simple", small = T, weights = T, 
    fun = mean)

type <- rep("Calibration", 90)  #Create the type column 
calXY_pix <- cbind(type, pointdata_cal@data[, -c(1:4, 7:8)], pixExtract)  #Join all the columns
rownames(calXY_pix) <- pointdata_cal@data$cat  #Edit the rownames
# head(calXY_pix) #Run this line to preview the first 5 rows of the table
write.table(calXY_pix, "PointCalib.csv", quote = F)  #Save the table as a csv file

Now we have to follow the same procedure to extract the spectral values for the validation points:

execGRASS("v.in.ogr", flags = c("overwrite", "quiet"), parameters = list(input = "/home/garzonc/Desktop/DIARS/val_pointsR.shp", 
    output = "val_pointsR"))
## Error in parseGRASS(cmd, legacyExec = legacyExec): The command
##    v.in.ogr --interface-description
## could not be run (127), and produced the error message:
##    sh: v.in.ogr: command not found
pointdata_val <- readVECT("val_pointsR")
## Error in parseGRASS(cmd, legacyExec = legacyExec): The command
##    v.in.ogr --interface-description
## could not be run (127), and produced the error message:
##    sh: v.in.ogr: command not found
plot(layers[[150]], legend = FALSE)

points(pointdata_val, pch = 3, cex = 0.5)
## Error in points(pointdata_val, pch = 3, cex = 0.5): object 'pointdata_val' not found
pointdataXY_val <- pointdata_val@data[, 4:5]
## Error in eval(expr, envir, enclos): object 'pointdata_val' not found
pixExtractV <- extract(layers, pointdataXY_val, method = "simple")
## Error in extract(layers, pointdataXY_val, method = "simple"): object 'pointdataXY_val' not found
type <- rep("Validation", 166)
ValXY_pix <- cbind(type, pointdata_val@data[, -c(1:3, 6:7, 9)], pixExtractV)
## Error in cbind(type, pointdata_val@data[, -c(1:3, 6:7, 9)], pixExtractV): object 'pointdata_val' not found
rownames(ValXY_pix) <- pointdata_val@data$cat
## Error in eval(expr, envir, enclos): object 'pointdata_val' not found
# head(ValXY_pix) #Run this line to preview the first 5 rows of the table
write.table(ValXY_pix, "PointValid.csv", quote = F)
## Error in is.data.frame(x): object 'ValXY_pix' not found

Finally, we do the same for the background points.

execGRASS("v.in.ogr", flags = c("overwrite", "quiet"), parameters = list(input = "/home/garzonc/Desktop/DIARS/back_pointsR.shp", 
    output = "back_pointsR"))

pointdata_bac <- readVECT("back_pointsR")
layers <- stack("mosaic_datasetSyltMix.tiff")
plot(layers[[150]], legend = FALSE)
points(pointdata_bac, pch = 3, cex = 0.5)

pointdataXY_bac <- pointdata_bac@data[, 5:6]

pixExtractB <- extract(layers, pointdataXY_bac, method = "simple")
type <- rep("Background", 1000)
BacXY_pix <- cbind(type, pointdata_bac@data[, -c(1:4, 7:8)], pixExtractB)
rownames(BacXY_pix) <- pointdata_bac@data$cat
# head(BacXY_pix) #Run this line to preview the first 5 rows of the table
write.table(BacXY_pix, "PointBack.csv", quote = F)

Note that the background points must be carefully selected to reduce or avoid selecting false absences (see Lobo et al. 2010 on the selection of pseudoabsences). You can read more on the procedure followed to select pseudoabsences for the study species mapped in this tutorial in Skowronek et al. 2016.

Working with maxent

When working with Maxent is important to understand the idea and become familiar with the terms in order to interpret the outcome and identify its advantages and limitations. To learn more about maxent check Merow et al. 2013

First, it is important to download Maxent from the website https://www.cs.princeton.edu/~schapire/maxent/), install the dismo package and, copy and paste the file maxent.jar in the folder called java of the dismo package folder:

# To install dismo and maxent
install.packages("maxent", "dismo")
# Locate maxent and the java folder in dismo
system.file(package = "maxent")
system.file("java", package = "dismo")
# After copy/paste the maxent.jar in the java folder load the libraries
library(dismo)
library(rJava)

# Set the working directory
setwd("/home/garzonc/Desktop/DIARS")

Load the tables with the previously extracted values for the buffer areas around the calibration points, and use the vector cal_points to select the plots with >0 species coverage values (cam_cover column). Join the presences table with the background table, and in the resulting table omit the xy coordinates and “type” columns and only keep the spectral values. Finally create a vector y with 1 and 0 values corresponding to the presences (1) and background (0) rows.

occs <- read.table("PointCalib.csv", header = T, sep = " ")

occsPRES <- subset(occs, occs$cam_cover > 0)
occsABS <- subset(occs, occs$cam_cover < 0.1)


occsPRES <- occsPRES[, -c(1:4)]
occsABS <- occsABS[, -c(1:4)]

Background <- read.table("PointBack.csv", header = T, sep = " ")[, -c(1:4)]
specs <- rbind(occsPRES, occsABS, Background)
specs <- specs[, -c(1, 2, 146:157, 194:220)]
y <- c(rep(1, nrow(occsPRES)), rep(0, nrow(occsABS)), rep(0, nrow(Background)))

The resulting tables should look like this: Table of presence (1) and absences (0) (y) and table with wavelength values per band (specs)

Once you have the table ready load the package dismo and run the model using the tables just created (y and specs) and plot the model outcome:

library(dismo)
library(rJava)

me <- maxent(specs, y, args = c("addsamplestobackground=true", "applyThresholdRule=10 percentile training presence"), 
    path = "./outputR")
pred <- predict(me, layers, progress = "text")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |================                                                 |  25%
  |                                                                       
  |================================                                 |  50%
  |                                                                       
  |=================================================                |  75%
  |                                                                       
  |=================================================================| 100%
## 
plot(pred)

# save the model outcomes
writeRaster(pred, "./outputR/map.tif", overwrite = T)
save(me, file = "./outputR/mx_obj.RData")

More information on species distribution modeling in R using the Dismo package can be found in https://cran.r-project.org/web/packages/dismo/vignettes/sdm.pdf

Model evaluation

How well does the model prediction fits the data?

The aim of model evaluation is precisely to answer this question, and based on the outcome of this evaluation calculate a threshold to transform the data from probabilities of presence to a binary presence/absence map. The evaluate function performs a cross-validation of the model with presence/absence data. The outcomes of such evaluation are a set of thresholds with its corresponding confusion matrices and evaluation statistics.

To evaluate the model we will use the validation points, so load the table with the spectral values of the validation points buffer areas (previously created), divide the table in presences and absences to create the vector of 0 and 1 values (y), join it all in one table.

occval <- read.table("PointValid.csv", header = T, sep = " ")  #Load validation points 
occsP <- subset(occval, occval$cam_cover > 0)  #presences
occsA <- subset(occval, occval$cam_cover < 0.1)  #absences

The validation table just created is used in the evaluate function to evaluate the model previously constructed. The resulting evaluation contains the outcomes of the model cross-validation, such as the True Skill Statistic (TSS) and ROC curves.

y <- c(rep(1, nrow(occsP)), rep(0, nrow(occsA)))  # presence (1) and absence (0) column

occsP <- occsP[, -c(1:4)]
occsA <- occsA[, -c(1:4)]  #remove columns with data different from the wavelenght values

occsP <- occsP[, -c(1, 2, 146:157, 194:220)]
occsA <- occsA[, -c(1, 2, 146:157, 194:220)]  #remove the wavelenghts that have been excluded from the analysis

# evaluate model performance
library(dismo)
e <- evaluate(occsP, occsA, me)
e
## class          : ModelEvaluation 
## n presences    : 49 
## n absences     : 117 
## AUC            : 0.5098552 
## cor            : 0.02716758 
## max TPR+TNR at : 0.2268233
# Plot the results of the evaluation
par(mfrow = c(1, 2))
tss <- e@TPR + e@TNR - 1
plot(e@t, tss, type = "l", xlab = "Thresholds", ylab = "TSS", main = "TSS")
e@t[which.max(tss)]
## [1] 0.2268233
plot(e, "ROC")

Using the threshold function is possible to find the set of thresholds with the highest evaluation statistics values. This information will allow us to select the threshold value at which the model probabilities should be partition to generate a binary presence/absence map.

# find the thresholds that can serve to partition the model prediction in
# presence/absence
e <- evaluate(occsP, occsA, me)
th.sy = as.vector(t(threshold(e)))
th.sy
## [1] 0.6047138 0.2268233 0.1455828 0.2875774 0.4636983 0.2335773
# evaluate again for the estimated thresholds
ev1.sy = evaluate(occsPRES, occsABS, me, tr = th.sy)
ev1.sy
## class          : ModelEvaluation 
## n presences    : 60 
## n absences     : 30 
## AUC            : 0.8244444 
## cor            : 0.533745 
## max TPR+TNR at : 0.4636983
# find the thresholds in the second model evaluation
threshold(ev1.sy)
##                kappa spec_sens no_omission prevalence equal_sens_spec
## thresholds 0.4636983 0.4636983   0.2335773  0.6047138       0.4636983
##            sensitivity
## thresholds   0.4636983
# sort thresholds from smallest to largest
th.order.sy = sort(threshold(ev1.sy))

# get threshold value name correspondance (in order)
th.name.sy <- names(th.order.sy)
th.order.sy
##            no_omission     kappa spec_sens equal_sens_spec sensitivity
## thresholds   0.2335773 0.4636983 0.4636983       0.4636983   0.4636983
##            prevalence
## thresholds  0.6047138
# create a boxplot of the evaluation results
boxplot(e, col = c("red", "green"))

# create an overview table
sy.ov <- cbind(th.name.sy, ev1.sy@t, ev1.sy@confusion, ev1.sy@auc, ev1.sy@CCR, 
    ev1.sy@MCR, ev1.sy@kappa)
# define column names
colnames(sy.ov) <- c("Th_name", "Th_value", "tp", "fp", "fn", "tn", "AUC", "CCR", 
    "MCR", "kappa")
head(sy.ov)
##      Th_name           Th_value     tp   fp   fn   tn  
## [1,] "no_omission"     "0.14558278" "60" "29" "0"  "1" 
## [2,] "kappa"           "0.22682327" "60" "26" "0"  "4" 
## [3,] "spec_sens"       "0.23357725" "60" "26" "0"  "4" 
## [4,] "equal_sens_spec" "0.28757744" "57" "24" "3"  "6" 
## [5,] "sensitivity"     "0.46369834" "47" "9"  "13" "21"
## [6,] "prevalence"      "0.60471381" "12" "0"  "48" "30"
##      AUC                 CCR                 MCR                
## [1,] "0.824444444444444" "0.677777777777778" "0.322222222222222"
## [2,] "0.824444444444444" "0.711111111111111" "0.288888888888889"
## [3,] "0.824444444444444" "0.711111111111111" "0.288888888888889"
## [4,] "0.824444444444444" "0.7"               "0.3"              
## [5,] "0.824444444444444" "0.755555555555556" "0.244444444444444"
## [6,] "0.824444444444444" "0.466666666666667" "0.533333333333333"
##      kappa              
## [1,] "0.043956043956044"
## [2,] "0.170212765957447"
## [3,] "0.170212765957447"
## [4,] "0.181818181818182"
## [5,] "0.467741935483871"
## [6,] "0.142857142857143"
sy.ov
##      Th_name           Th_value     tp   fp   fn   tn  
## [1,] "no_omission"     "0.14558278" "60" "29" "0"  "1" 
## [2,] "kappa"           "0.22682327" "60" "26" "0"  "4" 
## [3,] "spec_sens"       "0.23357725" "60" "26" "0"  "4" 
## [4,] "equal_sens_spec" "0.28757744" "57" "24" "3"  "6" 
## [5,] "sensitivity"     "0.46369834" "47" "9"  "13" "21"
## [6,] "prevalence"      "0.60471381" "12" "0"  "48" "30"
##      AUC                 CCR                 MCR                
## [1,] "0.824444444444444" "0.677777777777778" "0.322222222222222"
## [2,] "0.824444444444444" "0.711111111111111" "0.288888888888889"
## [3,] "0.824444444444444" "0.711111111111111" "0.288888888888889"
## [4,] "0.824444444444444" "0.7"               "0.3"              
## [5,] "0.824444444444444" "0.755555555555556" "0.244444444444444"
## [6,] "0.824444444444444" "0.466666666666667" "0.533333333333333"
##      kappa              
## [1,] "0.043956043956044"
## [2,] "0.170212765957447"
## [3,] "0.170212765957447"
## [4,] "0.181818181818182"
## [5,] "0.467741935483871"
## [6,] "0.142857142857143"
# save results
write.csv(sy.ov, "sy.overview.csv", quote = FALSE, row.names = FALSE)

Distribution map

Now we can generate a presence map using the threshold value that maximizes the kappa value (to learn more about the kappa statistic check Segurado and Araújo, 2013) where the green areas correspond to presence and the white areas to absences:

e <- evaluate(occsP, occsA, me)
# Plot the model prediction
par(mfrow = c(1, 1))
tr <- threshold(e, "kappa")
tr
## [1] 0.6047138
plot(pred > tr, main = "Presence/absence map", legend = FALSE)

# See all the outcomes of the model evaluation
str(e)
## Formal class 'ModelEvaluation' [package "dismo"] with 22 slots
##   ..@ presence  : num [1:49] 0.401 0.545 0.506 0.535 0.465 ...
##   ..@ absence   : num [1:117] 0.476 0.515 0.592 0.419 0.636 ...
##   ..@ np        : int 49
##   ..@ na        : int 117
##   ..@ auc       : num 0.51
##   ..@ pauc      : num(0) 
##   ..@ cor       : Named num 0.0272
##   .. ..- attr(*, "names")= chr "cor"
##   ..@ pcor      : num 0.728
##   ..@ t         : num [1:168] 0.126 0.13 0.133 0.138 0.142 ...
##   ..@ confusion : int [1:168, 1:4] 49 49 49 49 49 49 48 48 47 47 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : NULL
##   .. .. ..$ : chr [1:4] "tp" "fp" "fn" "tn"
##   ..@ prevalence: num [1:168] 0.295 0.295 0.295 0.295 0.295 ...
##   ..@ ODP       : num [1:168] 0.705 0.705 0.705 0.705 0.705 ...
##   ..@ CCR       : num [1:168] 0.295 0.301 0.307 0.313 0.319 ...
##   ..@ TPR       : num [1:168] 1 1 1 1 1 ...
##   ..@ TNR       : num [1:168] 0 0.00855 0.01709 0.02564 0.03419 ...
##   ..@ FPR       : num [1:168] 1 0.991 0.983 0.974 0.966 ...
##   ..@ FNR       : num [1:168] 0 0 0 0 0 ...
##   ..@ PPP       : num [1:168] 0.295 0.297 0.299 0.301 0.302 ...
##   ..@ NPP       : num [1:168] NaN 1 1 1 1 ...
##   ..@ MCR       : num [1:168] 0.705 0.699 0.693 0.687 0.681 ...
##   ..@ OR        : num [1:168] NaN Inf Inf Inf Inf ...
##   ..@ kappa     : num [1:168] 0 0.00506 0.01016 0.0153 0.02047 ...
You can also explore the metadata, modelling process and outcomes from the maxent model by opening the html file stored in your local ./outputR1 folder.

References

Merow, C., Smith, M. J. and Silander, J. A. (2013), A practical guide to MaxEnt for modeling species’ distributions: what it does, and why inputs and settings matter. Ecography, 36: 1058–1069. doi:10.1111/j.1600-0587.2013.07872.x

Segurado, P. and Araújo, M. B. (2004), An evaluation of methods for modelling species distributions. Journal of Biogeography, 31: 1555–1568. doi:10.1111/j.1365-2699.2004.01076.x

Skowronek, S., Ewald, M., Isermann, M. et al. (2016). Mapping an invasive bryophyte species using hyperspectral remote sensing data. Biol Invasions doi:10.1007/s10530-016-1276-1

Back to the top

Authors: Sandra Skowronek, Carol Garzon-Lopez, Tarek Hattab, Duccio Rocchini & Jonathan Lenoir

Date: 2017-07-05

Built with RMarkdown and hosted on Github Pages.