Species Distribution Models (Correlative)

Also known as: SDMs, Niche Models

Variants: Generalised linear models, Generalised additive models, Maximum Entropy Models, Multivariate Adaptive Regression Splines, Neural Networks, Boosted Regression Trees, Random Forests

One sentence description: Predict the probability of presence of a species at a given location, normally as a function of the environmental properties associated with that location. Usually applied across a mapped landscape.

Key references:

Key examples:

Description: At their heart, statistical SDMs contain relationships between predictor variables that they can process (most commonly environmental variables such as mean annual temperature) and some correlate of the species' probability of occurrence. SDMs are most commonly implemented by running the model over a landscape for which the necessary environmental variables have been mapped, producing a map of the species' probability of occurrence.
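
The two steps described above (fit a relationship between an environmental predictor and occurrence, then run it over a mapped landscape) can be sketched as follows. This is only a minimal illustration, not a recommended workflow: it assumes a single predictor (mean annual temperature), synthetic presence-absence data, and a logistic regression (a generalised linear model with a logit link) fitted by plain gradient descent. All names, values and the 3 x 3 "landscape" are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit p(presence) = sigmoid(b0 + b1 * x) by batch gradient descent on log loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # gradient of log loss w.r.t. linear predictor
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


# Synthetic survey data (an assumption for illustration): the species is
# more likely to be present at warmer sites.
rng = random.Random(42)
temps = [rng.uniform(0.0, 30.0) for _ in range(300)]
presence = [1 if rng.random() < sigmoid(0.4 * (t - 15.0)) else 0 for t in temps]

# Standardise the predictor so gradient descent behaves well.
mean_t = sum(temps) / len(temps)
sd_t = math.sqrt(sum((t - mean_t) ** 2 for t in temps) / len(temps))
xs = [(t - mean_t) / sd_t for t in temps]

b0, b1 = fit_logistic(xs, presence)

# Hypothetical mapped landscape: mean annual temperature (deg C) per grid cell.
landscape_temp = [
    [4.0, 9.0, 14.0],
    [11.0, 16.0, 21.0],
    [18.0, 23.0, 28.0],
]

# Run the fitted model over every cell to obtain a map of occurrence probability.
prob_map = [
    [sigmoid(b0 + b1 * (t - mean_t) / sd_t) for t in row]
    for row in landscape_temp
]

for row in prob_map:
    print(" ".join(f"{p:.2f}" for p in row))
```

In practice the fitting step would normally be delegated to an established statistical package rather than written by hand; the hand-written fit is used here only to keep the sketch self-contained.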

Ecology typically predicted: Either the probability of occurrence of individual species or their relative frequency. Sometimes implemented for multiple species at once, enabling insights into variation in species richness.
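
One simple way to go from several single-species maps to species richness is to sum the predicted occurrence probabilities cell by cell (often called "stacking"). The sketch below assumes three hypothetical species with made-up probability maps on the same 2 x 2 grid; the sum is the expected number of species per cell only to the extent that the individual probabilities are well calibrated. It is an illustration of one approach, not necessarily the one used in any particular study.

```python
def expected_richness(prob_maps):
    """Sum per-species occurrence probabilities cell by cell.

    prob_maps: list of 2D lists, one per species, all with the same shape.
    Returns a 2D list holding the expected number of species in each cell.
    """
    n_rows = len(prob_maps[0])
    n_cols = len(prob_maps[0][0])
    return [
        [sum(m[r][c] for m in prob_maps) for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Hypothetical occurrence-probability maps for three species.
species_a = [[0.9, 0.5], [0.2, 0.1]]
species_b = [[0.8, 0.5], [0.6, 0.1]]
species_c = [[0.1, 0.5], [0.7, 0.2]]

richness = expected_richness([species_a, species_b, species_c])
```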

Pre-requisite skills: Correlative species distribution models are one of the most widely used methods in predictive ecology and there is a wide variety of software tools to facilitate their use and development. As a consequence, it is relatively easy for people to build such models, although expert knowledge is needed to build good models. Key considerations include (i) taking steps to avoid overfitting, (ii) accounting for sampling bias and (iii) avoiding misleading extrapolations.
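
As an illustration of consideration (iii), one basic check is to flag map cells whose environmental values fall outside the range seen in the training data, since predictions there are extrapolations. The sketch below is a deliberately simple per-variable range check with hypothetical variable names and values; it does not detect novel combinations of variables that are individually within range.

```python
def training_range(samples):
    """Return {variable: (min, max)} over training samples given as dicts."""
    variables = samples[0].keys()
    return {
        v: (min(s[v] for s in samples), max(s[v] for s in samples))
        for v in variables
    }


def is_extrapolation(cell, ranges):
    """True if any variable in the cell lies outside its training range."""
    return any(not (lo <= cell[v] <= hi) for v, (lo, hi) in ranges.items())


# Hypothetical environmental conditions at the training sites.
training_sites = [
    {"temp_c": 5.0, "rain_mm": 400.0},
    {"temp_c": 12.0, "rain_mm": 900.0},
    {"temp_c": 20.0, "rain_mm": 1200.0},
]
ranges = training_range(training_sites)

inside = {"temp_c": 10.0, "rain_mm": 800.0}   # within the sampled conditions
outside = {"temp_c": 25.0, "rain_mm": 800.0}  # warmer than any training site

flag_inside = is_extrapolation(inside, ranges)
flag_outside = is_extrapolation(outside, ranges)
```

Cells flagged in this way could be masked or reported separately on the output map, so that readers can see where the model is being applied beyond its data.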

Strengths: Correlative species distribution models can be used on presence-only and presence-absence data, which are often the only information available for many species. As a result, they can be used to extend what can be hypothesised about the distributions of those species. They are relatively easy to question and understand in terms of their underlying assumptions. They are also easy to implement.

Limitations: The ease of use and implementation of correlative species distribution models is open to abuse: easy-to-use software and libraries make it possible to create models without applying good scientific practice. Correlative species distribution models do not incorporate hypotheses or representations of the underlying biology and are thus very likely to make misleading predictions under extrapolation.

Data requirements: Presence-only data or presence-absence data. If using presence-only data, statements can only be made about the relative abundance of occurrence records across a landscape, whereas presence-absence data can lead to estimates of a true probability of occurrence.

Resources: There is a wide range of software resources for fitting correlative SDMs. Many can be implemented via the statistical software R. Generalised linear models and generalised additive models are frequently used.

Validation: A wide variety of methods are used to validate correlative SDMs, each with its own strengths and weaknesses. Two of the most commonly adopted and accepted practices are cross-validation and the use of independent test data. Both involve assessing model performance against data that were not used in model parameterisation, and care should be taken that these evaluation datasets are as independent of the training datasets as possible.
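
A minimal sketch of cross-validation with an eye to independence is given below. It assumes each record carries a spatial block label and holds out whole blocks at a time, so that test sites are not immediate neighbours of training sites, and it scores each fold with AUC (the probability that a randomly chosen presence is ranked above a randomly chosen absence). The data, the block labels and the crude "distance to mean presence temperature" model are hypothetical stand-ins for a real dataset and a real SDM.

```python
import random


def auc(scores, labels):
    """Rank-based AUC: share of presence/absence pairs ranked correctly (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one presence and one absence")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def block_folds(block_ids, k, seed=0):
    """Assign whole spatial blocks to k folds and yield (train_idx, test_idx)."""
    blocks = sorted(set(block_ids))
    random.Random(seed).shuffle(blocks)
    fold_of = {b: i % k for i, b in enumerate(blocks)}
    for fold in range(k):
        test = [i for i, b in enumerate(block_ids) if fold_of[b] == fold]
        train = [i for i, b in enumerate(block_ids) if fold_of[b] != fold]
        yield train, test


def fit(temps, labels):
    """Stand-in model: remember the mean temperature of training presences."""
    present = [t for t, y in zip(temps, labels) if y == 1]
    return sum(present) / len(present)


def predict(model, temps):
    """Score sites higher the closer they are to the mean presence temperature."""
    return [-abs(t - model) for t in temps]


# Hypothetical records: (spatial block, mean annual temperature, presence).
records = [
    ("A", 6.0, 0), ("A", 11.0, 0), ("A", 17.0, 1), ("A", 23.0, 1),
    ("B", 8.0, 0), ("B", 13.0, 1), ("B", 19.0, 1), ("B", 25.0, 1),
    ("C", 5.0, 0), ("C", 10.0, 0), ("C", 15.0, 0), ("C", 21.0, 1),
    ("D", 7.0, 0), ("D", 12.0, 0), ("D", 18.0, 1), ("D", 24.0, 1),
]
block_ids = [r[0] for r in records]
temps = [r[1] for r in records]
labels = [r[2] for r in records]

fold_aucs = []
for train_idx, test_idx in block_folds(block_ids, k=2):
    model = fit([temps[i] for i in train_idx], [labels[i] for i in train_idx])
    scores = predict(model, [temps[i] for i in test_idx])
    fold_aucs.append(auc(scores, [labels[i] for i in test_idx]))

print("AUC per fold:", [round(a, 2) for a in fold_aucs])
```

Splitting by block rather than by individual record is one way of making the evaluation data more independent of the training data; randomly splitting nearby records between training and test sets tends to give more optimistic scores.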

Other Uses: The methods used in correlative species distribution modelling are common statistical and machine learning methods that are successfully applied in a wide variety of disciplines.
