Introduction

We are now living in the Anthropocene, and the biodiversity of our planet faces unprecedented threats. Fortunately, new developments in technology and data collection enable conservation scientists to try new approaches to the problem. Historically, there have been many attempts to create atlases that record where species occur. So far, though, those have focused on mapping observation data only. The abundance of satellite and climate data can be combined with advances in machine learning to create “smarter” maps. Such maps take environmental data into account and can cover much larger areas, including areas where observation data is missing. The current project aims to create such maps for endangered species.

What is Species Distribution Modelling (SDM)?

SDM, sometimes called Ecological Niche Modelling, is a relatively new field in computational biology. It combines environmental data with species occurrence data to create maps of potential habitat. These models and maps can be used not only to map the niches of the species in question, but also to estimate the effects of changed environmental conditions and to predict the distributions of invasive species.

For almost all of the procedures the R package sdmbench was used. For more information, refer to its GitHub repository or the Journal of Open Source Software paper.

Species selection

For this project the initial focus is on European species. The species list is taken from the IUCN European Red List of Threatened Species.

Observation data

The observation data is obtained from the Global Biodiversity Information Facility (GBIF).
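As an illustration of what retrieving such records can look like, here is a minimal Python sketch against GBIF's public occurrence search API. It is not the retrieval code used for the atlas, which relied on an R package. The species name (Lynx pardinus), the restriction to records from Europe, and the helper names build_query_url, extract_points and fetch_points are assumptions chosen for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GBIF_SEARCH_URL = "https://api.gbif.org/v1/occurrence/search"


def build_query_url(scientific_name, limit=300, offset=0):
    """Build a search URL for georeferenced European records of one species."""
    params = {
        "scientificName": scientific_name,
        "hasCoordinate": "true",  # keep only records that carry coordinates
        "continent": "EUROPE",    # assumption: restrict to European records
        "limit": limit,           # page size
        "offset": offset,         # index of the first record on this page
    }
    return f"{GBIF_SEARCH_URL}?{urlencode(params)}"


def extract_points(response):
    """Pull (longitude, latitude) pairs out of a parsed search response."""
    points = []
    for record in response.get("results", []):
        lon = record.get("decimalLongitude")
        lat = record.get("decimalLatitude")
        if lon is not None and lat is not None:
            points.append((lon, lat))
    return points


def fetch_points(scientific_name, limit=300):
    """Download one page of records (requires network access)."""
    with urlopen(build_query_url(scientific_name, limit)) as reply:
        return extract_points(json.load(reply))


if __name__ == "__main__":
    # Hypothetical example species; any name from the species list would do.
    print(build_query_url("Lynx pardinus", limit=10))
```

Only one page of results is fetched here; a full download would have to step through the pages by increasing the offset, and the records would usually need cleaning before modelling.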

Environmental data

The environmental data consists of the so-called “bioclimatic” variables. These are derived from more general measurements, such as monthly temperature and rainfall values, so that they are more biologically meaningful. A detailed description is available on the WorldClim website.
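To make the idea of derived variables concrete, the sketch below computes a handful of the standard bioclimatic variables from twelve monthly values for a single location. The monthly numbers are invented for illustration, the function name bioclim_summary is hypothetical, and the monthly mean temperature is approximated as the midpoint of the monthly minimum and maximum. The atlas itself used ready-made layers rather than code like this.

```python
def bioclim_summary(tmin, tmax, prec):
    """Derive a few bioclimatic variables from 12 monthly values.

    tmin, tmax: monthly minimum and maximum temperature in degrees Celsius.
    prec: monthly precipitation in millimetres.
    """
    if not (len(tmin) == len(tmax) == len(prec) == 12):
        raise ValueError("expected 12 monthly values per variable")
    # Assumption: monthly mean temperature is the midpoint of tmin and tmax.
    tavg = [(lo + hi) / 2 for lo, hi in zip(tmin, tmax)]
    bio5 = max(tmax)
    bio6 = min(tmin)
    return {
        "BIO1": sum(tavg) / 12,  # annual mean temperature
        "BIO5": bio5,            # max temperature of warmest month
        "BIO6": bio6,            # min temperature of coldest month
        "BIO7": bio5 - bio6,     # temperature annual range
        "BIO12": sum(prec),      # annual precipitation
        "BIO13": max(prec),      # precipitation of wettest month
        "BIO14": min(prec),      # precipitation of driest month
    }


if __name__ == "__main__":
    # Invented monthly values for one hypothetical site.
    tmin = [-2, -1, 2, 5, 9, 12, 14, 14, 11, 7, 3, 0]
    tmax = [4, 6, 10, 14, 19, 22, 25, 24, 20, 14, 8, 5]
    prec = [60, 50, 55, 60, 70, 75, 65, 60, 55, 65, 70, 65]
    for name, value in bioclim_summary(tmin, tmax, prec).items():
        print(name, round(value, 2))
```

In a real workflow the same summaries are computed for every grid cell of a raster, which is what turns monthly climate layers into the map layers used as model inputs.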

For the "future" climate scenario, CMIP5 climate projections (50 years) were used.

Machine learning

For the first version of the atlas the Random Forest algorithm was used. This choice is based on its well-known good out-of-the-box performance on a variety of regression and classification tasks.
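To show how a Random Forest can turn presence records and environmental values into a habitat suitability score, here is a small pure-Python sketch. It is not the implementation used for the atlas. The synthetic data, the rule that generates presences, and all function names are assumptions made for illustration, and a real project would normally rely on an established library instead. Each tree is grown on a bootstrap sample and considers a random subset of features at every split; the suitability of a site is the fraction of trees that vote "present".

```python
import random
from collections import Counter


def make_synthetic_data(n, seed):
    """Invented data: 'present' (1) only inside a mild, wet climate window."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        temp = rng.uniform(0, 25)      # annual mean temperature, degrees C
        prec = rng.uniform(200, 1500)  # annual precipitation, mm
        rows.append((temp, prec))
        labels.append(int(8 <= temp <= 16 and prec > 600))
    return rows, labels


def gini(labels):
    """Gini impurity of a non-empty list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels, features):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    for f in features:
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best


def build_tree(rows, labels, depth, rng, n_try):
    """Grow one tree; every split looks at a random subset of n_try features."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return majority  # leaf: predicted class
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), n_try))
    if split is None:
        return majority
    f, t = split
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in left], [labels[i] for i in left], depth - 1, rng, n_try),
            build_tree([rows[i] for i in right], [labels[i] for i in right], depth - 1, rng, n_try))


def predict_tree(node, row):
    """Walk from the root to a leaf and return its class (0 or 1)."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(rows, labels, n_trees=20, max_depth=5, seed=0):
    """Train n_trees trees, each on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    n_try = max(1, round(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 max_depth, rng, n_try))
    return forest


def predict_suitability(forest, row):
    """Fraction of trees that vote 'present' for this site."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)


if __name__ == "__main__":
    rows, labels = make_synthetic_data(200, seed=1)
    forest = fit_forest(rows, labels)
    for site in [(12.0, 1000.0), (2.0, 300.0)]:
        print(site, predict_suitability(forest, site))
```

Applying predict_suitability to the environmental values of every grid cell, rather than to two hand-picked sites, is what would produce a map. Real occurrence data also lacks true absences, so background or pseudo-absence points are typically needed, which this sketch sidesteps by generating both classes synthetically.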