Euro+Med base map in R

20/4/2016

Euro+Med is the most extensive and up to date database on vascular plant species in Europe. It produces distribution maps at the country level for almost all taxa. Assembling its base map in R is tricky but a good lesson in advanced handling of spatial data. Here is one way of doing it:

There are several tricks in the code below. First, we need to prepare two datasets that differ in their administrative bounderies. We will download simple country borders (admin_0) from GADM, except for Russia where we will download regional borders (admin_1). Second, we need to simplify our spatial polygons to allow for efficent plotting of our figure. Last but not least, we need to unite Russian subregions into larger subregions used by Euro+Med.

If you are completely new to spatial data handling and map producing in R, check out the links at the very bottom.

1. Download and prepare data for European countries

Load libraries

library(dplyr)
library(magrittr)
library(rgdal)
library(sp)
library(ggplot2)
library(data.table)
library(rgeos)
library(raster)

Define countries shown in map (except Russia; we will download its data later).

ctry <- c("ALB", "AND", "ARE", "ARM", "AUT", "AZE", "BEL", "BGR", "BIH", "BLR", "CHE", 
          "CYP", "CZE", "DEU", "DNK", "DZA", "EGY", "ESH", "ESP", "EST", 
          "FIN", "FRA", "FRO", "GBR", "GEO", "GGY", "GIB", "GRC",  "GRL", "HRV", 
          "HUN", "IMN", "IRL", "IRN", "IRQ", "ISL", "ISR", "ITA", "JOR", "KAZ", 
          "LBN", "LBY", "LIE", "LTU", "LUX", "LVA", "MAR", "MCO", "MDA", "MKD", 
          "MLI", "MLT", "MNE", "MRT", "NER", "NLD", "NOR", "POL", "PRT", "QAT", 
          "ROU", "SAU", "SDN", "SEN", "SJM", "SMR", "SRB", "SVK", "SVN", "SWE",
          "SYR", "TCD", "TKM", "TUN", "TUR", "UKR", "UZB", "VAT", "XKO")

Download shapefiles for these countries (admin_0 level = country borders only) and bind as elements to a list.

europe.l <- vector("list", length(ctry))

for(i in 1:length(europe.l)) {
  europe.l[[i]] <- getData("GADM", country=ctry[i], 
                           level=0, path="/GADM_admin_0")
}

Assign a country name to all list elements.

filenames <- list.files("/GADM_admin_0/", pattern="*.rds", full.names=TRUE)

names.l <-gsub("^.*_0//GADM_2.8_(\\w{3})_.*", "\\1", filenames)
names(europe.l) <- names.l

I did not know the exact projection used in the original map but the one below comes pretty close. Got EPSG code from here http://www.crs-geo.eu/nn_124226/crseu/EN/CRS__Description/crs-pan-european__node.html?__nnn=true

I got the CRS code for this projection here:

EPSG <- make_EPSG()
EPSG[grep("3034", EPSG$code),]

##       code                      note
## 1501  3034     # ETRS89 / LCC Europe
## 3546 23034     # ED50 / UTM zone 34N
## 4162 30340 # TC(1948) / UTM zone 40N
##                                                                                                                                   prj4
## 1501 +proj=lcc +lat_1=35 +lat_2=65 +lat_0=52 +lon_0=10 +x_0=4000000 +y_0=2800000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
## 3546                                                    +proj=utm +zone=34 +ellps=intl +towgs84=-87,-98,-121,0,0,0,0 +units=m +no_defs
## 4162                                                                               +proj=utm +zone=40 +ellps=helmert +units=m +no_defs

Now, change the projection:

my.CRS <- "+proj=lcc +lat_1=35 +lat_2=65 +lat_0=52 +lon_0=10 +x_0=4000000 +y_0=2800000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs"

europe.l <- lapply(europe.l, function(x) spTransform(x, CRS(my.CRS)))

GADM files are large and have a high resolution since the polygons are made up of many points. For the European dataset, we created several million points. This makes data plotting very slow… But since we will plot at a continent-level, we actually don’t need so many points. The gSimplify function applies the Douglas-Peuker algorithm to drop unnecessary points and make curves less smooth.

europe.simp.l <- lapply(europe.l, function(x) 
                      gSimplify(x, tol=100, topologyPreserve=TRUE))

Next, we need to convert (‘fortify’) our spatial dataset to an ordinary data.frame, so that the ggplot2 graphics package can produce its magic.

europe.simp.fort.l <- lapply(europe.simp.l, fortify)

When we bind the polygons later, we need to ensure that regions have unique IDs

europe.fort.2.l <- vector("list", length=length(europe.simp.fort.l))
for(i in 1:length(europe.simp.fort.l)) {
  europe.fort.2.l[[i]] <- mutate(europe.simp.fort.l[[i]], 
                                 region_ID = names(europe.simp.fort.l)[i]) %>%
    inner_join(europe.l[[i]]@data, by=c("region_ID" = "ISO"))}

Now bind all list elements that include the country data to one single data.frame

europe.simp.fort.df <- bind_rows(europe.fort.2.l)

2. Prepare admin_1 data from Russia

Now, we can prepare the spatial data from Russia, which will show polygons at the Euro+Med subregional level. Then, we will attach these data to our large data.frame that includes data from the remaining countries.

Read in data on admin_1 regions in Russia, set new projection and simplify polygons as above.

russia_1 <- getData("GADM", country="RUS", 
                       level=1, path="/GADM_admin_1")


russia_1 <- spTransform(russia_1, 
                           CRS(my.CRS))


russia_1.simp <- gSimplify(russia_1, tol=100, topologyPreserve=TRUE) %>%
                    SpatialPolygonsDataFrame(data=russia_1@data)

We need a table (“lexicon”) that assignes a subregion (region_M) to every admin_1 region. You can download my lexicon here: http://viktoriawagner.weebly.com/links.html

admin.regions <- fread("Continent_Country_Region_names.txt",
                       sep="\t", colClasses = "character", data.table=F, header=T)

Combine the spatial dataset in Russia with our lexicon Name_1 is the key variable (the name of the admin1 regions) Its levels have be identical in both datasets.

russia_1.simp@data <- left_join(russia_1.simp@data, 
                                admin.regions[admin.regions$country=="russia",], 
                                by ="NAME_1")

Shapefile row.names and row.names in the included data must be identical

row.names(russia_1.simp) <- row.names(russia_1.simp@data)

IDs must be adjusted, too.

russia_1.simp <- spChFIDs(russia_1.simp, row.names(russia_1.simp))

Now comes the tricky part, in which we dissolve our admin_1 regions to the higher subregion units.

russia_1.simp.un <- russia_1.simp %>% 
  gUnaryUnion(id = russia_1.simp@data$region_EM)

Fortify the table (convert the shapefile to a data.frame) and join back with the data table of the shapefile to preserve the information on subregions.

russia_1.simp.un.fort <- fortify(russia_1.simp.un)

3. Combine European and Russian data.frames

europe.simp.fort.df %<>% mutate(new.group = paste(ID_0, group, sep=""))

Define a new group ID for Russia, analogous to above We also add “NAME_ENGLISH”, in case we want to query by country name.

russia_1.simp.un.fort %<>% 
  rename(new.group = group, ID_0 = id) %>%
  mutate(NAME_ENGLISH=gsub("^(.*)\\.\\d+$", "\\1",new.group))

In order to bind the rows of both datasets, we need to have the same set of columns in the correct order.

colnames(europe.simp.fort.df)

##  [1] "long"          "lat"           "order"         "hole"         
##  [5] "piece"         "id"            "group"         "region_ID"    
##  [9] "OBJECTID"      "ID_0"          "NAME_ENGLISH"  "NAME_ISO"     
## [13] "NAME_FAO"      "NAME_LOCAL"    "NAME_OBSOLETE" "NAME_VARIANTS"
## [17] "NAME_NONLATIN" "NAME_FRENCH"   "NAME_SPANISH"  "NAME_RUSSIAN" 
## [21] "NAME_ARABIC"   "NAME_CHINESE"  "WASPARTOF"     "CONTAINS"     
## [25] "SOVEREIGN"     "ISO2"          "WWW"           "FIPS"         
## [29] "ISON"          "VALIDFR"       "VALIDTO"       "POP2000"      
## [33] "SQKM"          "POPSQKM"       "UNREGION1"     "UNREGION2"    
## [37] "DEVELOPING"    "CIS"           "Transition"    "OECD"         
## [41] "WBREGION"      "WBINCOME"      "WBDEBT"        "WBOTHER"      
## [45] "CEEAC"         "CEMAC"         "CEPLG"         "COMESA"       
## [49] "EAC"           "ECOWAS"        "IGAD"          "IOC"          
## [53] "MRU"           "SACU"          "UEMOA"         "UMA"          
## [57] "PALOP"         "PARTA"         "CACM"          "EurAsEC"      
## [61] "Agadir"        "SAARC"         "ASEAN"         "NAFTA"        
## [65] "GCC"           "CSN"           "CARICOM"       "EU"           
## [69] "CAN"           "ACP"           "Landlocked"    "AOSIS"        
## [73] "SIDS"          "Islands"       "LDC"           "new.group"

colnames(russia_1.simp.un.fort)

## [1] "long"         "lat"          "order"        "hole"        
## [5] "piece"        "ID_0"         "new.group"    "NAME_ENGLISH"

Change order of columns, then bind datasets by row. A warning might come up that notifies you about coercing to character, which is fine.

europe.simp.fort.df %<>% 
  dplyr::select(long, lat, order, hole, piece, 
                ID_0, NAME_ENGLISH, new.group)

russia_1.simp.un.fort %<>% 
  dplyr::select(long, lat, order, hole, piece, 
                ID_0, NAME_ENGLISH, new.group)

if(identical(colnames(europe.simp.fort.df), colnames(russia_1.simp.un.fort))==T){
  europe.df <- rbind(europe.simp.fort.df, russia_1.simp.un.fort)
} else{"Binding unsuccessful"}

4. Produce plot

Set themes (the “look” of the figure) for ggplot2:

my_theme <- theme(axis.text.y = element_blank(),
                  legend.position = "none",
                  panel.background = element_rect(fill = "white"),
                  panel.grid.major = element_line(colour = "white"),
                  panel.grid.minor = element_line(colour = "white"),
                  axis.ticks = element_blank(), 
                  axis.text.x = element_blank(),
                  axis.line = element_blank(),
                  axis.title.y = element_blank(),
                  axis.title.x = element_blank())

Set the horizontal and vertical limits of the map:

longlimits <- c(1000000, 7436135.8)
latlimits <- c(-509971, 7034494)

Define the colors that polygons should be filled with:

non.EM <- c("Asian part of Russia", "Chad", "Greenland","Kazakhstan", 
            "Mauritania", "Mali","Niger","Qatar","Saudi Arabia", 
            "Senegal", "Sudan", "Turkmenistan", "United Arab Emirates",
            "Uzbekistan", "Western Sahara")

europe.df %<>% 
  mutate(fill.col = ifelse(NAME_ENGLISH %in% non.EM, "EM", "non_EM"))

Write figure to file. This can take some minutes.

ptm <- proc.time()
png("test_red_0.2.png", 
    width = 4, height = 4, units = 'in', res = 200)
ggplot(data=europe.df,   aes(long, lat, group=new.group, fill=fill.col)) +
  geom_polygon(color=NA) + 
  coord_cartesian(xlim = longlimits, ylim = latlimits) + 
  my_theme+
  scale_fill_manual(values=c("grey", "darkgreen"), guide="none")
dev.off()

## 2

diff.png <-  proc.time() - ptm 

diff.png

##    user  system elapsed 
## 253.132   2.039 256.016

Some useful links if you want to learn the basics of spatial data handling and map production in R:
http://eriqande.github.io/rep-res-web/lectures/making-maps-with-R.html
http://rpsychologist.com/working-with-shapefiles-projections-and-world-maps-in-ggplot
http://www.maths.lancs.ac.uk/~rowlings/Teaching/UseR2012/cheatsheet.html

I found these sites helfpul when compiling the Euro+Med base map:
http://dev.openlayers.org/examples/simplify-linestring.html
http://gis.stackexchange.com/questions/62292/how-to-speed-up-the-plotting-of-polygons-in-r

0 Comments

Your comment will be posted after it is approved.