For my first thesis, I spent hours comparing taxonomy and spelling against books and online databases. Luckily, those days are over, thanks to some recent R packages. Take, for example, the Taxonstand package.
Installation: Taxonstand is available on CRAN and can be installed directly from R (note the capital T; package names are case-sensitive):
> install.packages("Taxonstand")
Data cleaning with Taxonstand
Let's create a short taxon string.
To keep things simple, we omit any typos and author names:
> spec1 <- c("Stipa capillata",
+ "Salvia tortuosa",
+ "Bryum capillare")
Load Taxonstand and use the TPL() function:
> require(Taxonstand)
> TPL(spec1)
## [1] "0 substitutions of standard annotations in specific epithet"
## [1] "0 substitutions of subsp. and var. in infraspecific epithet"
##    Genus   Species Abbrev Infraspecific Plant.Name.Index TPL_version
## 1  Stipa capillata   <NA>                           TRUE         1.1
## 2 Salvia  tortuosa   <NA>                           TRUE         1.1
## 3  Bryum capillare   <NA>                           TRUE         1.1
##   Taxonomic.status    Family    New.Genus New.Species New.Infraspecific
## 1         Accepted   Poaceae        Stipa   capillata
## 2         Accepted Lamiaceae       Salvia    tortuosa
## 3          Synonym  Bryaceae Ptychostomum   capillare
##                             Authority  Typo WFormat
## 1                                  L. FALSE   FALSE
## 2                               Kunth FALSE   FALSE
## 3 (Hedw.) D. T. Holyoak & N. Pedersen FALSE   FALSE
In my first string, Taxonstand recognized all names and split each one into a genus and a species epithet. It also returned the matching family names, which is a nice feature. According to TPL (or more correctly: The Plant List), 'Bryum capillare' is a synonym, and the accepted name is 'Ptychostomum capillare'.
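The genus/epithet split itself is easy to reproduce in base R, which can be handy for checking your input before it goes anywhere near The Plant List. This is only a minimal sketch, assuming every name is a plain binomial separated by whitespace; the helper name split_binomial is made up for this post and is not part of Taxonstand:

```r
# Hypothetical helper, NOT part of Taxonstand: split "Genus epithet"
# strings into two columns. Assumes plain binomials without author names.
split_binomial <- function(x) {
  parts <- strsplit(trimws(x), "\\s+")
  data.frame(
    Genus   = vapply(parts, function(p) p[1], character(1)),
    Species = vapply(parts, function(p) p[2], character(1)),
    stringsAsFactors = FALSE
  )
}

spec1 <- c("Stipa capillata", "Salvia tortuosa", "Bryum capillare")
split_binomial(spec1)
```

Anything after the second word (an author name, for instance) is silently ignored by this sketch, so it is no substitute for the parsing that TPL() does.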
Now, let's make things a little bit more complicated by introducing an (unnecessary) author name and spelling mistakes, both in the genus name ('Saalvia' and 'Bryuum') and in the epithet ('tortuuosa').
> spec2 <- c("Stipa capillata L.",
+ "Salvia tortuuosa",
+ "Saalvia tortuosa",
+ "Bryuum capillare")
> require(Taxonstand)
> TPL(spec2, corr=TRUE)
## [1] "0 substitutions of standard annotations in specific epithet"
## [1] "0 substitutions of subsp. and var. in infraspecific epithet"
##     Genus   Species Abbrev Infraspecific Plant.Name.Index TPL_version
## 1   Stipa capillata   <NA>            L.             TRUE         1.1
## 2  Salvia tortuuosa   <NA>                           TRUE         1.1
## 3 Saalvia  tortuosa   <NA>                          FALSE         1.1
## 4  Bryuum capillare   <NA>                          FALSE         1.1
##   Taxonomic.status    Family New.Genus New.Species New.Infraspecific
## 1         Accepted   Poaceae     Stipa   capillata
## 2         Accepted Lamiaceae    Salvia    tortuosa
## 3                              Saalvia    tortuosa                NA
## 4                               Bryuum   capillare                NA
##   Authority  Typo WFormat
## 1        L. FALSE   FALSE
## 2     Kunth  TRUE   FALSE
## 3           FALSE   FALSE
## 4           FALSE   FALSE
In my second string, Taxonstand successfully removed an author name (in 'Stipa capillata L.'). Interestingly, it saved the author name as an 'Infraspecific' value.
However, it recognized only one of my three typos. The typo in the species epithet ('tortuuosa') was corrected and flagged with Typo = TRUE, but the two typos in the genus names went unrecognized. For those names, Taxonstand simply recycled the input in columns $New.Genus and $New.Species and set Plant.Name.Index to FALSE.
According to Luis Cayuela, this is not the fault of Taxonstand but rather of The Plant List. Taxonstand accesses information through the front end of The Plant List, so it retrieves the same information an online user would. For example, if you use the online search field of The Plant List and include a spelling mistake in the species epithet, The Plant List will offer you all species names in the respective genus. Enter a spelling mistake in the genus and you will not get any hits at all.
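Because a misspelled genus returns no hits, one possible workaround is to check the genus names locally before querying. The following is only a sketch, assuming you keep your own vector of trusted genus names (known_genera below is a made-up three-item list) and are happy to use the edit distance from base R's adist() as the similarity measure. suggest_genus is a hypothetical helper, not a Taxonstand function:

```r
# Made-up reference list; in practice this would be your own checklist.
known_genera <- c("Stipa", "Salvia", "Bryum")

# Hypothetical helper, NOT part of Taxonstand: return the closest known
# genus if it is within max_dist edits, otherwise NA.
suggest_genus <- function(genus, reference, max_dist = 1) {
  d <- adist(genus, reference)            # rows = input, columns = reference
  best <- apply(d, 1, which.min)          # index of the nearest reference name
  best_dist <- d[cbind(seq_along(genus), best)]
  ifelse(best_dist <= max_dist, reference[best], NA_character_)
}

genera <- c("Stipa", "Saalvia", "Bryuum", "Quercuss")
suggest_genus(genera, known_genera)
```

With this input, 'Saalvia' and 'Bryuum' are each one edit away from a known genus and get a suggestion, while 'Quercuss' has no close match in this tiny list and comes back as NA. Treat the suggestions as candidates to check by eye, not as automatic corrections.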
By the way, you can merge the new genus and species names by using paste:
> TPL_spec1 <- TPL(spec1)
## [1] "0 substitutions of standard annotations in specific epithet"
## [1] "0 substitutions of subsp. and var. in infraspecific epithet"
> new.names <- paste(TPL_spec1$New.Genus, TPL_spec1$New.Species)
> new.names
## [1] "Stipa capillata"        "Salvia tortuosa"
## [3] "Ptychostomum capillare"
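Note that paste() on its own also glues together the recycled, unrecognized names. A possible refinement is to keep the new name only where Plant.Name.Index is TRUE and to set everything else aside for manual checking. The sketch below works on a hand-typed data frame that copies a few columns and values from the second TPL() output shown earlier, so it runs without querying The Plant List; the object names tpl_out, clean.name and to_check are my own:

```r
# Hand-typed stand-in for a TPL() result; values copied from the output above.
tpl_out <- data.frame(
  Genus            = c("Stipa", "Salvia", "Saalvia", "Bryuum"),
  Species          = c("capillata", "tortuuosa", "tortuosa", "capillare"),
  Plant.Name.Index = c(TRUE, TRUE, FALSE, FALSE),
  New.Genus        = c("Stipa", "Salvia", "Saalvia", "Bryuum"),
  New.Species      = c("capillata", "tortuosa", "tortuosa", "capillare"),
  stringsAsFactors = FALSE
)

matched <- tpl_out$Plant.Name.Index

# Standardized name where a match was found, NA otherwise
tpl_out$clean.name <- ifelse(matched,
                             paste(tpl_out$New.Genus, tpl_out$New.Species),
                             NA_character_)

# Original names that still need manual checking
to_check <- paste(tpl_out$Genus, tpl_out$Species)[!matched]
to_check
```

Here to_check holds 'Saalvia tortuosa' and 'Bryuum capillare', the two names with genus typos.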
In summary, Taxonstand is a nice little package that can help you clean and standardize your taxon names. But beware of its limitations: as shown above, misspelled genus names are not corrected.