This function takes administrative names and cleans them using various matching and string distance algorithms. It can also match the cleaned names against a base list provided by the user or fetched from `GeoNames`, which is an official repository of standard spellings of all foreign geographic names.
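To give a feel for the string distance matching mentioned above, here is a minimal sketch using only base R. It is not the package's implementation: `match_closest()` is a hypothetical helper introduced for illustration, and it assumes a single Levenshtein-distance pass via `utils::adist()`, whereas the function described here combines several algorithms.

```r
# Hypothetical helper, not part of the package: map each name to its
# closest candidate in a base list by edit (Levenshtein) distance.
match_closest <- function(names_to_clean, base_names) {
  # adist() returns a distance matrix: one row per name to clean,
  # one column per base name.
  d <- utils::adist(names_to_clean, base_names, ignore.case = TRUE)
  # For each row, pick the base name with the smallest distance.
  base_names[apply(d, 1, which.min)]
}

base_names <- c("Paris", "Marseille", "Lyon")
unclean_names <- c("Pariis", "Marseill", "Lyone")

match_closest(unclean_names, base_names)
#> [1] "Paris"     "Marseille" "Lyon"
```

A nearest-match rule like this always returns some candidate, even for names with no sensible counterpart, which is one reason a detailed report of the matches is useful for checking results.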
Usage
clean_admin_names(
  admin_names_to_clean,
  country_code,
  admin_level = "adm2",
  user_base_admin_names = NULL,
  user_base_only = FALSE,
  report_mode = FALSE
)
Arguments
- admin_names_to_clean
A character vector of administrative names to clean.
- country_code
Used if `use_get_admin_names` is TRUE. A character string or numerical value of the country code (e.g., "KE"). This can be in various formats such as country name, ISO codes, UN codes, etc.; see
countrycode::codelist()
for the full list of codes and naming conventions used.
- admin_level
A character string indicating the administrative level (e.g., "adm2").
- user_base_admin_names
A character vector of administrative names that the user would like to use as a reference. This is not required; downloaded `GeoNames` data will be used if it is missing.
- user_base_only
A logical indicating whether to use only the user-provided base administrative names (`user_base_admin_names`) for matching. If TRUE, `country_code` and `admin_names_to_clean` are not required. Default is FALSE.
- report_mode
A logical indicating whether to return a detailed report. Default is FALSE.
Value
If `report_mode` is TRUE, a data frame containing the original admin names and the matched and cleaned admin names, with information on the source of the data used for cleaning, including the algorithm used. Otherwise, a cleaned vector of names is returned.
See also
countrycode::codelist()
for the full list of codes and naming conventions.
Examples
# \donttest{
# Example with country code
base_names <- c(
  "Paris", "Marseille", "Lyon",
  "Toulouse", "Nice", "Nantes", "Strasbourg",
  "Montpellier", "Bordeaux", "Lille"
)
unclean_names <- c(
  "Pariis", "Marseill", "Lyone",
  "Toulous", "Niice", "Nantees", "Strasbourgh",
  "Montpeelier", "Bordeuax", "Lilie"
)
france_new <- clean_admin_names(
  country_code = "Fr",
  user_base_admin_names = base_names,
  admin_names_to_clean = unclean_names
)
#> There are 10 out of 10 (100%) admins that have been perfectly matched!
#> Use `report_mode` to double check your matches.
print(france_new)
#> [1] "Paris" "Marseille" "Lyon" "Toulouse" "Nice"
#> [6] "Nantes" "Strasbourg" "Montpellier" "Bordeaux" "Lille"
# }