Background
– Who are you? asked Mr Doe.
– I’m a Hindu! Namrata from India replied.
– I’m a statistician! said Günther from Germany.
People of different nationalities tend to identify themselves using different characteristics. In India, your identity might rely on your religion, while in other countries your profession might take its place. In Sweden, you might identify yourself with your almost-world-known (!?) personal identification number (“pin”). This 10 digit number is given to you almost immediately after birth and it often stays with you until your very last breath. The number is similar to a “social security number” but it has a much broader use and it is considered public. It is used in public registers (for education, work, tax payment, healthcare, car ownership etc) and it often serves as a membership number or customer id within companies and member unions. It is also essential for example in the public health and quality registers maintained in Sweden (and other Scandinavian countries) and used for reaserch.
Motivation
Naturally, the “pin” is used extensively to distinguish individuals in data sets analysed by R. The number also helps to match data from different sources and it can bring some demographic background data into the bargain, such as birth date (age), sex and geographic origin (depending on your birth year).
Up until now however, with the lack of a consistent R convention to handle “pins”, the number might be treated as either a 10 or 12 digit numeric (with or without century prefix), a character (with hyphen or a ‘+’-sign to distinguish birth date from suffix numbers) or as a factor variable. But the pin is not a number (to add, subtract or logarithm pins is just nonsense) and it contains more information than captured by the individual characters in a string. Luckily, the new R package sweidnumbr
(released on CRAN) is here for rescue!
Example
Let’s look at some data (all pins are fake; they have a valid syntax but do not identify any real individuals):
library(sweidnumbr)
## sweidnumbr: R tools to handle swedish identity numbers.
## https://github.com/rOpenGov/sweidnumbr
knitr::kable(tail(fake_pins,10))
pin | name | |
---|---|---|
53 | 19471130-3022 | TWIST, LIS |
54 | 19440311-1131 | NOBLESSE, RAGNAR JOHN |
55 | 20000805-0523 | NILSSON, CHOK |
56 | 19240622-2286 | CADBURY, LOVISA |
57 | 19020517-1798 | PLOPP, AUGUST |
58 | 20050111-1123 | MINT, MARIA ADA |
59 | 19370215-1590 | NILSSON, BARRY |
60 | 19970430-3023 | BERG, ANTO |
61 | 20031010-1023 | CENTER, PALL |
62 | 20010218-1823 | CACAO, EDA |
So far, pin is just a standard character vector but let’s change that to benefit from all of sweidnumbr
’s features:
pin <- as.pin(fake_pins$pin)
## Assumption: Pin of format YYMMDDNNNC is assumed to be less than 100 years old
str(pin)
## 'AsIs' chr [1:62] "191212121212" "201212121212" "191212121212" ...
We can now also investigate some demographic characteristics almost on the fly (note that pins contained geographical information only up to 1989):
par(mfrow = c(1,2))
hist(pin_age(pin), 20, col = "lightgreen", main = "Age distribution")
## The age has been calculated at 2021-01-29.
pie(table(pin_sex(pin)), main = "Sex distribution")
pin_birthplace(pin[1:8])
## [1] Stockholms län Born after 31 december 1989
## [3] Stockholms län Born after 31 december 1989
## [5] Born after 31 december 1989 Stockholm stad
## [7] Stockholms län Born after 31 december 1989
## 28 Levels: Stockholm stad Stockholms län Uppsala län ... Born after 31 december 1989
Formats
as.pin
can recognize pins in several different formats such as:
as.pin(c("191212121212", "1212121212", "121212-1212", "121212+1212"))
## Assumption: Pin of format YYMMDDNNNC is assumed to be less than 100 years old
## [1] "191212121212" "201212121212" "201212121212" "191212121212"
## Personal identity number(s)
It also checks that the numbers follow the correct pin syntax:
as.pin("181212121212") # Pins were introduced in 1946 and only for people not deceased before that
## Warning in as.pin.character("181212121212"): Erroneous pin(s) (set to NA).
## [1] NA
## Personal identity number(s)
pin_ctrl("191212121211") # The last digit is a control number that is checked against preceeding digits
## [1] FALSE
luhn_algo("191212121211") # The correct control number can be calculated by the Luhn algorithm
## 'multiplier' set to: c(0, 0, 2, 1, 2, 1, 2, 1, 2, 1, 2, 0)
## [1] 2
Organisational numbers
Not only individual has their personal identification number, so do companies and NGO:s. These features are covered by the oin group of functions in the package. Feel free to try them out …
Other countries
An analogous conversion function is availale for the Finnish social security numbers in the sorvi package.
Keep in touch!
… and feel free to suggest enhancements and report bugs to https://github.com/rOpenGov/sweidnumbr/issues