Entity resolution requires strategy, particularly in weeding out duplicate records stored within the same data system. The very nature of finding these duplicates requires commonality among certain attributes like full names, addresses, birth dates and gender. For the most part, these attributes are inconsistent and require specialized algorithms to help determine a level of confidence.
For example, first names often differ between two records for the same person (Jimmy, Jim and James). Addresses change when people move. Birth dates are often missing. Even gender can be up for debate. (Is “Chris” equivalent to Christina or Christopher?)
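To make the name problem concrete, here is a minimal sketch of how a confidence score for two first names might be computed. The nickname table, the function names and the scoring rule are all assumptions made for illustration; a production matcher would use a far larger nickname dictionary and a tuned similarity measure.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table: each nickname maps to the formal names
# it might stand for. An ambiguous nickname maps to several.
NICKNAMES = {
    "jim": {"james"},
    "jimmy": {"james"},
    "chris": {"christina", "christopher"},
}


def canonical_names(first_name):
    """Return the set of formal names a given first name could represent."""
    key = first_name.strip().lower()
    return NICKNAMES.get(key, {key})


def name_confidence(a, b):
    """Score between 0.0 and 1.0 that two first names refer to the same name."""
    names_a = canonical_names(a)
    names_b = canonical_names(b)
    if names_a & names_b:
        # A shared formal name is a match, but an ambiguous nickname
        # (more than one candidate) lowers the confidence.
        return 1.0 / max(len(names_a), len(names_b))
    # No shared formal name: fall back to plain string similarity.
    return max(
        SequenceMatcher(None, x, y).ratio()
        for x in names_a
        for y in names_b
    )
```

Under these assumptions, "Jimmy" and "James" score as a full match, while "Chris" and "Christina" score only 0.5, because "Chris" could just as well be Christopher.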
The only attribute that does seem to provide a level of confidence, at least for US populations, is the Social Security Number (SSN). SSN is often the most critical attribute to determine duplication, so much so that several of the healthcare projects with which I have worked assume the same SSN in two different patient records to be synonymous with duplication, despite any other potential unique identifiers associated with those records.
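The SSN-equals-duplicate rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not the logic of any particular healthcare system: the record layout (dictionaries with `id` and `ssn` keys) and the function names are hypothetical, and the sample values use the 000 area prefix, which is never issued.

```python
from collections import defaultdict


def normalize_ssn(raw):
    """Strip formatting from an SSN; return None unless exactly 9 digits remain."""
    digits = "".join(ch for ch in (raw or "") if ch.isdigit())
    return digits if len(digits) == 9 else None


def duplicates_by_ssn(records):
    """Group record ids by normalized SSN and keep groups with more than one id."""
    groups = defaultdict(list)
    for rec in records:
        ssn = normalize_ssn(rec.get("ssn"))
        if ssn is not None:  # records without a usable SSN cannot be matched this way
            groups[ssn].append(rec["id"])
    return {ssn: ids for ssn, ids in groups.items() if len(ids) > 1}


# Hypothetical patient records for illustration only.
records = [
    {"id": 1, "ssn": "000-12-3456"},
    {"id": 2, "ssn": "000123456"},
    {"id": 3, "ssn": None},
    {"id": 4, "ssn": "000-98-7654"},
]
suspected = duplicates_by_ssn(records)
```

Here records 1 and 2 are flagged as duplicates regardless of their other attributes, which is exactly why the rule is convenient and why it fails when an SSN is mistyped, shared or stolen.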
The Social Security Number came into existence in 1936 and was intended only for tracking workers' earnings under the Social Security program. In fact, from 1946 until 1972, the legend “FOR SOCIAL SECURITY PURPOSES–NOT FOR IDENTIFICATION” appeared on each card. Thus, the tradition of using the Social Security Number to distinguish among US patient records appears to be one adopted out of convenience rather than design.
This strategy worked well in a pre-Internet world, but it is slowly becoming obsolete. Private information has become increasingly easy for hackers to access and SSN-only fraud makes up the majority of cases in identity theft.
The use of the SSN as an authenticator has been so disconcerting that, in 2002, IBM requested that the more than 100 companies offering health insurance to its workers refrain from printing Social Security Numbers on their insurance cards. The handful of health insurance companies that responded to this request replied with a blunt “no.”
In several other countries, such as Denmark and Australia, the use of a national healthcare ID has gained some traction, and other Western countries are watching closely.
The U.S. military has begun to take some initiative on this front. In response to a congressional mandate, the DOD has begun issuing to all U.S. military personnel a new identification card with a unique reference number. This value, known as the DOD EDI PN, will be printed on the face of the card in place of the SSN, effectively serving the identification purpose that the SSN was never intended to have.
More importantly, because the DOD EDI PN is associated with an individual role and does not serve dual purposes, such as those tied to credit card and banking information, the potential risks of exposing a DOD EDI PN to the general public are greatly reduced.
The goal of establishing a unique number for all DOD personnel will take several years, and there will undoubtedly be some bumps along the way. In the private sector, the use of the Social Security Number does not appear to be going away anytime soon. Too many legacy systems rely on it to establish uniqueness, and it remains the best identifier until a new and inexpensive standard comes along.
Until that day comes, we will continue to provide our private information across open networks, avoiding the bad guys as best we can. It is ironic that in our social age, our Social Security number is anything but secure.