The late Peter Falk played Lieutenant Columbo, the iconic television detective of the 1970s. The cigar-smoking, trench-coat-wearing Columbo was shabby and unassuming, and his murder suspects (usually the rich and powerful of Los Angeles) would initially judge this book by its cover. But Columbo, bumbling as he seemed, was a brilliant strategist. His polite line of questioning eventually unraveled the complex schemes his suspects had crafted. (Just as he was about to leave, he would say, “Oh… just one more thing…”, a line that became his famous catchphrase.)
The point was never discovering how the murder was committed: viewers usually witnessed the crime and its aftermath in the opening scenes. The point was watching Columbo put the pieces together through logical reasoning, miscellaneous clues, and contradictory statements from his suspects. No matter how confident the perpetrator, no crime was perfect, and we got to watch a true master at work.
Data analysts facing data quality errors operate much like Lieutenant Columbo. They start with a hypothesis about what went wrong with the data (i.e., what “killed” it) and then take steps to narrow down the suspects.
With data systems, this means going back in time to pinpoint the scene of the crime: where and how the error was introduced. With business owners, it means diplomatic questioning to determine which business rules were applied when the data was manipulated and which exceptions might have come into play.
Once the suspect has been identified, data analysts, like Columbo, must take the difficult step of proving the source of the errors to others, so that even the most skeptical stakeholder says, “Yes, I see it now.” Politics and territorial pride make this step perilous in its own right, but determined data analysts must be willing to pursue it despite resistance from those around them.
This type of detective work is not limited to human beings. One example of automated suspect identification is Akinator, an entertaining take on the classic car-trip game 20 Questions. I’m not exactly sure how it works, so curious readers might want to try it out and share their ideas in the comments below.
Another impressive approach to weeding out criminals is a technique called Non-Obvious Relationship Awareness (NORA). NORA enables systems to find relationships in real time across streams of names, credit card transactions, terrorist watch lists, money transfers, and so on. When it detects a conflict of interest among otherwise unrelated data (e.g., bad-guy records commingled with good-guy records), the system sends an alert to the appropriate authorities.
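To make the idea more concrete, here is a toy sketch of the general principle behind relationship awareness. It is not how IBM or any other vendor implements NORA; it simply assumes each record carries a few identifying attributes (phone, address, name) and links any records that share an exact attribute value, transitively. Clusters that mix watch-list records with records from other sources are flagged. The names `Record`, `UnionFind`, and `find_alerts`, and the source labels, are hypothetical and chosen only for illustration. Real systems rely on fuzzy matching and far richer rules, since exact matches on something like a name would link unrelated people.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    record_id: str
    source: str             # hypothetical labels, e.g. "customers", "employees", "watchlist"
    attributes: frozenset   # (attribute_name, value) pairs, e.g. ("phone", "555-0100")


class UnionFind:
    """Disjoint-set structure used to group records into linked clusters."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def find_alerts(records, flagged_source="watchlist"):
    """Link records that share any exact attribute value (transitively), then
    return the sorted record ids of every cluster that mixes flagged records
    with records from at least one other source."""
    uf = UnionFind()
    first_owner = {}  # attribute pair -> id of the first record seen with it
    for rec in records:
        uf.find(rec.record_id)  # register every record, even unlinked ones
        for attr in rec.attributes:
            if attr in first_owner:
                uf.union(first_owner[attr], rec.record_id)
            else:
                first_owner[attr] = rec.record_id

    clusters = defaultdict(list)
    for rec in records:
        clusters[uf.find(rec.record_id)].append(rec)

    alerts = []
    for members in clusters.values():
        sources = {m.source for m in members}
        if flagged_source in sources and len(sources) > 1:
            alerts.append(sorted(m.record_id for m in members))
    return alerts


if __name__ == "__main__":
    # Made-up sample data: an employee shares an address with a watch-list entry.
    records = [
        Record("cust-1", "customers", frozenset({("phone", "555-0100")})),
        Record("emp-7", "employees", frozenset({("address", "12 Elm St")})),
        Record("watch-3", "watchlist", frozenset({("address", "12 Elm St")})),
        Record("cust-2", "customers", frozenset({("phone", "555-0199")})),
    ]
    print(find_alerts(records))  # [['emp-7', 'watch-3']]
```

The union-find structure is what lets the sketch catch indirect links: if a customer shares a phone number with an employee who shares an address with a watch-list entry, all three land in the same cluster even though the customer and the watch-list record have nothing in common directly.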
NORA has proven quite effective in recent years. For example, IBM InfoSphere Identity Insight has been used to save institutions millions of dollars that would otherwise have been lost to money laundering schemes, credit card fraud, and tax evasion. In these cases, NORA has acted as a financial Columbo, combining seemingly irrelevant bits of data to systematically solve crimes.
NORA is just one step in this evolution, and there is more to come. The world moves faster than ever, and computers will continue to support this effort on an ever grander scale. Uncovering interrelationships will undoubtedly become easier for the data detective over time, and with each new leap in technology, there will always be “just one more thing.”