A new way to do big data with entity resolution

I have this hope that most of you reading this post aren’t criminals, or terrorists. So this might be interesting to you, if you want to know how they think and carry out their business. Their number one technique is called channel separation, the ability to use multiple identities to prevent them from being caught.

Let’s say you want to rob a bank, or blow something up. You use one identity to rent the getaway car. Another to open an account at the bank. And other identities to hire your thugs or whatnot. You get the idea. But in the process of creating all these identities, you aren’t that clever: you leave some bread crumbs or clues that connect them together, as is shown in the diagram below.

This is the idea behind a startup that has just come out of stealth called Senzing. It is the brainchild of Jeff Jonas. The market category for these types of tools is called entity resolution. Jonas told me, “Anytime you can catch criminals is kind of fun. Their primary tradecraft holds true for anyone, from bank robbers up to organized crime groups. No one uses the same name, address, phone when they are on a known list.” But they leave traces that can be correlated together.

Jonas started working on this many years ago at IBM. He is trying to disrupt the entity resolution market and eventually spun out Senzing with his tool. The goal is that you have all this data and you want to link it together, eliminate or find duplicates, or near-duplicates. Take our criminal, who is going to rent a truck, buy fuel oil and fertilizer, and so forth. He does so using the sample identities shown at the bottom of the graphic. Senzing’s software can parse all this data and within a matter of a few minutes, figure out who Bob Smith really is. In effect, they merge all the different channels of information into a single, coherent whole, so you can make better decisions.

Entity resolution is big business. There are more than 50 firms that sell some kind of service based on this, but they offer more of a custom consulting tool that requires a great deal of care and feeding and specialized knowledge. Many companies end up with million-dollar engagements by the time they are done. Jonas is trying to change all that and make it much cheaper to do it. You can run his software on any Mac or Windows desktop, rather than have to put a lot of firepower behind the complex models that many of these consulting firms use.

Who could benefit from his product? Lots of companies. For example, a supply chain risk management vendor can use to scrape data from the web and determine who is making trouble for a global brand. Or environmentalists looking to find frequent corporate polluters. A finservices firm that is trying to find the relationship between employees and suspected insider threats or fraudulent activities. Or child labor lawyers trying to track down frequent miscreants. You get the idea. You know the data is out there in some form, but it isn’t readily or easily parsed. “We had one firm that was investigating Chinese firms that had poor reputations. They got our software and two days later were getting useful results, and a month later could create some actionable reports.” The ideal client? “Someone who has a firm that may be well respected, but no one actually calls” with an engagement, he told me.

Jonas started developing his tool when he was working at IBM several years ago. I interviewed him for ReadWrite and found him fascinating. An early version of his software played an important role in figuring out the young card sharks behind the movie 21 were taking advantage of card counting in several Vegas casinos, and was able to match up their winnings all over town and get the team banned.  Another example is from  Colombia universities who saved $80M after finding 250,000 fake students being enrolled.

IBM gets a revenue share from Senzing’s sales, which makes sense. The free downloads are limited in terms of how much data you can parse (10,000 records), and they also sell monthly subscriptions that start at up to $500 for the simplest cases. It will be interesting to see how widely his tool will be used: my guess is that there will be lots of interesting stories to come.

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.