What is the definition of “Good” Data? Part I

In my last blog post, we discussed the definition of “good” data and ended with four questions that you should ask when licensing international reference data:

When was the dataset created?
How many sources comprise the dataset?
Are the sources governmental, private or both?
How many unique records are in the dataset?

Today I’ll dive into the first two questions and explain why they really matter.

When was the dataset created?

Reference data changes and grows at a pace that depends on the type of data and the locality. Here are some examples I took from the CIA World Factbook, 2011 and 2012 editions.

In Ecuador, the number of mobile phone users grew from 11.5 million to 13.63 million from 2011 to 2012.

In Estonia, the number of Internet users grew from 888,100 to 971,700 from 2011 to 2012.

Even in the U.S., over 600,000 permits for privately owned housing were issued in November (http://www.census.gov/construction/nrc/pdf/newresconst.pdf).

In most parts of the world, if you are using data that is more than three years old, it is quite likely you are missing large segments of the audience you are trying to address.
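To put rough numbers on this, here is a minimal Python sketch that turns the Ecuador and Estonia figures above into growth rates. The `missing_share` helper is my own illustration, not something from the Factbook: it assumes, purely for the sake of argument, that growth continues at the same annual rate.

```python
def pct_growth(old: float, new: float) -> float:
    """Percentage growth from an old count to a new count."""
    return (new - old) / old * 100.0


def missing_share(annual_growth_pct: float, years: int) -> float:
    """Share of today's population absent from a dataset that is `years` old.

    Assumption (illustrative only): growth continues at a constant annual
    rate, so the old dataset covers 1 / (1 + g)^years of today's total.
    """
    g = annual_growth_pct / 100.0
    return 1.0 - 1.0 / (1.0 + g) ** years


# Figures quoted above (CIA World Factbook, 2011 and 2012).
ecuador = pct_growth(11.5e6, 13.63e6)   # mobile phone users
estonia = pct_growth(888_100, 971_700)  # Internet users

for label, rate in (("Ecuador mobile", ecuador), ("Estonia Internet", estonia)):
    print(f"{label}: {rate:.1f}% growth in one year, "
          f"~{missing_share(rate, 3):.0%} missing after 3 years")
```

Under that constant-growth assumption, a three-year-old file would miss about 40% of Ecuador's current mobile users and roughly a quarter of Estonia's current Internet users. Real growth will not be that steady, but the order of magnitude shows why the age of a dataset matters.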

How many sources comprise the dataset?

Most vendors that use reference data in their solutions license a single source of data for each country or locality they cover. Depending on your particular need, this approach may work. A better approach is to use a vendor that aggregates multiple data sources into a single reference file for its coverage area.

An example of this is the standard practice of relying solely on postal reference data for address information. In the U.S., the USPS licenses its postal data files for use in various software packages. Many vendors license only this file and declare the job done. If vendors were to combine the USPS file with a geolocation reference file and an E911 file, the results would be significantly better than with a single file.

Why use an E911 file? This file contains descriptive information for streets with multiple names. It was compiled to help emergency responders quickly pinpoint where a call is coming from. Yes, in the U.S. we have streets that have multiple names. This is especially prevalent in rural areas. A road may be known as Rural Route 232421, but the E911 database will list this road as Willy Bill Highway.

Knowing every name a street goes by is important for many reasons, including localized marketing and routing and logistics planning.
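The multiple-name problem can be sketched in a few lines of Python. Everything here is hypothetical: the sample records, the `normalize` and `resolve_street` helpers, and the idea of reducing each file to a simple lookup are my own simplifications. Real USPS and E911 files have their own layouts and licensing terms.

```python
from typing import Optional

# Hypothetical extract of a postal file: the street names it recognizes.
POSTAL_STREETS = {"RURAL ROUTE 232421", "MAIN STREET"}

# Hypothetical extract of an E911 file: alternate name -> postal name.
E911_ALIASES = {"WILLY BILL HIGHWAY": "RURAL ROUTE 232421"}


def normalize(name: str) -> str:
    """Upper-case the name and collapse repeated whitespace."""
    return " ".join(name.upper().split())


def resolve_street(name: str,
                   postal_streets: set[str],
                   aliases: dict[str, str]) -> Optional[str]:
    """Return the postal street name for `name`, or None if it is unknown.

    The postal file is checked first; the alias table is the fallback
    that a single-source lookup would not have.
    """
    key = normalize(name)
    if key in postal_streets:
        return key
    return aliases.get(key)


# A postal-only lookup would fail on the local name; the combined one does not.
print(resolve_street("Willy Bill Highway", POSTAL_STREETS, E911_ALIASES))
print(resolve_street("Willy Bill Highway", POSTAL_STREETS, {}))
```

With the alias table, the local name resolves to the same road as the postal name. With an empty alias table, which is what licensing only the postal file amounts to, the same input comes back as unknown.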