Nothing About Data Cleansing Is Easy

Too many companies are building their marketing programs on lousy data. While consumer databases are easily overwhelmed by the staggering volume of available information, B2B databases are inherently more complex and, once they start to deteriorate, downright ugly.

This is actually a post for smaller businesses about setting up your database to run an append. The more you look into the subject, though, the more you’ll want to tighten the controls on what goes into your database in the first place, so that you’re not overwhelmed by what are really just the first simple steps. You might even consider setting aside a small portion of the budget for any marketing program that uses your database, so that you improve your data every time you use it.

Appending your database is simply a process by which the companies in your database are matched with those in another master database; once matched, some of the empty fields in your data can be filled with information from the other database. There are of course limits to what can be filled in with any hope of accuracy. While you can use appends to pull in standard industry information, phone numbers, some web data and executive names, I’m highly dubious of the quality of the contact, title, direct line and email information for anyone but the most public figures within an organization. You also have to consider the limitation contained in the phrase “matched with another master database”: match rates can be very poor, which means you’ll still have a lot of holes when you’re finished. Oh yes, and it’s far from free.
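To make the idea concrete, here is a minimal sketch of an append in Python. It assumes records are plain dictionaries and that a match means an exact, case-insensitive match on company name and zip code; the function name `append_records` and the field names are hypothetical, and commercial append services use far fuzzier matching than this.

```python
def append_records(own_records, master_records, key_fields=("company", "zip")):
    """Fill empty fields in own_records from matching master_records.

    Assumption: a match is an exact, case-insensitive match on key_fields.
    Returns the appended records and the match rate (0.0 to 1.0).
    """
    def key(rec):
        return tuple((rec.get(f) or "").strip().lower() for f in key_fields)

    master_index = {key(m): m for m in master_records}
    appended, matched = [], 0
    for rec in own_records:
        merged = dict(rec)
        match = master_index.get(key(rec))
        if match is not None:
            matched += 1
            for field, value in match.items():
                # Only fill holes; never overwrite data you already have.
                if not merged.get(field):
                    merged[field] = value
        appended.append(merged)
    match_rate = matched / len(own_records) if own_records else 0.0
    return appended, match_rate


own = [
    {"company": "Acme Corp", "zip": "10001", "phone": "", "industry": ""},
    {"company": "Globex", "zip": "60601", "phone": "312-555-0100", "industry": ""},
]
master = [
    {"company": "acme corp", "zip": "10001", "phone": "212-555-0199",
     "industry": "Manufacturing"},
]
result, rate = append_records(own, master)
print(rate)       # share of your records that found a match
print(result[0])  # matched record with its empty fields filled in
```

Note that the unmatched record comes back exactly as it went in, holes and all, which is what a poor match rate looks like in practice.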

What that means is that before you can move ahead there are three things that must be done:

Select Your Files. You need to determine which data in your database is worth spending the money on. Lead data that is very fresh is one thing, but do you really think it’s a good use of your money to fill in empty data fields for a lead you generated five years ago that never responded to your subsequent efforts to convert? Once you’ve made that decision, your first task will be to isolate that data. How will you do that? If it’s by sorting on a code, date or source that was never entered in the first place, you will have just hit the first of what will probably be many snags.
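As a rough illustration of that selection step, the sketch below splits records into those fresh enough to append and those that can’t be sorted at all because the date or source was never entered. The field names `created` and `source` and the function `select_for_append` are assumptions for the example, not part of any particular CRM.

```python
from datetime import date


def select_for_append(records, cutoff, sources=None):
    """Split records into (selected, unsortable).

    selected:   created on or after cutoff and, if sources is given,
                from one of those sources.
    unsortable: missing the date or source needed to decide at all.
    Older or off-source records are simply left out of both lists.
    """
    selected, unsortable = [], []
    for rec in records:
        created = rec.get("created")
        source = rec.get("source")
        if created is None or not source:
            # The snag: the code/date/source was never entered.
            unsortable.append(rec)
        elif created >= cutoff and (sources is None or source in sources):
            selected.append(rec)
    return selected, unsortable


leads = [
    {"company": "Acme Corp", "created": date(2024, 3, 1), "source": "webinar"},
    {"company": "Globex", "created": date(2018, 6, 15), "source": "tradeshow"},
    {"company": "Initech", "created": None, "source": ""},
]
fresh, snags = select_for_append(leads, cutoff=date(2023, 1, 1))
print(len(fresh), len(snags))
```

The size of the second list is a quick measure of how much manual coding stands between you and a clean selection.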

Identify and Remove Your Duplicates. As with the previous task, this sounds easy but can be a terrible job, and you really don’t have much of a choice if you ever want to clean up your data. There are a few obvious places to start. For example, if you have contacts in your database that are flagged as duplicates or as no longer working with the company, or companies that are flagged as duplicates or out of business, why are they still there? It might seem as if they are already discounted enough to ignore, but that’s only because you’re not the sales rep who is working with the data manually and might not notice the little field in the corner showing that the contact file you just entered your notes and next steps into is the duplicate. Sound stupid? I can assure you it happens a lot, and now you have good information in a bad file. The other challenge many companies face with the simple question of duplicates is identifying which is actually the good record and which is the dupe that can be removed. It’s not at all out of the question that at some point you’re going to have to put a real set of eyeballs on your data to make the decisions you can’t trust the software to make. Tedious, expensive and time-consuming work it is, too.
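One way to sketch that workflow in Python: drop the records already flagged as dead, group what remains on a normalized key, and hand any group with more than one record to a human rather than letting the code pick a winner. The `status` values, the company-plus-email key and the function name `find_duplicates` are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical flag values; use whatever your own database records.
DEAD_STATUSES = {"duplicate", "out_of_business", "left_company"}


def find_duplicates(records):
    """Return (unique, needs_review) after dropping flagged records.

    Assumption: two records are possible duplicates when company name
    and email match after trimming whitespace and lowercasing.
    """
    groups = defaultdict(list)
    for rec in records:
        if rec.get("status") in DEAD_STATUSES:
            continue  # already known to be bad, so remove it outright
        key = (
            (rec.get("company") or "").strip().lower(),
            (rec.get("email") or "").strip().lower(),
        )
        groups[key].append(rec)
    unique = [g[0] for g in groups.values() if len(g) == 1]
    # The software cannot tell which copy holds the good notes,
    # so these groups go to a real set of eyeballs.
    needs_review = [g for g in groups.values() if len(g) > 1]
    return unique, needs_review


contacts = [
    {"id": 1, "company": "Acme Corp", "email": "jo@acme.example", "status": "active"},
    {"id": 2, "company": "ACME CORP ", "email": "Jo@Acme.example", "status": "active"},
    {"id": 3, "company": "Globex", "email": "al@globex.example", "status": "duplicate"},
    {"id": 4, "company": "Initech", "email": "pm@initech.example", "status": "active"},
]
unique, needs_review = find_duplicates(contacts)
print([r["id"] for r in unique])
print([[r["id"] for r in group] for group in needs_review])
```

Deliberately, nothing here merges or deletes the ambiguous pairs; deciding which record is the good one stays a manual step.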

Clean and Standardize Your Remaining Records. When you begin the append, you’ll be able to get a half-decent match rate if you’re working with data that has been cleansed. That means that, at the very least, phone numbers have to be formatted consistently, and address formatting and abbreviations need to be standardized as well. There is software that will help you clean up your data and get it into the right format to maximize your match rates.
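For a sense of what that standardization involves, here is a small sketch using only Python’s standard library. It assumes 10-digit North American phone numbers and a tiny, hypothetical abbreviation table; dedicated data-cleansing software handles far more cases (for instance, telling “St” for Street apart from “St” for Saint).

```python
import re

# Hypothetical, deliberately small abbreviation table.
ABBREVIATIONS = {
    "street": "St", "st": "St",
    "avenue": "Ave", "ave": "Ave",
    "road": "Rd", "rd": "Rd",
    "suite": "Ste", "ste": "Ste",
}


def standardize_phone(raw):
    """Return the number as NNN-NNN-NNNN, or None if it can't be parsed.

    Assumption: 10-digit numbers, optionally prefixed with country code 1.
    """
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return None  # leave for manual review rather than guess
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def standardize_address(raw):
    """Drop commas and trailing periods on known words, then abbreviate them."""
    words = (raw or "").replace(",", " ").split()
    cleaned = []
    for word in words:
        bare = word.rstrip(".").lower()
        cleaned.append(ABBREVIATIONS.get(bare, word))
    return " ".join(cleaned)


print(standardize_phone("(212) 555-0199"))
print(standardize_phone("1.212.555.0199"))
print(standardize_address("123 Main Street, Suite 4"))
```

Running both your own records and any match keys through the same functions is what gives two differently typed versions of the same company a chance to line up.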

Right about this point, if not already, you’ve probably at least made a few scratch notes on new database entry policies around data formatting, duplicate checking and key information fields, so that you might not have to go through this again, or at least, not for a while.