MrExcel Publishing
Your One Stop for Excel Tips & Solutions

How to compare two sets of data


Posted by Brenda on February 13, 2002 8:48 AM

I have 49,000 lines of data that I want to compare against 47,000 lines. I'm trying to locate what data is missing from the 47,000 lines that appears in the 49,000 lines. I have been doing this manually by scanning down thru the data then inserting a blank line when the information does not appear in both sets of data. Is there a way to do this without spending hours and hours?


Posted by Mark W. on February 13, 2002 9:32 AM

Multiple columns? [nt]

Posted by Brenda on February 13, 2002 9:34 AM

Yes

Posted by Brenda on February 13, 2002 9:40 AM

Example of data I am comparing


Owner ID Owner cwellid cwellname cperiod cyear dsuspdate ctime namount
000001 2000 Suspense Adjustment ALL 11 2000 6/22/01 10:55:38 268.62
000020 Neme 73051 EMILY FRANCIS 11 2000 2/2/01 20:41:14 263.54
000020 Name 73051 EMILY FRANCIS 10 2000 2/2/01 20:41:14 248.22
000020 Name 73051 EMILY FRANCIS 09 2000 2/2/01 20:41:14 -114.27

Posted by Mark W. on February 13, 2002 9:53 AM

Suppose...

...that the smaller of your 2 lists was...

{7,8,9
;"a","b","c"
;"D",6,9}

...and the large list was...

{"a","b","c"
;1,"x",3
;7,8,9
;"D",6,"F"}

...add a 4th column to each of these sets using...

=CONCATENATE(A1,";",B1,";",C1)

...which concatenates the column values on each
row and includes a separator (;) between columns.
Important: Pick a separator that's not in your
data.

Finally, in a 5th column of your smaller list
enter the formula...

=MATCH(I1,D:D,0)

where column I contains the concatenated records
from the small listand the column D contains the
concatenated records from the large list.

Match will return #N/A if it can't find a match
or the nth matching record from the large list.

The example above will produce {3;1;#N/A}. So
you know that the 1st and 3rd records are present.

Posted by Brenda on February 13, 2002 9:57 AM

Re: Suppose...

Clarification...Each set contains the same number of columns...each line is a record. I'm trying to find the records that appear in the 49,000 line set that do not appear in the 47,000 line set of data.

Posted by Mark W. on February 13, 2002 10:02 AM

Okay, but this has no impact on my suggestion. Am I missing something? [nt]

Posted by Mark W. on February 13, 2002 10:09 AM

Okay, but this has no impact on my suggestion. Am I missing something? [nt]

Posted by Brenda on February 13, 2002 10:17 AM

Just didn't quite grasp the concept of your reply.

I will study this to see if I can figure it out. Thanks for your help.

Posted by Aladin Akyurek on February 13, 2002 10:21 AM

Re: Example of data I am comparing

Another approach would be using Access. Define first an appropriate table in Access then import both sets of data into the defined table. Access will automatically remove duplicate records. You can then export the filled-in Access table back to Excel.

=========

Posted by Mark W. on February 13, 2002 10:23 AM

Perhaps I was a bit vague about the CONCATENATE() function...

It's cell references should pertain to each list
and is dependent on the column and row references
of each. You may want to consult the Help index
for "MATCH worksheet function" and "CONCATENATE
worksheet function".