Combining data from two worksheets into one?


Posted by Brad Taplin on September 19, 2001 12:12 AM

Greetings!

I am under the gun to combine data from two worksheets into one. The 26 fields and column formats are the same, but the 17,000+ rows in each differ somewhat.

Each of these sheets contains a few records not in the other, and for every record different fields populated in different sheets. Only one sheet, for instance, has a VoterID listed for every entry, and only the second sheet has anything in the Comments field.

Here's an example: in Sheet1 Joe Blow at 123 West Snow Street may be row 9, where Sheet2 may have him in row 5. Joe's Sheet1 record includes a VoterID, and Joe's Sheet2 record includes some comments. We need one worksheet to contain everything about Joe, once. The combined sheet must also contain all records that had been unique to one or the other worksheet.

Please advise ASAP. I have been advised to try VLOOKUP but am no expert in Excel. I need step-by-step advice, not necessarily specific to my particular sheets but specific to this general challenge.

Thanks in advance.
-Brad
taplinb@mac.com

P.S. I do Excel on a PC though I use a Mac for email.

Posted by Eric on September 19, 2001 8:54 AM

Rough sketch of what I'd try

First, save the workbook to a new name so I don't completely screw you up!. Next get an overall list. There are two ways I can think of

1a) advanced filter way: copy your lists (just the column A names) from both sheets and paste them to a 3rd sheet but stack them in col A- so if there are 17,000 rows in each list, you'll have 34,000 total in this pasted column. Insert a row at the top and type the label "names".

With Sheet3!A1 selected, go to Data-->Filters-->advanced. It should automatically select your sheet3!A2:A34,000, skip the "criteria range" box, check the "unique records only" box, and depending on how conservative you're feeling, you can either filter in place or click the "copy to another location" radio button and select another cell on the sheet where you want it to begin pasting the results. Then copy the resultant list to sheet3!a3, and label the list in a2 with something meaningful.

1b) Vlookup way (you'll probably need this formula later so you can get used to it here if you want). Warning, with this much data, after you're satisfied with your vlookup results, you should convert them from formulas to values (Copy, then Edit-->Paste Special-->Paste as values) to avoid some excessive excel grinding)

Vlookup asks -what do I look up?, where do I look it up(in what array- rectangular section of spreadsheet)?, how many columns over from the first column in the array should I look?, and how alike do the lookup and interrogated values need to be for the lookup value to be declared"found"?
So ideally, if you have all of your names in col A on sheets 1 and 2, then you can crosscheck both lists for "mismatches".

In an available column on sheet1, type the following in row 2 and copy down
=vlookup(a2,sheet2!a$2:a$17000,1,false)
if the value in the corresponding row of A has a match in sheet2, then the name will show up, otherwise you'll get a "#N/A error. The errors are what you're interested in, they are the names without a match in sheet 2.

You could then do the same with sheet 2. In an available column enter:
=vlookup(a2,sheet1!a$2:a$17000,1,false)
and copy down. Copy all of the names with a "#N/A" on their row (you could use sort or autofilter to get them all together- data that size may sort faster than it will filter) to the bottom of the other list to get a "complete" list.

2) Now for pulling the different columns of data together. Let's say as an example that columns b-d are unique to each sheet.

2a) If you used the advanced filter
Put these numbers in this order
2 in b1
3 in c1
4 in d1
2 in e1
3 in f1
4 in g1
in b2-c2 put the column labels from sheet 2 b2-c2
in d2-g2 put the column labels from sheet 1 b2-c2
in b3 type the formula
=vlookup(a3,sheet2!$a$2:$a$17000,b$1,false)
and copy over to d3 and down to end of list
in e3 enter the formula
=vlookup(a3,sheet1!$a$2:$a$17000,e$1,false)
and copy over to g3 and down to end of list.

2b) If you used vlookup
You've copied the "#N/A" names from sheet 1 to the bottom of the name list on sheet2. First insert a row at the top of the sheet. Then label columns e-g with the column labels from sheet1, and enter the number 2 in e1, 3 in f1, and 4 in g1. Now in e2 enter the formula
=vlookup(a3,sheet1!a$2:d$17000,e$1,false)
copy across to g3 and down to the end of the list. For names that appear in sheet 1 but not sheet 2 you will again get the "#N/A" error.

3) Now you should have both of your sheets merged with all of the information you need, however, all of these "live" formulas are going to make working with this sheet difficult. So select the entire page, copy, Edit-->paste special, as values, enter. Then, Edit-->find and replace
find: #N/A, replace with: (don't enter anything in this box), if there's something already there, then delete it! click on "replace all". If you used 1b (the vlookup way) to get your list, then you can delete that column now. This should result in a manageable worksheet.

hope that helps

Posted by Eric on September 19, 2001 8:57 AM

correction for 1a

I forgot to say that you should delete the list in sheet3 col A prior to moving the filtered list over to sheet3!a3



Posted by Brad Taplin on September 21, 2001 3:46 AM

VLOOKUP is great, BUT 1st to make columns A match?

Many thanks to Dave, who emailed me, and to Eric for posting replies to the mrexcel.com message board. They gave similar advice which seemed to almost get me there. The formula from Dave:

=IF(ISERROR(MATCH($A2,Names1,0)),IF(ISERROR(MATCH(B$1,Fields2,0)),"-",VLOOKUP($A2,Table2,MATCH(B$1,Fields2,0)+1,FALSE)),IF(ISERROR(MATCH(B$1,Fields1,0)),"-",IF(VLOOKUP($A2,Table1,MATCH(B$1,Fields1,0)+1,FALSE)="",IF(ISERROR(MATCH($A2,Names2,0)),"-",IF(ISERROR(MATCH(B$1,Fields2,0)),"-",IF(VLOOKUP($A2,Table2,MATCH(B$1,Fields2,0)+1,FALSE)="","-",VLOOKUP($A2,Table2,MATCH(B$1,Fields2,0)+1,FALSE)))),VLOOKUP($A2,Table1,MATCH(B$1,Fields1,0)+1,FALSE))))

This worked using named ranges Table1, Names1, Fields1, Table2, Names2, and Fields2, but only for the first few records. Sheet3 began with a no-dupes concatenation of all entries in columns A from both source sheets, but it seems Sheet1:A12 must match Sheet2:A12 for row 12 to work OK.

Note that I had already made new column A in all sheets "Name_and_Addr" as a concatenation of Lastname, Firstname, House, and Street, and then of course did a paste special to convert the concatentation to fixed values in this column. This was to simplify sorting unique rows.

If I understand the problem, how do I first plug all rows from Sheet2 into Sheet1 which lack like data for column A in Sheet1, and visa versa? More dupes are no concern as long as I redo column A in Sheet3 after this and then the named ranges.

Last week I was thinking of this as a SQL effort, but the user is far more likely to understand and then own this in Excel than in Access. I want to teach the process (once I get it) so I can pass the buck when appropriate and fry other fish.

Note that the sheets are from different months. Some move in, some move out, names may change, etc. A sort won't make columns A from different months match. Making fieldnames and formats identical in the sheets was easy, but this? With 17,000+ rows to handle every so often, automation is essential.

If I misunderstood, if matching columns A is not the problem, please advise where else the problem might be.

Thanks again!
-Brad
taplinb@mac.com

------- from original problem ------
Here's an example: in Sheet1 Joe Blow at 123 West Snow Street may be row 9, where Sheet2 may have him in row 5. Joe's Sheet1 record includes a Precinct, and Joe's Sheet2 record includes some comments. We need Sheet3 to contain everything about Joe, once. The combined sheet must also contain all records that had been unique to one or the other worksheet...