Adding content to blank cells from duplicate entries

alithebarman

New Member
Joined
Nov 5, 2010
Messages
7
<p>Hi guys, my attempts to Google this issue faulted when I couldn't find a way of describing this issue.</p>

<p>I have list of 50,000 contacts with many many duplicates. I would like excel to fill in the blank cells where I have data elsewhere in the spreadsheet. For example:</p>

<table width="100%" border="1">
<tr>
<td> </td>
<td>John</td>
<td>Smith</td>
<td>director@aus.com</td>
<td>Director</td>
<td> </td>
</tr>
<tr>
<td> </td>
<td> </td>
<td>Smith</td>
<td>director@aus.com</td>
<td> </td>
<td>australia</td>
</tr>
<tr>
<td>Mr</td>
<td>John</td>
<td>Smith</td>
<td>director@aus.com</td>
<td> </td>
<td> </td>
</tr>
</table>

<p>Between all three row, all of John Smith's data is available. If I can use his email address and surname as the often duplicated value, how can get excel to pull in all the data</p>

<p>I really have no idea where to start with this one. If possible I'd like to stick with formula as my VBA isn't too great, but I'm not sure if it can be done.</p>

<p>Appreciate any and all advice</p>
 

Excel Facts

Does the VLOOKUP table have to be sorted?
No! when you are using an exact match, the VLOOKUP table can be in any order. Best-selling items at the top is actually the best.
hi alithebarman

Does your example show your data at the start, or is it what the end result should look like?

What does this mean "Between all three row, all of John Smith's data is available"? Where between all three row are these data? and what do you want done with them?

"how can get excel to pull in all the data ... " Pull what data from where? and to where does it go?
 
Upvote 0
<p>Mirabeau, thanks for the response.</p>
<p>This shows the data at the start. I have several tables from several contacts databases that I have put into the one sheet. Each row represents one contact entry. Each source may have had duplicate entries for the same contact or none at all, and duplicate entry might have different information.</p>
<p>I know I'm working with relativly poor data quality but what I'm hoping to do is to remove as many duplicates as possible whilst retaining as much information as possible. So the end result would be:</p>
<table width="100%" border="1">
<tr>
<td>Mr</td>
<td>John</td>
<td>Smith</td>
<td>direct@aus.com</td>
<td>director</td>
<td>australia</td>
</tr>
</table>
<p>The way I planned to do that was for Excel to find a aset of duplicates based on several columns, last name and email for example, and complete the row by looking for a blank cell, in country column for example and taking the data from where another cell had "australia" in the country column.</p>
<p>I could then go in a remove the duplicates entries easily. Whilst I'm not expecting high data quality out of this, it's a good start and it means I can remove the duplicates whilst retaining as much information as possible.</p>
<p>Again I really apprecaite any advice.</p>
 
Upvote 0
alithebarman,

did you actually post any data? i couldn't see anything except the one line of result

perhaps you should look at the FAQ of this forum to see how to post data?
 
Upvote 0
Thanks for the hint, I used the Data sampler in HTML. An example of the data is:

<b>Excel 2010</b><table cellpadding="2.5px" rules="all" style=";background-color: #FFFFFF;border: 1px solid;border-collapse: collapse; border-color: #BBB"><colgroup><col width="25px" style="background-color: #DAE7F5" /><col /><col /><col /><col /><col /><col /><col /><col /><col /><col /><col /><col /><col /><col /><col /><col /><col /><col /><col /><col /><col /></colgroup><thead><tr style=" background-color: #DAE7F5;text-align: center;color: #161120">
<th>A</th><th>B</th><th>C</th><th>D</th><th>E</th><th>F</th><th>G</th><th>H</th><th>I</th><th>J</th><th>K</th><th>L</th><th>M</th><th>N</th><th>O</th><th>P</th><th>Q</th><th>R</th><th>S</th><th>T</th><th>U</th><th>V</th></tr></thead><tbody><tr ><td style="color: #161120;text-align: center;">1</td><td style="background-color: #00B0F0;;">Source</td><td style=";">Additional Info</td><td style="background-color: #00B0F0;;">First Name</td><td style="background-color: #00B0F0;;">Last Name</td><td style=";">Type</td><td style="background-color: #00B0F0;;">Title</td><td style=";">Company</td><td style=";">Tags</td><td style="background-color: #00B0F0;;">Notes</td><td style="background-color: #00B0F0;;">Id</td><td style="background-color: #00B0F0;;">Primary Email</td><td style="background-color: #00B0F0;;">PrimaryPhone</td><td style="background-color: #00B0F0;;">PrimaryCell</td><td style="background-color: #00B0F0;;">PrimaryFax</td><td style="background-color: #00B0F0;;">Primary Website</td><td style="background-color: #00B0F0;;">Primary Address_1</td><td style="background-color: #00B0F0;;">Primary Address_2</td><td style="background-color: #00B0F0;;">Primary City</td><td style="background-color: #00B0F0;;">Primary State</td><td style="background-color: #00B0F0;;">Primary Zip_code</td><td style="background-color: #00B0F0;;">Primary Country</td></tr><tr ><td style="color: #161120;text-align: center;">39172</td><td style=";">Dance NPRG</td><td style="text-align: right;;"></td><td style=";">Juanita</td><td style=";">Caddy</td><td style="text-align: right;;"></td>
<td style=";">ISC Funding and Stakeholder Engagement</td><td style=";">Department of Education, Employment and Workplace Relations DEEWR FED</td><td style="text-align: right;;"></td><td style="text-align: right;;"></td><td style="text-align: right;;"></td>
<td style=";">xxx@deewr.gov.au</td>
<td style=";">02 1234 5678</td><td style=";"></td><td style=";"> </td><td style=";">http://www.deewr.gov.au</td><td style=";"></td><td style="text-align: right;;"></td><td style=";"></td><td style="text-align: right;;"></td><td style=";"></td><td style="text-align: right;;"></td></tr><tr ><td style="color: #161120;text-align: center;">39173</td><td style=";">VACD NPRG</td><td style="text-align: right;;"></td><td style=";">Juanita</td><td style=";">Caddy</td><td style="text-align: right;;"></td><td style=";"></td><td style=";">Department of Education, Employment and Workplace Relations DEEWR FED</td><td style="text-align: right;;"></td><td style="text-align: right;;"></td><td style="text-align: right;;"></td>
<td style=";">xxx@deewr.gov.au</td>
<td style=";">02 1234 5678</td><td style=";"></td><td style="text-align: right;;"></td><td style=";">http://www.deewr.gov.au</td><td style=";">Gpo Box</td><td style="text-align: right;;"></td><td style=";">Canberra</td><td style=";">Australian Capital Territory</td><td style="text-align: right;;">2601</td><td style=";">Australia</td></tr></tbody></table><p style="width:4.8em;font-weight:bold;margin:0;padding:0.2em 0.6em 0.2em 0.5em;border: 1px solid #BBB;border-top:none;text-align: center;background-color: #DAE7F5;color: #161120">Contacts</p><br /><br />

You can see that the Surname and Email address are the same but the first row has the title and the second row is an address. I'd like to merge these two together. So the end result would be:

<b>Excel 2010</b><table cellpadding="2.5px" rules="all" style=";background-color: #FFFFFF;border: 1px solid;border-collapse: collapse; border-color: #BBB"><colgroup><col width="25px" style="background-color: #DAE7F5" /><col /><col /><col /><col /><col /><col /><col /><col /><col /><col /><col /><col /><col /><col /><col /><col /><col /><col /><col /><col /><col /></colgroup><thead><tr style=" background-color: #DAE7F5;text-align: center;color: #161120">
<th>A</th><th>B</th><th>C</th><th>D</th><th>E</th><th>F</th><th>G</th><th>H</th><th>I</th><th>J</th><th>K</th><th>L</th><th>M</th><th>N</th><th>O</th><th>P</th><th>Q</th><th>R</th><th>S</th><th>T</th><th>U</th><th>V</th></tr></thead><tbody><tr ><td style="color: #161120;text-align: center;">1</td><td style="background-color: #00B0F0;;">Source</td><td style=";">Additional Info</td><td style="background-color: #00B0F0;;">First Name</td><td style="background-color: #00B0F0;;">Last Name</td><td style=";">Type</td><td style="background-color: #00B0F0;;">Title</td><td style=";">Company</td><td style=";">Tags</td><td style="background-color: #00B0F0;;">Notes</td><td style="background-color: #00B0F0;;">Id</td><td style="background-color: #00B0F0;;">Primary Email</td><td style="background-color: #00B0F0;;">PrimaryPhone</td><td style="background-color: #00B0F0;;">PrimaryCell</td><td style="background-color: #00B0F0;;">PrimaryFax</td><td style="background-color: #00B0F0;;">Primary Website</td><td style="background-color: #00B0F0;;">Primary Address_1</td><td style="background-color: #00B0F0;;">Primary Address_2</td><td style="background-color: #00B0F0;;">Primary City</td><td style="background-color: #00B0F0;;">Primary State</td><td style="background-color: #00B0F0;;">Primary Zip_code</td><td style="background-color: #00B0F0;;">Primary Country</td></tr><tr ><td style="color: #161120;text-align: center;">39173</td><td style=";">VACD NPRG</td><td style="text-align: right;;"></td><td style=";">Juanita</td><td style=";">Caddy</td><td style="text-align: right;;"></td><td style=";">ISC Funding and Stakeholder Engagement</td><td style=";">Department of Education, Employment and Workplace Relations DEEWR FED</td><td style="text-align: right;;"></td><td style="text-align: right;;"></td><td style="text-align: right;;"></td>
<td style=";">xxx@deewr.gov.au</td>
<td style=";">02 1234 5678</td><td style=";"></td><td style="text-align: right;;"></td><td style=";">http://www.deewr.gov.au</td><td style=";">Gpo Box</td><td style="text-align: right;;"></td><td style=";">Canberra</td><td style=";">Australian Capital Territory</td><td style="text-align: right;;">2601</td><td style=";">Australia</td></tr></tbody></table><p style="width:4.8em;font-weight:bold;margin:0;padding:0.2em 0.6em 0.2em 0.5em;border: 1px solid #BBB;border-top:none;text-align: center;background-color: #DAE7F5;color: #161120">Contacts</p><br /><br />

Ideally I'd like to specify what columns are used to determine if it's a duplicate entry. I could start with First Name, Last Name, Title, Company etc etc and as the duplicates are found, merged and removed I can progressively reduce the required columns to determine if it's a duplicate. Hopefully to manage the data quality. I hope that makes senses.

Thanks for your patience.
 
Last edited:
Upvote 0

Forum statistics

Threads
1,224,602
Messages
6,179,841
Members
452,948
Latest member
UsmanAli786

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top