Remove duplicate rows whereby duplicates of one column are located in another column (Excel 2007)

guest2013

New Member
Joined
Aug 14, 2013
Messages
8
Hi,
I have a large dataset of text and I'm trying to find the co-occurance of words. Alterantive combinations of words are in two different columns. For instance, the column A has a co-word "apple_orange" and the column B of the same row has its flip co-word "organge_apple" and both are equivalent. This also means that all the values in column A are present somewhere in Column B and vice versa, but in different rows. For instance, consider the following:
Col ACol B
apple_fruitfruit_apple
apple_mangomango_apple
apple_orangeorange_apple
fruit_appleapple_fruit
juice_mangomango_juice
juice_orangeorange_juice
mango_appleapple_mango
mango_juicejuice_mango
orange_appleapple_orange
orange_juicejuice_orange


<tbody>
</tbody>
I need to accurately identify and remove all the duplicate rows, whereby the duplicates of Column A reside in Column B and vice versa. This means that half of the rows in the matrix have to be removed but the challenge is how to identify which rows to be removed. For instance, in the above example, rows 1 and 4 are duplicates, rows 2 and 7 are duplicates, and so forth, and need to be removed.

I have tried different formulae and techniques but failed. Any help would be highly appreciated.

Best regards,
guest2013

 

Excel Facts

VLOOKUP to Left?
Use =VLOOKUP(A2,CHOOSE({1,2},$Z$1:$Z$99,$Y$1:$Y$99),2,False) to lookup Y values to left of Z values.
If code will do try this:-
Code:
[COLOR="Navy"]Sub[/COLOR] MG14Aug23
[COLOR="Navy"]Dim[/COLOR] Rng [COLOR="Navy"]As[/COLOR] Range, Dn [COLOR="Navy"]As[/COLOR] Range
[COLOR="Navy"]Dim[/COLOR] Dic [COLOR="Navy"]As[/COLOR] Object
[COLOR="Navy"]Dim[/COLOR] nRng [COLOR="Navy"]As[/COLOR] Range
[COLOR="Navy"]Set[/COLOR] Rng = Range(Range("A1"), Range("A" & Rows.Count).End(xlUp)).Resize(, 2)
[COLOR="Navy"]Set[/COLOR] Dic = CreateObject("scripting.dictionary")
Dic.CompareMode = vbTextCompare
[COLOR="Navy"]For[/COLOR] [COLOR="Navy"]Each[/COLOR] Dn [COLOR="Navy"]In[/COLOR] Rng
    [COLOR="Navy"]If[/COLOR] Not Dic.exists(Dn.Value) [COLOR="Navy"]Then[/COLOR]
        Dic.Add Dn.Value, Dn
    [COLOR="Navy"]Else[/COLOR]
        [COLOR="Navy"]If[/COLOR] nRng [COLOR="Navy"]Is[/COLOR] Nothing [COLOR="Navy"]Then[/COLOR]
            [COLOR="Navy"]Set[/COLOR] nRng = Dn
        [COLOR="Navy"]Else[/COLOR]
            [COLOR="Navy"]Set[/COLOR] nRng = Union(nRng, Dn)
        [COLOR="Navy"]End[/COLOR] If
    [COLOR="Navy"]End[/COLOR] If
[COLOR="Navy"]Next[/COLOR]
[COLOR="Navy"]If[/COLOR] Not nRng [COLOR="Navy"]Is[/COLOR] Nothing [COLOR="Navy"]Then[/COLOR]
    nRng.EntireRow.Delete
[COLOR="Navy"]End[/COLOR] [COLOR="Navy"]If[/COLOR]
[COLOR="Navy"]End[/COLOR] [COLOR="Navy"]Sub[/COLOR]
Regards Mick
 
Upvote 0
Hi MickG,
Many thanks for your response. I'm not familiar with codes, and therefore, I prefer formulae. Nevertheless, I ran your code as Macro. Out of 2238 rows as input, it produces a result of 92 rows whereas the result must be exactly 1119 rows (I wished to attach the data sheet for your kind perusal but my account does not permit attachment). In fact, when I ran it on a data sheet with more than 10000 rows of similar data, it deleted all rows leaving behind less 200 rows.

Thanks again

Best regards,
Guest2013
 
Upvote 0
I think that may be your data was more complicated:-
e.g. If you have duplicate names in the same column, it will not work. The code assumes that data in column "A" will only have a duplicate in column "B", and as i believe you specified that any duplicate in "A" which matches a duplicate in "B" will automatically have the column "A" row offset value in "B", that will match with the offset of the Dupicate in "B" with offset row in "A". If that makes sense.
If you can possibly post some example that follows your data more accurately , I'll have another go.
 
Last edited:
Upvote 0
Hi MickG,
Many thank for your response. I'm sorry that I had a few columns hidden between the Column "A" and the duplicates column, and I forgot it. Alerted by you explanation, I put that column in place of actual column "B", and you code worked as needed. Wow!

My heartiest apologies for the misunderstanding and thanks a lot for the help. Indeed, I cannot thank you enough.

Best regards,
Guest2013
 
Upvote 0
I know this is really old but I need to take this code and tweak it for my benefit. I am trying to do something really similar but the difference is that my data may be duplicated in the same column. i.e.

I have 2 columns, one with an account number and another with a linked account number.

A B
1000 2000
2000 1000

This is a duplicate - I want to remove 1 row. Basically I have 10,000 lines. But account 1000 may appear somewhere further down column A (but to a different linked account).

How would I do this?

Thanks,

Luke
 
Last edited:
Upvote 0
I know this is really old but I need to take this code and tweak it for my benefit. I am trying to do something really similar but the difference is that my data may be duplicated in the same column. i.e.

I have 2 columns, one with an account number and another with a linked account number.

A B
1000 2000
2000 1000

This is a duplicate - I want to remove 1 row. Basically I have 10,000 lines. But account 1000 may appear somewhere further down column A (but to a different linked account).

How would I do this?

Thanks,

Luke

Up
 
Upvote 0

Forum statistics

Threads
1,213,497
Messages
6,114,002
Members
448,543
Latest member
MartinLarkin

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top