Comparing two columns containing grid refs to find duplicates.

squirrel31

New Member
Joined
Sep 15, 2014
Messages
30
Hello,

First off I'd like to point out that I originally had posted this query at the following link:- Comparing two columns containing grid refs to find duplicates. 5 days ago where I was unsucessful in gaining any replies that helped me out. (You will also find an attached example showing a snippet of my work so far)

As stated in the thread using the above link. I have a problem where I have in one column I have an ID, to the right I have Pos_X, to the right of that I have Pos_Y (Grid reference coords).

I have used a formula to find the duplicate IDs so that I could filter them out however I now need to analyse these duplicates further. Therefore I need to work out within the duplicate IDs which have duplicate grid references and which have unique grid references. Only the first of x amount of duplicate grid refs is then marked down in the Additional intersect? column using the format 'Y'. Any unique grid ref ID is also marked down as 'Y' in the Additional intersect? column.

Any scenario where a 'N' is marked would be the following:

  • The ID is not a duplicate therefore there is only one position for that ID.
  • There are multiple duplicate grid references and therefore the 2nd onwards are marked as N whilst the first is Y.

Any scenario where a 'Y' is marked are as follows:

  • There is a duplicate grid reference under the same ID which only the first duplicate grid ref is marked as Y
  • There are x amount of unique grid references under the same ID which all are marked as Y


Thanks in advance!
 

Some videos you may like

Excel Facts

Move date out one month or year
Use =EDATE(A2,1) for one month later. Use EDATE(A2,12) for one year later.

steve the fish

Well-known Member
Joined
Oct 20, 2009
Messages
8,111
Office Version
  1. 365
Platform
  1. Windows
To be clear do all 3 columns you mention have to be the same to be classed as a duplicate? Some sample data with expected results would be good

As stated in the thread using the above link. I have a problem where I have in one column I have an ID, to the right I have Pos_X, to the right of that I have Pos_Y (Grid reference coords).
 

squirrel31

New Member
Joined
Sep 15, 2014
Messages
30
To be classified as a duplicate the following needs to be true:

  • ID is same

To be Y the following needs to be true:

  • ID is same
  • Pos_X and Pos_Y are unique

To be N the following needs to be true:


  • ID is same
  • Pos_X and Pos_Y are the same

As for that formula Andrew it doesn't quite work in the way I want (See the example I posted in the original thread I mentioned). It is currently telling me that there is an additional intersect at a non-duplicate record. Additionally, it is telling me there is an additional intersect for a record that is a duplicate however the grid refs are the same in both.
 

Andrew Poulsom

MrExcel MVP
Joined
Jul 21, 2002
Messages
73,092

ADVERTISEMENT

Please post some sample data including examples where the formula I suggested isn't returning the expected result and the reason. We need to keep everything in this thread rather than referring to one in another Forum.
 

squirrel31

New Member
Joined
Sep 15, 2014
Messages
30
Example in table form (Additional Intersect shows expected results):

Orange = Completed
Green = In progress
Red = No intersect
Violet = Duplicate

ID
Pos_X
Pos_Y
Con
Lookup
Pon Lookup
Checked?
Intersect?
Completed Entry?
Duplicate?
Additional Intersect?

363964822

<tbody>
</tbody>
151815
32546

3639648221518132545

<tbody>
</tbody>

#N/A

<tbody>
</tbody>

363964822

<tbody>
</tbody>
Y
Y
Y
N
-

363611803

<tbody>
</tbody>
158801
37559

3636118031588037559

<tbody>
</tbody>

3636118031588037559

<tbody>
</tbody>

363611803

<tbody>
</tbody>
Y
Y
N
N
-

363611805

<tbody>
</tbody>
161140
39994

3636118051611439994

<tbody>
</tbody>

3636118051611439994

<tbody>
</tbody>

363611805

<tbody>
</tbody>
Y
Y
Y
N
-

363976725

<tbody>
</tbody>
167048
30340

3639767251670430340

<tbody>
</tbody>

3639767251670430340

<tbody>
</tbody>

363976725

<tbody>
</tbody>
Y
Y
N
N
-

363611789

<tbody>
</tbody>
166502
42025

3636117891665042025

<tbody>
</tbody>

#N/A

<tbody>
</tbody>

363611789

<tbody>
</tbody>
Y
Y
Y
Y
Y

363976725

<tbody>
</tbody>
167085
30311

3639767251670830310

<tbody>
</tbody>

#N/A

<tbody>
</tbody>

363976725

<tbody>
</tbody>
Y
Y
Y
Y
Y

630624513

<tbody>
</tbody>
178576
43780

6306245131785743780

<tbody>
</tbody>

#N/A

<tbody>
</tbody>

630624513

<tbody>
</tbody>
Y
Y
Y
N
-

210182335

<tbody>
</tbody>

212551

<tbody>
</tbody>
200431

2101823352125520043

<tbody>
</tbody>

210182335

<tbody>
</tbody>
Y
N
-
N
-

210182301

<tbody>
</tbody>
242047
205823

2101823012420420582

<tbody>
</tbody>

#N/A

<tbody>
</tbody>


210182301

<tbody>
</tbody>

<tbody>
</tbody>
Y
Y
Y
N
-

210182310

<tbody>
</tbody>

241135

<tbody>
</tbody>
205965

2101823102411320596

<tbody>
</tbody>

210182310

<tbody>
</tbody>
Y
Y
N
N
-

210182300

<tbody>
</tbody>

242677

<tbody>
</tbody>
205715

2101823002426720571

<tbody>
</tbody>

#N/A

<tbody>
</tbody>

210182300

<tbody>
</tbody>
Y
Y
Y
Y
Y

210182300

<tbody>
</tbody>

243785

<tbody>
</tbody>
205306

2101823002437820530

<tbody>
</tbody>

#N/A

<tbody>
</tbody>

210182300

<tbody>
</tbody>
Y
Y
Y
Y
Y

1

<tbody>
</tbody>

123456

<tbody>
</tbody>
234567

11234523456

<tbody>
</tbody>
Y
Y
N
N
-

1

<tbody>
</tbody>

123456

<tbody>
</tbody>
234567

11234523456

<tbody>
</tbody>
Y
Y
Y
Y
N

<tbody>
</tbody>

Results that went wrong (Additional Intersect is current using formula):

210182109, 210182132 and 1000534854 are wrong because they are not duplicates. 363631355 is wrong because although they are duplicates (Same ID) the grid references are the same therefore no unique grid refs)

ID
Pos_X
Pos_Y
Con
Lookup
Pon Lookup
Checked
Intersect
Completed Entry?
Duplicate
Additional Intersect?

210182109

<tbody>
</tbody>

261133

<tbody>
</tbody>

193517

<tbody>
</tbody>

2101821092611319351

<tbody>
</tbody>

#N/A

<tbody>
</tbody>

#N/A

<tbody>
</tbody>
Y
Y
N
N
Y

210182132

<tbody>
</tbody>

266185

<tbody>
</tbody>

193035

<tbody>
</tbody>

2101821322661819303

<tbody>
</tbody>

#N/A

<tbody>
</tbody>

#N/A

<tbody>
</tbody>
Y
N
-
N
Y

363631355

<tbody>
</tbody>

255745

<tbody>
</tbody>

133052

<tbody>
</tbody>

3636313552557413305

<tbody>
</tbody>

#N/A

<tbody>
</tbody>

#N/A

<tbody>
</tbody>

Y

<tbody>
</tbody>
N
-
Y
N

363631355

<tbody>
</tbody>

255745

<tbody>
</tbody>

133052

<tbody>
</tbody>

3636313552557413305

<tbody>
</tbody>

#N/A

<tbody>
</tbody>

#N/A

<tbody>
</tbody>

Y

<tbody>
</tbody>
N
-
Y
Y

<tbody>
</tbody>

ID
Pos_X
Pos_Y
Con
Lookup
Pon Lookup
Checked
Intersect
Completed Entry?
Duplicate
Additional Intersect?

1000534854

<tbody>
</tbody>

272342

<tbody>
</tbody>

192919

<tbody>
</tbody>

10005348542723419291

<tbody>
</tbody>

#N/A

<tbody>
</tbody>

1000534854

<tbody>
</tbody>
Y
Y
Y
N
Y

<tbody>
</tbody>
 

squirrel31

New Member
Joined
Sep 15, 2014
Messages
30

ADVERTISEMENT

Hi,

I've just retried =IF(COUNTIFS(A$2:A2,A1,B$2:B2,B1,C$2:C2,C2)>1,"Y","N") and for some reason it is now detecting whether or not it is a duplicate however it appears that it is not sensing whether the grid references are unique or not. 1st part correct now just need to fine tune it :)


Just a few examples where its now going wrong.

Reason why 210182300 should be Y is because it is a duplicate and has a unique grid reference.


IDPos_XPos_YConLookupPon LookupCheckedIntersectCompleted Entry?DuplicateAdditional Intersect?
363631360

<tbody>
</tbody><colgroup><col></colgroup>
255239

<tbody>
</tbody><colgroup><col></colgroup>
133365

<tbody>
</tbody><colgroup><col></colgroup>
3636313602552313336

<tbody>
</tbody><colgroup><col></colgroup>
#N/A

<tbody>
</tbody><colgroup><col></colgroup>
1000546862

<tbody>
</tbody><colgroup><col></colgroup>
Y

<tbody>
</tbody><colgroup><col></colgroup>
Y

<tbody>
</tbody><colgroup><col></colgroup>
Y

<tbody>
</tbody><colgroup><col></colgroup>
Y

<tbody>
</tbody><colgroup><col></colgroup>
Y

<tbody>
</tbody><colgroup><col></colgroup>
363631360

<tbody>
</tbody><colgroup><col></colgroup>
255239

<tbody>
</tbody><colgroup><col></colgroup>
133365

<tbody>
</tbody><colgroup><col></colgroup>
3636313602552313336

<tbody>
</tbody><colgroup><col></colgroup>
#N/A

<tbody>
</tbody><colgroup><col></colgroup>
1000546862

<tbody>
</tbody><colgroup><col></colgroup>
Y

<tbody>
</tbody><colgroup><col></colgroup>
Y

<tbody>
</tbody><colgroup><col></colgroup>
Y

<tbody>
</tbody><colgroup><col></colgroup>
Y

<tbody>
</tbody><colgroup><col></colgroup>
Y (SHOULD BE N)

<tbody>
</tbody><colgroup><col></colgroup>
210182300

<tbody>
</tbody><colgroup><col></colgroup>
242677

<tbody>
</tbody><colgroup><col></colgroup>
205715

<tbody>
</tbody><colgroup><col></colgroup>
2101823002426720571

<tbody>
</tbody><colgroup><col></colgroup>
#N/A

<tbody>
</tbody><colgroup><col></colgroup>
210182300

<tbody>
</tbody><colgroup><col></colgroup>
YYYYN (SHOULD BE Y)

<tbody>
</tbody>
 

Andrew Poulsom

MrExcel MVP
Joined
Jul 21, 2002
Messages
73,092
Shouldn't the formula be?

=IF(COUNTIFS(A$2:A2,A2,B$2:B2,B2,C$2:C2,C2)>1,"Y","N")

Rather than posting some isolated snippets from your data please create some which clearly demonstrates the results you are expecting.
 

squirrel31

New Member
Joined
Sep 15, 2014
Messages
30
Sorry, my excel seems to have the mind of its own. Thought I had copied that formula correctly (even compared them in notepad...)

One example that shows the result I should be expecting:

IDPos_XPos_YIntersect?Duplicate?Additional Intersect?
210182300

<tbody>
</tbody><colgroup><col></colgroup>
243130

<tbody>
</tbody><colgroup><col></colgroup>
205587

<tbody>
</tbody><colgroup><col></colgroup>
YN-
210182300

<tbody>
</tbody><colgroup><col></colgroup>
243130

<tbody>
</tbody><colgroup><col></colgroup>
205587

<tbody>
</tbody><colgroup><col></colgroup>
YYN
210182300

<tbody>
</tbody><colgroup><col></colgroup>
242677

<tbody>
</tbody><colgroup><col></colgroup>
205715

<tbody>
</tbody><colgroup><col></colgroup>
YYY
210182300

<tbody>
</tbody><colgroup><col></colgroup>
243785

<tbody>
</tbody><colgroup><col></colgroup>

<tbody>
</tbody><colgroup><col></colgroup>
205306

<tbody>
</tbody><colgroup><col></colgroup>
YYY

<tbody>
</tbody>


Explained: The second row is the first time 210182300 appears so is not a duplicate. The third row is where an intersect appears however shares the same grid reference meaning it can be ignored. The following two have the same IDs however they both have unique grid references therefore they are additional intersects (because they also intersect).

Rules:

Same ID + Unique grid ref + Intersect is Y = Additional intersect is Y
Same ID + No unique grid ref + Intersect is Y = Additional intersect is N
Same ID + Intersect is N = Additional intersect is -
Same ID + Duplicate is N = Additional intersect is -

I have removed the columns that do not affect what I am trying to produce (and I aplogise for adding them in as looking back it over-complicates things here).

I shall try and sum up the process again: There are 21004 records of which there are duplicates (9144). I require to determine which ones out of the 9144 I need to deal with and which I can ignore.

I have revised the method to make it easier (I hope)

To be classified as a non-duplicate the following needs to be true:

  • Unique ID
  • if x amounts appear of IDs then the first is classified as non-duplicate

To be a duplicate the following needs to be true:

  • Same ID
  • if x amounts appear of IDs then the 2nd one onwards (2nd,3rd...) are all classified as duplicates.

To be an additional intersect the following needs to be true:

  • Record needs to be a duplicate
  • Grid ref needs to be unique
  • if x amounts of grid refs (needs to be different to the same ID classified as non-duplicate) appear then the first is classified as an additional intersect
  • Intersect = Y
 

Andrew Poulsom

MrExcel MVP
Joined
Jul 21, 2002
Messages
73,092
Maybe:

<b>Excel 2010</b><table cellpadding="2.5px" rules="all" style=";background-color: #FFFFFF;border: 1px solid;border-collapse: collapse; border-color: #BBB"><colgroup><col width="25px" style="background-color: #DAE7F5" /><col /><col /><col /><col /><col /><col /></colgroup><thead><tr style=" background-color: #DAE7F5;text-align: center;color: #161120"><th></th><th>A</th><th>B</th><th>C</th><th>D</th><th>E</th><th>F</th></tr></thead><tbody><tr ><td style="color: #161120;text-align: center;">1</td><td style=";">ID</td><td style=";">Pos_X</td><td style=";">Pos_Y</td><td style=";">Intersect?</td><td style=";">Duplicate?</td><td style=";">Additional Intersect?</td></tr><tr ><td style="color: #161120;text-align: center;">2</td><td style="text-align: right;;">210182300</td><td style="text-align: right;;">243130</td><td style="text-align: right;;">205587</td><td style=";">Y</td><td style=";">N</td><td style=";"></td></tr><tr ><td style="color: #161120;text-align: center;">3</td><td style="text-align: right;;">210182300</td><td style="text-align: right;;">243130</td><td style="text-align: right;;">205587</td><td style=";">Y</td><td style=";">Y</td><td style=";">N</td></tr><tr ><td style="color: #161120;text-align: center;">4</td><td style="text-align: right;;">210182300</td><td style="text-align: right;;">242677</td><td style="text-align: right;;">205715</td><td style=";">Y</td><td style=";">Y</td><td style=";">Y</td></tr><tr ><td style="color: #161120;text-align: center;">5</td><td style="text-align: right;;">210182300</td><td style="text-align: right;;">243785</td><td style="text-align: right;;">205306</td><td style=";">Y</td><td style=";">Y</td><td style=";">Y</td></tr></tbody></table><p style="width:3.6em;font-weight:bold;margin:0;padding:0.2em 0.6em 0.2em 0.5em;border: 1px solid #BBB;border-top:none;text-align: center;background-color: #DAE7F5;color: #161120">Sheet1</p><br /><br /><table width="85%" cellpadding="2.5px" rules="all" style=";border: 2px solid black;border-collapse:collapse;padding: 0.4em;background-color: #FFFFFF" ><tr><td style="padding:6px" ><b>Worksheet Formulas</b><table cellpadding="2.5px" width="100%" rules="all" style="border: 1px solid;text-align:center;background-color: #FFFFFF;border-collapse: collapse; border-color: #BBB"><thead><tr style=" background-color: #DAE7F5;color: #161120"><th width="10px">Cell</th><th style="text-align:left;padding-left:5px;">Formula</th></tr></thead><tbody><tr><th width="10px" style=" background-color: #DAE7F5;color: #161120">E2</th><td style="text-align:left">=IF(<font color="Blue">COUNTIF(<font color="Red">A$2:A2,A1</font>)>1,"Y","N"</font>)</td></tr><tr><th width="10px" style=" background-color: #DAE7F5;color: #161120">F2</th><td style="text-align:left">=IF(<font color="Blue">E2="Y",IF(<font color="Red">AND(<font color="Green">D2="Y",COUNTIFS(<font color="Purple">A$2:A2,A2,B$2:B2,B2,C$2:C2,C2</font>)>1</font>),"N","Y"</font>),""</font>)</td></tr></tbody></table></td></tr></table><br />
 

Watch MrExcel Video

Forum statistics

Threads
1,109,456
Messages
5,528,871
Members
409,842
Latest member
mfernandezcean
Top