Excel formula to find data with multiple criteria

joy1mat

New Member
Joined
Jun 26, 2016
Messages
31
Office Version
  1. 365
Platform
  1. Windows
I have a spreadsheet with over 100K rows (and growing) with surnames in 1 column (A) and Christian names in another (B) which I populate from another source. I need to find a formula/s to find when the same surnames and/or Christian names are not sequenced correctly eg surnames such as JONES should contain an _1 or _2 after them etc. If there is also a Christian name against a surname the same applies eg JONES John_1 etc. Basically I need to find same surnames with or without the _ so I can correctly sequence them. I suspect I may need 2 formulas to handle the surname separately from the surname/Christian names. I have a reasonably good understanding of formulas and have tried using LENS and MATCH but this one has me stumped. Any ideas?
 

Excel Facts

Can a formula spear through sheets?
Use =SUM(January:December!E7) to sum E7 on all of the sheets from January through December
Welcome to the MrExcel board!

It isn't quite clear to me. Could you provide a small set of dummy data and expected results (say 10-15 rows) to show the variety of circumstances that may occur with your data and what you want done with it?
For good ways to provide sample data that we can copy and test with, Look here.
 
Upvote 0
Welcome to the MrExcel board!

It isn't quite clear to me. Could you provide a small set of dummy data and expected results (say 10-15 rows) to show the variety of circumstances that may occur with your data and what you want done with it?
For good ways to provide sample data that we can copy and test with, Look here.

I've used the forum tools and done a test and it looks ok. Hopefully I've done it right. I don't think a formula/s will automatically fix my issue but would like the formula to identify which ones I need to fix. I do appreciate your help.
Sample currentSample end result
2014_01_12JONES 2014_01_12JONES_1
2014_01_12JONESJohn_12014_01_12JONESJohn_1
2014_01_12JONESJohn_22014_01_12JONESJohn_2
2014_01_12JONESJohn_32014_01_12JONESJohn_3
2014_01_12JONESJohn_42014_01_12JONESJohn_4
2014_01_12JONESJohn_52014_01_12JONESJohn_5
2014_01_12JONESJohn_62014_01_12JONESJohn_6
2014_01_12JONES_1 2014_01_12JONES_2
2015_06_01JONESJohn2015_06_01JONESJohn_7
2015_06_01JONES_2 2015_06_01JONES_3
2015_06_01JONES_3 2015_06_01JONES_4

<tbody>
</tbody><colgroup><col><col span="2"><col><col><col span="2"></colgroup>
 
Upvote 0
See if this does what you want, though I'm not sure how it will go with 100,000 rows of data.
Might require a macro if the sheet bogs down with formulas. Would that be acceptable?

Both formulas copied down.

Excel Workbook
ABCDE
1Sample current
22014_01_12JONESJONES_1 
32014_01_12JONESJohn_1JONESJohn_1
42014_01_12JONESJohn_2JONESJohn_2
52014_01_12JONESJohn_3JONESJohn_3
62014_01_12JONESJohn_4JONESJohn_4
72014_01_12JONESJohn_5JONESJohn_5
82014_01_12JONESJohn_6JONESJohn_6
92014_01_12JONES_1JONES_2
102015_06_01JONESJohnJONESJohn_7
112015_06_01JONES_2JONES_3
122015_06_01JONES_3JONES_4
Repeat names
 
Last edited:
Upvote 0
WOW your good. I did try to use LENS and FIND but I can see why my formula didn't work. We're nearly there. Whilst both formulas work they are also putting an _1 etc against names that are unique that don't require a sequence eg if I have a surname of ABBOTT and there are no other ABBOTT'S in the list then it doesn't need the _1. The same applies with Christian names eg ABBOTT_Fred. My apologies I should have included a couple of unique names on the sample I sent you. It's only where the same surname and/or Christian name appears more than once both with and without a _ that I have to fix it. Also I tend to agree that the spreadsheet is probably going to get bogged down the bigger it gets so once I get the formulas working I have to think about whether to go down the macro path. I'm on Excel 2013 and it keeps wanting to disable macros eg the Forum Tools one. If you need me to send another sample with the additional names for clarification let me know.
 
Upvote 0
As requested. In the case below the names of ABBOTT are unique and not part of a sequence so they don't need the _ as part of the name. All the JONES should be part of a sequence and should all have an _ as part of the name for both the instances where it's just JONES or JONES John.
Sample currentSample end result
2014_01_12JONES 2014_01_12JONES_1
2014_01_12JONESJohn_12014_01_12JONESJohn_1
2014_01_12JONESJohn_22014_01_12JONESJohn_2
2014_01_12JONESJohn_32014_01_12JONESJohn_3
2014_01_12ABBOTT 2014_01_12ABBOTT
2014_01_12ABBOTTJohn2014_01_12ABBOTTJohn
2014_01_12ABBOTTJoseph2014_01_12ABBOTTJoseph
2014_01_12JONES_1 2014_01_12JONES_2
2015_06_01JONESJohn2015_06_01JONESJohn_4
2015_06_01JONES_2 2015_06_01JONES_3
2015_06_01JONES_3 2015_06_01JONES_4

<tbody>
</tbody><colgroup><col><col span="3"><col><col span="2"></colgroup>

I do hope I'm getting the screen shot correct for you.
 
Upvote 0
I do hope I'm getting the screen shot correct for you.
The screen shots are fine as I can copy/paste that data to test with.
However, particularly with the size of your data, I think a formula solution is getting unwieldy and will probably bog your sheet right down.
Instead, I would suggest that you try this macro. Post back if you need help with how to implement it.

I'm also not sure if you want the results in separate columns like I have below in columns D:E, or whether to just over-write the original values in columns B:C?

Make sure that you test in a copy of your workbook.

Rich (BB code):
Sub NameSequence()
  Dim a As Variant, b As Variant, ky As Variant
  Dim d As Object
  Dim i As Long
  Dim s As String
  
  Set d = CreateObject("Scripting.Dictionary")
  d.CompareMode = 1
  Application.ScreenUpdating = False
  With Range("B2:C" & Range("B" & Rows.Count).End(xlUp).Row)
    b = .Value
    .Replace What:="_*", Replacement:="", LookAt:=xlPart
    a = .Value
    ReDim Preserve a(1 To UBound(a), 1 To 3)
    For i = 1 To UBound(a)
      s = a(i, 1) & "|" & a(i, 2)
      d(s) = d(s) + 1
      a(i, 3) = s
    Next i
    For Each ky In d.Keys
      If d(ky) = 1 Then d.Remove ky
    Next ky
    For i = UBound(a) To 1 Step -1
      s = a(i, 3)
      If d.exists(s) Then
        If Right(s, 1) = "|" Then
          a(i, 1) = a(i, 1) & "_" & d(s)
        Else
          a(i, 2) = a(i, 2) & "_" & d(s)
        End If
        d(s) = d(s) - 1
      End If
    Next i
    .Offset(, 2).Value = a
    .Value = b
  End With
  Application.ScreenUpdating = True
End Sub


Original data in columns A:C, columns D:E produced by the above code.

Excel Workbook
ABCDE
1
22014_01_12JONESJONES_1
32014_01_12JONESJohn_1JONESJohn_1
42014_01_12JONESJohn_2JONESJohn_2
52014_01_12JONESJohn_3JONESJohn_3
62014_01_12ABBOTTABBOTT
72014_01_12ABBOTTJohnABBOTTJohn
82014_01_12ABBOTTJosephABBOTTJoseph
92014_01_12JONES_1JONES_2
102015_06_01JONESJohnJONESJohn_4
112015_06_01JONES_2JONES_3
122015_06_01JONES_3JONES_4
Repeat names (2)
 
Upvote 0

Forum statistics

Threads
1,214,923
Messages
6,122,286
Members
449,076
Latest member
kenyanscott

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top