Magic_Doctor
Board Regular
- Joined
- Mar 18, 2009
- Messages
- 56
Hello,
I try, by means of regular expressions, to write a function allowing me to extract from a string of characters either last names (all capitalized) or first names (up to 3, as in France for example) all starting with a capital letter followed by lower case letters. Vocal accents will be taken into account (I am not Anglo-Saxon).
My function works almost well. Indeed, when there are 3 first names the third is added to the last name.
Thank you for helping me solve this problem.
I try, by means of regular expressions, to write a function allowing me to extract from a string of characters either last names (all capitalized) or first names (up to 3, as in France for example) all starting with a capital letter followed by lower case letters. Vocal accents will be taken into account (I am not Anglo-Saxon).
My function works almost well. Indeed, when there are 3 first names the third is added to the last name.
Code:
Option Explicit
Dim regEx As Object 'memorization to save time in creating the object ("VBScript.RegExp") 'job75
Function DissectNP(NP As String, x As Byte) As String
'- NP : the string containing the NAME(s) + first name(s)
'- x : if x = 1 --> NAME(s)
' if x = 2 --> first name(s)
Dim strPattern As String, NBNomsMaj As Integer, NBNomsFirstMaj As Integer
Dim PatternNoms As String, PatternPreNoms As String
Set regEx = CreateObject("VBScript.RegExp")
NBNomsMaj = NB_MotsMmFirstM(NP, 1) 'number of names in CAPITALS
NBNomsFirstMaj = NB_MotsMmFirstM(NP, 3) 'number of names beginning with a CAPITAL followed by lowercase
PatternNoms = "([A-ZÈÉÊËÄÇÆŒÁÍÓÚÂÊÎÔÛÑÄËÏÜ\-]*\s)" 'pattern of NAMES
PatternPreNoms = "([a-zçæéèíóúâêîôûäëïöüñA-ZÈÉÊËÄÇÆŒÁÍÓÚÂÊÎÔÛÄËÏÜÑ\-]*)" 'pattern of last names
'PatternPreNoms = "([a-zçæéèíóúâêîôûäëïöüñA-ZÈÉÊËÄÇÆŒÁÍÓÚÂÊÎÔÛÄËÏÜÑ\-]*\s)" 'do not work!!
'strPattern = Application.WorksheetFunction.Rept(PatternNoms, NBNomsMaj) & PatternPreNoms & "( )" & PatternPreNoms
strPattern = Application.WorksheetFunction.Rept(PatternNoms, NBNomsMaj) & PatternPreNoms & IIf(NBNomsFirstMaj = 1, "", "( )" & PatternPreNoms)
'strPattern = Application.WorksheetFunction.Rept(PatternNoms, NBNomsMaj) & Application.WorksheetFunction.Rept(PatternPreNoms, NBNomsFirstMaj) 'do not work!
'If strPattern <> "" Then 'is it really useful?
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(NP) Then
Select Case x
Case 1 'NAMES
Select Case NBNomsMaj
Case 1
DissectNP = regEx.Replace(NP, "$1")
Case 2
DissectNP = regEx.Replace(NP, "$1" & "$2")
Case 3
DissectNP = regEx.Replace(NP, "$1" & "$2" & "$3")
End Select
Case 2 'last names
DissectNP = regEx.Replace(NP, "$" & NBNomsMaj + 1 & IIf(NBNomsFirstMaj = 1, "", "$" & NBNomsMaj + 2 & "$" & NBNomsMaj + 3)) 'bricolage...
End Select
Else
DissectNP = "Not matched"
End If
'End If
End Function
Function NB_MotsMmFirstM(phrase As String, x As Byte) As Integer
'Returns the number of words, in UPPERCASE or lowercase or starting with a UPPERCASE followed by lowercase, of a sentence
'Magic_Doctor
'- phrase : the string of characters studied
'- x : if x = 1 --> words in CAPITALS
' if x = 2 --> words in lowercases
' if x = 3 --> words starting with a CAPITAL followed by lowercases
Dim tmp, nbmots As Integer, mot() As String, i As Integer, lemot As String, n As Byte, nbmm As Integer
tmp = Split(phrase, " ")
nbmots = UBound(tmp) + 1 'total number of words (whatever they are) in the sentence
mot = Split(phrase)
For i = 1 To nbmots
lemot = mot(i - 1) 'Option Base 0
Select Case x
Case 1 'words in CAPITALS
n = IIf(UCase(lemot) = lemot, 1, 0)
Case 2 'words in lowercases
n = IIf(LCase(lemot) = lemot, 1, 0)
Case 3 'words starting with a CAPITAL followed by lowercases
n = IIf(Left(lemot, 1) = UCase(Left(lemot, 1)) And Mid(lemot, 2, 1) = LCase(Mid(lemot, 2, 1)), 1, 0)
End Select
nbmm = nbmm + n
Next
NB_MotsMmFirstM = nbmm
End Function
Thank you for helping me solve this problem.
Last edited: