Regular Expressions problem

ya5irha55an

Board Regular
Joined
Sep 17, 2004
Messages
85
Hi All! I hope someone can help me with the following problem:

The problem is as follows:

I recieve a file with a list of addresses, all in one column as a single string, for example:

40-40A Pall MallChorleyLancashire
55/57 Dickson RoadBlackpoolLancashire
211 Waterloo RoadBlackpoolLancashire
362-364 Leyland LaneLeylandLancashire
4 High StreetCleator MooreCumbria
3 Sir Simons ArcadeLancasterLancashire
40 Market StreetDarwenLancashire
Granada Motorway ServicesM6 NorthboundSouthwaite ServicesNr Carlisle
75 Lord StreetFleetwoodLancashire
5 Moorclose RoundaboutWorkingtonCumbria

as you can see, the amount of information varies, but the fields i am looking for are as follows:

Unit
Building
Street Number From
Street Number Letter (1)
Street Number To
Street Number Letter (2)
Street
Area
City
County
Postcode

some or all or this info may be present in any given string.

Now i also recieve the same data from a different source, in the same format only with space between each field, and i have a reg expression (provided very kindly by someone from this board!) which sorts this out fine:

Sub SplitAddress2()
Dim rAddresses As Range, rcell As Range, oMatches As Object, i As Integer, s As String
Const ADDR_UNIT As String = "((unit)\s+([a-z0-9]+)\s+)?"
Const ADDR_NUMBER As String = "((\d+)([a-z])?([/\-](\d+)([a-z])?)? +)?"
Const ADDR_ST As String = "(.+?)"
Const ADDR_PC As String = "(\s+([a-z][a-z0-9]{1,3} \d[a-z][a-z]))?"

Set rAddresses = Range(Range("A2"), Range("A2").End(xlDown))

With CreateObject("VBSCRIPT.REGEXP")
.IgnoreCase = True
.Pattern = ADDR_UNIT & ADDR_NUMBER & ADDR_ST & ADDR_PC & "\s*$"

For Each rcell In rAddresses
Set oMatches = .Execute(rcell)
With oMatches(0)
rcell.Offset(, 1) = .submatches(1)
rcell.Offset(, 2) = .submatches(2)
rcell.Offset(, 3) = .submatches(4)
rcell.Offset(, 4) = .submatches(5)
rcell.Offset(, 5) = .submatches(7)
rcell.Offset(, 6) = .submatches(8)
rcell.Offset(, 7) = .submatches(9)
rcell.Offset (, 8) = .submatches(10)
End With
Next

End With
End Sub

but this does not work for the above data which has no spaces.

The data does have a pattern to it though: each field begins with a Capital letter, unless the letter is preceded by a space in which case it is part of the previous field. I have 2 questions:

1. Is there any way to write a regular expression to split the above data into the fields mentioned?

2. Is there any way to do this so that it also works with the data from the other source, ie the one with spaces - i will paste some examples of this data below:

100 Tottenham Court Road London W1T 4TT
63 Tottenham Court Road London W1T 2ES
75B Victoria Street London SW1H 0HW
40 Albermarle Street London W1S 4TE
85 Aldgate High Street London EC3N 1LH
192 The Strand London WC2R 1DT
3 America Square London EC3N 2LR
120 Baker Street London W1U 6TU
7 Berkeley Square London W1J 6ES
Unit 450 Pingle Drive Oxford OX26 6WD
52 New Street Birmingham B2 4EG
Unit 3A 55A Temple Row Birmingham B2 5LE
140 Bishopsgate London EC2M 4HX
192 Bishopsgate London EC2M 4NR
Unit L0006 Eastern Mall Bluewater Shopping Centre DA9 9SH

I cant tell you how much i would appreciate some assistance, thanks very much!
 

Excel Facts

Workdays for a market open Mon, Wed, Friday?
Yes! Use "0101011" for the weekend argument in NETWORKDAYS.INTL or WORKDAY.INTL. The 7 digits start on Monday. 1 means it is a weekend.
This code seems to do the job. Check sheet names are OK.


To adapt for space delimiter only remove the elsif Asc() .... section and
"And IsNumeric(Left(PartString, 1))" from the "If MyChr = " " line.

Better still would be to open the text file as space delimited. You would have to accept illogical splits (such as PostCode) either way unless you add code to use lookup tables or some other way of checking validity.


Code:
'===========================================
'- PARSE LINES OF DATA & SPLIT INTO COLUMNS
'===========================================
Sub test()
    Dim FromSheet As Worksheet
    Dim FromRow As Long
    Dim ToSheet As Worksheet
    Dim ToRow As Long
    Dim ToColumn As Integer
    Dim MainString As String
    Dim Cstart As Integer
    Dim MyChr As String
    Dim Cend As Integer
    Dim PartString As String
    '-------------------------------
    Set FromSheet = Worksheets("Sheet1")
    FromRow = 1
    Set ToSheet = Worksheets("Sheet2")
    ToRow = 1
    '--------------------------------------------------------------
    '- loop rows
    While FromSheet.Cells(FromRow, 1).Value <> ""
        MainString = FromSheet.Cells(FromRow, 1).Value
        ToColumn = 1
        Cstart = 1
        PartString = ""
        '---------------------------------------------------------
        '- loop characters in string
        For C = 1 To Len(MainString)
            MyChr = Mid(MainString, C, 1)
            '- found space at begining
            If MyChr = " " And IsNumeric(Left(PartString, 1)) Then
                Cstart = C
                ToSheet.Cells(ToRow, ToColumn).Value = PartString
                PartString = ""
                ToColumn = ToColumn + 1
            '- found upper case
            ElseIf Asc(MyChr) >= 65 And Asc(MyChr) <= 90 _
                And Not IsNumeric(Left(PartString, 1)) And PartString <> "" _
                And Right(PartString, 1) <> " " Then
                ToSheet.Cells(ToRow, ToColumn).Value = PartString
                PartString = MyChr
                ToColumn = ToColumn + 1
            Else
                PartString = PartString & MyChr
            End If
        Next
        '-----------------------------------------------------------
        ToSheet.Cells(ToRow, ToColumn).Value = PartString
        ToRow = ToRow + 1
        FromRow = FromRow + 1
    Wend
End Sub
'-------------------------------------------
 
Upvote 0

Forum statistics

Threads
1,214,952
Messages
6,122,457
Members
449,083
Latest member
Ava19

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top