Multiple Text Extract from String

will9455

New Member
Joined
Aug 31, 2017
Messages
2
I have a spreadsheet which contains a column of several thousands of database record serial numbers in the following format: ABC1234DE-YZ123456 (Sometimes there is an additional letter before the "-"). To complicate matters, this serial number is inside a text string of various different lengths. I have been able to extract the serial number using a MID(FIND()) formula.

The problem I am encountering is sometimes there are multiple serial numbers in the same text string, separated by a "/", and I need to be able to extract each serial number individually. Is this even possible, or do I have to keep manually extracting these combine serial numbers?

Unfortunately I am unable to reconfigure the source data to be more user friendly, so I have to work with what I'm given. I am pretty familiar with using formulas, but have no experience with VBA. Please help! Thank you.
 

Excel Facts

Create a Pivot Table on a Map
If your data has zip codes, postal codes, or city names, select the data and use Insert, 3D Map. (Found to right of chart icons).
The "/" isn't a problem as you can use that to split the text into an array to search. it'd probably help people have a better idea if you posted up the code you're currently using to extract the serials so they have an idea of the format, do these text strings ever contain "-" within the text that have no connection to a serial number? Also the serials separated by "/" are there any gaps? or like "ABC1234DE-YZ123456/ABC1234DE-YZ123456" and finally what you want to do with the data retrieved, just dumped in a column? duplicates kept etc
 
Upvote 0
I wish I could figure out how to post a screen shot here, but...What I would like to do is extract the serial number into Column A. I don't know if it is possible, but if there are multiple serial numbers it would be great if the entire Row is copied and inserted as a new Row with each unique serial number extracted to Column A. Here is a typical example of my data.

Source:

TC CVN0001PBT-DK011301/11401/11501/11601/11701 ELEVATOR #10 CONTROL STATION

Desired Return:

CVN0001PBT-DK011301
CVN0001PBT-DK011401
CVN0001PBT-DK011501
CVN0001PBT-DK011601
CVN0001PBT-DK011701

I can extract all the serials as a group using:
=TRIM(MID(A1,FIND("",A1)+1,FIND(" ",A1,FIND(" ",A1)+1)-FIND("",A1)))

or I can extract only the first serial using:
=TRIM(MID(A1,FIND("CVN0001", A1), (FIND("-", A1)+9)-FIND("CVN0001",A1)))
<o:p></o:p>


 
Upvote 0
Use this as a formula array as explained in the comments. The first is by rows that you wanted while the second is by columns. Obviously, spread formula array over as many or more than the number of expected parsed strings.

Place code in a Module.
Code:
'Parse by a Delimiter character, to rows. Select number of rows in one column to spread over.
'=parser(A1), select say 2 rows in a column and press Alt+Ctrl+Enter to enter as forumula array.
'if 3 could be maximum parsed strings, select 3 rows and do as above.
Function ParseR(aCell As Range, Optional delimit As String = "/")
  Dim icols As Integer, i As Integer, j As Integer, a, b
  icols = Application.Caller.Cells.Count
  a = Split(aCell, delimit)
  ReDim b(0 To icols - 1)
  For i = 0 To UBound(b)
    b(i) = ""
  Next i
  On Error GoTo EndI
  For i = 0 To UBound(b)
    If i > (icols - 1) Then Exit For
    b(i) = a(i)
  Next i
EndI:
  ParseR = WorksheetFunction.Transpose(b)
End Function


'Parse by a Delimiter character, to columns. Select number of columns to spread over.
'=parsec(A1), select say 2 columns and press Alt+Ctrl+Enter to enter as forumula array.
'if 3 could be maximum parsed strings, select 3 columns and do as above.
Function ParseC(aCell As Range, Optional delimit As String = "/")
  Dim icols As Integer, i As Integer, j As Integer, a, b
  icols = Application.Caller.Cells.Count
  a = Split(aCell, delimit)
  ReDim b(0 To icols - 1)
  For i = 0 To UBound(b)
    b(i) = ""
  Next i
  On Error GoTo EndI
  For i = 0 To UBound(b)
    If i > (icols - 1) Then Exit For
    b(i) = a(i)
  Next i
EndI:
  ParseC = WorksheetFunction.Transpose(WorksheetFunction.Transpose(b))
End Function
 
Upvote 0
Here's some VBA that inserts a new column in A for the serials to be gathered then checks column B (the old A column) for serial numbers. I've coded it as strRaw = CStr(wksRaw.Range("B" & x).Value) in the code so amend the B to whatever column the data is in after a new column A has been inserted. Also set the sheet name in the line from sheet1 to whatever it's called in Set wksRaw = ThisWorkbook.Sheets("Sheet1") . The code uses regex to be able to match the format of your serials ie w{3}\d{4}\w{2,3}-\w{2}\d{6} 3 letters followed by 4 numbers followed by 2 or 3 letters - two letters and 6 numbers. Because of that you'll need to ensure the regex expressions are enable in excel, that's very simple just follow this link https://stackoverflow.com/questions...ct-regular-expression-using-vba-macro-in-word


Code:
Option Explicit
Public Sub ExtractSerials()
    On Error GoTo xit
    Application.EnableEvents = False
    Application.ScreenUpdating = False
    Application.Calculation = xlCalculationManual
    Dim wksRaw As Worksheet
    Dim strPattern As String, strRaw As String, strMatch As String, _
    splitString As String, splitArray() As String
    Dim lngLastRow As Long, x As Long, y As Long, z As Long
    Dim objMatches As Object
    Dim rgx As RegExp
    Set rgx = New RegExp
    
    'Set references up-front
    Set wksRaw = ThisWorkbook.Sheets("Sheet1")
    strPattern = "\w{3}\d{4}\w{2,3}-\w{2}\d{6}\/\S+\w|\w{3}\d{4}\w{2,3}-\w{2}\d{6}"
    z = 0
    With rgx
        .Global = True
        .MultiLine = True
        .IgnoreCase = True
        .Pattern = strPattern
    End With
    
 
    lngLastRow = wksRaw.Cells.Find(What:="*", LookIn:=xlFormulas, _
                                   SearchOrder:=xlByRows, _
                                   SearchDirection:=xlPrevious).Row
                                   
    'insert a new column A for the data
     wksRaw.Columns("A:A").Insert Shift:=xlToRight, CopyOrigin:=xlFormatFromLeftOrAbove
     wksRaw.Columns("A").ColumnWidth = 24


    
    'Loop through all the rows backwards as we may insert data
     For x = lngLastRow To 1 Step -1
     strRaw = CStr(wksRaw.Range("B" & x).Value)
        
        'If the string inside the cell hits our RegExp, start the operation
        If rgx.Test(strRaw) Then
            
            'Assign the matches inside the string to an object
            Set objMatches = rgx.Execute(strRaw)
            If objMatches.Count > 1 Then wksRaw.Range("A" & x).Font.Color = vbRed: z = z + 1 ' highlight cell if more matches
            
            If InStr(objMatches.Item(0), "/") > 0 Then  ' our multi serial matches
            splitArray = Split(objMatches.Item(0), "/")
            wksRaw.Range("A" & x).Value = splitArray(0) 'objMatches.Item(0)
            
            For y = 1 To UBound(splitArray)
            wksRaw.Range("A" & x + 1).EntireRow.Insert
            wksRaw.Range("A" & x + 1).Value = Trim(Left(splitArray(0), Len(splitArray(0)) - Len(splitArray(y)))) & splitArray(y)
            Next y
            
            
            Else
            wksRaw.Range("A" & x).Value = objMatches.Item(0)
            End If
             
        Else
           
        End If
                 
     Next x
         
    
    


    If z > 0 Then MsgBox z & " cell/s highlighted in red due to  possible multiple matches in those adjacent cells"
xit:
Application.EnableEvents = True
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
End Sub
 
Upvote 0
Here is another macro for you to consider. It assumes your data is in Column A starting on Row 1. It also assume Column B is empty (because that is where it outputs the results starting on Row 1).
Code:
[table="width: 500"]
[tr]
	[td]Sub ExtractSerialNumbers()
  Dim R As Long, X As Long, Z As Long, Data As Variant
  Dim Prefix As String, Words() As String, Slashed() As String, Fixed() As String
  Data = Range("[B][COLOR="#FF0000"]A1[/COLOR][/B]", Cells(Rows.Count, "[B][COLOR="#FF0000"]A[/COLOR][/B]").End(xlUp))
  ReDim Fixed(1 To UBound(Data))
  For R = 1 To UBound(Data)
    Words = Split(Data(R, 1))
    For X = 0 To UBound(Words)
      If Words(X) Like "[A-Z][A-Z][A-Z]####[A-Z][A-Z]-[A-Z][A-Z]######*" Or Words(X) Like "[A-Z][A-Z][A-Z]####[A-Z][A-Z][A-Z]-[A-Z][A-Z]######*" Then
        If InStr(Words(X), "/") Then
          Slashed = Split(Words(X), "/")
          For Z = 1 To UBound(Slashed)
            Slashed(Z) = Left(Slashed(0), InStr(Slashed(0), "-") + 2) & Format(Slashed(Z), "000000")
          Next
          Words(X) = Join(Slashed, " ")
        End If
      Else
        Words(X) = ""
      End If
    Next
    Fixed(R) = Application.Trim(Join(Words))
  Next
  Words = Split(Join(Fixed))
  Range("[B][COLOR="#FF0000"]B1[/COLOR][/B]").Resize(UBound(Words) + 1) = Application.Transpose(Words)
End Sub[/td]
[/tr]
[/table]
 
Last edited:
Upvote 0
Ricks method is the cleanest to go with as it avoids enabling regex, you can always tweak it if you want to insert a new column at A to store the serials and even amend the bottom lines to the following if you want unique serials out of the data

Code:
    Range("B1").Resize(UBound(Words) + 1) = Application.Transpose(Words)


to

Code:
      With CreateObject("scripting.dictionary")        For Each it In Words()
            c00 = .Item(it)
        Next
        sn = .keys ' the array .keys contains all unique keys
        Words = Split(Join(.keys, " "))  ' you can join the array into a string
    End With
    Range("B1").Resize(UBound(Words) + 1) = Application.Transpose(Words)
 
Last edited:
Upvote 0
@MrTeeny you can actually use Regular Expressions without turning it on in the References window by using late binding. I adapted Rick's answer to use Regular Expressions as an example to have the same number of (non blank) lines but with fewer overall function calls and fewer single letter long variables.

Code:
Sub ExtractSerialNumbers()
  Dim Item As Variant, SerialNumbers As Collection, X As Long
  Dim Regex As Object, Match As Object, Prefix As String, Suffix As Variant
  
  Set Regex = CreateObject("VBScript.Regexp")
  Regex.Pattern = "([A-Z]{3}\d{4}[A-Z]{3,4}-[A-Z]{2})(\d{6}(?:\/\d+)*)"
  Regex.Global = True
  Regex.IgnoreCase = True
  
  Set SerialNumbers = New Collection
  
  For Each Item In Range("A1", Cells(Rows.Count, "A").End(xlUp)).Value2
    For Each Match In Regex.Execute(Item)
      Prefix = Match.SubMatches(0)
      
      For Each Suffix In Split(Match.SubMatches(1), "/")
        SerialNumbers.Add Prefix & Format$(Suffix, "000000")
      Next Suffix

    Next Match
  Next
  
  If SerialNumbers.Count = 0 Then Exit Sub
  
  ReDim ReturnData(1 To SerialNumbers.Count, 1 To 1) As String
  For X = 1 To SerialNumbers.Count
    ReturnData(X, 1) = SerialNumbers(X)
  Next X
  
  Range("B1").Resize(UBound(ReturnData)) = ReturnData

End Sub
 
Upvote 0
Thanks, I'm kind of new to VBA and programming so this board is a great place to learn as I'm pretty much picking up more efficient methods to do things all the while.
 
Upvote 0

Forum statistics

Threads
1,215,731
Messages
6,126,537
Members
449,316
Latest member
sravya

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top