VBA that can extract words that starts with # and ends with 2021.

whyjaydee

New Member
Joined
Feb 26, 2021
Messages
19
Office Version
  1. 2016
Platform
  1. Windows
Hi Everyone,

I am straggling to in searching for a vba code that will extract a string that starts with # and ends with 2021. The string has no specific length.
After extracting, I need it to be in one column listed. There can be multiple words that starts with # and ends with 2021 in one cell, note that the 1 cell for example cell A1, contains a paragraph long texts and within that text the code will search for all the words that starts with # and ends with 2021 and put it in another column. If the code found found more than 1 word that starts with # and ends with 2021, then it has to be listed per cell in 1 column.

Example
Here is the paragraph or bunch of words and characters in cell A1:

{"106835224676040":{"id":"106835224676040","time":1613053377,"author":"Luisito Noel Magleo","text":"#RRT2106102112021","highlighted":true,"type":"user"},"106830671343162":{"id":"106830671343162","time":1613052698,"author":"Paul John Honrubia","text":"#RRT_PaulJohnHonrubia","highlighted":true,"type":"user"},"#RRT2106102112021":{"id":"105041188188777","time":1612709664,"author":"Arnel Ramo","text":"#RRT1570702072021","highlighted":true,"type":"user"},"103766651649564":{"id":"103766651649564","time":1612463717,"author":"Algernon Sionosa","text":"

The string in Red font are the words that I need to get in the bunch of characters and needs to be listed in another column like this
Cell B1: #RRT2106102112021
Cell B2: #RRT2106102112021
Cell B3: #RRT1570702072021

Note that the words I'm looking for can appear repeatedly in the same paragraph

I appreciated all kinds of help.
 

Excel Facts

How to find 2nd largest value in a column?
MAX finds the largest value. =LARGE(A:A,2) will find the second largest. =SMALL(A:A,3) will find the third smallest

diddi

Well-known Member
Joined
May 20, 2004
Messages
3,297
Office Version
  1. 2010
Platform
  1. Windows
hi and welcome to MrExcel. this looks like a database dump or web rip. if that is the case it would be better to parse the info you need as it is collected. that said if all of that text is in the one cell, you can extract what you need with these couple of lines:
VBA Code:
Sub ParseData()
    Dim TempStr As String, LastRow As Long, Row As Long

    TempStr = "#RRT"
    LastRow = Cells(Rows.Count, "A").End(xlUp).Row
    For Row = 1 To LastRow
        datastr = Cells(Row, 1)
        Col = 1
        While InStr(datastr, TempStr) > 0
            datastr = Mid(datastr, InStr(datastr, TempStr) + Len(TempStr))
            If Val(Left(datastr, 2)) > 0 Then
                Col = Col + 1
                Cells(Row, Col) = TempStr & Trim(Left(datastr, InStr(datastr, "&") - 1))
            End If
        Wend
    Next Row
End Sub

it groups all the hits for each row in the same row, then moves down. it can be changed to just fill one column however
 

Herakles

Board Regular
Joined
Jul 5, 2020
Messages
73
Office Version
  1. 365
Platform
  1. Windows
Welcome to the forum whyjaydee.
Give this a go and let me know how you get on.

It assumes that the worksheet with the text is the active worksheet and that the text is in cell A1.

VBA Code:
Public Sub subExtractStrings()
Dim i As Integer
Dim strChars As String
Dim arrText() As String
Dim intLastRow As Integer

    strText = Replace(Range("A1"), "quot", " ")
    
    ' Replace these characters with a space.
    strChars = ":;(){},&"
    For i = 1 To Len(strChars)
        strText = Replace(strText, Mid(strChars, i, 1), " ")
    Next i
    
    ' Replace all double spaces with a single space.
    Do While InStr(1, strText, "  ", vbTextCompare) > 0
        strText = Replace(strText, "  ", " ", 1)
    Loop
        
    ' Populate an array with one word per element
    arrText = Split(strText, " ")
        
    ' Loop through the array and write to worksheet any words that start with a '#'
    ' and end with '2021'
    For i = LBound(arrText) To UBound(arrText)
        If Left(arrText(i), 1) = "#" And Right(arrText(i), 4) = "2021" Then
            Cells(intLastRow + 1, "B") = arrText(i)
            intLastRow = Cells(Rows.Count, "B").End(xlUp).Row
        End If
    Next i
    
End Sub
 

whyjaydee

New Member
Joined
Feb 26, 2021
Messages
19
Office Version
  1. 2016
Platform
  1. Windows
hi and welcome to MrExcel. this looks like a database dump or web rip. if that is the case it would be better to parse the info you need as it is collected. that said if all of that text is in the one cell, you can extract what you need with these couple of lines:
VBA Code:
Sub ParseData()
    Dim TempStr As String, LastRow As Long, Row As Long

    TempStr = "#RRT"
    LastRow = Cells(Rows.Count, "A").End(xlUp).Row
    For Row = 1 To LastRow
        datastr = Cells(Row, 1)
        Col = 1
        While InStr(datastr, TempStr) > 0
            datastr = Mid(datastr, InStr(datastr, TempStr) + Len(TempStr))
            If Val(Left(datastr, 2)) > 0 Then
                Col = Col + 1
                Cells(Row, Col) = TempStr & Trim(Left(datastr, InStr(datastr, "&") - 1))
            End If
        Wend
    Next Row
End Sub

it groups all the hits for each row in the same row, then moves down. it can be changed to just fill one column however
Hi diddi, appreciate this. I’ll give it a try. Thank you so much for helping. And yes this is a data base dump that I need to clean and count each words that starts with # and ends with 2021.
 

whyjaydee

New Member
Joined
Feb 26, 2021
Messages
19
Office Version
  1. 2016
Platform
  1. Windows

ADVERTISEMENT

Welcome to the forum whyjaydee.
Give this a go and let me know how you get on.

It assumes that the worksheet with the text is the active worksheet and that the text is in cell A1.

VBA Code:
Public Sub subExtractStrings()
Dim i As Integer
Dim strChars As String
Dim arrText() As String
Dim intLastRow As Integer

    strText = Replace(Range("A1"), "quot", " ")
   
    ' Replace these characters with a space.
    strChars = ":;(){},&"
    For i = 1 To Len(strChars)
        strText = Replace(strText, Mid(strChars, i, 1), " ")
    Next i
   
    ' Replace all double spaces with a single space.
    Do While InStr(1, strText, "  ", vbTextCompare) > 0
        strText = Replace(strText, "  ", " ", 1)
    Loop
       
    ' Populate an array with one word per element
    arrText = Split(strText, " ")
       
    ' Loop through the array and write to worksheet any words that start with a '#'
    ' and end with '2021'
    For i = LBound(arrText) To UBound(arrText)
        If Left(arrText(i), 1) = "#" And Right(arrText(i), 4) = "2021" Then
            Cells(intLastRow + 1, "B") = arrText(i)
            intLastRow = Cells(Rows.Count, "B").End(xlUp).Row
        End If
    Next i
   
End Sub
Hi Herakles,

It worked, however what if I have the whole column A filled with this kind of text?
From Cell A2:A3000 and it can be more than that? Can it also be done?
 

Herakles

Board Regular
Joined
Jul 5, 2020
Messages
73
Office Version
  1. 365
Platform
  1. Windows
Run the procedure called sunMain. Hope this helps.

VBA Code:
Public Sub SubMain()
Dim intLastRow As Integer
Dim arrData As Variant
Dim i As Integer
Dim strData As String

On Error GoTo Err_Handler

    ActiveWorkbook.Save
    
    With Range("A:A")
        .WrapText = True
        .ColumnWidth = 30
        .RowHeight = 30
        With .Cells(1, 1)
            .Value = "Raw Data"
            .Interior.Color = RGB(171, 171, 171)
            .Font.Bold = True
        End With
    End With
    
    With Range("B:B")
        .ColumnWidth = 30
        .RowHeight = 30
        .HorizontalAlignment = xlRight
        .ClearContents
        With .Cells(1, 1)
            .Value = "Extract"
            .HorizontalAlignment = xlLeft
            .IndentLevel = 1
            .Interior.Color = RGB(171, 171, 171)
            .Font.Bold = True
        End With
    End With
    
    Range("A2").Select
    ActiveWindow.FreezePanes = True
    
    intLastRow = Cells(Rows.Count, "A").End(xlUp).Row
        
    arrData = Range("A2:A" & intLastRow)
    
    For i = LBound(arrData) To UBound(arrData)
        strData = arrData(i, 1)
        Call subExtractStrings(i, strData)
    Next i

    MsgBox "Finished Extracting Strings", vbInformation, "Confirmation"

Exit_Handler:

    Exit Sub

Err_Handler:

    MsgBox Err.Number & "  " & Err.Description
    
    Resume Exit_Handler

End Sub

Public Sub subExtractStrings(intSourceRow As Integer, ByVal strText As String)
Dim i As Integer
Dim strChars As String
Dim arrText() As String
Dim s As String
Dim intNextRow As Integer

    strText = Replace(strText, "quot", " ")
    
    ' Replace these characters with a space.
    strChars = ":;(){},&"
    For i = 1 To Len(strChars)
        strText = Replace(strText, Mid(strChars, i, 1), " ")
    Next i
    
    ' Replace all double spaces with a single space.
    Do While InStr(1, strText, "  ", vbTextCompare) > 0
        strText = Replace(strText, "  ", " ", 1)
    Loop
            
    ' Populate an array with one word per element
    arrText = Split(strText, " ")
        
    If Range("B2").Value = "" Then
        intNextRow = 2
    Else
        intNextRow = Cells(Rows.Count, "B").End(xlUp).Row + 1
    End If
    
    ' Loop through the array and write to worksheet any words that start with a '#'
    ' and end with '2021'
    For i = LBound(arrText) To UBound(arrText)
        If Left(arrText(i), 1) = "#" And Right(arrText(i), 4) = "2021" Then
            Cells(intNextRow, "B") = arrText(i)
            intNextRow = intNextRow + 1
        End If
    Next i
            
End Sub
 

whyjaydee

New Member
Joined
Feb 26, 2021
Messages
19
Office Version
  1. 2016
Platform
  1. Windows

ADVERTISEMENT

Run the procedure called sunMain. Hope this helps.

VBA Code:
Public Sub SubMain()
Dim intLastRow As Integer
Dim arrData As Variant
Dim i As Integer
Dim strData As String

On Error GoTo Err_Handler

    ActiveWorkbook.Save
   
    With Range("A:A")
        .WrapText = True
        .ColumnWidth = 30
        .RowHeight = 30
        With .Cells(1, 1)
            .Value = "Raw Data"
            .Interior.Color = RGB(171, 171, 171)
            .Font.Bold = True
        End With
    End With
   
    With Range("B:B")
        .ColumnWidth = 30
        .RowHeight = 30
        .HorizontalAlignment = xlRight
        .ClearContents
        With .Cells(1, 1)
            .Value = "Extract"
            .HorizontalAlignment = xlLeft
            .IndentLevel = 1
            .Interior.Color = RGB(171, 171, 171)
            .Font.Bold = True
        End With
    End With
   
    Range("A2").Select
    ActiveWindow.FreezePanes = True
   
    intLastRow = Cells(Rows.Count, "A").End(xlUp).Row
       
    arrData = Range("A2:A" & intLastRow)
   
    For i = LBound(arrData) To UBound(arrData)
        strData = arrData(i, 1)
        Call subExtractStrings(i, strData)
    Next i

    MsgBox "Finished Extracting Strings", vbInformation, "Confirmation"

Exit_Handler:

    Exit Sub

Err_Handler:

    MsgBox Err.Number & "  " & Err.Description
   
    Resume Exit_Handler

End Sub

Public Sub subExtractStrings(intSourceRow As Integer, ByVal strText As String)
Dim i As Integer
Dim strChars As String
Dim arrText() As String
Dim s As String
Dim intNextRow As Integer

    strText = Replace(strText, "quot", " ")
   
    ' Replace these characters with a space.
    strChars = ":;(){},&"
    For i = 1 To Len(strChars)
        strText = Replace(strText, Mid(strChars, i, 1), " ")
    Next i
   
    ' Replace all double spaces with a single space.
    Do While InStr(1, strText, "  ", vbTextCompare) > 0
        strText = Replace(strText, "  ", " ", 1)
    Loop
           
    ' Populate an array with one word per element
    arrText = Split(strText, " ")
       
    If Range("B2").Value = "" Then
        intNextRow = 2
    Else
        intNextRow = Cells(Rows.Count, "B").End(xlUp).Row + 1
    End If
   
    ' Loop through the array and write to worksheet any words that start with a '#'
    ' and end with '2021'
    For i = LBound(arrText) To UBound(arrText)
        If Left(arrText(i), 1) = "#" And Right(arrText(i), 4) = "2021" Then
            Cells(intNextRow, "B") = arrText(i)
            intNextRow = intNextRow + 1
        End If
    Next i
           
End Sub
Hi Herakles,

This works like a charm.
You are so great!!

This forum is a great channel to get help for those who are struggling like me.
Thank you everyone!
 

Rick Rothstein

MrExcel MVP
Joined
Apr 18, 2011
Messages
36,975
Office Version
  1. 2016
Platform
  1. Windows
Here is another macro that you can consider...
VBA Code:
Sub SplitPound2021TextFromA1ToB1Down()
  Dim N As Long, X As Long, Arr1 As Variant, Arr2 As Variant
  Arr1 = Split([A1], "#")
  For X = 1 To UBound(Arr1)
    If Arr1(X) Like "*2021*" Then
      Arr2 = Split(Arr1(X), 2021)
      N = N + 1
      Cells(N, "B") = "#" & Arr2(0) & 2021
    End If
  Next
End Sub
 

whyjaydee

New Member
Joined
Feb 26, 2021
Messages
19
Office Version
  1. 2016
Platform
  1. Windows
Run the procedure called sunMain. Hope this helps.

VBA Code:
Public Sub SubMain()
Dim intLastRow As Integer
Dim arrData As Variant
Dim i As Integer
Dim strData As String

On Error GoTo Err_Handler

    ActiveWorkbook.Save
   
    With Range("A:A")
        .WrapText = True
        .ColumnWidth = 30
        .RowHeight = 30
        With .Cells(1, 1)
            .Value = "Raw Data"
            .Interior.Color = RGB(171, 171, 171)
            .Font.Bold = True
        End With
    End With
   
    With Range("B:B")
        .ColumnWidth = 30
        .RowHeight = 30
        .HorizontalAlignment = xlRight
        .ClearContents
        With .Cells(1, 1)
            .Value = "Extract"
            .HorizontalAlignment = xlLeft
            .IndentLevel = 1
            .Interior.Color = RGB(171, 171, 171)
            .Font.Bold = True
        End With
    End With
   
    Range("A2").Select
    ActiveWindow.FreezePanes = True
   
    intLastRow = Cells(Rows.Count, "A").End(xlUp).Row
       
    arrData = Range("A2:A" & intLastRow)
   
    For i = LBound(arrData) To UBound(arrData)
        strData = arrData(i, 1)
        Call subExtractStrings(i, strData)
    Next i

    MsgBox "Finished Extracting Strings", vbInformation, "Confirmation"

Exit_Handler:

    Exit Sub

Err_Handler:

    MsgBox Err.Number & "  " & Err.Description
   
    Resume Exit_Handler

End Sub

Public Sub subExtractStrings(intSourceRow As Integer, ByVal strText As String)
Dim i As Integer
Dim strChars As String
Dim arrText() As String
Dim s As String
Dim intNextRow As Integer

    strText = Replace(strText, "quot", " ")
   
    ' Replace these characters with a space.
    strChars = ":;(){},&"
    For i = 1 To Len(strChars)
        strText = Replace(strText, Mid(strChars, i, 1), " ")
    Next i
   
    ' Replace all double spaces with a single space.
    Do While InStr(1, strText, "  ", vbTextCompare) > 0
        strText = Replace(strText, "  ", " ", 1)
    Loop
           
    ' Populate an array with one word per element
    arrText = Split(strText, " ")
       
    If Range("B2").Value = "" Then
        intNextRow = 2
    Else
        intNextRow = Cells(Rows.Count, "B").End(xlUp).Row + 1
    End If
   
    ' Loop through the array and write to worksheet any words that start with a '#'
    ' and end with '2021'
    For i = LBound(arrText) To UBound(arrText)
        If Left(arrText(i), 1) = "#" And Right(arrText(i), 4) = "2021" Then
            Cells(intNextRow, "B") = arrText(i)
            intNextRow = intNextRow + 1
        End If
    Next i
           
End Sub
Hi Herakles,

I noticed that some of the hit was not extracted from the pool of characters:
Example pool is below

This is placed in cell A1:
The #RRT2401502112021 is not found in the result which is in column B
(That is just 1 of the example that is missing, not sure if there are others)

Thank you in advance


{"146865850602110":{"id":"146865850602110","time":1613838298,"author":"Rosemarie Villaflor","text":"#EMAILER1250702202021","highlighted":true,"type":"user"},"146320887323273":{"id":"146320887323273","time":1613761997,"author":"Luisito Noel Magleo","text":"#RRT2106102192021","highlighted":true,"type":"user"},"145852334036795":{"id":"145852334036795","time":1613696580,"author":"Anna Janelle Mationg","text":"#RRT1710002192021","highlighted":true,"type":"user"},"145021177453244":{"id":"145021177453244","time":1613569348,"author":"Princess Anne Beth Banton","text":"#RRT1988802172021\n","highlighted":true,"type":"user"},"144573630831332":{"id":"144573630831332","time":1613495583,"author":"Rachaelle Velasquez","text":"#EMAILER2022602162021\n","highlighted":true,"type":"user"},"144558917499470":{"id":"144558917499470","time":1613493769,"author":"-","text":"Changed Status from Pending Response to Response Received.","highlighted":true,"type":"system"},"143146534307375":{"id":"143146534307375","time":1613285937,"author":"Louis Cristian Espaldon","text":"#RRT1711302142021","highlighted":true,"type":"user"},"142358607719501":{"id":"142358607719501","time":1613168026,"author":"Anthony Ely Longayan","text":"#RRT2054602132021\n","highlighted":true,"type":"user"},"141360887819273":{"id":"141360887819273","time":1613018286,"author":"Frederic Manayon","text":"#RRT2401502112021\n","highlighted":true,"type":"user"},"140714124550616":{"id":"140714124550616","time":1612921171,"author":"Rommel Jaramillo","text":"#RRT2301802102021\n","highlighted":true,"type":"user"},"139124651376230":{"id":"139124651376230","time":1612682225,"author":"Gellie Ruth Catchuela","text":"#RRT941102072021","highlighted":true,"type":"user"},"137716121517083":{"id":"137716121517083","time":1612466935,"author":"Dan Allen De Guzman","text":"Used Knowledge Base article [Pages Operations: Access: Disputed Admin 568236683662374]","highlighted":true,"type":"system"}}​
 

whyjaydee

New Member
Joined
Feb 26, 2021
Messages
19
Office Version
  1. 2016
Platform
  1. Windows
Hi Rick,
Here is another macro that you can consider...
VBA Code:
Sub SplitPound2021TextFromA1ToB1Down()
  Dim N As Long, X As Long, Arr1 As Variant, Arr2 As Variant
  Arr1 = Split([A1], "#")
  For X = 1 To UBound(Arr1)
    If Arr1(X) Like "*2021*" Then
      Arr2 = Split(Arr1(X), 2021)
      N = N + 1
      Cells(N, "B") = "#" & Arr2(0) & 2021
    End If
  Next
End Sub

This works great, however my rows in column A can be up to 5000 or even more, the code only works on cells A1 and A2, it does not continue up to the last filled cell.

Thank you in advance
 

Watch MrExcel Video

Forum statistics

Threads
1,129,574
Messages
5,637,159
Members
416,959
Latest member
Mohzein

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Top