Parse column and extract matching string

hoyonono

New Member
Joined
Dec 9, 2014
Messages
3
Hi everybody,

I got programming skills, but never had to deal with VBA. I can read it but I can’t do it from scratch.
I tried a lot of research but failed to find something that was matching my problem.
I understand that my problem must be a classic use case but I couldn’t find the right way to search about it.

My problem:
I got a scanned invoice gone through an OCR, giving me 1200rows for ~500 “codes”.
I need to extract all the “code” in the text of column A.
I got the A column, I want to obtain B column.

[TABLE="width: 611"]
<tbody>[TR]
[TD][/TD]
[TD]A[/TD]
[TD]B[/TD]
[/TR]
[TR]
[TD]1[/TD]
[TD]0[/TD]
[TD]TER1234567890[/TD]
[/TR]
[TR]
[TD]2[/TD]
[TD][/TD]
[TD]TER1234567891[/TD]
[/TR]
[TR]
[TD]3[/TD]
[TD]sdf sdfsdfsdfsdf[/TD]
[TD]RS1234567890[/TD]
[/TR]
[TR]
[TD]4[/TD]
[TD]sdf5454sdf TER1234567890 sdfjklslkdfjkj[/TD]
[TD]RS1234567891[/TD]
[/TR]
[TR]
[TD]5[/TD]
[TD]sdkljfklsjdklfj TER1234567891[/TD]
[TD]TER1234567899[/TD]
[/TR]
[TR]
[TD]6[/TD]
[TD]RS1234567890[/TD]
[TD]…[/TD]
[/TR]
[TR]
[TD]7[/TD]
[TD][/TD]
[TD][/TD]
[/TR]
[TR]
[TD]8[/TD]
[TD]dfgdfgdfgdfgdfg[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]9[/TD]
[TD]RS1234567891 TER1234567899 sdfgdfgdf[/TD]
[TD][/TD]
[/TR]
[TR]
[TD]10[/TD]
[TD]0[/TD]
[TD][/TD]
[/TR]
[TR]
[TD][/TD]
[TD]…[/TD]
[TD][/TD]
[/TR]
</tbody>[/TABLE]

link to xls file : http://wikisend.com/download/803406/sample_****ty_data.xlsx

Every A cells :
· Got 0, 1 or more “code”
· Got a space “ “ before the “code” or is the first string in the cell
· Got a space “ “ after the “code” or is the last string in the cell

I can define all “codes” like:
· “ TER?????????? ”
· “ RS??????????? “
· …
· Always 2 or 3 letters + 10 numbers



I can define B like:
FOR EACH cells in A column
IF text in cell contains “TER??????????“ or “RS??????????“ or …
Extract the “code” to first empty cell in B column (if there is 2 or more “code” in the A cell, I need to extract each code in 1 row in B column)
ELSE nothing


How can I extract the B column from A ?
All hints are welcomed (even like: ‘goat, go search that : “problem expressed clearly for coherent search results” )!

Thanks for your attention excel masters,
Hoyoyonono, excel goat.
 
Last edited:

Excel Facts

Repeat Last Command
Pressing F4 adds dollar signs when editing a formula. When not editing, F4 repeats last command.
I dont know, something like :

Code:
bool no_more_code;
int j = 0;

FOR EACH cell in A column
    no_more_code = false;
    WHILE (no_more_code = false)
        IF string match regex and not in B
            Extract code in Bj
            j++
        ELSE
            no_more_code = true;
END FOR EACH
 
Last edited:
Upvote 0
Will the 10 digits in your codes always be the only 10-digit numbers within the text that is in each cell?
 
Upvote 0
hoyonono,

Welcome to the MrExcel forum.

1. What version of Excel and Windows are you using?

2. Are you using a PC or a Mac?


The following macro is based on your displayed text strings .


Sample raw data in worksheet Feuil1:


Excel 2007
AB
10
2
3sdf sdfsdfsdfsdf
4sdf5454sdf TER1234567890 sdfjklslkdfjkj
5sdkljfklsjdklfj TER1234567891
6RS1234567890
7
8dfgdfgdfgdfgdfg
9RS1234567891 TER1234567890 sdfgdfgdf
100
11
Feuil1


After the macro:


Excel 2007
AB
10TER1234567890
2TER1234567891
3sdf sdfsdfsdfsdfRS1234567890
4sdf5454sdf TER1234567890 sdfjklslkdfjkjRS1234567891
5sdkljfklsjdklfj TER1234567891TER1234567890
6RS1234567890
7
8dfgdfgdfgdfgdfg
9RS1234567891 TER1234567890 sdfgdfgdf
100
11
Feuil1


Please TEST this FIRST in a COPY of your workbook (always make a backup copy before trying new code, you never know what you might lose).

1. Copy the below code
2. Open your NEW workbook
3. Press the keys ALT + F11 to open the Visual Basic Editor
4. Press the keys ALT + I to activate the Insert menu
5. Press M to insert a Standard Module
6. Where the cursor is flashing, paste the code
7. Press the keys ALT + Q to exit the Editor, and return to Excel
8. To run the macro from Excel press ALT + F8 to display the Run Macro Dialog. Double Click the macro's name to Run it.

Code:
Sub ExtractStrings()
' hiker95, 12/09/2014, ME823147
Dim c As Range, s, i As Long, nr As Long
Application.ScreenUpdating = False
With Sheets("Feuil1")
  nr = 0
  For Each c In .Range("A1", .Range("A" & Rows.Count).End(xlUp))
    If c <> "" And c <> 0 Then
      If WorksheetFunction.IsText(c) Then
        If InStr(Trim(c), " ") Then
          s = Split(Trim(c), " ")
          For i = LBound(s) To UBound(s)
            If Left(s(i), 2) = "RS" And Len(s(i)) = 12 And IsNumeric(Right(s(i), 10)) Then
              nr = nr + 1
              .Cells(nr, 2) = s(i)
            ElseIf Left(s(i), 3) = "TER" And Len(s(i)) = 13 And IsNumeric(Right(s(i), 10)) Then
              nr = nr + 1
              .Cells(nr, 2) = s(i)
            End If
          Next i
        ElseIf Left(Trim(c), 2) = "RS" And Len(Trim(c)) = 12 And IsNumeric(Right(Trim(c), 10)) Then
          nr = nr + 1
          .Cells(nr, 2) = Trim(c)
        ElseIf Left(Trim(c), 3) = "TER" And Len(Trim(c)) = 13 And IsNumeric(Right(Trim(c), 10)) Then
          nr = nr + 1
          .Cells(nr, 2) = Trim(c)
        End If
      End If
    End If
  Next c
  .Columns(2).AutoFit
End With
Application.ScreenUpdating = True
End Sub

Before you use the macro with Excel 2007 or newer, save your workbook, Save As, a macro enabled workbook with the file extension .xlsm

Then run the ExtractStrings macro.
 
Upvote 0
hoyonono,

IF text in cell contains “TER??????????“ or “RS??????????“ or …

I missed the or ...

Can we see some other examples of the or ...?

Or, can it always be any 2 or 3 uppercase letters with 10 numbers?
 
Upvote 0
hoyonono,

If it can it always be any 2 or 3 uppercase letters with 10 numbers, then:

With the same screenshots as my reply #4:

Please TEST this FIRST in a COPY of your workbook (always make a backup copy before trying new code, you never know what you might lose).

Code:
Sub ExtractStrings_V2()
' hiker95, 12/09/2014, ME823147
Dim c As Range, s, i As Long, nr As Long
Application.ScreenUpdating = False
With Sheets("Feuil1")
  nr = 0
  For Each c In .Range("A1", .Range("A" & Rows.Count).End(xlUp))
    If c <> "" And c <> 0 Then
      If WorksheetFunction.IsText(c) Then
        If InStr(Trim(c), " ") Then
          s = Split(Trim(c), " ")
          For i = LBound(s) To UBound(s)
            If Left(s(i), 2) Like "[A-Z][A-Z]" And Len(s(i)) = 12 And IsNumeric(Right(s(i), 10)) Then
              nr = nr + 1
              .Cells(nr, 2) = s(i)
            ElseIf Left(s(i), 3) Like "[A-Z][A-Z][A-Z]" And Len(s(i)) = 13 And IsNumeric(Right(s(i), 10)) Then
              nr = nr + 1
              .Cells(nr, 2) = s(i)
            End If
          Next i
        ElseIf Left(Trim(c), 2) Like "[A-Z][A-Z]" And Len(Trim(c)) = 12 And IsNumeric(Right(Trim(c), 10)) Then
          nr = nr + 1
          .Cells(nr, 2) = Trim(c)
        ElseIf Left(Trim(c), 3) Like "[A-Z][A-Z][A-Z]" And Len(Trim(c)) = 13 And IsNumeric(Right(Trim(c), 10)) Then
          nr = nr + 1
          .Cells(nr, 2) = Trim(c)
        End If
      End If
    End If
  Next c
  .Columns(2).AutoFit
End With
Application.ScreenUpdating = True
End Sub

Before you use the macro with Excel 2007 or newer, save your workbook, Save As, a macro enabled workbook with the file extension .xlsm

Then run the ExtractStrings_V2 macro.
 
Upvote 0
Will the 10 digits in your codes always be the only 10-digit numbers within the text that is in each cell?
If all 10-digit numbers in the text will only belong to codes, then there would be no need to maintain a list of acceptable codes as this macro should do what you want...
Code:
Sub ExtractCodes()
  Dim R As Long, P As Long, N As Long, LastRow As Long, Space As Long, Combined As Variant, vOut As Variant
  LastRow = Cells(Rows.Count, "A").End(xlUp).Row
  Combined = Range("A1:A" & LastRow)
  ReDim vOut(1 To UBound(Combined), 1 To 1)
  For R = 1 To UBound(Combined)
    For P = 1 To Len(Combined(R, 1))
      If Mid(Combined(R, 1), P, 10) Like "##########" Then
        N = N + 1
        Space = InStrRev(" " & Combined(R, 1), " ", P)
        vOut(N, 1) = Mid(Combined(R, 1), Space, P - Space + 10)
      End If
    Next
  Next
  Range("B1:B" & LastRow) = vOut
End Sub
 
Upvote 0
Yey! That's it :)

Thanks for your fast and accurate answer hiker95 and Rick Rothstein.

hicker95, ExtractStrings_V2 works great, i just needed to make very little change as there was also 2 letter/11numerics and others but its on my side. Your macro is great for the informations i provided :)

Rick Rothstein, working great too :) i just used 1 more step for filter cells strating by "XXX..." , there was a lot of unexpected numbers extracted. But same as hicker95, working perfectly with informations i provided.

Thank you both,
Cya
 
Upvote 0
hoyonono,

Thanks for the feedback.

You are very welcome. Glad we could help.

And, come back anytime.

i just needed to make very little change as there was also 2 letter/11numerics and others but its on my side.

If you can give us more details, and, examples, we can update our macros for you.
 
Upvote 0

Forum statistics

Threads
1,223,099
Messages
6,170,108
Members
452,302
Latest member
TaMere

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top