VBA - Split the data

Mallesh23

Well-known Member
Joined
Feb 4, 2009
Messages
976
Office Version
  1. 2010
Platform
  1. Windows
Hi Team,

Column A is a Consolidated Invoices of different Country's Company. different style of drafting.

Now task is extract Invoices separately. Expected Output is in Column B.

I am Unable to split the data as Column A data is not a standard one.

Looking for a Macro. Thanks.

Sample dummy data

Book1
AB
1InvoiceExpected Invoice
25300000798-PIF-PR5300000798
37350001120 RW BAL7350001120
45100000182-Bal-205100000182
5D-PR-10967-315000492315000492
6F-W-10713-3150004797
72333-RW-4400000176-P4400000176
862001105926200110592
97800000292-8451-P7800000292
10635011186 - 10 - p635011186
11D-ED-10566-315000470315000470
12F-W-10488-31500047003150004700
135650000669/W&D/DEP5650000669
145650000671/W&D/DEP5650000671
155650000606 fnl pmt S5650000606
165750000669-bal-355750000669
17refu25188/60000005026000000502
189161/4750000388/vs/p4750000388
19F-ED-9861-31500044043150004404
206550000757fp106550000757
219202/4750000419/VS/P4750000419
227250000387RWDEP7250000387
239619/4750000712/RW/C4750000712
24F-W-10455-31500046823150004682
25D-W-11009-31500049413150004941
2661001540222366-10-P6.10015E+13
2762000007806200000780
28F-RW-10212-0004571
29P25132/6000000466 RW6000000466
302365-ED-4400000196-D4400000196
3124903 RW/PP (ML)
326550000363dp106550000363
3354926931649657885492693164965788
346350000532 - 10 - D6350000532
356550000882dp306550000882
36D/P/5050000791-167625050000791
37f-sr-3150003955-88533150003955
386150000163/Refunded Dep/Cancelled Contract/106150000163
39AX4150000361841500003618
404150000330 - D
4124318 PR/PO
424300001631 - 10 - P4300001631
43DP-PR-#2394-281
Sheet1


Thanks
mg
 
Hi Peter/Rick

Thank you so much for giving multiplt ways of doing task.

I have similar Task2 , here I want to extract two digit no from a string.
as per Criteria list in Column D.

Find Which Product code has come in invoice, from Criteria list.
and Seperate it into Column D.

Column- A Input Data
Column - B Expected Output.
Column - D Criteria.

Below data with output

Book1 (version 2).xlsb
ABCDE
1InvoicesProduct codeProduct code CriteriaMeaning
27350002095-17661-10-bal1010Laptop
37550001171 30 REF3030Desktop
45500001065-10-DEP-176421035desktop
55100001456-10Dep-100101020Tab
65100001662-10PIF-994510
75100001577-35PIF-983135
86450000141-35-PIF35
94300122304 - 20 - P20
105650000619/15334 F W-No two digit here or Product code not found
11-no data leave blank/dash either
125010001010dp3535
135010000955fp1010
145010000016fp1010
155010001013dp1010
165010001014dp10/2010/or20First one /Either of it/ blank
Sheet2



Thanks
mg
 
Upvote 0

Excel Facts

Why are there 1,048,576 rows in Excel?
The Excel team increased the size of the grid in 2007. There are 2^20 rows and 2^14 columns for a total of 17 billion cells.
I have assumed large data again & no need to be dynamic.
Also assumed that product codes do not start with a zero. If they can, then a modification can be made.

See if this does what you want. Per your notes on the last example, this will return the first 2-digit set.

VBA Code:
Sub Get_Product_Code()
  Dim RX As Object
  Dim a As Variant, b As Variant
  Dim i As Long
 
  Set RX = CreateObject("VBScript.RegExp")
  RX.Global = True
  RX.Pattern = "(^|\D)(\d\d)(\D|$)"
  a = Range("A2", Range("A" & Rows.Count).End(xlUp)).Value
  ReDim b(1 To UBound(a), 1 To 1)
  For i = 1 To UBound(a)
    If RX.Test(a(i, 1)) Then b(i, 1) = RX.Execute(a(i, 1))(0).submatches(1)
  Next i
  Range("B2").Resize(UBound(a)).Value = b
End Sub
 
Upvote 0
Oops, didn't take account of "No two digit here or Product code not found"

This version should allow for that (return blank). If there are more than one product codes found then this will return the last one, but that still fits with your requirement I believe.

VBA Code:
Sub Get_Product_Code_v2()
  Dim RX As Object
  Dim a As Variant, b As Variant, itm As Variant
  Dim i As Long
  Dim ProdCodes As String
  
  Set RX = CreateObject("VBScript.RegExp")
  RX.Global = True
  RX.Pattern = "(^|\D)(\d\d)(?=\D|$)"
  ProdCodes = "|" & Join(Application.Transpose(Range("D2", Range("D" & Rows.Count).End(xlUp))), "|") & "|"
  a = Range("A2", Range("A" & Rows.Count).End(xlUp)).Value
  ReDim b(1 To UBound(a), 1 To 1)
  For i = 1 To UBound(a)
    For Each itm In RX.Execute(a(i, 1))
      If InStr(1, ProdCodes, "|" & itm.submatches(1) & "|") > 0 Then b(i, 1) = itm.submatches(1)
    Next itm
  Next i
  Range("B2").Resize(UBound(a)).Value = b
End Sub

Mallesh23 2020-08-13 1.xlsm
ABCDE
1InvoicesProduct codeProduct code CriteriaMeaning
27350002095-17661-10-bal1010Laptop
37550001171 30 REF3030Desktop
45500001065-10-DEP-176421035desktop
55100001456-10Dep-100101020Tab
65100001662-10PIF-994510
75100001577-35PIF-983135
86450000141-35-PIF35
94300122304 - 20 - P20
105650000619/15334 F WNo two digit here or Product code not found
11no data leave blank/dash either
125010001010dp3535
135010000955fp1010
145010000016fp1010
155010001013dp1010
165010001014dp10/2020First one /Either of it/ blank
175010000016fp15No product code found
185010000016fp20/1620multiple 2-digits but one is not product code
195010000016fp16/20/17/1010multiple 2-digits but some not product codes
Sheet2
 
Upvote 0
Hi Peter

Really amazing Thanks once again, I am excited to learn Regular experssion now.

Please can you comment below line to understand what it is doing.

RX.Pattern = "(^|\D)(\d\d)(?=\D|$)"
If InStr(1, ProdCodes, "|" & itm.submatches(1) & "|") > 0 Then b(i, 1) = itm.submatches(1)


Also One more Challenge I have here, If I want to extract text(abbreviation) as Criteria list what will be the code.
Similar task but Criteria is now Text


Column- A Input Data
Column - B Expected Output.
Column - D Criteria.


Below is Data Table,expected output and Criteria Column.

Book1 (version 2).xlsb
ABCDE
1InvoicesProduct abbreviationProduct AbbreviationMeaning
27350002095-17661-10-mosMOSmosMouse
37550001171 30 hedHEDhedheadphone
45500001065-10-Pen-17642PenPenPendrive
55100001456-10Pen-10010penRTRouter
65100001662-10Pen-9945penhHard DISK
75100001577-35Pen-9831penmtMotherboard
86450000141-35-RTRTSRScreen Guard
94300122304 - 20 - HHOTGOTG Pendrive
105650000619/15334 F DDDDesktop
11KBKeyboard
125010001010MT35MT
135010000955mt10MT
145010000016SR10SR
155010001013OTG10OTG
165010001014KB10/20KB
Sheet2


Thanks
mg
 
Upvote 0
Please can you comment below line to understand what it is doing.

RX.Pattern = "(^|\D)(\d\d)(?=\D|$)"
The pattern is
The start of the string in the cell (^) or (|) not a digit (\D)
Followed by two digits (\d\d)
Then look ahead to what comes next (?=) either a non-digit (\D) or (|) the end of the text ($)

Please can you comment below line to understand what it is doing.

If InStr(1, ProdCodes, "|" & itm.submatches(1) & "|") > 0 Then b(i, 1) = itm.submatches(1)
Look at each pair of digits found (the blue part above) in the list of product codes to see if it is there. All codes separated by "|" to ensure the exact code is found in case any product codes in column D had more than 2-digits
 
Upvote 0
If I want to extract text(abbreviation) as Criteria list what will be the code.
Try

VBA Code:
Sub Get_Product_Abbrev()
  Dim RX As Object
  Dim a As Variant, b As Variant
  Dim i As Long
  
  Set RX = CreateObject("VBScript.RegExp")
  RX.Global = True
  RX.IgnoreCase = True
  RX.Pattern = "(^|[^a-z])(" & Join(Application.Transpose(Range("D2", Range("D" & Rows.Count).End(xlUp))), "|") & ")([^a-z]|$)"
  a = Range("A2", Range("A" & Rows.Count).End(xlUp)).Value
  ReDim b(1 To UBound(a), 1 To 1)
  For i = 1 To UBound(a)
    If RX.Test(a(i, 1)) Then b(i, 1) = RX.Execute(a(i, 1))(0).submatches(1)
  Next i
  Range("B2").Resize(UBound(a)).Value = b
End Sub
 
Upvote 0
Hi Peter

Great !! This also worked as expected. ?

So we have seperated product code, product abbreviation.
Last question can you help me in extracting Full Account No using above same approach.

Column A is data
Column B is expected data.

Criteria to pick Full Account No is Column D. Account No should begin with specified digit. else blank

Below is a table with expected output and Criteria

Book1 (version 2).xlsb
ABCD
1InvoicesProduct abbreviationCriteria Account Code
27350002095-17661-10-mos73500020957350
3P25125/6000000453 RW6000000453600
4P/W/1656616566165
54500000535-10-P4500000535450
6D-W-10983-315000492710983109
7FP-PR-#11746-31111746117
8F-W-10713-31500047973150004797315
9refu25188/600000050225188251
10F-SID-10306-3150004631500046315
11
129618/475017300/RW/D475017300475
13F-W-13158-31500047003150004700315
Sheet2



Thanks
mg ?
 
Upvote 0
Could there really be blank cells in the column D criteria list or did that only happen because you decided later to insert a blank row in your sample data?
 
Upvote 0
Further question. I didn't notice that the values in column B match row-by-row the values in column D. Is that how it will work?
That is, would cell B2 below be blank because there is not a "109*" number in A2 or will B2 still be 7350002095 because 7350 is a prefix listed somewhere in column D?

Mallesh23 2020-08-13 1.xlsm
ABCD
1InvoicesProduct abbreviationCriteria Account Code
27350002095-17661-10-mos109
37350
Sheet5)
 
Upvote 0
Hi Peter,

OMG, , my mistake , Typo Error.
1) Criteria list is not row by row. bring output first Match as per Criteria list.
2) Reg 7350, usually account Criteria is 3 digit. if future change in Criteria list it should accomodate

Criteria will be Inner loop. assume there are no duplicates in account code, and always 3 digit. Thanks


Thanks
mg
 
Upvote 0

Forum statistics

Threads
1,216,117
Messages
6,128,935
Members
449,480
Latest member
yesitisasport

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top