Parse complex, multi record string

TreeDude

New Member
Joined
Oct 1, 2021
Messages
7
Office Version
  1. 365
Platform
  1. Windows
I need to parse complex, variable length strings (from a csv export from a purchasing system when coding is split) where each string may contact one record or multiple records. This example contains three records. The fields may contain 2 to 99 individual records.

Some of the data has been replaced with fake data to preserve business confidentiality:

3 Accounts: 650.22 USD (8.46%) New Store/Facility (180000) US, 6666 - Fake City-99999 - XYZ - Fake City-180000 - CIP-STORES-200 - Whole Body-21020 - Refrigerated Cases, Proj ID: 6666 - Fake City (6666), BU: 99999 - XYZ - Fake City (99999), Account: 180000 - CIP-STORES (180000), Team: 200 - Whole Body (200), Sub-Team: , ProjCode: 21020 - Refrigerated Cases (21020); 3,196.56 USD (41.59%) New Store/Facility (180000) US, 6666 - Fake City-99999 - XYZ - Fake City-180000 - CIP-STORES-160 - Meat-21020 - Refrigerated Cases, Proj ID: 6666 - Fake City (6666), BU: 99999 - XYZ - Fake City (99999), Account: 180000 - CIP-STORES (180000), Team: 160 - Meat (160), Sub-Team: , ProjCode: 21020 - Refrigerated Cases (21020); 3,839.09 USD (49.95%) New Store/Facility (180000) US, 8888 - Fake City-99999 - XYZ - Fake City-180000 - CIP-STORES-180 - Seafood-21020 - Refrigerated Cases, Proj ID: 6666 - Fake City (6666), BU: 99999 - XYZ - Fake City (99999), Account: 180000 - CIP-STORES (180000), Team: 180 - Seafood (180), Sub-Team: , ProjCode: 21020 - Refrigerated Cases (21020)

Records start with number of accounts ("3 Accounts") not repeated in the string, then a numeric amount. Then each significant field (Proj ID, BU, Account, Team, Sub-Team, Project Code) in each record is repeated three to six times - once in parenthesis, once w/ Field Name: 00000 format, and one to four times in "open strings". The end of each record is a semicolon.

I need to extract each significant field ONCE and remove all redundant or unnecessary data. And, of course, multiple records need to be separated into multiple lines. To make it more complicated, these strings have cells (fields) both left and right of them, mostly in good shape from the export. So I have to count the number of actual records in the string and store that to use for loops that copy the data to the left and right down the appropriate number of times.

Non-split records in this field look like this:

9999-66666-180000-120--21020

And are easily parsed using hyphens as a delimiter. So the result of parsing the "split" fields should be 6 fields separated by hyphens, even if the fields are empty, by however many lines are necessary for the number of records.

The solution could be VBA, formulas, or a combination of both. I've been racking my brain but can't really wrap my head around a solution. Thank you in advance.
 

Attachments

  • Example.png
    Example.png
    36.3 KB · Views: 13

Excel Facts

Copy a format multiple times
Select a formatted range. Double-click the Format Painter (left side of Home tab). You can paste formatting multiple times. Esc to stop
you can split that cell to columns using the semocolon as separator, but then you want all the rest of the data nice assigned to a term.
Do you want VBA (a macro) writing everything to a table ?

Map1
ABCDE
13 Accounts: 650.22 USD (8.46%) New Store/Facility (180000) US, 6666 - Fake City-99999 - XYZ - Fake City-180000 - CIP-STORES-200 - Whole Body-21020 - Refrigerated Cases, Proj ID: 6666 - Fake City (6666), BU: 99999 - XYZ - Fake City (99999), Account: 180000 - CIP-STORES (180000), Team: 200 - Whole Body (200), Sub-Team: , ProjCode: 21020 - Refrigerated Cases (21020); 3,196.56 USD (41.59%) New Store/Facility (180000) US, 6666 - Fake City-99999 - XYZ - Fake City-180000 - CIP-STORES-160 - Meat-21020 - Refrigerated Cases, Proj ID: 6666 - Fake City (6666), BU: 99999 - XYZ - Fake City (99999), Account: 180000 - CIP-STORES (180000), Team: 160 - Meat (160), Sub-Team: , ProjCode: 21020 - Refrigerated Cases (21020); 3,839.09 USD (49.95%) New Store/Facility (180000) US, 8888 - Fake City-99999 - XYZ - Fake City-180000 - CIP-STORES-180 - Seafood-21020 - Refrigerated Cases, Proj ID: 6666 - Fake City (6666), BU: 99999 - XYZ - Fake City (99999), Account: 180000 - CIP-STORES (180000), Team: 180 - Seafood (180), Sub-Team: , ProjCode: 21020 - Refrigerated Cases (21020)3 Accounts: 650.22 USD (8.46%) New Store/Facility (180000) US, 6666 - Fake City-99999 - XYZ - Fake City-180000 - CIP-STORES-200 - Whole Body-21020 - Refrigerated Cases, Proj ID: 6666 - Fake City (6666), BU: 99999 - XYZ - Fake City (99999), Account: 180000 - CIP-STORES (180000), Team: 200 - Whole Body (200), Sub-Team: , ProjCode: 21020 - Refrigerated Cases (21020) 3,196.56 USD (41.59%) New Store/Facility (180000) US, 6666 - Fake City-99999 - XYZ - Fake City-180000 - CIP-STORES-160 - Meat-21020 - Refrigerated Cases, Proj ID: 6666 - Fake City (6666), BU: 99999 - XYZ - Fake City (99999), Account: 180000 - CIP-STORES (180000), Team: 160 - Meat (160), Sub-Team: , ProjCode: 21020 - Refrigerated Cases (21020) 3,839.09 USD (49.95%) New Store/Facility (180000) US, 8888 - Fake City-99999 - XYZ - Fake City-180000 - CIP-STORES-180 - Seafood-21020 - Refrigerated Cases, Proj ID: 6666 - Fake City (6666), BU: 99999 - XYZ - Fake City (99999), Account: 180000 - CIP-STORES (180000), Team: 180 - Seafood (180), Sub-Team: , ProjCode: 21020 - Refrigerated Cases (21020)
Blad2
 
Upvote 0
How about
+Fluff 1.xlsm
ABCDEFG
1Proj IDBUAccountTeamSub-TeamProjCode
23 Accounts: 650.22 USD (8.46%) New Store/Facility (180000) US, 6666 - Fake City-99999 - XYZ - Fake City-180000 - CIP-STORES-200 - Whole Body-21020 - Refrigerated Cases, Proj ID: 6666 - Fake City (6666), BU: 99999 - XYZ - Fake City (99999), Account: 180000 - CIP-STORES (180000), Team: 200 - Whole Body (200), Sub-Team: , ProjCode: 21020 - Refrigerated Cases (21020); 3,196.56 USD (41.59%) New Store/Facility (180000) US, 6666 - Fake City-99999 - XYZ - Fake City-180000 - CIP-STORES-160 - Meat-21020 - Refrigerated Cases, Proj ID: 6666 - Fake City (6666), BU: 99999 - XYZ - Fake City (99999), Account: 180000 - CIP-STORES (180000), Team: 160 - Meat (160), Sub-Team: , ProjCode: 21020 - Refrigerated Cases (21020); 3,839.09 USD (49.95%) New Store/Facility (180000) US, 8888 - Fake City-99999 - XYZ - Fake City-180000 - CIP-STORES-180 - Seafood-21020 - Refrigerated Cases, Proj ID: 6666 - Fake City (6666), BU: 99999 - XYZ - Fake City (99999), Account: 180000 - CIP-STORES (180000), Team: 180 - Seafood (180), Sub-Team: , ProjCode: 21020 - Refrigerated Cases (21020)666699999180000200 21020
Main
Cell Formulas
RangeFormula
B2:G2B2=LET(t,FILTERXML("<k><m>"&SUBSTITUTE(SUBSTITUTE($A2,", ",":"),":","</m><m>")&"</m></k>","//m[starts-with(.,'"&B$1&"')]/following::m[1]"),IFERROR(INDEX(LEFT(t,FIND(" ",t)-1),1),""))
 
Upvote 0
Hi,

If your data is always in that format, this should work also:

Book3.xlsx
ABCDEFG
1Proj IDBUAccountTeamSub-TeamProjCode
23 Accounts: 650.22 USD (8.46%) New Store/Facility (180000) US, 6666 - Fake City-99999 - XYZ - Fake City-180000 - CIP-STORES-200 - Whole Body-21020 - Refrigerated Cases, Proj ID: 6666 - Fake City (6666), BU: 99999 - XYZ - Fake City (99999), Account: 180000 - CIP-STORES (180000), Team: 200 - Whole Body (200), Sub-Team: , ProjCode: 21020 - Refrigerated Cases (21020); 3,196.56 USD (41.59%) New Store/Facility (180000) US, 6666 - Fake City-99999 - XYZ - Fake City-180000 - CIP-STORES-160 - Meat-21020 - Refrigerated Cases, Proj ID: 6666 - Fake City (6666), BU: 99999 - XYZ - Fake City (99999), Account: 180000 - CIP-STORES (180000), Team: 160 - Meat (160), Sub-Team: , ProjCode: 21020 - Refrigerated Cases (21020); 3,839.09 USD (49.95%) New Store/Facility (180000) US, 8888 - Fake City-99999 - XYZ - Fake City-180000 - CIP-STORES-180 - Seafood-21020 - Refrigerated Cases, Proj ID: 6666 - Fake City (6666), BU: 99999 - XYZ - Fake City (99999), Account: 180000 - CIP-STORES (180000), Team: 180 - Seafood (180), Sub-Team: , ProjCode: 21020 - Refrigerated Cases (21020)666699999180000200 21020
Sheet950
Cell Formulas
RangeFormula
B2:G2B2=TRIM(LEFT(SUBSTITUTE(SUBSTITUTE(MID($A2,SEARCH(B1,$A2,4)+LEN(B1)+2,LEN($A2)),",","")," ",REPT(" ",LEN($A2)),1),LEN($A2)))
 
Upvote 0
Thank you everyone. Not a full solution, but each post looks like it could be a piece of the solution.

The formulas posted by Fluff and jtakw are a lot more complex than I usually create, and I thought of myself as fairly good in Excel. They each look promising, except that each of them only returns one of three accounts in A2.

BSALV's solution, OTOH, parses A2 into the correct number of columns. Combining these two approaches, with a macro that takes the number of Accounts (handily the first thing in the target cell), adds that number of rows below, parses on semicolons, transposes everything back to the data column starting in the position of the original data cell, parses text to columns using one of those formulas, then copies the constant text in columns to the left and right (the data cell that needs to be parsed is actually in Col. H or something like that) - something like that might work.

jtakw said "If your data is always in that format". Unfortunately, it's not. Certain accounts (not shown in the samples I posted) do not have a Project ID and they don't have an initial delimiter holding the place of the Project ID. They just start off with BU. I'll have to see which of the formulas can be made to compensate for the fact some of the data has Project ID, some doesn't, and there is no delimiter holding its place when it doesn't.

When I get back to my office (off tomorrow for reasons), I will have to dig into this more.
 
Upvote 0
VBA Code:
Sub splitsen()
     For Each c In Range("a1:A10").Cells                        'several such cells, one by one
          If Len(c.Value) > 0 Then
               sp = Split(c.Value, ";")                         'split on ";"
               For j = 0 To UBound(sp)                          'each part
                    Do                                          'infinite loop
                         i1 = InStr(sp(j), ":")                 'search ":"
                         If i1 > 0 Then                         'if found
                              i2 = InStrRev(sp(j), " ", i1)     'find preceding  space
                              sp(j) = WorksheetFunction.Replace(WorksheetFunction.Replace(sp(j), i2, 1, vbLf), i1, 1, "=")     'replace that space with vblf and : with =
                         End If
                    Loop While i1 > 0                           'until last :
                    c.Offset(, 2).Resize(, UBound(sp) + 1).Value = sp     'write parts to worksheet
               Next
          End If
     Next
End Sub

Map5
ABCDE
13 Accounts: 650.22 USD (8.46%) New Store/Facility (180000) US, 6666 - Fake City-99999 - XYZ - Fake City-180000 - CIP-STORES-200 - Whole Body-21020 - Refrigerated Cases, Proj ID: 6666 - Fake City (6666), BU: 99999 - XYZ - Fake City (99999), Account: 180000 - CIP-STORES (180000), Team: 200 - Whole Body (200), Sub-Team: , ProjCode: 21020 - Refrigerated Cases (21020); 3,196.56 USD (41.59%) New Store/Facility (180000) US, 6666 - Fake City-99999 - XYZ - Fake City-180000 - CIP-STORES-160 - Meat-21020 - Refrigerated Cases, Proj ID: 6666 - Fake City (6666), BU: 99999 - XYZ - Fake City (99999), Account: 180000 - CIP-STORES (180000), Team: 160 - Meat (160), Sub-Team: , ProjCode: 21020 - Refrigerated Cases (21020); 3,839.09 USD (49.95%) New Store/Facility (180000) US, 8888 - Fake City-99999 - XYZ - Fake City-180000 - CIP-STORES-180 - Seafood-21020 - Refrigerated Cases, Proj ID: 6666 - Fake City (6666), BU: 99999 - XYZ - Fake City (99999), Account: 180000 - CIP-STORES (180000), Team: 180 - Seafood (180), Sub-Team: , ProjCode: 21020 - Refrigerated Cases (21020)3 Accounts= 650.22 USD (8.46%) New Store/Facility (180000) US, 6666 - Fake City-99999 - XYZ - Fake City-180000 - CIP-STORES-200 - Whole Body-21020 - Refrigerated Cases, Proj ID= 6666 - Fake City (6666), BU= 99999 - XYZ - Fake City (99999), Account= 180000 - CIP-STORES (180000), Team= 200 - Whole Body (200), Sub-Team= , ProjCode= 21020 - Refrigerated Cases (21020) 3,196.56 USD (41.59%) New Store/Facility (180000) US, 6666 - Fake City-99999 - XYZ - Fake City-180000 - CIP-STORES-160 - Meat-21020 - Refrigerated Cases, Proj ID= 6666 - Fake City (6666), BU= 99999 - XYZ - Fake City (99999), Account= 180000 - CIP-STORES (180000), Team= 160 - Meat (160), Sub-Team= , ProjCode= 21020 - Refrigerated Cases (21020) 3,839.09 USD (49.95%) New Store/Facility (180000) US, 8888 - Fake City-99999 - XYZ - Fake City-180000 - CIP-STORES-180 - Seafood-21020 - Refrigerated Cases, Proj ID= 6666 - Fake City (6666), BU= 99999 - XYZ - Fake City (99999), Account= 180000 - CIP-STORES (180000), Team= 180 - Seafood (180), Sub-Team= , ProjCode= 21020 - Refrigerated Cases (21020)
Blad1
 
Upvote 0
except that each of them only returns one of three accounts in A2.
I thought that was what you wanted.
Can you post an exampled of exactly what you want from that data.
 
Upvote 0

Forum statistics

Threads
1,214,827
Messages
6,121,818
Members
449,049
Latest member
cybersurfer5000

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top