VBA to extract only part of text from a cells (parse raw output)

smide

Board Regular
Joined
Dec 20, 2015
Messages
162
Office Version
  1. 2016
Platform
  1. Windows
Hello.
I need some assistance with an excel spreadsheet which contains very raw output report in row 1 (Sheet2) columns (A-NO).
I'm not very proficient with excel-vba but I'm open to suggestions and willing to learn to get this to work properly.

As I already stated in row1 in Sheet2 I have a raw report output which contains certain Product prices.
Each product is presented with four characteristics: unique Product's id and three prices (shop, factory and market).

The problem is that I need to find cells that contain this data about product's id and prices, then to extract only numbers from those cells without unnecessary characters.

There are certain rules in raw output report that deserves attention:

1a) The cells which contain product's prices (all cells are in row 1, Sheet2 as I said) always follow the next pattern:

Example.

Sheet2

cell (1,D) productne:{"shop":15.80
cell (1,E) fact:10.40
cell (1,F) mark:14.20}

* so, cells with this characteristics/special characters/brackets are always consecutive cells in row 1 (Sheet2) - I need to extract only number from them (15.80 10.40 14.20)

1b) Sometimes "dummy" cells appears with contents: storTotal:{"shop":{"totals":75 or cells with contents: shop:19.30 are also "dummy" cells - I do not need number from this type of cells, skip them (!!)


Summary: when it comes to product's prices extract ONLY numbers from cells with productne:{"shop": content and only they can be considered as the first relevant and then I need (ALWAYS) also number(s) from following two cells (fact: and mark:)


2) Product's id

- There are four type of cells in total which contain product's id characteristics (again of course all cells are somewhere in row 1, Sheet2):

Product id cell types:

2a) events:[{"id":45 - I need only number 45 from this type of cell

2b) {"id":626702572 - I need only number 626702572

but there are also "dummy" cells with similar "id" characteristics:

2c) account:[{"id":2370

or

2d) annual:[{"id":3460

I do not need id from these cells (!!) (from 2c) and/or 2d) ) - skip them (!)


3) "Between" rule

Obviously trio (shop-fact-mark eg. those three cells) are always somewhere between two cells with 2a) or 2b) contents (between two product id's).
Valid Product id is always the first left cell, left to the shop-fact-mark trio.

Important: Sometimes between two cells with id characteristics (two id's) there are no this required cell with productne:{"shop": content.
In that case the whole row after id stays empty (see example below).

ABCDEFGHIJKLMNOPQRSTU
1events:[{"id":status:"I"events:[{"id":7650starts:maxTotal:50{"id":976shop:19.30productne:{"shop":15.80fact:10.40
mark:14.20}
maxSpread:5000.0shop:35.23fact:12{"id":3333account:[{"id":2370cutoff:{"id":45700
productne:{"shop":77.88
fact:72.11
mark:75.80}
....

<tbody>
</tbody>

* I bolded only valid cells, please notice how I skipped "dummy" cells/data in G1,L1,M1 and O1.
Also, there are no required productne:{"shop": cell after 7650 and 3333 Product id's so the rest of that row(s) is empty.


I want to run my code from Sheet1 and results (according to above example) should be in Sheet1 cells J3:M3.

JKLM
1
2idshopfactmark
37650
497615.8010.4014.20
53333
64570077.8872.1175.80
7

<tbody>
</tbody>

(apologize for the long post)
 
Last edited:

Excel Facts

Copy formula down without changing references
If you have =SUM(F2:F49) in F50; type Alt+' in F51 to copy =SUM(F2:F49) to F51, leaving the formula in edit mode. Change SUM to COUNT.
I think it would be more understandable if you made a much simpler example but...

events:[{"id":status:"I"events:[{"id":7650starts:maxTotal:50{"id":976shop:19.30productne:{"shop":15.80fact:10.40mark:14.20}
rules
find a cell with a ":" followed by 4 numerals
then find a group of 3 adjacent cells that start productne, fact,mark
query
why is events:[{"id":7650 not considered to match with the last 3 cells in the row ?

<colgroup><col><col><col><col><col><col><col><col><col><col><col></colgroup><tbody>
</tbody>
 
Upvote 0
Regarding my question I have now two separate codes.

One extract product's id and other products prices but as I said they are running separately and beacuse of that my prices output are often in the the wrong row (next to the wrong id).
(In the case that between two id's there are no those valid price cells).

Extract id's:

Code:
Dim C As Long, X As Long, Data As Variant, Result As Variant
  Data = Sheets("Sheet2").Range("A1", Sheets("Sheet2").Cells(1, Columns.Count).End(xlToLeft))
  ReDim Result(1 To UBound(Data, 2), 1 To 1)
  For C = 1 To UBound(Data, 2)
    If Data(1, C) Like "*""[Ii][Dd]"":#*" And _
       Left(LCase(Data(1, C)), 14) <> "annual:[{""id"":" And _
       Left(LCase(Data(1, C)), 15) <> "account:[{""id"":" Then
      X = X + 1
      Result(X, 1) = Mid(Data(1, C), InStrRev(Data(1, C), ":") + 1)
    End If
  Next
  Sheets("Sheet1").Range("J3").Resize(UBound(Result)) = Result  End Sub


Extract prices:

Code:
Sub Sredi()

Dim a As Variant, b As Variant
  Dim i As Long, k As Long, pos As Long
  
  With Sheets("Sheet2")
    a = .Range("A1", .Cells(1, .Columns.Count).End(xlToLeft)).Value
  End With
  ReDim b(1 To UBound(a, 2), 1 To 3)
  For i = 1 To UBound(a, 2) - 2
    pos = InStr(1, a(1, i), "{""shop"":", vbTextCompare)
    If pos > 0 And IsNumeric(Mid(a(1, i), pos + 8)) Then
      k = k + 1
      b(k, 1) = Val(Mid(a(1, i), pos + 8))
      b(k, 2) = Val(Mid(a(1, i + 1), InStr(1, a(1, i + 1), ":") + 1))
      b(k, 3) = Val(Mid(a(1, i + 2), InStr(1, a(1, i + 2), ":") + 1))
    End If
  Next i
  If k > 0 Then
    Sheets("Sheet1").Range("K3:M3").Resize(k).Value = b
  End IfEnd Sub
I'll appreciate if someone help me to concatenate somehow this two codes into single one which will return result according to my original request.
 
Last edited:
Upvote 0
can you answer my query


oldbrewer said:
why is events:[{"id":7650 not considered to match with the last 3 cells in the row ?

<tbody>

Sorry, didn't noticed your question.

</tbody>
There is no need to match events:[{"id":7650 with last 3 cells in row1 (Sheet2).
In the output table (Sheet1) from my example the row which contains 7650 is empty because there is no valid prices cells between that id (7650) and next id (id from row1 (Sheet2)).
 
Upvote 0
so we look for a cell containing "id: providing it is followed by a cell containing shop: then we retrieve the id cell and the 3 cells after shop ?



<tbody>
</tbody>
 
Upvote 0

Forum statistics

Threads
1,215,924
Messages
6,127,725
Members
449,401
Latest member
TTXS

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top