Extracting unit measures from text strings

emi_david

New Member
Joined
Dec 29, 2015
Messages
7
Hi,

So I'm scraping data from online stores into an excel sheet and often get cells with the following text:

"ACANA ADULT SMALL BREED 2.5KG"

I want 2.5 to be taken out of this cell (A1 for example) and moved into the next column (B1), and the unit kg to be moved in the next one yet (C1). The string "2.5kg" can be written any where in the text string. It can also be written in different ways such as "2.5 kg", "2,5 kg" or "2500g" depending on the store. However, after the string "2.5 kg" has been located and moved to B1 and C1 I can easily remove any and all blank spaces from these columns.

Obviously I cannot use =Right(x) since the number of characters differ and the hunted string can be anywhere in the text made available by the online store.

I need something like:
Unit = ml, l, g, kg, oz
For i = 0 to n
If it says "unit" in the text Then
Take all the characters before the unit until the next letter And
Move the chosen letters to C1 and the chosen numbers to B1
End If
Next i

If possible I'd like a macro for it. I so hope there is a general formula to use rather than adhoc solutions for each store output. Please tell me if you want to see more sample data.

Best regards,
emi_david
 

Excel Facts

What does custom number format of ;;; mean?
Three semi-colons will hide the value in the cell. Although most people use white font instead.
Are they always one of the five: "ml, l, g, kg, oz"?

Are there ever any other numbers, that aren't referencing the measurement? (i.e. SKU numbers or product numbers)
 
Upvote 0
Are they always one of the five: "ml, l, g, kg, oz"?

Are there ever any other numbers, that aren't referencing the measurement? (i.e. SKU numbers or product numbers)

The units that need to be extracted can be any wright or volume unit. I can also think of lb, lbs and fl oz. Of course capitalization differs as well.

Numbers are sometimes part of the product name, or can refer to other properties of the product.

CANAGAN GTC2 S.B F-RUN CHIC dogs over 1 year - DOGS 2KG

In another similar case nutrition information should be extracted in a similar way, so that the split of

"Energi (kcal) 95 kcal, Energi (kJ) 410 kJ, Fett 0 g, Varav mättat fett 0 g, Kolhydrater 24 g, Varav socker 22 g, Protein 0 g, Salt 0.5 g"

is made based on the activators "kcal", "g", and "kJ"
in the same way as described above, but until next "," rather than the next letter.
 
Last edited:
Upvote 0
It's going to be very difficult to automate if there is no logic. Especially if you're saying that:

"Energi (kcal) 95 kcal, Energi (kJ) 410 kJ, Fett 0 g, Varav mättat fett 0 g, Kolhydrater 24 g, Varav socker 22 g, Protein 0 g, Salt 0.5 g"

is all one line that you want to separate. In the case of:

CANAGAN GTC2 S.B F-RUN CHIC dogs over 1 year - DOGS 2KG

In order to know exactly where to get the measurement number from, Excel would have to know either the entire list of possible weight measurements (KG, mL, g, etc.) or know the positioning. Since you say the measurement is not always at the end, the first way might be the easiest. Can you make a list of all the measurement types that occur?

So far we have "ml, l, g, kg, oz, kcal, kJ" -- if you can make an entire list of possible measurement types, we could simply search for one of the measurement types in the cell, and figure out how to get the number next to it.
 
Upvote 0
Here's the example file. Press button "Get Data" to get result.
Source code:
Code:
Sub GetData()
    
    Dim r
    With CreateObject("VBScript.RegExp")
        .IgnoreCase = True: .Pattern = "\s([0-9.]+)([a-z]+)$"
        For r = 1 To 10 'Chane 1 and 10 to the row numbers you need
            With .Execute(Cells(r, 1))(0)
                Cells(r, 2) = .SubMatches(0)
                Cells(r, 3) = .SubMatches(1)
            End With
        Next
        
    End With

End Sub
 
Upvote 0
That file is just 10 lines of "ACANA ADULT SMALL BREED 2.5KG" and it won't show the "Get Data" button for security or compatibility reasons.

???
 
Upvote 0
@svendiamond
So what? I posted original code - use it. Copy string into A1:A10 and run it.
 
Upvote 0
Very cool. Reading up about RegExp right now. My skills are not that advanced yet.
 
Upvote 0
@svendiamond The list of units you ask for can be expanded later but will something like kg, g, oz, fl oz, ml, cl, dl, hg, kcal, kj. How would you code the kind of function you describe?

@Sektor Yes, that is a start. As I understand, you check for numbers followed by letters followed by end without blank spaces.

I would first need to run a code that takes away any blank spaces adjacent to and inside the units (kg, g, oz, fl oz, ml, cl, dl, hg)

I recorded the following to see that this is possible:

Code:
Cells.Replace What:=" g", Replacement:="g", LookAt:=xlPart, SearchOrder _        :=xlByRows, MatchCase:=False, SearchFormat:=False, ReplaceFormat:=False
    Cells.Replace What:=" kg", Replacement:="kg", LookAt:=xlPart, _
        SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
        ReplaceFormat:=False
    Cells.Replace What:=" ml", Replacement:="ml", LookAt:=xlPart, _
        SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
        ReplaceFormat:=False
    Cells.Replace What:="0 l", Replacement:="0l", LookAt:=xlPart, _
        SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
        ReplaceFormat:=False

In order to avoid that words starting with l or g get pulled into the preceding word, I need to make a long string of replacements like "0 g" "1 g" "2 g" etc. Is there a way to write [any number] followed by a blank space and then the letter "g" to replace this with something? Is there a way to write Blank space followed by "l" followed by end?

Then I would run the code successfully for all text that have the measurements in the end. I have tried it on a big file and it works like a charm. Is it possible to take away 2.5KG from the original cell when it is moved to columns B and C?

What about the cases where the unit is not in the end?
I post another list below as an example:

Mexikansk tacopaj Fryst 220g Felix
Köttfärspaj Fryst 300g Felix
Ost & skinkpaj Fryst 220g ICA
Filodegspaj med Champinjon & Kantarell Fryst 220g Felix
 
Last edited:
Upvote 0
Hi

You can make a list of the units you accept and extract all the respective data.

Similar to Sektor's but specifying the units:

Code:
Sub GetData()
    
    Dim regexMatches As Object
    Dim r As Range, rC As Range
    Dim lRow As Long, lMatch As Long
    
    Const sUnits As String = "(kg)|g|(oz)|(fl oz)|(ml)|(cl)|(dl)|(hg)"
    
    Set r = Range("A1:A10")
    With CreateObject("VBScript.RegExp")
        .Pattern = "\b(\d[0-9.,]+)\s*(" & sUnits & ")\b"
        .IgnoreCase = True
        .Global = True
        
        For lRow = 1 To r.Count
            Set regexMatches = .Execute(r(lRow, 1).Value)
            For lMatch = 1 To regexMatches.Count
                With regexMatches(lMatch - 1)
                    r(lRow, 2 * lMatch).Value = .submatches(0)
                    r(lRow, 2 * lMatch + 1).Value = .submatches(1)
                End With
            Next lMatch
        Next
        
    End With

End Sub


This is an example.
Notice that in A5 the 220 is not extracted although it is followed by a "g".



<table border="1" cellpadding="1" style="background:#FFF; border-collapse:collapse;border-width:2px;border-color:#CCCCCC;font-family:Arial,Arial; font-size:10pt" ><tr><th style="border-width:1px;border-color:#888888;background:#9CF " > </th><th style="border-width:1px;border-color:#888888;background:#9CF; text-align:center" >A</th><th style="border-width:1px;border-color:#888888;background:#9CF; text-align:center" width=30 >B</th><th style="border-width:1px;border-color:#888888;background:#9CF; text-align:center" >C</th><th style="border-width:1px;border-color:#888888;background:#9CF; text-align:center" width=30 >D</th><th style="border-width:1px;border-color:#888888;background:#9CF; text-align:center" width=30 >E</th><th style="border-width:1px;border-color:#888888;background:#9CF; text-align:center" width=30 >F</th><th style="border-width:1px;border-color:#888888;background:#9CF; text-align:center" width=30 >G</th><th style="border-width:1px;border-color:#888888;background:#9CF; text-align:center" width=30 >H</th></tr><tr><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;background:#9CF; text-align:center; " ><b>1</b></td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:left;border-width: 1px;border-color:#888888; ">Mexikansk tacopaj Fryst 220g Felix</td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; ">220</td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:left;border-width: 1px;border-color:#888888; ">g</td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td></tr><tr><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;background:#9CF; text-align:center; " ><b>2</b></td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:left;border-width: 1px;border-color:#888888; ">Köttfärspaj Fryst 3,45 ml Felix</td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; ">3,45</td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:left;border-width: 1px;border-color:#888888; ">ml</td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td></tr><tr><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;background:#9CF; text-align:center; " ><b>3</b></td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:left;border-width: 1px;border-color:#888888; ">Ost & skinkpaj Fryst 2.14 fl oz ICA</td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; ">2.14</td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:left;border-width: 1px;border-color:#888888; ">fl oz</td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td></tr><tr><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;background:#9CF; text-align:center; " ><b>4</b></td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:left;border-width: 1px;border-color:#888888; ">Filodegspaj med Champinjon & Kantarell F<br>ryst 22 oz Felix</td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; ">22</td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:left;border-width: 1px;border-color:#888888; ">oz</td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td></tr><tr><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;background:#9CF; text-align:center; " ><b>5</b></td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:left;border-width: 1px;border-color:#888888; ">Skinkpaj Fryst 2.14 fl oz ICA 220 gelix <br>3,14 ml XYZ 12KG PP</td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; ">2.14</td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:left;border-width: 1px;border-color:#888888; ">fl oz</td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; ">3,14</td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:left;border-width: 1px;border-color:#888888; ">ml</td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; ">12</td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:left;border-width: 1px;border-color:#888888; ">KG</td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td></tr><tr><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;background:#9CF; text-align:center; " ><b>6</b></td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td><td style="border-width:1px;border-color:#000000; padding-left:0.5em; padding-top:0.4em; padding-right:0.5em; padding-bottom:0.25em;text-align:right;border-width: 1px;border-color:#888888; "> </td></tr><tr><td colspan=9 style="background:#9CF; padding-left:1em" > [Book1]Sheet4</td></tr></table>
 
Last edited:
Upvote 0

Forum statistics

Threads
1,214,588
Messages
6,120,412
Members
448,959
Latest member
camelliaCase

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top