Extract first numbers from right in variable layout alphanumeric text

agrospud

New Member
Joined
Apr 27, 2016
Messages
9
Office Version
  1. 365
Platform
  1. Windows
I have a variation on the problem of separating numbers from alphanumeric text that I have not managed to solve myself or find a solution for anywhere on the internet, so it’s time to seek help.
I am regularly supplied with a price list that is frequently over 1000 rows long with a cell in each row containing a product description. This description is of variable length and contains a mixture of words and numbers. The final numbers are the product quantity and are nearly always followed by the unit, such as ML, L KG or T. There is occasionally a further descriptive word following this, or sometimes a star symbol (*). The product quantity varies in size and can be a decimal value.
The bit that makes things tricky is that there is often a number within the product description itself that is often not relevant to the quantity.
The only things that are consistent is that the target number is the first number reading from right to left and there is always a space before it and the next item to the left, either text or another number. Also, there may or may not be a space between the target number and its following unit.
What I want to be able to do is extract the quantity into one column and the unit into another column. I have been focusing on formulas rather than VBA as my abilities with VBA are basic.
One formula I tried without success to modify is one this forum and was supplied by István Hirsch. The formula he supplied to a poster seeking to extract decimal values was :
=LOOKUP(10^8,--MID(B2,MIN(FIND({0,1,2,3,4,5,6,7,8,9},B2&"0123456789")),{1,2,3,4,5,6,7,8}))
Unfortunately this only extracts the first numbers from the left whereas I want the numbers from the right.
I am using Windows 8.1 and Excel 2010.
A data sample and what I would like to extract is shown below:
DESCRIPTION
SIZE
UNIT
PRODUCTA 100 500G
500
G
PRODUCTB 100 500 G
500
G
PRODUCTC 500 G
500
G
PRODUCTC 500 G BOTTLE
500
G
PRODUCTD 250 WP 2.5KG
2.5
KG
PRODUCTD 250 WP 2.5 KG
2.5
KG
PRODUCTE 400DF 3 KG *
3
KG
PRODUCTG 11-12-19 25 KG
25
KG
PRODUCTH PRODUCTHH 10KG
10
KG
PRODUCTH PRODUCTHH 10 KG
10
KG
PRODUCTH PRODUCTHH PRODUCTHHH 10 KG
10
KG
PRODUCTJ 45.55KG
45.55
KG
PRODUCTJ 45.55 KG *
45.55
KG
PRODUCTK 5 5L
5
L
PRODUCTK 5 5 L *
5
L
PRODUCTL-15 5 L
5
L
PRODUCTM 1000 500ML
500
ML
PRODUCTM 1000 500 ML
500
ML
PRODUCTN + PRODUCTO 1T
1
T
PRODUCTN + PRODUCTO 1.5T *
1.5
T
PRODUCTN + PRODUCTO 1.5 T *
1.5
T
PRODUCTN + PRODUCTO 1.5 T BAG
1.5
T

<tbody>
</tbody>
 

Excel Facts

Repeat Last Command
Pressing F4 adds dollar signs when editing a formula. When not editing, F4 repeats last command.
Give this a try:
Excel Workbook
ABCD
1DESCRIPTIONSIZEUNIT
2PRODUCTA 100 500G500G
3PRODUCTB 100 500 G500G
4PRODUCTC 500 G500G
5PRODUCTC 500 G BOTTLE500G
6PRODUCTD 250 WP 25 KG25KG
7PRODUCTD 250 WP 2.5 KG2.5KG
8PRODUCTE 400DF 3 KG *3KG
9PRODUCTG 11-12-19 25 KG25KG
10PRODUCTH PRODUCTHH 10KG10KG
11PRODUCTH PRODUCTHH 10 KG10KG
12PRODUCTH PRODUCTHH PRODUCTHHH 10 KG10KG
131P 45 kg 34 T34T
14PRODUCTJ 45.55 KG *45.55KG
15PRODUCTK 5 5L5L
16PRODUCTK 5 5 L *5L
17PRODUCTL-15 5 L5L
18PRODUCTM 1000 500ML500ML
19PRODUCTM 1000 500 ML500ML
20PRODUCTN + PRODUCTO 1T1T
21PRODUCTN + PRODUCTO 1.5T *1.5T
22PRODUCTN + PRODUCTO 1.5 T *1.5T
23PRODUCTN + PRODUCTO 1.5 T BAG1.5T
Sheet
Excel Workbook
ABCD
1DESCRIPTIONSIZEUNIT
2PRODUCTA 100 500G500G
3PRODUCTB 100 500 G500G
4PRODUCTC 500 G500G
5PRODUCTC 500 G BOTTLE500G
6PRODUCTD 250 WP 25 KG25KG
7PRODUCTD 250 WP 2.5 KG2.5KG
8PRODUCTE 400DF 3 KG *3KG
9PRODUCTG 11-12-19 25 KG25KG
10PRODUCTH PRODUCTHH 10KG10KG
11PRODUCTH PRODUCTHH 10 KG10KG
12PRODUCTH PRODUCTHH PRODUCTHHH 10 KG10KG
131P 45 kg 34 T34T
14PRODUCTJ 45.55 KG *45.55KG
15PRODUCTK 5 5L5L
16PRODUCTK 5 5 L *5L
17PRODUCTL-15 5 L5L
18PRODUCTM 1000 500ML500ML
19PRODUCTM 1000 500 ML500ML
20PRODUCTN + PRODUCTO 1T1T
21PRODUCTN + PRODUCTO 1.5T *1.5T
22PRODUCTN + PRODUCTO 1.5 T *1.5T
23PRODUCTN + PRODUCTO 1.5 T BAG1.5T
Sheet
 
Upvote 0
This works perfectly. Thank you very much. I will now try and figure out how it works!
 
Upvote 0
Or,

1] In "Size" B2, formula copy down :,

=LOOKUP(9^9,--RIGHT(LEFT(A2,FIND(C2&" ",A2&" ")-1),ROW($1:$16)))

2] In "Unit" C2, formula copy down :,

=LOOKUP(1,-FIND({"G";"KG";"T";"L";"ML"}&" ",A2&" "),{"G";"KG";"T";"L";"ML"})

Regards
 
Upvote 0
Or,

1] In "Size" B2, formula copy down :,

=LOOKUP(9^9,--RIGHT(LEFT(A2,FIND(C2&" ",A2&" ")-1),ROW($1:$16)))

2] In "Unit" C2, formula copy down :,

=LOOKUP(1,-FIND({"G";"KG";"T";"L";"ML"}&" ",A2&" "),{"G";"KG";"T";"L";"ML"})

Regards
That fails with, for example, these texts

PRODUCT ABC 25 G

PRODUCTL 5KG *
 
Upvote 0
Peter is correct. bosco_yip's formula works on the examples I posted but slight variations can cause errors. I had noticed this also when using the formulas on my real life data. I am unable to pick the reasons for the seemingly random failures that occur.
 
Upvote 0
I am unable to pick the reasons for the seemingly random failures that occur.
The reason is that the formula looks for the occurrences of the particular unit abbreviations followed by a space. Depending on the order those things occur in the string, the wrong abbreviation can found. For example, "T" & "L" are two of the abbreviations of interest but that formula finds these, which are not the correct ones.
PRODUCT ABC 25 G

PRODUCTL 5KG *
That then has a flow-on effect to the other formula causing it to return an error.


Note that Istvan's formula can also return incorrect results but I suspect that less likely, and can be fixed if needed. For example, with the sample data as shown & Istvan's formulas in place, now insert about 15 new rows at the top of the sheet and note the formula results.
 
Last edited:
Upvote 0
Thanks for the explanation Peter re bosco_yip's formula. However I don't follow what you mean about Istvan's formula. I could not see any problem when adding new rows to the top of the list?
 
Upvote 0
Peter is right. If you are going to insert new rows between the existing rows then only these formulae give correct results:
B2:
=RIGHT(SUBSTITUTE(LEFT(A2,LOOKUP(1,-MID(A2,ROW(INDEX(A:A,1):INDEX(A:A,LEN(A2))),1),ROW(INDEX(A:A,1):INDEX(A:A,LEN(A2)))))," ",REPT(" ",10)),10)+0
C2:
=TRIM(LEFT(SUBSTITUTE(TRIM(REPLACE(A2,1,LOOKUP(1,-MID(A2,ROW(INDEX(A:A,1):INDEX(A:A,LEN(A2))),1),ROW(INDEX(A:A,1):INDEX(A:A,LEN(A2)))),""))," ",REPT(" ",4)),4))
 
Upvote 0
Sorry Istvan but I don't follow. The formula works horizontally within the row so why does inserting new rows cause a problem?
 
Upvote 0

Forum statistics

Threads
1,215,068
Messages
6,122,950
Members
449,095
Latest member
nmaske

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top