Extract first numbers from right in variable layout alphanumeric text

agrospud

New Member
Joined
Apr 27, 2016
Messages
9
Office Version
  1. 365
Platform
  1. Windows
I have a variation on the problem of separating numbers from alphanumeric text that I have not managed to solve myself or find a solution for anywhere on the internet, so it’s time to seek help.
I am regularly supplied with a price list that is frequently over 1000 rows long with a cell in each row containing a product description. This description is of variable length and contains a mixture of words and numbers. The final numbers are the product quantity and are nearly always followed by the unit, such as ML, L KG or T. There is occasionally a further descriptive word following this, or sometimes a star symbol (*). The product quantity varies in size and can be a decimal value.
The bit that makes things tricky is that there is often a number within the product description itself that is often not relevant to the quantity.
The only things that are consistent is that the target number is the first number reading from right to left and there is always a space before it and the next item to the left, either text or another number. Also, there may or may not be a space between the target number and its following unit.
What I want to be able to do is extract the quantity into one column and the unit into another column. I have been focusing on formulas rather than VBA as my abilities with VBA are basic.
One formula I tried without success to modify is one this forum and was supplied by István Hirsch. The formula he supplied to a poster seeking to extract decimal values was :
=LOOKUP(10^8,--MID(B2,MIN(FIND({0,1,2,3,4,5,6,7,8,9},B2&"0123456789")),{1,2,3,4,5,6,7,8}))
Unfortunately this only extracts the first numbers from the left whereas I want the numbers from the right.
I am using Windows 8.1 and Excel 2010.
A data sample and what I would like to extract is shown below:
DESCRIPTION
SIZE
UNIT
PRODUCTA 100 500G
500
G
PRODUCTB 100 500 G
500
G
PRODUCTC 500 G
500
G
PRODUCTC 500 G BOTTLE
500
G
PRODUCTD 250 WP 2.5KG
2.5
KG
PRODUCTD 250 WP 2.5 KG
2.5
KG
PRODUCTE 400DF 3 KG *
3
KG
PRODUCTG 11-12-19 25 KG
25
KG
PRODUCTH PRODUCTHH 10KG
10
KG
PRODUCTH PRODUCTHH 10 KG
10
KG
PRODUCTH PRODUCTHH PRODUCTHHH 10 KG
10
KG
PRODUCTJ 45.55KG
45.55
KG
PRODUCTJ 45.55 KG *
45.55
KG
PRODUCTK 5 5L
5
L
PRODUCTK 5 5 L *
5
L
PRODUCTL-15 5 L
5
L
PRODUCTM 1000 500ML
500
ML
PRODUCTM 1000 500 ML
500
ML
PRODUCTN + PRODUCTO 1T
1
T
PRODUCTN + PRODUCTO 1.5T *
1.5
T
PRODUCTN + PRODUCTO 1.5 T *
1.5
T
PRODUCTN + PRODUCTO 1.5 T BAG
1.5
T

<tbody>
</tbody>
 
However I don't follow what you mean about Istvan's formula. I could not see any problem when adding new rows to the top of the list?
Here's the sample data with Istvan's original formula

Excel Workbook
ABC
1DESCRIPTIONSIZEUNIT
2PRODUCTA 100 500G500G
3PRODUCTB 100 500 G500G
4PRODUCTC 500 G500G
5PRODUCTC 500 G BOTTLE500G
6PRODUCTD 250 WP 2.5KG2.5KG
7PRODUCTD 250 WP 2.5 KG2.5KG
8PRODUCTE 400DF 3 KG *3KG
9PRODUCTA 10x100KG BAG ABC 1T1T
10PRODUCTH PRODUCTHH 10KG10KG
11PRODUCTH PRODUCTHH 10 KG10KG
12PRODUCTH PRODUCTHH PRODUCTHHH 10 KG10KG
13PRODUCTJ 45.55KG45.55KG
14PRODUCTJ 45.55 KG *45.55KG
15PRODUCTK 5 5L5L
16PRODUCTK 5 5 L *5L
17PRODUCTL 5 KG *5KG
18PRODUCTM 1000 500ML500ML
19PRODUCTM 1000 500 ML500ML
20PRODUCTN + PRODUCTO 1T1T
21PRODUCTN + PRODUCTO 1.5T *1.5T
22PRODUCTN + PRODUCTO 1.5 T *1.5T
23PRODUCTN + PRODUCTO 1.5 T BAG1.5T
Sheet1



Here is the same sheet with 12 new rows added at the top. Look at rows 16-17 and 27-29

Excel Workbook
ABC
1
2
3
4
5
6
7
8
9
10
11
12
13DESCRIPTIONSIZEUNIT
14PRODUCTA 100 500G500G
15PRODUCTB 100 500 G500G
16PRODUCTC 500 G#N/A#N/A
17PRODUCTC 500 G BOTTLE#N/A#N/A
18PRODUCTD 250 WP 2.5KG2.5KG
19PRODUCTD 250 WP 2.5 KG2.5KG
20PRODUCTE 400DF 3 KG *3KG
21PRODUCTA 10x100KG BAG ABC 1T1T
22PRODUCTH PRODUCTHH 10KG10KG
23PRODUCTH PRODUCTHH 10 KG10KG
24PRODUCTH PRODUCTHH PRODUCTHHH 10 KG10KG
25PRODUCTJ 45.55KG45.55KG
26PRODUCTJ 45.55 KG *45.55KG
27PRODUCTK 5 5L#N/A#N/A
28PRODUCTK 5 5 L *#N/A#N/A
29PRODUCTL 5 KG *#N/A#N/A
30PRODUCTM 1000 500ML500ML
31PRODUCTM 1000 500 ML500ML
32PRODUCTN + PRODUCTO 1T1T
33PRODUCTN + PRODUCTO 1.5T *1.5T
34PRODUCTN + PRODUCTO 1.5 T *1.5T
35PRODUCTN + PRODUCTO 1.5 T BAG1.5T
Sheet1
 
Upvote 0

Excel Facts

What is =ROMAN(40) in Excel?
The Roman numeral for 40 is XL. Bill "MrExcel" Jelen's 40th book was called MrExcel XL.
In the formulae in post #2 the part ROW($1:$500) is to count up to the length of the string (now supposed to be max. 500 characters) (1,2,3 ..) to check the according character: Mid(a1,1,1)=”P”, mid(a2,2,1)=”r”, mid(a2,3,1)=”o” (for example from „Product”) etc.
if you insert new rows, for example from row 1 to row 20, then the row function for the originally 2nd and 3rd rows etc changes to ROW($21:$522) consequently the MID function will not test the needed character range (which always should be 1 to the length of the string).
Shortly, inserting new rows shifts the tested character range towards higher values, which may cause errors. The formulae in post #9 are free from that drawback.
 
Upvote 0
Thanks for the explanation Istvan.
Actually the modified formula you gave to isolate the units doesn't work as the number of nested brackets exceeds capabilities of, I assume, Excel 2010. However, this is not a problem as the original formula works and I am unlikely to insert any rows into the start of the list anyway. So, I can work with what you have already posted. Thanks again to you, and also Peter and bosco_yip for your help.
 
Upvote 0
Actually the modified formula you gave to isolate the units doesn't work as the number of nested brackets exceeds capabilities of, I assume, Excel 2010.
There should be absolutely no issue with nested brackets with the new formulas and Excel 2010 as 64 levels of nesting are allowed. If the formula is failing, it will be for some other reason.

Could you provide a sample line for which the formulas fail as they work on all the previous sample data for me?

If you are not going to insert rows then fine, stick with the other formula, I'm really asking out of interest.
 
Last edited:
Upvote 0
Well Peter when I checked I discovered that the file I was using had at some point been saved in Excel 97-2003 which obviously couldn't cope with that level of nesting.
It works fine in Excel 2010 as you said it should.
Cheers.
 
Upvote 0
Well Peter when I checked I discovered that the file I was using had at some point been saved in Excel 97-2003 which obviously couldn't cope with that level of nesting.
It works fine in Excel 2010 as you said it should.
Cheers.
All good then. :)
 
Upvote 0

Forum statistics

Threads
1,215,073
Messages
6,122,970
Members
449,095
Latest member
Mr Hughes

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top