Formula Extract Characters at the Beginning of a String

gonesalsa

New Member
Joined
May 5, 2014
Messages
7
Hello
I need to extract the numbers at the beginning of a string. The problem is that there could 1, 2 or 3 digits. Here are examples of the string:

4JXXXXXXXXX1204
10EXXXXXXXXX1204
127JXXXXXXXXX1204

I only want to extract the 4, 10 and the 127 in the above strings. I tried =LEFT(B1,LEN(B1)-14), but the formula isn't working on the last string where the number is 3 digits. it only pulls 12, instead of 127.

Thanks
 
Conclusion: these are simple formula that will usually work for most cases.
If you want to be sure that you're really getting the digits at the beginning of a string you have to test them one by one, not in bulk like with Left().

Well said! It seems that this is a bit of a theme this morning!

It's slightly unfortunate that the "bulk-testing" approach with LEFT (or RIGHT) still probably makes up the majority of proposed solutions to questions of this type, despite its evident lack of rigour.

Regards
 
Upvote 0

Excel Facts

Lock one reference in a formula
Need 1 part of a formula to always point to the same range? use $ signs: $V$2:$Z$99 will always point to V2:Z99, even after copying
Might be worth considering...

=LOOKUP(9.99999999999999E+307,MATCH(LEFT(A2,{1,2,3}),{4,10,127}&"",0),{4,10,127})
 
Upvote 0
It's slightly unfortunate that the "bulk-testing" approach with LEFT (or RIGHT) still probably makes up the majority of proposed solutions to questions of this type, despite its evident lack of rigour.

I agree almost completely. Just not with the "slightly unfortunate", "really unfortunate" would maybe be better. :(

These formulas you are referring to are simpler and smaller but that's a poor excuse for the lack of rigour.

For up to 3 initial digits maybe this one is not so bigger and is surely more rigorous:

=LEFT(A1,MATCH(FALSE,ISNUMBER(-MID(A1,{1,2,3,4},1)),0)-1)

Maybe it's time to stop posting those formulas or, at least, to post a caveat so that people understand their limits.
 
Upvote 0
I agree almost completely. Just not with the "slightly unfortunate", "really unfortunate" would maybe be better. :(

These formulas you are referring to are simpler and smaller but that's a poor excuse for the lack of rigour.

For up to 3 initial digits maybe this one is not so bigger and is surely more rigorous:

=LEFT(A1,MATCH(FALSE,ISNUMBER(-MID(A1,{1,2,3,4},1)),0)-1)

Maybe it's time to stop posting those formulas or, at least, to post a caveat so that people understand their limits.

Agree completely. Only last night I was trying to explain this precise issue in reply to some solutions posted (by contributors of "MVP" status) on a different forum.

It really is quite endemic, so you're perhaps correct that it's more than "slightly" unfortunate.

As for a caveat, I'm not sure that would be practical: how many sub-clauses of that caveat would there have to be to cover the myriad possibilities regarding dates/scientific notation, etc. that must be avoided? Better if the message to, as you say, perform the test on an individual character basis, rather than on "bulk" sections, is promoted as being the only approach which guarantees success.

Regards
 
Upvote 0
As for a caveat, I'm not sure that would be practical: how many sub-clauses of that caveat would there have to be to cover the myriad possibilities regarding dates/scientific notation, etc. that must be avoided? Better if the message to, as you say, perform the test on an individual character basis, rather than on "bulk" sections, is promoted as being the only approach which guarantees success.

You are right. A caveat may be tricky.

I mean that I would not just ban the simple Left() formula.

For a specific problem this simple formula may be enough, and to always use a general full proof solution no matter what the case may not be needed. In that case the caveat would refer that the solution is bound to the format of the data.

As a general solution, however, I agree. let's get rid of these formulas. (missing a combat smiley)
 
Upvote 0
pgc01, XOR LX,

I do not agree with you. Conscious use of the formula is more important to me than the universality of uses.
Formula for every cases can be completly useless for some cases. Sometimes efficiency is more important than versality.
We should (have to) know something about our data.
In case above if we know there are two letter after digits and again numbers and letters, then we can use this simple (pgc01 formula) formula.

Kind regards
billszysz
 
Upvote 0
pgc01, XOR LX,

I do not agree with you. Conscious use of the formula is more important to me than the universality of uses.
Formula for every cases can be completly useless for some cases. Sometimes efficiency is more important than versality.
We should (have to) know something about our data.
In case above if we know there are two letter after digits and again numbers and letters, then we can use this simple (pgc01 formula) formula.
I happen to agree with you... I would rather see a simple formula that handles the actual data structuring the user has than a full-blown, "cover anything that may be thrown at it" type formula. However, the problem tends to be that those posting questions here usually do not give us the rigorous structuring behind their data that is needed, mainly because it is so obvious to them that it does not occur to them that we who know nothing about their "business model" haven't a clue about the data's structuring. So we are usually left to guess at what seems to be their data structuring and then craft our formulas from that. I don't think there is any easy answer but to have the person asking the question to clarify things for us, but normally we (well, I at least) go with the formula that looks like it should work and wait for the questioner to tell us it doesn't work and why... then progress from there. Sometimes fellow volunteers spot something that should be looked into further (read that as, "we missed something when we crafted the original formula") and that is good... we all watch each other's backs and the outcome usually is the questioner eventually ends up with the solution he/she was after.
 
Upvote 0
Couldn't disagree more - sorry.

Given a scenario in which we have a choice between proposing a solution which we know works for all strings of that type, and one which we are pretty confident (but not certain) of working for all strings of that type, why choose the latter?

I could perhaps sympathize more if taking the former approach meant unnecessarily complicating the formula-work, but that's very rarely the case (and certainly not here).

And that's not to mention the practical difficulties involved in trying to "know something about our data": in this particular case, in order to guarantee success for the earlier-proposed solutions (for which I pointed out the drawbacks), we would have to go through a (faintly ridiculous) process of first getting the OP to confirm, categorically, that not a single one of the strings in question contained any one of several dozen (perhaps much more) substrings which would otherwise cause that formula to fail.

Call my views overambitious, idealistic, a mere result of my mathematical upbringing if you will, but, in my opinion, absolute truth (my interpretation of "cover anything that may be thrown at it") is not something we should so easily forego in favour of "efficiency" and "simplicity".

Regards
 
Upvote 0
Couldn't disagree more - sorry.
Don't be sorry... I welcome the differences, they are what make this world of ours interesting.

Given a scenario in which we have a choice between proposing a solution which we know works for all strings of that type, and one which we are pretty confident (but not certain) of working for all strings of that type, why choose the latter?

I could perhaps sympathize more if taking the former approach meant unnecessarily complicating the formula-work, but that's very rarely the case (and certainly not here).
I probably wasn't clear about my opinion here... if the change is easy and does not burden the formula with a half-a-dozen or more extra function calls (which might become significant if he user has a half-a-million or more rows containing these formulas) that handle a more varied data structure than what the OP has, then fine, but I do not see the need to overload a formula with extra function calls "just in case". I ideal case, of course, is for the OP to tell us exactly what is needed and then design an efficient formula to handle that exact need.

And that's not to mention the practical difficulties involved in trying to "know something about our data": in this particular case, in order to guarantee success for the earlier-proposed solutions (for which I pointed out the drawbacks), we would have to go through a (faintly ridiculous) process of first getting the OP to confirm, categorically, that not a single one of the strings in question contained any one of several dozen (perhaps much more) substrings which would otherwise cause that formula to fail.
Ideally, rather than an extended back-and-forth with the OP, having him explain how the data is formed in some detail should suffice (it is his/her data so he/she should know how it is constructed).

Call my views overambitious, idealistic, a mere result of my mathematical upbringing if you will, but, in my opinion, absolute truth (my interpretation of "cover anything that may be thrown at it") is not something we should so easily forego in favour of "efficiency" and "simplicity".
You might find it interesting to know I graduated college as a Math Major (here in the States) more than 40 years ago (although my working career was as a Civil Engineer/Programmer).
 
Upvote 0
XOR LX,

I respect your opinion but I disagree with it.
My English is too bad to discuss but I will try to show why I do not agree with your opinion.
Look at the first post ... the examples are clear ... maximum 3 digits at the beginning of the text and behind them a few letters.
Now look at my post # 9 .. I expressed the opinion that we should be careful with more than 3 digits. This means that we know something about our data (first post, examples)
But for a maximum of three digits is absolutely sufficient. (I mean this solution pgc01 =-LOOKUP (1,-LEFT (A1, {1,2,3,4})) without ";4".)
So why did you write...?

It's slightly unfortunate that the "bulk-testing" approach with LEFT (or RIGHT) still probably makes up the majority of proposed solutions to questions of this type, despite its evident lack of rigour.
Rigour are respected in this case... do you agree? And this is not general solution, but for this case only... simply and efficient but not universal.

Regards
 
Upvote 0

Forum statistics

Threads
1,215,420
Messages
6,124,802
Members
449,190
Latest member
cindykay

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top