Text to Number Conversion - Which functions should be used?

aaalviny

Board Regular
Joined
Apr 25, 2013
Messages
128
Disclaimer:
I am not turning a deaf ear to others' comment, but I just had enough evidence to turn them down. I don't turn them down due to no reasons.


p.s. I do admire the use of converting 8-digit number-formatted dates to a date format by use of TEXT("0000-00-00")
I think it's the best solution given the Excel's behaviour on dates.
I didn't thoroughly use it although I heard it once before, and this time it reminds me again.

just an example which I learnt here.


and p.s. I can't find anywhere suitable to put this article, so I probably put it in this board






Text to Number Conversion - Which functions should be used?


sometimes we encounter situation that we may need to convert texts to numbers,
by using either VALUE(A1), --A1, A1*1, A1^1, A1+0, INT(A1)
which of them should be used? Below are my analysis


in the context, --A1, A1*1, A1^1, A1+0 and INT are of the same type, and represented by --A1 in the discussion below, unless otherwise specified.


1. -- or INT() or VALUE() matches the aim of the formula?


the aim of these are to convert texts to numbers
the underlying meaning of --, is to add twice the negative signs to A1, which does not show the meaning of conversion.
the aim of INT() is to extract the integer portion of a number
on the other hand, VALUE(), the function itself, means to convert the texts to number, which matches the true aim




2. -- is simpler than VALUE()?


By simple, it doesn't mean the length, but it means the ease of understanding, and edit/update
VALUE() is easier to be understood than --, because
as mentioned before, VALUE() matches the aim to convert texts to numbers
If one writes VALUE(A1), I think every excel users, even Newbie, can understand the meaning of the formula very well
On the other hand, there are always users asking what -- is.




3. Convention is to use --?


Convention doesn't necessarily mean the correct method.
And everywhere is using different conventions for this conversion (My environment uses *1)
There are so many kinds of conventions to -- because it does not represent the original aim
e.g. If one wants to use a cell to store A1 + 2
=A1+2*1 won't be used, =A1+0+2 won't be used, =(A1+2)*1 won't be used.
because the correct representation is =A1 + 2
although all gives the correct result of the value.


and "convention" is to use VLOOKUP but not INDEX+MATCH
in fact INDEX+MATCH is better in various aspects, except the length being longer
(this is another issue I can talk about if you wish)






4. VALUE() requires one more parenthesis than --


Yes it's true, you may argue excel 2003 (that's what I am using) only allows 7 in one cell.
The fact is, under a good excel design, 3 parenthesis in the formula of any cells are already too much.
too many parenthesis also mean harder to read and maintain
There may be violation of DRY principle or the user is putting too much things in one cell.
some examples of wrong usage includes:
Code:
=IF(Curr = "USD", a, IF(Curr = "GBP", b, IF(Curr = "JPY", c, IF(Curr = "CAD", d))))
Code:
=IF(ISNA(MATCH(1, $A$1:$A$3, 0)), 0, MATCH(1, $A$1:$A$3, 0))
Code:
=VLOOKUP($A1, $B$2:$F$10, 2, 0)*VLOOKUP($A1, $B$2:$F$10, 3, 0)+VLOOKUP($A1, $B$2:$F$10, 4, 0)-VLOOKUP($A1, $B$2:$F$10, 5, 0)+VLOOKUP($A1, $B$2:$F$10, 4, 0)/VLOOKUP($A1, $B$2:$F$10, 5, 0)


and just a random search on the threads:
Code:
=LOOKUP(9.99999999999999E+307,MID(SUBSTITUTE(A2," ","x"),SMALL(IF(ISNUMBER(MID(A2,ROW(INDIRECT("1:"&LEN(A2))),1)+0),ROW(INDIRECT("1:"&LEN(A2)))),ROW(INDIRECT("1:"&LEN(A2)))),5)+0)
by immediatelly looking the formula, do you find it trival to understand the formula?
personally I don't
If you find it trival, then you may use it, and I admit that you are really really very smart!!
(if you can be certain that only you will read the file in the future -- unless you delete it, how can you sure the file won't be exposed to others?)






5. -- is faster than VALUE?


Yes, -- is faster than VALUE
by my machine, converting 3,000,000 texts to numbers requires
2.44s by --, and
3.08s by VALUE
with a difference of 0.64s


but there are several points I'd like to make on this speed
a.
it's very seldom to have such large amounts of numbers stored wrongly as texts
I use "wrongly" because wrong data structure was used, e.g. in database, numbers should not be stored in the field type text
If there is really such a large amount of mis-formated texts, there is a problem with the raw data and it ought to be solved.


b.
even 3,000,000 texts give 0.64s difference.
In reality there won't be such large amount of texts.
Moreover, given there is no volatile funtions in the workbook (this is again another issue I can talk about if you wish)
the difference only happens once, and unless the data changes.
yet by proportion, 10,000 texts only give 0.002s difference


This is very micro-optimisation
In view of the small calculation time difference, I think this part is not a matter in comparing -- and VALUE


To supplement, the main calculation speed for excel depends on the number of lookups used and the amount of volatile functions used
which very much relies on the proper data structure, and formula designing e.g. DRY (Don't Repeat Yourself)
It can gives seconds to minutes of difference.
This is the real things about calculation speed that excel designers should care about.
And to have a fast calculation speed, a good workbook design with correct choice of algorithms will mostly do the job.


And volatile functions should be avoided (If it's really required due to poor data structures, copy and paste as value after adding a remark stating the original formula used)
(this is again another issue I can talk about if you wish)


c.
If SUMPRODUCT is used, the speed is slower because of SUMPRODUCT itself, not about -- / VALUE
because SUMPRODUCT should also be avoided and there are other alternatives
(this is again another issue I can talk about if you wish)






6. -- is shorter in terms of length than VALUE()?


Yes, 2 characters comparing to 7 charaters.
Length is the matter of syntax pre-defined by Excel, which is independent of the logic.
Other matters above are more important than the syntax pre-defined.




Conclusion
As a conclusion, please use VALUE() instead of --, *1, ^1, +0 or INT() for Excel best practice






Alvin - 2013 05 18
 
I would refer you to the book quoted to get a complete case of the benefits of R1C1 formulas. I'm not a big user of this, as I said, so I'm not the one to start a thread on it. One of the benefits is that when you write a formula and drag it down or across, it looks exactly the same in every cell. So you can quickly see anomalies (this is why I use it for "debugging"). Of course at first it's hard to understand the references, but in time I imagine you'd become used to it.
Hi ξ,
I would say using different formulas, esp. in the middle of a column is one of the bad practice and should be avoided, manual adjustment should be done on new columns. (seems better not go off-topic)
I mentioned users because you have emphasized this matter (unless I am mistaken) -- that other people might need to take over the spreadsheet and that's why you take pains to make them easy to understand.
I wonder it is "users" or "developer" (i.e. people who maintain) you are referring to?
 
Upvote 0

Excel Facts

What do {} around a formula in the formula bar mean?
{Formula} means the formula was entered using Ctrl+Shift+Enter signifying an old-style array formula.
So, are you saying that you prefer VALUE() because it is more efficient or because it is better at aim-matching?
ξ

Note, for the record I think double negation matches the task and the result, and the "function used", so to speak. I.e., it's all good.
 
Upvote 0
So, are you saying that you prefer VALUE() because it is more efficient or because it is better at aim-matching?
ξ

Note, for the record I think double negation matches the task and the result, and the "function used", so to speak. I.e., it's all good.
better at aim-matching (most importantly)
also better discrimination on the input of the function

VALUE() is little worse in terms of efficiency.

Overall I weigh aim-matching most (of course also readable and understandable, etc.), and think that efficiency only does little improvement

and from the statement from wiki
We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil
""Premature optimization" is a phrase used to describe a situation where a programmer lets performance considerations affect the design of a piece of code. This can result in a design that is not as clean as it could have been or code that is incorrect, because the code is complicated by the optimization and the programmer is distracted by optimizing."
AND
"One must be careful, in the pursuit of good coding style, not to over-emphasize efficiency. Frequently, a clean, readable and 'usable' design is much more important than a fast, efficient design that is hard to understand."

And...can you explain why aim-matching is valid for the function "arithmetic negative" to do format conversion?
I presented my detailed argument in #55
 
Last edited:
Upvote 0
I was so optimistic after reading that ...
I hoped no need to say anymore,
But it shows me that people who reply here just don't understand my viewpoints...and give comments that I already responded, but without giving a response to what I responded.
how sad it is.
 
Upvote 0
Hi, I was trying to point out that people understand your viewpoint but just don't agree on all that you are saying, which isn't the same thing. Hopefully that will make you a little less sad :)


And...can you explain why aim-matching is valid for the function "arithmetic negative" to do format conversion?
I presented my detailed argument in #55
I presented my argument (less detailed) in #64, first paragraph.
 
Upvote 0
Hi, I was trying to point out that people understand your viewpoint but just don't agree on all that you are saying, which isn't the same thing. Hopefully that will make you a little less sad :)



I presented my argument (less detailed) in #64, first paragraph.
I have already pointed out that you mentioned only the aim of the task and the end-result but not the aim-matching of the functions

to re-quote yours:
Hi,

Since Excel is a spreadsheet application used for number crunching, I think a mathematical operation that produces a correct result is by definition a match between the "aim" and the "result". In fact, I frequently use the values 1 or 0 in cells or in formulas due to their unique properties when applied to mathematical operations. To me, this is simply a technique, and a valuable one which should be a part of a user's repertoire, if they have any desire to proceed beyond beginner level.
"I think a mathematical operation that produces a correct result is by definition a match between the "aim" and the "result"."

let me illustrate an example

I want to compute 4!
I entered (2*3*4)
it gives the correct result, but not the correct definition, which is 1*2*3*4
mathematical operation that produces a correct result <= you only considered the end result
is by definition a match between the "aim" and the "result" <=aim of the task and the end result

so I can't find the aim-matching of the functions used in your above statement

"""2*3*4 is a mathematical operation that produces a correct result of 4!
and is by definition a match between "the aim to get the 4! result" and the "result""""

I do think it's a problematic statement above to respond to my "aim-matching of the function used"
 
Upvote 0
Hi,
So you are arguing that we should never write 4! but always 2*3*4 instead (or 1*2*3*4, as the case may be)?

I think that 4! matches the aim of finding the factorial of 4.
ξ
 
Upvote 0
Hi,
So you are arguing that we should never write 4! but always 2*3*4 instead (or 1*2*3*4, as the case may be)?

I think that 4! matches the aim of finding the factorial of 4.
ξ

I mean 4! and 1*2*3*4 are the same things
but 2*3*4 is not a correct representation of 4! although they give the same correct answer
 
Upvote 0
Hi,
That may be so, but I will still use 2*3*4 in cases where I am evaluating 4!, since it simplifies the work of solving equations - and yet I find myself able to sleep at night ;)
ξ
 
Upvote 0

Forum statistics

Threads
1,215,339
Messages
6,124,362
Members
449,155
Latest member
ravioli44

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top