Text to Number Conversion - Which functions should be used?

aaalviny

Board Regular
Joined
Apr 25, 2013
Messages
128
Disclaimer:
I am not turning a deaf ear to others' comment, but I just had enough evidence to turn them down. I don't turn them down due to no reasons.


p.s. I do admire the use of converting 8-digit number-formatted dates to a date format by use of TEXT("0000-00-00")
I think it's the best solution given the Excel's behaviour on dates.
I didn't thoroughly use it although I heard it once before, and this time it reminds me again.

just an example which I learnt here.


and p.s. I can't find anywhere suitable to put this article, so I probably put it in this board






Text to Number Conversion - Which functions should be used?


sometimes we encounter situation that we may need to convert texts to numbers,
by using either VALUE(A1), --A1, A1*1, A1^1, A1+0, INT(A1)
which of them should be used? Below are my analysis


in the context, --A1, A1*1, A1^1, A1+0 and INT are of the same type, and represented by --A1 in the discussion below, unless otherwise specified.


1. -- or INT() or VALUE() matches the aim of the formula?


the aim of these are to convert texts to numbers
the underlying meaning of --, is to add twice the negative signs to A1, which does not show the meaning of conversion.
the aim of INT() is to extract the integer portion of a number
on the other hand, VALUE(), the function itself, means to convert the texts to number, which matches the true aim




2. -- is simpler than VALUE()?


By simple, it doesn't mean the length, but it means the ease of understanding, and edit/update
VALUE() is easier to be understood than --, because
as mentioned before, VALUE() matches the aim to convert texts to numbers
If one writes VALUE(A1), I think every excel users, even Newbie, can understand the meaning of the formula very well
On the other hand, there are always users asking what -- is.




3. Convention is to use --?


Convention doesn't necessarily mean the correct method.
And everywhere is using different conventions for this conversion (My environment uses *1)
There are so many kinds of conventions to -- because it does not represent the original aim
e.g. If one wants to use a cell to store A1 + 2
=A1+2*1 won't be used, =A1+0+2 won't be used, =(A1+2)*1 won't be used.
because the correct representation is =A1 + 2
although all gives the correct result of the value.


and "convention" is to use VLOOKUP but not INDEX+MATCH
in fact INDEX+MATCH is better in various aspects, except the length being longer
(this is another issue I can talk about if you wish)






4. VALUE() requires one more parenthesis than --


Yes it's true, you may argue excel 2003 (that's what I am using) only allows 7 in one cell.
The fact is, under a good excel design, 3 parenthesis in the formula of any cells are already too much.
too many parenthesis also mean harder to read and maintain
There may be violation of DRY principle or the user is putting too much things in one cell.
some examples of wrong usage includes:
Code:
=IF(Curr = "USD", a, IF(Curr = "GBP", b, IF(Curr = "JPY", c, IF(Curr = "CAD", d))))
Code:
=IF(ISNA(MATCH(1, $A$1:$A$3, 0)), 0, MATCH(1, $A$1:$A$3, 0))
Code:
=VLOOKUP($A1, $B$2:$F$10, 2, 0)*VLOOKUP($A1, $B$2:$F$10, 3, 0)+VLOOKUP($A1, $B$2:$F$10, 4, 0)-VLOOKUP($A1, $B$2:$F$10, 5, 0)+VLOOKUP($A1, $B$2:$F$10, 4, 0)/VLOOKUP($A1, $B$2:$F$10, 5, 0)


and just a random search on the threads:
Code:
=LOOKUP(9.99999999999999E+307,MID(SUBSTITUTE(A2," ","x"),SMALL(IF(ISNUMBER(MID(A2,ROW(INDIRECT("1:"&LEN(A2))),1)+0),ROW(INDIRECT("1:"&LEN(A2)))),ROW(INDIRECT("1:"&LEN(A2)))),5)+0)
by immediatelly looking the formula, do you find it trival to understand the formula?
personally I don't
If you find it trival, then you may use it, and I admit that you are really really very smart!!
(if you can be certain that only you will read the file in the future -- unless you delete it, how can you sure the file won't be exposed to others?)






5. -- is faster than VALUE?


Yes, -- is faster than VALUE
by my machine, converting 3,000,000 texts to numbers requires
2.44s by --, and
3.08s by VALUE
with a difference of 0.64s


but there are several points I'd like to make on this speed
a.
it's very seldom to have such large amounts of numbers stored wrongly as texts
I use "wrongly" because wrong data structure was used, e.g. in database, numbers should not be stored in the field type text
If there is really such a large amount of mis-formated texts, there is a problem with the raw data and it ought to be solved.


b.
even 3,000,000 texts give 0.64s difference.
In reality there won't be such large amount of texts.
Moreover, given there is no volatile funtions in the workbook (this is again another issue I can talk about if you wish)
the difference only happens once, and unless the data changes.
yet by proportion, 10,000 texts only give 0.002s difference


This is very micro-optimisation
In view of the small calculation time difference, I think this part is not a matter in comparing -- and VALUE


To supplement, the main calculation speed for excel depends on the number of lookups used and the amount of volatile functions used
which very much relies on the proper data structure, and formula designing e.g. DRY (Don't Repeat Yourself)
It can gives seconds to minutes of difference.
This is the real things about calculation speed that excel designers should care about.
And to have a fast calculation speed, a good workbook design with correct choice of algorithms will mostly do the job.


And volatile functions should be avoided (If it's really required due to poor data structures, copy and paste as value after adding a remark stating the original formula used)
(this is again another issue I can talk about if you wish)


c.
If SUMPRODUCT is used, the speed is slower because of SUMPRODUCT itself, not about -- / VALUE
because SUMPRODUCT should also be avoided and there are other alternatives
(this is again another issue I can talk about if you wish)






6. -- is shorter in terms of length than VALUE()?


Yes, 2 characters comparing to 7 charaters.
Length is the matter of syntax pre-defined by Excel, which is independent of the logic.
Other matters above are more important than the syntax pre-defined.




Conclusion
As a conclusion, please use VALUE() instead of --, *1, ^1, +0 or INT() for Excel best practice






Alvin - 2013 05 18
 
To further supplement: "Algorithem efficiency"

[h=3]Readability, trade offs and trends [edit][/h]One must be careful, in the pursuit of good coding style, not to over-emphasize efficiency. Frequently, a clean, readable and 'usable' design is much more important than a fast, efficient design that is hard to understand. There are exceptions to this 'rule' (such as embedded systems, where space is tight, and processing power minimal) but these are rarer than one might expect.
However, increasingly, for many 'time critical' applications such as air line reservation systems, point-of-sale applications, ATMs (cash-point machines), Airline Guidance systems,Collision avoidance systems and numerous modern web based applications—operating in a real-time environment where speed of response is fundamental—there is little alternative.

Algorithmic efficiency - Wikipedia, the free encyclopedia

the good word it uses:
not over-emphasizing efficiency, Frequently, a clean, readable and 'usable' design is much more important than a fast, efficient design that is hard to understand.
 
Upvote 0

Excel Facts

Why does 9 mean SUM in SUBTOTAL?
It is because Sum is the 9th alphabetically in Average, Count, CountA, Max, Min, Product, StDev.S, StDev.P, Sum, VAR.S, VAR.P.
First of all,
Mega workbook has no necessary relationship to speed.
If an excel contains many required data and complex calculation, the file size has to be big
Even if a workbook has small file size, there can be many volatile functions / duplicated calculations / array formulas which lead to slow worksheets.

then, when proper design are used (as I always mentioned, proper database structure, proper use of binary lookups, "DRY", no volatile functions, etc.)
the auto calc mode actually is working very well, and everytime I update the cells, only those depending on them are updated. So when I update some far away cells, no calculation is needed.)
at least, for my 200MB well-designed and the historic 80MB not well-designed worksheets, the 200MB one is faster than the 80MB.

I don't need to re-calc in my excel design, all in auto-calculation mode and there's no need to wait while I am working.
I don't need to press F9 when working, for your interest.

OK, your workbook works for you. I currently use workbooks built using a 'Best Practice Modelling' add-in and I hate them, because they have taken their idea of best practice to the extreme. This comes down, effectively, to only doing one kind of operation in a worksheet, which creates huge numbers of sheets for no good reason that I can fathom.
My point is that you appear to be stuck on a particular track, and want the whole MrExcel community to follow your path because it's best practice. Some very experienced people have explained why the conventions exist; you have acknowledged that there are advantages to them. So, will you learn from this interchange or will you continue to argue your point on ever-smaller pieces of ground?

Denis
 
Upvote 0
or will you continue to argue your point on ever-smaller pieces of ground?
Nice turn of phrase, Denis, and on mark.
 
Upvote 0
Hi,

Since Excel is a spreadsheet application used for number crunching, I think a mathematical operation that produces a correct result is by definition a match between the "aim" and the "result". In fact, I frequently use the values 1 or 0 in cells or in formulas due to their unique properties when applied to mathematical operations. To me, this is simply a technique, and a valuable one which should be a part of a user's repertoire, if they have any desire to proceed beyond beginner level.

The whole thing is somewhat minor and therefore unimportant - anyone should use -- or Value() as they please without concern. From my experience the conventions used are double negation (the majority), add zero (second place), and all other techniques, including VALUE (all rare). There's really no point to making a big deal out of something that is neither a "best practice" (defined as that which the high performing spreadsheet designers do), nor very important. Many of your points are fine, but overkill - nobody is arguing that we produce spreadsheets that are unmaintainable, confusing, or error-prone. But it is really stretching the argument beyond the breakning point to suggest that one formula using --() creates all of these problems.

To the latter, I'm afraid that in my experience no matter how simple and "maintainable" I make a spreadsheet, 90% of users can't grasp how it works. This could be only because it contains an IF() formula, a VLOOKUP(), or even because it uses input cells (another "best practice"). So when I make spreadsheets that others must use and maintain, I use all of my expertise to make something that works, even when dummies take it over - with the assumption that they don't understand it (and may not be able to).

ξ
 
Upvote 0
^^ to follow up, I have a book on Excel best practices which advocates strongly (and convincingly, in the event) that using R1C1 formula style is an Excel best practice. Of course, almost no one I know actually follows this best practice. So I guess that's a case of a "best practice" that is ignored, and could be argued until next Tuesday without effect, if that's any consolation to you.

Note: the book in question is Excel Best Practices for Business by Abdulezer. However, I do occasionally put Excel in R1C1 mode temporarily to review a worksheet for possible errors (!). I'm almost willing to go with this permanently but I know it would just confuse the heck out of everyone else, even if it is best.
 
Upvote 0
OK, your workbook works for you. I currently use workbooks built using a 'Best Practice Modelling' add-in and I hate them, because they have taken their idea of best practice to the extreme. This comes down, effectively, to only doing one kind of operation in a worksheet, which creates huge numbers of sheets for no good reason that I can fathom.
My point is that you appear to be stuck on a particular track, and want the whole MrExcel community to follow your path because it's best practice. Some very experienced people have explained why the conventions exist; you have acknowledged that there are advantages to them. So, will you learn from this interchange or will you continue to argue your point on ever-smaller pieces of ground?

Denis
You are quoting the things that are not related to the topic though I know it's me who responded to others' critics on me.
I am not sure what your best practice modelling add-in is, so I guess I won't comment on it unless you give further details.

Again, I can't force anyone, I am just presenting my advice.

whether it is "smaller" pieces of ground can not be measured,
and I have to say people here really don't value the "aim-matching" principle as important.
and I would say claiming efficient and fast is only an "over-emphasis on efficiency" but sacrifice design

So that's why people here are thinking -- as correct. and think what I am arguing is a small piece.
And I believe I finally have presented my points clear enough at this moment.

That's all I want to say
 
Upvote 0
Hi,

Since Excel is a spreadsheet application used for number crunching, I think a mathematical operation that produces a correct result is by definition a match between the "aim" and the "result". In fact, I frequently use the values 1 or 0 in cells or in formulas due to their unique properties when applied to mathematical operations. To me, this is simply a technique, and a valuable one which should be a part of a user's repertoire, if they have any desire to proceed beyond beginner level.
Hi,

I doubt you are only viewing the aim of your task and the end result to claim that it is aim-matching...
please consider the aim-matching of the function.


The whole thing is somewhat minor and therefore unimportant - anyone should use -- or Value() as they please without concern. From my experience the conventions used are double negation (the majority), add zero (second place), and all other techniques, including VALUE (all rare). There's really no point to making a big deal out of something that is neither a "best practice" (defined as that which the high performing spreadsheet designers do), nor very important. Many of your points are fine, but overkill - nobody is arguing that we produce spreadsheets that are unmaintainable, confusing, or error-prone. But it is really stretching the argument beyond the breakning point to suggest that one formula using --() creates all of these problems.
Again, that's because you value it as unimportant so other people, like you, are disproving my points.
It's important to raise the efficiency by 80%, but just too minor if to raise the efficiency by 1% and has to sacrifice the design
this is kind of over-emphasis on efficiency.

Again because many people are under-valuing this principle, the conventions become to use --

and I have to make clear that, "high performing spreadsheet designers" means both design and efficiency are good, but not only on the performance of speed.
As Best Practice does not only mean the efficiency.

I am not sure whether your final statements are talking about SUMPRODUCT or --.
I only state the problem of -- being indiscrimate and not aim-matching.
[/QUOTE]
To the latter, I'm afraid that in my experience no matter how simple and "maintainable" I make a spreadsheet, 90% of users can't grasp how it works. This could be only because it contains an IF() formula, a VLOOKUP(), or even because it uses input cells (another "best practice"). So when I make spreadsheets that others must use and maintain, I use all of my expertise to make something that works, even when dummies take it over - with the assumption that they don't understand it (and may not be able to).
so your users seem to be very elementary about excel? if yes then I would rather say it's not applicable for us to talk about the design and readability of excel to them, as they don't have the skills to do this yet. (given they may have no knowledge to IF / VLOOKUP..they are really very elementary...)
Here, things are talking from the developer's prospective.
^^ to follow up, I have a book on Excel best practices which advocates strongly (and convincingly, in the event) that using R1C1 formula style is an Excel best practice. Of course, almost no one I know actually follows this best practice. So I guess that's a case of a "best practice" that is ignored, and could be argued until next Tuesday without effect, if that's any consolation to you.

Note: the book in question is Excel Best Practices for Business by Abdulezer. However, I do occasionally put Excel in R1C1 mode temporarily to review a worksheet for possible errors (!). I'm almost willing to go with this permanently but I know it would just confuse the heck out of everyone else, even if it is best.
I am holding against side to R1C1, as one quick concern is the readability of the reference, for instance
a cell B5 contains the formula =RC[50]
how long it takes to figure out what it is?
(similar to the case to use VLOOKUP(,,100, 0)
And can you explain the benefit of using R1C1 mode by starting a new thread? I hope to have a quick look on it
 
Upvote 0
Hi aalviny,
and I have to say people here really don't value the "aim-matching" principle as important.
I don't think this is quite true. It's just the the aim-matching principle is to vague. No one really wants to produce work that doesn't match their aim. I think the disagreement is more that the majority of users of double negation find it to be sufficient so far as both getting results and being understandable/readable. You are in the minority on that point.
and I would say claiming efficient and fast is only an "over-emphasis on efficiency" but sacrifice design

I'm not sure what you mean here, but there are also different opinions on efficiency/speed/optimization/file size. Some are very concerned about these things, others a little concerned, and others hardly worry about it at all (unless it actually becomes a problem - files too big or taking too long to calculate). I suppose it's good to be proactive by designing efficiency in from the start, but be that as it may it's the case the some folks will only get to that when they need to. Again, I doubt anyone wants to create inefficient spreadsheets. It's just a matter of how much you know, and how much you want to put into it.
 
Upvote 0
I would refer you to the book quoted to get a complete case of the benefits of R1C1 formulas. I'm not a big user of this, as I said, so I'm not the one to start a thread on it. One of the benefits is that when you write a formula and drag it down or across, it looks exactly the same in every cell. So you can quickly see anomalies (this is why I use it for "debugging"). Of course at first it's hard to understand the references, but in time I imagine you'd become used to it.

I mentioned users because you have emphasized this matter (unless I am mistaken) -- that other people might need to take over the spreadsheet and that's why you take pains to make them easy to understand. Similarly with Sumproduct, I mentioned this because it is a prime example of where double negation is employed, and double negation is the thing you seem to object too on the grounds of your "aim-matching" principle.

ξ
 
Upvote 0
Hi aalviny,

I don't think this is quite true. It's just the the aim-matching principle is to vague. No one really wants to produce work that doesn't match their aim. I think the disagreement is more that the majority of users of double negation find it to be sufficient so far as both getting results and being understandable/readable. You are in the minority on that point.
Hi,
It seems you are still concerning about the aim-matching of the task and result, but not the aim-matching of the functions used. (Or you haven't seen my replies above?)
understandable, readable and aim-matching are important during the choice of functions
I'm not sure what you mean here, but there are also different opinions on efficiency/speed/optimization/file size. Some are very concerned about these things, others a little concerned, and others hardly worry about it at all (unless it actually becomes a problem - files too big or taking too long to calculate). I suppose it's good to be proactive by designing efficiency in from the start, but be that as it may it's the case the some folks will only get to that when they need to. Again, I doubt anyone wants to create inefficient spreadsheets. It's just a matter of how much you know, and how much you want to put into it.
I am meaning the fact that -- is a little faster than VALUE, if you read through the previous replies.
comparing to the importance of DRY, using binary lookups, good table structure, no volatile functions, etc. that can give significant improvement to the speed.
they are just like difference of raising the efficiency from 50% to 90% and 90% to 91%
to do this small efficiency, it is ok as long as the design is good, e.g. keeping aim-matching, understandable and readable.
otherwise it is over-emphasis on efficiency. (actually even it's ok, the time spent to improve this efficiency may just cancel out the efficiency improvement)

Alvin
 
Upvote 0

Forum statistics

Threads
1,215,326
Messages
6,124,258
Members
449,149
Latest member
mwdbActuary

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top