Text to Number Conversion - Which functions should be used?

aaalviny

Board Regular
Joined
Apr 25, 2013
Messages
128
Disclaimer:
I am not turning a deaf ear to others' comment, but I just had enough evidence to turn them down. I don't turn them down due to no reasons.


p.s. I do admire the use of converting 8-digit number-formatted dates to a date format by use of TEXT("0000-00-00")
I think it's the best solution given the Excel's behaviour on dates.
I didn't thoroughly use it although I heard it once before, and this time it reminds me again.

just an example which I learnt here.


and p.s. I can't find anywhere suitable to put this article, so I probably put it in this board






Text to Number Conversion - Which functions should be used?


sometimes we encounter situation that we may need to convert texts to numbers,
by using either VALUE(A1), --A1, A1*1, A1^1, A1+0, INT(A1)
which of them should be used? Below are my analysis


in the context, --A1, A1*1, A1^1, A1+0 and INT are of the same type, and represented by --A1 in the discussion below, unless otherwise specified.


1. -- or INT() or VALUE() matches the aim of the formula?


the aim of these are to convert texts to numbers
the underlying meaning of --, is to add twice the negative signs to A1, which does not show the meaning of conversion.
the aim of INT() is to extract the integer portion of a number
on the other hand, VALUE(), the function itself, means to convert the texts to number, which matches the true aim




2. -- is simpler than VALUE()?


By simple, it doesn't mean the length, but it means the ease of understanding, and edit/update
VALUE() is easier to be understood than --, because
as mentioned before, VALUE() matches the aim to convert texts to numbers
If one writes VALUE(A1), I think every excel users, even Newbie, can understand the meaning of the formula very well
On the other hand, there are always users asking what -- is.




3. Convention is to use --?


Convention doesn't necessarily mean the correct method.
And everywhere is using different conventions for this conversion (My environment uses *1)
There are so many kinds of conventions to -- because it does not represent the original aim
e.g. If one wants to use a cell to store A1 + 2
=A1+2*1 won't be used, =A1+0+2 won't be used, =(A1+2)*1 won't be used.
because the correct representation is =A1 + 2
although all gives the correct result of the value.


and "convention" is to use VLOOKUP but not INDEX+MATCH
in fact INDEX+MATCH is better in various aspects, except the length being longer
(this is another issue I can talk about if you wish)






4. VALUE() requires one more parenthesis than --


Yes it's true, you may argue excel 2003 (that's what I am using) only allows 7 in one cell.
The fact is, under a good excel design, 3 parenthesis in the formula of any cells are already too much.
too many parenthesis also mean harder to read and maintain
There may be violation of DRY principle or the user is putting too much things in one cell.
some examples of wrong usage includes:
Code:
=IF(Curr = "USD", a, IF(Curr = "GBP", b, IF(Curr = "JPY", c, IF(Curr = "CAD", d))))
Code:
=IF(ISNA(MATCH(1, $A$1:$A$3, 0)), 0, MATCH(1, $A$1:$A$3, 0))
Code:
=VLOOKUP($A1, $B$2:$F$10, 2, 0)*VLOOKUP($A1, $B$2:$F$10, 3, 0)+VLOOKUP($A1, $B$2:$F$10, 4, 0)-VLOOKUP($A1, $B$2:$F$10, 5, 0)+VLOOKUP($A1, $B$2:$F$10, 4, 0)/VLOOKUP($A1, $B$2:$F$10, 5, 0)


and just a random search on the threads:
Code:
=LOOKUP(9.99999999999999E+307,MID(SUBSTITUTE(A2," ","x"),SMALL(IF(ISNUMBER(MID(A2,ROW(INDIRECT("1:"&LEN(A2))),1)+0),ROW(INDIRECT("1:"&LEN(A2)))),ROW(INDIRECT("1:"&LEN(A2)))),5)+0)
by immediatelly looking the formula, do you find it trival to understand the formula?
personally I don't
If you find it trival, then you may use it, and I admit that you are really really very smart!!
(if you can be certain that only you will read the file in the future -- unless you delete it, how can you sure the file won't be exposed to others?)






5. -- is faster than VALUE?


Yes, -- is faster than VALUE
by my machine, converting 3,000,000 texts to numbers requires
2.44s by --, and
3.08s by VALUE
with a difference of 0.64s


but there are several points I'd like to make on this speed
a.
it's very seldom to have such large amounts of numbers stored wrongly as texts
I use "wrongly" because wrong data structure was used, e.g. in database, numbers should not be stored in the field type text
If there is really such a large amount of mis-formated texts, there is a problem with the raw data and it ought to be solved.


b.
even 3,000,000 texts give 0.64s difference.
In reality there won't be such large amount of texts.
Moreover, given there is no volatile funtions in the workbook (this is again another issue I can talk about if you wish)
the difference only happens once, and unless the data changes.
yet by proportion, 10,000 texts only give 0.002s difference


This is very micro-optimisation
In view of the small calculation time difference, I think this part is not a matter in comparing -- and VALUE


To supplement, the main calculation speed for excel depends on the number of lookups used and the amount of volatile functions used
which very much relies on the proper data structure, and formula designing e.g. DRY (Don't Repeat Yourself)
It can gives seconds to minutes of difference.
This is the real things about calculation speed that excel designers should care about.
And to have a fast calculation speed, a good workbook design with correct choice of algorithms will mostly do the job.


And volatile functions should be avoided (If it's really required due to poor data structures, copy and paste as value after adding a remark stating the original formula used)
(this is again another issue I can talk about if you wish)


c.
If SUMPRODUCT is used, the speed is slower because of SUMPRODUCT itself, not about -- / VALUE
because SUMPRODUCT should also be avoided and there are other alternatives
(this is again another issue I can talk about if you wish)






6. -- is shorter in terms of length than VALUE()?


Yes, 2 characters comparing to 7 charaters.
Length is the matter of syntax pre-defined by Excel, which is independent of the logic.
Other matters above are more important than the syntax pre-defined.




Conclusion
As a conclusion, please use VALUE() instead of --, *1, ^1, +0 or INT() for Excel best practice






Alvin - 2013 05 18
 
Further to the above the following table illustrates errors in using VALUE(), INT() and N() for coercion:


Excel 2010
ABCDEF
1ValueType--VALUE()INT()N()
21string1110
31numeric1111
4-1numeric-1-1-1-1
5-1string-1-1-10
6TRUEboolean1#VALUE!11
7FALSEboolean0#VALUE!00
806:14:23string0.2599884260.25998842600
906:14:23numeric0.2599884260.25998842600.26
Sheet1
Cell Formulas
RangeFormula
C2=--A2


VALUE() will not coerce boolean values to numeric values.
INT() will not yield the correct decimal value for time.
N() will not coerce time values captured as string.

Conclusion is that there are no functions (that I am aware of) that will work consistently and reliably as using good ol' plain arithmetic!
 
Upvote 0

Excel Facts

Excel Joke
Why can't spreadsheets drive cars? They crash too often!
What a load of crap!!!

You clutter the worksheet with loads of completely unnecessary formulae that increase:
(a) maintenance overheads;
(b) the risk of maintenance error;
(c) file size; and
(d) time required for a recalc by adding an extra output that wasn't asked for,
then have the gall to suggest it doesn't matter??? What planet are you from???
It is easier to maintain, those formulae are easy to read and take instant time to understand, right?
maintenance is not only to work on the existing task, but also to prepare for future changes of objectives and requirement. Therefore if boss asks for more info in the future, what will happen? Do the user find it easy to maintain and update?
that's why I illustrate adding an extra output, and it's very important in the changing world

I once come across a spreadsheet with 100 long sum product formulae, and when I hope to edit the things, it's just very hard and the calculation speed is very slow. And, of course, I eventually redesigned it. (And can you imagine, for that spreadsheet, at the beginning there are only a few sum product formulae, it's the change to the objectives and requirements that lead to a complicated spreadsheet which is hard to maintain.

I would say the maintenance risk is lower than sumproduct (at least it hides the intermediate steps of the calculation)
for file size, yes it's a concern, but to weigh the file size and maintenance effort, I would choose maintenance effort, it's more important to get things done with accuracy and efficiency, as well as others finding it easy to read.
 
Upvote 0
As an amendment to the initial table I posted:


Excel 2010
ABCDEFG
1valuedata type--VALUE()INT()N()Remark
2-1string-1-1-10N() fails to coerce value to the correct numerical value
3TRUEboolean1#VALUE!11VALUE() fails coercion
4FALSEboolean0#VALUE!00VALUE() fails coercion
501-May-2013string4139541395413950N() fails to coerce value to the correct numerical value
606:04:59string0.2534610.25346100N() and INT() fails to coerce value to the correct numerical value
Sheet1
Cell Formulas
RangeFormula
C2=--A2
C3=--A3
C4=--A4
C5=--A5
C6=--A6
D2=VALUE(A2)
D3=VALUE(A3)
D4=VALUE(A4)
D5=VALUE(A5)
D6=VALUE(A6)
E2=INT(A2)
E3=INT(A3)
E4=INT(A4)
E5=INT(A5)
E6=INT(A6)
F2=N(A2)
F3=N(A3)
F4=N(A4)
F5=N(A5)
F6=N(A6)
A2=TEXT(-1,0)
A3=TRUE
A4=FALSE
A5="01-May-2013"
A6="06:04:59"
 
Upvote 0
Further to the above the following table illustrates errors in using VALUE(), INT() and N() for coercion:

Excel 2010
ABCDEF
1ValueType--VALUE()INT()N()
21string1110
31numeric1111
4-1numeric-1-1-1-1
5-1string-1-1-10
6TRUEboolean1#VALUE!11
7FALSEboolean0#VALUE!00
806:14:23string0.2599884260.25998842600
906:14:23numeric0.2599884260.25998842600.26

<colgroup><col style="width: 25pxpx"><col><col><col><col><col><col></colgroup><thead>
</thead><tbody>
</tbody>
Sheet1

Worksheet Formulas
CellFormula
C2=--A2

<thead>
</thead><tbody>
</tbody>

<tbody>
</tbody>



VALUE() will not coerce boolean values to numeric values.
INT() will not yield the correct decimal value for time.
N() will not coerce time values captured as string.

Conclusion is that there are no functions (that I am aware of) that will work consistently and reliably as using good ol' plain arithmetic!
just a reply to this post first

when you choose which formulas to use, we have to be clear what the input, output and the aim is.
when designing, it's important to work on the desired process but not have any extra process embedded.
eg when in Vba there is a known error and you want to use resume next, you should not simply put it there to apply to other undiscovered errors.

1. About input structure, for a consistent data structure, only 1 aim is required, I.e. only converting strings to numbers, or converting Boolean to numbers.
if you are using the seemingly reliable --, you cannot discover any potentials about the data formats of raw data, probably there may be 1 wrong data in the whole database. That's to recall the principle of doing only what is needed but not extra.
To demonstrate, imagine there is one Boolean data out of 10000 numbers formatted as text.
if -- is used, the error cannot be found. This is not what reliable means.

And if the data structure consists of all the things you listed above, it is clearly not a database of qualified data, isn't it?

again, do what it is supposed, there are already different functions as you listed above to cater for different needs
Arithmetic negative of a Boolean variable has no meaning. You found the meaning only because excel' special design. It is not a valid logic to have arithmetic negative of a Boolean in human sense, right?
 
Upvote 0
alvin_excel_no_sumproduct_20130518.PNG

The SUMIF may be faster, but the addition of the helper columns will invariably slow the model. I'm a big fan of using helper columns to make complex formulae easier to understand; and yes sometimes they do add performance benefits. This however is not an example of this.

Firstly, A3 should state "Total (Not Yes and Not No)". Or logic is somewhat different.
Secondly, if I were to use helper columns I would avoid 26 unnecessary IF function calls and rather use e.g. in D6: =--(H6="no")
Finally; my preference to this problem would be two formulae, chosen bother for performance and also because I find most people would find it alot easier to get to grips with than reviewing a series of helper columns:
In B2: =SUMPRODUCT(--(E6:E18="yes"),--(H6:H18="no"),J6:J18)
In B3: =SUM(J6:J18)-B2
 
Last edited:
Upvote 0
It is not a valid logic to have arithmetic negative of a Boolean in human sense, right?
I found good use of doing exactly this before, yes!

Alvin I have provided good reason to favour to -- over a function, both addressing each point you made in your initial post. Further I have shown instances where the functions do not provide the correct coerced value. I favour a single method that works reliably across all data types. If I need to discriminate then I may deviate or just test the type. Alvin, regardless of what I or anybody says you are going to argue back, with each point you make stepping one step closer to absurd! Let's allow the reader to decide now. ;)
 
Upvote 0
Well I said I wouldn't debate this; but for the sake of the readers:


Indeed. Well actually, VALUE() does. INT() is intended to truncate a decimal value to a whole number. But I suggest that that a meaningless mathematical operation using an arithmetic operator matches the aim of the formula too. We know that, in Excel, whenever an expression includes an arithmetic operator (--;+
;-;*;/;%;^) Excel attempts to resolve each operand as a numeric value (i.e. coercion is invoked). Therefore it is by design and it "matches the aim of the formula". Understanding this holds no further complexity of understanding the purpose of the VALUE() function.

Glad to see the discussion, I hope to learn something as I am not an encyclopedia

As mentioned above, arithmetic negation to a Boolean does not have a meaning, it's by excels design to make it possible yet it is out of logic. If users continue to adapt to the wrong logic and find out their conventions, its correct if you are involved to excel's design and think its correct. But staying outside from it, this logic is just nonhuman. And given the wrong logic, the aim is clearlynot reached.
recall the if A is false, the there is no meaning to discuss whether B is true or false.
note : negation of Boolean true is false, as not(true)

I would say on par. Any user unaware of coercion has to learn it. Learning what VALUE() does is no simpler nor more complex than understanding that use of arithmetic operator also invokes a coercion attempt. Besides it's not rocket science and it's really easy to read up on and to understand. Yes we occasionally see people ask what is the purpose of -- but I believe what is being asked is the context (after all everyone knows --1=1). People ask why it is necessary to invoke coercion. If we start substituting -- for VALUE, I believe we will have equally as many questions asking why it is necessary.

I have learnt it due to excels design. To any human, seeing --"3" does not give the meaning to covert formats, right?. It only makes sense when you are involved to the whole excel environment (please stay out of the excel world to think about the question I would say)
I don't believe many people will ask what value means, because the function name itself is already giving a clue to the meaning of the functions, and its easy to search.
yet -- gives no clueto the meaning. It only does twice arithmetic negation.

well it seems not to be an objective discussion to the point whether people can understand.
but from my experience, I remember I had to spend a second to figure out what it means, as -- doesn't give any clues to the underlying meaning


Convention is important. It ensures consistency in how we work. It means that when you evaluate a complex formula and see -- you know immediately a deliberate attempt to coerce a text value to a numeric value. Indeed there are some conventions that aren't as good as they should be. This isn't one of them!


Yes it's a trivial point but most of our conventions are trivial. It is exactly all these trivial points though that when adhered to in a complex model makes all the difference. I for one don't want the extra parenthesis even if only for formula evaluation. I'm a big fan of analysing formulae with F9 over certain portions. Extra parenthesis only makes that process that little more tricky. But yes, this is a trivial point.

To the above two, as mentioned in my first article, I don't have much to supplement. Extra parenthesis is not a problem given it is a well designed spreadsheet.
its easy to understand value as converting texts to numbers but thoughts are always required to think about the use of --, does it mean the true arithmetic negation, or the conversion of formats?

Oh yes it is! The double negation operation invokes VERY little overhead, while the VALUE() function call by comparison to -- involves considerably more work. Is it noticeable? Not unless you scale it up. But unnecessary function calls is a significant culprit behind slow calcing models. Understanding coercion and boolean logic and avoiding unnecessary function calls can make a noticeable difference in the calc speed of your workbooks. What we are talking about here is micro-optimization; but micro-optimization is indeed very important. Use of all the micro-optimization best practises is what makes the difference between a 10MB file and a 1MB file. I once inherited a spreadsheet that was over 20MB. By the time I removed some duplicate calculations, substituted linear lookups for binary lookups, applied boolean logic using digital values I reduced the file size down below 1MB. All the improvements were micro-optimization techniques. So use of -- (or any arithmetic operator) over a function is important.

Well in my environment I am always dealing with models of over 80 MB, and I have tried to deal with those over 200MB
its required, to the complexity of the finance environment and the uncertainty of future,also with the different basis of projection and sensitivity testing.

When I was not aware of the importance of design, the 200 MB spreadsheet takes me possibly 30mins to have a full calculation and I always had to use manual calc mode.
after a year when I reworked, I used auto calculation mode to deal with it with no volatile formulas and using binary lookups and no DRY violation.
it just takes little time for the calculation.
(this 200MB thing is about data processing, but not the same type of thing for the 80MB model. 80MB things are all about modeling, not because of lots of same type of data as database)

Well I suppose. But that's not our motivation to use it. Not sure where you picked that argument up from Alvin.

Just in case anyone cares about the readability of longer characters


Please don't! You will be breaking away from convention and you will add more overhead to your formulae. This is a very bad suggestion. I agree that there are some stupid conventions. This is not one of them!

Interestingly one may ask why -- is the preferred (or most common) method? I believe it is because double negation comes 1st in {arithmetic} order precedence relations. I recall reading a suggestion how this is a teeny tiny bit quicker than using an operator lower down the list of order precedence; but I cannot remember the logic to that claim. Still, everyone seems to use --, it strikes me as being a very well-considered convention so I'm going to stick with it. Be good to yourself and do the same. ;)
The order of precedence is designed with basis
By taking it to a higher place, do you mean you have to put other things in that formula to a lower priority? All in all they are all to be calculated, I don't feel there should be any special treatments to the order of precedence?

And yes I nearly forgot about N() in the discussion, thanks for the reminder
 
Upvote 0
The SUMIF may be faster, but the addition of the helper columns will invariably slow the model. I'm a big fan of using helper columns to make complex formulae easier to understand; and yes sometimes they do add performance benefits. This however is not an example of this.

Firstly, A3 should state "Total (Not Yes and Not No)". Or logic is somewhat different.
Secondly, if I were to use helper columns I would avoid 26 unnecessary IF function calls and rather use e.g. in D6: =--(H6="no")
Finally; my preference to this problem would be two formulae, chosen bother for performance and also because I find most people would find it alot easier to get to grips with than reviewing a series of helper columns:
In B2: =SUMPRODUCT(--(E6:E18="yes"),--(H6:H18="no"),J6:J18)
In B3: =SUM(J6:J18)-B2
In B2: =SUMPRODUCT(--(E6:E18="yes"),--(H6:H18="no"),J6:J18)
To me it's already a complicated formula, seems the level of simple differs among us ;)
What will happen if there are more conditions to add?
and if more values of interest where simple subtraction does not work?
and then how about when you have to change all "yes" to "y" in all sum products formulas?
and it is "not yes or not no", did you misread it?

for my test of 4 columns withe 56000 data, and 5 values of interest
sumif uses 0.41s and sum product uses 0.61s
 
Upvote 0
It is very difficult to debate this with you as you are actually making very few accurate statements. Your statement that you deal with 200MB (even 80MB) only reinforces to me and suggests that you are not in possession of enough knowledge yet in terms of Excel development and spreadsheet design. As an active Excel Developer, Trainer and Consultant I have been exposed to many monstrosities. 200MB is very rare, 80MB I see often enough. Almost always the workbooks could be massively reduced. The culprit doesn't tend to be the data but rather poor choice of formulae and bad design. And I am convinced that you have yet to learn design, given your earlier example.

What's more you're making it up as you go. You had no idea that your suggested functions don't work on certain data types. Had I asked you one what circumstances you would recommend VALUE() I'm sure you would have suggested all. After all you never suggested a mixture of arithmetic and VALUE() in your initial post. This is what I mean when I say you are determined to argue.

Further you seem to lack an understanding of the basics of boolean logic; which is not unique to the Excel environment only.
note : negation of Boolean true is false, as not(true)

Excel 2010
ABC
1NegationAs boolean
2TRUE-1TRUE
3FALSE0FALSE
Sheet1
Cell Formulas
RangeFormula
B2=-A2
B3=-A3
C2=AND(B2)
C3=AND(B3)


On these grounds I'm afraid you make an appauling case to support your convention.
 
Last edited:
Upvote 0
It is very difficult to debate this with you as you are actually making very few accurate statements. Your statement that you deal with 200MB (even 80MB) only reinforces to me and suggests that you are not in possession of enough knowledge yet in terms of Excel development and spreadsheet design. As an active Excel Developer, Trainer and Consultant I have been exposed to many monstrosities. 200MB is very rare, 80MB I see often enough. Almost always the workbooks could be massively reduced. The culprit doesn't tend to be the data but rather poor choice of formulae and bad design. And I am convinced that you have yet to learn design, given your earlier example.

What's more you're making it up as you go. You had no idea that your suggested functions don't work on certain data types. Had I asked you one what circumstances you would recommend VALUE() I'm sure you would have suggested all. After all you never suggested a mixture of arithmetic and VALUE() in your initial post. This is what I mean when I say you are determined to argue.

Further you seem to lack an understanding of the basics of boolean logic; which is not unique to the Excel environment only.

Excel 2010
ABC
1NegationAs boolean
2TRUE-1TRUE
3FALSE0FALSE

<tbody>
</tbody>
Sheet1

Worksheet Formulas
CellFormula
B2=-A2
C2=AND(B2)
B3=-A3
C3=AND(B3)

<tbody>
</tbody>

<tbody>
</tbody>



On these grounds I'm afraid you make an appauling case to support your convention.
those 80MB is not initially developed by me, in fact I planned to redevelop it but it requires time and simply I don't have enough time to work on it yet.
i am not sure whether you have touch spreadsheets in my industry
if there is really much data and objectives required, it can grow much large and large, especially the continuing development of methodology and requirements in my industry are just never ending and only become more and more complicated. (Yes it is really very much)

for 200MB it's because I am working on hundreds thousands of records (I personally have worked by database approach in fox pro, to check the result, and of course this is the correct approach) but others just want to have an excel checking since they are not technologically advanced enough to understand the nested queries
And basically given only the data it's just already 100MB+

and I have to emphasize, choice of functions should only be used to cater what required to be done, but not less nor extra.
if I need to convert text to numbers, use value, if I have to convert blanks and numbers to all numbers, use N, etc

well Boolean is Boolean, never a number, don't be trapped by excel's logic. Only few support the convention false = 0 and true = 1 (correct me if wrong, excel, Vba, c support this only)
 
Upvote 0

Forum statistics

Threads
1,215,949
Messages
6,127,888
Members
449,411
Latest member
AppellatePerson

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top