Early Dates in Power Query are Off by a Day


March 30, 2018 - by

Early Dates in Power Query are Off by a Day

Power Query is the hot, young, new feature in Excel. Remember that it was not created by the Excel team. The Power Query team decided that they are not going to be bound by a decades-old mistake in VisiCalc. Here is why that can cause problems for you.

For background, dates in Excel are stored as a serial number. If you put in today's date - March 30, 2018 in Excel, they will show you some form of the date, but Excel is storing 43189. That means that today is 43189 days since January 1, 1900.

It also means that to get to tomorrow's date, you could use =F3+1. To find the number of days between dates, you could subtract one date from another. It is a good way to allow date calculations.

Caution

It also means that you can't easily record dates from the 1800's. This is bad for geneaologists or accountants at 150-year old companies.

Each date is represented by a specific serial number
Each date is represented by a specific serial number


The system was invented by Dan Bricklin and Bob Frankston when they created VisiCalc in 1978-1979. But Bob and Dan made one mistake. Let's take our model way back in time, to February and March of 1900. The date associated with the serial number 60 is February 29, 1900.


Very early dates are wrong in Excel
Very early dates are wrong in Excel

Leap days go back to Julius Caesar. Since it takes the Earth 365.242189 days to go around the sun, having a calendar of 365 days means that the seasons would shift by 24 days every century. Julius Caesar created a plan of adding a leap day in every year divisible by 4. That would have been perfect if the Earth went around the Sun every 365.25 days. But that tiny difference from .25 to .242189 meant that over the course of two millenia, the seasons were still off. Pope Gregory proposed a system in 1582 where there were three rules:

  • Rule 1: A year divisible by 4 would be a leap year, except:
  • Rule 2: A year divisible by 100 would not be a leap year, except:
  • Rule 3: A year divisible by 400 would be a leap year.

The rules were proposed in 1582 but were slow to be adopted. Japan did not agree until 1873. Bulgaria, Estonia, Russia, Greece and Turkey switched from 1916-1927. Rule #2 has only happened in 1700, 1800, and 1900. Rule #3 happened in 1600 and 2000. If you are reading this, you were likely alive for February 29 2000, but you may not have realized it was an exception to an exception to an exception. But back in 1978, this had not happened for 79 years, so it was not widely known. VisiCalc made the mistake of including February 29, 1900.

It is not a big deal, really. Who ever goes back to see if February 2, 1900 was a Thursday or a Wednesday (Excel says it was a Thursday, but it was really a Wednesday). And who is going to be tracking open receivables from early 1900? Let's face it, if you invoiced a vendor on February 15, 1900 and they haven't paid you yet, it is time to write off the receivable.

For compatibility, Mitch Kapor programmed the same error in to Lotus 1-2-3.

Steve Jobs, not wanting to purposely program an error made the Macintosh clock start on January 1, 1904.

At Microsoft, Excel had to be compatible with then market-leader Lotus 1-2-3, and the non-existant February 29, 1900 was introduced into Excel and remains there to this day.

But the Power Query architects aren't spreadsheet people. They don't have statues of Bricklin and Frankston in their office. They don't know this history. They made the decision that their dates would be the number of days elapsed since December 31, 1899. This makes Power Query people feel a tiny bit superior to Excel people because the weekdays reported by Power Query for 60 days in early 1900 are more correct than in Excel.

This is not anything to worry about. No one is dealing with dates from those 60 days.

But here is something far more common. Consider this table of well-known numbers.

Henry Heinz thought 57 sounded like a lucky number.
Henry Heinz thought 57 sounded like a lucky number.

I am going to ask you to "do the wrong thing" and accidentally follow these steps:

  1. Select column N
  2. Press Ctrl + Shift + 3 to format the column as a date
  3. Do not notice that you did either of these things.

    Whoops - the numbers are inadvertently dates
    Whoops - the numbers are inadvertently dates
  4. Use Data, Get & Transform, From Table or Range.
  5. When you get to Power Query, notice the dates in the number column. In the Applied Steps box, delete the Changed Format step.

When the data comes back to Excel, everything is off by 1. Circles no longer exist. Heinz has 56 varieties instead of 57.

Mitch Kapor was right to be wrong
Mitch Kapor was right to be wrong

I realize this is our last Friday in Excel Lent. I realize this is a contrived and obscure example. What are the odds someone would accidentally format a column of numbers in the 1-60 range as dates before going to Power Query? It seems low, but it has happened.

Power Query is an awesome feature. I am sure the architects figured nothing could go wrong if they were smarter than those people in 1978 who made a mistake. But will billions of spreadsheets working because we all agree to accept the mistake, you are tearing a tiny hole in the fabric of Excel.

Every Friday, I examine a bug or other fishy behavior in Excel.

Excel Thought Of the Day

I've asked my Excel Master friends for their advice about Excel. Today's thought to ponder:

"Ctrl+Shift+U toggles the height of the formula bar"

Title Photo: Yanina Trekhleb