File size cap when saving to CSV?

BuJay

New Member
Joined
Jun 24, 2020
Messages
7
Office Version
  1. 365
  2. 2019
  3. 2016
  4. 2013
Platform
  1. Windows
I am seeing really, really strange behavior right now. Would appreciate any and all thoughts.

I created an Excel file with 2,500 columns and 100,000 rows and saved it as a CSV to practice loading large CSV files into Python using pandas. I was able to load the 100,000-row file with pandas successfully.

Then, I simply copied the 100,000 rows (excluding headers) and pasted them into rows 100,001 through 200,000 to create a 200,000-row CSV file. I was able to load the 200,000-row file with pandas successfully.

Then, I simply pasted another 100,000 rows (excluding headers) into rows 200,001 through 300,000 to create a 300,000-row CSV file. I was able to load the 300,000-row file with pandas successfully.

Here is where it gets strange. The size of the 300,000-row CSV is 4.19 GB.

When I open that file, add another 100,000 rows, and save it as a 400,000-row CSV file, the file size remains 4.19 GB and something is corrupting the CSV file: its structure appears to change and I can no longer load it successfully.

I am deducing that something is corrupting it during the save process. Any thoughts?

As an aside, I know there isn't any real reason to use this process for large files - I get that. I am still curious as to what is going on. Also, it is not a Python or pandas issue.

Thanks
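
One way to put numbers on the symptom described above: since every block of 100,000 rows is a copy of the first, the file size should grow roughly linearly with the row count, so the 400,000-row file would be expected to be noticeably larger than 4.19 GB. The sketch below does that arithmetic and adds a quick check of whether a file ends on a complete line. The function names are made up for illustration, and the assumption that a cleanly saved CSV ends with a newline is only a heuristic, not a guarantee.

```python
import os


def expected_size_gb(known_rows, known_size_gb, target_rows):
    """Linear estimate: identical rows, so size scales with row count.

    The single header row is ignored, which is negligible at this scale.
    """
    return known_size_gb * target_rows / known_rows


def ends_with_newline(path):
    """Return True if the last byte of the file is a newline.

    Assumption: a cleanly written CSV ends with a line terminator, so a
    file that does not may have been cut off mid-row.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return False  # empty file: nothing to inspect
        f.seek(-1, os.SEEK_END)
        return f.read(1) == b"\n"


# Figures taken from the post: 300,000 rows occupy 4.19 GB.
estimate = expected_size_gb(300_000, 4.19, 400_000)
print(f"expected size of the 400,000-row file: about {estimate:.2f} GB")
```

If the saved file is well short of that estimate and `ends_with_newline` returns False for it, that would be consistent with the save being cut short, though it would not say why.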
 


BuJay

Any help or thoughts?
 

JonXL

Active Member
Joined
Feb 5, 2018
Messages
357
Office Version
  1. 365
  2. 2016
Platform
  1. Windows
No advice, just a question: When you open the 'corrupt' CSV in a text editor, what exactly is wrong with its structure, etc.?
 

BuJay

It's strange - I can use the "error_bad_lines" option in pandas, but after the first bad row it flags basically every following row as an error. The whole point is that I can't really open the CSVs in Excel to look at the rows, or when I do, everything looks fine.
 

jay_py

New Member
Joined
Mar 29, 2020
Messages
7
Office Version
  1. 2016
  2. 2013
Platform
  1. Windows
Since you are only reading the CSV with Python, Python is not the cause of the problem.

A .csv file is essentially a plain text file, which means you can open it with any text editor. Try opening your large file in a text editor and look through the columns and rows - is there anything extra that shouldn't be there? I suspect you may have copied and pasted something extra...
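
For a file of several gigabytes, many text editors will struggle, so the same inspection can be done by streaming the file row by row. Below is a minimal sketch using only Python's standard `csv` module; the function name `find_irregular_rows`, the `limit` parameter, and the default encoding are assumptions for illustration (a CSV saved by Excel on Windows may need a different encoding such as `cp1252`).

```python
import csv


def find_irregular_rows(path, limit=10, encoding="utf-8"):
    """Stream a CSV and report rows whose field count differs from the header's.

    Returns (expected_fields, total_rows, problems), where problems holds at
    most `limit` tuples of (row_number, field_count). Row 1 is the header.
    """
    problems = []
    # newline="" lets the csv module handle line endings itself;
    # errors="replace" keeps the scan going if the encoding guess is wrong.
    with open(path, newline="", encoding=encoding, errors="replace") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            return 0, 0, problems  # empty file
        expected = len(header)
        total_rows = 1
        for row_number, row in enumerate(reader, start=2):
            total_rows += 1
            if len(row) != expected and len(problems) < limit:
                problems.append((row_number, len(row)))
    return expected, total_rows, problems


# Usage (hypothetical file name):
# expected, total, problems = find_irregular_rows("big_400k.csv")
# print(expected, total, problems)
```

The row count and the position of the first irregular row show where the structure changes, which is the information a text editor would otherwise be used to find.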
 
