File size cap when saving to csv?

BuJay

New Member
Joined
Jun 24, 2020
Messages
7
Office Version
365, 2019, 2016, 2013
Platform
Windows
I am seeing some really strange behavior right now and would appreciate any and all thoughts.

I created an Excel file with 2,500 columns and 100,000 rows and saved it as a CSV to practice loading large CSV files into Python using pandas. I was able to load the 100,000-row file with pandas successfully.

Then I copied the 100,000 rows (excluding headers) and pasted them into rows 100,001 through 200,000 to create a 200,000-row CSV file. I was able to load the 200,000-row file with pandas successfully.

Then I pasted the same 100,000 rows (excluding headers) into rows 200,001 through 300,000 to create a 300,000-row CSV file. I was able to load the 300,000-row file with pandas successfully.

Here is where it gets strange. The size of the 300,000-row CSV is 4.19 GB.

When I open that file, add another 100,000 rows, and save it as a 400,000-row CSV file, the file size remains 4.19 GB. Something also seems to corrupt the CSV: its structure appears to change, and I can no longer load it successfully.
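One way to check whether the extra 100,000 rows actually reached the disk is to compare the exact byte size and line count of the 300,000-row and 400,000-row files. The following is only a minimal sketch using the Python standard library; the function name and the file names in the usage comment are hypothetical, and it counts line-feed bytes rather than parsed CSV records.

```python
import os


def size_and_line_count(path, chunk_size=1 << 20):
    """Return (size in bytes, number of b"\\n" bytes) for the file at `path`.

    The file is read in binary chunks, so it is never loaded into memory
    all at once. A line feed inside a quoted field is counted as well, so
    this is a raw line count, not a count of CSV records.
    """
    size_bytes = os.path.getsize(path)
    newlines = 0
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            newlines += chunk.count(b"\n")
    return size_bytes, newlines


# Hypothetical usage; substitute the real file names:
# print(size_and_line_count("rows_300k.csv"))
# print(size_and_line_count("rows_400k.csv"))
```

If both files report the same byte size but the line counts differ, or the 400,000-row file has fewer lines than expected, that would suggest the problem happens while the file is being written rather than while it is being read.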

My guess is that something is corrupting the file during the save process. Any thoughts?

As an aside, I know there isn't any real reason to use this process for large files - I get that. I am still curious as to what is going on. Also, I don't believe this is a Python or pandas issue.

Thanks
 

BuJay
Any help or thoughts?
 

JonXL

Active Member
Joined
Feb 5, 2018
Messages
343
Office Version
365, 2016
Platform
Windows
No advice, just a question: when you open the 'corrupt' CSV in a text editor, what exactly is wrong with its structure?
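If the file is too large for a text editor to open comfortably, the raw lines around the first failing row can be pulled out with a few lines of Python instead. This is only a sketch under assumptions: the function name, the file name, and the line number in the usage comment are hypothetical, and the encoding is assumed to be UTF-8 (undecodable bytes are replaced rather than raising an error).

```python
from itertools import islice


def show_raw_lines(path, first, count=5, width=200, encoding="utf-8"):
    """Return `count` raw lines starting at 1-based line number `first`.

    Each result is a (line_number, text) pair, with text truncated to
    `width` characters so that 2,500-column rows stay readable. Passing
    newline="" keeps the original line endings, so printing with repr()
    shows whether a line ends in "\\r\\n", "\\n", or nothing at all.
    """
    with open(path, "r", encoding=encoding, errors="replace", newline="") as fh:
        window = islice(fh, first - 1, first - 1 + count)
        return [(first + i, line[:width]) for i, line in enumerate(window)]


# Hypothetical usage, if pandas reported a problem near line 300002:
# for number, text in show_raw_lines("rows_400k.csv", 300000, count=5):
#     print(number, repr(text))
```

The file is streamed line by line, so only the requested window is kept in memory.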
 

BuJay
It's strange - I can use the "error_bad_lines" option in pandas, but it flags basically every row after the first bad row as an error. The whole point is that I can't really open the .csv files in Excel to look at the rows, and when I do, everything looks fine.
 

jay_py

New Member
Joined
Mar 29, 2020
Messages
7
Office Version
2016, 2013
Platform
Windows
Since you are only reading the CSV with Python, Python is not the cause of the problem.

A .csv file is basically a plain-text file, like a .txt file, which means you can open a .csv with any text editor. Try opening your large file with a text editor and look through the columns and rows - is there anything extra that shouldn't be there? I suspect you may have copied and pasted something extra...
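Scanning 2,500 columns by eye is tedious, so the same check can be sketched in Python: stream through the file, count the fields in each row, and report the first row whose count differs from the header. This is a rough sketch using only the standard library csv module; the function name and file name are hypothetical, and it assumes a comma delimiter and UTF-8 encoding.

```python
import csv
from collections import Counter


def field_count_report(path, encoding="utf-8"):
    """Return (expected, counts, first_bad_line) for the CSV at `path`.

    expected       -- number of fields in the first row (the header)
    counts         -- Counter mapping field count -> number of rows
    first_bad_line -- physical line number at which the first row with a
                      different field count ended, or None if all match

    Rows are streamed one at a time, so the file is never held in memory.
    Note that csv.reader raises csv.Error if a single field exceeds
    csv.field_size_limit(), which can itself be a sign of a broken quote.
    """
    counts = Counter()
    expected = None
    first_bad_line = None
    with open(path, "r", encoding=encoding, errors="replace", newline="") as fh:
        reader = csv.reader(fh)
        for row in reader:
            n = len(row)
            if expected is None:
                expected = n
            counts[n] += 1
            if n != expected and first_bad_line is None:
                first_bad_line = reader.line_num
    return expected, counts, first_bad_line


# Hypothetical usage:
# expected, counts, first_bad = field_count_report("rows_400k.csv")
# print(expected, dict(counts), first_bad)
```

If every row has the expected number of fields, a stray pasted value is unlikely to be the cause; if the counts change partway through the file, the reported line number is a good place to start looking.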
 
