File size cap when saving to csv?

BuJay

New Member
Joined
Jun 24, 2020
Messages
7
Office Version
365, 2019, 2016, 2013
Platform
Windows
I am seeing really, really strange behavior right now. Would appreciate any and all thoughts.

I created an Excel file with 2,500 columns and 100,000 rows and saved it as a CSV to practice loading large CSV files into Python using pandas. I was able to load the 100,000-row file with pandas successfully.

Then, I simply copied the 100,000 rows (excluding headers) and pasted in rows 100,001 through 200,000 to create a 200,000 row csv file. I was able to load the 200,000 row file with pandas successfully.

Then, I simply added another 100,000 rows (excluding headers) and pasted in rows 200,001 through 300,000 to create a 300,000 row csv file. I was able to load the 300,000 row file with pandas successfully.

Here is where it gets strange. The size of the 300,000-row CSV is 4.19 GB.

When I open that file, add another 100,000 rows, and save it as a 400,000-row CSV, the file size remains 4.19 GB. Something also seems to be corrupting the file: its structure appears to change, and I can no longer load it successfully.

I am deducing that something is corrupting it during the save process. Any thoughts?
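One quick way to put a number on this suspicion: every added block is a copy of the same 100,000 rows, so the file size should grow roughly in proportion to the row count. The sketch below does that arithmetic and compares it with the size on disk. The function names, the 5% tolerance, and the `bytes_per_gb` default (assuming the 4.19 GB figure came from Windows Explorer, which counts 1024**3 bytes per "GB") are illustrative assumptions, not something stated in the post.

```python
import os

def expected_size_gb(known_size_gb, known_rows, target_rows):
    """Scale a known file size linearly to a new row count.

    Assumes every row takes about the same space, which should hold
    here because the extra rows are copies of the first 100,000.
    """
    return known_size_gb * target_rows / known_rows

def size_report(path, expected_gb, tolerance=0.05, bytes_per_gb=1024**3):
    """Return (actual size in GB, True if the file looks too small).

    bytes_per_gb is an assumption: use 1024**3 to match Windows
    Explorer, or 10**9 for decimal gigabytes.
    """
    actual_gb = os.path.getsize(path) / bytes_per_gb
    looks_truncated = actual_gb < expected_gb * (1 - tolerance)
    return actual_gb, looks_truncated

# 300,000 rows took 4.19 GB, so 400,000 rows should take about a third more.
estimate = expected_size_gb(4.19, 300_000, 400_000)
print(f"Expected size for 400,000 rows: about {estimate:.2f} GB")
```

If the 400,000-row file still reports 4.19 GB against an estimate of about 5.59 GB, roughly a quarter of the expected data is missing from the file, which would be consistent with the save not writing everything out. That is a consistency check only; it does not say why the save stopped.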

As an aside, I know there isn't any real reason to use this process for large files - I get that. I am still curious as to what is going on. Also, it is not a Python or pandas issue.

Thanks
 


BuJay

Any help or thoughts?
 

JonXL

Active Member
Joined
Feb 5, 2018
Messages
354
Office Version
365, 2016
Platform
Windows
No advice, just a question: When you open the 'corrupt' CSV in a text editor, what exactly is wrong with its structure, etc.?
 

BuJay

It's strange - I can run pandas with the "error_bad_lines" option, but it flags basically every row after the first bad row as an error. The whole point is that I can't really open the CSVs in Excel to look at the rows, and when I do, everything looks fine.
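To pin down where the first bad row sits without going through Excel or pandas, one option is to stream the file with Python's built-in `csv` module and compare each record's field count with the header's. This is a minimal sketch: the function name `first_bad_rows`, the `limit` of 5, and the UTF-8 encoding are assumptions for illustration.

```python
import csv

def first_bad_rows(path, limit=5, encoding="utf-8"):
    """Return (expected_fields, bad) for the CSV at `path`.

    `bad` holds up to `limit` pairs of (record_number, field_count)
    for records whose field count differs from the header's.
    Record 1 is the header. The file is streamed, never fully loaded.
    """
    bad = []
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.reader(f)
        expected = len(next(reader))  # header defines the expected width
        for record_number, row in enumerate(reader, start=2):
            if len(row) != expected:
                bad.append((record_number, len(row)))
                if len(bad) >= limit:
                    break
    return expected, bad

# Example call (hypothetical file name):
# expected, bad = first_bad_rows("rows_400k.csv")
# print(expected, bad)
```

If only the first reported record has the wrong width and later ones look fine, the damage is local. If everything after a certain record is off, the file's structure probably changes at that point, which matches what the parser is reporting.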
 

jay_py

New Member
Joined
Mar 29, 2020
Messages
7
Office Version
2016, 2013
Platform
Windows
Since you are only reading the CSV with Python, Python is not the cause of the problem.

A .csv file is essentially a plain-text file, like a .txt file, which means you can open it with any text editor. Try opening your large file with a text editor and look through the columns and rows. Is there anything extra that shouldn't be there? I suspect you may have copied and pasted something extra.
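Many text editors struggle with a file of several gigabytes, so a small script can stand in for the editor. The sketch below returns a few raw lines starting at a chosen line number, truncated so that 2,500 columns do not flood the screen. The name `peek_lines`, the default width of 120 characters, and the UTF-8 encoding are assumptions for illustration.

```python
from itertools import islice

def peek_lines(path, start, count=3, width=120, encoding="utf-8"):
    """Return `count` raw lines beginning at 1-based line `start`.

    Each line is cut to `width` characters. newline="" keeps the
    original line endings so stray \r or \n characters stay visible,
    and errors="replace" avoids a crash on undecodable bytes.
    """
    with open(path, newline="", encoding=encoding, errors="replace") as f:
        selected = islice(f, start - 1, start - 1 + count)
        return [line[:width] for line in selected]

# Example call (hypothetical file name and line number):
# for line in peek_lines("rows_400k.csv", start=300_000):
#     print(repr(line))  # repr() exposes hidden characters
```

Printing with `repr()` makes extra delimiters, stray quotes, and unusual line endings easier to spot than they would be in a spreadsheet view.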
 
