Starting again from a different approach

Horkenheimer

New Member
Joined
Feb 13, 2020
Messages
16
Office Version
  1. 2019
Platform
  1. MacOS
Hi there,

Thanks to Fazza, I am going to start again with a clear description of my intended goal and the size of the project. I need to select some key information and combine two enormous worksheets of 100,000 plus rows and 15 plus columns.
1) The first task is the following. On one worksheet I need to keep column A and B. I need to then align the first large number in one column (highlighted in yellow) and the second large numbers in the next column, etc. Every row has at least one large number. (One row has 30 large numbers, so I will need a total of 30 columns of large numbers). This number is a reference number. Everything else will be erased.

index_nouns_V1.1.xltm
ABCDEFGHIJKLMNO
140an#s#p%p;71136797211511437014853735147310501365885568442275408203
141a'mann08363213
142a-bombn102756049
143a-horizonn08676100
144a-linen2700080
index


I was kindly offered the formula =IFERROR(INDEX($F142:$AH142,0,SMALL(IF(LEN($F142:$AH142)>3,COLUMN($F142:$AH142)),COLUMNS($AI142:AO142))-COLUMN($F142)+1),"") (see column AP in the .xltm below) for the above task and it worked perfectly. But I had to fill 30 columns and 100,000 rows with the formula, which made the worksheet extremely heavy and it keeps crashing the program every time I attempt to clean up the file by clearing the seemingly blank cells (in which the formula evaluates to "") and switching the other cells to values.

2) The next large task is to connect each reference number (highlighted) with an associated code which is located on another sheet. Ultimately, the hope is to have each large number have an adjacent column with the code number. I have started this task in the second .xltm, as you can see.

For this task I used: =VLOOKUP(AI1,'/Users/kedrickjames/Desktop/dict 2/[data_data_connected.xls]data'!$A:$B,2)

Index_nouns_V3.xlsx
AIAJAKALAMANAOAP
14213679721231511437014853735147310501365885568442275408203
143836321314 
index_organized
Cell Formulas
RangeFormula
AJ142:AJ143AJ142=VLOOKUP(AI142,'/Users/kedrickjames/Desktop/dict 2/[data_data_connected.xls]data'!$A:$B,2)
AP142:AP143AP142=IFERROR(INDEX($F142:$AH142,0,SMALL(IF(LEN($F142:$AH142)>3,COLUMN($F142:$AH142)),COLUMNS($AI142:AO142))-COLUMN($F142)+1),"")
Press CTRL+SHIFT+ENTER to enter array formulas surrounded with curly braces.


The target lookup sheet has 82221 rows. Each large number has an associated code in column B that I need, as you can see below:

data_nounsV1.1.txt
AB
25210073
26214453
27219143
28221193
29230833
30232803
31234513
32239533
33244443
34249003
35263903
36273653
37280053
data_nounsV1.1


3. ULTIMATELY, I will end up with column A & B (from the first graph) then 30 columns containing only the codes (in the end I won't need the large reference numbers)...

Does this make sense? Any guidance on this project would be extremely helpful. The problem is the files are so big they keep crashing my BRAND NEW desktop Mac... So maybe I need to start again and approach this differently?

Thanks in advance!!!!
 

Excel Facts

Waterfall charts in Excel?
Office 365 customers have access to Waterfall charts since late 2016. They were added to Excel 2019.
Could you clarify some details for me, please? You have a large table that you'd like to review row-by-row, beginning in column C and moving to the right, with the objective to identify all large numbers in that row. Those large numbers are to be consolidated somewhere, beginning at some column far to the right in the same row. If more than one large number is found in the row, then those numbers are shown in sequential columns to the right. Is this correct for Step 1?

How is a "large number" defined...is it any number greater than some threshold?
Do the rows (after column B) contain only numeric data?
Is this task to be done only once, as might be the case if cleaning up a data set, or is it to be repeated regularly?
I considered suggested the possibility of transposing the entire table and then sorting the results from largest to smallest to bring the largest values to the top of the table, but you'd have to break the original table up into manageable pieces to work within maximum column limits of Excel.

Would a simpler formula help with the crash issues?....or perhaps store the largest 30 values from each row in an array? Below is a notional example using the simpler LARGE function.

MrExcel_2.xlsm
ABCDEFGHIJKLMNO
1Notional starting table largest values
21234in array
3keep A3keep B3956811555670542222415956811670555956
4keep A4keep B4684692724590873963460963873724692963
5keep A5keep B5803295911168526552321911803552526911
6keep A6keep B6479828396715818376669828818715669828
7keep A7keep B7862843268468153771591862843771591862
8keep A8keep B8557678442881313103119881678557442881
9keep A9keep B9846444637549659944583944846659637944
Sheet20
Cell Formulas
RangeFormula
K3:N9K3=LARGE($C3:$I3,K$2)
O3:O9O3=LARGE($C3:$I3,{1;2;3;4})
Press CTRL+SHIFT+ENTER to enter array formulas surrounded with curly braces.
 
Upvote 0
Could you clarify some details for me, please? You have a large table that you'd like to review row-by-row, beginning in column C and moving to the right, with the objective to identify all large numbers in that row. Those large numbers are to be consolidated somewhere, beginning at some column far to the right in the same row. If more than one large number is found in the row, then those numbers are shown in sequential columns to the right. Is this correct for Step 1?

How is a "large number" defined...is it any number greater than some threshold?
Do the rows (after column B) contain only numeric data?
Is this task to be done only once, as might be the case if cleaning up a data set, or is it to be repeated regularly?
I considered suggested the possibility of transposing the entire table and then sorting the results from largest to smallest to bring the largest values to the top of the table, but you'd have to break the original table up into manageable pieces to work within maximum column limits of Excel.

Would a simpler formula help with the crash issues?....or perhaps store the largest 30 values from each row in an array? Below is a notional example using the simpler LARGE function.

MrExcel_2.xlsm
ABCDEFGHIJKLMNO
1Notional starting table largest values
21234in array
3keep A3keep B3956811555670542222415956811670555956
4keep A4keep B4684692724590873963460963873724692963
5keep A5keep B5803295911168526552321911803552526911
6keep A6keep B6479828396715818376669828818715669828
7keep A7keep B7862843268468153771591862843771591862
8keep A8keep B8557678442881313103119881678557442881
9keep A9keep B9846444637549659944583944846659637944
Sheet20
Cell Formulas
RangeFormula
K3:N9K3=LARGE($C3:$I3,K$2)
O3:O9O3=LARGE($C3:$I3,{1;2;3;4})
Press CTRL+SHIFT+ENTER to enter array formulas surrounded with curly braces.

Thanks for your response, KRice. To clarify, each row is a word that has one to thirty (at the very most) values that are attached to each. I need to tidy this up so that the first value of each word is in one column, the second value is in the second column etc.
I will then create an adjacent column in which to put the code associated with each of those values.

index_V2.xlsx
ABCDEFGHIJ
50680headn554625881320872561905710182584849928211516007
50681head-shrinkern105076011810183045
50682head_achen1435001620
50683head_blightn1424095520
50684head_cabbagen11896407207730316
50685head_cabbage_plantn1189640720
50686head_coldn1416924120
index_organized

In the above graph column C is the first value, column D is the code associated with column C. Column E is the second value and column F will be the code associated with Column E, and so on. (In this table I have not yet used the VLOOKUP to fill column F or insert columns beside G, H, I, & J)

I was wondering if instead of using the function I have used to organize these columns (which has made this file so big it's almost unusable) I should instead create a database with macros (I am totally new to this!!!) rather than inserting functions for all of this. As you can see in the above graph, there needs to be functions in all the cells of columns EFGHIJ to catch any values, even though most often there will be no value at all. Every row has one value, way less of the rows of 2, 3, 4 values etc. Only one row (of 100,000 rows) has 30 values.
If I keep going with this methodology, should I combine the pages into one document for VLOOKUP? Maybe that would cutdown on file size?
Does this make sense?
 
Upvote 0
It sounds as if you have data sets that might be better managed with a database. But if you stay with Excel, I wonder why the first table is being constructed as you've described. For example, will data in column D ever be used? Since column D holds small values associated with Column C, but you want to extract the "large" value in Column C, is it even necessary to include the information in Column D when the table is first built?

Rather than building a large table with VLOOKUP functions that return many null results, and then "re-process" the table to purge it of extraneous information and blank cells, you might consider using VBA to construct the initial table...not to modify the current unwieldy table, but to use VBA to perform the look-ups from whatever sources they are drawn from and to write only the results that are desired into a consolidated table. If that sounds like it might be a more preferred approach, I would encourage you to post small working samples from your data sources (overwritten with other data if the original contains sensitive information) to request help with some code to pull the data into a consolidated table structure that you can use. With this approach, you wouldn't have 30 x 100,000+ formulas in the worksheet, and that should allow you to move forward with a manageable table that doesn't cause (hopefully) the crash/slowness issues.
 
Upvote 0

Forum statistics

Threads
1,214,821
Messages
6,121,755
Members
449,049
Latest member
excelknuckles

We've detected that you are using an adblocker.

We have a great community of people providing Excel help here, but the hosting costs are enormous. You can help keep this site running by allowing ads on MrExcel.com.
Allow Ads at MrExcel

Which adblocker are you using?

Disable AdBlock

Follow these easy steps to disable AdBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the icon in the browser’s toolbar.
2)Click on the "Pause on this site" option.
Go back

Disable AdBlock Plus

Follow these easy steps to disable AdBlock Plus

1)Click on the icon in the browser’s toolbar.
2)Click on the toggle to disable it for "mrexcel.com".
Go back

Disable uBlock Origin

Follow these easy steps to disable uBlock Origin

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back

Disable uBlock

Follow these easy steps to disable uBlock

1)Click on the icon in the browser’s toolbar.
2)Click on the "Power" button.
3)Click on the "Refresh" button.
Go back
Back
Top